# 3. Advanced Merging and Concatenating

## Filtering joins

- filter the observations in one table based on whether or not they match an observation in another table

### Semi joins

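- returns the rows of the left table that have a match in the right table, similar to an inner join
- returns only columns from the left table and **not** the right
- no duplicates: each left row appears at most once, even if it matches several rows in the right table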
<br><br>

Steps:
- merge the left and right tables on the key column using an inner join
- check whether each key value in the left table appears in the merged table using the `.isin()` method, which creates a Boolean Series
- use that Boolean Series to subset the rows of the left table
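
The steps above can be sketched on a pair of small made-up tables. The `genres` and `top_tracks` frames and all of their values are hypothetical stand-ins, not the course data:

```python
import pandas as pd

# Hypothetical toy tables (assumed for illustration only)
genres = pd.DataFrame({"gid": [1, 2, 3, 4],
                       "name": ["Rock", "Jazz", "Metal", "Pop"]})
top_tracks = pd.DataFrame({"tid": [10, 11, 12, 13],
                           "gid": [1, 1, 3, 3]})

# Step 1: inner join on the key column (gid 1 and 3 each match twice)
genres_tracks = genres.merge(top_tracks, on="gid")

# Step 2: Boolean Series marking left rows whose key survived the merge
in_merged = genres["gid"].isin(genres_tracks["gid"])

# Step 3: subset the left table; only its own columns are kept
top_genres = genres[in_merged]
print(top_genres)
```

The inner join in step 1 contains one row per matching track, but the final result holds each matching genre only once, because the filter is applied to the original left table.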

In [None]:
# Some of the tracks that have generated the most revenue are from TV shows or are other non-musical audio. You have been given a table of invoices that includes the top revenue-generating items, and a table of non-musical tracks from the streaming service. In this exercise, you'll use a semi join to find the top revenue-generating non-musical tracks.

# Merge the non_mus_tcks and top_invoices tables on tid
tracks_invoices = non_mus_tcks.merge(top_invoices, on="tid")

# Use .isin() to subset non_mus_tcks to rows with tid in tracks_invoices
top_tracks = non_mus_tcks[non_mus_tcks['tid'].isin(tracks_invoices["tid"])]

# Group the top_tracks by gid and count the tid rows
cnt_by_gid = top_tracks.groupby(['gid'], as_index=False).agg({'tid':'count'})

# Merge the genres table to cnt_by_gid on gid and print
print(cnt_by_gid.merge(genres, on="gid"))

### Anti join

- returns the rows of the left table that have **no** match in the right table (the left table, excluding the intersection)
- returns only columns from the left table and **not** the right
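
A minimal sketch of an anti join on made-up data, using a left merge with `indicator=True`. The `employees` and `top_cust` frames here are hypothetical toy tables with invented values:

```python
import pandas as pd

# Hypothetical toy tables (assumed for illustration only)
employees = pd.DataFrame({"srid": [1, 2, 3, 4],
                          "lname": ["Adams", "Baker", "Clark", "Davis"]})
top_cust = pd.DataFrame({"cid": [100, 101, 102],
                         "srid": [2, 2, 4]})

# A left merge with indicator=True adds a _merge column telling
# whether each row came from 'left_only' or from 'both' tables
empl_cust = employees.merge(top_cust, on="srid", how="left", indicator=True)

# Keys of the left rows that found no match in the right table
srid_list = empl_cust.loc[empl_cust["_merge"] == "left_only", "srid"]

# Filter the original left table, so only its columns are returned
unassigned = employees[employees["srid"].isin(srid_list)]
print(unassigned)

# Alternative (not the approach used in these notes): negate .isin()
# directly against the right table's key column
shortcut = employees[~employees["srid"].isin(top_cust["srid"])]
```

Both approaches should select the same rows here; the `indicator` route is the one followed in the exercise below.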

In [None]:
# In our music streaming company dataset, each customer is assigned an employee representative to assist them. 

# In this exercise, filter the employee table by a table of top customers, returning only those employees who are not assigned to a customer. 

# The results should resemble the results of an anti join. The company's leadership will assign these employees additional training so that they can work with high-value customers.

# Merge employees and top_cust
empl_cust = employees.merge(top_cust, on='srid',
                            how='left', indicator=True)

# Select the srid column where _merge is left_only
srid_list = empl_cust.loc[empl_cust['_merge'] == 'left_only', 'srid']

# Get employees not working with top customers
print(employees[employees['srid'].isin(srid_list)])

## Concatenate DataFrames together vertically

In [None]:
# You have been given a few tables of data with musical track info for different albums from the metal band Metallica. The track info comes from their Ride The Lightning, Master Of Puppets, and St. Anger albums. Try various features of the pd.concat() function by concatenating the tables vertically in different ways.

import pandas as pd

# Concatenate tracks_master, tracks_ride, and tracks_st, in that order, setting sort to True.
tracks_from_albums = pd.concat([tracks_master, tracks_ride, tracks_st],
                               sort=True)
print(tracks_from_albums)

# Concatenate the tracks so the index goes from 0 to n-1
tracks_from_albums = pd.concat([tracks_master, tracks_ride, tracks_st],
                               ignore_index=True,
                               sort=True)
print(tracks_from_albums)

# Concatenate the tracks, keeping only the columns that appear in all tables
tracks_from_albums = pd.concat([tracks_master, tracks_ride, tracks_st],
                               join="inner",
                               sort=True)
print(tracks_from_albums)
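
The three variants above differ in how the index and the non-shared columns are handled. A small self-contained sketch, assuming two hypothetical tables `tracks_a` and `tracks_b` with invented values, makes the difference visible:

```python
import pandas as pd

# Hypothetical toy tables (assumed for illustration only);
# tracks_b lacks the composer column
tracks_a = pd.DataFrame({"tid": [1, 2],
                         "name": ["track A", "track B"],
                         "composer": ["composer X", "composer Y"]})
tracks_b = pd.DataFrame({"name": ["track C", "track D"],
                         "tid": [3, 4]})

# Default (outer) concat: all columns are kept, missing values become NaN,
# and each table keeps its original index labels
stacked = pd.concat([tracks_a, tracks_b], sort=True)

# ignore_index=True discards the old labels and renumbers from 0 to n-1
renumbered = pd.concat([tracks_a, tracks_b], ignore_index=True, sort=True)

# join="inner" keeps only the columns present in every table
shared = pd.concat([tracks_a, tracks_b], join="inner", sort=True)

print(stacked)
print(renumbered)
print(shared)
```

With the default outer behaviour, `sort=True` orders the combined column names alphabetically, and the rows from `tracks_b` get `NaN` in `composer`.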