# 2. Merging Tables with Different Join Types

## Left Join

### Counting missing rows with left join
The Movie Database is supported by volunteers who go out into the world, collect data, and enter it into the database. This includes financial data, such as movie budgets and revenue. To find out which movies are still missing financial data, you can use a left join. Practice using a left join by merging the ```movies``` table and the ```financials``` table.
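A minimal sketch of what a left join does, using two tiny hypothetical tables (`movies_demo` and `financials_demo` are illustrative only, not part of the dataset). Every row of the left table is kept, and rows whose `id` has no match in the right table get `NaN` in the right-hand columns.

```python
import pandas as pd

# Hypothetical miniature versions of the movies and financials tables
movies_demo = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
financials_demo = pd.DataFrame({"id": [1, 3], "budget": [100, 300]})

# A left join keeps every row of movies_demo;
# id 2 has no financial record, so its budget becomes NaN
merged = movies_demo.merge(financials_demo, on="id", how="left")
print(merged)
```

Because `NaN` is a float, the `budget` column is upcast from integers to floats in the merged result.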

In [1]:
import pandas as pd

movies = pd.read_pickle("movies.p")
financials = pd.read_pickle("financials.p")

print(movies.info())
print(financials.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            4803 non-null   int64  
 1   title         4803 non-null   object 
 2   popularity    4803 non-null   float64
 3   release_date  4802 non-null   object 
dtypes: float64(1), int64(1), object(2)
memory usage: 150.2+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3229 entries, 0 to 3228
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   id       3229 non-null   int64  
 1   budget   3229 non-null   int64  
 2   revenue  3229 non-null   float64
dtypes: float64(1), int64(2)
memory usage: 75.8 KB
None


In [2]:
# Merge movies and financials with a left join
movies_financials = movies.merge(financials, on="id", how="left")

In [3]:
# Merge the movies table with the financials table with a left join
movies_financials = movies.merge(financials, on='id', how='left')

# Count the number of rows in the budget column that are missing
number_of_missing_fin = movies_financials['budget'].isnull().sum()

# Print the number of movies missing financials
print(number_of_missing_fin)

1574
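An alternative way to count unmatched rows, sketched here on hypothetical toy tables, is the `indicator=True` option of `.merge()`. It adds a `_merge` column whose value is `"left_only"` for rows that found no match in the right table. Unlike `isnull()` on `budget`, this separates "no matching row" from "matching row with a null budget".

```python
import pandas as pd

# Hypothetical toy tables: ids 2 and 4 have no financial record
movies_demo = pd.DataFrame({"id": [1, 2, 3, 4], "title": ["A", "B", "C", "D"]})
financials_demo = pd.DataFrame(
    {"id": [1, 3], "budget": [100, 300], "revenue": [150.0, 900.0]}
)

# indicator=True adds a "_merge" column: "both", "left_only", or "right_only"
merged = movies_demo.merge(financials_demo, on="id", how="left", indicator=True)

# Count the rows of the left table that had no match on the right
missing = (merged["_merge"] == "left_only").sum()
print(missing)  # 2
```

Because `financials.info()` above reports no null values in `budget`, both approaches should agree on the real data, assuming each `id` appears at most once in `financials`.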


### Enriching a dataset

Setting ```how='left'``` with the ```.merge()``` method is a useful technique for enriching a dataset with additional information from another table. In this exercise, you will start with a sample of movie data from the Toy Story series. Your goal is to enrich this data by adding each movie's marketing tagline. You will then compare the result of a left join with that of an inner join.
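A minimal sketch of the left-versus-inner comparison, using a hypothetical taglines table in which one movie (id 863) has no tagline. The tagline strings are placeholders, not real data. The left join keeps all three movies and fills the missing tagline with `NaN`, while the inner join drops the unmatched movie.

```python
import pandas as pd

toy = pd.DataFrame({
    "id": [10193, 863, 862],
    "title": ["Toy Story 3", "Toy Story 2", "Toy Story"],
})

# Hypothetical lookup table: id 863 is deliberately absent
tags = pd.DataFrame({
    "id": [10193, 862],
    "tagline": ["placeholder tagline 1", "placeholder tagline 2"],
})

# Left join: every movie is kept; unmatched movies get NaN for tagline
left = toy.merge(tags, on="id", how="left")

# Inner join: only movies present in both tables are kept
inner = toy.merge(tags, on="id", how="inner")

print(left.shape)   # (3, 3)
print(inner.shape)  # (2, 3)
```

For enrichment, the left join is usually the safer choice because it never drops rows from the table you are enriching.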

In [5]:
dict_toystory = {
    "id": [10193, 863, 862],
    "title": ["Toy Story 3", "Toy Story 2", "Toy Story"],
    "popularity": [59995, 73575, 73640],
    "release_date": ["2010-06-16", "1999-10-30", "1995-10-30"]
}

toy_story = pd.DataFrame(dict_toystory)
print(toy_story)

      id        title  popularity release_date
0  10193  Toy Story 3       59995   2010-06-16
1    863  Toy Story 2       73575   1999-10-30
2    862    Toy Story       73640   1995-10-30


In [6]:
taglines = pd.read_pickle("taglines.p")
print(taglines.info())

<class 'pandas.core.frame.DataFrame'>
Index: 3955 entries, 0 to 4801
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   id       3955 non-null   int64 
 1   tagline  3955 non-null   object
dtypes: int64(1), object(1)
memory usage: 92.7+ KB
None
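Before enriching a table with a left join, it can be worth confirming that the lookup table has at most one row per key; otherwise the join silently duplicates rows. The sketch below uses hypothetical toy tables (`base` and `lookup` are illustrative) and the `validate="many_to_one"` option of `.merge()`, which raises `pandas.errors.MergeError` when the right-hand keys are not unique.

```python
import pandas as pd

# Hypothetical tables: "lookup" stands in for a table like taglines,
# but with id 1 deliberately duplicated
base = pd.DataFrame({"id": [1, 2], "title": ["A", "B"]})
lookup = pd.DataFrame({"id": [1, 1, 2], "tagline": ["x", "y", "z"]})

# Check key uniqueness on the right-hand table
print(lookup["id"].is_unique)  # False

# Without validation, the duplicated key silently adds an extra row
unvalidated = base.merge(lookup, on="id", how="left")
print(len(unvalidated))  # 3

# With validate="many_to_one", pandas raises MergeError instead
raised = False
try:
    base.merge(lookup, on="id", how="left", validate="many_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("Validation failed:", err)
```

The `info()` output above does not show whether `id` is unique in `taglines`, so a check such as `taglines["id"].is_unique` may be worth running before the merge.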
