# Left join

## Video lecture Transcript


**1. Left join**
Greetings, and welcome back! In this lesson, we will discuss how a left join works, which is another way to merge two tables. Before we start talking about left joins, let's quickly review what we have learned so far.


**2. Quick review**
In chapter 1, we introduced the pandas merge method that allows us to combine two tables by specifying one or more key columns to link the tables by. By default, the merge method performs an inner join, returning only the rows of data with matching values in the key columns of both tables.
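As a quick refresher, here is a minimal sketch of that default behavior. The tables `left` and `right` and their values are made up for illustration and are not the course data.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative only
left = pd.DataFrame({"C": [1, 2, 3], "A": ["a1", "a2", "a3"]})
right = pd.DataFrame({"C": [1, 3, 4], "B": ["b1", "b3", "b4"]})

# No `how` argument is given, so merge performs an inner join
inner = left.merge(right, on="C")

# Only keys present in both tables (1 and 3) survive
print(inner)
```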


**3. Left join**
In this lesson, we'll talk about the idea of a left join. A left join returns all rows of data from the left table and only those rows from the right table where key columns match.


**4. Left join**
Here we have two tables named left and right. We want to use a left join to merge them on key column C. A left join returns all of the rows from the left table and only those rows from the right table where column C matches in both. Notice the second ...in both tables. However, notice an additional argument named `how`. This argument defines how to merge the two tables. In this case, we use `'left'` for a left join. The default value for `how` is `'inner'`, so we didn't need to specify this in Chapter 1 since we were only working with inner joins. The result of the merge shows a table with all of the rows from the movies table and a value for the tagline where the ID column matches in both tables. Wherever there isn't a matching ID in the taglines table, a null value is entered for the tagline. Remember that pandas uses NaN to denote missing data.
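The merge described here might look roughly like the sketch below. The rows are invented stand-ins for the movies and taglines tables, so only the shape of the call and the NaN behavior should be read from it.

```python
import pandas as pd

# Illustrative stand-ins for the movies and taglines tables (made-up rows)
movies = pd.DataFrame({"id": [1, 2, 3],
                       "title": ["Movie A", "Movie B", "Movie C"]})
taglines = pd.DataFrame({"id": [1, 3],
                         "tagline": ["Tagline for A", "Tagline for C"]})

# how='left' keeps every row of movies, matched or not
movies_taglines = movies.merge(taglines, on="id", how="left")

# Movie B has no matching id in taglines, so its tagline is NaN
print(movies_taglines)
```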


**9. Number of rows returned**
After the merge, our resulting table has 4,805 rows. This is because we are returning all of the rows of data from the movies table, and the relationship between the movies table and the taglines table is one-to-one. Therefore, in a one-to-one merge like this one, a left join will always return the same number of rows as the left table.
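One way to check this row-count expectation is sketched below on made-up data. The `validate` argument is not covered in this lesson; it is an optional `merge` parameter that raises an error when the keys are not unique on both sides, and it is included here only as an assumption about how one might guard the merge.

```python
import pandas as pd

# Made-up tables with unique ids on both sides (a one-to-one relationship)
movies = pd.DataFrame({"id": [1, 2, 3, 4],
                       "title": ["Movie A", "Movie B", "Movie C", "Movie D"]})
taglines = pd.DataFrame({"id": [1, 2, 4],
                         "tagline": ["Tag A", "Tag B", "Tag D"]})

# validate='one_to_one' makes pandas raise a MergeError if either
# table has duplicate ids (optional; not part of the lesson)
merged = movies.merge(taglines, on="id", how="left", validate="one_to_one")

# In a one-to-one left join the row count matches the left table
print(len(movies), len(merged))
```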


**10. Let's practice!**
Now, let's practice some.

## Exercise 1:
Counting missing rows with left join
The Movie Database is supported by volunteers going out into the world, collecting data, and entering it into the database. This includes financial data, such as movie budget and revenue. If you wanted to know which movies are still missing data, you could use a left join to identify them. Practice using a left join by merging the `movies` table and the `financials` table.

The `movies` and `financials` tables have been loaded for you.
```python
print(financials.columns)
print(movies.columns)

# Shell response
Index(['id', 'budget', 'revenue'], dtype='object')
Index(['id', 'title', 'popularity', 'release_date'], dtype='object')
```
___

### Instructions 1:
- Question
    - What column is likely the best column to merge the two tables on?
        - Possible answers
            - on='budget'
            - on='popularity'
            - on='id' ✅

___

### Instructions 2:
- Question
    - Merge the `movies` table, as the left table, with the `financials` table using a left join, and save the result to `movies_financials`

- Answer
```python
# Merge movies and financials with a left join
movies_financials = movies.merge(financials, on="id", how="left")
```
___

### Instructions 3:
- Question
    - Count the number of rows in `movies_financials` with a `null` value in the budget column.

- Answer
```python
# Merge the movies table with the financials table with a left join
movies_financials = movies.merge(financials, on='id', how='left')

# Count the rows with a missing value in the budget column
number_of_missing_fin = movies_financials['budget'].isna().sum()

# Print the number of movies missing financials
print(number_of_missing_fin)



# Shell response
1574
```

## Exercise 1 Recap:
Great job! You used a left join to find out which movies were missing financial data. When performing a left join, the `.merge()` method fills the right table's columns with null values whenever a key value in the left table has no match in the right table. We see that more than 1,500 rows are missing data. Wow! That sounds like a lot of work.
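Beyond counting, the same null check can be used to list the affected movies. The following is a small sketch on invented rows; the column names follow the exercise, but the data does not.

```python
import pandas as pd

# Invented rows; only the column names mirror the exercise tables
movies = pd.DataFrame({"id": [1, 2, 3],
                       "title": ["Movie A", "Movie B", "Movie C"]})
financials = pd.DataFrame({"id": [1, 3],
                           "budget": [100, 300],
                           "revenue": [250, 900]})

movies_financials = movies.merge(financials, on="id", how="left")

# Boolean mask: True where the left join found no financials row
missing = movies_financials[movies_financials["budget"].isna()]

# Titles of the movies that still need financial data
print(missing["title"].tolist())
```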
___

## Exercise 2:

Enriching a dataset
Setting `how='left'` with the `.merge()` method is a useful technique for enriching or enhancing a dataset with additional information from a different table. In this exercise, you will start off with a sample of movie data from the movie series Toy Story. Your goal is to enrich this data by adding the marketing tag line for each movie. You will compare the results of a left join versus an inner join.

The `toy_story` DataFrame contains the Toy Story movies. The `toy_story` and `taglines` DataFrames have been loaded for you.

___

### Instructions 1:
- Merge `toy_story` and `taglines` on the `id` column with a left join, and save the result as `toystory_tag`.

- Answer
```python
toystory_tag = toy_story.merge(taglines, on="id", how="left")

# Shell output
      id        title  popularity release_date                   tagline
0  10193  Toy Story 3      59.995   2010-06-16  No toy gets left behind.
1    863  Toy Story 2      73.575   1999-10-30        The toys are back!
2    862    Toy Story      73.640   1995-10-30                       NaN
(3, 5)

```

___

### Instructions 2:
- With `toy_story` as the left table, merge it with `taglines` on the `id` column using an inner join, and save the result as `toystory_tag`.

- Answer
```python
toystory_tag = toy_story.merge(taglines, on="id")

# Shell output
      id        title  popularity release_date                   tagline
0  10193  Toy Story 3      59.995   2010-06-16  No toy gets left behind.
1    863  Toy Story 2      73.575   1999-10-30        The toys are back!
(2, 5)
```

## Exercise 2 Recap:
That's fantastic work! If your goal is to enhance or enrich a dataset, then you do not want to lose any of your original data. A left join preserves it by returning all of the rows of your left table, whereas an inner join drops any rows whose key does not exist in both tables.

## Exercise 3:

How many rows with a left join?
Select the true statement about left joins.

Try running the following code statements:

- `left_table.merge(one_to_one, on='id', how='left').shape`
- `left_table.merge(one_to_many, on='id', how='left').shape`

Note that the `left_table` starts out with 4 rows.

```python
left_table.merge(one_to_one, on='id', how='left').shape

# Shell response
(4, 5)

left_table.merge(one_to_many, on='id', how='left').shape
# Shell response
(232, 6)
```

### Instructions:
- Possible answers
    - The output of a one-to-one merge with a left join will have more rows than the left table.
    - The output of a one-to-one merge with a left join will have fewer rows than the left table.
    - The output of a one-to-many merge with a left join will have at least as many rows as the left table. ✅

## Exercise 3 Recap:
That's correct! A left join will return all of the rows from the left table. If a row in the left table matches multiple rows in the right table, then one row is returned for each match. Therefore, the number of rows returned must be equal to or greater than the number of rows in the left table. Knowing what to expect is useful in troubleshooting any suspicious merges.
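To see where the extra rows come from, here is a small sketch with invented tables. The names `left_table` and `one_to_many` echo the exercise, but the contents are assumptions chosen to keep the arithmetic easy to follow.

```python
import pandas as pd

# Invented data: the left table has 4 rows with unique ids
left_table = pd.DataFrame({"id": [1, 2, 3, 4],
                           "name": ["a", "b", "c", "d"]})

# id 1 appears three times and id 2 twice; ids 3 and 4 do not appear
one_to_many = pd.DataFrame({"id": [1, 1, 1, 2, 2],
                            "value": [10, 11, 12, 20, 21]})

merged = left_table.merge(one_to_many, on="id", how="left")

# 3 rows for id 1 + 2 rows for id 2 + 1 NaN row each for ids 3 and 4 = 7
print(left_table.shape, merged.shape)
```

Each left row is repeated once per match, and unmatched left rows still appear once with NaN, which is why the result can never be shorter than the left table.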