# Merging a table to itself

## Video lecture Transcript

**1. Merging a table to itself**

Hello again! In this lesson, we will talk about merging a table to itself. This type of merge is also referred to as a self join. So, let's get started.

**2. Sequel movie data**

So when would you ever need to merge a table to itself? The table shown here is called sequels and has three columns. It contains a column for movie id, title, and sequel. The sequel number refers to the movie id that is a sequel to the original movie.

For example, in the second row the movie is titled Toy Story and has an id equal to 862. The sequel number of this row is 863. This is the movie id for Toy Story 2, the sequel to Toy Story. If we continue, 10193 is the movie id for Toy Story 3, which is the sequel to Toy Story 2.
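To make the layout concrete, here is a minimal sketch of a table shaped like sequels. The Toy Story ids and sequel links come from the lesson. The Avatar and Titanic ids, the row order, and the use of `pd.NA` for a missing sequel are assumptions for illustration only.

```python
import pandas as pd

# Small stand-in for the sequels table (not the real course data).
# Toy Story ids are from the lesson; the other ids are illustrative.
sequels = pd.DataFrame({
    "id": [19995, 862, 863, 597, 10193],
    "title": ["Avatar", "Toy Story", "Toy Story 2", "Titanic", "Toy Story 3"],
    "sequel": [pd.NA, 863, 10193, pd.NA, pd.NA],
}).astype({"id": "Int64", "sequel": "Int64"})  # nullable ints keep ids whole

# Follow one link by hand: Toy Story's sequel id points at another row's id.
toy_story_sequel_id = sequels.loc[sequels["title"] == "Toy Story", "sequel"].iloc[0]
sequel_title = sequels.loc[sequels["id"] == toy_story_sequel_id, "title"].iloc[0]
print(sequel_title)
```

Looking up one link by hand like this is exactly the step a self join performs for every row at once.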

**3. Merging a table to itself**

If we would like to see a table with the movies and the corresponding sequel movie in one row of the table, we will need to merge the table to itself. In the left table, the sequel ID for Toy Story of 863 is matched with 863 in the I... its sequel.

**6. Merging a table to itself with left join**

This is a good place to highlight again that when merging a table to itself, we can use the different types of joins we have already reviewed. Let's take the same merge from earlier but make it a left join, by setting the 'how' argument of the merge method to 'left' instead of the default 'inner'.

Now the resulting table will show all of our original movie info. If the sequel movie exists in the table, it will fill out the rest of the row. If you compare this to our earlier merge, you now see movies like Avatar and Titanic in the result set.
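As a rough sketch of the difference between the two joins, the snippet below rebuilds a small stand-in for the sequels table and runs the same self merge twice. The Avatar and Titanic ids and the `pd.NA` placeholders are assumptions, not the course data.

```python
import pandas as pd

# Stand-in for the sequels table; only the Toy Story ids come from the lesson.
sequels = pd.DataFrame({
    "id": [19995, 862, 863, 597, 10193],
    "title": ["Avatar", "Toy Story", "Toy Story 2", "Titanic", "Toy Story 3"],
    "sequel": [pd.NA, 863, 10193, pd.NA, pd.NA],
}).astype({"id": "Int64", "sequel": "Int64"})

# Inner join (the default): keeps only movies whose sequel id is found.
inner = sequels.merge(sequels, left_on="sequel", right_on="id",
                      suffixes=("_org", "_seq"))

# Left join: keeps every original movie, with gaps where no sequel matched.
left = sequels.merge(sequels, how="left", left_on="sequel", right_on="id",
                     suffixes=("_org", "_seq"))

print(inner[["title_org", "title_seq"]])
print(left[["title_org", "title_seq"]])
```

In this toy data the inner join returns only the two Toy Story pairs, while the left join returns all five movies, with a missing `title_seq` for the ones that have no sequel in the table.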

**7. When to merge a table to itself**

You might need to merge a table to itself when working with tables that have a hierarchical relationship, like employee and manager. 

You might use this on sequential relationships such as logistic movements.

Graph data, such as networks of friends, might also require this technique.
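The hierarchical case can be sketched the same way as the sequels merge. The `staff` table below, including its column names and values, is entirely hypothetical and only illustrates the pattern of a column that points back at the table's own id column.

```python
import pandas as pd

# Hypothetical staff table: manager_id refers to another row's emp_id.
staff = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Ana", "Ben", "Cho", "Dev"],
    "manager_id": [pd.NA, 1, 1, 2],
}).astype({"emp_id": "Int64", "manager_id": "Int64"})

# Left self join: every employee is kept, and the manager's row is attached
# wherever manager_id matches an emp_id.
with_manager = staff.merge(staff, how="left",
                           left_on="manager_id", right_on="emp_id",
                           suffixes=("_emp", "_mgr"))

print(with_manager[["name_emp", "name_mgr"]])
```

A left join is used here so that the person at the top of the hierarchy, who has no manager, still appears in the result.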

**8. Let's practice!**

Alright, let's practice merging a table to itself.

## Exercise 1: Self join

Merging a table to itself can be useful when you want to compare values in a column to other values in the same column. In this exercise, you will practice this by creating a table that for each movie will list the movie director and a member of the crew on one row. You have been given a table called `crews`, which has columns `id`, `job`, and `name`. First, merge the table to itself using the movie ID. This merge will give you a larger table where for each movie, every job is matched against each other. Then select only those rows with a director in the left table, and avoid having a row where the director's job is listed in both the left and right tables. This filtering will remove job combinations that aren't with the director.

The `crews` table has been loaded for you.

___

### Instructions 1:
To a variable called `crews_self_merged`, merge the `crews` table to itself on the `id` column using an inner join, setting the suffixes to `'_dir'` and `'_crew'` for the left and right tables respectively.

### Code 1:
```python
# Merge the crews table to itself
crews_self_merged = crews.merge(crews, on="id", suffixes=["_dir", "_crew"])
```

___

### Instructions 2:
Create a Boolean index, named `boolean_filter`, that selects rows from the left table with the job of `'Director'` and avoids rows with the job of `'Director'` in the right table.

### Code 2:

```python
# Merge the crews table to itself
crews_self_merged = crews.merge(crews, on='id', how='inner', suffixes=('_dir','_crew'))

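# Create a Boolean index that keeps rows with a director on the left
# and a non-director crew member on the right
boolean_filter = ((crews_self_merged['job_dir'] == 'Director') &
                  (crews_self_merged['job_crew'] != 'Director'))

# Apply the filter to get director/crew pairs
direct_crews = crews_self_merged[boolean_filter]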
```
___

### Instructions 3:
Use the `.head()` method to print the first few rows of `direct_crews`.

### Code 3:
```python
# ... (continues from Code 2) Print the first few rows of direct_crews
print(direct_crews.head())
```
___


## Exercise 1 Recap:
Great job! By merging the table to itself, you compared the `'Director'` value in the `job` column to the other values in the `job` column. With the output, you can quickly see different movie directors and the people they worked with in the same movie.
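
To try the whole exercise without the course environment, here is a self-contained sketch. The `crews` rows below are made up for illustration. Only the column names `id`, `job`, and `name` come from the exercise.

```python
import pandas as pd

# Made-up crews data with the exercise's column names.
crews = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "job": ["Director", "Editor", "Producer", "Director", "Writer"],
    "name": ["Dana Reyes", "Eli Stone", "Fay Wu", "Gus Park", "Hal Ito"],
})

# Self merge on the movie id: every job in a movie is paired with every job
# in the same movie, so 3 + 2 crew rows become 3*3 + 2*2 = 13 rows.
crews_self_merged = crews.merge(crews, on="id", suffixes=("_dir", "_crew"))

# Keep pairs with a director on the left and a non-director on the right.
boolean_filter = ((crews_self_merged["job_dir"] == "Director") &
                  (crews_self_merged["job_crew"] != "Director"))
direct_crews = crews_self_merged[boolean_filter]

print(len(crews_self_merged))
print(direct_crews[["id", "name_dir", "job_crew", "name_crew"]])
```

The row count before filtering shows why the filter matters: the raw self merge contains every pairing, including each person matched with themselves.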
___

# Quiz: How does pandas handle self joins?

Question 1: Select the **false** statement about merging a table to itself.

Possible Answers

1. You can merge a table to itself with a right join.
2. Merging a table to itself can allow you to compare values in a column to other values in the same column.
3. The pandas module limits you to one merge where you merge a table to itself. You cannot repeat this process over and over. ✅
4. Merging a table to itself is like working with two separate tables.
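
To see why statement 3 is false, here is a hedged sketch that merges a table to itself twice in a row. The table is a small stand-in for sequels. The Avatar and Titanic ids and the `_seq2` suffix are assumptions chosen for this example.

```python
import pandas as pd

# Stand-in for the sequels table; only the Toy Story ids come from the lesson.
sequels = pd.DataFrame({
    "id": [19995, 862, 863, 597, 10193],
    "title": ["Avatar", "Toy Story", "Toy Story 2", "Titanic", "Toy Story 3"],
    "sequel": [pd.NA, 863, 10193, pd.NA, pd.NA],
}).astype({"id": "Int64", "sequel": "Int64"})

# First self merge: original movie -> sequel.
one_step = sequels.merge(sequels, left_on="sequel", right_on="id",
                         suffixes=("_org", "_seq"))

# Second self merge: sequel -> sequel of the sequel. The right table's
# columns are renamed up front so they do not clash with earlier ones.
two_steps = one_step.merge(sequels.add_suffix("_seq2"),
                           left_on="sequel_seq", right_on="id_seq2")

print(two_steps[["title_org", "title_seq", "title_seq2"]])
```

Each merge returns an ordinary DataFrame, so nothing stops you from merging the result with the original table again.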

# Recap

This lesson is part of Merging Tables With Different Join Types, chapter 2 of the course Joining Data with pandas. Here is what the lesson covered:

You learned about the concept of merging a table to itself, also known as a self join, using the TMDb movie data. This technique is particularly useful for creating a table that lists related entries, such as movies and their sequels, in a single row. Key points covered include:

- Self Join Basics: A self join is used when you need to merge a table with itself to compare values within the same table. For example, linking movies to their sequels by matching a movie's sequel ID with another movie's ID.

- Inner Join on Self: Initially, you performed an inner join on the table, which resulted in a table where each movie and its sequel were listed in one row. Movies without sequels, like Avatar and Titanic, were excluded.

- Left Join Modification: By changing the merge to a left join, you included all original movies in the result, showing sequel information where available. This demonstrated how movies without direct sequels could still be included in the output.

- Practical Application: You explored how self joins could be applied beyond movies, such as hierarchical relationships in employee-manager scenarios, sequential relationships, or network graphs.

- Code Implementation: You used the `.merge()` method with the `left_on` and `right_on` arguments to specify the columns for matching, and `suffixes` to differentiate between original and sequel movies in the result.

```python
sequels.merge(sequels, left_on='sequel', right_on='id', suffixes=('_org', '_seq'))
```

This lesson underscored the versatility of self joins in handling complex data relationships within a single table, enhancing your ability to analyze and manipulate data effectively.

The goal of the next lesson is to learn how to efficiently merge tables using DataFrame indexes in pandas, including handling MultiIndex structures and different index level names.