# Do sequels earn more?
It is time to put together many of the techniques you have learned in this chapter. In this exercise, you'll find out which movie sequels earned the most compared with the original movie. To answer this question, you will merge modified versions of the *sequels* and *financials* tables, both of which use the movie ID as their index. Choose a merge type that returns every row of the *sequels* table; rows of the *financials* table without a match do not need to appear in the result. From there, you will join the resulting table to itself so that you can compare the revenue of each original movie with that of its sequel. Finally, you will calculate the difference between the two revenues and sort the resulting dataset.

The *sequels* and *financials* tables have been provided.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
path=r'/media/documentos/Cursos/Data Science/Python/Data_Science_Python/data_sets/'

sequels=pd.read_pickle(path+'sequels.p').set_index('id')

print('sequels \n',sequels.head(),'\n')

financials=pd.read_pickle(path+'financials.p').set_index('id')
print('financials \n',financials.head(),'\n')

sequels 
               title  sequel
id                         
19995        Avatar    <NA>
862       Toy Story     863
863     Toy Story 2   10193
597         Titanic    <NA>
24428  The Avengers    <NA> 

financials 
            budget       revenue
id                             
19995   237000000  2.787965e+09
285     300000000  9.610000e+08
206647  245000000  8.806746e+08
49026   250000000  1.084939e+09
49529   260000000  2.841391e+08 



With the *sequels* table on the left, merge the *financials* table to it on the index named *id*, ensuring that all rows from *sequels* are returned even though some rows from *financials* may not be. Save the result to *sequels_fin*.

In [3]:
# Merge sequels and financials on index id
sequels_fin = sequels.merge(financials,on='id',how='left')
print(sequels_fin.head())

              title  sequel       budget       revenue
id                                                    
19995        Avatar    <NA>  237000000.0  2.787965e+09
862       Toy Story     863   30000000.0  3.735540e+08
863     Toy Story 2   10193   90000000.0  4.973669e+08
597         Titanic    <NA>  200000000.0  1.845034e+09
24428  The Avengers    <NA>  220000000.0  1.519558e+09
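
The left join keeps every row of the left table and fills the right-hand columns with `NaN` where no match exists; that missing-value marker is also why an integer column such as *budget* is printed as a float after the merge. The following is a minimal sketch of that behavior, assuming two made-up toy tables (`left`, `right` and all their values are hypothetical, not taken from the course data):

```python
import pandas as pd

# Hypothetical toy tables indexed by movie id (all values are made up)
left = pd.DataFrame(
    {"title": ["A", "B", "C"]},
    index=pd.Index([1, 2, 3], name="id"),
)
right = pd.DataFrame(
    {"budget": [100, 200]},
    index=pd.Index([1, 3], name="id"),
)

# "id" is the name of the index in both tables, so it can be used as the key
merged = left.merge(right, on="id", how="left")

print(merged)
print(merged["budget"].dtype)  # integer column upcast because of the NaN
```

All three rows of `left` survive, and the row with no financial data carries `NaN` in `budget`.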


Merge the *sequels_fin* table to itself with an inner join, where the left and right tables merge on *sequel* and *id* respectively with suffixes equal to *('_org','_seq')*, saving to *orig_seq*.

In [5]:
# Merge sequels and financials on index id
sequels_fin = sequels.merge(financials, on='id', how='left')

# Self merge with suffixes as inner join with left on sequel and right on id
orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel', 
                             right_on='id', right_index=True,
                             suffixes=('_org','_seq'))

# Add calculation to subtract revenue_org from revenue_seq 
orig_seq['diff'] = orig_seq['revenue_seq'] - orig_seq['revenue_org']
print(orig_seq.head())

     sequel                                          title_org  sequel_org  \
id                                                                           
862     863                                          Toy Story         863   
863   10193                                        Toy Story 2       10193   
675     767          Harry Potter and the Order of the Phoenix         767   
121     122              The Lord of the Rings: The Two Towers         122   
120     121  The Lord of the Rings: The Fellowship of the Ring         121   

      budget_org  revenue_org                                      title_seq  \
id                                                                             
862   30000000.0  373554033.0                                    Toy Story 2   
863   90000000.0  497366869.0                                    Toy Story 3   
675  150000000.0  938212738.0         Harry Potter and the Half-Blood Prince   
121   79000000.0  926287400.0  The Lord of the Rings:
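
The self merge pairs each row with the row its *sequel* value points to, and the suffixes tell the two copies of each column apart. The following is a small sketch on a made-up table, assuming a simplified setup: the names `movies`, `originals` and `pairs` and all values are hypothetical, rows without a sequel are dropped before merging, and the right-hand side keeps only the columns being compared.

```python
import pandas as pd

# Hypothetical toy table: "sequel" holds the id of the follow-up movie, if any
movies = pd.DataFrame(
    {
        "title": ["Film", "Film 2", "Film 3", "Standalone"],
        "sequel": pd.array([2, 3, pd.NA, pd.NA], dtype="Int64"),
        "revenue": [100.0, 150.0, 120.0, 80.0],
    },
    index=pd.Index([1, 2, 3, 4], name="id"),
)

# Simplification for this sketch: keep only movies that have a sequel
originals = movies.dropna(subset=["sequel"]).astype({"sequel": "int64"})

# Match the left "sequel" column against the right table's index (the movie id)
pairs = originals.merge(
    movies[["title", "revenue"]],
    how="inner",
    left_on="sequel",
    right_index=True,
    suffixes=("_org", "_seq"),
)

# Positive values mean the sequel earned more than the original
pairs["diff"] = pairs["revenue_seq"] - pairs["revenue_org"]
print(pairs[["title_org", "title_seq", "diff"]])
```

Because the join is an inner join, "Standalone" and "Film 3" do not appear on the left side: neither has a sequel to match against.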

Select the *title_org*, *title_seq*, and *diff* columns of *orig_seq* and save the result as *titles_diff*.

In [7]:
# Merge sequels and financials on index id
sequels_fin = sequels.merge(financials, on='id', how='left')

# Self merge with suffixes as inner join with left on sequel and right on id
orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel', 
                             right_on='id', right_index=True,
                             suffixes=('_org','_seq'))

# Add calculation to subtract revenue_org from revenue_seq 
orig_seq['diff'] = orig_seq['revenue_seq'] - orig_seq['revenue_org']

# Select the title_org, title_seq, and diff 
titles_diff = orig_seq[['title_org','title_seq','diff']]
print(titles_diff.head())

                                             title_org  \
id                                                       
862                                          Toy Story   
863                                        Toy Story 2   
675          Harry Potter and the Order of the Phoenix   
121              The Lord of the Rings: The Two Towers   
120  The Lord of the Rings: The Fellowship of the Ring   

                                         title_seq         diff  
id                                                               
862                                    Toy Story 2  123812836.0  
863                                    Toy Story 3  569602834.0  
675         Harry Potter and the Half-Blood Prince   -4253541.0  
121  The Lord of the Rings: The Return of the King  192601579.0  
120          The Lord of the Rings: The Two Towers   54919036.0  


Sort *titles_diff* by *diff* in descending order and print the first few rows.

In [8]:
# Print the first rows of the sorted titles_diff
print(titles_diff.sort_values('diff',ascending=False).head())

               title_org        title_seq          diff
id                                                     
331    Jurassic Park III   Jurassic World  1.144748e+09
272        Batman Begins  The Dark Knight  6.303398e+08
10138         Iron Man 2       Iron Man 3  5.915067e+08
863          Toy Story 2      Toy Story 3  5.696028e+08
10764  Quantum of Solace          Skyfall  5.224703e+08
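
Sorting in descending order and taking the head is one way to get the top rows; pandas also offers `DataFrame.nlargest`, which may read more directly when only the top few rows are needed. The following is a brief sketch comparing the two on a made-up frame (`toy` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy frame with distinct "diff" values (made up)
toy = pd.DataFrame(
    {"title": ["A", "B", "C", "D"], "diff": [5.0, -2.0, 9.0, 1.0]}
)

# Approach used in the exercise: full sort, then keep the first rows
top_sorted = toy.sort_values("diff", ascending=False).head(2)

# Alternative: ask directly for the rows with the largest "diff"
top_nlargest = toy.nlargest(2, "diff")

print(top_nlargest)
```

With distinct values, as here, both approaches select the same rows in the same order; with ties the two may order equal values differently.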
