## Lab: Working With the Movies Database API

This lab guides you through accessing different parts of the Movies Database API. The documentation can be found here: https://www.themoviedb.org/documentation/api

**Step 1:** Make sure you have an access key.

**Step 2:** The complete documentation that you'll need to reference can be found here:  https://developers.themoviedb.org/3/getting-started

For now, we are going to use the `/discover` endpoint to answer the following questions.

In [2]:
# import libraries
import pandas as pd
import requests

# variables that will be reused
api_key    = 'c6c78f5b65558f9fc6ab9a3ef2d8ba7d'
base       = 'https://api.themoviedb.org/3'

**Step 3:** Select all movies with `primary_release_year` set to 2012. The `total_results` key in the response should have the value 10000.

In [18]:
# your code here
api_string = f'/discover/movie?primary_release_year=2012&api_key={api_key}'
data       = requests.get(base+api_string).json()
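As the query strings grow in later steps, it can be easier to build them from a dictionary than to hand-edit an f-string. The sketch below shows one way to do that with the standard library. The helper name `discover_url` and the placeholder key `YOUR_KEY` are illustrative and not part of the lab.

```python
from urllib.parse import urlencode

BASE = 'https://api.themoviedb.org/3'

def discover_url(api_key, **filters):
    """Build a /discover/movie URL from keyword filters (hypothetical helper)."""
    params = {'api_key': api_key, **filters}
    # urlencode escapes the values and joins the pairs with '&'
    return f'{BASE}/discover/movie?{urlencode(params)}'

url = discover_url('YOUR_KEY', primary_release_year=2012)
print(url)
```

With a real key, `requests.get(url).json()['total_results']` would give the count this step asks about. `requests.get` also accepts a `params=` dictionary directly, which is an alternative to building the string yourself.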

**Step 4:** Sort the movies from the previous API call by their popularity, in descending order. (Use an API call for this, not pandas).  What was the most popular movie released in 2012?

In [20]:
# your code here
api_string = f'/discover/movie?primary_release_year=2012&api_key={api_key}&sort_by=popularity.desc'
data       = requests.get(base+api_string).json()

In [23]:
# it was The Avengers
data['results'][0]['original_title']

'The Avengers'

**Step 5:** Create an API call that collects the 100 most popular movies released in 2017, 2018, and 2019.

**Hint:** You may have noticed that the results returned to you only contain 20 items.  To get more you have to use the `page` parameter.

In [94]:
# generate the api_strings
api_strings = [ 
    f'/discover/movie?primary_release_year=2017&primary_release_year=2018&primary_release_year=2019&api_key={api_key}&sort_by=popularity.desc&page={i}'
    for i in range(1, 6)
]

# and fetch the JSON response for each one
api_results = [requests.get(base + api_string).json() for api_string in api_strings]
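It is worth checking whether repeating `primary_release_year` three times really filters on all three years; the API may honor only one of the values. A possible alternative, assuming the discover endpoint's `primary_release_date.gte` and `primary_release_date.lte` filters, is to request a date range instead. The sketch below builds the five page URLs and flattens already-fetched pages. `page_urls` and `collect` are hypothetical helpers, and the fake pages stand in for real responses so the logic can be tried offline.

```python
from urllib.parse import urlencode

BASE = 'https://api.themoviedb.org/3'

def page_urls(api_key, n_pages=5):
    """URLs for the first n_pages of 2017-2019 releases, most popular first."""
    params = {
        'api_key': api_key,
        'sort_by': 'popularity.desc',
        # assumption: a date range replaces the repeated primary_release_year
        'primary_release_date.gte': '2017-01-01',
        'primary_release_date.lte': '2019-12-31',
    }
    urls = []
    for page in range(1, n_pages + 1):
        query = urlencode({**params, 'page': page})
        urls.append(f'{BASE}/discover/movie?{query}')
    return urls

def collect(pages, limit=100):
    """Flatten the 'results' lists of already-fetched pages, capped at limit."""
    movies = [movie for page in pages for movie in page['results']]
    return movies[:limit]

# offline stand-in for six API responses of 20 results each
fake_pages = [
    {'results': [{'title': f'movie {20 * p + i}'} for i in range(20)]}
    for p in range(6)
]
top = collect(fake_pages)
print(len(top))
```

Each page holds 20 results, so five pages are enough for 100 movies; the cap in `collect` only matters if more pages are passed in.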

**Step 6:**  Take the results from a previous step and turn them into a dataframe.  Make columns for title, popularity, vote_count, and vote_average.

In [95]:
# turn each dictionary into its own dataframe
movie_df = [pd.DataFrame(page['results'], columns=['title', 'popularity', 'vote_count', 'vote_average']) for page in api_results]
movie_df = pd.concat(movie_df).reset_index(drop=True)

**Step 7:** Make a new API call for all movies with Samuel L. Jackson in the cast. His id is 2231. Sort them by popularity in descending order. Just the first page is fine.

In [97]:
# your code here
api_string = f'/discover/movie?with_cast=2231&sort_by=popularity.desc&api_key={api_key}'
data       = requests.get(base + api_string).json()

**Step 8:** How would you check to see if there are any Samuel L. Jackson movies **not** in the dataframe that you just created?

**Hint:** There are a few ways to do this, but the `merge` method's `indicator` option makes the answer easy to read off.

In [99]:
# turn Samuel L. Jackson's data into a dataframe
slj = pd.DataFrame(data['results'], columns=['title', 'popularity', 'vote_count', 'vote_average'])

In [104]:
# rows with left_only in the _merge column are in slj but not in movie_df
slj.merge(movie_df, how='left', indicator=True)

| | title | popularity | vote_count | vote_average | _merge |
|---|---|---|---|---|---|
| 0 | Avengers: Infinity War | 75.228 | 16044 | 8.3 | left_only |
| 1 | Spider-Man: Far from Home | 59.49 | 5363 | 7.6 | both |
| 2 | Captain Marvel | 42.094 | 7851 | 7.0 | both |
| 3 | The Avengers | 45.673 | 21073 | 7.7 | left_only |
| 4 | Avengers: Endgame | 43.047 | 10488 | 8.3 | both |
| 5 | Iron Man | 28.723 | 17008 | 7.6 | left_only |
| 6 | Star Wars: Episode I - The Phantom Menace | 36.807 | 8688 | 6.4 | left_only |
| 7 | Pulp Fiction | 39.511 | 16866 | 8.5 | left_only |
| 8 | Glass | 35.294 | 4299 | 6.5 | both |
| 9 | Thor | 27.419 | 13786 | 6.7 | left_only |
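When `merge` is called without `on`, pandas joins on every column the two dataframes share, so a movie whose popularity or vote count changed between two API calls would be marked `left_only` even if its title appears in both. A minimal sketch of joining on `title` alone is shown below with made-up toy data. It assumes titles are a good enough key for this lab; they are not guaranteed to be unique.

```python
import pandas as pd

# toy stand-ins for slj and movie_df; the values are invented
actor_df = pd.DataFrame({'title': ['A', 'B', 'C'], 'popularity': [9.0, 8.0, 7.0]})
top_df = pd.DataFrame({'title': ['B', 'D'], 'popularity': [8.5, 6.0]})

# keep only the key column on the right so no suffixed columns are created
merged = actor_df.merge(top_df[['title']], on='title', how='left', indicator=True)

# titles present in actor_df but absent from top_df
missing = merged.loc[merged['_merge'] == 'left_only', 'title'].tolist()
print(missing)
```

Here `B` counts as a match even though its popularity differs between the two frames, which is the behavior you would likely want when the two API calls were made at different times.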


**Bonus:** See if you can write a function that accepts two arguments: the Movies Database id of an actor and a dataframe. Make the function look up the actor via an API call and return a dataframe with the new values appended to it. Use the same columns that we had before.

**Extra Bonus:** Write the original dataframe to a csv file, and write a modified version of the above function that does the following:
 - reads in a csv file with existing movies
 - accepts an actor/actress id as an argument
 - checks for movies of that performer that are not in the original dataframe
 - adds the new rows to the df from the imported csv file
 - exports the new df back to its original file
 
**Hint:** If you're connecting to api data or web scraping, processes like this can be helpful if you want to regularly collect and store new data.

Here are the ids of some performers:

 - **Will Smith**: 2888
 - **Nicole Kidman**: 2227
 - **Robert Downey Jr**: 3223
 - **Reese Witherspoon**: 368

**Bonus I:**

In [121]:
# your code here
def find_new_rows(actor_id, df):
    # store api key -- probably better passed in as an argument
    api_key  = 'c6c78f5b65558f9fc6ab9a3ef2d8ba7d'
    # root url
    base     = 'https://api.themoviedb.org/3'
    # formatted api string (actor_id is the performer's Movies Database id)
    api_str  = f'/discover/movie?with_cast={actor_id}&api_key={api_key}'
    # connect to the api
    data     = requests.get(base+api_str).json()
    # turn into a dataframe
    api_df   = pd.DataFrame(data['results'], columns=['title', 'popularity', 'vote_count', 'vote_average'])
    # check for new values on the left dataframe
    new_vals = api_df.merge(df, how='left', indicator=True)
    # keep only rows that exist on the left side of the join, and drop the _merge column
    new_vals = new_vals[new_vals._merge == 'left_only'].iloc[:, :-1]
    return new_vals

**Bonus II:**

In [125]:
# first, we'll write the df to a csv
movie_df.to_csv('movies.csv', index=False)

In [126]:
# and write the function
def append_new_rows(actor_id, path='movies.csv'):
    api_key  = 'c6c78f5b65558f9fc6ab9a3ef2d8ba7d' # same key as before
    base     = 'https://api.themoviedb.org/3' # root url
    api_str  = f'/discover/movie?with_cast={actor_id}&api_key={api_key}' # actor_id is the Movies Database id
    data     = requests.get(base+api_str).json() # connect to the api
    old_df   = pd.read_csv(path) # read in old df
    api_df   = pd.DataFrame(data['results'], columns=['title', 'popularity', 'vote_count', 'vote_average'])
    new_vals = api_df.merge(old_df, how='left', indicator=True)
    new_vals = new_vals[new_vals._merge == 'left_only'].iloc[:, :-1]
    print(f"Found {new_vals.shape[0]} new values, adding them to dataframe")
    old_df   = pd.concat([old_df, new_vals])
    old_df.to_csv(path, index=False)
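If a function like this runs on a schedule, it helps to confirm that running it twice does not add the same movie twice. The sketch below separates the file update from the network call so that check can be made offline. `append_unseen` is a hypothetical variant that compares on `title` only, and the sample rows reuse values from the Step 8 output.

```python
import os
import tempfile

import pandas as pd

COLS = ['title', 'popularity', 'vote_count', 'vote_average']

def append_unseen(results, path):
    """Append rows from `results` whose title is not yet in the CSV at `path`."""
    old_df = pd.read_csv(path)
    api_df = pd.DataFrame(results, columns=COLS)
    new_rows = api_df[~api_df['title'].isin(old_df['title'])]
    if new_rows.empty:
        return 0  # nothing to add, leave the file untouched
    pd.concat([old_df, new_rows], ignore_index=True).to_csv(path, index=False)
    return len(new_rows)

glass = {'title': 'Glass', 'popularity': 35.294, 'vote_count': 4299, 'vote_average': 6.5}
pulp = {'title': 'Pulp Fiction', 'popularity': 39.511, 'vote_count': 16866, 'vote_average': 8.5}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'movies.csv')
    pd.DataFrame([glass], columns=COLS).to_csv(path, index=False)

    first = append_unseen([glass, pulp], path)   # adds Pulp Fiction
    second = append_unseen([glass, pulp], path)  # finds nothing new
    final_titles = pd.read_csv(path)['title'].tolist()

print(first, second, final_titles)
```

In real use, `results` would be the `data['results']` list returned by the API call in the function above.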