## Lab: Working With The Movie Database API

This lab guides you through accessing different parts of The Movie Database (TMDB) API. The documentation can be found here: https://www.themoviedb.org/documentation/api

**Step 1:** Make sure you have an access key.
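One way to keep the key out of the notebook is to read it from an environment variable. This is only a sketch: the helper `load_api_key` and the variable name `TMDB_API_KEY` are names chosen here for illustration, not something the API requires.

```python
import os

def load_api_key(var_name="TMDB_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

After setting the variable in your shell, `api_key = load_api_key()` gives you the key to use in the requests that follow.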

**Step 2:** The complete documentation that you'll need to reference can be found here:  https://developers.themoviedb.org/3/getting-started

For now, we are going to use the `/discover` endpoint to answer the following questions.

**Step 3:** Select all movies released in 2012. The `total_results` key in the response should have the value 10000.

In [3]:
import requests

In [1]:
url = 'https://api.themoviedb.org/3/discover/movie?api_key=072f1b08f2f53c9384b7284080afac18&primary_release_year=2012'


In [4]:
data = requests.get(url).json()
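The query string can also be assembled from a dictionary instead of being typed by hand, which makes it harder to misplace an `&` or `=`. The sketch below uses only the standard library; `build_discover_url` is a hypothetical helper written for this lab, and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

DISCOVER_URL = "https://api.themoviedb.org/3/discover/movie"

def build_discover_url(api_key, **filters):
    """Build a /discover/movie URL from keyword filters (hypothetical helper)."""
    params = {"api_key": api_key, **filters}
    return f"{DISCOVER_URL}?{urlencode(params)}"

url = build_discover_url("YOUR_API_KEY", primary_release_year=2012)
print(url)
# https://api.themoviedb.org/3/discover/movie?api_key=YOUR_API_KEY&primary_release_year=2012
```

With a real key, `requests.get(url).json()['total_results']` is the number to compare against the value stated in Step 3. Passing the same dictionary as `requests.get(DISCOVER_URL, params=params)` should produce an equivalent request.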

**Step 4:** Sort the movies from the previous API call by their popularity, in descending order.  (Use an API call for this, not pandas).  What was the most popular movie in 2012?

In [6]:
api_key = '072f1b08f2f53c9384b7284080afac18'
url_base = 'https://api.themoviedb.org/3/discover/movie'
query = f'?api_key={api_key}&primary_release_year=2012&sort_by=popularity.desc'

In [7]:
data = requests.get(url_base+query).json()

In [11]:
data['results'][0]['title']

'The Avengers'
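Before trusting the first item as the answer, it can be worth confirming that the API really returned the page in descending order. The following is a minimal sketch; `is_sorted_by_popularity` is a hypothetical helper, and the sample rows are made up to mimic the shape of the items in `data['results']`.

```python
def is_sorted_by_popularity(results):
    """Return True if each movie is at least as popular as the one after it."""
    scores = [movie["popularity"] for movie in results]
    return all(a >= b for a, b in zip(scores, scores[1:]))

# Made-up rows with the same keys as the items in data['results']
sample = [
    {"title": "Movie A", "popularity": 9.5},
    {"title": "Movie B", "popularity": 7.0},
    {"title": "Movie C", "popularity": 7.0},
]

print(is_sorted_by_popularity(sample))  # True
print(sample[0]["title"])               # Movie A
```

In the notebook, the equivalent check would be `is_sorted_by_popularity(data['results'])`.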

**Step 5:** Create an API call that collects the 100 most popular movies released in 2017, 2018, and 2019.

**Hint:** You may have noticed that the results returned to you only contain 20 items.  To get more you have to use the `page` parameter.

In [35]:
api_key = '072f1b08f2f53c9384b7284080afac18'
url_base = 'https://api.themoviedb.org/3/discover/movie'
query = f'?api_key={api_key}&primary_release_year=2017,2018,2019&sort_by=popularity.desc&page=1'
data = requests.get(url_base+query).json()

In [44]:
full_data = []
for page_num in [1, 2, 3, 4, 5]:
    query = f'?api_key={api_key}&primary_release_year=2017,2018,2019&sort_by=popularity.desc&page={page_num}'
    data = requests.get(url_base+query).json()
    full_data.append(data)
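The page list `[1, 2, 3, 4, 5]` follows from 100 movies at 20 per page. A sketch of the same idea for any number of movies is shown below. `collect_top` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real request so the example runs offline.

```python
import math

PAGE_SIZE = 20  # from the hint: each page of results holds 20 items

def collect_top(n_movies, fetch_page):
    """Collect the first n_movies results by requesting as many pages as needed.

    fetch_page is any function that takes a page number and returns the parsed
    JSON for that page (a dict with a 'results' list).
    """
    n_pages = math.ceil(n_movies / PAGE_SIZE)
    movies = []
    for page_num in range(1, n_pages + 1):
        movies.extend(fetch_page(page_num)["results"])
    return movies[:n_movies]

def fake_fetch(page_num):
    """Offline stand-in that returns 20 numbered rows per page."""
    start = (page_num - 1) * PAGE_SIZE
    return {"page": page_num, "results": [{"id": i} for i in range(start, start + PAGE_SIZE)]}

top_100 = collect_top(100, fake_fetch)
print(len(top_100))  # 100
```

Unlike the loop above, this returns one flat list of movies rather than a list of page dictionaries, so it can be passed straight to `pd.DataFrame`. With the real API, `fetch_page` would be a small function that requests the URL for the given page number and returns `.json()`.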

In [45]:
full_data

[{'page': 1,
  'total_results': 10000,
  'total_pages': 500,
  'results': [{'popularity': 443.655,
    'vote_count': 1306,
    'video': False,
    'poster_path': '/xBHvZcjRiWyobQ9kxBhO6B2dtRI.jpg',
    'id': 419704,
    'adult': False,
    'backdrop_path': '/5BwqwxMEjeFtdknRV792Svo0K1v.jpg',
    'original_language': 'en',
    'original_title': 'Ad Astra',
    'genre_ids': [12, 18, 9648, 878, 53],
    'title': 'Ad Astra',
    'vote_average': 6.1,
    'overview': 'The near future, a time when both hope and hardships drive humanity to look to the stars and beyond. While a mysterious phenomenon menaces to destroy life on planet Earth, astronaut Roy McBride undertakes a mission across the immensity of space and its many perils to uncover the truth about a lost expedition that decades before boldly faced emptiness and silence in search of the unknown.',
    'release_date': '2019-09-17'},
   {'popularity': 386.206,
    'vote_count': 319,
    'video': False,
    'poster_path': '/l4iknLOenijaB8

**Step 6:**  Take the results from a previous step and turn them into a dataframe.  Make columns for title, popularity, vote_count, and vote_average.

In [47]:
import pandas as pd

In [136]:
# Stack the 'results' list from every page, keeping only the columns we need
cols = ['id','title','popularity','vote_count','vote_average']
df = pd.concat([pd.DataFrame(page['results'], columns=cols) for page in full_data], axis=0)

In [137]:
df = df.dropna()

In [138]:
df.reset_index(drop=True, inplace=True)

**Step 7:** Make a new API call for all movies with Samuel L. Jackson in them. His id is 2231. Sort them by popularity in descending order. Just the first page is fine.

In [139]:
api_key = '072f1b08f2f53c9384b7284080afac18'
url_base = 'https://api.themoviedb.org/3/discover/movie'
query = f'?api_key={api_key}&with_people=2231&sort_by=popularity.desc&page=1'
data = requests.get(url_base+query).json()

In [140]:
data['results']

[{'popularity': 287.655,
  'vote_count': 116,
  'video': False,
  'poster_path': '/db32LaOibwEliAmSL2jjDF6oDdj.jpg',
  'id': 181812,
  'adult': False,
  'backdrop_path': '/jOzrELAzFxtMx2I4uDGHOotdfsS.jpg',
  'original_language': 'en',
  'original_title': 'Star Wars: The Rise of Skywalker',
  'genre_ids': [28, 12, 878],
  'title': 'Star Wars: The Rise of Skywalker',
  'vote_average': 6.7,
  'overview': 'The next installment in the franchise and the conclusion of the “Star Wars“ sequel trilogy as well as the “Skywalker Saga.”',
  'release_date': '2019-12-18'},
 {'popularity': 65.013,
  'vote_count': 16054,
  'video': False,
  'poster_path': '/7WsyChQLEftFiDOVTGkv3hFpyyt.jpg',
  'id': 299536,
  'adult': False,
  'backdrop_path': '/bOGkgRGdhrBYJSLpXaxhXVstddV.jpg',
  'original_language': 'en',
  'original_title': 'Avengers: Infinity War',
  'genre_ids': [28, 12, 878],
  'title': 'Avengers: Infinity War',
  'vote_average': 8.3,
  'overview': 'As the Avengers and their allies have continued 

**Step 8:** How would you check to see if there are any Samuel L. Jackson movies **not** in the dataframe that you just created?

**Hint:** There are a few ways to do this, but the `merge` method gives you some nice options with the `indicator` option.
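As a small illustration of the `indicator` option, the sketch below merges two made-up dataframes that stand in for the top-100 dataframe and the actor's movies. The names `top_movies` and `actor_movies` and all rows are invented for the example.

```python
import pandas as pd

# Made-up stand-ins for the two dataframes in this lab
top_movies = pd.DataFrame({"id": [1, 2, 3], "title": ["Movie A", "Movie B", "Movie C"]})
actor_movies = pd.DataFrame({"id": [2, 3, 4], "title": ["Movie B", "Movie C", "Movie D"]})

# A right merge keeps every row of actor_movies; the _merge column records
# whether each row was also found in top_movies
merged = pd.merge(top_movies, actor_movies, how="right", on=["id", "title"], indicator=True)

# 'right_only' marks movies that appear in actor_movies but not in top_movies
missing = merged.loc[merged["_merge"] == "right_only", ["id", "title"]]
print(missing["title"].tolist())  # ['Movie D']
```

Merging on `id` as well as `title` is a design choice made here: two different films can share a title, while the id identifies one movie.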

In [141]:
SLJ = pd.DataFrame(data['results'])[['id','title','popularity','vote_count','vote_average']]

In [142]:
df.shape

(100, 5)

In [143]:
SLJ.shape

(20, 5)

In [144]:
df[df['title']=='Avengers: Infinity War']

| | id | title | popularity | vote_count | vote_average |
|---|---|---|---|---|---|
| 21 | 299536 | Avengers: Infinity War | 75.228 | 16044 | 8.3 |


In [145]:
SLJ[SLJ['title']=='Avengers: Infinity War']

| | id | title | popularity | vote_count | vote_average |
|---|---|---|---|---|---|
| 1 | 299536 | Avengers: Infinity War | 65.013 | 16054 | 8.3 |


In [146]:
new_df = pd.merge(df, SLJ['title'], how='right',on=['title'], indicator=True)

In [147]:
new_df.head()

| | id | title | popularity | vote_count | vote_average | _merge |
|---|---|---|---|---|---|---|
| 0 | 181812 | Star Wars: The Rise of Skywalker | 287.655 | 116 | 6.7 | both |
| 1 | 299536 | Avengers: Infinity War | 75.228 | 16044 | 8.3 | both |
| 2 | 429617 | Spider-Man: Far from Home | 59.49 | 5363 | 7.6 | both |
| 3 | 299537 | Captain Marvel | 42.094 | 7851 | 7.0 | both |
| 4 | 24428 | The Avengers | 44.506 | 21077 | 7.7 | both |


In [148]:
new_df[new_df['_merge']=='right_only'][['title']]

| | title |
|---|---|
| 7 | Pulp Fiction |
| 8 | Star Wars: Episode I - The Phantom Menace |
| 9 | Glass |
| 10 | Thor |
| 11 | Avengers: Age of Ultron |
| 12 | Star Wars: Episode III - Revenge of the Sith |
| 13 | GoodFellas |
| 14 | Incredibles 2 |
| 15 | Kong: Skull Island |
| 16 | Star Wars: Episode II - Attack of the Clones |


**Bonus:** See if you can write a function that accepts the TMDB id of an actor/actress as an argument and adds their movies to the dataframe of 100 movies you just created, skipping any movie that is already in it.

**Extra Bonus:** Write the original dataframe to a csv file, and write a modified version of the above function that does the following:
 - reads in a csv file with existing movies
 - accepts an actor/actress id as an argument
 - checks for movies of that performer that are not in the original dataframe
 - adds the new rows to the df from the imported csv file
 - exports the new df back to its original file
 
**Hint:** If you're connecting to api data or web scraping, processes like this can be helpful if you want to regularly collect and store new data.
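The read, compare, append, and export steps listed above can be sketched as follows. This is one possible shape, not the lab's official solution: `update_movie_csv` is a hypothetical name, the rows are made up, and the API call is left out so the example runs offline. In the lab, `new_movies` would be the dataframe built from the performer's `data['results']`.

```python
import os
import tempfile

import pandas as pd

def update_movie_csv(csv_path, new_movies):
    """Append rows from new_movies whose id is not yet in the CSV, then save it."""
    existing = pd.read_csv(csv_path)
    # Keep only the movies whose id is missing from the stored file
    to_add = new_movies[~new_movies["id"].isin(existing["id"])]
    updated = pd.concat([existing, to_add], ignore_index=True)
    # Write back to the same file, without the index column
    updated.to_csv(csv_path, index=False)
    return updated

# Offline demonstration with made-up rows in a temporary folder
csv_path = os.path.join(tempfile.mkdtemp(), "movies.csv")
pd.DataFrame({"id": [1, 2], "title": ["Movie A", "Movie B"]}).to_csv(csv_path, index=False)

new_rows = pd.DataFrame({"id": [2, 3], "title": ["Movie B", "Movie C"]})
result = update_movie_csv(csv_path, new_rows)
print(result["id"].tolist())  # [1, 2, 3]
```

Running the function a second time with the same rows leaves the file unchanged, which is what makes it safe to schedule for regular collection.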

In [170]:
SLJ[SLJ['title']=='Avengers: Infinity War'].id.isin(df.id)[1]

True

In [177]:
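def add_actor(actor_id, df):
    """Return df plus the performer's first-page movies that are not already in it."""
    api_key = '072f1b08f2f53c9384b7284080afac18'
    url_base = 'https://api.themoviedb.org/3/discover/movie'
    # Filter on the requested performer only (the hard-coded id 2231 is removed)
    query = f'?api_key={api_key}&with_people={actor_id}&sort_by=popularity.desc&page=1'
    data = requests.get(url_base+query).json()
    cols = ['id','title','popularity','vote_count','vote_average']
    # Build one dataframe from the whole results list instead of one per movie
    df_new = pd.DataFrame(data['results'], columns=cols)
    # Skip movies whose id is already in df, rather than stopping at the first match
    df_new = df_new[~df_new['id'].isin(df['id'])]
    return pd.concat([df, df_new], axis=0, ignore_index=True)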

In [178]:
add_actor(8891, df)

| | id | title | popularity | vote_count | vote_average |
|---|---|---|---|---|---|
| 0 | 419704 | Ad Astra | 443.655 | 1306 | 6.1 |
| 1 | 512200 | Jumanji: The Next Level | 386.206 | 319 | 6.8 |
| 2 | 509967 | 6 Underground | 296.997 | 417 | 6.5 |
| 3 | 181812 | Star Wars: The Rise of Skywalker | 287.655 | 116 | 6.7 |
| 4 | 330457 | Frozen II | 201.075 | 1057 | 7.0 |
| 5 | 475557 | Joker | 193.784 | 6512 | 8.4 |
| 6 | 920 | Cars | 109.280 | 8443 | 6.7 |
| 7 | 466272 | Once Upon a Time… in Hollywood | 130.976 | 3263 | 7.5 |
| 8 | 492188 | Marriage Story | 127.278 | 715 | 8.1 |
| 9 | 449924 | Ip Man 4: The Finale | 137.514 | 12 | 5.8 |
