## Lab: Working With The Movie Database (TMDb) API

This lab walks you through accessing different parts of The Movie Database (TMDb) API. The documentation can be found here: https://www.themoviedb.org/documentation/api

**Step 1:** Make sure you have a TMDb API key.

**Step 2:** The complete reference documentation you'll need is here: https://developers.themoviedb.org/3/getting-started

For now, we will use the `/discover/movie` endpoint to answer the following questions.

**Step 3:** Select all movies released in 2012. The `total_results` value in the response should be 10000.
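Writing the whole query string by hand makes it easy to mistype a parameter. Another option is to keep the query parameters in a dict and let a library encode them. The sketch below uses the standard library's `urllib.parse.urlencode` to show the URL such a dict produces. The helper name `discover_url` and the `YOUR_API_KEY` placeholder are illustrative only. They are not part of the TMDb API.

```python
from urllib.parse import urlencode

BASE = "https://api.themoviedb.org/3"

def discover_url(api_key, params):
    """Build a /discover/movie URL from a dict of query parameters (hypothetical helper)."""
    query = {"api_key": api_key, **params}  # api_key goes first, then the caller's params
    return f"{BASE}/discover/movie?{urlencode(query)}"

url = discover_url("YOUR_API_KEY", {
    "language": "en-US",
    "sort_by": "popularity.desc",
    "include_adult": "false",
    "include_video": "false",
    "page": 1,
    "primary_release_year": 2012,
})
print(url)
```

With `requests`, you can pass the same dict directly as `requests.get(f"{BASE}/discover/movie", params=query)`. This sends an equivalent request and keeps the key in a single variable.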

In [1]:
# your code here

import requests

In [2]:
url = "https://api.themoviedb.org/3/discover/movie?api_key=ebac3c8bd3e3fc43e0cbceb1b2598f18&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&page=1&primary_release_year=2012"

In [3]:
data = requests.get(url).json()

In [4]:
data

{'page': 1,
 'total_results': 10000,
 'total_pages': 500,
 'results': [{'popularity': 44.506,
   'vote_count': 21077,
   'video': False,
   'poster_path': '/cezWGskPY5x7GaglTTRN4Fugfb8.jpg',
   'id': 24428,
   'adult': False,
   'backdrop_path': '/hbn46fQaRmlpBuUrEiFqv0GDL6Y.jpg',
   'original_language': 'en',
   'original_title': 'The Avengers',
   'genre_ids': [28, 12, 878],
   'title': 'The Avengers',
   'vote_average': 7.7,
   'overview': 'When an unexpected enemy emerges and threatens global safety and security, Nick Fury, director of the international peacekeeping agency known as S.H.I.E.L.D., finds himself in need of a team to pull the world back from the brink of disaster. Spanning the globe, a daring recruitment effort begins!',
   'release_date': '2012-04-25'},
  {'popularity': 35.701,
   'vote_count': 2141,
   'video': False,
   'poster_path': '/q83VacoG2hHLvORTPtvCXKsdBSN.jpg',
   'id': 59961,
   'adult': False,
   'backdrop_path': '/wX0mOAa91dTAT2WCGRvvpWUKAeD.jpg',
   'or

**Step 4:** Sort the movies from the previous API call by their popularity, in descending order.  (Use an API call for this, not pandas).  What was the most popular movie in 2012?
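The parameter `sort_by=popularity.desc` makes the API do the sorting, so the answer is the first element of `results`. The small check below confirms that a page really came back in descending popularity order before relying on `results[0]`. The helper names are hypothetical. The sample dict is made up and only mimics the shape of the response shown above.

```python
def is_sorted_by_popularity(data):
    """Return True if data['results'] is in descending popularity order."""
    pops = [movie["popularity"] for movie in data["results"]]
    return all(a >= b for a, b in zip(pops, pops[1:]))

def most_popular_title(data):
    """Title of the first result, after checking the page is sorted by popularity."""
    if not data["results"]:
        raise ValueError("response contains no results")
    if not is_sorted_by_popularity(data):
        raise ValueError("results are not sorted by popularity.desc")
    return data["results"][0]["title"]

# Made-up sample in the same shape as a /discover response
sample = {"page": 1, "results": [
    {"title": "Movie A", "popularity": 30.0},
    {"title": "Movie B", "popularity": 20.0},
    {"title": "Movie C", "popularity": 10.0},
]}
print(most_popular_title(sample))  # → Movie A
```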

In [77]:
# your code here
url_2 = "https://api.themoviedb.org/3/discover/movie?api_key=ebac3c8bd3e3fc43e0cbceb1b2598f18&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&page=1&primary_release_year=2012"

In [78]:
data = requests.get(url_2).json()

In [80]:
data['results'][0]['title']

'The Avengers'

**Step 5:** Create API calls that collect the 100 most popular movies released in 2017, 2018, and 2019.

**Hint:** You may have noticed that each response contains only 20 results. To get more, request additional pages with the `page` parameter.
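The `primary_release_year` parameter takes a single year, so repeating it in one query string does not select three years at once. To cover 2017 through 2019, you can use a date range with `primary_release_date.gte` and `primary_release_date.lte` instead.

The sketch below requests successive pages until it has `n` movies or reaches `total_pages`. The `fetch` argument stands in for a function that performs the HTTP request and returns the decoded JSON, which lets you run the paging logic without a network connection. `collect_top` and `fake_fetch` are hypothetical names.

```python
def collect_top(fetch, params, n=100):
    """Collect up to n results by requesting pages 1, 2, ... in order.

    fetch: callable taking a params dict and returning the decoded JSON dict.
    """
    movies = []
    page = 1
    while len(movies) < n:
        data = fetch({**params, "page": page})
        movies.extend(data["results"])
        if page >= data["total_pages"]:  # no more pages to request
            break
        page += 1
    return movies[:n]

params = {
    "sort_by": "popularity.desc",
    "primary_release_date.gte": "2017-01-01",
    "primary_release_date.lte": "2019-12-31",
}

# Offline stand-in for the API: 7 pages of 20 fake movies each
def fake_fetch(p):
    start = (p["page"] - 1) * 20
    return {"page": p["page"], "total_pages": 7,
            "results": [{"id": start + i} for i in range(20)]}

top = collect_top(fake_fetch, params, n=100)
print(len(top))  # → 100
```

Against the real API, `fetch` might be `lambda p: requests.get(base + "/discover/movie", params={**p, "api_key": api_key}).json()`.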

In [85]:
# your code here
api_key    = 'ebac3c8bd3e3fc43e0cbceb1b2598f18'
base       = 'https://api.themoviedb.org/3'

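# primary_release_year takes one year, so use a release-date range for 2017-2019
api_string = [
    f'/discover/movie?primary_release_date.gte=2017-01-01&primary_release_date.lte=2019-12-31&api_key={api_key}&sort_by=popularity.desc&page={i}'
    for i in range(1, 6)  # pages 1-5, 20 results each = 100 movies
]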

In [87]:
data_2 = [requests.get(base + path).json() for path in api_string]

In [88]:
data_2

[{'page': 1,
  'total_results': 10000,
  'total_pages': 500,
  'results': [{'popularity': 476.043,
    'vote_count': 1323,
    'video': False,
    'poster_path': '/xBHvZcjRiWyobQ9kxBhO6B2dtRI.jpg',
    'id': 419704,
    'adult': False,
    'backdrop_path': '/5BwqwxMEjeFtdknRV792Svo0K1v.jpg',
    'original_language': 'en',
    'original_title': 'Ad Astra',
    'genre_ids': [12, 18, 9648, 878, 53],
    'title': 'Ad Astra',
    'vote_average': 6.1,
    'overview': 'The near future, a time when both hope and hardships drive humanity to look to the stars and beyond. While a mysterious phenomenon menaces to destroy life on planet Earth, astronaut Roy McBride undertakes a mission across the immensity of space and its many perils to uncover the truth about a lost expedition that decades before boldly faced emptiness and silence in search of the unknown.',
    'release_date': '2019-09-17'},
   {'popularity': 293.642,
    'vote_count': 344,
    'video': False,
    'poster_path': '/l4iknLOenijaB8

**Step 6:**  Take the results from a previous step and turn them into a dataframe.  Make columns for title, popularity, vote_count, and vote_average.
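You may want to keep TMDb's `id` alongside the four requested columns. It gives you a stable key for comparing data sets later, because titles can repeat and `popularity` changes between calls. The sketch assumes `pages` is a list of decoded `/discover` responses like `data_2`. `pages_to_frame` is a hypothetical helper, and the sample pages are made up.

```python
import pandas as pd

COLUMNS = ["id", "title", "popularity", "vote_count", "vote_average"]

def pages_to_frame(pages):
    """Flatten the 'results' of several API pages into one DataFrame."""
    rows = [movie for page in pages for movie in page["results"]]
    df = pd.DataFrame(rows, columns=COLUMNS)  # keys not in COLUMNS are dropped
    # A movie can appear on two pages if popularity shifts between requests
    return df.drop_duplicates(subset="id").reset_index(drop=True)

pages = [
    {"results": [{"id": 1, "title": "A", "popularity": 9.0, "vote_count": 10,
                  "vote_average": 7.0, "video": False}]},
    {"results": [{"id": 1, "title": "A", "popularity": 9.0, "vote_count": 10, "vote_average": 7.0},
                 {"id": 2, "title": "B", "popularity": 5.0, "vote_count": 3, "vote_average": 6.5}]},
]
print(pages_to_frame(pages).shape)  # → (2, 5)
```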

In [90]:
# your code here
import pandas as pd

In [98]:
df = [pd.DataFrame(page['results'], columns=['title', 'popularity', 'vote_count', 'vote_average']) for page in data_2]

In [99]:
df = pd.concat(df).reset_index(drop=True)

In [100]:
df.shape

(100, 4)

**Step 7:** Make a new API call for all movies featuring Samuel L. Jackson. His TMDb person id is 2231. Sort them by popularity in descending order. Just the first page is fine.

In [101]:
# your code here
api_string = f'/discover/movie?with_cast=2231&sort_by=popularity.desc&api_key={api_key}'

In [102]:
data_sj = requests.get(base + api_string).json()

In [103]:
data_sj

{'page': 1,
 'total_results': 187,
 'total_pages': 10,
 'results': [{'popularity': 287.655,
   'vote_count': 116,
   'video': False,
   'poster_path': '/db32LaOibwEliAmSL2jjDF6oDdj.jpg',
   'id': 181812,
   'adult': False,
   'backdrop_path': '/jOzrELAzFxtMx2I4uDGHOotdfsS.jpg',
   'original_language': 'en',
   'original_title': 'Star Wars: The Rise of Skywalker',
   'genre_ids': [28, 12, 878],
   'title': 'Star Wars: The Rise of Skywalker',
   'vote_average': 6.7,
   'overview': 'The next installment in the franchise and the conclusion of the “Star Wars“ sequel trilogy as well as the “Skywalker Saga.”',
   'release_date': '2019-12-18'},
  {'popularity': 65.013,
   'vote_count': 16054,
   'video': False,
   'poster_path': '/7WsyChQLEftFiDOVTGkv3hFpyyt.jpg',
   'id': 299536,
   'adult': False,
   'backdrop_path': '/bOGkgRGdhrBYJSLpXaxhXVstddV.jpg',
   'original_language': 'en',
   'original_title': 'Avengers: Infinity War',
   'genre_ids': [28, 12, 878],
   'title': 'Avengers: Infinity W

**Step 8:** How would you check whether any Samuel L. Jackson movies are **not** in the dataframe of 100 movies you created in Step 6?

**Hint:** There are a few ways to do this. The `merge` method's `indicator` parameter is especially convenient: it labels each row by which frame it was found in.
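With `indicator=True`, `merge` adds a `_merge` column whose values are `'left_only'`, `'right_only'`, or `'both'`. The toy example below joins on `id`, which assumes both frames carry that column. The small frames are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the Samuel L. Jackson results and the 100-movie frame
sj = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
top100 = pd.DataFrame({"id": [2, 4], "title": ["B", "D"]})

# Join on id only, so the right frame contributes no extra columns
check = sj.merge(top100[["id"]], on="id", how="left", indicator=True)
missing = check.loc[check["_merge"] == "left_only", "title"]
print(missing.tolist())  # → ['A', 'C']
```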

In [104]:
# your code here
sj = pd.DataFrame(data_sj['results'], columns=['id', 'title', 'popularity', 'vote_count', 'vote_average'])

In [105]:
sj.head()

|   | id | title | popularity | vote_count | vote_average |
|---|---|---|---|---|---|
| 0 | 181812 | Star Wars: The Rise of Skywalker | 287.655 | 116 | 6.7 |
| 1 | 299536 | Avengers: Infinity War | 65.013 | 16054 | 8.3 |
| 2 | 429617 | Spider-Man: Far from Home | 60.627 | 5374 | 7.6 |
| 3 | 299537 | Captain Marvel | 44.726 | 7858 | 7.0 |
| 4 | 24428 | The Avengers | 44.506 | 21077 | 7.7 |


In [110]:
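# Join on title only: popularity and vote counts drift between API calls, so matching on them can miss movies
sj_check = sj.merge(df[['title']].drop_duplicates(), on='title', how='left', indicator=True)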

In [113]:
sj_check[sj_check['_merge']=='left_only'].count()

id              15
title           15
popularity      15
vote_count      15
vote_average    15
_merge          15
dtype: int64

**Bonus:** See if you can write a function that accepts the TMDb person id of an actor or actress as an argument and adds any of their movies that are not already in the dataframe of 100 movies you created.
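Here is one possible shape for the bonus function. It assumes:

- The target frame has an `id` column.
- `fetch` is a callable you supply that returns the decoded `/discover/movie` response for a dict of query params. This lets you test the logic offline.

`add_actor_movies` and `fake_fetch` are hypothetical names. The function compares TMDb movie ids rather than titles.

```python
import pandas as pd

COLUMNS = ["id", "title", "popularity", "vote_count", "vote_average"]

def add_actor_movies(df, person_id, fetch):
    """Append a performer's movies (first page of /discover results) that df lacks."""
    data = fetch({"with_cast": person_id, "sort_by": "popularity.desc"})
    new = pd.DataFrame(data["results"], columns=COLUMNS)
    new = new[~new["id"].isin(df["id"])]  # keep only ids not already present
    return pd.concat([df, new], ignore_index=True)

df = pd.DataFrame([{"id": 1, "title": "A", "popularity": 9.0, "vote_count": 10, "vote_average": 7.0}])

# Offline stand-in for the API with made-up results
def fake_fetch(params):
    return {"results": [
        {"id": 1, "title": "A", "popularity": 9.0, "vote_count": 10, "vote_average": 7.0},
        {"id": 7, "title": "G", "popularity": 4.0, "vote_count": 2, "vote_average": 5.5},
    ]}

df = add_actor_movies(df, 2231, fake_fetch)
print(df["id"].tolist())  # → [1, 7]
```

Calling the function a second time with the same performer adds nothing, because every returned id is already present.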

**Extra Bonus:** Write the original dataframe to a CSV file, then write a modified version of the function above that does the following:
 - reads in a CSV file of existing movies
 - accepts an actor/actress id as an argument
 - checks for that performer's movies that are not in the original dataframe
 - adds the new rows to the dataframe loaded from the CSV file
 - exports the updated dataframe back to its original file
 
**Hint:** When you pull data from an API or scrape the web, an incremental update like this is useful for collecting and storing new data on a regular schedule.
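Below is a sketch of the CSV round trip for the extra bonus. It assumes the CSV was written with `df.to_csv(path, index=False)` and has an `id` column. `update_csv` and the offline `fake_fetch` stand-in are hypothetical names. In real use, `fetch` would wrap a `requests.get(...).json()` call to `/discover/movie`.

```python
import os
import tempfile
import pandas as pd

COLUMNS = ["id", "title", "popularity", "vote_count", "vote_average"]

def update_csv(path, person_id, fetch):
    """Read movies from path, add the performer's missing movies, write back."""
    existing = pd.read_csv(path)
    data = fetch({"with_cast": person_id, "sort_by": "popularity.desc"})
    new = pd.DataFrame(data["results"], columns=COLUMNS)
    new = new[~new["id"].isin(existing["id"])]  # skip movies already stored
    updated = pd.concat([existing, new], ignore_index=True)
    updated.to_csv(path, index=False)  # overwrite the original file
    return len(new)  # number of rows added

# Offline stand-in for the API with made-up results
def fake_fetch(params):
    return {"results": [
        {"id": 1, "title": "A", "popularity": 9.0, "vote_count": 10, "vote_average": 7.0},
        {"id": 7, "title": "G", "popularity": 4.0, "vote_count": 2, "vote_average": 5.5},
    ]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movies.csv")
    pd.DataFrame([{"id": 1, "title": "A", "popularity": 9.0,
                   "vote_count": 10, "vote_average": 7.0}]).to_csv(path, index=False)
    added_first = update_csv(path, 2231, fake_fetch)
    added_second = update_csv(path, 2231, fake_fetch)  # nothing new the second time
    final_ids = pd.read_csv(path)["id"].tolist()

print(added_first, added_second, final_ids)  # → 1 0 [1, 7]
```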

In [None]:
# your code here