# Imports and loading

**Imports**

In [1]:
import json
import tmdbsimple as tmdb

**Getting API credentials**

In [2]:
with open('/Users/Malue/.secret/tmdb_api.json', 'r') as f:
    login = json.load(f)
# Display the keys of the loaded dict
login.keys()

dict_keys(['client-id', 'api-key'])

In [3]:
tmdb.API_KEY = login['api-key']
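If the credentials file is missing a key, `login['api-key']` fails with a bare `KeyError`. The sketch below reads the same kind of JSON file and fails with a clearer message. `load_api_key` is a hypothetical name, and the default key name `'api-key'` comes from the `dict_keys` output above.

```python
import json
from pathlib import Path


def load_api_key(path, key_name="api-key"):
    """Return one value from a JSON credentials file (hypothetical helper).

    Assumes the file holds a JSON object such as
    {"client-id": "...", "api-key": "..."}, like the file loaded above.
    """
    # expanduser() lets the path start with "~" instead of a hard-coded home folder
    with Path(path).expanduser().open("r") as f:
        creds = json.load(f)
    if key_name not in creds:
        # Name the missing key and list the keys that are present
        raise KeyError(f"{key_name!r} not found in {path}; found {sorted(creds)}")
    return creds[key_name]
```

In the notebook this could be called as `tmdb.API_KEY = load_api_key('~/.secret/tmdb_api.json')`, assuming the file lives under your home folder.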

# Querying Movies by ID

For our project we need to extract three pieces of info for each movie:
- Revenue
- Budget
- Certification (G, PG, etc.)

Read through the documentation and explore the output to determine where the required information is located.
- First, pass a movie ID to tmdb.Movies() to create a movie instance.
    - This example uses the arbitrary TMDB movie ID 603 (The Matrix).

In [4]:
# Make a movie object using the .Movies function from tmdb
movie = tmdb.Movies(603)

- The data for the movie can be extracted as a dict by calling the .info() method on the movie object.

In [5]:
# Calling .info() returns the movie's details as a dict
info = movie.info()
info

{'adult': False,
 'backdrop_path': '/oMsxZEvz9a708d49b6UdZK1KAo5.jpg',
 'belongs_to_collection': {'id': 2344,
  'name': 'The Matrix Collection',
  'poster_path': '/bV9qTVHTVf0gkW0j7p7M0ILD4pG.jpg',
  'backdrop_path': '/bRm2DEgUiYciDw3myHuYFInD7la.jpg'},
 'budget': 63000000,
 'genres': [{'id': 28, 'name': 'Action'},
  {'id': 878, 'name': 'Science Fiction'}],
 'homepage': 'http://www.warnerbros.com/matrix',
 'id': 603,
 'imdb_id': 'tt0133093',
 'original_language': 'en',
 'original_title': 'The Matrix',
 'overview': 'Set in the 22nd century, The Matrix tells the story of a computer hacker who joins a group of underground insurgents fighting the vast and powerful computers who now rule the earth.',
 'popularity': 79.245,
 'poster_path': '/f89U3ADr1oiB1s9GkdPOEpXUk5H.jpg',
 'production_companies': [{'id': 79,
   'logo_path': '/tpFpsqbleCzEE2p5EgvUq6ozfCA.png',
   'name': 'Village Roadshow Pictures',
   'origin_country': 'US'},
  {'id': 372,
   'logo_path': None,
   'name': 'Groucho II Film

- There is a lot of info here, including the budget and revenue; however, it does not include Certification, one of the three required pieces of information for our project.
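One way to confirm which of the three required values a response actually contains is to check its keys directly. The sketch below uses a hand-trimmed copy of the Matrix output above (only a few keys kept; the revenue value comes from the cell below). `REQUIRED_FIELDS`, `missing_fields`, and `extract_fields` are hypothetical names.

```python
# The three values the project needs for each movie
REQUIRED_FIELDS = ["budget", "revenue", "certification"]

# Hand-trimmed copy of the .info() output for The Matrix (most keys omitted)
matrix_info = {
    "budget": 63000000,
    "revenue": 463517383,
    "imdb_id": "tt0133093",
    "original_title": "The Matrix",
}


def missing_fields(info, required=REQUIRED_FIELDS):
    """List the required keys that are absent from a movie-info dict."""
    return [field for field in required if field not in info]


def extract_fields(info, required=REQUIRED_FIELDS):
    """Collect the required values, using None for any missing key."""
    return {field: info.get(field) for field in required}


print(missing_fields(matrix_info))  # ['certification']
print(extract_fields(matrix_info))  # {'budget': 63000000, 'revenue': 463517383, 'certification': None}
```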

In [6]:
info['budget']

63000000

In [7]:
info['revenue']

463517383

- We do have access to the imdb_id, though, which we can match to our IMDB dataframe of movies. IMDB IDs are easy to recognize by their leading 'tt'.

In [8]:
info['imdb_id']

'tt0133093'
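Because IMDB IDs follow a fixed pattern, a quick format check can catch malformed IDs before any API call is made. The sketch below assumes IMDB title IDs are 'tt' followed by seven or more digits, which holds for every ID in this notebook. `looks_like_imdb_id` is a hypothetical helper.

```python
import re

# Assumption: IMDB title IDs are "tt" followed by at least 7 digits
IMDB_ID_PATTERN = re.compile(r"tt\d{7,}")


def looks_like_imdb_id(value):
    """Return True if `value` is a string shaped like an IMDB title ID."""
    return isinstance(value, str) and IMDB_ID_PATTERN.fullmatch(value) is not None


print(looks_like_imdb_id("tt0133093"))  # True
print(looks_like_imdb_id(603))          # False (a TMDB movie ID, not an IMDB ID)
```

A passing check only means the ID is well formed. TMDB may still have no record for it.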

**Searching with IMDB_ID**
- Try searching for info using the imdb_id. For example, what was the budget of 'Tom and Jerry' (IMDB ID tt1361336)?

In [9]:
movie = tmdb.Movies('tt1361336')
info = movie.info()
info['budget']

50000000

- Searching by IMDB ID lets us make API calls for exactly the movies we already filtered from the IMDB data in the first part of our movie project.
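Once TMDB results are collected, the `imdb_id` field can be used to join them back onto the IMDB data. The pandas sketch below uses tiny hand-made stand-in frames built from values shown in this notebook. The `tconst` and `primaryTitle` column names are assumptions: they are the column names in IMDB's public `title.basics` dataset, but your dataframe may name them differently.

```python
import pandas as pd

# Hypothetical stand-in for the IMDB dataframe from part one
imdb_df = pd.DataFrame({
    "tconst": ["tt0133093", "tt0848228", "tt0332280"],
    "primaryTitle": ["The Matrix", "The Avengers", "The Notebook"],
})

# Hypothetical trimmed TMDB results (values from the outputs in this notebook)
tmdb_df = pd.DataFrame([
    {"imdb_id": "tt0848228", "budget": 220000000, "revenue": 1518815515, "certification": "PG-13"},
    {"imdb_id": "tt0332280", "budget": 29000000, "revenue": 115603229, "certification": "PG-13"},
])

# Left join keeps every IMDB movie; movies with no TMDB result get NaN
merged = (
    imdb_df.merge(tmdb_df, how="left", left_on="tconst", right_on="imdb_id")
    .drop(columns="imdb_id")
)
print(merged)
```

The left join (`how="left"`) keeps IMDB movies that TMDB could not supply, so the gaps show up as missing values instead of silently dropped rows.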

# Saving the Movie Certification/MPAA Rating

- While most of the data we need is stored in .info(), the certification rating is not.
- The README for the package repository shows how to obtain this information.

In [10]:
# Example from package README
# source = https://github.com/celiao/tmdbsimple
releases = movie.releases()
for c in releases['countries']:
    if c['iso_3166_1'] == 'US':
        print(c['certification'])

PG
PG
PG


- The code above prints the certification of every US release entry for the movie. Tom and Jerry has three, all rated PG.
    - Our project already limits the data to US movies, but checking for the US entry means we always get the US (MPAA) rating, even for data that has not been filtered beforehand.
- Instead of printing the certification separately, we want to add it to the dictionary returned by movie.info().
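The README loop prints every US entry. Assigning inside the loop, as the next cell does, keeps whichever US entry comes last. The sketch below returns the first non-empty US certification instead. It assumes some release entries may carry an empty certification string. `get_us_certification` is a hypothetical helper, and the sample data is made up in the same shape as `movie.releases()`; it is not real API output.

```python
def get_us_certification(releases, country="US"):
    """Return the first non-empty certification for `country`, or None.

    Assumes `releases` has the shape returned by movie.releases():
    {"countries": [{"iso_3166_1": "US", "certification": "PG", ...}, ...]}
    """
    for entry in releases.get("countries", []):
        # Skip other countries and entries whose certification is blank
        if entry.get("iso_3166_1") == country and entry.get("certification"):
            return entry["certification"]
    return None


# Made-up sample in the same shape as movie.releases()
sample_releases = {
    "countries": [
        {"iso_3166_1": "US", "certification": ""},
        {"iso_3166_1": "FR", "certification": "U"},
        {"iso_3166_1": "US", "certification": "PG"},
    ]
}
print(get_us_certification(sample_releases))  # PG
```

Returning None when there is no US rating keeps the key present with an explicit missing value, e.g. `info['certification'] = get_us_certification(movie.releases())`.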

In [11]:
# Get the movie object for the currently loaded id
movie = tmdb.Movies('tt1361336')
# Save the .info() and .releases() dictionaries
info = movie.info()
releases = movie.releases()
# Loop through countries in releases
for c in releases['countries']:
    # if the country abbreviation==US
    if c['iso_3166_1'] == 'US':
        # Save a 'certification' key in the info dict with the certification
        info['certification'] = c['certification']

- This block will be very useful in the future, so let's save it as a function.

## Defining the function

- The function should accept the movie_id as an argument
    - Make sure to replace the previous hard-coded test ID with the movie_id variable
- The function should return a dict of results that includes certification

In [14]:
def get_movie_with_rating(movie_id):
    """Adapted from source = https://github.com/celiao/tmdbsimple"""
    # Get the movie object for the current id
    movie = tmdb.Movies(movie_id)

    # Save the .info and .releases dictionaries
    info = movie.info()
    releases = movie.releases()

    # Loop through countries in releases
    for c in releases['countries']:
        # If the country abbreviation==US
        if c['iso_3166_1']=='US':
            # Save a 'certification' key in info with the fetched certification
            info['certification'] = c['certification']

    return info
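Because `get_movie_with_rating` creates `tmdb.Movies` internally, every test of it calls the live API. One alternative is to pass the class in as a parameter so a fake can be swapped in for offline tests. `get_movie_with_rating_from` and `FakeMovie` are hypothetical names, and the fake returns canned values copied from the Avengers output below.

```python
class FakeMovie:
    """Hypothetical offline stand-in for tmdb.Movies with canned responses."""

    def __init__(self, movie_id):
        self.movie_id = movie_id

    def info(self):
        return {"imdb_id": self.movie_id, "budget": 220000000, "vote_count": 29277}

    def releases(self):
        return {"countries": [{"iso_3166_1": "US", "certification": "PG-13"}]}


def get_movie_with_rating_from(movie_id, movie_class):
    """Same steps as get_movie_with_rating, with the Movies class passed in."""
    movie = movie_class(movie_id)
    info = movie.info()
    releases = movie.releases()
    for c in releases["countries"]:
        # Keep the certification from the US release entry
        if c["iso_3166_1"] == "US":
            info["certification"] = c["certification"]
    return info


result = get_movie_with_rating_from("tt0848228", FakeMovie)
print(result["certification"])  # PG-13
```

With the real API, the equivalent call would be `get_movie_with_rating_from('tt0848228', tmdb.Movies)`.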

## Testing the function

**Single movie test - The Avengers**

In [16]:
test = get_movie_with_rating('tt0848228')
test

{'adult': False,
 'backdrop_path': '/9BBTo63ANSmhC4e6r62OJFuK2GL.jpg',
 'belongs_to_collection': {'id': 86311,
  'name': 'The Avengers Collection',
  'poster_path': '/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg',
  'backdrop_path': '/zuW6fOiusv4X9nnW3paHGfXcSll.jpg'},
 'budget': 220000000,
 'genres': [{'id': 878, 'name': 'Science Fiction'},
  {'id': 28, 'name': 'Action'},
  {'id': 12, 'name': 'Adventure'}],
 'homepage': 'https://www.marvel.com/movies/the-avengers',
 'id': 24428,
 'imdb_id': 'tt0848228',
 'original_language': 'en',
 'original_title': 'The Avengers',
 'overview': 'When an unexpected enemy emerges and threatens global safety and security, Nick Fury, director of the international peacekeeping agency known as S.H.I.E.L.D., finds himself in need of a team to pull the world back from the brink of disaster. Spanning the globe, a daring recruitment effort begins!',
 'popularity': 122.648,
 'poster_path': '/RYMX2wcKCBAr24UyPD7xwmjaTn.jpg',
 'production_companies': [{'id': 420,
   'logo_path

- The last key in our info dict is now 'certification' (previously it was 'vote_count'). All three required values are now in a single dict, which will make our project much easier.
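The new key shows up last because Python dicts (3.7 and later) keep keys in insertion order, and 'certification' is added after the API response has been parsed. Here is a tiny illustration with a hand-trimmed dict, using values from the Avengers output in this notebook:

```python
# Hand-trimmed stand-in for an .info() dict, ending with 'vote_count' as TMDB returned it
info = {"budget": 220000000, "vote_average": 7.71, "vote_count": 29277}
print(list(info)[-1])  # vote_count

# Adding a key appends it after all existing keys
info["certification"] = "PG-13"
print(list(info)[-1])  # certification
```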

**More thorough testing**
- Ultimately we want to loop through a list of movie IDs and collect all the results into a final file/dataframe.
- Let's test our function in a similar context: looping through several movie IDs.
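The test list below contains 'tt0848228' (The Avengers) twice, so the loop requests it twice and it appears twice in the results. If repeated IDs are not intended, you can drop them while keeping the original order:

```python
test_ids = ["tt0848228", "tt0115937", "tt0848228", "tt0332280"]

# dict.fromkeys keeps only the first occurrence of each ID, in original order
unique_ids = list(dict.fromkeys(test_ids))
print(unique_ids)  # ['tt0848228', 'tt0115937', 'tt0332280']
```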

In [18]:
# Testing our function by looping through a list of IDs
import pandas as pd
test_ids = ["tt0848228", "tt0115937","tt0848228","tt0332280"]

"""
# Loop through test_ids
results = []
for movie_id in test_ids:
    movie_info = get_movie_with_rating(movie_id)
    results.append(movie_info)

pd.DataFrame(results)
"""

'\n# Loop through test_ids\nresults = []\nfor movie_id in test_ids:\n    movie_info = get_movie_with_rating(movie_id)\n    results.append(movie_info)\n\npd.DataFrame(results)\n'

ERROR^^^

**The above code gives us an error. What happened?**

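- IMDB is a very large database that includes many movies beyond box office hits, and many of them do not exist in TMDB's database. When TMDB has no match for an ID, the API responds with 404 Not Found and the call raises an HTTPError, which stops the loop.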

**Handling the Error with Try and Except**
- We can handle the error by using try and except.
- Briefly, it works as follows:
    - Just like an if statement, we start with the keyword try followed by ':', then an indented block.
        - The indented block holds the code that might run into errors, i.e., the code we want to 'try' to run.
        - If there are no errors, the code continues running as usual.
    - Unlike an if statement, a try statement must be followed by an except clause (or a finally clause).
        - The except block holds the code to run if the try block hits an error.
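Here is a self-contained illustration of that flow. A hypothetical `lookup` function stands in for the TMDB call and raises `KeyError` for unknown IDs. The example catches only `KeyError` and also uses the optional `else` clause, which runs only when the `try` block finishes without an error.

```python
def lookup(movie_id, known):
    """Hypothetical lookup: raises KeyError when movie_id is not in `known`."""
    return known[movie_id]


known = {"tt0848228": "The Avengers", "tt0332280": "The Notebook"}

found, skipped = [], []
for movie_id in ["tt0848228", "tt0115937", "tt0332280"]:
    try:
        title = lookup(movie_id, known)  # may raise KeyError
    except KeyError:
        skipped.append(movie_id)  # runs only if the try block raised KeyError
    else:
        found.append(title)  # runs only if the try block raised nothing

print(found)    # ['The Avengers', 'The Notebook']
print(skipped)  # ['tt0115937']
```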

In [19]:
# Using a try and except statement to silence an error
results = []
for movie_id in test_ids:
    try:
        movie_info = get_movie_with_rating(movie_id)
        results.append(movie_info)
    except:
        pass
pd.DataFrame(results)

| | adult | backdrop_path | belongs_to_collection | budget | genres | homepage | id | imdb_id | original_language | original_title | ... | revenue | runtime | spoken_languages | status | tagline | title | video | vote_average | vote_count | certification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.71 | 29277 | PG-13 |
| 1 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.71 | 29277 | PG-13 |
| 2 | False | /qom1SZSENdmHFNZBXbtJAU0WTlC.jpg | | 29000000 | [{'id': 10749, 'name': 'Romance'}, {'id': 18, ... | http://www.newline.com/properties/notebookthe.... | 11036 | tt0332280 | en | The Notebook | ... | 115603229 | 123 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Behind every great love is a great story. | The Notebook | False | 7.88 | 10689 | PG-13 |


- Now we can run our loop and get the remaining movies' info, even though one ID caused an error with the TMDB API.
- How do we know which movie caused the error?
    - With only 4 entries, we can simply inspect the data. But with larger datasets, we need a way to find and address this.
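One way to find the failing IDs after the fact is to compare the requested IDs with the `imdb_id` values that came back. The sketch below uses trimmed stand-ins for the dicts in `results`. It assumes each TMDB response echoes the IMDB ID that was requested, as the outputs above do.

```python
test_ids = ["tt0848228", "tt0115937", "tt0848228", "tt0332280"]

# Trimmed stand-ins for the dicts collected in `results` (only imdb_id kept)
results = [{"imdb_id": "tt0848228"}, {"imdb_id": "tt0848228"}, {"imdb_id": "tt0332280"}]

# Any requested ID that never appears in the results must have failed
returned_ids = {r.get("imdb_id") for r in results}
missing = [movie_id for movie_id in dict.fromkeys(test_ids) if movie_id not in returned_ids]
print(missing)  # ['tt0115937']
```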

**Saving our error messages**
- We can save each error as a variable, 'e' here, by replacing 'except:' with 'except Exception as e:'.
- We can then display the message whenever an error occurs. Since we only display it, the loop is not interrupted.
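Here `e` is the exception object itself, not just its text. The example below pulls out the exception's class name and message, using a hypothetical `fetch` function that always fails. Python deletes the name `e` when the `except` block ends, so anything needed afterward must be copied out (or the object appended to a list, as the cells below do).

```python
def fetch(movie_id):
    """Hypothetical call that fails the way an unknown ID might."""
    raise ValueError(f"Not found: {movie_id}")


try:
    fetch("tt0115937")
except Exception as e:
    # Copy what we need; `e` is unbound once this block ends
    error_type = type(e).__name__
    message = str(e)

print(error_type, "-", message)  # ValueError - Not found: tt0115937
```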

In [20]:
# Using a try and except statement to display each error without stopping the loop
results = []
for movie_id in test_ids:
    try:
        movie_info = get_movie_with_rating(movie_id)
        results.append(movie_info)
    except Exception as e:
        display(e)

requests.exceptions.HTTPError('404 Client Error: Not Found for url: https://api.themoviedb.org/3/movie/tt0115937?api_key=<API_KEY>')

- However, in a large dataset, this could get very cluttered.
- Instead we can create a list and append any error messages, along with the matching movie_id. We can use this to later check which movies caused the errors.

In [21]:
# Using a try-except statement to catch and append the error message to a list
import pandas as pd
test_ids = ["tt0848228", "tt0115937","tt0848228","tt0332280"]
results = []
errors = []
for movie_id in test_ids:
    
    try:
        movie_info = get_movie_with_rating(movie_id)
        results.append(movie_info)
        
    except Exception as e: 
        errors.append([movie_id, e])
    
pd.DataFrame(results)

| | adult | backdrop_path | belongs_to_collection | budget | genres | homepage | id | imdb_id | original_language | original_title | ... | revenue | runtime | spoken_languages | status | tagline | title | video | vote_average | vote_count | certification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.71 | 29277 | PG-13 |
| 1 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.71 | 29277 | PG-13 |
| 2 | False | /qom1SZSENdmHFNZBXbtJAU0WTlC.jpg | | 29000000 | [{'id': 10749, 'name': 'Romance'}, {'id': 18, ... | http://www.newline.com/properties/notebookthe.... | 11036 | tt0332280 | en | The Notebook | ... | 115603229 | 123 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Behind every great love is a great story. | The Notebook | False | 7.88 | 10689 | PG-13 |
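To keep the results for later (the "final file" mentioned earlier), one option is to write the raw list of dicts to JSON. JSON preserves nested fields such as `genres`, which would become plain strings in a CSV. `save_results`, `load_results`, and the filename are hypothetical.

```python
import json
from pathlib import Path


def save_results(results, path):
    """Write a list of movie-info dicts to a JSON file and return its Path."""
    path = Path(path)
    with path.open("w") as f:
        json.dump(results, f)
    return path


def load_results(path):
    """Read the list of movie-info dicts back from a JSON file."""
    with Path(path).open("r") as f:
        return json.load(f)


# Trimmed example record with a nested field, like the TMDB output above
sample = [{"imdb_id": "tt0332280", "genres": [{"id": 10749, "name": "Romance"}], "certification": "PG-13"}]
saved = save_results(sample, "tmdb_results_sample.json")
print(load_results(saved) == sample)  # True
```

Afterward, `pd.DataFrame(load_results(path))` rebuilds the same table.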


In [22]:
print(f'- Number of errors: {len(errors)}')
errors

- Number of errors: 1


[['tt0115937',
  requests.exceptions.HTTPError('404 Client Error: Not Found for url: https://api.themoviedb.org/3/movie/tt0115937?api_key=<API_KEY>')]]
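The stored errors are `requests.exceptions.HTTPError` objects, and `HTTPError` carries the failed response in its `.response` attribute. The sketch below assumes the errors come from `requests`, as the output above shows, and separates "not on TMDB" (HTTP 404) from other failures such as network problems. `classify_error` is a hypothetical helper, and the 404 error is built locally rather than fetched.

```python
import requests


def classify_error(exc):
    """Return 'not_found' for HTTP 404 errors and 'other' for anything else."""
    if isinstance(exc, requests.exceptions.HTTPError):
        # Compare with None explicitly: a Response with a 4xx status is falsy
        if exc.response is not None and exc.response.status_code == 404:
            return "not_found"
    return "other"


# Build a 404 error locally (no network) to mimic the one caught above
resp = requests.Response()
resp.status_code = 404
not_found = requests.exceptions.HTTPError("404 Client Error: Not Found", response=resp)

print(classify_error(not_found))                # not_found
print(classify_error(ValueError("bad value")))  # other
```

With the `errors` list above, `[movie_id for movie_id, e in errors if classify_error(e) == 'not_found']` would list the IDs TMDB does not have.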

# Summary

- This lesson covered creating an account on TMDB, along with saving our credentials in a JSON file in a .secret folder.
- We saw how large amounts of data can be accessed through API calls.
- We created a function that adds the US certification from TMDB's release data to each movie's TMDB info, looked up by the IMDB IDs from our IMDB dataframe.