In [1]:
!pip install tmdbsimple

Collecting tmdbsimple
  Using cached tmdbsimple-2.9.1-py3-none-any.whl (38 kB)
Installing collected packages: tmdbsimple
Successfully installed tmdbsimple-2.9.1


In [8]:
import json
# Load API credentials
with open('/Users/ERNESTO/.secret/tmdb_api.json', 'r') as f:
    login = json.load(f)

In [9]:
# DISPLAY THE KEYS
login.keys()

dict_keys(['client-id', 'api-key'])
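The credentials file is just a JSON object with the two keys shown above. Here is a minimal sketch of what such a file could contain and how it reads back. The values are placeholders, not real credentials, and a temporary file stands in for /Users/ERNESTO/.secret/tmdb_api.json. The variable is named example_login so it does not overwrite the real login if run in the same notebook.

```python
import json
import os
import tempfile

# Placeholder values only: put your real keys in your own file and never share them
example_creds = {"client-id": "YOUR_CLIENT_ID", "api-key": "YOUR_API_KEY"}

# Hypothetical location: a temporary file stands in for /Users/ERNESTO/.secret/tmdb_api.json
secret_path = os.path.join(tempfile.mkdtemp(), "tmdb_api.json")
with open(secret_path, "w") as f:
    json.dump(example_creds, f)

# Read the file back the same way the notebook loads its credentials
with open(secret_path, "r") as f:
    example_login = json.load(f)

print(example_login.keys())  # dict_keys(['client-id', 'api-key'])
```

Keeping the key in a file outside the project folder means it never ends up in the notebook itself or in a shared repository.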

In [10]:
import tmdbsimple as tmdb
tmdb.API_KEY = login['api-key']

In [12]:
## make a movie object using the tmdb.Movies class (603 is TMDB's id for The Matrix)
movie = tmdb.Movies(603)
movie

<tmdbsimple.movies.Movies at 0x137e90cc9d0>

In [13]:
## the .info() method requests the movie's details and returns them as a dictionary
info = movie.info()
info



{'adult': False,
 'backdrop_path': '/l4QHerTSbMI7qgvasqxP36pqjN6.jpg',
 'belongs_to_collection': {'id': 2344,
  'name': 'The Matrix Collection',
  'poster_path': '/bV9qTVHTVf0gkW0j7p7M0ILD4pG.jpg',
  'backdrop_path': '/bRm2DEgUiYciDw3myHuYFInD7la.jpg'},
 'budget': 63000000,
 'genres': [{'id': 28, 'name': 'Action'},
  {'id': 878, 'name': 'Science Fiction'}],
 'homepage': 'http://www.warnerbros.com/matrix',
 'id': 603,
 'imdb_id': 'tt0133093',
 'original_language': 'en',
 'original_title': 'The Matrix',
 'overview': 'Set in the 22nd century, The Matrix tells the story of a computer hacker who joins a group of underground insurgents fighting the vast and powerful computers who now rule the earth.',
 'popularity': 63.374,
 'poster_path': '/f89U3ADr1oiB1s9GkdPOEpXUk5H.jpg',
 'production_companies': [{'id': 79,
   'logo_path': '/tpFpsqbleCzEE2p5EgvUq6ozfCA.png',
   'name': 'Village Roadshow Pictures',
   'origin_country': 'US'},
  {'id': 174,
   'logo_path': '/IuAlhI9eVC9Z8UQWOIDdWRKSEJ.png'

In [14]:
info['budget']


63000000

In [15]:
info['revenue']


463517383

In [16]:
info['imdb_id']

'tt0133093'

# Searching with IMDB_ID

Try searching by the IMDb ID instead. For example, what was the budget of Tom and Jerry, which has an IMDb ID of "tt1361336"?

In [17]:
movie = tmdb.Movies('tt1361336')
info = movie.info()
info['budget']


50000000

Searching by IMDB_ID will allow us to make API calls for exactly the movies we already selected from the IMDB data in the first part of our project!

# Saving the movie Certification/MPAA Rating
While MOST of the data we are interested in is returned by .info(), the certification rating is not.

The README for the package's repository shows how to obtain this information:

In [18]:
# example from package README
# source = https://github.com/celiao/tmdbsimple
releases = movie.releases()
for c in releases['countries']:
    if c['iso_3166_1'] == 'US':
        print(c['certification'])


PG
PG
PG


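The code above prints the certification for every US entry in the movie's list of releases. The US can appear in that list more than once, which is why PG is printed three times. (Recall that one specification for this project was that all of our movies will be US movies, but checking the country code keeps this code reusable for future projects where that may not be the case.)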



Instead of printing the certification separately, we want to add that to our dictionary results for movie.info().

In [20]:
# Get the movie object for the current id
movie = tmdb.Movies('tt1361336')
# save the .info and .releases dictionaries
info = movie.info()
releases = movie.releases()
# Loop through countries in releases
for c in releases['countries']:
    # if the country abbreviation == US
    if c['iso_3166_1'] == 'US':
        ## save a "certification" key in the info dict with the certification
        info['certification'] = c['certification']


# Defining Our Function

The function should accept the movie_id as an argument.

Make sure to replace the current test id we used with a movie_id variable in your function definition!

It should return a dictionary of results that includes the certification.

In [21]:
def get_movie_with_rating(movie_id):
    # get the movie object for the current id
    movie = tmdb.Movies(movie_id)

    # save the .info and .releases dictionaries
    info = movie.info()
    releases = movie.releases()

    # loop through countries in releases
    for c in releases['countries']:
        # if the country abbreviation == US
        if c['iso_3166_1'] == 'US':
            # save a certification key in info with the certification
            info['certification'] = c['certification']

    return info
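One gap in get_movie_with_rating: if TMDB lists no US release for a movie, the returned dictionary has no 'certification' key at all, and if a US entry has a blank certification, that blank may overwrite a real rating (the loop keeps the last US entry it sees). Below is a hedged sketch of a pure helper that returns the first non-empty US certification, or None. The name extract_us_certification is hypothetical; the dictionary shape is taken from the releases['countries'] loop above, and the sample data is made up for illustration, not real API output.

```python
def extract_us_certification(releases, default=None):
    """Return the first non-empty US certification from a releases dict.

    Assumes the shape used in the notebook's loop over movie.releases():
    {'countries': [{'iso_3166_1': 'US', 'certification': 'PG'}, ...]}
    """
    for c in releases.get("countries", []):
        # Skip other countries and US entries whose certification is blank
        if c.get("iso_3166_1") == "US" and c.get("certification"):
            return c["certification"]
    # No usable US rating found: return the default so the key always exists
    return default


# Made-up sample data in the same shape (not real API output)
sample_releases = {
    "countries": [
        {"iso_3166_1": "GB", "certification": "12A"},
        {"iso_3166_1": "US", "certification": ""},
        {"iso_3166_1": "US", "certification": "PG-13"},
    ]
}

print(extract_us_certification(sample_releases))    # PG-13
print(extract_us_certification({"countries": []}))  # None
```

Inside get_movie_with_rating, the loop could then shrink to info['certification'] = extract_us_certification(releases), so every movie gets a 'certification' key (None when no US rating is listed) and the final DataFrame column is filled consistently.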

# Testing Our Function
## Single Test Movie - the Avengers
Now test your function on "The Avengers" (id="tt0848228"). What is its certification?

In [22]:
test = get_movie_with_rating("tt0848228")
test



{'adult': False,
 'backdrop_path': '/9BBTo63ANSmhC4e6r62OJFuK2GL.jpg',
 'belongs_to_collection': {'id': 86311,
  'name': 'The Avengers Collection',
  'poster_path': '/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg',
  'backdrop_path': '/zuW6fOiusv4X9nnW3paHGfXcSll.jpg'},
 'budget': 220000000,
 'genres': [{'id': 878, 'name': 'Science Fiction'},
  {'id': 28, 'name': 'Action'},
  {'id': 12, 'name': 'Adventure'}],
 'homepage': 'https://www.marvel.com/movies/the-avengers',
 'id': 24428,
 'imdb_id': 'tt0848228',
 'original_language': 'en',
 'original_title': 'The Avengers',
 'overview': 'When an unexpected enemy emerges and threatens global safety and security, Nick Fury, director of the international peacekeeping agency known as S.H.I.E.L.D., finds himself in need of a team to pull the world back from the brink of disaster. Spanning the globe, a daring recruitment effort begins!',
 'popularity': 130.982,
 'poster_path': '/RYMX2wcKCBAr24UyPD7xwmjaTn.jpg',
 'production_companies': [{'id': 420,
   'logo_path

Check the very end of the output: 'certification' is now included there.

Recall that originally, the dictionary from movie.info() ended at 'vote_count'. Because Python dictionaries keep their keys in insertion order, the 'certification' key we added appears after it, at the very end.

This function will make the project much easier and more efficient!

# More Thorough Testing
Ultimately, we want to be able to loop through a list of movie IDs and collect all of the results into a final file/dataframe.

Let's test our function in a similar context - looping through several movie IDs.

In [23]:
## testing our function by looping through a list of ids
import pandas as pd
test_ids = ["tt0848228", "tt0115937","tt0848228","tt0332280"]
results = []
for movie_id in test_ids:
    movie_info = get_movie_with_rating(movie_id)
    results.append(movie_info)
    
    
pd.DataFrame(results)



HTTPError: 404 Client Error: Not Found for url: https://api.themoviedb.org/3/movie/tt0115937?api_key=<redacted>

# What happened?
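IMDB has a very large collection of movies of all sorts beyond just mainstream box office releases. Many of the movies in IMDB's database do not exist in The Movie Database's (TMDB's) collection. When we request one of those IDs, the TMDB API responds with a 404 Not Found error, tmdbsimple raises it as an HTTPError, and the error stops our whole loop.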

# Handling this Error with Try and Except
We can handle this error by using a try and except statement.

In brief, a try-except statement works a lot like an if-else statement:

Just like an if statement, we write the word try followed by a colon (:), then start a new line indented inside of the try block.

The indented code is the code we want Python to try to execute; here, that is the code that currently raises the error.

For our purposes, we MUST include an except block too, in the same way an else follows an if.

The except block contains the instructions for what to do if the code in the try block hits an error. (Errors are also called exceptions.)
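To see the try/except flow without calling the API, here is a small sketch that uses a plain dictionary as a hypothetical stand-in for TMDB. A missing key raises a KeyError, playing the role of the HTTPError that tmdbsimple raises for a 404. This sketch catches KeyError specifically; naming the exception type is generally safer than a bare except:, which also swallows unrelated problems such as a KeyboardInterrupt.

```python
# Hypothetical stand-in for the TMDB lookup: a dict that only knows two ids
fake_tmdb = {"tt0848228": "The Avengers", "tt0332280": "The Notebook"}


def lookup_title(movie_id):
    # Raises KeyError for an unknown id, much as the real API call raises HTTPError on a 404
    return fake_tmdb[movie_id]


found = []
skipped = []
for movie_id in ["tt0848228", "tt0115937", "tt0332280"]:
    try:
        # Code that might fail goes inside the try block
        found.append(lookup_title(movie_id))
    except KeyError:
        # Runs only when the try block raised a KeyError; the loop then continues
        skipped.append(movie_id)

print(found)    # ['The Avengers', 'The Notebook']
print(skipped)  # ['tt0115937']
```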

In [24]:
## testing our function by looping through a list of ids
import pandas as pd
test_ids = ["tt0848228", "tt0115937","tt0848228","tt0332280"]
results = []
for movie_id in test_ids:
    
    try:
        movie_info = get_movie_with_rating(movie_id)
        results.append(movie_info)
        
    except: 
        pass
    
pd.DataFrame(results)



| | adult | backdrop_path | belongs_to_collection | budget | genres | homepage | id | imdb_id | original_language | original_title | ... | revenue | runtime | spoken_languages | status | tagline | title | video | vote_average | vote_count | certification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.708 | 27908 | PG-13 |
| 1 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.708 | 27908 | PG-13 |
| 2 | False | /qom1SZSENdmHFNZBXbtJAU0WTlC.jpg | | 29000000 | [{'id': 10749, 'name': 'Romance'}, {'id': 18, ... | http://www.newline.com/properties/notebookthe.... | 11036 | tt0332280 | en | The Notebook | ... | 115603229 | 123 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Behind every great love is a great story. | The Notebook | False | 7.881 | 9980 | PG-13 |


Now we can run our loop and get the remaining movies' information, even though one of our IDs caused an error with the TMDB API. How can we know which movies caused the problem? With only 4 movies in our list, we can simply check which is missing from our final results, but what if we had thousands of movie IDs?

## Saving Our Error Messages

Additionally, we can save the error in a variable, which we will call e, just by writing except Exception as e: instead of just except:. The variable e then holds the exception object, including its error message.

Once we've saved the error, we can print it to see the message. Printing does not raise a new error, so the loop keeps running on to the next movie, but we still see what went wrong.
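A minimal sketch of the print-and-continue pattern, again without calling the API. The function flaky_lookup is a hypothetical stand-in that fails for one ID, and a plain ValueError stands in for the HTTPError that tmdbsimple raises. The list is named demo_results so it does not clash with the notebook's results.

```python
def flaky_lookup(movie_id):
    # Hypothetical stand-in: fails for one id to mimic TMDB's 404 response
    if movie_id == "tt0115937":
        raise ValueError(f"404 Client Error: Not Found for {movie_id}")
    return {"imdb_id": movie_id}


demo_results = []
for movie_id in ["tt0848228", "tt0115937", "tt0332280"]:
    try:
        demo_results.append(flaky_lookup(movie_id))
    except Exception as e:
        # e is the exception object; printing it reports the problem without stopping the loop
        print(f"Error for {movie_id}: {e}")

print(len(demo_results))  # 2
```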



However, if we are looping through a large collection of movies, as you will be doing for the project, printing every error could get very cluttered and would interfere with our progress bars.

Instead, we can create a list where we can append the movie id and error message. This way, we can check this list later to see how many movies caused an error - and which ones.

In [25]:
## testing our function by looping through a list of ids
import pandas as pd
test_ids = ["tt0848228", "tt0115937","tt0848228","tt0332280"]
results = []
errors = []
for movie_id in test_ids:
    
    try:
        movie_info = get_movie_with_rating(movie_id)
        results.append(movie_info)
        
    except Exception as e: 
        errors.append([movie_id, e])
    
pd.DataFrame(results)



| | adult | backdrop_path | belongs_to_collection | budget | genres | homepage | id | imdb_id | original_language | original_title | ... | revenue | runtime | spoken_languages | status | tagline | title | video | vote_average | vote_count | certification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.708 | 27908 | PG-13 |
| 1 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | {'id': 86311, 'name': 'The Avengers Collection... | 220000000 | [{'id': 878, 'name': 'Science Fiction'}, {'id'... | https://www.marvel.com/movies/the-avengers | 24428 | tt0848228 | en | The Avengers | ... | 1518815515 | 143 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Some assembly required. | The Avengers | False | 7.708 | 27908 | PG-13 |
| 2 | False | /qom1SZSENdmHFNZBXbtJAU0WTlC.jpg | | 29000000 | [{'id': 10749, 'name': 'Romance'}, {'id': 18, ... | http://www.newline.com/properties/notebookthe.... | 11036 | tt0332280 | en | The Notebook | ... | 115603229 | 123 | [{'english_name': 'English', 'iso_639_1': 'en'... | Released | Behind every great love is a great story. | The Notebook | False | 7.881 | 9980 | PG-13 |


Now we can check our error list to see if we have any errors and what they were.

In [26]:
print(f"- Number of errors: {len(errors)}")
errors



- Number of errors: 1


[['tt0115937',
  requests.exceptions.HTTPError('404 Client Error: Not Found for url: https://api.themoviedb.org/3/movie/tt0115937?api_key=<redacted>')]]

# Efficient TMDB API Calls
You have already written a function to combine the certification with the rest of the .info() results from the TMDB API. This lesson will help prepare you for the project. You may want to check out the specifications for Project Part 2 for an overview of the task before working through this lesson.

# BEFORE THE LOOPS
## Designate a folder
You will save API call data in the data folder you created for project part 1.
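Designating the folder once, up front, lets every later file path be built from a single variable. A minimal sketch, assuming the folder is called Data/ (a hypothetical name; substitute the folder you actually created in part 1):

```python
import os

# Hypothetical folder name: use the data folder you created for project part 1
FOLDER = "Data/"

# Create the folder if it is missing; exist_ok=True avoids an error if it already exists
os.makedirs(FOLDER, exist_ok=True)

# Check what is already saved there before writing new API results
print(sorted(os.listdir(FOLDER)))
```

Listing the folder's contents first is a cheap way to notice earlier results before a new loop overwrites them.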