# Project 3 Part 2 API

<mark> ***Use an API to extract box office revenue and profit data to add to your IMDB data and perform exploratory data analysis.***

### Task

Your stakeholder wants you to extract the budget, revenue, and MPAA Rating (G/PG/PG-13/R), which is also called "Certification".

Note: this process can take a long time and may need to run overnight.

### Specifications - Financial Data

Your stakeholder would like you to extract and save the results for movies that meet all of the criteria established in part 1 of the project. (You should already have a filtered dataframe saved from part 1 as a .csv.gz file.)

* [ ] As a proof of concept, they requested that you perform a test extraction of movies that started in 2000 or 2001.

* [ ] Each year should be saved as a separate .csv.gz file

Hint: Use the two custom functions from the lessons (Intro to TMDB API and Efficient TMDB API Calls). Be sure to define these functions before calling them in your code!

One function will add the certification (MPAA rating) to the dictionary returned by `movie.info()`.
The other function will help you append to or extend a JSON file with Python.
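A minimal sketch of the JSON append/extend helper, assuming the results file holds a single JSON list of movie records. The name `write_json` and the create-the-file-if-missing behavior are illustrative choices, not necessarily the lesson's exact function.

```python
import json
from pathlib import Path


def write_json(new_data, filename):
    """Add records to a JSON file that stores one list of records.

    A single dict is appended; a list of dicts extends the stored list.
    Assumption: the file either does not exist yet or already holds a JSON list.
    """
    path = Path(filename)
    # Read the existing list, or start a new one if the file is missing
    file_data = json.loads(path.read_text()) if path.exists() else []
    if isinstance(new_data, list):
        file_data.extend(new_data)  # several records at once
    else:
        file_data.append(new_data)  # one record
    # Rewrite the whole file so the on-disk JSON stays a single valid list
    path.write_text(json.dumps(file_data))
```

Rewriting the full file on each call keeps the JSON valid between calls, which matters if a long overnight run is interrupted, at the cost of re-serializing the list every time.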

### Confirm Your API Function Works

* [ ] In order to ensure your function for extracting movie data from TMDB is working, test your function on these 2 movie ids: tt0848228 ("The Avengers") and tt0332280 ("The Notebook"). Make sure that your function runs without error and that it returns the correct movie's data for both test ids.
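One way to structure the certification function is to keep the parsing separate from the network call, so the parsing can be checked without an API key. This sketch assumes `tmdbsimple`'s `Movies(...).info()` and `Movies(...).releases()` calls, that TMDB's movie endpoint accepts IMDb ids such as `tt0848228`, and that the US rating sits under `releases()['countries']` in the entry whose `iso_3166_1` is `'US'`. The function names are hypothetical.

```python
def add_us_certification(info, releases):
    """Return a copy of a TMDB movie-info dict with a 'certification' key added.

    Assumption: `releases` has the shape returned by tmdbsimple's Movies.releases(),
    i.e. {'countries': [{'iso_3166_1': 'US', 'certification': 'PG-13', ...}, ...]}.
    Uses the first non-empty US certification, or None if there is none.
    """
    result = dict(info)  # leave the caller's dict untouched
    result["certification"] = None
    for country in releases.get("countries", []):
        if country.get("iso_3166_1") == "US" and country.get("certification"):
            result["certification"] = country["certification"]
            break
    return result


def get_movie_with_rating(movie_id):
    """Fetch TMDB info for a movie id (assumed to accept IMDb ids) plus its US certification."""
    # Imported here so add_us_certification can be used without tmdbsimple installed;
    # tmdb.API_KEY must be set before this is called.
    import tmdbsimple as tmdb

    movie = tmdb.Movies(movie_id)
    return add_us_certification(movie.info(), movie.releases())
```

To confirm it works, call `get_movie_with_rating('tt0848228')` and `get_movie_with_rating('tt0332280')` and check that each result's `imdb_id` and `title` match the requested movie.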

Hint: Ideally you can organize the code segments from the previous lesson to create an outer and inner loop, but if you get stuck, you can complete 1 year at a time.
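The outer/inner loop can be sketched as below, assuming the part-1 dataframe has `tconst` (IMDb id) and numeric `startYear` columns, and that results go to per-year files named `tmdb_api_results_{year}.json` and `final_tmdb_data_{year}.csv.gz` (both naming patterns are hypothetical). The fetch function, for example the certification function from the hint, is passed in so the loop can be tried without calling the API. Skipping ids already saved in the year's JSON file is an added assumption so an interrupted overnight run can be restarted.

```python
import json
from pathlib import Path

import pandas as pd


def extract_years(basics, years, fetch, out_dir="Data"):
    """Outer loop over years, inner loop over movie ids; one .csv.gz per year.

    basics: part-1 DataFrame with 'tconst' and numeric 'startYear' columns (assumed names).
    fetch:  callable taking an IMDb id and returning a dict of movie data.
    Returns a dict mapping year -> list of ids whose fetch raised an error.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    errors = {}
    for year in years:  # outer loop: one pass per year
        json_path = out_dir / f"tmdb_api_results_{year}.json"
        # Reuse records saved by an earlier, interrupted run (assumed resume behavior)
        results = json.loads(json_path.read_text()) if json_path.exists() else []
        done = {r.get("imdb_id") for r in results}
        ids = basics.loc[basics["startYear"] == year, "tconst"]
        errors[year] = []
        for movie_id in ids:  # inner loop: one API call per movie
            if movie_id in done:
                continue
            try:
                record = fetch(movie_id)
            except Exception:
                # e.g. a movie missing from TMDB: note it and keep going
                errors[year].append(movie_id)
                continue
            results.append(record)
            json_path.write_text(json.dumps(results))  # save progress after each movie
        # Each year gets its own compressed CSV (gzip inferred from the .gz suffix)
        pd.DataFrame(results).to_csv(out_dir / f"final_tmdb_data_{year}.csv.gz", index=False)
    return errors
```

Usage would be along the lines of `extract_years(basics, [2000, 2001], fetch=...)`; the returned error lists show which ids TMDB could not resolve.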

Once you have retrieved and saved the final results to 2 separate .csv.gz files, move on to a new Exploratory Data Analysis notebook to explore the following questions.


### Exploratory Data Analysis

* [ ] Load in your csv.gz's of results for each year extracted.
* [ ] Concatenate the data into 1 dataframe for the remainder of the analysis.
* Once you have your data from the API, they would like you to perform some light EDA to show:
    * [ ] How many movies had at least some valid financial information (values > 0 for budget OR revenue)?
    * [ ] Please exclude any movies with 0's for budget AND revenue from the remaining visualizations.
    * [ ] How many movies are there in each of the certification categories (G/PG/PG-13/R)?
    * [ ] What is the average revenue per certification category?
    * [ ] What is the average budget per certification category?
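The EDA questions above map onto a few pandas operations. This sketch assumes the combined dataframe has numeric `budget` and `revenue` columns (0 meaning unknown) and a `certification` column; the helper name `summarize_financials` is hypothetical.

```python
import pandas as pd


def summarize_financials(df):
    """Answer the EDA questions for a combined TMDB results DataFrame.

    Assumes numeric 'budget' and 'revenue' columns and a 'certification' column.
    Rows with a missing certification are dropped by value_counts/groupby.
    """
    # "At least some valid financial information": budget > 0 OR revenue > 0
    has_financials = (df["budget"] > 0) | (df["revenue"] > 0)
    n_with_financials = int(has_financials.sum())

    # Exclude movies with 0 for budget AND revenue from the rest of the analysis
    fin = df[has_financials]

    # Count and averages per certification category
    counts = fin["certification"].value_counts()
    avg_revenue = fin.groupby("certification")["revenue"].mean()
    avg_budget = fin.groupby("certification")["budget"].mean()
    return n_with_financials, counts, avg_revenue, avg_budget
```

For the visualizations, seaborn's `countplot` (x = `certification`) and `barplot` (x = `certification`, y = `revenue` or `budget`, which plots the mean by default) on the filtered frame are one option.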


### Deliverables
* After you have joined the TMDB results into one dataframe in the EDA notebook:
    * [ ] Save a final merged .csv.gz of all of the TMDB API data
    * [ ] The file name should be "tmdb_results_combined.csv.gz"
    * [ ] Make sure this is pushed to your GitHub repository along with all of your code
    * [ ] One code file for API calls
    * [ ] One code file for EDA
    * [ ] Submit the link
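Loading the per-year files and writing the combined deliverable could look like this, assuming the per-year results share a prefix such as `final_tmdb_data_` (a hypothetical naming pattern). Only the output name `tmdb_results_combined.csv.gz` comes from the deliverables list.

```python
import glob

import pandas as pd


def combine_results(pattern="final_tmdb_data_*.csv.gz",
                    out_path="tmdb_results_combined.csv.gz"):
    """Concatenate per-year TMDB result files and save one combined .csv.gz.

    The default input pattern is an assumption; pass the real file locations.
    """
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # read_csv and to_csv infer gzip compression from the .gz extension
    combined = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
    combined.to_csv(out_path, index=False)
    return combined
```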

### Imports

```python
import json

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import tmdbsimple as tmdb
```

## Setting up the API

```python
# Load the TMDB API key from a local JSON file kept outside the repository
with open('/Users/cameron/.secret/tmdb_api.json', 'r') as f:
    login = json.load(f)
# Confirm the expected key is present
login.keys()
```

dict_keys(['api-key'])

```python
# Set the key on the tmdbsimple module so every request uses it
tmdb.API_KEY = login['api-key']
```

```python
# Sanity check: request TMDB movie id 603 (The Matrix)
movie = tmdb.Movies(603)
movie.info()
```

{'adult': False,
 'backdrop_path': '/l4QHerTSbMI7qgvasqxP36pqjN6.jpg',
 'belongs_to_collection': {'id': 2344,
  'name': 'The Matrix Collection',
  'poster_path': '/bV9qTVHTVf0gkW0j7p7M0ILD4pG.jpg',
  'backdrop_path': '/bRm2DEgUiYciDw3myHuYFInD7la.jpg'},
 'budget': 63000000,
 'genres': [{'id': 28, 'name': 'Action'},
  {'id': 878, 'name': 'Science Fiction'}],
 'homepage': 'http://www.warnerbros.com/matrix',
 'id': 603,
 'imdb_id': 'tt0133093',
 'original_language': 'en',
 'original_title': 'The Matrix',
 'overview': 'Set in the 22nd century, The Matrix tells the story of a computer hacker who joins a group of underground insurgents fighting the vast and powerful computers who now rule the earth.',
 'popularity': 119.14,
 'poster_path': '/f89U3ADr1oiB1s9GkdPOEpXUk5H.jpg',
 'production_companies': [{'id': 79,
   'logo_path': '/tpFpsqbleCzEE2p5EgvUq6ozfCA.png',
   'name': 'Village Roadshow Pictures',
   'origin_country': 'US'},
  {'id': 372,
   'logo_path': None,
   'name': 'Groucho II Film