In [4]:
import pandas as pd
import requests           #Used to send HTTP requests to the API.

### Collecting data by using TMDB's API

TMDB (The Movie Database) is a database of movies. It offers many APIs for fetching different types of movie data.

In this notebook, we will fetch the data for <b style = "color:orange">top rated movies</b>. First, go to the <b style = "color:orange">tmdb api</b> page and select the API for top rated movies. Its <b style = "color:orange">Try out</b> section provides a URL, and requesting that URL returns the top rated movies in <b style = "color:orange">JSON</b> format. The URL must include an <b style = "color:orange">api_key</b>, which can be found in the TMDB account under <b style = "color:orange">Settings -> API -> API Key</b>. You need to <b>sign up</b> for TMDB to get one.
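
The URL described above can also be assembled in code instead of being copied by hand. The following is a minimal sketch, not TMDB's official client: the helper name `build_top_rated_url` and the environment variable `TMDB_API_KEY` are assumptions made for illustration, and the query parameters are the ones that appear in the URL used in this notebook.

```python
import os
from urllib.parse import urlencode

# Endpoint taken from the URL used in this notebook.
BASE_URL = "https://api.themoviedb.org/3/movie/top_rated"


def build_top_rated_url(api_key, page=1, language="en-US"):
    """Return the top rated movies URL for one page of results."""
    query = urlencode({"api_key": api_key, "language": language, "page": page})
    return f"{BASE_URL}?{query}"


# TMDB_API_KEY is a hypothetical variable name; the fallback is a placeholder.
api_key = os.environ.get("TMDB_API_KEY", "YOUR_API_KEY")
url = build_top_rated_url(api_key, page=1)
print(url)
```

Reading the key from the environment keeps it out of the notebook, which matters if the notebook is shared publicly.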

To view the data in a more readable format, we can use a <b style = "color:orange">JSON viewer</b>. Copy the text returned by TMDB's API, paste it into the JSON viewer's <b style = "color:orange">text field</b>, and click the <b style = "color:orange">viewer</b> button. This shows the structure of the JSON, which closely resembles a Python dictionary.
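
As an alternative to an online viewer, the same structure can be inspected locally with Python's standard `json` module. This is only a sketch: the `sample` dictionary is a hand-trimmed illustration that reuses a few values shown in this notebook, not a complete API response.

```python
import json

# Hand-trimmed illustration of the response shape, not real API output.
sample = {
    "page": 1,
    "results": [
        {"id": 278, "title": "The Shawshank Redemption", "vote_average": 8.7},
        {"id": 238, "title": "The Godfather", "vote_average": 8.7},
    ],
}

# indent=2 prints one key per line; ensure_ascii=False keeps non-English titles readable.
pretty = json.dumps(sample, indent=2, ensure_ascii=False)
print(pretty)

# The top-level keys show where the movie records live.
print(list(sample.keys()))
```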

In [5]:
#URL for the first page of data 
url = 'https://api.themoviedb.org/3/movie/top_rated?api_key=2c9f8b40300fbb0d09ad79a573e3a65d&language=en-US&page=1'
response = requests.get(url)     #Send an HTTP GET request to TMDB's API to fetch the data.
response
#Response [200] means the request succeeded.
#Response [404] means the requested resource was not found.
#Response [500] means the server ran into an internal error.

<Response [200]>
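
The status codes mentioned in the comments can be looked up with the standard library instead of being memorized. This is a small sketch, with `describe_status` and `is_success` as hypothetical helper names. With `requests`, the numeric code is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code followed by its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"


def is_success(code):
    """Treat any 2xx code as success."""
    return 200 <= code < 300


for code in (200, 404, 500):
    outcome = "success" if is_success(code) else "failure"
    print(describe_status(code), "->", outcome)
```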

In [9]:
"""
We are only interested in the 'results' key, because it holds the movie data, so we will work with it.
"""
response.json()['results']   #.json() parses the JSON response body into Python objects (dicts and lists).

[{'adult': False,
  'backdrop_path': '/kXfqcdQKsToO0OUXHcrrNCHDBzO.jpg',
  'genre_ids': [18, 80],
  'id': 278,
  'original_language': 'en',
  'original_title': 'The Shawshank Redemption',
  'overview': 'Framed in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.',
  'popularity': 76.108,
  'poster_path': '/q6y0Go1tsGEsmtFryDOJo3dEmqu.jpg',
  'release_date': '1994-09-23',
  'title': 'The Shawshank Redemption',
  'video': False,
  'vote_average': 8.7,
  'vote_count': 21668},
 {'adult': False,
  'backdrop_path': '/90ez6ArvpO8bvpyIngBuwXOqJm5.jpg',
  'genre_ids': [35, 18, 10749],
  'id': 19404,
  'original_language': 'hi',
  'original_title': 'दिलवाले दुल्हनिया ले जा

#### Creating a DataFrame from 1 page

In [27]:
pd.DataFrame(response.json()['results']).head(10)   #Converting the 'results' list into a dataframe.

Unnamed: 0,adult,backdrop_path,genre_ids,id,original_language,original_title,overview,popularity,poster_path,release_date,title,video,vote_average,vote_count
0,False,/kXfqcdQKsToO0OUXHcrrNCHDBzO.jpg,"[18, 80]",278,en,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,76.108,/q6y0Go1tsGEsmtFryDOJo3dEmqu.jpg,1994-09-23,The Shawshank Redemption,False,8.7,21668
1,False,/90ez6ArvpO8bvpyIngBuwXOqJm5.jpg,"[35, 18, 10749]",19404,hi,दिलवाले दुल्हनिया ले जायेंगे,"Raj is a rich, carefree, happy-go-lucky second...",29.725,/2CAL2433ZeIihfX1Hb2139CX0pW.jpg,1995-10-19,Dilwale Dulhania Le Jayenge,False,8.7,3698
2,False,/rSPw7tgCH9c6NqICZef4kZjFOQ5.jpg,"[18, 80]",238,en,The Godfather,"Spanning the years 1945 to 1955, a chronicle o...",74.836,/3bhkrj58Vtu7enYsRolD1fZdja1.jpg,1972-03-14,The Godfather,False,8.7,16136
3,False,/loRmRzQXZeqG78TqZuyvSlEQfZb.jpg,"[18, 36, 10752]",424,en,Schindler's List,The true story of how businessman Oskar Schind...,43.867,/sF1U4EUQS8YHUYjNl3pMGNIQyr0.jpg,1993-11-30,Schindler's List,False,8.6,12891
4,False,/poec6RqOKY9iSiIUmfyfPfiLtvB.jpg,"[18, 80]",240,en,The Godfather: Part II,In the continuing saga of the Corleone crime f...,51.759,/hek3koDUyRQk7FIhPXsa6mT2Zc3.jpg,1974-12-20,The Godfather: Part II,False,8.6,9734
5,False,/ryr532va3rN7MADPAO2updA4Akz.jpg,"[10751, 18]",667257,es,Cosas imposibles,A widow who is tormented by the memory of her ...,16.954,/eaf7GQj0ieOwm08rrvjJQNbN0kN.jpg,2021-06-17,Impossible Things,False,8.5,248
6,False,/hZth9NCeXvvO7Xi98d8q34e1Ier.jpg,"[16, 10751, 14]",129,ja,千と千尋の神隠し,"A young girl, Chihiro, becomes trapped in a st...",88.086,/39wmItIWsg5sZMyRUHLkWBcuVCM.jpg,2001-07-20,Spirited Away,False,8.5,12983
7,False,/3RMLbSEXOn1CzLoNT7xFeLfdxhq.jpg,"[10749, 16]",372754,ja,同級生,"Rihito Sajo, an honor student with a perfect s...",15.143,/cIfRCA5wEvj9tApca4UDUagQEiM.jpg,2016-02-20,Dou kyu sei – Classmates,False,8.5,235
8,False,/mMtUybQ6hL24FXo0F3Z4j2KG7kZ.jpg,"[10749, 16, 18]",372058,ja,君の名は。,High schoolers Mitsuha and Taki are complete s...,172.632,/q719jXXEzOoYaps6babgKnONONX.jpg,2016-08-26,Your Name.,False,8.5,8829
9,False,/v5CEt88iDsuoMaW1Q5Msu9UZdEt.jpg,"[10749, 18]",730154,ja,きみの瞳が問いかけている,"A tragic accident lead to Kaori's blindness, b...",58.45,/cVn8E3Fxbi8HzYYtaSfsblYC4gl.jpg,2020-10-23,Your Eyes Tell,False,8.5,328


In [12]:
#Create a dataframe that keeps only the columns listed below.
df = pd.DataFrame(response.json()['results'])[['id','adult','title','overview','popularity']]
df.head(5)

Unnamed: 0,id,adult,title,overview,popularity
0,278,False,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,76.108
1,19404,False,Dilwale Dulhania Le Jayenge,"Raj is a rich, carefree, happy-go-lucky second...",29.725
2,238,False,The Godfather,"Spanning the years 1945 to 1955, a chronicle o...",74.836
3,424,False,Schindler's List,The true story of how businessman Oskar Schind...,43.867
4,240,False,The Godfather: Part II,In the continuing saga of the Corleone crime f...,51.759


#### Creating a DataFrame from all the pages.

In [31]:
"""
First we create an empty DataFrame.
"""
df1 = pd.DataFrame()
df1

In [32]:
"""
Then we collect the data for every page and append it to the dataframe. We cannot go past page 500 because
the API does not allow it.
"""
for i in range(1,501):
    res = requests.get('https://api.themoviedb.org/3/movie/top_rated?api_key=2c9f8b40300fbb0d09ad79a573e3a65d&language=en-US&page={}'.format(i))
    temp_df = pd.DataFrame(res.json()['results'])[['id','adult','title','overview','popularity']]
    #DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    df1 = pd.concat([df1, temp_df], ignore_index=True)
    
"""
ignore_index is set to True; otherwise every page would keep its own index from 0 to 19
instead of one continuous, automatically increasing index.
"""


In [33]:
df1.head(7)

Unnamed: 0,id,adult,title,overview,popularity
0,278,False,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,76.108
1,19404,False,Dilwale Dulhania Le Jayenge,"Raj is a rich, carefree, happy-go-lucky second...",29.725
2,238,False,The Godfather,"Spanning the years 1945 to 1955, a chronicle o...",74.836
3,424,False,Schindler's List,The true story of how businessman Oskar Schind...,43.867
4,240,False,The Godfather: Part II,In the continuing saga of the Corleone crime f...,51.759
5,667257,False,Impossible Things,A widow who is tormented by the memory of her ...,16.954
6,129,False,Spirited Away,"A young girl, Chihiro, becomes trapped in a st...",88.086


In [34]:
df1.shape

(10000, 5)
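
Growing a DataFrame inside a loop copies it on every iteration. One possible alternative, sketched below, gathers the rows in a plain list first so the DataFrame can be built once at the end with `pd.DataFrame(rows)`. The names `collect_results` and `fake_fetch` are hypothetical. `fake_fetch` stands in for the real HTTP call so the sketch runs offline, and stopping on an empty page is an assumption about how to end early, not documented API behaviour.

```python
def collect_results(fetch_page, first_page=1, last_page=500):
    """Gather the 'results' rows of every page into one list."""
    rows = []
    for page in range(first_page, last_page + 1):
        payload = fetch_page(page)
        results = payload.get("results", [])
        if not results:
            # Assumed stopping rule: an empty page means there is nothing left.
            break
        rows.extend(results)
    return rows


def fake_fetch(page):
    """Stand-in for the HTTP call: two rows per page for pages 1 to 3."""
    if page > 3:
        return {"page": page, "results": []}
    rows = [{"id": page * 10 + n, "title": f"movie {page}-{n}"} for n in range(2)]
    return {"page": page, "results": rows}


rows = collect_results(fake_fetch)
print(len(rows))
```

With the real API, `fetch_page` could be a function that calls `requests.get(...)` for the given page and returns `.json()`.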

In [35]:
df1.to_csv('Top_Rated_Movies.CSV')

In [36]:
#Working with the newly created CSV file.
df2 = pd.read_csv("Top_Rated_Movies.CSV")
df2 = df2.iloc[:,1:]   #Drop the first column, which holds the index that to_csv wrote to the file.
df2

Unnamed: 0,id,adult,title,overview,popularity
0,278,False,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,76.108
1,19404,False,Dilwale Dulhania Le Jayenge,"Raj is a rich, carefree, happy-go-lucky second...",29.725
2,238,False,The Godfather,"Spanning the years 1945 to 1955, a chronicle o...",74.836
3,424,False,Schindler's List,The true story of how businessman Oskar Schind...,43.867
4,240,False,The Godfather: Part II,In the continuing saga of the Corleone crime f...,51.759
...,...,...,...,...,...
9995,407559,False,I Am the Pretty Thing That Lives in the House,A young nurse takes care of elderly author who...,10.172
9996,9320,False,The Avengers,"British Ministry agent John Steed, under direc...",23.100
9997,17532,False,S. Darko,"S. Darko follows Samantha Darko, the younger s...",18.298
9998,439015,False,Slender Man,"In a small town in Massachusetts, four high sc...",31.404
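
The extra first column appears because `to_csv` writes the index by default. A minimal sketch of one way to avoid it is shown below, using a tiny DataFrame and an in-memory buffer instead of a file on disk. The two rows reuse values from the output above, and the variable names are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"id": [278, 238], "title": ["The Shawshank Redemption", "The Godfather"]}
)

# index=False leaves the row index out of the CSV, so no column has to be dropped later.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.columns.tolist())
```

If the file has already been written with the index, passing `index_col=0` to `pd.read_csv` is another option.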


In [37]:
"""
If you want, you can upload the data you fetched from an API to your Kaggle account. To find more data sources,
you can go to "https://rapidapi.com/collections/", which lists free APIs across different categories. You can fetch
data from them, convert it into CSV files, and upload the dataset to Kaggle to boost your profile.
"""
