<a href="https://colab.research.google.com/github/Rohan-1103/Data-Science/blob/main/session_27_fetch_from_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import requests

### Notebook Concepts Summary

This notebook demonstrates the process of programmatically collecting data from a web API, processing it with `pandas`, and saving it locally.

Key steps and concepts include:

1.  **API Data Retrieval**: Using the `requests` library to send HTTP GET requests to The Movie Database (TMDb) API's `/3/movie/top_rated` endpoint to fetch top-rated movie data.
2.  **API Pagination**: The endpoint returns results one page at a time (20 movies per page in this run), so the notebook loops over `page=1` to `page=428` to collect the full list instead of a single page.
3.  **Data Structuring with Pandas**: Converting JSON responses from the API into `pandas DataFrames`, selecting specific relevant columns such as `id`, `title`, `overview`, `release_date`, `popularity`, `vote_average`, and `vote_count`.
4.  **Data Concatenation**: Combining the per-page DataFrames into a single `pandas DataFrame` with `pd.concat()`, which is how the paginated results are aggregated.
5.  **Data Exploration**: Briefly checking the dimensions of the final DataFrame using `df.shape` to confirm the total number of rows and columns collected.
6.  **Data Persistence**: Saving the collected and processed DataFrame to a CSV file named `movies.csv` using `df.to_csv()`, making the data available for future use or analysis.

In [2]:
response = requests.get('https://api.themoviedb.org/3/movie/top_rated?api_key=8265bd1679663a7ea12ac168da84d2e8&language=en-US&page=1')
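The URL above assembles the query string by hand and embeds the API key in the notebook. Below is a minimal sketch of the same request built with the `params=` argument of `requests`, which URL-encodes the query for you, with the key read from an environment variable. The variable name `TMDB_API_KEY` and the helper `build_top_rated_request` are hypothetical. The sketch only prepares the request (no network call) so the resulting URL can be inspected.

```python
import os

import requests

BASE_URL = 'https://api.themoviedb.org/3/movie/top_rated'


def build_top_rated_request(page: int, api_key: str) -> requests.PreparedRequest:
    """Prepare (but do not send) a GET request for one page of top-rated movies.

    Hypothetical helper: requests encodes the params dict into the
    query string, in the dict's insertion order.
    """
    params = {'api_key': api_key, 'language': 'en-US', 'page': page}
    return requests.Request('GET', BASE_URL, params=params).prepare()


# Assumption: the key is stored in an environment variable named TMDB_API_KEY
api_key = os.environ.get('TMDB_API_KEY', 'YOUR_KEY')
prepared = build_top_rated_request(1, api_key)
print(prepared.url)
```

To actually send the request, the same dict can be passed straight to `requests.get(BASE_URL, params=..., timeout=10)`. The `timeout` keeps a stalled connection from hanging the notebook.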

In [3]:
df = pd.DataFrame(response.json()['results'])[['id','title','overview','release_date','popularity','vote_average','vote_count']]
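Selecting columns with `[[...]]` raises a `KeyError` if a page ever comes back with an empty `results` list, because `pd.DataFrame([])` has no columns at all. Below is a defensive variant that uses `DataFrame.reindex(columns=...)`. It keeps the requested columns in order and fills any missing one with NaN. `results_to_frame` is a hypothetical helper. The sample record reuses values from the Shawshank Redemption row shown in this notebook's output, with the overview left truncated. The extra `original_language` field is an assumed example of a key the selection discards.

```python
import pandas as pd

# Same columns the notebook keeps
COLUMNS = ['id', 'title', 'overview', 'release_date',
           'popularity', 'vote_average', 'vote_count']


def results_to_frame(payload: dict) -> pd.DataFrame:
    """Turn one API page (a dict with a 'results' list) into a DataFrame.

    reindex(columns=...) keeps only COLUMNS, in that order, and adds any
    missing column filled with NaN instead of raising KeyError.
    """
    return pd.DataFrame(payload.get('results', [])).reindex(columns=COLUMNS)


# Illustrative payload shaped like one page of the API response
sample = {'results': [{
    'id': 278, 'title': 'The Shawshank Redemption',
    'overview': 'Imprisoned in the 1940s for the double murder ...',
    'release_date': '1994-09-23', 'popularity': 52.1244,
    'vote_average': 8.714, 'vote_count': 29409,
    'original_language': 'en',  # extra field, dropped by reindex
}]}

print(results_to_frame(sample).shape)           # (1, 7)
print(results_to_frame({'results': []}).shape)  # (0, 7)
```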

In [4]:
df.head()

| | id | title | overview | release_date | popularity | vote_average | vote_count |
|---|---|---|---|---|---|---|---|
| 0 | 278 | The Shawshank Redemption | Imprisoned in the 1940s for the double murder ... | 1994-09-23 | 52.1244 | 8.714 | 29409 |
| 1 | 238 | The Godfather | Spanning the years 1945 to 1955, a chronicle o... | 1972-03-14 | 33.7437 | 8.685 | 22200 |
| 2 | 240 | The Godfather Part II | In the continuing saga of the Corleone crime f... | 1974-12-20 | 20.14 | 8.571 | 13424 |
| 3 | 424 | Schindler's List | The true story of how businessman Oskar Schind... | 1993-12-15 | 14.2693 | 8.565 | 16939 |
| 4 | 389 | 12 Angry Men | The defense and the prosecution have rested an... | 1957-04-10 | 10.4166 | 8.549 | 9609 |


In [5]:
df = pd.DataFrame()

In [7]:
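frames = []
for page in range(1, 429):  # range stops before 429, so this fetches pages 1..428
    response = requests.get('https://api.themoviedb.org/3/movie/top_rated?api_key=8265bd1679663a7ea12ac168da84d2e8&language=en-US&page={}'.format(page))
    response.raise_for_status()  # stop on HTTP errors instead of failing later on a missing 'results' key
    frames.append(pd.DataFrame(response.json()['results'])[['id','title','overview','release_date','popularity','vote_average','vote_count']])
df = pd.concat(frames, ignore_index=True)  # concatenate once, after all pages are collected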
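The loop hard-codes 428 pages. TMDb's paginated endpoints appear to report a `total_pages` field alongside `results` in each response. That is an assumption worth verifying against the current API reference. If it holds, the page count can be read from the first response instead. The sketch below takes a `fetch_page` function as a parameter (a hypothetical name) so the pagination logic can run without network access. It also gathers every page in a list and calls `pd.concat` once. This avoids re-copying an ever-growing DataFrame on each iteration.

```python
from typing import Callable

import pandas as pd

COLUMNS = ['id', 'title', 'overview', 'release_date',
           'popularity', 'vote_average', 'vote_count']


def fetch_all_pages(fetch_page: Callable[[int], dict]) -> pd.DataFrame:
    """Collect every page of a paginated endpoint into one DataFrame.

    fetch_page(n) must return the decoded JSON for page n; it is assumed
    to contain a 'results' list and a 'total_pages' integer.
    """
    first = fetch_page(1)
    frames = [pd.DataFrame(first['results'])]
    for page in range(2, first['total_pages'] + 1):
        frames.append(pd.DataFrame(fetch_page(page)['results']))
    # One concat at the end instead of one per loop iteration
    return pd.concat(frames, ignore_index=True).reindex(columns=COLUMNS)


# Offline stand-in for the API: 3 pages of 2 fake records each
def fake_fetch(page: int) -> dict:
    rows = [{'id': page * 10 + k, 'title': f'Movie {page}-{k}'} for k in range(2)]
    return {'page': page, 'results': rows, 'total_pages': 3}


df_all = fetch_all_pages(fake_fetch)
print(df_all.shape)  # (6, 7)
```

In the notebook, `fetch_page` could wrap the `requests.get` call used above and return `response.json()`.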


In [8]:
df

| | id | title | overview | release_date | popularity | vote_average | vote_count |
|---|---|---|---|---|---|---|---|
| 0 | 278 | The Shawshank Redemption | Imprisoned in the 1940s for the double murder ... | 1994-09-23 | 52.1244 | 8.714 | 29409 |
| 1 | 238 | The Godfather | Spanning the years 1945 to 1955, a chronicle o... | 1972-03-14 | 33.7437 | 8.685 | 22200 |
| 2 | 240 | The Godfather Part II | In the continuing saga of the Corleone crime f... | 1974-12-20 | 20.1400 | 8.571 | 13424 |
| 3 | 424 | Schindler's List | The true story of how businessman Oskar Schind... | 1993-12-15 | 14.2693 | 8.565 | 16939 |
| 4 | 389 | 12 Angry Men | The defense and the prosecution have rested an... | 1957-04-10 | 10.4166 | 8.549 | 9609 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 8555 | 609681 | The Marvels | When her duties send her to an anomalous wormh... | 2023-11-08 | 17.3203 | 6.000 | 3192 |
| 8556 | 352492 | XOXO | XOXO follows six strangers whose lives collide... | 2016-08-26 | 4.5611 | 5.956 | 596 |
| 8557 | 228973 | Backcountry | A couple on a deep-wilderness hike become hope... | 2015-03-20 | 3.4224 | 5.956 | 707 |
| 8558 | 91314 | Transformers: Age of Extinction | As humanity picks up the pieces after the batt... | 2014-06-25 | 21.1305 | 5.956 | 8550 |


In [9]:
df.shape

(8560, 7)
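`df.shape` confirms 8,560 rows (428 pages × 20 results), but the row count alone does not guarantee 8,560 distinct movies. The ranking can change while the loop is running, so a movie may shift across a page boundary and be fetched twice, or skipped. A quick check is to compare the row count with the number of unique `id` values and drop repeats with `drop_duplicates`. The sketch uses a small illustrative frame built from ids in the output above, with one row deliberately repeated.

```python
import pandas as pd

# Illustrative frame standing in for the notebook's df; id 238 appears twice
movies = pd.DataFrame({'id': [278, 238, 240, 238],
                       'title': ['The Shawshank Redemption', 'The Godfather',
                                 'The Godfather Part II', 'The Godfather']})

n_dupes = movies['id'].duplicated().sum()
print(f'{len(movies)} rows, {movies["id"].nunique()} unique ids, {n_dupes} duplicates')

# Keep the first occurrence of each id and renumber the index
deduped = movies.drop_duplicates(subset='id').reset_index(drop=True)
print(deduped.shape)  # (3, 2)
```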

In [10]:
df.to_csv('movies.csv', index=False)  # index=False: don't write the 0..n-1 row index as an extra column
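By default `to_csv` also writes the DataFrame's row index as an unnamed first column. Reading the file back with `pd.read_csv` then yields a column called `Unnamed: 0`. The sketch below round-trips a small frame through an in-memory CSV to show the difference. Passing `index=False` when writing, or `index_col=0` when reading, avoids the extra column.

```python
import io

import pandas as pd

movies = pd.DataFrame({'id': [278, 238],
                       'title': ['The Shawshank Redemption', 'The Godfather']})

# Default: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(movies.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'id', 'title']

# index=False: only the data columns are written
without_index = pd.read_csv(io.StringIO(movies.to_csv(index=False)))
print(list(without_index.columns))  # ['id', 'title']
```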

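### You can build your own datasets from APIs like this one and publish them on Kaggle

- https://rapidapi.com/hub (a marketplace for finding public APIs)
- https://www.kaggle.com/ (for publishing and sharing datasets)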