API for streaming platforms

Data Engineering

Welcome!

We have 4 datasets from well-known streaming platforms: Amazon Prime, Disney+, Hulu, and Netflix. These datasets come in files with different extensions (csv and json), and our task is to ingest them, apply the relevant transformations, and then make the clean data available for querying through an API. The API is built inside a Docker container and deployed on Mogenius.com.
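As a rough illustration of the ingestion step, the sketch below loads the raw files with pandas and concatenates them into a single table. The file names and platform labels are assumptions for illustration, not the repository's exact ones:

```python
import pandas as pd

# Hypothetical file names -- the actual files in the 'Datasets'
# folder may be named differently.
frames = [
    pd.read_csv("Datasets/amazon_prime_titles.csv"),
    pd.read_csv("Datasets/disney_plus_titles.csv"),
    pd.read_csv("Datasets/hulu_titles.csv"),
    pd.read_json("Datasets/netflix_titles.json"),
]

# Tag each row with its platform before concatenating.
for df, platform in zip(frames, ["amazon", "disney", "hulu", "netflix"]):
    df["platform"] = platform

data = pd.concat(frames, ignore_index=True)

# Single processed file that the API will consume.
data.to_csv("Datasets/processed.csv", index=False)
```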


Links:

API on Mogenius: https://pi01-data05-prod-pi01-data05-z6994f.mo6.mogenius.io/ (Update: no longer available due to service changes)

Video presentation: https://www.youtube.com/watch?v=j5qsUhYff_0 (In Spanish)


Project Scope

For this project, we should be able to:

  • Ingest and normalize the data.
  • Relate and transform the datasets as needed to support the queries.
  • Create an API in a Docker environment.
  • Perform the requested queries, as outlined in the following section.
  • Create a demonstration video.
  • Deploy the API on Mogenius.

The queries to be performed are as follows (a pandas sketch of two of them appears after the list):

  • Maximum duration by type of content (movie/series), by platform, and by year. The request should be: get_max_duration(year, platform, [min or season])

  • Number of movies and series (counted separately) by platform. The request should be: get_count_platform(platform)

  • The platform on which a given genre appears most often, along with the number of occurrences. The request should be: get_listed_in('genre'). As an example, the genre 'comedy' should return a count of 2099 for the Amazon platform.

  • The actor who appears most frequently, by platform and year. The request should be: get_actor(platform, year)
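A minimal sketch of how two of these queries might be implemented with pandas is shown below. The column names ('platform', 'type', 'release_year', 'duration_int', 'duration_type', 'title') are assumptions about the processed CSV, not the notebook's exact schema:

```python
import pandas as pd

# Assumed schema of the processed CSV -- the file produced by the
# notebook's ETL step may use different column names.
data = pd.read_csv("Datasets/processed.csv")

def get_count_platform(platform: str) -> dict:
    # Count movies and series separately for one platform.
    subset = data[data["platform"] == platform]
    counts = subset["type"].value_counts()
    return {"movies": int(counts.get("Movie", 0)),
            "series": int(counts.get("TV Show", 0))}

def get_max_duration(year: int, platform: str, duration_type: str) -> str:
    # Title with the longest duration ("min" or "season")
    # for the given year and platform.
    subset = data[(data["release_year"] == year)
                  & (data["platform"] == platform)
                  & (data["duration_type"] == duration_type)]
    return subset.loc[subset["duration_int"].idxmax(), "title"]
```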


Files within the Repository

Within this repository, we have a series of files and folders that we will briefly outline.

  • 'Processes.ipynb' is a Jupyter notebook containing the core of the project. The first part ingests, explores, and transforms the data as needed (ETL); its final output is a '.csv' file with the processed information that the API will consume. The second part develops the functions the API will use to answer the queries.

  • 'Dockerfile' contains the instructions needed to build the container image, including the base image to build on, in our case a FastAPI-ready image.

  • 'Datasets' folder includes 5 files. Four of them are the data sources provided for this project. The fifth is the '.csv' produced by the ETL process in the notebook, which the API reads to answer the queries.

  • 'app' folder contains 'main.py', the Python file with everything needed to set up the API: FastAPI is instantiated, the API's homepage is defined, and the query code (brought over from the notebook) is included. A minimal sketch of such a file follows this list.
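As a rough sketch only (the route names, CSV path, and column names are assumptions, not the repository's exact code), 'main.py' might look something like this:

```python
from fastapi import FastAPI
import pandas as pd

app = FastAPI(title="API for streaming platforms")

# Processed CSV produced by the notebook's ETL step
# (the path and column names are assumptions).
data = pd.read_csv("Datasets/processed.csv")

@app.get("/")
def home():
    # Homepage for the API.
    return {"message": "Welcome! See /docs for the available queries."}

@app.get("/get_count_platform/{platform}")
def get_count_platform(platform: str):
    # Movies and series counted separately for one platform.
    subset = data[data["platform"] == platform]
    return {"platform": platform,
            "counts": subset["type"].value_counts().to_dict()}
```

Run locally with 'uvicorn app.main:app --reload'; FastAPI then serves interactive documentation for all routes at /docs.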

Technologies Used

  • Python
  • Docker
  • Python libraries: pandas, FastAPI
  • Mogenius.com
  • GitHub

About

#ETL | #Python | #Pandas | #FastAPI | #Docker | #API
