In this project, I scraped nba.com using Selenium and Beautiful Soup, and used the extracted data to build a remote PostgreSQL database of NBA players on elephantsql.com. I also deployed the project's data product, an Apache Superset dashboard, on Heroku for public consumption.

Chizzy-codes/NBA_Players_Project_Fullstack_Datascience

NBA_Players_Project_Fullstack_Datascience

A full-stack, end-to-end data science project implemented with data I got from the NBA website.

Project Aim

The aim of this project was to reduce the time it takes for an NBA fan or a curious individual to find useful facts about the currently active players in the NBA.

DATA PIPELINE

The data pipeline was built with Apache Airflow (https://airflow.apache.org/) and scheduled to run weekly. After each successful run, the data warehouse is updated with the most recent data from the data lake.

Project Workflow

The first component of the workflow was data collection. I used the NBA website as my data source and gathered data on all active players via web scraping (using requests, beautifulsoup4 and selenium). This includes each player's name, team, position, date of birth, nationality, last attended school, height, weight, age, current basketball stats for the ongoing season, etc.

The second component inserts this data into a sqlite3 database file and stores it in a data lake, which in this case is a simple file folder.
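As a rough illustration of this step, the sketch below writes one scrape run to a new timestamped SQLite file inside a folder acting as the data lake. The function name `write_snapshot`, the `players` table, its columns, and the file naming scheme are all assumptions made for the example and are not taken from the project's code.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def write_snapshot(players, lake_dir):
    """Write one scrape run to a new timestamped SQLite file; return its path.

    `players` is a list of dicts with the (hypothetical) keys used below.
    """
    lake = Path(lake_dir)
    lake.mkdir(parents=True, exist_ok=True)

    # One file per run, so earlier snapshots in the lake are never overwritten.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    db_path = lake / f"players_{stamp}.db"

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE players ("
                "name TEXT, team TEXT, position TEXT, "
                "height_m REAL, weight_kg REAL)"
            )
            conn.executemany(
                "INSERT INTO players VALUES "
                "(:name, :team, :position, :height_m, :weight_kg)",
                players,
            )
    finally:
        conn.close()
    return db_path
```

Writing a fresh file per run keeps every weekly snapshot available, which is one simple way to let a later step pick "the most recent entry" from the folder.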

The third component extracts the most recent entry from the data lake and performs a custom ETL (Extract, Transform, Load) process and feature engineering on the data. The processed data is then stored in, or used to update, my data warehouse, a remote PostgreSQL database instance on elephantsql.com.
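The extract and transform parts of this component could look roughly like the sketch below. It assumes each data lake entry is a `.db` file containing a `players` table with `height_m` and `weight_kg` columns, and it uses body mass index as a stand-in for feature engineering. Those names and the chosen feature are hypothetical, not the project's actual schema. The load step into PostgreSQL is left out because it depends on connection details and a database driver such as psycopg2.

```python
import sqlite3
from pathlib import Path


def latest_snapshot(lake_dir):
    """Return the most recently modified .db file in the lake, or None."""
    files = sorted(Path(lake_dir).glob("*.db"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def add_bmi(row):
    """Illustrative engineered feature: body mass index (kg / m^2)."""
    out = dict(row)
    out["bmi"] = round(row["weight_kg"] / row["height_m"] ** 2, 1)
    return out


def extract_and_transform(lake_dir):
    """Read every player row from the newest snapshot and add the new feature."""
    path = latest_snapshot(lake_dir)
    if path is None:
        return []
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    try:
        rows = conn.execute("SELECT * FROM players").fetchall()
    finally:
        conn.close()
    return [add_bmi(dict(r)) for r in rows]
```

The returned list of dicts is what a load step would then write to the warehouse, for example with one parameterized INSERT or UPSERT per row.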

Machine Learning

I also developed an outlier detection machine learning model to identify outlier players present in the database. I was also able to predict the NBA's top players: of my top three predictions, two players, Nikola Jokic and Giannis Antetokounmpo, took home awards at the end of the season. (Jupyter Notebook nbviewer link: https://nbviewer.jupyter.org/github/Chizzy-codes/NBA_Players_Project_Fullstack_Datascience/blob/master/jupyter_notebook/project_notebook.ipynb)
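To give a feel for what outlier detection on a single player statistic means, here is a minimal sketch using a plain z-score rule. This is a deliberately simple stand-in chosen for illustration and is not necessarily the model used in the notebook. The function name `zscore_outliers` and the default threshold are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values more than `threshold` standard
    deviations away from the mean of `values`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

For example, calling it on a list of points-per-game values would return the positions of players whose scoring is far from the league average. A rule like this looks at one statistic at a time, whereas a model trained on several features can flag players who are unusual only in combination.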

DATA PRODUCT

The final product was an Apache Superset dashboard, which I published on Heroku. The dashboard provides users with an intuitive, interactive and simple one-stop shop for finding the most important information on the currently active players in the NBA. Check it out here: https://nba-superset.herokuapp.com/superset/dashboard/4/

Achievement

As a result of completing this project, I ended up writing an article, published online, that serves as a guide on how to install Apache Superset on Heroku.

ENJOY!
