Analysis of Spotify music statistics and characteristics for the most popular tracks, artists, and albums from 2017-2021, as determined by Spotify Top playlists and Billboard year-end charts.

katieravenwood/Spotify-Five-Year-Analysis-Project

Spotify x Billboard Top 20 Albums Five Year Analysis Project

Analysis of Spotify music statistics and characteristics for the most popular tracks, artists, and albums from 2017-2021, as determined by Billboard year-end Top Album charts.

Completed in fulfillment of the Final Project requirement of the Entity Academy / Woz-U Data Science curriculum.

A collaborative project of Bianca Serrano and Katie Ravenwood.

Phase 1: Dataset Creation

(Completed 27 December 2021)

The dataset was created via Python using Spotify's public API and playlists created based on the Billboard Top 200 Albums charts for 2017-2021.

Spotify Playlists:

Billboard 200 Top Albums 2017
Billboard 200 Top Albums 2018
Billboard 200 Top Albums 2019
Billboard 200 Top Albums 2020
Billboard 200 Top Albums 2021

Included variables:

  • Billboard Album Chart Name & Year
  • Album ID, Name, Release Year
  • Album artists' names, IDs, popularity, and associated genres
  • Track artists' names, IDs, popularity, and associated genres
  • Explicit designation
  • Audio features
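
As a rough illustration of how one playlist entry could be turned into a row with some of the variables above, here is a minimal sketch. It assumes each item is shaped like an entry in the playlist-items response of Spotify's Web API (a `track` object with nested `album` and `artists`). The function name `flatten_track`, the output column names, and the sample item are hypothetical and are not taken from the project notebooks. Artist popularity, genres, and audio features would need separate lookups and are left out here.

```python
from typing import Any


def flatten_track(item: dict[str, Any], chart_name: str, chart_year: int) -> dict[str, Any]:
    """Flatten one playlist item into a single flat row (hypothetical column names)."""
    track = item["track"]
    album = track["album"]
    return {
        "chart_name": chart_name,
        "chart_year": chart_year,
        "album_id": album["id"],
        "album_name": album["name"],
        # release_date may be "YYYY", "YYYY-MM", or "YYYY-MM-DD"; keep only the year
        "album_release_year": int(album["release_date"][:4]),
        "track_id": track["id"],
        "track_name": track["name"],
        "track_artist_ids": [a["id"] for a in track["artists"]],
        "track_artist_names": [a["name"] for a in track["artists"]],
        "explicit": track["explicit"],
    }


# Made-up sample item for illustration only (not real Spotify data).
sample_item = {
    "track": {
        "id": "track-001",
        "name": "Example Song",
        "explicit": False,
        "artists": [
            {"id": "artist-001", "name": "Example Artist"},
            {"id": "artist-002", "name": "Featured Artist"},
        ],
        "album": {
            "id": "album-001",
            "name": "Example Album",
            "release_date": "2017-03-24",
        },
    }
}

row = flatten_track(sample_item, "Billboard 200 Top Albums 2017", 2017)
print(row)
```

Keeping the flattening step separate from the API calls makes it possible to check the row layout against saved responses without a network connection.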

Dataset creation notebooks:

Master Chart Table Creation
All Album Track Table Creation

Phase 2: Data Wrangling, Cleaning and Recoding

(Completed 11 January 2022)

Data was cleaned and recoded for analysis and machine learning predictions.

Data Wrangling and Cleaning Notebook:

Wrangling Cleaning and Recoding

Phase 3: Exploratory Analysis

(Completed 17 January 2022)

Exploratory analyses included visualization and standardization of audio feature variables, as well as correlation analysis and plotting.
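
The standardization and correlation steps can be sketched in plain Python as follows. This is an illustrative sketch rather than the notebook's code: the helper names `zscores` and `pearson_r` are hypothetical, the feature values are made up, and the use of the population standard deviation is an assumption.

```python
import math


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1 (population SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation, computed as the mean product of paired z-scores."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Made-up audio feature values for four tracks (illustration only).
energy = [0.2, 0.4, 0.6, 0.8]
loudness = [-12.0, -9.0, -6.0, -3.0]

print(zscores(energy))
print(pearson_r(energy, loudness))
```

Standardizing first puts features with different scales, such as loudness in decibels and energy on a 0 to 1 scale, on a common footing before they are compared or plotted together.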

Exploratory Analysis Notebook and RScript:

Exploratory Analysis Notebook
Exploratory Analysis RScript

Phase 4: Data Analysis and Machine Learning

(Completed 23 January 2022)

Linear regression and dependent t-tests were used to analyze the relationships between several audio features, and McNemar's chi-square test was used to determine whether genre presence changed significantly over the dataset time frame. Tracks were clustered using the k-means method. Classification of track genres was tested with k-nearest neighbors and random forest algorithms.
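
For readers unfamiliar with McNemar's test, the sketch below shows the calculation for a paired 2x2 table. It is not the project's code: the function name `mcnemar_chi2` and the counts are hypothetical, and applying the continuity correction by default is an assumption. Here `b` and `c` are the two discordant counts, for example pairs where a genre was present in the earlier period but absent in the later one, and the reverse. How the project formed its pairs is not stated here.

```python
import math


def mcnemar_chi2(b: int, c: int, correction: bool = True):
    """Return (chi-square statistic, p-value) for McNemar's test with 1 degree of freedom."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if correction:
        # continuity correction: shrink the absolute difference by 1
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up discordant counts (illustration only).
stat, p_value = mcnemar_chi2(b=15, c=5)
print(stat, p_value)
```

Only the discordant pairs enter the statistic, because pairs that did not change carry no information about the direction of change.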

Data Analysis and Machine Learning Notebook and RScript:

Data Analysis and Machine Learning Notebook
Data Analysis RScript

Visualization

(Completed 1 February 2022)

Visualizations were created in Python and R for analyses and models.

Presentation

(Presented 2 February 2022)

The project was presented via Zoom to Woz U / Entity Academy faculty and students for internal review on 2 February 2022.

View the project presentation video on Vimeo:
Good Vibrations? Spotify x Billboard Top 200 Albums Five-Year Analysis Project
