Hidden Hits

Notebooks

Hidden Hits 1 - Data Collection (written by Mir Adnan Mahmood) contains the code used to generate a dataset of songs that appeared on Billboard's year-end Hot 100 lists from 2010 to 2020. The dataset contains the following features:
- Title, artists, year of appearance on the Hot 100, and rank on the Hot 100, scraped from Billboard's website
- Lyrics, scraped from Genius.com using each song's title and artist names

The final output of this notebook is the dataset hot_100_lyrics.csv.
Hidden Hits 2 - Spotify-Kaggle Dataset Clean (written by Mir Adnan Mahmood) contains code that extracts song features for songs on the Billboard year-end Hot 100 lists from the Kaggle dataset created by Yamac Eren Ay. The code first creates the subset of tracks that appear on the Hot 100, tracks_hot100.csv, and then supplements it with a second dataset, tracks_not100_sub.csv, which contains song features for songs that did not appear on the Hot 100.
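The split described above can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the column names (`title`, `artist`, `name`, `artists`), the `match_key` and `split_tracks` helpers, and the placeholder rows are all assumptions made for the example, and it uses only the standard library rather than whatever tooling the notebook relies on.

```python
import csv
import io
import re


def match_key(title, artist):
    """Build a case- and punctuation-insensitive key for joining the two sources."""
    def clean(text):
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return clean(title), clean(artist)


def split_tracks(track_rows, hot_rows):
    """Split track rows into (appears on the Hot 100, does not appear)."""
    hot_keys = {match_key(r["title"], r["artist"]) for r in hot_rows}
    hot, not_hot = [], []
    for row in track_rows:
        key = match_key(row["name"], row["artists"])
        (hot if key in hot_keys else not_hot).append(row)
    return hot, not_hot


# Placeholder data standing in for the real CSV files (hypothetical columns).
hot_csv = """title,artist,year,rank
Song A,Artist X,2015,1
Song B,Artist Y,2016,2
"""
tracks_csv = """name,artists,danceability,energy
SONG A,artist x,0.8,0.7
Song C,Artist Z,0.4,0.3
Song B!,Artist Y,0.6,0.9
"""

hot_rows = list(csv.DictReader(io.StringIO(hot_csv)))
track_rows = list(csv.DictReader(io.StringIO(tracks_csv)))
tracks_hot100, tracks_not100 = split_tracks(track_rows, hot_rows)

print([r["name"] for r in tracks_hot100])
print([r["name"] for r in tracks_not100])
```

Matching on a normalized (title, artist) pair is only one possible join strategy. Real artist fields often list several performers, so a production version would likely need looser matching than this sketch provides.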
Hidden Hits 3 - Hot 100 Classification (written by Khyati Malik and JongHoon Shin) contains the code for the classification model, a Random Forest Classifier that predicts whether a song is a Hot 100 or a Not 100 song. Results for different models and features are published in the NewUpdate folder. Note: This file is the same as the hot_100_classification_v1 notebook in the NewUpdate folder.
Hidden Hits 4 - TF-IDF Measure and Clustering (written by Mike Pierce and Ritwik Banerji) contains code that represents song lyrics numerically using the TF-IDF measure and then groups songs using KMeans clustering.
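To make the two steps concrete, here is a small standard-library sketch of TF-IDF weighting followed by k-means (Lloyd's algorithm). It is an illustration under assumptions, not the notebook's code: the smoothed IDF formula is one common variant, the centroids are seeded with the first k vectors for reproducibility, and the four "lyrics" are placeholder strings. The notebook itself may use a library implementation with different defaults.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors); each vector is an L2-normalized TF-IDF row."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of lyrics containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant; other definitions exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        vectors.append([x / norm for x in row])
    return vocab, vectors


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Placeholder lyrics, not taken from the dataset.
lyrics = [
    "love love heart tonight",
    "money cars money club",
    "heart love forever tonight",
    "club money party cars",
]
vocab, vectors = tfidf_vectors(lyrics)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Because the rows are L2-normalized, Euclidean distance between them orders pairs the same way cosine similarity does, which is why plain k-means is a reasonable fit for TF-IDF vectors.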

All old notebooks and data files are in the scraps folder.

Miscellaneous

The .gitignore file ensures that (1) we don't upload any .ipynb_checkpoints files, and (2) we don't upload a directory named data, where we keep our large datasets, because GitHub rejects files that are too large.

Contributors

Mir Adnan Mahmood — Data Collection and Cleaning
Ritwik Banerji — Unsupervised Learning: classification of songs via lyrics
Mike Pierce — Unsupervised Learning: classification of songs via lyrics
Khyati Malik — Supervised Learning: Classification of Hot-or-Not songs based on sonic features
JongHoon Shin — Supervised Learning: Classification of Hot-or-Not songs based on sonic features
Rabail Chandio — Post-processing and consolidation

This project was created as part of the Erdős Institute's Data Science Boot Camp 2021.

