GitHub - JamesScottCode/unsupervised-learning: Unsupervised Learning - Spotify Tracks

Spotify Data Analysis

Analyze 100,000+ Spotify tracks to find natural groupings and insights. This project uses unsupervised learning (K-means, DBSCAN, Agglomerative Clustering) and dimensionality reduction (PCA, t-SNE).

Overview

• Goal: Find hidden patterns in song features and identify how they cluster.
• Techniques:
  • Data cleaning and scaling
  • Exploratory Data Analysis (EDA)
  • PCA for dimensionality reduction
  • K-means, DBSCAN, and Agglomerative Clustering
  • Evaluation with Silhouette, Calinski-Harabasz, and Davies-Bouldin scores
  • t-SNE visualization
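The techniques listed above can be sketched end to end with scikit-learn. This is a minimal illustration, not the code in this repository: it uses synthetic data from `make_blobs` as a stand-in for the numeric track features, and the cluster counts, `eps`, `min_samples`, and the helper name `evaluate` are all assumptions chosen for the example.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric audio features (assumption: the real
# pipeline would read them from dataset.csv instead).
X, _ = make_blobs(n_samples=600, n_features=10, centers=4, random_state=0)

# Scale first so no single feature dominates the distance computations.
X_scaled = StandardScaler().fit_transform(X)

# Keep as many principal components as needed to explain 90% of the variance.
X_reduced = PCA(n_components=0.90, svd_solver="full").fit_transform(X_scaled)

# Hyperparameters below are illustrative guesses, not tuned values.
models = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    "dbscan": DBSCAN(eps=0.8, min_samples=5),
    "agglomerative": AgglomerativeClustering(n_clusters=4),
}


def evaluate(X, labels):
    """Return the three scores, or None if fewer than two clusters exist."""
    mask = labels != -1  # DBSCAN labels noise points as -1; drop them
    if len(set(labels[mask])) < 2:
        return None
    return {
        "silhouette": silhouette_score(X[mask], labels[mask]),
        "calinski_harabasz": calinski_harabasz_score(X[mask], labels[mask]),
        "davies_bouldin": davies_bouldin_score(X[mask], labels[mask]),
    }


results = {}
for name, model in models.items():
    labels = model.fit_predict(X_reduced)
    results[name] = evaluate(X_reduced, labels)

for name, scores in results.items():
    print(name, scores)
```

Higher Silhouette and Calinski-Harabasz scores and a lower Davies-Bouldin score indicate better-separated clusters. DBSCAN may return `None` here when the chosen `eps` yields fewer than two clusters.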

Project Structure

.
├── eda/                        # Plots and figures generated
├── dataset.csv                 # Spotify dataset
├── unsupervised_models.ipynb   # Main pipeline notebook
└── README.md

Key Insights

• Feature distributions vary (some near-normal, some heavily skewed).
• PCA shows that 7 components explain about 90% of the variance.
• The evaluation metrics point to about 8 clusters for K-means.
• DBSCAN is sensitive to its hyperparameters and often yields fewer insights.
• Agglomerative clustering and t-SNE help reveal distinct groups.
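Findings like the component count and the number of clusters are typically derived as in the sketch below: take the cumulative explained variance from PCA and find where it crosses 90%, then sweep k for K-means and compare a metric. The data is synthetic (`make_blobs`), so the numbers it produces will not match the 7 components and 8 clusters reported above; the k range of 2 to 10 is also an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled track features (assumption).
X, _ = make_blobs(n_samples=500, n_features=12, centers=5, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance ratios.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 90%.
n_components = int(np.searchsorted(cumulative, 0.90) + 1)
X_reduced = pca.transform(X_scaled)[:, :n_components]

# Sweep k and record the silhouette score for each candidate.
silhouette_by_k = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_reduced)
    silhouette_by_k[k] = silhouette_score(X_reduced, labels)

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(f"components for 90% variance: {n_components}")
print(f"k with the highest silhouette score: {best_k}")
```

In practice the choice of k is usually cross-checked against the Calinski-Harabasz and Davies-Bouldin scores as well, since the three metrics do not always agree.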

Future Directions

• Incorporate genre encoding for finer analysis.
• Improve hyperparameter tuning with automated methods.
• Use domain expertise to interpret clusters and possibly build a recommendation system.
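As one possible starting point for the genre-encoding idea, the sketch below one-hot encodes a genre column and joins it to the scaled numeric features. The column names `track_genre`, `danceability`, and `energy` and the four sample rows are hypothetical; the actual columns in dataset.csv may differ.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up frame; column names are assumptions, not taken from dataset.csv.
df = pd.DataFrame(
    {
        "danceability": [0.8, 0.3, 0.6, 0.4],
        "energy": [0.9, 0.2, 0.7, 0.5],
        "track_genre": ["pop", "classical", "pop", "jazz"],
    }
)

numeric_cols = ["danceability", "energy"]

# Standardize the numeric features, keeping the original row index.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df[numeric_cols]),
    columns=numeric_cols,
    index=df.index,
)

# One-hot encode the genre: one 0/1 column per distinct genre value.
genres = pd.get_dummies(df["track_genre"], prefix="genre", dtype=float)

# Combined feature matrix that a clustering model could consume.
features = pd.concat([scaled, genres], axis=1)
print(features)
```

With many distinct genres, one-hot columns can dominate distance-based clustering, so down-weighting them or grouping rare genres may be worth testing.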
