## GitHub Classroom
GitHub project repository: https://github.com/cs418-fa24/project-check-in-team-11

## Project Introduction
Our project aims to understand Spotify's song classification and recommendation algorithm and to see whether it can be accurately recreated. By gathering songs and their audio-feature statistics from Spotify, we will determine which aspects of a song Spotify relies on most when assigning it a mood. We will then evaluate whether an overall mood can be accurately determined from a user's liked songs library.

## Scope Adjustments
We originally wanted to recreate Spotify Wrapped. However, that scope was too large, and it is geared more toward building a listening profile from non-song data such as listening history, time of day, and artist preference. We narrowed the scope to song-related data: track features such as tempo, loudness, energy, and danceability.

## Data Collection and Cleaning

### Retrieve Liked Songs

In [5]:
import json
import os

import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Read credentials from the environment instead of hard-coding them in the notebook
CLIENT_ID = os.environ['SPOTIPY_CLIENT_ID']
CLIENT_SECRET = os.environ['SPOTIPY_CLIENT_SECRET']
REDIRECT_URI = 'http://localhost:8888/callback'

moods = {
    'HAPPY': '37i9dQZF1EVJSvZp5AOML2',
    'SAD': '37i9dQZF1EIh4v230xvJvd',
    'CHILL': '37i9dQZF1EIdNTvkcjcOzJ',
    'ENERGETIC': '37i9dQZF1EIcVD7Tg8a0MY'
}

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    redirect_uri=REDIRECT_URI,
    scope="playlist-read-private user-library-read"  # now accessing private user playlists
))

# Reuse the authenticated client; a second SpotifyOAuth instance is not needed
sp1 = sp

# Get the user's liked songs
results = sp.current_user_saved_tracks()
liked_songs = []

while results:
    for item in results['items']:
        track = item['track']
        features = sp1.audio_features(track['id'])[0]
        # Some tracks have no audio features; skip them instead of failing
        if features is None:
            continue
        liked_songs.append({
            'name': track['name'],
            'id': track['id'],
            'acousticness': features['acousticness'],
            'danceability': features['danceability'],
            'duration_ms': features['duration_ms'],
            'energy': features['energy'],
            'instrumentalness': features['instrumentalness'],
            'key': features['key'],
            'liveness': features['liveness'],
            'loudness': features['loudness'],
            'mode': features['mode'],
            'speechiness': features['speechiness'],
            'tempo': features['tempo'],
            'time_signature': features['time_signature'],
            'valence': features['valence']
        })

    results = sp.next(results)

#TODO rename the file so that it does not overwrite anyone else's
with open('/Users/conrad/dev-school/418/final-project/raw/liked_songs_1.json', 'w') as json_file:
    json.dump(liked_songs, json_file, indent=4)


OSError: [Errno 48] Address already in use
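
The loop above requests audio features one track at a time. As a minimal sketch, assuming the audio-features endpoint accepts up to 100 track IDs per call and returns results in request order (with `None` for tracks that have no features), the requests could be batched. The names `chunked` and `fetch_features_batched` are hypothetical, and `fetch` stands in for `sp.audio_features` so the sketch runs without network access.

```python
from typing import Callable, Iterator


def chunked(ids: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_features_batched(
    fetch: Callable[[list], list],
    track_ids: list,
    batch_size: int = 100,  # assumed per-request limit of the endpoint
) -> dict:
    """Map track ID -> feature dict, skipping tracks with no features."""
    features_by_id = {}
    for batch in chunked(track_ids, batch_size):
        for track_id, features in zip(batch, fetch(batch)):
            if features is not None:
                features_by_id[track_id] = features
    return features_by_id


def fake_fetch(batch: list) -> list:
    # Stand-in for sp.audio_features: pretend track "t0" has no features.
    return [None if tid == 't0' else {'id': tid, 'tempo': 120.0} for tid in batch]


track_ids = [f't{i}' for i in range(250)]
features_by_id = fetch_features_batched(fake_fetch, track_ids)
print(len(features_by_id))  # 249: every track except "t0"
```

With a real client, `fetch_features_batched(sp.audio_features, ids)` would issue three requests for 250 tracks instead of 250.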

### Retrieve Spotify-generated Playlists for Each Mood (happy, sad, energetic, chill)

In [None]:
### Steps to get playlists ready to pull
# 1.) Find your mix playlists for each mood (happy, sad, energetic, chill)
# 2.) Click on the "..." and add to another playlist and create a new one. Spotify will create a default name "<mood> Mix (2)"
# 3.) Once you repeat this for all the moods, you are ready to use this script

import json
import os

import spotipy
from spotipy.oauth2 import SpotifyOAuth

# TODO insert info same as library.py...
# Read credentials from the environment instead of hard-coding them in the notebook
CLIENT_ID = os.environ['SPOTIPY_CLIENT_ID']
CLIENT_SECRET = os.environ['SPOTIPY_CLIENT_SECRET']
REDIRECT_URI = 'http://localhost:8888/callback'

moods = {
    'HAPPY': '37i9dQZF1EVJSvZp5AOML2',
    'SAD': '37i9dQZF1EIh4v230xvJvd',
    'CHILL': '37i9dQZF1EIdNTvkcjcOzJ',
    'ENERGETIC': '37i9dQZF1EIcVD7Tg8a0MY'
}

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    redirect_uri=REDIRECT_URI,
    scope="playlist-read-private"  # now accessing private user playlists
))

# Reuse the authenticated client; a second SpotifyOAuth instance is not needed
sp1 = sp

for mood, p_id in moods.items():
    results = sp.playlist_items(p_id)
    tracks = []

    while results:
        for item in results['items']:
            track = item['track']
            features = sp1.audio_features(track['id'])[0]

            if features is None:
                continue

            tracks.append({
                'name': track['name'],
                'id': track['id'],
                'acousticness': features['acousticness'],
                'danceability': features['danceability'],
                'duration_ms': features['duration_ms'],
                'energy': features['energy'],
                'instrumentalness': features['instrumentalness'],
                'key': features['key'],
                'liveness': features['liveness'],
                'loudness': features['loudness'],
                'mode': features['mode'],
                'speechiness': features['speechiness'],
                'tempo': features['tempo'],
                'time_signature': features['time_signature'],
                'valence': features['valence']
            })
        # get next set of tracks
        results = sp.next(results)

    # report once per mood, after all pages have been read
    print(f"{mood} complete")

    #TODO make sure to enter the number corresponding to your data
    num = 1
    # the with block closes the file, so no explicit close() is needed
    with open(f'spotify_{mood.lower()}_{num}.json', 'w') as file:
        json.dump(tracks, file, indent=4)


### Import into Pandas Dataframe

In [2]:
import json
import pandas as pd

moods = ['happy', 'sad', 'chill', 'energetic']
dfs = []
for mood in moods:
    files = [
        f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_1.json',
        # f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_2.json',
        f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_3.json',
        f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_4.json',
        # f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_5.json',
        # f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_6.json',
    ]

    for file in files:
        with open(file, 'r') as fileio:
            df = pd.DataFrame(json.load(fileio))
            df['mood'] = mood
            dfs.append(df)


final_df = pd.concat(dfs, ignore_index=True)

#processing
drop = ['name', 'id']
final_df = final_df.drop(columns=drop)



FileNotFoundError: [Errno 2] No such file or directory: '../raw/spotify_happy_1.json'
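
The cell above lists file paths by hand, and the error shows what happens when one of them is missing. One alternative, sketched here under the assumption that the files follow the `spotify_<mood>_<n>.json` naming used above and sit in a single directory, is to discover them with `pathlib` and tag each record with its mood. The function name `load_mood_records` and the `raw_dir` parameter are hypothetical, and the demo writes throwaway files rather than touching the real data.

```python
import json
import tempfile
from pathlib import Path


def load_mood_records(raw_dir: Path, moods: list) -> list:
    """Collect track records from every spotify_<mood>_<n>.json file in raw_dir."""
    records = []
    for mood in moods:
        # sorted() keeps the load order stable across runs
        for path in sorted(raw_dir.glob(f'spotify_{mood}_*.json')):
            with path.open('r') as fileio:
                for track in json.load(fileio):
                    records.append({**track, 'mood': mood})
    return records


# Demo with temporary files so the sketch runs without the real data
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp)
    (raw_dir / 'spotify_happy_1.json').write_text(
        json.dumps([{'name': 'a', 'valence': 0.9}]))
    (raw_dir / 'spotify_sad_3.json').write_text(
        json.dumps([{'name': 'b', 'valence': 0.1}]))
    records = load_mood_records(raw_dir, ['happy', 'sad', 'chill', 'energetic'])

print([(r['name'], r['mood']) for r in records])  # [('a', 'happy'), ('b', 'sad')]
```

Passing `records` to `pd.DataFrame` would give a frame with the same columns as `final_df` before `name` and `id` are dropped. Note that this picks up every matching file, whereas the cell above deliberately comments some of them out.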

## Data Exploration
Explain what your data looks like (words are fine, but visualizations are often better). Include any interesting issues or preliminary conclusions you have about your data.

## Data Visualization
Include a visualization that tests an interesting hypothesis, along with an explanation of why you thought this was an interesting hypothesis to investigate.

## ML Data Analysis

### Model Training



In [7]:
import json

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
# only needed if the commented-out label encoding below is re-enabled
from sklearn.preprocessing import LabelEncoder

moods = ['happy', 'sad', 'chill', 'energetic']
dfs = []
for mood in moods:
    files = [
        f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_1.json',
        # f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_2.json',
        f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_3.json',
        f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_4.json',
        f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_5.json',
        # f'/Users/conrad/dev-school/418/final-project/raw/spotify_{mood}_6.json',
    ]

    for file in files:
        with open(file, 'r') as fileio:
            df = pd.DataFrame(json.load(fileio))
            df['mood'] = mood
            dfs.append(df)

final_df = pd.concat(dfs, ignore_index=True)

#processing
drop = ['name', 'id']
final_df = final_df.drop(columns=drop)

final_df.to_csv('training.csv')

#splitting
X = final_df.iloc[:, 0:13]
Y = final_df.iloc[:, 13]
# encoder = LabelEncoder()
# y = encoder.fit_transform(Y)
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=.2, random_state=1)

# k-fold
k = 10
forest = RandomForestClassifier(random_state=2)
scores = cross_val_score(forest, xtrain, ytrain, cv=k)
print('CV scores', scores)
print('Mean CV scores', np.mean(scores))

# single
forest.fit(xtrain, ytrain)
print('Fit Score', forest.score(xtest, ytest))


CV scores [0.65625    0.796875   0.65625    0.734375   0.609375   0.859375
 0.609375   0.65625    0.66666667 0.66666667]
Mean CV scores 0.6911458333333333
Fit Score 0.7125
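
Since the project goal is to find which aspects of a song matter most for its mood, one way to inspect a fitted forest is its `feature_importances_` attribute. The sketch below is illustrative only: the data is synthetic and the label depends on the first column by construction, so the numbers say nothing about the real Spotify data. The names `X_demo`, `y_demo`, and `demo_forest` are hypothetical; with the notebook's objects one would read `forest.feature_importances_` alongside `X.columns`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: three made-up feature columns, 300 rows
rng = np.random.default_rng(0)
feature_names = ['valence', 'energy', 'tempo']
X_demo = rng.random((300, 3))
# By construction, the label depends only on the first column
y_demo = np.where(X_demo[:, 0] > 0.5, 'happy', 'sad')

demo_forest = RandomForestClassifier(random_state=2)
demo_forest.fit(X_demo, y_demo)

# Pair each feature with its impurity-based importance, highest first
ranked = sorted(
    zip(feature_names, demo_forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f'{name}: {importance:.3f}')
```

Impurity-based importances sum to 1 across features, so they read as relative shares rather than absolute effect sizes.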


## Analysis

The output below shows, for each person's liked songs library, the percentage of tracks that the random forest classified as each mood.
Talk a little bit more about the individual outputs.

In [8]:
#analysis

import json
import pandas as pd

dfs = []

files = [
    '/Users/conrad/dev-school/418/final-project/raw/liked_songs_1.json',
    # '/Users/conrad/dev-school/418/final-project/raw/liked_songs_2.json',
    '/Users/conrad/dev-school/418/final-project/raw/liked_songs_3.json',
    '/Users/conrad/dev-school/418/final-project/raw/liked_songs_4.json',
    # '/Users/conrad/dev-school/418/final-project/raw/liked_songs_5.json',
    # '/Users/conrad/dev-school/418/final-project/raw/liked_songs_6.json',
]

for file in files:
    with open(file, 'r') as fileio:
        df = pd.DataFrame(json.load(fileio))
        dfs.append(df)

#processing
drop = ['name', 'id']
# drop the non-feature columns from every library, however many were loaded
dfs = [df.drop(columns=drop) for df in dfs]

# predict a mood for every liked song in each person's library
predictions = [forest.predict(df) for df in dfs]

for person, prediction in enumerate(predictions, start=1):
    print(f'Person {person}')
    labels = prediction.tolist()
    # percentage of the library assigned to each mood
    for mood in ['happy', 'sad', 'chill', 'energetic']:
        print(mood.capitalize(), (labels.count(mood) / len(labels)) * 100)
    print()

Person 1
Happy 26.87007874015748
Sad 13.385826771653544
Chill 37.99212598425197
Energetic 21.751968503937007

Person 2
Happy 38.88888888888889
Sad 12.962962962962962
Chill 7.4074074074074066
Energetic 40.74074074074074

Person 3
Happy 14.583333333333334
Sad 62.5
Chill 2.083333333333333
Energetic 20.833333333333336
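
To turn per-mood percentages like these into a single overall mood per person, one simple option is to take the most frequent predicted label. This is an assumption for illustration, not something the analysis above already does. The sketch uses `collections.Counter`; the function name `mood_summary` and the example labels are hypothetical.

```python
from collections import Counter


def mood_summary(labels: list) -> tuple:
    """Return (percentage per mood, most frequent mood) for predicted labels."""
    counts = Counter(labels)
    total = len(labels)
    percentages = {mood: count / total * 100 for mood, count in counts.items()}
    dominant = counts.most_common(1)[0][0]
    return percentages, dominant


# Made-up predictions for a small library of 8 tracks
example = ['sad'] * 5 + ['happy'] * 2 + ['energetic']
percentages, dominant = mood_summary(example)
print(percentages)  # {'sad': 62.5, 'happy': 25.0, 'energetic': 12.5}
print(dominant)     # sad
```

With the notebook's objects, `mood_summary(prediction.tolist())` would be called once per person. A single label can hide a mixed profile: Person 2 above is split almost evenly between Happy (about 38.9%) and Energetic (about 40.7%).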



## Progress reflection
- What is the hardest part of the project that you’ve encountered so far?
- What are your initial insights?
- Are there any concrete results you can show at this point? If not, why not?
- Going forward, what are the current biggest problems you’re facing?
- Do you think you are on track with your project? If not, what parts do you need to dedicate more time to?
- Given your initial exploration of the data, is it worth proceeding with your project, why? If not, how are you going to change your project and why do you think it’s better than your current results?


## Roles and Coordination
Finding data sources and cleaning:


Statistical analysis: 


Visualization: 


Machine Learning Applications: 

 
What deadlines should various components of the project be completed by?

## Next Steps
What you plan to accomplish in the next month and how you plan to evaluate whether your project achieved the goals you set for it.