# COGS 118A - Project Checkpoint

# Names

- Peter Barnett
- William Lutz
- Ricardo Sedano

# Abstract 
This section should be short and clearly stated. It should be a single paragraph <200 words.  It should summarize: 
- what your goal/problem is
- what the data used represents and how they are measured
- what you will be doing with the data
- how performance/success will be measured

# Background

Fill in the background and discuss the kind of prior work that has gone on in this research area here. **Use inline citation** to specify which references support which statements. You can do that through HTML footnotes (demonstrated here). I used to recommend Markdown footnotes (Google is your friend) because they are simpler, but recently I have had some problems getting them to work, whereas HTML ones have always worked so far. So use the method that works for you, but do use inline citations.

For everyday music listeners, music consumption may not be influenced by outside factors, such as the listening activity of their social network, and may instead be driven autonomously by the listener's own taste in music. This autonomy is recognized and valued by some groups, such as school students, who amplify their experiences through music and personal autonomy <a name="green"></a>[<sup>[1]</sup>](#greennote).

 

# Problem Statement

Clearly describe the problem that you are solving. Avoid ambiguous words. The problem described should be well defined and should have at least one ML-relevant potential solution. Additionally, describe the problem thoroughly such that it is clear that the problem is quantifiable (the problem can be expressed in mathematical or logical terms), measurable (the problem can be measured by some metric and clearly observed), and replicable (the problem can be reproduced and occurs more than once).

# Data

UPDATED FROM PROPOSAL!

You should have obtained and cleaned (if necessary) data you will use for this project.

Please give the following information for each dataset you are using:
- link/reference to obtain it
- description of the size of the dataset (# of variables, # of observations)
- what an observation consists of
- what some critical variables are, how they are represented
- any special handling, transformations, cleaning, etc you have done should be demonstrated here!
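
As a rough illustration of the last two items in the list above, the sketch below shows one way to turn audio-feature records into a table with named columns and report its size. The helper name `features_to_frame`, the `METADATA_COLUMNS` list, and the sample rows are all assumptions made for this example; the values are invented and are not real Spotify measurements.

```python
import pandas as pd

# Made-up rows that only mimic the shape of Spotify audio-feature records;
# the values are NOT real measurements.
sample_features = [
    {"danceability": 0.71, "energy": 0.62, "tempo": 118.0, "duration_ms": 201000,
     "type": "audio_features", "id": "track-a", "uri": "spotify:track:track-a"},
    {"danceability": 0.55, "energy": 0.80, "tempo": 140.0, "duration_ms": 185000,
     "type": "audio_features", "id": "track-b", "uri": "spotify:track:track-b"},
    None,  # stand-in for a track with no features available
]

# Identifier/metadata fields that carry no numeric signal for clustering.
METADATA_COLUMNS = ["type", "id", "uri", "track_href", "analysis_url"]


def features_to_frame(records):
    """Build a DataFrame of numeric features, one row per track."""
    rows = [r for r in records if r is not None]
    frame = pd.DataFrame(rows)
    # errors="ignore" tolerates metadata columns that are absent from the input.
    return frame.drop(columns=METADATA_COLUMNS, errors="ignore")


df = features_to_frame(sample_features)
print(f"{df.shape[0]} observations x {df.shape[1]} variables")
print(df.dtypes)
```

Keeping the column names, rather than converting each record to a bare list of values, makes it easier to describe the critical variables and to check that every row has the same features in the same order.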


# Proposed Solution

In this section, clearly describe a solution to the problem. The solution should be applicable to the project domain and appropriate for the dataset(s) or input(s) given. Provide enough detail (e.g., algorithmic description and/or theoretical properties) to convince us that your solution is applicable. Make sure to describe how the solution will be tested.  

If you know details already, describe how (e.g., library used, function calls) you plan to implement the solution in a way that is reproducible.

If it is appropriate to the problem statement, describe a benchmark model<a name="sota"></a>[<sup>[3]</sup>](#sotanote) against which your solution will be compared. 

# Evaluation Metrics

Propose at least one evaluation metric that can be used to quantify the performance of both the benchmark model and the solution model. The evaluation metric(s) you propose should be appropriate given the context of the data, the problem statement, and the intended solution. Describe how the evaluation metric(s) are derived and provide an example of their mathematical representations (if applicable). Complex evaluation metrics should be clearly defined and quantifiable (can be expressed in mathematical or logical terms).
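
As an example of the kind of mathematical representation this section asks for, and assuming the quality of the clustering itself is what gets evaluated (an assumption, since the metric has not been fixed in this checkpoint), one commonly used option is the silhouette coefficient. For a point $i$ assigned to cluster $C_I$, with $d(i, j)$ the distance between points $i$ and $j$:

$$
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\; j \neq i} d(i, j), \qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j), \qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\; b(i)\}}
$$

Here $a(i)$ is the mean distance from $i$ to the other points in its own cluster and $b(i)$ is the mean distance to the points of the nearest other cluster. By convention $s(i) = 0$ when $|C_I| = 1$. Each $s(i)$ lies in $[-1, 1]$, and the mean of $s(i)$ over all points summarizes a clustering, with higher values indicating tighter, better-separated clusters.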

# Preliminary results

!pip install spotipy

import os

import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

def get_features_from_playlist(playlist_name):

    #create authentication using Spotipy; credentials are read from the
    #SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables
    #so that the client secret is not stored in the notebook itself
    cid = os.environ['SPOTIPY_CLIENT_ID']
    secret = os.environ['SPOTIPY_CLIENT_SECRET']
    client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
    sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

    #search for the playlist by name/can change to be by specific spotify ID
    results = sp.search(q=playlist_name, type='playlist')

    #pull top playlist ID from the search results
    playlist_id = results['playlists']['items'][0]['id']

    #get the tracks from that playlist
    #(a single call returns one page of results, not necessarily the whole playlist)
    playlist_tracks = sp.playlist_tracks(playlist_id)

    #extract the track IDs from that playlist, skipping entries that have no track or no ID
    track_ids = [item['track']['id'] for item in playlist_tracks['items']
                 if item.get('track') and item['track'].get('id')]

    #retrieve the audio features for each track in the playlist, in batches of 50
    audio_features = []
    for i in range(0, len(track_ids), 50):
        features = sp.audio_features(track_ids[i:i+50])
        if features:
            audio_features.extend(features)

    #filter out irrelevant info from data (type, id, uri, track_href, analysis_url)
    #and skip tracks for which no features were returned (None entries)
    excluded_keys = ['type', 'id', 'uri', 'track_href', 'analysis_url']
    filtered_audio_features = []
    for feature in audio_features:
        if feature is None:
            continue
        filtered_feature = {key: value for key, value in feature.items() if key not in excluded_keys}
        filtered_audio_features.append(filtered_feature)

    #convert each feature dict to a plain list of values (one row per track)
    data = []
    for feature in filtered_audio_features:
        data.append(list(feature.values()))

    return data

#pull in data from 'Top Songs - Global' playlist
data = get_features_from_playlist('Top Songs - Global')

#kmeans of 5, we can tune later
kmeans = KMeans(n_clusters=5, random_state=0).fit(data)

#get the cluster for each song
clusters = kmeans.predict(data)

#randomly creating training and test sets
X_train, X_test, y_train, y_test = train_test_split(data, clusters, test_size=0.3, random_state=42)

#here are the different data sets
#X is the feature data and y is the corresponding grouping/cluster label
print("Training data:")
print(X_train)
print(y_train)
print("Test data:")
print(X_test)
print(y_test)
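
The code above fixes the number of clusters at 5 and notes that it can be tuned later. A minimal sketch of one way to do that tuning follows, assuming the silhouette score is used as the selection criterion and that features are standardized first; neither choice is settled in this checkpoint. The helper `choose_k` and the synthetic `demo_features` matrix are hypothetical stand-ins so the example runs without Spotify credentials.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(features, candidate_ks=range(2, 9), random_state=0):
    """Return (best_k, scores), where scores maps each k to its silhouette score."""
    # Standardize so that large-valued columns (e.g. a duration in
    # milliseconds) do not dominate the Euclidean distances used by k-means.
    scaled = StandardScaler().fit_transform(features)
    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the playlist feature matrix (NOT Spotify data):
# 200 points around 4 centers in 13 dimensions.
demo_features, _ = make_blobs(n_samples=200, n_features=13, centers=4,
                              cluster_std=1.0, random_state=0)
# Inflate one column to mimic a feature measured on a much larger scale.
demo_features[:, 0] *= 1000

best_k, scores = choose_k(demo_features)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("selected k:", best_k)
```

With the real playlist data, the list returned by `get_features_from_playlist` could be passed in place of `demo_features`. Standardizing matters here because the raw features mix values between 0 and 1 with durations in the hundreds of thousands.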

NEW SECTION!

Please show any preliminary results you have managed to obtain.

Examples would include:
- Analyzing the suitability of a dataset or algorithm for prediction/solving your problem
- Performing feature selection or hand-designing features from the raw data. Describe the features available/created and/or show the code for selection/creation
- Showing the performance of a base model/hyper-parameter setting.  Solve the task with one "default" algorithm and characterize the performance level of that base model.
- Learning curves or validation curves for a particular model
- Tables/graphs showing the performance of different models/hyper-parameters
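
As one illustration of the base-model item in the list above, the sketch below characterizes a default classifier against a trivial reference on a held-out split. It assumes, as the train/test split in the code cell suggests but this checkpoint does not state outright, that the eventual task is to predict a song's cluster from its features. The choice of k-nearest neighbours and the synthetic data are assumptions made only for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the audio-feature matrix (NOT Spotify data).
X, _ = make_blobs(n_samples=300, n_features=13, centers=5, random_state=0)

# Cluster labels play the role of the target, mirroring the notebook code.
y = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Reference point: always predict the most frequent training label.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
dummy_acc = accuracy_score(y_test, dummy.predict(X_test))

# "Default" model: k-nearest neighbours with scikit-learn's default settings.
knn = KNeighborsClassifier().fit(X_train, y_train)
knn_acc = accuracy_score(y_test, knn.predict(X_test))

print(f"most-frequent baseline accuracy: {dummy_acc:.3f}")
print(f"k-NN (default settings) accuracy: {knn_acc:.3f}")
```

Note that the cluster labels here are computed on all rows before splitting, as in the notebook code, so the held-out accuracy measures how well the classifier reproduces the clustering rather than how well it generalizes to unseen songs.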



# Ethics & Privacy

There are few obvious ethics and privacy concerns in our project. As described in the Data section of this checkpoint, we use a public playlist created by Spotify. However, if a user inputs a playlist associated with their own account, that user's playlist data (username, playlist creation date, audio, Spotify URI) would be seen and handled. The concern grows at a larger scale if users import other users' playlists, whose owners may not have consented to sharing their data for algorithmic purposes.


# Team Expectations 

Each project team member has agreed to and is expected to:
* Attend all-team meetings
* Remain attentive; communicate quickly and effectively
* Do work assigned to them
* Be respectful of each other's work
* Stay aware of deadlines and collaborate in favor of completing a comprehensive project

# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 2/22  |  6 PM |  Brainstorm topics (all); exchange contact information | Discuss project topic, ideal dataset(s), and ethics; Edit, finalize, and submit project proposal |
| 3/6  |  7 PM |  Import & Wrangle Data | Discuss Wrangling and possible analytical approaches; Finalize wrangling/EDA; Begin programming for project | 
| 3/20  | 7 PM  | Continue programming for project | Discuss/edit project code; Draft results/conclusion/discussion   |
| 3/22  | Before 11:59 PM  | Discuss/edit full project; Complete project | Turn in Final Project  |

# Footnotes
<a name="greennote"></a>1.[^](#green): Green, L. (2006) Popular music education in and for itself, and for ‘other’ music: current research in the classroom *The Institute of Education, University of London, UK*. https://journals.sagepub.com/doi/pdf/10.1177/0255761406065471?casa_token=-foOp6WBUzEAAAAA:0PWTgGnA4MxWmD3SJ2gvEpp42Cg5Tm4WQJ9gZGwnGdW2Vr_RPo7VbZS-3HAQKJvTz50Mivo3RMaC<br> 
<a name="admonishnote"></a>2.[^](#admonish): Also refs should be important to the background, not some randomly chosen vaguely related stuff. Include a web link if possible in refs as above.<br>
<a name="sotanote"></a>3.[^](#sota): Perhaps the current state of the art solution such as you see on [Papers with code](https://paperswithcode.com/sota). Or maybe not SOTA, but rather a standard textbook/Kaggle solution to this kind of problem
