# SpotiOrganizer

This project tackles the problem of having too many songs in your Liked Songs by grouping them into playlists with clustering algorithms such as K-means.

To do so, it analyzes each song's genre along with other metadata describing its audio characteristics.
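Because the grouping step relies on K-means, here is a minimal NumPy sketch of Lloyd's algorithm to make the idea concrete. It alternates two steps: assign each song to its nearest centroid, then move each centroid to the mean of its assigned songs. This is an illustration, not this project's implementation. The function name `kmeans` and its parameters are hypothetical, and scikit-learn's `KMeans` is the usual production choice.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal K-means (Lloyd's algorithm); returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points (kept as-is if empty)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids
```

K-means uses Euclidean distance, so features on very different scales (for example `loudness` in dB or `tempo` in BPM next to 0-to-1 features like `energy`) should be standardized before clustering.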

## Getting the user's "Liked Songs" list

### Connecting with Spotipy

Accessing this information requires creating credentials in the "Spotify Developer Dashboard" and configuring the `.env` file so that authentication succeeds. The redirect URL does not need to point to a working server, but it must match the one registered in the dashboard exactly.
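Before authenticating, it can help to confirm that `.env` defines the three variables the authentication cell reads. Here is a minimal sketch. `missing_spotify_vars` and `REQUIRED_VARS` are hypothetical helper names, the values in the comment are placeholders, and the check only sees the variables after `load_dotenv` has run.

```python
import os

# Example .env (placeholder values, not real credentials):
# SPOTIPY_CLIENT_ID=your-client-id
# SPOTIPY_CLIENT_SECRET=your-client-secret
# SPOTIPY_REDIRECT_URI=http://127.0.0.1:8888/callback

REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")


def missing_spotify_vars(env=None):
    """Return the required Spotipy variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

In the notebook, calling `missing_spotify_vars()` right after `load_dotenv(".env")` should return an empty list. Any names it returns are missing from the file.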

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json

In [None]:
from dotenv import load_dotenv
import os
import spotipy
from spotipy.oauth2 import SpotifyOAuth


load_dotenv(".env")

scope = 'user-library-read,playlist-modify-public,playlist-modify-private'

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id=os.getenv('SPOTIPY_CLIENT_ID'),
    client_secret=os.getenv('SPOTIPY_CLIENT_SECRET'),
    redirect_uri=os.getenv('SPOTIPY_REDIRECT_URI'),
    scope=scope
))


### Getting the source list
Here we fetch the user's "Liked Songs" list so that it can be analyzed.

In [None]:
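try:
    # limit=50 is the maximum page size for this endpoint (the default is 20)
    results = sp.current_user_saved_tracks(limit=50)
except Exception as e:
    # Without a first page there is nothing to paginate, so stop here
    raise RuntimeError(f"Error fetching the user's saved tracks: {e}") from e

tracks = results['items']

# Follow the paging object's 'next' link until every page has been read
while results['next']:
    results = sp.next(results)
    tracks.extend(results['items'])

print(f'Total tracks: {len(tracks)}')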


Total tracks: 571
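The loop above follows Spotify's paging objects. Each page carries an `items` list and a `next` link, and `next` is `None` on the last page. The same pattern can be factored into a reusable helper and tested with fake pages; the name `collect_all_items` is hypothetical.

```python
def collect_all_items(first_page, fetch_next):
    """Accumulate 'items' from a Spotify-style paging object, following 'next' until it is None.

    `fetch_next` receives the current page and returns the following one
    (in the notebook this would be `sp.next`).
    """
    items = list(first_page["items"])  # copy so the first page is not mutated
    page = first_page
    while page.get("next"):
        page = fetch_next(page)
        items.extend(page["items"])
    return items
```

With a live client, this would read `tracks = collect_all_items(sp.current_user_saved_tracks(limit=50), sp.next)`.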


In [None]:
print(tracks[0]['track']['name'])

Santiago De Chile


### Data Enrichment

We will build a DataFrame containing each song's information, including its ID, name, and artist.

Because a song's genre is not always available, we impute it from the song's first artist, taking the first genre Spotify lists for that artist.
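The imputation rule can be isolated in a small function: take the first genre listed for the track's first artist, or `None` when that artist has no genres. `impute_genre` is a hypothetical name. Like the rule described here, it does not fall back to the track's other artists.

```python
def impute_genre(track, artist_genres):
    """Return the first genre of the track's first artist, or None.

    `artist_genres` maps artist ID -> list of genres (like `artist_genre_cache`).
    """
    artists = track.get("artists") or []
    if not artists:
        return None
    genres = artist_genres.get(artists[0]["id"], [])
    return genres[0] if genres else None
```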

We will also extract the audio features for a more in-depth analysis.



In [None]:

track_data = []

audio_feature_numerical_keys = [
    'danceability', 'energy', 'loudness', 'speechiness', 'acousticness',
    'instrumentalness', 'liveness', 'valence', 'tempo',
]
# tempo (BPM) is continuous, so it belongs with the numerical features
audio_feature_categorical_keys = ['time_signature', 'key', 'mode']

artist_genre_cache = {}  # hash map to reduce API calls

def get_artist_genre(artist_id):
    if artist_id in artist_genre_cache:
        return artist_genre_cache[artist_id]
    
    artist_info = sp.artist(artist_id)
    genres = artist_info.get('genres', [])
    artist_genre_cache[artist_id] = genres
    return genres

def load_artist_genres(load_json):
    artist_genre_cache = {}
    if load_json:
        try:
            with open('artist_genre_cache.json', 'r') as f:
                artist_genre_cache = json.load(f) 
        except Exception as e:
            print(f'Error loading the JSON file: {e}')
    return artist_genre_cache

def save_artist_genres(artist_genre_cache, save_json=False):
    if save_json:
        try:
            with open('artist_genre_cache.json', 'w') as f:
                json.dump(artist_genre_cache, f)
        except Exception as e:
            print(f'Error saving the JSON file: {e}')

# Process in batches: sp.audio_features accepts at most 100 track IDs per request
max_batch_size = 100
artist_genre_cache = load_artist_genres(load_json=False)

for i in range(0, len(tracks), max_batch_size):
    try:
        batch = tracks[i:i + max_batch_size]  # slicing already stops at the end of the list

        track_ids = [item['track']['id'] for item in batch]
        audio_features = sp.audio_features(track_ids)  # one request per batch

        for j, item in enumerate(batch):
            track = item['track']
            artist_id = track['artists'][0]['id']
            track_id = track['id']

            genres = get_artist_genre(artist_id)
            genre = genres[0] if genres else None

            # The API returns None for tracks without audio features; an empty dict keeps .get() working
            audio_feature = audio_features[j] or {}

            track_info = {
                'id': track_id,
                'name': track['name'],
                'artist': track['artists'][0]['name'],
                'genre': genre
            }

            for key in audio_feature_numerical_keys:
                track_info[key] = audio_feature.get(key)

            for key in audio_feature_categorical_keys:
                track_info[key] = audio_feature.get(key)

            track_data.append(track_info)
    except Exception as e:
        # TODO: retry failed batches instead of skipping them
        print(f"Error processing tracks [{i} - {min(i + max_batch_size, len(tracks))}]: {e}")

save_artist_genres(artist_genre_cache, save_json=True)
df = pd.DataFrame(track_data)
df.head()


| | id | name | artist | genre | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature | key | mode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3D4iziSjHi4jxG8r4pta7z | Santiago De Chile | Los Bunkers | chilean rock | 0.558 | 0.801 | -5.796 | 0.0489 | 0.000126 | 5e-05 | 0.176 | 0.542 | 120.09 | 4 | 2 | 0 |
| 1 | 19CmuECYssqkPWANF4nLWM | Para No Verte Más | La Mosca Tse-Tse | latin alternative | 0.658 | 0.815 | -6.834 | 0.105 | 0.0168 | 0.0 | 0.341 | 0.837 | 99.276 | 4 | 0 | 1 |
| 2 | 2LsbizbOeNa4x6qsi2jAMb | La flaca | Jarabe De Palo | latin alternative | 0.871 | 0.481 | -9.068 | 0.0383 | 0.52 | 5.5e-05 | 0.0809 | 0.664 | 103.063 | 4 | 7 | 1 |
| 3 | 6hBfF54vgIAdtb5kbbvt75 | Nada Nuevo Bajo El Sol | Los Bunkers | chilean rock | 0.483 | 0.891 | -4.813 | 0.0304 | 0.0693 | 0.00122 | 0.0826 | 0.82 | 136.703 | 4 | 6 | 0 |
| 4 | 1qgtSWA2GbpAXfyC51M1ph | Por Ti | 4x3 | latin viral rap | 0.761 | 0.535 | -7.665 | 0.06 | 0.882 | 0.0 | 0.183 | 0.899 | 90.965 | 4 | 8 | 1 |
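The cell above calls `sp.artist` once for each uncached artist. Spotify's "Get Several Artists" endpoint, exposed in spotipy as `sp.artists`, accepts up to 50 IDs per request, so the cache could be pre-filled in bulk. The sketch below works under that assumption. `fetch_genres_batched` is a hypothetical helper, and the fetch function is passed in as a parameter so the batching logic can be tested without the API.

```python
def fetch_genres_batched(artist_ids, fetch_artists, cache, batch_size=50):
    """Fill `cache` (artist ID -> list of genres) for IDs that are not cached yet.

    `fetch_artists` takes a list of up to `batch_size` IDs and returns a dict
    with an "artists" list, which is the shape spotipy's `sp.artists` returns.
    """
    # dict.fromkeys drops duplicate IDs while preserving their order
    pending = [a for a in dict.fromkeys(artist_ids) if a not in cache]
    for start in range(0, len(pending), batch_size):
        response = fetch_artists(pending[start:start + batch_size])
        for artist in response["artists"]:
            if artist is not None:  # defensive: skip null entries
                cache[artist["id"]] = artist.get("genres", [])
    return cache
```

Running `fetch_genres_batched([item['track']['artists'][0]['id'] for item in tracks], sp.artists, artist_genre_cache)` before the batch loop would let `get_artist_genre` answer every lookup from the cache.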


In [None]:
# Save to a CSV file
df.to_csv('spotify_likes.csv', index=False)
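Passing `index=False` matters when the file is read back. If the index is written, `pandas.read_csv` loads it as an extra column named `Unnamed: 0`. The self-contained check below uses a hypothetical two-row frame, named `sample` so that it does not overwrite `df`.

```python
import io

import pandas as pd

sample = pd.DataFrame({"id": ["a", "b"], "energy": [0.8, 0.5]})

# to_csv() with no path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(sample.to_csv()))  # index written as an unnamed first column
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'id', 'energy']
print(list(without_index.columns))  # ['id', 'energy']
```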