# Improved Recommendations Notebook with SpotifyRecommender
This notebook shows how to use the improved `SpotifyRecommender` class to generate album recommendations from multiple features, including audio features and metadata, with optional PCA.

## 1. Load the recommender class

In [1]:
# 0. Make sure the working directory is the repository root
import os
import sys
from pathlib import Path

# If the notebook runs from /notebooks, go up one level; for other layouts, adjust with .parents[n].
# Checking for src first keeps re-runs of this cell from climbing past the root.
cwd = Path.cwd().resolve()
project_root = cwd if (cwd / "src").is_dir() else cwd.parent
os.chdir(project_root)
print("Current working directory:", os.getcwd())

# Optional: add src to sys.path so the package can be imported
sys.path.insert(0, str(project_root / "src"))


Current working directory: E:\CDD1\SongReccomender\SongReccomender
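Going up a fixed number of levels only works for one directory layout. A minimal sketch of a more robust alternative, assuming the repository root is the first ancestor that contains a `src` directory (both the marker and the helper name `find_project_root` are hypothetical, not part of the project):

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "src") -> Path:
    """Walk up from `start` (default: the cwd) until a directory containing `marker` is found."""
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {current} contains a '{marker}' directory")


# Usage (hypothetical): os.chdir(find_project_root())
```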


In [2]:
import sys
from pathlib import Path

# Add the src folder to sys.path (redundant if cell 1 already did it, but harmless)
sys.path.insert(0, str(Path('src').resolve()))

# Import the class
from songrecommender.recommender.model import SpotifyRecommender

## 2. Instantiate the recommender
- `data_path`: path to the enriched parquet file (default: `data/processed/spotify_enriched.parquet`)
- `n_neighbors`: number of neighbors for KNN
- `use_pca`: enable PCA dimensionality reduction
- `pca_n_components`: fraction of variance to retain when `use_pca` is True (e.g. `0.9` for 90%)
- `exclude_same_artist`: exclude recommendations by the same artist as the seed album
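These parameters correspond to a common scikit-learn recipe: standardize the features, optionally reduce them with PCA, then fit a nearest-neighbors index. The sketch below shows that recipe on synthetic data; it assumes, without confirmation from this notebook, that `SpotifyRecommender` works roughly this way, and `build_knn` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_knn(X, n_neighbors=10, use_pca=True, pca_n_components=0.9):
    """Return (transformer, knn): standardize, optionally apply PCA, then fit KNN."""
    steps = [StandardScaler()]
    if use_pca:
        # A float in (0, 1) keeps the smallest number of components whose
        # cumulative explained variance exceeds that fraction.
        steps.append(PCA(n_components=pca_n_components))
    transformer = make_pipeline(*steps)
    Z = transformer.fit_transform(X)
    knn = NearestNeighbors(n_neighbors=n_neighbors).fit(Z)
    return transformer, knn


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))  # 200 synthetic "albums" with 8 numeric features
transformer, knn = build_knn(X, n_neighbors=10)
dists, inds = knn.kneighbors(transformer.transform(X[:1]))
print(inds.shape)  # (1, 10)
```

Querying with a row that is already in the index returns that row itself first, which is why the neighbor analysis later in this notebook requests one extra neighbor.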

In [3]:
# Instantiate with PCA enabled and 10 neighbors
recommender = SpotifyRecommender(
    data_path='data/processed/spotify_fully_enriched.parquet',
    n_neighbors=10,
    use_pca=True,
    pca_n_components=0.9,
    exclude_same_artist=True
)
print("Recommender initialized successfully")

⚠️ Ignoring nonexistent features: {'acousticness', 'instrumentalness', 'loudness', 'tempo', 'duration_ms', 'speechiness'}
Recommender initialized successfully
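The warning indicates that `acousticness`, `instrumentalness`, `loudness`, `tempo`, `duration_ms` and `speechiness` are not columns of this parquet file, so apparently only the remaining features (such as `avg_danceability`, `avg_energy` and `avg_valence`) are used. A minimal sketch of that kind of check, assuming features are requested as a list of column names; `select_existing_features` is a hypothetical helper, not the class's actual code:

```python
import warnings

import pandas as pd


def select_existing_features(df: pd.DataFrame, requested: list[str]) -> list[str]:
    """Keep only the requested features that exist as columns, warning about the rest."""
    missing = set(requested) - set(df.columns)
    if missing:
        warnings.warn(f"Ignoring nonexistent features: {sorted(missing)}")
    # Preserve the caller's order for the features that do exist
    return [f for f in requested if f in df.columns]


df = pd.DataFrame({"avg_danceability": [0.6], "avg_energy": [0.7]})
print(select_existing_features(df, ["avg_danceability", "tempo", "avg_energy"]))
# ['avg_danceability', 'avg_energy']
```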


## 3. List available genres

In [4]:
# Show the unique genres in the dataset
genres = recommender.list_genres()
print("Available genres:", genres)

Available genres: ['classical', 'dance', 'edm', 'hip hop', 'indie', 'jazz', 'kpop', 'pop', 'r&b', 'reggaeton', 'rock']


## 4. Recommendations by genre
Get the top albums within a genre, sorted by total streams.

In [5]:
# Example: top 5 pop albums
top_pop = recommender.recommend_by_genre('pop', n=5)
top_pop

|     | artist_name   | album_name          | genre | total_streams |
|-----|---------------|---------------------|-------|---------------|
| 290 | The Weeknd    | After Hours         | pop   | 4958.08       |
| 477 | Billie Eilish | Happier Than Ever   | pop   | 4894.01       |
| 264 | Billie Eilish | Happier Than Ever   | pop   | 4889.6        |
| 479 | BLACKPINK     | BORN PINK           | pop   | 4854.16       |
| 364 | Karol G       | MAÑANA SERÁ BONITO  | pop   | 4785.67       |
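*Happier Than Ever* appears twice in this result, apparently because the dataset holds more than one row for that album. A minimal pandas sketch of a genre top-N that can also collapse such duplicates, assuming the columns shown above; `top_albums_by_genre` is a hypothetical helper, not the class's `recommend_by_genre`, and the toy frame reuses rows from the output above:

```python
import pandas as pd


def top_albums_by_genre(df: pd.DataFrame, genre: str, n: int = 5, dedupe: bool = True) -> pd.DataFrame:
    """Return the n albums of `genre` with the most total_streams."""
    subset = df[df["genre"].str.lower() == genre.lower()]
    subset = subset.sort_values("total_streams", ascending=False)
    if dedupe:
        # After sorting, keep only the highest-streaming row per (artist, album) pair
        subset = subset.drop_duplicates(subset=["artist_name", "album_name"], keep="first")
    return subset.head(n)[["artist_name", "album_name", "genre", "total_streams"]]


toy = pd.DataFrame({
    "artist_name": ["The Weeknd", "Billie Eilish", "Billie Eilish", "BLACKPINK"],
    "album_name": ["After Hours", "Happier Than Ever", "Happier Than Ever", "BORN PINK"],
    "genre": ["pop", "pop", "pop", "pop"],
    "total_streams": [4958.08, 4894.01, 4889.6, 4854.16],
})
print(top_albums_by_genre(toy, "pop", n=3)["album_name"].tolist())
# ['After Hours', 'Happier Than Ever', 'BORN PINK']
```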


## 5. Recommendations based on a specific album
Get albums similar to the given one.

In [6]:
# Example: recommendations for 'After Hours' by 'The Weeknd'
recommendations = recommender.recommend_songs('After Hours', 'The Weeknd', n_recommendations=5)
recommendations



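|     | artist_name   | album_name        | genre | avg_danceability | avg_energy | avg_valence | avg_daily_streams | total_streams | album_total_tracks | album_popularity | artist_followers | artist_popularity | days_since_release |
|-----|---------------|-------------------|-------|------------------|------------|-------------|-------------------|---------------|--------------------|------------------|------------------|-------------------|--------------------|
| 494 | BTS           | Proof             | kpop  | 0.681488         | 0.679631   | 0.611396    | 581724.2          | 4804.15       | 35                 |                  | 78952292         | 88                | 1069               |
| 3   | Ed Sheeran    | Autumn Variations | kpop  | 0.725783         | 0.70039    | 0.577577    | 1249782.0         | 2704.33       | 14                 |                  | 120387762        | 90                | 593                |
| 4   | Ed Sheeran    | Autumn Variations | r&b   | 0.725783         | 0.70039    | 0.577577    | 1249782.0         | 3323.25       | 14                 |                  | 120387762        | 90                | 593                |
| 5   | Billie Eilish | Happier Than Ever | pop   | 0.55562          | 0.313521   | 0.347781    | 1327730.0         | 1087.06       | 16                 |                  | 111682208        | 95                | 1384               |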
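Only four rows come back although `n_recommendations=5` was requested. One plausible explanation, consistent with the neighbor table at the end of this notebook but not confirmed from the source of `SpotifyRecommender`: the model retrieves a fixed number of neighbors first and only then drops rows by the seed artist (here, a duplicate *After Hours* row). A minimal sketch that widens the query until enough rows survive the filter, using a plain scikit-learn `NearestNeighbors` model; `similar_albums` is a hypothetical helper:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def similar_albums(df, Z, knn, seed_pos, n=5, exclude_same_artist=True):
    """Return up to n rows of df nearest to row `seed_pos`.

    Assumes df rows are positionally aligned with the matrix Z used to fit knn.
    The query is widened (doubling k) until enough rows survive the filters.
    """
    artists = df["artist_name"].to_numpy()
    seed_artist = artists[seed_pos]
    k = min(len(df), n + 1)
    while True:
        _, inds = knn.kneighbors(Z[seed_pos:seed_pos + 1], n_neighbors=k)
        picked = [
            i for i in inds[0]
            if i != seed_pos
            and not (exclude_same_artist and artists[i] == seed_artist)
        ]
        if len(picked) >= n or k == len(df):
            return df.iloc[picked[:n]]
        k = min(len(df), 2 * k)


# Toy example: two albums by "A", one each by "B", "C" and "D"
toy = pd.DataFrame({"artist_name": ["A", "A", "B", "C", "D"]})
Z = np.array([[0.0], [0.1], [0.2], [0.3], [5.0]])
knn = NearestNeighbors().fit(Z)
print(similar_albums(toy, Z, knn, seed_pos=0, n=2)["artist_name"].tolist())  # ['B', 'C']
```

Doubling `k` keeps the number of queries logarithmic in the dataset size while still guaranteeing `n` results whenever enough eligible albums exist.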


## 6. Further tuning
- You can change `n_neighbors` or disable `use_pca`.
- To see how much variance each PCA component retains, inspect `recommender.pca.explained_variance_ratio_`.

In [7]:
# Inspect the variance explained by each PCA component
if recommender.use_pca:
    print("Explained variance ratio per PCA component:", recommender.pca.explained_variance_ratio_)

Explained variance ratio per PCA component: [0.99936136]
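A single component explaining about 99.9% of the variance usually means one feature dominates numerically. In the tables above, `artist_followers` is on the order of 10^8 while the audio features lie between 0 and 1. The comment in the next cell describes the query vector as already scaled, so this is not a confirmed diagnosis; the synthetic sketch below only illustrates how strongly feature scale drives PCA by comparing raw and standardized inputs:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(0, 1, n),        # danceability-like feature in [0, 1]
    rng.uniform(0, 1, n),        # energy-like feature
    rng.uniform(0, 1, n),        # valence-like feature
    rng.uniform(1e6, 1.2e8, n),  # follower-count-like feature, ~1e8 scale
])

raw_ratio = PCA().fit(X).explained_variance_ratio_
scaled_ratio = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_

print("raw:   ", np.round(raw_ratio, 4))     # first component takes almost all the variance
print("scaled:", np.round(scaled_ratio, 4))  # variance is spread across the components
```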


In [8]:
import pandas as pd

# 1) Find the index of After Hours
seed_artist, seed_album = 'The Weeknd', 'After Hours'
mask = (
    recommender.df['artist_name'].str.lower() == seed_artist.lower()
) & (
    recommender.df['album_name'].str.lower() == seed_album.lower()
)
idx = recommender.df[mask].index[0]

# 2) Prepare the query vector (already imputed and scaled/PCA-transformed)
vec = recommender._prepare_vector(idx)

# 3) Ask for K + 1 neighbors (the extra one is the seed itself)
dists, inds = recommender.model.kneighbors(vec, n_neighbors=6)
# dists[0][0] is 0 (the seed album); dists[0][1:] are the relevant distances.
# If the seed row is duplicated in the data, the copy also shows up here at distance 0.
neighbor_idxs = inds[0][1:]
neighbor_dists = dists[0][1:]

# 4) Extract the raw (unscaled) features of the seed and its neighbors
# (assumes df has a default RangeIndex, so model positions match .loc labels)
seed_feats = recommender.df.loc[idx, recommender.all_features]
neigh_feats = recommender.df.loc[neighbor_idxs, recommender.all_features]

# 5) Build a comparison DataFrame
comp = pd.DataFrame(
    [seed_feats.values] + neigh_feats.values.tolist(),
    columns=recommender.all_features
)
comp['distance'] = [0.0] + neighbor_dists.tolist()
comp['artist_name'] = [seed_artist] + recommender.df.loc[neighbor_idxs, 'artist_name'].tolist()
comp['album_name'] = [seed_album] + recommender.df.loc[neighbor_idxs, 'album_name'].tolist()

# 6) Display sorted by distance
comp = comp.set_index(['artist_name', 'album_name'])
comp.sort_values('distance')




| artist_name | album_name | avg_danceability | avg_energy | avg_valence | avg_daily_streams | total_streams | album_total_tracks | album_popularity | artist_followers | artist_popularity | days_since_release | ... | genre_edm | genre_hip hop | genre_indie | genre_jazz | genre_kpop | genre_other | genre_pop | genre_reggaeton | genre_rock | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| The Weeknd | After Hours | 0.608774 | 0.735933 | 0.483503 | 726968.8 | 2828.16 | 14.0 |  | 104849813.0 | 97.0 | 1881.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| The Weeknd | After Hours | 0.608774 | 0.735933 | 0.483503 | 726968.8 | 2828.16 | 14.0 |  | 104849813.0 | 97.0 | 1881.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| BTS | Proof | 0.681488 | 0.679631 | 0.611396 | 581724.2 | 4804.15 | 35.0 |  | 78952292.0 | 88.0 | 1069.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Ed Sheeran | Autumn Variations | 0.725783 | 0.70039 | 0.577577 | 1249782.0 | 2704.33 | 14.0 |  | 120387762.0 | 90.0 | 593.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Ed Sheeran | Autumn Variations | 0.725783 | 0.70039 | 0.577577 | 1249782.0 | 3323.25 | 14.0 |  | 120387762.0 | 90.0 | 593.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Billie Eilish | Happier Than Ever | 0.55562 | 0.313521 | 0.347781 | 1327730.0 | 1087.06 | 16.0 |  | 111682208.0 | 95.0 | 1384.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
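Every `distance` in this table is 0.0, even between different albums. One possible explanation, assuming the model uses cosine distance (the notebook does not show which metric it uses): once PCA keeps a single component, each album becomes a scalar, and the cosine distance between two scalars of the same sign is always 0, so all same-sign albums tie and the neighbor order no longer reflects similarity. A minimal scikit-learn sketch of that effect on hypothetical one-dimensional data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical 1-D embeddings, e.g. the output of a single PCA component
Z = np.array([[1.0], [2.0], [50.0], [-3.0]])

cosine = NearestNeighbors(n_neighbors=4, metric="cosine").fit(Z)
euclid = NearestNeighbors(n_neighbors=4, metric="euclidean").fit(Z)

cos_d, _ = cosine.kneighbors(Z[:1])
euc_d, _ = euclid.kneighbors(Z[:1])

print(np.sort(cos_d[0]).tolist())  # [0.0, 0.0, 0.0, 2.0]: same-sign points all tie at 0
print(np.sort(euc_d[0]).tolist())  # distances grow with the gap in magnitude
```

If this is the cause, keeping more PCA components (or standardizing before PCA) would give the cosine metric enough dimensions to separate albums.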
