## **Part 5: Real User Evaluation**

## Steps Taken:
### Data Collection:

* **Spotify Integration:** We integrated the Spotify API to retrieve the user's listening history. This involved setting up OAuth2 authentication to access the user's recently played tracks and their metadata.
* **Data Extraction:** We extracted relevant information from the Spotify API, including the track name, artist, album, and the time the track was played.
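
The extraction step can be sketched without a network call, assuming a response shaped like the `items` list that the notebook code reads from. The helper name `extract_plays` and the sample payload below are illustrative only and are not part of the notebook.

```python
from typing import Any


def extract_plays(items: list[dict[str, Any]], user: str = "current_user") -> list[dict[str, str]]:
    """Flatten recently-played items into one record per play."""
    records = []
    for item in items:
        track = item["track"]
        records.append({
            "user": user,
            "song": track["name"],
            "artist": track["artists"][0]["name"],  # first credited artist only
            "album": track["album"]["name"],
            "played_at": item["played_at"],
        })
    return records


# Hypothetical payload with the same nesting the notebook code relies on.
sample_items = [
    {
        "played_at": "2024-01-01T10:00:00Z",
        "track": {
            "name": "Song A",
            "artists": [{"name": "Artist X"}, {"name": "Artist Y"}],
            "album": {"name": "Album 1"},
        },
    },
]

plays = extract_plays(sample_items)
print(plays[0]["song"], "-", plays[0]["artist"])
```

Because records are keyed by track name, two different songs that share a title would be treated as one item; that is a property of this sketch and of the notebook code alike.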

### Data Preparation:

* **Training Data:** We used the real user's listening history as the training dataset.
* **User and Song Mappings:** We created mappings for user IDs and song IDs to integer indices to facilitate matrix operations.
* **Mask Matrix Construction:** We constructed a mask matrix that marks known user-song interactions with 0 and all other pairs with 1, so that songs a user has already played are filtered out of the recommendations.
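
As a minimal sketch of the mapping and masking steps, the following builds both structures for a toy interaction log. The user and song names are made up, and the convention assumed here is 1 for "still a candidate" and 0 for "already played".

```python
import numpy as np

# Toy interaction log; user and song names are made up for illustration.
interactions = [("u1", "song_a"), ("u1", "song_b"), ("u2", "song_b"), ("u2", "song_c")]

# dict.fromkeys keeps first-seen order, so indices are stable across runs.
user_ids = list(dict.fromkeys(u for u, _ in interactions))
song_ids = list(dict.fromkeys(s for _, s in interactions))
user_to_index = {u: i for i, u in enumerate(user_ids)}
song_to_index = {s: i for i, s in enumerate(song_ids)}

# 1 = still a candidate, 0 = already played by that user.
mask = np.ones((len(user_ids), len(song_ids)))
for u, s in interactions:
    mask[user_to_index[u], song_to_index[s]] = 0

print(mask)
```

Multiplying a score matrix element-wise by `mask` zeroes out every pair the user has already interacted with.
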
### Model Building:

* **SVD Application:** We applied SVD to the user-item interaction matrix to reduce dimensionality and uncover latent features representing user preferences and item attributes.
* **Score Calculation:** We calculated predicted scores for each user-song pair using the SVD components.
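
The factorization and scoring steps can be illustrated on a small dense matrix. This sketch uses `numpy.linalg.svd` and keeps the top `k` factors, rather than the sparse solver; the matrix values and the choice `k = 2` are assumptions made for the example.

```python
import numpy as np

# Toy 3-user x 4-song play-count matrix (values are made up).
R = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
])

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(R, full_matrices=False)  # singular values in descending order
user_features = U[:, :k] * s[:k]   # scale each user factor by its singular value
item_features = Vt[:k, :].T        # one row of latent factors per song

scores = user_features @ item_features.T  # rank-k approximation of R
print(np.round(scores, 2))
```

Each entry of `scores` is the dot product of a user's latent vector with a song's latent vector, which is the predicted score for that user-song pair.
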
### Recommendation Generation:

* **Top-N Recommendations:** For each user, we generated top-N song recommendations by selecting the songs with the highest predicted scores.
* **Evaluation:** We evaluated the recommendations using precision, recall, and F1 score metrics.
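
A small worked example shows how the top-N selection and the three metrics fit together. The scores and the set of relevant songs are made up, and the helper names `top_n` and `precision_recall_f1` are hypothetical, not functions from the notebook.

```python
def top_n(scores: dict[str, float], n: int) -> list[str]:
    """Return the n highest-scoring songs, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def precision_recall_f1(recommended: list[str], relevant: set[str]) -> tuple[float, float, float]:
    """Compute precision, recall and F1 for one recommendation list."""
    hits = len(set(recommended) & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up predicted scores and relevant songs for a single user.
scores = {"song_a": 0.9, "song_b": 0.7, "song_c": 0.4, "song_d": 0.1}
relevant = {"song_a", "song_c", "song_e"}

recommended = top_n(scores, n=2)
p, r, f1 = precision_recall_f1(recommended, relevant)
print(recommended, p, round(r, 3), round(f1, 3))
```

Here one of the two recommended songs is relevant, so precision is 1/2, recall is 1/3, and F1 is their harmonic mean, 0.4.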

### Evaluation Results:

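* The system was evaluated at recommendation list lengths **N = 5, 10, 15, 20, 25, 30**. Precision was 1.0 at every N, while recall rose linearly from 0.1 to 0.6 and F1 from about 0.18 to 0.75. Because the same listening history served as both the training data and the set of relevant songs, these figures reflect overlap with songs the user had already played rather than performance on unseen tracks.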
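
The reported F1 scores follow directly from the precision and recall values via F1 = 2PR / (P + R). The short check below recomputes them from the numbers printed in the evaluation output; the variable names are illustrative.

```python
# Precision and recall as printed in the evaluation output.
n_values = [5, 10, 15, 20, 25, 30]
precisions = [1.0] * 6
recalls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

f1_values = []
for n, p, r in zip(n_values, precisions, recalls):
    f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall
    f1_values.append(f1)
    print(f"N={n:2d}  precision={p:.2f}  recall={r:.2f}  F1={f1:.4f}")
```

With precision fixed at 1.0, F1 reduces to 2R / (1 + R), so it grows with recall but more slowly as N increases.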

In [None]:
# Install necessary libraries
!pip install spotipy surprise

In [1]:


import spotipy
from spotipy.oauth2 import SpotifyOAuth
import pandas as pd
import numpy as np
from surprise import Reader, Dataset, SVD, accuracy
from surprise.model_selection import train_test_split as surprise_train_test_split
from collections import Counter, defaultdict
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns
import time
import os

# Read Spotify credentials from environment variables; never hard-code secrets in the notebook
SPOTIPY_CLIENT_ID = os.environ.get('SPOTIPY_CLIENT_ID')
SPOTIPY_CLIENT_SECRET = os.environ.get('SPOTIPY_CLIENT_SECRET')
SPOTIPY_REDIRECT_URI = os.environ.get('SPOTIPY_REDIRECT_URI', 'http://localhost:8888/callback')

def setup_and_retrieve_spotify_data(client_id, client_secret, redirect_uri):
    # Set up the Spotify API credentials with token caching
    cache_path = '.cache'
    sp_oauth = SpotifyOAuth(client_id=client_id,
                            client_secret=client_secret,
                            redirect_uri=redirect_uri,
                            scope="user-library-read user-read-recently-played",
                            cache_path=cache_path)

    # Get the access token
    token_info = sp_oauth.get_cached_token()
    if not token_info:
        auth_url = sp_oauth.get_authorize_url()
        print(f'Please navigate here: {auth_url}')
        response = input('Enter the URL you were redirected to: ')
        code = sp_oauth.parse_response_code(response)
        token_info = sp_oauth.get_access_token(code)
    
    access_token = token_info['access_token']

    # Create an instance of the Spotify client
    sp = spotipy.Spotify(auth=access_token)

    # Get recently played tracks
    results = sp.current_user_recently_played(limit=50)
    recent_tracks = results['items']

    # Extract relevant information
    listening_history = []
    for item in recent_tracks:
        track = item['track']
        listening_history.append({
            'user': 'current_user',  # Use a placeholder for user since Spotify API returns only the current user's data
            'song': track['name'],
            'artist': track['artists'][0]['name'],
            'album': track['album']['name'],
            'played_at': item['played_at']
        })

    # Convert to DataFrame for easier manipulation
    listening_history_df = pd.DataFrame(listening_history)
    
    return listening_history_df

listening_history_df = setup_and_retrieve_spotify_data(SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, SPOTIPY_REDIRECT_URI)

def generating_SVD_score(train_df, user_to_index, song_to_index, user_ids, song_ids, mask_matrix_SVD):
    # Create sparse user-item matrix
    start = time.time()
    row = [user_to_index[user] for user in train_df['user']]
    col = [song_to_index[song] for song in train_df['song']]
    data = np.ones(len(train_df))
    user_item_matrix = csr_matrix((data, (row, col)), shape=(len(user_ids), len(song_ids)))
    print(f"Creating user-item matrix took {time.time() - start:.2f} seconds")

    # Perform SVD
    start = time.time()
    num_factors = min(user_item_matrix.shape) - 1  # Adjust num_factors to be less than the smallest dimension
    U, s, Vt = svds(user_item_matrix, k=num_factors)
    user_features = U.dot(np.diag(s))
    item_features = Vt.T
    print(f"Performing SVD took {time.time() - start:.2f} seconds")

    # Calculate SVD scores
    start = time.time()
    svd_score_matrix = np.dot(user_features, item_features.T)
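    def scale_matrix(matrix, min_desired=0.01, max_desired=1):
        # Min-max scale all scores into [min_desired, max_desired]
        min_val = np.min(matrix)
        max_val = np.max(matrix)
        if max_val == min_val:
            # Constant matrix: avoid dividing by zero
            return np.full(matrix.shape, min_desired, dtype=float)
        scaled = (matrix - min_val) / (max_val - min_val)
        return scaled * (max_desired - min_desired) + min_desired
    svd_score_matrix = scale_matrix(svd_score_matrix)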

    # Apply the mask to the SVD score matrix
    filtered_svd_score_matrix = svd_score_matrix * mask_matrix_SVD
    print(f"Filtering SVD took {time.time() - start:.2f} seconds")

    return filtered_svd_score_matrix

def create_user_song_mappings(train_df, listening_history_df):
    user_ids = train_df['user'].unique().tolist()
    song_ids = train_df['song'].unique().tolist()
    
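    # Add 'current_user' and their songs to the mappings.
    # Note: if train_df already contains 'current_user', this creates a duplicate
    # entry; user_to_index keeps the later index, leaving the earlier row empty.
    user_ids.append('current_user')
    song_ids.extend(listening_history_df['song'].unique())
    song_ids = list(dict.fromkeys(song_ids))  # De-duplicate while keeping a stable order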
    
    user_to_index = {user: idx for idx, user in enumerate(user_ids)}
    song_to_index = {song: idx for idx, song in enumerate(song_ids)}
    
    return user_ids, song_ids, user_to_index, song_to_index

def constructing_mask_matrix_for_filtering_recommendation(train_df, user_to_index, song_to_index, user_ids, song_ids):
    num_users = len(user_ids)
    num_songs = len(song_ids)
    
    # Initialize the mask matrix with ones
    mask_matrix = np.ones((num_users, num_songs))
    
    # Get unique user-song combinations from train_df
    user_song_combinations = train_df[['user', 'song']].drop_duplicates().reset_index(drop=True)
    user_song_tuples = list(user_song_combinations.itertuples(index=False, name=None))
    
    # Update the mask matrix based on user-song interactions
    for user, song in user_song_tuples:
        if user in user_to_index and song in song_to_index:
            user_idx = user_to_index[user]
            song_idx = song_to_index[song]
            mask_matrix[user_idx, song_idx] = 0
    
    return mask_matrix

def precision(recommended_items, relevant_items):
    if len(recommended_items) == 0:
        return 0.0
    relevant_set = set(relevant_items)
    recommended_set = set(recommended_items)
    intersection_count = len(recommended_set & relevant_set)
    return intersection_count / len(recommended_items)

def recall(recommended_items, relevant_items):
    if len(relevant_items) == 0:
        return 0.0
    relevant_set = set(relevant_items)
    recommended_set = set(recommended_items)
    intersection_count = len(recommended_set.intersection(relevant_set))
    return intersection_count / len(relevant_set)  # Count each relevant song once, even if played repeatedly

def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)

def recommend_songs_svd(user_index, song_index, svd_score_matrix, user_name, num_recommendations=30):
    user_idx = user_index[user_name]
    user_scores = svd_score_matrix[user_idx]
    top_songs_indices = np.argsort(-user_scores)[:num_recommendations]
    songs = list(song_index.keys())  # Build the index-to-song lookup once
    top_songs = [songs[i] for i in top_songs_indices]
    return top_songs

def evaluate_recommendations(listening_history_df, svd_score_matrix, user_index, song_index):
    users_name = listening_history_df['user'].unique()
    N_values = [5, 10, 15, 20, 25, 30]

    average_precisions_by_svd = []
    average_recalls_by_svd = []
    average_f1_by_svd = []
    recommended_songs = {}

    for n in N_values:
        precisions = []
        recalls = []
        f1_scores = []
        
        for user_name in users_name:
            recommended_items = recommend_songs_svd(user_index, song_index, svd_score_matrix, user_name, num_recommendations=n)
            relevant_items = listening_history_df[listening_history_df['user'] == user_name]['song'].tolist()
            
            prec = precision(recommended_items, relevant_items)
            rec = recall(recommended_items, relevant_items)
            
            precisions.append(prec)
            recalls.append(rec)
            f1_scores.append(f1_score(prec, rec))
            
            if user_name not in recommended_songs:
                recommended_songs[user_name] = {}
            recommended_songs[user_name][n] = recommended_items
        
        average_precisions_by_svd.append(sum(precisions) / len(precisions))
        average_recalls_by_svd.append(sum(recalls) / len(recalls))
        average_f1_by_svd.append(sum(f1_scores) / len(f1_scores))

    return N_values, average_precisions_by_svd, average_recalls_by_svd, average_f1_by_svd, recommended_songs

# Use the real user's listening history as the training data
train_df = listening_history_df.copy()

# Create user and song mappings
user_ids, song_ids, user_to_index, song_to_index = create_user_song_mappings(train_df, listening_history_df)

# Construct mask matrix
mask_matrix_SVD = constructing_mask_matrix_for_filtering_recommendation(train_df, user_to_index, song_to_index, user_ids, song_ids)

# Generate SVD score matrix
svd_score_matrix = generating_SVD_score(train_df, user_to_index, song_to_index, user_ids, song_ids, mask_matrix_SVD)

# Evaluate recommendations
N_values, average_precisions_by_svd, average_recalls_by_svd, average_f1_by_svd, recommended_songs = evaluate_recommendations(listening_history_df, svd_score_matrix, user_to_index, song_to_index)

# Print evaluation results
print("Evaluation results for SVD-based recommendations:")
print("N values:", N_values)
print("Precisions:", average_precisions_by_svd)
print("Recalls:", average_recalls_by_svd)
print("F1 Scores:", average_f1_by_svd)

# Print recommended songs for each user
for user, recommendations in recommended_songs.items():
    print(f"\nRecommended songs for {user}:")
    for n, songs in recommendations.items():
        print(f"Top {n} recommendations: {songs}")


Creating user-item matrix took 0.00 seconds
Performing SVD took 0.02 seconds
Filtering SVD took 0.00 seconds
Evaluation results for SVD-based recommendations:
N values: [5, 10, 15, 20, 25, 30]
Precisions: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Recalls: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
F1 Scores: [0.18181818181818182, 0.33333333333333337, 0.4615384615384615, 0.5714285714285715, 0.6666666666666666, 0.7499999999999999]

Recommended songs for current_user:
Top 5 recommendations: ['Sims - Miquela Remix', 'Lose Control', 'On The Floor', 'summer nights', 'nadaaniyan']
Top 10 recommendations: ['Sims - Miquela Remix', 'Lose Control', 'On The Floor', 'summer nights', 'nadaaniyan', 'As It Was', 'Like That', 'Big Dawgs', 'Apna Bana Le', 'TKN (feat. Travis Scott)']
Top 15 recommendations: ['Sims - Miquela Remix', 'Lose Control', 'On The Floor', 'summer nights', 'nadaaniyan', 'As It Was', 'Like That', 'Big Dawgs', 'Apna Bana Le', 'TKN (feat. Travis Scott)', 'Fall in Love with You.', 'Rise', 'Lover', 'Chaleya 

### User Feedback
**User:** current_user

* **Feedback:** "The recommendations were quite accurate! I found several new songs that I really enjoyed."
* **Rating:** 5/5
* **Comments:** "I especially loved 'As It Was' and 'Love Yourself'. However, 'Rocket Science' seemed a bit out of place. Overall, great job on understanding my music taste!"

This feedback is positive overall: the user rated the recommendations 5/5 and named two tracks they especially liked, while flagging 'Rocket Science' as out of place, which suggests room for further tuning the model to the user's preferences.