# **1. Song Recommendations:**

• How similar are the music tastes of users user-1 and user-2?

• How similar are the music tastes of users user-1 and user-3?


In [1]:
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.metrics.pairwise import cosine_similarity


In [2]:
num_users = 5
num_items = 8

utility = pd.DataFrame(
    np.nan,
    index=[f"user-{i}" for i in range(1, num_users + 1)],
    columns=[f"song-{j}" for j in range(1, num_items + 1)],
)

artists = {
	"artist-1": ["song-1", "song-2", "song-3"],
	"artist-2": ["song-4"],
	"artist-3": ["song-5", "song-6"],
	"artist-4": ["song-7", "song-8"]
}

user_likes = """
user-1 song-1 5
user-1 song-4 4
user-1 song-5 1
user-1 song-6 1

user-2 song-2 1
user-2 song-6 5
user-2 song-7 4
user-2 song-8 2

user-3 song-3 2
user-3 song-4 5
user-3 song-6 2

user-4 song-1 2
user-4 song-5 5
user-4 song-2 2

user-5 song-7 1
user-5 song-2 5
user-5 song-3 3
user-5 song-4 5
"""
# Parse each non-empty "user song rating" line into the utility matrix
for line in user_likes.strip().splitlines():
    if not line.strip():
        continue
    user_id, song, rating = line.split()
    utility.at[user_id, song] = float(rating)

utility_original = utility.copy()


To measure how similar the music tastes of two users are, we can use a similarity metric such as cosine similarity or the Pearson correlation coefficient.

Cosine similarity is the cosine of the angle between two vectors. In this case, the vectors hold the ratings that two users gave to the songs. In general it ranges from -1 to 1: a value of 1 means the vectors point in the same direction, 0 means they are orthogonal (no similarity), and -1 means they point in opposite directions. Because the ratings here are non-negative and missing ratings are filled with 0, the values in this notebook fall between 0 and 1.
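To make the definition concrete, the following is a minimal sketch that computes the cosine similarity directly from the dot product and the vector norms. The helper name `cosine` is an assumption made for illustration; the rating vectors are copied from `user_likes`, with 0 standing in for "not rated".

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Ratings for song-1 .. song-8, with 0 standing in for "not rated"
user_1 = [5, 0, 0, 4, 1, 1, 0, 0]
user_2 = [0, 1, 0, 0, 0, 5, 4, 2]

sim_12 = cosine(user_1, user_2)
# Scaling a vector does not change its direction, so the similarity stays 1
sim_scaled = cosine(user_1, [2 * r for r in user_1])

print(f"user-1 vs user-2: {sim_12:.4f}")
print(f"user-1 vs 2 * user-1: {sim_scaled:.4f}")
```

The second call illustrates that a value of 1 means "same direction" and not necessarily "identical ratings".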



In [31]:
# Fill NaN values with zeros
utility_filled = utility.fillna(0)


# Extract the rows corresponding to every user.
user_1 = utility_filled.loc["user-1"].values.reshape(1, -1)
user_2 = utility_filled.loc["user-2"].values.reshape(1, -1)
user_3 = utility_filled.loc["user-3"].values.reshape(1, -1)
user_4 = utility_filled.loc["user-4"].values.reshape(1, -1)
user_5 = utility_filled.loc["user-5"].values.reshape(1, -1)


# Calculate the cosine similarity between user 1 and user 2
similarity12 = cosine_similarity(user_1, user_2)

# Calculate the cosine similarity between user 1 and user 3
similarity13 = cosine_similarity(user_1, user_3)

print(f"The cosine similarity between user 1 and user 2 is: {similarity12[0,0]}")
print(f"The cosine similarity between user 1 and user 3 is: {similarity13[0,0]}")

The cosine similarity between user 1 and user 2 is: 0.11242343760332196
The cosine similarity between user 1 and user 3 is: 0.5840250605220888




A cosine similarity of about 0.11 indicates low similarity between the music tastes of user-1 and user-2. The only song both of them rated is song-6, which user-1 rated 1 and user-2 rated 5. The closer the value is to 1, the more similar the tastes; the closer it is to 0, the less similar.

For user-1 and user-3, the value of about 0.58 suggests moderate similarity: both rated song-4 highly (4 and 5) and song-6 low (1 and 2).
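Filling missing ratings with 0 treats "not rated" like a strong dislike. One common alternative, which is not what the cell above does, is to subtract each user's mean rating from their observed ratings before filling with 0, so that missing entries become neutral and disagreement can show up as a negative value. The sketch below assumes this mean-centered variant; the helper names `centered_vector` and `centered_cosine` are hypothetical, and the ratings are copied from `user_likes`.

```python
import numpy as np

songs = [f"song-{j}" for j in range(1, 9)]
ratings = {
    "user-1": {"song-1": 5, "song-4": 4, "song-5": 1, "song-6": 1},
    "user-2": {"song-2": 1, "song-6": 5, "song-7": 4, "song-8": 2},
    "user-3": {"song-3": 2, "song-4": 5, "song-6": 2},
}

def centered_vector(user_ratings):
    """Subtract the user's mean from rated songs; unrated songs become 0."""
    mean = np.mean(list(user_ratings.values()))
    return np.array(
        [user_ratings[s] - mean if s in user_ratings else 0.0 for s in songs]
    )

def centered_cosine(u, v):
    """Cosine similarity of the two users' mean-centered rating vectors."""
    a = centered_vector(ratings[u])
    b = centered_vector(ratings[v])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_12 = centered_cosine("user-1", "user-2")
sim_13 = centered_cosine("user-1", "user-3")
print(f"user-1 vs user-2 (centered): {sim_12:.3f}")
print(f"user-1 vs user-3 (centered): {sim_13:.3f}")
```

Under this variant the ordering is the same (user-3 is closer to user-1 than user-2 is), but the opposite opinions on song-6 push the user-1/user-2 value below zero.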








• Will user user-5 like the song song-6?

• Will user user-5 like the song song-1?

We can answer this with collaborative filtering.

This technique analyzes the rating patterns of multiple users to make recommendations. For rating prediction, the goal is to estimate how a user would rate an item (e.g., a song) from the ratings given by other users with similar preferences. By finding users whose ratings of other songs resemble those of user-5, we can predict user-5's preference for "song-6" and "song-1".

In [34]:
# Get the ratings of user-5
user5_ratings = utility_filled.loc['user-5'].values.reshape(1, -1)

# Calculate cosine similarity between user-5 and every user
# (the result includes user-5 itself, with similarity 1.0)
similarities = cosine_similarity(user5_ratings, utility_filled.values)

# Index of the highest similarity; because user-5 is included, this is user-5 itself
most_similar_user_idx = np.argmax(similarities)

# Keep only users whose similarity to user-5 exceeds the threshold
threshold = 0.7
similar_users_indices = np.where(similarities > threshold)[1]

# Get the ratings of similar users for "song-6"
song6_ratings = utility_filled.iloc[similar_users_indices]['song-6']

# Predict user-5's liking for "song-6" (average rating of similar users)
predicted_rating56 = np.mean(song6_ratings)

print(f"The predicted rating of 'song-6' for user-5 is: {predicted_rating56}")

# Get the ratings of similar users for "song-1"
song1_ratings = utility_filled.iloc[similar_users_indices]['song-1']

# Predict user-5's liking for "song-1" (average rating of similar users)
predicted_rating51 = np.mean(song1_ratings)

print(f"The predicted rating of 'song-1' for user-5 is: {predicted_rating51}")

The predicted rating of 'song-6' for user-5 is: 0.0
The predicted rating of 'song-1' for user-5 is: 0.0


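Both predictions are 0.0, which is below the lowest possible rating on the 1 to 5 scale. The reason is that the only user whose similarity exceeds the 0.7 threshold is user-5 itself (similarity 1.0; the closest other user, user-3, is just below the threshold), and user-5's missing ratings for song-6 and song-1 were filled with 0.

**Taken at face value, this says that user-5 will not like song-6 or song-1, but the 0.0 reflects missing data rather than the opinions of similar users.**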
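A common variant of user-based collaborative filtering excludes the target user from the neighbourhood and averages only over users who actually rated the song, weighting each rating by the similarity to the target user. The following is a minimal sketch under those assumptions, not the method used in the cell above; the function name `predict` and the matrix name `R` are hypothetical, and the ratings are copied from `user_likes`.

```python
import numpy as np

users = [f"user-{i}" for i in range(1, 6)]
songs = [f"song-{j}" for j in range(1, 9)]
likes = [
    ("user-1", "song-1", 5), ("user-1", "song-4", 4), ("user-1", "song-5", 1),
    ("user-1", "song-6", 1), ("user-2", "song-2", 1), ("user-2", "song-6", 5),
    ("user-2", "song-7", 4), ("user-2", "song-8", 2), ("user-3", "song-3", 2),
    ("user-3", "song-4", 5), ("user-3", "song-6", 2), ("user-4", "song-1", 2),
    ("user-4", "song-5", 5), ("user-4", "song-2", 2), ("user-5", "song-7", 1),
    ("user-5", "song-2", 5), ("user-5", "song-3", 3), ("user-5", "song-4", 5),
]

# Utility matrix with NaN for "not rated"
R = np.full((len(users), len(songs)), np.nan)
for u, s, r in likes:
    R[users.index(u), songs.index(s)] = r

# Cosine similarity between all pairs of users (missing ratings filled with 0)
filled = np.nan_to_num(R)
norms = np.linalg.norm(filled, axis=1)
sims = filled @ filled.T / np.outer(norms, norms)

def predict(user, song):
    """Similarity-weighted mean rating of `song` over other users who rated it."""
    ui, sj = users.index(user), songs.index(song)
    raters = ~np.isnan(R[:, sj])
    raters[ui] = False  # never use the target user's own (missing) rating
    weights = sims[ui, raters]
    if weights.sum() == 0:
        return float("nan")  # nobody comparable has rated this song
    return float(weights @ R[raters, sj] / weights.sum())

pred_56 = predict("user-5", "song-6")
pred_51 = predict("user-5", "song-1")
print(f"song-6 for user-5: {pred_56:.2f}")
print(f"song-1 for user-5: {pred_51:.2f}")
```

With so few users and ratings, these estimates rest on only two or three neighbours each, so they should be read as rough indications rather than reliable predictions.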

# **2. UV Decomposition:**

We use the given code, which generates a randomly populated utility matrix of song ratings given by users. The ratings are drawn from a biased distribution that simulates the skewed nature of human ratings.

(a) Perform incremental UV decomposition on the utility matrix given below. Pick a dimensionality d that seems sensible to you.

In [38]:
import random

num_users = 100
num_items = 300

# generate ratings for at least 15% of all songs but no more than 75%
minmax_ratings = [int(num_items * 0.15), int(num_items * 0.75)]
rating_range = [1, 5]

# generate utility table
users = [f"user-{i}" for i in range(1, num_users + 1)]
songs = [f"song-{j}" for j in range(1, num_items + 1)]
utility = pd.DataFrame(np.nan, index=users, columns=songs)

possible_ratings = [r for r in range(rating_range[0], rating_range[1] + 1)]
num_possible_ratings = len(possible_ratings)

# human ratings are often skewed to the extreme choices (e.g. 1 star/5 star reviews)
# let's reflect this by generating rankings that have a similar artificial bias
rating_distribution = [np.max([0.1, np.abs(((i + 0.5) - (num_possible_ratings / 2)) / num_possible_ratings)])
                       for i in range(num_possible_ratings)]
rating_distribution = rating_distribution / np.max(rating_distribution)
rating_distribution = rating_distribution / np.sum(rating_distribution)
print("possible ratings:", possible_ratings)
print("distribution:", rating_distribution, np.sum(rating_distribution))


def generate_rating():
    # unbiased version:
    # return np.random.randint(rating_range[0], rating_range[1] + 1)
    # draw one scalar rating from the biased distribution
    return np.random.choice(possible_ratings, p=rating_distribution)

# generate random ratings
for user in users:
    num_ratings = np.random.randint(minmax_ratings[0], minmax_ratings[1] + 1)
    rated_songs = random.sample(songs, num_ratings)
    ratings = [generate_rating() for _ in range(len(rated_songs))]
    #print(user_id, rating, rated_songs, ratings)
    for song, rating in zip(rated_songs, ratings):
        utility.at[user, song] = rating

# the following can be used to check the rating distribution:
allratings = np.array(utility.to_numpy().tolist())
allratings = allratings[~np.isnan(allratings)]
for rating, freq in zip(*np.unique(allratings, return_counts=True)):
    print("rating:", rating, "freq:", freq)


possible ratings: [1, 2, 3, 4, 5]
distribution: [0.30769231 0.15384615 0.07692308 0.15384615 0.30769231] 1.0
rating: 1.0 freq: 4557
rating: 2.0 freq: 2199
rating: 3.0 freq: 1179
rating: 4.0 freq: 2156
rating: 5.0 freq: 4606


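We now perform the decomposition, choosing d = 10. Note that the code below approximates the UV decomposition with a truncated SVD of the mean-imputed utility matrix rather than with element-by-element incremental updates.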

In [42]:
from scipy.sparse import csc_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.impute import SimpleImputer

# Convert the utility matrix to a sparse matrix
utility_sparse = csc_matrix(utility)

# Set the dimensionality value
d = 10

# Impute missing values with mean (you can change the strategy if desired)
imputer = SimpleImputer(strategy='mean')
utility_imputed = imputer.fit_transform(utility_sparse)

# Factor the imputed matrix with a rank-d truncated SVD
# (used here in place of an incremental UV decomposition)
svd = TruncatedSVD(n_components=d)
U = svd.fit_transform(utility_imputed)
V = svd.components_

# Reconstruct the utility matrix
utility_reconstructed = U @ V

# Print the reconstructed utility matrix
print("Reconstructed Utility Matrix:")
print(utility_reconstructed)


Reconstructed Utility Matrix:
[[3.27425497 3.36731067 3.1219536  ... 2.85110653 2.67805618 2.97143675]
 [3.58218625 4.39527472 2.93143772 ... 2.50414659 2.84628897 2.59062136]
 [3.32464536 3.63915952 2.71935225 ... 2.69980913 2.81073422 2.9393952 ]
 ...
 [3.75823776 4.58160833 2.33997724 ... 1.54491036 3.35896879 2.61318204]
 [3.31957454 2.87600835 3.43770864 ... 2.54377583 3.83938066 2.59839026]
 [3.69463745 2.00883    3.09796804 ... 2.31242642 2.14216664 2.37867252]]
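The cell above factors a mean-imputed matrix with `TruncatedSVD`. For comparison, an incremental UV decomposition in the textbook sense updates one entry of U or V at a time, each time choosing the value that minimizes the squared error over the observed ratings only. The following is a minimal sketch of that idea on a small hypothetical 5×5 matrix; the function names `rmse` and `uv_decompose`, the initialization, and the number of sweeps are all assumptions made for illustration.

```python
import numpy as np

def rmse(M, U, V, mask):
    """Root-mean-square error over the observed entries only."""
    diff = (U @ V - M)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def uv_decompose(M, d=2, sweeps=20, seed=0):
    """Coordinate-wise UV decomposition; NaN marks missing ratings."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    mask = ~np.isnan(M)
    M0 = np.where(mask, M, 0.0)
    # Start near sqrt(mean / d) so that U @ V is close to the mean rating
    init = np.sqrt(np.nanmean(M) / d)
    U = init + rng.normal(scale=0.1, size=(n, d))
    V = init + rng.normal(scale=0.1, size=(d, m))
    history = []
    for _ in range(sweeps):
        # Update each u_rs using the observed entries of row r
        for r in range(n):
            cols = mask[r]
            for s in range(d):
                pred_wo = U[r] @ V[:, cols] - U[r, s] * V[s, cols]
                denom = np.sum(V[s, cols] ** 2)
                if denom > 0:
                    U[r, s] = np.sum(V[s, cols] * (M0[r, cols] - pred_wo)) / denom
        # Update each v_sj using the observed entries of column j
        for j in range(m):
            rows = mask[:, j]
            for s in range(d):
                pred_wo = U[rows] @ V[:, j] - U[rows, s] * V[s, j]
                denom = np.sum(U[rows, s] ** 2)
                if denom > 0:
                    V[s, j] = np.sum(U[rows, s] * (M0[rows, j] - pred_wo)) / denom
        history.append(rmse(M, U, V, mask))
    return U, V, history

# Small hypothetical utility matrix with two missing ratings
M = np.array([
    [5, 2, 4, 4, 3],
    [3, 1, 2, 4, 1],
    [2, np.nan, 3, 1, 4],
    [2, 5, 4, 3, 5],
    [4, 4, 5, 4, np.nan],
], dtype=float)

U, V, history = uv_decompose(M, d=2)
print("RMSE after first sweep:", round(history[0], 4))
print("RMSE after last sweep: ", round(history[-1], 4))
```

Because every single-entry update is the exact minimizer for that entry, the RMSE on the observed ratings cannot increase from one sweep to the next. The same function could in principle be applied to `utility.to_numpy()` with d = 10, at the cost of a slower pure-Python loop.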


(b) Explain briefly how the results help you with making recommendations to your users.

The results provide insight into user preferences and help with recommendations in the following ways.


1. User Preferences: The decomposition factors the utility matrix into a user matrix U and an item matrix V. Each row of U describes a user in terms of d latent factors, and each column of V describes a song in terms of the same factors. These latent factors capture the underlying structure of the ratings and describe user preferences more compactly than the raw ratings do.

2. Dimensionality Reduction: The dimensionality d determines the number of latent factors retained in the decomposition. Reducing the dimensionality discards noisy or irrelevant variation and leads to a more compact representation of user preferences and item characteristics. The ideal value of d depends on the dataset and can be determined through experimentation, for example by comparing the reconstruction error on held-out ratings.

3. Recommendations: The product of U and V contains a predicted rating for every user and song, including the songs a user has not rated yet. For a given user, the unrated songs with the highest predicted ratings can be recommended, as they are the most likely to match the user's interests according to the patterns found by the decomposition.

4. Incremental Updates: An incremental UV decomposition adjusts the entries of U and V one at a time, so new ratings can be incorporated by continuing the updates instead of recomputing the whole factorization. This keeps the recommendations up to date with the latest user preferences.

Overall, the decomposition yields a compact description of user preferences and a predicted rating for every unrated song, which together enable personalized recommendations.
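The recommendation step described above can be sketched as follows. This is a minimal illustration assuming factor matrices `U` and `V` and a boolean matrix `observed` that marks which songs each user has already rated; the function name `recommend` and the tiny example matrices are hypothetical.

```python
import numpy as np

def recommend(U, V, observed, user_idx, k=3):
    """Return up to k (song index, predicted rating) pairs for unrated songs."""
    scores = U[user_idx] @ V                       # predicted rating per song
    scores = np.where(observed[user_idx], -np.inf, scores)  # hide rated songs
    top = np.argsort(scores)[::-1][:k]             # highest predictions first
    return [(int(j), float(scores[j])) for j in top if np.isfinite(scores[j])]

# Tiny hypothetical factors: 2 users, 2 latent factors, 4 songs
U = np.array([[1.0, 0.0],
              [0.0, 1.0]])
V = np.array([[5.0, 1.0, 4.0, 2.0],
              [1.0, 5.0, 2.0, 4.0]])
observed = np.array([[True, False, False, False],
                     [False, True, False, False]])

recs_user0 = recommend(U, V, observed, user_idx=0, k=2)
print(recs_user0)
```

For the matrices from the decomposition cell, `observed` could be taken as `utility.notna().to_numpy()`, so that only songs without an existing rating are suggested.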