**Vector Quantization using K-means clustering**

Write Python code that takes the following set of input vectors and performs k-means clustering to quantize them. Print the quantized vectors and the codebook obtained from the clustering.


In [None]:
#Write Your Code Here.

import numpy as np

# Parameters
np.random.seed(0)   # fix the seed so the run is reproducible
num_vectors = 100
vector_dim = 3
num_clusters = 10

def initialize_centroids(data, k):
    """Pick k distinct data points at random as the initial centroids."""
    indices = np.random.choice(len(data), k, replace=False)
    return data[indices]

def assign_clusters(data, centroids):
    """Return, for each data point, the index of its closest centroid."""
    # distances has shape (k, num_vectors): one row per centroid
    distances = np.sqrt(((data - centroids[:, np.newaxis])**2).sum(axis=2))
    return np.argmin(distances, axis=0)

def update_centroids(data, clusters, centroids):
    """Move each centroid to the mean of the data points assigned to it."""
    new_centroids = centroids.copy()
    for i in range(len(centroids)):
        members = data[clusters == i]
        if len(members) > 0:    # an empty cluster keeps its previous centroid (avoids a NaN mean)
            new_centroids[i] = members.mean(axis=0)
    return new_centroids

def kmeans_clust(data, k, max_iterations=100):
    """Run k-means; return each point's cluster index and the final centroids (the codebook)."""
    centroids = initialize_centroids(data, k)
    for _ in range(max_iterations):
        clusters = assign_clusters(data, centroids)
        new_centroids = update_centroids(data, clusters, centroids)
        # Converged once the centroids stop moving
        if np.array_equal(centroids, new_centroids):
            break
        centroids = new_centroids
    return clusters, centroids

def print_clusters(data, clusters):
    """Print the input vectors grouped by the cluster they were assigned to."""
    clusters_dict = {}
    for i, cluster_id in enumerate(clusters):
        clusters_dict.setdefault(cluster_id, []).append(data[i])

    for cluster_id, cluster_data in clusters_dict.items():
        print(f"Cluster {cluster_id}:")
        print(cluster_data)



input_vectors = np.random.rand(num_vectors, vector_dim)

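# KMeans clustering. Note: quantized_vectors holds, for each input vector, the index of its
# codeword in the codebook; the reconstructed vector itself is codebook[quantized_vectors[i]].
quantized_vectors, codebook = kmeans_clust(input_vectors, num_clusters)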

print("Quantized Vectors:")
print(quantized_vectors)
print("\nCodebook:")
print(codebook)

# Print all the final clusters
print("\nFinal Clusters:")
print_clusters(input_vectors, quantized_vectors)


Quantized Vectors:
[2 8 2 0 6 9 2 2 6 5 0 3 2 8 3 7 9 6 9 1 9 1 4 0 2 1 1 3 8 0 6 7 3 6 5 2 0
 5 4 2 2 8 9 0 9 2 2 9 5 2 6 3 5 0 2 7 8 1 7 5 0 0 2 1 0 7 1 7 9 2 4 9 9 2
 6 5 5 1 1 8 4 0 7 3 2 0 7 9 5 8 5 4 2 4 7 0 6 6 7 2]

Codebook:
[[0.18252105 0.77980802 0.54631391]
 [0.17465519 0.33777202 0.22125829]
 [0.70047601 0.68410311 0.82884288]
 [0.56174282 0.10711717 0.76570909]
 [0.88747395 0.22545963 0.7829613 ]
 [0.86154398 0.72409221 0.25894395]
 [0.27838521 0.78076462 0.18053349]
 [0.68798218 0.20535751 0.22821621]
 [0.60246733 0.28366013 0.54371655]
 [0.18190412 0.19810273 0.70817584]]

Final Clusters:
Cluster 2:
[array([0.5488135 , 0.71518937, 0.60276338]), array([0.43758721, 0.891773  , 0.96366276]), array([0.77815675, 0.87001215, 0.97861834]), array([0.79915856, 0.46147936, 0.78052918]), array([0.61209572, 0.616934  , 0.94374808]), array([0.97676109, 0.60484552, 0.73926358]), array([0.57615733, 0.59204193, 0.57225191]), array([0.58127287, 0.88173536, 0.69253159]), array([0.7252542
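
The "Quantized Vectors" array printed above contains codebook indices rather than vectors. As a minimal sketch of the quantization step itself, the following maps each vector to its nearest codeword and measures the resulting mean squared distortion. The helper names `quantize` and `mean_squared_distortion` and the toy data are illustrative assumptions, not part of the original solution.

```python
import numpy as np

def quantize(vectors, codebook):
    """Return the nearest-codeword index for each vector, and the codewords themselves."""
    # Pairwise squared distances, shape (num_vectors, num_codewords)
    sq_dists = ((vectors[:, np.newaxis, :] - codebook[np.newaxis, :, :]) ** 2).sum(axis=2)
    indices = np.argmin(sq_dists, axis=1)
    return indices, codebook[indices]

def mean_squared_distortion(vectors, reconstructed):
    """Average squared Euclidean distance between vectors and their reconstructions."""
    return float(np.mean(np.sum((vectors - reconstructed) ** 2, axis=1)))

# Toy data (hypothetical): a two-entry codebook and three 3-dimensional vectors
toy_codebook = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
toy_vectors = np.array([[0.1, 0.0, 0.2], [0.9, 1.0, 0.8], [0.4, 0.4, 0.4]])

toy_indices, toy_quantized = quantize(toy_vectors, toy_codebook)
toy_distortion = mean_squared_distortion(toy_vectors, toy_quantized)

print("Indices:", toy_indices)
print("Quantized vectors:")
print(toy_quantized)
print("Mean squared distortion:", toy_distortion)
```

With the notebook's variables, the same idea would be `codebook[quantized_vectors]`, which yields one codeword per input vector.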

Many online, user-centric applications, such as movie recommender systems, suggest further
movies for a user to watch. The challenge lies in finding and recommending movies that the
user is likely to enjoy and select. Many techniques exist for this task, and singular value
decomposition (SVD) is one of them.

**Problem:** Take any movie recommendation dataset (e.g. the [MovieLens dataset](https://docs.google.com/spreadsheets/d/1DSYeRhZ_v2MZAXAMEn6kcaN9Hx4i6M1o/edit?usp=sharing&ouid=1127337976817363022rtpof=true&sd=true)) or any other dataset of your choice.
1. Use the SVD algorithm and write Python code to design a movie recommender system.
2. Given a new user, devise a mechanism to recommend movies to that user.

In [None]:
import numpy as np
import pandas as pd

# Load ratings and movies data
ratings_df = pd.read_csv("movies.xlsx - ratings.csv")
movies_df = pd.read_csv("movies.xlsx - movie.csv")

# Drop unnecessary columns
ratings_df = ratings_df.drop(['timestamp'], axis=1)
movies_df = movies_df.drop(['genres'], axis=1)

# Merge ratings and movies data
merged_df = pd.merge(ratings_df, movies_df, on='movieId')

# Create a mapping of movieId to title
title_mapping = dict(zip(movies_df['movieId'], movies_df['title']))

# Create user-item matrix (rows: users, columns: movies); unrated entries are filled with 0
user_item_matrix = merged_df.pivot_table(index='userId', columns='movieId', values='rating', fill_value=0)

# Perform Singular Value Decomposition
U, Sigma, Vt = np.linalg.svd(user_item_matrix, full_matrices=False)

# Reduce dimensions: keep only the k largest singular values and their singular vectors
k = 200
U_k = U[:, :k]
Sigma_k = np.diag(Sigma[:k])
Vt_k = Vt[:k, :]

# Reconstruct the rank-k approximation of the user-item matrix
user_item_matrix_reconstructed = U_k @ Sigma_k @ Vt_k

# Create DataFrame for reconstructed matrix (same row and column labels as the user-item matrix)
reconstructed_df = pd.DataFrame(user_item_matrix_reconstructed, index=user_item_matrix.index, columns=user_item_matrix.columns)

# Get movie IDs
movie_ids = np.array(user_item_matrix.columns)

# Prompt for user ID or use default
# user_id = int(input('Enter User ID : '))
user_id = 133

# Get predicted ratings for the user; look the row up by userId label rather than by position
pred_user_ratings = reconstructed_df.loc[user_id].to_numpy()
top_recommendations = np.argsort(pred_user_ratings)[::-1]

# Print top 5 movie recommendations
# Note: movies the user has already rated are not filtered out of this ranking
print('Top 5 Movie Recommendations:')
for movie_idx in top_recommendations[:5]:
    print(f'Movie ID: {movie_ids[movie_idx]}, Title: {title_mapping[movie_ids[movie_idx]]}, Predicted Rating: {pred_user_ratings[movie_idx]}')

# Display head of merged data
merged_df.head()


Top 5 Movie Recommendations:
Movie ID: 4973, Title: Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001), Predicted Rating: 5.285018019756502
Movie ID: 2329, Title: American History X (1998), Predicted Rating: 5.270498238897789
Movie ID: 1213, Title: Goodfellas (1990), Predicted Rating: 5.045562992961699
Movie ID: 293, Title: Léon: The Professional (a.k.a. The Professional) (Léon) (1994), Predicted Rating: 5.031065745599547
Movie ID: 2858, Title: American Beauty (1999), Predicted Rating: 4.970955834723858


Unnamed: 0,userId,movieId,rating,title
0,1,2,3.5,Jumanji (1995)
1,5,2,3.0,Jumanji (1995)
2,13,2,3.0,Jumanji (1995)
3,29,2,3.0,Jumanji (1995)
4,34,2,3.0,Jumanji (1995)
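
Item 2 of the problem asks for a way to handle a new user, which the code above does not cover. One common option is to "fold in" the new user: project their rating vector onto the top-k right singular vectors and map it back to item space, which gives predicted scores without recomputing the SVD. The sketch below assumes the new user has already rated at least a few movies and that 0 means "not rated", as in the user-item matrix above. The function name `fold_in_and_recommend` and the toy ratings are hypothetical.

```python
import numpy as np

def fold_in_and_recommend(new_user_ratings, Vt_k, top_n=5):
    """Score all items for a new user via the top-k right singular vectors.

    Returns the column indices of the best-scoring unrated items and the full score vector.
    """
    ratings = np.asarray(new_user_ratings, dtype=float)
    latent = Vt_k @ ratings        # k-dimensional coordinates of the new user
    scores = latent @ Vt_k         # projection back to item space: r V_k V_k^T
    unseen = np.flatnonzero(ratings == 0)                 # only recommend unrated items
    ranked = unseen[np.argsort(scores[unseen])[::-1]]     # highest score first
    return ranked[:top_n], scores

# Toy user-item matrix (hypothetical): two groups of users with opposite tastes
toy_ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

_, _, toy_Vt = np.linalg.svd(toy_ratings, full_matrices=False)
toy_Vt_k = toy_Vt[:2, :]           # keep the two largest singular directions

# A new user who has rated only the first movie
new_user = np.array([5, 0, 0, 0], dtype=float)
top_items, scores = fold_in_and_recommend(new_user, toy_Vt_k, top_n=2)

print("Recommended column indices:", top_items)
print("Scores:", scores)
```

In the notebook, the returned column indices would be mapped to titles through `movie_ids` and `title_mapping`. A user with no ratings at all cannot be folded in this way; a fallback such as recommending the most popular movies would be needed for that case.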
