# Practical No 2: a) Implement the User-User collaborative filtering and suggest recommendation for the users.
# b) Calculate Prediction for a particular user by using method of Cosine similarity , Euclidean distance and Pearson correlation on user-user based collaborative filtering.

User-based filtering is a recommendation system technique that predicts a user's preferences based on the ratings of similar users.

### Approach:

Find Similar Users: Calculate similarity between the target user and other users using metrics like Pearson correlation or cosine similarity.
Identify Rated Items: Determine items rated by similar users but not by the target user.
Predict Ratings: Predict the target user's rating for these items based on the ratings of similar users.
Recommend Items: Recommend items with the highest predicted ratings.

Example:

If User A and User B have rated similar movies highly, and User A has rated a new movie highly, the system might recommend that movie to User B.

In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

In [7]:
movies = pd.read_csv('/content/drive/MyDrive/Recommendation System/movies.csv')
ratings = pd.read_csv('/content/drive/MyDrive/Recommendation System/ratings.csv')

In [8]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [10]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [11]:
user_item_matrix = ratings.pivot(index='userId', columns='movieId', values='rating')

In [12]:
user_item_matrix.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,,4.0,,,4.0,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,4.0,,,,,,,,,,...,,,,,,,,,,


In [13]:
user_item_matrix_filled = user_item_matrix.fillna(0)

In [14]:
user_item_matrix_filled.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,0.0,4.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [15]:
user_similarity = cosine_similarity(user_item_matrix_filled)

In [16]:
print(user_similarity)

[[1.         0.02728287 0.05972026 ... 0.29109737 0.09357193 0.14532081]
 [0.02728287 1.         0.         ... 0.04621095 0.0275654  0.10242675]
 [0.05972026 0.         1.         ... 0.02112846 0.         0.03211875]
 ...
 [0.29109737 0.04621095 0.02112846 ... 1.         0.12199271 0.32205486]
 [0.09357193 0.0275654  0.         ... 0.12199271 1.         0.05322546]
 [0.14532081 0.10242675 0.03211875 ... 0.32205486 0.05322546 1.        ]]


In [17]:
user_correlation = user_item_matrix_filled.corr(method='pearson')

In [18]:
from scipy.spatial.distance import pdist, squareform
user_distance = pdist(user_item_matrix_filled, metric='euclidean')

In [19]:
user_similarity_df = pd.DataFrame(user_similarity, index=user_item_matrix.index, columns=user_item_matrix.index)

In [20]:
user_correlation_df = pd.DataFrame(user_correlation, index=user_item_matrix.index, columns=user_item_matrix.index)
user_correlation_df = user_correlation_df.fillna(0)

In [21]:
user_distance_df = pd.DataFrame(squareform(user_distance), index=user_item_matrix.index, columns=user_item_matrix.index)

In [22]:
user_similarity_df.head()

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,1.0,0.027283,0.05972,0.194395,0.12908,0.128152,0.158744,0.136968,0.064263,0.016875,...,0.080554,0.164455,0.221486,0.070669,0.153625,0.164191,0.269389,0.291097,0.093572,0.145321
2,0.027283,1.0,0.0,0.003726,0.016614,0.025333,0.027585,0.027257,0.0,0.067445,...,0.202671,0.016866,0.011997,0.0,0.0,0.028429,0.012948,0.046211,0.027565,0.102427
3,0.05972,0.0,1.0,0.002251,0.00502,0.003936,0.0,0.004941,0.0,0.0,...,0.005048,0.004892,0.024992,0.0,0.010694,0.012993,0.019247,0.021128,0.0,0.032119
4,0.194395,0.003726,0.002251,1.0,0.128659,0.088491,0.11512,0.062969,0.011361,0.031163,...,0.085938,0.128273,0.307973,0.052985,0.084584,0.200395,0.131746,0.149858,0.032198,0.107683
5,0.12908,0.016614,0.00502,0.128659,1.0,0.300349,0.108342,0.429075,0.0,0.030611,...,0.068048,0.418747,0.110148,0.258773,0.148758,0.106435,0.152866,0.135535,0.261232,0.060792


In [23]:
user_correlation_df.head()

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,1.0,0.231327,0.173213,-0.028917,0.192474,0.192686,0.143743,0.085477,0.177245,0.183382,...,0.0,0.05476,0.0,0.0,0.094994,0.05583,0.0,0.242771,0.171075,0.120681
2,0.231327,1.0,0.191945,0.071269,0.200526,0.158341,0.127569,0.14154,-0.021045,0.285086,...,0.0,-0.018291,0.0,0.0,0.017131,0.180565,0.0,0.11284,0.152304,0.153677
3,0.173213,0.191945,1.0,0.067143,0.370171,0.196442,0.351513,0.296897,0.275812,0.136916,...,0.0,-0.011729,0.0,0.0,0.112612,0.203605,0.0,0.109956,0.209855,0.189479
4,-0.028917,0.071269,0.067143,1.0,0.16791,0.053755,0.258075,0.148726,-0.016025,0.056,...,0.0,-0.004138,0.0,0.0,-0.012657,0.21578,0.0,0.095543,0.120668,-0.02356
5,0.192474,0.200526,0.370171,0.16791,1.0,0.215503,0.42989,0.265777,0.308085,0.110833,...,0.0,-0.011456,0.0,0.0,0.071481,0.158835,0.0,0.065282,0.215552,0.090684


In [24]:
user_distance_df.head()

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,70.436141,69.336138,78.797208,68.985506,86.752522,74.161985,68.88396,70.192592,78.574805,...,77.993589,73.082146,118.460964,74.141756,77.286804,131.552271,74.020267,99.698295,68.702256,143.376253
2,70.436141,0.0,29.457597,59.692964,32.806249,66.777616,47.662879,32.927952,32.128648,45.241021,...,45.565886,46.067885,115.192231,41.668333,53.80985,125.760685,57.395557,96.997423,29.141894,136.139634
3,69.336138,29.457597,0.0,59.114296,31.882597,66.682082,47.439435,32.19472,30.975797,45.765708,...,48.862051,45.513734,114.640743,40.786027,52.93156,125.825276,56.661274,97.194393,28.293109,137.452719
4,78.797208,59.692964,59.114296,0.0,58.034473,80.826976,66.355105,59.732738,60.282667,68.234888,...,68.359345,64.791975,109.863552,64.482556,71.156518,125.785532,71.916618,103.134621,58.591808,141.893446
5,68.985506,32.806249,31.882597,58.034473,0.0,61.049161,47.370877,26.907248,34.438351,47.518417,...,49.709154,38.052595,113.393121,37.815341,51.800097,124.34227,55.407581,95.429293,27.658633,137.403603


In [25]:
user_predicted_ratings_1 = pd.DataFrame(index=user_item_matrix.index, columns=user_item_matrix.columns)
user_predicted_ratings_2 = pd.DataFrame(index=user_item_matrix.index, columns=user_item_matrix.columns)
user_predicted_ratings_3 = pd.DataFrame(index=user_item_matrix.index, columns=user_item_matrix.columns)

In [26]:
for user in user_item_matrix.index:
    sim_scores = user_similarity_df[user]
    weighted_sum = sim_scores.values @ user_item_matrix_filled
    sim_sum = np.abs(sim_scores).sum()
    user_predicted_ratings_1.loc[user] = weighted_sum / sim_sum

In [28]:
user_predicted_ratings_1.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,1.808175,0.831877,0.423003,0.027315,0.271766,1.026076,0.317491,0.046305,0.080917,1.053528,...,0.000247,0.000212,0.000282,0.000282,0.000247,0.000282,0.000247,0.000247,0.000247,0.002937
2,1.365,0.620288,0.15023,0.009878,0.154428,0.614357,0.122923,0.025785,0.025225,0.623624,...,0.012876,0.011037,0.014716,0.014716,0.012876,0.014716,0.012876,0.012876,0.012876,0.024795
3,1.584153,0.796827,0.452255,0.022544,0.226689,1.143088,0.317018,0.059864,0.074613,0.941106,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.768037,0.747967,0.349428,0.03166,0.262678,0.923377,0.351348,0.040625,0.062458,0.929745,...,0.000597,0.000512,0.000682,0.000682,0.000597,0.000682,0.000597,0.000597,0.000597,0.004797
5,1.74818,0.910214,0.371129,0.059913,0.368868,0.869849,0.425598,0.068568,0.078188,1.216,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003591


In [29]:
for user in user_item_matrix.index:
    sim_scores = user_correlation_df[user]
    weighted_sum = sim_scores.values @ user_item_matrix_filled
    sim_sum = np.abs(sim_scores).sum()
    if sim_sum > 0:
        user_predicted_ratings_2.loc[user] = weighted_sum / sim_sum
    else:
        user_predicted_ratings_2.loc[user] = 0

In [30]:
user_predicted_ratings_2.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,1.32743,0.576657,0.400538,0.02836,0.193142,0.783765,0.275632,0.031117,0.071235,0.699074,...,0.004773,0.004091,0.005455,0.005455,0.004773,0.005455,0.004773,0.004773,0.004773,0.001555
2,1.332922,0.555271,0.328859,0.012324,0.197977,0.662324,0.255308,0.028262,0.097404,0.693928,...,0.003521,0.003018,0.004024,0.004024,0.003521,0.004024,0.003521,0.003521,0.003521,-0.00165
3,1.341055,0.566074,0.300964,0.016746,0.196545,0.650674,0.247816,0.050625,0.079413,0.683748,...,-0.000864,-0.000741,-0.000987,-0.000987,-0.000864,-0.000987,-0.000864,-0.000864,-0.000864,-0.000997
4,1.357986,0.586288,0.251771,0.009451,0.206963,0.575975,0.244019,0.031773,0.080707,0.74,...,-0.000357,-0.000306,-0.000408,-0.000408,-0.000357,-0.000408,-0.000357,-0.000357,-0.000357,0.022432
5,1.45674,0.608162,0.308476,0.026141,0.21245,0.693792,0.283012,0.044369,0.078662,0.666353,...,0.004322,0.003705,0.00494,0.00494,0.004322,0.00494,0.004322,0.004322,0.004322,0.006019


In [None]:
for user in user_item_matrix.index:
    sim_scores = user_distance_df[user]
    weighted_sum = sim_scores.values @ user_item_matrix_filled
    sim_sum = np.abs(sim_scores).sum()
    if sim_sum > 0:
        user_predicted_ratings_3.loc[user] = weighted_sum / sim_sum
    else:
        user_predicted_ratings_3.loc[user] = 0

In [None]:
def recommend_items_cosine_similarity(user_id, num_recommendations=10):
    user_ratings = user_predicted_ratings_1.loc[user_id].sort_values(ascending=False)
    already_rated = user_item_matrix.loc[user_id].dropna().index
    recommendations = user_ratings.drop(already_rated)
    return recommendations.head(num_recommendations)

In [None]:
def recommend_items_correlation(user_id, num_recommendations=10):
    user_ratings = user_predicted_ratings_2.loc[user_id].sort_values(ascending=False)
    already_rated = user_item_matrix.loc[user_id].dropna().index
    recommendations = user_ratings.drop(already_rated)
    return recommendations.head(num_recommendations)

In [None]:
def recommend_items_distance(user_id, num_recommendations=10):
    user_ratings = user_predicted_ratings_3.loc[user_id].sort_values(ascending=False)
    already_rated = user_item_matrix.loc[user_id].dropna().index
    recommendations = user_ratings.drop(already_rated)
    return recommendations.head(num_recommendations)

In [None]:
recommend_items_cosine_similarity(1, 5)

Unnamed: 0_level_0,1
movieId,Unnamed: 1_level_1
318,2.622414
589,2.06192
858,1.836914
2762,1.643315
4993,1.605043


In [None]:
recommend_items_correlation(1, 5)

Unnamed: 0_level_0,1
movieId,Unnamed: 1_level_1
318,2.182474
589,1.411039
858,1.293493
4993,1.242883
5952,1.224389


In [None]:
recommend_items_distance(1, 5)

Unnamed: 0_level_0,1
movieId,Unnamed: 1_level_1
318,2.386846
589,1.548258
858,1.482452
4993,1.464515
7153,1.371832


# Conclusion

User-based collaborative filtering is a popular technique for recommending items to users based on their similarity to other users. By leveraging similarity metrics like cosine similarity, Pearson correlation, or Euclidean distance, we can effectively predict user preferences and provide personalized recommendations. However, this technique can be computationally expensive, especially for large datasets.