<a href="https://colab.research.google.com/github/arafatro/Recommender-Sys/blob/main/02_Memory-Based_Collaborative_Filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 2: Memory-Based Collaborative Filtering

**Overview:**
- Implement user-user and item-item collaborative filtering using cosine similarity.
- Optimize calculations using sparse matrices.
- Discuss limitations and appropriate use cases for memory-based methods.

In this lab, we use the MovieLens dataset to illustrate memory-based collaborative filtering. We'll load the dataset, construct a sparse rating matrix, compute cosine similarities for both users and items, and finally discuss the limitations of these methods.


In [25]:
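import warnings

import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Silence library warnings to keep the notebook output readable
warnings.simplefilter(action='ignore')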

## Data Loading

We will load the MovieLens ratings and movies datasets from an online source. These datasets are commonly used for recommender system experiments.


In [26]:
# Load the MovieLens Datasets
ratings_url = "https://s3-us-west-2.amazonaws.com/recommender-tutorial/ratings.csv"
movies_url = "https://s3-us-west-2.amazonaws.com/recommender-tutorial/movies.csv"

ratings = pd.read_csv(ratings_url)
movies = pd.read_csv(movies_url)

print("Ratings (first 5 rows):")
display(ratings.head())

print("Movies (first 5 rows):")
display(movies.head())


Ratings (first 5 rows):


| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |


Movies (first 5 rows):


| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


## Constructing a Sparse Rating Matrix

Most users rate only a small fraction of the available movies, so the user-item rating matrix is mostly empty. Storing it in a sparse format, which keeps only the observed ratings, saves both memory and computation.
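
To make the benefit concrete, here is a minimal sketch that builds a small sparse matrix and measures how many cells are actually stored. The ratings and the name `toy` are made up for illustration and are not taken from MovieLens.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy ratings given as (row, column, value) triples
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([0, 2, 1, 3, 0])
vals = np.array([4.0, 5.0, 3.0, 2.5, 1.0])

toy = csr_matrix((vals, (rows, cols)), shape=(4, 5))

n_cells = toy.shape[0] * toy.shape[1]  # total cells in the matrix
n_stored = toy.nnz                     # explicitly stored ratings
density = n_stored / n_cells

print(f"stored ratings: {n_stored} of {n_cells} cells")
print(f"density: {density:.2%}")
```

The same expression, `X.nnz / (X.shape[0] * X.shape[1])`, should give the density of the lab's rating matrix `X` once it has been built.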


In [27]:
# Create a Sparse Rating Matrix
# Map userId and movieId to continuous indices
N = ratings['userId'].nunique()
M = ratings['movieId'].nunique()

user_mapper = {user: idx for idx, user in enumerate(ratings["userId"].unique())}
movie_mapper = {movie: idx for idx, movie in enumerate(ratings["movieId"].unique())}

user_index = [user_mapper[i] for i in ratings['userId']]
movie_index = [movie_mapper[i] for i in ratings['movieId']]

# Build a sparse matrix where rows represent movies and columns represent users
X = csr_matrix((ratings["rating"], (movie_index, user_index)), shape=(M, N))

# For demonstration only: convert the sparse matrix to a dense DataFrame (avoid this on large datasets)
X_df = pd.DataFrame(X.toarray())
print("Movie-User Ratings Matrix (first 5 rows):")
display(X_df.head())


Movie-User Ratings Matrix (first 5 rows):


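| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 600 | 601 | 602 | 603 | 604 | 605 | 606 | 607 | 608 | 609 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.5 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 4.0 | 3.0 | 4.0 | 2.5 | 4.0 | 2.5 | 3.0 | 5.0 |
| 1 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 2 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3.0 | 4.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 |
| 3 | 5.0 | 0.0 | 0.0 | 2.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | ... | 4.0 | 5.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 4.5 | 0.0 | 5.0 |
| 4 | 5.0 | 0.0 | 0.0 | 0.0 | 4.0 | 1.0 | 4.5 | 5.0 | 0.0 | 0.0 | ... | 5.0 | 5.0 | 0.0 | 0.0 | 0.0 | 4.5 | 0.0 | 4.5 | 0.0 | 4.0 |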


## User-User and Item-Item Collaborative Filtering

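We now compute cosine similarity for both users and items. Because `X` stores movies as rows and users as columns, item-item similarity is computed on `X` directly, while user-user similarity is computed on its transpose.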
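
As a quick reminder of what is being computed, the cosine similarity of two rating vectors is their dot product divided by the product of their Euclidean norms. The sketch below checks this by hand against scikit-learn on two made-up vectors (`a` and `b` are illustrative, not real users).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical rating vectors; 0.0 stands for "not rated"
a = np.array([4.0, 0.0, 5.0, 0.0])
b = np.array([5.0, 0.0, 4.0, 1.0])

# Dot product divided by the product of the Euclidean norms
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn expects 2-D inputs, one row per vector
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

print(f"manual:  {manual:.4f}")
print(f"library: {library:.4f}")
```

Note that unrated entries enter the calculation as zeros, so two users (or movies) with little overlap in ratings end up with a low similarity regardless of how they rated the items they share.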


In [28]:
# Compute Cosine Similarity for Users and Items

# --- User-User Collaborative Filtering ---
# Transpose the rating matrix so that rows represent users
user_matrix = X.T  # shape: (number of users, number of movies)
user_similarity = cosine_similarity(user_matrix)
print("User-User Similarity Matrix shape:", user_similarity.shape)

# Display a portion of the user similarity matrix
print("User-User Similarity (first 5 users):")
display(pd.DataFrame(user_similarity).iloc[:5, :5])

User-User Similarity Matrix shape: (610, 610)
User-User Similarity (first 5 users):


| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1.0 | 0.027283 | 0.05972 | 0.194395 | 0.12908 |
| 1 | 0.027283 | 1.0 | 0.0 | 0.003726 | 0.016614 |
| 2 | 0.05972 | 0.0 | 1.0 | 0.002251 | 0.00502 |
| 3 | 0.194395 | 0.003726 | 0.002251 | 1.0 | 0.128659 |
| 4 | 0.12908 | 0.016614 | 0.00502 | 0.128659 | 1.0 |


In [29]:
# --- Item-Item Collaborative Filtering ---
item_similarity = cosine_similarity(X)
print("Item-Item Similarity Matrix shape:", item_similarity.shape)

Item-Item Similarity Matrix shape: (9724, 9724)


In [30]:
# Display a portion of the item similarity matrix
print("Item-Item Similarity (first 5 movies):")
display(pd.DataFrame(item_similarity).iloc[:5, :5])

Item-Item Similarity (first 5 movies):


| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1.0 | 0.296917 | 0.376316 | 0.437659 | 0.441003 |
| 1 | 0.296917 | 1.0 | 0.284257 | 0.279614 | 0.204892 |
| 2 | 0.376316 | 0.284257 | 1.0 | 0.463926 | 0.45685 |
| 3 | 0.437659 | 0.279614 | 0.463926 | 1.0 | 0.578066 |
| 4 | 0.441003 | 0.204892 | 0.45685 | 0.578066 | 1.0 |
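
A similarity matrix on its own does not yet produce a rating estimate. One common way to use it, sketched here on made-up data, is to predict a user's rating for a movie as the similarity-weighted average of that user's ratings on other movies. The matrix `R` and the helper `predict_item_based` are hypothetical names introduced for this illustration; they are not part of the lab's code.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy matrix: rows = items, columns = users, 0 = not rated
R = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [1.0, 0.0, 5.0],
])
S = cosine_similarity(R)  # item-item similarities

def predict_item_based(R, S, item, user):
    """Similarity-weighted average of the user's ratings on other items."""
    rated = np.flatnonzero(R[:, user])  # items this user has rated
    rated = rated[rated != item]        # never use the target item itself
    weights = S[item, rated]
    if weights.sum() == 0:
        return float("nan")             # no usable neighbors
    return float(weights @ R[rated, user] / weights.sum())

pred = predict_item_based(R, S, item=0, user=2)
print(f"Predicted rating of user 2 for item 0: {pred:.2f}")
```

In this toy case item 0 resembles item 1 far more than item 2, so the prediction is pulled toward the user's low rating of item 1. On the lab's data the same idea would apply to `X.toarray()` and `item_similarity`, usually restricted to the top-k most similar items.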


## Finding Similar Items

We define a function that returns the IDs of movies similar to a given movie based on cosine similarity.


In [31]:
# Function to Find Similar Movies
from sklearn.neighbors import NearestNeighbors

# Create an inverse mapping for movie indices
movie_inv_mapper = {idx: movie for idx, movie in enumerate(ratings["movieId"].unique())}

def find_similar_movies(movie_id, X, k=10):
    """
    Given a movie_id, return the IDs of the k most similar movies.

    X is the movie-by-user rating matrix. Neighbors are ranked by cosine
    distance (1 - cosine similarity), so the closest vectors come first.
    """
    neighbor_ids = []
    movie_ind = movie_mapper[movie_id]
    movie_vec = X[movie_ind]

    # Brute-force nearest-neighbor search with cosine distance;
    # request k+1 neighbors because the query movie itself is returned too
    kNN = NearestNeighbors(n_neighbors=k+1, algorithm="brute", metric="cosine")
    kNN.fit(X)
    movie_vec = movie_vec.reshape(1, -1)
    neighbor = kNN.kneighbors(movie_vec, return_distance=False)

    # Skip position 0, which is expected to be the movie itself (distance 0)
    for i in range(1, k+1):
        n = neighbor.item(i)
        neighbor_ids.append(movie_inv_mapper[n])
    return neighbor_ids

# Map movieId to movie title for easy reference
movie_titles = dict(zip(movies['movieId'], movies['title']))

# Example: Find similar movies for a specific movie (e.g., movie_id 586)
selected_movie_id = 586
selected_movie_title = movie_titles[selected_movie_id]
print(f"Since you watched '{selected_movie_title}', you might also like:")

similar_ids = find_similar_movies(selected_movie_id, X, k=10)
for mid in similar_ids:
    print(movie_titles[mid])


Since you watched 'Home Alone (1990)', you might also like:
Mrs. Doubtfire (1993)
Lion King, The (1994)
Pretty Woman (1990)
Jurassic Park (1993)
Jumanji (1995)
Speed (1994)
Forrest Gump (1994)
Aladdin (1992)
Mask, The (1994)
Indiana Jones and the Temple of Doom (1984)
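
If an item-item similarity matrix has already been computed, the same kind of lookup can be done by sorting one row of that matrix instead of fitting a nearest-neighbor model. The sketch below shows this on made-up data; `R` and `top_k_similar` are hypothetical names, and the function returns row indices rather than movie IDs.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy matrix: rows = items, columns = users, 0 = not rated
R = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [1.0, 0.0, 5.0],
    [0.0, 0.0, 4.0],
])
S = cosine_similarity(R)

def top_k_similar(S, idx, k):
    """Return the row indices of the k items most similar to item idx."""
    scores = S[idx].copy()
    scores[idx] = -np.inf              # exclude the item itself by index
    order = np.argsort(scores)[::-1]   # highest similarity first
    return order[:k].tolist()

print(top_k_similar(S, 0, 2))
print(top_k_similar(S, 2, 2))
```

Excluding the query item by index, rather than assuming it is the first neighbor returned, also behaves predictably when another item has an identical rating vector. In the lab's setting, the returned indices would still need to be mapped back through `movie_inv_mapper`.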


## Discussion on Limitations and When to Use Memory-Based Methods

**Limitations:**
- **Scalability:** Memory-based methods can be computationally expensive on large datasets, because both computing and storing a full similarity matrix grow quadratically with the number of users or items.
- **Sparsity:** These methods may suffer from data sparsity, leading to unreliable similarity measures if there is insufficient overlap between user/item interactions.
- **Cold Start:** New users or items with few interactions may not have reliable similarity scores.
- **Noise Sensitivity:** Noisy data (e.g., inconsistent ratings) can impact the accuracy of similarity calculations.
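
To put the scalability point in numbers, the sketch below estimates the size of a dense item-item matrix with the shape reported earlier in this lab (9724 × 9724, assuming 8-byte floats), and shows that scikit-learn's `cosine_similarity` can return a sparse result via `dense_output=False` when the input is sparse. The matrix `toy` is made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.metrics.pairwise import cosine_similarity

# Rough memory estimate for a dense 9724 x 9724 float64 similarity matrix
n_items = 9724
dense_bytes = n_items ** 2 * 8
print(f"dense item-item matrix: about {dense_bytes / 1e6:.0f} MB")

# Hypothetical toy rating matrix in sparse format
toy = csr_matrix(np.array([
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 1.0],
]))

dense_sim = cosine_similarity(toy)                       # NumPy array
sparse_sim = cosine_similarity(toy, dense_output=False)  # SciPy sparse matrix

print(type(dense_sim).__name__, issparse(sparse_sim))
print("stored similarities:", sparse_sim.nnz, "of", dense_sim.size)
```

A sparse result only saves memory when many pairs have exactly zero similarity, that is, when they share no raters, so how much this helps depends on the data.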

**When to Use Memory-Based Methods:**
- **Smaller Datasets:** They work well when the dataset is of moderate size and sparsity is less of an issue.
- **Real-Time Recommendations:** Once similarities have been precomputed, memory-based approaches can serve fast, on-the-fly recommendations, provided user-item interactions are stable enough that the similarities rarely need recomputing.
- **Baseline Models:** They serve as strong baselines against which to compare more complex, model-based recommendation techniques.

By understanding these limitations and use cases, you can better decide when to apply memory-based collaborative filtering and when to consider more scalable or hybrid approaches.


## Summary

In this lab session, we:
- Loaded and preprocessed the MovieLens datasets.
- Constructed a sparse movie-by-user rating matrix for collaborative filtering.
- Computed cosine similarity for both user-user and item-item collaborative filtering.
- Implemented a function to find similar movies.
- Discussed the limitations of memory-based methods and appropriate use cases.

Next Lab Preview: Lab 3 will cover Matrix Factorization techniques such as SVD and ALS for building recommendation systems.

Happy coding!
