# User-User Collaborative Filtering

A simple user-user collaborative filtering recommender that uses the Pearson Correlation Coefficient to find users with similar tastes.

## Pearson Correlation Coefficient
A measure of the linear relationship between two variables. It ranges from -1 to 1: values near 1 mean the two variables rise together, values near -1 mean one rises as the other falls, and values near 0 mean there is little or no linear relationship. **The Formula:** <br>
$
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$<br>
- where $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$<br>
- $X_i$ and $Y_i$ are individual data points in $X$ and $Y$<br>
- $n$ is the number of paired data points (here, the movies both users have rated)
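
To make the formula concrete, here is a minimal sketch that evaluates it term by term with NumPy and cross-checks the result against `np.corrcoef`. The helper name `pearson` is hypothetical, and the two short lists are the ratings Alice and Bob share in the data defined later in this notebook (Star Wars, Avatar, The Avengers).

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # X_i - mean(X)
    dy = y - y.mean()  # Y_i - mean(Y)
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    # The coefficient is undefined when either variable has zero variance
    return float((dx * dy).sum() / denom) if denom != 0 else float('nan')

alice = [5, 2, 4]  # Star Wars, Avatar, The Avengers
bob = [4, 1, 3]    # the same three movies

r = pearson(alice, bob)
r_numpy = np.corrcoef(alice, bob)[0, 1]  # NumPy's built-in, for comparison
print(r, r_numpy)
```

Bob rates each shared movie exactly one point lower than Alice, so the two sets of deviations from the mean are identical and the coefficient is 1 (up to floating-point rounding).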

In [19]:
import pandas as pd
import numpy as np

In [20]:
user_ratings = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'],
    'Star Wars': [5, 4, np.nan, 1, np.nan, 2],
    'The Matrix': [3, np.nan, 2, np.nan, 4, 5],
    'Avatar': [2, 1, 5, 4, 3, np.nan],
    'Inception': [np.nan, 5, 4, 3, 2, 1],
    'The Avengers': [4, 3, np.nan, 2, 1, np.nan]
})

# Set 'User' as the index
user_ratings.set_index('User', inplace=True)

In [21]:
# Transpose the DataFrame
user_ratings_T = user_ratings.transpose()

# Compute Pearson Correlation between users
correlation_matrix = user_ratings_T.corr(method='pearson')

# Get similarity of all users to 'Alice'
alice_similarity = correlation_matrix['Alice']

# Sort by similarity
alice_similarity = alice_similarity.sort_values(ascending=False)
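
`DataFrame.corr` computes each pairwise correlation from only the movies both users have rated, so some similarities rest on very few data points; with just two shared movies, the coefficient is forced to be exactly +1 or -1. The sketch below is one possible way to inspect this, not part of the original pipeline: it counts co-rated movies per user and uses the `min_periods` parameter of `corr` to blank out pairs with too little overlap. The threshold of 3 and the names `common_counts` and `strict_similarity` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same ratings as in the notebook, rebuilt so this block runs on its own
user_ratings = pd.DataFrame({
    'Star Wars': [5, 4, np.nan, 1, np.nan, 2],
    'The Matrix': [3, np.nan, 2, np.nan, 4, 5],
    'Avatar': [2, 1, 5, 4, 3, np.nan],
    'Inception': [np.nan, 5, 4, 3, 2, 1],
    'The Avengers': [4, 3, np.nan, 2, 1, np.nan],
}, index=['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'])

# Number of movies each user has rated in common with Alice
rated = user_ratings.notna().astype(int)
common_counts = rated.dot(rated.T)['Alice']

# Require at least 3 co-rated movies; pairs with fewer become NaN
strict_similarity = user_ratings.T.corr(method='pearson', min_periods=3)['Alice']

print(pd.DataFrame({'common_movies': common_counts,
                    'similarity': strict_similarity}))
```

Users whose similarity comes out as NaN would then need to be skipped or dropped before being used as weights.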

In [22]:
# Find movies that Alice hasn't rated yet
alice_unrated_movies = user_ratings.columns[user_ratings.loc['Alice'].isna()].tolist()

# Recommend movies based on what similar users have liked
recommended_movies = {}
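for movie in alice_unrated_movies:
    # Users who have rated this movie (Alice is excluded defensively)
    other_users = user_ratings.index[user_ratings[movie].notna()].tolist()
    if 'Alice' in other_users:
        other_users.remove('Alice')
    # Signed sum of similarities; negative correlations are included as-is
    sim_sum = np.sum([alice_similarity[user] for user in other_users])
    # Each rating is weighted by the rater's similarity to Alice
    weighted_ratings = np.sum([alice_similarity[user] * user_ratings.loc[user, movie] for user in other_users])
    # Similarity-weighted score; falls back to 0 when the similarities sum to zero
    recommended_movies[movie] = weighted_ratings / sim_sum if sim_sum != 0 else 0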

# Sort movies by weighted rating
recommended_movies = sorted(recommended_movies.items(), key=lambda x: x[1], reverse=True)

recommended_movies

[('Inception', 1.6233030277982337)]

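**Inception** is the only movie **Alice** has not rated, so it is the only candidate for recommendation. Its score, **1.623..**, is a similarity-weighted average of the ratings other users gave **Inception**, where each rating is weighted by that user's Pearson Correlation Coefficient with **Alice**. Because Pearson correlations can be negative, the weights can partly cancel each other, so the score is better read as a rough ranking signal than as a predicted rating on the 1-5 scale.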
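
A common variant, which is not the method used in this notebook, mean-centers each rating and normalizes by the sum of absolute similarities, so negative correlations pull the prediction below the target user's average instead of flipping the sign of the denominator. The sketch below shows that variant under stated assumptions: the function name `predict_mean_centered` is hypothetical, and falling back to the target user's own mean when no usable neighbors exist is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Same ratings as in the notebook, rebuilt so this block runs on its own
ratings = pd.DataFrame({
    'Star Wars': [5, 4, np.nan, 1, np.nan, 2],
    'The Matrix': [3, np.nan, 2, np.nan, 4, 5],
    'Avatar': [2, 1, 5, 4, 3, np.nan],
    'Inception': [np.nan, 5, 4, 3, 2, 1],
    'The Avengers': [4, 3, np.nan, 2, 1, np.nan],
}, index=['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'])

def predict_mean_centered(ratings, target, movie):
    """Mean-centered, similarity-weighted prediction (hypothetical helper)."""
    user_means = ratings.mean(axis=1)  # each user's average rating, NaNs skipped
    sims = ratings.T.corr(method='pearson')[target]
    # Neighbors: everyone except the target who has rated the movie
    raters = ratings.index[ratings[movie].notna()].difference([target])
    sims = sims[raters].dropna()
    norm = sims.abs().sum()
    if norm == 0:
        # Assumption: with no usable neighbors, fall back to the target's mean
        return float(user_means[target])
    # How far each neighbor's rating sits above or below that neighbor's mean
    deviations = ratings.loc[sims.index, movie] - user_means[sims.index]
    return float(user_means[target] + (sims * deviations).sum() / norm)

pred = predict_mean_centered(ratings, 'Alice', 'Inception')
print(round(pred, 3))
```

Dividing by the sum of absolute similarities keeps the adjustment bounded by the largest neighbor deviation, which makes the result easier to interpret as a rating than the signed-sum score above.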