ID: V01053626

Name: Newsha Bahardoost

Task 2 comment

\documentclass{article}
\usepackage{amsmath}

\begin{document}

\section*{Task 2: Derivation of \( \Gamma \) for User-User and Item-Item Collaborative Filtering}

Collaborative filtering works by computing similarity scores between users or items and then using these scores to generate recommendations.

\subsection*{User-User Collaborative Filtering}

The cosine similarity between users is given by:

\[ S_{\text{user}} = P^{-\frac{1}{2}} R R^\top P^{-\frac{1}{2}} \]

where:

\begin{itemize}
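    \item \( R \) is the user-item rating matrix, with one row per user and one column per item; \( R_{ij} = 1 \) if user \( i \) interacted with item \( j \) and \( 0 \) otherwise.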
    \item \( P \) is the diagonal matrix of user degrees (i.e., the number of items each user has interacted with).
    \item \( P^{-\frac{1}{2}} \) is the inverse square root of \( P \), which normalizes the similarities.
\end{itemize}
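
As a quick check that this product is a cosine similarity, one can expand a single entry. The sketch below assumes \( R \) is binary (entries 0 or 1), so that the squared norm of a user's row equals that user's degree; the row vectors \( r_i \) are notation introduced here for illustration only.

\[ (S_{\text{user}})_{ij} = \frac{(R R^\top)_{ij}}{\sqrt{P_{ii}} \, \sqrt{P_{jj}}} = \frac{r_i \cdot r_j}{\lVert r_i \rVert \, \lVert r_j \rVert}, \qquad \lVert r_i \rVert^2 = \sum_k R_{ik}^2 = \sum_k R_{ik} = P_{ii} \]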

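The recommendation matrix for user-user filtering weights each user's row of \( R \) by that user's similarity to the target user. Because \( S_{\text{user}} \) is indexed by users on both sides and the rows of \( R \) are indexed by users, \( S_{\text{user}} \) must multiply \( R \) from the left:

\[ \Gamma_{\text{user}} = S_{\text{user}} R = P^{-\frac{1}{2}} R R^\top P^{-\frac{1}{2}} R \]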

\subsection*{Item-Item Collaborative Filtering}

Similarly, for item-item filtering, the cosine similarity between items is given by:

\[ S_{\text{item}} = Q^{-\frac{1}{2}} R^\top R Q^{-\frac{1}{2}} \]

where:

\begin{itemize}
    \item \( Q \) is the diagonal matrix of item degrees (i.e., the number of users who interacted with each item).
    \item \( Q^{-\frac{1}{2}} \) is the inverse square root of \( Q \).
\end{itemize}

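The recommendation matrix for item-item filtering scores each item by its similarity to the items a user has already interacted with. Because \( S_{\text{item}} \) is indexed by items on both sides and the columns of \( R \) are indexed by items, \( S_{\text{item}} \) must multiply \( R \) from the right:

\[ \Gamma_{\text{item}} = R S_{\text{item}} = R Q^{-\frac{1}{2}} R^\top R Q^{-\frac{1}{2}} \]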

\subsection*{Explanation of the Equations}

\begin{itemize}
    \item \textbf{User-user filtering:} We find the users most similar to the target user and score each item by the summed similarity of the users who interacted with it.
    \item \textbf{Item-item filtering:} We score each item by its summed similarity to the items the target user has already interacted with.
\end{itemize}

Thus, the final formulas for \( \Gamma \) are:

\[ \Gamma_{\text{user}} = S_{\text{user}} R = P^{-\frac{1}{2}} R R^\top P^{-\frac{1}{2}} R \]

\[ \Gamma_{\text{item}} = R S_{\text{item}} = R Q^{-\frac{1}{2}} R^\top R Q^{-\frac{1}{2}} \]

\end{document}

In [1]:
import numpy as np

# Task 1: Compute matrices P and Q
# Read the user-show matrix (one row per user, one column per show)
with open('/content/p2-user-shows.txt', 'r') as f:
    R = np.array([[int(x) for x in line.strip().split()] for line in f])

# Compute user degrees (P) and item degrees (Q)
P = R.sum(axis=1)  # Sum each row (users)
Q = R.sum(axis=0)  # Sum each column (items)

# Helper to print the leading 5x5 block of a diagonal degree matrix
def format_matrix(mat, name):
    """Format the first five entries of the degree vector `mat` as a 5x5 diagonal matrix."""
    matrix_str = np.array2string(
        np.diag(mat[:5]),
        formatter={'int': lambda x: f"{x}."},
        separator='  ',
        threshold=np.inf,
        max_line_width=np.inf
    ).replace('\n', ' ')
    return f"{name} = {matrix_str}"

# Print matrices P and Q in the correct format
print(format_matrix(P, "P"))
print(format_matrix(Q, "Q"))

P = [[35.  0.  0.  0.  0.]  [0.  26.  0.  0.  0.]  [0.  0.  44.  0.  0.]  [0.  0.  0.  17.  0.]  [0.  0.  0.  0.  21.]]
Q = [[1089.  0.  0.  0.  0.]  [0.  3350.  0.  0.  0.]  [0.  0.  3187.  0.  0.]  [0.  0.  0.  1212.  0.]  [0.  0.  0.  0.  1438.]]


In [7]:
# ========== TASK 3 ========== #
# Load show names
with open('/content/p2-shows.txt', 'r') as f:
    shows = [line.strip() for line in f]

# Compute user-user cosine similarity matrix (S_user)
P_sqrt_inv = np.diag(1.0 / np.sqrt(P + 1e-10))  # Small epsilon avoids division by zero
S_user = P_sqrt_inv @ R @ R.T @ P_sqrt_inv

# Identify the top 5 recommended TV shows for user Alex (user 499)
user_id = 499  # Alex's user index
user_scores = S_user[user_id, :] @ R  # Alex's row of Gamma_user = S_user @ R

# Get top 5 show indices
top_5_indices = np.argsort(-user_scores)[:5]

print("Top 5 TV shows for Alex (user-user):")
for idx in top_5_indices:
    print(f"- {shows[idx]}")

Top 5 TV shows for Alex (user-user):
- "FOX 28 News at 10pm"
- "10TV News HD at 11pm"
- "Family Guy"
- "10TV Eyewitness News at 5:00"
- "10TV Eyewitness News at 6:00"
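
The matrix formula used above can be sanity-checked on a small example. The sketch below is illustrative only: `R_toy`, `user_user_gamma`, and `cosine` are hypothetical names, the toy matrix is not taken from the assignment files, and the data is assumed to be binary. It compares the matrix form of the user-user similarity with a direct pairwise cosine computation.

```python
import numpy as np

# Hypothetical toy data: 3 users x 4 shows, binary entries (not from the assignment files)
R_toy = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
])


def user_user_gamma(R):
    """Return (S_user, Gamma_user), where Gamma_user = S_user @ R."""
    P = R.sum(axis=1).astype(float)          # user degrees
    scale = np.zeros_like(P)
    nonzero = P > 0
    scale[nonzero] = 1.0 / np.sqrt(P[nonzero])  # users with no shows keep scale 0
    # Row/column scaling is equivalent to P^{-1/2} (R R^T) P^{-1/2}
    S = scale[:, None] * (R @ R.T) * scale[None, :]
    return S, S @ R


def cosine(u, v):
    """Plain cosine similarity between two nonzero vectors."""
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


S_toy, Gamma_toy = user_user_gamma(R_toy)
S_explicit = np.array([[cosine(a, b) for b in R_toy] for a in R_toy])
matches = np.allclose(S_toy, S_explicit)

print("matrix formula matches explicit cosine:", matches)
print("Gamma_user shape (users x shows):", Gamma_toy.shape)
```

Scaling rows and columns by broadcasting avoids building the full diagonal matrix, which may matter when the number of users is large.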


In [9]:
# ========== TASK 4 ========== #
# Compute item-item similarity matrix (S_item)
RTR = R.T @ R  # Compute item co-occurrence matrix

# Ensure Q values are nonzero to avoid division by zero
Q_safe = Q.copy()
Q_safe[Q_safe == 0] = 1  # Avoid division by zero
Q_inv_sqrt = 1 / np.sqrt(Q_safe)
Q_inv_sqrt_matrix = np.diag(Q_inv_sqrt)

S_item = Q_inv_sqrt_matrix @ RTR @ Q_inv_sqrt_matrix

# Define Alex's index (User 499)
alex_idx = 499

# Identify Alex's watched shows (all watched items)
alex_watched = np.where(R[alex_idx, :] == 1)[0]

# Score every show by summing its similarity to the shows Alex watched;
# for binary R this equals Alex's row of Gamma_item = R @ S_item
scores_item = S_item[alex_watched].sum(axis=0)

# Get top 5 recommended shows for item-item filtering
top5_item_indices = np.argsort(-scores_item)[:5]

print("Top 5 TV shows for Alex (item-item):")
for idx in top5_item_indices:
    print(f"- {shows[idx]}")


Top 5 TV shows for Alex (item-item):
- "10TV Eyewitness News at 5:00"
- "FOX 28 News at 10pm"
- "10TV Eyewitness News at 6:00"
- "10TV News HD at 11pm"
- "Family Guy"
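
The cell above ranks all shows. If the ranking should instead be limited to a candidate subset such as the first 100 columns (an assumption about the task, since the code above applies no such restriction), a sketch along the following lines could be used. The helper `top_k_in_first_n` and the toy matrix are hypothetical. The same snippet also checks, on toy data, that summing the rows of the item similarity matrix over a user's watched shows gives that user's row of `R @ S_item`.

```python
import numpy as np


def top_k_in_first_n(scores, k=5, n_candidates=100):
    """Indices of the k largest scores among the first n_candidates entries.

    A stable sort keeps the smaller index first when scores tie.
    """
    candidates = np.asarray(scores, dtype=float)[:n_candidates]
    order = np.argsort(-candidates, kind="stable")
    return order[:k]


# Hypothetical toy data: 3 users x 4 shows, binary entries
R_toy = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
])

Q_toy = R_toy.sum(axis=0).astype(float)                   # item degrees
scale = 1.0 / np.sqrt(np.where(Q_toy > 0, Q_toy, 1.0))    # guard against empty columns
S_item_toy = scale[:, None] * (R_toy.T @ R_toy) * scale[None, :]
Gamma_item_toy = R_toy @ S_item_toy

user = 0
watched = np.where(R_toy[user] == 1)[0]
shortcut = S_item_toy[watched].sum(axis=0)                # row-sum form used in the cell above
same = np.allclose(shortcut, Gamma_item_toy[user])

print("row-sum shortcut equals row of R @ S_item:", same)
print("top 2 among first 3 scores:", top_k_in_first_n([0.2, 0.9, 0.9, 0.5], k=2, n_candidates=3))
```

With the notebook's variables, a call such as `top_k_in_first_n(scores_item)` would return indices into the first 100 shows, which can be passed to `shows` for display.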
