# Research Findings

Our planned approach focuses on two complementary recommendation strategies. First, we will implement a **content-based model** using *tag genome vectors*, which provide a high-dimensional semantic profile for each movie based on standardized tags. By averaging the tag vectors of movies a user has rated highly, we can construct a personalized user profile vector. We will then compute *cosine similarity* between this profile and all other movie vectors to identify and recommend the most similar unseen movies. This method is both interpretable and practical, leveraging the rich feature space available in genome-scores.csv and genome-tags.csv.

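The content-based pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the toy `movie_vectors` matrix stands in for tag relevance scores that would be pivoted from `genome-scores.csv`, and the `recommend` helper and its `top_n` parameter are hypothetical names chosen for this example.

```python
import numpy as np

# Toy stand-in for the tag genome: one row per movie, one column per tag,
# values are relevance scores in [0, 1]. Assumption: the real matrix would
# be built by pivoting genome-scores.csv into a movie x tag table.
movie_vectors = np.array([
    [0.9, 0.1, 0.0],   # movie 0: strongly "sci-fi"
    [0.8, 0.2, 0.1],   # movie 1: mostly "sci-fi"
    [0.1, 0.9, 0.2],   # movie 2: strongly "romance"
    [0.0, 0.2, 0.9],   # movie 3: strongly "horror"
])

def recommend(liked_ids, movie_vectors, top_n=2):
    """Rank unseen movies by cosine similarity to the mean vector of liked movies."""
    # User profile = average tag vector of the movies the user rated highly.
    profile = movie_vectors[liked_ids].mean(axis=0)
    norms = np.linalg.norm(movie_vectors, axis=1) * np.linalg.norm(profile)
    # Guard against division by zero for all-zero vectors.
    scores = movie_vectors @ profile / np.where(norms == 0, 1.0, norms)
    scores[liked_ids] = -np.inf   # exclude movies the user has already rated
    return np.argsort(scores)[::-1][:top_n]

recommendations = recommend([0], movie_vectors)
print(recommendations)
```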
In parallel, we will evaluate **collaborative filtering models**, specifically *Singular Value Decomposition (SVD)* and *K-Nearest Neighbors (KNN)*, using the user–item ratings matrix. These models leverage patterns in user behavior to predict unseen ratings, allowing for personalized recommendations based on collective preferences rather than content. SVD learns latent factors that represent underlying user and movie traits, while KNN identifies users or items with similar rating histories. Evaluating both approaches will allow us to compare their effectiveness and potentially combine them in a hybrid model during later stages of the project.

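To make the idea of latent factors concrete, the sketch below rebuilds a small ratings matrix from its strongest singular factor. It is a deliberate simplification: missing ratings are filled with each user's mean before a plain `numpy.linalg.svd`, whereas the "SVD" models shipped in recommender libraries are usually trained only on observed ratings. The matrix values, the `svd_predict` name, and the choice of `k=1` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 4-user x 4-movie ratings matrix; np.nan marks an unrated movie.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 1.0, np.nan],
    [1.0, np.nan, 5.0, 4.0],
    [np.nan, 1.0, 4.0, 5.0],
])

def svd_predict(ratings, k=1):
    """Center on user means, keep the top-k singular factors, and add the means back."""
    observed = ~np.isnan(ratings)
    user_means = np.nanmean(ratings, axis=1, keepdims=True)
    # Unrated cells become 0 after centering, i.e. "same as this user's average".
    centered = np.where(observed, ratings - user_means, 0.0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Low-rank reconstruction: each user and movie is described by k latent factors.
    low_rank = (u[:, :k] * s[:k]) @ vt[:k, :]
    return low_rank + user_means

predicted = svd_predict(ratings)
print(np.round(predicted, 2))
```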
# Recommender Systems Overview
## Content-Based Filtering (CBF):

*   **Focus:** Item attributes & your past preferences.
*   **Logic:** "You liked items with *these features* (e.g., genre, actors, keywords), so here are *more items with similar features*."
*   **Relies on:** Analyzing the content/features of items and a user's profile (their preferences for these features).
*   **Example:** If you liked sci-fi movies starring Harrison Ford, it recommends more sci-fi movies, especially those with Harrison Ford.
*   **Pros:** Good for recommending niche items, handles new items well (if features are known).
*   **Cons:** Needs good item feature data, limited serendipity (recommends "more of the same").


## Collaborative Filtering (CF):

*   **Focus:** User-item interaction patterns (e.g., ratings, purchases).
*   **Logic:** "Users *similar to you* also liked these items" (user-based) OR "People who liked *this item* also liked *these other items*" (item-based).
*   **Relies on:** Finding patterns in the collective behavior of many users (the "wisdom of the crowd").
*   **Example:** Users A and B have similar movie rating histories. User A recently liked a new movie. CF might recommend that movie to User B.
*   **Pros:** No need for item features, can find novel/serendipitous recommendations.
*   **Cons:** Suffers from the "cold-start" problem (new users/items have no interaction data) and from data sparsity.

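The user-based logic above can be illustrated with a small neighbor-weighted prediction. This is a sketch under simplifying assumptions: the ratings are invented, unrated movies are encoded as 0 (so they also enter the similarity computation), and `predict_user_knn` is a hypothetical helper rather than any library's API.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],   # user 0 (target): has not rated movie 2
    [5, 5, 2, 1],   # user 1: tastes close to user 0
    [1, 2, 5, 5],   # user 2: roughly opposite tastes
    [4, 4, 1, 2],   # user 3: tastes close to user 0
], dtype=float)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_user_knn(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [u for u in range(len(ratings)) if u != user and ratings[u, item] > 0]
    sims = {u: cosine(ratings[user], ratings[u]) for u in candidates}
    neighbors = sorted(candidates, key=sims.get, reverse=True)[:k]
    total = sum(sims[u] for u in neighbors)
    if total == 0:
        return None   # no usable neighbors: the cold-start case
    return sum(sims[u] * ratings[u, item] for u in neighbors) / total

prediction = predict_user_knn(ratings, user=0, item=2)
print(prediction)
```

Here the two nearest neighbors of user 0 both rated movie 2 poorly, so the predicted rating is low as well. An item-based variant would apply the same idea to the columns of the matrix instead of the rows.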

## Cosine Similarity (CS):
*   **Focus:** Measuring the angle between two vectors, e.g., users or items represented in a high-dimensional space (such as genre vectors or user ratings).
*   **Logic:** Two items/users are considered similar if they point in the same direction, regardless of magnitude (how many total ratings or genres they have).
*   **Relies on:** Representing items/users as vectors and calculating the cosine of the angle between them.
*   **Example:** If two movies share similar genres (e.g., "Action|Thriller" vs. "Action|Crime|Thriller"), their genre vectors form a small angle, and their cosine similarity is high (about 0.82 for binary genre vectors).
*   **Pros:** Insensitive to absolute rating counts; easy to implement.
*   **Cons:** Ignores magnitude differences (a user who rates everything 2 looks identical to one who rates everything 4); still sensitive to sparse overlap (if two vectors share no common non-zero elements, similarity is 0).

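The properties listed above can be checked directly. The sketch below uses binary genre vectors over an assumed three-genre vocabulary (Action, Crime, Thriller); the `cosine_similarity` helper is written out by hand for illustration and is not taken from a particular library.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|), defined as 0.0 if either vector is all zeros."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Columns: Action, Crime, Thriller
action_thriller = [1, 0, 1]
action_crime_thriller = [1, 1, 1]

# Shared genres give a small angle: 2 / (sqrt(2) * sqrt(3)), roughly 0.816.
sim_genres = cosine_similarity(action_thriller, action_crime_thriller)

# Scaling a vector does not change its direction, so similarity stays at 1.
sim_scaled = cosine_similarity([1, 2, 3], [2, 4, 6])

# No overlapping non-zero entries means the vectors are orthogonal.
sim_disjoint = cosine_similarity([1, 0, 0], [0, 1, 0])

print(sim_genres, sim_scaled, sim_disjoint)
```

The second and third cases correspond to the magnitude-invariance and sparse-overlap points in the list above.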

## In essence:

*   **CBF = "Show me more of what I've liked based on its characteristics."**
*   **CF = "Show me what people like me have liked."**
*   **CS = "Show me what's most similar, no matter how big or small, as long as it points in the same direction."**

## BERT4Rec
BERT4Rec is a sequential recommendation model that adapts the BERT architecture from NLP to predict a user's next item (e.g., movie) by modeling their historical interactions as a sequence. It uses a bidirectional Transformer to capture complex dependencies in user behavior by randomly masking some items in the sequence and training the model to predict these masked items based on the surrounding context. This allows BERT4Rec to learn rich, context-aware representations of user preferences, making it effective for personalized recommendations.

See: https://medium.com/data-science/build-your-own-movie-recommender-system-using-bert4rec-92e4e34938c5

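The masking step that BERT4Rec trains on can be sketched without any model code. The example below only prepares one training example from a user's history; the `MASK` id, the `mask_sequence` helper, the masking probability, and the movie ids are all assumptions made for illustration, and the Transformer that would consume these inputs is omitted.

```python
import random

MASK = 0  # hypothetical id reserved for the mask token; real item ids start at 1

def mask_sequence(items, mask_prob=0.2, rng=random):
    """Return (inputs, labels): masked positions hold MASK in inputs and the true item in labels."""
    inputs, labels = [], []
    for item in items:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(item)    # the model is trained to recover this item
        else:
            inputs.append(item)
            labels.append(None)    # no loss is computed on unmasked positions
    return inputs, labels

history = [12, 7, 33, 5, 18, 41]   # hypothetical movie ids in viewing order
inputs, labels = mask_sequence(history, mask_prob=0.3, rng=random.Random(0))
print(inputs)
print(labels)

# For next-item prediction, a MASK appended to the history marks the slot to fill.
next_item_query = history + [MASK]
```

Because the masked positions can fall anywhere in the sequence, the model sees context on both sides of each target, which is what distinguishes this setup from strictly left-to-right sequence models.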
