# Recommender Systems: Content-Based Filtering Notes

Recommender systems help users discover items they are likely to appreciate. Collaborative filtering relies on user–item interactions (ratings) alone; content-based filtering additionally uses features of the users and the items to decide what is a good match.

- **Collaborative Filtering:**  
  - Recommends items based on the ratings from similar users.
  - Uses patterns in user ratings.
  - Example: "Users who liked item A also liked item B."
  
- **Content-Based Filtering:**
  - Recommends items based on the features (attributes) of users and items.
  - Computes a match score between a user's features and an item's features.
  - **Analogy:** Think of it as matching a customer’s unique taste profile (user features) with the characteristics of a product (item features).

### Defining Feature Vectors

For **users** and **movies** (items), we define:

**User feature vector:** $x_u$  
- Examples: Age, gender, country, past viewing history, and even aggregate statistics like the average rating by genre.
  
**Item feature vector:** $x_m$  
- Examples: Year of release, genres, star cast, critic reviews, average ratings.
  
After processing, these raw features are transformed into compact vectors:
- **User representation:** $v_u$
- **Movie representation:** $v_m$

The goal is that the dot product, $v_u \cdot v_m$, approximates how much a user will enjoy a movie (or any item).

---

## Building the Content-Based Filtering Model

### Feature Engineering

**User features:**  
- Demographics (age, gender, country) can be one-hot encoded.
- Behavioral features can be constructed from viewing history (e.g., a thousand-dimensional vector indicating which popular movies the user has watched).
- Aggregated ratings per genre provide insights into a user's taste.

**Item features:**  
- Attributes like the movie’s release year, genres, and even critic reviews.
- You can also compute statistics (e.g., average rating, average rating per demographic) to enrich the feature set.
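
As a rough illustration of the encodings above, the sketch below builds a small user feature vector with NumPy. The country list, the genre list, the sample ratings, and the helper names `one_hot` and `mean_rating_per_genre` are made up for this example and do not come from any real dataset.

```python
import numpy as np

# Hypothetical vocabularies, chosen only for this example.
COUNTRIES = ["US", "IN", "BR"]
GENRES = ["action", "comedy", "drama"]

def one_hot(value, vocabulary):
    """Return a one-hot vector marking the position of `value` in `vocabulary`."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(value)] = 1.0
    return vec

def mean_rating_per_genre(ratings, genres):
    """Average a user's ratings per genre; genres with no ratings get 0.0."""
    means = np.zeros(len(genres))
    for k, genre in enumerate(genres):
        scores = [r for g, r in ratings if g == genre]
        if scores:
            means[k] = np.mean(scores)
    return means

# A made-up user: a country plus a list of (genre, rating) pairs.
user_ratings = [("action", 5.0), ("action", 3.0), ("drama", 4.0)]
x_u = np.concatenate([
    one_hot("IN", COUNTRIES),
    mean_rating_per_genre(user_ratings, GENRES),
])
print(x_u)
```

Filling unrated genres with 0.0 is only one possible choice; a global mean or a separate "no data" indicator may work better in practice.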

### Neural Network Architecture

The model uses two separate networks:

**User Network:**  
- **Input:** User feature vector $x_u$.
- **Processing:** Several dense layers.
- **Output:** A compact vector $v_u$ (e.g., 32-dimensional).

**Movie (Item) Network:**  
- **Input:** Movie feature vector $x_m$.
- **Processing:** Similar dense layers.
- **Output:** A compact vector $v_m$.

Both networks output vectors of the same dimension so that their dot product is valid.

> **Key Point:** The final prediction is computed as $\hat{y}^{ij} = v_u^j \cdot v_m^i$, where $\hat{y}^{ij}$ approximates the rating that user $j$ would give to movie $i$.
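
To make the two-network structure concrete, here is a minimal NumPy sketch of the forward pass with randomly initialized (untrained) weights. The layer sizes, the input dimensions, and the helper names `init_tower` and `tower` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_tower(input_dim, hidden_dim, output_dim):
    """Create random weights for a two-layer dense network (untrained)."""
    return {
        "W1": rng.normal(scale=0.1, size=(input_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.1, size=(hidden_dim, output_dim)),
        "b2": np.zeros(output_dim),
    }

def tower(x, params):
    """Dense -> ReLU -> Dense, mapping raw features to an embedding."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

# Assumed sizes: 10 user features, 6 movie features, 32-dim embeddings.
user_params = init_tower(10, 16, 32)
movie_params = init_tower(6, 16, 32)

x_u = rng.normal(size=10)   # one user's raw features
x_m = rng.normal(size=6)    # one movie's raw features

v_u = tower(x_u, user_params)
v_m = tower(x_m, movie_params)
y_hat = float(v_u @ v_m)    # predicted score for this (user, movie) pair
print(v_u.shape, v_m.shape, y_hat)
```

The two inputs have different sizes, but both towers end in a 32-unit layer, which is what makes the final dot product well defined.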

### Training the Model

**Cost Function:**  
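- A squared-error cost measures the difference between the predicted and actual ratings (it matches mean squared error up to a constant factor):
  
$$J = \sum_{(i,j) \in \mathcal{D}} (v_u^j \cdot v_m^i - y^{ij})^2,$$

where $\mathcal{D}$ is the set of (movie $i$, user $j$) pairs with an observed rating $y^{ij}$.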

**Optimization:**  
- Use gradient descent (or its variants) to adjust the parameters in both networks.
- Regularization (e.g., L2 regularization) can be added to prevent overfitting.
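
The cost can be evaluated directly once the embeddings are available. The sketch below assumes the embeddings are already computed and stored as rows of `V_u` and `V_m`, and it assumes a regularization term of the form `lam * sum(W ** 2)` over the network weights; the notes do not fix the exact form of the penalty, so treat that part as one common choice. All numbers are made up.

```python
import numpy as np

def squared_error_cost(V_u, V_m, ratings, weights=(), lam=0.0):
    """Sum of squared errors over observed (movie i, user j) ratings,
    plus an optional L2 penalty on the given weight arrays."""
    cost = 0.0
    for (i, j), y in ratings.items():
        cost += (V_u[j] @ V_m[i] - y) ** 2
    penalty = lam * sum(np.sum(W ** 2) for W in weights)
    return float(cost + penalty)

# Made-up embeddings: 2 users and 2 movies in 2 dimensions.
V_u = np.array([[1.0, 0.0], [0.0, 1.0]])
V_m = np.array([[2.0, 0.0], [1.0, 3.0]])

# Observed ratings keyed by (movie index i, user index j).
ratings = {(0, 0): 2.0, (1, 1): 2.0}

J = squared_error_cost(V_u, V_m, ratings)
print(J)
```

Only pairs present in `ratings` contribute to the sum, mirroring the restriction to observed ratings in the formula.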

### Normalization
- Both $v_u$ and $v_m$ are often normalized to unit length (divided by their L2 norm). This tends to stabilize training and makes the dot product equal to the cosine similarity between the two vectors. The prediction then lies in $[-1, 1]$, so target ratings need to be rescaled to that range.
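
A small sketch of the normalization step, using made-up two-dimensional vectors. The helper name `l2_normalize` and the `eps` guard are assumptions of this example.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale `v` to unit L2 norm; `eps` guards against division by zero."""
    return v / max(np.linalg.norm(v), eps)

v_u = l2_normalize(np.array([3.0, 4.0]))
v_m = l2_normalize(np.array([4.0, 3.0]))

# For unit-length vectors the dot product is the cosine of the angle between them.
score = float(v_u @ v_m)
print(v_u, v_m, score)
```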

---

## Scalability: Retrieval and Ranking

When dealing with very large catalogs (millions of items), computing the neural network output for every item is impractical. Modern systems adopt a two-step process:

### Retrieval Step

**Purpose:** Quickly generate a broad list of plausible candidates.

**Method:**  
- Use precomputed similarities. For example, for each movie, precompute a list of its most similar movies.
- Leverage simple heuristics (e.g., most viewed genres, regional popularity).
  
**Analogy:** Think of this as "shortlisting" candidates before the detailed interview.
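
One possible way to precompute "similar movies" is a nearest-neighbor lookup over the movie embeddings. The sketch below assumes squared Euclidean distance between rows of `V_m` as the similarity measure, which is a choice made for this example and not something the notes prescribe. The embeddings and the helper names `top_k_similar` and `retrieve` are made up.

```python
import numpy as np

def top_k_similar(V_m, k):
    """For each movie, return indices of its k nearest movies by squared
    Euclidean distance between embeddings (excluding the movie itself)."""
    diffs = V_m[:, None, :] - V_m[None, :, :]
    dist = np.sum(diffs ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)          # never return the movie itself
    return np.argsort(dist, axis=1)[:, :k]

# Made-up 2-D embeddings for four movies.
V_m = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.2]])
similar = top_k_similar(V_m, k=1)           # computed offline, stored as a lookup

def retrieve(watched, similar):
    """Union of precomputed neighbors of the movies a user recently watched."""
    candidates = {int(n) for m in watched for n in similar[m]}
    return sorted(candidates - set(watched))

print(retrieve([0, 2], similar))
```

The all-pairs distance matrix is fine for a toy catalog; for millions of items an approximate nearest-neighbor index would be the more realistic choice.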

### Ranking Step

**Purpose:** Fine-tune and rank the candidate items accurately.

**Method:**  
- Use the neural network model to compute predictions for the shortlisted items.
- Rank them based on the predicted rating or likelihood of engagement.
  
**Optimization:**  
- If item vectors $v_m$ are precomputed, only the user vector $v_u$ needs to be computed in real time.
- The dot product between $v_u$ and the precomputed $v_m$ values is used for fast scoring.
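
The ranking step can then be sketched as a single matrix-vector product over the shortlisted items. The embeddings, the candidate list, and the helper name `rank_candidates` below are made-up values for illustration.

```python
import numpy as np

def rank_candidates(v_u, V_m, candidate_ids):
    """Score candidates by dot product with the user vector and sort
    from highest to lowest score."""
    scores = V_m[candidate_ids] @ v_u
    order = np.argsort(-scores)
    return [(int(candidate_ids[n]), float(scores[n])) for n in order]

# Precomputed movie embeddings (made-up values), one row per movie.
V_m = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.0, -1.0]])
v_u = np.array([0.0, 1.0])         # user vector, computed at request time
candidates = np.array([0, 2, 3])   # output of the retrieval step

ranked = rank_candidates(v_u, V_m, candidates)
print(ranked)
```

Only the rows of `V_m` that belong to the candidates are touched, so the cost per request scales with the shortlist size and not with the catalog size.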

---

## Ethical Considerations

Recommender systems, while profitable and efficient, carry potential risks. It is important to be aware of the following:

### Transparency vs. Profit Maximization

- **Issue:** Systems may prioritize high-profit items over those that best serve user interests.
- **Example:** A website might rank products that generate more profit even if they are less relevant to the user.

### Content and Engagement Risks

**Maximizing Engagement:**  
- Systems that optimize for watch time or clicks may inadvertently promote polarizing or harmful content (e.g., conspiracy theories, hate speech).

**Mitigation:**  
- Implement content filters and consider ethical guidelines.
- Be transparent with users about recommendation criteria.

### Social Impact

**Exploitation in Advertising:**  
- Models might amplify harmful practices (e.g., payday loans) if profit is the sole criterion.

**Responsible Design:**  
- Developers are encouraged to incorporate ethical safeguards and invite diverse perspectives to minimize harm.

> **Ethical Reminder:** Always design recommender systems with the dual goal of user benefit and societal well-being. Transparency and fairness are key.

---

## TensorFlow Implementation: A Practical Walkthrough

This final section gives a brief overview of how to implement content-based filtering in TensorFlow.

### Model Definition

**User Network:**  
- Use a sequential model with several dense layers.
- Final dense layer outputs a 32-dimensional vector $v_u$.
  
**Movie (Item) Network:**  
- A similar sequential model that outputs a 32-dimensional vector $v_m$.

### Data Flow and Layers

- **Input Layers:** Extract user features and item features.
- **Normalization:** Apply L2 normalization on $v_u$ and $v_m$ to enforce unit length.
- **Dot Product Layer:** The Keras `Dot` layer computes the dot product between $v_u$ and $v_m$, yielding the final prediction.

### Cost Function and Training

- **Loss:** Use mean squared error (MSE) for regression tasks or apply a sigmoid function with binary cross-entropy for classification (e.g., predicting clicks).
- **Training:** Train the combined model (both networks) together. Gradients of the cost flow back through the dot product into both networks, so their parameters are updated simultaneously.
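
For the classification variant, the dot product is treated as a logit and passed through a sigmoid before computing binary cross-entropy. The following is a plain NumPy sketch of that computation with made-up scores and click labels; the helper names and the `eps` clipping value are assumptions of this example.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

scores = np.array([2.0, -1.0, 0.0])   # made-up dot products v_u . v_m
clicks = np.array([1.0, 0.0, 1.0])    # 1 = clicked, 0 = not clicked

probs = sigmoid(scores)
loss = binary_cross_entropy(clicks, probs)
print(probs, loss)
```

If the embeddings are L2-normalized, the logit is confined to $[-1, 1]$, so the predicted probabilities cannot move far from 0.5; a learned scale factor or unnormalized embeddings may be preferable in that setting.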

### Example Code Snippet (Conceptual)
```python
import tensorflow as tf
from tensorflow.keras import layers, models

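# Placeholder feature sizes (assumed values; set these to match your data)
user_feature_size = 14
movie_feature_size = 16

# Define user network
user_input = tf.keras.Input(shape=(user_feature_size,))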
user_net = layers.Dense(128, activation='relu')(user_input)
user_net = layers.Dense(64, activation='relu')(user_net)
v_u = layers.Dense(32)(user_net)
v_u = layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(v_u)

# Define movie network
movie_input = tf.keras.Input(shape=(movie_feature_size,))
movie_net = layers.Dense(64, activation='relu')(movie_input)
movie_net = layers.Dense(32, activation='relu')(movie_net)
v_m = layers.Dense(32)(movie_net)
v_m = layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(v_m)

# Compute dot product
dot_product = layers.Dot(axes=1)([v_u, v_m])

# Define the model
model = models.Model(inputs=[user_input, movie_input], outputs=dot_product)
# With unit-length v_u and v_m the output lies in [-1, 1]; rescale target ratings to match.
model.compile(optimizer='adam', loss='mse')
```