## Recommender systems 

Recommender systems are algorithms widely used by companies in e-commerce, streaming platforms, and other sectors. Their primary goal is to predict user preferences and suggest items or content that match those preferences.

Recommender systems are designed to enhance user experience by providing personalized recommendations. They leverage data from users’ past interactions, such as:

* Purchase History: In e-commerce, systems suggest products based on what users have previously bought or browsed.
* Ratings: On platforms like movie or music services, recommendations are often based on the ratings users give to items.
* Behavioral Data: This includes browsing history, search queries, and time spent on certain content.

#### Methods Used
* Collaborative Filtering: This method makes predictions based on the behavior and preferences of similar users. For example, if User A and User B have similar tastes, and User A likes a new product, the system may recommend that product to User B.

  Collaborative filtering is commonly divided into:
  * Item-based CF (Item CF)
  * User-based CF (User CF)
  * Matrix factorization

* Content-Based Filtering: Recommendations are based on the attributes of items and users' past preferences. For instance, if a user frequently watches action movies, the system might suggest new action films.

* Hybrid Approaches: These combine collaborative and content-based filtering to improve recommendation accuracy and mitigate the shortcomings of individual methods.

#### Applications
* E-commerce: Suggests products similar to those previously viewed or purchased.
* Streaming Platforms: Recommends movies, shows, or music based on viewing or listening history.
* Social Media: Curates posts or friends' suggestions based on interaction patterns.

Recommender systems play a crucial role in personalizing user experiences, increasing engagement, and driving sales by predicting and catering to individual preferences.

### Collaborative Filtering

Collaborative Filtering (CF) is a popular recommendation algorithm that predicts user preferences based on the behavior and preferences of similar users. The core idea is that if users share similar tastes or behaviors, then items liked by one user can be recommended to another similar user.

#### Types of Collaborative Filtering

* User-Based Collaborative Filtering (User CF): This approach identifies users with similar tastes to the target user. It assumes that if User A and User B have a high overlap in their preferences, they will likely share future preferences as well.

To formalize this approach, consider the user-item matrix $R$, where $R_{ij}$ is the rating given by user $i$ to item $j$. The similarity between users $u$ and $v$ is typically computed with cosine similarity or Pearson correlation; the Pearson form is:


$Sim(u, v) = \frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R_u})(R_{vi} - \bar{R_v})}{\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R_u})^2} \cdot \sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R_v})^2}}$


where $\bar{R_u}$ and $\bar{R_v}$ are the average ratings of users $u$ and $v$, respectively, and $I_{uv}$ is the set of items rated by both users.
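The Pearson similarity above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes missing ratings are stored as `np.nan`, and the function name `pearson_user_similarity`, the toy matrix, and the convention of returning `0.0` when the overlap is too small are all choices made for this example.

```python
import numpy as np

def pearson_user_similarity(R, u, v):
    """Pearson correlation between users u and v over their co-rated items.

    R is a 2-D float array (users x items); np.nan marks a missing rating.
    """
    # I_uv: items rated by both users
    both = ~np.isnan(R[u]) & ~np.isnan(R[v])
    if both.sum() < 2:
        return 0.0  # assumption: treat tiny overlaps as "no evidence"

    # Average rating of each user over all items that user rated
    mean_u = np.nanmean(R[u])
    mean_v = np.nanmean(R[v])

    du = R[u, both] - mean_u
    dv = R[v, both] - mean_v
    denom = np.sqrt(np.sum(du ** 2)) * np.sqrt(np.sum(dv ** 2))
    return float(np.sum(du * dv) / denom) if denom > 0 else 0.0

# Toy user-item matrix (hypothetical data): 3 users, 4 items
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
])

sim_opposite = pearson_user_similarity(R, 0, 2)  # users with opposite tastes
sim_self = pearson_user_similarity(R, 0, 0)      # a user compared with itself
```

Users 0 and 2 rate the same items in opposite directions, so their similarity comes out negative, while a user compared with itself scores 1.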

* Item-Based Collaborative Filtering (Item CF): This approach identifies items similar to those the target user has rated or liked. It assumes that users who liked one item are likely to like similar items. If User A likes Item X and Item X is similar to Item Y, the system will recommend Item Y to User A. A common similarity measure is the adjusted cosine, which centers each rating on the mean rating of the user who gave it:

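$Sim(i, j) = \frac{\sum_{u \in U_{ij}} (R_{ui} - \bar{R_u})(R_{uj} - \bar{R_u})}{\sqrt{\sum_{u \in U_{ij}} (R_{ui} - \bar{R_u})^2} \cdot \sqrt{\sum_{u \in U_{ij}} (R_{uj} - \bar{R_u})^2}}$

where $U_{ij}$ is the set of users who rated both items $i$ and $j$, and $\bar{R_u}$ is the average rating of user $u$.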
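The item-item similarity can be sketched the same way. The example below is only an illustration under stated assumptions: missing ratings are `np.nan`, every user has rated at least one item, and the function name `adjusted_cosine_item_similarity` and the toy data are hypothetical.

```python
import numpy as np

def adjusted_cosine_item_similarity(R, i, j):
    """Similarity between items i and j, centering ratings on each user's mean.

    R is a 2-D float array (users x items); np.nan marks a missing rating.
    """
    # U_ij: users who rated both items
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    if not both.any():
        return 0.0  # assumption: no common raters means no evidence

    # Per-user mean rating, computed over the items each user rated
    user_means = np.nanmean(R, axis=1)

    di = R[both, i] - user_means[both]
    dj = R[both, j] - user_means[both]
    denom = np.sqrt(np.sum(di ** 2)) * np.sqrt(np.sum(dj ** 2))
    return float(np.sum(di * dj) / denom) if denom > 0 else 0.0

# Toy user-item matrix (hypothetical data): 4 users, 3 items
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
    [2.0, np.nan, 4.0],
])

sim_01 = adjusted_cosine_item_similarity(R, 0, 1)  # items rated alike
sim_02 = adjusted_cosine_item_similarity(R, 0, 2)  # items rated oppositely
```

Items 0 and 1 are liked and disliked by the same users, so their similarity is positive; items 0 and 2 attract opposite ratings, so theirs is negative.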

* Matrix Factorization: Matrix factorization techniques decompose the user-item matrix $R$ into two lower-dimensional matrices, typically referred to as $U$ (user features) and $V$ (item features). The product of these matrices approximates the original matrix $R$. This approach captures latent features of users and items, which helps in making predictions for missing entries in $R$.

The goal is to find matrices $U$ (user matrix) and $V$ (item matrix) such that their product approximates the original matrix $R$:

$R \approx U \cdot V^T$

Here, $U$ is a matrix of size $m \times k$ (where $m$ is the number of users and $k$ is the number of latent features), and $V$ is a matrix of size $n \times k$ (where $n$ is the number of items). $k$ is typically much smaller than $m$ or $n$.

The optimization problem can be formulated as:

$\min_{U, V} \sum_{(i, j) \in K} (R_{ij} - U_i \cdot V_j^T)^2 + \lambda (\| U \|^2 + \| V \|^2)$

where $U_i$ and $V_j$ are the rows of $U$ and $V$ for user $i$ and item $j$, $K$ is the set of observed ratings, and $\lambda$ is a regularization parameter to prevent overfitting.
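One common way to minimize this objective is stochastic gradient descent over the observed ratings. The sketch below is a minimal illustration under several assumptions that are not part of the text above: missing ratings are `np.nan`, the regularization is applied only to the rows touched by each rating, the constant factor 2 of the gradient is absorbed into the learning rate, and the names `factorize` and `rmse_on_observed` as well as all hyperparameter values are hypothetical.

```python
import numpy as np

def rmse_on_observed(R, U, V):
    """Root-mean-square error over the observed (non-NaN) entries of R."""
    mask = ~np.isnan(R)
    pred = U @ V.T
    return float(np.sqrt(np.mean((R[mask] - pred[mask]) ** 2)))

def factorize(R, k=2, lam=0.05, lr=0.02, epochs=1000, seed=0):
    """Fit R ~ U V^T by SGD on the regularized squared error."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, k))  # user factors, m x k
    V = 0.1 * rng.standard_normal((n, k))  # item factors, n x k
    observed = np.argwhere(~np.isnan(R))   # the set K of (i, j) pairs

    for _ in range(epochs):
        rng.shuffle(observed)              # visit ratings in random order
        for i, j in observed:
            err = R[i, j] - U[i] @ V[j]    # residual for this rating
            u_old = U[i].copy()            # keep old value for V's update
            U[i] += lr * (err * V[j] - lam * U[i])
            V[j] += lr * (err * u_old - lam * V[j])
    return U, V

# Toy ratings with missing entries (hypothetical data)
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

U, V = factorize(R)
R_hat = U @ V.T  # dense predictions, including the missing cells
```

After training, `R_hat` holds a predicted rating for every user-item pair, and the entries at the `np.nan` positions are the recommendations' raw scores.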

Example: In Singular Value Decomposition (SVD), matrix factorization is performed as follows:

$R = U \Sigma V^T$

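where $\Sigma$ is a diagonal matrix of singular values, and $U$ and $V$ contain the left and right singular vectors. Keeping only the $k$ largest singular values gives a rank-$k$ approximation of $R$. Classical SVD requires a fully specified matrix, so missing ratings have to be filled in first, and $U$ and $V$ here are orthogonal matrices rather than the freely learned factors of the optimization problem above.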

Matrix factorization methods are powerful because they can discover hidden patterns in data and are often more effective in capturing complex relationships compared to simpler user-based or item-based methods.
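As a small illustration of the SVD example, the sketch below uses `np.linalg.svd` to build a low-rank approximation of a fully observed toy matrix. The helper name `truncated_svd`, the data, and the choice $k = 2$ are assumptions made for this example; a real rating matrix with missing entries would need imputation or a different solver first.

```python
import numpy as np

def truncated_svd(R, k):
    """Return the rank-k approximation of R and its singular values."""
    # full_matrices=False gives the compact ("economy") decomposition
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # Keep only the k largest singular values and their vectors
    R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return R_k, s

# Fully observed toy matrix (hypothetical data): 4 users, 4 items
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

R_2, singular_values = truncated_svd(R, k=2)
approx_error = np.linalg.norm(R - R_2)  # Frobenius norm of the residual
```

The residual's Frobenius norm equals the square root of the sum of the squared discarded singular values, which is why dropping small singular values loses little information.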



## Neural Collaborative Filtering (NCF)

Neural Collaborative Filtering (NCF) is a recommendation approach that uses neural networks to model user-item interactions. Unlike matrix factorization, which scores a user-item pair with a fixed inner product of latent vectors, NCF learns the interaction function with a neural network, so it can capture non-linear patterns in the data. The formulation is as follows:

### Basic Concepts

1. **User-Item Matrix**:
   Let $R$ be the user-item interaction matrix, where $R_{ij}$ represents the interaction (e.g., rating, click, purchase) between user $i$ and item $j$.

2. **Latent Vectors**:
   - $\mathbf{p}_i$: Latent vector for user $i$ (user embedding).
   - $\mathbf{q}_j$: Latent vector for item $j$ (item embedding).

3. **Generalized Matrix Factorization (GMF)**:
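   In its simplest form, which coincides with standard matrix factorization, GMF models the interaction between user $i$ and item $j$ as a dot product:
   $$
   \hat{y}_{ij} = \mathbf{p}_i^T \mathbf{q}_j
   $$
   where $\mathbf{p}_i$ and $\mathbf{q}_j$ are learned via optimization. The general form weights the element-wise product $\mathbf{p}_i \odot \mathbf{q}_j$ with learned output weights and applies an output activation.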

### NCF Architecture

NCF extends GMF by replacing the dot product with a neural network that can model non-linear interactions between users and items. The key components are:

1. **Embedding Layers**:
   Users and items are mapped to latent vectors using embedding layers:
   $$
   \mathbf{p}_i = \text{Embedding}_U(i)
   $$
   $$
   \mathbf{q}_j = \text{Embedding}_I(j)
   $$

2. **Concatenation Layer**:
   The user and item embeddings are concatenated to form a joint representation:
   $$
   \mathbf{z}_{ij} = [\mathbf{p}_i; \mathbf{q}_j]
   $$

3. **Neural Network Layers**:
   The concatenated vector $\mathbf{z}_{ij}$ is fed into a multi-layer perceptron (MLP) to model the interaction:
   $$
   \mathbf{h}_1 = f_1(\mathbf{W}_1 \mathbf{z}_{ij} + \mathbf{b}_1)
   $$
   $$
   \mathbf{h}_2 = f_2(\mathbf{W}_2 \mathbf{h}_1 + \mathbf{b}_2)
   $$
   $$
   \vdots
   $$
   $$
   \mathbf{h}_L = f_L(\mathbf{W}_L \mathbf{h}_{L-1} + \mathbf{b}_L)
   $$
   where $\mathbf{W}_l$ and $\mathbf{b}_l$ are the weights and biases of the $l$-th layer, and $f_l$ is the activation function (e.g., ReLU).

4. **Prediction Layer**:
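   The output of the final MLP layer is projected to a scalar and passed through an output activation to produce the predicted interaction:
   $$
   \hat{y}_{ij} = \sigma(\mathbf{w}_o^T \mathbf{h}_L + b_o)
   $$
   where $\mathbf{w}_o$ and $b_o$ are the weights and bias of the prediction layer and $\sigma$ is an activation function, typically a sigmoid for binary interactions.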
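The four components above can be traced in a plain NumPy forward pass. This is a sketch with untrained, randomly initialized parameters, meant only to show how the shapes fit together. The sizes (`n_users`, `n_items`, `d`, `hidden`), the helper names, and the final scalar projection (`w_out`, `b_out`) that turns the last hidden vector into a single probability are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 users, 5 items, 8-dimensional embeddings
n_users, n_items, d = 4, 5, 8
hidden = [16, 8]  # widths of the MLP layers

# 1. Embedding tables: row i of P is p_i, row j of Q is q_j
P = 0.1 * rng.standard_normal((n_users, d))
Q = 0.1 * rng.standard_normal((n_items, d))

# 3. MLP parameters W_l, b_l (the input size is 2d after concatenation)
sizes = [2 * d] + hidden
weights = [0.1 * rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

# 4. Prediction layer: projects h_L to a scalar (assumed form)
w_out = 0.1 * rng.standard_normal(hidden[-1])
b_out = 0.0

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ncf_forward(i, j):
    """Predicted interaction probability for user i and item j."""
    z = np.concatenate([P[i], Q[j]])   # 2. z_ij = [p_i; q_j]
    h = z
    for W, b in zip(weights, biases):  # 3. h_l = ReLU(W_l h_{l-1} + b_l)
        h = relu(W @ h + b)
    return float(sigmoid(w_out @ h + b_out))  # 4. scalar in (0, 1)

y_hat = ncf_forward(0, 1)
```

In practice the embeddings and layer weights would be trained jointly with a deep learning framework; the NumPy version only makes the data flow explicit.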

### Loss Function

The model is trained using a loss function that measures the discrepancy between the predicted interactions $\hat{y}_{ij}$ and the actual interactions $y_{ij}$. For binary interactions, a common choice is the binary cross-entropy loss:
$$
\mathcal{L} = - \sum_{(i,j) \in K} \left( y_{ij} \log(\hat{y}_{ij}) + (1 - y_{ij}) \log(1 - \hat{y}_{ij}) \right)
$$
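where $K$ is the set of training pairs. With implicit feedback, observed interactions have $y_{ij} = 1$, so $K$ must also include unobserved pairs sampled as negatives with $y_{ij} = 0$; otherwise the second term never contributes.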
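The loss can be computed directly from arrays of labels and predictions. The sketch below is illustrative only: the function name `binary_cross_entropy` and the clipping constant `eps`, added to avoid taking $\log(0)$, are assumptions made for this example.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Summed binary cross-entropy over a set of (user, item) pairs."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.sum(y_true * np.log(y_pred)
                         + (1.0 - y_true) * np.log(1.0 - y_pred)))

labels = [1, 0, 1, 0]  # hypothetical positives and sampled negatives
loss_confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
loss_uncertain = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
```

Predictions close to the true labels give a smaller loss than predictions stuck at 0.5, which is the behavior training relies on.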

### Regularization

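To prevent overfitting, a regularization term is added to the loss function. A common choice is $L_2$ regularization on the weights and biases of the neural network:
$$
\mathcal{L}_{\text{reg}} = \lambda \left( \|\mathbf{W}\|^2 + \|\mathbf{b}\|^2 \right)
$$
where $\lambda$ is the regularization parameter, and $\mathbf{W}$ and $\mathbf{b}$ collect the weights and biases of all layers. The training objective is then $\mathcal{L} + \mathcal{L}_{\text{reg}}$.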
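A minimal sketch of the penalty term, assuming the network parameters are kept as lists of NumPy arrays (one weight matrix and one bias vector per layer); the function name `l2_penalty` and the example values are hypothetical.

```python
import numpy as np

def l2_penalty(weights, biases, lam):
    """lambda * (sum of squared weights + sum of squared biases)."""
    total = sum(float(np.sum(W ** 2)) for W in weights)
    total += sum(float(np.sum(b ** 2)) for b in biases)
    return lam * total

# One 2x2 weight matrix of ones and one bias vector of ones
example_weights = [np.ones((2, 2))]
example_biases = [np.ones(2)]
penalty = l2_penalty(example_weights, example_biases, lam=0.1)
```

Here the squared entries sum to 4 for the weights and 2 for the biases, so the penalty is $0.1 \times 6 = 0.6$; during training this value would be added to the cross-entropy loss.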



import numpy as np