# Recommendation Systems (Continued)

- **`Collaborative Filtering`**
  - Recommends items based on the preferences of similar users.
  - It doesn't require knowledge of the items themselves, just information about user interactions.
  - For example, a music streaming service might recommend songs that other users with similar tastes have enjoyed.
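
The idea above can be sketched with a tiny user-based example. This is a minimal illustration, not the notebook's method: the `plays` matrix, the choice of cosine similarity, and the helper names `cosine_similarity_matrix` and `recommend` are all assumptions made for demonstration.

```python
import numpy as np

# Hypothetical play counts: rows are users, columns are songs (0 = never played).
plays = np.array(
    [
        [5.0, 3.0, 0.0, 1.0],
        [4.0, 0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0, 5.0],
        [0.0, 0.0, 5.0, 4.0],
    ]
)


def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no history
    unit = x / norms
    return unit @ unit.T


def recommend(user: int, interactions: np.ndarray, top_n: int = 1) -> list[int]:
    """Rank items the user has not consumed by similarity-weighted scores."""
    sims = cosine_similarity_matrix(interactions)[user].copy()
    sims[user] = 0.0  # ignore the user's own row
    scores = sims @ interactions  # weighted sum of the other users' interactions
    scores[interactions[user] > 0] = -np.inf  # mask items already consumed
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]


recommended = recommend(user=1, interactions=plays)
print(f"{recommended = }")
```

Note that only the interaction matrix is used; no song attributes (title, genre) are needed.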

In [1]:
%load_ext watermark
%watermark -v -p numpy,pandas,polars,mlxtend,omegaconf --conda

Python implementation: CPython
Python version       : 3.11.8
IPython version      : 8.22.2

numpy    : 1.26.4
pandas   : 2.2.1
polars   : 0.20.18
mlxtend  : 0.23.1
omegaconf: 2.3.0

conda environment: torch_p11



In [2]:
# Built-in library
from pathlib import Path
import re
import json
from typing import Any, Optional, Union
import logging
import warnings

# Standard imports
import numpy as np
import numpy.typing as npt
from pprint import pprint
import pandas as pd
import polars as pl
from rich.console import Console
from rich.theme import Theme

custom_theme = Theme(
    {
        "info": "#76FF7B",
        "warning": "#FBDDFE",
        "error": "#FF0000",
    }
)
console = Console(theme=custom_theme)

# Visualization
import matplotlib.pyplot as plt

# NumPy settings
np.set_printoptions(precision=4)

# Pandas settings
pd.options.display.max_rows = 1_000
pd.options.display.max_columns = 1_000
pd.options.display.max_colwidth = 600

# Polars settings
pl.Config.set_fmt_str_lengths(1_000)
pl.Config.set_tbl_cols(n=1_000)
pl.Config.set_tbl_rows(n=200)

warnings.filterwarnings("ignore")


# Black code formatter (Optional)
%load_ext lab_black

# auto reload imports
%load_ext autoreload
%autoreload 2

In [3]:
fp: str = "../../data/cleaned_products_data.parquet"

prod_df: pl.DataFrame = pl.read_parquet(fp)
print(f"{prod_df.shape = }")
prod_df.head(3)

prod_df.shape = (37853, 6)


| productId (str) | title (str) | product_category (str) | summary (str) | genres (str) | metadata (str) |
| --- | --- | --- | --- | --- | --- |
| B0009839OW | anythang 4 money compilation | music, rap & hip-hop | anythang 4 money sicc sicc sicc | music rap & hip-hop | anythang 4 money sicc sicc sicc music rap & hip-hop anythang 4 money compilation |
| B0002BSFPE | classical guitar wedding ceremony | music, miscellaneous music, pop music, classical | perfect for an intimate wedding | music miscellaneous music pop music classical | perfect for an intimate wedding music miscellaneous music pop music classical classical guitar wedding ceremony |
| B0000CF335 | desi | music, pop music, dance & electronic music, world music music, rap & hip-hop | panjabi mc is fantastic | music pop music dance & electronic music world music music rap & hip-hop | panjabi mc is fantastic music pop music dance & electronic music world music music rap & hip-hop desi |


In [4]:
# Set verbosity level
pl.Config.set_fmt_str_lengths(100)

polars.config.Config

<br>

### Model-Based Collaborative Filtering

- Model-based collaborative filtering is a technique that uses machine learning algorithms to predict user preferences by building a model from user-item interaction data.

- Its key characteristics and advantages include:
  - `Model Training`: Learning patterns from data to build a predictive model using techniques like matrix factorization, neural networks, and clustering.
  - `Scalability`: More scalable than memory-based methods since it doesn't require real-time similarity computations.
  - `Handling Sparsity`: Effective in handling sparse data by learning latent factors that capture underlying patterns.
  - `Accuracy`: Can achieve higher accuracy by discovering complex relationships between users and items.
  - `Efficiency`: Often faster in generating recommendations due to pre-computed models.

### Common Model-Based Techniques

- `Matrix Factorization`: Singular Value Decomposition (SVD) and Alternating Least Squares (ALS) to decompose the user-item interaction matrix.
- `Neural Networks`: Autoencoders and deep learning models like RNNs or CNNs to learn compressed representations or capture temporal/spatial patterns.
- `Clustering`: K-Means Clustering and Hierarchical Clustering to group users or items based on interaction patterns.
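
As a rough illustration of the SVD entry in the list above, the sketch below computes a rank-`k` approximation of a small, fully observed matrix with `np.linalg.svd`. The matrix values and `k = 2` are hypothetical, and a dense matrix is assumed for simplicity; real interaction matrices have missing entries, which plain SVD does not handle directly.

```python
import numpy as np

# Hypothetical, fully observed user-item ratings (rows: users, columns: items).
R = np.array(
    [
        [5.0, 4.0, 1.0, 1.0],
        [4.0, 5.0, 1.0, 2.0],
        [1.0, 1.0, 5.0, 4.0],
        [2.0, 1.0, 4.0, 5.0],
    ]
)

k = 2  # number of latent factors to keep (assumed for this example)

# Thin SVD: R = U_full @ diag(s) @ Vt, singular values sorted in descending order.
U_full, s, Vt = np.linalg.svd(R, full_matrices=False)

# Split the singular values evenly between the user and item factors.
user_factors = U_full[:, :k] * np.sqrt(s[:k])  # shape (n_users, k)
item_factors = Vt[:k, :].T * np.sqrt(s[:k])  # shape (n_items, k)

R_k = user_factors @ item_factors.T  # rank-k reconstruction of R
rel_error = np.linalg.norm(R - R_k) / np.linalg.norm(R)
print(f"{user_factors.shape = }, {item_factors.shape = }, {rel_error = :.4f}")
```

The relative error depends only on the discarded singular values, so inspecting `s` is a quick way to judge how many factors a matrix needs.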


### Matrix Factorization

- Matrix factorization is a popular technique used in `collaborative filtering` for recommendation systems.
- It involves `decomposing a large matrix into smaller matrices` to uncover the `latent features` underlying the interactions between users and items.

- Here's a brief explanation:

**Problem Setup**:

- In a recommendation system, we typically have a user-item interaction matrix $R$, where each entry $R_{ui}$ represents the rating or interaction of user $u$ with item $i$.
- This matrix is usually `sparse`, meaning most entries are missing because users have interacted with only a small subset of items.
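
To make the setup concrete, the sketch below builds a small interaction matrix from a list of (user, item, rating) triples and measures its sparsity. The triples and identifiers are made up for illustration, and representing missing entries as `NaN` in a dense array is an assumption that only suits toy data.

```python
import numpy as np

# Hypothetical (user, item, rating) triples.
ratings = [
    ("u1", "i1", 5.0),
    ("u1", "i3", 3.0),
    ("u2", "i2", 4.0),
    ("u3", "i1", 2.0),
    ("u3", "i4", 1.0),
]

# Map each user and item identifier to a row or column index.
users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})
user_index = {u: idx for idx, u in enumerate(users)}
item_index = {i: idx for idx, i in enumerate(items)}

# NaN marks a missing (unobserved) interaction.
R = np.full((len(users), len(items)), np.nan)
for u, i, r in ratings:
    R[user_index[u], item_index[i]] = r

observed = ~np.isnan(R)
sparsity = 1.0 - observed.sum() / R.size  # fraction of missing entries
print(f"{R.shape = }, sparsity = {sparsity:.2%}")
```

For realistically sized data, a sparse format (for example a coordinate list of observed entries) would normally replace the dense array.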

**Goal**:

- The goal of matrix factorization is to predict the missing entries in the interaction matrix $R$.
- This helps in recommending items to users that they have not yet interacted with.
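
Once the missing entries have been predicted, recommending reduces to ranking each user's unseen items by predicted score. The sketch below assumes a predicted matrix `R_hat` is already available; its values, and the helper name `top_n_unseen`, are hypothetical.

```python
import numpy as np

# Hypothetical observed ratings (NaN = no interaction yet) and model predictions.
R = np.array(
    [
        [5.0, np.nan, 1.0, np.nan],
        [np.nan, 4.0, np.nan, 2.0],
    ]
)
R_hat = np.array(
    [
        [4.8, 3.9, 1.2, 2.5],
        [4.1, 3.7, 0.9, 2.2],
    ]
)


def top_n_unseen(R: np.ndarray, R_hat: np.ndarray, user: int, n: int = 2) -> list[int]:
    """Return the indices of the n highest-scoring items the user has not seen."""
    unseen = np.isnan(R[user])
    scores = np.where(unseen, R_hat[user], -np.inf)  # exclude seen items
    limit = min(n, int(unseen.sum()))  # never return more items than are unseen
    order = np.argsort(scores)[::-1][:limit]
    return [int(i) for i in order]


print(top_n_unseen(R, R_hat, user=0))  # [1, 3]
print(top_n_unseen(R, R_hat, user=1))  # [0, 2]
```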

**Decomposition**:

- Matrix factorization decomposes the interaction matrix $R$ (with $m$ users and $n$ items) into two lower-dimensional matrices that share $k$ latent features:
  - $U$: A user-feature matrix of shape $m \times k$, where each `row represents a user` and the `columns represent the latent features`.
  - $I$: An item-feature matrix of shape $n \times k$, where each `row represents an item` and the `columns represent the latent features`.
- Mathematically, $R \approx U \times I^T$, so the predicted interaction of user $u$ with item $i$ is the dot product of row $u$ of $U$ and row $i$ of $I$.
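
The shapes involved can be checked with a short sketch. It assumes each row of `I` holds one item's latent features, so that `U @ I.T` has one row per user and one column per item; the sizes and random values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_users, n_items, n_factors = 4, 6, 2  # hypothetical sizes

U = rng.normal(size=(n_users, n_factors))  # one row of latent features per user
I = rng.normal(size=(n_items, n_factors))  # one row of latent features per item

R_hat = U @ I.T  # predicted interactions, shape (n_users, n_items)

# A single prediction is the dot product of a user row and an item row.
u, i = 1, 3
single_prediction = U[u] @ I[i]

print(f"{U.shape = }, {I.shape = }, {R_hat.shape = }")
print(f"{np.isclose(R_hat[u, i], single_prediction) = }")
```

Storing `n_users * n_factors + n_items * n_factors` numbers instead of `n_users * n_items` is what makes the factorization compact when `n_factors` is small.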


**Latent Features:**

- The latent features are abstract representations that `capture the underlying factors influencing user preferences and item characteristics`.
- For example, in a movie recommendation system, latent features might represent genres, actors, or directors.

**Optimization**:

- The decomposition is typically achieved by `minimizing` the difference between the actual and predicted interactions.
- This can be formulated as an optimization problem:
  - $$\min_{U, I} \sum_{(u,i) \in \text{observed}} \left(R_{ui} - U_u I_i^T\right)^2 + \lambda \left(\|U\|_F^2 + \|I\|_F^2\right)$$
  - Here $U_u$ and $I_i$ are the latent feature vectors of user $u$ and item $i$.
  - The first term measures the reconstruction error for observed interactions.
  - The second term is a regularization term to prevent overfitting, with $\lambda$ being the regularization parameter.
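
One common way to minimize an objective of this form is stochastic gradient descent over the observed entries. The sketch below is a minimal NumPy version under stated assumptions: the toy matrix, the hyperparameters (`n_factors`, `lam`, `lr`, `n_epochs`), and the function names `objective` and `factorize` are hypothetical, and the constant factor 2 from the gradient is absorbed into the learning rate. ALS or a dedicated library would be equally valid choices.

```python
import numpy as np


def objective(R: np.ndarray, U: np.ndarray, I: np.ndarray, lam: float) -> float:
    """Squared error on observed (non-NaN) entries plus L2 regularization."""
    mask = ~np.isnan(R)
    residual = np.where(mask, R - U @ I.T, 0.0)
    return float((residual**2).sum() + lam * ((U**2).sum() + (I**2).sum()))


def factorize(
    R: np.ndarray,
    n_factors: int = 2,
    lam: float = 0.02,
    lr: float = 0.01,
    n_epochs: int = 1000,
    seed: int = 0,
) -> tuple[np.ndarray, np.ndarray, list[float]]:
    """Fit user factors U and item factors I on the observed entries of R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    I = rng.normal(scale=0.1, size=(n_items, n_factors))
    observed = np.argwhere(~np.isnan(R))  # array of (user, item) index pairs
    history = [objective(R, U, I, lam)]

    for _ in range(n_epochs):
        rng.shuffle(observed)  # visit observed entries in random order
        for u, i in observed:
            u_row = U[u].copy()  # keep the old user vector for the item update
            err = R[u, i] - u_row @ I[i]
            U[u] += lr * (err * I[i] - lam * u_row)
            I[i] += lr * (err * u_row - lam * I[i])
        history.append(objective(R, U, I, lam))
    return U, I, history


# Hypothetical ratings; NaN marks entries the model should predict.
R = np.array(
    [
        [5.0, 3.0, np.nan, 1.0],
        [4.0, np.nan, np.nan, 1.0],
        [1.0, 1.0, np.nan, 5.0],
        [np.nan, 1.0, 5.0, 4.0],
    ]
)

U, I, history = factorize(R)
R_hat = U @ I.T  # dense predictions, including the previously missing entries
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
print(R_hat)
```

Increasing `lam` shrinks the factors and trades training error for smoother predictions; in practice both `lam` and `n_factors` would be tuned on held-out interactions.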