# Amazon's Item-to-Item Collaborative Filtering (2003)

## Recommendation Algorithms

1. Traditional Collaborative Filtering
2. Cluster Models
3. Search-based Methods

## Traditional Collaborative Filtering

Assume a store has $N$ SKUs and $M$ customers. In traditional collaborative filtering, a customer $c_i$ is represented as an $N$-dimensional vector with one component per SKU:

$$c_i = \begin{bmatrix}sku_1\\sku_2\\\vdots\\sku_j\\\vdots\\sku_N\end{bmatrix}$$

for $i = 1, 2, \cdots, M$. The component $sku_j$ is positive for purchased or positively rated items and negative for negatively rated items. To generate recommendations, find the customers most similar to the user, measuring the similarity of two customers, $i$ and $i'$, as the cosine of the angle between their vectors:

$$
similarity(c_i, c_{i'}) = \cos(c_i, c_{i'}) = \frac{c_i \cdot c_{i'}}{ \lVert c_i \rVert \times \lVert c_{i'} \rVert }
$$
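
As a minimal sketch of the formula above, the function below computes the cosine similarity of two customer vectors in plain Python. The vectors `c1`, `c2`, `c3` and the encoding (1 for purchased or liked, -1 for disliked, 0 for no data) are illustrative assumptions, not data from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length customer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a customer with no ratings is treated as similar to nobody.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical customers over N = 4 SKUs.
c1 = [1, 0, 1, 0]
c2 = [1, 1, 0, 0]    # shares one purchase with c1
c3 = [-1, 0, -1, 0]  # rated negatively exactly what c1 bought

sim_12 = cosine_similarity(c1, c2)
sim_13 = cosine_similarity(c1, c3)
print(sim_12, sim_13)
```

Customers with overlapping tastes score close to 1, while opposite ratings push the score toward -1.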

To find the customers most similar to $c_i$, compute its similarity to every customer:

$$ similarity(c_i)=\begin{bmatrix}similarity(c_i, c_{1})\\similarity(c_i, c_{2})\\\vdots\\ similarity(c_i, c_{M})\\\end{bmatrix} $$

To recommend items to a given customer, pick the customers with the highest similarity scores, excluding any score equal to $1$ (identical customers), then recommend items that those customers rated positively.
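
The selection step described above could be sketched as follows. The customer names, the vectors, the `top_k` parameter, and the choice to skip items the target already has are assumptions made for illustration; the paper does not prescribe them.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(target, customers, top_k=2):
    """Return (nearest neighbours, recommended SKU indices) for `target`."""
    scored = []
    for name, vector in customers.items():
        sim = cosine_similarity(target, vector)
        if sim < 1.0 - 1e-9:  # skip identical customers (similarity of 1)
            scored.append((sim, name))
    scored.sort(reverse=True)
    neighbours = [name for _, name in scored[:top_k]]

    recommendations = []
    for name in neighbours:
        for sku, rating in enumerate(customers[name]):
            # Keep positively rated SKUs the target has no data for yet.
            if rating > 0 and target[sku] == 0 and sku not in recommendations:
                recommendations.append(sku)
    return neighbours, recommendations


# Hypothetical data: 4 customers, N = 5 SKUs (indices 0..4).
customers = {
    "c1": [1, 1, 0, 0, 1],
    "c2": [1, 0, 1, 0, 0],
    "c3": [-1, 0, 0, 1, 0],
    "c4": [1, 1, 0, 0, 0],  # identical to the target, so it is skipped
}
target = [1, 1, 0, 0, 0]

neighbours, recommendations = recommend(target, customers)
print(neighbours, recommendations)
```

Every request scans all $M$ customers here, which hints at why this approach becomes expensive as the customer base grows.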

## Cluster Models (Customer Segmentation)

Divide the customer base $\left\{c_1, \cdots, c_M\right\}$ into segments using a clustering algorithm (e.g. k-means) or another unsupervised learning algorithm. Alternatively, use manually selected segments.

To recommend items to a given customer, pick the segment most similar to that customer, then recommend items rated positively within that segment.
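
A rough sketch of the lookup step, assuming the segments already exist (produced by k-means or chosen by hand). The segment names, the member vectors, and the use of the segment centroid as its representative are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_from_segment(target, segments):
    """Pick the segment whose centroid is closest to `target`, then recommend from it."""
    best_name, best_sim = None, -2.0  # cosine similarity is never below -1
    for name, members in segments.items():
        centroid = [sum(column) / len(members) for column in zip(*members)]
        sim = cosine_similarity(target, centroid)
        if sim > best_sim:
            best_name, best_sim = name, sim

    # Rank SKUs by their summed rating inside the chosen segment.
    totals = [sum(column) for column in zip(*segments[best_name])]
    ranked = sorted(enumerate(totals), key=lambda pair: -pair[1])
    recommendations = [sku for sku, total in ranked if total > 0 and target[sku] == 0]
    return best_name, recommendations


# Hypothetical segments over N = 4 SKUs (indices 0..3).
segments = {
    "A": [[1, 1, 0, 0], [1, 1, 1, 0]],
    "B": [[0, 0, 1, 1], [0, 0, 0, 1]],
}
target = [1, 0, 0, 0]

segment, recommendations = recommend_from_segment(target, segments)
print(segment, recommendations)
```

The customer is compared against a handful of segments rather than all $M$ customers, at the cost of recommendations that are only as specific as the segment.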

## Item-to-Item Collaborative Filtering

This algorithm **matches each of the user's purchased/positively rated items to similar items, then combines those similar items into a recommendation list.** The algorithm builds a *similar-items table* by finding items that customers tend to purchase together.

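In [1]:
import random

import numpy
import pandas

# Simulate 20 random purchases: customers c1..c10 buying items i1..i20.
# No seed is set, so every run produces a different table.
purchases = []
for _ in range(20):
    purchases.append({
        "customer": f"c{random.randint(1, 10)}",
        "item": f"i{random.randint(1, 20)}",
    })
purchase_df = pandas.DataFrame(purchases)
# Sort by customer, then item (string order, so "c10" comes before "c2").
purchase_df = purchase_df.sort_values(["customer", "item"]).reset_index(drop=True)
purchase_df.head()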

|   | customer | item |
|---|----------|------|
| 0 | c1       | i16  |
| 1 | c10      | i1   |
| 2 | c10      | i17  |
| 3 | c10      | i18  |
| 4 | c2       | i13  |


In [3]:
# Self-join on customer: every pair of items bought by the same customer.
i2i = pandas.merge(purchase_df, purchase_df, how="inner", on="customer").sort_values(["customer"])
# Drop pairs that match an item with itself and keep only the two item columns.
i2i_cleaned = i2i.loc[i2i["item_x"] != i2i["item_y"], ["item_x", "item_y"]]
i2i_cleaned.sort_values(["item_x"])

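|    | item_x | item_y |
|----|--------|--------|
| 2  | i1     | i17    |
| 3  | i1     | i18    |
| 24 | i11    | i16    |
| 11 | i13    | i3     |
| 33 | i13    | i19    |
| 34 | i13    | i6     |
| 32 | i13    | i16    |
| 35 | i13    | i8     |
| 16 | i14    | i9     |
| 28 | i15    | i16    |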
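
Co-purchased pairs like these still need a similarity score before they form a similar-items table. One possible sketch, in plain Python rather than pandas so that it runs without dependencies: count how often each pair is bought together, then score it with the cosine similarity of the two items' binary customer vectors, which reduces to `co_purchases / sqrt(buyers_x * buyers_y)`. The purchase log, the `recommend` helper, and its `top_n` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase log of (customer, item) pairs.
purchases = [
    ("c1", "i1"), ("c1", "i2"),
    ("c2", "i1"), ("c2", "i2"), ("c2", "i3"),
    ("c3", "i2"), ("c3", "i3"),
    ("c4", "i4"),
]

# Distinct items bought by each customer.
baskets = defaultdict(set)
for customer, item in purchases:
    baskets[customer].add(item)

# Number of buyers per item, and number of customers who bought both items of a pair.
buyers = Counter()
co_purchases = Counter()
for items in baskets.values():
    buyers.update(items)
    co_purchases.update(permutations(sorted(items), 2))

# Similar-items table: item -> {other item: cosine similarity}.
similar_items = defaultdict(dict)
for (x, y), n_xy in co_purchases.items():
    similar_items[x][y] = n_xy / math.sqrt(buyers[x] * buyers[y])


def recommend(owned, top_n=3):
    """Combine the similar items of everything in `owned` into one ranked list."""
    scores = Counter()
    for item in owned:
        for other, sim in similar_items.get(item, {}).items():
            if other not in owned:
                scores[other] += sim
    return [item for item, _ in scores.most_common(top_n)]


recommendations = recommend({"i1"})
print(recommendations)
```

The table can be built offline, so serving a recommendation only touches the rows for the items the user already owns.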


**References:**
- [Amazon.com Recommendations: Item-to-Item Collaborative Filtering](https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf)