# Item-based Collaborative Filtering

Item-based k-Nearest Neighbors (ItemKNN) is a collaborative filtering algorithm used in
recommendation systems. The core idea behind ItemKNN is to make recommendations based on the
similarity between items rather than users. ItemKNN calculates the similarity between items based on how
users have rated them, and uses these similarities to predict ratings for items a user hasn't rated yet.

This approach is particularly effective when:
- The item catalog changes more slowly than the user base
- Item-to-item similarities stay more stable over time than user-to-user similarities
- You want to provide more interpretable recommendations ("We recommend this because it's similar to items you liked")
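
To make the idea of item-to-item similarity concrete, here is a minimal sketch on a made-up ratings matrix. The names `item_cosine_similarity` and `toy_ratings` are illustrative and not part of Cornac, and treating unrated entries as 0 is a simplifying assumption.

```python
import numpy as np

def item_cosine_similarity(ratings: np.ndarray) -> np.ndarray:
    """Cosine similarity between the item columns of a users x items matrix."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    unit_columns = ratings / norms
    return unit_columns.T @ unit_columns

# Toy data (hypothetical): 4 users (rows) x 3 items (columns); 0 means "not rated"
toy_ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [0, 1, 5],
    [1, 0, 4],
], dtype=float)

toy_sim = item_cosine_similarity(toy_ratings)
print(np.round(toy_sim, 2))
```

Items 0 and 1 are liked by the same users, so their similarity comes out much higher than the similarity between items 0 and 2.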

In [5]:
# %pip install import_ipynb
import import_ipynb 
import cornac
from cornac.data import Dataset
import cornac.metrics as met
from cornac.eval_methods import BaseMethod
from data_loader import DataLoader # type: ignore

In [6]:
data_path = "data/"
data_loader = DataLoader(data_path)

In [7]:
train_dataset, test_dataset = data_loader.load_for_cornac(dataset_type='split')
print(train_dataset.shape)
train_dataset.head()

(513384, 4)


,ReviewId,RecipeId,AuthorId,Rating
0,826743,3745,345380,4
1,1247176,26217,406131,1
2,1250914,17123,355582,5
3,183560,123283,58104,4
4,1255493,110139,383795,5


## Item-based Collaborative Filtering (ItemKNN)

Item-based collaborative filtering finds items that are similar to those the user has already rated highly. The algorithm works as follows:

1. **Calculate Item Similarity**: Compute similarity between items based on user rating patterns
2. **Find Similar Items**: For each item, identify the k most similar items
3. **Prediction**: Predict ratings for unrated items using the ratings of similar items
4. **Recommendation**: Recommend items with highest predicted ratings
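
The four steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not Cornac's implementation: the function names and the toy matrix are made up, unrated entries are stored as 0, and neighbours are restricted to items the user has already rated (a common variant of step 2).

```python
import numpy as np

def item_similarity(ratings):
    """Step 1: cosine similarity between item columns (0 = unrated)."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0
    unit = ratings / norms
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)  # an item should not be its own neighbour
    return sim

def predict_rating(ratings, sim, user, item, k=2):
    """Steps 2-3: similarity-weighted average over the k nearest rated items."""
    rated = np.flatnonzero(ratings[user])  # items this user has rated
    if rated.size == 0:
        return 0.0
    # Keep the k rated items most similar to the target item
    order = np.argsort(sim[item, rated])[::-1][:k]
    neighbours = rated[order]
    weights = sim[item, neighbours]
    if weights.sum() <= 0:
        return 0.0
    return float(weights @ ratings[user, neighbours] / weights.sum())

def recommend(ratings, sim, user, k=2, top_n=1):
    """Step 4: rank the user's unrated items by predicted rating."""
    unrated = np.flatnonzero(ratings[user] == 0)
    scores = {int(i): predict_rating(ratings, sim, user, i, k) for i in unrated}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy data (hypothetical): user 3 has not rated items 1 and 2
toy = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [5, 0, 0, 1],
], dtype=float)

sim = item_similarity(toy)
print(recommend(toy, sim, user=3))
```

User 3 rated item 0 highly, and item 1 is the item most similar to item 0, so item 1 is ranked ahead of item 2.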

### Key Parameters:
- **k**: Number of similar items to consider (neighborhood size)
- **similarity**: Similarity metric (cosine, pearson, etc.)
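- **min_support**: Minimum number of users who must have rated both items before their similarity is trusted (a general kNN setting; it is not passed to `ItemKNN` in this notebook)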
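
As a rough illustration of how the similarity metric and a minimum-support threshold interact, the sketch below computes Pearson correlation over co-rated users only and zeroes out pairs with too few co-ratings. All names here are hypothetical helpers written for this example; they are not Cornac arguments or functions.

```python
import numpy as np

def corated_counts(ratings):
    """Number of users who rated both items, for every item pair (0 = unrated)."""
    rated = (ratings > 0).astype(int)
    return rated.T @ rated

def pearson_on_corated(ratings, i, j):
    """Pearson correlation of items i and j over users who rated both."""
    both = (ratings[:, i] > 0) & (ratings[:, j] > 0)
    if both.sum() < 2:
        return 0.0
    x, y = ratings[both, i], ratings[both, j]
    x_c, y_c = x - x.mean(), y - y.mean()
    denom = np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())
    return float(x_c @ y_c / denom) if denom > 0 else 0.0

def similarity_with_min_support(ratings, min_support=2):
    """Pearson similarity matrix; pairs below min_support are left at 0."""
    n_items = ratings.shape[1]
    support = corated_counts(ratings)
    sim = np.zeros((n_items, n_items))
    for i in range(n_items):
        for j in range(i + 1, n_items):
            if support[i, j] >= min_support:
                sim[i, j] = sim[j, i] = pearson_on_corated(ratings, i, j)
    return sim, support

# Toy data (hypothetical): item 2 was rated by only two users
toy = np.array([
    [5, 4, 1],
    [4, 5, 0],
    [1, 2, 5],
    [2, 1, 0],
], dtype=float)

sim, support = similarity_with_min_support(toy, min_support=3)
print(support)
print(np.round(sim, 2))
```

With `min_support=3`, only the pair (item 0, item 1) keeps a similarity, because every pair involving item 2 is backed by just two co-ratings.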

In [None]:
from cornac.models import ItemKNN
from cornac.eval_methods import RatioSplit
import pandas as pd
import time

# Use a percentage of the dataset
train_percentage = 1
test_percentage = 0.1 

train_dataset = train_dataset.sample(frac=train_percentage, random_state=42)
test_dataset = test_dataset.sample(frac=test_percentage, random_state=42)

print(f"Train dataset size: {train_dataset.shape}"
      f"\nTest dataset size: {test_dataset.shape}")

cornac_train_dataset = Dataset.from_uir(train_dataset[['AuthorId', 'RecipeId', 'Rating']].values.tolist(), seed=42)

metrics = [
    met.MSE(),
    met.RMSE(),
    met.MAE(),
    met.Precision(k=10),
    met.Recall(k=10),
    met.NDCG(k=10),
]
eval_method = BaseMethod.from_splits(train_dataset[['AuthorId', 'RecipeId', 'Rating']].values, test_dataset[['AuthorId', 'RecipeId', 'Rating']].values)

k_values = [5, 10, 20, 50]
item_knn_results = []

print("\n" + "="*80)
print("ITEM-BASED COLLABORATIVE FILTERING (ItemKNN) EVALUATION")
print("="*80)

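for k in k_values:
    print(f"\nTesting ItemKNN with k={k}...")
    item_knn = ItemKNN(k=k, similarity='cosine', verbose=True)

    # Time a standalone fit on the Cornac dataset built above
    start_time = time.time()
    item_knn.fit(cornac_train_dataset)
    total_time = time.time() - start_time
    
    # Evaluate metrics. evaluate() fits the model again on eval_method's own
    # train set and returns a (test_result, val_result) tuple; val_result is
    # None here because no validation split was passed to from_splits().
    results = eval_method.evaluate(item_knn, metrics=metrics, user_based=False)
    
    for metric_result in results:
        print(metric_result)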
    
    model_result = {
        'model': f'ItemKNN(k={k})',
        'k': k,
        'results': results,
        'total_time': total_time
    }
    item_knn_results.append(model_result)
    print(f"  Total time: {total_time:.2f}s")

# Display results in a nice table
item_knn_df = pd.DataFrame(item_knn_results)
print(f"\n{'='*80}")
print("ITEM-BASED COLLABORATIVE FILTERING RESULTS SUMMARY")
print(f"{'='*80}")
print(item_knn_df.to_string(index=False, float_format='%.4f'))

Train dataset size: (513384, 4)
Test dataset size: (128, 4)

ITEM-BASED COLLABORATIVE FILTERING (ItemKNN) EVALUATION

Testing ItemKNN with k=5...

        |    MAE |    MSE |   RMSE | NDCG@10 | Precision@10 | Recall@10 | Train (s) | Test (s)
------- + ------ + ------ + ------ + ------- + ------------ + --------- + --------- + --------
ItemKNN | 0.4953 | 0.8000 | 0.8944 |  0.0050 |       0.0008 |    0.0080 |   36.2173 |  24.3924

None
  Total time: 29.30s

Testing ItemKNN with k=10...



In [None]:
def extract_metrics(results_tuple, metrics):
    """
    Extracts metrics from the results tuple returned by Cornac evaluation.

    Values are looked up by the column header Cornac prints rather than by
    position, because Cornac orders the columns itself (MAE, MSE, RMSE, NDCG@10, ...).
    """
    cornac_metrics = results_tuple[0]  # (test_result, val_result): use the test result
    table_lines = str(cornac_metrics).split('\n')

    # Line 0 is the header row, line 1 the separator, line 2 the values
    header_cells = [cell.strip() for cell in table_lines[0].split("|")[1:]]
    value_cells = [cell.strip() for cell in table_lines[2].split("|")[1:]]
    printed = dict(zip(header_cells, value_cells))

    metrics_dict = {}
    for metric in metrics:
        metric_name = metric.name if hasattr(metric, 'name') else str(metric)
        metrics_dict[metric_name] = float(printed[metric_name])

    return metrics_dict

# Reuse the `metrics` list that was actually evaluated (cutoff 10),
# not the leftover loop variable k, and flatten the values into columns
for result in item_knn_results:
    result.update(extract_metrics(result.pop('results'), metrics))

item_knn_df = pd.DataFrame(item_knn_results)
item_knn_df


,model,k,total_time,MSE,RMSE,MAE,Precision@10,Recall@10,NDCG@10
0,ItemKNN(k=5),5,0.028207,0.0452,0.2125,0.2125,0.05,0.5,0.5
1,ItemKNN(k=10),10,0.058727,0.0452,0.2125,0.2125,0.05,0.5,0.5
2,ItemKNN(k=20),20,0.043335,0.0452,0.2125,0.2125,0.05,0.5,0.5
3,ItemKNN(k=50),50,0.066185,0.0452,0.2125,0.2125,0.05,0.5,0.5
