# User-based Collaborative Filtering

User-based k-Nearest Neighbors (UserKNN) is another collaborative filtering algorithm used in
recommendation systems. The core idea behind UserKNN is to make recommendations based on the
preferences of similar users. UserKNN is intuitive and leverages the idea that users who have rated
items similarly in the past will continue to have similar preferences. It’s especially effective when users
have a rich history of interactions.

In [1]:
# %pip install import_ipynb
import import_ipynb 
import cornac
from cornac.data import Dataset
import cornac.metrics as met
from cornac.eval_methods import BaseMethod
from data_loader import DataLoader # type: ignore

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
data_path = "data/"
data_loader = DataLoader(data_path)

In [3]:
train_dataset, test_dataset = data_loader.load_for_cornac(dataset_type='split')
print(train_dataset.shape)
train_dataset.head()

(513384, 4)


Unnamed: 0,ReviewId,RecipeId,AuthorId,Rating
0,826743,3745,345380,4
1,1247176,26217,406131,1
2,1250914,17123,355582,5
3,183560,123283,58104,4
4,1255493,110139,383795,5


## User-based Collaborative Filtering (UserKNN)

User-based collaborative filtering finds users who are similar to the target user and recommends items that these similar users have liked. The algorithm works as follows:

1. **Find Similar Users**: Calculate similarity between users based on their rating patterns
2. **Neighborhood Selection**: Select k most similar users (k-nearest neighbors)
3. **Prediction**: Predict ratings for unrated items based on similar users' ratings
4. **Recommendation**: Recommend items with highest predicted ratings

### Key Parameters:
- **k**: Number of similar users to consider (neighborhood size)
- **similarity**: Similarity metric (cosine, pearson, etc.)
- **min_support**: Minimum number of common items between users

In [4]:
from cornac.models import UserKNN
from cornac.eval_methods import RatioSplit
import pandas as pd
import time

# Uncomment the following lines to limit the dataset size for quick testing
"""train_dataset = train_dataset[:1000]  # Limit to 1000 for quick testing
test_dataset = test_dataset[:20] # Limit to 20 for quick testing
print(f"Train dataset size: {train_dataset.shape}"
      f"\nTest dataset size: {test_dataset.shape}")"""

cornac_train_dataset = Dataset.from_uir(train_dataset[['AuthorId', 'RecipeId', 'Rating']].values.tolist(), seed=42)

metrics = [
    met.MSE(),
    met.RMSE(),
    met.MAE(),
    met.Precision(k=10),
    met.Recall(k=10),
    met.NDCG(k=10),
]
eval_method = BaseMethod.from_splits(train_dataset[['AuthorId', 'RecipeId', 'Rating']].values, test_dataset[['AuthorId', 'RecipeId', 'Rating']].values)

k_values = [5, 10, 20, 50]
user_knn_results = []

print("\n" + "="*80)
print("USER-BASED COLLABORATIVE FILTERING (UserKNN) EVALUATION")
print("="*80)

for k in k_values:
    print(f"\nTesting UserKNN with k={k}...")
    user_knn = UserKNN(k=k, similarity='cosine', verbose=True)
    start_time = time.time()
    user_knn.fit(cornac_train_dataset)
    total_time = time.time() - start_time
    
    # Evaluate metrics
    results = eval_method.evaluate(user_knn, metrics=metrics, user_based=False)
    
    for metrics in results:
        print(metrics)
    
    model_result = {
        'model': f'UserKNN(k={k})',
        'k': k,
        'results': results,
        'total_time': total_time
    }
    user_knn_results.append(model_result)
    print(f"  Total time: {total_time:.2f}s")

# Display results in a nice table
user_knn_df = pd.DataFrame(user_knn_results)
print(f"\n{'='*80}")
print("USER-BASED COLLABORATIVE FILTERING RESULTS SUMMARY")
print(f"{'='*80}")
print(user_knn_df.to_string(index=False, float_format='%.4f'))


USER-BASED COLLABORATIVE FILTERING (UserKNN) EVALUATION

Testing UserKNN with k=5...


100%|██████████| 17748/17748 [00:00<00:00, 28646.82it/s]
100%|██████████| 17748/17748 [00:00<00:00, 25585.60it/s]


KeyboardInterrupt: 

In [None]:
def extract_metrics(results_tuple, metrics=[
        met.MSE(),
        met.RMSE(),
        met.MAE(),
        met.Precision(k=k),
        met.Recall(k=k),
        met.NDCG(k=k),
    ]):
    """
    Extracts metrics from the results tuple returned by Cornac evaluation.
    """
    # print(results_tuple[0])
    cornac_metrics = results_tuple[0]
    string_metrics = str(cornac_metrics)
    
    result_line = string_metrics.split('\n')[2].strip()
    # print(f"Result line: {result_line}")
    
    result_values = result_line.split("|")[1:]
    # print(f"Result values: {result_values}")
    
    metrics_dict = {}
    for metric, value in zip(metrics, result_values):
        metric_name = metric.name if hasattr(metric, 'name') else str(metric)
        metrics_dict[metric_name] = float(value.strip())
            
    # print(f"Extracted metrics: {metrics_dict}")
    return metrics_dict

for result in user_knn_results:
    result['results'] = extract_metrics(result['results'], metrics=[
        met.MSE(),
        met.RMSE(),
        met.MAE(),
        met.Precision(k=k),
        met.Recall(k=k),
        met.NDCG(k=k),
    ])

for result in user_knn_results:
    for metric_name, value in result['results'].items():
        result[metric_name] = value
    del result['results']

user_knn_df = pd.DataFrame(user_knn_results)
user_knn_df
    

Result values: [' 0.6412 ', ' 0.7271 ', ' 0.8527 ', ' 0.0509 ', '      0.0200 ', '   0.1000 ', '    0.0252 ', '   0.0146']
Result values: [' 0.6412 ', ' 0.7271 ', ' 0.8527 ', '  0.1004 ', '       0.0250 ', '    0.2500 ', '    0.0233 ', '   0.0170']
Result values: [' 0.6412 ', ' 0.7271 ', ' 0.8527 ', '  0.1663 ', '       0.0250 ', '    0.5000 ', '    0.0231 ', '   0.0182']
Result values: [' 0.6412 ', ' 0.7271 ', ' 0.8527 ', '  0.1663 ', '       0.0100 ', '    0.5000 ', '    0.0219 ', '   0.0202']


Unnamed: 0,model,k,total_time,MSE,RMSE,MAE,Precision@50,Recall@50,NDCG@50
0,UserKNN(k=5),5,0.039051,0.6412,0.7271,0.8527,0.0509,0.02,0.1
1,UserKNN(k=10),10,0.033474,0.6412,0.7271,0.8527,0.1004,0.025,0.25
2,UserKNN(k=20),20,0.024688,0.6412,0.7271,0.8527,0.1663,0.025,0.5
3,UserKNN(k=50),50,0.032905,0.6412,0.7271,0.8527,0.1663,0.01,0.5
