# Matrix Factorization for Recipe Recommendation
This notebook demonstrates how to use matrix factorization (SVD) for building a recipe recommendation system using the Cornac library. It covers data loading, model training, evaluation, and analysis.

## 1. Import Required Libraries
We import all necessary libraries, including Cornac for recommendation, pandas and numpy for data handling, and a custom DataLoader for loading the recipe data.

In [1]:
# %pip install import_ipynb
import import_ipynb 
import cornac
from cornac.data import Dataset
from cornac.eval_methods import BaseMethod
import numpy as np
import pandas as pd
import cornac
from cornac.models import SVD
from cornac.data import Dataset
from cornac.eval_methods import RatioSplit
import cornac.metrics as metrics
import cornac.metrics as met
from data_loader import DataLoader # type: ignore

## 2. Load Data
We use a custom DataLoader to load the training and test datasets for the recommendation task. The data consists of user (AuthorId), item (RecipeId), and rating (Rating) columns.

In [2]:
data_path = "data/"
data_loader = DataLoader(data_path)

In [3]:
train_dataset, test_dataset = data_loader.load_for_cornac(dataset_type='split')
print(train_dataset.shape)
train_dataset.head()

(513384, 4)


Unnamed: 0,ReviewId,RecipeId,AuthorId,Rating
0,826743,3745,345380,4
1,1247176,26217,406131,1
2,1250914,17123,355582,5
3,183560,123283,58104,4
4,1255493,110139,383795,5


## 3. Initialize and Configure the SVD Model
We initialize the SVD (Singular Value Decomposition) model from Cornac with specified hyperparameters such as the number of latent factors, learning rate, and regularization.

In [4]:
# 1. Initialize and configure the SVD model
print("=== SVD Matrix Factorization Implementation ===")

# Basic SVD model with default parameters
svd_model = SVD(
    k=50,           # Number of latent factors
    max_iter=100,   # Maximum number of iterations
    learning_rate=0.01,  # Learning rate for SGD
    lambda_reg=0.02,     # Regularization parameter
    verbose=True,        # Print training progress
    seed=123            # Random seed for reproducibility
)

print(f"SVD Model Configuration:")
print(f"- Latent factors (k): {svd_model.k}")
print(f"- Max iterations: {svd_model.max_iter}")
print(f"- Learning rate: {svd_model.learning_rate}")
print(f"- Regularization: {svd_model.lambda_reg}")

=== SVD Matrix Factorization Implementation ===
SVD Model Configuration:
- Latent factors (k): 50
- Max iterations: 100
- Learning rate: 0.01
- Regularization: 0.02


## 4. Data Preparation
We ensure the data is in the correct format for Cornac, converting the DataFrame to a list of (user, item, rating) tuples if necessary. This step also prints basic statistics about the dataset.

In [5]:
# 2. Prepare the data correctly
print("\n=== Data Preparation ===")
try:
    # Fix the data structure - use correct columns: AuthorId as user, RecipeId as item, Rating as rating
    if isinstance(train_dataset, pd.DataFrame):
        print("Original data structure:")
        print(train_dataset.head())
        print(f"Columns: {train_dataset.columns.tolist()}")
        
        # Create correct format: (user_id, item_id, rating)
        # Use AuthorId as user_id, RecipeId as item_id, Rating as rating
        train_tuples = [(row.AuthorId, row.RecipeId, row.Rating) 
                       for row in train_dataset.itertuples(index=False)]
        
        train_data = Dataset.from_uir(train_tuples, seed=123)
        
        print(f"Training data shape: {len(train_data.user_ids)} users, {len(train_data.item_ids)} items")
        print(f"Number of ratings: {train_data.num_ratings}")
        print(f"Rating scale: {train_data.min_rating} to {train_data.max_rating}")
        
        # Do the same for test data if available
        if 'test_dataset' in locals() and isinstance(test_dataset, pd.DataFrame):
            test_tuples = [(row.AuthorId, row.RecipeId, row.Rating) 
                          for row in test_dataset.itertuples(index=False)]
            test_data = Dataset.from_uir(test_tuples, seed=123)
            print(f"Test data: {len(test_data.user_ids)} users, {len(test_data.item_ids)} items")
    else:
        train_data = train_dataset
        
except Exception as e:
    print(f"Data preparation error: {e}")
    raise


=== Data Preparation ===
Original data structure:
   ReviewId  RecipeId  AuthorId  Rating
0    826743      3745    345380       4
1   1247176     26217    406131       1
2   1250914     17123    355582       5
3    183560    123283     58104       4
4   1255493    110139    383795       5
Columns: ['ReviewId', 'RecipeId', 'AuthorId', 'Rating']
Training data shape: 17748 users, 39057 items
Number of ratings: 513384
Rating scale: 0.0 to 5.0
Test data: 16131 users, 34075 items


## 5. Train the SVD Model
We fit the SVD model to the training data and check the learned user and item factor matrices for NaN values and correct shapes.

In [6]:
# 3. Train the SVD model
print("\n=== Training SVD Model ===")
try:
    svd_model.fit(train_data)
    print("SVD model training completed successfully!")
    
    # Access the learned matrices using correct attributes
    print(f"User factors matrix shape: {svd_model.u_factors.shape}")
    print(f"Item factors matrix shape: {svd_model.i_factors.shape}")
    
    # Check for NaN values in the factors
    user_nan_count = np.isnan(svd_model.u_factors).sum()
    item_nan_count = np.isnan(svd_model.i_factors).sum()
    print(f"NaN values in user factors: {user_nan_count}")
    print(f"NaN values in item factors: {item_nan_count}")
    
except Exception as e:
    print(f"Training error: {e}")
    raise


=== Training SVD Model ===


  0%|          | 0/100 [00:00<?, ?it/s]

Optimization finished!
SVD model training completed successfully!
User factors matrix shape: (17748, 50)
Item factors matrix shape: (39057, 50)
NaN values in user factors: 0
NaN values in item factors: 0


## 6. Generate Predictions and Recommendations
We define utility functions to generate top-k recommendations for a user and to predict ratings for specific user-item pairs. We test these functions with sample users from the dataset.

In [7]:
# 4. Generate predictions and recommendations (Fixed)
print("\n=== Generating Recommendations ===")

def get_user_recommendations_fixed(model, user_id, train_data, k=5):
    """Generate top-k recommendations for a user - Fixed version"""
    try:
        # Check if user exists in training data
        if user_id not in train_data.user_ids:
            print(f"User {user_id} not found in training data")
            return None
            
        # Get user index
        user_idx = train_data.uid_map[user_id]
        
        # Get all items the user hasn't rated
        user_items = set(train_data.matrix[user_idx].indices)
        all_items = set(range(len(train_data.item_ids)))
        candidate_items = all_items - user_items
        
        # Score candidate items
        item_scores = []
        for item_idx in candidate_items:
            try:
                item_id = train_data.item_ids[item_idx]
                score = model.score(user_id, item_id)
                if not np.isnan(score):
                    item_scores.append((item_id, score))
            except:
                continue
        
        # Sort by score and return top-k
        item_scores.sort(key=lambda x: x[1], reverse=True)
        return item_scores[:k]
        
    except Exception as e:
        print(f"Recommendation error for user {user_id}: {e}")
        return None

def predict_rating_fixed(model, user_id, item_id, train_data):
    """Predict rating for a specific user-item pair - Fixed version"""
    try:
        # Check if user and item exist
        if user_id not in train_data.user_ids:
            return None
        if item_id not in train_data.item_ids:
            return None
            
        prediction = model.score(user_id, item_id)
        return prediction if not np.isnan(prediction) else None
    except Exception as e:
        print(f"Prediction error for user {user_id}, item {item_id}: {e}")
        return None

# Test with actual users from the dataset
try:
    # Get some sample user IDs (AuthorIds) from your data
    sample_users = list(train_data.user_ids)[:3]
    
    for user_id in sample_users:
        print(f"\nTop-5 recommendations for user {user_id}:")
        recommendations = get_user_recommendations_fixed(svd_model, user_id, train_data, k=5)
        
        if recommendations:
            for i, (item_id, score) in enumerate(recommendations, 1):
                print(f"  {i}. Recipe {item_id}: Score {score:.3f}")
        else:
            print("  No recommendations available")
        
        # Test rating prediction with an item the user has rated
        user_items = train_data.matrix[train_data.uid_map[user_id]].indices
        if len(user_items) > 0:
            test_item_idx = user_items[0]
            test_item_id = train_data.item_ids[test_item_idx]
            predicted_rating = predict_rating_fixed(svd_model, user_id, test_item_id, train_data)
            if predicted_rating is not None:
                print(f"  Predicted rating for recipe {test_item_id}: {predicted_rating:.3f}")

except Exception as e:
    print(f"Example usage error: {e}")



=== Generating Recommendations ===

Top-5 recommendations for user 345380:
  1. Recipe 23089: Score 5.524
  2. Recipe 17874: Score 5.411
  3. Recipe 29124: Score 5.347
  4. Recipe 1631: Score 5.310
  5. Recipe 31005: Score 5.290
  Predicted rating for recipe 3745: 4.846

Top-5 recommendations for user 406131:
  1. Recipe 23089: Score 5.524
  2. Recipe 17874: Score 5.411
  3. Recipe 29124: Score 5.347
  4. Recipe 1631: Score 5.310
  5. Recipe 31005: Score 5.290
  Predicted rating for recipe 26217: 4.682

Top-5 recommendations for user 355582:
  1. Recipe 23089: Score 5.524
  2. Recipe 17874: Score 5.411
  3. Recipe 29124: Score 5.347
  4. Recipe 1631: Score 5.310
  5. Recipe 31005: Score 5.290
  Predicted rating for recipe 17123: 4.760


## 7. Model Evaluation
We evaluate the trained SVD model using standard metrics such as MAE, RMSE, Recall@20, and NDCG@20. The evaluation uses Cornac's BaseMethod.from_splits for a fair split between train and test data.

In [None]:
# 5. Model evaluation
print("\n=== Model Evaluation ===")

try:
    # Convert train/test data to UIR tuples format
    train_tuples = [(row.AuthorId, row.RecipeId, row.Rating) 
                   for row in train_dataset.itertuples(index=False)]
    test_tuples = [(row.AuthorId, row.RecipeId, row.Rating) 
                  for row in test_dataset.itertuples(index=False)]
    
    # Create evaluation method using from_splits()
    eval_method = BaseMethod.from_splits(
        train_data=train_tuples,
        test_data=test_tuples,
        exclude_unknowns=True,
        verbose=True,
        seed=123
    )
    
    # Define evaluation metrics
    metrics_list = [
        met.MAE(),
        met.RMSE(),
        met.Recall(k=20),
        met.NDCG(k=20),
        met.Precision(k=20),
    ]
    
    # Evaluate the model
    print("Running evaluation...")
    result, _ = eval_method.evaluate(  # Unpack the tuple
        model=svd_model, 
        metrics=metrics_list, 
        user_based=True,
    )
    
    # Print results
    print("\n=== Evaluation Results ===")
    # Print the entire result summary
    print(result)

except Exception as e:
    print(f"Evaluation error: {e}")
    raise

SyntaxError: invalid syntax. Perhaps you forgot a comma? (1901396450.py, line 25)

## 8. Analyze SVD Factors
We analyze the learned user and item factors, checking for NaN values and reporting the norms of the factor vectors to ensure the model has learned meaningful representations.

In [None]:
# 6. Utility functions for analysis (Fixed)
def analyze_svd_factors_fixed(model):
    """Analyze the learned factors - Fixed version"""
    if hasattr(model, 'u_factors') and hasattr(model, 'i_factors'):
        print(f"\nSVD Factor Analysis:")
        print(f"User factors shape: {model.u_factors.shape}")
        print(f"Item factors shape: {model.i_factors.shape}")
        
        # Check for NaN values
        user_nan_count = np.isnan(model.u_factors).sum()
        item_nan_count = np.isnan(model.i_factors).sum()
        
        if user_nan_count > 0 or item_nan_count > 0:
            print(f"Warning: Found {user_nan_count} NaN values in user factors")
            print(f"Warning: Found {item_nan_count} NaN values in item factors")
        else:
            # Calculate norms only if no NaN values
            user_factor_norms = np.linalg.norm(model.u_factors, axis=1)
            item_factor_norms = np.linalg.norm(model.i_factors, axis=1)
            
            print(f"Average user factor norm: {np.mean(user_factor_norms):.4f}")
            print(f"Average item factor norm: {np.mean(item_factor_norms):.4f}")
            print(f"User factor norm range: [{np.min(user_factor_norms):.4f}, {np.max(user_factor_norms):.4f}]")
            print(f"Item factor norm range: [{np.min(item_factor_norms):.4f}, {np.max(item_factor_norms):.4f}]")
        
        return True
    return False

# Analyze the trained model
analyze_svd_factors_fixed(svd_model)


SVD Factor Analysis:
User factors shape: (17748, 50)
Item factors shape: (39057, 50)
Average user factor norm: 1.1720
Average item factor norm: 0.9435
User factor norm range: [0.0595, 3.2946]
Item factor norm range: [0.0537, 3.4499]


True

## 9. Dataset Statistics
We print summary statistics about the dataset, including the number of users, items, ratings, sparsity, and average ratings per user/item.

In [None]:
# 7. Data statistics
print("\n=== Dataset Statistics ===")
print(f"Number of users: {len(train_data.user_ids)}")
print(f"Number of items: {len(train_data.item_ids)}")
print(f"Number of ratings: {train_data.num_ratings}")
print(f"Sparsity: {1 - (train_data.num_ratings / (len(train_data.user_ids) * len(train_data.item_ids))):.4f}")
print(f"Average ratings per user: {train_data.num_ratings / len(train_data.user_ids):.2f}")
print(f"Average ratings per item: {train_data.num_ratings / len(train_data.item_ids):.2f}")


=== Dataset Statistics ===
Number of users: 17748
Number of items: 39057
Number of ratings: 513384
Sparsity: 0.9993
Average ratings per user: 28.93
Average ratings per item: 13.14
