# NCF Movie Recommender - Model Inference on Colab

This notebook demonstrates how to use trained NCF/NeuMF+ models for movie recommendations.

**Features:**
- Load trained models from Google Drive
- Generate top-K movie recommendations for users
- Predict scores for user-item pairs
- Handle cold-start scenarios (new movies)
- Display movie titles and genres

## 1. Setup - Mount Google Drive

This notebook expects your trained models and data to be in Google Drive.

**Required structure in Google Drive:**
```
MyDrive/
‚îî‚îÄ‚îÄ NCF-Movie-Recommender/
    ‚îú‚îÄ‚îÄ data/                    # Processed data files
    ‚îÇ   ‚îú‚îÄ‚îÄ mappings.pkl         # User/item mappings
    ‚îÇ   ‚îú‚îÄ‚îÄ item_synopsis_embeddings.npy
    ‚îÇ   ‚îî‚îÄ‚îÄ ...
    ‚îú‚îÄ‚îÄ datasets/                # Raw datasets
    ‚îÇ   ‚îî‚îÄ‚îÄ movies_metadata.csv  # Movie titles and info
    ‚îî‚îÄ‚îÄ experiments/
        ‚îî‚îÄ‚îÄ trained_models/      # Trained model checkpoints
            ‚îú‚îÄ‚îÄ NeuMFPlus_genre_synopsis_best.pt
            ‚îî‚îÄ‚îÄ ...
```

In [57]:
# @title Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

print("‚úÖ Google Drive mounted successfully!")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
‚úÖ Google Drive mounted successfully!


## 2. Configure Paths

Update these paths to match your Google Drive structure.

In [58]:
# @title Configure paths
import os

# @markdown **Base path for NCF-Movie-Recommender project:**
GDRIVE_BASE = "/content/drive/MyDrive/NCF-Movie-Recommender"  # @param {type:"string"}

# Paths relative to your Google Drive base
DATA_DIR = os.path.join(GDRIVE_BASE, "data")
DATASETS_DIR = os.path.join(GDRIVE_BASE, "datasets")
MODELS_DIR = os.path.join(GDRIVE_BASE, "experiments", "trained_models")

print(f"üìÅ Base directory: {GDRIVE_BASE}")
print(f"üìÅ Data directory: {DATA_DIR}")
print(f"üìÅ Datasets directory: {DATASETS_DIR}")
print(f"üìÅ Models directory: {MODELS_DIR}")

# Verify directories exist
if os.path.exists(DATA_DIR):
    data_files = os.listdir(DATA_DIR)
    print(f"\n‚úÖ Data directory found! Files: {len(data_files)}")
else:
    print(f"\n‚ùå Data directory not found: {DATA_DIR}")

if os.path.exists(DATASETS_DIR):
    datasets_files = os.listdir(DATASETS_DIR)
    print(f"‚úÖ Datasets directory found! Files: {len(datasets_files)}")
else:
    print(f"‚ùå Datasets directory not found: {DATASETS_DIR}")

if os.path.exists(MODELS_DIR):
    model_files = [f for f in os.listdir(MODELS_DIR) if f.endswith('.pt')]
    print(f"‚úÖ Models directory found! Checkpoints: {len(model_files)}")
    if model_files:
        print("\nAvailable models:")
        for f in sorted(model_files):
            print(f"  ‚Ä¢ {f}")
else:
    print(f"\n‚ùå Models directory not found: {MODELS_DIR}")

üìÅ Base directory: /content/drive/MyDrive/NCF-Movie-Recommender
üìÅ Data directory: /content/drive/MyDrive/NCF-Movie-Recommender/data
üìÅ Datasets directory: /content/drive/MyDrive/NCF-Movie-Recommender/datasets
üìÅ Models directory: /content/drive/MyDrive/NCF-Movie-Recommender/experiments/trained_models

‚úÖ Data directory found! Files: 14
‚úÖ Datasets directory found! Files: 4
‚úÖ Models directory found! Checkpoints: 5

Available models:
  ‚Ä¢ NeuMFPlus_best.pt
  ‚Ä¢ NeuMFPlus_best_best.pt
  ‚Ä¢ NeuMFPlus_genre_best.pt
  ‚Ä¢ NeuMFPlus_genre_synopsis_bestt.pt
  ‚Ä¢ NeuMF_best.pt


## 3. Install Dependencies

Install required Python packages.

In [59]:
# @title Install dependencies
!pip install -q torch numpy pandas sentence-transformers tqdm

import torch
import numpy as np
import pandas as pd
import pickle
from typing import Dict, List, Optional

print("‚úÖ Dependencies installed!")
print(f"   PyTorch: {torch.__version__}")
print(f"   CUDA available: {torch.cuda.is_available()}")

# Set device
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"   Using device: {DEVICE}")

‚úÖ Dependencies installed!
   PyTorch: 2.9.0+cpu
   CUDA available: False
   Using device: cpu


## 4. Define Model Architecture

This section defines the NeuMF+ model architecture to match your trained checkpoints.

In [65]:
# @title Define NeuMF+ Model (matching trained checkpoints)
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Encode content features (genre + synopsis) into embeddings."""

    def __init__(self, num_genres: int, genre_embed_dim: int = 64,
                 synopsis_embed_dim: int = 384, content_embed_dim: int = 256,
                 dropout: float = 0.1):
        super().__init__()

        self.genre_encoder = nn.Sequential(
            nn.Linear(num_genres, genre_embed_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
        )

        self.synopsis_projection = nn.Sequential(
            nn.Linear(synopsis_embed_dim, synopsis_embed_dim // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
        )

        combined_dim = genre_embed_dim + synopsis_embed_dim // 2
        self.content_encoder = nn.Sequential(
            nn.Linear(combined_dim, content_embed_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
        )

    def forward(self, genre_features, synopsis_embeddings):
        genre_embed = self.genre_encoder(genre_features)
        synopsis_embed = self.synopsis_projection(synopsis_embeddings)
        combined = torch.cat([genre_embed, synopsis_embed], dim=-1)
        return self.content_encoder(combined)


class GatedFusion(nn.Module):
    """Gated fusion for CF and content embeddings."""

    def __init__(self, cf_dim: int, content_dim: int, hidden_dim: int = 64, dropout: float = 0.1):
        super().__init__()

        self.gate_network = nn.Sequential(
            nn.Linear(cf_dim + content_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, cf_embed, content_embed):
        combined = torch.cat([cf_embed, content_embed], dim=-1)
        gate = self.gate_network(combined)

        if cf_embed.shape[-1] != content_embed.shape[-1]:
            if cf_embed.shape[-1] > content_embed.shape[-1]:
                target_dim = cf_embed.shape[-1]
                if not hasattr(self, '_content_proj'):
                    self._content_proj = nn.Linear(content_embed.shape[-1], target_dim).to(cf_embed.device)
                content_embed = self._content_proj(content_embed)
            else:
                target_dim = content_embed.shape[-1]
                if not hasattr(self, '_cf_proj'):
                    self._cf_proj = nn.Linear(cf_embed.shape[-1], target_dim).to(cf_embed.device)
                cf_embed = self._cf_proj(cf_embed)

        fused = gate * cf_embed + (1 - gate) * content_embed
        return fused, gate


class NeuMFPlus(nn.Module):
    """NeuMF+ with Genre, Synopsis, and Gated Fusion."""

    def __init__(
        self,
        num_users: int,
        num_items: int,
        num_genres: int,
        # CF (NeuMF) parameters
        embedding_dim: int = 32,
        gmf_hidden_dim: int = 8,
        mlp_hidden_dims: list = None,
        mlp_dropout: float = 0.2,
        fusion_dim: int = 32,
        # Content encoder parameters
        genre_embed_dim: int = 64,
        synopsis_embed_dim: int = 384,
        content_embed_dim: int = 256,
        content_encoder_dropout: float = 0.1,
        # Gated fusion parameters
        gated_fusion_hidden_dim: int = 64,
        gated_fusion_dropout: float = 0.1,
        # Output parameters
        output_hidden_dim: int = 64,
        output_dropout: float = 0.2,
        # Ablation study flags
        use_genre: bool = True,
        use_synopsis: bool = True,
        use_gated_fusion: bool = True,
    ):
        super().__init__()

        self.num_users = num_users
        self.num_items = num_items
        self.num_genres = num_genres
        self.embedding_dim = embedding_dim
        self.use_genre = use_genre
        self.use_synopsis = use_synopsis
        self.use_gated_fusion = use_gated_fusion
        self.synopsis_embed_dim = synopsis_embed_dim
        self.content_embed_dim = content_embed_dim

        # Calculate content dimensions
        if use_genre and use_synopsis:
            self.content_encoder = ContentEncoder(
                num_genres=num_genres,
                genre_embed_dim=genre_embed_dim,
                synopsis_embed_dim=synopsis_embed_dim,
                content_embed_dim=content_embed_dim,
                dropout=content_encoder_dropout,
            )
            actual_content_dim = content_embed_dim
        elif use_genre:
            self.genre_encoder = nn.Sequential(
                nn.Linear(num_genres, genre_embed_dim),
                nn.ReLU(),
                nn.Dropout(content_encoder_dropout),
            )
            actual_content_dim = genre_embed_dim
        elif use_synopsis:
            self.synopsis_projection = nn.Sequential(
                nn.Linear(synopsis_embed_dim, synopsis_embed_dim // 2),
                nn.ReLU(),
                nn.Dropout(content_encoder_dropout),
            )
            actual_content_dim = synopsis_embed_dim // 2
        else:
            actual_content_dim = 0

        # CF (NeuMF) branch - separate embeddings for GMF and MLP
        self.gmf_user_embedding = nn.Embedding(num_users, embedding_dim)
        self.gmf_item_embedding = nn.Embedding(num_items, embedding_dim)
        self.mlp_user_embedding = nn.Embedding(num_users, embedding_dim)
        self.mlp_item_embedding = nn.Embedding(num_items, embedding_dim)

        # Initialize embeddings
        nn.init.xavier_uniform_(self.gmf_user_embedding.weight)
        nn.init.xavier_uniform_(self.gmf_item_embedding.weight)
        nn.init.xavier_uniform_(self.mlp_user_embedding.weight)
        nn.init.xavier_uniform_(self.mlp_item_embedding.weight)

        # GMF branch
        self.gmf_fc = nn.Linear(embedding_dim, gmf_hidden_dim)

        # MLP branch
        mlp_hidden_dims = mlp_hidden_dims or [128, 64, 32]
        mlp_input_dim = 2 * embedding_dim
        self.mlp_layers = nn.ModuleList()
        self.mlp_dropout_layers = nn.ModuleList()

        prev_dim = mlp_input_dim
        for hidden_dim in mlp_hidden_dims:
            self.mlp_layers.append(nn.Linear(prev_dim, hidden_dim))
            self.mlp_dropout_layers.append(nn.Dropout(mlp_dropout))
            prev_dim = hidden_dim

        # NeuMF fusion layer
        self.neumf_fusion_fc = nn.Linear(gmf_hidden_dim + prev_dim, fusion_dim)
        self.neumf_fusion_dropout = nn.Dropout(0.1)
        self.cf_output_dim = fusion_dim

        # Content fusion
        if actual_content_dim > 0:
            if use_gated_fusion:
                self.gated_fusion = GatedFusion(
                    cf_dim=self.cf_output_dim,
                    content_dim=actual_content_dim,
                    hidden_dim=gated_fusion_hidden_dim,
                    dropout=gated_fusion_dropout,
                )
                self.final_input_dim = max(self.cf_output_dim, actual_content_dim)
            else:
                self.final_input_dim = self.cf_output_dim + actual_content_dim
        else:
            self.final_input_dim = self.cf_output_dim

        # Output layers
        self.output_fc = nn.Sequential(
            nn.Linear(self.final_input_dim, output_hidden_dim),
            nn.ReLU(),
            nn.Dropout(output_dropout),
            nn.Linear(output_hidden_dim, 1),
        )

        self._init_weights()

    def _init_weights(self) -> None:
        """Initialize weights."""
        if self.use_genre and not self.use_synopsis:
            if hasattr(self, 'genre_encoder'):
                for module in self.genre_encoder:
                    if isinstance(module, nn.Linear):
                        nn.init.xavier_uniform_(module.weight)
                        if module.bias is not None:
                            nn.init.zeros_(module.bias)

        if self.use_synopsis and not self.use_genre:
            if hasattr(self, 'synopsis_projection'):
                for module in self.synopsis_projection:
                    if isinstance(module, nn.Linear):
                        nn.init.xavier_uniform_(module.weight)
                        if module.bias is not None:
                            nn.init.zeros_(module.bias)

        nn.init.xavier_uniform_(self.gmf_fc.weight)
        nn.init.zeros_(self.gmf_fc.bias)

        for layer in self.mlp_layers:
            nn.init.xavier_uniform_(layer.weight)
            nn.init.zeros_(layer.bias)

        nn.init.xavier_uniform_(self.neumf_fusion_fc.weight)
        nn.init.zeros_(self.neumf_fusion_fc.bias)

        for module in self.output_fc:
            if isinstance(module, nn.Linear):
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)

    def forward(
        self,
        user_ids: torch.Tensor,
        item_ids: torch.Tensor,
        genre_features: torch.Tensor = None,
        synopsis_embeddings: torch.Tensor = None,
        return_gate: bool = False,
    ) -> torch.Tensor:
        """Forward pass of NeuMF+."""
        # GMF
        gmf_user_embed = self.gmf_user_embedding(user_ids)
        gmf_item_embed = self.gmf_item_embedding(item_ids)
        gmf_output = gmf_user_embed * gmf_item_embed
        gmf_hidden = self.gmf_fc(gmf_output)

        # MLP
        mlp_user_embed = self.mlp_user_embedding(user_ids)
        mlp_item_embed = self.mlp_item_embedding(item_ids)
        mlp_concat = torch.cat([mlp_user_embed, mlp_item_embed], dim=-1)

        x = mlp_concat
        for layer, dropout in zip(self.mlp_layers, self.mlp_dropout_layers):
            x = layer(x)
            x = torch.relu(x)
            x = dropout(x)

        mlp_hidden = x

        # NeuMF fusion
        neumf_input = torch.cat([gmf_hidden, mlp_hidden], dim=-1)
        cf_embed = self.neumf_fusion_fc(neumf_input)
        cf_embed = torch.relu(cf_embed)
        cf_embed = self.neumf_fusion_dropout(cf_embed)

        # Content Branch
        content_embed = None
        if self.use_genre and self.use_synopsis:
            if genre_features is None or synopsis_embeddings is None:
                batch_size = user_ids.size(0)
                device = user_ids.device
                if genre_features is None:
                    genre_features = torch.zeros(batch_size, self.num_genres, device=device)
                if synopsis_embeddings is None:
                    synopsis_embeddings = torch.zeros(batch_size, self.synopsis_embed_dim, device=device)
            content_embed = self.content_encoder(genre_features, synopsis_embeddings)
        elif self.use_genre:
            if genre_features is None:
                batch_size = user_ids.size(0)
                device = user_ids.device
                genre_features = torch.zeros(batch_size, self.num_genres, device=device)
            content_embed = self.genre_encoder(genre_features)
        elif self.use_synopsis:
            if synopsis_embeddings is None:
                batch_size = user_ids.size(0)
                device = user_ids.device
                synopsis_embeddings = torch.zeros(batch_size, self.synopsis_embed_dim, device=device)
            content_embed = self.synopsis_projection(synopsis_embeddings)

        # Fusion
        if content_embed is not None:
            if self.use_gated_fusion:
                final_embed, gate = self.gated_fusion(cf_embed, content_embed)
            else:
                final_embed = torch.cat([cf_embed, content_embed], dim=-1)
                gate = None
        else:
            final_embed = cf_embed
            gate = None

        # Output
        output = self.output_fc(final_embed)

        if return_gate:
            return output, gate
        return output

    @classmethod
    def load(cls, checkpoint_path: str, device: str = 'cuda'):
        """Load model from checkpoint."""
        checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=False)
        config = checkpoint['model_config']

        model = cls(
            num_users=config['num_users'],
            num_items=config['num_items'],
            num_genres=config['num_genres'],
            use_genre=config['use_genre'],
            use_synopsis=config['use_synopsis'],
            use_gated_fusion=config['use_gated_fusion'],
        )

        # Load state_dict with strict=False to handle version mismatches
        model.load_state_dict(checkpoint['model_state_dict'], strict=False)
        model = model.to(device)

        return model, checkpoint

print("‚úÖ Model architecture defined!")

‚úÖ Model architecture defined!


## 5. Load Data and Mappings

Load the processed data files including mappings, genre features, and movie metadata.

In [66]:
# @title Load mappings and features

# Load mappings
mappings_path = os.path.join(DATA_DIR, "mappings.pkl")
with open(mappings_path, 'rb') as f:
    mappings = pickle.load(f)

NUM_USERS = mappings['num_users']
NUM_ITEMS = mappings['num_items']
NUM_GENRES = mappings['num_genres']
GENRE_NAMES = mappings.get('genre_names', [])

print(f"‚úÖ Mappings loaded!")
print(f"   Users: {NUM_USERS:,}")
print(f"   Items: {NUM_ITEMS:,}")
print(f"   Genres: {NUM_GENRES}")
if GENRE_NAMES:
    print(f"   Genre names: {GENRE_NAMES}")

# Load genre features (if available)
# Note: This file may not exist - genre features can be generated from movies_metadata.csv
genre_path = os.path.join(DATA_DIR, "item_genre_features.npy")
if os.path.exists(genre_path):
    GENRE_FEATURES = np.load(genre_path)
    print(f"\n‚úÖ Genre features loaded: {GENRE_FEATURES.shape}")
else:
    GENRE_FEATURES = None
    print(f"\n‚ö†Ô∏è  Genre features not found: {genre_path}")
    print(f"   Models that require genre features will use zero vectors.")

# Load synopsis embeddings
# Note: File uses plural "embeddings"
synopsis_path = os.path.join(DATA_DIR, "item_synopsis_embeddings.npy")
if os.path.exists(synopsis_path):
    SYNOPSIS_EMBEDDINGS = np.load(synopsis_path)
    print(f"‚úÖ Synopsis embeddings loaded: {SYNOPSIS_EMBEDDINGS.shape}")
else:
    SYNOPSIS_EMBEDDINGS = None
    print(f"‚ö†Ô∏è  Synopsis embeddings not found: {synopsis_path}")

# Load movie metadata for display (from datasets directory)
metadata_path = os.path.join(DATASETS_DIR, "movies_metadata.csv")
if os.path.exists(metadata_path):
    movies_df = pd.read_csv(metadata_path, low_memory=False)
    # Filter for valid IDs
    movies_df['id'] = pd.to_numeric(movies_df['id'], errors='coerce')
    movies_df = movies_df[movies_df['id'].notna()]
    movies_df['id'] = movies_df['id'].astype(int)
    movies_df = movies_df.set_index('id')
    print(f"\n‚úÖ Movie metadata loaded: {len(movies_df):,} movies")
else:
    movies_df = None
    print(f"\n‚ö†Ô∏è  Movie metadata not found: {metadata_path}")

# Summary
print("\n" + "="*70)
print("DATA LOADING SUMMARY")
print("="*70)
print(f"Genre features available: {'‚úÖ Yes' if GENRE_FEATURES is not None else '‚ùå No'}")
print(f"Synopsis embeddings available: {'‚úÖ Yes' if SYNOPSIS_EMBEDDINGS is not None else '‚ùå No'}")
print(f"Movie metadata available: {'‚úÖ Yes' if movies_df is not None else '‚ùå No'}")

if GENRE_FEATURES is None:
    print("\n‚ö†Ô∏è  NOTE: Genre features file (item_genre_features.npy) not found.")
    print("   If you're using a model that requires genre features,")
    print("   the model will automatically use zero vectors for missing features.")

‚úÖ Mappings loaded!
   Users: 256,107
   Items: 27,127
   Genres: 20

‚úÖ Genre features loaded: (27127, 20)
‚úÖ Synopsis embeddings loaded: (27127, 384)

‚úÖ Movie metadata loaded: 45,463 movies

DATA LOADING SUMMARY
Genre features available: ‚úÖ Yes
Synopsis embeddings available: ‚úÖ Yes
Movie metadata available: ‚úÖ Yes


## 6. Load Trained Model

Select and load one of your trained models.

**Available Models:**
| Model | Description | Features |
|-------|-------------|----------|
| `NeuMF_best.pt` | Baseline | Collaborative Filtering only |
| `NeuMFPlus_genre_best.pt` | Genre-enhanced | CF + Genre features |
| `NeuMFPlus_genre_synopsis_bestt.pt` | Full model | CF + Genre + Synopsis |

In [67]:
# @title Load trained model
# @markdown Select the model checkpoint to load:

import ipywidgets as widgets
from IPython.display import display, HTML

# Get available models
available_models = [f for f in os.listdir(MODELS_DIR) if f.endswith('.pt')]

# Model descriptions
MODEL_INFO = {
    'NeuMF_best.pt': {
        'name': 'NeuMF (Baseline)',
        'description': 'Collaborative Filtering only - no content features',
        'features': 'User-Item interactions only'
    },
    'NeuMFPlus_genre_best.pt': {
        'name': 'NeuMF+ (Genre)',
        'description': 'CF + Genre features',
        'features': 'User-Item + Movie genres'
    },
    'NeuMFPlus_genre_synopsis_bestt.pt': {
        'name': 'NeuMF+ (Genre + Synopsis)',
        'description': 'CF + Genre + Synopsis features (Full Model)',
        'features': 'User-Item + Genres + Movie synopsis'
    }
}

if not available_models:
    print(f"‚ùå No models found in {MODELS_DIR}")
else:
    # Create dropdown with model descriptions
    model_options = [(f"{m}  ({MODEL_INFO.get(m, {}).get('name', m)})", m) for m in sorted(available_models)]

    model_dropdown = widgets.Dropdown(
        options=model_options,
        description='Select model:',
        style={'description_width': 'initial'},
    )
    display(model_dropdown)

    # Model info display
    info_out = widgets.Output()
    display(info_out)

    def show_model_info(change):
        with info_out:
            info_out.clear_output()
            model_name = change['new']
            info = MODEL_INFO.get(model_name, {})
            if info:
                print(f"üìã {info.get('name', model_name)}")
                print(f"   {info.get('description', '')}")
                print(f"   Features: {info.get('features', 'N/A')}")

    model_dropdown.observe(show_model_info, names='value')
    # Show initial info
    show_model_info({'new': model_dropdown.value})

    # Load button
    load_btn = widgets.Button(description='Load Model', button_style='primary')
    display(load_btn)

    # Output area
    out = widgets.Output()
    display(out)

    def load_model(b):
        with out:
            out.clear_output()
            model_name = model_dropdown.value
            checkpoint_path = os.path.join(MODELS_DIR, model_name)

            print(f"Loading model from: {model_name}")
            print(f"Path: {checkpoint_path}")

            global model, checkpoint, model_config
            model, checkpoint = NeuMFPlus.load(checkpoint_path, device=DEVICE)
            model.eval()
            model_config = checkpoint['model_config']

            print("\n" + "="*70)
            print("MODEL CONFIGURATION")
            print("="*70)
            print(f"use_genre: {model_config.get('use_genre')}")
            print(f"use_synopsis: {model_config.get('use_synopsis')}")
            print(f"use_gated_fusion: {model_config.get('use_gated_fusion')}")
            print(f"\nParameters: {sum(p.numel() for p in model.parameters()):,}")

            if 'metrics' in checkpoint:
                print("\nValidation Metrics:")
                for k, v in checkpoint['metrics'].items():
                    if isinstance(v, (int, float)):
                        print(f"  {k}: {v:.4f}")

            # Show what features are needed
            print("\n" + "-"*70)
            print("REQUIRED FEATURES FOR INFERENCE:")
            if model_config.get('use_genre'):
                print("  ‚úÖ Genre features (item_genre_features.npy)")
            else:
                print("  ‚ùå Genre features NOT needed")
            if model_config.get('use_synopsis'):
                print("  ‚úÖ Synopsis embeddings (item_synopsis_embeddings.npy)")
            else:
                print("  ‚ùå Synopsis embeddings NOT needed")
            print("-"*70)

            print("\n‚úÖ Model loaded successfully!")

    load_btn.on_click(load_model)

Dropdown(description='Select model:', options=(('NeuMFPlus_best.pt  (NeuMFPlus_best.pt)', 'NeuMFPlus_best.pt')‚Ä¶

Output()

Button(button_style='primary', description='Load Model', style=ButtonStyle())

Output()

## 7. Helper Functions

Define helper functions for prediction and recommendation.

In [62]:
# @title Define helper functions

def get_movie_title(item_id: int, movies_df: Optional[pd.DataFrame] = None) -> str:
    """Get movie title for item ID."""
    if movies_df is None:
        return f"Movie {item_id}"

    # Try to get title from movies_df
    if item_id in movies_df.index:
        title = movies_df.loc[item_id, 'title']
        return title if pd.notna(title) else f"Movie {item_id}"

    return f"Movie {item_id}"


def parse_genres(genres_str: str) -> list:
    """Parse genres from JSON string."""
    import json
    import ast

    if pd.isna(genres_str) or genres_str == "":
        return []

    try:
        genres = json.loads(genres_str)
        return [g['name'] for g in genres]
    except:
        try:
            genres = ast.literal_eval(genres_str)
            return [g['name'] for g in genres]
        except:
            return []


def get_movie_genres(item_id: int, movies_df: Optional[pd.DataFrame] = None) -> str:
    """Get genres for item ID."""
    if movies_df is None or GENRE_FEATURES is None:
        return "Unknown"

    if item_id in movies_df.index:
        genres_str = movies_df.loc[item_id, 'genres']
        genres = parse_genres(genres_str)
        return ', '.join(genres) if genres else "Unknown"

    # Use genre features if available
    if item_id < len(GENRE_FEATURES):
        genre_indices = [i for i, g in enumerate(GENRE_FEATURES[item_id]) if g == 1]
        if genre_indices and GENRE_NAMES:
            return ', '.join([GENRE_NAMES[i] for i in genre_indices if i < len(GENRE_NAMES)])

    return "Unknown"


def predict_score(model, user_id: int, item_id: int,
                genre_vector: Optional[np.ndarray] = None,
                synopsis_embedding: Optional[np.ndarray] = None,
                device: str = DEVICE) -> float:
    """Predict score for a user-item pair."""
    model.eval()

    user_tensor = torch.LongTensor([user_id]).to(device)
    item_tensor = torch.LongTensor([item_id]).to(device)

    kwargs = {}
    if genre_vector is not None:
        kwargs['genre_features'] = torch.FloatTensor([genre_vector]).to(device)
    if synopsis_embedding is not None:
        kwargs['synopsis_embeddings'] = torch.FloatTensor([synopsis_embedding]).to(device)

    with torch.no_grad():
        logits = model(user_tensor, item_tensor, **kwargs)
        score = torch.sigmoid(logits).squeeze(-1).item()

    return score


def recommend(model, user_id: int, k: int = 10,
             item_genre_features: Optional[np.ndarray] = None,
             item_synopsis_embeddings: Optional[np.ndarray] = None,
             seen_items: Optional[List[int]] = None,
             device: str = DEVICE) -> List[Dict]:
    """Recommend top-K items for a user."""
    model.eval()
    num_items = model.num_items

    candidate_items = list(range(num_items))
    if seen_items is not None:
        candidate_items = [item for item in candidate_items if item not in seen_items]

    user_tensor = torch.LongTensor([user_id] * len(candidate_items)).to(device)
    item_tensor = torch.LongTensor(candidate_items).to(device)

    kwargs = {}
    if item_genre_features is not None:
        kwargs['genre_features'] = torch.FloatTensor(item_genre_features[candidate_items]).to(device)
    if item_synopsis_embeddings is not None:
        kwargs['synopsis_embeddings'] = torch.FloatTensor(item_synopsis_embeddings[candidate_items]).to(device)

    with torch.no_grad():
        logits = model(user_tensor, item_tensor, **kwargs)
        scores = torch.sigmoid(logits).squeeze(-1).cpu().numpy()

    top_indices = np.argsort(scores)[::-1][:k]

    recommendations = []
    for idx in top_indices:
        item_id = int(candidate_items[idx])
        recommendations.append({
            'item_id': item_id,
            'score': float(scores[idx]),
            'rank': len(recommendations) + 1,
            'title': get_movie_title(item_id, movies_df),
            'genres': get_movie_genres(item_id, movies_df),
        })

    return recommendations


def load_multiple_models(model_paths: Dict[str, str], device: str = DEVICE) -> Dict[str, tuple]:
    """Load multiple models for comparison."""
    loaded = {}
    for name, path in model_paths.items():
        try:
            model_obj, checkpoint = NeuMFPlus.load(path, device=device)
            model_obj.eval()
            loaded[name] = (model_obj, checkpoint)
            print(f"‚úÖ Loaded: {name}")
        except Exception as e:
            print(f"‚ùå Failed to load {name}: {e}")
    return loaded

print("‚úÖ Helper functions defined!")

‚úÖ Helper functions defined!


## 7. Generate Genre Features (Optional)

If `item_genre_features.npy` is missing, run this cell to generate it from `movies_metadata.csv`.

In [89]:
# @title Define helper functions (FIXED - with reverse mapping)

# Import reverse mapping from loaded data
reverse_item_map = {v: k for k, v in mappings.get('item_id_map', {}).items()}

def get_movie_title(item_id: int, movies_df: Optional[pd.DataFrame] = None) -> str:
    """Get movie title for internal item ID."""
    if movies_df is None:
        return f"Movie {item_id}"

    # Convert internal item_id to TMDB ID
    tmdb_id = reverse_item_map.get(item_id)
    if tmdb_id is None:
        return f"Movie {item_id}"

    # Try to get title from movies_df using TMDB ID
    if tmdb_id in movies_df.index:
        title = movies_df.loc[tmdb_id, 'title']
        return title if pd.notna(title) else f"Movie {item_id}"

    return f"Movie {item_id}"


def parse_genres(genres_str: str) -> list:
    """Parse genres from JSON string."""
    import json
    import ast

    if pd.isna(genres_str) or genres_str == "":
        return []

    try:
        genres = json.loads(genres_str)
        return [g['name'] for g in genres]
    except:
        try:
            genres = ast.literal_eval(genres_str)
            return [g['name'] for g in genres]
        except:
            return []


def get_movie_genres(item_id: int, movies_df: Optional[pd.DataFrame] = None) -> str:
    """Get genres for internal item ID."""
    global GENRE_FEATURES, GENRE_NAMES

    # First try: Use genre features array (most reliable)
    if GENRE_FEATURES is not None and item_id < len(GENRE_FEATURES):
        genre_indices = [i for i, g in enumerate(GENRE_FEATURES[item_id]) if g == 1]
        if genre_indices and GENRE_NAMES:
            return ', '.join([GENRE_NAMES[i] for i in genre_indices if i < len(GENRE_NAMES)])

    # Second try: Get from movies_df using TMDB ID mapping
    if movies_df is not None:
        tmdb_id = reverse_item_map.get(item_id)
        if tmdb_id is not None and tmdb_id in movies_df.index:
            genres_str = movies_df.loc[tmdb_id, 'genres']
            genres = parse_genres(genres_str)
            if genres:
                return ', '.join(genres)

    return "Unknown"


def predict_score(model, user_id: int, item_id: int,
                genre_vector: Optional[np.ndarray] = None,
                synopsis_embedding: Optional[np.ndarray] = None,
                device: str = DEVICE) -> float:
    """Predict score for a user-item pair."""
    model.eval()

    user_tensor = torch.LongTensor([user_id]).to(device)
    item_tensor = torch.LongTensor([item_id]).to(device)

    kwargs = {}
    if genre_vector is not None:
        kwargs['genre_features'] = torch.FloatTensor([genre_vector]).to(device)
    if synopsis_embedding is not None:
        kwargs['synopsis_embeddings'] = torch.FloatTensor([synopsis_embedding]).to(device)

    with torch.no_grad():
        logits = model(user_tensor, item_tensor, **kwargs)
        score = torch.sigmoid(logits).squeeze(-1).item()

    return score


def recommend(model, user_id: int, k: int = 10,
             item_genre_features: Optional[np.ndarray] = None,
             item_synopsis_embeddings: Optional[np.ndarray] = None,
             seen_items: Optional[List[int]] = None,
             device: str = DEVICE) -> List[Dict]:
    """Recommend top-K items for a user."""
    model.eval()
    num_items = model.num_items

    candidate_items = list(range(num_items))
    if seen_items is not None:
        candidate_items = [item for item in candidate_items if item not in seen_items]

    user_tensor = torch.LongTensor([user_id] * len(candidate_items)).to(device)
    item_tensor = torch.LongTensor(candidate_items).to(device)

    kwargs = {}
    if item_genre_features is not None:
        kwargs['genre_features'] = torch.FloatTensor(item_genre_features[candidate_items]).to(device)
    if item_synopsis_embeddings is not None:
        kwargs['synopsis_embeddings'] = torch.FloatTensor(item_synopsis_embeddings[candidate_items]).to(device)

    with torch.no_grad():
        logits = model(user_tensor, item_tensor, **kwargs)
        scores = torch.sigmoid(logits).squeeze(-1).cpu().numpy()

    top_indices = np.argsort(scores)[::-1][:k]

    recommendations = []
    for idx in top_indices:
        item_id = int(candidate_items[idx])
        recommendations.append({
            'item_id': item_id,
            'score': float(scores[idx]),
            'rank': len(recommendations) + 1,
            'title': get_movie_title(item_id, movies_df),
            'genres': get_movie_genres(item_id, movies_df),
        })

    return recommendations


def load_multiple_models(model_paths: Dict[str, str], device: str = DEVICE) -> Dict[str, tuple]:
    """Load multiple models for comparison."""
    loaded = {}
    for name, path in model_paths.items():
        try:
            model_obj, checkpoint = NeuMFPlus.load(path, device=device)
            model_obj.eval()
            loaded[name] = (model_obj, checkpoint)
            print(f"‚úÖ Loaded: {name}")
        except Exception as e:
            print(f"‚ùå Failed to load {name}: {e}")
    return loaded

print("‚úÖ Helper functions defined with reverse mapping support!")

‚úÖ Helper functions defined with reverse mapping support!


In [91]:
# Step 6: Save
print(f"\n[6/6] Saving to: {genre_output_path}")
os.makedirs(os.path.dirname(genre_output_path), exist_ok=True)
np.save(genre_output_path, genre_features)


[6/6] Saving to: /content/drive/MyDrive/NCF-Movie-Recommender/data/item_genre_features.npy


In [92]:
# @title Get top-K recommendations
# @markdown Enter user ID and number of recommendations:

user_id_rec = 109  # @param {type:"integer"}
k_recommendations = 10  # @param {type:"integer", min:1, max:50}

if user_id_rec >= NUM_USERS:
    print(f"‚ùå Invalid user ID. Must be less than {NUM_USERS}.")
else:
    recommendations = recommend(
        model, user_id_rec, k=k_recommendations,
        item_genre_features=GENRE_FEATURES,
        item_synopsis_embeddings=SYNOPSIS_EMBEDDINGS,
    )

    print("="*70)
    print(f"TOP-{k_recommendations} RECOMMENDATIONS FOR USER {user_id_rec}")
    print("="*70)

    print(f"\n{'Rank':<6} {'Score':<10} {'Title':<50} {'Genres'}")
    print("-" * 100)

    for rec in recommendations:
        title = rec['title'][:47] + '...' if len(rec['title']) > 47 else rec['title']
        print(f"{rec['rank']:<6} {rec['score']:.4f}     {title:<50} {rec['genres']}")

TOP-10 RECOMMENDATIONS FOR USER 109

Rank   Score      Title                                              Genres
----------------------------------------------------------------------------------------------------
1      0.9348     Movie 352                                          Unknown
2      0.9069     Movie 148                                          Unknown
3      0.9032     Movie 2773                                         Unknown
4      0.8915     Movie 293                                          Unknown
5      0.8895     Movie 587                                          Unknown
6      0.8842     Movie 2487                                         Unknown
7      0.8765     Movie 340                                          Unknown
8      0.8745     Movie 2874                                         Unknown
9      0.8706     Movie 315                                          Unknown
10     0.8679     Movie 476                                          Unknown


## 10. Example: Compare Multiple Users

See how different users would rate the same movie.

In [109]:
# @title Compare predictions for multiple users
# @markdown Enter item ID and list of user IDs to compare:

item_id_compare = 500  # @param {type:"integer"}
user_ids_compare = "0, 50, 100, 500, 1000"  # @param {type:"string"}

try:
    user_list = [int(u.strip()) for u in user_ids_compare.split(',')]
except:
    user_list = [0, 50, 100, 500, 1000]

if item_id_compare >= NUM_ITEMS:
    print(f"‚ùå Invalid item ID. Must be less than {NUM_ITEMS}.")
else:
    print("="*70)
    print(f"USER COMPARISON FOR ITEM: {get_movie_title(item_id_compare, movies_df)}")
    print(f"Genres: {get_movie_genres(item_id_compare, movies_df)}")
    print("="*70)

    print(f"\n{'User ID':<12} {'Score':<10} {'Prediction'}")
    print("-" * 40)

    genre_vec = GENRE_FEATURES[item_id_compare] if GENRE_FEATURES is not None else None
    synopsis_emb = SYNOPSIS_EMBEDDINGS[item_id_compare] if SYNOPSIS_EMBEDDINGS is not None else None

    for user_id in user_list:
        if user_id >= NUM_USERS:
            print(f"{user_id:<12} (invalid user)")
            continue

        score = predict_score(model, user_id, item_id_compare, genre_vec, synopsis_emb)

        if score > 0.8:
            prediction = "Will love it!"
        elif score > 0.6:
            prediction = "Will probably like it"
        elif score > 0.4:
            prediction = "Maybe"
        else:
            prediction = "Probably not interested"

        print(f"{user_id:<12} {score:.4f}     {prediction}")

USER COMPARISON FOR ITEM: Movie 500
Genres: Unknown

User ID      Score      Prediction
----------------------------------------
0            0.4600     Maybe
50           0.4628     Maybe
100          0.4573     Maybe
500          0.4614     Maybe
1000         0.4571     Maybe


## 11. Advanced: Cold-Start Prediction for New Movies

Predict how users would rate a completely new movie using only its content features (genres and synopsis).

In [71]:
# @title Cold-start prediction for a new movie
# @markdown Enter movie details for prediction:

new_user_id = 100  # @param {type:"integer"}
new_movie_genres = "Action,Sci-Fi"  # @param {type:"string"}
new_movie_synopsis = "A group of astronauts discover a mysterious artifact on Mars that changes their understanding of humanity's place in the universe."  # @param {type:"string"}

if new_user_id >= NUM_USERS:
    print(f"‚ùå Invalid user ID. Must be less than {NUM_USERS}.")
else:
    from sentence_transformers import SentenceTransformer

    # Load SBERT model for synopsis encoding
    print("Loading Sentence-BERT model...")
    sbert = SentenceTransformer('all-MiniLM-L6-v2')

    # Encode genres
    genre_list = [g.strip() for g in new_movie_genres.split(',')]
    genre_vector = np.zeros(NUM_GENRES, dtype=np.float32)

    if GENRE_NAMES:
        for genre in genre_list:
            if genre in GENRE_NAMES:
                idx = GENRE_NAMES.index(genre)
                genre_vector[idx] = 1.0

    # Encode synopsis
    synopsis_embedding = sbert.encode(new_movie_synopsis, show_progress_bar=False)
    synopsis_embedding = np.array(synopsis_embedding, dtype=np.float32)

    # Use a placeholder item ID (last item as reference)
    placeholder_item_id = NUM_ITEMS - 1

    # Predict
    score = predict_score(
        model, new_user_id, placeholder_item_id,
        genre_vector=genre_vector,
        synopsis_embedding=synopsis_embedding
    )

    print("\n" + "="*70)
    print("COLD-START PREDICTION")
    print("="*70)
    print(f"\nUser ID: {new_user_id}")
    print(f"\nNew Movie:")
    print(f"  Genres: {new_movie_genres}")
    print(f"  Synopsis: {new_movie_synopsis[:100]}...")
    print(f"\n‚úÖ Predicted score: {score:.4f}")

    if score > 0.7:
        print("\nüé¨ This user would likely enjoy this movie!")
    elif score > 0.5:
        print("\nüé¨ This user might be interested in this movie.")
    else:
        print("\nüé¨ This movie may not be a good fit for this user.")

Loading Sentence-BERT model...

COLD-START PREDICTION

User ID: 100

New Movie:
  Genres: Action,Sci-Fi
  Synopsis: A group of astronauts discover a mysterious artifact on Mars that changes their understanding of hum...

‚úÖ Predicted score: 0.4905

üé¨ This movie may not be a good fit for this user.


## 12. Interactive Recommendation Widget

Use this interactive widget to explore recommendations for different users.

In [73]:
# @title Debug: Check user interaction history
# @markdown Enter user ID to check their interaction history:

debug_user_id = 24784  # @param {type:"integer"}

if debug_user_id >= NUM_USERS:
    print(f"‚ùå Invalid user ID. Must be less than {NUM_USERS}.")
else:
    print("="*70)
    print(f"USER ANALYSIS: User {debug_user_id}")
    print("="*70)

    # Check if this user has interaction history
    # Load training data to check
    import pickle

    train_path = os.path.join(DATA_DIR, "train.pkl")
    val_path = os.path.join(DATA_DIR, "val.pkl")

    try:
        with open(train_path, 'rb') as f:
            train_data = pickle.load(f)

        # Count ratings for this user in training set
        user_train_items = train_data['user_item_matrix'][debug_user_id].nonzero()[1]
        train_count = len(user_train_items)

        print(f"\nTraining set:")
        print(f"  Items rated: {train_count}")

        if train_count > 0:
            print(f"  Sample items (first 10): {user_train_items[:10].tolist()}")

            # Get some sample movie titles
            if movies_df is not None and len(user_train_items) > 0:
                print(f"\n  Sample movies rated by this user:")
                for item_id in user_train_items[:5]:
                    title = get_movie_title(item_id, movies_df)
                    genres = get_movie_genres(item_id, movies_df)
                    print(f"    - {title} ({genres})")

    except Exception as e:
        print(f"  Error loading training data: {e}")

    # Check validation set
    try:
        with open(val_path, 'rb') as f:
            val_data = pickle.load(f)

        user_val_items = val_data['user_item_matrix'][debug_user_id].nonzero()[1]
        val_count = len(user_val_items)

        print(f"\nValidation set:")
        print(f"  Items rated: {val_count}")

    except Exception as e:
        print(f"  Error loading validation data: {e}")

    total_interactions = train_count + val_count if 'train_count' in locals() else 0

    print(f"\n{'='*70}")
    print(f"TOTAL INTERACTIONS: {total_interactions}")

    if total_interactions == 0:
        print("\n‚ö†Ô∏è  This user has NO interaction history!")
        print("   ‚Üí Model is using default/popularity-based recommendations")
        print("   ‚Üí Try a user with more interactions (e.g., user 100, 500, 1000)")
    elif total_interactions < 10:
        print("\n‚ö†Ô∏è  This user has very few interactions!")
        print("   ‚Üí Recommendations may not be very personalized")
        print("   ‚Üí Consider using a user with more history")
    else:
        print(f"\n‚úÖ This user has good interaction history ({total_interactions} items)")
        print("   ‚Üí Recommendations should be personalized")

USER ANALYSIS: User 24784


KeyboardInterrupt: 

In [72]:
# @title Find users with good interaction history
# @markdown This will find users with many ratings to test personalized recommendations.

import pickle

print("="*70)
print("FINDING ACTIVE USERS")
print("="*70)

# Load training data
train_path = os.path.join(DATA_DIR, "train.pkl")
with open(train_path, 'rb') as f:
    train_data = pickle.load(f)

# Count interactions per user
user_interaction_counts = []
for user_id in range(train_data['user_item_matrix'].shape[0]):
    count = len(train_data['user_item_matrix'][user_id].nonzero()[1])
    user_interaction_counts.append((user_id, count))

# Sort by interaction count (descending)
user_interaction_counts.sort(key=lambda x: x[1], reverse=True)

print(f"\nTop 20 Most Active Users:")
print(f"{'User ID':<12} {'Ratings':<10} {'Status'}")
print("-" * 40)

for user_id, count in user_interaction_counts[:20]:
    if count > 100:
        status = "Very Active ‚úÖ"
    elif count > 50:
        status = "Active ‚≠ê"
    elif count > 20:
        status = "Moderate"
    else:
        status = "Low"
    print(f"{user_id:<12} {count:<10} {status}")

# Statistics
all_counts = [count for _, count in user_interaction_counts]
print(f"\n{'='*70}")
print("STATISTICS:")
print(f"  Total users: {len(all_counts):,}")
print(f"  Mean ratings/user: {np.mean(all_counts):.1f}")
print(f"  Median ratings/user: {np.median(all_counts):.1f}")
print(f"  Max ratings/user: {np.max(all_counts)}")
print(f"  Min ratings/user: {np.min(all_counts)}")

# Count by activity level
very_active = sum(1 for c in all_counts if c > 100)
active = sum(1 for c in all_counts if c > 50)
moderate = sum(1 for c in all_counts if c > 20)

print(f"\nACTIVITY LEVELS:")
print(f"  Very Active (>100 ratings): {very_active:,} users ({very_active/len(all_counts)*100:.1f}%)")
print(f"  Active (>50 ratings): {active:,} users ({active/len(all_counts)*100:.1f}%)")
print(f"  Moderate (>20 ratings): {moderate:,} users ({moderate/len(all_counts)*100:.1f}%)")

print(f"\nüí° SUGGESTION: Try user IDs from the 'Very Active' list above for personalized recommendations!")
print(f"   Example: User {user_interaction_counts[0][0]} has {user_interaction_counts[0][1]} ratings")

FINDING ACTIVE USERS


KeyboardInterrupt: 