# Sequential Recommendation Models Tutorial

This notebook demonstrates time-aware recommendation using sequential models (RNN/LSTM/Attention).

## Overview
- **Sequential Models**: Capture temporal patterns in user behavior
- **Time-Aware**: Consider when users interacted with items
- **Session-Based**: Model short-term and long-term preferences
- **Next-Item Prediction**: Predict what a user will interact with next
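
Next-item prediction is commonly framed as turning each time-ordered history into (prefix, target) pairs. The sketch below illustrates that idea on a toy sequence. The function name `make_next_item_pairs` and the item IDs are hypothetical and are not taken from this project's `data_loader` module, which may build its examples differently.

```python
from typing import List, Tuple


def make_next_item_pairs(sequence: List[int], min_prefix: int = 1) -> List[Tuple[List[int], int]]:
    """Turn one time-ordered item sequence into (prefix, next_item) pairs."""
    pairs = []
    for t in range(min_prefix, len(sequence)):
        # Everything before position t is the input; the item at t is the label.
        pairs.append((sequence[:t], sequence[t]))
    return pairs


toy_history = [10, 42, 7, 99]  # hypothetical item IDs, oldest first
toy_pairs = make_next_item_pairs(toy_history)
for prefix, target in toy_pairs:
    print(prefix, "->", target)
```

Each user therefore contributes several training examples, one per position in the history.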

In [None]:
# Install requirements
!pip install torch pandas scikit-learn numpy matplotlib seaborn tqdm

In [None]:
import sys
import os
sys.path.append('../src')

import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict
from datetime import datetime

from model import (
    SequentialRecommender, AttentionalSequentialRecommender,
    SessionBasedRecommender
)
from data_loader import SequentialDataLoader
from trainer import SequentialTrainer
from inference import SequentialInference

plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

## 1. Sequential Data Analysis

In [None]:
# Load data
ratings_path = '../../ml-32m/ratings.csv'
movies_path = '../../ml-32m/movies.csv'

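if not os.path.exists(ratings_path):
    print(f"Ratings file not found at {ratings_path}. Please download the MovieLens 32M dataset and extract it there.")
else:
    ratings_df = pd.read_csv(ratings_path)
    movies_df = pd.read_csv(movies_path) if os.path.exists(movies_path) else None

    print(f"Loaded {len(ratings_df):,} ratings")
    # Convert with pandas (UTC) so the range matches the 'datetime' column derived in the next cell
    first_ts = pd.to_datetime(ratings_df['timestamp'].min(), unit='s')
    last_ts = pd.to_datetime(ratings_df['timestamp'].max(), unit='s')
    print(f"Date range: {first_ts} to {last_ts}")

    # A bare expression inside an if/else block is not rendered by Jupyter, so print it
    print(ratings_df.head())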

In [None]:
# Analyze temporal patterns
ratings_df['datetime'] = pd.to_datetime(ratings_df['timestamp'], unit='s')
ratings_df['hour'] = ratings_df['datetime'].dt.hour
ratings_df['day_of_week'] = ratings_df['datetime'].dt.dayofweek
ratings_df['month'] = ratings_df['datetime'].dt.month

plt.figure(figsize=(15, 10))

# Ratings by hour of day
plt.subplot(2, 3, 1)
hourly_counts = ratings_df['hour'].value_counts().sort_index()
plt.plot(hourly_counts.index, hourly_counts.values)
plt.title('Ratings by Hour of Day')
plt.xlabel('Hour')
plt.ylabel('Number of Ratings')
plt.grid(True, alpha=0.3)

# Ratings by day of week
plt.subplot(2, 3, 2)
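# Reindex to all 7 weekdays so the bar chart still works if a day has no ratings
daily_counts = ratings_df['day_of_week'].value_counts().reindex(range(7), fill_value=0)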
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
plt.bar(range(7), daily_counts.values)
plt.title('Ratings by Day of Week')
plt.xlabel('Day')
plt.ylabel('Number of Ratings')
plt.xticks(range(7), days)

# User sequence lengths
plt.subplot(2, 3, 3)
user_counts = ratings_df['userId'].value_counts()
plt.hist(user_counts, bins=50, alpha=0.7)
plt.title('User Sequence Lengths')
plt.xlabel('Number of Ratings')
plt.ylabel('Number of Users')
plt.yscale('log')

# Time gaps between ratings
plt.subplot(2, 3, 4)
sample_user = ratings_df['userId'].iloc[0]
user_data = ratings_df[ratings_df['userId'] == sample_user].sort_values('timestamp')
time_gaps = np.diff(user_data['timestamp']) / 3600  # Convert to hours
plt.hist(time_gaps[time_gaps < 168], bins=50, alpha=0.7)  # Show gaps < 1 week
plt.title(f'Time Gaps Between Ratings (User {sample_user})')
plt.xlabel('Hours')
plt.ylabel('Frequency')

# Rating evolution over time for a user
plt.subplot(2, 3, 5)
plt.plot(range(len(user_data)), user_data['rating'].values, 'o-', alpha=0.7)
plt.title(f'Rating Pattern Over Time (User {sample_user})')
plt.xlabel('Rating Index')
plt.ylabel('Rating')

# Average rating by month
plt.subplot(2, 3, 6)
monthly_avg = ratings_df.groupby('month')['rating'].mean()
plt.plot(monthly_avg.index, monthly_avg.values, 'o-')
plt.title('Average Rating by Month')
plt.xlabel('Month')
plt.ylabel('Average Rating')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nSample user {sample_user} has {len(user_data)} ratings over {(user_data['timestamp'].max() - user_data['timestamp'].min()) / (24*3600):.1f} days")

## 2. Sequential Data Preparation

In [None]:
# Create sequential data loader
data_loader = SequentialDataLoader(
    ratings_path=ratings_path,
    min_interactions=10,   # Users with at least 10 ratings
    min_seq_length=5,      # Sequences with at least 5 items
    max_seq_length=50      # Maximum sequence length
)

print(f"Processed {len(data_loader.user_sequences)} user sequences")

# Analyze sequence statistics
seq_lengths = [len(seq) for seq in data_loader.user_sequences.values()]
print("Sequence length stats:")
print(f"  Mean: {np.mean(seq_lengths):.1f}")
print(f"  Median: {np.median(seq_lengths):.1f}")
print(f"  Min: {np.min(seq_lengths)}")
print(f"  Max: {np.max(seq_lengths)}")

plt.figure(figsize=(10, 4))
plt.hist(seq_lengths, bins=50, alpha=0.7)
plt.title('Distribution of User Sequence Lengths')
plt.xlabel('Sequence Length')
plt.ylabel('Number of Users')
plt.axvline(np.mean(seq_lengths), color='red', linestyle='--', label=f'Mean: {np.mean(seq_lengths):.1f}')
plt.legend()
plt.show()
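
The three loader arguments above (`min_interactions`, `min_seq_length`, `max_seq_length`) suggest a filter-sort-truncate pipeline. The following is a minimal sketch of such a pipeline under stated assumptions: the helper names `build_sequences` and `left_pad` are hypothetical, padding index 0 is assumed, and the actual `SequentialDataLoader` internals are not shown in this notebook and may differ.

```python
from collections import defaultdict

PAD = 0  # assumed padding index; real item indices would start at 1


def build_sequences(interactions, min_interactions=10, max_seq_length=50):
    """Group (user_id, item_id, timestamp) rows into time-ordered item lists."""
    by_user = defaultdict(list)
    for user_id, item_id, ts in interactions:
        by_user[user_id].append((ts, item_id))

    sequences = {}
    for user_id, events in by_user.items():
        if len(events) < min_interactions:
            continue  # drop users with too little history
        events.sort()  # chronological order (timestamp first in each tuple)
        items = [item for _, item in events]
        sequences[user_id] = items[-max_seq_length:]  # keep the most recent items
    return sequences


def left_pad(seq, length):
    """Left-pad so the most recent item always sits in the last position."""
    seq = seq[-length:]
    return [PAD] * (length - len(seq)) + seq


toy_rows = [(1, 5, 30), (1, 3, 10), (1, 8, 20), (2, 4, 5)]
toy_seqs = build_sequences(toy_rows, min_interactions=2, max_seq_length=2)
toy_padded = left_pad(toy_seqs[1], 4)
print(toy_seqs, toy_padded)
```

Truncating from the left keeps the newest interactions, which are usually the most informative for predicting the next item.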

In [None]:
# Create data loaders for training
train_loader, val_loader, test_loader = data_loader.create_data_loaders(
    data_type='sequential',  # or 'session' for session-based modeling
    batch_size=256,
    num_workers=2
)

print(f"Data loaders created:")
print(f"  Training samples: {len(train_loader.dataset)}")
print(f"  Validation samples: {len(val_loader.dataset)}")
print(f"  Test samples: {len(test_loader.dataset)}")

# Get model configuration
model_config = data_loader.get_model_config()
print(f"\nModel config: {model_config}")
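
How `create_data_loaders` divides the data into train, validation, and test sets is not shown here. One widely used protocol for next-item tasks is a leave-one-out split per user, sketched below purely as an illustration; `leave_one_out_split` is a hypothetical helper, and the project may use a different scheme (for example a global time cutoff).

```python
def leave_one_out_split(sequence):
    """Split one time-ordered sequence into train data plus val/test examples.

    The last item is the test target and the second-to-last is the validation
    target, so evaluation never sees interactions from the future.
    """
    if len(sequence) < 3:
        raise ValueError("need at least 3 items to hold out two targets")
    train_seq = sequence[:-2]
    val_example = (sequence[:-2], sequence[-2])
    test_example = (sequence[:-1], sequence[-1])
    return train_seq, val_example, test_example


toy_train, toy_val, toy_test = leave_one_out_split([1, 2, 3, 4, 5])
print(toy_train, toy_val, toy_test)
```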

## 3. Model Training - Sequential LSTM

In [None]:
# Create sequential recommender with LSTM
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

seq_model = SequentialRecommender(
    num_items=model_config['num_items'],
    embedding_dim=128,
    hidden_dim=256,
    num_layers=2,
    dropout=0.2,
    rnn_type='LSTM'
)

print(f"Sequential model parameters: {sum(p.numel() for p in seq_model.parameters()):,}")
print(seq_model)
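
For readers who want to see what an LSTM next-item model can look like, here is a minimal PyTorch sketch. It is not the source of `SequentialRecommender`; the class name `TinyLSTMRecommender`, the reserved padding index 0, and the use of the last time step as the sequence summary are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class TinyLSTMRecommender(nn.Module):
    """Embed items, run an LSTM, and score every item from the last hidden state."""

    def __init__(self, num_items, embedding_dim=32, hidden_dim=64, num_layers=1, dropout=0.0):
        super().__init__()
        # Index 0 is assumed to be padding, so the vocabulary has num_items + 1 rows.
        self.embedding = nn.Embedding(num_items + 1, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0.0,  # LSTM dropout needs 2+ layers
        )
        self.output = nn.Linear(hidden_dim, num_items + 1)

    def forward(self, item_ids):
        embedded = self.embedding(item_ids)   # (batch, seq_len, embedding_dim)
        outputs, _ = self.lstm(embedded)      # (batch, seq_len, hidden_dim)
        last_hidden = outputs[:, -1, :]       # with left padding, the newest item is last
        return self.output(last_hidden)       # (batch, num_items + 1) logits


toy_lstm = TinyLSTMRecommender(num_items=100)
toy_batch = torch.tensor([[0, 0, 5, 17], [3, 9, 2, 44]])  # two left-padded sequences
toy_logits = toy_lstm(toy_batch)
print(toy_logits.shape)
```

Training such a model typically applies cross-entropy between these logits and the index of the true next item.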

In [None]:
# Train sequential model
seq_trainer = SequentialTrainer(
    model=seq_model,
    device=device,
    learning_rate=0.001,
    weight_decay=1e-5
)

print("Training Sequential LSTM model...")
seq_history = seq_trainer.train(
    train_loader=train_loader,
    val_loader=val_loader,
    epochs=10,  # Increase for better performance
    patience=5,
    save_dir='../models'
)

print(f"Sequential training completed! Best validation loss: {seq_trainer.best_val_loss:.4f}")
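
The `patience` argument usually controls early stopping: training halts once the validation loss has failed to improve for that many consecutive epochs. The sketch below shows that rule in isolation. `early_stopping_epoch` is a hypothetical helper, and whether `SequentialTrainer` applies exactly this rule is an assumption.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_loss) for a list of per-epoch validation losses."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0  # improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best  # stop: no improvement for `patience` epochs
    return len(val_losses) - 1, best  # ran through every epoch


toy_losses = [0.9, 0.8, 0.85, 0.82, 0.81, 0.7]
print(early_stopping_epoch(toy_losses, patience=2))
print(early_stopping_epoch(toy_losses, patience=5))
```

With `epochs=10` and `patience=5` as configured above, early stopping can only trigger if the loss stalls for five consecutive epochs, which leaves little room for it to fire in a short run.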

## 4. Model Training - Attention-Based

In [None]:
# Create attention-based sequential recommender
attention_model = AttentionalSequentialRecommender(
    num_items=model_config['num_items'],
    embedding_dim=128,
    num_heads=8,
    num_blocks=2,
    dropout=0.2,
    max_seq_len=50
)

print(f"Attention model parameters: {sum(p.numel() for p in attention_model.parameters()):,}")

# Train attention model
attention_trainer = SequentialTrainer(
    model=attention_model,
    device=device,
    learning_rate=0.001,
    weight_decay=1e-5
)

print("Training Attention-based model...")
attention_history = attention_trainer.train(
    train_loader=train_loader,
    val_loader=val_loader,
    epochs=8,  # Kept short for the demo; adjust based on the validation curve
    patience=5,
    save_dir='../models'
)

print(f"Attention training completed! Best validation loss: {attention_trainer.best_val_loss:.4f}")
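
The `num_heads`, `num_blocks`, and `max_seq_len` arguments point to a self-attention encoder with positional embeddings. The following sketch shows one way to build such a model with PyTorch's built-in Transformer layers and a causal mask. It is an illustration only: `TinyAttentionRecommender` is a hypothetical class, not the implementation of `AttentionalSequentialRecommender`, and it skips padding masks for brevity.

```python
import torch
import torch.nn as nn


class TinyAttentionRecommender(nn.Module):
    """Causal self-attention over item embeddings plus learned positions."""

    def __init__(self, num_items, embedding_dim=32, num_heads=4, num_blocks=1,
                 max_seq_len=50, dropout=0.0):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 1, embedding_dim, padding_idx=0)
        self.pos_emb = nn.Embedding(max_seq_len, embedding_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim,
            nhead=num_heads,
            dim_feedforward=4 * embedding_dim,
            dropout=dropout,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_blocks)
        self.output = nn.Linear(embedding_dim, num_items + 1)

    def forward(self, item_ids):
        seq_len = item_ids.size(1)
        positions = torch.arange(seq_len, device=item_ids.device)
        x = self.item_emb(item_ids) + self.pos_emb(positions)
        # True above the diagonal blocks attention to later positions.
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=item_ids.device),
            diagonal=1,
        )
        hidden = self.encoder(x, mask=causal_mask)
        return self.output(hidden[:, -1, :])  # score items from the last position


toy_attn = TinyAttentionRecommender(num_items=100)
toy_attn_logits = toy_attn(torch.tensor([[0, 0, 5, 17], [3, 9, 2, 44]]))
print(toy_attn_logits.shape)
```

The causal mask keeps each position from looking at later items, which matters when every position is trained to predict its successor.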

In [None]:
# Compare training histories
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.plot(seq_history['val_losses'], label='Sequential LSTM', marker='o')
plt.plot(attention_history['val_losses'], label='Attention', marker='s')
plt.title('Validation Loss Comparison')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 3, 2)
plt.plot(seq_history['val_accuracies'], label='Sequential LSTM', marker='o')
plt.plot(attention_history['val_accuracies'], label='Attention', marker='s')
plt.title('Validation Accuracy Comparison')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 3, 3)
if seq_history['val_hit_rates'] and attention_history['val_hit_rates']:
    plt.plot(seq_history['val_hit_rates'], label='Sequential LSTM', marker='o')
    plt.plot(attention_history['val_hit_rates'], label='Attention', marker='s')
    plt.title('Hit Rate@10 Comparison')
    plt.xlabel('Epoch')
    plt.ylabel('Hit Rate@10')
    plt.legend()
    plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Final Results:")
print(f"  Sequential LSTM - Loss: {seq_trainer.best_val_loss:.4f}")
print(f"  Attention Model - Loss: {attention_trainer.best_val_loss:.4f}")
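
Hit Rate@10, plotted in the third panel, is normally defined as the fraction of test cases where the true next item appears among the top 10 predictions. A minimal sketch of that definition follows. The function `hit_rate_at_k` is hypothetical, and it is an assumption that the trainer's `val_hit_rates` values are computed this way.

```python
def hit_rate_at_k(ranked_lists, targets, k=10):
    """Fraction of examples whose true next item is in the top-k ranked items."""
    if not targets:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Three toy examples: each list is a model's ranking, best item first.
toy_rankings = [[3, 1, 2], [5, 6, 7], [9, 8, 4]]
toy_targets = [1, 7, 0]
print(hit_rate_at_k(toy_rankings, toy_targets, k=2))
print(hit_rate_at_k(toy_rankings, toy_targets, k=3))
```

Unlike top-1 accuracy, this metric gives credit whenever the correct item lands anywhere in the shortlist, which is closer to how a recommendation list is actually consumed.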

## 5. Sequential Recommendations

In [None]:
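# Save encoders and create the inference object
data_loader.save_encoders('../models/encoders.pkl')

# Both trainers saved to '../models'. If the trainer always writes 'best_model.pt',
# this file holds the checkpoint of the model trained last (the attention model),
# which is not necessarily the one with the lower validation loss.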
seq_inference = SequentialInference(
    model_path='../models/best_model.pt',
    encoders_path='../models/encoders.pkl',
    device=device
)

print("Sequential inference object created!")

In [None]:
# Get a sample user sequence for demonstration
sample_user_id = list(data_loader.user_sequences.keys())[0]
sample_sequence = data_loader.user_sequences[sample_user_id]

print(f"Sample user has sequence of length: {len(sample_sequence)}")

# Convert encoded sequence back to original movie IDs for display
original_sequence = []
for encoded_item in sample_sequence[:10]:  # Show first 10 items
    try:
        # Encoded IDs are shifted by 1 (index 0 is presumably reserved for padding)
        original_item = data_loader.item_encoder.inverse_transform([encoded_item - 1])[0]
        original_sequence.append(original_item)
    except Exception:  # skip IDs the encoder cannot map back (e.g. padding)
        continue

print("\nUser's movie sequence (first 10 items):")
if movies_df is not None:
    for i, movie_id in enumerate(original_sequence):
        movie_info = movies_df[movies_df['movieId'] == movie_id]
        if not movie_info.empty:
            title = movie_info.iloc[0]['title']
            print(f"  {i+1}. {title}")
        else:
            print(f"  {i+1}. Movie ID {movie_id}")
else:
    for i, movie_id in enumerate(original_sequence):
        print(f"  {i+1}. Movie ID {movie_id}")

In [None]:
# Predict next items based on this sequence
next_items = seq_inference.predict_next_items(
    sequence=original_sequence,
    top_k=10,
    exclude_seen=True
)

print(f"\nNext item predictions based on sequence:")
for i, (movie_id, probability) in enumerate(next_items, 1):
    if movies_df is not None:
        movie_info = movies_df[movies_df['movieId'] == movie_id]
        if not movie_info.empty:
            title = movie_info.iloc[0]['title']
            genres = movie_info.iloc[0]['genres']
            print(f"  {i}. {title} (Prob: {probability:.3f})")
            print(f"     Genres: {genres}")
        else:
            print(f"  {i}. Movie ID {movie_id} (Prob: {probability:.3f})")
    else:
        print(f"  {i}. Movie ID {movie_id} (Prob: {probability:.3f})")
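
The `top_k` and `exclude_seen` arguments describe a common post-processing step: convert raw scores to probabilities, drop items already in the history, and keep the highest-scoring remainder. The sketch below shows that step on a toy score table. `top_k_unseen` is a hypothetical helper and is not the body of `predict_next_items`, whose internals are not shown here.

```python
import math


def top_k_unseen(scores, seen, k):
    """Softmax the raw scores, filter out seen items, return the top-k pairs."""
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {item: math.exp(s - max_score) for item, s in scores.items()}
    total = sum(exp_scores.values())
    probs = {item: e / total for item, e in exp_scores.items()}

    candidates = [(item, p) for item, p in probs.items() if item not in seen]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


toy_scores = {1: 2.0, 2: 1.0, 3: 3.0, 4: 0.5}  # hypothetical item -> raw score
toy_top = top_k_unseen(toy_scores, seen={3}, k=2)
print(toy_top)
```

In this sketch the probabilities are not renormalized after filtering, so the returned values need not sum to 1.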

## 6. Sequence Continuation Prediction

In [None]:
# Predict multiple future steps
start_sequence = original_sequence[:7]  # Use the first 7 items as context
num_steps = 5
sequence_continuation = seq_inference.predict_sequence_continuation(
    sequence=start_sequence,
    num_steps=num_steps
)

print(f"Sequence continuation prediction ({num_steps} future steps):")
print(f"Starting sequence length: {len(start_sequence)} items")
print("\nPredicted future steps:")

for step, step_predictions in enumerate(sequence_continuation, 1):
    print(f"\n  Step {step}:")
    for i, (movie_id, prob) in enumerate(step_predictions[:3], 1):  # Show top 3 per step
        if movies_df is not None:
            movie_info = movies_df[movies_df['movieId'] == movie_id]
            title = movie_info.iloc[0]['title'] if not movie_info.empty else f"Movie {movie_id}"
            print(f"    {i}. {title} (Prob: {prob:.3f})")
        else:
            print(f"    {i}. Movie ID {movie_id} (Prob: {prob:.3f})")
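
Multi-step continuation is often implemented as an autoregressive rollout: predict, append the top prediction to the history, and predict again. The sketch below shows that loop with a stand-in predictor. Both `continue_sequence` and `toy_predict_next` are hypothetical, and greedy selection is an assumption; `predict_sequence_continuation` may sample or use beam search instead.

```python
def continue_sequence(sequence, predict_next, num_steps, top_k=3):
    """Greedy rollout: feed each step's best prediction back in as history."""
    history = list(sequence)  # copy so the caller's list is untouched
    steps = []
    for _ in range(num_steps):
        ranked = predict_next(history)[:top_k]
        steps.append(ranked)
        history.append(ranked[0][0])  # commit to the top-ranked item
    return steps


def toy_predict_next(history):
    """Stand-in model: scores the IDs that follow the last item."""
    last = history[-1]
    return [(last + 1, 0.6), (last + 2, 0.3), (last + 3, 0.1)]


toy_steps = continue_sequence([1, 2, 3], toy_predict_next, num_steps=2)
for step_number, ranked in enumerate(toy_steps, 1):
    print(step_number, ranked)
```

Because each step conditions on earlier predictions rather than observed behavior, errors compound and later steps are generally less reliable.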

## 7. User Pattern Analysis

In [None]:
# Analyze user patterns
original_user_id = data_loader.user_encoder.inverse_transform([sample_user_id])[0]
user_patterns = seq_inference.analyze_user_patterns(original_user_id)

print(f"User Pattern Analysis for User {original_user_id}:")
for key, value in user_patterns.items():
    print(f"  {key}: {value}")

# Visualize user's most frequent items
if 'most_frequent_items' in user_patterns:
    frequent_items = user_patterns['most_frequent_items']
    
    plt.figure(figsize=(12, 6))
    
    # Plot frequency distribution
    movie_ids, counts = zip(*frequent_items)
    
    if movies_df is not None:
        labels = []
        for movie_id in movie_ids:
            movie_info = movies_df[movies_df['movieId'] == movie_id]
            title = movie_info.iloc[0]['title'] if not movie_info.empty else f"Movie {movie_id}"
            # Truncate long titles
            labels.append(title[:30] + '...' if len(title) > 30 else title)
    else:
        labels = [f"Movie {mid}" for mid in movie_ids]
    
    plt.bar(range(len(labels)), counts)
    plt.title(f'Most Frequent Movies in History - User {original_user_id}')
    plt.xlabel('Movie')
    plt.ylabel('Interaction Count')
    plt.xticks(range(len(labels)), labels, rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

    # Report the ratio without assuming re-watching: MovieLens typically has one
    # rating per user-movie pair, so this value is expected to be close to 0.
    print(f"\nRepeat ratio: {user_patterns.get('repeat_ratio', 0):.2f}")
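
The `most_frequent_items` and `repeat_ratio` fields can be derived from simple counts over a user's sequence. The sketch below gives one plausible definition of each. Both helper functions are hypothetical, and the formula for the repeat ratio (share of interactions that revisit an item already seen) is an assumption, since `analyze_user_patterns` is not shown.

```python
from collections import Counter


def repeat_ratio(sequence):
    """Share of interactions that repeat an item seen earlier in the sequence."""
    if not sequence:
        return 0.0
    return 1.0 - len(set(sequence)) / len(sequence)


def most_frequent_items(sequence, n=5):
    """Return the n most common (item, count) pairs."""
    return Counter(sequence).most_common(n)


toy_sequence = [1, 2, 2, 3, 2]  # item 2 appears three times
print(repeat_ratio(toy_sequence))
print(most_frequent_items(toy_sequence, n=1))
```

On rating data where each user rates a movie at most once, every count is 1 and the ratio is 0, so these statistics are more informative on play or click logs.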

## 8. Session-Based Modeling

In [None]:
# Create session-based data loaders
session_train_loader, session_val_loader, session_test_loader = data_loader.create_data_loaders(
    data_type='session',
    batch_size=256,
    num_workers=2
)

print(f"Session-based data loaders:")
print(f"  Training samples: {len(session_train_loader.dataset)}")
print(f"  Validation samples: {len(session_val_loader.dataset)}")
print(f"  Test samples: {len(session_test_loader.dataset)}")

# Create session-based model
session_model = SessionBasedRecommender(
    num_items=model_config['num_items'],
    embedding_dim=128,
    hidden_dim=256,
    num_layers=1,
    dropout=0.3,
    use_attention=True
)

print(f"Session model parameters: {sum(p.numel() for p in session_model.parameters()):,}")

In [None]:
# Train session-based model (optional - can be time consuming)
# Uncomment to train

# session_trainer = SequentialTrainer(
#     model=session_model,
#     device=device,
#     learning_rate=0.001,
#     weight_decay=1e-5
# )

# print("Training Session-based model...")
# session_history = session_trainer.train(
#     train_loader=session_train_loader,
#     val_loader=session_val_loader,
#     epochs=5,
#     patience=3,
#     save_dir='../models'
# )

print("Session-based model created (training skipped for demo)")
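
Session-based modeling requires cutting each user's history into sessions. A common heuristic is to start a new session whenever the gap between consecutive interactions exceeds a threshold. The sketch below illustrates that heuristic. `split_into_sessions` and the 30-minute default are hypothetical choices, and how `data_type='session'` actually defines sessions in this project is not shown.

```python
def split_into_sessions(events, max_gap_seconds=1800):
    """Split time-sorted (timestamp, item_id) events on gaps longer than the threshold."""
    sessions = []
    current = []
    last_ts = None
    for ts, item in events:
        if last_ts is not None and ts - last_ts > max_gap_seconds:
            sessions.append(current)  # gap too long: close the running session
            current = []
        current.append(item)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


toy_events = [(0, "a"), (600, "b"), (5000, "c"), (5100, "d"), (9000, "e")]
toy_sessions = split_into_sessions(toy_events)
print(toy_sessions)
```

The time-gap histogram from the temporal analysis is a reasonable guide for picking the threshold on a given dataset.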

## 9. Model Comparison and Analysis

In [None]:
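# Run the loaded model on a shorter prefix of the same sequence.
# Note: only the checkpoint loaded into seq_inference is used here; comparing
# architectures would require one SequentialInference object per saved model.
test_sequence = original_sequence[:8]

print(f"Predictions for a sequence of {len(test_sequence)} items:")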
if movies_df is not None:
    print("\nInput sequence:")
    for i, movie_id in enumerate(test_sequence[-3:], len(test_sequence)-2):  # Show last 3
        movie_info = movies_df[movies_df['movieId'] == movie_id]
        title = movie_info.iloc[0]['title'] if not movie_info.empty else f"Movie {movie_id}"
        print(f"  {i}. {title}")

# Get predictions
predictions = seq_inference.predict_next_items(test_sequence, top_k=5)

print(f"\nNext item predictions:")
for i, (movie_id, prob) in enumerate(predictions, 1):
    if movies_df is not None:
        movie_info = movies_df[movies_df['movieId'] == movie_id]
        title = movie_info.iloc[0]['title'] if not movie_info.empty else f"Movie {movie_id}"
        genres = movie_info.iloc[0]['genres'] if not movie_info.empty else "Unknown"
        print(f"  {i}. {title} (Prob: {prob:.3f})")
        print(f"     Genres: {genres}")
    else:
        print(f"  {i}. Movie ID {movie_id} (Prob: {prob:.3f})")

## 10. Summary and Next Steps

### What we accomplished:
- ✅ **Temporal Analysis**: Analyzed when users interact with content
- ✅ **Sequential Data Processing**: Created time-ordered user sequences
- ✅ **Multiple Architectures**: Trained LSTM and Attention-based models
- ✅ **Next-Item Prediction**: Predicted what users will watch next
- ✅ **Sequence Continuation**: Multi-step future predictions
- ✅ **User Pattern Analysis**: Analyzed individual user behaviors
- ✅ **Session-Based Modeling**: Built session-level data loaders and a session model (training left optional)

### Key Insights:
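1. **Time Matters**: Rating activity varies by hour of day and day of week; note that timestamps record when a rating was submitted, not necessarily when the movie was watched
2. **Sequence Length**: Users vary greatly in the length of their interaction history
3. **Attention vs LSTM**: Attention models often perform better on longer sequences, though the short training runs here are not enough to confirm that on this dataset
4. **Repeat Behavior**: Re-consumption matters in many recommendation domains, but MovieLens typically holds one rating per user-movie pair, so the repeat ratio here is expected to be near zero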

### Next Steps:
1. **Longer Training**: Train for more epochs for better convergence
2. **Hyperparameter Tuning**: Optimize embedding dimensions, learning rates
3. **Advanced Architectures**: Try Transformer-based models
4. **Multi-Task Learning**: Predict ratings + next items simultaneously
5. **Session Detection**: Automatic session boundary detection
6. **Real-Time Inference**: Deploy for live recommendation systems

### Integration with Discord Bot:
Sequential models enable powerful time-aware features:
- **Binge Recommendations**: "What should I watch next in this series?"
- **Session Continuity**: "Continue my weekend movie marathon"
- **Temporal Patterns**: "Recommend based on my Friday night viewing"
- **Sequence Analysis**: "Analyze my viewing evolution over time"