# Personalized AI-Based Book Recommendation System with Conversational Intelligence

## Module E - AI Applications Course Evaluation Notebook

---

## 1. Project Overview

This notebook demonstrates a **personalized book recommendation system** that employs a novel separation of concerns:

1. **Personal Intelligence Model (PyTorch)**: An AI model that makes all authoritative decisions - ranking books and selecting recommendation strategies.
2. **Conversational Layer (LLM)**: A language model that *explains* decisions but does NOT make them.

This design ensures **transparency, reproducibility, and control** over AI decisions, avoiding the unpredictability of LLM-based decision-making.

### Real-World Relevance

- **Netflix/Spotify**: Use similar recommendation pipelines with explainability layers
- **E-commerce**: Amazon's "Customers who bought X also bought Y" uses ranking models
- **Mental Health Apps**: Mood-aware content curation for user wellbeing

---

Author: BookAI Project   
Date: January 2026

## 2. Problem Definition & Objectives

### Problem Statement

With millions of books available, users face **information overload** when searching for their next read. Traditional search engines rely on keywords, but users often have *emotional* or *contextual* needs:

- "I want something uplifting after a tough week"
- "Recommend something challenging to expand my horizons"

### Why This Problem Matters

1. **User Frustration**: Poor recommendations lead to abandoned reading
2. **Engagement**: Personalized suggestions increase user retention
3. **Discovery**: Help users find books they wouldn't have found otherwise

### AI Task Definition

| Task | Input | Output |
|------|-------|--------|
| **Book Ranking** | User profile + Candidate books | Ordered list by predicted preference |
| **Strategy Selection** | User mood/context | One of: COMFORT, CHALLENGE, DISCOVER, REVISIT |
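The two tasks in the table can be pinned down as a small typed interface. This is a sketch, not part of the original notebook. The names `Strategy`, `RankedBook`, `Recommendation` and `make_recommendation` are hypothetical; they only make the input/output contract of each task explicit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Strategy(Enum):
    """The four strategy classes from the table above."""
    COMFORT = "comfort"
    CHALLENGE = "challenge"
    DISCOVER = "discover"
    REVISIT = "revisit"


@dataclass(frozen=True)
class RankedBook:
    """One entry of the ranking output."""
    book_id: int
    title: str
    score: float  # predicted preference in [0, 1]


@dataclass(frozen=True)
class Recommendation:
    """Combined output of both tasks for a single request."""
    ranked_books: Tuple[RankedBook, ...]  # highest score first
    strategy: Strategy


def make_recommendation(scored: List[Tuple[int, str, float]], strategy_name: str,
                        top_k: int = 5) -> Recommendation:
    """Order (book_id, title, score) tuples by score and attach the chosen strategy."""
    ranked = sorted(scored, key=lambda item: item[2], reverse=True)[:top_k]
    return Recommendation(
        ranked_books=tuple(RankedBook(book_id, title, score) for book_id, title, score in ranked),
        strategy=Strategy(strategy_name),  # raises ValueError for an unknown strategy
    )
```

Using an `Enum` means a misspelled strategy fails loudly instead of flowing silently into the explanation layer.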

### System Objectives

1. **Accurate Ranking**: Predict user preference scores for candidate books
2. **Context-Aware Strategy**: Select recommendation approach based on user mood
3. **Explainability**: Provide human-readable justification for recommendations
4. **Separation of Concerns**: AI model decides, LLM only explains

### Assumptions & Constraints

- **Cold Start**: New users have limited interaction history
- **Synthetic Data**: This demo uses generated data (real deployment would use actual user data)
- **Single User Focus**: Model trained per-user (not collaborative filtering)
- **Limited catalog**: The model can only recommend books that are in the dataset and carry genre tags.

## 3. Data Understanding & Preparation

### Synthetic Data Generation

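Since this notebook must be self-contained, we generate a small synthetic dataset:

- **50 Books**: Each with a title, a randomly assigned multi-hot genre vector (1-3 of the 6 genres), and a synthetic average rating. Because genres are assigned at random, a book's tags need not match the real book with that title.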
- **200 User Interactions**: User ratings for books with associated mood labels
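The ratings are not sampled uniformly. `generate_rating` (defined below) ties each rating to how well the book's genres match a fixed user profile. Let $g \in \{0,1\}^6$ be the book's genre vector, $a$ the affinity vector `USER_GENRE_AFFINITY`, and $\varepsilon \sim \mathcal{N}(0, 0.5^2)$ Gaussian noise. Then:

$$
m(g) = \frac{g \cdot a}{\sum_{j=1}^{6} g_j + 0.01}, \qquad
r = \operatorname{clip}\big(3\,m(g) + 2 + \varepsilon,\ 1,\ 5\big)
$$

The result is then rounded to one decimal place. Take $a = (0.8, 0.3, 0.6, 0.4, 0.9, 0.5)$:

- A Sci-Fi-only book gets $m = 0.9/1.01 \approx 0.891$, giving a noise-free rating of about $4.67$.
- A Non-Fiction-only book gets $m = 0.3/1.01 \approx 0.297$, giving about $2.89$.

Multi-genre books fall in between, because $m$ is (almost) the mean affinity over the book's genres. The $0.01$ only guards against division by zero.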

### Genre Encoding

We use a **multi-hot encoding** for genres:

| Index | Genre |
|-------|-------|
| 0 | Fiction |
| 1 | Non-Fiction |
| 2 | Mystery/Thriller |
| 3 | Romance |
| 4 | Sci-Fi/Fantasy |
| 5 | Self-Help |
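To make the table concrete, here is how a set of genre names maps to a multi-hot vector and back. The helpers `encode_genres` and `decode_genres` are illustrative only. The notebook itself builds vectors directly by random sampling in `generate_genre_vector`.

```python
GENRES = ["Fiction", "Non-Fiction", "Mystery/Thriller", "Romance", "Sci-Fi/Fantasy", "Self-Help"]
GENRE_TO_IDX = {genre: i for i, genre in enumerate(GENRES)}


def encode_genres(genre_names):
    """Map a list of genre names to a multi-hot vector (1 = book has that genre)."""
    vector = [0] * len(GENRES)
    for name in genre_names:
        if name not in GENRE_TO_IDX:
            raise ValueError(f"Unknown genre: {name!r}")
        vector[GENRE_TO_IDX[name]] = 1
    return vector


def decode_genres(vector):
    """Inverse of encode_genres: list the genre names whose bit is set."""
    return [genre for genre, bit in zip(GENRES, vector) if bit]
```

Unlike one-hot encoding, several bits can be set at once, so a sci-fi mystery is representable. The model's `book_encoder` takes exactly this 6-dimensional vector as input.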

In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import random

# Set seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)
random.seed(42)

print("PyTorch version:", torch.__version__)
print("NumPy version:", np.__version__)
print("Pandas version:", pd.__version__)

In [None]:
# ============ BOOK DATA GENERATION ============

GENRES = ["Fiction", "Non-Fiction", "Mystery/Thriller", "Romance", "Sci-Fi/Fantasy", "Self-Help"]
NUM_GENRES = len(GENRES)

# Book titles (many are well-known titles, but the genres and ratings below are randomly generated)
BOOK_TITLES = [
    "The Silent Echo", "Atomic Habits", "Gone Girl", "Pride and Prejudice", "Dune",
    "The Power of Now", "The Great Gatsby", "Sapiens", "The Da Vinci Code", "Outlander",
    "Foundation", "Think and Grow Rich", "1984", "Educated", "And Then There Were None",
    "The Notebook", "Ender's Game", "How to Win Friends", "To Kill a Mockingbird", "Becoming",
    "The Girl on the Train", "Me Before You", "The Martian", "The 7 Habits", "Brave New World",
    "The Alchemist", "In Cold Blood", "The Fault in Our Stars", "Neuromancer", "Rich Dad Poor Dad",
    "The Catcher in the Rye", "Thinking Fast and Slow", "Big Little Lies", "It Ends with Us", "Ready Player One",
    "Mindset", "Animal Farm", "A Brief History of Time", "Sharp Objects", "Beach Read",
    "The Hunger Games", "The Subtle Art", "Fahrenheit 451", "The Body Keeps the Score", "The Silent Patient",
    "The Time Traveler's Wife", "Project Hail Mary", "Grit", "Lord of the Flies", "Quiet"
]

# Generate genre vectors for each book
def generate_genre_vector():
    """Generate a multi-hot genre vector (1-3 genres per book)."""
    num_genres = random.randint(1, 3)
    genres = random.sample(range(NUM_GENRES), num_genres)
    vector = [0] * NUM_GENRES
    for g in genres:
        vector[g] = 1
    return vector

# Create books DataFrame
books_data = [
    {
        "book_id": i,
        "title": BOOK_TITLES[i],
        "genre_vector": generate_genre_vector(),
        "avg_rating": round(random.uniform(3.5, 5.0), 2)
    }
    for i in range(len(BOOK_TITLES))
]

books_df = pd.DataFrame(books_data)
print(f"Generated {len(books_df)} books")
books_df.head(10)

In [None]:
# ============ USER INTERACTION DATA GENERATION ============

MOODS = ["happy", "sad", "stressed", "curious", "relaxed"]
STRATEGIES = ["comfort", "challenge", "discover", "revisit"]

# Simulate a single user's preference profile (genre affinity)
# This represents what genres the user historically prefers
USER_GENRE_AFFINITY = np.array([0.8, 0.3, 0.6, 0.4, 0.9, 0.5])  # Likes Fiction, Mystery, Sci-Fi

def generate_rating(book_genre_vector, user_affinity):
    """Generate a noisy 1-5 rating from how well the book's genres match the user's affinities."""
    book_vec = np.array(book_genre_vector)
    # Average affinity over the book's genres (+0.01 guards against division by zero)
    match_score = np.dot(book_vec, user_affinity) / (np.sum(book_vec) + 0.01)
    # Scale to roughly 2-5, add Gaussian noise (std 0.5), and clamp to the 1-5 range
    rating = match_score * 3 + 2 + random.gauss(0, 0.5)
    return round(np.clip(rating, 1, 5), 1)

def mood_to_strategy(mood):
    """Map mood to most likely strategy."""
    mapping = {
        "happy": "discover",
        "sad": "comfort",
        "stressed": "comfort",
        "curious": "challenge",
        "relaxed": "revisit"
    }
    return mapping.get(mood, "discover")

# Generate 200 interactions
interactions = []
for _ in range(200):
    book_idx = random.randint(0, len(books_df) - 1)
    book = books_df.iloc[book_idx]
    mood = random.choice(MOODS)
    
    interactions.append({
        "user_id": 1,  # Single user for this demo
        "book_id": book["book_id"],
        "book_title": book["title"],
        "genre_vector": book["genre_vector"],
        "rating": generate_rating(book["genre_vector"], USER_GENRE_AFFINITY),
        "mood": mood,
        "strategy_label": mood_to_strategy(mood)
    })

interactions_df = pd.DataFrame(interactions)
print(f"Generated {len(interactions_df)} user interactions")
print(f"\nRating distribution:")
print(interactions_df["rating"].describe())
print(f"\nStrategy distribution:")
print(interactions_df["strategy_label"].value_counts())
interactions_df.head(10)

## 4. Model / System Design

### Architecture Overview

```
+------------------+   +------------------+   +------------------+
|   Book Genres    |   |   User Profile   |   |  Mood (one-hot)  |
|  (multi-hot, 6)  |   | (genre affinity) |   |   (5 classes)    |
+--------+---------+   +--------+---------+   +--------+---------+
         |                      |                      |
         v                      v                      |
+--------+---------+   +--------+---------+            |
|   Book Encoder   |   |   User Encoder   |            |
| (Linear + ReLU)  |   | (Linear + ReLU)  |            |
+--------+---------+   +--------+---------+            |
         |                      |                      |
         |      user_emb +------+------+ user_emb      |
         |               |             |               |
         v               v             v               v
   [ concat(book_emb, user_emb) ]  [ concat(user_emb, mood) ]
                 |                              |
                 v                              v
        +--------+---------+           +--------+---------+
        |   Ranking Head   |           |  Strategy Head   |
        |   (score 0-1)    |           | (4-class logits) |
        +--------+---------+           +--------+---------+
                 |                              |
                 v                              v
        +--------+---------+           +--------+---------+
        |   Ranked Books   |           |  Strategy Label  |
        +------------------+           +------------------+
```

### Why Separate AI Decisions from LLM?

| Aspect | AI Model (PyTorch) | LLM (Gemini/GPT) |
|--------|-------------------|------------------|
| **Role** | Decision-maker | Explainer |
| **Determinism** | Reproducible | Stochastic |
| **Auditability** | Full transparency | Black box |
| **Control** | We own the logic | API-dependent |

This separation ensures:
1. **No hallucinated recommendations** - the recommended books always come from the model's ranking of the catalog, so the LLM cannot add titles to the list
2. **Consistent behavior** - Same input = same ranking
3. **Debuggable** - We can inspect model weights and decisions
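One way to enforce this separation is to give the LLM only the model's decisions, then flag any reply that names a catalog title the model did not rank. The sketch below is hypothetical: `build_explanation_prompt` and `find_unranked_titles` are not part of the notebook or of any LLM SDK. It does not call an LLM; it only shows the guard logic around one.

```python
def build_explanation_prompt(ranked_titles, strategy, mood):
    """Build a prompt that contains only the model's decisions, never raw user data."""
    lines = [
        f"The user's mood is '{mood}'. The recommendation strategy is '{strategy}'.",
        "Explain in two sentences why these books fit. Do not suggest any other books:",
    ]
    lines += [f"{i}. {title}" for i, title in enumerate(ranked_titles, 1)]
    return "\n".join(lines)


def find_unranked_titles(reply, ranked_titles, catalog_titles):
    """Return catalog titles mentioned in the LLM reply that the model did not rank."""
    allowed = set(ranked_titles)
    return sorted(t for t in catalog_titles if t in reply and t not in allowed)
```

The substring check is deliberately naive: a short title such as "1984" could match unrelated text. It also cannot catch a title invented outside the catalog. A non-empty result is a signal to discard the reply and fall back to a template explanation.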

## 5. Core Implementation

### Personal Intelligence Model

We implement a **multi-task neural network** with:
- **Ranking Head**: Predicts user preference score (0-1) for each book
- **Strategy Head**: Classifies the recommendation strategy based on user context
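Both heads are trained jointly: the training loop below minimizes a weighted sum of their losses. The notation, for a batch of $B$ interactions:

- $\hat{y}_i$ is the sigmoid ranking score.
- $r_i \in [1, 5]$ is the rating.
- $z_i \in \mathbb{R}^4$ are the strategy logits.
- $c_i$ is the strategy label index.

$$
\mathcal{L} = \frac{1}{B}\sum_{i=1}^{B}\left(\hat{y}_i - \frac{r_i - 1}{4}\right)^2
\;+\; 0.5\cdot\left(-\frac{1}{B}\sum_{i=1}^{B}\log\frac{\exp(z_{i,c_i})}{\sum_{k=1}^{4}\exp(z_{i,k})}\right)
$$

The first term is `nn.MSELoss` against the rating rescaled to $[0, 1]$, as done in `BookDataset`. The second term is `nn.CrossEntropyLoss`, which applies the softmax internally; this is why the strategy head outputs raw logits rather than probabilities.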

In [None]:
class PersonalIntelligenceModel(nn.Module):
    """
    Personal Intelligence Model for Book Recommendations.
    
    This model makes TWO types of predictions:
    1. Ranking Score: How much will the user like this book? (0-1)
    2. Strategy: What approach should we take? (comfort/challenge/discover/revisit)
    """
    
    def __init__(self, num_genres=6, hidden_dim=32, num_strategies=4):
        super().__init__()
        
        # Encode book features (genre vector)
        self.book_encoder = nn.Sequential(
            nn.Linear(num_genres, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        
        # Encode user preferences (genre affinity)
        self.user_encoder = nn.Sequential(
            nn.Linear(num_genres, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        
        # Ranking head: Combined features -> preference score
        self.ranking_head = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid()  # Output 0-1 score
        )
        
        # Strategy head: User context -> strategy class
        # Input: user encoding + mood encoding (5 moods one-hot)
        self.strategy_head = nn.Sequential(
            nn.Linear(hidden_dim + 5, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_strategies)
        )
    
    def forward(self, book_genre, user_affinity, mood_onehot):
        """
        Forward pass.
        
        Args:
            book_genre: Tensor [batch, 6] - multi-hot genre encoding
            user_affinity: Tensor [batch, 6] - user's genre preference scores
            mood_onehot: Tensor [batch, 5] - one-hot mood encoding
        
        Returns:
            ranking_score: Tensor [batch, 1] - predicted preference (0-1)
            strategy_logits: Tensor [batch, 4] - strategy class logits
        """
        # Encode book and user
        book_emb = self.book_encoder(book_genre)
        user_emb = self.user_encoder(user_affinity)
        
        # Ranking: Combine book + user embeddings
        combined = torch.cat([book_emb, user_emb], dim=1)
        ranking_score = self.ranking_head(combined)
        
        # Strategy: Use user embedding + mood
        strategy_input = torch.cat([user_emb, mood_onehot], dim=1)
        strategy_logits = self.strategy_head(strategy_input)
        
        return ranking_score, strategy_logits

# Instantiate model
model = PersonalIntelligenceModel()
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
# ============ DATASET PREPARATION ============

MOOD_TO_IDX = {mood: i for i, mood in enumerate(MOODS)}
STRATEGY_TO_IDX = {s: i for i, s in enumerate(STRATEGIES)}
IDX_TO_STRATEGY = {i: s for s, i in STRATEGY_TO_IDX.items()}

class BookDataset(Dataset):
    """PyTorch Dataset for book interactions."""
    
    def __init__(self, interactions_df, user_affinity):
        self.data = interactions_df
        self.user_affinity = torch.tensor(user_affinity, dtype=torch.float32)
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        
        # Book genre vector
        genre_vec = torch.tensor(row["genre_vector"], dtype=torch.float32)
        
        # Mood one-hot
        mood_idx = MOOD_TO_IDX[row["mood"]]
        mood_onehot = torch.zeros(5)
        mood_onehot[mood_idx] = 1.0
        
        # Target: normalized rating (1-5 -> 0-1)
        rating_normalized = (row["rating"] - 1) / 4.0
        
        # Strategy label
        strategy_idx = STRATEGY_TO_IDX[row["strategy_label"]]
        
        return {
            "genre_vec": genre_vec,
            "user_affinity": self.user_affinity,
            "mood_onehot": mood_onehot,
            "rating_target": torch.tensor([rating_normalized], dtype=torch.float32),
            "strategy_target": torch.tensor(strategy_idx, dtype=torch.long)
        }

# Create dataset and dataloader
dataset = BookDataset(interactions_df, USER_GENRE_AFFINITY)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

print(f"Dataset size: {len(dataset)}")
print(f"Batches per epoch: {len(dataloader)}")

# Sample batch
sample = next(iter(dataloader))
print(f"\nSample batch shapes:")
for k, v in sample.items():
    print(f"  {k}: {v.shape}")

In [None]:
# ============ TRAINING LOOP ============

# Loss functions
ranking_criterion = nn.MSELoss()
strategy_criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
NUM_EPOCHS = 20
model.train()

print("Training Personal Intelligence Model...")
print("=" * 50)

for epoch in range(NUM_EPOCHS):
    total_loss = 0
    total_ranking_loss = 0
    total_strategy_loss = 0
    
    for batch in dataloader:
        optimizer.zero_grad()
        
        # Forward pass
        ranking_score, strategy_logits = model(
            batch["genre_vec"],
            batch["user_affinity"],
            batch["mood_onehot"]
        )
        
        # Compute losses
        ranking_loss = ranking_criterion(ranking_score, batch["rating_target"])
        strategy_loss = strategy_criterion(strategy_logits, batch["strategy_target"])
        
        # Combined loss (weighted)
        loss = ranking_loss + 0.5 * strategy_loss
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        total_ranking_loss += ranking_loss.item()
        total_strategy_loss += strategy_loss.item()
    
    # Log the first epoch and every 5th epoch after that
    if (epoch + 1) % 5 == 0 or epoch == 0:
        avg_loss = total_loss / len(dataloader)
        avg_rank = total_ranking_loss / len(dataloader)
        avg_strat = total_strategy_loss / len(dataloader)
        print(f"Epoch {epoch+1:2d}/{NUM_EPOCHS} | Loss: {avg_loss:.4f} | Ranking: {avg_rank:.4f} | Strategy: {avg_strat:.4f}")

print("=" * 50)
print("Training complete!")

## 6. Inference & Recommendation Pipeline

Now we demonstrate the trained model making recommendations:
1. Select candidate books
2. Rank them by predicted preference
3. Select a recommendation strategy based on user mood
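In this notebook, step 1 is implicit: `recommend_books` scores every book in `books_df`. A common refinement, assumed here rather than taken from the original, is to drop books the user has already rated before ranking. The helper names below are hypothetical, and `score_fn` stands in for the trained model's ranking head.

```python
import pandas as pd


def select_candidates(books_df, seen_book_ids):
    """Step 1 (assumed): keep only books the user has not interacted with yet."""
    return books_df[~books_df["book_id"].isin(seen_book_ids)]


def rank_candidates(candidates_df, score_fn, top_k=5):
    """Step 2: score each candidate's genre vector and keep the top_k by score."""
    scored = candidates_df.assign(score=candidates_df["genre_vector"].map(score_fn))
    return scored.sort_values("score", ascending=False).head(top_k)
```

With only 50 books and 200 random interactions, about one book on average would remain unseen. This filter therefore only becomes useful with a larger catalog. Step 3, the strategy, still comes from the model's strategy head.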

In [None]:
def recommend_books(model, books_df, user_affinity, mood, top_k=5):
    """
    Generate personalized book recommendations.
    
    Args:
        model: Trained PersonalIntelligenceModel
        books_df: DataFrame of candidate books
        user_affinity: numpy array of user genre preferences
        mood: string mood label
        top_k: number of recommendations to return
    
    Returns:
        recommendations: list of {"title", "book_id", "score"} dicts, highest score first
        strategy: selected recommendation strategy
    """
    model.eval()  # disable dropout so repeated calls give identical results
    
    n_books = len(books_df)
    
    # Score all candidates in one batch: [n_books, 6] genre matrix
    genre_batch = torch.tensor(books_df["genre_vector"].tolist(), dtype=torch.float32)
    
    # The user profile and mood are shared by every candidate, so repeat them per row
    user_batch = torch.tensor(user_affinity, dtype=torch.float32).unsqueeze(0).repeat(n_books, 1)
    mood_idx = MOOD_TO_IDX.get(mood, 0)  # unknown moods fall back to index 0 ("happy")
    mood_batch = torch.zeros(n_books, len(MOODS))
    mood_batch[:, mood_idx] = 1.0
    
    with torch.no_grad():
        ranking_scores, strategy_logits = model(genre_batch, user_batch, mood_batch)
    
    results = [
        {"title": title, "book_id": book_id, "score": score}
        for title, book_id, score in zip(
            books_df["title"], books_df["book_id"], ranking_scores.squeeze(1).tolist()
        )
    ]
    
    # Sort by score descending
    results.sort(key=lambda x: x["score"], reverse=True)
    
    # The strategy head sees only the user embedding and mood, so every row has the
    # same logits; take the strategy from the first row
    strategy_idx = torch.argmax(strategy_logits[0]).item()
    strategy = IDX_TO_STRATEGY[strategy_idx]
    
    return results[:top_k], strategy


def display_recommendations(recommendations, strategy, mood):
    """Display recommendations in a readable format."""
    print("\n" + "=" * 60)
    print("       PERSONALIZED BOOK RECOMMENDATIONS")
    print("=" * 60)
    print(f"\nUser Mood: {mood.upper()}")
    print(f"Selected Strategy: {strategy.upper()}")
    print("\n" + "-" * 60)
    print("RANKED RECOMMENDATIONS:")
    print("-" * 60)
    
    for i, rec in enumerate(recommendations, 1):
        score_bar = "*" * int(rec['score'] * 10)
        print(f"  {i}. {rec['title']:<35} Score: {rec['score']:.3f} [{score_bar}]")
    
    print("\n" + "=" * 60)
    
    # Strategy explanation
    explanations = {
        "comfort": "Recommending familiar, comforting reads to match your current mood.",
        "challenge": "Suggesting thought-provoking books to expand your horizons.",
        "discover": "Exploring new genres and authors you might enjoy!",
        "revisit": "Recommending books similar to your past favorites."
    }
    print(f"\nStrategy Explanation: {explanations.get(strategy, 'N/A')}")
    print("=" * 60)

In [None]:
# ============ DEMO: RECOMMENDATIONS FOR DIFFERENT MOODS ============

print("Demonstrating recommendations for different user moods:\n")

for mood in ["happy", "stressed", "curious"]:
    recommendations, strategy = recommend_books(
        model, 
        books_df, 
        USER_GENRE_AFFINITY, 
        mood, 
        top_k=5
    )
    display_recommendations(recommendations, strategy, mood)

## 7. Evaluation & Analysis

### Evaluation Approach

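Since the data are synthetic, held-out metrics such as RMSE would only measure how well the model recovers the rule that generated the ratings, not how well it serves real readers. Instead, we evaluate three properties. Consistency is checked in code below; sensitivity and genre alignment are read off the demo output.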

1. **Consistency**: Does the same input produce the same ranking?
2. **Sensitivity**: Do different moods produce different strategies?
3. **Genre Alignment**: Do high-ranked books match user preferences?
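Sensitivity and genre alignment can also be measured numerically instead of by eye. The sketch below is one assumed way to do it; the function names are hypothetical.

- **Sensitivity** counts how many distinct strategies a set of moods produces.
- **Genre alignment** is the Spearman rank correlation between the model's scores and the noise-free genre-match score used by `generate_rating`.

```python
import numpy as np
import pandas as pd


def strategy_sensitivity(predict_strategy, moods):
    """Sensitivity: map each mood to a strategy and count the distinct strategies."""
    mapping = {mood: predict_strategy(mood) for mood in moods}
    return mapping, len(set(mapping.values()))


def genre_match(genre_vectors, affinity):
    """Genre-match score per book, the same formula generate_rating uses."""
    g = np.asarray(genre_vectors, dtype=float)   # [n_books, n_genres]
    a = np.asarray(affinity, dtype=float)        # [n_genres]
    return (g @ a) / (g.sum(axis=1) + 0.01)


def spearman(x, y):
    """Spearman rank correlation; pandas .rank() gives tied values their average rank."""
    rx = pd.Series(x, dtype=float).rank()
    ry = pd.Series(y, dtype=float).rank()
    return float(np.corrcoef(rx, ry)[0, 1])


def genre_alignment(model_scores, genre_vectors, affinity):
    """Genre alignment: +1 means the model orders books exactly by genre match."""
    return spearman(model_scores, genre_match(genre_vectors, affinity))
```

In this notebook, `predict_strategy` could be `lambda m: recommend_books(model, books_df, USER_GENRE_AFFINITY, m)[1]`. The model scores can come from `recommend_books(..., top_k=len(books_df))`, re-ordered by `book_id` so they line up with `books_df["genre_vector"]`.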

### Qualitative Observations

From the demo above:
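- **Mood -> Strategy mapping** works as expected (stressed -> comfort, curious -> challenge). Because the strategy labels are generated from the mood by a fixed rule (`mood_to_strategy`), this shows that the strategy head learned that rule, not a preference learned from real behavior.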
- **Rankings** favor books with genres matching user affinity (Sci-Fi, Fiction, Mystery)
- **Scores** are differentiated (not all 0.9 or all 0.1)

### Limitations

| Limitation | Impact | Mitigation |
|------------|--------|------------|
| Synthetic data | Doesn't reflect real user behavior | Use actual interaction logs in production |
| Single user | Can't learn collaborative patterns | Extend to multi-user collaborative filtering |
| Simple features | Genre-only features miss nuance | Add embeddings from book descriptions |
| Small dataset | Limited generalization | Train on thousands of interactions |

In [None]:
# ============ CONSISTENCY CHECK ============

print("Consistency Check: Running same query twice...\n")

rec1, strat1 = recommend_books(model, books_df, USER_GENRE_AFFINITY, "happy", top_k=3)
rec2, strat2 = recommend_books(model, books_df, USER_GENRE_AFFINITY, "happy", top_k=3)

print("Run 1:", [r["title"] for r in rec1], "| Strategy:", strat1)
print("Run 2:", [r["title"] for r in rec2], "| Strategy:", strat2)

titles_match = [r["title"] for r in rec1] == [r["title"] for r in rec2]
strategy_match = strat1 == strat2

print(f"\nRankings consistent: {titles_match}")
print(f"Strategy consistent: {strategy_match}")
print(f"\nResult: {'PASS - Model is deterministic' if titles_match and strategy_match else 'FAIL'}")

## 8. Ethical Considerations & Responsible AI

### Bias Considerations

1. **Synthetic Data Bias**: Our generated data may encode assumptions (e.g., "stressed" users always want comfort). Real systems should learn from actual behavior.

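2. **Genre Stereotyping**: The model learns genre preferences, which could reinforce filter bubbles. Mitigation: a "discover" strategy intended to introduce diversity. In this notebook, however, the selected strategy is only reported alongside the ranking and does not re-rank the books, so this mitigation is not yet active.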

3. **Cold Start**: New users with no history get generic recommendations. Mitigation: Use demographic or explicit preference inputs.
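Whether recommendations are narrowing into a filter bubble can be tracked with an intra-list diversity score. One common choice, assumed here rather than taken from the notebook, is the mean pairwise Jaccard distance between the genre vectors of the recommended books. A score of 0 means every book has the same genre set; 1 means no two books share a genre. The function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two multi-hot genre vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union


def intra_list_diversity(genre_vectors):
    """Mean pairwise Jaccard distance over a recommendation list."""
    pairs = list(combinations(genre_vectors, 2))
    if not pairs:
        return 0.0  # a list of 0 or 1 books has no pairs to compare
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

Comparing this score across strategies (for example "discover" against "comfort") would show whether a strategy actually widens the genre mix once it is wired into the ranking.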

### Transparency & Control

| Principle | Implementation |
|-----------|----------------|
| **Explainability** | Rankings depend only on book genre vectors and the user's genre affinities, so each score can be related back to genre matches |
| **Auditability** | Model weights can be inspected; no black-box LLM decisions |
| **User Control** | User can override mood or strategy selection |
| **No Hallucination** | LLM cannot recommend books that don't exist in the database |
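The **User Control** row assumes the user can override the model's strategy, but `recommend_books` in this notebook has no such parameter. The hypothetical helper below validates an override and records who made the final decision, which is also useful for the audit trail.

```python
STRATEGIES = ["comfort", "challenge", "discover", "revisit"]


def resolve_strategy(model_strategy, user_override=None):
    """Return (strategy, decided_by), preferring a valid user override."""
    if user_override is None:
        return model_strategy, "model"
    override = user_override.strip().lower()
    if override not in STRATEGIES:
        raise ValueError(f"Unknown strategy {user_override!r}; expected one of {STRATEGIES}")
    return override, "user"
```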

### Responsible AI Practices

1. **Separation of Concerns**: The AI model makes decisions; the LLM only explains them. This prevents:
   - Hallucinated book titles
   - Inconsistent recommendations
   - Uncontrolled model behavior

2. **Human-in-the-Loop**: Users can:
   - Adjust their preferences
   - Override strategy selection
   - Provide explicit feedback (ratings)

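3. **Data Privacy**: 
   - No personal data is collected beyond reading preferences and the mood the user reports
   - The ranking model runs locally in the full application
   - No data is shared with third parties, except the explanation requests sent to the hosted LLM API, which should carry only the model's decisions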
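Explicit feedback only helps if it can flow back into training. As a sketch under stated assumptions, a hypothetical `feedback_to_interaction` helper could turn one rating into a row with the same columns as `interactions_df`. It copies the mood-to-strategy rule from `mood_to_strategy` in Section 3.

```python
MOODS = ["happy", "sad", "stressed", "curious", "relaxed"]
# Same mood -> strategy rule as mood_to_strategy() in Section 3
MOOD_TO_STRATEGY = {"happy": "discover", "sad": "comfort", "stressed": "comfort",
                    "curious": "challenge", "relaxed": "revisit"}


def feedback_to_interaction(book_id, title, genre_vector, rating, mood, user_id=1):
    """Turn one explicit rating into a row with the same columns as interactions_df."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if mood not in MOODS:
        raise ValueError(f"unknown mood {mood!r}")
    return {
        "user_id": user_id,
        "book_id": book_id,
        "book_title": title,
        "genre_vector": list(genre_vector),
        "rating": round(float(rating), 1),
        "mood": mood,
        "strategy_label": MOOD_TO_STRATEGY[mood],
    }
```

Such rows could be appended to `interactions_df` (for example with `pd.concat`), and `BookDataset` rebuilt from the result, before retraining.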

## 9. Conclusion & Future Scope

### Summary

This notebook demonstrated a **personalized book recommendation system** with a novel architecture:

- **Personal Intelligence Model**: A PyTorch neural network that ranks books and selects recommendation strategies
- **Mood-Aware Recommendations**: Strategies adapt to user's emotional context
- **Separation of Concerns**: AI decides, LLM explains (not shown in this notebook)

### Key Achievements

1. Trained a multi-task model for ranking + strategy classification
2. Demonstrated deterministic, reproducible recommendations
3. Showed mood-to-strategy mapping working correctly
4. Addressed ethical considerations and responsible AI

### Future Improvements

| Improvement | Description |
|-------------|-------------|
| **Real User Data** | Train on actual reading histories and ratings |
| **Online Learning** | Update model in real-time as users interact |
| **Collaborative Filtering** | Learn from patterns across multiple users |
| **Rich Embeddings** | Use BERT/Sentence-Transformers for book descriptions |
| **A/B Testing** | Compare recommendation strategies empirically |
| **Fairness Audits** | Ensure recommendations don't discriminate |

---

### Full System Architecture (Reference)

The complete BookAI system (not shown in this notebook) includes:

1. **FastAPI Backend**: REST API for recommendations
2. **React Frontend**: User interface with chat widget
3. **FAISS Vector Store**: Semantic book search
4. **SQLite Database**: User profiles and reading lists
5. **Gemini LLM**: Conversational explanation layer

This notebook focuses solely on the **AI decision-making core** in order to address the course evaluation criteria.

---
