# Recommendation System Project Report

## Overview
This project implements a conversation-based product recommendation system for an online store. It uses semantic embeddings, business signals, and seller-controlled boosting to recommend relevant products based on user chat sessions.

## How It Works
1. **Conversation Understanding**: The system parses the last N user messages to extract intent (category, features, sentiment, budget, etc.), combining them into a single `session_text`.
2. **Embeddings**: Both products and conversations are converted into vector embeddings using a pretrained SentenceTransformer model (`all-MiniLM-L6-v2`).
3. **Similarity Search**: The system computes cosine similarity between the session embedding and all product embeddings to find the most relevant products.
4. **Scoring and Ranking**: Each candidate product is scored using a composite formula that balances semantic similarity, category match, popularity, stock, recency, personalization, and seller boost. All components are normalized to [0,1].
5. **Filtering**: Products are filtered by category (if mentioned), stock availability, and optionally by price range before scoring.
6. **API Endpoints**: The FastAPI app exposes endpoints to get recommendations, update seller boost, and list products.

## Project Structure
```
recommendation-system/
├─ app.py
├─ models/
│   └─ recommender.py
├─ data/
│   ├─ products.json
│   └─ conversations.json
├─ utils/
│   ├─ embeddings.py
│   └─ scoring.py
├─ requirements.txt
└─ demo.ipynb
```

## How to Run
1. **Install Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
2. **Start the API Server**:
   ```bash
   uvicorn app:app --reload
   ```
3. **Test the API**:
   - Use the `/recommend` endpoint to get product recommendations based on a conversation.
   - Use `/seller/boost` to update a product's seller boost.
   - Use `/products` to list all products.
4. **Explore the Notebook**:
   - Open `demo.ipynb` to see step-by-step demonstrations of embeddings, similarity search, scoring, filtering, and the effect of seller boost.

## Customization & Extension
- Add more products to `products.json`.
- Integrate with a real chatbot frontend.
- Extend scoring logic for personalization, exploration, or logging user feedback.

## Summary
This system provides context-aware product recommendations for conversational commerce. It combines semantic relevance with business signals and a capped seller boost, and is ready for integration and future extension.

# 1. Project Setup and Folder Structure

This notebook demonstrates a complete Python recommendation system for conversation-based product recommendations in an online store.

Each file serves a specific purpose:
- `app.py`: FastAPI entry point for the REST API.
- `models/recommender.py`: Core recommendation logic.
- `data/products.json`: Example product catalog.
- `data/conversations.json`: Example chat sessions.
- `utils/embeddings.py`: Embedding generator using SentenceTransformer.
- `utils/scoring.py`: Composite scoring and ranking logic.
- `requirements.txt`: Python dependencies.
- `demo.ipynb`: This demonstration notebook.

# 2. Sample Data Creation: Products and Conversations

We use two JSON files for demonstration:

- `products.json`: Contains 5–10 sample products, each with fields like id, title, description, category, popularity, stock, recency, personal, and seller_boost.
- `conversations.json`: Contains sample chat sessions between users and the assistant.

**Example products.json:**
```json
[
  {
    "id": 1,
    "title": "Running Shoes Pro 2",
    "description": "Lightweight breathable running shoes ideal for daily training.",
    "category": "Sportswear",
    "popularity": 0.8,
    "stock": 1,
    "recency": 0.6,
    "personal": 0.0,
    "seller_boost": 0.3
  },
  // ... more products ...
]
```

**Example conversations.json:**
```json
[
  {
    "session_id": 101,
    "messages": [
      "Hi there, I'm looking for some new running shoes.",
      "I like lightweight and breathable materials."
    ]
  }
]
```
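
Because the scoring step later assumes each signal lies in [0, 1], it can help to check records when they are loaded. The sketch below is one possible check, not part of the project code: `validate_product` and its rules (for example, accepting any non-negative `stock` or `seller_boost`) are assumptions made for illustration.

```python
REQUIRED_FIELDS = ("id", "title", "description", "category",
                   "popularity", "stock", "recency", "personal", "seller_boost")
UNIT_RANGE_FIELDS = ("popularity", "recency", "personal")


def validate_product(product):
    """Return a list of problems; an empty list means the record looks fine."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in product]
    for name in UNIT_RANGE_FIELDS:
        value = product.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} should be in [0, 1], got {value}")
    for name in ("stock", "seller_boost"):
        value = product.get(name)
        if value is not None and value < 0:
            problems.append(f"{name} should be non-negative, got {value}")
    return problems


# The sample record from products.json above
sample = {
    "id": 1,
    "title": "Running Shoes Pro 2",
    "description": "Lightweight breathable running shoes ideal for daily training.",
    "category": "Sportswear",
    "popularity": 0.8,
    "stock": 1,
    "recency": 0.6,
    "personal": 0.0,
    "seller_boost": 0.3,
}
print(validate_product(sample))  # an empty list: no problems found
```

A `seller_boost` of 0.3 passes this check even though it exceeds `max_boost`, because the scoring formula clips the boost rather than rejecting it.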


# 3. Embedding Generation for Products and Conversations

We use the SentenceTransformer model (`all-MiniLM-L6-v2`) to generate embeddings for both products and user conversations.

- Product embeddings are precomputed for each product's title and description.
- Conversation embeddings are generated by combining the last N user messages into a single `session_text`.

This enables semantic similarity search between user intent and product catalog.

```python
# Load products and conversations
import json
from sentence_transformers import SentenceTransformer

with open('data/products.json', 'r') as f:
    products = json.load(f)
with open('data/conversations.json', 'r') as f:
    conversations = json.load(f)

# Initialize embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate product embeddings
product_texts = [p['title'] + ' ' + p['description'] for p in products]
product_embeddings = model.encode(product_texts)

# Example: Generate embedding for a conversation session
session = conversations[0]['messages']
session_text = ' '.join(session)
session_embedding = model.encode([session_text])[0]
```
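
The cell above joins every message in the session, while the description refers to the last N user messages. A minimal sketch of that windowing step follows; the helper name `build_session_text`, the default `last_n=3`, and the choice to skip blank messages are all assumptions rather than the project's actual behavior.

```python
def build_session_text(messages, last_n=3):
    """Join the last `last_n` non-empty messages into one string for embedding."""
    if last_n <= 0:
        return ""
    cleaned = [m.strip() for m in messages if m and m.strip()]
    return " ".join(cleaned[-last_n:])


messages = [
    "Hi there, I'm looking for some new running shoes.",
    "I like lightweight and breathable materials.",
    "   ",  # blank messages are ignored
    "My budget is around 100 dollars.",
]
session_text = build_session_text(messages, last_n=2)
print(session_text)
```

The resulting string can be passed to `model.encode` in the same way as the `session_text` built above.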


# 4. Cosine Similarity kNN Search Implementation

To find the most relevant products for a conversation, we use numpy-based cosine similarity between the session embedding and each product embedding.

With a catalog this small, an exact brute-force scan over all product embeddings is enough to retrieve the top-K most similar products.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compute similarities
similarities = [cosine_similarity(session_embedding, emb) for emb in product_embeddings]

# Get top-5 products by similarity
top_k = 5
indices = np.argsort(similarities)[::-1][:top_k]
top_products = [products[i] for i in indices]

for i, prod in enumerate(top_products):
    print(f"{i+1}. {prod['title']} (Category: {prod['category']}) - Similarity: {similarities[indices[i]]:.2f}")
```
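
For larger catalogs, the per-product loop can be replaced by a single matrix product. The following is a sketch of that idea under stated assumptions: `top_k_by_cosine` is a hypothetical helper, and the small epsilon guarding against zero-length vectors is an addition that the loop version does not have.

```python
import numpy as np


def top_k_by_cosine(query, matrix, k=5):
    """Return (indices, scores) of the k rows of `matrix` most similar to `query`."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    eps = 1e-12  # avoids division by zero for all-zero vectors
    query_unit = query / max(np.linalg.norm(query), eps)
    row_norms = np.maximum(np.linalg.norm(matrix, axis=1, keepdims=True), eps)
    # Dot products of unit vectors are cosine similarities
    scores = (matrix / row_norms) @ query_unit
    order = np.argsort(-scores)[: min(k, len(scores))]
    return order, scores[order]


# Toy 2-D vectors standing in for real embeddings
toy_products = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
toy_query = np.array([1.0, 0.0])
idx, sims = top_k_by_cosine(toy_query, toy_products, k=2)
print(idx, sims)
```

With the toy data, row 0 points in the same direction as the query and row 2 is at 45 degrees, so those two rows are returned first.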


# 5. Scoring and Ranking Logic

Each candidate product is scored using a composite formula:

```
final_score = base * (1 + clip(seller_boost, 0, max_boost))
base = w_sim*sim + w_cat*cat + w_pop*pop + w_stock*stock + w_recency*recency + w_personal*personal
```

Weights:
- w_sim=0.6, w_cat=0.15, w_pop=0.1, w_stock=0.05, w_recency=0.05, w_personal=0.05
- max_boost=0.25

All components are normalized to [0,1].

This balances semantic relevance, business signals, personalization, and seller boost.
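
As a worked example of the formula, the snippet below plugs in the sample product's signals (popularity 0.8, stock 1, recency 0.6, personal 0.0, seller boost 0.3). The similarity of 0.8 and the category match of 1.0 are assumed values chosen only to make the arithmetic concrete.

```python
WEIGHTS = {"sim": 0.6, "cat": 0.15, "pop": 0.1, "stock": 0.05, "recency": 0.05, "personal": 0.05}
MAX_BOOST = 0.25

# The weights sum to 1, so base stays in [0, 1] whenever every signal does
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

signals = {"sim": 0.8, "cat": 1.0, "pop": 0.8, "stock": 1.0, "recency": 0.6, "personal": 0.0}
seller_boost = 0.3  # above the cap, so it is clipped to MAX_BOOST

# base = 0.48 + 0.15 + 0.08 + 0.05 + 0.03 + 0.00 = 0.79
base = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
effective_boost = min(max(seller_boost, 0.0), MAX_BOOST)
final_score = base * (1 + effective_boost)

print(f"base={base:.4f} boost={effective_boost:.2f} final={final_score:.4f}")
# prints: base=0.7900 boost=0.25 final=0.9875
```

Since `base` is at most 1 and the boost multiplier is at most 1.25, `final_score` always falls in [0, 1.25] with these defaults.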

```python
def normalize(x, min_val=0, max_val=1):
    """Rescale x from [min_val, max_val] to [0, 1], clipping out-of-range values."""
    return max(min((x - min_val) / (max_val - min_val), 1.0), 0.0)

def compute_score(sim, cat, pop, stock, recency, personal, seller_boost, max_boost=0.25):
    w_sim = 0.6
    w_cat = 0.15
    w_pop = 0.1
    w_stock = 0.05
    w_recency = 0.05
    w_personal = 0.05
    base = w_sim*sim + w_cat*cat + w_pop*pop + w_stock*stock + w_recency*recency + w_personal*personal
    # Clip the boost to [0, max_boost] before applying it multiplicatively
    final_score = base * (1 + min(max(seller_boost, 0), max_boost))
    return final_score

# Example scoring for top products
for idx in indices:
    prod = products[idx]
    # Use the product's actual cosine similarity; normalize() clips negatives to 0
    sim = normalize(similarities[idx])
    cat = 1.0 if prod['category'].lower() in session_text.lower() else 0.5
    pop = normalize(prod['popularity'])
    stock = normalize(prod['stock'])
    recency = normalize(prod['recency'])
    personal = normalize(prod['personal'])
    seller_boost = prod.get('seller_boost', 0.0)
    score = compute_score(sim, cat, pop, stock, recency, personal, seller_boost)
    print(f"{prod['title']} - Final Score: {score:.2f} (Boost: {seller_boost})")
```


# 6. Filtering by Category, Stock, and Price

Before scoring, products are filtered by:
- Category (if mentioned in the conversation)
- Stock > 0 (only available products)
- Optionally by price range (if price is mentioned)

This ensures recommendations are relevant and available.

```python
# Example: Filter products by category and stock
category = None
categories = sorted({p['category'] for p in products})  # sorted for a deterministic match order
for msg in session:
    # First catalog category named in this message, if any
    category = next((c for c in categories if c.lower() in msg.lower()), None)
    if category:
        break  # stop at the first message that names a category

filtered_products = [p for p in products if p['stock'] > 0 and (not category or p['category'].lower() == category.lower())]
print(f"Filtered products: {[p['title'] for p in filtered_products]}")
```
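
The cell above covers category and stock but not the optional price filter. One way it might look is sketched below. Everything here is an assumption: the sample catalog has no `price` field, `parse_budget` and `filter_by_price` are hypothetical helpers, the regex recognizes only a few English phrasings, and the toy catalog entries are made up.

```python
import re


def parse_budget(text):
    """Return an upper price limit for phrases like 'under $100', else None."""
    match = re.search(r"(?:under|below|less than|up to)\s*\$?\s*(\d+(?:\.\d+)?)", text.lower())
    return float(match.group(1)) if match else None


def filter_by_price(products, max_price):
    """Keep products priced at or below max_price; unpriced products are kept."""
    if max_price is None:
        return list(products)
    return [p for p in products if p.get("price") is None or p["price"] <= max_price]


# Toy catalog with a hypothetical `price` field
catalog = [
    {"title": "Running Shoes Pro 2", "price": 120.0},
    {"title": "Trail Runner Lite", "price": 85.0},
    {"title": "Sports Socks", "price": None},
]
budget = parse_budget("I want something under $100 if possible.")
affordable = filter_by_price(catalog, budget)
print(budget, [p["title"] for p in affordable])
```

Keeping unpriced products is a design choice for this sketch; a stricter filter could drop them instead.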


# 7. API Endpoints with FastAPI

The REST API is implemented using FastAPI in `app.py`:

- `POST /recommend`: Returns top-N product recommendations for a given conversation.
- `POST /seller/boost`: Updates a product's seller_boost value.
- `GET /products`: Returns all products with metadata.

Example usage:
```python
import requests
resp = requests.post('http://localhost:8000/recommend', json={"conversation": session})
print(resp.json())
```

# 8. Demo: Conversation Embedding and Top-5 Recommendations

This section demonstrates how to:
- Embed a user conversation
- Run kNN search
- Print top-5 recommendations with explanations

You can use the code above to generate embeddings and scores, and display the results.

```python
# Print top-5 recommendations with explanations
for i, prod in enumerate(top_products):
    if "running" in session_text.lower() and "breathable" in prod['description'].lower():
        reason = "Recommended because you mentioned running and breathable shoes."
    else:
        reason = f"Relevant to your interest in {prod['category']} products."
    # The value shown is the raw cosine similarity, not the composite score
    print(f"{i+1}. {prod['title']} (Similarity: {similarities[indices[i]]:.2f}) - {reason}")
```


# 9. Demo: Effect of Seller Boost on Ranking

You can change the `seller_boost` value for a product and observe its effect on the ranking and final score.

Increasing `seller_boost` multiplies the base score by up to `1 + max_boost` (1.25 with the default `max_boost=0.25`). Values above the cap are clipped, which keeps a boost from overriding relevance entirely.

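```python
# Change seller_boost for a product and compare its score before and after
prod = top_products[0]
old_boost = prod.get('seller_boost', 0.0)
sim = normalize(similarities[indices[0]])  # actual similarity of the top product
cat = 1.0 if prod['category'].lower() in session_text.lower() else 0.5
signals = (sim, cat, normalize(prod['popularity']), normalize(prod['stock']),
           normalize(prod['recency']), normalize(prod['personal']))

before = compute_score(*signals, old_boost)
prod['seller_boost'] = 0.25  # Max boost; larger values are clipped to max_boost
after = compute_score(*signals, prod['seller_boost'])
print(f"Before boost: {prod['title']} - Final Score: {before:.2f} (Boost: {old_boost})")
print(f"After boost: {prod['title']} - Final Score: {after:.2f} (Boost: {prod['seller_boost']})")
# Restore original boost
prod['seller_boost'] = old_boost
```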
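
To see how the cap limits reordering, the sketch below applies the boost to three hypothetical base scores. The candidate names, base scores, and requested boosts are invented for illustration; only the multiplicative, clipped form of the boost comes from the scoring formula in section 5.

```python
MAX_BOOST = 0.25


def boosted(base, seller_boost, max_boost=MAX_BOOST):
    """Apply the capped multiplicative boost from the scoring formula."""
    return base * (1 + min(max(seller_boost, 0.0), max_boost))


# Hypothetical base scores before any boost
candidates = {"A": 0.70, "B": 0.60, "C": 0.50}
boosts = {"A": 0.0, "B": 0.25, "C": 1.0}  # C requests far more than the cap allows

final = {name: boosted(base, boosts[name]) for name, base in candidates.items()}
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)  # ['B', 'A', 'C']
```

B overtakes A because 0.60 × 1.25 = 0.75 exceeds 0.70, while C stays last at 0.625 despite requesting a boost of 1.0. In general, a fully boosted product can pass an unboosted one only if its base score is more than 80% of the other's.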
