# AI-Based Book Recommendation System
### Content-Based • Explainable AI • Chat-Based Recommender

**Project Track:** Recommendation Systems / NLP  
**Tools:** Python, Pandas, Scikit-learn  


## 1. Problem Definition & Objective

### a. Selected Project Track
Content Recommendation System

### b. Problem Statement
With the rapid growth of digital content, users struggle to discover books aligned with their interests.
Traditional keyword-based search systems lack personalization and explainability.

This project aims to build an AI-powered Book Recommendation System that:
- Recommends similar books using content-based filtering
- Explains *why* a book is recommended (Explainable AI)
- Supports natural language queries through a chat-based interface

### c. Real-World Relevance & Motivation
Recommendation systems are widely used in platforms such as Amazon, Goodreads, and Netflix.
This project demonstrates how AI can improve user experience, transparency, and content discovery.


## 2. Data Understanding & Preparation

### a. Dataset Source
- Public dataset: Books metadata (Book.csv)
- Stored locally at: `E:\module_e\Bookproject\data\Book.csv`

### b. Data Loading and Exploration
Key attributes:
- Book-Title
- Book-Author
- Publisher
- Year-Of-Publication


In [13]:
import pandas as pd
import os

DATA_PATH = r"E:\module_e\Bookproject\data\Book.csv"

if not os.path.exists(DATA_PATH):
    raise FileNotFoundError(f"Dataset not found at {DATA_PATH}")

df = pd.read_csv(DATA_PATH, dtype=str, low_memory=False)
df.head(25)




Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company
5,399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group
6,425176428,What If?: The World's Foremost Military Histor...,Robert Cowley,2000,Berkley Publishing Group
7,671870432,PLEADING GUILTY,Scott Turow,1993,Audioworks
8,679425608,Under the Black Flag: The Romance and the Real...,David Cordingly,1996,Random House
9,074322678X,Where You'll Find Me: And Other Stories,Ann Beattie,2002,Scribner


### c. Data Cleaning, Preprocessing & Feature Engineering
- Handle missing values
- Convert year to numeric
- Combine text features for NLP-based recommendation


In [48]:
import numpy as np

# -------------------------------
# Separate text and numeric columns
# -------------------------------
text_cols = ["Book-Title", "Book-Author", "Publisher"]
numeric_cols = ["Year-Of-Publication"]

# Fill missing text values with empty strings so concatenation never yields NaN
df[text_cols] = df[text_cols].fillna("")
# Missing years need no fill: they stay NaN after the numeric conversion below

# Convert year to numeric (safe)
df["Year-Of-Publication"] = pd.to_numeric(
    df["Year-Of-Publication"], errors="coerce"
)

# -------------------------------
# Feature Engineering
# -------------------------------
df["combined_features"] = (
    df["Book-Title"] + " " +
    df["Book-Author"] + " " +
    df["Publisher"]
)

# Display 25 rows (as required)
df[["Book-Title", "combined_features"]].head(25)


Unnamed: 0,Book-Title,combined_features
0,Classical Mythology,Classical Mythology Mark P. O. Morford Oxford ...
1,Clara Callan,Clara Callan Richard Bruce Wright HarperFlamin...
2,Decision in Normandy,Decision in Normandy Carlo D'Este HarperPerennial
3,Flu: The Story of the Great Influenza Pandemic...,Flu: The Story of the Great Influenza Pandemic...
4,The Mummies of Urumchi,The Mummies of Urumchi E. J. W. Barber W. W. N...
5,The Kitchen God's Wife,The Kitchen God's Wife Amy Tan Putnam Pub Group
6,What If?: The World's Foremost Military Histor...,What If?: The World's Foremost Military Histor...
7,PLEADING GUILTY,PLEADING GUILTY Scott Turow Audioworks
8,Under the Black Flag: The Romance and the Real...,Under the Black Flag: The Romance and the Real...
9,Where You'll Find Me: And Other Stories,Where You'll Find Me: And Other Stories Ann Be...


### d. Handling Missing Values & Noise
- Missing textual values replaced with empty strings
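- Non-numeric publication years are coerced to NaN; numeric placeholders such as 0 (visible in the sample output in Section 5) are left unchanged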
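
The two rules above can be sketched on a toy frame. This is a minimal illustration, not the notebook's code: the rows are made up, and the plausibility bounds `MIN_YEAR` and `MAX_YEAR` are assumptions added here to show one way placeholder years could also be treated as missing.

```python
import pandas as pd

# Toy data; the values are illustrative, not taken from the dataset.
toy = pd.DataFrame({
    "Book-Title": ["A", "B", "C", None],
    "Year-Of-Publication": ["1999", "0", "DK Publishing Inc", "2002"],
})

# Hypothetical plausibility bounds (assumption, not part of the notebook).
MIN_YEAR, MAX_YEAR = 1400, 2025

# Missing text becomes an empty string so later concatenation never yields NaN.
toy["Book-Title"] = toy["Book-Title"].fillna("")

# Non-numeric strings become NaN instead of raising an error.
years = pd.to_numeric(toy["Year-Of-Publication"], errors="coerce")

# Out-of-range years (for example the placeholder 0) are treated as missing too.
toy["Year-Of-Publication"] = years.where(years.between(MIN_YEAR, MAX_YEAR))

print(toy)
```

Whether to drop, keep, or impute such rows depends on how the year is used downstream; the recommender in this notebook only displays it.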


## 3. Model / System Design

### a. AI Technique Used
This project implements a **Content-Based Recommendation System** using **Natural Language Processing (NLP)** techniques.

The core AI techniques include:
- TF-IDF Vectorization for text representation
- Cosine Similarity for measuring how closely two books' TF-IDF vectors match
- Explainable AI through keyword overlap analysis
- Rule-based + TF-IDF matching for chat-based recommendations

This system does not rely on user ratings; it uses book metadata alone to generate recommendations.


### b. Architecture / Pipeline Explanation

The recommendation system follows the pipeline below:

1. Load and preprocess book metadata
2. Combine textual features (title, author, publisher)
3. Convert text into numerical vectors using TF-IDF
4. Compute similarity between books using cosine similarity
5. Rank and recommend the most similar books
6. Generate explanations for recommendations
7. Support natural language queries via chat interface
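
Steps 1 to 5 can be sketched end to end on four rows copied from the tables in this notebook. The names `books`, `toy_vectorizer`, and `toy_matrix` are illustrative and are kept separate from the notebook's own `df`, `vectorizer`, and `tfidf_matrix`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: a four-row stand-in for the loaded, cleaned metadata.
books = pd.DataFrame({
    "Book-Title": ["The Kitchen God's Wife", "The Kitchen God's Wife",
                   "Decision in Normandy", "Classical Mythology"],
    "Book-Author": ["Amy Tan", "Amy Tan", "Carlo D'Este", "Mark P. O. Morford"],
    "Publisher": ["Putnam Pub Group", "Ivy Books", "HarperPerennial",
                  "Oxford University Press"],
})

# Step 2: combine the textual features into one string per book.
books["combined_features"] = (
    books["Book-Title"] + " " + books["Book-Author"] + " " + books["Publisher"]
)

# Step 3: convert the text into TF-IDF vectors.
toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_matrix = toy_vectorizer.fit_transform(books["combined_features"])

# Step 4: similarity of one query book to every book in the catalogue.
query_idx = 0
scores = cosine_similarity(toy_matrix[query_idx], toy_matrix).flatten()

# Step 5: rank by descending score, skipping the query book itself.
ranked = [int(i) for i in scores.argsort()[::-1] if i != query_idx]
best = ranked[0]
print(books.loc[best, ["Book-Title", "Publisher"]].tolist(), round(float(scores[best]), 3))
```

The top match is the other edition of the query title, because it shares title and author tokens with the query; the two unrelated books share no tokens and score zero.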


### c. Justification of Design Choices

- **TF-IDF Vectorization**: Efficient, interpretable, and suitable for textual metadata
- **Cosine Similarity**: Works well with sparse vector representations
- **On-demand Similarity Computation**: Similarities are computed one query row at a time, so the full N × N similarity matrix is never held in memory
- **Explainable AI**: Increases transparency and user trust
- **Rule-based Chat Layer**: Lightweight alternative to full LLMs for constrained environments
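
A rough back-of-the-envelope calculation illustrates the on-demand choice. The catalogue size used here is an assumption: it is only a lower bound inferred from the largest row index (265859) visible in this notebook's sample output, and the estimate assumes dense `float64` storage.

```python
# Assumed lower bound on the number of books (largest visible row index + 1).
n_books = 265_860
bytes_per_float = 8  # float64

# Precomputing every pairwise similarity needs an N x N dense matrix.
full_matrix_bytes = n_books * n_books * bytes_per_float

# Computing on demand needs only one row of N similarities per query.
single_row_bytes = n_books * bytes_per_float

print(f"full N x N matrix:  {full_matrix_bytes / 1e9:.0f} GB")
print(f"one similarity row: {single_row_bytes / 1e6:.1f} MB")
```

Under these assumptions the precomputed matrix would run to hundreds of gigabytes, while a single query row stays in the low megabytes.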


## 4. Core Implementation

### a. Model Training / Inference Logic

In [56]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TF-IDF Vectorizer
vectorizer = TfidfVectorizer(
    stop_words="english",
    max_features=5000
)

# Train (fit) the vectorizer
tfidf_matrix = vectorizer.fit_transform(df["combined_features"])



### b. Prompt Engineering (Chat-Based Recommender)

Instead of a large language model, this project uses:
- Keyword intent detection
- Text similarity matching using TF-IDF

This approach enables natural language interaction without external APIs or heavy computational costs.
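
One way keyword intent detection could work is sketched below. This is a hypothetical illustration rather than the notebook's implementation: `INTENT_PATTERNS`, `detect_intent`, and the intent names are all assumptions, and the patterns cover only a few phrasings.

```python
import re

# Hypothetical intent rules (assumption): the first matching pattern wins.
INTENT_PATTERNS = [
    ("publisher", re.compile(r"\bpublished by\s+(?P<value>.+)", re.IGNORECASE)),
    ("author", re.compile(r"\b(?:by|written by|author)\s+(?P<value>.+)", re.IGNORECASE)),
    ("similar", re.compile(r"\b(?:like|similar to)\s+(?P<value>.+)", re.IGNORECASE)),
]

def detect_intent(user_query):
    """Return (intent, value); fall back to ("free_text", query)."""
    query = user_query.strip().rstrip("?.!")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(query)
        if match:
            return intent, match.group("value").strip()
    return "free_text", query

for q in ["Books by Amy Tan", "Something like Classical Mythology?", "military history"]:
    print(q, "->", detect_intent(q))
```

A router built on this could filter on the author or publisher column for those intents and pass `free_text` queries to TF-IDF matching.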


### c. Recommendation Pipeline

In [65]:
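import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def recommend_books(book_title, top_n=5):
    """
    Recommend similar books based on content similarity.
    Returns an empty DataFrame if the exact title is not in the dataset.
    """
    # Positional indices of exact title matches (safe even if df's index is not 0..N-1)
    matches = np.flatnonzero(df["Book-Title"].to_numpy() == book_title)
    if matches.size == 0:
        return pd.DataFrame()

    # Use the first matching row as the query book
    book_idx = matches[0]

    # One row of similarities, computed on demand
    similarity_scores = cosine_similarity(
        tfidf_matrix[book_idx],
        tfidf_matrix
    ).flatten()

    # Rank by descending score and drop the query book itself,
    # rather than assuming it is always ranked first
    ranked = similarity_scores.argsort()[::-1]
    similar_indices = ranked[ranked != book_idx][:top_n]

    return df.iloc[similar_indices][
        ["Book-Title", "Book-Author", "Publisher", "Year-Of-Publication"]
    ]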


### d. Explainable AI Logic

In [68]:
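def explain_recommendation(base_idx, rec_idx, top_k=5):
    """
    Explain why a book was recommended by listing the
    top-weighted TF-IDF terms that both books share.
    """
    feature_names = vectorizer.get_feature_names_out()

    base_vec = tfidf_matrix[base_idx].toarray()[0]
    rec_vec = tfidf_matrix[rec_idx].toarray()[0]

    # Keep only terms with non-zero weight; argsort alone would pad
    # rows that have fewer than top_k terms with arbitrary zero-weight terms
    base_terms = set(feature_names[i] for i in base_vec.argsort()[-top_k:] if base_vec[i] > 0)
    rec_terms = set(feature_names[i] for i in rec_vec.argsort()[-top_k:] if rec_vec[i] > 0)

    common_terms = base_terms.intersection(rec_terms)

    # Sorted for a stable, readable explanation
    return ", ".join(sorted(common_terms)) if common_terms else "No shared top keywords"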


### e. Chat-Based Recommendation Pipeline

In [71]:
def chat_recommend(user_query, top_n=5):
    """
    Recommend books based on natural language queries.
    """
    # Project the query into the same TF-IDF space as the books
    query_vec = vectorizer.transform([user_query])
    similarity_scores = cosine_similarity(query_vec, tfidf_matrix).flatten()

    top_indices = similarity_scores.argsort()[::-1][:top_n]
    # Drop books that share no vocabulary with the query
    top_indices = top_indices[similarity_scores[top_indices] > 0]

    return df.iloc[top_indices][
        ["Book-Title", "Book-Author", "Publisher", "Year-Of-Publication"]
    ]



## 5. Evaluation & Analysis

### a. Metrics Used
- Cosine similarity between TF-IDF vectors (quantitative measure of weighted term overlap)
- Manual qualitative relevance assessment
- Explainability validation through keyword overlap
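
For reference, cosine similarity between two vectors `a` and `b` is their dot product divided by the product of their lengths. A small worked example with made-up vectors (not taken from the dataset):

```python
import numpy as np

# Two illustrative 3-term weight vectors.
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 1.0, 0.0])
c = np.array([0.0, 0.0, 3.0])  # shares no non-zero term with a

def cosine(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# a . b = 1*2 + 2*1 = 4 and |a| = |b| = sqrt(5), so the score is 4 / 5.
print(cosine(a, b))
# No shared non-zero terms gives a score of 0.
print(cosine(a, c))
```

Because TF-IDF weights are non-negative, scores in this system fall between 0 (no shared terms) and 1 (identical weighted term profiles).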



### b. Sample Output / Predictions

In [41]:
recommend_books("The Kitchen God's Wife")


Unnamed: 0,Book-Title,Book-Author,Publisher,Year-Of-Publication
245744,Op Kitchen God's Wife,Amy Tan,Putnam Publishing Group,0.0
1429,The Kitchen God's Wife,Amy Tan,Ivy Books,1992.0
27688,The Kitchen God's Wife,Amy Tan,Ivy Books,1992.0
265859,The Kitchen God's Wife,Amy Tan,Audio Literature,1991.0
115844,The Kitchen God's Wife,Amy Tan,Phoenix Audio,2002.0


In [75]:
# Sample chat-based recommendation
chat_recommend("Suggest books by the same author")


Unnamed: 0,Book-Title,Book-Author,Publisher,Year-Of-Publication
60925,Author Author,David Lodge,Viking Books,2004.0
47902,Tongue-Tied,Author Unknown,Harlequin,1984.0
81601,Durable Fire,Author Unknown,Harlequin,1984.0
46340,Chains Of Regret,Author Unknown,Harlequin,1983.0
201951,Every Intim Detail,Author Unknown,Harlequin,1984.0


### c. Performance Analysis & Limitations

#### Strengths
- Fast inference and low memory usage
- Fully explainable recommendations
- Works without user interaction history

#### Limitations
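- Books with sparse or missing metadata match poorly, and newly added books must be vectorized before they can be recommended
- No personalization based on user behavior
- Limited understanding of user intent: the sample query "Suggest books by the same author" matched books containing the literal word "Author" instead of resolving which author was meant
- Other editions of the same title dominate the results for "The Kitchen God's Wife", since no de-duplication is applied

Despite these limitations, the sample outputs indicate that the system retrieves closely related records from metadata alone and that its recommendations can be explained through shared keywords.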


## 6. Ethical Considerations & Responsible AI

### a. Bias & Fairness
- Authors with more books may be over-recommended

### b. Dataset Limitations
- No user ratings or feedback
- Limited personalization

### c. Responsible Use of AI
- Transparent and explainable recommendations
- No personal user data collected


## 7. Conclusion & Future Scope

### a. Summary of Results
This project built a content-based book recommendation system that combines TF-IDF text matching, keyword-overlap explanations, and a chat-style query interface.

### b. Future Improvements
- Collaborative filtering
- Hybrid recommender system
- LLM-powered conversational assistant
- Web deployment with user profiles
