# AI for Readers: Crafting Personalized Book Recommendations

## Project Statement
Recommendation systems are a core application of machine learning: they let businesses deliver personalized content to users, which drives engagement and revenue. This project builds a book recommendation system using **collaborative filtering**, exploring both memory-based and model-based approaches.

### Key Features
1. **Collaborative Filtering**  
   - Identifies patterns in user-item interactions to offer tailored recommendations.  
   - Relies on users’ past behaviors or ratings, without depending on item-specific features like author or genre.

2. **Techniques Used**
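   - **Memory-Based Approaches**  
     - Compute similarity scores between users or items directly from the rating data and recommend from the nearest neighbors.
   - **Model-Based Approaches**  
     - Learn a compact model of the ratings (for example, latent factors) and use it to predict user preferences; this generally scales better to larger datasets.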

3. **Advanced Methods**
   - **Non-Negative Matrix Factorization (NMF)**  
     - Decomposes the user-item interaction matrix into non-negative user and item factor matrices whose product approximates the observed ratings.  
   - **Non-Linear Dimensionality Reduction (e.g., t-SNE)**  
     - Reduces data complexity while maintaining important patterns.
   - **Clustering-Based Approaches**  
     - Groups similar users or items to refine recommendations.

4. **Enhancements**
   - **Anomaly Detection**  
     - Statistical, distance-based, and density-based methods identify outliers in user behavior.  
   - **Bayesian Networks**  
     - Model probabilistic relationships between user preferences and behaviors, providing deeper insights.
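
As a minimal sketch of the memory-based idea listed above, the snippet below computes user-to-user cosine similarity on a toy rating matrix. The matrix values and the helper name `cosine_similarity_matrix` are illustrative assumptions, not part of the project code, and treating unrated books as 0 is a simplification.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: books); 0 means "not rated".
# These values are made up for illustration.
ratings = np.array([
    [8, 0, 9, 0],
    [7, 0, 10, 0],
    [0, 6, 0, 9],
], dtype=float)


def cosine_similarity_matrix(matrix):
    """Return the pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no ratings
    normalized = matrix / norms
    return normalized @ normalized.T


user_sim = cosine_similarity_matrix(ratings)

# Nearest neighbor of user 0, skipping user 0 itself (position 0 after sorting).
nearest_to_user_0 = int(np.argsort(-user_sim[0])[1])
print(np.round(user_sim, 3))
print("Nearest neighbor of user 0:", nearest_to_user_0)
```

Users 0 and 1 rated the same books similarly, so they end up as close neighbors, while user 2 shares no rated books with user 0 and gets a similarity of zero.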

### Objective
By combining these techniques, the project aims to build an efficient, personalized book recommendation system that improves recommendation precision and user satisfaction.


### Data Source

This project uses the **Book-Crossing dataset** collected by Cai-Nicolas Ziegler.  
The dataset is publicly available at: [Book-Crossing Dataset](http://www2.informatik.uni-freiburg.de/~cziegler/BX/).

#### Dataset Details
The dataset consists of three tables:

- **BX-Users**: Contains 278,858 records of user information.  
- **BX-Books**: Contains 271,379 records of book information.  
- **BX-Book-Ratings**: Contains 1,149,780 records of user ratings for books.
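
To make the relationship between the three tables concrete, here is a small sketch that joins them on their shared keys. The rows are made up for illustration, and the column names (`User_ID`, `ISBN`, `Rating`, `Title`, `Age`) follow the names this project assigns after loading rather than the raw CSV headers.

```python
import pandas as pd

# Tiny stand-ins for the three tables (illustrative rows only).
books = pd.DataFrame({"ISBN": ["111", "222"], "Title": ["Book A", "Book B"]})
users = pd.DataFrame({"User_ID": [1, 2], "Age": [34.0, None]})
ratings = pd.DataFrame({
    "User_ID": [1, 1, 2],
    "ISBN": ["111", "333", "111"],  # "333" has no entry in `books`
    "Rating": [8, 0, 5],
})

# Ratings link users to books: join on ISBN, then on User_ID.
# A left join keeps every rating even when the book or user is missing.
merged = (
    ratings
    .merge(books, on="ISBN", how="left")
    .merge(users, on="User_ID", how="left")
)
print(merged)
```

With a left join, a rating whose ISBN is missing from the books table keeps its row and simply has an empty title.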



### Prerequisites

Before running this code, ensure the following dependencies are installed:

1. **Install Visual C++ Build Tools (Windows only)**  
   On Windows, `scikit-surprise` may need to be compiled during installation, which requires C++ build tools. Follow these steps:  
   - Download and install the build tools from the following link:  
     [Microsoft Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)  
   - During installation, ensure you select the components required for C++ development.

2. **Install `scikit-surprise`**  
   Use the following command to install the library for collaborative filtering:  
   ```bash
   pip install scikit-surprise
   ```


### Code Description

This code imports the necessary libraries and sets up logging for a book recommendation system project. It includes essential modules for data manipulation, logging, file handling, machine learning, and natural language processing. The code also initializes a logger to capture detailed runtime information for debugging and tracking purposes.

#### Key Points:
1. **Library Imports**:  
   - Includes libraries like `numpy` and `pandas` for data handling, `surprise` for collaborative filtering, and `sklearn` for feature extraction and similarity metrics.
   - Uses `ydata_profiling` for data profiling and `nltk` for natural language processing.

2. **Logging Configuration**:  
   - Sets up detailed logging with outputs saved to a file (`book_recommender.log`) and printed to the console for monitoring.
   
3. **Initialization**:  
   - A logger is created and initialized to record the project's execution details.


In [5]:
# Importing core libraries
import numpy as np  # Numerical computations
import pandas as pd  # Data manipulation and analysis
import logging  # Logging for debugging and monitoring
import os  # Operating system utilities
from pathlib import Path  # File path management
import joblib  # Saving and loading Python objects

# Importing additional libraries for data profiling, collaborative filtering, and ML
from ydata_profiling import ProfileReport  # For creating detailed data reports
from surprise import Reader, Dataset, KNNWithMeans, SVD  # Collaborative filtering algorithms

# Importing libraries for text processing and similarity calculations
from sklearn.feature_extraction.text import TfidfVectorizer  # Text vectorization
from sklearn.metrics.pairwise import cosine_similarity  # Similarity calculation
import nltk  # Natural Language Toolkit for NLP tasks
import datetime  # Date and time handling

# Configure detailed logging for tracking execution
logging.basicConfig(
    level=logging.INFO,  # Set log level to INFO for general-purpose logging
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',  # Log message format
    handlers=[
        logging.FileHandler('book_recommender.log'),  # Save logs to a file
        logging.StreamHandler()  # Print logs to the console
    ]
)

# Initialize a logger with a custom name
logger = logging.getLogger('BookRecommender')

# Indicate successful import of libraries
print("Libraries imported successfully!")
logger.info("Initialization started")  # Log the start of initialization

2024-12-25 10:16:45,960 - BookRecommender - INFO - Initialization started


Libraries imported successfully!


### Code Description

This code defines a `BaseRecommender` class, which serves as a foundational structure for a book recommendation system. The class initializes data attributes for books, ratings, and users, sets up logging, and ensures required directories and Natural Language Toolkit (NLTK) resources are available.

#### Key Features:
1. **Data Attributes**:  
   - Initializes placeholders for books, ratings, and user data (`books_df`, `ratings_df`, `users_df`).

2. **Logging**:  
   - Configures a logger specific to the class for tracking events and debugging.

3. **Directory Setup**:  
   - Creates `model_cache` and `reports` directories if they do not already exist.

4. **NLTK Initialization**:  
   - Downloads essential NLTK resources (`punkt`, `stopwords`, `wordnet`) for natural language processing tasks.
   - Catches any exception raised during the downloads and prints a note instead of aborting.


In [7]:
class BaseRecommender:
    def __init__(self):
        # Initialize data attributes as None
        self.books_df = None  # DataFrame for book information
        self.ratings_df = None  # DataFrame for ratings information
        self.users_df = None  # DataFrame for user information

        # Set up a logger specific to this class
        self.logger = logging.getLogger(self.__class__.__name__)

        # Create required directories if they don't already exist
        os.makedirs('model_cache', exist_ok=True)  # Directory to store cached models
        os.makedirs('reports', exist_ok=True)  # Directory to store reports

        # Initialize NLTK resources
        try:
            nltk.download('punkt', quiet=True)  # Tokenizer models
            nltk.download('stopwords', quiet=True)  # Commonly used stopwords
            nltk.download('wordnet', quiet=True)  # Lexical database for NLP
            print("NLTK resources downloaded successfully!")
        except Exception as e:
            print(f"Note: NLTK download failed or already exists: {str(e)}")

# Indicate successful class definition
print("Base class defined successfully!")

Base class defined successfully!


### Code Description

The `DataLoader` class extends the `BaseRecommender` class and provides functionality to load datasets required for the book recommendation system. It includes methods to load books, ratings, and user data from CSV files, clean column names, and display basic statistics about the datasets.

#### Key Features:
1. **Dataset Loading**:  
   - Loads data from three CSV files: `BX-Books.csv`, `BX-Book-Ratings.csv`, and `BX-Users.csv`.
   - Handles potential errors using exception handling and logs them for debugging.

2. **Data Cleaning**:  
   - Renames columns for better readability and standardization.

3. **Progress Tracking**:  
   - Prints progress messages and the number of records loaded for each dataset.

4. **Error Logging**:  
   - Logs detailed error messages if data loading fails.

5. **Initial Statistics**:  
   - Displays the total number of books, ratings, and users loaded.


In [9]:
class DataLoader(BaseRecommender):
    def load_data(self):
        """
        Load all datasets with detailed progress tracking.
        Returns:
            bool: True if data is loaded successfully, False otherwise.
        """
        print("\nStarting data loading process...")
        try:
            # Load the Books dataset
            print("Loading Books dataset...")
            self.books_df = pd.read_csv(
                'BX-Books.csv',
                sep=';',  # Fields are separated by semicolons
                encoding='latin-1',  # Encoding to handle special characters
                quoting=1,  # 1 == csv.QUOTE_ALL
                escapechar='\\',  # Escape character for special symbols
                on_bad_lines='skip'  # Skip lines with errors
            )
            print(f"✓ Books loaded: {len(self.books_df):,} records")

            # Load the Ratings dataset
            print("\nLoading Ratings dataset...")
            self.ratings_df = pd.read_csv(
                'BX-Book-Ratings.csv',
                sep=';',
                encoding='latin-1',
                quoting=1,
                escapechar='\\',
                on_bad_lines='skip'
            )
            print(f"✓ Ratings loaded: {len(self.ratings_df):,} records")

            # Load the Users dataset
            print("\nLoading Users dataset...")
            self.users_df = pd.read_csv(
                'BX-Users.csv',
                sep=';',
                encoding='latin-1',
                quoting=1,
                escapechar='\\',
                on_bad_lines='skip'
            )
            print(f"✓ Users loaded: {len(self.users_df):,} records")

            # Clean column names for better readability
            self.books_df.columns = ['ISBN', 'Title', 'Author', 'Year', 'Publisher',
                                      'Image_URL_S', 'Image_URL_M', 'Image_URL_L']
            self.ratings_df.columns = ['User_ID', 'ISBN', 'Rating']
            self.users_df.columns = ['User_ID', 'Location', 'Age']

            # Display initial statistics
            print("\nInitial data statistics:")
            print(f"Total books: {len(self.books_df):,}")
            print(f"Total ratings: {len(self.ratings_df):,}")
            print(f"Total users: {len(self.users_df):,}")

            return True

        except Exception as e:
            # Log and print any errors that occur during data loading
            self.logger.error(f"Error loading data: {str(e)}")
            print(f"\n❌ Error loading data: {str(e)}")
            return False
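
The parser options used in `load_data` can be tried in isolation. The sketch below feeds a small in-memory string through `pd.read_csv` with the same options; the sample rows and raw header names are assumptions modeled loosely on the dataset's semicolon-separated, fully quoted format.

```python
import csv
import io

import pandas as pd

# Illustrative sample: semicolon-separated, every field quoted.
# The last line has one field too many and is treated as a bad line.
raw = (
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"276725";"034545104X";"0"\n'
    '"276726";"0155061224";"5"\n'
    '"999";"X";"7";"extra"\n'
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=';',
    quoting=csv.QUOTE_ALL,  # same as quoting=1 in the class above
    escapechar='\\',
    on_bad_lines='skip',    # drop rows with too many fields instead of raising
)

# Rename columns the same way load_data does.
df.columns = ['User_ID', 'ISBN', 'Rating']
print(df)
```

Skipping bad lines keeps the load from failing, at the cost of silently dropping rows, so comparing the loaded row counts against the published table sizes is a reasonable sanity check.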

### Code Description

The `DataAnalyzer` class extends the `DataLoader` class and generates profiling reports for the datasets. It computes summary statistics and saves detailed HTML profile reports using the `ydata-profiling` library (formerly `pandas-profiling`).

#### Key Features:
1. **Data Profiling**:  
   - Generates summary statistics for Books, Ratings, and Users datasets.
   - Produces HTML profile reports for detailed data insights.

2. **Report Management**:  
   - Creates a timestamped directory for saving reports.
   - Saves summary statistics in a text file.

3. **Error Handling**:  
   - Logs and displays detailed error messages if profiling fails.

4. **Progress Tracking**:  
   - Prints progress updates for each dataset during the profiling process.


In [11]:
class DataAnalyzer(DataLoader):
    def generate_profile_reports(self):
        """
        Generate detailed profile reports with progress tracking.
        Returns:
            bool: True if profiling is successful, False otherwise.
        """
        print("\nStarting data profiling process...")
        try:
            # Create a timestamped directory for saving reports
            timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
            report_dir = f"reports/{timestamp}"
            os.makedirs(report_dir, exist_ok=True)
            print(f"Reports will be saved in: {report_dir}")

            # Generate summary statistics for all datasets
            print("\nGenerating summary statistics...")
            summary_stats = {
                'Books': {
                    'Total Books': len(self.books_df),
                    'Unique Authors': self.books_df['Author'].nunique(),
                    'Year Range': f"{self.books_df['Year'].min()}-{self.books_df['Year'].max()}",
                    'Missing Values': self.books_df.isnull().sum().to_dict()
                },
                'Ratings': {
                    'Total Ratings': len(self.ratings_df),
                    'Unique Users': self.ratings_df['User_ID'].nunique(),
                    'Rating Distribution': self.ratings_df['Rating'].value_counts().to_dict(),
                    'Average Rating': round(self.ratings_df['Rating'].mean(), 2)
                },
                'Users': {
                    'Total Users': len(self.users_df),
                    'Age Range': f"{self.users_df['Age'].min()}-{self.users_df['Age'].max()}",
                    'Users with Age': self.users_df['Age'].notna().sum()
                }
            }

            # Save and display summary statistics
            print("\nDataset Summary:")
            with open(f"{report_dir}/summary_statistics.txt", 'w') as f:
                for dataset, stats in summary_stats.items():
                    print(f"\n{dataset} Dataset:")
                    f.write(f"\n{dataset} Dataset Summary:\n")
                    for key, value in stats.items():
                        print(f"- {key}: {value}")
                        f.write(f"{key}: {value}\n")

            # Generate detailed profile reports for each dataset
            print("\nGenerating detailed profile reports...")
            
            # Profile Books dataset
            print("Processing Books dataset...")
            books_report = ProfileReport(
                self.books_df, 
                title="Books Dataset Profile", 
                minimal=True
            )
            books_report.to_file(f"{report_dir}/books_profile.html")
            print("✓ Books profile completed")

            # Profile Ratings dataset
            print("Processing Ratings dataset...")
            ratings_report = ProfileReport(
                self.ratings_df, 
                title="Ratings Dataset Profile", 
                minimal=True
            )
            ratings_report.to_file(f"{report_dir}/ratings_profile.html")
            print("✓ Ratings profile completed")

            # Profile Users dataset
            print("Processing Users dataset...")
            users_report = ProfileReport(
                self.users_df, 
                title="Users Dataset Profile", 
                minimal=True
            )
            users_report.to_file(f"{report_dir}/users_profile.html")
            print("✓ Users profile completed")

            print(f"\n✓ All profile reports generated successfully in: {report_dir}")
            return True

        except Exception as e:
            # Log and display errors during the profiling process
            self.logger.error(f"Error generating profile reports: {str(e)}")
            print(f"\n❌ Error generating profile reports: {str(e)}")
            return False
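
The summary statistics collected in `generate_profile_reports` rely on a handful of pandas calls. The following sketch runs the same calls on made-up rows so their output is easy to inspect; the data and the `summary` dictionary are illustrative only.

```python
import pandas as pd

# Made-up rows; a rating of 0 stands for an implicit interaction.
ratings = pd.DataFrame({
    "User_ID": [1, 1, 2, 3],
    "ISBN": ["111", "222", "111", "333"],
    "Rating": [0, 8, 0, 10],
})
users = pd.DataFrame({"User_ID": [1, 2, 3], "Age": [34.0, None, 51.0]})

summary = {
    "Total Ratings": len(ratings),
    "Unique Users": ratings["User_ID"].nunique(),
    # sort_index() orders the distribution by rating value
    "Rating Distribution": ratings["Rating"].value_counts().sort_index().to_dict(),
    "Users with Age": int(users["Age"].notna().sum()),
    "Missing Values": users.isnull().sum().to_dict(),
}

for key, value in summary.items():
    print(f"- {key}: {value}")
```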


### Code Description

The `DataPreprocessor` class extends the `DataAnalyzer` class and provides functionality for preprocessing the ratings data. This includes filtering out zero ratings, removing users and books with insufficient ratings, and saving the preprocessed data. It tracks progress throughout the process and logs any errors.

#### Key Features:
1. **Zero Rating Removal**:  
   - Filters out ratings with a value of zero.

2. **User and Book Filtering**:  
   - Retains users and books that have a minimum number of ratings specified by `min_user_ratings` and `min_book_ratings`.

3. **Progress Tracking**:  
   - Prints progress updates for each preprocessing step.

4. **Data Saving**:  
   - Saves the preprocessed ratings and books data to CSV files.

5. **Error Handling**:  
   - Logs and displays error messages if preprocessing fails.


In [13]:
class DataPreprocessor(DataAnalyzer):
    def preprocess_data(self, min_book_ratings=3, min_user_ratings=3):
        """
        Preprocess the data by removing zero ratings and filtering users and books
        with insufficient ratings. Tracks progress and saves the preprocessed data.
        
        Args:
            min_book_ratings (int): Minimum number of ratings a book must have to be kept.
            min_user_ratings (int): Minimum number of ratings a user must have to be kept.
        
        Returns:
            bool: True if preprocessing is successful, False otherwise.
        """
        print("\nStarting data preprocessing...")
        try:
            # Record the original number of ratings
            original_shape = len(self.ratings_df)
            print(f"Initial number of ratings: {original_shape:,}")
            
            # Remove zero ratings
            print("\nRemoving zero ratings...")
            self.ratings_df = self.ratings_df[self.ratings_df['Rating'] != 0]
            print(f"Ratings after removing zeros: {len(self.ratings_df):,}")
            
            # Filter users and books based on minimum ratings
            print("\nFiltering users and books...")
            user_counts = self.ratings_df['User_ID'].value_counts()
            book_counts = self.ratings_df['ISBN'].value_counts()
            
            print(f"Users with >= {min_user_ratings} ratings: {len(user_counts[user_counts >= min_user_ratings]):,}")
            print(f"Books with >= {min_book_ratings} ratings: {len(book_counts[book_counts >= min_book_ratings]):,}")
            
            # Keep only valid users and books
            valid_users = user_counts[user_counts >= min_user_ratings].index
            valid_books = book_counts[book_counts >= min_book_ratings].index
            
            self.ratings_df = self.ratings_df[
                self.ratings_df['User_ID'].isin(valid_users) & 
                self.ratings_df['ISBN'].isin(valid_books)
            ]
            
            # Save the preprocessed data
            print("\nSaving preprocessed data...")
            self.ratings_df.to_csv('model_cache/preprocessed_ratings.csv', index=False)
            self.books_df.to_csv('model_cache/preprocessed_books.csv', index=False)
            
            # Calculate and display the reduction in ratings
            final_shape = len(self.ratings_df)
            reduction = ((original_shape - final_shape) / original_shape) * 100
            
            print(f"\nPreprocessing completed:")
            print(f"- Original ratings: {original_shape:,}")
            print(f"- Final ratings: {final_shape:,}")
            print(f"- Reduction: {reduction:.1f}%")
            
            return True
            
        except Exception as e:
            # Log and display errors during the preprocessing process
            self.logger.error(f"Error preprocessing data: {str(e)}")
            print(f"\n❌ Error preprocessing data: {str(e)}")
            return False
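
The filtering logic in `preprocess_data` can be checked on a toy table. The sketch below applies the same steps (drop zero ratings, count ratings per user and per book, keep rows that pass both thresholds); the function name `filter_ratings`, the data, and the thresholds of 2 are assumptions for illustration.

```python
import pandas as pd

ratings = pd.DataFrame({
    "User_ID": [1, 1, 1, 2, 2, 3],
    "ISBN": ["A", "B", "C", "A", "B", "A"],
    "Rating": [8, 7, 0, 9, 6, 5],
})


def filter_ratings(df, min_book_ratings=2, min_user_ratings=2):
    """Drop zero ratings, then keep users and books with enough ratings."""
    explicit = df[df["Rating"] != 0]

    # Counts are computed once, before any rows are filtered out.
    user_counts = explicit["User_ID"].value_counts()
    book_counts = explicit["ISBN"].value_counts()

    valid_users = user_counts[user_counts >= min_user_ratings].index
    valid_books = book_counts[book_counts >= min_book_ratings].index

    return explicit[
        explicit["User_ID"].isin(valid_users)
        & explicit["ISBN"].isin(valid_books)
    ]


filtered = filter_ratings(ratings)
print(filtered)
```

Because the counts are taken once, a user can fall below the threshold after sparse books are removed (and vice versa). Repeating the filter until the row count stops changing is one way to enforce both minimums strictly, if that matters for the models.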


### Code Description

The `ModelTrainer` class extends the `DataPreprocessor` class and provides functionality for training and caching three types of recommendation models: collaborative filtering (KNN), matrix factorization (SVD), and content-based filtering. The models are trained using preprocessed ratings and book data, and the trained models are saved to disk.

#### Key Features:
1. **Data Preparation**:  
   - Prepares the ratings data for collaborative filtering using the `surprise` library.

2. **Model Training**:
   - **KNN (K-Nearest Neighbors)**: Trains a user-based `KNNWithMeans` collaborative filtering model with cosine similarity.
   - **SVD (Singular Value Decomposition)**: Trains a matrix factorization model using the SVD algorithm.
   - **Content-Based Filtering**: Builds a content-based model by applying TF-IDF vectorization to each book's title, author, and publisher.

3. **Model Caching**:  
   - Saves the KNN and SVD models, the TF-IDF vectorizer, and the TF-IDF matrix to the `model_cache` directory for later use.

4. **Error Handling**:  
   - Logs and displays error messages if the training process fails.
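
For intuition about the matrix factorization step, the following simplified sketch learns latent user and item factors by stochastic gradient descent. It is not the `surprise` implementation: it omits the per-user and per-item bias terms that `surprise`'s `SVD` uses by default, and the function names, toy ratings, and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def factorize(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r ~ mu + p_u . q_i with SGD; `ratings` holds (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    mu = float(np.mean([r for _, _, r in ratings]))  # global mean rating

    P = rng.normal(0, 0.1, (n_users, n_factors))  # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))  # item factors

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + P[u] @ Q[i])
            p_u = P[u].copy()  # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return mu, P, Q


def rmse(ratings, mu, P, Q):
    """Root-mean-square error of the model on the given triples."""
    errors = [r - (mu + P[u] @ Q[i]) for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))


# Toy (user, item, rating) triples on a 1-10 scale.
toy = [(0, 0, 8.0), (0, 1, 3.0), (1, 0, 9.0), (1, 1, 2.0), (2, 0, 7.0), (2, 2, 10.0)]

baseline_rmse = rmse(toy, *factorize(toy, n_epochs=0))   # untrained factors
trained_rmse = rmse(toy, *factorize(toy, n_epochs=200))  # after SGD
print(f"RMSE before training: {baseline_rmse:.3f}, after: {trained_rmse:.3f}")
```

The error here is measured on the training triples only, so it shows that the factors fit the data, not how well they generalize; a held-out split would be needed for that.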


In [15]:
class ModelTrainer(DataPreprocessor):
    def train_and_cache_models(self):
        """
        Train and cache collaborative filtering, matrix factorization (SVD),
        and content-based recommendation models. The models are saved to disk
        for later use.

        Returns:
            bool: True if training is successful, False otherwise.
        """
        print("\nStarting model training process...")
        try:
            # Prepare data for collaborative filtering
            print("Preparing data for collaborative filtering...")
            reader = Reader(rating_scale=(1, 10))
            data = Dataset.load_from_df(self.ratings_df[['User_ID', 'ISBN', 'Rating']], reader)
            trainset = data.build_full_trainset()
            print("✓ Training data prepared")
            
            # Train KNN model (Collaborative filtering)
            print("\nTraining KNN model...")
            knn_model = KNNWithMeans(
                k=20,
                min_k=1,
                sim_options={'name': 'cosine', 'user_based': True}
            )
            knn_model.fit(trainset)
            joblib.dump(knn_model, 'model_cache/knn_model.joblib')
            print("✓ KNN model trained and cached")
            
            # Train SVD model (Matrix factorization)
            print("\nTraining SVD model...")
            svd_model = SVD(
                n_factors=50,
                n_epochs=10,
                lr_all=0.005,
                reg_all=0.02
            )
            svd_model.fit(trainset)
            joblib.dump(svd_model, 'model_cache/svd_model.joblib')
            print("✓ SVD model trained and cached")
            
            # Train content-based model using TF-IDF
            print("\nTraining content-based model...")
            tfidf = TfidfVectorizer(max_features=5000, stop_words='english')
            
            # Combine book metadata for content-based features
            content_features = self.books_df.apply(
                lambda x: f"{x['Title']} {x['Author']} {x['Publisher']}", 
                axis=1
            )
            print("Content features created")
            
            # Transform content features into a TF-IDF matrix
            tfidf_matrix = tfidf.fit_transform(content_features)
            print(f"TF-IDF matrix shape: {tfidf_matrix.shape}")
            
            # Save the trained TF-IDF vectorizer and matrix
            joblib.dump(tfidf, 'model_cache/tfidf_vectorizer.joblib')
            joblib.dump(tfidf_matrix, 'model_cache/tfidf_matrix.joblib')
            print("✓ Content-based model trained and cached")
            
            # Indicate successful model training and caching
            print("\n✓ All models trained and cached successfully!")
            return True
            
        except Exception as e:
            # Log and display errors during model training
            self.logger.error(f"Error training models: {str(e)}")
            print(f"\n❌ Error training models: {str(e)}")
            return False
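
The caching pattern used above (dump with `joblib`, load later) can be exercised without training anything. This sketch round-trips a plain dictionary through a temporary directory; the helper `cache_object` and the stored settings are hypothetical and stand in for a trained model.

```python
import tempfile
from pathlib import Path

import joblib


def cache_object(obj, cache_dir, name):
    """Save `obj` as <cache_dir>/<name>.joblib and return the file path."""
    path = Path(cache_dir) / f"{name}.joblib"
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(obj, path)
    return path


# A temporary directory keeps the example from touching model_cache/.
with tempfile.TemporaryDirectory() as tmp:
    model_like = {"k": 20, "sim": "cosine"}  # stand-in for a fitted model
    saved_path = cache_object(model_like, tmp, "knn_settings")
    restored = joblib.load(saved_path)

print(restored)
```

Files written this way are pickles under the hood, so they are generally only safe to load from trusted sources and with library versions compatible with those used for saving.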


# Book Recommender Class

The `BookRecommender` class generates book recommendations from two collaborative filtering models, **KNN (K-Nearest Neighbors)** and **SVD (Singular Value Decomposition)**, and finds similar books using TF-IDF text similarity. It works from the preprocessed data and cached models, and it extends the `ModelTrainer` class, which handles model training and caching.

## Class Methods Overview

### 1. `load_models()`
This method loads the preprocessed data and the trained models from cached files.

- **Preprocessed Data**: It loads `books_df` (book information) and `ratings_df` (user ratings) from CSV files.
- **Models**: It loads the KNN model, SVD model, TF-IDF vectorizer, and TF-IDF matrix from saved joblib files.

**Error Handling**: If any error occurs during the loading process, it logs the error and returns `False`.

### 2. `get_user_recommendations(user_id, n_recommendations=10)`
This method generates book recommendations for a specific user based on their rating history.

- **Input**: `user_id` (ID of the user for whom recommendations are generated), `n_recommendations` (number of books to recommend).
- **Process**: 
  - Retrieves the user’s ratings and identifies books that have not been rated by the user.
  - Generates recommendations using both the KNN and SVD models by predicting ratings for unrated books.
- **Output**: A dictionary with book recommendations from both models.

**Error Handling**: If the user is not found in the dataset, it prints a warning and returns `None`. If any other error occurs, it logs the error and returns `None`.
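
The ranking step described in this method can be sketched independently of the trained models. Here `top_n_unrated` and the `predict` callback are hypothetical stand-ins; in the class, the scores come from the cached KNN and SVD models.

```python
def top_n_unrated(all_items, rated_items, predict, n=10):
    """Score every item the user has not rated and return the n best."""
    candidates = set(all_items) - set(rated_items)
    scored = [(item, predict(item)) for item in candidates]
    # Highest predicted rating first; ties are broken by item ID for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


# Made-up predicted ratings keyed by ISBN-like IDs.
fake_scores = {"A": 7.5, "B": 9.1, "C": 6.0, "D": 8.2}

top_two = top_n_unrated(
    all_items=fake_scores.keys(),
    rated_items=["A"],                      # the user already rated "A"
    predict=lambda isbn: fake_scores[isbn],
    n=2,
)
print(top_two)  # [('B', 9.1), ('D', 8.2)]
```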

### 3. `get_book_recommendations(book_title, n_recommendations=10)`
This method finds books similar to the given book title using TF-IDF-based similarity.

- **Input**: `book_title` (the title of the book to find similar books), `n_recommendations` (number of similar books to recommend).
- **Process**:
  - Searches for books matching the provided title.
  - Calculates cosine similarity between the selected book and all other books using the TF-IDF matrix.
  - Returns the top similar books.
- **Output**: A list of books with their similarity scores.

**Error Handling**: If no matching books are found, it prints a warning and returns `None`. If any other error occurs, it logs the error and returns `None`.
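
The similarity lookup behind this method can be illustrated on three made-up book descriptions. This is a minimal sketch, assuming each book is represented by one string of metadata; the helper `most_similar` and the sample strings are not part of the project code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One string of metadata per book (illustrative values).
descriptions = [
    "Harry Potter Sorcerer Stone Rowling",
    "Harry Potter Chamber Secrets Rowling",
    "Italian Pasta Cooking Recipes",
]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(descriptions)  # sparse matrix, one row per book


def most_similar(index, n=2):
    """Return (row index, similarity) pairs for the n books closest to `index`."""
    scores = cosine_similarity(matrix[index], matrix).flatten()
    # Exclude the query book explicitly instead of assuming it ranks first.
    order = [i for i in scores.argsort()[::-1] if i != index]
    return [(int(i), float(scores[i])) for i in order[:n]]


similar = most_similar(0)
print(similar)
```

The second description shares several terms with the first and ranks on top, while the cookbook shares none and scores zero. Filtering out the query index by position, rather than skipping the first result, avoids dropping a genuine match when duplicate entries tie with the query book.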

### 4. `print_recommendations(recommendations, rec_type='user')`
This method prints the recommendations in a formatted way.

- **Input**: `recommendations` (the recommendations to print), `rec_type` (type of recommendations: 'user' or 'book').
- **Process**: 
  - For user-based recommendations (`rec_type='user'`), it prints both KNN and SVD recommendations.
  - For book-based recommendations (`rec_type='book'`), it prints the similar books.
- **Output**: None (prints the recommendations directly).

## Example Usage

1. **Load Models and Data**:
   ```python
   recommender = BookRecommender()
   recommender.load_models()
   ```


In [17]:
# BookRecommender Class: A recommender system for book recommendations based on user ratings and content similarities.
# The class extends ModelTrainer to load preprocessed data and cached models, generate recommendations for users and books,
# and print the recommendations in a readable format.

class BookRecommender(ModelTrainer):
    def load_models(self):
        """Load preprocessed data and cached models."""
        print("\nLoading preprocessed data and models...")
        try:
            # Load preprocessed data
            print("Loading preprocessed data...")
            self.books_df = pd.read_csv('model_cache/preprocessed_books.csv')
            self.ratings_df = pd.read_csv('model_cache/preprocessed_ratings.csv')
            print("✓ Preprocessed data loaded")
            
            # Load models from cache
            print("\nLoading cached models...")
            self.knn_model = joblib.load('model_cache/knn_model.joblib')
            self.svd_model = joblib.load('model_cache/svd_model.joblib')
            self.tfidf_vectorizer = joblib.load('model_cache/tfidf_vectorizer.joblib')
            self.tfidf_matrix = joblib.load('model_cache/tfidf_matrix.joblib')
            print("✓ All models loaded successfully!")
            
            return True
            
        except Exception as e:
            self.logger.error(f"Error loading models: {str(e)}")
            print(f"\n❌ Error loading models: {str(e)}")
            return False

    def get_user_recommendations(self, user_id, n_recommendations=10):
        """Generate book recommendations for a specific user based on KNN and SVD models."""
        print(f"\nGenerating recommendations for user {user_id}...")
        try:
            # Validate user_id and ensure it's in the dataset
            if not isinstance(user_id, int):
                user_id = int(user_id)

            if user_id not in self.ratings_df['User_ID'].unique():
                print(f"Warning: User {user_id} not found in dataset")
                return None
            
            # Get user's reading history
            user_ratings = self.ratings_df[self.ratings_df['User_ID'] == user_id]
            print(f"User has rated {len(user_ratings)} books")
            
            # Get list of unrated books
            unrated_books = list(set(self.books_df['ISBN']) - set(user_ratings['ISBN']))
            print(f"Number of unrated books: {len(unrated_books)}")
            
            recommendations = {'knn': [], 'svd': []}
            
            # Generate predictions using both KNN and SVD models
            for model, model_name in [(self.knn_model, 'KNN'), (self.svd_model, 'SVD')]:
                print(f"\nGenerating {model_name} predictions...")
                predictions = []
                pred_count = 0
                
                for isbn in unrated_books:
                    try:
                        # Predict rating for each unrated book
                        pred = model.predict(user_id, isbn)
                        predictions.append((isbn, pred.est))
                        pred_count += 1
                        if pred_count % 1000 == 0:
                            print(f"Processed {pred_count:,} predictions...")
                    except Exception:
                        # Skip books the model cannot score
                        continue
                
                # Sort predictions by predicted rating (descending)
                predictions.sort(key=lambda x: x[1], reverse=True)
                
                # Get book details for top predictions
                for isbn, rating in predictions[:n_recommendations]:
                    book = self.books_df[self.books_df['ISBN'] == isbn].iloc[0]
                    recommendations[model_name.lower()].append({
                        'Title': book['Title'],
                        'Author': book['Author'],
                        'Year': book['Year'],
                        'Publisher': book['Publisher'],
                        'Predicted Rating': round(rating, 2)
                    })
            
            print("\n✓ Recommendations generated successfully!")
            return recommendations
            
        except Exception as e:
            self.logger.error(f"Error getting user recommendations: {str(e)}")
            print(f"\n❌ Error getting user recommendations: {str(e)}")
            return None

    def get_book_recommendations(self, book_title, n_recommendations=10):
        """Generate book recommendations based on content similarity (TF-IDF)."""
        print(f"\nFinding similar books to '{book_title}'...")
        try:
            # Find books matching the provided title
            matches = self.books_df[self.books_df['Title'].str.contains(book_title, case=False, na=False)]
            
            if matches.empty:
                print(f"Warning: No books found matching '{book_title}'")
                return None
            
            if len(matches) > 1:
                print("\nMultiple matches found:")
                for _, book in matches.iterrows():
                    print(f"- {book['Title']} by {book['Author']} ({book['Year']})")
                print("\nUsing the first match for recommendations.")
                
            # Select the first match
            book_idx = matches.index[0]
            target_book = matches.iloc[0]
            print(f"\nSelected book: {target_book['Title']} by {target_book['Author']}")
            
            # Calculate content-based similarity using TF-IDF matrix
            print("\nCalculating book similarities...")
            similarity_scores = cosine_similarity(
                self.tfidf_matrix[book_idx:book_idx+1],
                self.tfidf_matrix
            ).flatten()
            
            # Get indices of similar books
            similar_indices = similarity_scores.argsort()[::-1][1:n_recommendations+1]
            
            recommendations = []
            print("\nTop similar books:")
            for idx in similar_indices:
                book = self.books_df.iloc[idx]
                similarity = round(similarity_scores[idx] * 100, 2)
                
                recommendations.append({
                    'Title': book['Title'],
                    'Author': book['Author'],
                    'Year': book['Year'],
                    'Publisher': book['Publisher'],
                    'Similarity': similarity
                })
                print(f"- {book['Title']} (Similarity: {similarity}%)")
            
            print("\n✓ Similar books found successfully!")
            return recommendations
            
        except Exception as e:
            self.logger.error(f"Error getting book recommendations: {str(e)}")
            print(f"\n❌ Error getting book recommendations: {str(e)}")
            return None

    def print_recommendations(self, recommendations, rec_type='user'):
        """Print the recommendations in a user-friendly format."""
        if not recommendations:
            print("\nNo recommendations found.")
            return

        # Print user-based recommendations
        if rec_type == 'user':
            # Print KNN and SVD recommendations
            if recommendations.get('svd'):
                print("\nTop SVD Recommendations:")
                for i, rec in enumerate(recommendations['svd'], 1):
                    print(f"\n{i}. {rec['Title']} by {rec['Author']}")
                    print(f"   Predicted Rating: {rec['Predicted Rating']}")
                    print(f"   Published: {rec['Year']} by {rec['Publisher']}")
            
            if recommendations.get('knn'):
                print("\nTop KNN Recommendations:")
                for i, rec in enumerate(recommendations['knn'], 1):
                    print(f"\n{i}. {rec['Title']} by {rec['Author']}")
                    print(f"   Predicted Rating: {rec['Predicted Rating']}")
                    print(f"   Published: {rec['Year']} by {rec['Publisher']}")
        else:
            # Print content-based recommendations
            print("\nSimilar Books:")
            for i, rec in enumerate(recommendations, 1):
                print(f"\n{i}. {rec['Title']} by {rec['Author']}")
                print(f"   Published: {rec['Year']} by {rec['Publisher']}")
                print(f"   Similarity Score: {rec['Similarity']}%")


# System Initialization Function

The `initialize_system()` function sets up the recommendation system in four steps: it loads the data, generates profile reports, preprocesses the data, and trains the models. If any step fails, the process halts and `None` is returned.

## Function Description

### `initialize_system()`

The function creates a `BookRecommender` and runs four steps in sequence:

1. **Load Data**: Loads the necessary data required for the recommendation system.
2. **Generate Profile Reports**: Generates user and book profile reports.
3. **Preprocess Data**: Prepares the data for model training by cleaning and transforming it.
4. **Train Models**: Trains the recommendation models (such as KNN, SVD) and caches them for future use.

If any step fails, the function returns `None`; otherwise, it returns the initialized `recommender` object.

### Example Usage

```python
recommender = initialize_system()
if recommender:
    print("System is ready to use!")
else:
    print("Failed to initialize the system.")
```

In [19]:
def initialize_system():
    """Initialize the recommendation system."""
    recommender = BookRecommender()
    
    print("Step 1: Load Data")
    if not recommender.load_data():
        return None

    print("\nStep 2: Generate Profile Reports")
    if not recommender.generate_profile_reports():
        return None

    print("\nStep 3: Preprocess Data")
    if not recommender.preprocess_data():
        return None

    print("\nStep 4: Train Models")
    if not recommender.train_and_cache_models():
        return None

    print("\nSystem initialized successfully!")
    return recommender

### Interactive Book Recommendation System Demo

The `interactive_demo()` function lets the user request book recommendations interactively, based on either a user ID or a book title. It reads the user's choice, loads the cached recommendation models, and then prompts for the user ID or book title to recommend for.

#### Workflow:
1. The user is presented with two options:
   - Option 1: Get user-based recommendations.
   - Option 2: Get book-based recommendations.
2. The user selects an option, and based on the choice:
   - If the choice is '1', the user is prompted to input a user ID for which recommendations will be provided.
   - If the choice is '2', the user is prompted to input a book title for which similar books will be recommended.
3. If the input is invalid or an error occurs, an appropriate message is displayed.

The function also includes error handling, so failures produce an informative message instead of an unhandled exception.


In [21]:
def interactive_demo():
    """Run an interactive demonstration."""
    print("\nBook Recommendation System Demo")
    print("1. Get user-based recommendations")
    print("2. Get book-based recommendations")
    
    try:
        choice = input("\nEnter your choice (1 or 2): ")
        
        # Initialize recommender
        recommender = BookRecommender()
        if not recommender.load_models():
            print("Error: Could not load models. Please initialize the system first.")
            return
        
        # Handle user-based recommendations
        if choice == '1':
            user_id = input("Enter user ID: ")
            recommendations = get_recommendations(recommender, 'user', user_id)
        
        # Handle book-based recommendations
        elif choice == '2':
            book_title = input("Enter book title: ")
            recommendations = get_recommendations(recommender, 'book', book_title)
        
        # Handle invalid choices
        else:
            print("Invalid choice!")
    
    except Exception as e:
        print(f"Error running demo: {str(e)}")


# Run the interactive demo
interactive_demo()


Book Recommendation System Demo
1. Get user-based recommendations
2. Get book-based recommendations



Enter your choice (1 or 2):  1


NLTK resources downloaded successfully!

Loading preprocessed data and models...
Loading preprocessed data...
✓ Preprocessed data loaded

Loading cached models...
✓ All models loaded successfully!


Enter user ID:  11676


Error running demo: name 'get_recommendations' is not defined
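
The `NameError` in the output above occurs because `interactive_demo()` calls a `get_recommendations` helper that is never defined in the notebook. A minimal sketch of such a helper is shown below, assuming it should dispatch to the `get_user_recommendations`, `get_book_recommendations`, and `print_recommendations` methods of `BookRecommender` shown earlier. The helper's signature and the `StubRecommender` class are hypothetical: the stub only stands in for a loaded `BookRecommender` so that the example runs on its own.

```python
def get_recommendations(recommender, rec_type, query, n_recommendations=10):
    """Hypothetical helper: fetch and print recommendations for a user ID or a book title."""
    if rec_type == 'user':
        try:
            user_id = int(query)  # input() returns a string; user IDs are integers
        except ValueError:
            print(f"Invalid user ID: {query!r}")
            return None
        recommendations = recommender.get_user_recommendations(user_id, n_recommendations)
    elif rec_type == 'book':
        recommendations = recommender.get_book_recommendations(query, n_recommendations)
    else:
        raise ValueError(f"Unknown recommendation type: {rec_type!r}")

    recommender.print_recommendations(recommendations, rec_type)
    return recommendations


class StubRecommender:
    """Hypothetical stand-in for BookRecommender; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def get_user_recommendations(self, user_id, n_recommendations=10):
        self.calls.append(('user', user_id, n_recommendations))
        return {'svd': [], 'knn': []}

    def get_book_recommendations(self, book_title, n_recommendations=10):
        self.calls.append(('book', book_title, n_recommendations))
        return []

    def print_recommendations(self, recommendations, rec_type='user'):
        self.calls.append(('print', rec_type))


stub = StubRecommender()
get_recommendations(stub, 'user', '11676')
get_recommendations(stub, 'book', 'Harry Potter')
```

With a helper like this defined before `interactive_demo()` is run, both menu options would reach the recommender instead of raising `NameError`.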


# FastAPI Backend for Personalized Book Recommendation System

This FastAPI application serves as a backend for a Book Recommendation System. It provides several endpoints to retrieve book recommendations based on user preferences or similar books.

## Code Breakdown

### Imports

- `FastAPI`, `HTTPException`: For building the API and handling errors.
- `BaseModel`: Used to define Pydantic models for structured data validation.
- `List`, `Optional`: For defining types in the models.
- `uvicorn`: For running the FastAPI server.
- `nest_asyncio`: Patches `asyncio` so the uvicorn server can start inside an event loop that is already running, as is the case in a Jupyter notebook.
- `CORSMiddleware`: Middleware for enabling Cross-Origin Resource Sharing (CORS).

### FastAPI Initialization

```python
app = FastAPI(
    title="Book Recommendation System API",
    description="API for getting book recommendations",
    version="1.0.0"
)
```

In [None]:
"""
FastAPI Backend for Book Recommendation System
"""

import logging

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import uvicorn
import nest_asyncio
from fastapi.middleware.cors import CORSMiddleware
# from get_recommendations import BookRecommender

# Module-level logger used by run_server()
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Book Recommendation System API",
    description="API for getting book recommendations",
    version="1.0.0"
)

# Apply asyncio patch for nested event loops
nest_asyncio.apply()

# Add CORS middleware to allow cross-origin requests
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allow all origins
    allow_credentials=True,
    allow_methods=["*"],  # Allow all methods (GET, POST, etc.)
    allow_headers=["*"],  # Allow all headers
)

# Initialize the recommender system
recommender = BookRecommender()

# Define the Recommendation Pydantic model for book data
class Recommendation(BaseModel):
    Title: str
    Author: str
    Year: int
    Publisher: str
    Predicted_Rating: Optional[float] = None  # Optional predicted rating
    Similarity: Optional[float] = None  # Optional similarity score

# Define the response model for user-based recommendations
class UserRecommendationResponse(BaseModel):
    knn: List[Recommendation]
    svd: List[Recommendation]

# Define the response model for book-based recommendations
class BookRecommendationResponse(BaseModel):
    recommendations: List[Recommendation]

# Event triggered on startup to load models
@app.on_event("startup")
async def startup_event():
    """Load models on startup"""
    if not recommender.load_models():
        raise Exception("Failed to load recommendation models")

# Root endpoint for the API
@app.get("/")
def read_root():
    return {"message": "Book Recommendation System API"}

# Endpoint to get user-based recommendations
@app.get("/user/{user_id}", response_model=UserRecommendationResponse)
def get_user_recommendations(user_id: int, num_recommendations: int = 10):
    """Get recommendations for a specific user"""
    try:
        # Fetch recommendations for the given user
        recommendations = recommender.get_user_recommendations(user_id, num_recommendations)
        
        # Raise error if no recommendations found
        if not recommendations:
            raise HTTPException(status_code=404, detail="No recommendations found for this user")
        
        # Format the recommendations into the response model
        formatted_recs = {
            'knn': [
                Recommendation(
                    Title=rec['Title'],
                    Author=rec['Author'],
                    Year=rec['Year'],
                    Publisher=rec['Publisher'],
                    Predicted_Rating=rec['Predicted Rating']
                ) for rec in recommendations['knn']
            ],
            'svd': [
                Recommendation(
                    Title=rec['Title'],
                    Author=rec['Author'],
                    Year=rec['Year'],
                    Publisher=rec['Publisher'],
                    Predicted_Rating=rec['Predicted Rating']
                ) for rec in recommendations['svd']
            ]
        }
        
        return formatted_recs

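    except HTTPException:
        # Re-raise HTTP errors (such as the 404 above) instead of turning them into 500s
        raise
    except Exception as e:
        # Report any other failure as an internal server error
        raise HTTPException(status_code=500, detail=str(e))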

# Endpoint to get book-based recommendations
@app.get("/book/{book_title}", response_model=BookRecommendationResponse)
def get_book_recommendations(book_title: str, num_recommendations: int = 10):
    """Get similar book recommendations"""
    try:
        # Fetch recommendations for the given book title
        recommendations = recommender.get_book_recommendations(book_title, num_recommendations)
        
        # Raise error if no recommendations found
        if not recommendations:
            raise HTTPException(status_code=404, detail="No recommendations found for this book")
        
        # Format the recommendations into the response model
        formatted_recs = {
            'recommendations': [
                Recommendation(
                    Title=rec['Title'],
                    Author=rec['Author'],
                    Year=rec['Year'],
                    Publisher=rec['Publisher'],
                    Similarity=rec['Similarity']
                ) for rec in recommendations
            ]
        }
        
        return formatted_recs

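    except HTTPException:
        # Re-raise HTTP errors (such as the 404 above) instead of turning them into 500s
        raise
    except Exception as e:
        # Report any other failure as an internal server error
        raise HTTPException(status_code=500, detail=str(e))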

# Health check endpoint to monitor the API status
@app.get("/health")
def health_check():
    """API health check endpoint"""
    return {"status": "healthy"}

# Function to run the FastAPI server with logging
def run_server():
    """Run the FastAPI server with logging"""
    try:
        # Start the FastAPI server using uvicorn
        uvicorn.run(app, host="0.0.0.0", port=8000)
    except Exception as e:
        # Log the error if server startup fails
        logger.error(f"Server startup failed: {str(e)}")
        raise

# Entry point to run the server
if __name__ == "__main__":
    run_server()

        on_event is deprecated, use lifespan event handlers instead.

        Read more about it in the
        [FastAPI docs for Lifespan Events](https://fastapi.tiangolo.com/advanced/events/).
        
  @app.on_event("startup")
INFO:     Started server process [6196]
INFO:     Waiting for application startup.


NLTK resources downloaded successfully!

Loading preprocessed data and models...
Loading preprocessed data...
✓ Preprocessed data loaded

Loading cached models...


INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)


✓ All models loaded successfully!
