# Introduction
This notebook explores the application of collaborative filtering (CF) in building a book recommendation system. CF leverages the wisdom of the crowd, drawing insights from the reading preferences of other users to suggest books you might enjoy. By analyzing user-book interactions, the system identifies patterns and similarities between readers, ultimately recommending titles aligned with your tastes.

# Importing Libraries

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")


# Loading the Data

In [None]:
books = pd.read_csv("/kaggle/input/book-recommendation-dataset/Books.csv")
users = pd.read_csv("/kaggle/input/book-recommendation-dataset/Users.csv")
ratings = pd.read_csv("/kaggle/input/book-recommendation-dataset/Ratings.csv")

# EDA

In [None]:
books.head()

In [None]:
users.head()

In [None]:
ratings.head()

In [None]:
print(books.shape)
print(ratings.shape)
print(users.shape)

In [None]:
books.isnull().sum()

In [None]:
users.isnull().sum()

In [None]:
ratings.isnull().sum()

In [None]:
print(books.duplicated().sum())
print(users.duplicated().sum())
print(ratings.duplicated().sum())

Let's merge the ratings and books dataframes on their shared key, 'ISBN', so that every rating carries the metadata of the book it refers to.

In [None]:
ratings_with_book_titles = ratings.merge(books,on='ISBN')

In [None]:
ratings_with_book_titles.drop(columns=["ISBN", "Image-URL-S", "Image-URL-M"], inplace=True)

In [None]:
complete_df = ratings_with_book_titles.merge(users.drop("Age", axis=1), on="User-ID")
complete_df.head()

Let's reduce each location to its country name:
* Split the location string on commas.
* Keep only the last part (the country name).
* Remove leading and trailing whitespace.

In [None]:
complete_df['Location'] = complete_df['Location'].str.split(',').str[-1].str.strip()

In [None]:
complete_df.head()

# Collaborative Filtering Based Recommender System


In [None]:
# Select user IDs with more than 200 book ratings
min_ratings_threshold = 200

# Count book ratings per user
num_ratings_per_user = complete_df.groupby('User-ID')['Book-Rating'].count()

# Filter users with more than the minimum threshold
knowledgeable_user_ids = num_ratings_per_user[num_ratings_per_user > min_ratings_threshold].index

In [None]:
# Filter ratings from knowledgeable users
knowledgeable_user_ratings = complete_df[complete_df['User-ID'].isin(knowledgeable_user_ids)]

In [None]:
# Keep books that received at least 50 ratings from the knowledgeable users
min_ratings_count_threshold = 50
rating_counts = knowledgeable_user_ratings.groupby('Book-Title').count()['Book-Rating']
popular_books = rating_counts[rating_counts >= min_ratings_count_threshold].index


In [None]:
final_ratings =  knowledgeable_user_ratings[knowledgeable_user_ratings['Book-Title'].isin(popular_books)]

In [None]:
pt = final_ratings.pivot_table(index='Book-Title',columns='User-ID'
                          ,values='Book-Rating')
pt

Next, we need to compute the cosine similarity between the rows of our pivot table.

cosine_similarity takes a matrix as input, where each row represents a data point and each column represents a feature. In our case, the rows represent book titles and the columns represent users, so each book is described by the vector of ratings it received.
The function calculates the cosine similarity between every pair of rows (books) in the matrix. Cosine similarity is the cosine of the angle between two vectors: a score of 1 means the two vectors point in the same direction, while 0 means they are orthogonal. For our non-negative ratings, a score of 0 means no user gave both books a nonzero rating.
The output of the function is a new square matrix where each element (i, j) represents the cosine similarity score between book i and book j.

In the context of a recommender system:

We can use this matrix to recommend books that were rated in a similar way, by the same users, as a book the reader already likes.
For example, given a title, we can look up its row in the similarity matrix, sort the other books by score, and recommend the ones with the highest cosine similarity.
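As a rough illustration of the computation, here is a minimal sketch in plain Python. The pivot table built earlier has one row per book title and one column per user, and the toy table below mimics that layout. The titles, the ratings, and the helper name `cosine` are made up for the example; the real notebook relies on the scikit-learn style implementation instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero row has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical toy pivot table: one row per book, one column per user (0 = not rated)
toy_ratings = {
    "Book A": [8, 0, 7, 0],
    "Book B": [9, 0, 8, 0],
    "Book C": [0, 6, 0, 9],
}

titles = list(toy_ratings)
# Square matrix: entry (i, j) is the similarity between book i and book j
toy_similarity = [
    [cosine(toy_ratings[a], toy_ratings[b]) for b in titles] for a in titles
]

for title, row in zip(titles, toy_similarity):
    print(title, [round(score, 3) for score in row])
```

Book A and Book B were rated by the same two users with similar scores, so their similarity is close to 1. Book C shares no raters with either of them, so its similarity to both is 0.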

In [None]:
pt.fillna(0,inplace=True)
pt

In [None]:
from sklearn.preprocessing import normalize
import numpy as np
from scipy.sparse import issparse
from scipy import sparse
from sklearn.utils.validation import check_array

def safe_sparse_dot(a, b, dense_output=False):
    
    if sparse.issparse(a) or sparse.issparse(b):
        ret = a * b
        if dense_output and hasattr(ret, "toarray"):
            ret = ret.toarray()
        return ret
    else:
        return np.dot(a, b)

def _return_float_dtype(X, Y):
    if not issparse(X) and not isinstance(X, np.ndarray):
        X = np.asarray(X)

    if Y is None:
        Y_dtype = X.dtype
    elif not issparse(Y) and not isinstance(Y, np.ndarray):
        Y = np.asarray(Y)
        Y_dtype = Y.dtype
    else:
        Y_dtype = Y.dtype

    if X.dtype == Y_dtype == np.float32:
        dtype = np.float32
    else:
        dtype = float

    return X, Y, dtype

def check_pairwise_arrays(
    X,
    Y,
    *,
    precomputed=False,
    dtype=None,
    accept_sparse="csr",
    force_all_finite=True,
    copy=False,
):
    X, Y, dtype_float = _return_float_dtype(X, Y)

    estimator = "check_pairwise_arrays"
    if dtype is None:
        dtype = dtype_float

    if Y is X or Y is None:
        X = Y = check_array(
            X,
            accept_sparse=accept_sparse,
            dtype=dtype,
            copy=copy,
            force_all_finite=force_all_finite,
            estimator=estimator,
        )
    else:
        X = check_array(
            X,
            accept_sparse=accept_sparse,
            dtype=dtype,
            copy=copy,
            force_all_finite=force_all_finite,
            estimator=estimator,
        )
        Y = check_array(
            Y,
            accept_sparse=accept_sparse,
            dtype=dtype,
            copy=copy,
            force_all_finite=force_all_finite,
            estimator=estimator,
        )

    if precomputed:
        if X.shape[1] != Y.shape[0]:
            raise ValueError(
                "Precomputed metric requires shape "
                "(n_queries, n_indexed). Got (%d, %d) "
                "for %d indexed." % (X.shape[0], X.shape[1], Y.shape[0])
            )
    elif X.shape[1] != Y.shape[1]:
        raise ValueError(
            "Incompatible dimension for X and Y matrices: "
            "X.shape[1] == %d while Y.shape[1] == %d" % (X.shape[1], Y.shape[1])
        )

    return X, Y
    
def cosine_similarity(X, Y=None, dense_output=True):
    X, Y = check_pairwise_arrays(X, Y)

    X_normalized = normalize(X, copy=True)
    if X is Y:
        Y_normalized = X_normalized
    else:
        Y_normalized = normalize(Y, copy=True)

    K = safe_sparse_dot(X_normalized, Y_normalized.T, dense_output=dense_output)

    return K

In [None]:
similarity_score = cosine_similarity(pt)

In [None]:
def recommend(book_name):
    # Row position of the requested title in the pivot table
    index = np.where(pt.index == book_name)[0][0]
    # Rank all books by similarity; the top hit is normally the book itself, so skip it
    similar_books = sorted(enumerate(similarity_score[index]), key=lambda x: x[1], reverse=True)[1:6]

    data = []
    for i in similar_books:
        # Collect title, author and cover URL for each of the five most similar books
        temp_df = books[books['Book-Title'] == pt.index[i[0]]].drop_duplicates('Book-Title')
        item = []
        item.extend(list(temp_df['Book-Title'].values))
        item.extend(list(temp_df['Book-Author'].values))
        item.extend(list(temp_df['Image-URL-M'].values))
        data.append(item)
    return data

In [None]:
recommend("A Walk to Remember")

In [None]:
recommend("Prodigal Summer")

In [None]:
recommend("1984")

In [None]:
recommend("Harry Potter and the Goblet of Fire (Book 4)")

# Let's Try the SVD Algorithm for Our Recommendation System
SVD, or Singular Value Decomposition, is a popular algorithm for collaborative filtering based on matrix factorization. In this setting, it approximates the user-item rating matrix as the product of two smaller matrices:

* User latent factors: These represent "underlying preferences" or hidden characteristics of users.
* Item latent factors: These represent "intrinsic features" or characteristics of items.

When multiplied together, these two matrices approximate the original rating matrix, which lets the model estimate ratings for user-book pairs that were never observed.
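To make the idea of latent factors concrete, here is a minimal sketch of matrix factorization trained with stochastic gradient descent, written in plain Python. It is a simplified illustration and not the Surprise implementation: it leaves out the bias terms that Surprise's SVD also learns. The toy ratings and the names `factorize`, `predict`, `lr` and `reg` are assumptions made for the example.

```python
import math
import random

# Hypothetical toy data: (user index, book index, rating) triples
toy_ratings = [
    (0, 0, 9), (0, 1, 8), (0, 2, 2),
    (1, 0, 8), (1, 1, 9), (1, 3, 3),
    (2, 1, 2), (2, 2, 9), (2, 3, 8),
    (3, 0, 3), (3, 2, 8), (3, 3, 9),
]
N_USERS, N_BOOKS = 4, 4


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and book i's latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root mean squared error of the factorization on the given ratings."""
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / len(ratings))


def factorize(ratings, n_factors=2, n_epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and book factors Q by stochastic gradient descent."""
    rng = random.Random(seed)
    # Start from small random factors
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(N_USERS)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(N_BOOKS)]
    initial_error = rmse(P, Q, ratings)
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for k in range(n_factors):
                p_uk, q_ik = P[u][k], Q[i][k]
                # Move both factors to reduce the error, with L2 regularization
                P[u][k] += lr * (err * q_ik - reg * p_uk)
                Q[i][k] += lr * (err * p_uk - reg * q_ik)
    return P, Q, initial_error


P, Q, initial_rmse = factorize(toy_ratings)
final_rmse = rmse(P, Q, toy_ratings)
print(f"training RMSE before: {initial_rmse:.2f}, after: {final_rmse:.2f}")
# Estimate a rating that was never observed: user 0 on book 3
print(f"predicted rating for user 0, book 3: {predict(P, Q, 0, 3):.2f}")
```

Each row of `P` is one user's latent factor vector and each row of `Q` is one book's. Only the observed ratings drive the updates, but once the factors are learned, any user-book pair can be scored with a dot product.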

In [None]:
# Install Surprise library
!pip install scikit-surprise

In [None]:
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# Define the rating scale
reader = Reader(rating_scale=(0, 10))

# Load the data into Surprise's dataset format
data = Dataset.load_from_df(complete_df[['User-ID', 'Book-Title', 'Book-Rating']], reader)

# Split the dataset into training and testing sets
train_set, test_set = train_test_split(data, test_size=0.20, random_state=42)

# Define the SVD algorithm
model = SVD()

# Train the algorithm on the training set
model.fit(train_set)

# Make predictions on the test set
predictions = model.test(test_set)

# Evaluate the model
accuracy.rmse(predictions)


**Interpretation of RMSE:**

Root Mean Squared Error (RMSE) is the square root of the average squared difference between predicted and actual ratings, so large errors are penalized more heavily than small ones. Lower RMSE indicates better model performance.
In our case, an RMSE of 3.5208 means that our model's predictions are typically off by about 3.5 points on the rating scale (0 to 10).
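For intuition about how the metric behaves, here is a small worked example in plain Python. The ratings are made up, and the helper `rmse` is written out by hand for illustration; the notebook itself uses `accuracy.rmse` from Surprise.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared difference between two rating lists."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on the 0-10 scale: two predictions are off by 3, two are exact
actual = [10, 0, 8, 5]
predicted = [7, 3, 8, 5]

toy_rmse = rmse(actual, predicted)
toy_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(f"RMSE: {toy_rmse:.4f}")  # sqrt((9 + 9 + 0 + 0) / 4) = sqrt(4.5)
print(f"MAE:  {toy_mae:.4f}")   # (3 + 3 + 0 + 0) / 4
```

The squared errors are 9, 9, 0 and 0, so the RMSE is the square root of 4.5, roughly 2.12, while the mean absolute error is only 1.5. The gap between the two shows how RMSE weights the larger misses more heavily.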

In [None]:
def recommend_books(user_id, n=10):
    # List all unique book titles
    all_books = complete_df['Book-Title'].unique()

    # Remove books already rated by the user
    rated_books = set(complete_df[complete_df['User-ID'] == user_id]['Book-Title'])  # set for fast membership tests
    books_to_predict = [book for book in all_books if book not in rated_books]

    # Predict ratings for remaining books
    predictions = []
    for book in books_to_predict:
        pred = model.predict(user_id, book)
        predictions.append((book, pred.est))

    # Sort predictions by estimated rating
    predictions.sort(key=lambda x: x[1], reverse=True)

    # Get top N recommendations
    top_n = predictions[:n]

    return top_n


**Let's use our model to find the Top 10 recommended books for user 271705:**

In [None]:
user_id = 271705
recommended_books = recommend_books(user_id)
print(f"Top 10 recommended books for user {user_id}:")
for i, (title, _) in enumerate(recommended_books, start=1):
    print(f"{i}. {title}")