
Here's an example project for a bachelor's student to implement a content-based recommendation system:

Problem Statement:
Develop a content-based recommendation system for books that suggests books to users based on the books' descriptions, genres, and author information.

Dataset:
Use the following sample book data for this project:

| Book | Author | Genre | Description |
|------|--------|-------|-------------|
| B1   | A1     | G1    | D1          |
| B2   | A2     | G2    | D2          |
| B3   | A1     | G1    | D3          |
| B4   | A2     | G2    | D4          |
| B5   | A3     | G3    | D5          |
| B6   | A3     | G3    | D6          |

Steps:

Preprocess the book data and create a bag of words representation of the book descriptions.

Compute the cosine similarity between the books based on the bag of words representation.

Given a book, use the cosine similarity scores to suggest the top k most similar books to the user.
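The three steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the description texts are hypothetical placeholders invented for the example (the sample table only names them D1 to D6), and `top_k_similar` is a helper name chosen here. It assumes scikit-learn's `CountVectorizer` for the bag of words.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical descriptions, invented purely for illustration
descriptions = {
    "B1": "a young wizard attends a school of magic",
    "B2": "a detective hunts a killer in the city",
    "B3": "a wizard battles a dark lord with magic",
    "B4": "a detective solves a murder in a small town",
}
book_ids = list(descriptions)

# Step 1: bag-of-words representation (one row per book, one column per word)
vectorizer = CountVectorizer(stop_words="english")
bow = vectorizer.fit_transform(list(descriptions.values()))

# Step 2: pairwise cosine similarity between all books
similarity = cosine_similarity(bow)

# Step 3: the k books most similar to a given book, excluding the book itself
def top_k_similar(book_id, k=2):
    idx = book_ids.index(book_id)
    others = [i for i in range(len(book_ids)) if i != idx]
    ranked = sorted(others, key=lambda i: similarity[idx, i], reverse=True)
    return [book_ids[i] for i in ranked[:k]]

print(top_k_similar("B1", k=1))
```

With these placeholder texts, B1 and B3 share the words "wizard" and "magic", so B3 is ranked closest to B1.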

Evaluation:

To evaluate the performance of the recommendation system, you can use the following metrics:

Precision: The number of recommended items that are relevant to the user, divided by the total number of recommended items.

Recall: The number of recommended items that are relevant to the user, divided by the total number of relevant items.
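As a small sketch of how these two metrics could be computed for a single user, assuming we already know which books are relevant to that user. The function name `precision_recall` and the example lists are hypothetical and chosen only for illustration.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one list of recommendations."""
    recommended_set = set(recommended)
    relevant_set = set(relevant)
    # Recommended items that the user actually finds relevant
    hits = len(recommended_set & relevant_set)
    precision = hits / len(recommended_set) if recommended_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical example: 3 books recommended, 4 books relevant, 2 in common
recommended = ["B3", "B5", "B2"]
relevant = ["B3", "B2", "B6", "B4"]
precision, recall = precision_recall(recommended, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here two of the three recommendations are relevant (precision 2/3), and two of the four relevant books were recommended (recall 1/2).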

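This project provides a good starting point for a bachelor's student to learn about content-based recommendation systems. The implementation that follows is a simplified variant: it works on a small list of well-known titles and uses each book's genre, rather than its description or author, as the only content feature.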

In [1]:
book_dataset = [
    {"title": "The Great Gatsby", "genre": "Fiction"},
    {"title": "To Kill a Mockingbird", "genre": "Fiction"},
    {"title": "Pride and Prejudice", "genre": "Fiction"},
    {"title": "The Hitchhiker's Guide to the Galaxy", "genre": "Science Fiction"},
    {"title": "The Lord of the Rings", "genre": "Fantasy"},
    {"title": "The Da Vinci Code", "genre": "Thriller"},
    {"title": "The Catcher in the Rye", "genre": "Fiction"},
    {"title": "The Hunger Games", "genre": "Science Fiction"},
    {"title": "Harry Potter and the Philosopher's Stone", "genre": "Fantasy"},
    {"title": "The Silence of the Lambs", "genre": "Thriller"},
]


Next, we'll create a matrix of book titles and the genres they belong to using a one-hot encoding approach:

In [2]:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Create a list of book titles
book_titles = [book['title'] for book in book_dataset]

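# Create a list of book genres, wrapping each genre in its own list:
# MultiLabelBinarizer expects an iterable of labels per book, and a bare
# string would be split into individual characters
book_genres = [[book['genre']] for book in book_dataset]

# One-hot encode the genres (rows: book titles, columns: genres)
mlb = MultiLabelBinarizer()
book_genres_encoded = pd.DataFrame(mlb.fit_transform(book_genres), columns=mlb.classes_, index=book_titles)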
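As a rough illustration of what a one-hot genre matrix looks like, here is a tiny standalone sketch. It uses `pandas.get_dummies`, which is an alternative that fits the case where every book has exactly one genre. The titles "Book A" to "Book D" are hypothetical.

```python
import pandas as pd

# Hypothetical mini-catalogue: one genre per book
genres = pd.Series(
    ["Fiction", "Science Fiction", "Fantasy", "Fiction"],
    index=["Book A", "Book B", "Book C", "Book D"],
)

# One column per distinct genre; each row has a single 1 marking its genre
one_hot = pd.get_dummies(genres, dtype=int)
print(one_hot)
```

Each row contains exactly one 1, in the column of that book's genre. Two books with the same genre therefore get identical rows.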

Next, we'll create a function that recommends books similar to a given book. We'll use cosine similarity between the encoded genre vectors to measure how similar two books are:

In [3]:
from sklearn.metrics.pairwise import cosine_similarity

def recommend_books(title, book_genres_encoded, cosine_sim=None, n=5):
    """Return the titles of the n books most similar to `title`."""
    # Compute the similarity matrix at call time rather than once, when the function is defined
    if cosine_sim is None:
        cosine_sim = cosine_similarity(book_genres_encoded)
    titles = list(book_genres_encoded.index)
    idx = titles.index(title)  # raises ValueError for an unknown title
    # Pair every other book's position with its similarity to the query book
    scores = [(i, s) for i, s in enumerate(cosine_sim[idx]) if i != idx]
    # Highest similarity first; ties keep their original dataset order
    scores.sort(key=lambda x: x[1], reverse=True)
    return [titles[i] for i, _ in scores[:n]]
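To make the similarity measure itself concrete, here is a minimal NumPy sketch of the cosine similarity formula, cos(u, v) = (u · v) / (|u| |v|). The helper name `cosine` and the three-genre vectors are assumptions made for this example. In the notebook itself the computation is done by scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical genre vectors over the columns [Fiction, Fantasy, Thriller]
fiction = [1, 0, 0]
also_fiction = [1, 0, 0]
fantasy = [0, 1, 0]
fiction_and_fantasy = [1, 1, 0]

print(cosine(fiction, also_fiction))         # same genre
print(cosine(fiction, fantasy))              # no genre in common
print(cosine(fiction, fiction_and_fantasy))  # one of two genres in common
```

With single-genre one-hot vectors the score can only be 1 (same genre) or 0 (different genre). Intermediate values appear once a book can carry several genres.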

Now we can use the recommend_books function to recommend books similar to one the user already likes. For example, if a user likes The Great Gatsby, we can ask for the 3 books whose genres are most similar. The dataset contains exactly three other Fiction titles, so these are the closest matches:

In [4]:
recommend_books("The Great Gatsby", book_genres_encoded, n=3)

['To Kill a Mockingbird', 'Pride and Prejudice', 'The Catcher in the Rye']