# A Simple Book Recommendation System

###### Done by __Safae HAJJOUT (@Ariyes)__ and __Mounia BADDOU (@MTheCreator)__ for Algorithmics class of 2022 / 2023.
###### Mohammed VI Polytechnic University, School of Computer Science, 1st year.


### ______________________________

##### This is a walkthrough of our simple recommendation system implemented using Python 3.10 and primitive data structures such as Linked lists, Arrays and some Python structures (Lists, Dictionaries, ... ).

##### The goal of this recommender system is to try and give the most acurate recommendations based on multiple criterias: titles, ratings, genres, description and others. We tried to make use also of some libraries to ease the recommedation process that is related to natural language (NLP).

#### Now we will go through the purpose of every chunk of code. Good reading :) .

### _____________________________

#### __First import what is needed !__

In [1]:
# Here can be downloaded every needed library for our code!
#! pip install --upgrade click
#! pip intall nltk
#! pip install spacy
#! python -m spacy download en_core_web_sm
#! python -m spacy download en
#! pip install json

In [2]:
from usefulclasses import Node, LinkedList, Queue, heapsort
import json
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
nltk.download("punkt")
nltk.download('stopwords')

from errors import *
import spacy
nlp = spacy.load("en_core_web_sm")

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


#### We will be using the __Porter stemming algorithm__ to facilitate recommendation

In [3]:
stopwords_set = set(stopwords.words('english'))
ps = PorterStemmer()

#### _______________________________

#### We can now load our book and user related data

In [4]:
with open('books.json') as file:
    books = json.load(file)
file.close()
    
with open('users.json') as file:
    users = json.load(file)
file.close()

#### _________________________________

## __Book Class__

##### One of our main classes, contains every valuable information on every book in our dataset in the shape of attributes (author, desc, title, ... ).
##### This class also contains two methods: __str__ and __similar_books__. One serves as a display function, and the other one as a recommendation method based on ranking given by our users.
##### More informations in our docstrings!

In [5]:
class Book:
    def __init__(self, author: str, desc: str, genre: list, isbn: str, pages: int, rating: float, title: str, totalratings: int):
        """ 
        Initializes a Book object with the provided details.

            Args:
                author (str): The author of the book.
                desc (str): A short description of the content of the book.
                genre (list): A list of genres that the book belongs to.
                isbn (str): The ID of the book.
                pages (int): The number of pages in the book.
                rating (float): The overall rating of the book on a scale of 0 to 5.
                title (str): The title of the book.
                totalratings (int): The total number of ratings given to the book.

            Returns:
                None
        """
        verify_book(author= author, desc= desc, genre= genre, isbn= isbn, rating= rating, total_ratings= totalratings, title= title)
        self.author = author                    
        self.desc = desc                        
        self.genre = genre                      
        self.isbn = isbn                        
        self.pages = pages                      
        self.rating = rating
        self.title = title 
        self.totalratings = totalratings      
        self.rate = self.rating * self.totalratings 
        
    def __str__(self):
        """ A display method where the __str__ function was overridden to meet the needs of our recommendation system. """
        return self.title + " by " + self.author

##### In this following chunk of code, we tried to sort our database of Book objects in a Python list using a heapsort to ease our recommendation based on ratings. 
##### The return will be the index, which will represent the rank of the book.

In [6]:
def create_books_list():
    """
    Creates a list of Book objects from the provided books data.

    Returns:
        list: A list of Book objects sorted in non-increasing order based on their ratings and total ratings.

    """
    books_list = []
    for book in books:
        author, desc, genre, isbn, pages, rating, title, totalratings = list(book.values())
        obj = Book(author, desc, list({g.lower() for g in genre.split(',')}), isbn, pages, rating, title, totalratings)
        books_list.append(obj)

    heapsort(books_list)
    return books_list

In [7]:
books_list = create_books_list()
size = len(books_list)

##### In this chunk, we try to tokenize our database in order to ease the recommendation using Spacy and NLTK.
##### This operation will help reduce the size of the words we can use to recommend. (Stemming and Lemmatization).  

In [8]:
def create_books_dict(books_list):
    """
    Creates a dictionary of books using their ISBN as keys and a set of tokenized words from the title, description, and genre as values.

    Args:
        books_list (list): A list of Book objects.

    Returns:
        dict: A dictionary where the keys are the ISBNs of the books and the values are sets of tokenized words from the title, description, and genre.

    """
    verify_word = lambda word: word.lower() not in stopwords_set and word.isalnum()

    tokenized_title = lambda book: {word.lower() for word in book.title.split(' ') if verify_word(word)}
    tokenized_desc = lambda book: {word.lower() for word in book.desc.split(' ') if verify_word(word)}

    return { book.isbn : tokenized_title(book) | tokenized_desc(book) | set(book.genre) for book in books_list }
    

In [9]:
books_dict = create_books_dict(books_list)

#### _______________________________

## __User Class__

##### Our second Class for this code, where we store every information related to the user, such as the id, name, library, wishlist and other.

##### Almost every recommendation will be done through the User class since it is easier and recommends to each and every user.

##### For the methods of this class, they will be explained in the docstrings that accompany the code.

In [10]:
# This will aid for the read_book and rate_book methods in the user class
def change_value(queue: Queue, old_value, new_value):
    """
    Modifies a given queue by replacing occurrences of an old value with a new value.

    Args:
        queue (Queue): The queue to be modified.
        old_value: The value to be replaced.
        new_value: The new value to replace the old value with.

    Returns:
        None: This function modifies the queue in-place.

    """
    modified = False
    temp_queue = Queue()

    # Dequeue elements until the desired value is found or the queue becomes empty
    while len(queue) > 0:
        element = queue.dequeue()
        if element == old_value:
            element = new_value
            modified = True
        temp_queue.enqueue(element)
    # Enqueue the modified value back into the queue
    if modified:
        queue.enqueue(new_value)
    # Enqueue the remaining dequeued elements back into the queue
    while len(temp_queue) > 0:
        queue.enqueue(temp_queue.dequeue())

In [11]:
class User:
    def __init__(self, id: int, name: str, fav_genres: LinkedList):
        """
        Initializes a User object with the provided attributes.

        Args:
            id (int): The ID of the user.
            name (str): The name of the user.
            fav_genres (LinkedList): A linked list containing the user's favorite genres.
            library (Queue): A queue representing the user's library of books.
            wishlist (Queue): A queue representing the user's wishlist of books.

        """
        verify_user(user_id= id, name= name, genres= fav_genres)
        self.id = id
        self.name = name
        self.fav_genres = fav_genres
        self.library = Queue()
        self.wishlist = Queue()
        self.community = LinkedList()

    def read_book(self, book: Book, rate: int= 0):
        """
        Rates a book in the user's library.

        Args:
            book (Book): The book to be rated.
            rate (int): The rating to assign to the book.

        Returns:
            None

        """
        print(f"the book {str(book)} was added successfully to your library ... ")
        self.library.enqueue((book, rate))
        print(f"You have rated the book {str(book)}: {rate} ... ")

    def add_to_wishlist(self, book: Book):
        """
        Adds a book to the user's wishlist.

        Args:
            book (Book): The book to be added to the wishlist.

        Returns:
            None

        """
        self.wishlist.enqueue(book)

    def search_author(self, author: str) -> LinkedList:
        """
        Searches for books by a specific author.

        Args:
            author (str): The author's name to search for.

        Returns:
            LinkedList: A linked list of books written by the specified author.

        """
        books = LinkedList()
        for book in books_list:
            if author in book.author:
                books.prepend(book)
        return books

    def search_book(self, title: str) -> LinkedList:
        """
        Searches for books with a specific title.

        Args:
            title (str): The title of the book to search for.

        Returns:
            LinkedList: A linked list of books with the specified title.

        """
        books = LinkedList()
        for book in books_list:
            if title in book.title:
                books.prepend(book)
        return books

    def search_book_isbn(self, isbn: str) -> LinkedList:
        """
    Searches for a book with a specific ISBN.

    Args:
        isbn (str): The ISBN of the book to search for.

    Returns:
        Book Object: The book with the specified ISBN.

    """
        for book in books_list:
            if isbn in book.isbn:
                return book

    def search_genre(self, genre: str) -> LinkedList:
        """
    Searches for books within a specific genre.

    Args:
        genre (str): The genre to search for.

    Returns:
        LinkedList: A linked list of books belonging to the specified genre.

    """
        books = LinkedList()
        for book in books_list:
            if genre.lower() in book.genre:
                # prepend to get in decreasing order with low cost of operation (add at head)
                books.prepend(book)
        return books

    def search_keywords(self, keywords: str, lower = 0, higher = size) -> LinkedList:
        """
    Searches for books based on one or more keywords .

    Args:
        keywords (str): Variable number of keyword arguments separated by commas (,).
        lower (int): lower bound of search in the index, the higher it is the higher the book's rank
        higher (int): higher bound of search in the index

    Returns:
        LinkedList: The books corresponding to the keywords

    """
        keyterms = [ps.stem(term) for term in keywords.split(',')]
        books = LinkedList()
        for i in range(lower, higher):
            book = books_list[i]
            tokens = [ps.stem(word) for word in books_dict[book.isbn]]
            score = sum([1 if lemma in keyterms else 0 for lemma in tokens])
            # book must have at least 6 keyterms in common
            if score >= 6:
                books.prepend(book.title)
        
        return books

    def _find_max_count(self, data: list) -> str:
        """
    Helper function to find the element with the maximum count in a list.

    Args:
        data (list): A list of elements.

    Returns:
        str: The element with the maximum count.

    """
        # using count dictionary approach
        counts = dict()
        for elem in data:
            # set count to 0 if checked for the first time else add 1
            counts[elem] = counts.get(elem, 0) + 1

        max_count = 0
        for (elem, count) in counts.items():
            (max_name, max_count) = (
                elem, count) if max_count < count else (max_name, max_count)

        return max_name

    def most_checked_genre(self) -> str:
        """
    Returns the genre that has been checked out the most in the user's library.

    Returns:
        str: The genre with the highest check count from the user.

    """
        genres = [genre for node in self.library for genre in node.data[0].genre]
        if genres != []:
            return self._find_max_count(genres)

    def most_checked_author(self) -> str:
        """
    Returns the author who has been checked out the most in the user's library.

    Returns:
        str: The author with the highest check-out count.

    """
        # same approach as genre
        authors = [
            author for book in self.library for author in book.author.split(',')]
        if authors != []:
            return self._find_max_count(authors)

    def recommend_books_genre(self) -> dict:
        """
    Recommends books based on the user's favorite genres.

    Returns:
        dict: A dictionary where the keys are the user's favorite genres and the values are linked lists of recommended books.

    """
        books = dict()
        for genre in self.fav_genres:
            count = 3
            books[genre.data] = LinkedList()
            # traverse books_list (heap) in reverse order to get values in decreasing order
            for i in range(size - 1, -1, -1):
                if genre.data.lower() in books_list[i].genre and count != 0:
                    books[genre.data].append(books_list[i])
                    count -= 1
        return books
    
    def recommend_books(self):
        recommended = LinkedList()
        
        liked_books = [node.data[0] for node in self.library if node.data[1] >= 4]
        for book in liked_books:
            idx = books_list.index(book)
            higher, lower = min(idx + 20, size), max(idx - 20, 0)
            string = ','.join(list(books_dict[book.isbn]))
            recs = self.search_keywords(string, lower, higher)
            recommended.prepend(recs)
        return recommended
    
    def add_community(self, users: list):
        for user in users:
            if user.most_checked_genre() in self.fav_genres:
                self.community.add(user)
        print(f"{user.name} with id: {user.id} was added successfully to your Reader Community !")

    def recommend_other_users(self):
        experiment = LinkedList()
        for friend in self.community:
            friend_books = [node.data[0] for node in friend.library if node.data[0] not in self.library and node.data[1] >= 4]
            experiment.prepend(friend_books)
        
        return experiment 

In [12]:
def create_users_list():
    """
    Creates a list of User objects from the provided users data.

    Returns:
        list: A list of User objects.

    """
    def _generate_linked_list(py_l, linked_l):
        # helper function to generate linked list from python list
        for elem in py_l:
            linked_l.prepend(elem)
        return linked_l
    
    users_list = []
    for user in users:
        user_id, name, genres = list(user.values())
        obj = User(user_id, name, _generate_linked_list(genres, LinkedList()))
        users_list.append(obj)

    return users_list

users = create_users_list()

# <b>Project Simulation:</b>

In [13]:
sample_users = users[:21]
sample_names = [user.name for user in sample_users]
sample_genres = [str(user.fav_genres) for user in sample_users]
print(sample_names)
print(sample_genres)

['Octavia Schwartz', 'Edwin Preston', 'Indie Greer', 'Koda Lowe', 'Amari Felix', 'Rodney Whitaker', 'Ivanna Hebert', 'Guillermo Blanchard', 'Layne Heath', 'Lionel Pope', 'Aurelia Wilkinson', 'Leonard Schroeder', 'Cameron Ellis', 'Cole Coffey', 'Paola Patel', 'Parker Cruz', 'Claire Pruitt', 'Gatlin Lozano', 'Cecelia Wilkinson', 'Leonard Bravo', 'Amoura Moran']
['Victorian, Humor, ', 'Gender Studies, Business, ', 'Chad, Bolivia, ', 'Morocco, Eritrea, ', 'Beer, Angels, ', 'Basketball, Mail Order Brides, ', 'Gay, Mental Illness, ', 'Greece, Conservation, ', 'Polish Literature, Erotica, ', 'Spain, College, ', 'Comedian, 14th Century, ', 'Muslims, Germany, ', 'Greece, Cities, ', 'Russian Literature, Football, ', 'Nursery Rhymes, Modern Classics, ', 'Latin American History, Quilting, ', 'Thelema, Liberia, ', 'Scotland, Natural History, ', 'Time Travel, Zimbabwe, ', 'Personal Development, Anglo Saxon, ', 'Crime, Photography, ']


In [14]:
dict_users = {user.id: None for user in sample_users}
for user in sample_users:
    packed_books = user.recommend_books_genre()
    books = [[book.data for book in books] for books in list(packed_books.values())]
    dict_users[user.id] = books



#### We will try to demonstrate some of our key methods for one user, then we will generalize for the whole __sample_users__ list

In [15]:
user0 = sample_users[0]
user0.wishlist = Queue()
user0.library = Queue()

user0.add_to_wishlist(dict_users[user0.id][0])


In [16]:
for book in dict_users[user0.id][0]:
    user0.read_book(book, random.randint(0,5)



the book Devil in Winter by Lisa Kleypas was added successfully to your library ... 
You have rated the book Devil in Winter by Lisa Kleypas: 2.5 ... 
the book Secrets of a Summer Night by Lisa Kleypas was added successfully to your library ... 
You have rated the book Secrets of a Summer Night by Lisa Kleypas: 3.5 ... 
the book Scandal in Spring by Lisa Kleypas was added successfully to your library ... 
You have rated the book Scandal in Spring by Lisa Kleypas: 4.5 ... 


#### Now for the __sample_users__

In [17]:
import random
for user in sample_users[1:]:
        
    user.add_to_wishlist(dict_users[user.id][0])

    for book in dict_users[user.id][0]:
        user.read_book(book, random.randint(0, 5))


the book The Vagina Monologues by Eve Ensler,Gloria Steinem was added successfully to your library ... 
You have rated the book The Vagina Monologues by Eve Ensler,Gloria Steinem: 0 ... 
the book In a Different Voice: Psychological Theory and Women's Development by Carol Gilligan was added successfully to your library ... 
You have rated the book In a Different Voice: Psychological Theory and Women's Development by Carol Gilligan: 5 ... 
the book Against Our Will: Men, Women and Rape by Susan Brownmiller was added successfully to your library ... 
You have rated the book Against Our Will: Men, Women and Rape by Susan Brownmiller: 5 ... 
the book African Rice Heart by Emily Star Wilkens was added successfully to your library ... 
You have rated the book African Rice Heart by Emily Star Wilkens: 2 ... 
the book The Time in Between by MarÃ­a DueÃ±as,Daniel Hahn was added successfully to your library ... 
You have rated the book The Time in Between by MarÃ­a DueÃ±as,Daniel Hahn: 2 ... 
the

In [18]:
number = random.randint(0, 21)

random_user = sample_users[number]

recs = random_user.recommend_books()

In [19]:
print(recs)

True History of the Kelly Gang, Carrion Comfort, Mercy, The Mortal Instruments, The Flanders Panel, Parce que je t'aime, The Little Coffee Shop of Kabul, Blue Monday, The Jefferson Key, N or M?, , 


In [20]:
random_user.add_community(sample_users)

Amoura Moran with id: 20 was added successfully to your Reader Community !


In [21]:
print(random_user.recommend_other_users())


