## Mini Project 1

## Recommendation System

## 120 Points.

### Implementation

<font color="blue">**Task 1: Reading Data**</font>

1. <font color="red">[10 pts]</font> Write a function <font color="brown">read_ratings_data(f)</font> that takes in a ratings file name, and returns a dictionary. (Note: the parameter is a file name string such as "ratings.csv", NOT a file pointer.) The dictionary should have ISBN as key, and the list of all ratings for it as value.
For example:  book_ratings_dict = { '034545104X': [9, 8, 7], '0486282406': [10, 9, 8] }

In [5]:
import csv

def read_ratings_data(f):
    '''
    IN: f (str) - filename
    OUT: book_ratings_dict (dict{str: list[int]}) - dictionary of ratings
    '''

    # Set up dictionary to store ratings
    book_ratings_dict = {}

    # Set up csv reader
    with open(f, 'r') as csvfile:
        reader = csv.DictReader(csvfile)

        # Read in data
        for row in reader:
            # Add rating to dictionary
            isbn = row['ISBN']
            rating = int(row['Book-Rating'])
            if isbn in book_ratings_dict:
                book_ratings_dict[isbn].append(rating)
            else:
                book_ratings_dict[isbn] = [rating]

    # Return dictionary
    return book_ratings_dict
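
One way to sanity-check the reader is to run it against a tiny CSV written to a temporary file. The sketch below uses a compact variant built on `collections.defaultdict`. The function name `read_ratings_data_sketch`, the `User-ID` column, and the sample rows are assumptions made for illustration; `ISBN` and `Book-Rating` are the headers the function above reads.

```python
import csv
import os
import tempfile
from collections import defaultdict


def read_ratings_data_sketch(f):
    """Compact variant: defaultdict removes the explicit key check."""
    ratings = defaultdict(list)
    with open(f, 'r', newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            ratings[row['ISBN']].append(int(row['Book-Rating']))
    return dict(ratings)


# Hypothetical sample data; the real ratings file may have other columns.
sample = (
    "User-ID,ISBN,Book-Rating\n"
    "1,034545104X,9\n"
    "2,034545104X,8\n"
    "3,0486282406,10\n"
)

# Write the sample to a temporary file, since the function expects a file name.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)
    path = tmp.name

sample_ratings = read_ratings_data_sketch(path)
os.remove(path)
print(sample_ratings)
```

Passing a file name rather than an open file keeps the call signature the same as the one the task asks for.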

2. <font color="red">[10 pts]</font> Write a function <font color="brown">read_book_author(f)</font> that takes in a books.csv file name and returns a dictionary. The dictionary should have a one-to-one mapping from ISBN to author.
For example:   book_author_dict = { '0195153448': 'Mark P. O. Morford', '0373037430': 'Rebecca Winters' }

Note: Some books may have multiple authors. In this case, you can take the entire string as a macro author.

In [12]:
import csv

def read_book_author(f):
    '''
    IN: f (str) - filename
    OUT: book_author_dict (dict{str: str}) - dictionary of book authors
    '''

    # Set up dictionary to store authors
    book_author_dict = {}

    # Set up csv reader
    with open(f, 'r') as csvfile:
        reader = csv.DictReader(csvfile)

        # Read in data
        for row in reader:
            # Add author to dictionary
            isbn = row['ISBN']
            author = row['Book-Author']
            book_author_dict[isbn] = author

    # Return dictionary
    return book_author_dict

<font color="blue">**Task 2: Processing Data**</font>

1. <font color="red">[8 pts]</font> Author dictionary

    Write a function <font color="brown">create_author_dict</font> that takes as a parameter a book dictionary, of the kind created in Task 1.2. The function should return another dictionary in which an author is mapped to all the books by that author.

    For example:   { 'Author 1': ['034545104X', '0385333498'], 'Author 2': ['0142000663'] }

In [10]:
def create_author_dict(book_author_dict):
    '''
    IN: book_author_dict (dict{str: str}) - dictionary of book authors
    OUT: author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
    '''

    # Set up dictionary to store authors and their books
    author_to_books_dict = {}

    # Populate dictionary
    for ISBN, author in book_author_dict.items():
        if author in author_to_books_dict:
            author_to_books_dict[author].append(ISBN)
        else:
            author_to_books_dict[author] = [ISBN]

    # Return dictionary
    return author_to_books_dict
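
The same inversion can be written with `dict.setdefault`, which folds the membership test and the append into one line. This is a minimal sketch; the name `create_author_dict_sketch` and the sample data are illustrative, not part of the assignment.

```python
def create_author_dict_sketch(book_author_dict):
    """Invert an ISBN-to-author mapping into author-to-list-of-ISBNs."""
    author_to_books = {}
    for isbn, author in book_author_dict.items():
        # setdefault creates the empty list the first time an author appears
        author_to_books.setdefault(author, []).append(isbn)
    return author_to_books


# Hypothetical sample input in the shape produced by Task 1.2
sample_books = {
    '034545104X': 'Author 1',
    '0385333498': 'Author 1',
    '0142000663': 'Author 2',
}
sample_authors = create_author_dict_sketch(sample_books)
print(sample_authors)
```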

2. <font color="red">[8 pts]</font> Average Rating
    Write a function <font color="brown">calculate_average_rating</font> that takes as a parameter a ratings dictionary, of the kind created in Task 1.1. It should return a dictionary where the book ISBN is mapped to its average rating computed from the ratings list.

    For example:   {'034545104X': 4.0, '0375803482': 7.0 }

In [11]:
def calculate_average_rating(book_ratings_dict):
    '''
    IN: book_ratings_dict (dict{str: list[int]}) - dictionary of ratings
    OUT: book_to_average_dict (dict{str: float}) - dictionary of average ratings
    '''

    # Set up dictionary to store average ratings
    book_to_average_dict = {}

    # Calculate average rating for each book
    for ISBN, ratings in book_ratings_dict.items():
        average_rating = sum(ratings) / len(ratings)
        book_to_average_dict[ISBN] = average_rating

    # Return dictionary
    return book_to_average_dict
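
A dictionary comprehension expresses the same computation in one statement. The sketch below also skips books with an empty ratings list, which would otherwise raise `ZeroDivisionError`. That guard is an assumption about how such entries should be handled: a dictionary built by Task 1.1 never contains an empty list. The name `calculate_average_rating_sketch` and the `'no-ratings'` key are hypothetical.

```python
def calculate_average_rating_sketch(book_ratings_dict):
    """Map each ISBN to the mean of its ratings, skipping empty lists."""
    return {
        isbn: sum(ratings) / len(ratings)
        for isbn, ratings in book_ratings_dict.items()
        if ratings  # an empty list has no average
    }


sample_ratings = {
    '034545104X': [3, 4, 5],
    '0375803482': [7],
    'no-ratings': [],  # hypothetical entry, only to exercise the guard
}
sample_averages = calculate_average_rating_sketch(sample_ratings)
print(sample_averages)
```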

<font color="blue">**Task 3: Recommendation**</font>

1. <font color="red">[10 pts]</font> Popularity based

    In services such as Kindle and Goodnotes, you often see recommendations with the heading “Popular Books” or “Trending top 10”.

    Write a function <font color="brown">get_popular_books</font> that takes as parameters a dictionary of book-to-average rating (as created in Task 2.2) and an integer n (default 10). The function should return a dictionary (book: average, same structure as the input dictionary) of the top n books based on average rating. If there are fewer than n books, it should return all books, ranked by average rating from highest to lowest.



In [14]:
def get_popular_books(book_to_average_dict, n=10):
    '''
    IN: book_to_average_dict (dict{str: float}) - dictionary of average ratings
        n (int) - number of books to return
    OUT: popular_books_dict (dict{str: float}) - dictionary of top n books
    '''

    # Sort books by average rating, highest first
    book_to_average_sorted = sorted(book_to_average_dict, key=book_to_average_dict.get, reverse=True)

    # Keep the top n books (or all of them if there are fewer than n)
    top_n = min(len(book_to_average_sorted), n)

    res = {}
    for i in range(top_n):
        book = book_to_average_sorted[i]
        res[book] = book_to_average_dict[book]

    return res

get_popular_books({"Harry Potter": 3.4, "Percy Jackon": 5, "DaVinci Code": 1}, n=10)

{'Percy Jackon': 5, 'Harry Potter': 3.4, 'DaVinci Code': 1}
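
When the dictionary is large and n is small, `heapq.nlargest` from the standard library can select the top n entries without sorting everything. The sketch below is an alternative formulation, not the required solution, and the name `get_popular_books_sketch` is illustrative.

```python
import heapq


def get_popular_books_sketch(book_to_average_dict, n=10):
    """Return the n highest-rated books as a dict ordered from highest to lowest."""
    top = heapq.nlargest(n, book_to_average_dict.items(), key=lambda item: item[1])
    # dict() preserves insertion order, so the ranking survives the conversion
    return dict(top)


top_two = get_popular_books_sketch(
    {"Harry Potter": 3.4, "Percy Jackon": 5, "DaVinci Code": 1}, n=2
)
print(top_two)
```

Because dictionaries keep insertion order in Python 3.7 and later, the returned dictionary can be read from top to bottom as a ranking.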

2. <font color="red">[10 pts]</font> Threshold Rating

    Write a function <font color="brown">filter_books</font> that takes as parameters a dictionary of book-to-average rating (same as for the popularity-based function above) and a threshold rating with a default value of 3. The function should filter books based on the threshold rating and return a dictionary with the same structure as the input.
    For example, if the threshold rating is 3.5, the returned dictionary should have only those books from the input whose average rating is equal to or greater than 3.5.

In [25]:
def filter_books(book_to_average_dict, threshold=3.0):
    '''
    IN: book_to_average_dict (dict{str: float}) - dictionary of average ratings
        threshold (float) - minimum rating to keep
    OUT: filtered_books_dict (dict{str: float}) - dictionary of books above threshold
    '''

    # Filter books above threshold
    filtered_books_dict = {}

    for book in book_to_average_dict:
        if book_to_average_dict[book] >= threshold:
            filtered_books_dict[book] = book_to_average_dict[book] 

    # Return filtered books
    return filtered_books_dict

filter_books({"Harry Potter": 3.4, "Percy Jackon": 5, "DaVinci Code": 1}, threshold=0)

{'Harry Potter': 3.4, 'Percy Jackon': 5, 'DaVinci Code': 1}
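
A threshold of 0 keeps every book, so it does not exercise the boundary. The sketch below uses a dictionary comprehension and a threshold equal to one of the averages, to show that the comparison is inclusive. The name `filter_books_sketch` is illustrative.

```python
def filter_books_sketch(book_to_average_dict, threshold=3.0):
    """Keep only the books whose average rating is at or above the threshold."""
    return {
        book: average
        for book, average in book_to_average_dict.items()
        if average >= threshold
    }


kept = filter_books_sketch(
    {"Harry Potter": 3.4, "Percy Jackon": 5, "DaVinci Code": 1}, threshold=3.4
)
print(kept)
```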

3. <font color="red">[10 pts]</font> Popularity + Author Based

    In most recommendation systems, the creator of the movie, song, or book plays an important role. Features such as popularity and author (creator) are often combined to present recommendations to a user.

    Write a function <font color="brown">get_popular_by_author</font> that, given an author, an author-to-books dictionary (as created in Task 2.1), a dictionary of book-to-average rating (as created in Task 2.2), and an integer n (default 5), returns the top n most popular books by that author based on average rating. The return value should be a dictionary of book-to-average rating for the books that make the cut. If there are fewer than n books, it should return all books, ranked by average rating from highest to lowest.

    Note: some books in the `author_to_books_dict` dictionary may not appear in the `book_to_average_dict` dictionary. You should ignore such books.

In [13]:
def get_popular_by_author(author, author_to_books_dict, book_to_average_dict, n=5):
    '''
    IN: author (str) - author name
        author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
        n (int) - number of books to return
    OUT: popular_books_by_author_dict (dict{str: float}) - dictionary of top n books by author
    '''

    # Get books by author
    
    if author not in author_to_books_dict:
        return None
    books = author_to_books_dict[author]

    # Sort books by average rating
    books_rate = {}

    for b in books:
        # Skip (rather than stop at) books that have no average rating
        if b not in book_to_average_dict:
            continue
        books_rate[b] = book_to_average_dict[b]
    
    books_sorted = sorted(books_rate, key=books_rate.get, reverse=True)

    # Return top n books

    top_n = min(len(books_sorted), n)

    res = {}
        
    for i in range(top_n):
        res[books_sorted[i]] =  books_rate[books_sorted[i]]
    
    return res
    
get_popular_by_author("J.K.", {"J.K.": ["Harry Potter", "HP 2", "HP 3"], "Rick": ["Percy Jackson"]}, {"Harry Potter": 3.4, "HP 2": 3.1, "Percy Jackon": 5, "DaVinci Code": 1}, 2)

{'Harry Potter': 3.4, 'HP 2': 3.1}
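
The note about unrated books matters most when such a book is not the last one in the author's list. The sketch below places the unrated title first and checks that the rated titles after it are still ranked. It is a compact, self-contained variant: the name `get_popular_by_author_sketch` is illustrative, and returning an empty dictionary for an unknown author is an assumption of this sketch.

```python
def get_popular_by_author_sketch(author, author_to_books_dict, book_to_average_dict, n=5):
    """Rank an author's rated books by average rating and keep the top n."""
    books = author_to_books_dict.get(author, [])
    # Ignore books that have no entry in the averages dictionary
    rated = {b: book_to_average_dict[b] for b in books if b in book_to_average_dict}
    ranked = sorted(rated.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


authors = {"J.K.": ["HP 3", "Harry Potter", "HP 2"]}  # "HP 3" has no average
averages = {"Harry Potter": 3.4, "HP 2": 3.1}
by_author = get_popular_by_author_sketch("J.K.", authors, averages, 2)
print(by_author)
```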

4. <font color="red">[10 pts]</font> Author Rating

    One important analysis for content platforms is to determine ratings by author.

    Write a function <font color="brown">get_author_rating</font> that takes the same parameters as <font color="brown">get_popular_by_author</font> above, except for n, and returns the average rating of the books by the given author.

In [2]:
def get_author_rating(author, author_to_books_dict, book_to_average_dict):
    '''
    IN: author (str) - author name
        author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
    OUT: author_rating (float) - average rating for author's books
    '''

    # Get books by author
    if author not in author_to_books_dict:
        return None
    books = author_to_books_dict[author]

    # Sum the average ratings of the author's books that have a rating
    count = 0
    total = 0
    for book in books:
        if book in book_to_average_dict:
            total += book_to_average_dict[book]
            count += 1

    # Calculate average rating (0 if none of the author's books are rated)
    average = 0
    if count > 0:
        average = total / count

    # Return average rating

    return average


get_author_rating("J.K.", {"J.K.": ["Harry Potter", "HP 2", "HP 3"], "Rick": ["Percy Jackson"]}, {"Harry Potter": 3.4, "HP 2": 3.4, "HP 3": 5, "Percy Jackon": 5, "DaVinci Code": 1})

3.9333333333333336

5. <font color="red">[10 pts]</font> Author Popularity

    Write a function <font color="brown">author_popularity</font> that takes as parameters an author-to-books dictionary (as created in Task 2.1), a book-to-average rating dictionary (as created in Task 2.2), and n (default 5), and returns the top n rated authors as a dictionary of author-to-average rating. If there are fewer than n authors, it should return all authors, ranked by average rating from highest to lowest.
    Hint: Use the get_author_rating function above as a helper.

In [8]:
def author_popularity(author_to_books_dict, book_to_average_dict, n=5):
    '''
    IN: author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
        n (int) - number of authors to return
    OUT: popular_authors_dict (dict{str: float}) - dictionary of top n authors
    '''
    auth_rat = {}
    # Calculate average rating for each author
    for author in author_to_books_dict:
        auth_rat[author] = get_author_rating(author, author_to_books_dict, book_to_average_dict)

    # Sort authors by average rating, highest first
    auth_sorted = sorted(auth_rat, key=auth_rat.get, reverse=True)

    # Keep the top n authors (or all of them if there are fewer than n)
    top_n = min(len(auth_sorted), n)

    res = {}
    for i in range(top_n):
        res[auth_sorted[i]] = auth_rat[auth_sorted[i]]

    return res

author_popularity({"J.K.": ["Harry Potter", "HP 2", "HP 3"], "Rick": ["Percy Jackson"], "Tom Brady": ["Football", "occe", "Football 2"]}, {"Harry Potter": 3.4, "HP 2": 3.4, "HP 3": 5, "Percy Jackon": 5, "DaVinci Code": 1}, 2)

{'J.K.': 3.9333333333333336, 'Rick': 0}

<font color="blue">**Task 4: User Focused**</font>

1. <font color="red">[10 pts]</font> Read the ratings file to return a user-to-books dictionary that maps user ID to a list of the books they rated, along with the rating they gave. Write a function named <font color="brown">read_user_ratings</font> for this, with the ratings file as the parameter.
For example: { 1: [('034545104X', 5), ('0385333498', 4)], 2: [('0142000663', 3)] } 

In [1]:
import csv

def read_user_ratings(f):
    '''
    IN: f (str) - filename
    OUT: user_to_book_ratings_dict (dict{int: list[tuple(str, int)]}) - dictionary of user ratings
    '''
    user_to_book_ratings_dict = {}
    with open(f, 'r') as file:
        # csv.reader copes with quoted fields, unlike a plain split(',').
        # Columns are assumed to be ordered: user ID, ISBN, rating.
        reader = csv.reader(file)
        next(reader)  # Skip the header row
        for user_id, book_id, rating in reader:
            user_id = int(user_id)
            rating = int(rating)

            if user_id in user_to_book_ratings_dict:
                user_to_book_ratings_dict[user_id].append((book_id, rating))
            else:
                user_to_book_ratings_dict[user_id] = [(book_id, rating)]

    return user_to_book_ratings_dict
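
The expected structure can be checked against a three-row file. The sketch below is a compact variant using `dict.setdefault`. The header names and rows are assumptions chosen to reproduce the example in the task statement, the column order (user ID, ISBN, rating) is assumed, and the name `read_user_ratings_sketch` is illustrative.

```python
import csv
import os
import tempfile


def read_user_ratings_sketch(f):
    """Map each integer user ID to a list of (ISBN, rating) tuples."""
    user_ratings = {}
    with open(f, 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        for user_id, isbn, rating in reader:
            user_ratings.setdefault(int(user_id), []).append((isbn, int(rating)))
    return user_ratings


# Hypothetical sample file mirroring the example in the task statement
sample = (
    "User-ID,ISBN,Book-Rating\n"
    "1,034545104X,5\n"
    "1,0385333498,4\n"
    "2,0142000663,3\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)
    path = tmp.name

sample_user_ratings = read_user_ratings_sketch(path)
os.remove(path)
print(sample_user_ratings)
```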

2. <font color="red">[10 pts]</font> Write a function <font color="brown">get_user_top_author</font> that takes as parameters a user ID, the user-to-books dictionary (as created in Task 4.1 above), and the book information dictionary (as created in Task 1.2), and returns the top author that the user likes based on the user's ratings. Here, the top author for the user is the author whose books the user has rated highest on average. If multiple authors share the highest average rating for the user, return any one of them (arbitrarily) as the top author.

Notes: 
- Some books in the `user_to_book_ratings_dict` dictionary may not appear in the `book_author_dict` dictionary. You should ignore such books. 
- If none of the books rated by the user are present in the `book_author_dict` dictionary, return `None`.

In [2]:
def get_user_top_author(user_ID, user_to_book_ratings_dict, book_author_dict):
    '''
    IN: user_ID (int) - user ID
        user_to_book_ratings_dict (dict{int: list[tuple(str, int)]}) - dictionary of user ratings
        book_author_dict (dict{str: str}) - dictionary of book authors
    OUT: top_author (str) - author with highest average ratings by user
    '''

    if user_ID not in user_to_book_ratings_dict:
        return None

    # Group the user's ratings by author, ignoring books with no known author
    author_ratings = {}

    books_rated_by_user = user_to_book_ratings_dict[user_ID]

    for book_id, rating in books_rated_by_user:
        if book_id in book_author_dict:
            author = book_author_dict[book_id]

            if author in author_ratings:
                author_ratings[author].append(rating)
            else:
                author_ratings[author] = [rating]

    if not author_ratings:
        return None

    # Start from None rather than 0 so that an author is still returned
    # when every rating the user gave is 0
    top_author = None
    highest_avg_rating = None

    for author, ratings in author_ratings.items():
        avg_rating = sum(ratings) / len(ratings)

        if highest_avg_rating is None or avg_rating > highest_avg_rating:
            highest_avg_rating = avg_rating
            top_author = author

    return top_author
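
Three cases are worth exercising here: a user with books by several authors, a user whose only ratings are 0, and a user ID that is not in the dictionary. The sketch below is a compact variant that picks the winner with `max`. The name `get_user_top_author_sketch` and all sample data are illustrative assumptions.

```python
from collections import defaultdict


def get_user_top_author_sketch(user_id, user_to_book_ratings_dict, book_author_dict):
    """Return the author with the highest mean rating from this user, or None."""
    ratings_by_author = defaultdict(list)
    for isbn, rating in user_to_book_ratings_dict.get(user_id, []):
        if isbn in book_author_dict:  # ignore books with no known author
            ratings_by_author[book_author_dict[isbn]].append(rating)
    if not ratings_by_author:
        return None
    # max() compares the averages directly, so all-zero ratings still yield an author
    return max(
        ratings_by_author,
        key=lambda a: sum(ratings_by_author[a]) / len(ratings_by_author[a]),
    )


book_authors = {'A': 'Author 1', 'B': 'Author 1', 'C': 'Author 2'}
user_ratings = {
    1: [('A', 8), ('B', 6), ('C', 10), ('Z', 10)],  # 'Z' has no known author
    2: [('A', 0)],                                  # only a zero rating
}
top_for_1 = get_user_top_author_sketch(1, user_ratings, book_authors)
top_for_2 = get_user_top_author_sketch(2, user_ratings, book_authors)
top_for_3 = get_user_top_author_sketch(3, user_ratings, book_authors)
print(top_for_1, top_for_2, top_for_3)
```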

3. <font color="red">[10 pts]</font> Recommend the 3 most popular (highest average rating) books from the user's top author that the user has not yet rated. Write a function <font color="brown">recommend_books</font> for this, that takes as parameters a user ID, the user-to-books dictionary (as created in Task 4.1 above), the book-to-author dictionary (as created in Task 1.2), and the book-to-average rating dictionary (as created in Task 2.2). The function should return a dictionary of book-to-average rating. If fewer than 3 books make the cut, return all the books that make the cut, ranked by average rating from highest to lowest.

In [4]:
RECOMMEND_NUM = 3
def recommend_books(user_ID, user_to_book_ratings_dict, book_author_dict, book_to_average_dict):
    '''
    IN: user_ID (int) - user ID
        user_to_book_ratings_dict (dict{int: list[tuple(str, int)]}) - dictionary of user ratings
        book_author_dict (dict{str: str}) - dictionary of book authors
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
    OUT: recommended_books_dict (dict{str: float}) - dictionary of recommended books
    '''
    top_author = get_user_top_author(user_ID, user_to_book_ratings_dict, book_author_dict)

    if not top_author:
        return {}  # No top author found

    books_rated_by_user = set(book_id for book_id, _ in user_to_book_ratings_dict.get(user_ID, []))

    books_by_top_author = [book_id for book_id, author in book_author_dict.items() if author == top_author]

    books_to_recommend = [book_id for book_id in books_by_top_author if book_id not in books_rated_by_user]

    book_ratings_to_recommend = {book_id: book_to_average_dict[book_id] for book_id in books_to_recommend if
                                 book_id in book_to_average_dict}

    sorted_books = sorted(book_ratings_to_recommend.items(), key=lambda x: x[1], reverse=True)

    recommended_books = dict(sorted_books[:RECOMMEND_NUM])

    return recommended_books
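
The recommendation step can be traced end to end on a small hand-built data set. The sketch below is a self-contained, compact variant of the two Task 4 functions. All names ending in `_sketch`, the parameter `k`, and the sample ISBN-like keys are assumptions made for illustration.

```python
def top_author_sketch(user_id, user_ratings, book_author):
    """Author with the highest mean rating from this user, or None."""
    by_author = {}
    for isbn, rating in user_ratings.get(user_id, []):
        if isbn in book_author:
            by_author.setdefault(book_author[isbn], []).append(rating)
    if not by_author:
        return None
    return max(by_author, key=lambda a: sum(by_author[a]) / len(by_author[a]))


def recommend_books_sketch(user_id, user_ratings, book_author, book_average, k=3):
    """Top k unrated books by the user's top author, highest average first."""
    author = top_author_sketch(user_id, user_ratings, book_author)
    if author is None:
        return {}
    already_rated = {isbn for isbn, _ in user_ratings.get(user_id, [])}
    candidates = {
        isbn: book_average[isbn]
        for isbn, a in book_author.items()
        if a == author and isbn not in already_rated and isbn in book_average
    }
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


book_author = {'B1': 'Author 1', 'B2': 'Author 1', 'B3': 'Author 1',
               'B4': 'Author 1', 'B5': 'Author 2', 'B6': 'Author 1'}
book_average = {'B1': 9.5, 'B2': 7.5, 'B3': 9.0, 'B4': 6.0, 'B5': 4.0, 'B6': 8.0}
user_ratings = {1: [('B1', 9), ('B5', 4)]}

recommended = recommend_books_sketch(1, user_ratings, book_author, book_average)
print(recommended)
```

User 1 rates Author 1 higher than Author 2, so the candidates are Author 1's books other than `B1`, which the user has already rated.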