## Define my problem:

Recommend books to users given a book a user likes.

## Data

Ratings of books from users.

## Training features:
- other books a user has marked to read
- other or all books the user has rated highly

## Solutions:
1. Nearest Neighbors
Given an input book, find the books whose rating vectors (one rating per user) are closest to that book's, using cosine distance.

2. Collaborative Filtering Using Matrix Factorization
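Use truncated SVD to reduce the book-by-user ratings matrix to a few latent factors per book, then, given a book, return the books whose correlation coefficient (r) with it in that factor space is greater than 0.9.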
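
Both approaches compare books through their rating vectors. The sketch below illustrates the nearest-neighbors idea on a made-up three-book matrix using plain NumPy; `toy_ratings` and `cosine_similarity_to` are hypothetical names introduced only for this example, not part of the notebook.

```python
import numpy as np

# Toy book-by-user rating matrix (rows = books, columns = users); 0 = not rated.
# The values are invented for illustration only.
toy_ratings = np.array([
    [5, 4, 0, 5],   # book 0
    [5, 5, 0, 4],   # book 1: rated much like book 0
    [0, 1, 5, 0],   # book 2: rated by a different audience
], dtype=float)

def cosine_similarity_to(matrix, row):
    """Cosine similarity between one row and every row of the matrix."""
    norms = np.linalg.norm(matrix, axis=1)
    return (matrix @ matrix[row]) / (norms * norms[row])

sims = cosine_similarity_to(toy_ratings, 0)

# Rank the other books by similarity to book 0, most similar first.
ranked = [int(i) for i in np.argsort(-sims) if i != 0]
print(ranked)
```

Book 1 ranks ahead of book 2 because the same users rated it similarly to book 0. Cosine distance, as used by the model later in this notebook, is one minus this similarity.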

In [1]:
import pandas as pd 
import numpy as np 
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import TruncatedSVD

In [2]:
import warnings 
warnings.filterwarnings('ignore', category=RuntimeWarning)

In [3]:
# main data
DATA_PATH = "./data/"
ratings = pd.read_csv( DATA_PATH + 'ratings.csv' )
books = pd.read_csv( DATA_PATH + 'books.csv' )

In [4]:
ratings.head()

Unnamed: 0,user_id,book_id,rating
0,1,258,5
1,2,4081,4
2,2,260,5
3,2,9296,5
4,2,2318,3


## Nearest Neighbors Implementation 

In [5]:
# making pivot table with user_id as index and book_id as the columns
r_pivot = ratings.pivot(index="user_id", columns="book_id", values="rating")

In [6]:
# filling NaNs with 0
r_pivot = r_pivot.fillna(0)

In [7]:
# transposing pivot table to have book_id as index
r_pivot = r_pivot.T

In [8]:
# making ratings matrix
ratings_matrix = csr_matrix(r_pivot.values)
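
The pivot, fill, and transpose steps above build a dense table before converting it to a sparse matrix. As a possible alternative, the sparse matrix could be built directly from the long-format ratings. This is only a sketch: `build_book_user_matrix` is a hypothetical helper, and it assumes the `user_id`, `book_id`, and `rating` columns shown in `ratings.head()`.

```python
import pandas as pd
from scipy.sparse import csr_matrix

def build_book_user_matrix(ratings_df):
    """Build a sparse book-by-user matrix from long-format ratings.

    Returns the matrix and the book_ids in row order (sorted ascending).
    """
    book_cat = ratings_df["book_id"].astype("category")
    user_cat = ratings_df["user_id"].astype("category")
    matrix = csr_matrix(
        (
            ratings_df["rating"].astype(float).to_numpy(),
            (book_cat.cat.codes.to_numpy(), user_cat.cat.codes.to_numpy()),
        ),
        shape=(len(book_cat.cat.categories), len(user_cat.cat.categories)),
    )
    return matrix, book_cat.cat.categories

# Tiny made-up sample in the same layout as ratings.csv
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "book_id": [258, 4081, 258, 260],
    "rating": [5, 4, 5, 3],
})
sparse_matrix, sparse_book_ids = build_book_user_matrix(toy)
print(sparse_matrix.toarray())
```

Unrated pairs are implicit zeros, which matches `fillna(0)`. One difference to be aware of: `csr_matrix` sums duplicate (book, user) entries, whereas `DataFrame.pivot` raises an error on them.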

In [9]:
# instantiating model 
model = NearestNeighbors(metric="cosine", algorithm="brute")
# fitting model to ratings_matrix
model.fit(ratings_matrix)

NearestNeighbors(algorithm='brute', leaf_size=30, metric='cosine',
                 metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                 radius=1.0)

In [10]:
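def get_recommendations(query_index):
    # query_index is a 1-based book_id; this assumes book_ids run 1..N with no gaps,
    # so that book_id - 1 is the book's row position in r_pivot
    query_index = query_index-1
    # ask for 6 neighbors: the nearest one is normally the query book itself
    distances, indices = model.kneighbors(r_pivot.iloc[query_index, :].values.reshape(1, -1),
                                         n_neighbors=6)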

    for i in range(len(distances.flatten())):
        if i == 0:
            curr_book_id = r_pivot.index[query_index]
            curr_book_title = books[books.book_id == curr_book_id]['title'].values[0]
            print(f"Recommendation for {curr_book_title}")
        else:
            curr_book_id = r_pivot.index[indices.flatten()[i]]
            curr_book_title = books[books.book_id == curr_book_id]['title'].values[0]
            print(f"{i}: {curr_book_title}")
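
The function above prints its results. A variant that returns them may be easier to reuse or test. The sketch below shows that idea on a made-up matrix; `nearest_rows` and `toy_ratings` are hypothetical names, and the helper works on row positions rather than book titles.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nearest_rows(fitted_model, matrix, row, n=5):
    """Return up to n (row_position, cosine_distance) pairs closest to `row`."""
    # Request one extra neighbor because the query row is normally returned too.
    distances, indices = fitted_model.kneighbors(
        matrix[row].reshape(1, -1), n_neighbors=n + 1
    )
    pairs = [(int(i), float(d)) for i, d in zip(indices[0], distances[0]) if i != row]
    return pairs[:n]

# Made-up book-by-user ratings (rows = books); 0 = not rated.
toy_ratings = np.array([
    [5, 4, 0, 5],
    [5, 5, 0, 4],
    [0, 1, 5, 0],
], dtype=float)

toy_model = NearestNeighbors(metric="cosine", algorithm="brute").fit(toy_ratings)
neighbors = nearest_rows(toy_model, toy_ratings, row=0, n=2)
print(neighbors)
```

Filtering on the row position, instead of always dropping the first result, avoids discarding a real neighbor if another book happens to have an identical rating vector and is returned first.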

In [11]:
books.head()

Unnamed: 0,book_id,goodreads_book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,2767052,2767052,2792775,272,439023483,9780439000000.0,Suzanne Collins,2008.0,The Hunger Games,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
1,2,3,3,4640799,491,439554934,9780440000000.0,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,...,4602479,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m...,https://images.gr-assets.com/books/1474154022s...
2,3,41865,41865,3212258,226,316015849,9780316000000.0,Stephenie Meyer,2005.0,Twilight,...,3866839,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m...,https://images.gr-assets.com/books/1361039443s...
3,4,2657,2657,3275794,487,61120081,9780061000000.0,Harper Lee,1960.0,To Kill a Mockingbird,...,3198671,3340896,72586,60427,117415,446835,1001952,1714267,https://images.gr-assets.com/books/1361975680m...,https://images.gr-assets.com/books/1361975680s...
4,5,4671,4671,245494,1356,743273567,9780743000000.0,F. Scott Fitzgerald,1925.0,The Great Gatsby,...,2683664,2773745,51992,86236,197621,606158,936012,947718,https://images.gr-assets.com/books/1490528560m...,https://images.gr-assets.com/books/1490528560s...


In [12]:
get_recommendations(1)

Recommendation for The Hunger Games (The Hunger Games, #1)
1: Catching Fire (The Hunger Games, #2)
2: Mockingjay (The Hunger Games, #3)
3: Harry Potter and the Sorcerer's Stone (Harry Potter, #1)
4: Twilight (Twilight, #1)
5: Divergent (Divergent, #1)


In [13]:
get_recommendations(2)

Recommendation for Harry Potter and the Sorcerer's Stone (Harry Potter, #1)
1: Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)
2: Harry Potter and the Chamber of Secrets (Harry Potter, #2)
3: Harry Potter and the Goblet of Fire (Harry Potter, #4)
4: Harry Potter and the Order of the Phoenix (Harry Potter, #5)
5: Harry Potter and the Half-Blood Prince (Harry Potter, #6)


In [14]:
get_recommendations(3)

Recommendation for Twilight (Twilight, #1)
1: New Moon (Twilight, #2)
2: Eclipse (Twilight, #3)
3: Breaking Dawn (Twilight, #4)
4: The Hunger Games (The Hunger Games, #1)
5: Harry Potter and the Sorcerer's Stone (Harry Potter, #1)


## Matrix Factorization Implementation

In [15]:
X = r_pivot.values
X

array([[0., 0., 0., ..., 4., 4., 4.],
       [0., 5., 0., ..., 5., 5., 5.],
       [0., 0., 0., ..., 0., 0., 4.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [16]:
# use truncated SVD to reduce each book's rating vector to 12 latent factors
svd = TruncatedSVD(n_components=12, random_state=42)
matrix = svd.fit_transform(X)
matrix.shape

(10000, 12)
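
The choice of 12 components is not justified in the notebook. One way to sanity-check it might be to compare how much variance different values of `n_components` retain. The sketch below does this on random stand-in data, so the printed numbers say nothing about the real ratings; `X_toy` and `retained` are hypothetical names.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Stand-in for X: 50 "books" by 30 "users" with random integer ratings 0..5.
X_toy = rng.integers(0, 6, size=(50, 30)).astype(float)

retained = {}
for k in (2, 5, 12):
    svd_k = TruncatedSVD(n_components=k, random_state=42)
    svd_k.fit(X_toy)
    # Fraction of the total variance kept by the first k components
    retained[k] = float(svd_k.explained_variance_ratio_.sum())
    print(k, round(retained[k], 3))
```

On the real matrix, the same loop could be run with `X` in place of `X_toy`. Note that `TruncatedSVD` does not center the data, so the first component often reflects overall rating volume more than taste.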

In [17]:
book_id_list = list(r_pivot.index)

In [18]:
test_book_id = books[books.title == "The Hunger Games (The Hunger Games, #1)"]["book_id"].values[0]
book_id_list.index(test_book_id)

0

In [19]:
# get a book's row position in r_pivot (not its book_id) from its title
def get_book_id(title):
    test_book_id = books[books.title == title]["book_id"].values[0]
    return book_id_list.index(test_book_id)
# get a book's title from its book_id
def get_book_title(idx):
    return books[books.book_id == idx]['title'].values[0]

In [20]:
corr = np.corrcoef(matrix)
corr.shape

(10000, 10000)

In [21]:
# getting recommendations for Hunger Games #1
corr_hunger_games = corr[get_book_id("The Hunger Games (The Hunger Games, #1)")]

In [22]:
book_ids = r_pivot.index
recommendation_ids = list(book_ids[(corr_hunger_games<1.0)&(corr_hunger_games>.9)])

In [23]:
for idx in recommendation_ids:
    print(get_book_title(idx))

Catching Fire (The Hunger Games, #2)
Mockingjay (The Hunger Games, #3)
The Hunger Games Trilogy Boxset (The Hunger Games, #1-3)


In [27]:
# getting matrix recommendations for Twilight (Twilight, #1)
corr_twilight = corr[get_book_id("Twilight (Twilight, #1)")]

In [30]:
get_book_id("Twilight (Twilight, #1)")

2

In [28]:
book_ids = r_pivot.index
recommendation_ids = list(book_ids[(corr_twilight<1.0)&(corr_twilight>.9)])

In [29]:
for idx in recommendation_ids:
    print(get_book_title(idx))

New Moon (Twilight, #2)
Eclipse (Twilight, #3)
Breaking Dawn (Twilight, #4)
The Host (The Host, #1)
A Walk to Remember
Twilight: The Complete Illustrated Movie Companion
P.S. I Love You
The Twilight Saga Breaking Dawn Part 1: The Official Illustrated Movie Companion (The Twilight Saga: The Official Illustrated Movie Companion, #4)
Twilight Director's Notebook : The Story of How We Made the Movie Based on the Novel by Stephenie Meyer
The Twilight Saga (Twilight, #1-4)
New Moon: The Complete Illustrated Movie Companion (The Twilight Saga: The Official Illustrated Movie Companion, #2)
Eclipse: The Complete Illustrated Movie Companion (The Twilight Saga: The Official Illustrated Movie Companion, #3)
Size 12 Is Not Fat (Heather Wells, #1)
Queen of Babble (Queen of Babble, #1)
The Boy Next Door (Boy, #1)
Queen of Babble Gets Hitched (Queen of Babble, #3)
Queen of Babble in the Big City (Queen of Babble, #2)
Size 14 Is Not Fat Either (Heather Wells, #2)
