# Content-Based Recommendation

Similar content is recommended using attributes of the content. Because the method relies on attributes or tags of the content, such as book title, author, and rating, new books can be recommended immediately, without waiting for other users to rate them.

### Content-based filtering

Using a user's ratings of the books they have read, we can look through the metadata of their favourite books (e.g. title, genre, author, description, keywords) and find similar titles. The underlying assumption is that if a user enjoys one book, they will also enjoy similar books.

Pros: Quick and easy to understand (transparent to users), needs no ratings from other users (so it works even with a small number of users), and is more reliable at the start, before much rating data has been collected.

Cons: Because recommendations are driven by metadata, they tend to keep returning the same genres, authors, and topics. The results lack diversity and novelty, so they can feel repetitive rather than genuinely personalized.
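
One common way to turn the idea above into code, assumed here rather than taken from this notebook (the cells below start from a single title, not from a user's rating history), is to build a user profile as the rating-weighted average of the TF-IDF vectors of the books the user rated, then rank unread books by cosine similarity to that profile. The catalogue, the ratings, and the names `catalogue` and `user_ratings` are hypothetical toy data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue: one metadata string (title + author + genre) per book
catalogue = {
    "The Hobbit": "The Hobbit Tolkien fantasy",
    "The Two Towers": "The Two Towers Tolkien fantasy",
    "Twilight": "Twilight Meyer romance vampire",
    "New Moon": "New Moon Meyer romance vampire",
    "Frostbite": "Frostbite Mead vampire",
}
# Hypothetical ratings (1-5) from a single user
user_ratings = {"The Hobbit": 5, "Twilight": 2}

titles = list(catalogue)
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(list(catalogue.values()))  # one TF-IDF row per book

# User profile: rating-weighted mean of the vectors of the books the user rated
rated_rows = [titles.index(t) for t in user_ratings]
weights = np.array(list(user_ratings.values()), dtype=float)
profile = weights @ X[rated_rows].toarray() / weights.sum()

# Score every book against the profile, best first, skipping books already rated
scores = cosine_similarity(profile.reshape(1, -1), X).ravel()
ranked = [titles[i] for i in np.argsort(-scores) if titles[i] not in user_ratings]
print(ranked)  # ['The Two Towers', 'New Moon', 'Frostbite']
```

The highly rated Tolkien book dominates the profile, so the other Tolkien title ranks first, while the low-rated vampire romance still pulls in related titles with smaller scores.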

In [None]:
import pandas as pd
import numpy as np

In [44]:
import re
import string
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [45]:
vamsidhar_books_data = pd.read_csv('/Users/vamsidharreddy/CMPE-255-Final-Project/data/books_data.csv')

In [46]:
vamsidhar_books_data

| Unnamed: 0 | id | book_id | best_book_id | work_id | books_count | isbn | isbn13 | authors | original_publication_year | original_title | ... | ratings_count | work_ratings_count | work_text_reviews_count | ratings_1 | ratings_2 | ratings_3 | ratings_4 | ratings_5 | image_url | small_image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2767052 | 2767052 | 2792775 | 272 | 439023483 | 9.780439e+12 | Suzanne Collins | 2008.0 | The Hunger Games | ... | 4780653 | 4942365 | 155254 | 66715 | 127936 | 560092 | 1481305 | 2706317 | https://images.gr-assets.com/books/1447303603m... | https://images.gr-assets.com/books/1447303603s... |
| 1 | 2 | 3 | 3 | 4640799 | 491 | 439554934 | 9.780440e+12 | J.K. Rowling, Mary GrandPré | 1997.0 | Harry Potter and the Philosopher's Stone | ... | 4602479 | 4800065 | 75867 | 75504 | 101676 | 455024 | 1156318 | 3011543 | https://images.gr-assets.com/books/1474154022m... | https://images.gr-assets.com/books/1474154022s... |
| 2 | 3 | 41865 | 41865 | 3212258 | 226 | 316015849 | 9.780316e+12 | Stephenie Meyer | 2005.0 | Twilight | ... | 3866839 | 3916824 | 95009 | 456191 | 436802 | 793319 | 875073 | 1355439 | https://images.gr-assets.com/books/1361039443m... | https://images.gr-assets.com/books/1361039443s... |
| 3 | 4 | 2657 | 2657 | 3275794 | 487 | 61120081 | 9.780061e+12 | Harper Lee | 1960.0 | To Kill a Mockingbird | ... | 3198671 | 3340896 | 72586 | 60427 | 117415 | 446835 | 1001952 | 1714267 | https://images.gr-assets.com/books/1361975680m... | https://images.gr-assets.com/books/1361975680s... |
| 4 | 5 | 4671 | 4671 | 245494 | 1356 | 743273567 | 9.780743e+12 | F. Scott Fitzgerald | 1925.0 | The Great Gatsby | ... | 2683664 | 2773745 | 51992 | 86236 | 197621 | 606158 | 936012 | 947718 | https://images.gr-assets.com/books/1490528560m... | https://images.gr-assets.com/books/1490528560s... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | 7130616 | 7130616 | 7392860 | 19 | 441019455 | 9.780441e+12 | Ilona Andrews | 2010.0 | Bayou Moon | ... | 17204 | 18856 | 1180 | 105 | 575 | 3538 | 7860 | 6778 | https://images.gr-assets.com/books/1307445460m... | https://images.gr-assets.com/books/1307445460s... |
| 9996 | 9997 | 208324 | 208324 | 1084709 | 19 | 067973371X | 9.780680e+12 | Robert A. Caro | 1990.0 | Means of Ascent | ... | 12582 | 12952 | 395 | 303 | 551 | 1737 | 3389 | 6972 | https://s.gr-assets.com/assets/nophoto/book/11... | https://s.gr-assets.com/assets/nophoto/book/50... |
| 9997 | 9998 | 77431 | 77431 | 2393986 | 60 | 039330762X | 9.780393e+12 | Patrick O'Brian | 1977.0 | The Mauritius Command | ... | 9421 | 10733 | 374 | 11 | 111 | 1191 | 4240 | 5180 | https://images.gr-assets.com/books/1455373531m... | https://images.gr-assets.com/books/1455373531s... |
| 9998 | 9999 | 8565083 | 8565083 | 13433613 | 7 | 61711527 | 9.780062e+12 | Peggy Orenstein | 2011.0 | Cinderella Ate My Daughter: Dispatches from th... | ... | 11279 | 11994 | 1988 | 275 | 1002 | 3765 | 4577 | 2375 | https://images.gr-assets.com/books/1279214118m... | https://images.gr-assets.com/books/1279214118s... |


In [47]:
content_data = vamsidhar_books_data[['original_title','authors','average_rating']]
content_data = content_data.astype(str)

In [48]:
content_data['content'] = content_data['original_title'] + ' ' + content_data['authors'] + ' ' + content_data['average_rating']
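
The `nan` entries that show up in the recommendation lists later on most likely come from books whose `original_title` is missing: in the pandas version used for this notebook, `astype(str)` turned each missing value into the literal string `'nan'`, which then became both a displayed "title" and a shared token in `content`. A minimal sketch of filling missing values before the string conversion, on a hypothetical two-row frame (`books`, `clean`) with the same three columns:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the same columns as content_data; one title is missing
books = pd.DataFrame({
    "original_title": ["The Hobbit", np.nan],
    "authors": ["J.R.R. Tolkien", "Richelle Mead"],
    "average_rating": [4.25, 4.1],
})

# Replace missing values with empty strings *before* converting everything to str
clean = books.fillna("").astype(str)
clean["content"] = (
    clean["original_title"] + " " + clean["authors"] + " " + clean["average_rating"]
).str.strip()  # remove the leading space left behind by an empty title

print(clean["content"].tolist())
# ['The Hobbit J.R.R. Tolkien 4.25', 'Richelle Mead 4.1']
```

Books with an empty title stay in the matrix; they simply no longer share an artificial `nan` token with every other untitled book.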

In [49]:
content_data = content_data.reset_index()
indices = pd.Series(content_data.index, index=content_data['original_title'])
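
`indices[title]` returns a single integer only when `title` occurs once in `original_title`. If a title appears more than once (for example the `'nan'` title produced by `astype(str)`), the lookup returns a `Series` instead, and the recommendation functions below would then index the similarity matrix with several rows at once. A minimal sketch, on hypothetical titles, of keeping only the first occurrence of each title (`unique_indices` is an illustrative name):

```python
import pandas as pd

# Hypothetical titles with one duplicate
titles = pd.Series(["Twilight", "Frostbite", "Twilight"])
indices = pd.Series(titles.index, index=titles)

print(type(indices["Twilight"]).__name__)  # Series: two rows share the label

# Keep the first row for each title so every lookup returns one position
unique_indices = indices[~indices.index.duplicated(keep="first")]
print(unique_indices["Twilight"])  # 0
```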

# Content-Based Recommendation by Author

In [50]:
# Ignore common English stop words
tfidf = TfidfVectorizer(stop_words='english')

# Construct the TF-IDF matrix by fitting and transforming the author strings
tfidf_matrix = tfidf.fit_transform(content_data['authors'])

# Output the shape of tfidf_matrix: (number of books, vocabulary size)
tfidf_matrix.shape

(10000, 6175)

With TF-IDF encoding, each term (here, a word from a book's author string) is weighted by its importance within that document: the more often the term appears in the document, the larger its weight. At the same time, the weight is scaled inversely to how common the term is across the whole dataset, which emphasizes terms that are rare overall but important to a particular book. Words such as 'is', 'are', 'by' or 'a', which are likely to appear in nearly every book's content but are useless for recommendations, are weighted far less than words specific to a given book; with `stop_words='english'` they are removed entirely.
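
A small illustration of the inverse-document-frequency part, on three hypothetical author strings rather than the real data. With scikit-learn's default `smooth_idf=True`, the IDF of a term t is ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing t, so a name shared by two books receives a lower weight than a name that appears only once.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical author strings: "tolkien" appears in two documents, "rowling" in one
docs = ["John Ronald Tolkien", "Christopher Tolkien", "Joanne Rowling"]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(docs)
vocab = tfidf.vocabulary_  # maps each term to its column in the matrix

print(matrix.shape)  # (3, 6): 3 documents, 6 distinct terms
for term in ("tolkien", "rowling"):
    # idf_ holds the learned IDF weight of each vocabulary term
    print(term, round(tfidf.idf_[vocab[term]], 3))
# tolkien 1.288
# rowling 1.693
```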

# Compute the cosine similarity matrix

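We use a simple similarity measure called cosine similarity. It is computed here with `linear_kernel`, a plain dot product, which gives the same result because `TfidfVectorizer` L2-normalizes every row by default.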

In [51]:
cosine_sim_author = linear_kernel(tfidf_matrix, tfidf_matrix)
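
Cosine similarity of two vectors a and b is (a · b) / (‖a‖ ‖b‖). A quick check, on hypothetical toy strings, that unit-length TF-IDF rows make the denominator 1, so `linear_kernel` and `cosine_similarity` agree:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical author strings, not taken from the dataset
docs = ["Suzanne Collins", "Stephenie Meyer", "Suzanne Collins Stephenie Meyer"]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Euclidean length of each row; TfidfVectorizer's default norm='l2' makes these 1
row_norms = np.sqrt(X.multiply(X).sum(axis=1))
dot = linear_kernel(X, X)
cos = cosine_similarity(X, X)
print(np.allclose(row_norms, 1.0), np.allclose(dot, cos))  # True True
```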

# Author-Wise Recommendation

In [52]:
def get_books_recommendations(title, cosine_sim=cosine_sim_author):
    # Row index of the query book (assumes the title is unique in `indices`)
    idx = indices[title]

    # Get the pairwise similarity scores of all books with that book
    sim_score = list(enumerate(cosine_sim[idx]))

    # Sort the books based on the similarity scores (highest first)
    sim_score = sorted(sim_score, key=lambda x: x[1], reverse=True)

    # Skip the first entry (normally the book itself) and keep the next 10.
    # When several books tie with the query's score, the book itself may not
    # come first, so it can still appear in the results.
    sim_score = sim_score[1:11]

    # Get the book indices
    book_indices = [i[0] for i in sim_score]

    # Return the titles of the 10 most similar books
    return list(content_data['original_title'].iloc[book_indices])

In [53]:
def author_bookshows(books):
    # Print one recommended title per line
    for book in books:
        print(book)

In [54]:
vamsi_books1 = get_books_recommendations('The Hobbit', cosine_sim_author)
author_bookshows(vamsi_books1)

The Hobbit or There and Back Again
 The Fellowship of the Ring
The Two Towers
The Return of the King
The Lord of the Rings
The Hobbit and The Lord of the Rings
Unfinished Tales of Númenor and Middle-Earth
Nikola Tesla: Imagination and the Man That Invented the 20th Century
Entwined
The Children of Húrin


In [55]:
vamsi_books2 = get_books_recommendations('Shadow Kiss', cosine_sim_author)
author_bookshows(vamsi_books2)

Frostbite
Shadow Kiss
Spirit Bound
Blood Promise
Last Sacrifice 
Bloodlines
The Golden Lily
The Indigo Spell
The Fiery Heart
nan


In [56]:
vamsi_books3 = get_books_recommendations('Harry Potter and the Goblet of Fire', cosine_sim_author)
author_bookshows(vamsi_books3)

Harry Potter and the Order of the Phoenix
Harry Potter and the Chamber of Secrets
Harry Potter and the Goblet of Fire
Harry Potter and the Deathly Hallows
Harry Potter and the Half-Blood Prince
Harry Potter Boxed Set Books 1-4
nan
Harry Potter and the Prisoner of Azkaban
The Casual Vacancy
The Tales of Beedle the Bard
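
In the lists above, the query book itself shows up ('Shadow Kiss', 'Harry Potter and the Goblet of Fire'). This is most likely because books with identical author strings get identical TF-IDF vectors and tie at a similarity of 1.0; Python's sort is stable, so tied books stay in index order and the entry skipped by `[1:11]` is not necessarily the query book. A minimal sketch of a helper that removes the query explicitly instead of relying on its position (`top_k_similar` is a hypothetical name, and the similarity matrix is toy data):

```python
import numpy as np

def top_k_similar(sim_matrix, idx, k=10):
    """Return positions of the k rows most similar to row `idx`, never `idx` itself."""
    scores = np.asarray(sim_matrix[idx], dtype=float).ravel().copy()
    scores[idx] = -np.inf  # exclude the query book explicitly
    # Stable sort on the negated scores: highest first, ties stay in index order
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()

# Toy similarity matrix: rows 0-2 tie at 1.0 (e.g. same author), row 3 is unrelated
sim = np.array([
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Position-based slicing, as in get_books_recommendations, keeps the query (row 1)
slice_based = [i for i, _ in sorted(enumerate(sim[1]), key=lambda x: x[1], reverse=True)[1:3]]
print(slice_based)               # [1, 2]
print(top_k_similar(sim, 1, 2))  # [0, 2]
```

Assuming `indices['Shadow Kiss']` returns a single integer, `content_data['original_title'].iloc[top_k_similar(cosine_sim_author, indices['Shadow Kiss'])]` would list ten titles without the query; which tied books make the cut still depends on their order in the dataset.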


# Content-Based Filtering on Multiple Features

In [70]:
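# Bag-of-words counts over the combined title + authors + rating string
count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(content_data['content'])

# Count vectors are not normalized, so use cosine_similarity rather than linear_kernel
cosine_sim_content = cosine_similarity(count_matrix, count_matrix)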
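
It can help to look at which tokens `CountVectorizer` actually extracts from the combined `content` string. With the default `token_pattern` (runs of two or more word characters), stop words are removed, single-letter initials such as the "J" in "J.R.R." are dropped, and an average rating like "4.25" contributes only the token "25". The rating therefore matches other books by its decimal digits (3.25 and 4.25 share "25") rather than by closeness in value. The input strings below are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words="english")
analyzer = count.build_analyzer()  # preprocessing + tokenization + stop-word removal

# Hypothetical content strings in the same "title authors rating" layout
print(analyzer("The Hobbit J.R.R. Tolkien 4.25"))  # ['hobbit', 'tolkien', '25']
print(analyzer("Twilight 3.25"))                   # ['twilight', '25']
```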

In [71]:
def get_book_recom(title, cosine_sim=cosine_sim_content):
    # Row index of the query book (assumes the title is unique in `indices`)
    idx = indices[title]

    # Get the pairwise similarity scores of all books with that book
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the books based on the similarity scores (highest first)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Skip the first entry (normally the book itself) and keep the next 10
    sim_scores = sim_scores[1:11]

    # Get the book indices
    book_indices = [i[0] for i in sim_scores]

    # Return the titles of the 10 most similar books
    return list(content_data['original_title'].iloc[book_indices])

In [72]:
def bookshow(books):
    # Print one recommended title per line
    for book in books:
        print(book)

In [73]:
vamsi_books4 = get_book_recom('The Hobbit', cosine_sim_content)
bookshow(vamsi_books4)

The Hobbit or There and Back Again
The Hobbit and The Lord of the Rings
No, David!
The History of the Hobbit, Part One: Mr. Baggins
David Gets in Trouble
nan
The Silmarillion
The Children of Húrin
Unfinished Tales of Númenor and Middle-Earth
The Two Towers


In [74]:
vamsi_books5 = get_book_recom('Shadow Kiss', cosine_sim_content)
bookshow(vamsi_books5)

Spirit Bound
Silver Shadows
Frostbite
nan
Last Sacrifice 
Bloodlines
nan
Storm Born
Succubus On Top
Blood Promise


In [75]:
vamsi_books6 = get_book_recom('The Two Towers', cosine_sim_content)
bookshow(vamsi_books6)

Towers of Midnight
The Silmarillion
The Children of Húrin
Unfinished Tales of Númenor and Middle-Earth
The Hobbit or There and Back Again
Reckless
 The Fellowship of the Ring
The Return of the King
The Lord of the Rings
Last Sacrifice 


In [76]:
vamsi_books7 = get_book_recom('Harry Potter and the Goblet of Fire', cosine_sim_content)
bookshow(vamsi_books7)

Harry Potter and the Prisoner of Azkaban
Harry Potter and the Philosopher's Stone
Harry Potter and the Order of the Phoenix
Harry Potter and the Chamber of Secrets
Harry Potter and the Deathly Hallows
Harry Potter and the Half-Blood Prince
Harry Potter Boxed Set Books 1-4
Harry Potter Collection (Harry Potter, #1-6)
nan
Complete Harry Potter Boxed Set


In [130]:
# Number of recommendations (k) and similarity metric name
k = 10
metric = 'cosine'
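
The variables `k` and `metric` are not used by the cells shown here. Assuming they are meant for a nearest-neighbour search (an assumption, not something this section states), scikit-learn's `NearestNeighbors` accepts both directly; with `metric='cosine'` the returned distances are 1 minus the cosine similarity. A sketch on hypothetical toy strings, where `k` is set to 2 only because the toy data has three rows:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

k = 2             # toy value for three rows (the notebook uses k = 10)
metric = "cosine"

# Hypothetical content strings
docs = ["Suzanne Collins", "Stephenie Meyer", "Suzanne Collins Stephenie Meyer"]
X = CountVectorizer(stop_words="english").fit_transform(docs)

# Ask for k + 1 neighbours because the query row is normally its own nearest neighbour
nn = NearestNeighbors(n_neighbors=k + 1, metric=metric).fit(X)
distances, neighbours = nn.kneighbors(X[0])
print(neighbours[0].tolist())  # [0, 2, 1]
```

Dropping the first neighbour (the query itself) leaves the k recommendations, ordered from most to least similar.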

In [None]:
!jupyter nbconvert Data*.ipynb --to python