# Movie & Book Recommendation

For movie and book recommendation we use **nearest-neighbor, item-based collaborative filtering**.
There are two datasets: one contains movie names and genres, and the other contains movie ratings.
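
To make the idea concrete, here is a minimal sketch of item-based similarity on a made-up ratings matrix. The titles and numbers are invented for illustration and are not taken from MovieLens. Each row is an item, each column is a user, and two items count as neighbors when their rating vectors point in a similar direction.

```python
import numpy as np

# Toy item-by-user matrix: rows are items, columns are users, 0 = not rated.
# Titles and ratings are invented for illustration.
titles = ["Movie A", "Movie B", "Movie C"]
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
])

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity of two rating vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Distance from "Movie A" to every item, then sort ascending.
query = 0
dists = np.array([cosine_distance(ratings[query], row) for row in ratings])
order = np.argsort(dists)

# order[0] is the query itself (distance ~0); the rest are its neighbors.
nearest = titles[order[1]]
print(nearest, round(float(dists[order[1]]), 4))
```

"Item-based" here means that movies are compared with other movies through the users who rated them, rather than users being compared with other users.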

Movie dataset link: https://grouplens.org/datasets/movielens/latest/

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## (1) Movie Recommendation using Nearest Neighbors

In [2]:
movie_df=pd.read_csv("Downloads/movies.csv")
rating_df=pd.read_csv("Downloads/ratings.csv")

In [3]:
movie_df

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [4]:
rating_df

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


We recommend movies based on their ratings, so only the ID, title, and rating columns are kept.

In [5]:
movie_df1=movie_df[["movieId","title"]]
rating_df1=rating_df[["userId","movieId","rating"]]

In [6]:
movie_df1

Unnamed: 0,movieId,title
0,1,Toy Story (1995)
1,2,Jumanji (1995)
2,3,Grumpier Old Men (1995)
3,4,Waiting to Exhale (1995)
4,5,Father of the Bride Part II (1995)
...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017)
9738,193583,No Game No Life: Zero (2017)
9739,193585,Flint (2017)
9740,193587,Bungo Stray Dogs: Dead Apple (2018)


In [7]:
rating_df1

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0
...,...,...,...
100831,610,166534,4.0
100832,610,168248,5.0
100833,610,168250,5.0
100834,610,168252,5.0


In [8]:
combined_df=pd.merge(rating_df1,movie_df1,on="movieId")
combined_df

Unnamed: 0,userId,movieId,rating,title
0,1,1,4.0,Toy Story (1995)
1,5,1,4.0,Toy Story (1995)
2,7,1,4.5,Toy Story (1995)
3,15,1,2.5,Toy Story (1995)
4,17,1,4.5,Toy Story (1995)
...,...,...,...,...
100831,610,160341,2.5,Bloodmoon (1997)
100832,610,160527,4.5,Sympathy for the Underdog (1971)
100833,610,160836,3.0,Hazard (2005)
100834,610,163937,3.5,Blair Witch (2016)
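
The merge above relies on `pd.merge` defaulting to an inner join, which silently drops any rating whose `movieId` has no title. The sketch below, on invented toy data, shows one way to guard that step: `validate="many_to_one"` checks that each `movieId` appears only once in the title table, and a left join with `indicator=True` lists ratings that found no match.

```python
import pandas as pd

# Toy stand-ins for rating_df1 and movie_df1 (values are invented).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 99],   # movieId 99 has no title entry
    "rating": [4.0, 3.5, 5.0, 2.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
})

# Default how="inner": keep only ratings whose movieId has a title.
# validate="many_to_one" raises if movieId is duplicated in `movies`.
merged = pd.merge(ratings, movies, on="movieId", validate="many_to_one")

# A left join with indicator=True exposes ratings that found no title.
audit = pd.merge(ratings, movies, on="movieId", how="left", indicator=True)
unmatched = audit[audit["_merge"] == "left_only"]

print(len(merged), len(unmatched))
```

In the notebook the merged table has the same number of rows as the ratings table, which suggests every rating found its title.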


In [9]:
combined_df.isnull().sum()

userId     0
movieId    0
rating     0
title      0
dtype: int64

**Total ratings for each movie**

In [11]:
movie_rating_count=(combined_df.groupby(by=["title"])["rating"].count().reset_index().
                    rename(columns={"rating":"TotalRatingCount"}))
movie_rating_count

Unnamed: 0,title,TotalRatingCount
0,'71 (2014),1
1,'Hellboy': The Seeds of Creation (2004),1
2,'Round Midnight (1986),2
3,'Salem's Lot (2004),1
4,'Til There Was You (1997),2
...,...,...
9714,eXistenZ (1999),22
9715,xXx (2002),24
9716,xXx: State of the Union (2005),5
9717,¡Three Amigos! (1986),26


In [12]:
rating_with_TotalRatingCount=pd.merge(combined_df,movie_rating_count,on="title")
rating_with_TotalRatingCount

Unnamed: 0,userId,movieId,rating,title,TotalRatingCount
0,1,1,4.0,Toy Story (1995),215
1,5,1,4.0,Toy Story (1995),215
2,7,1,4.5,Toy Story (1995),215
3,15,1,2.5,Toy Story (1995),215
4,17,1,4.5,Toy Story (1995),215
...,...,...,...,...,...
100831,610,160341,2.5,Bloodmoon (1997),1
100832,610,160527,4.5,Sympathy for the Underdog (1971),1
100833,610,160836,3.0,Hazard (2005),1
100834,610,163937,3.5,Blair Witch (2016),1


**Summary of total ratings**

In [14]:
movie_rating_count["TotalRatingCount"].describe()

count    9719.000000
mean       10.375141
std        22.406220
min         1.000000
25%         1.000000
50%         3.000000
75%         9.000000
max       329.000000
Name: TotalRatingCount, dtype: float64

Below we keep only those movies that have received at least 50 ratings.

In [15]:
rating_popular_movie=rating_with_TotalRatingCount["TotalRatingCount"]>=50
rating_popular_movie

0          True
1          True
2          True
3          True
4          True
          ...  
100831    False
100832    False
100833    False
100834    False
100835    False
Name: TotalRatingCount, Length: 100836, dtype: bool

**Dataset with total rating count >= 50**

In [16]:
rating_popular_movie_df=rating_with_TotalRatingCount[rating_popular_movie]
rating_popular_movie_df

Unnamed: 0,userId,movieId,rating,title,TotalRatingCount
0,1,1,4.0,Toy Story (1995),215
1,5,1,4.0,Toy Story (1995),215
2,7,1,4.5,Toy Story (1995),215
3,15,1,2.5,Toy Story (1995),215
4,17,1,4.5,Toy Story (1995),215
...,...,...,...,...,...
79248,603,1997,4.0,"Exorcist, The (1973)",53
79249,606,1997,3.0,"Exorcist, The (1973)",53
79250,607,1997,5.0,"Exorcist, The (1973)",53
79251,608,1997,4.5,"Exorcist, The (1973)",53
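
The count, merge, and filter steps above can also be collapsed into a single pass. This is a sketch of an alternative, not what the notebook does: it uses `groupby(...).transform("count")` on invented toy data, with a cutoff of 2 instead of 50 so the result stays readable.

```python
import pandas as pd

# Toy merged ratings table (invented values); the notebook uses combined_df.
df = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2, 1],
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B", "Movie C"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.5, 1.0],
})

min_ratings = 2  # the notebook uses 50; 2 keeps the toy example readable

# transform("count") returns one value per row, aligned with df,
# so no separate count table or merge is needed.
df["TotalRatingCount"] = df.groupby("title")["rating"].transform("count")
popular = df[df["TotalRatingCount"] >= min_ratings]

print(popular["title"].unique().tolist())
```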


**Pivot table for the above dataset**

**Pivot_table:** A pandas pivot table summarizes one or more numeric variables grouped by two other categorical variables. In the following dataset, **userId & title** are the categorical variables, and the values are the ratings each user gave to each movie.
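
A small sketch of that reshaping on invented data may help. It also shows one detail worth knowing: when the same user and title pair occurs more than once, `pivot_table` aggregates the duplicates, using the mean by default.

```python
import pandas as pd

# Long-format toy ratings (invented): one row per (user, title) rating.
long_df = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie B", "Movie B"],
    "rating": [4.0, 2.0, 5.0, 3.0, 5.0],   # user 3 rated Movie B twice
})

# Rows become titles, columns become users, cells hold the rating.
# pivot_table aggregates duplicates with the mean by default (aggfunc="mean").
wide = long_df.pivot_table(index="title", columns="userId", values="rating")

print(wide)
```

Cells for pairs that never occur come out as `NaN`, which is why a fill step is needed before the table can be fed to a distance-based model.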

In [18]:
movie_features_df=rating_popular_movie_df.pivot_table(index="title",columns="userId",values="rating")
movie_features_df

title / userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
10 Things I Hate About You (1999),,,,,,,,,,,...,,,3.0,,5.0,,,,,
12 Angry Men (1957),,,,5.0,,,,,,,...,5.0,,,,,,,,,
2001: A Space Odyssey (1968),,,,,,,4.0,,,,...,,,5.0,,,5.0,,3.0,,4.5
28 Days Later (2002),,,,,,,,,,,...,,,,,,,,3.5,,5.0
300 (2007),,,,,,,,,,3.0,...,,,,,3.0,,,5.0,,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
X2: X-Men United (2003),,,,,,,4.0,,,,...,,,,,,,,4.0,,4.0
You've Got Mail (1998),,,0.5,,,,,,,,...,,,2.0,,,3.5,,,,
Young Frankenstein (1974),5.0,,,,,,,,,,...,,,5.0,,,3.5,,,,
Zombieland (2009),,3.0,,,,,,,,,...,,,,,,,,,,3.5


In [19]:
movie_features_df1=movie_features_df.fillna(0)
movie_features_df1

title / userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
10 Things I Hate About You (1999),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,3.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0
12 Angry Men (1957),0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,...,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2001: A Space Odyssey (1968),0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,...,0.0,0.0,5.0,0.0,0.0,5.0,0.0,3.0,0.0,4.5
28 Days Later (2002),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.5,0.0,5.0
300 (2007),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,...,0.0,0.0,0.0,0.0,3.0,0.0,0.0,5.0,0.0,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
X2: X-Men United (2003),0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,4.0
You've Got Mail (1998),0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,2.0,0.0,0.0,3.5,0.0,0.0,0.0,0.0
Young Frankenstein (1974),5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,5.0,0.0,0.0,3.5,0.0,0.0,0.0,0.0
Zombieland (2009),0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.5


In [20]:
from scipy.sparse import csr_matrix

In [21]:
movie_features_df1_matrix=csr_matrix(movie_features_df1.values)
movie_features_df1_matrix

<450x606 sparse matrix of type '<class 'numpy.float64'>'
	with 41360 stored elements in Compressed Sparse Row format>
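
The summary above reports 41360 stored values in a 450 x 606 matrix, so only about 15% of the cells hold a rating. The sketch below computes the same ratio on a small invented matrix. Note that after `fillna(0)` a stored zero and a missing rating are indistinguishable, which is an assumption built into this approach.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy dense item-by-user matrix after fillna(0) (values are invented).
dense = np.array([
    [0.0, 0.0, 3.0, 0.0],
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 4.5, 0.0, 2.0],
])

sparse = csr_matrix(dense)

# nnz counts the stored (non-zero) entries; density is their share of all cells.
density = sparse.nnz / (sparse.shape[0] * sparse.shape[1])
print(sparse.nnz, round(density, 3))

# The same ratio for the matrix reported above: 41360 stored values in 450 x 606 cells.
notebook_density = 41360 / (450 * 606)
print(round(notebook_density, 3))
```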

**Applying nearest neighbor algorithm**

In [22]:
from sklearn.neighbors import NearestNeighbors

In [23]:
model_nn=NearestNeighbors(metric="cosine",algorithm="brute")
model_nn.fit(movie_features_df1_matrix)

NearestNeighbors(algorithm='brute', metric='cosine')

In [76]:
query_index=np.random.choice(movie_features_df1.shape[0])
query_index

134
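
A random position is handy for a demo, but a recommender is usually queried by title. As a hedged sketch on invented data, `Index.get_loc` converts a title into the integer position that `.iloc` and the neighbor indices expect.

```python
import pandas as pd

# Toy feature table indexed by title (invented values).
features = pd.DataFrame(
    [[5.0, 0.0], [0.0, 4.0], [3.0, 3.0]],
    index=pd.Index(["Movie A", "Movie B", "Movie C"], name="title"),
    columns=[1, 2],
)

# get_loc maps an index label to its integer position.
# It raises KeyError for a title that is not in the index.
query_index = features.index.get_loc("Movie B")
query_vector = features.iloc[query_index, :].values.reshape(1, -1)

print(query_index, query_vector.shape)
```

The lookup needs the exact title string as stored in the index, including the year suffix used by MovieLens.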

In [79]:
movie_features_df1.iloc[query_index,:].values.reshape(1,-1)

array([[0. , 0. , 0. , 0. , 0. , 0. , 4. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. , 4.5, 4.5, 0. , 0. , 0. , 0. , 0. , 3. , 0. , 0. , 0. ,
        0. , 0. , 4.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 2. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 4. , 0. , 0. ,
        0. , 0. , 0. , 5. , 0. , 5. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 3. , 0. , 0. , 0. ,
        0. , 0. , 0. , 3.5, 0. , 0. , 0. , 0. , 0. , 4.5, 0. , 5. , 0. ,
        0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 3.5, 0. , 4. ,
        0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. , 0. , 0. , 0. , 5. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        0. , 0. , 0. , 0. , 5. , 0. , 0. , 5. , 0. , 0. , 0. , 5. , 0. ,
        0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 2. , 5. , 0. , 0. ,
        0. , 0. , 0. , 0. , 0. , 4. , 0. , 0. , 0. 

In [80]:
distances,indices=model_nn.kneighbors(movie_features_df1.iloc[query_index,:].values.reshape(1,-1),
                                      n_neighbors=6)


In [81]:
distances

array([[0.        , 0.41956479, 0.43952674, 0.44065213, 0.4449322 ,
        0.44688147]])

In [82]:
indices

array([[134,  59, 324,  27,   2, 317]], dtype=int64)
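
Two things in the output above are worth spelling out: the first neighbor is the query itself at distance 0, and with `metric="cosine"` the reported value is a distance, that is, 1 minus the cosine similarity. The sketch below checks both on a small invented matrix.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy item-by-user matrix (invented values).
X = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(X)

# Ask for 3 neighbors of row 0; results are sorted by increasing distance.
distances, indices = model.kneighbors(X[0].reshape(1, -1), n_neighbors=3)

# Check the closest non-self neighbor by hand: distance = 1 - cosine similarity.
j = indices[0, 1]
manual = 1.0 - X[0] @ X[j] / (np.linalg.norm(X[0]) * np.linalg.norm(X[j]))
print(indices[0], np.isclose(distances[0, 1], manual))
```

This is why the notebook requests `n_neighbors=6` to obtain five recommendations.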

In [95]:
indexes=np.array(movie_features_df1.index)
indexes

array(['10 Things I Hate About You (1999)', '12 Angry Men (1957)',
       '2001: A Space Odyssey (1968)', '28 Days Later (2002)',
       '300 (2007)', '40-Year-Old Virgin, The (2005)',
       'A.I. Artificial Intelligence (2001)', 'Abyss, The (1989)',
       'Ace Ventura: Pet Detective (1994)',
       'Ace Ventura: When Nature Calls (1995)',
       'Addams Family Values (1993)', 'Air Force One (1997)',
       'Airplane! (1980)', 'Aladdin (1992)', 'Alien (1979)',
       'Aliens (1986)', 'Almost Famous (2000)', 'Amadeus (1984)',
       "Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)",
       'American Beauty (1999)', 'American History X (1998)',
       'American Pie (1999)', 'American President, The (1995)',
       'American Psycho (2000)',
       'Anchorman: The Legend of Ron Burgundy (2004)',
       'Animal House (1978)', 'Annie Hall (1977)',
       'Apocalypse Now (1979)', 'Apollo 13 (1995)',
       'Arachnophobia (1990)', 'Armageddon (1998)',
       'Army of Darkness (1993)', '

In [96]:
indexes[indices]

array([['Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)',
        'Blade Runner (1982)', 'Rear Window (1954)',
        'Apocalypse Now (1979)', '2001: A Space Odyssey (1968)',
        'Psycho (1960)']], dtype=object)

In [89]:
(2, movie_features_df1.index[indices.flatten()[2]],distances.flatten()[2])

(2, 'Rear Window (1954)', 0.4395267433111216)

In [83]:
length=len(distances.flatten())
length

6

**Movie Recommendation**

In [97]:
for i in range(0,length):
    if i==0:
        print('Recommendations for {0}:\n'.format(movie_features_df1.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}'.format(i, movie_features_df1.index[indices.flatten()[i]],
                                                      distances.flatten()[i]))

Recommendations for Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964):

1: Blade Runner (1982), with distance of 0.4195647886111097
2: Rear Window (1954), with distance of 0.4395267433111216
3: Apocalypse Now (1979), with distance of 0.44065213080049015
4: 2001: A Space Odyssey (1968), with distance of 0.44493220439849623
5: Psycho (1960), with distance of 0.44688147291981495
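
The steps above (sparse conversion, model fit, neighbor query, printing) can be wrapped in one function. The sketch below is an assumption about how that might look: `recommend` is a hypothetical helper that does not appear in the notebook, and the toy table stands in for `movie_features_df1`.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

def recommend(features, title, n=5):
    """Return the n nearest titles to `title` as (title, distance) pairs.

    `recommend` is a hypothetical helper, not part of the notebook.
    `features` is a title-by-user DataFrame with missing ratings set to 0.
    """
    model = NearestNeighbors(metric="cosine", algorithm="brute")
    model.fit(csr_matrix(features.values))
    position = features.index.get_loc(title)
    # Request n + 1 neighbors because the query title matches itself first.
    k = min(n + 1, len(features))
    distances, indices = model.kneighbors(
        features.iloc[position, :].values.reshape(1, -1), n_neighbors=k
    )
    pairs = zip(features.index[indices.flatten()], distances.flatten())
    # Filter by title instead of dropping position 0, in case of exact ties.
    return [(t, float(d)) for t, d in pairs if t != title][:n]

# Toy data (invented values) standing in for movie_features_df1.
toy = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 0, 0], [0, 1, 5, 4], [1, 0, 4, 5]],
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
    columns=[1, 2, 3, 4],
    dtype=float,
)

for rank, (name, dist) in enumerate(recommend(toy, "Movie A", n=2), start=1):
    print(f"{rank}: {name}, with distance of {dist:.4f}")
```

Refitting the model on every call keeps the sketch short; for repeated queries it would make more sense to fit once and reuse the fitted model.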


## (2) Book Recommendation using Nearest Neighbors

Book dataset link: https://www2.informatik.uni-freiburg.de/~cziegler/BX/

In [4]:
ratings_df=pd.read_csv("Downloads/BX-CSV-Dump/BX-Book-Ratings.csv",sep=";",encoding="latin-1")

In [6]:
ratings_df

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6
...,...,...,...
1149775,276704,1563526298,9
1149776,276706,0679447156,0
1149777,276709,0515107662,10
1149778,276721,0590442449,10


In [7]:
books_df=pd.read_csv("Downloads/BX-CSV-Dump/BX-Books.csv",sep=";",encoding="latin-1",
                     on_bad_lines="skip")


In [8]:
books_df

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...
...,...,...,...,...,...,...,...,...
271355,0440400988,There's a Bat in Bunk Five,Paula Danziger,1988,Random House Childrens Pub (Mm),http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...
271356,0525447644,From One to One Hundred,Teri Sloat,1991,Dutton Books,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...
271357,006008667X,Lily Dale : The True Story of the Town that Ta...,Christine Wicker,2004,HarperSanFrancisco,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...
271358,0192126040,Republic (World's Classics),Plato,1996,Oxford University Press,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...
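
The `on_bad_lines="skip"` option used when loading the books file tells pandas to drop rows it cannot align with the header instead of raising an error. The sketch below reproduces that on an invented three-row sample. The `dtype` argument is an addition that is not in the notebook, shown because ISBNs can start with zeros.

```python
import io
import pandas as pd

# A tiny semicolon-separated sample in the style of BX-Books.csv (invented rows).
# The second data row has one field too many, so the parser cannot align it.
raw = (
    "ISBN;Book-Title;Book-Author\n"
    "0000000001;Good Book;A. Author\n"
    "0000000002;Broken Row;B. Author;unexpected\n"
    "0000000003;Another Book;C. Author\n"
)

# on_bad_lines="skip" (pandas 1.3+) drops rows with too many fields
# instead of raising a ParserError. dtype=str keeps leading zeros in ISBNs.
books = pd.read_csv(
    io.StringIO(raw), sep=";", dtype={"ISBN": str}, on_bad_lines="skip"
)

print(books["ISBN"].tolist())
```

Skipped rows disappear without a trace, so comparing the loaded row count with the file's line count is a cheap sanity check.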


In [9]:
users_df=pd.read_csv("Downloads/BX-CSV-Dump/BX-Users.csv",sep=";",encoding="latin-1")

In [10]:
users_df

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",
...,...,...,...
278853,278854,"portland, oregon, usa",
278854,278855,"tacoma, washington, united kingdom",50.0
278855,278856,"brampton, ontario, canada",
278856,278857,"knoxville, tennessee, usa",


In [11]:
combined_book_ratings=pd.merge(ratings_df,books_df,on="ISBN")
combined_book_ratings

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
1,2313,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
2,6543,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
3,8680,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
4,10314,034545104X,9,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
...,...,...,...,...,...,...,...,...,...,...
1031131,276688,0517145553,0,Mostly Harmless,Douglas Adams,1995,Random House Value Pub,http://images.amazon.com/images/P/0517145553.0...,http://images.amazon.com/images/P/0517145553.0...,http://images.amazon.com/images/P/0517145553.0...
1031132,276688,1575660792,7,Gray Matter,Shirley Kennett,1996,Kensington Publishing Corporation,http://images.amazon.com/images/P/1575660792.0...,http://images.amazon.com/images/P/1575660792.0...,http://images.amazon.com/images/P/1575660792.0...
1031133,276690,0590907301,0,Triplet Trouble and the Class Trip (Triplet Tr...,Debbie Dadey,1997,Apple,http://images.amazon.com/images/P/0590907301.0...,http://images.amazon.com/images/P/0590907301.0...,http://images.amazon.com/images/P/0590907301.0...
1031134,276704,0679752714,0,A Desert of Pure Feeling (Vintage Contemporaries),Judith Freeman,1997,Vintage Books USA,http://images.amazon.com/images/P/0679752714.0...,http://images.amazon.com/images/P/0679752714.0...,http://images.amazon.com/images/P/0679752714.0...


In [12]:
columns = ["Book-Author","Year-Of-Publication","Publisher","Image-URL-S","Image-URL-M","Image-URL-L"]
combined_book_ratings_df=combined_book_ratings.drop(columns=columns)
combined_book_ratings_df

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title
0,276725,034545104X,0,Flesh Tones: A Novel
1,2313,034545104X,5,Flesh Tones: A Novel
2,6543,034545104X,0,Flesh Tones: A Novel
3,8680,034545104X,5,Flesh Tones: A Novel
4,10314,034545104X,9,Flesh Tones: A Novel
...,...,...,...,...
1031131,276688,0517145553,0,Mostly Harmless
1031132,276688,1575660792,7,Gray Matter
1031133,276690,0590907301,0,Triplet Trouble and the Class Trip (Triplet Tr...
1031134,276704,0679752714,0,A Desert of Pure Feeling (Vintage Contemporaries)


In [13]:
combined_book_ratings_df.isnull().sum()

User-ID        0
ISBN           0
Book-Rating    0
Book-Title     0
dtype: int64

In [14]:
combined_book_ratings_df=combined_book_ratings_df.dropna(axis=0,subset=["Book-Title"])
combined_book_ratings_df

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title
0,276725,034545104X,0,Flesh Tones: A Novel
1,2313,034545104X,5,Flesh Tones: A Novel
2,6543,034545104X,0,Flesh Tones: A Novel
3,8680,034545104X,5,Flesh Tones: A Novel
4,10314,034545104X,9,Flesh Tones: A Novel
...,...,...,...,...
1031131,276688,0517145553,0,Mostly Harmless
1031132,276688,1575660792,7,Gray Matter
1031133,276690,0590907301,0,Triplet Trouble and the Class Trip (Triplet Tr...
1031134,276704,0679752714,0,A Desert of Pure Feeling (Vintage Contemporaries)


In [15]:
combined_book_ratings_df["Book-Rating"].value_counts()

0     647294
8      91804
10     71225
7      66402
9      60778
5      45355
6      31687
4       7617
3       5118
2       2375
1       1481
Name: Book-Rating, dtype: int64
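
In the counts above, 647294 of the 1031136 merged rows (about 63%) have a rating of 0. As far as the Book-Crossing dataset description goes, 0 marks an implicit rating rather than a score on the 1 to 10 scale. If that reading is right, filling missing cells with 0 later makes these rows look the same as "not rated". The notebook keeps them; the sketch below, on invented data, only shows how the two kinds of rows could be separated.

```python
import pandas as pd

# Toy ratings in the BX layout (invented values).
toy_ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 2, 3],
    "ISBN": ["a", "b", "a", "c", "b"],
    "Book-Rating": [0, 8, 10, 0, 5],
})

# Split rows by whether a score on the 1-10 scale was given.
explicit = toy_ratings[toy_ratings["Book-Rating"] > 0]
implicit = toy_ratings[toy_ratings["Book-Rating"] == 0]

# Share of rows that carry no explicit score.
implicit_share = len(implicit) / len(toy_ratings)
print(len(explicit), len(implicit), implicit_share)
```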

In [16]:
book_rating_count_df=(combined_book_ratings_df.groupby("Book-Title")["Book-Rating"].count().
                      reset_index().rename(columns={"Book-Rating":"TotalRatingCount"}))
book_rating_count_df

Unnamed: 0,Book-Title,TotalRatingCount
0,A Light in the Storm: The Civil War Diary of ...,4
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1
4,Beyond IBM: Leadership Marketing and Finance ...,1
...,...,...
241066,Ã?Â?lpiraten.,2
241067,Ã?Â?rger mit Produkt X. Roman.,4
241068,Ã?Â?sterlich leben.,1
241069,Ã?Â?stlich der Berge.,3


In [17]:
combined_nd_book_ratingcount_df=pd.merge(combined_book_ratings_df,book_rating_count_df,
                                         on="Book-Title")
combined_nd_book_ratingcount_df

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,TotalRatingCount
0,276725,034545104X,0,Flesh Tones: A Novel,60
1,2313,034545104X,5,Flesh Tones: A Novel,60
2,6543,034545104X,0,Flesh Tones: A Novel,60
3,8680,034545104X,5,Flesh Tones: A Novel,60
4,10314,034545104X,9,Flesh Tones: A Novel,60
...,...,...,...,...,...
1031131,276688,0425150526,0,Death Crosses the Border,1
1031132,276688,0449907422,0,Jazz Funeral: A Skip Langdon Novel,1
1031133,276690,0590907301,0,Triplet Trouble and the Class Trip (Triplet Tr...,1
1031134,276704,0679752714,0,A Desert of Pure Feeling (Vintage Contemporaries),1


In [18]:
pd.set_option("display.float_format", lambda x:"%.4f" %x)
combined_nd_book_ratingcount_df["TotalRatingCount"].describe()

count   1031136.0000
mean         69.7816
std         175.3381
min           1.0000
25%           3.0000
50%          13.0000
75%          61.0000
max        2502.0000
Name: TotalRatingCount, dtype: float64

In [38]:
book_rating_count_df["TotalRatingCount"].describe()

count   241071.0000
mean         4.2773
std         16.7387
min          1.0000
25%          1.0000
50%          1.0000
75%          3.0000
max       2502.0000
Name: TotalRatingCount, dtype: float64

In [16]:
book_rating_count_df["TotalRatingCount"].quantile(np.arange(0.9,1,0.01))

0.9000    7.0000
0.9100    8.0000
0.9200    9.0000
0.9300   10.0000
0.9400   11.0000
0.9500   13.0000
0.9600   16.0000
0.9700   20.0000
0.9800   29.0000
0.9900   50.0000
Name: TotalRatingCount, dtype: float64
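
The quantiles above show that the 0.99 quantile of the per-title counts is 50, so a cutoff of 50 ratings keeps roughly the top 1% of titles. The sketch below applies the same reasoning to an invented long-tailed series, where the share kept by a cutoff is computed directly.

```python
import numpy as np
import pandas as pd

# Toy per-title rating counts with a long tail (invented values).
counts = pd.Series([1] * 90 + [10] * 9 + [50], name="TotalRatingCount")

# Upper quantiles show how quickly the counts grow near the top.
upper = counts.quantile(np.arange(0.9, 1, 0.01))

# Share of titles that a given cutoff would keep.
cutoff = 50
kept_share = (counts >= cutoff).mean()
print(upper.iloc[0], kept_share)
```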

In [19]:
ratings_ofPopular_book=combined_nd_book_ratingcount_df[
    combined_nd_book_ratingcount_df["TotalRatingCount"]>=50]
ratings_ofPopular_book

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,TotalRatingCount
0,276725,034545104X,0,Flesh Tones: A Novel,60
1,2313,034545104X,5,Flesh Tones: A Novel,60
2,6543,034545104X,0,Flesh Tones: A Novel,60
3,8680,034545104X,5,Flesh Tones: A Novel,60
4,10314,034545104X,9,Flesh Tones: A Novel,60
...,...,...,...,...,...
730560,227447,0061092096,0,Love in Another Town,68
730561,231210,0061092096,0,Love in Another Town,68
730562,238781,0061092096,5,Love in Another Town,68
730563,244349,0061092096,0,Love in Another Town,68


In [20]:
combined_df=pd.merge(ratings_ofPopular_book,users_df,on="User-ID")
combined_df.drop("Age",axis=1,inplace=True)
combined_df

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,TotalRatingCount,Location
0,276725,034545104X,0,Flesh Tones: A Novel,60,"tyler, texas, usa"
1,2313,034545104X,5,Flesh Tones: A Novel,60,"cincinnati, ohio, usa"
2,2313,0812533550,9,Ender's Game (Ender Wiggins Saga (Paperback)),249,"cincinnati, ohio, usa"
3,2313,0679745580,8,In Cold Blood (Vintage International),55,"cincinnati, ohio, usa"
4,2313,0399146431,5,The Bonesetter's Daughter,384,"cincinnati, ohio, usa"
...,...,...,...,...,...,...
288735,108962,0060176806,8,Love in Another Town,68,"el cajon, california, usa"
288736,116812,0060176806,8,Love in Another Town,68,"fredericton, new brunswick, canada"
288737,121442,0060176806,5,Love in Another Town,68,"north little rock, arkansas, usa"
288738,159856,0060176806,3,Love in Another Town,68,"new maryland, new brunswick, canada"


In [21]:
book_features_df=combined_df.pivot_table(index="Book-Title",columns="User-ID",
                                         values="Book-Rating").fillna(0)
book_features_df

Book-Title / User-ID,8,9,14,16,17,19,23,26,32,39,...,278820,278824,278828,278832,278836,278843,278844,278846,278851,278854
10 Lb. Penalty,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
16 Lighthouse Road,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
1984,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
1st to Die: A Novel,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
2010: Odyssey Two,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
Zoya,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
"\O\"" Is for Outlaw""",0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
"\Surely You're Joking, Mr. Feynman!\"": Adventures of a Curious Character""",0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,...,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000


In [22]:
from scipy.sparse import csr_matrix

In [23]:
book_features_df_matrix=csr_matrix(book_features_df.values)
book_features_df_matrix

<2444x47994 sparse matrix of type '<class 'numpy.float64'>'
	with 113910 stored elements in Compressed Sparse Row format>
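
The matrix above has 2444 x 47994 cells, roughly 117 million, of which only 113910 (about 0.1%) are stored. The dense pivot table built on the way there needs close to 1 GB as float64. As a sketch of an alternative that the notebook does not use, the sparse matrix can be built straight from the long table. Duplicate pairs are averaged first because `pivot_table` averages them, whereas `csr_matrix` would sum them.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy long-format ratings in the BX layout (invented values).
df = pd.DataFrame({
    "User-ID": [8, 9, 8, 14, 9, 9],
    "Book-Title": ["Book A", "Book A", "Book B", "Book B", "Book C", "Book C"],
    "Book-Rating": [5, 0, 7, 9, 4, 8],   # user 9 rated "Book C" twice
})

# pivot_table averages duplicate (title, user) pairs, while csr_matrix would
# sum them, so average explicitly first to match the notebook's behavior.
cells = df.groupby(["Book-Title", "User-ID"], as_index=False)["Book-Rating"].mean()

# factorize(sort=True) maps each label to a 0-based code in sorted order.
row_codes, titles = pd.factorize(cells["Book-Title"], sort=True)
col_codes, users = pd.factorize(cells["User-ID"], sort=True)

sparse = csr_matrix(
    (cells["Book-Rating"].to_numpy(dtype=float), (row_codes, col_codes)),
    shape=(len(titles), len(users)),
)

# Compare with the dense route used in the notebook.
dense = df.pivot_table(index="Book-Title", columns="User-ID",
                       values="Book-Rating").fillna(0)
print(np.array_equal(sparse.toarray(), dense.values))
```

With this route, `titles` plays the role of `book_features_df.index` when mapping neighbor positions back to book titles.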

In [25]:
from sklearn.neighbors import NearestNeighbors

In [26]:
model_nn1=NearestNeighbors(metric="cosine",algorithm="brute")
model_nn1.fit(book_features_df_matrix)

NearestNeighbors(algorithm='brute', metric='cosine')

In [32]:
query_index1=np.random.choice(book_features_df.shape[0])
query_index1

222

In [33]:
book_features_df.index[query_index1]

'Bearing an Hourglass (Incarnations of Immortality (Paperback))'

In [34]:
book_features_df.iloc[query_index1,:].values.reshape(1,-1)

array([[0., 0., 0., ..., 0., 0., 0.]])

In [35]:
distances,indices=model_nn1.kneighbors(book_features_df.iloc[query_index1,:].values.reshape(1,-1),
                                       n_neighbors=6)

In [36]:
distances

array([[0.        , 0.70253614, 0.77716728, 0.83692632, 0.83891945,
        0.83928811]])

In [37]:
indices

array([[ 222, 1234,  709, 1861,  560,  584]], dtype=int64)

In [38]:
recommended_books=np.array(book_features_df.index)[indices]
recommended_books


array([['Bearing an Hourglass (Incarnations of Immortality (Paperback))',
        'On a Pale Horse (Incarnations of Immortality, Bk. 1)', 'Friday',
        'The Fires of Heaven (The Wheel of Time, Book 5)',
        'Dune Messiah (Dune Chronicles, Book 2)',
        "Enchanters' End Game (The Belgariad, Book 5)"]], dtype=object)

In [39]:
recommended_books[0,1:6]

array(['On a Pale Horse (Incarnations of Immortality, Bk. 1)', 'Friday',
       'The Fires of Heaven (The Wheel of Time, Book 5)',
       'Dune Messiah (Dune Chronicles, Book 2)',
       "Enchanters' End Game (The Belgariad, Book 5)"], dtype=object)

In [40]:
len(recommended_books.flatten())

6

**Book Recommendation**

In [41]:
for i in range(len(recommended_books.flatten())):
    if i==0:
        print("Recommendations for {0}:\n".format(book_features_df.index[query_index1]))
    else:
        print('{0}: {1}'.format(i,book_features_df.index[indices.flatten()[i]]))

Recommendations for Bearing an Hourglass (Incarnations of Immortality (Paperback)):

1: On a Pale Horse (Incarnations of Immortality, Bk. 1)
2: Friday
3: The Fires of Heaven (The Wheel of Time, Book 5)
4: Dune Messiah (Dune Chronicles, Book 2)
5: Enchanters' End Game (The Belgariad, Book 5)
