**About Book Crossing Dataset**

This dataset was compiled by Cai-Nicolas Ziegler in 2004 and comprises three tables: users, books, and ratings. Explicit ratings are expressed on a scale from 1 to 10 (higher values denote higher appreciation), and an implicit rating is expressed by 0.

Reference: http://www2.informatik.uni-freiburg.de/~cziegler/BX/ 

**Objective**

This project entails building a Book Recommender System for users based on user-based and item-based collaborative filtering approaches.

#### Execute the cell below to load the datasets

In [1]:
import pandas as pd
import numpy as np

In [82]:
#Loading data
books = pd.read_csv("books/books.csv", sep=";", error_bad_lines=False, encoding="latin-1")
books.columns = ['ISBN', 'bookTitle', 'bookAuthor', 'yearOfPublication', 'publisher', 'imageUrlS', 'imageUrlM', 'imageUrlL']

users = pd.read_csv('books/users.csv', sep=';', error_bad_lines=False, encoding="latin-1")
users.columns = ['userID', 'Location', 'Age']

ratings = pd.read_csv('books/ratings.csv', sep=';', error_bad_lines=False, encoding="latin-1")
ratings.columns = ['userID', 'ISBN', 'bookRating']

b'Skipping line 6452: expected 8 fields, saw 9\nSkipping line 43667: expected 8 fields, saw 10\nSkipping line 51751: expected 8 fields, saw 9\n'
b'Skipping line 92038: expected 8 fields, saw 9\nSkipping line 104319: expected 8 fields, saw 9\nSkipping line 121768: expected 8 fields, saw 9\n'
b'Skipping line 144058: expected 8 fields, saw 9\nSkipping line 150789: expected 8 fields, saw 9\nSkipping line 157128: expected 8 fields, saw 9\nSkipping line 180189: expected 8 fields, saw 9\nSkipping line 185738: expected 8 fields, saw 9\n'
b'Skipping line 209388: expected 8 fields, saw 9\nSkipping line 220626: expected 8 fields, saw 9\nSkipping line 227933: expected 8 fields, saw 11\nSkipping line 228957: expected 8 fields, saw 10\nSkipping line 245933: expected 8 fields, saw 9\nSkipping line 251296: expected 8 fields, saw 9\nSkipping line 259941: expected 8 fields, saw 9\nSkipping line 261529: expected 8 fields, saw 9\n'
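
The `error_bad_lines` argument used in the loading cell was deprecated in pandas 1.3 and removed in pandas 2.0. The following is a minimal sketch of an equivalent call on newer versions, assuming a tiny made-up in-memory file in place of `books.csv`. `on_bad_lines="skip"` drops malformed rows, and the `dtype` argument for `ISBN` is an addition (not part of the original cell) that keeps ISBNs as strings.

```python
import io
import pandas as pd

# Made-up stand-in for books.csv; the last row has one field too many
raw = (
    "ISBN;bookTitle;bookAuthor\n"
    "0195153448;Classical Mythology;Mark P. O. Morford\n"
    "0002005018;Clara Callan;Richard Bruce Wright\n"
    "0000000000;Broken; Row;Some Author\n"
)

# on_bad_lines="skip" (pandas >= 1.3) replaces error_bad_lines=False
sample_books = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    on_bad_lines="skip",
    dtype={"ISBN": str},  # keep ISBNs as strings so leading zeros survive
)
print(sample_books.shape)
```

Unlike the older argument, this variant skips the row silently; `on_bad_lines="warn"` reports each skipped line instead.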


### Check the number of records and features in each dataset

In [3]:
books.head(5)

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [88]:
print ('No.of records ',books.shape[0])
print ('features ',books.shape[1])

No.of records  271360
features  8


In [8]:
books.describe().transpose()

Unnamed: 0,count,unique,top,freq
ISBN,271360,271360,0765300346,1
bookTitle,271360,242135,Selected Poems,27
bookAuthor,271359,102023,Agatha Christie,632
yearOfPublication,271360,202,2002,13903
publisher,271358,16807,Harlequin,7535
imageUrlS,271360,271044,http://images.amazon.com/images/P/055327449X.0...,2
imageUrlM,271360,271044,http://images.amazon.com/images/P/039450691X.0...,2
imageUrlL,271357,271041,http://images.amazon.com/images/P/014070728X.0...,2


In [9]:
books.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 8 columns):
ISBN                 271360 non-null object
bookTitle            271360 non-null object
bookAuthor           271359 non-null object
yearOfPublication    271360 non-null object
publisher            271358 non-null object
imageUrlS            271360 non-null object
imageUrlM            271360 non-null object
imageUrlL            271357 non-null object
dtypes: object(8)
memory usage: 16.6+ MB


In [4]:
ratings.head(5)

Unnamed: 0,userID,ISBN,bookRating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [89]:
print ('No.of records ',ratings.shape[0])
print ('features ',ratings.shape[1])


No.of records  1149780
features  3


In [12]:
ratings.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
userID,1149780.0,140386.395126,80562.277718,2.0,70345.0,141010.0,211028.0,278854.0
bookRating,1149780.0,2.86695,3.854184,0.0,0.0,0.0,7.0,10.0


In [13]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
userID        1149780 non-null int64
ISBN          1149780 non-null object
bookRating    1149780 non-null int64
dtypes: int64(2), object(1)
memory usage: 26.3+ MB


In [6]:
users.head(5)

Unnamed: 0,userID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [90]:
print ('No.of records ',users.shape[0])
print ('features ',users.shape[1])


No.of records  278858
features  3


In [16]:
users.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
userID,278858.0,139429.5,80499.51502,1.0,69715.25,139429.5,209143.75,278858.0
Age,168096.0,34.751434,14.428097,0.0,24.0,32.0,44.0,244.0


In [17]:
users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 278858 entries, 0 to 278857
Data columns (total 3 columns):
userID      278858 non-null int64
Location    278858 non-null object
Age         168096 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 6.4+ MB


## Exploring books dataset

In [12]:
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


### Drop the last three columns, which contain image URLs that are not required for the analysis

In [29]:
# Drop the three image-URL columns in a single call
books = books.drop(columns=["imageUrlS", "imageUrlM", "imageUrlL"])

In [33]:
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company


**yearOfPublication**

### Check unique values of yearOfPublication


In [35]:
books.yearOfPublication.unique()

array([2002, 2001, 1991, 1999, 2000, 1993, 1996, 1988, 2004, 1998, 1994,
       2003, 1997, 1983, 1979, 1995, 1982, 1985, 1992, 1986, 1978, 1980,
       1952, 1987, 1990, 1981, 1989, 1984, 0, 1968, 1961, 1958, 1974,
       1976, 1971, 1977, 1975, 1965, 1941, 1970, 1962, 1973, 1972, 1960,
       1966, 1920, 1956, 1959, 1953, 1951, 1942, 1963, 1964, 1969, 1954,
       1950, 1967, 2005, 1957, 1940, 1937, 1955, 1946, 1936, 1930, 2011,
       1925, 1948, 1943, 1947, 1945, 1923, 2020, 1939, 1926, 1938, 2030,
       1911, 1904, 1949, 1932, 1928, 1929, 1927, 1931, 1914, 2050, 1934,
       1910, 1933, 1902, 1924, 1921, 1900, 2038, 2026, 1944, 1917, 1901,
       2010, 1908, 1906, 1935, 1806, 2021, '2000', '1995', '1999', '2004',
       '2003', '1990', '1994', '1986', '1989', '2002', '1981', '1993',
       '1983', '1982', '1976', '1991', '1977', '1998', '1992', '1996',
       '0', '1997', '2001', '1974', '1968', '1987', '1984', '1988',
       '1963', '1956', '1970', '1985', '1978', '1973', '1980'

As can be seen above, this field contains some incorrect entries. It looks like the publisher names 'DK Publishing Inc' and 'Gallimard' were loaded as yearOfPublication because of errors in the CSV file.

Also, some entries are strings, while the same years appear as numbers elsewhere. Both problems are fixed in the steps that follow.
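
One way to locate such entries without reading the whole array by eye is to coerce the column to numbers and look at what fails. This is a minimal sketch on a hypothetical sample Series, not the notebook's own approach (which filters on the two publisher names directly).

```python
import pandas as pd

# Hypothetical sample mixing the kinds of values seen in yearOfPublication
years = pd.Series([2002, "2000", 0, "DK Publishing Inc", "Gallimard", "1995"], dtype=object)

# Non-numeric entries become NaN instead of raising an error
numeric_years = pd.to_numeric(years, errors="coerce")

bad_entries = years[numeric_years.isna()]         # rows to inspect or drop
clean_years = numeric_years.dropna().astype(int)  # ints and numeric strings unified
print(bad_entries.tolist())
print(clean_years.tolist())
```

The same coercion also unifies `'2000'` and `2000`, which is the second problem noted above.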

### Check the rows having 'DK Publishing Inc' as yearOfPublication

In [36]:
publishingyr_DK = books[books['yearOfPublication']=='DK Publishing Inc']

In [37]:
publishingyr_DK.shape

(2, 5)

### Drop the rows having `'DK Publishing Inc'` and `'Gallimard'` as `yearOfPublication`

In [40]:
books.drop(books[books['yearOfPublication'] == 'DK Publishing Inc'].index ,inplace = True)

In [41]:
books.drop(books[books['yearOfPublication'] == 'Gallimard'].index ,inplace = True)

In [42]:
books.shape

(271357, 5)

### Change the datatype of yearOfPublication to 'int'

In [44]:
books.dtypes

ISBN                 object
bookTitle            object
bookAuthor           object
yearOfPublication    object
publisher            object
dtype: object

In [45]:
books['yearOfPublication'] = books['yearOfPublication'].astype(str).astype(int)

In [46]:
books.dtypes

ISBN                 object
bookTitle            object
bookAuthor           object
yearOfPublication     int32
publisher            object
dtype: object

### Drop NaNs in `'publisher'` column


In [47]:
# dropna returns a new DataFrame; assign the result back to books to make the drop persist
books.dropna(subset=['publisher'])

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company
5,0399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group
6,0425176428,What If?: The World's Foremost Military Histor...,Robert Cowley,2000,Berkley Publishing Group
7,0671870432,PLEADING GUILTY,Scott Turow,1993,Audioworks
8,0679425608,Under the Black Flag: The Romance and the Real...,David Cordingly,1996,Random House
9,074322678X,Where You'll Find Me: And Other Stories,Ann Beattie,2002,Scribner


## Exploring Users dataset

In [83]:
print(users.shape)
users.head()

(278858, 3)


Unnamed: 0,userID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


### Get all unique values in ascending order for column `Age`

In [84]:
age = users['Age'].unique()

In [85]:
sorted(age)

[nan,
 0.0,
 1.0,
 2.0,
 3.0,
 4.0,
 5.0,
 6.0,
 7.0,
 8.0,
 9.0,
 10.0,
 11.0,
 12.0,
 13.0,
 14.0,
 15.0,
 16.0,
 17.0,
 18.0,
 19.0,
 20.0,
 21.0,
 22.0,
 23.0,
 24.0,
 25.0,
 26.0,
 27.0,
 28.0,
 29.0,
 30.0,
 31.0,
 32.0,
 33.0,
 34.0,
 35.0,
 36.0,
 37.0,
 38.0,
 39.0,
 40.0,
 41.0,
 42.0,
 43.0,
 44.0,
 45.0,
 46.0,
 47.0,
 48.0,
 49.0,
 50.0,
 51.0,
 52.0,
 53.0,
 54.0,
 55.0,
 56.0,
 57.0,
 58.0,
 59.0,
 60.0,
 61.0,
 62.0,
 63.0,
 64.0,
 65.0,
 66.0,
 67.0,
 68.0,
 69.0,
 70.0,
 71.0,
 72.0,
 73.0,
 74.0,
 75.0,
 76.0,
 77.0,
 78.0,
 79.0,
 80.0,
 81.0,
 82.0,
 83.0,
 84.0,
 85.0,
 86.0,
 87.0,
 88.0,
 89.0,
 90.0,
 91.0,
 92.0,
 93.0,
 94.0,
 95.0,
 96.0,
 97.0,
 98.0,
 99.0,
 100.0,
 101.0,
 102.0,
 103.0,
 104.0,
 105.0,
 106.0,
 107.0,
 108.0,
 109.0,
 110.0,
 111.0,
 113.0,
 114.0,
 115.0,
 116.0,
 118.0,
 119.0,
 123.0,
 124.0,
 127.0,
 128.0,
 132.0,
 133.0,
 136.0,
 137.0,
 138.0,
 140.0,
 141.0,
 143.0,
 146.0,
 147.0,
 148.0,
 151.0,
 152.0,
 156.0,
 157.0,
 159.0,


The Age column has some invalid entries: NaN, 0, and implausibly high values of 100 and above.

In [78]:
users.any()

userID      True
Location    True
Age         True
dtype: bool

### Values below 5 and above 90 do not make much sense for book ratings, so replace them with NaN

In [93]:
# Treat ages above 90 or below 5 as missing
users.loc[(users['Age'] > 90) | (users['Age'] < 5), 'Age'] = np.nan
users['Age'].mean()

34.72384041631816

### Replace null values in column `Age` with mean

In [94]:
users['Age'] = users['Age'].replace(np.nan,users['Age'].mean())
users['Age'].isnull().sum()

0

### Change the datatype of `Age` to `int`

In [95]:
users['Age'] = users['Age'].astype(int)


In [96]:
print(sorted(users.Age.unique()))

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
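
Note that `astype(int)` truncates, so the imputed mean of 34.72 becomes 34 rather than 35. The sketch below walks through the same three steps (mask, impute, cast) on a hypothetical six-value sample and compares truncation with rounding; which of the two is preferable is a judgment call.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including missing and implausible values
ages = pd.Series([np.nan, 0.0, 18.0, 17.0, 244.0, 63.0])

# Keep only ages between 5 and 90; everything else becomes NaN
valid = ages.where(ages.between(5, 90))

# Impute with the mean of the valid ages (18, 17 and 63)
filled = valid.fillna(valid.mean())

truncated = filled.astype(int)        # drops the fractional part
rounded = filled.round().astype(int)  # rounds to the nearest integer
print(truncated.tolist())
print(rounded.tolist())
```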


## Exploring the Ratings Dataset

### Check the shape

In [97]:
ratings.shape

(1149780, 3)

In [98]:
n_users = users.shape[0]
n_books = books.shape[0]

In [99]:
ratings.head(5)

Unnamed: 0,userID,ISBN,bookRating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


### The ratings dataset should contain only books that exist in our books dataset. Drop the remaining rows

In [100]:
ratings = ratings[ratings.ISBN.isin(books.ISBN)]
ratings.shape

(1031136, 3)

### The ratings dataset should contain only ratings from users who exist in the users dataset. Drop the remaining rows

In [101]:
ratings = ratings[ratings.userID.isin(users.userID)]
ratings.shape 

(1031136, 3)
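
The shape is unchanged after the second filter, so every remaining rating already belonged to a known user. Both filters can be sketched as a single boolean mask; the three small frames below are hypothetical stand-ins for `ratings`, `books` and `users`.

```python
import pandas as pd

# Hypothetical miniature versions of the three tables
sample_ratings = pd.DataFrame({
    "userID": [1, 1, 2, 9],
    "ISBN": ["A", "B", "Z", "A"],
    "bookRating": [5, 0, 7, 8],
})
sample_books = pd.DataFrame({"ISBN": ["A", "B"]})
sample_users = pd.DataFrame({"userID": [1, 2]})

# Keep a rating only if both its book and its user are known
known_book = sample_ratings["ISBN"].isin(sample_books["ISBN"])
known_user = sample_ratings["userID"].isin(sample_users["userID"])
kept = sample_ratings[known_book & known_user]
print(kept.shape)
```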

### Consider only explicit ratings from 1 to 10 and leave out the 0s in column `bookRating`

In [102]:
ratings = ratings[ratings['bookRating']>=1]
ratings = ratings[ratings['bookRating']<=10]
ratings.shape

(383842, 3)

### Find out which rating has been given the highest number of times

In [103]:
ratings['bookRating'].value_counts()

8     91804
10    71225
7     66402
9     60778
5     45355
6     31687
4      7617
3      5118
2      2375
1      1481
Name: bookRating, dtype: int64

From the result above, rating 8 was given most often (91,804 times).
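
The most frequent rating can also be read off programmatically rather than from the printed table. A minimal sketch on a hypothetical sample Series:

```python
import pandas as pd

# Hypothetical sample of explicit ratings
sample_ratings = pd.Series([8, 10, 8, 7, 8, 10], name="bookRating")

counts = sample_ratings.value_counts()
most_common = counts.idxmax()  # rating value with the highest count
print(most_common, counts.loc[most_common])
```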

### **Collaborative Filtering Based Recommendation Systems**

### For more accurate results, consider only users who have rated at least 100 books

In [104]:
# Count ratings per user and keep the userIDs with at least 100 ratings
ratings_per_user = ratings['userID'].value_counts()
ratedatleast100 = ratings_per_user[ratings_per_user > 99].index
ratedatleast100

Int64Index([ 11676,  98391, 189835, 153662,  23902, 235105,  76499, 171118,
             16795, 248718,
            ...
            117384,  36299, 169682, 211919, 156300,  95010,  33145,  26544,
            208406,  36609],
           dtype='int64', length=449)
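
An alternative way to apply the same activity threshold, sketched here on a hypothetical six-row table, is to attach each user's rating count to every row with `groupby(...).transform('size')` and filter on it directly. The threshold of 2 is only for the toy data; the notebook uses 100.

```python
import pandas as pd

# Hypothetical miniature ratings table
sample = pd.DataFrame({
    "userID": [1, 1, 1, 2, 3, 3],
    "ISBN": list("abcdef"),
    "bookRating": [5, 6, 7, 8, 9, 10],
})

min_ratings = 2  # toy threshold; the notebook uses 100

# Number of ratings by the same user, aligned with each row
per_row_count = sample.groupby("userID")["ISBN"].transform("size")
active = sample[per_row_count >= min_ratings]
print(sorted(active["userID"].unique()))
```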

In [105]:
ratings = ratings[ratings['userID'].isin(ratedatleast100)]
ratings

Unnamed: 0,userID,ISBN,bookRating
1456,277427,002542730X,10
1458,277427,003008685X,8
1461,277427,0060006641,10
1465,277427,0060542128,7
1474,277427,0061009059,9
1477,277427,0062507109,8
1483,277427,0132220598,8
1488,277427,0140283374,6
1490,277427,014039026X,8
1491,277427,0140390715,7


In [106]:
book_data1 = pd.merge(ratings, books, on='ISBN')
book_data1.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...
1,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...
2,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...
3,52584,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...
4,110934,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...,http://images.amazon.com/images/P/002542730X.0...


### Generating ratings matrix from explicit ratings


#### Note: since NaNs cannot be handled by training algorithms, replace these by 0, which indicates absence of ratings

In [107]:
Ratings_df = ratings.pivot(index = 'userID', columns = 'ISBN', values = 'bookRating').fillna(0)
Ratings_df.head()
Ratings_df.shape

(449, 66574)

In [108]:
Ratings_df.isna().sum().sum()

0
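
After `fillna(0)`, a zero in this matrix means "not rated" rather than a low rating, and most cells of a 449 x 66574 user-by-book matrix are likely to be such zeros. A minimal sketch of how the share of actual ratings could be measured, using a hypothetical four-row table:

```python
import pandas as pd

# Hypothetical miniature ratings table
sample = pd.DataFrame({
    "userID": [1, 1, 2, 3],
    "ISBN": ["A", "B", "A", "C"],
    "bookRating": [8, 10, 7, 9],
})

matrix = sample.pivot(index="userID", columns="ISBN", values="bookRating").fillna(0)

# Fraction of cells that hold an actual rating
density = (matrix.to_numpy() != 0).mean()
print(matrix.shape, density)
```

Applied to `Ratings_df`, the same expression would show how much of the matrix the SVD step has to fill in.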

### Generate the predicted ratings using SVD with 50 singular values

In [110]:
from scipy.sparse.linalg import svds
# Pass the underlying NumPy array; svds works on arrays and sparse matrices
U, sigma, Vt = svds(Ratings_df.values, k = 50)

In [111]:
sigma

array([147.92121923, 149.34383594, 150.07402888, 152.20117656,
       152.87418704, 154.61308372, 154.8009367 , 155.9576219 ,
       158.05647521, 159.21079485, 159.81671145, 162.01964856,
       162.77851803, 163.33054935, 166.02489424, 166.8162491 ,
       168.04973155, 170.7748546 , 171.01325758, 173.2942917 ,
       174.57625311, 176.65730369, 178.61914388, 180.29517228,
       182.25079073, 184.10707214, 187.61690418, 189.75277078,
       190.96974587, 195.14643816, 199.83137575, 201.70083342,
       202.18713995, 203.48701648, 207.26450247, 209.9298699 ,
       213.23599152, 216.88280902, 224.26954903, 231.66187407,
       235.67096044, 249.95821298, 252.02931103, 261.24819623,
       267.9821169 , 281.01208736, 293.69539697, 379.58353138,
       634.74439146, 680.41331629])
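
`svds` returns the three factors of a truncated SVD: `U` (users by k), the k singular values, and `Vt` (k by books). Multiplying them back together gives the best rank-k approximation of the ratings matrix, and that product is what serves as the predicted ratings. A minimal sketch on a hypothetical random matrix of exact rank 2, where keeping k = 2 factors recovers the matrix up to numerical error:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# Hypothetical 6 x 5 matrix of exact rank 2
A = rng.random((6, 2)) @ rng.random((2, 5))

# Truncated SVD keeping k = 2 singular values
U, s, Vt = svds(A, k=2)

# Rebuild the matrix from the k factors
A_hat = U @ np.diag(s) @ Vt
print(U.shape, s.shape, Vt.shape)
print(np.allclose(A, A_hat, atol=1e-6))
```

With real ratings the matrix is not low-rank, so the reconstruction differs from the input, and those differences in the unrated cells are what get ranked as recommendations.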

### Take a particular user_id

### Let's find the recommendations for the user with id `2110`

#### Note: Execute the below cells to get the variables loaded

In [112]:
userID = 2110

In [113]:
user_id = 2  # row index 2 (0-based) of userID 2110 in the ratings matrix and the predicted matrix
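
The row position is set by hand here. One way to derive it from the matrix index instead, so that the userID and its row cannot drift apart, is `Index.get_loc`. This is a sketch on a hypothetical three-user matrix; the userIDs other than 2110 are made up.

```python
import pandas as pd

# Hypothetical ratings matrix indexed by userID
sample_matrix = pd.DataFrame(
    [[0, 5], [3, 0], [0, 8]],
    index=pd.Index([254, 507, 2110], name="userID"),
    columns=["A", "B"],
)

# 0-based row position of a given userID in the matrix
row_position = sample_matrix.index.get_loc(2110)
print(row_position)
```

In the notebook the equivalent lookup would be `Ratings_df.index.get_loc(userID)`.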

### Get the predicted ratings for userID `2110` and sort them in descending order

In [114]:
sigma = np.diag(sigma)

In [115]:
sigma

array([[147.92121923,   0.        ,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       [  0.        , 149.34383594,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        , 150.07402888, ...,   0.        ,
          0.        ,   0.        ],
       ...,
       [  0.        ,   0.        ,   0.        , ..., 379.58353138,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
        634.74439146,   0.        ],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
          0.        , 680.41331629]])

In [116]:
all_users_predicted_ratings = np.dot(np.dot(U, sigma), Vt)

In [117]:
preds_df = pd.DataFrame(all_users_predicted_ratings, columns = Ratings_df.columns)
preds_df.head()

ISBN,0000913154,0001046438,000104687X,0001047213,0001047973,000104799X,0001048082,0001053736,0001053744,0001055607,...,B000092Q0A,B00009EF82,B00009NDAN,B0000DYXID,B0000T6KHI,B0000VZEJQ,B0000X8HIE,B00013AX9E,B0001I1KOG,B000234N3A
0,0.025341,-0.002146,-0.001431,-0.002146,-0.002146,0.002971,-0.00392,0.007035,0.007035,0.012316,...,0.00018,0.000226,0.042081,-0.016804,-0.080027,0.004746,0.028314,0.00012,-0.001693,0.067503
1,-0.010012,-0.003669,-0.002446,-0.003669,-0.003669,0.001075,0.00144,-0.0035,-0.0035,0.001612,...,-0.000363,0.000403,0.008142,0.001104,-0.029224,0.000999,0.002363,-0.000242,2.9e-05,-0.013059
2,-0.015054,-0.015457,-0.010304,-0.015457,-0.015457,0.007281,-0.014033,0.011941,0.011941,0.011796,...,-0.000455,0.001907,0.047982,0.005737,0.117859,0.006945,0.003119,-0.000304,0.009009,-0.057692
3,-0.021499,0.035602,0.023735,0.035602,0.035602,0.030307,0.024215,-0.001053,-0.001053,0.067579,...,0.002971,0.009912,0.086248,-0.008818,0.016154,0.028848,-0.000125,0.001981,0.031201,-0.046664
4,0.002077,-0.007965,-0.00531,-0.007965,-0.007965,0.002947,0.003057,0.000231,0.000231,0.00608,...,0.00212,0.001597,-0.012181,0.00942,0.673458,0.002591,-0.008229,0.001413,0.004918,0.047773


In [118]:
def recommend_books(predictions_df, user_row, user_id, books_df, original_ratings_df, num_recommendations = 5):
    # user_row is the 0-based row position of the user in predictions_df;
    # user_id is the actual userID used in the ratings table
    sorted_user_predictions = predictions_df.loc[user_row].sort_values(ascending = False)

    # Books the user has already rated, joined with the book details
    user_data = original_ratings_df[original_ratings_df.userID == user_id]
    user_full = (user_data.merge(books_df, how = 'left', on = 'ISBN').
                sort_values(['bookRating'], ascending = False)
                )
    print('User {0} has already rated {1} books.'.format(user_id, user_full.dropna().shape[0]))
    print('Recommending the highest {0} predicted ratings books not already read.'.format(num_recommendations))

    # Unrated books ranked by predicted rating; the Predictions column itself is dropped from the result
    recommendations = (books_df[~books_df['ISBN'].isin(user_full['ISBN'])].
                      merge(pd.DataFrame(sorted_user_predictions).reset_index(), how = 'left', on = 'ISBN').
                      rename(columns = {user_row: 'Predictions'}).
                      sort_values('Predictions', ascending = False).
                      iloc[:num_recommendations, :-1])
    return user_full, recommendations, sorted_user_predictions, user_data, user_full

In [119]:
already_rated, predictions, sorted_user_predictions, user_data, user_full = recommend_books(preds_df, 2, 2110, books, ratings, 10)

User 2110 has already rated 103 books.
Recommending the highest 10 predicted ratings books not already read.


In [120]:
#Already rated
already_rated

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
76,2110,067166865X,10,STAR TREK YESTERDAY'S SON (Star Trek: The Orig...,A.C. Crispin,1988,Audioworks,http://images.amazon.com/images/P/067166865X.0...,http://images.amazon.com/images/P/067166865X.0...,http://images.amazon.com/images/P/067166865X.0...
52,2110,0590109715,10,"The Andalite Chronicles (Elfangor's Journey, A...",Katherine Applegate,1997,Apple,http://images.amazon.com/images/P/0590109715.0...,http://images.amazon.com/images/P/0590109715.0...,http://images.amazon.com/images/P/0590109715.0...
64,2110,0590629786,10,"The Visitor (Animorphs, No 2)",K. A. Applegate,1996,Scholastic,http://images.amazon.com/images/P/0590629786.0...,http://images.amazon.com/images/P/0590629786.0...,http://images.amazon.com/images/P/0590629786.0...
63,2110,0590629778,10,"The Invasion (Animorphs, No 1)",K. A. Applegate,1996,Scholastic,http://images.amazon.com/images/P/0590629778.0...,http://images.amazon.com/images/P/0590629778.0...,http://images.amazon.com/images/P/0590629778.0...
61,2110,059046678X,10,The Yearbook,Peter Lerangis,1994,Scholastic,http://images.amazon.com/images/P/059046678X.0...,http://images.amazon.com/images/P/059046678X.0...,http://images.amazon.com/images/P/059046678X.0...
55,2110,059035342X,10,Harry Potter and the Sorcerer's Stone (Harry P...,J. K. Rowling,1999,Arthur A. Levine Books,http://images.amazon.com/images/P/059035342X.0...,http://images.amazon.com/images/P/059035342X.0...,http://images.amazon.com/images/P/059035342X.0...
93,2110,0812505042,10,The Time Machine,H. G. Wells,1995,Tor Books,http://images.amazon.com/images/P/0812505042.0...,http://images.amazon.com/images/P/0812505042.0...,http://images.amazon.com/images/P/0812505042.0...
54,2110,0590213040,10,The Andalite's Gift (Animorphs : Megamorphs 1),K. A. Applegate,1997,Scholastic,http://images.amazon.com/images/P/0590213040.0...,http://images.amazon.com/images/P/0590213040.0...,http://images.amazon.com/images/P/0590213040.0...
53,2110,0590109960,10,Watchers #1: Last Stop,Peter Lerangis,1998,Scholastic,http://images.amazon.com/images/P/0590109960.0...,http://images.amazon.com/images/P/0590109960.0...,http://images.amazon.com/images/P/0590109960.0...
82,2110,0679805265,10,Long Shot (Three Investigators Crimebusters (P...,Megan Stine,1993,Random House Children's Books,http://images.amazon.com/images/P/0679805265.0...,http://images.amazon.com/images/P/0679805265.0...,http://images.amazon.com/images/P/0679805265.0...


In [121]:
#recommendation for user
predictions

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
407,0316666343,The Lovely Bones: A Novel,Alice Sebold,2002,"Little, Brown",http://images.amazon.com/images/P/0316666343.0...,http://images.amazon.com/images/P/0316666343.0...,http://images.amazon.com/images/P/0316666343.0...
2116,0345350499,The Mists of Avalon,MARION ZIMMER BRADLEY,1987,Del Rey,http://images.amazon.com/images/P/0345350499.0...,http://images.amazon.com/images/P/0345350499.0...,http://images.amazon.com/images/P/0345350499.0...
2438,0440214041,The Pelican Brief,John Grisham,1993,Dell,http://images.amazon.com/images/P/0440214041.0...,http://images.amazon.com/images/P/0440214041.0...,http://images.amazon.com/images/P/0440214041.0...
455,044021145X,The Firm,John Grisham,1992,Bantam Dell Publishing Group,http://images.amazon.com/images/P/044021145X.0...,http://images.amazon.com/images/P/044021145X.0...,http://images.amazon.com/images/P/044021145X.0...
521,0312195516,The Red Tent (Bestselling Backlist),Anita Diamant,1998,Picador USA,http://images.amazon.com/images/P/0312195516.0...,http://images.amazon.com/images/P/0312195516.0...,http://images.amazon.com/images/P/0312195516.0...
20670,0345318862,Golem in the Gears (Xanth Novels (Paperback)),PIERS ANTHONY,1986,Del Rey,http://images.amazon.com/images/P/0345318862.0...,http://images.amazon.com/images/P/0345318862.0...,http://images.amazon.com/images/P/0345318862.0...
4810,0345313151,Bearing an Hourglass (Incarnations of Immortal...,Piers Anthony,1991,Del Rey Books,http://images.amazon.com/images/P/0345313151.0...,http://images.amazon.com/images/P/0345313151.0...,http://images.amazon.com/images/P/0345313151.0...
6320,0380752891,"Man from Mundania (Xanth Trilogy, No 12)",Piers Anthony,1990,Harper Mass Market Paperbacks,http://images.amazon.com/images/P/0380752891.0...,http://images.amazon.com/images/P/0380752891.0...,http://images.amazon.com/images/P/0380752891.0...
44448,051511605X,Undue Influence,Steven Paul Martini,1995,Jove Books,http://images.amazon.com/images/P/051511605X.0...,http://images.amazon.com/images/P/051511605X.0...,http://images.amazon.com/images/P/051511605X.0...
8977,043936213X,Harry Potter and the Sorcerer's Stone (Book 1),J. K. Rowling,2001,Scholastic,http://images.amazon.com/images/P/043936213X.0...,http://images.amazon.com/images/P/043936213X.0...,http://images.amazon.com/images/P/043936213X.0...


In [122]:
sorted_user_predictions

ISBN
0316666343    1.015398
059035342X    0.778666
0345350499    0.697309
0440214041    0.665439
044021145X    0.663549
0312195516    0.642840
0345318862    0.639465
0345313151    0.631446
0380752891    0.629143
051511605X    0.617955
043936213X    0.614288
0385504209    0.613232
0312966970    0.605433
0440213525    0.603722
0812548051    0.602907
0380752859    0.598778
0345322231    0.588352
0345318854    0.579408
0452282152    0.572168
0812548094    0.571572
0812517725    0.564544
0345322215    0.560832
0440211727    0.560544
0380759489    0.559758
0812551478    0.558102
006016848X    0.550815
0345313097    0.547583
0886774802    0.544630
0553280368    0.541396
0446310786    0.540218
                ...   
055321313X   -0.128899
0140251367   -0.131740
0836220854   -0.132223
0670835382   -0.133569
0375706062   -0.134258
0440404193   -0.134705
0679767789   -0.135518
0062501860   -0.135882
0140501800   -0.136724
0515090166   -0.137002
0140328696   -0.137862
0515095826   -0.138192
052594
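
In the output above, `059035342X` has the second-highest predicted rating, yet it is absent from the recommendations because user 2110 already rated it (it appears in the already-rated table). The core of that step, dropping rated books and then taking the top N, can be sketched on a hypothetical Series of predictions:

```python
import pandas as pd

# Hypothetical predicted ratings for one user, keyed by ISBN
predicted = pd.Series({"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.5})
already_rated_isbns = ["B"]

# Drop the books the user has rated, then keep the N highest predictions
top_n = predicted.drop(index=already_rated_isbns).nlargest(2)
print(top_n.index.tolist())
```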

### Create a dataframe named `user_data` containing the books that userID `2110` has explicitly rated

In [127]:
user_data_2100 = ratings[ratings['userID'] == 2110]
user_data_2100.head()

Unnamed: 0,userID,ISBN,bookRating
14448,2110,0060987529,7
14449,2110,0064472779,8
14450,2110,0140022651,10
14452,2110,0142302163,8
14453,2110,0151008116,5


In [128]:
user_data_2100.shape

(103, 3)

In [123]:
user_data.head()

Unnamed: 0,userID,ISBN,bookRating
14448,2110,0060987529,7
14449,2110,0064472779,8
14450,2110,0140022651,10
14452,2110,0142302163,8
14453,2110,0151008116,5


In [124]:
user_data.shape

(103, 3)

### Combine `user_data` and the corresponding book data (`book_data`) in a single dataframe named `user_full_info`

In [125]:
user_full

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
76,2110,067166865X,10,STAR TREK YESTERDAY'S SON (Star Trek: The Orig...,A.C. Crispin,1988,Audioworks,http://images.amazon.com/images/P/067166865X.0...,http://images.amazon.com/images/P/067166865X.0...,http://images.amazon.com/images/P/067166865X.0...
52,2110,0590109715,10,"The Andalite Chronicles (Elfangor's Journey, A...",Katherine Applegate,1997,Apple,http://images.amazon.com/images/P/0590109715.0...,http://images.amazon.com/images/P/0590109715.0...,http://images.amazon.com/images/P/0590109715.0...
64,2110,0590629786,10,"The Visitor (Animorphs, No 2)",K. A. Applegate,1996,Scholastic,http://images.amazon.com/images/P/0590629786.0...,http://images.amazon.com/images/P/0590629786.0...,http://images.amazon.com/images/P/0590629786.0...
63,2110,0590629778,10,"The Invasion (Animorphs, No 1)",K. A. Applegate,1996,Scholastic,http://images.amazon.com/images/P/0590629778.0...,http://images.amazon.com/images/P/0590629778.0...,http://images.amazon.com/images/P/0590629778.0...
61,2110,059046678X,10,The Yearbook,Peter Lerangis,1994,Scholastic,http://images.amazon.com/images/P/059046678X.0...,http://images.amazon.com/images/P/059046678X.0...,http://images.amazon.com/images/P/059046678X.0...
55,2110,059035342X,10,Harry Potter and the Sorcerer's Stone (Harry P...,J. K. Rowling,1999,Arthur A. Levine Books,http://images.amazon.com/images/P/059035342X.0...,http://images.amazon.com/images/P/059035342X.0...,http://images.amazon.com/images/P/059035342X.0...
93,2110,0812505042,10,The Time Machine,H. G. Wells,1995,Tor Books,http://images.amazon.com/images/P/0812505042.0...,http://images.amazon.com/images/P/0812505042.0...,http://images.amazon.com/images/P/0812505042.0...
54,2110,0590213040,10,The Andalite's Gift (Animorphs : Megamorphs 1),K. A. Applegate,1997,Scholastic,http://images.amazon.com/images/P/0590213040.0...,http://images.amazon.com/images/P/0590213040.0...,http://images.amazon.com/images/P/0590213040.0...
53,2110,0590109960,10,Watchers #1: Last Stop,Peter Lerangis,1998,Scholastic,http://images.amazon.com/images/P/0590109960.0...,http://images.amazon.com/images/P/0590109960.0...,http://images.amazon.com/images/P/0590109960.0...
82,2110,0679805265,10,Long Shot (Three Investigators Crimebusters (P...,Megan Stine,1993,Random House Children's Books,http://images.amazon.com/images/P/0679805265.0...,http://images.amazon.com/images/P/0679805265.0...,http://images.amazon.com/images/P/0679805265.0...


In [129]:
user_full_info = user_data_2100.merge(books, how='left', on='ISBN')
user_full_info = user_full_info.drop(columns=['userID', 'bookRating'])

In [130]:
user_full_info.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
0,0060987529,Confessions of an Ugly Stepsister : A Novel,Gregory Maguire,2000,Regan Books,http://images.amazon.com/images/P/0060987529.0...,http://images.amazon.com/images/P/0060987529.0...,http://images.amazon.com/images/P/0060987529.0...
1,0064472779,All-American Girl,Meg Cabot,2003,HarperTrophy,http://images.amazon.com/images/P/0064472779.0...,http://images.amazon.com/images/P/0064472779.0...,http://images.amazon.com/images/P/0064472779.0...
2,0140022651,Journey to the Center of the Earth,Jules Verne,1965,Penguin Books,http://images.amazon.com/images/P/0140022651.0...,http://images.amazon.com/images/P/0140022651.0...,http://images.amazon.com/images/P/0140022651.0...
3,0142302163,The Ghost Sitter,Peni R. Griffin,2002,Puffin Books,http://images.amazon.com/images/P/0142302163.0...,http://images.amazon.com/images/P/0142302163.0...,http://images.amazon.com/images/P/0142302163.0...
4,0151008116,Life of Pi,Yann Martel,2002,Harcourt,http://images.amazon.com/images/P/0151008116.0...,http://images.amazon.com/images/P/0151008116.0...,http://images.amazon.com/images/P/0151008116.0...
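
The left join above can be checked on a small scale. The following is a minimal sketch, not the notebook's own cell: the frames `toy_user_data` and `toy_books` and the ISBN `9999999999` are made-up stand-ins, and the `validate` and `indicator` arguments are optional additions. ISBNs are kept as strings so that leading zeros survive, `validate="many_to_one"` raises if an ISBN appears more than once in the book table, and `indicator=True` marks rated books that have no metadata.

```python
import pandas as pd

# Toy stand-ins for the notebook's frames (hypothetical, not the real dataset).
toy_user_data = pd.DataFrame({
    "userID": [2110, 2110, 2110],
    "ISBN": ["0151008116", "0142302163", "9999999999"],  # last ISBN is made up
    "bookRating": [5, 8, 7],
})
toy_books = pd.DataFrame({
    "ISBN": ["0151008116", "0142302163"],
    "bookTitle": ["Life of Pi", "The Ghost Sitter"],
})

# Left join: every rated book is kept, even when the book table has no match.
toy_full_info = toy_user_data.merge(
    toy_books, how="left", on="ISBN", validate="many_to_one", indicator=True
)

# Rows flagged 'left_only' are rated ISBNs that are missing from the book table.
unmatched = toy_full_info.loc[toy_full_info["_merge"] == "left_only", "ISBN"].tolist()

toy_full_info = toy_full_info.drop(columns=["userID", "bookRating", "_merge"])
print(toy_full_info)
print("ISBNs without book metadata:", unmatched)
```

With a left join, unmatched ratings stay in the result with missing values in the book columns, so a check like this shows how many rows of `user_full_info` might carry no title.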


### Get the top 10 recommendations for the userID given above from the books not already rated by that user

In [131]:
predictions

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
407,0316666343,The Lovely Bones: A Novel,Alice Sebold,2002,"Little, Brown",http://images.amazon.com/images/P/0316666343.0...,http://images.amazon.com/images/P/0316666343.0...,http://images.amazon.com/images/P/0316666343.0...
2116,0345350499,The Mists of Avalon,MARION ZIMMER BRADLEY,1987,Del Rey,http://images.amazon.com/images/P/0345350499.0...,http://images.amazon.com/images/P/0345350499.0...,http://images.amazon.com/images/P/0345350499.0...
2438,0440214041,The Pelican Brief,John Grisham,1993,Dell,http://images.amazon.com/images/P/0440214041.0...,http://images.amazon.com/images/P/0440214041.0...,http://images.amazon.com/images/P/0440214041.0...
455,044021145X,The Firm,John Grisham,1992,Bantam Dell Publishing Group,http://images.amazon.com/images/P/044021145X.0...,http://images.amazon.com/images/P/044021145X.0...,http://images.amazon.com/images/P/044021145X.0...
521,0312195516,The Red Tent (Bestselling Backlist),Anita Diamant,1998,Picador USA,http://images.amazon.com/images/P/0312195516.0...,http://images.amazon.com/images/P/0312195516.0...,http://images.amazon.com/images/P/0312195516.0...
20670,0345318862,Golem in the Gears (Xanth Novels (Paperback)),PIERS ANTHONY,1986,Del Rey,http://images.amazon.com/images/P/0345318862.0...,http://images.amazon.com/images/P/0345318862.0...,http://images.amazon.com/images/P/0345318862.0...
4810,0345313151,Bearing an Hourglass (Incarnations of Immortal...,Piers Anthony,1991,Del Rey Books,http://images.amazon.com/images/P/0345313151.0...,http://images.amazon.com/images/P/0345313151.0...,http://images.amazon.com/images/P/0345313151.0...
6320,0380752891,"Man from Mundania (Xanth Trilogy, No 12)",Piers Anthony,1990,Harper Mass Market Paperbacks,http://images.amazon.com/images/P/0380752891.0...,http://images.amazon.com/images/P/0380752891.0...,http://images.amazon.com/images/P/0380752891.0...
44448,051511605X,Undue Influence,Steven Paul Martini,1995,Jove Books,http://images.amazon.com/images/P/051511605X.0...,http://images.amazon.com/images/P/051511605X.0...,http://images.amazon.com/images/P/051511605X.0...
8977,043936213X,Harry Potter and the Sorcerer's Stone (Book 1),J. K. Rowling,2001,Scholastic,http://images.amazon.com/images/P/043936213X.0...,http://images.amazon.com/images/P/043936213X.0...,http://images.amazon.com/images/P/043936213X.0...


In [132]:
# Predicted ratings for the row labelled 2 in preds_df, highest first
sorted_user_predictions1 = preds_df.loc[2].sort_values(ascending=False)

In [133]:
# Keep only books the user has not rated, attach the predicted scores,
# sort by score, and return the top 10 rows without the 'Predictions' column.
recommendations = (
    books[~books['ISBN'].isin(user_full['ISBN'])]
    .merge(pd.DataFrame(sorted_user_predictions1).reset_index(),
           how='left', on='ISBN')
    .rename(columns={2: 'Predictions'})
    .sort_values('Predictions', ascending=False)
    .iloc[:10, :-1]
)

In [134]:
recommendations

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
407,0316666343,The Lovely Bones: A Novel,Alice Sebold,2002,"Little, Brown",http://images.amazon.com/images/P/0316666343.0...,http://images.amazon.com/images/P/0316666343.0...,http://images.amazon.com/images/P/0316666343.0...
2116,0345350499,The Mists of Avalon,MARION ZIMMER BRADLEY,1987,Del Rey,http://images.amazon.com/images/P/0345350499.0...,http://images.amazon.com/images/P/0345350499.0...,http://images.amazon.com/images/P/0345350499.0...
2438,0440214041,The Pelican Brief,John Grisham,1993,Dell,http://images.amazon.com/images/P/0440214041.0...,http://images.amazon.com/images/P/0440214041.0...,http://images.amazon.com/images/P/0440214041.0...
455,044021145X,The Firm,John Grisham,1992,Bantam Dell Publishing Group,http://images.amazon.com/images/P/044021145X.0...,http://images.amazon.com/images/P/044021145X.0...,http://images.amazon.com/images/P/044021145X.0...
521,0312195516,The Red Tent (Bestselling Backlist),Anita Diamant,1998,Picador USA,http://images.amazon.com/images/P/0312195516.0...,http://images.amazon.com/images/P/0312195516.0...,http://images.amazon.com/images/P/0312195516.0...
20670,0345318862,Golem in the Gears (Xanth Novels (Paperback)),PIERS ANTHONY,1986,Del Rey,http://images.amazon.com/images/P/0345318862.0...,http://images.amazon.com/images/P/0345318862.0...,http://images.amazon.com/images/P/0345318862.0...
4810,0345313151,Bearing an Hourglass (Incarnations of Immortal...,Piers Anthony,1991,Del Rey Books,http://images.amazon.com/images/P/0345313151.0...,http://images.amazon.com/images/P/0345313151.0...,http://images.amazon.com/images/P/0345313151.0...
6320,0380752891,"Man from Mundania (Xanth Trilogy, No 12)",Piers Anthony,1990,Harper Mass Market Paperbacks,http://images.amazon.com/images/P/0380752891.0...,http://images.amazon.com/images/P/0380752891.0...,http://images.amazon.com/images/P/0380752891.0...
44448,051511605X,Undue Influence,Steven Paul Martini,1995,Jove Books,http://images.amazon.com/images/P/051511605X.0...,http://images.amazon.com/images/P/051511605X.0...,http://images.amazon.com/images/P/051511605X.0...
8977,043936213X,Harry Potter and the Sorcerer's Stone (Book 1),J. K. Rowling,2001,Scholastic,http://images.amazon.com/images/P/043936213X.0...,http://images.amazon.com/images/P/043936213X.0...,http://images.amazon.com/images/P/043936213X.0...
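
The same top-N selection can also be driven from the prediction row rather than from the book table. The sketch below is an alternative formulation under stated assumptions, not the cell used above: `top_n_unrated` is a hypothetical helper, the toy frames and ISBN-like labels are invented, and it assumes the prediction row is indexed by ISBN, as the merge on `ISBN` above suggests.

```python
import pandas as pd

def top_n_unrated(pred_row, rated_isbns, books_df, n=10):
    """Hypothetical helper: n highest-scored books the user has not rated."""
    # Drop already-rated ISBNs; errors="ignore" skips ISBNs missing from pred_row.
    candidates = pred_row.drop(labels=list(rated_isbns), errors="ignore")
    top = (candidates.nlargest(n)
           .rename("Predictions")
           .rename_axis("ISBN")
           .reset_index())
    # A left merge keeps the score ordering produced by nlargest.
    return top.merge(books_df, how="left", on="ISBN")

# Toy prediction matrix: one user (row label 2), four made-up ISBN labels.
toy_preds = pd.DataFrame(
    [[0.9, 0.1, 0.7, 0.4]],
    index=[2],
    columns=["isbn_a", "isbn_b", "isbn_c", "isbn_d"],
)
toy_books = pd.DataFrame({
    "ISBN": ["isbn_a", "isbn_b", "isbn_c", "isbn_d"],
    "bookTitle": ["Title A", "Title B", "Title C", "Title D"],
})
toy_rated = {"isbn_a"}  # the user has already rated this book

result = top_n_unrated(toy_preds.loc[2], toy_rated, toy_books, n=2)
print(result)
```

Starting from the prediction row means only books that have a predicted score can be returned, whereas starting from `books` keeps unscored books in the frame until the sort pushes their missing scores to the end.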
