<a href="https://colab.research.google.com/github/Masupa/Daily-Learning/blob/main/book_recommendation_engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

    # Import Libraries:
        numpy, pandas
        
    1. Import Data:
        user, book & rating data
        
    2. Data Cleaning (each dataset)
        - missing values
        - handling dtype
        - drop irrelevant columns
        - handling range of values
        
    3. Non-Personalised Recommendation
        * Most read book
        * Highest rated book
        
    4. Personalised Recommendation
        * Collaborative-based filtering (similar users/ similar books)

    5. Matrix Factorization
    
    6. Predicting Ratings using SVD

#### Import Libraries

In [1]:
import numpy as np
import pandas as pd

#### 1. Import Data

In [2]:
book_info = pd.read_csv("https://raw.githubusercontent.com/tttgm/fellowshipai/master/book_crossing_dataset/BX-Books.csv", sep= ';', error_bad_lines= False, encoding= 'latin-1')
user_info = pd.read_csv("https://raw.githubusercontent.com/tttgm/fellowshipai/master/book_crossing_dataset/BX-Users.csv", sep= ';', error_bad_lines= False, encoding= 'latin-1')
ratings = pd.read_csv("https://raw.githubusercontent.com/tttgm/fellowshipai/master/book_crossing_dataset/BX-Book-Ratings.csv", sep= ';', error_bad_lines=False, encoding='Latin-1')

  interactivity=interactivity, compiler=compiler, result=result)


In [3]:
# user_info = pd.read_csv("./Datasets/users_data.csv", sep=";", error_bad_lines= False, encoding= 'latin-1')
# book_info = pd.read_csv("./Datasets/books_data.csv", sep=";", error_bad_lines= False, encoding= 'latin-1')
# ratings = pd.read_csv("./Datasets/book_ratings.csv", sep=";", error_bad_lines=False, encoding='Latin-1')

#### 2. Data Cleaning

`Missing Values: User Information`

In [4]:
user_info.head()

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [5]:
user_info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 278858 entries, 0 to 278857
Data columns (total 3 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   User-ID   278858 non-null  int64  
 1   Location  278858 non-null  object 
 2   Age       168096 non-null  float64
dtypes: float64(1), int64(1), object(1)
memory usage: 6.4+ MB


In [6]:
# Checking missing values; user_inf
user_info.isna().sum()

User-ID          0
Location         0
Age         110762
dtype: int64

In [7]:
# Summary Stats
pd.DataFrame(user_info.Age.agg({np.min, np.max, np.mean, np.median}))

Unnamed: 0,Age
median,32.0
mean,34.751434
amax,244.0
amin,0.0


In [8]:
# Fill missing values with median; this is unaffected by outlier values present here
user_info.fillna(value=user_info.Age.median(), inplace=True)

    Observation:
    
    * Our data contains users who are above 100years old. i.e oldest user is 244years
    * If further contains users who are 0years old.
    
    In an ideal world, it's rare to have users below 5yrs old and above 100yrs old who read books from a website.
    Therefore, we handle these buy deleting them.

In [9]:
# Remove all entries with Age < 5 and > 80
user_info = user_info[(user_info.Age > 5.0) & (user_info.Age < 80.0)]

In [10]:
# Convert "Age" dtype to int
user_info.Age = user_info.Age.astype(int)

`Missing Values: Book Information`

In [11]:
book_info.tail()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
271374,0440400988,There's a Bat in Bunk Five,Paula Danziger,1988,Random House Childrens Pub (Mm),http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...
271375,0525447644,From One to One Hundred,Teri Sloat,1991,Dutton Books,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...
271376,006008667X,Lily Dale : The True Story of the Town that Ta...,Christine Wicker,2004,HarperSanFrancisco,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...
271377,0192126040,Republic (World's Classics),Plato,1996,Oxford University Press,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...
271378,0767409752,A Guided Tour of Rene Descartes' Meditations o...,Christopher Biffle,2000,McGraw-Hill Humanities/Social Sciences/Languages,http://images.amazon.com/images/P/0767409752.0...,http://images.amazon.com/images/P/0767409752.0...,http://images.amazon.com/images/P/0767409752.0...


In [12]:
# Checking for missing values
book_info.isna().sum()

ISBN                   0
Book-Title             0
Book-Author            1
Year-Of-Publication    0
Publisher              2
Image-URL-S            0
Image-URL-M            0
Image-URL-L            3
dtype: int64

In [13]:
# Drop rows with any missing entries
book_info.dropna(how='any', axis=0, inplace=True)

In [14]:
# Convert "Year-Of-Publication" dtype to int
book_info['Year-Of-Publication'] = book_info['Year-Of-Publication'].astype(int)

In [15]:
# Summary Stats
book_info.describe()

Unnamed: 0,Year-Of-Publication
count,271373.0
mean,1959.755156
std,258.014145
min,0.0
25%,1989.0
50%,1995.0
75%,2000.0
max,2050.0


    Observations:
    
    Some of our books have a "Year-Of-Publication" that is not current. For simplicity's sake, we can set any
    book's publication year to 0 if it's beyond the current year.

In [16]:
# Set "Year-Of-Publication" to 0 for entries beyond current
book_info['Year-Of-Publication'] = list(map(lambda x: 0 if x > 2021 else x, book_info['Year-Of-Publication']))

In [17]:
# Delete "Image..." Cols
book_info.drop(columns=['Image-URL-S', 'Image-URL-M', 'Image-URL-L'], inplace=True)

In [18]:
# "Publisher" contains entires with "&amp;" 
# i.e. "W. W. Norton &amp; Company" is supposed to be "W. W. Norton & Company"
book_info['Publisher'] = book_info['Publisher'].str.replace("&amp;", "&")

`Missing Values: Ratings`

In [19]:
ratings.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [20]:
# Basic information
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
 #   Column       Non-Null Count    Dtype 
---  ------       --------------    ----- 
 0   User-ID      1149780 non-null  int64 
 1   ISBN         1149780 non-null  object
 2   Book-Rating  1149780 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 26.3+ MB


In [21]:
# Summary Stats
pd.DataFrame(ratings['Book-Rating'].agg({np.min, np.max, np.median, np.mean}))

Unnamed: 0,Book-Rating
median,0.0
mean,2.86695
amax,10.0
amin,0.0


In [22]:
ratings.isna().sum()

User-ID        0
ISBN           0
Book-Rating    0
dtype: int64

In [23]:
# Merge all datasets
data = user_info.merge(ratings, left_on='User-ID', right_on='User-ID').\
merge(book_info, left_on='ISBN', right_on='ISBN')

#### 3. Non-Personalised Recommendations

`What were the most read book? Ideally, these books could be recommended to everyone`

In [24]:
# Aggregate "Book-Title" by how many times a book appears
book_freq = pd.DataFrame(data['Book-Title'].value_counts())
# Reset index
book_freq.reset_index(inplace=True)
# Rename columns
book_freq.columns = ['Book_Title', 'Number of times book was read']

book_freq.head()

Unnamed: 0,Book_Title,Number of times book was read
0,Wild Animus,2492
1,The Lovely Bones: A Novel,1288
2,The Da Vinci Code,892
3,A Painted House,832
4,The Nanny Diaries: A Novel,823


    Note:
    
    * At the most basic level, we will recommend every new users of these books because they were the most
      read books in the data we are looking at.

`Which were the highly-rated books?`

In [25]:
# Filter-out a list of books that were rating more than 50 times
read_50 = data['Book-Title'].value_counts()[data['Book-Title'].value_counts() > 50].index

# Filtering out books by "read_50"
book_50 = data[data['Book-Title'].isin(read_50)]

# Group books by "title" and aggregate the mean-rating
rating_groupby = book_50.groupby(by='Book-Title')[['Book-Rating']].mean()

# Sort DataFrame
rating_groupby.sort_values(by='Book-Rating', ascending=False, inplace=True)

# Reseting index
rating_groupby.reset_index(inplace=True)

rating_groupby.head(10)

Unnamed: 0,Book-Title,Book-Rating
0,Free,8.017857
1,The Stand (The Complete and Uncut Edition),6.175439
2,Griffin &amp Sabine: An Extraordinary Correspo...,6.041667
3,Harry Potter and the Prisoner of Azkaban (Book 3),5.828235
4,Harry Potter and the Goblet of Fire (Book 4),5.798956
5,The Little Prince,5.785714
6,The Cat in the Hat,5.754717
7,Harry Potter and the Sorcerer's Stone (Book 1),5.729242
8,The Hobbit,5.7
9,Harry Potter and the Order of the Phoenix (Boo...,5.511628


    Note:
    
    * The following books were those highly-rated books in the data we are working with.
    
    * We filtered-out books that were rated more than 50 times. This step ensured that we overlooked
      recommending books that were rated a few times and have a high rating.

#### 4. Personalised Recommendation

    Here, we will move from recommending the most read and the highly-rated books to making the recommendation
    personal. The techniques that we will be employing is a collaborative-based filtering.

`Basic Intuition behind Collaborative-based filtering`

    Given a group of book-readers who rated a set of books in the same way (similar readers), if a subset
    of these readers rate others books in the same way, then those books can be recommended to the 
    remaining subset.
    
    Steps followed:
    
    1. Computer a user-to-item matrix for the readers and books; with ratings as values.
    2. Normalize the dataset by centering all ratings around zero, so we can fill-in the missing values with zero

        Note:
        - Missing values are due to instances where uses didn't watch or rate
          a movie.
        - Hence, we can't fill them in with zero without normalizing as this
          will misrepresent the information'

    3. We will compute cosine-similarity to find similar readers based on how
      they rated books and similar books based on how they were rated by users.

In [26]:
# Pivot_table users vs book with ratings as values
user_to_item = data.iloc[:50000].pivot_table(index='User-ID', columns=['Book-Title'], values='Book-Rating')

In [27]:
user_to_item.head()

Book-Title,1984,2nd Chance,"40 Tons Of Trouble (Women Who Dare) (Harlequin Super Romance, No 726)",7b,A Bend in the Road,A Confederacy of Dunces,A Coral Kiss,A Country Courtship (Zebra Regency Romance),A Judgement in Stone,A Kiss Remembered,A Man in Full,A Map of the World,A Monk Swimming,A Painted House,A Second Chicken Soup for the Woman's Soul (Chicken Soup for the Soul Series),A Small Dark Place,A Soldier of the Great War,"A Suitable Boy : Novel, A",A Widow for One Year,A Year by the Sea: Thoughts of an Unfinished Woman,A la vora del pou (El BalancÃÂ­),A-Z of Behaving Badly,ALL THAT REMAINS,ANGELA'S ASHES,About a boy,Affinity,Airframe,Alice in Wonderland and Through the Looking Glass (Illustrated Junior Library),Alice's Adventures in Wonderland and Through the Looking Glass,Alice's Tulips,All He Ever Wanted: A Novel,All Our Yesterdays (Large Print),All That Remains (Kay Scarpetta Mysteries (Paperback)),All or Nothing (Wheeler Large Print Books),All the King's Men,Alpha Teach Yourself American Sign Language in 24 Hours (Alpha Teach Yourself in 24 Hours),"Always A Bridesmaid (Harlequin American Romance, No 266)",Always Daddy's Girl: Understanding Your Father's Impact on Who You Are,"Amazing Grace : Lives of Children and the Conscience of a Nation, The",American Gods,...,Urn Burial,Vanished,Veronika Deschliesst Zu Sterben / Vernika Decides to Die,Vertical Poetry: Recent Poems,Victorious Christians You Should Know,Vinegar Hill (Oprah's Book Club (Paperback)),Waiting,War and Peace (Wordsworth Classics),Was It Something I Said?,"Watch Over Me (Rocky Mountain Rescue) (Harlequin Intrigue, No 454)",Watercolor School (Reader's Digest Learn-As-You-Go Guides),Wednesday the Rabbi Got Wet,Westmark,What If?: The World's Foremost Military Historians Imagine What Might Have Been,What Shall I Be (Barbie Carryalong),What a Wonderful World: A Lifetime of Recordings,Where You'll Find Me: And Other Stories,Where or When : A Novel,"Whisper of Evil (Hooper, Kay. Evil Trilogy.)",White Oleander : A Novel,Why Are Boys So Weird (Tales from Third Grade),Wicked : The Life and Times of the Wicked Witch of the West,Wicked: The Life and Times of the Wicked Witch of the West,Wie Barney es sieht.,Wild Animus,Wilderness Tips,Windmills of the Gods,Winter Raven,Winter Solstice,Wish You Well,Witness for the Prosecution,Wolf Moon,Women,Worshipful Company of Fletchers: Poems,Wouldn't Take Nothing for My Journey Now,Yarrow,Year's Best Fantasy (Year's Best Fantasy),You Can Surf the Net: Your Guide to the World of the Internet,Zodiac: The Eco-Thriller,stardust
User-ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,,,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,0.0,,,5.0,,,,,,,,,,,,,,,,,,,,,,,
9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
12,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [28]:
# Mean ratings
mean_rating = user_to_item.mean(axis=1)

# Substract all values from mean_ratings to center them
user_to_item = user_to_item.sub(mean_rating)

# Fill-in missing values with 0
user_to_item.fillna(value=0, inplace=True)

In [29]:
user_to_item.head()

Unnamed: 0_level_0,1984,2nd Chance,"40 Tons Of Trouble (Women Who Dare) (Harlequin Super Romance, No 726)",7b,A Bend in the Road,A Confederacy of Dunces,A Coral Kiss,A Country Courtship (Zebra Regency Romance),A Judgement in Stone,A Kiss Remembered,A Man in Full,A Map of the World,A Monk Swimming,A Painted House,A Second Chicken Soup for the Woman's Soul (Chicken Soup for the Soul Series),A Small Dark Place,A Soldier of the Great War,"A Suitable Boy : Novel, A",A Widow for One Year,A Year by the Sea: Thoughts of an Unfinished Woman,A la vora del pou (El BalancÃÂ­),A-Z of Behaving Badly,ALL THAT REMAINS,ANGELA'S ASHES,About a boy,Affinity,Airframe,Alice in Wonderland and Through the Looking Glass (Illustrated Junior Library),Alice's Adventures in Wonderland and Through the Looking Glass,Alice's Tulips,All He Ever Wanted: A Novel,All Our Yesterdays (Large Print),All That Remains (Kay Scarpetta Mysteries (Paperback)),All or Nothing (Wheeler Large Print Books),All the King's Men,Alpha Teach Yourself American Sign Language in 24 Hours (Alpha Teach Yourself in 24 Hours),"Always A Bridesmaid (Harlequin American Romance, No 266)",Always Daddy's Girl: Understanding Your Father's Impact on Who You Are,"Amazing Grace : Lives of Children and the Conscience of a Nation, The",American Gods,...,278202,278218,278221,278254,278255,278257,278314,278333,278342,278346,278350,278356,278373,278390,278418,278422,278469,278514,278522,278535,278541,278543,278552,278554,278561,278582,278586,278633,278681,278692,278723,278724,278798,278800,278813,278819,278832,278843,278844,278851
User-ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [30]:
# Cosine Similarity
from sklearn.metrics.pairwise import cosine_similarity

In [41]:
# Similar Books
book_to_book_matrix = pd.DataFrame(cosine_similarity(user_to_item.T), index=user_to_item.T.index, columns=user_to_item.T.index)

In [42]:
book_to_book_matrix.head()

Book-Title,2nd Chance,A Confederacy of Dunces,A Kiss Remembered,A Map of the World,A Monk Swimming,A Painted House,A Second Chicken Soup for the Woman's Soul (Chicken Soup for the Soul Series),A Soldier of the Great War,Airframe,All He Ever Wanted: A Novel,All That Remains (Kay Scarpetta Mysteries (Paperback)),All or Nothing (Wheeler Large Print Books),All the King's Men,Always Daddy's Girl: Understanding Your Father's Impact on Who You Are,An Atmosphere of Eternity: Stories of India,Angels &amp Demons,Anil's Ghost,Atonement : A Novel,Bant/Spec.Last of the Breed,Before I Say Good-Bye,Beloved (Plume Contemporary Fiction),Black Beauty (Illustrated Classics),Black Market,Bleachers,Bless The Beasts And Children : Bless The Beasts And Children,Blood Oath,Body of Evidence (Kay Scarpetta Mysteries (Paperback)),Breathing Lessons,Bridget Jones's Diary,Bringing Down the House: The Inside Story of Six M.I.T. Students Who Took Vegas for Millions,By the Rivers of Babylon,Care Packages : Letters to Christopher Reeve from Strangers and Other Friends,Carolina Moon,Chicken Soup for the Soul (Chicken Soup for the Soul),Chocolate Jesus,Clara Callan,Classical Mythology,Clifford's Sports Day,Congo,Contact,...,The Red Tent : A Novel,The Rescue,The Right Man : The Surprise Presidency of George W. Bush,"The Ruby in the Smoke (Sally Lockhart Trilogy, Book 1)",The Short Forever,The Silent Cry (William Monk Novels (Paperback)),The Soulbane Stratagem,The Street Lawyer,The Sum of All Fears,The Tall Pine Polka,The Tao of Pooh,The Testament,The Therapeutic Touch: How to Use Your Hands to Help or to Heal,"The Touch of Your Shadow, the Whisper of Your Name (Babylon 5, Book 5)",The Witchfinder (Amos Walker Mystery Series),The yawning heights,This Year It Will Be Different: And Other Stories,Through Wolf's Eyes (Wolf),Thursday Next in the Well Of Lost Plots (Thursday Next Novels (Penguin Books)),Timeline,Tis : A Memoir,To Kill a Mockingbird,Tree Grows In Brooklyn,Tu Nombre Escrito En El Agua (La Sonrisa Vertical),Turning Thirty,Twenty Minute Retreats: Revive Your Spirits in Just Minutes a Day (A Pan Self-discovery Title),"Ulysses (Ã?Ã?bersetzg. WollschlÃ?ÃÂ¤ger). ( Neue Folge, 100).",Under the Black Flag: The Romance and the Reality of Life Among the Pirates,Unnatural Exposure,Vanished,Veronika Deschliesst Zu Sterben / Vernika Decides to Die,What If?: The World's Foremost Military Historians Imagine What Might Have Been,Where You'll Find Me: And Other Stories,Where or When : A Novel,"Whisper of Evil (Hooper, Kay. Evil Trilogy.)",Wicked: The Life and Times of the Wicked Witch of the West,Wie Barney es sieht.,Wild Animus,Winter Solstice,Wish You Well
Book-Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
2nd Chance,1.0,0.001773,0.010017,-0.001855,-0.011089,0.019748,0.0,0.008205,0.013257,-0.028424,0.062041,0.0,0.0,0.0,0.0,0.027674,0.007243,-0.014228,0.002255,0.035938,0.010137,0.0,0.019925,0.03084,0.0,0.004379,0.096454,-0.004404,-0.002234,0.0,0.030585,0.010961,0.009458,0.014657,0.011914,0.017961,0.0,0.010456,0.017212,0.0,...,0.031145,0.009793,0.0,-0.004789,0.012655,0.000258,0.0,0.012835,0.008041,0.007183,0.028339,0.039321,0.0,0.0,0.0,0.0,0.012511,0.0,0.0,0.018359,0.015453,-0.003049,0.022689,0.0,0.0,0.0,0.0,0.0,0.012894,-0.015835,0.022164,0.0,0.0,0.010943,0.010684,-0.019805,0.0,0.03022,0.007427,0.03614
A Confederacy of Dunces,0.001773,1.0,-0.010511,0.00657,0.010305,0.005854,0.024598,0.0,-0.00291,-0.019883,0.00834,0.0,0.0,0.0,0.0,0.00825,0.0,0.030324,0.0,-0.00936,0.005992,0.0,0.011112,-0.000325,0.0,0.0,0.006758,0.001353,0.025018,-0.027266,-0.013012,0.019475,0.019552,0.012267,0.037541,-0.019648,0.0,0.018577,-0.002491,0.0,...,-0.00332,-0.005436,0.0,0.008935,-0.010938,0.009625,0.0,-3.2e-05,0.010482,0.025524,0.034959,0.003829,0.0,0.0,0.0,0.0,0.008235,0.0,0.0,-0.009426,0.015918,0.024815,0.0,0.0,0.0,0.0,0.0,0.0,-0.030754,0.007034,-0.024245,0.0,0.0,0.024497,0.0,-0.019839,0.0,0.01066,-0.016206,0.012772
A Kiss Remembered,0.010017,-0.010511,1.0,0.010369,0.006404,0.02762,0.0,0.0,0.01141,0.0,0.030842,0.0,0.0,0.0,0.0,0.029033,0.0,0.00283,0.0,0.043419,0.009514,0.0,0.0,0.010779,0.0,0.112098,-0.013556,0.015357,-0.010453,0.0,0.030111,0.0,0.06504,0.016927,0.0,0.024627,0.0,0.0,0.019391,0.0,...,0.0,0.020523,0.0,0.0,0.089757,0.017455,0.0,-0.011853,0.002935,0.0,0.007352,0.034046,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023742,0.017188,0.006241,-0.049166,0.0,0.0,0.0,0.0,0.0,0.004931,0.063777,0.030389,0.0,0.0,0.036748,0.013273,0.033223,0.0,0.010024,0.007116,0.037906
A Map of the World,-0.001855,0.00657,0.010369,1.0,0.03577,0.034011,0.0,-0.00281,-0.003532,0.039579,0.036917,0.0,0.009262,0.0,0.0,0.003066,0.005532,0.002413,0.0,0.03068,0.042734,0.0,0.001119,0.021841,0.0,-0.024092,0.043302,0.043133,-0.00534,0.0,0.028986,0.0,-0.010762,0.009799,0.0091,0.007382,0.0,-0.007051,0.015553,-0.011231,...,-0.012626,0.042958,0.0,0.015364,0.01619,0.000485,0.0,0.022894,0.008348,-0.029719,0.000688,0.024705,0.019683,0.0,0.0,0.0,0.004503,0.0,0.016071,0.014689,0.022349,0.017351,-0.012592,0.0,0.0,0.0,0.0,0.0,0.040232,0.022773,0.023917,0.0,0.0,-0.009197,0.023417,0.056119,0.0,0.015309,0.00927,0.002957
A Monk Swimming,-0.011089,0.010305,0.006404,0.03577,1.0,0.007291,0.0,0.0,0.02622,0.0,-0.039957,0.0,0.0,0.0,0.0,0.001298,0.0,0.002973,0.0,-0.012763,-0.016057,0.0,0.013655,-0.022677,0.0,0.0,-0.02065,0.01746,0.022742,0.0,-0.049251,0.0,0.001034,0.011395,0.0,-0.02679,0.0,-0.027832,-0.009038,0.0,...,-0.020727,-0.008179,0.0,0.0,-0.032042,-0.001296,0.0,0.004186,0.007146,-0.085725,-0.054121,-0.004872,0.0,0.0,0.0,0.0,0.022458,0.0,0.0,-0.020971,-0.012827,0.019712,-0.043083,0.0,0.0,0.0,0.0,0.0,0.003752,0.042152,-0.033059,0.0,0.0,0.024227,0.00998,-0.003213,0.0,-0.000754,0.001056,-0.010976


`Sample book recommendation`

In [53]:
def find_10_similar_books(book_title, similarity_matrix):
  """
  book_title: The title of the book with which we wish to find similar books
  similarity_matrix: The matrix of similar books
  """

  # Similar books
  top_10_similar_books = pd.DataFrame(similarity_matrix[book_title].sort_values(ascending=False)).iloc[1:]

  return top_10_similar_books.head(10)

In [50]:
book_title = "2nd Chance"
similar_books_10 = find_10_similar_books(book_title=book_title, similarity_matrix=book_to_book_matrix)
print(similar_books_10)

                                                    2nd Chance
Book-Title                                                    
The Beach House                                       0.103502
Body of Evidence (Kay Scarpetta Mysteries (Pape...    0.096454
Death in the Clouds                                   0.073438
PLEADING GUILTY                                       0.068353
The Midnight Club                                     0.065393
Season of the Machete                                 0.062080
All That Remains (Kay Scarpetta Mysteries (Pape...    0.062041
Small Town Girl                                       0.058460
Protect and Defend                                    0.049797
If Morning Ever Comes                                 0.044467


`Sample similar users`

In [51]:
# Similar Readers
user_to_user_matrix = pd.DataFrame(cosine_similarity(user_to_item), index=user_to_item.index, columns=user_to_item.index)

In [52]:
user_to_user_matrix.head()

User-ID,2,8,9,10,12,14,16,17,19,20,22,23,26,32,36,39,42,44,51,53,56,64,67,68,69,70,73,75,77,78,79,81,82,83,85,86,87,88,91,92,...,277827,277828,277901,277903,277922,277928,277932,277965,278002,278007,278026,278075,278122,278137,278144,278153,278176,278188,278194,278202,278218,278221,278254,278333,278342,278373,278390,278418,278422,278514,278535,278554,278561,278633,278692,278723,278819,278832,278843,278851
User-ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.233168,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.096379,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [54]:
def find_10_similar_readers(user_id, similarity_matrix):
  """
  user_id: The ID of a user
  similarity_matrix: The matrix of similar users
  """

  # Similar books
  top_10_similar_readers = pd.DataFrame(similarity_matrix[user_id].sort_values(ascending=False)).iloc[1:]

  return top_10_similar_readers.head(10)

In [58]:
user_id = 8
similar_readers_10 = find_10_similar_readers(user_id=user_id, similarity_matrix=user_to_user_matrix)

print(similar_readers_10)

                8
User-ID          
83160    0.405989
169187   0.405989
252282   0.400231
248583   0.400231
153461   0.400231
180297   0.400231
136509   0.324136
256233   0.320689
49300    0.319451
236322   0.285500


**5. Matrix Factorisation**

In [32]:
# Number of cells with values
empty_cells = user_to_item.isnull().values.sum()

# Total number of cells
full_cells = user_to_item.size

# Sparcity of matrix
sparcity = empty_cells / empty_cells

In [33]:
print("Sparcity:", sparcity)

Sparcity: 1.0


    Note:

    We can observe that nearly 100% of the matrix is filled-in. Therefore, using
    this matrix to build a model such as KNN to find similar users and predict
    rating will be a bad idea.

    Hence, we will focus our attention on matrix factorization to fill-in the
    matrix.

In [33]:
# Import the required libraries 
from scipy.sparse.linalg import svds

# Decompose the matrix
U, sigma, Vt = svds(user_to_item, k=5)

In [34]:
# Convert sigma into a 2D matrix
sigma = np.diag(sigma)

# Computer dot product of U and sigma
U_sigma = np.dot(U, sigma)

# Computer dot product of U_sigma and Vt
U_sigma_Vt = np.dot(U_sigma, Vt)

# Uncenter the ratings back
uncentered_ratings = U_sigma_Vt + mean_rating.values.reshape(-1, 1)

In [37]:
# DataFrame with uncentered ratings
cal_pred_ratings_df = pd.DataFrame(uncentered_ratings, index=user_to_item.index, columns=user_to_item.columns)

In [39]:
# SVD Predictions
def get_svd_predictions(rating_dataframe, user_id):
  """
  rating_dataframe: Dataframe contain uncentered ratings resulting from matrix factorisation
  user_id: Book user's ID
  """

  user_ratings = rating_dataframe.loc[user_id,:].sort_values(ascending=False)
  return user_ratings

In [40]:
user_id = 2
print(get_svd_predictions(cal_pred_ratings_df, user_id))

278851    0.0
83573     0.0
83579     0.0
83584     0.0
83587     0.0
         ... 
179826    0.0
179831    0.0
179843    0.0
179885    0.0
1984      0.0
Name: 2, Length: 19058, dtype: float64


**6. SVD Validations**

In [43]:
# Import Mean_Squared_Error
from sklearn.metrics import mean_squared_error

# Extract the ground truth to compare your predictions against
actual_values = user_to_item.iloc[:20, :100].values
predicted_values = cal_pred_ratings_df.iloc[:20, :100].values

# Create a mask of actual_values to only look at the non-missing values in the ground truth
mask = ~np.isnan(actual_values)

# Print the performance of both predictions and compare
print(mean_squared_error(actual_values[mask], predicted_values[mask], squared=False))

4.886187608400121


In [44]:
data.head()

Unnamed: 0,User-ID,Location,Age,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher
0,2,"stockton, california, usa",18,195153448,0,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,8,"timmins, ontario, canada",32,2005018,5,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,11400,"ottawa, ontario, canada",49,2005018,0,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
3,11676,"n/a, n/a, n/a",32,2005018,8,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
4,41385,"sudbury, ontario, canada",32,2005018,0,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
