##Import Libraries
Let's import some libraries to get started!

In [None]:
# import libraries (you may add additional imports but you may not have to)
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt

##Get Data
Let's start by reading the books and ratings files into two pandas dataframes, df_books and df_ratings.



In [None]:
# get data files
!wget https://cdn.freecodecamp.org/project-data/books/book-crossings.zip

!unzip book-crossings.zip

books_filename = 'BX-Books.csv'
ratings_filename = 'BX-Book-Ratings.csv'

--2022-11-21 21:28:01--  https://cdn.freecodecamp.org/project-data/books/book-crossings.zip
Resolving cdn.freecodecamp.org (cdn.freecodecamp.org)... 104.26.3.33, 172.67.70.149, 104.26.2.33, ...
Connecting to cdn.freecodecamp.org (cdn.freecodecamp.org)|104.26.3.33|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26085508 (25M) [application/zip]
Saving to: ‘book-crossings.zip’


2022-11-21 21:28:02 (20.3 MB/s) - ‘book-crossings.zip’ saved [26085508/26085508]

Archive:  book-crossings.zip
  inflating: BX-Book-Ratings.csv     
  inflating: BX-Books.csv            
  inflating: BX-Users.csv            


In [None]:
# import csv data into dataframes
df_books = pd.read_csv(
    books_filename,
    encoding = "ISO-8859-1",
    sep=";",
    header=0,
    names=['isbn', 'title', 'author'],
    usecols=['isbn', 'title', 'author'],
    dtype={'isbn': 'str', 'title': 'str', 'author': 'str'})

df_ratings = pd.read_csv(
    ratings_filename,
    encoding = "ISO-8859-1",
    sep=";",
    header=0,
    names=['user', 'isbn', 'rating'],
    usecols=['user', 'isbn', 'rating'],
    dtype={'user': 'int32', 'isbn': 'str', 'rating': 'float32'})

In [None]:
df_books.head()

Unnamed: 0,isbn,title,author
0,195153448,Classical Mythology,Mark P. O. Morford
1,2005018,Clara Callan,Richard Bruce Wright
2,60973129,Decision in Normandy,Carlo D'Este
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata
4,393045218,The Mummies of Urumchi,E. J. W. Barber


In [None]:
df_ratings.head()

Unnamed: 0,user,isbn,rating
0,276725,034545104X,0.0
1,276726,0155061224,5.0
2,276727,0446520802,0.0
3,276729,052165615X,3.0
4,276729,0521795028,6.0
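
As a small illustration of the `read_csv` options used above, the sketch below parses an in-memory sample rather than the downloaded file. The sample's original header names (`User-ID`, `ISBN`, `Book-Rating`) and the name `sample_ratings` are assumptions made for this example. It shows that `header=0` combined with `names` replaces the original header row, and that reading `isbn` as `str` preserves leading zeros and the trailing `X`.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the same semicolon-separated, quoted layout
# (the header names here are assumed for illustration)
sample_csv = StringIO(
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"276725";"034545104X";"0"\n'
    '"276726";"0155061224";"5"\n'
)

sample_ratings = pd.read_csv(
    sample_csv,
    sep=";",
    header=0,                          # row 0 is the original header ...
    names=['user', 'isbn', 'rating'],  # ... and is replaced by these names
    dtype={'user': 'int32', 'isbn': 'str', 'rating': 'float32'})

print(sample_ratings.dtypes)
print(sample_ratings)
```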


##Data Preprocessing
The first step is to count the ratings per user and per book, add these counts to the ratings dataframe, and then filter with the recommended criteria for statistical significance: keep users with at least 200 ratings and books with at least 100 ratings.

In [None]:
user_RatingCount = df_ratings.groupby('user')['rating'].count().reset_index().rename(columns = {'rating':'TotaluserRating'})
book_RatingCount = df_ratings.groupby('isbn')['rating'].count().reset_index().rename(columns = {'rating':'TotalbookRating'})
df_ratings = df_ratings.merge(user_RatingCount,how='left', left_on='user', right_on='user')
df_ratings = df_ratings.merge(book_RatingCount, how='left', left_on='isbn', right_on='isbn')
df_ratings_2 = df_ratings.loc[(df_ratings['TotaluserRating'] >= 200) & (df_ratings['TotalbookRating'] >= 100)]



In [None]:
df_ratings_2

Unnamed: 0,user,isbn,rating,TotaluserRating,TotalbookRating
1456,277427,002542730X,10.0,497,171
1469,277427,0060930535,0.0,497,494
1471,277427,0060934417,0.0,497,350
1474,277427,0061009059,9.0,497,291
1484,277427,0140067477,0.0,497,189
...,...,...,...,...,...
1147304,275970,0804111359,0.0,1376,167
1147436,275970,140003065X,0.0,1376,157
1147439,275970,1400031346,0.0,1376,106
1147440,275970,1400031354,0.0,1376,202
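
The count-and-filter step can also be tried on a toy dataframe. The sketch below is not the notebook's method: instead of the two merges it uses `groupby(...).transform('count')`, which returns one count per row. The data and the scaled-down thresholds (`MIN_USER_RATINGS`, `MIN_BOOK_RATINGS`) are made up for the example.

```python
import pandas as pd

# Toy ratings; thresholds below are scaled down from 200 / 100 for illustration
toy = pd.DataFrame({
    'user':   [1, 1, 1, 2, 2, 3],
    'isbn':   ['a', 'b', 'c', 'a', 'b', 'a'],
    'rating': [5.0, 0.0, 7.0, 8.0, 0.0, 10.0],
})

# transform('count') returns one count per row, aligned with the original index
user_counts = toy.groupby('user')['rating'].transform('count')
book_counts = toy.groupby('isbn')['rating'].transform('count')

MIN_USER_RATINGS = 2  # stands in for 200
MIN_BOOK_RATINGS = 2  # stands in for 100

# Keep rows whose user AND book both reach their threshold
toy_filtered = toy[(user_counts >= MIN_USER_RATINGS) & (book_counts >= MIN_BOOK_RATINGS)]
print(toy_filtered)
```

Here user 3 (one rating) and book `c` (one rating) fall below the thresholds, so their rows are dropped.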


The next step is to merge the filtered ratings dataframe with the books dataframe.

In [None]:
books_with_ratings = pd.merge(df_ratings_2, df_books, on='isbn')
books_with_ratings.head()


Unnamed: 0,user,isbn,rating,TotaluserRating,TotalbookRating,title,author
0,277427,002542730X,10.0,497,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
1,3363,002542730X,0.0,901,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
2,11676,002542730X,6.0,13602,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
3,12538,002542730X,10.0,1351,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
4,13552,002542730X,0.0,709,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner


It's important to remove duplicated rows, so that each user has at most one rating per title.

In [None]:
books_with_ratings_2 = books_with_ratings.drop_duplicates(['title', 'user'])

In [None]:
books_with_ratings_2

Unnamed: 0,user,isbn,rating,TotaluserRating,TotalbookRating,title,author
0,277427,002542730X,10.0,497,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
1,3363,002542730X,0.0,901,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
2,11676,002542730X,6.0,13602,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
3,12538,002542730X,10.0,1351,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
4,13552,002542730X,0.0,709,171,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner
...,...,...,...,...,...,...,...
49512,238864,0399149325,0.0,353,113,Portrait of a Killer: Jack the Ripper -- Case ...,Patricia Cornwell
49513,251843,0399149325,1.0,338,113,Portrait of a Killer: Jack the Ripper -- Case ...,Patricia Cornwell
49514,253821,0399149325,0.0,337,113,Portrait of a Killer: Jack the Ripper -- Case ...,Patricia Cornwell
49515,265115,0399149325,0.0,1221,113,Portrait of a Killer: Jack the Ripper -- Case ...,Patricia Cornwell
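
One way such duplicates can arise is when the same title appears under more than one ISBN, so a user ends up with several rows for one title after the merge. The toy frame below is invented to show the effect of `drop_duplicates(['title', 'user'])`, which keeps the first row of each (title, user) pair. Without this step, `pd.pivot_table` would average the repeated ratings, since its default `aggfunc` is the mean.

```python
import pandas as pd

# Toy merged frame: user 1 rated two editions (different ISBNs) of the same title
merged = pd.DataFrame({
    'user':   [1, 1, 2],
    'isbn':   ['111', '222', '111'],
    'rating': [8.0, 0.0, 6.0],
    'title':  ['Book A', 'Book A', 'Book A'],
})

# Keep only the first row for each (title, user) pair
deduped = merged.drop_duplicates(['title', 'user'])
print(deduped)
```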


The next step is to create a pivot table with the ratings for each book and each user, then convert it to a sparse matrix to feed the KNN model. Each user's ratings form one dimension of the vector that represents a book title.

In [None]:
books_with_ratings_pivot = pd.pivot_table(data=books_with_ratings_2, values='rating', index='title', columns='user').fillna(0)
print(books_with_ratings_pivot)

user                                                254     2276    2766    \
title                                                                        
1984                                                   9.0     0.0     0.0   
1st to Die: A Novel                                    0.0     0.0     0.0   
2nd Chance                                             0.0    10.0     0.0   
4 Blondes                                              0.0     0.0     0.0   
A Beautiful Mind: The Life of Mathematical Geni...     0.0     0.0     0.0   
...                                                    ...     ...     ...   
Without Remorse                                        0.0     0.0     0.0   
Year of Wonders                                        0.0     0.0     0.0   
You Belong To Me                                       0.0     0.0     0.0   
Zen and the Art of Motorcycle Maintenance: An I...     0.0     0.0     0.0   
\O\" Is for Outlaw"                                    0.0     0

In [None]:
books_with_ratings_matrix = csr_matrix(books_with_ratings_pivot.values)
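
To make the shape of this matrix concrete, here is a minimal sketch on invented data (the names `toy`, `toy_pivot`, and `toy_matrix` are hypothetical). Rows are titles, columns are users, and missing ratings are filled with 0 before the conversion to a sparse matrix.

```python
import pandas as pd
from scipy.sparse import csr_matrix

toy = pd.DataFrame({
    'user':   [1, 2, 2, 3],
    'title':  ['Book A', 'Book A', 'Book B', 'Book C'],
    'rating': [9.0, 7.0, 10.0, 4.0],
})

# Rows are titles, columns are users; missing ratings become 0
toy_pivot = pd.pivot_table(data=toy, values='rating', index='title', columns='user').fillna(0)
toy_matrix = csr_matrix(toy_pivot.values)

print(toy_pivot)
print(toy_matrix.shape, toy_matrix.nnz)  # nnz = number of stored non-zero entries
```

Only the non-zero entries are stored in the sparse matrix, which matters when most users have not rated most books.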



###Creating the Model
We create a KNN model and fit it on the ratings matrix we created earlier, using cosine distance as the metric.

In [None]:
model_knn = NearestNeighbors(algorithm='auto', metric='cosine')
model_knn.fit(books_with_ratings_matrix)

NearestNeighbors(metric='cosine')

In [None]:
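# the same estimator with its default parameters written out
# (this instance is neither fitted nor assigned to a variable)
NearestNeighbors(algorithm='auto', leaf_size=30, metric='cosine',
                 metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                 radius=1.0)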

NearestNeighbors(metric='cosine')
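
The cosine distance between two rating vectors a and b is 1 - (a·b) / (|a| |b|), so it depends on the direction of the vectors rather than their length. The following sketch, on a made-up 3x3 matrix with hypothetical names (`ratings`, `toy_knn`), shows what `kneighbors` returns for such a model. Note that the query row itself comes back as the nearest neighbor with distance 0.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy item-by-user matrix: 3 "books" rated by 3 "users"
ratings = np.array([
    [5.0, 0.0, 0.0],
    [4.0, 3.0, 0.0],   # cosine similarity with row 0 is 20 / (5 * 5) = 0.8
    [0.0, 3.0, 4.0],   # orthogonal to row 0, so cosine distance is 1
])

toy_knn = NearestNeighbors(algorithm='auto', metric='cosine')
toy_knn.fit(csr_matrix(ratings))

# Query with row 0; the query row itself is among the returned neighbors
distances, indices = toy_knn.kneighbors(ratings[0].reshape(1, -1), n_neighbors=3)
print(indices.flatten())
print(distances.flatten())
```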


###Creating the Recommendation Function
We create a function that returns a list of 5 similar books, each with its distance from the book passed as argument.

In [None]:
# function to return recommended books - this will be tested

def get_recommends(book = ""):
  # rating vector of the query book (one row of the pivot table)
  X = books_with_ratings_pivot[books_with_ratings_pivot.index == book]
  X = X.to_numpy().reshape(1,-1)
  # neighbor 0 is normally the query book itself (distance 0), so it is skipped below
  distances, indices = model_knn.kneighbors(X,n_neighbors=8)
  recommended_books = []
  # keep the 5 nearest neighbors, listed from farthest to nearest
  for x in reversed(range(1,6)):
      bookrecommended = [books_with_ratings_pivot.index[indices.flatten()[x]], distances.flatten()[x]]
      recommended_books.append(bookrecommended)
  recommended_books = [book, recommended_books]
  
  return recommended_books
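
`get_recommends` assumes that the title exists in the pivot table. As a possible variation, the sketch below adds a title check and takes the pivot and model as arguments so that it can run on toy data. The helper name `recommend_from_pivot`, its parameters, and the toy values are all hypothetical and not part of the challenge notebook.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy title-by-user pivot (values are made up for illustration)
toy_pivot = pd.DataFrame(
    [[5.0, 0.0, 0.0],
     [4.0, 3.0, 0.0],
     [0.0, 3.0, 4.0]],
    index=['Book A', 'Book B', 'Book C'],
    columns=[1, 2, 3],
)
toy_knn = NearestNeighbors(metric='cosine').fit(toy_pivot.values)

def recommend_from_pivot(pivot, model, book, n=2):
    """Hypothetical helper: same output shape as get_recommends, plus a title check."""
    if book not in pivot.index:
        return [book, []]              # unknown title: no recommendations
    X = pivot.loc[[book]].to_numpy()   # shape (1, number of users)
    distances, indices = model.kneighbors(X, n_neighbors=n + 1)
    # Position 0 is assumed to be the query book itself, so it is skipped
    pairs = [[pivot.index[i], float(d)]
             for i, d in zip(indices.flatten()[1:], distances.flatten()[1:])]
    return [book, pairs[::-1]]         # farthest to nearest, as in get_recommends

print(recommend_from_pivot(toy_pivot, toy_knn, 'Book A'))
print(recommend_from_pivot(toy_pivot, toy_knn, 'Missing Title'))
```

Returning an empty list for an unknown title is only one design choice; raising a `KeyError` would be equally reasonable.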

## Challenge Test
This function is provided in the challenge's initial notebook to test the model through the get_recommends function.

In [None]:
books = get_recommends("Where the Heart Is (Oprah's Book Club (Paperback))")
print(books)

def test_book_recommendation():
  test_pass = True
  recommends = get_recommends("Where the Heart Is (Oprah's Book Club (Paperback))")
  if recommends[0] != "Where the Heart Is (Oprah's Book Club (Paperback))":
    test_pass = False
  recommended_books = ["I'll Be Seeing You", 'The Weight of Water', 'The Surgeon', 'I Know This Much Is True']
  recommended_books_dist = [0.8, 0.77, 0.77, 0.77]
  for i in range(2): 
    if recommends[1][i][0] not in recommended_books:
      test_pass = False
    if abs(recommends[1][i][1] - recommended_books_dist[i]) >= 0.05:
      test_pass = False
  if test_pass:
    print("You passed the challenge! 🎉🎉🎉🎉🎉")
  else:
    print("You haven't passed yet. Keep trying!")

test_book_recommendation()

["Where the Heart Is (Oprah's Book Club (Paperback))", [["I'll Be Seeing You", 0.8016211], ['The Weight of Water', 0.77085835], ['The Surgeon', 0.7699411], ['I Know This Much Is True', 0.7677075], ['The Lovely Bones: A Novel', 0.7234864]]]
You passed the challenge! 🎉🎉🎉🎉🎉


The recommended books match the expected list, so the challenge is successfully completed.
