# Problem:

In this challenge, you will create a book recommendation algorithm using K-Nearest Neighbors.

You will use the Book-Crossings dataset. This dataset contains 1.1 million ratings (scale of 1-10) of 270,000 books by 90,000 users.

After importing and cleaning the data, use NearestNeighbors from sklearn.neighbors to develop a model that shows books that are similar to a given book. The Nearest Neighbors algorithm measures the distance to determine the “closeness” of instances.
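Here is a toy illustration of how `NearestNeighbors` measures closeness. It fits cosine distance, the metric the solution below uses, on three made-up rating vectors (hypothetical data, not from Book-Crossing). Cosine distance is 1 minus cosine similarity. It depends only on the direction of each vector, not on its length.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical book-by-user rating matrix: one row per book, one column per user
ratings = np.array([
    [5.0, 4.0, 0.0],   # book A
    [10.0, 8.0, 0.0],  # book B: same direction as A, only scaled up
    [0.0, 1.0, 9.0],   # book C: rated mostly by a different user
])

# Brute-force search with cosine distance (1 - cosine similarity)
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(ratings)

# Query with book A and ask for all three neighbors, nearest first
distances, indices = model.kneighbors(ratings[0:1], n_neighbors=3)
print(indices[0])    # row indices of the neighbors, by increasing distance
print(distances[0])  # A and B are both at (almost) 0, C is far away
```

B is only a scaled copy of A, so its cosine distance from A is essentially 0, even though its raw ratings are twice as large.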

Create a function named get_recommends that takes a book title (from the dataset) as an argument and returns a list of 5 similar books with their distances from the book argument.

This code:

get_recommends("The Queen of the Damned (Vampire Chronicles (Paperback))")
should return:

[
  'The Queen of the Damned (Vampire Chronicles (Paperback))',
  [
    ['Catch 22', 0.793983519077301],
    ['The Witching Hour (Lives of the Mayfair Witches)', 0.7448656558990479],
    ['Interview with the Vampire', 0.7345068454742432],
    ['The Tale of the Body Thief (Vampire Chronicles (Paperback))', 0.5376338362693787],
    ['The Vampire Lestat (Vampire Chronicles, Book II)', 0.5178412199020386]
  ]
]
Notice that the data returned from get_recommends() is a list. The first element in the list is the book title passed into the function. The second element in the list is a list of five more lists. Each of the five lists contains a recommended book and the distance from the recommended book to the book passed into the function.
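The return structure described above can be checked mechanically. The sketch below uses a hypothetical helper, `check_recommendation_format`, which is not part of the challenge. Two of its rules are inferred from the example output rather than stated in the text: the query book is not among its own recommendations, and distances are listed from largest to smallest.

```python
import numbers

def check_recommendation_format(result, query_title, k=5):
    """Hypothetical helper: validate the [title, [[book, distance], ...]] shape."""
    # Top level: exactly two elements, the first being the query title
    if not isinstance(result, list) or len(result) != 2 or result[0] != query_title:
        return False
    recommendations = result[1]
    # Second element: a list of exactly k [book, distance] pairs
    if not isinstance(recommendations, list) or len(recommendations) != k:
        return False
    for pair in recommendations:
        if len(pair) != 2:
            return False
        book, distance = pair
        # A title string and a numeric distance (NumPy floats also count as Real)
        if not isinstance(book, str) or not isinstance(distance, numbers.Real):
            return False
        # Inferred rule: the query book does not recommend itself
        if book == query_title:
            return False
    # Inferred rule: the example lists books from largest to smallest distance
    distances = [distance for _, distance in recommendations]
    return distances == sorted(distances, reverse=True)


expected = [
    "The Queen of the Damned (Vampire Chronicles (Paperback))",
    [
        ["Catch 22", 0.793983519077301],
        ["The Witching Hour (Lives of the Mayfair Witches)", 0.7448656558990479],
        ["Interview with the Vampire", 0.7345068454742432],
        ["The Tale of the Body Thief (Vampire Chronicles (Paperback))", 0.5376338362693787],
        ["The Vampire Lestat (Vampire Chronicles, Book II)", 0.5178412199020386],
    ],
]

print(check_recommendation_format(expected, expected[0]))  # True
```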

If you graph the dataset (optional), you will notice that most books are not rated frequently. To ensure statistical significance, remove from the dataset users with fewer than 200 ratings and books with fewer than 100 ratings.
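Both thresholds come down to counting ratings per user and per book, then keeping only the rows whose IDs pass. The sketch below applies this to a tiny made-up ratings frame. The thresholds are hypothetical and scaled down so the effect is visible (the challenge uses 200 and 100). Like the notebook below, it counts on the full ratings frame before filtering. Counting books after the user filter is another reasonable reading and can give different counts.

```python
import pandas as pd

# Hypothetical thresholds, scaled down for a tiny example (the challenge uses 200 and 100)
MIN_USER_RATINGS = 3
MIN_BOOK_RATINGS = 2

# Made-up ratings in the same user / isbn / rating layout as df_ratings
ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "isbn":   ["a", "b", "c", "a", "b", "a", "b", "c", "d"],
    "rating": [5, 7, 0, 8, 6, 9, 4, 3, 10],
})

# Count ratings per user and per book on the full frame
user_counts = ratings.groupby("user")["rating"].count()
book_counts = ratings.groupby("isbn")["rating"].count()

# IDs that meet each threshold
active_users = user_counts[user_counts >= MIN_USER_RATINGS].index
popular_books = book_counts[book_counts >= MIN_BOOK_RATINGS].index

# Keep a rating only if both its user and its book pass
filtered = ratings[ratings["user"].isin(active_users) & ratings["isbn"].isin(popular_books)]
print(filtered)
```

User 2 (two ratings) and book `d` (one rating) drop out, leaving six rows.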

The first three cells import libraries you may need and the data to use. The final cell is for testing. Write all your code in between those cells.

----

In [1]:
# import libraries (you may add additional imports but you may not have to)
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt

# get data files
!wget https://cdn.freecodecamp.org/project-data/books/book-crossings.zip

!unzip book-crossings.zip

books_filename = 'BX-Books.csv'
ratings_filename = 'BX-Book-Ratings.csv'

# import csv data into dataframes
df_books = pd.read_csv(
    books_filename,
    encoding = "ISO-8859-1",
    sep=";",
    header=0,
    names=['isbn', 'title', 'author'],
    usecols=['isbn', 'title', 'author'],
    dtype={'isbn': 'str', 'title': 'str', 'author': 'str'})

df_ratings = pd.read_csv(
    ratings_filename,
    encoding = "ISO-8859-1",
    sep=";",
    header=0,
    names=['user', 'isbn', 'rating'],
    usecols=['user', 'isbn', 'rating'],
    dtype={'user': 'int32', 'isbn': 'str', 'rating': 'float32'})



--2024-07-27 03:58:38--  https://cdn.freecodecamp.org/project-data/books/book-crossings.zip
Resolving cdn.freecodecamp.org (cdn.freecodecamp.org)... 104.26.2.33, 104.26.3.33, 172.67.70.149, ...
Connecting to cdn.freecodecamp.org (cdn.freecodecamp.org)|104.26.2.33|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26085508 (25M) [application/zip]
Saving to: ‘book-crossings.zip’


2024-07-27 03:58:38 (84.0 MB/s) - ‘book-crossings.zip’ saved [26085508/26085508]

Archive:  book-crossings.zip
  inflating: BX-Book-Ratings.csv     
  inflating: BX-Books.csv            
  inflating: BX-Users.csv            


In [2]:
# Remove users with fewer than 200 ratings (for statistical significance)
rating_agrupado = df_ratings.groupby("user").rating.count()
rating_lista_de_Indices = rating_agrupado[rating_agrupado >= 200].index  # IDs of the users who meet the condition
# Group the ratings by user and count how many each user submitted, then filter
# by that count: users who do not meet the condition do not appear in the new
# filtered dataframe
df_ratings_filtrado = df_ratings[df_ratings["user"].isin(rating_lista_de_Indices)]
print(df_ratings_filtrado.shape)  # compare the filtered size with the original
print(df_ratings.shape)  # the filter removes more than half of the ratings

(527556, 3)
(1149780, 3)


In [3]:
# df_ratings_filtrado

# df_books.head(3)
# Remove books with fewer than 100 ratings
# use df_ratings to count the ratings per ISBN
libros_agrupados = df_ratings.groupby("isbn").rating.count()
indices_de_libros_agrupados = libros_agrupados[libros_agrupados >= 100].index  # discard books with fewer than 100 ratings
df_books_filtrado = df_books[df_books["isbn"].isin(indices_de_libros_agrupados)]
df_books_filtrado


,isbn,title,author
18,0440234743,The Testament,John Grisham
19,0452264464,Beloved (Plume Contemporary Fiction),Toni Morrison
26,0971880107,Wild Animus,Rich Shapero
27,0345402871,Airframe,Michael Crichton
28,0345417623,Timeline,MICHAEL CRICHTON
...,...,...,...
28072,0425178765,Easy Prey,John Sandford
29215,0449223604,M Is for Malice,Sue Grafton
30535,0345444884,The Talisman,STEPHEN KING
30775,0060008032,Angels,Marian Keyes


In [4]:
print(df_books_filtrado.shape)
print(df_books.shape)

(727, 3)
(271379, 3)


In [5]:
# vemos que tantos valores NaN tienen estos datos
print(df_books_filtrado.isnull().sum())
print(df_ratings_filtrado.isnull().sum())

isbn      0
title     0
author    0
dtype: int64
user      0
isbn      0
rating    0
dtype: int64


In [6]:
df_books_filtrado.head()

,isbn,title,author
18,0440234743,The Testament,John Grisham
19,0452264464,Beloved (Plume Contemporary Fiction),Toni Morrison
26,0971880107,Wild Animus,Rich Shapero
27,0345402871,Airframe,Michael Crichton
28,0345417623,Timeline,MICHAEL CRICHTON


In [7]:
df_ratings_filtrado.head()

,user,isbn,rating
1456,277427,002542730X,10.0
1457,277427,0026217457,0.0
1458,277427,003008685X,8.0
1459,277427,0030615321,0.0
1460,277427,0060002050,0.0


In [8]:
# Before merging, compare the sizes of both dataframes

print("b", df_books_filtrado.shape)
print("r", df_ratings_filtrado.shape)

b (727, 3)
r (527556, 3)


In [9]:
# make sure the isbn column has the same dtype in both dataframes

# df_books_filtrado['isbn'] = df_books_filtrado['isbn'].astype(str)
print(df_books_filtrado['isbn'].dtype)
# df_ratings_filtrado['isbn'] = df_ratings_filtrado['isbn'].astype(str)  # uncomment these casts if the dtypes differ
print(df_ratings_filtrado['isbn'].dtype)

# Merge the dataframes using 'isbn' as the key (similar to an SQL inner join)
df_unido = pd.merge(df_books_filtrado, df_ratings_filtrado, on='isbn')

# Show the first rows of the merged dataframe
print(df_unido.head(10))
print("size of df_unido:", df_unido.shape)


object
object
         isbn          title        author    user  rating
0  0440234743  The Testament  John Grisham  277478     0.0
1  0440234743  The Testament  John Grisham    2977     0.0
2  0440234743  The Testament  John Grisham    3363     0.0
3  0440234743  The Testament  John Grisham    7346     9.0
4  0440234743  The Testament  John Grisham    9856     0.0
5  0440234743  The Testament  John Grisham   11676     9.0
6  0440234743  The Testament  John Grisham   13552     8.0
7  0440234743  The Testament  John Grisham   14521     0.0
8  0440234743  The Testament  John Grisham   16795     0.0
9  0440234743  The Testament  John Grisham   23768     0.0
size of df_unido: (49517, 5)
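`pd.merge` defaults to an inner join (`how='inner'`). Only ISBNs present in both frames survive, which is why `df_unido` is much smaller than `df_ratings_filtrado`: ratings of books outside the filtered ISBNs are dropped. The toy sketch below uses hypothetical data to show this. It also uses an outer join with `indicator=True` to audit which side each unmatched row came from.

```python
import pandas as pd

# Toy versions of the filtered books and ratings frames (hypothetical data)
books = pd.DataFrame({"isbn": ["a", "b", "c"], "title": ["Book A", "Book B", "Book C"]})
ratings = pd.DataFrame({"user": [1, 2, 2, 3], "isbn": ["a", "a", "c", "z"], "rating": [8, 5, 9, 7]})

# Default how='inner': keep only ISBNs that appear in both frames
merged = pd.merge(books, ratings, on="isbn")

# Outer join with indicator=True labels rows as both / left_only / right_only
audit = pd.merge(books, ratings, on="isbn", how="outer", indicator=True)
print(merged)
print(audit["_merge"].value_counts())
```

Book `b` has no ratings and ISBN `z` has no book entry, so the inner join drops both.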


In [10]:
print(df_unido.shape)
df_unido.head(10)

(49517, 5)


,isbn,title,author,user,rating
0,0440234743,The Testament,John Grisham,277478,0.0
1,0440234743,The Testament,John Grisham,2977,0.0
2,0440234743,The Testament,John Grisham,3363,0.0
3,0440234743,The Testament,John Grisham,7346,9.0
4,0440234743,The Testament,John Grisham,9856,0.0
5,0440234743,The Testament,John Grisham,11676,9.0
6,0440234743,The Testament,John Grisham,13552,8.0
7,0440234743,The Testament,John Grisham,14521,0.0
8,0440234743,The Testament,John Grisham,16795,0.0
9,0440234743,The Testament,John Grisham,23768,0.0


In [11]:
# many rows still carry no actual score (in Book-Crossing, rating 0 marks an implicit interaction)
# show only the rows with an explicit rating (here, above 1)
df_unido[df_unido["rating"] > 1]

# 12502 rated rows remain in total; we keep those

,isbn,title,author,user,rating
3,0440234743,The Testament,John Grisham,7346,9.0
5,0440234743,The Testament,John Grisham,11676,9.0
6,0440234743,The Testament,John Grisham,13552,8.0
13,0440234743,The Testament,John Grisham,30533,6.0
14,0440234743,The Testament,John Grisham,31315,10.0
...,...,...,...,...,...
49476,0060008032,Angels,Marian Keyes,271705,8.0
49477,0515135739,Eleventh Hour: An FBI Thriller (FBI Thriller (...,Catherine Coulter,11676,5.0
49479,0515135739,Eleventh Hour: An FBI Thriller (FBI Thriller (...,Catherine Coulter,30276,7.0
49490,0515135739,Eleventh Hour: An FBI Thriller (FBI Thriller (...,Catherine Coulter,107021,10.0


In [12]:
df_unido["rating"] = df_unido["rating"].astype(int)
df = df_unido[df_unido["rating"] >= 1]
df

,isbn,title,author,user,rating
3,0440234743,The Testament,John Grisham,7346,9
5,0440234743,The Testament,John Grisham,11676,9
6,0440234743,The Testament,John Grisham,13552,8
13,0440234743,The Testament,John Grisham,30533,6
14,0440234743,The Testament,John Grisham,31315,10
...,...,...,...,...,...
49476,0060008032,Angels,Marian Keyes,271705,8
49477,0515135739,Eleventh Hour: An FBI Thriller (FBI Thriller (...,Catherine Coulter,11676,5
49479,0515135739,Eleventh Hour: An FBI Thriller (FBI Thriller (...,Catherine Coulter,30276,7
49490,0515135739,Eleventh Hour: An FBI Thriller (FBI Thriller (...,Catherine Coulter,107021,10


In [13]:
df['user'].value_counts()  # number of ratings each user gave

user
11676     445
16795     127
95359     118
104636     90
60244      81
         ... 
145451      1
238526      1
116599      1
254971      1
178950      1
Name: count, Length: 834, dtype: int64

In [14]:
df.groupby("title").rating.mean()  # mean rating per book
u = df.groupby("title").rating.describe()  # summary statistics (count, mean, std, quartiles) per book
u

title,count,mean,std,min,25%,50%,75%,max
1984,22.0,9.090909,0.971454,7.0,9.00,9.0,10.0,10.0
1st to Die: A Novel,49.0,7.959184,1.606746,3.0,7.00,8.0,9.0,10.0
2nd Chance,38.0,7.947368,1.575957,4.0,7.00,8.0,9.0,10.0
4 Blondes,14.0,5.428571,1.741542,1.0,5.00,5.5,7.0,7.0
A Beautiful Mind: The Life of Mathematical Genius and Nobel Laureate John Nash,11.0,7.727273,1.678744,5.0,6.50,8.0,9.0,10.0
...,...,...,...,...,...,...,...,...
Without Remorse,13.0,8.307692,0.947331,7.0,8.00,8.0,9.0,10.0
Year of Wonders,22.0,8.636364,1.135801,6.0,8.00,9.0,9.0,10.0
You Belong To Me,11.0,7.727273,1.618080,5.0,6.50,8.0,8.5,10.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,14.0,8.000000,1.358732,6.0,7.00,8.0,9.0,10.0


In [15]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.base import BaseEstimator, TransformerMixin

In [16]:
# # Original example data (an unrelated fruit dataset, kept commented out)

# data = {
#     'Weight (g)': [150, 120, 200, 300, 80, 50],
#     'Skin color': ['Red', 'Green', 'Yellow', 'Purple', 'Red', 'Green'],
#     'Dominant flavor': ['Sweet', 'Sour', 'Sweet', 'Bitter', 'Sweet', 'Sour'],
#     'Season': ['Summer', 'Summer', 'Autumn', 'Winter', 'Summer', 'Spring'],
#     'Size': ['Medium', 'Small', 'Large', 'Large', 'Small', 'Small'],
#     'Shape': ['Round', 'Oval', 'Round', 'Oval', 'Round', 'Elongated'],
#     'Growing region': ['Tropics', 'Mediterranean', 'Tropics', 'Tropics', 'Mediterranean', 'Tropics'],
#     'Skin type': ['Smooth', 'Rough', 'Smooth', 'Hairy', 'Rough', 'Smooth'],
#     'Seed type': ['With seeds', 'Seedless', 'Few seeds', 'Many seeds', 'Seedless', 'With seeds'],
#     'Consumption method': ['Raw', 'As juice', 'Raw', 'Cooked', 'As juice', 'Raw'],
#     'Height (cm)': [7.5, 6.0, 15.0, 10.0, 5.0, 8.0],
#     'Width (cm)': [6.0, 4.0, 3.0, 5.0, 2.5, 3.5],
#     'Depth (cm)': [6.0, 4.0, 3.0, 5.0, 2.5, 3.5],
#     'Fruit': ['Apple', 'Lime', 'Banana', 'Grape', 'Strawberry', 'Cucumber']
# }
# df = pd.DataFrame(data)

# Enlarge the DataFrame without multiplying it directly
repeticiones = 2  # duplicate every row of df
df_aumentado = pd.concat([df] * repeticiones, ignore_index=True)
df = df_aumentado

# # Ordinal encoding for 'Size'
# size_mapping = {'Small': 1, 'Medium': 2, 'Large': 3}
# df['Size'] = df['Size'].map(size_mapping)

# Separate features and label
X = df.drop(columns='title')  # title is the label in this case
y = df['title']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Split the training set into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# Define the lists of categorical and numerical features
categorical_features = ['isbn', 'author']
numerical_features = ['user','rating']

# Custom transformer that maps each category to a fixed random (untrained) embedding vector
class CustomEmbeddingTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, embedding_dim=4):
        self.embedding_dim = embedding_dim

    def fit(self, X, y=None):
        self.categories_ = {}
        self.embeddings_ = {}

        for col in X.columns:
            categories = X[col].unique()
            self.categories_[col] = categories
            self.embeddings_[col] = np.random.rand(len(categories), self.embedding_dim)

        return self

    def transform(self, X):
        X_transformed = []
        for col in X.columns:
            embedding = self.embeddings_[col]
            cat_to_idx = {cat: idx for idx, cat in enumerate(self.categories_[col])}
            X_col = X[col].map(cat_to_idx).fillna(-1).astype(int)
            X_embedded = np.array([embedding[idx] if idx != -1 else np.zeros(self.embedding_dim) for idx in X_col])
            X_transformed.append(X_embedded)

        return np.hstack(X_transformed)

# Preprocessing: imputation and scaling for numerical features, embeddings for categorical ones
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),  # there are no missing values here, but we keep the imputer out of habit
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('embedding', CustomEmbeddingTransformer(embedding_dim=10))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Build the pipeline
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', KNeighborsClassifier(n_neighbors=5))
])

# Train the pipeline on the training set
pipeline.fit(X_train, y_train)

# Evaluate on the validation set
val_predictions = pipeline.predict(X_val)
print("Validation predictions:", val_predictions)
print("Validation ground truth:", list(y_val))

# Evaluate on the test set
test_predictions = pipeline.predict(X_test)
print("Test predictions:", test_predictions)
print("Test ground truth:", list(y_test))


Validation predictions: ['Good in Bed'
 "Chicken Soup for the Woman's Soul (Chicken Soup for the Soul Series (Paper))"
 'Empire Falls' ... 'The Secret Life of Bees'
 'A Child Called \\It\\": One Child\'s Courage to Survive"'
 'Tears of the Moon (Irish Trilogy)']
Validation ground truth: ['Good in Bed', "Chicken Soup for the Woman's Soul (Chicken Soup for the Soul Series (Paper))", 'Empire Falls', 'Life of Pi', 'Carolina Moon', 'Jemima J: A Novel About Ugly Ducklings and Swans', 'Neverwhere', 'Black House', 'The Loop', 'Hard Eight : A Stephanie Plum Novel (A Stephanie Plum Novel)', 'The Golden Compass (His Dark Materials, Book 1)', 'From a Buick 8 : A Novel', 'The Poisonwood Bible: A Novel', 'Girl, Interrupted', 'The Key to Midnight', 'Stranger in a Strange Land (Remembering Tomorrow)', 'SHIPPING NEWS', "Tom Clancy's Op-Center (Tom Clancy's Op Center (Paperback))", "Song of Solomon (Oprah's Book Club (Paperback))", 'While My Pretty One Sleeps', 'Fast Food Nation: The Dark Si

In [17]:
# helper function that reports the classification metrics on the validation and test splits:

def evaluacion_categoricos(y_pred = val_predictions, y_pred_t = test_predictions, y_val = y_val, y_test = y_test):
  # y_pred = val_predictions
  # Compute the evaluation metrics
  print("----------------------------------------------------------------")
  print("Evaluation - Val.")
  print("----------------------------------------------------------------")
  accuracy_ = accuracy_score(y_val, y_pred)
  precision_ = precision_score(y_val, y_pred, average='weighted')
  recall_ = recall_score(y_val, y_pred, average='weighted')
  f1_ = f1_score(y_val, y_pred, average='weighted')
  conf_matrix_ = confusion_matrix(y_val, y_pred)

  print(f"Accuracy: {accuracy_:.2f}")
  print(f"Precision: {precision_:.2f}")
  print(f"Recall: {recall_:.2f}")
  print(f"F1-Score: {f1_:.2f}")
  print(f"Confusion matrix:\n{conf_matrix_}")
  print("----------------------------------------------------------------")

  print(" ")
  print(" ")
  print(" ")

  print("----------------------------------------------------------------")
  print("Test set")
  print("----------------------------------------------------------------")

  # y_pred_t = test_predictions
  # Compute the evaluation metrics
  accuracy = accuracy_score(y_test, y_pred_t)
  precision = precision_score(y_test, y_pred_t, average='weighted')
  recall = recall_score(y_test, y_pred_t, average='weighted')
  f1 = f1_score(y_test, y_pred_t, average='weighted')
  conf_matrix = confusion_matrix(y_test, y_pred_t)

  print(f"Accuracy: {accuracy:.2f}")
  print(f"Precision: {precision:.2f}")
  print(f"Recall: {recall:.2f}")
  print(f"F1-Score: {f1:.2f}")
  print(f"Confusion matrix:\n{conf_matrix}")
  print("----------------------------------------------------------------")

  return accuracy, accuracy_, precision, precision_, recall, recall_, f1, f1_, conf_matrix, conf_matrix_

evaluacion_categoricos()  # the KNN classifier scores high, but isbn determines the title and the duplicated rows land in both training and evaluation splits, so these scores are optimistic


----------------------------------------------------------------
Evaluation - Val.
----------------------------------------------------------------


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Accuracy: 0.86
Precision: 0.88
Recall: 0.86
F1-Score: 0.86
Confusion matrix:
[[11  0  0 ...  0  0  0]
 [ 0 18  0 ...  0  0  0]
 [ 0  3  8 ...  0  0  0]
 ...
 [ 0  0  0 ...  1  0  0]
 [ 0  0  0 ...  0  4  0]
 [ 0  0  0 ...  0  0  5]]
----------------------------------------------------------------
 
 
 
----------------------------------------------------------------
Test set
----------------------------------------------------------------


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Accuracy: 0.87
Precision: 0.89
Recall: 0.87
F1-Score: 0.87
Confusion matrix:
[[ 9  0  0 ...  0  0  0]
 [ 0 22  0 ...  0  0  0]
 [ 0  1 13 ...  0  0  0]
 ...
 [ 0  0  0 ...  2  0  0]
 [ 0  0  0 ...  0  4  0]
 [ 0  0  0 ...  0  0  6]]
----------------------------------------------------------------


(0.8713147410358566,
 0.8647140864714087,
 0.8901306532960194,
 0.8823490455166835,
 0.8713147410358566,
 0.8647140864714087,
 0.8668178373299571,
 0.8568870744861548,
 array([[ 9,  0,  0, ...,  0,  0,  0],
        [ 0, 22,  0, ...,  0,  0,  0],
        [ 0,  1, 13, ...,  0,  0,  0],
        ...,
        [ 0,  0,  0, ...,  2,  0,  0],
        [ 0,  0,  0, ...,  0,  4,  0],
        [ 0,  0,  0, ...,  0,  0,  6]]),
 array([[11,  0,  0, ...,  0,  0,  0],
        [ 0, 18,  0, ...,  0,  0,  0],
        [ 0,  3,  8, ...,  0,  0,  0],
        ...,
        [ 0,  0,  0, ...,  1,  0,  0],
        [ 0,  0,  0, ...,  0,  4,  0],
        [ 0,  0,  0, ...,  0,  0,  5]]))
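In `In [16]`, `repeticiones = 2` duplicates every row before `train_test_split`, so many validation and test rows have an identical copy in the training split. The sketch below measures that overlap on a hypothetical frame of 100 distinct rows, using the same duplicate-then-split steps. It suggests the scores above partly reflect memorized duplicates rather than performance on unseen rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame of 100 distinct rows
df = pd.DataFrame({"user": range(100), "rating": [i % 10 for i in range(100)]})

# Duplicate every row before splitting, mirroring repeticiones = 2
doubled = pd.concat([df] * 2, ignore_index=True)
train, test = train_test_split(doubled, test_size=0.2, random_state=42)

# A test row "leaks" if an identical row also landed in the training split
train_rows = set(map(tuple, train.to_numpy()))
leaked = sum(tuple(row) in train_rows for row in test.to_numpy())
print(f"{leaked} of {len(test)} test rows also appear in train")
```

Deduplicating (or splitting) before any augmentation keeps each row on one side of the split.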

In [18]:

df = df.drop_duplicates(["title", "user"])  # one rating per (title, user); also removes the rows duplicated in In [16]
piv = df.pivot(index='title', columns='user', values='rating').fillna(0)  # title-by-user matrix, 0 = not rated
matrix = piv.values
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(matrix)

# function to return recommended books - this will be tested
def get_recommends(book = ""):
  # rating vector of the query book as a 1 x n_users row
  x = piv.loc[book].to_numpy().reshape(1, -1)
  # 6 neighbors: the query book itself plus 5 recommendations
  distances, indices = model_knn.kneighbors(x, n_neighbors=6)
  R_books = []
  for distance, indice in zip(distances[0], indices[0]):
    R_book = piv.index[indice]
    # skip the query book by title: its distance can be a tiny nonzero value
    # such as 2.2e-16, so testing distance != 0 is not reliable
    if R_book != book:
      R_books.append([R_book, distance])
  # keep the 5 nearest and list the farthest first, as in the challenge output
  recommended_books = [book, R_books[:5][::-1]]
  return recommended_books

In [19]:
get_recommends('The Queen of the Damned (Vampire Chronicles (Paperback))')

['The Queen of the Damned (Vampire Chronicles (Paperback))',
 [['Catch 22', 0.7939835419270879],
  ['The Witching Hour (Lives of the Mayfair Witches)', 0.7448657003312193],
  ['Interview with the Vampire', 0.7345068863988313],
  ['The Tale of the Body Thief (Vampire Chronicles (Paperback))',
   0.5376338446489461],
  ['The Vampire Lestat (Vampire Chronicles, Book II)', 0.5178411864186413]]]

In [20]:
books = get_recommends("Where the Heart Is (Oprah's Book Club (Paperback))")
print(books)

def test_book_recommendation():
  test_pass = True
  recommends = get_recommends("Where the Heart Is (Oprah's Book Club (Paperback))")
  if recommends[0] != "Where the Heart Is (Oprah's Book Club (Paperback))":
    test_pass = False
  recommended_books = ["I'll Be Seeing You", 'The Weight of Water', 'The Surgeon', 'I Know This Much Is True']
  recommended_books_dist = [0.8, 0.77, 0.77, 0.77]
  for i in range(2):
    if recommends[1][i][0] not in recommended_books:
      test_pass = False
    if abs(recommends[1][i][1] - recommended_books_dist[i]) >= 0.05:
      test_pass = False
  if test_pass:
    print("You passed the challenge! 🎉🎉🎉🎉🎉")
  else:
    print("You haven't passed yet. Keep trying!")

test_book_recommendation()

["Where the Heart Is (Oprah's Book Club (Paperback))", [["I'll Be Seeing You", 0.8016210581447822], ['The Weight of Water', 0.7708583572697411], ['The Surgeon', 0.7699410973804288], ['I Know This Much Is True', 0.7677075092617776], ['The Lovely Bones: A Novel', 0.7234864549790632]]]
You passed the challenge! 🎉🎉🎉🎉🎉
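Cosine distance from a book to itself can come out as a tiny nonzero value (on the order of 1e-16) instead of exactly 0. A `distance != 0` filter may therefore leave the query book in its own recommendation list. The sketch below runs the same pivot, `NearestNeighbors`, recommend pipeline on made-up ratings and excludes the query by title instead. `recommend` and the toy data are hypothetical, and `k=2` keeps the example small where the challenge asks for 5.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical explicit ratings: (title, user, rating)
ratings = pd.DataFrame({
    "title": ["A", "A", "B", "B", "C", "C", "D"],
    "user": [1, 2, 1, 2, 2, 3, 3],
    "rating": [9, 7, 8, 6, 5, 10, 4],
})

# Title-by-user matrix; users who did not rate a title get 0
piv = ratings.pivot(index="title", columns="user", values="rating").fillna(0)

model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(piv.to_numpy())

def recommend(title, k=2):
    # One row vector for the query title
    query = piv.loc[title].to_numpy().reshape(1, -1)
    # Ask for k + 1 neighbors because the query itself is usually among them
    distances, indices = model.kneighbors(query, n_neighbors=k + 1)
    recs = [
        [piv.index[i], float(d)]
        for d, i in zip(distances[0], indices[0])
        if piv.index[i] != title  # exclude by label, not by distance == 0
    ][:k]
    # Same ordering as the challenge output: farthest first
    return [title, recs[::-1]]

print(recommend("A"))
```

B points in almost the same direction as A, so it is nearest. C shares only one user with A and comes second, and D shares none.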
