# Content-based filtering

In this notebook, we walk through the code from your trains for building content-based recommender functions.

**NOTE**: the functions and most code in this notebook are exactly the same as what appears in the trains. All we have done here is break the functions down into pieces, as in the webinars, to demonstrate how each piece fits together.

In [1]:
# Import our regular old heroes
import numpy as np
import pandas as pd
import scipy as sp # <-- The sister of NumPy, used in our code for numerical efficiency.
import matplotlib.pyplot as plt
import seaborn as sns

# Entity featurization and similarity computation
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

# Libraries used during sorting procedures.
import operator # <-- Convenient item retrieval during iteration
import heapq # <-- Efficient sorting of large lists

# Imported for our sanity
import warnings
warnings.filterwarnings('ignore')

---

## Read in our data

We're making use of a dataset on books. The books dataframe contains information on each book in our system, while the book_ratings dataframe contains information on how users have rated the books in our system.

In [2]:
books = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/unsupervised_sprint/books_with_tags.csv')
books.head(3)

Unnamed: 0,id,book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,...,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url,tag_name
0,1,2767052,2767052,2792775,272,439023483,9780439000000.0,Suzanne Collins,2008.0,The Hunger Games,...,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...,to-read fantasy favorites currently-reading yo...
1,2,3,3,4640799,491,439554934,9780440000000.0,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,...,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m...,https://images.gr-assets.com/books/1474154022s...,to-read fantasy favorites currently-reading yo...
2,3,41865,41865,3212258,226,316015849,9780316000000.0,Stephenie Meyer,2005.0,Twilight,...,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m...,https://images.gr-assets.com/books/1361039443s...,to-read fantasy favorites currently-reading yo...


In [3]:
book_ratings = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/unsupervised_sprint/book_ratings.csv')
book_ratings.head()

Unnamed: 0,user_id,book_id,title,rating
0,314,1,Harry Potter and the Half-Blood Prince (Harry ...,5
1,439,1,Harry Potter and the Half-Blood Prince (Harry ...,3
2,588,1,Harry Potter and the Half-Blood Prince (Harry ...,5
3,1169,1,Harry Potter and the Half-Blood Prince (Harry ...,4
4,1185,1,Harry Potter and the Half-Blood Prince (Harry ...,4


---

## Preparing for content-based filtering


We need to gather the properties of our books to inform our content-based filtering. Here, we are using the 'tag_name' column as well as the 'authors' column as some of these properties. We join them for ease of engineering.

We select the two columns (authors and tag_name), fill any NaNs with an empty string to ensure NaNs do not show up in our new column, extract the values in these columns to a list, and then join them with spaces to form one long string containing the information from both the authors and tag_name columns.

In [4]:
books['auth_tags'] = (pd.Series(books[['authors', 'tag_name']]
                      .fillna('')
                      .values.tolist()).str.join(' '))

# Convenient indexes to map between book titles and indexes of
# the books dataframe
titles = books['title']
indices = pd.Series(books.index, index=books['title'])
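
To see what the fill-and-join chain does on its own, here is a minimal sketch on a hypothetical three-row dataframe (`toy` and its values are made up for illustration and are not part of the books data). It applies the same steps as the cell above, including a missing author and a missing tag.

```python
import pandas as pd

# Hypothetical stand-in for the `books` dataframe.
toy = pd.DataFrame({
    'authors': ['Author A', None, 'Author C'],
    'tag_name': ['fantasy classics', 'history', None],
})

# Same chain as above: fill missing values with '', turn each row into a
# list of two strings, then join each list with a space.
toy['auth_tags'] = pd.Series(
    toy[['authors', 'tag_name']].fillna('').values.tolist()
).str.join(' ')

print(toy['auth_tags'].tolist())
```

Rows with a missing value keep the joining space (a leading or trailing blank), which is harmless here because the vectoriser used later tokenises on words.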

In [5]:
books[['auth_tags']]

Unnamed: 0,auth_tags
0,Suzanne Collins to-read fantasy favorites curr...
1,"J.K. Rowling, Mary GrandPré to-read fantasy f..."
2,Stephenie Meyer to-read fantasy favorites curr...
3,Harper Lee to-read favorites currently-reading...
4,F. Scott Fitzgerald to-read favorites currentl...
...,...
9995,Ilona Andrews to-read fantasy favorites curren...
9996,Robert A. Caro to-read favorites currently-rea...
9997,Patrick O'Brian to-read favorites currently-re...
9998,Peggy Orenstein to-read favorites currently-re...


The indices variable we created is for easier mapping between book titles and their index number:

In [6]:
indices

title
The Hunger Games (The Hunger Games, #1)                                                         0
Harry Potter and the Sorcerer's Stone (Harry Potter, #1)                                        1
Twilight (Twilight, #1)                                                                         2
To Kill a Mockingbird                                                                           3
The Great Gatsby                                                                                4
                                                                                             ... 
Bayou Moon (The Edge, #2)                                                                    9995
Means of Ascent (The Years of Lyndon Johnson, #2)                                            9996
The Mauritius Command                                                                        9997
Cinderella Ate My Daughter: Dispatches from the Frontlines of the New Girlie-Girl Culture    9998
The First Worl
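
One thing worth knowing about a title-to-index Series like `indices`: a lookup returns a single integer only when the title is unique. The sketch below uses a hypothetical three-book dataframe (`toy_books`) to show both cases; whether the real `books` data contains repeated titles is not checked here.

```python
import pandas as pd

# Hypothetical data with one repeated title.
toy_books = pd.DataFrame({'title': ['Book A', 'Book B', 'Book A']})
toy_indices = pd.Series(toy_books.index, index=toy_books['title'])

unique_lookup = toy_indices['Book B']      # a single integer
duplicate_lookup = toy_indices['Book A']   # a Series with every matching row

# One possible way to guarantee scalar lookups: keep the first occurrence.
deduped = toy_indices[~toy_indices.index.duplicated(keep='first')]

print(unique_lookup)
print(list(duplicate_lookup))
print(deduped['Book A'])
```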

Now that we have a column holding our item properties as text, we can use a **vectoriser** to transform this text into numerical features:

In [7]:
# min_df=1 keeps every term that appears in at least one book
tf = TfidfVectorizer(analyzer='word', ngram_range=(1,2),
                     min_df=1, stop_words='english')

# Produce a feature matrix, where each row corresponds to a book,
# with TF-IDF features as columns
tf_authTags_matrix = tf.fit_transform(books['auth_tags'])

In [8]:
# Let's convert to a dataframe!
matrix = pd.DataFrame(tf_authTags_matrix.toarray())
# Add column names
feature_names = tf.get_feature_names_out()
matrix.columns = feature_names
# Let's have a look
matrix.head()

Unnamed: 0,00,00 04,00 100,00 batman,00 class,00 cookbooks,00 exploring,00 fall,00 graphic,00 poem,...,ｆａｖｏｒｉｔｅｓ games,ｆａｖｏｒｉｔｅｓ george,ｆａｖｏｒｉｔｅｓ gothic,ｆａｖｏｒｉｔｅｓ neuroscience,ｆａｖｏｒｉｔｅｓ siege,ｆａｖｏｕｒｉｔｅｓ,ｍａｎｇａ,ｍａｎｇａ berserk,ｓｅｒｉｅｓ,ｓｅｒｉｅｓ carpathian
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Across the columns, we have each feature detected by the vectoriser from the auth_tags column, and each row represents an individual book.


Now we create our cosine similarity matrix:

In [9]:
cosine_sim_authTags = cosine_similarity(tf_authTags_matrix,
                                        tf_authTags_matrix)
print(cosine_sim_authTags.shape)

(10000, 10000)
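
To build some intuition for what this matrix holds, the sketch below computes cosine similarity for three hypothetical feature rows (`toy_features` is made up). The result is square and symmetric, has ones on the diagonal, and gives zero for rows that share no features.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature rows: rows 0 and 1 share a feature, row 2 shares none.
toy_features = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
])

# Passing a single matrix compares its rows with each other.
toy_sim = cosine_similarity(toy_features)

print(toy_sim.shape)
print(np.round(toy_sim, 3))
```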


In [10]:
# Let's make it a dataframe so we can see what it actually looks like
csm = pd.DataFrame(cosine_sim_authTags)
csm

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,9990,9991,9992,9993,9994,9995,9996,9997,9998,9999
0,1.000000,0.212463,0.256205,0.114150,0.114387,0.249485,0.203154,0.119615,0.109280,0.130216,...,0.033812,0.146470,0.057586,0.110632,0.043917,0.148963,0.023823,0.047303,0.073948,0.015780
1,0.212463,1.000000,0.203087,0.160923,0.146519,0.190962,0.312341,0.148391,0.091583,0.174332,...,0.038455,0.176826,0.072073,0.225264,0.061529,0.162518,0.034095,0.049348,0.066292,0.021737
2,0.256205,0.203087,1.000000,0.103265,0.116068,0.180759,0.193366,0.111773,0.090215,0.104739,...,0.029333,0.154701,0.042480,0.108201,0.046110,0.177293,0.026144,0.031318,0.039670,0.013780
3,0.114150,0.160923,0.103265,1.000000,0.563909,0.173063,0.227782,0.604673,0.124214,0.404918,...,0.071404,0.067189,0.089277,0.076868,0.340644,0.063104,0.058043,0.068342,0.070960,0.049731
4,0.114387,0.146519,0.116068,0.563909,1.000000,0.173083,0.241318,0.659947,0.107721,0.428209,...,0.058384,0.078974,0.086336,0.070534,0.362179,0.072018,0.054453,0.066320,0.089231,0.037761
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,0.148963,0.162518,0.177293,0.063104,0.072018,0.086049,0.164887,0.061697,0.071041,0.066880,...,0.027563,0.190161,0.051243,0.107689,0.032547,1.000000,0.020273,0.033368,0.054454,0.012850
9996,0.023823,0.034095,0.026144,0.058043,0.054453,0.028882,0.030144,0.052528,0.042371,0.042677,...,0.021783,0.031707,0.251315,0.019866,0.057195,0.020273,1.000000,0.031741,0.069983,0.230792
9997,0.047303,0.049348,0.031318,0.068342,0.066320,0.048778,0.051715,0.058977,0.060068,0.081321,...,0.027752,0.056432,0.070166,0.029702,0.099729,0.033368,0.031741,1.000000,0.025241,0.069674
9998,0.073948,0.066292,0.039670,0.070960,0.089231,0.090031,0.071252,0.077542,0.035413,0.092066,...,0.035704,0.059247,0.188314,0.048714,0.048709,0.054454,0.069983,0.025241,1.000000,0.069480


---

## Content-based top-N recommendations

We start by generating a top-N list of books similar to one which we prompt the system with.

This is done by:

  1. Select an initial book to generate recommendations from.
  2. Extract all the similarity values between the initial item and each other item in the similarity matrix.
  3. Sort the resulting values in descending order (most similar to least similar).
  4. Select the top N similarity values, and return the corresponding book details to the user.

Here is the full function as detailed in the trains - but we will break it down into steps afterwards:

In [11]:
def content_generate_top_N_recommendations(book_title, N=10):

    # Convert the string book title to a numeric index for our
    # similarity matrix
    b_idx = indices[book_title]

    # Extract all similarity values computed with the reference book title
    sim_scores = list(enumerate(cosine_sim_authTags[b_idx]))

    # Sort the values, keeping a copy of the original index of each value
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Select the top-N values for recommendation
    sim_scores = sim_scores[1:N+1]

    # Collect indexes
    book_indices = [i[0] for i in sim_scores]

    # Convert the indexes back into titles
    return titles.iloc[book_indices]
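
The function above drops the reference book by skipping position 0 of the sorted list. As an alternative sketch (not the function from the trains), the same idea can be written with `np.argsort` and an explicit exclusion of the reference index. The names `top_n_similar`, `toy_titles` and `toy_sim` are hypothetical and the similarity values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical titles and a made-up 4x4 similarity matrix.
toy_titles = pd.Series(['Book A', 'Book B', 'Book C', 'Book D'])
toy_sim = np.array([
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.5],
    [0.3, 0.4, 0.5, 1.0],
])

def top_n_similar(b_idx, sim_matrix, all_titles, N=2):
    scores = sim_matrix[b_idx]
    # argsort is ascending, so reverse it for most-similar-first
    order = np.argsort(scores)[::-1]
    # Remove the reference book explicitly, then keep the first N
    order = order[order != b_idx][:N]
    return all_titles.iloc[order]

recs = top_n_similar(0, toy_sim, toy_titles, N=2)
print(recs)
```

Excluding the index by value rather than by position still works if another book happens to tie with the reference book at a similarity of 1.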

Our first step is to select an initial book to generate recommendations from. Simple enough, let's choose "The Hobbit".

Next, we need to extract all similarity values between our item and each other item in the similarity matrix. Remember what our similarity matrix looks like - along both the columns and the index, it has the indices of the books in our system. So first, we need to extract the index of our specific book - this is what we created the `indices` variable for!

`b_idx = indices[book_title]` gives us this index value. Below we do it with our specific book, getting the index output of `6`:

In [12]:
idx = indices["The Hobbit"]
idx

6

Now we can use this index value to extract the similarity values from our similarity matrix. Because `cosine_sim_authTags` is a NumPy array, indexing it with a single integer returns the corresponding row, that is, the similarity between our book and every book in the system: `cosine_sim_authTags[b_idx]`

We use the `enumerate` function to extract both the index of each book as well as the similarity value. We save these within a `list` (note this is quite a long output!):

`sim_scores = list(enumerate(cosine_sim_authTags[b_idx]))`



In [13]:
# Extract all similarity values computed with the reference book title
sim_scores = list(enumerate(cosine_sim_authTags[idx]))
sim_scores

[(0, 0.2031537045720106),
 (1, 0.31234052726054307),
 (2, 0.19336561517995215),
 (3, 0.22778203071555905),
 (4, 0.24131766027357032),
 (5, 0.12528167636992135),
 (6, 1.0000000000000002),
 (7, 0.22977504031630913),
 (8, 0.09812195750816734),
 (9, 0.26494905991488804),
 (10, 0.13135404635121847),
 (11, 0.21438815071913928),
 (12, 0.31343129805123027),
 (13, 0.2685646842551872),
 (14, 0.11305566194713222),
 (15, 0.08167184639176125),
 (16, 0.20376493685377928),
 (17, 0.29675551614686996),
 (18, 0.6257704973956681),
 (19, 0.19261184477174198),
 (20, 0.27119276665685094),
 (21, 0.0856654150229806),
 (22, 0.31891194873915363),
 (23, 0.3016031019299681),
 (24, 0.2936009747624131),
 (25, 0.07979606486671756),
 (26, 0.29153440054953883),
 (27, 0.2970748403182591),
 (28, 0.1432880875912021),
 (29, 0.07893814138150615),
 (30, 0.09313591463555532),
 (31, 0.21591721407402847),
 (32, 0.1852441128365345),
 (33, 0.07341472343680096),
 (34, 0.12154943338177071),
 (35, 0.2262922192706261),
 (36, 0.36890

Next, we want to accomplish step 3: Sort the resulting values in descending order (most similar to least similar).

We use python's `sorted` function on the `sim_scores` variable we just created to do so.

We specify that we want to sort based on the similarity value by using the `key` argument - the lambda function used is indicating we want to use the value in index position 1 --> remember in `sim_scores`, we have tuples of (book-index, similarity-score), and so the element we want to sort on is in index position 1!

`reverse=True` ensures we are sorting in descending order, so the most similar books come first:
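
The enumerate-then-sort pattern is easier to follow on a list short enough to read in full. The sketch below uses three made-up similarity values (`toy_row`), and also shows `operator.itemgetter(1)` as an equivalent to the lambda.

```python
import operator

# Hypothetical similarity row for a three-book system.
toy_row = [0.2, 1.0, 0.7]

# Pair each value with its position: (book-index, similarity-score)
toy_pairs = list(enumerate(toy_row))

# Sort on the score (position 1 of each tuple), highest first
toy_sorted = sorted(toy_pairs, key=lambda x: x[1], reverse=True)

# itemgetter(1) picks the same element as the lambda
toy_sorted_alt = sorted(toy_pairs, key=operator.itemgetter(1), reverse=True)

print(toy_pairs)
print(toy_sorted)
```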

In [14]:
# Sort the values, keeping a copy of the original index of each value
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

sim_scores

[(6, 1.0000000000000002),
 (188, 0.7199257463912604),
 (154, 0.6713283635913986),
 (160, 0.6509149314903538),
 (18, 0.6257704973956681),
 (610, 0.6095169076444306),
 (4975, 0.5672438980003054),
 (2308, 0.5480869887896085),
 (963, 0.5455644849244053),
 (465, 0.4679734533625209),
 (8271, 0.44015859107621946),
 (1366, 0.4299623140347748),
 (1321, 0.4118169630582569),
 (53, 0.4068092298111437),
 (367, 0.4040273907201017),
 (61, 0.4025719152450129),
 (479, 0.4012646313871812),
 (936, 0.3942230254375662),
 (331, 0.3837504280760113),
 (592, 0.3802084939788822),
 (36, 0.3689056660159267),
 (218, 0.36843386525358346),
 (1203, 0.3670989249645184),
 (116, 0.36665204806581186),
 (2124, 0.36511889876376175),
 (1692, 0.36163668625293577),
 (2953, 0.3612357957105976),
 (1258, 0.3605665695280754),
 (1349, 0.3591440688941397),
 (109, 0.3558986661690446),
 (2197, 0.3533907881419986),
 (1462, 0.35286656943630457),
 (3062, 0.3515059613484011),
 (816, 0.3507193337441393),
 (948, 0.35051998288418523),
 (193

For the final step, we need to select the top-N similarity values, and return the corresponding book details. We take a slice from our `sim_scores` variable, and in this case since we want the top 10, we're going to take from index 1 to index 11 (remember, the last index is excluded when slicing!). We start at index 1 because index 0 holds the book with the highest similarity, which is the reference book itself: it has a similarity score of 1 with itself!

We then extract the `book_indices` --> remember, we have tuples of (book-index, similarity-score), we now want the values in index position 0. `book_indices = [i[0] for i in sim_scores]`



In [15]:
# Select the top-N values for recommendation
sim_scores = sim_scores[1:11]

# Collect indexes
book_indices = [i[0] for i in sim_scores]
book_indices

[188, 154, 160, 18, 610, 4975, 2308, 963, 465, 8271]

Finally, we want to convert these indices back into the book titles. We use the `titles` variable we made earlier for this:

In [16]:
titles.iloc[book_indices]

188     The Lord of the Rings (The Lord of the Rings, ...
154            The Two Towers (The Lord of the Rings, #2)
160     The Return of the King (The Lord of the Rings,...
18      The Fellowship of the Ring (The Lord of the Ri...
610              The Silmarillion (Middle-Earth Universe)
4975        Unfinished Tales of Númenor and Middle-Earth
2308                               The Children of Húrin
963     J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...
465                             The Hobbit: Graphic Novel
8271                   The Complete Guide to Middle-Earth
Name: title, dtype: object

Now let's compare that output to what we actually get from the function:

In [17]:
content_generate_top_N_recommendations("The Hobbit", N=10)

188     The Lord of the Rings (The Lord of the Rings, ...
154            The Two Towers (The Lord of the Rings, #2)
160     The Return of the King (The Lord of the Rings,...
18      The Fellowship of the Ring (The Lord of the Ri...
610              The Silmarillion (Middle-Earth Universe)
4975        Unfinished Tales of Númenor and Middle-Earth
2308                               The Children of Húrin
963     J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...
465                             The Hobbit: Graphic Novel
8271                   The Complete Guide to Middle-Earth
Name: title, dtype: object

---

## Content-based ratings predictions

We can also predict ratings using the content-based method. We can modify our content-based filtering algorithm to do this in the following manner:

1. Select a reference user from the database and a reference item they have not rated.
2. For the user, gather the similarity values between the reference item and each item the user has rated.
3. Sort the gathered similarity values in descending order (most similar to least similar).
4. Select the k highest similarity values which are above a given threshold value, creating a collection K.
5. Compute a weighted average rating from these values, which is the sum of the similarity values of each item multiplied by its assigned user-rating, divided by the sum of the similarity values.

Here is the full function as detailed in the trains - but we will break it down into steps afterwards:

In [18]:
def content_generate_rating_estimate(book_title, user, rating_data, k=20, threshold=0.0):

    # Convert the book title to a numeric index for our
    # similarity matrix
    b_idx = indices[book_title]
    neighbors = [] # <-- Stores our collection of similarity values

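    # Gather the similarity ratings between each book the user has rated
    # and the reference book.
    # NOTE: `indices` is zero-based, as are the rows of `cosine_sim_authTags`,
    # so the `-1` offsets below look up the row just before each book.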
    for index, row in rating_data[rating_data['user_id']==user].iterrows():
        sim = cosine_sim_authTags[b_idx-1, indices[row['title']]-1]
        neighbors.append((sim, row['rating']))

    # Select the top-k values from our collection
    k_neighbors = heapq.nlargest(k, neighbors, key=lambda t: t[0])

    # Compute the weighted average using similarity scores and
    # user item ratings.
    simTotal, weightedSum = 0, 0
    for (simScore, rating) in k_neighbors:
        # Ensure that similarity ratings are above a given threshold
        if (simScore > threshold):
            simTotal += simScore
            weightedSum += simScore * rating
    try:
        predictedRating = weightedSum / simTotal
    except ZeroDivisionError:
        # Cold-start problem - No ratings given by user.
        # We use the average rating for the reference item as a proxy in this case
        predictedRating = np.mean(rating_data[rating_data['title']==book_title]['rating'])
    return predictedRating

The first thing we need to do is select a reference user and a reference book that we want to make a prediction for. These come in as the `book_title` and `user` arguments when calling the function.

Our first line of the function makes use of the `indices` variable we created earlier, to retrieve the index of the reference book:

In [30]:
idx = indices["What to Expect the First Year (What to Expect)"]
idx

7682

Next, we want to gather the similarity values between this book and all the other books our specified user has rated. Note here we're using user 314. We create an empty list to store these values, and we are using our for-loop to loop through our `rating_data` dataframe - in our case this is the `book_ratings` dataframe. We filter this dataframe for our specific user to find the ratings our user has given other books, and collect similarity values from our similarity matrix for each of these books:

In [31]:
neighbors = [] # <-- Stores our collection of similarity values

# Gather the similarity ratings between each book the user has rated
# and the reference book
for index, row in book_ratings[book_ratings['user_id']==314].iterrows():
    sim = cosine_sim_authTags[idx-1, indices[row['title']]-1]
    neighbors.append((sim, row['rating']))


neighbors

[(0.04053971452213725, 5),
 (0.08641753662924306, 3),
 (0.08591020607111916, 4),
 (0.07343021763590994, 5),
 (0.029681348449349824, 3),
 (0.021592170346217594, 4),
 (0.03388298762602309, 4),
 (1.0, 3),
 (0.0898365785633652, 3),
 (0.1690929763330219, 4),
 (0.07282416708778255, 3),
 (0.09694529608321803, 4),
 (0.016686926264055765, 3),
 (0.05675058585403304, 4),
 (0.05006456095761311, 3),
 (0.22199155731376957, 3),
 (0.015004416267103596, 2)]

Now that we have the similarity values and their ratings from our user, we can sort by similarity to get our values from most to least similar using heapq. We specify `k`, which is how many similar items we want (in this case, we've specified 20; since user 314 has only 17 ratings, all of them are returned). We also specify the `key`, which tells heapq which of the values in each tuple to sort by. By using index 0 (`t[0]`), we are telling heapq to sort by the first value in each tuple, which is the similarity value:
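
As a quick illustration of how `heapq.nlargest` behaves, the sketch below runs it on four made-up (similarity, rating) tuples (`toy_neighbors` is hypothetical). It shows a `k` smaller than the list, and a `k` larger than the list, in which case everything comes back sorted.

```python
import heapq

# Hypothetical (similarity, rating) tuples.
toy_neighbors = [(0.04, 5), (0.22, 3), (1.0, 3), (0.17, 4)]

# k smaller than the list: only the two most similar items are kept
top_two = heapq.nlargest(2, toy_neighbors, key=lambda t: t[0])

# k larger than the list: every item is returned, most similar first
all_sorted = heapq.nlargest(20, toy_neighbors, key=lambda t: t[0])

print(top_two)
print(all_sorted)
```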

In [32]:
# Select the top-k values from our collection
k_neighbors = heapq.nlargest(20, neighbors, key=lambda t: t[0])

k_neighbors

[(1.0, 3),
 (0.22199155731376957, 3),
 (0.1690929763330219, 4),
 (0.09694529608321803, 4),
 (0.0898365785633652, 3),
 (0.08641753662924306, 3),
 (0.08591020607111916, 4),
 (0.07343021763590994, 5),
 (0.07282416708778255, 3),
 (0.05675058585403304, 4),
 (0.05006456095761311, 3),
 (0.04053971452213725, 5),
 (0.03388298762602309, 4),
 (0.029681348449349824, 3),
 (0.021592170346217594, 4),
 (0.016686926264055765, 3),
 (0.015004416267103596, 2)]

The next step is to compute a weighted average rating from these values. Below we are calculating the sum of similarities for our denominator (`simTotal`) and our weighted sum of ratings (`weightedSum`). Both these values start out at 0, and as we go through our `k_neighbors` list we add to these values.

We loop through each of the (similarity score, rating) tuples in our `k_neighbors` list, first checking whether the similarity score is above a certain threshold (if the books are not particularly similar, do we really want to include them in our rating predictions?). If the similarity score passes this threshold, we increase `simTotal` by the similarity score, and `weightedSum` by the similarity score multiplied by the rating for that book.

(see the equation below for weighted average to understand how these values go together)

In [33]:
# Compute the weighted average using similarity scores and
# user item ratings.
simTotal, weightedSum = 0, 0
for (simScore, rating) in k_neighbors:
    # Ensure that similarity ratings are above a given threshold
    if (simScore > 0.0):
        simTotal += simScore
        weightedSum += simScore * rating

In [34]:
simTotal

2.1606512460039626

In [35]:
weightedSum

7.159063408374511

$ \text{weighted average} = \frac{(sim_{1} \times rating_{1}) + \dots + (sim_{n} \times rating_{n})}{sim_{1} + \dots + sim_{n}} $
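
The formula can be checked by hand against the numbers already shown. The sketch below copies the 17 (similarity, rating) tuples from the `k_neighbors` output earlier in this section and recomputes the numerator, the denominator and their ratio in plain Python. The variable names `sim_total`, `weighted_sum` and `predicted` are chosen here for illustration; the results should agree with `simTotal`, `weightedSum` and the predicted rating in this notebook up to floating-point rounding.

```python
# (similarity, rating) tuples copied from the k_neighbors output above
k_neighbors = [
    (1.0, 3), (0.22199155731376957, 3), (0.1690929763330219, 4),
    (0.09694529608321803, 4), (0.0898365785633652, 3),
    (0.08641753662924306, 3), (0.08591020607111916, 4),
    (0.07343021763590994, 5), (0.07282416708778255, 3),
    (0.05675058585403304, 4), (0.05006456095761311, 3),
    (0.04053971452213725, 5), (0.03388298762602309, 4),
    (0.029681348449349824, 3), (0.021592170346217594, 4),
    (0.016686926264055765, 3), (0.015004416267103596, 2),
]

# Denominator: sum of the similarity values
sim_total = sum(sim for sim, _ in k_neighbors)

# Numerator: each similarity multiplied by the rating it belongs to
weighted_sum = sum(sim * rating for sim, rating in k_neighbors)

predicted = weighted_sum / sim_total
print(sim_total, weighted_sum, predicted)
```

Because the reference book itself is in the list with a similarity of 1.0 and a rating of 3, it contributes close to half of the total weight, which pulls the prediction towards 3.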

Now we've got our numerator and denominator, we can plug them into our formula. In the below code, we are trying to do just this!

However, we do have to account for the possibility of a `ZeroDivisionError`, which arises when `simTotal` is still 0: either our user has not rated anything else in our system (cold start problem!) or none of the similarity scores exceeded the threshold. In this case, we simply use the average rating of our reference book as our predicted rating:

In [36]:
try:
    predictedRating = weightedSum / simTotal
except ZeroDivisionError:
    # Cold-start problem - No ratings given by user.
    # We use the average rating for the reference item as a proxy in this case
    predictedRating = np.mean(book_ratings[book_ratings['title']=="What to Expect the First Year (What to Expect)"]['rating'])

predictedRating

3.3133822136334636
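
The fallback branch never fires for user 314, so here is a small sketch that exercises it without the dataset. The helper `weighted_average_or_none` is hypothetical: it repeats the accumulation loop from above but returns `None` where the notebook falls back to the book's average rating, purely to keep the example self-contained.

```python
def weighted_average_or_none(k_neighbors, threshold=0.0):
    # Both totals start as the integer 0, so dividing them when nothing
    # was added raises ZeroDivisionError.
    sim_total, weighted_sum = 0, 0
    for sim_score, rating in k_neighbors:
        if sim_score > threshold:
            sim_total += sim_score
            weighted_sum += sim_score * rating
    try:
        return weighted_sum / sim_total
    except ZeroDivisionError:
        # Placeholder for the fallback; the notebook uses the item's mean rating
        return None

no_ratings = weighted_average_or_none([])
below_threshold = weighted_average_or_none([(0.1, 4)], threshold=0.5)
normal_case = weighted_average_or_none([(0.5, 4), (0.5, 2)])

print(no_ratings, below_threshold, normal_case)
```

Both an empty rating history and a threshold that filters out every neighbour end up in the same branch.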

In [37]:
# Subset of ratings from user 314
book_ratings[book_ratings['user_id'] == 314][3:10]

Unnamed: 0,user_id,book_id,title,rating
401,314,6,Harry Potter and the Goblet of Fire (Harry Pot...,5
1500,314,29,The Mother Tongue: English and How It Got That...,3
1600,314,30,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,4
1900,314,36,The Lord of the Rings: Weapons and Warfare,4
2300,314,98,What to Expect the First Year (What to Expect),3
2400,314,105,Chapterhouse: Dune (Dune Chronicles #6),3
2501,314,106,Dune Messiah (Dune Chronicles #2),4


Now let's compare that output to what we actually get from the function:

In [39]:
title = "What to Expect the First Year (What to Expect)"
actual_rating = book_ratings[(book_ratings['user_id'] == 314) & (book_ratings['title'] == title)]['rating'].values[0]
pred_rating = content_generate_rating_estimate(book_title=title, user=314, rating_data=book_ratings)
print(f"Title - {title}")
print("---")
print(f"Actual rating: \t\t {actual_rating}")
print(f"Predicted rating: \t {pred_rating}")

Title - What to Expect the First Year (What to Expect)
---
Actual rating: 		 3
Predicted rating: 	 3.3133822136334636
