# CF book recommendations for a user with context

This notebook demonstrates how to predict book ratings for a user who already has rating records, using the goodbooks-10k dataset.

We have seen that, for user-based collaborative filtering (UBCF), adjusted cosine similarity achieves good cross-validation scores (RMSE and MAE). The model in this notebook is therefore based on adjusted cosine similarity.
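As a rough illustration of the similarity measure named above, the sketch below computes the cosine similarity of two users' ratings after subtracting each user's mean rating. The function name `adjusted_cosine` and the toy ratings are hypothetical, and the choice to take the dot product and norms over co-rated books only is an assumption of this sketch. It is not the code inside `cf_model`.

```python
import math

def adjusted_cosine(ratings_u, ratings_v):
    """Cosine similarity of two users' mean-centered ratings.

    Each argument is a non-empty dict {book_id: rating}. Assumption for
    this sketch: each mean is taken over all books that user rated, while
    the dot product and norms run over co-rated books only.
    """
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    common = ratings_u.keys() & ratings_v.keys()
    dot = sum((ratings_u[b] - mean_u) * (ratings_v[b] - mean_v) for b in common)
    norm_u = math.sqrt(sum((ratings_u[b] - mean_u) ** 2 for b in common))
    norm_v = math.sqrt(sum((ratings_v[b] - mean_v) ** 2 for b in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no usable overlap, treat as "no similarity"
    return dot / (norm_u * norm_v)

# Hypothetical toy users: bob agrees with alice, carol rates the opposite way
alice = {1: 5, 2: 3, 3: 1}
bob = {1: 4, 2: 3, 3: 2}
carol = {1: 1, 2: 3, 3: 5}
sim_ab = adjusted_cosine(alice, bob)
sim_ac = adjusted_cosine(alice, carol)
```

In this toy data `sim_ab` comes out close to 1 and `sim_ac` close to -1, even though alice and bob never give identical ratings: centering removes each user's personal rating scale before the comparison.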

This notebook implements a model that generates the top recommended book ids for a user.

**Note: Vector and matrix indices are the same as the `book_id` and `user_id` values in this dataset.**

In [1]:
# you might need these to define your own rating vectors
import numpy as np
import scipy as sp
import scipy.sparse  # make sure the sp.sparse submodule is loaded
# the following imports the model needed
from cf_model import cf_model
# !readlink -f . # prints the path of the directory holding this notebook

In [2]:
# define your path to data directory here
fp = "/home/zebalgebra/School/DVA/The-Last-Book-Bender/Data/Raw/"
fname = "ratings_for_cf.npz"

If you don't have the file `ratings_for_cf.npz`, execute this block to generate one. This assumes that you have the `ratings.csv` file from [this link](https://github.com/malcolmosh/goodbooks-10k-extended/blob/master/ratings.csv). The block reads the raw csv, centers each rating on its user's mean, and shifts it slightly to give more useful recommendations. If you don't have `ratings.csv`, you can download it from the same link, or read it directly from the URL by using the commented-out line instead. Note that reading from the URL takes considerably longer (about 5-6x as long on my machine and internet speed).

In [3]:
%%time
import pandas as pd
df = pd.read_csv(fp + "ratings.csv")
# df = pd.read_csv("https://raw.githubusercontent.com/malcolmosh/goodbooks-10k-extended/master/ratings.csv")

# center each rating on its user's mean rating, then add a tiny offset
mean = df.groupby("user_id").agg({"rating": "mean"}).rename(columns={"rating": "mean"})
df = df.merge(mean, on="user_id")
df["rating"] = df["rating"] - df["mean"] + 10 ** (-8)

# generate and save a csc matrix (rows: book_id, columns: user_id)
mat = sp.sparse.csc_matrix(
    (
        np.array(df["rating"]),
        (
            np.array(df["book_id"]),
            np.array(df["user_id"])
        )
    )
)
with open(fp + "ratings_for_cf.npz", "wb") as f:
    sp.sparse.save_npz(f, mat)

CPU times: user 4.42 s, sys: 1.29 s, total: 5.71 s
Wall time: 4.91 s
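To make the centering step concrete, here is a dependency-free sketch of the same arithmetic on a few made-up rows. The toy data and the names `rows`, `means`, and `centered` are hypothetical; the real cell does this with pandas on `ratings.csv`.

```python
from collections import defaultdict

EPS = 10 ** (-8)  # same small offset as in the preprocessing cell

# Hypothetical toy data: (user_id, book_id, rating)
rows = [(1, 10, 5), (1, 11, 3), (1, 12, 4), (2, 10, 2), (2, 12, 2)]

# Per-user running [sum, count], then the per-user mean rating
totals = defaultdict(lambda: [0.0, 0])
for user_id, _, rating in rows:
    totals[user_id][0] += rating
    totals[user_id][1] += 1
means = {u: s / n for u, (s, n) in totals.items()}

# Centered rating plus the offset, keyed by (book_id, user_id) like the matrix
centered = {(b, u): r - means[u] + EPS for u, b, r in rows}
```

User 1 has mean 4, so the rating of 5 becomes roughly `1.00000001`. User 2 gave every book the same rating, so without the offset both of that user's entries would be exactly zero; with it they are `1e-08`.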


## Usage Demonstration
Initialize the model with the processed ratings matrix, specified by the path to the data directory and the filename.

In [3]:
%%time
# fp = "/home/zebalgebra/School/DVA/The-Last-Book-Bender/Data/Raw/"
# fname = "ratings_for_cf.npz"
# should take about 0.5s to load
model = cf_model(fp, fname)

CPU times: user 460 ms, sys: 996 ms, total: 1.46 s
Wall time: 411 ms


You can define the context as any of the following:
1. A dictionary of `book_id: value`.
2. A list or tuple of pairs `(book_id, value)` or `[book_id, value]`.
3. A numpy vector with `v[book_id] = value`.
4. A scipy sparse vector.
5. An integer, which is treated as a `user_id`.
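One way to picture how the dictionary and pair-based forms relate is to normalize them into a single `{book_id: value}` mapping. The helper `context_to_dict` below is hypothetical and only covers forms 1 and 2; it is a sketch of the idea, not the conversion that `cf_model` performs, and it leaves out the numpy, scipy, and integer forms.

```python
def context_to_dict(context):
    """Turn a dict or a sequence of (book_id, value) pairs into {book_id: value}."""
    if isinstance(context, dict):
        return dict(context)  # copy so the caller's dict is not shared
    if isinstance(context, (list, tuple)):
        result = {}
        for pair in context:
            if len(pair) != 2:
                raise ValueError(f"expected a (book_id, value) pair, got {pair!r}")
            book_id, value = pair
            result[int(book_id)] = value
        return result
    raise TypeError(f"unsupported context type: {type(context).__name__}")

# The same context written five different ways
forms = [
    {1: 1, 2: 1},
    ((1, 1), (2, 1)),
    [(1, 1), (2, 1)],
    ([1, 1], [2, 1]),
    [[1, 1], [2, 1]],
]
normalized = [context_to_dict(c) for c in forms]
```

All five entries of `normalized` are the same mapping, which is the sense in which these forms are interchangeable.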

To get top recommendations, specify how many neighbors to use (the argument `k`) and how many recommendations to generate (the argument `m`).

Say your user rated book id 1 with 1 and book id 2 also with 1. You can generate book recommendations as follows:

In [11]:
%%time
context_dict = {1: 1, 2: 1}
context_t_t = ((1, 1), (2, 1))
context_l_t = [(1, 1), (2, 1)]
context_t_l = ([1, 1], [2, 1])
context_l_l = [[1, 1], [2, 1]]
context_np = np.array([0, 1, 1, 0, 0])
context_sp_csc = sp.sparse.csc_matrix(np.array([0, 1, 1, 0, 0]))
context_sp_csr = sp.sparse.csr_matrix(np.array([0, 1, 1, 0, 0]))
context_sp_coo = sp.sparse.coo_matrix(np.array([0, 1, 1, 0, 0]))
context_int = 4

# pick any of the contexts defined above; here the context is user_id 4
context = context_int

model.get_top_m_recs_k_neighbors(context, k=100, m=10)

CPU times: user 35.6 ms, sys: 0 ns, total: 35.6 ms
Wall time: 34.4 ms


[(2.2575757675757577, 6341),
 (2.0952381052380953, 3224),
 (2.00000001, 2157),
 (2.00000001, 1969),
 (1.9342067276600015, 5778),
 (1.7640449538202247, 7500),
 (1.7640449538202247, 3964),
 (1.6859504232231404, 8916),
 (1.6859504232231404, 4270),
 (1.6859504232231404, 2017)]

This says, for example, that the predicted rating for the book with id 6341 is about 2.258 above the user's baseline.
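For intuition about where `(score, book_id)` pairs like these can come from, the sketch below scores each unseen book by a similarity-weighted average of the `k` nearest neighbors' centered ratings and keeps the `m` best. The function `top_m_from_neighbors`, its arguments, and the toy neighbors are all hypothetical, and the weighting scheme is an assumption; the internals of `get_top_m_recs_k_neighbors` are not shown in this notebook and may differ.

```python
import heapq

def top_m_from_neighbors(neighbors, seen, k, m):
    """Score unseen books from the k most similar neighbors.

    neighbors: list of (similarity, {book_id: centered_rating}).
    seen: set of book_ids the target user has already rated.
    Returns up to m (score, book_id) pairs, best first.
    """
    nearest = heapq.nlargest(k, neighbors, key=lambda pair: pair[0])
    num, den = {}, {}
    for sim, ratings in nearest:
        for book_id, rating in ratings.items():
            if book_id in seen:
                continue  # never recommend something already rated
            num[book_id] = num.get(book_id, 0.0) + sim * rating
            den[book_id] = den.get(book_id, 0.0) + abs(sim)
    scores = [(num[b] / den[b], b) for b in num if den[b] > 0]
    return heapq.nlargest(m, scores)

# Hypothetical neighbors: (similarity to the target user, centered ratings)
neighbors = [
    (0.9, {10: 2.0, 11: -1.0}),
    (0.5, {10: 1.0, 12: 0.5}),
    (0.1, {13: 3.0}),
]
recs = top_m_from_neighbors(neighbors, seen={11}, k=2, m=2)
```

With `k=2` the weakest neighbor is dropped, so book 13 never appears; book 11 is skipped because the user has already rated it. Book 10 ranks first because both remaining neighbors rated it above their own baselines.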