# Demonstration of ``recommendx``

#### Updated: May 2020

This Jupyter notebook demonstrates the basics of using the ``recommendx`` Python package with explicit user ratings data. This guide assumes that you have already installed ``recommendx``. The easiest way to install it is with ``pip``:

```bash
pip install recommendx
```

The data used in this example is a small, non-random sample from the [MovieLens](https://grouplens.org/datasets/movielens/) dataset.

We begin by importing the Python packages needed in this notebook.

In [1]:
import pandas as pd
import numpy as np
from recommendx import RWR
from recommendx import RWT

## Data
The data used in this demonstration are provided alongside this notebook, so anyone should be able to replicate the steps shown here. We will use two CSV files:
1. **ratingsdat.csv** - user-item ratings data
2. **itemdat.csv** - observable item attributes

We can load the ratings data and view its layout.

In [2]:
ratings = pd.read_csv('ratingsdat.csv')
ratings.head(10)

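|   | userId | title | rating | AMPM |
|---|--------|-------|--------|------|
| 0 | A | Ace Ventura | 4.0 | PM |
| 1 | A | Clueless | 4.0 | AM |
| 2 | A | Die Hard | 4.0 | PM |
| 3 | A | Iron Man | 3.5 | PM |
| 4 | A | The Shining | 4.0 | PM |
| 5 | B | Ace Ventura | 3.5 | PM |
| 6 | B | Die Hard | 4.0 | AM |
| 7 | B | Iron Man | 4.5 | AM |
| 8 | B | The Shining | 3.5 | AM |
| 9 | B | Get Out | 4.5 | PM |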


In [3]:
print(ratings.shape)
print(ratings['userId'].unique())
print(ratings['title'].unique())

(47, 4)
['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J']
['Ace Ventura' 'Clueless' 'Die Hard' 'Iron Man' 'The Shining' 'Get Out']


This simple dataset contains 47 ratings. There are 10 unique users in the data ('A', ..., 'J') and 6 unique items ('Ace Ventura', ..., 'Get Out'). Though not shown, there are two unique time periods ('AM', 'PM').
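One quick way to see which user-item pairs are unrated is to pivot the ratings into a user-by-item table. The sketch below rebuilds only the ten rows displayed by ``ratings.head(10)`` (not the full 47-row file) so that it runs on its own; ``sample`` and ``wide`` are names chosen for illustration.

```python
import pandas as pd

# The ten rows displayed by ratings.head(10); the full file has 47 rows.
sample = pd.DataFrame({
    "userId": ["A"] * 5 + ["B"] * 5,
    "title": ["Ace Ventura", "Clueless", "Die Hard", "Iron Man", "The Shining",
              "Ace Ventura", "Die Hard", "Iron Man", "The Shining", "Get Out"],
    "rating": [4.0, 4.0, 4.0, 3.5, 4.0, 3.5, 4.0, 4.5, 3.5, 4.5],
})

# One row per user, one column per title; unrated pairs become NaN.
wide = sample.pivot(index="userId", columns="title", values="rating")
print(wide)

# Count the unrated user-item pairs in this ten-row sample.
n_missing = int(wide.isna().sum().sum())
print(n_missing)
```

In the full data, 10 users and 6 items give 60 possible user-item pairs. Assuming each pair is rated at most once, the 47 ratings leave 13 pairs unrated.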

Next, we load the observable attribute data.

In [4]:
itemdat = pd.read_csv('itemdat.csv')
itemdat

|   | title | comedy | horror |
|---|-------|--------|--------|
| 0 | Ace Ventura | 1 | 0 |
| 1 | Clueless | 1 | 0 |
| 2 | Die Hard | 0 | 0 |
| 3 | Get Out | 0 | 1 |
| 4 | Iron Man | 0 | 0 |
| 5 | The Shining | 0 | 1 |


In this dataframe, each row is an item. There are two observed item attributes, which in this case are simply genre indicators.

``recommendx`` requires that data inputs be NumPy arrays, so our final step is to convert the dataframes to arrays.

In [5]:
dat = np.array(ratings)
att = np.array(itemdat)
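Because the dataframes mix strings and numbers, the converted arrays have ``dtype=object``. The following self-contained check illustrates this with three of the item rows shown above; ``items``, ``arr``, and ``same`` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Three rows copied from the item attribute table above.
items = pd.DataFrame({
    "title": ["Ace Ventura", "Die Hard", "Get Out"],
    "comedy": [1, 0, 0],
    "horror": [0, 0, 1],
})

arr = np.array(items)   # same conversion used in the notebook
print(arr.dtype)        # mixed strings and integers give an object array
print(arr.shape)        # one array row per dataframe row

# DataFrame.to_numpy() is the equivalent pandas method.
same = items.to_numpy()
print((arr == same).all())
```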

## Recommendation with Regressors (``RWR``)

We will create an instance of the ``RWR`` class and then fit it using the example data from above. Because we have so few ratings in this simple dataset, I will specify that our recommendation model uses only 3 latent attributes.

In [6]:
rwr = RWR(n_factors = 3)
rwr.fit(dat,att)

Our estimated model parameters are attributes of our instance. We can access them as follows: ``rwr.intercept_``, ``rwr.bu``, ``rwr.B``, ``rwr.alpha_``, and ``rwr.Z``. Please see the documentation for more details.

We can view the estimated user-specific coefficients on the observable item attributes as follows:

In [7]:
rwr.B

array([[ 0.09850014,  0.05449469],
       [-0.05973401,  0.0669737 ],
       [-0.28011176,  0.37422123],
       [ 0.19432835, -0.02196002],
       [-0.40419099, -0.06377156],
       [-0.3017002 ,  0.26244235],
       [-0.39822518,  0.10037626],
       [-0.16719338,  0.15965804],
       [-0.27780916, -0.08273066],
       [-0.21058007,  0.38930298]])

We may also want to view the model's mean squared prediction error.

In [8]:
rwr.accuracy(dat,att)

array([0.52608556])
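For reference, mean squared prediction error is the average of the squared differences between observed and predicted ratings. The sketch below computes it with NumPy on made-up numbers. It does not call ``recommendx``, and the ``mse`` helper is a hypothetical name rather than part of the package.

```python
import numpy as np

def mse(observed, predicted):
    """Mean of squared differences between observed and predicted ratings."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((observed - predicted) ** 2)

# Made-up ratings and predictions, for illustration only.
observed = [4.0, 3.5, 4.5, 3.0]
predicted = [3.5, 3.5, 4.0, 4.0]
print(mse(observed, predicted))  # (0.25 + 0 + 0.25 + 1.0) / 4 = 0.375
```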

Finally, we can use the ``predict()`` method to predict the ratings for any user-item pair. As an example, user 'A' did not rate the movie 'Get Out'. We can predict this rating as follows:

In [9]:
rwr.predict('A','Get Out')

array([3.86479523])
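To give some intuition for how the parameters listed earlier might combine into a prediction, here is a minimal sketch. It assumes an additive form (intercept, user bias, observed-attribute term, latent-attribute term). That form is inferred from the parameter names and is an assumption, not taken from the ``recommendx`` documentation. All values below are made up.

```python
import numpy as np

# Made-up parameter values for ONE user and ONE item (not estimates from the notebook).
intercept = 3.5                        # overall rating level (assumed role)
b_u = 0.2                              # user bias (assumed role)
beta_u = np.array([0.1, 0.05])         # user's coefficients on observed attributes
x_i = np.array([0.0, 1.0])             # item's observed attributes (comedy, horror)
alpha_u = np.array([0.3, -0.1, 0.2])   # user's tastes for 3 latent attributes
z_i = np.array([0.5, 1.0, -0.5])       # item's 3 latent attributes

# Assumed additive form: intercept + user bias + observed part + latent part.
prediction = intercept + b_u + beta_u @ x_i + alpha_u @ z_i
print(prediction)
```

The documentation remains the authority on the exact model that ``predict()`` evaluates.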

Note also that ``RWR`` will perform traditional SVD if no item attributes are provided. This is accomplished by omitting the array of observed attributes when we use ``fit()``.

In [10]:
rwr2 = RWR(n_factors = 3)
rwr2.fit(dat)
rwr2.accuracy(dat)

array([0.69681959])

In this example, the mean squared prediction error is lower when observed item attributes are included (about 0.53 versus 0.70), although I make no claim that this is universally true.

## Recommendation with Time (``RWT``)

We can use the same data, but we will allow user taste parameters to vary according to time ('AM' or 'PM'). Again, I will specify 3 latent item attributes. The syntax is virtually identical to ``RWR``.

In [11]:
rwt = RWT(n_factors = 3)
rwt.fit(dat,att)
rwt.accuracy(dat,att)

array([0.38625324])

In terms of our model attributes, we can most easily see the difference by viewing the shape of our two user preference attributes.

In [12]:
print(rwt.B.shape)
print(rwt.alpha_.shape)

(2, 10, 2)
(2, 10, 3)


These are now 3-dimensional arrays. To view these coefficients, it is helpful to know how ``RWT`` has assigned the values 'AM' and 'PM'. Fortunately, we have a time "dictionary" that can help.

In [13]:
rwt.timedict

array([['AM', 0],
       ['PM', 1]], dtype=object)
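If you would rather look up a time period by its label than remember its index, the two-column array can be turned into a regular Python dictionary. The sketch below is self-contained: ``timedict`` copies the layout shown above, ``B`` is a placeholder array with the shape reported for ``rwt.B``, and ``time_index`` and ``pm_coefs`` are illustrative names.

```python
import numpy as np

# Same layout as the time dictionary shown above.
timedict = np.array([["AM", 0], ["PM", 1]], dtype=object)

# Turn the two-column array into a label -> index lookup.
time_index = {label: int(idx) for label, idx in timedict}

# Placeholder with the shape reported for rwt.B: (periods, users, attributes).
B = np.zeros((2, 10, 2))
B[1] = 1.0  # mark the 'PM' slice so the two slices are distinguishable

pm_coefs = B[time_index["PM"]]  # coefficients for every user in the 'PM' period
print(time_index)
print(pm_coefs.shape)
```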

If we would like to view the $\beta_u$ coefficients for 'AM', we can do so as follows:

In [14]:
rwt.B[0,:,:]

array([[ 0.07555079, -0.00514984],
       [ 0.0048872 , -0.14826482],
       [-0.43450748, -0.01598492],
       [ 0.00579544, -0.0158765 ],
       [-0.65411126,  0.00249884],
       [-0.53609518,  0.44413453],
       [-0.76642724,  0.21717155],
       [-0.31177538,  0.27481313],
       [-0.13719698, -0.13555624],
       [-0.42594556,  0.34121468]])

Finally, prediction now requires a specified time period.

In [15]:
rwt.predict('A','Get Out','AM')

array([3.83959932])

In [16]:
rwt.predict('A','Get Out','PM')

array([3.92030306])