### Understanding Sparse Matrices

A sparse matrix is often built in coordinate (COO) format: three parallel arrays holding the row, column, and value of each stored entry. COO is convenient for construction, but it does not support row indexing, so extracting one user's values from it is awkward. Converting the matrix to compressed sparse row (CSR) format groups the entries by row, so we can read a row's stored values directly without scanning through a huge number of zeros.
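
To make the difference between the two layouts concrete, here is a minimal pure-Python sketch of a COO to CSR conversion. It is an illustration only, not how SciPy implements it; the toy triplets and the helper names `coo_to_csr` and `row_entries` are made up for this example.

```python
# Toy ratings in coordinate (COO) form: parallel lists of row, column, value.
# The entries are deliberately not sorted by row.
rows = [2, 0, 1, 2, 0, 2]
cols = [0, 0, 1, 1, 2, 3]
vals = [5.0, 5.0, 4.0, 4.0, 4.0, 5.0]
n_rows = 3


def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    # Count stored entries per row, then accumulate into row pointers.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]

    # Place each entry into the next free slot reserved for its row.
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1].copy()
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return indptr, indices, data


def row_entries(indptr, indices, data, r):
    """Return the (column, value) pairs stored for row r."""
    start, end = indptr[r], indptr[r + 1]
    return list(zip(indices[start:end], data[start:end]))


indptr, indices, data = coo_to_csr(rows, cols, vals, n_rows)
print(indptr)
print(row_entries(indptr, indices, data, 2))
```

In the CSR layout, reading one row is a single slice between two row pointers, whereas the COO lists would have to be searched for every entry whose row matches.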

In [23]:
import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM
from lightfm.evaluation import precision_at_k

In [4]:
data = fetch_movielens(min_rating=4.0)

In [14]:
for key, value in data.items():
    print(key, type(value), value.shape)

train <class 'scipy.sparse.coo.coo_matrix'> (943, 1682)
test <class 'scipy.sparse.coo.coo_matrix'> (943, 1682)
item_features <class 'scipy.sparse.csr.csr_matrix'> (1682, 1682)
item_feature_labels <class 'numpy.ndarray'> (1682,)
item_labels <class 'numpy.ndarray'> (1682,)


In [17]:
model = LightFM(loss='warp')

In [18]:
model.fit(data['train'], epochs=30, num_threads=2)

<lightfm.lightfm.LightFM at 0x10cd1d790>

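The training matrix has 943 rows and 1682 columns, with 49906 stored values:
* each row represents a user
* each column represents a movie, and each stored value is that user's rating of it (4 or higher here, because of `min_rating=4.0`)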
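
As a rough check on how sparse this matrix is, the shape and stored-value count quoted above can be plugged into a short calculation. This is only arithmetic on those figures; the variable names are illustrative.

```python
# Figures quoted above for the training matrix.
n_users, n_items = 943, 1682
n_stored = 49906

# Number of cells a dense array would need, and the fraction actually filled.
total_cells = n_users * n_items
density = n_stored / total_cells

print("total cells:", total_cells)
print("density: {:.2%}".format(density))
```

Only about 3% of the cells hold a rating, which is why the matrix is kept in a sparse format.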

1. convert the matrix to compressed sparse row (CSR) format
2. select the user's row and read its column indices
3. use the indices to look up the movie names
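
The three steps above can be sketched on a toy matrix before applying them to the real data. The ratings and the `Movie A` to `Movie D` labels below are hypothetical; only `coo_matrix`, `tocsr()`, and the `.indices` attribute are real SciPy names.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data: 3 users x 4 movies, stored values are ratings.
item_labels = np.array(["Movie A", "Movie B", "Movie C", "Movie D"])
rows = np.array([0, 0, 1, 2, 2, 2])
cols = np.array([0, 2, 1, 0, 1, 3])
vals = np.array([5.0, 4.0, 4.0, 5.0, 4.0, 5.0], dtype=np.float32)
train = coo_matrix((vals, (rows, cols)), shape=(3, 4))

# 1. Convert to compressed sparse row so that rows can be indexed.
train_csr = train.tocsr()

# 2. Take one user's row; .indices holds the columns with stored ratings.
user_id = 2
rated = train_csr[user_id].indices

# 3. Use those column indices to look up the movie names.
rated_titles = item_labels[rated]
print(rated_titles)
```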

In [62]:
data['train'].toarray()

array([[5., 0., 4., ..., 0., 0., 0.],
       [4., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [5., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 5., 0., ..., 0., 0., 0.]], dtype=float32)

Column indices of the movies rated by the user at row index 1 (rows are zero-based):

In [64]:
data['train'].tocsr()[1].indices

array([  0,  13,  24,  99, 110, 126, 236, 241, 254, 256, 268, 271, 272,
       274, 275, 276, 278, 281, 282, 283, 284, 285, 292, 294, 298, 299,
       300, 301, 302, 303, 305, 309, 310, 312, 315], dtype=int32)

Using these indices, we can look up the movie names in the label arrays. For this user, `item_feature_labels` and `item_labels` return the same titles:

In [60]:
data['item_feature_labels'][data['train'].tocsr()[1].indices]

array(['Toy Story (1995)', 'Postino, Il (1994)', 'Birdcage, The (1996)',
       'Fargo (1996)', 'Truth About Cats & Dogs, The (1996)',
       'Godfather, The (1972)', 'Jerry Maguire (1996)', 'Kolya (1996)',
       "My Best Friend's Wedding (1997)", 'Men in Black (1997)',
       'Full Monty, The (1997)', 'Good Will Hunting (1997)',
       'Heat (1995)', 'Sense and Sensibility (1995)',
       'Leaving Las Vegas (1995)', 'Restoration (1995)',
       'Once Upon a Time... When We Were Colored (1995)',
       'Time to Kill, A (1996)', 'Emma (1996)', 'Tin Cup (1996)',
       'Secrets & Lies (1996)', 'English Patient, The (1996)',
       'Donnie Brasco (1997)', 'Breakdown (1997)', 'Hoodlum (1997)',
       'Air Force One (1997)', 'In & Out (1997)',
       'L.A. Confidential (1997)', "Ulee's Gold (1997)",
       'Fly Away Home (1996)',
       'Mrs. Brown (Her Majesty, Mrs. Brown) (1997)',
       'Rainmaker, The (1997)', 'Wings of the Dove, The (1997)',
       'Titanic (1997)', 'As Good As It Get

In [61]:
data['item_labels'][data['train'].tocsr()[1].indices]

array(['Toy Story (1995)', 'Postino, Il (1994)', 'Birdcage, The (1996)',
       'Fargo (1996)', 'Truth About Cats & Dogs, The (1996)',
       'Godfather, The (1972)', 'Jerry Maguire (1996)', 'Kolya (1996)',
       "My Best Friend's Wedding (1997)", 'Men in Black (1997)',
       'Full Monty, The (1997)', 'Good Will Hunting (1997)',
       'Heat (1995)', 'Sense and Sensibility (1995)',
       'Leaving Las Vegas (1995)', 'Restoration (1995)',
       'Once Upon a Time... When We Were Colored (1995)',
       'Time to Kill, A (1996)', 'Emma (1996)', 'Tin Cup (1996)',
       'Secrets & Lies (1996)', 'English Patient, The (1996)',
       'Donnie Brasco (1997)', 'Breakdown (1997)', 'Hoodlum (1997)',
       'Air Force One (1997)', 'In & Out (1997)',
       'L.A. Confidential (1997)', "Ulee's Gold (1997)",
       'Fly Away Home (1996)',
       'Mrs. Brown (Her Majesty, Mrs. Brown) (1997)',
       'Rainmaker, The (1997)', 'Wings of the Dove, The (1997)',
       'Titanic (1997)', 'As Good As It Get

In [30]:
def sample_recommendation(model, data, user_ids):
    n_users, n_items = data['train'].shape
    # Convert once: CSR supports row indexing, COO does not.
    train_csr = data['train'].tocsr()

    for user_id in user_ids:
        # Movies this user rated in the training set.
        known_positives = data['item_labels'][train_csr[user_id].indices]

        # Predicted score for every movie for this user.
        scores = model.predict(user_id, np.arange(n_items))

        # Movie titles ordered from highest to lowest score.
        top_items = data['item_labels'][np.argsort(-scores)]

        print("\nUser %s" % user_id)
        print("\nKnown positives: ")

        for x in known_positives[:3]:
            print("%s" % x)

        print("\nRecommended: ")

        for x in top_items[:3]:
            print("%s" % x)

In [31]:
sample_recommendation(model, data, [3,25,450])


User 3

Known positives: 
Seven (Se7en) (1995)
Contact (1997)
Starship Troopers (1997)

Recommended: 
Scream (1996)
Starship Troopers (1997)
Contact (1997)

User 25

Known positives: 
Dead Man Walking (1995)
Star Wars (1977)
Fargo (1996)

Recommended: 
Titanic (1997)
Contact (1997)
English Patient, The (1996)

User 450

Known positives: 
Contact (1997)
George of the Jungle (1997)
Event Horizon (1997)

Recommended: 
Scream (1996)
Kiss the Girls (1997)
Starship Troopers (1997)
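
The ranking step in `sample_recommendation` relies on `np.argsort(-score)`, which orders item indices from the highest score to the lowest. The output above also shows that recommendations can repeat titles the user has already rated (for example, user 3). The sketch below illustrates the ranking on made-up scores and shows one possible way to drop known positives; the filtering is a variation added here for illustration, not part of the function above, and all data in it is hypothetical.

```python
import numpy as np

# Hypothetical scores for 5 items and the items the user has already rated.
item_labels = np.array(["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"])
scores = np.array([0.2, 1.5, -0.3, 0.9, 1.1])
known = np.array([1, 4])  # column indices of known positives

# Negating the scores makes argsort return indices from highest to lowest.
ranked = np.argsort(-scores)

# Drop items the user has already rated, keeping the ranking order.
unseen = ranked[~np.isin(ranked, known)]

print(item_labels[ranked[:3]])
print(item_labels[unseen[:3]])
```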
