# Recommender Systems 2020/21

### Practice 4 - Cython for SLIM MSE


### Cython is a superset of Python, allowing you to use C-like operations and import C code. Cython files (.pyx) are compiled and support static typing.

### Why do we use it (or any other compiled language)? If the code is written properly it is fast... I mean, FAST

In [1]:
import time
import numpy as np

### Let's implement something simple

In [2]:
def isPrime(n):
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [3]:
print("Is prime 2? {}".format(isPrime(2)))
print("Is prime 3? {}".format(isPrime(3)))
print("Is prime 5? {}".format(isPrime(5)))
print("Is prime 15? {}".format(isPrime(15)))
print("Is prime 20? {}".format(isPrime(20)))

Is prime 2? True
Is prime 3? True
Is prime 5? True
Is prime 15? False
Is prime 20? False
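The comment `# Usually you loop up to sqrt(n)` points to a cheaper algorithm. If n has a divisor d > sqrt(n), then n/d < sqrt(n) is also a divisor, so trial division can stop at sqrt(n). Below is a minimal sketch in plain Python. It assumes Python 3.8+ for `math.isqrt`, and the name `isPrime_sqrt` is ours. It also returns False for n < 2, which the loop above wrongly reports as prime. The timings in this notebook all use the naive loop, so they compare implementations of the same algorithm rather than different algorithms.

```python
from math import isqrt

def isPrime_sqrt(n):
    # 0, 1 and negative numbers are not prime
    if n < 2:
        return False
    # A composite n always has a divisor d with 2 <= d <= sqrt(n)
    limit = isqrt(n)
    i = 2
    while i <= limit:
        if n % i == 0:
            return False
        i += 1
    return True

for n in (2, 3, 5, 15, 20, 80000023):
    print("Is prime {}? {}".format(n, isPrime_sqrt(n)))
```

For 80000023 this checks fewer than 9,000 candidate divisors instead of about 80 million.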


In [4]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 10.08 sec


#### Load the Cython magic command, which takes care of the compilation step. Outside Jupyter you have to compile with other tools; see the end of the notebook for details.

In [5]:
%load_ext Cython

#### Declare the Cython function by pasting the same code as before. The cell is compiled, and the function can then be called from Python as usual

In [6]:
%%cython
def isPrime(n):
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [7]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 5.11 sec


#### Just compiling the same code already halves the running time (5.11 sec vs 10.08 sec).
#### To go much faster, we need static typing

In [8]:
%%cython
# Declare the type of the argument
def isPrime(long n):
    
    # Declare the type of the loop index
    cdef long i
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [9]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 0.22 sec


#### With just two type declarations, for n and i, the Cython code runs roughly 45x faster than Python (0.22 sec vs 10.08 sec)

#### Main benefits of Cython:
* Compiled to C, with no bytecode interpreter loop
* Static typing, with no dynamic type-checking overhead
* Fast loops, with no need to vectorize. Vectorization sometimes performs lots of useless operations
* NumPy is fast in Python, but when operations are not vectorizable it often becomes slooooow compared to carefully written Cython code
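The point about vectorization doing useless work can be made concrete with a hypothetical NumPy version of the same primality test (the function name is ours). It computes n % d for every candidate divisor in one shot. On an even number it therefore still performs all the divisions that a loop would have skipped after d = 2.

```python
import numpy as np
from math import isqrt

def isPrime_vectorized(n):
    # 0, 1 and negative numbers are not prime
    if n < 2:
        return False
    # All candidate divisors 2 .. isqrt(n), tested at once: no early exit is possible
    divisors = np.arange(2, isqrt(n) + 1, dtype=np.int64)
    return not np.any(n % divisors == 0)

print(isPrime_vectorized(80000023))
# Even number: a loop stops at d = 2, this still computes all 8943 remainders
print(isPrime_vectorized(80000024))
```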

## SLIM MSE with Cython

#### Load the usual data.

In [10]:
from Notebooks_utils.data_splitter import train_test_holdout
from Data_manager.Movielens.Movielens10MReader import Movielens10MReader

data_reader = Movielens10MReader()
data_loaded = data_reader.load_data()

URM_all = data_loaded.get_URM_all()

URM_train, URM_test = train_test_holdout(URM_all, train_perc = 0.8)

Movielens10M: Verifying data consistency...
Movielens10M: Verifying data consistency... Passed!
DataReader: current dataset is: <class 'Data_manager.Dataset.Dataset'>
	Number of items: 10681
	Number of users: 69878
	Number of interactions in URM_all: 10000054
	Value range in URM_all: 0.50-5.00
	Interaction density: 1.34E-02
	Interactions per user:
		 Min: 2.00E+01
		 Avg: 1.43E+02
		 Max: 7.36E+03
	Interactions per item:
		 Min: 0.00E+00
		 Avg: 9.36E+02
		 Max: 3.49E+04
	Gini Index: 0.57

	ICM name: ICM_genres, Value range: 1.00 / 1.00, Num features: 20, feature occurrences: 21564, density 1.01E-01
	ICM name: ICM_tags, Value range: 1.00 / 69.00, Num features: 10217, feature occurrences: 108563, density 9.95E-04
	ICM name: ICM_all, Value range: 1.00 / 69.00, Num features: 10237, feature occurrences: 130127, density 1.19E-03




In [11]:
URM_train

<69878x10681 sparse matrix of type '<class 'numpy.float64'>'
	with 8000109 stored elements in Compressed Sparse Row format>

### What do we need for SLIM MSE?

* An item-item similarity matrix
* A way to compute predictions
* An update rule
* A training loop and some patience
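For reference, the quantities used in the next steps can be written as follows. The notation is ours: $I_u$ is the set of items in the profile of user $u$, and $\eta$ is the learning rate. The prediction is the dot product computed by `URM_train[user_id].dot(item_item_S[:,item_id])`:

$$
\hat{r}_{ui} = \sum_{j \in I_u} r_{uj}\, S_{ji}, \qquad e_{ui} = r_{ui} - \hat{r}_{ui}, \qquad L_{ui} = e_{ui}^2
$$

$$
\frac{\partial L_{ui}}{\partial S_{ji}} = -2\, e_{ui}\, r_{uj} \quad \Rightarrow \quad S_{ji} \leftarrow S_{ji} + \eta\, e_{ui}\, r_{uj}, \qquad j \in I_u
$$

The factor 2 is folded into $\eta$. Only the entries $S_{ji}$ with $j \in I_u$ appear in $\hat{r}_{ui}$, so only those receive a non-zero gradient.

The update applied in the cells below multiplies by the sampled rating $r_{ui}$ instead of $r_{uj}$. Since all ratings are positive, it moves each entry in the same direction as the gradient step, but with a different magnitude.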


In [12]:
n_users, n_items = URM_train.shape

## Step 1: We create a dense similarity matrix, initialized as zero

In [13]:
item_item_S = np.zeros((n_items, n_items), dtype = np.float64)
item_item_S

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

## Step 2: We sample an interaction and compute the prediction of the current SLIM model

In [14]:
URM_train_coo = URM_train.tocoo()

sample_index = np.random.randint(URM_train_coo.nnz)
sample_index

2884300

In [15]:
user_id = URM_train_coo.row[sample_index]
item_id = URM_train_coo.col[sample_index]
true_rating = URM_train_coo.data[sample_index]

(user_id, item_id, true_rating)

(25240, 590, 3.0)

In [16]:
predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]
predicted_rating

0.0

#### As expected, the first predicted rating is zero: the model is still "empty"

## Step 3: We compute the prediction error and update the item-item similarity

In [17]:
prediction_error = true_rating - predicted_rating
prediction_error

3.0

### The error is positive, so the model must predict a higher value. In other words, we have to increase the values in the item-item similarity matrix

### Which similarities do we modify? Only those we used to compute the prediction, i.e., those of the items in the profile of the sampled user.

In [18]:
items_in_user_profile = URM_train[user_id].indices
items_in_user_profile

array([   2,    7,   16,   17,   37,   82,   84,   91,   92,  176,  187,
        242,  252,  258,  311,  324,  368,  382,  396,  411,  438,  443,
        473,  489,  514,  578,  590,  596,  630,  635,  648,  650,  694,
        993, 1006, 1032, 1115, 1146, 1179, 1197, 1291, 1302, 1324, 1350,
       1665, 1667, 1668, 1682, 1698, 1704, 1738, 1802, 1818, 2276, 2726,
       3069, 3071, 3102, 3915, 5505])

#### Apply the update rule

In [19]:
item_item_S[items_in_user_profile,item_id]

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [20]:
learning_rate = 1e-4

item_item_S[items_in_user_profile,item_id] += learning_rate * prediction_error * true_rating

In [21]:
item_item_S[items_in_user_profile,item_id]

array([0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009,
       0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009,
       0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009,
       0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009,
       0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009,
       0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009,
       0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009,
       0.0009, 0.0009, 0.0009, 0.0009])

### Let's check what the new prediction for the same user-item interaction would be

In [22]:
predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]
predicted_rating

0.18629999999999994

### The prediction is no longer zero: it moved toward the true rating, so we are going in the right direction
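To see the mechanics in isolation, here is a self-contained sketch on a made-up 3x4 dense rating matrix; the names and values are ours. It uses the exact-gradient form of the update, which scales each entry by the profile rating r_uj rather than by the sampled rating as the notebook does. The step is repeated on the same interaction. Each update raises the prediction by η·e·Σ r_uj², so the error shrinks by the constant factor 1 − η Σ r_uj² = 1 − 0.01·35 = 0.65 at every step, as long as η Σ r_uj² < 2.

```python
import numpy as np

# Made-up dense toy URM: 3 users x 4 items, 0 means "not rated"
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 5.0, 4.0]])
S = np.zeros((R.shape[1], R.shape[1]))
learning_rate = 1e-2

def toy_sgd_step(R, S, user_id, item_id, learning_rate):
    """One SGD step on the squared error of a single (user, item) rating.
    Returns the prediction error measured before the update."""
    profile = np.flatnonzero(R[user_id])            # items rated by the user
    profile_ratings = R[user_id, profile]
    predicted_rating = profile_ratings @ S[profile, item_id]
    prediction_error = R[user_id, item_id] - predicted_rating
    # Exact-gradient form: each S[j, item_id] moves proportionally to r_uj
    S[profile, item_id] += learning_rate * prediction_error * profile_ratings
    return prediction_error

# Repeat the step on the same interaction (user 0, item 1, true rating 3)
errors = [toy_sgd_step(R, S, 0, 1, learning_rate) for _ in range(5)]
print(["{:.4f}".format(e) for e in errors])
```

The errors are 3.0, 1.95, 1.2675, and so on. With a learning rate large enough that η Σ r_uj² > 2, the same loop would diverge instead.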

### And now? Sample another interaction and repeat... a lot of times

### Let's put it all together in a training loop.

In [23]:
URM_train_coo = URM_train.tocoo()
item_item_S = np.zeros((n_items, n_items), dtype = np.float64)

learning_rate = 1e-6

loss = 0.0
start_time = time.time()
for sample_num in range(100000):
    
    # Randomly pick sample
    sample_index = np.random.randint(URM_train_coo.nnz)

    user_id = URM_train_coo.row[sample_index]
    item_id = URM_train_coo.col[sample_index]
    true_rating = URM_train_coo.data[sample_index]

    # Compute prediction
    predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]
        
    # Compute prediction error, or gradient
    prediction_error = true_rating - predicted_rating
    loss += prediction_error**2
    
    # Update model, in this case the similarity
    items_in_user_profile = URM_train[user_id].indices
    item_item_S[items_in_user_profile,item_id] += learning_rate * prediction_error * true_rating
    
    # Print some stats
    if (sample_num +1)% 5000 == 0:
        elapsed_time = time.time() - start_time
        samples_per_second = (sample_num +1)/elapsed_time
        print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/(sample_num +1), samples_per_second))


Iteration 5000 in 2.05 seconds, loss is 13.42. Samples per second 2439.08
Iteration 10000 in 3.87 seconds, loss is 13.45. Samples per second 2584.19
Iteration 15000 in 5.75 seconds, loss is 13.43. Samples per second 2608.72
Iteration 20000 in 7.57 seconds, loss is 13.39. Samples per second 2641.99
Iteration 25000 in 9.40 seconds, loss is 13.36. Samples per second 2658.19
Iteration 30000 in 11.33 seconds, loss is 13.34. Samples per second 2647.90
Iteration 35000 in 13.51 seconds, loss is 13.28. Samples per second 2590.73
Iteration 40000 in 15.37 seconds, loss is 13.23. Samples per second 2602.49
Iteration 45000 in 17.23 seconds, loss is 13.21. Samples per second 2611.74
Iteration 50000 in 19.35 seconds, loss is 13.18. Samples per second 2584.06
Iteration 55000 in 21.78 seconds, loss is 13.16. Samples per second 2525.04
Iteration 60000 in 24.10 seconds, loss is 13.12. Samples per second 2489.17
Iteration 65000 in 26.18 seconds, loss is 13.08. Samples per second 2482.97

### What do we see? The loss fluctuates at first, sometimes going up, but overall it decreases slowly.
### How long do we train such a model?

* An epoch is one complete pass over the training data
* Usually you train for multiple epochs: depending on the algorithm and the data, tens or hundreds of them.
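The loops in this notebook pick each sample with `np.random.randint`, i.e. with replacement. An "epoch" of `URM_train.nnz` samples is therefore not a full pass over the data: each interaction is missed with probability (1 − 1/n)^n ≈ 1/e ≈ 0.37. Below is a quick NumPy check (the seed and size are arbitrary), followed by the common alternative of shuffling the interaction indices once per epoch.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed
n_interactions = 100_000               # arbitrary size for the check

# One "epoch" sampled with replacement, as with np.random.randint in the loops above
with_replacement = rng.integers(0, n_interactions, size=n_interactions)
coverage = np.unique(with_replacement).size / n_interactions
print("With replacement, fraction of interactions visited: {:.3f}".format(coverage))   # about 1 - 1/e

# One epoch as a random permutation: every interaction is visited exactly once
permutation = rng.permutation(n_interactions)
print("With a permutation, fraction visited: {:.3f}".format(np.unique(permutation).size / n_interactions))
```

Inside a training loop, one would then iterate `for sample_index in permutation:` instead of drawing a new random index at each step.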

In [24]:
def train_one_epoch(URM_train, item_item_S, learning_rate):
    
    URM_train_coo = URM_train.tocoo()

    loss = 0.0
    start_time = time.time()
    for sample_num in range(URM_train.nnz):

        # Randomly pick sample
        sample_index = np.random.randint(URM_train_coo.nnz)

        user_id = URM_train_coo.row[sample_index]
        item_id = URM_train_coo.col[sample_index]
        true_rating = URM_train_coo.data[sample_index]

        # Compute prediction
        predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]

        # Compute prediction error, or gradient
        prediction_error = true_rating - predicted_rating
        loss += prediction_error**2

        # Update model, in this case the similarity
        items_in_user_profile = URM_train[user_id].indices
        item_item_S[items_in_user_profile,item_id] += learning_rate * prediction_error * true_rating

        # Print some stats
        if (sample_num +1)% 5000 == 0:
            elapsed_time = time.time() - start_time
            samples_per_second = (sample_num +1)/elapsed_time
            print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/(sample_num +1), samples_per_second))
         
            # Stop training because this implementation is too slow
            print("\tStopping the epoch early because this implementation is too slow")
            return item_item_S
            
    return item_item_S

In [25]:
n_items = URM_train.shape[1]
learning_rate = 1e-6
    
item_item_S = np.zeros((n_items, n_items), dtype = np.float64)

for n_epoch in range(5):
    item_item_S = train_one_epoch(URM_train, item_item_S, learning_rate)
    

Iteration 5000 in 2.20 seconds, loss is 13.40. Samples per second 2276.75
	Stopping the epoch early because this implementation is too slow
Iteration 5000 in 1.93 seconds, loss is 13.39. Samples per second 2590.73
	Stopping the epoch early because this implementation is too slow
Iteration 5000 in 2.72 seconds, loss is 13.47. Samples per second 1839.12
	Stopping the epoch early because this implementation is too slow
Iteration 5000 in 2.00 seconds, loss is 13.32. Samples per second 2500.57
	Stopping the epoch early because this implementation is too slow
Iteration 5000 in 2.09 seconds, loss is 13.28. Samples per second 2396.20
	Stopping the epoch early because this implementation is too slow


In [26]:
item_item_S

array([[4.29901665e-05, 1.28224374e-05, 1.28840515e-05, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [3.39901665e-05, 3.58974169e-04, 3.98450881e-05, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [2.49968500e-05, 1.21828256e-04, 2.95352769e-04, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       ...,
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])

### How do we use this similarity? As in a simple item-based KNN
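In item-based KNN, the score of item i for user u is the user's profile multiplied by column i of the similarity matrix. This is the same dot product used for the prediction during training. The recommendation list is then made of the top-scoring items the user has not seen yet.

Below is a minimal sketch, assuming a dense 1-D profile; the function name and toy values are ours. With the notebook's data, the profile could be `URM_train[user_id].toarray().ravel()`, and `item_item_S` the trained matrix.

```python
import numpy as np

def recommend(user_profile, item_item_S, at=5, exclude_seen=True):
    """Item-based scoring: score_i = sum_j r_uj * S_ji, then keep the top `at` items.
    user_profile is a dense 1-D array of the user's ratings (0 = not rated)."""
    scores = user_profile @ item_item_S
    if exclude_seen:
        scores[user_profile > 0] = -np.inf   # do not recommend items the user already rated
    # argsort is ascending, so sort the negated scores to get the best items first
    return np.argsort(-scores)[:at]

# Made-up 4x4 similarity and one user who rated items 0 and 3
S = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.2, 0.1],
              [0.1, 0.2, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
profile = np.array([5.0, 0.0, 0.0, 1.0])
print(recommend(profile, S, at=2))   # scores are [seen, 4.6, 1.3, seen] -> [1 2]
```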

### Let's estimate the training time. Say we train for 10 epochs and we have 8M interactions in the train data...

In [27]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 31511.69 seconds, or 525.19 minutes


### ... ehm, almost 9 hours
### Unacceptable!

### Let's rewrite the loop with a smarter use of the data structures. In particular:
* Use the indptr/indices arrays of the CSR matrix to get the seen items
* Not much else can be done with these tools
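A CSR matrix such as URM_train stores three arrays:

* `data`: the non-zero values
* `indices`: their column ids
* `indptr`: row u occupies positions `indptr[u]` to `indptr[u+1] - 1` of the other two arrays

Slicing these arrays, as the next cell does, returns views of existing arrays. By contrast, `URM_train[user_id]` builds a new sparse matrix object at every call. Below is a toy sketch using only NumPy, with hand-built arrays and made-up values.

```python
import numpy as np

# Hand-built CSR arrays for a toy 3-user x 5-item matrix:
# user 0 rated items 1 and 3, user 1 rated nothing, user 2 rated items 0, 2 and 4
indptr = np.array([0, 2, 2, 5])
indices = np.array([1, 3, 0, 2, 4])
data = np.array([4.0, 2.0, 5.0, 3.0, 1.0])

def get_user_profile(user_id):
    """Return (item ids, ratings) of user_id by slicing the CSR arrays."""
    # Row user_id occupies positions indptr[user_id] .. indptr[user_id + 1] - 1
    start, end = indptr[user_id], indptr[user_id + 1]
    return indices[start:end], data[start:end]

for user_id in range(len(indptr) - 1):
    items, ratings = get_user_profile(user_id)
    print("user {}: items {}, ratings {}".format(user_id, items, ratings))
```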

In [28]:
URM_train_coo = URM_train.tocoo()
item_item_S = np.zeros((n_items, n_items), dtype = np.float64)

learning_rate = 1e-6

loss = 0.0
start_time = time.time()
for sample_num in range(100000):
    
    # Randomly pick sample
    sample_index = np.random.randint(URM_train_coo.nnz)

    user_id = URM_train_coo.row[sample_index]
    item_id = URM_train_coo.col[sample_index]
    true_rating = URM_train_coo.data[sample_index]

    # Compute prediction
    predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]
        
    # Compute prediction error, or gradient
    prediction_error = true_rating - predicted_rating
    loss += prediction_error**2
    
    # Update model, in this case the similarity
    items_in_user_profile = URM_train.indices[URM_train.indptr[user_id]:URM_train.indptr[user_id+1]]
    item_item_S[items_in_user_profile,item_id] += learning_rate * prediction_error * true_rating
    
    # Print some stats
    if (sample_num +1)% 5000 == 0:
        elapsed_time = time.time() - start_time
        samples_per_second = (sample_num +1)/elapsed_time
        print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/(sample_num +1), samples_per_second))


Iteration 5000 in 1.84 seconds, loss is 13.45. Samples per second 2724.26
Iteration 10000 in 3.36 seconds, loss is 13.39. Samples per second 2976.14
Iteration 15000 in 4.88 seconds, loss is 13.42. Samples per second 3073.76
Iteration 20000 in 6.44 seconds, loss is 13.38. Samples per second 3105.46
Iteration 25000 in 8.04 seconds, loss is 13.29. Samples per second 3109.43
Iteration 30000 in 9.56 seconds, loss is 13.26. Samples per second 3138.02
Iteration 35000 in 11.13 seconds, loss is 13.22. Samples per second 3144.66
Iteration 40000 in 12.68 seconds, loss is 13.21. Samples per second 3154.48
Iteration 45000 in 14.20 seconds, loss is 13.19. Samples per second 3168.95
Iteration 50000 in 15.73 seconds, loss is 13.17. Samples per second 3178.59
Iteration 55000 in 17.17 seconds, loss is 13.14. Samples per second 3203.22
Iteration 60000 in 18.60 seconds, loss is 13.11. Samples per second 3225.78
Iteration 65000 in 20.05 seconds, loss is 13.09. Samples per second 3242.65

In [29]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 24176.97 seconds, or 402.95 minutes


### We are now at about 7 hours (403 minutes), barely better than before

### Let's see what we can do with Cython
### First step: just compile it. The data is not available at compile time, so we wrap the loop in a function that receives it as an argument

In [30]:
%%cython
import numpy as np
import time

def do_some_training(URM_train):
    
    URM_train_coo = URM_train.tocoo()
    n_items = URM_train.shape[1]
    
    item_item_S = np.zeros((n_items, n_items), dtype = np.float64)

    learning_rate = 1e-6

    loss = 0.0
    start_time = time.time()
    for sample_num in range(100000):

        # Randomly pick sample
        sample_index = np.random.randint(URM_train_coo.nnz)

        user_id = URM_train_coo.row[sample_index]
        item_id = URM_train_coo.col[sample_index]
        true_rating = URM_train_coo.data[sample_index]

        # Compute prediction
        predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]

        # Compute prediction error, or gradient
        prediction_error = true_rating - predicted_rating
        loss += prediction_error**2

        # Update model, in this case the similarity
        items_in_user_profile = URM_train.indices[URM_train.indptr[user_id]:URM_train.indptr[user_id+1]]
        item_item_S[items_in_user_profile,item_id] += learning_rate * prediction_error * true_rating

        # Print some stats
        if (sample_num +1)% 5000 == 0:
            elapsed_time = time.time() - start_time
            samples_per_second = (sample_num +1)/elapsed_time
            print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/(sample_num +1), samples_per_second))
    
    return loss, samples_per_second

In [31]:
loss, samples_per_second = do_some_training(URM_train)

Iteration 5000 in 1.44 seconds, loss is 13.34. Samples per second 3472.83
Iteration 10000 in 2.84 seconds, loss is 13.33. Samples per second 3521.38
Iteration 15000 in 4.25 seconds, loss is 13.30. Samples per second 3529.41
Iteration 20000 in 5.67 seconds, loss is 13.29. Samples per second 3527.35
Iteration 25000 in 7.10 seconds, loss is 13.27. Samples per second 3521.02
Iteration 30000 in 8.56 seconds, loss is 13.23. Samples per second 3504.63
Iteration 35000 in 10.04 seconds, loss is 13.21. Samples per second 3487.77
Iteration 40000 in 11.50 seconds, loss is 13.19. Samples per second 3478.21
Iteration 45000 in 12.99 seconds, loss is 13.18. Samples per second 3464.14
Iteration 50000 in 14.50 seconds, loss is 13.17. Samples per second 3448.25
Iteration 55000 in 16.08 seconds, loss is 13.15. Samples per second 3420.37
Iteration 60000 in 17.70 seconds, loss is 13.12. Samples per second 3389.80
Iteration 65000 in 19.24 seconds, loss is 13.08. Samples per second 3378.35

In [32]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 24599.79 seconds, or 410.00 minutes


### Still far too time-consuming
### The compiler just translates into C all the operations the Python interpreter would perform, dynamic typing included. Have a look at the HTML reports in the Cython_examples folder

### Now let's add some types: if a variable is used only as a C object, declare it with a primitive type

* cdef int namevar
* cdef double namevar
* cdef float namevar
* cdef double[:] singledimensionarray
* cdef double[:,:] bidimensionalmatrix


### We now use types for all main variables


In [33]:
%%cython
import numpy as np
import time

def do_some_training(URM_train):

    URM_train_coo = URM_train.tocoo()
    n_items = URM_train.shape[1]

    cdef double[:,:] item_item_S = np.zeros((n_items, n_items), dtype = np.float64)
    cdef double learning_rate = 1e-6
    cdef double loss = 0.0
    # double, not long: a long would truncate the timestamp to whole seconds
    cdef double start_time = time.time()
    cdef double true_rating, predicted_rating, prediction_error
    cdef int[:] items_in_user_profile
    cdef int index, sample_num, user_id, item_id

    for sample_num in range(100000):

        # Randomly pick sample
        sample_index = np.random.randint(URM_train_coo.nnz)

        user_id = URM_train_coo.row[sample_index]
        item_id = URM_train_coo.col[sample_index]
        true_rating = URM_train_coo.data[sample_index]

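        # Compute prediction. Unlike URM_train[user_id].dot(...) in the Python version,
        # this sum does not weight S by the profile ratings (every rating is treated as 1)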
        items_in_user_profile = URM_train.indices[URM_train.indptr[user_id]:URM_train.indptr[user_id+1]]
        predicted_rating = 0.0

        for index in items_in_user_profile:
            predicted_rating += item_item_S[index,item_id]

        # Compute prediction error, or gradient
        prediction_error = true_rating - predicted_rating
        loss += prediction_error**2

        # Update model, in this case the similarity
        for index in items_in_user_profile:
            item_item_S[index,item_id] += learning_rate * prediction_error * true_rating

        # Print some stats
        if (sample_num +1)% 5000 == 0:
            elapsed_time = time.time() - start_time
            samples_per_second = (sample_num +1)/elapsed_time
            print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/(sample_num +1), samples_per_second))

    return loss, samples_per_second

In [34]:
loss, samples_per_second = do_some_training(URM_train)

Iteration 5000 in 1.55 seconds, loss is 13.51. Samples per second 3218.10
Iteration 10000 in 2.48 seconds, loss is 13.44. Samples per second 4025.87
Iteration 15000 in 3.40 seconds, loss is 13.42. Samples per second 4406.75
Iteration 20000 in 4.32 seconds, loss is 13.39. Samples per second 4625.47
Iteration 25000 in 5.30 seconds, loss is 13.38. Samples per second 4713.42
Iteration 30000 in 6.20 seconds, loss is 13.34. Samples per second 4835.60
Iteration 35000 in 7.14 seconds, loss is 13.34. Samples per second 4899.21
Iteration 40000 in 8.05 seconds, loss is 13.33. Samples per second 4966.49
Iteration 45000 in 8.94 seconds, loss is 13.32. Samples per second 5035.24
Iteration 50000 in 9.85 seconds, loss is 13.32. Samples per second 5076.87
Iteration 55000 in 10.82 seconds, loss is 13.32. Samples per second 5081.27
Iteration 60000 in 11.76 seconds, loss is 13.30. Samples per second 5100.33
Iteration 65000 in 12.74 seconds, loss is 13.29. Samples per second 5100.43

In [35]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 15307.25 seconds, or 255.12 minutes


### Adding types already brings the estimate from about 410 down to about 255 minutes, and there is more to gain...

### Some operations are still performed on the sparse matrix objects, and they cannot be optimized because the compiler does not know the type of the data.

### To address this, we create typed arrays in which we put the URM_train data
####  For example, this operation: user_id = URM_train_coo.row[sample_index]
#### Becomes:
#### cdef int user_id
#### cdef int[:] URM_train_coo_row = URM_train_coo.row
#### user_id = URM_train_coo_row[sample_index]
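A declaration such as `cdef int[:] URM_train_coo_row = URM_train_coo.row` only works if the array's item type matches the C type. `int[:]` expects 32-bit integers on common platforms, and `double[:]` expects float64. Otherwise Cython raises a ValueError (buffer dtype mismatch) when the function runs. SciPy normally stores sparse indices as int32, but it can switch to int64 for very large matrices.

Below is a hypothetical helper (ours, not part of the notebook) that converts an index array before it is passed to the Cython function. It refuses values that would overflow.

```python
import numpy as np

def as_int32_indices(array):
    """Hypothetical helper: return a C-contiguous int32 version of a non-negative
    index array, copying only when needed and refusing values above the int32 range."""
    array = np.asarray(array)
    if array.size > 0 and array.max() > np.iinfo(np.int32).max:
        raise OverflowError("indices do not fit in int32")
    return np.ascontiguousarray(array, dtype=np.int32)

row_int64 = np.array([0, 0, 2, 5], dtype=np.int64)   # e.g. a row array stored as int64
row_int32 = as_int32_indices(row_int64)
print(row_int64.dtype, "->", row_int32.dtype, row_int32)
```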

### We can also skip the creation of the items_in_user_profile array and replace the np.random call with the faster native C function rand()
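`rand()` is fast, but two details are worth knowing:

* The C standard only guarantees RAND_MAX >= 32767. With Microsoft's C runtime (MSVC, on Windows) it is exactly 32767, while glibc on Linux uses 2147483647. With a small RAND_MAX, `rand() % n_interactions` can only return indices below 32768, a tiny prefix of the ~8M training interactions.
* `srand` is imported but never called, so `rand()` produces the same sequence in every run, as if seeded with `srand(1)`.

The plain-Python simulation below shows the coverage problem and one common workaround: combining two draws before taking the modulo. The constants and helper names are ours; in a .pyx file the same expression would be written with `rand()` itself.

```python
import random

RAND_MAX = 32767            # minimum allowed by the C standard, and the value used by MSVC
n_interactions = 8_000_000  # roughly the number of interactions in URM_train

def c_rand():
    # Stand-in for C's rand() on a platform where RAND_MAX == 32767
    return random.randint(0, RAND_MAX)

def naive_index():
    # Equivalent of rand() % n_interactions
    return c_rand() % n_interactions

def combined_index():
    # Glue two 15-bit draws into one 30-bit value (0 .. 2**30 - 1), then take the modulo
    return ((c_rand() << 15) | c_rand()) % n_interactions

random.seed(0)
naive = [naive_index() for _ in range(10_000)]
combined = [combined_index() for _ in range(10_000)]
print("largest index, naive:", max(naive), "combined:", max(combined))
```

The modulo still introduces a small bias, because 2**30 is not a multiple of n_interactions. For sampling SGD updates this is usually negligible.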

## To obtain a more reliable speed estimate we increase the number of samples and the print step by a factor of 10

In [36]:
%%cython

import numpy as np
import time

from libc.stdlib cimport rand, srand, RAND_MAX

def do_some_training(URM_train):

    URM_train_coo = URM_train.tocoo()
    cdef int n_items = URM_train.shape[1]
    cdef int n_interactions = URM_train.nnz
    cdef int[:] URM_train_row = URM_train_coo.row
    cdef int[:] URM_train_col = URM_train_coo.col
    cdef double[:] URM_train_data = URM_train_coo.data
    cdef int[:] URM_train_indices = URM_train.indices
    cdef int[:] URM_train_indptr = URM_train.indptr

    cdef double[:,:] item_item_S = np.zeros((n_items, n_items), dtype = np.float64)
    cdef double learning_rate = 1e-6
    cdef double loss = 0.0
    # double, not long: a long would truncate the timestamp to whole seconds
    cdef double start_time = time.time()
    cdef double true_rating, predicted_rating, prediction_error
    cdef int start_profile, end_profile
    cdef int index, sample_num, user_id, item_id, seen_item_id

    for sample_num in range(1000000):

        # Randomly pick sample
        index = rand() % n_interactions

        user_id = URM_train_row[index]
        item_id = URM_train_col[index]
        true_rating = URM_train_data[index]

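        # Compute prediction. Unlike URM_train[user_id].dot(...) in the Python version,
        # this sum does not weight S by the profile ratings (every rating is treated as 1)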
        start_profile = URM_train_indptr[user_id]
        end_profile = URM_train_indptr[user_id+1]
        predicted_rating = 0.0

        for index in range(start_profile, end_profile):
            seen_item_id = URM_train_indices[index]
            predicted_rating += item_item_S[seen_item_id,item_id]

        # Compute prediction error, or gradient
        prediction_error = true_rating - predicted_rating
        loss += prediction_error**2

        # Update model, in this case the similarity
        for index in range(start_profile, end_profile):
            seen_item_id = URM_train_indices[index]
            item_item_S[seen_item_id,item_id] += learning_rate * prediction_error * true_rating

        # Print some stats
        if (sample_num +1)% 50000 == 0:
            elapsed_time = time.time() - start_time
            samples_per_second = (sample_num +1)/elapsed_time
            print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/(sample_num +1), samples_per_second))

    return loss, samples_per_second

In [37]:
loss, samples_per_second = do_some_training(URM_train)

Iteration 50000 in 0.75 seconds, loss is 13.82. Samples per second 66983.90
Iteration 100000 in 1.18 seconds, loss is 13.72. Samples per second 84480.08
Iteration 150000 in 1.64 seconds, loss is 13.61. Samples per second 91516.04
Iteration 200000 in 2.07 seconds, loss is 13.52. Samples per second 96431.28
Iteration 250000 in 2.51 seconds, loss is 13.41. Samples per second 99442.63
Iteration 300000 in 2.97 seconds, loss is 13.31. Samples per second 100875.72
Iteration 350000 in 3.41 seconds, loss is 13.21. Samples per second 102527.93
Iteration 400000 in 3.85 seconds, loss is 13.11. Samples per second 103789.18
Iteration 450000 in 4.29 seconds, loss is 13.01. Samples per second 104798.16
Iteration 500000 in 4.73 seconds, loss is 12.92. Samples per second 105619.96
Iteration 550000 in 5.18 seconds, loss is 12.82. Samples per second 106102.96
Iteration 600000 in 5.62 seconds, loss is 12.73. Samples per second 106684.94

In [38]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 733.11 seconds, or 12.22 minutes


### As the source code gets less readable due to the addition of types and native C functions, it also gets remarkably faster

### We started from a naive Python implementation that needed almost 9 hours (about 2.5k samples per second), and we now have a Cython one that takes about 12 minutes (more than 100k samples per second).

In [39]:
%%cython

import numpy as np
import time

from libc.stdlib cimport rand, srand, RAND_MAX

def train_multiple_epochs(URM_train, learning_rate_input, n_epochs):

    URM_train_coo = URM_train.tocoo()
    cdef int n_items = URM_train.shape[1]
    cdef int n_interactions = URM_train.nnz
    cdef int[:] URM_train_row = URM_train_coo.row
    cdef int[:] URM_train_col = URM_train_coo.col
    cdef double[:] URM_train_data = URM_train_coo.data
    cdef int[:] URM_train_indices = URM_train.indices
    cdef int[:] URM_train_indptr = URM_train.indptr

    cdef double[:,:] item_item_S = np.zeros((n_items, n_items), dtype = np.float64)
    cdef double learning_rate = learning_rate_input
    cdef double loss = 0.0
    # double, not long: a long would truncate the timestamp to whole seconds
    cdef double start_time
    cdef double true_rating, predicted_rating, prediction_error
    cdef int start_profile, end_profile
    cdef int index, sample_num, user_id, item_id, seen_item_id
    
    for n_epoch in range(n_epochs):
        
        loss = 0.0
        start_time = time.time()
        
        for sample_num in range(n_interactions):

            # Randomly pick sample
            index = rand() % n_interactions

            user_id = URM_train_row[index]
            item_id = URM_train_col[index]
            true_rating = URM_train_data[index]

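            # Compute prediction. Unlike URM_train[user_id].dot(...) in the Python version,
            # this sum does not weight S by the profile ratings (every rating is treated as 1)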
            start_profile = URM_train_indptr[user_id]
            end_profile = URM_train_indptr[user_id+1]
            predicted_rating = 0.0

            for index in range(start_profile, end_profile):
                seen_item_id = URM_train_indices[index]
                predicted_rating += item_item_S[seen_item_id,item_id]

            # Compute prediction error, or gradient
            prediction_error = true_rating - predicted_rating
            loss += prediction_error**2

            # Update model, in this case the similarity
            for index in range(start_profile, end_profile):
                seen_item_id = URM_train_indices[index]
                item_item_S[seen_item_id,item_id] += learning_rate * prediction_error * true_rating

#             # Print some stats
#             if (sample_num +1)% 1000000 == 0:
#                 elapsed_time = time.time() - start_time
#                 samples_per_second = (sample_num+1)/elapsed_time
#                 print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/(sample_num+1), samples_per_second))

            
        elapsed_time = time.time() - start_time
        samples_per_second = (sample_num+1)/elapsed_time
     
        print("Epoch {} complete in {:.2f} seconds, loss is {:.3E}. Samples per second {:.2f}".format(n_epoch+1, elapsed_time, loss/(sample_num+1), samples_per_second))

    return np.array(item_item_S), loss/(sample_num+1), samples_per_second

In [40]:
n_items = URM_train.shape[1]
learning_rate = 1e-4
    
item_item_S, loss, samples_per_second = train_multiple_epochs(URM_train, learning_rate, 10)

Epoch 1 complete in in 75.85 seconds, loss is 3.339E-01. Samples per second 105467.71
Epoch 2 complete in in 77.28 seconds, loss is 7.267E-03. Samples per second 103523.93
Epoch 3 complete in in 78.89 seconds, loss is 2.116E-03. Samples per second 101405.04
Epoch 4 complete in in 78.21 seconds, loss is 9.991E-04. Samples per second 102285.16
Epoch 5 complete in in 75.77 seconds, loss is 5.464E-04. Samples per second 105578.95
Epoch 6 complete in in 76.28 seconds, loss is 3.353E-04. Samples per second 104873.28
Epoch 7 complete in in 82.32 seconds, loss is 2.155E-04. Samples per second 97182.82
Epoch 8 complete in in 75.70 seconds, loss is 1.471E-04. Samples per second 105680.85
Epoch 9 complete in in 76.46 seconds, loss is 1.036E-04. Samples per second 104627.58
Epoch 10 complete in in 76.01 seconds, loss is 7.159E-05. Samples per second 105249.71


In [41]:
item_item_S

array([[0.22772382, 0.23774737, 0.2735745 , ..., 0.        , 0.        ,
        0.        ],
       [0.22492054, 0.55480576, 0.14784503, ..., 0.        , 0.        ,
        0.        ],
       [0.22588263, 0.4646729 , 0.73793833, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

### Note how the loss decreases at every epoch
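To see why the loss keeps going down, here is a minimal plain-Python sketch of the same per-sample update the Cython loop performs, run on a hypothetical 3x3 toy matrix. The helper name slim_mse_sgd_epoch and the toy data are illustrative assumptions: interactions are sampled uniformly, the user is recovered from the CSR indptr array, and the gradient factor is the rating of each seen item (data[index]), which for a binary URM is the same value as the target rating. Being pure Python it is slow; it only mirrors the loop structure.

```python
import bisect
import random


def slim_mse_sgd_epoch(indptr, indices, data, S, learning_rate, n_samples, rng):
    """One SGD epoch of SLIM MSE on a user-item matrix stored as CSR arrays.

    Hypothetical helper: indptr/indices/data play the role of URM_train.indptr,
    .indices and .data; S is a dense list-of-lists item-item matrix updated in place.
    Returns the average squared error over the sampled interactions.
    """
    n_interactions = len(data)
    loss = 0.0
    for _ in range(n_samples):
        # Sample one interaction uniformly and recover its user from indptr
        sample_index = rng.randrange(n_interactions)
        user_id = bisect.bisect_right(indptr, sample_index) - 1
        item_id = indices[sample_index]
        true_rating = data[sample_index]
        start_profile, end_profile = indptr[user_id], indptr[user_id + 1]

        # Prediction: ratings of the seen items weighted by their similarity to item_id
        predicted_rating = sum(data[index] * S[indices[index]][item_id]
                               for index in range(start_profile, end_profile))
        prediction_error = true_rating - predicted_rating
        loss += prediction_error ** 2

        # Gradient step on column item_id of S, one entry per seen item
        for index in range(start_profile, end_profile):
            seen_item_id = indices[index]
            S[seen_item_id][item_id] += learning_rate * prediction_error * data[index]
    return loss / n_samples


# Toy binary URM with 3 users and 3 items, as CSR arrays
indptr = [0, 2, 3, 5]
indices = [0, 1, 1, 0, 2]
data = [1.0, 1.0, 1.0, 1.0, 1.0]
S = [[0.0] * 3 for _ in range(3)]
rng = random.Random(42)

epoch_losses = [slim_mse_sgd_epoch(indptr, indices, data, S, 0.1, 100, rng) for _ in range(10)]
for n_epoch, epoch_loss in enumerate(epoch_losses):
    print("Epoch {} loss {:.3E}".format(n_epoch + 1, epoch_loss))
```

As in the notebook run, the printed loss shrinks epoch after epoch because each update moves the prediction for the sampled interaction towards its true rating.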

## How to use Cython outside a notebook

### Step 1: Create a .pyx file and write your code
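As a hypothetical example, a statically typed version of the isPrime function from the beginning of this notebook could go in its own .pyx file. The sketch below keeps the Cython source in a Python string and writes it to disk; the file name isPrime_cython.pyx and the helper write_pyx are illustrative assumptions, and `cdef long` is the kind of static typing described in Step 5.

```python
from pathlib import Path

# Cython source for a hypothetical file such as Cython_examples/isPrime_cython.pyx
PYX_SOURCE = '''\
# cython: language_level=3
def isPrime(long n):
    # Statically typed loop counter: the loop runs as plain C
    cdef long i = 2
    while i < n:
        if n % i == 0:
            return False
        i += 1
    return True
'''


def write_pyx(path):
    """Write the Cython source to `path`, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(PYX_SOURCE)
    return path
```

Calling write_pyx("Cython_examples/isPrime_cython.pyx") creates the file, which can then be compiled with the command in Step 3.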

### Step 2: Create a compilation script "compileCython.py" with the following content

In [None]:
# Save this code as compileCython.py and run it from the command line; it will not run in a notebook cell

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension


from Cython.Distutils import build_ext
import numpy
import sys
import re


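# Expected usage: python compileCython.py <file_to_compile.pyx> build_ext --inplace
if len(sys.argv) != 4:
    raise ValueError("Wrong number of parameters received. Expected 4, got {}: {}".format(len(sys.argv), sys.argv))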


# Get the name of the file to compile
fileToCompile = sys.argv[1]

# Remove the argument from sys.argv so that it only contains what setup() expects
del sys.argv[1]

# Name the extension after the file to compile, without the .pyx suffix
extensionName = re.sub(r"\.pyx$", "", fileToCompile)


ext_modules = Extension(extensionName,
                [fileToCompile],
                extra_compile_args=['-O3'],
                include_dirs=[numpy.get_include(),],
                )

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[ext_modules]
)


### Step 3: Compile your code with the following command

python compileCython.py Cython_examples\SLIM_MSE_fastest.pyx build_ext --inplace
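If you prefer to trigger the compilation from a Python script, a minimal sketch is to build the same command as an argument list and run it with the standard subprocess module. The helpers build_compile_command and compile_pyx are hypothetical; they assume compileCython.py sits in the current working directory. Using sys.executable runs the script with the same interpreter (and therefore the same Cython and NumPy) you will later import the module from.

```python
import subprocess
import sys
from pathlib import Path


def build_compile_command(pyx_path, script="compileCython.py"):
    """Return the Step 3 command as an argument list (hypothetical helper)."""
    # Writing the path with forward slashes lets pathlib use the native separator on Windows
    return [sys.executable, script, str(Path(pyx_path)), "build_ext", "--inplace"]


def compile_pyx(pyx_path, script="compileCython.py"):
    """Run the compilation; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(build_compile_command(pyx_path, script), check=True)
```

The list has exactly the four arguments (script name plus three) that compileCython.py checks for.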

### Step 4: Generate the Cython annotation report and look for "yellow lines". The report is an .html file showing, for each line of your code, how much interaction with the Python runtime the generated C code requires. A white line has a direct C translation. A yellow line requires indirect steps (calls into the Python C-API) that slow down execution; the darker the yellow, the more of them. Some of those steps may be unavoidable, others can be removed via static typing.

### IMPORTANT: white does not mean fast!! If a system call is involved, that part may be slow anyway.

cython -a Cython_examples\SLIM_MSE_fastest.pyx
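The same report can be generated from Python. A minimal sketch, assuming Cython is installed in the current interpreter: `python -m cython` runs the same compiler as the `cython` command, and the annotated .html file is written next to the .pyx with the same base name. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_annotate_command(pyx_path):
    """Equivalent of `cython -a <file>.pyx`, using the current interpreter's Cython."""
    return [sys.executable, "-m", "cython", "-a", str(Path(pyx_path))]


def annotation_report_path(pyx_path):
    """The HTML report shares the stem of the .pyx file and lives in the same folder."""
    return Path(pyx_path).with_suffix(".html")


def annotate(pyx_path):
    """Generate the report and return its path; raises CalledProcessError on failure."""
    subprocess.run(build_annotate_command(pyx_path), check=True)
    return annotation_report_path(pyx_path)
```

Inside a notebook, `%%cython -a` shows the same annotated view directly below the cell.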

### Step 5: Add static types and C functions to remove "yellow" lines.

#### If a variable is used only as a C object, declare it with a primitive type
cdef int namevar

cdef double namevar

cdef float namevar

#### If a function is called only from C code, declare it with "cdef"

cdef function_name(self, int param1, double param2):
    ...



### Step 6: Repeat steps 4 and 5 until you are satisfied with how clean your code is, then compile. An example of non-optimized code can be found in the source folder of this notebook, marked with the _SLOW suffix

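### Step 7: Compilation generates a file whose name is something like "SLIM_MSE_fastest.cp36-win_amd64.pyd". Besides the module name, it tells you the Python version (cp36 is CPython 3.6), the OS and the architecture it was compiled for. On Linux and macOS the file ends in .so instead of .pyd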
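A small sketch of how such a file name breaks down, using the hypothetical helper split_extension_filename: everything before the first dot is the module name you import, the middle part is the platform tag. importlib.machinery.EXTENSION_SUFFIXES from the standard library lists the suffixes your interpreter accepts for compiled modules, a quick way to check which tag your compilation should produce.

```python
import importlib.machinery


def split_extension_filename(filename):
    """Split e.g. 'SLIM_MSE_fastest.cp36-win_amd64.pyd' into (module, tag, extension)."""
    module_name, rest = filename.split(".", 1)
    tag, _, extension = rest.rpartition(".")
    return module_name, tag, extension


# Suffixes the running interpreter accepts for compiled modules on this machine
print(importlib.machinery.EXTENSION_SUFFIXES)
```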

### Step 8: Import the compiled module and use it as you would any Python module, function or class

In [None]:
from Cython_examples.SLIM_MSE_fastest import train_multiple_epochs

train_multiple_epochs(URM_train, 1e-3, 1)
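Before timing anything, it can be worth checking that the import actually resolves to the compiled file rather than to, say, a Python file with the same name found earlier on sys.path. A minimal sketch using the standard importlib.util.find_spec; the helper name is_compiled_extension is an assumption, not part of the course code.

```python
import importlib.machinery
import importlib.util


def is_compiled_extension(module_name):
    """Return True if `module_name` would be imported from a compiled extension file."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. Cython_examples) cannot be found
        return False
    if spec is None or spec.origin is None:
        return False
    # Compiled modules end with one of the interpreter's extension suffixes (.pyd, .so, ...)
    return spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


print(is_compiled_extension("Cython_examples.SLIM_MSE_fastest"))
```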

# A few warnings on ML algorithms

### - Why do we bother with KNNs if we have ML?
#### Because sometimes ML algorithms work better than heuristic ones, and sometimes they do not

### - ML algorithms are always best because they learn from the data
#### Not really... Yes, they learn from the data, but the data is sometimes too noisy or too sparse and does not yield good results. There are plenty of cases where ML algorithms are not the best choice.

### - We should use this complex model because it can in theory approximate any function!!
#### Theory is important but... does it work in practice? Complex models are often veeeery difficult to train and you need lots of tricks: adaptive gradients, data normalization, careful initialization, crafted batches...

### - I have trained my model for 2 epochs but the result is not great
#### Did you just use the default learning rate, or did you optimize it? Why did you stop after 2 epochs? You may need 100s of epochs.

### - If I select a high learning rate (maybe 1e-3), after 5 epochs I get a result which is not very good; if I use a lower learning rate (maybe 1e-6), the result is much worse
#### Of course: the lower the learning rate, the slower the training process, but potentially the better the solution you may find. Again, you may need 100s of epochs.
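A toy illustration of the speed side of this trade-off, assuming plain gradient descent on the one-dimensional loss (w - 3)**2 rather than on SLIM; the helper gradient_descent is hypothetical and returns the loss after a given number of epochs. On this convex problem every learning rate heads to the same minimum, so the example shows only that smaller steps need far more epochs, not that they find a better solution.

```python
def gradient_descent(learning_rate, n_epochs, w0=0.0, target=3.0):
    """Minimise the toy loss (w - target)**2 and return the final loss."""
    w = w0
    for _ in range(n_epochs):
        gradient = 2 * (w - target)
        w -= learning_rate * gradient
    return (w - target) ** 2


for lr in (1e-1, 1e-3, 1e-6):
    print("lr={:.0e}: loss after 5 epochs {:.3e}, after 500 epochs {:.3e}".format(
        lr, gradient_descent(lr, 5), gradient_descent(lr, 500)))
```

With 1e-6 the loss barely moves even after 500 epochs, which is the same symptom described above.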

### - Training and optimizing the hyperparameters of this ML model takes several hours, what am I doing wrong?
#### Probably nothing, ML is computationally expensive and takes time... A few hours is a normal timespan. Sometimes the end result is still not satisfactory.