# Recommender Systems 2020/21

### Practice 4 - Cython for machine learning


### Cython is a superset of Python, allowing you to use C-like operations and import C code. Cython files (.pyx) are compiled and support static typing.

### Why do we use it (or any other compiled language)? If the code is written properly it is fast... I mean, FAST

In [57]:
import time
import numpy as np

### Let's implement something simple

In [58]:
def isPrime(n):
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [59]:
print("Is prime 2? {}".format(isPrime(2)))
print("Is prime 3? {}".format(isPrime(3)))
print("Is prime 5? {}".format(isPrime(5)))
print("Is prime 15? {}".format(isPrime(15)))
print("Is prime 20? {}".format(isPrime(20)))

Is prime 2? True
Is prime 3? True
Is prime 5? True
Is prime 15? False
Is prime 20? False


In [60]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 8.56 sec


#### Load Cython magic command, this takes care of the compilation step. If you are writing code outside Jupyter you'll have to compile using other tools. See at the end of the notebook for details.

In [96]:
%load_ext Cython

The Cython extension is already loaded. To reload it, use:
  %reload_ext Cython


#### Declare Cython function, paste the same code as before. The function will be compiled and then executed with a Python interface

In [62]:
%%cython
def isPrime(n):
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [63]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 4.17 sec


#### As you can see by just compiling the same code we got some improvement.
#### To go seriously higher, we have to use some static tiping

In [8]:
%%cython
# Declare the tipe of the arguments
def isPrime(long n):
    
    # Declare index of for loop
    cdef long i
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [9]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 0.25 sec


#### Cython code with two tipe declaration, for n and i, runs 50x faster than Python

#### Main benefits of Cython:
* Compiled, no interpreter
* Static typing, no overhead
* Fast loops, no need to vectorize. Vectorization sometimes performes lots of useless operations
* Numpy, which is fast in python, when opertions are not vectorizable often becomes slooooow compared to a carefully written Cython code

## SLIM MSE with Cython

#### Load the usual data.

In [126]:
from Notebooks_utils.data_splitter import train_test_holdout
from Data_manager.Movielens.Movielens10MReader import Movielens10MReader

data_reader = Movielens10MReader()
data_loaded = data_reader.load_data()

URM_all = data_loaded.get_URM_all()

URM_train, URM_test = train_test_holdout(URM_all, train_perc = 0.8)

Movielens10M: Verifying data consistency...
Movielens10M: Verifying data consistency... Passed!
DataReader: current dataset is: <class 'Data_manager.Dataset.Dataset'>
	Number of items: 10681
	Number of users: 69878
	Number of interactions in URM_all: 10000054
	Value range in URM_all: 0.50-5.00
	Interaction density: 1.34E-02
	Interactions per user:
		 Min: 2.00E+01
		 Avg: 1.43E+02
		 Max: 7.36E+03
	Interactions per item:
		 Min: 0.00E+00
		 Avg: 9.36E+02
		 Max: 3.49E+04
	Gini Index: 0.57

	ICM name: ICM_genres, Value range: 1.00 / 1.00, Num features: 20, feature occurrences: 21564, density 1.01E-01
	ICM name: ICM_tags, Value range: 1.00 / 69.00, Num features: 10217, feature occurrences: 108563, density 9.95E-04
	ICM name: ICM_all, Value range: 1.00 / 69.00, Num features: 10237, feature occurrences: 130127, density 1.19E-03




In [127]:
URM_train

<69878x10681 sparse matrix of type '<class 'numpy.float64'>'
	with 7999180 stored elements in Compressed Sparse Row format>

### What do we need for a SLIM MSE?

* Item-Item similarity matrix
* Computing prediction
* Update rule
* Training loop and some patience


In [128]:
n_users, n_items = URM_train.shape

## Step 1: We create a dense similarity matrix, initialized as zero

In [129]:
item_item_S = np.zeros((n_items, n_items), dtype = np.float)
item_item_S

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

## Step 2: We sample an interaction and compute the prediction of the current SLIM model

In [130]:
URM_train_coo = URM_train.tocoo()

sample_index = np.random.randint(URM_train_coo.nnz)
sample_index

4038520

In [131]:
user_id = URM_train_coo.row[sample_index]
item_id = URM_train_coo.col[sample_index]
rating = URM_train_coo.data[sample_index]

(user_id, item_id, rating)

(35242, 323, 3.0)

In [132]:
predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]
predicted_rating

0.0

#### The first predicted rating is zero, of course, the model is "empty"

### Step 3: We compute the prediction error and update the item-item similarity

In [133]:
prediction_error = rating - predicted_rating
prediction_error

3.0

### The error is positive, so we need to increase the prediction our model computes. Meaning, we have to increase the values in the item-item similarity matrix

### Which item similarities we modify? Only those we used to compute the prediction, i.e., only the items in the profile of the sampled user. 

In [134]:
items_in_user_profile = URM_train[user_id].indices
items_in_user_profile

array([   2,    4,    7,    9,   11,   12,   13,   14,   15,   16,   17,
         18,   19,   22,   25,   26,   73,   74,   75,   76,   77,   82,
         83,   84,   85,   86,   87,   88,   91,   92,   93,  113,  139,
        213,  304,  314,  318,  323,  329,  331,  345,  358, 1008, 1179,
       1293, 1298, 1410, 1412, 1426, 1535, 1559, 1574, 1576])

#### Apply the update rule

In [135]:
item_item_S[items_in_user_profile,item_id]

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0.])

In [136]:
learning_rate = 1e-4

item_item_S[items_in_user_profile,item_id] += prediction_error * learning_rate

In [137]:
item_item_S[items_in_user_profile,item_id]

array([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003,
       0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003,
       0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003,
       0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003,
       0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003,
       0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003,
       0.0003, 0.0003, 0.0003, 0.0003, 0.0003])

### And now? Sample another interaction and repeat... a lot of times

### Let's put all together in a training loop.

In [139]:
URM_train_coo = URM_train.tocoo()
item_item_S = np.zeros((n_items, n_items), dtype = np.float)

learning_rate = 1e-6

loss = 0.0
start_time = time.time()
for sample_num in range(100000):
    
    # Randomly pick sample
    sample_index = np.random.randint(URM_train_coo.nnz)

    user_id = URM_train_coo.row[sample_index]
    item_id = URM_train_coo.col[sample_index]
    rating = URM_train_coo.data[sample_index]

    # Compute prediction
    predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]
        
    # Compute prediction error, or gradient
    prediction_error = rating - predicted_rating
    loss += prediction_error**2
    
    # Update model, in this case the similarity
    items_in_user_profile = URM_train[user_id].indices
    item_item_S[items_in_user_profile,item_id] += prediction_error * learning_rate
    
    # Print some stats
    if (sample_num +1)% 5000 == 0:
        elapsed_time = time.time() - start_time
        samples_per_second = sample_num/elapsed_time
        print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/sample_num, samples_per_second))

Iteration 5000 in 2.17 seconds, loss is 13.48. Samples per second 2299.32
Iteration 10000 in 4.34 seconds, loss is 13.41. Samples per second 2304.85
Iteration 15000 in 6.56 seconds, loss is 13.43. Samples per second 2286.65
Iteration 20000 in 8.55 seconds, loss is 13.44. Samples per second 2338.93
Iteration 25000 in 10.62 seconds, loss is 13.44. Samples per second 2354.26
Iteration 30000 in 12.63 seconds, loss is 13.43. Samples per second 2375.83
Iteration 35000 in 14.77 seconds, loss is 13.40. Samples per second 2369.79
Iteration 40000 in 16.84 seconds, loss is 13.38. Samples per second 2374.68
Iteration 45000 in 19.07 seconds, loss is 13.36. Samples per second 2360.16
Iteration 50000 in 21.59 seconds, loss is 13.33. Samples per second 2315.60
Iteration 55000 in 23.95 seconds, loss is 13.32. Samples per second 2296.76
Iteration 60000 in 26.18 seconds, loss is 13.33. Samples per second 2291.39
Iteration 65000 in 28.38 seconds, loss is 13.33. Samples per second 2290.34
Iteration 70000 i

### What do we see? The loss oscillates over time, sometimes it goes down sometimes up.
### How long do we train such a model?

* An epoch: a complete loop over all the train data
* Usually you train for multiple epochs. Depending on the algorithm and data 10s or 100s of epochs.

### Let's estimate the training time. Say we train for 10 epochs and we have 8M interactions in the train data...

In [140]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 34287.10 seconds, or 571.45 minutes


### ... ehm, 10 hours
### Unacceptable!

### Let's rewrite the loop with some smarter use of the data structures. In particular:
* Use the indptr/indices data structures to get the seen items
* Not much else we can do with this tools

In [141]:
URM_train_coo = URM_train.tocoo()
item_item_S = np.zeros((n_items, n_items), dtype = np.float)

learning_rate = 1e-6

loss = 0.0
start_time = time.time()
for sample_num in range(100000):
    
    # Randomly pick sample
    sample_index = np.random.randint(URM_train_coo.nnz)

    user_id = URM_train_coo.row[sample_index]
    item_id = URM_train_coo.col[sample_index]
    rating = URM_train_coo.data[sample_index]

    # Compute prediction
    predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]
        
    # Compute prediction error, or gradient
    prediction_error = rating - predicted_rating
    loss += prediction_error**2
    
    # Update model, in this case the similarity
    items_in_user_profile = URM_train.indices[URM_train.indptr[user_id]:URM_train.indptr[user_id]]
    item_item_S[items_in_user_profile,item_id] += prediction_error * learning_rate
    
    # Print some stats
    if (sample_num +1)% 5000 == 0:
        elapsed_time = time.time() - start_time
        samples_per_second = sample_num/elapsed_time
        print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/sample_num, samples_per_second))

Iteration 5000 in 1.68 seconds, loss is 13.50. Samples per second 2971.89
Iteration 10000 in 3.21 seconds, loss is 13.56. Samples per second 3119.63
Iteration 15000 in 4.74 seconds, loss is 13.53. Samples per second 3166.17
Iteration 20000 in 6.25 seconds, loss is 13.51. Samples per second 3202.22
Iteration 25000 in 7.73 seconds, loss is 13.52. Samples per second 3232.58
Iteration 30000 in 9.23 seconds, loss is 13.55. Samples per second 3248.92
Iteration 35000 in 10.75 seconds, loss is 13.53. Samples per second 3257.05
Iteration 40000 in 12.28 seconds, loss is 13.53. Samples per second 3257.86
Iteration 45000 in 13.74 seconds, loss is 13.53. Samples per second 3276.04
Iteration 50000 in 15.23 seconds, loss is 13.54. Samples per second 3283.17
Iteration 55000 in 16.75 seconds, loss is 13.53. Samples per second 3282.94
Iteration 60000 in 18.27 seconds, loss is 13.52. Samples per second 3284.73
Iteration 65000 in 19.79 seconds, loss is 13.51. Samples per second 3285.08
Iteration 70000 in 

In [143]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 24583.25 seconds, or 409.72 minutes


### We now got 7 hours, just as bad as before

### Let's see what we can do with Cython
### First step, just compile it. We do not have the data at compile time, so we put the loop in a function

In [148]:
%%cython
import numpy as np
import time

def do_some_training(URM_train):
    
    URM_train_coo = URM_train.tocoo()
    n_items = URM_train.shape[1]
    
    item_item_S = np.zeros((n_items, n_items), dtype = np.float16)

    learning_rate = 1e-6

    loss = 0.0
    start_time = time.time()
    for sample_num in range(100000):

        # Randomly pick sample
        sample_index = np.random.randint(URM_train_coo.nnz)

        user_id = URM_train_coo.row[sample_index]
        item_id = URM_train_coo.col[sample_index]
        rating = URM_train_coo.data[sample_index]

        # Compute prediction
        predicted_rating = URM_train[user_id].dot(item_item_S[:,item_id])[0]

        # Compute prediction error, or gradient
        prediction_error = rating - predicted_rating
        loss += prediction_error**2

        # Update model, in this case the similarity
        items_in_user_profile = URM_train.indices[URM_train.indptr[user_id]:URM_train.indptr[user_id]]
        item_item_S[items_in_user_profile,item_id] += prediction_error * learning_rate

        # Print some stats
        if (sample_num +1)% 5000 == 0:
            elapsed_time = time.time() - start_time
            samples_per_second = sample_num/elapsed_time
            print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/sample_num, samples_per_second))   
    
    return loss, samples_per_second

In [149]:
loss, samples_per_second = do_some_training(URM_train)

Iteration 5000 in 1.36 seconds, loss is 13.43. Samples per second 3675.52
Iteration 10000 in 2.71 seconds, loss is 13.46. Samples per second 3693.55
Iteration 15000 in 4.03 seconds, loss is 13.44. Samples per second 3719.78
Iteration 20000 in 5.37 seconds, loss is 13.47. Samples per second 3721.92
Iteration 25000 in 6.65 seconds, loss is 13.47. Samples per second 3758.47
Iteration 30000 in 7.96 seconds, loss is 13.49. Samples per second 3768.03
Iteration 35000 in 9.29 seconds, loss is 13.48. Samples per second 3768.39
Iteration 40000 in 10.60 seconds, loss is 13.48. Samples per second 3774.70
Iteration 45000 in 11.91 seconds, loss is 13.46. Samples per second 3779.62
Iteration 50000 in 13.23 seconds, loss is 13.45. Samples per second 3777.86
Iteration 55000 in 14.58 seconds, loss is 13.46. Samples per second 3771.75
Iteration 60000 in 15.90 seconds, loss is 13.47. Samples per second 3773.07
Iteration 65000 in 17.27 seconds, loss is 13.46. Samples per second 3763.70
Iteration 70000 in 1

In [150]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 21380.64 seconds, or 356.34 minutes


### Still far too time consuming
#### The compiler is just porting in C all operations that the python interpreter would have to perform, dynamic tiping included

### Now try to add some types: If you use a variable only as a C object, use primitive tipes

* cdef int namevar
* cdef double namevar
* cdef float namevar
* cdef double[:] singledimensionarray
* cdef double[:,:] bidimensionalmatrix


### We now use types for all main variables


In [167]:
%%cython
import numpy as np
import time

def do_some_training(URM_train):

    URM_train_coo = URM_train.tocoo()
    n_items = URM_train.shape[1]

    cdef double[:,:] item_item_S = np.zeros((n_items, n_items), dtype = np.float)
    cdef double learning_rate = 1e-6
    cdef double loss = 0.0
    cdef long start_time = time.time()
    cdef double predicted_rating, prediction_error
    cdef int[:] items_in_user_profile
    cdef int index, sample_num

    for sample_num in range(100000):

        # Randomly pick sample
        sample_index = np.random.randint(URM_train_coo.nnz)

        user_id = URM_train_coo.row[sample_index]
        item_id = URM_train_coo.col[sample_index]
        rating = URM_train_coo.data[sample_index]

        # Compute prediction
        items_in_user_profile = URM_train.indices[URM_train.indptr[user_id]:URM_train.indptr[user_id]]
        predicted_rating = 0.0

        for index in items_in_user_profile:
            predicted_rating += item_item_S[index,item_id]

        # Compute prediction error, or gradient
        prediction_error = rating - predicted_rating
        loss += prediction_error**2

        # Update model, in this case the similarity
        for index in items_in_user_profile:
            item_item_S[index,item_id] += prediction_error * learning_rate

        # Print some stats
        if (sample_num +1)% 5000 == 0:
            elapsed_time = time.time() - start_time
            samples_per_second = sample_num/elapsed_time
            print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/sample_num, samples_per_second))

    return loss, samples_per_second

In [168]:
loss, samples_per_second = do_some_training(URM_train)

Iteration 5000 in 0.41 seconds, loss is 13.34. Samples per second 12051.75
Iteration 10000 in 0.46 seconds, loss is 13.38. Samples per second 21699.33
Iteration 15000 in 0.50 seconds, loss is 13.42. Samples per second 29771.74
Iteration 20000 in 0.55 seconds, loss is 13.43. Samples per second 36574.45
Iteration 25000 in 0.59 seconds, loss is 13.42. Samples per second 42385.20
Iteration 30000 in 0.63 seconds, loss is 13.42. Samples per second 47256.86
Iteration 35000 in 0.68 seconds, loss is 13.45. Samples per second 51635.42
Iteration 40000 in 0.72 seconds, loss is 13.46. Samples per second 55491.56
Iteration 45000 in 0.76 seconds, loss is 13.46. Samples per second 58836.46
Iteration 50000 in 0.81 seconds, loss is 13.46. Samples per second 61893.95
Iteration 55000 in 0.85 seconds, loss is 13.46. Samples per second 64566.48
Iteration 60000 in 0.90 seconds, loss is 13.46. Samples per second 66901.78
Iteration 65000 in 0.94 seconds, loss is 13.46. Samples per second 69160.76
Iteration 700

In [169]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 997.48 seconds, or 16.62 minutes


### This is why we use it for machine learning algorithms...

### Some operations are still done with sparse matrices, those cannot be correctly optimized because the compiler does not know how what is the type of the data.
### To address this, we create typed arrays in which we put the URM_train data

### We can also remove the np.random call, and use the faster native C function

## To obtain a more reliable speed estimate we increase the number of samples and the print step by a factor of 10

In [170]:
%%cython

import numpy as np
import time
from libc.stdlib cimport rand, srand, RAND_MAX

def do_some_training(URM_train):

    URM_train_coo = URM_train.tocoo()
    
    cdef int n_interactions = URM_train.nnz
    cdef int n_items = URM_train.shape[1]
    cdef int[:] URM_train_row = URM_train_coo.row
    cdef int[:] URM_train_col = URM_train_coo.col
    cdef double[:] URM_train_data = URM_train_coo.data
    cdef int[:] URM_train_indices = URM_train.indices
    cdef int[:] URM_train_indptr = URM_train.indptr

    cdef double[:,:] item_item_S = np.zeros((n_items, n_items), dtype = np.float)
    cdef double learning_rate = 1e-6
    cdef double loss = 0.0
    cdef long start_time = time.time()
    cdef double predicted_rating, prediction_error
    cdef int[:] items_in_user_profile
    cdef int index, sample_num

    for sample_num in range(1000000):

        # Randomly pick sample
        sample_index = rand() % n_interactions

        user_id = URM_train_row[sample_index]
        item_id = URM_train_col[sample_index]
        rating = URM_train_data[sample_index]

        # Compute prediction
        items_in_user_profile = URM_train_indices[URM_train_indptr[user_id]:URM_train_indptr[user_id]]
        predicted_rating = 0.0

        for index in items_in_user_profile:
            predicted_rating += item_item_S[index,item_id]

        # Compute prediction error, or gradient
        prediction_error = rating - predicted_rating
        loss += prediction_error**2

        # Update model, in this case the similarity
        for index in items_in_user_profile:
            item_item_S[index,item_id] += prediction_error * learning_rate

        # Print some stats
        if (sample_num +1)% 50000 == 0:
            elapsed_time = time.time() - start_time
            samples_per_second = sample_num/elapsed_time
            print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/sample_num, samples_per_second))

    return loss, samples_per_second

In [171]:
loss, samples_per_second = do_some_training(URM_train)

Iteration 50000 in 0.39 seconds, loss is 14.04. Samples per second 129647.63
Iteration 100000 in 0.50 seconds, loss is 14.04. Samples per second 199336.35
Iteration 150000 in 0.61 seconds, loss is 14.04. Samples per second 245631.80
Iteration 200000 in 0.72 seconds, loss is 14.04. Samples per second 278289.76
Iteration 250000 in 0.83 seconds, loss is 14.03. Samples per second 300595.76
Iteration 300000 in 0.95 seconds, loss is 14.03. Samples per second 317229.34
Iteration 350000 in 1.06 seconds, loss is 14.03. Samples per second 331221.58
Iteration 400000 in 1.17 seconds, loss is 14.03. Samples per second 342553.57
Iteration 450000 in 1.28 seconds, loss is 14.02. Samples per second 351368.39
Iteration 500000 in 1.39 seconds, loss is 14.02. Samples per second 359269.40
Iteration 550000 in 1.50 seconds, loss is 14.02. Samples per second 366735.31
Iteration 600000 in 1.61 seconds, loss is 14.02. Samples per second 372734.35
Iteration 650000 in 1.72 seconds, loss is 14.03. Samples per seco

In [172]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 199.58 seconds, or 3.33 minutes


### As the source code gets less readable due to the addition of types and native C functions, it also gets remarkably faster

### We started from a naive python implementation which took 10 hours and we now have a Cython one with takes 3 minutes.

## How to use Cython outside a notebook

### Step1: Create a .pyx file and write your code

### Step2: Create a compilation script "compileCython.py" with the following content

In [None]:
# This code will not run in a notebook cell

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension


from Cython.Distutils import build_ext
import numpy
import sys
import re


if len(sys.argv) != 4:
    raise ValueError("Wrong number of paramethers received. Expected 4, got {}".format(sys.argv))


# Get the name of the file to compile
fileToCompile = sys.argv[1]

# Remove the argument from sys argv in order for it to contain only what setup needs
del sys.argv[1]

extensionName = re.sub("\.pyx", "", fileToCompile)


ext_modules = Extension(extensionName,
                [fileToCompile],
                extra_compile_args=['-O3'],
                include_dirs=[numpy.get_include(),],
                )

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[ext_modules]
)


### Step3: Compile your code with the following command 

python compileCython.py Cosine_Similarity_Cython.pyx build_ext --inplace

### Step4: Generate cython report and look for "yellow lines". The report is an .html file which represents how many operations are necessary to translate each python operation in cython code. If a line is white, it has a direct C translation. If it is yellow it will require many indirect steps that will slow down execution. Some of those steps may be inevitable, some may be removed via static typing.

### IMPORTANT: white does not mean fast!! If a system call is involved that part might be slow anyway.

cython -a Cosine_Similarity_Cython.pyx

### Step5: Add static types and C functions to remove "yellow" lines.

#### If you use a variable only as a C object, use primitive tipes 
cdef int namevar

def double namevar

cdef float namevar

#### If you call a function only within C code, use a specific declaration "cdef"

cdef function_name(self, int param1, double param2):
...



## Step6: Iterate step 4 and 5 until you are satisfied with how clean your code is, then compile. An example of non optimized code can be found in the source folder of this notebook with the _SLOW suffix

## Step7: the compilation generates a file wose name is something like "Cosine_Similarity_Cython.cpython-36m-x86_64-linux-gnu.so" and tells you the source file, the architecture it is compiled for and the OS

## Step8: Import and use the compiled file as if it were a python class

In [None]:
from Base.Simialrity.Cython.Cosine_Similarity_Cython import Cosine_Similarity

cosine_cython = Cosine_Similarity(URM_train, TopK=100)

start_time = time.time()

cosine_cython.compute_similarity()

print("Similarity computed in {:.2f} seconds".format(time.time()-start_time))