# Recommender Systems 2018/19

### Practice 4 - Similarity with Cython


### Cython is a superset of Python, allowing you to use C-like operations and import C code. Cython files (.pyx) are compiled and support static typing.

In [1]:
import time
import numpy as np

### Let's implement something simple

In [2]:
def isPrime(n):
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [3]:
print("Is prime 2? {}".format(isPrime(2)))
print("Is prime 3? {}".format(isPrime(3)))
print("Is prime 5? {}".format(isPrime(5)))
print("Is prime 15? {}".format(isPrime(15)))
print("Is prime 20? {}".format(isPrime(20)))

Is prime 2? True
Is prime 3? True
Is prime 5? True
Is prime 15? False
Is prime 20? False


In [4]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 8.17 sec
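The comment in `isPrime` notes that the loop usually stops at sqrt(n). A minimal sketch of that variant follows. The name `is_prime_sqrt` is hypothetical and not part of the original notebook, and `math.isqrt` assumes Python 3.8 or later. For a value near 8e7 it tests fewer than 9000 candidate divisors instead of roughly 8e7.

```python
import math

def is_prime_sqrt(n):
    # Hypothetical variant of isPrime: numbers below 2 are not prime
    if n < 2:
        return False

    # A composite n always has a divisor no larger than sqrt(n)
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False

    return True

primes_up_to_20 = [m for m in range(1, 21) if is_prime_sqrt(m)]
print(primes_up_to_20)
```

The rest of the notebook keeps the slower loop on purpose, since a long-running loop makes the effect of compilation and static typing easier to measure.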


#### Load the Cython magic command, which takes care of the compilation step. If you are writing code outside Jupyter you'll have to compile using other tools

In [5]:
%load_ext Cython

#### Declare Cython function, paste the same code as before. The function will be compiled and then executed with a Python interface

In [6]:
%%cython
def isPrime(n):
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [7]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 3.97 sec


#### As you can see, just compiling the same code gives some improvement.
#### To go seriously faster, we have to use some static typing

In [8]:
%%cython
# Declare the type of the arguments
def isPrime(long n):
    
    # Declare index of for loop
    cdef long i
    
    i = 2
    
    # Usually you loop up to sqrt(n)
    while i < n:
        if n % i == 0:
            return False
        
        i += 1
        
    return True

In [9]:
start_time = time.time()

result = isPrime(80000023)

print("Is Prime 80000023? {}, time required {:.2f} sec".format(result, time.time()-start_time))

Is Prime 80000023? True, time required 0.25 sec


#### Cython code with two type declarations, for n and i, runs about 30x faster than pure Python (8.17 sec vs 0.25 sec)
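As a quick check, the speedups implied by the timings printed above can be computed directly. The numbers are copied from the cell outputs and will vary from machine to machine.

```python
# Timings reported in the cells above, in seconds
t_python = 8.17
t_cython_untyped = 3.97
t_cython_typed = 0.25

speedup_untyped = t_python / t_cython_untyped
speedup_typed = t_python / t_cython_typed

print("Compiled only: {:.1f}x".format(speedup_untyped))
print("Compiled with static types: {:.1f}x".format(speedup_typed))
```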

#### Main benefits of Cython:
* Compiled, no interpreter
* Static typing, no dynamic type-checking overhead
* Fast loops, no need to vectorize. Vectorization sometimes performs lots of useless operations
* NumPy, which is fast in Python, is often slow compared to carefully written Cython code

### Similarity with Cython

#### Load the usual data.

In [10]:
from Data_manager.split_functions.split_train_validation_random_holdout import split_train_in_two_percentage_global_sample
from Data_manager.Movielens10M.Movielens10MReader import Movielens10MReader

data_reader = Movielens10MReader()
data_reader.load_data()

URM_all = data_reader.get_URM_all()

URM_train, URM_test = split_train_in_two_percentage_global_sample(URM_all, train_percentage = 0.8)

DataReader: Preloaded data not found, reading from original files...
Movielens10MReader: Unable to fild data zip file. Downloading...
Downloading: http://files.grouplens.org/datasets/movielens/ml-10m.zip
In folder: Data_manager_split_datasets/Movielens10M/ml-10m.zip
DataReader: Downloaded 100.00%, 62.53 MB, 1685 KB/s, 38 seconds passed
Movielens10MReader: loading genres
Movielens10MReader: loading tags


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ferra\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


Movielens10MReader: loading URM




Movielens10MReader: cleaning temporary files
Movielens10MReader: loading complete
DataReader: Verifying data consistency...
DataReader: Verifying data consistency... Passed!
DataReader: Creating folder 'Data_manager_split_datasets/Movielens10M/original/'
DataReader: Saving complete!
DataReader: current dataset is: <class 'Data_manager.Movielens10M.Movielens10MReader.Movielens10MReader'>
	Number of items: 10680
	Number of users: 69878
	Number of interactions in URM_all: 9973605
	Interaction density: 1.34E-02
	Interactions per user:
		 Min: 1.90E+01
		 Avg: 1.43E+02
		 Max: 7.36E+03
	Interactions per item:
		 Min: 0.00E+00
		 Avg: 9.34E+02
		 Max: 3.49E+04
	Gini Index: 0.57



In [11]:
URM_train

<69878x10680 sparse matrix of type '<class 'numpy.float64'>'
	with 7979148 stored elements in Compressed Sparse Row format>

#### Since we cannot store the whole similarity matrix in memory, we compute it one row at a time

In [12]:
itemIndex=1
item_ratings = URM_train[:,itemIndex]
item_ratings = item_ratings.toarray().squeeze()

item_ratings.shape

(69878,)

In [13]:
this_item_weights = URM_train.T.dot(item_ratings)
this_item_weights.shape

(10680,)
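To get a feel for the sizes involved, here is a back-of-the-envelope estimate based on the item count reported by the data reader. The float32 dtype and the TopK value of 100 are assumptions borrowed from later cells, and the index arrays of the sparse format are ignored. A dense matrix of this size still fits on many machines, but it grows quadratically with the number of items.

```python
n_items = 10680          # number of items reported by the data reader
bytes_per_value = 4      # assuming float32 similarities
top_k = 100              # assumed neighbourhood size

# Dense item-item matrix: one value per pair of items
dense_cells = n_items * n_items
dense_mib = dense_cells * bytes_per_value / 2**20

# Sparse matrix keeping only the TopK values per item
sparse_cells = n_items * top_k
sparse_mib = sparse_cells * bytes_per_value / 2**20

print("Dense: {} values, {:.0f} MiB".format(dense_cells, dense_mib))
print("TopK:  {} values, {:.1f} MiB".format(sparse_cells, sparse_mib))
```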

#### Once we have the scores for that row, we get the TopK

In [14]:
k=10

top_k_idx = np.argsort(this_item_weights) [-k:]
top_k_idx

array([  30,   60, 1051,  720,  723,  639,  351,  256,  766,    1],
      dtype=int64)
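The TopK selection above sorts every score and keeps the last k indices. The sketch below, on made-up scores, shows that pattern next to `np.argpartition`, which selects the k largest entries without fully sorting the rest. The variable names are illustrative only.

```python
import numpy as np

scores = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
k = 3

# Full sort: indices of the k largest scores, in ascending score order
top_k_sorted = np.argsort(scores)[-k:]

# Partial selection: the same k indices, but in no guaranteed order
top_k_unsorted = np.argpartition(scores, -k)[-k:]

# Sort only the k selected elements, best first
top_k_best_first = top_k_unsorted[np.argsort(-scores[top_k_unsorted])]

print(top_k_sorted, top_k_best_first)
```

With many items and a small k, the partial selection does less work than a full sort, which is the same idea used later in the Cython implementation.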

In [15]:
import scipy.sparse as sps

In [16]:
# Function hiding some conversion checks
def check_matrix(X, format='csc', dtype=np.float32):
    if format == 'csc' and not isinstance(X, sps.csc_matrix):
        return X.tocsc().astype(dtype)
    elif format == 'csr' and not isinstance(X, sps.csr_matrix):
        return X.tocsr().astype(dtype)
    elif format == 'coo' and not isinstance(X, sps.coo_matrix):
        return X.tocoo().astype(dtype)
    elif format == 'dok' and not isinstance(X, sps.dok_matrix):
        return X.todok().astype(dtype)
    elif format == 'bsr' and not isinstance(X, sps.bsr_matrix):
        return X.tobsr().astype(dtype)
    elif format == 'dia' and not isinstance(X, sps.dia_matrix):
        return X.todia().astype(dtype)
    elif format == 'lil' and not isinstance(X, sps.lil_matrix):
        return X.tolil().astype(dtype)
    else:
        return X.astype(dtype)

#### Create a Basic Collaborative filtering recommender using only cosine similarity

In [17]:
class BasicItemKNN_CF_Recommender(object):
    """ ItemKNN recommender with cosine similarity and no shrinkage"""

    def __init__(self, URM):
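        self.dataset = URM
        # recommend() and filter_seen() read self.URM and rely on the CSR indptr/indices arrays
        self.URM = check_matrix(URM, 'csr')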
        
        
    def compute_similarity(self, URM):
        
        # We explore the matrix column-wise
        URM = check_matrix(URM, 'csc')     
                    
        values = []
        rows = []
        cols = []
        
        start_time = time.time()
        processedItems = 0
        
        # Compute all similarities for each item using vectorization
        for itemIndex in range(URM.shape[1]):
            
            processedItems += 1
            
            if processedItems % 100==0:
                
                itemPerSec = processedItems/(time.time()-start_time)
                
                print("Similarity item {}, {:.2f} item/sec, required time {:.2f} min".format(
                    processedItems, itemPerSec, URM.shape[1]/itemPerSec/60))
            
            # All ratings for a given item
            item_ratings = URM[:,itemIndex]
            item_ratings = item_ratings.toarray().squeeze()
            
            # Compute item similarities
            # Use the URM passed as argument, not the global URM_train
            this_item_weights = URM.T.dot(item_ratings)
            
            # Sort indices and select TopK
            top_k_idx = np.argsort(this_item_weights) [-self.k:]
            
            # Incrementally build sparse matrix
            values.extend(this_item_weights[top_k_idx])
            rows.extend(np.arange(URM.shape[1])[top_k_idx])
            cols.extend(np.ones(self.k) * itemIndex)
            
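        self.W_sparse = sps.csc_matrix((values, (rows, cols)),
                                       shape=(URM.shape[1], URM.shape[1]),
                                       dtype=np.float32)

        # Return the similarity so that fit() can use it
        return self.W_sparse


    def fit(self, k=50, shrinkage=100):

        self.k = k
        self.shrinkage = shrinkage
        
        item_weights = self.compute_similarity(self.dataset)
        
        # Keep the similarity in CSR format, it is the one used by recommend()
        self.W_sparse = check_matrix(item_weights, 'csr')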
        
        
    def recommend(self, user_id, at=None, exclude_seen=True):
        # compute the scores using the dot product
        user_profile = self.URM[user_id]
        scores = user_profile.dot(self.W_sparse).toarray().ravel()

        if exclude_seen:
            scores = self.filter_seen(user_id, scores)

        # rank items
        ranking = scores.argsort()[::-1]
            
        return ranking[:at]
    
    
    def filter_seen(self, user_id, scores):

        start_pos = self.URM.indptr[user_id]
        end_pos = self.URM.indptr[user_id+1]

        user_profile = self.URM.indices[start_pos:end_pos]
        
        scores[user_profile] = -np.inf

        return scores

#### Let's isolate the compute_similarity function 

In [18]:
def compute_similarity(URM, k=100):

    # We explore the matrix column-wise
    URM = check_matrix(URM, 'csc')
    
    n_items = URM.shape[1]

    values = []
    rows = []
    cols = []

    start_time = time.time()
    processedItems = 0

    # Compute all similarities for each item using vectorization
    # for itemIndex in range(n_items):
    for itemIndex in range(1000):

        processedItems += 1

        if processedItems % 100==0:

            itemPerSec = processedItems/(time.time()-start_time)

            print("Similarity item {}, {:.2f} item/sec, required time {:.2f} min".format(
                processedItems, itemPerSec, n_items/itemPerSec/60))

        # All ratings for a given item
        item_ratings = URM[:,itemIndex]
        item_ratings = item_ratings.toarray().squeeze()

        # Compute item similarities
        this_item_weights = URM.T.dot(item_ratings)

        # Sort indices and select TopK
        top_k_idx = np.argsort(this_item_weights) [-k:]

        # Incrementally build sparse matrix
        values.extend(this_item_weights[top_k_idx])
        rows.extend(np.arange(URM.shape[1])[top_k_idx])
        cols.extend(np.ones(k) * itemIndex)

    W_sparse = sps.csc_matrix((values, (rows, cols)),
                            shape=(n_items, n_items),
                            dtype=np.float32)

    return W_sparse
        

In [19]:
compute_similarity(URM_train)

Similarity item 100, 86.50 item/sec, required time 2.06 min
Similarity item 200, 86.39 item/sec, required time 2.06 min
Similarity item 300, 86.53 item/sec, required time 2.06 min
Similarity item 400, 86.48 item/sec, required time 2.06 min
Similarity item 500, 86.37 item/sec, required time 2.06 min
Similarity item 600, 86.36 item/sec, required time 2.06 min
Similarity item 700, 86.30 item/sec, required time 2.06 min
Similarity item 800, 86.30 item/sec, required time 2.06 min
Similarity item 900, 86.40 item/sec, required time 2.06 min
Similarity item 1000, 86.53 item/sec, required time 2.06 min


<10680x10680 sparse matrix of type '<class 'numpy.float32'>'
	with 100000 stored elements in Compressed Sparse Column format>
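The function above accumulates (value, row, column) triplets in three lists and builds the sparse matrix once at the end. A minimal sketch of that construction on made-up triplets follows; the values are illustrative and not taken from the dataset.

```python
import numpy as np
import scipy.sparse as sps

# Three similarity values with their (row, column) coordinates
values = [0.5, 0.2, 0.9]
rows = [0, 2, 1]
cols = [1, 1, 0]

W_toy = sps.csc_matrix((values, (rows, cols)),
                       shape=(3, 3),
                       dtype=np.float32)

print(W_toy.toarray())
```

Building the matrix in one call from the triplets avoids modifying a CSC or CSR structure element by element, which is expensive in scipy.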

### We see that computing the similarity takes more or less 2 minutes
### Now we use the same identical code, but we compile it

In [20]:
%%cython
import time
import numpy as np
import scipy.sparse as sps

def compute_similarity_compiled(URM, k=100):

    # We explore the matrix column-wise
    URM = URM.tocsc()
    
    n_items = URM.shape[1]

    values = []
    rows = []
    cols = []

    start_time = time.time()
    processedItems = 0

    # Compute all similarities for each item using vectorization
    # for itemIndex in range(n_items):
    for itemIndex in range(1000):

        processedItems += 1

        if processedItems % 100==0:

            itemPerSec = processedItems/(time.time()-start_time)

            print("Similarity item {}, {:.2f} item/sec, required time {:.2f} min".format(
                processedItems, itemPerSec, n_items/itemPerSec/60))

        # All ratings for a given item
        item_ratings = URM[:,itemIndex]
        item_ratings = item_ratings.toarray().squeeze()

        # Compute item similarities
        this_item_weights = URM.T.dot(item_ratings)

        # Sort indices and select TopK
        top_k_idx = np.argsort(this_item_weights) [-k:]

        # Incrementally build sparse matrix
        values.extend(this_item_weights[top_k_idx])
        rows.extend(np.arange(URM.shape[1])[top_k_idx])
        cols.extend(np.ones(k) * itemIndex)

    W_sparse = sps.csc_matrix((values, (rows, cols)),
                            shape=(n_items, n_items),
                            dtype=np.float32)

    return W_sparse
        

In [21]:
compute_similarity_compiled(URM_train)

Similarity item 100, 59.41 item/sec, required time 3.00 min
Similarity item 200, 58.89 item/sec, required time 3.02 min
Similarity item 300, 58.62 item/sec, required time 3.04 min
Similarity item 400, 58.72 item/sec, required time 3.03 min
Similarity item 500, 58.69 item/sec, required time 3.03 min
Similarity item 600, 58.57 item/sec, required time 3.04 min
Similarity item 700, 58.59 item/sec, required time 3.04 min
Similarity item 800, 58.51 item/sec, required time 3.04 min
Similarity item 900, 58.52 item/sec, required time 3.04 min
Similarity item 1000, 58.63 item/sec, required time 3.04 min


<10680x10680 sparse matrix of type '<class 'numpy.float32'>'
	with 100000 stored elements in Compressed Sparse Column format>

#### As opposed to the previous example, compilation by itself is not very helpful. Why?
#### Because the compiler is just porting to C all the operations that the Python interpreter would have to perform, dynamic typing included

### Now try to add some types

In [22]:
%%cython
import time
import numpy as np
import scipy.sparse as sps

cimport numpy as np

"""
Determine the operating system. The interface of numpy returns a different type for argsort under Windows and Linux

http://docs.cython.org/en/latest/src/userguide/language_basics.html#conditional-compilation
"""
IF UNAME_SYSNAME == "linux":
    DEF LONG_t = "long"
ELIF  UNAME_SYSNAME == "Windows":
    DEF LONG_t = "long long"
ELSE:
    DEF LONG_t = "long long"
    
def compute_similarity_compiled(URM, int k=100):
    
    cdef int itemIndex, processedItems
    
    # We use the numpy syntax, allowing us to perform vectorized operations
    cdef np.ndarray[double, ndim=1] item_ratings, this_item_weights
    cdef np.ndarray[LONG_t, ndim=1] top_k_idx

    # We explore the matrix column-wise
    URM = URM.tocsc()
    
    n_items = URM.shape[1]

    values = []
    rows = []
    cols = []

    start_time = time.time()
    processedItems = 0

    # Compute all similarities for each item using vectorization
    # for itemIndex in range(n_items):
    for itemIndex in range(1000):

        processedItems += 1

        if processedItems % 100==0:

            itemPerSec = processedItems/(time.time()-start_time)

            print("Similarity item {}, {:.2f} item/sec, required time {:.2f} min".format(
                processedItems, itemPerSec, n_items/itemPerSec/60))

        # All ratings for a given item
        item_ratings = URM[:,itemIndex].toarray().squeeze()

        # Compute item similarities
        this_item_weights = URM.T.dot(item_ratings)

        # Sort indices and select TopK
        top_k_idx = np.argsort(this_item_weights) [-k:]

        # Incrementally build sparse matrix
        values.extend(this_item_weights[top_k_idx])
        rows.extend(np.arange(URM.shape[1])[top_k_idx])
        cols.extend(np.ones(k) * itemIndex)

    W_sparse = sps.csc_matrix((values, (rows, cols)),
                            shape=(n_items, n_items),
                            dtype=np.float32)

    return W_sparse

In [23]:
compute_similarity_compiled(URM_train)

Similarity item 100, 58.10 item/sec, required time 3.06 min
Similarity item 200, 57.39 item/sec, required time 3.10 min
Similarity item 300, 57.14 item/sec, required time 3.12 min
Similarity item 400, 57.03 item/sec, required time 3.12 min
Similarity item 500, 57.07 item/sec, required time 3.12 min
Similarity item 600, 57.11 item/sec, required time 3.12 min
Similarity item 700, 57.12 item/sec, required time 3.12 min
Similarity item 800, 56.94 item/sec, required time 3.13 min
Similarity item 900, 57.01 item/sec, required time 3.12 min
Similarity item 1000, 57.11 item/sec, required time 3.12 min


<10680x10680 sparse matrix of type '<class 'numpy.float32'>'
	with 100000 stored elements in Compressed Sparse Column format>

### Still no luck! Why?
### There are a few reasons:
* We are getting the data from the sparse matrix using its interface, which is SLOW
* We are transforming sparse data into a dense array, which is SLOW
* We are performing a dot product against a dense vector
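To illustrate the first two points, the sketch below reads one column of a small CSC matrix in two ways: through the sparse matrix interface followed by a dense conversion, and directly from the `indptr`, `indices` and `data` arrays. The toy matrix is made up for the example.

```python
import numpy as np
import scipy.sparse as sps

toy = sps.csc_matrix(np.array([[1., 0., 2.],
                               [0., 0., 3.],
                               [4., 5., 0.]]))
col = 2

# Through the interface: slice the column, then convert it to a dense array
dense_col = toy[:, col].toarray().ravel()
rows_from_dense = np.nonzero(dense_col)[0]

# Directly from the CSC arrays: only the stored entries are touched
start, end = toy.indptr[col], toy.indptr[col + 1]
rows_direct = toy.indices[start:end]
vals_direct = toy.data[start:end]

print(rows_direct, vals_direct)
```

Both approaches give the same rows and values, but the direct one never materializes the zeros of the column.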

#### You could find a workaround... here we do something different

### Proposed solution
### Change the algorithm!

### Instead of performing the dot product, let's implement something that computes the similarity using sparse data directly

### We loop through the data and selectively update the similarity matrix cells.
### Underlying idea:
* When I select an item I can know which users rated it
* Instead of looping through the other items trying to find common users, I use the URM to find which other items that user rated
* The user I am considering will be common between the two, so I increment the similarity of the two items
* Instead of following the path item1 -> loop item2 -> find user, I go item1 -> loop user -> loop item2
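The path item1 -> loop user -> loop item2 can be sketched in plain Python with explicit loops over the sparse arrays. This is only an illustration on a toy matrix with implicit feedback; `item_cooccurrence`, `by_user` and `by_item` are hypothetical names, and the count includes the diagonal entry.

```python
import numpy as np
import scipy.sparse as sps

# Toy user-item matrix: 3 users (rows) x 4 items (columns), implicit feedback
toy_URM = np.array([[1, 1, 0, 1],
                    [0, 1, 1, 1],
                    [1, 0, 1, 0]])

by_user = sps.csr_matrix(toy_URM)   # fast access to the items of a user
by_item = sps.csc_matrix(toy_URM)   # fast access to the users of an item

def item_cooccurrence(item_id):
    # Hypothetical helper: counts the users item_id has in common with every item
    result = np.zeros(by_item.shape[1], dtype=np.int64)

    # item1 -> loop over the users who rated it
    users = by_item.indices[by_item.indptr[item_id]:by_item.indptr[item_id + 1]]
    for user_id in users:
        # user -> loop over the items that user rated (item2)
        items = by_user.indices[by_user.indptr[user_id]:by_user.indptr[user_id + 1]]
        for other_item in items:
            result[other_item] += 1

    return result

cooccurrence_item_1 = item_cooccurrence(1)
print(cooccurrence_item_1)
```

Nested Python loops like these are slow, but they only touch stored entries and need no dense conversion, which is what makes them a good fit for static typing in Cython.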

In [24]:
data_matrix = np.array([[1,1,0,1],[0,1,1,1],[1,0,1,0]])
data_matrix = sps.csc_matrix(data_matrix)
data_matrix.todense()

matrix([[1, 1, 0, 1],
        [0, 1, 1, 1],
        [1, 0, 1, 0]], dtype=int32)

### Example: Compute the similarities for item 1

#### Step 1: get users that rated item 1

In [25]:
users_rated_item = data_matrix[:,1]
users_rated_item.indices

array([0, 1])

#### Step 2: count how many times those users rated other items

In [26]:
item_similarity = data_matrix[users_rated_item.indices].sum(axis = 0)
np.array(item_similarity).squeeze()

array([1, 2, 1, 2])

#### Verify our result against the common method. We can see that the similarity values for col 1 are identical

In [27]:
similarity_matrix_product = data_matrix.T.dot(data_matrix)
similarity_matrix_product.toarray()[:,1]

array([1, 2, 1, 2], dtype=int32)

In [28]:
# The following code works for implicit feedback only
def compute_similarity_new_algorithm(URM, k=100):

    # We explore the matrix column-wise
    URM = check_matrix(URM, 'csc')
    URM.data = np.ones_like(URM.data)
    
    n_items = URM.shape[1]

    values = []
    rows = []
    cols = []

    start_time = time.time()
    processedItems = 0

    # Compute all similarities for each item using vectorization
    # for itemIndex in range(n_items):
    for itemIndex in range(1000):

        processedItems += 1

        if processedItems % 100==0:

            itemPerSec = processedItems/(time.time()-start_time)

            print("Similarity item {}, {:.2f} item/sec, required time {:.2f} min".format(
                processedItems, itemPerSec, n_items/itemPerSec/60))

        # All ratings for a given item
        users_rated_item = URM.indices[URM.indptr[itemIndex]:URM.indptr[itemIndex+1]]

        # Compute item similarities
        this_item_weights = URM[users_rated_item].sum(axis = 0)
        this_item_weights = np.array(this_item_weights).squeeze()

        # Sort indices and select TopK
        top_k_idx = np.argsort(this_item_weights) [-k:]

        # Incrementally build sparse matrix
        values.extend(this_item_weights[top_k_idx])
        rows.extend(np.arange(URM.shape[1])[top_k_idx])
        cols.extend(np.ones(k) * itemIndex)

    W_sparse = sps.csc_matrix((values, (rows, cols)),
                            shape=(n_items, n_items),
                            dtype=np.float32)

    return W_sparse
        

In [29]:
compute_similarity_new_algorithm(URM_train)

Similarity item 100, 21.28 item/sec, required time 8.36 min
Similarity item 200, 22.35 item/sec, required time 7.96 min
Similarity item 300, 22.52 item/sec, required time 7.91 min
Similarity item 400, 22.19 item/sec, required time 8.02 min
Similarity item 500, 22.45 item/sec, required time 7.93 min
Similarity item 600, 22.12 item/sec, required time 8.05 min
Similarity item 700, 23.05 item/sec, required time 7.72 min
Similarity item 800, 23.51 item/sec, required time 7.57 min
Similarity item 900, 23.94 item/sec, required time 7.43 min
Similarity item 1000, 24.27 item/sec, required time 7.33 min


<10680x10680 sparse matrix of type '<class 'numpy.float32'>'
	with 100000 stored elements in Compressed Sparse Column format>

#### Slower, but this is expected: dot product operations are implemented very efficiently, while here we are using an indirect approach

### Now let's write this algorithm in Cython

In [30]:
%%cython

import time

import numpy as np
cimport numpy as np
from cpython.array cimport array, clone

import scipy.sparse as sps



"""
Determine the operating system. The interface of numpy returns a different type for argsort under Windows and Linux

http://docs.cython.org/en/latest/src/userguide/language_basics.html#conditional-compilation
"""
IF UNAME_SYSNAME == "linux":
    DEF LONG_t = "long"
ELIF  UNAME_SYSNAME == "Windows":
    DEF LONG_t = "long long"
ELSE:
    DEF LONG_t = "long long"



cdef class Cosine_Similarity:

    cdef int TopK
    cdef long n_items

    # Arrays containing the sparse data
    cdef int[:] user_to_item_row_ptr, user_to_item_cols
    cdef int[:] item_to_user_rows, item_to_user_col_ptr
    cdef double[:] user_to_item_data, item_to_user_data

    # In case you select no TopK
    cdef double[:,:] W_dense

    
    def __init__(self, URM, TopK = 100):
        """
        URM must be a matrix with items as columns
        :param URM:
        :param TopK:
        """

        super(Cosine_Similarity, self).__init__()

        self.n_items = URM.shape[1]

        self.TopK = min(TopK, self.n_items)

        URM = URM.tocsr()
        self.user_to_item_row_ptr = URM.indptr
        self.user_to_item_cols = URM.indices
        self.user_to_item_data = np.array(URM.data, dtype=np.float64)

        URM = URM.tocsc()
        self.item_to_user_rows = URM.indices
        self.item_to_user_col_ptr = URM.indptr
        self.item_to_user_data = np.array(URM.data, dtype=np.float64)

        if self.TopK == 0:
            self.W_dense = np.zeros((self.n_items,self.n_items))



    cdef int[:] getUsersThatRatedItem(self, long item_id):
        return self.item_to_user_rows[self.item_to_user_col_ptr[item_id]:self.item_to_user_col_ptr[item_id+1]]

    cdef int[:] getItemsRatedByUser(self, long user_id):
        return self.user_to_item_cols[self.user_to_item_row_ptr[user_id]:self.user_to_item_row_ptr[user_id+1]]

    
    
    cdef double[:] computeItemSimilarities(self, long item_id_input):
        """
        For every item the cosine similarity against other items depends on whether they have users in common. 
        The more common users the higher the similarity.
        
        The basic implementation is:
        - Select the first item
        - Loop through all other items
        -- Given the two items, get the users they have in common
        -- Update the similarity considering all common users
        
        That is VERY slow due to the common user part, in which a long data structure is looped multiple times.
        
        A better way is to use the data structure in a different way skipping the search part, getting directly
        the information we need.
        
        The implementation here used is:
        - Select the first item
        - Initialize a zero valued array for the similarities
        - Get the users who rated the first item
        - Loop through the users
        -- Given a user, get the items he rated (second item)
        -- Update the similarity of the items he rated
        
        
        """

        # Create template used to initialize an array with zeros
        # Much faster than np.zeros(self.n_items)
        cdef array[double] template_zero = array('d')
        cdef array[double] result = clone(template_zero, self.n_items, zero=True)


        cdef long user_index, user_id, item_index, item_id_second

        cdef int[:] users_that_rated_item = self.getUsersThatRatedItem(item_id_input)
        cdef int[:] items_rated_by_user

        cdef double rating_item_input, rating_item_second

        # Get users that rated the items
        for user_index in range(len(users_that_rated_item)):

            user_id = users_that_rated_item[user_index]
            rating_item_input = self.item_to_user_data[self.item_to_user_col_ptr[item_id_input]+user_index]

            # Get all items rated by that user
            items_rated_by_user = self.getItemsRatedByUser(user_id)

            for item_index in range(len(items_rated_by_user)):

                item_id_second = items_rated_by_user[item_index]

                # Do not compute the similarity on the diagonal
                if item_id_second != item_id_input:
                    # Increment similarity
                    rating_item_second = self.user_to_item_data[self.user_to_item_row_ptr[user_id]+item_index]

                    result[item_id_second] += rating_item_input*rating_item_second

        return result


    def compute_similarity(self):

        cdef int itemIndex, innerItemIndex
        cdef long long topKItemIndex

        cdef long long[:] top_k_idx

        # Declare numpy data type to use vector indexing and simplify the topK selection code
        cdef np.ndarray[LONG_t, ndim=1] top_k_partition, top_k_partition_sorting
        cdef np.ndarray[np.float64_t, ndim=1] this_item_weights_np

        cdef double[:] this_item_weights

        cdef long processedItems = 0

        # Data structure to incrementally build sparse matrix
        # Preinitialize max possible length
        cdef double[:] values = np.zeros((self.n_items*self.TopK))
        cdef int[:] rows = np.zeros((self.n_items*self.TopK,), dtype=np.int32)
        cdef int[:] cols = np.zeros((self.n_items*self.TopK,), dtype=np.int32)
        cdef long sparse_data_pointer = 0


        start_time = time.time()

        # Compute all similarities for each item
        for itemIndex in range(self.n_items):

            processedItems += 1

            if processedItems % 10000==0 or processedItems==self.n_items:

                itemPerSec = processedItems/(time.time()-start_time)

                print("Similarity item {} ( {:2.0f} % ), {:.2f} item/sec, required time {:.2f} min".format(
                    processedItems, processedItems*1.0/self.n_items*100, itemPerSec, (self.n_items-processedItems) / itemPerSec / 60))

            this_item_weights = self.computeItemSimilarities(itemIndex)

            if self.TopK == 0:

                for innerItemIndex in range(self.n_items):
                    self.W_dense[innerItemIndex,itemIndex] = this_item_weights[innerItemIndex]

            else:

                # Sort indices and select TopK
                # Using numpy implies some overhead, unfortunately the plain C qsort function is even slower
                # top_k_idx = np.argsort(this_item_weights) [-self.TopK:]

                # Sorting is done in three steps. Faster than plain np.argsort for higher number of items
                # because we avoid sorting elements we already know we don't care about
                # - Partition the data to extract the set of TopK items, this set is unsorted
                # - Sort only the TopK items, discarding the rest
                # - Get the original item index

                this_item_weights_np = - np.array(this_item_weights)
                
                # Get the unordered set of topK items
                top_k_partition = np.argpartition(this_item_weights_np, self.TopK-1)[0:self.TopK]
                # Sort only the elements in the partition
                top_k_partition_sorting = np.argsort(this_item_weights_np[top_k_partition])
                # Get original index
                top_k_idx = top_k_partition[top_k_partition_sorting]



                # Incrementally build sparse matrix
                for innerItemIndex in range(len(top_k_idx)):

                    topKItemIndex = top_k_idx[innerItemIndex]

                    values[sparse_data_pointer] = this_item_weights[topKItemIndex]
                    rows[sparse_data_pointer] = topKItemIndex
                    cols[sparse_data_pointer] = itemIndex

                    sparse_data_pointer += 1


        if self.TopK == 0:

            return np.array(self.W_dense)

        else:

            values = np.array(values[0:sparse_data_pointer])
            rows = np.array(rows[0:sparse_data_pointer])
            cols = np.array(cols[0:sparse_data_pointer])

            W_sparse = sps.csr_matrix((values, (rows, cols)),
                                    shape=(self.n_items, self.n_items),
                                    dtype=np.float32)

            return W_sparse




In [31]:
cosine_cython = Cosine_Similarity(URM_train, TopK=100)

start_time = time.time()

cosine_cython.compute_similarity()

print("Similarity computed in {:.2f} seconds".format(time.time()-start_time))

Similarity item 10000 ( 94 % ), 741.80 item/sec, required time 0.02 min
Similarity item 10680 ( 100 % ), 781.29 item/sec, required time 0.00 min
Similarity computed in 13.71 seconds


### Better... much better. There are a few other things you could do, but at this point it is not worth the effort

## How to use Cython outside a notebook

### Step1: Create a .pyx file and write your code

### Step2: Create a compilation script "compileCython.py" with the following content

In [None]:
# This code will not run in a notebook cell

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension


from Cython.Distutils import build_ext
import numpy
import sys
import re


if len(sys.argv) != 4:
    raise ValueError("Wrong number of parameters received. Expected 4, got {}: {}".format(len(sys.argv), sys.argv))


# Get the name of the file to compile
fileToCompile = sys.argv[1]

# Remove the argument from sys argv in order for it to contain only what setup needs
del sys.argv[1]

# Raw string avoids the invalid escape sequence warning, "$" strips only a trailing ".pyx"
extensionName = re.sub(r"\.pyx$", "", fileToCompile)


ext_modules = Extension(extensionName,
                [fileToCompile],
                extra_compile_args=['-O3'],
                include_dirs=[numpy.get_include(),],
                )

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[ext_modules]
)


### Step3: Compile your code with the following command 

python compileCython.py Cosine_Similarity_Cython.pyx build_ext --inplace

### Step4: Generate the Cython report and look for "yellow lines". The report is an .html file that shows how much interaction with the Python runtime each line requires once it is translated to C. If a line is white, it has a direct C translation. If it is yellow, it requires many indirect steps that will slow down execution. Some of those steps may be inevitable, some may be removed via static typing.

### IMPORTANT: white does not mean fast!! If a system call is involved that part might be slow anyway.

cython -a Cosine_Similarity_Cython.pyx

### Step5: Add static types and C functions to remove "yellow" lines.

#### If you use a variable only as a C object, use primitive types
cdef int namevar

cdef double namevar

cdef float namevar

#### If you call a function only within C code, use a specific declaration "cdef"

cdef function_name(self, int param1, double param2):
...



### Step6: Iterate steps 4 and 5 until you are satisfied with how clean your code is, then compile. An example of non-optimized code can be found in the source folder of this notebook with the _SLOW suffix

### Step7: The compilation generates a file whose name is something like "Cosine_Similarity_Cython.cpython-36m-x86_64-linux-gnu.so" and tells you the source file, the architecture it is compiled for and the OS
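The suffix of the compiled file depends on the interpreter and platform that built it. As a small sketch, the standard library can list the suffixes the running interpreter accepts for extension modules, which helps when checking that a compiled file matches the Python you are importing it from.

```python
import importlib.machinery

# Suffixes the running interpreter accepts for compiled extension modules
# (for example a ".cpython-...so" name on Linux or a ".pyd" name on Windows)
suffixes = importlib.machinery.EXTENSION_SUFFIXES

print(suffixes)
```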

### Step8: Import and use the compiled file as if it were a Python class

In [None]:
from Base.Similarity.Cython.Cosine_Similarity_Cython import Cosine_Similarity

cosine_cython = Cosine_Similarity(URM_train, TopK=100)

start_time = time.time()

cosine_cython.compute_similarity()

print("Similarity computed in {:.2f} seconds".format(time.time()-start_time))