### Матричные факторизации

В данной работе вам предстоит познакомиться с практической стороной матричных разложений.
Работа поделена на 4 задания:
1. Вам необходимо реализовать SVD разложения используя SGD на explicit данных
2. Вам необходимо реализовать матричное разложения используя ALS на implicit данных
3. Вам необходимо реализовать матричное разложения используя BPR(pair-wise loss) на implicit данных
4. Вам необходимо реализовать матричное разложения используя WARP(list-wise loss) на implicit данных

Мягкий дедлайн 28 Сентября (пишутся замечания, выставляется оценка, есть возможность исправить до жесткого дедлайна)

Жесткий дедлайн 5 Октября (Итоговая проверка)

In [1]:
import implicit
import pandas as pd
import numpy as np
import scipy.sparse as sp

# from lightfm.datasets import fetch_movielens

В данной работе мы будем работать с explicit датасетом movieLens, в котором представленны пары user_id movie_id и rating выставленный пользователем фильму

Скачать датасет можно по ссылке https://grouplens.org/datasets/movielens/1m/

In [2]:
ratings = pd.read_csv('ml-1m/ratings.dat', delimiter='::', header=None, 
        names=['user_id', 'movie_id', 'rating', 'timestamp'], 
        usecols=['user_id', 'movie_id', 'rating'], engine='python')

In [3]:
movie_info = pd.read_csv('ml-1m/movies.dat', delimiter='::', header=None, 
        names=['movie_id', 'name', 'category'], engine='python')

Explicit данные

In [4]:
ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
0,1,1193,5
1,1,661,3
2,1,914,3
3,1,3408,4
4,1,2355,5
5,1,1197,3
6,1,1287,5
7,1,2804,5
8,1,594,4
9,1,919,4


Для того, чтобы преобразовать текущий датасет в Implicit, давайте считать что позитивная оценка это оценка >=4

In [5]:
implicit_ratings = ratings.loc[(ratings['rating'] >= 4)]

In [6]:
implicit_ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
0,1,1193,5
3,1,3408,4
4,1,2355,5
6,1,1287,5
7,1,2804,5
8,1,594,4
9,1,919,4
10,1,595,5
11,1,938,4
12,1,2398,4


Удобнее работать с sparse матричками, давайте преобразуем DataFrame в CSR матрицы

In [7]:
users = implicit_ratings["user_id"]
movies = implicit_ratings["movie_id"]
user_item = sp.coo_matrix((np.ones_like(users), (users, movies)))
user_item_t_csr = user_item.T.tocsr()
user_item_csr = user_item.tocsr()

В качестве примера воспользуемся ALS разложением из библиотеки implicit

Зададим размерность латентного пространства равным 64, это же определяет размер user/item эмбедингов

In [8]:
model = implicit.als.AlternatingLeastSquares(factors=64, iterations=100, calculate_training_loss=True)



В качестве loss здесь всеми любимый RMSE

In [9]:
model.fit(user_item_t_csr)

HBox(children=(FloatProgress(value=0.0), HTML(value='')))




Построим похожие фильмы по 1 movie_id = Истории игрушек

In [10]:
movie_info.head(5)

Unnamed: 0,movie_id,name,category
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


In [11]:
get_similars = lambda item_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                        for x in model.similar_items(item_id)]

Как мы видим, симилары действительно оказались симиларами.

Качество симиларов часто является хорошим способом проверить качество алгоритмов.

P.S. Если хочется поглубже разобраться в том как разные алгоритмы формируют разные латентные пространства, рекомендую загружать полученные вектора в tensorBoard и смотреть на сформированное пространство

In [12]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '33    Babe (1995)',
 '584    Aladdin (1992)',
 '2315    Babe: Pig in the City (1998)',
 '360    Lion King, The (1994)',
 '1838    Mulan (1998)',
 '1526    Hercules (1997)',
 '2618    Tarzan (1999)']

Давайте теперь построим рекомендации для юзеров

Как мы видим юзеру нравится фантастика, значит и в рекомендациях ожидаем увидеть фантастику

In [13]:
get_user_history = lambda user_id, implicit_ratings : [movie_info[movie_info["movie_id"] == x]["name"].to_string() 
                                            for x in implicit_ratings[implicit_ratings["user_id"] == user_id]["movie_id"]]

In [14]:
get_user_history(4, implicit_ratings)

['3399    Hustler, The (1961)',
 '2882    Fistful of Dollars, A (1964)',
 '1196    Alien (1979)',
 '1023    Die Hard (1988)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '1959    Saving Private Ryan (1998)',
 '476    Jurassic Park (1993)',
 '1180    Raiders of the Lost Ark (1981)',
 '1885    Rocky (1976)',
 '1081    E.T. the Extra-Terrestrial (1982)',
 '3349    Thelma & Louise (1991)',
 '3633    Mad Max (1979)',
 '2297    King Kong (1933)',
 '1366    Jaws (1975)',
 '1183    Good, The Bad and The Ugly, The (1966)',
 '2623    Run Lola Run (Lola rennt) (1998)',
 '2878    Goldfinger (1964)',
 '1220    Terminator, The (1984)']

Получилось! 

Мы действительно порекомендовали пользователю фантастику и боевики, более того встречаются продолжения тех фильмов, которые он высоко оценил

In [15]:
get_recommendations = lambda user_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                               for x in model.recommend(user_id, user_item_csr)]

In [16]:
get_recommendations(4, model)

['585    Terminator 2: Judgment Day (1991)',
 '1271    Indiana Jones and the Last Crusade (1989)',
 '1182    Aliens (1986)',
 '1284    Butch Cassidy and the Sundance Kid (1969)',
 '2502    Matrix, The (1999)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '3402    Close Encounters of the Third Kind (1977)',
 '847    Godfather, The (1972)',
 '1892    Rain Man (1988)',
 '1179    Princess Bride, The (1987)']

Теперь ваша очередь реализовать самые популярные алгоритмы матричных разложений

Что будет оцениваться:
1. Корректность алгоритма
2. Качество получившихся симиларов
3. Качество итоговых рекомендаций для юзера

### Задание 1. Не использую готовые решения, реализовать SVD разложение используя SGD на explicit данных

In [129]:
N_USERS = users.max() + 1
N_ITEMS = movies.max() + 1


def scalar_prods(vecs1, vecs2):
    return np.sum(vecs1 * vecs2, axis=1).flatten()


class MatrixFactorizationBase:
    def __init__(self, dim, reg_param, n_users, n_items):
        self.dim = dim
        self.n_users = n_users
        self.n_items = n_items
        init_std = 1 / dim ** .5
        self.users_embeddings = np.random.normal(0, init_std, (n_users, dim))
        self.items_embeddings = np.random.normal(0, init_std, (n_items, dim))
        self.users_biases = np.random.uniform(0, .5, n_users)
        self.items_biases = np.random.uniform(0, .5, n_items)
        self.reg_param = reg_param
    
    def fit(self, interactions, n_epochs, lr):
        pass
    
    def similarities(self, users_ids, items_ids):
        return self.users_biases[users_ids] + self.items_biases[items_ids] + \
                scalar_prods(self.users_embeddings[users_ids], self.items_embeddings[items_ids])
    
    def recommend(self, user_id, _ = None, n_recs = 20):
        similarities = self.items_embeddings @ self.users_embeddings[user_id]
        closest_item_ids = similarities.argsort()[::-1][:n_recs]
        return list(zip(closest_item_ids, similarities[closest_item_ids]))
    
    def similar_items(self, item_id, n_items = 20):
        similarities = self.items_embeddings @ self.items_embeddings[item_id]
        items_by_similariry = similarities.argsort()[::-1]
        items_by_similariry = items_by_similariry[items_by_similariry != item_id]
        most_similar_items = items_by_similariry[:n_items]
        return list(zip(most_similar_items, similarities[most_similar_items]))

In [343]:
SGD_BATCH_SIZE = 512


class GradientDescentMatrixFactorization(MatrixFactorizationBase):
    def __init__(self, dim, reg_alpha, n_users, n_items):
        super().__init__(dim, reg_alpha, n_users, n_items)
    
    
    def make_gd_step(self, users_ids, items_ids, targets, lr):
        users_gradients = np.zeros_like(self.users_embeddings)
        items_gradients = np.zeros_like(self.items_embeddings)
        users_biases_gradients = np.zeros_like(self.users_biases)
        items_biases_gradients = np.zeros_like(self.items_biases)
        
        predictions = self.similarities(users_ids, items_ids)
        errors_gradients = np.expand_dims(2 * (predictions - targets), 1)
        np.add.at(users_gradients, users_ids, errors_gradients * self.items_embeddings[items_ids])
        np.add.at(items_gradients, items_ids, errors_gradients * self.users_embeddings[users_ids])
        np.add.at(users_gradients, users_ids, 2 * self.reg_param * self.users_embeddings[users_ids])
        np.add.at(items_gradients, items_ids, 2 * self.reg_param * self.items_embeddings[items_ids])
        np.add.at(users_biases_gradients, users_ids, errors_gradients.flatten())
        np.add.at(items_biases_gradients, items_ids, errors_gradients.flatten())
        loss = np.sum((predictions - targets) ** 2) + \
               self.reg_param * (np.linalg.norm(self.users_embeddings[users_ids], axis=1).sum() + \
                                 np.linalg.norm(self.items_embeddings[items_ids], axis=1).sum())

        self.users_embeddings -= lr * users_gradients
        self.items_embeddings -= lr * items_gradients
        self.users_biases -= lr * users_biases_gradients
        self.items_biases -= lr * items_biases_gradients
        return loss
    
    
    def fit(self, interactions, n_epochs, lr):
        n_negatives = n_samples = len(interactions.data)
        users_ids = interactions.row
        items_ids = interactions.col
        
        unique_users = np.array(list(set(users_ids)))
        unique_items = np.array(list(set(items_ids)))
        
        for epoch in range(n_epochs):
            neg_users = np.random.choice(self.n_users, n_negatives)
            neg_items = np.random.choice(self.n_items, n_negatives)
            all_users = np.concatenate((users_ids, neg_users))
            all_items = np.concatenate((items_ids, neg_items))
            targets = np.concatenate((np.ones(n_samples), np.zeros(n_negatives)))
            indexes = np.arange(n_samples + n_negatives)
            np.random.shuffle(indexes)
            
            loss = 0.
            for batch_start in range(0, len(indexes), SGD_BATCH_SIZE):
                batch_indexes = indexes[batch_start:batch_start + SGD_BATCH_SIZE]
                loss += self.make_gd_step(all_users[batch_indexes], all_items[batch_indexes], 
                                          targets[batch_indexes], lr)
            
            print(f'Epoch {epoch + 1} loss {loss:.3f}')

In [344]:
gd_model = GradientDescentMatrixFactorization(64, .01, N_USERS, N_ITEMS)
gd_model.fit(user_item, 10, .1)
gd_model.fit(user_item, 10, .01)
gd_model.fit(user_item, 10, .001)
gd_model.fit(user_item, 10, .0001)

Epoch 1 loss 207739.255
Epoch 2 loss 170673.102
Epoch 3 loss 161184.963
Epoch 4 loss 158644.071
Epoch 5 loss 157408.196
Epoch 6 loss 156283.177
Epoch 7 loss 155980.688
Epoch 8 loss 155240.273
Epoch 9 loss 155232.102
Epoch 10 loss 154799.076
Epoch 1 loss 120608.357
Epoch 2 loss 111983.791
Epoch 3 loss 108809.027
Epoch 4 loss 106787.285
Epoch 5 loss 105632.541
Epoch 6 loss 104470.472
Epoch 7 loss 104048.391
Epoch 8 loss 103079.217
Epoch 9 loss 102889.211
Epoch 10 loss 102215.109
Epoch 1 loss 100178.767
Epoch 2 loss 99653.408
Epoch 3 loss 99449.168
Epoch 4 loss 99181.710
Epoch 5 loss 99206.868
Epoch 6 loss 99160.086
Epoch 7 loss 98985.337
Epoch 8 loss 99090.865
Epoch 9 loss 98862.203
Epoch 10 loss 98819.208
Epoch 1 loss 98651.113
Epoch 2 loss 98808.555
Epoch 3 loss 98327.102
Epoch 4 loss 98488.179
Epoch 5 loss 99002.406
Epoch 6 loss 98661.000
Epoch 7 loss 98395.529
Epoch 8 loss 98458.462
Epoch 9 loss 98737.554
Epoch 10 loss 98672.993


In [352]:
get_similars(858, gd_model)  # The Godfather

['1203    Godfather: Part II, The (1974)',
 '3266    Jail Bait (1954)',
 '761    Vie est belle, La (Life is Rosey) (1987)',
 '104    Nobody Loves Me (Keiner liebt mich) (1994)',
 '3526    Held Up (2000)',
 "3748    Pot O' Gold (1941)",
 '739    Man from Down Under, The (1943)',
 "3713    Shaft's Big Score! (1972)",
 '3583    City of the Living Dead (Paura nella cittа dei...',
 '2369    Outside Ozona (1998)',
 '3346    Mirror, The (Zerkalo) (1975)',
 '32    Wings of Courage (1995)',
 '3015    Home Page (1999)',
 '720    Institute Benjamenta, or This Dream People Cal...',
 '587    Tough and Deadly (1995)',
 '784    Midnight Dancers (Sibak) (1994)',
 '3391    Hillbillys in a Haunted House (1967)',
 '281    New York Cop (1996)',
 '109    Taxi Driver (1976)',
 '2818    Simon Sez (1999)']

### Задание 2. Не использую готовые решения, реализовать матричное разложение используя ALS на implicit данных

In [356]:
from collections import defaultdict

from scipy.sparse.linalg import lsqr


def solve_parameters(target_embeddings, target_biases, interactions_lists, 
                     fixed_embeddings, fixed_biases, dim, reg_alpha):
    loss = 0.
    for x, (fixed_indexes, targets) in interactions_lists.items():
        a = np.hstack((
            np.ones((len(fixed_indexes), 1)),
            fixed_embeddings[fixed_indexes]
        ))
        b = targets - fixed_biases[fixed_indexes]
        
        a = np.vstack((a, np.zeros((dim, dim + 1))))
        a[np.arange(dim) + len(fixed_indexes), np.arange(dim) + 1] = reg_alpha
        b = np.concatenate((b, np.zeros(dim)))
        
        solution, *_ = np.linalg.lstsq(a, b, None)
        target_biases[x] = solution[0]
        target_embeddings[x] = solution[1:]
        loss += np.sum((a @ solution - b) ** 2)
    return loss


class ALSMatrixFactorization(MatrixFactorizationBase):
    def __init__(self, dim, reg_alpha, n_users, n_items):
        super().__init__(dim, reg_alpha, n_users, n_items)
    
    def fit(self, interactions, n_iterations):
        users_ids = interactions.row
        items_ids = interactions.col
        n_negatives = n_positives = interactions.nnz
        
        negative_users_ids = np.random.choice(np.unique(users_ids), n_negatives)
        negative_items_ids = np.random.choice(np.unique(items_ids), n_negatives)
        
        users_int_lists = defaultdict(lambda: ([], []))
        items_int_lists = defaultdict(lambda: ([], []))
        for user_id, item_id, target in zip(np.concatenate((users_ids, negative_users_ids)), 
                                            np.concatenate((items_ids, negative_items_ids)),
                                            np.concatenate((np.ones(n_positives), np.zeros(n_negatives)))):
            user_items_ids, user_targets = users_int_lists[user_id]
            user_items_ids.append(item_id)
            user_targets.append(target)
            item_users_ids, item_targets = items_int_lists[item_id]
            item_users_ids.append(user_id)
            item_targets.append(target)
        users_int_lists = {user_id: (np.array(user_items_ids), np.array(user_targets))
                           for user_id, (user_items_ids, user_targets) in users_int_lists.items()}
        items_int_lists = {item_id: (np.array(item_users_ids), np.array(item_targets))
                           for item_id, (item_users_ids, item_targets) in items_int_lists.items()}
        
        for iteration in range(n_iterations):
            users_loss = solve_parameters(self.users_embeddings, self.users_biases, users_int_lists, 
                                          self.items_embeddings, self.items_biases, self.dim, self.reg_param)
            items_loss = solve_parameters(self.items_embeddings, self.items_biases, items_int_lists, 
                                          self.users_embeddings, self.users_biases, self.dim, self.reg_param)
            print(f'Iteration {iteration + 1} loss {users_loss + items_loss:.3f}')

In [357]:
als_model = ALSMatrixFactorization(64, 1, N_USERS, N_ITEMS)
als_model.fit(user_item, 10)

Iteration 1 loss 290660.530
Iteration 2 loss 141956.850
Iteration 3 loss 117941.981
Iteration 4 loss 108628.399
Iteration 5 loss 103560.520
Iteration 6 loss 100334.521
Iteration 7 loss 98084.057
Iteration 8 loss 96414.384
Iteration 9 loss 95118.509
Iteration 10 loss 94078.219


In [360]:
get_similars(1240, als_model)  # Terminator

['1023    Die Hard (1988)',
 '3633    Mad Max (1979)',
 '1196    Alien (1979)',
 '3458    Predator (1987)',
 '1182    Aliens (1986)',
 '2219    Thing, The (1982)',
 '3402    Close Encounters of the Third Kind (1977)',
 '1222    Glory (1989)',
 '1113    Escape from New York (1981)',
 '1885    Rocky (1976)',
 '1271    Indiana Jones and the Last Crusade (1989)',
 '2916    Robocop (1987)',
 '2458    Westworld (1973)',
 '1491    Fifth Element, The (1997)',
 '1355    Star Trek IV: The Voyage Home (1986)',
 '1353    Star Trek: The Wrath of Khan (1982)',
 '1952    Dune (1984)',
 '1188    Clockwork Orange, A (1971)',
 '2571    Superman (1978)',
 '2125    Untouchables, The (1987)']

### Задание 3. Не использую готовые решения, реализовать матричное разложение BPR на implicit данных

In [361]:
BPR_BATCH_SIZE = 16


BPR_MARGIN = 10


class BPRMF(MatrixFactorizationBase):
    def __init__(self, dim, reg_alpha, n_users, n_items):
        super().__init__(dim, reg_alpha, n_users, n_items)
        
    def fit(self, interactions, n_epochs, lr):
        users = interactions.row
        positives = interactions.col
            
        for epoch in range(1, n_epochs + 1):
            negatives = np.random.choice(np.unique(positives), len(interactions.data))
            
            loss = 0.
            indexes = np.arange(interactions.nnz)
            for batch_start in range(0, interactions.nnz, BPR_BATCH_SIZE):
                batch_indexes = indexes[batch_start:batch_start + BPR_BATCH_SIZE]
                batch_users = users[batch_indexes]
                batch_positives = positives[batch_indexes]
                batch_negatives = negatives[batch_indexes]
                
                items_embeddings_diff = self.items_embeddings[batch_positives] - self.items_embeddings[batch_negatives]
                x_uij = scalar_prods(self.users_embeddings[batch_users], items_embeddings_diff) + \
                    self.items_biases[batch_positives] - self.items_biases[batch_negatives]
                x_uij = np.maximum(x_uij, -100)
                mask = x_uij < BPR_MARGIN
                x_uij_negxp = np.exp(-np.minimum(x_uij, BPR_MARGIN))
                loss += np.log((1 + x_uij_negxp)).sum()
                loss += self.reg_param * (
                    np.linalg.norm(self.users_embeddings[batch_users], axis=1).sum() + 
                    np.linalg.norm(self.items_embeddings[batch_positives], axis=1).sum() + 
                    np.linalg.norm(self.items_embeddings[batch_negatives], axis=1).sum())
                loss_grads = -x_uij_negxp[mask] / (1 + x_uij_negxp[mask])
                positive_biases_grads = loss_grads
                negative_biases_grads = -loss_grads
                loss_grads = loss_grads.reshape((-1, 1))
                user_grads = loss_grads * items_embeddings_diff[mask] + \
                        self.reg_param * self.users_embeddings[batch_users][mask]
                positive_grads = loss_grads * self.users_embeddings[batch_users][mask] + \
                        self.reg_param * self.items_embeddings[batch_positives][mask]
                negative_grads = -loss_grads * self.users_embeddings[batch_users][mask] + \
                        self.reg_param * self.items_embeddings[batch_negatives][mask]
                
                np.add.at(self.users_embeddings, batch_users[mask], -lr * user_grads)
                np.add.at(self.items_embeddings, batch_positives[mask], -lr * positive_grads)
                np.add.at(self.items_embeddings, batch_negatives[mask], -lr * negative_grads)
                np.add.at(self.items_biases, batch_positives[mask], -lr * positive_biases_grads)
                np.add.at(self.items_biases, batch_negatives[mask], -lr * negative_biases_grads)
            print(f'Epoch {epoch} loss {loss:.3f}')

In [370]:
bpr_model = BPRMF(256, .01, N_USERS, N_ITEMS)
bpr_model.fit(user_item, 10, .01)
bpr_model.fit(user_item, 10, .001)

Epoch 1 loss 299042.323
Epoch 2 loss 240029.660
Epoch 3 loss 225742.266
Epoch 4 loss 219017.032
Epoch 5 loss 214659.635
Epoch 6 loss 211594.017
Epoch 7 loss 209342.547
Epoch 8 loss 206350.896
Epoch 9 loss 205297.520
Epoch 10 loss 202867.692
Epoch 1 loss 201016.670
Epoch 2 loss 200779.868
Epoch 3 loss 200177.160
Epoch 4 loss 199488.917
Epoch 5 loss 199659.475
Epoch 6 loss 199520.982
Epoch 7 loss 199657.655
Epoch 8 loss 199400.754
Epoch 9 loss 198900.161
Epoch 10 loss 198868.315


In [371]:
get_similars(1721, bpr_model)  # Titanic

['1545    G.I. Jane (1997)',
 "61    Mr. Holland's Opus (1995)",
 '3188    Bodyguard, The (1992)',
 '1732    U.S. Marshalls (1998)',
 '2655    Runaway Bride (1999)',
 "1342    Preacher's Wife, The (1996)",
 '1450    Saint, The (1997)',
 '289    Outbreak (1995)',
 '1358    Young Guns II (1990)',
 '1825    Six Days Seven Nights (1998)',
 '1848    Armageddon (1998)',
 '166    First Knight (1995)',
 '3181    Alive (1993)',
 '601    One Fine Day (1996)',
 '2427    Blast from the Past (1999)',
 '1823    Perfect Murder, A (1998)',
 '1014    Robin Hood: Prince of Thieves (1991)',
 '593    Pretty Woman (1990)',
 '795    Time to Kill, A (1996)',
 '778    Nutty Professor, The (1996)']

### Задание 4. Не использую готовые решения, реализовать матричное разложение WARP на implicit данных

In [380]:
WARP_BATCH_SIZE = 4
WARP_MAX_SAMPLE_TRIALS = 1000
WARP_MARGIN = 1


def project_vectors(vectors, indexes, max_norm):
    vector_norms = np.linalg.norm(vectors[indexes], axis=1)
    vectors[indexes] *= np.maximum(max_norm / vector_norms, 1).reshape((-1, 1))


class WARPMF(MatrixFactorizationBase):
    def __init__(self, dim, reg_param, n_users, n_items):
        super().__init__(dim, reg_param, n_users, n_items)
        
    def fit(self, interactions, n_epochs, lr):
        users = interactions.row
        positives = interactions.col
        unique_items = np.unique(positives)
            
        for epoch in range(1, n_epochs + 1):
            loss = 0.
            indexes = np.arange(interactions.nnz)
            for batch_start in range(0, interactions.nnz, WARP_BATCH_SIZE):
                batch_indexes = indexes[batch_start:batch_start + WARP_BATCH_SIZE]
                batch_users = users[batch_indexes]
                batch_positives = positives[batch_indexes]
                positives_similarities = self.similarities(batch_users, batch_positives)
                
                batch_negatives = np.random.choice(unique_items, len(batch_users))
                negatives_similarities = self.similarities(batch_users, batch_negatives)
                good_mask = positives_similarities - negatives_similarities > WARP_MARGIN
                sampling_counters = np.ones(len(batch_users))
                for _ in range(WARP_MAX_SAMPLE_TRIALS):
                    n_good = good_mask.sum()
                    if n_good == 0:
                        break
                    batch_negatives[good_mask] = np.random.choice(unique_items, n_good)
                    sampling_counters[good_mask] += 1
                    negatives_similarities[good_mask] = self.similarities(
                        batch_users[good_mask], batch_negatives[good_mask])
                    good_mask = positives_similarities - negatives_similarities > WARP_MARGIN
                to_opt_mask = ~good_mask
                n_to_opt = to_opt_mask.sum()
                
                batch_users = batch_users[to_opt_mask]
                batch_positives = batch_positives[to_opt_mask]
                batch_negatives = batch_negatives[to_opt_mask]
                positives_similarities = positives_similarities[to_opt_mask]
                negatives_similarities = negatives_similarities[to_opt_mask]
                samples_weights = np.log((WARP_MAX_SAMPLE_TRIALS - 1) / sampling_counters[to_opt_mask])
                
                
                loss += np.sum((WARP_MARGIN + negatives_similarities - positives_similarities) * samples_weights)
                positive_biases_grads = -samples_weights
                negative_biases_grads = samples_weights
                samples_weights = np.expand_dims(samples_weights, 1)
                user_grads = samples_weights * \
                        (self.items_embeddings[batch_negatives] - self.items_embeddings[batch_positives])
                positive_grads = samples_weights * (-self.users_embeddings[batch_users])
                negative_grads = samples_weights * self.users_embeddings[batch_users]
                
                np.add.at(self.users_embeddings, batch_users, -lr * user_grads)
                np.add.at(self.items_embeddings, batch_positives, -lr * positive_grads)
                np.add.at(self.items_embeddings, batch_negatives, -lr * negative_grads)
                project_vectors(self.users_embeddings, batch_users, self.reg_param)
                project_vectors(self.items_embeddings, batch_positives, self.reg_param)
                project_vectors(self.items_embeddings, batch_negatives, self.reg_param)
                np.add.at(self.items_biases, batch_positives, -lr * positive_biases_grads)
                np.add.at(self.items_biases, batch_negatives, -lr * negative_biases_grads)
            print(f'Epoch {epoch} loss {loss:.3f}')

In [381]:
warp_model = WARPMF(64, 1, N_USERS, N_ITEMS)
warp_model.fit(user_item, 5, .0001)

Epoch 1 loss 3358708.315
Epoch 2 loss 2668730.417
Epoch 3 loss 2484464.265
Epoch 4 loss 2475452.125
Epoch 5 loss 2507519.921


In [382]:
get_similars(260, warp_model)  # Star wars a new hope

['3520    Kill, Baby... Kill! (Operazione Paura) (1966)',
 '704    Of Love and Shadows (1994)',
 '3668    Lonely Are the Brave (1962)',
 '743    Month by the Lake, A (1995)',
 '1277    Real Genius (1985)',
 '556    Beans of Egypt, Maine, The (1994)',
 '2700    Yards, The (1999)',
 '1548    Cop Land (1997)',
 '2273    Hard Core Logo (1996)',
 '1646    Amistad (1997)',
 "3445    Joe Gould's Secret (2000)",
 '2325    Prince of Egypt, The (1998)',
 '3257    What Planet Are You From? (2000)',
 '1658    Harlem River Drive (1996)',
 '1414    Meet Wally Sparks (1997)',
 '3032    Fatal Attraction (1987)',
 '2190    Blame It on Rio (1984)',
 'Series([], )',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '730    Honigmond (1996)']