### Матричные факторизации

В данной работе вам предстоит познакомиться с практической стороной матричных разложений.
Работа поделена на 4 задания:
1. Вам необходимо реализовать SVD разложения используя SGD на explicit данных
2. Вам необходимо реализовать матричное разложения используя ALS на implicit данных
3. Вам необходимо реализовать матричное разложения используя BPR(pair-wise loss) на implicit данных
4. Вам необходимо реализовать матричное разложения используя WARP(list-wise loss) на implicit данных

Мягкий дедлайн 28 Сентября (пишутся замечания, выставляется оценка, есть возможность исправить до жесткого дедлайна)

Жесткий дедлайн 5 Октября (Итоговая проверка)

In [1]:
!pip install implicit lightfm

Collecting implicit
[?25l  Downloading https://files.pythonhosted.org/packages/bc/07/c0121884722d16e2c5beeb815f6b84b41cbf22e738e4075f1475be2791bc/implicit-0.4.4.tar.gz (1.1MB)
[K     |████████████████████████████████| 1.1MB 2.7MB/s 
[?25hCollecting lightfm
[?25l  Downloading https://files.pythonhosted.org/packages/e9/8e/5485ac5a8616abe1c673d1e033e2f232b4319ab95424b42499fabff2257f/lightfm-1.15.tar.gz (302kB)
[K     |████████████████████████████████| 307kB 13.3MB/s 
Building wheels for collected packages: implicit, lightfm
  Building wheel for implicit (setup.py) ... [?25l[?25hdone
  Created wheel for implicit: filename=implicit-0.4.4-cp36-cp36m-linux_x86_64.whl size=3419389 sha256=dcfcf0637feb8a6c2834f4861ea0434f96f21bf0e0dfa462b7e4e24235a25a82
  Stored in directory: /root/.cache/pip/wheels/bf/d4/ec/fd4f622fcbefb7521f149905295b2c26adecb23af38aa28217
  Building wheel for lightfm (setup.py) ... [?25l[?25hdone
  Created wheel for lightfm: filename=lightfm-1.15-cp36-cp36m-linux_x86

In [2]:
import implicit
import pandas as pd
import numpy as np
import scipy.sparse as sp
from tqdm.autonotebook import trange, tqdm

from lightfm.datasets import fetch_movielens

В данной работе мы будем работать с explicit датасетом movieLens, в котором представленны пары user_id movie_id и rating выставленный пользователем фильму

Скачать датасет можно по ссылке https://grouplens.org/datasets/movielens/1m/

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
datapath = "/content/drive/My Drive/AU/RecSys/ml-1m/"

In [5]:
ratings = pd.read_csv(datapath + 'ratings.dat', delimiter='::', header=None, 
        names=['user_id', 'movie_id', 'rating', 'timestamp'], 
        usecols=['user_id', 'movie_id', 'rating'], engine='python')

In [6]:
movie_info = pd.read_csv(datapath + 'movies.dat', delimiter='::', header=None, 
        names=['movie_id', 'name', 'category'], engine='python')

# Для удобства изменяю индексы!

In [7]:
ratings['user_id'] -= 1
ratings['movie_id'] -= 1
movie_info['movie_id'] -= 1
movie_info.set_index('movie_id')

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
0,Toy Story (1995),Animation|Children's|Comedy
1,Jumanji (1995),Adventure|Children's|Fantasy
2,Grumpier Old Men (1995),Comedy|Romance
3,Waiting to Exhale (1995),Comedy|Drama
4,Father of the Bride Part II (1995),Comedy
...,...,...
3947,Meet the Parents (2000),Comedy
3948,Requiem for a Dream (2000),Drama
3949,Tigerland (2000),Drama
3950,Two Family House (2000),Drama


Explicit данные

In [8]:
ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
0,0,1192,5
1,0,660,3
2,0,913,3
3,0,3407,4
4,0,2354,5
5,0,1196,3
6,0,1286,5
7,0,2803,5
8,0,593,4
9,0,918,4


Для того, чтобы преобразовать текущий датасет в Implicit, давайте считать что позитивная оценка это оценка >=4

In [9]:
implicit_ratings = ratings.loc[(ratings['rating'] >= 4)]

In [10]:
implicit_ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
0,0,1192,5
3,0,3407,4
4,0,2354,5
6,0,1286,5
7,0,2803,5
8,0,593,4
9,0,918,4
10,0,594,5
11,0,937,4
12,0,2397,4


Удобнее работать с sparse матричками, давайте преобразуем DataFrame в CSR матрицы

In [11]:
users = implicit_ratings["user_id"]
movies = implicit_ratings["movie_id"]
user_item = sp.coo_matrix((np.ones_like(users), (users, movies)))
user_item_t_csr = user_item.T.tocsr()
user_item_csr = user_item.tocsr()

В качестве примера воспользуемся ALS разложением из библиотеки implicit

Зададим размерность латентного пространства равным 64, это же определяет размер user/item эмбедингов

In [None]:
model = implicit.als.AlternatingLeastSquares(factors=64, iterations=100, calculate_training_loss=True)

В качестве loss здесь всеми любимый RMSE

In [None]:
model.fit(user_item_t_csr)

HBox(children=(FloatProgress(value=0.0), HTML(value='')))




Построим похожие фильмы по 1 movie_id = Истории игрушек

In [None]:
movie_info.head(5)

Unnamed: 0,movie_id,name,category
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


In [None]:
get_similars = lambda item_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                        for x in model.similar_items(item_id)]

Как мы видим, симилары действительно оказались симиларами.

Качество симиларов часто является хорошим способом проверить качество алгоритмов.

P.S. Если хочется поглубже разобраться в том как разные алгоритмы формируют разные латентные пространства, рекомендую загружать полученные вектора в tensorBoard и смотреть на сформированное пространство

In [None]:
get_similars(1, model)

['0    Toy Story (1995)',
 '369    Red Rock West (1992)',
 '2284    Enemy of the State (1998)',
 'Series([], )',
 '1160    Double Life of Veronique, The (La Double Vie d...',
 '2275    Runaway Train (1985)',
 '1943    Back to the Future Part III (1990)',
 '1299    Kids of Survival (1993)',
 '2429    My Favorite Martian (1999)',
 '627    Land and Freedom (Tierra y libertad) (1995)']

Давайте теперь построим рекомендации для юзеров

Как мы видим юзеру нравится фантастика, значит и в рекомендациях ожидаем увидеть фантастику

In [None]:
get_user_history = lambda user_id, implicit_ratings : [movie_info[movie_info["movie_id"] == x]["name"].to_string() 
                                            for x in implicit_ratings[implicit_ratings["user_id"] == user_id]["movie_id"]]

In [None]:
get_user_history(4, implicit_ratings)

['3399    Hustler, The (1961)',
 '2882    Fistful of Dollars, A (1964)',
 '1196    Alien (1979)',
 '1023    Die Hard (1988)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '1959    Saving Private Ryan (1998)',
 '476    Jurassic Park (1993)',
 '1180    Raiders of the Lost Ark (1981)',
 '1885    Rocky (1976)',
 '1081    E.T. the Extra-Terrestrial (1982)',
 '3349    Thelma & Louise (1991)',
 '3633    Mad Max (1979)',
 '2297    King Kong (1933)',
 '1366    Jaws (1975)',
 '1183    Good, The Bad and The Ugly, The (1966)',
 '2623    Run Lola Run (Lola rennt) (1998)',
 '2878    Goldfinger (1964)',
 '1220    Terminator, The (1984)']

Получилось! 

Мы действительно порекомендовали пользователю фантастику и боевики, более того встречаются продолжения тех фильмов, которые он высоко оценил

In [None]:
get_recommendations = lambda user_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                               for x in model.recommend(user_id, user_item_csr)]

In [None]:
get_recommendations(4, model)

['740    Dr. Strangelove or: How I Learned to Stop Worr...',
 '3859    Bank Dick, The (1940)',
 '1345    Crucible, The (1996)',
 '1129    Snowriders (1996)',
 '1190    Apocalypse Now (1979)',
 '1299    Kids of Survival (1993)',
 '2692    Iron Giant, The (1999)',
 '3290    Breaking Away (1979)',
 '2061    Atlantic City (1980)',
 '3776    Easy Money (1983)']

Теперь ваша очередь реализовать самые популярные алгоритмы матричных разложений

Что будет оцениваться:
1. Корректность алгоритма
2. Качество получившихся симиларов
3. Качество итоговых рекомендаций для юзера

### Задание 1. Не использую готовые решения, реализовать SVD разложение используя SGD на explicit данных

In [12]:
ex_user_item = sp.coo_matrix((ratings.rating, (ratings.user_id, ratings.movie_id)))
ex_user_item_t_csr = user_item.T.tocsr()
ex_user_item_csr = user_item.tocsr()

In [13]:
class SVD_SGD:
    def __init__(self, dim=64, iters=1e7, eps=1e-2, lmbda=1e-2, theta=1e-2):
        self.dim = dim
        self.iters = int(iters)
        self.eps = eps
        self.lmbda = lmbda
        self.theta = theta
        self.U = None
        self.V = None
        self.mu = None
        self.bu = None
        self.bv = None
    
    def fit(self, user_item):
        n_users, n_items = user_item.shape
        self.U = np.random.uniform(0, 1/np.sqrt(self.dim), (n_users, self.dim))
        self.V = np.random.uniform(0, 1/np.sqrt(self.dim), (n_items, self.dim))
        self.mu = user_item.mean()
        self.bu = np.array(user_item.mean(axis=1)).reshape(-1)
        self.bv = np.array(user_item.mean(axis=0)).reshape(-1)
        t = trange(self.iters)
        self.rmse(user_item, t)
        i_nonzero, j_nonzero = user_item.nonzero()
        for iter in t:
            x = np.random.randint(len(i_nonzero))
            i, j = i_nonzero[x], j_nonzero[x]
            error = self.score(i, j) - user_item[i, j]
            self.U[i] -= self.eps * (error * self.V[j] + self.lmbda * self.U[i])
            self.V[j] -= self.eps * (error * self.U[i] + self.lmbda * self.V[j])
            self.mu -= self.eps * error
            self.bu[i] -= self.eps * (error + self.theta * self.bu[i])
            self.bv[j] -= self.eps * (error + self.theta * self.bv[j])

            if (iter + 1) % 10000 == 0:
                self.rmse(user_item, t)

    def rmse(self, user_item, t, rmse_size=1000):
        loss = []
        i_nonzero, j_nonzero = user_item.nonzero()
        idxs = np.random.randint(len(i_nonzero), size=rmse_size)
        for i, j in map(lambda x: (i_nonzero[x], j_nonzero[x]), idxs):
            error = self.score(i, j) - user_item[i, j]
            loss.append(error ** 2)
        t.set_postfix({'rmse': np.sqrt(np.mean(loss))})
            
    def recommend(self, user_id, user_item, top_n=10):
        recommended_items = set(user_item[user_id].nonzero()[1])
        return sorted(list(set(range(self.V.shape[0])) - recommended_items), key=lambda j: -self.score(user_id, j))[:top_n]

    def similar_items(self, item_id, movie_info=movie_info, top_n=10):
        return np.argsort(np.linalg.norm(self.V - self.V[item_id], axis=1))[:top_n]
    
    def user_history(self, user_id, user_item):
        return [i for i in user_item[user_id].nonzero()[1]]

    def score(self, i, j):
        return self.U[i] @ self.V[j] + self.bu[i] + self.bv[j] + self.mu

In [14]:
def get_movies(idxs):
    return movie_info.set_index('movie_id').loc[[i for i in idxs if i in set(movie_info.movie_id)]]

In [None]:
SVD_model = SVD_SGD(iters=1e6)
SVD_model.fit(ex_user_item_csr)

HBox(children=(FloatProgress(value=0.0, max=1000000.0), HTML(value='')))




In [None]:
get_movies(SVD_model.similar_items(0))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
0,Toy Story (1995),Animation|Children's|Comedy
1209,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War
3792,X-Men (2000),Action|Sci-Fi
1579,Men in Black (1997),Action|Adventure|Comedy|Sci-Fi
607,Fargo (1996),Crime|Drama|Thriller
1035,Die Hard (1988),Action|Thriller
1386,Jaws (1975),Action|Horror
1251,Chinatown (1974),Film-Noir|Mystery|Thriller
2320,Pleasantville (1998),Comedy
2027,Saving Private Ryan (1998),Action|Drama|War


In [None]:
get_movies(SVD_model.user_history(3, user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
259,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi
479,Jurassic Park (1993),Action|Adventure|Sci-Fi
1035,Die Hard (1988),Action|Thriller
1096,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi
1197,Raiders of the Lost Ark (1981),Action|Adventure
1200,"Good, The Bad and The Ugly, The (1966)",Action|Western
1213,Alien (1979),Action|Horror|Sci-Fi|Thriller
1239,"Terminator, The (1984)",Action|Sci-Fi|Thriller
1386,Jaws (1975),Action|Horror
1953,Rocky (1976),Action|Drama


In [None]:
get_movies(SVD_model.recommend(3, ex_user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
2558,"King and I, The (1999)",Animation|Children's
2379,Police Academy 3: Back in Training (1986),Comedy
3760,"Blood In, Blood Out (a.k.a. Bound by Honor) (1...",Crime|Drama
452,For Love or Money (1993),Comedy
3533,28 Days (2000),Comedy
985,Fly Away Home (1996),Adventure|Children's
3083,Home Page (1999),Documentary
2973,Bats (1999),Horror|Thriller
3373,Daughters of the Dust (1992),Drama
2971,Red Sorghum (Hong Gao Liang) (1987),Drama|War


### Задание 2. Не использую готовые решения, реализовать матричное разложение используя ALS на implicit данных

In [None]:
class ALS(SVD_SGD):
    def __init__(self, iters=100, eps=1e-3, lmbda=1e-3, **kargs):
        super().__init__(iters=iters, eps=eps, lmbda=lmbda, **kargs)

    def fit(self, user_item):
        user_item_t = user_item.transpose()
        n_users, n_items = user_item.shape
        self.U = np.random.uniform(0, 1/np.sqrt(self.dim), (n_users, self.dim))
        self.V = np.random.uniform(0, 1/np.sqrt(self.dim), (n_items, self.dim))
        t = trange(self.iters)
        self.rmse(user_item, t)
        for iter in t:
            error = self.U @ self.V.T
            error[user_item.nonzero()] -= 1
            self.U -= self.eps * (error @ self.V + self.lmbda * self.U)
            self.V -= self.eps * (error.T @ self.U + self.lmbda * self.V)
            self.rmse(user_item, t)
    
    def score(self, i, j):
        return self.U[i] @ self.V[j]

In [None]:
ALS_model = ALS(dim=64, iters=300, eps=1e-3, lmbda=1e-3)
ALS_model.fit(user_item_csr)

HBox(children=(FloatProgress(value=0.0, max=300.0), HTML(value='')))




In [None]:
get_movies(ALS_model.similar_items(0))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
0,Toy Story (1995),Animation|Children's|Comedy
3113,Toy Story 2 (1999),Animation|Children's|Comedy
587,Aladdin (1992),Animation|Children's|Comedy|Musical
2320,Pleasantville (1998),Comedy
2760,"Iron Giant, The (1999)",Animation|Children's
2293,Antz (1998),Animation|Children's
2354,"Bug's Life, A (1998)",Animation|Children's|Comedy
594,Beauty and the Beast (1991),Animation|Children's|Musical
1906,Mulan (1998),Animation|Children's
363,"Lion King, The (1994)",Animation|Children's|Musical


In [None]:
get_movies(ALS_model.user_history(3, user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
259,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi
479,Jurassic Park (1993),Action|Adventure|Sci-Fi
1035,Die Hard (1988),Action|Thriller
1096,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi
1197,Raiders of the Lost Ark (1981),Action|Adventure
1200,"Good, The Bad and The Ugly, The (1966)",Action|Western
1213,Alien (1979),Action|Horror|Sci-Fi|Thriller
1239,"Terminator, The (1984)",Action|Sci-Fi|Thriller
1386,Jaws (1975),Action|Horror
1953,Rocky (1976),Action|Drama


In [None]:
get_movies(ALS_model.recommend(3, user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1290,Indiana Jones and the Last Crusade (1989),Action|Adventure
588,Terminator 2: Judgment Day (1991),Action|Sci-Fi|Thriller
456,"Fugitive, The (1993)",Action|Thriller
1303,Butch Cassidy and the Sundance Kid (1969),Action|Comedy|Western
1195,Star Wars: Episode V - The Empire Strikes Back...,Action|Adventure|Drama|Sci-Fi|War
1199,Aliens (1986),Action|Sci-Fi|Thriller|War
2570,"Matrix, The (1999)",Action|Sci-Fi|Thriller
3526,Predator (1987),Action|Sci-Fi|Thriller
1221,Full Metal Jacket (1987),Action|Drama|War
1952,"French Connection, The (1971)",Action|Crime|Drama|Thriller


### Задание 3. Не использую готовые решения, реализовать матричное разложение BPR на implicit данных

In [38]:
class BPR(SVD_SGD):
    def __init__(self, iters=1, eps=1e-3, lmbda=1e-4, **kargs):
        self.n_users = None
        self.n_items = None
        self.users = None
        self.pos_neg_items = None
        super().__init__(iters=iters, eps=eps, lmbda=lmbda, **kargs)

    def fit(self, user_item):
        self.n_users, self.n_items = user_item.shape
        self.U = np.random.uniform(0, 1/np.sqrt(self.dim), (self.n_users, self.dim))
        self.V = np.random.uniform(0, 1/np.sqrt(self.dim), (self.n_items, self.dim))
        t = trange(self.iters)
        self.users = np.unique(user_item.nonzero()[0])
        self.pos_neg_items = {}

        for u in self.users:
            pos_items = user_item[u].nonzero()[1]
            neg_items = list(set(range(self.n_items)) - set(pos_items))
            self.pos_neg_items[u] = (pos_items, neg_items)

        
        self.rmse(user_item, t)
        for iter in t:
            ti = tqdm(self.users)
            ti.set_description(f'Iteration {iter}')
            for iter2, u in enumerate(ti):
                pos_items, neg_items = self.pos_neg_items[u]
                for i in pos_items:
                    for j in np.random.choice(neg_items, size=5, replace=False):
                        U_u = self.U[u]
                        ex = 1./(1. + np.exp(self.score(u, i) - self.score(u, j)))
                        self.U[u] += self.eps * (ex * (self.V[i] - self.V[j]) - self.lmbda * U_u)
                        self.V[i] += self.eps * (ex * U_u - self.lmbda * self.V[i])
                        self.V[j] += self.eps * (ex * (- U_u) - self.lmbda * self.V[j])
                if not iter2 % 1000: self.rmse(user_item, ti)
            self.rmse(user_item, t)

    def rmse(self, user_item, t, size=20):
        idxs = user_item.nonzero()
        t.set_postfix({'rmse': np.linalg.norm((self.U @ self.V.T)[idxs] - user_item[idxs])/np.sqrt(len(idxs[0])),
             'loss': self.loss(user_item, size)})

    def loss(self, user_item, size):
        res = 0
        total = 0
        for u in np.random.choice(self.users, size=size, replace=False):
            pos_items, neg_items = self.pos_neg_items[u]
            for i in pos_items:
                for j in np.random.choice(neg_items, size=5, replace=False):
                    res += np.log(1. + np.exp(self.score(u, j) - self.score(u, i)))
                    total += 1
        return res / total

    
    def score(self, i, j):
        return self.U[i] @ self.V[j]

In [39]:
BPR_model = BPR(dim=64, iters=5, eps=1e-2, lmbda=1e-4)
BPR_model.fit(user_item_csr)

HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value='')))

HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))





In [40]:
get_movies(BPR_model.similar_items(0))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
0,Toy Story (1995),Animation|Children's|Comedy
1269,Back to the Future (1985),Comedy|Sci-Fi
1196,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance
3113,Toy Story 2 (1999),Animation|Children's|Comedy
2354,"Bug's Life, A (1998)",Animation|Children's|Comedy
1096,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi
2796,Big (1988),Comedy|Fantasy
2917,Ferris Bueller's Day Off (1986),Comedy
2986,Who Framed Roger Rabbit? (1988),Adventure|Animation|Film-Noir
2715,Ghostbusters (1984),Comedy|Horror


In [41]:
get_movies(BPR_model.user_history(3, user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
259,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi
479,Jurassic Park (1993),Action|Adventure|Sci-Fi
1035,Die Hard (1988),Action|Thriller
1096,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi
1197,Raiders of the Lost Ark (1981),Action|Adventure
1200,"Good, The Bad and The Ugly, The (1966)",Action|Western
1213,Alien (1979),Action|Horror|Sci-Fi|Thriller
1239,"Terminator, The (1984)",Action|Sci-Fi|Thriller
1386,Jaws (1975),Action|Horror
1953,Rocky (1976),Action|Drama


In [42]:
get_movies(BPR_model.recommend(3, user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1195,Star Wars: Episode V - The Empire Strikes Back...,Action|Adventure|Drama|Sci-Fi|War
592,"Silence of the Lambs, The (1991)",Drama|Thriller
2857,American Beauty (1999),Comedy|Drama
2761,"Sixth Sense, The (1999)",Thriller
857,"Godfather, The (1972)",Action|Crime|Drama
1209,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War
588,Terminator 2: Judgment Day (1991),Action|Sci-Fi|Thriller
2570,"Matrix, The (1999)",Action|Sci-Fi|Thriller
607,Fargo (1996),Crime|Drama|Thriller
1196,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance


### Задание 4. Не использую готовые решения, реализовать матричное разложение WARP на implicit данных

In [50]:
class WARP(BPR):
    def fit(self, user_item):
        self.n_users, self.n_items = user_item.shape
        self.U = np.random.uniform(0, 1/np.sqrt(self.dim), (self.n_users, self.dim))
        self.V = np.random.uniform(0, 1/np.sqrt(self.dim), (self.n_items, self.dim))
        t = trange(self.iters)
        self.users = np.unique(user_item.nonzero()[0])
        self.pos_neg_items = {}

        for u in self.users:
            pos_items = user_item[u].nonzero()[1]
            neg_items = list(set(range(self.n_items)) - set(pos_items))
            self.pos_neg_items[u] = (pos_items, neg_items)

        self.rmse(user_item, t)
        for iter in t:
            ti = tqdm(self.users)
            ti.set_description(f'Epoch {iter}')
            for iter2, u in enumerate(ti):
                pos_items, neg_items = self.pos_neg_items[u]
                for i in pos_items:
                    rank = 0
                    for j in np.random.permutation(neg_items):
                        rank += 1
                        if self.score(u, i) < self.score(u, j) + 1:
                            U_u = self.U[u]
                            w = np.log(len(neg_items)/rank)
                            self.U[u] += self.eps * (w * (self.V[i] - self.V[j]) - self.lmbda * U_u)
                            self.V[i] += self.eps * (w * U_u - self.lmbda * self.V[i])
                            self.V[j] += self.eps * (w * (- U_u) - self.lmbda * self.V[j])
                            break
                if not iter2 % 500: self.rmse(user_item, ti)
            self.rmse(user_item, t)

In [52]:
warp = WARP(dim=64, iters=5, eps=1e-3, lmbda=1e-3)
warp.fit(user_item_csr)

HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value='')))

HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6038.0), HTML(value='')))





In [53]:
get_movies(warp.similar_items(0))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
0,Toy Story (1995),Animation|Children's|Comedy
3113,Toy Story 2 (1999),Animation|Children's|Comedy
587,Aladdin (1992),Animation|Children's|Comedy|Musical
2354,"Bug's Life, A (1998)",Animation|Children's|Comedy
363,"Lion King, The (1994)",Animation|Children's|Musical
1196,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance
1072,Willy Wonka and the Chocolate Factory (1971),Adventure|Children's|Comedy|Fantasy
1269,Back to the Future (1985),Comedy|Sci-Fi
2917,Ferris Bueller's Day Off (1986),Comedy
2796,Big (1988),Comedy|Fantasy


In [54]:
get_movies(warp.user_history(3, user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
259,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi
479,Jurassic Park (1993),Action|Adventure|Sci-Fi
1035,Die Hard (1988),Action|Thriller
1096,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi
1197,Raiders of the Lost Ark (1981),Action|Adventure
1200,"Good, The Bad and The Ugly, The (1966)",Action|Western
1213,Alien (1979),Action|Horror|Sci-Fi|Thriller
1239,"Terminator, The (1984)",Action|Sci-Fi|Thriller
1386,Jaws (1975),Action|Horror
1953,Rocky (1976),Action|Drama


In [55]:
get_movies(warp.recommend(3, user_item_csr))

Unnamed: 0_level_0,name,category
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1195,Star Wars: Episode V - The Empire Strikes Back...,Action|Adventure|Drama|Sci-Fi|War
857,"Godfather, The (1972)",Action|Crime|Drama
2570,"Matrix, The (1999)",Action|Sci-Fi|Thriller
1209,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War
2857,American Beauty (1999),Comedy|Drama
2761,"Sixth Sense, The (1999)",Thriller
588,Terminator 2: Judgment Day (1991),Action|Sci-Fi|Thriller
592,"Silence of the Lambs, The (1991)",Drama|Thriller
607,Fargo (1996),Crime|Drama|Thriller
317,"Shawshank Redemption, The (1994)",Drama
