# Recommendation Models - Matrix Factorization

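In the previous post, similarity was computed between pairs of users (or items) in the vector space formed by the transaction data. In practice, this approach runs into two problems:

1. The space is huge and sparse. An e-commerce site may carry millions of items, while a single user typically has only a handful of interactions with them (transaction records, i.e. interaction data). This leads to the [curse of dimensionality](https://zh.wikipedia.org/wiki/%E7%BB%B4%E6%95%B0%E7%81%BE%E9%9A%BE): when the dimension is very large, the similarity between almost any two items (or users) approaches 0, as if everything were infinitely far apart.
2. Computing all pairwise relations between users (or items) takes time that grows quadratically with the number of users (or items).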

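Matrix factorization is a technique for dealing with the **curse of dimensionality**.

## Matrix factorization

The idea behind matrix factorization is to approximate the user-item interaction matrix with a linear relationship: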

$$r_{ui} = \textbf{x}_u^T \cdot \textbf{y}_i $$

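The interpretation is that behind a user's purchase of an item there is a set of latent features, and the user vector holds the weight the user places on each of them. Each item, in turn, carries a certain amount of each latent feature. This linear relationship is what determines the user's score for the item.

- For example, Xiao Ming gives **The Incredibles** a `rating = 5`. The reason might be that the movie has three underlying features, $\textbf{y}_{\text{The Incredibles}}=$ `{ horror: 0, comedy: 2, cartoon: 3 }`, and Xiao Ming's taste for these three features is $\textbf{x}_{\text{Xiao Ming}}$ = `{ likes horror: 0, likes comedy: 0.9, likes cartoon: 1 }`. Under matrix factorization:

$$r_{\text{Xiao Ming},\,\text{The Incredibles}} = \textbf{x}_{\text{Xiao Ming}}^T \cdot \textbf{y}_{\text{The Incredibles}} = 0 \times 0 + 0.9 \times 2 + 1 \times 3 = 1.8 + 3 = 4.8 \approx 5$$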
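
The arithmetic in this example is simply a dot product. A minimal NumPy sketch, using the feature values from the example (the variable names are illustrative only):

```python
import numpy as np

# Latent features in the order (horror, comedy, cartoon), taken from the example.
y_incredibles = np.array([0.0, 2.0, 3.0])   # how much of each feature the movie has
x_xiaoming = np.array([0.0, 0.9, 1.0])      # how much Xiao Ming likes each feature

# Predicted rating = inner product of the user vector and the item vector.
predicted_rating = float(x_xiaoming @ y_incredibles)
print(round(predicted_rating, 2))
```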

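- The loss function can be written as
$$
L = \sum_{(u,i) \in S} \left(r_{ui} - \textbf{x}_u^T \cdot \textbf{y}_i\right)^2 + \lambda_x \sum_u \left\Vert \textbf{x}_u \right\Vert ^2  + \lambda_y \sum_i \left\Vert \textbf{y}_i \right\Vert ^2
$$

> The set $S$ contains the user-item pairs that have a rating (an interaction), $\textbf{x}_u$ and $\textbf{y}_i$ are the vector representations of user $u$ and item $i$, and $\lambda_x,\lambda_y$ are the regularization strengths.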
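
As a rough illustration of this loss, the sketch below evaluates it on a tiny made-up rating matrix. The function name `mf_loss`, the boolean `mask` standing in for the set $S$, and all the numbers are assumptions made for the example, not part of the original notebook.

```python
import numpy as np

def mf_loss(R, mask, X, Y, lambda_x, lambda_y):
    """Regularized squared error over observed entries only.

    R    : (m, n) rating matrix
    mask : (m, n) boolean array, True where a rating is observed (the set S)
    X    : (m, f) user vectors, one row per user
    Y    : (n, f) item vectors, one row per item
    """
    residual = (R - X @ Y.T) * mask          # zero out unobserved pairs
    data_term = np.sum(residual ** 2)
    reg_term = lambda_x * np.sum(X ** 2) + lambda_y * np.sum(Y ** 2)
    return data_term + reg_term

# Tiny made-up example: 2 users, 3 items, 2 latent factors.
R = np.array([[5.0, 0.0, 1.0],
              [0.0, 4.0, 0.0]])
mask = R > 0                                  # treat 0 as "no rating"
X = np.ones((2, 2))
Y = np.ones((3, 2))
loss = mf_loss(R, mask, X, Y, lambda_x=0.1, lambda_y=0.1)
print(loss)
```

Every prediction here is 2, so the data term is $3^2 + 1^2 + 2^2 = 14$ and the regularization term is $0.1 \times 4 + 0.1 \times 6 = 1$.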

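## Explicit ALS algorithm

- To minimize this loss, first hold $\textbf{y}_i$ fixed, differentiate with respect to $\textbf{x}_u$, and set the derivative to 0 to obtain the update...
- Likewise, hold $\textbf{x}_u$ fixed and differentiate with respect to $\textbf{y}_i$.
- Repeat these two steps until convergence.

This procedure is called the Alternating Least Squares (ALS) algorithm. Because the target here is an explicit user feedback score (a rating from 1 to 5), it is called explicit ALS. Conceptually it is built on matrix factorization, with the two sets of equations solved by alternating iterations until they converge.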
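
The alternating updates can be sketched as follows. This is a minimal dense-matrix illustration, not the notebook's implementation: the function names, the toy rating matrix, and the use of a single `reg` value for both $\lambda_x$ and $\lambda_y$ are assumptions. Each update is the ridge-regression solution obtained by setting the corresponding derivative to zero.

```python
import numpy as np

def explicit_als(R, mask, factors=2, reg=0.1, iterations=20, seed=0):
    """Alternate closed-form ridge-regression updates for users and items."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    X = rng.random((m, factors)) * 0.1
    Y = rng.random((n, factors)) * 0.1
    I = np.eye(factors)
    for _ in range(iterations):
        # Fix Y, solve for each user vector using only the items that user rated.
        for u in range(m):
            Yu = Y[mask[u]]
            X[u] = np.linalg.solve(Yu.T @ Yu + reg * I, Yu.T @ R[u, mask[u]])
        # Fix X, solve for each item vector using only the users who rated it.
        for i in range(n):
            Xi = X[mask[:, i]]
            Y[i] = np.linalg.solve(Xi.T @ Xi + reg * I, Xi.T @ R[mask[:, i], i])
    return X, Y

def observed_rmse(R, mask, X, Y):
    """Root mean squared error on the observed entries only."""
    return np.sqrt(np.mean((R - X @ Y.T)[mask] ** 2))

# Toy ratings; 0 means "not rated".
R = np.array([[5, 4, 0, 1],
              [4, 0, 1, 1],
              [1, 1, 5, 0],
              [0, 1, 4, 5]], dtype=float)
mask = R > 0
X0, Y0 = explicit_als(R, mask, iterations=0)    # random initialization only
X, Y = explicit_als(R, mask, iterations=20)
print(observed_rmse(R, mask, X0, Y0), observed_rmse(R, mask, X, Y))
```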

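## Implicit ALS algorithm

A very common situation is that we have no way of knowing how a user actually rates an item; we can only guess that the user has a higher preference for the things they bought. For an item that was not bought, there are two possible explanations:
1. The user does not like this kind of item
2. The user was not aware of the item

This [paper](http://yifanhu.net/PUB/cf.pdf) introduces the notion of confidence to extend explicit ALS.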


\begin{equation}
L = \sum_{u,i \in all} c_{ui}\left(p_{ui} - \textbf{x}_u \cdot \textbf{y}_i\right)^2 + \lambda_x \sum_u \left\Vert \textbf{x}_u \right\Vert ^2  + \lambda_y \sum_i \left\Vert \textbf{y}_i \right\Vert ^2 
\end{equation}


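Here $p_{ui} \in \{0,1\}$ is the binary preference (likes or does not like), and $c_{ui}$ is the confidence that user $u$ likes (or does not like) item $i$; a larger value means higher confidence. Unlike the explicit case, which only considers pairs with an interaction ($\in S$), the sum now runs over all user-item pairs, including items that were never purchased ($\in all$). As in the explicit case, we can hold one set of variables, the item vectors $\textbf{Y}_i$, fixed and set the derivative to zero to obtain a closed-form solution (the one that minimizes the objective):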


$$
\textbf{X}_u = \left( \textbf{Y}^T\textbf{C}^u\textbf{Y}  + \lambda \textbf{I}\right)^{-1} \textbf{Y}^T\textbf{C}^uP(u)
$$

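> Suppose there are $m$ users and $n$ items.

> $\textbf{X}_u$ is the feature vector of user $u$ ($\in \mathbf{R}^{f\times 1}$)

> $\textbf{Y}$ is the matrix of item feature vectors stacked vertically (`vstack`) ($\in \mathbf{R}^{n\times f}$)

> $\textbf{C}^u$ is an $n \times n$ diagonal matrix with $\textbf{C}^u_{ii} = c_{ui}$

> $P(u)\in\mathbf{R}^{n\times 1}$ holds the binary preferences (1 or 0) of user $u$ for all items

By the same argument,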
$$
\textbf{Y}_i = \left( \textbf{X}^T\textbf{C}^i\textbf{X}  + \lambda \textbf{I}\right)^{-1} \textbf{X}^T\textbf{C}^iP(i)
$$
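
To make the closed-form update concrete, the sketch below computes $\textbf{X}_u$ for one user on toy data and checks that the gradient of the per-user objective vanishes there. The confidence form $c_{ui} = 1 + \alpha r_{ui}$ is the one proposed in the linked paper; the value of `alpha`, the toy counts, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, lam, alpha = 6, 3, 0.1, 40.0

Y = rng.random((n, f))                          # item factors, one row per item
r_u = np.array([0.0, 2.0, 0.0, 0.0, 1.0, 0.0])  # raw interaction counts for one user
p_u = (r_u > 0).astype(float)                   # binary preference P(u)
c_u = 1.0 + alpha * r_u                         # confidence (assumed form, see above)
Cu = np.diag(c_u)                               # dense C^u, fine for a toy example

# Closed-form update: X_u = (Y^T C^u Y + lambda I)^-1 Y^T C^u P(u)
A = Y.T @ Cu @ Y + lam * np.eye(f)
b = Y.T @ Cu @ p_u
x_u = np.linalg.solve(A, b)

# Gradient of the per-user objective at x_u; it should vanish at the minimum.
grad = -2.0 * Y.T @ (c_u * (p_u - Y @ x_u)) + 2.0 * lam * x_u
print(np.abs(grad).max())
```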



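### Computational cost

In the formula above, obtaining the feature vector $\textbf{X}_u$ of each user requires

1. $\textbf{Y}^T\textbf{C}^u\textbf{Y}$, which takes $\mathcal{O}(f^2n)$
2. $\textbf{Y}^T\textbf{C}^u P(u)$, where most entries of $P(u)$ are zero except for the $n_u$ items that user $u$ has interacted with ($n_u \ll n$)
3. The inverse $\left( \textbf{Y}^T\textbf{C}^u\textbf{Y}  + \lambda \textbf{I}\right)^{-1}$

$\textbf{Y}^T\textbf{C}^u\textbf{Y}$ can be rewritten as $\textbf{Y}^T\left( \textbf{C}^u - \textbf{I}\right)\textbf{Y} + \textbf{Y}^T\textbf{Y}$. The second term does not depend on $u$, so it only needs to be computed once (outside the loop over users). In the first term, $\textbf{C}^u - \textbf{I}$ has only $n_u$ nonzero entries (assuming $c_{ui} = 1$ for items without an interaction), which reduces the cost to $\mathcal{O}(f^2 n_u)$.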
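
The identity can be checked numerically. In this sketch the item indices in `liked` and their confidence values are hypothetical; the point is only that the fast path touches $n_u$ rows of $\textbf{Y}$ instead of all $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, f = 1000, 8
Y = rng.random((n, f))

# Confidence is 1 everywhere except for the few items this user interacted with.
c_u = np.ones(n)
liked = np.array([3, 42, 777])          # hypothetical item indices, n_u = 3
c_u[liked] = [5.0, 2.0, 9.0]

# Naive: touches all n items, O(f^2 n) per user.
naive = Y.T @ (c_u[:, None] * Y)

# Fast: Y^T Y is shared by all users; the correction touches only n_u rows.
YtY = Y.T @ Y
Y_liked = Y[liked]
fast = YtY + Y_liked.T @ ((c_u[liked] - 1.0)[:, None] * Y_liked)

print(np.allclose(naive, fast))
```

This is the same decomposition that the `least_squares` function in the implementation section relies on: `YtY` is computed once, and the inner loop adds `(confidence - 1) * np.outer(factor, factor)` for each interacted item.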


* For the detailed derivation see [here](http://datamusing.info/blog/2015/01/07/implicit-feedback-and-collaborative-filtering/) (which uses Dirac notation) or [here](https://math.stackexchange.com/questions/1072451/analytic-solution-for-matrix-factorization-using-alternating-least-squares/1073170#1073170)
_____

# Implementation on the 3D model data
## Preprocessing

In [1]:
import numpy as np 
import pandas as pd
import csv
import sys
from tqdm import tqdm
sys.path.append('../')

In [2]:
from rec_helper import *

In [3]:
df = pd.read_csv('../rec-a-sketch/model_likes_anon.psv',
                 sep='|',quotechar='\\',quoting=csv.QUOTE_MINIMAL)
print(df.count())
df.drop_duplicates(inplace=True)
print(df.count())
df = threshold_interaction(df,rowname='uid',colname='mid',row_min=5,col_min=10)
inter,uid_to_idx,idx_to_uid,mid_to_idx,idx_to_mid=df_to_spmatrix(df,'uid','mid')
train,test, user_idxs = train_test_split(inter,split_count=1,fraction=0.2)

modelname    632832
mid          632832
uid          632832
dtype: int64
modelname    632677
mid          632677
uid          632677
dtype: int64
Starting interactions info
Number of rows: 62583
Number of cols: 28806
Sparsity: 0.04%
Ending interactions info
Number of rows: 13496
Number of columns: 13618
Sparsity: 0.25%


## Implicit ALS algorithm

In [4]:
def alternating_least_squares(Cui, factors, regularization, iterations=20):
    users, items = Cui.shape

    X = np.random.rand(users, factors) * 0.01
    Y = np.random.rand(items, factors) * 0.01

    Ciu = Cui.T.tocsr()
    for iteration in range(iterations):
        X,Y = least_squares(Cui, X, Y, regularization)
        Y,X = least_squares(Ciu, Y, X, regularization)
        print('iter:{}'.format(iteration))

    return X, Y

In [5]:
def least_squares(Cui, X, Y, regularization):
    users, factors = X.shape
    YtY = Y.T.dot(Y)

    for u in range(users):
        # accumulate YtCuY + regularization * I in A
        A = YtY + regularization * np.eye(factors)

        # accumulate YtCuPu in b
        b = np.zeros(factors)
#         if u % 1000 == 0:
#             print(u)
        for i in Cui[u,:].indices:
            confidence = Cui[u,i]
            factor = Y[i]
            A += (confidence - 1) * np.outer(factor, factor)
            b += confidence * factor

        # Xu = (YtCuY + regularization * I)^-1 (YtCuPu)
        X[u] = np.linalg.solve(A, b)
    return X,Y

In [6]:
users_embedding, items_embedding = alternating_least_squares(train,50,regularization=1,iterations=10) # time consuming : 15~20 min

iter:0
iter:1
iter:2
iter:3
iter:4
iter:5
iter:6
iter:7
iter:8
iter:9


In [7]:
items_embedding.shape

(13618, 50)

In [9]:
class TopRelated:
    ## Find the nearest items using vector inner products (cosine based)
    def __init__(self, items_factors):
        ## Normalize the item vectors once at initialization
        norms = np.linalg.norm(items_factors, axis=1)
        self.factors = items_factors / norms[:, np.newaxis]

    def get_related(self, itemid, N=10):
        scores = self.factors.dot(self.factors[itemid]) # cosine 
        best = np.argpartition(scores, -N)[-N:] # partition --> the N largest scores end up in the last N slots (unsorted)
        return sorted(zip(best, scores[best]), key=lambda x: -x[1])

In [9]:
top_related = TopRelated(items_embedding)
top_related.get_related(10)

[(10, 1.0000000000000002),
 (73, 0.8333870013907535),
 (51, 0.64537707303043956),
 (11590, 0.62598405351397579),
 (56, 0.62092298718015437),
 (7, 0.61313760379281323),
 (19, 0.58468685040231083),
 (13005, 0.58408578340083528),
 (145, 0.56861429220694504),
 (11270, 0.5640808270205353)]
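
The top-N lookup in `get_related` relies on `np.argpartition` followed by a sort of only the N selected candidates. A small standalone sketch with made-up scores shows the idea:

```python
import numpy as np

scores = np.array([0.1, 0.9, 0.3, 0.7, 0.5, 0.2])
N = 3

# argpartition places the indices of the N largest scores in the last N slots,
# in no particular order, without fully sorting the array.
best = np.argpartition(scores, -N)[-N:]

# Sort only those N candidates by descending score.
top = sorted(zip(best.tolist(), scores[best].tolist()), key=lambda x: -x[1])
print(top)
```

This avoids a full sort of all scores when only the first N are needed, although every item is still scored once.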

In [10]:
import annoy

In [11]:
class ApproximateTopRelated:
    def __init__(self, items_factors, treecount=20):
        index = annoy.AnnoyIndex(items_factors.shape[1], 'angular')
        for i, row in enumerate(items_factors):
            index.add_item(i, row)
        index.build(treecount)
        self.index = index

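    def get_related(self, itemid, N=10):
        neighbours = self.index.get_nns_by_item(itemid, N)
        # Note: Annoy's 'angular' distance d equals sqrt(2 * (1 - cos)), so
        # `1 - d` is only a ranking score, not the cosine (cos = 1 - d**2 / 2).
        return sorted(((other, 1 - self.index.get_distance(itemid, other))
                      for other in neighbours), key=lambda x: -x[1])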

In [12]:
approx_topRelated_item = ApproximateTopRelated(items_embedding)

In [13]:
approx_topRelated_item.get_related(10)

[(10, 1.0),
 (73, 0.42274248600006104),
 (51, 0.15783262252807617),
 (56, 0.12927967309951782),
 (7, 0.12038373947143555),
 (19, 0.08861315250396729),
 (13005, 0.08795374631881714),
 (11270, 0.0662771463394165),
 (1152, 0.05169880390167236),
 (107, 0.02649134397506714)]
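
The scores here look much lower than the exact cosine scores from `TopRelated`, although the leading neighbours are largely the same. A likely explanation is that Annoy's `'angular'` metric is documented as the Euclidean distance between normalized vectors, $\sqrt{2(1-\cos)}$, so `1 - distance` is not a cosine. The sketch below (pure NumPy, with a hypothetical helper name) converts such a distance back to a cosine and checks it against the value printed for item 73.

```python
import numpy as np

def angular_distance_to_cosine(d):
    """Convert an angular distance d = sqrt(2 * (1 - cos)) back to the cosine."""
    return 1.0 - d ** 2 / 2.0

# Item 73 was reported above as 1 - d = 0.42274248600006104, so recover d first.
d = 1.0 - 0.42274248600006104
cos = angular_distance_to_cosine(d)
print(cos)   # compare with the exact cosine 0.8333870013907535 from TopRelated

# Sanity check of the formula on two random vectors.
rng = np.random.default_rng(0)
a, b = rng.random(5), rng.random(5)
a_hat, b_hat = a / np.linalg.norm(a), b / np.linalg.norm(b)
d_ab = np.linalg.norm(a_hat - b_hat)     # Euclidean distance of normalized vectors
cos_ab = float(a_hat @ b_hat)
```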

# Evaluation

## item based

In [14]:
train[0,].indices

array([   0,   40,   48,   60,   63,  110,  111,  131,  167,  258,  308,
        315,  331,  404,  431,  445,  464,  504,  560,  741,  812,  821,
       1347, 1410, 1778, 1909, 2253, 2723, 3545, 3762, 4134, 4713, 4861,
       8093, 8780], dtype=int32)

In [15]:
from collections import defaultdict

In [16]:
def topNrecommend_ibcf(uid, items_factor, inter,nn=10,topN=10):
    top_related = TopRelated(items_factor)
    topN_dict = defaultdict(int)
    for item in inter[uid,].indices:
        topn_items = top_related.get_related(item,N=nn) ## cosine similarity
        for k,v in topn_items:
            topN_dict[k] += v
            
    sort_ids = sorted(topN_dict, key=topN_dict.get, reverse=True)[:topN] # sorted itemid by scores
    scores = [topN_dict[e] for e in sort_ids]
    return zip(sort_ids,scores)

Recommendations for user 0...

In [17]:
topn_items= topNrecommend_ibcf(uid=0,items_factor=items_embedding,inter=train)

In [18]:
list(topn_items)

[(76, 2.7163962787377449),
 (79, 1.8501942739938391),
 (167, 1.8205602844097419),
 (431, 1.8205602844097419),
 (756, 1.8147976840556201),
 (308, 1.6252744135161303),
 (504, 1.6214821746618866),
 (741, 1.6214821746618866),
 (1778, 1.5858867473992451),
 (622, 1.4542721502363598)]

In [19]:
def evaluate(train, test,user_idxs,items_factors=None,users_factors=None,nn=50, kind='ibcf'):
    hits = 0
    for user in tqdm(user_idxs):
        ## recommend topn items
        if kind == 'ibcf':
            topn_items = topNrecommend_ibcf(user, items_factors, inter=train,nn=nn)
        elif kind =='ubcf':
            topn_items = topNrecommend_ubcf(user, users_factors, inter=train, nn=nn)
        elif kind == 'inner':
            innerProduct = items_factors.dot(users_factors[user])
            topn_k = np.argsort(-innerProduct)[:10]
            topn_v = [innerProduct[e] for e in topn_k]
            topn_items = zip(topn_k,topn_v)
        ## real(test) item -- only 1 data exist in each test user
        y_item = test[user].indices
        score = 1 if y_item in list(zip(*topn_items))[0] else 0
        hits += score
    return hits/len(user_idxs)

In [20]:
evaluate(train, test, user_idxs, items_factors=items_embedding,kind='ibcf') ## hit rate ≈ 5.6 %

100%|██████████| 2699/2699 [01:09<00:00, 39.09it/s]


0.05557613931085587

## user based

In [21]:
def topNrecommend_ubcf(uid, users_factor, inter,nn=10,topN=10):
    top_related = TopRelated(users_factor)
    topN_dict = defaultdict(float)           
    topn_users = top_related.get_related(uid,N=nn) ## cosine similarity
    for top_u,v in topn_users:
        ## items that top_u bought
        for item in inter[top_u,].indices:
            topN_dict[item] += v

    sort_ids = sorted(topN_dict, key=topN_dict.get, reverse=True)[:topN] # sorted itemid by scores
    scores = [topN_dict[e] for e in sort_ids]
    return zip(sort_ids,scores)

In [22]:
list(topNrecommend_ubcf(0,users_embedding, train, nn=50, topN=10))

[(0, 34.410089672904441),
 (44, 12.561978619472963),
 (18, 7.5103976826143359),
 (28, 6.9053208293802317),
 (84, 6.3513422315909036),
 (31, 6.0606690420704021),
 (131, 5.9223405961915061),
 (11, 4.6633340311155704),
 (167, 4.4891150115483809),
 (48, 4.3329760554714234)]

In [23]:
evaluate(train,test,user_idxs,users_factors=users_embedding, kind='ubcf', nn=50) # hit rate ≈ 9.4 %

100%|██████████| 2699/2699 [00:35<00:00, 76.42it/s]


0.09447943682845499

## Inner Product 

In [24]:
evaluate(train,test,user_idxs,users_factors=users_embedding,items_factors=items_embedding, kind='inner', nn=50) # hit rate ≈ 7.3 %

100%|██████████| 2699/2699 [00:04<00:00, 542.85it/s]


0.07336050389032975

____

## Implicit
Using the [implicit](https://github.com/benfred/implicit) package

In [4]:
train64 = train.astype(np.float64)

In [5]:
import implicit
# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=50,regularization=0.01)

# train the model on a sparse matrix of item/user/confidence weights
model.fit(train64.T)

In [6]:
# recommend items for a user
user_items = train64.tocsr()
recommendations = model.recommend(0, user_items)

# find related items
related = model.similar_items(itemid=0)

In [7]:
recommendations

[(5, 0.20515807627064975),
 (18, 0.1918636748483562),
 (28, 0.14988751674773698),
 (9, 0.14595963433612891),
 (31, 0.14067785135181021),
 (11, 0.13030475993688759),
 (4, 0.11807272315328837),
 (44, 0.11471354006636585),
 (76, 0.10809321407142497),
 (38, 0.10488577635996862)]

In [29]:
related

[(0, 0.99999999999999978),
 (28, 0.83579808129182787),
 (8246, 0.73776182575596472),
 (170, 0.72493331824975293),
 (45, 0.713338619042862),
 (53, 0.65034504543970606),
 (89, 0.64663699468764868),
 (12196, 0.63310832187945953),
 (38, 0.6325378694017737),
 (31, 0.60813921551971239)]

In [30]:
## evaluate 
evaluate(train,test,user_idxs,items_factors=model.item_factors, users_factors=model.user_factors,nn=50,kind='inner') # hit rate ≈ 7.2 %

100%|██████████| 2699/2699 [00:04<00:00, 551.27it/s]


0.07224898110411264

In [31]:
exp = model.explain(userid=0,user_items=user_items,itemid=18)
exp

(0.19693425851085267,
 [(111, 0.0300435060968114),
  (0, 0.028031677280523037),
  (48, 0.024779612468137559),
  (167, 0.019750762877051669),
  (60, 0.01781029803664547),
  (431, 0.015441959379479299),
  (40, 0.015085249725167314),
  (404, 0.011380698781046616),
  (331, 0.0062267935471592767),
  (2253, 0.0060957748716609994)],
 (array([[ 1.77798619,  0.25830302,  0.34579991, ...,  0.31545459,
           0.20913901,  0.337622  ],
         [ 0.45925921,  1.95950158,  0.29607366, ...,  0.40830768,
           0.45751832,  0.32250159],
         [ 0.61482746,  0.66947797,  1.7959922 , ...,  0.2782589 ,
           0.18997199,  0.24979609],
         ..., 
         [ 0.56087391,  0.88156242,  0.72972414, ...,  1.67936275,
           0.07193199,  0.05288486],
         [ 0.37184627,  0.95052911,  0.54896758, ...,  0.72735494,
           1.6755984 ,  0.02354348],
         [ 0.60028725,  0.71915116,  0.66086571, ...,  0.69419078,
           0.5268558 ,  1.71609497]]), False))

In [32]:
def evaluate_model(train,test,model,user_idxs):
    hits = 0
    for user in tqdm(user_idxs):
        topn_items = model.recommend(user,train)
        y_item = test[user].indices
        score = 1 if y_item in list(zip(*topn_items))[0] else 0
        hits += score
    return hits/len(user_idxs)

In [33]:
## excluding items the user already has
evaluate_model(train,test,model,user_idxs) # hit rate ≈ 9.0 %

100%|██████████| 2699/2699 [00:02<00:00, 1015.34it/s]


0.09040385327899222

___

## Hyperparameter tuning
Grid search

In [34]:
from sklearn.metrics import mean_squared_error
import itertools
import copy

In [35]:
def calculate_recall(model,train,test,user_idxs):
    """
    train: (csr_matrix) -- should be float64 
        u-i sparse matrix for training 
    model: (implicit)
        implicit model 
    user_idxs: (list)
        user idxs for test 
    """
    hits = 0
    for user in user_idxs:
        topn_items = model.recommend(user,train)
        y_item = test[user].indices ##  1 data each user (in my case)
        score = 1 if y_item in list(zip(*topn_items))[0] else 0
        hits += score
    return hits/len(user_idxs)

def grid_search_learning_curve(model,train,test,param_grids,user_idxs):
    curves = []
    keys,values = zip(*param_grids.items())
    for value in itertools.product(*values):
        params = dict(zip(keys,value))
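        # Caveat (unverified): a deepcopy of an already-fitted model may keep its
        # fitted factor matrices, so setting `factors` here might not take effect;
        # the identical recall across `factors` in the output below is consistent with that.
        this_model = copy.deepcopy(model)
        for k,v in params.items():
            setattr(this_model,k,v)
        this_model.fit(train64.T)  # note: uses the global train64, not the `train` argument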
        recall = calculate_recall(this_model,train,test,user_idxs)
        print('factors:{}, regularization:{}, recall:{:.2f}%'.format(this_model.factors,this_model.regularization,recall*100))
        curves.append({'params': params,                       
                       'recall@test': recall})
    return curves

In [36]:
param_grids = {
    'factors':[50,75,100],
    'regularization':[0,1e-3,1e-2,1e-1,1e1,1e2]
}
curves = grid_search_learning_curve(model,train,test,param_grids,user_idxs)

factors:50, regularization:0, recall:9.26%
factors:50, regularization:0.001, recall:9.26%
factors:50, regularization:0.01, recall:9.30%
factors:50, regularization:0.1, recall:9.26%
factors:50, regularization:10.0, recall:9.04%
factors:50, regularization:100.0, recall:3.45%
factors:75, regularization:0, recall:9.26%
factors:75, regularization:0.001, recall:9.26%
factors:75, regularization:0.01, recall:9.30%
factors:75, regularization:0.1, recall:9.26%
factors:75, regularization:10.0, recall:9.04%
factors:75, regularization:100.0, recall:3.45%
factors:100, regularization:0, recall:9.26%
factors:100, regularization:0.001, recall:9.26%
factors:100, regularization:0.01, recall:9.30%
factors:100, regularization:0.1, recall:9.26%
factors:100, regularization:10.0, recall:9.04%
factors:100, regularization:100.0, recall:3.45%


In [37]:
sorted(curves,key=lambda x:x['recall@test'], reverse=True)

[{'params': {'factors': 50, 'regularization': 0.01},
  'recall@test': 0.09299740644683216},
 {'params': {'factors': 75, 'regularization': 0.01},
  'recall@test': 0.09299740644683216},
 {'params': {'factors': 100, 'regularization': 0.01},
  'recall@test': 0.09299740644683216},
 {'params': {'factors': 50, 'regularization': 0},
  'recall@test': 0.09262689885142646},
 {'params': {'factors': 50, 'regularization': 0.001},
  'recall@test': 0.09262689885142646},
 {'params': {'factors': 50, 'regularization': 0.1},
  'recall@test': 0.09262689885142646},
 {'params': {'factors': 75, 'regularization': 0},
  'recall@test': 0.09262689885142646},
 {'params': {'factors': 75, 'regularization': 0.001},
  'recall@test': 0.09262689885142646},
 {'params': {'factors': 75, 'regularization': 0.1},
  'recall@test': 0.09262689885142646},
 {'params': {'factors': 100, 'regularization': 0},
  'recall@test': 0.09262689885142646},
 {'params': {'factors': 100, 'regularization': 0.001},
  'recall@test': 0.0926268988514

In [38]:
# At the beginning of the notebook
## https://stackoverflow.com/questions/18786912/get-output-from-the-logging-module-in-ipython-notebook/28195348
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

In [68]:
model = implicit.als.AlternatingLeastSquares(factors=50,regularization=0.01,calculate_training_loss=True)
model.fit(train64.T)

____

## Human intelligence
Inspecting the results by eye

In [10]:
top_related = TopRelated(model.item_factors)
top_related_items = top_related.get_related(itemid=0)

In [11]:
import requests
def get_thumbnails(top_related_items, idx, idx_to_mid, N=10):
#     row = sim[idx, :].A.ravel()
    topNitems,scores = zip(*top_related_items.get_related(idx))
    thumbs = []
    for x in topNitems:         
        response = requests.get('https://sketchfab.com/i/models/{}'.format(idx_to_mid[x])).json()
        thumb = [x['url'] for x in response['thumbnails']['images']]
#         print(thumb)
#         thumb = [x['url'] for x in response['thumbnails']['images'] if x['width'] == 200 and x['height']==200]
        if not thumb:
            print('no thumbnail')
        else:
            thumb = thumb[-2]
        thumbs.append(thumb)
    return thumbs

In [13]:
thumbs = get_thumbnails(top_related, idx=0, idx_to_mid=idx_to_mid)

In [14]:
from IPython.display import HTML, display

In [15]:
def display_item(thumbs,origin_id,N=5):
    try: 
        print('Original image ======')
        thumb_html = '<img src='+ '\"'+thumbs[0]+'\">' 
        
    except TypeError:
        print('oops, thumbnail not found!!!')
        response = requests.get('https://sketchfab.com/i/models/{}'.format(idx_to_mid[origin_id])).json()
        thumb = [x['url'] for x in response['thumbnails']['images']][-2]
        thumb_html = '<img src= "{}"/>'.format(thumb)
        print('Slightly larger image ====')
    for url in thumbs[1:]:        
        if url:
            thumb_html += """ <img style='width:120px;margin:0px;float:left;border:1px solid black;' src='{}' />""".format(url)            
    return thumb_html

In [16]:
HTML(display_item(thumbs,0))



In [25]:
idx2 = 200
thumbs2 = get_thumbnails(top_related,idx=idx2,idx_to_mid=idx_to_mid)

In [26]:
HTML(display_item(thumbs2,idx2))



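# Summary

Matrix factorization uses the inner product of a user feature vector and an item feature vector, $\textbf{x}_u^T \cdot \textbf{y}_i$, to estimate how much each person likes an item, and it effectively addresses the dimensionality blow-up for problems with a large item set. However, on the small ($<10^5$) 3D model dataset used here, the brute-force approach from the previous post was actually more accurate. The `implicit` package makes it quick to obtain the feature vectors and build a model, but three practical issues are worth mentioning:

1. Even with the feature vectors in hand, finding the nearest items/users usually means scanning all N items (or users) to find the most similar ones.
2. Recommending to a user requires computing inner products and then sorting all items. For online, real-time recommendation over millions of items this becomes infeasible (a performance problem).
3. Recommendations rely only on users' transaction behavior; the features (attributes) of the items themselves are completely ignored.

These three issues will be discussed in the next post.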