### SVD approach to recommender systems

* Consider a host-resource matrix $M_{ij}$ (the analogue of the user-rating matrix in a conventional recommender system), whose elements are the #DL values.

*  In the simplest SVD-based recommender system, we decompose $M$ by SVD,
\begin{align}
M_{ij} = \sum_k U_{ik}D_{kk}V^T_{kj},
\end{align}
where $M_{ij}$ represents the preference (the "rating") of user $i$ for item $j$.

* Since $M$ is highly sparse, in practice the factorization is obtained by solving the optimization problem
\begin{equation}
\min_{H_i, R_j} \sum_{M_{ij}\neq 0} (M_{ij}-H_i\cdot R_j)^2,
\end{equation}
where $H_i$ and $R_j$ are the feature vectors of host $i$ and item $j$, and the sum runs over the observed (nonzero) entries only.
 
* We use the iterative FunkSVD method, based on https://github.com/gbolmier/funk-svd.

* TODO: Solve the minimization problem with a neural network.
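
The least-squares objective over the observed entries can be minimized with plain stochastic gradient descent. The following is a minimal NumPy sketch of that idea, not the funk-svd implementation: the function name `fit_factors`, the hyperparameter values, and the toy triplets are illustrative assumptions, and bias terms are left out.

```python
import numpy as np

def fit_factors(triplets, n_factors=2, lr=0.02, reg=0.0, n_epochs=500, seed=0):
    """SGD on the sum over observed (i, j) of (M_ij - H_i . R_j)^2.

    `triplets` is an (n, 3) array of (host index, item index, value) rows.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    hosts = triplets[:, 0].astype(int)
    items = triplets[:, 1].astype(int)
    values = triplets[:, 2].astype(float)
    # Small random initial feature vectors for every host and item
    H = rng.normal(scale=0.1, size=(hosts.max() + 1, n_factors))
    R = rng.normal(scale=0.1, size=(items.max() + 1, n_factors))
    losses = []
    for _ in range(n_epochs):
        # Visit the observed entries in a fresh random order each epoch
        for n in rng.permutation(len(values)):
            i, j = hosts[n], items[n]
            err = values[n] - H[i] @ R[j]
            h_old = H[i].copy()
            # Gradient step on both feature vectors (optional L2 penalty)
            H[i] += lr * (err * R[j] - reg * H[i])
            R[j] += lr * (err * h_old - reg * R[j])
        # Squared error over the observed entries after this epoch
        pred = np.einsum("ij,ij->i", H[hosts], R[items])
        losses.append(float(np.sum((values - pred) ** 2)))
    return H, R, losses

# Toy (host, item, value) triplets; a made-up example, not the notebook's data
toy = np.array([[0, 0, 5.0], [0, 1, 3.0], [1, 0, 4.0],
                [1, 2, 1.0], [2, 1, 2.0], [2, 2, 1.0]])
H, R, losses = fit_factors(toy)
print("first/last epoch loss:", losses[0], losses[-1])
```

Only observed entries contribute to the updates, which is what distinguishes this from a plain SVD of the zero-filled matrix.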


In [1]:
#!/usr/bin/python
import os
import csv
import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse
from collections import Counter
%matplotlib inline

MAT_FILE = 'hr_mat.dat'

def load_hr_mat(fname):
    """
    Load the host-resource-#DL table used to build the sparse matrix.

    Each row holds three comma-separated integers: host ID, resource ID, #DL.
    """
    # Passing the file name lets NumPy open and close the file itself
    data = np.loadtxt(fname, delimiter=",", dtype=int)
    return data

def build_sparse_mat(d):
    """
    Build a CSR matrix from (host ID, resource ID, #DL) rows.

    Assumes all indices are >= 0.
    """
    hid = d[:, 0]
    rid = d[:, 1]
    dd = d[:, 2]
    smat = scipy.sparse.csr_matrix((dd.astype(float), (hid.astype(int), rid.astype(int))))
    print("Data shape: ", d.shape, smat.shape)
    # Note: each line below prints the minimum first, then the maximum
    print("Max/Min host: ", np.min(hid), np.max(hid))
    print("Max/Min item: ", np.min(rid), np.max(rid))
    print("Max/Min data: ", np.min(dd), np.max(dd))
    return smat

def svd(train, k):
    """
    Rank-k SVD reconstruction of a dense matrix whose missing entries are NaN.
    """
    utilMat = np.array(train)
    # Mask the NaN (unavailable) entries
    mask = np.isnan(utilMat)
    masked_arr = np.ma.masked_array(utilMat, mask)
    item_means = np.mean(masked_arr, axis=0)
    # Replace the NaN entries with the average rating of each item
    utilMat = masked_arr.filled(item_means)
    x = np.tile(item_means, (utilMat.shape[0], 1))
    # Subtract the per-item average from all entries;
    # the formerly NaN entries are now exactly zero
    utilMat = utilMat - x
    # U and V hold the user and item features
    U, s, V = np.linalg.svd(utilMat, full_matrices=False)
    s = np.diag(s)
    # Keep only the k most significant features
    s = s[0:k, 0:k]
    U = U[:, 0:k]
    V = V[0:k, :]
    UsV = np.dot(np.dot(U, s), V)
    # Add the per-item average back
    UsV = UsV + x
    print("svd done")
    return UsV

In [2]:
# Load the data, build the sparse matrix, and run a truncated SVD
import scipy.sparse.linalg
data = load_hr_mat(MAT_FILE)
smat = build_sparse_mat(data)


global_average = smat.sum()/smat.count_nonzero()
print ("Avg #DL over all item: ", global_average)


# Simple SVD approach for recommendation: keep the 128 largest singular values
U, sigma, Vt = scipy.sparse.linalg.svds(smat, k=128)
# smat1 is the dense rank-128 reconstruction of smat
smat1 = U.dot(np.diag(sigma)).dot(Vt)

max_dl = smat.max()
print("Max DL: ", max_dl)

Data shape:  (4146, 3) (3459, 722)
Max/Min host:  0 3458
Max/Min item:  0 721
Max/Min data:  1 546
Avg #DL over all item:  4.6685962373371925
Max DL:  546.0
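
To make the truncated-SVD step more concrete, here is a small sketch on a toy matrix. The matrix, the helper name `low_rank_reconstruction`, and the choice of ranks are assumptions for illustration only. The singular values are sorted explicitly so the code does not rely on the order in which `svds` returns them.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg

# Toy 6x5 matrix of exact rank 2, built from two outer products
a, b = np.array([1., 2., 0., 1., 3., 0.]), np.array([1., 0., 2., 0., 1.])
c, d = np.array([0., 1., 1., 2., 0., 1.]), np.array([0., 3., 0., 1., 0.])
dense = np.outer(a, b) + np.outer(c, d)
toy = scipy.sparse.csr_matrix(dense)

def low_rank_reconstruction(mat, k):
    """Return the dense rank-k approximation U_k diag(s_k) V_k^T of a sparse matrix."""
    U, s, Vt = scipy.sparse.linalg.svds(mat, k=k)
    # Sort by decreasing singular value
    order = np.argsort(s)[::-1]
    U, s, Vt = U[:, order], s[order], Vt[order, :]
    # Scale the columns of U by the singular values, then project back
    return (U * s) @ Vt

def relative_error(approx, exact):
    """Frobenius-norm error of the approximation relative to the exact matrix."""
    return float(np.linalg.norm(approx - exact) / np.linalg.norm(exact))

err_rank1 = relative_error(low_rank_reconstruction(toy, 1), dense)
err_rank2 = relative_error(low_rank_reconstruction(toy, 2), dense)
print("relative error, rank 1:", err_rank1)
print("relative error, rank 2:", err_rank2)
```

Because the toy matrix has rank 2, the rank-2 reconstruction recovers it up to numerical precision, while the rank-1 reconstruction does not. Note that this kind of SVD treats every unobserved entry as a true zero.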


In [3]:
from funk_svd.svd import SVD

svd = SVD(learning_rate=0.001, regularization=0.005, n_epochs=1000, n_factors=15, min_rating=1, max_rating=max_dl)
svd.fit(X=data, shuffle=True)

Preprocessing data...

Epoch 1/1000
Epoch 101/1000
Epoch 201/1000
Epoch 301/1000
Epoch 401/1000
Epoch 501/1000
Epoch 601/1000
Epoch 701/1000
Epoch 801/1000
Epoch 901/1000


<funk_svd.svd.SVD at 0x8701d08>

In [5]:
# In-sample check: observed #DL vs. predicted value for the first 10 training rows
for d in data[0:10,:]:
    print( d[2], '\t', svd.predict_pair(d[0], d[1]) )

30 	 29.961953021278504
1 	 1.0294584912473914
4 	 3.9959279450279652
1 	 1.0361580777980839
1 	 1.0124776420353305
1 	 1.0132412270572957
27 	 26.977620245407586
2 	 2.0033780241359573
1 	 1.0321331873872026
1 	 1.0291950203018034
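
The comparison above uses rows the model was trained on, so it shows how well the model fits, not how well it generalizes. A hold-out evaluation is one way to check the latter. The sketch below is an assumption-laden illustration: `train_test_split_rows` and `rmse` are hypothetical helper names, the triplets are made up, and a global-mean baseline stands in for the fitted model so the block stays self-contained.

```python
import numpy as np

def train_test_split_rows(triplets, test_fraction=0.2, seed=0):
    """Randomly split (host, item, value) rows into train and test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(triplets))
    n_test = int(round(test_fraction * len(triplets)))
    return triplets[idx[n_test:]], triplets[idx[:n_test]]

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up (host, item, #DL) rows, not the notebook's data
toy = np.array([[0, 0, 30], [0, 1, 1], [1, 0, 4], [1, 2, 1], [2, 1, 1],
                [2, 2, 27], [3, 0, 2], [3, 1, 1], [4, 2, 1], [4, 0, 1]])
train, test = train_test_split_rows(toy)

# Baseline: predict the training mean for every held-out row
baseline = np.full(len(test), train[:, 2].mean())
print("held-out RMSE of the global-mean baseline:", rmse(test[:, 2], baseline))
```

A fitted model would be trained on `train` only and its held-out RMSE compared against this baseline.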


In [10]:
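# Prediction for a new user, i.e. a user ID not seen in training.
# Note: d is the last row from the loop above and d[2] is its #DL column,
# used here as the item ID; d[1] is that row's item ID.
print(svd.predict_pair(1132111, d[2]))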

6.792553944988244
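
A user ID that never appeared in training has no learned feature vector, so a latent-factor model can only fall back on quantities that do not depend on that user. One common fallback, sketched here as an assumption and not taken from the funk-svd source, is the global mean plus the item bias. The function name `predict_with_fallback` and all numbers are hypothetical.

```python
import numpy as np

def predict_with_fallback(host, item, global_mean, host_bias, item_bias, H, R):
    """Predict a value, using only the terms available for this host and item.

    host_bias / item_bias map IDs to scalar biases; H / R map IDs to feature
    vectors. IDs missing from a mapping simply contribute nothing.
    """
    pred = global_mean
    if host in host_bias:
        pred += host_bias[host]
    if item in item_bias:
        pred += item_bias[item]
    # The interaction term needs both feature vectors
    if host in H and item in R:
        pred += float(np.dot(H[host], R[item]))
    return pred

# Hypothetical model parameters for one known host and one known item
global_mean = 4.0
host_bias = {0: 1.0}
item_bias = {1: 0.5}
H = {0: np.array([1.0, 2.0])}
R = {1: np.array([0.5, 0.25])}

args = (global_mean, host_bias, item_bias, H, R)
print("known host, known item:  ", predict_with_fallback(0, 1, *args))
print("unknown host, known item:", predict_with_fallback(999, 1, *args))
print("unknown host and item:   ", predict_with_fallback(999, 7, *args))
```

Under this scheme every unseen user gets the same prediction for a given item, so such predictions are not personalized.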
