## Recommendation system
$$ \text{Utility Matrix} \mapsto \text{Model} \mapsto \text{Suggestion} $$
- Model:
>- Content-based systems
>- Collaborative filtering

### Content-based systems
$$ \text{Utility Matrix} \mapsto \text{find } w_m \ (y_{mn} = w_m x_n + b_m) \mapsto \text{Suggestion}(w_m) $$
- $w_m$: weight vector of user $m$
- $x_n$: feature vector of movie $n$
- $b_m$: bias of user $m$
- $y_{mn}$: rating of user $m$ for movie $n$
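
As a minimal sketch of the model above, a predicted rating is the dot product of a user's weight vector with a movie's feature vector, plus a bias term. The numbers below are invented for illustration and are not learned from the MovieLens data.

```python
import numpy as np

# Hypothetical 3-feature example (all values invented for illustration).
x_n = np.array([0.0, 0.6, 0.8])   # feature vector of movie n
w_m = np.array([1.0, 2.0, -0.5])  # weight vector of user m
bias = 3.0                        # bias term

# Predicted rating of user m for movie n: w_m . x_n + bias
y_mn = float(w_m @ x_n + bias)
print(y_mn)
```

Fitting the model means choosing the weights and bias so that these predictions match the ratings the user has already given.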

In [161]:
# Load movie metadata (title, dates, URL, and 19 binary genre flags)
#----------------------------------------------#

import pandas as pd

item_cols = ['movie id', 'movie title' ,'release date','video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
 'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

items = pd.read_csv('./data/u.item', sep='|', names=item_cols, encoding='latin-1')

In [162]:
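# Keep only the last 19 columns (the genre flags) as a NumPy array.
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
items = items.to_numpy()[:, -19:]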

In [163]:
from sklearn.feature_extraction.text import TfidfTransformer

# Treat the genre flags as term counts and convert them to L2-normalized TF-IDF features.
transformer = TfidfTransformer(smooth_idf=True, norm='l2')
tfidf = transformer.fit_transform(items.tolist()).toarray()
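
For intuition about what the transformer produces, the sketch below reproduces smoothed IDF weighting and L2 row normalization with plain NumPy on a made-up 3x3 genre matrix. It assumes the formula documented by scikit-learn for `smooth_idf=True`, idf = ln((1 + n) / (1 + df)) + 1. The helper name `tfidf_l2` and the toy matrix are hypothetical.

```python
import numpy as np

def tfidf_l2(counts):
    """Smoothed TF-IDF with L2-normalized rows (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)               # number of rows containing each feature
    idf = np.log((1 + n_docs) / (1 + df)) + 1   # smoothed inverse document frequency
    weighted = counts * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                     # leave all-zero rows unchanged
    return weighted / norms

# Toy genre matrix: 3 movies x 3 genres (values invented for illustration).
toy = [[1, 0, 1],
       [0, 1, 1],
       [0, 0, 1]]
toy_tfidf = tfidf_l2(toy)
print(toy_tfidf)
```

A genre shared by every movie (the last column) gets a lower weight than a genre that appears in only one movie, so rare genres contribute more to a movie's feature vector.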

In [164]:
# Build the utility matrix: one row per user, one column per movie
#----------------------------------------------#

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']

ratings_base = pd.read_csv('./data/ua.base', sep='\t', names=r_cols, encoding='latin-1')
ratings = ratings_base.pivot_table(index=['user_id'], values=['rating'], columns='movie_id')

In [165]:
def get_movie_list(df, user_id):
    """Return zero-based indices of the movies rated by user_id."""
    user_m = df.loc[[user_id]].dropna(axis=1)
    # Columns are ('rating', movie_id) tuples; movie ids start at 1.
    return [movie_id - 1 for _, movie_id in user_m.columns.values]

In [176]:
# Fit one ridge regression per user to find W and b
#----------------------------------------------#

import numpy as np
from sklearn.linear_model import Ridge

d = tfidf.shape[1]                # number of features per movie
n_users = ratings.shape[0]
W = np.zeros((d, n_users))        # column i holds the weights of user i + 1
b = np.zeros((1, n_users))        # one bias per user
for i in range(n_users):
    user_id = i + 1               # user ids start at 1
    movie_list = get_movie_list(ratings, user_id)
    X = tfidf[movie_list, :]      # features of the movies this user rated
    y = ratings.loc[[user_id]].dropna(axis=1).values[0]  # the matching ratings
    rdg = Ridge(alpha=0.01, fit_intercept=True)
    rdg.fit(X, y)
    W[:, i] = rdg.coef_
    b[0, i] = rdg.intercept_
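
The final Suggestion step is not shown above. A minimal sketch follows, assuming predicted ratings are computed as features times weights plus bias and that the highest-scoring movies the user has not yet rated are suggested. The helper `recommend` and the toy arrays are hypothetical and only mirror the shapes of `tfidf`, `W`, and `b`.

```python
import numpy as np

def recommend(features, W, b, rated_idx, user_idx, top_k=2):
    """Return indices of the top_k unrated movies for one user (hypothetical helper)."""
    # Predicted rating for every movie: x_n . w_m + b_m
    scores = (features @ W[:, user_idx] + b[0, user_idx]).astype(float)
    scores[list(rated_idx)] = -np.inf   # skip movies the user already rated
    order = np.argsort(-scores)         # highest predicted rating first
    return order[:top_k].tolist()

# Toy data: 4 movies x 2 features, 1 user (values invented for illustration).
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.6, 0.8],
                     [0.8, 0.6]])
W_toy = np.array([[2.0], [1.0]])   # shape (d, n_users), same layout as W
b_toy = np.array([[1.0]])          # shape (1, n_users), same layout as b

suggested = recommend(features, W_toy, b_toy, rated_idx=[0], user_idx=0)
print(suggested)
```

With the arrays from this notebook, the call for the first user would look like `recommend(tfidf, W, b, get_movie_list(ratings, 1), user_idx=0)`.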