# MovieLens Movie Recommendation

Movielens?

Assumption 1: interpret each star rating as a **view count**.  
Assumption 2: a rating **below 3** means the user **does not like** the movie.

In [1]:
import pandas as pd
import os
import numpy as np
import random
import seaborn as sns
from scipy.sparse import csr_matrix
from implicit.als import AlternatingLeastSquares

## 1. Data preparation and preprocessing

Loading the data

(1) **ratings.dat**

In [2]:
rating_file_path=os.getenv('HOME') + '/aiffel/recommendata_iu/data/ml-1m/ratings.dat'
ratings_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv(rating_file_path, sep='::', names=ratings_cols, engine='python', encoding = "ISO-8859-1")
orginal_data_size = len(ratings)
ratings

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291
...,...,...,...,...
1000204,6040,1091,1,956716541
1000205,6040,1094,5,956704887
1000206,6040,562,5,956704746
1000207,6040,1096,4,956715648


Check for missing values

In [3]:
ratings.isnull().sum()

user_id      0
movie_id     0
rating       0
timestamp    0
dtype: int64

Pick an arbitrary user (here, the user of the first row) and inspect their ratings

In [4]:
condition = (ratings['user_id']==ratings.loc[0,'user_id'])
ratings.loc[condition]

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291
5,1,1197,3,978302268
6,1,1287,5,978302039
7,1,2804,5,978300719
8,1,594,4,978302268
9,1,919,4,978301368


Per Assumption 2, keep only ratings of **3 or higher**.

In [5]:
# Keep only ratings of 3 or higher.
ratings = ratings[ratings['rating']>=3]
filtered_data_size = len(ratings)

print(f'orginal_data_size: {orginal_data_size}, filtered_data_size: {filtered_data_size}')
print(f'Ratio of Remaining Data is {filtered_data_size / orginal_data_size:.2%}')

orginal_data_size: 1000209, filtered_data_size: 836478
Ratio of Remaining Data is 83.63%


A closer look at **ratings.dat**

Number of unique movies and user IDs

In [6]:
ratings['movie_id'].nunique()

3628

In [7]:
ratings['user_id'].nunique()

6039

Per Assumption 1, treat rating as a **count**.

In [8]:
# Rename the rating column to count.
ratings.rename(columns={'rating':'count'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
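
The `SettingWithCopyWarning` above appears because `ratings` was produced by a boolean filter and then renamed in place. One common way to avoid it is to take an explicit `.copy()` right after filtering. The sketch below shows this on a toy frame; `toy`, `kept`, and all values are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the ratings table (values are invented for illustration).
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'movie_id': [10, 20, 10, 30],
    'rating': [5, 2, 3, 1],
})

# .copy() makes the filtered frame an independent object, so the in-place
# rename below is not treated as a modification of a slice of `toy`.
kept = toy[toy['rating'] >= 3].copy()
kept.rename(columns={'rating': 'count'}, inplace=True)

print(kept.columns.tolist())
```

The original `toy` frame keeps its `rating` column; only `kept` is renamed.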


In [9]:
ratings['count']

0          5
1          3
2          3
3          4
4          5
          ..
1000203    3
1000205    5
1000206    5
1000207    4
1000208    4
Name: count, Length: 836478, dtype: int64

#### timestamp?

The time at which the user rated the item.

In [10]:
pd.to_datetime(ratings['timestamp'][0], unit='s')

Timestamp('2000-12-31 22:12:40')

In [11]:
pd.to_datetime(ratings['timestamp'][10000], unit='s')

Timestamp('2000-12-27 00:46:02')

In [12]:
pd.to_datetime(1046454590, unit='s')

Timestamp('2003-02-28 17:49:50')
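
The timestamps are Unix epoch seconds, which is why `unit='s'` is passed to `pd.to_datetime`. As a cross-check, the same conversion can be sketched with the standard library alone; `epoch_to_utc` is a hypothetical helper name, and the two inputs are values shown in the cells above.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

first_rating = epoch_to_utc(978300760)    # timestamp of row 0
latest_rating = epoch_to_utc(1046454590)  # largest timestamp in the data

print(first_rating.strftime('%Y-%m-%d %H:%M:%S'))
print(latest_rating.strftime('%Y-%m-%d %H:%M:%S'))
```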

#### Pick random timestamps between 956703932 and 1046454590.

In [13]:
ratings['timestamp'].sort_values(ascending=False)

825603     1046454590
825731     1046454548
825724     1046454548
825438     1046454443
825526     1046454320
              ...    
1000192     956703977
1000007     956703977
999873      956703954
1000153     956703954
1000138     956703932
Name: timestamp, Length: 836478, dtype: int64

___

(2) **movies.dat**

In [14]:
# Load the movie metadata so we can see the titles.
movie_file_path=os.getenv('HOME') + '/aiffel/recommendata_iu/data/ml-1m/movies.dat'
cols = ['movie_id', 'title', 'genre'] 
movies = pd.read_csv(movie_file_path, sep='::', names=cols, engine='python', encoding='ISO-8859-1')
movies.head()

Unnamed: 0,movie_id,title,genre
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


Genre values (each entry is a `|`-joined combination of genres)

In [15]:
set(movies['genre'])

{'Action',
 'Action|Adventure',
 'Action|Adventure|Animation',
 "Action|Adventure|Animation|Children's|Fantasy",
 'Action|Adventure|Animation|Horror|Sci-Fi',
 "Action|Adventure|Children's",
 "Action|Adventure|Children's|Comedy",
 "Action|Adventure|Children's|Fantasy",
 "Action|Adventure|Children's|Sci-Fi",
 'Action|Adventure|Comedy',
 'Action|Adventure|Comedy|Crime',
 'Action|Adventure|Comedy|Horror',
 'Action|Adventure|Comedy|Horror|Sci-Fi',
 'Action|Adventure|Comedy|Romance',
 'Action|Adventure|Comedy|Sci-Fi',
 'Action|Adventure|Comedy|War',
 'Action|Adventure|Crime',
 'Action|Adventure|Crime|Drama',
 'Action|Adventure|Crime|Thriller',
 'Action|Adventure|Drama',
 'Action|Adventure|Drama|Romance',
 'Action|Adventure|Drama|Sci-Fi|War',
 'Action|Adventure|Drama|Thriller',
 'Action|Adventure|Fantasy',
 'Action|Adventure|Fantasy|Sci-Fi',
 'Action|Adventure|Horror',
 'Action|Adventure|Horror|Thriller',
 'Action|Adventure|Mystery',
 'Action|Adventure|Mystery|Sci-Fi',
 'Action|Adventure|Roma

Convert 'title' and 'genre' to **lowercase** to make lookups easier

In [16]:
movies['title'] = movies['title'].str.lower()
movies['genre'] = movies['genre'].str.lower()
movies

Unnamed: 0,movie_id,title,genre
0,1,toy story (1995),animation|children's|comedy
1,2,jumanji (1995),adventure|children's|fantasy
2,3,grumpier old men (1995),comedy|romance
3,4,waiting to exhale (1995),comedy|drama
4,5,father of the bride part ii (1995),comedy
...,...,...,...
3878,3948,meet the parents (2000),comedy
3879,3949,requiem for a dream (2000),drama
3880,3950,tigerland (2000),drama
3881,3951,two family house (2000),drama


Searching for a movie

An exact match requires typing the release year as well, which is inconvenient.

In [17]:
movies.loc[movies['title'] == 'toy story 2 (1999)']

Unnamed: 0,movie_id,title,genre
3045,3114,toy story 2 (1999),animation|children's|comedy


In [18]:
movies.loc[movies['title'].str.contains('toy story')]

Unnamed: 0,movie_id,title,genre
0,1,toy story (1995),animation|children's|comedy
3045,3114,toy story 2 (1999),animation|children's|comedy
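
Since every title carries a trailing release year, one possible way around the inconvenience is to split the year off once and look movies up by bare name. This is a minimal sketch and not something the notebook does; `split_title_year` and `by_name` are hypothetical names, and it assumes the year always appears as a four-digit suffix in parentheses.

```python
import re

# Matches a trailing release year such as " (1999)" at the end of a title.
_YEAR_SUFFIX = re.compile(r'\s*\((\d{4})\)\s*$')

def split_title_year(title):
    """Return (name, year); year is None when no trailing year is found."""
    match = _YEAR_SUFFIX.search(title)
    if match is None:
        return title.strip(), None
    return title[:match.start()].strip(), int(match.group(1))

titles = ['toy story (1995)', 'toy story 2 (1999)', 'seven (se7en) (1995)']

# Map the bare name back to the full title used in the dataset.
by_name = {split_title_year(t)[0]: t for t in titles}
print(by_name['toy story 2'])
```

Parenthesised alternative titles such as `(se7en)` are left untouched because only a four-digit group at the very end is treated as the year.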


#### Searching for my favorite movies

The counts are also set arbitrarily.

1. Seven (1995) - id : 47, count = 5

In [19]:
movies.loc[movies['title'].str.contains('seven')]

Unnamed: 0,movie_id,title,genre
46,47,seven (se7en) (1995),crime|thriller
590,594,snow white and the seven dwarfs (1937),animation|children's|musical
1218,1237,"seventh seal, the (sjunde inseglet, det) (1957)",drama
1576,1619,seven years in tibet (1997),drama|war
1825,1894,six days seven nights (1998),adventure|comedy|romance
1950,2019,seven samurai (the magnificent seven) (shichin...,action|drama
1994,2063,seventh heaven (le septième ciel) (1997),drama|romance
2145,2214,number seventeen (1932),thriller
2169,2238,seven beauties (pasqualino settebellezze) (1976),comedy|drama
2194,2263,"seventh sign, the (1988)",thriller


2. Mission: Impossible (1996) - id : 648, count = 5

In [20]:
movies.loc[movies['title'].str.contains('mission')]

Unnamed: 0,movie_id,title,genre
642,648,mission: impossible (1996),action|adventure|mystery
2676,2745,"mission, the (1986)",drama
3285,3354,mission to mars (2000),sci-fi
3554,3623,mission: impossible 2 (2000),action|thriller
3817,3887,went to coney island on a mission from god... ...,drama


3. Toy Story (1995) - id : 1, count = 5

In [21]:
movies.loc[movies['title'].str.contains('toy story')]

Unnamed: 0,movie_id,title,genre
0,1,toy story (1995),animation|children's|comedy
3045,3114,toy story 2 (1999),animation|children's|comedy


4. The Silence of the Lambs (1991) - id : 593, count = 5

In [22]:
movies.loc[movies['title'].str.contains('lambs')]

Unnamed: 0,movie_id,title,genre
589,593,"silence of the lambs, the (1991)",drama|thriller


5. Fight Club (1999) - id : 2959, count = 4

In [23]:
movies.loc[movies['title'].str.contains('fight')]

Unnamed: 0,movie_id,title,genre
389,393,street fighter (1994),action
1840,1909,"x-files: fight the future, the (1998)",mystery|sci-fi|thriller
2890,2959,fight club (1999),drama
3574,3643,"fighting seabees, the (1944)",action|drama|war
3845,3915,girlfight (2000),drama


(3) **ratings + movies**

In [24]:
data = ratings.join(movies.set_index('movie_id'), on = 'movie_id')
data

Unnamed: 0,user_id,movie_id,count,timestamp,title,genre
0,1,1193,5,978300760,one flew over the cuckoo's nest (1975),drama
1,1,661,3,978302109,james and the giant peach (1996),animation|children's|musical
2,1,914,3,978301968,my fair lady (1964),musical|romance
3,1,3408,4,978300275,erin brockovich (2000),drama
4,1,2355,5,978824291,"bug's life, a (1998)",animation|children's|comedy
...,...,...,...,...,...,...
1000203,6040,1090,3,956715518,platoon (1986),drama|war
1000205,6040,1094,5,956704887,"crying game, the (1992)",drama|romance|war
1000206,6040,562,5,956704746,welcome to the dollhouse (1995),comedy|drama
1000207,6040,1096,4,956715648,sophie's choice (1982),drama


Check whether the join introduced any missing values

In [25]:
data.isnull().sum()

user_id      0
movie_id     0
count        0
timestamp    0
title        0
genre        0
dtype: int64

#### Adding my favorite movies to the recommender's input data

In [26]:
random.seed(20210818)

my_favorite = [47,648,1,593,2959] # Seven, Mission: Impossible, Toy Story, The Silence of the Lambs, Fight Club
#my_favorite = list(map(int,my_favorite))
my_time = []
for i in range(5):
    my_time.append(random.randint(956703932,1046454590))
    
my_title = [movies.loc[46,'title'],movies.loc[642,'title'],movies.loc[0,'title'],movies.loc[589,'title'],movies.loc[2890,'title']]
my_genre = [movies.loc[46,'genre'],movies.loc[642,'genre'],movies.loc[0,'genre'],movies.loc[589,'genre'],movies.loc[2890,'genre']]



#my_time = list(map(int,my_time))
my_list = pd.DataFrame({'user_id' : ['js']*5,'movie_id' : my_favorite, 'count' : [5,5,5,5,4] , 'timestamp' : my_time, 'title' : \
                       my_title, 'genre' : my_genre})

data_concat = pd.concat([data,my_list], ignore_index = True)
data_concat

Unnamed: 0,user_id,movie_id,count,timestamp,title,genre
0,1,1193,5,978300760,one flew over the cuckoo's nest (1975),drama
1,1,661,3,978302109,james and the giant peach (1996),animation|children's|musical
2,1,914,3,978301968,my fair lady (1964),musical|romance
3,1,3408,4,978300275,erin brockovich (2000),drama
4,1,2355,5,978824291,"bug's life, a (1998)",animation|children's|comedy
...,...,...,...,...,...,...
836478,js,47,5,999067212,seven (se7en) (1995),crime|thriller
836479,js,648,5,1006951044,mission: impossible (1996),action|adventure|mystery
836480,js,1,5,982841983,toy story (1995),animation|children's|comedy
836481,js,593,5,994365191,"silence of the lambs, the (1991)",drama|thriller


Which **movies** are the most popular?

In [27]:
movie_count = data_concat.groupby('title')['user_id'].count().sort_values(ascending=False).head(30)
movie_count

title
american beauty (1999)                                   3211
star wars: episode iv - a new hope (1977)                2910
star wars: episode v - the empire strikes back (1980)    2885
star wars: episode vi - return of the jedi (1983)        2716
saving private ryan (1998)                               2561
terminator 2: judgment day (1991)                        2509
silence of the lambs, the (1991)                         2499
raiders of the lost ark (1981)                           2473
back to the future (1985)                                2460
matrix, the (1999)                                       2434
jurassic park (1993)                                     2413
sixth sense, the (1999)                                  2385
fargo (1996)                                             2371
braveheart (1995)                                        2314
men in black (1997)                                      2297
schindler's list (1993)                                  2257
pr

## 2. Building the model

Indexing user_id and the movie title

In [28]:
user_unique = data_concat['user_id'].unique()
movie_unique = data_concat['title'].unique()

# Map each user and each movie title to a contiguous integer index.
user_to_idx = {v:k for k,v in enumerate(user_unique)}
movie_to_idx = {v:k for k,v in enumerate(movie_unique)}
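
The two dictionaries above assign each distinct value a position in order of first appearance. The same idea on a tiny list, as a sketch: the `toy_` names and sample IDs are illustrative, and `dict.fromkeys` stands in for pandas `unique()` because both keep first-appearance order.

```python
# Raw IDs may mix types, as in the notebook where the new user is 'js'.
raw_user_ids = [1, 1, 2, 6040, 'js', 2]

# Distinct values in order of first appearance.
unique_ids = list(dict.fromkeys(raw_user_ids))

# Forward map (id -> index) and its inverse (index -> id).
toy_user_to_idx = {uid: idx for idx, uid in enumerate(unique_ids)}
toy_idx_to_user = {idx: uid for uid, idx in toy_user_to_idx.items()}

# Encode the original sequence as contiguous integers.
encoded = [toy_user_to_idx[uid] for uid in raw_user_ids]
print(encoded)
```

Because the indices run from 0 to the number of distinct values minus one with no gaps, they can be used directly as row or column positions in a matrix.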

In [29]:
# Replace column values with their integer indices.
# For the dictionary get method, see https://wikidocs.net/16.

# Use user_to_idx.get to build a Series with every user_id mapped to its index.
# Any row that failed to map would become NaN, so drop those rows with dropna().
temp_user_data = data_concat['user_id'].map(user_to_idx.get).dropna()
if len(temp_user_data) == len(data_concat):   # every row was indexed successfully
    print('user_id column indexing OK!!')
    data_concat['user_id'] = temp_user_data   # replace data_concat['user_id'] with the indexed Series
else:
    print('user_id column indexing Fail!!')

# Index the title column the same way, using movie_to_idx.
temp_movie_data = data_concat['title'].map(movie_to_idx.get).dropna()
if len(temp_movie_data) == len(data_concat):
    print('movie column indexing OK!!')
    data_concat['title'] = temp_movie_data
else:
    print('movie column indexing Fail!!')

data_concat

user_id column indexing OK!!
movie column indexing OK!!


Unnamed: 0,user_id,movie_id,count,timestamp,title,genre
0,0,1193,5,978300760,0,drama
1,0,661,3,978302109,1,animation|children's|musical
2,0,914,3,978301968,2,musical|romance
3,0,3408,4,978300275,3,drama
4,0,2355,5,978824291,4,animation|children's|comedy
...,...,...,...,...,...,...
836478,6039,47,5,999067212,220,crime|thriller
836479,6039,648,5,1006951044,58,action|adventure|mystery
836480,6039,1,5,982841983,40,animation|children's|comedy
836481,6039,593,5,994365191,121,drama|thriller


#### movie_id goes up to 3952 with gaps in between, so the sparse matrix uses the indexed title column instead

In [30]:
data_concat['movie_id'].sort_values(ascending=False)

787690    3952
136158    3952
298386    3952
21444     3952
111948    3952
          ... 
47471        1
60132        1
592151       1
665563       1
99129        1
Name: movie_id, Length: 836483, dtype: int64

In [31]:
data_concat['title'].sort_values(ascending=False)

823157    3627
822167    3626
812520    3625
812443    3624
811989    3623
          ... 
264203       0
462274       0
58421        0
4878         0
0            0
Name: title, Length: 836483, dtype: int64

Building the **csr_matrix**

In [32]:
num_user = data_concat['user_id'].nunique()
num_movie = data_concat['title'].nunique()

csr_data = csr_matrix((data_concat['count'], (data_concat['user_id'], data_concat['title'])), shape= (num_user, num_movie))
csr_data

<6040x3628 sparse matrix of type '<class 'numpy.longlong'>'
	with 836483 stored elements in Compressed Sparse Row format>
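
The matrix is 6040 x 3628 but stores only 836483 entries, which is what the Compressed Sparse Row (CSR) format is for: it keeps the non-zero values, their column indices, and one pointer per row. The sketch below builds those three arrays by hand for a tiny example. It illustrates the layout only and is not SciPy's implementation; `build_csr` and `row_entries` are hypothetical helpers.

```python
def build_csr(triples, n_rows):
    """Build CSR arrays (data, indices, indptr) from (row, col, value) triples."""
    data, indices = [], []
    indptr = [0] * (n_rows + 1)
    for row, col, value in sorted(triples):  # row-major order
        data.append(value)
        indices.append(col)
        indptr[row + 1] += 1                 # count entries per row
    for r in range(n_rows):                  # prefix sum turns counts into offsets
        indptr[r + 1] += indptr[r]
    return data, indices, indptr

def row_entries(data, indices, indptr, row):
    """Return the (col, value) pairs stored for one row."""
    start, end = indptr[row], indptr[row + 1]
    return list(zip(indices[start:end], data[start:end]))

# Three users, three movies; user 1 has no ratings at all.
triples = [(0, 0, 5), (2, 1, 4), (0, 2, 3)]
data, indices, indptr = build_csr(triples, n_rows=3)
print(data, indices, indptr)
print(row_entries(data, indices, indptr, 0))
```

An empty row costs nothing beyond one repeated offset in `indptr`, so storage grows with the number of ratings rather than with users times movies.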

___

#### Training the MF model

In [33]:
# Initial setup
os.environ['OPENBLAS_NUM_THREADS']='1'
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ['MKL_NUM_THREADS']='1'

Defining the **ALS** model

In [34]:
# Instantiate implicit's AlternatingLeastSquares model.
als_model = AlternatingLeastSquares(factors=100, regularization=0.01, use_gpu=False, iterations=15, dtype=np.float32)

In [35]:
# The ALS model used here expects an (item x user) matrix as input, so transpose the data.
csr_data_transpose = csr_data.T
csr_data_transpose

<3628x6040 sparse matrix of type '<class 'numpy.longlong'>'
	with 836483 stored elements in Compressed Sparse Column format>

In [36]:
# Train the model.
als_model.fit(csr_data_transpose)

  0%|          | 0/15 [00:00<?, ?it/s]

#### Checking the relationship between me and Seven

In [37]:
js, seven  = user_to_idx['js'], movie_to_idx['seven (se7en) (1995)']
js_vector, seven_vector = als_model.user_factors[js], als_model.item_factors[seven]

In [38]:
js_vector

array([ 0.65339255,  0.2552523 ,  0.20738667, -0.31959096,  0.20561974,
       -0.61191803,  0.11937266,  0.0926216 , -0.58485126,  0.0114001 ,
        1.0832313 ,  0.11104604, -0.2255594 ,  0.5659285 , -0.14490148,
        1.5360148 ,  0.9097392 , -0.60831606,  0.11688108, -0.46472138,
        0.86343515, -0.9632065 ,  0.07667996, -0.60944307, -0.08593376,
        1.0740038 ,  0.08456306, -0.11546434,  0.4726141 ,  0.03235072,
       -0.69715303,  0.04613259, -0.53400934,  0.9355613 , -0.40277112,
        0.43592823, -0.23528332, -0.40413335, -0.55324715,  0.35132855,
       -0.19609128,  0.670551  ,  0.06539285, -0.42723542,  0.11248975,
       -0.2713388 , -0.09370007,  0.021828  , -1.0400947 ,  0.4904608 ,
        0.5654408 ,  0.10320243,  0.74106497,  0.5285577 , -0.16218825,
        0.19950102,  0.51837033,  0.72512835,  0.0542467 , -0.3167928 ,
       -0.03958422, -0.03597993, -0.4918951 , -0.24410969,  0.17856981,
        0.35324585,  0.07575265, -0.02583706, -0.42823482,  0.73

In [39]:
seven_vector

array([ 0.0261443 ,  0.03236723,  0.01171583, -0.02021266,  0.00214848,
        0.0025777 , -0.01838938,  0.01563251, -0.02535387,  0.00176518,
        0.02020922,  0.0023605 ,  0.007887  ,  0.0145974 ,  0.01186926,
        0.03407011,  0.00792765,  0.01776448,  0.00616112, -0.00549287,
        0.03567027,  0.00102637,  0.01390664,  0.01513257,  0.00700274,
       -0.00425369, -0.02375882,  0.00617185,  0.00472735,  0.00955537,
       -0.00764001,  0.00373437, -0.01280488,  0.04407841, -0.01761988,
        0.00721643, -0.01480863, -0.00193471,  0.00545363, -0.00039025,
        0.00788868,  0.01853045,  0.01740216,  0.0031428 ,  0.01482856,
        0.01653221, -0.01562303,  0.01413015, -0.0239438 ,  0.0357586 ,
       -0.01021716, -0.01586837,  0.02674948,  0.01313429,  0.00332676,
        0.00744464,  0.01008658,  0.00645798,  0.00379093,  0.00632642,
        0.0076279 , -0.01018122, -0.00041851, -0.01237101,  0.01790199,
        0.02786931,  0.00068459,  0.02240548, -0.03545968,  0.01

#### The score is lower than expected

In [40]:
np.dot(js_vector, seven_vector)

0.42019188

In [41]:
mi = movie_to_idx['mission: impossible (1996)']
mi_vector = als_model.item_factors[mi]
np.dot(js_vector, mi_vector)

0.38319805

In [42]:
toystory = movie_to_idx['toy story (1995)']
toystory_vector = als_model.item_factors[toystory]
np.dot(js_vector, toystory_vector)

0.4416934

In [43]:
silent = movie_to_idx['silence of the lambs, the (1991)']
silent_vector = als_model.item_factors[silent]
np.dot(js_vector, silent_vector)

0.54024845

In [44]:
fight = movie_to_idx['fight club (1999)']
fight_vector = als_model.item_factors[fight]
np.dot(js_vector, fight_vector)

0.44838643

A movie I did not pick

In [45]:
fairlady = movie_to_idx['my fair lady (1964)']
fairlady_vector = als_model.item_factors[fairlady]
np.dot(js_vector, fairlady_vector)

-0.029957814
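
Each score above is the inner product of the user's latent vector and one movie's latent vector: the more the two point in the same direction, the larger the predicted preference. Here is the same computation on made-up three-dimensional vectors (the real model uses `factors=100`); all names and values are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

user_vec = [0.5, -0.25, 1.0]

# An item whose factors line up with the user's, and one whose factors do not.
aligned_item = [0.5, 0.0, 0.25]
opposed_item = [-0.5, 0.5, 0.0]

print(dot(user_vec, aligned_item))
print(dot(user_vec, opposed_item))
```

A positive score for the aligned item and a negative one for the opposed item mirrors the gap between the five favorites and My Fair Lady above.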

Let's look at movies **similar** to **The Silence of the Lambs**, which had the highest score.

In [46]:
favorite_movie = 'silence of the lambs, the (1991)'
movie_id = movie_to_idx[favorite_movie]
similar_movie = als_model.similar_items(movie_id, N=15)
similar_movie

[(121, 1.0),
 (157, 0.8070394),
 (51, 0.7600723),
 (222, 0.6761469),
 (23, 0.5781125),
 (269, 0.51910645),
 (472, 0.5083644),
 (38, 0.504795),
 (248, 0.4946388),
 (233, 0.4859819),
 (224, 0.38698718),
 (3517, 0.37803966),
 (220, 0.36682847),
 (48, 0.34531134),
 (289, 0.3439058)]

In [47]:
# Invert movie_to_idx to get a dict that maps an index back to a movie title.
idx_to_movie = {v:k for k,v in movie_to_idx.items()}
[idx_to_movie[i[0]] for i in similar_movie]

['silence of the lambs, the (1991)',
 'shawshank redemption, the (1994)',
 'fargo (1996)',
 'pulp fiction (1994)',
 "schindler's list (1993)",
 'goodfellas (1990)',
 'sling blade (1996)',
 'sixth sense, the (1999)',
 'good will hunting (1997)',
 'usual suspects, the (1995)',
 'l.a. confidential (1997)',
 'paralyzing fear: the story of polio in america, a (1998)',
 'seven (se7en) (1995)',
 'saving private ryan (1998)',
 'reservoir dogs (1992)']
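
The query movie comes back first with a score of exactly 1.0, which is what a cosine-style similarity between item factor vectors would give. The sketch below ranks items that way on toy two-dimensional factors. The cosine reading is an assumption about how the scores are computed, not something the notebook confirms, and `item_factors` and `most_similar` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up item factors: two thrillers that point the same way, one animation.
item_factors = {
    'thriller_a': [1.0, 0.0],
    'thriller_b': [0.8, 0.6],
    'animation': [0.0, 1.0],
}

def most_similar(name, n=3):
    """Rank all items (including the query itself) by similarity to `name`."""
    query = item_factors[name]
    scored = [(other, cosine(query, vec)) for other, vec in item_factors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

for title, score in most_similar('thriller_a'):
    print(title, round(score, 3))
```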

#### Checking with als_model's similar_items

In [48]:
# Return the titles of movies similar to the given one.
def get_similar_movie(movie_title: str):
    movie_id = movie_to_idx[movie_title]
    similar_movie = als_model.similar_items(movie_id)
    similar_movie = [idx_to_movie[i[0]] for i in similar_movie]
    return similar_movie

#### The results are indeed similar movies.

In [49]:
get_similar_movie('silence of the lambs, the (1991)')

['silence of the lambs, the (1991)',
 'shawshank redemption, the (1994)',
 'fargo (1996)',
 'pulp fiction (1994)',
 "schindler's list (1993)",
 'goodfellas (1990)',
 'sling blade (1996)',
 'sixth sense, the (1999)',
 'good will hunting (1997)',
 'usual suspects, the (1995)']

In [50]:
get_similar_movie('toy story (1995)')

['toy story (1995)',
 'toy story 2 (1999)',
 'aladdin (1992)',
 "bug's life, a (1998)",
 'babe (1995)',
 'groundhog day (1993)',
 'lion king, the (1994)',
 "there's something about mary (1998)",
 'beauty and the beast (1991)',
 'pleasantville (1998)']

Getting recommendations for movies I might like

In [51]:
user = user_to_idx['js']
# recommend takes the user x item CSR matrix.
movie_recommended = als_model.recommend(user, csr_data, N=20, filter_already_liked_items=True)  # filter_already_liked_items=True excludes items the user has already rated
movie_recommended

[(38, 0.49172634),
 (222, 0.40203747),
 (233, 0.39003184),
 (50, 0.37959346),
 (157, 0.3513489),
 (51, 0.32742053),
 (289, 0.31263992),
 (248, 0.28534436),
 (224, 0.28497958),
 (472, 0.27255216),
 (99, 0.24678871),
 (124, 0.24081089),
 (119, 0.23997086),
 (23, 0.23957959),
 (269, 0.22625375),
 (317, 0.21374924),
 (826, 0.2114458),
 (138, 0.20659842),
 (110, 0.20556238),
 (221, 0.19980812)]

Many of the results are thrillers and crime dramas of the kind I **like**, and a Pixar film (Toy Story 2) shows up as well!

In [52]:
[idx_to_movie[i[0]] for i in movie_recommended]

['sixth sense, the (1999)',
 'pulp fiction (1994)',
 'usual suspects, the (1995)',
 'toy story 2 (1999)',
 'shawshank redemption, the (1994)',
 'fargo (1996)',
 'reservoir dogs (1992)',
 'good will hunting (1997)',
 'l.a. confidential (1997)',
 'sling blade (1996)',
 'american beauty (1999)',
 'matrix, the (1999)',
 'twister (1996)',
 "schindler's list (1993)",
 'goodfellas (1990)',
 'twelve monkeys (1995)',
 'game, the (1997)',
 'true lies (1994)',
 'groundhog day (1993)',
 'south park: bigger, longer and uncut (1999)']
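
Conceptually, a recommendation like the one above scores every movie against the user's latent vector, drops the movies the user has already rated, and keeps the top N. This is a minimal sketch of that idea on toy data. It is not implicit's actual implementation, and `recommend_top_n` and all values are made up.

```python
def recommend_top_n(user_vec, item_factors, already_liked, n=2):
    """Score unseen items by inner product and return the n best."""
    scores = {
        item: sum(a * b for a, b in zip(user_vec, vec))
        for item, vec in item_factors.items()
        if item not in already_liked          # mimic filter_already_liked_items=True
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:n]

toy_user_vec = [1.0, 0.5]
toy_item_factors = {
    'a': [1.0, 0.0],    # score 1.0, but already rated by the user
    'b': [0.5, 0.5],    # score 0.75
    'c': [0.0, 1.0],    # score 0.5
    'd': [-1.0, 0.0],   # score -1.0
}

top = recommend_top_n(toy_user_vec, toy_item_factors, already_liked={'a'})
print(top)
```

Item `a` would rank first on score alone, but it is filtered out because the user already rated it, just as the five favorites do not appear in the list above.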

#### What is the basis for ranking The Sixth Sense first?

In [53]:
sixth_sense = movie_to_idx['sixth sense, the (1999)']
explain = als_model.explain(user, csr_data, itemid=sixth_sense)

The thrillers The Silence of the Lambs and Seven contribute the most!

In [54]:
[(idx_to_movie[i[0]], i[1]) for i in explain[1]]

[('silence of the lambs, the (1991)', 0.19547314401192353),
 ('seven (se7en) (1995)', 0.12703118704612895),
 ('fight club (1999)', 0.11099986086924336),
 ('toy story (1995)', 0.05423168803041139),
 ('mission: impossible (1996)', -0.005228031954165099)]
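
The explanation can be read as a breakdown of the score into per-movie contributions. Adding up the five values printed above gives about 0.48, close to but not identical to the 0.4917 that `recommend` reported for The Sixth Sense. The snippet below only does that arithmetic on the printed numbers and makes no claim about where the small gap comes from.

```python
# Contributions copied from the explain output above.
contributions = [
    ('silence of the lambs, the (1991)', 0.19547314401192353),
    ('seven (se7en) (1995)', 0.12703118704612895),
    ('fight club (1999)', 0.11099986086924336),
    ('toy story (1995)', 0.05423168803041139),
    ('mission: impossible (1996)', -0.005228031954165099),
]

total = sum(weight for _, weight in contributions)
top_title, top_weight = max(contributions, key=lambda pair: pair[1])

print(f'sum of contributions: {total:.4f}')
print(f'largest contributor: {top_title} ({top_weight / total:.0%} of the sum)')
```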

___

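# Summary

- I built the movie recommendation the same way as the earlier artist recommendation exercise.

Working through the node itself was not very difficult, but new recommender-system concepts such as the CSR matrix and MAE were hard to absorb.

AlternatingLeastSquares is well implemented, so a little preprocessing with pandas was enough to get the results I wanted.

My top 5 movies were Seven, The Silence of the Lambs, Fight Club, Toy Story, and Mission: Impossible. Perhaps because these picks have little in common, their inner products with my preference vector were not large (roughly 0.38 to 0.54). Still, My Fair Lady, a completely different kind of movie that I did not pick, gave a negative inner product (about -0.03), so the model does seem to have been trained properly.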