# [Project 4] Applying and Evaluating a Recommendation Algorithm on SKB Btv Movie Data


---


## Project Goals
---
- Apply a latent factor collaborative filtering recommendation algorithm.
- Measure the algorithm's performance with recommendation evaluation metrics (mAP, entropy diversity, nDCG).


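## Project Contents
---

1. **Load the data:** Load the four CSV files.

2. **Preprocess the data:** Preprocess the data so the recommendation algorithm can be applied.

3. **Apply the recommendation algorithm:** Apply latent factor collaborative filtering.

4. **Evaluate the recommendation results:** Evaluate the predictions with mAP, Entropy Diversity, and nDCG.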


## Project Overview
---

Using SKB Btv viewing data, we apply a latent factor collaborative filtering recommendation algorithm and then evaluate its predictions with several metrics (mAP, Entropy Diversity, nDCG).

## 1. Load the Data
---

Load the movie, views, question, and test data as pandas DataFrames.

In [None]:
# Import pandas
import pandas as pd

# Load the data (header=None means the first line is not used as column names; it is read as an ordinary row)
df_movie = pd.read_csv('/mnt/data/chapter_4/MYSUNI_MOVIES.csv', header=None)
df_views = pd.read_csv('/mnt/data/chapter_4/MYSUNI_VIEWS.csv', header=None)
df_question = pd.read_csv('/mnt/data/chapter_4/MYSUNI_QUESTION.csv', header=None)
df_test = pd.read_csv('/mnt/data/chapter_4/MYSUNI_TEST.csv', header=None)



## 2. Preprocess the Data
---

Preprocess the loaded data into the form we need.

Preprocess the `movie data`.

In [None]:
# Name the movie data columns, drop the first row, and cast MOVIE_ID to int
df_movie.columns = ['MOVIE_ID', 'TITLE', 'RELEASE_MONTH']
df_movie.drop(0, inplace=True)
df_movie = df_movie.astype({'MOVIE_ID': 'int'})

Preprocess the `answer data`.

In [None]:
# Name the answer data columns and drop the first row
df_test.columns = ['USER_ID', 'MOVIE_ID', 'TITLE']
df_test.drop(0, inplace=True)

# Cast MOVIE_ID and USER_ID in the answer data to int
df_test = df_test.astype({'MOVIE_ID': 'int'})
df_test = df_test.astype({'USER_ID': 'int'})

# Drop the TITLE column and add a DURATION column filled with 1
df_test.drop('TITLE', axis=1, inplace=True)
df_test['DURATION']=1

Because of server memory limits, this example uses only the first 1,000,000 rows of the data.

**Tips**: The `head(n)` method returns the first n rows.

In [None]:
# Use only the first 1,000,000 rows because of memory limits
df_test = df_test.head(1000000)

In [None]:
# Inspect the answer data
df_test

Unnamed: 0,USER_ID,MOVIE_ID,DURATION
1,0,1892,1
2,0,3082,1
3,0,3720,1
4,0,7938,1
5,0,8480,1
...,...,...,...
999996,35163,10894,1
999997,35163,10909,1
999998,35163,13336,1
999999,35163,14381,1


You can confirm that 1,000,000 rows of answer data remain.

Apply the required preprocessing to the `question data`.

In [None]:
# Name the question data columns, drop the first row, and cast every column to int
df_question.columns = ['USER_ID', 'MOVIE_ID', 'DURATION', 'WATCH_DAY', 'WATCH_SEQ']
df_question.drop(0, inplace=True)
df_question = df_question.astype('int')

Compute the rating value from `DURATION`.

In [None]:
# Set DURATION to 0 when the watch time is under 10 minutes
df_question.loc[df_question.DURATION<10, 'DURATION']=0
# Set every remaining (nonzero) value to 1
df_question.loc[df_question.DURATION>0, 'DURATION']=1
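
The two `.loc` assignments amount to a single threshold test, which can be easier to read as one comparison. The sketch below illustrates that on made-up data; the `toy` frame, its values, and the `RATING` column name are hypothetical and not part of the SKB dataset.

```python
import pandas as pd

# Hypothetical watch durations (not taken from the real dataset)
toy = pd.DataFrame({'USER_ID': [1, 1, 2, 3],
                    'MOVIE_ID': [10, 11, 10, 12],
                    'DURATION': [0, 9, 10, 95]})

# Same rule as the two .loc assignments: below 10 -> 0, otherwise -> 1
toy['RATING'] = (toy['DURATION'] >= 10).astype(int)
print(toy)
```

Writing the result to a separate column keeps the raw duration available, which may be useful if the threshold is tuned later.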

Keep only the columns we need from the `question data` and the `movie data`.

In [None]:
# Drop the columns that are not needed from the question data and the movie data
df_question.drop('WATCH_DAY', axis=1, inplace=True)
df_question.drop('WATCH_SEQ', axis=1, inplace=True)
df_movie.drop('RELEASE_MONTH', axis=1, inplace=True)

Join the `movie data` and the `question data` on `MOVIE_ID`.

In [None]:
# Merge the question data and the movie data on MOVIE_ID
user_movie_data = pd.merge(df_question, df_movie, on = 'MOVIE_ID')
user_movie_data

Unnamed: 0,USER_ID,MOVIE_ID,DURATION,TITLE
0,1,4660,1,원더풀라이프(1998)
1,647,4660,0,원더풀라이프(1998)
2,38017,4660,0,원더풀라이프(1998)
3,16671,4660,1,원더풀라이프(1998)
4,31970,4660,0,원더풀라이프(1998)
...,...,...,...,...
558695,55137,14102,0,더캐스팅
558696,55508,4908,1,부다페스트로큰롤
558697,55508,4908,0,부다페스트로큰롤
558698,55743,6120,0,헌티드(2018)


Because of server memory limits, this example uses only the first 30,000 rows of `user_movie_data`.

In [None]:
# Use only the first 30,000 rows of user_movie_data because of memory limits
user_movie_data = user_movie_data.head(30000)

Build the `user_movie_rating` matrix from `user_movie_data`.

In [None]:
# Build the user_movie_rating matrix from user_movie_data
# Use USER_ID as the index and MOVIE_ID as the columns; fillna replaces missing values with 0
df_user_movie_ratings = user_movie_data.pivot_table('DURATION', index='USER_ID', columns='MOVIE_ID').fillna(0)
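
One detail worth keeping in mind: `pivot_table` aggregates duplicate (`USER_ID`, `MOVIE_ID`) pairs with the mean by default. A user with one 0 and one 1 for the same movie therefore ends up with 0.5, which would explain the 0.5 entry in the pivot output shown later. The sketch below uses hypothetical data. `aggfunc='max'` is shown only as one possible way to keep the ratings binary; that is an assumption about the desired behavior, not something the original notebook does.

```python
import pandas as pd

# Hypothetical data: user 1 has two records for movie 10 (one 0, one 1)
toy = pd.DataFrame({'USER_ID': [1, 1, 2],
                    'MOVIE_ID': [10, 10, 11],
                    'DURATION': [0, 1, 1]})

# Default aggregation (mean) averages the duplicates -> 0.5
mean_pivot = toy.pivot_table('DURATION', index='USER_ID', columns='MOVIE_ID').fillna(0)

# Alternative: treat the pair as watched if any record says so
max_pivot = toy.pivot_table('DURATION', index='USER_ID', columns='MOVIE_ID',
                            aggfunc='max').fillna(0)

print(mean_pivot)
print(max_pivot)
```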

1) The data has been preprocessed so that a rating (watched or not watched) exists for every user-movie pair.

2) Compute each user's mean rating.

3) Subtract that mean from each of the user's ratings.

In [None]:
# Import numpy
import numpy as np

# matrix is the pivot table converted to a numpy array
matrix = df_user_movie_ratings.to_numpy()

# user_ratings_mean is each user's mean rating across all movies (one value per row)
user_ratings_mean = np.mean(matrix, axis = 1)

# Subtract each user's mean rating from that user's row
# reshape(-1, 1) turns user_ratings_mean into a column vector so the shapes line up
matrix_user_mean = matrix - user_ratings_mean.reshape(-1, 1)
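
The `reshape(-1, 1)` is what makes the subtraction row-wise: a `(n_users, 1)` column broadcasts across the movie columns, whereas a flat `(n_users,)` vector would be matched against the last axis instead. A small sketch on a hypothetical 3 x 4 matrix:

```python
import numpy as np

# Hypothetical 3-user x 4-movie rating matrix
toy = np.array([[1.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0, 0.0]])

row_means = toy.mean(axis=1)                 # shape (3,): one mean per user
centered = toy - row_means.reshape(-1, 1)    # (3, 4) - (3, 1) broadcasts per row

print(row_means)
print(centered.sum(axis=1))                  # each centered row sums to 0
```

After centering, every row sums to zero, so a user who watched nothing stays an all-zero row, as in the `matrix_user_mean` preview.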

In [None]:
# Inspect the df_user_movie_ratings pivot table
df_user_movie_ratings

MOVIE_ID,579,1253,1372,1455,1616,3345,4660,4702,4863,5020,...,9722,10488,11538,11845,12364,13953,14282,14437,14489,14516
USER_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,...,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
55877,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
55878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
55887,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Finally, check the resulting `matrix_user_mean` matrix. Use `head()` to print the first 5 rows.

In [None]:
# Convert matrix_user_mean to a DataFrame and look at the first rows
pd.DataFrame(matrix_user_mean, columns = df_user_movie_ratings.columns).head()

MOVIE_ID,579,1253,1372,1455,1616,3345,4660,4702,4863,5020,...,9722,10488,11538,11845,12364,13953,14282,14437,14489,14516
0,-0.285714,-0.285714,0.714286,0.714286,-0.285714,-0.285714,0.714286,-0.285714,-0.285714,0.714286,...,-0.285714,-0.285714,0.714286,0.714286,0.714286,-0.285714,0.714286,-0.285714,-0.285714,-0.285714
1,-0.035714,-0.035714,-0.035714,0.964286,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714,...,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714,-0.035714
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## 3. Apply the Recommendation Algorithm
---

Apply the latent factor collaborative filtering recommendation algorithm.

1) Apply SVD to the `matrix_user_mean` matrix.

2) Convert the `sigma` values returned by SVD into a diagonal matrix.

3) Build the `svd_user_predicted_ratings` matrix from the product of the SVD matrices and `user_ratings_mean`.

### [TODO] Write the code that applies SVD to the `matrix_user_mean` matrix.

**Tips**: `svds(matrix, k=n)` computes a truncated SVD that keeps the n largest singular values.

In [None]:
# Use svds from scipy
from scipy.sparse.linalg import svds

# Returns U, the singular values sigma (as a 1-D array), and the transpose of V
U, sigma, Vt = svds(matrix_user_mean, k = 5)

In [None]:
# Check the shape of each matrix

print(U.shape)
print(sigma.shape)
print(Vt.shape)

(19303, 5)
(5,)
(5, 28)


### [TODO] Write the code that converts the `sigma` values returned by SVD into a **diagonal matrix**.

**Tips**: numpy's `diag()` builds a diagonal matrix.

In [None]:
# sigma currently holds only the nonzero singular values as a 1-D array, so np.diag expands it into a diagonal matrix padded with zeros
sigma = np.diag(sigma)

In [None]:
# Check the shape of the sigma matrix
sigma.shape

(5, 5)

In [None]:
# Check the value of sigma[0]
sigma[0]

array([34.9198233,  0.       ,  0.       ,  0.       ,  0.       ])

In [None]:
# Check the value of sigma[1]
sigma[1]

array([ 0.      , 35.270901,  0.      ,  0.      ,  0.      ])

In [None]:
sigma

array([[34.9198233 ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        , 35.270901  ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        , 41.70367768,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , 42.66813404,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        , 44.98636909]])

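After `diag`, the `sigma` matrix has nonzero values only on its diagonal.

To recap the steps so far:

1) Build the original `user-movie` rating matrix.

2) Subtract each user's mean rating to obtain the `matrix_user_mean` matrix.

3) Apply SVD to the result of 2) to obtain the `U`, `sigma`, and `Vt` matrices.

4) `sigma` is returned as a 1-D array of singular values only, so convert it into a diagonal matrix.

### [TODO] Write the code that builds the `svd_user_predicted_ratings` matrix from the product of the SVD matrices and `user_ratings_mean`.

**Tips**: numpy's `dot()` performs matrix multiplication.

In [None]:
# Multiplying U, sigma, and Vt gives a rank-5 approximation of the mean-centered matrix
# (not an exact reconstruction, because only k = 5 singular values were kept).
# Adding each user's mean rating back gives the predicted ratings.
svd_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt) + user_ratings_mean.reshape(-1, 1)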
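
Because `svds` keeps only `k = 5` singular values, the product of `U`, `sigma`, and `Vt` is a low-rank approximation of `matrix_user_mean` rather than an exact copy. The sketch below illustrates this on a small random matrix. It uses `np.linalg.svd` as a stand-in for `svds` so that the discarded singular values are also available; the matrix `A` and the choice `k = 2` are arbitrary assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))                       # hypothetical small matrix

# Full (thin) SVD; s is returned in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation
A_full = U @ np.diag(s) @ Vt                 # uses every singular value

# Frobenius error of the rank-k approximation equals the root of the
# sum of the squared singular values that were dropped
err = np.linalg.norm(A - A_k)
expected = np.sqrt(np.sum(s[k:] ** 2))
print(err, expected)
```

Only `A_full` reproduces `A`; the truncated product loses whatever the dropped singular values carried. That loss is what lets the model assign nonzero scores to unseen movies.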

In [None]:
# Build the SVD-based predictions and inspect them
df_svd_preds = pd.DataFrame(svd_user_predicted_ratings, columns = df_user_movie_ratings.columns)
df_svd_preds.head() # print only the first 5 rows

MOVIE_ID,579,1253,1372,1455,1616,3345,4660,4702,4863,5020,...,9722,10488,11538,11845,12364,13953,14282,14437,14489,14516
0,0.308978,0.302964,0.305143,0.314296,-0.029316,0.302784,0.304263,0.306069,0.30311,0.31711,...,0.302958,0.303704,0.32448,0.428526,0.321769,0.254453,1.062628,0.303273,0.306057,-0.070155
1,0.043476,0.044295,0.043494,0.045237,-0.002651,0.043454,0.04341,0.043677,0.043376,0.045788,...,0.043622,0.043351,0.045192,0.049213,0.045182,0.045701,-0.003221,0.043366,0.043423,-0.001783
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# Check the shape of the df_svd_preds matrix
df_svd_preds.shape

(19303, 28)

In [None]:
# Recommend movies with latent factor collaborative filtering

def recommend_movies(df_svd_preds, user_id, ori_movies_df, ori_ratings_df, num_recommendations=5):
    # Rows are addressed by 0-based position, so user_id - 1 is used as the row number.
    # Caveat: this only matches the right user when USER_IDs run 1, 2, 3, ... without gaps;
    # the pivot table index shown earlier (1, 4, 5, 6, 7, ...) has gaps.
    user_row_number = user_id - 1 
    
    # Take that user's row from the prediction matrix and sort it so the highest predicted ratings come first
    sorted_user_predictions = df_svd_preds.iloc[user_row_number].sort_values(ascending=False)
    
    # Select the rows for this user_id from the original rating data
    user_data = ori_ratings_df[ori_ratings_df.USER_ID == user_id]
    
    # Merge the selected user_data with the original movie data
    user_history = user_data.merge(ori_movies_df, on = 'MOVIE_ID').sort_values(['DURATION'], ascending=False)
    
    # From the original movie data, keep only the movies the user has not watched
    recommendations = ori_movies_df[~ori_movies_df['MOVIE_ID'].isin(user_history['MOVIE_ID'])]

    # Merge the unseen movies with the user's sorted predictions
    recommendations = recommendations.merge( pd.DataFrame(sorted_user_predictions).reset_index(), on = 'MOVIE_ID')
    
    # Rename the prediction column, sort by it, and keep the top num_recommendations rows
    recommendations = recommendations.rename(columns = {user_row_number: 'Predictions'}).sort_values('Predictions', ascending = False).iloc[:num_recommendations, :]
    
    # Return the user's watch history and the recommended movies
    return user_history, recommendations
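
The function above takes row `user_id - 1` of `df_svd_preds`. That matches the intended user only if the `USER_ID` values run 1, 2, 3, ... without gaps, and the pivot table index shown earlier (1, 4, 5, 6, 7, ...) does not. One possible way to look up the row position from the pivot table's index is sketched below. `row_position` and the `ratings` frame are hypothetical names used for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Hypothetical pivot table with a sparse USER_ID index (1, 4, 5)
ratings = pd.DataFrame([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                       index=pd.Index([1, 4, 5], name='USER_ID'),
                       columns=pd.Index([579, 1253], name='MOVIE_ID'))

def row_position(ratings_df, user_id):
    """Return the 0-based row position of user_id in the pivot table.

    Hypothetical helper; raises KeyError if the user is not in the index.
    """
    return ratings_df.index.get_loc(user_id)

print(row_position(ratings, 4))   # position of USER_ID 4 (differs from 4 - 1)
```

In the notebook, the equivalent lookup would use `df_user_movie_ratings.index`, since `df_svd_preds` has the same row order as that pivot table.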

In [None]:
# Recommend 10 movies for USER_ID 270 and check the list of movies already watched
already_rated, predictions = recommend_movies(df_svd_preds, 270, df_movie, df_test, 10)
already_rated.head(10) 

Unnamed: 0,USER_ID,MOVIE_ID,DURATION,TITLE
0,270,42,1,천장지구(1990)
1,270,237,1,어바웃타임
30,270,14054,1,A-특공대
29,270,13784,1,로그
28,270,13519,1,겟썸3
27,270,13336,1,지금만나러갑니다(2018)
26,270,12968,1,아이언마스크:용패지미
25,270,12558,1,검객
24,270,11674,1,반지의제왕:반지원정대
23,270,11546,1,세이프(2011)


In [None]:
# Check the 10 recommended movies the user has not watched yet
predictions

Unnamed: 0,MOVIE_ID,TITLE,Predictions
24,14282,삼진그룹영어토익반,0.98992
21,11845,런,0.069371
16,7874,해피엔드,0.013815
12,6094,"어디갔어,버나뎃",0.00876
15,7421,콜래트럴(2004),0.007986
20,11538,나이브스아웃,0.005999
27,14516,빅매치,0.00411
0,579,담보,0.004059
22,12364,테넷,0.003891
26,14489,더레이서,0.001929


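## 4. Evaluate the Recommendation Results
---

To evaluate the recommendation algorithm, we go through `mAP`, `Entropy Diversity`, and `nDCG` and run each of them.

### 4.1 Implementing mAP
---

`mAP` (mean Average Precision) is the mean, over all users, of each user's Average Precision (AP). AP reflects both how many of a user's recommendations are correct and how highly those correct items are ranked.

The code below is a simple implementation of `mAP`. It takes the recommendation results as input, averages their `AP` values, and returns the result.

In [None]:
# mean Average Precision
def mAP(result):
    ap = 0
    
    # Sum each user's AP (Average Precision)
    for r in result:
        ap += r['AP']
    
    # Divide the sum of APs by the number of users to get mAP
    mAP = ap / len(result)
    
    return mAP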
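
The `mAP` function expects each entry of `result` to already carry an `'AP'` value, but the notebook does not show how that value is computed. The sketch below is one common way to compute AP for a single user from a ranked recommendation list and the set of relevant items. `average_precision` and the example IDs are hypothetical, and the normalization (dividing by the number of hits) is only one of several conventions in use.

```python
def average_precision(recommended, relevant, k=None):
    """AP for one user: mean of precision@i over the ranks i that hit a relevant item."""
    relevant = set(relevant)
    if k is not None:
        recommended = recommended[:k]
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    # Dividing by the number of hits is one convention;
    # another divides by min(k, len(relevant)) instead.
    return precision_sum / hits if hits else 0.0

# Build the per-user structure consumed by the mAP function above
result = [
    {'AP': average_precision([1, 2, 3, 4], [1, 3])},  # hits at ranks 1 and 3
    {'AP': average_precision([5, 6, 7], [9])},        # no hits
]
mean_ap = sum(r['AP'] for r in result) / len(result)
print(mean_ap)
```

For the first user the hits fall at ranks 1 and 3, giving precisions 1/1 and 2/3 and an AP of 5/6. The second user has no hits and contributes 0.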

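### 4.2 Implementing Entropy Diversity
---

`Entropy Diversity` evaluates how diverse the movie genres recommended by the algorithm are.

The code below is a simple example that computes `Entropy Diversity`. It takes the genre counts of the recommended movies (a dict mapping genre to count) as input and returns the diversity value.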

In [1]:
from math import log as ln

# Entropy Diversity
def diversity(dict_recommended):
    def p(n, N):
        if n == 0:
            return 0
        else:
            return (float(n) / N) * ln(float(n) / N)
    N = sum(dict_recommended.values())

    return -sum(p(n, N) for n in dict_recommended.values() if n != 0)

# Compute Entropy Diversity for two example genre distributions
# A larger value means the recommendations are more diverse
print(diversity({'act': 2, 'sf': 10, 'com': 10, 'thr': 5, 'spo': 10, 'mel': 100, 'rel': 10}))
print(diversity({'act': 30, 'sf': 40, 'com': 40, 'mel': 30}))

1.1669366259497473
1.376055285260417


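The results show that the second list is somewhat more diverse: although it covers fewer genres, its recommendations are spread more evenly across them.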
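
Entropy over K genres is at most ln K, reached when every genre is recommended equally often. That explains the ordering above: the second list scores about 1.376 against a maximum of ln 4 ≈ 1.386, so it is nearly uniform. The first list has 7 genres (maximum ln 7 ≈ 1.946) but is dominated by `mel`. The sketch below shows the bound on hypothetical genre labels. `entropy` is a compact re-statement of the `diversity` function, and the normalized ratio is an optional extra, not part of the original notebook.

```python
from collections import Counter
from math import log

def entropy(counts):
    """Shannon entropy (natural log) of a genre -> count mapping."""
    total = sum(counts.values())
    return -sum((n / total) * log(n / total) for n in counts.values() if n > 0)

# Hypothetical genre labels of six recommended movies
genres = ['act', 'sf', 'act', 'com', 'sf', 'act']
counts = Counter(genres)

h = entropy(counts)
h_max = log(len(counts))          # upper bound: uniform over the genres present
print(h, h_max, h / h_max)        # the last value is entropy normalized to [0, 1]

# A perfectly uniform distribution reaches the bound
uniform = Counter({'act': 30, 'sf': 30, 'com': 30, 'mel': 30})
print(entropy(uniform), log(4))
```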

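### 4.3 Implementing nDCG
---

`nDCG` (normalized Discounted Cumulative Gain) is an evaluation metric commonly used for ranking-based recommender systems. It scores a recommendation algorithm according to the order of the recommended movies.

The code below is an implementation of `nDCG`.

The `form` argument accepts `linear` or `exponential` and selects which formula is used to compute `nDCG`.

The inputs are as follows.
- `rel_true`: 1-D array of relevance scores for the movies of a particular user
- `rel_pred`: 1-D array of relevance scores for the movies the recommender suggested to that user, in recommended order
- `p`: integer, the number of top recommendations to evaluate
- `form`: string, selects whether `nDCG` is computed with the `linear` or the `exp` formula (default is `linear`)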
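
For reference, the two variants correspond to the usual DCG definitions, written here with 1-based rank `i` and `rel_i` as the relevance of the item at that rank. This is a reading of the `ndcg` code in the next cell, not a formula taken from the original notebook.

```latex
\mathrm{DCG}^{\text{linear}}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)},
\qquad
\mathrm{DCG}^{\text{exp}}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)},
\qquad
\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}
```

Here IDCG is the DCG of the ideal ordering, that is, the true relevance scores sorted in descending order and cut off at `p`.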

In [1]:
import numpy as np

# rel_true: 1D array, relevance scores for a particular user, one per movie
# rel_pred: 1D array, relevance scores of the recommended items, in recommended order
# p: int, rank cutoff (defaults to the length of rel_pred)
# form: string, nDCG formula to use, "linear" or "exp"

def ndcg(rel_true, rel_pred, p=None, form="linear"):
    rel_true = np.sort(rel_true)[::-1]
    # With p=None, evaluate the whole prediction list (min() cannot compare an int with None)
    if p is None:
        p = len(rel_pred)
    p = min(len(rel_true), len(rel_pred), p)
    
    # Discount each rank so that lower-ranked items contribute less
    discount = 1 / (np.log2(np.arange(p) + 2))

    # Linear formula
    if form == "linear":
        idcg = np.sum(rel_true[:p] * discount)
        dcg = np.sum(rel_pred[:p] * discount)
    # Exponential formula
    elif form == "exponential" or form == "exp":
        idcg = np.sum([2**x - 1 for x in rel_true[:p]] * discount)
        dcg = np.sum([2**x - 1 for x in rel_pred[:p]] * discount)
    # Any other value is an error
    else:
        raise ValueError("Only supported for two formula, 'linear' or 'exp'")

    # Return the normalized metric
    return dcg / idcg

Apply the `nDCG` function above to evaluate recommendation results.

In [None]:
# simple example of nDCG
if __name__ == "__main__":
    song_index = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4,
                  'F': 5, 'G': 6, 'H': 7, 'I': 8}
    user_lists = ["USER1", "USER2", "USER3"]

    # Ground-truth relevance of each song (A to I) for each user
    relevance_true = {
        "USER1": [3, 3, 2, 2, 1, 1, 0, 0, 0],
        "USER2": [3, 2, 1, 1, 2, 0, 1, 1, 1],
        "USER3": [0, 1, 0, 1, 2, 3, 3, 1, 0]
    }

    # Recommendation example 1
    s1_prediction = {
        "USER1": ['A', 'E', 'C', 'D', 'F'],
        "USER2": ['G', 'E', 'A', 'B', 'D'],
        "USER3": ['C', 'G', 'F', 'B', 'E']
    }

    # Recommendation example 2
    s2_prediction = {
        "USER1": ['A', 'B', 'C', 'G', 'E'],
        "USER2": ['B', 'A', 'G', 'E', 'F'],
        "USER3": ['E', 'G', 'F', 'B', 'I']      
    }

    # Evaluate each user's recommendations with nDCG
    for user in user_lists:
        print(f'===={user}===')
        r_true = relevance_true[user]

        s1_pred = [r_true[song_index[song]] for song in s1_prediction[user]]
        s2_pred = [r_true[song_index[song]] for song in s2_prediction[user]]
   
        # Compute linear nDCG
        print(f'S1 nDCG@5 (linear): {ndcg(r_true, s1_pred, 5, "linear")}')
        print(f'S2 nDCG@5 (linear): {ndcg(r_true, s2_pred, 5, "linear")}')
        
        # Compute exponential nDCG
        print(f'S1 nDCG@5 (exponential): {ndcg(r_true, s1_pred, 5, "exp")}')
        print(f'S2 nDCG@5 (exponential): {ndcg(r_true, s2_pred, 5, "exp")}')

====USER1===
S1 nDCG@5 (linear): 0.8232936061974518
S2 nDCG@5 (linear): 0.8793791209851007
S1 nDCG@5 (exponential): 0.7406319169800546
S2 nDCG@5 (exponential): 0.911476869939315
====USER2===
S1 nDCG@5 (linear): 0.8241067540896558
S2 nDCG@5 (linear): 0.864255024163802
S1 nDCG@5 (exponential): 0.7200216168193889
S2 nDCG@5 (exponential): 0.821434096248145
====USER3===
S1 nDCG@5 (linear): 0.6850898875992608
S2 nDCG@5 (linear): 0.867837452040598
S1 nDCG@5 (exponential): 0.6922758990315323
S2 nDCG@5 (exponential): 0.826208951093206


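A larger `nDCG` value means the items were recommended in an order closer to the ideal one.

In the results above, the second recommendation list has a higher `nDCG` than the first for every user, under both the linear and the exponential formula, so it can be judged the better list.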
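
To make the numbers less opaque, the following sketch recomputes the linear `nDCG@5` of the first recommendation list for `USER1` step by step. The relevance values are taken from the example above; only the intermediate variable names are new.

```python
import numpy as np

r_true = [3, 3, 2, 2, 1, 1, 0, 0, 0]        # USER1 relevance for songs A..I
s1_pred = [3, 1, 2, 2, 1]                    # relevance of A, E, C, D, F

p = 5
discount = 1 / np.log2(np.arange(p) + 2)     # 1/log2(2), 1/log2(3), ..., 1/log2(6)
ideal = np.sort(r_true)[::-1][:p]            # best possible top 5: [3, 3, 2, 2, 1]

dcg = np.sum(np.array(s1_pred) * discount)   # gain of the actual ordering
idcg = np.sum(ideal * discount)              # gain of the ideal ordering
print(dcg, idcg, dcg / idcg)
```

The only difference from the ideal ordering is that song E (relevance 1) sits at rank 2 where a relevance-3 song would have been, and that single swap accounts for the drop from 1.0 to roughly 0.823.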

---