# Content-Based Filtering

- 콘텐츠 기반 필터링은 유저가 좋아하는 아이템과 유사한 아이템을 추천하는 방식입니다
- 주로 아이템의 메타데이터를 활용해 아이템의 특성을 잘 표현하는 벡터를 만들고, 아이템 간의 유사도를 cosine similarity를 통해 계산합니다

In [1]:
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# 1. CSV 파일 불러오기
mangoplate_df = pd.read_csv('./data/MangoPlate_CB.csv')
mangoplate_df.head()

Unnamed: 0,name,address,category,main_mn,price,rating,rvw_cnt
0,서울치킨,중구,"['닭 ', ' 오리 요리']",후라이드치킨,17000.0,4.1,17
1,소신,유성구,"['카페 ', ' 디저트']",아메리카노,3800.0,4.1,60
2,성심당케익부띠끄,중구,['베이커리'],치아바타,3000.0,4.3,75
3,버기즈,서구,"['브런치 ', ' 버거 ', ' 샌드위치']",정보 없음,0.0,4.0,43
4,누오보나폴리,유성구,['이탈리안'],마리나라,10000.0,4.0,72


In [3]:
# 2. 아이템 벡터화
idx2res_name = mangoplate_df['name'].to_dict()
mangoplate_df.drop(columns=['name'], inplace=True)

address_one_hot = pd.get_dummies(mangoplate_df['address'])
mangoplate_df.drop(columns=['address'], inplace=True)
mangoplate_df = pd.concat([mangoplate_df, address_one_hot], axis=1)

# category, main_menu 정보는 어떻게 벡터화할 지 나중에 생각 - 일단 삭제 
mangoplate_df.drop(columns=['category', 'main_mn'], inplace=True)

# 각 attribute별 scale 차이가 너무 심함 - normalize
scaler = MinMaxScaler()
mangoplate_data = scaler.fit_transform(mangoplate_df)

for i in range(10):
    print(f"Vector for Restaurant {idx2res_name.get(i)}:\t{mangoplate_data[i]}")

Vector for Restaurant 서울치킨:	[0.11333333 0.28571429 0.04659498 0.         0.         0.
 1.        ]
Vector for Restaurant 소신:	[0.02533333 0.28571429 0.20071685 0.         0.         1.
 0.        ]
Vector for Restaurant 성심당케익부띠끄:	[0.02       0.57142857 0.25448029 0.         0.         0.
 1.        ]
Vector for Restaurant 버기즈:	[0.         0.14285714 0.13978495 0.         1.         0.
 0.        ]
Vector for Restaurant 누오보나폴리:	[0.06666667 0.14285714 0.2437276  0.         0.         1.
 0.        ]
Vector for Restaurant 동은성:	[0.10666667 0.42857143 0.03584229 0.         0.         0.
 1.        ]
Vector for Restaurant 솔밭묵집:	[0.        0.        0.1218638 0.        0.        1.        0.       ]
Vector for Restaurant 바라던바:	[0.         0.         0.06810036 0.         0.         1.
 0.        ]
Vector for Restaurant 치앙마이방콕:	[0.18666667 0.14285714 0.03584229 1.         0.         0.
 0.        ]
Vector for Restaurant 층층층:	[0.         0.         0.01433692 1.         0.         0.
 0.       

In [4]:
# 3. Cosine Similarity를 활용한 유사도 계산
similarity_matrix = cosine_similarity(mangoplate_data)
similarity_matrix

array([[1.        , 0.08459097, 0.95304872, ..., 0.04104445, 0.0398315 ,
        0.11544381],
       [0.08459097, 1.        , 0.17189363, ..., 0.98165729, 0.04349653,
        0.98449144],
       [0.95304872, 0.17189363, 1.        , ..., 0.08061904, 0.07459556,
        0.20582496],
       ...,
       [0.04104445, 0.98165729, 0.08061904, ..., 1.        , 0.02156806,
        0.96452378],
       [0.0398315 , 0.04349653, 0.07459556, ..., 0.02156806, 1.        ,
        0.05744306],
       [0.11544381, 0.98449144, 0.20582496, ..., 0.96452378, 0.05744306,
        1.        ]])

In [8]:
# 4. 추천 예시
for i in range(3):
    res_name = idx2res_name[random.randint(0, len(idx2res_name))]
    print(f"Suppose that user {i} likes restaurant {res_name}")
    
    print(f"We can recommend following restaurants to user {i}:")
    recommend_items_idx = np.argsort(similarity_matrix[i])[1:6]
    for item_idx in recommend_items_idx:
        print(f"\t{idx2res_name[item_idx]}") 

Suppose that user 0 likes restaurant 몇몇커피
We can recommend following restaurants to user 0:
	가배로스터스
	층층층
	무야
	르몽탁
	포케153
Suppose that user 1 likes restaurant 김화식당
We can recommend following restaurants to user 1:
	무야
	포케153
	원조태평소국밥
	하레하레
	킨토토
Suppose that user 2 likes restaurant 원조태평소국밥
We can recommend following restaurants to user 2:
	가배로스터스
	층층층
	무야
	르몽탁
	포케153
