# Introduction to Recommender Systems

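### Origins of Recommender Systems
* The Internet appeared -> email took off -> existing services were moved online -> the first to draw attention was news
    - (Online news) -> A subscription service carries a large volume of articles, and most people do not read all of them unless they have a specific purpose -> articles have to be filtered before delivery (at first, by whether they contained specific keywords)
    - When several keywords are registered, the list is ordered by priority -> a recommender system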

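### What criteria should the ranking use?

* CB (Content-Based): uses the item's detailed description. (e.g.) How do we sell a snail cream product? -> It can be recommended through the product's keywords: when someone searches for "snail", "cream", or "moisturizing" -> recommend it
* KB (Knowledge-Based): uses rules that industry experts build from domain knowledge. (e.g.) An employee with more than 20 years of experience can predict, when a customer in their 40s with a certain look (clothing, appearance, and so on) walks in, which product that customer will buy

* <Note!!!> In data analysis, always keep in mind that the data may be very large (big data). That creates cost problems, and the current trend is to move to cloud environments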
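
The content-based idea above (match a product's description against keywords) can be sketched in a few lines. This is a minimal illustration, not a production approach: the catalog, the product descriptions, and the helper names `keyword_score` and `recommend_by_keywords` are all made up for the example.

```python
# Hypothetical catalog: product name -> description text (made up for illustration).
catalog = {
    "snail cream": "snail mucin cream for deep moisturizing",
    "sun stick": "portable sun protection stick",
    "aloe gel": "soothing aloe gel with light moisturizing",
}


def keyword_score(description, keywords):
    """Count how many of the query keywords appear as words in the description."""
    words = set(description.lower().split())
    return sum(1 for kw in keywords if kw.lower() in words)


def recommend_by_keywords(catalog, keywords):
    """Return product names with at least one keyword hit, best match first."""
    scored = [(name, keyword_score(desc, keywords)) for name, desc in catalog.items()]
    hits = [(name, score) for name, score in scored if score > 0]
    return [name for name, _ in sorted(hits, key=lambda pair: pair[1], reverse=True)]


result = recommend_by_keywords(catalog, ["snail", "cream", "moisturizing"])
print(result)  # products ordered by number of matching keywords
```

Ranking by the number of matched keywords is the simplest possible priority rule; real content-based systems usually weight terms (for example with TF-IDF) instead of counting them equally.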

### Drawbacks of CB and KB

* They become hard to maintain as the number of item types or customers grows
* -> The data becomes so large that its tendencies are hard to identify.
* (e.g.) CB -> 

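### Collaborative Filtering (filtering on user-item links)

* Based on the links between customers and items: uses information about users' tastes and preferences
* Assumes that past tendencies will continue into the future -> (e.g.) recommend items to me based on what people who made choices similar to mine selected in the past
* User-Based: recommend to a customer the choices of other people who made similar choices (user perspective)
* Item-Based: recommend to a customer who selected an item the other choices of customers who selected the same item (item perspective). (e.g.) Most customers who pick product A also buy product B, so recommend B to customers who bought A. Amazon is a well-known heavy user of this approach.

* (e.g.) If user A gave a movie 5 points and user B also gave it 5 points, the other movies user A has watched can be recommended to user B => this is the core logic

<Note!!> Production systems do not rely on collaborative filtering alone; they use it as a base and layer other algorithms on top (companies rarely disclose their algorithms).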
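
The item-based example above ("customers who pick A also buy B") can be illustrated with a simple co-occurrence count. This is a rough sketch under assumed data: the baskets and the helper `also_bought` are hypothetical, and raw counting stands in for the correlation-based similarity used later in this notebook.

```python
from collections import Counter

# Hypothetical purchase baskets, one set of items per customer.
baskets = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B"},
]


def also_bought(baskets, item):
    """Count how often each other item appears in the same basket as `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})
    return counts


counts = also_bought(baskets, "A")
print(counts.most_common())  # items most often bought together with A come first
```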

### Cross/Up/Down Sell

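* Cross Sell: set menus, bundled products -> show items that go well together
* Up Sell: upgrade to a large for an extra 500 won (same goal achieved + higher margin)
* Down Sell: show a cheaper alternative to customers who intend to buy but find the price a burden

Dataset source
* MovieLens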
* https://grouplens.org/datasets/movielens/

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os

In [6]:
path = 'C:/Users/LOQ/Desktop/generative_AI/03_대용량 데이터 분석/test3'

movies = pd.read_csv(path + '/' + 'movies.csv')
ratings = pd.read_csv(path + '/' + 'ratings.csv')

In [7]:
movies

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [8]:
ratings

# Ratings data: which user gave which rating to which movie

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


In [9]:
combined = pd.merge(ratings, movies)
combined

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,1,3,4.0,964981247,Grumpier Old Men (1995),Comedy|Romance
2,1,6,4.0,964982224,Heat (1995),Action|Crime|Thriller
3,1,47,5.0,964983815,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,1,50,5.0,964982931,"Usual Suspects, The (1995)",Crime|Mystery|Thriller
...,...,...,...,...,...,...
100831,610,166534,4.0,1493848402,Split (2017),Drama|Horror|Thriller
100832,610,168248,5.0,1493850091,John Wick: Chapter Two (2017),Action|Crime|Thriller
100833,610,168250,5.0,1494273047,Get Out (2017),Horror
100834,610,168252,5.0,1493846352,Logan (2017),Action|Sci-Fi
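
When `pd.merge` is called without `on` or `how`, it joins on the columns the two frames have in common and performs an inner join. The sketch below spells that out on two tiny made-up frames (`ratings_toy` and `movies_toy` are illustrative stand-ins, not the MovieLens files), which also shows that ratings for a movie missing from the movie table are dropped.

```python
import pandas as pd

# Tiny made-up frames that mimic the shape of ratings.csv and movies.csv.
ratings_toy = pd.DataFrame(
    {"userId": [1, 1, 2], "movieId": [10, 20, 30], "rating": [4.0, 5.0, 3.0]}
)
movies_toy = pd.DataFrame({"movieId": [10, 20], "title": ["Movie X", "Movie Y"]})

# Same join as pd.merge(ratings_toy, movies_toy), with the key and join type explicit.
merged = pd.merge(ratings_toy, movies_toy, on="movieId", how="inner")
print(merged)  # the rating for movieId 30 is dropped: it has no matching movie row
```

Writing `on="movieId"` explicitly guards against accidentally joining on an extra shared column if one is added to either file later.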


#### What is a pivot table?

* A pivot table is a powerful data analysis tool that summarizes data and reshapes it by chosen criteria. It is mainly used to compute aggregates such as means, sums, and counts, and it turns a complex dataset into a form that is easier to read and analyze.

* Key concepts

    - Rows (Index): the column whose values group the data into rows, for example 'region' or 'product category'.

    - Columns: the criterion that further subdivides the data, for example 'year' or 'quarter'.

    - Values: the data computed at each row/column intersection, such as revenue or units sold, summarized by a sum, mean, count, or similar.

    - Filters: conditions that restrict which data appears in the pivot table, for example a specific year or region.

* Example

Suppose we have a DataFrame of sales data:

|Date | Region | Product | Sales|
|-----|-----|-----|-----|
|2023-01-01 | Seoul | A | 100|
|2023-01-01 | Busan | B | 200|
|2023-01-02 | Seoul | A | 150|
|2023-01-02 | Busan | B | 250|


To view total sales per product for each region, the pivot table would be:


|Region | Product A | Product B|
|-----|-----|-----|
|Seoul | 250 | 0|
|Busan | 0 | 450|


Here the region is the row, the product is the column, and sales is the value, so the table summarizes total sales for each region and product.
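
The example above can be reproduced with pandas as a quick check. The column names (`date`, `region`, `product`, `sales`) are English stand-ins chosen for this sketch; `aggfunc="sum"` requests totals and `fill_value=0` shows empty region/product combinations as 0 rather than NaN.

```python
import pandas as pd

# The four sales rows from the example table.
sales = pd.DataFrame(
    {
        "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
        "region": ["Seoul", "Busan", "Seoul", "Busan"],
        "product": ["A", "B", "A", "B"],
        "sales": [100, 200, 150, 250],
    }
)

# Regions as rows, products as columns, summed sales as values.
summary = sales.pivot_table(
    index="region", columns="product", values="sales", aggfunc="sum", fill_value=0
)
print(summary)
```

Note that `pivot_table` aggregates with the mean by default, so the `aggfunc` argument matters whenever a cell combines more than one row.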

In [11]:
# Create the pivot table (users as rows, movie titles as columns, ratings as values)
pvt = combined.pivot_table(index='userId', columns='title', values='rating')
pvt

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,,,,,,,,,,,...,,,,,,,,,4.0,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,,,,,,,,,,,...,,,,,,,,,,
607,,,,,,,,,,,...,,,,,,,,,,
608,,,,,,,,,,,...,,,,,,4.5,3.5,,,
609,,,,,,,,,,,...,,,,,,,,,,


In [15]:
# Replace NaN values with 0
pvt = combined.pivot_table(index='userId', columns='title', values='rating').fillna(0)
pvt

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
607,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,4.5,3.5,0.0,0.0,0.0
609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
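
One thing to keep in mind about `fillna(0)`: a missing rating is then treated as an actual rating of 0, which affects any correlation computed afterwards. The sketch below, on made-up data, compares that with leaving NaN in place, where `DataFrame.corr` only uses users who rated both movies. The toy frame and the `min_periods=2` setting are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up ratings: rows are users, columns are movies, NaN means "not rated".
toy = pd.DataFrame(
    {"Movie X": [5.0, 4.0, np.nan, 1.0], "Movie Y": [5.0, np.nan, 2.0, 1.0]},
    index=pd.Index([1, 2, 3, 4], name="userId"),
)

# Zero-filled: every missing rating counts as a rating of 0.
corr_filled = toy.fillna(0).corr().loc["Movie X", "Movie Y"]

# NaN kept: only users who rated both movies (users 1 and 4) contribute.
corr_pairwise = toy.corr(min_periods=2).loc["Movie X", "Movie Y"]

print(corr_filled, corr_pairwise)  # the zero-filled correlation is noticeably lower
```

Neither choice is strictly better: zero-filling rewards movies watched by the same people, while pairwise correlation can be unstable when very few users overlap.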


#### Item-Based

* Compute the similarity (correlation) between items
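
For reference, pandas `DataFrame.corr()` computes the Pearson correlation coefficient by default. The notation below is introduced only for this note: $x_{ui}$ is the rating user $u$ gave movie $i$ in the pivot table, and $\bar{x}_i$ is the mean of movie $i$'s column over all users.

```latex
r_{ij} = \frac{\sum_{u} (x_{ui} - \bar{x}_i)(x_{uj} - \bar{x}_j)}
              {\sqrt{\sum_{u} (x_{ui} - \bar{x}_i)^2}\,\sqrt{\sum_{u} (x_{uj} - \bar{x}_j)^2}}
```

The value ranges from -1 to 1, and each movie has a correlation of exactly 1 with itself.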

In [16]:
itme_corr = pvt.corr()
itme_corr

# The higher the value, the stronger the association (similarity, i.e. the link) between the two movies

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'71 (2014),1.000000,-0.001642,-0.002324,-0.001642,-0.002254,-0.001642,-0.006407,-0.001642,0.135943,-0.004325,...,-0.001642,0.339935,0.542247,0.706526,-0.001642,-0.007675,0.134327,0.325287,-0.008185,-0.001642
'Hellboy': The Seeds of Creation (2004),-0.001642,1.000000,0.706526,-0.001642,-0.002254,-0.001642,-0.006407,-0.001642,-0.010568,-0.004325,...,-0.001642,-0.004589,-0.002808,-0.002324,-0.001642,-0.007675,-0.007744,-0.003594,-0.008185,-0.001642
'Round Midnight (1986),-0.002324,0.706526,1.000000,-0.002324,-0.003191,-0.002324,0.170199,-0.002324,-0.014958,-0.006121,...,-0.002324,-0.006495,-0.003975,-0.003289,-0.002324,-0.010863,-0.010961,-0.005087,-0.011585,-0.002324
'Salem's Lot (2004),-0.001642,-0.001642,-0.002324,1.000000,0.857269,-0.001642,-0.006407,-0.001642,-0.010568,-0.004325,...,-0.001642,-0.004589,-0.002808,-0.002324,-0.001642,-0.007675,-0.007744,-0.003594,-0.008185,-0.001642
'Til There Was You (1997),-0.002254,-0.002254,-0.003191,0.857269,1.000000,-0.002254,-0.008797,-0.002254,-0.014510,-0.005938,...,-0.002254,-0.006301,-0.003856,-0.003191,-0.002254,-0.010538,-0.010632,-0.004935,-0.011238,-0.002254
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
eXistenZ (1999),-0.007675,-0.007675,-0.010863,-0.007675,-0.010538,-0.007675,0.187953,0.212646,0.053614,0.115396,...,-0.007675,-0.021449,-0.013126,-0.010863,-0.007675,1.000000,0.163022,-0.016800,0.138611,-0.007675
xXx (2002),0.134327,-0.007744,-0.010961,-0.007744,-0.010632,-0.007744,0.062174,-0.007744,0.241092,-0.000060,...,0.063291,0.291410,0.163464,0.240394,-0.007744,0.163022,1.000000,0.259049,0.065673,-0.007744
xXx: State of the Union (2005),0.325287,-0.003594,-0.005087,-0.003594,-0.004935,-0.003594,-0.014025,-0.003594,0.139511,-0.009467,...,-0.003594,0.376455,0.172818,0.227658,-0.003594,-0.016800,0.259049,1.000000,-0.017917,-0.003594
¡Three Amigos! (1986),-0.008185,-0.008185,-0.011585,-0.008185,-0.011238,-0.008185,0.353194,0.175610,0.125905,0.234514,...,0.175610,-0.022876,-0.013999,-0.011585,-0.008185,0.138611,0.065673,-0.017917,1.000000,-0.008185


* Given a movie, which columns (movies) are most similar to it?

In [19]:
target = 'Matrix' # find the Matrix series movies

for title in itme_corr.columns:
    if target in title:
        print(title)

Matrix Reloaded, The (2003)
Matrix Revolutions, The (2003)
Matrix, The (1999)


In [20]:
interested = 'Matrix Reloaded, The (2003)'
itme_corr.sort_values(by=interested, ascending=False)[interested].head() # check which movies are most similar to this Matrix movie

# Even without any information about the movies (genre, director, cast, ...),
# we can find similar titles the user is likely to enjoy

title
Matrix Reloaded, The (2003)                            1.000000
Matrix Revolutions, The (2003)                         0.725888
X2: X-Men United (2003)                                0.592739
Star Wars: Episode II - Attack of the Clones (2002)    0.570682
X-Men: The Last Stand (2006)                           0.558934
Name: Matrix Reloaded, The (2003), dtype: float64
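
The top entry in the output above is always the movie itself (correlation 1.0). A small helper can drop it before ranking. This is a sketch: `top_similar` and the toy ratings are hypothetical, and in the notebook the first argument would be the item correlation matrix computed above.

```python
import pandas as pd


def top_similar(corr, name, n=5):
    """Return the n labels most correlated with `name`, excluding `name` itself."""
    return corr[name].drop(labels=name).sort_values(ascending=False).head(n)


# Made-up user x movie ratings (0 means "not rated", as in the zero-filled pivot).
toy_pvt = pd.DataFrame(
    {
        "Movie X": [5, 4, 0, 0],
        "Movie Y": [5, 5, 0, 1],
        "Movie Z": [0, 0, 4, 5],
    }
)

similar = top_similar(toy_pvt.corr(), "Movie X", n=2)
print(similar)  # Movie Y ranks first; Movie X itself is excluded
```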

#### User-Based

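* Compute how similar users are to each other
* -> (e.g.) Take user A (set A) and user B (set B). If B is highly similar to A, the two sets can be assumed to overlap substantially (the intersection being the movies both A and B have watched).

    To recommend movies to A, recommend the set difference B - A, i.e. the movies B has watched and A has not.

    (However, if the similarity is too high, the intersection is so large that few movies are left to recommend, so use the user with the next-highest similarity instead.)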

In [21]:
user_corr = pvt.T.corr()
user_corr

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,1.000000,0.019396,0.053052,0.176911,0.120862,0.104406,0.143785,0.128542,0.055263,-0.000307,...,0.066248,0.149934,0.186959,0.056523,0.134402,0.121958,0.254192,0.262225,0.085430,0.098693
2,0.019396,1.000000,-0.002595,-0.003808,0.013181,0.016252,0.021564,0.023748,-0.003450,0.061877,...,0.198547,0.010885,-0.004038,-0.005348,-0.007923,0.011290,0.005809,0.032723,0.024371,0.089321
3,0.053052,-0.002595,1.000000,-0.004559,0.001886,-0.004581,-0.005637,0.001701,-0.003112,-0.005504,...,0.000148,-0.000588,0.011203,-0.004824,0.003674,-0.003255,0.012881,0.008089,-0.002964,0.015953
4,0.176911,-0.003808,-0.004559,1.000000,0.121014,0.065707,0.100595,0.054231,0.002412,0.015607,...,0.072841,0.114280,0.281852,0.039692,0.065483,0.164812,0.115109,0.116843,0.023926,0.062498
5,0.120862,0.013181,0.001886,0.121014,1.000000,0.294134,0.101721,0.426575,-0.004187,0.023468,...,0.061908,0.414929,0.095386,0.254115,0.141073,0.090149,0.145760,0.122600,0.258288,0.040361
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,0.121958,0.011290,-0.003255,0.164812,0.090149,0.047476,0.172484,0.081904,0.057979,0.054858,...,0.153879,0.084190,0.224593,0.035234,0.106729,1.000000,0.115978,0.188312,0.052375,0.093788
607,0.254192,0.005809,0.012881,0.115109,0.145760,0.142158,0.173287,0.178130,0.003252,-0.004817,...,0.080027,0.187581,0.173008,0.126261,0.101129,0.115978,1.000000,0.258232,0.142529,0.098496
608,0.262225,0.032723,0.008089,0.116843,0.122600,0.137932,0.305429,0.175906,0.086221,0.048357,...,0.136304,0.174056,0.164440,0.133722,0.144878,0.188312,0.258232,1.000000,0.109556,0.248902
609,0.085430,0.024371,-0.002964,0.023926,0.258288,0.207121,0.084491,0.421626,-0.003940,0.014980,...,0.029660,0.331051,0.045991,0.232113,0.089806,0.052375,0.142529,0.109556,1.000000,0.033702


In [23]:
interested = 379
user_corr.sort_values(by=interested, ascending=False)[interested].head()

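# User 126 shows a pattern very similar to user 379.
# Suppose we want to recommend new movies to user 379.
# A correlation of 0.7 or higher is usually considered high,
# but user 126, at above 0.8, is so similar that the recommendations would be too close to what user 379 already watches and could bore them.
# So we use user 470, who has the next-highest similarity:
# pick the movies in the set difference (user 470 - user 379) and recommend those.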

userId
379    1.000000
126    0.812737
470    0.700414
347    0.688932
94     0.688177
Name: 379, dtype: float64
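
The rule described in the comments above (skip a neighbor who is too similar, then take the next best) could be written as a small helper. This is only a sketch: `pick_neighbor`, the `upper=0.8` default, and the toy similarity matrix are assumptions for illustration; in the notebook the first argument would be `user_corr`.

```python
import pandas as pd


def pick_neighbor(corr, user, upper=0.8):
    """Return the most similar other user whose correlation with `user` is below `upper`.

    Raises ValueError if no other user falls below the threshold.
    """
    candidates = corr[user].drop(labels=user)
    candidates = candidates[candidates < upper]
    return candidates.idxmax()


# Made-up symmetric user-user similarity matrix for three users.
sims = pd.DataFrame(
    [[1.0, 0.85, 0.70], [0.85, 1.0, 0.40], [0.70, 0.40, 1.0]],
    index=[1, 2, 3],
    columns=[1, 2, 3],
)

neighbor = pick_neighbor(sims, 1)
print(neighbor)  # user 2 is above the threshold, so user 3 is chosen
```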

In [25]:
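# Recommend new, personalized movies to user 379
# <Caution!!> Do not recommend movies in the intersection of users 379 and 470.
#          In this domain, showing something new is more meaningful than a repeat purchase.

user_1, user_2 = 379, 470
u1 = set(combined.loc[combined['userId'] == user_1]['title']) # set of movies user 379 has watched
u2 = set(combined.loc[combined['userId'] == user_2]['title']) # set of movies user 470 has watched
diff = u2.difference(u1) # set difference: watched by user 470 but not by user 379
diff

# There are still many movies, so we can narrow the selection further.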

{'Addams Family Values (1993)',
 'Aladdin (1992)',
 'Beauty and the Beast (1991)',
 'Beverly Hills Cop III (1994)',
 'Broken Arrow (1996)',
 'Clueless (1995)',
 'Dead Man Walking (1995)',
 'Disclosure (1994)',
 'Dragonheart (1996)',
 'Dumb & Dumber (Dumb and Dumber) (1994)',
 'Eraser (1996)',
 'Executive Decision (1996)',
 'Fargo (1996)',
 'Father of the Bride Part II (1995)',
 'Four Weddings and a Funeral (1994)',
 'GoldenEye (1995)',
 'Grumpier Old Men (1995)',
 'Heat (1995)',
 'Jane Eyre (1996)',
 'Jumanji (1995)',
 'Legends of the Fall (1994)',
 'Little Women (1994)',
 'Mission: Impossible (1996)',
 "Mr. Holland's Opus (1995)",
 'Multiplicity (1996)',
 'Natural Born Killers (1994)',
 'Nell (1994)',
 'Nixon (1995)',
 'Nutty Professor, The (1996)',
 'Othello (1995)',
 'Phantom, The (1996)',
 'Primal Fear (1996)',
 'Quiz Show (1994)',
 'Remains of the Day, The (1993)',
 'Restoration (1995)',
 'Richard III (1995)',
 'Sabrina (1995)',
 'Star Trek: Generations (1994)',
 'Striptease (1996

In [28]:
u2_all = combined.loc[combined['userId'] == user_2] # all ratings by user 470
filtered = u2_all.loc[u2_all['title'].isin(diff)] # keep only the movies user 379 has not watched (mask built from u2_all itself)
filtered.sort_values(by='rating', ascending=False).head() # highest-rated first

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
72918,470,1,4.0,849224825,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
72931,470,36,4.0,849370395,Dead Man Walking (1995),Crime|Drama
72992,470,736,4.0,849224778,Twister (1996),Action|Adventure|Romance|Thriller
72976,470,515,4.0,849224825,"Remains of the Day, The (1993)",Drama|Romance
72974,470,494,4.0,849370395,Executive Decision (1996),Action|Adventure|Thriller
