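# Review tagging and representative tag selection
Collect the reviews of each movie and select the tags that best represent it.<br>
For blog-style exhibition reviews, use **(1) summarize each review, extract adjectives, then select representative adjectives** together with method (2).<br>
For movies with many short reviews, use **(2) extract adjectives from all reviews and select representative adjectives by frequency (or a similar criterion)**.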
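Method (2) comes down to counting. The sketch below is minimal and makes three assumptions. First, each review has already been reduced to a list of adjectives, as the tagging step below does. Second, the helper `top_adjectives` is hypothetical. Third, its `min_count` cut-off is a hypothetical stand-in for the other criteria mentioned in (2).

```python
from collections import Counter

def top_adjectives(review_adjectives, k=5, min_count=2):
    """Pick up to k representative adjectives for one movie.

    review_adjectives: one list of adjectives per review (assumed input format).
    min_count: adjectives seen fewer times than this are ignored (hypothetical cut-off).
    """
    counts = Counter(adj for adjectives in review_adjectives for adj in adjectives)
    ranked = [adj for adj, n in counts.most_common() if n >= min_count]
    return ranked[:k]

reviews = [
    ['outstanding', 'warm'],
    ['warm', 'slow', 'warm'],
    ['outstanding', 'fresh'],
]
print(top_adjectives(reviews, k=2))  # → ['warm', 'outstanding']
```

`Counter.most_common` sorts stably by count, so adjectives with equal counts keep the order in which they were first seen.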

In [1]:
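import re
from collections import Counter

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

import nltk
# Data needed by stopwords, word_tokenize and pos_tag
# (NLTK 3.9+ names the last two resources 'punkt_tab' and 'averaged_perceptron_tagger_eng')
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stop_words = set(stopwords.words('english'))

import gensim
from gensim.models import Word2Vec
from sklearn.cluster import KMeans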

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Kim\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


### Review tagging and adjective extraction

In [2]:
def cleaning(word):
    # Remove punctuation and special symbols (raw string, so every backslash reaches the regex)
    return re.sub(r'[-=+,#/\?:^$.@*\"※~&%ㆍ!』\\‘|\(\)\[\]\<\>`\'…》]', '', word)

def remove_value(in_list):
    # Clean and lower-case every word first, then drop a hand-picked list of function words
    val = {'at','in','all','on','the','i','my',
           'no','todays','other','our','ours','you','your','myself'}
    cleaned = [cleaning(word).lower() for word in in_list]
    return [word for word in cleaned if word not in val]

def get_tags_to_txt(text):
    # Return the adjectives (POS tag 'JJ') of text, minus stop words, as one comma-separated string
    if not isinstance(text, str):   # missing reviews are NaN (float)
        return ''
    tagged = nltk.pos_tag(word_tokenize(text))
    tags = [delete_punctuation_marks_tag(word)
            for word, pos in tagged
            if pos == 'JJ' and word not in stop_words]
    return ','.join(tags)

def delete_punctuation_marks_tag(word):
    # Strip punctuation left on a token by the tokenizer, then lower-case it
    return word.translate(str.maketrans('', '', ',.!:?~\\";')).lower()
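`get_tags_to_txt` keeps only the tokens that `nltk.pos_tag` labels `JJ` and that are not stop words. `JJ` covers plain adjectives only; comparatives (`JJR`) and superlatives (`JJS`) are dropped. The sketch below reproduces that filter on hand-written `(word, tag)` pairs, so it runs without the NLTK tokenizer and tagger data. The tags in the example were written by hand and are only assumed to match what the tagger would output. The name `adjectives_only` is hypothetical.

```python
# Re-implementation of the adjective filter in get_tags_to_txt,
# applied to hand-tagged (word, POS) pairs so no NLTK data is needed
STRIP = str.maketrans('', '', ',.!:?~\\";')

def adjectives_only(tagged, stop_words):
    # Keep plain adjectives ('JJ') that are not stop words, strip punctuation, lower-case
    return ','.join(word.translate(STRIP).lower()
                    for word, pos in tagged
                    if pos == 'JJ' and word not in stop_words)

tagged = [('a', 'DT'), ('really', 'RB'), ('nice', 'JJ'), ('movie', 'NN'),
          ('about', 'IN'), ('two', 'CD'), ('lonely', 'JJ'), ('people', 'NNS'),
          ('better', 'JJR')]
print(adjectives_only(tagged, stop_words={'a', 'about'}))  # → nice,lonely
```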

In [3]:
watcha_review=pd.read_csv('../data/movie_watcha_review.csv', index_col=0)

In [4]:
watcha_review.head()

| | movie_id | user | rate | likes | review | en_review |
|---|---|---|---|---|---|---|
| 0 | 0 | 최한영 | 4.5 | 656.0 | 뜨지 못한 명작 | an outstanding masterpiece |
| 1 | 0 | seulkiki | 5.0 | 575.0 | 사회라는 거대한 틀에서 표류하고있는 두 사람의 극복기. 정말괜찮은 영화. | a really nice movie about two people drifting ... |
| 2 | 0 | DamianJeonDongHyun | 4.0 | 360.0 | 정말 정재영의 연기중 손에 꼽을 만한 작품이라고 말해주고 싶다.\r\n실제 사회에서... | I really want to tell you that this is one of ... |
| 3 | 0 | 새별님 | 5.0 | 311.0 | 나는 이런 영화가 좋더라..감독은 틀림없이 따뜻한 사람일 것이다 | I like movies like this. The director must be ... |
| 4 | 0 | BB | 5.0 | 301.0 | 이 영화에서 가장 난해한건 바로 포스터ㅋ | The most difficult thing in this movie is the ... |


In [5]:
# One document per movie: all English reviews of the movie joined with spaces
rows = []
for i in range(670):   # movie_id runs from 0 to 669
    reviews = watcha_review.loc[watcha_review['movie_id']==i, 'en_review'].dropna().astype(str)
    snt = ' '.join(reviews) + ' '
    rows.append({'movie_id': i, 'snt_review': snt, 'tags': None})
# DataFrame.append was removed in pandas 2.0, so the frame is built from a list of rows
review_tag = pd.DataFrame(rows, columns=['movie_id','snt_review','tags'])
review_tag.head()

| | movie_id | snt_review | tags |
|---|---|---|---|
| 0 | 0 | an outstanding masterpiece a really nice movie... | |
| 1 | 1 | How many people can understand this movie? If ... | |
| 2 | 2 | I was ashamed of the movie's social calling, b... | |
| 3 | 3 | Here's another well-made Spanish thriller. Eve... | |
| 4 | 4 | Why can't we make disaster films so plain and ... | |


In [6]:
for i in tqdm(review_tag['movie_id']):
    review_tag.loc[i,'tags'] = get_tags_to_txt(review_tag.loc[i,'snt_review'])

In [7]:
review_tag.to_csv('review_tag.csv')

### convert tags to vectors

In [8]:
review_tag=pd.read_csv('review_tag.csv', index_col=0)
review_tag.head()

| | movie_id | snt_review | tags |
|---|---|---|---|
| 0 | 0 | an outstanding masterpiece a really nice movie... | outstanding,nice,huge,fresh,real,warm,difficul... |
| 1 | 1 | How many people can understand this movie? If ... | many,unique,hard,similar,hard,slow,slow,long,s... |
| 2 | 2 | I was ashamed of the movie's social calling, b... | social,first,current,second,different,first,gr... |
| 3 | 3 | Here's another well-made Spanish thriller. Eve... | well-made,spanish,every,spanish,spanish,little... |
| 4 | 4 | Why can't we make disaster films so plain and ... | disaster,plain,immersive,well-made,simple,over... |


In [9]:
# All tags of all movies in one flat list (movies without tags are read back from CSV as NaN)
token=[]
for tags in review_tag['tags'].fillna(''):
    token.extend(tags.replace('\'','').replace("[",'').replace("]",'').replace(" ","").split(","))

In [10]:
# Tag list of each movie (movies without tags are read back from CSV as NaN)
token_list=[]
for tags in review_tag['tags'].fillna(''):
    token_list.append(tags.replace('\'','').replace("[",'').replace("]",'').replace(" ","").split(","))

In [11]:
# How often each adjective occurs across all movies, most frequent first
token_dict = Counter(token)
token_dict_sorted = token_dict.most_common()
token_dict_sorted[:5]

[('good', 2964), ('first', 1372), ('much', 997), ('many', 960), ('great', 901)]
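Raw counts are dominated by adjectives that fit almost any movie (`good`, `first`, `much`, `many` above). For a single movie, the representative tags may be better taken as the adjectives that set it apart. One option is to weight each movie's tag counts by inverse document frequency (TF-IDF). This technique is swapped in here and is not used in this notebook. The minimal sketch below assumes each movie's tags are a comma-separated string in the same format as the `tags` column. The helper name `distinctive_tags` is hypothetical.

```python
import math
from collections import Counter

def distinctive_tags(tag_strings, k=3):
    """Top-k tags of each movie ranked by TF-IDF (hypothetical helper).

    tag_strings: one comma-separated tag string per movie.
    Tags that occur in every movie score 0 and are dropped.
    """
    docs = [Counter(t for t in s.split(',') if t) for s in tag_strings]
    n_docs = len(docs)
    # Document frequency: number of movies in which each tag appears at least once
    df = Counter(tag for doc in docs for tag in doc)
    result = []
    for doc in docs:
        # term frequency * inverse document frequency
        scores = {tag: count * math.log(n_docs / df[tag]) for tag, count in doc.items()}
        ranked = sorted(scores, key=lambda tag: (-scores[tag], tag))
        result.append([tag for tag in ranked if scores[tag] > 0][:k])
    return result

movies = ['good,good,warm,good', 'good,scary,scary', 'good,warm,funny']
print(distinctive_tags(movies, k=2))  # → [['warm'], ['scary'], ['funny', 'warm']]
```

`good` appears in every movie, so its IDF is `log(1) = 0` and it disappears, even though it has the highest raw count.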

### clustering vectors

In [12]:
w2v = gensim.models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin.gz', binary=True)

In [13]:
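# GoogleNews Word2Vec vector (300-d) of every adjective; words missing from the vocabulary are skipped.
# new_df has one column per word and one row per vector dimension.
new_df = pd.DataFrame({word: w2v.get_vector(word)
                       for word, _ in token_dict_sorted if word in w2v})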

In [14]:
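x = new_df.T                      # one row per adjective, 300 columns
cluster_count=100
model = KMeans(n_clusters=cluster_count, random_state=0)   # fixed seed for reproducible cluster ids
Y=x.copy()
Y['kmeans_id'] = model.fit_predict(x)   # cluster id of every adjective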
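KMeans groups adjectives whose GoogleNews vectors lie close together, so each cluster tends to collect words of similar meaning. To show what the fit does, here is a minimal pure-Python sketch of Lloyd's algorithm (the standard k-means iteration) on hand-made 2-D "word vectors". The vectors, the fixed initial centroids and the helper name `kmeans` are illustrative assumptions. scikit-learn's `KMeans` chooses its own initial centroids (k-means++ by default).

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's k-means: assign each point to its nearest centroid, then move every
    centroid to the mean of its points; stop when the centroids no longer change."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance)
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of the assigned points; an empty cluster keeps its centroid
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D "word vectors": two fear words and two praise words
words = ['scary', 'creepy', 'good', 'great']
vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, _ = kmeans(vectors, centroids=[vectors[0], vectors[2]])
print(dict(zip(words, labels)))  # → {'scary': 0, 'creepy': 0, 'good': 1, 'great': 1}
```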

In [15]:
# Number of words in each cluster, and the id of the largest cluster
cluster_sizes = Y['kmeans_id'].value_counts().sort_index()
for i, size in cluster_sizes.items():
    print(i, ': ', size)
max_count_id = int(cluster_sizes.idxmax())   # ties go to the lowest id
max_count_id

0 :  24
1 :  43
2 :  30
3 :  34
4 :  79
5 :  50
6 :  42
7 :  92
8 :  59
9 :  39
10 :  45
11 :  42
12 :  44
13 :  42
14 :  3
15 :  44
16 :  67
17 :  73
18 :  26
19 :  3
20 :  16
21 :  34
22 :  55
23 :  99
24 :  30
25 :  76
26 :  14
27 :  38
28 :  15
29 :  77
30 :  47
31 :  44
32 :  37
33 :  44
34 :  3
35 :  31
36 :  54
37 :  17
38 :  26
39 :  1
40 :  17
41 :  24
42 :  62
43 :  97
44 :  15
45 :  16
46 :  43
47 :  91
48 :  15
49 :  24
50 :  39
51 :  33
52 :  38
53 :  48
54 :  15
55 :  73
56 :  85
57 :  26
58 :  26
59 :  35
60 :  14
61 :  30
62 :  33
63 :  31
64 :  2
65 :  44
66 :  43
67 :  49
68 :  13
69 :  48
70 :  46
71 :  66
72 :  38
73 :  37
74 :  202
75 :  65
76 :  52
77 :  46
78 :  42
79 :  10
80 :  40
81 :  37
82 :  21
83 :  20
84 :  20
85 :  146
86 :  8
87 :  74
88 :  4
89 :  6
90 :  2
91 :  2
92 :  33
93 :  48
94 :  28
95 :  44
96 :  5
97 :  17
98 :  74
99 :  15


74

In [16]:
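# One column of words per cluster, with the largest cluster placed first.
# Clusters differ in size, so pandas pads the shorter columns with NaN.
km_df = pd.DataFrame()
km_df['group_'+str(max_count_id)] = pd.Series(list(Y[Y['kmeans_id'] == max_count_id].index))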
for i in range(0, cluster_count):
    col_name = 'group_'+str(i)
    words = list(Y[Y['kmeans_id'] == i].index)
    km_df[col_name] = pd.Series(words)
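Because `km_df` pads shorter clusters with NaN, the table below is full of 0s after `fillna(0)`. A plain dictionary from cluster id to word list avoids the padding. The minimal sketch below assumes the words and their cluster ids come as two parallel sequences, for example `Y.index` and `Y['kmeans_id']`. The helper name `group_by_cluster` and the sample data are hypothetical.

```python
from collections import defaultdict

def group_by_cluster(words, labels):
    # cluster id -> words assigned to it, in input order (no NaN padding needed)
    groups = defaultdict(list)
    for word, label in zip(words, labels):
        groups[label].append(word)
    return dict(groups)

# Hypothetical words and cluster ids
words = ['scary', 'good', 'creepy', 'great', 'warm']
labels = [3, 7, 3, 7, 7]
groups = group_by_cluster(words, labels)
# Largest cluster; sorting the ids first makes ties go to the lowest id, as in cell In [15]
largest = max(sorted(groups), key=lambda label: len(groups[label]))
print(groups)   # → {3: ['scary', 'creepy'], 7: ['good', 'great', 'warm']}
print(largest)  # → 7
```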

In [17]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
km_df.fillna(0)

Unnamed: 0,group_74,group_0,group_1,group_2,group_3,group_4,group_5,group_6,group_7,group_8,group_9,group_10,group_11,group_12,group_13,group_14,group_15,group_16,group_17,group_18,group_19,group_20,group_21,group_22,group_23,group_24,group_25,group_26,group_27,group_28,group_29,group_30,group_31,group_32,group_33,group_34,group_35,group_36,group_37,group_38,group_39,group_40,group_41,group_42,group_43,group_44,group_45,group_46,group_47,group_48,group_49,group_50,group_51,group_52,group_53,group_54,group_55,group_56,group_57,group_58,group_59,group_60,group_61,group_62,group_63,group_64,group_65,group_66,group_67,group_68,group_69,group_70,group_71,group_72,group_73,group_75,group_76,group_77,group_78,group_79,group_80,group_81,group_82,group_83,group_84,group_85,group_86,group_87,group_88,group_89,group_90,group_91,group_92,group_93,group_94,group_95,group_96,group_97,group_98,group_99
0,single,daily,smart,sudden,understood,live,strong,boring,last,useless,interesting,calm,annoying,narrative,cool,alive,ambiguous,evil,black,old,free,chicken,emotional,faithful,main,violent,modern,several,many,cruel,wrong,young,visual,happy,scientific,noir,scary,acting,animal,korean,center,shallow,much,attractive,taiwanese,first,disappointing,actress,human,musical,false,difficult,tragic,detective,beautiful,dizzy,uncomfortable,social,yoo,romantic,convincing,funeral,good,able,fresh,gravitational,famous,martial,goose,sexual,silent,heavy,facial,detailed,impressive,attracted,bad,popcorn,sweet,scariest,infinite,crazy,provocative,funny,poor,little,criminal,empty,bluish,cognitive,underfed,wounded,sophisticated,cute,song,weak,bed,impossible,small,forced
1,light,endless,meticulous,unexpected,added,rid,high,predictable,new,unrealistic,important,cheerful,jealous,cinematic,warm,intact,complex,mysterious,white,married,unlimited,bean,empathy,noble,real,bloody,classic,ten,different,humane,childish,female,serial,sorry,quantum,thriller,vivid,responsible,teddy,japanese,0,naval,comfortable,easy,greek,second,painful,spy,inner,classical,exaggerated,awkward,fatal,representative,pleasant,drunk,angry,historical,ryu,lonely,straight,bereaved,great,likely,healthy,magnetic,favorite,liberal,fish,erotic,loud,powerful,shin,independent,dramatic,lost,sad,sticky,delicious,worst,absolute,mental,hateful,fun,inferior,hard,fake,dead,yellowish,auditory,emaciated,injured,passive,handsome,solo,sloppy,sleep,invisible,worth,desperate
2,middle,constant,charismatic,inevitable,emphasized,shed,normal,dull,original,unnecessary,unique,sincere,arrogant,novel,cold,awake,vague,fantasy,dark,girl,prepaid,noodle,sympathy,righteous,true,killing,sexy,half,similar,immoral,stupid,male,immersive,glad,biological,hardboiled,creepy,immersed,dog,american,0,underwater,smooth,realistic,sebastian,third,miserable,fictional,rational,music,contrary,bitter,fateful,critic,lovely,unconscious,afraid,political,gon,love,chase,deceased,perfect,welcome,hungry,0,popular,ideological,ant,lesbian,tearful,intense,nose,advanced,spectacular,touched,strange,poisonous,chewy,hardest,superior,mad,insensitive,comic,questionable,big,suspicious,hidden,reddish,sensory,0,0,functional,clumsy,sound,slow,goodnight,incomprehensible,ordinary,unable
3,wild,everyday,talented,unusual,highlight,forgive,overall,tacky,long,meaningless,successful,friendly,unkind,genre,hot,0,twisted,fairy,red,boy,0,kimchi,anguish,peaceful,obvious,suicide,elegant,hundred,various,despicable,innocent,elementary,graphic,disappointed,experimental,0,sorrowful,guilty,puppy,chinese,0,dam,usual,dangerous,fucking,fourth,frustrating,actor,philosophical,ensemble,distorted,harsh,destructive,police,magical,asleep,tired,commercial,roh,kiss,lead,anniversary,nice,willing,raw,0,top,revolutionary,whale,homosexual,smile,fierce,spleen,comprehensive,memorable,used,terrible,underwear,fried,funniest,irreplaceable,obsessive,abusive,witty,insufficient,right,illegal,unknown,discolored,dopamine,0,0,flexible,charming,lyrical,mixed,sofa,indescribable,huge,failed
4,artificial,regular,intelligent,unnatural,echo,miss,low,cheesy,previous,trivial,exciting,quiet,unfriendly,artistic,dry,0,elaborate,magic,colorful,born,0,cheese,happiness,patriotic,simple,drunken,stylish,few,countless,vile,selfish,disabled,virtual,excited,mathematical,0,stark,restrained,pet,british,0,tidal,conventional,necessary,lol,fifth,upset,fable,moral,orchestra,deceptive,delicate,chronic,lieutenant,golden,numb,curious,public,iti,unrequited,decisive,memorial,amazing,ready,fat,0,former,colonial,insect,porn,fist,brutal,patient,thorough,rare,shot,ridiculous,toxic,greasy,darkest,indelible,addictive,inappropriate,humorous,lacking,whole,fraudulent,broken,0,neural,0,0,strict,clever,vocal,soft,couch,unbearable,excessive,reluctant
5,wide,repeated,experienced,abnormal,express,attract,sharp,monotonous,full,distracting,meaningful,gentle,snobbish,literary,rainy,0,imperfect,eternal,bright,son,0,soybean,regret,vain,clear,gunshot,fancy,twenty,individual,reprehensible,unfair,paternal,remote,surprised,structural,0,grim,focused,elephant,french,0,solar,mature,cheap,julian,seventh,shocking,suspense,intellectual,musician,biased,rough,uncontrollable,chief,magnificent,nauseous,interested,personal,choi,extramarital,runaway,obituary,excellent,wish,frozen,0,legendary,conservative,bear,pornographic,noisy,vicious,hip,upgraded,surprising,left,regrettable,trash,savory,harshest,irresistible,obsessed,misogynistic,laugh,shoddy,sure,fictitious,forgotten,0,somatic,0,0,voluntary,bold,mournful,tight,0,unimaginable,certain,eager
6,north,persistent,skillful,accidental,feels,overcome,stable,sentimental,next,frivolous,creative,affectionate,timid,theatrical,dreary,0,complicated,divine,blue,guardian,0,bread,fear,brave,deep,armed,luxurious,eleven,numerous,deplorable,foolish,pregnant,screen,grateful,linear,0,terrifying,vested,baby,asian,0,archipelago,efficient,appropriate,sean,consecutive,heartbreaking,superhero,spiritual,cello,factual,tense,severe,intern,splendid,unconsciousness,nervous,national,yuan,friendship,finish,autopsy,wonderful,unlikely,organic,0,loyal,antiwar,jellyfish,pervert,solemn,offensive,hair,standard,improbable,moved,pathetic,toilet,spicy,ugliest,exemplary,fanatic,unbecoming,comical,unqualified,enough,malicious,closed,0,0,0,0,rigid,ingenious,rhythmical,gloomy,0,unspeakable,limited,pressured
7,opposite,frequent,instinct,intentional,understanding,write,positive,melodramatic,special,hopeless,persuasive,generous,hostile,poetic,snow,0,subjective,alien,green,buddy,0,soju,sadness,faith,common,masked,sensual,sixty,other,heinous,lazy,unemployed,electronic,impressed,spatial,0,hellish,trusted,doggy,spanish,0,marine,deeper,safe,n,sixth,unlucky,caper,paradoxical,soloists,groundless,tough,lethal,corporal,dreamy,drugged,satisfied,military,yoon,rendezvous,rival,honor,solid,please,eating,0,beloved,democratic,frog,heterosexual,breathless,stiff,mouth,available,extraordinary,unraveled,unfortunate,bong,salty,coolest,unbreakable,impulsive,lewd,gag,deficient,precious,unauthorized,isolated,0,0,0,0,unified,flamboyant,acoustic,unconvincing,0,insurmountable,large,unwilling
8,western,sporadic,energetic,random,reveal,follow,negative,cliché,short,futile,dynamic,honest,unsympathetic,art,wet,0,fragmentary,supernatural,thick,girlfriend,0,rice,hatred,unconditional,natural,fistfight,sensuous,whopping,respective,barbaric,irresponsible,teen,manual,proud,schematic,0,suspenseful,organized,infant,english,0,ocean,balanced,active,ps,7th,disastrous,story,vicarious,concert,suppressed,chaotic,deadly,assistant,exquisite,sleepless,worried,psychological,mahjong,affair,unanswered,homecoming,brilliant,expected,grown,0,leading,patriarchal,fox,gay,teary,rampant,needle,refined,thrilling,hit,unpleasant,iced,fluffy,happiest,invincible,psychopathic,hurtful,hilarious,unsatisfactory,sick,0,torn,0,0,0,0,cohesive,playful,resonant,thin,0,intolerable,vast,scrambled
9,outside,occasional,ambitious,coincidental,revealed,survive,flat,cliche,past,wasteful,refreshing,sympathetic,irritating,documentary,sunny,0,confusing,immortal,yellow,bedside,0,wine,anger,courageous,typical,shooting,flashy,thousand,distinct,inhumane,ignorant,child,digital,lucky,qualitative,0,horror,supporting,orangutan,christian,0,lake,lean,unattractive,brian,2nd,embarrassing,villainous,humanistic,piano,deceiving,messy,devastating,lady,harmonious,blindfolded,careful,domestic,ma,clingy,crown,centennial,outstanding,determined,carcinogenic,0,controversial,capitalist,migratory,sex,expressionless,fiery,healing,summary,explosive,caught,weird,urine,cookie,sexiest,impeccable,alcoholic,vulgar,comedy,uncorrected,touching,0,unidentified,0,0,0,0,systematic,unconventional,catchy,lethargic,0,unheard,fair,threatening


### make item-adjective matrix

In [18]:
movie_review_by_user = pd.DataFrame(index = watcha_review['user'].unique(), columns=['reviews'])

# One document per user: every English review the user wrote, each followed by '. '
for user in tqdm(watcha_review['user'].unique()):
    reviews = watcha_review.loc[watcha_review['user']==user, 'en_review']
    # missing reviews are NaN (float) and are skipped
    movie_review_by_user.loc[user, 'reviews'] = ''.join(r + '. ' for r in reviews if isinstance(r, str))

In [19]:
movie_review_by_user['tags']=''

In [20]:
movie_review_by_user.head()

| | reviews | tags |
|---|---|---|
| 최한영 | an outstanding masterpiece. Until I became a k... | |
| seulkiki | a really nice movie about two people drifting ... | |
| DamianJeonDongHyun | I really want to tell you that this is one of ... | |
| 새별님 | I like movies like this. The director must be ... | |
| BB | The most difficult thing in this movie is the ... | |


In [21]:
for i in tqdm(movie_review_by_user.index):
    movie_review_by_user.loc[i,'tags'] = get_tags_to_txt(movie_review_by_user.loc[i,'reviews'])

In [22]:
# Count matrices: one row per movie / user, one column per adjective cluster, all zeros to start
movie_matrix = pd.DataFrame(0, index=review_tag.index, columns=km_df.columns)
user_matrix = pd.DataFrame(0, index=movie_review_by_user.index, columns=km_df.columns)

In [23]:
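# Word -> cluster column lookup, built once from the KMeans labels (Y is indexed by word)
word_to_group = {word: 'group_'+str(gid) for word, gid in Y['kmeans_id'].items()}

# Add each movie's adjective counts to the columns of the clusters those adjectives belong to
for movie in tqdm(movie_matrix.index):
    tags = review_tag.loc[movie, 'tags']
    if not isinstance(tags, str):   # movies without adjectives are read back from CSV as NaN
        continue
    for tag, count in Counter(tags.split(',')).items():
        if tag in word_to_group:
            movie_matrix.loc[movie, word_to_group[tag]] += count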

In [24]:
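# Word -> cluster column lookup, built once from the KMeans labels (Y is indexed by word)
word_to_group = {word: 'group_'+str(gid) for word, gid in Y['kmeans_id'].items()}

# Add each user's adjective counts to the columns of the clusters those adjectives belong to
for user in tqdm(user_matrix.index):
    tags = Counter(movie_review_by_user.loc[user, 'tags'].split(','))
    for tag, count in tags.items():
        if tag in word_to_group:
            user_matrix.loc[user, word_to_group[tag]] += count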

In [25]:
movie_matrix.head()

Unnamed: 0,group_74,group_0,group_1,group_2,group_3,group_4,group_5,group_6,group_7,group_8,group_9,group_10,group_11,group_12,group_13,group_14,group_15,group_16,group_17,group_18,group_19,group_20,group_21,group_22,group_23,group_24,group_25,group_26,group_27,group_28,group_29,group_30,group_31,group_32,group_33,group_34,group_35,group_36,group_37,group_38,group_39,group_40,group_41,group_42,group_43,group_44,group_45,group_46,group_47,group_48,group_49,group_50,group_51,group_52,group_53,group_54,group_55,group_56,group_57,group_58,group_59,group_60,group_61,group_62,group_63,group_64,group_65,group_66,group_67,group_68,group_69,group_70,group_71,group_72,group_73,group_75,group_76,group_77,group_78,group_79,group_80,group_81,group_82,group_83,group_84,group_85,group_86,group_87,group_88,group_89,group_90,group_91,group_92,group_93,group_94,group_95,group_96,group_97,group_98,group_99
0,3,0,0,1,0,1,2,1,7,1,5,2,0,0,6,0,1,1,3,0,0,1,1,1,7,0,3,0,4,0,1,1,2,1,0,0,2,0,0,1,0,0,2,0,1,3,0,1,2,0,1,4,0,0,2,0,5,4,0,1,0,0,9,0,2,0,0,0,0,0,0,2,1,0,1,0,5,0,1,0,0,0,0,2,0,10,1,4,0,0,0,0,2,0,0,3,0,0,2,1
1,2,0,1,3,0,1,4,4,12,0,8,3,0,0,2,0,2,1,2,1,0,0,1,0,4,0,3,0,2,0,1,1,1,3,0,0,1,0,0,2,0,0,0,3,0,2,1,0,1,0,2,2,1,0,0,0,5,2,0,0,0,0,12,1,0,0,0,0,0,0,0,1,0,1,3,0,1,0,1,0,0,3,0,1,0,5,1,0,0,0,0,0,0,0,0,5,0,1,2,0
2,4,1,0,1,2,0,4,1,12,0,2,2,0,2,3,0,1,0,1,0,0,0,0,0,12,0,2,0,15,0,1,0,0,1,0,0,1,0,0,2,0,0,0,8,8,15,0,0,4,0,1,3,1,0,6,0,3,18,0,3,0,0,6,0,0,0,0,1,0,0,0,1,0,0,2,1,5,0,3,0,0,0,0,0,1,10,0,0,0,0,0,0,0,2,0,0,0,1,2,0
3,3,0,0,2,0,0,3,2,2,1,1,0,1,0,1,0,0,0,0,0,0,0,0,2,8,0,1,0,1,1,3,1,1,1,0,0,2,1,0,9,0,0,3,2,1,0,0,0,1,0,1,0,1,0,0,0,5,0,0,0,0,0,12,3,3,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,2,1,2,0,14,0,2,0,0,0,0,0,1,0,3,0,0,1,0
4,1,0,1,0,0,0,5,0,10,1,1,1,0,0,2,1,2,0,2,0,0,0,0,1,13,0,0,0,1,0,0,0,4,3,0,0,2,2,0,3,0,1,0,5,0,5,0,0,1,0,0,1,1,0,0,0,2,2,0,0,1,0,7,1,0,0,0,0,0,0,1,2,0,2,3,0,3,0,0,0,1,0,1,2,0,5,1,0,0,0,0,0,0,0,0,1,0,0,2,0


In [26]:
user_matrix.head()

Unnamed: 0,group_74,group_0,group_1,group_2,group_3,group_4,group_5,group_6,group_7,group_8,group_9,group_10,group_11,group_12,group_13,group_14,group_15,group_16,group_17,group_18,group_19,group_20,group_21,group_22,group_23,group_24,group_25,group_26,group_27,group_28,group_29,group_30,group_31,group_32,group_33,group_34,group_35,group_36,group_37,group_38,group_39,group_40,group_41,group_42,group_43,group_44,group_45,group_46,group_47,group_48,group_49,group_50,group_51,group_52,group_53,group_54,group_55,group_56,group_57,group_58,group_59,group_60,group_61,group_62,group_63,group_64,group_65,group_66,group_67,group_68,group_69,group_70,group_71,group_72,group_73,group_75,group_76,group_77,group_78,group_79,group_80,group_81,group_82,group_83,group_84,group_85,group_86,group_87,group_88,group_89,group_90,group_91,group_92,group_93,group_94,group_95,group_96,group_97,group_98,group_99
최한영,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
seulkiki,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
DamianJeonDongHyun,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,2,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
새별님,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
BB,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
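Each row of `movie_matrix` and `user_matrix` is a histogram over the 100 adjective clusters. Every adjective in the row's tag string adds its count to the column of the cluster it belongs to. Adjectives without a word vector have no cluster and are ignored. The minimal sketch below computes one such row. It assumes a word-to-cluster-id dictionary like the one `Y['kmeans_id']` provides. The names `cluster_histogram` and `word_to_cluster` are hypothetical, and the example data is made up.

```python
from collections import Counter

def cluster_histogram(tag_string, word_to_cluster, n_clusters):
    # Count adjectives per cluster; words without a cluster (no word vector) are skipped
    row = [0] * n_clusters
    for tag, count in Counter(tag_string.split(',')).items():
        if tag in word_to_cluster:
            row[word_to_cluster[tag]] += count
    return row

word_to_cluster = {'scary': 0, 'creepy': 0, 'good': 1, 'great': 1, 'warm': 2}
print(cluster_histogram('good,scary,creepy,good,chickenish', word_to_cluster, 3))  # → [2, 2, 0]
```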


In [27]:
movie_matrix.to_csv('../data/movie_matrix.csv')
user_matrix.to_csv('../data/user_matrix.csv')
km_df.to_csv("../data/movie_cluster.csv", index=False)
km_df.count().to_csv('../data/group_count.csv')

In [6]:
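# Reload the saved matrix with its index (index_col=0), name the index, and save it again
movie_matrix = pd.read_csv('../data/movie_matrix.csv', index_col=0)
movie_matrix.index.name = 'p_id'
movie_matrix.to_csv('../data/movie_matrix.csv')
movie_matrix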

In [7]:
# Reload the saved user matrix, name its index, and save it back to the same file
user_matrix = pd.read_csv('../data/user_matrix.csv', index_col=0)
user_matrix.index.name = 'u_id'
user_matrix.to_csv('../data/user_matrix.csv')
user_matrix