# Machine Learning challenge: Love is love

## Approach
> * Used `glob` to collect the paths of all the images.
> * Used `cv2` (OpenCV) and `pytesseract` to read (OCR) the text in the `.jpg` files.
> * Replaced newlines (`\n`) with spaces, lowercased the text and expanded **contractions**, then applied `TextBlob` spelling correction (used only for scoring) and computed a sentiment score with VADER's `SentimentIntensityAnalyzer` (compound score; adding the `TextBlob` sentiment polarity is left commented out).
> * Lemmatized the text with spaCy, then removed non-alphabetic characters; stopwords are removed by the TF-IDF vectorizer.
> * Created TF-IDF vectors for the text.
> * Created a vector representation for each comment (spaCy's document vector).
> * Stacked these features (`Sentiment scores`, `TFIDF Vectors`, `Comment Vectors`) and used them to cluster the comments into 3 groups with `KMeans`.
> * Finally, inspected each cluster by hand and mapped it to `Random`, `Negative` or `Positive`.

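* *`NOTE`: KMeans cluster ids are arbitrary and no `random_state` is set, so the ids can change every time the code is run; re-inspect the clusters and update the cluster-to-label mapping after each run.*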

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('darkgrid')

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
pd.set_option("display.max_colwidth", 200) 
warnings.filterwarnings("ignore", category=DeprecationWarning) 

In [4]:
import glob
import cv2
import pytesseract
# Location of the Tesseract executable on Windows; adjust (or drop) for other installs
pytesseract.pytesseract.tesseract_cmd=r'C:\Program Files\Tesseract-OCR\tesseract.exe'

<img src='https://miro.medium.com/max/1400/1*FpkRjLbfJcnNH5ucdqfvew.png' />

<img src='https://miro.medium.com/max/740/1*bVREaNJmNyPrM2MWBbDzpQ.png' />

In [5]:
paths=glob.glob('Dataset/*')
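`glob.glob` returns paths in an arbitrary, filesystem-dependent order, and `'Dataset/*'` also matches any non-image file in the folder. A minimal sketch, assuming every image in `Dataset/` has a `.jpg` extension, that filters and sorts the paths so the OCR loop always runs in the same order (the helper name `list_images` is hypothetical):

```python
import glob
import os


def list_images(folder, pattern="*.jpg"):
    """Return the image paths under `folder` that match `pattern`, sorted by name."""
    # os.path.join picks the right separator on Windows and POSIX alike
    return sorted(glob.glob(os.path.join(folder, pattern)))


paths = list_images("Dataset")
```

Sorting by name gives the same order as the listing below, and it no longer depends on how the filesystem happens to return entries.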

In [6]:
paths

['Dataset\\Test100.jpg',
 'Dataset\\Test1001.jpg',
 'Dataset\\Test1012.jpg',
 'Dataset\\Test1022.jpg',
 'Dataset\\Test103.jpg',
 'Dataset\\Test105.jpg',
 'Dataset\\Test107.jpg',
 'Dataset\\Test1071.jpg',
 'Dataset\\Test108.jpg',
 'Dataset\\Test109.jpg',
 'Dataset\\Test1122.jpg',
 'Dataset\\Test113.jpg',
 'Dataset\\Test114.jpg',
 'Dataset\\Test1161.jpg',
 'Dataset\\Test117.jpg',
 'Dataset\\Test119.jpg',
 'Dataset\\Test1199.jpg',
 'Dataset\\Test122.jpg',
 'Dataset\\Test1229.jpg',
 'Dataset\\Test1240.jpg',
 'Dataset\\Test125.jpg',
 'Dataset\\Test126.jpg',
 'Dataset\\Test1271.jpg',
 'Dataset\\Test1279.jpg',
 'Dataset\\Test128.jpg',
 'Dataset\\Test129.jpg',
 'Dataset\\Test1290.jpg',
 'Dataset\\Test131.jpg',
 'Dataset\\Test132.jpg',
 'Dataset\\Test133.jpg',
 'Dataset\\Test134.jpg',
 'Dataset\\Test135.jpg',
 'Dataset\\Test1353.jpg',
 'Dataset\\Test1359.jpg',
 'Dataset\\Test136.jpg',
 'Dataset\\Test140.jpg',
 'Dataset\\Test141.jpg',
 'Dataset\\Test143.jpg',
 'Dataset\\Test144.jpg',
 ...]

In [14]:
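import os

text=[]  # OCR output, one string per image
no=[]    # matching file names, e.g. 'Test100.jpg'
for i in paths:
    img=cv2.imread(i)
    text.append(pytesseract.image_to_string(img,lang='eng'))
    # os.path.basename strips the folder on both Windows and POSIX paths
    no.append(os.path.basename(i))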

In [15]:
text

['',
 "When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",
 'LOVE\nocd)\naie al\nRao',
 'LOVE\n®',
 '',
 'n =\nSi',
 'lesbianvenom:\n\nlesbianvenom:\n\nstraighteners aren’t worth ur money i’ve been using one for three\nweeks and i’m still definitely a lesbian\n\nin all my 20 years of living this has been by far my best joke ever',
 "eee MU WALLA OL\n\nem eat nimeel ie\n\nnot living by society's\nstandards, but deep down,\nthey wish they had the\ncourage to do the same.\n\nwe Being Gay & Proud Quotes\nwww.geckoandfly.com\n\neS",
 "trased on or was the outcome of the perception and attitudes of the audience toward . The\nmajority believed that portrayal of the LGBT commundy in Indian movies was montly negative,\nwhale the same in western movies was ponitive und precise. Even though the negative putrayal\n‘of the LOOBT themes in Indian cinema was pointed out the improvements that have been made\nwere ache
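Several images produce no text at all (`''`) or only OCR noise such as `'n =\nSi'`. A rough heuristic, which is an assumption and not part of the approach above, is to flag OCR output in which letters make up too small a share of the characters or which contains too few word-like tokens. The thresholds below are illustrative, not tuned, and short genuine text such as `LOVE` would also be flagged, so treat the result as a hint only:

```python
import re


def looks_like_noise(text, min_words=3, min_alpha_ratio=0.6):
    """Heuristically flag empty or garbled OCR output (thresholds are illustrative)."""
    stripped = re.sub(r"\s+", "", text)
    if not stripped:
        return True  # empty OCR result
    alpha_ratio = sum(ch.isalpha() for ch in stripped) / len(stripped)
    # count tokens of three or more letters as "word-like"
    words = re.findall(r"[A-Za-z]{3,}", text)
    return alpha_ratio < min_alpha_ratio or len(words) < min_words
```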

In [16]:
df=pd.DataFrame(zip(text,no))
df.columns=['text','id']

In [18]:
df.head(40)

index,text,id
0,,Test100.jpg
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg
2,LOVE\nocd)\naie al\nRao,Test1012.jpg
3,LOVE\n®,Test1022.jpg
4,,Test103.jpg
5,n =\nSi,Test105.jpg
6,lesbianvenom:\n\nlesbianvenom:\n\nstraighteners aren’t worth ur money i’ve been using one for three\nweeks and i’m still definitely a lesbian\n\nin all my 20 years of living this has been by far m...,Test107.jpg
7,"eee MU WALLA OL\n\nem eat nimeel ie\n\nnot living by society's\nstandards, but deep down,\nthey wish they had the\ncourage to do the same.\n\nwe Being Gay & Proud Quotes\nwww.geckoandfly.com\n\neS",Test1071.jpg
8,"trased on or was the outcome of the perception and attitudes of the audience toward . The\nmajority believed that portrayal of the LGBT commundy in Indian movies was montly negative,\nwhale the sa...",Test108.jpg
9,,Test109.jpg


In [None]:
# df.to_csv('text_data_dev.csv',index=False)

In [61]:
# df=pd.read_csv('text_data_dev.csv')

In [62]:
# df['text']=df['text'].replace(df.text[0],"")

In [63]:
df.head()

index,text,id
0,,Test100.jpg
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg
2,LOVE\nocd)\naie al\nRao,Test1012.jpg
3,LOVE\n®,Test1022.jpg
4,,Test103.jpg


In [64]:
import re
import spacy
from nltk.tokenize import word_tokenize,regexp_tokenize

In [65]:
def slash_n(text):
    #removing \n
    text=re.sub('\n',' ',text)
    #converting whole string into lowercase
    text=text.lower()
    return text
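Replacing each `\n` with a single space leaves double spaces wherever the OCR output had blank lines (`'\n\n'`). That is harmless for the `str.split()` used in the next step, but a variant that collapses every run of whitespace keeps the intermediate column tidy. A sketch, with `normalize_whitespace` as a hypothetical name:

```python
import re


def normalize_whitespace(text):
    """Collapse newlines, tabs and repeated spaces into one space, then lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()
```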

In [66]:
df['text_\n']=df['text'].apply(slash_n)

In [67]:
df.head()

index,text,id,text_\n
0,,Test100.jpg,
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too."
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao
3,LOVE\n®,Test1022.jpg,love ®
4,,Test103.jpg,


In [68]:
# Dictionary mapping each contraction to its expanded form (from the local module contr.py)
from contr import CONTRACTION_MAP

In [69]:
def contraction(text):
    """
    Expand contractions (e.g. "i'm" -> "i am") by whole-token lookup in CONTRACTION_MAP,
    so that different spellings of the same words map to the same tokens.
    """
    tokens=text.split()
    tok=[]
    for i in tokens:
        if i in CONTRACTION_MAP.keys():
            tok.append(CONTRACTION_MAP[i])
        else:
            tok.append(i)
    return ' '.join(tok)
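The lookup only matches tokens that are exactly a key of `CONTRACTION_MAP`, so contractions written with a typographic apostrophe by the OCR (`aren’t`, `i’ve` in row 6 of the output below) or directly followed by punctuation (e.g. `isn't.`) pass through unexpanded. A sketch that normalizes curly apostrophes first and then substitutes with a regex; the three-entry `CONTRACTION_MAP` here is a stand-in for the one in `contr`, whose full contents are not shown, and the text is assumed to be lowercased already:

```python
import re

# Stand-in for contr.CONTRACTION_MAP (the real map is much larger); keys are lowercase.
CONTRACTION_MAP = {"aren't": "are not", "i'm": "i am", "they'd": "they would"}

# One alternation of all keys, longest first so longer contractions are tried first.
_keys = sorted(CONTRACTION_MAP, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in _keys) + r")\b")


def expand_contractions(text):
    """Expand contractions even when OCR used ’ or punctuation follows the word."""
    text = text.replace("\u2019", "'")  # curly right quote -> ASCII apostrophe
    return _PATTERN.sub(lambda m: CONTRACTION_MAP[m.group(0)], text)
```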

In [70]:
df['text_cont']=df['text_\n'].apply(contraction)

In [71]:
df.head(10)

index,text,id,text_\n,text_cont
0,,Test100.jpg,,
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too.","when people ask what i see in you, i just smile and look away because i am afraid if they knew, they would fall in love with you too."
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao,love ocd) aie al rao
3,LOVE\n®,Test1022.jpg,love ®,love ®
4,,Test103.jpg,,
5,n =\nSi,Test105.jpg,n = si,n = si
6,lesbianvenom:\n\nlesbianvenom:\n\nstraighteners aren’t worth ur money i’ve been using one for three\nweeks and i’m still definitely a lesbian\n\nin all my 20 years of living this has been by far m...,Test107.jpg,lesbianvenom: lesbianvenom: straighteners aren’t worth ur money i’ve been using one for three weeks and i’m still definitely a lesbian in all my 20 years of living this has been by far my best ...,lesbianvenom: lesbianvenom: straighteners aren’t worth ur money i’ve been using one for three weeks and i’m still definitely a lesbian in all my 20 years of living this has been by far my best jok...
7,"eee MU WALLA OL\n\nem eat nimeel ie\n\nnot living by society's\nstandards, but deep down,\nthey wish they had the\ncourage to do the same.\n\nwe Being Gay & Proud Quotes\nwww.geckoandfly.com\n\neS",Test1071.jpg,"eee mu walla ol em eat nimeel ie not living by society's standards, but deep down, they wish they had the courage to do the same. we being gay & proud quotes www.geckoandfly.com es","eee mu walla ol em eat nimeel ie not living by society's standards, but deep down, they wish they had the courage to do the same. we being gay & proud quotes www.geckoandfly.com es"
8,"trased on or was the outcome of the perception and attitudes of the audience toward . The\nmajority believed that portrayal of the LGBT commundy in Indian movies was montly negative,\nwhale the sa...",Test108.jpg,"trased on or was the outcome of the perception and attitudes of the audience toward . the majority believed that portrayal of the lgbt commundy in indian movies was montly negative, whale the same...","trased on or was the outcome of the perception and attitudes of the audience toward . the majority believed that portrayal of the lgbt commundy in indian movies was montly negative, whale the same..."
9,,Test109.jpg,,


In [72]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from textblob import TextBlob

In [73]:
sia=SentimentIntensityAnalyzer()

In [74]:
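# named sentiment_score so the later `score` matrix (In [92]) does not shadow it
def sentiment_score(text):
    # TextBlob spelling correction before scoring
    text=str(TextBlob(text).correct())
    # VADER compound score in [-1, 1]
    s=sia.polarity_scores(text)['compound']
#     t=TextBlob(text).sentiment.polarity
#     s=t+s
    return s

In [75]:
df['score']=df['text_cont'].apply(sentiment_score)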
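VADER's `compound` value is the sum of the valences of the words it recognizes (after its negation, intensifier and punctuation adjustments), squashed into [-1, 1] by x / sqrt(x² + α) with α = 15, as in the `normalize` helper of NLTK's `nltk.sentiment.vader` module. A standalone sketch of that last step; the raw sum of 3.2 is assumed here to be the lexicon valence of `love`, and it reproduces the 0.6369 that the next output shows for `love ®`:

```python
import math


def vader_normalize(raw_sum, alpha=15.0):
    """Map an unbounded sum of word valences into [-1, 1], as VADER's compound does."""
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    return max(-1.0, min(1.0, score))


print(round(vader_normalize(3.2), 4))  # prints 0.6369
```

Because the curve flattens out, a few strong words already push the compound score close to ±1, and a text with no recognized words scores exactly 0.0, which is what the blank OCR results get.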

In [76]:
df.head()

index,text,id,text_\n,text_cont,score
0,,Test100.jpg,,,0.0
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too.","when people ask what i see in you, i just smile and look away because i am afraid if they knew, they would fall in love with you too.",0.7717
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao,love ocd) aie al rao,0.6369
3,LOVE\n®,Test1022.jpg,love ®,love ®,0.6369
4,,Test103.jpg,,,0.0


In [129]:
df[df['score']==0]

index,text,Filename,text_\n,text_cont,score,text_lemma,text_no_pun,cluster,Score
0,,Test100.jpg,,,0.0,,,0,Random
4,,Test103.jpg,,,0.0,,,0,Random
5,n =\nSi,Test105.jpg,n = si,n = si,0.0,n = si,n si,2,Negative
9,,Test109.jpg,,,0.0,,,0,Random
10,"‘Zeigler, author of ""Fair Play: How LGBT Athletes are Claiming their\nRightful Place in Sports.” defied the motion that sports are inberemty\nhomophobic. Part of what is keeping gay athletes close...",Test1122.jpg,"‘zeigler, author of ""fair play: how lgbt athletes are claiming their rightful place in sports.” defied the motion that sports are inberemty homophobic. part of what is keeping gay athletes closete...","‘zeigler, author of ""fair play: how lgbt athletes are claiming their rightful place in sports.” defied the motion that sports are inberemty homophobic. part of what is keeping gay athletes closete...",0.0,"' zeigler , author of "" fair play : how lgbt athlete be claim -PRON- rightful place in sport . "" defy the motion that sport be inberemty homophobic . part of what be keep gay athlete closet , zeig...",zeigler author of fair play how lgbt athlete be claim PRON rightful place in sport defy the motion that sport be inberemty homophobic part of what be keep gay athlete closet zeigher say iy the hyp...,1,Positive
...,...,...,...,...,...,...,...,...,...
225,,Test861.jpg,,,0.0,,,0,Random
226,“Be lieve\nyou can\n\nyou're halfway\nthere.\n\nT. ROOSEVELT,Test884.jpg,“be lieve you can you're halfway there. t. roosevelt,“be lieve you can you are halfway there. t. roosevelt,0.0,""" be lieve -PRON- can -PRON- be halfway there . t. roosevelt",be lieve PRON can PRON be halfway there t roosevelt,1,Positive
232,i\neT\n\nOO\n\ncea\nRee eal,Test941.jpg,i et oo cea ree eal,i et oo cea ree eal,0.0,i et oo cea ree eal,i et oo cea ree eal,2,Negative
236,"Mh ""Wk GAY ISA a\nASU aL LF",Test957.jpg,"mh ""wk gay isa a asu al lf","mh ""wk gay isa a asu al lf",0.0,"mh "" wk gay isa a asu al lf",mh wk gay isa a asu al lf,2,Negative


In [77]:
nlp=spacy.load('en_core_web_md')

In [78]:
# Lemmatize with spaCy (spaCy 2.x uses the placeholder lemma -PRON- for pronouns)
def lemma(text):
    doc=nlp(text)
    tok=[i.lemma_ for i in doc]
    return ' '.join(tok)

In [79]:
df['text_lemma']=df['text_cont'].apply(lemma)

In [80]:
df.head()

index,text,id,text_\n,text_cont,score,text_lemma
0,,Test100.jpg,,,0.0,
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too.","when people ask what i see in you, i just smile and look away because i am afraid if they knew, they would fall in love with you too.",0.7717,"when people ask what i see in -PRON- , i just smile and look away because i be afraid if -PRON- know , -PRON- would fall in love with -PRON- too ."
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao,love ocd) aie al rao,0.6369,love ocd ) aie al rao
3,LOVE\n®,Test1022.jpg,love ®,love ®,0.6369,love ®
4,,Test103.jpg,,,0.0,


In [81]:
def remove_non_alpha(text):
    # keep only runs of ASCII letters; punctuation, digits and symbols are dropped
    return ' '.join(regexp_tokenize(text,'[A-Za-z]+'))

In [82]:
df['text_no_pun']=df['text_lemma'].apply(remove_non_alpha)
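Because lemmatization runs first, spaCy's `-PRON-` placeholder survives as the token `PRON` (visible in `text_no_pun` below), and since `pron` is not in scikit-learn's English stop-word list, TF-IDF treats it as a regular term. A sketch that keeps only alphabetic runs with the standard-library `re` module and optionally drops that placeholder; dropping it is an assumption, not something the original pipeline does:

```python
import re


def alpha_tokens(text, drop=("PRON",)):
    """Keep runs of ASCII letters, skipping placeholder tokens such as spaCy's PRON."""
    return " ".join(tok for tok in re.findall(r"[A-Za-z]+", text) if tok not in drop)
```

With `drop=()` it behaves like `remove_non_alpha`.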

In [83]:
df.head()

index,text,id,text_\n,text_cont,score,text_lemma,text_no_pun
0,,Test100.jpg,,,0.0,,
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too.","when people ask what i see in you, i just smile and look away because i am afraid if they knew, they would fall in love with you too.",0.7717,"when people ask what i see in -PRON- , i just smile and look away because i be afraid if -PRON- know , -PRON- would fall in love with -PRON- too .",when people ask what i see in PRON i just smile and look away because i be afraid if PRON know PRON would fall in love with PRON too
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao,love ocd) aie al rao,0.6369,love ocd ) aie al rao,love ocd aie al rao
3,LOVE\n®,Test1022.jpg,love ®,love ®,0.6369,love ®,love
4,,Test103.jpg,,,0.0,,


In [84]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [85]:
tfidf=TfidfVectorizer(stop_words='english',strip_accents='unicode')

In [86]:
tfvec=tfidf.fit_transform(df['text_no_pun'])
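With its defaults (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), scikit-learn's `TfidfVectorizer` weights a term by its raw count times idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing t, and then scales each row to unit length. A pure-Python sketch of that weighting on pre-tokenized documents (tokenization, lowercasing and stop-word removal are left out):

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document: smoothed idf, then L2 row norm."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # empty documents (blank OCR output) stay empty instead of dividing by zero
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


rows = tfidf([["love", "proud"], ["love"], []])
```

Rarer terms get a larger idf, so in the first document `proud` outweighs `love`.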

In [87]:
tfvec

<239x917 sparse matrix of type '<class 'numpy.float64'>'
	with 1427 stored elements in Compressed Sparse Row format>

In [88]:
from scipy.sparse import csr_matrix,hstack

In [92]:
score=csr_matrix(df.score).T

In [93]:
score

<239x1 sparse matrix of type '<class 'numpy.float64'>'
	with 79 stored elements in Compressed Sparse Column format>

In [94]:
vec=[]
for i in df.text_no_pun.values:
    vec.append(nlp(i).vector)
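spaCy's `Doc.vector` defaults to the average of the token vectors, and an empty text gets an all-zero vector; the `vectors` matrix below stores 38,400 = 128 × 300 values, which is consistent with the blank OCR results contributing all-zero rows. A NumPy sketch of that averaging, using a hypothetical two-dimensional embedding table in place of `en_core_web_md`'s 300-dimensional vectors (here unknown tokens count as zero vectors):

```python
import numpy as np


def doc_vector(tokens, table, dim):
    """Average the word vectors of `tokens`; unknown tokens and empty docs give zeros."""
    if not tokens:
        return np.zeros(dim, dtype=np.float32)
    vecs = [table.get(tok, np.zeros(dim, dtype=np.float32)) for tok in tokens]
    return np.mean(vecs, axis=0)


# Hypothetical 2-d embeddings standing in for a real vector table.
toy = {"love": np.array([1.0, 0.0], dtype=np.float32),
       "proud": np.array([0.0, 1.0], dtype=np.float32)}
print(doc_vector(["love", "proud"], toy, 2))
```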

In [95]:
vectors=csr_matrix(vec)

In [96]:
vectors

<239x300 sparse matrix of type '<class 'numpy.float32'>'
	with 38400 stored elements in Compressed Sparse Row format>

In [97]:
# Feature matrix: [sentiment score | spaCy comment vectors | TF-IDF]
f_matrix=hstack((score,vectors,tfvec))
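The three blocks live on different scales: the sentiment score is one column in [-1, 1], each TF-IDF row has unit L2 norm spread over 917 columns, and the spaCy vectors have their own magnitude over 300 columns, so Euclidean k-means tends to be driven by whichever block has the largest spread. One option, which is an assumption and not what the notebook does, is to L2-normalize the two wide blocks row by row and give each block an explicit weight before stacking. A sketch with `scipy.sparse` and scikit-learn's `normalize` (the `weights` knobs are hypothetical):

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.preprocessing import normalize


def stack_features(score, vectors, tfvec, weights=(1.0, 1.0, 1.0)):
    """Stack [score | doc vectors | TF-IDF] after putting the blocks on comparable scales.

    `score` is a 1-d array in [-1, 1] and is only weighted; the two wide blocks are
    L2-normalized row by row (all-zero rows stay zero).
    """
    w_score, w_vec, w_tfidf = weights
    score_col = csr_matrix(np.asarray(score, dtype=float).reshape(-1, 1)) * w_score
    vec_block = normalize(csr_matrix(vectors), norm="l2") * w_vec
    tfidf_block = normalize(csr_matrix(tfvec), norm="l2") * w_tfidf
    return hstack([score_col, vec_block, tfidf_block]).tocsr()
```

In the notebook this would be called as `stack_features(df.score.values, vec, tfvec)`; the score column is not normalized on its own because normalizing a single column would reduce it to its sign.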

In [98]:
from sklearn.cluster import KMeans

In [99]:
# No random_state is set, so the cluster ids can differ between runs (see the NOTE above)
km = KMeans(n_clusters=3)
km.fit(f_matrix)
clusters = km.labels_.tolist()
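`KMeans` is based on Lloyd's algorithm: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat until the assignments stop changing. The starting centroids are random, which is why the cluster ids can change between runs. A bare NumPy sketch on dense data, using plain random initialization instead of scikit-learn's default k-means++ and a single run instead of several `n_init` restarts (the sparse `f_matrix` would need `.toarray()` first):

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on a dense array; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # squared Euclidean distance from every row to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties out
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Fixing `seed` here, or `random_state` in `KMeans`, makes the ids repeatable for a given input.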

In [100]:
df['cluster']=clusters

In [101]:
df.head()

index,text,id,text_\n,text_cont,score,text_lemma,text_no_pun,cluster
0,,Test100.jpg,,,0.0,,,0
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too.","when people ask what i see in you, i just smile and look away because i am afraid if they knew, they would fall in love with you too.",0.7717,"when people ask what i see in -PRON- , i just smile and look away because i be afraid if -PRON- know , -PRON- would fall in love with -PRON- too .",when people ask what i see in PRON i just smile and look away because i be afraid if PRON know PRON would fall in love with PRON too,1
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao,love ocd) aie al rao,0.6369,love ocd ) aie al rao,love ocd aie al rao,2
3,LOVE\n®,Test1022.jpg,love ®,love ®,0.6369,love ®,love,1
4,,Test103.jpg,,,0.0,,,0


In [102]:
df[df['cluster']==0]

index,text,id,text_\n,text_cont,score,text_lemma,text_no_pun,cluster
0,,Test100.jpg,,,0.0,,,0
4,,Test103.jpg,,,0.0,,,0
9,,Test109.jpg,,,0.0,,,0
11,,Test113.jpg,,,0.0,,,0
12,,Test114.jpg,,,0.0,,,0
...,...,...,...,...,...,...,...,...
215,,Test803.jpg,,,0.0,,,0
216,,Test811.jpg,,,0.0,,,0
218,,Test824.jpg,,,0.0,,,0
225,,Test861.jpg,,,0.0,,,0


In [103]:
df[df['cluster']==1]

index,text,id,text_\n,text_cont,score,text_lemma,text_no_pun,cluster
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too.","when people ask what i see in you, i just smile and look away because i am afraid if they knew, they would fall in love with you too.",0.7717,"when people ask what i see in -PRON- , i just smile and look away because i be afraid if -PRON- know , -PRON- would fall in love with -PRON- too .",when people ask what i see in PRON i just smile and look away because i be afraid if PRON know PRON would fall in love with PRON too,1
3,LOVE\n®,Test1022.jpg,love ®,love ®,0.6369,love ®,love,1
6,lesbianvenom:\n\nlesbianvenom:\n\nstraighteners aren’t worth ur money i’ve been using one for three\nweeks and i’m still definitely a lesbian\n\nin all my 20 years of living this has been by far m...,Test107.jpg,lesbianvenom: lesbianvenom: straighteners aren’t worth ur money i’ve been using one for three weeks and i’m still definitely a lesbian in all my 20 years of living this has been by far my best ...,lesbianvenom: lesbianvenom: straighteners aren’t worth ur money i’ve been using one for three weeks and i’m still definitely a lesbian in all my 20 years of living this has been by far my best jok...,0.8750,lesbianvenom : lesbianvenom : straightener be not worth ur money -PRON- have be use one for three week and -PRON- be still definitely a lesbian in all -PRON- 20 year of live this have be by far -P...,lesbianvenom lesbianvenom straightener be not worth ur money PRON have be use one for three week and PRON be still definitely a lesbian in all PRON year of live this have be by far PRON good joke ...,1
7,"eee MU WALLA OL\n\nem eat nimeel ie\n\nnot living by society's\nstandards, but deep down,\nthey wish they had the\ncourage to do the same.\n\nwe Being Gay & Proud Quotes\nwww.geckoandfly.com\n\neS",Test1071.jpg,"eee mu walla ol em eat nimeel ie not living by society's standards, but deep down, they wish they had the courage to do the same. we being gay & proud quotes www.geckoandfly.com es","eee mu walla ol em eat nimeel ie not living by society's standards, but deep down, they wish they had the courage to do the same. we being gay & proud quotes www.geckoandfly.com es",0.9186,"eee mu walla old -PRON- eat nimeel ie not live by society 's standard , but deep down , -PRON- wish -PRON- have the courage to do the same . -PRON- be gay & proud quote www.geckoandfly.com es",eee mu walla old PRON eat nimeel ie not live by society s standard but deep down PRON wish PRON have the courage to do the same PRON be gay proud quote www geckoandfly com es,1
8,"trased on or was the outcome of the perception and attitudes of the audience toward . The\nmajority believed that portrayal of the LGBT commundy in Indian movies was montly negative,\nwhale the sa...",Test108.jpg,"trased on or was the outcome of the perception and attitudes of the audience toward . the majority believed that portrayal of the lgbt commundy in indian movies was montly negative, whale the same...","trased on or was the outcome of the perception and attitudes of the audience toward . the majority believed that portrayal of the lgbt commundy in indian movies was montly negative, whale the same...",0.8176,"trased on or be the outcome of the perception and attitude of the audience toward . the majority believe that portrayal of the lgbt commundy in indian movie be montly negative , whale the same in ...",trased on or be the outcome of the perception and attitude of the audience toward the majority believe that portrayal of the lgbt commundy in indian movie be montly negative whale the same in west...,1
...,...,...,...,...,...,...,...,...
231,you are\n¥ LOVABLE\n\nYWORTHY\nY ENOUGH\n9 BRAVE,Test937.jpg,you are ¥ lovable yworthy y enough 9 brave,you are ¥ lovable yworthy y enough 9 brave,0.7430,-PRON- be ¥ lovable yworthy y enough 9 brave,PRON be lovable yworthy y enough brave,1
233,"“Lam not free\nwhile any woman\nis unfree, even\nwhen her shackles\nare very different\nfrom my own.”",Test942.jpg,"“lam not free while any woman is unfree, even when her shackles are very different from my own.”","“lam not free while any woman is unfree, even when her shackles are very different from my own.”",-0.6602,""" lam not free while any woman be unfree , even when -PRON- shackle be very different from -PRON- own . """,lam not free while any woman be unfree even when PRON shackle be very different from PRON own,1
234,"HATE\nIr Has causED ALOT ©\nOF PROBLEMS IN THIS\nWORLD, BUT IS HAS\n\nNOT SOLVED ONE ¥ET.\n-Mava AxceLou",Test945.jpg,"hate ir has caused alot © of problems in this world, but is has not solved one ¥et. -mava axcelou","hate ir has caused alot © of problems in this world, but is has not solved one ¥et. -mava axcelou",-0.6620,"hate ir have cause alot © of problem in this world , but be have not solve one ¥ et . -mava axcelou",hate ir have cause alot of problem in this world but be have not solve one et mava axcelou,1
235,"“Tam not free\nwhile any woman\nis unfree, even\nwhen her shackles\nare very different\nfrom my own.”\n\nAUDRE LORDE\n\nGH",Test946.jpg,"“tam not free while any woman is unfree, even when her shackles are very different from my own.” audre lorde gh","“tam not free while any woman is unfree, even when her shackles are very different from my own.” audre lorde gh",-0.6602,""" tam not free while any woman be unfree , even when -PRON- shackle be very different from -PRON- own . "" audre lorde gh",tam not free while any woman be unfree even when PRON shackle be very different from PRON own audre lorde gh,1


In [104]:
df[df['cluster']==2]

index,text,id,text_\n,text_cont,score,text_lemma,text_no_pun,cluster
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao,love ocd) aie al rao,0.6369,love ocd ) aie al rao,love ocd aie al rao,2
5,n =\nSi,Test105.jpg,n = si,n = si,0.0,n = si,n si,2
34,eS\n? >,Test136.jpg,es ? >,es ? >,0.0,es ? >,es,2
43,"Few Say Being LGRT Is 4 Negative\nFactor in Theis tite\n\nee es raat arentn rete\n\nae\n=\n—-— = -\n—~—\ncomes, p> | ”",Test154.jpg,"few say being lgrt is 4 negative factor in theis tite ee es raat arentn rete ae = —-— = - —~— comes, p> | ”","few say being lgrt is 4 negative factor in theis tite ee es raat arentn rete ae = —-— = - —~— comes, p> | ”",-0.5719,"few say be lgrt be 4 negative factor in theis tite ee es raat arentn rete ae = — - — = - — ~ — come , p > | """,few say be lgrt be negative factor in theis tite ee es raat arentn rete ae come p,2
45,fs,Test160.jpg,fs,fs,0.0,fs,fs,2
47,Slrleakectieesrema’\nee\nelena\neen reas\npeor iolrngeieane terial\npoorer rresronsy\necient!\neet tne\na nr\n\nreed,Test1615.jpg,slrleakectieesrema’ ee elena een reas peor iolrngeieane terial poorer rresronsy ecient! eet tne a nr reed,slrleakectieesrema’ ee elena een reas peor iolrngeieane terial poorer rresronsy ecient! eet tne a nr reed,-0.7088,slrleakectieesrema ' ee elena een reas peor iolrngeieane terial poor rresronsy ecient ! eet tne a nr reed,slrleakectieesrema ee elena een reas peor iolrngeieane terial poor rresronsy ecient eet tne a nr reed,2
58,eS -»- p> tb\n\non ‘agedy in their\nbe Acree ar Tel\nic haircuts and\n\n—_ ff.,Test1724.jpg,es -»- p> tb on ‘agedy in their be acree ar tel ic haircuts and —_ ff.,es -»- p> tb on ‘agedy in their be acree ar tel ic haircuts and —_ ff.,0.6486,es -»- p > tb on ' agedy in -PRON- be acree ar tel ic haircuts and — _ ff .,es p tb on agedy in PRON be acree ar tel ic haircuts and ff,2
66,cee to mye tae aye ee\nes et tne\n\n(nt my 2 pf ng hn De ye yh,Test179.jpg,cee to mye tae aye ee es et tne (nt my 2 pf ng hn de ye yh,cee to mye tae aye ee es et tne (nt my 2 pf ng hn de ye yh,-0.296,cee to mye tae aye ee es et tne ( nt -PRON- 2 pf ng hn de ye yh,cee to mye tae aye ee es et tne nt PRON pf ng hn de ye yh,2
71,Ao\n\nsie\n\nmy\n\nry\n\nmg\n\no\nz\nu\n=F\nJ\n°\n14\ni\nz\nu\niy\n-,Test183.jpg,ao sie my ry mg o z u =f j ° 14 i z u iy -,ao sie my ry mg o z u =f j ° 14 i z u iy -,0.0,ao sie -PRON- ry mg o z u = f j ° 14 i z u iy -,ao sie PRON ry mg o z u f j i z u iy,2
72,oe,Test1837.jpg,oe,oe,0.0,oe,oe,2


In [196]:
# df['Category']=df.cluster.map({<id>:'Random',<id>:'Positive',<id>:'Negative'}) ## template: fill in the ids after inspecting the clusters

In [105]:
df['Category']=df.cluster.map({0:'Random',1:'Positive',2:'Negative'})
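Because the ids change from run to run, the mapping above has to be re-entered by hand each time. A hypothetical helper that derives it from the data instead, under two assumptions that should be checked against a manual inspection: the cluster holding the most empty OCR results is `Random`, and of the other two, the one with the higher mean sentiment score is `Positive`:

```python
from collections import defaultdict


def infer_label_map(clusters, texts, scores):
    """Guess {cluster_id: label} from cluster contents (a heuristic, not a guarantee).

    Assumes exactly three clusters: the one with the most empty texts is 'Random';
    of the remaining two, the higher mean score is 'Positive', the other 'Negative'.
    """
    empties = defaultdict(int)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c, t, s in zip(clusters, texts, scores):
        empties[c] += not str(t).strip()  # True counts as 1 for an empty text
        totals[c] += s
        counts[c] += 1
    ids = sorted(counts)
    if len(ids) != 3:
        raise ValueError(f"expected 3 clusters, got {len(ids)}")
    random_id = max(ids, key=lambda c: empties[c])
    rest = sorted((c for c in ids if c != random_id), key=lambda c: totals[c] / counts[c])
    return {random_id: "Random", rest[1]: "Positive", rest[0]: "Negative"}
```

Usage: `df['Category']=df.cluster.map(infer_label_map(df.cluster, df.text, df.score))`.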

In [106]:
df=df.rename(columns={'id':'Filename','Category':'Score'})

In [107]:
df.head()

index,text,Filename,text_\n,text_cont,score,text_lemma,text_no_pun,cluster,Score
0,,Test100.jpg,,,0.0,,,0,Random
1,"When people ask\nwhat I see in you,\nI just smile and\nlook away because\nI'm afraid if they knew,\nthey'd fall in love\nwith you too.",Test1001.jpg,"when people ask what i see in you, i just smile and look away because i'm afraid if they knew, they'd fall in love with you too.","when people ask what i see in you, i just smile and look away because i am afraid if they knew, they would fall in love with you too.",0.7717,"when people ask what i see in -PRON- , i just smile and look away because i be afraid if -PRON- know , -PRON- would fall in love with -PRON- too .",when people ask what i see in PRON i just smile and look away because i be afraid if PRON know PRON would fall in love with PRON too,1,Positive
2,LOVE\nocd)\naie al\nRao,Test1012.jpg,love ocd) aie al rao,love ocd) aie al rao,0.6369,love ocd ) aie al rao,love ocd aie al rao,2,Negative
3,LOVE\n®,Test1022.jpg,love ®,love ®,0.6369,love ®,love,1,Positive
4,,Test103.jpg,,,0.0,,,0,Random


In [108]:
df.shape

(239, 9)

In [109]:
test=pd.read_csv('Test.csv')

In [110]:
test.head()

index,Filename,Category
0,Test1001.jpg,
1,Test1012.jpg,
2,Test1022.jpg,
3,Test1071.jpg,
4,Test1122.jpg,


In [111]:
test.shape

(239, 2)

In [124]:
# left join keeps every file listed in Test.csv, even one without an OCR row
sub=pd.merge(test,df,on='Filename',how='left')[['Filename','Score']]
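By default `pd.merge` performs an inner join, so any file listed in `Test.csv` without a matching OCR row silently disappears from the submission; `test.shape` reports 239 rows, while the merged frame shown below ends at index 237 (238 rows). A sketch of a check based on `merge`'s `indicator=True` option, which adds a `_merge` column marking each row as `both`, `left_only` or `right_only` (the helper name `find_unmatched` is hypothetical):

```python
import pandas as pd


def find_unmatched(test, preds, key="Filename"):
    """Return the `key` values in `test` that have no matching row in `preds`."""
    merged = pd.merge(test[[key]], preds[[key]], on=key, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", key].tolist()
```

Running `find_unmatched(test, df)` before writing the CSV lists the files that still need a label.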

In [125]:
sub

index,Filename,Score
0,Test1001.jpg,Positive
1,Test1012.jpg,Negative
2,Test1022.jpg,Positive
3,Test1071.jpg,Positive
4,Test1122.jpg,Positive
...,...,...
234,Test243.jpg,Positive
235,Test244.jpg,Random
236,Test245.jpg,Positive
237,Test249.jpg,Positive


In [126]:
sub.columns=test.columns

In [127]:
sub

index,Filename,Category
0,Test1001.jpg,Positive
1,Test1012.jpg,Negative
2,Test1022.jpg,Positive
3,Test1071.jpg,Positive
4,Test1122.jpg,Positive
...,...,...
234,Test243.jpg,Positive
235,Test244.jpg,Random
236,Test245.jpg,Positive
237,Test249.jpg,Positive


In [128]:
sub.to_csv('submission1.csv',index=False)