The **Amazon Fine Food Reviews** dataset (Kaggle Dataset) consists of reviews of fine foods from Amazon.

Number of reviews: 568,454<br>
Number of users: 256,059<br>
Number of products: 74,258<br>
Timespan: Oct 1999 - Oct 2012<br>
Number of Attributes/Columns in data: 10 

Attribute Information:

1. Id
2. ProductId - unique identifier for the product
3. UserId - unique identifier for the user
4. ProfileName
5. HelpfulnessNumerator - number of users who found the review helpful
6. HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
7. Score - rating between 1 and 5
8. Time - timestamp for the review
9. Summary - brief summary of the review
10. Text - text of the review
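
The `Time` values are Unix timestamps (seconds since 1970-01-01 UTC). As a minimal sketch using only the standard library, they can be converted to calendar dates. The helper name `to_utc_date` is an illustrative assumption; the two sample values appear in the data shown later in this notebook.

```python
from datetime import datetime, timezone

def to_utc_date(unix_seconds):
    """Convert a Unix timestamp in seconds to a UTC calendar date."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).date()

for ts in (939340800, 1303862400):
    # Both values are exact multiples of 86400, i.e. they fall on midnight UTC.
    print(ts, to_utc_date(ts).isoformat(), ts % 86400)
```

Because these timestamps are whole days, many unrelated reviews share the same `Time` value, so `Time` alone cannot identify a duplicate.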

#### Objective:
Given a review, determine whether it is positive (rating of 4 or 5) or negative (rating of 1 or 2).

The Score column supplies the label: a rating of 4 or 5 is treated as positive and a rating of 1 or 2 as negative. A rating of 3 is neutral, so those reviews are dropped (ignored). This gives an approximate measure of the polarity (positivity/negativity) of a review.

#### Loading the data
The dataset is available in two forms:
1. .csv file
2. SQLite Database
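
This notebook reads the SQLite file. For the CSV form, a rough equivalent of the `Score != 3` query could look like the sketch below. The inline sample rows and the `load_polar_reviews` helper are made up for illustration; with the real dataset you would pass the path of the downloaded CSV file instead (its file name is not stated here).

```python
import io
import pandas as pd

def load_polar_reviews(csv_source):
    """Read reviews from a CSV path or buffer and drop neutral (Score == 3) rows."""
    df = pd.read_csv(csv_source)
    return df[df["Score"] != 3].reset_index(drop=True)

# Tiny stand-in for the real file (hypothetical rows, real column names).
sample_csv = io.StringIO(
    "Id,ProductId,UserId,Score,Text\n"
    "1,P1,U1,5,Great taffy\n"
    "2,P2,U2,3,It was okay\n"
    "3,P3,U3,1,Not as advertised\n"
)
polar = load_polar_reviews(sample_csv)
print(polar[["Id", "Score"]])
```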

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import sqlite3    
import pickle     

from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.feature_extraction.text import TfidfVectorizer  

from sklearn.manifold import TSNE   


import re
import nltk
from nltk.corpus import stopwords

import gensim    



In [2]:
conn = sqlite3.connect('database.sqlite')

filtered_data = pd.read_sql_query("""
SELECT *
FROM Reviews
WHERE Score != 3
""", conn)

conn.close()

filtered_data.head()

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,5,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
1,2,B00813GRG4,A1D87F6ZCVE5NK,dll pa,0,0,1,1346976000,Not as Advertised,Product arrived labeled as Jumbo Salted Peanut...
2,3,B000LQOCH0,ABXLMWJIXXAIN,"Natalia Corres ""Natalia Corres""",1,1,4,1219017600,"""Delight"" says it all",This is a confection that has been around a fe...
3,4,B000UA0QIQ,A395BORC6FGVXV,Karl,3,3,2,1307923200,Cough Medicine,If you are looking for the secret ingredient i...
4,5,B006K2ZZ7K,A1UQRSCLF8GW1T,"Michael D. Bigham ""M. Wassir""",0,0,5,1350777600,Great taffy,Great taffy at a great price. There was a wid...


In [3]:
filtered_data.shape

(525814, 10)

In [4]:
# Give reviews with Score>3 a positive rating, and reviews with a score<3 a negative rating.
def partition(x):
    if x < 3:
        return 'negative'
    return 'positive'

filtered_data['Score'] = filtered_data['Score'].map(partition)

In [5]:
filtered_data.head()

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,positive,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
1,2,B00813GRG4,A1D87F6ZCVE5NK,dll pa,0,0,negative,1346976000,Not as Advertised,Product arrived labeled as Jumbo Salted Peanut...
2,3,B000LQOCH0,ABXLMWJIXXAIN,"Natalia Corres ""Natalia Corres""",1,1,positive,1219017600,"""Delight"" says it all",This is a confection that has been around a fe...
3,4,B000UA0QIQ,A395BORC6FGVXV,Karl,3,3,negative,1307923200,Cough Medicine,If you are looking for the secret ingredient i...
4,5,B006K2ZZ7K,A1UQRSCLF8GW1T,"Michael D. Bigham ""M. Wassir""",0,0,positive,1350777600,Great taffy,Great taffy at a great price. There was a wid...


## EDA

Data cleaning: real-world data often contains duplicate entries. They must be removed; otherwise the results may be biased.

In [6]:
filtered_data['Time'].nunique()

3157

In [7]:
filtered_data['Time'].unique()

array([1303862400, 1346976000, 1219017600, ..., 1090627200, 1072915200,
       1087776000], dtype=int64)

In [8]:
filtered_data[filtered_data['Time'] == 1303862400]

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,positive,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
679,731,B002U56JXK,A1PIDGL665V5QP,Rod Jonse,2,2,positive,1303862400,Unparalleled taste,I purchased this 24 pack of Big Ass soda and o...
680,732,B002U56JXK,ASC04R0BT3TO4,"the BURNiNATOR ""raggle fraggle""",2,2,positive,1303862400,Amazing,"Well being the big red fan that I am, I bought..."
1520,1648,B001RVFDOO,A18GRF94T65X9Z,Reem Chavez,0,0,positive,1303862400,Best chips!,I wa hesitant about ordering this product simp...
2245,2440,B0089SPDUW,A163RZETDROJL5,"D. Johnson ""Dr. Duck""",0,0,positive,1303862400,Best K-cup out there.,So Far this is the only K-cup I've found thats...
5823,6302,B001D6F1PY,A2S4DN72TMWC2C,"Andy M. ""techno-geek""",0,0,positive,1303862400,Makes THE BEST risotto!,If you want to make authentic Italian risotto ...
8166,8935,B0007A0AP8,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",15,15,positive,1303862400,The price is right,We have a little Maltese that we spoil to no e...
14189,15489,B000255OIG,A3W35OAAB4XNX9,Maryann Aniela Royster,0,0,positive,1303862400,dogs love them,"all natural,dogs favorite treat. Don't give to..."
17152,18706,B00008JOL0,A1WSMYIW8APC5C,KN,0,0,positive,1303862400,My dog loves these!,My dog really loves these treats - and he's ve...
18110,19737,B0030VBRIU,A2P7TE7CVQAHH7,Dana Puckett,5,5,positive,1303862400,Don't give up and give it more then one try!,I gave this flavor to my 6 month old son but h...


In [9]:
filtered_data[(filtered_data['Time'] == 1303862400) & (filtered_data['ProfileName'] == 'R. Ellis "Bobby"')]

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
8166,8935,B0007A0AP8,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",15,15,positive,1303862400,The price is right,We have a little Maltese that we spoil to no e...
113110,122699,B001MWRT2W,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",0,0,positive,1303862400,I think he likes them. But not as much as Gre...,We have a little Maltese that is given a <a hr...
146203,158601,B000MLG4K2,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",4,4,positive,1303862400,"He likes them, but I don't think he loves them...","To be fair, our dog is spoiled. He gets treat..."
162162,175817,B0014DUUFC,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",15,15,positive,1303862400,The price is right,We have a little Maltese that we spoil to no e...
242677,263188,B001BOXBHI,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",4,4,positive,1303862400,"He likes them, but I don't think he loves them...","To be fair, our dog is spoiled. He gets treat..."
325087,351843,B001MWRT0Y,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",0,0,positive,1303862400,I think he likes them. But not as much as Gre...,We have a little Maltese that is given a <a hr...
494170,534267,B0007A0AOY,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",15,15,positive,1303862400,The price is right,We have a little Maltese that we spoil to no e...
504729,545770,B001E5E1C8,A74SHV5ZD3RLT,"R. Ellis ""Bobby""",15,15,positive,1303862400,The price is right,We have a little Maltese that we spoil to no e...


**Observations:**
1. The rows for ProductIds B0007A0AP8, B0007A0AOY and B001E5E1C8 have exactly the same "UserId", "ProfileName", "Time" and "Text".
2. If the same text ends up in both the train and test sets, the evaluation results will be biased.
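
A minimal sketch of the deduplication idea on made-up rows: rows that agree on the identifying columns are collapsed to one, regardless of `ProductId`. The toy frame and its values are illustrative assumptions, not data from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "ProductId": ["B2", "B1", "B3", "B4"],
    "UserId": ["U1", "U1", "U1", "U2"],
    "ProfileName": ["Ann", "Ann", "Ann", "Bob"],
    "Time": [100, 100, 100, 200],
    "Text": ["same text", "same text", "same text", "other text"],
})

key_columns = ["UserId", "ProfileName", "Time", "Text"]
# Sort first so that the surviving row is deterministic (lowest ProductId).
deduped = toy.sort_values("ProductId").drop_duplicates(subset=key_columns, keep="first")
print(deduped)
```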


In [10]:
#Sorting data according to ProductId in ascending order
sorted_data = filtered_data.sort_values('ProductId')

In [11]:
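# Keep the first row of each (UserId, ProfileName, Time, Text) group and drop the rest
final=sorted_data.drop_duplicates(subset=["UserId","ProfileName","Time","Text"], keep='first')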
final.shape

(364173, 10)

#### Observation:
About 69% of the filtered data remains after cleaning (364,173 of 525,814 rows), so roughly 31% of the rows were duplicates.

In [12]:
final[final['HelpfulnessNumerator'] > final['HelpfulnessDenominator']]

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
59301,64422,B000MIDROQ,A161DK06JJMCYF,"J. E. Stephens ""Jeanne""",3,1,positive,1224892800,Bought This for My Son at College,My son loves spaghetti so I didn't hesitate or...
41159,44737,B001EQ55RW,A2V0I904FH7ABY,Ram,3,2,positive,1212883200,Pure cocoa taste with crunchy almonds inside,It was almost a 'love at first bite' - the per...


In [13]:
final=final[final.HelpfulnessNumerator<=final.HelpfulnessDenominator]
final.shape

(364171, 10)

In [14]:
final['Score'].value_counts()

positive    307061
negative     57110
Name: Score, dtype: int64

### Note:
The dataset is imbalanced: about 84% of the remaining reviews are positive (307,061 positive vs. 57,110 negative).
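
To quantify the imbalance, the class shares can be computed from the counts printed above. This is a small sketch; the variable names are illustrative.

```python
counts = {"positive": 307061, "negative": 57110}
total = sum(counts.values())

# Share of each class among all remaining reviews.
shares = {label: n / total for label, n in counts.items()}
# How many positive reviews there are per negative review.
imbalance_ratio = counts["positive"] / counts["negative"]

print({label: round(s, 3) for label, s in shares.items()})
print(round(imbalance_ratio, 2))
```

A model that always predicts `positive` would already match the positive share in accuracy, so accuracy alone is a weak metric for this data.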

In [15]:
conn = sqlite3.connect('final.sqlite')
c=conn.cursor()
final.to_sql('Reviews', conn, if_exists='replace')
conn.close()

In [16]:
conn = sqlite3.connect('final.sqlite')
final = pd.read_sql_query("""SELECT * FROM Reviews""", conn)
conn.close()
final.head()

Unnamed: 0,index,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,138706,150524,6641040,ACITT7DI6IDDL,shari zychinski,0,0,positive,939340800,EVERY book is educational,this witty little book makes my son laugh at l...
1,138688,150506,6641040,A2IW4PEEKO2R0U,Tracy,1,1,positive,1194739200,"Love the book, miss the hard cover version","I grew up reading these Sendak books, and watc..."
2,138689,150507,6641040,A1S4A3IQ2MU7V4,"sally sue ""sally sue""",1,1,positive,1191456000,chicken soup with rice months,This is a fun way for children to learn their ...
3,138690,150508,6641040,AZGXZ2UUK6X,"Catherine Hallberg ""(Kate)""",1,1,positive,1076025600,a good swingy rhythm for reading aloud,This is a great little book to read aloud- it ...
4,138691,150509,6641040,A3CMRKGE0P909G,Teresa,3,4,positive,1018396800,A great way to learn the months,This is a book of poetry about the months of t...


## BOW (Bag of words)


In [17]:
## Initialize an object of the class CountVectorizer
bow_vect = CountVectorizer()
bow = bow_vect.fit_transform(final['Text'].values)
bow.get_shape()

(364171, 115281)
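
Conceptually, a bag-of-words matrix has one row per review and one column per vocabulary word, with each cell holding a word count. The sketch below rebuilds that idea with the standard library on two made-up reviews. It uses plain whitespace splitting, which is simpler than `CountVectorizer`'s own tokenization.

```python
from collections import Counter

docs = ["great taffy great price", "not great"]  # made-up reviews
vocab = sorted({tok for doc in docs for tok in doc.split()})

# One row per review, one column per vocabulary word, cells are counts.
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[tok] for tok in vocab])

print(vocab)
for row in matrix:
    print(row)
```

With 115,281 columns, almost every cell of the real matrix is zero, which is why `fit_transform` returns a sparse matrix.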

## TFIDF

In [18]:
tf_idf_vect = TfidfVectorizer()
tf_idf = tf_idf_vect.fit_transform(final['Text'].values)

In [19]:
tf_idf.shape

(364171, 115281)
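
TF-IDF rescales the counts so that words occurring in many reviews weigh less. The sketch below is a standard-library illustration on made-up, pre-tokenized reviews. It follows the smoothed formula that, to my understanding, matches scikit-learn's documented defaults (`idf = ln((1 + n) / (1 + df)) + 1`, then L2 normalization per row); treat that correspondence as an assumption.

```python
import math
from collections import Counter

docs = [["great", "taffi", "great", "price"], ["not", "great"]]  # made-up, already tokenized
vocab = sorted({tok for doc in docs for tok in doc})
n_docs = len(docs)

# Document frequency: in how many reviews each word occurs.
doc_freq = Counter(tok for doc in docs for tok in set(doc))
# Smoothed inverse document frequency.
idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}

def tfidf_row(doc):
    """Term count times idf for each vocabulary word, scaled to unit L2 length."""
    counts = Counter(doc)
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

rows = [tfidf_row(doc) for doc in docs]
for doc, row in zip(docs, rows):
    print(doc, [round(x, 3) for x in row])
```

In the second toy review, `not` receives a higher weight than `great` because `great` appears in both reviews.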

## Text preprocessing


In [20]:
i = 0
for sen in final['Text'].values:
    if(len(re.findall('<.*?>', sen))):
        print(sen,"\n\n")
        i += 1
    if i == 5:
        break

I set aside at least an hour each day to read to my son (3 y/o). At this point, I consider myself a connoisseur of children's books and this is one of the best. Santa Clause put this under the tree. Since then, we've read it perpetually and he loves it.<br /><br />First, this book taught him the months of the year.<br /><br />Second, it's a pleasure to read. Well suited to 1.5 y/o old to 4+.<br /><br />Very few children's books are worth owning. Most should be borrowed from the library. This book, however, deserves a permanent spot on your shelf. Sendak's best. 


Summary:  A young boy describes the usefulness of chicken soup with rice for each month of the year.<br /><br />Evaluation:  With Sendak's creative repetitious and rhythmic words, children will enjoy and learn to read the story of a boy who loves chicken soup with rice!  Through Sendak's catchy story, children will also learn the months of the year, as well as what seasons go with what month! They learn to identify ice-skatin

#### We can see that many reviews contain HTML tags which are not required.


#### Stopwords are common words that add little meaning to a sentence.

In [21]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\GUDDI\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [22]:
stop = set(stopwords.words('english'))

In [23]:
print(stop)

{'their', 'couldn', 'some', 'more', 'own', 'didn', 'off', 'is', 'wouldn', 'yours', 'down', "don't", 'when', 'ours', "hasn't", 'mustn', 'its', 'before', 'needn', 'further', 'just', 't', 'other', 'they', "you'd", 're', 'yourself', 'nor', 'what', 'those', 'both', "won't", 'but', 'than', 'himself', 'how', 'don', "shan't", 'him', 'an', 'too', 'wasn', "wouldn't", 'most', 'does', 'haven', 'that', 'these', 'ourselves', 'was', 'to', 'such', 'or', 'no', 'weren', 'doesn', "haven't", 'theirs', 'few', 'i', 'from', 'which', 'very', 'here', "doesn't", 'he', 'am', "couldn't", 'have', 'at', 'over', 'mightn', 'm', 'had', 've', 'same', "isn't", 'a', 'my', 'itself', "she's", 'your', 'hers', 'any', "aren't", 'his', "hadn't", 'do', 'will', 'why', "you've", 'shan', 'shouldn', 'by', 'can', 'been', 'only', 'aren', 'while', 'you', 'themselves', 'this', 'having', 'against', 'between', 'up', 'ain', 'as', 'd', "you're", 'me', "should've", 'once', 'whom', 'there', 'because', "weren't", 'out', 'each', 'where', 'y', 

I do not treat the words **won, nor, not, against** as stopwords, because words such as *not* can change the sentiment of a review, so I remove them from the default stopword list. Such decisions are a judgment call and depend on the task.

In [24]:
lst = ['won', 'nor', 'not', 'against']
for word in lst:
    stop.remove(word)
print(stop)

{'their', 'couldn', 'some', 'more', 'own', 'didn', 'off', 'is', 'wouldn', 'yours', 'down', "don't", 'when', 'ours', "hasn't", 'mustn', 'its', 'before', 'needn', 'further', 'just', 't', 'other', 'they', "you'd", 're', 'yourself', 'what', 'those', 'both', "won't", 'but', 'than', 'himself', 'how', 'don', "shan't", 'him', 'an', 'too', 'wasn', "wouldn't", 'most', 'does', 'haven', 'that', 'these', 'ourselves', 'was', 'to', 'such', 'or', 'no', 'weren', 'doesn', "haven't", 'theirs', 'few', 'i', 'from', 'which', 'very', 'here', "doesn't", 'he', 'am', "couldn't", 'have', 'at', 'over', 'mightn', 'm', 'had', 've', 'same', "isn't", 'a', 'my', 'itself', "she's", 'your', 'hers', 'any', "aren't", 'his', "hadn't", 'do', 'will', 'why', "you've", 'shan', 'shouldn', 'by', 'can', 'been', 'only', 'aren', 'while', 'you', 'themselves', 'this', 'having', 'between', 'up', 'ain', 'as', 'd', "you're", 'me', "should've", 'once', 'whom', 'there', 'because', "weren't", 'out', 'each', 'where', 'y', 'who', 'about', 'y
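
A small sketch of why keeping negation words can matter for sentiment. The stopword set here is a tiny made-up stand-in for the NLTK list, and `drop_stopwords` is an illustrative helper.

```python
def drop_stopwords(text, stopwords):
    """Lowercase the text, split on whitespace and drop stopwords."""
    return [w for w in text.lower().split() if w not in stopwords]

default_like = {"this", "is", "not", "a", "the"}  # tiny hypothetical list
adjusted = default_like - {"not"}                 # keep the negation word

review = "This is not good"
with_default = drop_stopwords(review, default_like)
with_adjusted = drop_stopwords(review, adjusted)
print(with_default)
print(with_adjusted)
```

With the unmodified list the negative review collapses to the same token as a positive one.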

#### Stemming
In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form.
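
To make the idea concrete, here is a deliberately crude suffix-stripping sketch. It is not the Snowball algorithm used in this notebook and handles far fewer cases; `crude_stem` and its suffix list are illustrative assumptions.

```python
def crude_stem(word, suffixes=("ing", "ed", "es", "s")):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("tasting", "tasted", "tastes", "dogs", "taste"):
    print(w, "->", crude_stem(w))
```

Several inflected forms map to one token, which shrinks the vocabulary. The untouched `taste` shows the limits of such a simple rule set compared with a real stemmer.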

In [25]:
sno = nltk.stem.SnowballStemmer('english')

#### Function to clean HTML tags in a given sentence:

In [26]:
def cleanhtml(sentence):
    '''This function removes all the html tags in the given sentence'''
    cleanr = re.compile('<.*?>')    ## Non-greedy pattern that matches any HTML tag
    cleantext = re.sub(cleanr, ' ', sentence)  ## Replace each HTML tag with a space
    return cleantext

In [27]:
cleanhtml("hello<br /br>World")

'hello World'

#### Function to remove punctuations or special characters from a given sentence:

In [28]:
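def cleanpunc(sentence):
    '''This function cleans all the punctuation or special characters from a given sentence'''
    ## Inside [...] a '|' is a literal pipe, not alternation, so each class lists its characters once
    cleaned = re.sub(r'[?@!^%\'"#|]', r'', sentence)   ## Delete these characters
    cleaned = re.sub(r'[.,)(/]', r' ', cleaned)        ## Replace these characters with a space
    return  cleaned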

In [29]:
cleanpunc("H?e^l@l/o #W(o!r%l.d,)")

'Hell o W orl d  '

#### Function to implement preprocessing :

In [30]:
def preprocessing(series):
    
    i = 0
    str1=" "
    final_string = []    ## This list will contain cleaned sentences
    list_of_sent = []    ## This is a list of lists used as input to the W2V model at a later stage
    
    ## Creating below lists for future use
    all_positive_words=[] # store words from +ve reviews here
    all_negative_words=[] # store words from -ve reviews here
    
    
    for sent in series.values:
        ## Cleaned, stemmed words of the current review
        filtered_sent = []
        sent = cleanhtml(sent)    ## Clean the HTML tags
        sent = cleanpunc(sent)    ## Clean the punctuations and special characters
        ## Sentences are cleaned and words are handled individually
        for cleaned_words in sent.split():
            ## Only consider non-numeric words with length at least 3
            if((cleaned_words.isalpha()) and (len(cleaned_words) > 2)):
                ## Only consider words which are not stopwords and convert them to lower case
                if(cleaned_words.lower() not in stop):
                    ## Apply snowball stemmer and add them to the filtered_sent list
                    s = (sno.stem(cleaned_words.lower()))#.encode('utf-8')
                    filtered_sent.append(s)    ## This contains all the cleaned words for a sentence
                    if (final['Score'].values)[i] == 'positive':
                        all_positive_words.append(s) #list of all words used to describe positive reviews
                    if(final['Score'].values)[i] == 'negative':
                        all_negative_words.append(s) #list of all words used to describe negative reviews
        ## Below list is a list of lists used as input to W2V model later
        list_of_sent.append(filtered_sent)
        ## Join back all the words belonging to the same sentence
        str1 = " ".join(filtered_sent)
        ## Finally add the cleaned sentence in the below list
        final_string.append(str1)
        #print(i)
        i += 1
    return final_string, list_of_sent

#### First 5 rows without preprocessing

In [31]:
for x in final['Text'].iloc[:5].values:
    print(x,"\n\n")

this witty little book makes my son laugh at loud. i recite it in the car as we're driving along and he always can sing the refrain. he's learned about whales, India, drooping roses:  i love all the new words this book  introduces and the silliness of it all.  this is a classic book i am  willing to bet my son will STILL be able to recite from memory when he is  in college 


I grew up reading these Sendak books, and watching the Really Rosie movie that incorporates them, and love them. My son loves them too. I do however, miss the hard cover version. The paperbacks seem kind of flimsy and it takes two hands to keep the pages open. 


This is a fun way for children to learn their months of the year!  We will learn all of the poems throughout the school year.  they like the handmotions which I invent for each poem. 


This is a great little book to read aloud- it has a nice rhythm as well as good repetition that little ones like, in the lines about "chicken soup with rice".  The child g

#### First 5 rows after preprocessing

In [32]:
final_string, list_of_sent = preprocessing(final['Text'].iloc[:5])
for x in final_string:
    print(x,"\n\n")

witti littl book make son laugh loud recit car drive along alway sing refrain hes learn whale india droop love new word book introduc silli classic book will bet son still abl recit memori colleg 


grew read sendak book watch realli rosi movi incorpor love son love howev miss hard cover version paperback seem kind flimsi take two hand keep page open 


fun way children learn month year learn poem throughout school year like handmot invent poem 


great littl book read nice rhythm well good repetit littl one like line chicken soup rice child get month year wonder place like bombay nile eat well know get eat kid mauric sendak version ice skate treat rose head long time wont even know came surpris came littl witti book 


book poetri month year goe month cute littl poem along love book realli fun way learn month poem creativ author purpos write book give children fun way learn month children also learn thing poetri rhythm read book 




In [33]:
final_string, list_of_sent = preprocessing(final['Text'])

In [34]:
final['CleanedText']=final_string  #adding a column of CleanedText which displays the data after pre-processing of the review

In [35]:
final.head()

Unnamed: 0,index,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text,CleanedText
0,138706,150524,6641040,ACITT7DI6IDDL,shari zychinski,0,0,positive,939340800,EVERY book is educational,this witty little book makes my son laugh at l...,witti littl book make son laugh loud recit car...
1,138688,150506,6641040,A2IW4PEEKO2R0U,Tracy,1,1,positive,1194739200,"Love the book, miss the hard cover version","I grew up reading these Sendak books, and watc...",grew read sendak book watch realli rosi movi i...
2,138689,150507,6641040,A1S4A3IQ2MU7V4,"sally sue ""sally sue""",1,1,positive,1191456000,chicken soup with rice months,This is a fun way for children to learn their ...,fun way children learn month year learn poem t...
3,138690,150508,6641040,AZGXZ2UUK6X,"Catherine Hallberg ""(Kate)""",1,1,positive,1076025600,a good swingy rhythm for reading aloud,This is a great little book to read aloud- it ...,great littl book read nice rhythm well good re...
4,138691,150509,6641040,A3CMRKGE0P909G,Teresa,3,4,positive,1018396800,A great way to learn the months,This is a book of poetry about the months of t...,book poetri month year goe month cute littl po...


#### Save the updated DataFrame as an SQL Table:

In [36]:
conn = sqlite3.connect('final.sqlite')
c=conn.cursor()
final.to_sql('Reviews', conn, if_exists='replace', index = False)
conn.close()

In [37]:
with open('list_of_sent_for_input_to_w2v.pkl', 'wb') as pickle_file:
    pickle.dump(list_of_sent, pickle_file)

## BOW after cleaning:

In [38]:
bow_vect = CountVectorizer()
bow = bow_vect.fit_transform(final['CleanedText'].values)
bow.shape

(364171, 71691)

#### Take a subset of 2000 points to visualize using TSNE :

In [39]:
final['Score'].iloc[:2000].value_counts()

positive    1651
negative     349
Name: Score, dtype: int64

In [40]:
X = bow[:2000, :].toarray()
type(X)

numpy.ndarray
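
The dense array `X` can then be passed to t-SNE. The sketch below runs on small random data rather than the review matrix so that it finishes quickly. The perplexity and seed values are arbitrary assumptions, not settings from this notebook.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_demo = rng.poisson(1.0, size=(40, 20)).astype(float)  # stand-in for word counts

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
embedding = tsne.fit_transform(X_demo)
print(embedding.shape)
```

Each row of `embedding` is a 2-D point that can be scatter-plotted and colored by its `Score` label.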

## TFIDF after cleaning:

In [46]:
tf_idf_vect = TfidfVectorizer()
tf_idf = tf_idf_vect.fit_transform(final['CleanedText'].values)
tf_idf.shape

(364171, 71691)

## W2vec for cleaned text :

In [47]:
with open('list_of_sent_for_input_to_w2v.pkl', 'rb') as pickle_file:
    list_of_sent = pickle.load(pickle_file)

In [48]:
final['CleanedText'].values[0]

'witti littl book make son laugh loud recit car drive along alway sing refrain hes learn whale india droop love new word book introduc silli classic book will bet son still abl recit memori colleg'

In [49]:
list_of_sent[0]

['witti',
 'littl',
 'book',
 'make',
 'son',
 'laugh',
 'loud',
 'recit',
 'car',
 'drive',
 'along',
 'alway',
 'sing',
 'refrain',
 'hes',
 'learn',
 'whale',
 'india',
 'droop',
 'love',
 'new',
 'word',
 'book',
 'introduc',
 'silli',
 'classic',
 'book',
 'will',
 'bet',
 'son',
 'still',
 'abl',
 'recit',
 'memori',
 'colleg']

In [50]:
# Written for gensim < 4.0; in gensim >= 4.0 the `size` argument is named `vector_size`
w2v_model=gensim.models.Word2Vec(list_of_sent,min_count=5,size=50, workers=4)

In [51]:
# gensim < 4.0; in gensim >= 4.0 use w2v_model.wv.key_to_index instead of w2v_model.wv.vocab
words = list(w2v_model.wv.vocab)
print(len(words))

21944


In [52]:
w2v_model.wv.most_similar('tasti')

[('delici', 0.8107119202613831),
 ('yummi', 0.8014793992042542),
 ('tastey', 0.7208950519561768),
 ('satisfi', 0.6908074617385864),
 ('good', 0.6858827471733093),
 ('nice', 0.6768076419830322),
 ('hearti', 0.675946831703186),
 ('terrif', 0.6668615937232971),
 ('delish', 0.6499996781349182),
 ('great', 0.6277387738227844)]

In [53]:
w2v_model.wv.most_similar('like')

[('weird', 0.7408169507980347),
 ('dislik', 0.7046239972114563),
 ('okay', 0.6873660087585449),
 ('prefer', 0.6761718988418579),
 ('resembl', 0.6549103260040283),
 ('gross', 0.6518966555595398),
 ('appeal', 0.6494094729423523),
 ('funki', 0.6433548331260681),
 ('fake', 0.6376919746398926),
 ('alright', 0.6367895603179932)]

## Function to calculate Avg Word2Vec:

In [54]:
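def calc_avg_w2v(list_of_sent, w2v_model):    
    '''Average the Word2Vec vectors of the words in each review'''
    sent_vectors = []
    for sent in list_of_sent: 
        sent_vec = np.zeros(50)    ## Must match the `size` used to train the model
        cnt_words = 0              ## Number of words found in the model vocabulary
        for word in sent:
            try: 
                vec = w2v_model.wv[word] 
                sent_vec += vec   
                cnt_words += 1
            except KeyError:       ## Word is not in the model vocabulary
                pass
        ## If no word has a vector, cnt_words is 0 and this row becomes NaN
        sent_vec /= cnt_words
        sent_vectors.append(sent_vec)
    return sent_vectors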

In [55]:
w2v_model=gensim.models.Word2Vec(list_of_sent,min_count=5,size=50, workers=4)

In [56]:
sent_vectors = calc_avg_w2v(list_of_sent, w2v_model)



In [57]:
type(sent_vectors)

list

In [58]:
len(sent_vectors)

364171

In [59]:
sent_vectors[0]

array([ 0.54308065, -0.12757891,  0.12502741, -0.13849101,  0.26417459,
        0.21925172, -0.07446818,  0.30277628,  0.5162778 ,  0.14188315,
        0.20696526, -0.14865552,  0.14359178,  0.60137935,  0.59129041,
        0.58417412, -0.57450913, -0.54151769,  0.2668606 , -0.08478943,
        0.44957848,  0.05724072,  0.70286784,  0.36228414, -0.03630744,
       -0.12841898,  0.17630362, -0.17889127, -0.04394157,  1.1085737 ,
       -0.66720269, -0.07936857,  0.35468006,  0.26088582,  0.27061799,
       -0.10496438, -0.79999097, -0.19636995, -0.53222918,  0.02545231,
       -0.23860375,  0.29029459,  0.13152159,  0.04226853,  0.04372555,
       -0.47160015,  0.70053695,  0.70490709,  0.52312854, -0.22950828])

In [60]:
with open('sent_vec_avg_w2v.pkl', 'wb') as pickle_file:
    pickle.dump(sent_vectors, pickle_file)

In [61]:
with open('sent_vec_avg_w2v.pkl', 'rb') as pickle_file:
    sent_vectors = pickle.load( pickle_file)

In [62]:
sent_vec_array = np.array(sent_vectors)

In [63]:
type(sent_vec_array)

numpy.ndarray

In [64]:
sent_vec_array.shape

(364171, 50)

In [65]:
sent_vec_array = sent_vec_array[~np.isnan(sent_vec_array).any(axis = 1)]
sent_vec_array.shape

(364167, 50)
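
Rows become NaN when a review contains no word from the Word2Vec vocabulary, because the sum is divided by a zero count; that accounts for the four rows removed above. Dropping those rows from the vectors alone would leave them misaligned with the labels. A hedged sketch of one way to keep both in step, on toy data (the helper name, vectors and labels are made up):

```python
import numpy as np

def average_vectors(sentences, vectors, dim):
    """Mean vector per sentence; NaN row when no word has a vector."""
    out = np.full((len(sentences), dim), np.nan)
    for row, sent in enumerate(sentences):
        known = [vectors[w] for w in sent if w in vectors]
        if known:
            out[row] = np.mean(known, axis=0)
    return out

toy_vectors = {"good": np.array([1.0, 0.0]), "tasti": np.array([0.0, 1.0])}
toy_sentences = [["good", "tasti"], ["unknownword"], ["good"]]
toy_labels = np.array(["positive", "negative", "positive"])

avg = average_vectors(toy_sentences, toy_vectors, dim=2)
keep = ~np.isnan(avg).any(axis=1)       # True for rows without NaN
avg_clean, labels_clean = avg[keep], toy_labels[keep]
print(avg_clean.shape, labels_clean)
```

Applying the same boolean mask to both arrays keeps each vector paired with its label.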

# Tfidf weighted avg

In [66]:
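def calc_tfidf_avg_w2v(list_of_sent, w2v_model, tf_idf, tfidf_feat):
    '''Average the Word2Vec vectors of each review, weighting every word by its TF-IDF value'''
    tfidf_sent_vectors = []
    row = 0    ## Row of the TF-IDF matrix that belongs to the current review
    for sent in list_of_sent:   
        sent_vec = np.zeros(50)    ## Must match the `size` used to train the model
        weighted_sum = 0           ## Sum of the TF-IDF weights used for this review
        for word in sent:
            try:    
                vec = w2v_model.wv[word]   
                tfidf = tf_idf[row, tfidf_feat.index(word)]
                sent_vec += vec*tfidf
                weighted_sum += tfidf
            except (KeyError, ValueError):    ## Word missing from the W2V or TF-IDF vocabulary
                pass 
        ## If no word contributed, weighted_sum is 0 and this row becomes NaN
        sent_vec /= weighted_sum
        tfidf_sent_vectors.append(sent_vec)
        row += 1
    return tfidf_sent_vectors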
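
`tfidf_feat.index(word)` scans the feature list on every lookup, which is slow for a vocabulary of this size. One possible alternative, sketched on toy data, is to build a word-to-column dictionary once. The function name and inputs are illustrative; a dense NumPy array stands in for the sparse TF-IDF matrix.

```python
import numpy as np

def tfidf_weighted_average(sentences, vectors, tfidf, feature_names, dim):
    """TF-IDF weighted mean of word vectors per sentence (NaN if nothing matches)."""
    col_of = {word: j for j, word in enumerate(feature_names)}  # built once
    out = np.full((len(sentences), dim), np.nan)
    for row, sent in enumerate(sentences):
        total = np.zeros(dim)
        weight_sum = 0.0
        for word in sent:
            if word in vectors and word in col_of:
                weight = tfidf[row, col_of[word]]
                total += vectors[word] * weight
                weight_sum += weight
        if weight_sum > 0:
            out[row] = total / weight_sum
    return out

toy_features = ["good", "tasti"]
toy_tfidf = np.array([[0.6, 0.8], [1.0, 0.0]])
toy_vectors = {"good": np.array([1.0, 0.0]), "tasti": np.array([0.0, 1.0])}
toy_sentences = [["good", "tasti"], ["good"]]

weighted = tfidf_weighted_average(toy_sentences, toy_vectors, toy_tfidf, toy_features, dim=2)
print(weighted)
```

The dictionary lookup is constant time on average, whereas `list.index` is linear in the vocabulary size.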

In [67]:
tf_idf_vect = TfidfVectorizer()
tf_idf = tf_idf_vect.fit_transform(final['CleanedText'].values)
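# scikit-learn >= 1.2 removed get_feature_names(); there, use list(tf_idf_vect.get_feature_names_out())
tfidf_feat = tf_idf_vect.get_feature_names()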
w2v_model=gensim.models.Word2Vec(list_of_sent,min_count=5,size=50, workers=4)

In [68]:
with open('google_word2vec_model', 'rb') as pickle_file:
    google_w2v = pickle.load(pickle_file)  

In [69]:
type(google_w2v)

dict

In [70]:
len(google_w2v)

46603

In [71]:
word = 'trying'
google_w2v[word]

array([ 0.4375    ,  0.25195312,  0.15332031, -0.01312256, -0.07714844,
        0.00747681,  0.16113281, -0.03393555, -0.17285156, -0.22460938,
       -0.02160645, -0.01831055, -0.23242188, -0.24511719, -0.33007812,
       -0.19042969, -0.05908203,  0.16992188,  0.19433594, -0.12890625,
       -0.13867188,  0.05419922,  0.03564453, -0.31640625, -0.03881836,
       -0.18847656, -0.08154297,  0.25585938,  0.01940918, -0.00349426,
        0.30664062,  0.07568359,  0.15136719, -0.27148438,  0.02380371,
        0.16796875,  0.17382812, -0.05297852,  0.19726562,  0.03759766,
       -0.05200195, -0.171875  ,  0.04150391, -0.24121094, -0.18359375,
        0.07910156, -0.02587891,  0.15625   ,  0.03198242, -0.02978516,
       -0.18847656,  0.08447266,  0.0255127 ,  0.05859375,  0.09765625,
        0.19238281,  0.03833008, -0.08398438,  0.21582031,  0.02600098,
        0.07958984,  0.3203125 , -0.19433594, -0.12890625,  0.14355469,
        0.08886719,  0.14257812,  0.25195312,  0.03112793, -0.27

In [74]:
def calc_avg_w2v(list_of_sent, w2v_model):
    sent_vectors = []
    for sent in list_of_sent:
        sent_vec = np.zeros(300)
        cnt_words = 0
        for word in sent:
            try:
                vec = w2v_model[word]
                sent_vec += vec
                cnt_words += 1
            except KeyError:  # word missing from the pretrained vocabulary
                pass
        sent_vec /= cnt_words
        sent_vectors.append(sent_vec)
    return sent_vectors

In [78]:
sent_vectors = calc_avg_w2v(list_of_sent, google_w2v)



In [76]:
with open('avg_w2v.pkl', 'wb') as pickle_file:
    pickle.dump(sent_vectors, pickle_file)

In [81]:
tf_idf_vect = TfidfVectorizer()
tf_idf = tf_idf_vect.fit_transform(final['CleanedText'].values)
tfidf_feat = tf_idf_vect.get_feature_names()
tf_idf.shape

(364171, 71691)

In [82]:
def calc_tfidf_avg_w2v(list_of_sent, w2v_model, tf_idf, tfidf_feat, start_row):
    tfidf_sent_vectors = []
    for sent in list_of_sent:
        sent_vec = np.zeros(300)
        weighted_sum = 0
        for word in sent:
            try:
                vec = w2v_model[word]
                tfidf = tf_idf[start_row, tfidf_feat.index(word)]
                sent_vec += vec*tfidf
                weighted_sum += tfidf
            except (KeyError, ValueError):  # word missing from the W2V or TF-IDF vocabulary
                pass
        sent_vec /= weighted_sum
        print(start_row, weighted_sum)
        tfidf_sent_vectors.append(sent_vec)
        start_row += 1
    return tfidf_sent_vectors

In [83]:
tfidf_sent_vectors = calc_tfidf_avg_w2v(list_of_sent[30000:40000], google_w2v, tf_idf, tfidf_feat,30000)

30000 3.13502219274
30001 9.17589656478
30002 4.65386956381
30003 6.14266856819
30004 1.9272704639
30005 2.62607848886
30006 3.83049210087
30007 3.85118927709
30008 5.81233921389
30009 1.68749161292
30010 1.42744895597
30011 7.36224621903
30012 3.82985549624
30013 3.67036182422
30014 7.59277971319
30015 4.04285175419
30016 3.97372238844
30017 6.95691293681
30018 8.1179709615
30019 2.77828694791
30020 2.84748890788
30021 3.05340935125
30022 4.74926181603
30023 4.86258098515
30024 3.2931568532
30025 3.20249417072
30026 6.22751006299
30027 12.5820036928
30028 8.76105142571
30029 14.7850154601
30030 3.65184485891
30031 6.71708156485
30032 11.1161853834
30033 4.67395657786
30034 13.0234553714
30035 6.98317918982
30036 5.26361075125
30037 3.58217626765
30038 2.9516127521
30039 3.73459373918
30040 3.60106064543
30041 8.06937288339
30042 3.70543499633
30043 3.32188529774
30044 6.33139169014
30045 3.35212770053
30046 2.37549940068
30047 3.58320960287
30048 9.76384307066
30049 8.16318749466
3005

30413 7.88228835438
30414 3.85302524748
30415 2.41619393216
30416 3.9358316602
30417 3.0590504352
30418 3.33458610728
30419 17.2796615736
30420 9.45436492376
30421 2.92134632299
30422 4.65120332595
30423 7.77061671352
30424 3.6832827524
30425 11.3783620708
30426 10.1465166063
30427 3.6595863075
30428 4.99038629633
30429 7.1089116996
30430 7.77464407041
30431 4.04619588384
30432 1.64996524848
30433 9.4705178761
30434 3.73873164909
30435 3.50522412475
30436 7.08367811278
30437 6.35653972295
30438 2.9644388592
30439 2.00412789661
30440 8.08317571875
30441 5.00655310509
30442 5.16403118017
30443 7.47445063443
30444 6.50955099352
30445 6.46567740124
30446 3.42909483241
30447 4.63952259139
30448 6.59543634944
30449 5.91291392647
30450 6.89556598067
30451 5.43768279325
30452 9.11385644812
30453 5.92410001947
30454 6.35708694015
30455 6.57769037916
30456 3.04227195989
30457 7.53553624636
30458 6.10349622198
30459 3.48412945273
30460 8.68428304303
30461 15.9879304963
30462 7.26490079479
30463 3

30830 2.60460992952
30831 8.24375050659
30832 2.8703728643
30833 3.43847285342
30834 4.16579309665
30835 3.37677390243
30836 7.01255819864
30837 2.46235001856
30838 6.70162473883
30839 10.3372330682
30840 3.96981232486
30841 5.60246262679
30842 4.26115763348
30843 5.12587041666
30844 1.83331289097
30845 2.07835685195
30846 3.37798483868
30847 5.97507336474
30848 6.31767303195
30849 6.98320468879
30850 5.22469405793
30851 2.91852056786
30852 2.24759929334
30853 9.57677904128
30854 2.46767230187
30855 1.16314171351
30856 4.14053963648
30857 5.89637363147
30858 4.04905642433
30859 8.3506070018
30860 8.43970882184
30861 3.43841140963
30862 3.16118021091
30863 4.80789277708
30864 3.4283216565
30865 6.38461808892
30866 4.02288762127
30867 3.71158951859
30868 8.64315995329
30869 4.8624731206
30870 2.99120536682
30871 7.21266714584
30872 3.53516476135
30873 5.66947392355
30874 4.55094752018
30875 2.72698686691
30876 11.184882974
30877 3.47201656977
30878 7.23660537052
30879 3.14064272685
...

31244 4.13360176091
31245 2.38404481964
31246 3.09744673136
31247 1.08125730585
31248 2.84018434641
31249 4.37341661642
31250 2.58329984776
31251 3.87532906362
31252 2.87419308237
31253 3.51852806861
31254 2.81062008053
31255 3.64964562273
31256 8.1722083535
31257 2.31939343903
31258 2.50878700988
31259 3.13431747001
31260 3.44378556533
31261 2.53535803621
31262 1.38518985962
31263 2.94347945668
31264 2.8837228434
31265 2.8067914564
31266 4.72240618485
31267 3.28846897148
31268 5.88807163517
31269 9.67688643853
31270 18.4090728015
31271 6.52849272787
31272 3.61253469039
31273 3.94417536061
31274 4.0194991132
31275 4.14000660616
31276 4.44156664997
31277 1.76499099012
31278 0.512577297877
31279 4.75191731147
31280 8.85310995853
31281 4.32492056969
31282 5.46555501584
31283 5.2771483989
31284 4.51668510917
31285 12.86844006
31286 3.02252728167
31287 4.9995933948
31288 2.28070077827
31289 3.80681064128
31290 1.1327384143
31291 2.62661838248
31292 9.08645476243
31293 2.86660984926
...

31660 1.57317947961
31661 7.22658407926
31662 4.43521838708
31663 3.178296788
31664 3.93316134802
31665 2.75846560329
31666 5.14023206793
31667 3.52036056145
31668 4.48424423304
31669 2.45266413887
31670 2.61273736255
31671 2.85042798632
31672 3.26211912652
31673 2.95676233751
31674 2.06418589658
31675 3.86703242587
31676 5.15100701703
31677 3.44417854976
31678 3.51352387947
31679 4.31964842809
31680 1.72227318702
31681 7.99181331575
31682 3.93097570914
31683 6.91875699029
31684 2.88669953262
31685 2.5185166142
31686 4.39460335555
31687 1.08362516406
31688 1.86851609012
31689 4.79476689513
31690 1.49091549968
31691 4.70647427143
31692 2.93475968616
31693 0.762169728197
31694 2.66779836018
31695 3.93491556109
31696 1.91239422906
31697 5.52635410426
31698 3.39055488093
31699 3.79544937384
31700 2.55399589355
31701 1.55118967862
31702 6.04416920851
31703 3.60424082424
31704 4.04917686949
31705 7.22924511609
31706 3.17277722328
31707 3.86542316403
31708 2.55981628487
31709 4.26917978295
...

32075 7.57354640104
32076 5.21181588421
32077 1.63833117977
32078 6.3663772767
32079 4.55045207312
32080 5.34324708006
32081 3.99350941371
32082 8.93188883059
32083 3.06855341438
32084 2.29560643237
32085 3.53573767005
32086 2.42714175033
32087 4.54087061168
32088 4.77372075376
32089 2.85779791472
32090 19.3236807396
32091 4.34285117985
32092 3.37478217439
32093 3.8489264123
32094 3.77203010996
32095 3.09265803986
32096 3.10742550872
32097 1.40111943365
32098 4.3848148611
32099 2.73309336298
32100 7.30238313531
32101 1.6627664961
32102 2.50706225529
32103 3.26455929262
32104 1.83503072861
32105 2.85214476161
32106 3.32164037956
32107 2.69366859354
32108 4.70990905912
32109 4.48566091895
32110 3.28632571092
32111 7.14911579622
32112 2.44650051673
32113 4.63768409386
32114 4.14021236027
32115 4.39051611707
32116 1.32629073696
32117 2.21067454253
32118 3.3919931367
32119 2.4562265637
32120 4.08634210165
32121 3.49611292247
32122 5.11433703662
32123 6.22257041424
32124 5.37655097423
...

32492 4.52904701741
32493 9.51646814365
32494 5.30965918151
32495 8.0550995277
32496 2.75844683787
32497 3.91355349627
32498 5.72164625581
32499 4.5970949058
32500 4.85796334858
32501 2.73197802174
32502 1.76494876014
32503 7.24217262863
32504 3.96844726811
32505 3.03238744965
32506 4.43946690338
32507 5.33859602275
32508 4.03820863612
32509 3.78314290144
32510 5.70656607829
32511 2.12832744704
32512 2.42231365612
32513 1.46696671042
32514 2.94509595981
32515 2.11205020519
32516 2.83774910078
32517 3.67738246196
32518 3.46001672905
32519 1.74473638308
32520 3.39509315302
32521 2.79404938142
32522 2.94509595981
32523 3.78270300812
32524 1.62261283089
32525 9.20667568247
32526 2.9679282725
32527 2.25350896982
32528 4.78701907996
32529 2.70348678513
32530 5.98173171257
32531 3.91950625778
32532 3.58286688654
32533 2.9308067376
32534 1.55966316884
32535 9.24592250677
32536 4.52976895747
32537 1.22241986897
32538 3.19909332541
32539 9.91498544507
32540 6.40360204796
32541 4.63021070006
...

32904 7.31316811201
32905 2.15930627549
32906 9.8247917792
32907 3.91907302684
32908 2.15874092678
32909 1.87481944621
32910 4.9899034247
32911 3.08631924225
32912 1.76682302965
32913 3.18580540538
32914 3.62154276384
32915 5.4869882505
32916 6.50536557216
32917 3.31540764647
32918 4.83813774179
32919 0.916203075331
32920 8.57921212784
32921 2.88839135485
32922 6.47114435997
32923 4.91707516589
32924 4.51605566248
32925 4.15999973404
32926 6.1192528656
32927 1.82852714528
32928 2.11383199069
32929 3.96652914605
32930 2.1522775765
32931 5.10150470421
32932 3.6665346426
32933 8.70122865184
32934 2.07512697319
32935 4.71051532194
32936 4.14695506554
32937 4.75050303569
32938 8.03767698642
32939 2.74889870045
32940 1.07112065151
32941 3.72794905487
32942 6.28758212935
32943 3.01389201883
32944 3.21696800369
32945 6.40295197425
32946 4.5474308744
32947 6.18676820539
32948 2.13693529773
32949 4.28942214848
32950 1.74647441232
32951 5.43800822084
32952 3.27358128219
32953 4.14098783665
...

33323 7.60775456906
33324 5.15916699185
33325 4.21283918856
33326 1.57204478731
33327 4.40596783411
33328 4.38421389962
33329 2.5243373547
33330 2.27361494549
33331 3.31309277863
33332 4.29664145984
33333 2.8245638803
33334 2.13428550842
33335 3.48672140994
33336 7.17001535996
33337 4.04170778375
33338 1.26048355329
33339 6.05399953009
33340 5.16192007897
33341 2.51945334333
33342 2.03516092671
33343 3.78413074013
33344 2.54568612927
33345 4.6304091031
33346 2.91931087559
33347 6.36845856865
33348 2.77870730897
33349 3.20272501851
33350 10.4857847195
33351 2.0300632126
33352 6.08979735799
33353 4.88191274969
33354 8.41388063132
33355 4.34805621963
33356 3.73782918696
33357 3.46894388789
33358 3.65937439046
33359 2.64320992273
33360 5.81376524129
33361 2.70080796204
33362 2.80275832937
33363 5.58453081935
33364 2.13316777893
33365 1.7077450461
33366 2.63513161054
33367 1.93829933705
33368 2.35616323111
33369 3.49564743368
33370 5.09097592532
33371 2.9838604261
33372 6.96962971075
...

33737 6.41621915371
33738 3.16741685589
33739 3.3078485279
33740 4.66208458332
33741 1.40489657719
33742 2.73212139876
33743 2.56451900615
33744 2.28788460465
33745 4.02994985816
33746 2.55285369305
33747 4.03822964244
33748 3.40224120686
33749 8.80634676773
33750 2.16719717353
33751 5.43104028513
33752 1.54450422539
33753 4.283431091
33754 2.17843410195
33755 3.11370898186
33756 15.2600896243
33757 3.13605077504
33758 2.2816829582
33759 3.41166540784
33760 5.55491144958
33761 3.20819944983
33762 3.27067549458
33763 1.79862040974
33764 3.48840700388
33765 1.25196943011
33766 3.34923705255
33767 3.66044408537
33768 2.81857383276
33769 1.92248764939
33770 1.68106065344
33771 8.35955193012
33772 3.80913316003
33773 4.25454159829
33774 8.22076747824
33775 2.22095997438
33776 5.71726617963
33777 7.81094455753
33778 2.63335852163
33779 2.24782267642
33780 3.28467039533
33781 3.79271821967
33782 4.76854320146
33783 2.58583678811
33784 4.76326226784
33785 2.86740592069
33786 7.36044363267
...

34149 2.80593199875
34150 2.83593425751
34151 3.98081366356
34152 2.26260230857
34153 5.23182495093
34154 5.43162797058
34155 3.00367834682
34156 4.07079162403
34157 3.45706620757
34158 8.59665751586
34159 7.40024589263
34160 2.57035958422
34161 1.87187264391
34162 4.89499705233
34163 3.69853099099
34164 7.06496281697
34165 7.44751701367
34166 2.72360229138
34167 1.44318922897
34168 5.2419860057
34169 2.60380870387
34170 3.77484904343
34171 3.11443261236
34172 4.94094930051
34173 2.80623868227
34174 1.92598205496
34175 5.18436307262
34176 7.32362794377
34177 1.80851108492
34178 4.37500274057
34179 3.89566281057
34180 4.33209570054
34181 4.30789594441
34182 2.51715686482
34183 2.61859306607
34184 3.09716147082
34185 10.9577679897
34186 5.75674073734
34187 5.09357326418
34188 2.47998149419
34189 2.66635592179
34190 1.96015793059
34191 2.8635185475
34192 2.510405919
34193 2.30981892008
34194 13.5863917996
34195 3.23971635179
34196 5.41398917074
34197 3.37403772708
34198 4.57952781186
...

34561 8.70969554301
34562 2.63037443662
34563 3.18197605558
34564 2.73156332669
34565 2.38837499261
34566 3.53329995886
34567 4.41067335765
34568 4.62125009496
34569 2.88479215212
34570 4.35801462245
34571 5.60020161014
34572 2.62392174627
34573 3.07122819976
34574 4.01572235304
34575 1.80030253731
34576 3.02807821537
34577 4.66164811696
34578 5.40731297247
34579 4.43915212296
34580 4.17617060221
34581 3.23275257614
34582 3.41560654325
34583 7.27590745029
34584 5.93791737467
34585 11.3595059193
34586 9.44412259229
34587 9.30128650341
34588 6.31683991586
34589 2.65927124235
34590 12.8109909127
34591 2.70022103649
34592 3.13123652103
34593 3.52214587142
34594 5.82641743034
34595 3.73229152565
34596 3.53506301317
34597 3.5508885635
34598 2.6336538122
34599 3.29249840614
34600 1.61422473329
34601 3.3920697786
34602 2.5129315993
34603 2.46633704402
34604 6.5077401626
34605 5.24347418643
34606 0.681928811327
34607 5.21455422727
34608 4.72651797673
34609 5.91655133706
34610 2.87304119031
...

34974 1.39121390541
34975 2.9959093588
34976 2.6744294743
34977 5.27515162975
34978 3.4065112496
34979 1.51408286335
34980 5.71750259387
34981 3.20655878808
34982 0.981116787839
34983 3.55276420209
34984 4.71040455159
34985 6.68778899515
34986 3.06891466596
34987 3.17473151745
34988 2.33665552469
34989 4.01285930825
34990 20.8511226023
34991 4.66669847572
34992 4.68589401135
34993 1.15838533185
34994 3.61403591916
34995 2.84112462746
34996 3.58849581597
34997 5.44573812686
34998 4.04507311602
34999 2.0796157984
35000 6.43850843304
35001 4.63749202247
35002 1.76802765407
35003 3.72837595001
35004 7.10471804473
35005 2.5599729754
35006 3.41659357494
35007 8.6837047538
35008 6.86155697758
35009 5.92239000661
35010 7.61007774148
35011 2.83809157567
35012 5.49361714908
35013 7.75910167835
35014 2.12312418051
35015 3.47607466125
35016 2.33715627281
35017 4.87136831525
35018 3.63005350079
35019 2.7438463952
35020 2.27470446383
35021 6.74460968906
35022 2.82149965169
35023 2.98488045917
...

35388 5.85378947056
35389 5.30400878642
35390 2.94146669918
35391 3.00436937362
35392 3.09752547588
35393 3.35636484818
35394 2.17158724796
35395 9.90814659768
35396 3.6502672303
35397 11.1523748795
35398 2.82078748936
35399 2.34703192229
35400 4.70071873178
35401 21.431271822
35402 2.39128553477
35403 2.26434651456
35404 6.40062777139
35405 3.23939615075
35406 2.08577863536
35407 13.3197704081
35408 1.13803039766
35409 6.72838187101
35410 2.80200625981
35411 4.13474652766
35412 3.58039068065
35413 2.64208594745
35414 2.67969405085
35415 2.01924184722
35416 8.85474750101
35417 2.41257046947
35418 2.47051511215
35419 3.34376899406
35420 2.70877149178
35421 1.8027112451
35422 8.25903139335
35423 22.6600127371
35424 2.99762051726
35425 4.30169212204
35426 3.76434011641
35427 4.64606487359
35428 2.61362703555
35429 3.17999009424
35430 5.49338751142
35431 2.34669573421
35432 3.49118941669
35433 6.20202323471
35434 6.58972765591
35435 7.12611068432
35436 3.96231345795
35437 1.3507011503
...

35800 37.6726129359
35801 3.71866071695
35802 5.55491026947
35803 3.46097607704
35804 5.08222861381
35805 7.53570342776
35806 6.65423347363
35807 9.61923038072
35808 3.21074954999
35809 1.96329730677
35810 5.18627209271
35811 2.46005643439
35812 3.40509020719
35813 2.87141589809
35814 4.36343214722
35815 7.78508245214
35816 8.38624525471
35817 2.97553623146
35818 1.08021102951
35819 3.84930988863
35820 8.21059391865
35821 1.94958024969
35822 4.60163184108
35823 3.41698369701
35824 2.28542700003
35825 6.17600038504
35826 3.17550077338
35827 2.78202485869
35828 8.86718808118
35829 3.82722526876
35830 3.25402456788
35831 3.13869651136
35832 10.6069241937
35833 5.67255175724
35834 8.56610547808
35835 5.11732401522
35836 2.42476491016
35837 8.5433013842
35838 3.54997215072
35839 9.46815164931
35840 5.82434165212
35841 4.20761137471
35842 6.77339042446
35843 7.67870875336
35844 3.0670662471
35845 4.25217495346
35846 6.00405512056
35847 2.96067511905
35848 4.82603037776
35849 7.09574783652
...

36213 2.32653663387
36214 1.96817988017
36215 2.67061247218
36216 1.80513866905
36217 3.71808268882
36218 4.60189328866
36219 7.29880915831
36220 3.57963455274
36221 6.94503883912
36222 1.93658820923
36223 3.23710957434
36224 5.35105319058
36225 3.69542977921
36226 2.97041405244
36227 3.83679934095
36228 3.14114032421
36229 5.70896936442
36230 0.557270730878
36231 2.12343758818
36232 4.50955834628
36233 2.53857341434
36234 3.00188368092
36235 3.17007701324
36236 1.84174477555
36237 5.77893845305
36238 4.78124809317
36239 2.22612882522
36240 1.39887767169
36241 10.3784009703
36242 6.83831154507
36243 2.16722850105
36244 3.91370002666
36245 1.90929992241
36246 2.84011487136
36247 3.88273853932
36248 2.86236155523
36249 3.72924867847
36250 1.2710418768
36251 6.08040842584
36252 2.42770424791
36253 10.223365836
36254 5.70745835981
36255 2.78031092429
36256 1.89423896631
36257 2.59621273145
36258 4.10836516394
36259 2.7438862357
36260 2.95739133054
36261 3.16310331088
36262 3.97140648869
...

37037 13.032347736
37038 5.63216429571
37039 2.86097450303
37040 3.18027541305
37041 4.33327019029
37042 3.67926658537
37043 4.83940736532
37044 3.45016659621
37045 1.74983509083
37046 2.34375143711
37047 3.00278417485
37048 3.92366069173
37049 4.18793414226
37050 4.1346040637
37051 2.6992486579
37052 2.70149412481
37053 9.54180053069
37054 17.3757944545
37055 1.82627517925
37056 5.14726118742
37057 7.65554125837
37058 3.26050592955
37059 3.78810745457
37060 5.55622694515
37061 3.00323178533
37062 3.04963139189
37063 2.97900030789
37064 5.141622126
37065 1.36156630577
37066 3.7976457768
37067 4.78735678411
37068 4.58565139893
37069 5.41293843778
37070 3.93896010998
37071 2.70947391024
37072 5.31977354765
37073 4.18623029128
37074 7.22846571304
37075 7.48382998905
37076 4.03423197659
37077 2.16254231207
37078 4.42604390307
37079 2.21002603975
37080 3.43709162269
37081 2.54986540947
37082 3.73367743873
37083 3.92305335652
37084 5.38568832758
37085 7.82133574921
37086 7.02354872689
...

37451 8.28973396694
37452 2.62022237091
37453 13.3220712156
37454 2.88680391333
37455 3.19393543211
37456 5.94471704449
37457 2.98532131375
37458 2.71948008287
37459 6.01315016105
37460 3.42491709139
37461 2.67624708521
37462 8.54656067256
37463 9.14195351835
37464 5.04229285607
37465 8.66655410363
37466 3.12285552898
37467 8.03883664691
37468 3.20937398675
37469 3.18549323968
37470 4.66499584871
37471 13.9049835745
37472 7.40726596509
37473 5.21340228647
37474 5.00321782964
37475 2.87171802105
37476 5.16034138036
37477 5.41756429681
37478 2.2890249411
37479 3.30169092209
37480 7.92248142078
37481 4.52817139776
37482 2.33863700945
37483 4.08242184618
37484 4.00597005491
37485 6.03883933967
37486 3.82629002683
37487 1.33729362483
37488 3.84458432099
37489 3.45875761551
37490 2.54740155567
37491 3.88641820658
37492 5.69880911721
37493 5.26554578326
37494 2.81688616261
37495 4.98944149516
37496 1.46808529136
37497 3.3439042014
37498 5.53089572061
37499 4.13723674872
37500 6.75386079846
...

37864 7.27994052307
37865 2.10679313431
37866 6.62836360647
37867 2.92360953591
37868 2.97248898382
37869 9.94635847813
37870 3.4260046773
37871 1.83676957725
37872 6.75983045756
37873 4.13192625486
37874 1.57302341499
37875 3.11466559931
37876 2.29792254209
37877 3.53622607995
37878 1.84452097059
37879 3.89543572128
37880 3.39479665795
37881 3.48189895351
37882 7.75923247142
37883 3.04758393436
37884 11.4661181642
37885 6.62352748146
37886 4.62946539968
37887 4.94415783428
37888 3.61508393187
37889 18.6256276183
37890 10.9227143811
37891 3.8634710778
37892 4.64335641719
37893 5.51856247903
37894 8.10329012178
37895 2.98963187206
37896 26.8004812417
37897 5.67927540782
37898 2.15073229361
37899 1.65821082703
37900 6.54737127415
37901 2.24053523955
37902 1.97151905332
37903 5.01944334252
37904 5.48395095125
37905 3.1504784777
37906 2.15583039101
37907 2.89133812023
37908 1.97840127106
37909 3.15717721728
37910 3.52327097991
37911 4.76748756316
37912 3.27087614842
37913 6.13556569918
...

38279 3.35970563638
38280 3.32427614627
38281 3.5083375918
38282 4.54487925667
38283 2.00848837091
38284 5.29079229086
38285 5.56509274417
38286 5.55044910922
38287 2.65170376147
38288 4.07543471811
38289 5.51903510834
38290 4.08323409561
38291 10.7997745492
38292 14.3904685274
38293 5.49819334051
38294 3.12659214276
38295 3.18977589973
38296 4.38960467899
38297 4.04147188899
38298 3.1218060533
38299 3.78872174097
38300 2.26233833323
38301 4.03849917078
38302 6.16328942949
38303 3.85685213057
38304 8.65969943066
38305 1.99463240068
38306 2.88316740949
38307 4.99749019876
38308 3.08578850125
38309 4.46542315662
38310 2.07573827884
38311 2.49807193946
38312 4.55112820394
38313 8.12468403178
38314 3.73966306608
38315 3.58982232128
38316 7.25327271466
38317 1.41350212748
38318 3.30402265067
38319 12.1554489805
38320 3.52497438496
38321 3.33215846678
38322 5.96795359092
38323 8.31107990687
38324 4.29427320605
38325 5.11721562036
38326 5.36971555595
38327 7.92428044151
38328 2.1305846991
...

38694 3.76457548358
38695 3.40319041979
38696 2.6363611769
38697 3.44844279367
38698 3.7622739888
38699 3.45650170727
38700 4.09063511618
38701 4.38098893066
38702 3.95516846787
38703 2.23366244444
38704 2.19897433065
38705 6.3219737817
38706 1.84929010818
38707 2.33852777397
38708 3.86279149142
38709 3.80658295511
38710 3.32546035893
38711 2.46774644214
38712 3.11398872651
38713 3.9989096579
38714 6.01049045315
38715 11.0530705039
38716 4.8054526728
38717 2.1668714923
38718 3.90333349035
38719 7.82744163932
38720 5.55246424606
38721 2.63130095515
38722 4.1956926144
38723 3.24318080579
38724 3.02418788943
38725 2.34094474716
38726 2.9030808248
38727 1.86504886464
38728 1.37916181405
38729 2.54856716036
38730 2.90531517844
38731 7.80450546353
38732 3.74269825207
38733 2.11071762264
38734 4.47847975359
38735 3.02695723484
38736 3.56953591352
38737 6.3975930616
38738 5.17580073696
38739 2.93397425183
38740 2.38224592007
38741 3.39814014104
38742 5.46691816903
38743 3.04218979803
...

39107 1.91374832798
39108 4.82122979478
39109 2.74376875821
39110 6.35728445341
39111 4.61750960904
39112 5.74003876877
39113 2.39283339419
39114 3.19244729518
39115 16.2904483912
39116 5.86978039194
39117 1.41403714743
39118 2.80286909025
39119 5.96394889801
39120 3.65818738013
39121 3.77647585894
39122 2.62887100535
39123 2.69967487986
39124 5.11216586045
39125 5.11322279672
39126 4.83382871234
39127 1.73704392662
39128 7.61631348687
39129 8.48436837159
39130 3.24449203857
39131 6.14309205313
39132 3.95481507134
39133 2.8419983648
39134 3.22860402634
39135 5.22240503228
39136 3.32241203258
39137 6.01662459442
39138 4.09995738836
39139 3.33123759365
39140 1.5445826274
39141 4.44587491451
39142 9.29038900111
39143 2.5616612638
39144 3.01311184758
39145 3.54268311834
39146 1.9572293224
39147 5.55397215235
39148 2.78811014094
39149 1.82940054016
39150 1.75949691809
39151 2.12461712326
39152 2.84505003234
39153 5.7660199456
39154 3.0880189527
39155 2.78059257035
39156 2.45550706077
...

39520 3.86142218416
39521 5.48041179557
39522 5.02945043902
39523 5.74843036716
39524 2.22740554148
39525 4.87101263948
39526 2.33842958688
39527 1.30859630406
39528 4.18016505885
39529 5.06652426283
39530 2.57036155994
39531 2.26990843876
39532 2.48513576316
39533 3.41692526537
39534 11.0895456962
39535 11.0472045651
39536 3.24581654055
39537 3.17225221731
39538 6.79439020873
39539 2.6713344752
39540 3.5925755741
39541 2.52271312789
39542 4.12810200646
39543 8.27013310941
39544 7.80802086439
39545 5.66599470329
39546 3.18256373793
39547 2.68111757067
39548 8.34667186109
39549 3.73895993617
39550 6.6835412881
39551 3.9913903818
39552 2.95285740095
39553 7.97083788401
39554 2.1732803671
39555 3.3098354418
39556 5.63400546498
39557 2.82847461158
39558 6.23706295474
39559 3.52111477993
39560 5.04567575251
39561 5.30776030414
39562 3.93335779922
39563 10.1476630469
39564 5.46783718875
39565 3.72553348727
39566 5.18530266641
39567 4.82098317516
39568 3.36509833266
39569 5.09224302885
...

39933 3.30192656213
39934 2.54275298792
39935 3.4401535714
39936 1.74066191584
39937 2.79096518828
39938 5.27892139444
39939 1.80338362772
39940 8.27504480629
39941 6.83209413369
39942 3.36000349503
39943 3.45255141819
39944 5.72726829038
39945 3.73288272432
39946 3.31225202683
39947 2.81196305822
39948 4.96452346143
39949 12.6148354142
39950 9.92911058184
39951 3.93778906345
39952 2.66500269697
39953 4.05411213782
39954 3.23609836502
39955 5.17657405178
39956 3.80562716606
39957 1.60011895537
39958 3.06822184148
39959 5.36619386876
39960 4.88091980311
39961 1.55789869824
39962 9.24777820565
39963 5.18490446536
39964 4.53241291658
39965 8.10952965064
39966 3.15435306132
39967 5.91739445232
39968 6.55284726958
39969 4.72678148465
39970 9.36621505992
39971 5.38796897158
39972 11.42244263
39973 1.99578449813
39974 3.54613282133
39975 4.23128674463
39976 7.07233792328
39977 9.1749471342
39978 7.93174674755
39979 3.82858161417
39980 6.42077292227
39981 6.81380803516
39982 7.42826999422
...