# Overview
This projects aims to build a model predictive model that can classify whether a given review is positive or negative.  
Positive Review= >3 star  
Negative Review= <3 star  

Data used in this project is from Amazon's US Gift Card Data Reviews Data and can be accessed @
https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Gift_Card_v1_00.tsv.gz

### Data Processing
The original data contains reviews as well as additional metadata information and specific scripts have been written to perform the following:  
- Automatically download the data (if not already downloaded).  
- Process the data line by line.
- Split train and test samples (Default 70% training and 30% test).
- Save the datasets into 4 files: training and test positive and negative.  

The below DataExtractor Class implements the above steps.

In [7]:
#Calling scripts and instantiating objects
from utils.extractor import DataExtractor
from utils.text_processing import TextProcessing

extract_data = DataExtractor()
process_text = TextProcessing()

In [8]:
files = extract_data.get_data()
print('Number of positive training samples: ', len(files['train_pos']))
print('Number of negative training samples: ', len(files['train_neg']))
print('Number of positive test samples: ', len(files['test_pos']))
print('Number of negative test samples: ', len(files['test_neg']))

Number of positive training samples:  97872
Number of negative training samples:  4393
Number of positive test samples:  41696
Number of negative test samples:  1969


In [5]:
#Columns in dataset
files['header']

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date\n


In [6]:
#few samples of negative reviews
files['train_neg'][13]

0                                           What?????????
1            I wanted the gift car4 for MUSIC not movies.
2       I had no idea that the amount is listed in US ...
3       ordered this for a going away party thinking i...
4       The giftcard themselves weren't the problem - ...
                              ...                        
4388    Beware, since this item is not eligible for su...
4389    I have had iTunes for about 3 weeks. The Progr...
4390    I have a G5 dual processor that is a little ov...
4391    iTune by Apple has the worst customer service ...
4392    I have ordered 2 of these cards from Amazon an...
Name: 13, Length: 4393, dtype: object

In [5]:
#few samples of positive reviews
files['train_pos'][13]

0                                                     Good
1        I can't believe how quickly Amazon can get the...
2                                                 excelent
3                               Great and Safe Gift Giving
4                                                     Bien
                               ...                        
97867    The itunes gift card is absolutely the best gi...
97868    Finally there is a way for your family to buy ...
97869    Finally there is a way for your family to buy ...
97870    I picked up a few of these at Target a while b...
97871    This is the ultimate tool for downloading musi...
Name: 13, Length: 97872, dtype: object

### Processing Reviews

In order for us to convert the text to a meaningful numerical representation the following steps have been performed: 
- Tokenisation: This step tokenize the sentence into individual words. i.e. a sentence with 5 words will become a list with 5 elements.
- Stopword removal: We then remove stopwords i.e. i, me, you etc.
- Removing Punctuations: This removes punctuations i.e. !,# etc.
- Lemmatizing: We then lemmatize the words to find the root word i.e. runs, running, ran gets converted to run.

### Word Freqs

After processing the text we then build a word freqency dictionary to count word associated with positive and negative reviews.

In [6]:
#build freqs
all_reviews, labels = extract_data.process_freq_text()
freqs = process_text.build_freq(all_reviews,labels)


In [7]:
import itertools
x = dict(itertools.islice(freqs.items(), 20)) 
x

{('wanted', 1.0): 254,
 ('gift', 1.0): 5266,
 ('car4', 1.0): 1,
 ('music', 1.0): 43,
 ('movie', 1.0): 22,
 ('idea', 1.0): 140,
 ('amount', 1.0): 187,
 ('listed', 1.0): 14,
 ('u', 1.0): 254,
 ('dollar', 1.0): 175,
 ('ended', 1.0): 80,
 ('purchasing', 1.0): 104,
 ('100', 1.0): 137,
 ('instead', 1.0): 179,
 ('cad', 1.0): 2,
 ('meant', 1.0): 31,
 ('extra', 1.0): 58,
 ('30cad', 1.0): 1,
 ('ordered', 1.0): 532,
 ('going', 1.0): 145}

In [8]:
for w in sorted(freqs, key=freqs.get, reverse=True):
    print(w, freqs[w])

('gift', 0.0) 77650
('easy', 0.0) 23164
('great', 0.0) 18548
('card', 0.0) 15120
('love', 0.0) 12631
('use', 0.0) 11024
('way', 0.0) 10950
("n't", 0.0) 10903
('br', 0.0) 10821
('get', 0.0) 10551
("'s", 0.0) 10415
('buy', 0.0) 9966
('purchase', 0.0) 8777
('perfect', 0.0) 8758
('good', 0.0) 8187
('give', 0.0) 7923
('want', 0.0) 7836
('like', 0.0) 7603
('one', 0.0) 7053
('loved', 0.0) 7041
('time', 0.0) 6919
('birthday', 0.0) 6613
('send', 0.0) 6573
('would', 0.0) 6362
('book', 0.0) 6234
('always', 0.0) 6213
('kindle', 0.0) 6193
('christmas', 0.0) 5655
('recipient', 0.0) 5512
('gift', 1.0) 5266
('able', 0.0) 5053
('could', 0.0) 5001
('friend', 0.0) 4828
('make', 0.0) 4763
('received', 0.0) 4739
('really', 0.0) 4729
('know', 0.0) 4493
('nice', 0.0) 4457
('print', 0.0) 4327
('someone', 0.0) 4277
('...', 0.0) 4183
('minute', 0.0) 4160
('fast', 0.0) 4118
('email', 0.0) 4102
('something', 0.0) 4094
('go', 0.0) 4075
('quick', 0.0) 4074
('happy', 0.0) 4027
('best', 0.0) 4025
('convenient', 0.0) 

('expire', 0.0) 213
('possible', 0.0) 212
('although', 0.0) 212
('business', 0.0) 212
('information', 0.0) 212
('form', 0.0) 211
('disappointed', 0.0) 211
('season', 0.0) 211
('guy', 0.0) 210
('must', 0.0) 210
('asin', 0.0) 210
('fund', 0.0) 209
('otherwise', 0.0) 209
('moment', 0.0) 208
('week', 1.0) 207
('student', 0.0) 207
('sell', 0.0) 207
('picked', 0.0) 207
('on-line', 0.0) 207
('arrived', 1.0) 206
('described', 0.0) 206
('enjoys', 0.0) 206
('high', 0.0) 206
('regular', 0.0) 205
('prefer', 0.0) 205
('dont', 0.0) 205
('rating', 0.0) 205
('effective', 0.0) 205
('exchange', 0.0) 205
('mailing', 0.0) 205
('however', 1.0) 204
('complete', 0.0) 204
('next', 1.0) 203
('see', 1.0) 203
('prompt', 0.0) 203
('certificate', 1.0) 202
('finally', 1.0) 202
('adorable', 0.0) 202
('mailed', 0.0) 202
('gave', 1.0) 201
('gc', 0.0) 201
('started', 0.0) 201
('bank', 0.0) 201
('con', 0.0) 201
('required', 0.0) 200
('say', 1.0) 198
('santa', 0.0) 198
('morning', 0.0) 198
('full', 0.0) 198
('least', 0.0

('difficult', 1.0) 74
('australia', 1.0) 74
('left', 1.0) 74
('transaction', 1.0) 74
('brother', 1.0) 74
('design', 1.0) 74
('’', 0.0) 74
('checking', 0.0) 74
('written', 0.0) 74
('pack', 0.0) 74
('screen', 0.0) 74
('loss', 0.0) 74
('miss', 0.0) 74
('trust', 0.0) 74
('purchaser', 0.0) 74
('reach', 0.0) 74
('picky', 0.0) 74
('loving', 0.0) 74
('happen', 0.0) 74
('procrastinator', 0.0) 74
('involved', 0.0) 74
('fix', 1.0) 73
('guess', 1.0) 73
('delightful', 0.0) 73
('per', 0.0) 73
('preferred', 0.0) 73
('scene', 0.0) 73
('suitable', 0.0) 73
('pleasant', 0.0) 73
('positive', 0.0) 73
('open', 1.0) 72
('read', 1.0) 72
('clue', 0.0) 72
('container', 0.0) 72
('afford', 0.0) 72
('buck', 0.0) 72
('im', 0.0) 72
('holder', 0.0) 72
('mum', 0.0) 72
('x', 0.0) 72
('win-win', 0.0) 72
('cheaper', 0.0) 72
('boyfriend', 0.0) 72
('therefore', 0.0) 72
('rapido', 0.0) 72
('log', 0.0) 72
('germany', 0.0) 71
('stated', 0.0) 71
('stationed', 0.0) 71
('unexpected', 0.0) 71
('sends', 0.0) 71
('view', 0.0) 71
('

('e-book', 0.0) 40
('plea', 0.0) 40
('held', 0.0) 40
('senior', 0.0) 40
('cutest', 0.0) 40
('opposed', 0.0) 40
('sea', 0.0) 40
('history', 0.0) 40
('traditional', 0.0) 40
('accessible', 0.0) 40
('impressive', 0.0) 40
('suited', 0.0) 40
('coworker', 0.0) 40
('key', 0.0) 40
('varied', 0.0) 40
('”', 0.0) 40
('segura', 0.0) 40
('affordable', 0.0) 40
('pas', 0.0) 40
('texas', 0.0) 40
('nativity', 0.0) 40
('hardest', 0.0) 40
('meaning', 0.0) 40
('tu', 0.0) 40
('user', 1.0) 39
('receiver', 1.0) 39
('hold', 1.0) 39
('granddaughter', 1.0) 39
('course', 1.0) 39
('file', 1.0) 39
('plastic', 1.0) 39
('local', 1.0) 39
('answer', 1.0) 39
('grandson', 1.0) 39
('state', 1.0) 39
('blank', 1.0) 39
('original', 1.0) 39
('delivering', 0.0) 39
('clearly', 0.0) 39
('somehow', 0.0) 39
('incentive', 0.0) 39
('china', 0.0) 39
('ty', 0.0) 39
('pop', 0.0) 39
('enclose', 0.0) 39
('reminder', 0.0) 39
('air', 0.0) 39
('random', 0.0) 39
('unwanted', 0.0) 39
('blessing', 0.0) 39
('valuable', 0.0) 39
('market', 0.0) 3

('sale', 1.0) 23
('department', 1.0) 23
('realize', 1.0) 23
('apology', 1.0) 23
('upon', 1.0) 23
('dec.', 1.0) 23
('nature', 0.0) 23
('zealand', 0.0) 23
('typical', 0.0) 23
('internationally', 0.0) 23
('monthly', 0.0) 23
('woofy', 0.0) 23
('pc', 0.0) 23
('long-distance', 0.0) 23
('sick', 0.0) 23
('kidding', 0.0) 23
('updated', 0.0) 23
('notify', 0.0) 23
('deserves', 0.0) 23
('searched', 0.0) 23
('roll', 0.0) 23
('jesus', 0.0) 23
('fiance', 0.0) 23
('friend.', 0.0) 23
('generous', 0.0) 23
('shall', 0.0) 23
('compare', 0.0) 23
('religious', 0.0) 23
('broken', 0.0) 23
('confirm', 0.0) 23
('action', 0.0) 23
('..........', 0.0) 23
('wise', 0.0) 23
('deployed', 0.0) 23
('customization', 0.0) 23
('occassions', 0.0) 23
('tradition', 0.0) 23
('away.', 0.0) 23
('deserve', 0.0) 23
('hang', 0.0) 23
('sistema', 0.0) 23
('fear', 0.0) 23
('determine', 0.0) 23
('improve', 0.0) 23
('git', 0.0) 23
('yea', 0.0) 23
('steal', 0.0) 23
('explained', 0.0) 23
('alaska', 0.0) 23
('um', 0.0) 23
('gran', 0.0) 23


('science', 0.0) 17
('k', 0.0) 17
('fell', 0.0) 17
('broke', 0.0) 17
('2000', 0.0) 17
('john', 0.0) 17
('relationship', 0.0) 17
('portable', 0.0) 17
('thin', 0.0) 17
('awkward', 0.0) 17
('sized', 0.0) 17
('satisfecho', 0.0) 17
('tenido', 0.0) 17
('sharp', 0.0) 17
('quantity', 0.0) 17
('chat', 0.0) 17
('savy', 0.0) 17
('future.', 0.0) 17
('eleven', 0.0) 17
('defiantly', 0.0) 17
('cared', 0.0) 17
('march', 0.0) 17
('ntilde', 0.0) 17
('fortune', 0.0) 17
('2012', 0.0) 17
('wireless', 0.0) 17
('paperback', 0.0) 17
('efectiva', 0.0) 17
('algo', 0.0) 17
('hundred', 1.0) 16
('scammed', 1.0) 16
('unusable', 1.0) 16
('regarding', 1.0) 16
('short', 1.0) 16
('giver', 1.0) 16
('junk', 1.0) 16
('e-gift', 1.0) 16
('lack', 1.0) 16
('wedding', 1.0) 16
('couldnt', 1.0) 16
('followed', 1.0) 16
('wonder', 1.0) 16
('us', 1.0) 16
('side', 1.0) 16
('staff', 1.0) 16
('crushed', 1.0) 16
('frustrated', 1.0) 16
('april', 1.0) 16
('pain', 1.0) 16
('together', 1.0) 16
('electronically', 1.0) 16
('gc', 1.0) 16
('do

('problem.', 0.0) 13
('laid', 0.0) 13
('mitzvah', 0.0) 13
('0', 0.0) 13
('bulk', 0.0) 13
('navigation', 0.0) 13
('superior', 0.0) 13
('economy', 0.0) 13
('dude', 0.0) 13
('worthy', 0.0) 13
('limiting', 0.0) 13
('pricing', 0.0) 13
('hung', 0.0) 13
('fitted', 0.0) 13
('icing', 0.0) 13
('dec', 0.0) 13
('1/2', 0.0) 13
('boss', 0.0) 13
('connect', 0.0) 13
('timeline', 0.0) 13
('initial', 0.0) 13
('capability', 0.0) 13
('patient', 0.0) 13
('utilized', 0.0) 13
('catalog', 0.0) 13
('responsable', 0.0) 13
('padded', 0.0) 13
('snowboarding', 0.0) 13
('amozon', 0.0) 13
('truth', 0.0) 13
('comodo', 0.0) 13
('pretend', 0.0) 13
('19', 0.0) 13
('pile', 0.0) 13
('luggage', 0.0) 13
('precio', 0.0) 13
('percent', 0.0) 13
('soldier', 0.0) 13
('ereader', 0.0) 13
('wants.', 0.0) 13
('grandfather', 0.0) 13
('certificate.', 0.0) 13
('country.', 0.0) 13
('speaking', 0.0) 13
('toilet', 0.0) 13
('data', 0.0) 13
('tan', 0.0) 13
('restricted', 0.0) 13
('employer', 0.0) 13
('giftees', 0.0) 13
('stretch', 0.0) 13
(

('mistakenly', 1.0) 10
('age', 1.0) 10
('variety', 1.0) 10
('level', 1.0) 10
('cards.', 1.0) 10
('rectified', 1.0) 10
('attempting', 1.0) 10
('concern', 1.0) 10
('thief', 1.0) 10
('wrapping', 1.0) 10
('fairly', 1.0) 10
('funny', 1.0) 10
('renew', 1.0) 10
('recieve', 1.0) 10
('helped', 1.0) 10
('rectify', 1.0) 10
('aci', 1.0) 10
('town', 1.0) 10
('europe', 1.0) 10
('straight', 1.0) 10
('leaving', 1.0) 10
('selling', 1.0) 10
('assured', 1.0) 10
('applying', 1.0) 10
('ready', 1.0) 10
('alone', 1.0) 10
('busy', 1.0) 10
('willing', 1.0) 10
('35', 1.0) 10
('cc', 1.0) 10
('mr.', 1.0) 10
('artist', 0.0) 10
('generate', 0.0) 10
('jack', 0.0) 10
('mechanism', 0.0) 10
('manually', 0.0) 10
('dig', 0.0) 10
('skip', 0.0) 10
('volunteer', 0.0) 10
('deployment', 0.0) 10
('national', 0.0) 10
('occasional', 0.0) 10
('hoo', 0.0) 10
('directed', 0.0) 10
('e-delivery', 0.0) 10
('converting', 0.0) 10
('woot', 0.0) 10
('option.', 0.0) 10
('role', 0.0) 10
('pm', 0.0) 10
('praise', 0.0) 10
('aka', 0.0) 10
('bo

('thing.', 0.0) 8
('clip', 0.0) 8
('giraffe', 0.0) 8
('guard', 0.0) 8
('buff', 0.0) 8
('prohibitive', 0.0) 8
('maximo', 0.0) 8
('30th', 0.0) 8
('block', 0.0) 8
('misplacing', 0.0) 8
('roommate', 0.0) 8
('becasue', 0.0) 8
('upgraded', 0.0) 8
('accompaniment', 0.0) 8
('justify', 0.0) 8
('toaster', 0.0) 8
('wipe', 0.0) 8
('preorder', 0.0) 8
('use.i', 0.0) 8
('s.', 0.0) 8
('authorization', 0.0) 8
('storm', 0.0) 8
('thesis', 0.0) 8
('attractively', 0.0) 8
('circle', 0.0) 8
('bracelet', 0.0) 8
('rolling', 0.0) 8
('scope', 0.0) 8
('facilitates', 0.0) 8
('backup', 0.0) 8
('issuing', 0.0) 8
('techno', 0.0) 8
('fo', 0.0) 8
('wizard', 0.0) 8
('misplace', 0.0) 8
('i-pad', 0.0) 8
('nurse', 0.0) 8
('disco', 0.0) 8
('piano', 0.0) 8
('glued', 0.0) 8
('streamlined', 0.0) 8
('updating', 0.0) 8
('i-pod', 0.0) 8
('presents.', 0.0) 8
('bob', 0.0) 8
('hastle', 0.0) 8
('occasions.', 0.0) 8
('educational', 0.0) 8
('mission', 0.0) 8
('insures', 0.0) 8
('transportation', 0.0) 8
('disagree', 0.0) 8
('claus', 0.0

('lucky', 1.0) 6
('existing', 1.0) 6
('format', 1.0) 6
('inquiry', 1.0) 6
('asap', 1.0) 6
('caught', 1.0) 6
('protection', 1.0) 6
('fraudulently', 1.0) 6
('etc.', 1.0) 6
('visible', 1.0) 6
('model', 1.0) 6
('desk', 1.0) 6
('refuse', 1.0) 6
('animation', 1.0) 6
('unavailable', 1.0) 6
('communication', 1.0) 6
('reviewed', 1.0) 6
('allows', 1.0) 6
('torn', 1.0) 6
('corner', 1.0) 6
('annoyed', 1.0) 6
('rated', 1.0) 6
('audio', 1.0) 6
('bird', 1.0) 6
('ugly', 1.0) 6
('fair', 1.0) 6
('force', 1.0) 6
('monetary', 1.0) 6
('p', 1.0) 6
('glue', 1.0) 6
('proceeded', 1.0) 6
('lame', 1.0) 6
('frankly', 1.0) 6
('99', 1.0) 6
('red', 1.0) 6
('recognize', 1.0) 6
('city', 1.0) 6
('print-at-home', 1.0) 6
('keeping', 1.0) 6
('near', 1.0) 6
('intention', 1.0) 6
('container', 1.0) 6
('described', 1.0) 6
('boxed', 1.0) 6
('embarrassment', 1.0) 6
('exact', 1.0) 6
('whatsoever', 1.0) 6
('throw', 1.0) 6
('besides', 1.0) 6
('claimed', 1.0) 6
('measure', 1.0) 6
('professional', 1.0) 6
('mum', 1.0) 6
('earth', 1.0

('portion', 1.0) 5
('considered', 1.0) 5
('exist', 1.0) 5
('2,000', 1.0) 5
('acknowledged', 1.0) 5
('satisfactory', 1.0) 5
('amazon', 1.0) 5
('stocking', 1.0) 5
('profit', 1.0) 5
('headache', 1.0) 5
('easter', 1.0) 5
('dealt', 1.0) 5
('pin', 1.0) 5
('back.', 1.0) 5
('emailing', 1.0) 5
('seven', 1.0) 5
('padded', 1.0) 5
('behind', 1.0) 5
('pop', 1.0) 5
('jib', 1.0) 5
('jab', 1.0) 5
('ruining', 1.0) 5
('debited', 1.0) 5
('250', 1.0) 5
('elected', 1.0) 5
('linked', 1.0) 5
('phoned', 1.0) 5
('confidence', 1.0) 5
('u.k.', 1.0) 5
('store.', 1.0) 5
('approximately', 1.0) 5
('menu', 1.0) 5
('senior', 1.0) 5
('1/2', 1.0) 5
('downloadable', 1.0) 5
('conversion', 1.0) 5
('impersonal', 1.0) 5
('service.', 1.0) 5
('speed', 1.0) 5
('you\\\\', 1.0) 5
('snowman', 1.0) 5
('unclear', 1.0) 5
('18th', 1.0) 5
('occur', 1.0) 5
('extended', 1.0) 5
('friend.', 1.0) 5
('personalize', 1.0) 5
('12/23', 1.0) 5
('twenty', 1.0) 5
('19', 1.0) 5
('activating', 1.0) 5
('snow', 1.0) 5
('day.', 1.0) 5
('16th', 1.0) 5
('

('registry', 1.0) 4
('differently', 1.0) 4
('f', 1.0) 4
('uk.', 1.0) 4
('reset', 1.0) 4
('treatment', 1.0) 4
('wide', 1.0) 4
('rubbed', 1.0) 4
('camper', 1.0) 4
('becomes', 1.0) 4
('flower', 1.0) 4
('refunding', 1.0) 4
('mailer', 1.0) 4
('lock', 1.0) 4
('geographical', 1.0) 4
('zone', 1.0) 4
('alerted', 1.0) 4
('mistaken', 1.0) 4
('scripted', 1.0) 4
('familiar', 1.0) 4
('ease', 1.0) 4
('somewhat', 1.0) 4
('refund.', 1.0) 4
('ending', 1.0) 4
('cell', 1.0) 4
('visit', 1.0) 4
('pinch', 1.0) 4
('sucked', 1.0) 4
('restrictive', 1.0) 4
('bos', 1.0) 4
('italy', 1.0) 4
('downloading', 1.0) 4
('economy', 1.0) 4
('sits', 1.0) 4
('resending', 1.0) 4
('10th', 1.0) 4
('hurt', 1.0) 4
('500.00', 1.0) 4
('damage', 1.0) 4
('realizing', 1.0) 4
('symbol', 1.0) 4
('sat', 1.0) 4
('favor', 1.0) 4
('ha', 1.0) 4
('oz', 1.0) 4
('trap', 1.0) 4
('popular', 1.0) 4
('flash', 1.0) 4
('for.', 1.0) 4
('eye', 1.0) 4
('celebration', 1.0) 4
('mountain', 1.0) 4
('trial', 1.0) 4
('sliced', 1.0) 4
('frame', 1.0) 4
('rang',

('oldie', 0.0) 4
('martin', 0.0) 4
('turn-around', 0.0) 4
('bout', 0.0) 4
('addicting', 0.0) 4
('guiding', 0.0) 4
('princess', 0.0) 4
('baking', 0.0) 4
('prove', 0.0) 4
('birthdays.', 0.0) 4
('kudo', 0.0) 4
('birhday', 0.0) 4
('scheduling', 0.0) 4
('cherish', 0.0) 4
('lodge', 0.0) 4
('enrich', 0.0) 4
('false', 0.0) 4
('liar', 0.0) 4
('beneath', 0.0) 4
('shuffle', 0.0) 4
('shopping\\\\', 0.0) 4
('penalty', 0.0) 4
('13-year-old', 0.0) 4
('greeting/gift', 0.0) 4
('liz', 0.0) 4
('pb', 0.0) 4
('creep', 0.0) 4
('sporting', 0.0) 4
('comparable', 0.0) 4
('knitting', 0.0) 4
('gft', 0.0) 4
('debating', 0.0) 4
('passage', 0.0) 4
('hid', 0.0) 4
('me.i', 0.0) 4
('recipients.', 0.0) 4
('pre-selected', 0.0) 4
('steel', 0.0) 4
('smartest', 0.0) 4
('ipads', 0.0) 4
('anyhow', 0.0) 4
('spiritual', 0.0) 4
('cassette', 0.0) 4
('milestone', 0.0) 4
('disguise', 0.0) 4
("'do", 0.0) 4
('visa/amex', 0.0) 4
('new\\\\', 0.0) 4
('slip/receipt', 0.0) 4
('patricia', 0.0) 4
('fascinating', 0.0) 4
('plugged', 0.0) 4
(

('translation', 0.0) 3
('resume', 0.0) 3
('psychology', 0.0) 3
('93', 0.0) 3
('bub', 0.0) 3
('conveniet', 0.0) 3
('gd', 0.0) 3
('goooooood', 0.0) 3
('routinely', 0.0) 3
('thks', 0.0) 3
('bitcoin', 0.0) 3
('understandable', 0.0) 3
('freaked', 0.0) 3
('amy', 0.0) 3
('popping', 0.0) 3
('supporting', 0.0) 3
('profusely', 0.0) 3
('approached', 0.0) 3
('tomorrow.', 0.0) 3
('ave', 0.0) 3
('obstacle', 0.0) 3
('catherine', 0.0) 3
('vale', 0.0) 3
('stars.', 0.0) 3
('gina', 0.0) 3
('attention.', 0.0) 3
('storyline', 0.0) 3
('collins', 0.0) 3
('3.5', 0.0) 3
('anna', 0.0) 3
('werewolf', 0.0) 3
('proofread', 0.0) 3
('tidier', 0.0) 3
('antonio', 0.0) 3
('hacked', 0.0) 3
('vibe', 0.0) 3
('metro', 0.0) 3
('communion', 0.0) 3
('dated', 0.0) 3
('quicly', 0.0) 3
('himself.', 0.0) 3
('lick', 0.0) 3
('atention', 0.0) 3
('aunty', 0.0) 3
('panel', 0.0) 3
('logistics', 0.0) 3
('e-gifts', 0.0) 3
('alongside', 0.0) 3
('late.', 0.0) 3
('specialize', 0.0) 3
('1.78', 0.0) 3
('constructive', 0.0) 3
('cheesey', 0.0) 

('significantly', 0.0) 3
('aquire', 0.0) 3
('validity', 0.0) 3
('tidy', 0.0) 3
('.for', 0.0) 3
('wich', 0.0) 3
('baja', 0.0) 3
('time-', 0.0) 3
('evans', 0.0) 3
('marilyn', 0.0) 3
('alleviates', 0.0) 3
('alien', 0.0) 3
('purchsed', 0.0) 3
('kindler', 0.0) 3
('ceremony', 0.0) 3
('printer.', 0.0) 3
('easy-to-follow', 0.0) 3
('deactivated', 0.0) 3
('clients.', 0.0) 3
('vender', 0.0) 3
('14-year', 0.0) 3
('e-mail.', 0.0) 3
('esy', 0.0) 3
('plunger', 0.0) 3
('vzla', 0.0) 3
('amazonkindle', 0.0) 3
('caed', 0.0) 3
('slack', 0.0) 3
('theory', 0.0) 3
('agrees', 0.0) 3
('u.k.', 0.0) 3
('aussie', 0.0) 3
('acceptance', 0.0) 3
('two-day', 0.0) 3
('knitter', 0.0) 3
('gator', 0.0) 3
('catagory', 0.0) 3
('nabi', 0.0) 3
('thailand', 0.0) 3
('computer/printer', 0.0) 3
('getter', 0.0) 3
('relatives.', 0.0) 3
('classmate', 0.0) 3
('use.it', 0.0) 3
('attract', 0.0) 3
('expanded', 0.0) 3
('bar/bat', 0.0) 3
('item/s', 0.0) 3
('kendal', 0.0) 3
('utilizando', 0.0) 3
('sid', 0.0) 3
('council', 0.0) 3
('hard-cop

('escalated', 1.0) 2
('drop', 1.0) 2
('bettie', 1.0) 2
('decipher', 1.0) 2
('effort.', 1.0) 2
('certification', 1.0) 2
('involving', 1.0) 2
('birthdate', 1.0) 2
('designated', 1.0) 2
('watermelon', 1.0) 2
('purse', 1.0) 2
('revelrie.com', 1.0) 2
('instead.', 1.0) 2
('638.99', 1.0) 2
('3d', 1.0) 2
('edited', 1.0) 2
('finance', 1.0) 2
('bah', 1.0) 2
('humbug', 1.0) 2
('539', 1.0) 2
('dialog', 1.0) 2
('andrew', 1.0) 2
('b/c', 1.0) 2
('unorganized', 1.0) 2
('svc', 1.0) 2
('precious', 1.0) 2
('lateness', 1.0) 2
('audible.com', 1.0) 2
('hp', 1.0) 2
('christmas.', 1.0) 2
('yep', 1.0) 2
('mine.', 1.0) 2
('previewed', 1.0) 2
('paul', 1.0) 2
('garage', 1.0) 2
('grab', 1.0) 2
('stressful', 1.0) 2
('dirty.', 1.0) 2
('national', 1.0) 2
('increased', 1.0) 2
('inconsistent', 1.0) 2
('summary', 1.0) 2
('served', 1.0) 2
('england.', 1.0) 2
('2013.', 1.0) 2
('party.', 1.0) 2
('grief', 1.0) 2
('rapid', 1.0) 2
('easy/intuitive', 1.0) 2
('expand', 1.0) 2
('mildly', 1.0) 2
('pricey', 1.0) 2
('ice', 1.0) 2
(

('conscience', 0.0) 2
('j.', 0.0) 2
('cranberry', 0.0) 2
('práctica', 0.0) 2
('replenishing', 0.0) 2
('excelnete', 0.0) 2
('gamble', 0.0) 2
('non-committal', 0.0) 2
('tougher', 0.0) 2
("'ca", 0.0) 2
('minimize', 0.0) 2
('rotating', 0.0) 2
('congratullations', 0.0) 2
('playfulness', 0.0) 2
('believing', 0.0) 2
('believable', 0.0) 2
('domestic', 0.0) 2
('homophobia', 0.0) 2
('phobia', 0.0) 2
('shout', 0.0) 2
('avery', 0.0) 2
('complimented', 0.0) 2
('excelenteee', 0.0) 2
('94', 0.0) 2
('thrilling', 0.0) 2
('nephew-in-law', 0.0) 2
('ford', 0.0) 2
('misc', 0.0) 2
('llike', 0.0) 2
('unactivated', 0.0) 2
('dissertation', 0.0) 2
('keith', 0.0) 2
('laziness', 0.0) 2
('justified', 0.0) 2
('daughter-in-', 0.0) 2
('=-d', 0.0) 2
('length/breadth', 0.0) 2
('.5', 0.0) 2
('bend/dent', 0.0) 2
('velocity', 0.0) 2
('sd', 0.0) 2
('deleting', 0.0) 2
('dodgy', 0.0) 2
('aloud', 0.0) 2
('grand-daughters', 0.0) 2
('non-reloadable', 0.0) 2
('chimney', 0.0) 2
('lottery', 0.0) 2
('chrisstmas', 0.0) 2
('treating'

('wilderness', 0.0) 2
('remained', 0.0) 2
('braving', 0.0) 2
('relive', 0.0) 2
('violin', 0.0) 2
('cocoa', 0.0) 2
('convinient.', 0.0) 2
('fatherhood', 0.0) 2
('pre-schedule', 0.0) 2
('promptly.', 0.0) 2
('chrsitmas', 0.0) 2
('fist', 0.0) 2
('communist', 0.0) 2
('vouch', 0.0) 2
('america.', 0.0) 2
('kaye', 0.0) 2
('shifting', 0.0) 2
('course.', 0.0) 2
('kitchen.', 0.0) 2
('thesaurus', 0.0) 2
('nitpicking', 0.0) 2
('abke', 0.0) 2
('cn', 0.0) 2
('communicated', 0.0) 2
('nostalgic', 0.0) 2
('maze', 0.0) 2
('concrete', 0.0) 2
('back.', 0.0) 2
('hipster', 0.0) 2
('down.', 0.0) 2
('reassured', 0.0) 2
('pomeranian', 0.0) 2
('brthday', 0.0) 2
('aviation', 0.0) 2
('grankids', 0.0) 2
('flip-flop', 0.0) 2
('forgo', 0.0) 2
('scott', 0.0) 2
('sweat.', 0.0) 2
('recommendable', 0.0) 2
('home.my', 0.0) 2
('salvation', 0.0) 2
('fiasco', 0.0) 2
('.a', 0.0) 2
('n.', 0.0) 2
('pre-holiday', 0.0) 2
("'world", 0.0) 2
('calculator', 0.0) 2
('walla', 0.0) 2
('tampered', 0.0) 2
('critic', 0.0) 2
('recommened', 

('foreward', 0.0) 2
('segun', 0.0) 2
('ofrece', 0.0) 2
('twisted', 0.0) 2
('tbe', 0.0) 2
('wisest', 0.0) 2
('evitar', 0.0) 2
('regalarlas', 0.0) 2
('demás', 0.0) 2
('octogenarian', 0.0) 2
('video.', 0.0) 2
('mazel', 0.0) 2
('propios', 0.0) 2
('realiza', 0.0) 2
('inveterate', 0.0) 2
('abit', 0.0) 2
('52', 0.0) 2
('kindels', 0.0) 2
('kf', 0.0) 2
('chavez', 0.0) 2
('amazon-ing', 0.0) 2
('necesidad', 0.0) 2
('otorgados', 0.0) 2
('ialso', 0.0) 2
('palette', 0.0) 2
('thermometer', 0.0) 2
('artículos', 0.0) 2
('personales', 0.0) 2
('shipping/delivery', 0.0) 2
('existen', 0.0) 2
('serias', 0.0) 2
('principio', 0.0) 2
('alegre', 0.0) 2
('códigos', 0.0) 2
('toysrus', 0.0) 2
('cerificate', 0.0) 2
('viven', 0.0) 2
('utilizacion', 0.0) 2
('oportuno', 0.0) 2
('inboxes', 0.0) 2
('scramble', 0.0) 2
('wondrous', 0.0) 2
('disponibilidad', 0.0) 2
('quedaba', 0.0) 2
('carding', 0.0) 2
('elegir', 0.0) 2
('llevar', 0.0) 2
('propio', 0.0) 2
('divorce', 0.0) 2
('kicker', 0.0) 2
('handheld', 0.0) 2
('shudder',

('pathosl', 1.0) 1
('countryside', 1.0) 1
('gobble', 1.0) 1
('individualism', 1.0) 1
('gallery', 1.0) 1
('piano.', 1.0) 1
('participant', 1.0) 1
('book4you_302662', 1.0) 1
('bookmail.org', 1.0) 1
('promitional', 1.0) 1
('arrrived', 1.0) 1
('135.00', 1.0) 1
('uncheck', 1.0) 1
('plot', 1.0) 1
('jeep', 1.0) 1
('wrangler', 1.0) 1
('1988', 1.0) 1
('peugeo', 1.0) 1
('5-speed', 1.0) 1
('gearbox', 1.0) 1
('repurposed', 1.0) 1
('rectangular', 1.0) 1
('serf', 1.0) 1
('numbers/barcodes', 1.0) 1
('picure', 1.0) 1
('chile', 1.0) 1
('boucher', 1.0) 1
('fine.i', 1.0) 1
('67', 1.0) 1
('double-checking', 1.0) 1
('stupidly', 1.0) 1
('a+', 1.0) 1
('d-', 1.0) 1
('selectable', 1.0) 1
('chirpy', 1.0) 1
('donald', 1.0) 1
('estes', 1.0) 1
('518', 1.0) 1
('659', 1.0) 1
('6080', 1.0) 1
('newborn', 1.0) 1
('critical.', 1.0) 1
('piss', 1.0) 1
('demonstrate', 1.0) 1
('techie', 1.0) 1
('tack', 1.0) 1
('conducive', 1.0) 1
('installing', 1.0) 1
('shaded', 1.0) 1
('humming', 1.0) 1
('guitar', 1.0) 1
('52', 1.0) 1
('co

('1-3', 1.0) 1
('voiced', 1.0) 1
('snug', 1.0) 1
('leather', 1.0) 1
('taxi', 1.0) 1
('jordan', 1.0) 1
('relize', 1.0) 1
('😡', 1.0) 1
('emphasize', 1.0) 1
('noticeable', 1.0) 1
('why.i', 1.0) 1
('slept', 1.0) 1
('yielded', 1.0) 1
('derived', 1.0) 1
('enormous', 1.0) 1
('spare', 1.0) 1
('valve', 1.0) 1
('stems.', 1.0) 1
('stem', 1.0) 1
('pictures.', 1.0) 1
('cruise', 1.0) 1
('brother-in-law', 1.0) 1
('nuisance', 1.0) 1
('foresee', 1.0) 1
("'promised", 1.0) 1
('1800', 1.0) 1
('89', 1.0) 1
('fiasco', 1.0) 1
('writer', 1.0) 1
('bee', 1.0) 1
('crockpot', 1.0) 1
('crock', 1.0) 1
('pot', 1.0) 1
('phantom', 1.0) 1
('items.1', 1.0) 1
('48.00', 1.0) 1
('35.00.there', 1.0) 1
('remainder.i', 1.0) 1
('aamazon', 1.0) 1
('waist', 1.0) 1
('wh0', 1.0) 1
('cut-out', 1.0) 1
('dusty', 1.0) 1
('winner', 1.0) 1
('purchase.i', 1.0) 1
('stung', 1.0) 1
('stream', 1.0) 1
('comfirmation', 1.0) 1
('jeanne', 1.0) 1
('bonney', 1.0) 1
('proposed', 1.0) 1
('resp', 1.0) 1
('wrk', 1.0) 1
('crashed', 1.0) 1
('0d', 1.0) 1

('it.ou', 1.0) 1
('ralph', 1.0) 1
('messick', 1.0) 1
('rgmapex', 1.0) 1
('aarrggghh', 1.0) 1
('beacuse', 1.0) 1
('newer', 1.0) 1
('today.', 1.0) 1
('-fed', 1.0) 1
('noticing', 1.0) 1
('6/16/13', 1.0) 1
('6/11/13', 1.0) 1
('6/13/13', 1.0) 1
('ignoring', 1.0) 1
('16.', 1.0) 1
('scheduled.', 1.0) 1
('intervene', 1.0) 1
('redeemgift', 1.0) 1
('loses', 1.0) 1
('luster', 1.0) 1
('fuel', 1.0) 1
('platform', 1.0) 1
('relatively', 1.0) 1
('swap', 1.0) 1
('6.10.13', 1.0) 1
('accordingly', 1.0) 1
('forensics', 1.0) 1
('increasingly', 1.0) 1
('attorney', 1.0) 1
('lifesaver', 1.0) 1
('reliability', 1.0) 1
('promptness', 1.0) 1
('unusuable', 1.0) 1
('unintuitive', 1.0) 1
('impressive', 1.0) 1
('wood', 1.0) 1
('chipped', 1.0) 1
('owl', 1.0) 1
('accountable', 1.0) 1
('kicking', 1.0) 1
('camo', 1.0) 1
('amazpn', 1.0) 1
('barf', 1.0) 1
('grrrrr', 1.0) 1
('reprogram', 1.0) 1
('reprogrammed', 1.0) 1
('money/gift', 1.0) 1
('hid', 1.0) 1
('safeguard', 1.0) 1
('fyi-the', 1.0) 1
('aside', 1.0) 1
('satisfying'

('cannon', 1.0) 1
('mail\\\\', 1.0) 1
('7.99', 1.0) 1
('intimadating', 1.0) 1
('automaticly', 1.0) 1
('soul', 1.0) 1
('redue', 1.0) 1
('retainable', 1.0) 1
('covenience', 1.0) 1
('employer', 1.0) 1
('8.00', 1.0) 1
('redeemere', 1.0) 1
('replenished', 1.0) 1
('5/25', 1.0) 1
('5/26', 1.0) 1
('6/2', 1.0) 1
('forgive', 1.0) 1
('equivalent', 1.0) 1
('rr', 1.0) 1
('whatever.', 1.0) 1
('win', 1.0) 1
('skin', 1.0) 1
('nd', 1.0) 1
('gloria', 1.0) 1
('equal', 1.0) 1
('whopping', 1.0) 1
('5/28', 1.0) 1
('5/31/12.', 1.0) 1
('actuality', 1.0) 1
('5/31/12', 1.0) 1
('6/1/12.', 1.0) 1
('unexcusable', 1.0) 1
('cardio', 1.0) 1
('complainig', 1.0) 1
('prdered', 1.0) 1
('0-10', 1.0) 1
('dealer', 1.0) 1
('score', 1.0) 1
('obligation', 1.0) 1
('exists', 1.0) 1
('assisting', 1.0) 1
('liane', 1.0) 1
('timesaver', 1.0) 1
('abominabley', 1.0) 1
('glitz', 1.0) 1
('brazil', 1.0) 1
('accurately', 1.0) 1
('involves', 1.0) 1
('...............', 1.0) 1
('exhorbitant', 1.0) 1
("'present", 1.0) 1
('comfort', 1.0) 1
('s

('11.99', 1.0) 1
('nicaragua', 1.0) 1
('herald', 1.0) 1
('shirley', 1.0) 1
('bador', 1.0) 1
('shirleybador', 1.0) 1
('states.\\\\', 1.0) 1
('presently', 1.0) 1
('apporpriate', 1.0) 1
('lied', 1.0) 1
('represemntative', 1.0) 1
('behaviorand', 1.0) 1
('mcmillen', 1.0) 1
('cichester', 1.0) 1
('obsolete', 1.0) 1
('u.s.p.s', 1.0) 1
('amazon.com\\\\', 1.0) 1
('thereby', 1.0) 1
('alerting', 1.0) 1
('couse', 1.0) 1
('upshot', 1.0) 1
('provides', 1.0) 1
('stolen/lost', 1.0) 1
('duck', 1.0) 1
('santa\\\\', 1.0) 1
('asked\\\\', 1.0) 1
('relating', 1.0) 1
('yay', 1.0) 1
('slowest', 1.0) 1
('discoved', 1.0) 1
('guitarjam97', 1.0) 1
('thereof', 1.0) 1
("'sunshine", 1.0) 1
('cg', 1.0) 1
('acually', 1.0) 1
('relized', 1.0) 1
('s/he', 1.0) 1
('trading', 1.0) 1
('strengthened', 1.0) 1
('refusal', 1.0) 1
('orwell', 1.0) 1
('farm\\\\', 1.0) 1
('1984\\\\', 1.0) 1
('ever-', 1.0) 1
('picked-', 1.0) 1
('unreal', 1.0) 1
('carrds', 1.0) 1
('californ', 1.0) 1
('ia', 1.0) 1
('goodbye', 1.0) 1
('34aus', 1.0) 1
('o

('give-that', 0.0) 1
('line-up', 0.0) 1
('arriba', 0.0) 1
('should.', 0.0) 1
('perdon', 0.0) 1
('well-spent', 0.0) 1
('amazon.com-land', 0.0) 1
('ggreat', 0.0) 1
('quickty', 0.0) 1
('here\x1ahere', 0.0) 1
('quality\x1a', 0.0) 1
('cuutomizable', 0.0) 1
('mc/visa', 0.0) 1
('neiborgh', 0.0) 1
('give-aways', 0.0) 1
('angst', 0.0) 1
('co-', 0.0) 1
('it.just', 0.0) 1
('it.was', 0.0) 1
('use.the', 0.0) 1
('affinity', 0.0) 1
('woreid', 0.0) 1
('derived', 0.0) 1
('gift~', 0.0) 1
('were-', 0.0) 1
('llegen', 0.0) 1
('cousing', 0.0) 1
('alvaro', 0.0) 1
('restivo', 0.0) 1
('so.africa.eas\\y', 0.0) 1
('interact', 0.0) 1
('minutesafter', 0.0) 1
('revenue', 0.0) 1
('good~~~~~~~', 0.0) 1
('free-shipping', 0.0) 1
('cherishable', 0.0) 1
('-cyberamp', 0.0) 1
(':3', 0.0) 1
('cyberamp', 0.0) 1
('company/manufacturer', 0.0) 1
('comapanies', 0.0) 1
('amazon/ebay/target/walmart.com', 0.0) 1
('cyberampproductreviews', 0.0) 1
('seriousness', 0.0) 1
('ysq', 0.0) 1
('boing', 0.0) 1
('wanted.this', 0.0) 1
('recruit

('dreamscometrue', 0.0) 1
('questions.', 0.0) 1
('perfectly/', 0.0) 1
('tin.', 0.0) 1
('munnie', 0.0) 1
("who'duh", 0.0) 1
('5-8', 0.0) 1
('transformed', 0.0) 1
('thnkx', 0.0) 1
('dilvery', 0.0) 1
('excelkente', 0.0) 1
('subjecting', 0.0) 1
('ordering/buying', 0.0) 1
('artificial', 0.0) 1
('excelente.-', 0.0) 1
('-_-', 0.0) 1
('12/13/14', 0.0) 1
('12/15/14', 0.0) 1
('efasy', 0.0) 1
('hadden', 0.0) 1
('schmeasy', 0.0) 1
('staffer', 0.0) 1
('uber', 0.0) 1
('forgeted', 0.0) 1
('midway', 0.0) 1
('voodoo', 0.0) 1
('non-themed', 0.0) 1
('...........................', 0.0) 1
('spider', 0.0) 1
('relevance', 0.0) 1
('dismay', 0.0) 1
('70°', 0.0) 1
('northernmost', 0.0) 1
('hot/humid', 0.0) 1
('amazoooooooooon', 0.0) 1
('amozoooooon', 0.0) 1
('knock-off', 0.0) 1
('vetting', 0.0) 1
('ilse', 0.0) 1
('naj', 0.0) 1
('bellas', 0.0) 1
('diameter', 0.0) 1
('yaaaaasssss', 0.0) 1
('please.the', 0.0) 1
('impressing', 0.0) 1
('untouchable', 0.0) 1
('non-prime', 0.0) 1
('awesome！', 0.0) 1
('ghana', 0.0) 1
(

('articule', 0.0) 1
('sayi', 0.0) 1
('leeway', 0.0) 1
("boys'trip", 0.0) 1
('bacon.', 0.0) 1
('empache', 0.0) 1
('ch', 0.0) 1
('day/last', 0.0) 1
('goodies-', 0.0) 1
('possible-', 0.0) 1
('tap-tap-done.', 0.0) 1
('egc', 0.0) 1
('immedaitely', 0.0) 1
('liberating', 0.0) 1
('bike.', 0.0) 1
('cameron', 0.0) 1
('22-year', 0.0) 1
('marvelously', 0.0) 1
('bulkiy', 0.0) 1
('detangle', 0.0) 1
('exelete', 0.0) 1
('email/code', 0.0) 1
('brick-', 0.0) 1
('-mortar', 0.0) 1
('ta-', 0.0) 1
('cards.also', 0.0) 1
('30year', 0.0) 1
('dalek', 0.0) 1
('ticket///give', 0.0) 1
('cooccanut', 0.0) 1
('flaxseed', 0.0) 1
('ville', 0.0) 1
('email.such', 0.0) 1
('exchanging.if', 0.0) 1
('defently', 0.0) 1
('nugget', 0.0) 1
('goof-proof', 0.0) 1
('10x', 0.0) 1
('great.great', 0.0) 1
('reddem', 0.0) 1
('toolbox', 0.0) 1
('needle.', 0.0) 1
('wallmart', 0.0) 1
('windex', 0.0) 1
('1/16', 0.0) 1
('shortcoming', 0.0) 1
('frivolously', 0.0) 1
('therapy', 0.0) 1
('scoliosis', 0.0) 1
('restricting', 0.0) 1
('handfull', 0.

('felpro', 0.0) 1
('netherlands.', 0.0) 1
('hard|forum', 0.0) 1
('battlefield', 0.0) 1
('crippled', 0.0) 1
('vist', 0.0) 1
('guaira', 0.0) 1
('town/state', 0.0) 1
('marqueting', 0.0) 1
('caress', 0.0) 1
('isaac', 0.0) 1
('use，but', 0.0) 1
('purchase.i', 0.0) 1
('haha。', 0.0) 1
('reslly', 0.0) 1
('llagado', 0.0) 1
('bien.', 0.0) 1
('wounded', 0.0) 1
('profesional', 0.0) 1
('atractive', 0.0) 1
('visually.', 0.0) 1
('bluff', 0.0) 1
('quick.and', 0.0) 1
('promo.', 0.0) 1
('fade', 0.0) 1
('scamming', 0.0) 1
('gesture.', 0.0) 1
('xx.00', 0.0) 1
('coutry', 0.0) 1
('guite', 0.0) 1
('llent', 0.0) 1
('ry', 0.0) 1
('receipt.', 0.0) 1
('jehova', 0.0) 1
('scarlet', 0.0) 1
('carnesi', 0.0) 1
('isaiah', 0.0) 1
('1:18', 0.0) 1
('reacted', 0.0) 1
('fairly-immediate', 0.0) 1
('queried', 0.0) 1
('onsite', 0.0) 1
('millennium', 0.0) 1
('family😊', 0.0) 1
('excelenc', 0.0) 1
('____', 0.0) 1
('sufficed', 0.0) 1
('account.great', 0.0) 1
('line.the', 0.0) 1
('boo-boo', 0.0) 1
('usuable', 0.0) 1
('setback', 0.0

('retail-world', 0.0) 1
('hemisphere/australian', 0.0) 1
('card….absolutely', 0.0) 1
('moro', 0.0) 1
('siblings.', 0.0) 1
('envelop.', 0.0) 1
('symbolizes', 0.0) 1
('ageing', 0.0) 1
('well.amazon', 0.0) 1
('youi', 0.0) 1
('remail', 0.0) 1
('fill-in', 0.0) 1
('happen….so', 0.0) 1
('hmmmmmmmmm', 0.0) 1
('service.it', 0.0) 1
('messenger', 0.0) 1
('leuzinger', 0.0) 1
('great-nieces.', 0.0) 1
('box-', 0.0) 1
('olders', 0.0) 1
('youngers', 0.0) 1
('jeanette', 0.0) 1
('chrristmas', 0.0) 1
('hollidays', 0.0) 1
('captivity', 0.0) 1
('racked', 0.0) 1
('likened', 0.0) 1
('rationalizing', 0.0) 1
("'junk", 0.0) 1
('no-no', 0.0) 1
('friends.extremely', 0.0) 1
('reguired', 0.0) 1
('christmas/appreciation', 0.0) 1
('hurrying', 0.0) 1
('smile.', 0.0) 1
('recommendation.', 0.0) 1
('abt', 0.0) 1
('colorfuls', 0.0) 1
('erfect', 0.0) 1
('ranged', 0.0) 1
('sistert', 0.0) 1
('bit.', 0.0) 1
('spender', 0.0) 1
("b'ay", 0.0) 1
('said.and', 0.0) 1
('pratice', 0.0) 1
('usu', 0.0) 1
('givving', 0.0) 1
('-especiall

('yeeehaa', 0.0) 1
('i\uef01was', 0.0) 1
('simple.a', 0.0) 1
('140', 0.0) 1
("have'nt", 0.0) 1
('money.takes', 0.0) 1
('.by', 0.0) 1
('easyest', 0.0) 1
('greatest/easiest', 0.0) 1
('difficul', 0.0) 1
('use.appropriately', 0.0) 1
('privllege', 0.0) 1
('cards/messages', 0.0) 1
('notebook', 0.0) 1
('it,100', 0.0) 1
('odrered', 0.0) 1
('customazing', 0.0) 1
('cultural', 0.0) 1
('cheri', 0.0) 1
('aquaintances', 0.0) 1
('squezzy', 0.0) 1
('overal', 0.0) 1
("'love", 0.0) 1
('foggiest', 0.0) 1
('ease.amazon', 0.0) 1
('offert', 0.0) 1
('reuirements', 0.0) 1
('comprehensible', 0.0) 1
('book/s', 0.0) 1
('big-time', 0.0) 1
('hesitates', 0.0) 1
('yeaaa', 0.0) 1
('sci-fi', 0.0) 1
('/plus', 0.0) 1
('groupe', 0.0) 1
('buy/give', 0.0) 1
('demonstrate', 0.0) 1
('sent/delivered', 0.0) 1
('convalescence', 0.0) 1
('carf', 0.0) 1
('spectatives', 0.0) 1
('again.five', 0.0) 1
('manu', 0.0) 1
('clicks.', 0.0) 1
('awesum', 0.0) 1
('congratulation.i', 0.0) 1
('percipient', 0.0) 1
('practicable', 0.0) 1
('harman'

('worrry', 0.0) 1
('fit.for', 0.0) 1
('yey', 0.0) 1
('custome-printed', 0.0) 1
('need/', 0.0) 1
('storebought', 0.0) 1
('y0u', 0.0) 1
('dressier', 0.0) 1
('nominate', 0.0) 1
('relentless', 0.0) 1
('manifested', 0.0) 1
('banked', 0.0) 1
('unwrapped', 0.0) 1
('jest', 0.0) 1
('swivel', 0.0) 1
("i-couldn't-make", 0.0) 1
('it-to-the-shower', 0.0) 1
('giggle.', 0.0) 1
('livwe', 0.0) 1
('certificate/bday', 0.0) 1
('cherrsu', 0.0) 1
('aawl', 0.0) 1
('waived', 0.0) 1
('cb', 0.0) 1
('gift.by', 0.0) 1
('energy.i', 0.0) 1
('voil', 0.0) 1
('behave', 0.0) 1
('vut', 0.0) 1
('ccccuold', 0.0) 1
('replicate', 0.0) 1
('things.off', 0.0) 1
('gorilla', 0.0) 1
('root', 0.0) 1
('tract', 0.0) 1
("'foreign", 0.0) 1
('loused', 0.0) 1
('108', 0.0) 1
('minimally', 0.0) 1
('13.99', 0.0) 1
('card,10.00', 0.0) 1
('dsiponible', 0.0) 1
('recomeindo', 0.0) 1
('animatiom', 0.0) 1
('ungrateful', 0.0) 1
('givers.', 0.0) 1
('diann', 0.0) 1
('consaul', 0.0) 1
('nomad', 0.0) 1
('been/will', 0.0) 1
('remington', 0.0) 1
('twar

('overstated', 0.0) 1
('purchases/gifts', 0.0) 1
('directamente', 0.0) 1
('if/when', 0.0) 1
('saudos', 0.0) 1
('bybyb', 0.0) 1
('i/2', 0.0) 1
('tds', 0.0) 1
('telecummunication', 0.0) 1
('zein', 0.0) 1
('homebbound', 0.0) 1
('contiene', 0.0) 1
('bbt', 0.0) 1
('tvt', 0.0) 1
('vrf', 0.0) 1
('rcr', 0.0) 1
('vtv', 0.0) 1
('gv', 0.0) 1
('skipped', 0.0) 1
('ahorre', 0.0) 1
('deathly', 0.0) 1
('conveneience', 0.0) 1
('ahorros', 0.0) 1
('increible', 0.0) 1
('vastness', 0.0) 1
('photoshopped', 0.0) 1
('again.item', 0.0) 1
('designs/pictures', 0.0) 1
('oh-so-fun', 0.0) 1
('omany', 0.0) 1
('rocking', 0.0) 1
('difficultto', 0.0) 1
('destiny', 0.0) 1
('msoy', 0.0) 1
('asuming', 0.0) 1
('exqactly', 0.0) 1
('50-100', 0.0) 1
('sincerest', 0.0) 1
('estaré', 0.0) 1
('regalo.', 0.0) 1
('entiendan', 0.0) 1
('personal/', 0.0) 1
('intensive.', 0.0) 1
('fluke', 0.0) 1
('caducan', 0.0) 1
('anualmente', 0.0) 1
('existente', 0.0) 1
('deceen', 0.0) 1
("dind't", 0.0) 1
('stickin', 0.0) 1
('granddaughter-in-law', 

('punishment', 0.0) 1
('jtm', 0.0) 1
("'oops", 0.0) 1
('centsing', 0.0) 1
('waas', 0.0) 1
('fiberoptic', 0.0) 1
('bativity', 0.0) 1
('offerrings', 0.0) 1
('niece/nephew', 0.0) 1
('wagon', 0.0) 1
('afectar', 0.0) 1
('easest', 0.0) 1
("'producing", 0.0) 1
('moolah', 0.0) 1
('nicley', 0.0) 1
('brother-in-law.', 0.0) 1
('was.nice', 0.0) 1
('be.able', 0.0) 1
('11/2', 0.0) 1
('intante', 0.0) 1
('podria', 0.0) 1
('guardarla', 0.0) 1
('positon', 0.0) 1
('christmassy.', 0.0) 1
('psychedelic', 0.0) 1
("'against", 0.0) 1
('cople', 0.0) 1
('embarassed', 0.0) 1
('pasen', 0.0) 1
('grouped', 0.0) 1
('holiday.', 0.0) 1
('holidays.it', 0.0) 1
('emiediately', 0.0) 1
('latititude', 0.0) 1
('hollywd', 0.0) 1
('childrenm', 0.0) 1
('c-mas', 0.0) 1
('vivo', 0.0) 1
('proximo', 0.0) 1
('acumularlo.', 0.0) 1
('kindle-loving', 0.0) 1
('future-mother-in-law', 0.0) 1
('niceness', 0.0) 1
('to.before', 0.0) 1
('seas.', 0.0) 1
('other-hard', 0.0) 1
('jab/amazon', 0.0) 1
('música', 0.0) 1
('bobble', 0.0) 1
('folow', 0

('fututas', 0.0) 1
('g4acias', 0.0) 1
('importa', 0.0) 1
('campbell', 0.0) 1
('keyed', 0.0) 1
('individaul', 0.0) 1
('non-refillable', 0.0) 1
('son-in-law.', 0.0) 1
('engineer.', 0.0) 1
('governmet', 0.0) 1
('t.a', 0.0) 1
('recomentd', 0.0) 1
('wanted.it', 0.0) 1
('gizmoreport.com', 0.0) 1
('crocheting', 0.0) 1
('crochet', 0.0) 1
('blaaa', 0.0) 1
("'celebrate", 0.0) 1
('print/purchase', 0.0) 1
('usps.', 0.0) 1
('applys', 0.0) 1
('shove', 0.0) 1
('broader', 0.0) 1
('reoccuring', 0.0) 1
('straighened', 0.0) 1
('service.the', 0.0) 1
('inscribed', 0.0) 1
('assassination', 0.0) 1
('murder', 0.0) 1
('excatly', 0.0) 1
('webding', 0.0) 1
('unclaimed', 0.0) 1
('instantness', 0.0) 1
('courage', 0.0) 1
('newylweds', 0.0) 1
('whatvere', 0.0) 1
('accrosse', 0.0) 1
('fabu', 0.0) 1
('fabulosity', 0.0) 1
('glittery', 0.0) 1
('chatchkes', 0.0) 1
('healthful', 0.0) 1
('bling', 0.0) 1
('non-fattening', 0.0) 1
('sigh~', 0.0) 1
('ʚ', 0.0) 1
('ˆ◡ˆ', 0.0) 1
('ɞ', 0.0) 1
('❤', 0.0) 1
('printing.', 0.0) 1
('sh

('received.it', 0.0) 1
('sidestep', 0.0) 1
('oit', 0.0) 1
('dpo', 0.0) 1
('sanuk', 0.0) 1
('sanuks', 0.0) 1
('points\\\\', 0.0) 1
('metchandise', 0.0) 1
('curling', 0.0) 1
('rosaura', 0.0) 1
('microshell', 0.0) 1
('amonth', 0.0) 1
('minnentoka', 0.0) 1
('beaded', 0.0) 1
('need-to-have', 0.0) 1
('ludicrous.', 0.0) 1
('crocs', 0.0) 1
('amb', 0.0) 1
('zagat', 0.0) 1
('audible.com', 0.0) 1
('cricut', 0.0) 1
('limit\\\\', 0.0) 1
('ghd', 0.0) 1
('birthday.what', 0.0) 1
('99.00', 0.0) 1
('knidle', 0.0) 1
('kara', 0.0) 1
('dalton', 0.0) 1
('abigal', 0.0) 1
('darkened', 0.0) 1
('old-fashioned\\\\', 0.0) 1
('shepitka', 0.0) 1
('unusable.', 0.0) 1
('martinez', 0.0) 1
('71st', 0.0) 1
('infinitable', 0.0) 1
('read.it', 0.0) 1
('sonic', 0.0) 1
('flosser', 0.0) 1
('polish', 0.0) 1
('lemonade', 0.0) 1
('wiah', 0.0) 1
('reader.so', 0.0) 1
('amazine', 0.0) 1
('b00383pb0u', 0.0) 1
('cfds05', 0.0) 1
('boombox', 0.0) 1
('b000c1zdtu', 0.0) 1
('jessica', 0.0) 1
('parker', 0.0) 1
('parfum', 0.0) 1
('3.4-ounce

('blowing', 0.0) 1
('swept', 0.0) 1
('metric-o-phobes', 0.0) 1
('plight', 0.0) 1
('dramatic', 0.0) 1
('eternal', 0.0) 1
('giviing', 0.0) 1
('lang', 0.0) 1
('calanders', 0.0) 1
('gobaby', 0.0) 1
('grandchuildren', 0.0) 1
('moka', 0.0) 1
('indulged', 0.0) 1
('espresso-based', 0.0) 1
('one-click\\\\', 0.0) 1
('copout', 0.0) 1
('8.75', 0.0) 1
('deter', 0.0) 1
('removing', 0.0) 1
('etter', 0.0) 1
('recv', 0.0) 1
('cp', 0.0) 1
('mind-boggling', 0.0) 1
('zane', 0.0) 1
('best-selling', 0.0) 1
('card/printout', 0.0) 1
('moan', 0.0) 1
('/the', 0.0) 1
('amazon.very', 0.0) 1
('versitility', 0.0) 1
('stelle', 0.0) 1
('koonze', 0.0) 1
('untill', 0.0) 1
('contune', 0.0) 1
('instigate', 0.0) 1
('presto\\\\', 0.0) 1
('dwindles', 0.0) 1
('experience-i', 0.0) 1
('pie\\\\', 0.0) 1
('uggs', 0.0) 1
('-love', 0.0) 1
('themselfs', 0.0) 1
('anxiously', 0.0) 1
('book/cd', 0.0) 1
('tuition', 0.0) 1
('cyperworld', 0.0) 1
('aproved', 0.0) 1
('arethe', 0.0) 1
('half.i', 0.0) 1
('meand', 0.0) 1
('specialthank', 0.0)

('pahgcs', 0.0) 1
('handed.', 0.0) 1
('instruct', 0.0) 1
('phone.', 0.0) 1
('purchaser/gift', 0.0) 1
('w/nothing', 0.0) 1
('detested', 0.0) 1
('gift/', 0.0) 1
('denominaton', 0.0) 1
('techno-feeb', 0.0) 1
('gird', 0.0) 1
('aggrivation', 0.0) 1
('amazon-gift', 0.0) 1
('giftcard.i', 0.0) 1
('any\\\\', 0.0) 1
('reassure', 0.0) 1
('nec', 0.0) 1
('multiprofiler\\\\', 0.0) 1
('brightness', 0.0) 1
('icc', 0.0) 1
('dvi-d', 0.0) 1
('macpro', 0.0) 1
('port-dvi', 0.0) 1
('converter', 0.0) 1
('surprsed', 0.0) 1
('dsplayed', 0.0) 1
('bubba', 0.0) 1
('palace', 0.0) 1
('proper-looking', 0.0) 1
('skipping', 0.0) 1
('interim', 0.0) 1
('envelopes\\\\', 0.0) 1
('teacher-ey', 0.0) 1
('demonimation', 0.0) 1
('u.p.s', 0.0) 1
('pudding', 0.0) 1
('seagate', 0.0) 1
('freeagent', 0.0) 1
('goflex', 0.0) 1
('ultra-portable', 0.0) 1
('thiswasthe', 0.0) 1
('usingamazon.com', 0.0) 1
('stanford', 0.0) 1
('30-year-old', 0.0) 1
('conmpras', 0.0) 1
('wanted/just', 0.0) 1
('w/i', 0.0) 1
('originaly', 0.0) 1
('pijamas', 0

('fashionned', 0.0) 1
('rechecked', 0.0) 1
('cannon', 0.0) 1
('rebel', 0.0) 1
('t2i', 0.0) 1
('newbee', 0.0) 1
('experieince', 0.0) 1
('gulped', 0.0) 1
('diferentes', 0.0) 1
('ali', 0.0) 1
('swoop', 0.0) 1
('spammed\\\\', 0.0) 1
('stale', 0.0) 1
('rot', 0.0) 1
('no-no\\\\', 0.0) 1
('diet-busters', 0.0) 1
('gamut', 0.0) 1
("don'ts", 0.0) 1
('tandem', 0.0) 1
('unlucky', 0.0) 1
('chappy', 0.0) 1
('multitute', 0.0) 1
('8gb', 0.0) 1
('fisher', 0.0) 1
('lamb', 0.0) 1
('erie', 0.0) 1
('tallahassee', 0.0) 1
('cradle', 0.0) 1
('devote', 0.0) 1
('prep', 0.0) 1
('reflecting', 0.0) 1
('inexperience', 0.0) 1
('trimester', 0.0) 1
('networking', 0.0) 1
('thot', 0.0) 1
('buget', 0.0) 1
('create\\\\', 0.0) 1
('breathes', 0.0) 1
('coincided', 0.0) 1
('certifcates', 0.0) 1
('mirabile', 0.0) 1
('dictu', 0.0) 1
('products-', 0.0) 1
('dc.', 0.0) 1
('choises.thanks', 0.0) 1
('accolade', 0.0) 1
('acceder', 0.0) 1
('accesorios', 0.0) 1
('life-long', 0.0) 1
('avoinds', 0.0) 1
('plague', 0.0) 1
('~natalia', 0.0)

In [9]:
len(freqs)

36732

In [10]:
import pandas as pd
t = pd.Series(freqs).reset_index()
t.columns = ["word","sentiment","freq"]

In [11]:
#top n words related to positive sentiments
n=10
t[t['sentiment']==0].sort_values("freq", ascending=False).head(n)

Unnamed: 0,word,sentiment,freq
8841,gift,0.0,77650
8844,easy,0.0,23164
8839,great,0.0,18548
8849,card,0.0,15120
8847,love,0.0,12631
8953,use,0.0,11024
8906,way,0.0,10950
8832,n't,0.0,10903
8862,br,0.0,10821
8835,get,0.0,10551


In [12]:
#top n words related to negative sentiments
n=10
t[t['sentiment']==1].sort_values("freq", ascending=False).head(n)

Unnamed: 0,word,sentiment,freq
1,gift,1.0,5266
33,n't,1.0,1966
58,br,1.0,1912
61,card,1.0,1220
37,would,1.0,1157
109,use,1.0,1033
107,get,1.0,962
194,time,1.0,934
112,never,1.0,902
168,one,1.0,899


In [3]:
from transformer import Transformer
t = Transformer()
train_data, test_data = t.model_data()

In [4]:
train_data

Unnamed: 0,product_id,product_parent,product_category,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,negative_score,positive_score,target_label
0,B004LLIKVU,473048287,Gift Card,0,0,N,Y,One Star,What?????????,0.0,0.0,1.0
1,B004LLIKVU,473048287,Gift Card,0,0,N,Y,Two Stars,I wanted the gift car4 for MUSIC not movies.,5586.0,82177.0,1.0
2,B004LLIKVU,473048287,Gift Card,0,0,N,Y,I had no idea that the amount is listed in ...,I had no idea that the amount is listed in US ...,1362.0,10681.0,1.0
3,BT00DC6QU4,473048287,Gift Card,0,0,N,Y,... away party thinking it should be a quick a...,ordered this for a going away party thinking i...,10107.0,81404.0,1.0
4,B005ESMGGY,379368939,Gift Card,0,0,N,Y,The giftcard themselves weren't the problem - ...,The giftcard themselves weren't the problem - ...,9108.0,68275.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...
102260,B0002CZPPG,867256265,Gift Card,3,9,N,N,Great idea.,The itunes gift card is absolutely the best gi...,10357.0,137341.0,0.0
102261,B0002CZPPG,867256265,Gift Card,10,10,N,N,Way easier than explaining your musical taste ...,Finally there is a way for your family to buy ...,9890.0,138446.0,0.0
102262,B0002CZPPG,867256265,Gift Card,20,30,N,N,Way easier than explaining your musical taste ...,Finally there is a way for your family to buy ...,11812.0,149353.0,0.0
102263,B0002CZPPG,867256265,Gift Card,63,72,N,N,A great way to turn cash into songs,I picked up a few of these at Target a while b...,14114.0,160144.0,0.0


In [1]:
#building features for the model
from transformer import Transformer
t = Transformer()
train_data, test_data = t.build_features()

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [None]:
#ToDo: iterative data cleaning, no of times to go through the loop, what i had to do. automated & manual.

In [None]:
#Next Steps
-ToDo: for each step record inputs & outputs i.e. n x m array etc. Think if anything's left behind.
- Build features from the word counts - ToDo: review with smaller chunk of data.
- Encode categorical features
- Perform EDA's
- Develop logistic regression model

- Look at probabilistic Topic Models

In [2]:
import app
app.logistic_regression_results()

Precision: 0.25276153346328784 and Recall: 0.39512442864398173 


In [None]:
#LDA

In [9]:
sample_data = files['train_neg'][0:10][13]
sample_data

0                                        What?????????
1         I wanted the gift car4 for MUSIC not movies.
2    I had no idea that the amount is listed in US ...
3    ordered this for a going away party thinking i...
4    The giftcard themselves weren't the problem - ...
5    I did not order this please cancel and return ...
6    I wanted this gift card applied to my account,...
7    I had great difficulty in getting my gift card...
8    Card Didn't come to me until the next day, and...
9    Did not authorise payment. My account was hacked.
Name: 13, dtype: object

In [None]:
freq = {}
for sentence, y in zip(sentences, ys):
    if not self.isNaN(sentence):
        for _word in self.process_text(sentence,):
            pair = (_word, y)
            freq[pair] = freq.get(pair, 0) + 1
freq

In [5]:
for sd in sample_data:
    for _word in self.process_text(sentence,):
        print()
        

NameError: name 'sample_data' is not defined

In [15]:
import gensim.corpora as corpora

In [17]:
corpora.Dictionary(sample_data[0])

TypeError: doc2bow expects an array of unicode tokens on input, not a single string

In [26]:
data_words = sample_data.values.tolist()
data_words

['What?????????',
 'I wanted the gift car4 for MUSIC not movies.',
 'I had no idea that the amount is listed in US dollars. I ended purchasing US $100 instead of CAD $100, which meant extra $30CAD.',
 "ordered this for a going away party thinking it should be a quick and easy process.  two hours later i called customer service as it still wasn't &#34;authorized&#34;.  i was told it would be three hours maximum by the rep.  three and a half hours later i called back and was told it may be a full day.  unfortunately, it's not clear that this there is a significant delay when purchasing these.  otherwise, i wouldn't have used this service.",
 "The giftcard themselves weren't the problem - they were cool, and used long ago.<br />The hitch was I bought the cards on Amazon's 20th anniversary, for a special deal that entitled me to an extra 10$ free, and till this day I have seen no  indication in my Amazon account of the 10$ free coupon.",
 'I did not order this please cancel and return my m

In [4]:
sample_data.values.tolist()[:1][0][:30]

NameError: name 'sample_data' is not defined

In [37]:
from utils.text_processing import TextProcessing
t = TextProcessing()

def corpus_id2word_mapping():
    clean_list = []
    for sentence in data_words:
        if not self.isNaN(sentence):
            clean_text = self.process_text(sentence,)
            if clean_text != []:
                clean_list.append(clean_text)
    
    id2word = corpora.Dictionary(clean_list)
    texts = clean_list
    corpus = [id2word.doc2bow(text) for text in texts]
    return id2words, corpus

[['wanted', 'gift', 'car4', 'music', 'movie'], ['idea', 'amount', 'listed', 'u', 'dollar', 'ended', 'purchasing', 'u', '100', 'instead', 'cad', '100', 'meant', 'extra', '30cad'], ['ordered', 'going', 'away', 'party', 'thinking', 'quick', 'easy', 'process', 'two', 'hour', 'later', 'called', 'customer', 'service', 'still', "n't", '34', 'authorized', '34', 'told', 'would', 'three', 'hour', 'maximum', 'rep.', 'three', 'half', 'hour', 'later', 'called', 'back', 'told', 'may', 'full', 'day', 'unfortunately', "'s", 'clear', 'significant', 'delay', 'purchasing', 'otherwise', 'would', "n't", 'used', 'service'], ['giftcard', "n't", 'problem', 'cool', 'used', 'long', 'ago.', 'br', 'hitch', 'bought', 'card', "'s", '20th', 'anniversary', 'special', 'deal', 'entitled', 'extra', '10', 'free', 'till', 'day', 'seen', 'indication', 'account', '10', 'free', 'coupon'], ['order', 'please', 'cancel', 'return', 'money'], ['wanted', 'gift', 'applied', 'account', 'find', 'way', 'help', 'available', 'phone', 'n

In [51]:
clean_list = []
for sentence in sample_data:
    if t.process_text(sentence,):
        clean_list.append(t.process_text(sentence,))
clean_list

[['wanted', 'gift', 'car4', 'music', 'movie'],
 ['idea',
  'amount',
  'listed',
  'u',
  'dollar',
  'ended',
  'purchasing',
  'u',
  '100',
  'instead',
  'cad',
  '100',
  'meant',
  'extra',
  '30cad'],
 ['ordered',
  'going',
  'away',
  'party',
  'thinking',
  'quick',
  'easy',
  'process',
  'two',
  'hour',
  'later',
  'called',
  'customer',
  'service',
  'still',
  "n't",
  '34',
  'authorized',
  '34',
  'told',
  'would',
  'three',
  'hour',
  'maximum',
  'rep.',
  'three',
  'half',
  'hour',
  'later',
  'called',
  'back',
  'told',
  'may',
  'full',
  'day',
  'unfortunately',
  "'s",
  'clear',
  'significant',
  'delay',
  'purchasing',
  'otherwise',
  'would',
  "n't",
  'used',
  'service'],
 ['giftcard',
  "n't",
  'problem',
  'cool',
  'used',
  'long',
  'ago.',
  'br',
  'hitch',
  'bought',
  'card',
  "'s",
  '20th',
  'anniversary',
  'special',
  'deal',
  'entitled',
  'extra',
  '10',
  'free',
  'till',
  'day',
  'seen',
  'indication',
  'acco

In [38]:
import gensim.corpora as corpora
# Create Dictionary
id2word = corpora.Dictionary(l)
# Create Corpus
texts = l

In [39]:
# Term Document Frequency
corpus = [id2word.doc2bow(text) for text in texts]

In [47]:
corpus[:3]

[[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)],
 [(5, 2),
  (6, 1),
  (7, 1),
  (8, 1),
  (9, 1),
  (10, 1),
  (11, 1),
  (12, 1),
  (13, 1),
  (14, 1),
  (15, 1),
  (16, 1),
  (17, 2)],
 [(16, 1),
  (18, 1),
  (19, 2),
  (20, 1),
  (21, 1),
  (22, 1),
  (23, 2),
  (24, 1),
  (25, 1),
  (26, 1),
  (27, 1),
  (28, 1),
  (29, 1),
  (30, 1),
  (31, 1),
  (32, 3),
  (33, 2),
  (34, 1),
  (35, 1),
  (36, 2),
  (37, 1),
  (38, 1),
  (39, 1),
  (40, 1),
  (41, 1),
  (42, 1),
  (43, 2),
  (44, 1),
  (45, 1),
  (46, 1),
  (47, 2),
  (48, 2),
  (49, 1),
  (50, 1),
  (51, 1),
  (52, 2)]]

In [49]:
import gensim
# number of topics
num_topics = 2
# Build LDA model
lda_model = gensim.models.LdaMulticore(corpus=corpus,
                                       id2word=id2word,
                                       num_topics=num_topics)


In [50]:
# Print the Keyword in the 10 topics
print(lda_model.print_topics())
doc_lda = lda_model[corpus]

[(0, '0.028*"n\'t" + 0.017*"gift" + 0.016*"day" + 0.014*"hour" + 0.014*"u" + 0.013*"100" + 0.012*"purchasing" + 0.012*"amazon.com" + 0.012*"problem" + 0.011*"credit"'), (1, '0.023*"n\'t" + 0.021*"gift" + 0.016*"day" + 0.015*"account" + 0.013*"problem" + 0.013*"amazon.com" + 0.013*"10" + 0.012*"free" + 0.011*"hour" + 0.011*"please"')]


In [55]:
lda_model.

<gensim.interfaces.TransformedCorpus at 0x7f10f21b06a0>

In [1]:
import app

lad_model_result = app.lda_model_results()

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [3]:
lad_model_result

[[(0, 0.6989875), (1, 0.3010125)],
 [(0, 0.87352437), (1, 0.12647563)],
 [(0, 0.9606775), (1, 0.03932252)],
 [(0, 0.9661594), (1, 0.03384058)],
 [(0, 0.10467775), (1, 0.89532226)],
 [(0, 0.32801884), (1, 0.67198116)],
 [(0, 0.96472716), (1, 0.035272796)],
 [(0, 0.80621207), (1, 0.19378799)],
 [(0, 0.87645024), (1, 0.1235498)],
 [(0, 0.32614994), (1, 0.67385006)],
 [(0, 0.24543743), (1, 0.75456256)],
 [(0, 0.7726747), (1, 0.22732532)],
 [(0, 0.6482957), (1, 0.3517043)],
 [(0, 0.7268328), (1, 0.27316716)],
 [(0, 0.5046476), (1, 0.49535236)],
 [(0, 0.10302207), (1, 0.89697796)],
 [(0, 0.28329447), (1, 0.7167055)],
 [(0, 0.19362716), (1, 0.8063729)],
 [(0, 0.42064485), (1, 0.5793551)],
 [(0, 0.69838434), (1, 0.30161566)],
 [(0, 0.059221845), (1, 0.94077814)],
 [(0, 0.2867007), (1, 0.71329933)],
 [(0, 0.21949463), (1, 0.78050536)],
 [(0, 0.06126128), (1, 0.93873876)],
 [(0, 0.8692249), (1, 0.13077506)],
 [(0, 0.8597159), (1, 0.14028409)],
 [(0, 0.059304778), (1, 0.9406952)],
 [(0, 0.6462086

In [4]:
from utils import extractor
e = extractor.DataExtractor()
train_reviews, train_labels, test_reviews, test_labels = e.process_freq_text()

In [5]:
test_labels

[1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0

In [13]:
final_prediction = []
for item in lad_model_result:
    if item[0][1] > item[1][1]:
        i = 1
    else:
        i = 0
    final_prediction.append(i)
final_prediction

IndexError: list index out of range

In [35]:
len(lad_model_result[0])

2

In [50]:
pred=[]
for item in lad_model_result:
    if isinstance(item, int):
        i = -1
    if not isinstance(item, int) and len(item)<2:
        v = item[0]
        if v ==0:
            i = 1
        else: 
            i = 0
    if not isinstance(item, int) and len(item)==2:
        if item[0][1] > item[1][1]:
            i = 1
        else:
            i = 0
    pred.append(i)
print(pred)

[1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, -1, 1, 1, 0, 1, 0, 0, 1,

In [51]:
len(pred)

43665

In [52]:
len(test_reviews)

43665

In [33]:
lad_model_result[222]

[(0, 0.9926118)]

In [34]:
test_reviews[222]

"I have a horrible experience with this offer. After receiving the promotion code $20 off to must be used by June 28, 2015 I went to use it per instructions. The code did not work. I tried taking the dash off the code still did not work.<br /><br />I called and spoke with a lady who treated me like I am nobody. First she did not know what being asked. I could hear from the background how someone was telling what to say which was all wrong.<br /><br />I told her the email I received simply states must be used by June 28, 2015. This simply means I have from now until June 28, 2015 to use it.<br /><br />She kept on fighting with me verbally, kept asking have I read the promotional offer, did I read the words must be used.<br /><br />I really am shocked why did she define her own promotional offer.<br /><br />This is what the email from Amazon stated,<br />Additional information on this offer can be found here.<br /><br />Thank you for treating someone to an Amazon.com Gift Card. Now we're