In [None]:
# This R environment comes with many helpful analytics packages installed
# It is defined by the kaggle/rstats Docker image: https://github.com/kaggle/docker-rstats
# For example, here's a helpful package to load

library(tidyverse) # metapackage of all tidyverse packages

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

list.files(path = "../input")

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [3]:
# import the necessary packages
library(rsparse)
library(Rcpp)
library(text2vec)
library(glmnet)
library(pROC)

## Explaination: Vocabulary list generatation

* Step 1:
    > Since professor has allowed us to use alldata.csv file to generate words, the file alldata.csv is first read as a table.
    
    > As with any NLP task, first we remove punctuations and stopwords. Custom stopwords are added.
   
* Step 2:
    > We use itoken from text2vec package to generate vector representation of text data from review column of alldata.csv.
    
    > We make sure that each term we generated after vectorization is repeated at least 10 times. Finally we will create a document term matrix representation of reveiws.
    
* Step 3:
    > After document term matrix is created, we fit a lasso regression using the dtm_train as an input and binary sentiment value as our target variable.
    
    > Since we want our myvocab size to be under 1000, we then choose the value of number of non-zero features. Here, in this case, these are the words or n-grams.
    
    > We select the df as 36 which has 976 non-zero features. We then save these term titles into myvocab.txt file.

In [4]:
# use all words
train = read.table("../input/imdbmovies/alldata.tsv", stringsAsFactors = FALSE,
                  header = TRUE)
train$review = gsub('<.*?>', ' ', train$review)

In [5]:
stop_words = c("i", "me", "my", "myself", 
               "we", "our", "ours", "ourselves", 
               "you", "your", "yours", 
               "their", "they", "his", "her", 
               "she", "he", "a", "an", "and",
               "is", "was", "are", "were", 
               "him", "himself", "has", "have", 
               "it", "its", "the", "us")
it_train = itoken(train$review,
                  preprocessor = tolower, 
                  tokenizer = word_tokenizer)
tmp.vocab = create_vocabulary(it_train, 
                              stopwords = stop_words, 
                              ngram = c(1L,4L))
tmp.vocab = prune_vocabulary(tmp.vocab, term_count_min = 10,
                             doc_proportion_max = 0.5,
                             doc_proportion_min = 0.001)
dtm_train  = create_dtm(it_train, vocab_vectorizer(tmp.vocab))  

In [6]:
set.seed(9021)
tmpfit = glmnet(x = dtm_train, 
                y = train$sentiment, 
                alpha = 1,
                family='binomial')
tmpfit$df

In [7]:
which(tmpfit$df==976)

36

In [8]:
myvocab = colnames(dtm_train)[which(tmpfit$beta[, 36] != 0)]

In [9]:
myvocab[1:10]

'half_hearted''give_4''scriptwriters''this_dull''bad_at_all''dorm''liked_movie''blew_away''isn\'t_for''stopped_watching'

In [10]:
write.table(myvocab, file = 'myvocab.txt', row.names = FALSE)

## Interpretability of our algorithm

* We have used ridge regression penalty with logistic regression. Since there are only 976 words used, we need to check the top words belonging to each sentiment (positive and negative)

* Below we will run the algorithm again and get the coefficient values.

* The lower the coefficient value, the stronger the weight it has for a negative sentiment and vice-versa for the positive if the coefficient value is high (above zero)

* Top five words for positive sentiment are: 7_10, should_been, marvellous, only_compliant, 7_out
* Top five words for negative sentiment are: scriptwriters, 4_10, had_high, if_director, half_hearted.

* These can be verified from the coefficient value (higher for positive sentiment, lower(negative) for negative sentiment)

In [16]:
# create train-test set for first split

data <- read.table("../input/imdbmovies/alldata.tsv", stringsAsFactors = FALSE,
                  header = TRUE)
testIDs <- read.csv("../input/rrrrrr/project3_splits1.csv", header = TRUE)
for(j in 1){
#   dir.create(paste("split_", j, sep=""))
  train <- data[-testIDs[,j], c("id", "sentiment", "review") ]
  test <- data[testIDs[,j], c("id", "review")]
  test.y <- data[testIDs[,j], c("id", "sentiment", "score")]
  
  tmp_file_name <- paste("train.tsv", sep="")
  write.table(train, file=tmp_file_name, 
              quote=TRUE, 
              row.names = FALSE,
              sep='\t')
  tmp_file_name <- paste("test.tsv", sep="")
  write.table(test, file=tmp_file_name, 
              quote=TRUE, 
              row.names = FALSE,
              sep='\t')
  tmp_file_name <- paste("test_y.tsv", sep="")
  write.table(test.y, file=tmp_file_name, 
            quote=TRUE, 
            row.names = FALSE,
            sep='\t')
}

In [17]:
#####################################
# Load your vocabulary and training data
#####################################
myvocab <- scan(file = "myvocab.txt", what = character())
train <- read.table("train.tsv", stringsAsFactors = FALSE,
                   header = TRUE)

#####################################
#
# Train a binary classification model
#

train = read.table("train.tsv",
                   stringsAsFactors = FALSE,
                   header = TRUE)
 train$review <- gsub('<.*?>', ' ', train$review)
 it_train = itoken(train$review,
                    preprocessor = tolower, 
                    tokenizer = word_tokenizer)
 vectorizer = vocab_vectorizer(create_vocabulary(myvocab, 
                                                  ngram = c(1L, 2L)))
 dtm_train = create_dtm(it_train, vectorizer)


fit1 = cv.glmnet(x = dtm_train, 
                y = train$sentiment, 
                alpha = 0,
                family='binomial')

#####################################


test <- read.table("test.tsv", stringsAsFactors = FALSE,
                    header = TRUE)

#####################################
# Compute prediction 
# Store your prediction for test data in a data frame
# "output": col 1 is test$id
#           col 2 is the predited probabilities


 test$review <- gsub('<.*?>', ' ', test$review)
 it_test = itoken(test$review,
                    preprocessor = tolower, 
                    tokenizer = word_tokenizer)
 vectorizer = vocab_vectorizer(create_vocabulary(myvocab, 
                                                  ngram = c(1L, 2L)))
 dtm_test = create_dtm(it_test, vectorizer)


output = predict(fit1, dtm_test, s=fit1$lambda.min, type = 'response')
test$prediction = output
output = data.frame(test$id, test$prediction)
names(output) = c('id', 'prob')

#####################################

write.table(output, file = "mysubmission.txt", 
            row.names = FALSE, sep='\t')

In [18]:
# evaluation
# move test_y.tsv to this directory
test.y <- read.table("test_y.tsv", header = TRUE)
pred <- read.table("mysubmission.txt", header = TRUE)
pred <- merge(pred, test.y, by="id")
roc_obj <- roc(pred$sentiment, pred$prob)
pROC::auc(roc_obj)

In [19]:
coef(fit1, s = fit1$lambda.min)

In [None]:
978 x 1 sparse Matrix of class "dgCMatrix"
                                 1
(Intercept)           -0.002452839
1                     -0.115234979
10_10                  1.151319123
10_out                 1.236103672
10_out_of_10           .          
1_10                  -1.305352251
1_2_from               .          
1_out                 -1.792226977
2                     -0.086142000
2_10                  -1.839029803
2_out                 -1.199033035
2_out_of_10            .          
3                     -0.269978203
3_10                  -1.882366432
3_out_of_10            .          
4                     -0.451505424
4_10                  -2.202799304
4_out_of_10            .          
7                      0.771237800
7.5                    2.041016232
7_10                   2.564275569
7_out                  2.254550555
7_out_of               .          
8                      0.484595353
80_s                   0.356391098
8_10                   1.780021997
8_out                  1.560462411
8_out_of               .          
9                      0.300570601
9_10                   0.849951746
9_out_of_10            .          
able_to                0.200532704
able_to_see            .          
about_as              -0.459270572
about_time             0.447304242
absolutely_no         -0.356910618
absolutely_nothing    -0.645991718
absorbing              1.028969184
absurd                -0.432707708
accent                -0.265093846
acting                -0.134583687
actors                -0.095315472
adds                   0.490156601
again                  0.063075637
again_again            0.405616702
ages                   0.147209799
all_in                 0.251273442
alright               -0.554948934
also                   0.077163580
although               0.091311277
always                 0.138926457
amateurish            -0.979244645
amazing                0.555314561
animal                -0.463423881
annoyed               -0.285720964
annoying              -0.598524026
any                   -0.098705459
any_of                -0.171175445
any_of_characters      .          
anyone_interested      1.226456806
anything              -0.149804347
apartment              0.203317362
appalling             -0.682141482
apparently            -0.197604625
appears               -0.232784147
appreciate             0.426752092
appreciated            0.604261362
as_good_as             .          
as_if                 -0.207763312
as_much_as             .          
ashamed               -0.708186996
asleep                -0.442252081
at_all                -0.404337478
at_best               -0.839577823
at_end                -0.490610285
at_first               0.514500578
at_same                .          
at_time                0.535208846
at_times               0.229319573
atmosphere             0.374551938
atrocious             -0.857692019
attempt               -0.396868488
attempt_at            -0.341619986
attention              0.194510043
available              0.438873416
avoid                 -0.668086715
avoid_this            -1.173971042
awesome                0.573889399
awful                 -0.877780095
b_movie               -0.334358685
back_enjoy             1.676529463
back_in                0.298693495
bad                   -0.315789928
bad_at_all             .          
bad_movie             -0.462630835
bad_reviews            0.573057280
bad_thing              1.156163806
badly                 -0.543683788
baldwin               -0.684990357
barely                -0.470551776
based_on_true_story    .          
basically             -0.326041193
be_comedy             -1.139064016
be_disappointed        0.621371316
be_funny              -0.393109104
be_good               -0.585453228
be_missed              1.555693346
beautiful              0.247572513
beautifully            0.587019898
beauty                 0.278008103
been_great            -0.693212907
beginning_to           0.369197102
bela_lugosi           -1.380695809
believable             0.472386304
believe_that          -0.184031429
below                 -0.437726627
below_average         -1.584774171
best                   0.361447657
best_part             -0.691057449
better                -0.136632921
better_than            0.433921521
bible                 -0.442029045
bit                    0.147298501
bland                 -0.490211308
blew_away              2.401444521
blown_up              -1.788257379
bond                   0.187921008
book                  -0.138394172
bore                  -0.963736975
bored                 -0.356517258
boredom               -0.766586841
boring                -0.555458439
both                   0.160542913
bother                -0.331998644
bothered_to           -0.872452495
bravo                  1.399750043
brazil                 0.321806908
brilliant              0.571509669
brilliantly            1.136071020
brings                 0.226663566
brutal                 0.386422702
bugs                   0.308173335
bunch_of              -0.183996974
but_not_much           .          
but_sadly             -0.692107685
but_still              0.261867691
but_this              -0.191345666
camera                -0.260419062
can                    0.069305634
can't_wait             0.754573365
can_relate             1.233381338
can_watch              0.465335337
captures               0.393440165
cardboard             -0.656767351
careful                0.233175229
carries                0.641215418
cash_in               -0.844397647
catch                  0.169077406
caught                 0.415020629
cerebral               0.884690487
chance                 0.395591482
charming               0.336914588
cheap                 -0.613158690
check_this             0.768556196
chilling               0.769263616
christian             -0.387938050
christians            -0.492384899
city                   0.150611043
classic                0.194265850
cleverly               0.287527919
cliché                -0.500251191
come_on               -0.308972526
comedies               0.395226410
comic                  0.167716802
compelling             0.407339879
complaints             0.470180445
completely            -0.152346238
complex                0.387596602
confusing             -0.419909161
consequences           0.221049337
could                 -0.089995460
could_been            -1.347340107
couldn't              -0.191174309
country's              1.538108334
courage                0.445335973
crap                  -0.353888818
crappy                -0.582936016
credibility           -0.836315404
cried                  0.430660708
day                   -0.023363293
deals_with             0.162769021
death                  0.040033536
decent                -0.184704729
definite               0.180783271
definitely             0.296497899
definitely_recommend   1.398679872
definitely_worth       1.615305472
deliciously            1.663394366
delightful             0.727933547
delivers               0.410844961
deserves               0.519703315
detract                0.791198146
did_not               -0.188677306
didn't                -0.092695532
different              0.226363038
dimensional           -0.586353564
director              -0.195596269
dirty_harry            0.586175585
disappoint             0.713642048
disappointed          -0.302025017
disappointing         -0.683776094
disappointment        -1.089633228
disaster              -0.218168809
disjointed            -1.170242895
distasteful           -1.239773262
do_not_watch_this      .          
doesn't_even          -0.021121718
doesn't_help          -1.263292670
doesn't_make          -0.733336365
doesn't_work          -0.684067293
don't                 -0.058838251
don't_even            -0.242044175
don't_miss             1.533498169
don't_want             0.489330435
dorm                  -1.468820683
downhill              -0.766713779
draws                  0.362889535
dreadful              -0.919289261
dreams                 0.163292454
drivel                -0.710418144
dull                  -0.803880001
dumb                  -0.258119689
dvd                    0.107200370
dvd_cover             -1.233915924
each                   0.134912138
easy                   0.276917425
edge                   0.413117729
eerie                  0.926616411
effective              0.332608797
effort                -0.219724340
either                -0.218194692
elsewhere             -0.711552783
email                  0.906792888
embarrassed           -0.809315192
embarrassing          -0.700067583
embarrassingly        -1.571930969
embarrassment         -0.952012497
emotions               0.420186640
endless               -0.981826983
energetic              0.903442794
enjoy                  0.386612537
enjoyable              0.538354013
enjoyed                0.227060778
enjoyed_this           1.186167093
entertaining           0.360851063
entertains             0.983158651
episode                0.090320445
episodes               0.283556065
especially             0.296760515
even                  -0.051246391
even_better            1.008002315
even_close_to          .          
even_that             -1.262551599
even_though            0.110229076
every_day              0.908070787
every_time_watch       .          
excellent              0.717911133
excellently            1.230197622
except                -0.237918353
exceptional            0.893741533
excuse                -0.427815251
existent              -0.856914227
experience             0.250150466
extraordinary          0.745467563
fabulous               0.184303520
failed                -0.443580372
fails                 -0.731476675
family                 0.041481258
fantastic              0.619758726
far_too               -0.373470805
fascinating            0.578563251
fast_forward          -1.091813156
fast_paced             0.438377257
favorite               0.626926193
favorite_movies        1.446849825
favorites              0.512258843
favourite              0.588373943
fears                  0.749295555
feel                   0.086683107
fell_asleep           -0.866098136
felt_like             -0.207190342
few_good              -0.979737949
few_laughs            -1.216932150
film_in                0.260166816
film_school           -0.965805035
filmmakers            -0.180961668
find                   0.033475258
fine                   0.069315437
fine_film              0.778582222
finest                 0.586119955
first_rate             0.796514455
first_saw              0.618716958
first_saw_this         .          
first_time             0.545086513
flat                  -0.201583791
flawless               0.263656028
flimsy                -0.886202505
follows                0.160944622
for_actors            -0.675075415
for_everyone           1.049820459
for_first_time         .          
for_free              -0.817253238
for_once               1.125645723
for_this              -0.240997867
for_those              0.225993742
forgettable           -1.344564620
fortunate              0.400820623
forward_to_this        .          
frank                  0.193148715
frankly               -0.159062860
freedom                0.535021513
fresh                  0.275979916
friendship             0.336330864
from_original         -1.475881881
from_outer_space       .          
fun                    0.271515196
funniest               0.671094385
future                 0.250003368
garbage               -0.492870183
gem                    0.793109927
genius                 0.320572680
get_copy               .          
give_4                -0.632117763
gives                  0.240250628
glad                   0.375560372
glad_did               .          
glued                  1.973848961
good                   0.060854682
good_as                0.270109389
good_film              0.165759728
gorgeous               0.384321407
grade                 -0.409406506
grade_b                1.760580064
grade_d               -1.806040799
great                  0.396908433
great_but             -0.777133169
great_job              0.448880488
great_movie            0.376100825
greatest               0.275427262
gripping               0.521254913
gritty                 0.717976276
guess                 -0.154424994
guilt                  0.211316155
hackneyed             -0.623187949
had_high              -1.808430777
hairy                 -1.466515310
half                  -0.141737345
half_hearted          -1.100808084
haunting               0.722249765
heart                  0.140223608
helps                  0.077420071
highly                 0.452618865
highly_recommend       0.711311605
highly_recommend_this  .          
highly_recommended     1.256693853
hilarious              0.807444864
holly                  0.554439988
honestly              -0.308981969
hooked                 1.097818858
hoot                   0.646706925
hoping                -0.713294016
horrible              -0.480696635
horribly              -0.495582299
hour                  -0.192908408
how_good               0.785998733
how_not               -1.036450187
how_not_to             .          
human                  0.287284881
hype                  -0.698033831
i'm_afraid            -1.038313105
idea                  -0.256519469
ideal                  0.422886311
identify               0.307130557
identity               0.120826109
idiotic               -0.631607696
if_director           -2.586484825
if_don't_like          .          
if_had                -1.062636091
if_must               -1.956372654
if_this               -0.487111885
if_want                0.234578014
if_want_to             .          
if_want_to_see         .          
impact                 0.443044016
impressed              0.786035872
in_black               0.535242353
in_grave              -0.183702799
in_lives              -0.120038075
in_way                -0.587039800
inaccurate            -0.678209189
inappropriate         -0.761743629
incoherent            -0.925897259
incredible             0.722593702
inept                 -0.636797878
information            0.030927978
innocent               0.422995905
instead               -0.354971615
insult                -0.520898003
insult_to             -0.656104430
insulting             -0.595520458
intense                0.404050623
interesting           -0.175580456
interesting_but       -0.638554855
irritating            -0.637370361
isn't_even            -0.768578343
isn't_for              1.259240228
isn't_very            -0.504546786
it's                   0.034642852
it's_just             -0.321766881
jack                   0.163325261
jackass                0.612864916
jean                   0.269659353
job                    0.151262552
joke                  -0.259074200
journey                0.263808265
joy_to                 0.524413008
junk                  -0.508952215
just                  -0.036978187
just_as                0.218494824
just_didn't           -0.522710222
just_doesn't          -0.123222826
just_enjoy             1.925785904
just_great             1.307805074
just_isn't            -1.538163819
just_not              -1.070417038
just_plain            -0.494633053
just_right             0.904886816
just_seems            -0.072577712
just_too              -0.325140621
just_wasn't           -0.665221261
keep                   0.195641404
keeps                  0.062100578
knowing                0.235667339
lack                  -0.074546993
lack_of               -0.246133997
lacked                -0.848025774
lacking               -0.457268611
lacks                 -0.763483454
ladder                 1.021131925
lame                  -0.478432457
last                   0.175153098
later                  0.064825981
laughable             -0.923112628
least                 -0.165227020
left                  -0.137973704
let_down              -0.410499739
letdown               -1.301552955
life                   0.076762479
lifeless              -1.343413024
light                  0.233949519
like_watching         -0.294467784
liked                  0.282403965
liked_movie            0.352802484
liked_this             0.649157747
little_else           -1.690918357
little_film            0.891902216
little_movie           0.261157296
little_slow            2.143103638
little_to             -0.420433248
lives                  0.139129138
look_forward           0.667005476
look_like             -0.364128516
looked                -0.278822858
looking               -0.143441175
looking_forward       -0.863633952
looks                 -0.205902574
looks_as_if            .          
looks_at               1.255117546
looks_like            -0.114888445
lost_interest         -1.753954498
lot                    0.135280896
lot_about              1.108359535
lot_of_fun             .          
lot_of_things          .          
lousy                 -1.247468678
love                   0.102725232
love_this              0.747724891
loved                  0.445228614
loved_this             1.004192125
low                   -0.221693518
ludicrous             -0.459072471
made_this             -0.246756832
magnificent            0.130423070
main_problem          -0.616689629
mainstream             0.575851031
make_movie             0.438998841
makes                  0.252439481
many                   0.042652502
marvellous             2.479886521
marvelous              0.605644912
masterpiece            0.535590913
match                  0.018261486
material              -0.448360170
may                    0.047080086
may_not                0.583394873
may_not_be             .          
maybe                 -0.061107944
maybe_if              -0.079782170
mcdowell              -1.151550738
mean                  -0.084506491
mediocre              -0.766594305
meets                  0.157009014
memorable              0.121414938
mess                  -0.550806496
might_been             .          
might_enjoy           -1.521228867
mildly                -0.894325592
minutes               -0.238643215
minutes_into          -0.961985477
minutes_of            -0.450873882
miscast               -0.672911462
money                 -0.216880565
more                   0.035013705
more_movies            0.947159064
most                   0.121968847
most_people            0.301577158
motions               -1.590213935
moved                  0.223154401
moves_along            0.908548000
movie_just            -0.472700853
moving                 0.252335633
mst3k                 -1.209811720
much_as                0.545372526
much_better           -0.334743567
much_better_than       .          
muddled               -0.827432348
music                  0.078147832
must_for               1.424684471
must_see               1.358056658
mystery_science       -1.629454452
neatly                 1.490010407
negative_comments      2.105662942
neither               -0.433806501
nevertheless           0.794907639
new                    0.098699653
nice                   0.122977133
nicely                 0.486468597
no                    -0.108463676
no_other               0.803128018
no_sense              -0.766722061
noir                   0.625087957
none_of               -0.314121792
nonetheless            0.458285802
nonsense              -0.585829833
not_disappointed       1.360895905
not_enough            -0.352357898
not_enough_to          .          
not_even              -0.512265320
not_funny             -1.037483633
not_good              -0.928659024
not_in_good            .          
not_make              -0.314114849
not_much              -0.369425822
not_one               -0.502457879
not_one_of             .          
not_recommend         -2.157802498
not_recommended       -1.561126539
not_that              -0.351761258
not_typical            0.226470213
not_very              -0.933154128
not_very_good          .          
not_worth             -1.169843704
nothing               -0.353153213
nowhere               -0.120562258
obnoxious             -0.568585693
obvious               -0.222667905
of_best               -0.395317265
of_course              0.082043164
of_fun                 0.252760326
of_funniest            1.290100542
of_most               -0.077028912
of_movie               0.198336486
of_seat                1.043721658
of_war                 0.670275866
of_what                0.382457475
of_worst              -1.285033171
off                   -0.111771797
offensive             -0.367367909
often                  0.106367526
oh                    -0.360647176
ok                    -0.242870223
okay                  -0.409863748
olds                  -1.595702806
on_dvd                 0.670757118
on_edge_of             .          
on_plus_side           .          
one_of                 0.128949745
one_of_best            .          
one_of_better          .          
one_of_worst           .          
one_star              -0.744717672
only                  -0.115217101
only_complaint         1.441791741
only_good             -0.836074453
only_problem           1.629192826
only_reason            0.055845945
only_reason_to         .          
only_saving           -1.848616325
only_thing            -0.219504159
or_dvd                 1.227949523
or_hate                2.730211289
or_something          -0.334971177
original              -0.211156299
oscar                  0.110401797
others                 0.072547371
otherwise             -0.388623450
outstanding            0.572702885
own                    0.163898166
packed                 0.328465943
paid                  -0.479542264
painful               -0.626577386
paint                 -0.643277106
paramount              0.456790854
pass                  -0.242913974
pathetic              -0.692344146
penalty                0.772256582
perfect                0.629606640
perfection             0.378636633
perfectly              0.401571756
performance            0.065607114
performance_as         0.336336249
performances           0.196780119
performances_by        0.585480359
played                 0.059427736
pleasant               0.403980773
pleasantly_surprised   0.984375135
pleased                0.904655372
pleasure               0.396985983
plenty                 0.130399263
plenty_of              0.213335513
plodding              -1.338620978
plot                  -0.124843815
pointless             -0.804575664
polished               0.965228578
poor                  -0.564684372
poorly                -0.952138394
portrayal              0.252821829
potential             -0.381616016
potential_but         -0.821435246
power_of               0.425320896
powerful               0.524699785
predictable           -0.485924007
predictable_but        0.843144672
premise               -0.123346233
present                0.236995818
pretentious           -0.895182240
pretty_good            0.327190567
problem               -0.265304060
producers             -0.285114059
promising             -0.423383900
propaganda            -0.554424420
provides               0.281812052
put_off                1.086304480
quiet                  0.411620582
quite                  0.121009084
rare                   0.491897051
raw                    0.252567988
read_book             -1.913269494
realistic              0.366321668
reality                0.171405028
really_good            0.430639558
reason                -0.254398069
reason_for            -0.500591833
reason_to_watch_this   .          
recommend              0.210507459
recommend_this         0.353378320
recommend_to          -0.002345549
recommended            0.412243727
redeeming             -0.933862274
refreshing             1.262704750
relationship           0.272467294
release                0.329555595
remarkable             0.247259649
remotely              -0.756394912
rent                  -0.198507897
rented                -0.479661950
research              -0.634588691
rest_of_movie          .          
restored               0.699112435
revolting             -1.241270343
ride                   0.325106191
ridiculous            -0.606636415
rip_off               -0.329690773
rip_off_of             .          
riveting               0.502259937
rubbish               -0.567103665
ruined                -0.567904992
sadistic              -0.864562817
sadly                 -0.462741642
satisfying             1.005627099
save                  -0.519753994
save_money            -1.443078259
save_this             -0.878947193
saving                -0.434776592
saw                    0.035601408
saw_this               0.208508024
scientist             -0.441827773
screaming             -0.362796400
screenwriter          -0.667728139
script                -0.269272591
scriptwriters         -0.162596125
seagal                -0.281715586
season                 0.288181736
see                    0.098522988
see_again              0.367815091
see_any               -0.817536518
seemed                -0.235111396
seems                 -0.127841047
seen                   0.164809648
sensitive              0.665296635
series                 0.120305170
shallow               -0.542453132
shame                 -0.213431458
shelf                 -0.390896100
shots_of              -0.348406374
should                -0.115758109
should_been            1.958541811
should_never          -0.787081601
should_see             0.337107863
shows                  0.184990167
silly                 -0.321359750
simple                 0.308566760
simplicity             0.414869926
simply_not            -0.527838205
sinatra                0.156475316
sit_back               0.916362273
sit_through           -0.550902403
skip_this             -1.333346364
small                  0.142557616
so_bad                -0.469984610
so_disappointed       -1.359466567
so_well                0.056530530
society                0.132011283
solid                  0.636582313
some_kind_of           .          
someone               -0.181576677
something_better      -1.447662702
sometimes              0.115564098
sorry                 -0.563532748
spectacular            0.475738774
spinal_tap             0.865266809
splendid               1.022406770
spot_on                1.421388116
start_to               0.561259265
started_out           -0.709056597
stay_away             -0.526333502
steals                 0.839009191
steals_show            .          
stereotyped           -0.855076304
stick_to              -0.657492344
still                  0.200084470
still_this             0.966288738
stilted               -0.981260768
stinker               -1.293789024
stock_footage         -1.130952649
stopped_watching      -0.690170438
story_of               0.123866784
story_told             1.072292926
strong                 0.398249689
stunning               0.444376443
stupid                -0.301702854
stupidity             -0.436441386
sub_par               -0.753435572
subtitles              0.541071586
subtle                 0.741788758
succeeds               0.585432598
sucked                -0.394588511
sucks                 -0.581073552
superb                 0.965210584
superbly               0.938144710
suppose               -0.297019142
supposed              -0.406590330
supposedly            -0.635144675
surprised              0.370701435
surprising             0.562599097
surprisingly           0.585633368
surprisingly_good      0.825116959
surreal                0.763193478
sweet                  0.486198792
takes                  0.023826631
tale                   0.095497064
tears                  0.220731785
tedious               -1.073589817
tells                  0.332408198
terrible              -0.665479837
terrific               0.665389057
than_any               0.300533483
than_expected          0.466219361
than_many              0.754092624
than_this             -0.529148798
thank                  0.383948488
thanks                 0.451104993
that's_about          -0.900651659
that_would            -0.558857074
there_some_good        .          
thin                  -0.469486974
thing                 -0.126037722
think                  0.121016485
this_best              .          
this_crap             -1.071137332
this_dull             -2.420687666
this_excellent         1.197930605
this_fine              0.631659336
this_fun               3.220766346
this_great             1.152727266
this_just             -0.601291654
this_mess             -0.921252516
this_piece            -0.881631727
this_piece_of          .          
this_short             0.541828824
this_thing            -0.859162511
this_travesty         -1.845355446
this_turkey           -1.325352433
this_very_good         .          
those_of               0.384237712
thought_provoking      1.068430364
through_this          -0.399209258
thumbs_up              0.995340715
time                   0.050852964
times                  0.088564527
tired                 -0.417351098
tiresome              -1.067552228
to_all                 0.261597791
to_be                 -0.063353118
to_believe            -0.274196083
to_believe_that        .          
to_help                0.005205766
to_make               -0.107270841
to_recommend          -0.809015140
to_see_again           .          
to_sit_through         .          
to_work_with           .          
today                  0.488700374
together               0.155014419
tolerable             -1.318053589
too_hard              -0.307931663
too_many              -0.381745253
too_seriously          0.785321372
top_notch              0.682182862
total_lack            -1.601939412
touches                0.456223928
touching               0.594088445
traditional            0.378868051
tragic                 0.069164442
trash                 -0.373140351
treat                  0.416225413
tried                 -0.064010249
tries_to_be            .          
tripe                 -1.186017544
trite                 -1.072290215
troubled               0.750941812
true                   0.258630683
truly                  0.140124422
trying                -0.240229981
turd                  -0.718802364
turkey                -0.514870061
twists                 0.264482731
two_hours             -0.617372400
ugly                  -0.313533544
ultimate               0.594100602
unappealing           -1.465652410
unbelievable          -0.495009039
unconvincing          -0.989685686
underrated             0.798381598
understated            0.083760802
unexpected             0.535758993
unforgettable          0.927002225
unfortunately         -0.484904396
unfunny               -1.078956348
uninspired            -0.589034082
unintentional         -0.766335761
unintentionally       -0.481923877
uninteresting         -0.902104853
unique                 0.333553253
unless                -0.491938361
unlike                 0.315504702
unlikeable            -1.097061306
unrealistic           -0.277851970
unusual               -0.032095103
unwatchable           -1.165354553
urban                  0.128602074
value                 -0.226740589
van                   -0.298350137
vapid                 -0.637966148
very                   0.079882644
very_bad              -0.405209770
very_disappointed     -1.401106956
very_disappointing    -0.634548259
very_enjoyable         0.401352092
very_entertaining      0.827099973
very_funny             1.020814581
very_good              0.261434427
very_little           -0.251135611
very_moving            1.206309702
very_slow             -1.508620904
very_well              0.425710801
vhs                    0.239448633
victims               -0.407927024
voodoo                -0.805039874
walked                -0.348735345
walked_out            -0.385806569
walking_around        -1.744896758
wanted_to_like         .          
wanting_more           1.856629971
war                    0.093000751
warm                   0.402846719
wasn't                 0.028154394
waste                 -1.209511789
wasted                -0.568268289
wasting               -1.458508645
watch_over             1.803072355
watchable             -0.430439329
watching              -0.068908158
watching_this         -0.213926571
way_too               -0.594504847
ways                   0.317156298
weak                  -0.600045203
well                   0.086852875
well_done              0.227795605
well_worth             1.391171083
well_written           0.412813699
what_exactly          -0.560376772
what_makes             0.689818867
what_point            -0.550515055
what_point_of          .          
what_thinking         -0.031847343
whatsoever            -0.459271085
who                    0.015234720
who_made_this          .          
whole_film            -0.576235844
why                   -0.096353451
why_did               -0.437350484
wider                  1.178397759
will                   0.094861781
with_this             -0.005680718
without_being          0.493573306
witty                  0.412948362
women                 -0.182162677
won't_be_disappointed  .          
wonder                -0.256311263
wonderful              0.470467570
wonderfully            1.183656932
wooden                -0.661721848
work_in                0.266236102
works                  0.179473696
world                  0.059978893
world_of               0.119378128
worse                 -0.660158584
worst                 -0.959447166
worst_movie           -0.679622328
worth_seeing           0.493790245
worthless             -1.278071238
would                 -0.093988271
would've              -0.535459684
would_be_good          .          
would_been             0.810781638
would_love_to          .          
would_recommend        0.319158158
wouldn't              -0.094571680
wouldn't_recommend    -0.509168019
wrenching              0.540043564
write                 -0.394768810
writers               -0.549363095
writing               -0.107948883
wrong                 -0.232389541
x                      0.207281293
yawn                  -1.743585315
years                  0.078503948