### Student Information
Name: Muzi Phiwokuhle Gondwe

Student ID: 106065433

### Instructions

- Download the dataset provided in this [link](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#). The sentiment dataset contains a `sentence` and `score` label. Read what the dataset is about on the link provided before you start exploring it. 


- Then, you are asked to apply each of the data exploration and data operation techniques learned in the [first lab session](https://goo.gl/Sg4FS1) on the new dataset. You don't need to explain all the procedures as we did in the notebook, but you are expected to provide some **minimal comments** explaining your code. You are also expected to use the same libraries used in the first lab session. You are allowed to use and modify the `helper` functions we provided in the first lab session or create your own. Also, be aware that the helper functions may need modification as you are dealing with a completely different dataset. This part is worth 80% of your grade!


- After you have completed the operations, you should attempt the **bonus exercises** provided in the [notebook](https://goo.gl/Sg4FS1) we used for the first lab session. There are six (6) additional exercises; attempt them all, as it is part of your grade (10%). 


- You are also expected to tidy up your notebook and attempt new data operations that you have learned so far in the Data Mining course. Surprise us! This segment is worth 10% of your grade.


- After completing all the above tasks, you are free to remove this header block and submit your assignment following the guide provided in the `README.md` file of the assignment's [repository](https://github.com/omarsar/data_mining_hw_1). 

In [3]:
# necessary for when working with external scripts
%load_ext autoreload
%autoreload 2

In [4]:
import pandas as pd
import numpy as np
import nltk
#from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
import plotly.plotly as py
import plotly.graph_objs as go
import math
import plotly.tools as tls
import cufflinks as cf
%matplotlib inline

# my functions
import helpers.data_mining_helpers as dmh
import helpers.text_analysis as ta

In [9]:
# base directory used to locate the dataset files
from os import path
notebook_path = './'

In [10]:
#  reading the csv file into the data frame
df = pd.read_csv(path.join(notebook_path, 'sentiment labelled sentences', 
                             'dataset.csv'),dtype=object,header= 0, sep='|', engine='python')
df['Score'] = df['Score'].astype(str).astype(int)
df['Source'] = df['Source'].astype(str)




In [11]:
# checking the data types in the columns
df.dtypes

Source      object
Sentence    object
Score        int64
dtype: object

In [12]:
len(df)

3000

In [7]:
# checking column headers of my dataframe
list(df)

['Source', 'Sentence', 'Score']

In [8]:
# query for the first 10 rows 
df[0:10]

Unnamed: 0,Source,Sentence,Score
0,amazon_cells,So there is no way for me to plug it in here i...,0
1,amazon_cells,"Good case, Excellent value.",1
2,amazon_cells,Great for the jawbone.,1
3,amazon_cells,Tied to charger for conversations lasting more...,0
4,amazon_cells,The mic is great.,1
5,amazon_cells,I have to jiggle the plug to get it to line up...,0
6,amazon_cells,If you have several dozen or several hundred c...,0
7,amazon_cells,If you are Razr owner...you must have this!,1
8,amazon_cells,"Needless to say, I wasted my money.",0
9,amazon_cells,What a waste of money and time!.,0


In [9]:
# query for 10 rows at the end; note that [-11:-1] stops before the final row (df.tail(10) would include it)
df[-11:-1]

Unnamed: 0,Source,Sentence,Score
2989,yelp,I would avoid this place if you are staying in...,0
2990,yelp,The refried beans that came with my meal were ...,0
2991,yelp,Spend your money and time some place else.,0
2992,yelp,A lady at the table next to us found a live gr...,0
2993,yelp,the presentation of the food was awful.,0
2994,yelp,I can't tell you how disappointed I was.,0
2995,yelp,I think food should have flavor and texture an...,0
2996,yelp,Appetite instantly gone.,0
2997,yelp,Overall I was not impressed and would not go b...,0
2998,yelp,"The whole experience was underwhelming, and I ...",0


In [10]:
# sampling any 10 rows
df.sample(n=10, replace=False)

Unnamed: 0,Source,Sentence,Score
2453,yelp,I was so insulted.,0
36,amazon_cells,It has kept up very well.,1
50,amazon_cells,Not loud enough and doesn't turn on like it sh...,0
1016,imdb,"This review is long overdue, since I consider ...",1
174,amazon_cells,The file browser offers all the options that o...,1
669,amazon_cells,Setup went very smoothly.,1
535,amazon_cells,All it took was one drop from about 6 inches a...,0
261,amazon_cells,Only had this a month but it's worked flawless...,1
2988,yelp,It really is impressive that the place hasn't ...,0
2016,yelp,Highly recommended.,1


In [11]:
# checking if the data contains any missing values.
df.isnull()

Unnamed: 0,Source,Sentence,Score
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
5,False,False,False
6,False,False,False
7,False,False,False
8,False,False,False
9,False,False,False


In [12]:
# checking for missing values in every column
df.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 0)
Score       (The amoung of missing records is: , 0)
dtype: object
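
The helper `dmh.check_missing_values` is not shown in this notebook, so its exact behaviour is an assumption here. As a rough cross-check, pandas alone can report the number of missing values per column. The small frame below is made-up illustration data, not the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only)
toy = pd.DataFrame({
    "Source": ["imdb", "yelp", "amazon_cells"],
    "Sentence": ["Loved it.", np.nan, "Works great."],
    "Score": [1, 0, 1],
})

# isnull() yields booleans; sum() counts the True entries in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

Only the `Sentence` column of the toy frame contains a gap, so it is the only column with a non-zero count.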

To verify that our missing-value check works, we create a dummy entry with a missing field. If the function works, it should locate the missing value.

In [13]:
# defining the entry data with missing data. This data will not have a score
dummy_series = pd.Series(["imdb", "I hate these guys"], index=["Source", "Sentence"])

In [14]:
dummy_series

Source                   imdb
Sentence    I hate these guys
dtype: object

In [15]:
# insert the data to the data frame
result_with_series = df.append(dummy_series, ignore_index=True)

In [16]:
# checking the length of the data to see if the new entry has been added
len(result_with_series)

3001

In [17]:
result_with_series.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 0)
Score       (The amoung of missing records is: , 1)
dtype: object

In [18]:
# entry with missing data; this one has no sentence
dummy_dict = [{'Source': 'amazon',
               'Score': 1
              }]

In [19]:
df = df.append(dummy_dict,ignore_index=True)

In [20]:
len(df)

3001

In [21]:
df.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 1)
Score       (The amoung of missing records is: , 0)
dtype: object

In [22]:
pd.isnull(df)


Unnamed: 0,Source,Sentence,Score
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
5,False,False,False
6,False,False,False
7,False,False,False
8,False,False,False
9,False,False,False


In [23]:
# displaying the entry that has missing data
df[df.isnull().any(axis=1)]

Unnamed: 0,Source,Sentence,Score
3000,amazon,,1


In [24]:
df.dropna(inplace=True)

In [25]:
df.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 0)
Score       (The amoung of missing records is: , 0)
dtype: object

Now that we have dealt with missing data, we will check for duplicates. I believe duplicates can be kept in this dataset, because the frequency of each type of sentiment is significant. For example, many occurrences of 'Great phone!' say a lot about how well the phone was received.

In [26]:
sum(df.duplicated())

17

In [27]:
# displaying the duplicate data
df[df.duplicated(['Source', 'Sentence' , 'Score'])]

Unnamed: 0,Source,Sentence,Score
285,amazon_cells,Great phone!.,1
407,amazon_cells,Works great.,1
524,amazon_cells,Works great!.,1
543,amazon_cells,Don't buy this product.,0
744,amazon_cells,If you like a loud buzzing to override all you...,0
748,amazon_cells,Does not fit.,0
778,amazon_cells,This is a great deal.,1
792,amazon_cells,Great Phone.,1
892,amazon_cells,Excellent product for the price.,1
896,amazon_cells,Great phone.,1
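
To look at how often each repeated sentence occurs, one option is to mark every member of a duplicate group and then count by sentence. This is a minimal sketch on made-up rows; the column names follow the notebook, the data does not.

```python
import pandas as pd

# Toy rows standing in for the real data (illustrative only)
toy = pd.DataFrame({
    "Source": ["amazon_cells"] * 4,
    "Sentence": ["Great phone.", "Does not fit.", "Great phone.", "Works great."],
    "Score": [1, 0, 1, 1],
})

# keep=False marks every member of a duplicate group, not only the later repeats
all_copies = toy[toy.duplicated(keep=False)]

# How many times each repeated sentence appears
repeat_counts = all_copies["Sentence"].value_counts()
print(repeat_counts)
```

By default `duplicated()` flags only the second and later copies, which is why the notebook's count excludes the first occurrence of each repeated row.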


In [28]:
df_sample = df.sample(n=900)

In [29]:
len(df_sample)

900

In [30]:
df_sample[0:20]

Unnamed: 0,Source,Sentence,Score
1088,imdb,It is a true classic.,1
760,amazon_cells,Would not reccommend.,0
993,amazon_cells,disappointed.,0
286,amazon_cells,I wouldn't recommend buying this product.,0
2882,yelp,We definately enjoyed ourselves.,1
2090,yelp,"In summary, this was a largely disappointing d...",0
1618,imdb,"An instant classic, with a great soundtrack an...",1
2203,yelp,Great brunch spot.,1
899,amazon_cells,The pairing of the two devices was so easy it ...,1
2939,yelp,The building itself seems pretty neat; the bat...,0


In [31]:
df.columns[0]

'Source'

In [32]:
df_category_counts = ta.get_tokens_and_frequency(list(df.Source))
df_sample_category_counts = ta.get_tokens_and_frequency(list(df_sample.Source))

In [33]:
py.iplot(ta.plot_word_frequency(df_category_counts, "Category distribution"))




In [34]:
py.iplot(ta.plot_word_frequency(df_sample_category_counts, "Category distribution_Sample"))

In [35]:
series = df['Source'].value_counts()
series_sample = df_sample['Source'].value_counts()

# look counts up by label: value_counts() orders by frequency, so positional access can mislabel the sources
compare_data = {'Data': ['Original', 'Sample'], 
        'imdb': [series['imdb'], series_sample['imdb']], 
        'yelp': [series['yelp'], series_sample['yelp']],
        'amazon': [series['amazon_cells'], series_sample['amazon_cells']]}
df_sample_compare = pd.DataFrame(compare_data, columns = ['Data','imdb', 'yelp', 'amazon'])
df_sample_compare.set_index('Data', inplace=True)
df_sample_compare

Data,imdb,yelp,amazon
Original,1000,1000,1000
Sample,303,285,312
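
Raw counts from a 900-row sample and a 3000-row dataset are hard to compare directly. A possible refinement, sketched here on synthetic labels, is to compare shares instead. The `random_state` value is an arbitrary choice for reproducibility.

```python
import pandas as pd

# Synthetic source labels: 4 rows per source (illustrative only)
full = pd.Series(["imdb", "yelp", "amazon_cells"] * 4, name="Source")
sample = full.sample(n=6, random_state=0)

# value_counts(normalize=True) returns shares indexed by label,
# so the two columns are aligned by source name rather than by position
comparison = pd.DataFrame({
    "Original": full.value_counts(normalize=True),
    "Sample": sample.value_counts(normalize=True),
}).fillna(0)
print(comparison)
```

A sample that mirrors the original distribution should show similar shares in both columns.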


In [36]:
df_sample_compare_plot = pd.DataFrame(data = df_sample_compare, columns=['imdb', 'yelp', 'amazon'])
#row = df.ix[5]
#row.iplot(kind='bar')
df_sample_compare_plot.iplot(kind='bar', yTitle='Number of Sentiments', xTitle='Data', title = 'Sources Distribution Data vs sample')

In [37]:
cf.go_online()

In [38]:
series = df_sample['Source'].value_counts()
series.head(3)

amazon_cells    312
imdb            303
yelp            285
Name: Source, dtype: int64

In [39]:
df_amazon_only = df_sample['Source'] == "amazon_cells"
df_amazon_1score_only = df_sample['Score'] == 1
df_amazon_0score_only = df_sample['Score'] == 0

In [40]:
amazon_one = len(df_sample[df_amazon_only & df_amazon_1score_only ])
amazon_one

158

In [41]:
amazon_zero = len(df_sample[df_amazon_only & df_amazon_0score_only  ])
amazon_zero

154

In [42]:
df_imdb_only = df_sample['Source'] == "imdb"
df_imdb_1score_only = df_sample['Score'] == 1
df_imdb_0score_only = df_sample['Score'] == 0

In [43]:
imdb_one = len(df_sample[df_imdb_only & df_imdb_1score_only ])
imdb_one

148

In [44]:
imdb_zero = len(df_sample[df_imdb_only & df_imdb_0score_only ])
imdb_zero

155

In [45]:
df_yelp_only = df_sample['Source'] == "yelp"
df_yelp_1score_only = df_sample['Score'] == 1
df_yelp_0score_only = df_sample['Score'] == 0

In [46]:
yelp_one = len(df_sample[df_yelp_only & df_yelp_1score_only ])
yelp_one

141

In [47]:
yelp_zero = len(df_sample[df_yelp_only & df_yelp_0score_only ])
yelp_zero

144

In [48]:
raw_data = {'Source': ['Amazon', 'Imdb', 'Yelp'], 
        'Positive': [amazon_one, imdb_one, yelp_one], 
        'Negative': [amazon_zero, imdb_zero,yelp_zero]}

In [49]:
df_new = pd.DataFrame(raw_data, columns = ['Source', 'Positive', 'Negative'])

df_new

Unnamed: 0,Source,Positive,Negative
0,Amazon,158,154
1,Imdb,148,155
2,Yelp,141,144


In [50]:
df_new.set_index('Source', inplace=True)
df_new

Source,Positive,Negative
Amazon,158,154
Imdb,148,155
Yelp,141,144
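
The per-source positive and negative counts built above from separate boolean masks can also be obtained in one call. The following is a minimal sketch using `pd.crosstab` on made-up rows; only the column names are taken from the notebook.

```python
import pandas as pd

# Toy rows standing in for df_sample (illustrative only)
toy = pd.DataFrame({
    "Source": ["amazon_cells", "amazon_cells", "imdb", "imdb", "yelp", "yelp", "yelp"],
    "Score": [1, 0, 1, 1, 0, 0, 1],
})

# Rows are sources, columns are scores, cells are row counts
counts = pd.crosstab(toy["Source"], toy["Score"])
counts = counts.rename(columns={0: "Negative", 1: "Positive"})
print(counts)
```

The resulting frame is already indexed by source, so it could be passed to a bar plot without the extra `set_index` step.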


In [51]:
df_plot = pd.DataFrame(data = df_new, columns=['Positive', 'Negative'])
#row = df.ix[5]
#row.iplot(kind='bar')
df_plot.iplot(kind='bar', yTitle='Number of Sentiments', xTitle='Sources', title = 'Sources Distribution')

In [52]:
compare_data = {'Data' :['Original', 'Original', 'Original','Sample', 'Sample', 'Sample'],
        'Source': ['Amazon', 'Imdb', 'Yelp','Amazon', 'Imdb', 'Yelp'], 
        'Positive': [500,500,500,amazon_one, imdb_one, yelp_one], 
        'Negative': [500,500,500,amazon_zero, imdb_zero,yelp_zero]}

df_compare = pd.DataFrame(compare_data, columns = ['Data','Source', 'Positive', 'Negative'])
# index on 'Data' only ('Source' stays a regular column); set_index takes a list of keys for a multi-level index
df_compare.set_index('Data', inplace=True)
df_compare


Data,Source,Positive,Negative
Original,Amazon,500,500
Original,Imdb,500,500
Original,Yelp,500,500
Sample,Amazon,158,154
Sample,Imdb,148,155
Sample,Yelp,141,144


In [53]:
#df_compare_plot = pd.DataFrame(data = df_compare, columns=['Sources','Positive', 'Negative'])
#row = df.ix[5]
#row.iplot(kind='bar')
df_compare.iplot(kind='bar', yTitle='Number of Sentiments', xTitle='Sources', title = 'Sources Distribution')

In [54]:
import nltk
df['unigrams'] = df['Sentence'].apply(lambda x: dmh.tokenize_text(x))

In [55]:
df[0:5]["unigrams"]

0    [So, there, is, no, way, for, me, to, plug, it...
1                 [Good, case, ,, Excellent, value, .]
2                        [Great, for, the, jawbone, .]
3    [Tied, to, charger, for, conversations, lastin...
4                             [The, mic, is, great, .]
Name: unigrams, dtype: object

In [56]:
list(df[0:1]['unigrams'])

[['So',
  'there',
  'is',
  'no',
  'way',
  'for',
  'me',
  'to',
  'plug',
  'it',
  'in',
  'here',
  'in',
  'the',
  'US',
  'unless',
  'I',
  'go',
  'by',
  'a',
  'converter',
  '.']]

In [57]:
count_vect = CountVectorizer()
df_counts = count_vect.fit_transform(df.Sentence)

In [58]:
analyze = count_vect.build_analyzer()
analyze(" ".join(list(df[4:5].Sentence)))

['the', 'mic', 'is', 'great']

In [59]:
" ".join(list(df[4:5].Sentence))

'The mic is great.'

In [60]:
df_counts.shape

(3000, 5155)

In [61]:
count_vect.get_feature_names()[0:10]

['00', '10', '100', '11', '12', '13', '15', '15g', '15pm', '17']

In [62]:
# returns None: CountVectorizer lowercases text by default, so 'So' is stored under the key 'so'
count_vect.vocabulary_.get('So')
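
The lookup above produces no output because `CountVectorizer` lowercases its input by default. A small self-contained sketch on two sentences from the dataset shows the difference between the two keys:

```python
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()  # lowercase=True is the default
vec.fit(["So there is no way", "The mic is great."])

# The capitalised form is not a key in the vocabulary
print(vec.vocabulary_.get("So"))
# The lowercased form maps to an integer column index
print(vec.vocabulary_.get("so"))
```

Passing `lowercase=False` to the constructor would keep the original casing, at the cost of a larger vocabulary.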

In [63]:
df[0:5]

Unnamed: 0,Source,Sentence,Score,unigrams
0,amazon_cells,So there is no way for me to plug it in here i...,0,"[So, there, is, no, way, for, me, to, plug, it..."
1,amazon_cells,"Good case, Excellent value.",1,"[Good, case, ,, Excellent, value, .]"
2,amazon_cells,Great for the jawbone.,1,"[Great, for, the, jawbone, .]"
3,amazon_cells,Tied to charger for conversations lasting more...,0,"[Tied, to, charger, for, conversations, lastin..."
4,amazon_cells,The mic is great.,1,"[The, mic, is, great, .]"


In [64]:
df_counts[0:5,0:100].toarray()

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0,

In [65]:
count_vect.transform(['Something completely new.']).toarray()

array([[0, 0, 0, ..., 0, 0, 0]])

In [66]:
count_vect.transform(['00 Something completely new.']).toarray()

array([[1, 0, 0, ..., 0, 0, 0]])

In [67]:
plot_x = ["term_"+str(i) for i in count_vect.get_feature_names()[0:50]]

In [68]:
plot_x

['term_00',
 'term_10',
 'term_100',
 'term_11',
 'term_12',
 'term_13',
 'term_15',
 'term_15g',
 'term_15pm',
 'term_17',
 'term_18',
 'term_18th',
 'term_1928',
 'term_1947',
 'term_1948',
 'term_1949',
 'term_1971',
 'term_1973',
 'term_1979',
 'term_1980',
 'term_1986',
 'term_1995',
 'term_1998',
 'term_20',
 'term_2000',
 'term_2005',
 'term_2006',
 'term_2007',
 'term_20th',
 'term_2160',
 'term_23',
 'term_24',
 'term_25',
 'term_2mp',
 'term_30',
 'term_30s',
 'term_325',
 'term_35',
 'term_350',
 'term_375',
 'term_3o',
 'term_40',
 'term_40min',
 'term_42',
 'term_44',
 'term_45',
 'term_4s',
 'term_4ths',
 'term_50',
 'term_5020']

In [69]:
plot_y = ["doc_"+ str(i) for i in list(df.index)[0:50]]

In [70]:
plot_z = df_counts[0:50, 0:50].toarray()

In [71]:
# to plot
py.iplot(ta.plot_heat_map(plot_x, plot_y, plot_z))

In [72]:
from sklearn.decomposition import PCA

In [73]:
df_reduced = PCA(n_components=3).fit_transform(df_counts.toarray())

In [74]:
df_reduced.shape

(3000, 3)
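
The PCA call above needs `df_counts.toarray()`, which builds the full dense matrix in memory. One possible alternative, swapped in here purely for illustration, is `TruncatedSVD`, which accepts sparse input directly. It is not the same as PCA because it does not centre the data first. The sentences below are a tiny stand-in corpus.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Tiny stand-in corpus (illustrative only)
docs = [
    "The mic is great.",
    "Great phone, works great.",
    "What a waste of money and time.",
    "The food was awful.",
    "Highly recommended place.",
]
counts = CountVectorizer().fit_transform(docs)  # sparse document-term matrix

# TruncatedSVD works on the sparse matrix without converting it to dense
svd = TruncatedSVD(n_components=3, random_state=0)
reduced = svd.fit_transform(counts)
print(reduced.shape)
```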

In [75]:
# labels must match the values in df["Source"] exactly
trace1 = ta.get_trace(df_reduced, df["Source"], "amazon_cells", "rgb(71,233,163)")
trace2 = ta.get_trace(df_reduced, df["Source"], "imdb", "rgb(52,133,252)")
trace3 = ta.get_trace(df_reduced, df["Source"], "yelp", "rgb(229,65,136)")

In [76]:
data = [trace1, trace2, trace3]

In [77]:
layout = go.Layout(
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='simple-3d-scatter')

In [78]:
term_frequencies = []
for j in range(0,df_counts.shape[1]):
    term_frequencies.append(sum(df_counts[:,j].toarray()))

In [79]:
term_frequencies[1]

array([38])
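
The loop above sums one column at a time. As a sketch of the same idea in a single call, a sparse matrix can be summed along axis 0 to get one total per term. The matrix below is a toy example, not the real `df_counts`.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy document-term matrix: 3 documents (rows) by 4 terms (columns)
counts = csr_matrix(np.array([
    [1, 0, 2, 0],
    [0, 1, 1, 0],
    [3, 0, 0, 0],
]))

# Summing over axis 0 adds up each column, giving one total per term
term_totals = np.asarray(counts.sum(axis=0)).ravel()
print(term_totals)
```

The result is a flat array of plain integers rather than a list of one-element arrays, which is easier to pass on to plotting or to `np.log`.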

In [80]:
py.iplot(ta.plot_word_frequency([count_vect.get_feature_names(), term_frequencies], "Term Frequency Distribution"))

In [81]:
term_frequencies = []
# iterate over the term columns (shape[1]), not the document rows (shape[0])
for j in range(0,df_counts.shape[1]):
    term_frequencies.append(sum(df_counts[:,j].toarray()))

In [82]:
term_frequencies

[array([1]),
 array([38]),
 array([3]),
 array([2]),
 array([4]),
 array([3]),
 array([3]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([9]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([7]),
 array([1]),
 array([1]),
 array([3]),
 array([1]),
 array([1]),
 array([1]),
 array([5]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([4]),
 array([1]),
 array([3]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([7]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([5]),
 array([4]),
 array([1]),
 array([85]),
 array([7]

In [83]:
py.iplot(ta.plot_word_frequency([count_vect.get_feature_names(), term_frequencies], "Term Frequency Distribution"))

In [84]:
# each entry is a one-element array, so take its scalar value before applying the log
term_frequencies_log = [math.log(i[0]) for i in term_frequencies]

In [85]:
py.iplot(ta.plot_word_frequency([count_vect.get_feature_names(), term_frequencies_log], "Term Frequency Distribution"))

In [86]:
from sklearn.feature_extraction.text import CountVectorizer
word_vectorizer = CountVectorizer(ngram_range=(1,4), analyzer='word')
sparse_matrix = word_vectorizer.fit_transform(df['Sentence'])
frequencies = sum(sparse_matrix).toarray()[0]
pd.DataFrame(frequencies, index=word_vectorizer.get_feature_names(), columns=['frequency'])

Unnamed: 0,frequency
00,1
10,38
10 10,3
10 and,1
10 and only,1
10 and only because,1
10 feet,1
10 feet wide,1
10 feet wide of,1
10 for,1


In [87]:
from sklearn import preprocessing, metrics, decomposition, pipeline, dummy

In [88]:
mlb = preprocessing.LabelBinarizer()

In [89]:
mlb.fit(df.Source)

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

In [90]:
mlb.classes_

array(['amazon_cells', 'imdb', 'yelp'],
      dtype='<U12')
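
As a small illustration of what the fitted binarizer does with these three classes, the sketch below encodes two labels and maps them back. It refits a fresh `LabelBinarizer` on the class names shown above so that it runs on its own.

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit(["amazon_cells", "imdb", "yelp"])

# Each label becomes a one-hot row; columns follow the sorted order in lb.classes_
encoded = lb.transform(["yelp", "imdb"])
print(encoded)

# inverse_transform recovers the original labels from the one-hot rows
decoded = lb.inverse_transform(encoded)
print(decoded)
```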

In [91]:
df['bin_category'] = mlb.transform(df['Source']).tolist()

In [92]:
df.sample(n=10, replace=False)

Unnamed: 0,Source,Sentence,Score,unigrams,bin_category
2867,yelp,"For sushi on the Strip, this is the place to go.",1,"[For, sushi, on, the, Strip, ,, this, is, the,...","[0, 0, 1]"
2500,yelp,I also had to taste my Mom's multi-grain pumpk...,1,"[I, also, had, to, taste, my, Mom, 's, multi-g...","[0, 0, 1]"
1659,imdb,"If you love death and decay, and Shakespears l...",1,"[If, you, love, death, and, decay, ,, and, Sha...","[0, 1, 0]"
1400,imdb,A good commentary of today's love and undoubte...,1,"[A, good, commentary, of, today, 's, love, and...","[0, 1, 0]"
644,amazon_cells,"I contacted the company and they told me that,...",0,"[I, contacted, the, company, and, they, told, ...","[1, 0, 0]"
2095,yelp,We'll never go again.,0,"[We, 'll, never, go, again, .]","[0, 0, 1]"
1520,imdb,It is zillion times away from reality.,0,"[It, is, zillion, times, away, from, reality, .]","[0, 1, 0]"
1611,imdb,I believe that Pitch Black was done well.,1,"[I, believe, that, Pitch, Black, was, done, we...","[0, 1, 0]"
2226,yelp,This is an unbelievable BARGAIN!,1,"[This, is, an, unbelievable, BARGAIN, !]","[0, 0, 1]"
48,amazon_cells,This case seems well made.,1,"[This, case, seems, well, made, .]","[1, 0, 0]"
