### Student Information
Name: Muzi Phiwokuhle Gondwe

Student ID: 106065433

### Instructions

- Download the dataset provided in this [link](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#). The sentiment dataset contains a `sentence` and `score` label. Read what the dataset is about on the link provided before you start exploring it. 


- Then, you are asked to apply each of the data exploration and data operation techniques learned in the [first lab session](https://goo.gl/Sg4FS1) on the new dataset. You don't need to explain all the procedures as we did in the notebook, but you are expected to provide some **minimal comments** explaining your code. You are also expected to use the same libraries used in the first lab session. You are allowed to use and modify the `helper` functions we provided in the first lab session or create your own. Also, be aware that the helper functions may need modification as you are dealing with a completely different dataset. This part is worth 80% of your grade!


- After you have completed the operations, you should attempt the **bonus exercises** provided in the [notebook](https://goo.gl/Sg4FS1) we used for the first lab session. There are six (6) additional exercises; attempt them all, as it is part of your grade (10%). 


- You are also expected to tidy up your notebook and attempt new data operations that you have learned so far in the Data Mining course. Surprise us! This segment is worth 10% of your grade.


- After completing all the above tasks, you are free to remove this header block and submit your assignment following the guide provided in the `README.md` file of the assignment's [repository](https://github.com/omarsar/data_mining_hw_1). 

# 1. Data Source

In [None]:
# necessary for when working with external scripts
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import numpy as np
import nltk
#from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
import plotly.plotly as py
import plotly.graph_objs as go
import math
import plotly.tools as tls
import cufflinks as cf
%matplotlib inline

# my functions
import helpers.data_mining_helpers as dmh
import helpers.text_analysis as ta

In [3]:
# base directory of the notebook, used to build the dataset path
from os import path
notebook_path = './'

In [4]:
#  reading the csv file into the data frame
df = pd.read_csv(path.join(notebook_path, 'sentiment labelled sentences', 
                             'dataset.csv'),dtype=object,header= 0, sep='|', engine='python')
df['Score'] = df['Score'].astype(str).astype(int)
df['Source'] = df['Source'].astype(str)
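The cell above reads a single `dataset.csv`, while the download ships one tab-separated file per source. A minimal sketch of how such a combined frame could be assembled is shown below. It assumes each raw file holds `sentence<TAB>score` lines; `load_labelled` is a hypothetical helper, and the in-memory buffers stand in for the real files, using a few rows copied from the outputs in this notebook.

```python
import csv
import io

import pandas as pd


def load_labelled(handle, source):
    """Read one tab-separated 'sentence<TAB>score' file and tag it with its source."""
    frame = pd.read_csv(handle, sep="\t", header=None,
                        names=["Sentence", "Score"], quoting=csv.QUOTE_NONE)
    frame.insert(0, "Source", source)  # put the source label in the first column
    return frame


# In-memory stand-ins for the raw files (assumption: real files share this layout).
raw_files = {
    "amazon_cells": io.StringIO("Good case, Excellent value.\t1\nDoes not fit.\t0\n"),
    "yelp": io.StringIO("Appetite instantly gone.\t0\n"),
}

# Stack the per-source frames into one frame with a fresh 0..n-1 index.
combined = pd.concat(
    [load_labelled(handle, source) for source, handle in raw_files.items()],
    ignore_index=True,
)
print(combined)
```

Passing `quoting=csv.QUOTE_NONE` keeps quotation marks inside reviews from being treated as field delimiters.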




# 2. Data Preparation

In [5]:
# checking the data types in the columns
df.dtypes

Source      object
Sentence    object
Score        int64
dtype: object

In [6]:
len(df)

3000

In [7]:
# checking column headers of my dataframe
list(df)

['Source', 'Sentence', 'Score']

# 3. Familiarizing Myself with the Data

In [8]:
# query for the first 10 rows 
df[0:10]

Unnamed: 0,Source,Sentence,Score
0,amazon_cells,So there is no way for me to plug it in here i...,0
1,amazon_cells,"Good case, Excellent value.",1
2,amazon_cells,Great for the jawbone.,1
3,amazon_cells,Tied to charger for conversations lasting more...,0
4,amazon_cells,The mic is great.,1
5,amazon_cells,I have to jiggle the plug to get it to line up...,0
6,amazon_cells,If you have several dozen or several hundred c...,0
7,amazon_cells,If you are Razr owner...you must have this!,1
8,amazon_cells,"Needless to say, I wasted my money.",0
9,amazon_cells,What a waste of money and time!.,0


In [9]:
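# query for 10 rows near the end; the slice -11:-1 stops before the final row,
# so use df[-10:] or df.tail(10) to include row 2999
df[-11:-1]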

Unnamed: 0,Source,Sentence,Score
2989,yelp,I would avoid this place if you are staying in...,0
2990,yelp,The refried beans that came with my meal were ...,0
2991,yelp,Spend your money and time some place else.,0
2992,yelp,A lady at the table next to us found a live gr...,0
2993,yelp,the presentation of the food was awful.,0
2994,yelp,I can't tell you how disappointed I was.,0
2995,yelp,I think food should have flavor and texture an...,0
2996,yelp,Appetite instantly gone.,0
2997,yelp,Overall I was not impressed and would not go b...,0
2998,yelp,"The whole experience was underwhelming, and I ...",0


In [10]:
# sampling any 10 rows
df.sample(n=10, replace=False)

Unnamed: 0,Source,Sentence,Score
2700,yelp,The chips that came out were dripping with gre...,0
448,amazon_cells,"The screen size is big, key pad lit well enoug...",1
2712,yelp,2 Thumbs Up!!,1
2061,yelp,This place receives stars for their APPETIZERS!!!,1
2740,yelp,The restaurant is very clean and has a family ...,1
2543,yelp,I won't try going back there even if it's empty.,0
2566,yelp,"The servers went back and forth several times,...",0
2588,yelp,"My breakfast was perpared great, with a beauti...",1
955,amazon_cells,Buttons are too small.,0
797,amazon_cells,A good quality bargain.. I bought this after I...,1


# 4. Data Mining Using Pandas

# 4.1 Checking For Missing Values

In [11]:
# checking if the data contains any missing values.
df.isnull()

Unnamed: 0,Source,Sentence,Score
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
5,False,False,False
6,False,False,False
7,False,False,False
8,False,False,False
9,False,False,False


In [12]:
# checking for missing values in every column
df.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 0)
Score       (The amoung of missing records is: , 0)
dtype: object

To check that our missing-value code is correct, we create a dummy entry with missing data. If the function really works, it should locate the missing value.

In [13]:
# defining the entry data with missing data. This data will not have a score
dummy_series = pd.Series(["imdb", "I hate these guys"], index=["Source", "Sentence"])

In [14]:
dummy_series

Source                   imdb
Sentence    I hate these guys
dtype: object

In [15]:
# insert the data to the data frame
result_with_series = df.append(dummy_series, ignore_index=True)

In [16]:
# checking the length of the data to see if the new entry has been added
len(result_with_series)

3001

In [17]:
result_with_series.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 0)
Score       (The amoung of missing records is: , 1)
dtype: object
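`DataFrame.append`, used above, was removed in pandas 2.0. The sketch below repeats the same check with `pd.concat` on a toy frame. `count_missing` is a hypothetical stand-in for `dmh.check_missing_values`, whose source is not shown here, and the toy rows are illustrative only.

```python
import pandas as pd


def count_missing(column):
    """Hypothetical stand-in for the helper: count the null cells in one column."""
    return int(column.isnull().sum())


toy = pd.DataFrame({
    "Source": ["yelp", "imdb"],
    "Sentence": ["Appetite instantly gone.", "Nothing at all to recommend."],
    "Score": [0, 0],
})

# A row without a Score, wrapped in a one-row DataFrame so it can be concatenated.
dummy_row = pd.DataFrame([{"Source": "imdb", "Sentence": "I hate these guys"}])
toy_with_gap = pd.concat([toy, dummy_row], ignore_index=True)

# apply() calls count_missing once per column.
missing_per_column = toy_with_gap.apply(count_missing)
print(missing_per_column)
```

The missing `Score` becomes `NaN`, so the column is reported with one missing record, as in the output above.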

In [18]:
# entry with missing data, this one has no sentence
dummy_dict = [{'Source': 'amazon',
               'Score': 1
              }]

In [19]:
df = df.append(dummy_dict,ignore_index=True)

In [20]:
len(df)

3001

In [21]:
df.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 1)
Score       (The amoung of missing records is: , 0)
dtype: object

In [22]:
pd.isnull(df)


Unnamed: 0,Source,Sentence,Score
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
5,False,False,False
6,False,False,False
7,False,False,False
8,False,False,False
9,False,False,False


In [23]:
# displaying the entry that has missing data
df[df.isnull().any(axis=1)]

Unnamed: 0,Source,Sentence,Score
3000,amazon,,1


In [24]:
df.dropna(inplace=True)

In [25]:
df.isnull().apply(lambda x: dmh.check_missing_values(x))

Source      (The amoung of missing records is: , 0)
Sentence    (The amoung of missing records is: , 0)
Score       (The amoung of missing records is: , 0)
dtype: object

# 4.2 Checking for Duplicates
Now that we have dealt with missing data, we check for duplicates. I believe duplicates can be kept in this dataset, because the frequency of each sentiment is significant. For example, many occurrences of 'Great phone!' say a lot about how well the phone is received.

In [26]:
sum(df.duplicated())

17

In [27]:
# displaying the duplicate data
df[df.duplicated(['Source', 'Sentence' , 'Score'])]

Unnamed: 0,Source,Sentence,Score
285,amazon_cells,Great phone!.,1
407,amazon_cells,Works great.,1
524,amazon_cells,Works great!.,1
543,amazon_cells,Don't buy this product.,0
744,amazon_cells,If you like a loud buzzing to override all you...,0
748,amazon_cells,Does not fit.,0
778,amazon_cells,This is a great deal.,1
792,amazon_cells,Great Phone.,1
892,amazon_cells,Excellent product for the price.,1
896,amazon_cells,Great phone.,1
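Since the duplicates are kept on purpose, it can help to see how often each sentence repeats. A minimal sketch on a toy frame follows; the rows are illustrative, not the full dataset.

```python
import pandas as pd

reviews = pd.DataFrame({
    "Source": ["amazon_cells"] * 4,
    "Sentence": ["Great phone.", "Works great.", "Great phone.", "Does not fit."],
    "Score": [1, 1, 1, 0],
})

# duplicated() flags every repeat after the first occurrence.
n_repeats = int(reviews.duplicated().sum())

# keep=False flags all copies, including the first one.
all_copies = reviews[reviews.duplicated(keep=False)]

# drop_duplicates() would remove the repeats if we ever decided to.
deduplicated = reviews.drop_duplicates()

# value_counts() shows how often each sentence occurs.
sentence_counts = reviews["Sentence"].value_counts()
print(n_repeats, len(all_copies), len(deduplicated))
print(sentence_counts)
```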


# 5. Data Processing

# 5.1 Sampling

In [28]:
# let's say we want to sample 900 rows
df_sample = df.sample(n=900)

In [29]:
len(df_sample)

900

In [30]:
# check the first 20 records in the sample
df_sample[0:20]

Unnamed: 0,Source,Sentence,Score
297,amazon_cells,This one works and was priced right.,1
309,amazon_cells,They do not care about the consumer one bit.,0
795,amazon_cells,Perfect for the PS3.,1
178,amazon_cells,It only recognizes the Phone as its storage de...,0
2843,yelp,I won't be back.,0
2098,yelp,It was not good.,0
483,amazon_cells,You won't regret it!,1
1373,imdb,"I guess it was supposed to be clever twist, th...",0
2573,yelp,"He also came back to check on us regularly, ex...",1
1744,imdb,"Emily Watson's character is very strong, and s...",1


In [31]:
df.columns[0]

'Source'

In [32]:
df_category_counts = ta.get_tokens_and_frequency(list(df.Source))
df_sample_category_counts = ta.get_tokens_and_frequency(list(df_sample.Source))

In [33]:
py.iplot(ta.plot_word_frequency(df_category_counts, "Category distribution"))


In [35]:
py.iplot(ta.plot_word_frequency(df_sample_category_counts, "Category distribution_Sample"))

In [36]:
series = df['Source'].value_counts()
series_sample = df_sample['Source'].value_counts()

# look counts up by label: value_counts() orders by frequency, so positions differ between the two series
compare_data = {'Data': ['Original', 'Sample'], 
        'imdb': [series['imdb'], series_sample['imdb']], 
        'yelp': [series['yelp'], series_sample['yelp']],
         'amazon': [series['amazon_cells'], series_sample['amazon_cells']]}
df_sample_compare = pd.DataFrame(compare_data, columns = ['Data','imdb', 'yelp', 'amazon'])
df_sample_compare.set_index('Data', inplace=True)
df_sample_compare

Unnamed: 0_level_0,imdb,yelp,amazon
Data,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Original,1000,1000,1000
Sample,293,317,290


In [96]:
df_sample_compare_plot = pd.DataFrame(data = df_sample_compare, columns=['imdb', 'yelp', 'amazon'])
df_sample_compare_plot.iplot(kind='bar', yTitle='Number of Sentiments', xTitle='Data', title = 'Sources Distribution (Data vs Sample)')

In [40]:
series = df_sample['Source'].value_counts()
series.head(3)

yelp            317
imdb            293
amazon_cells    290
Name: Source, dtype: int64
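The simple random sample above leaves the three sources slightly unbalanced. If equal representation matters, one option is to sample within each source instead. The sketch below uses `groupby(...).sample`, available in pandas 1.1 and later, on a toy frame; the sizes are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "Source": ["amazon_cells"] * 10 + ["imdb"] * 10 + ["yelp"] * 10,
    "Score": [0, 1] * 15,
})

# Draw the same number of rows from every source; random_state makes it repeatable.
stratified = toy.groupby("Source").sample(n=3, random_state=0)

per_source = stratified["Source"].value_counts()
print(per_source)
```

On the real data, `n=300` per source would give a balanced sample of 900 rows.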

In [41]:
df_amazon_only = df_sample['Source'] == "amazon_cells"
df_amazon_1score_only = df_sample['Score'] == 1
df_amazon_0score_only = df_sample['Score'] == 0

In [42]:
amazon_one = len(df_sample[df_amazon_only & df_amazon_1score_only ])
amazon_one

151

In [43]:
amazon_zero = len(df_sample[df_amazon_only & df_amazon_0score_only  ])
amazon_zero

139

In [44]:
df_imdb_only = df_sample['Source'] == "imdb"
df_imdb_1score_only = df_sample['Score'] == 1
df_imdb_0score_only = df_sample['Score'] == 0

In [45]:
imdb_one = len(df_sample[df_imdb_only & df_imdb_1score_only ])
imdb_one

142

In [46]:
imdb_zero = len(df_sample[df_imdb_only & df_imdb_0score_only ])
imdb_zero

151

In [47]:
df_yelp_only = df_sample['Source'] == "yelp"
df_yelp_1score_only = df_sample['Score'] == 1
df_yelp_0score_only = df_sample['Score'] == 0

In [48]:
yelp_one = len(df_sample[df_yelp_only & df_yelp_1score_only ])
yelp_one

164

In [49]:
yelp_zero = len(df_sample[df_yelp_only & df_yelp_0score_only ])
yelp_zero

153

In [50]:
raw_data = {'Source': ['Amazon', 'Imdb', 'Yelp'], 
        'Positive': [amazon_one, imdb_one, yelp_one], 
        'Negative': [amazon_zero, imdb_zero,yelp_zero]}

In [51]:
df_new = pd.DataFrame(raw_data, columns = ['Source', 'Positive', 'Negative'])

df_new

Unnamed: 0,Source,Positive,Negative
0,Amazon,151,139
1,Imdb,142,151
2,Yelp,164,153


In [52]:
df_new.set_index('Source', inplace=True)
df_new

Unnamed: 0_level_0,Positive,Negative
Source,Unnamed: 1_level_1,Unnamed: 2_level_1
Amazon,151,139
Imdb,142,151
Yelp,164,153


In [54]:
df_plot = pd.DataFrame(data = df_new, columns=['Positive', 'Negative'])
df_plot.iplot(kind='bar', yTitle='Number of Sentiments', xTitle='Sources', title = 'Sources Distribution (Sample)')
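The per-source positive and negative counts built above with boolean masks can also be obtained in one step with `pd.crosstab`. A minimal sketch on a toy sample, with illustrative rows only:

```python
import pandas as pd

sample = pd.DataFrame({
    "Source": ["amazon_cells", "amazon_cells", "imdb", "yelp", "yelp", "yelp"],
    "Score": [1, 0, 0, 1, 1, 0],
})

# Rows are sources, columns are scores; each cell counts the matching records.
counts = pd.crosstab(sample["Source"], sample["Score"]).rename(
    columns={0: "Negative", 1: "Positive"}
)
print(counts)
```

The resulting frame has the same shape as `df_new` and can be plotted the same way.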

# 5.3 Feature Creation
Now we try feature creation. Basically, we obtain the unigrams from each sentence.

In [58]:
import nltk
df['unigrams'] = df['Sentence'].apply(lambda x: dmh.tokenize_text(x))

In [71]:
# display the unigrams of the first 10 sentences
df[0:10]["unigrams"]

0    [So, there, is, no, way, for, me, to, plug, it...
1                 [Good, case, ,, Excellent, value, .]
2                        [Great, for, the, jawbone, .]
3    [Tied, to, charger, for, conversations, lastin...
4                             [The, mic, is, great, .]
5    [I, have, to, jiggle, the, plug, to, get, it, ...
6    [If, you, have, several, dozen, or, several, h...
7    [If, you, are, Razr, owner, ..., you, must, ha...
8      [Needless, to, say, ,, I, wasted, my, money, .]
9         [What, a, waste, of, money, and, time, !, .]
Name: unigrams, dtype: object

In [62]:
list(df[0:1]['unigrams'])

[['So',
  'there',
  'is',
  'no',
  'way',
  'for',
  'me',
  'to',
  'plug',
  'it',
  'in',
  'here',
  'in',
  'the',
  'US',
  'unless',
  'I',
  'go',
  'by',
  'a',
  'converter',
  '.']]

In [63]:
count_vect = CountVectorizer()
df_counts = count_vect.fit_transform(df.Sentence)

In [64]:
analyze = count_vect.build_analyzer()
analyze(" ".join(list(df[4:5].Sentence)))

['the', 'mic', 'is', 'great']

In [65]:
" ".join(list(df[4:5].Sentence))

'The mic is great.'

In [66]:
df_counts.shape

(3000, 5155)

In [67]:
count_vect.get_feature_names()[0:10]

['00', '10', '100', '11', '12', '13', '15', '15g', '15pm', '17']

In [68]:
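# returns None: CountVectorizer lowercases tokens by default, so the vocabulary holds 'so', not 'So'
count_vect.vocabulary_.get('So')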

In [69]:
df[0:5]

Unnamed: 0,Source,Sentence,Score,unigrams
0,amazon_cells,So there is no way for me to plug it in here i...,0,"[So, there, is, no, way, for, me, to, plug, it..."
1,amazon_cells,"Good case, Excellent value.",1,"[Good, case, ,, Excellent, value, .]"
2,amazon_cells,Great for the jawbone.,1,"[Great, for, the, jawbone, .]"
3,amazon_cells,Tied to charger for conversations lasting more...,0,"[Tied, to, charger, for, conversations, lastin..."
4,amazon_cells,The mic is great.,1,"[The, mic, is, great, .]"


In [70]:
df_counts[0:5,0:100].toarray()

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0,

In [72]:
count_vect.transform(['Something completely new.']).toarray()

array([[0, 0, 0, ..., 0, 0, 0]])

In [73]:
count_vect.transform(['00 Something completely new.']).toarray()

array([[1, 0, 0, ..., 0, 0, 0]])
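The two cells above show that `transform` only counts words already in the fitted vocabulary. The sketch below repeats the idea on a two-sentence toy corpus, assuming the default `CountVectorizer` settings (lowercasing, tokens of at least two characters).

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The mic is great.", "Great phone, great price."]

vec = CountVectorizer()
counts = vec.fit_transform(docs)

# The vocabulary maps each lowercased token to its column index.
vocab = vec.vocabulary_
print(sorted(vocab))  # ['great', 'is', 'mic', 'phone', 'price', 'the']

# Words never seen during fit are silently ignored.
unseen = vec.transform(["Something completely new."]).toarray()

# Known words are counted; the one-letter token 'A' is dropped by the default pattern.
seen = vec.transform(["A great mic"]).toarray()
print(unseen, seen)
```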

In [74]:
plot_x = ["term_"+str(i) for i in count_vect.get_feature_names()[0:50]]

In [75]:
plot_x

['term_00',
 'term_10',
 'term_100',
 'term_11',
 'term_12',
 'term_13',
 'term_15',
 'term_15g',
 'term_15pm',
 'term_17',
 'term_18',
 'term_18th',
 'term_1928',
 'term_1947',
 'term_1948',
 'term_1949',
 'term_1971',
 'term_1973',
 'term_1979',
 'term_1980',
 'term_1986',
 'term_1995',
 'term_1998',
 'term_20',
 'term_2000',
 'term_2005',
 'term_2006',
 'term_2007',
 'term_20th',
 'term_2160',
 'term_23',
 'term_24',
 'term_25',
 'term_2mp',
 'term_30',
 'term_30s',
 'term_325',
 'term_35',
 'term_350',
 'term_375',
 'term_3o',
 'term_40',
 'term_40min',
 'term_42',
 'term_44',
 'term_45',
 'term_4s',
 'term_4ths',
 'term_50',
 'term_5020']

In [76]:
plot_y = ["doc_"+ str(i) for i in list(df.index)[0:50]]

In [81]:
plot_z = df_counts[0:50, 0:50].toarray()

In [82]:
# to plot
py.iplot(ta.plot_heat_map(plot_x, plot_y, plot_z))

# 5.4 Dimensionality Reduction
Now we are going to use PCA to try and reduce the dimensionality of our data set.

In [83]:
from sklearn.decomposition import PCA

In [84]:
df_reduced = PCA(n_components=3).fit_transform(df_counts.toarray())

In [85]:
df_reduced.shape

(3000, 3)
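PCA above needs the dense array from `toarray()`. For a larger vocabulary, one alternative is `TruncatedSVD`, which works directly on the sparse matrix. Note that this is a different technique from PCA (it does not center the data), so the components are not identical. A minimal sketch on a random sparse matrix standing in for `df_counts`:

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Random sparse matrix as a stand-in for the document-term counts (assumption).
toy_counts = sparse_random(50, 200, density=0.05, format="csr", random_state=0)

# Reduce to 3 components without converting the matrix to a dense array.
svd = TruncatedSVD(n_components=3, random_state=0)
reduced = svd.fit_transform(toy_counts)
print(reduced.shape)
```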

In [86]:
# category names must match the values in df["Source"] exactly
trace1 = ta.get_trace(df_reduced, df["Source"], "amazon_cells", "rgb(71,233,163)")
trace2 = ta.get_trace(df_reduced, df["Source"], "imdb", "rgb(52,133,252)")
trace3 = ta.get_trace(df_reduced, df["Source"], "yelp", "rgb(229,65,136)")

In [87]:
data = [trace1, trace2, trace3]

In [88]:
layout = go.Layout(
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='simple-3d-scatter')

In [95]:
term_frequencies = []
for j in range(0,df_counts.shape[1]):
    term_frequencies.append(sum(df_counts[:,j].toarray()))

In [90]:
term_frequencies[1]

array([38])
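The column-by-column loop above can be slow for a large vocabulary. A vectorized sketch of the same totals follows, using a small sparse matrix as a stand-in for `df_counts`; it also gives a flat integer array instead of a list of one-element arrays.

```python
import numpy as np
from scipy.sparse import csr_matrix

# 3 documents x 3 terms, standing in for the document-term matrix (assumption).
toy_counts = csr_matrix(np.array([[1, 0, 2],
                                  [0, 1, 1],
                                  [3, 0, 0]]))

# Sum down each column in one call, then flatten the 1 x n result to a 1-D array.
term_totals = np.asarray(toy_counts.sum(axis=0)).ravel()

# Log scale for plotting; every vocabulary term occurs at least once, so log is defined.
log_totals = np.log(term_totals)
print(term_totals, log_totals)
```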

In [91]:
py.iplot(ta.plot_word_frequency([count_vect.get_feature_names(), term_frequencies], "Term Frequency Distribution"))

In [92]:
term_frequencies = []
# iterate over the vocabulary (columns, shape[1]), not the documents (rows, shape[0])
for j in range(0,df_counts.shape[1]):
    term_frequencies.append(sum(df_counts[:,j].toarray()))

In [93]:
term_frequencies

[array([1]),
 array([38]),
 array([3]),
 array([2]),
 array([4]),
 array([3]),
 array([3]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([9]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([7]),
 array([1]),
 array([1]),
 array([3]),
 array([1]),
 array([1]),
 array([1]),
 array([5]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([4]),
 array([1]),
 array([3]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([2]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([7]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([1]),
 array([5]),
 array([4]),
 array([1]),
 array([85]),
 array([7]

In [94]:
py.iplot(ta.plot_word_frequency([count_vect.get_feature_names(), term_frequencies], "Term Frequency Distribution"))

In [97]:
term_frequencies_log = [math.log(i) for i in term_frequencies]

In [98]:
py.iplot(ta.plot_word_frequency([count_vect.get_feature_names(), term_frequencies_log], "Term Frequency Distribution"))

In [99]:
from sklearn.feature_extraction.text import CountVectorizer
word_vectorizer = CountVectorizer(ngram_range=(1,4), analyzer='word')
sparse_matrix = word_vectorizer.fit_transform(df['Sentence'])
frequencies = sum(sparse_matrix).toarray()[0]
pd.DataFrame(frequencies, index=word_vectorizer.get_feature_names(), columns=['frequency'])

Unnamed: 0,frequency
00,1
10,38
10 10,3
10 and,1
10 and only,1
10 and only because,1
10 feet,1
10 feet wide,1
10 feet wide of,1
10 for,1
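The table above lists n-grams in alphabetical order, which hides the most frequent ones. A minimal sketch of ranking n-grams by frequency on a toy corpus follows. It reads the terms from `vocabulary_` rather than from `get_feature_names`, since the latter was removed in newer scikit-learn releases.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["great phone", "great phone great price", "bad phone"]

# Unigrams and bigrams only, to keep the toy vocabulary small.
vec = CountVectorizer(ngram_range=(1, 2))
matrix = vec.fit_transform(docs)

# Total occurrences of each n-gram across all documents.
totals = np.asarray(matrix.sum(axis=0)).ravel()

# vocabulary_ maps term -> column index; invert it to label the totals.
index_to_term = {idx: term for term, idx in vec.vocabulary_.items()}

# Sort by descending frequency, breaking ties alphabetically.
ranked = sorted(
    ((index_to_term[i], int(totals[i])) for i in range(len(totals))),
    key=lambda pair: (-pair[1], pair[0]),
)
print(ranked[:3])
```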


# 5.5 Attribute Transformation

In [100]:
from sklearn import preprocessing, metrics, decomposition, pipeline, dummy

In [101]:
mlb = preprocessing.LabelBinarizer()

In [102]:
mlb.fit(df.Source)

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

In [103]:
mlb.classes_

array(['amazon_cells', 'imdb', 'yelp'],
      dtype='<U12')

In [104]:
df['Bin_category'] = mlb.transform(df['Source']).tolist()

In [105]:
df.sample(n=10, replace=False)

Unnamed: 0,Source,Sentence,Score,unigrams,Bin_category
245,amazon_cells,Worth every penny.,1,"[Worth, every, penny, .]","[1, 0, 0]"
2933,yelp,We won't be going back.,0,"[We, wo, n't, be, going, back, .]","[0, 0, 1]"
1277,imdb,I'm not sure what he was trying to do with thi...,0,"[I, 'm, not, sure, what, he, was, trying, to, ...","[0, 1, 0]"
1188,imdb,Nothing at all to recommend.,0,"[Nothing, at, all, to, recommend, .]","[0, 1, 0]"
2904,yelp,-My order was not correct.,0,"[-My, order, was, not, correct, .]","[0, 0, 1]"
2754,yelp,Main thing I didn't enjoy is that the crowd is...,0,"[Main, thing, I, did, n't, enjoy, is, that, th...","[0, 0, 1]"
750,amazon_cells,"It is light, has plenty of battery capacity, a...",1,"[It, is, light, ,, has, plenty, of, battery, c...","[1, 0, 0]"
1989,imdb,":) Anyway, the plot flowed smoothly and the ma...",1,"[:, ), Anyway, ,, the, plot, flowed, smoothly,...","[0, 1, 0]"
2154,yelp,Some highlights : Great quality nigiri here!,1,"[Some, highlights, :, Great, quality, nigiri, ...","[0, 0, 1]"
620,amazon_cells,Steer clear of this product and go with the ge...,0,"[Steer, clear, of, this, product, and, go, wit...","[1, 0, 0]"
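Each `Bin_category` vector above is a one-hot encoding of the source, with columns in the order of `mlb.classes_`. The sketch below shows the round trip on a toy list of labels, including `inverse_transform` to recover the original names.

```python
from sklearn.preprocessing import LabelBinarizer

sources = ["yelp", "imdb", "amazon_cells", "yelp"]

binarizer = LabelBinarizer()
encoded = binarizer.fit_transform(sources)

# Classes are sorted alphabetically, which fixes the column order.
print(binarizer.classes_)
print(encoded)

# Map the one-hot rows back to their labels.
decoded = binarizer.inverse_transform(encoded)
print(list(decoded))
```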
