# Streaming Twitter Sentiment Prediction

## Training a NaiveBayes Model of Twitter Sentiment

Our approach to making Twitter sentiment predictions is to first train a Naive Bayes model on a labeled dataset available on Kaggle.

We then save the trained model so that a streaming application (in a different notebook) can load and use it.

## Step 1: Download and Explore data

1\. First, download the 1.6 million tweets dataset from `http://idsdl.csom.umn.edu/c/share/msba6330/twitter1.6m.zip`

The original Kaggle dataset is available at [https://www.kaggle.com/datasets/kazanova/sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140)

It contains 1,600,000 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 4 = positive) and can be used to detect sentiment.

It contains the following 6 fields:

- target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive); there are no tweets labeled 2 in this dataset
- id: the id of the tweet (`2087`)
- date: the date of the tweet (`Sat May 16 23:58:44 UTC 2009`)
- flag: the query (`lyx`). If there is no query, this value is `NO_QUERY`.
- user: the user that tweeted (`robotickilldozr`)
- text: the text of the tweet (`Lyx is cool`)
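To make the six-field layout concrete, here is a minimal plain-Python sketch that parses one raw line of the file with the standard `csv` module. The `FIELD_NAMES` list and the `record` dictionary are illustrative names, and the sketch assumes the file has no header row and that every field is double-quoted, as in the sample rows shown in this notebook.

```python
import csv
import io

# One raw line copied from the training file (all six fields are quoted).
raw_line = (
    '"0","1467811184","Mon Apr 06 22:19:57 PDT 2009",'
    '"NO_QUERY","ElleCTF","my whole body feels itchy and like its on fire "'
)

# Illustrative field names, in the order described in the list above.
FIELD_NAMES = ["target", "id", "date", "flag", "user", "text"]

# csv.reader handles the quoting, including commas inside the tweet text.
row = next(csv.reader(io.StringIO(raw_line)))
record = dict(zip(FIELD_NAMES, row))

# Cast the numeric fields; everything else stays a string.
record["target"] = int(record["target"])
record["id"] = int(record["id"])

print(record)
```

Using a real CSV parser rather than `str.split(",")` matters here because tweet text frequently contains commas.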

In [0]:
!wget http://idsdl.csom.umn.edu/c/share/msba6330/twitter1.6m.zip

--2023-11-06 07:36:54--  http://idsdl.csom.umn.edu/c/share/msba6330/twitter1.6m.zip
Resolving idsdl.csom.umn.edu (idsdl.csom.umn.edu)... 134.84.138.46, 2607:ea00:101:480a:250:56ff:febb:e76b
Connecting to idsdl.csom.umn.edu (idsdl.csom.umn.edu)|134.84.138.46|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84855679 (81M) [application/zip]
Saving to: ‘twitter1.6m.zip’


2023-11-06 07:36:57 (30.7 MB/s) - ‘twitter1.6m.zip’ saved [84855679/84855679]



In [0]:
!unzip twitter1.6m.zip

Archive:  twitter1.6m.zip
  inflating: training.1600000.processed.noemoticon.csv  


2\. Explore the data format

- Display the first 10 rows of the training dataset.

In [0]:
%%bash
head training.1600000.processed.noemoticon.csv

"0","1467810369","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","_TheSpecialOne_","@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D"
"0","1467810672","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","scotthamilton","is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!"
"0","1467810917","Mon Apr 06 22:19:53 PDT 2009","NO_QUERY","mattycus","@Kenichan I dived many times for the ball. Managed to save 50%  The rest go out of bounds"
"0","1467811184","Mon Apr 06 22:19:57 PDT 2009","NO_QUERY","ElleCTF","my whole body feels itchy and like its on fire "
"0","1467811193","Mon Apr 06 22:19:57 PDT 2009","NO_QUERY","Karoli","@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there. "
"0","1467811372","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","joy_wolf","@Kwesidei not the whole crew "
"0","1467811592","Mon Apr 06 22:20:03 PDT 2009","NO

In [0]:
!ls -l

total 317380
drwxr-xr-x 2 root root      4096 Nov  6 07:31 azure
drwxr-xr-x 2 root root      4096 Nov  6 07:31 conf
drwxr-xr-x 3 root root      4096 Nov  6 07:33 eventlogs
-r-xr-xr-x 1 root root      2755 Nov  6 07:31 hadoop_accessed_config.lst
drwxr-xr-x 2 root root      4096 Nov  6 07:34 logs
drwxr-xr-x 5 root root      4096 Nov  6 07:35 metastore_db
-r-xr-xr-x 1 root root   1306936 Nov  6 07:31 preload_class.lst
-rw-r--r-- 1 root root 238803811 Sep 21  2019 training.1600000.processed.noemoticon.csv
-rw-r--r-- 1 root root  84855679 Oct 18  2022 twitter1.6m.zip


3\. Read the data into a DataFrame `data` using the schema string `target integer, id long, date string, flag string, user string, text string`, then verify the result by showing 10 rows and the schema.

In [0]:
schema = "target integer, id long, date string, flag string, user string, text string"
data = spark.read.csv("file:/databricks/driver/training.1600000.processed.noemoticon.csv",schema=schema)
data.limit(10).display()

target,id,date,flag,user,text
0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D"
0,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!
0,1467810917,Mon Apr 06 22:19:53 PDT 2009,NO_QUERY,mattycus,@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds
0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire
0,1467811193,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,Karoli,"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there."
0,1467811372,Mon Apr 06 22:20:00 PDT 2009,NO_QUERY,joy_wolf,@Kwesidei not the whole crew
0,1467811592,Mon Apr 06 22:20:03 PDT 2009,NO_QUERY,mybirch,Need a hug
0,1467811594,Mon Apr 06 22:20:03 PDT 2009,NO_QUERY,coZZ,"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?"
0,1467811795,Mon Apr 06 22:20:05 PDT 2009,NO_QUERY,2Hood4Hollywood,@Tatiana_K nope they didn't have it
0,1467812025,Mon Apr 06 22:20:09 PDT 2009,NO_QUERY,mimismo,@twittera que me muera ?


In [0]:
data.printSchema()

root
 |-- target: integer (nullable = true)
 |-- id: long (nullable = true)
 |-- date: string (nullable = true)
 |-- flag: string (nullable = true)
 |-- user: string (nullable = true)
 |-- text: string (nullable = true)



## Step 2: Data Cleaning and Target Variable Transformation

The training data has a different format from our test data, which will come from a Twitter stream. The test data will have these fields:

- `text`: tweet text
- `time`: timestamp

In the following, we want to:

- transform the `date` column into a timestamp column `time`
- transform the `target` variable into a binary column (tip: use `Binarizer`, but cast the column to double first, as `Binarizer` only works with double/float columns)
- drop the irrelevant columns.

The resulting DataFrame, called `data_train` (with `data_clean` as the intermediate result before binarization), will have these columns:

- `label`: a 0-1 label column derived from `target`
- `text`
- `time`
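The two transformations can be sketched in plain Python, independent of Spark. The helper names `parse_tweet_date` and `binarize` are hypothetical, and the sketch assumes every timestamp carries the `PDT` zone (UTC-7), as in the sample rows; it is meant to show the intended result, not how Spark computes it.

```python
from datetime import datetime, timedelta, timezone

# Assumption: every timestamp in the file uses PDT (UTC-7), as in the sample rows.
PDT = timezone(timedelta(hours=-7), name="PDT")

def parse_tweet_date(date_str):
    """Convert 'Mon Apr 06 22:19:45 PDT 2009' to a timezone-aware UTC datetime."""
    # Split off the weekday and the zone abbreviation, which strptime's %Z
    # does not parse portably across platforms.
    _, month, day, clock, zone, year = date_str.split()
    if zone != "PDT":
        raise ValueError(f"unexpected time zone: {zone}")
    local = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return local.replace(tzinfo=PDT).astimezone(timezone.utc)

def binarize(target, threshold=2.0):
    """Mimic a binarizer: values strictly above the threshold become 1.0, others 0.0."""
    return 1.0 if float(target) > threshold else 0.0

ts = parse_tweet_date("Mon Apr 06 22:19:45 PDT 2009")
print(ts.isoformat(), binarize(0), binarize(4))
```

With a threshold of 2.0, the negative class (0) maps to 0.0 and the positive class (4) maps to 1.0.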

In [0]:
from pyspark.sql.functions import col, to_timestamp, substring
from pyspark.ml.feature import *
from pyspark.ml import Pipeline


# Drop unused columns, strip the leading weekday (e.g. "Mon ") from `date`,
# and parse the remainder into a timestamp column `time`.
data_clean = (
    data.drop("id", "flag", "user")
    .withColumn("time", to_timestamp(substring(col("date"), 5, 24), "MMM dd HH:mm:ss zzz yyyy"))
    .drop("date")
    .cache()
)

# Cast `target` to double, then binarize: values above 2.0 (i.e. 4) become 1.0, all others 0.0.
st_cast = SQLTransformer(statement = "select cast(target as double) as target_double, * from __THIS__")
binarizer = Binarizer(threshold = 2.0, inputCol="target_double", outputCol="label")
target_pipeline = Pipeline(stages=[st_cast, binarizer])
data_train = target_pipeline.fit(data_clean).transform(data_clean).drop("target_double", "target")

data_train.limit(10).display()

text,time,label
"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D",2009-04-07T05:19:45.000+0000,0.0
is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!,2009-04-07T05:19:49.000+0000,0.0
@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds,2009-04-07T05:19:53.000+0000,0.0
my whole body feels itchy and like its on fire,2009-04-07T05:19:57.000+0000,0.0
"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there.",2009-04-07T05:19:57.000+0000,0.0
@Kwesidei not the whole crew,2009-04-07T05:20:00.000+0000,0.0
Need a hug,2009-04-07T05:20:03.000+0000,0.0
"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?",2009-04-07T05:20:03.000+0000,0.0
@Tatiana_K nope they didn't have it,2009-04-07T05:20:05.000+0000,0.0
@twittera que me muera ?,2009-04-07T05:20:09.000+0000,0.0


In [0]:
data_train.printSchema()

root
 |-- text: string (nullable = true)
 |-- time: timestamp (nullable = true)
 |-- label: double (nullable = true)



## Step 3: Define and Fit a ML Pipeline Containing Data Preprocessing and Model Training

Specifically, we need to remove unwanted string components (such as URLs) and stop words, and then vectorize the words. We will do all of this with PySpark (a combination of Spark SQL and Spark MLlib).

- Eliminate URLs, @user mentions, and `#` (remove just the symbol but keep the hashtag text). We will do this with `SQLTransformer`.
  - Use `regexp_replace` from Spark SQL.
  - `'http\\\S+'` --> `''`: remove URLs (note: there are three backslashes because the text passes through three interpreters; the final form should be `http\S+`)
  - `'@\\\w+'` --> `''`: remove @user mentions
  - `'#'` --> `''`: remove hashtag symbols
- Tokenize the tweet using a `RegexTokenizer` with `\\W+` as the split pattern.
- Remove stop words using `StopWordsRemover` (note that you need to load the stop words first).
- Turn words into numerical features using `CountVectorizer` with a minimum document frequency of 20, so that rare words are dropped.
- Use `NaiveBayes` to predict the sentiment, with a `smoothing` coefficient of `1.0` and a `modelType` of `multinomial`.
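The cleaning, tokenizing, and stop-word steps can be tried out in plain Python before wiring them into Spark. The sketch below uses the `re` module with the final-form patterns from the list above; the function names and the tiny `STOP_WORDS` set are illustrative assumptions, not Spark's default English list.

```python
import re

# Assumption: a tiny illustrative stop-word list, not Spark's default English list.
STOP_WORDS = {"i", "a", "the", "for", "to", "of", "out", "is", "and", "it", "you"}

def clean(text):
    """Lowercase the tweet and strip URLs, @user mentions, and '#' symbols."""
    text = text.lower()
    text = re.sub(r"http\S+", "", text)   # remove URLs
    text = re.sub(r"@\w+", "", text)      # remove @user mentions
    return text.replace("#", "")          # drop '#', keep the hashtag word

def tokenize(text):
    """Split on runs of non-word characters and drop empty tokens."""
    return [tok for tok in re.split(r"\W+", text) if tok]

def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop-word set."""
    return [tok for tok in tokens if tok not in STOP_WORDS]

tweet = "@Kenichan I dived many times for the ball. #Fail http://t.co/x"
tokens = remove_stop_words(tokenize(clean(tweet)))
print(tokens)
```

In raw Python strings a single backslash is enough; the extra backslashes in the Spark version exist only because the pattern also passes through the Python and SQL string parsers.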

Also define 

- an evaluator `e` for the accuracy metric.
- a pipeline `pipeline` with these stages: SQL transformer (for removing unwanted patterns), tokenizer, stopword remover, count vectorizer, naiveBayes.
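To show what a multinomial Naive Bayes classifier with a smoothing coefficient of 1.0 does with word counts, here is a small textbook-style sketch in plain Python. The functions `train_multinomial_nb` and `predict` and the toy documents are made up for illustration; Spark's implementation may differ in details (for example, in how it smooths the class priors).

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, smoothing=1.0):
    """Return (log_priors, log_likelihoods) estimated from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    class_doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)          # label -> token -> count
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    log_priors, log_likelihoods = {}, {}
    for label, n_docs in class_doc_counts.items():
        log_priors[label] = math.log(n_docs / len(docs))
        total = sum(token_counts[label].values())
        # Additive (Laplace) smoothing keeps unseen words from getting zero probability.
        denom = total + smoothing * len(vocab)
        log_likelihoods[label] = {
            tok: math.log((token_counts[label][tok] + smoothing) / denom)
            for tok in vocab
        }
    return log_priors, log_likelihoods

def predict(doc, log_priors, log_likelihoods):
    """Pick the label with the highest log prior plus summed word log-likelihoods."""
    scores = {}
    for label in log_priors:
        likelihood = log_likelihoods[label]
        # Words outside the training vocabulary are ignored.
        scores[label] = log_priors[label] + sum(likelihood[t] for t in doc if t in likelihood)
    return max(scores, key=scores.get)

# Toy, made-up training data: 1.0 = positive, 0.0 = negative.
docs = [["good", "day"], ["love", "it"], ["bad", "day"], ["hate", "it"]]
labels = [1.0, 1.0, 0.0, 0.0]
priors, likelihoods = train_multinomial_nb(docs, labels, smoothing=1.0)
print(predict(["love", "day"], priors, likelihoods))
```

The per-class scores computed in `predict` are log-scale quantities, which is why the raw scores of such a model are large negative numbers.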

In [0]:
from pyspark.ml.classification import NaiveBayes
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

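# Note: the URL pattern below matches only links that start with "http:"; links starting with "https:" are left in place.
st = SQLTransformer(statement = "select *, regexp_replace(regexp_replace(regexp_replace(lower(text), 'http:\\\S+',''), '@\\\w+', ''),'#','') as text_cleaned from __THIS__")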

tokenizer = RegexTokenizer(inputCol = "text_cleaned", outputCol='words', pattern = "\\W+")
stopwords = StopWordsRemover.loadDefaultStopWords("english")
swr = StopWordsRemover(inputCol = 'words', outputCol='words_filtered', stopWords=stopwords)
cv = CountVectorizer(inputCol='words_filtered', outputCol='features', minDF=20)
# CountVectorizer illustration: ['hello', 'world'] becomes one count per vocabulary word:
# hello    world
#  1        1
nb = NaiveBayes(smoothing=1.0, modelType='multinomial')
e = MulticlassClassificationEvaluator(metricName = 'accuracy')

pipeline = Pipeline(stages=[st, tokenizer, swr, cv, nb])
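
The count-vectorization step with a minimum document frequency can be illustrated in plain Python. In this sketch the function names `fit_vocabulary` and `to_sparse_counts` are hypothetical, the threshold is lowered to 2 so that a three-document toy corpus is enough, and the vocabulary ordering is an arbitrary choice that need not match Spark's.

```python
from collections import Counter

def fit_vocabulary(docs, min_df=2):
    """Keep tokens that appear in at least `min_df` different documents."""
    # set(doc) ensures each document contributes at most once per token.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    kept = [tok for tok, df in doc_freq.items() if df >= min_df]
    # Order by descending document frequency, then alphabetically for determinism.
    kept.sort(key=lambda tok: (-doc_freq[tok], tok))
    return {tok: idx for idx, tok in enumerate(kept)}

def to_sparse_counts(doc, vocabulary):
    """Return {index: count} for the tokens of `doc` that are in the vocabulary."""
    counts = Counter(vocabulary[tok] for tok in doc if tok in vocabulary)
    return dict(sorted(counts.items()))

# Toy, made-up corpus of tokenized documents.
corpus = [["hello", "world"], ["hello", "spark"], ["hello", "world", "world"]]
vocabulary = fit_vocabulary(corpus, min_df=2)
vector = to_sparse_counts(["hello", "world", "world", "spark"], vocabulary)
print(vocabulary, vector)
```

Here `spark` occurs in only one document, so it is dropped from the vocabulary and contributes nothing to the vector, which is the effect the minimum document frequency has on rare words.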

## Step 4: Train the Pipeline

- fit the pipeline and save the resulting model as `pipelineModel`
- use the model to transform the training data `data_train`
- display sample results

In [0]:
pipelineModel = pipeline.fit(data_train)

In [0]:
results = pipelineModel.transform(data_train)
results.limit(10).display()

text,time,label,text_cleaned,words,words_filtered,features,rawPrediction,probability,prediction
"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D",2009-04-07T05:19:45.000+0000,0.0,"- awww, that's a bummer. you shoulda got david carr of third day to do it. ;d","List(awww, that, s, a, bummer, you, shoulda, got, david, carr, of, third, day, to, do, it, d)","List(awww, bummer, shoulda, got, david, carr, third, day, d)","Map(vectorType -> sparse, length -> 22150, indices -> List(2, 11, 72, 349, 737, 1074, 1787, 3381, 9576), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-73.10331329092534, -75.76040428183782))","Map(vectorType -> dense, length -> 2, values -> List(0.9344466971628318, 0.06555330283716827))",0.0
is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!,2009-04-07T05:19:49.000+0000,0.0,is upset that he can't update his facebook by texting it... and might cry as a result school today also. blah!,"List(is, upset, that, he, can, t, update, his, facebook, by, texting, it, and, might, cry, as, a, result, school, today, also, blah)","List(upset, update, facebook, texting, might, cry, result, school, today, also, blah)","Map(vectorType -> sparse, length -> 22150, indices -> List(7, 70, 174, 197, 425, 429, 440, 682, 1018, 1917, 2240), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-84.59414501008106, -89.98197577429755))","Map(vectorType -> dense, length -> 2, values -> List(0.9954489268877933, 0.004551073112206787))",0.0
@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds,2009-04-07T05:19:53.000+0000,0.0,i dived many times for the ball. managed to save 50% the rest go out of bounds,"List(i, dived, many, times, for, the, ball, managed, to, save, 50, the, rest, go, out, of, bounds)","List(dived, many, times, ball, managed, save, 50, rest, go, bounds)","Map(vectorType -> sparse, length -> 22150, indices -> List(5, 216, 256, 370, 800, 981, 1170, 1578), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-63.04542955499372, -63.78005141899811))","Map(vectorType -> dense, length -> 2, values -> List(0.67581868825521, 0.32418131174479003))",0.0
my whole body feels itchy and like its on fire,2009-04-07T05:19:57.000+0000,0.0,my whole body feels itchy and like its on fire,"List(my, whole, body, feels, itchy, and, like, its, on, fire)","List(whole, body, feels, itchy, like, fire)","Map(vectorType -> sparse, length -> 22150, indices -> List(4, 331, 381, 705, 1043, 2814), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-47.0100517499005, -50.54793735282175))","Map(vectorType -> dense, length -> 2, values -> List(0.9717467190554288, 0.02825328094457132))",0.0
"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there.",2009-04-07T05:19:57.000+0000,0.0,"no, it's not behaving at all. i'm mad. why am i here? because i can't see you all over there.","List(no, it, s, not, behaving, at, all, i, m, mad, why, am, i, here, because, i, can, t, see, you, all, over, there)","List(behaving, m, mad, see)","Map(vectorType -> sparse, length -> 22150, indices -> List(0, 21, 493, 10182), values -> List(1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-30.130673621754106, -31.075616317275248))","Map(vectorType -> dense, length -> 2, values -> List(0.720096976808941, 0.27990302319105903))",0.0
@Kwesidei not the whole crew,2009-04-07T05:20:00.000+0000,0.0,not the whole crew,"List(not, the, whole, crew)","List(whole, crew)","Map(vectorType -> sparse, length -> 22150, indices -> List(331, 2084), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-17.983945382297364, -17.91495730669582))","Map(vectorType -> dense, length -> 2, values -> List(0.48275981823545605, 0.517240181764544))",1.0
Need a hug,2009-04-07T05:20:03.000+0000,0.0,need a hug,"List(need, a, hug)","List(need, hug)","Map(vectorType -> sparse, length -> 22150, indices -> List(35, 815), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.601759941781173, -15.43562112134421))","Map(vectorType -> dense, length -> 2, values -> List(0.6971707364117425, 0.30282926358825757))",0.0
"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?",2009-04-07T05:20:03.000+0000,0.0,"hey long time no see! yes.. rains a bit ,only a bit lol , i'm fine thanks , how's you ?","List(hey, long, time, no, see, yes, rains, a, bit, only, a, bit, lol, i, m, fine, thanks, how, s, you)","List(hey, long, time, see, yes, rains, bit, bit, lol, m, fine, thanks)","Map(vectorType -> sparse, length -> 22150, indices -> List(0, 12, 13, 21, 31, 76, 78, 88, 162, 423, 2559), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-80.42899760997152, -76.07727296338678))","Map(vectorType -> dense, length -> 2, values -> List(0.012720671663382397, 0.9872793283366176))",1.0
@Tatiana_K nope they didn't have it,2009-04-07T05:20:05.000+0000,0.0,nope they didn't have it,"List(nope, they, didn, t, have, it)","List(nope, didn)","Map(vectorType -> sparse, length -> 22150, indices -> List(69, 691), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.79513117152748, -16.11864752311122))","Map(vectorType -> dense, length -> 2, values -> List(0.7897661408990221, 0.21023385910097786))",0.0
@twittera que me muera ?,2009-04-07T05:20:09.000+0000,0.0,que me muera ?,"List(que, me, muera)","List(que, muera)","Map(vectorType -> sparse, length -> 22150, indices -> List(2371), values -> List(1.0))","Map(vectorType -> dense, length -> 2, values -> List(-10.504218124777754, -10.58930814153326))","Map(vectorType -> dense, length -> 2, values -> List(0.5212596785128979, 0.4787403214871022))",0.0
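
The `rawPrediction` and `probability` columns above are related: the displayed probabilities are consistent with normalizing the log-scale raw scores with a softmax. The following plain-Python sketch checks this for the first displayed row; the helper name `raw_to_probability` is illustrative.

```python
import math

def raw_to_probability(raw):
    """Normalize log-scale class scores into probabilities (softmax)."""
    shift = max(raw)                            # subtract the max for numerical stability
    exps = [math.exp(r - shift) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]

# rawPrediction values of the first displayed row.
raw = [-73.10331329092534, -75.76040428183782]
probs = raw_to_probability(raw)
print(probs)
```

The first entry comes out at about 0.934, in line with the `probability` column of that row, and the predicted class is simply the index of the larger value.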


## Step 5: Obtain the Accuracy of the Model using Our Predefined Evaluator

In [0]:
e.evaluate(results)

Out[18]: 0.77388625
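
The accuracy metric is the fraction of rows whose `prediction` equals the `label`. As a small illustration, the sketch below computes it in plain Python for only the ten sample rows displayed in Step 4 (the value above is over the full training set); the `accuracy` helper is a hypothetical name.

```python
def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# Labels and predictions of the ten sample rows displayed in Step 4.
sample_labels = [0.0] * 10
sample_predictions = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]

print(accuracy(sample_labels, sample_predictions))
```

Note that the evaluation above scores the model on the same data it was trained on, so it is likely to be an optimistic estimate of accuracy on new tweets.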

## Step 6: Save the Model on DBFS at `/FileStore/twitter_nbpipeline/`

- Then explore the saved model using `%fs` commands

In [0]:
pipelineModel.write().overwrite().save("/FileStore/twitter_nbpipeline/")

In [0]:
%fs ls /FileStore/twitter_nbpipeline/

path,name,size,modificationTime
dbfs:/FileStore/twitter_nbpipeline/metadata/,metadata/,0,0
dbfs:/FileStore/twitter_nbpipeline/stages/,stages/,0,0


## Step 7: Test the Saved Model

- Load the saved pipeline model as `pipelineModel2`
- Use it to transform a small sample (e.g. 1000 rows) of the training data
- View the results

In [0]:
from pyspark.ml import PipelineModel
pipelineModel2 = PipelineModel.load('/FileStore/twitter_nbpipeline/')

In [0]:
scored_tweets = pipelineModel2.transform(data_train.limit(1000))
scored_tweets.display()

text,time,label,text_cleaned,words,words_filtered,features,rawPrediction,probability,prediction
"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D",2009-04-07T05:19:45.000+0000,0.0,"- awww, that's a bummer. you shoulda got david carr of third day to do it. ;d","List(awww, that, s, a, bummer, you, shoulda, got, david, carr, of, third, day, to, do, it, d)","List(awww, bummer, shoulda, got, david, carr, third, day, d)","Map(vectorType -> sparse, length -> 22150, indices -> List(2, 11, 72, 349, 737, 1074, 1787, 3381, 9576), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-73.10331329092534, -75.76040428183782))","Map(vectorType -> dense, length -> 2, values -> List(0.9344466971628318, 0.06555330283716827))",0.0
is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!,2009-04-07T05:19:49.000+0000,0.0,is upset that he can't update his facebook by texting it... and might cry as a result school today also. blah!,"List(is, upset, that, he, can, t, update, his, facebook, by, texting, it, and, might, cry, as, a, result, school, today, also, blah)","List(upset, update, facebook, texting, might, cry, result, school, today, also, blah)","Map(vectorType -> sparse, length -> 22150, indices -> List(7, 70, 174, 197, 425, 429, 440, 682, 1018, 1917, 2240), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-84.59414501008106, -89.98197577429755))","Map(vectorType -> dense, length -> 2, values -> List(0.9954489268877933, 0.004551073112206787))",0.0
@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds,2009-04-07T05:19:53.000+0000,0.0,i dived many times for the ball. managed to save 50% the rest go out of bounds,"List(i, dived, many, times, for, the, ball, managed, to, save, 50, the, rest, go, out, of, bounds)","List(dived, many, times, ball, managed, save, 50, rest, go, bounds)","Map(vectorType -> sparse, length -> 22150, indices -> List(5, 216, 256, 370, 800, 981, 1170, 1578), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-63.04542955499372, -63.78005141899811))","Map(vectorType -> dense, length -> 2, values -> List(0.67581868825521, 0.32418131174479003))",0.0
my whole body feels itchy and like its on fire,2009-04-07T05:19:57.000+0000,0.0,my whole body feels itchy and like its on fire,"List(my, whole, body, feels, itchy, and, like, its, on, fire)","List(whole, body, feels, itchy, like, fire)","Map(vectorType -> sparse, length -> 22150, indices -> List(4, 331, 381, 705, 1043, 2814), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-47.0100517499005, -50.54793735282175))","Map(vectorType -> dense, length -> 2, values -> List(0.9717467190554288, 0.02825328094457132))",0.0
"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there.",2009-04-07T05:19:57.000+0000,0.0,"no, it's not behaving at all. i'm mad. why am i here? because i can't see you all over there.","List(no, it, s, not, behaving, at, all, i, m, mad, why, am, i, here, because, i, can, t, see, you, all, over, there)","List(behaving, m, mad, see)","Map(vectorType -> sparse, length -> 22150, indices -> List(0, 21, 493, 10182), values -> List(1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-30.130673621754106, -31.075616317275248))","Map(vectorType -> dense, length -> 2, values -> List(0.720096976808941, 0.27990302319105903))",0.0
@Kwesidei not the whole crew,2009-04-07T05:20:00.000+0000,0.0,not the whole crew,"List(not, the, whole, crew)","List(whole, crew)","Map(vectorType -> sparse, length -> 22150, indices -> List(331, 2084), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-17.983945382297364, -17.91495730669582))","Map(vectorType -> dense, length -> 2, values -> List(0.48275981823545605, 0.517240181764544))",1.0
Need a hug,2009-04-07T05:20:03.000+0000,0.0,need a hug,"List(need, a, hug)","List(need, hug)","Map(vectorType -> sparse, length -> 22150, indices -> List(35, 815), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.601759941781173, -15.43562112134421))","Map(vectorType -> dense, length -> 2, values -> List(0.6971707364117425, 0.30282926358825757))",0.0
"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?",2009-04-07T05:20:03.000+0000,0.0,"hey long time no see! yes.. rains a bit ,only a bit lol , i'm fine thanks , how's you ?","List(hey, long, time, no, see, yes, rains, a, bit, only, a, bit, lol, i, m, fine, thanks, how, s, you)","List(hey, long, time, see, yes, rains, bit, bit, lol, m, fine, thanks)","Map(vectorType -> sparse, length -> 22150, indices -> List(0, 12, 13, 21, 31, 76, 78, 88, 162, 423, 2559), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-80.42899760997152, -76.07727296338678))","Map(vectorType -> dense, length -> 2, values -> List(0.012720671663382397, 0.9872793283366176))",1.0
@Tatiana_K nope they didn't have it,2009-04-07T05:20:05.000+0000,0.0,nope they didn't have it,"List(nope, they, didn, t, have, it)","List(nope, didn)","Map(vectorType -> sparse, length -> 22150, indices -> List(69, 691), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.79513117152748, -16.11864752311122))","Map(vectorType -> dense, length -> 2, values -> List(0.7897661408990221, 0.21023385910097786))",0.0
@twittera que me muera ?,2009-04-07T05:20:09.000+0000,0.0,que me muera ?,"List(que, me, muera)","List(que, muera)","Map(vectorType -> sparse, length -> 22150, indices -> List(2371), values -> List(1.0))","Map(vectorType -> dense, length -> 2, values -> List(-10.504218124777754, -10.58930814153326))","Map(vectorType -> dense, length -> 2, values -> List(0.5212596785128979, 0.4787403214871022))",0.0
