# When Rotten Tomatoes Isn't Enough: Twitter Sentiment Analysis with DSE

### Things to Set Up
* Create a Twitter account and get API access: https://developer.twitter.com/en/docs/ads/general/guides/getting-started.html
* Install DSE: https://docs.datastax.com/en/install/doc/install60/installTOC.html
* Start a DSE Analytics cluster: dse cassandra -k (the -k option is required for Analytics)
* Install Anaconda and Jupyter (Anaconda is not required, but it makes installing Jupyter easier)
* Start Jupyter through DSE so it picks up all the environment variables: dse exec jupyter notebook
* !pip install cassandra-driver
* !pip install tweepy
* !pip install pattern
* Counterintuitively, do not install pyspark! The notebook uses the version that ships with DSE.

#### Extend the Python path to find the DSE version of pyspark

In [1]:
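# Needed to be able to find the pyspark libraries.
# Python does not expand "~" in sys.path entries, so substitute the absolute path of your DSE install.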
import sys
sys.path.append("~dse-6.0.1/resources/spark/python/lib/pyspark.zip")
sys.path.append("~dse-6.0.1/resources/spark/python/lib/py4j-0.10.4-src.zip")
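A minimal sketch of the same path setup as a reusable helper, assuming the archives live under `resources/spark/python/lib` as in the cell above. The function name and the `dse_home` argument are hypothetical; point `dse_home` at wherever DSE is installed on your machine.

```python
import os
import sys

def add_dse_pyspark_to_path(dse_home, py4j_zip="py4j-0.10.4-src.zip"):
    """Append DSE's bundled pyspark and py4j archives to sys.path.

    dse_home is a hypothetical argument: the root directory of the DSE install.
    A leading "~" is expanded here because sys.path entries are used literally.
    """
    lib_dir = os.path.join(os.path.expanduser(dse_home),
                           "resources", "spark", "python", "lib")
    added = []
    for name in ("pyspark.zip", py4j_zip):
        path = os.path.join(lib_dir, name)
        if path not in sys.path:  # avoid duplicates when the cell is re-run
            sys.path.append(path)
        added.append(path)
    return added
```

Calling it once before `import pyspark` has the same effect as the two `sys.path.append` lines, and re-running the cell does not add duplicate entries.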

#### Import python packages -- all are required

In [2]:
import pandas
import cassandra
import pyspark
import tweepy
import re
from IPython.display import display, HTML
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, RegexTokenizer, StopWordsRemover
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType
from pattern.en import sentiment, positive

#### Helper function to have nicer formatting of Spark DataFrames

In [3]:
#Helper for pretty formatting for Spark DataFrames
def showDF(df, limitRows =  20, truncate = True):
    if(truncate):
        pandas.set_option('display.max_colwidth', 50)
    else:
        pandas.set_option('display.max_colwidth', -1)
    pandas.set_option('display.max_rows', limitRows)
    display(df.limit(limitRows).toPandas())
    pandas.reset_option('display.max_rows')

### Creating Tables, Pulling Tweets, and Loading Tables

#### Connect to DSE Analytics Cluster

In [4]:
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1']) #If you have a locally installed DSE cluster
session = cluster.connect()

#### Create Demo Keyspace 

In [5]:
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS dseanalyticsdemo 
    WITH REPLICATION = 
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }"""
)

<cassandra.cluster.ResultSet at 0x1059452d0>

#### Set keyspace 

In [6]:
session.set_keyspace('dseanalyticsdemo')

#### Set Movie Title variable --Change this to search for different movies!

In [52]:
movieTitle = "jurassicworld"

In [53]:
positiveNegative = ["pos", "sad"] 

#### Create two tables in Cassandra for the movie title: one for negative tweets and one for positive tweets. Twitter returns a lot of information with each call, but this demo uses only the Twitter ID (as the primary key, since it is unique) and the text of the tweet.
#### Is the Twitter ID the right value to distribute by? Consider your data model when choosing your primary key.

In [54]:
for emotion in positiveNegative: 
    
    query = "CREATE TABLE IF NOT EXISTS movie_tweets_%s_%s (twitterid bigint, tweet text, PRIMARY KEY (twitterid))" % (movieTitle, emotion)
    print query
    session.execute(query)


CREATE TABLE IF NOT EXISTS movie_tweets_antman_pos (twitterid bigint, tweet text, PRIMARY KEY (twitterid))
CREATE TABLE IF NOT EXISTS movie_tweets_antman_sad (twitterid bigint, tweet text, PRIMARY KEY (twitterid))
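Table names cannot be bound as query parameters in CQL, which is why the cell above builds the statement with string formatting. A hedged sketch of guarding that step: it checks that the generated name is a plain unquoted identifier before it goes into the statement. The helper names are illustrative and not part of the notebook.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits, or underscores.
_CQL_IDENTIFIER = re.compile(r"\A[a-zA-Z][a-zA-Z0-9_]*\Z")

def tweet_table_name(movie_title, emotion):
    """Build movie_tweets_<title>_<emotion>, rejecting anything unsafe."""
    name = "movie_tweets_%s_%s" % (movie_title, emotion)
    if not _CQL_IDENTIFIER.match(name):
        raise ValueError("not a valid unquoted CQL identifier: %r" % name)
    return name

def create_table_statement(movie_title, emotion):
    """Return the CREATE TABLE statement used in this demo."""
    return ("CREATE TABLE IF NOT EXISTS %s "
            "(twitterid bigint, tweet text, PRIMARY KEY (twitterid))"
            % tweet_table_name(movie_title, emotion))
```

A title such as "ant man" or "antman; DROP" raises `ValueError` instead of producing a malformed or dangerous statement.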


#### Set up the search terms for gathering tweets from Twitter's API. The happy :) and sad :( faces are Twitter search operators that find positive and negative tweets.

In [55]:
searchTermSad= movieTitle + " :("
searchTermPos= movieTitle + " :)"

searchTerms = [searchTermSad, searchTermPos]

#### Function to clean up each tweet before it is inserted into Cassandra.
#### Removing: 
* emojis 
* flags 
* special characters 
* URLs 
* RT (for Retweet)

In [56]:
#Code from: https://stackoverflow.com/questions/33404752/removing-emojis-from-a-string-in-

def cleanUpTweet(tweet):
    
    emoji_pattern = re.compile(
    u"(\ud83d[\ude00-\ude4f])|"
    u"(\ud83c[\udf00-\uffff])|"  
    u"(\ud83d[\u0000-\uddff])|" 
    u"(\ud83d[\ude80-\udeff])|"  
    u"(\ud83c[\udde0-\uddff])" 
    "+", flags=re.UNICODE)

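    removeSpecial = re.compile(r'[\n#@!.?,"]')  # inside [...] every character is literal, so no | separators
    removeHttp = re.compile(r"https?\S+")  # matches http and https URLs anywhere in the tweet
    removeRetweet = re.compile(r"\bRT\b")  # only the standalone retweet marker, not "RT" inside a word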
    
    noemoji = emoji_pattern.sub(r'', tweet)
    nospecial = removeSpecial.sub(r'', noemoji)
    nohttp = removeHttp.sub(r'', nospecial)
    noretweet = removeRetweet.sub(r'', nohttp)
    
    cleanTweet=noretweet
    
    return cleanTweet
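The emoji pattern above is written as surrogate pairs, which suits narrow Python 2 builds. As a rough alternative, here is a sketch of the same cleanup using code-point ranges so that it also runs on Python 3. The function name and the exact ranges are assumptions for illustration, not the notebook's own code.

```python
import re

# Code-point ranges rather than surrogate pairs (an assumption of this sketch).
EMOJI = re.compile(
    u"[\U0001F600-\U0001F64F"    # emoticons
    u"\U0001F300-\U0001F5FF"     # symbols and pictographs
    u"\U0001F680-\U0001F6FF"     # transport and map symbols
    u"\U0001F1E0-\U0001F1FF]+",  # regional indicators (flags)
    flags=re.UNICODE)
URL = re.compile(r"https?\S+")
RETWEET = re.compile(r"\bRT\b")
SPECIAL = re.compile(r'[\n#@!.?,"]')

def clean_tweet_sketch(tweet):
    """Strip emojis, URLs, the retweet marker, and special characters."""
    text = EMOJI.sub("", tweet)
    text = URL.sub("", text)      # remove URLs before their punctuation is stripped
    text = RETWEET.sub("", text)
    text = SPECIAL.sub("", text)
    return text.strip()
```

For example, `clean_tweet_sketch(u"RT @fan: Loved #AntMan! https://t.co/abc123")` returns `"fan: Loved AntMan"`, while a word such as "SMART" keeps its letters because only a standalone "RT" is removed.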

#### Required from Twitter: 
* consumer_key= ''
* consumer_secret= ''
* access_token=''
* access_token_secret=''

In [57]:
consumer_key= ''
consumer_secret= ''

access_token=''
access_token_secret=''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)
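To keep the four secrets out of the notebook itself, one option is to read them from environment variables. This is only a sketch: the variable names below are hypothetical, and the helper is not part of tweepy.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
CREDENTIAL_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_twitter_credentials(environ=None):
    """Return the four credentials as a dict, failing loudly if any is unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in CREDENTIAL_NAMES if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_NAMES}
```

The returned values can then be passed to `tweepy.OAuthHandler` and `auth.set_access_token` exactly as in the cell above.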

#### This cell pulls tweets from Twitter. The maximum number of tweets returned for free in one call is 100. 
#### Run this code a couple of times to get more data! 
#### Once the tweets are collected, loop over the list, clean up each tweet, and insert it into the table. An outer for loop makes one call for positive tweets and one call for negative tweets. 

In [58]:
for emotion in positiveNegative:
    print emotion
    query = "INSERT INTO movie_tweets_%s_%s (twitterid, tweet)" % (movieTitle, emotion)
    query = query + " VALUES (%s, %s)"
    
    # Twitter's :) and :( search operators select positive and negative tweets
    if emotion == "pos":
        searchTerm = movieTitle + " :)"
    else:
        searchTerm = movieTitle + " :("
    public_tweets = api.search(q=searchTerm, lang="en", count=100)

    for tweet in public_tweets:
        cleanTweet = cleanUpTweet(tweet.text)
        session.execute(query, (tweet.id, cleanTweet))
        print(cleanTweet)

pos
harleivy I thought the first gotg was really good but the second one just fell kinda flat on its face every situ…
FoxInTheFridge YAS AntMan 2 :DI feel like either you won’t like it or you’ll love it :P
tara nuod antman &amp; the wasp :)
Just back from seeing AntMan at the CineplexMovies in New Minas good popcorn flick Dig the 3-D shades eh…
 alexalbrecht: Thanks to Dell and MoviePass I saw AntMan AGAIN  Even better the second time :) WishIWasGhost CloseUpMagic
Styles_Justinn Really good movie Justin :) I am trying to see the new Jurassic World this summer and perhaps A…
Thanks to Dell and MoviePass I saw AntMan AGAIN  Even better the second time :) WishIWasGhost CloseUpMagic
AdamDoesMovies_ Antman and the Wasp was good but not better than the first Everyone laughed in my theater at the…
RealD_FR AntMan Super :) AntManetlaGuepe
 angiesavocados: Wow I thought Antman and the Wasp would end up being a nice happy Marvel film for once but yet again Marvel has disappo…
Wow I thought Antma

I'm sorry for losing the two Antman tickets :(
omg I LOVE ANTMAN hes such a good dad :(
Barely one show each for incredibles 2 Hotel Transylvania 3 Antman and the wasp Odd timings too  Wott izzzz :(
 atqhrlsn: i just wanna watch antman and the wasp :(
all i want is to watch antman and the wasp okay :(((((
i just wanna watch antman and the wasp :(
FunkoPOPsNews got classic antman :) unfortunately kraglin went out of stock right as I was adding him to my cart :(
indomymenfess inget antman :(
wanna watch antman and the wasp :(
OKAY ANTMAN AND THE WASP WAS REALLY GOOD BUT THE END SCENE MADE ME :((
 WeSupport_SRK: Maybe a bit late but that last post-credits scene from AntManAndTheWasp really struck a nerve with me  Tonally dispar…
i wanna watch antman and the wasp :(((
Antman and the wasp was so pure :(
No he visto Antman :(
 WeSupport_SRK: Maybe a bit late but that last post-credits scene from AntManAndTheWasp really struck a nerve with me  Tonally dispar…
Maybe a bit late but that last po

#### Do a select * on each table and verify that the tweets have been inserted into each Cassandra table

In [59]:
for emotion in positiveNegative:
    print emotion
    query = 'SELECT * FROM movie_tweets_%s_%s' % (movieTitle, emotion)
    rows = session.execute(query)
    for user_row in rows:
        print (user_row.twitterid, user_row.tweet)

pos
(1018294728058548224, u'SCENECard Hey folks Doing date night with the wife AntMan and noticed my SCENEcard looks like it\u2019s back from a f\u2026')
(1018289342442295303, u'MovieMantz PNemiroff TheInSneider ColliderVideo AntMan DisneyStudios MarvelStudios Sorry2BotherYou Finally\u2026')
(1020112947132223489, u'Just back from seeing AntMan at the CineplexMovies in New Minas good popcorn flick Dig the 3-D shades eh\u2026')
(1019661780883263488, u'spoilez pas Antman 2 la miff be kind :)')
(1017587989340250112, u'abhinav_kr Yesterday was antman and the wasp :)')
(1018611948584239104, u'Going to watch antman today :D')
(1020296719274131458, u'tara nuod antman &amp; the wasp :)')
(1019703638485291012, u'NiceToMeetcha Hahaha I bet you were really confused like wtf was going on at the end of Antman and the Wasp :D')
(1019793190197628929, u'i still need to watch- deadpool 2- antman &amp; the wasp :)')
(1018451121520685058, u'EvangelineLilly So in Antman and the wasp the movie was about res

### Finally time for Apache Spark! 

#### Create a Spark session that is connected to Cassandra. From there, load each table into a Spark DataFrame and count the rows in each.

In [60]:
countTokens = udf(lambda words: len(words), IntegerType())

spark = SparkSession.builder.appName('demo').master("local").getOrCreate()

tableNamePos = "movie_tweets_%s_pos" % (movieTitle)
tableNameSad = "movie_tweets_%s_sad" % (movieTitle)
tablepos = spark.read.format("org.apache.spark.sql.cassandra").options(table=tableNamePos, keyspace="dseanalyticsdemo").load()
tablesad = spark.read.format("org.apache.spark.sql.cassandra").options(table=tableNameSad, keyspace="dseanalyticsdemo").load()

print "Positive Table Count: "
print tablepos.count()
print "Negative Table Count: "
print tablesad.count()


Positive Table Count: 
100
Negative Table Count: 
57


#### Use Tokenizer to break up the sentences into individual words

In [61]:
tokenizerPos = Tokenizer(inputCol="tweet", outputCol="tweetwords")
tokenizedPos = tokenizerPos.transform(tablepos)

dfPos = tokenizedPos.select("tweet", "tweetwords").withColumn("tokens", countTokens(col("tweetwords")))

showDF(dfPos)

tokenizerSad = Tokenizer(inputCol="tweet", outputCol="tweetwords")
tokenizedSad = tokenizerSad.transform(tablesad)

dfSad = tokenizedSad.select("tweet", "tweetwords").withColumn("tokens", countTokens(col("tweetwords")))

showDF(dfSad)

Unnamed: 0,tweet,tweetwords,tokens
0,angiesavocados: Wow I thought Antman and the ...,"[, angiesavocados:, wow, i, thought, antman, a...",26
1,watching antman 2 tomorrow :D,"[watching, antman, 2, tomorrow, :d]",5
2,Most POPular Podcast latest episode talks all ...,"[most, popular, podcast, latest, episode, talk...",17
3,Just saw AntMan &amp; The Wasp - that was a fu...,"[just, saw, antman, &amp;, the, wasp, -, that,...",22
4,BenMkWrites thomallison SYFY SpaceChannel hann...,"[benmkwrites, thomallison, syfy, spacechannel,...",10
5,Watched Ant Man for the first time last night ...,"[watched, ant, man, for, the, first, time, las...",24
6,butchandriley AntMan So it seems :-) I discove...,"[butchandriley, antman, so, it, seems, :-), i,...",21
7,c8flores: what i say: aw :)what i mean: antma...,"[, c8flores:, what, i, say:, aw, :)what, i, me...",13
8,Dastmalchian cheddar AntMan PersonaPR MarvelSt...,"[dastmalchian, cheddar, antman, personapr, mar...",15
9,b0iledfr0gs Yea it seems like a really amazing...,"[b0iledfr0gs, yea, it, seems, like, a, really,...",20


Unnamed: 0,tweet,tweetwords,tokens
0,TheMamaDao But but he's Antman :(,"[themamadao, but, , but, he's, antman, :(]",7
1,Marvel AntMan I have to wait until Aug 3rd for...,"[marvel, antman, i, have, to, wait, until, aug...",19
2,Las escenas post creditos de Antman and the wa...,"[las, escenas, post, creditos, de, antman, and...",14
3,wanna watch antman and the wasp :(,"[wanna, watch, antman, and, the, wasp, :(]",7
4,want to watch antman :((((((((,"[want, to, watch, antman, :((((((((]",5
5,the end credit scene of Antman &amp; the Wasp ...,"[the, end, credit, scene, of, antman, &amp;, t...",14
6,i wanna watch antman and the wasp :(((,"[i, wanna, watch, antman, and, the, wasp, :(((]",8
7,Antman and the wasp was so pure :(,"[antman, and, the, wasp, was, so, pure, :(]",8
8,someone in san pedro laguna wanna watch antman...,"[someone, in, san, pedro, laguna, wanna, watch...",13
9,and i wanna watch antman 2 but no time :(,"[and, i, wanna, watch, antman, 2, but, no, tim...",10
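The empty tokens in the output above (for example `[themamadao, but, , but, ...]`) appear because Tokenizer lowercases the text and splits on every single whitespace character, so two spaces in a row yield an empty string. The plain-Python sketch below approximates that behavior next to what RegexTokenizer (imported earlier but unused) does with its default `\s+` pattern. The function names are illustrative and only mimic the Spark transformers.

```python
import re

def whitespace_tokens(text):
    """Approximate Spark's Tokenizer: lowercase, split on each whitespace char."""
    tokens = re.split(r"\s", text.lower())
    # The JVM split drops trailing empty strings; mirror that here.
    while tokens and tokens[-1] == "":
        tokens.pop()
    return tokens

def regex_tokens(text):
    """Approximate RegexTokenizer with pattern \\s+: runs of whitespace are one gap."""
    return [token for token in re.split(r"\s+", text.lower()) if token]

tweet = "TheMamaDao But  but he's Antman :("
print(whitespace_tokens(tweet))  # contains an empty token between the two "but"s
print(regex_tokens(tweet))       # no empty tokens
```

In the notebook, replacing `Tokenizer(...)` with `RegexTokenizer(inputCol="tweet", outputCol="tweetwords", pattern="\\s+")` should give token counts that are not inflated by empty strings.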


#### Use StopWordsRemover to remove all stop words. Interestingly, people don't use many stop words on Twitter!

In [62]:
removerPos = StopWordsRemover(inputCol="tweetwords", outputCol="tweetnostopwords")
removedPos = removerPos.transform(dfPos)

dfPosStop = removedPos.select("tweet", "tweetwords", "tweetnostopwords").withColumn("tokens", countTokens(col("tweetwords"))).withColumn("notokens", countTokens(col("tweetnostopwords")))

showDF(dfPosStop)

removerSad = StopWordsRemover(inputCol="tweetwords", outputCol="tweetnostopwords")
removedSad = removerSad.transform(dfSad)

dfSadStop = removedSad.select("tweet", "tweetwords", "tweetnostopwords").withColumn("tokens", countTokens(col("tweetwords"))).withColumn("notokens", countTokens(col("tweetnostopwords")))

showDF(dfSadStop)

Unnamed: 0,tweet,tweetwords,tweetnostopwords,tokens,notokens
0,SCENECard Hey folks Doing date night with the ...,"[scenecard, hey, folks, doing, date, night, wi...","[scenecard, hey, folks, date, night, wife, ant...",21,14
1,MovieMantz PNemiroff TheInSneider ColliderVide...,"[moviemantz, pnemiroff, theinsneider, collider...","[moviemantz, pnemiroff, theinsneider, collider...",9,9
2,Just back from seeing AntMan at the CineplexMo...,"[just, back, from, seeing, antman, at, the, ci...","[back, seeing, antman, cineplexmovies, new, mi...",19,13
3,spoilez pas Antman 2 la miff be kind :),"[spoilez, pas, antman, 2, la, miff, be, kind, :)]","[spoilez, pas, antman, 2, la, miff, kind, :)]",9,8
4,abhinav_kr Yesterday was antman and the wasp :),"[abhinav_kr, yesterday, was, antman, and, the,...","[abhinav_kr, yesterday, antman, wasp, :)]",8,5
5,Going to watch antman today :D,"[going, to, watch, antman, today, :d]","[going, watch, antman, today, :d]",6,5
6,tara nuod antman &amp; the wasp :),"[tara, nuod, antman, &amp;, the, wasp, :)]","[tara, nuod, antman, &amp;, wasp, :)]",7,6
7,NiceToMeetcha Hahaha I bet you were really con...,"[nicetomeetcha, hahaha, i, bet, you, were, rea...","[nicetomeetcha, hahaha, bet, really, confused,...",22,12
8,i still need to watch- deadpool 2- antman &amp...,"[i, still, need, to, watch-, deadpool, 2-, ant...","[still, need, watch-, deadpool, 2-, antman, &a...",12,9
9,EvangelineLilly So in Antman and the wasp the ...,"[evangelinelilly, so, in, antman, and, the, wa...","[evangelinelilly, antman, wasp, movie, rescuin...",20,9


Unnamed: 0,tweet,tweetwords,tweetnostopwords,tokens,notokens
0,OKAY ANTMAN AND THE WASP WAS REALLY GOOD BUT T...,"[okay, antman, and, the, wasp, was, really, go...","[okay, antman, wasp, really, good, end, scene,...",15,9
1,Antman &amp; the wasp was a bop :(,"[antman, &amp;, the, wasp, was, a, bop, :(]","[antman, &amp;, wasp, bop, :(]",8,5
2,i just wanna watch antman and the wasp :(,"[i, just, wanna, watch, antman, and, the, wasp...","[wanna, watch, antman, wasp, :(]",9,5
3,i haven't watched antman and the wasp yet :-(,"[i, haven't, watched, antman, and, the, wasp, ...","[watched, antman, wasp, yet, :-(]",9,5
4,No he visto Antman :(,"[no, he, visto, antman, :(]","[visto, antman, :(]",5,3
5,I wanna go watch antman and the wasp but I can...,"[i, wanna, go, watch, antman, and, the, wasp, ...","[wanna, go, watch, antman, wasp, can’t, bc, sh...",29,15
6,antman and the wasp pls :(:,"[antman, and, the, wasp, pls, :(:]","[antman, wasp, pls, :(:]",6,4
7,Dastmalchian AntMan What When did she happen W...,"[dastmalchian, antman, what, when, did, she, h...","[dastmalchian, antman, happen, whoa:-(:-o]",8,4
8,The ending of antman put me in the feels Remi...,"[the, ending, of, antman, put, me, in, the, fe...","[ending, antman, put, feels, , reminded, infin...",17,10
9,Honestly idc how they feel about Scott not inv...,"[honestly, idc, how, they, feel, about, scott,...","[honestly, idc, feel, scott, inviting, cassie’...",24,13
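What StopWordsRemover does to each row can be pictured with a few lines of plain Python. This is a sketch only: the stop-word set below is a tiny hypothetical subset, whereas Spark ships a much longer default English list, and like Spark's default the comparison here ignores case.

```python
# Hypothetical, deliberately tiny stop-word list for illustration.
SAMPLE_STOP_WORDS = frozenset(["i", "and", "the", "was", "to", "a", "just"])

def remove_stop_words(tokens, stop_words=SAMPLE_STOP_WORDS):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [token for token in tokens if token.lower() not in stop_words]

tokens = ["i", "just", "wanna", "watch", "antman", "and", "the", "wasp", ":("]
print(remove_stop_words(tokens))
```

Emoticons such as `:(` are not stop words, so they survive this step and remain available to the sentiment scoring that follows.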


### Sentiment Analysis Using the Python Package Pattern

#### Convert each Spark DataFrame to a pandas DataFrame. From there, loop over each row and get the sentiment score, returned as a (polarity, subjectivity) pair. A polarity above zero leans positive, and a polarity of zero or below is treated as negative. The "positive" function returns True if the tweet is positive, which in Pattern means its polarity reaches a threshold (0.1 by default). For more information on how the scores are calculated: https://www.clips.uantwerpen.be/pages/pattern-en#sentiment

#### Negative Tweets

In [63]:
pandaSad = dfSadStop.toPandas()
movieScoreSad = 0

for index, row in pandaSad.iterrows():
    print row['tweet']
    print sentiment(row["tweetnostopwords"])
    print positive(row["tweetnostopwords"])
    if positive(row["tweetnostopwords"]):
        print "This is a negative tweet! Analysis is wrong :("
    scoreSad = sentiment(row['tweetnostopwords'])[0]
    movieScoreSad = scoreSad + movieScoreSad

OKAY ANTMAN AND THE WASP WAS REALLY GOOD BUT THE END SCENE MADE ME :((
(0.6, 0.55)
True
This is a negative tweet! Analysis is wrong :(
Antman &amp; the wasp was a bop :(
(-0.75, 1.0)
False
i just wanna watch antman and the wasp :(
(-0.75, 1.0)
False
i haven't watched antman and the wasp yet :-(
(-0.75, 1.0)
False
No he visto Antman :(
(-0.75, 1.0)
False
I wanna go watch antman and the wasp but I can’t bc the showing is too late so I can’t make it in time for the last train :((
(-0.15, 0.3333333333333333)
False
antman and the wasp pls :(:
(0.0, 0.0)
False
Dastmalchian AntMan What When did she happen Whoa:-(:-O
(0.0, 0.0)
False
The ending of antman put me in the feels  Reminded me of infinity wars :( IDontFeelSoGoodMrStark
(-0.75, 1.0)
False
Honestly idc how they feel about Scott not inviting him to Cassie’s birthday in antman was literally so rude and it’s so heartbreaking :(
(-0.15, 0.8333333333333334)
False
No he visto antman ni ni Jurassic world :(
(-0.75, 1.0)
False
ok pls talk to m
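The check in the loop above boils down to comparing each polarity against a threshold. Here is a small sketch of that logic without the Pattern dependency, assuming Pattern's default threshold of 0.1 for `positive`. The helper names are hypothetical, and the polarities are copied from the output above.

```python
def is_positive(polarity, threshold=0.1):
    """Mirror the assumed rule: positive when polarity reaches the threshold."""
    return polarity >= threshold

def count_misclassified(polarities, expected_positive, threshold=0.1):
    """Count scores whose predicted label disagrees with the table they came from."""
    return sum(1 for polarity in polarities
               if is_positive(polarity, threshold) != expected_positive)

# Polarities printed above for the tweets in the negative table.
sad_polarities = [0.6, -0.75, -0.75, -0.75, -0.75, -0.15,
                  0.0, 0.0, -0.75, -0.15, -0.75]
print(count_misclassified(sad_polarities, expected_positive=False))
```

Only the first tweet, scored 0.6 because of "REALLY GOOD", is flagged, which matches the single "Analysis is wrong" message in the output shown.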

#### Positive Tweets
#### Also add up the sentiment scores of all the tweets

In [64]:
pandaPos = dfPosStop.toPandas()
movieScore = 0

for index, row in pandaPos.iterrows():
    print row['tweet']
    print sentiment(row["tweetnostopwords"])
    print positive(row["tweetnostopwords"])
    if not positive(row["tweetnostopwords"]):
        print "This is a positive tweet! Analysis is wrong :("
    score = sentiment(row['tweetnostopwords'])[0]
    movieScore = score + movieScore
    

 angiesavocados: Wow I thought Antman and the Wasp would end up being a nice happy Marvel film for once but yet again Marvel has disappo…
(0.5, 1.0)
True
watching antman 2 tomorrow :D
(1.0, 1.0)
True
Most POPular Podcast latest episode talks all things pop culture: Rugrats She-Ra Joker Ditko Incredibles 2 and…
(0.55, 0.9)
True
Just saw AntMan &amp; The Wasp - that was a fucking awesome story/follow-up Also really glad to see that Scott had to…
(0.75, 1.0)
True
BenMkWrites thomallison SYFY SpaceChannel hannahjk1 Killjoys lovretta AaronRAshmore MarvelStudios AntMan…
(0.0, 0.0)
False
This is a positive tweet! Analysis is wrong :(
Watched Ant Man for the first time last night So is it weird that now when I'm out walking I find myself way mor…
(-0.08333333333333333, 0.4666666666666666)
False
This is a positive tweet! Analysis is wrong :(
butchandriley AntMan So it seems :-) I discovered science fiction quite by accident in the '50s when I was a tiny…
(0.5, 1.0)
True
 c8flores: what i say: aw 

### Alright! Should I see this movie???

In [65]:
posrating = movieScore/dfPos.count()
print "Positive Rating Average Score: " 
print posrating
sadrating = movieScoreSad/dfSad.count()
print "Negative Rating Average Score:"
print sadrating

if posrating > abs(sadrating):
    print "People like this movie!"
elif posrating == abs(sadrating):
    print "People are split on this movie! Take a risk!"
elif posrating < abs(sadrating):
    print "People do not like this movie!"


Positive Rating Average Score: 
0.40945886544
Negative Rating Average Score:
-0.327175103162
People like this movie!
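Two floating-point averages are almost never exactly equal, so the "split" branch above will rarely fire. The sketch below runs the same comparison with a small tolerance band; the `tolerance` value of 0.05 and the function names are hypothetical choices for illustration.

```python
def average(scores):
    """Mean of a list of polarity scores; 0.0 for an empty list."""
    return sum(scores) / float(len(scores)) if scores else 0.0

def verdict(pos_avg, sad_avg, tolerance=0.05):
    """Compare the positive average with the magnitude of the negative average."""
    gap = pos_avg - abs(sad_avg)
    if abs(gap) <= tolerance:
        return "People are split on this movie! Take a risk!"
    if gap > 0:
        return "People like this movie!"
    return "People do not like this movie!"

# Averages taken from the output above.
print(verdict(0.40945886544, -0.327175103162))
```

With the averages from this run the gap is about 0.08, so the verdict stays the same as in the original cell.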
