# Tutorial 2: Perform sentiment analysis using existing toolkits

## Introduction

In this tutorial you will perform basic sentiment analysis on a collection of Twitter data called [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140), using the TextBlob and VADER sentiment analysis tools. These tools use a lexicon-based (i.e. based on a dictionary/vocabulary of words and their respective sentiment scores - see the example lexicon in the table below) and rule-based approach to classify text as either negative, neutral or positive.

![lexicon.png](attachment:lexicon.png)

Lexicon-based methods make use of a collection of words, each of which is assigned a polarity score, to decide the general sentiment score of a given text. This method is more accessible as it does not require the training of a model to be able to classify sentiment. However, the fact that sentiment is calculated based on the polarity score of each word in the text can lead to errors in sentiment classification. Such errors may arise because the method does not understand language traits such as sarcasm, or variations in how the same word is used. Take, for example, the following phrases: "He killed many people" and "I killed it in that exam". In the second phrase the word "killed" is not used in a negative context.
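To make this limitation concrete, here is a minimal sketch of a lexicon-based scorer. The three-word `TOY_LEXICON` and the `toy_polarity` function are invented for illustration and are not taken from TextBlob or VADER; real lexicons contain thousands of entries and apply additional rules.

```python
# A toy lexicon: the words and scores are made up for illustration only.
TOY_LEXICON = {"killed": -0.8, "awful": -1.0, "great": 0.8}

def toy_polarity(text):
    """Average the lexicon scores of the known words in `text` (0.0 if none)."""
    words = text.lower().split()
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

# Both phrases receive the same negative score, because the scorer only
# sees the word "killed" and knows nothing about the context it is used in.
print(toy_polarity("He killed many people"))
print(toy_polarity("I killed it in that exam"))
```

Because each word is scored in isolation, the second phrase is treated as negatively as the first.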

Let's start by importing all the libraries needed for this tutorial.

In [2]:
# ___Cell no. 1___

import pandas as pd
import csv
import re
import numpy as np
import plotly.express as px
from plotly.offline import init_notebook_mode

from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from pre import clean_text, remove_stopwords

[nltk_data] Downloading package stopwords to
[nltk_data]     /users/hussein/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


We now read in our Twitter data that has been partially cleaned (as demonstrated in Tutorial 1) and put it into a pandas dataframe.

In [3]:
# ___Cell no. 2___

# Read in the data and put it into a dataframe
df = pd.read_csv("data.csv")

# Have a quick look at the dataframe
df

Unnamed: 0.1,Unnamed: 0,label,ids,data,flag,user,tweet_text
0,670081,0,2246455751,Fri Jun 19 17:37:19 PDT 2009,NO_QUERY,louiseisanelf,@brodiejay OH IM GOING THERE! Wow Mona Vale is...
1,408251,0,2059364084,Sat Jun 06 16:47:03 PDT 2009,NO_QUERY,MrsAmarieB,my baby's growing up
2,1559739,1,2186151891,Mon Jun 15 18:25:49 PDT 2009,NO_QUERY,epallaviccini,Painted Black-Rolling Stones..the best!
3,571248,0,2208723981,Wed Jun 17 09:33:02 PDT 2009,NO_QUERY,Kiwitabby,"kk, i'm logging off now BYEZZ!"
4,524639,0,2193564503,Tue Jun 16 08:37:46 PDT 2009,NO_QUERY,annaqui,Shitty shitty shitty news today
...,...,...,...,...,...,...,...
49995,1402388,1,2054703436,Sat Jun 06 07:55:06 PDT 2009,NO_QUERY,misslilamae,"@tenishae26 hey, your new icon picture is look..."
49996,452328,0,2069991159,Sun Jun 07 16:54:21 PDT 2009,NO_QUERY,poetictitlewave,I am MOST DEFINITELY a cotton-headed ninny mug...
49997,983247,1,1834278721,Mon May 18 02:19:09 PDT 2009,NO_QUERY,jonoabroad,. @JonoH be careful with your bot - it's easy ...
49998,1480049,1,2066790581,Sun Jun 07 11:10:36 PDT 2009,NO_QUERY,celineaura,should sleep NOW. LTO laterrr. i love friendst...


Let us take a random sample of 10,000 tweets.

In [4]:
df = df.sample(n = 10000, random_state = 2)

Now, let us count how many tweets carry each label.

In [5]:
df['label'].value_counts()

1    5003
0    4997
Name: label, dtype: int64

Now let us clean the data in the same way as in the last tutorial.

In [6]:
df['tweet_text']=df['tweet_text'].apply(clean_text)
df['tweet_text']=df['tweet_text'].str.lower()
df['tweet_text']=df['tweet_text'].apply(remove_stopwords)

In [7]:
df['tweet_text']

23656                                     mine coming true
27442                                     changed username
40162                                cheetah print nails !
8459     veronica wont let phone ill school. acually co...
8051                               sick dog , hackin lungs
                               ...                        
44231                     haha bet horrible lol yes please
18034    talkin abt job challenge ! lv ! .. askd u agre...
33856    oh one wants get back bed ! haha .. must work ...
15906                  power gone nans get something eat !
40899              thank god weekend ... algebra exam easy
Name: tweet_text, Length: 10000, dtype: object

The 'label' column consists of sentiment labels which have been assigned to each tweet, i.e. this is a **'labelled'** dataset. A label of 0 or 1 corresponds to an assigned negative or positive sentiment, respectively.

Rather than having humans manually annotate the tweets, the authors of this dataset assumed that any tweet containing a positive emoticon, such as :), was positive, and that any tweet containing a negative emoticon, such as :(, was negative. For more information, see this [link](http://help.sentiment140.com/for-students).
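The emoticon-based labelling idea can be sketched in a few lines. This is only an illustration under stated assumptions: the emoticon lists, the function name `emoticon_label` and the example tweets are hypothetical, and the rules actually used to build Sentiment140 may differ.

```python
# Hypothetical emoticon lists: the lists used for Sentiment140 may differ.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")

def emoticon_label(tweet):
    """Return 1 (positive), 0 (negative), or None if the tweet is ambiguous."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # no emoticon at all, or both kinds present
        return None
    return 1 if has_pos else 0

# Made-up example tweets
print(emoticon_label("finally the weekend :)"))     # 1
print(emoticon_label("missed my train again :("))   # 0
print(emoticon_label("just changed my username"))   # None
```

Labels produced this way are noisy: a sarcastic tweet ending in ":)" would still be labelled positive, which is worth keeping in mind when judging the tools against the 'label' column.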

<hr>

## 1. Sentiment Analysis with TextBlob

The functions below will use TextBlob in order to determine the **Polarity** and **Subjectivity** of each tweet. Polarity is denoted by a score between -1 and 1, with scores < 0, scores = 0 and scores > 0 corresponding to negative, neutral and positive sentiment, respectively. Subjectivity is denoted by a score between 0 and 1, where values near 0 indicate a more fact-based statement and values near 1 indicate a more opinion-based one.

Before making use of the sentiment property of TextBlob, it's good practice to know what's going on behind the scenes. As mentioned previously, TextBlob is a lexicon-based method. For a large library of words, researchers have assigned a respective polarity and subjectivity score. For a given text, TextBlob finds all the words and phrases to which it can assign both a polarity and a subjectivity, and averages them to produce the final output.

TextBlob's sentiment analysis is also rule-based: it applies various actions to existing polarity and subjectivity scores according to a set of pre-defined rules. For example, the word 'good' has a polarity score of 0.7, but if the word 'not' appears in front of it, then in order to determine the polarity of the phrase 'not good', the polarity of 'good' is multiplied by -0.5, yielding a score of -0.35. Punctuation, especially exclamation points, also increases the positive or negative intensity of the polarity score.
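The negation arithmetic described above can be reproduced with a small sketch. The values 0.7 and -0.5 come from the text; the names `TOY_POLARITY`, `NEGATION_FACTOR` and `toy_phrase_polarity` are ours, and the logic is a deliberate simplification rather than TextBlob's actual implementation.

```python
# Values from the text above: 'good' has polarity 0.7 and a preceding
# 'not' multiplies that polarity by -0.5. Everything else is a toy
# simplification, not TextBlob's real code.
TOY_POLARITY = {"good": 0.7}
NEGATION_FACTOR = -0.5

def toy_phrase_polarity(phrase):
    """Average the word polarities, adjusting any word that follows 'not'."""
    words = phrase.lower().split()
    scores = []
    for i, word in enumerate(words):
        if word in TOY_POLARITY:
            score = TOY_POLARITY[word]
            if i > 0 and words[i - 1] == "not":
                score *= NEGATION_FACTOR  # flip the sign and halve the size
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0

print(toy_phrase_polarity("good"))      # 0.7
print(toy_phrase_polarity("not good"))  # -0.35
```

You can compare these toy values with what `TextBlob(...).sentiment.polarity` returns for the same two phrases once the functions in the next cell are defined.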

In [8]:
# ___Cell no. 3___

#Create a function to get the subjectivity
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

#Create a function to get the polarity
def getPolarity(text):
    return TextBlob(text).sentiment.polarity

Let's try out these TextBlob tools by entering any text you'd like - try different examples, i.e. text which you think has positive or negative sentiment and see if TextBlob can get it right! Does punctuation make a difference? How does it handle sarcasm or common text slang such as 'lol'?

In [9]:
# ___Cell no. 4___

your_text = 'this is so cool'
getPolarity(your_text), getSubjectivity(your_text)

(0.35, 0.65)

We now compute the subjectivity and polarity scores for each of our tweets and add them to our dataframe. Notice that our dataset does not contain any neutral labels. However, it will be interesting to see whether `TextBlob` classifies some tweets as neutral.

In [10]:
# ___Cell no. 5___

df['subjectivity'] = df['tweet_text'].apply(getSubjectivity)
df['polarity'] = df['tweet_text'].apply(getPolarity)

df

Unnamed: 0.1,Unnamed: 0,label,ids,data,flag,user,tweet_text,subjectivity,polarity
23656,201104,0,1971818899,Sat May 30 07:58:08 PDT 2009,NO_QUERY,afinahardiana,mine coming true,0.650000,0.350000
27442,827554,1,1556877328,Sun Apr 19 00:47:40 PDT 2009,NO_QUERY,Thigerboy,changed username,0.000000,0.000000
40162,1218086,1,1989798112,Mon Jun 01 02:50:47 PDT 2009,NO_QUERY,karaaaax3,cheetah print nails !,0.000000,0.000000
8459,336203,0,2013947665,Wed Jun 03 00:01:33 PDT 2009,NO_QUERY,jesssicababesss,veronica wont let phone ill school. acually co...,1.000000,-0.625000
8051,611431,0,2224575760,Thu Jun 18 09:40:18 PDT 2009,NO_QUERY,navygrl1303,"sick dog , hackin lungs",0.857143,-0.714286
...,...,...,...,...,...,...,...,...,...
44231,1287523,1,2002375178,Tue Jun 02 04:01:39 PDT 2009,NO_QUERY,Keels_90,haha bet horrible lol yes please,0.666667,0.000000
18034,105352,0,1823005894,Sat May 16 20:53:40 PDT 2009,NO_QUERY,mindbrooklyn,talkin abt job challenge ! lv ! .. askd u agre...,0.000000,0.000000
33856,154765,0,1936017760,Wed May 27 07:18:18 PDT 2009,NO_QUERY,Letiitaa,oh one wants get back bed ! haha .. must work ...,0.133333,0.133333
15906,239612,0,1980660303,Sun May 31 07:20:41 PDT 2009,NO_QUERY,tequiladepths,power gone nans get something eat !,0.000000,0.000000


Next, let's create a function to add a sentiment label to each tweet, based on its polarity score.

In [11]:
# ___Cell no. 6___

# Create a function to label positive, neutral and negative tweets

def get_sentiment_label(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'    

In [12]:
# ___Cell no. 7___

# Apply the get_sentiment_label function to the polarity column
# and add the sentiment results as a new column in our dataframe

df['TBsentiment'] = df['polarity'].apply(get_sentiment_label)
df

Unnamed: 0.1,Unnamed: 0,label,ids,data,flag,user,tweet_text,subjectivity,polarity,TBsentiment
23656,201104,0,1971818899,Sat May 30 07:58:08 PDT 2009,NO_QUERY,afinahardiana,mine coming true,0.650000,0.350000,Positive
27442,827554,1,1556877328,Sun Apr 19 00:47:40 PDT 2009,NO_QUERY,Thigerboy,changed username,0.000000,0.000000,Neutral
40162,1218086,1,1989798112,Mon Jun 01 02:50:47 PDT 2009,NO_QUERY,karaaaax3,cheetah print nails !,0.000000,0.000000,Neutral
8459,336203,0,2013947665,Wed Jun 03 00:01:33 PDT 2009,NO_QUERY,jesssicababesss,veronica wont let phone ill school. acually co...,1.000000,-0.625000,Negative
8051,611431,0,2224575760,Thu Jun 18 09:40:18 PDT 2009,NO_QUERY,navygrl1303,"sick dog , hackin lungs",0.857143,-0.714286,Negative
...,...,...,...,...,...,...,...,...,...,...
44231,1287523,1,2002375178,Tue Jun 02 04:01:39 PDT 2009,NO_QUERY,Keels_90,haha bet horrible lol yes please,0.666667,0.000000,Neutral
18034,105352,0,1823005894,Sat May 16 20:53:40 PDT 2009,NO_QUERY,mindbrooklyn,talkin abt job challenge ! lv ! .. askd u agre...,0.000000,0.000000,Neutral
33856,154765,0,1936017760,Wed May 27 07:18:18 PDT 2009,NO_QUERY,Letiitaa,oh one wants get back bed ! haha .. must work ...,0.133333,0.133333,Positive
15906,239612,0,1980660303,Sun May 31 07:20:41 PDT 2009,NO_QUERY,tequiladepths,power gone nans get something eat !,0.000000,0.000000,Neutral


We can have a quick look at the sentiment distribution of the tweets as follows:

In [13]:
# ___Cell no. 8___

df['TBsentiment'].value_counts()

Positive    4301
Neutral     3706
Negative    1993
Name: TBsentiment, dtype: int64

We see that according to TextBlob, there are more tweets with a positive sentiment than either neutral or negative. Is this what you would expect? 

Let's have a closer look at the tweets which TextBlob has classified as the most positive and most negative, to check the accuracy of the assigned sentiment. To do this, we will first sort all the tweets by polarity in descending order, i.e. from the most positive tweets to the most negative.

In [14]:
# ___Cell no. 9___

#We sort the tweets by their polarity value and put the sorted tweets into a new dataframe 

sorted_df = df.sort_values(by=['polarity'], ascending=False)

Now, let's print out the top 15 most _positive_ tweets, which are now the first 15 tweets in the new dataframe. Do these tweets have a positive sentiment?

In [15]:
# ___Cell no. 10___

#Print out the text from the first 15 tweets in the sorted dataframe

for i, tweet in enumerate(sorted_df.head(15)['tweet_text']):
    print(i+1, tweet, '\n')

1 timing ' perfect ! ! ! ' stick around till 9 30 

2 awww know ! ! ! tomorrow though babygurlz ! ! ! 

3 bojangles lunch , fully charged ipods , beautiful day ! headed charlotte ! 

4 give shout mr.musso quot marco gennuso awesome ! quot 

5 im hurt .. seregon gig awesome mosh hurtness 

6 p.s. check find awesome deals home decor amp stuff. thumbs 

7 yayyyy ! ! ! welcome pierced nose club ! ! ! 

8 hey trace ! name ' fatima. welcome philippines ! hope ' like here. 

9 ' sport ... best part school imho 

10 .. awesome service. 

11 good morning ! huzzah weekend ! 

12 happy cos ' going soon ! 

13 welcome ! 

14 peanut butter jar. yummy ! best thing ever . 

15 great ! wait tell work watch 



Next, let's print out the top 15 most _negative_ tweets, which are now the last 15 tweets in the new dataframe. Do these tweets have a negative sentiment?

In [16]:
# ___Cell no. 11___

#Print out the text from the last 15 tweets in the sorted dataframe

for i, tweet in enumerate(sorted_df.tail(15)['tweet_text']):
    print(i+1, tweet, '\n')

1 mondays boring cant wait see angel 

2 ahhh ! soo bad ! u giving pain typing ! 

3 wanna something. ' think make it. still feel awful. maybe tomorrow. 

4 technology. boring. phones measuring 

5 england boring team watch ever 

6 terrible mood. need cheering 

7 head walmart ... ughhh another boring day 

8 crap ! missed dog ' uncharted 2 beta code 

9 hate soooo ! 

10 boring 

11 worst part payday paying bills 

12 write books. ending shocking surprise. 

13 thats worst combination 

14 raced travis cobra .. lost .. miserably. going 110. still beat me. 

15 problem ! hope ' working soon ! tech problems worst . 



Now let's visualise distribution of the polarity and subjectivity assignments from TextBlob. To do this, we will make use of interactive plots from [Plotly](https://plotly.com/python/).

Interactive Plotly plots make use of JavaScript behind the scenes. To connect our Jupyter notebook with JavaScript, we need to execute the following line of code:

In [17]:
# ___Cell no. 12___

init_notebook_mode(connected=True)

Below, we use [Plotly Express](https://plotly.com/python/plotly-express/) to create a simple scatter plot of the polarity and subjectivity data. As this is an interactive plot, you will be able to hover your mouse over a point to view its properties.

Note how Plotly Express automatically labels our axes for us according to our dataframe column names.

In [18]:
# ___Cell no. 13___

fig = px.scatter(df, x="polarity", y="subjectivity", hover_data=['tweet_text'],
                 title="TextBlob Sentiment Analysis")

fig.show()

What can we learn from this scatter plot? Discuss this with your team.

<hr>

## 2. Sentiment Analysis with VADER

Let's now try performing a sentiment analysis using the [VADER](https://pypi.org/project/vaderSentiment/) (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool, which, like TextBlob, uses a lexicon and rule-based method to assign sentiment scores. The VADER SentimentIntensityAnalyzer tool is specifically attuned to sentiments expressed in social media. Performing sentiment analysis on social media posts is challenging as you are often dealing with short snippets of text, as well as slang and abbreviated language. These challenges were given special consideration during the development of VADER's sentiment analyser. As mentioned previously, lexicons consist of a large library of words which are classed as negative, neutral or positive. In addition, the VADER lexicon also includes acronyms (lol, btw etc.) and slang (meh, nah etc.).

Just like the TextBlob method, VADER also applies grammatical rules which affect the sentiment of the text. However, the rules applied by the VADER sentiment analyser are more sophisticated. VADER is sensitive to both the polarity (whether the sentiment is positive or negative) and the intensity (how positive or negative the sentiment is) of emotions in the text. For example:

- Capitalisation increases the positive or negative intensity of the sentiment score, i.e. the text "I am very happy today" will have a different sentiment score to "I am VERY happy today".
- The word "but" signifies a shift in sentiment of the text, with the sentiment score then dominated by the second part of the sentence.
- Certain "booster words" also either increase or decrease the intensity of the sentiment. For example, with the word "extremely", "The food is extremely good" is more intense than "The food is good".
- The VADER model also catches nearly 90% of cases where negation flips the sentiment of the text. For example, a negated sentence would be "The food isn’t really all that great".

For a given text, VADER outputs scores for the negative, neutral and positive sentiment categories, as well as a compound score, which is a "normalized, weighted, composite" score and is a good overall indication of the assigned sentiment.
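To give a feel for what "normalized" means here: as we understand the VADER documentation and source code, the rule-adjusted valence scores of the words are summed and the sum is then squashed into the range -1 to 1 using x / sqrt(x² + alpha), with alpha = 15. The sketch below reproduces only that final step; the function name `normalise_compound` is ours, and the input sums are made-up numbers rather than real VADER output.

```python
import math

def normalise_compound(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

# Made-up valence sums: larger sums move towards +1 or -1 but never reach them.
for total in (0.0, 1.0, 4.0, -4.0, 20.0):
    print(total, round(normalise_compound(total), 4))
```

This explains why very emotional tweets tend to cluster near the extremes of the compound score without ever exceeding them.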

To make the VADER sentiment analyser easier and quicker to use, let's call it _analyser_.

In [19]:
# ___Cell no. 14___

analyser = SentimentIntensityAnalyzer()

As we did with TextBlob, let's now try out the vaderSentiment tool by entering any text you'd like - try different examples, i.e. text which you think has positive or negative sentiment and see if VADER can get it right! Experiment with punctuation and capitalisation. How does it handle sarcasm, slang or abbreviations?

In [20]:
# ___Cell no. 15___

your_text = 'wow this is so cool'
analyser.polarity_scores(your_text)

{'neg': 0.0, 'neu': 0.306, 'pos': 0.694, 'compound': 0.7777}

Let us now use VADER to retrieve the compound sentiment score for all tweets and add this information to our original (unsorted) dataframe.

In [21]:
# ___Cell no. 16___

#Create a function to get the VADER compound score

def get_vaderCompoundPolarity(text):
    return analyser.polarity_scores(text)['compound']
    
df['vader_compound'] = df['tweet_text'].apply(get_vaderCompoundPolarity)
df

Unnamed: 0.1,Unnamed: 0,label,ids,data,flag,user,tweet_text,subjectivity,polarity,TBsentiment,vader_compound
23656,201104,0,1971818899,Sat May 30 07:58:08 PDT 2009,NO_QUERY,afinahardiana,mine coming true,0.650000,0.350000,Positive,0.4215
27442,827554,1,1556877328,Sun Apr 19 00:47:40 PDT 2009,NO_QUERY,Thigerboy,changed username,0.000000,0.000000,Neutral,0.0000
40162,1218086,1,1989798112,Mon Jun 01 02:50:47 PDT 2009,NO_QUERY,karaaaax3,cheetah print nails !,0.000000,0.000000,Neutral,0.0000
8459,336203,0,2013947665,Wed Jun 03 00:01:33 PDT 2009,NO_QUERY,jesssicababesss,veronica wont let phone ill school. acually co...,1.000000,-0.625000,Negative,0.8506
8051,611431,0,2224575760,Thu Jun 18 09:40:18 PDT 2009,NO_QUERY,navygrl1303,"sick dog , hackin lungs",0.857143,-0.714286,Negative,-0.5106
...,...,...,...,...,...,...,...,...,...,...,...
44231,1287523,1,2002375178,Tue Jun 02 04:01:39 PDT 2009,NO_QUERY,Keels_90,haha bet horrible lol yes please,0.666667,0.000000,Neutral,0.7430
18034,105352,0,1823005894,Sat May 16 20:53:40 PDT 2009,NO_QUERY,mindbrooklyn,talkin abt job challenge ! lv ! .. askd u agre...,0.000000,0.000000,Neutral,0.5781
33856,154765,0,1936017760,Wed May 27 07:18:18 PDT 2009,NO_QUERY,Letiitaa,oh one wants get back bed ! haha .. must work ...,0.133333,0.133333,Positive,0.5093
15906,239612,0,1980660303,Sun May 31 07:20:41 PDT 2009,NO_QUERY,tequiladepths,power gone nans get something eat !,0.000000,0.000000,Neutral,0.0000


Let us once again apply the 'get_sentiment_label' function to assign the VADER sentiment of each tweet given the compound score.

In [22]:
# ___Cell no. 17___

# Apply the get_sentiment_label function to the VADER compound score
# and add the VADER sentiment results as a new column in our dataframe

df['VADERsentiment'] = df['vader_compound'].apply(get_sentiment_label)
df

Unnamed: 0.1,Unnamed: 0,label,ids,data,flag,user,tweet_text,subjectivity,polarity,TBsentiment,vader_compound,VADERsentiment
23656,201104,0,1971818899,Sat May 30 07:58:08 PDT 2009,NO_QUERY,afinahardiana,mine coming true,0.650000,0.350000,Positive,0.4215,Positive
27442,827554,1,1556877328,Sun Apr 19 00:47:40 PDT 2009,NO_QUERY,Thigerboy,changed username,0.000000,0.000000,Neutral,0.0000,Neutral
40162,1218086,1,1989798112,Mon Jun 01 02:50:47 PDT 2009,NO_QUERY,karaaaax3,cheetah print nails !,0.000000,0.000000,Neutral,0.0000,Neutral
8459,336203,0,2013947665,Wed Jun 03 00:01:33 PDT 2009,NO_QUERY,jesssicababesss,veronica wont let phone ill school. acually co...,1.000000,-0.625000,Negative,0.8506,Positive
8051,611431,0,2224575760,Thu Jun 18 09:40:18 PDT 2009,NO_QUERY,navygrl1303,"sick dog , hackin lungs",0.857143,-0.714286,Negative,-0.5106,Negative
...,...,...,...,...,...,...,...,...,...,...,...,...
44231,1287523,1,2002375178,Tue Jun 02 04:01:39 PDT 2009,NO_QUERY,Keels_90,haha bet horrible lol yes please,0.666667,0.000000,Neutral,0.7430,Positive
18034,105352,0,1823005894,Sat May 16 20:53:40 PDT 2009,NO_QUERY,mindbrooklyn,talkin abt job challenge ! lv ! .. askd u agre...,0.000000,0.000000,Neutral,0.5781,Positive
33856,154765,0,1936017760,Wed May 27 07:18:18 PDT 2009,NO_QUERY,Letiitaa,oh one wants get back bed ! haha .. must work ...,0.133333,0.133333,Positive,0.5093,Positive
15906,239612,0,1980660303,Sun May 31 07:20:41 PDT 2009,NO_QUERY,tequiladepths,power gone nans get something eat !,0.000000,0.000000,Neutral,0.0000,Neutral
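Note that `get_sentiment_label` only treats a score of exactly 0 as neutral. The VADER documentation suggests a slightly wider neutral band for the compound score: positive if the score is at least 0.05, negative if it is at most -0.05, and neutral otherwise. The following is a minimal sketch of that alternative; the function name `get_vader_label` and its `threshold` parameter are our own choices.

```python
def get_vader_label(compound, threshold=0.05):
    """Label a VADER compound score using a symmetric neutral band."""
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# 0.4215 and -0.5106 appear in the table above; 0.03 is a made-up value
# showing a weakly positive score that falls inside the neutral band.
for score in (0.4215, 0.0, -0.5106, 0.03):
    print(score, get_vader_label(score))
```

It could be applied in the same way as before, e.g. `df['vader_compound'].apply(get_vader_label)`, to see how many tweets move into the neutral category.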


Now let's have a look at what VADER has classified as the 15 most positive and negative tweets by using the same method shown in the TextBlob example. How does the accuracy of the assigned sentiments compare to those from the 15 most positive and negative tweets from TextBlob?

In [23]:
# ___Cell no. 18___

#We sort the tweets by their vader_compound value and put the sorted tweets into a new dataframe 

sorted_df2 = df.sort_values(by=['vader_compound'], ascending=False)

In [24]:
# ___Cell no. 19___

#Print out the text from the first 15 tweets in the sorted dataframe

for i, tweet in enumerate(sorted_df2.head(15)['tweet_text']):
    print(i+1, tweet, '\n')

1 happy b day ! congrulations ! health , happiness , love , peace , success ! know , love ! #seb day #seb day 

2 good morning , love twitter. much info ! ! great day. beautiful sunshine am. day b4 surgery. food ! happy 

3 haha lol love cat ! oh cool , theres pretty cool people playing v festival including 

4 alot thing thankful , want say ' thankful followers love ones. heart keep smiling 

5 bestiest best friend ever ! freaking love , babe. ' love , heart goes boom see name. 

6 haha ' full corniness sonya ! ! ! ' love ha ha ha ha ha .. think ! ! things healing quick ! 

7 would love texas , live argentina please come back guys best l love lvtt ! 

8 driving home bar ! fun tonight , seen lot friends tonight great times old friends. love . 

9 happy birthday ... happy birthday ... happy birthday dear verltodd .. happy birthday ! ! 

10 hey best friend lily may3 absolutely love tuning hope good day loveyou xxx 

11 twote quot good luck , ' sure win awards lol x even u lose ur still w

In [None]:
# ___Cell no. 20___

#Print out the text from the last 15 tweets in the sorted dataframe

for i, tweet in enumerate(sorted_df2.tail(15)['tweet_text']):
    print(i+1, tweet, '\n')

Lastly, let us compare the sentiment assignments of TextBlob and VADER by plotting another interactive Plotly scatter plot. What can we learn from this plot? Discuss this with your team.

In [26]:
# ___Cell no. 21___

fig = px.scatter(df, x="polarity", y="vader_compound", hover_data=['tweet_text'],
                 title="TextBlob vs VADER")
fig.show()

**Additional challenge**: Given the labels provided for each tweet in the 'label' column (which we will consider to be the correct sentiment of each tweet), investigate the accuracy of the TextBlob and VADER sentiment analysis tools.

1) For what percentage of tweets did the TextBlob and VADER tools correctly predict the sentiment? Which model performs better?\
2) For each of the models, does the number of words (or characters) in a tweet affect the accuracy of the assigned sentiment?

NB: This challenge is not compulsory. However, if your team does attempt to answer either question, or both, you may tell us about your findings during your team presentation.
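As a possible starting point for question 1, the sketch below compares 0/1 labels with 'Positive'/'Negative'/'Neutral' predictions. The helper names `label_to_sentiment` and `accuracy` are hypothetical, the data is a toy example rather than tutorial output, and counting every 'Neutral' prediction as wrong is just one assumption you may wish to change (for instance by excluding those tweets instead).

```python
def label_to_sentiment(label):
    """Map the dataset's 0/1 labels onto the tools' sentiment names."""
    return 'Positive' if label == 1 else 'Negative'

def accuracy(true_labels, predicted_sentiments):
    """Fraction of predictions that match the true labels.

    Assumption: a 'Neutral' prediction always counts as incorrect,
    because the dataset contains no neutral labels.
    """
    pairs = list(zip(true_labels, predicted_sentiments))
    correct = sum(label_to_sentiment(t) == p for t, p in pairs)
    return correct / len(pairs)

# Toy data, not taken from the tutorial results
labels = [1, 0, 1, 0]
predictions = ['Positive', 'Negative', 'Neutral', 'Positive']
print(accuracy(labels, predictions))  # 0.5
```

Since the function only iterates over its inputs, it should also accept dataframe columns, e.g. `accuracy(df['label'], df['TBsentiment'])`.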

After completing this tutorial you should be able to:

- Use the TextBlob library to perform basic sentiment analysis
- Use the VADER library to perform basic sentiment analysis
- Produce interactive plots using Plotly

### Tutorial 2 complete! Well done! 