<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Airline Tweets Sentiment Analysis Lab

_Authors: Phillippa Thomson (NYC)_

---

You are going to be analyzing tweets about airlines.  These have been hand-tagged with sentiment.  There are three categories: positive, neutral, and negative.

Use VADER to calculate sentiment for each tweet, and see if you can correctly predict the hand-tagged sentiment.

What is the accuracy?  Print out a heatmap to see where your model performs well, and where it performs poorly.

In [1]:
import pandas as pd
import numpy as np

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_score, recall_score
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
tweets = pd.read_csv("../data/Tweets.csv", encoding='unicode_escape')

In [3]:
tweets.head()

|   | airline_sentiment | airline | text |
|---|---|---|---|
| 0 | neutral | Virgin America | @VirginAmerica What @dhepburn said. |
| 1 | positive | Virgin America | @VirginAmerica plus you've added commercials t... |
| 2 | neutral | Virgin America | @VirginAmerica I didn't today... Must mean I n... |
| 3 | negative | Virgin America | @VirginAmerica it's really aggressive to blast... |
| 4 | negative | Virgin America | @VirginAmerica and it's a really big bad thing... |


In [4]:
tweets.shape

(14640, 3)

In [17]:
tweets.loc[135, 'text']

'@VirginAmerica I need to register a service dog for a first class ticket from SFO &gt; Dulles. The phone queue is an hour or longer. Pls advise'

In [20]:
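# Remove the @ symbol from mentions, keeping the handle text (regex=False: treat '@' literally)
tweets['new_text'] = tweets['text'].str.replace('@', '', regex=False)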

In [21]:
tweets.head()

|   | airline_sentiment | airline | text | new_text |
|---|---|---|---|---|
| 0 | neutral | Virgin America | @VirginAmerica What @dhepburn said. | VirginAmerica What dhepburn said. |
| 1 | positive | Virgin America | @VirginAmerica plus you've added commercials t... | VirginAmerica plus you've added commercials to... |
| 2 | neutral | Virgin America | @VirginAmerica I didn't today... Must mean I n... | VirginAmerica I didn't today... Must mean I ne... |
| 3 | negative | Virgin America | @VirginAmerica it's really aggressive to blast... | VirginAmerica it's really aggressive to blast ... |
| 4 | negative | Virgin America | @VirginAmerica and it's a really big bad thing... | VirginAmerica and it's a really big bad thing ... |


### 1. Preview the airline_sentiment column.
- What percentage of tweets are positive, neutral, and negative?
    - Negative: 63% (dominant class)
    - Neutral: 21%
    - Positive: 16%

In [22]:
tweets['airline_sentiment'].value_counts()

negative    9178
neutral     3099
positive    2363
Name: airline_sentiment, dtype: int64
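The percentages in the answer above follow directly from these counts. A minimal sketch, with the counts copied from the output above; on the full DataFrame, `tweets['airline_sentiment'].value_counts(normalize=True)` returns the same shares as fractions. The majority-class share is also the accuracy of the naive "always predict negative" baseline.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({"negative": 9178, "neutral": 3099, "positive": 2363})

# Percentage of each class, rounded to one decimal place
shares = (counts / counts.sum() * 100).round(1)
print(shares)

# Accuracy of always predicting the most frequent class (negative)
baseline = counts.max() / counts.sum()
print(f"Majority-class baseline: {baseline:.3f}")
```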

### 2. Load the SentimentIntensityAnalyzer from VADER and add compound, negative, neutral, and positive scores to the DataFrame.

In [23]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [24]:
analyser = SentimentIntensityAnalyzer()

In [25]:
scores = []
sentences = tweets.new_text

# polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
for sentence in sentences:
    score = analyser.polarity_scores(sentence)
    scores.append(score)

In [26]:
scores_df = pd.DataFrame(scores)

In [27]:
scores_df.head()

|   | neg | neu | pos | compound |
|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 1 | 0.0 | 1.0 | 0.0 | 0.0 |
| 2 | 0.0 | 1.0 | 0.0 | 0.0 |
| 3 | 0.226 | 0.645 | 0.129 | -0.2716 |
| 4 | 0.296 | 0.704 | 0.0 | -0.5829 |


In [28]:
sentiments_df = pd.concat([tweets, scores_df], axis=1)
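`pd.concat(..., axis=1)` lines rows up by index label, not by position. That is safe here because both `tweets` (read straight from the CSV) and `scores_df` (built from a list) carry a default 0..n-1 index, but it would break silently if rows of `tweets` were dropped before scoring. A small illustration; the frames `tweets_small` and `scores_small` and their values are hypothetical:

```python
import pandas as pd

# Hypothetical frame standing in for tweets after some rows were dropped:
# its index labels are no longer 0..n-1
tweets_small = pd.DataFrame(
    {"new_text": ["great flight", "lost my bag", "on time"]}, index=[0, 2, 5]
)
# A frame built from a plain list always gets a fresh 0..n-1 index
scores_small = pd.DataFrame({"compound": [0.6, -0.3, 0.0]})

# concat aligns on index labels, so labels 1 and 5 each end up half empty (NaN)
misaligned = pd.concat([tweets_small, scores_small], axis=1)
print(misaligned)

# Resetting the index first restores positional alignment
aligned = pd.concat([tweets_small.reset_index(drop=True), scores_small], axis=1)
print(aligned)
```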

In [29]:
sentiments_df.head()

|   | airline_sentiment | airline | text | new_text | neg | neu | pos | compound |
|---|---|---|---|---|---|---|---|---|
| 0 | neutral | Virgin America | @VirginAmerica What @dhepburn said. | VirginAmerica What dhepburn said. | 0.0 | 1.0 | 0.0 | 0.0 |
| 1 | positive | Virgin America | @VirginAmerica plus you've added commercials t... | VirginAmerica plus you've added commercials to... | 0.0 | 1.0 | 0.0 | 0.0 |
| 2 | neutral | Virgin America | @VirginAmerica I didn't today... Must mean I n... | VirginAmerica I didn't today... Must mean I ne... | 0.0 | 1.0 | 0.0 | 0.0 |
| 3 | negative | Virgin America | @VirginAmerica it's really aggressive to blast... | VirginAmerica it's really aggressive to blast ... | 0.226 | 0.645 | 0.129 | -0.2716 |
| 4 | negative | Virgin America | @VirginAmerica and it's a really big bad thing... | VirginAmerica and it's a really big bad thing ... | 0.296 | 0.704 | 0.0 | -0.5829 |
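The lab's opening question asks whether VADER alone can predict the hand-tagged labels. One simple rule-based mapping uses the compound-score cutoffs suggested in the vaderSentiment README (positive at 0.05 or above, negative at -0.05 or below, neutral in between); treat the threshold as an assumption worth tuning. The sketch applies it to the five rows shown above; `label_from_compound` is a hypothetical helper name.

```python
# Hand labels and VADER compound scores copied from sentiments_df.head() above
hand_labels = ["neutral", "positive", "neutral", "negative", "negative"]
compound_scores = [0.0, 0.0, 0.0, -0.2716, -0.5829]


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-way label.

    The +/-0.05 default follows the vaderSentiment README; it is an assumption here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


vader_labels = [label_from_compound(c) for c in compound_scores]
accuracy = sum(p == t for p, t in zip(vader_labels, hand_labels)) / len(hand_labels)
print(vader_labels)
print(accuracy)
```

On the full data, `sentiments_df['compound'].apply(label_from_compound)` gives one VADER label per tweet, which can be compared with `sentiments_df['airline_sentiment']` using `accuracy_score` and `confusion_matrix`.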


### 3. Store airline_sentiment in y to use as labels and create an appropriate feature matrix, X.

In [30]:
y = sentiments_df.airline_sentiment

In [31]:
X = sentiments_df.drop(['airline_sentiment', 'airline', 'text'], axis=1)

In [32]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
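`train_test_split` is called here with its defaults: a 25% test set (3,660 of 14,640 tweets, which matches the support total in the classification report) and a shuffle that changes on every run. Passing `stratify=y` keeps the roughly 63/21/16 class mix in both halves, and a fixed `random_state` makes the split reproducible. A minimal sketch on hypothetical toy labels with about the same imbalance:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy labels with roughly the airline_sentiment class mix
y_toy = ["negative"] * 63 + ["neutral"] * 21 + ["positive"] * 16
X_toy = [[i] for i in range(len(y_toy))]  # one dummy feature per row

# stratify keeps the class proportions in both splits; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=0
)
print(len(y_te), Counter(y_te))
```

In the notebook this would be `train_test_split(X, y, stratify=y, random_state=42)` (any fixed integer works); the scores reported further down come from the unstratified split, so they would shift slightly.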

In [35]:
# Vectorize the text column: learn the vocabulary on the training set only, then apply it to the test set
from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer(stop_words='english')

X_train_vect = vect.fit_transform(X_train.new_text)

X_test_vect = vect.transform(X_test.new_text)
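The vectorizer learns its vocabulary from the training text only (`fit_transform`) and reuses it on the test text (`transform`), so words that never appear in training are ignored and both matrices end up with the same number of columns. A toy illustration with made-up sentences (`demo_vect` is named so it does not overwrite the notebook's `vect`):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up example sentences
train_texts = ["VirginAmerica flight was great", "united lost my bag"]
test_texts = ["great crew on united", "terrible delay"]

demo_vect = CountVectorizer(stop_words="english")
train_counts = demo_vect.fit_transform(train_texts)  # learns the vocabulary (lowercased)
test_counts = demo_vect.transform(test_texts)        # reuses it; unseen words are dropped

print(sorted(demo_vect.vocabulary_))
print(test_counts.toarray())
```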

In [37]:
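# Convert the remaining numeric features (the VADER scores) in X_train and X_test to sparse matrices (new_text is dropped)
import scipy as sp
import scipy.sparse  # makes sure sp.sparse is loaded, which older SciPy versions do not do on `import scipy`

others_train = sp.sparse.csr_matrix(X_train.drop('new_text', axis=1).astype(float))
others_test = sp.sparse.csr_matrix(X_test.drop('new_text', axis=1).astype(float))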

In [38]:
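# Combine the word-count columns with the VADER score columns (CSR is the format the classifier works with)
X_train_new = sp.sparse.hstack((X_train_vect, others_train), format='csr')
X_test_new = sp.sparse.hstack((X_test_vect, others_test), format='csr')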

### 4. Fit a model of your choice to predict airline_sentiment and cross-validate.

In [52]:
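# One-vs-rest (OvR) multiclass strategy: fit one binary logistic regression per class.
# multi_class='ovr' on LogisticRegression is redundant inside OneVsRestClassifier (and deprecated in recent scikit-learn releases).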
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

logreg = LogisticRegression(max_iter=1000, multi_class='ovr')

ovr = OneVsRestClassifier(logreg)

ovr.fit(X_train_new, y_train)

OneVsRestClassifier(estimator=LogisticRegression(max_iter=1000,
                                                 multi_class='ovr'))

In [53]:
pred = ovr.predict(X_test_new)

In [54]:
print(accuracy_score(y_test, pred))
print(precision_score(y_test, pred, average='weighted'))
print(recall_score(y_test, pred, average='weighted'))

0.8024590163934426
0.7959630829667205
0.8024590163934426
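Section 4 also asks for cross-validation, while the cells above score a single hold-out split. A minimal sketch with `cross_val_score` and stratified 5-fold splitting, run on synthetic data from `make_classification` as a stand-in for the real features (the data shape and fold count are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for X_train_new / y_train: 3 classes, 300 rows
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0
)

model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold; the model is refit from scratch on each training fold
cv_scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
print(cv_scores.round(3), round(cv_scores.mean(), 3))
```

With the notebook's objects the call would be `cross_val_score(ovr, X_train_new, y_train, cv=cv)`; the spread of the fold scores shows how much the single 0.80 accuracy could move under a different split.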


### 5. Display the confusion matrix.
- Which tweets are difficult to identify?

In [55]:
print(confusion_matrix(y_test, pred))

[[2095  163   63]
 [ 277  438   58]
 [ 101   61  404]]
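The lab's introduction asks for a heatmap of where the model performs well and poorly. A sketch with seaborn's `heatmap`, using the counts printed above (rows are true labels, columns are predictions, both in sorted order: negative, neutral, positive). Dividing each row by its total turns the cells into per-class recall, which makes the weak neutral row stand out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

labels = ["negative", "neutral", "positive"]  # sorted label order used by confusion_matrix
cm = np.array([[2095, 163, 63],
               [277, 438, 58],
               [101, 61, 404]])

# Normalize each row: a cell reads "share of true class X predicted as Y"
cm_norm = cm / cm.sum(axis=1, keepdims=True)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(cm_norm, annot=True, fmt=".2f", cmap="Blues",
            xticklabels=labels, yticklabels=labels, ax=ax)
ax.set_xlabel("Predicted")
ax.set_ylabel("Hand-tagged (true)")
ax.set_title("Row-normalized confusion matrix")
fig.tight_layout()
```

In the notebook, replace the hard-coded array with `confusion_matrix(y_test, pred)`, take the tick labels from `ovr.classes_`, and drop the `matplotlib.use("Agg")` line.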


### 6. Print the classification report and discuss the characteristics of the model.

In [56]:
print(classification_report(y_test, pred))

              precision    recall  f1-score   support

    negative       0.85      0.90      0.87      2321
     neutral       0.66      0.57      0.61       773
    positive       0.77      0.71      0.74       566

    accuracy                           0.80      3660
   macro avg       0.76      0.73      0.74      3660
weighted avg       0.80      0.80      0.80      3660
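Every number in this report can be traced back to the confusion matrix from section 5: precision divides the diagonal by the column totals, recall divides it by the row totals (the support), the macro average is a plain mean over classes, and the weighted average weights each class by its support. A short NumPy check using those counts:

```python
import numpy as np

# Confusion matrix copied from section 5 (rows: true, columns: predicted)
cm = np.array([[2095, 163, 63],
               [277, 438, 58],
               [101, 61, 404]])

tp = np.diag(cm)            # correctly classified tweets per class
support = cm.sum(axis=1)    # true tweets per class (row totals)
predicted = cm.sum(axis=0)  # predictions per class (column totals)

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)

macro_f1 = f1.mean()
weighted_precision = (precision * support).sum() / support.sum()
weighted_recall = (recall * support).sum() / support.sum()
accuracy = tp.sum() / cm.sum()

print(precision.round(2), recall.round(2), f1.round(2))
print(round(macro_f1, 2), round(weighted_precision, 4), round(accuracy, 4))
```

The weighted precision (about 0.7960) and accuracy (about 0.8025) match the values printed in section 4, and weighted recall always equals accuracy, which is why those two printed numbers are identical.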



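The model does well on negative tweets, the dominant class (recall 0.90, F1 0.87), reasonably on positive tweets (recall 0.71, F1 0.74), and poorly on neutral tweets (recall 0.57, F1 0.61). Neutral tweets are the hardest to identify: 277 of the 773 neutral tweets in the test set are predicted as negative.

To put this in perspective, human concordance (the probability that two people assign the same sentiment to a tweet) is usually around 70% to 80%, while our baseline of always predicting the majority class (negative) scores 63%. Because the gap between that baseline and human agreement is narrow, even small increases in accuracy quickly move us towards a theoretical maximum in performance.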