# Week 11: Sentiment Analysis, Part 2: Plotting Curves with Rolling Averages

This week, our questions and concepts are robust, but the amount of new code to learn is modest :)

Our focus is on plotting "sentiment curves." We will:
* Plot the raw values for sentiment in *The Sign of the Four*
* Calculate rolling averages, plot them, and investigate the portions of the text at which the minimum and maximum points occur

# Getting started

First, let's import our libraries, set our Pandas display options, and reload the data produced during last week's lecture (handily stored in a CSV that lives in the same folder as this notebook).

In [None]:
import pandas as pd
from textblob import TextBlob

In [None]:
pd.set_option('display.max_colwidth', 0)

In [None]:
sot4_sentence_sentiment_df = pd.read_csv("sot4_sentence_sentiment.csv")
sot4_sentence_sentiment_df[:15]

Now let's run the command we ended last lecture with, plotting the "raw" sentiment polarity values for every sentence in *The Sign of the Four*. Can we learn much from this?

In [None]:
sot4_sentence_sentiment_df[['polarity']].plot(figsize=(20,8))

Just for fun, let's have a look at the raw subjectivity scores, too...

In [None]:
sot4_sentence_sentiment_df[['subjectivity']].plot(figsize=(20,8))

Okay, at this point, I will give a little lecture about Matthew Jockers and Annie Swafford and Fourier Transforms, and then we will pass things over to Mary to explain rolling averages to us...

# Rolling Averages in Pandas

Pandas is basically built to do things like calculate rolling averages. It makes it really easy, which is awesome. All we need to do is specify:
- What part of the DataFrame we're interested in (here, the `polarity` column)
- That we want to do a rolling... something or other (the `.rolling()` method), what we want our rolling window size to be (here it's 10, indicated with the `window=10` argument), and that we want these rolling averages to be "centred" (the `center=True` argument; note that Pandas uses the American spelling), which you'll understand if you were in lecture 😊 and which is explained in Mary's lecture
- That the "something or other" we want are rolling **averages (or means)** (indicated with the `.mean()` method)

In [None]:
sot4_sentence_sentiment_df['polarity'].rolling(window=10, center=True).mean()
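
To see what a centred window does, it can help to run the same call on a series small enough to check by hand. The sketch below uses made-up numbers (`toy_polarity` is a hypothetical name, not part of our data) and a window of 3, so each average covers a value plus one neighbour on each side.

```python
import pandas as pd

# Made-up polarity scores, purely for illustration
toy_polarity = pd.Series([0.0, 0.3, 0.6, -0.3, 0.0])

# Each result is the mean of a value and its two neighbours
toy_rolling = toy_polarity.rolling(window=3, center=True).mean()
print(toy_rolling)

# The first and last rows have no full window, so Pandas gives NaN there
print(toy_rolling.isna().tolist())
```

This is also why a handful of rows at each end of our real rolling columns show up as `NaN`: there aren't enough neighbours there to fill a whole window.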

It would be a lot handier if that were a column in our DataFrame — so let's put that data there!

In [None]:
sot4_sentence_sentiment_df['rolling_10_polarity'] = sot4_sentence_sentiment_df['polarity'].rolling(window=10, center=True).mean()

In [None]:
sot4_sentence_sentiment_df[:15]

Now all we need to do is to plot our new column! 

Does that look any better?

In [None]:
sot4_sentence_sentiment_df[['rolling_10_polarity']].plot(figsize=(20,8))

In [None]:
sot4_sentence_sentiment_df[['polarity','rolling_10_polarity']].plot(figsize=(20,8))

Let's try some other window sizes: 25, 50, and 100. (You can try others, too!)

In [None]:
sot4_sentence_sentiment_df['rolling_25_polarity'] = sot4_sentence_sentiment_df['polarity'].rolling(window=25, center=True).mean()

In [None]:
sot4_sentence_sentiment_df[['rolling_25_polarity']].plot(figsize=(20,8))

In [None]:
sot4_sentence_sentiment_df[['rolling_10_polarity','rolling_25_polarity']].plot(figsize=(20,8))

In [None]:
sot4_sentence_sentiment_df['rolling_50_polarity'] = sot4_sentence_sentiment_df['polarity'].rolling(window=50, center=True).mean()

In [None]:
sot4_sentence_sentiment_df[['rolling_50_polarity']].plot(figsize=(20,8))

In [None]:
sot4_sentence_sentiment_df[['rolling_25_polarity','rolling_50_polarity']].plot(figsize=(20,8))

In [None]:
sot4_sentence_sentiment_df['rolling_100_polarity'] = sot4_sentence_sentiment_df['polarity'].rolling(window=100, center=True).mean()

In [None]:
sot4_sentence_sentiment_df[['rolling_100_polarity']].plot(figsize=(20,8))

In [None]:
sot4_sentence_sentiment_df[['rolling_50_polarity','rolling_100_polarity']].plot(figsize=(20,8))

In [None]:
sot4_sentence_sentiment_df[['polarity','rolling_10_polarity','rolling_25_polarity','rolling_50_polarity','rolling_100_polarity']].plot(figsize=(20,8))
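
The cells above repeat one line per window size. If you'd rather not copy and paste, a loop can build the same kind of columns. Here is a minimal sketch on made-up data (`toy_df` and its values are hypothetical; the column names follow the `rolling_N_polarity` pattern we've been using):

```python
import pandas as pd

# Made-up polarity scores standing in for the real sentence data
toy_df = pd.DataFrame({'polarity': [0.1, -0.2, 0.4, 0.0, 0.3, -0.1, 0.2, 0.5]})

# Add one centred rolling-average column per window size
for window in [2, 4]:
    column_name = f'rolling_{window}_polarity'
    toy_df[column_name] = toy_df['polarity'].rolling(window=window, center=True).mean()

print(toy_df.columns.tolist())
```

On the real DataFrame you would swap in `sot4_sentence_sentiment_df` and a list like `[10, 25, 50, 100]`.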

# Finding the Maximum and Minimum Points

The below line of Pandas code is an absolute whopper... but I think we're all ready for it at this point!

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_50_polarity']==sot4_sentence_sentiment_df['rolling_50_polarity'].min()]

In [None]:
sot4_sentence_sentiment_df['rolling_50_polarity'].min()

In [None]:
min_polarity_50 = sot4_sentence_sentiment_df['rolling_50_polarity'].min()

In [None]:
sot4_sentence_sentiment_df['rolling_50_polarity'] == min_polarity_50

In [None]:
has_min_polarity_50 = sot4_sentence_sentiment_df['rolling_50_polarity'] == min_polarity_50

In [None]:
sot4_sentence_sentiment_df[has_min_polarity_50]
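
The same steps on a toy DataFrame make the logic easier to follow: find the minimum, build a True/False mask, and use the mask to select rows. The sketch also shows `.idxmin()`, a Pandas method that returns the index label of the first minimum directly. The data and the name `toy_df` are made up for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({
    'sentence': ['a', 'b', 'c', 'd'],
    # NaN in the first row mimics the empty edges of a rolling column
    'rolling_polarity': [float('nan'), 0.2, -0.4, 0.1],
})

# Step 1: the smallest value in the column (NaN is skipped)
min_value = toy_df['rolling_polarity'].min()

# Step 2: a Series of True/False, True only where the row matches the minimum
has_min = toy_df['rolling_polarity'] == min_value

# Step 3: keep only the rows where the mask is True
min_rows = toy_df[has_min]
print(min_rows)

# Shortcut: the index label of the (first) minimum row
min_index = toy_df['rolling_polarity'].idxmin()
print(min_index)
```

One difference worth knowing: the mask keeps every row that ties for the minimum, while `.idxmin()` reports only the first.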

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_10_polarity']==sot4_sentence_sentiment_df['rolling_10_polarity'].min()]

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_25_polarity']==sot4_sentence_sentiment_df['rolling_25_polarity'].min()]

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_50_polarity']==sot4_sentence_sentiment_df['rolling_50_polarity'].min()]

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_100_polarity']==sot4_sentence_sentiment_df['rolling_100_polarity'].min()]

Let's have a look now at the *maximum* points...

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_10_polarity']==sot4_sentence_sentiment_df['rolling_10_polarity'].max()]

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_25_polarity']==sot4_sentence_sentiment_df['rolling_25_polarity'].max()]

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_50_polarity']==sot4_sentence_sentiment_df['rolling_50_polarity'].max()]

In [None]:
sot4_sentence_sentiment_df[sot4_sentence_sentiment_df['rolling_100_polarity']==sot4_sentence_sentiment_df['rolling_100_polarity'].max()]

# Investigating the Extremes 

Let's have a look at the 50-sentence windows that the `rolling_50_polarity` scores indicate are the most negative and the most positive, and see whether the measure seems to be on to something...

(Yeah, that's right, I'm just taking the min and max points, then manually adding and subtracting 25 to make a 50-sentence window! The minimum window is centred at 831 and the maximum at 1557.)

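In [None]:
# 50 rows around the minimum at 831 (25 before it, 25 from it onward)
sot4_sentence_sentiment_df.iloc[806:856]

In [None]:
# 50 rows around the maximum at 1557
sot4_sentence_sentiment_df.iloc[1532:1582]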
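
Rather than doing the arithmetic by hand, we can compute the slice bounds from the centre point. One wrinkle: with an even window size, the window can't sit perfectly symmetrically around a row. As far as I can tell, Pandas places `window // 2` rows before the centre and the rest from the centre onward; the sketch below checks that assumption on made-up data instead of taking it on faith. `window_bounds` is a hypothetical helper, not a Pandas function.

```python
import pandas as pd

def window_bounds(centre, window):
    """Return (start, stop) positions for slicing the rows behind a centred rolling value.

    Assumes the layout checked below: window // 2 rows before the centre,
    and the remaining rows from the centre onward.
    """
    start = centre - window // 2
    return start, start + window

# Made-up scores standing in for a polarity column
toy_scores = pd.Series([0.5, -0.1, 0.2, 0.9, -0.6, 0.3, 0.0, 0.4])
window = 4
toy_rolling = toy_scores.rolling(window=window, center=True).mean()

# Check: the rolling value at position 3 should match the plain mean of its slice
start, stop = window_bounds(3, window)
manual_mean = toy_scores.iloc[start:stop].mean()
print(start, stop, manual_mean, toy_rolling.iloc[3])
```

On the real data you would call it with the centre row and window size, e.g. `window_bounds(831, 50)`, and pass the result to `.iloc[start:stop]`.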

# Using TextBlob's Other Sentiment System

We probably won't have time for this, but just in case — let's try using TextBlob's other built-in sentiment system — the Naive Bayes classifier trained on movie reviews — and see what basic shape it gives. 

You'll recall that this is the basic syntax for calling it...

In [None]:
from textblob.sentiments import NaiveBayesAnalyzer

TextBlob("Neil Young is the greatest artist to come out of this country", analyzer=NaiveBayesAnalyzer()).sentiment

If we then subset that to `[1]`, the `p_pos` value, we'll get its sense of how positive that sentence is (which is equal to 1 - `p_neg`).

In [None]:
TextBlob("Neil Young is the greatest artist to come out of this country", analyzer=NaiveBayesAnalyzer()).sentiment[1]

Okay, let's build up a DataFrame from parallel lists, like we did last week.

In [None]:
with open("sign-of-four.txt", encoding="utf-8") as f:
    sot4 = f.read()
sot4_blob = TextBlob(sot4, analyzer=NaiveBayesAnalyzer())
sot4_sentences_blob = sot4_blob.sentences

sot4_ppos = []
for sentence in sot4_sentences_blob:
    sot4_ppos.append(sentence.sentiment[1])
    
sot4_sentences = []
for sentence in sot4_sentences_blob:
    sot4_sentences.append(" ".join(sentence.words))

sot4_sentence_bayes_df = pd.DataFrame({
    'sentence': sot4_sentences,
    'ppos': sot4_ppos,
})

sot4_sentence_bayes_df['rolling_25_ppos'] = sot4_sentence_bayes_df['ppos'].rolling(window=25, center=True).mean()

sot4_sentence_bayes_df['rolling_100_ppos'] = sot4_sentence_bayes_df['ppos'].rolling(window=100, center=True).mean()

In [None]:
sot4_sentence_bayes_df[:10]

In [None]:
sot4_sentence_bayes_df[['rolling_25_ppos','rolling_100_ppos']].plot(figsize=(20,8))

In [None]:
sot4_sentence_bayes_df[sot4_sentence_bayes_df['rolling_25_ppos']==sot4_sentence_bayes_df['rolling_25_ppos'].min()]

In [None]:
sot4_sentence_bayes_df[sot4_sentence_bayes_df['rolling_100_ppos']==sot4_sentence_bayes_df['rolling_100_ppos'].min()]

In [None]:
sot4_sentence_bayes_df[sot4_sentence_bayes_df['rolling_25_ppos']==sot4_sentence_bayes_df['rolling_25_ppos'].max()]

In [None]:
sot4_sentence_bayes_df[sot4_sentence_bayes_df['rolling_100_ppos']==sot4_sentence_bayes_df['rolling_100_ppos'].max()]