## TextBlob Sentiment Analysis Tutorial
#### By: Carter Burkhart

### What is Sentiment Analysis
Sentiment analysis classifies text, such as sentences or paragraphs, as positive, negative, or neutral based on the words it contains. It can be used to quickly summarize large amounts of text data and surface insights, which makes it especially useful for material like customer reviews.

### TextBlob Sentiment
TextBlob is a Python package that includes a sentiment analyzer. For a given piece of text it returns two scores at once: a polarity score from -1.0 (most negative) to 1.0 (most positive), where values near 0 indicate neutral text, and a subjectivity score from 0.0 (objective) to 1.0 (subjective).

### TextBlob Sentiment Analysis
The first thing we need to do is import our Python packages. The main packages we will need are pandas, regular expressions (re), TextBlob, and nltk.

In [14]:
# Import python packages
import csv
import re
import pandas as pd
from textblob import TextBlob
import nltk
from nltk.corpus import stopwords
from nltk import FreqDist, word_tokenize
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer

Once our packages are imported we need to import the text we will be analyzing.

In [4]:
# Read csv file (each row becomes a list holding one review string)
with open("data/sentimentdata.csv", newline="") as openfile:
    r = csv.reader(openfile)
    reviews = []
    for i in r:
        reviews.append(i)
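
To see what `csv.reader` produces, here is a small sketch that reads from an in-memory file instead of disk. The two sample rows are made up for illustration, and it assumes the real file holds one quoted review per row.

```python
import csv
import io

# Hypothetical stand-in for data/sentimentdata.csv: one quoted review per row
sample_file = io.StringIO('"Great movie, loved it."\n"Too long and dull."\n')

# Every row comes back as a list of fields, even when there is only one field
sample_reviews = [row for row in csv.reader(sample_file)]
print(sample_reviews)
# [['Great movie, loved it.'], ['Too long and dull.']]
```

Because each row is itself a list, the result is a list of single-item lists rather than a list of strings.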

Now we will need to look at the data to understand how it is formatted. This will give us an understanding of what we need to do to clean the data before running our TextBlob sentiment analysis.

In [5]:
reviews

[["I went and saw this movie last night after being coaxed to by a few friends of mine. I'll admit that I was reluctant to see it because from what I knew of Ashton Kutcher he was only able to do comedy. I was wrong. Kutcher played the character of Jake Fischer very well, and Kevin Costner played Ben Randall with such professionalism. The sign of a good movie is that it can toy with our emotions. This one did exactly that. The entire theater (which was sold out) was overcome by laughter during the first half of the movie, and were moved to tears during the second half. While exiting the theater I not only saw many women in tears, but many full grown men as well, trying desperately not to let anyone see them crying. This movie was great, and I suggest that you go see it before you judge."],
 ["Once again Mr. Costner has dragged out a movie for far longer than necessary. Aside from the terrific sea rescue sequences, of which there are very few I just did not care about any of the charact

Each review is wrapped in its own single-item list, because csv.reader returns every row as a list. The first thing we need to do is flatten this nested list so that each review is a plain string we can edit.

In [6]:
# Flatten the list of single-item rows into a list of review strings
reviews = [x for y in reviews for x in y]
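
The nested comprehension reads left to right: for every row `y`, take every item `x`. As a sketch on made-up data, the same flattening can also be written with `itertools.chain.from_iterable`, which some readers find easier to follow.

```python
from itertools import chain

# Hypothetical nested list shaped like the csv output
nested = [["first review"], ["second review"]]

flat_comprehension = [x for y in nested for x in y]
flat_chain = list(chain.from_iterable(nested))

print(flat_chain)
# ['first review', 'second review']
```

Both forms give the same list, so the choice is a matter of taste.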

Now we will use a regular expression to replace everything that is not a letter, such as digits and punctuation, with a space.

In [7]:
# Replace every run of non-letter characters (digits, punctuation) with a space
documents = [re.sub("[^a-zA-Z]+", " ", document) for document in reviews]
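
To make the effect of this pattern concrete, the sketch below applies it to two made-up strings. It shows that digits and punctuation collapse into single spaces and that contractions are split at the apostrophe.

```python
import re

pattern = "[^a-zA-Z]+"  # one or more characters that are not ASCII letters

cleaned_score = re.sub(pattern, " ", "I gave it 10/10, wow!")
cleaned_contraction = re.sub(pattern, " ", "I'll admit")

print(repr(cleaned_score))        # 'I gave it wow '
print(repr(cleaned_contraction))  # 'I ll admit'
```

Leftover fragments such as "ll" are shorter than three characters, so the short-word filter later in the tutorial removes them.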

Next, we will need to tokenize the data. This takes each review and breaks it down into individual lowercase words, which lets us filter the text word by word before scoring it.

In [8]:
# tokenize
texts = [[word for word in document.lower().split() ] for document in documents]
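
As a small sketch of what this step does to a single cleaned review (the sentence is made up), note that `split()` with no arguments splits on any run of whitespace and ignores leading and trailing spaces.

```python
# Hypothetical cleaned review with a trailing space, as the regex step can leave
document = "This movie was GREAT and I suggest you see it "

tokens = document.lower().split()
print(tokens)
# ['this', 'movie', 'was', 'great', 'and', 'i', 'suggest', 'you', 'see', 'it']
```

This is why the extra spaces introduced by the regular expression do not produce empty tokens.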

We will now need to remove stopwords. Stopwords are very common terms that give little to no context to our sentiment analysis.

In [9]:
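# remove stopwords (run nltk.download('stopwords') once if the corpus is missing)
stoplist = set(stopwords.words('english'))  # a set gives fast membership tests
texts = [[word for word in text if word not in stoplist] for text in texts]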
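
The filter can be tried without downloading anything. The sketch below uses a tiny hand-picked stop list, which is an assumption for illustration only; the tutorial itself uses NLTK's full English list.

```python
# Hypothetical mini stop list; NLTK's English list is much longer
mini_stoplist = {"this", "was", "and", "i", "you", "it"}

tokens = ['this', 'movie', 'was', 'great', 'and', 'i', 'suggest', 'you', 'see', 'it']
kept = [word for word in tokens if word not in mini_stoplist]

print(kept)
# ['movie', 'great', 'suggest', 'see']
```

One thing to keep in mind is that NLTK's English list includes negations such as "not" and "no", so removing stopwords can change how a phrase like "not good" is scored.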

We will also remove words that are shorter than three characters.

In [11]:
# remove words shorter than 3 characters
texts = [[ word for word in tokens if len(word) >= 3 ] for tokens in texts]

In [12]:
texts

[['went',
  'saw',
  'movie',
  'last',
  'night',
  'coaxed',
  'friends',
  'mine',
  'admit',
  'reluctant',
  'see',
  'knew',
  'ashton',
  'kutcher',
  'able',
  'comedy',
  'wrong',
  'kutcher',
  'played',
  'character',
  'jake',
  'fischer',
  'well',
  'kevin',
  'costner',
  'played',
  'ben',
  'randall',
  'professionalism',
  'sign',
  'good',
  'movie',
  'toy',
  'emotions',
  'one',
  'exactly',
  'entire',
  'theater',
  'sold',
  'overcome',
  'laughter',
  'first',
  'half',
  'movie',
  'moved',
  'tears',
  'second',
  'half',
  'exiting',
  'theater',
  'saw',
  'many',
  'women',
  'tears',
  'many',
  'full',
  'grown',
  'men',
  'well',
  'trying',
  'desperately',
  'let',
  'anyone',
  'see',
  'crying',
  'movie',
  'great',
  'suggest',
  'see',
  'judge'],
 ['costner',
  'dragged',
  'movie',
  'far',
  'longer',
  'necessary',
  'aside',
  'terrific',
  'sea',
  'rescue',
  'sequences',
  'care',
  'characters',
  'ghosts',
  'closet',
  'costner',
  '

As we can now see, our data is broken down by word and is ready to be analyzed.
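
The cleaning steps above can also be collected into a single helper, which makes them easier to reuse and test. This is only a sketch: the function name `preprocess`, its `min_length` parameter, and the small stop list are assumptions for illustration, not part of the original notebook.

```python
import re

def preprocess(review, stoplist, min_length=3):
    """Clean one review string and return its filtered tokens."""
    # Replace digits and punctuation with spaces
    letters_only = re.sub("[^a-zA-Z]+", " ", review)
    # Lowercase and split on whitespace
    tokens = letters_only.lower().split()
    # Drop stopwords and words shorter than min_length
    return [w for w in tokens if w not in stoplist and len(w) >= min_length]

# Hypothetical mini stop list standing in for NLTK's English list
mini_stoplist = {"i", "to", "it", "the", "was", "and", "this"}

result = preprocess("I went to see it: the movie was GREAT!", mini_stoplist)
print(result)
# ['went', 'see', 'movie', 'great']
```

With the NLTK list passed as `stoplist`, calling this on each review should mirror the step-by-step cells.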

We will now use TextBlob sentiment analysis to get an idea of what each review contains.

In [15]:
for i in texts:
    i = str(i)  # TextBlob expects a string, so convert the token list to text
    blob = TextBlob(i)
    for sentence in blob.sentences:
        print(sentence.sentiment)

Sentiment(polarity=0.13111111111111112, subjectivity=0.49222222222222217)
Sentiment(polarity=0.11145833333333333, subjectivity=0.5604166666666667)
Sentiment(polarity=0.20491228070175438, subjectivity=0.5332456140350876)
Sentiment(polarity=0.1401315789473684, subjectivity=0.4657894736842104)
Sentiment(polarity=0.1378787878787879, subjectivity=0.3878787878787879)
Sentiment(polarity=0.2234375, subjectivity=0.4510416666666667)
Sentiment(polarity=0.03295454545454546, subjectivity=0.45454545454545453)
Sentiment(polarity=0.006666666666666659, subjectivity=0.38333333333333325)
Sentiment(polarity=0.10297280844155846, subjectivity=0.6294642857142856)
Sentiment(polarity=0.9, subjectivity=1.0)
Sentiment(polarity=0.1603896103896104, subjectivity=0.5739177489177489)
Sentiment(polarity=-0.19242424242424241, subjectivity=0.6416666666666667)
Sentiment(polarity=0.560763888888889, subjectivity=0.7135416666666666)
Sentiment(polarity=0.36319444444444443, subjectivity=0.5854166666666667)
Sentiment(polarity=

Now that we have all of our scores, we will need to gather all of our data to find out how many reviews are positive, negative or neutral.

In [16]:
# Separate positive, negative, and neutral reviews
positive_review_t = []
negative_review_t = []
neutral_review_t = []

for i in texts:
    i = str(i)
    blob = TextBlob(i)
    for sentence in blob.sentences:
        if sentence.sentiment.polarity > 0:
            positive_review_t.append(sentence)
        elif sentence.sentiment.polarity == 0:
            neutral_review_t.append(sentence)
        else:
            negative_review_t.append(sentence)
        
print(len(positive_review_t), 'Positive')
print(len(negative_review_t), 'Negative')
print(len(neutral_review_t), 'Neutral')
print(len(reviews), 'Total')

81 Positive
25 Negative
0 Neutral
106 Total


By using TextBlob we can say that out of 106 total reviews, 81 were positive and 25 were negative.
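
The counting step can be sketched without TextBlob by labeling polarity values directly. The values below are the first fourteen polarities from the output shown earlier, rounded to three decimals. The `neutral_band` parameter is an assumption added for illustration: at its default of 0.0 it mirrors the tutorial's rule that only a polarity of exactly 0 counts as neutral.

```python
from collections import Counter

def label(polarity, neutral_band=0.0):
    """Map a polarity score to Positive, Negative, or Neutral."""
    if abs(polarity) <= neutral_band:
        return "Neutral"
    return "Positive" if polarity > 0 else "Negative"

# First fourteen polarity scores from the output above, rounded
polarities = [0.131, 0.111, 0.205, 0.140, 0.138, 0.223, 0.033,
              0.007, 0.103, 0.9, 0.160, -0.192, 0.561, 0.363]

strict = Counter(label(p) for p in polarities)
banded = Counter(label(p, neutral_band=0.05) for p in polarities)

print(strict["Positive"], strict["Negative"], strict["Neutral"])  # 13 1 0
print(banded["Positive"], banded["Negative"], banded["Neutral"])  # 11 1 2
```

An exact-zero test rarely fires on real scores, which may explain the count of 0 neutral reviews; a small band around zero is one possible way to treat weakly scored reviews as neutral.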