# Simple sentiment analysis for AGL directors' comments

Using the sentiment analysis functionality from the Pattern natural language processing library:
http://www.clips.ua.ac.be/pages/pattern-en#sentiment

In [17]:
from pattern.en import sentiment

## Load the text

I opened the Word document and saved it as plain text. This won't be necessary for data that has been pasted into the spreadsheet, but it was the quickest way to run an analysis.

It looks like some punctuation, e.g., apostrophes, hasn't survived the conversion. Cleaner text, with this punctuation and the newlines lost in the initial copy-paste restored, would be better for more detailed natural language processing.
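
One quick way to gauge how much punctuation was damaged is to count the bytes that fail to decode. This is only a minimal sketch: the `count_undecodable` helper is hypothetical, the sample byte string is illustrative rather than read from the file, and Windows-1252 is just my guess at the original encoding, given that the text came from Word.

```python
def count_undecodable(raw_bytes, encoding='utf-8'):
    """Count characters that could not be decoded with the given encoding."""
    # Undecodable bytes are replaced with U+FFFD, which we then count.
    decoded = raw_bytes.decode(encoding, 'replace')
    return decoded.count(u'\ufffd')

# Illustrative byte string, NOT read from data/agl.txt. 0x92 is the curly
# apostrophe in Windows-1252 and is not valid on its own in UTF-8.
sample = b'Australia\x92s leading integrated energy companies'

damaged = count_undecodable(sample)

# Assumption: the file is Windows-1252. If so, decoding with that codec
# recovers the apostrophe instead of dropping it.
repaired = sample.decode('cp1252')
```

If the count is non-zero for the real file, trying a different codec when reading it may be simpler than patching the text afterwards.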

However, the text is pretty clean. It is certainly sufficient for the simple approach from the Pattern library, which averages sentiment values from a lexicon.
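
To make "averages sentiment values from a lexicon" concrete, here is a rough sketch of the idea. It is not Pattern's implementation: `TOY_LEXICON` and `average_sentiment` are hypothetical names, the lexicon holds just four entries whose scores are copied from the word-level output later in this notebook, and the real library does more than this (for example, it only scores words it finds in its own lexicon and handles modifiers).

```python
import re

# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# Scores copied from the word-level output shown later in this notebook.
TOY_LEXICON = {
    'secure': (0.4, 0.6),
    'satisfying': (0.5, 1.0),
    'limited': (-0.07142857142857142, 0.14285714285714285),
    'base': (-0.8, 1.0),
}

def average_sentiment(text, lexicon=TOY_LEXICON):
    """Return (polarity, subjectivity) averaged over the lexicon words in text."""
    words = re.findall(r'[a-z]+', text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        # No known words: treat the text as neutral and objective.
        return (0.0, 0.0)
    n = float(len(scores))
    polarity = sum(p for p, _ in scores) / n
    subjectivity = sum(s for _, s in scores) / n
    return (polarity, subjectivity)
```

Because every matched word counts equally, a single strongly scored word can move the average a lot in a short passage, which is worth keeping in mind when reading the scores below.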

In [18]:
# Read the whole file and close the handle when done
with open('data/agl.txt') as f:
    comments = f.read()
print comments

In accordance with a resolution of the Board, the Directors present their report on the consolidated entity (AGL) consisting of AGL Energy Limited and its controlled entities, either during or at the end of the half-year ended 31 December 2016 (the period). Financial comparisons used in this report are of results for the halfyear ended 31 December 2015 (the prior corresponding period) for statement of profit or loss and cash flow analysis, and 30 June 2016 for statement of financial position analysis. 1. About AGL AGL is one of Australia�s leading integrated energy companies. It is taking action to responsibly reduce its greenhouse gas emissions, while providing secure and affordable energy to its customers. Drawing on over 175 years of experience, AGL serves its customers throughout eastern Australia by meeting their energy requirements, including gas, electricity, solar PV and related products and services. AGL has a diverse power generation portfolio including base, peaking and inte

## Sentiment for the AGL example

From the Pattern web page:
> The sentiment() function returns a (polarity, subjectivity)-tuple for the given sentence, based on the adjectives it contains, where polarity is a value between -1.0 and +1.0 and subjectivity between 0.0 and 1.0.

I assume we're primarily interested in polarity here, which indicates whether the passage expresses positive (>0) or negative (<0) opinions. The subjectivity/objectivity measure could also be interesting.
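
If we want a coarse label rather than a raw score, the sign convention can be wrapped in a small helper. This is a sketch under assumptions: `label_polarity` is a hypothetical name, and the default `threshold` of 0.1 is a placeholder that would need calibrating against the full dataset.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, +1.0] to a coarse label.

    `threshold` is a hypothetical dead band around zero; scores inside
    it are treated as neutral. It has not been calibrated on this data.
    """
    if polarity >= threshold:
        return 'positive'
    if polarity <= -threshold:
        return 'negative'
    return 'neutral'
```

With a dead band like this, a score very close to zero comes out as 'neutral' rather than weakly positive or negative, which seems a safer reading for formal report text.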

We'll need to do some calibration and sanity checking over the full dataset to get a sense of how useful these measures are for this data.

In [19]:
comments_sentiment = sentiment(comments)
print 'Polarity:', comments_sentiment[0]
print 'Subjectivity:', comments_sentiment[1]

Polarity: 0.0366139069264
Subjectivity: 0.344818574283


## Word-level sentiment

We can also inspect the words from the text that appear in the lexicon and the corresponding polarity and subjectivity values.

In [20]:
sentiment(comments).assessments

[(['present'], 0.0, 0.0, None),
 (['limited'], -0.07142857142857142, 0.14285714285714285, None),
 (['financial'], 0.0, 0.0, None),
 (['prior'], 0.0, 0.0, None),
 (['financial'], 0.0, 0.0, None),
 (['action'], 0.1, 0.1, None),
 (['responsibly'], 0.2, 0.55, None),
 (['secure'], 0.4, 0.6, None),
 (['related'], 0.0, 0.4, None),
 (['base'], -0.8, 1.0, None),
 (['traditional'], 0.0, 0.75, None),
 (['related'], 0.0, 0.4, None),
 (['natural'], 0.1, 0.4, None),
 (['digital'], 0.0, 0.0, None),
 (['other'], -0.125, 0.375, None),
 (['natural'], 0.1, 0.4, None),
 (['related'], 0.0, 0.4, None),
 (['currently'], 0.0, 0.4, None),
 (['approximately'], -0.4, 0.6, None),
 (['responsible'], 0.2, 0.55, None),
 (['satisfying'], 0.5, 1.0, None),
 (['traditional'], 0.0, 0.75, None),
 (['natural'], 0.1, 0.4, None),
 (['various'], 0.0, 0.5, None),
 (['related'], 0.0, 0.4, None),
 (['australian'], 0.0, 0.0, None),
 (['same'], 0.0, 0.125, None),
 (['internal'], 0.0, 0.0, None),
 (['satisfying'], 0.5, 1.0, None),
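
A list like this is easier to sanity-check once it is grouped by word. The sketch below sums polarity per word so the words pulling the score down or up stand out. `polarity_contributions` is a hypothetical helper, and `sample_assessments` is a handful of entries copied from the output above rather than the full list.

```python
from collections import defaultdict

# A few entries copied from the output above, in the same
# (words, polarity, subjectivity, label) format.
sample_assessments = [
    (['secure'], 0.4, 0.6, None),
    (['base'], -0.8, 1.0, None),
    (['approximately'], -0.4, 0.6, None),
    (['satisfying'], 0.5, 1.0, None),
    (['satisfying'], 0.5, 1.0, None),
    (['related'], 0.0, 0.4, None),
]

def polarity_contributions(assessments):
    """Return (word, total_polarity, count) tuples, most negative first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for words, polarity, subjectivity, label in assessments:
        key = ' '.join(words)
        totals[key] += polarity
        counts[key] += 1
    rows = [(word, totals[word], counts[word]) for word in totals]
    return sorted(rows, key=lambda row: row[1])

contributions = polarity_contributions(sample_assessments)
```

On this sample, 'base' is the most negative entry at -0.8. It presumably comes from the description of the generation portfolio ("base, peaking ..."), where it carries no negative meaning, so domain terms like this are the kind of thing the calibration step should look for.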
