A couple of reminders about how to use Jupyter Notebook:
- Press Shift+Enter to run a code cell or to render a Markdown cell
- You should not have to change much code here, but carefully document any changes you do make

Import all the necessary packages into Python. If there are any you don't have, add a line such as "!pip install package-name" above the import statements and run the cell to install that package. You should only have to do that once, so you can delete the line right afterwards. To get access to the sentiment module in this notebook, download the zip file from this link and extract it into the .ipython folder in your user directory: https://www.csc2.ncsu.edu/faculty/healey/msa-18/text/sentiment_module.zip

In [15]:
from sentiment_module import sentiment
import pandas as pd
import nltk
import re
import string

This part initializes the lists that will be used to form the dataframes below. The preprocess() function reads in a file, treating each line as one sentence, and takes the president and reelection values as parameters. (These can be removed: they were relevant for this project but may not be for yours. They can also be used for any other categorization of the sentiment that you may want to do.) A term vector is then created for each sentence with stop words removed, and each sentence is scored using the sentiment dictionary in the sentiment module, based on its arousal (how active or inactive the emotion is; for example, "content" is low arousal and "excited" is high arousal) and its valence (how positive or negative the feeling is).
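
To make the two dimensions concrete, here is a toy sketch of scoring a term vector. The ratings, the `TOY_LEXICON` name, and the `toy_sentiment()` function are all invented for illustration; they do not come from the sentiment module, and the module's own averaging may well differ from the plain mean used here.

```python
# Hypothetical ratings, invented for illustration only (not from the sentiment module).
TOY_LEXICON = {
    "content": {"valence": 7.0, "arousal": 3.0},
    "excited": {"valence": 7.5, "arousal": 8.0},
    "afraid": {"valence": 2.0, "arousal": 7.0},
}


def toy_sentiment(terms, lexicon=TOY_LEXICON):
    """Average the ratings of the terms that appear in the toy lexicon."""
    rated = [lexicon[t] for t in terms if t in lexicon]
    if not rated:
        # No rated terms: fall back to zeros (an arbitrary choice for this sketch).
        return {"valence": 0.0, "arousal": 0.0}
    return {
        "valence": sum(r["valence"] for r in rated) / len(rated),
        "arousal": sum(r["arousal"] for r in rated) / len(rated),
    }


calm = toy_sentiment(["feel", "content"])
lively = toy_sentiment(["feel", "excited"])
print(calm, lively)
```

Both sentences come out as positive (high valence), but only the second one is high arousal, which is the distinction the two scores are meant to capture.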

In [16]:
# Requires NLTK data: the 'stopwords' corpus and the 'punkt' tokenizer data
# (install once with nltk.download; newer NLTK releases may ask for 'punkt_tab' instead).
punc = re.compile('[%s]' % re.escape(string.punctuation))
porter = nltk.stem.porter.PorterStemmer()  # created here but not applied in preprocess()
stop_words = nltk.corpus.stopwords.words('english')
# Extra stop words, including contractions whose apostrophes are stripped
# along with the rest of the punctuation (e.g. "we've" becomes "weve").
additional = ['weve', 'ive', 'were', 'would', 'well', 'here', 'there']

# One entry per input file; each entry is a list with one item per sentence.
term_vec = []
presidents = []
reelections = []
arousal = []
valence = []

def preprocess(file, president, reelection):
    term_vec_inner = []
    president_list = []
    reelection_list = []
    arousal_list = []
    valence_list = []
    with open(file, "r", encoding="utf-8") as myfile:
        lines = [x.strip() for x in myfile.readlines()]
    for d in lines:
        d = d.lower()
        d = punc.sub('', d)
        d = nltk.word_tokenize(d)
        # Keep only the tokens that are not stop words.
        e = [i for i in d if i not in stop_words and i not in additional]
        # Score the sentence once, then read both dimensions from the result.
        score = sentiment.sentiment(e)
        president_list.append(president)
        term_vec_inner.append(e)
        reelection_list.append(reelection)
        arousal_list.append(score['arousal'])
        valence_list.append(score['valence'])
    term_vec.append(term_vec_inner)
    presidents.append(president_list)
    reelections.append(reelection_list)
    arousal.append(arousal_list)
    valence.append(valence_list)
        
preprocess("Obama_town_hall_pre.txt","Obama","No")
preprocess("Obama_town_hall_post.txt","Obama","Yes")
preprocess("Bush_town_hall_pre.txt","Bush","No")
preprocess("Bush_town_hall_post.txt","Bush","Yes")
preprocess("Clinton_town_hall_pre.txt","Clinton","No")
preprocess("Clinton_town_hall_post.txt","Clinton","Yes")
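
If you want to see what the cleaning steps do to a single sentence without loading NLTK or the sentiment module, here is a minimal sketch. The `TOY_STOP_WORDS` set, the `to_term_vector()` helper, and the example sentence are made up for this illustration, and a plain whitespace split stands in for nltk.word_tokenize(), so results on real text can differ from the cell above.

```python
import re
import string

# Hypothetical mini stop-word list; the notebook uses NLTK's English list instead.
TOY_STOP_WORDS = {"we", "the", "a", "to", "and", "is", "are", "of", "weve"}

# Same punctuation pattern as in the notebook.
punc = re.compile("[%s]" % re.escape(string.punctuation))


def to_term_vector(sentence, stop_words=TOY_STOP_WORDS):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = punc.sub("", sentence.lower())
    return [tok for tok in cleaned.split() if tok not in stop_words]


example = to_term_vector("We've built a stronger economy, and we are proud!")
print(example)
```

Notice that stripping punctuation turns "we've" into "weve" before the stop-word check runs, which is presumably why forms like 'weve' and 'ive' are listed in `additional`.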

The code below builds a Pandas dataframe from the lists we have. 'Sentence' holds the term vector for each sentence we read in (in this case, a sentence is synonymous with a "document"). Then for each sentence, we have the president who said it, whether or not they were running for reelection when they said it, the arousal for the sentence, and the valence for the sentence.

In [19]:
# Build one dataframe per input file, then stack them into a single dataframe.
# (DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions.)
frames = [pd.DataFrame({'Sentence': term_vec[i], 'President': presidents[i], 'Reelection': reelections[i], 'Arousal': arousal[i], 'Valence': valence[i]})
          for i in range(len(term_vec))]
dd = pd.concat(frames, ignore_index=True)

Finally, we write the Pandas dataframe to a csv file, so that we can do visualizations or analysis in another software tool, or simply continue the analysis by reading the csv file into another Jupyter notebook.

In [18]:
dd.to_csv('sentiment.csv')
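
One thing to watch for when reading the csv file back in: the 'Sentence' column holds Python lists, and a csv file stores each list as its text form, so it comes back as a string. The sketch below shows one way to restore the lists with ast.literal_eval. The two rows are made-up sample data, and an in-memory buffer stands in for sentiment.csv so that nothing is written to disk.

```python
import ast
import io

import pandas as pd

# Made-up sample rows standing in for the real dataframe.
frame = pd.DataFrame({
    "Sentence": [["built", "stronger", "economy"], ["proud"]],
    "President": ["Obama", "Obama"],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
as_read = restored["Sentence"][0]  # still a string at this point

# Parse each string back into a list of terms.
restored["Sentence"] = restored["Sentence"].apply(ast.literal_eval)
print(restored["Sentence"][0])
```

The cell above calls to_csv() without index=False, so the row index is written as an unnamed first column; passing index_col=0 to pd.read_csv() reads it back as the index rather than as an extra column.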