Sentiment Labeling

1. Read the file

In [39]:
import pandas as pd

df = pd.read_csv("test(in).csv")

df.head()

Unnamed: 0,Subject,body,date,from
0,EnronOptions Update!,EnronOptions Announcement\n\n\nWe have updated...,5/10/2010,sally.beck@enron.com
1,(No Subject),"Marc,\n\nUnfortunately, today is not going to ...",7/29/2010,eric.bass@enron.com
2,Phone Screen Interview - Shannon L. Burnham,"When: Wednesday, June 06, 2001 10:00 AM-11:00 ...",7/25/2011,sally.beck@enron.com
3,RE: My new work email,we were thinking papasitos (we can meet somewh...,3/25/2010,johnny.palmer@enron.com
4,Bet,Since you never gave me the $20 for the last t...,5/21/2011,lydia.delgado@enron.com


2. Data Cleaning and Initialization

In [40]:
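# Keep rows where at least one of Subject or body is present;
# .copy() avoids a SettingWithCopyWarning on the column assignment below
df = df[(df['Subject'].notnull()) | (df['body'].notnull())].copy()

# Combine subject and body into one text field, treating missing values as empty strings
df['text'] = df['Subject'].fillna('') + ' ' + df['body'].fillna('')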

df.head()


Unnamed: 0,Subject,body,date,from,text
0,EnronOptions Update!,EnronOptions Announcement\n\n\nWe have updated...,5/10/2010,sally.beck@enron.com,EnronOptions Update! EnronOptions Announcement...
1,(No Subject),"Marc,\n\nUnfortunately, today is not going to ...",7/29/2010,eric.bass@enron.com,"(No Subject) Marc,\n\nUnfortunately, today is ..."
2,Phone Screen Interview - Shannon L. Burnham,"When: Wednesday, June 06, 2001 10:00 AM-11:00 ...",7/25/2011,sally.beck@enron.com,Phone Screen Interview - Shannon L. Burnham W...
3,RE: My new work email,we were thinking papasitos (we can meet somewh...,3/25/2010,johnny.palmer@enron.com,RE: My new work email we were thinking papasit...
4,Bet,Since you never gave me the $20 for the last t...,5/21/2011,lydia.delgado@enron.com,Bet Since you never gave me the $20 for the la...
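To make the filter and the concatenation easier to follow, here is a minimal sketch on a toy frame. The rows and the names `toy` and `kept` are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy rows (not from the email dataset)
toy = pd.DataFrame({
    "Subject": ["Hello", None, None],
    "body": [None, "Just a body", None],
})

# Keep rows where at least one of the two fields is present;
# the third row (both missing) is dropped
kept = toy[toy["Subject"].notnull() | toy["body"].notnull()].copy()

# Missing fields become empty strings before concatenation
kept["text"] = kept["Subject"].fillna("") + " " + kept["body"].fillna("")

print(len(kept))
print(kept["text"].tolist())
```

Note that a row with only one field present ends up with a leading or trailing space in `text`; the `strip()` call in the cleaning step removes it.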


In [41]:
def clean_text(text):
    """Lowercase the text and replace newlines and tabs with spaces."""
    text = text.lower()
    text = text.replace('\n', ' ').replace('\t', ' ')
    return text.strip()

df['text'] = df['text'].apply(clean_text)

df.to_csv('cleaned_messages.csv', index=False)
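As a quick check of what the cleaning step does, the sketch below runs the same function on a made-up string. The sample text and the `collapsed` variant are illustrative assumptions, not part of the original notebook.

```python
def clean_text(text):
    # Same logic as the notebook cell above
    text = text.lower()
    text = text.replace('\n', ' ').replace('\t', ' ')
    return text.strip()

# Hypothetical sample with mixed case, newlines, a tab, and outer spaces
sample = "  RE: Lunch\n\nSee you at\tNOON  "
cleaned = clean_text(sample)
print(repr(cleaned))

# Optional variant (an assumption, not used in the notebook):
# collapse runs of whitespace into a single space
collapsed = " ".join(cleaned.split())
print(repr(collapsed))
```

Each newline becomes its own space, so consecutive newlines leave double spaces in the cleaned text. That is usually harmless for lexicon-based scoring, but the collapsed variant may be preferable if the text is also displayed or tokenized on single spaces.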


3. Sentiment Labeling

In [43]:
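from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def get_sentiment_label(text):
    """Map VADER's compound score to a Positive, Negative, or Neutral label."""
    # The compound score is normalized to the range [-1, 1]
    compound = analyzer.polarity_scores(text)['compound']
    # Cutoffs of +/-0.05 are the thresholds conventionally used with VADER
    if compound >= 0.05:
        return 'Positive'
    elif compound <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'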

df['sentiment'] = df['text'].apply(get_sentiment_label)

df.to_csv('sentimental_labeling.csv', index=False)

df

Unnamed: 0,Subject,body,date,from,text,sentiment
0,EnronOptions Update!,EnronOptions Announcement\n\n\nWe have updated...,5/10/2010,sally.beck@enron.com,enronoptions update! enronoptions announcement...,Positive
1,(No Subject),"Marc,\n\nUnfortunately, today is not going to ...",7/29/2010,eric.bass@enron.com,"(no subject) marc, unfortunately, today is no...",Positive
2,Phone Screen Interview - Shannon L. Burnham,"When: Wednesday, June 06, 2001 10:00 AM-11:00 ...",7/25/2011,sally.beck@enron.com,phone screen interview - shannon l. burnham w...,Neutral
3,RE: My new work email,we were thinking papasitos (we can meet somewh...,3/25/2010,johnny.palmer@enron.com,re: my new work email we were thinking papasit...,Neutral
4,Bet,Since you never gave me the $20 for the last t...,5/21/2011,lydia.delgado@enron.com,bet since you never gave me the $20 for the la...,Positive
...,...,...,...,...,...,...
2186,Re: Resume,Thanks for the resume. She has had some good ...,6/17/2011,johnny.palmer@enron.com,re: resume thanks for the resume. she has had...,Positive
2187,"Final Schedule - Wednesday, May 2, 2001 - Jesu...",Attached please find the following documents:\...,1/20/2011,johnny.palmer@enron.com,"final schedule - wednesday, may 2, 2001 - jesu...",Positive
2188,(No Subject),Good to finally hear from. Judging from your ...,1/2/2011,don.baughman@enron.com,(no subject) good to finally hear from. judgi...,Positive
2189,League is Set,It looks like we have our 12 teams. We will p...,3/11/2011,rhonda.denton@enron.com,league is set it looks like we have our 12 tea...,Positive
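The thresholding logic can be examined on its own, without running VADER. The sketch below applies the same cutoffs to hand-picked compound scores and tallies the labels. The helper name `label_from_compound`, its `threshold` parameter, and the score values are hypothetical and were chosen only to exercise the boundaries.

```python
from collections import Counter

def label_from_compound(compound, threshold=0.05):
    # Same cutoffs as get_sentiment_label, applied to a precomputed score
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# Made-up compound scores, including values at and near the cutoffs
scores = [0.62, 0.05, 0.049, 0.0, -0.05, -0.31]
labels = [label_from_compound(s) for s in scores]
counts = Counter(labels)

print(labels)
print(counts)
```

On the real data, `df['sentiment'].value_counts()` gives the equivalent tally. Checking it is worthwhile because long emails tend to accumulate enough positive words to push the compound score above 0.05, which may explain why most of the rows shown above are labeled Positive.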
