<h1> Sentiment Hackpad </h1>

#### Authors: 
Daniela Huppenkothen, Phil Marshall, Madhura Killedar

In [22]:
!pip install textblob



In [23]:
from __future__ import unicode_literals, print_function
import textblob
import pandas as pd
import numpy as np

### Test data
As a quick test, we feed some text into the textblob sentiment analyzer.

`polarity` can range from -1 to 1.
* -1 reflects extreme negative associations
* 1 reflects extreme positive associations
* 0 is neutral language
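
The scale above can be turned into a label with a small helper. This is a minimal sketch: `label_polarity` is a hypothetical name, not part of textblob, and treating exactly 0 as the only neutral value is an assumption made for illustration.

```python
def label_polarity(polarity):
    """Map a polarity score in [-1, 1] to a coarse label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


for score in (-0.8, 0.0, 0.5):
    print(score, label_polarity(score))
```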

In [24]:
textblob.TextBlob("Hello World I hate you").sentiment.polarity

-0.8

In [25]:
textblob.TextBlob("Hello World I love you").sentiment.polarity

0.5

### Hackpad data

#### Read data

To do: Find a more automatic text-scraping method

In [26]:
textfile = "../hackpadtext_Thu_active.txt"

In [27]:
rawdata = pd.read_csv(textfile, header=None, names=["text"], sep="\n", encoding="utf-8")
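
Some pandas versions may reject a newline as the `sep` argument. As a fallback, the file can be read line by line with the standard library. The sketch below is an assumption about how one might do that, not what this notebook actually ran; `read_chunks` is a hypothetical helper, and the temporary file only stands in for the real hackpad text.

```python
import io
import os
import tempfile


def read_chunks(path, encoding="utf-8"):
    """Return the non-empty lines of a text file, one chunk per line."""
    with io.open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]


# Demo on a throwaway file (stand-in for the hackpad text file)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first hack idea\n\nsecond hack idea\n")
chunks = read_chunks(tmp.name)
os.remove(tmp.name)
print(chunks)
```

The resulting list can then be wrapped as `pd.DataFrame({"text": chunks})` to get the same single-column layout used in the rest of the notebook.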

#### Analyse 
Analyse and store polarity of each chunk

In [28]:
# one polarity slot per row, initialised to zero
rawdata["polarity"] = np.zeros(len(rawdata))

In [29]:
# analyse each data/hack idea
feelings = []
for i in rawdata.index:
    data = rawdata.loc[i, "text"]  # select the text column by name
    polarity = textblob.TextBlob(data).sentiment.polarity
    rawdata.loc[i,"polarity"] = polarity
    feelings.append(polarity)
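
The loop above boils down to mapping a scoring function over each chunk. Here is a minimal sketch of that pattern. `score_chunks` and `toy_polarity` are hypothetical names, and `toy_polarity` is a deliberately crude word-list stand-in so the example runs without textblob installed; it does not reproduce textblob's scores.

```python
def score_chunks(chunks, scorer):
    """Return one polarity score per text chunk, in input order."""
    return [scorer(chunk) for chunk in chunks]


def toy_polarity(text):
    """Hypothetical stand-in scorer: (positive - negative words) / total words."""
    positive = {"love", "great", "happy"}
    negative = {"hate", "failure", "sad"}
    words = text.lower().split()
    if not words:
        return 0.0
    score = sum((w in positive) - (w in negative) for w in words)
    return max(-1.0, min(1.0, score / len(words)))


scores = score_chunks(["I love this", "I hate this", "plain text"], toy_polarity)
print(scores)
```

In the notebook itself, the scorer would be `lambda t: textblob.TextBlob(t).sentiment.polarity`, the same call used in the cells above.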

How happy are we on average?

In [30]:
average_feels = sum(feelings)/len(feelings)
print(average_feels)

0.126935835329
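
A mean can be pulled around by a few extreme chunks, so it may be worth looking at the median as well. The sketch below uses the standard-library `statistics` module (Python 3) on a handful of polarity values copied from the tables in this notebook; it is only a subset of the chunks, so its numbers will not match the full run.

```python
from statistics import mean, median

# A few polarity values copied from the output tables in this notebook
# (illustrative subset only, not the full dataset)
sample = [-0.316667, -0.225952, -0.0375, 0.46875, 0.5, 0.625, 1.0]

sample_mean = mean(sample)
sample_median = median(sample)
print(sample_mean, sample_median)
```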


In [31]:
if average_feels>0:
    print("Yay, we're happy! wooooooooooo!")
else:
    print("oh no not happy jan")

Yay, we're happy! wooooooooooo!


Who sounds sad?

In [32]:
# search for sad hacks
rawdata[rawdata["polarity"]<0]

Unnamed: 0,text,polarity
9,AstroHackWeek image Gallery - (Arna) Image gal...,-0.0375
18,Deprojecting Galaxies (or molecular structure)...,-0.0888889
23,Here's my ongoing failure in notebook form,-0.316667
32,Classifying the pulse shapes of pulsars using ...,-0.0218182
33,A custom Monte Carlo sampler for the Kepler pr...,-0.225952
38,"Making MCMC fail on problems with implicit, fl...",-0.00625


In [33]:
# search for happy hacks
#rawdata[rawdata["polarity"]>0]

In [34]:
# Top Five Happy Hacks!
rawdata.sort_values("polarity")[::-1][:5]

Unnamed: 0,text,polarity
4,Tips and Tricks for Teaching with Jupyter Note...,1.0
3,Gaussian Process Tutorial (Jake/Phil) We start...,0.625
27,"collaboratr (Mike Baumer, Usman Khan, Casey L...",0.5
37,Create color palettes for custom queries (Adri...,0.5
1,"AstroHackWeek HackPad Happiness (Madhura, Dani...",0.46875


Wait... most of those sound like comments, not hacks!

### Hackpad data (filtering out short comments)
We now assume (and hope) that any chunk of text with more than 20 words is an actual hack project idea rather than a comment. This isn't always true, so there's room for improvement.

In [35]:
# boolean flag per row: True for hack ideas, False for short comments
rawdata["mask"] = np.zeros(len(rawdata), dtype=bool)

In [36]:
# flag chunks with more than 20 words as hack ideas
for i in rawdata.index:
    if len(rawdata.loc[i,"text"].split(" "))>20:
        rawdata.loc[i,"mask"] = True
    else:
        rawdata.loc[i,"mask"] = False
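
One detail of the word count is worth checking: `split(" ")` produces empty strings wherever there are repeated spaces, which inflates the count, while `split()` with no argument splits on any run of whitespace. The sketch below shows the difference; `word_count` and `is_hack` are hypothetical helper names, and the default threshold of 20 is taken from the text above.

```python
def word_count(text):
    """Count words, treating any run of whitespace as one separator."""
    return len(text.split())


def is_hack(text, min_words=20):
    """Apply the notebook's heuristic: more than min_words words means a hack idea."""
    return word_count(text) > min_words


double_spaced = "two  words"
print(len(double_spaced.split(" ")), word_count(double_spaced))

short_comment = "Sounds great, count me in!"
long_idea = " ".join(["word"] * 25)
print(is_hack(short_comment), is_hack(long_idea))
```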

The new dataset includes only hacks, not comments.

In [44]:
hackdata = rawdata[rawdata["mask"]]

In [38]:
#Top Five Sad Actually-Hacks (probably)
hackdata.sort_values("polarity")[:5]

Unnamed: 0,text,polarity,mask
33,A custom Monte Carlo sampler for the Kepler pr...,-0.225952,True
18,Deprojecting Galaxies (or molecular structure)...,-0.0888889,True
9,AstroHackWeek image Gallery - (Arna) Image gal...,-0.0375,True
32,Classifying the pulse shapes of pulsars using ...,-0.0218182,True
38,"Making MCMC fail on problems with implicit, fl...",-0.00625,True


In [39]:
# Top Five Happy Actually-Hacks (probably)
hackdata.sort_values("polarity")[::-1][:5]

Unnamed: 0,text,polarity,mask
4,Tips and Tricks for Teaching with Jupyter Note...,1.0,True
3,Gaussian Process Tutorial (Jake/Phil) We start...,0.625,True
37,Create color palettes for custom queries (Adri...,0.5,True
27,"collaboratr (Mike Baumer, Usman Khan, Casey L...",0.5,True
1,"AstroHackWeek HackPad Happiness (Madhura, Dani...",0.46875,True


Repeat the earlier analysis on the filtered data

In [40]:
moarfeelings = []
for i in hackdata.index:
    data = hackdata.loc[i, "text"]  # select the text column by name
    polarity = textblob.TextBlob(data).sentiment.polarity
    moarfeelings.append(polarity)

In [41]:
average_feels = sum(moarfeelings)/len(moarfeelings)
print(average_feels)

0.174720151589


In [42]:
if average_feels>0:
    print("YAY, WE'RE ACTUALLY HAPPY! wooooooooooo!")
else:
    print("oh no we're actually sad")

YAY, WE'RE ACTUALLY HAPPY! wooooooooooo!
