<h1> Sentiment Hackpad </h1>

#### Authors: 
Daniela Huppenkothen, Phil Marshall, Madhura Killedar

In [1]:
!pip install textblob



In [2]:
from __future__ import unicode_literals, print_function
import textblob
import pandas as pd
import numpy as np

### Test data
As a quick test, we feed some text into the textblob sentiment analyzer.

`polarity` can range from -1 to 1.
* -1 reflects extreme negative associations
* 1 reflects extreme positive associations
* 0 is neutral language
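
To make the scale concrete, here is a minimal sketch that maps a polarity score onto a coarse label. The helper `describe_polarity` and its `tolerance` parameter are hypothetical names introduced for illustration; they are not part of textblob.

```python
def describe_polarity(polarity, tolerance=0.0):
    """Map a polarity score in [-1, 1] to a coarse label (hypothetical helper)."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie between -1 and 1")
    if polarity > tolerance:
        return "positive"
    if polarity < -tolerance:
        return "negative"
    # Scores within the tolerance band around 0 are treated as neutral.
    return "neutral"


for score in (-0.8, 0.0, 0.5):
    print(score, describe_polarity(score))
```

A non-zero `tolerance` would treat weakly positive or weakly negative text as neutral, which may be useful because many scores sit close to 0.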

In [3]:
textblob.TextBlob("Hello World I hate you").sentiment.polarity

-0.8

In [4]:
textblob.TextBlob("Hello World I love you").sentiment.polarity

0.5

### Hackpad data

#### Read data

To do: Find a more automatic text-scraping method

In [5]:
textfile = "../hackpadtext.txt"

In [6]:
rawdata = pd.read_csv(textfile, header=None, names=["text"], sep="\n", encoding="utf-8")
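
Some newer pandas releases may reject `sep="\n"` in `read_csv`. As a hedged alternative, the sketch below reads the file line by line and builds the same one-column frame. The helpers `chunks_to_frame` and `read_chunks` are hypothetical names, and unlike `read_csv` this version does no quote handling, so results could differ slightly on lines that contain quotation marks.

```python
import io

import pandas as pd


def chunks_to_frame(lines):
    """Build a one-column frame from an iterable of text lines, skipping blank ones."""
    chunks = [line.strip() for line in lines if line.strip()]
    return pd.DataFrame({"text": chunks})


def read_chunks(path):
    """Read a UTF-8 text file and treat every non-blank line as one chunk."""
    with io.open(path, encoding="utf-8") as handle:
        return chunks_to_frame(handle)
```

Usage would then be `rawdata = read_chunks(textfile)`.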

#### Analyse 
Compute the polarity of each chunk (one line of the text file) and store it in a new `polarity` column.

In [7]:
# start with a float column of zeros, one entry per chunk
rawdata["polarity"] = np.zeros(len(rawdata))


In [8]:
# analyse each chunk of text (hack idea or comment)
feelings = []
for i in rawdata.index:
    data = rawdata.loc[i, "text"]
    polarity = textblob.TextBlob(data).sentiment.polarity
    # store the score in the frame and keep a plain list for averaging
    rawdata.loc[i, "polarity"] = polarity
    feelings.append(polarity)
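
The same per-chunk scoring can also be sketched as a single column operation with `Series.apply`. The helper names `textblob_polarity` and `add_polarity` are hypothetical; the scorer is passed in as a parameter so that another analyzer could be swapped in.

```python
import pandas as pd


def textblob_polarity(text):
    """Score one string with textblob's default analyzer."""
    # Imported lazily so add_polarity can also be used with other scorers.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def add_polarity(frame, scorer=textblob_polarity, column="text"):
    """Return a copy of frame with a 'polarity' column computed by scorer."""
    out = frame.copy()
    out["polarity"] = out[column].apply(scorer)
    return out
```

With this sketch, `rawdata = add_polarity(rawdata)` would fill the column, and `rawdata["polarity"].tolist()` would give the list of scores.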

How happy are we on average?

In [9]:
average_feels = sum(feelings)/len(feelings)
print(average_feels)

0.186349310623


In [10]:
if average_feels>0:
    print("Yay, we're happy! wooooooooooo!")
else:
    print("oh no not happy jan")

Yay, we're happy! wooooooooooo!


Who sounds sad?

In [11]:
# search for sad hacks
rawdata[rawdata["polarity"]<0]

| | text | polarity |
|---|---|---|
| 11 | Deprojecting Galaxies (or molecular structure)... | -0.0888889 |
| 16 | Here's my ongoing failure in notebook form | -0.316667 |
| 24 | Classifying the pulse shapes of pulsars using ... | -0.0218182 |
| 25 | A custom Monte Carlo sampler for the Kepler pr... | -0.225952 |
| 39 | Classifying the pulse shapes of pulsars using ... | -0.0218182 |
| 49 | Modelling 2-D Impulse Response Function for Ac... | -0.129167 |
| 61 | Managing Large Scale Structure Data with Datab... | -0.0111772 |
| 71 | Neural Networks (Zaki Ali) - I'm working on a ... | -0.148864 |
| 79 | Start with a single species (say FeII), conver... | -0.0107143 |
| 89 | Bayesian networks for inference of young star ... | -0.11 |


In [12]:
# search for happy hacks
#rawdata[rawdata["polarity"]>0]

In [13]:
# Top Five Happy Hacks!
rawdata.sort_values("polarity")[::-1][:5]

| | text | polarity |
|---|---|---|
| 87 | Lunch sounds good! | 0.875 |
| 98 | happy to chat about uncertainty and implementi... | 0.8 |
| 85 | A good point of reference: streams. Hope to jo... | 0.7 |
| 51 | Sure, sounds good! | 0.6875 |
| 36 | Brigitta Sipocz I updated the hack. let's make... | 0.625 |


Wait... most of those sound like comments, not hacks!

### Hackpad data (filtering out short comments)
Now we'll assume that a chunk of text with more than 20 words is an actual hack project idea rather than a comment. This isn't always true, so there's room for improvement.
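
As a rough sketch of this word-count filter, the mask can be computed for the whole column at once. The helper `long_chunk_mask` and its `min_words` parameter are hypothetical names. Note that `str.split()` with no argument splits on any run of whitespace, so counts can differ slightly from splitting on a single space.

```python
import pandas as pd


def long_chunk_mask(texts, min_words=20):
    """Return a boolean Series that is True where a text has more than min_words words."""
    return texts.str.split().str.len() > min_words


demo = pd.DataFrame({"text": ["Lunch sounds good!", " ".join(["word"] * 21)]})
demo["mask"] = long_chunk_mask(demo["text"])
print(demo[demo["mask"]])
```

Only the 21-word row survives the filter in this toy frame; the three-word comment is dropped.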

In [14]:
# start with a boolean column of False, one entry per chunk
rawdata["mask"] = np.zeros(len(rawdata), dtype=bool)

In [15]:
# flag chunks with more than 20 words as probable hacks
for i in rawdata.index:
    if len(rawdata.loc[i,"text"].split(" "))>20:
        rawdata.loc[i,"mask"] = True
    else:
        rawdata.loc[i,"mask"] = False

In [16]:
hackdata = rawdata[rawdata["mask"]]

In [17]:
# Top Five Sad Actually-Hacks (probably)
hackdata.sort_values("polarity")[:5]

| | text | polarity | mask |
|---|---|---|---|
| 25 | A custom Monte Carlo sampler for the Kepler pr... | -0.225952 | True |
| 71 | Neural Networks (Zaki Ali) - I'm working on a ... | -0.148864 | True |
| 49 | Modelling 2-D Impulse Response Function for Ac... | -0.129167 | True |
| 89 | Bayesian networks for inference of young star ... | -0.11 | True |
| 92 | Python API to perform SDSS SQL Queries: Sky Se... | -0.09375 | True |


In [18]:
# Top Five Happy Actually-Hacks (probably)
hackdata.sort_values("polarity")[::-1][:5]

| | text | polarity | mask |
|---|---|---|---|
| 0 | Gaussian Process Tutorial (Jake/Phil) We start... | 0.625 | True |
| 82 | Long-shot: if we finish the automatic velocity... | 0.5 | True |
| 20 | collaboratr (Mike Baumer, Usman Khan, Casey L... | 0.5 | True |
| 29 | Create color palettes for custom queries (Adri... | 0.5 | True |
| 50 | You might want to take a look at Sherpa. Aneta... | 0.4625 | True |


Repeat the earlier analysis, this time on the filtered hacks only.
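
Since each chunk's polarity is already stored in the `polarity` column, the average could also be taken directly from the frame instead of re-scoring the text. A minimal sketch, assuming a frame with `polarity` and `mask` columns like the ones built above; `mean_polarity` is a hypothetical helper name.

```python
import pandas as pd


def mean_polarity(frame, mask_column=None):
    """Average the stored polarity, optionally restricted to rows where mask_column is True."""
    rows = frame if mask_column is None else frame[frame[mask_column].astype(bool)]
    return rows["polarity"].mean()
```

For example, `mean_polarity(rawdata, "mask")` would average over the probable hacks only, while `mean_polarity(rawdata)` would average over every chunk.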

In [19]:
# re-score only the chunks that passed the word-count filter
moarfeelings = []
for i in hackdata.index:
    data = hackdata.loc[i, "text"]
    polarity = textblob.TextBlob(data).sentiment.polarity
    moarfeelings.append(polarity)

In [20]:
average_feels = sum(moarfeelings)/len(moarfeelings)
print(average_feels)

0.156783688455


In [21]:
if average_feels>0:
    print("YAY, WE'RE ACTUALLY HAPPY! wooooooooooo!")
else:
    print("oh no we're actually sad")

YAY, WE'RE ACTUALLY HAPPY! wooooooooooo!
