# Capturing Sentimental Opinions on Presidential Candidates

In this project, we aim to capture the opinions and sentiments of a sample of voters. We'll analyze their tweets daily over a period of time to see how those opinions change.

Can analyzing the sentiment of tweets from a sample of voters during a given period provide insight into how an election will swing? The main goal of the project is to analyze tweets that reference candidates running for president, classify each tweet as positive, negative or neutral, and use these classifications to assess how well online sentiment predicts the result of an election.

## Tutorials

#### You can run each cell below by pressing 'Shift' + 'Enter'. To insert or delete cells, use the menu items at the top left.

In [2]:
#import data exploration packages
import pandas as pd 
import os #operating system package
import numpy

In [3]:
#get the current working directory (the folder this notebook is running in)
current_working_directory = os.getcwd()

In [4]:
print(current_working_directory)

C:\Users\Enendu\Documents\GitHub\sentimentanalysis


In [5]:
#Here, I create a dataframe (df) by reading the csv from the filepath where it is located using pandas (pd)
#os.path.join builds the path with the correct separator for the operating system
df = pd.read_csv(os.path.join(current_working_directory, 'sentiment_labelled_sentences', 'amazon_cells_labelled.csv'))

In [6]:
#head returns the first five rows of the dataframe
#You can see the file has no header row (the first review was used as the column names)
#and some of the (0 - negative and 1 - positive) labels are out of place
df.head()

| | So there is no way for me to plug it in here in the US unless I go by a converter. | 0 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 |
|---|---|---|---|---|---|---|
| 0 | Good case | Excellent value. | 1.0 | NaN | NaN | NaN |
| 1 | Great for the jawbone. | 1 | NaN | NaN | NaN | NaN |
| 2 | Tied to charger for conversations lasting more... | 0 | NaN | NaN | NaN | NaN |
| 3 | The mic is great. | 1 | NaN | NaN | NaN | NaN |
| 4 | I have to jiggle the plug to get it to line up... | 0 | NaN | NaN | NaN | NaN |


In [7]:
#shows information about the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 999 entries, 0 to 998
Data columns (total 6 columns):
So there is no way for me to plug it in here in the US unless I go by a converter.    999 non-null object
0                                                                                     999 non-null object
Unnamed: 2                                                                            229 non-null object
Unnamed: 3                                                                            55 non-null object
Unnamed: 4                                                                            14 non-null object
Unnamed: 5                                                                            4 non-null float64
dtypes: float64(1), object(5)
memory usage: 46.9+ KB
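
The extra columns come from commas inside the review text: every unquoted comma starts a new field, so a review that contains commas pushes its label further to the right. A minimal sketch of the effect, using two made-up lines rather than rows read from the file (the names `sample` and `demo` are illustrative assumptions):

```python
import io
import pandas as pd

# Two illustrative lines in a "text,label" layout. The second review contains
# an unquoted comma, so it is split into three fields instead of two.
sample = io.StringIO(
    "The mic is great.,1\n"
    "Good case, Excellent value.,1\n"
)

# header=None stops pandas from using the first line as column names;
# names=[0, 1, 2] reserves enough columns for the widest line.
demo = pd.read_csv(sample, header=None, names=[0, 1, 2])
print(demo)
```

In the printed frame, the label of the second row sits one column further right than the label of the first row, which is the same pattern seen in `df.head()`.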


### QUICK TASK - Can you adjust the dataframe to include a header row and also make it a 2-column dataframe?
#### PS - Google is your friend :) 

In [96]:
#Same result as your solution, but the column names are set while reading the file
#The six names are only placeholders; the dataframe is reduced to two columns further down
df = pd.read_csv(os.path.join(current_working_directory, 'sentiment_labelled_sentences', 'amazon_cells_labelled.csv'),
                 names = ["Sequence", "Start", "End", "Coverage", "Biscuit", "Drink"])

In [84]:
df.head()

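| | Sequence | Start | End | Coverage | Biscuit | Drink |
|---|---|---|---|---|---|---|
| 0 | So there is no way for me to plug it in here i... | 0 | NaN | NaN | NaN | NaN |
| 1 | Good case | Excellent value. | 1.0 | NaN | NaN | NaN |
| 2 | Great for the jawbone. | 1 | NaN | NaN | NaN | NaN |
| 3 | Tied to charger for conversations lasting more... | 0 | NaN | NaN | NaN | NaN |
| 4 | The mic is great. | 1 | NaN | NaN | NaN | NaN |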


In [85]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
Sequence    1000 non-null object
Start       1000 non-null object
End         229 non-null object
Coverage    55 non-null object
Biscuit     14 non-null object
Drink       4 non-null float64
dtypes: float64(1), object(5)
memory usage: 47.0+ KB
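
Passing `names=` also explains why the row count rises from 999 to 1000: pandas no longer consumes the first line as a header, so that line becomes a data row. A small sketch on made-up data (the string `raw` and both variable names are assumptions for illustration only):

```python
import io
import pandas as pd

raw = "Great phone.,1\nPoor battery.,0\n"  # illustrative data, not from the file

# Default behaviour: the first line is used as the header, leaving one data row.
with_header = pd.read_csv(io.StringIO(raw))

# With names=..., no header line is expected, so both lines are kept as data.
with_names = pd.read_csv(io.StringIO(raw), names=["Review", "Sentiment"])

print(len(with_header), len(with_names))
```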


In [97]:
#Just showing the names of the columns
df.columns.values

array(['Sequence', 'Start', 'End', 'Coverage', 'Biscuit', 'Drink'], dtype=object)

In [98]:
#Changing the dataframe to a python list to be able to easily alter it
mylist = df.values.tolist()

In [99]:
#First item in the list. Compare with the dataframe, you'll see it's the same
mylist[0]

['So there is no way for me to plug it in here in the US unless I go by a converter.',
 '0',
 nan,
 nan,
 nan,
 nan]

In [87]:
#This is called a list comprehension. It is another way of writing for loops. Basically what I'm doing here is, for every
#list x in mylist, build a new list that keeps only the items y of x that are not nan
cleanedlist = [[y for y in x if str(y) != 'nan'] for x in mylist]
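
For readers new to comprehensions, the same filter can be written as two nested for loops. This is only a sketch on made-up rows shaped like the entries of `mylist`; the names `rows`, `kept` and `cleaned` are illustrative:

```python
import math

# Illustrative rows shaped like the entries of mylist (made-up values).
rows = [
    ["The mic is great.", "1", math.nan, math.nan],
    ["Good case", " Excellent value.", "1", math.nan],
]

cleaned = []
for row in rows:                 # outer loop: one row at a time
    kept = []
    for item in row:             # inner loop: each cell in that row
        if str(item) != "nan":   # same test as in the comprehension
            kept.append(item)
    cleaned.append(kept)

print(cleaned)
```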

In [100]:
#What the first five lists in cleanedlist look like. This is list slicing
cleanedlist[:5]

[['So there is no way for me to plug it in here in the US unless I go by a converter.',
  '0'],
 ['Good case', ' Excellent value.', '1'],
 ['Great for the jawbone.', '1'],
 ['Tied to charger for conversations lasting more than 45 minutes.MAJOR PROBLEMS!!',
  '0'],
 ['The mic is great.', '1']]

In [101]:
#Here I create a dictionary because I'm aiming for a 2-column "key-value" pair. For each list in cleanedlist,
#join every string except the last one into a single review. The last item is the label (0 or 1) in every sublist.
dict1 = {}
for x in cleanedlist:
    #x[:-1] holds the pieces of the review text; x[-1] is the label
    mergedsentence = ''.join(x[:-1])
    dict1[mergedsentence] = x[-1]
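
One caveat with the dictionary approach: keys are unique, so if the file contains the same review text more than once, those rows collapse into a single entry. Whether that happens in this dataset is not checked here. A sketch of the difference on made-up rows, with a list of pairs as one possible alternative (`rows`, `as_dict` and `as_pairs` are hypothetical names):

```python
# Illustrative cleaned rows (made-up values); the first and last are identical.
rows = [
    ["Works great!", "1"],
    ["Good case", " Excellent value.", "1"],
    ["Works great!", "1"],
]

# Dictionary: duplicate reviews share one key, so only two entries survive.
as_dict = {"".join(row[:-1]): row[-1] for row in rows}

# List of (review, label) pairs: every row is kept, duplicates included.
as_pairs = [("".join(row[:-1]), row[-1]) for row in rows]

print(len(as_dict), len(as_pairs))
```

A list of pairs can be passed to `pd.DataFrame` with `columns=[...]` in the same way as `dict1.items()`.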

In [102]:
dict1

{'!I definitly recommend!!': '1',
 '#1 It Works - #2 It is Comfortable.': '1',
 '$50 Down the drain.': '0',
 '(It works!)': '1',
 ")Setup couldn't have been simpler.": '1',
 '* Comes with a strong light that you can use to light up your camera shots and even flash SOS signals (seriously!': '1',
 '.... Item arrived quickly and works great with my Metro PCS Samsung SCH-r450 slider phone and Sony Premium Sound in ear plugs.': '1',
 "1. long lasting battery (you don't have to recharge it as frequentyly as some of the flip phones)2.": '0',
 '2 thumbs up to this seller': '1',
 ':-)Oh the charger seems to work fine.': '1',
 'A Disappointment.': '0',
 'A PIECE OF JUNK THAT BROKE AFTER BEING ON MY PHONE FOR 2 DAYS!!!': '0',
 "A good quality bargain.. I bought this after I bought a cheapy from Big Lots that sounded awful and people on the other end couldn't hear me.": '1',
 'A lot of websites have been rating this a very good phone and so do I.': '1',
 'A must study for anyone interested in the 

In [93]:
#Build the 2-column dataframe from the dictionary's (review, label) pairs
finaldf = pd.DataFrame(list(dict1.items()), columns=['Review', 'Sentiment'])

In [95]:
finaldf.head()

| | Review | Sentiment |
|---|---|---|
| 0 | So there is no way for me to plug it in here i... | 0 |
| 1 | Good case Excellent value. | 1 |
| 2 | Great for the jawbone. | 1 |
| 3 | Tied to charger for conversations lasting more... | 0 |
| 4 | The mic is great. | 1 |
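
The same clean-up can also be done without leaving pandas. The sketch below is an alternative to the list and dictionary steps above, not the method used in this notebook. It runs on a small made-up frame, and the names `demo`, `merge_row`, `tidy` and the columns `a`, `b`, `c` are all hypothetical:

```python
import numpy as np
import pandas as pd

# Illustrative frame shaped like df (made-up rows, generic column names).
demo = pd.DataFrame(
    [
        ["The mic is great.", "1", np.nan],
        ["Good case", " Excellent value.", "1"],
    ],
    columns=["a", "b", "c"],
)

def merge_row(row):
    """Drop NaN cells, treat the last remaining cell as the label, join the rest."""
    parts = row.dropna().tolist()
    return pd.Series({"Review": "".join(parts[:-1]), "Sentiment": parts[-1]})

# axis=1 applies merge_row to each row; the returned Series become the new columns.
tidy = demo.apply(merge_row, axis=1)
print(tidy)
```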


## New Tutorial

### Can you count and show the number of negative (0) and positive (1) sentiments?
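
One possible approach is `value_counts`, sketched here on a small made-up frame rather than the real `finaldf` (the frame `demo` and its rows are assumptions for illustration; the labels are strings, as in the dataframe built above):

```python
import pandas as pd

# Illustrative stand-in for finaldf (made-up rows).
demo = pd.DataFrame(
    {
        "Review": ["Great phone.", "Poor battery.", "Works fine."],
        "Sentiment": ["1", "0", "1"],
    }
)

# value_counts tallies how often each distinct label occurs.
counts = demo["Sentiment"].value_counts()
print(counts)
```

The same call on `finaldf['Sentiment']` should give the counts for the full dataset.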