# Exploratory Data Analysis of ~20k tweets from 9/11/2016 to 12/12/2016.

Demonetisation has affected every single person in the country, mostly in a direct way. Though there are no official stats on the impact it has had on the economy in the last 40 days or so, I thought it would be interesting to explore the general sentiment on Twitter to understand what folks from diverse walks of life think.
  
It was easier said than done. Depending on how you see it, Twitter data is either entirely useless or very difficult to classify. Though I certainly don't think of Twitter as the gold standard of the country's opinion, I still think valid opinions are voiced there and it's worth taking a look.
  
I compiled close to 20,000 tweets posted between 9 November 2016 and 12 December 2016, classified their sentiment using Indico.io's sentiment classifier, and hand-classified a few tweets written in Hinglish.
 
 These were the questions I was trying to get answered with these tweets:
 * What was the average sentiment over these 34 days?
 * How did sentiment change over time in India's top 10 cities by population? What were folks outside India saying, and how did that change over time?
 * Where did the most positive tweets come from? Where did the most negative tweets come from?
 * What were Twitter's "influencers" saying about this move? And how did that change over time?


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import indicoio as indico
import matplotlib.pyplot as plt
import collections
from collections import OrderedDict
import plotly
import plotly.plotly as py
import plotly.graph_objs as go

%matplotlib inline

In [34]:
df = pd.read_csv('tweets.csv')

In [35]:
df.drop(['Unnamed: 0'], axis = 1, inplace = True, errors = 'ignore')

In [36]:
df.columns

Index(['user id', 'name', 'handle', 'date', 'retweets', 'favorites', 'text',
       'id', 'permalink', 'language', 'followers_count', 'location',
       'sentiment'],
      dtype='object')

In [37]:
df.head()

,user id,name,handle,date,retweets,favorites,text,id,permalink,language,followers_count,location,sentiment
0,7.108627e+17,The gamer,VenkateshaPanc2,2016-11-09 23:38:24,0.0,0.0,If common man is supporting the decision why c...,7.964142e+17,https://twitter.com/VenkateshaPanc2/status/796...,en,6.0,"Indore, India",0.712256
1,72919560.0,Uma Kant Singh,umakantsingh_in,2016-11-09 23:38:42,0.0,0.0,"#DeMonetisation,Not govt of common man,PM who ...",7.964142e+17,https://twitter.com/umakantsingh_in/status/796...,en,455.0,New Delhi,0.433632
2,238190100.0,Antony bothagar,Antonybothagar,2016-11-09 23:38:46,0.0,1.0,I am already suffering to get changes for ₹ 10...,7.964142e+17,https://twitter.com/Antonybothagar/status/7964...,en,76.0,,0.844959
3,2433278000.0,Abhishek Sharma,skyneeldotcom,2016-11-09 23:38:53,1.0,1.0,#APPSC #Recruitment 2016 – Apply Online for 98...,7.964143e+17,https://twitter.com/skyneeldotcom/status/79641...,en,6288.0,,0.568757
4,3246635000.0,abdul rahman,RahmanAbdul2603,2016-11-09 23:39:21,0.0,0.0,#DeMonetisation almst everybody knows fr a mar...,7.964144e+17,https://twitter.com/RahmanAbdul2603/status/796...,en,48.0,"Saharanpur, India",0.080881


## Let's first look at overall sentiment over the course of these 34 days.

In [38]:
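## Get average sentiment for each week. For that, return DataFrames for each week.
def search(df, *words):
    """Return the rows of df whose 'date' string contains any of the given words."""
    # Match literally (no regex) and treat missing dates as non-matches.
    masks = [df['date'].str.contains(word, regex=False, na=False) for word in words]
    return df[np.logical_or.reduce(masks)]
# Rows dated 9 to 16 November 2016.
week1 = search(df,"2016-11-09","2016-11-10","2016-11-11","2016-11-12","2016-11-13","2016-11-14","2016-11-15","2016-11-16")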
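
The cell above builds only the first week's frame, so here is a minimal sketch of how the weekly averages themselves could be computed in one pass. It assumes the `date` column parses with `pd.to_datetime` and that weeks are strict 7-day windows counted from 2016-11-09, which differs slightly from the eight dates listed for `week1`. The helper name `weekly_mean_sentiment` and the toy frame are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

def weekly_mean_sentiment(df, start="2016-11-09"):
    """Mean sentiment per 7-day window counted from `start` (hypothetical helper)."""
    dates = pd.to_datetime(df["date"])
    # Integer week index: 0 for the first 7 days from `start`, 1 for the next 7, ...
    week = (dates.dt.normalize() - pd.Timestamp(start)).dt.days // 7
    return df.groupby(week)["sentiment"].mean()

# Tiny made-up frame, only to show the call; the real input would be `df`.
toy = pd.DataFrame({
    "date": ["2016-11-09 23:38:24", "2016-11-10 08:00:00",
             "2016-11-16 10:00:00", "2016-11-17 09:00:00"],
    "sentiment": [0.2, 0.4, 0.9, 0.5],
})
weekly = weekly_mean_sentiment(toy)
print(weekly)
```

On the real data this would be called as `weekly_mean_sentiment(df)`. Grouping on a computed week index avoids listing every date by hand, at the cost of fixing the window boundaries up front.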

## Maybe we can look at the data in a slightly more granular way: mean sentiment for each day.
### A scatter plot would help.

In [49]:
# Collect the distinct calendar days, in order of first appearance.
day_count = df['date'].unique().tolist()
day_count = [i.split(' ', 1)[0] for i in day_count]  # remove time-stamps
day_count = list(OrderedDict.fromkeys(day_count))    # de-duplicate, keep order
# Mean sentiment for each of the 34 days.
mean_sentiment_each_day = []
for day in day_count[:34]:
    mean_sentiment_each_day.append(search(df, day)['sentiment'].mean())
print(mean_sentiment_each_day)

[0.53136832903225806, 0.52921829355149175, 0.50774063636363631, 0.48179133333333335, 0.51169756265984656, 0.53476719999999989, 0.50039012500000002, 0.51350087678339817, 0.51913394965986392, 0.4794827131367293, 0.52364179829545454, 0.48739098203592818, 0.52228293522267211, 0.48062430821917812, 0.5170525532994924, 0.49913779966887412, 0.47832414985994398, 0.55673494999999984, 0.39815657142857142, 0.49845762499999996, 0.51119342201834861, 0.457264375, 0.45608518881118881, 0.47380484290540537, 0.49258282573099416, 0.50329545744680859, 0.47727182420091324, 0.49089637484586929, 0.47717728712871288, 0.44266404385964908, 0.47598505787781359, 0.48914792917166872, 0.48785451050679857, 0.47431779556898285]
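
As a possible alternative to filtering the frame once per day, the same daily means can be sketched with a single `groupby`, which also keeps each value paired with its date for the scatter plot. This assumes the `date` strings parse with `pd.to_datetime`. The helper names `daily_mean_sentiment` and `plot_daily_means` and the toy frame are hypothetical and not from the original notebook.

```python
import pandas as pd

def daily_mean_sentiment(df):
    """Mean sentiment per calendar day, indexed by date (hypothetical helper)."""
    # Drop the time of day so that all tweets from one day share a key.
    day = pd.to_datetime(df["date"]).dt.normalize()
    return df.groupby(day)["sentiment"].mean()

def plot_daily_means(daily):
    """Scatter plot of the daily means; matplotlib is imported only when called."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.scatter(daily.index, daily.values)
    ax.set_xlabel("date")
    ax.set_ylabel("mean sentiment")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return ax

# Tiny made-up frame, only to show the call; the real input would be `df`.
toy = pd.DataFrame({
    "date": ["2016-11-09 23:38:24", "2016-11-09 23:39:21", "2016-11-10 07:15:00"],
    "sentiment": [0.2, 0.4, 0.9],
})
daily = daily_mean_sentiment(toy)
print(daily)
```

In the notebook, `plot_daily_means(daily_mean_sentiment(df))` would draw one point per day. Because `groupby` sorts its keys, the points come out in chronological order without relying on the row order of the CSV.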
