## TASK

You are required to identify and carry out an analysis of a large dataset gathered from the Twitter API, available on Moodle as “ProjectTweets.csv”.
This data should be stored as requested below, and you are then required to analyse any change in sentiment that occurs over the time period covered by the file.
### Context

This dataset contains 1,600,000 tweets extracted using the Twitter API.

### Content

It contains the following 5 fields:
* ids: the id of the tweet (e.g. 4587)
* date: the date of the tweet (e.g. Sat May 16 23:58:44 UTC 2009)
* flag: the query (e.g. lyx). If there is no query, this value is NO_QUERY.
* user: the user who tweeted (e.g. bobthebuilder)
* text: the text of the tweet (e.g. Lyx is cool)

Following your analysis, you are then required to make a time series forecast of the sentiment of the entire dataset at 1 day, 3 days and 7 days going forward. This forecast must be displayed as a dynamic dashboard.     
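The forecasting step can be illustrated with a deliberately simple baseline before any real model is fitted. This is a minimal sketch, assuming per-tweet sentiment scores (for instance VADER compound scores in [-1, 1]) have already been computed and indexed by timestamp; the scores and dates below are made up, and `naive_forecast` is a hypothetical helper, not part of pandas or any other library. It averages sentiment per calendar day with `resample("D")` and predicts the mean of the last three observed days for each of the 1-, 3- and 7-day horizons.

```python
import pandas as pd

# Hypothetical per-tweet sentiment scores (e.g. VADER compound values in [-1, 1]).
scores = pd.Series(
    [0.5, -0.2, 0.1, 0.3, -0.4, 0.6, 0.0, 0.2],
    index=pd.to_datetime([
        "2009-04-06 22:19:45", "2009-04-06 23:05:10", "2009-04-07 10:00:00",
        "2009-04-08 09:30:00", "2009-04-08 18:45:00", "2009-04-09 12:00:00",
        "2009-04-10 08:15:00", "2009-04-11 20:00:00",
    ]),
)

# One mean sentiment value per calendar day; days without tweets would become NaN.
daily = scores.resample("D").mean()


def naive_forecast(series, horizons=(1, 3, 7), window=3):
    """Baseline: predict the mean of the last `window` observed days for every horizon."""
    observed = series.dropna()  # skip days with no tweets
    level = observed.iloc[-window:].mean()
    last_day = observed.index[-1]
    return {h: (last_day + pd.Timedelta(days=h), level) for h in horizons}


forecast = naive_forecast(daily)
for horizon, (day, value) in forecast.items():
    print(f"+{horizon} day(s): {day.date()} -> {value:.3f}")
```

A flat baseline like this is mainly a yardstick: a proper time series model should beat it on held-out days before its forecast is put on the dashboard.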


In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from datetime import datetime
from numpy import mean
import seaborn as sns
import matplotlib.pyplot as plt
import re

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv(r'C:\Users\tadeo\Desktop\CA2-Sem2\Data\ProjectTweets.csv', header = None)
pd.set_option('display.max_colwidth', 1000)
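Because the file is read with `header = None`, the columns are referred to by position (0 to 5) throughout the notebook. A minimal sketch of naming them at load time instead, assuming the layout visible in the `df.head()` output (a leading row counter followed by the five documented fields); the name `row` for the counter column is an assumption, and two rows are inlined with `io.StringIO` so the example runs without the CSV file.

```python
import io

import pandas as pd

# Two rows copied from the df.head() output; the notebook itself reads ProjectTweets.csv.
SAMPLE = """0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer."
1,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by texting it...
"""

# Name every column up front and use the leading counter as the index.
tweets = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    names=["row", "ids", "date", "flag", "user", "text"],
    index_col="row",
)
print(tweets[["date", "user"]])
```

With named columns, the later cleaning can select `tweets[["date", "text"]]` rather than dropping columns `[0,1,3,4]` by number.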

In [3]:
df.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D |
| 1 | 1 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah! |
| 2 | 2 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds |
| 3 | 3 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 4 | 4 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there. |


In [4]:
df1 = df.copy()

## DF Cleaning

In [5]:
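# Keep only column 2 (date) and column 5 (tweet text); column 0 is a row counter,
# 1 the tweet id, 3 the query flag and 4 the user
df1 = df1.drop(columns = [0,1,3,4]).reset_index(drop = True)

# Split the date string (e.g. "Mon Apr 06 22:19:45 PDT 2009") into its six tokens
df1[['Day_of_week', 'Month', 'Day', 'Time', 'Timezone', 'Year']] = df1[2].str.split(expand=True)

# Name columns (the original date string and the tweet text come first)
df1.columns = ['DATE', 'TWEET', 'DAY_OF_WEEK', 'MONTH', 'DAY', 'TIME', 'TIMEZONE', 'YEAR']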

# Map month abbreviations to zero-padded month numbers
m_map = {'Jan': '01', 'Feb': '02', 'Mar': '03',
         'Apr': '04', 'May': '05', 'Jun': '06',
         'Jul': '07', 'Aug': '08', 'Sep': '09',
         'Oct': '10', 'Nov': '11', 'Dec': '12'}
df1['MONTH'] = df1['MONTH'].map(m_map)

# Rebuild DATE as a "YYYY-MM-DD HH:MM:SS" string (the timezone token is dropped)
df1['DATE'] = df1['YEAR'] + '-' + df1['MONTH'] + '-' + df1['DAY'] + ' ' + df1['TIME']

# Map day names to '01' (Monday) through '07' (Sunday); a separate name avoids reusing m_map
d_map = {'Mon': '01', 'Tue': '02', 'Wed': '03',
         'Thu': '04', 'Fri': '05', 'Sat': '06',
         'Sun': '07'}
df1['DAY_OF_WEEK'] = df1['DAY_OF_WEEK'].map(d_map)

# Keep only the cleaned date, day of week and tweet text, in that order
df1 = df1[['DATE', 'DAY_OF_WEEK', 'TWEET']].copy()

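# Remove @mentions (a bare "@" is kept) and tokens starting with http;
# dropping the tokens, rather than replacing them with '', avoids leaving double spaces
df1['TWEET'] = df1['TWEET'].apply(
    lambda tweet: ' '.join(
        word for word in tweet.split(' ')
        if not ((word.startswith('@') and len(word) > 1) or word.startswith('http'))
    ))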

# Remove special characters, keeping word characters, whitespace, commas, periods and colons
def clean_tweet(tweet):
    c_tweet = re.sub(r'[^\w\s,.:]', '', tweet)
    return c_tweet

df1['TWEET'] = df1['TWEET'].apply(clean_tweet)
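The manual month and weekday maps can also be replaced by pandas' own date parser, which yields a real `datetime64` column instead of a string (resampling by day requires a datetime index). A minimal sketch, assuming every date string follows the `Mon Apr 06 22:19:45 PDT 2009` pattern shown in the data and that English month abbreviations parse under the current locale: the weekday and timezone tokens are dropped, as the cleaning above also drops the timezone, and the rest is parsed with an explicit `format`. The weekday is then recomputed from the parsed date.

```python
import pandas as pd

# Raw date strings in the format used by column 2 of ProjectTweets.csv.
raw = pd.Series([
    "Mon Apr 06 22:19:45 PDT 2009",
    "Tue Jun 16 08:40:49 PDT 2009",
])

# Tokens: [weekday, month, day, time, timezone, year]; keep month, day, time and year.
parts = raw.str.split()
no_tz = parts.str[1:4].str.join(" ") + " " + parts.str[5]  # e.g. "Apr 06 22:19:45 2009"

dates = pd.to_datetime(no_tz, format="%b %d %H:%M:%S %Y")

# pandas counts Monday as 0, so add 1 to match the '01'..'07' coding used above.
day_of_week = dates.dt.dayofweek + 1
print(pd.DataFrame({"DATE": dates, "DAY_OF_WEEK": day_of_week}))
```

As with the string-based version, the timestamps stay in whatever local time the file records; no timezone conversion is attempted.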



In [11]:
df1

|   | DATE | DAY_OF_WEEK | TWEET |
|---|---|---|---|
| 0 | 2009-04-06 22:19:45 | 01 | Awww, thats a bummer. You shoulda got David Carr of Third Day to do it. D |
| 1 | 2009-04-06 22:19:49 | 01 | is upset that he cant update his Facebook by texting it... and might cry as a result School today also. Blah |
| 2 | 2009-04-06 22:19:53 | 01 | I dived many times for the ball. Managed to save 50 The rest go out of bounds |
| 3 | 2009-04-06 22:19:57 | 01 | my whole body feels itchy and like its on fire |
| 4 | 2009-04-06 22:19:57 | 01 | no, its not behaving at all. im mad. why am i here because I cant see you all over there. |
| ... | ... | ... | ... |
| 1599995 | 2009-06-16 08:40:49 | 02 | Just woke up. Having no school is the best feeling ever |
| 1599996 | 2009-06-16 08:40:49 | 02 | TheWDB.com Very cool to hear old Walt interviews â |
| 1599997 | 2009-06-16 08:40:49 | 02 | Are you ready for your MoJo Makeover Ask me for details |
| 1599998 | 2009-06-16 08:40:49 | 02 | Happy 38th Birthday to my boo of alll time Tupac Amaru Shakur |
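The two cleaning steps can be checked on single strings. The sketch below restates them as plain functions; the name `strip_mentions_and_links` and the `ascii_only` parameter are illustrative, not from the notebook, and this version splits on any whitespace and strips leading and trailing spaces. It also shows why row 1599996 still ends in `â`: in Python 3, `\w` matches any Unicode letter, so mis-decoded characters such as `â` survive the regex unless the `re.ASCII` flag is passed. The input `old Walt interviews â€¦` is a hypothetical example of such mojibake, not the actual raw tweet.

```python
import re


def strip_mentions_and_links(tweet):
    """Drop @mentions (longer than a bare '@') and tokens starting with 'http'."""
    kept = [
        word for word in tweet.split()
        if not ((word.startswith("@") and len(word) > 1) or word.startswith("http"))
    ]
    return " ".join(kept)


def clean_tweet(tweet, ascii_only=False):
    """Keep only word characters, whitespace, commas, periods and colons.

    With ascii_only=True, word characters and whitespace are restricted to ASCII,
    so accented or mis-decoded letters are removed as well.
    """
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"[^\w\s,.:]", "", tweet, flags=flags).strip()


raw = ("@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. "
       "You shoulda got David Carr of Third Day to do it. ;D")
print(clean_tweet(strip_mentions_and_links(raw)))
print(clean_tweet("old Walt interviews â€¦"))
print(clean_tweet("old Walt interviews â€¦", ascii_only=True))
```

The trade-off of `re.ASCII` is that it also removes legitimate accented letters, so it is only appropriate if non-English characters can safely be discarded.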
