### Sentiment Analysis of Twitter Data

Source:
https://towardsdatascience.com/creating-the-twitter-sentiment-analysis-program-in-python-with-naive-bayes-classification-672e5589a7ed

In [32]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

In [33]:
# load twitter data

data = pd.read_csv('../cleandata/tweets.csv')

In [34]:
# show the full tweet text instead of truncating long columns

pd.set_option('display.max_colwidth', None)

In [35]:
data.head()

| Unnamed: 0 | date | text | state | query_term |
|---|---|---|---|---|
| 0 | 2020-04-24 21:59:42+00:00 | The economy will collapse in the long term if we continue with the business closures and large government making decisions for us that have proven to mess everything up. | DC | business closures |
| 1 | 2020-04-23 21:28:12+00:00 | Classic counsel on cover letters! Covid-19 and business closures mean more job seekers. Get informed, boost your chances. #jobsearch #jobsearchcoach @National Harbor, Maryland https://www.instagram.com/p/B_Vosy8nb3S/?igshid=xj0jre0tc5sq | DC | business closures |
| 2 | 2020-04-17 01:01:43+00:00 | I'm guessing that would violate the mayor's non-essential business closure order? | DC | business closures |
| 3 | 2020-04-16 18:19:35+00:00 | Ok, why did this take so long...and yet mandatory stay at home orders and “non-essential business” closures have been in place for weeks... Wearing masks is key to stopping the spread but also .....responsibly reopening our Maryland economy. | DC | business closures |
| 4 | 2020-04-16 01:53:41+00:00 | Explain exactly how government-forced business closures are the “free-market”? | DC | business closures |
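The `Unnamed: 0` header in the `head()` output is what pandas typically shows when a CSV was saved together with its index, whereas the `isna()` and `shape` outputs further down list only four columns, so it is not clear from this section whether `tweets.csv` really carries that extra column. If it does, a minimal sketch of the usual fix is to pass `index_col=0` to `read_csv`. The two-row CSV below is made up purely for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row CSV written the way DataFrame.to_csv() does by default:
# the index is saved as a first column with an empty header.
csv_with_index = io.StringIO(
    ",date,text\n"
    "0,2020-04-24,first tweet\n"
    "1,2020-04-23,second tweet\n"
)

# Reading it naively turns the saved index into a data column.
df_default = pd.read_csv(csv_with_index)
print(list(df_default.columns))  # ['Unnamed: 0', 'date', 'text']

# Rewind and read again, this time using the first column as the index.
csv_with_index.seek(0)
df_indexed = pd.read_csv(csv_with_index, index_col=0)
print(list(df_indexed.columns))  # ['date', 'text']
```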


In [36]:
# count missing values in each column
data.isna().sum()

date             0
text          1085
state            0
query_term       0
dtype: int64

In [37]:
# drop rows with any missing value (only 'text' has missing values here)

data.dropna(inplace=True)
data.shape

(215358, 4)
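Calling `dropna()` with no arguments removes a row if *any* column is missing. Because the `isna().sum()` output shows missing values only in `text`, that gives the same result here as restricting the check to `text`. A small sketch on a made-up frame with the same column names shows the more explicit `subset` form.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same columns as the tweet data.
df = pd.DataFrame({
    "date": ["2020-04-24", "2020-04-23", "2020-04-17"],
    "text": ["tweet one", np.nan, "tweet three"],
    "state": ["DC", "DC", "DC"],
    "query_term": ["business closures"] * 3,
})

# Missing values per column, as in data.isna().sum().
missing = df.isna().sum()
print(missing)

# Drop only the rows whose text is missing; other columns are not checked.
cleaned = df.dropna(subset=["text"])
print(cleaned.shape)  # (2, 4)
```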

In [45]:
# inspect one suspected duplicate to confirm the rows are exact copies
duplicates = [text for text in data['text'] if 'business closures mean more job' in text]

In [46]:
duplicates

['Classic counsel on cover letters! Covid-19 and business closures mean more job seekers. Get informed, boost your chances. #jobsearch #jobsearchcoach @National Harbor, Maryland https://www.instagram.com/p/B_Vosy8nb3S/?igshid=xj0jre0tc5sq',
 'Classic counsel on cover letters! Covid-19 and business closures mean more job seekers. Get informed, boost your chances. #jobsearch #jobsearchcoach @National Harbor, Maryland https://www.instagram.com/p/B_Vosy8nb3S/?igshid=xj0jre0tc5sq']
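Searching for one substring confirms a single suspected pair. To list every duplicated tweet at once, `DataFrame.duplicated` with `keep=False` flags all members of each duplicate group. The sketch below uses three made-up tweets in place of `data`.

```python
import pandas as pd

# Hypothetical tweets: the first and third rows are exact copies.
df = pd.DataFrame({"text": [
    "Classic counsel on cover letters!",
    "Explain exactly how closures are the free market?",
    "Classic counsel on cover letters!",
]})

# keep=False marks every copy in a duplicate group, not only the later ones.
dup_mask = df.duplicated(subset="text", keep=False)
print(df[dup_mask])

# With the default keep="first", only the extra copies are flagged, which is
# the number of rows drop_duplicates(subset="text") would remove.
n_extra = df.duplicated(subset="text").sum()
print(n_extra)  # 1
```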

In [48]:
# drop duplicate tweets, keeping the first occurrence of each text

data.drop_duplicates(subset='text', inplace=True)
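`drop_duplicates` keeps the first copy of each text by default and leaves the surviving rows with their original index labels, so the index has gaps after dropping. A minimal sketch on made-up data shows this, plus the optional `reset_index(drop=True)` step if a contiguous index is wanted later (the original notebook does not do this).

```python
import pandas as pd

# Hypothetical tweets: row 2 repeats row 0.
df = pd.DataFrame({"text": ["a tweet", "another tweet", "a tweet", "a third tweet"]})

# keep="first" (the default) keeps the earliest copy of each text.
deduped = df.drop_duplicates(subset="text", keep="first")
print(deduped.index.tolist())  # [0, 1, 3]

# Optional: renumber the rows 0..n-1 and discard the old labels.
deduped = deduped.reset_index(drop=True)
print(deduped.index.tolist())  # [0, 1, 2]
```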

In [49]:
# export the cleaned data to CSV without writing the index as a column

data.to_csv('../cleandata/tweets_noDuplicates.csv', index=False)
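Passing `index=False` to `to_csv` writes only the data columns, so reloading the file does not produce an extra `Unnamed: 0` column. The round trip below uses an in-memory buffer and a made-up one-row frame instead of the real output path.

```python
import io

import pandas as pd

# Hypothetical one-row frame with the tweet data's columns.
df = pd.DataFrame({
    "date": ["2020-04-24"],
    "text": ["tweet"],
    "state": ["DC"],
    "query_term": ["business closures"],
})

buf = io.StringIO()
# index=False: write only the data columns, not the row labels.
df.to_csv(buf, index=False)
buf.seek(0)

reloaded = pd.read_csv(buf)
print(list(reloaded.columns))  # ['date', 'text', 'state', 'query_term']
```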

In [17]:
text = data['text']
text.head()

0    The economy will collapse in the long term if we continue with the business closures and large government making decisions for us that have proven to mess everything up.
1    Classic counsel on cover letters! Covid-19 and business closures mean more job seekers. Get informed, boost your chances. #jobsearch #jobsearchcoach @National Harbor, Maryland https://www.instagram.com/p/B_Vosy8nb3S/?igshid=xj0jre0tc5sq
2    I'm guessing that would violate the mayor's non-essential business closure order?
3    Ok, why did this take so long...and yet mandatory stay at home orders and “non-essential business” closures have been in place for weeks... Wearing masks is key to stopping the spread but also .....responsibly reopening our Maryland economy.
4    Explain exactly how government-forced business closures are the “free-market”?

In [21]:
# tokenize the tweets and build a sparse document-term count matrix

cvec = CountVectorizer()
text_cv = cvec.fit_transform(text)
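`CountVectorizer` does not clean the text in the sense of removing URLs, hashtags or mentions. With default settings it lowercases each tweet, splits it into tokens of two or more word characters, and returns a sparse matrix with one row per tweet and one column per vocabulary term. `TfidfVectorizer`, imported above but not used in this section, applies the same tokenization and then reweights the counts by inverse document frequency and L2-normalizes each row. The sketch below uses two made-up tweets in place of `data['text']`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Two hypothetical tweets standing in for data['text'].
docs = ["Business closures hurt the economy", "Reopen the economy"]

# Default settings: lowercase, keep tokens of 2+ word characters.
cvec = CountVectorizer()
counts = cvec.fit_transform(docs)  # sparse matrix: rows = tweets, columns = terms

# vocabulary_ maps each term to its column index; sort terms by that index.
vocab = sorted(cvec.vocabulary_, key=cvec.vocabulary_.get)
print(vocab)  # ['business', 'closures', 'economy', 'hurt', 'reopen', 'the']
print(counts.toarray())
# [[1 1 1 1 0 1]
#  [0 0 1 0 1 1]]

# Same tokenization, but counts are IDF-weighted and each row has unit L2 norm.
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)
print(np.round(np.linalg.norm(weights.toarray(), axis=1), 6))  # [1. 1.]
```

Terms that appear in every tweet, such as `the` and `economy` here, receive the lowest IDF weight, which is one reason TF-IDF features are often preferred over raw counts for a Naive Bayes or linear classifier.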