# Twitter Sentiment Analysis Classification

## Libraries

In [4]:
import numpy as np
import pandas as pd

## Dataset Load

In [3]:
!curl "https://dbdmg.polito.it/dbdmg_web/wp-content/uploads/2021/12/DSL2122_january_dataset.zip" -Lo dataset.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.7M  100 17.7M    0     0  16.0M      0  0:00:01  0:00:01 --:--:-- 16.0M


In [4]:
!unzip -q dataset.zip; rm dataset.zip; rm -r __MACOSX/

In [55]:
tweets = pd.read_csv("./DSL2122_january_dataset/development.csv")
tweets

Unnamed: 0,sentiment,ids,date,flag,user,text
0,1,1833972543,Mon May 18 01:08:27 PDT 2009,NO_QUERY,Killandra,"@MissBianca76 Yes, talking helps a lot.. going..."
1,1,1980318193,Sun May 31 06:23:17 PDT 2009,NO_QUERY,IMlisacowan,SUNSHINE. livingg itttt. imma lie on the grass...
2,1,1994409198,Mon Jun 01 11:52:54 PDT 2009,NO_QUERY,yaseminx3,@PleaseBeMine Something for your iphone
3,0,1824749377,Sun May 17 02:45:34 PDT 2009,NO_QUERY,no_surprises,@GabrielSaporta couldn't get in to the after p...
4,0,2001199113,Tue Jun 02 00:08:07 PDT 2009,NO_QUERY,Rhi_ShortStack,@bradiewebbstack awww is andy being mean again...
...,...,...,...,...,...,...
224989,0,2261324310,Sat Jun 20 20:36:48 PDT 2009,NO_QUERY,CynthiaBuroughs,@Dropsofreign yeah I hope Iran people reach fr...
224990,1,1989408152,Mon Jun 01 01:25:45 PDT 2009,NO_QUERY,unitechy,Trying the qwerty keypad
224991,0,1991221316,Mon Jun 01 06:38:10 PDT 2009,NO_QUERY,Xaan,I love Jasper &amp; Jackson but that wig in th...
224992,0,2239702807,Fri Jun 19 08:51:56 PDT 2009,NO_QUERY,Ginger_Billie,I am really tired and bored and bleh! I feel c...


## Data exploration

In [56]:
tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 224994 entries, 0 to 224993
Data columns (total 6 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   sentiment  224994 non-null  int64 
 1   ids        224994 non-null  int64 
 2   date       224994 non-null  object
 3   flag       224994 non-null  object
 4   user       224994 non-null  object
 5   text       224994 non-null  object
dtypes: int64(2), object(4)
memory usage: 10.3+ MB


In [57]:
tweets["date"]

0         Mon May 18 01:08:27 PDT 2009
1         Sun May 31 06:23:17 PDT 2009
2         Mon Jun 01 11:52:54 PDT 2009
3         Sun May 17 02:45:34 PDT 2009
4         Tue Jun 02 00:08:07 PDT 2009
                      ...             
224989    Sat Jun 20 20:36:48 PDT 2009
224990    Mon Jun 01 01:25:45 PDT 2009
224991    Mon Jun 01 06:38:10 PDT 2009
224992    Fri Jun 19 08:51:56 PDT 2009
224993    Wed Jun 03 06:00:29 PDT 2009
Name: date, Length: 224994, dtype: object

The _date_ feature packs several pieces of information into a single string; these are retrieved with the following lines of code.

In [58]:
tweets[["day_of_week", "month", "day", "time", "tz", "year"]] = tweets['date'].str.split(' ', expand=True)

In [59]:
tweets[["hour", "minute", "second"]] = tweets['time'].str.split(':', expand=True)
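As an alternative to splitting strings, the same fields can be derived by parsing the column into datetimes. The following is only a sketch on a toy frame (`sample`, `cleaned` and `parsed` are illustrative names, not part of the notebook), and it assumes every row carries the `PDT` token, which the notebook checks further on. Unlike the string split, it yields numeric values for day and hour.

```python
import pandas as pd

# Toy frame mimicking the format of the `date` column shown above.
sample = pd.DataFrame({
    "date": [
        "Mon May 18 01:08:27 PDT 2009",
        "Sun May 31 06:23:17 PDT 2009",
    ]
})

# Assumption: the timezone token is always "PDT", so it can be stripped
# before parsing to keep the format string unambiguous.
cleaned = sample["date"].str.replace(" PDT", "", regex=False)
parsed = pd.to_datetime(cleaned, format="%a %b %d %H:%M:%S %Y")

# Derive the same features through the .dt accessor.
sample["day_of_week"] = parsed.dt.day_name().str[:3]  # e.g. "Mon"
sample["month"] = parsed.dt.month                     # integer month
sample["day"] = parsed.dt.day
sample["hour"] = parsed.dt.hour
```

The string split used in the notebook keeps everything as text, which is convenient for categorical encoding; the datetime route is handier when the values are needed as numbers.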

At this point, the columns whose information has been extracted can be removed.

In [60]:
tweets.drop(columns=["date", "time"], inplace=True)

In [61]:
tweets["flag"].unique()

array(['NO_QUERY'], dtype=object)

In [62]:
tweets["tz"].unique()

array(['PDT'], dtype=object)

In [63]:
tweets["year"].unique()

array(['2009'], dtype=object)

Since the dataset contains only dates in Pacific Daylight Time (PDT) and only for the year 2009, these features are not relevant and can be dropped.
The flag feature takes a single value (`NO_QUERY`) and so carries no useful information; minutes and seconds are likewise treated as uninformative, so all three are removed as well.

In [64]:
tweets.drop(columns=["tz", "year", "minute", "second", "flag"], inplace=True)
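The per-column `unique()` checks above can also be done in one pass. The sketch below uses a toy frame (`toy`, `constant_cols` and `reduced` are hypothetical names) and flags every column holding a single distinct value, which is the property that made `flag`, `tz` and `year` droppable here.

```python
import pandas as pd

# Toy frame: two constant columns and one that varies.
toy = pd.DataFrame({
    "flag": ["NO_QUERY", "NO_QUERY", "NO_QUERY"],
    "tz": ["PDT", "PDT", "PDT"],
    "hour": ["01", "06", "11"],
})

# Count distinct values per column and keep the names of constant ones.
n_unique = toy.nunique()
constant_cols = n_unique[n_unique == 1].index.tolist()

# Drop the constant columns without mutating the original frame.
reduced = toy.drop(columns=constant_cols)
```

Note that `minute` and `second` would not be caught by this check, since they do vary; removing them is a modelling choice rather than a consequence of being constant.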

In [65]:
tweets

Unnamed: 0,sentiment,ids,user,text,day_of_week,month,day,hour
0,1,1833972543,Killandra,"@MissBianca76 Yes, talking helps a lot.. going...",Mon,May,18,01
1,1,1980318193,IMlisacowan,SUNSHINE. livingg itttt. imma lie on the grass...,Sun,May,31,06
2,1,1994409198,yaseminx3,@PleaseBeMine Something for your iphone,Mon,Jun,01,11
3,0,1824749377,no_surprises,@GabrielSaporta couldn't get in to the after p...,Sun,May,17,02
4,0,2001199113,Rhi_ShortStack,@bradiewebbstack awww is andy being mean again...,Tue,Jun,02,00
...,...,...,...,...,...,...,...,...
224989,0,2261324310,CynthiaBuroughs,@Dropsofreign yeah I hope Iran people reach fr...,Sat,Jun,20,20
224990,1,1989408152,unitechy,Trying the qwerty keypad,Mon,Jun,01,01
224991,0,1991221316,Xaan,I love Jasper &amp; Jackson but that wig in th...,Mon,Jun,01,06
224992,0,2239702807,Ginger_Billie,I am really tired and bored and bleh! I feel c...,Fri,Jun,19,08
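The leading zeros in `day` and `hour` (for example `01`) show that the split left these columns as strings. If numeric values are wanted downstream, a cast along the following lines could be applied; this is a sketch on a toy frame with illustrative names, not a step taken in the notebook.

```python
import pandas as pd

# Toy frame with string-typed day and hour, as produced by str.split.
toy = pd.DataFrame({"day": ["18", "31", "01"], "hour": ["01", "06", "11"]})

# Cast both columns to integers; astype returns a new frame.
toy = toy.astype({"day": int, "hour": int})
```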
