# Twitter Sentiment Analysis

#### This task scores tweets as positive, negative, or neutral so that we can draw a conclusion about their overall sentiment.

### 1. Loading the required packages and libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import re
import nltk

### 2. Loading the data

In [2]:
data_frame = pd.read_csv("/Data/twitter.csv")

### 3. Exploring the data

In [3]:
data_frame

,Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet
0,0,3,0,0,3,2,!!! RT @mayasolovely: As a woman you shouldn't...
1,1,3,0,3,0,1,!!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2,2,3,0,3,0,1,!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3,3,3,0,2,1,1,!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4,4,6,0,6,0,1,!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...
...,...,...,...,...,...,...,...
24778,25291,3,0,2,1,1,you's a muthaf***in lie &#8220;@LifeAsKing: @2...
24779,25292,3,0,1,2,2,"you've gone and broke the wrong heart baby, an..."
24780,25294,3,0,3,0,1,young buck wanna eat!!.. dat nigguh like I ain...
24781,25295,6,0,6,0,1,youu got wild bitches tellin you lies


In [4]:
print(data_frame.head())

   Unnamed: 0  count  hate_speech  offensive_language  neither  class  \
0           0      3            0                   0        3      2   
1           1      3            0                   3        0      1   
2           2      3            0                   3        0      1   
3           3      3            0                   2        1      1   
4           4      6            0                   6        0      1   

                                               tweet  
0  !!! RT @mayasolovely: As a woman you shouldn't...  
1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...  
2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...  
3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...  
4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...  


In [5]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /home/mule/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

Stop words are words so common (for example "the", "is", and "and") that they carry very little useful information, so text mining and Natural Language Processing (NLP) pipelines usually remove them.
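As a minimal sketch of the idea, the snippet below filters a sentence against a tiny stop-word set. The names `demo_stopwords` and `sentence` are hypothetical and the six-word list is an assumption made for illustration; the notebook itself relies on NLTK's much longer English list.

```python
# Hypothetical mini stop-word list for illustration only; the notebook uses
# NLTK's English list via stopwords.words('english') instead.
demo_stopwords = {"the", "is", "a", "of", "and", "to"}

sentence = "the weather is a mix of sun and rain"

# Keep only the words that are not in the stop-word set.
kept = [word for word in sentence.split() if word not in demo_stopwords]
print(kept)  # ['weather', 'mix', 'sun', 'rain']
```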

In [6]:
stemmer = nltk.SnowballStemmer("english")

Stemming reduces a word to its base form (its stem). For example, care, cared and caring all share the stem 'care'.
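The example can be checked directly with the same `SnowballStemmer` class used in the cell above. This is only a small sketch; `demo_stemmer`, `demo_words`, and `demo_stems` are hypothetical names introduced here.

```python
import nltk

# Same stemmer class the notebook uses, built separately for this demo.
demo_stemmer = nltk.SnowballStemmer("english")

demo_words = ["care", "cared", "caring"]
demo_stems = [demo_stemmer.stem(word) for word in demo_words]
print(demo_stems)  # ['care', 'care', 'care']
```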

In [7]:
from nltk.corpus import stopwords
import string
stopword=set(stopwords.words('english'))

In [8]:
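def clean(text):
    # Normalise to a lowercase string
    text = str(text).lower()
    # Remove bracketed text, URLs, and HTML tags
    # (raw strings avoid invalid-escape warnings for \[, \S, \w, and \d)
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    # Remove punctuation, newlines, and any token containing a digit
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    # Drop stop words, then stem the remaining words
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text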
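To see what the regular-expression steps of `clean` do on their own, here is a small standalone sketch. `strip_noise` and `sample` are hypothetical names, the sample string is made up, and stop-word removal and stemming are deliberately left out so that the snippet needs only the standard library.

```python
import re
import string

def strip_noise(text):
    """Apply only the regex steps of clean(); no stop-word removal or stemming."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # bracketed text such as [ad]
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # newlines
    text = re.sub(r'\w*\d\w*', '', text)               # tokens containing a digit
    # Collapse leftover runs of spaces for readability
    # (the notebook's clean() does not do this).
    return " ".join(text.split())

sample = "Check https://example.com NOW!!! <b>win</b> 2day [ad]"
print(strip_noise(sample))  # check now win
```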

In [9]:
data_frame["tweet"] = data_frame["tweet"].apply(clean)

In [10]:
data_frame.head()

,Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet
0,0,3,0,0,3,2,rt mayasolov woman shouldnt complain clean ho...
1,1,3,0,3,0,1,rt boy dat coldtyga dwn bad cuffin dat hoe ...
2,2,3,0,3,0,1,rt urkindofbrand dawg rt ever fuck bitch sta...
3,3,3,0,2,1,1,rt cganderson vivabas look like tranni
4,4,6,0,6,0,1,rt shenikarobert shit hear might true might f...


In [11]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/mule/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [12]:
# Score each tweet once and reuse the result for all three columns
scores = [sentiments.polarity_scores(tweet) for tweet in data_frame["tweet"]]
data_frame["Positive"] = [score["pos"] for score in scores]
data_frame["Negative"] = [score["neg"] for score in scores]
data_frame["Neutral"] = [score["neu"] for score in scores]

Now let us keep only the columns we need for the rest of the sentiment analysis:

In [13]:
data_frame = data_frame[["tweet", "Positive", "Negative", "Neutral"]]
data_frame.head()

,tweet,Positive,Negative,Neutral
0,rt mayasolov woman shouldnt complain clean ho...,0.147,0.157,0.696
1,rt boy dat coldtyga dwn bad cuffin dat hoe ...,0.0,0.28,0.72
2,rt urkindofbrand dawg rt ever fuck bitch sta...,0.0,0.577,0.423
3,rt cganderson vivabas look like tranni,0.333,0.0,0.667
4,rt shenikarobert shit hear might true might f...,0.154,0.407,0.44


Let us sum each score column over all tweets and see which sentiment has the largest total:

In [14]:
x = sum(data_frame["Positive"])
y = sum(data_frame["Negative"])
z = sum(data_frame["Neutral"])

In [15]:
def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive tweet ")
    elif (b>a) and (b>c):
        print("Negative tweet ")
    else:
        print("Neutral tweet ")

In [16]:
sentiment_score(x,y,z)

Neutral tweet 


#### Individual scores

In [17]:
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)

Positive:  2880.086000000009
Negative:  7201.020999999922
Neutral:  14696.887999999733
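To put these totals on a common scale, the following sketch converts them to shares of the grand total and computes the negative-to-positive ratio. The values are copied from the output above and rounded to three decimals; `totals`, `shares`, and `neg_to_pos` are hypothetical names.

```python
# Totals copied (rounded to three decimals) from the printed output above.
totals = {"Positive": 2880.086, "Negative": 7201.021, "Neutral": 14696.888}

grand_total = sum(totals.values())

# Share of the grand total held by each sentiment.
shares = {label: value / grand_total for label, value in totals.items()}

# How many times larger the negative total is than the positive total.
neg_to_pos = totals["Negative"] / totals["Positive"]

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Negative-to-positive ratio: {neg_to_pos:.2f}")  # 2.50
```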


The total neutral score is higher than the positive or negative totals. However, the total negative score is about 2.5 times the total positive score (7201.02 vs. 2880.09). Because of this, we can say that among tweets that express an opinion, most opinions are negative.
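The comparison above works on column totals. As a complementary sketch, each tweet can instead be given the label of its largest score, and the labels can then be counted. `label_tweet` and `sample_scores` are hypothetical names; the five rows are the ones shown by `data_frame.head()` earlier, so these counts say nothing about the full data set.

```python
from collections import Counter

# (Positive, Negative, Neutral) scores copied from the five rows shown by
# data_frame.head() above; a full run would iterate over every row instead.
sample_scores = [
    (0.147, 0.157, 0.696),
    (0.0, 0.28, 0.72),
    (0.0, 0.577, 0.423),
    (0.333, 0.0, 0.667),
    (0.154, 0.407, 0.44),
]

def label_tweet(pos, neg, neu):
    """Return the label with the strictly largest score; ties fall back to Neutral."""
    if pos > neg and pos > neu:
        return "Positive"
    if neg > pos and neg > neu:
        return "Negative"
    return "Neutral"

# Count how often each label is assigned across the sample rows.
label_counts = Counter(label_tweet(*row) for row in sample_scores)
print(label_counts.most_common())  # [('Neutral', 4), ('Negative', 1)]
```

The tie-breaking rule mirrors `sentiment_score` above, which also falls through to the neutral branch when no score is strictly largest.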