Twitter is a social media platform where people freely share their opinions on any topic. A strong reaction to someone’s opinion can sometimes turn into a stream of negative tweets.

## Twitter Sentiment Analysis

Sentiment analysis is a natural language processing task. Social media platforms have good reason to monitor the sentiments of those engaged in a discussion: on Twitter, negative opinions are especially common when the discussion is political. Analyzing sentiments on an ongoing basis helps a platform identify the kinds of users who are spreading hate and negativity.

For this task, I have collected a dataset from Kaggle that contains tweets from a long discussion within a group of users. Our task is to determine how many tweets are negative and how many are positive so that we can draw a conclusion. In the section below, I’ll walk you through Twitter sentiment analysis using Python.

Let’s start the task of Twitter sentiment analysis by importing the necessary Python libraries and the dataset:

In [4]:
import re

import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/twitter.csv")
print(data.head())

   Unnamed: 0  count  hate_speech  offensive_language  neither  class  \
0           0      3            0                   0        3      2   
1           1      3            0                   3        0      1   
2           2      3            0                   3        0      1   
3           3      3            0                   2        1      1   
4           4      6            0                   6        0      1   

                                               tweet  
0  !!! RT @mayasolovely: As a woman you shouldn't...  
1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...  
2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...  
3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...  
4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...  


The tweet column in the above dataset contains the tweets we need to analyze. Before going further, we have to clean them up, because the raw tweets contain links, punctuation, special symbols, and many language errors. Here is how we can clean up the tweet column:

In [2]:
import string

from nltk.corpus import stopwords

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # drop text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # drop URLs
    text = re.sub(r'<.*?>+', '', text)                 # drop HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # drop punctuation
    text = re.sub(r'\n', '', text)                     # drop newlines
    text = re.sub(r'\w*\d\w*', '', text)               # drop words containing digits
    # Remove stopwords, then stem the words that remain
    words = [word for word in text.split(' ') if word not in stopword]
    words = [stemmer.stem(word) for word in words]
    return " ".join(words)

data["tweet"] = data["tweet"].apply(clean)

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\SHREE\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
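
To make the effect of the regular-expression steps concrete, the following sketch applies only those steps to a made-up tweet. The function name `strip_noise` and the sample text are illustrative assumptions, and stopword removal and stemming are left out so the example runs without NLTK.

```python
import re
import string

def strip_noise(text):
    """Apply only the regex steps of the cleaning function (no stopwords, no stemming)."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # newlines
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    return text

# Hypothetical tweet, written only for this illustration
sample = "!!! RT @user123: Check this out https://example.com <b>WOW</b> so cool"
cleaned = strip_noise(sample)
print(cleaned.split())  # ['rt', 'check', 'this', 'out', 'wow', 'so', 'cool']
```

Removing URLs and mentions can leave leading or repeated spaces behind, so splitting on a single space, as `clean` does, can produce empty strings. Calling `split()` with no argument, as in the sketch, avoids that.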


The next step is to calculate sentiment scores for these tweets, which tell us how positive, negative, or neutral each tweet is. Here is how you can calculate them:

In [5]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()

# Score each tweet once and reuse the result for all three columns
scores = [sentiments.polarity_scores(tweet) for tweet in data["tweet"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\SHREE\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
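
The code above stores three scores per tweet but does not attach a single label to each tweet. One common way to do that, sketched here as an assumption rather than the method used in this article, is to threshold the `compound` value that `polarity_scores` also returns. The helper name `label_from_compound`, the 0.05 cut-off (a conventional choice for VADER), and the example score dictionaries are all illustrative.

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dict to a label using its compound score."""
    compound = scores["compound"]
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Illustrative score dicts, not real VADER output
examples = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.8},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.6},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]
labels = [label_from_compound(s) for s in examples]
print(labels)  # ['Positive', 'Negative', 'Neutral']
```

With the analyzer created above, the same helper could be applied per tweet, for example `[label_from_compound(sentiments.polarity_scores(t)) for t in data["tweet"]]`.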


Now I will select only the columns we need for the rest of the task:

In [6]:
data = data[["tweet", "Positive", 
             "Negative", "Neutral"]]
print(data.head())

                                               tweet  Positive  Negative  \
0  !!! RT @mayasolovely: As a woman you shouldn't...     0.120     0.000   
1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...     0.000     0.237   
2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...     0.000     0.538   
3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...     0.344     0.000   
4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...     0.081     0.249   

   Neutral  
0    0.880  
1    0.763  
2    0.462  
3    0.656  
4    0.669  


Now let’s see which sentiment dominates overall by comparing the sentiment scores summed across all tweets:

In [7]:
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")
sentiment_score(x, y, z)

Neutral 🙂 


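So the neutral score has the largest total, which means that, taken together, the tweets are more neutral than positive or negative. This is common with VADER: its three scores are proportions that sum to roughly 1 for each text, and most words in a typical tweet carry no sentiment in its lexicon. Now let’s have a look at the totals of the sentiment scores: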

In [8]:
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)

Positive:  2338.1619999999994
Negative:  5687.947999999999
Neutral:  16756.911999999837


The neutral total is far higher than the negative and positive totals. Setting neutral aside, the negative total (about 5,688) is more than twice the positive total (about 2,338), so we can say that the opinions expressed lean negative rather than positive.
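
The totals above add up scores rather than count tweets. As a cross-check, one could count how many tweets have a larger negative score than positive score. The sketch below does this for the five rows printed earlier; `dominant_polarity` is a hypothetical helper name, and five rows are far too few to say anything about the full dataset.

```python
from collections import Counter

def dominant_polarity(pos, neg):
    """Label a tweet by whichever of its positive/negative scores is larger."""
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Tie"

# (Positive, Negative) scores of the five rows shown in the data.head() output
rows = [(0.120, 0.000), (0.000, 0.237), (0.000, 0.538), (0.344, 0.000), (0.081, 0.249)]
counts = Counter(dominant_polarity(p, n) for p, n in rows)
print(counts)  # Counter({'Negative': 3, 'Positive': 2})
```

On the full DataFrame, the same comparison could be written as `(data["Negative"] > data["Positive"]).sum()`.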

## Summary

So this is how you can perform Twitter sentiment analysis using Python. Sentiment analysis is a natural language processing task, and social media platforms need to keep track of the sentiments of people engaged in a discussion.