X Sentiment Analysis using Python


Sentiment analysis is a natural language processing task. Social media platforms have good reason to monitor the sentiments of those engaged in a discussion; on Twitter, for example, political discussions often attract mostly negative opinions. Analyzing sentiments on a regular basis helps a platform find where hate and negativity are spreading.

For this Twitter sentiment analysis task, I am using a dataset from Kaggle that contains tweets from a long discussion within a group of users. Our task is to identify how many tweets are negative and how many are positive so that we can draw a conclusion. In the section below, I'll walk you through Twitter sentiment analysis using Python.

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import re
import nltk

data = pd.read_csv("/content/drive/MyDrive/elon_musk_tweets.csv")
print(data.head())

                    id  user_name user_location           user_description  \
0  1544379368478212100  Elon Musk           NaN  Mars & Cars, Chips & Dips   
1  1544377493263720450  Elon Musk           NaN  Mars & Cars, Chips & Dips   
2  1544377130590552064  Elon Musk           NaN  Mars & Cars, Chips & Dips   
3  1544375575724400645  Elon Musk           NaN  Mars & Cars, Chips & Dips   
4  1544375148605853699  Elon Musk           NaN  Mars & Cars, Chips & Dips   

                user_created  user_followers  user_friends  user_favourites  \
0  2009-06-02 20:12:29+00:00       101240855           115            13503   
1  2009-06-02 20:12:29+00:00       101240806           115            13503   
2  2009-06-02 20:12:29+00:00       101240806           115            13503   
3  2009-06-02 20:12:29+00:00       101240806           115            13503   
4  2009-06-02 20:12:29+00:00       101240806           115            13503   

   user_verified                       date  \
0        

The text column in the above dataset contains the tweets that we need to analyze the sentiments of those engaged in the discussion. Before going further, we have to clean it up, because these tweets contain links, punctuation, special symbols, and a lot of language errors. Here is how we can clean up the text column:


In [4]:
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
from nltk.corpus import stopwords
import string
stopword=set(stopwords.words('english'))

def clean(text):
    """Lowercase a tweet, strip noise, drop stopwords, and stem each word."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)  # text inside square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)  # HTML tags
    text = re.sub(r'[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)  # line breaks
    text = re.sub(r'\w*\d\w*', '', text)  # words that contain digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

# Clean the "text" column, which holds the tweets
data["text"] = data["text"].apply(clean)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
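
To see what the regular expressions in the clean function do to a single tweet, here is a minimal sketch that uses only the standard library. The sample tweet and the tiny `DEMO_STOPWORDS` set are made up for illustration (they stand in for the NLTK stopword list), and stemming is left out so the output stays readable:

```python
import re
import string

# Hypothetical stand-in for NLTK's English stopword list, kept tiny for the demo.
DEMO_STOPWORDS = {"the", "is", "a", "at", "this", "on", "my"}

def clean_demo(text):
    """Apply the same regex steps as clean(), without the stemming step."""
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                # text inside square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"<.*?>+", "", text)                 # HTML tags
    text = re.sub(r"[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\n", "", text)                     # line breaks
    text = re.sub(r"\w*\d\w*", "", text)               # words that contain digits
    # split() without an argument also discards the empty strings left by removals
    words = [word for word in text.split() if word not in DEMO_STOPWORDS]
    return " ".join(words)

sample = "Check this <b>LAUNCH</b> at https://example.com [video] on Falcon9 today!!!"
print(clean_demo(sample))  # check launch today
```

Note that a word like "Falcon9" disappears entirely, because the last pattern removes any word containing a digit.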


Now, the next step is to calculate the sentiment scores of these tweets and assign a label to the tweets as positive, negative, or neutral. Here is how you can calculate the sentiment scores of the tweets:

In [5]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
data["Positive"] = [sentiments.polarity_scores(i)["pos"] for i in data["text"]]
data["Negative"] = [sentiments.polarity_scores(i)["neg"] for i in data["text"]]
data["Neutral"] = [sentiments.polarity_scores(i)["neu"] for i in data["text"]]

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
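
Besides "pos", "neg", and "neu", the dictionary returned by polarity_scores also has a "compound" key, a single normalized score between -1 and 1. If you want one label per tweet, a common convention with VADER is to threshold that score at ±0.05. The sketch below assumes that convention; the function name `label_from_compound` and the example scores are illustrative, not real output from this dataset:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is a commonly used convention, not a fixed rule.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Hand-written example scores, only to show the three outcomes.
for score in (0.62, -0.31, 0.0):
    print(score, label_from_compound(score))
```

With the analyzer from the cell above, this could be applied as `data["Label"] = [label_from_compound(sentiments.polarity_scores(i)["compound"]) for i in data["text"]]`.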


Now I will only select the columns from this data that we need for the rest of the task of Twitter sentiment analysis:

In [7]:
data = data[["text", "Positive",
             "Negative", "Neutral"]]
print(data.head())

                                                text  Positive  Negative  \
0   find gold toe sock – inevit kilter amp wash –...       0.0       0.0   
1                               sock con confer sock       0.0       0.0   
2  alway someth new magazin cover articl practic ...       0.0       0.0   
3                             explainthisbob guy get       0.0       0.0   
4  sock tech advanc get pretti much anyth sock fo...       0.0       0.0   

   Neutral  
0      1.0  
1      1.0  
2      1.0  
3      1.0  
4      1.0  


Now let’s have a look at which sentiment dominates overall by comparing the summed sentiment scores of all the tweets:

In [8]:
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")
sentiment_score(x, y, z)

Neutral 🙂 
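
The sentiment_score function compares the three column totals, so it reports one label for the whole dataset. If you would rather count how many individual tweets fall under each label, one possible approach is to apply the same comparison to every row and tally the results. This is a sketch under that assumption; `dominant_label`, `count_labels`, and the sample rows are hypothetical names and values, not part of the original notebook:

```python
from collections import Counter

def dominant_label(pos, neg, neu):
    """Return the label with the strictly highest score; ties fall back to Neutral."""
    if pos > neg and pos > neu:
        return "Positive"
    if neg > pos and neg > neu:
        return "Negative"
    return "Neutral"

def count_labels(rows):
    """Tally the dominant label over an iterable of (pos, neg, neu) tuples."""
    return Counter(dominant_label(pos, neg, neu) for pos, neg, neu in rows)

# Made-up (Positive, Negative, Neutral) scores for four tweets.
rows = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.4), (0.1, 0.5, 0.4), (0.0, 0.0, 1.0)]
counts = count_labels(rows)
print(counts.most_common())
```

With the DataFrame from this notebook, the rows could come from `data[["Positive", "Negative", "Neutral"]].itertuples(index=False)`.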


So the neutral score dominates overall, which means most of the tweet content is neither positive nor negative. Now let’s have a look at the total of the sentiment scores:

In [9]:
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)

Positive:  969.3199999999972
Negative:  325.2209999999998
Neutral:  4584.47199999997


The neutral total is far higher than the negative and positive totals. Between the other two, the positive total (about 969) is roughly three times the negative total (about 325), so among the tweets that do express an opinion, the sentiment leans positive rather than negative.
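
Raw totals are hard to compare at a glance, so it can help to express them as shares of the overall score. The sketch below simply reuses the three totals printed above, rounded to three decimals; the variable names are illustrative:

```python
# Totals copied from the printed output above (rounded to three decimals).
totals = {"Positive": 969.32, "Negative": 325.221, "Neutral": 4584.472}

grand_total = sum(totals.values())
shares = {label: value / grand_total for label, value in totals.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
# Positive: 16.5%
# Negative: 5.5%
# Neutral: 78.0%
```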

**Summary**

So this is how you can perform Twitter sentiment analysis using the Python programming language. Sentiment analysis is a natural language processing task, and social media platforms need to keep a check on the sentiments of people engaged in a discussion.