# Twitter Sentiment Analysis using Machine Learning
Twitter is a social networking platform where people can share their opinions on any topic. A contested opinion can spark a heated discussion that sometimes produces a wave of negative tweets.

Sentiment analysis is a natural language processing task that classifies the emotional tone of text. Social media platforms have an interest in monitoring the tone of discussions, and negative opinions are especially common on Twitter when the discussion is political. Ongoing sentiment analysis can help a platform identify accounts that spread hate and negativity. For this Twitter sentiment analysis task, I use a Kaggle dataset containing tweets from a long-form discussion within a group of users. The goal is to measure how positive, negative, and neutral the tweets are overall and report a single result. The sections below implement this analysis in Python.

**Please note that my project draws inspiration from 'Machine Learning through Examples' by Dr. Alaa Tuaima, as I explore the concepts and techniques outlined in the book to create innovative solutions.**

### Importing Libraries for Generalized Text Classification and Loading the Dataset
This step imports the libraries used for a general text classification task: pandas and numpy for data manipulation, scikit-learn modules for text feature extraction and classification, and `re` and `nltk` for text processing. It then loads the dataset from `twitter.csv` and drops the leftover `Unnamed: 0` index column.

In [19]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import re
import nltk
import warnings
warnings.filterwarnings('ignore')

In [20]:
df = pd.read_csv('twitter.csv')
df.drop(columns='Unnamed: 0', inplace=True)
df.head()

|   | count | hate_speech | offensive_language | neither | class | tweet |
|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 0 | 3 | 2 | !!! RT @mayasolovely: As a woman you shouldn't... |
| 1 | 3 | 0 | 3 | 0 | 1 | !!!!! RT @mleew17: boy dats cold...tyga dwn ba... |
| 2 | 3 | 0 | 3 | 0 | 1 | !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby... |
| 3 | 3 | 0 | 2 | 1 | 1 | !!!!!!!!! RT @C_G_Anderson: @viva_based she lo... |
| 4 | 6 | 0 | 6 | 0 | 1 | !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you... |

### Text Cleaning and Preprocessing using NLTK
This code uses NLTK and regular expressions to clean the tweet text. It lowercases the text, removes bracketed content, URLs, HTML tags, punctuation, newlines, and words containing digits, then drops English stopwords and stems the remaining words.

In [21]:
#nltk.download('stopwords')   # uncomment on first run to fetch the stopword list
from nltk.corpus import stopwords
import string
stemmer = nltk.SnowballStemmer("english")   # stemming reduces words to their base forms
stopword = set(stopwords.words('english'))   # common English words to discard
def clean(text):
    text = str(text).lower()   # converts the input text to lowercase
    text = re.sub(r'\[.*?\]', '', text)   # removes any content enclosed within square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # removes URLs and website links
    text = re.sub(r'<.*?>+', '', text)   # removes HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)   # removes any punctuation marks from the text
    text = re.sub(r'\n', '', text)   # removes newline characters from the text
    text = re.sub(r'\w*\d\w*', '', text)   # removes words that contain numbers
    text = [word for word in text.split(' ') if word not in stopword]   # splits the cleaned text into words and removes stopwords
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]   # stems each remaining word
    text = " ".join(text)
    return text
df["tweet"] = df["tweet"].apply(clean)
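
To make the effect of the regular-expression steps concrete, here is a minimal sketch that applies the same substitutions to one made-up tweet. It is an illustration under stated assumptions, not the notebook's function: `clean_sketch` and `SAMPLE_STOPWORDS` are hypothetical names, the three-word stopword set stands in for NLTK's English list, stemming is omitted so the snippet runs without NLTK, and `split()` replaces `split(' ')` so that the gaps left by removed tokens collapse.

```python
import re
import string

# Hypothetical stand-in for NLTK's English stopword list (assumption).
SAMPLE_STOPWORDS = {"i", "this", "so"}

def clean_sketch(text):
    """Apply the regex cleaning steps to one string (no stemming)."""
    text = str(text).lower()                                         # lowercase
    text = re.sub(r'\[.*?\]', '', text)                              # drop [bracketed] content
    text = re.sub(r'https?://\S+|www\.\S+', '', text)                # drop URLs
    text = re.sub(r'<.*?>+', '', text)                               # drop HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # drop punctuation
    text = re.sub(r'\n', '', text)                                   # drop newlines
    text = re.sub(r'\w*\d\w*', '', text)                             # drop words containing digits
    # split() with no argument collapses repeated whitespace (differs from split(' '))
    words = [word for word in text.split() if word not in SAMPLE_STOPWORDS]
    return " ".join(words)

# Made-up tweet used only for illustration.
sample = "RT @User1: I LOVE this!!! https://t.co/xyz <b>so</b> good"
cleaned = clean_sketch(sample)
print(cleaned)   # rt love good
```

Note that the retweet marker `rt` survives the cleaning, which matches the `rt ...` prefixes visible in the cleaned tweets further down. The handle `user1` disappears only because it contains a digit.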

### Sentiment Analysis with NLTK's VADER
This code applies VADER, the sentiment analysis tool bundled with NLTK, to each entry in the "tweet" column of the DataFrame and stores the resulting positive, negative, and neutral scores in new columns. These scores describe the emotional tone of each tweet and help reveal sentiment patterns in the data.

In [22]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer   # NLTK's VADER sentiment scorer
#nltk.download('vader_lexicon')   # uncomment on first run to fetch the VADER lexicon
sentiments = SentimentIntensityAnalyzer()
# score each tweet once with polarity_scores, then split the results into columns
scores = [sentiments.polarity_scores(i) for i in df["tweet"]]
df["Positive"] = [s["pos"] for s in scores]
df["Negative"] = [s["neg"] for s in scores]
df["Neutral"] = [s["neu"] for s in scores]
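
`polarity_scores` also returns a fourth key, `compound`, a normalized overall score between -1 and 1 that the cell above does not use. The sketch below shows one way a single label per tweet could be derived from it. The helper name `label_from_compound` is hypothetical, the example scores are made up, and the ±0.05 cutoffs are the thresholds commonly suggested for VADER rather than something this notebook specifies.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Made-up compound scores for illustration; real values would come from
# sentiments.polarity_scores(tweet)["compound"].
examples = [0.62, -0.48, 0.0]
labels = [label_from_compound(c) for c in examples]
print(labels)   # ['Positive', 'Negative', 'Neutral']
```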

In [23]:
df = df[["tweet", "Positive", "Negative", "Neutral"]]
df.head()

|   | tweet | Positive | Negative | Neutral |
|---|---|---|---|---|
| 0 | rt mayasolov woman shouldnt complain clean ho... | 0.147 | 0.157 | 0.696 |
| 1 | rt boy dat coldtyga dwn bad cuffin dat hoe ... | 0.0 | 0.28 | 0.72 |
| 2 | rt urkindofbrand dawg rt ever fuck bitch sta... | 0.0 | 0.577 | 0.423 |
| 3 | rt cganderson vivabas look like tranni | 0.333 | 0.0 | 0.667 |
| 4 | rt shenikarobert shit hear might true might f... | 0.154 | 0.407 | 0.44 |

### Aggregating Sentiment Scores and Determining Overall Sentiment
This code sums the "Positive," "Negative," and "Neutral" columns of the DataFrame to get a total score for each sentiment. It then defines a function, `sentiment_score`, that takes the three totals as arguments and prints the overall sentiment according to which total is strictly the highest; any tie falls through to "Neutral."

In [24]:
x = sum(df["Positive"])
y = sum(df["Negative"])
z = sum(df["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive")
    elif (b>a) and (b>c):
        print("Negative")
    else:
        print("Neutral")

sentiment_score(x, y, z)

Neutral


In [25]:
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)

Positive:  2880.086000000009
Negative:  7201.020999999922
Neutral:  14696.887999999733
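
Because the raw totals grow with the number of tweets, it can help to view them as shares of the combined total. The sketch below is one way to do that, using the totals printed above rounded to three decimals. `overall_sentiment` is a hypothetical variant of `sentiment_score` that returns the label instead of printing it.

```python
def overall_sentiment(positive, negative, neutral):
    """Return the label whose total is strictly highest; ties fall back to Neutral."""
    if positive > negative and positive > neutral:
        return "Positive"
    if negative > positive and negative > neutral:
        return "Negative"
    return "Neutral"

# Totals copied from the output above, rounded to three decimals.
totals = {"Positive": 2880.086, "Negative": 7201.021, "Neutral": 14696.888}

label = overall_sentiment(totals["Positive"], totals["Negative"], totals["Neutral"])

# Express each total as a fraction of the combined total.
grand_total = sum(totals.values())
shares = {name: value / grand_total for name, value in totals.items()}

print(label)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these totals the neutral share is a little under 60%, and the negative share is roughly two and a half times the positive one.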
