### Social Media Trend Analysis

#### 1) Problem Statement

The world is changing fast, and social platforms are growing and becoming more popular than ever. Organizations are trying to use the information on these sites to their advantage, drawing on engagement in posts, likes, reactions, and public discussions.
Understanding the sentiments shared across platforms is therefore fundamental for businesses, policymakers, and other organizations that want to make informed decisions.
This project aims to develop a sentiment analysis model that categorizes social media posts as positive, negative, or neutral. By leveraging machine learning techniques and natural language processing (NLP), we will analyze sentiment trends across platforms, assess engagement metrics (likes, retweets), and identify key topics driving online conversations.
Insights from these trends will shed light on consumer behavior, brand perception, and emerging trends, which in the long run will help businesses and organizations improve their decision-making, brand management, and customer engagement.

#### 2) Data Collection

* Data source - https://www.kaggle.com/datasets/kashishparmar02/social-media-sentiments-analysis-dataset/data
* The data consists of 15 columns and 732 rows.

#### 2.1 Import Data and Required Packages

**Importing the Pandas, NumPy, Matplotlib, Seaborn, and Warnings libraries.**

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

Import the CSV data as a Pandas DataFrame.

In [3]:
df = pd.read_csv('data/sentimentdataset.csv')

In [4]:
df.info(), df.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 732 entries, 0 to 731
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0.1  732 non-null    int64  
 1   Unnamed: 0    732 non-null    int64  
 2   Text          732 non-null    object 
 3   Sentiment     732 non-null    object 
 4   Timestamp     732 non-null    object 
 5   User          732 non-null    object 
 6   Platform      732 non-null    object 
 7   Hashtags      732 non-null    object 
 8   Retweets      732 non-null    float64
 9   Likes         732 non-null    float64
 10  Country       732 non-null    object 
 11  Year          732 non-null    int64  
 12  Month         732 non-null    int64  
 13  Day           732 non-null    int64  
 14  Hour          732 non-null    int64  
dtypes: float64(2), int64(6), object(7)
memory usage: 85.9+ KB


(None,
    Unnamed: 0.1  Unnamed: 0  \
 0             0           0   
 1             1           1   
 2             2           2   
 3             3           3   
 4             4           4   
 5             5           5   
 6             6           6   
 7             7           7   
 8             8           8   
 9             9           9   
 
                                                 Text    Sentiment  \
 0   Enjoying a beautiful day at the park!        ...   Positive     
 1   Traffic was terrible this morning.           ...   Negative     
 2   Just finished an amazing workout! 💪          ...   Positive     
 3   Excited about the upcoming weekend getaway!  ...   Positive     
 4   Trying out a new recipe for dinner tonight.  ...   Neutral      
 5   Feeling grateful for the little things in lif...   Positive     
 6   Rainy days call for cozy blankets and hot coc...   Positive     
 7   The new movie release is a must-watch!       ...   Positive     
 8   Poli

#### 2.2 Dataset Information

This dataset contains 732 rows and 15 columns. Key columns include:
* **Text**: The social media post.

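* **Sentiment**: Sentiment label for the post. Besides Positive, Negative, and Neutral, the column contains many finer-grained emotion labels (see the unique-value check below).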

* **Timestamp**: Date and time of the post.

* **User**: The account that posted.

* **Platform**: Source (Twitter, Facebook, Instagram, etc.).

* **Hashtags**: Associated hashtags.

* **Retweets & Likes**: Engagement metrics.

* **Country**: Origin of the post.

* **Year, Month, Day, Hour**: Date components.
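
Since `Timestamp` is stored as a string (`object`) while `Year`, `Month`, `Day`, and `Hour` are separate integer columns, it may be worth confirming that the two agree. The following is a minimal sketch on toy rows, not on the real file; the two timestamps are taken from the preview rows in this notebook, and the variable names `sample`, `parsed`, and `components_match` are illustrative.

```python
import pandas as pd

# Toy rows standing in for the real dataset.
sample = pd.DataFrame({
    "Timestamp": ["2023-01-15 12:30:00", "2020-02-29 20:45:00"],
    "Year": [2023, 2020],
    "Month": [1, 2],
    "Day": [15, 29],
    "Hour": [12, 20],
})

# Parse the strings into datetimes; errors="coerce" turns unparseable values into NaT.
parsed = pd.to_datetime(sample["Timestamp"], errors="coerce")

# Compare each date-component column with the value extracted from the parsed timestamp.
components_match = (
    (parsed.dt.year == sample["Year"])
    & (parsed.dt.month == sample["Month"])
    & (parsed.dt.day == sample["Day"])
    & (parsed.dt.hour == sample["Hour"])
)
print(components_match.all())
```

On the real data, the same comparison could be run against `df` to flag any rows where the components and the timestamp disagree.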

#### 3) Data Checks to Perform

* Check Missing values
* Check Duplicates
* Check data type
* Check the number of unique values of each column
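
The four checks listed above can also be gathered into one summary table. This is a sketch under assumptions: `summarize_checks` is a hypothetical helper, not part of the notebook, and the toy frame only imitates two of the dataset's columns.

```python
import pandas as pd

def summarize_checks(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and unique count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "unique": frame.nunique(),  # NaN is not counted as a value
    })

# Toy data with one missing value and no duplicate rows.
toy = pd.DataFrame({
    "Platform": ["Twitter", "Facebook", "Twitter"],
    "Likes": [30.0, None, 10.0],
})

summary = summarize_checks(toy)
duplicate_rows = int(toy.duplicated().sum())

print(summary)
print("duplicate rows:", duplicate_rows)
```

Calling `summarize_checks(df)` on the loaded dataset would cover the unique-value check, which the cells below only run for the `Sentiment` column.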

In [5]:
#Checking for missing values
df.isnull().sum()

Unnamed: 0.1    0
Unnamed: 0      0
Text            0
Sentiment       0
Timestamp       0
User            0
Platform        0
Hashtags        0
Retweets        0
Likes           0
Country         0
Year            0
Month           0
Day             0
Hour            0
dtype: int64

**Missing Values**: No missing values in any column.

In [6]:
#Checking for duplicates
df.duplicated().sum()

0

**Duplicate Entries**: No duplicate rows found.
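
One caveat: `df.duplicated()` compares whole rows, and at this point `df` still holds the two `Unnamed` columns, which look like row counters in the preview above. If they are unique per row, no two rows can match, so repeated posts would go unnoticed. The sketch below shows the effect on toy data and one way to compare only the content columns; the user names and texts are made up, and nothing here implies that the real dataset contains such duplicates.

```python
import pandas as pd

# Toy frame: rows 0 and 1 repeat the same post but differ in the counter column.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Text": ["Great day! ", "Great day! ", "Traffic again."],
    "User": ["UserA", "UserA", "UserB"],
})

# Full-row comparison: the counter column makes every row unique.
full_row_duplicates = int(toy.duplicated().sum())

# Compare only the columns that describe the post itself.
content_columns = [col for col in toy.columns if not col.startswith("Unnamed")]
content_duplicates = int(toy.duplicated(subset=content_columns).sum())

print(full_row_duplicates, content_duplicates)
```

Running the duplicate check after the `Unnamed` columns are dropped would have the same effect as the `subset` argument used here.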

In [7]:
#Checking the data types in the dataset
df.dtypes

Unnamed: 0.1      int64
Unnamed: 0        int64
Text             object
Sentiment        object
Timestamp        object
User             object
Platform         object
Hashtags         object
Retweets        float64
Likes           float64
Country          object
Year              int64
Month             int64
Day               int64
Hour              int64
dtype: object

In [8]:
#Dropping unnecessary columns
df_cleaned = df.drop(columns=["Unnamed: 0.1", "Unnamed: 0"], errors="ignore")


In [9]:
#Checking unique sentiment values
unique_sentiments = df_cleaned["Sentiment"].unique()
unique_sentiments


array([' Positive  ', ' Negative  ', ' Neutral   ', ' Anger        ',
       ' Fear         ', ' Sadness      ', ' Disgust      ',
       ' Happiness    ', ' Joy          ', ' Love         ',
       ' Amusement    ', ' Enjoyment    ', ' Admiration   ',
       ' Affection    ', ' Awe          ', ' Disappointed ',
       ' Surprise     ', ' Acceptance   ', ' Adoration    ',
       ' Anticipation ', ' Bitter       ', ' Calmness     ',
       ' Confusion    ', ' Excitement   ', ' Kind         ',
       ' Pride        ', ' Shame        ', ' Confusion ', ' Excitement ',
       ' Shame ', ' Elation       ', ' Euphoria      ', ' Contentment   ',
       ' Serenity      ', ' Gratitude     ', ' Hope          ',
       ' Empowerment   ', ' Compassion    ', ' Tenderness    ',
       ' Arousal       ', ' Enthusiasm    ', ' Fulfillment  ',
       ' Reverence     ', ' Compassion', ' Fulfillment   ', ' Reverence ',
       ' Elation   ', ' Despair         ', ' Grief           ',
       ' Loneliness     

**Total unique sentiment labels**: 279

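**Unique Sentiment Labels**: The dataset contains a mix of standard sentiment categories (*Positive, Negative, Neutral*) and many other emotion labels (*Joy, Surprise, Sadness, Love, Regret, etc.*). Labels also carry varying amounts of leading and trailing whitespace, so the same emotion can appear as several distinct values (for example, `' Confusion    '` and `' Confusion '`), which inflates the unique count.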
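
To see how much of the unique count comes from whitespace alone, the count can be compared before and after stripping. A minimal sketch, using a handful of label strings copied from the output above rather than the full column:

```python
import pandas as pd

# A few raw labels as they appear in the unique-value output above.
labels = pd.Series([
    " Confusion    ",
    " Confusion ",
    " Shame        ",
    " Shame ",
    " Positive  ",
])

raw_unique = labels.nunique()                   # whitespace variants count separately
stripped_unique = labels.str.strip().nunique()  # variants collapse after stripping

print(raw_unique, stripped_unique)
```

On the real data, `df_cleaned["Sentiment"].str.strip().nunique()` would give the number of distinct labels once whitespace is removed.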

#### **Clean Sentiment Labels**

Standardize labels by removing extra spaces and grouping similar sentiments.

In [10]:
#Clean sentiment labels by stripping extra spaces
df_cleaned["Sentiment"] = df_cleaned["Sentiment"].str.strip()

#Standardize sentiment labels
positive_labels = {"Joy", "Happiness", "Excitement", "Love", "Admiration", "Gratitude", "Pride", "Euphoria", "Optimism"}
negative_labels = {"Sadness", "Anger", "Fear", "Disgust", "Regret", "Loneliness", "Frustration", "Betrayal", "Despair"}
neutral_labels = {"Neutral", "Indifference", "Curiosity", "Surprise", "Reflection"}

def categorize_sentiment(sentiment):
    if sentiment in positive_labels:
        return "Positive"
    elif sentiment in negative_labels:
        return "Negative"
    elif sentiment in neutral_labels:
        return "Neutral"
    return "Other" #For any sentiment that is not categorized


df_cleaned["Sentiment"] = df_cleaned["Sentiment"].apply(categorize_sentiment)

#Display the updated sentiment distribution
df_cleaned["Sentiment"].value_counts()

Sentiment
Other       508
Positive    126
Neutral      50
Negative     48
Name: count, dtype: int64

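The "Other" category holds 508 of the 732 rows, so the mapping needs further refinement. Part of the cause is that the label sets above omit the literal labels "Positive" and "Negative", so posts that were already labeled that way also fall into "Other" (compare the first rows of the table below).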
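
One possible refinement, sketched here under assumptions, is to add the literal labels `Positive` and `Negative` to the sets from the cell above and then list which labels are still unmapped, so they can be assigned by hand. The label sets are copied from the notebook with those two additions; the toy series and the name `unmapped_counts` are illustrative.

```python
import pandas as pd

# Label sets from the cell above, with the literal base labels added (an assumed fix).
positive_labels = {"Positive", "Joy", "Happiness", "Excitement", "Love", "Admiration",
                   "Gratitude", "Pride", "Euphoria", "Optimism"}
negative_labels = {"Negative", "Sadness", "Anger", "Fear", "Disgust", "Regret",
                   "Loneliness", "Frustration", "Betrayal", "Despair"}
neutral_labels = {"Neutral", "Indifference", "Curiosity", "Surprise", "Reflection"}

def categorize_sentiment(label: str) -> str:
    """Map a stripped label to Positive, Negative, Neutral, or Other."""
    if label in positive_labels:
        return "Positive"
    if label in negative_labels:
        return "Negative"
    if label in neutral_labels:
        return "Neutral"
    return "Other"

# Toy labels; "Contentment" and "Grief" appear in the dataset but are in none of the sets.
toy = pd.Series(["Positive", "Negative", "Joy", "Contentment", "Grief"])
mapped = toy.apply(categorize_sentiment)

# Labels that still land in "Other", with their frequencies.
unmapped_counts = toy[mapped == "Other"].value_counts()

print(mapped.tolist())
print(unmapped_counts)
```

Because the notebook overwrites `Sentiment` with the grouped value, this inspection would have to run on the stripped original labels, before the overwrite.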

As a first step, we'll label the posts with TextBlob, an off-the-shelf sentiment analyzer; later we can consider training a model on our own dataset.

#### Using TextBlob to categorize our sentiments

In [11]:
from textblob import TextBlob

def get_textblob_sentiments(text):
    """Label text as Positive, Negative, or Neutral from the sign of its TextBlob polarity."""
    #polarity is a float in the range [-1.0, 1.0]
    score = TextBlob(str(text)).sentiment.polarity
    if score > 0:
        return "Positive"
    elif score < 0:
        return "Negative"
    else:
        return "Neutral"

#Apply the sentiment classification
df_cleaned["TextBlob_Sentiment"] = df_cleaned["Text"].apply(get_textblob_sentiments)

#Display sentiment distribution using TextBlob
df_cleaned["TextBlob_Sentiment"].value_counts()

TextBlob_Sentiment
Neutral     324
Positive    282
Negative    126
Name: count, dtype: int64

In [12]:
df_cleaned

Unnamed: 0,Text,Sentiment,Timestamp,User,Platform,Hashtags,Retweets,Likes,Country,Year,Month,Day,Hour,TextBlob_Sentiment
0,Enjoying a beautiful day at the park! ...,Other,2023-01-15 12:30:00,User123,Twitter,#Nature #Park,15.0,30.0,USA,2023,1,15,12,Positive
1,Traffic was terrible this morning. ...,Other,2023-01-15 08:45:00,CommuterX,Twitter,#Traffic #Morning,5.0,10.0,Canada,2023,1,15,8,Negative
2,Just finished an amazing workout! 💪 ...,Other,2023-01-15 15:45:00,FitnessFan,Instagram,#Fitness #Workout,20.0,40.0,USA,2023,1,15,15,Positive
3,Excited about the upcoming weekend getaway! ...,Other,2023-01-15 18:20:00,AdventureX,Facebook,#Travel #Adventure,8.0,15.0,UK,2023,1,15,18,Positive
4,Trying out a new recipe for dinner tonight. ...,Neutral,2023-01-15 19:55:00,ChefCook,Instagram,#Cooking #Food,12.0,25.0,Australia,2023,1,15,19,Positive
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
727,Collaborating on a science project that receiv...,Other,2017-08-18 18:20:00,ScienceProjectSuccessHighSchool,Facebook,#ScienceFairWinner #HighSchoolScience,20.0,39.0,UK,2017,8,18,18,Positive
728,Attending a surprise birthday party organized ...,Other,2018-06-22 14:15:00,BirthdayPartyJoyHighSchool,Instagram,#SurpriseCelebration #HighSchoolFriendship,25.0,48.0,USA,2018,6,22,14,Positive
729,Successfully fundraising for a school charity ...,Other,2019-04-05 17:30:00,CharityFundraisingTriumphHighSchool,Twitter,#CommunityGiving #HighSchoolPhilanthropy,22.0,42.0,Canada,2019,4,5,17,Positive
730,"Participating in a multicultural festival, cel...",Other,2020-02-29 20:45:00,MulticulturalFestivalJoyHighSchool,Facebook,#CulturalCelebration #HighSchoolUnity,21.0,43.0,UK,2020,2,29,20,Positive
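
Before the original `Sentiment` column is dropped, it may be useful to check how often the TextBlob labels agree with it. The sketch below cross-tabulates the two columns on toy rows; the values are invented for illustration and say nothing about the agreement rate on the real data.

```python
import pandas as pd

# Toy rows imitating the two label columns of df_cleaned.
toy = pd.DataFrame({
    "Sentiment": ["Positive", "Other", "Neutral", "Negative", "Other"],
    "TextBlob_Sentiment": ["Positive", "Positive", "Positive", "Negative", "Neutral"],
})

# Rows: original grouped label; columns: TextBlob label; cells: post counts.
agreement_table = pd.crosstab(toy["Sentiment"], toy["TextBlob_Sentiment"])

# Agreement rate over rows whose original label is one of the three target classes.
comparable = toy[toy["Sentiment"] != "Other"]
agreement_rate = (comparable["Sentiment"] == comparable["TextBlob_Sentiment"]).mean()

print(agreement_table)
print(round(agreement_rate, 2))
```

A low agreement rate on the real data would be a reason to treat the TextBlob labels with caution before using them as training targets.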


In [13]:
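#Drop the original Sentiment column and keep the TextBlob labels
df_clean1 = df_cleaned.drop(columns="Sentiment")

In [15]:
#Export to Excel (pandas needs an Excel writer such as openpyxl installed for .xlsx files)
df_clean1.to_excel("Social_Sentiments.xlsx")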
