WhatsApp Sentiment Analysis: Unveiling Emotional Trends in Conversations

ecetsn/Whatsapp-Sentiment-Analysis

Sentiment Analysis through My WhatsApp Messages

The detailed results and steps are given in the Jupyter notebook (.ipynb).

A project that conducts sentiment analysis on WhatsApp messages, exploring patterns and intensity of communication throughout the day.

Installing / Getting Started

To get started, clone the project and navigate to the project directory. Install the required dependencies:

git clone https://github.com/ecetsn/CS210_Term_Project.git
cd CS210_Term_Project/
pip install -r requirements.txt

Developing

Built With

  • Python
  • Jupyter Notebook

Prerequisites

  • Python (>=3.6)
  • Jupyter Notebook
  • ZEMBEREK Turkish NLP library
  • BERT-based Turkish sentiment analysis model

To reproduce the analysis:

  • Install the necessary libraries mentioned in the notebook.
!pip install pandas
!pip install transformers
!pip install zemberek-nlp
!pip install plotly
!pip install seaborn
!pip install matplotlib
  • Ensure the availability of WhatsApp data for analysis.
  • Run the notebook in a Jupyter environment, following each step.

Building

No additional building steps are required for this project.

Deploying / Publishing

No specific deployment steps are needed as this project primarily focuses on analysis and exploration.

Configuration

No user-configurable parameters. The configuration involves installing the required dependencies and setting up the environment.

API Reference

No external API is used. The project primarily utilizes Python libraries for sentiment analysis and spell-checking.

Steps for Analysis

Sentiment Analysis

  • Data Collection
    • Collect conversational data from WhatsApp, making sure each message keeps its timestamp.
  • Data Cleaning
    • Handle missing values
    • Convert to lowercase
    • Remove special characters
    • Remove links
    • Tokenization
    • Remove stopwords
    • Spell checking
      • Set the environment for Spell Checking using ZEMBEREK Turkish NLP
  • Sentiment Analysis
    • Set the environment for Sentiment Analysis using a BERT-based Turkish model
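
The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's code: the function name clean_message, the tiny STOPWORDS set, and the regular expressions are all assumptions made for the example, and the ZEMBEREK spell-checking step is left out.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"ve", "bir", "bu", "da", "de"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^\w\s]")  # anything that is not a letter, digit, underscore or space


def clean_message(text):
    """Return the cleaned tokens of one message (hypothetical helper)."""
    if text is None:  # handle missing values
        return []
    # str.lower() is not Turkish-aware, so map the dotted/dotless capitals first.
    text = text.replace("İ", "i").replace("I", "ı").lower()
    text = URL_RE.sub(" ", text)       # remove links
    text = NON_WORD_RE.sub(" ", text)  # remove special characters; Turkish letters are kept
    tokens = text.split()              # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords


print(clean_message("Bu harika bir gün! https://example.com"))
```

Mapping "I" to "ı" before lowercasing suits Turkish text but will also alter English words that contain a capital I, so it may need adjusting for mixed-language chats.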

Habit Analysis

After data collection, the intensity of the conversation is examined by time of day and by date.

  • Get the intensity of messages
  • Observe them by the periods of the day
  • Combine this knowledge with the sentiment score of the messages
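
As a rough illustration of the first two steps, the sketch below assigns each message timestamp to a period of the day and counts messages per period. The hour boundaries and the names time_slot and messages_per_slot are assumptions for this example; the notebook's actual cut-offs may differ.

```python
from collections import Counter
from datetime import datetime


def time_slot(ts):
    """Map a timestamp to a period of the day (assumed boundaries)."""
    h = ts.hour
    if 6 <= h < 12:
        return "Morning"
    if 12 <= h < 17:
        return "Noon"
    if 17 <= h < 22:
        return "Evening"
    return "Night"  # 22:00 to 05:59


def messages_per_slot(timestamps):
    """Count how many messages fall into each period."""
    return Counter(time_slot(ts) for ts in timestamps)


# Made-up timestamps, only to show the call.
sample = [
    datetime(2023, 1, 1, 9, 5),
    datetime(2023, 1, 1, 13, 30),
    datetime(2023, 1, 1, 14, 0),
    datetime(2023, 1, 1, 23, 45),
]
counts = messages_per_slot(sample)
print(counts.most_common())  # periods ordered from most to fewest messages
```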

Findings

Sentiment Correlations Findings:

Analysis of Weighted Sentiment Over Time to Derive a Correlation Heatmap

In this analysis, I examine the trends in weighted sentiment over time by grouping the data based on date and time slots. The weighted sentiment is calculated and averaged for each group, resulting in a pivot table that provides insights into sentiment patterns during different times of the day.

img5
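
The grouping and averaging described above might look like the following sketch. It rests on a loud assumption: the README does not define "weighted sentiment", so here it is taken to be the model's confidence score signed by the predicted label. The names signed_score and pivot_mean_sentiment and the sample rows are hypothetical, and the code uses plain dictionaries rather than the notebook's own tooling.

```python
from collections import defaultdict
from datetime import date


def signed_score(label, score):
    # Assumption: positive predictions count as +score, negative ones as -score.
    return score if label == "positive" else -score


def pivot_mean_sentiment(rows):
    """rows: iterable of (day, slot, label, score) tuples.

    Returns {day: {slot: mean signed score}}, i.e. dates as rows and
    time slots as columns.
    """
    buckets = defaultdict(list)
    for day, slot, label, score in rows:
        buckets[(day, slot)].append(signed_score(label, score))

    table = defaultdict(dict)
    for (day, slot), values in buckets.items():
        table[day][slot] = sum(values) / len(values)
    return dict(table)


# Made-up rows, only to show the shape of the input.
rows = [
    (date(2023, 1, 1), "Morning", "positive", 0.9),
    (date(2023, 1, 1), "Morning", "negative", 0.5),
    (date(2023, 1, 1), "Noon", "positive", 0.8),
]
table = pivot_mean_sentiment(rows)
print(table[date(2023, 1, 1)])
```

Each column of such a table (one average per day for a given time slot) is the kind of series that can then be compared pairwise across slots.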

Sentiment Scores Overview

In this analysis, a table containing sentiment scores for every message is presented. The sentiment scores were calculated using the BERT-Turkish model. This table serves as the foundation for hypothesis testing to explore patterns and trends in sentiment across different time slots.

img2

  • The table provides a comprehensive view of sentiment scores for each message. These scores include both positive and negative values, allowing for a detailed examination of the sentiment distribution.

img3

  • The table shows the positive sentiment score for each message.

img4

  • The table shows the negative sentiment score for each message.

These findings on sentiment correlation are used to test my hypothesis.

Hypothesis Testing

Null Hypothesis (H0)

  • There is no significant correlation between the average sentiment of the selected periods of the day.

Alternative Hypothesis (H1)

  • There is a significant correlation between the average sentiment of the selected periods of the day.

Hypothesis Testing Technique:

  • In this section, I employ the Pearson correlation test to assess the correlation between different variables. The pearsonr function is utilized to calculate both correlation coefficients and associated p-values for hypothesis testing.

Purpose:

  • The Pearson correlation test is employed to understand the strength and direction of the linear relationship between two variables. This analysis provides insight into whether changes in one variable are associated with systematic changes in another.

Method:

  • The code uses the pearsonr function from the scipy.stats library. For each pair of variables it returns the Pearson correlation coefficient, which measures the strength and direction of their linear relationship, together with a p-value used to evaluate the statistical significance of the observed correlation.

  • In this context, the null hypothesis posits that there is no correlation between the specified pair of variables. The code calculates the correlation coefficient and its p-value for pairs of time slots, such as morning vs. night average sentiment and evening vs. noon average sentiment. The significance level (alpha) is set to 0.05, a commonly used threshold in hypothesis testing.

  • Each computed p-value is then compared with the chosen significance level. If the p-value is less than alpha, the null hypothesis is rejected and the correlation between that pair of variables is treated as statistically significant. If the p-value is greater than or equal to alpha, the null hypothesis is not rejected and the correlation is reported as not significant.
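
To make the coefficient and the decision rule concrete, here is a small standard-library sketch. It is not the notebook's code: pearson_r and is_significant are hypothetical helper names, and the p-value itself is not computed here, since the README obtains it from scipy.stats.pearsonr.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def is_significant(p_value, alpha=0.05):
    """Decision rule: reject the null hypothesis when p < alpha."""
    return p_value < alpha


# Made-up daily averages for two time slots, only to show the call.
evening = [0.10, 0.25, 0.05, 0.40]
noon = [0.12, 0.30, 0.02, 0.35]
print(round(pearson_r(evening, noon), 3))
```

The two input series need to be aligned by date, so days on which either time slot has no messages would have to be dropped or filled before the comparison.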

Morning vs. Night Sentiment:

  • There is no significant correlation between morning average sentiment and night average sentiment.

Morning vs. Evening Sentiment:

  • There is no significant correlation between morning average sentiment and evening average sentiment.

Morning vs. Noon Sentiment:

  • There is no significant correlation between morning average sentiment and noon average sentiment.

Night vs. Evening Sentiment:

  • There is no significant correlation between night average sentiment and evening average sentiment.

Night vs. Noon Sentiment:

  • There is no significant correlation between night average sentiment and noon average sentiment.

Evening vs. Noon Sentiment:

  • There is a significant correlation between evening average sentiment and noon average sentiment.

Interpretation:

The lack of significant correlations in most comparisons suggests that the average sentiment in one time slot is generally not linearly related to the average sentiment in the other time slots, which may indicate that the factors influencing sentiment vary throughout the day. However, the significant correlation between evening and noon sentiment implies a potential pattern or similarity in sentiment during these two time slots. Further investigation into the nature of this correlation may provide insight into what influences sentiment at these times. This analysis lays the groundwork for understanding the temporal dynamics of sentiment and can guide future exploration of specific time slots.

Habit Analysis Findings:

newplot

img1

The time slots rank as follows by total number of messages, from highest to lowest:

Noon:

  • Noon has the highest total number of messages, indicating that this time slot is the most active in terms of messaging intensity.

Evening:

  • The evening comes next in terms of total messages, suggesting a considerable level of communication during this period.

Night:

  • Nighttime exhibits a lower but significant total number of messages, signifying a notable level of activity during nighttime hours.

Morning:

  • Morning shows the lowest total number of messages among the time slots, indicating relatively lower messaging activity during the morning hours.

Interpretation:

The observed ranks in messaging intensity provide valuable insights into my messaging habits. The highest activity during noon and evening may be influenced by various factors such as work schedules, social interactions, or personal preferences. The lower messaging intensity in the morning could be attributed to factors like work commitments or the start of the day.

Understanding these patterns can help me manage my communication effectively and adapt to the natural rhythm of my messaging behaviour throughout the day.

Licensing

This project is licensed under the MIT License. For details, refer to the LICENSE file.
