55 changes: 55 additions & 0 deletions Whatsapp Chats Sentiment Analysis/README.md
@@ -0,0 +1,55 @@
# WhatsApp Chats Sentiment Analysis

## Overview
This project performs sentiment analysis on WhatsApp chat data using Natural Language Processing (NLP) techniques. The implementation extracts chat messages from a WhatsApp export file, groups multiline messages, and scores each message with VADER (Valence Aware Dictionary and sEntiment Reasoner).

## Features
- WhatsApp chat file parsing and message extraction
- Multiline message handling
- Sentiment analysis using NLTK's VADER
- Visualization of sentiment distribution (Positive, Negative, Neutral)


## Technologies Used
- Python 3
- Pandas & NumPy for data manipulation
- NLTK for sentiment analysis
- Matplotlib & Seaborn for visualization
- Emoji library for special character handling
- Regular expressions for text parsing

## Installation
1. Clone this repository
2. Install required packages:
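
The notebook's import cell indicates which packages are needed. A minimal sketch, assuming the PyPI names below match those imports (the `REQUIRED` mapping and the `missing_packages` helper are illustrative, not part of the project), checks which packages are missing and prints a matching `pip` command:

```python
import importlib.util

# Import name -> PyPI package name. The mapping is an assumption
# derived from the imports in the notebook's first code cell.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "emoji": "emoji",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "nltk": "nltk",
    "wordcloud": "wordcloud",
    "PIL": "pillow",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of packages whose modules cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

VADER also needs its lexicon, which NLTK can fetch with `nltk.download('vader_lexicon')`.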


## Usage
1. Export your WhatsApp chat as a .txt file (without media)
2. Place the file in the project directory
3. Update the file path in the notebook:
```python
conversation = r"your_whatsapp_chat.txt"
```
4. Run the Jupyter notebook cells sequentially

## Code Structure
1. **Data Extraction**: Parses WhatsApp chat format and extracts messages with metadata
2. **Data Cleaning**: Handles multiline messages and missing values
3. **Sentiment Analysis**: Uses VADER to compute positive, negative, and neutral scores
4. **Visualization**: Generates bar charts showing sentiment distribution
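
The extraction and cleaning steps can be sketched in a few lines. This is a simplified illustration rather than the notebook's exact code: `parse_chat` is a hypothetical helper, the sample lines and names are invented, and unlike the notebook it ignores continuation lines that appear before the first timestamped message.

```python
import re

# Same timestamp pattern as the notebook's date_time() helper.
TIMESTAMP = re.compile(
    r'^(\d{1,2})/(\d{1,2})/(\d{2,4}), (\d{1,2}):(\d{2}) ?(AM|PM|am|pm)? -'
)


def parse_chat(lines):
    """Group raw export lines into [date, time, author, message] records."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        if TIMESTAMP.match(line) and " - " in line:
            stamp, rest = line.split(" - ", 1)
            date, time = stamp.split(", ", 1)
            author, sep, text = rest.partition(": ")
            if not sep:  # system notice without an author
                author, text = None, rest
            records.append([date, time, author, text])
        elif records:
            # Continuation line: attach it to the previous message.
            records[-1][3] += " " + line
    return records


# Invented sample lines in the export format the notebook expects.
sample = [
    "12/31/21, 10:15 PM - Alice: Happy new year!",
    "See you all tomorrow",
    "1/1/22, 09:05 - Bob: Thanks, same to you",
]
for record in parse_chat(sample):
    print(record)
```

Each record can then be loaded into a DataFrame with the columns `Date`, `Time`, `Contact`, and `Message`, as the notebook does.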

## Results
The analysis provides:
- Average sentiment scores for the conversation
- Visual representation of sentiment distribution
- Message-level sentiment scoring
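
As an illustration of how message-level scores roll up into conversation-level averages, the sketch below works on hand-written score dictionaries shaped like the output of VADER's `polarity_scores()`. The numbers are made up, and the `label` helper with its 0.05 cutoff on the compound score is an assumption added here; the notebook itself stores only the `pos`, `neg`, and `neu` scores.

```python
from statistics import mean

# Hypothetical per-message scores, shaped like the dict returned by
# SentimentIntensityAnalyzer.polarity_scores(); the numbers are made up.
scores = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.6},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.5},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]


def average_scores(rows):
    """Mean positive, negative, and neutral score across all messages."""
    return {key: mean(row[key] for row in rows) for key in ("pos", "neg", "neu")}


def label(compound, threshold=0.05):
    """Map a compound score to a label; the 0.05 cutoff is an assumption."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


print(average_scores(scores))
print([label(row["compound"]) for row in scores])
```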

## Note
- The implementation handles WhatsApp's specific date-time format and message structure
- Supports both 12-hour (AM/PM) and 24-hour time formats
- Reads the export as UTF-8, so emojis and special characters in messages are preserved
- Maintains message context across line breaks
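
The timestamp check behind these notes can be tried in isolation. The pattern below is copied from the notebook's `date_time()` function; the helper name `starts_with_timestamp` and the example lines are illustrative assumptions.

```python
import re

# Pattern from the notebook: M/D/Y date, H:MM time, optional AM/PM, then " -".
PATTERN = r'^(\d{1,2})/(\d{1,2})/(\d{2,4}), (\d{1,2}):(\d{2}) ?(AM|PM|am|pm)? -'


def starts_with_timestamp(line):
    """Return True when the line opens a new message."""
    return bool(re.match(PATTERN, line))


# Invented example lines: 12-hour, 24-hour, and a continuation line.
examples = [
    "12/31/21, 10:15 PM - Alice: Happy new year!",
    "31/12/2021, 22:15 - Alice: Happy new year!",
    "See you all tomorrow",
]
for line in examples:
    print(starts_with_timestamp(line), "|", line)
```

Lines that do not match are treated as continuations of the previous message, which is how context is kept across line breaks.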

## Privacy
This tool processes chat data locally. No data is sent to external servers, ensuring your conversations remain private.

@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "571ba0f0-de49-4f1d-b0c5-2627d9575bc1",
"metadata": {},
"source": [
"**Implementation**"
]
},
{
"cell_type": "markdown",
"id": "4a14af81-8abc-41d1-8253-a0d4cebf8025",
"metadata": {},
"source": [
"Importing the required libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91a3c082-7476-4b12-baf5-6172999ef0af",
"metadata": {},
"outputs": [],
"source": [
"!pip install emoji\n",
"!pip install wordcloud\n",
"import re\n",
"import pandas as pd\n",
"import numpy as np\n",
"import emoji\n",
"from collections import Counter\n",
"import matplotlib.pyplot as plt\n",
"from PIL import Image\n",
"from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator\n",
"import nltk\n",
"from nltk.sentiment.vader import SentimentIntensityAnalyzer\n",
"import seaborn as sns\n"
]
},
{
"cell_type": "markdown",
"id": "cfd7181c-20b3-4605-89cc-67a05d3e1bd5",
"metadata": {},
"source": [
"Define the File Path & Open and Read the File"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3b74922-04f6-4526-a9bf-049d8b5a3845",
"metadata": {},
"outputs": [],
"source": [
"conversation = r\"whatspp group chat txt file.txt\"\n",
"\n",
"with open(conversation, \"r\", encoding=\"utf-8\") as file:\n",
"    lines = file.readlines()\n",
"\n",
"print(f\"Total lines in chat file: {len(lines)}\")\n",
"print(\"\\nFirst 10 lines from the file:\")\n",
"for i in range(min(10, len(lines))):\n",
"    print(lines[i].strip())"
]
},
{
"cell_type": "markdown",
"id": "5de5da7c-b79c-40a0-b7db-18f361398e47",
"metadata": {},
"source": [
"Check whether a line from a WhatsApp chat file starts with a timestamp."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28df0be9-e25c-479a-9785-a60a05411833",
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"def date_time(s):\n",
"    pattern = r'^(\\d{1,2})/(\\d{1,2})/(\\d{2,4}), (\\d{1,2}):(\\d{2}) ?(AM|PM|am|pm)? -'\n",
"    return bool(re.match(pattern, s))\n",
"\n",
"# Test on first 10 lines from the chat file\n",
"for line in lines[:10]:\n",
"    print(f\"{line.strip()} → {date_time(line)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9bf78892-9c78-4855-8c7a-7f077109439b",
"metadata": {},
"outputs": [],
"source": [
"def getMessage(line):\n",
"    if \" - \" not in line:\n",
"        return None, None, None, None  # Skip invalid lines\n",
"\n",
"    splitline = line.split(\" - \", 1)\n",
"    datetime_part = splitline[0]\n",
"\n",
"    try:\n",
"        date, time = datetime_part.split(\", \", 1)\n",
"    except ValueError:\n",
"        return None, None, None, None  # Skip invalid lines\n",
"\n",
"    message_part = splitline[1]\n",
"    if \": \" in message_part:\n",
"        author, message = message_part.split(\": \", 1)\n",
"    else:\n",
"        author, message = None, message_part  # No contact name found\n",
"\n",
"    return date, time, author, message"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a12a35bf-1127-4c14-8598-668cc6e462d8",
"metadata": {},
"outputs": [],
"source": [
"for line in lines[:10]:\n",
"    print(getMessage(line))"
]
},
{
"cell_type": "markdown",
"id": "3cd59572-ffe0-46db-98fb-7636f66a7f44",
"metadata": {},
"source": [
"Extracts structured message data from a WhatsApp chat file and stores it in a list. It correctly handles multiline messages, ensuring they are grouped with their respective timestamps and authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42224c23-b168-4c6e-9576-657eb09f2b62",
"metadata": {},
"outputs": [],
"source": [
"data = []\n",
"messageBuffer = []\n",
"date, time, author = None, None, None\n",
"\n",
"for line in lines:\n",
"    line = line.strip()\n",
"    if not line:\n",
"        continue  # Skip empty lines\n",
"\n",
"    if date_time(line):  # If it's a new message\n",
"        if messageBuffer:\n",
"            data.append([date, time, author, ' '.join(messageBuffer)])\n",
"            messageBuffer.clear()\n",
"        date, time, author, message = getMessage(line)\n",
"        messageBuffer.append(message)\n",
"    else:\n",
"        messageBuffer.append(line)  # Append multiline messages\n",
"\n",
"if messageBuffer:\n",
"    data.append([date, time, author, ' '.join(messageBuffer)])\n",
"\n",
"print(f\"Total messages extracted: {len(data)}\")\n",
"print(data[:5]) # Show first 5 extracted messages\n"
]
},
{
"cell_type": "markdown",
"id": "9d029402-54c2-477f-bda8-976a543413a3",
"metadata": {},
"source": [
"Score the sentiment of WhatsApp chat messages using NLTK's VADER sentiment analyzer."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "684780ea-a33c-4905-b5ca-769fa7bc427b",
"metadata": {},
"outputs": [],
"source": [
"# Convert extracted data into a pandas DataFrame\n",
"df = pd.DataFrame(data, columns=[\"Date\", \"Time\", \"Contact\", \"Message\"])\n",
"\n",
"# Ensure data is clean\n",
"if df.empty:\n",
"    print(\"No messages extracted. Fix chat parsing first.\")\n",
"else:\n",
"    df['Date'] = pd.to_datetime(df['Date'])\n",
"    df.dropna(inplace=True)\n",
"\n",
"\n",
"# Download the VADER lexicon if needed, then initialize the sentiment analyzer\n",
"nltk.download('vader_lexicon', quiet=True)\n",
"sentiments = SentimentIntensityAnalyzer()\n",
"\n",
"# Apply sentiment analysis\n",
"df[\"Positive\"] = df[\"Message\"].astype(str).apply(lambda x: sentiments.polarity_scores(x)[\"pos\"])\n",
"df[\"Negative\"] = df[\"Message\"].astype(str).apply(lambda x: sentiments.polarity_scores(x)[\"neg\"])\n",
"df[\"Neutral\"] = df[\"Message\"].astype(str).apply(lambda x: sentiments.polarity_scores(x)[\"neu\"])\n",
"\n",
"# Display the first 25 messages\n",
"pd.set_option('display.width', 200)  # Adjust width for better formatting\n",
"print(df.head(25).to_string(index=False))\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "619059af-5f11-4d21-a122-dc13ef1381bc",
"metadata": {},
"outputs": [],
"source": [
"# Sentiment visualization (average positive, negative, and neutral scores)\n",
"plt.figure(figsize=(10, 5))\n",
"sentiment_means = df[[\"Positive\", \"Negative\", \"Neutral\"]].mean()\n",
"sentiment_means.plot(kind=\"bar\", color=[\"green\", \"red\", \"blue\"])\n",
"plt.title(\"Sentiment Analysis of Chat Messages\")\n",
"plt.ylabel(\"Average Sentiment Score\")\n",
"plt.xticks(rotation=0)\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:base] *",
"language": "python",
"name": "conda-base-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}