WhatsApp chats are a rich source of data for analyzing patterns and relationships between people, whether they chat one-on-one or in groups.

To analyze the sentiments of a WhatsApp chat, we first need to export the chat from WhatsApp. To export one of your own chats, follow the steps below:

### For iPhone:
1. Open the chat with a person or a group
2. Tap the profile of the person or the group
3. Scroll down and tap the option to export the chat

### For Android:
1. Open the chat with a person or a group
2. Tap the three dots at the top
3. Tap More
4. Tap Export chat

While exporting, we will see an option to attach media. For simplicity, it is best not to attach media. Finally, enter your email address, and the exported chat will arrive in your inbox.

### WhatsApp Chat Sentiment Analysis using Python

Now let’s start with WhatsApp chat sentiment analysis. The file we get from WhatsApp is raw text, not a dataset that is ready for a data science task, so we’ll begin by defining a few helper functions that turn each line of the export into a date, a time, an author and a message:

In [24]:
import re
import pandas as pd
import numpy as np
import emoji
from collections import Counter
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

In [25]:
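# Check whether a line starts with a "date, time -" prefix (a new message)
def date_time(s):
    # Raw string, so the backslashes reach the regex engine unchanged
    pattern = r'^([0-9]+)(\/)([0-9]+)(\/)([0-9]+), ([0-9]+):([0-9]+)[ ]?(AM|PM|am|pm)? -'
    return re.match(pattern, s) is not None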
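To see what the pattern accepts, here is a minimal, self-contained sketch. It uses an equivalent pattern without the extra capture groups, and the helper name `starts_new_message` and the sample lines are made up for illustration; they assume the `dd/mm/yyyy, h:mm am/pm - ` layout shown in the output further down.

```python
import re

# Equivalent to the pattern in date_time(), without the unused capture groups
PATTERN = r'^[0-9]+/[0-9]+/[0-9]+, [0-9]+:[0-9]+[ ]?(AM|PM|am|pm)? -'

def starts_new_message(line):
    """Return True if the line begins with a 'date, time -' prefix."""
    return re.match(PATTERN, line) is not None

# Hypothetical sample lines, not taken from a real export
samples = [
    "23/09/2021, 10:46 pm - Alice: Hello there",  # starts a new message
    "and this line continues the message above",  # continuation line
    "24/09/2021, 14:05 - Bob: 24-hour clock",     # new message without am/pm
]
results = [starts_new_message(s) for s in samples]
print(results)
```

Lines for which the check fails are treated as continuations of the previous message when the file is read.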

In [26]:
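# Find Authors or Contacts
def find_author(s):
    # A line such as "Name: text" contains exactly one colon.
    # Note: a message that itself contains a colon (e.g. a URL) is not
    # recognised and ends up without an author.
    return len(s.split(":")) == 2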

In [27]:
# Split one line into date, time, author and message
def getDatapoint(line):
    # Split only on the first ' - ' so dashes inside the message survive
    dateTime, _, message = line.partition(' - ')
    date, time = dateTime.split(", ")
    author = None
    if find_author(message):
        # Everything before the first ': ' is the author or contact name
        head, sep, rest = message.partition(": ")
        if sep:
            author, message = head, rest
    return date, time, author, message
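As an alternative to the split-based helpers above, the same four fields could be pulled out with one regular expression. This is only a sketch, not the method used in the rest of this article: `LINE_RE` and `parse_line` are hypothetical names, and the pattern assumes the same `date, time - author: message` layout. Unlike `find_author`, it keeps the author when the message text itself contains a colon.

```python
import re

# One pattern with named groups; the author part is optional
LINE_RE = re.compile(
    r'^(?P<date>\d+/\d+/\d+), (?P<time>\d+:\d+(?: ?[AaPp][Mm])?) - '
    r'(?:(?P<author>[^:]+): )?(?P<message>.*)$'
)

def parse_line(line):
    """Return (date, time, author, message), or None for continuation lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group('date'), m.group('time'), m.group('author'), m.group('message')

# Hypothetical sample lines
with_author = parse_line("23/09/2021, 10:46 pm - Alice: see https://example.com: ok")
no_author = parse_line("23/09/2021, 10:40 pm - Alice added Bob")
print(with_author)
print(no_author)
```

One limitation of this sketch: a line without an author whose text contains `": "` would have the part before it read as an author name.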

These functions work the same way whether the export comes from a group chat or from a conversation with one person, and the resulting rows can be used for sentiment analysis or for any other data science task. Now here is how we can prepare the data we collected from WhatsApp by using the above functions:

In [28]:
data = []
conversation = 'WhatsApp Chat with CMA Think Tank.txt'
with open(conversation, encoding="utf-8") as fp:
    fp.readline()  # skip the first line of the export
    messageBuffer = []
    date, time, author = None, None, None
    for line in fp:
        line = line.strip()
        if date_time(line):
            # A new message starts: store the previous one first
            if len(messageBuffer) > 0:
                data.append([date, time, author, ' '.join(messageBuffer)])
            messageBuffer.clear()
            date, time, author, message = getDatapoint(line)
            messageBuffer.append(message)
        else:
            # Continuation of a multi-line message
            messageBuffer.append(line)
    # Store the last message, which no later line would otherwise flush
    if len(messageBuffer) > 0:
        data.append([date, time, author, ' '.join(messageBuffer)])
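The buffering idea in the loop above can be tried without a file. The following sketch uses a hypothetical generator, `group_messages`, and made-up lines; it joins continuation lines onto the message they belong to and also emits the final message once the input ends. Unlike the loop above, it ignores any lines that appear before the first dated line.

```python
import re

# Same "date, time -" prefix check as in date_time(), compiled once
NEW_MESSAGE = re.compile(r'^\d+/\d+/\d+, \d+:\d+ ?(?:AM|PM|am|pm)? -')

def group_messages(lines):
    """Yield one string per message, joining continuation lines with spaces."""
    buffer = []
    for raw in lines:
        line = raw.strip()
        if NEW_MESSAGE.match(line):
            if buffer:
                yield ' '.join(buffer)   # previous message is complete
            buffer = [line]
        elif buffer:
            buffer.append(line)          # continuation of the current message
    if buffer:
        yield ' '.join(buffer)           # do not lose the last message

# Hypothetical sample lines
lines = [
    "23/09/2021, 10:46 pm - Alice: first line\n",
    "second line\n",
    "23/09/2021, 10:50 pm - Bob: ok\n",
]
messages = list(group_messages(lines))
print(messages)
```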

In [29]:
data

[['23/09/2021',
  '10:46 pm',
  'Chairman KBC Azeem',
  'Dear Asad, InshaAllah will discuss in the morning.'],
 ['23/09/2021',
  '10:49 pm',
  'Chairman KBC Azeem',
  'Yes, it is an apparent case of IoS. Nevertheless, as I wrote, a case may be made on technical ground to reflect it as gift and at the same time treating it as non-taxable. Numerous case laws..!'],
 ['23/09/2021', '10:50 pm', 'Chairman KBC Azeem', "That's right."],
 ['24/09/2021', '12:32 am', '+92 332 1319773', 'No'],
 ['24/09/2021',
  '2:25 am',
  '+92 300 3475887',
  'Is Advance tax on function and gathering u/s 236 D, adjustable? If yes then where we can record it as no option is available in Adjustable tax tab.'],
 ['24/09/2021', '6:01 am', '+92 300 2358250', '👍'],
 ['24/09/2021',
  '11:18 am',
  '+92 300 3592724',
  "Yes case laws exist such such atir's on cash gifts as well as indian cases that even companies can give gift to its employees but couldnt find case law on 39 (3) ntn condition"],
 ['24/09/2021',
  '11:21

Now here is how we can analyze the sentiments of WhatsApp chat using Python:

In [30]:
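df = pd.DataFrame(data, columns=["Date", 'Time', 'Author', 'Message'])
# Dates in this export are day-first (e.g. 23/09/2021)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Drop rows without an author; copy so new columns can be added safely
data = df.dropna().copy()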

In [31]:
data

| | Date | Time | Author | Message |
|---|---|---|---|---|
| 0 | 2021-09-23 | 10:46 pm | Chairman KBC Azeem | Dear Asad, InshaAllah will discuss in the morn... |
| 1 | 2021-09-23 | 10:49 pm | Chairman KBC Azeem | Yes, it is an apparent case of IoS. Neverthele... |
| 2 | 2021-09-23 | 10:50 pm | Chairman KBC Azeem | That's right. |
| 3 | 2021-09-24 | 12:32 am | +92 332 1319773 | No |
| 4 | 2021-09-24 | 2:25 am | +92 300 3475887 | Is Advance tax on function and gathering u/s 2... |
| 5 | 2021-09-24 | 6:01 am | +92 300 2358250 | 👍 |
| 6 | 2021-09-24 | 11:18 am | +92 300 3592724 | Yes case laws exist such such atir's on cash g... |
| 7 | 2021-09-24 | 11:21 am | +92 333 2180919 | پڑوسی ممالک کو افغان طالبان کو ڈکٹیشن کے بجائے... |
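Dates such as `05/09/2021` are ambiguous unless the day/month order is stated explicitly. As a small standard-library sketch, the `Date` and `Time` strings could also be combined into one timestamp with an explicit format. The helper name `to_timestamp` is hypothetical, and the format string assumes the day-first, 12-hour layout seen in the table above and an English locale for the am/pm marker.

```python
from datetime import datetime

def to_timestamp(date_str, time_str):
    """Combine 'dd/mm/yyyy' and 'h:mm am/pm' strings into one datetime."""
    # Upper-case the am/pm marker so it matches %p in the default locale
    return datetime.strptime(f"{date_str} {time_str.upper()}", "%d/%m/%Y %I:%M %p")

after_midnight = to_timestamp("24/09/2021", "12:32 am")
ambiguous_date = to_timestamp("05/09/2021", "2:25 pm")  # 5 September, not 9 May
print(after_midnight)
print(ambiguous_date)
```

A single timestamp column like this makes it easier to sort messages or group them by hour or day.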


In [32]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sentiments = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Waqas.Ali\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [33]:
# Score each message once and reuse the result for all three columns
scores = [sentiments.polarity_scores(i) for i in data["Message"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]

In [34]:
data.head()

| | Date | Time | Author | Message | Positive | Negative | Neutral |
|---|---|---|---|---|---|---|---|
| 0 | 2021-09-23 | 10:46 pm | Chairman KBC Azeem | Dear Asad, InshaAllah will discuss in the morn... | 0.271 | 0.0 | 0.729 |
| 1 | 2021-09-23 | 10:49 pm | Chairman KBC Azeem | Yes, it is an apparent case of IoS. Neverthele... | 0.151 | 0.0 | 0.849 |
| 2 | 2021-09-23 | 10:50 pm | Chairman KBC Azeem | That's right. | 0.0 | 0.0 | 1.0 |
| 3 | 2021-09-24 | 12:32 am | +92 332 1319773 | No | 0.0 | 1.0 | 0.0 |
| 4 | 2021-09-24 | 2:25 am | +92 300 3475887 | Is Advance tax on function and gathering u/s 2... | 0.087 | 0.071 | 0.841 |
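Besides `pos`, `neg` and `neu`, the dictionary returned by `polarity_scores` also has a `compound` key, a single normalized score between -1 and 1. As a sketch of how each message could get its own label, the hypothetical helper below maps a compound score to a label using the commonly suggested cut-offs of ±0.05. The example scores are made up, and this labelling step is not part of the analysis in this article.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# In the notebook, this could be applied along the lines of:
# data["Label"] = [label_from_compound(sentiments.polarity_scores(m)["compound"])
#                  for m in data["Message"]]

# Hypothetical compound scores, for illustration only
examples = [0.6, -0.3, 0.0]
labels = [label_from_compound(c) for c in examples]
print(labels)
```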


In [35]:
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

In [36]:
def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")

In [37]:
sentiment_score(x, y, z)

Neutral 🙂 


So, for the chat we used, the summed neutral score is larger than both the positive and the negative totals, which means the conversation as a whole is mostly neutral: neither clearly positive nor clearly negative.
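A label alone hides how close the three totals were. As a minimal sketch, the comparison can return the label together with each category's share of the total instead of printing it. The function name `overall_sentiment` and the input totals below are hypothetical; they are not the sums computed from this chat.

```python
def overall_sentiment(pos, neg, neu):
    """Return the dominant label and each category's share of the total."""
    total = pos + neg + neu
    if total == 0:
        return "Neutral", {"Positive": 0.0, "Negative": 0.0, "Neutral": 0.0}
    shares = {
        "Positive": pos / total,
        "Negative": neg / total,
        "Neutral": neu / total,
    }
    # Same rule as sentiment_score(): ties fall back to Neutral
    if pos > neg and pos > neu:
        label = "Positive"
    elif neg > pos and neg > neu:
        label = "Negative"
    else:
        label = "Neutral"
    return label, shares

# Hypothetical totals, for illustration only
label, shares = overall_sentiment(12.5, 4.0, 83.5)
print(label, shares)
```

Returning the values rather than printing them also makes the result easy to reuse, for example when comparing several chats.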