## WhatsApp Chat Sentiment Analysis

Let’s start the task of WhatsApp chat sentiment analysis with Python. The chat export from WhatsApp is plain text, not a dataset that is ready for a data science task, so I’ll begin by defining some helper functions that turn each line into a date, a time, an author, and a message. To prepare your data for sentiment analysis, define the functions below:

In [177]:
import re
import pandas as pd
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

In [178]:
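# Check whether a line starts with a date-time stamp
def date_time(s):
    # Accepts both "15/02/21, 10:51 PM - " and "[15/02/21, 10:51:49 PM] " prefixes
    pattern = r'^\[?([0-9]+)/([0-9]+)/([0-9]+), ([0-9]+):([0-9]+)(:[0-9]+)?[ ]?(AM|PM|am|pm)?(\]| -)'
    return bool(re.match(pattern, s))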

In [179]:
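def find_author(s):
    # A message has an author when an "Author: text" separator is present.
    # Splitting only once keeps colons inside the message from hiding the author.
    return len(s.split(": ", 1)) == 2

# Finding Messages
def getDatapoint(line):
    # Expected layout: "[date, time] author: message"
    splitline = line.split('] ', 1)
    dateTime = splitline[0]
    date, time = dateTime.split(", ")  # the date keeps its leading "["
    message = " ".join(splitline[1:])
    if find_author(message):
        # Split on the first ": " only, so the message text stays intact
        author, message = message.split(": ", 1)
    else:
        author = None
    return date, time, author, message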
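
As an alternative to chained `split` calls, a single regular expression can pull all four fields out of a line at once. The sketch below is a minimal illustration, assuming the bracketed `[date, time] author: message` layout seen in this export; `LINE_PATTERN` and `parse_line` are hypothetical names that are not part of the code above, and the second sample line is invented to show a message that contains colons.

```python
import re

# Assumed layout: "[15/02/21, 10:51:49 PM] Author: message"
LINE_PATTERN = re.compile(
    r'^\[(?P<date>[^,\]]+), (?P<time>[^\]]+)\] '
    r'(?:(?P<author>[^:]+): )?(?P<message>.*)$'
)

def parse_line(line):
    """Return (date, time, author, message), or None if the line has no stamp."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    # author is None when the text after the stamp has no "Author: " prefix
    return (match["date"], match["time"], match["author"], match["message"])

print(parse_line("[15/02/21, 10:52:42 PM] Deeksha Kotian: How are u?"))
print(parse_line("[15/02/21, 10:53:16 PM] Shruthi: Meet at 10:30: ok?"))
print(parse_line("a continuation line without a stamp"))
```

Here the date comes back without the leading `[`, and a line that does not start with a stamp yields `None` rather than raising an error.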

In [180]:
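data = []
conversation = 'C:/Users/Deeksha Kotian/Documents/Data_Science/Whatsapp_sentiment_analysis/_chat.txt'
with open(conversation, encoding="utf-8") as fp:
    fp.readline()  # skip the first line of the export
    messageBuffer = []
    date, time, author = None, None, None
    while True:
        line = fp.readline()
        print(line)  # echo each raw line for inspection
        if not line:
            break
        line = line.strip()
        # The date_time(line) check is disabled, so every line is treated as
        # the start of a new message: store the previous one, then parse this one.
        # The first pass stores an empty placeholder row, removed later by dropna().
        data.append([date, time, author, ' '.join(messageBuffer)])
        messageBuffer.clear()
        date, time, author, message = getDatapoint(line)
        messageBuffer.append(message)
    # Store the final message, which the loop itself never appends
    if messageBuffer:
        data.append([date, time, author, ' '.join(messageBuffer)])

data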

[15/02/21, 10:51:49 PM] Deeksha Kotian: Hey

[15/02/21, 10:52:19 PM] Shruthi: Hi

[15/02/21, 10:52:42 PM] Deeksha Kotian: How are u?

[15/02/21, 10:53:16 PM] Shruthi: Never mind

[15/02/21, 10:53:20 PM] Shruthi: How are you patutie?

[15/02/21, 10:53:53 PM] Deeksha Kotian: Fine

[15/02/21, 10:54:18 PM] Shruthi: What’s happening girl?

[15/02/21, 10:54:20 PM] Shruthi: Missed me?

[15/02/21, 10:54:40 PM] Deeksha Kotian: My gran is sick

[15/02/21, 10:54:51 PM] Deeksha Kotian: She is hospitalised

[15/02/21, 10:55:37 PM] Deeksha Kotian: I am feeling emotional right now

[15/02/21, 10:58:10 PM] Shruthi: Come here baby

[15/02/21, 10:58:13 PM] Shruthi: You need a tight hug

[15/02/21, 10:58:17 PM] Shruthi: Come here right now

[15/02/21, 10:58:23 PM] Shruthi: I’ll make you feel better

[15/02/21, 10:58:28 PM] Shruthi: I’ll right with you sweetie

[15/02/21, 10:58:34 PM] Shruthi: Don’t worry she will comeback home soon

[15/02/21, 10:58:45 PM] Shruthi: She is totally fine

[15/02/21, 10:59:2

[[None, None, None, ''],
 ['[15/02/21', '10:51:49 PM', 'Deeksha Kotian', 'Hey'],
 ['[15/02/21', '10:52:19 PM', 'Shruthi', 'Hi'],
 ['[15/02/21', '10:52:42 PM', 'Deeksha Kotian', 'How are u?'],
 ['[15/02/21', '10:53:16 PM', 'Shruthi', 'Never mind'],
 ['[15/02/21', '10:53:20 PM', 'Shruthi', 'How are you patutie?'],
 ['[15/02/21', '10:53:53 PM', 'Deeksha Kotian', 'Fine'],
 ['[15/02/21', '10:54:18 PM', 'Shruthi', 'What’s happening girl?'],
 ['[15/02/21', '10:54:20 PM', 'Shruthi', 'Missed me?'],
 ['[15/02/21', '10:54:40 PM', 'Deeksha Kotian', 'My gran is sick'],
 ['[15/02/21', '10:54:51 PM', 'Deeksha Kotian', 'She is hospitalised'],
 ['[15/02/21',
  '10:55:37 PM',
  'Deeksha Kotian',
  'I am feeling emotional right now'],
 ['[15/02/21', '10:58:10 PM', 'Shruthi', 'Come here baby'],
 ['[15/02/21', '10:58:13 PM', 'Shruthi', 'You need a tight hug'],
 ['[15/02/21', '10:58:17 PM', 'Shruthi', 'Come here right now'],
 ['[15/02/21', '10:58:23 PM', 'Shruthi', 'I’ll make you feel better'],
 ['[15/02/21
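
The loop above treats every line as a new message, so a message that spans several lines is not handled. One way to cope, sketched here under the assumption that each new message begins with a bracketed stamp, is to append unstamped lines to the message before them. `STAMP` and `group_messages` are hypothetical names, and the continuation line in the sample is invented for illustration.

```python
import re

# Assumed start-of-message marker: "[dd/mm/yy, hh:mm:ss AM] "
STAMP = re.compile(r'^\[\d+/\d+/\d+, \d+:\d+(?::\d+)?(?: ?[AaPp][Mm])?\] ')

def group_messages(lines):
    """Merge continuation lines into the stamped line that precedes them."""
    messages = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if STAMP.match(line):
            messages.append(line)        # a new message starts here
        elif messages:
            messages[-1] += " " + line   # continuation of the previous message
        # unstamped lines before the first message are ignored
    return messages

sample = [
    "[15/02/21, 10:58:10 PM] Shruthi: Come here baby\n",
    "and bring snacks\n",  # invented continuation line
    "\n",
    "[15/02/21, 10:58:13 PM] Shruthi: You need a tight hug\n",
]
for message in group_messages(sample):
    print(message)
```

Each merged string can then be split into date, time, author, and message in the same way as a single-line message.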

In [183]:
df = pd.DataFrame(data, columns=["Date", 'Time', 'Author', 'Message'])
#df['Date'] = pd.to_datetime(df['Date'])

# Drop rows with a missing date, time, or author. copy() makes an independent
# DataFrame, so adding columns does not raise SettingWithCopyWarning.
data = df.dropna().copy()
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# VADER needs its lexicon; if it is missing, run nltk.download('vader_lexicon') once
sentiments = SentimentIntensityAnalyzer()
data["Positive"] = [sentiments.polarity_scores(i)["pos"] for i in data["Message"]]
data["Negative"] = [sentiments.polarity_scores(i)["neg"] for i in data["Message"]]
data["Neutral"] = [sentiments.polarity_scores(i)["neu"] for i in data["Message"]]
print(data.head())

        Date         Time          Author               Message  Positive  \
1  [15/02/21  10:51:49 PM  Deeksha Kotian                   Hey       0.0   
2  [15/02/21  10:52:19 PM         Shruthi                    Hi       0.0   
3  [15/02/21  10:52:42 PM  Deeksha Kotian            How are u?       0.0   
4  [15/02/21  10:53:16 PM         Shruthi            Never mind       0.0   
5  [15/02/21  10:53:20 PM         Shruthi  How are you patutie?       0.0   

   Negative  Neutral  
1       0.0      1.0  
2       0.0      1.0  
3       0.0      1.0  
4       0.0      1.0  
5       0.0      1.0  
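
The `pd.to_datetime` call is commented out in the cell above, and the `Date` strings still carry a leading `[`. The following is a minimal sketch of one way to turn the two text fields into a single timestamp with the standard library, assuming day-first dates and a 12-hour clock as in this export; `to_timestamp` is a hypothetical helper, not part of the original notebook.

```python
from datetime import datetime

def to_timestamp(date_text, time_text):
    """Combine '[15/02/21' and '10:51:49 PM' into a datetime object."""
    cleaned = date_text.lstrip("[")  # drop the bracket left over from the export
    # Assumed format: day/month/two-digit year, 12-hour time with AM/PM
    return datetime.strptime(f"{cleaned} {time_text}", "%d/%m/%y %I:%M:%S %p")

stamp = to_timestamp("[15/02/21", "10:51:49 PM")
print(stamp.isoformat())  # 2021-02-15T22:51:49
```

With pandas, the same format string could be passed to `pd.to_datetime` through its `format` parameter once the bracket has been removed.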

In [189]:
data['Positive'].value_counts()

0.000    321
1.000     26
0.688      6
0.524      5
0.508      5
        ... 
0.328      1
0.643      1
0.491      1
0.588      1
0.452      1
Name: Positive, Length: 100, dtype: int64

In [184]:
# Total each score over all messages
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

# Report whichever total is strictly the largest; ties fall through to Neutral
def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")
sentiment_score(x, y, z)

Neutral 🙂 
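
The comparison can also be written so that it returns a label instead of printing it, which makes the result easier to reuse. This is a sketch with made-up totals, not the figures from the chat above, and `overall_sentiment` is a hypothetical name.

```python
def overall_sentiment(positive, negative, neutral):
    """Return the label whose total is strictly largest, defaulting to Neutral."""
    if positive > negative and positive > neutral:
        return "Positive"
    if negative > positive and negative > neutral:
        return "Negative"
    return "Neutral"

# Illustrative totals only
print(overall_sentiment(12.5, 3.0, 40.2))  # Neutral
print(overall_sentiment(9.0, 2.0, 4.0))    # Positive
print(overall_sentiment(5.0, 5.0, 1.0))    # Neutral (tie between the top two)
```

In this chat the first five messages all receive a neutral score of 1.0, so it is not surprising that the neutral total comes out largest.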
