# Workbook for processing WhatsApp chats
Uses pandas for data pre-processing and Hugging Face transformers for some rudimentary sentiment analysis.

This workbook analyses the WhatsApp group chat set up in response to the changes to Brighton's school catchment areas. The data was downloaded on 16/10/24.

In [20]:
import numpy as np
import sklearn
import regex
import pandas as pd
import emoji

from collections import Counter
import matplotlib.pyplot as plt

from transformers import BertTokenizerFast, pipeline

import whatsapp_processing_functions as wpf

Load the .txt file exported from the WhatsApp chat.

In [21]:
conversation = '/Users/bea/Documents/AI4CI/projects/brighton_schools/whatsapp_data/WhatsApp Chat with School catchment area updates/WhatsApp Chat with School catchment area updates.txt'

In [22]:
df = wpf.whatsapptxt_to_df(conversation)
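
The `wpf.whatsapptxt_to_df` helper lives in a separate module that is not shown here. As a rough illustration only, a parser of this kind might look like the sketch below. It assumes each message starts with `dd/mm/yyyy, hh:mm - Author: Message`, treats lines without an `Author: ` prefix as system messages, and appends lines that do not start with a timestamp to the previous message. The name `parse_whatsapp_lines` is hypothetical, and the real helper may work differently (for instance, it also produces an `emoji` column).

```python
import re
from datetime import datetime

# Assumed export format: "10/10/2024, 17:07 - Author: Message"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - (.*)$")


def parse_whatsapp_lines(lines):
    """Turn raw export lines into a list of dicts, one per message."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match is None:
            # No timestamp: treat as a continuation of the previous message.
            if records:
                records[-1]["Message"] += "\n" + line
            continue
        date_str, time_str, rest = match.groups()
        # Heuristic: the text before the first ": " is the author. A system
        # message that happens to contain ": " would be misread here.
        author, sep, message = rest.partition(": ")
        if not sep:
            author, message = None, rest  # system message, no author
        records.append({
            "Date": datetime.strptime(date_str, "%d/%m/%Y").date(),
            "Time": datetime.strptime(time_str, "%H:%M").time(),
            "Author": author,
            "Message": message,
        })
    return records
```

The resulting list can be passed straight to `pd.DataFrame(...)` to get `Date`, `Time`, `Author` and `Message` columns.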

In [23]:
df.tail()

Unnamed: 0,Date,Time,Author,Message,emoji
1464,2024-10-16,10:44:00,+44 7950 703968,"Yes, and that’s what the new Longhill head ha...",[]
1465,2024-10-16,10:44:00,+44 7824 353019,We should name this group Catchment gate.,[]
1466,2024-10-16,10:48:00,+44 7835 412850,I have heard (from hearsay) she comes with a ...,[]
1467,2024-10-16,10:51:00,+44 7950 703968,"Agreed, the underlying issues (like housing) ...",[]
1468,2024-10-16,10:53:00,+44 7909 524938,*What’s the Catch*ment <This message was edited>,[]


In [24]:
# Three-class sentiment model; truncation=True clips messages that exceed the model's input limit.
classifier = pipeline(
    "text-classification",
    model="j-hartmann/sentiment-roberta-large-english-3-classes",
    top_k=1,
    truncation=True,
)
sentiment = classifier(list(df['Message']))
# With top_k=1 each message yields a one-element list, so flatten to one dict per message.
sentiment = [x for xs in sentiment for x in xs]
sentiment_df = pd.DataFrame(sentiment)
df = pd.concat([df, sentiment_df], axis=1)

Some weights of the model checkpoint at j-hartmann/sentiment-roberta-large-english-3-classes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
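
Because the cell above appends new columns with `pd.concat`, running it more than once adds a second pair of `label` and `score` columns. The duplicated `label`/`label.1` and `score`/`score.1` headers in the table further down look consistent with that, although this is a guess. One possible way to make the step safe to re-run is to pull the labels and scores out as plain lists and assign them by column name. The helper name `flatten_top1` below is hypothetical, and the sketch assumes the nested list-of-lists output that the flattening step above implies.

```python
def flatten_top1(results):
    """Split top_k=1 pipeline output into parallel label and score lists.

    `results` is assumed to look like
    [[{"label": "neutral", "score": 0.99}], [{"label": "negative", "score": 0.75}], ...]
    """
    labels, scores = [], []
    for per_message in results:
        best = per_message[0]  # the only entry when top_k=1
        labels.append(best["label"])
        scores.append(best["score"])
    return labels, scores
```

With this, `df["label"], df["score"] = flatten_top1(sentiment)` overwrites the same two columns on each run instead of adding new ones.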


In [41]:
df.tail()

Unnamed: 0,Date,Time,Author,Message,emoji,0,label,score,0.1,label.1,score.1
0,2024-10-10,16:22:00,,"~ Ruth created group ""School catchment area u...",[],"{'label': 'neutral', 'score': 0.9986624717712402}",neutral,0.998662,"{'label': 'neutral', 'score': 0.9986624717712402}",neutral,0.998662
1,2024-10-10,17:06:00,,You joined using this group's invite link,[],"{'label': 'neutral', 'score': 0.9987905621528625}",,,"{'label': 'neutral', 'score': 0.9987905621528625}",neutral,0.998791
2,2024-10-10,17:07:00,,+61 403 883 959 joined using this group's inv...,[],"{'label': 'neutral', 'score': 0.9989998936653137}",,,"{'label': 'neutral', 'score': 0.9989998936653137}",neutral,0.999000
3,2024-10-10,17:07:00,,Emma Welsh joined using this group's invite link,[],"{'label': 'neutral', 'score': 0.9986116886138916}",,,"{'label': 'neutral', 'score': 0.9986116886138916}",neutral,0.998612
4,2024-10-10,17:07:00,Adam Dennett,Ah glad there's a group hopefully deliverin...,[],"{'label': 'neutral', 'score': 0.9964105486869812}",,,"{'label': 'neutral', 'score': 0.9964105486869812}",neutral,0.996411
...,...,...,...,...,...,...,...,...,...,...,...
1464,2024-10-16,10:44:00,+44 7950 703968,"Yes, and that’s what the new Longhill head ha...",[],"{'label': 'negative', 'score': 0.9970899820327...",,,"{'label': 'negative', 'score': 0.9970899820327...",negative,0.997090
1465,2024-10-16,10:44:00,+44 7824 353019,We should name this group Catchment gate.,[],"{'label': 'neutral', 'score': 0.9989375472068787}",,,"{'label': 'neutral', 'score': 0.9989375472068787}",neutral,0.998938
1466,2024-10-16,10:48:00,+44 7835 412850,I have heard (from hearsay) she comes with a ...,[],"{'label': 'positive', 'score': 0.9958052635192...",,,"{'label': 'positive', 'score': 0.9958052635192...",positive,0.995805
1467,2024-10-16,10:51:00,+44 7950 703968,"Agreed, the underlying issues (like housing) ...",[],"{'label': 'neutral', 'score': 0.7668870091438293}",,,"{'label': 'neutral', 'score': 0.7668870091438293}",neutral,0.766887
