# Measuring Engagement and Satisfaction in Online Mental Health Platform Conversations

## Data preprocessing

In [261]:
import pandas as pd
import numpy as np

In [262]:
df = pd.read_csv("mentalhealthsupport_dyadic_convs_clean_emotion.csv")

In [263]:
df.head(25)

Unnamed: 0,conversation id,subreddit,post title,author,dialog turn,text,compound,sentiment,emotion prediction
0,1,MentalHealthSupport,"DBT cheat sheet, for when you need a quick rem...",missfittnc,1,[DBT Skils over view](http://www.dbtselfhelp.c...,0.0,neutral,anticipating
1,1,MentalHealthSupport,"DBT cheat sheet, for when you need a quick rem...",skbloom,2,Thanks for sharing!,0.7177,positive,grateful
2,2,MentalHealthSupport,[BPD] Difficulty in avoiding drama,I_Am_Jacks_BPD,1,*** **All names have been changed to protect t...,0.9654,positive,anticipating
3,2,MentalHealthSupport,[BPD] Difficulty in avoiding drama,redsaidfred,2,"The best advice someone ever told me was... ""I...",0.7717,positive,faithful
4,3,MentalHealthSupport,How are you today and what's your recovery pro...,professorplumdidit,1,I thought maybe we should just do a thread to ...,0.6486,positive,hopeful
5,3,MentalHealthSupport,How are you today and what's your recovery pro...,blazingbunny,2,My psychological health is probably better tha...,-0.9867,negative,anxious
6,4,MentalHealthSupport,Imagine you get to tell your younger self one ...,depression_anon,1,"""Younger self"" can mean whatever you want it t...",0.923,positive,anxious
7,4,MentalHealthSupport,Imagine you get to tell your younger self one ...,Cordux,2,"Accepting help means you are strong, not weak.",0.8752,positive,anxious
8,5,MentalHealthSupport,Resolving emptiness,gotja,1,I'm wondering if anyone has had success with t...,-0.3971,negative,anxious
9,5,MentalHealthSupport,Resolving emptiness,blazingbunny,2,I feed it. I look into it. There are endless u...,-0.0507,negative,sentimental


In [264]:
### --- CLEANING OUT MONOLOGUES FROM DATASET --- ###

# Count the number of turns in each conversation
turn_counts = df.groupby("conversation id")["dialog turn"].count()
print("Number of conversations in subreddit: ", len(turn_counts))

# Conversations with a single turn are monologues
df_mono_ids = turn_counts[turn_counts == 1].index
print("Number of monologues in subreddit: ", len(df_mono_ids))

# Conversations with more than one turn are dialogues
df_dia_ids = turn_counts[turn_counts > 1].index
print("Number of dialogues in subreddit: ", len(df_dia_ids))

# Keep only the rows that belong to dialogues (row order is preserved)
df = df[df["conversation id"].isin(df_dia_ids)]

### ---------------------------------------------- ###

Number of conversations in subreddit:  3551
Number of monologues in subreddit:  29
Number of dialogues in subreddit:  3522


## Measuring the level (low/moderate/high) of engagement

### Does the speaker respond back when the listener gives a response?

In [265]:
def calculate_speaker_listener_ratio(conversation):
    # Count the turns written by each author. groupby sorts authors
    # alphabetically, so row 0 is treated as the speaker and row 1 as the
    # listener regardless of who opened the conversation.
    author_counts = conversation.groupby("author").size()
    num_speaker_responses = author_counts.iloc[0]
    num_listener_responses = author_counts.iloc[1]
    speaker_listener_ratio = num_speaker_responses / num_listener_responses

    num_turns = len(conversation)
    if num_turns == 2:
        # One post and one reply: the speaker never responds back
        engagement = "low"
    elif num_turns <= 4:
        engagement = "low" if speaker_listener_ratio < 1 else "moderate"
    else:
        # A ratio of exactly 1 is counted as high here, mirroring the
        # >= 1 threshold used for short conversations
        engagement = "moderate" if speaker_listener_ratio < 1 else "high"

    return speaker_listener_ratio, engagement

In [266]:
conversation = df[df["conversation id"] == 7]
calculate_speaker_listener_ratio(conversation)

(0.5, 'low')
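
`calculate_speaker_listener_ratio` takes the first row of the grouped counts as the speaker, but `groupby("author")` orders authors alphabetically rather than by who started the conversation. The following is a minimal sketch of an alternative, assuming the speaker is always the author of the row with the lowest `dialog turn` (an assumption about the dataset, not something confirmed here). The helper name `speaker_listener_counts` and the toy data are hypothetical.

```python
import pandas as pd

def speaker_listener_counts(conversation):
    """Return (speaker_turns, listener_turns) for one dyadic conversation.

    Assumes the speaker is the author of the row with the lowest
    "dialog turn" and that every other row belongs to the listener.
    """
    first_row = conversation.sort_values("dialog turn").iloc[0]
    is_speaker = conversation["author"] == first_row["author"]
    return int(is_speaker.sum()), int((~is_speaker).sum())

# Toy conversation (made-up data): the speaker "zed" sorts after "amy",
# so alphabetical ordering would pick the wrong author as speaker.
toy = pd.DataFrame({
    "author": ["zed", "amy", "zed"],
    "dialog turn": [1, 2, 3],
})
speaker_turns, listener_turns = speaker_listener_counts(toy)
print(speaker_turns, listener_turns)
```

Dividing the two counts gives a speaker/listener ratio that does not depend on the usernames.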

### How the literature measures engagement

## Measuring the level (high/low) of satisfaction

### Lexical details: "Thank you", "It means a lot"
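
One possible way to detect such lexical cues is a small phrase lexicon matched against each turn. This is only a sketch: "thank you" and "it means a lot" come from the heading above, while "thanks" is an added assumption, and the names `GRATITUDE_PHRASES` and `has_gratitude_marker` are hypothetical.

```python
import re

# "thank you" and "it means a lot" come from the section heading;
# "thanks" is an illustrative assumption, not a vetted lexicon entry.
GRATITUDE_PHRASES = ["thank you", "it means a lot", "thanks"]

# Word boundaries avoid matching inside longer words such as "thanksgiving".
GRATITUDE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in GRATITUDE_PHRASES) + r")\b",
    flags=re.IGNORECASE,
)

def has_gratitude_marker(text):
    """Return True if the text contains any phrase from the lexicon."""
    return bool(GRATITUDE_PATTERN.search(str(text)))

print(has_gratitude_marker("Thanks for sharing!"))
print(has_gratitude_marker("I feed it. I look into it."))
```

Applied to the dataset, this could be run over the `text` column, for example with `df["text"].apply(has_gratitude_marker)`.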

### Shift of sentiment in speaker responses (emotion trend)
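
A simple way to approximate the emotion trend is to compare the `compound` score of the speaker's last turn with that of their first turn. The sketch below assumes the speaker is the author of the first turn and that a last-minus-first difference is an adequate summary of the trend; both are assumptions, and `speaker_sentiment_shift` is a hypothetical helper name.

```python
import pandas as pd

def speaker_sentiment_shift(conversation):
    """Return the last-minus-first compound score over the speaker's turns.

    Assumes the speaker is the author of the first turn. Returns None when
    the speaker has fewer than two turns, since no shift can be measured.
    """
    ordered = conversation.sort_values("dialog turn")
    speaker = ordered.iloc[0]["author"]
    scores = ordered.loc[ordered["author"] == speaker, "compound"]
    if len(scores) < 2:
        return None
    return float(scores.iloc[-1] - scores.iloc[0])

# Toy conversation (made-up scores)
toy = pd.DataFrame({
    "author": ["a", "b", "a"],
    "dialog turn": [1, 2, 3],
    "compound": [-0.5, 0.6, 0.25],
})
print(speaker_sentiment_shift(toy))
```

A positive value suggests the speaker's sentiment improved over the conversation, and a negative value suggests it worsened.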

### Whether the last speaker turn has a grateful emotion and positive sentiment
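
This check can be sketched directly on the `emotion prediction` and `sentiment` columns. The sketch assumes the speaker is the author of the first turn and treats a speaker who never replies as not satisfied; both choices are assumptions, and `last_speaker_turn_is_grateful` is a hypothetical helper name.

```python
import pandas as pd

def last_speaker_turn_is_grateful(conversation):
    """Return True if the speaker's final reply is grateful and positive.

    Assumes the speaker is the author of the first turn. Returns False if
    the speaker has only the opening turn, since there is no reply to check.
    """
    ordered = conversation.sort_values("dialog turn")
    speaker = ordered.iloc[0]["author"]
    speaker_turns = ordered[ordered["author"] == speaker]
    if len(speaker_turns) < 2:
        return False
    last_turn = speaker_turns.iloc[-1]
    return bool(
        last_turn["emotion prediction"] == "grateful"
        and last_turn["sentiment"] == "positive"
    )

# Toy conversation (made-up labels)
toy = pd.DataFrame({
    "author": ["a", "b", "a"],
    "dialog turn": [1, 2, 3],
    "sentiment": ["negative", "positive", "positive"],
    "emotion prediction": ["anxious", "faithful", "grateful"],
})
print(last_speaker_turn_is_grateful(toy))
```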

### How the literature measures satisfaction

## Equation to measure the engagement and satisfaction of a conversation