In [1]:
from IPython.display import Image, display; display(Image(url="https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.sprinklr.com%2Fblog%2Fchatbot-examples%2F&psig=AOvVaw3GjLwPVFaNAUG6e4xKJYH2&ust=1705391165437000&source=images&cd=vfe&opi=89978449&ved=0CBMQjRxqFwoTCJDLi8yZ34MDFQAAAAAdAAAAABAI"))



## <div style="color:white;display:fill;border-radius:8px;background-color:#800080;font-size:150%; letter-spacing:1.0px"><p style="padding: 15px;color:white;"><b><b><span style='color:white'><span style='color:#F1A424'> | </span> </span></b>Defining the Question</b></p></div>

## <b><span style='color:#F1A424'>|</span> Executive Summary:</b> 

**Mental health, fundamentally a state of well-being, is crucial for individuals to realize their abilities, manage life's normal stresses, work productively, and contribute to their communities. Despite the rising global prevalence of mental health issues, including a 13% increase over the last decade noted by the WHO, access to effective treatments remains uneven, particularly among urban youths who face distinct challenges and stressors.
 Saidika, a burgeoning mental health service provider for urban youth, has encountered challenges due to the growing demand for mental health services. The volume of clients has impeded the prompt allocation of therapy resources, particularly for urgent cases, prompting the need for innovative solutions to enhance the efficiency and effectiveness of mental health care delivery. By leveraging the capabilities of AI and advancements in NLP, the project aims to bridge the gap between the growing demand for mental health services and the current limitations in supply and accessibility.**


## <b><span style='color:#F1A424'>|</span> Problem Statement:</b> 

**Saidika's platform is currently unable to efficiently handle the increasing influx of clients seeking mental health services. The inability to quickly triage and prioritize client needs is leading to potential delays in addressing urgent cases, which could have severe consequences on the well-being of individuals in need.**

## <b><span style='color:#F1A424'>|</span> Proposed Solution:</b> 

**The main objective is to integrate an advanced AI-powered mental health chatbot into Saidika's existing platform to optimize client management processes, ensuring timely and appropriate allocation of therapy resources to those in need.**


## <b><span style='color:#F1A424'>|</span> Specific Objectives:</b> 
- **Client Categorization: To develop a chatbot that can accurately categorize clients based on their responses, distinguishing between varying levels of care requirements and scheduling clients based on their assessed needs and therapists' availability, optimizing the use of Saidika's resources.**
- **Urgency Escalation: To ensure the chatbot is capable of rapidly identifying and escalating urgent cases to therapists, facilitating prompt intervention.**
- **Service Accessibility: To broaden access to mental health care by providing a 24/7 chatbot service that will offer real-time interaction to clients who require immediate attention or a platform to express their concerns, bridging the gap until a professional is available.**
- **Resource Optimization: To aid therapists in managing their workload more effectively by allowing the chatbot to handle routine inquiries and non-urgent interactions.**
- **Data Collection and Analysis: To gather and analyze interaction data to continually improve the chatbot’s performance and the platform’s services.**
- **User Experience Enhancement: To create a user-friendly chatbot interface that provides a supportive environment for clients to express their concerns.**
- **Integration and Compatibility: To seamlessly integrate the chatbot into both web and mobile applications, ensuring functionality across various devices.**


## <b><span style='color:#F1A424'>|</span> Project Impact:</b> 

**The successful implementation of the mental health chatbot is expected to significantly improve the scalability of Saidika's services, enabling them to handle a greater volume of clients without sacrificing the quality of care. This technological solution aims to not only streamline operations but also to provide a critical early support system for individuals seeking mental health assistance. The chatbot's ability to analyze data will also furnish Saidika with valuable insights, driving policy and decision-making to better serve the community's mental health needs. Ultimately, the project endeavors to foster a more resilient urban youth population, better equipped to contribute positively to their communities.**

## DATA PERTINENCE AND ATTRIBUTION


**The business aims to gain valuable insights into mental health trends, sentiments, and urgency levels by leveraging a diverse dataset acquired from public domain resources and Saidika's private, anonymized user data with proper consent and privacy law adherence. The data primarily consists of information gathered from health forums, Reddit, a dedicated mental health forum, and Beyond Blue.**

**Data Preparation:**

**Data Sources: Public domain resources and private Saidika user data.**

**Variable Types:**

- **Categorical variables: Representing various types of mental health issues.**

- **Binary variables: Indicating urgency levels.**
- **Continuous variables: Expressing sentiment scores associated with mental health discussions.**

**Preprocessing Steps:**

- **Text data cleaning: Removal of identifiable information.**

- **Tokenization: Breaking down text into tokens.**

- **Lemmatization: Reducing words to their base or root form.**

- **Vectorization: Converting text into numerical vectors suitable for Natural Language Processing (NLP) tasks.**
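
The preprocessing steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's pipeline: the regex patterns, the `[EMAIL]` and `[USER]` placeholders, and the helper names (`redact`, `tokenize`, `vectorize`) are assumptions made for the example. Lemmatization is left out because it needs a trained lemmatizer such as those in NLTK or spaCy.

```python
import re
from collections import Counter

# Hypothetical patterns for two kinds of identifiable information.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HANDLE_RE = re.compile(r"(?<!\w)(?:u/|@)\w+")

def redact(text):
    """Replace e-mail addresses and user handles with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return HANDLE_RE.sub("[USER]", text)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def vectorize(tokens, vocabulary):
    """Turn tokens into a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

post = "Thanks u/helper42, mail me at jane@example.com. I feel anxious, so anxious."
clean = redact(post)
tokens = tokenize(clean)
vector = vectorize(tokens, ["anxious", "feel", "sleep"])

print(clean)   # Thanks [USER], mail me at [EMAIL]. I feel anxious, so anxious.
print(vector)  # [2, 1, 0]
```

Redaction runs before tokenization so that identifiers never reach the token list or the vectors built from it.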

**Libraries Used:**

- **BeautifulSoup: Utilized for parsing and extracting data from HTML content.**

- **Python Libraries (NLTK, spaCy): Applied for NLP tasks such as tokenization, lemmatization, and other text processing operations.**

**Algorithms:**

- **Logistic Regression: Employed for analyzing categorical and binary variables, predicting urgency levels based on mental health issues.**

- **LSTM (Long Short-Term Memory): Utilized for sequence modeling in NLP, capturing dependencies in sentiment scores over the course of discussions.**

- **BERT (Bidirectional Encoder Representations from Transformers): Implemented for advanced contextualized embeddings, enhancing understanding of the nuanced context within mental health discourse.**

- **GPT (Generative Pre-trained Transformer): Employed for generating human-like text responses and comprehending the context of mental health discussions.**

**Overall, the objective is to extract meaningful insights, patterns, and correlations from this rich dataset, contributing to a deeper understanding of mental health issues, sentiments, and urgency levels, ultimately informing strategies for better mental health support and intervention.**
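
As a rough illustration of the simplest model listed above, the sketch below fits a TF-IDF plus logistic regression pipeline with scikit-learn. The four example sentences and their binary labels are invented placeholders, not project data, and the pipeline is an assumption about how such a baseline could be wired up rather than the model used later in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up examples: 1 = urgent, 0 = not urgent.
texts = [
    "i need help right now",
    "please help me tonight it is urgent",
    "looking for tips to relax before exams",
    "any book recommendations about mindfulness",
]
labels = [1, 1, 0, 0]

# TF-IDF turns each text into a weighted term vector; logistic regression
# then estimates the probability of the "urgent" class.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

new_texts = ["i need urgent help", "tips about books"]
predictions = model.predict(new_texts)
probabilities = model.predict_proba(new_texts)[:, 1]  # P(urgent) per text
```

Wrapping the vectorizer and classifier in one pipeline means raw strings can be passed to `fit` and `predict` directly, and the vocabulary is learned only from the training texts.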








## <div style="color:white;display:fill;border-radius:8px;background-color:#800080;font-size:150%; letter-spacing:1.0px"><p style="padding: 12px;color:white;"><b><b><span style='color:white'><span style='color:#F1A424'>1 |</span></span></b>Data Loading & Preparation</b></p></div>

## <b>1.1 <span style='color:#F1A424'>|</span> Importing Necessary Libraries</b> 

In [2]:
import re
import string
import numpy as np
import random
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  #plotting statistical graphs
%matplotlib inline
from plotly import graph_objs as go
import plotly.express as px
import plotly.figure_factory as ff
# import squarify
from collections import Counter

# Load the Text Cleaning Package
import neattext.functions as nfx

from PIL import Image
# WordCloud visualizes text data: the size of each word reflects its frequency
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.metrics import confusion_matrix,roc_auc_score,classification_report
from sklearn.compose import ColumnTransformer

from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier,GradientBoostingClassifier,ExtraTreesClassifier
from sklearn.linear_model import RidgeClassifier,SGDClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB


import nltk
from nltk.corpus import stopwords

from tqdm import tqdm  # progress bars for long-running loops
import os
#import spacy #for training the NER model tokenize words
#import random
#from spacy.util import compounding
#from spacy.util import minibatch


pd.set_option('display.max_colwidth', 400)        # show up to 400 characters per cell
pd.set_option('display.html.use_mathjax', False)  # do not render $...$ in cells as math


import warnings
warnings.filterwarnings("ignore")

## <b>1.2 <span style='color:#F1A424'>|</span>Loading in our Data</b> 

In [3]:
# load the dataset -> feature extraction -> data visualization -> data cleaning -> train test split
# -> model building -> model training -> model evaluation -> model saving -> streamlit application deploy

# load the full aggregated dataset
df = pd.read_csv('../data/Aggregated_Data_Final.csv')

df

Unnamed: 0,Subreddit,Reddit Post,Unnamed: 2
0,CPTSD,Feeling like I was made to be unlovable,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor..."
1,CPTSD,DAE not know what to do with themselves when they have time?,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob..."
2,CPTSD,Yoga triggers me- anyone else?,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a..."
3,CPTSD,Did anyone else have a parent who said - you can make the choice - do you want ho listen to the sweet loving voice of tell it in or should I beat you?,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?
4,CPTSD,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?",
...,...,...,...
27673,suicidewatch,"But I want help doing it Tired of everyone saying no don't.\n\n*Get up so I can punch you again*\n\nI'm exhausted I want to clock out and be done everything just keeps getting worse, everything keeps losing value including myself- as if any of that even mattered.\n\nFuck the helpline I want real help I want help out",
27674,suicidewatch,Nothing to live for The ONLY reason I am alive right now is because of my sweet cat Pippin. Yesterday was the anniversary of adopting him 2 years ago. \nI've been really depressed and haven't been able to play with him as much so hes been meowing and being a little naughty as a result. I got so mad yesterday and yelled at him. All I can think about now is how I should give him to someone healt...,
27675,suicidewatch,Iâ€™m going to fucking kill myself 18 years too long. I think Iâ€™m going to go,
27676,suicidewatch,Iâ€™m going to pieces All Iâ€™ve done for about a month has been lay in bed. I donâ€™t enjoy anything. Canâ€™t focus on anything. I am terrified of the future. I donâ€™t want to be alive. I am in so much emotional pain. The only reason I am alive is my father because I donâ€™t want to hurt him. That and I canâ€™t decide on a method. I think not wanting to hurt him keeps me from choosing a meth...,


In [4]:
# Combine 'Reddit Post' and 'Unnamed: 2' into a new column named "reddit_post":
# use 'Unnamed: 2' where it is present, otherwise fall back to 'Reddit Post'
df['reddit_post'] = df['Unnamed: 2'].fillna(df['Reddit Post'])

# Drop the original columns now that they have been merged
df.drop(['Reddit Post', 'Unnamed: 2'], axis=1, inplace=True)
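
The merge in this cell relies on `Series.fillna` accepting another Series: wherever `Unnamed: 2` is missing, the value at the same index in `Reddit Post` is used instead. A minimal sketch on a made-up two-row frame (the row contents are placeholders):

```python
import pandas as pd

toy = pd.DataFrame({
    "Reddit Post": ["A title", "Title and body in one field"],
    "Unnamed: 2": ["The body text", None],
})

# Prefer the body column; fall back to the title column where the body is missing.
toy["reddit_post"] = toy["Unnamed: 2"].fillna(toy["Reddit Post"])
toy = toy.drop(columns=["Reddit Post", "Unnamed: 2"])

print(toy["reddit_post"].tolist())
# ['The body text', 'Title and body in one field']
```

Note that when both columns are populated, the value from `Reddit Post` is discarded, which is what happens to the titles of the first rows in the table above.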


In [5]:
df

Unnamed: 0,Subreddit,reddit_post
0,CPTSD,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor..."
1,CPTSD,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob..."
2,CPTSD,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a..."
3,CPTSD,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?
4,CPTSD,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?"
...,...,...
27673,suicidewatch,"But I want help doing it Tired of everyone saying no don't.\n\n*Get up so I can punch you again*\n\nI'm exhausted I want to clock out and be done everything just keeps getting worse, everything keeps losing value including myself- as if any of that even mattered.\n\nFuck the helpline I want real help I want help out"
27674,suicidewatch,Nothing to live for The ONLY reason I am alive right now is because of my sweet cat Pippin. Yesterday was the anniversary of adopting him 2 years ago. \nI've been really depressed and haven't been able to play with him as much so hes been meowing and being a little naughty as a result. I got so mad yesterday and yelled at him. All I can think about now is how I should give him to someone healt...
27675,suicidewatch,Iâ€™m going to fucking kill myself 18 years too long. I think Iâ€™m going to go
27676,suicidewatch,Iâ€™m going to pieces All Iâ€™ve done for about a month has been lay in bed. I donâ€™t enjoy anything. Canâ€™t focus on anything. I am terrified of the future. I donâ€™t want to be alive. I am in so much emotional pain. The only reason I am alive is my father because I donâ€™t want to hurt him. That and I canâ€™t decide on a method. I think not wanting to hurt him keeps me from choosing a meth...


In [6]:
#summary of our DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27678 entries, 0 to 27677
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Subreddit    27678 non-null  object
 1   reddit_post  27678 non-null  object
dtypes: object(2)
memory usage: 432.6+ KB


Finding the number of unique classes (subreddits) in our data

In [7]:
#obtain the unique values in the 'Subreddit' column
df.Subreddit.unique()

array(['CPTSD', 'diagnosedPTSD', 'alcoholism', 'socialanxiety',
       'suicidewatch'], dtype=object)
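
Beyond listing the labels, it is often useful to know how many there are and how evenly the posts are spread across them. A small sketch with `nunique` and `value_counts` on a made-up label column (the counts here are illustrative, not the dataset's):

```python
import pandas as pd

labels = pd.Series(
    ["CPTSD", "alcoholism", "CPTSD", "suicidewatch", "CPTSD"], name="Subreddit"
)

n_classes = labels.nunique()                        # number of distinct labels
class_counts = labels.value_counts()                # posts per label, most frequent first
class_share = labels.value_counts(normalize=True)   # same, as fractions of the total

print(n_classes)               # 3
print(class_counts["CPTSD"])   # 3
```

A strongly skewed `class_share` would be an early hint that class imbalance needs attention before modeling.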

Below, we count the number of characters in each post.

In [8]:
#character count of each reddit post
df['reddit_post'].apply(str).apply(len)

0         443
1        1789
2         522
3         212
4         103
         ... 
27673     311
27674    1147
27675      79
27676     607
27677     526
Name: reddit_post, Length: 27678, dtype: int64

The null-value checks in the next three cells are left commented out at this stage; missing values are checked in Section 2.1.

In [9]:
#df.isna().sum()

In [10]:
#drop all NAN values in our dataframe
#df.dropna(inplace=True)


In [11]:
#check the number of null values 
#df.isna().sum()


Next, we find the number of words in each post

In [12]:
# word count in each reddit post (splitting on single spaces)
df['reddit_post'].dropna().apply(lambda x: len(x.split(" ")))

0         81
1        315
2         96
3         43
4         16
        ... 
27673     56
27674    225
27675     16
27676    125
27677     98
Name: reddit_post, Length: 27678, dtype: int64
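
One caveat about the count above: `split(" ")` splits only on single space characters, so words separated by newlines are counted as one. Calling `split()` with no argument splits on any run of whitespace instead. A quick comparison on a made-up string:

```python
text = "See title.\n\nI used to bike"

by_space = text.split(" ")     # splits on each single space only
by_whitespace = text.split()   # splits on any run of whitespace

print(by_space)       # ['See', 'title.\n\nI', 'used', 'to', 'bike']
print(by_whitespace)  # ['See', 'title.', 'I', 'used', 'to', 'bike']
print(len(by_space), len(by_whitespace))  # 5 6
```

For posts with many paragraph breaks, the two counts can therefore differ noticeably.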

## <div style="color:white;display:fill;border-radius:8px;background-color:#800080;font-size:150%; letter-spacing:1.0px"><p style="padding: 12px;color:white;"><b><b><span style='color:white'><span style='color:#F1A424'>2 |</span></span></b> Data Quality Checks</b></p></div>
   
- **Another crucial step in any project involves ensuring the quality of your data. Remember that your model’s performance is directly tied to the data it processes. Therefore, take the time to remove duplicates and handle missing values appropriately.**

- **Here we check for missing values and outliers, and remove any unnecessary variables/features/columns. Because the data is free text, conventional numeric outlier checks do not apply.**

## <b>2.1 <span style='color:#F1A424'>|</span> Checking for NaN Values</b> 

In [13]:
#check for the sum NaN values in our dataframe
df.isna().sum()

Subreddit      0
reddit_post    0
dtype: int64

In [14]:
# print the count of NaN values for each column, followed by a separator line
print(df.isna().sum())
print("*"*40)

Subreddit      0
reddit_post    0
dtype: int64
****************************************


**As noted, we have no missing values in our dataframe.**
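
`isna()` only flags true missing values. In a text column, empty strings, whitespace-only strings, and spreadsheet error markers (such as the `#NAME?` entry that appears in the duplicate sample in Section 2.3) are ordinary strings and pass this check. A hedged sketch of an additional check follows; the placeholder set is an assumption and would need tailoring to the data.

```python
import pandas as pd

posts = pd.Series(["I feel okay today", "   ", "", "#NAME?", None], name="reddit_post")

# Hypothetical set of strings to treat as "no real content".
PLACEHOLDERS = {"", "#NAME?"}

is_missing = posts.isna()
# Strip whitespace, then flag placeholder strings that are not already NaN.
is_placeholder = posts.fillna("").str.strip().isin(PLACEHOLDERS) & ~is_missing

print(int(is_missing.sum()))      # 1
print(int(is_placeholder.sum()))  # 3
```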

## <b>2.2 <span style='color:#F1A424'>|</span> Checking for Sentence Length Consistency</b> 

In [15]:
df['reddit_post'].apply(len)

0         443
1        1789
2         522
3         212
4         103
         ... 
27673     311
27674    1147
27675      79
27676     607
27677     526
Name: reddit_post, Length: 27678, dtype: int64

**This gives an overview of the number of characters per post. Very short posts (five characters or fewer) would carry too little information to help a predictive model, so the next cell checks whether any exist.**

In [16]:
sum(df['reddit_post'].apply(len) > 5) , sum(df['reddit_post'].apply(len) <= 5)

(27678, 0)

**All our posts are longer than five characters.**
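
The check above measures characters. If the intent is to filter on words instead, a word-based version could look like the sketch below; the example posts and the five-word threshold are assumptions for illustration.

```python
import pandas as pd

posts = pd.Series(["help", "I cannot sleep at all lately", "so tired"], name="reddit_post")

char_len = posts.str.len()               # characters per post
word_len = posts.str.split().str.len()   # whitespace-separated words per post

# Keep posts with more than five words (threshold chosen for illustration).
long_enough = posts[word_len > 5]

print(char_len.tolist())     # [4, 28, 8]
print(word_len.tolist())     # [1, 6, 2]
print(long_enough.tolist())  # ['I cannot sleep at all lately']
```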

## <b>2.3 <span style='color:#F1A424'>|</span> Checking for Duplicates</b> 

In [17]:
#check and print the number of duplicates
print(df.duplicated().sum())
print("*"*40)

12
****************************************


**We notice that we have 12 duplicate rows.**

In [18]:
# inspect rows whose post text appears more than once (keep=False shows every copy, including the same text posted under different subreddits)
df[df.duplicated(subset=['reddit_post'],keep=False)].sort_values(by='reddit_post').sample(10)

Unnamed: 0,Subreddit,reddit_post
4225,socialanxiety,Am I strange for being aroused by a man's looks? Or does it make me dirty that I like looking at men? Is it common? I ask because Ive been reading a lot on reddit about dating &amp; attraction lately. A lot of men keep telling me that most women are able to be aroused by ugly men so long as those men say &amp; do the right things. I am not this way though. I just like beauty. My question is: i...
1755,alcoholism,How yâ€™all get so much booze Iâ€™m 17 and working towards it but I always hit a speedbump whether itâ€™s money or finding somewhere to get some I find myself getting high off household appliances whenever I canâ€™t find any and thatâ€™s not good for me so I just want some ideas on finding a steady constant source
17790,suicidewatch,"I need help My grandma says that I have to go to a ""school"" but Im suspicious, I can't find anything about it online and my grandma is being weird about it. I have a feeling they are gonna drive me to a pysch ward as a punishment. They did it before and no one listened to me. I'm almost 100 percent sure that's what they are gonna do, I'm not depressed or anything. I got kicked out of school la..."
17596,suicidewatch,Itâ€™s been a while Itâ€™s been a while since I posted on here but Iâ€™ve been used to keeping my feelings to myself. I guess I just feel alone. One of two friends that I could talk to seriously about depression doesnâ€™t want to speak to me. I feel like my anxiety is getting worse. Breakdowns are more frequent. My parents know about all this but I can tell they want to ignore it. I guess I ju...
1118,alcoholism,"48 hrs sober I don't know how I feel. I mean I guess I feel grateful I'm not hungover and disgusting. \n\nI've had a couple spells of sobriety before and I remember the healthy feeling, mostly in my body not my mind. \n\nI have things I want to do that I know I will never do if I'm constantly planning for or recovering from drinking so for today I'm gonna keep going."
832,CPTSD,#NAME?
14521,suicidewatch,Am I strange for being aroused by a man's looks? Or does it make me dirty that I like looking at men? Is it common? I ask because Ive been reading a lot on reddit about dating &amp; attraction lately. A lot of men keep telling me that most women are able to be aroused by ugly men so long as those men say &amp; do the right things. I am not this way though. I just like beauty. My question is: i...
1754,alcoholism,How yâ€™all get so much booze Iâ€™m 17 and working towards it but I always hit a speedbump whether itâ€™s money or finding somewhere to get some I find myself getting high off household appliances whenever I canâ€™t find any and thatâ€™s not good for me so I just want some ideas on finding a steady constant source
6460,suicidewatch,"sad new year i went into 2020 completely alone in my room, crying and wanting to die. my boyfriend didnâ€™t help. heâ€™s at a friends house and doesnâ€™t wanna talk to me today."
4592,socialanxiety,Question about benzodiazepines/social exposure therapy It's been a while since I've seen a doctor about some of my social issues and I wanted some perspective before seeking any specific medication. Have benzodiazepines ever facilitated for you any kind of exposure therapy? Does the ease benzodiazepines bring in social scenarios leave any lasting effects that go beyond the timeframe the drug i...


In [19]:
df = df.drop_duplicates()

print(df.duplicated().sum())
print("*"*40)

0
****************************************


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27666 entries, 0 to 27677
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Subreddit    27666 non-null  object
 1   reddit_post  27666 non-null  object
dtypes: object(2)
memory usage: 648.4+ KB
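
Two details of the deduplication step are worth spelling out. `duplicated()` with no arguments compares whole rows, so the same text posted under two subreddits is not counted, whereas `subset=['reddit_post']` compares the text alone. Also, `drop_duplicates()` keeps the original index labels, which is why the index above still runs to 27677 with only 27666 entries. A sketch on a made-up frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "Subreddit": ["alcoholism", "alcoholism", "socialanxiety", "suicidewatch"],
    "reddit_post": ["same text", "same text", "cross-posted", "cross-posted"],
})

full_row_dupes = int(toy.duplicated().sum())                          # 1
text_only_dupes = int(toy.duplicated(subset=["reddit_post"]).sum())   # 2

deduped = toy.drop_duplicates()
print(deduped.index.tolist())  # [0, 2, 3]

# Optional: renumber rows so positional and label-based indexing agree.
deduped = deduped.reset_index(drop=True)
print(deduped.index.tolist())  # [0, 1, 2]
```

Whether cross-posted text should be dropped as well is a modeling decision, since the same post then carries two different labels.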


## <div style="color:white;display:fill;border-radius:8px;background-color:#800080;font-size:150%; letter-spacing:1.0px"><p style="padding: 12px;color:white;"><b><b><span style='color:white'><span style='color:#F1A424'>3 |</span></span></b> Data Preprocessing</b></p></div>




## <b>3.1 <span style='color:#F1A424'>|</span> Cleaning Textual Data</b> 

We will clean and preprocess the textual data in the dataset to enhance its quality and consistency:
- Remove unnecessary characters.
- Convert text to lowercase for uniformity.
- Tokenization: Tokenize the text data to break it into individual words or tokens. This step is crucial for further analysis of the textual content.
- Normalization: Apply normalization techniques, such as stemming or lemmatization, to reduce words to their base or root forms. This aids in standardizing the text.
- Stop Word Removal: Eliminate common stop words from the text to focus on meaningful content. Stop words often do not contribute significantly to the analysis.
- Entity Recognition: Identify and recognize entities within the text. This step is particularly useful when dealing with named entities or specific information entities.
- Syntax Parsing: Perform syntax parsing to analyze the grammatical structure of sentences. This can provide insights into relationships between words.
- Text Transformation: Implement additional text transformations as needed for your specific analysis or modeling requirements.

**We will utilize the NeatText Library for text cleaning, a straightforward NLP package designed for cleaning and preprocessing textual data. This library simplifies the process of cleaning unstructured text by handling tasks such as removing special characters and stopwords, thereby reducing noise in the data.**
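
To make the order of the steps listed above concrete, here is a standard-library sketch. It is not NeatText's implementation: the URL pattern, the tiny stop-word set, and the function name `basic_clean` are assumptions for illustration. URLs are pulled out first, because stripping characters such as `/` and `:` beforehand would leave nothing for the URL pattern to match.

```python
import re

URL_RE = re.compile(r"https?://\S+")
STOP_WORDS = {"the", "a", "is", "to", "and", "at"}  # tiny illustrative set

def basic_clean(text):
    """Return (cleaned_text, urls) for one post."""
    urls = URL_RE.findall(text)                  # 1. extract URLs first
    text = URL_RE.sub(" ", text)                 # 2. then remove them
    text = text.lower()                          # 3. lowercase
    text = re.sub(r"[^a-z0-9\s']", " ", text)    # 4. drop special characters
    # 5. tokenize on whitespace and drop stop words
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens), urls

cleaned, urls = basic_clean("The guide at https://example.org/help is GREAT!!! Go read it.")
print(cleaned)  # guide great go read it
print(urls)     # ['https://example.org/help']
```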

In [21]:
# load the text cleaning packages

import neattext as nt
import neattext.functions as nfx

# List the attributes and functions exposed by the neattext module
dir(nt)

['AUTOMATED_READ_INDEX',
 'BTC_ADDRESS_REGEX',
 'CONTRACTIONS_DICT',
 'CURRENCY_REGEX',
 'CURRENCY_SYMB_REGEX',
 'Callable',
 'Counter',
 'CreditCard_REGEX',
 'DATE_REGEX',
 'EMAIL_REGEX',
 'EMOJI_REGEX',
 'FUNCTORS_WORDLIST',
 'HASTAG_REGEX',
 'HTML_TAGS_REGEX',
 'List',
 'MASTERCard_REGEX',
 'MD5_SHA_REGEX',
 'MOST_COMMON_PUNCT_REGEX',
 'NUMBERS_REGEX',
 'PHONE_REGEX',
 'PUNCT_REGEX',
 'PoBOX_REGEX',
 'SPECIAL_CHARACTERS_REGEX',
 'STOPWORDS',
 'STOPWORDS_de',
 'STOPWORDS_en',
 'STOPWORDS_es',
 'STOPWORDS_fr',
 'STOPWORDS_ru',
 'STOPWORDS_yo',
 'STREET_ADDRESS_REGEX',
 'TextCleaner',
 'TextExtractor',
 'TextFrame',
 'TextMetrics',
 'TextPipeline',
 'Tuple',
 'URL_PATTERN',
 'USER_HANDLES_REGEX',
 'VISACard_REGEX',
 'ZIP_REGEX',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'clean_text',
 'defaultdict',
 'digit2words',
 'emoji_explainer',
 'emojify',
 'explainer',
 'extract_btc_address',
 

### <b>3.1.1 <span style='color:#F1A424'>|</span> clean_text function</b> 

In [22]:
# Noise scan
df['reddit_post'].apply(lambda x: nt.TextFrame(x).noise_scan()['text_noise'])

0        13.769752
1        11.906093
2        12.835249
3        14.150943
4        10.679612
           ...    
27673    12.540193
27674    14.734089
27675     8.860759
27676    14.332784
27677    14.068441
Name: reddit_post, Length: 27666, dtype: float64

In [23]:
# Ensure all entries in reddit_post column are strings
df['reddit_post'] = df['reddit_post'].astype(str)

# Now apply the clean_text function
df['clean_post'] = df['reddit_post'].apply(lambda x: nfx.clean_text(x, puncts=False, stopwords=False))

In [24]:
df

Unnamed: 0,Subreddit,reddit_post,clean_post
0,CPTSD,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor...","i don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the csa, whatever. i'm a mess right now. i feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade i'm still a monster. as if i was someone who was built for being unlovable and despised, physically and emotionally, since i was born. i keep working and work..."
1,CPTSD,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob...","see title. i used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. now i am one of them. through years of hard depression and su.c.dal.ty thanks to cptsd i have stopped all my hobbies. i entrench myself in work and by now also meeting people and sometimes obligato..."
2,CPTSD,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a...","i was doing yoga for years as a tool to help me back into my body when i was feeling rough as a form of reconnection. i even went as far as becoming trained in teaching, doing a 200hr training. as my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (in retrospect i wonder if i was in fact being dissociated the whole time.) started agai..."
3,CPTSD,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?,the child me thought i made the right choice by listening to him. and he said as much. that i had finally done something right . especially when they kept blaming me for all the things i did wrong . anyone else ?
4,CPTSD,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?","women: what is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?"
...,...,...,...
27673,suicidewatch,"But I want help doing it Tired of everyone saying no don't.\n\n*Get up so I can punch you again*\n\nI'm exhausted I want to clock out and be done everything just keeps getting worse, everything keeps losing value including myself- as if any of that even mattered.\n\nFuck the helpline I want real help I want help out","but i want help doing it tired of everyone saying no don't. *get up so i can punch you again* i'm exhausted i want to clock out and be done everything just keeps getting worse, everything keeps losing value including myself- as if any of that even mattered. fuck the helpline i want real help i want help out"
27674,suicidewatch,Nothing to live for The ONLY reason I am alive right now is because of my sweet cat Pippin. Yesterday was the anniversary of adopting him 2 years ago. \nI've been really depressed and haven't been able to play with him as much so hes been meowing and being a little naughty as a result. I got so mad yesterday and yelled at him. All I can think about now is how I should give him to someone healt...,nothing to live for the only reason i am alive right now is because of my sweet cat pippin. yesterday was the anniversary of adopting him 2 years ago. i've been really depressed and haven't been able to play with him as much so hes been meowing and being a little naughty as a result. i got so mad yesterday and yelled at him. all i can think about now is how i should give him to someone healthi...
27675,suicidewatch,Iâ€™m going to fucking kill myself 18 years too long. I think Iâ€™m going to go,iâ€™m going to fucking kill myself 18 years too long. i think iâ€™m going to go
27676,suicidewatch,Iâ€™m going to pieces All Iâ€™ve done for about a month has been lay in bed. I donâ€™t enjoy anything. Canâ€™t focus on anything. I am terrified of the future. I donâ€™t want to be alive. I am in so much emotional pain. The only reason I am alive is my father because I donâ€™t want to hurt him. That and I canâ€™t decide on a method. I think not wanting to hurt him keeps me from choosing a meth...,iâ€™m going to pieces all iâ€™ve done for about a month has been lay in bed. i donâ€™t enjoy anything. canâ€™t focus on anything. i am terrified of the future. i donâ€™t want to be alive. i am in so much emotional pain. the only reason i am alive is my father because i donâ€™t want to hurt him. that and i canâ€™t decide on a method. i think not wanting to hurt him keeps me from choosing a meth...


In [25]:
# Extract URLs into a separate column before removing them.
# This must happen before special characters (e.g. '//') are stripped; otherwise the function would be unable to detect the URLs.
df['urls'] = df['clean_post'].apply(nfx.extract_urls)

df[['reddit_post', 'clean_post', 'urls']].sample(5)

Unnamed: 0,reddit_post,clean_post,urls
2625,"I had a panic attack on Christmas Eve and I'm not sure what to do next Hey, like the title says I had a pretty bad panic attack on Christmas Eve... And I've been feeling a lot of mixed emotions for the party week. I think I need some help, but I don't have too much money to spend on a therapist. \n\nMaybe I'll give a little backstory. I was at my uncle's place and there's a language barrier fo...","i had a panic attack on christmas eve and i'm not sure what to do next hey, like the title says i had a pretty bad panic attack on christmas eve... and i've been feeling a lot of mixed emotions for the party week. i think i need some help, but i don't have too much money to spend on a therapist. maybe i'll give a little backstory. i was at my uncle's place and there's a language barrier for me...",[]
6128,"How do you put things into perspective with social anxiety? I'm sorry if this is rambly. I'm seeing if this might help me lay out my thoughts a little and maybe get some advice. I'll throw down a tldr at the end :)\n\nRight now, I'm coping with multiple flavors of anxiety disorder (hypochondria, dp/dr, agoraphobia) that I feel, to some extent, I've achieved some kind of control over. I'm getti...","how do you put things into perspective with social anxiety? i'm sorry if this is rambly. i'm seeing if this might help me lay out my thoughts a little and maybe get some advice. i'll throw down a tldr at the end :) right now, i'm coping with multiple flavors of anxiety disorder (hypochondria, dp/dr, agoraphobia) that i feel, to some extent, i've achieved some kind of control over. i'm getting ...",[]
2440,"Long-time alcoholic with a question I've been a heavy drinking alcoholic for the better part of 15 years. There have only been two things in those years that have been able to keep me from getting blacked out wasted daily, the gym (former bodybuilder) and work. 4 years ago my brain surgery killed my bodybuilding dream and goals, emptied my bank account and put me in a bad spot. I quit the gym ...","long-time alcoholic with a question i've been a heavy drinking alcoholic for the better part of 15 years. there have only been two things in those years that have been able to keep me from getting blacked out wasted daily, the gym (former bodybuilder) and work. 4 years ago my brain surgery killed my bodybuilding dream and goals, emptied my bank account and put me in a bad spot. i quit the gym ...",[]
17321,"Iâ€™m not going to kill myself yet. But I do think that eventually, I will kill myself. Well, I pretty much know this. For some reason it just seems inevitable. I feel like nobody truly enjoys my presence. I feel like many of my friendships are formed from pity. And I donâ€™t really like many people. Iâ€™m not understood very well. I self deprecate so much I believe all I say. Sometimes i will...","iâ€™m not going to kill myself yet. but i do think that eventually, i will kill myself. well, i pretty much know this. for some reason it just seems inevitable. i feel like nobody truly enjoys my presence. i feel like many of my friendships are formed from pity. and i donâ€™t really like many people. iâ€™m not understood very well. i self deprecate so much i believe all i say. sometimes i will...",[]
16461,"Iâ€™m just over it Iâ€™m so over how everyone doesnâ€™t give a shit anymore. I give my all every damn day just to make others happy, make them laugh, help them with everything. But not one person ever ask how I am doing, cause if they did theyâ€™d know I just want to die. Iâ€™m over the fake laughs and smiles, Iâ€™m over pretending. But Iâ€™m scared. Iâ€™m so fucking scared to leave everything...","iâ€™m just over it iâ€™m so over how everyone doesnâ€™t give a shit anymore. i give my all every damn day just to make others happy, make them laugh, help them with everything. but not one person ever ask how i am doing, cause if they did theyâ€™d know i just want to die. iâ€™m over the fake laughs and smiles, iâ€™m over pretending. but iâ€™m scared. iâ€™m so fucking scared to leave everything...",[]
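
The ordering concern in the comment above can be illustrated with a small standard-library sketch. The regular expressions `URL_PATTERN` and `SPECIAL_CHARS` are simplified assumptions made for illustration, not the patterns `neattext` actually uses, and the sample sentence is made up.

```python
import re

# Simplified, hypothetical patterns for illustration only
URL_PATTERN = re.compile(r"https?://\S+")
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9\s]")

text = "read this https://example.com/help before you reply"

# Extracting first: the URL is still intact and can be matched
urls_before = URL_PATTERN.findall(text)

# Stripping special characters first removes ':', '/' and '.',
# so the URL pattern no longer has anything to match
stripped = SPECIAL_CHARS.sub("", text)
urls_after = URL_PATTERN.findall(stripped)

print(urls_before)
print(stripped)
print(urls_after)
```

Extracting before cleaning keeps the `urls` column meaningful even though none of the five sampled rows above happens to contain a link.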


### <b>3.1.4 <span style='color:#F1A424'>|</span> Special Characters</b> 

In [26]:
# Remove special characters

df['clean_post'] = df['clean_post'].apply(nfx.remove_special_characters)

df[['reddit_post', 'clean_post']].sample(5)

Unnamed: 0,reddit_post,clean_post
25656,I don't know what to do I hurt someone I have affected a person for life someone that I cared about I hurt them in such a way that I can never be forgiven I don't see the point anymore in living I don't know what I'm going to do I don't know how long I can keep this shit up soon I just hope that it's quick or at least painless,i dont know what to do i hurt someone i have affected a person for life someone that i cared about i hurt them in such a way that i can never be forgiven i dont see the point anymore in living i dont know what im going to do i dont know how long i can keep this shit up soon i just hope that its quick or at least painless
19188,"Bipolar type I, can't stop doing and selling drugs. going to end it next month Ever since I was 6, I wanted to kill myself. I'm muslim, so suicide's something never to be considered nor talked about, to anyone, even a fucking therapist for fuck's sake... I have been dealing with alot lately, what with my wife running away with my 1 1/2 yo daughter, my drug abuse, my buying and selling drugs an...",bipolar type i cant stop doing and selling drugs going to end it next month ever since i was 6 i wanted to kill myself im muslim so suicides something never to be considered nor talked about to anyone even a fucking therapist for fucks sake i have been dealing with alot lately what with my wife running away with my 1 12 yo daughter my drug abuse my buying and selling drugs and cheap cigarettes...
10476,"fml I moved to a new state and I hate it. I miss my old friends and I miss my old school, I wish I could time travel and tell my parents absolutely not when they suggested moving. it's super expensive, so the chances of moving back are unlikely. i've always had more humor-based friendships so I don't know how to bring up how I feel. I feel like I desperately need to talk to them but I also don...",fml i moved to a new state and i hate it i miss my old friends and i miss my old school i wish i could time travel and tell my parents absolutely not when they suggested moving its super expensive so the chances of moving back are unlikely ive always had more humorbased friendships so i dont know how to bring up how i feel i feel like i desperately need to talk to them but i also dont want the...
8802,I am just going to kill myself. Thats pretty much it.,i am just going to kill myself thats pretty much it
12511,"Bit confused with how I am feeling \n\nHello, so I have been pondering with the idea for the past few weeks, and today I was supposed to see my Gp, I was awake and ready to leave and just ended up sitting on my bed and did not attend it. So tonight I have been preparing to kill myself. And have been on some sort of buzz of it, like hyper excited/happy about it, I'm aware I'm being somewhat odd...",bit confused with how i am feeling hello so i have been pondering with the idea for the past few weeks and today i was supposed to see my gp i was awake and ready to leave and just ended up sitting on my bed and did not attend it so tonight i have been preparing to kill myself and have been on some sort of buzz of it like hyper excitedhappy about it im aware im being somewhat odd with how i am...


### <b>3.1.5 <span style='color:#F1A424'>|</span> Multiple Whitespaces</b> 

In [27]:
# Collapse runs of multiple spaces into a single space
df['clean_post'] = df['clean_post'].apply(nfx.remove_multiple_spaces)

df[['reddit_post', 'clean_post']].head()

Unnamed: 0,reddit_post,clean_post
0,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor...",i dont know if it was the emotional neglect the psychological abuse medical abuse bullying the csa whatever im a mess right now i feel like a horrible monster that somewhat a lot of people see as attractive but under the facade im still a monster as if i was someone who was built for being unlovable and despised physically and emotionally since i was born i keep working and working but i still...
1,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob...",see title i used to be the person full of hobbies biking drawing reading writing walking gaming who really disliked people who never knew what to do with their free time and would be clingy now i am one of them through years of hard depression and sucdalty thanks to cptsd i have stopped all my hobbies i entrench myself in work and by now also meeting people and sometimes obligatory projects li...
2,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a...",i was doing yoga for years as a tool to help me back into my body when i was feeling rough as a form of reconnection i even went as far as becoming trained in teaching doing a 200hr training as my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me in retrospect i wonder if i was in fact being dissociated the whole time started again recen...
3,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?,the child me thought i made the right choice by listening to him and he said as much that i had finally done something right especially when they kept blaming me for all the things i did wrong anyone else
4,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?",women what is the real situation of misogyny patriarchy sexual abuse and harassment in your country


### <b>3.1.6 <span style='color:#F1A424'>|</span> Emojis</b> 

In [28]:
# Remove emojis
df['clean_post'] = df['clean_post'].apply(nfx.remove_emojis)

df[['reddit_post', 'clean_post']].sample(5)

Unnamed: 0,reddit_post,clean_post
4287,"Ease physical responses? Whenever I have an â€œin the spotlight feelingâ€, my heart will start racing, my face will get red, and Iâ€™ll stutter, despite not being stressed in my mind. I donâ€™t care. Iâ€™m wondering if anyone else has gone through the same thing and found something that stops it. Beta blockers would probably help but I canâ€™t get a prescription.",ease physical responses whenever i have an in the spotlight feeling my heart will start racing my face will get red and ill stutter despite not being stressed in my mind i dont care im wondering if anyone else has gone through the same thing and found something that stops it beta blockers would probably help but i cant get a prescription
27348,"No motivation leading to suicidal thoughts Hi guys, I'm a 23 year old male in the final year of my degree. With everything going on I'm stuck at home and I find myself unmotivated to do any work for college or anything productive at all for. \n\nThis spirals into guilt and self loathing about not doing anything and now I'm idealising suicide. I have a loving family and great friends, but I can...",no motivation leading to suicidal thoughts hi guys im a 23 year old male in the final year of my degree with everything going on im stuck at home and i find myself unmotivated to do any work for college or anything productive at all for this spirals into guilt and self loathing about not doing anything and now im idealising suicide i have a loving family and great friends but i cant bring myse...
6348,"I wish I had the courage I wish I had the courage to do it, but I'm too scared",i wish i had the courage i wish i had the courage to do it but im too scared
13378,My boyfriend told me I wouldn't be able to commit suicide because I'm weak and would give in before I let it happen Its almost as if it was a test,my boyfriend told me i wouldnt be able to commit suicide because im weak and would give in before i let it happen its almost as if it was a test
22263,"need to talk to someone urgently i'm currently in a very big crisis and i really need someone to just listen for a bit, please please please don't let me do this, i really don't want to do it but i see no other way right now",need to talk to someone urgently im currently in a very big crisis and i really need someone to just listen for a bit please please please dont let me do this i really dont want to do it but i see no other way right now


### <b>3.1.7 <span style='color:#F1A424'>|</span> Contractions</b> 

In [29]:
%pip install contractions

Note: you may need to restart the kernel to use updated packages.


In [30]:
import contractions

# Apply the contractions.fix function to the clean_post column
df['clean_post'] = df['clean_post'].apply(contractions.fix)

df[['reddit_post', 'clean_post']].head()

Unnamed: 0,reddit_post,clean_post
0,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor...",i do not know if it was the emotional neglect the psychological abuse medical abuse bullying the csa whatever i am a mess right now i feel like a horrible monster that somewhat a lot of people see as attractive but under the facade i am still a monster as if i was someone who was built for being unlovable and despised physically and emotionally since i was born i keep working and working but i...
1,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob...",see title i used to be the person full of hobbies biking drawing reading writing walking gaming who really disliked people who never knew what to do with their free time and would be clingy now i am one of them through years of hard depression and sucdalty thanks to cptsd i have stopped all my hobbies i entrench myself in work and by now also meeting people and sometimes obligatory projects li...
2,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a...",i was doing yoga for years as a tool to help me back into my body when i was feeling rough as a form of reconnection i even went as far as becoming trained in teaching doing a 200hr training as my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me in retrospect i wonder if i was in fact being dissociated the whole time started again recen...
3,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?,the child me thought i made the right choice by listening to him and he said as much that i had finally done something right especially when they kept blaming me for all the things i did wrong anyone else
4,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?",women what is the real situation of misogyny patriarchy sexual abuse and harassment in your country


### <b>3.1.8 <span style='color:#F1A424'>|</span> Stopwords</b> 

In [31]:
# Extract stopwords
df['clean_post'].apply(lambda x: nt.TextExtractor(x).extract_stopwords())

0                                                                                                                                                               [i, do, not, if, it, was, the, the, the, whatever, i, am, a, now, i, a, that, a, of, see, as, but, under, the, i, am, still, a, as, if, i, was, someone, who, was, for, being, and, and, since, i, was, i, keep, and, but, i, still, do, not, for, this]
1        [see, i, used, to, be, the, full, of, who, really, who, never, what, to, do, with, their, and, would, be, now, i, am, one, of, them, through, of, and, to, i, have, all, my, i, myself, in, and, by, now, also, and, sometimes, to, an, that, me, without, any, when, i, have, and, am, alone, i, on, the, and, do, nothing, i, i, even, myself, also, myself, a, for, being, this, sometimes, i, for, the, ...
2                                                                                                       [i, was, doing, for, as, a, to, me, back, into, my, when, i, was, as, a, of, i

In [32]:
# Remove the stop words

df['clean_post'] = df['clean_post'].apply(nfx.remove_stopwords)

df[['reddit_post', 'clean_post']].head()

Unnamed: 0,reddit_post,clean_post
0,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor...",know emotional neglect psychological abuse medical abuse bullying csa mess right feel like horrible monster somewhat lot people attractive facade monster built unlovable despised physically emotionally born working working feel fit world
1,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob...",title person hobbies biking drawing reading writing walking gaming disliked people knew free time clingy years hard depression sucdalty thanks cptsd stopped hobbies entrench work meeting people obligatory projects like drivers license extent leaves free time free time lie couch think pity hate lot way watch netflix hours doom scroll reddit waste time browsing internet try sleep lot better inte...
2,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a...",yoga years tool help body feeling rough form reconnection went far trained teaching 200hr training trauma symptoms peaked yoga actually start reverse effect dissociate retrospect wonder fact dissociated time started recently bad damn disconnect hard exercise tips
3,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?,child thought right choice listening said finally right especially kept blaming things wrong
4,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?",women real situation misogyny patriarchy sexual abuse harassment country


In [33]:
# Noise Scan after cleaning text
df['clean_post'].apply(lambda x: nt.TextFrame(x).noise_scan()['text_noise'])

0        0
1        0
2        0
3        0
4        0
        ..
27673    0
27674    0
27675    0
27676    0
27677    0
Name: clean_post, Length: 27666, dtype: int64

## <b>3.2 <span style='color:#F1A424'>|</span> Linguistic Processing (Clean Text)</b> 

+ Tokenization
+ Stemming / Lemmatization
+ Parts of Speech Tagging
+ Calculating Sentiment Based on Polarity & Subjectivity

### <b>3.2.1 <span style='color:#F1A424'>|</span> Tokenization</b> 

In [34]:
test_sample = df['clean_post'].loc[12827]

test_sample

'close life think wrong lot pain losing girlfriend realizing real friends family uncaring suicidal broke suicidal pain great find alleviate pain try feel better working tell moment'

In [35]:
from nltk.tokenize import RegexpTokenizer

basic_token_pattern = r"(?u)\b\w\w+\b"

tokenizer = RegexpTokenizer(basic_token_pattern)

tokenizer.tokenize(test_sample)

['close',
 'life',
 'think',
 'wrong',
 'lot',
 'pain',
 'losing',
 'girlfriend',
 'realizing',
 'real',
 'friends',
 'family',
 'uncaring',
 'suicidal',
 'broke',
 'suicidal',
 'pain',
 'great',
 'find',
 'alleviate',
 'pain',
 'try',
 'feel',
 'better',
 'working',
 'tell',
 'moment']
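
To make the pattern's behaviour explicit: `\w\w+` requires at least two word characters, so single-character tokens are dropped. The sketch below uses `re.findall` from the standard library as a stand-in for `RegexpTokenizer`, on the assumption that the tokenizer in its default (non-gap) mode simply returns the pattern's matches. The sample sentence is made up.

```python
import re

# Same pattern as in the cell above
basic_token_pattern = r"(?u)\b\w\w+\b"

# Hypothetical sample containing one-character words
sample = "i am a 23 year old student"

# Tokens shorter than two word characters ('i', 'a') are not matched
tokens = re.findall(basic_token_pattern, sample)
print(tokens)
```

Because stopwords were already removed from `clean_post`, this mostly affects stray single letters and digits left over from earlier cleaning steps.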

In [36]:
# Tokenise the clean_post column
df['preprocessed_post'] = df['clean_post'].apply(tokenizer.tokenize)

# df.iloc[100]["preprocessed_post"][:20]

In [37]:
df[['clean_post', 'preprocessed_post']].iloc[100]

clean_post                      curious immense pressure fear paranoia strings attached thought feeling word
preprocessed_post    [curious, immense, pressure, fear, paranoia, strings, attached, thought, feeling, word]
Name: 100, dtype: object

### <b>3.2.2 <span style='color:#F1A424'>|</span> Lemmatization & Stemming</b> 

In [38]:
import nltk
nltk.download('wordnet')


[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [39]:
# Define a function to lemmatise the tokens
def lemmatise_tokens(tokens):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    # Note: lemmatize() defaults to pos='n' (noun), so verb forms such as 'growing' are left unchanged
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
    return lemmatized_tokens

# Lemmatise the tokens
df['lemma_preprocessed_post'] = df['preprocessed_post'].apply(lemmatise_tokens)

# df.iloc[100]["lemma_preprocessed_post"][:20]

In [40]:
df[['clean_post', 'lemma_preprocessed_post']].iloc[260]

clean_post                                                     mom cruel pos growing neglected invalidated hated mewhen got depressed child treated worse burden stand suicidal cut arms mom told ahead encouraged sucide mother loves child matter mad hurts mother like friends partner family wish god loving mother leastits wanted
lemma_preprocessed_post    [mom, cruel, po, growing, neglected, invalidated, hated, mewhen, got, depressed, child, treated, worse, burden, stand, suicidal, cut, arm, mom, told, ahead, encouraged, sucide, mother, love, child, matter, mad, hurt, mother, like, friend, partner, family, wish, god, loving, mother, leastits, wanted]
Name: 260, dtype: object

In [41]:
# Define a function to stem the tokens
def stem_tokens(tokens):
    stemmer = nltk.stem.PorterStemmer()
    stemmed_tokens = [stemmer.stem(token) for token in tokens]
    return stemmed_tokens

# Stem the tokens
df['stemma_preprocessed_post'] = df['preprocessed_post'].apply(lambda x: stem_tokens(x))

# df.iloc[100]["stemma_preprocessed_post"][:20]

In [42]:
df[['clean_post', 'stemma_preprocessed_post']].iloc[260]

clean_post                             mom cruel pos growing neglected invalidated hated mewhen got depressed child treated worse burden stand suicidal cut arms mom told ahead encouraged sucide mother loves child matter mad hurts mother like friends partner family wish god loving mother leastits wanted
stemma_preprocessed_post    [mom, cruel, po, grow, neglect, invalid, hate, mewhen, got, depress, child, treat, wors, burden, stand, suicid, cut, arm, mom, told, ahead, encourag, sucid, mother, love, child, matter, mad, hurt, mother, like, friend, partner, famili, wish, god, love, mother, leastit, want]
Name: 260, dtype: object

### <b>3.2.3 <span style='color:#F1A424'>|</span> Calculating Sentiment Based on Polarity & Subjectivity</b>

TextBlob is a Python library for processing textual data, including sentiment analysis, and is built on top of the Natural Language Toolkit (NLTK). When text is passed to TextBlob, its `sentiment` property returns two scores: polarity and subjectivity. The polarity score is a float within the range [-1, 1], where -1 indicates a negative sentiment and 1 indicates a positive sentiment. The subjectivity score is a float within the range [0, 1], where 0 is very objective and 1 is very subjective.

In [43]:
%pip install textblob

In [44]:
from textblob import TextBlob

# Create a function to get the subjectivity
def getSubjectivity(text):
  return TextBlob(text).sentiment.subjectivity

# Create a function to get the polarity
def getPolarity(text):
  return TextBlob(text).sentiment.polarity

# Create two new columns 'Subjectivity' & 'Polarity'
df['Subjectivity'] = df['clean_post'].apply(getSubjectivity)
df['Polarity'] = df['clean_post'].apply(getPolarity)

# Show the new dataframe with columns 'Subjectivity' & 'Polarity'
df[['clean_post','Subjectivity','Polarity']].head()

Unnamed: 0,clean_post,Subjectivity,Polarity
0,know emotional neglect psychological abuse medical abuse bullying csa mess right feel like horrible monster somewhat lot people attractive facade monster built unlovable despised physically emotionally born working working feel fit world,0.50119,0.034524
1,title person hobbies biking drawing reading writing walking gaming disliked people knew free time clingy years hard depression sucdalty thanks cptsd stopped hobbies entrench work meeting people obligatory projects like drivers license extent leaves free time free time lie couch think pity hate lot way watch netflix hours doom scroll reddit waste time browsing internet try sleep lot better inte...,0.528098,0.00455
2,yoga years tool help body feeling rough form reconnection went far trained teaching 200hr training trauma symptoms peaked yoga actually start reverse effect dissociate retrospect wonder fact dissociated time started recently bad damn disconnect hard exercise tips,0.541667,-0.198333
3,child thought right choice listening said finally right especially kept blaming things wrong,0.742857,0.017857
4,women real situation misogyny patriarchy sexual abuse harassment country,0.566667,0.35


In [45]:
# Create a function to label each polarity score as negative, neutral or positive
def getAnalysis(score):
  if score < 0:
    return 'Negative'
  elif score == 0:
    return 'Neutral'
  else:
    return 'Positive'
  
df['sentiment'] = df['Polarity'].apply(getAnalysis)

# Show the dataframe
df[['reddit_post','clean_post','Subjectivity','Polarity','sentiment']].head()

Unnamed: 0,reddit_post,clean_post,Subjectivity,Polarity,sentiment
0,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor...",know emotional neglect psychological abuse medical abuse bullying csa mess right feel like horrible monster somewhat lot people attractive facade monster built unlovable despised physically emotionally born working working feel fit world,0.50119,0.034524,Positive
1,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob...",title person hobbies biking drawing reading writing walking gaming disliked people knew free time clingy years hard depression sucdalty thanks cptsd stopped hobbies entrench work meeting people obligatory projects like drivers license extent leaves free time free time lie couch think pity hate lot way watch netflix hours doom scroll reddit waste time browsing internet try sleep lot better inte...,0.528098,0.00455,Positive
2,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a...",yoga years tool help body feeling rough form reconnection went far trained teaching 200hr training trauma symptoms peaked yoga actually start reverse effect dissociate retrospect wonder fact dissociated time started recently bad damn disconnect hard exercise tips,0.541667,-0.198333,Negative
3,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?,child thought right choice listening said finally right especially kept blaming things wrong,0.742857,0.017857,Positive
4,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?",women real situation misogyny patriarchy sexual abuse harassment country,0.566667,0.35,Positive


In [46]:
df['sentiment'].value_counts()

Negative    14784
Positive    11533
Neutral      1349
Name: sentiment, dtype: int64
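
One consequence of the `getAnalysis` rule is that only a polarity of exactly 0 counts as 'Neutral', so a barely positive score such as 0.00455 (row 1 above) is labelled 'Positive'. A possible variant, sketched below, treats a small band around zero as neutral. The function name `get_analysis_with_band` and the threshold `eps=0.05` are hypothetical choices for illustration, not values used in this notebook.

```python
def get_analysis_with_band(score, eps=0.05):
    """Label a polarity score, treating |score| <= eps as neutral.

    eps is an assumed, illustrative threshold; it would need tuning on real data.
    """
    if score < -eps:
        return 'Negative'
    if score > eps:
        return 'Positive'
    return 'Neutral'

# Polarity values taken from the first rows shown earlier
for polarity in [0.034524, 0.00455, -0.198333, 0.35]:
    print(polarity, get_analysis_with_band(polarity))
```

Whether such a band is appropriate depends on how the sentiment label is used downstream. With the exact-zero rule, the 'Neutral' class mostly captures posts in which TextBlob found no sentiment-bearing words at all.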

In [48]:
df

Unnamed: 0,Subreddit,reddit_post,clean_post,urls,preprocessed_post,lemma_preprocessed_post,stemma_preprocessed_post,Subjectivity,Polarity,sentiment
0,CPTSD,"I don't know if it was the emotional neglect, the psychological abuse, medical abuse, bullying, the CSA, whatever. I'm a mess right now. I feel like a horrible monster that somewhat a lot of people see as attractive, but under the facade I'm still a monster. As if I was someone who was built for being unlovable and despised, physically and emotionally, since I was born. I keep working and wor...",know emotional neglect psychological abuse medical abuse bullying csa mess right feel like horrible monster somewhat lot people attractive facade monster built unlovable despised physically emotionally born working working feel fit world,[],"[know, emotional, neglect, psychological, abuse, medical, abuse, bullying, csa, mess, right, feel, like, horrible, monster, somewhat, lot, people, attractive, facade, monster, built, unlovable, despised, physically, emotionally, born, working, working, feel, fit, world]","[know, emotional, neglect, psychological, abuse, medical, abuse, bullying, csa, mess, right, feel, like, horrible, monster, somewhat, lot, people, attractive, facade, monster, built, unlovable, despised, physically, emotionally, born, working, working, feel, fit, world]","[know, emot, neglect, psycholog, abus, medic, abus, bulli, csa, mess, right, feel, like, horribl, monster, somewhat, lot, peopl, attract, facad, monster, built, unlov, despis, physic, emot, born, work, work, feel, fit, world]",0.501190,0.034524,Positive
1,CPTSD,"See title.\n\nI used to be the person full of hobbies (biking, drawing, reading, writing, walking, gaming) who really disliked people who never knew what to do with their free time and would be clingy. Now I am one of them.\n\nThrough years of hard depression and su.c.dal.ty thanks to cptsd I have stopped all my hobbies. I entrench myself in work and by now also meeting people and sometimes ob...",title person hobbies biking drawing reading writing walking gaming disliked people knew free time clingy years hard depression sucdalty thanks cptsd stopped hobbies entrench work meeting people obligatory projects like drivers license extent leaves free time free time lie couch think pity hate lot way watch netflix hours doom scroll reddit waste time browsing internet try sleep lot better inte...,[],"[title, person, hobbies, biking, drawing, reading, writing, walking, gaming, disliked, people, knew, free, time, clingy, years, hard, depression, sucdalty, thanks, cptsd, stopped, hobbies, entrench, work, meeting, people, obligatory, projects, like, drivers, license, extent, leaves, free, time, free, time, lie, couch, think, pity, hate, lot, way, watch, netflix, hours, doom, scroll, reddit, wa...","[title, person, hobby, biking, drawing, reading, writing, walking, gaming, disliked, people, knew, free, time, clingy, year, hard, depression, sucdalty, thanks, cptsd, stopped, hobby, entrench, work, meeting, people, obligatory, project, like, driver, license, extent, leaf, free, time, free, time, lie, couch, think, pity, hate, lot, way, watch, netflix, hour, doom, scroll, reddit, waste, time,...","[titl, person, hobbi, bike, draw, read, write, walk, game, dislik, peopl, knew, free, time, clingi, year, hard, depress, sucdalti, thank, cptsd, stop, hobbi, entrench, work, meet, peopl, obligatori, project, like, driver, licens, extent, leav, free, time, free, time, lie, couch, think, piti, hate, lot, way, watch, netflix, hour, doom, scroll, reddit, wast, time, brows, internet, tri, sleep, lo...",0.528098,0.004550,Positive
2,CPTSD,"I was doing yoga for years as a tool to help me back into my body when I was feeling rough as a form of reconnection. I even went as far as becoming trained in teaching, doing a 200hr training. As my trauma symptoms peaked however yoga would actually start having the reverse effect and would dissociate me. (In retrospect I wonder if I was in fact being dissociated the whole time.)\n\nStarted a...",yoga years tool help body feeling rough form reconnection went far trained teaching 200hr training trauma symptoms peaked yoga actually start reverse effect dissociate retrospect wonder fact dissociated time started recently bad damn disconnect hard exercise tips,[],"[yoga, years, tool, help, body, feeling, rough, form, reconnection, went, far, trained, teaching, 200hr, training, trauma, symptoms, peaked, yoga, actually, start, reverse, effect, dissociate, retrospect, wonder, fact, dissociated, time, started, recently, bad, damn, disconnect, hard, exercise, tips]","[yoga, year, tool, help, body, feeling, rough, form, reconnection, went, far, trained, teaching, 200hr, training, trauma, symptom, peaked, yoga, actually, start, reverse, effect, dissociate, retrospect, wonder, fact, dissociated, time, started, recently, bad, damn, disconnect, hard, exercise, tip]","[yoga, year, tool, help, bodi, feel, rough, form, reconnect, went, far, train, teach, 200hr, train, trauma, symptom, peak, yoga, actual, start, revers, effect, dissoci, retrospect, wonder, fact, dissoci, time, start, recent, bad, damn, disconnect, hard, exercis, tip]",0.541667,-0.198333,Negative
3,CPTSD,The child me thought I made the right choice by listening to him. And he said as much. That I had finally done something right . Especially when they kept blaming me for all the things I did wrong . Anyone else ?,child thought right choice listening said finally right especially kept blaming things wrong,[],"[child, thought, right, choice, listening, said, finally, right, especially, kept, blaming, things, wrong]","[child, thought, right, choice, listening, said, finally, right, especially, kept, blaming, thing, wrong]","[child, thought, right, choic, listen, said, final, right, especi, kept, blame, thing, wrong]",0.742857,0.017857,Positive
4,CPTSD,"Women: What is the real situation of misogyny, patriarchy, sexual abuse and harassment in your country?",women real situation misogyny patriarchy sexual abuse harassment country,[],"[women, real, situation, misogyny, patriarchy, sexual, abuse, harassment, country]","[woman, real, situation, misogyny, patriarchy, sexual, abuse, harassment, country]","[women, real, situat, misogyni, patriarchi, sexual, abus, harass, countri]",0.566667,0.350000,Positive
...,...,...,...,...,...,...,...,...,...,...
27673,suicidewatch,"But I want help doing it Tired of everyone saying no don't.\n\n*Get up so I can punch you again*\n\nI'm exhausted I want to clock out and be done everything just keeps getting worse, everything keeps losing value including myself- as if any of that even mattered.\n\nFuck the helpline I want real help I want help out",want help tired saying punch exhausted want clock keeps getting worse keeps losing value including mattered fuck helpline want real help want help,[],"[want, help, tired, saying, punch, exhausted, want, clock, keeps, getting, worse, keeps, losing, value, including, mattered, fuck, helpline, want, real, help, want, help]","[want, help, tired, saying, punch, exhausted, want, clock, keep, getting, worse, keep, losing, value, including, mattered, fuck, helpline, want, real, help, want, help]","[want, help, tire, say, punch, exhaust, want, clock, keep, get, wors, keep, lose, valu, includ, matter, fuck, helplin, want, real, help, want, help]",0.580000,-0.280000,Negative
27674,suicidewatch,Nothing to live for The ONLY reason I am alive right now is because of my sweet cat Pippin. Yesterday was the anniversary of adopting him 2 years ago. \nI've been really depressed and haven't been able to play with him as much so hes been meowing and being a little naughty as a result. I got so mad yesterday and yelled at him. All I can think about now is how I should give him to someone healt...,live reason alive right sweet cat pippin yesterday anniversary adopting 2 years ago depressed able play hes meowing little naughty result got mad yesterday yelled think healthier stable away live boyfriend hes far away friends spoken mother 17 dad disowned political opinion accomplished wish job guess minimum wage qualifications experience customer service caregiving waste resources hate skin ...,[],"[live, reason, alive, right, sweet, cat, pippin, yesterday, anniversary, adopting, years, ago, depressed, able, play, hes, meowing, little, naughty, result, got, mad, yesterday, yelled, think, healthier, stable, away, live, boyfriend, hes, far, away, friends, spoken, mother, 17, dad, disowned, political, opinion, accomplished, wish, job, guess, minimum, wage, qualifications, experience, custom...","[live, reason, alive, right, sweet, cat, pippin, yesterday, anniversary, adopting, year, ago, depressed, able, play, he, meowing, little, naughty, result, got, mad, yesterday, yelled, think, healthier, stable, away, live, boyfriend, he, far, away, friend, spoken, mother, 17, dad, disowned, political, opinion, accomplished, wish, job, guess, minimum, wage, qualification, experience, customer, s...","[live, reason, aliv, right, sweet, cat, pippin, yesterday, anniversari, adopt, year, ago, depress, abl, play, he, meow, littl, naughti, result, got, mad, yesterday, yell, think, healthier, stabl, away, live, boyfriend, he, far, away, friend, spoken, mother, 17, dad, disown, polit, opinion, accomplish, wish, job, guess, minimum, wage, qualif, experi, custom, servic, caregiv, wast, resourc, hate...",0.574048,0.023063,Positive
27675,suicidewatch,I’m going to fucking kill myself 18 years too long. I think I’m going to go,going fucking kill 18 years long think going,[],"[going, fucking, kill, 18, years, long, think, going]","[going, fucking, kill, 18, year, long, think, going]","[go, fuck, kill, 18, year, long, think, go]",0.600000,-0.325000,Negative
27676,suicidewatch,I’m going to pieces All I’ve done for about a month has been lay in bed. I don’t enjoy anything. Can’t focus on anything. I am terrified of the future. I don’t want to be alive. I am in so much emotional pain. The only reason I am alive is my father because I don’t want to hurt him. That and I can’t decide on a method. I think not wanting to hurt him keeps me from choosing a meth...,going pieces month lay bed enjoy focus terrified future want alive emotional pain reason alive father want hurt decide method think wanting hurt keeps choosing method future father longer think choice end life scared wish sleep wake hate alive,[],"[going, pieces, month, lay, bed, enjoy, focus, terrified, future, want, alive, emotional, pain, reason, alive, father, want, hurt, decide, method, think, wanting, hurt, keeps, choosing, method, future, father, longer, think, choice, end, life, scared, wish, sleep, wake, hate, alive]","[going, piece, month, lay, bed, enjoy, focus, terrified, future, want, alive, emotional, pain, reason, alive, father, want, hurt, decide, method, think, wanting, hurt, keep, choosing, method, future, father, longer, think, choice, end, life, scared, wish, sleep, wake, hate, alive]","[go, piec, month, lay, bed, enjoy, focu, terrifi, futur, want, aliv, emot, pain, reason, aliv, father, want, hurt, decid, method, think, want, hurt, keep, choos, method, futur, father, longer, think, choic, end, life, scare, wish, sleep, wake, hate, aliv]",0.437500,-0.012500,Negative


In [49]:
df['preprocessed_post']

0                                                                                                                                         [know, emotional, neglect, psychological, abuse, medical, abuse, bullying, csa, mess, right, feel, like, horrible, monster, somewhat, lot, people, attractive, facade, monster, built, unlovable, despised, physically, emotionally, born, working, working, feel, fit, world]
1        [title, person, hobbies, biking, drawing, reading, writing, walking, gaming, disliked, people, knew, free, time, clingy, years, hard, depression, sucdalty, thanks, cptsd, stopped, hobbies, entrench, work, meeting, people, obligatory, projects, like, drivers, license, extent, leaves, free, time, free, time, lie, couch, think, pity, hate, lot, way, watch, netflix, hours, doom, scroll, reddit, wa...
2                                                                                                          [yoga, years, tool, help, body, feeling, rough, form, reconnection, went, f

In [50]:
# Join each lemmatized token list back into one space-separated string
df['lemma_preprocessed_post'] = df['lemma_preprocessed_post'].apply(lambda x: ' '.join(x))

In [51]:
# Do the same for the stemmed tokens and the plain preprocessed tokens
df['stemma_preprocessed_post'] = df['stemma_preprocessed_post'].apply(lambda x: ' '.join(x))

df['preprocessed_post'] = df['preprocessed_post'].apply(lambda x: ' '.join(x))
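
One caveat with these join cells: they are not safe to re-run. Once a column already holds strings, `' '.join(x)` iterates over the characters and puts a space between each one. A small guard avoids that. This is only a sketch, and `join_tokens` is a hypothetical helper name, not something defined in this notebook.

```python
def join_tokens(value):
    """Join a token list into one string; leave already-joined strings untouched."""
    if isinstance(value, str):
        return value  # already joined on a previous run
    return " ".join(value)


tokens = ["yoga", "years", "tool", "help"]
once = join_tokens(tokens)
twice = join_tokens(once)  # second call is a no-op

print(once)
print(twice == once)
```

It would be applied the same way as the lambda, for example `df['preprocessed_post'].apply(join_tokens)`.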

In [52]:
df['preprocessed_post']

0                                                                                                                                                                          know emotional neglect psychological abuse medical abuse bullying csa mess right feel like horrible monster somewhat lot people attractive facade monster built unlovable despised physically emotionally born working working feel fit world
1        title person hobbies biking drawing reading writing walking gaming disliked people knew free time clingy years hard depression sucdalty thanks cptsd stopped hobbies entrench work meeting people obligatory projects like drivers license extent leaves free time free time lie couch think pity hate lot way watch netflix hours doom scroll reddit waste time browsing internet try sleep lot better inte...
2                                                                                                                                                yoga years tool help body feeling rou

In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27666 entries, 0 to 27677
Data columns (total 10 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Subreddit                 27666 non-null  object 
 1   reddit_post               27666 non-null  object 
 2   clean_post                27666 non-null  object 
 3   urls                      27666 non-null  object 
 4   preprocessed_post         27666 non-null  object 
 5   lemma_preprocessed_post   27666 non-null  object 
 6   stemma_preprocessed_post  27666 non-null  object 
 7   Subjectivity              27666 non-null  float64
 8   Polarity                  27666 non-null  float64
 9   sentiment                 27666 non-null  object 
dtypes: float64(2), object(8)
memory usage: 3.6+ MB
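
The summary reports 27666 entries with index labels running from 0 to 27677, so the labels have gaps. If positional access is wanted later (an assumption; the notebook does not say), the index can be renumbered. A minimal sketch on a toy frame standing in for `df`:

```python
import pandas as pd

# Toy frame standing in for df: dropping a row leaves a gap in the index labels
toy = pd.DataFrame({"post": ["a", "b", "c", "d"]}).drop(index=[1])
labels_before = list(toy.index)

# drop=True discards the old labels instead of keeping them as a new column
toy = toy.reset_index(drop=True)
labels_after = list(toy.index)

print(labels_before)
print(labels_after)
```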


In [54]:
# Save the DataFrame as 'interim_data.csv' for the data folder
# df.to_csv('interim_data.csv', index=False)
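
If the save is enabled, note that a CSV stores list-valued columns such as `urls` as their string form, so they come back as plain strings unless parsed on load. The sketch below assumes a `data/` subfolder (the actual layout is not shown in the notebook) and uses a toy frame and a temporary directory in place of the real `df`.

```python
import ast
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame standing in for df, with one list-valued column
toy = pd.DataFrame(
    {"clean_post": ["yoga years tool"], "urls": [["https://example.com"]]}
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "data" / "interim_data.csv"  # hypothetical folder layout
    out_path.parent.mkdir(parents=True, exist_ok=True)
    toy.to_csv(out_path, index=False)

    # Lists were written as text, so parse them back into Python lists on load
    restored = pd.read_csv(out_path, converters={"urls": ast.literal_eval})

print(type(restored.loc[0, "urls"]))
```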