<img src="../../Img/backdrop-wh.png" alt="Drawing" style="width: 300px;"/>

<div style="display: block; width: 100%; height: 100px;">

<p style="float: left;">
    <span style="font-weight: bold; font-size: 20px;">
        DIGHUM160 - Critical Digital Humanities 
        <br />
        Instructor: Tom van Nuenen<br />
        Final Project
    </span>
</p>
</div>

**Project title:** ```Understanding Abortion Discourse: A Computational Analysis of Personal Narratives on r/abortion```

**Research Question:** ```How do discussions on the r/abortion subreddit reflect the physical and emotional dimensions of the abortion experience, and what does this reveal about the needs and concerns of individuals seeking or having undergone an abortion?```

**Student name:** ```Dream Lopez```

In [2]:
# set up your environment

%pip install swifter
import pandas as pd
import json
import spacy
from tqdm import tqdm 
import numpy as np

from gensim import corpora, models, similarities
from gensim.models.coherencemodel import CoherenceModel
from gensim.models.ldamodel import LdaModel
%pip install pyLDAvis
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis

# < import other packages here >

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


## Introduction

The topic of abortion remains one of the most contentious and deeply personal issues in contemporary discourse, particularly following significant legal and political shifts such as the overturning of Roe v. Wade. This context has amplified the urgency to understand how abortion is perceived and discussed in various spaces. As a Data Science and Public Health major with a focus on female healthcare and gynecology, my interest lies in exploring how digital platforms, specifically the r/abortion subreddit, contribute to the broader dialogue surrounding abortion.

The r/abortion subreddit serves as a critical online community where individuals share their personal narratives and seek support related to abortion. It has emerged as a vital resource, offering a supportive environment that is heavily moderated to exclude harmful content. Users come to this space for advice and emotional support and to share their stories, which range from deeply personal experiences to practical advice on navigating the abortion process. The subreddit is meant to be a supportive oasis for those navigating the frightening experience of an accidental pregnancy who simply want support while making a difficult decision. Every poster receives a message warning that they may receive hateful messages and explaining how to deal with them, along with numerous resources and access to stories from other individuals, categorized by the type of abortion and the trimester in which it took place. Comments on each submission are kind and helpful, with responders assisting posters with both the mental and physical pain of the abortion experience.

Previous research has examined abortion experiences on Reddit and the language processing methods best suited to studying them. One study, which focuses on experiences during the COVID-19 pandemic, found themes of COVID-19-related barriers, quarantine-driven privacy challenges, and changes in the delivery of abortion services that left patients more isolated (Jacques et al.). Although this study is specific to the pandemic, I believe we may find some residual effects on abortion patients. I also believe the pandemic exacerbated issues and concerns that were already present in this subreddit, which means these themes have always been important to its users. A different study gave me insight into the natural language processing and social media mining methods that can be used within r/abortion to assess themes and affect. The authors found a “neural network–based topic modeling pipeline (BERTopic)” (Valdez et al.) to be effective, and they used iterative coherence and Text2Emotion to conduct “positive, neutral, and negative affect and an emotion analysis” (Valdez et al.). This is important for my project, as topic modeling and attention to the emotional language present in posts will allow for a more nuanced analysis.

This project aims to investigate how these narratives and personal stories contribute to the collective understanding of abortion and influence the stigma surrounding it. By analyzing the content shared on this subreddit, I seek to uncover how individual experiences shape and reflect broader societal attitudes towards abortion. The goal is to understand how these stories either challenge or reinforce existing stigmas and perceptions, providing insights into the ways in which online communities can affect public discourse on sensitive topics.
Through this analysis, I hope to reveal the ways in which personal stories on r/abortion contribute to a nuanced collective understanding of abortion and assess their role in shaping or mitigating stigma associated with the procedure. 

## Analysis

### Pre-Processing the Data

In [4]:
from multiprocessing import Pool

nlp = spacy.load("en_core_web_sm")

def spacy_preprocess(text):
    doc = nlp(text)

    #lemmatize and drop stop words, punctuation, and URLs
    tokens = [token.lemma_ for token in doc if not token.is_stop and not token.is_punct and not token.like_url]

    #join tokens back to a single string
    return ' '.join(tokens)

def preprocess_chunk(chunk):
    #apply the per-text preprocessing to every entry of one chunk (a pandas Series)
    return chunk.apply(spacy_preprocess)

def preprocess_parallel(data, num_partitions, num_workers):
    #split the Series into chunks and preprocess each chunk in its own worker process
    data_split = np.array_split(data, num_partitions)
    pool = Pool(num_workers)
    data = pd.concat(pool.map(preprocess_chunk, data_split))
    pool.close()
    pool.join()
    return data

In [6]:
#loading r/abortion submissions
submissions = pd.read_csv('data/submissions.csv')

#drop columns that we don't need
submissions = submissions.drop(['self', 'nsfw', 'url', 'subreddit', 'augmented_at', 'augmented_count'], axis=1)

#select rows that don't have 'removed' or 'deleted' as the selftext
submissions = submissions.loc[~submissions['selftext'].isin(['[removed]', '[deleted]' ]),:]

#select all rows that have >3 characters in selftext
submissions = submissions.loc[submissions['selftext'].str.len() > 3]

#drop null values in selftext
submissions = submissions.dropna(subset=['selftext'])

submissions.shape

(27419, 12)

In [8]:
#apply preprocessing to the submissions 'selftext' column (this takes a LONG time)
submissions['processed_selftext'] = submissions['selftext'].apply(spacy_preprocess)

submissions.sample(5)

Unnamed: 0,idint,idstr,created,author,title,selftext,score,distinguish,textlen,num_comments,flair_text,flair_css_class,processed_selftext
26594,1766524358,t3_t7qqx2,1646540047,Adept-Seesaw142,I regret my decision,In November 2021 I was still with my ex at the...,4,,9,3,,,November 2021 ex time birth control pull metho...
16674,1254096349,t3_kqnmxp,1609811316,AcanthocephalaAny467,Help!! Just vaginally inserted pills!,I just vaginally inserted the 4 pills and I’m ...,4,,200,22,,,vaginally insert 4 pill scared nervous come sh...
26903,1780302155,t3_tfy1xn,1647478781,Im-a-lonely-dirtbag,Regret,I did it. I got the abortion. I wish I didn’t....,1,,666,4,,,get abortion wish parent check location find m...
7598,731587848,t3_c3kgso,1561172810,aea0825,24 Hours After Medical,"I am in SO much pain right now, more than I wa...",3,,394,7,,,pain right yesterday early today 6 week AWFUL ...
15895,1218060581,t3_k579k5,1606906923,Lovecats-3,Obsessing over the what if's.,So I had my abortion a few months ago in secre...,3,,910,3,,,abortion month ago secret tell father miscarry...


In [38]:
submissions.to_csv('data/submissions_processed.csv', index=False)
submissions_processed = pd.read_csv('data/submissions_processed.csv')

In [30]:
#loading r/abortion comments
comments = pd.read_csv('data/comments.csv')

#drop columns that we don't need
comments = comments.drop(['subreddit'], axis=1)

#select rows that don't have 'removed' or 'deleted' as the body
comments = comments.loc[~comments['body'].isin(['[removed]', '[deleted]' ]),:]

#select all rows that have >10 characters in body
comments = comments.loc[comments['body'].str.len() > 10]

#drop null values in body
comments = comments.dropna(subset=['body'])

#only look at popular comments
comments = comments.loc[comments['score'] > 4]

comments.shape

(31287, 10)

In [32]:
#apply preprocessing to the comments 'body' column (this takes a while)
comments['processed_body'] = comments['body'].apply(spacy_preprocess)

comments.sample(5)

Unnamed: 0,idint,idstr,created,author,parent,submission,body,score,distinguish,textlen,processed_body
108239,35003870227,t1_g2wef4j,1598440939,kv617,t3_igv0rd,t3_igv0rd,YES! It is 1000% OK to not tell your abuser. I...,16.0,,269.0,yes 1000 ok tell abuser sorry situation know r...
152118,37073246323,t1_h14gbxv,1623219675,Typical-Foundation-6,t3_nvmbyo,t3_nvmbyo,I actually came on here to also post. I’m abou...,10.0,,963.0,actually come post abortion day feel conflicte...
63343,33152899770,t1_f8adp6i,1574394434,grace64123,t3_dzuuyg,t3_dzuuyg,This is not a decision anyone can make for you...,18.0,,852.0,decision need think long hard ok raise child f...
208871,39595800091,t1_i6ubevv,1651358595,TrustedAdult,t3_ufkikw,t3_ufkikw,I see a_a directed you to others' stories alre...,7.0,,829.0,a_a direct story let know question read \n\n...
236591,40345521804,t1_ij8ojv0,1659825283,[deleted],t3_whtcuj,t3_whtcuj,I can't tell you what to do but just tell you ...,6.0,,9.0,tell tell story situation 20 college pregnant ...


In [40]:
comments.to_csv('data/comments_processed.csv', index=False)
comments_processed = pd.read_csv('data/comments_processed.csv')

After loading the datasets containing submissions and comments from the r/abortion subreddit, I performed several preprocessing steps to clean and refine the data for analysis. First, I removed unnecessary columns, filtered out entries that had been removed or deleted, and eliminated null values. I also kept only entries with a minimum amount of text and, for comments, only those with a score above 4. Focusing on submissions and comments with substantial text content helps ensure that the analysis is based on meaningful and relevant discussions within the subreddit.

To further refine the text, I used spaCy, a natural language processing library. spaCy performs lemmatization, which reduces words to their base forms, and I used its token attributes to remove stop words, punctuation, and URLs so that the analysis focuses on the most significant parts of the text. These steps helped normalize the data, making it suitable for computational analysis.

These comprehensive pre-processing steps are crucial for ensuring that the data is in a suitable format for the application of various natural language processing (NLP) techniques. With this cleaned and normalized dataset, I can now proceed to apply distant reading methods, such as topic modeling and word embeddings, to uncover themes, patterns, and sentiments within the subreddit. Additionally, I will conduct close reading of specific posts and comments to explore the emotional and psychological dimensions of the discourse.

### Topic Modeling

In [74]:
from tqdm import tqdm 

submissions_processed = submissions_processed.dropna(subset=['processed_selftext'])
submissions_processed['processed_selftext'] = submissions_processed['processed_selftext'].astype(str)
lemmas_split = [lemma.split() for lemma in tqdm(submissions_processed['processed_selftext'])]

100%|█████████████████████████████████| 27390/27390 [00:00<00:00, 157248.51it/s]


In [76]:
from gensim import corpora, models, similarities
from gensim.models.coherencemodel import CoherenceModel

# Create Dictionary 
dictionary = corpora.Dictionary(tqdm(lemmas_split))

# filter extremes and assign new ids
dictionary.filter_extremes(no_below=10, no_above=0.4)
dictionary.compactify() 

# SAVE DICT
dictionary.save('../../data/abortion.db')

# Create Document-Term Matrix of our whole corpus 
corpus = [dictionary.doc2bow(text) for text in tqdm(lemmas_split)]

100%|██████████████████████████████████| 27390/27390 [00:01<00:00, 16819.67it/s]
100%|██████████████████████████████████| 27390/27390 [00:00<00:00, 28000.15it/s]
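
The `filter_extremes(no_below=10, no_above=0.4)` call above prunes the vocabulary by document frequency. As a rough illustration of that idea, here is a minimal pure-Python sketch. The function name `filter_vocabulary` and the toy corpus are hypothetical, and the sketch ignores gensim's additional `keep_n` cap, so it is an approximation of the behavior and not gensim's implementation.

```python
from collections import Counter

def filter_vocabulary(docs, no_below, no_above):
    """Keep tokens that appear in at least `no_below` documents and in
    no more than a `no_above` fraction of all documents."""
    num_docs = len(docs)
    # document frequency: number of documents each token appears in
    doc_freq = Counter(token for doc in docs for token in set(doc))
    max_docs = int(no_above * num_docs)
    return {tok for tok, df in doc_freq.items() if no_below <= df <= max_docs}

# hypothetical toy corpus of already-lemmatized posts
toy_docs = [
    ["abortion", "pill", "cramp"],
    ["abortion", "pill", "bleed"],
    ["abortion", "clinic", "nurse"],
    ["abortion", "pill", "nurse"],
    ["abortion", "scared"],
    ["abortion", "decision"],
]

kept = filter_vocabulary(toy_docs, no_below=2, no_above=0.5)
print(sorted(kept))
```

In this toy corpus, "abortion" is dropped because it appears in every post, and words that appear only once are dropped as too rare. The same logic explains why a word as central as "abortion" may be missing from the topic terms if it occurs in more than 40% of submissions.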


In [78]:
from gensim.models.ldamodel import LdaModel

# NOTE: `%time` on a line by itself times an empty statement, so the timing printed below
# does not measure training; `%%time` on the first line of the cell would time the whole cell
%time
lda_model = LdaModel(corpus=tqdm(corpus),   # stream of document vectors (bag-of-words)
            id2word=dictionary,       # mapping from word IDs to words (for determining vocab size)
            num_topics=10,            # number of topics
            random_state=100,         # seed to generate random state; useful for reproducibility
            passes=2,                 # number of passes through the corpus during training
            per_word_topics=False)    # whether to also compute most-likely topics for each word

CPU times: user 3 μs, sys: 1 μs, total: 4 μs
Wall time: 8.82 μs


100%|███████████████████████████████████| 27390/27390 [00:09<00:00, 2977.55it/s]


In [80]:
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
pyLDAvis.enable_notebook()

# feed the LDA model into the pyLDAvis instance
lda_viz = gensimvis.prepare(lda_model, corpus, dictionary, n_jobs=1)
lda_viz

In [122]:
topics = lda_model.show_topics(num_topics=-1, formatted=False)
topics

[(0,
  [('pill', 0.024760585),
   ('period', 0.017285448),
   ('help', 0.014103417),
   ('sex', 0.011986713),
   ('plan', 0.011636759),
   ('need', 0.011061965),
   ('get', 0.010973378),
   ('take', 0.010408051),
   ('find', 0.010284611),
   ('test', 0.009492802)]),
 (1,
  [('MA', 0.034167793),
   ('SA', 0.02475547),
   ('ma', 0.023582254),
   ('say', 0.019119488),
   ('get', 0.01422884),
   ('tell', 0.013869076),
   ('iud', 0.0114279585),
   ('ask', 0.011247208),
   ('cry', 0.009316007),
   ('think', 0.008930413)]),
 (2,
  [('pregnancy', 0.036923602),
   ('test', 0.030861923),
   ('doctor', 0.01565668),
   ('ultrasound', 0.014004076),
   ('positive', 0.013347966),
   ('take', 0.01047859),
   ('period', 0.01014441),
   ('birth', 0.009899485),
   ('get', 0.008615969),
   ('control', 0.008308499)]),
 (3,
  [('take', 0.026002312),
   ('pill', 0.02092237),
   ('bleed', 0.018857727),
   ('cramp', 0.018450862),
   ('hour', 0.015615141),
   ('period', 0.015543324),
   ('pain', 0.014293937),
 

Our topic modeling results reveal several recurring themes in the abortion discourse on the r/abortion subreddit. These topics reflect distinct aspects of the conversations occurring in this online community. I have categorized them into medical concerns, emotional responses, decision-making processes, and social support systems.
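
To make the raw `show_topics` output easier to read alongside these categories, the tuples can be condensed into one line per topic. The sketch below is illustrative only: the helper `summarize_topics` and the `theme_labels` dictionary are hypothetical names, the weights are rounded copies of the output above, and only topics 0 and 2 are included.

```python
# top terms copied (rounded) from the show_topics output above; topics 0 and 2 only
topics_excerpt = [
    (0, [("pill", 0.0248), ("period", 0.0173), ("help", 0.0141),
         ("sex", 0.0120), ("plan", 0.0116)]),
    (2, [("pregnancy", 0.0369), ("test", 0.0309), ("doctor", 0.0157),
         ("ultrasound", 0.0140), ("positive", 0.0133)]),
]

# hand-assigned labels following the grouping in the text (an assumption of this sketch)
theme_labels = {0: "medical concerns", 2: "medical concerns"}

def summarize_topics(topics, labels, top_n=4):
    """Return one readable line per topic: id, theme label, and top terms."""
    lines = []
    for topic_id, terms in topics:
        top_terms = ", ".join(word for word, _ in terms[:top_n])
        label = labels.get(topic_id, "unlabeled")
        lines.append(f"Topic {topic_id} ({label}): {top_terms}")
    return lines

summary = summarize_topics(topics_excerpt, theme_labels)
for line in summary:
    print(line)
```

Keeping the labels in a separate dictionary makes the interpretive step explicit: the model produces only word lists, and the theme names are a human reading of them.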

First are medical concerns, which include topic 0 and topic 2. Topic 0 focuses on practical advice and medical concerns, such as taking the abortion pill, managing periods, and accessing Plan B. This topic is dominated by terms like "pill," "period," "help," and "test," indicating that users frequently seek advice and share experiences related to the medical aspects of abortion. The prominence of these terms suggests that a large portion of the subreddit is dedicated to information exchange, especially surrounding the logistics and health implications of abortion. Topic 2 also centers on medical concerns, with terms like "pregnancy," "test," "doctor," and "ultrasound." This topic captures discussions of confirming a pregnancy, medical consultations, and decisions after a positive pregnancy test. It shows that users turn to this subreddit to navigate the complexities of reproductive health decisions.

The next theme, emotional responses, encompasses topic 7 and topic 8. Topic 7 contains words that represent users’ emotional struggles and fears, such as “scared,” “tell,” and “baby.” This emphasizes the psychological and emotional dimensions of their experiences. The repeated use of "m" and "not" suggests a narrative of uncertainty and hesitation, further highlighting the emotional weight of the discussions. Topic 8 centers on the physical discomfort that comes with abortion, with terms like “pain,” “bad,” and “try.” Users use this subreddit to discuss their pain management and recovery experiences. Together, these topics underscore the physical and emotional toll of an abortion and how users seek support and reassurance from this community.

Next is the decision-making group, which includes topic 6 and topic 5. Topic 6 covers the decision-making process with terms like “think,” “decision,” and “life.” It captures the moral and ethical considerations that users weigh when deciding whether an abortion is the right choice for them. The presence of words like “child” and “year” suggests that users often consider abortion within broader life contexts, such as future plans and responsibilities. Topic 5 addresses the social and political dimensions of abortion with terms like “woman,” “people,” “life,” and “choice.” Here, individual experiences connect to a broader debate about women’s rights, bodily autonomy, and the societal implications of abortion. Additionally, the terms “experience,” “support,” and “pro” could indicate that users share their personal stories and engage in advocacy and support for others who are going through similar experiences.

Topic 4 and topic 3 discuss the procedural and clinical experiences. In topic 4 we have terms like “procedure,” “nurse,” “room,” and “appointment,” suggesting that users frequently discuss their experience within the healthcare setting including logistics and their feelings during the procedure. There is a focus on “pain,” “minute,” and “experience” indicating these discussions are detailed and personal, giving specifics to the procedural aspects of an abortion. Topic 3 instead focuses on the physical aspects of getting an abortion, especially the aftermath, with words like “bleed,” “cramp,” “pain,” and “clot.” This indicates that users are sharing detailed experiences especially with medication-induced abortion. This topic emphasizes the importance of peer support and information-sharing in order to manage physical aspects of abortion.

Lastly, topic 9 covers communication and storytelling, with terms including “tell,” “say,” “friend,” and “talk.” This topic shows that communication and storytelling are crucial parts of r/abortion. Sharing stories could be a means of coping with an experience, or of offering others the support that posters themselves did not have. It’s clear that communication is an incredibly important part of this subreddit, allowing users to connect with each other through their shared experiences and collective advice.

The analysis of these topics provides a comprehensive view of the abortion discourse on r/abortion. The subreddit serves as a critical space for users to seek medical advice, share emotional experiences, navigate decision-making processes, and find social support. The diversity of topics highlights the multifaceted nature of abortion experiences, encompassing both the physical and emotional dimensions of reproductive health. By examining these discussions, we gain a deeper understanding of the personal narratives that shape the abortion discourse, revealing the complexities and nuances that characterize this deeply personal and often contentious issue.
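
Claims about how much of the subreddit a theme occupies could be checked by assigning each post its most probable topic and counting. The following is a minimal sketch under stated assumptions: the per-document distributions are made-up toy values in the `(topic_id, probability)` shape that gensim's `lda_model.get_document_topics(bow)` returns, and `dominant_topic` is a hypothetical helper, not part of the original analysis.

```python
from collections import Counter

def dominant_topic(doc_topics):
    """Return the topic id with the highest probability for one document."""
    return max(doc_topics, key=lambda pair: pair[1])[0]

# hypothetical toy distributions; real ones would come from the trained model
toy_doc_topics = [
    [(0, 0.70), (3, 0.30)],
    [(3, 0.55), (7, 0.45)],
    [(3, 0.80), (4, 0.20)],
    [(6, 0.60), (5, 0.40)],
]

# count how often each topic is the dominant one, then convert to shares
counts = Counter(dominant_topic(d) for d in toy_doc_topics)
shares = {topic: n / len(toy_doc_topics) for topic, n in counts.items()}
print(shares)
```

Assigning a single dominant topic discards the mixed membership that LDA models, so shares computed this way are best read as a rough summary.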

### Word Embeddings

In [155]:
from IPython.display import IFrame

# Display the HTML plot in the notebook
IFrame("tsne_plot.html", width=800, height=600)

In [157]:
from IPython.display import IFrame

# Display the HTML plot in the notebook
IFrame("scatter_plot.html", width=800, height=600)

#target1 = ['pain' , 'blood' , 'hurt' , 'traumatic' , 'cramps' , 'bleeding' , 'discomfort' , 'procedure']
#target2 = ['anxiety' , 'scared' , 'fear' , 'stress' , 'grief' , 'guilt' , 'support' , 'help']

Note that the diagrams above were imported from my week 4 Word_Embeddings_Project file.

I wanted to focus on the difference between the physical process of an abortion and its more emotional aspects. My targets were:

**Target 1:** ['pain' , 'blood' , 'hurt' , 'traumatic' , 'cramps' , 'bleeding' , 'discomfort' , 'procedure'] 

**Target 2:** ['anxiety' , 'scared' , 'fear' , 'stress' , 'grief' , 'guilt' , 'support' , 'help']

**Target 1 biased words:** 'period', 'painful', 'cramping', 'heavy', 'period.', 'nausea', 'clots', 'bleed', 'bleeding.', 'regular', 'period,', 'cramps.', 'mild', 'bleeding,', 'menstrual', 'spotting'

**Target 2 biased words:** 'resources', 'support.', 'shame', 'respect', 'hate', 'support,', 'accept', 'encourage', 'supporting', 'manipulate', 'judgement', 'beliefs'

The biased words for Target 1 and Target 2 reflect two distinct yet intertwined aspects of the abortion experience. The physical process focuses on the pain, bleeding, and includes comparisons to menstruation. This emphasizes the tangible, challenging bodily experience that comes with abortion. On the other hand, the emotional aspects reveal a focus on the need for support, burden of shame, guilt, and societal judgment. This analysis suggests that the discourse on r/abortion is deeply concerned with both the physical and emotional complexities of abortion. Understanding these dimensions is crucial for addressing the needs of individuals undergoing the procedure, as it highlights the importance of both medical care and emotional support in the abortion experience.
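
The code that produced these biased-word lists lives in the week 4 notebook and is not shown here. One common way to score such bias, offered only as an assumption about the general approach, is to compare a word's average cosine similarity to each target set. The vectors below are hypothetical 2-D toy values, and `bias_score` is an illustrative helper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bias_score(word, vectors, target1, target2):
    """Positive: closer to target1 on average; negative: closer to target2."""
    sim1 = sum(cosine(vectors[word], vectors[t]) for t in target1) / len(target1)
    sim2 = sum(cosine(vectors[word], vectors[t]) for t in target2) / len(target2)
    return sim1 - sim2

# hypothetical 2-D vectors; real embeddings would come from a model trained on the subreddit
toy_vectors = {
    "pain": (1.0, 0.1), "bleeding": (0.9, 0.2),
    "fear": (0.1, 1.0), "guilt": (0.2, 0.9),
    "cramping": (0.95, 0.15), "shame": (0.15, 0.95),
}
target1 = ["pain", "bleeding"]   # physical terms
target2 = ["fear", "guilt"]      # emotional terms

scores = {w: bias_score(w, toy_vectors, target1, target2) for w in ("cramping", "shame")}
print(scores)
```

With these toy vectors, "cramping" scores positive (toward the physical set) and "shame" scores negative (toward the emotional set), which mirrors how the lists above separate the two vocabularies.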

When visualizing the word embeddings of the selected target words, a distinct pattern emerged between the two categories providing valuable insight into the discourse on r/abortion.

Target 1 words, representing the physical process of abortion, were predominantly clustered together in the top right quadrant of the graph, with the exception of “painful,” which was slightly removed from the cluster. This tight clustering suggests that discussions around the physical aspects of abortion, such as pain, bleeding, and discomfort, are highly interconnected and often occur in similar contexts within the subreddit. The proximity of these words to one another indicates that when users discuss one physical symptom, they frequently mention others, reflecting a focused and perhaps shared experience of the physical challenges of abortion.

In contrast, Target 2 words, which reflect the emotional and psychological dimensions of abortion, were more dispersed across the graph. This spread suggests a broader set of discussions related to emotional support and feelings. The lack of overlap with Target 1 words implies that conversations about emotional aspects occur in different contexts from those about physical symptoms. Users may compartmentalize these two dimensions of the abortion experience, which reflects the complexity and individuality of emotional responses to abortion. The spread also points to the different contexts in which these emotions are discussed, ranging from seeking support to grappling with feelings of guilt, fear, or stress.

This analysis reveals that within the r/abortion subreddit, discussions of the physical process of abortion tend to be concentrated and focused, as users frequently share experiences of pain, cramping, and bleeding in a tightly connected discourse. Conversations about emotional and psychological aspects, on the other hand, are more dispersed, reflecting a wider range of experiences and issues, from seeking support to dealing with complex emotions. It also suggests that different emotions are rarely discussed together: when describing pain, users tend to name all of their physical symptoms at once, whereas they seldom list all of their feelings in the same post.
The divergence in the clustering patterns of Target 1 and Target 2 words supports the conclusion that the subreddit serves as a multifaceted space where users seek and share information about the physical realities of abortion while also navigating a diverse range of emotional experiences. This separation highlights the complex nature of abortion, which affects both body and mind, and emphasizes the importance of creating spaces that address both aspects in order to support individuals going through this process.
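
The visual impression of "tight" versus "dispersed" clusters could be backed by a simple number, such as the mean pairwise distance within each target set. The sketch below uses hypothetical 2-D coordinates standing in for the plotted points, and `mean_pairwise_distance` is an illustrative helper. Because t-SNE does not preserve distances faithfully, computing the same statistic on the original embedding vectors would likely be more reliable.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# hypothetical plot coordinates, not values read from the actual figures
physical = [(8.0, 7.5), (8.4, 7.9), (7.8, 8.1), (8.2, 7.2)]
emotional = [(-6.0, 3.0), (2.5, -7.0), (5.0, 1.0), (-3.0, -4.5)]

spread_physical = mean_pairwise_distance(physical)
spread_emotional = mean_pairwise_distance(emotional)
print(round(spread_physical, 2), round(spread_emotional, 2))
```

A smaller value for the physical set than for the emotional set would be consistent with the clustering pattern described above.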

## Conclusions

This research has uncovered distinct patterns in the discourse on the r/abortion subreddit, revealing how users navigate both the physical and emotional dimensions of the abortion experience. Through topic modeling, we identified recurring themes such as medical concerns, emotional responses, decision-making processes, and social support systems. These themes demonstrate the multifaceted nature of the discussions on this subreddit: users sought advice on medical procedures while also dealing with the emotional weight of their decisions.

The word embeddings further highlighted the separation between discussions of physical symptoms and emotional experiences, suggesting that users compartmentalize these aspects of their abortion journeys. Physical concerns like pain and bleeding are tightly clustered, indicating a shared experience among users, while emotional aspects such as anxiety, guilt, and support are more dispersed, reflecting the individual and varied nature of these feelings.

These findings emphasize the importance of creating supportive spaces that address both the physical and emotional needs of individuals undergoing abortion. Future research could explore how emotional discourse evolves over time, especially in response to external factors such as legal changes or societal attitudes. Additionally, investigating how different demographic groups engage with this subreddit could provide further insights into the needs and concerns of specific and diverse communities. This study lays the groundwork for a deeper understanding of abortion discourse in digital spaces and its potential to inform more compassionate and comprehensive support systems. We must better understand and serve those undergoing abortion. As we saw, people had extremely personal and unique experiences, which together created an environment full of knowledge and collective support. Individuals need comprehensive support for any medical procedure, but especially for abortion.

## References

Valdez, Daniel, et al. "Analyzing Reddit Forums Specific to Abortion That Yield Diverse Dialogues Pertaining to Medical Information Seeking and Personal Worldviews: Data Mining and Natural Language Processing Comparative Study." Journal of Medical Internet Research, vol. 26, 2024, p. e47408, https://doi.org/10.2196/47408.

Jacques, Leah, et al. “‘I'm Going to Be Forced to Have a Baby’: A Study of COVID-19 Abortion Experiences on Reddit.” Perspectives on Sexual and Reproductive Health, 2023, https://doi.org/10.1363/psrh.12225.

In [182]:
# This code cell will give you a word count, in case you need it. 
# Note that this opens the file you are currently in, so make sure to save it first to get an accurate word count.

with open('DIGHUM160_essay_Lopez.ipynb',encoding='utf-8') as json_file:
    data = json.load(json_file)

wordCount = 0
for each in data['cells']:
    cellType = each['cell_type']
    if cellType == "markdown":
        content = each['source']
        for line in content:
            # you might want to filter for more markdown keywords here
            temp = [word for word in line.split() if "#" not in word] 
            wordCount = wordCount + len(temp)
            
print(wordCount)

2640
