In [2]:
from legislation_analysis.utils.constants import (
    CONGRESS_DATA_CLEANED_FILE,
    CONGRESS_DATA_CLUSTERED_FILE,
    CONGRESS_DATA_NER_FILE,
    CONGRESS_DATA_POS_TAGGED_FILE,
    CONGRESS_DATA_TOKENIZED_FILE,
    SCOTUS_DATA_CLEANED_FILE,
    SCOTUS_DATA_CLUSTERED_FILE,
    SCOTUS_DATA_NER_FILE,
    SCOTUS_DATA_POS_TAGGED_FILE,
    SCOTUS_DATA_TOKENIZED_FILE,
)
from legislation_analysis.utils.functions import load_file_to_df

# From Roe to Dobbs: Tracing the Legislative Shift in Abortion Rights in the United States

## Rationale and Research Focus

This project analyzes the evolution of abortion discourse in U.S. legislation,
particularly between the landmark Roe v. Wade (1973) and Dobbs v. Jackson Women's
Health Organization (2022) decisions, which respectively established and then
overturned abortion as a constitutional right. We have assembled a dataset of
congressional legislation from 1973 to 2024 alongside a dataset of opinions from 13
pivotal Supreme Court cases.

Our main research questions are as follows:

1. How, if at all, has abortion discourse evolved between the two landmark decisions?
2. What congressional legislation came as a direct result of, or in contradiction to, 
  abortion-related SCOTUS decisions?
3. What are the main arguments used in favor of and opposed to abortion?

This project focuses on two main datasets: abortion-related SCOTUS opinions and
abortion-related congressional legislation. For congressional legislation, we
explore all legislation from 1973 until 2024. We sourced this legislation from the
congress.gov legislation search, where we filtered for any legislation within this
period that could have become law and included the following keywords in the bill
text or summary: 'abortion,' 'reproduction,' or 'reproductive health care.' For the
SCOTUS opinions, we targeted the decisions listed on
`supreme.justia.com`, which provides a list of abortion-relevant SCOTUS decisions
from 1965-2022. From there, we used web scraping to extract the relevant information
for each SCOTUS opinion, including the opinion text.
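
The keyword filter described above was applied through the congress.gov search itself. As a rough local re-check, the same rule can be sketched as follows; the function name, the case-insensitive substring matching, and the sample records are assumptions for illustration, not the project's code or data.

```python
import re

# Keywords quoted in the text above; the matching rule itself is an assumption.
KEYWORDS = ("abortion", "reproduction", "reproductive health care")
KEYWORD_PATTERN = re.compile(
    "|".join(re.escape(keyword) for keyword in KEYWORDS), flags=re.IGNORECASE
)


def mentions_keyword(text, summary):
    """Return True if any keyword appears in the bill text or its summary."""
    combined = f"{text or ''} {summary or ''}"
    return bool(KEYWORD_PATTERN.search(combined))


# Hypothetical records, not rows from the project dataset.
bills = [
    {"title": "Example Act A", "text": "A bill on reproductive health care.", "summary": None},
    {"title": "Example Act B", "text": "A bill on highway funding.", "summary": "Roads."},
]
matching = [b["title"] for b in bills if mentions_keyword(b["text"], b["summary"])]
print(matching)
```

Substring matching of this kind is deliberately broad, which is one reason an `unrelated` cluster is expected in the methods that follow.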

## Methods

Based on our outlined questions, we expect to apply the following methods:

1. A **clustering method** to categorize the congressional legislation into three 
  groups: `pro-abortion`, `anti-abortion`, and `unrelated` (noise). We will review 
  and discard the unrelated legislation.
2. Similarly, a **clustering method** to categorize arguments within SCOTUS opinions 
  as either pro- or anti-abortion.
3. A **structural topic model** fit to the two relevant clusters to explore the 
  evolution of arguments for and against abortion over time. Due to the absence of 
  specific dates on the legislation, we will use the congressional session number as 
  a temporal reference.
4. A **named-entity recognition** model for extracting referenced individuals, other 
  congressional legislation, and, most importantly, the relevant Supreme Court cases.
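
Because the congressional session number serves as the temporal reference, it helps to be able to map a Congress number to calendar years. The sketch below assumes the number can be read from a header such as `[Congressional Bills 118th Congress]`, as seen in the text previews later in this notebook; the helper names are hypothetical. It relies on the fact that the 1st Congress convened in 1789 and each Congress lasts two years.

```python
import re

# Matches headers like "[Congressional Bills 118th Congress]".
CONGRESS_HEADER = re.compile(r"(\d+)(?:st|nd|rd|th) Congress")


def parse_congress_number(text):
    """Extract the Congress number from a bill header, or None if absent."""
    match = CONGRESS_HEADER.search(text)
    return int(match.group(1)) if match else None


def congress_to_years(congress):
    """Map a Congress number to the two calendar years in which it mainly sits."""
    if congress < 1:
        raise ValueError("Congress numbers start at 1")
    first_year = 1789 + 2 * (congress - 1)
    return first_year, first_year + 1


header = "[Congressional Bills 118th Congress]"
number = parse_congress_number(header)
years = congress_to_years(number)
print(number, years)
```

This gives two-year resolution only, so a bill cannot be placed more precisely than its session.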

We plan to use the Supreme Court decisions as benchmarks to categorize legislative
sessions. For example, if one abortion case was decided in 2010 and another in 2015,
we would assess the semantic, tonal, and linguistic similarities among all
congressional legislation introduced between those years. We hypothesize that the
substance of a Supreme Court opinion significantly influences abortion-related
legislation from the time the opinion is issued until a subsequent opinion emerges.
We will also determine which opinions, aside from Roe and Dobbs, exert the most
influence on legislative documents. Finally, as a stretch goal, we will create a
network graph that centers on the SCOTUS opinions and links out to pieces of
congressional legislation.
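
The benchmarking idea can be sketched as assigning each piece of legislation to the most recent decision issued on or before it. The decision years below mirror the 2010 and 2015 example in the text; the case names and the function are placeholders, not project code.

```python
from bisect import bisect_right

# Hypothetical decisions, sorted by year, matching the 2010/2015 example above.
decision_years = [2010, 2015]
decision_names = ["Case A", "Case B"]


def governing_decision(bill_year):
    """Return the latest decision issued on or before bill_year, or None."""
    index = bisect_right(decision_years, bill_year)
    return decision_names[index - 1] if index else None


assignments = {year: governing_decision(year) for year in (2009, 2012, 2015, 2020)}
print(assignments)
```

At year granularity, a bill introduced earlier in the same year as a decision would be attributed to that decision, so finer-grained dates would be needed to resolve such cases.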

## Analysis

[small blurb]

## Results

## Conclusions and Reflections

## Step-by-Step Processing

In [3]:
# Raw Congressional legislation DataFrame preview

In [4]:
# Raw SCOTUS decision DataFrame preview

In [5]:
# Cleaned Congressional legislation DataFrame preview
cleaned_cong_df = load_file_to_df(CONGRESS_DATA_CLEANED_FILE)
cleaned_cong_df[
    ["legislation number", "title", "raw_text", "cleaned_text"]
].head()

Unnamed: 0,legislation number,title,raw_text,cleaned_text
0,H.R. 2907,Let Doctors Provide Reproductive Health Care Act,\n[Congressional Bills 118th Congress]\n[From ...,[Congressional Bills 118th Congress] [From the...
1,S. 1297,Let Doctors Provide Reproductive Health Care Act,\n[Congressional Bills 118th Congress]\n[From ...,[Congressional Bills 118th Congress] [From the...
2,H.R. 4901,Reproductive Health Care Accessibility Act,\n[Congressional Bills 118th Congress]\n[From ...,[Congressional Bills 118th Congress] [From the...
3,S. 2544,Reproductive Health Care Accessibility Act,\n[Congressional Bills 118th Congress]\n[From ...,[Congressional Bills 118th Congress] [From the...
4,H.R. 4147,Reproductive Health Care Training Act of 2023,\n[Congressional Bills 118th Congress]\n[From ...,[Congressional Bills 118th Congress] [From the...
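
Comparing `raw_text` with `cleaned_text` in the preview above suggests that line breaks and repeated whitespace are collapsed into single spaces. A minimal sketch of that one step, assuming a regex-based approach (the project's actual cleaning code is not shown here and may do more):

```python
import re


def collapse_whitespace(raw_text):
    """Replace every run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", raw_text).strip()


# Sample string shaped like the truncated raw_text values in the preview.
raw = "\n[Congressional Bills 118th Congress]\n[From the"
cleaned = collapse_whitespace(raw)
print(cleaned)
```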


In [6]:
# Cleaned SCOTUS decision DataFrame preview
cleaned_scotus_df = load_file_to_df(SCOTUS_DATA_CLEANED_FILE)
cleaned_scotus_df[["title", "raw_text", "cleaned_text"]].head()

Unnamed: 0,title,raw_text,cleaned_text
0,Dobbs v. Jackson Women's Health Organization,\n \n \n \n \n \n \n \n \n \n \n ...,"1 (Slip Opinion) OCTOBER TERM, 2021 Syllabus N..."
1,Whole Woman's Health v. Hellerstedt,\n \n \n \n \n \n \n \n \n \n \n \n \...,"1 (Slip Opinion) OCTOBER TERM, 2015 Syllabus N..."
2,Gonzales v. Carhart,"(Bench Opinion) OCTOBER TERM, 2006 1 \n \nSyl...","(Bench Opinion) OCTOBER TERM, 2006 1 Syllabus ..."
3,Stenberg v. Carhart,530US2 Unit: $U85 [11-21-01 16:51:50] PAGES PG...,530US2 Unit: U85 [11-21-01 16:51:50] PAGES PGT...
4,Planned Parenthood of Southeastern Pennsylvani...,505us3u117 07-09-96 09:34:02 PAGES OPINPGT\n83...,505us3u117 07-09-96 09:34:02 PAGES OPINPGT 833...


In [7]:
# Tokenized Congressional legislation DataFrame preview
tokenized_cong_df = load_file_to_df(CONGRESS_DATA_TOKENIZED_FILE)
tokenized_cong_df[
    [
        "legislation number",
        "title",
        "cleaned_text",
        "tokenized_text_sents",
        "tokenized_text_words",
        "tokenized_text_words_norm",
    ]
].head()

Unnamed: 0,legislation number,title,cleaned_text,tokenized_text_sents,tokenized_text_words,tokenized_text_words_norm
0,H.R. 2907,Let Doctors Provide Reproductive Health Care Act,[Congressional Bills 118th Congress] [From the...,"[[congressional bills 118th congress], [from t...","[congressional, bills, congress, u., s., gover...","[congressional, bill, congress, u., s., govern..."
1,S. 1297,Let Doctors Provide Reproductive Health Care Act,[Congressional Bills 118th Congress] [From the...,"[[congressional bills 118th congress], [from t...","[congressional, bills, congress, u., s., gover...","[congressional, bill, congress, u., s., govern..."
2,H.R. 4901,Reproductive Health Care Accessibility Act,[Congressional Bills 118th Congress] [From the...,"[[congressional bills 118th congress], [from t...","[congressional, bills, congress, u., s., gover...","[congressional, bill, congress, u., s., govern..."
3,S. 2544,Reproductive Health Care Accessibility Act,[Congressional Bills 118th Congress] [From the...,"[[congressional bills 118th congress], [from t...","[congressional, bills, congress, u., s., gover...","[congressional, bill, congress, u., s., govern..."
4,H.R. 4147,Reproductive Health Care Training Act of 2023,[Congressional Bills 118th Congress] [From the...,"[[congressional bills 118th congress], [from t...","[congressional, bills, congress, u., s., gover...","[congressional, bill, congress, u., s., govern..."


In [8]:
# Tokenized SCOTUS decision DataFrame preview
tokenized_scotus_df = load_file_to_df(SCOTUS_DATA_TOKENIZED_FILE)
tokenized_scotus_df[
    [
        "title",
        "cleaned_text",
        "tokenized_text_sents",
        "tokenized_text_words",
        "tokenized_text_words_norm",
    ]
].head()

Unnamed: 0,title,cleaned_text,tokenized_text_sents,tokenized_text_words,tokenized_text_words_norm
0,Dobbs v. Jackson Women's Health Organization,"1 (Slip Opinion) OCTOBER TERM, 2021 Syllabus N...","[1 (slip opinion) october term, 2021 syllabus ...","[slip, opinion, october, term, syllabus, note,...","[slip, opinion, october, term, syllabus, note,..."
1,Whole Woman's Health v. Hellerstedt,"1 (Slip Opinion) OCTOBER TERM, 2015 Syllabus N...","[1 (slip opinion) october term, 2015 syllabus ...","[slip, opinion, october, term, syllabus, note,...","[slip, opinion, october, term, syllabus, note,..."
2,Gonzales v. Carhart,"(Bench Opinion) OCTOBER TERM, 2006 1 Syllabus ...","[(bench opinion) october term, 2006 1 syllabus...","[bench, opinion, october, term, syllabus, note...","[bench, opinion, october, term, syllabus, note..."
3,Stenberg v. Carhart,530US2 Unit: U85 [11-21-01 16:51:50] PAGES PGT...,"[530us2 unit: u85, [11-21-01 16:51:50] pages p...","[530us2, unit, u85, 16:51:50, pages, pgt, o, p...","[530us2, unit, u85, 16:51:50, page, pgt, o, pi..."
4,Planned Parenthood of Southeastern Pennsylvani...,505us3u117 07-09-96 09:34:02 PAGES OPINPGT 833...,[505us3u117 07-09-96 09:34:02 pages opinpgt 83...,"[505us3u117, 09:34:02, pages, opinpgt, october...","[505us3u117, 09:34:02, page, opinpgt, october,..."


In [9]:
# PoS-tagged Congressional legislation DataFrame preview
pos_cong_df = load_file_to_df(CONGRESS_DATA_POS_TAGGED_FILE)
pos_cong_df[
    [
        "legislation number",
        "title",
        "tokenized_text_words_norm",
        "summary_pos_tags_of_interest",
    ]
].head()

Unnamed: 0,legislation number,title,tokenized_text_words_norm,summary_pos_tags_of_interest
0,H.R. 2907,Let Doctors Provide Reproductive Health Care Act,"[congressional, bill, congress, u., s., govern...","[Let, Doctors, Provide, bill, sets, protection..."
1,S. 1297,Let Doctors Provide Reproductive Health Care Act,"[congressional, bill, congress, u., s., govern...","[Let, Doctors, Provide, bill, sets, protection..."
2,H.R. 4901,Reproductive Health Care Accessibility Act,"[congressional, bill, congress, u., s., govern...","[bill, establishes, various, grants, related, ..."
3,S. 2544,Reproductive Health Care Accessibility Act,"[congressional, bill, congress, u., s., govern...","[bill, establishes, various, grants, related, ..."
4,H.R. 4147,Reproductive Health Care Training Act of 2023,"[congressional, bill, congress, u., s., govern...",


In [10]:
# PoS-tagged SCOTUS decision DataFrame preview
pos_scotus_df = load_file_to_df(SCOTUS_DATA_POS_TAGGED_FILE)
pos_scotus_df[
    ["title", "tokenized_text_words_norm", "text_pos_tags_of_interest"]
].head()

Unnamed: 0,title,tokenized_text_words_norm,text_pos_tags_of_interest
0,Dobbs v. Jackson Women's Health Organization,"[slip, opinion, october, term, syllabus, note,...","[feasible, syllabus, headnote, released, done,..."
1,Whole Woman's Health v. Hellerstedt,"[slip, opinion, october, term, syllabus, note,...","[feasible, syllabus, headnote, released, done,..."
2,Gonzales v. Carhart,"[bench, opinion, october, term, syllabus, note...","[NOTE, feasible, syllabus, headnote, released,..."
3,Stenberg v. Carhart,"[530us2, unit, u85, 16:51:50, page, pgt, o, pi...","[Unit, STEN, eighth, circuit, Argued, offers, ..."
4,Planned Parenthood of Southeastern Pennsylvani...,"[505us3u117, 09:34:02, page, opinpgt, october,...","[PLANNED, PARENTHOOD, appeals, third, circuit,..."


In [11]:
# Clustered Congressional legislation DataFrame preview
clustered_cong_df = load_file_to_df(CONGRESS_DATA_CLUSTERED_FILE)
clustered_cong_df[
    [
        "legislation number",
        "title",
        "summary_pos_tags_of_interest",
        "hc_clusters",
        "hw_clusters",
        "knn_clusters",
    ]
].head()

Unnamed: 0,legislation number,title,summary_pos_tags_of_interest,hc_clusters,hw_clusters,knn_clusters
0,H.R. 2907,Let Doctors Provide Reproductive Health Care Act,"[Let, Doctors, Provide, bill, sets, protection...",18,27,15
1,S. 1297,Let Doctors Provide Reproductive Health Care Act,"[Let, Doctors, Provide, bill, sets, protection...",18,27,15
2,H.R. 4901,Reproductive Health Care Accessibility Act,"[bill, establishes, various, grants, related, ...",19,27,23
3,S. 2544,Reproductive Health Care Accessibility Act,"[bill, establishes, various, grants, related, ...",19,27,23
4,H.R. 4147,Reproductive Health Care Training Act of 2023,,7,22,6
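
One possible sanity check on the cluster assignments, offered as a suggestion rather than part of the original pipeline: House and Senate versions of the same bill share a title, so they would be expected to land in the same cluster. The rows below are copied from the `hc_clusters` column of the preview above; the grouping logic is an assumption.

```python
from collections import defaultdict

# (legislation number, title, hc_clusters) taken from the preview above.
rows = [
    ("H.R. 2907", "Let Doctors Provide Reproductive Health Care Act", 18),
    ("S. 1297", "Let Doctors Provide Reproductive Health Care Act", 18),
    ("H.R. 4901", "Reproductive Health Care Accessibility Act", 19),
    ("S. 2544", "Reproductive Health Care Accessibility Act", 19),
    ("H.R. 4147", "Reproductive Health Care Training Act of 2023", 7),
]

# Collect the set of cluster labels seen for each title.
clusters_by_title = defaultdict(set)
for _, title, cluster in rows:
    clusters_by_title[title].add(cluster)

# Titles whose bills were split across more than one cluster.
split_titles = [t for t, labels in clusters_by_title.items() if len(labels) > 1]
print(split_titles)
```

For these five rows no title is split; the same check could be repeated for `hw_clusters` and `knn_clusters`.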


In [12]:
# Clustered SCOTUS decision DataFrame preview
clustered_scotus_df = load_file_to_df(SCOTUS_DATA_CLUSTERED_FILE)
clustered_scotus_df[
    [
        "title",
        "text_pos_tags_of_interest",
        "hc_clusters",
        "hw_clusters",
        "knn_clusters",
    ]
].head()

Unnamed: 0,title,text_pos_tags_of_interest,hc_clusters,hw_clusters,knn_clusters
0,Dobbs v. Jackson Women's Health Organization,"[feasible, syllabus, headnote, released, done,...",2,2,2
1,Whole Woman's Health v. Hellerstedt,"[feasible, syllabus, headnote, released, done,...",1,1,2
2,Gonzales v. Carhart,"[NOTE, feasible, syllabus, headnote, released,...",3,3,4
3,Stenberg v. Carhart,"[Unit, STEN, eighth, circuit, Argued, offers, ...",6,6,4
4,Planned Parenthood of Southeastern Pennsylvani...,"[PLANNED, PARENTHOOD, appeals, third, circuit,...",1,1,2


In [19]:
# NER'ed Congressional legislation DataFrame preview
ner_cong_df = load_file_to_df(CONGRESS_DATA_NER_FILE)
ner_cong_df[
    [
        "legislation number",
        "title",
        "cleaned_text",
        "cleaned_summary",
        "cleaned_text_ner",
        "cleaned_summary_ner",
    ]
].head()

Unnamed: 0,legislation number,title,cleaned_text,cleaned_summary,cleaned_text_ner,cleaned_summary_ner
0,H.R. 2907,Let Doctors Provide Reproductive Health Care Act,[Congressional Bills 118th Congress] [From the...,Let Doctors Provide Reproductive Health Care A...,"[[Congressional Bills 118th Congress, ORG], [t...","[[1, CARDINAL], [2, CARDINAL], [The Department..."
1,S. 1297,Let Doctors Provide Reproductive Health Care Act,[Congressional Bills 118th Congress] [From the...,Let Doctors Provide Reproductive Health Care A...,"[[Congressional Bills 118th Congress, ORG], [t...","[[1, CARDINAL], [2, CARDINAL], [The Department..."
2,H.R. 4901,Reproductive Health Care Accessibility Act,[Congressional Bills 118th Congress] [From the...,Reproductive Health Care Accessibility Act Thi...,"[[Congressional Bills 118th Congress, ORG], [t...","[[Reproductive Health Care Accessibility Act, ..."
3,S. 2544,Reproductive Health Care Accessibility Act,[Congressional Bills 118th Congress] [From the...,Reproductive Health Care Accessibility Act Thi...,"[[Congressional Bills 118th Congress, ORG], [t...","[[Reproductive Health Care Accessibility Act, ..."
4,H.R. 4147,Reproductive Health Care Training Act of 2023,[Congressional Bills 118th Congress] [From the...,,"[[Congressional Bills 118th Congress, ORG], [t...",[]
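
The NER columns above hold lists of `[text, label]` pairs. A short sketch of how such a list might be summarized and filtered by label follows; the sample pairs imitate the shape and labels visible in the preview, and the helper name is hypothetical.

```python
from collections import Counter

# Illustrative pairs in the same [text, label] shape as the preview above.
entities = [
    ["Congressional Bills 118th Congress", "ORG"],
    ["1", "CARDINAL"],
    ["2", "CARDINAL"],
]


def entities_with_label(pairs, wanted_labels):
    """Return the entity texts whose label is in wanted_labels."""
    return [text for text, label in pairs if label in wanted_labels]


label_counts = Counter(label for _, label in entities)
organizations = entities_with_label(entities, {"ORG"})
print(label_counts, organizations)
```

Filtering by label in this way is one route to isolating references to organizations or people before looking for mentions of specific court cases.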


In [21]:
# NER'ed SCOTUS decision DataFrame preview
ner_scotus_df = load_file_to_df(SCOTUS_DATA_NER_FILE)
ner_scotus_df[
    [
        "title",
        "cleaned_text",
        "cleaned_text_ner",
    ]
].head()

Unnamed: 0,title,cleaned_text,cleaned_text_ner
0,Dobbs v. Jackson Women's Health Organization,"1 (Slip Opinion) OCTOBER TERM, 2021 Syllabus N...","[[1, CARDINAL], [OCTOBER, DATE], [2021, DATE],..."
1,Whole Woman's Health v. Hellerstedt,"1 (Slip Opinion) OCTOBER TERM, 2015 Syllabus N...","[[1, CARDINAL], [OCTOBER, DATE], [2015, DATE],..."
2,Gonzales v. Carhart,"(Bench Opinion) OCTOBER TERM, 2006 1 Syllabus ...","[[Bench Opinion, PERSON], [2006 1, DATE], [Cou..."
3,Stenberg v. Carhart,530US2 Unit: U85 [11-21-01 16:51:50] PAGES PGT...,"[[530US2, PERSON], [U85, ORG], [11-21-01, DATE..."
4,Planned Parenthood of Southeastern Pennsylvani...,505us3u117 07-09-96 09:34:02 PAGES OPINPGT 833...,"[[07-09-96, DATE], [833, CARDINAL], [OCTOBER, ..."


In [None]:
# Topic modeled Congressional legislation DataFrame preview

In [None]:
# DYN topic modeled Congressional legislation DataFrame preview

In [None]:
# Topic modeled SCOTUS decision DataFrame preview