## Supreme Court Project Guide

The ultimate goal of this project is to build a database of Supreme Court cases for 2020 (or a different range of years) that includes the dialogue from the oral arguments of each case. As we saw in class, the arguments were scraped from this page: https://www.supremecourt.gov/oral_arguments/argument_transcript.aspx

See if you can follow that guide to download the PDFs and convert them to text files (don't be shy on Slack!).

Once you have a folder of text transcripts, there are three primary programmatic steps that you need to complete:

**Please note:** Step 3 is the most challenging--if you want to spend some time coding, you can skip Steps 1 and 2 and get to work on Step 3

**STEP 1:** scrape all of the case information available on this page: https://www.supremecourt.gov/oral_arguments/argument_transcript/2020

This should include case name, docket number, etc--and most importantly the name of the PDF file. All of the text files share the exact same name as the PDF files they came from. This file name will allow you to connect your transcription data with your case data. 

It is up to you what kind of data structure you want to build. But it is likely to be a list of lists or a list of dictionaries: for each case you will have a list or dictionary of the information you scrape from the webpage.
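
As a rough illustration of the list-of-dictionaries option, here is a minimal sketch. The two rows are copied from the 2020 transcript page; the key names (`docket`, `title`, `pdf`) and the `cases_by_docket` lookup are only one possible layout, not a required one.

```python
# One dictionary per case, shaped like the rows scraped in Step 1.
cases = [
    {
        "docket": "20-543",
        "title": "Yellen v. Confederated Tribes of Chehalis Reservation",
        "pdf": "../argument_transcripts/2020/20-543_hgci.pdf",
    },
    {
        "docket": "20-315",
        "title": "Santos Sanchez v. Mayorkas",
        "pdf": "../argument_transcripts/2020/20-315_l647.pdf",
    },
]

# Index the list by docket number so a case can be looked up directly.
cases_by_docket = {case["docket"]: case for case in cases}

print(cases_by_docket["20-315"]["title"])
```

A list shaped like this can be handed straight to `pd.DataFrame(cases)` when you are ready to work in pandas.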

**STEP 2:** find secondary source(s) to scrape/integrate with your case data. The information on the Supreme Court page is very limited, so you need to find a source or group of sources that add information. The most important information would likely be the decision, who voted for and against, and the district court origin of the case (for geocoding). You might think of other great things to put in there too! This information needs to be merged with the data you have from STEP 1.

**STEP 3:** use regular expressions to clean up and parse the text files so that you have a searchable data structure containing the dialog from the transcripts. 

**Data Architecture** 
You will need to think about how you will set up, separate, and join the different tables that you create. The initial scraping will give you a very simple dataframe: the columns will be docket, case name, date argued, and PDF name. The regex work on the text files should result in a very simple table (or just a list of tuples) of speaker name and dialogue.

`[('MR. BERGERON'," Yes. That's essentially the same thing"),('JUSTICE SOTOMAYOR',' So how do you deal with Chambers?')]`

But make sure you attach the docket number or pdf filename to each set of arguments you transform using regex. Your secondary sources and information should be linked by docket number, but the question is how to set up those data frames, join them, aggregate them, and narrow them to the fields necessary for presentation.
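
Here is one possible way to attach a docket number to each turn and make the result searchable. This is only a sketch: the two turns are the example tuples above, the docket value `"00-000"` is a placeholder (the example does not say which case it comes from), and `turns_by` is a hypothetical helper name.

```python
# The example turns from above: (speaker, dialogue) tuples.
turns = [
    ("MR. BERGERON", " Yes. That's essentially the same thing"),
    ("JUSTICE SOTOMAYOR", " So how do you deal with Chambers?"),
]

docket = "00-000"  # placeholder; use the real docket number or PDF file name

# Attach the docket to every turn so the rows can later be joined to case data.
records = [
    {"docket": docket, "speaker": speaker, "text": text.strip()}
    for speaker, text in turns
]


def turns_by(rows, speaker):
    """Return every turn spoken by the given speaker."""
    return [row for row in rows if row["speaker"] == speaker]


print(turns_by(records, "JUSTICE SOTOMAYOR"))
```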

Go step-by-step through this, and DM me on Slack whenever you get stuck, and I will help. If you complete all the steps before Friday, Slack me if you want to go further.

**Interpretive Architecture**
Also consider what kind of interpretive categories you can add through your reading and research. At the very least, it is recommended that you come up with categories for the kinds of cases that are before the court: human clustering for meaning is always more effective than computational clustering. Try to come up with perhaps 8 to 10 domains that groups of cases might belong to. But also think of other ways of categorizing these cases or these decisions--by politics, by consequences on citizens (you could make a scale from 1 to 10), even an aggregated index of consequences/effects on different types of communities, sectors, regions, etc. 

You are the researcher; these categories are ways of expressing your point of view.



### STEP 1
Scrape all of the necessary information from:

https://www.supremecourt.gov/oral_arguments/argument_transcript/2020

This should result in a list containing one dictionary for each case.

In [1293]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import unquote

import pandas as pd

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

from webdriver_manager.chrome import ChromeDriverManager

In [1294]:
###Write your scraping code here
my_url = "https://www.supremecourt.gov/oral_arguments/argument_transcript/2020"
raw_html = requests.get(my_url).content



In [1295]:
soup_doc = BeautifulSoup(raw_html, "html.parser")


In [1296]:
all_tables = soup_doc.find_all(class_='table table-bordered')

all_cases=[]

for table in all_tables:
    all_rows = table.find_all('tr')
    for row in all_rows:
        each_case = []
        
        spans = row.find_all('span') 
        tags = row.find_all('a')
        for tag in tags:
            url = tag.get('href')
            each_case.append(url)
        
        for data in spans:
            each_case.append(data.string)
        all_cases.append(each_case)
  

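# Convert each scraped row into a dictionary. Header rows contain no links
# or spans, so they produce short lists and are skipped.
case_records = []

for case in all_cases:
    if len(case) >= 3:
        case_records.append({
            'pdf': case[0],      # relative link to the transcript PDF
            'docket': case[1],   # docket number, e.g. 20-543
            'title': case[2],    # case name
        })

df = pd.DataFrame(case_records)
df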

Unnamed: 0,pdf,docket,title
0,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation
1,../argument_transcripts/2020/20-315_l647.pdf,20-315,Santos Sanchez v. Mayorkas
2,../argument_transcripts/2020/19-8709_5hek.pdf,19-8709,Greer v. United States
3,../argument_transcripts/2020/20-444_5i26.pdf,20-444,United States v. Gary
4,../argument_transcripts/2020/20-334_p86b.pdf,20-334,"San Antonio v. Hotels.com, L.P."
5,../argument_transcripts/2020/20-440_k5fm.pdf,20-440,"Minerva Surgical, Inc. v. Hologic, Inc."
6,../argument_transcripts/2020/19-251_h3ci.pdf,19-251,Americans for Prosperity Foundation v. Bonta
7,../argument_transcripts/2020/20-382_4f14.pdf,20-382,Guam v. United States
8,../argument_transcripts/2020/20-472_bp7c.pdf,20-472,"Hollyfrontier Cheyenne Refining, LLC v. Renewable Fuels Assn."
9,../argument_transcripts/2020/20-437_n758.pdf,20-437,United States v. Palomar-Santiago
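
Because the text files share their names with the PDFs, it can help to derive that shared name from the `pdf` column. The sketch below uses only the standard library. The function names are hypothetical, and the `NEW` suffix is an assumption taken from the text file names used in Step 3 of this notebook; your own files may be named differently.

```python
from pathlib import PurePosixPath


def transcript_stem(pdf_href):
    """Return the PDF file name without its folder or extension."""
    return PurePosixPath(pdf_href).stem


def text_file_name(pdf_href, suffix="NEW"):
    """Build the matching text file name (the suffix is an assumption)."""
    return transcript_stem(pdf_href) + suffix + ".txt"


href = "../argument_transcripts/2020/20-543_hgci.pdf"
print(transcript_stem(href))
print(text_file_name(href))
```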


### STEP 2 
Scrape the additional source(s)

For this you need to do research and try to find sources that will give you useful information that you can add to the table/dictionary you created in Step 1.

Here are some recommended sources that you can scrape and add to your data. You do not need to scrape all of these, and you may want to look for other sources that are useful.

Geographical locations:
https://system.uslegal.com/us-courts-of-appeals/

Transcripts by year
https://www.supremecourt.gov/oral_arguments/argument_transcript/2017

Dockets by circuit court (I recommend at least this one):
https://www.supremecourt.gov/orders/ordersbycircuit/ordercasebycircuit/061118OrderCasesByCircuit

Docket information by case:
https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/17-7919.html

Opinions (as seen in Homework 3):
https://www.supremecourt.gov/opinions/slipopinion/17

In [1297]:
##Try to do everything yourself

#Code away!
all_dockets = df.docket  # docket numbers from the Step 1 dataframe

for docket in all_dockets:
    print(docket)

all_href = []  # build the docket-page URL for every case
for docket_num in all_dockets:
     href = "https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/" + docket_num + ".html"
     all_href.append(href)
    
all_data = []

for href in all_href:
    data = {}
    my_url = href
    raw_html = requests.get(my_url).content
    soup_doc = BeautifulSoup(raw_html, "html.parser")
    all_tables = soup_doc.find_all(class_='table')
    #data['docket'] = soup_doc.find(class_='DocketInfoTitle')
    docket = soup_doc.find(class_='DocketInfoTitle')
    
    if docket is not None:
        data['docket'] = docket.text
    else:
        data['docket'] = "missing"

    for table in all_tables:
        span_tag = table.find_all('span')
        # The lower court name is the tenth <span>; skip tables with fewer spans
        if len(span_tag) > 9:
            data['location'] = span_tag[9].text
    all_data.append(data)


df_lower_courts = pd.DataFrame(all_data)
df_lower_courts 





20-543
20-315
19-8709
20-444
20-334
20-440
19-251
20-382
20-472
20-437
20-255
19-1039
20-5904
20-107
19-1414
20-157
20-222
20-297
20-512
142-Orig
19-1155
20-18
19-1434
19-1257
19-1442
19-897
19-968
19-508
19-1231
19-1189
20-366
19-783
19-416
19-930
19-5807
18-1447
19-351
19-511
19-963
19-422
19-547
19-199
18-1259
19-5410
19-123
19-863
19-546
19-840
19-309
65-Orig
18-540
19-71
18-956
19-368
19-108
19-357
19-292
19-438


Unnamed: 0,docket,location
0,No. 20-543,
1,No. 20-315,United States Court of Appeals for the Third Circuit
2,No. 19-8709,United States Court of Appeals for the Eleventh Circuit
3,No. 20-444,United States Court of Appeals for the Fourth Circuit
4,No. 20-334,United States Court of Appeals for the Fifth Circuit
5,No. 20-440,United States Court of Appeals for the Federal Circuit
6,No. 19-251,United States Court of Appeals for the Ninth Circuit
7,No. 20-382,United States Court of Appeals for the District of Columbia Circuit
8,No. 20-472,United States Court of Appeals for the Tenth Circuit
9,No. 20-437,United States Court of Appeals for the Ninth Circuit


In [1298]:
pd.set_option('display.max_colwidth', None)
df_lower_courts

Unnamed: 0,docket,location
0,No. 20-543,
1,No. 20-315,United States Court of Appeals for the Third Circuit
2,No. 19-8709,United States Court of Appeals for the Eleventh Circuit
3,No. 20-444,United States Court of Appeals for the Fourth Circuit
4,No. 20-334,United States Court of Appeals for the Fifth Circuit
5,No. 20-440,United States Court of Appeals for the Federal Circuit
6,No. 19-251,United States Court of Appeals for the Ninth Circuit
7,No. 20-382,United States Court of Appeals for the District of Columbia Circuit
8,No. 20-472,United States Court of Appeals for the Tenth Circuit
9,No. 20-437,United States Court of Appeals for the Ninth Circuit
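
The scraped docket values carry a `No. ` prefix, so they will not match the plain docket numbers from Step 1. A small normalizing function may reduce the amount of manual fixing needed. This is a sketch under the assumption that the prefix is the main difference; `normalize_docket` is a hypothetical helper, and other irregularities would still need their own handling.

```python
import re


def normalize_docket(raw):
    """Strip a leading 'No.' and surrounding whitespace from a docket string."""
    return re.sub(r"^\s*No\.\s*", "", raw).strip()


scraped = ["No. 20-543", "No. 19-8709", "  No.20-444 "]
print([normalize_docket(value) for value in scraped])
```

In pandas, the same function could be applied with `df_lower_courts['docket'].map(normalize_docket)` before merging.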


In [1299]:
# Some of the docket numbers were malformed, so I fixed them manually in a spreadsheet and imported the result as a CSV file so that I can join the two dataframes
import pandas as pd

df_lower_courts = pd.read_csv('lower_courts.csv')
df

Unnamed: 0,pdf,docket,title
0,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation
1,../argument_transcripts/2020/20-315_l647.pdf,20-315,Santos Sanchez v. Mayorkas
2,../argument_transcripts/2020/19-8709_5hek.pdf,19-8709,Greer v. United States
3,../argument_transcripts/2020/20-444_5i26.pdf,20-444,United States v. Gary
4,../argument_transcripts/2020/20-334_p86b.pdf,20-334,"San Antonio v. Hotels.com, L.P."
5,../argument_transcripts/2020/20-440_k5fm.pdf,20-440,"Minerva Surgical, Inc. v. Hologic, Inc."
6,../argument_transcripts/2020/19-251_h3ci.pdf,19-251,Americans for Prosperity Foundation v. Bonta
7,../argument_transcripts/2020/20-382_4f14.pdf,20-382,Guam v. United States
8,../argument_transcripts/2020/20-472_bp7c.pdf,20-472,"Hollyfrontier Cheyenne Refining, LLC v. Renewable Fuels Assn."
9,../argument_transcripts/2020/20-437_n758.pdf,20-437,United States v. Palomar-Santiago


In [1300]:
join = pd.merge(df, df_lower_courts, on='docket', how='left')  # match rows on the docket number
join

Unnamed: 0,pdf,docket,title,location
0,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit
1,../argument_transcripts/2020/20-315_l647.pdf,20-315,Santos Sanchez v. Mayorkas,United States Court of Appeals for the Third Circuit
2,../argument_transcripts/2020/19-8709_5hek.pdf,19-8709,Greer v. United States,United States Court of Appeals for the Eleventh Circuit
3,../argument_transcripts/2020/20-444_5i26.pdf,20-444,United States v. Gary,United States Court of Appeals for the Fourth Circuit
4,../argument_transcripts/2020/20-334_p86b.pdf,20-334,"San Antonio v. Hotels.com, L.P.",United States Court of Appeals for the Fifth Circuit
5,../argument_transcripts/2020/20-440_k5fm.pdf,20-440,"Minerva Surgical, Inc. v. Hologic, Inc.",United States Court of Appeals for the Federal Circuit
6,../argument_transcripts/2020/19-251_h3ci.pdf,19-251,Americans for Prosperity Foundation v. Bonta,United States Court of Appeals for the Ninth Circuit
7,../argument_transcripts/2020/20-382_4f14.pdf,20-382,Guam v. United States,United States Court of Appeals for the District of Columbia Circuit
8,../argument_transcripts/2020/20-472_bp7c.pdf,20-472,"Hollyfrontier Cheyenne Refining, LLC v. Renewable Fuels Assn.",United States Court of Appeals for the Tenth Circuit
9,../argument_transcripts/2020/20-437_n758.pdf,20-437,United States v. Palomar-Santiago,United States Court of Appeals for the Ninth Circuit
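
When joining tables, it is worth checking which rows failed to match. The sketch below uses two pandas options that exist on `pd.merge`: `indicator=True`, which adds a `_merge` column, and `validate`, which raises an error if a docket appears more than once. The small frames are illustrative; they reuse three rows from the tables above, with 20-543 deliberately left out of the lower-court table.

```python
import pandas as pd

cases = pd.DataFrame({
    "docket": ["20-543", "20-315", "19-8709"],
    "title": [
        "Yellen v. Confederated Tribes of Chehalis Reservation",
        "Santos Sanchez v. Mayorkas",
        "Greer v. United States",
    ],
})

courts = pd.DataFrame({
    "docket": ["20-315", "19-8709"],
    "location": [
        "United States Court of Appeals for the Third Circuit",
        "United States Court of Appeals for the Eleventh Circuit",
    ],
})

# Left join on the docket number; fail loudly if a docket is duplicated.
merged = pd.merge(
    cases, courts, on="docket", how="left",
    validate="one_to_one", indicator=True,
)

# Rows marked 'left_only' found no lower-court match and need attention.
unmatched = merged.loc[merged["_merge"] == "left_only", "docket"].tolist()
print(unmatched)
```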


### STEP 3
Here we go: the text files that were extracted from the PDFs are quite messy. You do not need to get them perfect, but you need to clean them up enough that you can zero in on the arguments themselves. Below I take you step by step through what you need to do. In the end, you want a separate list for each case that contains each speaker and the dialogue attached to that speaker.

**Step 1:** Download the text files from courseworks.

Make sure they are locally on your computer. 

Open up the text files in a text editor like Sublime Text, and look carefully at the problems with the files. How will you clean this up?
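
To give a sense of what a cleanup might look like, here is a minimal sketch. It assumes the problems are a repeated page header ("Official - Subject to Final Review"), a repeated footer ("Heritage Reporting Corporation"), and one- or two-digit line numbers at the start of each line. The sample string is invented for illustration, and your files may have other problems.

```python
import re


def clean_transcript(raw):
    """Remove page headers, footers, and leading line numbers, then flatten."""
    text = raw.replace("Heritage Reporting Corporation", " ")
    text = text.replace("Official - Subject to Final Review", " ")
    # Drop a 1-2 digit line number at the start of any line.
    text = re.sub(r"(?m)^[ \t]*\d{1,2}[ \t]+", "", text)
    # Collapse newlines and runs of spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip()


sample = (
    "Official - Subject to Final Review\n"
    " 1   MR. GUARNIERI: Mr. Chief Justice, and may it\n"
    " 2   please the Court:\n"
    "Heritage Reporting Corporation\n"
)
print(clean_transcript(sample))
```

One caveat: a line of dialogue that happens to begin with a short number would lose that number, so spot-check the results against the original file.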

**Step 2:** Eventually you will want to loop through all of the text files and run the cleanup on all of them. But first just select one text file to open up and begin cleaning up.
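
When you are ready to loop, you do not have to type every file name by hand. The sketch below collects every `.txt` file in a folder with `pathlib`. The function name is hypothetical, and the demonstration writes two throwaway files into a temporary folder so that it runs anywhere; point it at your own folder instead.

```python
import tempfile
from pathlib import Path


def read_transcripts(folder):
    """Return {file name without extension: file contents} for every .txt file."""
    transcripts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        transcripts[path.stem] = path.read_text(encoding="utf-8", errors="replace")
    return transcripts


# Demonstration with a temporary folder standing in for your own.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "20-543_hgciNEW.txt").write_text("MR. GUARNIERI: ...", encoding="utf-8")
    (Path(tmp) / "notes.md").write_text("not a transcript", encoding="utf-8")
    demo = read_transcripts(tmp)

print(sorted(demo))
```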

In [1301]:
#Import the regular expression library
import re

In [1302]:

text_files = [
    "20-543_hgciNEW",
"20-315_l647NEW",
"19-8709_5hekNEW",
"20-444_5i26NEW",
"20-334_p86bNEW",
"20-440_k5fmNEW",
"19-251_h3ciNEW",
"20-382_4f14NEW",
"19-1039_o7jqNEW",
"20-472_bp7cNEW",
"20-437_n758NEW",
"20-255_869dNEW",
"20-5904_1bn2NEW",
"20-107_n758NEW",
"19-1414_p86bNEW",
"20-157_5i36NEW",
"20-222_3fbhNEW",
"20-297_3ea4NEW",
"20-512_g314NEW",
"142-orig_2_3ebhNEW",
"19-1155_6537NEW",
"20-18_986bNEW",
"19-1434_e1p3NEW",
"19-1257_1b7dNEW",
"19-1442_9o6bNEW",
"19-897_l537NEW",
"19-968_6kh7NEW",
"19-508_3f14NEW",
"19-1231_9ol1NEW",
"19-1189_k53mNEW",
"20-366_7lhoNEW",
"19-783_2d8fNEW",
"19-416_6k47NEW",
"19-930_c07eNEW",
"19-5807_i4djNEW",
"18-1447_apl1NEW",
"19-351_d0fiNEW",
"19-511_l537NEW",
"19-963_2c8fNEW",
"19-422_4gdjNEW",
"19-547_c07dNEW",
"19-199_m6hnNEW",
"18-1259_e2p3NEW",
"19-5410_8n59NEW",
"19-123_o758NEW",
"19-863_k5gmNEW",
"19-546_2d9gNEW",
"19-840_1a72NEW",
"19-309_4425NEW",
"65-orig_7l48NEW",
"18-540_8njqNEW",
"19-71_e2q3NEW",
"18-956_2dp3NEW",
"19-368_m648NEW",
"19-108_e1p3NEW",
"19-357_2b35NEW",
"19-292_5hdkNEW",
"19-438_q713NEW",
]

res = []
for text in text_files:
    # Read one transcript; the with-block closes the file automatically
    with open('/Users/sarahgrevy/Documents/Data_Journalism/databases/final_project/txt_files/' + text + ".txt", 'r') as f:
        transcript = f.read()

    # The first docket-like string in the file identifies the case
    docket_matches = re.findall(r"\d\d-.*", transcript, re.M)
    docket = docket_matches[0]

    # Flatten the text, then strip the page footer and the line numbers
    transcript = re.sub(r"\n", " ", transcript)
    transcript_clean = re.sub('Heritage Reporting Corporation', '', transcript)
    transcript_clean = re.sub(r"  \d+ ", " ", transcript_clean)

    # Keep only the argument: after the opening time stamp, before the closing note
    transcript_clean = transcript_clean.split(".m.)   ")[1]
    transcript_clean = transcript_clean.split("was submitted.)")[0]

    # Split on speaker labels; the capture group keeps each label in the list
    clean_text = re.split(r'([A-Z][A-Z ]+:)', transcript_clean)
    clean_text.pop(0)  # drop anything before the first label
    speakers = clean_text[::2]
    dialogue = clean_text[1::2]

    # One row per turn, tagged with the docket so it can be joined later
    case_df = pd.DataFrame(list(zip(speakers, dialogue)), columns=['speaker', 'text'])
    case_df['docket'] = docket
    res.append(case_df)


appended_data = pd.concat(res)
appended_data


Unnamed: 0,speaker,text,docket
0,CHIEF JUSTICE ROBERTS:,"We will hear argument first this morning in Case 20-543, Yellen versus the Confederated Tribes, and the consolidated case. Mr. Guarnieri. ORAL ARGUMENT OF MATTHEW GUARNIERI ON BEHALF OF THE PETITIONER IN CASE NO. 20-543 MR.",20-543
1,GUARNIERI:,"Mr. Chief Justice, and may it please the Court: Our fundamental submission in this case is that in defining ""Indian Tribe"" for ISDA purposes, Congress did not deliberately include Alaska native regional and village corporations only to then exclude all of them by subjecting them to a formal political recognition requirement that no ANC meets or, indeed, has ever met. Instead, the settled understanding for the last 45 years has been that ANCs are eligible to be treated as Indian Tribes for ISDA purposes, even though ANCs are not and have never been federally recognized Indian Tribes. That interpretation has been endorsed by all Official - Subject to Final Review three branches of the federal government. Congress was acting against the backdrop of those settled understandings when it incorporated the ISDA definition of ""Indian Tribe"" into the CARES Act in 2020. Congress chose to make ANCs eligible to receive millions of dollars of coronavirus relief funds to benefit the many Alaska natives whom they serve. The decision below contravenes that policy judgment and threatens to shut ANCs out of a wide range of important federal programs. No sound principle of textual interpretation justifies such a dramatic departure from the status quo. Reading the ISDA definition to mean that ANCs are included only in the event that they are someday somehow recognized by the United States for government-to-government relations would render their deliberate inclusion in the statute a dead letter. Either the recognition clause must mean something else, or it does not apply to ANCs. Now we principally urge the latter approach, which the Department of the Interior and the Indian Health Service adopted decades ago and which the Ninth Circuit endorsed in the Official - Subject to Final Review Cook Inlet case. 
In our view, Congress defined the entities eligible to enter into ISDA agreements as federally recognized Indian Tribes and also, in addition, the entities that play a similar role in the special case of Alaska, namely, Alaska native villages and Alaska native corporations defined in and established pursuant to ANCSA. That reading, unlike Respondents' reading, gives effect to every word and clause in the statute. I welcome the Court's questions.",20-543
2,CHIEF JUSTICE ROBERTS:,"Counsel, as I think you confirmed in this opening statement, you rely heavily on the legislative history, the congressional purpose, the post-enactment history, and there was a time when this Court also relied on those sources, but this -- this is not that time. And what is the best case you can cite from recent years for your -- your general approach? MR.",20-543
3,GUARNIERI:,"Well, I think the case that -- that we find the most instructive is the Court's decision against -- in United States Official - Subject to Final Review against Hayes, which is the case discussed in our opening brief. In Hayes, the Court was considering a statutory definition of the term ""misdemeanor"" -- ""domestic misdemeanor violence"" -- or, sorry, ""misdemeanor crime of domestic violence,"" and the -- the statutory definition there had a prefatory clause and then two subsections, and the question before the Court was how to apply a modifier in the second subsection. And based on textual and contextual evidence, the Court concluded that the modifier that appeared in the second subclause of that definition actually applied to its -- its antecedent was one of the words in the prefatory clause at the beginning of the definition. And we think we're asking the Court here for -- for an even less sort of -- the interpretation that we're urging here is even more naturally sort of derived from the text than the interpretation the Court adopted in Hayes. And also, you know, to your -- to your point, Mr. Chief Justice, I mean, we are making a textual argument. It's not entirely Official - Subject to Final Review purposive. And -- and it's a text -- it's a textual argument derived from ISDA's definition, as well as from the other statutes that Congress has enacted that in their text presuppose that ANCs are eligible to be treated as Indian Tribes.",20-543
4,CHIEF JUSTICE ROBERTS:,"Thank you, counsel. Justice Thomas.",20-543
...,...,...,...
170,CHIEF JUSTICE ROBERTS:,"A minute to wrap up, Mr. Bond. MR.",19-438
171,BOND:,"Thank you, Mr. Chief Justice. Petitioner's basic argument is that Official - Subject to Final Review you should start with decisions that put a gloss on various other statutes and retrofit this statute to match. We submit that that is backwards. The Court should start with the governing statutory text, and, here, that text answers the question presented by putting the burden of proving eligibility on the alien, including a lack of disqualifying convictions. Now, in our view, Congress's judgment is compatible with this Court's precedent addressing the categorical and modified approaches, but if there were any inconsistency or tension, it should be resolved in favor of the language Congress enacted to address this particular issue. The court of appeals' decision should be affirmed.",19-438
172,CHIEF JUSTICE ROBERTS:,"Thank you, counsel. Three minutes for rebuttal. REBUTTAL ARGUMENT OF BRIAN P. GOLDMAN ON BEHALF OF THE PETITIONER MR.",19-438
173,GOLDMAN:,"Thank you, Mr. Chief Justice. I'll try to make four points quickly. Official - Subject to Final Review First, I agree with my friend on the other side that this is an issue of statutory interpretation. The Congress passed two provisions. One uses the term ""conviction"" that embraces the least acts presumption, which Congress understood serves the important functions that we've discussed. Separately, Congress passed a burden of proof. But the two are not at war. A non-citizen can satisfy his burden by invoking the presumption, as is common in the law. And the REAL ID Act did not suspend a 100-year-old presumption. Second, Justice Gorsuch asked me in the opening argument about the burden of production, and that came up in the last half hour. And I would just emphasize that the government produced the documents here. Page 2a of our blue brief has the certification of the immigration officer. And that wasn't an act of generosity here. That is what the government does in all of these cases, and that is because it bears the initial burden of production to show the existence of a conviction that, at least on its face, appears to be disqualifying. Official - Subject to Final Review Section 1229a(c)(3)(B), which Justice Gorsuch asked me about, and subparagraph (C) as well, refer to ""in any proceeding under this chapter."" So it's not limited to the context in which the government is trying to prove deportability. As for the regulation that my friend on the other side mentioned, Section 1240.8(d), the attorney general's own interpretations of that regulation in the Matter of A-G-G- case and the Matter of S-K- case that we've cited in our reply show that that regulation places an initial burden of production on the government, not to speculate that a bar may apply but to actually make out a full prima facie case that the bar to relief may apply. Third, Mr. 
Chief Justice, you asked about some of the practicalities around memorializing the -- the terms of a plea. And I didn't hear my friend on the other side give any answer to how this could work for old convictions, like the decades-old convictions that I mentioned, nor did I hear any answer to how exactly the criminal defendant could force something to be recorded in the many county and Official - Subject to Final Review state systems where this is simply checking off boxes on a computer program or a paper form and there's no opportunity to comment further. Finally, my friend on the other side mentioned efficiency concerns around allowing these cases to be decided at the threshold. And I would just note that our rule has been in effect in the First Circuit since 2016, in the Second Circuit since 2008, and in the Ninth Circuit for six of the last 13 years. And as in the Nasrallah case last term when the government made a similar efficiency argument, it has not substantiated that by pointing to any actual problems arising in those circuits. The judgment should be reversed.",19-438
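
Notice that in the output above the title "MR." is stranded at the end of the previous turn and the speaker column holds only "GUARNIERI:". That happens because the split pattern does not allow a period inside the label. One possible refinement is sketched below; the list of titles is an assumption and may need extending for other speakers, and headings such as "ORAL ARGUMENT OF ..." would still remain in the dialogue.

```python
import re

# Speaker label: a title, a space, an upper-case surname, then a colon.
SPEAKER_LABEL = re.compile(
    r"((?:MR\.|MS\.|MRS\.|(?:CHIEF )?JUSTICE) [A-Z][A-Z'\-]+):"
)

sample = (
    "CHIEF JUSTICE ROBERTS: A minute to wrap up, Mr. Bond. "
    "MR. BOND: Thank you, Mr. Chief Justice."
)

# With a capture group, re.split keeps the labels:
# [text before first label, label, turn, label, turn, ...]
parts = SPEAKER_LABEL.split(sample)
pairs = list(zip(parts[1::2], (turn.strip() for turn in parts[2::2])))

for speaker, turn in pairs:
    print(speaker, "->", turn)
```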


In [1305]:
#join with other dataframe 
join2 = pd.merge(join, appended_data, how ='left')
join2


Unnamed: 0,pdf,docket,title,location,speaker,text
0,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,CHIEF JUSTICE ROBERTS:,"We will hear argument first this morning in Case 20-543, Yellen versus the Confederated Tribes, and the consolidated case. Mr. Guarnieri. ORAL ARGUMENT OF MATTHEW GUARNIERI ON BEHALF OF THE PETITIONER IN CASE NO. 20-543 MR."
1,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,GUARNIERI:,"Mr. Chief Justice, and may it please the Court: Our fundamental submission in this case is that in defining ""Indian Tribe"" for ISDA purposes, Congress did not deliberately include Alaska native regional and village corporations only to then exclude all of them by subjecting them to a formal political recognition requirement that no ANC meets or, indeed, has ever met. Instead, the settled understanding for the last 45 years has been that ANCs are eligible to be treated as Indian Tribes for ISDA purposes, even though ANCs are not and have never been federally recognized Indian Tribes. That interpretation has been endorsed by all Official - Subject to Final Review three branches of the federal government. Congress was acting against the backdrop of those settled understandings when it incorporated the ISDA definition of ""Indian Tribe"" into the CARES Act in 2020. Congress chose to make ANCs eligible to receive millions of dollars of coronavirus relief funds to benefit the many Alaska natives whom they serve. The decision below contravenes that policy judgment and threatens to shut ANCs out of a wide range of important federal programs. No sound principle of textual interpretation justifies such a dramatic departure from the status quo. Reading the ISDA definition to mean that ANCs are included only in the event that they are someday somehow recognized by the United States for government-to-government relations would render their deliberate inclusion in the statute a dead letter. Either the recognition clause must mean something else, or it does not apply to ANCs. 
Now we principally urge the latter approach, which the Department of the Interior and the Indian Health Service adopted decades ago and which the Ninth Circuit endorsed in the Official - Subject to Final Review Cook Inlet case. In our view, Congress defined the entities eligible to enter into ISDA agreements as federally recognized Indian Tribes and also, in addition, the entities that play a similar role in the special case of Alaska, namely, Alaska native villages and Alaska native corporations defined in and established pursuant to ANCSA. That reading, unlike Respondents' reading, gives effect to every word and clause in the statute. I welcome the Court's questions."
2,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,CHIEF JUSTICE ROBERTS:,"Counsel, as I think you confirmed in this opening statement, you rely heavily on the legislative history, the congressional purpose, the post-enactment history, and there was a time when this Court also relied on those sources, but this -- this is not that time. And what is the best case you can cite from recent years for your -- your general approach? MR."
3,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,GUARNIERI:,"Well, I think the case that -- that we find the most instructive is the Court's decision against -- in United States Official - Subject to Final Review against Hayes, which is the case discussed in our opening brief. In Hayes, the Court was considering a statutory definition of the term ""misdemeanor"" -- ""domestic misdemeanor violence"" -- or, sorry, ""misdemeanor crime of domestic violence,"" and the -- the statutory definition there had a prefatory clause and then two subsections, and the question before the Court was how to apply a modifier in the second subsection. And based on textual and contextual evidence, the Court concluded that the modifier that appeared in the second subclause of that definition actually applied to its -- its antecedent was one of the words in the prefatory clause at the beginning of the definition. And we think we're asking the Court here for -- for an even less sort of -- the interpretation that we're urging here is even more naturally sort of derived from the text than the interpretation the Court adopted in Hayes. And also, you know, to your -- to your point, Mr. Chief Justice, I mean, we are making a textual argument. It's not entirely Official - Subject to Final Review purposive. And -- and it's a text -- it's a textual argument derived from ISDA's definition, as well as from the other statutes that Congress has enacted that in their text presuppose that ANCs are eligible to be treated as Indian Tribes."
4,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,CHIEF JUSTICE ROBERTS:,"Thank you, counsel. Justice Thomas."
...,...,...,...,...,...,...
9822,../argument_transcripts/2020/19-438_q713.pdf,19-438,Pereida v. Barr,United States Court of Appeals for the Eighth Circuit,CHIEF JUSTICE ROBERTS:,"A minute to wrap up, Mr. Bond. MR."
9823,../argument_transcripts/2020/19-438_q713.pdf,19-438,Pereida v. Barr,United States Court of Appeals for the Eighth Circuit,BOND:,"Thank you, Mr. Chief Justice. Petitioner's basic argument is that Official - Subject to Final Review you should start with decisions that put a gloss on various other statutes and retrofit this statute to match. We submit that that is backwards. The Court should start with the governing statutory text, and, here, that text answers the question presented by putting the burden of proving eligibility on the alien, including a lack of disqualifying convictions. Now, in our view, Congress's judgment is compatible with this Court's precedent addressing the categorical and modified approaches, but if there were any inconsistency or tension, it should be resolved in favor of the language Congress enacted to address this particular issue. The court of appeals' decision should be affirmed."
9824,../argument_transcripts/2020/19-438_q713.pdf,19-438,Pereida v. Barr,United States Court of Appeals for the Eighth Circuit,CHIEF JUSTICE ROBERTS:,"Thank you, counsel. Three minutes for rebuttal. REBUTTAL ARGUMENT OF BRIAN P. GOLDMAN ON BEHALF OF THE PETITIONER MR."
9825,../argument_transcripts/2020/19-438_q713.pdf,19-438,Pereida v. Barr,United States Court of Appeals for the Eighth Circuit,GOLDMAN:,"Thank you, Mr. Chief Justice. I'll try to make four points quickly. Official - Subject to Final Review First, I agree with my friend on the other side that this is an issue of statutory interpretation. The Congress passed two provisions. One uses the term ""conviction"" that embraces the least acts presumption, which Congress understood serves the important functions that we've discussed. Separately, Congress passed a burden of proof. But the two are not at war. A non-citizen can satisfy his burden by invoking the presumption, as is common in the law. And the REAL ID Act did not suspend a 100-year-old presumption. Second, Justice Gorsuch asked me in the opening argument about the burden of production, and that came up in the last half hour. And I would just emphasize that the government produced the documents here. Page 2a of our blue brief has the certification of the immigration officer. And that wasn't an act of generosity here. That is what the government does in all of these cases, and that is because it bears the initial burden of production to show the existence of a conviction that, at least on its face, appears to be disqualifying. Official - Subject to Final Review Section 1229a(c)(3)(B), which Justice Gorsuch asked me about, and subparagraph (C) as well, refer to ""in any proceeding under this chapter."" So it's not limited to the context in which the government is trying to prove deportability. 
As for the regulation that my friend on the other side mentioned, Section 1240.8(d), the attorney general's own interpretations of that regulation in the Matter of A-G-G- case and the Matter of S-K- case that we've cited in our reply show that that regulation places an initial burden of production on the government, not to speculate that a bar may apply but to actually make out a full prima facie case that the bar to relief may apply. Third, Mr. Chief Justice, you asked about some of the practicalities around memorializing the -- the terms of a plea. And I didn't hear my friend on the other side give any answer to how this could work for old convictions, like the decades-old convictions that I mentioned, nor did I hear any answer to how exactly the criminal defendant could force something to be recorded in the many county and Official - Subject to Final Review state systems where this is simply checking off boxes on a computer program or a paper form and there's no opportunity to comment further. Finally, my friend on the other side mentioned efficiency concerns around allowing these cases to be decided at the threshold. And I would just note that our rule has been in effect in the First Circuit since 2016, in the Second Circuit since 2008, and in the Ninth Circuit for six of the last 13 years. And as in the Nasrallah case last term when the government made a similar efficiency argument, it has not substantiated that by pointing to any actual problems arising in those circuits. The judgment should be reversed."


In [1306]:
#save file 
join2.to_csv('final_project_dec.csv')

In [1310]:
#now start text analysis 
pd.set_option('display.max_colwidth', None)
df = pd.read_csv("final_project_dec.csv")
df.head(5)


Unnamed: 0.1,Unnamed: 0,pdf,docket,title,location,speaker,text
0,0,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,CHIEF JUSTICE ROBERTS:,"We will hear argument first this morning in Case 20-543, Yellen versus the Confederated Tribes, and the consolidated case. Mr. Guarnieri. ORAL ARGUMENT OF MATTHEW GUARNIERI ON BEHALF OF THE PETITIONER IN CASE NO. 20-543 MR."
1,1,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,GUARNIERI:,"Mr. Chief Justice, and may it please the Court: Our fundamental submission in this case is that in defining ""Indian Tribe"" for ISDA purposes, Congress did not deliberately include Alaska native regional and village corporations only to then exclude all of them by subjecting them to a formal political recognition requirement that no ANC meets or, indeed, has ever met. Instead, the settled understanding for the last 45 years has been that ANCs are eligible to be treated as Indian Tribes for ISDA purposes, even though ANCs are not and have never been federally recognized Indian Tribes. That interpretation has been endorsed by all Official - Subject to Final Review three branches of the federal government. Congress was acting against the backdrop of those settled understandings when it incorporated the ISDA definition of ""Indian Tribe"" into the CARES Act in 2020. Congress chose to make ANCs eligible to receive millions of dollars of coronavirus relief funds to benefit the many Alaska natives whom they serve. The decision below contravenes that policy judgment and threatens to shut ANCs out of a wide range of important federal programs. No sound principle of textual interpretation justifies such a dramatic departure from the status quo. Reading the ISDA definition to mean that ANCs are included only in the event that they are someday somehow recognized by the United States for government-to-government relations would render their deliberate inclusion in the statute a dead letter. Either the recognition clause must mean something else, or it does not apply to ANCs. 
Now we principally urge the latter approach, which the Department of the Interior and the Indian Health Service adopted decades ago and which the Ninth Circuit endorsed in the Official - Subject to Final Review Cook Inlet case. In our view, Congress defined the entities eligible to enter into ISDA agreements as federally recognized Indian Tribes and also, in addition, the entities that play a similar role in the special case of Alaska, namely, Alaska native villages and Alaska native corporations defined in and established pursuant to ANCSA. That reading, unlike Respondents' reading, gives effect to every word and clause in the statute. I welcome the Court's questions."
2,2,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,CHIEF JUSTICE ROBERTS:,"Counsel, as I think you confirmed in this opening statement, you rely heavily on the legislative history, the congressional purpose, the post-enactment history, and there was a time when this Court also relied on those sources, but this -- this is not that time. And what is the best case you can cite from recent years for your -- your general approach? MR."
3,3,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,GUARNIERI:,"Well, I think the case that -- that we find the most instructive is the Court's decision against -- in United States Official - Subject to Final Review against Hayes, which is the case discussed in our opening brief. In Hayes, the Court was considering a statutory definition of the term ""misdemeanor"" -- ""domestic misdemeanor violence"" -- or, sorry, ""misdemeanor crime of domestic violence,"" and the -- the statutory definition there had a prefatory clause and then two subsections, and the question before the Court was how to apply a modifier in the second subsection. And based on textual and contextual evidence, the Court concluded that the modifier that appeared in the second subclause of that definition actually applied to its -- its antecedent was one of the words in the prefatory clause at the beginning of the definition. And we think we're asking the Court here for -- for an even less sort of -- the interpretation that we're urging here is even more naturally sort of derived from the text than the interpretation the Court adopted in Hayes. And also, you know, to your -- to your point, Mr. Chief Justice, I mean, we are making a textual argument. It's not entirely Official - Subject to Final Review purposive. And -- and it's a text -- it's a textual argument derived from ISDA's definition, as well as from the other statutes that Congress has enacted that in their text presuppose that ANCs are eligible to be treated as Indian Tribes."
4,4,../argument_transcripts/2020/20-543_hgci.pdf,20-543,Yellen v. Confederated Tribes of Chehalis Reservation,United States Court of Appeals for the District of Columbia Circuit,CHIEF JUSTICE ROBERTS:,"Thank you, counsel. Justice Thomas."
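
The preview above shows two kinds of residue that may be worth removing before any text analysis: the `Unnamed: 0` columns, which are old index columns written out by `to_csv`, and the page header "Official - Subject to Final Review" plus a dangling "MR." left inside `text`. The following is a minimal sketch of one way to clean both, run on toy rows rather than the real file. The names `raw`, `clean_text`, `PAGE_HEADER` and `TRAILING_TITLE` are illustrative assumptions, not part of the original notebook.

```python
import re
import pandas as pd

# Toy rows standing in for final_project_dec.csv (assumption: same column names).
raw = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "speaker": ["CHIEF JUSTICE ROBERTS:", "GUARNIERI:"],
    "text": [
        "Counsel, what is the best case you can cite? MR.",
        "It's not entirely Official - Subject to Final Review purposive.",
    ],
})

# Page header that the PDF-to-text step leaves in the middle of sentences.
PAGE_HEADER = re.compile(r"\s*Official - Subject to Final Review\s*")
# Honorific of the NEXT speaker that ends up at the end of the current turn.
TRAILING_TITLE = re.compile(r"\s+(?:MR|MS|MRS)\.?\s*$")

def clean_text(text: str) -> str:
    """Remove the page header and a dangling trailing honorific."""
    text = PAGE_HEADER.sub(" ", text)
    text = TRAILING_TITLE.sub("", text)
    return text.strip()

# Drop the leftover index columns, then clean the dialogue column.
cleaned = raw.loc[:, ~raw.columns.str.startswith("Unnamed")].copy()
cleaned["text"] = cleaned["text"].map(clean_text)
print(cleaned)
```

This sketch does not remove the argument headings (for example "ORAL ARGUMENT OF ... ON BEHALF OF THE PETITIONER") that also appear inside some turns; those would need a separate pattern. Passing `index=False` to `to_csv` when saving should keep the extra index columns from being written in the first place.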


In [1311]:
# who speaks the most in each case?
df.groupby(by='docket').speaker.value_counts().groupby(level=0, group_keys = False).head()


docket   speaker               
18-1259  CHIEF JUSTICE ROBERTS:    42
         NOBILE:                   34
         SHAPIRO:                  34
         JUSTICE SOTOMAYOR:        17
         LIU:                      17
                                   ..
20-5904  CHIEF JUSTICE ROBERTS:    46
         FEIGIN:                   41
         ADLER:                    37
         MORTARA:                  34
         JUSTICE SOTOMAYOR:        19
Name: speaker, Length: 190, dtype: int64
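
Note that `value_counts` counts turns, so a speaker with many short interjections ranks above one with a few long answers. As a complementary view, the sketch below ranks speakers by words spoken instead. It runs on toy rows, and the column name `n_words` and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Toy turns standing in for the real DataFrame (assumption: same column names).
turns = pd.DataFrame({
    "docket": ["19-438", "19-438", "19-438", "20-543"],
    "speaker": ["BOND:", "CHIEF JUSTICE ROBERTS:", "BOND:", "GUARNIERI:"],
    "text": [
        "Thank you, Mr. Chief Justice.",
        "Thank you, counsel.",
        "We submit that that is backwards.",
        "I welcome the Court's questions.",
    ],
})

# Whitespace-separated tokens per turn, as a rough word count.
turns["n_words"] = turns["text"].str.split().str.len()

# Total words per speaker within each case.
words = turns.groupby(["docket", "speaker"])["n_words"].sum()

# Sort all speakers by total words, then keep the first row seen for each docket.
top_by_words = words.sort_values(ascending=False).groupby(level=0).head(1)
print(top_by_words)
```

Replacing `head(1)` with `head()` would give the top five per case, mirroring the cell above.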

In [1312]:
# who speaks the most in each circuit?
df.groupby(by='location').speaker.value_counts().groupby(level=0, group_keys = False).head()


location                                                            speaker               
Court of Appeal of California, First Appellate District             CHIEF JUSTICE ROBERTS:    56
                                                                    FISHER:                   48
                                                                    RICE:                     36
                                                                    HARBOURT:                 27
                                                                    ROSS:                     24
                                                                                              ..
United States District Court for the Southern District of New York  GENERAL WALL:             53
                                                                    CHIEF JUSTICE ROBERTS:    50
                                                                    UNDERWOOD:                39
                                    

In [1314]:
# who speaks the most overall? value_counts() sorts speakers from most to fewest turns
df.speaker.value_counts()


CHIEF JUSTICE ROBERTS:                                    1623
JUSTICE KAGAN:                                             575
JUSTICE GORSUCH:                                           556
JUSTICE SOTOMAYOR:                                         552
JUSTICE ALITO:                                             501
                                                          ... 
JUSTICE BARRET:                                              1
WALL   ON BEHALF OF THE APPELLANTS   GENERAL WALL:           1
WALL       ON BEHALF OF THE APPELLANTS   GENERAL WALL:       1
SUPPORTING THE RESPONDENTS   GENERAL PRELOGAR:               1
GUARNERI:                                                    1
Name: speaker, Length: 99, dtype: int64
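
The tail of this output suggests the speaker column still needs cleaning: "JUSTICE BARRET:" and "GUARNERI:" look like misspellings of labels that appear elsewhere, and some labels carry heading residue such as "ON BEHALF OF THE APPELLANTS" in front of the real name. Below is a minimal sketch of one possible normalization. It assumes the residue is always separated from the real label by a run of two or more spaces, as in the rows above, and it uses a small hand-written lookup (`SPELLING_FIXES`, a hypothetical name) for misspellings.

```python
import re
import pandas as pd

# Labels copied from the output above.
speakers = pd.Series([
    "CHIEF JUSTICE ROBERTS:",
    "JUSTICE BARRET:",
    "WALL   ON BEHALF OF THE APPELLANTS   GENERAL WALL:",
    "SUPPORTING THE RESPONDENTS   GENERAL PRELOGAR:",
    "GUARNERI:",
])

# Hand-maintained corrections (assumption: extend this as new misspellings turn up).
SPELLING_FIXES = {
    "JUSTICE BARRET": "JUSTICE BARRETT",
    "GUARNERI": "GUARNIERI",
}

def normalize_speaker(label: str) -> str:
    """Keep only the last chunk of the label and fix known misspellings."""
    label = label.strip().rstrip(":")
    # Heading residue is separated from the real label by 2+ spaces.
    label = re.split(r"\s{2,}", label)[-1]
    return SPELLING_FIXES.get(label, label)

normalized = speakers.map(normalize_speaker)
print(normalized.value_counts())
```

This version also drops the trailing colon, so counts for the same person are merged under one label. If the colon is needed elsewhere, it can be added back after the lookup.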

In [1317]:
# Top cases in each circuit containing agreeable and disagreeable language
import re

searchfor_agreement = ["absolutely", "Exactly", "for sure", "I agree", "so true", "that’s right", "same here"]
searchfor_disagreement = ["disagree", "incorrect", "not true", "opposite", "Not necessarily", "no way"]


circuit = df[df['location'] == "United States Court of Appeals for the Third Circuit"]


print("------")
print("agreement")
print("------")
print(circuit.groupby(["title"])['text'].apply(lambda x: x[x.str.contains('|'.join(searchfor_agreement), flags=re.IGNORECASE, na=False)].count()))
print("------")
print("disagreement")
print("------")
print(circuit.groupby(["title"])['text'].apply(lambda x: x[x.str.contains('|'.join(searchfor_disagreement), flags=re.IGNORECASE, na=False)].count()))

------
agreement
------
title
Carney v. Adams                         0
FCC v. Prometheus Radio Project         5
Fulton v. Philadelphia                 12
Mahanoy Area School Dist. v. B. L.     15
Penneast Pipeline Co. v. New Jersey    11
Santos Sanchez v. Mayorkas              4
Name: text, dtype: int64
------
disagreement
------
title
Carney v. Adams                        0
FCC v. Prometheus Radio Project        8
Fulton v. Philadelphia                 8
Mahanoy Area School Dist. v. B. L.     5
Penneast Pipeline Co. v. New Jersey    5
Santos Sanchez v. Mayorkas             4
Name: text, dtype: int64
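
The same count can be wrapped in a small helper so it is easy to rerun for other circuits or phrase lists. The sketch below is one possible version, not the notebook's own method: `build_pattern` and `count_matching_turns` are hypothetical names, and the data is a toy stand-in. It escapes each phrase with `re.escape` and lets an apostrophe match either the straight (`'`) or the curly (`’`) form, since a phrase typed with a curly apostrophe will not match text that uses straight ones.

```python
import re
import pandas as pd

def build_pattern(phrases):
    """Join phrases into one regex alternation, accepting ' or ’ for apostrophes."""
    parts = []
    for phrase in phrases:
        pieces = re.split("['’]", phrase)
        parts.append("['’]".join(re.escape(piece) for piece in pieces))
    return "|".join(parts)

def count_matching_turns(df, phrases, by="title"):
    """Count, per group, the turns whose text contains any of the phrases."""
    pattern = build_pattern(phrases)
    hits = df["text"].str.contains(pattern, flags=re.IGNORECASE, na=False)
    return hits.groupby(df[by]).sum()

# Toy turns (assumption: same column names as the real DataFrame).
turns = pd.DataFrame({
    "title": ["Case A", "Case A", "Case B"],
    "text": ["That's right, Your Honor.", "I disagree with that.", None],
})

agreement = ["absolutely", "I agree", "that’s right"]
result = count_matching_turns(turns, agreement)
print(result)
```

Matching is by substring, so a short phrase such as "opposite" also counts neutral uses like "my friend on the opposite side". Treat the counts as a rough signal rather than a measure of actual agreement.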
