# Measuring Judge Ideology

## Background:
Example: Martin-Quinn scores, which are computed using a dynamic item-response model. The model is *dynamic* because it allows the latent ideology measure to vary over time. Martin and Quinn take a Bayesian approach and achieve the *dynamic* part with a hierarchical random walk prior on the latent ideology measure.
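
For reference, here is a sketch of that setup as I understand it. The symbols are my own labels, not notation from these notes: $\theta_{t,j}$ is the latent ideology of justice $j$ in term $t$, $z_{t,k,j}$ is the latent utility difference behind that justice's vote on case $k$, and $\alpha_k$, $\beta_k$ are case-specific parameters.

$$
z_{t,k,j} = \alpha_k + \beta_k \theta_{t,j} + \varepsilon_{t,k,j}, \qquad \varepsilon_{t,k,j} \sim N(0, 1)
$$

$$
\theta_{t,j} \sim N(\theta_{t-1,j}, \Delta_j)
$$

The second line is the random walk prior: each term's ideology is centered on the previous term's value, and the assumed variance $\Delta_j$ controls how quickly a justice is allowed to drift.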

![Martin_Quinn](https://mqscores.lsa.umich.edu/images/ipAnim1937_2006.gif)

![Image1](images/img1.png)

## Using text to do the same

Consider Snyder v. Phelps (2011):
- Quick facts: Westboro Baptist Church members picketed with egregious homophobic signs at the funeral service of Marine Lance Corporal Matthew Snyder.
- Briefly, Roberts, writing for the majority, states that the First Amendment gives "special protections" to speech on public issues that takes place on public land.

![Roberts](images/roberts.png)

- Alito dissents, arguing that free speech is not a license to inflict harm. Here is a passage:

![Alito](images/alito.png)

## Modeling Challenges:

**Legal Language**: Legal language is generally heavy on jargon and typically avoids colloquialisms. By contrast, members of Congress often use colloquial, politically loaded phrases in floor speeches, such as "death tax", "fake news", "alternative facts", "pro-life", and "anti-choice".

**Opinion Structure**:
1. Fact Pattern + How Case Got to This Court
2. Explain Legal Concepts, Explain Precedents
3. Apply Legal Analysis
4. Summarize


## Data Challenges:

Example #1: https://www.courtlistener.com/opinion/2075/greenwich-financial-services-distressed-mortgage-f/

In [1]:
library(rjson)
example1 = fromJSON(file='data/2448.json')
example1

### Problem 1: "Year" is missing  

The year is easy to extract from "local_path" when that field is available; otherwise it has to be extracted from the opinion text itself.

In [2]:
example1$local_path

In [3]:
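# Read every JSON opinion file in data/ into a list
file_names = list.files('data/', pattern='\\.json$')
opinions = lapply(file_names, function(f) fromJSON(file=file.path('data', f)))
# Inspect the local_path of each opinion (NULL where the field is missing)
lapply(opinions,function(x) x$local_path)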
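
A minimal sketch of the year extraction, assuming the path always has the form `pdf/YYYY/MM/DD/name.pdf` as in the First Circuit files. The helper name `extract_year` is hypothetical, and the fallback to the opinion text is not covered here.

```r
# Hypothetical helper: pull a four-digit year out of a local_path such as
# "pdf/2010/02/10/US_v._Davila-Gonzalez.pdf". Returns NA when no year is found.
extract_year = function(local_path) {
    if (is.null(local_path) || is.na(local_path)) return(NA_integer_)
    # Assumption: the year is the first path component that looks like 18xx, 19xx or 20xx
    m = regmatches(local_path, regexpr("/(1[89]|20)[0-9]{2}/", local_path))
    if (length(m) == 0) return(NA_integer_)
    as.integer(gsub("/", "", m))
}

extract_year("pdf/2010/02/10/US_v._Davila-Gonzalez.pdf")
```

Applied to the whole list, this would be `sapply(opinions, function(x) extract_year(x$local_path))`, with `NA` marking the opinions that still need a year from the text.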

### Problem 2: "plain_text", "html", "html_lawbox", "html_columbia", and "html_with_citations" each contain the opinion text in a different format

In [4]:
# First 200 characters of each text field (R strings are 1-indexed)
sapply(opinions[[1]][c('plain_text','html','html_columbia','html_with_citations')], substr, 1, 200)

In [5]:
# Print the first n_chars characters of each text field of an opinion.
# Note: despite the function name, the second argument counts characters, not lines.
print_some_lines = function(x,n_chars) {
    fields = c('plain_text','html','html_columbia','html_with_citations')
    for(f in fields) print(paste0(f,': ',substr(x[[f]],1,n_chars)))
}

In [8]:
opinions[[4]]$absolute_url

In [6]:
print_some_lines(opinions[[4]],2000)

[1] "plain_text: "
[1] "html: <p class=\"case_cite\">227 F.2d 282</p>\n    <p class=\"parties\">Doris Sylvia GREY, infant, and Howard Martin Grey, infant,<br>children of Harry M. Goldberg, deceased, and<br>Sophie Goldberg, deceased, by Esther<br>Weiner, their guardian ad<br>litem, Plaintiffs-Appellants,<br>v.<br>AMERICAN AIRLINES, Inc., Defendant-Appellee.</p>\n    <p class=\"docket\">No. 73, Docket 23601.</p>\n    <p class=\"court\">United States Court of Appeals Second Circuit.</p>\n    <p class=\"date\">Argued Oct. 7, 1955.<br>Decided Nov. 7, 1955.</p>\n    <div class=\"prelims\">\n      <p class=\"indent\">Manes, Sturim, Donovan &amp; Laufer, New York City (Arthur M. Laufer and Samuel S. Sturim, New York City, on the brief), for plaintiffs-appellants.</p>\n      <p class=\"indent\">Haight, Gardner, Poor &amp; Havens, New York City (William J. Junkerman and James B. McQuillan, New York City, of counsel), for defendant-appellee.</p>\n      <p class=\"indent\">Before HAND, MEDINA and 
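
Since "plain_text" can be empty while "html" is populated, one option is to take the first non-empty field and strip the markup. This is only a sketch: the helper names `first_nonempty_text` and `strip_tags` and the field priority order are my assumptions. The tag stripping is a crude regex and leaves HTML entities such as `&amp;` untouched.

```r
# Hypothetical helper: return the first text field that is present and non-empty.
first_nonempty_text = function(x, fields = c('plain_text', 'html', 'html_lawbox',
                                             'html_columbia', 'html_with_citations')) {
    for (f in fields) {
        value = x[[f]]
        if (!is.null(value) && nzchar(trimws(value))) return(list(field = f, text = value))
    }
    list(field = NA_character_, text = NA_character_)
}

# Crude tag removal: replace tags with spaces, then collapse runs of whitespace.
strip_tags = function(html) {
    no_tags = gsub("<[^>]+>", " ", html)
    trimws(gsub("[ \t\r\n]+", " ", no_tags))
}

# Toy opinion (made up for illustration) with an empty plain_text field
toy = list(plain_text = "", html = "<p class=\"court\">United States Court of Appeals</p>")
res = first_nonempty_text(toy)
res$field
strip_tags(res$text)
```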

### Problem 3: Get Author Name

The JSON files do have "author" and "author_str" keys, but they are empty.  
Metadata exist, but the professors who hold them will not share unless I am a co-author.

In [9]:
opinions[[4]]$author
opinions[[4]]$author_str

NULL

https://www.courtlistener.com/opinion/237855/doris-sylvia-grey-infant-and-howard-martin-grey-infant-children-of/

The author is MEDINA, not LUMBARD.

In [10]:
substr(opinions[[4]]$html_with_citations,1025,1200)

In [11]:
library(stringr)
str_match(opinions[[4]]$html_with_citations, "<p class=\"indent\"([ \t>]*)([a-zA-Z,]*)[ \t]Circuit[ \t]+Judge[.]")[,3]
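
The same idea can be checked on a toy string. The sketch below uses base R instead of `stringr` and a slightly simplified pattern. The HTML snippet and judge names are made up, and `extract_author` is a hypothetical helper name. It assumes the author appears alone in a paragraph of the form `NAME, Circuit Judge.`, so the panel line ("Circuit Judges.") is skipped.

```r
# Hypothetical helper: find "<p class="indent">NAME, Circuit Judge." and return NAME.
extract_author = function(html) {
    pattern = "<p class=\"indent\">[ \t]*([A-Za-z]+),?[ \t]+Circuit[ \t]+Judge[.]"
    m = regmatches(html, regexec(pattern, html))[[1]]
    # m[1] is the full match, m[2] the captured name; no match gives character(0)
    if (length(m) < 2) return(NA_character_)
    m[2]
}

# Made-up HTML: a panel line followed by the authoring judge's line
toy_html = paste0(
    "<p class=\"indent\">Before DOE, ROE and POE, Circuit Judges.</p>\n",
    "<p class=\"indent\">ROE, Circuit Judge.</p>"
)
extract_author(toy_html)
```

Capturing only letters drops the trailing comma that the original pattern keeps in the match.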

### Problem 4: Dissents, Concurrences, Per Curiam Opinions, Errata Sheets

Example Dissent: 

https://www.courtlistener.com/opinion/1036108/manning-v-boston-medical-center/

Example Errata Sheet:

https://www.courtlistener.com/opinion/1034770/in-reauerhahn-v/

Excluding these for now  

Total First Circuit Files: 34834  
Dissents: 275 (< 1%)  
Errata: 1527 (4%)  
Per Curiam: 3713 (11%)  
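
The shares can be recomputed directly from the counts above. The variable names are mine.

```r
# Counts reported above for the First Circuit files
counts = c(dissent = 275, errata = 1527, per_curiam = 3713)
total = 34834

# Share of all files in each category, in percent
shares = round(100 * counts / total, 1)
shares
```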

![image2](images/federal_dissents.png)

(From "Why (and When) Judges Dissent: A Theoretical and Empirical Analysis", Epstein, Landes, and Posner (2011))

## Metadata

I created a metadata file (so far it covers only the First Circuit). For scale:

- Roughly 50,000 federal appellate court cases per year
- Roughly 1,400 First Circuit cases per year

In [12]:
load('data_inventory.ca1.RDATA')

In [13]:
head(df)

file_name,year,case_name,alt_case_name,circuit,local_path,absolute_url,type,author,joined_by,download_url,plain_text,judge,dissent,concurring,per_curiam,errata
1.json,2010,US_v._Davila-Gonzalez,united-states-v-davila-gonzalez,ca1,pdf/2010/02/10/US_v._Davila-Gonzalez.pdf,/opinion/1/united-states-v-davila-gonzalez/,010combined,,,http://www.ca1.uscourts.gov/pdf.opinions/08-2575P-01A.pdf,plain,"SELYA,",0,0,0,0
10.json,2010,US_v._Mitchell,united-states-v-mitchell,ca1,pdf/2010/02/22/US_v._Mitchell.pdf,/opinion/10/united-states-v-mitchell/,010combined,,,http://www.ca1.uscourts.gov/pdf.opinions/09-1260P-01A.pdf,plain,"TORRUELLA,",0,0,0,0
1000.json,2010,Adams_v._Adams,adams-v-adams,ca1,pdf/2010/03/31/Adams_v._Adams.pdf,/opinion/1000/adams-v-adams/,010combined,,,http://www.ca1.uscourts.gov/pdf.opinions/09-1443P-01A.pdf,plain,"STAHL,",0,0,0,0
1001.json,2010,Airframe_Systems_Inc._v._Raytheon_Company,airframe-systems-inc-v-raytheon-co,ca1,pdf/2010/03/31/Airframe_Systems_Inc._v._Raytheon_Company.pdf,/opinion/1001/airframe-systems-inc-v-raytheon-co/,010combined,,,http://www.ca1.uscourts.gov/pdf.opinions/09-1624P-01A.pdf,plain,"LYNCH,",0,0,0,0
1032057.json,2013,united_states_v._hogan,united-states-v-hogan,ca1,pdf/2013/07/05/united_states_v._hogan.pdf,/opinion/1032057/united-states-v-hogan/,010combined,,,http://media.ca1.uscourts.gov/pdf.opinions/12-1039P-01A.pdf,plain,"THOMPSON,",0,0,0,0
1032437.json,2013,mitchell_v._us_airways_inc.,mitchell-v-us-airways-inc,ca1,pdf/2013/07/09/mitchell_v._us_airways_inc..pdf,/opinion/1032437/mitchell-v-us-airways-inc/,010combined,,,http://media.ca1.uscourts.gov/pdf.opinions/12-1543P-01A.pdf,plain,"SELYA,",0,0,0,0


In [15]:
nrow(df)

In [14]:
sum(is.na(df$year))

In [16]:
sum(is.na(df$local_path))

In [17]:
sum(is.na(df$plain_text))
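
The three checks above can be done in one call with `colSums(is.na(...))`. A sketch on a made-up data frame with the same three column names (the values are invented for illustration):

```r
# Toy stand-in for the metadata data frame
toy_df = data.frame(
    year = c(2010, NA, 2013),
    local_path = c("pdf/2010/02/10/a.pdf", NA, "pdf/2013/07/05/c.pdf"),
    plain_text = c("plain", "plain", NA),
    stringsAsFactors = FALSE
)

# Number of missing values in every column at once
missing_counts = colSums(is.na(toy_df))
missing_counts
```

On the real data this would be `colSums(is.na(df))`, which also covers columns such as `author` and `judge`.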

## Text Model

![doc2vec](images/doc2vec.png)

## Analysis

Average the document vectors for each judge, then plot the averages.
Keep a judge only if more than 100 opinions are available (25 judges).
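
A minimal sketch of this step on simulated data. The random vectors stand in for the real doc2vec output, the judge labels are made up, and using `prcomp` for the two-dimensional projection is my assumption about how the plot is produced.

```r
set.seed(1)

# Toy stand-ins (assumptions): one 5-dimensional vector per opinion and the
# authoring judge of each opinion. Real document vectors would replace these.
judges = rep(c("a", "b", "c", "d"), times = c(250, 200, 120, 30))
doc_vectors = matrix(rnorm(length(judges) * 5), nrow = length(judges))

# Keep only judges with more than 100 opinions
counts = table(judges)
keep = names(counts)[counts > 100]
kept_rows = judges %in% keep

# Average the document vectors within each kept judge
sums = rowsum(doc_vectors[kept_rows, ], judges[kept_rows])
judge_means = sums / as.vector(counts[rownames(sums)])

# Project the judge averages onto their first two principal components
pca = prcomp(judge_means)
pca$x[, 1:2]
```

Calling `plot(pca$x[, 1], pca$x[, 2])` and then `text(pca$x[, 1], pca$x[, 2], rownames(judge_means))` would give a labeled scatter of the judges.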

In [21]:
# Load Data
judge_meta = read.csv(file='ca1_Judge_Metadata.csv')
head(judge_meta,30)

Judge,Judge.full,Born,Active,Appointed.By,X
putnam,William LeBaron Putnam,1835,1892–1917,B. Harrison,0
colt,LeBaron B. Colt,1846,1891–1913[Note 1],Arthur,0
dodge,Frederic Dodge,1847,1912–1918,Taft,0
lowell,Francis Cabot Lowell,1855,1905–1911,T. Roosevelt,0
schofield,William Schofield,1857,1911–1912,Taft,0
johnson,Charles Fletcher Johnson,1859,1917–1929,Wilson,1
anderson,George Weston Anderson,1861,1918–1931,Wilson,0
bingham,George Hutchins Bingham,1864,1913–1939,Wilson,1
morton,James Madison Morton Jr.,1869,1932–1939,Hoover,1
wilson,Scott Wilson,1870,1929–1940,Hoover,1


![pca](images/pca.png)

Next: de-mean by decade and circuit court, then average by judge.

If possible, de-mean by topic as well.
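
A sketch of the de-meaning step using `ave()`, on a made-up data frame where a single `score` column stands in for one dimension of the document vectors. All values and column names here are assumptions for illustration.

```r
# Toy data: two opinions per judge, one per decade, all from one circuit
toy = data.frame(
    judge = c("a", "a", "b", "b", "c", "c"),
    decade = c(1990, 2000, 1990, 2000, 1990, 2000),
    circuit = "ca1",
    score = c(1, 4, 3, 6, 2, 8)
)

# Subtract the decade-by-circuit mean from each opinion's score
toy$score_demeaned = toy$score - ave(toy$score, toy$decade, toy$circuit)

# Then average the de-meaned scores by judge
judge_avg = tapply(toy$score_demeaned, toy$judge, mean)
judge_avg
```

De-meaning by topic would work the same way, by adding a topic column to the grouping variables passed to `ave()`.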

## Code Schema

![schema](images/schema.png)