### Load In Data

In [1]:
import pandas as pd
import numpy as np
import re

In [2]:
cases = pd.read_csv('csvs/justice2.csv')
judges = pd.read_csv('csvs/table_of_justices.csv')
presidents = pd.read_csv('csvs/presidents.csv')

In [3]:
cases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3303 entries, 0 to 3302
Data columns (total 16 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Unnamed: 0          3303 non-null   int64 
 1   ID                  3303 non-null   int64 
 2   name                3303 non-null   object
 3   href                3303 non-null   object
 4   docket              3292 non-null   object
 5   term                3303 non-null   int64 
 6   first_party         3302 non-null   object
 7   second_party        3302 non-null   object
 8   facts               3303 non-null   object
 9   facts_len           3303 non-null   int64 
 10  majority_vote       3303 non-null   int64 
 11  minority_vote       3303 non-null   int64 
 12  first_party_winner  3288 non-null   object
 13  decision_type       3296 non-null   object
 14  disposition         3231 non-null   object
 15  issue_area          3161 non-null   object
dtypes: int64(6), object(10)


In [4]:
judges.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121 entries, 0 to 120
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Index                     121 non-null    int64 
 1   Justice Name              121 non-null    object
 2   Supreme Court Term Start  121 non-null    object
 3   Supreme Court Term End    121 non-null    object
 4   Appointing President      121 non-null    object
 5   Notable Opinion(s)        121 non-null    object
dtypes: int64(1), object(5)
memory usage: 5.8+ KB


There is no missing information for any of the judges in this table. Every judge has a
* Name
* Start Term
* End Term
* Appointing President

This will help us estimate each judge's political affiliation when creating features for our model.

In [5]:
presidents.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   President   46 non-null     object
 1   Party       46 non-null     object
dtypes: object(2)
memory usage: 868.0+ bytes


### Data Prep

#### Get All Dates in Terms of Years

In [6]:
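# Convert term start/end dates to calendar years.
judges['start_year'] = pd.to_datetime(judges['Supreme Court Term Start']).dt.year
# '--' appears to mark justices who are still serving; treat their service as running through 2024.
judges['end_year'] = pd.to_datetime(judges['Supreme Court Term End'].replace("--","01-Mar-24")).dt.year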



Cases only record the "term" in which a ruling was issued, not an exact date. We therefore assume that if a Supreme Court Justice's years of service include the term of a ruling, that justice took part in the ruling.
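This assumption amounts to an inclusive interval check on years. A minimal sketch (the helper name `served_in_term` and the years below are hypothetical) shows one side effect: in a year when one justice leaves and another joins, both pass the check, so the per-term ideology counts can add up to more than nine.

```python
def served_in_term(start_year: int, end_year: int, term: int) -> bool:
    """Inclusive rule: a justice counts for every term from start_year through end_year."""
    return start_year <= term <= end_year


# Hypothetical service years: one justice leaves in 1971, another joins in 1971.
departing = (1955, 1971)
arriving = (1971, 1987)

# Both justices satisfy the rule for the 1971 term, so both would be counted.
counted = [served_in_term(start, end, 1971) for start, end in (departing, arriving)]
print(counted)  # [True, True]
```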

#### Match Judges to their Party Affiliations

In [7]:
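# "Appointing President" appears to be stored as "Last, First": take the word before the comma.
judges['president_last_name'] = judges["Appointing President"].str.extract(r'^(\w+),')
# presidents.csv stores "First ... Last" (note the trailing space in the 'President ' column name): take the final word.
presidents['president_last_name'] = presidents['President '].str.extract(r'\s(\w+)$')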

In [8]:
president_party_dict = presidents.set_index('president_last_name')['Party '].to_dict()
judges['party'] = np.where(judges['president_last_name'].isin(president_party_dict.keys()),
                           judges['president_last_name'].map(president_party_dict),
                           None)
# Strip stray whitespace from party names; missing parties stay missing.
judges['party'] = judges['party'].str.strip()

We assume that each justice shares the political affiliation of the President who appointed them. We therefore map each President's party onto the Supreme Court Justices that President appointed.
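One caveat with keying on last name alone: several presidents share a surname (Adams, Harrison, Johnson, Roosevelt, Bush), and `Series.to_dict()` on an index with duplicate keys keeps only the last value for each key. If presidents.csv lists presidents chronologically, appointees of John Adams, for example, would be labeled with John Quincy Adams's party, which would be consistent with no Federalist label appearing in the counts below. The sketch uses a hypothetical four-row stand-in for presidents.csv (real presidents, but the party labels in the actual file may differ) to show how to detect shared surnames.

```python
import pandas as pd

# Hypothetical stand-in for presidents.csv (the real file's column names carry a trailing space).
presidents_toy = pd.DataFrame({
    "President": ["John Adams", "John Quincy Adams", "Theodore Roosevelt", "Franklin D. Roosevelt"],
    "Party": ["Federalist", "Democratic-Republican/National Republican", "Republican", "Democratic"],
})
# Same last-word extraction as the notebook, returned as a Series.
presidents_toy["president_last_name"] = presidents_toy["President"].str.extract(r"\s(\w+)$", expand=False)

# Surnames shared by more than one president.
is_shared = presidents_toy["president_last_name"].duplicated(keep=False)
shared = presidents_toy.loc[is_shared, "president_last_name"].unique()

# to_dict() on a duplicated index keeps only the LAST row for each surname.
party_by_last_name = presidents_toy.set_index("president_last_name")["Party"].to_dict()
print(party_by_last_name)
# {'Adams': 'Democratic-Republican/National Republican', 'Roosevelt': 'Democratic'}
```

A more robust key would include first names as well, which would mean normalizing the "Last, First" and "First Last" formats of the two files into one shared form.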

In [9]:
judges.value_counts("party")

party
Republican                                   49
Democratic                                   44
Independent                                  11
Democratic-Republican                         6
Republican/National Union                     5
Democratic-Republican/National Republican     4
Whig                                          2
Name: count, dtype: int64

This is a problem we did not foresee. Party names have changed over the years, and some historical parties held ideals that don't line up with today's parties, so a combined label cannot be read as the sum of its parts:

`Democratic` + `Republican` != `Democratic-Republican`

Our solution is instead to map each party to an ideology of "neutral", "conservative", or "liberal", based on our existing knowledge of these parties.

In [10]:
ideology_map = {
    "Republican": "conservative",
    "Democratic": "liberal",
    "Independent": "neutral", 
    "Whig": "conservative",
    "Democratic-Republican/National Republican": "conservative",
    "Republican/National Union": "conservative",
    "Democratic-Republican": "liberal"
}

In [11]:
judges['ideology'] = np.where(judges['party'].isin(ideology_map.keys()),
                           judges['party'].map(ideology_map),
                           None)
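Before counting, it is worth confirming that every party received an ideology: the counting loop uses the ideology as a column label, so a missing one would not match any of the three count columns. A minimal sketch, assuming toy party labels (`"Federalist"` is included only to show an unmapped case), using `Series.map`, which behaves like the `np.where`/`isin` version above except that unmatched entries come back as NaN rather than None:

```python
import pandas as pd

ideology_map = {
    "Republican": "conservative",
    "Democratic": "liberal",
    "Independent": "neutral",
    "Whig": "conservative",
    "Democratic-Republican/National Republican": "conservative",
    "Republican/National Union": "conservative",
    "Democratic-Republican": "liberal",
}

# Toy party labels; "Federalist" is deliberately absent from ideology_map.
parties = pd.Series(["Republican", "Whig", "Federalist", None])

# Unmatched labels and missing parties both map to NaN.
ideology = parties.map(ideology_map)

# Party labels that still need an ideology assigned.
unmapped = sorted(set(parties.dropna()) - set(ideology_map))
print(unmapped)  # ['Federalist']
```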

#### Match Cases to Judges to Get Counts for Each Ideology


In [12]:
cases["conservative"] = 0
cases["liberal"] = 0
cases["neutral"] = 0

In [13]:
# For each case, count the justices of each ideology whose service years include the case's term.
for case_index, term in enumerate(cases.term):
    for judge_index, (start, end) in enumerate(zip(judges.start_year, judges.end_year)):
        if start <= term <= end:
            judge_ideology = judges.ideology[judge_index]
            cases.loc[case_index, judge_ideology] += 1
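The nested loop performs one `.loc` update per case-justice match, which is slow for 3,303 cases. A vectorized sketch of the same inclusive `start_year <= term <= end_year` rule builds a cases-by-justices boolean matrix with NumPy broadcasting and sums it per ideology. The function name `count_ideologies` and the toy justices are hypothetical; the column names match the notebook's `judges` table.

```python
import numpy as np
import pandas as pd

IDEOLOGIES = ["conservative", "liberal", "neutral"]


def count_ideologies(terms, judges: pd.DataFrame) -> pd.DataFrame:
    """Count serving justices per ideology for each term (inclusive start/end years)."""
    term_arr = np.asarray(terms)[:, None]              # shape (n_cases, 1)
    start = judges["start_year"].to_numpy()[None, :]   # shape (1, n_judges)
    end = judges["end_year"].to_numpy()[None, :]       # shape (1, n_judges)
    serving = (start <= term_arr) & (term_arr <= end)  # shape (n_cases, n_judges)
    counts = {
        ideology: (serving & (judges["ideology"] == ideology).to_numpy()).sum(axis=1)
        for ideology in IDEOLOGIES
    }
    return pd.DataFrame(counts)


# Hypothetical justices: the first leaves in 1971, the second joins in 1972.
judges_toy = pd.DataFrame({
    "start_year": [1960, 1972, 1971],
    "end_year": [1971, 1990, 2000],
    "ideology": ["conservative", "liberal", "conservative"],
})
result = count_ideologies([1971, 1972], judges_toy)
```

Applied to the notebook's data, this would be `cases[IDEOLOGIES] = count_ideologies(cases["term"], judges).to_numpy()`, which should reproduce the loop's counts, including the double counting in transition years.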

#### Remove rows that are missing `issue_area`

In [31]:
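# .copy() makes cases_clean an independent DataFrame, so adding columns later does not raise SettingWithCopyWarning.
cases_clean = cases[cases.issue_area.notnull()].copy()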
cases_clean.head()


Unnamed: 0.1,Unnamed: 0,ID,name,href,docket,term,first_party,second_party,facts,facts_len,majority_vote,minority_vote,first_party_winner,decision_type,disposition,issue_area,conservative,liberal,neutral
1,1,50613,Stanley v. Illinois,https://api.oyez.org/cases/1971/70-5014,70-5014,1971,"Peter Stanley, Sr.",Illinois,<p>Joan Stanley had three children with Peter ...,757,5,2,True,majority opinion,reversed/remanded,Civil Rights,5,4,0
2,2,50623,Giglio v. United States,https://api.oyez.org/cases/1971/70-29,70-29,1971,John Giglio,United States,<p>John Giglio was convicted of passing forged...,495,7,0,True,majority opinion,reversed/remanded,Due Process,5,4,0
3,3,50632,Reed v. Reed,https://api.oyez.org/cases/1971/70-4,70-4,1971,Sally Reed,Cecil Reed,"<p>The Idaho Probate Code specified that ""male...",378,7,0,True,majority opinion,reversed/remanded,Civil Rights,5,4,0
4,4,50643,Miller v. California,https://api.oyez.org/cases/1971/70-73,70-73,1971,Marvin Miller,California,"<p>Miller, after conducting a mass mailing cam...",305,5,4,True,majority opinion,vacated/remanded,First Amendment,5,4,0
5,5,50644,Kleindienst v. Mandel,https://api.oyez.org/cases/1971/71-16,71-16,1971,"Richard G. Kleindienst, Attorney General of th...","Ernest E. Mandel, et al.",<p>Ernest E. Mandel was a Belgian professional...,2282,6,3,True,majority opinion,reversed,First Amendment,5,4,0


#### Create new column for unanimous vote

In [51]:
# A ruling is unanimous when no justice voted in the minority.
cases_clean['unanimous'] = cases_clean.minority_vote == 0
cases_clean




Unnamed: 0.1,Unnamed: 0,ID,name,href,docket,term,first_party,second_party,facts,facts_len,majority_vote,minority_vote,first_party_winner,decision_type,disposition,issue_area,conservative,liberal,neutral,unanimous
1,1,50613,Stanley v. Illinois,https://api.oyez.org/cases/1971/70-5014,70-5014,1971,"Peter Stanley, Sr.",Illinois,<p>Joan Stanley had three children with Peter ...,757,5,2,True,majority opinion,reversed/remanded,Civil Rights,5,4,0,False
2,2,50623,Giglio v. United States,https://api.oyez.org/cases/1971/70-29,70-29,1971,John Giglio,United States,<p>John Giglio was convicted of passing forged...,495,7,0,True,majority opinion,reversed/remanded,Due Process,5,4,0,True
3,3,50632,Reed v. Reed,https://api.oyez.org/cases/1971/70-4,70-4,1971,Sally Reed,Cecil Reed,"<p>The Idaho Probate Code specified that ""male...",378,7,0,True,majority opinion,reversed/remanded,Civil Rights,5,4,0,True
4,4,50643,Miller v. California,https://api.oyez.org/cases/1971/70-73,70-73,1971,Marvin Miller,California,"<p>Miller, after conducting a mass mailing cam...",305,5,4,True,majority opinion,vacated/remanded,First Amendment,5,4,0,False
5,5,50644,Kleindienst v. Mandel,https://api.oyez.org/cases/1971/71-16,71-16,1971,"Richard G. Kleindienst, Attorney General of th...","Ernest E. Mandel, et al.",<p>Ernest E. Mandel was a Belgian professional...,2282,6,3,True,majority opinion,reversed,First Amendment,5,4,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3297,3297,63322,Yellen v. Confederated Tribes of the Chehalis ...,https://api.oyez.org/cases/2020/20-543,20-543,2020,"Janet L. Yellen, Secretary of the Treasury",Confederated Tribes of the Chehalis Reservatio...,<p>For over a century after the Alaska Purchas...,2340,6,3,True,majority opinion,reversed/remanded,Civil Rights,6,4,0,False
3298,3298,63324,United States v. Palomar-Santiago,https://api.oyez.org/cases/2020/20-437,20-437,2020,United States,Refugio Palomar-Santiago,"<p>Refugio Palomar-Santiago, a Mexican nationa...",2054,9,0,True,majority opinion,reversed/remanded,Criminal Procedure,6,4,0,True
3299,3299,63323,Terry v. United States,https://api.oyez.org/cases/2020/20-5904,20-5904,2020,Tarahrick Terry,United States,<p>Tarahrick Terry pleaded guilty to one count...,1027,9,0,False,majority opinion,affirmed,Criminal Procedure,6,4,0,True
3300,3300,63331,United States v. Cooley,https://api.oyez.org/cases/2020/19-1414,19-1414,2020,United States,Joshua James Cooley,<p>Joshua James Cooley was parked in his picku...,1309,9,0,True,majority opinion,vacated/remanded,Civil Rights,6,4,0,True


#### Select only variables of interest

In [52]:
selected_cases = cases_clean[['name', 'first_party', 'second_party', 'facts', 'majority_vote', 'minority_vote', 'decision_type', 'first_party_winner',
                               'disposition', 'issue_area', 'conservative', 'liberal', 'neutral', 'unanimous']]
selected_cases.head()

Unnamed: 0,name,first_party,second_party,facts,majority_vote,minority_vote,decision_type,first_party_winner,disposition,issue_area,conservative,liberal,neutral,unanimous
1,Stanley v. Illinois,"Peter Stanley, Sr.",Illinois,<p>Joan Stanley had three children with Peter ...,5,2,majority opinion,True,reversed/remanded,Civil Rights,5,4,0,False
2,Giglio v. United States,John Giglio,United States,<p>John Giglio was convicted of passing forged...,7,0,majority opinion,True,reversed/remanded,Due Process,5,4,0,True
3,Reed v. Reed,Sally Reed,Cecil Reed,"<p>The Idaho Probate Code specified that ""male...",7,0,majority opinion,True,reversed/remanded,Civil Rights,5,4,0,True
4,Miller v. California,Marvin Miller,California,"<p>Miller, after conducting a mass mailing cam...",5,4,majority opinion,True,vacated/remanded,First Amendment,5,4,0,False
5,Kleindienst v. Mandel,"Richard G. Kleindienst, Attorney General of th...","Ernest E. Mandel, et al.",<p>Ernest E. Mandel was a Belgian professional...,6,3,majority opinion,True,reversed,First Amendment,5,4,0,False


## Logistic Regression 

#### Split the data into test and train data sets
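A minimal sketch of the split in plain pandas, assuming `first_party_winner` is the target. The `cases.info()` output shows that column has 3288 non-null values out of 3303, so rows with a missing target are dropped first. The helper `split_train_test` and the toy frame are hypothetical stand-ins for `selected_cases`; scikit-learn's `train_test_split` (in `sklearn.model_selection`) is the common alternative and also supports a `stratify` argument.

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, target: str, test_frac: float = 0.2, seed: int = 42):
    """Drop rows with a missing target, then randomly hold out test_frac of the rest."""
    labeled = df.dropna(subset=[target])
    test = labeled.sample(frac=test_frac, random_state=seed)  # reproducible random sample
    train = labeled.drop(test.index)                           # everything not in test
    return train, test


# Hypothetical toy data standing in for selected_cases (one row has a missing target).
toy = pd.DataFrame({
    "majority_vote": [5, 7, 7, 5, 6, 9, 9, 9, 8, 6, 7],
    "first_party_winner": [True, True, True, True, True, True, False, True, False, None, True],
})
train, test = split_train_test(toy, target="first_party_winner")
print(len(train), len(test))  # 8 2
```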


**HI HAILEY! CAN YOU LOOK AT THE CASES? I GOT THE IDEOLOGY COUNTS, BUT THEY ARE LARGER THAN THE VOTE TOTALS BECAUSE IN SOME YEARS A JUSTICE LEFT AND A NEW ONE WAS APPOINTED, AND BOTH GET COUNTED**


**HELLO :) I SEE THAT THERE ARE 142 CASES WHERE THE ISSUE AREA IS MISSING... I THINK THAT MEANS WE SHOULD DROP THOSE. THERE ARE 3303 TOTAL CASES, SO THAT ISN'T TOO MANY MISSING. THOUGHTS?**

I am thinking we may want to look at whether the vote was unanimous; I'm not sure if that would be interesting or not.