## Predicted Supreme Court Decisions

In [None]:
#### Executive Summary

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split

The Supreme Court Database (http://scdb.wustl.edu/) contains four basic databases, each available in several formats.
Two databases cover the "legacy" period (1791-1945) and two cover the modern period (1946-2016).
For each period, one database contains justice-level data, with one line per justice so that each individual vote is broken out, and the other contains one line per case. Other variants of the modern database are available, but for determining the outcomes of individual cases, the justice-level and case-level databases are the obvious choices.

The following work was necessary to create train/test sets for cross-validation. I knew that my ultimate goal was to predict modern cases, but I wanted to be able to work with either the case-level or the justice-level database. Running separate train-test splits on the two datasets would have produced a mismatch, with cases in the train set at the case level landing in the test set at the justice level. Additionally, I didn't want to train on the votes of certain justices in a specific case to predict the votes of other justices in that same case. My solution was to perform a train-test split on the modern case-level database and then divide the justice-level database along the same cases.
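
As a minimal sketch of that idea, the toy example below splits a small case-level table once and then carries the split over to a justice-level table by `caseId`. The tables, the justice labels, and the `random_state` value are made up for illustration; only the `caseId` column name comes from the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-ins for the case-level and justice-level tables (hypothetical data).
cases = pd.DataFrame({"caseId": [f"c{i}" for i in range(10)]})
votes = pd.DataFrame({
    "caseId": [cid for cid in cases["caseId"] for _ in range(3)],
    "justice": [j for _ in range(10) for j in ("A", "B", "C")],
})

# Split once, at the case level.
cases_train, cases_test = train_test_split(cases, test_size=0.2, random_state=0)

# Carry the split over to the justice-level rows by caseId.
votes_train = votes[votes["caseId"].isin(cases_train["caseId"])]
votes_test = votes[votes["caseId"].isin(cases_test["caseId"])]

# No case should contribute votes to both sides of the split.
overlap = set(votes_train["caseId"]) & set(votes_test["caseId"])
print(len(votes_train), len(votes_test), overlap)
```

Because membership is decided per case rather than per vote, all of a case's votes land on the same side of the split.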

In [19]:
## Loading the Modern Datasets.  
df_case_mod = pd.read_csv('SCDB_2017_01_caseCentered_Citation.csv', encoding='ISO 8859-1', dtype='object')
df_justice_mod = pd.read_csv('SCDB_2017_01_justiceCentered_Citation.csv', encoding='ISO 8859-1', dtype='object')


I was not sure of my targets at this point in the project, but I knew that I wanted to separate information available only after a case is decided from information available before the decision. That is how I chose the features and targets for the purpose of the train-test split.

In [20]:
df_case_mod.columns ## The following is a list of columns -- descriptions at the source website were used to 
                    ## choose the features and targets

Index(['caseId', 'docketId', 'caseIssuesId', 'voteId', 'dateDecision',
       'decisionType', 'usCite', 'sctCite', 'ledCite', 'lexisCite', 'term',
       'naturalCourt', 'chief', 'docket', 'caseName', 'dateArgument',
       'dateRearg', 'petitioner', 'petitionerState', 'respondent',
       'respondentState', 'jurisdiction', 'adminAction', 'adminActionState',
       'threeJudgeFdc', 'caseOrigin', 'caseOriginState', 'caseSource',
       'caseSourceState', 'lcDisagreement', 'certReason', 'lcDisposition',
       'lcDispositionDirection', 'declarationUncon', 'caseDisposition',
       'caseDispositionUnusual', 'partyWinning', 'precedentAlteration',
       'voteUnclear', 'issue', 'issueArea', 'decisionDirection',
       'decisionDirectionDissent', 'authorityDecision1', 'authorityDecision2',
       'lawType', 'lawSupp', 'lawMinor', 'majOpinWriter', 'majOpinAssigner',
       'splitVote', 'majVotes', 'minVotes'],
      dtype='object')

In [21]:
targets = ['declarationUncon', 'caseDisposition',
       'caseDispositionUnusual', 'partyWinning', 'precedentAlteration',
       'voteUnclear','decisionDirection',
       'decisionDirectionDissent', 'authorityDecision1', 'authorityDecision2',
       'lawType', 'lawSupp', 'lawMinor', 'majOpinWriter', 'majOpinAssigner',
       'splitVote', 'majVotes', 'minVotes']

In [22]:
y_case = df_case_mod[targets]
X_case = df_case_mod[[col for col in df_case_mod.columns if col not in targets]]

In [23]:
X_case.columns

Index(['caseId', 'docketId', 'caseIssuesId', 'voteId', 'dateDecision',
       'decisionType', 'usCite', 'sctCite', 'ledCite', 'lexisCite', 'term',
       'naturalCourt', 'chief', 'docket', 'caseName', 'dateArgument',
       'dateRearg', 'petitioner', 'petitionerState', 'respondent',
       'respondentState', 'jurisdiction', 'adminAction', 'adminActionState',
       'threeJudgeFdc', 'caseOrigin', 'caseOriginState', 'caseSource',
       'caseSourceState', 'lcDisagreement', 'certReason', 'lcDisposition',
       'lcDispositionDirection', 'issue', 'issueArea'],
      dtype='object')

In [24]:
## Train-test split on the case-level modern dataset, then pull out the caseIds to apply to the justice-level sets

X_case_train, X_case_test, y_case_train, y_case_test = train_test_split(X_case, y_case, test_size=.2, random_state=2017)

train_caseIds = X_case_train['caseId']
test_caseIds = X_case_test['caseId']


In [26]:
## applying the caseIds to create and save test and train sets at the case and justice level

df_justice_mod_train = df_justice_mod[df_justice_mod.caseId.isin(train_caseIds)]
df_justice_mod_test = df_justice_mod[df_justice_mod.caseId.isin(test_caseIds)]

df_justice_mod_train.to_csv('justice_train_mod.csv')
df_justice_mod_test.to_csv('justice_test_mod.csv')

df_case_mod_train = df_case_mod[df_case_mod.caseId.isin(train_caseIds)]
df_case_mod_test = df_case_mod[df_case_mod.caseId.isin(test_caseIds)]

df_case_mod_train.to_csv('case_train_mod.csv')
df_case_mod_test.to_csv('case_test_mod.csv')
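
A quick sanity check on a split like this is to confirm that the train and test sets share no `caseId` and that together they cover every case. The helper below is a hypothetical sketch, not part of the original notebook, and is shown on toy frames rather than the real files.

```python
import pandas as pd

def check_split(full, train, test, key="caseId"):
    """Return True if train and test share no key values and together cover full."""
    train_ids, test_ids = set(train[key]), set(test[key])
    disjoint = train_ids.isdisjoint(test_ids)
    complete = (train_ids | test_ids) == set(full[key])
    return disjoint and complete

# Toy justice-level table: three cases with two votes each (made-up data).
full = pd.DataFrame({"caseId": ["a", "a", "b", "b", "c", "c"]})
train = full[full["caseId"].isin(["a", "b"])]
test = full[full["caseId"].isin(["c"])]
leaky_test = full[full["caseId"].isin(["b", "c"])]

print(check_split(full, train, test))        # clean split
print(check_split(full, train, leaky_test))  # case "b" appears on both sides
```

The same call could be made with `df_justice_mod`, `df_justice_mod_train`, and `df_justice_mod_test`. Note also that `to_csv` writes the DataFrame index as an unnamed first column by default, so the saved files need `index_col=0` when read back, or `index=False` when written, to avoid an extra column.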