# Step 2: Merge Data


This notebook merges the transcript text data with the case outcomes and each justice's vote. The merged dataframes, one per justice, are saved to .pkl for use in the next notebook.

In [69]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Import the csv file with case information.
This file comes from the Washington University Law Supreme Court Database, found here: http://scdb.wustl.edu/index.php

In [70]:
vote_df = pd.read_csv('SCDB_2020_01_justiceCentered_Docket_updated.csv')
vote_df.head(2)

Unnamed: 0,caseId,docketId,caseIssuesId,voteId,dateDecision,decisionType,usCite,sctCite,ledCite,lexisCite,...,majVotes,minVotes,justice,justiceName,vote,opinion,direction,majority,firstAgreement,secondAgreement
0,1946-001,1946-001-01,1946-001-01-01,1946-001-01-01-01-01,11/18/1946,1,329 U.S. 1,67 S. Ct. 6,91 L. Ed. 3,1946 U.S. LEXIS 1724,...,8,1,86,HHBurton,2.0,1.0,1.0,1.0,,
1,1946-001,1946-001-01,1946-001-01-01,1946-001-01-01-01-02,11/18/1946,1,329 U.S. 1,67 S. Ct. 6,91 L. Ed. 3,1946 U.S. LEXIS 1724,...,8,1,84,RHJackson,1.0,1.0,2.0,2.0,,


Subset to relevant columns and only current justices:

In [71]:
vote_df = vote_df[['docket','dateArgument','partyWinning','justiceName','majority']]
justicelist = ['SSotomayor','NMGorsuch','BMKavanaugh','EKagan','CThomas','SGBreyer','SAAlito','JGRoberts']
vote_df = vote_df.rename(columns = {'docket':'case_num'})
vote_df = vote_df[vote_df.justiceName.isin(justicelist)]
print(f'Shape: {vote_df.shape}')
print('Null count:')
print(vote_df.isna().sum())
vote_df.tail(5)

Shape: (9915, 5)
Null count:
case_num         11
dateArgument    974
partyWinning      6
justiceName       0
majority        280
dtype: int64


Unnamed: 0,case_num,dateArgument,partyWinning,justiceName,majority
93962,19-760,5/12/2020,1.0,SAAlito,1.0
93963,19-760,5/12/2020,1.0,SSotomayor,2.0
93964,19-760,5/12/2020,1.0,EKagan,2.0
93965,19-760,5/12/2020,1.0,NMGorsuch,2.0
93966,19-760,5/12/2020,1.0,BMKavanaugh,2.0


Drop rows where no vote (majority) or no winning party (partyWinning) is recorded:

In [72]:
vote_df = vote_df.dropna(subset=['majority', 'partyWinning'])
print(f'Shape: {vote_df.shape}')
print('Null count:')
print(vote_df.isna().sum())
vote_df.tail(5)

Shape: (9629, 5)
Null count:
case_num         11
dateArgument    949
partyWinning      0
justiceName       0
majority          0
dtype: int64


Unnamed: 0,case_num,dateArgument,partyWinning,justiceName,majority
93962,19-760,5/12/2020,1.0,SAAlito,1.0
93963,19-760,5/12/2020,1.0,SSotomayor,2.0
93964,19-760,5/12/2020,1.0,EKagan,2.0
93965,19-760,5/12/2020,1.0,NMGorsuch,2.0
93966,19-760,5/12/2020,1.0,BMKavanaugh,2.0


Check unique values and datatypes:

In [73]:
print(vote_df['partyWinning'].unique())
print(vote_df['majority'].unique())
vote_df.dtypes

[0. 1. 2.]
[2. 1.]


case_num         object
dateArgument     object
partyWinning    float64
justiceName      object
majority        float64
dtype: object

Check the spread of values for the vote variable:

In [74]:
vote_df.majority.value_counts()

2.0    7890
1.0    1739
Name: majority, dtype: int64

Create a new column indicating whether a justice voted for the petitioner or the respondent, using these logical relationships:

- partyWinning 0, majority 1: justice was in the minority and the respondent won, so the vote was for the petitioner
- partyWinning 0, majority 2: justice was in the majority and the respondent won, so the vote was for the respondent
- partyWinning 1, majority 1: justice was in the minority and the petitioner won, so the vote was for the respondent
- partyWinning 1, majority 2: justice was in the majority and the petitioner won, so the vote was for the petitioner

Rows with partyWinning 2 match none of these cases and are assigned NaN.

In the new column:
- Vote for petitioner: 1
- Vote for respondent: 0

In [75]:
def petitioner_vote(row):
    if row['partyWinning']==0 and row['majority']==1:
        return 1
    elif row['partyWinning']==0 and row['majority']==2:
        return 0
    elif row['partyWinning']==1 and row['majority']==1:
        return 0
    elif row['partyWinning']==1 and row['majority']==2:
        return 1
    else:
        return np.nan

In [76]:
vote_df['final_vote'] = vote_df.apply(petitioner_vote, axis=1)
vote_df.head(2)

Unnamed: 0,case_num,dateArgument,partyWinning,justiceName,majority,final_vote
69291,90-1102,11/6/1991,0.0,CThomas,2.0,0.0
69318,90-1491,11/5/1991,1.0,CThomas,2.0,1.0
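The row-wise function above can also be written without `apply`. The following is a minimal sketch, not the notebook's method: it uses a small made-up dataframe (`toy`) in place of vote_df and relies on the observation that a vote for the petitioner occurs exactly when "petitioner won" and "justice was in the majority" agree.

```python
import numpy as np
import pandas as pd

# Toy stand-in for vote_df (hypothetical data) covering every coded
# combination, including partyWinning == 2, which maps to NaN.
toy = pd.DataFrame({
    'partyWinning': [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    'majority':     [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
})

petitioner_won = toy['partyWinning'] == 1
in_majority = toy['majority'] == 2
# Only the four combinations listed above have a defined vote.
known = toy['partyWinning'].isin([0, 1]) & toy['majority'].isin([1, 2])

# 1.0 when the two flags agree (vote for petitioner), 0.0 when they
# differ (vote for respondent), NaN for any other coding.
toy['final_vote'] = np.where(
    known, (petitioner_won == in_majority).astype(float), np.nan
)
print(toy)
```

On a dataframe of this size the difference is negligible; the vectorized form mainly makes the underlying rule explicit.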


### Merge vote_df with the text dataframe of each justice
Each merged dataframe is saved to .pkl.

#### Sotomayor

In [77]:
vote_df_sotomayor = vote_df[vote_df.justiceName=='SSotomayor']
text_df_sotomayor = pd.read_pickle("df_sotomayor.pkl")
vote_df_sotomayor.head(2)

Unnamed: 0,case_num,dateArgument,partyWinning,justiceName,majority,final_vote
85686,08-678,10/5/2009,0.0,SSotomayor,2.0,0.0
85695,08-351,10/14/2009,1.0,SSotomayor,2.0,1.0


In [78]:
text_df_sotomayor.head(2)

Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text
0,15-513,SULLIVAN,SINGH,but the government starts again the judgm...,so well lets get to that okay because do we...
1,15-1191,KNEEDLER,BROOME,the problem is with the exception thats been...,im sorry if we leveled up how would that aff...


In [79]:
merged_df_sotomayor = text_df_sotomayor.merge(vote_df_sotomayor, on='case_num')
print(merged_df_sotomayor.justiceName.unique())
print(merged_df_sotomayor.shape)
merged_df_sotomayor.head()

['SSotomayor']
(582, 10)


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,15-513,SULLIVAN,SINGH,but the government starts again the judgm...,so well lets get to that okay because do we...,11/1/2016,0.0,SSotomayor,2.0,0.0
1,15-1191,KNEEDLER,BROOME,the problem is with the exception thats been...,im sorry if we leveled up how would that aff...,11/9/2016,1.0,SSotomayor,2.0,1.0
2,15-423,STETSON,CARROLL,it just see a little bit like an academic ex...,lets go back to justice alitos question why...,11/2/2016,1.0,SSotomayor,2.0,1.0
3,15-866,BURSCH,JAY,but they have put it on those other ite that...,does the university that contracts with you ...,10/31/2016,0.0,SSotomayor,2.0,0.0
4,15-497,BAGENSTOS,KATYAL,martinez that begs the last question that w...,can you tell us what page thats on no i k...,10/31/2016,1.0,SSotomayor,2.0,1.0
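The merge above uses pandas' default inner join, so any transcript whose case_num has no match in the vote data is dropped without notice. A minimal sketch of one way to count such rows follows; the two small dataframes and the docket number 99-0000 are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the text and vote dataframes.
text_df = pd.DataFrame({
    'case_num': ['15-513', '15-1191', '99-0000'],
    'pet_text': ['a', 'b', 'c'],
})
votes = pd.DataFrame({
    'case_num': ['15-513', '15-1191', '08-678'],
    'final_vote': [0.0, 1.0, 0.0],
})

# how='left' keeps every transcript row; indicator=True adds a _merge
# column recording whether a matching vote row was found.
checked = text_df.merge(votes, on='case_num', how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
print(f'{len(unmatched)} transcript row(s) without a vote record')
print(unmatched['case_num'].tolist())
```

Running the same check on the real dataframes would show how many transcripts the inner merge discards for each justice.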


In [80]:
merged_df_sotomayor.isna().sum()

case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    0
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64

In [81]:
merged_df_sotomayor = merged_df_sotomayor.dropna(subset = ['final_vote'])
print(merged_df_sotomayor.shape)
merged_df_sotomayor.final_vote.value_counts()

(580, 10)


1.0    348
0.0    232
Name: final_vote, dtype: int64

In [82]:
merged_df_sotomayor.to_pickle("df_merged_sotomayor.pkl")

#### Breyer

In [83]:
vote_df_breyer = vote_df[vote_df.justiceName=='SGBreyer']
text_df_breyer = pd.read_pickle("df_breyer.pkl")
merged_df_breyer = text_df_breyer.merge(vote_df_breyer, on='case_num')

print(merged_df_breyer.justiceName.unique())
print(merged_df_breyer.shape)
print(merged_df_breyer.isna().sum())
merged_df_breyer.head()

['SGBreyer']
(794, 10)
case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    1
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,15-513,SULLIVAN,SINGH,well thats not this question i had the same ...,do you have just on the top of your head som...,11/1/2016,0.0,SGBreyer,2.0,0.0
1,15-1191,KNEEDLER,BROOME,theres a lot of complicated things but the q...,on this but i would say at some point the p...,11/9/2016,1.0,SGBreyer,2.0,1.0
2,15-423,STETSON,CARROLL,so how does it work with diversity we have a...,i dont know that thats a different issue i ...,11/2/2016,1.0,SGBreyer,2.0,1.0
3,15-866,BURSCH,JAY,forget the special things i have a picture o...,but the point the question is i think if i ...,10/31/2016,0.0,SGBreyer,1.0,1.0
4,15-497,BAGENSTOS,KATYAL,thats true but you could find i mean i...,i see that i see this is what where where ...,10/31/2016,1.0,SGBreyer,2.0,1.0


In [84]:
merged_df_breyer = merged_df_breyer.dropna(subset = ['final_vote'])
print(merged_df_breyer.shape)
merged_df_breyer.final_vote.value_counts()

(792, 10)


1.0    488
0.0    304
Name: final_vote, dtype: int64

In [85]:
merged_df_breyer.to_pickle("df_merged_breyer.pkl")

#### Kavanaugh

In [86]:
vote_df_kavanaugh = vote_df[vote_df.justiceName=='BMKavanaugh']
text_df_kavanaugh = pd.read_pickle("df_kavanaugh.pkl")
merged_df_kavanaugh = text_df_kavanaugh.merge(vote_df_kavanaugh, on='case_num')

print(merged_df_kavanaugh.justiceName.unique())
print(merged_df_kavanaugh.shape)
print(merged_df_kavanaugh.isna().sum())
merged_df_kavanaugh.head()

['BMKavanaugh']
(94, 10)
case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    0
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,16-1363,TRIPP,WANG,if if if reasonable amount of time jus...,is that different from heritage reporting c...,10/10/2018,1.0,BMKavanaugh,2.0,1.0
1,17-1104,DVORETZKY,GOLDSTEIN,why are too many warnings bad why is that...,so you just changed the answer okay ju...,10/10/2018,0.0,BMKavanaugh,2.0,0.0
2,17-647,BREEMER,SACHS,,,10/3/2018,1.0,BMKavanaugh,2.0,1.0
3,17-5554,BRYN,LIU,but but counsel counsel in curtis johnson ...,but but curtis johnson says substantial deg...,10/9/2018,0.0,BMKavanaugh,2.0,0.0
4,18-481,YOUNG,LOEB,how about the when can when can it be de...,you seem to be making two distinct arguments...,4/22/2019,1.0,BMKavanaugh,2.0,1.0


In [87]:
merged_df_kavanaugh = merged_df_kavanaugh.dropna(subset = ['final_vote'])
print(merged_df_kavanaugh.shape)
merged_df_kavanaugh.final_vote.value_counts()

(92, 10)


1.0    54
0.0    38
Name: final_vote, dtype: int64

In [88]:
merged_df_kavanaugh.to_pickle("df_merged_kavanaugh.pkl")

#### Roberts

In [89]:
vote_df_roberts = vote_df[vote_df.justiceName=='JGRoberts']
text_df_roberts = pd.read_pickle("df_roberts.pkl")
merged_df_roberts = text_df_roberts.merge(vote_df_roberts, on='case_num')

print(merged_df_roberts.justiceName.unique())
print(merged_df_roberts.shape)
print(merged_df_roberts.isna().sum())
merged_df_roberts.head()

['JGRoberts']
(754, 10)
case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    1
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,15-513,SULLIVAN,SINGH,so youre arguing youre arguing the governme...,i think that it honestly asks very little of...,11/1/2016,0.0,JGRoberts,2.0,0.0
1,15-1191,KNEEDLER,BROOME,is that an argument we heard much about in t...,well but i mean that argument see to me that...,11/9/2016,1.0,JGRoberts,2.0,1.0
2,15-423,STETSON,CARROLL,justice breyer thank you counsel goldenbe...,thats not the outset thats a question of exe...,11/2/2016,1.0,JGRoberts,2.0,1.0
3,15-866,BURSCH,JAY,so i guess im not sure about your does that...,what do you do about the camouflage case ...,10/31/2016,0.0,JGRoberts,2.0,0.0
4,15-497,BAGENSTOS,KATYAL,i i understand youd be making two arguments...,thats boilerplate that may or may not be sig...,10/31/2016,1.0,JGRoberts,2.0,1.0


In [90]:
merged_df_roberts = merged_df_roberts.dropna(subset = ['final_vote'])
print(merged_df_roberts.shape)
merged_df_roberts.final_vote.value_counts()

(752, 10)


1.0    488
0.0    264
Name: final_vote, dtype: int64

In [91]:
merged_df_roberts.to_pickle("df_merged_roberts.pkl")

#### Alito

In [92]:
vote_df_alito = vote_df[vote_df.justiceName=='SAAlito']
text_df_alito = pd.read_pickle("df_alito.pkl")
merged_df_alito = text_df_alito.merge(vote_df_alito, on='case_num')

print(merged_df_alito.justiceName.unique())
print(merged_df_alito.shape)
print(merged_df_alito.isna().sum())
merged_df_alito.head()

['SAAlito']
(740, 10)
case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    1
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,15-513,SULLIVAN,SINGH,well you have two arguments one is that dism...,well if we issue an opinion when we issue a...,11/1/2016,0.0,SAAlito,2.0,0.0
1,15-1191,KNEEDLER,BROOME,kneedler can i ask you this question if the...,well that is true but isnt it something else...,11/9/2016,1.0,SAAlito,2.0,1.0
2,15-423,STETSON,CARROLL,i dont i dont quite understand that why wou...,the basis of that what do you think you woul...,11/2/2016,1.0,SAAlito,2.0,1.0
3,15-866,BURSCH,JAY,,as to the as to the surface design i have a...,10/31/2016,0.0,SAAlito,2.0,0.0
4,15-497,BAGENSTOS,KATYAL,what would happen if the claim was that the ...,,10/31/2016,1.0,SAAlito,2.0,1.0


In [93]:
merged_df_alito = merged_df_alito.dropna(subset = ['final_vote'])
print(merged_df_alito.shape)
merged_df_alito.final_vote.value_counts()

(738, 10)


1.0    446
0.0    292
Name: final_vote, dtype: int64

In [94]:
merged_df_alito.to_pickle("df_merged_alito.pkl")

#### Gorsuch

In [95]:
vote_df_gorsuch = vote_df[vote_df.justiceName=='NMGorsuch']
text_df_gorsuch = pd.read_pickle("df_gorsuch.pkl")
merged_df_gorsuch = text_df_gorsuch.merge(vote_df_gorsuch, on='case_num')

print(merged_df_gorsuch.justiceName.unique())
print(merged_df_gorsuch.shape)
print(merged_df_gorsuch.isna().sum())
merged_df_gorsuch.head()

['NMGorsuch']
(156, 10)
case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    0
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,15-1498,KNEEDLER,ROSENKRANZ,,,1/17/2017,0.0,NMGorsuch,2.0,0.0
1,15-1498,KNEEDLER,ROSENKRANZ,kneedler may i may i ask you just a couple...,lets lets say we dont think jordan decided ...,1/17/2017,0.0,NMGorsuch,2.0,0.0
2,15-1204,GERSHENGORN,ARULANANTHAM,,,11/30/2016,1.0,NMGorsuch,2.0,1.0
3,15-1204,STEWART,ARULANANTHAM,,counsel can you help me you know what im...,11/30/2016,1.0,NMGorsuch,2.0,1.0
4,16-5294,BRIGHT,BRASHER,but counsel if we could just follow up on th...,brasher general brasher one piece of ev...,4/24/2017,1.0,NMGorsuch,1.0,0.0
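The preview above shows case_num 15-1498 and 15-1204 each appearing on two rows, so the merged dataframe is not strictly one row per case. A minimal sketch of how such repeats could be listed is given below; the toy dataframe copies only the case_num and pet_name values visible in the preview.

```python
import pandas as pd

# Toy frame built from the values shown in the preview above.
toy = pd.DataFrame({
    'case_num': ['15-1498', '15-1498', '15-1204', '15-1204', '16-5294'],
    'pet_name': ['KNEEDLER', 'KNEEDLER', 'GERSHENGORN', 'STEWART', 'BRIGHT'],
})

# keep=False flags every row belonging to a repeated case_num,
# not just the second and later occurrences.
repeats = toy[toy.duplicated(subset='case_num', keep=False)]
rows_per_case = repeats.groupby('case_num').size()
print(rows_per_case)
```

Whether repeated rows should be kept, combined, or dropped depends on how the next notebook uses the text.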


In [96]:
merged_df_gorsuch = merged_df_gorsuch.dropna(subset = ['final_vote'])
print(merged_df_gorsuch.shape)
merged_df_gorsuch.final_vote.value_counts()

(154, 10)


1.0    92
0.0    62
Name: final_vote, dtype: int64

In [97]:
merged_df_gorsuch.to_pickle("df_merged_gorsuch.pkl")

#### Thomas

In [98]:
vote_df_thomas = vote_df[vote_df.justiceName=='CThomas']
text_df_thomas = pd.read_pickle("df_thomas.pkl")
merged_df_thomas = text_df_thomas.merge(vote_df_thomas, on='case_num')

print(merged_df_thomas.justiceName.unique())
print(merged_df_thomas.shape)
print(merged_df_thomas.isna().sum())
merged_df_thomas.head()

['CThomas']
(798, 10)
case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    1
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,15-513,SULLIVAN,SINGH,,,11/1/2016,0.0,CThomas,2.0,0.0
1,15-1191,KNEEDLER,BROOME,,,11/9/2016,1.0,CThomas,2.0,1.0
2,15-423,STETSON,CARROLL,,,11/2/2016,1.0,CThomas,2.0,1.0
3,15-866,BURSCH,JAY,,,10/31/2016,0.0,CThomas,2.0,0.0
4,15-497,BAGENSTOS,KATYAL,,,10/31/2016,1.0,CThomas,2.0,1.0


In [99]:
merged_df_thomas = merged_df_thomas.dropna(subset = ['final_vote'])
print(merged_df_thomas.shape)
merged_df_thomas.final_vote.value_counts()

(796, 10)


1.0    462
0.0    334
Name: final_vote, dtype: int64

In [100]:
merged_df_thomas.to_pickle("df_merged_thomas.pkl")

#### Kagan

In [101]:
vote_df_kagan = vote_df[vote_df.justiceName=='EKagan']
text_df_kagan = pd.read_pickle("df_kagan.pkl")
merged_df_kagan = text_df_kagan.merge(vote_df_kagan, on='case_num')

print(merged_df_kagan.justiceName.unique())
print(merged_df_kagan.shape)
print(merged_df_kagan.isna().sum())
merged_df_kagan.head()

['EKagan']
(488, 10)
case_num        0
pet_name        0
res_name        0
pet_text        0
res_text        0
dateArgument    0
partyWinning    0
justiceName     0
majority        0
final_vote      2
dtype: int64


Unnamed: 0,case_num,pet_name,res_name,pet_text,res_text,dateArgument,partyWinning,justiceName,majority,final_vote
0,15-513,SULLIVAN,SINGH,i please sorry no please you mentioned...,do you think deference ought to be given to ...,11/1/2016,0.0,EKagan,2.0,0.0
1,15-1191,KNEEDLER,BROOME,but why do we look kneedler to the moment ...,i mean the problem isnt it broome that the ...,11/9/2016,1.0,EKagan,2.0,1.0
2,15-423,STETSON,CARROLL,well the section i mean the question here i...,and you say that for the commercial nexus ri...,11/2/2016,1.0,EKagan,2.0,1.0
3,15-866,BURSCH,JAY,how is your argument different from this tux...,jay see jay can i and isnt it just ...,10/31/2016,0.0,EKagan,2.0,0.0
4,15-497,BAGENSTOS,KATYAL,could could i ask about that bagenstos ...,but the declaration was that the ada had bee...,10/31/2016,1.0,EKagan,2.0,1.0


In [102]:
merged_df_kagan = merged_df_kagan.dropna(subset = ['final_vote'])
print(merged_df_kagan.shape)
merged_df_kagan.final_vote.value_counts()

(486, 10)


1.0    303
0.0    183
Name: final_vote, dtype: int64

In [103]:
merged_df_kagan.to_pickle("df_merged_kagan.pkl")
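The per-justice cells above all repeat the same filter, merge, and dropna steps. As a sketch only, those steps could be wrapped in one helper and applied in a loop. The function name `merge_justice`, the toy data, and the `text_dfs` dictionary are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd

def merge_justice(text_df, vote_df, justice_name):
    """Merge one justice's transcript text with that justice's votes."""
    votes = vote_df[vote_df['justiceName'] == justice_name]
    merged = text_df.merge(votes, on='case_num')
    # Drop rows where final_vote could not be derived.
    return merged.dropna(subset=['final_vote'])

# Hypothetical toy data standing in for vote_df and the text pickles.
vote_df = pd.DataFrame({
    'case_num': ['15-513', '15-513', '15-1191'],
    'justiceName': ['EKagan', 'SAAlito', 'EKagan'],
    'final_vote': [0.0, 0.0, None],
})
text_dfs = {
    'EKagan': pd.DataFrame({'case_num': ['15-513', '15-1191'],
                            'pet_text': ['x', 'y']}),
    'SAAlito': pd.DataFrame({'case_num': ['15-513'], 'pet_text': ['z']}),
}

merged = {name: merge_justice(df, vote_df, name)
          for name, df in text_dfs.items()}
for name, df in merged.items():
    # In the notebook, each frame would be pickled at this point.
    print(name, df.shape)
```

With the real data, `text_dfs` would be filled by reading each df_<justice>.pkl file, and each result would be written out with `to_pickle` as in the cells above.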