# Judicial Voter Data Analysis V3

### Last updated: 5/8/22

Notes on data:
- Only retention races
- Exclude subcircuit races (shouldn't be any subcircuit races in retention races)

Metrics calculated:

A. Percentage of people who voted in at least one judicial race: total votes in the judicial race with the most votes, divided by total ballots cast. This assumes that the number of votes in the highest-vote race is equivalent to the number of people who voted in judicial elections. It's possible that someone skipped the highest-vote race and voted in another, but we're assuming that's a very small portion of voters, if any.

(working note: need to see a ranking of top 5 races with the most votes by ward, and see if it's generally the same across wards...or a distribution)

B. Percentage of people who didn't vote in any judicial race: 100% minus A

C. Percentage of judicial voters out of registered voters.

D. Percentage of registered voters who didn't vote in any judicial races.
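
As a rough illustration of how the four metrics relate, the sketch below computes them from three citywide totals. The function name `judicial_metrics` and its parameter names are made up for this example, and D is taken here as 100% minus C, which is how the definition reads. The input numbers are the 2020 citywide figures calculated later in this notebook.

```python
def judicial_metrics(max_race_votes, ballots_cast, registered_voters):
    """Return metrics A-D as fractions (multiply by 100 for percentages)."""
    a = max_race_votes / ballots_cast        # A: voted in at least one judicial race
    b = 1 - a                                # B: cast a ballot but skipped every judicial race
    c = max_race_votes / registered_voters   # C: judicial voters out of registered voters
    d = 1 - c                                # D: registered voters with no judicial vote (assumed 1 - C)
    return {'A': a, 'B': b, 'C': c, 'D': d}

# 2020 general election, citywide totals from the cells further down
metrics_2020 = judicial_metrics(884434, 1160993, 1584293)
print(metrics_2020)
```

Keeping the three inputs explicit makes it easy to see that A and C share a numerator and differ only in the denominator.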

The Google Sheet is here: https://docs.google.com/spreadsheets/d/1cR_HXbwe4G9WpkGQxl8u21jGGdNDY_1leo2BQ1P53FQ/edit?usp=sharing

In [70]:
#Check my working directory
!pwd

/Users/amy/Code/injustice_watch/analysis


In [1]:
#Import other packages for analysis
import pandas as pd
from openpyxl import Workbook #used to import .xlsx files
from openpyxl import load_workbook #used to import .xlsx files
import numpy as np
#import gspread

### Define Data Cleaning Functions

Finds and tags retention races 

In [6]:
#Returns the df with an added column called 'Retention'
def tag_retention(df):
    def retention(a):
        if 'No' in a:
            return 'Retention'
        elif 'Yes' in a:
            return 'Retention'
        else:
            return 'Not Retention'
    
    df['Retention'] = df['CANDIDATE'].apply(lambda x: retention(x))
    
    return df
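
One thing to watch with the substring test: `'No' in a` is also true for any candidate name that happens to contain "No". The sketch below is a possible stricter variant, assuming retention rows always carry exactly `Yes` or `No` in `CANDIDATE` (as they do in the sample output in this notebook). The function name `tag_retention_exact` and the name "Noel Smith" are made up for illustration.

```python
import numpy as np
import pandas as pd

def tag_retention_exact(df):
    """Variant of tag_retention that matches 'Yes'/'No' exactly and returns a new frame."""
    out = df.copy()  # work on a copy so the caller's frame is left untouched
    is_retention = out['CANDIDATE'].str.strip().isin(['Yes', 'No'])
    out['Retention'] = np.where(is_retention, 'Retention', 'Not Retention')
    return out

# Toy rows; 'Noel Smith' is a hypothetical name that the substring test would mis-tag
toy = pd.DataFrame({'CANDIDATE': ['Yes', 'No', 'Noel Smith']})
tagged = tag_retention_exact(toy)
print(tagged)
```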

In [7]:
#finds and tags subcircuits
def tag_subcircuit(df):
    def subcircuit(a):
        if 'Subcircuit' in a:
            return 'Subcircuit'
        elif 'Sub' in a: #accounts for 2006 notation
            return 'Subcircuit'
        else:
            return 'Not Subcircuit'
   
    df['Subcircuit'] = df['RACE'].apply(lambda x: subcircuit(x))
    
    return df
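
Because "Sub" is a substring of "Subcircuit", the two branches above collapse into a single test. A minimal vectorized sketch follows; the name `tag_subcircuit_vectorized` and the first two race labels are hypothetical, and it assumes `RACE` has no missing values.

```python
import numpy as np
import pandas as pd

def tag_subcircuit_vectorized(df):
    """Tag subcircuit races with one literal substring test on RACE."""
    out = df.copy()
    # 'Sub' covers both the 'Subcircuit' spelling and the shorter 2006 notation
    is_sub = out['RACE'].str.contains('Sub', regex=False)
    out['Subcircuit'] = np.where(is_sub, 'Subcircuit', 'Not Subcircuit')
    return out

toy = pd.DataFrame({'RACE': [
    'Judge, 5th Subcircuit (hypothetical)',
    'Judge, 5th Sub (hypothetical)',
    'Judge, Cook County Circuit (Vacancy of Clay)',
]})
tagged = tag_subcircuit_vectorized(toy)
print(tagged)
```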

In [8]:
#Pass a string pathname
#Returns a pandas df
def excel_to_df(pathname): 
    wb = load_workbook(pathname)
    ws = wb.active
    data = ws.values
    columns = next(data)[0:] #Gets the first line in the file as a header line
    df = pd.DataFrame(data, columns=columns)
    
    return df

## 2020 General - Citywide

In [74]:
#load data into pandas
wb = load_workbook('../Judicial General Data/judicial_general_2020.xlsx')
ws = wb.active
data = ws.values
columns = next(data)[0:] #Gets the first line in the file as a header line
df = pd.DataFrame(data, columns=columns)

In [75]:
#check the total number of rows = 7,505
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7505 entries, 0 to 7504
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   WARD               7505 non-null   float64
 1   REGISTERED VOTERS  7505 non-null   float64
 2   BALLOTS CAST       7505 non-null   float64
 3   RACE               7505 non-null   object 
 4   CANDIDATE          7505 non-null   object 
 5   VOTES              7505 non-null   float64
dtypes: float64(4), object(2)
memory usage: 351.9+ KB


In [76]:
df.head(20)

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,RACE,CANDIDATE,VOTES
0,1.0,38017.0,30731.0,Abbey Fishman Romanek,No,2839.0
1,1.0,38017.0,30731.0,Abbey Fishman Romanek,Yes,17794.0
2,1.0,38017.0,30731.0,Andrea M. Buford,No,2683.0
3,1.0,38017.0,30731.0,Andrea M. Buford,Yes,17950.0
4,1.0,38017.0,30731.0,Anjana Hansen,No,3801.0
5,1.0,38017.0,30731.0,Anjana Hansen,Yes,16603.0
6,1.0,38017.0,30731.0,Ann Collins-Dole,No,2789.0
7,1.0,38017.0,30731.0,Ann Collins-Dole,Yes,17806.0
8,1.0,38017.0,30731.0,Anna Helen Demacopoulos,No,7004.0
9,1.0,38017.0,30731.0,Anna Helen Demacopoulos,Yes,14045.0


In [77]:
#clean data
#remove ballot measures 
#total number of rows = 7,187; 318 ballot measures removed
wb = load_workbook('../Judicial General Data/ballot_measures.xlsx')
ws = wb.active
data = ws.values
columns = next(data)[0:] #Gets the first line in the file as a header line
ballot_measures = pd.DataFrame(data, columns=columns)

clean_df = df[~df.RACE.isin(ballot_measures.RACE)]

clean_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7187 entries, 0 to 7504
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   WARD               7187 non-null   float64
 1   REGISTERED VOTERS  7187 non-null   float64
 2   BALLOTS CAST       7187 non-null   float64
 3   RACE               7187 non-null   object 
 4   CANDIDATE          7187 non-null   object 
 5   VOTES              7187 non-null   float64
dtypes: float64(4), object(2)
memory usage: 393.0+ KB
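
The `~df.RACE.isin(...)` filter acts as an anti-join: it keeps every row whose `RACE` does not appear in the ballot measures list, which is how 7,505 rows drop to 7,187 (318 removed). A small sketch on toy data, where the ballot measure name and its vote count are made up:

```python
import pandas as pd

results = pd.DataFrame({
    'RACE': ['Anjana Hansen', 'Anjana Hansen', 'Hypothetical Ballot Measure'],
    'CANDIDATE': ['No', 'Yes', 'Yes'],
    'VOTES': [3801, 16603, 100],
})
measures = pd.DataFrame({'RACE': ['Hypothetical Ballot Measure']})

# True for rows whose RACE is listed as a ballot measure
is_measure = results['RACE'].isin(measures['RACE'])
kept = results[~is_measure]

# The rows removed plus the rows kept should account for every original row
print(len(results), '-', int(is_measure.sum()), '=', len(kept))
```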


In [78]:
clean_df.tail(20)

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,RACE,CANDIDATE,VOTES
7485,50.0,29522.0,21384.0,Raul Vega,Yes,9975.0
7486,50.0,29522.0,21384.0,Robert D. Kuzas,No,3189.0
7487,50.0,29522.0,21384.0,Robert D. Kuzas,Yes,9131.0
7488,50.0,29522.0,21384.0,Robert E. Gordon,No,2435.0
7489,50.0,29522.0,21384.0,Robert E. Gordon,Yes,10468.0
7490,50.0,29522.0,21384.0,Shelley Lynn Sutker-Dermer,No,2714.0
7491,50.0,29522.0,21384.0,Shelley Lynn Sutker-Dermer,Yes,10317.0
7492,50.0,29522.0,21384.0,Steven G. Watkins,No,2571.0
7493,50.0,29522.0,21384.0,Steven G. Watkins,Yes,9824.0
7494,50.0,29522.0,21384.0,Supreme Court Judge (Vacancy of Freeman),"P. Scott Neville, Jr.",14254.0


In [79]:
#remove non-retention races
#total number of rows = 6,200
clean_df = tag_retention(clean_df)
clean_df = clean_df[clean_df['Retention'] == 'Retention']
clean_df.info()
clean_df.head(20)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6200 entries, 0 to 7504
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   WARD               6200 non-null   float64
 1   REGISTERED VOTERS  6200 non-null   float64
 2   BALLOTS CAST       6200 non-null   float64
 3   RACE               6200 non-null   object 
 4   CANDIDATE          6200 non-null   object 
 5   VOTES              6200 non-null   float64
 6   Retention          6200 non-null   object 
dtypes: float64(4), object(3)
memory usage: 387.5+ KB


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Retention'] = df['CANDIDATE'].apply(lambda x: retention(x))


Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,RACE,CANDIDATE,VOTES,Retention
0,1.0,38017.0,30731.0,Abbey Fishman Romanek,No,2839.0,Retention
1,1.0,38017.0,30731.0,Abbey Fishman Romanek,Yes,17794.0,Retention
2,1.0,38017.0,30731.0,Andrea M. Buford,No,2683.0,Retention
3,1.0,38017.0,30731.0,Andrea M. Buford,Yes,17950.0,Retention
4,1.0,38017.0,30731.0,Anjana Hansen,No,3801.0,Retention
5,1.0,38017.0,30731.0,Anjana Hansen,Yes,16603.0,Retention
6,1.0,38017.0,30731.0,Ann Collins-Dole,No,2789.0,Retention
7,1.0,38017.0,30731.0,Ann Collins-Dole,Yes,17806.0,Retention
8,1.0,38017.0,30731.0,Anna Helen Demacopoulos,No,7004.0,Retention
9,1.0,38017.0,30731.0,Anna Helen Demacopoulos,Yes,14045.0,Retention
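
The warning printed above is pandas' SettingWithCopyWarning. It shows up because `clean_df` was produced by filtering `df`, and `tag_retention` then assigns a new column to it. One common way to avoid it, sketched here on toy data with a made-up ballot measure name, is to take an explicit copy right after filtering:

```python
import pandas as pd

raw = pd.DataFrame({
    'RACE': ['Anjana Hansen', 'Anjana Hansen', 'Hypothetical Ballot Measure'],
    'CANDIDATE': ['No', 'Yes', 'Yes'],
})

# .copy() makes the filtered frame independent of raw, so adding a column is unambiguous
clean = raw[raw['RACE'] != 'Hypothetical Ballot Measure'].copy()
clean['Retention'] = 'Retention'
print(clean)
```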


In [80]:
#export to csv to check in excel
clean_df.to_csv('Test.csv')

In [81]:
#group by ward AND race
#should be 3,100 or half of the number of rows in clean_df
#make sure registered voters, ballots cast are the same and not summed up
grouped = clean_df.groupby(['WARD','RACE'], as_index=False).agg({'WARD': 'first','REGISTERED VOTERS': 'first',
                                                     'BALLOTS CAST': 'first','VOTES': 'sum'}).sort_values('VOTES',ascending=False)
grouped

Unnamed: 0,RACE,WARD,REGISTERED VOTERS,BALLOTS CAST,VOTES
2896,Michael P. Toomin,47.0,40767.0,36503.0,28918.0
2893,Mauricio Araujo,47.0,40767.0,36503.0,28241.0
2872,Jackie Marie Portman-Brown,47.0,40767.0,36503.0,27914.0
2858,Aurelia Marie Pucinski,47.0,40767.0,36503.0,27807.0
2882,Kenneth J. Wadas,47.0,40767.0,36503.0,27427.0
...,...,...,...,...,...
895,John J. Mahoney,15.0,18597.0,10086.0,7087.0
875,Bridget Anne Mitchell,15.0,18597.0,10086.0,7086.0
921,Robert D. Kuzas,15.0,18597.0,10086.0,7070.0
925,Terrence J. McGuire,15.0,18597.0,10086.0,7054.0
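
The grouping relies on two different aggregations: `first` for the ward-level columns (they repeat on every row, so summing would inflate them) and `sum` for `VOTES` (to combine the Yes and No rows). The sketch below shows the same idea with named aggregation on four Ward 1 rows from the data above; the output column names `registered`, `ballots` and `votes` are chosen for this example only.

```python
import pandas as pd

toy = pd.DataFrame({
    'WARD': [1, 1, 1, 1],
    'REGISTERED VOTERS': [38017, 38017, 38017, 38017],
    'BALLOTS CAST': [30731, 30731, 30731, 30731],
    'RACE': ['Anjana Hansen', 'Anjana Hansen', 'Ann Collins-Dole', 'Ann Collins-Dole'],
    'CANDIDATE': ['No', 'Yes', 'No', 'Yes'],
    'VOTES': [3801, 16603, 2789, 17806],
})

per_race = (
    toy.groupby(['WARD', 'RACE'], as_index=False)
       .agg(registered=('REGISTERED VOTERS', 'first'),  # repeated per row, so take one value
            ballots=('BALLOTS CAST', 'first'),
            votes=('VOTES', 'sum'))                      # Yes + No votes for the race
)
print(per_race)
```

With one Yes row and one No row per race, the grouped frame has half as many rows as the input, which is the same check used above (6,200 rows down to 3,100).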


In [82]:
#export to csv to handcheck
grouped.to_csv('grouped.csv')

In [82]:
#group by race only and show the top 20 races with the most number of votes
#Aurelia Marie Pucinski is the race with the most number of votes = 884,434
by_race = clean_df.groupby('RACE').sum().sort_values('VOTES',ascending=False)
by_race.head(20)

Unnamed: 0_level_0,WARD,REGISTERED VOTERS,BALLOTS CAST,VOTES
RACE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Aurelia Marie Pucinski,2550.0,3168586.0,2321986.0,884434.0
Michael P. Toomin,2550.0,3168586.0,2321986.0,871882.0
Mauricio Araujo,2550.0,3168586.0,2321986.0,828503.0
Jackie Marie Portman-Brown,2550.0,3168586.0,2321986.0,825928.0
"James Patrick Flannery, Jr.",2550.0,3168586.0,2321986.0,824548.0
Mary Ellen Coghlan,2550.0,3168586.0,2321986.0,819483.0
Kenneth J. Wadas,2550.0,3168586.0,2321986.0,817770.0
Patricia Manila Martin,2550.0,3168586.0,2321986.0,816891.0
Mary Katherine Rochford,2550.0,3168586.0,2321986.0,816618.0
Shelley Lynn Sutker-Dermer,2550.0,3168586.0,2321986.0,814021.0
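
In the table above only the `VOTES` column is meaningful. `REGISTERED VOTERS` and `BALLOTS CAST` come out at exactly twice the citywide totals (3,168,586 = 2 × 1,584,293) because `.sum()` adds each ward's figure once for the Yes row and once for the No row. A sketch of summing only `VOTES`, using toy data in which the Ward 2 vote counts are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    'WARD': [1, 1, 2, 2],
    'BALLOTS CAST': [30731, 30731, 34147, 34147],
    'RACE': ['Anjana Hansen'] * 4,
    'CANDIDATE': ['No', 'Yes', 'No', 'Yes'],
    'VOTES': [3801, 16603, 10, 20],  # ward 2 vote counts are invented for the example
})

# Summing every numeric column double counts the ward-level ballots figure
summed_all = toy.groupby('RACE')[['BALLOTS CAST', 'VOTES']].sum()

# Selecting VOTES first avoids carrying the inflated columns around
votes_only = toy.groupby('RACE')['VOTES'].sum().sort_values(ascending=False)
print(summed_all)
print(votes_only)
```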


## Citywide Participation - 2020 General

In [83]:
#find total number of ballots cast and registered voters
#first groupby
by_ward = clean_df.groupby('WARD').agg({'REGISTERED VOTERS':'first','BALLOTS CAST':'first'})
by_ward

Unnamed: 0_level_0,REGISTERED VOTERS,BALLOTS CAST
WARD,Unnamed: 1_level_1,Unnamed: 2_level_1
1.0,38017.0,30731.0
2.0,40366.0,34147.0
3.0,38027.0,28573.0
4.0,34870.0,28567.0
5.0,30166.0,24275.0
6.0,31173.0,20236.0
7.0,31134.0,20337.0
8.0,36234.0,24776.0
9.0,34876.0,22021.0
10.0,26971.0,16218.0


In [84]:
#total number of registered voters
by_ward['REGISTERED VOTERS'].sum()

1584293.0

In [85]:
#total number of ballots cast
by_ward['BALLOTS CAST'].sum()

1160993.0

In [86]:
#overall voter turnout
by_ward['BALLOTS CAST'].sum()/by_ward['REGISTERED VOTERS'].sum()

0.7328145740718415

In [87]:
by_race['VOTES'].max()

884434.0

In [88]:
#the race with the most votes is Aurelia Marie Pucinski, with 884,434 votes
#calculate participation rate 
participation = 884434/1160993
participation

0.761790984097234

In [89]:
#calculate judicial turnout rate
judicial_turnout = 884434/1584293
judicial_turnout

0.5582515355429836

## Citywide Participation 2006-2020

In [9]:
judicial_list = ['/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2006.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2008.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2010.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2012.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2014.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2016.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2018.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2020.xlsx']

In [10]:
participation_list = []
jud_turnout_list = []
turnout_list = []
years_list = ['2006','2008','2010','2012','2014','2016','2018','2020']
ballot_measures_df = excel_to_df('/Users/amy/Code/injustice_watch/Judicial General Data/ballot_measures.xlsx')

index = 0

for pathname in judicial_list:
    print(index)
    df = excel_to_df(pathname)
    df = tag_retention(df)
    clean_df = df[df['Retention'] == 'Retention']
    clean_df = clean_df[~clean_df.RACE.isin(ballot_measures_df.RACE)]
    
    #find race w most votes
    by_race = clean_df.groupby('RACE').sum().sort_values('VOTES',ascending=False)
    print(by_race.head(3))
    max_votes = by_race['VOTES'].max()
    print('max votes')
    print(max_votes)
    
    #find denominators
    by_ward = clean_df.groupby('WARD').agg({'REGISTERED VOTERS':'first','BALLOTS CAST':'first'})
    registered = by_ward['REGISTERED VOTERS'].sum()
    ballots = by_ward['BALLOTS CAST'].sum()
    
    #participation
    participation = max_votes/ballots
    participation_list.append(participation)
    
    #judicial_turnout
    judicial_turnout = max_votes/registered
    jud_turnout_list.append(judicial_turnout)
    
    #voter turnout
    turnout = ballots/registered
    turnout_list.append(turnout)
    
    index+=1
    

0
                     WARD  REGISTERED VOTERS  BALLOTS CAST     VOTES
RACE                                                                
Patrick J. Quinn   2550.0          2721494.0     1340444.0  436438.0
Warren D. Wolfson  2550.0          2721494.0     1340444.0  392230.0
Kathy M. Flanagan  2550.0          2721494.0     1340444.0  389768.0
max votes
436438.0
1
                        WARD  REGISTERED VOTERS  BALLOTS CAST     VOTES
RACE                                                                   
Michael J. Gallagher  2550.0          2994584.0     2211996.0  643592.0
Thomas E. Flanagan    2550.0          2994584.0     2211996.0  633698.0
Richard J. Elrod      2550.0          2994584.0     2211996.0  613793.0
max votes
643592.0
2
                        WARD  REGISTERED VOTERS  BALLOTS CAST     VOTES
RACE                                                                   
Charles E. Freeman    2550.0          2669614.0     1411738.0  463055.0
Thomas R. Fitzgerald  2550.0       
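
The per-year loop can also be written with the metric calculation pulled into a helper and `zip` pairing each year with its data, which removes the manual `index` counter. This is only a sketch: `citywide_metrics` is a hypothetical helper name, and the toy frame stands in for a cleaned year of data with invented numbers.

```python
import pandas as pd

def citywide_metrics(clean_df):
    """Compute participation, judicial turnout and voter turnout for one cleaned frame."""
    max_votes = clean_df.groupby('RACE')['VOTES'].sum().max()
    by_ward = clean_df.groupby('WARD').agg({'REGISTERED VOTERS': 'first',
                                            'BALLOTS CAST': 'first'})
    registered = by_ward['REGISTERED VOTERS'].sum()
    ballots = by_ward['BALLOTS CAST'].sum()
    return {'Participation': max_votes / ballots,
            'Judicial Turnout': max_votes / registered,
            'Voter Turnout': ballots / registered}

# Invented data: 2 wards, 2 races, one No row and one Yes row per race
toy = pd.DataFrame({
    'WARD': [1, 1, 1, 1, 2, 2, 2, 2],
    'REGISTERED VOTERS': [100] * 4 + [200] * 4,
    'BALLOTS CAST': [80] * 4 + [120] * 4,
    'RACE': ['Race A', 'Race A', 'Race B', 'Race B'] * 2,
    'CANDIDATE': ['No', 'Yes'] * 4,
    'VOTES': [10, 50, 5, 45, 20, 70, 15, 65],
})

rows = []
for year, frame in zip(['2020'], [toy]):  # in the notebook this would pair years_list with cleaned frames
    rows.append({'Year': year, **citywide_metrics(frame)})
summary = pd.DataFrame(rows)
print(summary)
```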

In [92]:
#check 2020
participation_list

[0.6511842344775314,
 0.581910636366431,
 0.6560069927989471,
 0.6037546045661746,
 0.6530695339900873,
 0.6272166171894047,
 0.7400612459035086,
 0.761790984097234]

In [93]:
jud_turnout_list

[0.3207341261821632,
 0.4298373329985066,
 0.34690783012075904,
 0.45529038655908105,
 0.3187473149861037,
 0.445558789427002,
 0.44898370509121943,
 0.5582515355429836]

In [94]:
turnout_list

[0.49253975941155853,
 0.7386655375170641,
 0.5288172747071299,
 0.7540984087172771,
 0.4880756158362485,
 0.7103746572014907,
 0.6066845245261758,
 0.7328145740718415]

In [95]:
#save into a dictionary and convert to df
dict_to_df = {
    'Year':years_list,
    'Participation':participation_list,
    'Judicial Turnout':jud_turnout_list,
    'Voter Turnout':turnout_list,
}

summary_df = pd.DataFrame(dict_to_df)
summary_df

Unnamed: 0,Year,Participation,Judicial Turnout,Voter Turnout
0,2006,0.651184,0.320734,0.49254
1,2008,0.581911,0.429837,0.738666
2,2010,0.656007,0.346908,0.528817
3,2012,0.603755,0.45529,0.754098
4,2014,0.65307,0.318747,0.488076
5,2016,0.627217,0.445559,0.710375
6,2018,0.740061,0.448984,0.606685
7,2020,0.761791,0.558252,0.732815


## Ward by Ward Participation 2020

To calculate ward-by-ward participation, we need to find the race with the most votes in each ward (which varies from ward to ward), because that is what best measures the percentage of people in that ward who likely participated in a judicial race.

Tutorials used

To find max of each ward w/ transform.max: https://stackoverflow.com/questions/43524549/pandas-groupby-transform-max-with-filter
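
A minimal sketch of the `transform` approach on toy data (race names and vote counts are invented): `transform('max')` returns one value per original row, so the ward maximum can be stored as a new column next to each race.

```python
import pandas as pd

toy = pd.DataFrame({
    'WARD': [1, 1, 2, 2],
    'RACE': ['Race A', 'Race B', 'Race A', 'Race B'],
    'VOTES': [10, 30, 25, 5],
})

# Each row receives the highest VOTES value found in its own ward
toy['max_votes'] = toy.groupby('WARD')['VOTES'].transform('max')
print(toy)
```

Note that the top race differs between the two toy wards, which is the situation this section is designed to handle.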

In [96]:
#start from the dataframe grouped by ward and race
grouped.head(10)

Unnamed: 0,RACE,WARD,REGISTERED VOTERS,BALLOTS CAST,VOTES
2896,Michael P. Toomin,47.0,40767.0,36503.0,28918.0
2893,Mauricio Araujo,47.0,40767.0,36503.0,28241.0
2872,Jackie Marie Portman-Brown,47.0,40767.0,36503.0,27914.0
2858,Aurelia Marie Pucinski,47.0,40767.0,36503.0,27807.0
2882,Kenneth J. Wadas,47.0,40767.0,36503.0,27427.0
2856,Anna Helen Demacopoulos,47.0,40767.0,36503.0,27106.0
2867,Diana Rosario,47.0,40767.0,36503.0,27031.0
2879,John J. Mahoney,47.0,40767.0,36503.0,27003.0
2891,Mary Katherine Rochford,47.0,40767.0,36503.0,26979.0
2899,Patricia Manila Martin,47.0,40767.0,36503.0,26923.0
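
The working note near the top of this notebook asks for a ranking of the top races by ward. One way to get it from a ward-and-race frame like `grouped` is to sort within each ward and keep the first few rows per group. The sketch below uses invented data and keeps the top 2 instead of the top 5 to stay short.

```python
import pandas as pd

toy = pd.DataFrame({
    'WARD': [1, 1, 1, 2, 2, 2],
    'RACE': ['Race A', 'Race B', 'Race C', 'Race A', 'Race B', 'Race C'],
    'VOTES': [30, 20, 10, 5, 25, 15],
})

top_n = (
    toy.sort_values(['WARD', 'VOTES'], ascending=[True, False])
       .groupby('WARD')
       .head(2)  # use head(5) for a top-5 ranking
)

# One ordered list of race names per ward, to compare rankings across wards
ranking = top_n.groupby('WARD')['RACE'].apply(list)
print(ranking)
```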


In [97]:
#find max votes in each ward
grouped['max_votes'] = grouped.groupby(['WARD'])['VOTES'].transform('max')
grouped.head()

Unnamed: 0,RACE,WARD,REGISTERED VOTERS,BALLOTS CAST,VOTES,max_votes
2896,Michael P. Toomin,47.0,40767.0,36503.0,28918.0,28918.0
2893,Mauricio Araujo,47.0,40767.0,36503.0,28241.0,28918.0
2872,Jackie Marie Portman-Brown,47.0,40767.0,36503.0,27914.0,28918.0
2858,Aurelia Marie Pucinski,47.0,40767.0,36503.0,27807.0,28918.0
2882,Kenneth J. Wadas,47.0,40767.0,36503.0,27427.0,28918.0


In [98]:
#See if max_votes is the same in a ward
grouped.loc[grouped['WARD'] == 48.0]

Unnamed: 0,RACE,WARD,REGISTERED VOTERS,BALLOTS CAST,VOTES,max_votes
2958,Michael P. Toomin,48.0,33825.0,29094.0,22296.0,22296.0
2920,Aurelia Marie Pucinski,48.0,33825.0,29094.0,21992.0,22296.0
2955,Mauricio Araujo,48.0,33825.0,29094.0,21560.0,22296.0
2934,Jackie Marie Portman-Brown,48.0,33825.0,29094.0,21388.0,22296.0
2944,Kenneth J. Wadas,48.0,33825.0,29094.0,21059.0,22296.0
...,...,...,...,...,...,...
2932,Edward A. Arce,48.0,33825.0,29094.0,20195.0,22296.0
2967,Robert D. Kuzas,48.0,33825.0,29094.0,20115.0,22296.0
2921,Bridget Anne Mitchell,48.0,33825.0,29094.0,20088.0,22296.0
2937,James Paul Pieczonka,48.0,33825.0,29094.0,20079.0,22296.0
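
Eyeballing one ward works, but the same check can be run for every ward at once by counting distinct `max_votes` values per ward. A sketch using the top two vote totals from Wards 47 and 48 shown above:

```python
import pandas as pd

toy = pd.DataFrame({
    'WARD': [47.0, 47.0, 48.0, 48.0],
    'VOTES': [28918.0, 28241.0, 22296.0, 21992.0],
})
toy['max_votes'] = toy.groupby('WARD')['VOTES'].transform('max')

# Every ward should contain exactly one distinct max_votes value
distinct_per_ward = toy.groupby('WARD')['max_votes'].nunique()
all_consistent = bool((distinct_per_ward == 1).all())
print(all_consistent)
```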


In [99]:
#consolidate by ward
data_by_ward = grouped.groupby(['WARD'], as_index=False).agg({'WARD': 'first','REGISTERED VOTERS': 'first',
                                                     'BALLOTS CAST': 'first','max_votes':'first'})

In [100]:
data_by_ward

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,max_votes
0,1.0,38017.0,30731.0,22563.0
1,2.0,40366.0,34147.0,24038.0
2,3.0,38027.0,28573.0,21626.0
3,4.0,34870.0,28567.0,22556.0
4,5.0,30166.0,24275.0,18787.0
5,6.0,31173.0,20236.0,16085.0
6,7.0,31134.0,20337.0,16121.0
7,8.0,36234.0,24776.0,19573.0
8,9.0,34876.0,22021.0,17335.0
9,10.0,26971.0,16218.0,13048.0


In [101]:
data_by_ward['participation'] = data_by_ward['max_votes']/data_by_ward['BALLOTS CAST']

In [102]:
data_by_ward

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,max_votes,participation
0,1.0,38017.0,30731.0,22563.0,0.73421
1,2.0,40366.0,34147.0,24038.0,0.703956
2,3.0,38027.0,28573.0,21626.0,0.756868
3,4.0,34870.0,28567.0,22556.0,0.789582
4,5.0,30166.0,24275.0,18787.0,0.773924
5,6.0,31173.0,20236.0,16085.0,0.794871
6,7.0,31134.0,20337.0,16121.0,0.792693
7,8.0,36234.0,24776.0,19573.0,0.789998
8,9.0,34876.0,22021.0,17335.0,0.787203
9,10.0,26971.0,16218.0,13048.0,0.804538


In [103]:
data_by_ward['judicial turnout'] = data_by_ward['max_votes']/data_by_ward['REGISTERED VOTERS']
data_by_ward['turnout'] = data_by_ward['BALLOTS CAST']/data_by_ward['REGISTERED VOTERS']
data_by_ward

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,max_votes,participation,judicial turnout,turnout
0,1.0,38017.0,30731.0,22563.0,0.73421,0.593498,0.808349
1,2.0,40366.0,34147.0,24038.0,0.703956,0.595501,0.845935
2,3.0,38027.0,28573.0,21626.0,0.756868,0.568701,0.751387
3,4.0,34870.0,28567.0,22556.0,0.789582,0.64686,0.819243
4,5.0,30166.0,24275.0,18787.0,0.773924,0.622787,0.804714
5,6.0,31173.0,20236.0,16085.0,0.794871,0.515991,0.649152
6,7.0,31134.0,20337.0,16121.0,0.792693,0.517794,0.653209
7,8.0,36234.0,24776.0,19573.0,0.789998,0.540183,0.683778
8,9.0,34876.0,22021.0,17335.0,0.787203,0.497047,0.631408
9,10.0,26971.0,16218.0,13048.0,0.804538,0.483779,0.601313


## Ward by Ward Participation 2006-2020

In [16]:
judicial_list = ['/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2006.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2008.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2010.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2012.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2014.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2016.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2018.xlsx',
                '/Users/amy/Code/injustice_watch/Judicial General Data/judicial_general_2020.xlsx']

In [17]:
all_df_dict = {}
years_list = ['2006','2008','2010','2012','2014','2016','2018','2020']
ballot_measures_df = excel_to_df('/Users/amy/Code/injustice_watch/Judicial General Data/ballot_measures.xlsx')

index = 0

for pathname in judicial_list:
    print(index)
    df = excel_to_df(pathname)
    df = tag_retention(df)
    clean_df = df[df['Retention'] == 'Retention']
    clean_df = clean_df[~clean_df.RACE.isin(ballot_measures_df.RACE)]
    
    #group by race and ward
    grouped = clean_df.groupby(['WARD','RACE'], as_index=False).agg({'WARD': 'first','REGISTERED VOTERS': 'first',
                                                     'BALLOTS CAST': 'first','VOTES': 'sum'}).sort_values('VOTES',ascending=False)
    
    #find max votes in each ward
    grouped['max_votes'] = grouped.groupby(['WARD'])['VOTES'].transform('max')
    
    #consolidate by ward
    data_by_ward = grouped.groupby(['WARD'], as_index=False).agg({'WARD': 'first','REGISTERED VOTERS': 'first',
                                                     'BALLOTS CAST': 'first','max_votes':'first'})
    
    #make calculations
    data_by_ward['participation'] = data_by_ward['max_votes']/data_by_ward['BALLOTS CAST']
    data_by_ward['judicial turnout'] = data_by_ward['max_votes']/data_by_ward['REGISTERED VOTERS']
    data_by_ward['turnout'] = data_by_ward['BALLOTS CAST']/data_by_ward['REGISTERED VOTERS']

    #Add this year's dataframe to the dictionary with the year as the key
    current_year = years_list[index]
    print(current_year)
    all_df_dict[current_year] = data_by_ward
    
    index+=1
    

0
2006
1
2008
2
2010
3
2012
4
2014
5
2016
6
2018
7
2020


In [18]:
#Inspect to make sure it matches with 2020
all_df_dict['2020']

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,max_votes,participation,judicial turnout,turnout
0,1.0,38017.0,30731.0,22563.0,0.73421,0.593498,0.808349
1,2.0,40366.0,34147.0,24038.0,0.703956,0.595501,0.845935
2,3.0,38027.0,28573.0,21626.0,0.756868,0.568701,0.751387
3,4.0,34870.0,28567.0,22556.0,0.789582,0.64686,0.819243
4,5.0,30166.0,24275.0,18787.0,0.773924,0.622787,0.804714
5,6.0,31173.0,20236.0,16085.0,0.794871,0.515991,0.649152
6,7.0,31134.0,20337.0,16121.0,0.792693,0.517794,0.653209
7,8.0,36234.0,24776.0,19573.0,0.789998,0.540183,0.683778
8,9.0,34876.0,22021.0,17335.0,0.787203,0.497047,0.631408
9,10.0,26971.0,16218.0,13048.0,0.804538,0.483779,0.601313


In [106]:
all_df_dict['2006'].head()

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,max_votes,participation,judicial turnout,turnout
0,1.0,27903.0,11412.0,6606.0,0.578864,0.236749,0.408988
1,2.0,31729.0,15202.0,9678.0,0.636627,0.305021,0.47912
2,3.0,22733.0,10294.0,6185.0,0.600835,0.272071,0.452822
3,4.0,27868.0,15682.0,10652.0,0.67925,0.382231,0.562724
4,5.0,28691.0,15011.0,9966.0,0.663913,0.347356,0.523195


In [107]:
all_df_dict['2006'].tail()

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,max_votes,participation,judicial turnout,turnout
45,46.0,28030.0,14932.0,8824.0,0.590946,0.314806,0.532715
46,47.0,27041.0,17224.0,11338.0,0.658268,0.419289,0.636959
47,48.0,27138.0,15499.0,9522.0,0.614362,0.350873,0.571118
48,49.0,20322.0,10701.0,6968.0,0.651154,0.34288,0.526572
49,50.0,23995.0,11959.0,7196.0,0.601723,0.299896,0.498395


In [None]:
#will need to loop through the dictionary and export each dataframe value as a csv, paired with the year key
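
A possible shape for that export loop is sketched below. The dictionary `all_df_toy` stands in for `all_df_dict` (with the Ward 1 participation values from the outputs above), the file name pattern is hypothetical, and a temporary folder is used so the example does not write into the project directory.

```python
import os
import tempfile

import pandas as pd

all_df_toy = {
    '2006': pd.DataFrame({'WARD': [1.0], 'participation': [0.578864]}),
    '2020': pd.DataFrame({'WARD': [1.0], 'participation': [0.73421]}),
}

out_dir = tempfile.mkdtemp()  # stand-in for a real output folder
written = []
for year, frame in all_df_toy.items():
    path = os.path.join(out_dir, f'ward_participation_{year}.csv')  # hypothetical file name
    frame.to_csv(path, index=False)  # index=False drops the unnamed index column
    written.append(path)
print(written)
```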

# 2018 Primary

Run the same citywide and by-ward participation calculations for the primaries, which don't include retention races. Only include contested races (more than one candidate), and exclude subcircuit races so that only countywide races remain.

Democratic primary only?

https://stackoverflow.com/questions/45752601/how-to-do-a-conditional-count-after-groupby-on-a-pandas-dataframe


In [201]:
#load data into pandas
df = pd.read_csv('../Judicial General Data/Dem Primaries/judicial_democratic_primary_2018.csv')

In [202]:
#inspect
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1922 entries, 0 to 1921
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   WARD               1922 non-null   int64 
 1   REGISTERED VOTERS  1922 non-null   int64 
 2   BALLOTS CAST       1922 non-null   int64 
 3   RACE               1922 non-null   object
 4   CANDIDATE          1922 non-null   object
 5   VOTES              1922 non-null   int64 
dtypes: int64(4), object(2)
memory usage: 90.2+ KB


In [203]:
#remove subcircuits
df = tag_subcircuit(df)
df = df.loc[df['Subcircuit'] != 'Subcircuit']

In [204]:
#check if all subcircuits were removed. number of rows should be 1300
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1300 entries, 8 to 1921
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   WARD               1300 non-null   int64 
 1   REGISTERED VOTERS  1300 non-null   int64 
 2   BALLOTS CAST       1300 non-null   int64 
 3   RACE               1300 non-null   object
 4   CANDIDATE          1300 non-null   object
 5   VOTES              1300 non-null   int64 
 6   Subcircuit         1300 non-null   object
dtypes: int64(4), object(3)
memory usage: 81.2+ KB


In [205]:
#Look at just one ward
#Vacancy of Egan and Vacancy of Dunford are the only uncontested races
df[df.WARD == 1]

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,RACE,CANDIDATE,VOTES,Subcircuit
8,1,33159,10237,"Judge, Cook County Circuit (Vac. of Prendergast)",Jack Hagerty,4464,Not Subcircuit
9,1,33159,10237,"Judge, Cook County Circuit (Vac. of Prendergast)",Mable Taylor,2817,Not Subcircuit
10,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Brewer)",John Maher,1009,Not Subcircuit
11,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Brewer)",Kathryn Maloney Vahey,4139,Not Subcircuit
12,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Brewer)",Oran F. Whiting,2569,Not Subcircuit
13,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Clay)",Jonathan Clark Green,1259,Not Subcircuit
14,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Clay)",Kathaleen Theresa Lanahan,3902,Not Subcircuit
15,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Clay)",Lori Ann Roper,1338,Not Subcircuit
16,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Clay)",Michael I. O'Malley,1066,Not Subcircuit
17,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Dooling)",Corri Diane Fetman,2337,Not Subcircuit


In [206]:
#count the number of times a race appears by ward
contested_counts = df.groupby(['WARD','RACE'],as_index=False).count()

In [207]:
contested_counts.head()

Unnamed: 0,WARD,RACE,REGISTERED VOTERS,BALLOTS CAST,CANDIDATE,VOTES,Subcircuit
0,1,"Judge, Cook County Circuit (Vac. of Prendergast)",2,2,2,2,2
1,1,"Judge, Cook County Circuit (Vacancy of Brewer)",3,3,3,3,3
2,1,"Judge, Cook County Circuit (Vacancy of Clay)",4,4,4,4,4
3,1,"Judge, Cook County Circuit (Vacancy of Dooling)",3,3,3,3,3
4,1,"Judge, Cook County Circuit (Vacancy of Dunford)",1,1,1,1,1


In [208]:
#find the rows where CANDIDATE count = 1
uncontested = contested_counts.loc[contested_counts['CANDIDATE'] == 1]
uncontested

Unnamed: 0,WARD,RACE,REGISTERED VOTERS,BALLOTS CAST,CANDIDATE,VOTES,Subcircuit
4,1,"Judge, Cook County Circuit (Vacancy of Dunford)",1,1,1,1,1
5,1,"Judge, Cook County Circuit (Vacancy of Egan)",1,1,1,1,1
14,2,"Judge, Cook County Circuit (Vacancy of Dunford)",1,1,1,1,1
15,2,"Judge, Cook County Circuit (Vacancy of Egan)",1,1,1,1,1
24,3,"Judge, Cook County Circuit (Vacancy of Dunford)",1,1,1,1,1
...,...,...,...,...,...,...,...
475,48,"Judge, Cook County Circuit (Vacancy of Egan)",1,1,1,1,1
484,49,"Judge, Cook County Circuit (Vacancy of Dunford)",1,1,1,1,1
485,49,"Judge, Cook County Circuit (Vacancy of Egan)",1,1,1,1,1
494,50,"Judge, Cook County Circuit (Vacancy of Dunford)",1,1,1,1,1


In [209]:
#remove rows in df based on rows in uncontested
#based on this: https://stackoverflow.com/questions/39880627/in-pandas-how-to-delete-rows-from-a-data-frame-based-on-another-data-frame

df = df[~df.RACE.isin(uncontested.RACE)]
#should equal 1300-100 = 1200
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1200 entries, 8 to 1921
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   WARD               1200 non-null   int64 
 1   REGISTERED VOTERS  1200 non-null   int64 
 2   BALLOTS CAST       1200 non-null   int64 
 3   RACE               1200 non-null   object
 4   CANDIDATE          1200 non-null   object
 5   VOTES              1200 non-null   int64 
 6   Subcircuit         1200 non-null   object
dtypes: int64(4), object(3)
memory usage: 75.0+ KB
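
The row count checks out because the two uncontested races each contribute one row in each of the 50 wards (2 × 50 = 100). The count-then-filter steps above can also be collapsed into a single mask that counts distinct candidates per race. In the sketch below the Ward 1 Prendergast votes come from the output above, while the Ward 2 votes, the Egan votes and the name "Hypothetical Candidate" are made up.

```python
import pandas as pd

prendergast = 'Judge, Cook County Circuit (Vac. of Prendergast)'
egan = 'Judge, Cook County Circuit (Vacancy of Egan)'

toy = pd.DataFrame({
    'WARD': [1, 1, 1, 2, 2, 2],
    'RACE': [prendergast, prendergast, egan] * 2,
    'CANDIDATE': ['Jack Hagerty', 'Mable Taylor', 'Hypothetical Candidate'] * 2,
    'VOTES': [4464, 2817, 100, 50, 40, 30],
})

# Number of distinct candidates in each row's race, aligned to the original rows
n_candidates = toy.groupby('RACE')['CANDIDATE'].transform('nunique')
contested = toy[n_candidates > 1]
print(contested)
```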


In [210]:
df.head()

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,RACE,CANDIDATE,VOTES,Subcircuit
8,1,33159,10237,"Judge, Cook County Circuit (Vac. of Prendergast)",Jack Hagerty,4464,Not Subcircuit
9,1,33159,10237,"Judge, Cook County Circuit (Vac. of Prendergast)",Mable Taylor,2817,Not Subcircuit
10,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Brewer)",John Maher,1009,Not Subcircuit
11,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Brewer)",Kathryn Maloney Vahey,4139,Not Subcircuit
12,1,33159,10237,"Judge, Cook County Circuit (Vacancy of Brewer)",Oran F. Whiting,2569,Not Subcircuit


In [213]:
#find max votes for citywide
by_race = df.groupby('RACE').sum().sort_values('VOTES',ascending=False)
by_race.head(10)

Unnamed: 0_level_0,WARD,REGISTERED VOTERS,BALLOTS CAST,VOTES
RACE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
"Judge, Cook County Circuit (Vacancy of Brewer)",3825,4482597,1357587,358540
"Judge, Cook County Circuit (Vacancy of Clay)",5100,5976796,1810116,356112
"Judge, Cook County Circuit (Vacancy of Flanagan)",5100,5976796,1810116,348345
"Judge, Cook County Circuit (Vacancy of Dooling)",3825,4482597,1357587,345126
"Judge, Cook County Circuit (Vac. of Prendergast)",2550,2988398,905058,343713
"Judge, Cook County Circuit (Vacancy of Hartigan)",2550,2988398,905058,343316
"Judge, Cook County Circuit (Vacancy of Jordan)",3825,4482597,1357587,342348
"Judge, Cook County Circuit (Vacancy of McGinnis)",3825,4482597,1357587,340666


In [212]:
max_votes = by_race['VOTES'].max()
print(max_votes)

358540


# Primaries citywide 2006-2020

Don't need to tag retention races because there are no retention races in the primaries. Need to add back subcircuits. As a check, printed the row count of clean_df to confirm that 2018 has 1,200 rows after cleaning, and confirmed that the max votes for 2018 = 358,540.

In [220]:
judicial_primary_list = ['/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2006.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2008.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2010.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2012.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2014.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2016.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2018.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2020.csv']

In [224]:
primary_participation_list = []
primary_jud_turnout_list = []
primary_turnout_list = []
years_list = ['2006','2008','2010','2012','2014','2016','2018','2020']
ballot_measures_df = excel_to_df('/Users/amy/Code/injustice_watch/Judicial General Data/ballot_measures.xlsx')

index = 0

for pathname in judicial_primary_list:
    print(index)
    
    #load data
    df = pd.read_csv(pathname)
    #df = tag_retention(df) No retention in primaries
    
    #clean data
    #clean_df = df[df['Retention'] == 'Retention']
    #clean_df = clean_df[~clean_df.RACE.isin(ballot_measures_df.RACE)]
    df = tag_subcircuit(df)
    clean_df = df.loc[df['Subcircuit'] != 'Subcircuit']
    
    #take out uncontested races
    contested_counts = clean_df.groupby(['WARD','RACE'],as_index=False).count()
    uncontested = contested_counts.loc[contested_counts['CANDIDATE'] == 1]
    clean_df = clean_df[~clean_df.RACE.isin(uncontested.RACE)]
    
    #print(clean_df.info())
    
    #find race w most votes (numeric_only avoids trying to sum text columns such as CANDIDATE)
    by_race = clean_df.groupby('RACE').sum(numeric_only=True).sort_values('VOTES',ascending=False)
    print(by_race.head(3))
    max_votes = by_race['VOTES'].max()
    print('max votes')
    print(max_votes)
    
    #find denominators
    by_ward = clean_df.groupby('WARD').agg({'REGISTERED VOTERS':'first','BALLOTS CAST':'first'})
    registered = by_ward['REGISTERED VOTERS'].sum()
    ballots = by_ward['BALLOTS CAST'].sum()
    print('ballots')
    print(ballots)
    
    #participation
    participation = max_votes/ballots
    primary_participation_list.append(participation)
    
    #judicial_turnout
    judicial_turnout = max_votes/registered
    primary_jud_turnout_list.append(judicial_turnout)
    
    #voter turnout
    turnout = ballots/registered
    primary_turnout_list.append(turnout)
    
    index+=1

0
                                             WARD  REGISTERED VOTERS  \
RACE                                                                   
Circuit Court Judge (Vac. Burr)              5100            5195488   
APELLATE COURT JUDGE 1ST DIST(VAC.HARTIGAN)  6375            6494360   
APELLATE COURT JUDGE 1ST DIST(VAC. HARTMAN)  6375            6494360   

                                             BALLOTS CAST   VOTES  
RACE                                                               
Circuit Court Judge (Vac. Burr)                   1563564  313549  
APELLATE COURT JUDGE 1ST DIST(VAC.HARTIGAN)       1954455  311283  
APELLATE COURT JUDGE 1ST DIST(VAC. HARTMAN)       1954455  308285  
max votes
313549
ballots
390891
1
                                                    WARD  REGISTERED VOTERS  \
RACE                                                                          
Judge of the Appellate Court (Vacancy of Burke)     3825            3922557   
Judge of the Appellate Cou

In [222]:
primary_participation_list

[0.8021392152799629,
 0.7685866436429432,
 0.8135796498153465,
 0.9019957247838994,
 0.852646708689236,
 0.6955156129476202,
 0.7923028137423237,
 0.8468149655225903]

In [223]:
#save into a dictionary and convert to df
dict_to_df = {
    'Year':years_list,
    'Participation':primary_participation_list,
    'Judicial Turnout':primary_jud_turnout_list,
    'Voter Turnout':primary_turnout_list,
}

primary_summary_df = pd.DataFrame(dict_to_df)
primary_summary_df

Unnamed: 0,Year,Participation,Judicial Turnout,Voter Turnout
0,2006,0.802139,0.241401,0.300947
1,2008,0.768587,0.377179,0.490744
2,2010,0.81358,0.201495,0.247665
3,2012,0.901996,0.187024,0.207345
4,2014,0.852647,0.113589,0.13322
5,2016,0.695516,0.330934,0.475811
6,2018,0.792303,0.239955,0.302857
7,2020,0.846815,0.307945,0.363651
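
Because participation is max_votes/ballots and voter turnout is ballots/registered, their product should equal judicial turnout (max_votes/registered). The sketch below rechecks that identity against the rounded values in the table above. The tolerance of 1e-5 is an assumption chosen to absorb the six-decimal rounding.

```python
import pandas as pd

# Values copied from the summary table above
summary = pd.DataFrame({
    'Year': ['2006', '2008', '2010', '2012', '2014', '2016', '2018', '2020'],
    'Participation': [0.802139, 0.768587, 0.81358, 0.901996,
                      0.852647, 0.695516, 0.792303, 0.846815],
    'Judicial Turnout': [0.241401, 0.377179, 0.201495, 0.187024,
                         0.113589, 0.330934, 0.239955, 0.307945],
    'Voter Turnout': [0.300947, 0.490744, 0.247665, 0.207345,
                      0.13322, 0.475811, 0.302857, 0.363651],
})

# (max_votes / ballots) * (ballots / registered) == max_votes / registered
implied = summary['Participation'] * summary['Voter Turnout']
gap = (implied - summary['Judicial Turnout']).abs()

# Assumed tolerance: the table is rounded to six decimals
consistent = bool((gap < 1e-5).all())
print(consistent)
```

If a year ever fails this check, the three lists were likely built from different runs of the loop.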


## Ward by Ward Democratic Primary 2006-2020

In [20]:
judicial_primary_list = ['/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2006.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2008.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2010.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2012.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2014.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2016.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2018.csv',
                '/Users/amy/Code/injustice_watch/Judicial General Data/Dem Primaries/judicial_democratic_primary_2020.csv']

In [24]:
all_primary_df_dict = {}
years_list = ['2006','2008','2010','2012','2014','2016','2018','2020']
ballot_measures_df = excel_to_df('/Users/amy/Code/injustice_watch/Judicial General Data/ballot_measures.xlsx')

index = 0

for pathname in judicial_primary_list:
    #load data
    df = pd.read_csv(pathname)
    
    #take out subcircuits
    df = tag_subcircuit(df)
    clean_df = df.loc[df['Subcircuit'] != 'Subcircuit']
    
    #take out uncontested races
    contested_counts = clean_df.groupby(['WARD','RACE'],as_index=False).count()
    uncontested = contested_counts.loc[contested_counts['CANDIDATE'] == 1]
    clean_df = clean_df[~clean_df.RACE.isin(uncontested.RACE)]
    
    #group by race and ward
    grouped = clean_df.groupby(['WARD','RACE'], as_index=False).agg({'WARD': 'first','REGISTERED VOTERS': 'first',
                                                     'BALLOTS CAST': 'first','VOTES': 'sum'}).sort_values('VOTES',ascending=False)
    
    #find max votes in each ward (use the string name; passing the builtin max is deprecated in newer pandas)
    grouped['max_votes'] = grouped.groupby(['WARD'])['VOTES'].transform('max')
    
    print(grouped.head())
    
    #consolidate by ward
    data_by_ward = grouped.groupby(['WARD'], as_index=False).agg({'WARD': 'first','REGISTERED VOTERS': 'first',
                                                     'BALLOTS CAST': 'first','max_votes':'first'})
    
    #make calculations
    data_by_ward['participation'] = data_by_ward['max_votes']/data_by_ward['BALLOTS CAST']
    data_by_ward['judicial turnout'] = data_by_ward['max_votes']/data_by_ward['REGISTERED VOTERS']
    data_by_ward['turnout'] = data_by_ward['BALLOTS CAST']/data_by_ward['REGISTERED VOTERS']

    #Add this year's dataframe to the dictionary with the year as the key
    current_year = years_list[index]
    print(current_year)
    all_primary_df_dict[current_year] = data_by_ward
    
    index+=1
    

                                            RACE  WARD  REGISTERED VOTERS  \
127  APELLATE COURT JUDGE 1ST DIST(VAC.HARTIGAN)    19              36708   
128              Circuit Court Judge (Vac. Burr)    19              36708   
129             Circuit Court Judge (Vac. Jaffe)    19              36708   
131          Circuit Court Judge (Vac. Schiller)    19              36708   
126  APELLATE COURT JUDGE 1ST DIST(VAC. HARTMAN)    19              36708   

     BALLOTS CAST  VOTES  max_votes  
127         16615  14330      14330  
128         16615  14167      14330  
129         16615  14164      14330  
131         16615  14164      14330  
126         16615  13845      14330  
2006
                                                  RACE  WARD  \
220    Judge of the Appellate Court (Vacancy of Burke)    21   
78   Judge of the Appellate Court (Vacancy of Campb...     8   
221  Judge of the Appellate Court (Vacancy of Campb...    21   
77     Judge of the Appellate Court (Vacancy of 

In [22]:
#spot-check the 2020 ward-level results
test = all_primary_df_dict['2020']
test.head()

Unnamed: 0,WARD,REGISTERED VOTERS,BALLOTS CAST,max_votes,participation,judicial turnout,turnout
0,1,35398,14211,11343,0.798185,0.320442,0.401463
1,2,38016,14025,10733,0.765276,0.282328,0.368924
2,3,36810,14058,11893,0.845995,0.323092,0.381907
3,4,34091,15502,13488,0.870081,0.395647,0.454724
4,5,29415,13766,12047,0.875127,0.409553,0.467993
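
Metrics B and D from the notes at the top are the complements of participation and judicial turnout. As a rough sketch, they could be derived from the ward-level columns as below, using the five 2020 rows shown above. The column names `no_judicial_vote` and `registered_no_judicial_vote` are hypothetical and do not appear elsewhere in the notebook.

```python
import pandas as pd

# First five 2020 wards, copied from the output above
wards = pd.DataFrame({
    'WARD': [1, 2, 3, 4, 5],
    'REGISTERED VOTERS': [35398, 38016, 36810, 34091, 29415],
    'BALLOTS CAST': [14211, 14025, 14058, 15502, 13766],
    'max_votes': [11343, 10733, 11893, 13488, 12047],
})

# Metric A and metric C, computed the same way as in the loop above
wards['participation'] = wards['max_votes'] / wards['BALLOTS CAST']
wards['judicial turnout'] = wards['max_votes'] / wards['REGISTERED VOTERS']

# Metric B: share of ballots with no judicial vote (100% minus A)
wards['no_judicial_vote'] = 1 - wards['participation']

# Metric D: share of registered voters with no judicial vote (100% minus C)
wards['registered_no_judicial_vote'] = 1 - wards['judicial turnout']

print(wards[['WARD', 'no_judicial_vote', 'registered_no_judicial_vote']])
```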


## Export ward by ward data as CSV

Export the 2006-2020 ward-by-ward data stored in all_df_dict and all_primary_df_dict, one CSV file per year.

In [19]:
for key,value in all_df_dict.items():
    pathname = '/Users/amy/Code/injustice_watch/analysis/general_v3_' + key + '.csv'
    value.to_csv(pathname)

In [23]:
for key,value in all_primary_df_dict.items():
    pathname = '/Users/amy/Code/injustice_watch/analysis/dprimary_v3_' + key + '.csv'
    value.to_csv(pathname)
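
Since `to_csv` is called without `index=False`, each exported file carries the row index as an unnamed first column. The sketch below shows how that column appears when a file is read back, and how `index_col=0` restores it as the index. The two-row frame and the temporary file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up two-row frame standing in for one year's ward-level data
ward_df = pd.DataFrame({'WARD': [1, 2], 'max_votes': [11343, 10733]})

with tempfile.TemporaryDirectory() as tmp_dir:
    pathname = os.path.join(tmp_dir, 'dprimary_v3_example.csv')
    ward_df.to_csv(pathname)  # index is written as an unnamed first column

    # Reading back naively turns the saved index into an 'Unnamed: 0' column
    naive = pd.read_csv(pathname)
    # Treating the first column as the index restores the original columns
    restored = pd.read_csv(pathname, index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```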

In [None]:
#Will do this at the very end