In [47]:
import numpy as np
import pandas as pd
from shapely import centroid
import os
import shutil

# {city}_death_simplices_by_death_in_dim_1.npy data

This is the basic data for the paper's results: triangles corresponding to the resource holes, given by lon/lat pairs for each vertex,
followed by the death filtration value, the z-score of that value, and the ratio of the death to birth filtration value. The triangles are stored as Shapely Polygon objects. The code below extracts the vertex coordinates and the centroid coordinates, and stores everything in a data frame.
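To make the record layout concrete, here is a minimal sketch of one entry built by hand. It assumes each entry is a tuple `(Polygon, death filtration value, z-score, death/birth ratio)` as described above; the coordinates and numbers are made up. It also shows that Shapely closes the exterior ring (four coordinate pairs for a triangle) and that a triangle's centroid is just the mean of its three vertices.

```python
import numpy as np
from shapely import centroid
from shapely.geometry import Polygon

# Hypothetical record with made-up values:
# (triangle, death filtration value, death z-score, death/birth ratio)
record = (Polygon([(-87.70, 41.80), (-87.60, 41.80), (-87.65, 41.90)]), 0.05, 2.1, 3.4)

triangle, dfv, dfz, dbr = record

# Shapely closes the exterior ring, so a triangle has 4 coordinate pairs
# and the last pair repeats the first
coords = list(triangle.exterior.coords)
vertices = np.array(coords[:3])

# For a triangle, the area centroid coincides with the mean of its three vertices
c = centroid(triangle)
print((c.x, c.y), vertices.mean(axis=0))
```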

## Convert death simplex data to Pandas dataframes

In [48]:
def convert_simplicies_to_df(data):
    """Flatten death-simplex records into a DataFrame.

    Each record is (triangle Polygon, death filtration value,
    death filtration z-score, death/birth ratio).
    """
    keys = ['lon1', 'lat1', 'lon2', 'lat2', 'lon3', 'lat3', 'lon_center', 'lat_center',
            'death_filtration_value', 'death_filtration_zscore', 'death_birth_ratio']

    rows = []
    for d in data:
        triangle = d[0]
        coords = triangle.__geo_interface__['coordinates'][0]

        # get coordinates of centroid, probably useful
        lonc, latc = centroid(triangle).__geo_interface__['coordinates']

        # make sure all the polygons are triangles: the exterior ring is closed,
        # so the last coordinate pair repeats the first and can be thrown away
        assert coords[-1] == coords[0]
        assert len(coords) == 4

        (lon1, lat1), (lon2, lat2), (lon3, lat3) = coords[:3]

        # get death filtration values, zscores, and death/birth ratio
        dfv, dfz, dbr = d[1:]

        rows.append([lon1, lat1, lon2, lat2, lon3, lat3, lonc, latc, dfv, dfz, dbr])

    return pd.DataFrame(rows, columns=keys)

In [49]:
paths = ['../Salt Lake City/slc_death_simplices_by_death_in_dim_1.npy',
         '../Chicago/chc_death_simplices_by_death_in_dim_1.npy',
         '../Atlanta/atl_death_simplices_by_death_in_dim_1.npy',
         '../Jacksonville/jax_death_simplices_by_death_in_dim_1.npy',
         '../NYC/nyc_death_simplices_by_death_in_dim_1.npy'
        ]
for path in paths:
    base = os.path.basename(path)              # e.g. 'slc_death_simplices_by_death_in_dim_1.npy'
    folder_name = base.split('_')[0]           # city code, e.g. 'slc'
    file_name = '{}.csv'.format(base.split('.')[0])
    out_dir = os.path.join('../project_data', folder_name)

    try:
        os.mkdir(out_dir)
    except FileExistsError:
        print('{} Folder already exists. Continuing'.format(folder_name))
    # keep a copy of the raw .npy file next to the CSV
    shutil.copy(path, out_dir)

    data = np.load(path, allow_pickle=True)
    df = convert_simplicies_to_df(data)
    df.to_csv(os.path.join(out_dir, file_name))

slc Folder already exists. Continuing
chc Folder already exists. Continuing
atl Folder already exists. Continuing
jax Folder already exists. Continuing
nyc Folder already exists. Continuing
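The loop writes each CSV with `df.to_csv(...)`, which by default also writes the DataFrame index as an unnamed leading column. A minimal sketch of reading one back, using a throwaway temporary file and a made-up one-row frame in place of real output: a plain `read_csv` picks up an extra `Unnamed: 0` column, while `index_col=0` restores the original columns.

```python
import os
import tempfile

import pandas as pd

# Made-up one-row frame standing in for a converted death-simplex table
df = pd.DataFrame({'lon_center': [-87.65], 'lat_center': [41.83]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'chc_death_simplices_by_death_in_dim_1.csv')
    df.to_csv(path)                              # index is written as a leading column
    naive = pd.read_csv(path)                    # index comes back as 'Unnamed: 0'
    restored = pd.read_csv(path, index_col=0)    # leading column used as the index

print(list(naive.columns), list(restored.columns))
```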


# PRESIDENT_precinct_general.csv


This is the raw number of votes by precinct, broken down by party AND voting method (absentee, non-absentee, etc.).
A few things to note:
- Does NOT include total registered voters by precinct
- However, that data can be found elsewhere. For instance, Chicago gives these numbers by precinct [here](https://chicagoelections.gov/elections/results) (though not by voting method). 
- Does NOT include demographic data
- Does NOT include spatial data, so precincts need to be looked up elsewhere to get their location beyond the county they're in
- Precinct naming conventions differ between cities, counties, etc., as illustrated below
- There is no national database of precinct boundaries, but they do seem to be obtainable city by city. For example, the [2012-2022 Chicago precinct boundaries](https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Ward-Precincts-2012-2022-/uvpq-qeeq) are stored as sets of lat/lon pairs for the polygon vertices. I think packages like Shapely should let us easily check which precincts intersect the triangles in the data above
- I've put the Chicago coordinates in the corresponding data folder [here](../project_data/chc/ChicagoPrecincts2012_2022.csv)
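A minimal sketch of the intersection check suggested above, using toy geometry rather than the real boundary file. It assumes the precinct boundaries have already been turned into a dict of `{precinct name: shapely Polygon}` (parsing `ChicagoPrecincts2012_2022.csv` is not shown, since its column layout isn't described here); the helper names `triangles_from_df` and `precincts_hitting_triangles` are hypothetical. The triangles are rebuilt from the `lon1` ... `lat3` columns produced by `convert_simplicies_to_df`.

```python
import pandas as pd
from shapely.geometry import Polygon


def triangles_from_df(df):
    """Rebuild one Shapely triangle per row of a death-simplex DataFrame."""
    return [
        Polygon([(r.lon1, r.lat1), (r.lon2, r.lat2), (r.lon3, r.lat3)])
        for r in df.itertuples(index=False)
    ]


def precincts_hitting_triangles(precincts, triangles):
    """Map each triangle's row position to the names of the precincts it intersects.

    `precincts` is assumed to be a dict {precinct name: shapely Polygon}.
    """
    return {
        i: [name for name, poly in precincts.items() if poly.intersects(tri)]
        for i, tri in enumerate(triangles)
    }


# Toy inputs in planar lon/lat-style units (hypothetical, not real Chicago geometry)
tri_df = pd.DataFrame({'lon1': [0.0], 'lat1': [0.0],
                       'lon2': [1.0], 'lat2': [0.0],
                       'lon3': [0.0], 'lat3': [1.0]})
precincts = {
    'WARD 1 PRECINCT 1': Polygon([(0.2, 0.2), (0.4, 0.2), (0.4, 0.4), (0.2, 0.4)]),
    'WARD 1 PRECINCT 2': Polygon([(2, 2), (3, 2), (3, 3), (2, 3)]),
}
matches = precincts_hitting_triangles(precincts, triangles_from_df(tri_df))
print(matches)  # {0: ['WARD 1 PRECINCT 1']}
```

This brute-force loop tests every precinct against every triangle; with roughly two thousand Chicago precincts that is probably fine, but Shapely's `STRtree` spatial index would be the natural swap if it becomes slow.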
  

In [50]:
data = pd.read_csv('../project_data/PRESIDENT_precinct_general.csv')

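Because each precinct's votes are split by party and by voting method, getting one total per party per precinct means summing over the method. A minimal sketch on a toy frame; the column names `party`, `mode` and `votes`, and the category labels, are assumptions for illustration, so check them against the actual columns of `data` before reusing this.

```python
import pandas as pd

# Toy rows in an assumed long format: one row per precinct x party x voting method.
# Column names and labels here are hypothetical.
rows = pd.DataFrame({
    'precinct': ['WARD 1 PRECINCT 1'] * 4 + ['WARD 1 PRECINCT 2'] * 2,
    'party': ['DEMOCRAT', 'DEMOCRAT', 'REPUBLICAN', 'REPUBLICAN', 'DEMOCRAT', 'REPUBLICAN'],
    'mode': ['ABSENTEE', 'ELECTION DAY', 'ABSENTEE', 'ELECTION DAY',
             'ELECTION DAY', 'ELECTION DAY'],
    'votes': [10, 40, 5, 20, 30, 25],
})

# Sum over voting method and spread parties into columns: one row per precinct
totals = rows.pivot_table(index='precinct', columns='party', values='votes',
                          aggfunc='sum', fill_value=0)
print(totals)
```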


## Selecting precincts for one city, e.g. Chicago

Chicago falls under the Cook County jurisdiction. There are multiple Cook Counties in the US, so we also filter on the state; the Illinois one includes more than just Chicago.

In [56]:
cook_data = data[(data.jurisdiction_name == 'COOK') & (data.state == 'ILLINOIS')] # There are multiple Cook counties, and the one in Illinois includes more than just Chicago

In [57]:
np.unique(cook_data.precinct)

array(['7000001.0', '7000002.0', '7000003.0', ..., 'WARD 50 PRECINCT 38',
       'WARD 50 PRECINCT 39', 'WARD 50 PRECINCT 40'], dtype=object)

Even within Cook County, precincts follow different naming conventions: some are numeric codes such as `7000001.0`, while others look like `WARD 50 PRECINCT 38`.

Chicago's precinct map can be found [here](https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Ward-Precincts-2012-2022-/uvpq-qeeq).

Every Chicago precinct is identified by a ward number and a precinct number; precinct numbers repeat across wards, so only the (ward, precinct) pair is unique.
Selecting names of that form gives 2069 precincts, which is consistent with the count found online for the map
covering 2012-2022. **Note:** The [precinct map for Chicago was redrawn](https://app.chicagoelections.com/Documents/general/Chicago%20Board%20of%20Elections%20Releases%20New%20Ward%20and%20Precinct%20Maps.pdf) ahead of the 2022 General Election, reducing the
number of precincts. This is something we're going to have to be careful with.
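Since only the (ward, precinct) pair is unique, joining these results to a boundary file needs both numbers. A minimal sketch of splitting names of the `WARD 50 PRECINCT 38` form into integer columns with a regular expression; the helper name `split_ward_precinct` and the output column names are hypothetical. Given the 2022 redraw, keys built this way should only be matched against boundaries from the same map vintage (e.g. the 2012-2022 file for elections before 2022).

```python
import pandas as pd


def split_ward_precinct(precincts):
    """Split names like 'WARD 50 PRECINCT 38' into integer ward and precinct columns.

    Names that don't match the pattern (e.g. numeric codes like '7000001.0')
    come back as NaN from str.extract and are dropped.
    """
    parts = precincts.str.extract(r'^WARD (\d+) PRECINCT (\d+)$')
    parts.columns = ['ward', 'precinct_num']
    return parts.dropna().astype(int)


names = pd.Series(['7000001.0', 'WARD 50 PRECINCT 38', 'WARD 1 PRECINCT 38'])
keys = split_ward_precinct(names)
print(keys)
```

Both Chicago rows share precinct number 38 but differ in ward, which is why the pair rather than the precinct number alone has to serve as the join key.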



In [55]:
# Chicago precincts are named like 'WARD 50 PRECINCT 38', while other Cook County
# precincts use numeric codes, so keep only names containing 'WARD'
is_chicago = cook_data.precinct.str.contains('WARD', regex=False, na=False)
chicago_data = cook_data[is_chicago]
print(chicago_data.precinct.nunique())

2069
