<a href="https://colab.research.google.com/github/lieuzhenghong/districtr-eda/blob/master/districtr_convert_json_to_partition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Converting Districtr assignments (JSON) to GerryChain partitions

## Motivation

The goal is to run the same kind of graph-based analysis on plans drawn in Districtr that we can already run with GerryChain.

## MVP

A Python script that takes a shapefile and district assignments (in JSON) and returns several metrics, such as contiguity and the number of cut edges.

## Rough steps

For each state:
1. Import the shapefile
2. Using the shapefile, create the dual graph of a state

For each proposed districting plan:
3. Import the district assignments JSON
4. Using the district assignments JSON, form the graph partition (possibly with
   GerryChain, NetworkX, or something else)
5. Using the graph partition, answer queries like the number of cut edges and
   contiguity.
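
To make the two target metrics concrete, here is a minimal dependency-free sketch on a toy 2x3 grid. The graph, the two example plans, and the helper names `cut_edges` and `is_contiguous` are all hypothetical and written only for illustration; this is not GerryChain's implementation. A cut edge is an edge whose endpoints sit in different districts, and a plan is contiguous when every district forms a connected subgraph.

```python
from collections import defaultdict, deque

# Toy dual graph: a 2x3 grid of units numbered 0..5 (hypothetical data).
# 0 - 1 - 2
# |   |   |
# 3 - 4 - 5
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]


def cut_edges(edges, assignment):
    """Return the edges whose endpoints lie in different districts."""
    return {(u, v) for u, v in edges if assignment[u] != assignment[v]}


def is_contiguous(edges, assignment):
    """Return True if every district induces a connected subgraph."""
    # Keep only edges that stay inside a single district.
    neighbours = defaultdict(set)
    for u, v in edges:
        if assignment[u] == assignment[v]:
            neighbours[u].add(v)
            neighbours[v].add(u)

    # Collect the members of each district.
    members = defaultdict(set)
    for node, district in assignment.items():
        members[district].add(node)

    # Breadth-first search from one member must reach all the others.
    for nodes in members.values():
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in neighbours[queue.popleft()] - seen:
                seen.add(nxt)
                queue.append(nxt)
        if seen != nodes:
            return False
    return True


# Plan 1: two connected districts.
compact = {0: "A", 1: "A", 3: "A", 2: "B", 4: "B", 5: "B"}
# Plan 2: district "A" is split into two islands (units 0 and 2).
islands = {0: "A", 2: "A", 1: "B", 3: "B", 4: "B", 5: "B"}

print(len(cut_edges(edges, compact)), is_contiguous(edges, compact))
print(len(cut_edges(edges, islands)), is_contiguous(edges, islands))
```

The first plan should report three cut edges and pass the contiguity check; the second fails it because units 0 and 2 share a district but no edge.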

## First attempt at the problem


In [None]:
!pip install geopandas
!pip install networkx
!pip install gerrychain



In [None]:
import sys
import json

In [None]:
import geopandas as gpd

Use geopandas to read in the shapefile

In [None]:
STATE_SHAPEFILE_PATH = "./sample_data/IA_counties.shp"
state_gdf = gpd.read_file(STATE_SHAPEFILE_PATH)

In [None]:
print(state_gdf)

In [None]:
import networkx as nx

In [None]:
import matplotlib.pyplot as plt

In [None]:
f, ax = plt.subplots(figsize=(10, 10))
state_gdf.plot(ax=ax)
plt.show()

In [None]:
import gerrychain

We first import the Districtr district assignments.

In [None]:
JSON_PATH = './sample_data/incomplete-islands.json'
with open(f'{JSON_PATH}', 'r') as f:
  json_assignment = json.load(f)

In [None]:
assignment = {int(k):v for k,v in json_assignment['assignment'].items()}
print(len(assignment))
print(assignment)

In [None]:
state_gdf['assignment'] = state_gdf.GEOID10.map(dict(assignment))
state_gdf['assignment'] = state_gdf['assignment'].fillna(-1)
print(state_gdf)
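
The mapping step above depends on the assignment keys having the same type as the values in the ID column. JSON object keys are always strings, so if the keys are converted to `int` while the column holds strings (or the reverse), `.map` finds no matches and every unit ends up as `-1`. The sketch below reproduces the same logic with plain dictionaries. The JSON snippet and the unit IDs are made up for illustration, and the assumption that the shapefile stores integer IDs is exactly the thing to check against the real data.

```python
import json

# Hypothetical Districtr export, trimmed to the one field used here.
# The two keys are illustrative unit IDs, not taken from the sample data.
raw = '{"assignment": {"19001": 0, "19003": 1}}'
json_assignment = json.loads(raw)

# JSON object keys are always strings, so convert them to whatever type
# the shapefile's ID column uses (assumed here to be int).
assignment = {int(k): v for k, v in json_assignment["assignment"].items()}

# IDs as they might appear in the shapefile; 19005 has no district yet.
unit_ids = [19001, 19003, 19005]

# Same effect as .map(assignment) followed by .fillna(-1).
UNASSIGNED = -1
labelled = {uid: assignment.get(uid, UNASSIGNED) for uid in unit_ids}

# Keys in the JSON that match no unit usually signal a type or ID mismatch.
unmatched = set(assignment) - set(unit_ids)

print(labelled)
print(unmatched)
```

If `unmatched` is non-empty, or every unit comes out unassigned, the key conversion is the first thing to inspect.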

We have the Districtr assignments and the GeoDataFrame. We now create a Graph using the GeoDataFrame, and assign each node to a district.

In [None]:
# There's a bug with Gerrychain here. Issue 328.
state_graph = gerrychain.Graph.from_geodataframe(state_gdf) # will worry about the CRS later.


In [None]:
# Workaround for issue 328: add columns manually
state_graph.add_data(state_gdf, columns=['GEOID10', 'assignment'])

In [None]:
## This bit visualises the dual graph. GerryChain has an inbuilt plot function,
## but I wrote my own in NetworkX

def visualise_partition(state_graph):
  node_assignments = (nx.get_node_attributes(state_graph, 'assignment'))
  districts = set(node_assignments.values())
  print(districts)

  # this is kind of slow because it loops through all the assignments many times
  # but we'll worry about this later
  district_assignments = [
                          [k for k, v in node_assignments.items() if v==district]
                          for district in districts
                        ]

  print(district_assignments)

  pos = nx.spectral_layout(state_graph)

  import pylab
  NUM_COLORS = len(districts)

  cm = pylab.get_cmap('gist_rainbow')
  cmap = [[cm(1.*i/NUM_COLORS)] for i in range(0, NUM_COLORS)]

  print(cmap)

  for idx, assignment in enumerate(district_assignments):
    nx.draw_networkx_nodes(state_graph, pos, 
                          nodelist=assignment,
                          node_color=cmap[idx],
                          node_size=100
                          )

  nx.draw_networkx_edges(state_graph, pos, width=1.0, alpha=0.8)
  nx.draw_networkx_labels(state_graph, pos, font_size=10)

  plt.axis('off')
  plt.show()

In [None]:
visualise_partition(state_graph)
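
The grouping step inside `visualise_partition` rescans every node once per district. A possible single-pass alternative is sketched below; `group_by_district` is a hypothetical helper name, and the toy input only mimics the shape of the dictionary returned by `nx.get_node_attributes(state_graph, 'assignment')`.

```python
from collections import defaultdict


def group_by_district(node_assignments):
    """Group node IDs by district in one pass over the assignments."""
    groups = defaultdict(list)
    for node, district in node_assignments.items():
        groups[district].append(node)
    return dict(groups)


# Toy input: node ID -> district label, with -1.0 marking unassigned nodes.
node_assignments = {0: 1.0, 1: -1.0, 2: 1.0, 3: 0.0}
groups = group_by_district(node_assignments)
print(groups)
```

The values of `groups` can be passed as the `nodelist` arguments in the drawing loop, and its keys replace the separate `districts` set.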

## Final cleaned up version

GerryChain has a `from_districtr_file` function that builds the partition in a single line of code. This simplifies the problem greatly.

It also has a built-in partition visualisation function, so I didn't need to write the `visualise_partition` function.

In [None]:
import sys
import json
import geopandas as gpd
import gerrychain
import networkx as nx
import matplotlib.pyplot as plt
STATE_SHAPEFILE_PATH = "./sample_data/IA_counties.shp"
JSON_PATH = './sample_data/incomplete-contiguous.json'

# Possible problems with the CRS here.
state_graph = gerrychain.Graph.from_file(STATE_SHAPEFILE_PATH)
state_graph.to_json('./sample_data/output_data/iowa_dual_graph')

# Form the partition with the JSON path
partition = gerrychain.Partition.from_districtr_file(state_graph, JSON_PATH, 
                                                     updaters=None)

# Visualise the partition
#state_gdf = gpd.read_file(STATE_SHAPEFILE_PATH)
#partition.plot(geometries=state_gdf)

# Now check for cut edges and for contiguity
print(partition['cut_edges'])
print(gerrychain.constraints.contiguity.contiguous(partition))

{(44, 66), (68, 71), (24, 92), (33, 45), (24, 59), (85, 92), (4, 27), (36, 48), (2, 12), (44, 48), (75, 88), (29, 80), (8, 24), (53, 59), (14, 18), (13, 69), (28, 35), (9, 87), (46, 97), (62, 74), (88, 98), (75, 82), (76, 92), (53, 89), (7, 58), (67, 97), (16, 92), (53, 62), (46, 58), (35, 93), (25, 34), (4, 62), (75, 81), (13, 56), (12, 19), (46, 77), (19, 56), (42, 90)}
True





In [None]:
STATE_SHAPEFILE_PATH = "./sample_data/tx_shp/TX_vtds.shp"

state_gdf = gpd.read_file(STATE_SHAPEFILE_PATH)
print(state_gdf)
state_gdf['geometry'] = state_gdf.buffer(0)
print(state_gdf)

state_graph = gerrychain.Graph.from_geodataframe(state_gdf)
#state_graph = gerrychain.Graph.from_file(STATE_SHAPEFILE_PATH)
state_graph.to_json('./sample_data/output_data/texas_dual_graph')

     CNTYVTD   VTD  ...  PERIM                                           geometry
0      10001  0001  ...     15  POLYGON ((1413960.808 1073012.816, 1413971.571...
1      10002  0002  ...     95  POLYGON ((1420165.429 1066385.798, 1420251.968...
2      10003  0003  ...     55  POLYGON ((1416275.023 1072178.732, 1416410.201...
3      10004  0004  ...     92  POLYGON ((1435604.819 1074650.256, 1435674.876...
4      10005  0005  ...     87  POLYGON ((1436888.342 1072498.734, 1436911.364...
...      ...   ...  ...    ...                                                ...
8936  990412  0412  ...     18  POLYGON ((1199509.662 997181.911, 1199489.562 ...
8937  990413  0413  ...    145  POLYGON ((1208431.172 1051581.578, 1208442.156...
8938  990414  0414  ...     64  POLYGON ((1224967.807 1051996.567, 1225381.276...
8939  990415  0415  ...     54  POLYGON ((1208533.058 1050659.635, 1208533.028...
8940  990416  0416  ...     59  POLYGON ((1201864.290 1014975.205, 1201927.399...

[8941 rows x 38 columns]

In [None]:
gerrychain.Graph.from_json('./sample_data/output_data/texas_dual_graph')
gerrychain.Graph.from_json('./sample_data/output_data/iowa_dual_graph')

<Graph [99 nodes, 222 edges]>