# Verification Example Notebook

This notebook shows how to use the verification script through a few examples in order of increasing difficulty. Before you start, I highly suggest at least skimming [README.md](https://github.com/OpenPrecincts/verification/blob/master/README.md).

In [1]:
import geopandas as gpd
import verify
import matplotlib.pyplot as plt

## Example #1 - The best case scenario

In [2]:
gdf = gpd.read_file('example-election-shapefiles/open-precincts-md-2016')
print(gdf.plot())
gdf.head(2)

AxesSubplot(0.125,0.222475;0.775x0.545051)


| Unnamed: 0 | JURIS | NAME | NUMBER | preid | G16RPRS | G16DPRS | G16PRELJoh | G16PREGSte | G16PREOth | G16USSRSze | ... | G16H07RVau | G16H07DCum | G16H07GHoe | G16H07Oth | G16H08RCox | G16H08DRas | G16H08LWun | G16H08GWal | G16H08Oth | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ALLE | ALLEGANY PRECINCT 01-000 | 01-000 | ALLE-01-000 | 420 | 63 | 7 | 2 | 8 | 367 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | POLYGON ((279387.444 229180.231, 279432.538 22... |
| 1 | ALLE | ALLEGANY PRECINCT 02-000 | 02-000 | ALLE-02-000 | 457 | 78 | 18 | 9 | 2 | 400 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | POLYGON ((262891.921 216881.387, 263013.601 21... |


Great - our election shapefile looks good because it has:
* election results at a precinct level with vote counts 
    * for Clinton (G16DPRS)
    * and Trump (G16RPRS)
* AND geometries for each precinct (geometry)
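
The checklist above can be turned into a quick pre-flight check. This is only a sketch: `missing_columns` is a hypothetical helper (it is not part of `verify`), and the column names are the ones from this Maryland file, so other shapefiles will need different names.

```python
def missing_columns(columns, required):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Column names taken from the Maryland example; swap in your own.
required = ["G16DPRS", "G16RPRS", "geometry"]

# In the notebook you would pass gdf.columns; a plain list stands in here.
example_columns = ["JURIS", "NAME", "G16RPRS", "G16DPRS", "geometry"]
print(missing_columns(example_columns, required))
```

An empty list means every required column is present. In the notebook, `missing_columns(gdf.columns, required)` runs the same check against the loaded GeoDataFrame.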

We need everything in the bullets above in order to use the verification script. Next we will apply `verify.verify_state`. Its docstring is as follows:

```
returns a complete (StateReport) object and a ((CountyReport) list) for the state.

:state_prec_gdf: (GeoDataFrame) containing precinct geometries and election results
:state_abbreviation: (str) e.g. 'MA' for Massachusetts
:source: (str) person or organization that made the 'state_prec_gdf' e.g 'VEST'
:year: (str) 'YYYY' indicating the year the election took place e.g. '2016'
:d_col: (str) denotes the column for Hillary Clinton vote counts in each precinct
:r_col: (str) denotes the column for Donald Trump vote counts in each precinct
:path: (str) filepath to which the report should be saved (if None it won't be saved)

d_col, r_col are optional - if they are not provided, 'get_party_cols' will be used
to guess based on comparing each column in state_prec_gdf to the expected results.
```

Pro tip: to view a docstring in a Jupyter notebook, press `shift-tab` after the name of the function whose docstring you want to see.

In [3]:
state_report, county_report_lst = verify.verify_state(gdf, 'MD', 'OP', '2016')

d_col :  G16DPRS
r_col :  G16RPRS



It's normal for the cell above this one to take a while: usually a few minutes, but even hours in extreme cases. The run time depends on the complexity of the state shapefile.

Now that it's finished, let's inspect the reports it returned.

In [None]:
vars(state_report)

In [None]:
vars(county_report_lst[0])

In [None]:
len(county_report_lst)
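
Since `vars()` returns each report's attributes as a dict, one way to compare many county reports at once is to collect those dicts into rows. The sketch below uses a hypothetical stand-in class with made-up attributes, because the real report classes define their own; `reports_to_rows` is also a hypothetical helper, not part of `verify`.

```python
class ExampleCountyReport:
    """Hypothetical stand-in; the real report class has its own attributes."""

    def __init__(self, name, n_precincts):
        self.name = name
        self.n_precincts = n_precincts


def reports_to_rows(reports):
    """Turn a list of report objects into a list of attribute dicts."""
    return [dict(vars(report)) for report in reports]


# Made-up values, only to show the shape of the result.
example_reports = [ExampleCountyReport("County A", 3), ExampleCountyReport("County B", 5)]
rows = reports_to_rows(example_reports)
print(rows[0])
```

In the notebook, something like `pd.DataFrame(reports_to_rows(county_report_lst))` (with `import pandas as pd`) should give one row per county, assuming the report attributes are simple values.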

Great - now let's use these report objects to render a markdown file. You can also do this with `verify.verify_state` by providing the optional argument `path`.

In [None]:
report_file_path = 'open-precincts-maryland-2016'
verify.make_report(report_file_path, state_report, county_report_lst)

[Maryland's Report](https://github.com/OpenPrecincts/verification/blob/master/reports/mggg-vermont-2016.md)

## Example #2 - Manual GEOID Assignment

If the [GEOID column](https://github.com/OpenPrecincts/verification#geoid-county-assignment-for-each-precinct) is missing, the script will attempt to create it using the [MAUP package](https://github.com/mggg/maup#assigning-precincts-to-districts) to assign each precinct to the county that contains it. This election shapefile runs into trouble with MAUP:

In [None]:
gdf = gpd.read_file('example-election-shapefiles/vest-nh-2016')
print(gdf.plot())

In [None]:
gdf.head(2)

In [None]:
state_report, county_report_lst = verify.verify_state(gdf, 'NH', 'VEST', '2016')

This assertion error is telling us that we are missing a GEOID column and that the script was unable to assign it automatically. Luckily, this NH GeoDataFrame already has the two constituents of a GEOID:
* STATEFP
* COUNTYFP

So we can create a GEOID column manually like so:

In [None]:
gdf['GEOID'] = gdf['STATEFP'].map(str) + gdf['COUNTYFP'].map(str)
gdf.GEOID.head(5)
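
Plain string concatenation works when both columns are already zero-padded strings. A county GEOID is five characters: a two-digit state FIPS code followed by a three-digit county FIPS code. If a file stores those codes as integers, the leading zeros are lost. The sketch below shows one way to pad them back; `build_county_geoid` is a hypothetical helper, not part of `verify`.

```python
def build_county_geoid(state_fips, county_fips):
    """Combine state and county FIPS codes into a 5-character county GEOID."""
    state = str(int(state_fips)).zfill(2)    # e.g. 6 -> "06"
    county = str(int(county_fips)).zfill(3)  # e.g. 1 -> "001"
    return state + county


print(build_county_geoid(33, 1))        # 33001
print(build_county_geoid("06", "037"))  # 06037
```

Applied to a GeoDataFrame, this could look like `gdf['GEOID'] = [build_county_geoid(s, c) for s, c in zip(gdf['STATEFP'], gdf['COUNTYFP'])]`.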

In [None]:
report_file_path = 'vest-new-hampshire-2016'
state_report, county_report_lst = verify.verify_state(gdf, 'NH', 'VEST', '2016', path=report_file_path)

[New Hampshire's Report](https://github.com/OpenPrecincts/verification/blob/master/reports/vest-new-hampshire-2016.md)

In a less trivial case, you may have the county names but not their FIPS codes. Let's consider VEST's Washington 2016 shapefile:

In [None]:
gdf = gpd.read_file('example-election-shapefiles/vest-wa-2016')
print(gdf.plot())
gdf.head(2)

In [None]:
from reference_data import state_fip_to_geoid_to_county_name
washington_state_fips_code = 53
geoid_to_county_name = state_fip_to_geoid_to_county_name[washington_state_fips_code]
geoid_to_county_name

In [None]:
gdf['GEOID'] = gdf['COUNTY'].apply(lambda x: geoid_to_county_name[x + " County"])
print(gdf.GEOID.unique())
gdf.head(2)

Now Washington has a GEOID column and can be run through the verification script.
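
A name-based lookup like the one above fails with a `KeyError` on the first county name that does not match exactly, so it can help to list the mismatches first. The sketch below assumes a dict keyed by GEOID with county names as values and inverts it; if your reference dict is already keyed by county name, skip the inversion. Both helpers and the two-entry sample dict are illustrative and not part of `verify` or `reference_data`.

```python
def invert_mapping(geoid_to_name):
    """Build a county-name -> GEOID lookup from a GEOID -> county-name dict."""
    name_to_geoid = {}
    for geoid, name in geoid_to_name.items():
        if name in name_to_geoid:
            raise ValueError(f"duplicate county name: {name}")
        name_to_geoid[name] = geoid
    return name_to_geoid


def find_unmatched(county_names, name_to_geoid, suffix=" County"):
    """Return the sorted county names that have no entry in the lookup."""
    return sorted({n for n in county_names if n + suffix not in name_to_geoid})


# Two-entry sample for illustration; the real dict covers every county.
sample = {"53033": "King County", "53053": "Pierce County"}
lookup = invert_mapping(sample)
print(lookup["King County"])                              # 53033
print(find_unmatched(["King", "Pierce", "Kng"], lookup))  # ['Kng']
```

Running `find_unmatched(gdf['COUNTY'], lookup)` before assigning GEOIDs would surface misspelled or differently formatted names up front.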

## Example #3 - Manual Candidate Column Selection
The script needs to know which column contains votes for Clinton and which column contains votes for Trump. They can be manually entered as arguments:

* `d_col` denotes the column for Hillary Clinton vote counts in each precinct
* `r_col` denotes the column for Donald Trump vote counts in each precinct.

Without those arguments, the script will guess based on the expected number of votes for each candidate.
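
To make the guessing idea concrete, here is a minimal sketch of picking the column whose statewide total is closest to a candidate's expected vote count. It is not the script's actual `get_party_cols` implementation, only an illustration of the comparison described above; `guess_column` and the sample numbers are made up.

```python
def guess_column(column_totals, expected_total):
    """Return the column whose total is closest to the expected vote count."""
    return min(column_totals, key=lambda col: abs(column_totals[col] - expected_total))


# Made-up statewide totals per column, only to demonstrate the idea.
column_totals = {"COL_A": 1000, "COL_B": 640, "COL_C": 25}

print(guess_column(column_totals, expected_total=650))  # COL_B
print(guess_column(column_totals, expected_total=990))  # COL_A
```

A guess like this can go wrong when another column happens to have a similar total, which is why `d_col` and `r_col` can be passed explicitly.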

In [None]:
gdf = gpd.read_file('example-election-shapefiles/mggg-vt-2016')
print(gdf.plot())
gdf.head(2)

In [None]:
report_file_path = 'mggg-vermont-2016'
state_report, county_report_lst = verify.verify_state(gdf, 'VT', 'MGGG', '2016', path=report_file_path)

None of the guessed columns look right, so we will have to pass the correct columns as arguments.

In [None]:
gdf.columns

In [None]:
d_col = 'PRES16D'
r_col = 'PRES16R'
state_report, county_report_lst = verify.verify_state(gdf, 'VT', 'MGGG', '2016', d_col=d_col, r_col=r_col, path=report_file_path)

[Vermont Report](https://github.com/OpenPrecincts/verification/blob/master/reports/mggg-vermont-2016.md)

That's it! In some cases you may need to combine the methods used in examples #2 and #3, but hopefully most states will work like example #1. Happy verifying :)