Merged
21 changes: 14 additions & 7 deletions README.md
@@ -2,30 +2,36 @@


# Overview
This repository hopes to provide reliable tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves.
This repository provides tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves.
* Consolidation: take as input election results files from a wide variety of sources and load the data into a relational database
* Export: create tab-separated flat export files of results sets rolled up to any desired intermediate geography (e.g., by county, or by congressional district)
* Analysis: provide a variety of analysis tools
* Visualization: provide a variety of visualization tools.
* Export: create consistent-format export files of results sets rolled up to any desired intermediate geography
* tabular (tab-separated text)
* xml (following NIST Election Results Reporting Common Data Format V2)
* json (following NIST Election Results Reporting Common Data Format V2)
* Analysis:
* Curates one-county outliers of interest
* Calculates difference-in-difference for results available by vote type
* Visualization:
* Scatter plots
* Bar charts

# Target Audience
This system is intended to be of use to candidates and campaigns, election officials, students of politics and elections, and anyone else who is interested in assembling and understanding election results.

# How to Contribute Code
Please contribute code that works in python 3.7, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.
Please contribute code that works in python 3.9, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.

# How to Help in Other Ways
If you have skills to contribute to building the system, we can definitely use your help:
* Creating visualizations
* Importing and exporting data via xml feeds
* Preparing for intake of specific states' results files
* Managing collection of data files in real time
* Writing documentation
* Merging other data sets of interest (e.g., demographics)
* Building our open source community
* What else? Let us know!

If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- we would love to talk with you about what you want from this system.
If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- let us know what you want from this system.

If you are interested in contributing, or just staying updated on the progress of this project, please [contact Stephanie Singer](http://symmetrysinger.com/index.php?id=contact).

@@ -45,6 +51,7 @@ Detailed instructions can be found [here](docs/User_Guide.md).
Funding provided October 2019 - September 2021 by the National Science Foundation
* Award #1936809, "EAGER: Data Science for Election Verification"
* Award #2027089, "RAPID: Election Result Anomaly Detection for 2020"
Data collection and consolidation for the 2020 US General Election funded in part by the Verified Voting Foundation.

# License
See [LICENSE.md](./LICENSE.md)
24 changes: 16 additions & 8 deletions docs/User_Guide.md
@@ -34,7 +34,7 @@ If the munger for the format of your results file doesn't already exist:

### \[format\]
There are two required format parameters: `file_type` and `count_location`.
The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header.
The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header. Acceptable values are 'flat_text', 'excel', 'xml', 'json-nested'. The `count_location` parameter indicates where the vote counts are to be found. For 'flat_text' or 'excel' file types, use either `count_location=by_name:<list of names of columns containing vote counts>` or `count_location=by_number:<list of positions of columns containing vote counts>`.
* 'flat_text': Any tab-, comma-, or other-separated table in a plain tabular text file.
* (required) a field delimiter `flat_text_delimiter` to be specified (usually `flat_text_delimiter=,` for csv or `flat_text_delimiter=tab` for .txt)

@@ -47,6 +47,10 @@ If the munger for the format of your results file doesn't already exist:
* (required if `count_location=by_name`) specify location of field names for count columns with integer `count_field_name_row` (NB: top row not skipped is 0, next row is 1, etc.)
* (required):
* Either `all_rows=data` or designate row containing column names for the candidate, reporting unit, etc. with the `noncount_header_row` parameter. (NB: top row not skipped is 0, next row is 1, etc.)

* 'xml'

* 'json-nested'
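
As a rough illustration of the `[format]` rules listed above, the sketch below reads a munger-style text with Python's standard `configparser` and reports missing or unrecognized parameters. It is not the package's own validation code: the function name `check_format_section` and the sample munger text are hypothetical, and only the rules stated in this guide are checked.

```python
import configparser
from typing import List

# File types named in this guide
ALLOWED_FILE_TYPES = {"flat_text", "excel", "xml", "json-nested"}


def check_format_section(munger_text: str) -> List[str]:
    """Return a list of problems found in the [format] section (empty if none)."""
    # Split only on "=" so values such as "by_number:3,4,5" stay intact
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(munger_text)
    if not parser.has_section("format"):
        return ["missing [format] section"]
    fmt = parser["format"]
    problems = []
    for required in ("file_type", "count_location"):
        if required not in fmt:
            problems.append(f"missing required parameter {required}")
    file_type = fmt.get("file_type")
    if file_type is not None and file_type not in ALLOWED_FILE_TYPES:
        problems.append(f"unrecognized file_type {file_type}")
    if file_type == "flat_text" and "flat_text_delimiter" not in fmt:
        problems.append("flat_text requires flat_text_delimiter")
    count_location = fmt.get("count_location")
    if file_type in ("flat_text", "excel") and count_location is not None:
        if not count_location.startswith(("by_name:", "by_number:")):
            problems.append("count_location must start with by_name: or by_number:")
    return problems


# Hypothetical munger text, for illustration only
example_munger = """
[format]
file_type=flat_text
flat_text_delimiter=,
count_location=by_number:3,4,5
"""

print(check_format_section(example_munger))
```

Restricting `delimiters` to `=` and disabling interpolation keeps `configparser` from reinterpreting colons or percent signs that may appear in munger values.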

Available if appropriate for any file type, under the `[format]` header:
* (required if any munging information needs to be read from the `<results>.ini` file) `constant_over_file`, a comma-separated list of elements to be read, e.g., `constant_over_file=CandidateContest,CountItemType`.
@@ -398,22 +402,24 @@ analyzer.export_election_to_tsv("tabular_results.tsv", "2020 General", "South Ca

This code will produce all South Carolina data from the 2020 general election, grouped by contest, county, and vote type (total, early, absentee, etc).

### NIST Common Data Format
This package also provides functionality to export the data to xml according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd). This is as simple as identifying an election and jurisdiction of interest:
### NIST Common Data Format Export
This package provides functionality to export the data to xml or json according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd).

This is as simple as identifying an election and jurisdiction of interest. For xml:
```
import electiondata as ea
analyzer = ea.Analyzer()
election_report = analyzer.export_nist_v2("2020 General", "Georgia")
election_report = analyzer.export_nist_xml_as_string("2020 General", "Georgia")
```
The output is a string, the contents of the xml file.

There is also an export in the NIST V1 json format:
And for json:
```
analyzer = ea.Analyzer()
analyzer.export_nist_v1_json("2020 General","Georgia")
analyzer.export_nist_json_as_string("2020 General","Georgia")
```
The output is a string, the contents of the json file.
Both of these can take an optional `major_subdivision` parameter to control the level to which results are rolled up. The default is to roll up to the subdivision type indicated in the [`000_major_subjurisdiction_types.txt` file](../jurisdictions/000_major_subjurisdiction_types.txt).
The subdivision type for the roll-up is determined by the [`000_major_subjurisdiction_types.txt` file](../jurisdictions/000_major_subjurisdiction_types.txt).
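
Because the json export is returned as a string, it can be parsed back into a dictionary with the standard `json` module. The sketch below shows that round trip on a stand-in dictionary; the keys are placeholders, not the NIST schema, and `parse_report_string` is a hypothetical helper rather than part of the package.

```python
import json
from typing import Any, Dict


def parse_report_string(report_string: str) -> Dict[str, Any]:
    """Parse an exported json string and check that the top level is an object."""
    parsed = json.loads(report_string)
    if not isinstance(parsed, dict):
        raise ValueError("expected a json object at the top level")
    return parsed


# Stand-in for an exported report; keys and values are placeholders only
stand_in_report = {"placeholder_key": ["placeholder value"], "count": 3}
stand_in_string = json.dumps(stand_in_report)

# Dumping and re-parsing recovers the same dictionary
assert parse_report_string(stand_in_string) == stand_in_report
```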


## Unload and reload data with `reload_juris_election()`
@@ -518,7 +524,9 @@ If there are hidden columns in an Excel file, you may need to omit the hidden co
### NIST Common Data Format imports
To import results from a file that is valid NIST V2 xml -- that can be formally validated against the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd) -- use the file_type 'nist_v2_xml'

Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. For these files use the

Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
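
One way to check for a namespace on the root element of an xml file is sketched below with the standard library's `xml.etree.ElementTree`. This is only an illustration of the idea, not the package's own detection logic; the function name `root_namespace` and the sample namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def root_namespace(xml_text: str) -> Optional[str]:
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{uri}LocalName"
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


# Hypothetical documents, for illustration only
with_ns = '<ElectionReport xmlns="http://example.com/ns"><Election/></ElectionReport>'
without_ns = "<ElectionReport><Election/></ElectionReport>"

print(root_namespace(with_ns))
print(root_namespace(without_ns))
```

Note that this inspects only the root element's own namespace; a prefixed declaration that the root element does not use would not be reported.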

### Difference-in-Difference calculations
The system provides a way to calculate difference-in-difference statistics. For any particular election, `Analyzer.diff_in_diff_dem_vs_rep` produces a dataframe of values for any county with results by vote type, with Democratic or Republican candidates, and any comparable pair of contests both on some ballots in the county. Contests are considered "comparable" if their districts are of the same geographical district type -- e.g., both statewide, or both state-house, etc. The method also returns a list of jurisdictions for which vote counts were zero or missing.
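
The text above does not spell out the formula, so the following is a sketch of one common formulation, stated here as an assumption: within a county, take the Democratic share of the two-party vote for each contest and vote type, compute the gap between two vote types for each contest, and subtract one gap from the other. The function names and the sample counts are hypothetical, and `Analyzer.diff_in_diff_dem_vs_rep` may define the statistic differently.

```python
from typing import Dict, Optional, Tuple

# (contest, vote type) -> (Democratic votes, Republican votes) within one county
Counts = Dict[Tuple[str, str], Tuple[int, int]]


def dem_share(dem: int, rep: int) -> Optional[float]:
    """Democratic share of the two-party vote, or None if both counts are zero."""
    total = dem + rep
    return dem / total if total > 0 else None


def diff_in_diff(
    counts: Counts, contest_a: str, contest_b: str, type_1: str, type_2: str
) -> Optional[float]:
    """(share gap between vote types in contest_a) minus (same gap in contest_b)."""
    shares = []
    for contest in (contest_a, contest_b):
        for vote_type in (type_1, type_2):
            pair = counts.get((contest, vote_type))
            share = dem_share(*pair) if pair is not None else None
            if share is None:
                # zero or missing counts: no statistic for this county
                return None
            shares.append(share)
    a1, a2, b1, b2 = shares
    return (a1 - a2) - (b1 - b2)


# Made-up counts for a single county
county_counts: Counts = {
    ("governor", "absentee"): (600, 400),
    ("governor", "election-day"): (450, 550),
    ("senate", "absentee"): (550, 450),
    ("senate", "election-day"): (500, 500),
}

result = diff_in_diff(county_counts, "governor", "senate", "absentee", "election-day")
print(result)  # about 0.10: a gap of 0.15 for governor minus 0.05 for senate
```

Returning `None` for zero or missing counts mirrors the idea that such jurisdictions are reported separately rather than given a statistic.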
24 changes: 12 additions & 12 deletions src/electiondata/__init__.py
@@ -3010,15 +3010,15 @@ def export_nist(
election: str,
jurisdiction,
) -> Union[str, Dict[str, Any]]:
"""picks either version 1.0 (json) or version 2.0 (xml) based on value of constants.nist_version"""
if electiondata.constants.nist_version == "1.0":
return self.export_nist_v1_json(election, jurisdiction)
elif electiondata.constants.nist_version == "2.0":
return self.export_nist_v2(election, jurisdiction)
"""picks either json or xml based on value of constants.default_nist_format"""
if electiondata.constants.default_nist_format == "json":
return self.export_nist_json(election,jurisdiction)
elif electiondata.constants.default_nist_format == "xml":
return self.export_nist_xml_as_string(election,jurisdiction)
else:
return ""

def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any]:
def export_nist_json(self,election: str,jurisdiction: str) -> Dict[str,Any]:
election_id = db.name_to_id(self.session, "Election", election)
jurisdiction_id = db.name_to_id(self.session, "ReportingUnit", jurisdiction)

@@ -3045,16 +3045,16 @@ def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any

return election_report

def export_nist_v1(
def export_nist_json_as_string(
self,
election: str,
jurisdiction: str,
) -> str:
"""exports NIST v1 json string"""
json_string = json.dumps(self.export_nist_v1_json(election, jurisdiction))
"""exports NIST v2 json string"""
json_string = json.dumps(self.export_nist_json(election,jurisdiction))
return json_string

def export_nist_v2(
def export_nist_xml_as_string(
self,
election: str,
jurisdiction: str,
@@ -3716,7 +3716,7 @@ def compare_to_results_file(
)
if not not_found_in_db.empty:
nfid_str = (
f"\nSome expected constests not found. For details, see {sub_dir}"
f"\nSome expected contests not found. For details, see {sub_dir}"
)
err = ui.add_new_error(
err,
@@ -3925,7 +3925,7 @@ def load_results_df(
err,
"jurisdiction",
juris_true_name,
f"No contest-selection pairs recognized via munger {munger_name}",
f"No contest-selection pairs recognized in file {file_name} via munger {munger_name}",
)
return err

2 changes: 1 addition & 1 deletion src/electiondata/constants/__init__.py
@@ -602,7 +602,7 @@ def jurisdiction_wide_contests(abbr: str) -> List[str]:

# constants dictated by NIST
if 1:
nist_version = "1.0"
default_nist_format = "json" # other option is "xml"
default_issuer = (
"unspecified user of code base at github.com/ElectionDataAnalysis/electiondata"
)
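
The `default_nist_format` constant selects which exporter `export_nist` calls. As a hedged alternative to an if/elif chain that returns an empty string for unrecognized values, the sketch below dispatches through a dictionary and raises an error instead. The exporter functions here are hypothetical stand-ins that return placeholder values; they are not the package's methods.

```python
from typing import Any, Callable, Dict, Union

default_nist_format = "json"  # stand-in for the constant; other option is "xml"


def export_json(election: str, jurisdiction: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a json exporter; returns a placeholder dict."""
    return {"election": election, "jurisdiction": jurisdiction}


def export_xml_as_string(election: str, jurisdiction: str) -> str:
    """Hypothetical stand-in for an xml exporter; returns a placeholder string."""
    return "<placeholder/>"


EXPORTERS: Dict[str, Callable[[str, str], Union[str, Dict[str, Any]]]] = {
    "json": export_json,
    "xml": export_xml_as_string,
}


def export(
    election: str, jurisdiction: str, nist_format: str = default_nist_format
) -> Union[str, Dict[str, Any]]:
    """Call the exporter registered for nist_format, failing loudly if unknown."""
    try:
        exporter = EXPORTERS[nist_format]
    except KeyError:
        raise ValueError(f"unsupported NIST format: {nist_format!r}") from None
    return exporter(election, jurisdiction)


print(export("2020 General", "Georgia"))
```

Raising on an unknown format surfaces a misconfigured constant immediately, whereas an empty-string return could pass unnoticed downstream; which behavior is preferable depends on how callers use the result.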
17 changes: 12 additions & 5 deletions src/electiondata/munge/__init__.py
@@ -624,6 +624,8 @@ def melt_to_one_count_column(
if "in_count_headers" in p["munge_field_types"]:
# split header_0 column into separate columns
# # get header_rows
# TODO: the following throws PerformanceWarning for Kansas House of Representatives 2020g. Rather than
# assigning values, need to use melted = pd.concat([melted, <new_columns>])
melted[
[f"count_header_{idx}" for idx in p["count_header_row_numbers"]]
] = pd.DataFrame(melted["header_0"].str.split(";:;", expand=True).values)[
@@ -691,8 +693,8 @@ def add_contest_id(
working, new_err = replace_raw_with_internal_ids(
working,
juris_true_name,
file_name,
munger_name,
file_name,
df_for_type[c_type],
f"{c_type}Contest",
"Name",
@@ -741,7 +743,8 @@ def add_contest_id(
# fail if fatal errors or no contests recognized (in reverse order, just for fun)
if working_temp.empty:
err = ui.add_new_error(
err, "jurisdiction", juris_true_name, f"No contests recognized."
err, "jurisdiction", juris_true_name,
f"No contests recognized from file {file_name} with munger {munger_name}."
)
else:
working = working_temp
@@ -1979,7 +1982,8 @@ def to_standard_count_frame(
)

# loop through dataframes in list
standard[sheet] = pd.DataFrame()
# create list of standard-form dataframes from dataframes in list
standard_list = list()
for n in range(len(df_list)):
raw = df_list[n]
working = raw.copy()
@@ -2050,9 +2054,12 @@ def to_standard_count_frame(
# clean Unnamed:... out of any values
working = blank_out(working, constants.pandas_default_pattern)

# append data from the nth dataframe to the standard-form dataframe
# append standard-form data from the nth dataframe to the list
## NB: if df_list[n] fails it should not reach this statement
standard[sheet] = pd.concat([standard[sheet], working])
standard_list.append(working)

# put all the good standard-form dataframes together into one
standard[sheet] = pd.concat(standard_list)
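
The change above replaces repeated `pd.concat` calls inside the loop with a list that is concatenated once at the end. A minimal sketch of that pattern follows; the helper name `combine_frames` and the sample frames are hypothetical. One caveat worth checking against the real code: `pd.concat` raises `ValueError` when given an empty list, so the sketch guards against the case where no dataframe survives.

```python
from typing import List

import pandas as pd


def combine_frames(frames: List[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate the collected dataframes once, tolerating an empty list."""
    if not frames:
        # pd.concat([]) raises ValueError, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Hypothetical per-sheet pieces, for illustration only
pieces = []
for county, count in [("Johnson", 10), ("Shawnee", 20)]:
    working = pd.DataFrame({"ReportingUnit": [county], "Count": [count]})
    pieces.append(working)  # collect; do not concat inside the loop

combined = combine_frames(pieces)
print(combined)
```

Collecting in a list avoids copying the growing dataframe on every iteration, which is the usual reason to prefer a single concatenation.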

# if even one df lacks a fatal error, consider all errors non-fatal for this sheet
non_fatal_dfs = [
@@ -1,6 +1,6 @@
[election_results]
results_file=Kansas/2020_General_Election_Kansas_House_of_Representatives_results_by_precinct.xlsx
munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3
munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3,ks_gen_wyandotte_3_line_header_first_count_col_3,ks_gen_wyandotte_4_line_header_first_count_col_3_merged_rows
jurisdiction=Kansas
election=2020 General
results_short_name=ks_20g_kshouse
@@ -7,7 +7,6 @@ results_short_name=nh20g_cd2
results_download_date=2020-12-22
results_source=https://sos.nh.gov/elections/elections/election-results/
results_note=revised by hand to disambiguate counties & towns with same name (Carroll, Grafton, Hillsborough, Sullivan). Also, candidate Andrew Olding. As of 8/27/2021, the electiondata code throws a (seemingly harmless) warning when processing this file ( /usr/local/lib/python3.9/site-packages/openpyxl/worksheet/header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored
warn("""Cannot parse header or footer so it will be ignored"""))
CountItemType=total
is_preliminary=False
CountItemType=total

6 changes: 3 additions & 3 deletions src/jurisdictions/Kansas/Candidate.txt
@@ -125,15 +125,12 @@ Rick Kloos
Rachel Willis
Brenda S. Dietrich
Anthony Hensley
Under Votes
Over Votes
Laura McConwell
Ethan Corson
Diana Whittington
Cindy Holscher
Vail Fruechting
Ty Masterson
Total Votes Cast
Timothy Don Fry II
Mary Ware
Dan Kerschen
@@ -356,3 +353,6 @@ Vic (T-Bone) Miller
Vicki Schmidt
Virgil Weigel
Wendy Bingesser
Jordan Michael Mackey
Greg Conchola
Rick Parsons
6 changes: 3 additions & 3 deletions src/jurisdictions/Kansas/dictionary.txt
@@ -167,7 +167,6 @@ Candidate Molly Baumgardner Baumgardner, Molly
Candidate Monica Murnan Murnan, Monica
Candidate Nancy J. Ingle Ingle, Nancy J.
Candidate Other Other
Candidate Over Votes Over Votes
Candidate Pat Pettey Pat Pettey
Candidate Pat Proctor Proctor, Pat
Candidate Patrick Penn Penn, Patrick
@@ -224,13 +223,11 @@ Candidate Todd Maddox Maddox, Todd
Candidate Tom Hawk Hawk, Tom
Candidate Tom Holland Holland, Tom
Candidate Tory Marie Arnberger Arnberger, Tory Marie
Candidate Total Votes Cast Total Votes Cast
Candidate Tracey Mann Mann, Tracey
Candidate Trevor Jacobs Jacobs, Trevor
Candidate Troy L. Waymaster Waymaster, Troy L.
Candidate Ty Masterson Masterson, Ty
Candidate Ty Masterson Ty Masterson
Candidate Under Votes Under Votes
Candidate Vail Fruechting Vail Fruechting
Candidate Virgil Peck Peck, Virgil
Candidate W. Michael Shimeall Shimeall, W. Michael
@@ -6761,3 +6758,6 @@ ReportingUnit Kansas;Wilson County Kansas;Wilson
ReportingUnit Kansas;Woodson County Kansas;Woodson
ReportingUnit Kansas;Wyandotte County Kansas;Wyandotte
CandidateContest KS Attorney General KS;Attorney General;statewide
Candidate Jordan Michael Mackey Jordan Michael Mackey
Candidate Greg Conchola Greg Conchola
Candidate Rick Parsons Rick Parsons
5 changes: 1 addition & 4 deletions src/mungers/ks_gen_johnson_count_from_B.munger
@@ -49,14 +49,11 @@ CandidateContest=<count_header_0>
Party=<count_header_2>






# Values to ignore (optional) #
[ignore]
## E.g: Candidate=Total Votes Cast,Registered Voters ##
ReportingUnit=JOHNSON;COUNTY TOTALS,Johnson;COUNTY TOTALS
Candidate=Write-in,Under Votes,Over Votes

# Lookup formula sections #
## Required when foreign keys are used in munge formulas and ##
2 changes: 1 addition & 1 deletion src/mungers/ks_gen_sedgwick.munger
@@ -62,7 +62,7 @@ Party={<count_header_3>,^(\w\w\w) .*$}
# Values to ignore (optional) #
[ignore]
## E.g: Candidate=Total Votes Cast,Registered Voters ##
Candidate=Write-in Totals,Totals
Candidate=Write-in Totals,Totals,Total Votes Cast
ReportingUnit=SEDGWICK;Totals,Sedgwick;Totals

# Lookup formula sections #
2 changes: 1 addition & 1 deletion src/mungers/ks_gen_shawnee.munger
@@ -62,7 +62,7 @@ Party={<count_header_2>,^(\w\w\w) .*$}
# Values to ignore (optional) #
[ignore]
## E.g: Candidate=Total Votes Cast,Registered Voters ##
Candidate=Write-in Totals
Candidate=Write-in Totals,Write-in

# Lookup formula sections #
## Required when foreign keys are used in munge formulas and ##