ElectionDataAnalysis · sfsinger19103 · Sep 22, 2021 · Sep 16, 2021 · Sep 21, 2021 · Sep 21, 2021
diff --git a/README.md b/README.md
@@ -2,30 +2,36 @@
 
 
 # Overview
-This repository hopes to provide reliable tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves. 
+This repository provides tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves. 
  * Consolidation: take as input election results files from a wide variety of sources and load the data into a relational database
- * Export: create tab-separated flat export files of results sets rolled up to any desired intermediate geography (e.g., by county, or by congressional district)
- * Analysis: provide a variety of analysis tools
- * Visualization: provide a variety of visualization tools.
+ * Export: create consistent-format export files of results sets rolled up to any desired intermediate geography
+   * tabular (tab-separated text)
+   * xml (following NIST Election Results Reporting Common Data Format V2)
+   * json (following NIST Election Results Reporting Common Data Format V2)
+ * Analysis: 
+   * Curates one-county outliers of interest
+   * Calculates difference-in-difference for results available by vote type
+ * Visualization: 
+   * Scatter plots
+   * Bar charts
 
 # Target Audience
 This system is intended to be of use to candidates and campaigns, election officials, students of politics and elections, and anyone else who is interested in assembling and understanding election results.
 
 # How to Contribute Code
-Please contribute code that works in python 3.7, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.
+Please contribute code that works in python 3.9, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.
 
 # How to Help in Other Ways
 If you have skills to contribute to building the system, we can definitely use your help:
  * Creating visualizations
- * Importing and exporting data via xml feeds
  * Preparing for intake of specific states' results files
  * Managing collection of data files in real time
  * Writing documentation
  * Merging other data sets of interest (e.g., demographics)
  * Building our open source community
  * What else? Let us know!
 
-If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- we would love to talk with you about what you want to from this system.
+If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- let us know what you want from this system.
 
 If you are interested in contributing, or just staying updated on the progress of this project, please [contact Stephanie Singer](http://symmetrysinger.com/index.php?id=contact). 
 
@@ -45,6 +51,7 @@ Detailed instructions can be found [here](docs/User_Guide.md).
 Funding provided October 2019 - September 2021 by the National Science Foundation
  * Award #1936809, "EAGER: Data Science for Election Verification" 
  * Award #2027089, "RAPID: Election Result Anomaly Detection for 2020"
+Data collection and consolidation for the 2020 US General Election funded in part by the Verified Voting Foundation.
 
 # License
 See [LICENSE.md](./LICENSE.md)

diff --git a/docs/User_Guide.md b/docs/User_Guide.md
@@ -34,7 +34,7 @@ If the munger for the format of your results file doesn't already exist:
 
 ### \[format\]
  There are two required format parameters: `file_type` and `count_location`. 
- The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header.
+ The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header. Acceptable values are 'flat_text', 'excel', 'xml', 'json-nested'. The `count_location` parameter indicates where the vote counts are to be found. For 'flat_text' or 'excel' file types, use either `count_location=by_name:<list of names of columns containing vote counts>` or `count_location=by_number:<list of positions of columns containing vote counts>`.
   * 'flat_text': Any tab-, comma-, or other-separated table in a plain tabular text file.
     * (required) a field delimiter `flat_text_delimiter` to be specified (usually `flat_text_delimiter=,` for csv or `flat_text_delimiter=tab` for .txt)
 
@@ -47,6 +47,10 @@ If the munger for the format of your results file doesn't already exist:
     * (required if `count_location=by_name`) specify location of field names for count columns. with integer `count_field_name_row` (NB: top row not skipped is 0, next row is 1, etc.)
     * (required):
         * Either `all_rows=data` or designate row containing column names for the candidate, reporting unit, etc. with the `noncount_header_row` parameter. (NB: top row not skipped is 0, next row is 1, etc.)
+
+  *  'xml'
+
+  * 'json-nested'
 
    Available if appropriate for any file type, under the `[format]` header:
    * (required if any munging information needs to be read from the `<results>.ini` file) `constant_over_file`, a comma-separated list of elements to be read, e.g., `constant_over_file=CandidateContest,CountItemType`.
@@ -398,22 +402,24 @@ analyzer.export_election_to_tsv("tabular_results.tsv", "2020 General", "South Ca
 
 This code will produce all South Carolina data from the 2018 general election, grouped by contest, county, and vote type (total, early, absentee, etc).
 
-### NIST Common Data Format
-This package also provides functionality to export the data to xml according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd). This is as simple as identifying an election and jurisdiction of interest:
+### NIST Common Data Format Export
+This package provides functionality to export the data to xml or json according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd). 
+
+This is as simple as identifying an election and jurisdiction of interest. For xml:
 ```
 import electiondata as ea
 analyzer = ea.Analyzer()
-election_report = analyzer.export_nist_v2("2020 General", "Georgia")
+election_report = analyzer.export_nist_xml_as_string("2020 General", "Georgia")
 ```
 The output is a string, the contents of the xml file.
 
-There is also an export in the NIST V1 json format:
+And for json:
 ```
 analyzer = ea.Analyzer()
-analyzer.export_nist_v1_json("2020 General","Georgia")
+analyzer.export_nist_json_as_string("2020 General", "Georgia")
 ```
 The output is a string, the contents of the json file.
-Both of these can take an optional `major_subdivision` parameter to control the level to which results are rolled up. The default is to roll up to the subdivision type indicated in the [`000_major_subjurisdiction_type.txt file](../jurisdictions/000_major_subjurisdiction_types.txt).
+The subdivision type for the roll-up is determined by the [`000_major_subjurisdiction_types.txt` file](../jurisdictions/000_major_subjurisdiction_types.txt).
 
 
 ## Unload and reload data with `reload_juris_election()`
@@ -518,7 +524,9 @@ If there are hidden columns in an Excel file, you may need to omit the hidden co
 ### NIST Common Data Format imports
 To import results from a file that is valid NIST V2 xml -- that can be formally validated against the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd) -- use the file_type 'nist_v2_xml'
 
-Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
+Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. For these files use the
+
+Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
 
 ### Difference-in-Difference calculations
 The system provides a way to calculate difference-in-difference statistics. For any particular election, `Analyzer.diff_in_diff_dem_vs_rep` produces a dataframe of values for any county with results by vote type, with Democratic or Republican candidates, and any comparable pair of contests both on some ballots in the county. Contests are considered "comparable" if their districts are of the same geographical district type -- e.g., both statewide, or both state-house, etc. The method also returns a list of jurisdictions for which vote counts were zero or missing.
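
Note on the difference-in-difference description above: the sketch below shows one plausible reading of the statistic for a single county and a single pair of comparable contests. It is an assumption for illustration only. The function names, the two-party-share definition, and the sample counts are all hypothetical, and `Analyzer.diff_in_diff_dem_vs_rep` may define the quantity differently.

```python
from typing import Dict, Tuple

# (contest, vote type) -> (Democratic votes, Republican votes); hypothetical counts
Counts = Dict[Tuple[str, str], Tuple[int, int]]


def dem_share(dem: int, rep: int) -> float:
    """Democratic share of the two-party vote (assumed definition)."""
    return dem / (dem + rep)


def diff_in_diff(
    counts: Counts, contest_a: str, contest_b: str, type_1: str, type_2: str
) -> float:
    """Gap between contests under type_1 minus the same gap under type_2."""

    def share(contest: str, vote_type: str) -> float:
        dem, rep = counts[(contest, vote_type)]
        return dem_share(dem, rep)

    gap_1 = share(contest_a, type_1) - share(contest_b, type_1)
    gap_2 = share(contest_a, type_2) - share(contest_b, type_2)
    return gap_1 - gap_2


# made-up numbers for one county
sample: Counts = {
    ("President", "absentee"): (600, 400),
    ("President", "election-day"): (450, 550),
    ("Senate", "absentee"): (580, 420),
    ("Senate", "election-day"): (440, 560),
}
result = diff_in_diff(sample, "President", "Senate", "absentee", "election-day")
```

A value near zero means the two contests differ by about the same amount under both vote types. A county whose value is far from zero is the kind of case the statistic is meant to surface.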

diff --git a/src/electiondata/__init__.py b/src/electiondata/__init__.py
@@ -3010,15 +3010,15 @@ def export_nist(
         election: str,
         jurisdiction,
     ) -> Union[str, Dict[str, Any]]:
-        """picks either version 1.0 (json) or version 2.0 (xml) based on value of constants.nist_version"""
-        if electiondata.constants.nist_version == "1.0":
-            return self.export_nist_v1_json(election, jurisdiction)
-        elif electiondata.constants.nist_version == "2.0":
-            return self.export_nist_v2(election, jurisdiction)
+        """picks either json or xml based on value of constants.default_nist_format"""
+        if electiondata.constants.default_nist_format == "json":
+            return self.export_nist_json(election, jurisdiction)
+        elif electiondata.constants.default_nist_format == "xml":
+            return self.export_nist_xml_as_string(election, jurisdiction)
         else:
             return ""
 
-    def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any]:
+    def export_nist_json(self, election: str, jurisdiction: str) -> Dict[str, Any]:
         election_id = db.name_to_id(self.session, "Election", election)
         jurisdiction_id = db.name_to_id(self.session, "ReportingUnit", jurisdiction)
 
@@ -3045,16 +3045,16 @@ def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any
 
         return election_report
 
-    def export_nist_v1(
+    def export_nist_json_as_string(
         self,
         election: str,
         jurisdiction: str,
     ) -> str:
-        """exports NIST v1 json string"""
-        json_string = json.dumps(self.export_nist_v1_json(election, jurisdiction))
+        """exports NIST v2 json string"""
+        json_string = json.dumps(self.export_nist_json(election, jurisdiction))
         return json_string
 
-    def export_nist_v2(
+    def export_nist_xml_as_string(
         self,
         election: str,
         jurisdiction: str,
@@ -3716,7 +3716,7 @@ def compare_to_results_file(
                 )
             if not not_found_in_db.empty:
                 nfid_str = (
-                    f"\nSome expected constests not found. For details, see {sub_dir}"
+                    f"\nSome expected contests not found. For details, see {sub_dir}"
                 )
                 err = ui.add_new_error(
                     err,
@@ -3925,7 +3925,7 @@ def load_results_df(
             err,
             "jurisdiction",
             juris_true_name,
-            f"No contest-selection pairs recognized via munger {munger_name}",
+            f"No contest-selection pairs recognized in file {file_name} via munger {munger_name}",
         )
         return err
 

diff --git a/src/electiondata/constants/__init__.py b/src/electiondata/constants/__init__.py
@@ -602,7 +602,7 @@ def jurisdiction_wide_contests(abbr: str) -> List[str]:
 
 # constants dictated by NIST
 if 1:
-    nist_version = "1.0"
+    default_nist_format = "json"  # other option is "xml"
     default_issuer = (
         "unspecified user of code base at github.com/ElectionDataAnalysis/electiondata"
     )
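
Note on `default_nist_format`: in the diff, `export_nist` returns a dict for "json", a string for "xml", and an empty string for any other value. The sketch below shows an alternative dispatch that always returns a string and fails loudly on an unsupported value. It is only an illustration: `report_as_dict`, `to_json_string`, `to_xml_string`, `EXPORTERS` and `export` are hypothetical stand-ins, not functions from the package, and the xml produced here is not valid NIST xml.

```python
import json
from typing import Any, Callable, Dict
from xml.sax.saxutils import escape

default_nist_format = "json"  # mirrors the constant in the diff; other option is "xml"


def report_as_dict(election: str, jurisdiction: str) -> Dict[str, Any]:
    """Hypothetical stand-in for building the election report."""
    return {"Election": election, "Jurisdiction": jurisdiction}


def to_json_string(election: str, jurisdiction: str) -> str:
    return json.dumps(report_as_dict(election, jurisdiction))


def to_xml_string(election: str, jurisdiction: str) -> str:
    # deliberately minimal placeholder markup, for illustration only
    return (
        "<ElectionReport>"
        f"<Election>{escape(election)}</Election>"
        f"<Jurisdiction>{escape(jurisdiction)}</Jurisdiction>"
        "</ElectionReport>"
    )


EXPORTERS: Dict[str, Callable[[str, str], str]] = {
    "json": to_json_string,
    "xml": to_xml_string,
}


def export(election: str, jurisdiction: str, fmt: str = default_nist_format) -> str:
    """Look up the exporter for fmt; raise instead of returning an empty string."""
    try:
        exporter = EXPORTERS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported NIST format: {fmt!r}") from None
    return exporter(election, jurisdiction)
```

Raising on an unknown format makes a mistyped constant visible at the call site, whereas an empty string could be written to a file unnoticed.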

diff --git a/src/electiondata/munge/__init__.py b/src/electiondata/munge/__init__.py
@@ -624,6 +624,8 @@ def melt_to_one_count_column(
         if "in_count_headers" in p["munge_field_types"]:
             # split header_0 column into separate columns
             # # get header_rows
+            # TODO: the following raises a pandas PerformanceWarning for Kansas House of Representatives 2020g. Rather than
+            #  assigning values, need to use melted = pd.concat([melted, <new_columns>], axis=1)
             melted[
                 [f"count_header_{idx}" for idx in p["count_header_row_numbers"]]
             ] = pd.DataFrame(melted["header_0"].str.split(";:;", expand=True).values)[
@@ -691,8 +693,8 @@ def add_contest_id(
             working, new_err = replace_raw_with_internal_ids(
                 working,
                 juris_true_name,
-                file_name,
                 munger_name,
+                file_name,
                 df_for_type[c_type],
                 f"{c_type}Contest",
                 "Name",
@@ -741,7 +743,8 @@ def add_contest_id(
     # fail if fatal errors or no contests recognized (in reverse order, just for fun
     if working_temp.empty:
         err = ui.add_new_error(
-            err, "jurisdiction", juris_true_name, f"No contests recognized."
+            err, "jurisdiction", juris_true_name,
+            f"No contests recognized from file {file_name} with munger {munger_name}."
         )
     else:
         working = working_temp
@@ -1979,7 +1982,8 @@ def to_standard_count_frame(
             )
 
         # loop through dataframes in list
-        standard[sheet] = pd.DataFrame()
+        # create list of standard-form dataframes from dataframes in list
+        standard_list = list()
         for n in range(len(df_list)):
             raw = df_list[n]
             working = raw.copy()
@@ -2050,9 +2054,12 @@ def to_standard_count_frame(
             # clean Unnamed:... out of any values
             working = blank_out(working, constants.pandas_default_pattern)
 
-            # append data from the nth dataframe to the standard-form dataframe
+            # append standard-form data from the nth dataframe to the list
             ## NB: if df_list[n] fails it should not reach this statement
-            standard[sheet] = pd.concat([standard[sheet], working])
+            standard_list.append(working)
+
+        # put all the good standard-form dataframes together into one
+        standard[sheet] = pd.concat(standard_list) if standard_list else pd.DataFrame()
 
         # if even one df lacks a fatal error, consider all errors non-fatal for this sheet
         non_fatal_dfs = [
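
Note on the `standard_list` change above: collecting the per-munger dataframes in a list and concatenating once avoids rebuilding the accumulated frame on every pass through the loop. One thing to watch is that `pd.concat` raises `ValueError` when given an empty list, which the previous code avoided by starting from `pd.DataFrame()`. A minimal sketch of the pattern with that guard follows. `combine_frames` and the sample frames are hypothetical and are not part of the package.

```python
from typing import List

import pandas as pd


def combine_frames(frames: List[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate all frames in one call; return an empty frame if there are none."""
    if not frames:
        # pd.concat raises ValueError on an empty list, so handle that case explicitly
        return pd.DataFrame()
    return pd.concat(frames)


# hypothetical stand-ins for the standard-form dataframes produced in the loop
good = [
    pd.DataFrame({"Candidate": ["A"], "Count": [10]}),
    pd.DataFrame({"Candidate": ["B"], "Count": [7]}),
]
combined = combine_frames(good)
nothing = combine_frames([])
```

Whether the empty case can actually occur in `to_standard_count_frame` depends on how failed dataframes are skipped earlier in the loop, which this diff does not show.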

diff --git a/src/ini_files_for_results/Kansas/ks_20g_ks_house_official.ini b/src/ini_files_for_results/Kansas/ks_20g_ks_house_official.ini
@@ -1,6 +1,6 @@
 [election_results]
 results_file=Kansas/2020_General_Election_Kansas_House_of_Representatives_results_by_precinct.xlsx
-munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3
+munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3,ks_gen_wyandotte_3_line_header_first_count_col_3,ks_gen_wyandotte_4_line_header_first_count_col_3_merged_rows
 jurisdiction=Kansas
 election=2020 General
 results_short_name=ks_20g_kshouse

diff --git a/src/ini_files_for_results/New-Hampshire/nh20g_CD2_official.ini b/src/ini_files_for_results/New-Hampshire/nh20g_CD2_official.ini
@@ -7,7 +7,6 @@ results_short_name=nh20g_cd2
 results_download_date=2020-12-22
 results_source=https://sos.nh.gov/elections/elections/election-results/
 results_note=revised by hand to disambiguate counties & towns with same name (Carroll, Grafton, Hillsborough, Sullivan). Also, candidate Andrew Olding. As of 8/27/2021, the electiondata code throws a (seemingly harmless) warning when processing this file ( /usr/local/lib/python3.9/site-packages/openpyxl/worksheet/header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored
-    warn("""Cannot parse header or footer so it will be ignored"""))
-CountItemType=total
 is_preliminary=False
+CountItemType=total
 
diff --git a/src/jurisdictions/Kansas/Candidate.txt b/src/jurisdictions/Kansas/Candidate.txt
@@ -125,15 +125,12 @@ Rick Kloos
 Rachel Willis
 Brenda S. Dietrich
 Anthony Hensley
-Under Votes
-Over Votes
 Laura McConwell
 Ethan Corson
 Diana Whittington
 Cindy Holscher
 Vail Fruechting
 Ty Masterson
-Total Votes Cast
 Timothy Don Fry II
 Mary Ware
 Dan Kerschen
@@ -356,3 +353,6 @@ Vic (T-Bone) Miller
 Vicki Schmidt
 Virgil Weigel
 Wendy Bingesser
+Jordan Michael Mackey
+Greg Conchola
+Rick Parsons
diff --git a/src/jurisdictions/Kansas/dictionary.txt b/src/jurisdictions/Kansas/dictionary.txt
@@ -167,7 +167,6 @@ Candidate	Molly Baumgardner	Baumgardner, Molly
 Candidate	Monica Murnan	Murnan, Monica
 Candidate	Nancy J. Ingle	Ingle, Nancy J.
 Candidate	Other	Other
-Candidate	Over Votes	Over Votes
 Candidate	Pat Pettey	Pat Pettey
 Candidate	Pat Proctor	Proctor, Pat
 Candidate	Patrick Penn	Penn, Patrick
@@ -224,13 +223,11 @@ Candidate	Todd Maddox	Maddox, Todd
 Candidate	Tom Hawk	Hawk, Tom
 Candidate	Tom Holland	Holland, Tom
 Candidate	Tory Marie Arnberger	Arnberger, Tory Marie
-Candidate	Total Votes Cast	Total Votes Cast
 Candidate	Tracey Mann	Mann, Tracey
 Candidate	Trevor Jacobs	Jacobs, Trevor
 Candidate	Troy L. Waymaster	Waymaster, Troy L.
 Candidate	Ty Masterson	Masterson, Ty
 Candidate	Ty Masterson	Ty Masterson
-Candidate	Under Votes	Under Votes
 Candidate	Vail Fruechting	Vail Fruechting
 Candidate	Virgil Peck	Peck, Virgil
 Candidate	W. Michael Shimeall	Shimeall, W. Michael
@@ -6761,3 +6758,6 @@ ReportingUnit	Kansas;Wilson County	Kansas;Wilson
 ReportingUnit	Kansas;Woodson County	Kansas;Woodson
 ReportingUnit	Kansas;Wyandotte County	Kansas;Wyandotte
 CandidateContest	KS Attorney General	KS;Attorney General;statewide
+Candidate	Jordan Michael Mackey	Jordan Michael Mackey
+Candidate	Greg Conchola	Greg Conchola
+Candidate	Rick Parsons	Rick Parsons
diff --git a/src/mungers/ks_gen_johnson_count_from_B.munger b/src/mungers/ks_gen_johnson_count_from_B.munger
@@ -49,14 +49,11 @@ CandidateContest=<count_header_0>
 Party=<count_header_2>
 
 
-
-
-
-
 # Values to ignore (optional) #
 [ignore]
 ## E.g: Candidate=Total Votes Cast,Registered Voters ##
 ReportingUnit=JOHNSON;COUNTY TOTALS,Johnson;COUNTY TOTALS
+Candidate=Write-in,Under Votes,Over Votes
 
 # Lookup formula sections #
 ## Required when foreign keys are used in munge formulas and    ##

diff --git a/src/mungers/ks_gen_sedgwick.munger b/src/mungers/ks_gen_sedgwick.munger
@@ -62,7 +62,7 @@ Party={<count_header_3>,^(\w\w\w) .*$}
 # Values to ignore (optional) #
 [ignore]
 ## E.g: Candidate=Total Votes Cast,Registered Voters ##
-Candidate=Write-in Totals,Totals
+Candidate=Write-in Totals,Totals,Total Votes Cast
 ReportingUnit=SEDGWICK;Totals,Sedgwick;Totals
 
 # Lookup formula sections #

diff --git a/src/mungers/ks_gen_shawnee.munger b/src/mungers/ks_gen_shawnee.munger
@@ -62,7 +62,7 @@ Party={<count_header_2>,^(\w\w\w) .*$}
 # Values to ignore (optional) #
 [ignore]
 ## E.g: Candidate=Total Votes Cast,Registered Voters ##
-Candidate=Write-in Totals
+Candidate=Write-in Totals,Write-in
 
 # Lookup formula sections #
 ## Required when foreign keys are used in munge formulas and    ##