diff --git a/README.md b/README.md index 4b644915..42c0823e 100644 --- a/README.md +++ b/README.md @@ -2,22 +2,28 @@ # Overview -This repository hopes to provide reliable tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves. +This repository provides tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves. * Consolidation: take as input election results files from a wide variety of sources and load the data into a relational database - * Export: create tab-separated flat export files of results sets rolled up to any desired intermediate geography (e.g., by county, or by congressional district) - * Analysis: provide a variety of analysis tools - * Visualization: provide a variety of visualization tools. + * Export: create consistent-format export files of results sets rolled up to any desired intermediate geography + * tabular (tab-separated text) + * xml (following NIST Election Results Reporting Common Data Format V2) + * json (following NIST Election Results Reporting Common Data Format V2) + * Analysis: + * Curates one-county outliers of interest + * Calculates difference-in-difference for results available by vote type + * Visualization: + * Scatter plots + * Bar charts # Target Audience This system is intended to be of use to candidates and campaigns, election officials, students of politics and elections, and anyone else who is interested in assembling and understanding election results. # How to Contribute Code -Please contribute code that works in python 3.7, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format. +Please contribute code that works in python 3.9, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format. 
# How to Help in Other Ways If you have skills to contribute to building the system, we can definitely use your help: * Creating visualizations - * Importing and exporting data via xml feeds * Preparing for intake of specific states' results files * Managing collection of data files in real time * Writing documentation @@ -25,7 +31,7 @@ If you have skills to contribute to building the system, we can definitely use y * Building our open source community * What else? Let us know! -If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- we would love to talk with you about what you want to from this system. +If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- let us know what you want from this system. If you are interested in contributing, or just staying updated on the progress of this project, please [contact Stephanie Singer](http://symmetrysinger.com/index.php?id=contact). @@ -45,6 +51,7 @@ Detailed instructions can be found [here](docs/User_Guide.md). Funding provided October 2019 - September 2021 by the National Science Foundation * Award #1936809, "EAGER: Data Science for Election Verification" * Award #2027089, "RAPID: Election Result Anomaly Detection for 2020" +Data collection and consolidation for the 2020 US General Election funded in part by the Verified Voting Foundation. # License See [LICENSE.md](./LICENSE.md) diff --git a/docs/User_Guide.md b/docs/User_Guide.md index 01f51743..af96a0eb 100644 --- a/docs/User_Guide.md +++ b/docs/User_Guide.md @@ -34,7 +34,7 @@ If the munger for the format of your results file doesn't already exist: ### \[format\] There are two required format parameters: `file_type` and `count_location`. - The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header. 
+ The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header. Acceptable values are 'flat_text', 'excel', 'xml', 'json-nested'. The `count_location` parameter indicates where the vote counts are to be found. For 'flat_text' or 'excel' file types, either `count_location=by_name:` or `count_location=by_number:.ini` file) `constant_over_file`, a comma-separated list of elements to be read, e.g., `constant_over_file=CandidateContest,CountItemType`. @@ -398,22 +402,24 @@ analyzer.export_election_to_tsv("tabular_results.tsv", "2020 General", "South Ca This code will produce all South Carolina data from the 2018 general election, grouped by contest, county, and vote type (total, early, absentee, etc). -### NIST Common Data Format -This package also provides functionality to export the data to xml according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd). This is as simple as identifying an election and jurisdiction of interest: +### NIST Common Data Format Export +This package provides functionality to export the data to xml or json according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd). + +This is as simple as identifying an election and jurisdiction of interest. For xml: ``` import electiondata as ea analyzer = ea.Analyzer() -election_report = analyzer.export_nist_v2("2020 General", "Georgia") +election_report = analyzer.export_nist_xml_as_string("2020 General", "Georgia") ``` The output is a string, the contents of the xml file. 
-There is also an export in the NIST V1 json format: +And for json: ``` analyzer = ea.Analyzer() -analyzer.export_nist_v1_json("2020 General","Georgia") +analyzer.export_nist_json_as_string("2020 General", "Georgia") ``` The output is a string, the contents of the json file. -Both of these can take an optional `major_subdivision` parameter to control the level to which results are rolled up. The default is to roll up to the subdivision type indicated in the [`000_major_subjurisdiction_type.txt file](../jurisdictions/000_major_subjurisdiction_types.txt). +The subdivision type for the roll-up is determined by the [`000_major_subjurisdiction_types.txt` file](../jurisdictions/000_major_subjurisdiction_types.txt). ## Unload and reload data with `reload_juris_election()` @@ -518,7 +524,9 @@ If there are hidden columns in an Excel file, you may need to omit the hidden co ### NIST Common Data Format imports To import results from a file that is valid NIST V2 xml -- that can be formally validated against the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd) -- use the file_type 'nist_v2_xml' -Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration. +Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. For these files use the + +Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration. ### Difference-in-Difference calculations The system provides a way to calculate difference-in-difference statistics. 
For any particular election, `Analyzer.diff_in_diff_dem_vs_rep` produces a dataframe of values for any county with results by vote type, with Democratic or Republican candidates, and any comparable pair of contests both on some ballots in the county. Contests are considered "comparable" if their districts are of the same geographical district type -- e.g., both statewide, or both state-house, etc. The method also returns a list of jurisdictions for which vote counts were zero or missing. diff --git a/src/electiondata/__init__.py b/src/electiondata/__init__.py index 23c1eb86..da2cfad9 100644 --- a/src/electiondata/__init__.py +++ b/src/electiondata/__init__.py @@ -3010,15 +3010,15 @@ def export_nist( election: str, jurisdiction, ) -> Union[str, Dict[str, Any]]: - """picks either version 1.0 (json) or version 2.0 (xml) based on value of constants.nist_version""" - if electiondata.constants.nist_version == "1.0": - return self.export_nist_v1_json(election, jurisdiction) - elif electiondata.constants.nist_version == "2.0": - return self.export_nist_v2(election, jurisdiction) + """picks either json or xml based on value of constants.default_nist_format""" + if electiondata.constants.default_nist_format == "json": + return self.export_nist_json(election, jurisdiction) + elif electiondata.constants.default_nist_format == "xml": + return self.export_nist_xml_as_string(election, jurisdiction) else: return "" - def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any]: + def export_nist_json(self, election: str, jurisdiction: str) -> Dict[str, Any]: election_id = db.name_to_id(self.session, "Election", election) jurisdiction_id = db.name_to_id(self.session, "ReportingUnit", jurisdiction) @@ -3045,16 +3045,16 @@ def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any return election_report - def export_nist_v1( + def export_nist_json_as_string( self, election: str, jurisdiction: str, ) -> str: - """exports NIST v1 json string""" - 
json_string = json.dumps(self.export_nist_v1_json(election, jurisdiction)) + """exports NIST v2 json string""" + json_string = json.dumps(self.export_nist_json(election, jurisdiction)) return json_string - def export_nist_v2( + def export_nist_xml_as_string( self, election: str, jurisdiction: str, @@ -3716,7 +3716,7 @@ def compare_to_results_file( ) if not not_found_in_db.empty: nfid_str = ( - f"\nSome expected constests not found. For details, see {sub_dir}" + f"\nSome expected contests not found. For details, see {sub_dir}" ) err = ui.add_new_error( err, @@ -3925,7 +3925,7 @@ def load_results_df( err, "jurisdiction", juris_true_name, - f"No contest-selection pairs recognized via munger {munger_name}", + f"No contest-selection pairs recognized in file {file_name} via munger {munger_name}", ) return err diff --git a/src/electiondata/constants/__init__.py b/src/electiondata/constants/__init__.py index 5a3a6689..144b14a4 100644 --- a/src/electiondata/constants/__init__.py +++ b/src/electiondata/constants/__init__.py @@ -602,7 +602,7 @@ def jurisdiction_wide_contests(abbr: str) -> List[str]: # constants dictated by NIST if 1: - nist_version = "1.0" + default_nist_format = "json" # other option is "xml" default_issuer = ( "unspecified user of code base at github.com/ElectionDataAnalysis/electiondata" ) diff --git a/src/electiondata/munge/__init__.py b/src/electiondata/munge/__init__.py index 0c5aaec4..c10dcf7d 100644 --- a/src/electiondata/munge/__init__.py +++ b/src/electiondata/munge/__init__.py @@ -624,6 +624,8 @@ def melt_to_one_count_column( if "in_count_headers" in p["munge_field_types"]: # split header_0 column into separate columns # # get header_rows + # TODO: the following triggers a pandas PerformanceWarning for Kansas House of Representatives 2020g. 
Rather than + # assigning values, need to use melted = pd.concat([melted, ]) melted[ [f"count_header_{idx}" for idx in p["count_header_row_numbers"]] ] = pd.DataFrame(melted["header_0"].str.split(";:;", expand=True).values)[ @@ -691,8 +693,8 @@ def add_contest_id( working, new_err = replace_raw_with_internal_ids( working, juris_true_name, - file_name, munger_name, + file_name, df_for_type[c_type], f"{c_type}Contest", "Name", @@ -741,7 +743,8 @@ def add_contest_id( # fail if fatal errors or no contests recognized (in reverse order, just for fun if working_temp.empty: err = ui.add_new_error( - err, "jurisdiction", juris_true_name, f"No contests recognized." + err, "jurisdiction", juris_true_name, + f"No contests recognized from file {file_name} with munger {munger_name}." ) else: working = working_temp @@ -1979,7 +1982,8 @@ def to_standard_count_frame( ) # loop through dataframes in list - standard[sheet] = pd.DataFrame() + # create list of standard-form dataframes from dataframes in list + standard_list = list() for n in range(len(df_list)): raw = df_list[n] working = raw.copy() @@ -2050,9 +2054,12 @@ def to_standard_count_frame( # clean Unnamed:... 
out of any values working = blank_out(working, constants.pandas_default_pattern) - # append data from the nth dataframe to the standard-form dataframe + # append standard-form data from the nth dataframe to the list ## NB: if df_list[n] fails it should not reach this statement - standard[sheet] = pd.concat([standard[sheet], working]) + standard_list.append(working) + + # put all the good standard-form dataframes together into one + standard[sheet] = pd.concat(standard_list) # if even one df lacks a fatal error, consider all errors non-fatal for this sheet non_fatal_dfs = [ diff --git a/src/ini_files_for_results/Kansas/ks_20g_ks_house_official.ini b/src/ini_files_for_results/Kansas/ks_20g_ks_house_official.ini index 5e92957c..e2f48057 100755 --- a/src/ini_files_for_results/Kansas/ks_20g_ks_house_official.ini +++ b/src/ini_files_for_results/Kansas/ks_20g_ks_house_official.ini @@ -1,6 +1,6 @@ [election_results] results_file=Kansas/2020_General_Election_Kansas_House_of_Representatives_results_by_precinct.xlsx -munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3 +munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3,ks_gen_wyandotte_3_line_header_first_count_col_3,ks_gen_wyandotte_4_line_header_first_count_col_3_merged_rows jurisdiction=Kansas election=2020 General results_short_name=ks_20g_kshouse diff --git a/src/ini_files_for_results/New-Hampshire/nh20g_CD2_official.ini b/src/ini_files_for_results/New-Hampshire/nh20g_CD2_official.ini index 74845e47..1b942176 100644 --- a/src/ini_files_for_results/New-Hampshire/nh20g_CD2_official.ini +++ b/src/ini_files_for_results/New-Hampshire/nh20g_CD2_official.ini @@ -7,7 +7,6 @@ results_short_name=nh20g_cd2 results_download_date=2020-12-22 results_source=https://sos.nh.gov/elections/elections/election-results/ results_note=revised by hand to 
disambiguate counties & towns with same name (Carroll, Grafton, Hillsborough, Sullivan). Also, candidate Andrew Olding. As of 8/27/2021, the electiondata code throws a (seemingly harmless) warning when processing this file ( /usr/local/lib/python3.9/site-packages/openpyxl/worksheet/header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored - warn("""Cannot parse header or footer so it will be ignored""")) -CountItemType=total is_preliminary=False +CountItemType=total diff --git a/src/jurisdictions/Kansas/Candidate.txt b/src/jurisdictions/Kansas/Candidate.txt index 720a43fe..60a9426e 100644 --- a/src/jurisdictions/Kansas/Candidate.txt +++ b/src/jurisdictions/Kansas/Candidate.txt @@ -125,15 +125,12 @@ Rick Kloos Rachel Willis Brenda S. Dietrich Anthony Hensley -Under Votes -Over Votes Laura McConwell Ethan Corson Diana Whittington Cindy Holscher Vail Fruechting Ty Masterson -Total Votes Cast Timothy Don Fry II Mary Ware Dan Kerschen @@ -356,3 +353,6 @@ Vic (T-Bone) Miller Vicki Schmidt Virgil Weigel Wendy Bingesser +Jordan Michael Mackey +Greg Conchola +Rick Parsons diff --git a/src/jurisdictions/Kansas/dictionary.txt b/src/jurisdictions/Kansas/dictionary.txt index 5d45da7b..ecf7b35d 100644 --- a/src/jurisdictions/Kansas/dictionary.txt +++ b/src/jurisdictions/Kansas/dictionary.txt @@ -167,7 +167,6 @@ Candidate Molly Baumgardner Baumgardner, Molly Candidate Monica Murnan Murnan, Monica Candidate Nancy J. Ingle Ingle, Nancy J. Candidate Other Other -Candidate Over Votes Over Votes Candidate Pat Pettey Pat Pettey Candidate Pat Proctor Proctor, Pat Candidate Patrick Penn Penn, Patrick @@ -224,13 +223,11 @@ Candidate Todd Maddox Maddox, Todd Candidate Tom Hawk Hawk, Tom Candidate Tom Holland Holland, Tom Candidate Tory Marie Arnberger Arnberger, Tory Marie -Candidate Total Votes Cast Total Votes Cast Candidate Tracey Mann Mann, Tracey Candidate Trevor Jacobs Jacobs, Trevor Candidate Troy L. Waymaster Waymaster, Troy L. 
Candidate Ty Masterson Masterson, Ty Candidate Ty Masterson Ty Masterson -Candidate Under Votes Under Votes Candidate Vail Fruechting Vail Fruechting Candidate Virgil Peck Peck, Virgil Candidate W. Michael Shimeall Shimeall, W. Michael @@ -6761,3 +6758,6 @@ ReportingUnit Kansas;Wilson County Kansas;Wilson ReportingUnit Kansas;Woodson County Kansas;Woodson ReportingUnit Kansas;Wyandotte County Kansas;Wyandotte CandidateContest KS Attorney General KS;Attorney General;statewide +Candidate Jordan Michael Mackey Jordan Michael Mackey +Candidate Greg Conchola Greg Conchola +Candidate Rick Parsons Rick Parsons diff --git a/src/mungers/ks_gen_johnson_count_from_B.munger b/src/mungers/ks_gen_johnson_count_from_B.munger index 83c46dd8..09e6a9ea 100644 --- a/src/mungers/ks_gen_johnson_count_from_B.munger +++ b/src/mungers/ks_gen_johnson_count_from_B.munger @@ -49,14 +49,11 @@ CandidateContest= Party= - - - - # Values to ignore (optional) # [ignore] ## E.g: Candidate=Total Votes Cast,Registered Voters ## ReportingUnit=JOHNSON;COUNTY TOTALS,Johnson;COUNTY TOTALS +Candidate=Write-in,Under Votes,Over Votes # Lookup formula sections # ## Required when foreign keys are used in munge formulas and ## diff --git a/src/mungers/ks_gen_sedgwick.munger b/src/mungers/ks_gen_sedgwick.munger index cdd04118..3f735db3 100644 --- a/src/mungers/ks_gen_sedgwick.munger +++ b/src/mungers/ks_gen_sedgwick.munger @@ -62,7 +62,7 @@ Party={,^(\w\w\w) .*$} # Values to ignore (optional) # [ignore] ## E.g: Candidate=Total Votes Cast,Registered Voters ## -Candidate=Write-in Totals,Totals +Candidate=Write-in Totals,Totals,Total Votes Cast ReportingUnit=SEDGWICK;Totals,Sedgwick;Totals # Lookup formula sections # diff --git a/src/mungers/ks_gen_shawnee.munger b/src/mungers/ks_gen_shawnee.munger index c43d8b2a..e4d4844b 100644 --- a/src/mungers/ks_gen_shawnee.munger +++ b/src/mungers/ks_gen_shawnee.munger @@ -62,7 +62,7 @@ Party={,^(\w\w\w) .*$} # Values to ignore (optional) # [ignore] ## E.g: Candidate=Total 
Votes Cast,Registered Voters ## -Candidate=Write-in Totals +Candidate=Write-in Totals,Write-in # Lookup formula sections # ## Required when foreign keys are used in munge formulas and ## diff --git a/src/mungers/ks_gen_wyandotte_3_line_header_first_count_col_3.munger b/src/mungers/ks_gen_wyandotte_3_line_header_first_count_col_3.munger new file mode 100644 index 00000000..53ea47cd --- /dev/null +++ b/src/mungers/ks_gen_wyandotte_3_line_header_first_count_col_3.munger @@ -0,0 +1,70 @@ +# Format parameters section (required) # +[format] +## Required format parameters: +#### File type must be one of: excel,json-nested,xml,flat_text +file_type=excel +#### Counts are found in one way of: by_name,by_number +count_location=by_number:3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 + +merged_cells=yes + +################################################ +## Sometimes required format parameters: +#### for flat_text file type: +flat_text_delimiter= +#### if count_columns_specified is 'by_name': +count_fields_by_name= +#### if count_columns_specified is 'by_number': +#### if 'in_count_headers' is in munge_strings +#### (start numbering from first unskipped row): +count_header_row_numbers=0,1,2 +#### if 'constant_over_file' is in munge_strings (NB: give value for each in .ini file): +constant_over_file=CountItemType +#### if file type is flat_text or excel and count_columns_specified is 'by_name' +#### (start numbering from first unskipped row): +count_field_name_row= +#### if file type is flat_text or excel and not all rows are data: +#### (start numbering from first unskipped row): +noncount_header_row=0 + +################################################ +## Optional format parameters: +#### for any file type: +thousands_separator=, +encoding= + +#### for a flat_text or excel file type: +###### if field names are not given in file +#all_rows=data +###### if there are multiple blocks of data per page, each with its own headers +multi_block=yes + +#### for excel file 
type: +sheets_to_read_names=Wyandotte,WYANDOTTE +sheets_to_read_numbers= +sheets_to_skip_names= + +#### for xml file type +nesting_tags= + +# Munge formula sections (required if in munge_strings list) # +[munge formulas] +ReportingUnit=; +Candidate={,^(?:\w\w\w |)(.*)$} +CandidateContest= +Party={,^(\w\w\w) .*$} + + + +# Values to ignore (optional) # +[ignore] +## E.g: Candidate=Total Votes Cast,Registered Voters ## +ReportingUnit=WYANDOTTE;Totals,Wyandotte;Totals,WYANDOTTE; +Candidate=Write-in Totals + +# Lookup formula sections # +## Required when foreign keys are used in munge formulas and ## +## must be looked up in another table. ## +## See mi_gen18.munger for example ## +################################################################## + diff --git a/src/mungers/ks_gen_wyandotte_4_line_header_first_count_col_3.munger b/src/mungers/ks_gen_wyandotte_4_line_header_first_count_col_3.munger index 2d8133ac..2db800e2 100644 --- a/src/mungers/ks_gen_wyandotte_4_line_header_first_count_col_3.munger +++ b/src/mungers/ks_gen_wyandotte_4_line_header_first_count_col_3.munger @@ -60,7 +60,7 @@ Party={,^(\w\w\w) .*$} [ignore] ## E.g: Candidate=Total Votes Cast,Registered Voters ## ReportingUnit=WYANDOTTE;Totals,Wyandotte;Totals,WYANDOTTE; -Candidate=Write-in Totals +Candidate=Write-in Totals,Vote For 1 # Lookup formula sections # ## Required when foreign keys are used in munge formulas and ## diff --git a/src/mungers/ks_gen_wyandotte_4_line_header_first_count_col_3_merged_rows.munger b/src/mungers/ks_gen_wyandotte_4_line_header_first_count_col_3_merged_rows.munger new file mode 100644 index 00000000..9f334c50 --- /dev/null +++ b/src/mungers/ks_gen_wyandotte_4_line_header_first_count_col_3_merged_rows.munger @@ -0,0 +1,70 @@ +# Format parameters section (required) # +[format] +## Required format parameters: +#### File type must be one of: excel,json-nested,xml,flat_text +file_type=excel +#### Counts are found in one way of: by_name,by_number 
+count_location=by_number:3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 + +merged_cells=yes + +################################################ +## Sometimes required format parameters: +#### for flat_text file type: +flat_text_delimiter= +#### if count_columns_specified is 'by_name': +count_fields_by_name= +#### if count_columns_specified is 'by_number': +#### if 'in_count_headers' is in munge_strings +#### (start numbering from first unskipped row): +count_header_row_numbers=0,2,3 +#### if 'constant_over_file' is in munge_strings (NB: give value for each in .ini file): +constant_over_file=CountItemType +#### if file type is flat_text or excel and count_columns_specified is 'by_name' +#### (start numbering from first unskipped row): +count_field_name_row= +#### if file type is flat_text or excel and not all rows are data: +#### (start numbering from first unskipped row): +noncount_header_row=0 + +################################################ +## Optional format parameters: +#### for any file type: +thousands_separator=, +encoding= + +#### for a flat_text or excel file type: +###### if field names are not given in file +#all_rows=data +###### if there are multiple blocks of data per page, each with its own headers +multi_block=yes + +#### for excel file type: +sheets_to_read_names=Wyandotte,WYANDOTTE +sheets_to_read_numbers= +sheets_to_skip_names= + +#### for xml file type +nesting_tags= + +# Munge formula sections (required if in munge_strings list) # +[munge formulas] +ReportingUnit=; +Candidate={,^(?:\w\w\w |)(.*)$} +CandidateContest= +Party={,^(\w\w\w) .*$} + + + +# Values to ignore (optional) # +[ignore] +## E.g: Candidate=Total Votes Cast,Registered Voters ## +ReportingUnit=WYANDOTTE;Totals,Wyandotte;Totals,WYANDOTTE; +Candidate=Write-in Totals + +# Lookup formula sections # +## Required when foreign keys are used in munge formulas and ## +## must be looked up in another table. 
## +## See mi_gen18.munger for example ## +################################################################## + diff --git a/src/reference_results/Kansas.tsv b/src/reference_results/Kansas.tsv index fa9843e5..209240eb 100644 --- a/src/reference_results/Kansas.tsv +++ b/src/reference_results/Kansas.tsv @@ -2,8 +2,12 @@ Jurisdiction Election Contest ReportingUnit VoteType Count Status Source Note Kansas 2020 General US President (KS) Kansas total 1457960 official-final Kansas 2020 General US House KS District 3 Kansas total 410418 official-final Kansas 2020 General KS Senate District 15 Kansas total 23043 official-final -Kansas 2020 General KS House District 2 Kansas total 10874 official-final -Kansas 2020 General US President (KS) Kansas;Bourbon County total 6676 official-final +Kansas 2020 General KS Senate District 6 Kansas total 22583 official-final +Kansas 2020 General KS Senate District 35 Kansas total 28779 official-final +Kansas 2020 General KS House District 2 Kansas total 10874 official-final +Kansas 2020 General KS House District 32 Kansas total 3780 official-final +Kansas 2020 General KS House District 35 Kansas total 7106 official-final +Kansas 2020 General US President (KS) Kansas;Bourbon County total 6676 official-final Kansas 2000 General US President (KS) Kansas total 1072216 official-final https://doi.org/10.7910/DVN/VOQCHQ Kansas 2004 General US President (KS) Kansas total 1187756 official-final https://doi.org/10.7910/DVN/VOQCHQ Kansas 2008 General US President (KS) Kansas total 1235872 official-final https://doi.org/10.7910/DVN/VOQCHQ diff --git a/tests/analyzer_tests/test_exports.py b/tests/analyzer_tests/test_exports.py index c24de644..f5491502 100644 --- a/tests/analyzer_tests/test_exports.py +++ b/tests/analyzer_tests/test_exports.py @@ -5,7 +5,7 @@ import datetime -def test_nist_v2(analyzer, tests_path): +def test_nist_xml(analyzer, tests_path): """Tests whether length of nist v2 export string matches the standard. 
(Would be better to test that xml is equivalent, but that's harder.)""" # TODO restore test of nist v1 export @@ -15,11 +15,11 @@ def test_nist_v2(analyzer, tests_path): ) # test nist v2 export against sample file - new_str_v2 = analyzer.export_nist_v2("2020 General", "Wyoming") + new_str_v2 = analyzer.export_nist_xml_as_string("2020 General", "Wyoming") correct_str_v2 = open(nist_v2_reference_file, "r").read() # test nist v1 export against sample file - # new_str_v1 = f"{analyzer.export_nist_v1_json('2020 General', 'Wyoming')}" + # new_str_v1 = f"{analyzer.export_nist_json('2020 General', 'Wyoming')}" # correct_str_v1 = open(nist_v1_reference_file, "r").read() assert len(correct_str_v2) == len( @@ -36,7 +36,7 @@ def test_nist_v1(analyzer, tests_path): ) # test nist v1 export against sample file - new_str_v1 = f"{analyzer.export_nist_v1_json('2020 General', 'Wyoming')}" + new_str_v1 = f"{analyzer.export_nist_json('2020 General', 'Wyoming')}" correct_str_v1 = open(nist_v1_reference_file, "r").read() assert len(correct_str_v1) == len(new_str_v1) diff --git a/tests/dataloading_tests/conftest.py b/tests/dataloading_tests/conftest.py index 322bbe0f..326efd98 100644 --- a/tests/dataloading_tests/conftest.py +++ b/tests/dataloading_tests/conftest.py @@ -30,7 +30,7 @@ def test_data_url(request): return request.config.getoption("--test_data_url") -@pytest.fixture(scope="session") +@pytest.fixture(scope="function") def dataloader(param_file): ts = datetime.datetime.now().strftime("%m%d_%H%M") dbname = f"test_{ts}" diff --git a/tests/dataloading_tests/test_dataloading_by_ej.py b/tests/dataloading_tests/test_dataloading_by_ej.py new file mode 100644 index 00000000..52672ba9 --- /dev/null +++ b/tests/dataloading_tests/test_dataloading_by_ej.py @@ -0,0 +1,41 @@ +import pytest +import os +from pathlib import Path + + +def test_dataloader_exists(dataloader): + assert dataloader is not None, ( + "Specify viable dataloader parameter file path with --param_file option to pytest" + " or 
correct default param_file (run_time.ini in same folder as test_dataloading.py)" + ) + + +def test_loading(dataloader, test_data_url, param_file): + dataloader.get_testing_data_from_git_repo(test_data_url) + successfully_loaded, failed_to_load, all_tests_passed, err = dataloader.load_all( + move_files=False, + rollup=True, + ) + print( + f"successfully loaded:\n{successfully_loaded}\n\n" + f"failed to load:\n{failed_to_load}\n\n" + f"passed all tests?:\n{all_tests_passed}" + ) + # for all election-jurisdiction pairs attempted, can't have any files failing to load, and must have + # all tests passed. + # set of ej-pairs attempted cannot be null + attempted_pairs = successfully_loaded.keys() + assert ( + attempted_pairs + ), "No loadable results files found; check run_time.ini or specified parameter file" + + # all files loaded successfully + assert all( + [v == list() for v in failed_to_load.values()] + ), f"Not all files loaded successfully." + + # all tests passed + assert all([v for v in all_tests_passed.values()]), ( + "Some tests failed. 
For more information " + f"see reports in {dataloader.d['reports_and_plots_dir']}" + ) diff --git a/tests/dataloading_tests/test_dataloading.py b/tests/dataloading_tests/test_dataloading_multi.py similarity index 61% rename from tests/dataloading_tests/test_dataloading.py rename to tests/dataloading_tests/test_dataloading_multi.py index bc427b3c..d9fcdb6f 100644 --- a/tests/dataloading_tests/test_dataloading.py +++ b/tests/dataloading_tests/test_dataloading_multi.py @@ -10,37 +10,6 @@ def test_dataloader_exists(dataloader): ) -def test_loading(dataloader, test_data_url, param_file): - dataloader.get_testing_data_from_git_repo(test_data_url) - successfully_loaded, failed_to_load, all_tests_passed, err = dataloader.load_all( - move_files=False, - rollup=True, - ) - print( - f"successfully loaded:\n{successfully_loaded}\n\n" - f"failed to load:\n{failed_to_load}\n\n" - f"passed all tests?:\n{all_tests_passed}" - ) - # for all election-jurisdiction pairs attempted, can't have any files failing to load, and must have - # all tests passed. - # set of ej-pairs attempted cannot be null - attempted_pairs = successfully_loaded.keys() - assert ( - attempted_pairs - ), "No loadable results files found; check run_time.ini or specified parameter file" - - # all files loaded successfully - assert all( - [v == list() for v in failed_to_load.values()] - ), f"Not all files loaded successfully." - - # all tests passed - assert all([v for v in all_tests_passed.values()]), ( - "Some tests failed. For more information " - f"see reports in {dataloader.d['reports_and_plots_dir']}" - ) - - def test_multielection_loading(dataloader): tests_dir = Path(__file__).parents[1] reference_results = os.path.join(