Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output folder DNE fix and audit log addition #847

Merged
merged 9 commits into from
Jun 23, 2024

Conversation

artoonie
Copy link
Collaborator

@artoonie artoonie commented Jun 14, 2024

closes #846

Open questions to be discussed in the Issue: should we include the audit log for the Check Ballot Counts step?

This PR does two things:

  1. A flag to opt to create the RCTab Generic CVR when reading CVRs
  2. TBD, a prefix to the audit log filename

@@ -152,8 +152,16 @@ boolean convertToCdf() {
LoadedCvrData parseAndCountCastVoteRecords(BiConsumer<Double, Double> progressUpdate)
throws CastVoteRecordGenericParseException {
ContestConfig config = ContestConfig.loadContestConfig(configPath);
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory(), "check-ballot-counts");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the stuff in these logs going to be duplicated in the "tabulate" log so, I'd argue that we shouldn't log in this case at all (which was an open question in #846). This would also allow us to remove the extra prefix field, which is a bonus IMO. Reasons:

  1. No tabulation actually occurs / no files are written
  2. Adds clutter (HD space and number of files)
  3. The prefixes make it so log files are no longer at the top of the directory (since "audit" comes before most other words), and that they're potentially split in the file list, so harder to find all relevant logs
  4. Simpler code

CC: @yezr to comment if necessary.

Copy link
Collaborator Author

@artoonie artoonie Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all your points, happy to defer to @yezr.

One benefit is that it may be confusing to have things outputted to the GUI log that are not in the audit log. But it's essentially duplicated / irrelevant info.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking with @yezr:

  • Don't include Chehck Ballot Counts audit log
  • Do check checksums to ensure no change between Check and Tabulate

I'll get to this!

Copy link
Collaborator Author

@artoonie artoonie Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into this and it's not as straightforward as I had hoped because some CVRs are directories, not files.

Some considerations:

  1. I could hash every file in a directory, but that several downsides, including having to recursively search a potentially very big directory with unnecessary files. We don't mandate that the CVR directory is free of clutter today, and I don't know if we should add that restriction. What if the Dominion CVR directory is just C:/ ? It will also fail if files are opened and temp hidden auto-save files are created, or .DS_STORE on mac.
  2. I could enumerate every file we read in the directory, but this feels error-prone: what if a future release reads a new file from the directory, and we forget to add it to the list of files to hash? That feels like an easy-to-miss bug.

(1) is easier, but (2) seems safer. Still, it makes me want to move away from file hashing, and towards something a little cleaner.

  1. I could hash each CVR, but I run into the same issue as (2): what if, in the future, a new field is added? This feels like a hard-to-catch bug once again, requiring manually spotting it via a PR.
  2. I could hash the entire RCTab CVR, since we have plans to test and mandate that this file contains all information needed to exactly reproduce the election results.

So I'm leaning towards the latter: a checksum validation of the RCTab CVR. The downside is: (1) that means the file can never include timestamps, and (2) we can't inform the user which of their CVR sources has changed.

To sum up the four options:

Checksum Of Downside
Every file in Directory Clutter in directory
Only relevant files in directory Easy to miss a file in the future
Every field in the CastVoteRecord data structure Easy to miss a field in the future
Entire RCTab CVR Lost mapping from changed field to CVR that changed; requires generating this file twice

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a can of worms! I think we should kick the can of worms down the road and break this out into a separate issue, just submitting this PR as-is with the counting logging removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #853 to address stricter CVR comparisons outside of this PR

Armin Samii added 2 commits June 19, 2024 18:30
…output-folder-dne

Also remove double tabulation logging
@artoonie artoonie merged commit 04a3835 into develop Jun 23, 2024
1 check passed
@artoonie artoonie deleted the feature/issue-846_output-folder-dne branch June 23, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When output folder doesn't exist, error shows in UI
3 participants