Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output folder DNE fix and audit log addition #847

Merged
merged 9 commits into from
Jun 23, 2024
43 changes: 27 additions & 16 deletions src/main/java/network/brightspots/rcv/TabulatorSession.java
Original file line number Diff line number Diff line change
Expand Up @@ -105,11 +105,11 @@ boolean convertToCdf(BiConsumer<Double, Double> progressUpdate) {

Progress progress = new Progress(config, 0, progressUpdate);

if (setUpLogging(config.getOutputDirectory()) && config.validate().isEmpty()) {
if (setUpLogging(config.getOutputDirectory(), "to-cdf") && config.validate().isEmpty()) {
Logger.info("Converting CVR(s) to CDF...");
try {
FileUtils.createOutputDirectory(config.getOutputDirectory());
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, false);
if (!castVoteRecords.successfullyReadAll) {
Logger.severe("Aborting conversion due to cast vote record errors!");
} else {
Expand Down Expand Up @@ -152,8 +152,16 @@ boolean convertToCdf() {
LoadedCvrData parseAndCountCastVoteRecords(BiConsumer<Double, Double> progressUpdate)
throws CastVoteRecordGenericParseException {
ContestConfig config = ContestConfig.loadContestConfig(configPath);
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory(), "check-ballot-counts");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the stuff in these logs going to be duplicated in the "tabulate" log so, I'd argue that we shouldn't log in this case at all (which was an open question in #846). This would also allow us to remove the extra prefix field, which is a bonus IMO. Reasons:

  1. No tabulation actually occurs / no files are written
  2. Adds clutter (HD space and number of files)
  3. The prefixes make it so log files are no longer at the top of the directory (since "audit" comes before most other words), and that they're potentially split in the file list, so harder to find all relevant logs
  4. Simpler code

CC: @yezr to comment if necessary.

Copy link
Collaborator Author

@artoonie artoonie Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all your points, happy to defer to @yezr.

One benefit is that it may be confusing to have things outputted to the GUI log that are not in the audit log. But it's essentially duplicated / irrelevant info.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking with @yezr:

  • Don't include Chehck Ballot Counts audit log
  • Do check checksums to ensure no change between Check and Tabulate

I'll get to this!

Copy link
Collaborator Author

@artoonie artoonie Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into this and it's not as straightforward as I had hoped because some CVRs are directories, not files.

Some considerations:

  1. I could hash every file in a directory, but that several downsides, including having to recursively search a potentially very big directory with unnecessary files. We don't mandate that the CVR directory is free of clutter today, and I don't know if we should add that restriction. What if the Dominion CVR directory is just C:/ ? It will also fail if files are opened and temp hidden auto-save files are created, or .DS_STORE on mac.
  2. I could enumerate every file we read in the directory, but this feels error-prone: what if a future release reads a new file from the directory, and we forget to add it to the list of files to hash? That feels like an easy-to-miss bug.

(1) is easier, but (2) seems safer. Still, it makes me want to move away from file hashing, and towards something a little cleaner.

  1. I could hash each CVR, but I run into the same issue as (2): what if, in the future, a new field is added? This feels like a hard-to-catch bug once again, requiring manually spotting it via a PR.
  2. I could hash the entire RCTab CVR, since we have plans to test and mandate that this file contains all information needed to exactly reproduce the election results.

So I'm leaning towards the latter: a checksum validation of the RCTab CVR. The downside is: (1) that means the file can never include timestamps, and (2) we can't inform the user which of their CVR sources has changed.

To sum up the four options:

Checksum Of Downside
Every file in Directory Clutter in directory
Only relevant files in directory Easy to miss a file in the future
Every field in the CastVoteRecord data structure Easy to miss a field in the future
Entire RCTab CVR Lost mapping from changed field to CVR that changed; requires generating this file twice

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a can of worms! I think we should kick the can of worms down the road and break this out into a separate issue, just submitting this PR as-is with the counting logging removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #853 to address stricter CVR comparisons outside of this PR

if (!setUpLoggingSuccess) {
Logger.severe("Failed to configure logger!");
throw new CastVoteRecordGenericParseException();
}

Progress progress = new Progress(config, 0, progressUpdate);
return parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, false);
Logger.removeTabulationFileLogging();
return castVoteRecords;
}

// Returns a List of exception class names that were thrown while tabulating.
Expand All @@ -168,7 +176,7 @@ List<String> tabulate(String operatorName, LoadedCvrData expectedCvrData,
ContestConfig config = ContestConfig.loadContestConfig(configPath);
checkConfigVersionMatchesApp(config);
boolean tabulationSuccess = false;
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory());
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory(), "tabulate");

if (operatorName == null || operatorName.isBlank()) {
Logger.severe("Operator name is required for the audit logs!");
Expand Down Expand Up @@ -208,7 +216,7 @@ List<String> tabulate(String operatorName, LoadedCvrData expectedCvrData,
// Read cast vote records and slice IDs from CVR files
Set<String> newWinnerSet;
try {
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, true);
if (config.getSequentialWinners().isEmpty()
&& !castVoteRecords.metadataMatches(expectedCvrData)) {
Logger.severe("CVR data has changed between loading the CVRs and reading them!");
Expand Down Expand Up @@ -245,7 +253,7 @@ List<String> tabulate(String operatorName, LoadedCvrData expectedCvrData,
// normal operation (not multi-pass IRV, a.k.a. sequential multi-seat)
// Read cast vote records and precinct IDs from CVR files
try {
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, true);
if (!castVoteRecords.metadataMatches(expectedCvrData)) {
Logger.severe("CVR data has changed between loading the CVRs and reading them!");
exceptionsEncountered.add(TabulationAbortedException.class.toString());
Expand Down Expand Up @@ -278,20 +286,21 @@ List<String> tabulate(String operatorName) {
Set<String> loadSliceNamesFromCvrs(ContestConfig.TabulateBySlice slice, ContestConfig config) {
Progress progress = new Progress(config, 0, null);
try {
List<CastVoteRecord> castVoteRecords = parseCastVoteRecords(config, progress).getCvrs();
List<CastVoteRecord> castVoteRecords = parseCastVoteRecords(
config, progress, false).getCvrs();
return new Tabulator(castVoteRecords, config).getEnabledSliceIds().get(slice);
} catch (TabulationAbortedException | CastVoteRecordGenericParseException e) {
throw new RuntimeException(e);
}
}

private boolean setUpLogging(String outputDirectory) {
private boolean setUpLogging(String outputDirectory, String prefix) {
boolean success = false;
// cache outputPath for testing
outputPath = outputDirectory;
try {
FileUtils.createOutputDirectory(outputDirectory);
Logger.addTabulationFileLogging(outputDirectory, timestampString);
Logger.addTabulationFileLogging(outputDirectory, timestampString + "_" + prefix);
success = true;
} catch (UnableToCreateDirectoryException | IOException exception) {
Logger.severe("Failed to configure tabulation logger!\n%s", exception);
Expand Down Expand Up @@ -321,8 +330,8 @@ private Set<String> runTabulationForConfig(
// parse CVR files referenced in the ContestConfig object into a list of CastVoteRecords
// param: config object containing CVR file paths to parse
// returns: list of parsed CVRs or null if an error was encountered
private LoadedCvrData parseCastVoteRecords(ContestConfig config, Progress progress)
throws CastVoteRecordGenericParseException {
private LoadedCvrData parseCastVoteRecords(ContestConfig config, Progress progress,
boolean outputGenericCvr) throws CastVoteRecordGenericParseException {
Logger.info("Parsing cast vote records...");
List<CastVoteRecord> castVoteRecords = new ArrayList<>();
boolean encounteredSourceProblem = false;
Expand Down Expand Up @@ -397,16 +406,18 @@ private LoadedCvrData parseCastVoteRecords(ContestConfig config, Progress progre
}

// Output the RCTab-CSV CVR
try {
ResultsWriter writer =
if (outputGenericCvr) {
try {
ResultsWriter writer =
new ResultsWriter().setContestConfig(config).setTimestampString(timestampString);
this.convertedFilePath =
this.convertedFilePath =
writer.writeRctabCvrCsv(
castVoteRecords,
cvrSourceData,
config.getOutputDirectory());
} catch (IOException exception) {
// error already logged in ResultsWriter
} catch (IOException exception) {
// error already logged in ResultsWriter
}
}

if (encounteredSourceProblem) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
"firstVoteColumnIndex" : "2",
"firstVoteRowIndex" : "2",
"idColumnIndex" : "1",
"precinctColumnIndex" : "2",
"precinctColumnIndex" : "",
"overvoteDelimiter" : "",
"provider" : "ess",
"overvoteLabel" : "overvote",
Expand Down
Loading