Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output folder DNE fix and audit log addition #847

Merged
merged 9 commits into from
Jun 23, 2024
2 changes: 1 addition & 1 deletion src/main/java/network/brightspots/rcv/ResultsWriter.java
Original file line number Diff line number Diff line change
Expand Up @@ -554,7 +554,7 @@ void generateOverallSummaryFiles(
// Note that the castVoteRecords list MUST be stable, as cvrSourceData
// relies on its exact ordering to determine which source each record came from.
// Returns the filepath written
String writeRctabCvrCsv(
String writeRcTabCvrCsv(
List<CastVoteRecord> castVoteRecords,
List<CvrSourceData> cvrSourceData,
String csvOutputFolder)
Expand Down
127 changes: 69 additions & 58 deletions src/main/java/network/brightspots/rcv/TabulatorSession.java
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,6 @@
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import javafx.util.Pair;
import network.brightspots.rcv.CastVoteRecord.CvrParseException;
import network.brightspots.rcv.ContestConfig.Provider;
import network.brightspots.rcv.ContestConfig.UnrecognizedProviderException;
Expand Down Expand Up @@ -105,11 +104,12 @@ boolean convertToCdf(BiConsumer<Double, Double> progressUpdate) {

Progress progress = new Progress(config, 0, progressUpdate);

if (setUpLogging(config.getOutputDirectory()) && config.validate().isEmpty()) {
if (setUpLogging(config.getOutputDirectory(), "convert-to-cdf")
&& config.validate().isEmpty()) {
Logger.info("Converting CVR(s) to CDF...");
try {
FileUtils.createOutputDirectory(config.getOutputDirectory());
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, false);
if (!castVoteRecords.successfullyReadAll) {
Logger.severe("Aborting conversion due to cast vote record errors!");
} else {
Expand Down Expand Up @@ -145,30 +145,40 @@ boolean convertToCdf(BiConsumer<Double, Double> progressUpdate) {
return conversionSuccess;
}

boolean convertToCdf() {
return convertToCdf(null);
void convertToCdf() {
convertToCdf(null);
}

LoadedCvrData parseAndCountCastVoteRecords(BiConsumer<Double, Double> progressUpdate)
throws CastVoteRecordGenericParseException {
ContestConfig config = ContestConfig.loadContestConfig(configPath);
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory(), "check-ballot-counts");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the stuff in these logs going to be duplicated in the "tabulate" log so, I'd argue that we shouldn't log in this case at all (which was an open question in #846). This would also allow us to remove the extra prefix field, which is a bonus IMO. Reasons:

  1. No tabulation actually occurs / no files are written
  2. Adds clutter (HD space and number of files)
  3. The prefixes make it so log files are no longer at the top of the directory (since "audit" comes before most other words), and that they're potentially split in the file list, so harder to find all relevant logs
  4. Simpler code

CC: @yezr to comment if necessary.

Copy link
Collaborator Author

@artoonie artoonie Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all your points, happy to defer to @yezr.

One benefit is that it may be confusing to have things outputted to the GUI log that are not in the audit log. But it's essentially duplicated / irrelevant info.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking with @yezr:

  • Don't include Chehck Ballot Counts audit log
  • Do check checksums to ensure no change between Check and Tabulate

I'll get to this!

Copy link
Collaborator Author

@artoonie artoonie Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into this and it's not as straightforward as I had hoped because some CVRs are directories, not files.

Some considerations:

  1. I could hash every file in a directory, but that several downsides, including having to recursively search a potentially very big directory with unnecessary files. We don't mandate that the CVR directory is free of clutter today, and I don't know if we should add that restriction. What if the Dominion CVR directory is just C:/ ? It will also fail if files are opened and temp hidden auto-save files are created, or .DS_STORE on mac.
  2. I could enumerate every file we read in the directory, but this feels error-prone: what if a future release reads a new file from the directory, and we forget to add it to the list of files to hash? That feels like an easy-to-miss bug.

(1) is easier, but (2) seems safer. Still, it makes me want to move away from file hashing, and towards something a little cleaner.

  1. I could hash each CVR, but I run into the same issue as (2): what if, in the future, a new field is added? This feels like a hard-to-catch bug once again, requiring manually spotting it via a PR.
  2. I could hash the entire RCTab CVR, since we have plans to test and mandate that this file contains all information needed to exactly reproduce the election results.

So I'm leaning towards the latter: a checksum validation of the RCTab CVR. The downside is: (1) that means the file can never include timestamps, and (2) we can't inform the user which of their CVR sources has changed.

To sum up the four options:

Checksum Of Downside
Every file in Directory Clutter in directory
Only relevant files in directory Easy to miss a file in the future
Every field in the CastVoteRecord data structure Easy to miss a field in the future
Entire RCTab CVR Lost mapping from changed field to CVR that changed; requires generating this file twice

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a can of worms! I think we should kick the can of worms down the road and break this out into a separate issue, just submitting this PR as-is with the counting logging removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #853 to address stricter CVR comparisons outside of this PR

if (!setUpLoggingSuccess) {
Logger.severe("Failed to configure logger!");
throw new CastVoteRecordGenericParseException();
}

Progress progress = new Progress(config, 0, progressUpdate);
return parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, false);
Logger.removeTabulationFileLogging();
return castVoteRecords;
}

// Returns a List of exception class names that were thrown while tabulating.
// Operator name is required for the audit logs.
// Note: An exception MUST be returned any time tabulation does not run.
// In general, that means any Logger.severe in this function should be accompanied
// by an exceptionsEncountered.add(...) call.
List<String> tabulate(String operatorName, LoadedCvrData expectedCvrData,
BiConsumer<Double, Double> progressUpdate) {
List<String> tabulate(
String operatorName,
LoadedCvrData expectedCvrData,
BiConsumer<Double, Double> progressUpdate) {
Logger.info("Starting tabulation session...");
List<String> exceptionsEncountered = new LinkedList<>();
ContestConfig config = ContestConfig.loadContestConfig(configPath);
checkConfigVersionMatchesApp(config);
boolean tabulationSuccess = false;
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory());
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory(), "tabulate");

if (operatorName == null || operatorName.isBlank()) {
Logger.severe("Operator name is required for the audit logs!");
Expand Down Expand Up @@ -208,9 +218,9 @@ List<String> tabulate(String operatorName, LoadedCvrData expectedCvrData,
// Read cast vote records and slice IDs from CVR files
Set<String> newWinnerSet;
try {
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, true);
if (config.getSequentialWinners().isEmpty()
&& !castVoteRecords.metadataMatches(expectedCvrData)) {
&& !castVoteRecords.metadataMatches(expectedCvrData)) {
Logger.severe("CVR data has changed between loading the CVRs and reading them!");
exceptionsEncountered.add(TabulationAbortedException.class.toString());
break;
Expand Down Expand Up @@ -245,7 +255,7 @@ List<String> tabulate(String operatorName, LoadedCvrData expectedCvrData,
// normal operation (not multi-pass IRV, a.k.a. sequential multi-seat)
// Read cast vote records and precinct IDs from CVR files
try {
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress);
LoadedCvrData castVoteRecords = parseCastVoteRecords(config, progress, true);
if (!castVoteRecords.metadataMatches(expectedCvrData)) {
Logger.severe("CVR data has changed between loading the CVRs and reading them!");
exceptionsEncountered.add(TabulationAbortedException.class.toString());
Expand Down Expand Up @@ -278,20 +288,21 @@ List<String> tabulate(String operatorName) {
Set<String> loadSliceNamesFromCvrs(ContestConfig.TabulateBySlice slice, ContestConfig config) {
Progress progress = new Progress(config, 0, null);
try {
List<CastVoteRecord> castVoteRecords = parseCastVoteRecords(config, progress).getCvrs();
List<CastVoteRecord> castVoteRecords =
parseCastVoteRecords(config, progress, false).getCvrs();
return new Tabulator(castVoteRecords, config).getEnabledSliceIds().get(slice);
} catch (TabulationAbortedException | CastVoteRecordGenericParseException e) {
throw new RuntimeException(e);
}
}

private boolean setUpLogging(String outputDirectory) {
private boolean setUpLogging(String outputDirectory, String filenamePrefix) {
boolean success = false;
// cache outputPath for testing
outputPath = outputDirectory;
try {
FileUtils.createOutputDirectory(outputDirectory);
Logger.addTabulationFileLogging(outputDirectory, timestampString);
Logger.addTabulationFileLogging(outputDirectory, timestampString + "_" + filenamePrefix);
success = true;
} catch (UnableToCreateDirectoryException | IOException exception) {
Logger.severe("Failed to configure tabulation logger!\n%s", exception);
Expand All @@ -318,11 +329,18 @@ private Set<String> runTabulationForConfig(
return winners;
}

// parse CVR files referenced in the ContestConfig object into a list of CastVoteRecords
// param: config object containing CVR file paths to parse
// returns: list of parsed CVRs or null if an error was encountered
private LoadedCvrData parseCastVoteRecords(ContestConfig config, Progress progress)
throws CastVoteRecordGenericParseException {
/**
* Parse CVR files referenced in the ContestConfig object into a list of CastVoteRecords.
*
* @param config Object containing CVR file paths to parse.
* @param progress Object tracking progress of parsing the CVRs.
* @param shouldOutputRcTabCvr Whether to output the simplified RCTab CVR CSV file.
* @return List of parsed CVRs or null if an error was encountered.
* @throws CastVoteRecordGenericParseException If any failure occurs when parsing CVRs.
*/
private LoadedCvrData parseCastVoteRecords(
ContestConfig config, Progress progress, boolean shouldOutputRcTabCvr)
throws CastVoteRecordGenericParseException {
Logger.info("Parsing cast vote records...");
List<CastVoteRecord> castVoteRecords = new ArrayList<>();
boolean encounteredSourceProblem = false;
Expand All @@ -332,7 +350,7 @@ private LoadedCvrData parseCastVoteRecords(ContestConfig config, Progress progre

// At each iteration of the following loop, we add records from another source file.
for (int sourceIndex = 0; sourceIndex < config.rawConfig.cvrFileSources.size(); ++sourceIndex) {
RawContestConfig.CvrSource source = config.rawConfig.cvrFileSources.get(sourceIndex);
RawContestConfig.CvrSource source = config.rawConfig.cvrFileSources.get(sourceIndex);
String cvrPath = config.resolveConfigPath(source.getFilePath());
Provider provider = ContestConfig.getProvider(source);
try {
Expand All @@ -342,12 +360,9 @@ private LoadedCvrData parseCastVoteRecords(ContestConfig config, Progress progre
reader.readCastVoteRecords(castVoteRecords);

// Update the per-source data for the results writer
cvrSourceData.add(new ResultsWriter.CvrSourceData(
source,
reader,
sourceIndex,
startIndex,
castVoteRecords.size() - 1));
cvrSourceData.add(
new ResultsWriter.CvrSourceData(
source, reader, sourceIndex, startIndex, castVoteRecords.size() - 1));

// Check for unrecognized candidates
Map<String, Integer> unrecognizedCandidateCounts =
Expand Down Expand Up @@ -396,17 +411,16 @@ private LoadedCvrData parseCastVoteRecords(ContestConfig config, Progress progre
progress.markFileRead();
}

// Output the RCTab-CSV CVR
try {
ResultsWriter writer =
new ResultsWriter().setContestConfig(config).setTimestampString(timestampString);
this.convertedFilePath =
writer.writeRctabCvrCsv(
castVoteRecords,
cvrSourceData,
config.getOutputDirectory());
} catch (IOException exception) {
// error already logged in ResultsWriter
// Output the simplified RCTab CVR CSV file
if (shouldOutputRcTabCvr) {
try {
ResultsWriter writer =
new ResultsWriter().setContestConfig(config).setTimestampString(timestampString);
this.convertedFilePath =
writer.writeRcTabCvrCsv(castVoteRecords, cvrSourceData, config.getOutputDirectory());
} catch (IOException exception) {
// error already logged in ResultsWriter
}
}

if (encounteredSourceProblem) {
Expand Down Expand Up @@ -436,27 +450,25 @@ static class UnrecognizedCandidatesException extends Exception {
}
}

static class CastVoteRecordGenericParseException extends Exception {
}
static class CastVoteRecordGenericParseException extends Exception {}

/**
* A summary of the cast vote records that have been read.
* Manages CVR in memory, so you can retain metadata about the loaded CVRs without
* keeping them all in memory. Use .discard() to free up memory. After memory is freed,
* all other operations except for getCvrs() are still valid.
* A summary of the cast vote records that have been read. Manages CVR in memory, so you can
* retain metadata about the loaded CVRs without keeping them all in memory. Use .discard() to
* free up memory. After memory is freed, all other operations except for getCvrs() are still
* valid.
*/
public static class LoadedCvrData {
public static final LoadedCvrData MATCHES_ALL = new LoadedCvrData();
public final boolean successfullyReadAll;

private List<CastVoteRecord> cvrs;
private final int numCvrs;
private List<ResultsWriter.CvrSourceData> cvrSourcesData;
private final List<ResultsWriter.CvrSourceData> cvrSourcesData;
private boolean isDiscarded;
private final boolean doesMatchAllMetadata;

public LoadedCvrData(List<CastVoteRecord> cvrs,
List<ResultsWriter.CvrSourceData> cvrSourcesData) {
LoadedCvrData(List<CastVoteRecord> cvrs, List<ResultsWriter.CvrSourceData> cvrSourcesData) {
this.cvrs = cvrs;
this.successfullyReadAll = cvrs != null;
this.numCvrs = cvrs != null ? cvrs.size() : 0;
Expand All @@ -466,8 +478,8 @@ public LoadedCvrData(List<CastVoteRecord> cvrs,
}

/**
* This constructor will cause metadataMatches to always return true,
* and contains no true statistics.
* This constructor will cause metadataMatches to always return true, and contains no true
* statistics.
*/
private LoadedCvrData() {
this.cvrs = null;
Expand All @@ -479,23 +491,23 @@ private LoadedCvrData() {
}

/**
* Currently only checks if the number of CVRs matches, but can be extended to ensure
* exact matches or meet other needs.
* Currently only checks if the number of CVRs matches, but can be extended to ensure exact
* matches or meet other needs.
*
* @param other The loaded CVRs to compare against
* @return whether the metadata matches
*/
public boolean metadataMatches(LoadedCvrData other) {
return other.doesMatchAllMetadata
|| this.doesMatchAllMetadata
|| other.numCvrs() == this.numCvrs();
|| this.doesMatchAllMetadata
|| other.numCvrs() == this.numCvrs();
}

public int numCvrs() {
return numCvrs;
}

public List<ResultsWriter.CvrSourceData> getCvrSourcesData() {
List<ResultsWriter.CvrSourceData> getCvrSourcesData() {
return cvrSourcesData;
}

Expand All @@ -504,7 +516,7 @@ public void discard() {
isDiscarded = true;
}

public List<CastVoteRecord> getCvrs() {
List<CastVoteRecord> getCvrs() {
if (isDiscarded) {
throw new IllegalStateException("CVRs have been discarded from memory.");
}
Expand All @@ -514,11 +526,10 @@ public List<CastVoteRecord> getCvrs() {
public void printSummary() {
Logger.info("Cast Vote Record summary:");
for (ResultsWriter.CvrSourceData sourceData : cvrSourcesData) {
Logger.info("Source %d: %s",
sourceData.sourceIndex + 1, sourceData.source.getFilePath());
Logger.info("Source %d: %s", sourceData.sourceIndex + 1, sourceData.source.getFilePath());
Logger.info(" uses provider: %s", sourceData.source.getProvider());
Logger.info(" read %d cast vote records", sourceData.getNumCvrs());
}
}
}
}
}
Loading