deprecate the concept of "codes" (almost) entirely #690

Merged
HEdingfield merged 6 commits into develop from feature/issue-663_deprecate-codes on Jun 14, 2023

Collaborator

@artoonie artoonie commented Jun 9, 2023

resolves #663

This removes the concept of "codes" entirely. In doing so, all Dominion tests have been updated to use candidate names instead of codes.

To ensure that the tests changed only in the ways we expect, I didn't simply replace the test files with their updated versions; I used the script below to update each test. It's not documented because it's a one-time throwaway, but let me know if you have questions about it:

import json
import sys

# To run: python3 test.py <test-directory-name> [--save].
base = sys.argv[1]
doSave = len(sys.argv) > 2 and sys.argv[2] == "--save"
print(f"working on {base} and doSave={doSave}")

# Map candidate codes to candidate names
codeToName = {}
with open(f"{base}/{base}_config.json", 'r') as f:
    data = json.loads(f.read())
for candidate in data['candidates']:
    codeToName[candidate['code']] = candidate['name']

# Undeclared write-in maps to itself, e.g. 12=12. Output will have the numerical code.
uwiLabel = data['cvrFileSources'][0]['undeclaredWriteInLabel']
codeToName[uwiLabel] = uwiLabel
contestId = data['cvrFileSources'][0]['contestId']

# For each line, replace candidate codes with their names instead.
# The first six columns are ballot metadata (Contest Id through Precinct Portion);
# rankings start at column index 6.
lines = []
with open(f"{base}/{base}_contest_{contestId}_expected.csv", 'r') as f:
    for rawline in f.readlines():
        if rawline.startswith('Contest'):
            # Header row: keep as-is.
            lines.append(rawline.strip())
        else:
            line = rawline.strip().split(',')
            for i,item in enumerate(line[6:]):
                # Undervotes and overvotes are not candidates; leave them untouched.
                if item.strip() in ['undervote', 'overvote']:
                    continue
                line[i+6] = codeToName[item.strip()]
            lines.append(','.join(line))

print('\n-----\n'.join(lines))

if doSave:
    with open(f"{base}/{base}_contest_{contestId}_expected.csv", 'w') as f:
        f.write('\n'.join(lines))

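For expected files where a candidate name might contain a comma, the same remapping could be sketched with Python's `csv` module, which quotes such fields on output. This is an illustrative variant, not the script that was run for this PR; the function name `remap_rows`, the constants, and the sample mapping are hypothetical.

```python
import csv
import io

# Number of leading metadata columns (Contest Id ... Precinct Portion).
FIRST_RANK_COLUMN = 6
# Ranking values that are not candidates and should pass through unchanged.
NON_CANDIDATE_LABELS = {"undervote", "overvote"}


def remap_rows(csv_text, code_to_name):
    """Return csv_text with candidate codes in the ranking columns replaced by names."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0] == "Contest Id":
            # Header row: copy through untouched.
            writer.writerow(row)
            continue
        for i in range(FIRST_RANK_COLUMN, len(row)):
            value = row[i].strip()
            if value not in NON_CANDIDATE_LABELS:
                # Raises KeyError on an unknown code, so unexpected data is not silently kept.
                row[i] = code_to_name[value]
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,Precinct Portion,Rank 1,Rank 2\n"
    "13,18,504,4,,PCT 7348,27,undervote\n"
)
# Hypothetical mapping; real values would come from the test's config JSON.
remapped = remap_rows(sample, {"27": "Doe, Jane"})
print(remapped)
```

Unlike a plain `split(',')` and `join(',')`, the writer quotes the name `Doe, Jane` so the row keeps the same number of columns.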
@artoonie artoonie force-pushed the feature/issue-663_deprecate-codes branch from e4d1ef3 to 79ca6cc Compare June 9, 2023 00:44
Contributor

@HEdingfield HEdingfield left a comment

Please review the commit I added to clean up a bit more and make sure it looks good to you.

13,18,504,4,,PCT 7348,27,28,undervote,undervote
13,19,664,45,,PCT 7326,27,28,29,30
13,19,664,67,,PCT 7321,28,29,30,undervote
Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3,Rank 4
Contributor

On the topic of tests, does testPortlandMayorCodes need to be updated somehow as well?

Collaborator Author

I don't think so -- it still verifies that the codes work as intended.

Contributor

Heads-up that #642 will swap these over to aliases whether we like it or not. I can make a note in there to keep it as-is if we care about testing old file formats, but I don't really think we do.

@@ -1063,10 +1063,6 @@ String getNameForCandidate(String nameOrAlias) {
return candidateAliasesToNameMap.get(nameOrAlias);
}

String getCodeForCandidate(String nameOrAlias) {
Contributor

In this comment, you mentioned possibly reverting / getting rid of some lines in TabulatorSession. Is that no longer feasible?

Collaborator Author

I pushed a commit showing the effect of that. It depends on what we want: we can get rid of getNameForCandidate in ResultsWriter, but the generated results then use the candidate code instead of the candidate name. See the diff here.

I'm not sure how the _expected.csvs are used, so I don't know which format is more useful, but it seems likely that it's more useful with canonical names (i.e. reverting #3f121b7, the next commit) and keeping the code as it was.

Let me know which you prefer.
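To illustrate the two options in this thread, here is a minimal Python sketch (not the project's Java code). The alias table, its contents, and the assumption that canonical names map to themselves are all hypothetical; only the lookup-by-name-or-alias idea comes from the diff above.

```python
# Hypothetical alias table: every alias (including the canonical name itself)
# maps to the canonical candidate name.
candidate_aliases_to_name = {
    "Jane Doe": "Jane Doe",
    "27": "Jane Doe",
    "John Roe": "John Roe",
    "28": "John Roe",
}


def get_name_for_candidate(name_or_alias):
    """Resolve a name or alias to the canonical name (None if unknown)."""
    return candidate_aliases_to_name.get(name_or_alias)


rankings = ["27", "28", "undervote"]

# Option A: write whatever identifier appeared in the input (codes stay codes).
raw_output = list(rankings)

# Option B: resolve candidates to canonical names before writing;
# non-candidate labels such as "undervote" pass through unchanged.
resolved_output = [get_name_for_candidate(r) or r for r in rankings]

print(raw_output)
print(resolved_output)
```

Option A corresponds to dropping the name lookup when writing results; Option B corresponds to keeping it, so the output carries canonical names.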

Contributor

The _expected.csvs are just used for our tests.

Hard for me to answer this question... might be best for @tarheel or @chughes297 to weigh in? It appears that with the above commit you linked, it leaves our tests as-is, which is probably preferred?

Collaborator Author

Are the output CSVs never used by election administrators? If so, then I suppose it doesn't matter.

Contributor

The real output CSVs are definitely used by admins. @HEdingfield is just referring specifically to the _expected.csv files that are in the test directories.

Apologies if I missed a relevant part of the discussion -- I'm just looking at this comment thread -- but I think ideally the output CSV would have the canonical names in one column and then perhaps a separate column that lists any aliases that are used for each candidate (separated by a delimiter).
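One possible shape for that layout, sketched in Python purely as an illustration: the column headers, the `;` delimiter, and the sample candidates are assumptions, not the tabulator's actual output format.

```python
import csv
import io

# Hypothetical candidates: canonical name -> aliases seen in the input.
candidates = {
    "Jane Doe": ["27", "J. Doe"],
    "John Roe": [],
}
# Hypothetical delimiter; it should be a character that cannot appear in an alias.
ALIAS_DELIMITER = ";"

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Candidate", "Aliases"])
for name, aliases in candidates.items():
    # One row per candidate: canonical name first, then all aliases in one cell.
    writer.writerow([name, ALIAS_DELIMITER.join(aliases)])

summary_csv = out.getvalue()
print(summary_csv)
```

Keeping all aliases in a single delimited cell leaves the column count fixed regardless of how many aliases a candidate has.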

Collaborator Author

Great. I think that makes sense too. I reverted the last commit, and this PR should be ready to go now.

Contributor

Hm, let's split out adding both canonical names and aliases into a separate task?

@chughes297 @tarheel do y'all think it's worth creating a new issue for this? If so, is it something we'd need to include in 1.4.0?

Contributor

Yes, I think it would be useful to list aliases in the output spreadsheet. I doubt it's a requirement for v1.4, but @chughes297 can comment.

Collaborator

Yes, let's create a separate issue, but it's not a huge priority for me to include in 1.4.0.

Contributor

Created #705 for this.

Contributor

@HEdingfield HEdingfield left a comment

LGTM!

@HEdingfield HEdingfield merged commit 558b63d into develop Jun 14, 2023
1 check passed
@HEdingfield HEdingfield deleted the feature/issue-663_deprecate-codes branch June 14, 2023 06:29
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Remove notion of "Candidate Codes" altogether
4 participants