deprecate the concept of "codes" (almost) entirely #690

artoonie · 2023-06-09T00:42:25Z

resolves #663

This removes the concept of "codes" entirely. In doing so, all dominion tests have been updated to use candidate names instead of codes.

To ensure the tests changed in exactly the ways we expect, I didn't simply replace each test file with its updated version. Instead, I used the script below to update each test. It's not documented because it's a one-time throwaway, but let me know if you have questions about it:

```python
import json
import sys

# One-time helper: rewrite a test's _expected.csv so ranking cells hold candidate names instead of codes.
# To run: python3 test.py <test-directory-name> [--save]
# Paths are relative, so run it from the test_data directory.
if len(sys.argv) < 2:
    sys.exit("usage: python3 test.py <test-directory-name> [--save]")
base = sys.argv[1]
doSave = len(sys.argv) > 2 and sys.argv[2] == "--save"
print(f"working on {base} and doSave={doSave}")

# Map candidate codes to candidate names
codeToName = {}
with open(f"{base}/{base}_config.json", 'r') as f:
    data = json.load(f)
for candidate in data['candidates']:
    codeToName[candidate['code']] = candidate['name']

# Undeclared write-in maps to itself, e.g. 12=12. Output will have the numerical code.
uwiLabel = data['cvrFileSources'][0]['undeclaredWriteInLabel']
codeToName[uwiLabel] = uwiLabel
contestId = data['cvrFileSources'][0]['contestId']

# For each line, replace candidate codes with their names instead.
# Columns 0-5 are ballot metadata; rankings start at column 6.
expectedPath = f"{base}/{base}_contest_{contestId}_expected.csv"
lines = []
with open(expectedPath, 'r') as f:
    for rawline in f:
        if rawline.startswith('Contest'):
            # Header row: keep as-is
            lines.append(rawline.strip())
        else:
            line = rawline.strip().split(',')
            for i, item in enumerate(line[6:]):
                if item.strip() in ['undervote', 'overvote']:
                    continue
                line[i + 6] = codeToName[item.strip()]
            lines.append(','.join(line))

print('\n-----\n'.join(lines))

if doSave:
    with open(expectedPath, 'w') as f:
        f.write('\n'.join(lines))
```
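The substitution at the heart of the script can also be written as a standalone function. The variant below is a hedged sketch, not what was run for this PR: it swaps the manual `split(',')`/`join` for Python's `csv` module, which would quote a candidate name that happens to contain a comma. The function names and the sample mapping are hypothetical.

```python
import csv
import io

SKIP = {"undervote", "overvote"}  # ballot markers the script leaves untouched
FIRST_RANK_COLUMN = 6             # Rank 1 follows six metadata columns


def rewrite_row(fields, code_to_name):
    """Return a copy of one row with codes in the rank columns replaced by names."""
    out = list(fields)
    for i in range(FIRST_RANK_COLUMN, len(out)):
        item = out[i].strip()
        if item not in SKIP:
            out[i] = code_to_name[item]
    return out


def rewrite_csv(text, code_to_name):
    """Rewrite every data row of an expected CSV; the 'Contest...' header passes through."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for fields in csv.reader(io.StringIO(text)):
        if fields and fields[0].startswith("Contest"):
            writer.writerow(fields)
        else:
            writer.writerow(rewrite_row(fields, code_to_name))
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical mapping; the second name contains a comma on purpose.
    names = {"27": "Alice Example", "28": "Smith, Bob"}
    row = "13,18,504,4,,PCT 7348,27,28,undervote,undervote\n"
    print(rewrite_csv(row, names), end="")
    # 13,18,504,4,,PCT 7348,Alice Example,"Smith, Bob",undervote,undervote
```

With the plain `','.join(line)` approach, that second name would have split into two columns; `csv.writer` quotes it instead.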

HEdingfield

Please review the commit I added that cleans up a bit more, and make sure it looks good to you.

src/main/java/network/brightspots/rcv/RawContestConfig.java

HEdingfield · 2023-06-12T22:00:43Z

...rk/brightspots/rcv/test_data/dominion_multi_file/dominion_multi_file_contest_13_expected.csv

```diff
-13,18,504,4,,PCT 7348,27,28,undervote,undervote
-13,19,664,45,,PCT 7326,27,28,29,30
-13,19,664,67,,PCT 7321,28,29,30,undervote
+Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3,Rank 4
```
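The header row added in this hunk shows why the script starts at `line[6:]`: six metadata columns (Contest Id through Precinct Portion) precede `Rank 1`. As a sketch only, with hypothetical helper names, that offset could be derived from the header rather than hardcoded:

```python
HEADER = ("Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,"
          "Precinct Portion,Rank 1,Rank 2,Rank 3,Rank 4")


def first_rank_index(header_line):
    """Return the index of the first column whose name starts with 'Rank '."""
    columns = [c.strip() for c in header_line.split(",")]
    for i, name in enumerate(columns):
        if name.startswith("Rank "):
            return i
    raise ValueError("header has no Rank column")


def rank_cells(header_line, row_line):
    """Return only the ranking cells of a data row, located via the header."""
    return row_line.split(",")[first_rank_index(header_line):]


if __name__ == "__main__":
    print(first_rank_index(HEADER))  # 6
    print(rank_cells(HEADER, "13,19,664,45,,PCT 7326,27,28,29,30"))  # ['27', '28', '29', '30']
```

Like the original script, this splits on bare commas, so it assumes no quoted fields in the row.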


On the topic of tests, does testPortlandMayorCodes need to be updated somehow as well?

I don't think so -- it still verifies that the codes work as intended.

Heads-up that #642 will swap these over to aliases whether we like it or not. I can make a note in there to keep it as-is if we care about testing old file formats, but I don't really think we do.

HEdingfield · 2023-06-12T22:26:07Z

src/main/java/network/brightspots/rcv/ContestConfig.java

```diff
@@ -1063,10 +1063,6 @@ String getNameForCandidate(String nameOrAlias) {
     return candidateAliasesToNameMap.get(nameOrAlias);
   }

-  String getCodeForCandidate(String nameOrAlias) {
```


In this comment, you mentioned possibly reverting / getting rid of some lines in TabulatorSession. Is that no longer feasible?

I pushed a commit showing the effect of that. It depends on what we want: if we drop the getNameForCandidate call in ResultsWriter, the results use the candidate code instead of the candidate name. See the diff here.

I'm not sure how the _expected.csvs are used, so I don't know which format is more useful, but canonical names seem more likely to be useful, which means reverting #3f121b7 (the next commit) and keeping the code as it was.

Let me know which you prefer.
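Once codes are gone, a former code is just another alias, and name resolution boils down to a map lookup (`candidateAliasesToNameMap.get(nameOrAlias)` in the `ContestConfig.java` hunk). The Python sketch below mirrors that idea under stated assumptions: the `aliases` key, the function names, and the sample data are hypothetical, not the actual config schema or API.

```python
def build_alias_map(candidates):
    """Map each canonical name and every alias (e.g. a former code) to the canonical name."""
    alias_to_name = {}
    for cand in candidates:
        name = cand["name"]
        alias_to_name[name] = name  # a canonical name resolves to itself
        for alias in cand.get("aliases", []):
            alias_to_name[alias] = name
    return alias_to_name


def name_for_candidate(alias_to_name, name_or_alias):
    """Like Java's Map.get: the canonical name, or None for an unknown key."""
    return alias_to_name.get(name_or_alias)


if __name__ == "__main__":
    # Hypothetical config fragment: the former code "27" kept as an alias.
    candidates = [{"name": "Alice Example", "aliases": ["27"]}, {"name": "Bob Sample"}]
    aliases = build_alias_map(candidates)
    print(name_for_candidate(aliases, "27"))          # Alice Example
    print(name_for_candidate(aliases, "Bob Sample"))  # Bob Sample
    print(name_for_candidate(aliases, "99"))          # None
```

Calling the lookup before writing results yields canonical names; skipping it, as 3f121b7 did, leaves the candidate code in the output.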

The _expected.csvs are just used for our tests.

Hard for me to answer this question... might be best for @tarheel or @chughes297 to weigh in? It appears that the commit you linked above leaves our tests as-is, which is probably preferable?

Are the output CSVs never used by election administrators? If so, then I suppose it doesn't matter.

The real output CSVs are definitely used by admins. @HEdingfield is just referring specifically to the _expected.csv files in the test directories.

Apologies if I missed a relevant part of the discussion -- I'm just looking at this comment thread -- but I think ideally the output CSV would have the canonical names in one column and then perhaps a separate column that lists any aliases used for each candidate (separated by a delimiter).
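A rough sketch of that suggested layout, with one column for the canonical name and one for its aliases. The column headers, the `;` delimiter, and the input shape are hypothetical, and a real results CSV would also carry its tally columns, which are omitted here.

```python
import csv
import io


def candidate_rows(candidates, delimiter=";"):
    """Yield (canonical name, delimiter-joined aliases), leaving the name out of its own alias list."""
    for cand in candidates:
        aliases = [a for a in cand.get("aliases", []) if a != cand["name"]]
        yield cand["name"], delimiter.join(aliases)


def to_csv(candidates):
    """Render the name/alias columns as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Candidate", "Aliases"])
    writer.writerows(candidate_rows(candidates))
    return buf.getvalue()


if __name__ == "__main__":
    sample = [{"name": "Alice Example", "aliases": ["27", "A. Example"]}, {"name": "Bob Sample"}]
    print(to_csv(sample), end="")
    # Candidate,Aliases
    # Alice Example,27;A. Example
    # Bob Sample,
```

Joining aliases inside one cell keeps the column count fixed regardless of how many aliases a candidate has.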

Great. I think that makes sense too. I reverted the last commit, and this PR should be ready to go now.

Hm, let's split adding both canonical names and aliases out into a separate task?

@chughes297 @tarheel do y'all think it's worth creating a new issue for this? If so, is it something we'd need to include in 1.4.0?

Yes, I think it would be useful to list aliases in the output spreadsheet. I doubt it's a requirement for v1.4, but @chughes297 can comment.

Yes let's create a separate issue but not a huge priority for me to include in 1.4.0.

Created #705 for this.

HEdingfield

LGTM!

artoonie requested a review from HEdingfield June 9, 2023 00:42

artoonie added the v2.0 Nice-to-Have P1 label Jun 9, 2023

artoonie force-pushed the feature/issue-663_deprecate-codes branch from 0655785 to e4d1ef3 June 9, 2023 00:44

deprecate the concept of "codes" (almost) entirely

79ca6cc

artoonie force-pushed the feature/issue-663_deprecate-codes branch from e4d1ef3 to 79ca6cc June 9, 2023 00:44

HEdingfield added 2 commits June 12, 2023 14:22

Merge branch 'develop' into feature/issue-663_deprecate-codes

1a1f0a3

Cleans up additional references to candidate code.

2e74d82

HEdingfield reviewed Jun 12, 2023

artoonie and others added 3 commits June 12, 2023 18:43

don't getNameForCandidate in ResultsWriter

3f121b7

Revert "don't getNameForCandidate in ResultsWriter"

daa7859

This reverts commit 3f121b7.

Merge branch 'develop' into feature/issue-663_deprecate-codes

7a8e8aa

HEdingfield approved these changes Jun 14, 2023

HEdingfield merged commit 558b63d into develop Jun 14, 2023
1 check passed

HEdingfield deleted the feature/issue-663_deprecate-codes branch June 14, 2023 06:29

HEdingfield mentioned this pull request Jun 15, 2023

Include aliases in output CSVs #705

Open

yezr mentioned this pull request Dec 15, 2023

Support for multiple CVR formats #779

Closed
