Ci rebase on develop #576

artoonie · 2021-08-11T14:31:58Z

Same as #573 , but rebased on develop with fixed merge conflicts

HEdingfield · 2021-08-12T07:07:52Z

Big thanks for the explanation here! Going to discuss with the others before approving.

One point of clarity: you're deleting these test files out of the repo, so it will indeed affect our local tests. But it looks like this .gitmodules file might act as a local pointer to the repo and... automatically download it and put it in the right folders we'd expect? This is the first time I've encountered something like this; how does it work in practice?

HEdingfield · 2021-08-12T07:10:35Z

Also agree with your other comment that BrightSpots should host this on our end, so I think before approving we should get a repo going similar to the one you created and update this PR to point to that one.

artoonie · 2021-08-12T13:41:45Z

Ah right, I should have explained submodules too:
If you run
git submodule init && git submodule update
it downloads the test files and places them exactly as they were before.

The only change will be how you update the tests. After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each.
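
For anyone who wants to try the init/update step without touching the real repositories, here is a minimal sketch using two throwaway local repos. The directory names (tests, main, clone) and the file data.txt are made up for illustration, and the protocol.file.allow override is only there because the demo clones from a local path rather than from GitHub.

```shell
#!/bin/sh
# Throwaway demo with made-up local repos; nothing here touches rcv or rcvtests.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Stand-in for the test-data repo.
git init -q "$tmp/tests"
echo "sample" > "$tmp/tests/data.txt"
git -C "$tmp/tests" add data.txt
git -C "$tmp/tests" commit -q -m "test data"

# Stand-in for the main repo, with the test data added as a submodule.
# (protocol.file.allow is only needed because the submodule URL is a local path.)
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/tests" test_data
git -C "$tmp/main" commit -q -m "add test data as submodule"

# A fresh clone gets an empty test_data directory...
git clone -q "$tmp/main" "$tmp/clone"
before=$(ls -A "$tmp/clone/test_data")

# ...until the submodule is initialized and updated.
# `update --init` does the same as `init` followed by `update`.
git -C "$tmp/clone" -c protocol.file.allow=always submodule --quiet update --init
after=$(ls -A "$tmp/clone/test_data")

echo "before: '$before'"
echo "after:  '$after'"
```

Cloning with git clone --recurse-submodules should give the same end state in one step.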

tarheel

Sorry, I forgot about this! I'm open to trying it. If we end up regretting it for some reason, we can always revert it.

HEdingfield · 2021-10-12T07:02:01Z

@artoonie Thanks for the explanation above on how to run the submodules!

Two last follow-up things, and then I'll be ready to approve too:

1. I'm confused about what's going on in this commit. I thought you had deleted those files, but it looks like they're being modified?
2. Can we host the test repo on BrightSpots instead of artoonie? What's the best way to go about doing this; just fork yours, or try to recreate it manually?

HEdingfield · 2021-10-12T07:05:29Z

Re: 2 above: sorry, just saw your email. I went ahead and forked it here: https://github.com/BrightSpots/rcvtests

I think you should be good to delete https://github.com/artoonie/rcvtests now and update this CL.

Still curious about item 1 in my last comment. :)

artoonie · 2021-10-12T13:08:20Z

Thanks, I'll update this PR to point to the forked repo and delete artoonie/rcvtests.

As for item 1, it looks like GitHub is just trying to be friendly:

Submodules point to specific commits in a separate repo, not to the repo itself.
That allows you to check out a commit in this repo and get the exact source you expect, as opposed to the source of this repo with newer or older tests in rcvtests.
When I updated rcvtests to address the merge conflicts, I had to update this repo to point to the new rcvtests commit.
That update is being rendered in GitHub not as a change in SHA, but as the actual diff of that repo.

You can see the "real" commit by following the links in the commit you linked to: https://github.com/artoonie/rcvtests/compare/aa21d8069ba0b10a36e8fd03f99c8f2734c10ec6...e500075d038813dccc00b9e32e63d326256c462f
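
To make the "pinned commit" behavior concrete, here is a small sketch with two throwaway local repos (the names tests and main and the file data.txt are hypothetical). It shows that the parent repo records a SHA for the submodule path, and that the SHA does not move when the other repo gets new commits.

```shell
#!/bin/sh
# Throwaway demo: a submodule entry records a commit SHA, not a branch.
# All repo names and paths are made up for illustration.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# 1. A stand-in for the test-data repo with one commit.
git init -q "$tmp/tests"
echo "v1" > "$tmp/tests/data.txt"
git -C "$tmp/tests" add data.txt
git -C "$tmp/tests" commit -q -m "test data v1"
first=$(git -C "$tmp/tests" rev-parse HEAD)

# 2. A stand-in for the main repo that embeds it as a submodule.
#    (protocol.file.allow is only needed because the URL is a local path.)
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/tests" test_data
git -C "$tmp/main" commit -q -m "add test data as submodule"

# 3. The test-data repo moves on...
echo "v2" > "$tmp/tests/data.txt"
git -C "$tmp/tests" commit -q -am "test data v2"
latest=$(git -C "$tmp/tests" rev-parse HEAD)

# 4. ...but the main repo still records the first commit for test_data.
pinned=$(git -C "$tmp/main" rev-parse HEAD:test_data)
echo "pinned by main: $pinned"
echo "first commit:   $first"
echo "latest commit:  $latest"
```

A change to that recorded SHA is what a "submodule update" commit in the parent repo actually contains.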

HEdingfield

Looks good! I'll merge it in and verify that I can get it working locally.

HEdingfield · 2021-10-16T22:29:56Z

After merging in this PR, here's the path I took to get tests working again locally:

Pulled changes to my system in the develop branch
(Had to manually delete the \src\test\resources\network\brightspots\rcv\test_data folder, then update the repo, then roll back the changes for things to reflect properly)
(Project files were highlighted brown, implying they weren't part of the project anymore for some reason; restarted IntelliJ and this fixed it)
Started up the Terminal tab in IntelliJ
$ git submodule init (worked fine)
$ git submodule update (gave me sass "git@github.com: Permission denied (publickey)" "fatal: Could not read from remote repository.")
Had to follow these instructions here: https://stackoverflow.com/a/65249757/3846321
Changed the .gitmodules url line to use "https://github.com/" instead of "git@github.com:" and then ran the following commands:
$ git submodule sync
$ git submodule update --init

This finally pulled the rcvtests repo locally and I was able to successfully run the tests.
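
For reference, the URL switch and sync described above can be rehearsed offline. The sketch below uses throwaway local repos and assumes the submodule is named test_data; the HTTPS URL is taken from the fork link earlier in this thread and may not match the exact string in .gitmodules. git submodule sync only copies the URL from .gitmodules into the local config, so it needs no network access.

```shell
#!/bin/sh
# Throwaway demo: change a submodule URL in .gitmodules, then sync it.
# Repo names and paths are made up; the submodule name test_data is an assumption.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/tests"
echo "sample" > "$tmp/tests/data.txt"
git -C "$tmp/tests" add data.txt
git -C "$tmp/tests" commit -q -m "test data"

git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/tests" test_data
git -C "$tmp/main" commit -q -m "add test data as submodule"

# The URL the submodule was registered with lives in .git/config.
old_url=$(git -C "$tmp/main" config --get submodule.test_data.url)

# Point .gitmodules at an HTTPS URL instead (same effect as editing the file by hand).
git -C "$tmp/main" config -f .gitmodules submodule.test_data.url \
    https://github.com/BrightSpots/rcvtests

# Copy the new URL from .gitmodules into .git/config.
git -C "$tmp/main" submodule --quiet sync
new_url=$(git -C "$tmp/main" config --get submodule.test_data.url)

echo "old: $old_url"
echo "new: $new_url"
```

In a real checkout, git submodule update --init would then fetch from the new URL.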

I submitted PR #577 to make the changes to the rcvtests repo url. @artoonie, I couldn't add you as a reviewer, but hopefully this looks kosher to you?

I recommend everyone working on this project do the same to make sure tests are working locally for them.

We'll also probably want to update a test soon to understand and document in GitHub exactly how this new process works... as @artoonie mentioned above: "After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each."

I also noted that you can check the Actions tab to see the CI testing status now, woohoo.

tarheel · 2022-08-08T17:16:24Z

"After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each."

@artoonie could you elaborate on how this process should work? I'm sure I can figure it out with sufficient effort, but if you already understand the details, I'd love to just be spoon-fed the explanation...

I've successfully pulled the tests locally -- but then if I run any of the tests and they generate their output files, git status tells me that I have untracked content:

And the only way I've found to fix that is to run these two commands (which is fine, but I suspect it's not the best way to deal with it):

git submodule deinit -f .
git submodule update --init --recursive

artoonie · 2022-08-08T18:29:40Z

Do you want to keep or discard the changes?

I think the easiest way to think about it will be to enter the submodule directory:
cd src/.../test_data/

Then git status should act like "normal". From there, you can commit the changes as you normally do (git commit -a, for example), or discard them as you normally do (git reset --hard HEAD, e.g.).
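
One caveat on discarding: git reset --hard only restores tracked files, so output files freshly generated by a test run are left in place as untracked content. git clean removes those. A minimal sketch with throwaway repos follows; output.json is a made-up stand-in for whatever the tests generate.

```shell
#!/bin/sh
# Throwaway demo: clearing "untracked content" inside a submodule.
# Repo names, paths and output.json are made up for illustration.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/tests"
echo "sample" > "$tmp/tests/data.txt"
git -C "$tmp/tests" add data.txt
git -C "$tmp/tests" commit -q -m "test data"

git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/tests" test_data
git -C "$tmp/main" commit -q -m "add test data as submodule"

# Simulate a test run leaving an output file inside the submodule.
echo "generated" > "$tmp/main/test_data/output.json"
dirty=$(git -C "$tmp/main" status --porcelain)

# reset --hard restores tracked files only; the untracked file survives.
git -C "$tmp/main/test_data" reset -q --hard HEAD
if [ -f "$tmp/main/test_data/output.json" ]; then
    echo "reset --hard left the untracked file in place"
fi

# clean -fd deletes untracked files and directories inside the submodule.
git -C "$tmp/main/test_data" clean -q -fd
clean=$(git -C "$tmp/main" status --porcelain)

echo "status before clean: '$dirty'"
echo "status after clean:  '$clean'"
```

Running git clean -nd first lists what would be deleted without removing anything.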

moldover · 2022-08-08T20:08:50Z

@artoonie why did we end up moving test_data to a submodule? Was that to keep the repo size small, or to facilitate CI builds?

HEdingfield · 2022-08-08T20:47:05Z

I asked the same question a while back here.

artoonie · 2022-08-08T20:55:09Z

Thanks Hylton - and an update to that post, my understanding is that at least one other person is now using the separated test repo for their own tabulator tests.

tarheel · 2022-08-08T21:27:10Z

@artoonie thanks. I'm still not clear on what you mean by this: "After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each."

After I commit and push to the rcvtests repo, what exactly do I have to commit/push in the rcv repo? It's not just a matter of resyncing my local copy?

artoonie · 2022-08-09T00:02:03Z

@tarheel a git submodule points to a specific commit, not to a branch - so when you update the main branch on rcvtests, the rcv repo will not automatically point to that new commit.

For example, if you update rcvtests:

cd src/test/resources/network/brightspots/rcv/test_data/
git add new-file.txt
git commit -a -m "added new file"

then go back to the rcv repo, you'll notice that git status is dirty:

cd ..
git status

So you'll want to update rcv to point to the latest commit in rcvtests:

git commit test_data -m "updated test data with latest changes"
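
The two-commit sequence above can be tried end to end on throwaway repos. This is only a sketch: the repo names are made up, and the pushes are left as comments because the demo has no real remote to push to.

```shell
#!/bin/sh
# Throwaway demo: commit in the submodule, then re-pin it in the parent repo.
# Repo names and paths are made up for illustration.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/tests"
echo "sample" > "$tmp/tests/data.txt"
git -C "$tmp/tests" add data.txt
git -C "$tmp/tests" commit -q -m "test data"

git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/tests" test_data
git -C "$tmp/main" commit -q -m "add test data as submodule"
old_pin=$(git -C "$tmp/main" rev-parse HEAD:test_data)

# Commit 1: inside the submodule (the test-data repo).
cd "$tmp/main/test_data"
echo "new case" > new-file.txt
git add new-file.txt
git commit -q -m "added new file"
new_sha=$(git rev-parse HEAD)
# In a real workflow: git push   (publishes the commit to the tests repo)

# Back in the parent repo, the submodule path now shows as modified.
cd "$tmp/main"
git status --short

# Commit 2: record the new submodule commit in the parent repo.
git add test_data
git commit -q -m "updated test data with latest changes"
new_pin=$(git rev-parse HEAD:test_data)
# In a real workflow: git push   (publishes the new pin)

echo "old pin: $old_pin"
echo "new pin: $new_pin"
```

Pushing the submodule commit before the parent commit matters: otherwise the parent points at a SHA that nobody else can fetch.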

(By the way: I think submodules are nice, but as you can see, they are a headache when first getting started with them. I wouldn't be offended if you prefer to revert this change and bring the test data back into this repo and remove the submodule.)

moldover · 2022-08-09T04:45:09Z

@artoonie I think at this point the submodule is doing more harm than good. I fully support the vision, but until there are more users who really benefit from submodules, I believe it should be removed. And FWIW, I am extremely familiar with submodule usage, so this is not coming from a place of learning curve - I have re-pinned more submodules than anyone should ever have to.

HEdingfield · 2022-08-09T05:02:33Z

I guess I come in on the other side... I think once we get past this initial little hurdle of getting comfortable with updating it this way, it'll be a net benefit in the long-run (especially since there's already another project using it). That said, this is my first experience using a submodule.

@tarheel did the stuff above work out for you?

@tamird any thoughts on this?

tarheel · 2022-08-09T05:09:37Z

I haven't tried out the latest guidance, but will soon!

My instinct matches @HEdingfield's: I don't want to scrap it just because it's causing me a bit of startup friction. I mostly want to base this decision on how significant of an ongoing cost it will impose. @moldover do you think it will continue to be more trouble than it's worth?

moldover · 2022-08-09T05:10:02Z

Yes. 100% not even close.

moldover · 2022-08-09T05:11:32Z

I challenge you to tell me how hypothetical benefits outweigh very real disadvantages.

tarheel · 2022-08-09T05:13:04Z

Could you elaborate a bit on what the disadvantages are? I have zero experience with submodules, so all I know so far is that 1) it makes it easy for others to reuse our tests and 2) it makes the process of updating our tests a bit circuitous.

moldover · 2022-08-09T05:25:20Z

It's not clear to me how a submodule makes it easier to incorporate test data. Show me this project, and explain why they can't incorporate our test data without a dedicated submodule.

The disadvantages are: having this conversation which is wasting all our time. Every dev will have to learn this stuff and it can take a while to explain. Will you sign up to hand-hold the next volunteer dev as you requested Armin do for you?

Every time you need to add / edit a test there will be extra steps. You want to properly isolate changes to test data during development? Better make a branch. Guess what? GitHub will not automate merging / deleting your test data branch afterwards so you need to do that manually. I promise you this will result in commit errors and moreover fewer tests being created. That's why. If there comes a time when the submodule is really needed it can be added. Until then we are going to pay a tax whenever we touch test data for what benefit?

tarheel · 2022-08-09T05:32:14Z

"It's not clear to me how a submodule makes it easier to incorporate test data. Show me this project, and explain why they can't incorporate our test data without a dedicated submodule."

That's true. @artoonie any counterargument to that?

tamird · 2022-08-09T14:16:25Z

If the goal is for another project to reuse the same tests, is there a reason they can't consume our entire repo as a submodule?

artoonie · 2022-08-09T15:11:59Z

rcvcruncher started to use this, though the tests were all failing so we paused on merging it in (and the rcvcruncher developer is no longer with FV, so that PR is on hold). Another RCV tinkerer you are all familiar with is also using this, though I emailed him this morning and he said he's fine if you delete this repo, as long as he can pull a local copy before it's gone. That's only two users that I'm aware of.

This affects all of you more than it does me, so I will trust your judgement here and not argue one way or another. I would like to address some of the questions and concerns raised. First, let me share my vision:

Vision

A test suite usable by anybody developing a tabulator, with results that should be consistently generated by any tabulator in any language, in order to find bugs or inconsistencies. The test suite has already identified a few discrepancies in the rcvcruncher: for example, the Tabulator always eliminates Undeclared Write-Ins first, whereas rcvcruncher does not.
If multiple independently-developed tabulators generate the same result, it builds trust in both of them. Right now, most of the tests are regression tests, with no good way to ascertain correctness.

Questions/Concerns

Can they consume our entire repo? Sure, though it is messier: for example, you'd be pulling a ton of java into a python project, and you'd be updating the other repo each time RCTab updates, even if those changes don't affect the test repo. This is functionally usable, but will discourage other repos from adding their own test CVRs, and will likely discourage this usage entirely. I've never seen a repo used as a submodule just for its test suite - it seems pretty clunky. But yes, it's workable.
Will there be a ton of extra steps during development? I've worked with submodules many times, and most of the cost is upfront. During development, the additional steps become pretty minor and don't feel like a hassle.
Is there a lot of upfront training required for new developers to this repo? Yes, potentially, and I agree this is the biggest downside.

Again, I defer to your judgement, but I'd stress my experience with submodules: like anything in git, there's an initial learning curve but it can quickly become a useful tool.

tamird · 2022-08-09T15:45:27Z

Yep, I generally agree. I've used submodules a lot in the past and they work well once you understand what they are and are not meant to do. Standard tools are good.

HEdingfield · 2022-08-09T15:49:06Z

Big thanks for writing this up, @artoonie. FWIW, I support this vision, and I also like that it slims down and focuses the RCTab repo.

Re: @moldover's points (thanks for these too):

"I promise you this will result in commit errors and moreover fewer tests being created."

Agree that fewer tests being created is a big potential concern, but I think we should wait to see at least after this development cycle if there's that much extra friction.

It seems like a fairly straightforward flow to first build the tests out locally, then submit a PR that adds the new test data to rcvtests, then sync up and submit a PR that adds the test in rcv. It seems like the biggest pain point might be waiting for the rcvtests PR to get approved, but we could mitigate this by not requiring a review for the core developers to add data to it.

Maybe if @artoonie can write up a little cookbook of a few common scenarios (similar to what he did here -- and particularly ones that @moldover is concerned about), that would sufficiently smooth things out, and would serve as something we can point new developers to as well.

E.g.:

I want to add a new test
I want to modify an existing test
I want to delete a test and its data (not sure we'd actually have a real use-case for this)
Anything else?

moldover · 2022-08-09T17:08:36Z

This does not cut it. As @artoonie acknowledges, external devs can consume this data either way, and the two known users don't care. On the other hand, the last 5 commits to the testdata submodule are mine; it has caused me very real friction and I do care. It's not a "ton" of extra steps, but it adds up. This is not learning curve speaking. I've been working with submodules for years.

I'm not suggesting we remove this submodule for this release or that there is any urgency here. If y'all want to learn how to use submodules, go for it. They are a standard tool and well worth learning. Add some test data, please.

If external devs want to add test data to the RCV corpus or incorporate it into their project I will gladly help them do it. But until there is a compelling use-case (real usefulness > real friction) this is early optimizing: a bad idea. And I have some CDF conversion routines I want to sell you.

andyanderson · 2022-08-09T17:14:38Z

Re “the Tabulator always eliminates Undeclared Write-Ins first,” By that you mean any candidate not printed on the ballot? I hope that’s only if they have the fewest number of votes? It is possible for an undeclared write-in to win an election. — Andy


HEdingfield · 2022-08-10T03:10:47Z

@moldover It's clear that you feel strongly about this, and also have more experience than I do with submodules, so I went ahead and filed #612.

@artoonie Would the RCVis repo work ok referencing the test data inside this repo instead of the separate rcvtests one? I think that was the original motivation for moving it out, right? I just want to make sure this wouldn't break anything for you.

tarheel · 2022-08-10T05:28:37Z

Re “the Tabulator always eliminates Undeclared Write-Ins first,”

"By that you mean any candidate not printed on the ballot? I hope that’s only if they have the fewest number of votes? It is possible for an undeclared write-in to win an election."

If there's a candidate string in the CVRs that's literally "UWI" or "Undeclared" or whatever (which we've seen at least in ES&S CVRs, I believe), you can set the undeclared write-in label to that value in your config file, and the tabulation will automatically eliminate that candidate in the first round regardless of how many votes they have. But if you want that candidate string to be treated like a normal candidate, you can just enter it into the config's candidate list instead of using the undeclared write-in label.

artoonie · 2022-08-10T14:04:57Z

@HEdingfield, RCVis doesn't use the rcvtests repo, so we're all good. I can update rcvcruncher to use this separate repo, and let Shel know about the change so he can get the tests from their previous location when that is complete.

andyanderson · 2022-08-10T14:43:52Z

Re «If there's a candidate string in the CVRs that's literally "UWI" or "Undeclared" or whatever (which we've seen at least in ES&S CVRs, I believe), you can set the undeclared write-in label to that value in your config file, and the tabulation will automatically eliminate that candidate in the first round regardless of how many votes they have.»

I don’t see this documented in either config_file_documentation.txt or README.md. Is there someplace else it appears? If not I’ll create a new issues ticket, because I think this behavior should be clear to users.

Reference: https://ballotpedia.org/Jo_Comerford#Democratic_primary_election

artoonie · 2022-08-10T15:33:07Z

@andyanderson In case there's confusion: there's a difference between a write-in and an undeclared write-in. In most elections, even write-in candidates must declare their intention to be a part of the race; if they do not, they are ineligible to win. Note that this is indeed documented in config_file_documentation.txt,

andyanderson · 2022-08-10T18:54:35Z

@artoonie — Thanks, this is exactly the answer I needed to see for my original question “By [an undeclared write-in] you mean any candidate not printed on the ballot?”

Please note that your clear description is definitely not in config_file_documentation.txt. Not everyone will understand this distinction. I’ll create a separate issue and suggest updated text.

moldover · 2022-08-10T23:19:51Z

@artoonie I wanted to respond here. Also wanted to apologize for not having brought this up at the time. I'm sorry.

"Can they consume our entire repo? Sure, though it is messier: for example, you'd be pulling a ton of java into a python project, and you'd be updating the other repo each time RCTab updates, even if those changes don't affect the test repo. This is functionally usable,"

In my experience this is pretty standard for any package. There are files on disk you don't care about and updates don't necessarily apply to you.

"will discourage other repos from adding their own test CVRs, and will likely discourage this usage entirely."

Yeah I agree with you. Can you give us a sense for what the community of other tabulator developers who could collaborate is like? Have you had conversations with other people?

"Will there be a ton of extra steps during development? I've worked with submodules many times, and most of the cost is upfront. During development, the additional steps become pretty minor and don't feel like a hassle."

"Is there a lot of upfront training required for new developers to this repo? Yes, potentially, and I agree this is the biggest downside."

Yeah I agree with you there, and that's the threshold for me.

Regarding the vision, I'm wondering what happens when we want to change something. Would we need to keep it backwards compatible? Does RCVRC own it? You mention a standardized test suite... who sets it?

artoonie · 2022-08-11T03:11:31Z

Re the vision:
I imagined whoever owns RCTab would also own the test suite, and nothing would change from RCTab's perspective. The most recent commit on Test Suite Branch X would always be compatible with the most recent RCTab Repo branch X. As you mentioned, this benefit is mostly hypothetical for now, and would require more than a repo split to actually have multiple tabulators relying on the test suite.

Re the community:
Perhaps we should take that bit to email, since I'm not in a position to speak for the other developer. I'll just note that he'd be okay with anything that makes it easy to grab just the test suite, and "easy" includes "clone all of RCTab and delete what you don't need".

moldover · 2022-08-11T23:37:48Z

Sounds good. Once the submodule is removed it will still be easy to grab test data:

Download and unzip RCTab: https://github.com/BrightSpots/rcv/archive/refs/heads/develop.zip
test data is in: rcv/src/test/resources/network/brightspots/rcv/test_data

The non-test files are only 24.7MB and grow slowly, so there is no need to delete them.

artoonie and others added 5 commits August 11, 2021 09:52

Add CI

5ee8517

Remove tests from this repo

44bd71e

Add test data as submodule

aac4965

Move test_data to submodule

42b6b6e

Update submodule to use latest test data

6b4c967

artoonie mentioned this pull request Aug 11, 2021

Use submodules and run CI on each push #575

Closed

HEdingfield requested review from moldover, tarheel and HEdingfield August 12, 2021 07:03

tarheel approved these changes Oct 8, 2021

Update submodule to point to brightspots fork

59ede34

HEdingfield approved these changes Oct 16, 2021

HEdingfield merged commit 391865d into BrightSpots:develop Oct 16, 2021

HEdingfield mentioned this pull request Aug 10, 2022

Bring test data back into main RCTab repo #612

Closed
