Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ci rebase on develop #576

Merged
merged 6 commits into from
Oct 16, 2021
Merged

Conversation

artoonie
Copy link
Collaborator

Same as #573 , but rebased on develop with fixed merge conflicts

@HEdingfield
Copy link
Contributor

Big thanks for the explanation here! Going to discuss with the others before approving.

One point of clarity: you're deleting these test files out of the repo, so it will indeed affect our local tests. But it looks like this .gitmodule file might act as a local pointer to the repo and... automatically download it and put it in the right folders we'd expect? This is the first time I've encountered something like this; how does it work in practice?

@HEdingfield
Copy link
Contributor

Also agree with your other comment that BrightSpots should host this on our end, so I think before approving we should get a repo going similar to the one you created and update this PR to point to that one.

@artoonie
Copy link
Collaborator Author

Ah right, I should have explained submodules too:
If you run
git submodule init && git submodule update It downloads the test files and places them exactly as they were before.

The only change will be how you update the tests. After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each.

Copy link
Contributor

@tarheel tarheel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot about this! I'm open to trying it. If we end up regretting it for some reason, we can always revert it.

@HEdingfield
Copy link
Contributor

@artoonie Thanks for the explanation above on how to run the submodules!

Two last follow-up things, and then I'll be ready to approve too:

  1. I'm confused about what's going on in this commit. I thought you had deleted those files, but it looks like they're being modified?
  2. Can we host the test repo on BrightSpots instead of artoonie? What's the best way to go about doing this; just fork yours, or try to recreate it manually?

@HEdingfield
Copy link
Contributor

Re: 2 above: sorry, just saw your email. I went ahead and forked it here: https://github.com/BrightSpots/rcvtests

I think you should be good to delete https://github.com/artoonie/rcvtests now and update this CL.

Still curious about item 1 in my last comment. :)

@artoonie
Copy link
Collaborator Author

Thanks, I'll update this PR to point to the forked repo and delete artoonie/rcvtests.

As for Item 1, it looks like github is just trying to be friendly:

  • Submodules point to specific commits in a separate repo, not the repo itself
  • That allows you to checkout a commit in this repo, and get the exact source you expect, as opposed to the source of this repo with newer or older tests in rcvtests
  • When I updated rcvtests to address the merge conflicts, I had to update this repo to point to the new rcvtests commit
  • That update is being rendered in github not as a change in SHA, but actually showing the diff of that repo

You can see the "real" commit by following the links in the commit you linked to: https://github.com/artoonie/rcvtests/compare/aa21d8069ba0b10a36e8fd03f99c8f2734c10ec6...e500075d038813dccc00b9e32e63d326256c462f

Copy link
Contributor

@HEdingfield HEdingfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'll merge it in and verify that I can get it working locally.

@HEdingfield HEdingfield merged commit 391865d into BrightSpots:develop Oct 16, 2021
@HEdingfield
Copy link
Contributor

After merging in this PR, here's the path I took to get tests working again locally:

  1. Pulled changes to my system in the develop branch
  2. (Had to manually delete the \src\test\resources\network\brightspots\rcv\test_data folder, then update the repo, then rollback the changes for things to reflect properly)
  3. (Project files were highlighted brown, implying they weren't part of the project anymore for some reason; restarted IntelliJ and this fixed it)
  4. Started up the Terminal tab in IntelliJ
  5. $ git submodule init (worked fine)
  6. $ git submodule update (gave me sass "git@github.com: Permission denied (publickey)" "fatal: Could not read from remote repository.")
  7. Had to follow these instructions here: https://stackoverflow.com/a/65249757/3846321
    Changing the .gitmodules url line to use "https://github.com/" instead of "git@github.com:" and then run the following commands:
    $ git submodule sync
    $ git submodule update --init

This finally pulled the rcvtests repo locally and I was able to successfully run the tests.

I submitted PR #577 to make the changes to the rcvtests repo url. @artoonie, I couldn't add you as a reviewer, but hopefully this looks kosher to you?

I recommend everyone working on this project do the same to make sure tests are working locally for them.

We'll also probably want to update a test soon to understand and document in GitHub exactly how this new process works... as @artoonie mentioned above: "After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each."

I also noted that you can check the Actions tab to see the CI testing status now, woohoo.

@tarheel
Copy link
Contributor

tarheel commented Aug 8, 2022

After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each.

@artoonie could you elaborate on how this process should work? I'm sure I can figure it out with sufficient effort, but if you already understand the details, I'd love to just be spoon-fed the explanation...

I've successfully pulled the tests locally -- but then if I run any of the tests and they generate their output files, git status tells me that I have untracked content:

image

And the only way I've found to fix that is to run these two commands (which is fine, but I suspect it's not the best way to deal with it):

git submodule deinit -f .
git submodule update --init --recursive

@artoonie
Copy link
Collaborator Author

artoonie commented Aug 8, 2022

Do you want to keep or discard the changes?

I think the easiest way to think about it will be to enter the submodule directory:
cd src/.../test_Data/

Then git status should act like "normal". From there, you can commit the changes as you normally do (git commit -a, for example), or discard them as you normally do (git reset --hard HEAD, e.g.).

@moldover
Copy link
Contributor

moldover commented Aug 8, 2022

@artoonie why did we end up moving test_data to a submodule? Was that to keep the repo size small, or to facilitate CI builds?

@HEdingfield
Copy link
Contributor

I asked the same question a while back here.

@artoonie
Copy link
Collaborator Author

artoonie commented Aug 8, 2022

Thanks Hylton - and an update to that post, my understanding is that at least one other person is now using the separated test repo for their own tabulator tests.

@tarheel
Copy link
Contributor

tarheel commented Aug 8, 2022

@artoonie thanks. I'm still not clear on what you mean by this: "After you change a test file, you have to push to the tests repo then commit and push to this repo. It’s just a couple more steps: two commits and two pushes instead of one of each."

After I commit and push to the rcvtests repo, what exactly do I have to commit/push in the rcv repo? It's not just a matter of resyncing my local copy?

@artoonie
Copy link
Collaborator Author

artoonie commented Aug 9, 2022

@tarheel a git submodule points to a specific commit, not to a branch - so when you update the main branch on rcvtests, the rcv repo will not automatically point to that new commit.

For example, if you update rcvtests:

cd /src/test/resources/network/brightspots/rcv/test_data/
git add new-file.txt
git commit -a -m "added new file"

then go back to the rcv repo, you'll notice that git status is dirty:

cd ..
git status

So you'll want to update rcv to point to the latest commit in rcvtests:

git commit test_data -m "updated test data with latest changes

(By the way: I think submodules are nice, but as you can see, they are a headache when first getting started with them. I wouldn't be offended if you prefer to revert this change and bring the test data back into this repo and remove the submodule.)

@moldover
Copy link
Contributor

moldover commented Aug 9, 2022

@artoonie I think at this point the submodule is doing more harm than good. I fully support the vision, but until there are more users who really benefit from submodules, I believe it should be removed. And FWIW, I am extremely familiar with submodule usage, so this is not coming from a place of learning curve - I have re-pinned more submodules than anyone should ever have to.

@HEdingfield
Copy link
Contributor

HEdingfield commented Aug 9, 2022

I guess I come in on the other side... I think once we get past this initial little hurdle of getting comfortable with updating it this way, it'll be a net benefit in the long-run (especially since there's already another project using it). That said, this is my first experience using a submodule.

@tarheel did the stuff above work out for you?

@tamird any thoughts on this?

@tarheel
Copy link
Contributor

tarheel commented Aug 9, 2022

I haven't tried out the latest guidance, but will soon!

My instinct matches @HEdingfield's: I don't want to scrap it just because it's causing me a bit of startup friction. I mostly want to base this decision on how significant of an ongoing cost it will impose. @moldover do you think it will continue to be more trouble than it's worth?

@moldover
Copy link
Contributor

moldover commented Aug 9, 2022

Yes. 100% not even close.

@moldover
Copy link
Contributor

moldover commented Aug 9, 2022

I challenge you to tell me how hypothetical benefits outweigh very real disadvantages.

@tarheel
Copy link
Contributor

tarheel commented Aug 9, 2022

Could you elaborate a bit on what the disadvantages are? I have zero experience with submodules, so all I know so far is that 1) it makes it easy for others to reuse our tests and 2) it makes the process of updating our tests a bit circuitous.

@moldover
Copy link
Contributor

moldover commented Aug 9, 2022

  1. it's not clear to me how a submodule makes it easier to incorporate test data. Show me this project, and explain why they can't incorporate our test data without a dedicated submodule.

The disadvantages are: having this conversation which is wasting all our time. Every dev will have to learn this stuff and it can take a while to explain. Will you sign up to hand-hold the next volunteer dev as you requested Armin do for you?

  1. Every time you need to add / edit a test there will be extra steps. You want to properly isolate changes to test data during development? Better make a branch. Guess what? Github will not automate merging / deleting your test data branch afterwards so you need to do that manually. I promise you this will result in commit errors and moreover fewer tests being created. That's why. If there comes a time when the submodule is really needed it can be added. Until then we are going to pay a tax whenever we touch test data for what benefit?

@tarheel
Copy link
Contributor

tarheel commented Aug 9, 2022

it's not clear to me how a submodule makes it easier to incorporate test data. Show me this project, and explain why they can't incorporate our test data without a dedicated submodule.

That's true. @artoonie any counterargument to that?

@tamird
Copy link
Collaborator

tamird commented Aug 9, 2022

If the goal is for another project to reuse the same tests, is there a reason they can't consume our entire repo as a submodule?

@artoonie
Copy link
Collaborator Author

artoonie commented Aug 9, 2022

rcvcruncher started to use this, though the tests were all failing so we paused on merging it in (and the rcvcruncher developer is no longer with FV, so that PR is on hold). Another RCV tinkerer you are all familiar with is also using this, though I emailed him this morning and he said he's fine if you delete this repo, as long as he can pull a local copy before it's gone. That's only two users that I'm aware of.

This affects all of you more than it does me, so I will trust your judgement here and not argue one way or another. I would like to address some of the questions and concerns raised. First, let me share my vision:

Vison

  1. A test suite usable by anybody developing a tabulator, with results that should be consistently generated by any tabulator in any language, in order to find bugs or inconsistencies. The test suite has already identified a few discrepancies in the rcvcruncher: for example, the Tabulator always eliminates Undeclared Write-Ins first, whereas rcvcruncher does not.
  2. If multiple independently-developed tabulators generate the same result, it builds trust in both of them. Right now, most of the tests are regression tests, with no good way to ascertain correctness.

Questions/Concerns

  1. Can they consume our entire repo? Sure, though it is messier: for example, you'd be pulling a ton of java into a python project, and you'd be updating the other repo each time RCTab updates, even if those changes don't affect the test repo. This is functionally usable, but will discourage other repos from adding their own test CVRs, and will likely discourage this usage entirely. I've never seen a repo used as a submodule just for its test suite - it seems pretty clunky. But yes, it's workable.
  2. Will there be a ton of extra steps during development? I've worked with submodules many times, and most of the cost is upfront. During development, the additional steps become pretty minor and don't feel like a hassle.
  3. Is there a lot of upfront training required for new developers to this repo? Yes, potentially, and I agree this is the biggest downside.

Again, I defer to your judgement, but I'd stress my experience with submodules: like anything in git, there's an initial learning curve but it can quickly become a useful tool.

@tamird
Copy link
Collaborator

tamird commented Aug 9, 2022

Yep, I generally agree. I've used submodules a lot in the past and they work well once you understand what they are and are not meant to do. Standard tools are good.

@HEdingfield
Copy link
Contributor

HEdingfield commented Aug 9, 2022

Big thanks for writing this up, @artoonie. FWIW, I support this vision, and I also like that it slims down and focuses the RCTab repo.

Re: @moldover's points (thanks for these too):

I promise you this will result in commit errors and moreover fewer tests being created.

Agree that fewer tests being created is a big potential concern, but I think we should wait to see at least after this development cycle if there's that much extra friction.

It seems like a fairly straightforward flow to first build the tests out locally, then submit a PR that adds the new test data to rcvtests, then sync up and submit a PR that adds the test in rcv. It seems like the biggest pain point might be waiting for the rcvtests PR to get approved, but we could mitigate this by not requiring a review for the core developers to add data to it.

Maybe if @artoonie can write up a little cookbook of a few common scenarios (similar to what he did here -- and particularly ones that @moldover is concerned about), that would sufficiently smooth things out, and would serve as something we can point new developers to as well.

E.g.:

  • I want to add a new test
  • I want to modify an existing test
  • I want to delete a test and its data (not sure we'd actually have a real use-case for this)
  • Anything else?

@moldover
Copy link
Contributor

moldover commented Aug 9, 2022

This does not cut it. As @artoonie acknowledges, external devs can consume this data either way, and the two known users don't care. On the other hand, the last 5 commits to the testdata submodule are mine; it has caused me very real friction and I do care. It's not a "ton" of extra steps, but it adds up. This is not learning curve speaking. I've been working with submodules for years.

I'm not suggesting we remove this submodule for this release or that there is any urgency here. If y'all want to learn how to use submodules, go for it. They are a standard tool and well worth learning. Add some test data, please.

If external devs want to add test data to the RCV corpus or incorporate it into their project I will gladly help them do it. But until there is a compelling use-case (real usefulness > real friction) this is early optimizing: a bad idea. And I have some CDF conversion routines I want to sell you.

@andyanderson
Copy link

andyanderson commented Aug 9, 2022 via email

@HEdingfield
Copy link
Contributor

@moldover It's clear that you feel strongly about this, and also have more experience than I do with submodules, so I went ahead and filed #612.

@artoonie Would the RCVis repo work ok referencing the test data inside this repo instead of the separate rcvtests one? I think that was the original motivation for moving it out, right? I just want to make sure this wouldn't break anything for you.

@tarheel
Copy link
Contributor

tarheel commented Aug 10, 2022

Re “the Tabulator always eliminates Undeclared Write-Ins first,”

By that you mean any candidate not printed on the ballot? I hope that’s only if they have the fewest number of votes? It is possible for an undeclared write-in to win an election.

If there's a candidate string in the CVRs that's literally "UWI" or "Undeclared" or whatever (which we've seen at least in ES&S CVRs, I believe), you can set the undeclared write-in label to that value in your config file, and the tabulation will automatically eliminate that candidate in the first round regardless of how many votes they have. But if you want that candidate string to be treated like a normal candidate, you can just enter it into the config's candidate list instead of using the undeclared write-in label.

@artoonie
Copy link
Collaborator Author

@HEdingfield, RCVis doesn't use the rcvtests repo, so we're all good. I can update rcvcruncher to use this separate repo, and let Shel know about the change so he can get the tests from their previous location when that is complete.

@andyanderson
Copy link

Re «If there's a candidate string in the CVRs that's literally "UWI" or "Undeclared" or whatever (which we've seen at least in ES&S CVRs, I believe), you can set the undeclared write-in label to that value in your config file, and the tabulation will automatically eliminate that candidate in the first round regardless of how many votes they have.»

I don’t see this documented in either config_file_documentation.txt or README.md. Is there someplace else it appears? If not I’ll create a new issues ticket, because I think this behavior should be clear to users.

Reference: https://ballotpedia.org/Jo_Comerford#Democratic_primary_election

@artoonie
Copy link
Collaborator Author

@andyanderson In case there's confusion: there's a difference between a write-in and an undeclared write-in. In most elections, even write-in candidates must declare their intention to be a part of the race; if they do not, they are ineligible to win. Note that this is indeed documented in config_file_documentation.txt,

@andyanderson
Copy link

@artoonie — Thanks, this is exactly the answer I needed to see for my original question “By [an undeclared write-in] you mean any candidate not printed on the ballot?”

Please note that your clear description is definitely not in config_file_documentation.txt. Not everyone will understand this distinction. I’ll create a separate issue and suggest updated text.

@moldover
Copy link
Contributor

moldover commented Aug 10, 2022

@artoonie I wanted to respond here. Also wanted to apologize for not having brought this up at the time. I'm sorry.

Can they consume our entire repo?_ Sure, though it is messier: for example, you'd be pulling a ton of java into a
python project, and you'd be updating the other repo each time RCTab updates, even if those changes don't affect the test repo. This is functionally usable,

In my experience this is pretty standard for any package. There are files on disk you don't care about and updates don't necessarily apply to you.

will discourage other repos from adding their own test CVRs, and will likely discourage this usage entirely.

Yeah I agree with you. Can give us a sense for what the community of other tabulator developers who could collaborate is like? Have you had conversations with other people?

Will there be a ton of extra steps during development?_ I've worked with submodules many times, and most of the cost is upfront. During development, the additional steps become pretty minor and don't feel like a hassle.

Is there a lot of upfront training required for new developers to this repo?_ Yes, potentially, and I agree this is the biggest downside.

Yeah I agree with you there, and that's the threshold for me.

In The Vision I'm wondering about what happens when we want to change something? Would we need to keep it backwards compatible? Does RCVRC own it? You mention a standardized test suite... who sets it?

@artoonie
Copy link
Collaborator Author

artoonie commented Aug 11, 2022

Re the vision:
I imagined whoever owns RCTab would also own the test suite, and nothing would change from RCTab's perspective. The most recent commit on Test Suite Branch X would always be compatible with the most recent RCTab Repo branch X. As you mentioned, this benefit is mostly hypothetical for now, and would require more than a repo split to actually have multiple tabulators relying on the test suite.

Re the community:
Perhaps we should take that bit to email, since I'm not in a position to speak for the other developer. I'll just note that he'd be okay with anything that makes it easy to grab just the test suite, and "easy" includes "clone all of RCTab and delete what you don't need".

@moldover
Copy link
Contributor

Sounds good. Once the submodule is removed it will still be easy to grab test data:

Download and unzip RCTab: https://github.com/BrightSpots/rcv/archive/refs/heads/develop.zip
test data is in: rcv/src/test/resources/network/brightspots/rcv/test_data

The non-test files are only 24.7MB and grow slowly, so there is no need to delete them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants