Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0095] Enable doCheck by default #95

Conversation

gytis-ivaskevicius
Copy link

rfcs/0095-enable-docheck-by-default.md Outdated Show resolved Hide resolved
rfcs/0095-enable-docheck-by-default.md Outdated Show resolved Hide resolved
rfcs/0095-enable-docheck-by-default.md Show resolved Hide resolved
rfcs/0095-enable-docheck-by-default.md Outdated Show resolved Hide resolved
@nh2
Copy link
Contributor

nh2 commented Jun 25, 2021

Tangential, but maybe worth asking:

Would it be possible/sensible to separate build and test phases into separate derivations, or something of that kind?

Sometimes, running the tests requires some overrides (e.g. LD_LIBRARY_PATH) on the checkPhase only, requiring developer iteration on that phase only. Currently, this requires a full rebuild (e.g. 1-hour rebuild for large software like ceph).

@edolstra
Copy link
Member

I think this is high cost, low benefit. Most upstream test suites don't test packaging bugs, so as long as upstream runs their own test suite before release, enabling doCheck by default won't reveal a lot of bugs for us. But the cost is very high: many test suites are very slow (sometimes taking hours), pull in lots of additional dependencies, and are brittle (so we'll get a lot of random test failures that we need to debug and patch).

@Ericson2314
Copy link
Member

@edolstra I would say all those bad test sweets should just be marked doCheck = false. I don't see this PR as us committing to bailing out upstream packages in new significant ways, just maintaining a blacklist rather than a whitelist.

I think that is nonetheless a big improvement because it encourages maintainers to at least try the test suite, and the opt-outs will come with useful comments saying what's wrong. Blacklists are also better to indicate to upstream devs that something is wrong --- which is different and easier than actually fixing the issue ourselves!

@nh2
Copy link
Contributor

nh2 commented Jun 25, 2021

As somebody who uses NixOS as production infrastructructure with user data that must not be lost, I would very welcome a change to tests-by-default, with problematic tests turned off where needed.

@Zimmi48
Copy link
Member

Zimmi48 commented Jun 26, 2021

I agree that running test-suites by default would help detect new packaging errors, and therefore would be useful. However, a significant drawback of the proposed approach that this RFC doesn't acknowledge is the one that I reported in #55648 and https://discourse.nixos.org/t/behavior-of-undeclared-broken-packages-is-confusing/2123.

The only clear solution to the above problem is an always green hydra (though it doesn't solve the issue of derivations that are not supposed to be built in hydra).

Furthermore, when it is possible to test packages in separate derivations (cf. passthru.tests documented at https://nixos.org/manual/nixpkgs/stable/#var-meta-tests), then we should encourage this over doCheck = true (though it's harder to do in practice since test-suites rarely support this use case).

gytis-ivaskevicius and others added 5 commits June 26, 2021 17:11
Co-authored-by: Niklas Hambüchen <mail@nh2.me>
Co-authored-by: Niklas Hambüchen <mail@nh2.me>
Co-authored-by: Niklas Hambüchen <mail@nh2.me>
Co-authored-by: Niklas Hambüchen <mail@nh2.me>
Co-authored-by: Niklas Hambüchen <mail@nh2.me>
@gytis-ivaskevicius
Copy link
Author

gytis-ivaskevicius commented Jun 26, 2021

I was thinking about this - maybe one middle ground that is worth considering is running checks only during PR review stage.

@zimbatm
Copy link
Member

zimbatm commented Jun 27, 2021

This proposal appeals to my "let's make everything strict" side. At the same time, I don't remember a single time where unit tests uncovered an issue with the packaging. Usually, the issue is something related to the runtime, which isn't exercised by most tests.

Are there examples out there where doCheck = true would have saved us from runtime issues?

@nh2
Copy link
Contributor

nh2 commented Jun 27, 2021

Are there examples out there where doCheck = true would have saved us from runtime issues?

I am quite sure that I've seen such cases in Ceph and GlusterFS builds, where the upstram test suites uncovered missing runtime dependencies (e.g. Python libs) or lack of wrapping like LD_LIBRARY_PATH.

I didn't write them down at the time though, as at the time I wasn't aware that doCheck = true isn't the default.

@kevincox
Copy link
Contributor

Missing runtime deps is a common problem that could be caught. I think that default-enable is the right approach. As long as disabling the checks are simple and not frowned upon, lots of software has fast and helpful test suites that wkll catch some common issues.

@gytis-ivaskevicius
Copy link
Author

I updated the section on doCheck semantics and it seems that everyone has stated their opinions. The majority seems to like the idea but "cost vs value" is still in question.

Do you guys think that we can take this RFC to the next stages?

@Ericson2314
Copy link
Member

I nominate myself.

@gytis-ivaskevicius
Copy link
Author

Nice going @Ericson2314 🚀
Who else would like to join in?? @nh2 @oxij thughts?

@spacekookie spacekookie added the status: open for nominations Open for shepherding team nominations label Jul 8, 2021
@michaelpj
Copy link

Would it be possible/sensible to separate build and test phases into separate derivations, or something of that kind?

Somewhat tangential, but haskell.nix does this. It's designed to focus on projects under development, though, where you really do want to run your tests in your CI! Having them in separate derivations is however extremely helpful in preventing slow tests from clogging the overall build progress. So I would say that this has proven quite valuable if you do want to run tests.

@nh2
Copy link
Contributor

nh2 commented Jul 8, 2021

I'm not very experienced in the RFC discussion / calls process yet, so I don't want to be in a position of leadership for it, but I'd certainly like to participate in such things to get into it.

@piegamesde
Copy link
Member

Would it be possible/sensible to separate build and test phases into separate derivations, or something of that kind?

As far as I can tell, this would also allow us to better separate the build and the check process. For example, at the moment the only way to build packages without also running the tests is to use overrides, which gets messy quickly.

@Mic92
Copy link
Member

Mic92 commented Jul 22, 2021

We need at least one more shepherd for this RFC.

Co-authored-by: Jörg Thalheim <Mic92@users.noreply.github.com>
@spacekookie
Copy link
Member

Shepherds for this RFC are @Ericson2314, @nh2, and @edolstra

@Ericson2314 would you mind being the leader?

@Mic92
Copy link
Member

Mic92 commented Sep 9, 2021

@gytis-ivaskevicius please also update the yaml frontmatter to incorporate the shepherds.

@AndersonTorres
Copy link
Member

A thing about language normalization:

If I remember well, every phase can be turned on or off by setting attributes such as dontConfigure to true or false.

However, some of these attrs have name dontX and others have doX.

How sould we normalize them? Putting all as dontX? and using false as default?

@Ericson2314
Copy link
Member

Just so the rest of world knows, I made a matrix channel with the author and shepherds to get the ball rolling.

@Mic92
Copy link
Member

Mic92 commented Nov 3, 2021

@Ericson2314 have you tried scheduling a meeting already?

@gytis-ivaskevicius
Copy link
Author

Yeah, we tried scheduling a meeting and it was unsuccessful. Will try to align the meeting with everyone later today.

@nh2
Copy link
Contributor

nh2 commented Nov 11, 2021

Meeting happening right now

@edolstra
Copy link
Member

Meeting summary:

  • While tests are useful, enabling them has downsides:
    • Increased build times.
    • More non-deterministic build failures.
    • Extra dependencies for the test framework.
    • Upstream tests don't often reveal downstream packaging/integration issues, because most are functional tests that are unlikely to break.
    • Upstream tests typically run against the build tree rather than the installed package, so they don't actually test what we're interested in (namely whether the installed package works).
  • If enabling doCheck globally is too expensive, there are some ideas for running tests anyway:
    • Let ofborg build pkg.override { doCheck = true; }. That way our CI runs tests but users who build from source don't have to.
    • Have more .passthru.test derivations to test installed packages.
    • Split tests into separate derivations, e.g. by saving the build tree into a separate output and running the test from there. This would be quite expensive for Hydra in terms of storage space, since build trees are large.
  • Action items:
    • Add guidelines to the RFC (for inclusion in the Nixpkgs docs) on when doCheck should be enabled, e.g. 1) the tests shouldn't take "too long"; 2) the tests shouldn't require (big) additional dependencies; 3) the tests shouldn't be flaky.
    • Set up a nixpkgs branch to test the cost of enabling doCheck globally.
    • Create a sprintable project (like ZHF) for setting explicit doCheck = true|false attributes on as many packages as possible.
  • There was some parenthetical discussion on whether enableParallelBuilding should be enabled by default, but that's a topic for another RFC.

@AndersonTorres
Copy link
Member

  • Split tests into separate derivations, e.g. by saving the build tree into a separate output and running the test from there. This would be quite expensive for Hydra in terms of storage space, since build trees are large.

Isn't it possible to reuse/cache such build trees when enabling tests?

- By default `doCheck` option should be enabled as long as `stdenv.hostPlatform == stdenv.buildPlatform`.
- Non-reproducible test prevention should be implemented.
- All failing packages should be fixed or updated with `doCheck = false;`

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Guidelines to refrain from enabling tests:
- If tests are taking _too_ long. (Tests aren't expected to run longer than build time. In case of quick builds - not more than 10min)
- If tests require additional large dependencies.
- If tests are flaky. (If tests randomly fail once in a while)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guys, sounds good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guidelines to refrain from enabling tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S.:

  • s|If|When|
  • When tests are flaky, unpredictably failing.

Comment on lines +33 to +37
**New `doCheck`/`doInstallCheck` semantics:**
In addition to booleans, `doCheck`/`doInstallCheck` should also accept strings.
- String value should be considered as `false`
- It should be used as a place for comment on why the check is disabled. For
example: "Requires X11 server" or "Requires network access".
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**New `doCheck`/`doInstallCheck` semantics:**
In addition to booleans, `doCheck`/`doInstallCheck` should also accept strings.
- String value should be considered as `false`
- It should be used as a place for comment on why the check is disabled. For
example: "Requires X11 server" or "Requires network access".
**New semantics:**
- `doCheck`/`doInstallCheck` should default to `null` and work exactly the same as `false`
- New options should be introduced: `meta.{checksFlaky,checksLargeDependencies,checksTakeTooLong,checksDisableReason}`

The first point is so we could evaluate checks that are not set. Derivation paths are not expected to change.
And the second point is so we could evaluate the reason why checks were disabled. Maybe defining additional attrset would be nice? like checks.{flaky,largeDependencies,takeTooLong,disableReason}? Any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on the source code and a Boolean value shoud suffice. There is no reason to overload this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During RFC call, we thought that it would be a very simple change which would allow us to actually evaluate the reason why tests were disabled and that is definitely handy. I am still in favor of it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, it is preferrable a (short?) string, without true/false semantics attached to it.
After all, it can be useful to use the string to convey a useful information even in the case the tests are mandatory.

Something like

doCheck = false;
checkReason = "I don't wanna fetch X.Org just to verify a blinking box!";

Or even better, a new attrset:

check.enable = true;
check.reason = "This package is critical for bootstrap";

@gytis-ivaskevicius
Copy link
Author

Sorry fellow nix'ers, but these days I don't have time for pretty much anything which is why I am closing this RFC. If anyone wishes to take over this RFC - feel free to do so. Also for the record - it seems that #119 addresses most things that I am interested in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet