Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unhelpful error messages in aggregate jobs. #825

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

@samueldr
Copy link
Member

@samueldr samueldr commented Oct 25, 2020

From the commit messages:


hydra-eval-jobs: Identify unexpected errors in handling aggregate jobs

The vague "[json.exception.type_error.302] type must be string, but is null" is absolutely unhelpful in the way Hydra currently handles it on evaluation.

This is handling unexpected errors only; the following commit will handle the specific instance of the previously mentioned error.


hydra-eval-jobs: Transmit original Nix error when handling aggregate jobs

It might happen that a job from the aggregate returned an error!

This is what the vague "[json.exception.type_error.302] type must be string, but is null" was all about in this instance; there was no drvPath to stringify!

So we now actively watch for errors and copy them to the aggregate job.


Note: This wasn't used inside Hydra. The evaluator was only ran directly against the repo.

env -i HYDRA_CONFIG=/Users/samuel/tmp/nixpkgs/hydra.conf \
    /Users/samuel/Projects/nixos/hydra/./src/hydra-eval-jobs/hydra-eval-jobs \
    -I /nix/store \
    -I ./hydra-fail/ \
    ./hydra-fail/nixos/release-small.nix

This was verified on top of NixOS/nixpkgs@f262382

So I do not know if adding an error attribute on the aggregate job will work as expected.

I'm also a bit exhausted from figuring out the error. I'll let someone that knows more about Hydra guide or fix the PR so the aggregate job properly reports that there was an error. Ideally The aggregate job HAS to be failed, otherwise we would have a missing job!

With this change, the job in the JSON file gets the original errors, if present.

Additionally, new unexpected errors would show up like this:

Unexpected error in hydra-eval-jobs when handling job 'nixos.tests.boot.biosCdrom.x86_64-linux', when producing aggregate job 'tested':
error: [...]

cc @andir

The vague "[json.exception.type_error.302] type must be string, but is null"
is **absolutely** unhelpful in the way Hydra currently handles it on
evaluation.

This is handling *unexpected* errors only; the following commit will
handle the specific instance of the previously mentioned error.
@@ -442,45 +442,66 @@ int main(int argc, char * * argv)
for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) {
auto jobName = i.key();
auto & job = i.value();
// For the error message
std::string lastTriedJobName = i.key();
Copy link
Member Author

@samueldr samueldr Oct 25, 2020

I guess the initialization here is not great...

I think when I wrote that up originally I was thinking about using auto to intrinsically get the type. In that case auto lastTriedJobName = jobName is probably better.

Otherwise, initializing with "".

What would be the better option?

Copy link
Member

@andir andir Oct 26, 2020

IIRC the nlohman value key() function returns reference. auto should then deduct the type T& which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.

if (job2 == state->jobs.end())
throw Error("aggregate job '%s' references non-existent job '%s'", jobName, jobName2);

if ((*job2).find("error") != (*job2).end()) {
Copy link
Member Author

@samueldr samueldr Oct 25, 2020

Needing to dereference here tripped me up. I get that this comes from auto job2 being the result of the search. Is there a pattern we could be using so that we don't have to dereference all the time?

Copy link
Member

@andir andir Oct 26, 2020

auto& something = *job2 should do the trick here and then using something instead of *job2.

Copy link

@lukegb lukegb Aug 1, 2021

Honestly, this is probably fine? I'd probably replace the (*job2). with just job2-> but other than that this seems... fine.

if (job.find("error") == job.end()) {
job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName);
}
Copy link
Member Author

@samueldr samueldr Oct 26, 2020

Is there a better pattern to initialize the error message on the JSON object?

Copy link
Member

@andir andir left a comment

Just did a quick pass.

src/hydra-eval-jobs/hydra-eval-jobs.cc Outdated Show resolved Hide resolved
@@ -442,45 +442,66 @@ int main(int argc, char * * argv)
for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) {
auto jobName = i.key();
auto & job = i.value();
// For the error message
std::string lastTriedJobName = i.key();
Copy link
Member

@andir andir Oct 26, 2020

IIRC the nlohman value key() function returns reference. auto should then deduct the type T& which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.

…jobs

It might happen that a job from the aggregate returned an error!

This is what the vague "[json.exception.type_error.302] type must be string, but is null"
was all about in this instance; there was no `drvPath` to stringify!

So we now actively watch for errors and copy them to the aggregate job.
@ju1m
Copy link

@ju1m ju1m commented Dec 1, 2020

I don't know if this PR is ready, but just to let you know that merging it would unblock NixOS/nixpkgs#101071

@samueldr
Copy link
Member Author

@samueldr samueldr commented Dec 3, 2020

I've stated in the past on the IRC channels (a couple of times) that it needs to be reviewed by "code owners" of that section for the open questions.

Other than that, in my opinion it is ready, if not the exact code as shown, the same but with the corrections and nits fixed.

I know that, during my testing, it was working as expected.

ju1m added a commit to ju1m/nixpkgs that referenced this issue Jan 3, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Jan 5, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Jan 24, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Jan 24, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Jan 27, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Feb 18, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Feb 19, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Feb 28, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Mar 5, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
ju1m added a commit to ju1m/nixpkgs that referenced this issue Apr 23, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
7c6f434c pushed a commit to NixOS/nixpkgs that referenced this issue Apr 24, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
@lukegb
Copy link

@lukegb lukegb commented Aug 1, 2021

This would have massively helped debugging NixOS/nixpkgs#132328.

if (job.find("error") == job.end()) {
job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName);
}
job["error"] = fmt("While handling '%1%': %2%\n", jobName2, (std::string) (*job2)["error"]);
Copy link

@ncfavier ncfavier Aug 1, 2021

Doesn't this always overwrite the assignment on line 473?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants