Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

notparallel variable is broken #1360

Closed
BuildStream-Migration-Bot opened this issue Feb 4, 2021 · 26 comments
Closed

notparallel variable is broken #1360

BuildStream-Migration-Bot opened this issue Feb 4, 2021 · 26 comments

Comments

@BuildStream-Migration-Bot

See original issue on GitLab
In GitLab by [Gitlab user @willsalmon] on Jul 10, 2020, 10:39

Summary

The notparallel variable dose not seem to force MAKEFLAGS: -j1 in the environment

This means that projects that have bugs associated with make -j2 can not be built, eg slang.

Please see the failure: https://gitlab.com/celduin/bsps/bst-boardsupport/-/jobs/625698066
The variable being incorrectly set on line 21 in the log: https://gitlab.com/celduin/bsps/bst-boardsupport/-/jobs/625698066/artifacts/file/artifact/freedesktop-sdk/components-slang/1ad3d1df-build.3146.log
The element https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/master/elements/components/slang.bst showing the:

variables:
  builddir: ''
  notparallel: true

The problem is further compounded by the fact that bst will not let you set MAKEFLAGS in the environment section as it is protected

@BuildStream-Migration-Bot
Copy link
Author

In GitLab by [Gitlab user @willsalmon] on Jul 10, 2020, 10:39

changed the description

@BuildStream-Migration-Bot
Copy link
Author

In GitLab by [Gitlab user @willsalmon] on Jul 10, 2020, 10:40

mentioned in merge request celduin/bsps/bst-boardsupport!17

@BuildStream-Migration-Bot
Copy link
Author

In GitLab by [Gitlab user @cs-shadow] on Jul 17, 2020, 13:58

First, I am not able to reproduce this. In my experiments, BuildStream Core sets the max-jobs variable to 1 when notparallel: true is specified. And, the autotools plugins sets MAKEFLAGS: -j%{max-jobs}. If you can try again, and provide a minimal repro case, I might be able to dig into it more.

The problem is further compounded by the fact that bst will not let you set MAKEFLAGS in the environment section as it is protected

Also, I'm not sure if I follow this. You should be able to set MAKEFLAGS to whatever you want. Where did you get the idea that it was protected in some way?

@BuildStream-Migration-Bot
Copy link
Author

In GitLab by [Gitlab user @tristanvb] on Nov 6, 2020, 09:24

Closing this with needinfo, please reopen if this is still an issue.

@abderrahim
Copy link
Contributor

I can reproduce this issue, can someone reopen this or should I file a new one?

@gtristan
Copy link
Contributor

gtristan commented Dec 6, 2021

I can reproduce this issue, can someone reopen this or should I file a new one?

This feels unlikely, with which plugin are you able to reproduce? It is the plugin YAML which is responsible for setting MAKEFLAGS according to %{max-jobs} (but that is already mentioned in Chandan’s comment).

Is %{max-jobs} not being setup properly in the core ?

Or, is it possible you are confusing the role of notparallel with the role of the —builders cli option ?

Or, is it possible that -j is being specified somewhere else at a higher priority? E.g. explicit setting in the .bst file environment settings or explicitly in the build-commands ?

@abderrahim
Copy link
Contributor

I'm still not sure exactly what happens as I'm yet to reproduce it locally but here is the element: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/abderrahim/bst2/elements/components/slang.bst

And here is a failing build https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/jobs/1849722318/artifacts/file/cache/buildstream/logs/freedesktop-sdk/components-slang/9cfbabfb-build.169.log (notice MAKEFLAGS is set to -j8)

@abderrahim
Copy link
Contributor

It seems to be that max-jobs is not being setup correctly in core. As I'm trying to reproduce the issue, I notice that some elements which don't have notparallel set are being built with -j1. I'm not sure what's happening exactly.

@gtristan gtristan reopened this Dec 6, 2021
@abderrahim
Copy link
Contributor

What I have noticed is that if the element I'm passing on the command line has notparallel set, other elements are built with -j1 and if it doesn't have notparallel set, elements with notparallel are built with -j8.

My current theory is that the variable in the plugin yaml are being expanded on the first use of the plugin (rather than making a copy and expanding the variables there).

@abderrahim
Copy link
Contributor

So the above theory is correct: expanding the variables in https://github.com/apache/buildstream/blob/master/src/buildstream/element.py#L2834 changes the value of the class variable __defaults.

The line responsible for this is default_env._composite(environment) as defined in Element.__extract_environment(). The following change fixes the issue, though I'm not sure if it's the right approach or if the _composite() method should be modified instead.

diff --git a/src/buildstream/element.py b/src/buildstream/element.py
index ee3b2fc75..b1c8d16a0 100644
--- a/src/buildstream/element.py
+++ b/src/buildstream/element.py
@@ -2947,7 +2947,7 @@ class Element(Plugin):
         else:
             environment = project.base_environment.clone()
 
-        default_env._composite(environment)
+        default_env.clone()._composite(environment)
         element_env._composite(environment)
         environment._assert_fully_composited()

@jjardon
Copy link
Contributor

jjardon commented Jan 17, 2022

Build in fdsdk using bst2 fail because of this atm: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/jobs/1979958539

@gtristan

@gtristan
Copy link
Contributor

For now I'm adding this basic test which shows notparallel working properly: #1570

It would be good to expand on this to reproduce this quirky behavior we are seeing in fdsdk.

@gtristan
Copy link
Contributor

gtristan commented Jan 18, 2022

Maybe the lines of code in master have shifted since this comment, my 2834 says:

    self.__environment = unexpanded_env.strip_node_info()

So the above theory is correct: expanding the variables in https://github.com/apache/buildstream/blob/master/src/buildstream/element.py#L2834 changes the value of the class variable __defaults.

I'll try to summarize what should be happening here in __initialize_from_yaml() (which is outlined here in the docs):

  • self.__init_defaults() will initialize the class variable __defaults, this is a composition of project level overrides on top of plugin provided defaults
    • Questionably this composition should also be composited on top of project.base_variables, I think the reason why we cannot immediately cache builtin -> project.conf -> plugin -> project.conf overrides is because of the dual-pass loading of project.conf where the first pass is not a complete load.
  • self.__extract_variables() will compose the variables
    • The previously cached default_vars is loaded from the class variable __defaults
    • The default_vars (which is only plugin + overrides) is composited on top of the project.conf vars
    • The element_vars is composited on the very top of this composition and has the last word
  • Now that we have a fully sane composition of the variables MappingNode for this particular element instance, the Variables() object instance is created for this element instance
    • This is where the notparallel magic happens, forcing the max-jobs variable to 1, this seems like the right time to do this to me.
  • self.__extract_environment() will now compose the environment following similar rules
    • The previously cached default_env is loaded from the class variable __defaults
    • The default_env (which is only plugin + overrides) is composited on top of the project.conf environment
    • The element_env is composited on the very top of this composition and has the last word
  • Now the previously created Variables() instance is used to expand variables in the fully composited unexpanded_env, at this point the Variables() instance should have max-jobs set to 1 already and we should have the correct MAKEFLAGS

The line responsible for this is default_env._composite(environment) as defined in Element.__extract_environment(). The following change fixes the issue, though I'm not sure if it's the right approach or if the _composite() method should be modified instead.

If your patch does indeed change something, then I think you've indeed found a bug in MappingNode.composite(), this function should modify the target node only and should not cause residual side effects in the source node.

@abderrahim
Copy link
Contributor

abderrahim commented Jan 18, 2022

Yup, lines have shifted. The line in question is

self.__variables.expand(unexpanded_env)

So what happens is that Node._composite() does a shallow copy of the nodes, then Variables.expand() modifies them in-place, resulting in a modified __defaults class variable. which is then composited into other elements with the wrong value.

Here is a minimal reproducer for the issue:

from buildstream.node import Node
from buildstream._variables import Variables

default = Node.from_dict({'MAKEFLAGS': '-j%{max-jobs}'})
element = Node.from_dict({'name': 'my-element.bst'})
v = Variables(Node.from_dict({'max-jobs': '1'}))

print(default, element, sep='\n')
# [synthetic node]: {'MAKEFLAGS': '-j%{max-jobs}'}
# [synthetic node]: {'name': 'my-element.bst'}

default._composite(element)

print(default, element, sep='\n')
# [synthetic node]: {'MAKEFLAGS': '-j%{max-jobs}'}
# [synthetic node]: {'name': 'my-element.bst', 'MAKEFLAGS': '-j%{max-jobs}'}

v.expand(element)

print(default, element, sep='\n')
# [synthetic node]: {'MAKEFLAGS': '-j1'}
# [synthetic node]: {'name': 'my-element.bst', 'MAKEFLAGS': '-j1'}

@abderrahim
Copy link
Contributor

I just stumbled upon this old MR: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1929

It seems to suggest we're indeed supposed to use Node.clone() with composition.

CC: @BenjaminSchubert

@BenjaminSchubert
Copy link
Contributor

I think that's indeed the right thing to do here as far as I can tell

@gtristan
Copy link
Contributor

gtristan commented Jan 20, 2022

I think that's indeed the right thing to do here as far as I can tell

I think this is clearly wrong.

Any code that I can think of which composites one mapping on top of another mapping, expects the source mapping to remain unmodified by the operation, only the target mapping should be affected by composition.

What is likely happening here is that MappingNode._composite() is failing to overwrite the target mapping with clones of the source node, which it should be doing for mutable values at least.

@nanonyme
Copy link
Contributor

Are we still going to merge the bug fix? This is from freedesktop-sdk point of view beta blocker.

@gtristan
Copy link
Contributor

Are we still going to merge the bug fix? This is from freedesktop-sdk point of view beta blocker.

I don’t want to merge the workaround for what I think is clear cut broken behavior of MappingNode._composite(), this is probably an indicator of further brokenness we haven’t discovered yet. Probably this went unnoticed until now because we don’t do very much compositing of nested dictionaries and lists.

I’ve asked @BenjaminSchubert to come back and comment on this.

@gtristan
Copy link
Contributor

[...]

Here is a minimal reproducer for the issue:

from buildstream.node import Node
from buildstream._variables import Variables

default = Node.from_dict({'MAKEFLAGS': '-j%{max-jobs}'})
element = Node.from_dict({'name': 'my-element.bst'})
v = Variables(Node.from_dict({'max-jobs': '1'}))

print(default, element, sep='\n')
# [synthetic node]: {'MAKEFLAGS': '-j%{max-jobs}'}
# [synthetic node]: {'name': 'my-element.bst'}

default._composite(element)

print(default, element, sep='\n')
# [synthetic node]: {'MAKEFLAGS': '-j%{max-jobs}'}
# [synthetic node]: {'name': 'my-element.bst', 'MAKEFLAGS': '-j%{max-jobs}'}

v.expand(element)

print(default, element, sep='\n')
# [synthetic node]: {'MAKEFLAGS': '-j1'}
# [synthetic node]: {'name': 'my-element.bst', 'MAKEFLAGS': '-j1'}

So I'm trying to put together a minimal reproducer with a proper end-to-end test in tests/format/variables.py but still having trouble reproducing this.

I've got 2 manual elements where one depends on the other, and I'm unable to get one of them to pollute the other, by setting notparallel in one element and observing max-jobs in bst show --format '%{vars}' output.

Setting either element to notparallel does not pollute the second element, even adding a third element where the base element and toplevel element are both notparallel but the middle element does not specify notparallel still yields the expected result.

Were you able to reproduce this in a simplified end to end test ?

@abderrahim
Copy link
Contributor

Nope, my failed attempt is here: https://github.com/abderrahim/incorrect-max-jobs

I'll try to find out what the problem is.

@nanonyme
Copy link
Contributor

Maybe composite itself could be unittested to get it to break?

@abderrahim
Copy link
Contributor

Here you go: https://github.com/abderrahim/incorrect-max-jobs

bst show --format %{env} parallel.bst and bst show --format %{env} notparallel.bst will show the correct MAXJOBS variable. but bst show --format %{env} parallel.bst notparallel.bst will show the same value for both elements.

@gtristan
Copy link
Contributor

Maybe composite itself could be unittested to get it to break?

Possible but I'm really more interested in reproducing the issue with an end-to-end test, I'd rather keep internal unit testing down to a minimum if and when it is at all possible to reproduce a bug in action.

gtristan added a commit that referenced this issue Feb 3, 2022
…el issue

This test recreates the scenario described in #1360 in an end-to-end test,
by providing a custom plugin which makes use of %{max-jobs} substitution in
the default environment variables provided in the plugin's default YAML.
@gtristan
Copy link
Contributor

gtristan commented Feb 3, 2022

Here you go: https://github.com/abderrahim/incorrect-max-jobs

bst show --format %{env} parallel.bst and bst show --format %{env} notparallel.bst will show the correct MAXJOBS variable. but bst show --format %{env} parallel.bst notparallel.bst will show the same value for both elements.

I've provided an isolated test case for this in the https://github.com/apache/buildstream/tree/tristan/test-notparallel branch now (#1570).

To observe the broken behavior, you can type tox -- tests/format/variables.py::test_notparallel_twice

gtristan added a commit that referenced this issue Feb 3, 2022
As we've observed in #1360, it occurs that in composition, instead of
overwriting the target nodes with copies of source values, we insert the
mutable source values directly onto the target, this leaves the source
nodes vulnerable to future mutations when the code goes on to mutate
the resulting composited target.

Fixes #1360.
gtristan added a commit that referenced this issue Feb 3, 2022
…el issue

This test recreates the scenario described in #1360 in an end-to-end test,
by providing a custom plugin which makes use of %{max-jobs} substitution in
the default environment variables provided in the plugin's default YAML.
@gtristan
Copy link
Contributor

gtristan commented Feb 3, 2022

Now #1570 has been updated to contain both:

  • The updated test case which triggers the issue
  • The fix for MappingNode._composite() which ensures that we overwrite target nodes with clones of source nodes, so that the composite source is not left vulnerable to mutations when code goes on to mutate the composited target node.

gtristan added a commit that referenced this issue Feb 3, 2022
…el issue

This test recreates the scenario described in #1360 in an end-to-end test,
by providing a custom plugin which makes use of %{max-jobs} substitution in
the default environment variables provided in the plugin's default YAML.
gtristan added a commit that referenced this issue Feb 3, 2022
…el issue

This test recreates the scenario described in #1360 in an end-to-end test,
by providing a custom plugin which makes use of %{max-jobs} substitution in
the default environment variables provided in the plugin's default YAML.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants