-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add full Go 1.21 support #500
Add full Go 1.21 support #500
Conversation
cachi2/core/models/output.py
Outdated
for i, m in enumerate(mappings): | ||
log.debug(f"EnvironmentVariable resolution iteration {i}.: {ret}") | ||
|
||
placeholders_seen.append(m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested variable resolution is very cool 👍
But in the current state, cycle detection doesn't work correctly (depends on the order of the mapping
)
>>> e = EnvironmentVariable(name="A", value="$C")
>>> e.resolve_value({"C": "$B", "B": "1"})
'1'
>>> e.resolve_value({"B": "1", "C": "$B"})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/acmiel/RedHat/Cachito/cachi2/cachi2/core/models/output.py", line 66, in resolve_value
raise Cachi2Error(
cachi2.core.errors.Cachi2Error: Detected a cycle in environment variable expansion of 'A'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh! Nasty bug, shameful...@chmeliik thanks for pointing this out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you'll try to solve this meticulously first thing in the morning, but I wonder if we need nested variable resolution in the first place. I does look nice, but it adds quite some complexity, and maybe we don't need it right now.
Our current use case can be solved if we declare GOPROXY as:
file://${output_dir}/deps/gomod/pkg/mod/cache/download
... and we can hold on adding nested variable resolution for when a use case that can't be solved this way comes up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chmeliik updated with IMO a more robust solution, see if you can spot another resolution problem. I also tweaked and rearranged the test data so that your finding is covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brunoapimentel I understand the motivation and what you suggest was my initial approach, but then I figured that I hate maintaining duplicate static strings which leads to the need to fix them eventually at all places. It all comes down to immediate effect vs hardening. IMO it is the immediate effect fixes that usually lead to and add up to the technical debt that will likely never be addressed, but there surely is a line to be drawn. If you feel strongly about the added complexity (which won't even be exercised 99% of the cases) being too high, then I'll drop the change no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about it, my reasoning was mostly to avoid extra work, since we will need to reach a reasonable solution with this nested variable resolution algorithm.
98eede5
to
6b4d729
Compare
6b4d729
to
38c0dc8
Compare
since v2:
|
38c0dc8
to
0672652
Compare
0672652
to
e500a41
Compare
since v3:
|
since v4:
|
@brunoapimentel @taylormadore sorry for the delay, please have another look at the changes. I'll work on some integration tests in the meantime. |
7e48e3c
to
278c0b2
Compare
Since v5:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only major comment is about the GOTOOLCHAIN environment variable that we're setting up for .build-config.json
. Besides that, it LGTM!
mkdir /usr/local/go && \ | ||
mv "$HOME/sdk/go1.20" /usr/local/go && \ | ||
rm -rf "$HOME/go" "$HOME/.cache/go-build/" | ||
RUN for go_ver in "go1.20" "go1.21.0"; do \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem for this PR, but I wonder how we'll enable hermetic Cachi2 builds of Cachi2 since we need to pull these toolchains from the internet. My first idea was to try to create some sort of go-deps
directory, but then, how can we ask Go to fetch toolchains <= 1.21.0? For toolchains > 1.21.0, the download is easy to setup.
In any case, we don't need to solve this now, I'm just thinking out loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can we ask Go to fetch toolchains <= 1.21.0? For toolchains > 1.21.0, the download is easy to setup.
Last time I looked at the cache, it seemed like we'd have to reverse engineer how Go creates the directory structure in GOMODCACHE and copy files over there by ourselves from the SDKs downloaded the same way as in this code block. I think a separate discussion is in place and things should become a bit easier if we conclude we don't want to support EOL'd versions of Go in upstream cachi2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this LGTM aside from the output GOTOOLCHAIN=local 🚀
All of the local testing I've done looks good 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm going to approve it anyway given that I won't be here tomorrow and that is a quick fix
This fixture used to return a dictionary. While perfectly fine as is, we then did it differently in Yarn (commit 6e3711c) breaking the consistency by returning a list of EnvironmentVariables. Since the latter may seemingly be a little more superior, match it here for consistency reasons. Signed-off-by: Erik Skultety <eskultet@redhat.com>
This fixture generates a static list of environment variables, so while not being called from that many places, there's not reason this shouldn't be a module-scoped fixture. Signed-off-by: Erik Skultety <eskultet@redhat.com>
This was some historical remnant that used '.format()' instead of f-strings for in-place formatting (i.e. immediate replacement). Signed-off-by: Erik Skultety <eskultet@redhat.com>
When inspecting logs from a fetch one can see that we're trying to locate a toolchain in the predefined download locations, but without hinting on whether we have succeeded in doing so. We probably have, but it nevertheless complicates the process of debugging incorrect toolchain selection or issues with missing toolchain cached downloads. Fix this by appending "SUCCESS/FAIL" to the affected log string. While at it, add an extra debug message stating that the necessary Go toolchain SDK will be downloaded to process the request if we haven't found it either installed on the system or cached by us from a previous run. Signed-off-by: Erik Skultety <eskultet@redhat.com>
We report this as INFO as soon as we query the bundled version of Go, potentially misleading users into believing that this is the version of Go we used to process the project which is not necessarily true and which won't be true at all in the future where we'll base our efforts on the .0 toolchain releases of each minor version of Go along with GOTOOLCHAIN=auto setting. Signed-off-by: Erik Skultety <eskultet@redhat.com>
Our existing design represents environment variables in 2 distinct types of abstraction - paths & literals. What it means in practice is that whenever we need to generate a dump environment variables, we 1. Parse the variables from .build-config.json 2. Determine the path to the previously fetched dependencies for which we're going to be generating env variables 3. Loop over our representation of env variables and a) if it's simple string variable (aka "literal") we dump it as is OR b) if it's a path-representing variable we prepend the absolute path to the output directory containing the pre-fetched dependencies to the relative path bits we store in the env variable abstraction (i.e. EnvironmentVariable) 4. Write each "resolved" variable to the output ^This design however comes short when we need to generate an equivalent of a nested Bash variable, e.g. FOO="<foo>${BAR}<foobar>" because the logic for neither of the kinds described above can handle this scenario. It's also not as simple as forcing the variable placeholder (i.e. ${BAR}) in the output because the output is syphoned through shlex.quote for proper escaping safety resulting in e.g. FOO='foo${BAR}foobar' which naturally won't work in Bash (mind single vs double quotes!). The proposed alternative approach is to treat all environment variables represented by EnvironmentVariable as templates, i.e. ditching the concept of "kinds" as template by its nature should cover all scenarios. That on its own, however, would result in a backwards incompatible solution so this patch makes sure that whatever format of variables was generated by previous versions of cachi2 is still understood properly. What it means in practice for this particular patch is that we'll be able to add arbitrary placeholders into the env variable value string and at the same time (as a nice side effect) treat our internal structures storing these as simple key-value pair dictionaries: # in code "YARN_GLOBAL_FOLDER": "${output_dir}/deps/yarn" # in .build-config.json {"name": "YARN_GLOBAL_FOLDER", "value": "${output_dir}/deps/yarn"} # in a Bash env file export YARN_GLOBAL_FOLDER='/path/to/output_dir/deps/yarn' Note that this particular patch only adds simple templating to the EnvironmentVariable model, environment variable "nesting" will be added in a future patch. Signed-off-by: Erik Skultety <eskultet@redhat.com>
With EnvironmentVariable value templating (i.e. allowing usage of placeholders in the string value) we can take this one step further and add support for variable "nesting", e.g. "FOO='${BAR}foo', BAR='bar'". The problem at hand is detecting/avoiding cycles in placeholder substitution which is the Halting problem though we should be able to get away using a very naive and potentially light on resources solution (i.e. not recursion) - limit the substitutions to N where N is the number of environment variables evaluated. After N rounds of substitutions we'll stop the substitution, leaving the processed variable potentially unresolved (though unlikely). As for detecting cycles we'll loop over all variables we know of, remembering and comparing them in each substitution step as something we have already seen --> cycle. The naiveness of this solution stems from the fact that we're assuming that N may never be high enough to deplete the available memory which is a fair assumption with respect to individual package managers' environment variable support. What we gain with this patch is simplicity in being able to express dependency relations between individual variables. Signed-off-by: Erik Skultety <eskultet@redhat.com>
There's no need to take path to the whole source repo directory, this is tiny self-contained helper that only needs path to a given go.mod file. Signed-off-by: Erik Skultety <eskultet@redhat.com>
_resolve_gomod is a beast of a function. If we add Go toolchain selection into the mix we're making it that much harder for ourselves to test the selection functionality properly. Therefore, extract the Go toolchain selection into a smaller helper that can be unit tested separately without _resolve_gomod in the picture. Signed-off-by: Erik Skultety <eskultet@redhat.com>
Unfortunately, with Go 1.21 it's not enough to rely solely on what the mod file declares with the 'go' line, but also what it declares with the 'toolchain' line. While 'toolchain' is only suggestive, i.e. not usually enforced, the situation changes when combined with GOTOOLCHAIN=auto. Start parsing that keyword for future use. Signed-off-by: Erik Skultety <eskultet@redhat.com>
Future patches will enable automatic toolchain handling with GOTOOLCHAIN=auto meaning that Go will download a missing toolchain on its own, however that also means that it could easily download a version of Go which wasn't vetted by cachi2 with regards to feature safety to avoid arbitrary code execution, etc. Therefore, this patch sets an upper boundary on the supported Go version. In this case, anything above 1.22 will automatically fail the request. Note that because we're going to set GOTOOLCHAIN=auto for our dependency fetching in a future patch we must also check the version coming from the 'toolchain' keyword. The reason for that is that despite 'toolchain' being only suggestive, it becomes enforced with GOTOOLCHAIN=auto and if someone writes the following go.mod file we'd fetch and use a toolchain version we have not vetted for our project's use: go 1.21 toolchain 1.22.0 Signed-off-by: Erik Skultety <eskultet@redhat.com>
If the destination exists an error will be emitted and the container image build will fail. Use '-p' which besides creating parent directories makes sure that no error is emitted for existing directories in the destination path. Signed-off-by: Erik Skultety <eskultet@redhat.com>
Currently, we only pre install Go 1.20 as a fallback for projects building with older version of Go due to some specific behavioral semantic changes to 1.21 which our container image (Fedora 39) is based on. Starting Go 1.21, the go toolchain can fetch and use newer toolchains than the bundled one which is used to process a project from the command line. However, if the project's Go 1.21.X toolchain requirement is older than the bundled toolchain doing the processing, the older toolchain is never downloaded in such case. This design decision makes things for hermetic builds harder because not having any control over the container build environment it could potentially fail due a missing Go toolchain version if we had not prefetched it due to the Go's version handling. Signed-off-by: Erik Skultety <eskultet@redhat.com>
Commit 7f73464 added preliminary support for Go 1.21 only, meaning that we'd use a fixed toolchain version of 1.21.0 (or the bundled Go toolchain, whichever was bigger). Some projects do bump even the micro version of their required Go toolchain which means that we'd be forced to refresh the container image with every Go micro release to keep up with these bumps for the dependency prefetching (looking at you Openshift). Starting with Go 1.21, however, one is able to instruct Go to download a newer toolchain automatically if the installed one doesn't comply with the requirement in go.mod. That behaviour is controller by the GOTOOLCHAIN variable which will need to be set as 'GOTOOLCHAIN=auto' to override the default behaviour of Go (depending on the installation) which always prefers the installed toolchain. One thing to consider is the fact that a newer toolchain never downloads an older one, meaning that if the project's build image doesn't supply the desired version of Go (either with through the 'go' line or the 'toolchain' line) and the consumer overrides the GOTOOLCHAIN setting to 'auto', the build will fail in either case because the correct toolchain version won't be available. To mitigate this on our side we'll always base our processing on the 0th release of a minor Go version (i.e. 1.21.0, 1.22.0, etc.) which ensures that with GOTOOLCHAIN=auto Go will always download and cache any newer toolchain from any dependent module. Note that we're only going to set GOTOOLCHAIN=auto for the dependency fetching, i.e. in our own environment. Consumers are free to override the GOTOOLCHAIN at their will in their build environment. With that in mind, we're also dropping the GOTOOLCHAIN=local override we have been forcing upon the users so far for the very same reason. Additionally, in context of hermetic (offline) builds we also need to set the GOPROXY variable as 'GOPROXY=$GOMODCACHE/cache/download' in the consumer build environment to enable consumers to actually make use of any downloaded toolchains otherwise Go would try to fetch them from the internet in a network-isolated environment. Signed-off-by: Erik Skultety <eskultet@redhat.com>
278c0b2
to
4c4b313
Compare
Since v6:
|
Since there are some behavioral traits involved that are worth explaining to further detail, especially with full 1.21 support, let's move this section to the dedicated gomod docs. This feels appropriate especially since a future patch will provide even more details onto the matter. Signed-off-by: Erik Skultety <eskultet@redhat.com>
Document that future cachi2 release will process Go projects by the means of GOTOOLCHAIN=auto setting to keep up with frequent micro release version bumps by projects. Signed-off-by: Erik Skultety <eskultet@redhat.com>
4c4b313
to
750f640
Compare
Since v7:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for fighting this dragon, Erik 😄
8110f95
Maintainers will complete the following section
This PR enables full Go 1.21 support by the means of instructing Go to download the desired toolchain via
GOTOOLCHAIN=auto
. This is limited to 1.21 only, so 1.22 will be explicitly refused (1.22 will be added by a standalone PR).Notes:
first 3 patches are some generic tiny fixes
EnvironmentVariable
was adjusted to act as a more generic resolvable template (e.g. to allowGOPROXY=file://${GOMODCACHE}/foo
)Go 1.21 full support is enabled using
GOTOOLCHAIN
andGOPROXY
Commit messages are descriptive enough
Code coverage from testing does not decrease and new code is covered
Docs updated (if applicable)
Docs links in the code are still valid (if docs were updated)
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)