Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add full Go 1.21 support #500

Merged
merged 16 commits into from
Apr 5, 2024

Conversation

eskultety
Copy link
Member

@eskultety eskultety commented Mar 18, 2024

Maintainers will complete the following section

This PR enables full Go 1.21 support by the means of instructing Go to download the desired toolchain viaGOTOOLCHAIN=auto. This is limited to 1.21 only, so 1.22 will be explicitly refused (1.22 will be added by a standalone PR).

Notes:

  • first 3 patches are some generic tiny fixes

  • EnvironmentVariable was adjusted to act as a more generic resolvable template (e.g. to allow GOPROXY=file://${GOMODCACHE}/foo)

  • Go 1.21 full support is enabled using GOTOOLCHAIN and GOPROXY

  • Commit messages are descriptive enough

  • Code coverage from testing does not decrease and new code is covered

  • Docs updated (if applicable)

  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

for i, m in enumerate(mappings):
log.debug(f"EnvironmentVariable resolution iteration {i}.: {ret}")

placeholders_seen.append(m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested variable resolution is very cool 👍

But in the current state, cycle detection doesn't work correctly (depends on the order of the mapping)

>>> e = EnvironmentVariable(name="A", value="$C")
>>> e.resolve_value({"C": "$B", "B": "1"})
'1'
>>> e.resolve_value({"B": "1", "C": "$B"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/acmiel/RedHat/Cachito/cachi2/cachi2/core/models/output.py", line 66, in resolve_value
    raise Cachi2Error(
cachi2.core.errors.Cachi2Error: Detected a cycle in environment variable expansion of 'A'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Nasty bug, shameful...@chmeliik thanks for pointing this out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you'll try to solve this meticulously first thing in the morning, but I wonder if we need nested variable resolution in the first place. I does look nice, but it adds quite some complexity, and maybe we don't need it right now.

Our current use case can be solved if we declare GOPROXY as:

file://${output_dir}/deps/gomod/pkg/mod/cache/download

... and we can hold on adding nested variable resolution for when a use case that can't be solved this way comes up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chmeliik updated with IMO a more robust solution, see if you can spot another resolution problem. I also tweaked and rearranged the test data so that your finding is covered.

Copy link
Member Author

@eskultety eskultety Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brunoapimentel I understand the motivation and what you suggest was my initial approach, but then I figured that I hate maintaining duplicate static strings which leads to the need to fix them eventually at all places. It all comes down to immediate effect vs hardening. IMO it is the immediate effect fixes that usually lead to and add up to the technical debt that will likely never be addressed, but there surely is a line to be drawn. If you feel strongly about the added complexity (which won't even be exercised 99% of the cases) being too high, then I'll drop the change no problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it, my reasoning was mostly to avoid extra work, since we will need to reach a reasonable solution with this nested variable resolution algorithm.

@eskultety eskultety force-pushed the go_toolchain_auto branch 2 times, most recently from 98eede5 to 6b4d729 Compare March 19, 2024 10:35
@eskultety
Copy link
Member Author

since v2:

@eskultety
Copy link
Member Author

since v3:

  • GOTOOLCHAIN=auto is only used for dependency fetching, but it's not enforced upon users by the means of generate-env
  • toolchain keyword in go.mod file is considered as well when checking the maximum supported version violation (due to the use of GOTOOLCHAIN=auto)
  • container image is preinstalled with a static 1.21.0 Go SDK along with 1.20
  • some minor debug log adjustments
  • rebase on main

@eskultety
Copy link
Member Author

since v4:

  • add documentation for Go 1.21 full support

@eskultety
Copy link
Member Author

@brunoapimentel @taylormadore sorry for the delay, please have another look at the changes. I'll work on some integration tests in the meantime.

@eskultety
Copy link
Member Author

Since v5:

  • fix integration tests by generating fresh mock data

Copy link
Contributor

@brunoapimentel brunoapimentel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only major comment is about the GOTOOLCHAIN environment variable that we're setting up for .build-config.json. Besides that, it LGTM!

README.md Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Show resolved Hide resolved
mkdir /usr/local/go && \
mv "$HOME/sdk/go1.20" /usr/local/go && \
rm -rf "$HOME/go" "$HOME/.cache/go-build/"
RUN for go_ver in "go1.20" "go1.21.0"; do \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem for this PR, but I wonder how we'll enable hermetic Cachi2 builds of Cachi2 since we need to pull these toolchains from the internet. My first idea was to try to create some sort of go-deps directory, but then, how can we ask Go to fetch toolchains <= 1.21.0? For toolchains > 1.21.0, the download is easy to setup.

In any case, we don't need to solve this now, I'm just thinking out loud.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we ask Go to fetch toolchains <= 1.21.0? For toolchains > 1.21.0, the download is easy to setup.

Last time I looked at the cache, it seemed like we'd have to reverse engineer how Go creates the directory structure in GOMODCACHE and copy files over there by ourselves from the SDKs downloaded the same way as in this code block. I think a separate discussion is in place and things should become a bit easier if we conclude we don't want to support EOL'd versions of Go in upstream cachi2.

Copy link
Contributor

@taylormadore taylormadore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this LGTM aside from the output GOTOOLCHAIN=local 🚀

All of the local testing I've done looks good 👍

cachi2/core/package_managers/gomod.py Show resolved Hide resolved
Copy link
Contributor

@taylormadore taylormadore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm going to approve it anyway given that I won't be here tomorrow and that is a quick fix

This fixture used to return a dictionary. While perfectly fine as is,
we then did it differently in Yarn (commit 6e3711c) breaking the
consistency by returning a list of EnvironmentVariables. Since the
latter may seemingly be a little more superior, match it here for
consistency reasons.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
This fixture generates a static list of environment variables, so while
not being called from that many places, there's not reason this
shouldn't be a module-scoped fixture.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
This was some historical remnant that used '.format()' instead of
f-strings for in-place formatting (i.e. immediate replacement).

Signed-off-by: Erik Skultety <eskultet@redhat.com>
When inspecting logs from a fetch one can see that we're trying to
locate a toolchain in the predefined download locations, but without
hinting on whether we have succeeded in doing so. We probably have, but
it nevertheless complicates the process of debugging incorrect
toolchain selection or issues with missing toolchain cached downloads.
Fix this by appending "SUCCESS/FAIL" to the affected log string.

While at it, add an extra debug message stating that the necessary Go
toolchain SDK will be downloaded to process the request if we haven't
found it either installed on the system or cached by us from a previous
run.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
We report this as INFO as soon as we query the bundled version of Go,
potentially misleading users into believing that this is the version of
Go we used to process the project which is not necessarily true and
which won't be true at all in the future where we'll base our efforts
on the .0 toolchain releases of each minor version of Go along with
GOTOOLCHAIN=auto setting.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Our existing design represents environment variables in 2 distinct
types of abstraction - paths & literals. What it means in practice is
that whenever we need to generate a dump environment variables, we
    1. Parse the variables from .build-config.json
    2. Determine the path to the previously fetched dependencies for
       which we're going to be generating env variables
    3. Loop over our representation of env variables and
        a) if it's simple string variable (aka "literal") we dump it as
           is OR
        b) if it's a path-representing variable we prepend the absolute
           path to the output directory containing the pre-fetched
           dependencies to the relative path bits we store in the
           env variable abstraction (i.e. EnvironmentVariable)
    4. Write each "resolved" variable to the output

^This design however comes short when we need to generate an equivalent
of a nested Bash variable, e.g. FOO="<foo>${BAR}<foobar>" because the
logic for neither of the kinds described above can handle this
scenario. It's also not as simple as forcing the variable placeholder
(i.e. ${BAR}) in the output because the output is syphoned through
shlex.quote for proper escaping safety resulting in e.g.

    FOO='foo${BAR}foobar'

which naturally won't work in Bash (mind single vs double quotes!).

The proposed alternative approach is to treat all environment variables
represented by EnvironmentVariable as templates, i.e. ditching the
concept of "kinds" as template by its nature should cover all
scenarios. That on its own, however, would result in a backwards
incompatible solution so this patch makes sure that whatever format of
variables was generated by previous versions of cachi2 is still
understood properly.

What it means in practice for this particular patch is that we'll be
able to add arbitrary placeholders into the env variable value string
and at the same time (as a nice side effect) treat our internal
structures storing these as simple key-value pair dictionaries:
    # in code
    "YARN_GLOBAL_FOLDER": "${output_dir}/deps/yarn"

    # in .build-config.json
    {"name": "YARN_GLOBAL_FOLDER", "value": "${output_dir}/deps/yarn"}

    # in a Bash env file
    export YARN_GLOBAL_FOLDER='/path/to/output_dir/deps/yarn'

Note that this particular patch only adds simple templating to the
EnvironmentVariable model, environment variable "nesting" will be added
in a future patch.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
With EnvironmentVariable value templating (i.e. allowing usage of
placeholders in the string value) we can take this one step further and
add support for variable "nesting", e.g. "FOO='${BAR}foo', BAR='bar'".

The problem at hand is detecting/avoiding cycles in placeholder
substitution which is the Halting problem though we should be able to
get away using a very naive and potentially light on resources solution
(i.e. not recursion) - limit the substitutions to N where N is the
number of environment variables evaluated. After N rounds of
substitutions we'll stop the substitution, leaving the processed
variable potentially unresolved (though unlikely). As for detecting
cycles we'll loop over all variables we know of, remembering and
comparing them in each substitution step as something we have already
seen --> cycle.

The naiveness of this solution stems from the fact that we're assuming
that N may never be high enough to deplete the available memory
which is a fair assumption with respect to individual package managers'
environment variable support.

What we gain with this patch is simplicity in being able to express
dependency relations between individual variables.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
There's no need to take path to the whole source repo directory, this
is tiny self-contained helper that only needs path to a given go.mod
file.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
_resolve_gomod is a beast of a function. If we add Go toolchain
selection into the mix we're making it that much harder for ourselves
to test the selection functionality properly. Therefore, extract the Go
toolchain selection into a smaller helper that can be unit tested
separately without _resolve_gomod in the picture.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Unfortunately, with Go 1.21 it's not enough to rely solely on what the
mod file declares with the 'go' line, but also what it declares with
the 'toolchain' line. While 'toolchain' is only suggestive, i.e. not
usually enforced, the situation changes when combined with
GOTOOLCHAIN=auto. Start parsing that keyword for future use.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Future patches will enable automatic toolchain handling with
GOTOOLCHAIN=auto meaning that Go will download a missing toolchain
on its own, however that also means that it could easily download a
version of Go which wasn't vetted by cachi2 with regards to feature
safety to avoid arbitrary code execution, etc.

Therefore, this patch sets an upper boundary on the supported Go
version. In this case, anything above 1.22 will automatically fail the
request.

Note that because we're going to set GOTOOLCHAIN=auto for our
dependency fetching in a future patch we must also check the version
coming from the 'toolchain' keyword. The reason for that is that
despite 'toolchain' being only suggestive, it becomes enforced with
GOTOOLCHAIN=auto and if someone writes the following go.mod file we'd
fetch and use a toolchain version we have not vetted for our project's
use:
    go 1.21
    toolchain 1.22.0

Signed-off-by: Erik Skultety <eskultet@redhat.com>
If the destination exists an error will be emitted and the container
image build will fail. Use '-p' which besides creating parent
directories makes sure that no error is emitted for existing
directories in the destination path.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Currently, we only pre install Go 1.20 as a fallback for projects
building with older version of Go due to some specific behavioral
semantic changes to 1.21 which our container image (Fedora 39) is based
on.
Starting Go 1.21, the go toolchain can fetch and use newer toolchains
than the bundled one which is used to process a project from the
command line. However, if the project's Go 1.21.X toolchain requirement
is older than the bundled toolchain doing the processing, the older
toolchain is never downloaded in such case. This design decision makes
things for hermetic builds harder because not having any control over
the container build environment it could potentially fail due a missing
Go toolchain version if we had not prefetched it due to the Go's
version handling.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Commit 7f73464 added preliminary support for Go 1.21 only, meaning
that we'd use a fixed toolchain version of 1.21.0 (or the bundled Go
toolchain, whichever was bigger). Some projects do bump even the micro
version of their required Go toolchain which means that we'd be
forced to refresh the container image with every Go micro release to
keep up with these bumps for the dependency prefetching (looking at you
Openshift).

Starting with Go 1.21, however, one is able to instruct Go to download
a newer toolchain automatically if the installed one doesn't comply
with the requirement in go.mod. That behaviour is controller by the
GOTOOLCHAIN variable which will need to be set as 'GOTOOLCHAIN=auto' to
override the default behaviour of Go (depending on the installation)
which always prefers the installed toolchain.

One thing to consider is the fact that a newer toolchain never
downloads an older one, meaning that if the project's build image
doesn't supply the desired version of Go (either with through the 'go'
line or the 'toolchain' line) and the consumer overrides the
GOTOOLCHAIN setting to 'auto', the build will fail in either case
because the correct toolchain version won't be available.
To mitigate this on our side we'll always base our processing on the
0th release of a minor Go version (i.e. 1.21.0, 1.22.0, etc.) which
ensures that with GOTOOLCHAIN=auto Go will always download and cache
any newer toolchain from any dependent module.

Note that we're only going to set GOTOOLCHAIN=auto for the dependency
fetching, i.e. in our own environment. Consumers are free
to override the GOTOOLCHAIN at their will in their build environment.
With that in mind, we're also dropping the GOTOOLCHAIN=local override
we have been forcing upon the users so far for the very same reason.

Additionally, in context of hermetic (offline) builds we also need to
set the GOPROXY variable as 'GOPROXY=$GOMODCACHE/cache/download' in the
consumer build environment to enable consumers to actually make use of
any downloaded toolchains otherwise Go would try to fetch them from the
internet in a network-isolated environment.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
@eskultety
Copy link
Member Author

Since v6:

  • docs changes as requested in review
  • dropped explicit GOTOOLCHAIN=local setting in the generated env file to really let users to set the variable however they like
  • moved the GOTOOLCHAIN=auto enablement to the correct commit

Since there are some behavioral traits involved that are worth
explaining to further detail, especially with full 1.21 support, let's
move this section to the dedicated gomod docs. This feels appropriate
especially since a future patch will provide even more details onto
the matter.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Document that future cachi2 release will process Go projects by the
means of GOTOOLCHAIN=auto setting to keep up with frequent micro
release version bumps by projects.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
@eskultety
Copy link
Member Author

Since v7:

  • drop the go backwards compatibility footnote as it dropping was incorrect
  • convert since cachi2-v0.6.0 from bold italic to just italic (typo)

Copy link
Contributor

@brunoapimentel brunoapimentel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fighting this dragon, Erik 😄

@eskultety eskultety added this pull request to the merge queue Apr 5, 2024
Merged via the queue into containerbuildsystem:main with commit 8110f95 Apr 5, 2024
15 checks passed
@eskultety eskultety deleted the go_toolchain_auto branch July 17, 2024 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants