Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.12 Backports 2023-09-26 #28295

Merged
merged 19 commits into from
Oct 2, 2023
Merged

v1.12 Backports 2023-09-26 #28295

merged 19 commits into from
Oct 2, 2023

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Sep 26, 2023

Once this PR is merged, you can update the PR labels via:

for pr in 23093 25426 27567 27996 28131 28122 28133 28161 28145 28172 28249 28225; do contrib/backporting/set-labels.py $pr done 1.12; done

or with

make add-labels BRANCH=v1.12 ISSUES=23093,25426,27567,27996,28131,28122,28133,28161,28145,28172,28249,28225

@giorio94 giorio94 added kind/backports This PR provides functionality previously merged into master. backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. labels Sep 26, 2023
@giorio94
Copy link
Member Author

/test-backport-1.12

@giorio94 giorio94 marked this pull request as ready for review September 26, 2023 16:40
@giorio94 giorio94 requested review from a team as code owners September 26, 2023 16:40
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backports for my doc PRs look good, thank you!

Copy link
Member

@nbusseneau nbusseneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my changes.

@giorio94
Copy link
Member Author

/test-1.16-4.9

@giorio94
Copy link
Member Author

/test-1.22-4.9

@giorio94
Copy link
Member Author

/test-1.24-net-next

@giorio94
Copy link
Member Author

/test-runtime

@giorio94
Copy link
Member Author

/test-1.23-4.19

@giorio94
Copy link
Member Author

/test-upstream-k8s

@giorio94
Copy link
Member Author

All previous failures occurred because Jenkins had been restarted while the tests were in progress.

@giorio94
Copy link
Member Author

The build commits check failed due to timeout. Manually run:

  • git rebase --exec "make -C bpf build_all -j $(nproc)" 6bb3bb849ef8^
  • git rebase --exec "make -C test build -j $(nproc)" 6bb3bb849ef8^

which both completed successfully locally.

@margamanterola
Copy link
Member

For my docs commit, in the 1.13 backport, Donia had actually introduced the whole paragraph:
b98de6b

Can we do the same here? Thanks!

@lmb
Copy link
Contributor

lmb commented Sep 27, 2023

The build commits check failed due to timeout.

Do we need to bump the timeout then?

@giorio94
Copy link
Member Author

For my docs commit, in the 1.13 backport, Donia had actually introduced the whole paragraph: b98de6b

Can we do the same here? Thanks!

Sure, no problem. I've updated the commit to additionally include that paragraph.

@giorio94
Copy link
Member Author

The build commits check failed due to timeout.

Do we need to bump the timeout then?

That's what I wonder as well. The most demanding part seems to be building all the datapath configurations, which is skipped if no BPF changes are part of the PR. That's probably the reason why we don't hit this issue consistently in backport PRs (and then the number of commits also plays an important role).

@giorio94
Copy link
Member Author

giorio94 commented Sep 27, 2023

/test-backport-1.12

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: DNS entry is not ready after timeout

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.22-kernel-4.9/140/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.9 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.17-kernel-4.9' hit: #23318 (94.58% similarity)

Copy link
Member

@margamanterola margamanterola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved for my commit, thanks

jschwinger233 and others added 17 commits September 28, 2023 09:00
[ upstream commit 48023b9 ]

This commit makes conn-disrupt-test a github action, so upgrade test
and IPsec key rotation test don't have to copy and paste everywhere.

The idea is to allow caller workflow to specify the commands to execute,
then this action will follow the steps:
1. Run "cilium-cli connectivity test --conn-disrupt-test-setup";
2. Run whatever caller workflow passes: could be upgrade operation or
   IPsec key rotation;
3. Run "cilium-cli connectivity test --include-conn-disrupt-test";

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 85481c4 ]

When executing `cilium encrypt status`, cilium-agent lists xfrm states
and counts number of different AEAD keys. However, cilium-agent panics
if there is any xfrm state using non-AEAD algorithm. These unexpected
xfrm states could be installed by other applications.

To reproduce the panic, we can manually install one by running command:

```
ip x s a src 1.1.1.1 dst 1.1.1.2 proto esp spi 0x3 reqid 1 mode tunnel enc aes 0xf0e1d2c3b4a5f60708090a0b0c0d0e0f
```

Then `cilium encrypt status` crashes.

This patch fixes it.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit ce51167 ]

Previously, the CiliumExecContext() function would retry the cilium command up to 5 times (limitTimes) without verifying the success of previous executions. This pull request introduces a logic enhancement that validates the return code of each executed command and exits upon success.

Additionally, this change optimizes test execution time by reducing unnecessary retries. With a 200ms delay between retries, the overall test execution time is expected to improve by at least 200ms multiplied by the number of cilium commands and the number of cilium pods involved.

Local Ginkgo tests, specifically those focused on 'K8sDatapathBGPTests' 'K8sDatapathCustomCalls' and 'K8sDatapathLRPTests' have shown significant improvements:

Before:
`Ran 2 of 106 Specs in 233.348 seconds`

After:
`Ran 2 of 106 Specs in 168.563 seconds`

Signed-off-by: Boris Petrovic <carnerito.b@gmail.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 071c2af ]

[ backporter's notes: skipped the hunk related to bpf_tests, as not
  present in the v1.12 branch. ]

In the past, we have merged automated updates to the coccicheck image
version (for example, [0]) without realising they may break the
workflow, because the workflow wouldn't run on the PR, which contained
no BPF-related changes.

Let's make the workflow run when its definition file is updated, to make
sure we can catch similar issues in the future.

[0] #27947

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 02c17d5 ]

[ backporter's notes: adapted the hunk to the different variable name. ]

CB_SRC_LABEL is cleared just a few lines above. So to get the source's
actual security identity, we need to use our local variable.

Fixes: e2829a0 ("bpf: lxc: support Pod->Service->Pod hairpinning with endpoint routes")
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit d987c37 ]

The Makefile for the documentation makes sure that the docs-builder
image is always present and up-to-date before building the docs. To do
so, it simply rebuilds the image, every time.

This is efficient to make sure the image is up-to-date, but not that
much in terms of build duration. Let's make it possible to skip that
step if the user is confident their image is present and up-to-date.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit e58d673 ]

Allow users to set the INCREMENTAL environment variable to avoid passing
the "-E" option to Sphinx when skipping the linters. This way, we can
have Sphinx do incremental builds instead of re-reading and re-writing
all files. This can be used for quick iterations on a local setup.

There is no need to make "-E" conditional for the spell checker run,
given that the spell checker seems to always re-read all files anyway.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 38d5669 ]

The documentation build targets, including html, have a dependency on
update-helm-values. There is no clear reason for that: this seems to be
for historical reasons.

- Initially, the dependency was on $(HELM_VALUES), because the file with
  Helm values had not been committed to the Git repo at the time, so we
  needed to ensure that it was generated first.

- When reorganising checks for Helm values in 6167f8a ("docs: check
  updates for the Helm reference"), we replaced it with
  "update-helm-values" which looked cleaner; but we don't need to run
  the _check_ at this stage, only to make sure the _file_ itself is
  available.

Now that the file is part of the repository, there's no need to check
for its existence, and we can simply remove the target dependency (to
align it with other files that receive automatic updates, such as
codeowners.rst or crdlist.rst).

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit cf3dc21 ]

[ backporter's notes: skipped a few hunks modifying targets not present
  in the v1.12 branch. ]

Let's make sure all Makefile targets are properly documented, and nicely
displayed when running "make help".

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 4d699d8 ]

[ backporter's notes: skipped a few hunks modifying targets not present
  in the v1.12 branch. ]

Move around targets inside of Documentation/Makefile to have them
displayed in a more sensible order on "make help".

    $ make help

    Usage:
      make <target>

    Targets (default: "html")

    Development Images
      base-image                    Build the docs-base image for updating the requirements.txt file.
      builder-image                 Build the docs-builder image for rendering and checking the documentation.

    Auto-generated Contents Updates and Validation
      api-flaggen                   Update the table of API flags restrictions.
      update-cmdref                 Update the command reference documents (agent, bugtool, operators, etc.).
      update-codeowners             Update the description of the code owner teams.
      update-crdlist                Update the list of CRDs.
      update-helm-values            Update the Helm reference documentation.
      check                         Validate command and Helm references, policy examples, and others.

    Build
      html                          Check documentation and render it under the specified format.
      epub                          Check documentation and render it under the specified format.
      latex                         Check documentation and render it under the specified format.
      live-preview                  Build and serve the documentation locally.

    Development
      update-requirements           Regenerate the requirements.txt file from requirements-min/requirements.txt.
      clean                         Clean up all artefacts from documentation.
      help                          Display help for the Makefile.

Also add target update-cmdref to .PHONY.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 5b77a54 ]

Since commit 944dddf ("docs: Symlink (instead of copying) the API
reference"), we symlink the _api/ directory (from ../api) instead of
copying it. So there's nothing to clean on that side on "make clean",
otherwise we lose the link.

Fixes: 944dddf ("docs: Symlink (instead of copying) the API reference")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 2b4f6df ]

[ backporter's notes: reapplied the main hunk to the corresponding file,
  due to the different docs structure. Additionally backported the
  paragraph introduced in 9a01260,
  including the modification from the upstream commit. ]

Rather than having this information hidden in a parenthesis in the
troubleshooting section, let's have a full paragraph in the key rotation
section that explains what needs to be done when using Cluster Mesh.

Signed-off-by: Marga Manterola <marga@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 94da061 ]

[ backporter's notes: picked Documentation/requirements.txt verbatim
  from the main branch to keep it consistent across versions, as we
  nonetheless use the same image in CI. ]

Now that we have rebased our theme onto the upstream version, we can use
newer versions of docutils, which allows us to update Sphinx and a bunch
of dependencies. We're still limited to docutils 0.18 because of
sphinx-tabs, though.

One immediate improvement is that we get working links for the value
types in the gRPC reference.

There are newer versions of Sphinx, websupport, or some second-level
dependencies available (currently 7.2.6 and 1.2.6, respectively), but
Netlify doesn't know about them yet and fails to build the previews.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 30c4df9 ]

In the past, we disabled Sphinx's output in check-build.sh. This was
because of the way we handle warnings: in order to have them printed at
the end of the logs, we print them to a file, we filter them, and print
the outstanding ones at the end of the output. Keeping the standard
output meant that warnings or errors wouldn't appear in the middle of
this output, and warnings that we filtered out wouldn't appear at all,
which could be confusing.

With the recent Sphinx update, we are able to filter the warnings from
MyST more efficiently, directly from conf.py, and to get rid of the
filters.

Let's take advantage of this to re-enable the output from Sphinx. While
it's not necessary to build the docs, it's occasionally helpful to debug
the setup when things don't go as expected.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 2f7fbfe ]

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 4f1f08e ]

The plague of copy/pasting strikes again...

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit c6fbeb1 ]

[ backporter's notes: fixed trivial conflict accepting the combination
  of the changes. ]

This target should completely ignore the existing contents of
`cilium/values.yaml` since we always generate from the template files.

Co-authored-by: Donia Chaiehloudj <donia.cld@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Copy link
Member Author

giorio94 commented Sep 28, 2023

/test-backport-1.12

Job 'Cilium-PR-K8s-1.18-kernel-4.9' has 1 failure but they might be new flake since it also hit 1 known flake: #27118 (97.27% similarity)

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thank you!

@julianwiedmann julianwiedmann removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Oct 2, 2023
@giorio94
Copy link
Member Author

giorio94 commented Oct 2, 2023

/test-1.18-4.9

Hit #27118

@julianwiedmann julianwiedmann mentioned this pull request Oct 2, 2023
2 tasks
@giorio94
Copy link
Member Author

giorio94 commented Oct 2, 2023

Manually run again the build commits steps locally, as timed out in the CI.
All the other tests are green, marking as ready to merge.

@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 2, 2023
@aanm aanm merged commit 0abe91c into v1.12 Oct 2, 2023
89 of 90 checks passed
@aanm aanm deleted the pr/v1.12-backport-2023-09-26 branch October 2, 2023 13:24
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet