Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: add support for reloading mirror registries automatically #7788

Merged
merged 2 commits into from
May 3, 2024

Conversation

sohankunkerkar
Copy link
Member

@sohankunkerkar sohankunkerkar commented Feb 16, 2024

Fixes #7748

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add support for automatically reloading mirror registries when there's an update to the registries.conf.d directory 

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Feb 16, 2024
Copy link
Contributor

openshift-ci bot commented Feb 16, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the dco-signoff: no Indicates the PR's author has not DCO signed all their commits. label Feb 16, 2024
@openshift-ci openshift-ci bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Feb 16, 2024
Copy link

codecov bot commented Feb 16, 2024

Codecov Report

Attention: Patch coverage is 18.18182% with 45 lines in your changes are missing coverage. Please review.

Project coverage is 49.58%. Comparing base (8b966a8) to head (980db06).
Report is 28 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7788      +/-   ##
==========================================
- Coverage   49.67%   49.58%   -0.10%     
==========================================
  Files         153      153              
  Lines       16826    16878      +52     
==========================================
+ Hits         8359     8369      +10     
- Misses       7423     7464      +41     
- Partials     1044     1045       +1     

@sohankunkerkar sohankunkerkar force-pushed the automatic-reload branch 2 times, most recently from 365d090 to f2f56dc Compare February 17, 2024 00:07
@openshift-ci openshift-ci bot changed the title server: add support for reloading mirror registries automatically [WIP] server: add support for reloading mirror registries automatically Feb 19, 2024
@sohankunkerkar sohankunkerkar force-pushed the automatic-reload branch 4 times, most recently from 7baad57 to 0edcacb Compare February 19, 2024 18:08
@sohankunkerkar sohankunkerkar marked this pull request as ready for review February 20, 2024 14:34
@sohankunkerkar sohankunkerkar changed the title [WIP] server: add support for reloading mirror registries automatically server: add support for reloading mirror registries automatically Feb 20, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 20, 2024
@sohankunkerkar
Copy link
Member Author

/retest

1 similar comment
@sohankunkerkar
Copy link
Member Author

/retest

server/server.go Outdated Show resolved Hide resolved
server/server.go Outdated Show resolved Hide resolved
server/server.go Outdated Show resolved Hide resolved
server/server.go Outdated Show resolved Hide resolved
@kwilczynski
Copy link
Member

Generally, robust file events watching is non-trivial. Especially when you aim to watch a directory. Aside from cross-platform implementation issues (we have *BSD support in the works; albeit this won't be an issue for a while), which the Go package here aims to abstract, there are cross-platform behaviour issues. And even so, on the same platform (let it be Linux), there have been issues between different kernel versions and different sysctl settings (for fsnotify).

That said, there are also issues with the notifications themselves. File system events are not atomic and can also be sent multiple times for the same path being watched, depending on the platform and the software that interacts with the file somewhere. More often, the software behaviour is something one has to compensate for, but in some cases, the OS will also not behave as expected.

For example, consider the following examples (collected using the very same Go package as the one CRI-O uses):

# date > test

04:11:05.6593   1 CREATE        "/etc/containers/registries.conf.d/test"
04:11:05.6599   2 WRITE         "/etc/containers/registries.conf.d/test"

# mv test /tmp

04:11:22.9017   3 RENAME        "/etc/containers/registries.conf.d/test"

# mv /tmp/test .

04:11:30.2516   4 CREATE        "/etc/containers/registries.conf.d/test"

# rm test

04:11:38.1324   5 REMOVE        "/etc/containers/registries.conf.d/test"

# vim test

04:13:33.7303   6 CREATE        "/etc/containers/registries.conf.d/.test.swp"
04:13:33.7304   7 CREATE        "/etc/containers/registries.conf.d/.test.swpx"
04:13:33.7306   8 REMOVE        "/etc/containers/registries.conf.d/.test.swpx"
04:13:33.7306   9 REMOVE        "/etc/containers/registries.conf.d/.test.swp"
04:13:33.7307  10 CREATE        "/etc/containers/registries.conf.d/.test.swp"
04:13:33.7307  11 WRITE         "/etc/containers/registries.conf.d/.test.swp"
04:13:37.7906  12 WRITE         "/etc/containers/registries.conf.d/.test.swp"
04:13:46.8045  13 WRITE         "/etc/containers/registries.conf.d/.test.swp"
04:13:49.8956  14 CREATE        "/etc/containers/registries.conf.d/test"
04:13:49.8958  15 WRITE         "/etc/containers/registries.conf.d/test"
04:13:49.8959  16 WRITE         "/etc/containers/registries.conf.d/test"
04:13:49.8971  17 WRITE         "/etc/containers/registries.conf.d/.test.swp"
04:13:53.9019  18 WRITE         "/etc/containers/registries.conf.d/.test.swp"
04:13:55.0267  19 CREATE        "/etc/containers/registries.conf.d/4913"
04:13:55.0268  20 CHMOD         "/etc/containers/registries.conf.d/4913"
04:13:55.0269  21 REMOVE        "/etc/containers/registries.conf.d/4913"
04:13:55.0271  22 RENAME        "/etc/containers/registries.conf.d/test"
04:13:55.0271  23 CREATE        "/etc/containers/registries.conf.d/test~"
04:13:55.0271  24 CREATE        "/etc/containers/registries.conf.d/test"
04:13:55.0272  25 WRITE         "/etc/containers/registries.conf.d/test"
04:13:55.0273  26 WRITE         "/etc/containers/registries.conf.d/test"
04:13:55.0280  27 CHMOD         "/etc/containers/registries.conf.d/test"
04:13:55.0281  28 REMOVE        "/etc/containers/registries.conf.d/test~"
04:13:55.0287  29 REMOVE        "/etc/containers/registries.conf.d/.test.swp"

# nano test

04:15:38.8292  38 CREATE        "/etc/containers/registries.conf.d/.test.swp"
04:15:38.8292  39 WRITE         "/etc/containers/registries.conf.d/.test.swp"
04:15:40.0551  40 REMOVE        "/etc/containers/registries.conf.d/.test.swp"
04:15:40.0551  41 CREATE        "/etc/containers/registries.conf.d/.test.swp"
04:15:40.0552  42 WRITE         "/etc/containers/registries.conf.d/.test.swp"
04:15:44.8382  43 CREATE        "/etc/containers/registries.conf.d/test"
04:15:44.8383  44 WRITE         "/etc/containers/registries.conf.d/test"
04:15:44.8393  45 REMOVE        "/etc/containers/registries.conf.d/.test.swp"

# touch test

04:16:35.3824  49 CHMOD         "/etc/containers/registries.conf.d/test"

# >test
# ln -f test test2
# date > test2
# ls -l

-rw-r--r-- 2 root root 32 Feb 21 04:18 test
-rw-r--r-- 2 root root 32 Feb 21 04:18 test2

04:18:06.0837  58 CREATE        "/etc/containers/registries.conf.d/test"
04:18:10.9388  59 CREATE        "/etc/containers/registries.conf.d/test2"
04:18:13.8936  60 WRITE         "/etc/containers/registries.conf.d/test2"
04:18:13.8940  61 WRITE         "/etc/containers/registries.conf.d/test2"

# ln -sf test2 test3
# date > test3
# ls -l

-rw-r--r-- 2 root root 32 Feb 21 04:18 test
-rw-r--r-- 2 root root 32 Feb 21 04:18 test2
lrwxrwxrwx 1 root root  5 Feb 21 04:19 test3 -> test2

04:19:06.4723  62 CREATE        "/etc/containers/registries.conf.d/test3"
04:19:16.3944  63 WRITE         "/etc/containers/registries.conf.d/test2"
04:19:16.3947  64 WRITE         "/etc/containers/registries.conf.d/test2"

Some of the above can be translated into popular use cases:

  • Editing a file with a text editor
  • Creating a file via output redirection (when the file does not exist before)
  • Wanting to trigger a file/configuration reload using touch to update file metadata
  • Moving a file to a different directory (not a delete, and yet the file will be removed from the current path)

You can also see that some operations on file on the user's end generate events that are unexpected.

However, the biggest problem is almost always having to deal with multiple events arriving on the same path. Are these duplicate events? Some might be, but some might be distinct events (for example, a large write split into smaller chunks).

Why is this important? The code, as it stands now, would keep reloading the registry configuration many times over in response to operation on even a single file. This might be undesirable—additional file reads will be done, additional objects will be created that need to be garbage-collected eventually, etc. There is also no lock or delay of any sort to prevent containers from being created and started while registries are being reloaded, and there might be a race here, albeit this is potentially the worst possible outcome and highly unlikely.

To deal with this sort of problem, one would have to implement event deduplication or events aggregator. or rate limiter so that multiple events for the same path are seen once or are staggered or throttled.

@sohankunkerkar
Copy link
Member Author

Why is this important? The code, as it stands now, would keep reloading the registry configuration many times over in response to operation on even a single file. This might be undesirable—additional file reads will be done, additional objects will be created that need to be garbage-collected eventually, etc. There is also no lock or delay of any sort to prevent containers from being created and started while registries are being reloaded, and there might be a race here, albeit this is potentially the worst possible outcome and highly unlikely.

To deal with this sort of problem, one would have to implement event deduplication or events aggregator. or rate limiter so that multiple events for the same path are seen once or are staggered or throttled.

Thanks for the feedback. I completely agree with your observations. The current implementation lacks event deduplication, and the absence of locks or delays could potentially lead to undesirable outcomes, though the likelihood is minimal. While we are here, we need to identify and document all potential use cases that the current implementation should handle (apart from the ones you listed), just to be on the same page and facilitate a quicker code-compile-debug cycle.

@sohankunkerkar sohankunkerkar marked this pull request as ready for review April 30, 2024 16:38
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 30, 2024
@openshift-ci openshift-ci bot requested a review from kwilczynski April 30, 2024 16:38
@kwilczynski
Copy link
Member

/approve
/lgtm

@kwilczynski
Copy link
Member

/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2024
@sohankunkerkar
Copy link
Member Author

/retest

2 similar comments
@kwilczynski
Copy link
Member

/retest

@kwilczynski
Copy link
Member

/retest

Fixes cri-o#7748

This commit introduces automatic reloading of mirror registries configuration
upon detecting changes to registries.conf.d. It utilizes the fsnotify package
to monitor file system events (create, modify, delete) in the specified directory.
After detecting a change, a brief waiting period (eventSettleDuration) ensures
stability before triggering the reload, preventing unnecessary reloads and
enhancing efficiency.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 2, 2024
@haircommander
Copy link
Member

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 2, 2024
@haircommander haircommander added this to the 1.30 milestone May 2, 2024
Copy link
Contributor

openshift-ci bot commented May 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, kwilczynski, sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 2, 2024
@sohankunkerkar
Copy link
Member Author

/test ci-cgroupv2-integration

@haircommander
Copy link
Member

/retest

2 similar comments
@haircommander
Copy link
Member

/retest

@sohankunkerkar
Copy link
Member Author

/retest

@kwilczynski
Copy link
Member

/test e2e-gcp-ovn

@haircommander
Copy link
Member

/override ci/prow/e2e-gcp-ovn

Copy link
Contributor

openshift-ci bot commented May 3, 2024

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 67b59ab into cri-o:main May 3, 2024
69 of 71 checks passed
@sohankunkerkar sohankunkerkar deleted the automatic-reload branch May 3, 2024 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mirror configuration automatic reload
6 participants