Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: add new option --allocator-list-timeout #15538

Conversation

ArthurChiao
Copy link
Contributor

This enables user to configure how long they would like to wait before
successfully listed objects from kvstore. Especially, this determines the
agent restart frequency when clustermesh is enabled and remote kvstore
has connection problems: too many restarts in really large clusters pose
significant pressures on both k8s apiserver, local and remote kvstores.

Signed-off-by: ArthurChiao arthurchiao@hotmail.com

@ArthurChiao ArthurChiao requested a review from a team April 1, 2021 13:11
@ArthurChiao ArthurChiao requested review from a team as code owners April 1, 2021 13:11
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 1, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Apr 1, 2021
@ArthurChiao ArthurChiao requested review from jibi and twpayne April 1, 2021 13:11
@ArthurChiao
Copy link
Contributor Author

ArthurChiao commented Apr 1, 2021

Hi there, please see the commit message for the purpose of this patch.

Any comments welcomed. Thanks!

Copy link
Member

@jibi jibi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Could you update the documentation with the new option please? Should be just a matter of running make -C Documentation update-cmdref check and mentioning it in the upgrade document.

@ArthurChiao ArthurChiao force-pushed the make_allocator_list_timeout_configurable_github branch from 9ccaddf to 607f788 Compare April 6, 2021 03:32
@ArthurChiao ArthurChiao requested a review from a team as a code owner April 6, 2021 03:32
@ArthurChiao ArthurChiao requested a review from qmonnet April 6, 2021 03:32
@ArthurChiao
Copy link
Contributor Author

Documentation updated, thanks! @jibi

@ArthurChiao ArthurChiao requested a review from jibi April 6, 2021 06:48
@jibi jibi added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Apr 6, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 6, 2021
@jibi jibi added the area/daemon Impacts operation of the Cilium daemon. label Apr 6, 2021
@jibi jibi closed this Apr 6, 2021
@jibi jibi reopened this Apr 6, 2021
@jibi jibi closed this Apr 6, 2021
@jibi jibi reopened this Apr 6, 2021
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

@jibi
Copy link
Member

jibi commented Apr 6, 2021

@ArthurChiao there seems to be some issues with the CI, could you rebase your PR on top of master?

@qmonnet
Copy link
Member

qmonnet commented Apr 6, 2021

The Travis error seems related?

level=fatal msg="Timeout while waiting for initial allocator state" subsys=allocator

@ArthurChiao
Copy link
Contributor Author

I'll take a look, thank you!

@ArthurChiao ArthurChiao force-pushed the make_allocator_list_timeout_configurable_github branch from 607f788 to 92604aa Compare April 6, 2021 10:08
@ArthurChiao ArthurChiao force-pushed the make_allocator_list_timeout_configurable_github branch from 92604aa to d15ab81 Compare April 6, 2021 13:54
@ArthurChiao
Copy link
Contributor Author

The Travis error seems related?

level=fatal msg="Timeout while waiting for initial allocator state" subsys=allocator

Yes, unittests skipped the option.Populate() method, which leaved the option.Config.AllocatorListTimeout un-initialized. Fixed it by initializing the field when creating option.Config object.

@ArthurChiao ArthurChiao force-pushed the make_allocator_list_timeout_configurable_github branch from d15ab81 to dda95eb Compare April 6, 2021 14:21
@jibi
Copy link
Member

jibi commented Apr 6, 2021

test-me-please

@ArthurChiao ArthurChiao force-pushed the make_allocator_list_timeout_configurable_github branch from dda95eb to de93f58 Compare April 7, 2021 00:39
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had reviewed this PR but I guess not... @ArthurChiao the clustermesh-apiserver/main.go likely needs this flag as well since it also uses the pkg/allocator/allocator.go cc @jrajahalme

@ArthurChiao
Copy link
Contributor Author

I'll take a look, thanks! @aanm

This enables user to configure how long they would like to wait before
successfully listed objects from kvstore. Especially, this determines the
agent restart frequency when clustermesh is enabled and remote kvstore
has connection problems: too many restarts in really large clusters pose
significant pressures on both k8s apiserver, local and remote kvstores.

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
@ArthurChiao ArthurChiao force-pushed the make_allocator_list_timeout_configurable_github branch from de93f58 to 3a9c0af Compare April 7, 2021 11:53
@ArthurChiao
Copy link
Contributor Author

Checked that the calling stack is: NewVMManager() -> m.identityAllocator.InitIdentityAllocator() -> allocator.NewAllocator().

Code updated, please have a review, thanks! @aanm

@ArthurChiao ArthurChiao requested a review from aanm April 7, 2021 12:15
@pchaigno
Copy link
Member

pchaigno commented Apr 7, 2021

test-me-please

@aanm aanm merged commit 051032c into cilium:master Apr 7, 2021
1.10.0 automation moved this from In progress to Done Apr 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 8, 2021
@ArthurChiao ArthurChiao deleted the make_allocator_list_timeout_configurable_github branch April 8, 2021 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants