New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: repo-server - prevent locked state after unclean git checkout #11805
Conversation
b3bbff9
to
700ab3a
Compare
Codecov ReportBase: 47.43% // Head: 47.44% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #11805 +/- ##
=======================================
Coverage 47.43% 47.44%
=======================================
Files 246 246
Lines 41826 41833 +7
=======================================
+ Hits 19840 19847 +7
Misses 19990 19990
Partials 1996 1996
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Would it be feasible to write a test that reproduces the exact problem? i.e. getting the cached repo into a bad state |
@crenshaw-dev that is possible: 57b6c2a I have mixed feelings about the dependency of this test on the timing of the git operations, which could lead to flakiness:
Let me know what you think. I can confirm that the changes from this PR (SIGTERM and waiting for the process to finish) make that test pass. If you're happy with the trade-off, I can cherry pick that commit with the test into this PR. A potential improvement (albeit much bigger) would be to run those git operations in a container with cgroupv2 IO limits in place. |
700ab3a
to
a85afcc
Compare
Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>
Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>
Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net>
a85afcc
to
d52ab8d
Compare
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @arturhoo!
To be extra cautious, I'd like to avoid cherry-picking this back any further than necessary until it's validated in someone's prod environment. I think I'll cherry-pick to 2.5. @harikrongali has an environment where he can validate.
If all looks good, we can cherry-pick it even further back.
Thanks for the approval @crenshaw-dev! @harikrongali and I actually work together at Lacework now (small world :P ), so I'm happy to get it validated in our prod installation. We're on v2.5, so if you think the cherry pick and a new patch release under 2.5 it is reasonable, getting it rolled out is quite straightforward for us. |
Sounds good, will merge and cherry-pick after CI passes. There's one other bugfix currently on release-2.5. I'll see if I can get anything else merged today and plan to cut a release tomorrow. |
Thanks @crenshaw-dev for merging the PR. We will validate in our env once 2.5.x is available. |
Cherry-picked onto release-2.6. The 2.5 cherry-pick bumps a lot of unrelated dependency versions. The downside of abstracting things out to a package with an uncoordinated release schedule I guess. Would y'all be okay leaving this for 2.6+, considering it goes GA on Monday? |
Yes, that works too! Thanks |
…7898) (#11805) * Pull in new version of argoproj/pkg Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> * Allow timeout behavior to be specified in util/exec/exec Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> * Git processes receive SIGTERM when timedout Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> * Update util/exec/exec_test.go Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
…rgoproj#7898) (argoproj#11805) * Pull in new version of argoproj/pkg Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> * Allow timeout behavior to be specified in util/exec/exec Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> * Git processes receive SIGTERM when timedout Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> * Update util/exec/exec_test.go Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Artur Rodrigues <artur.rodrigues@lacework.net> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Signed-off-by: schakrad <chakradari.sindhu@gmail.com>
Fixes #7898 - in busy systems an installation with a large number of apps defined in deep respositories, checkout operations can cross the defined
ARGOCD_EXEC_TIMEOUT
(default 90s). When that occurs, thegit
process receives aSIGKILL
with no chance of leaving the local repo checkout in a clean state.As a result, further git operations fail with "fatal: Unable to create '/private..git/index.lock': File exists".
Depends on argoproj/pkg#270 - now that a custom signal can be passed into to argoproj/pkg/exec, we can send
SIGTERM
instead and wait for git to exit cleanly before continuing with other operations.Discussion on slack: https://cloud-native.slack.com/archives/C01TSERG0KZ/p1670965919247029
Testing
util/exec/exec
module.I poked around a bit but didn't see a straight forward way to create an e2e test that simulates the behaviour described in Repo server unable to checkout commit #7898 (comment) - open to feedback thoughtest implemented on 57b6c2a probably flaky though