feat: Set a max recursion depth limit #11646
Conversation
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Anton Gilgur <4970083+agilgur5@users.noreply.github.com> Signed-off-by: Alan Clucas <alan@clucas.org>
Generally looks good. I have a nit with os.Getenv, but this is how the rest of the codebase does it, so it's not a big deal.
The nit: os.Getenv is a syscall, so this value may differ throughout the execution of the operator function. I can see both sides of this; perhaps this is the desired behaviour. The functional programmer in me does not like it one bit, though.
Regardless, I've approved; grepping shows that Argo commonly does this, so changing it just here wouldn't change anything.
@isubasinghe if these env vars were used for anything 'non-experimental' I'd suggest we read them at startup and provide them via a service, but that's a bit overkill for the current intended use.
@Joibel yeah, that is fair enough. I think my previous comment was largely motivated by my subjective sense of "niceness" rather than practicality.
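As a minimal sketch of the startup-read approach mentioned above, here is what reading the value once at controller start could look like in Go, rather than calling os.Getenv during each operation. The environment variable name, default, and structure below are hypothetical illustrations, not the actual Argo Workflows configuration surface:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// controllerConfig holds values read once at startup so later code does not
// need to call os.Getenv while operating workflows.
type controllerConfig struct {
	maxStackDepth int
}

// loadConfig reads the (hypothetical) MAX_STACK_DEPTH variable once and falls
// back to a compiled-in default when it is unset or invalid.
func loadConfig() controllerConfig {
	cfg := controllerConfig{maxStackDepth: 150}
	if v, ok := os.LookupEnv("MAX_STACK_DEPTH"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			cfg.maxStackDepth = n
		}
	}
	return cfg
}

func main() {
	cfg := loadConfig() // read once; the operator would use cfg from here on
	fmt.Println("max stack depth:", cfg.maxStackDepth)
}
```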
workflow/controller/controller.go
Outdated
@@ -67,6 +67,8 @@ import (
	plugin "github.com/argoproj/argo-workflows/v3/workflow/util/plugins"
)

const defaultMaxStackDepth = 150
Would this be a reasonable default?
This number comes from #4193, where it started off as 500, and was amended to 150. #10785 removed 10 as the limit, but that feels unreasonably low and likely to cause issues with existing workflows.
Note, this is not actually a default because it's uncontrollable (again, #4193 started with it configurable via the environment, but switched to a disable instead). I will change its name.
My opinion is that 150 is too big; if I were plucking a number out of the air I'd set it lower, at around 50. What's your guess at a good number, @terrytangyuan?
Is there a number at which the controller starts having trouble that we could use as the limit?
I think by the time the controller is having trouble we've gone too far.
Without status offloading you'll probably have wasted a lot of resources and hit a failure to store the status before you get the controller into trouble. I don't think we should just be aiming to stop the controller getting into trouble.
I think this number should aim for a value which is reasonable for workflows already in existence, so that as few users as possible consider this a regression, but which is otherwise low enough to avoid doing work that will be thrown away because we're actually in an infinite recursion loop.
I'm going to propose reducing the 150 to 100 and seeing what happens in the real world. What do you think?
Sure. Also we might want to change the default to disabled to avoid breaking existing workflows.
My preference would be to leave it enabled as a saner default. Switching it 'later' doesn't really help with not breaking stuff, and I'm expecting this to go in a major, not minor, release (hence flagged as feat).
I really like this kind of recursion prevention. We produced infinite recursion due to a step and an entrypoint having the same name, and spent a lot of time stopping/terminating/deleting the workflow that kept calling its own entrypoint before we spent even more time finding out why this occurred.
This prevention would have saved us a lot of time, so I think it should be merged to save time for others in the future.
Signed-off-by: Alan Clucas <alan@clucas.org>
Head branch was pushed to by a user without write access
Co-authored-by: Yuan (Terry) Tang <terrytangyuan@gmail.com> Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Yuan (Terry) Tang <terrytangyuan@gmail.com> Signed-off-by: Alan Clucas <alan@clucas.org>
This is a resurrection of #4193 to address #11499
Motivation
Intentionally or unintentionally, it is possible to write templates that recurse indefinitely. This change prevents that by monitoring the stack depth of templates and stopping when it reaches 150, which seems big enough for anyone.
It can be disabled globally with a controller environment variable in case anyone runs into problems; we can then address those problems with new fixes as they are reported.
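To illustrate the general mechanism described above (not the exact implementation in this PR), here is a minimal Go sketch of tracking template recursion depth, failing once a fixed limit is exceeded, and skipping the check when an environment variable disables it. The function and variable names are hypothetical:

```go
package main

import (
	"fmt"
	"os"
)

// Limit discussed in this PR: not configurable, only disableable.
const maxStackDepth = 150

// depthCheckDisabled reports whether the operator should skip the limit check,
// based on a (hypothetical) controller environment variable.
func depthCheckDisabled() bool {
	return os.Getenv("DISABLE_MAX_RECURSION") == "true"
}

// resolveTemplate simulates recursive template resolution, carrying the
// current stack depth and failing once the limit is exceeded.
func resolveTemplate(name string, depth int) error {
	if !depthCheckDisabled() && depth > maxStackDepth {
		return fmt.Errorf("maximum recursion depth exceeded while resolving %q", name)
	}
	// A template that (directly or indirectly) references itself would
	// otherwise recurse here forever.
	return resolveTemplate(name, depth+1)
}

func main() {
	if err := resolveTemplate("self-referencing-entrypoint", 0); err != nil {
		fmt.Println(err) // the workflow would be marked failed with this message
	}
}
```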