Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better exposure for client-side throttling #11387

Closed
juliev0 opened this issue Jul 18, 2023 · 7 comments · Fixed by #11437
Closed

Better exposure for client-side throttling #11387

juliev0 opened this issue Jul 18, 2023 · 7 comments · Fixed by #11437
Labels
good first issue Good for newcomers type/feature Feature request

Comments

@juliev0
Copy link
Contributor

juliev0 commented Jul 18, 2023

Summary

When the --qps and --burst are set too low on the Workflow Controller, this can create client-side throttling by the underlying client-go library. (It appears that this serves to slow down the messages going out as opposed to reject requests based on my testing.)

The only way currently to know this is happening is that the library spits out messages like:

I0525 09:00:21.037886       1 request.go:601] Waited for 7.090296384s due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/argoproj.io/v1alpha1/namespaces/argo/workflowtemplates/s2t

Ideally, maybe we'd have a warning-level message of our own or perhaps a metric to create greater exposure for the issue and help it to be more quickly identified.

Note that there is debug-level logging right now for messages in general here: https://github.com/argoproj/argo-workflows/blob/master/util/logs/log-k8s-requests.go#L20

Use Cases

When would you use this?


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

@juliev0 juliev0 added type/feature Feature request good first issue Good for newcomers labels Jul 18, 2023
Jack-R-lantern added a commit to Jack-R-lantern/argo-workflows that referenced this issue Jul 19, 2023
Expresses when throttling occurs on the
client side by setting the log message level to Warn.

fixes: argoproj#11387

Signed-off-by: Jack-R-lantern <tjdfkr2421@gmail.com>
@juliev0
Copy link
Contributor Author

juliev0 commented Jul 21, 2023

Just FYI, here is the place in the gomodule where this error message is being printed: https://github.com/kubernetes/client-go/blob/master/rest/request.go#L625

Jack-R-lantern added a commit to Jack-R-lantern/argo-workflows that referenced this issue Jul 22, 2023
Implemented logging of throttling.
when a delay of more than 5 seconds occurs.

Fixes: argoproj#11387 argoproj#11402

Signed-off-by: Jack-R-lantern <tjdfkr2421@gmail.com>
Jack-R-lantern added a commit to Jack-R-lantern/argo-workflows that referenced this issue Jul 22, 2023
Implemented logging of throttling.
when a delay of more than 5 seconds occurs.

Fixes: argoproj#11387 argoproj#11402

Signed-off-by: Jack-R-lantern <tjdfkr2421@gmail.com>
@Jack-R-lantern
Copy link
Contributor

@juliev0
I implemented it.
I'm testing with the following settings, but I don't think the test is accurate.
How can I make it accurate?

spec:
  tasks:
    - name: controller
      command: ./dist/workflow-controller --qps=1.0 --burst=1
      dependencies: install build-controller port-forward
      env:
        - ARGO_EXECUTOR_PLUGINS=false
        - ARGO_NAMESPACE=argo
        - ARGO_NAMESPACED=true
        - ARGO_MANAGED_NAMESPACE=argo
        - ARGO_LOG_LEVEL=info
        - ARGO_REMOVE_PVC_PROTECTION_FINALIZER=true
        - ARGO_PROGRESS_PATCH_TICK_DURATION=7s
        - DEFAULT_REQUEUE_TIME=1s
        - LEADER_ELECTION_IDENTITY=local
        - ALWAYS_OFFLOAD_NODE_STATUS=false
        - OFFLOAD_NODE_STATUS_TTL=30s
        - WORKFLOW_GC_PERIOD=30s
        - UPPERIO_DB_DEBUG=1
        - ARCHIVED_WORKFLOW_GC_PERIOD=30s
      ports: "9090"

@juliev0
Copy link
Contributor Author

juliev0 commented Jul 24, 2023

The code looks good. As far as your test, is that some sort of kit spec you're showing? Have you successfully run Argo Workflows locally before? If not, maybe just follow this page? I typically just run make clean and then make start. (don't need to change any config files)

@juliev0
Copy link
Contributor Author

juliev0 commented Jul 28, 2023

@Jack-R-lantern would you by chance be interested in adding a Controller metric for this as well? I can assist with any questions if needed.

@Jack-R-lantern
Copy link
Contributor

@juliev0 sure

@juliev0
Copy link
Contributor Author

juliev0 commented Jul 29, 2023

@Jack-R-lantern have you already started on that? 😬 My colleague is saying that the controller exposes kubernetes client metrics already and this might be one. I need to see. So sorry I didn’t realize.

@Jack-R-lantern
Copy link
Contributor

@juliev0 I'm fine, don't worry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers type/feature Feature request
Projects
None yet
2 participants