
daprd occupies too much memory #6581

Closed
Hsuwen opened this issue Jun 26, 2023 · 23 comments
Labels: stale (Issues and PRs without response)

Comments

Hsuwen (Author) commented Jun 26, 2023

In my environment, everything runs normally with Dapr, but the sidecar (daprd) occupies a lot of memory, much more than 5 times the memory of my application.

My environment:
Server: AWS EKS version 1.27
Dapr CLI: 1.11
Dapr Runtime: 1.11
Dapr SDK: dot-net 1.10
Dapr Components: redis (statestore), rabbitmq (pubsub); a Component sketch follows the dapr status output below
Other: zipkin, middleware.http.ratelimit, middleware.http.routeralias

  NAME                   NAMESPACE    HEALTHY  STATUS                      REPLICAS  VERSION  AGE  CREATED
  dapr-sidecar-injector  dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.40
  dapr-sentry            dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.40
  dapr-operator          dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.40
  dapr-placement-server  dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.42
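
For reference, the Redis state store mentioned above is declared as a standard Dapr Component along these lines (a sketch; the host and secret names are placeholders, not my real values):

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.redis
  version: v1
  metadata:
    - name: redisHost
      value: redis-master.default.svc.cluster.local:6379   # placeholder
    - name: redisPassword
      secretKeyRef:
        name: redis           # placeholder secret
        key: redis-password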

For example, for one of the applications, iaas-config-axe, the sidecar occupies 661Mi of memory while my program uses only 89Mi. I think this is abnormal. Whenever my application processes a request, daprd also accepts and processes a request, and the number of daprd requests will not exceed five times the number of requests in my program.

POD                                             NAME                           CPU(cores)   MEMORY(bytes)
iaas-config-axe-7b7796c8-s5gn8                  daprd                          4m           661Mi
iaas-config-axe-7b7796c8-s5gn8                  iaas-config-axe                2m           89Mi

Moreover, this application does not use any service invocation, state storage, or event publishing/subscribing. In fact, I have never used DaprClient in my code at all; the app just has the Dapr sidecar injected.

Of course, I have already set the relevant annotations for this application, following the production guidelines:

dapr.io/sidecar-liveness-probe-delay-seconds: "10"
dapr.io/sidecar-readiness-probe-delay-seconds: "10"
dapr.io/sidecar-cpu-limit: "300m"
dapr.io/sidecar-memory-limits: "1000M"
dapr.io/sidecar-cpu-request: "100m"
dapr.io/sidecar-memory-request: "250M"
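
For context, these annotations sit on the Deployment's pod template, roughly like this (a sketch; the app-id and port are placeholders, and the remaining dapr.io/sidecar-* probe and resource annotations from the list above sit alongside them):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iaas-config-axe
spec:
  # selector and containers omitted from this sketch
  template:
    metadata:
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "iaas-config-axe"   # placeholder app-id
        dapr.io/app-port: "8080"            # placeholder port
        dapr.io/sidecar-memory-request: "250M"
        # ...plus the other dapr.io/sidecar-* annotations listed above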

This application's sidecar is not the only one with memory issues; there are others as well. I have given one of the most representative examples here.
Do you have any suggestions regarding this issue? How should I investigate and resolve it?

Hsuwen (Author) commented Jun 26, 2023

It is worth adding that the request volume of this application is also very small, with an average of 5-10 requests every 10 seconds. The above memory usage will continue to increase with the running time (more like a memory leak), and will not decrease significantly.

yaron2 (Member) commented Jun 26, 2023

Which features/APIs in Dapr are you using?

Hsuwen (Author) commented Jun 26, 2023

> Which features/APIs in Dapr are you using?

I don't use any API or DaprClient in this app's code.

yaron2 (Member) commented Jun 26, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.

You stated you are loading Redis state and RabbitMQ pub/sub. If you remove these components from the namespace, do you observe any changes in memory consumption? Also, can you please paste the logs of the daprd container?

Hsuwen (Author) commented Jun 27, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.
>
> You stated you are loading Redis state and RabbitMQ pub/sub. If you remove these components from the namespace, do you observe any changes in memory consumption? Also, can you please paste the logs of the daprd container?

I can't remove the components because this is a live environment. I checked this pod on the Dapr dashboard; neither container has any logs (the log level is warn), and the kubectl logs command shows nothing either.

yaron2 (Member) commented Jun 27, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.

Is the only usage of the daprd container in this environment to consume messages from RabbitMQ and deliver them to the app? Also, are you able to reproduce this in a non-live environment with the same load?

Hsuwen (Author) commented Jun 27, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.
>
> Is the only usage of the daprd container in this environment to consume messages from RabbitMQ and deliver them to the app? Also, are you able to reproduce this in a non-live environment with the same load?

In my cluster, Dapr loads some components (rabbitmq, redis, etc.), but the example pod in this case (iaas-config-axe) does not make any calls to those components. iaas-config-axe only has the Dapr sidecar injected, so that other services (pods) can reach it via service invocation through the Dapr SDK.
It's as if you wrote the simplest HTTP interface and only enabled Dapr in the YAML annotations.

Hsuwen (Author) commented Jul 3, 2023

A similar problem has arisen again:

POD                                            NAME                           CPU(cores)   MEMORY(bytes)
worker-transaction-automatic-55d5869c6-wplqb   daprd                          22m          442Mi
worker-transaction-automatic-55d5869c6-wplqb   worker-transaction-automatic   5m           93Mi

This pod has only been running for 23 hours, and the sidecar (daprd) already occupies almost five times the memory of the app.

worker-transaction-automatic-55d5869c6-wplqb   2/2     Running   0              23h

This pod uses service invocation and state storage.

Hsuwen (Author) commented Jul 5, 2023

I tried turning off metrics, and memory usage is now normal:

apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: common
spec:
  metric:
    enabled: false
  ...
  ...
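
For completeness: the sidecars pick up this Configuration because each pod references it by name, roughly like this (a sketch; only the relevant pod-template annotations are shown, assuming the Configuration is applied in the app's namespace):

spec:
  template:
    metadata:
      annotations:
        dapr.io/enabled: "true"
        dapr.io/config: "common"   # metadata.name of the Configuration above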

The following shows the pods' resource usage after running for 18 hours (including peak traffic).

POD                                             NAME                           CPU(cores)   MEMORY(bytes)
mysql-57f897dd66-hm57v                          mysql                          2m           366Mi
zipkin-c789dd5b8-mt5wc                          zipkin                         1m           197Mi
payany-rpc-transaction-5bddc895fd-9cczz         payany-rpc-transaction         3m           133Mi
payany-rpc-transaction-5bddc895fd-9cczz         daprd                          4m           39Mi
payany-rpc-risk-7b676848db-kjwv6                payany-rpc-risk                1m           137Mi
payany-rpc-risk-7b676848db-kjwv6                daprd                          2m           27Mi
payany-rpc-transaction-5bddc895fd-n9hkp         payany-rpc-transaction         4m           124Mi
payany-rpc-transaction-5bddc895fd-n9hkp         daprd                          4m           39Mi
payany-rpc-transaction-5bddc895fd-gm4zk         payany-rpc-transaction         3m           120Mi
payany-rpc-transaction-5bddc895fd-gm4zk         daprd                          5m           39Mi
payany-rpc-merchant-587fd4964b-gfbrc            payany-rpc-merchant            3m           122Mi
payany-rpc-merchant-587fd4964b-gfbrc            daprd                          3m           36Mi
payany-rpc-transaction-5bddc895fd-25nzn         payany-rpc-transaction         5m           120Mi
payany-rpc-transaction-5bddc895fd-25nzn         daprd                          4m           38Mi
payany-rpc-risk-7b676848db-2gf69                payany-rpc-risk                4m           129Mi
payany-rpc-risk-7b676848db-2gf69                daprd                          2m           27Mi
rabbitmq-c686d6c4-f6k2v                         rabbitmq                       9m           154Mi
payany-gateway-merchant-d4bf85bd8-25ssk         payany-gateway-merchant        2m           114Mi
payany-gateway-merchant-d4bf85bd8-25ssk         daprd                          3m           36Mi
payany-gateway-cashier-b67db86d8-6w4k4          payany-gateway-cashier         2m           103Mi
payany-gateway-cashier-b67db86d8-6w4k4          daprd                          7m           47Mi
payany-gateway-cashier-b67db86d8-95fkc          payany-gateway-cashier         2m           103Mi
payany-gateway-cashier-b67db86d8-95fkc          daprd                          5m           47Mi
payany-rpc-payment-695f85d59d-tddbb             payany-rpc-payment             5m           111Mi
payany-rpc-payment-695f85d59d-tddbb             daprd                          4m           37Mi
payany-gateway-manager-86bbfc8db4-fvzjv         payany-gateway-manager         2m           108Mi
payany-gateway-manager-86bbfc8db4-fvzjv         daprd                          4m           39Mi
payany-rpc-payment-695f85d59d-9cspr             payany-rpc-payment             3m           110Mi
payany-rpc-payment-695f85d59d-9cspr             daprd                          4m           37Mi
payany-gateway-cashier-b67db86d8-89cwb          payany-gateway-cashier         2m           99Mi
payany-gateway-cashier-b67db86d8-89cwb          daprd                          5m           47Mi
payany-gateway-cashier-b67db86d8-4n6lh          payany-gateway-cashier         1m           98Mi
payany-gateway-cashier-b67db86d8-4n6lh          daprd                          7m           47Mi
iaas-l10n-brick-57f7685d66-ql5j6                iaas-l10n-brick                1m           117Mi
iaas-l10n-brick-57f7685d66-ql5j6                daprd                          2m           27Mi

So, are metrics accumulated in the sidecar's memory and only released when they are scraped? My cluster does not use Prometheus, and metrics are enabled by default.

ItalyPaleAle (Contributor) commented:

@Hsuwen The metrics collector does require additional memory. It's very interesting that it's having such a large impact for you, however. Probably something we should investigate.

Hsuwen (Author) commented Jul 5, 2023

> @Hsuwen The metrics collector does require additional memory. It's very interesting that it's having such a large impact for you, however. Probably something we should investigate.

I will continue to observe the pods' resource usage.
Please share your troubleshooting ideas with me, and I will provide as much of the necessary information for analysis as I can.
Thanks.

yaron2 (Member) commented Jul 5, 2023

@Hsuwen do you have high-cardinality URLs in your system? For example, /users/<user-id>

Hsuwen (Author) commented Jul 6, 2023

> @Hsuwen do you have high-cardinality URLs in your system? For example, /users/<user-id>

@yaron2 Sorry, I'm not familiar with the term 'high-cardinality URLs'. But as for /users/<user-id>, this style exists in the vast majority of systems; it is basically the RESTful style. Indeed, I have many URLs in this style.

Hsuwen closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 17, 2023
ItalyPaleAle (Contributor) commented:

We are getting similar reports from other users. I am thinking of re-opening this as it seems to be having a broad impact.

CC: @yaron2

ItalyPaleAle reopened this on Jul 20, 2023
yaron2 (Member) commented Jul 20, 2023

I'm getting similar reports

denniszielke (Member) commented:

Is there something that can be done from the app side to remove the effect? For example, rewriting the URLs to use query params instead of the path?

yaron2 (Member) commented Jul 20, 2023

> Is there something that can be done from the app side to remove the effect? For example, rewriting the URLs to use query params instead of the path?

It would help a lot if we could rule out high-cardinality metrics. If you can disable metrics altogether and report whether memory exhibits normal usage patterns, that would be great.
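
If disabling it cluster-wide is not practical, it should also be possible to test this per app with the sidecar annotation (a sketch based on the dapr.io/enable-metrics annotation; the app-id is a placeholder):

annotations:
  dapr.io/enabled: "true"
  dapr.io/app-id: "my-app"           # placeholder
  dapr.io/enable-metrics: "false"    # turn off this sidecar's metrics only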

Hsuwen (Author) commented Jul 21, 2023

I can provide all necessary information without affecting the normal use of the production environment. Please tell me exactly how to do it.

Hsuwen (Author) commented Jul 31, 2023

Regarding this issue, I found that the docs have been updated:
https://docs.dapr.io/operations/monitoring/metrics/metrics-overview/#high-cardinality-metrics

Several details are still not very clear:

  1. Which specific metrics are affected by this, since that determines which metrics I should set rules for
  2. How to filter by app-id in the metric.rules configuration

Regarding point 2:
If there are paths like /users/<id> in both app1 and app2, but only app1 needs the rule, how can we do this? Of course, it is possible to configure each app separately (sketched after the snippet below), but that increases complexity.
What I mean is, can we do this:

  metric:
    enabled: false
    rules:
    - name: dapr_runtime_service_invocation_req_sent_total
      labels:
      - name: method
        appid: app1
        regex:
          "users/": "users/.+"

dapr-bot (Collaborator) commented:

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

dapr-bot added the 'stale' (Issues and PRs without response) label on Sep 29, 2023
ItalyPaleAle (Contributor) commented:

Still active, should be fixed by #6723

dapr-bot removed the 'stale' (Issues and PRs without response) label on Sep 29, 2023
dapr-bot (Collaborator) commented:

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

dapr-bot added the 'stale' (Issues and PRs without response) label on Nov 28, 2023
dapr-bot (Collaborator) commented Dec 5, 2023

This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

dapr-bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 5, 2023