
daprd occupies too much memory #6581

Closed
Hsuwen opened this issue Jun 26, 2023 · 23 comments
Labels: stale (Issues and PRs without response)

Comments

Hsuwen (Author) commented Jun 26, 2023

In my environment, everything runs normally with Dapr, but the sidecar (daprd) occupies a lot of memory, much more than 5 times the memory of my application.

My environment:
Server: AWS EKS version 1.27
Dapr CLI: 1.11
Dapr Runtime: 1.11
Dapr SDK: dot-net 1.10
Dapr Components: redis (statestore), rabbitmq (pubsub); a Component sketch follows the dapr status output below
Other: zipkin, middleware.http.ratelimit, middleware.http.routeralias

  NAME                   NAMESPACE    HEALTHY  STATUS                      REPLICAS  VERSION  AGE  CREATED
  dapr-sidecar-injector  dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.40
  dapr-sentry            dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.40
  dapr-operator          dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.40
  dapr-placement-server  dapr-system  True     Running                     1         1.11.0   3d   2023-06-22 10:12.42
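
For reference, the Redis state store mentioned above is declared as a standard Dapr Component along these lines (a sketch; the host and secret names are placeholders, not my real values):

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.redis
  version: v1
  metadata:
    - name: redisHost
      value: redis-master.default.svc.cluster.local:6379   # placeholder
    - name: redisPassword
      secretKeyRef:
        name: redis           # placeholder secret
        key: redis-password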

For example, for one of the applications, iaas-config-axe, the sidecar occupies 661Mi of memory while my program uses only 89Mi. I think this is abnormal. Whenever my application processes a request, daprd also accepts and processes a request, and the number of daprd requests will not exceed five times the number of requests in my program.

POD                                             NAME                           CPU(cores)   MEMORY(bytes)
iaas-config-axe-7b7796c8-s5gn8                  daprd                          4m           661Mi
iaas-config-axe-7b7796c8-s5gn8                  iaas-config-axe                2m           89Mi

Moreover, this application does not use any service invocation, state storage, or event publishing/subscribing. In fact, I have never used DaprClient in my code at all; the app just has the Dapr sidecar injected.

Of course, I have already set the relevant annotations for this application, following the production guidelines:

dapr.io/sidecar-liveness-probe-delay-seconds: "10"
dapr.io/sidecar-readiness-probe-delay-seconds: "10"
dapr.io/sidecar-cpu-limit: "300m"
dapr.io/sidecar-memory-limits: "1000M"
dapr.io/sidecar-cpu-request: "100m"
dapr.io/sidecar-memory-request: "250M"
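
For context, these annotations sit on the Deployment's pod template, roughly like this (a sketch; the app-id and port are placeholders, and the remaining dapr.io/sidecar-* probe and resource annotations from the list above sit alongside them):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iaas-config-axe
spec:
  # selector and containers omitted from this sketch
  template:
    metadata:
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "iaas-config-axe"   # placeholder app-id
        dapr.io/app-port: "8080"            # placeholder port
        dapr.io/sidecar-memory-request: "250M"
        # ...plus the other dapr.io/sidecar-* annotations listed above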

This application's sidecar is not the only one with memory issues; there are others as well. I have given one of the most representative examples here.
Do you have any suggestions regarding this issue? How should I investigate and resolve it?

Hsuwen (Author) commented Jun 26, 2023

It is worth adding that the request volume of this application is also very small, with an average of 5-10 requests every 10 seconds. The above memory usage will continue to increase with the running time (more like a memory leak), and will not decrease significantly.

yaron2 (Member) commented Jun 26, 2023

Which features/APIs in Dapr are you using?

Hsuwen (Author) commented Jun 26, 2023

> Which features/APIs in Dapr are you using?

I don't use any API or DaprClient in this app's code.

yaron2 (Member) commented Jun 26, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.

You stated you are loading Redis state and RabbitMQ pub/sub. If you remove these components from the namespace, do you observe any changes in memory consumption? Also, can you please paste the logs of the daprd container?

Hsuwen (Author) commented Jun 27, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.
>
> You stated you are loading Redis state and RabbitMQ pub/sub. If you remove these components from the namespace, do you observe any changes in memory consumption? Also, can you please paste the logs of the daprd container?

I can't remove the components because this is a live environment. I checked this pod on the Dapr dashboard; neither container has any logs (the log level is warn), and the kubectl logs command shows nothing either.

yaron2 (Member) commented Jun 27, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.

Is the only usage of the daprd container in this environment to consume messages from RabbitMQ and deliver them to the app? Also, are you able to reproduce this in a non-live environment with the same load?

Hsuwen (Author) commented Jun 27, 2023

> Which features/APIs in Dapr are you using?
>
> I don't use any API or DaprClient in this app's code.
>
> Is the only usage of the daprd container in this environment to consume messages from RabbitMQ and deliver them to the app? Also, are you able to reproduce this in a non-live environment with the same load?

In my cluster, Dapr loads some components (rabbitmq, redis, etc.), but the example pod in this case (iaas-config-axe) does not make any calls to those components. iaas-config-axe only has the Dapr sidecar injected, so that other services (pods) can reach it via service invocation through the Dapr SDK.
It's as if you wrote the simplest HTTP interface and only enabled Dapr in the YAML annotations.

Hsuwen (Author) commented Jul 3, 2023

A similar problem has arisen again:

POD                                            NAME                           CPU(cores)   MEMORY(bytes)
worker-transaction-automatic-55d5869c6-wplqb   daprd                          22m          442Mi
worker-transaction-automatic-55d5869c6-wplqb   worker-transaction-automatic   5m           93Mi

This pod has only been running for 23 hours, and the sidecar (daprd) already occupies almost five times the memory of the app.

worker-transaction-automatic-55d5869c6-wplqb   2/2     Running   0              23h

This pod uses service invocation and state storage.

Hsuwen (Author) commented Jul 5, 2023

I tried turning off metrics, and memory usage is now normal:

apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: common
spec:
  metric:
    enabled: false
  ...
  ...
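
For completeness: the sidecars pick up this Configuration because each pod references it by name, roughly like this (a sketch; only the relevant pod-template annotations are shown, assuming the Configuration is applied in the app's namespace):

spec:
  template:
    metadata:
      annotations:
        dapr.io/enabled: "true"
        dapr.io/config: "common"   # metadata.name of the Configuration above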

The following shows the pods' resource usage after running for 18 hours (including peak traffic).

POD                                             NAME                           CPU(cores)   MEMORY(bytes)
mysql-57f897dd66-hm57v                          mysql                          2m           366Mi
zipkin-c789dd5b8-mt5wc                          zipkin                         1m           197Mi
payany-rpc-transaction-5bddc895fd-9cczz         payany-rpc-transaction         3m           133Mi
payany-rpc-transaction-5bddc895fd-9cczz         daprd                          4m           39Mi
payany-rpc-risk-7b676848db-kjwv6                payany-rpc-risk                1m           137Mi
payany-rpc-risk-7b676848db-kjwv6                daprd                          2m           27Mi
payany-rpc-transaction-5bddc895fd-n9hkp         payany-rpc-transaction         4m           124Mi
payany-rpc-transaction-5bddc895fd-n9hkp         daprd                          4m           39Mi
payany-rpc-transaction-5bddc895fd-gm4zk         payany-rpc-transaction         3m           120Mi
payany-rpc-transaction-5bddc895fd-gm4zk         daprd                          5m           39Mi
payany-rpc-merchant-587fd4964b-gfbrc            payany-rpc-merchant            3m           122Mi
payany-rpc-merchant-587fd4964b-gfbrc            daprd                          3m           36Mi
payany-rpc-transaction-5bddc895fd-25nzn         payany-rpc-transaction         5m           120Mi
payany-rpc-transaction-5bddc895fd-25nzn         daprd                          4m           38Mi
payany-rpc-risk-7b676848db-2gf69                payany-rpc-risk                4m           129Mi
payany-rpc-risk-7b676848db-2gf69                daprd                          2m           27Mi
rabbitmq-c686d6c4-f6k2v                         rabbitmq                       9m           154Mi
payany-gateway-merchant-d4bf85bd8-25ssk         payany-gateway-merchant        2m           114Mi
payany-gateway-merchant-d4bf85bd8-25ssk         daprd                          3m           36Mi
payany-gateway-cashier-b67db86d8-6w4k4          payany-gateway-cashier         2m           103Mi
payany-gateway-cashier-b67db86d8-6w4k4          daprd                          7m           47Mi
payany-gateway-cashier-b67db86d8-95fkc          payany-gateway-cashier         2m           103Mi
payany-gateway-cashier-b67db86d8-95fkc          daprd                          5m           47Mi
payany-rpc-payment-695f85d59d-tddbb             payany-rpc-payment             5m           111Mi
payany-rpc-payment-695f85d59d-tddbb             daprd                          4m           37Mi
payany-gateway-manager-86bbfc8db4-fvzjv         payany-gateway-manager         2m           108Mi
payany-gateway-manager-86bbfc8db4-fvzjv         daprd                          4m           39Mi
payany-rpc-payment-695f85d59d-9cspr             payany-rpc-payment             3m           110Mi
payany-rpc-payment-695f85d59d-9cspr             daprd                          4m           37Mi
payany-gateway-cashier-b67db86d8-89cwb          payany-gateway-cashier         2m           99Mi
payany-gateway-cashier-b67db86d8-89cwb          daprd                          5m           47Mi
payany-gateway-cashier-b67db86d8-4n6lh          payany-gateway-cashier         1m           98Mi
payany-gateway-cashier-b67db86d8-4n6lh          daprd                          7m           47Mi
iaas-l10n-brick-57f7685d66-ql5j6                iaas-l10n-brick                1m           117Mi
iaas-l10n-brick-57f7685d66-ql5j6                daprd                          2m           27Mi

So, are metrics accumulated in the sidecar's memory and only released when they are scraped? My cluster does not use Prometheus, and metrics are enabled by default.

ItalyPaleAle (Contributor) commented:

@Hsuwen The metrics collector does require additional memory. It's very interesting that it's having such a large impact for you, however. Probably something we should investigate.

Hsuwen (Author) commented Jul 5, 2023

> @Hsuwen The metrics collector does require additional memory. It's very interesting that it's having such a large impact for you, however. Probably something we should investigate.

I will continue to observe the pods' resource usage.
Please share your troubleshooting ideas with me, and I will provide as much of the necessary information for analysis as I can.
Thanks.

yaron2 (Member) commented Jul 5, 2023

@Hsuwen do you have high-cardinality URLs in your system? For example, /users/<user-id>

Hsuwen (Author) commented Jul 6, 2023

> @Hsuwen do you have high-cardinality URLs in your system? For example, /users/<user-id>

@yaron2 Sorry, I'm not familiar with the term 'high-cardinality URLs'. But as for /users/<user-id>, this style exists in the vast majority of systems; it is basically the RESTful style. Indeed, I have many URLs in this style.

Hsuwen closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 17, 2023
ItalyPaleAle (Contributor) commented:

We are getting similar reports from other users. I am thinking of re-opening this as it seems to be having a broad impact.

CC: @yaron2

ItalyPaleAle reopened this on Jul 20, 2023
yaron2 (Member) commented Jul 20, 2023

I'm getting similar reports

denniszielke (Member) commented:

Is there something that can be done from the app side to remove the effect? For example, rewriting the URLs to use query params instead of the path?

yaron2 (Member) commented Jul 20, 2023

> Is there something that can be done from the app side to remove the effect? For example, rewriting the URLs to use query params instead of the path?

It would help a lot if we could rule out high-cardinality metrics. If you can disable metrics altogether and report whether memory exhibits normal usage patterns, that would be great.
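
If disabling it cluster-wide is not practical, it should also be possible to test this per app with the sidecar annotation (a sketch based on the dapr.io/enable-metrics annotation; the app-id is a placeholder):

annotations:
  dapr.io/enabled: "true"
  dapr.io/app-id: "my-app"           # placeholder
  dapr.io/enable-metrics: "false"    # turn off this sidecar's metrics only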

Hsuwen (Author) commented Jul 21, 2023

I can provide all necessary information without affecting the normal use of the production environment. Please tell me exactly how to do it.

Hsuwen (Author) commented Jul 31, 2023

Regarding this issue, I found that the docs have been updated:
https://docs.dapr.io/operations/monitoring/metrics/metrics-overview/#high-cardinality-metrics

Several details are still not very clear:

  1. Which specific metrics are affected by this, since that determines which metrics I should set rules for
  2. How to filter by app-id in the metric.rules configuration

Regarding point 2:
If there are paths like /users/<id> in both app1 and app2, but only app1 needs the rule, how can we do this? Of course, it is possible to configure each app separately (sketched after the snippet below), but that increases complexity.
What I mean is, can we do this:

  metric:
    enabled: false
    rules:
    - name: dapr_runtime_service_invocation_req_sent_total
      labels:
      - name: method
        appid: app1
        regex:
          "users/": "users/.+"

dapr-bot (Collaborator) commented:

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

dapr-bot added the 'stale' (Issues and PRs without response) label on Sep 29, 2023
ItalyPaleAle (Contributor) commented:

Still active, should be fixed by #6723

dapr-bot removed the 'stale' (Issues and PRs without response) label on Sep 29, 2023
dapr-bot (Collaborator) commented:

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

dapr-bot added the 'stale' (Issues and PRs without response) label on Nov 28, 2023
dapr-bot (Collaborator) commented Dec 5, 2023

This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

dapr-bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 5, 2023