
Tempo cluster sizing / capacity planning #1540

Open
pavolloffay opened this issue Jul 1, 2022 · 12 comments
Labels
keepalive: Label to exempt Issues / PRs from stale workflow
type/docs: Improvements or additions to documentation

Comments

@pavolloffay
Contributor

Is your feature request related to a problem? Please describe.

I would like to know the approximate Tempo cluster size and the resources it will need for a given ingestion rate and retention: number of spans per unit of time, average span size in bytes, and retention of N days (I may be missing some input parameters).

Such a document would be useful when evaluating Tempo's cost or doing capacity planning.

Describe the solution you'd like

Documentation on Tempo cluster sizing.

Describe alternatives you've considered

Running load tests against Tempo.

Additional context

@mdisibio
Contributor

mdisibio commented Jul 7, 2022

Hi, thanks for raising this issue, it's also something we've been thinking about. There are several different forms this tool could take, and some work to identify the important variables and formulas, definitely including the ones you mentioned. A document with approximate calculations is ok, but there is also a need for a more sophisticated and accurate tool, in Tempo and the other databases. See Mimir's discussion for reference. Tempo would likely adopt the same approach.

For now I can share some metrics from our internal clusters:

  • Config:
    • Deployment mode: distributed (microservices), assuming a workload large enough to require it
    • replication factor: 3
    • block format: v2, lz4-1M
  • CPU
    • 0.7 CPU cores required per 1 MB/s ingested, with a 215-byte average span size.
    • The CPU split is roughly:
      • Distributor: 27%
      • Ingester: 38%
      • Compactor: 16%
      • Querier: 5%
      • Metrics-generator: 7%
      • Other (cache, proxy, etc.): 7%
  • Memory
    • ~2 GB required per 1 MB/s ingested, not counting cache (memcached, redis)
    • The mem split is roughly:
      • Distributor: 12%
      • Ingester: 50%
      • Compactor: 18%
      • Querier: 16%
      • Metrics-generator: 3%
      • Other: 1%
  • Storage
    • ~14.6 GB of storage per day of retention per 1 MB/s ingested.
    • This is a ~6x compression ratio (~84 GB ingested per day).
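As a rough illustration, the figures above can be turned into a back-of-the-envelope estimator. This is a minimal Python sketch, not a Tempo tool: it assumes resources scale linearly with ingest rate, that the deployment resembles the configuration listed (replication factor 3, v2 blocks, lz4-1M), and that "MB" means 2^20 bytes, which matches the ~84 GB/day figure. All function and constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Ratios reported in this thread for internal clusters (RF 3, v2 blocks, lz4-1M).
# Linear scaling with ingest rate is an assumption of this sketch.
CPU_CORES_PER_MBPS = 0.7
MEMORY_GB_PER_MBPS = 2.0            # excludes memcached / redis
STORAGE_GB_PER_DAY_PER_MBPS = 14.6

CPU_SPLIT = {"distributor": 0.27, "ingester": 0.38, "compactor": 0.16,
             "querier": 0.05, "metrics-generator": 0.07, "other": 0.07}
MEMORY_SPLIT = {"distributor": 0.12, "ingester": 0.50, "compactor": 0.18,
                "querier": 0.16, "metrics-generator": 0.03, "other": 0.01}


def ingest_mb_per_s(spans_per_s: float, avg_span_bytes: float) -> float:
    """Convert a span rate to MB/s, assuming 1 MB = 2**20 bytes."""
    return spans_per_s * avg_span_bytes / 2**20


@dataclass
class Estimate:
    cpu_cores: float
    memory_gb: float
    storage_gb: float
    cpu_by_component: Dict[str, float]
    memory_by_component: Dict[str, float]


def estimate(mb_per_s: float, retention_days: float) -> Estimate:
    """Scale the per-1 MB/s ratios to the given ingest rate and retention."""
    cpu = CPU_CORES_PER_MBPS * mb_per_s
    mem = MEMORY_GB_PER_MBPS * mb_per_s
    return Estimate(
        cpu_cores=cpu,
        memory_gb=mem,
        storage_gb=STORAGE_GB_PER_DAY_PER_MBPS * mb_per_s * retention_days,
        cpu_by_component={k: cpu * share for k, share in CPU_SPLIT.items()},
        memory_by_component={k: mem * share for k, share in MEMORY_SPLIT.items()},
    )
```

For example, `estimate(ingest_mb_per_s(50_000, 215), retention_days=14)` would size a cluster receiving 50,000 spans/s at the 215-byte average span size. Since the numbers are expected to change with parquet blocks, the ratios are kept as constants that can be updated independently.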

I'd expect these requirements to change over the next few releases as we add support for parquet blocks, likely increasing at first, but then stabilizing as we improve things.

@pavolloffay
Contributor Author

Could you please describe what queries the test was running? Does the lookback or time range affect query resources? Did the query path use serverless functions or just scaled-out queriers?

@pavolloffay
Contributor Author

Does retention affect resource requirements at all?

@mdisibio
Copy link
Contributor

mdisibio commented Jul 8, 2022

Could you please describe what queries the test was running? Does the lookback or time range affect query resources? Did the query path use serverless functions or just scaled-out queriers?

This was gathered from our own clusters, which run real workloads with a mixture of trace lookups and searches, lookbacks of 1 or 24 hours, and both querier pods and serverless functions. Total querier resources are a function of the data volume involved in a search. All queries are sharded into fixed-size sub-jobs, so a 2x time range will scan 2x the data, as will a cluster with 2x the volume over the same time range. Scaling up pods or functions keeps latency down by executing more sub-jobs in parallel.
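The sharding behaviour described above can be modelled with two ceiling divisions. This Python sketch assumes a hypothetical fixed sub-job size (`job_bytes`) and a constant per-job duration (`seconds_per_job`); it only illustrates why a 2x time range doubles the work and why more parallel workers reduce latency, and does not reproduce Tempo's actual query scheduler.

```python
import math


def search_jobs(bytes_in_range: int, job_bytes: int) -> int:
    """Number of fixed-size sub-jobs a search is split into.

    `job_bytes` is a hypothetical sub-job size; the job count grows
    linearly with the data volume covered by the search.
    """
    return math.ceil(bytes_in_range / job_bytes)


def estimated_latency_s(jobs: int, parallelism: int, seconds_per_job: float) -> float:
    """Jobs run in waves of `parallelism` workers (pods or functions).

    Assumes every sub-job takes the same time, so latency is
    proportional to the number of waves.
    """
    waves = math.ceil(jobs / parallelism)
    return waves * seconds_per_job
```

Doubling the time range doubles `search_jobs`, while doubling `parallelism` halves the number of waves; total CPU spent stays proportional to the job count either way.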

Does retention affect resource requirements at all?

Retention affects how many blocks exist, which mostly impacts latency and object store requests. Tempo reads a bloom filter per block, so 2x retention will issue 2x reads to the object store. Latency can be controlled by scaling up queriers to check more bloom filters in parallel (and, more recently, by making use of #1388). A larger block list also slightly increases memory usage, since block metadata, including name, size, and location, is kept in memory.
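The retention effect on trace lookups can be sketched the same way. This Python sketch assumes, hypothetically, that the steady-state block count grows linearly with retention at a rate of `blocks_per_day`, and that checking each block's bloom filter costs one object-store read, as described above.

```python
import math


def bloom_reads_per_lookup(retention_days: float, blocks_per_day: float) -> int:
    """One bloom filter read per block; block count is assumed to scale
    linearly with retention (`blocks_per_day` is a hypothetical input)."""
    return math.ceil(retention_days * blocks_per_day)


def lookup_rounds(reads: int, parallel_readers: int) -> int:
    """Reads are spread across queriers; each round checks
    `parallel_readers` bloom filters concurrently."""
    return math.ceil(reads / parallel_readers)
```

Under these assumptions, doubling retention doubles the object-store reads per lookup, and doubling the number of parallel readers brings the number of rounds, and so roughly the latency, back down.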

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.

@github-actions bot added the stale label Nov 14, 2022
@github-actions bot closed this as not planned Nov 30, 2022
@pavolloffay
Contributor Author

@mdisibio could you re-open this ticket and perhaps document the resource requirements in the docs?

We have used the values from this ticket in the Tempo Kubernetes operator, and we would like to keep them up to date if storage or other components change.

@mdisibio
Contributor

mdisibio commented Dec 2, 2022

Got it, reopening. Expecting the requirements to change in Tempo 2.0 with TraceQL and full parquet, will gather new numbers then.

@mdisibio reopened this Dec 2, 2022
@mdisibio added the keepalive label and removed the stale label Dec 2, 2022
@mdisibio added this to the v2.0 milestone Dec 2, 2022
@joe-elliott added the type/docs label Dec 2, 2022
@joe-elliott
Member

This will not block Tempo 2.0 from releasing so I'm moving it out of the v2.0 milestone.

@joe-elliott removed this from the v2.0 milestone Jan 24, 2023
@joe-elliott
Copy link
Member

Heads up to @electron0zero and @mapno that this issue exists. After you do your research please publish some guidelines for the community and close out this issue.

@knylander-grafana
Copy link
Contributor

I'm happy to add this information to the documentation when it's ready.

@knylander-grafana
Copy link
Contributor

See also #2836

@Jaland
Copy link

Jaland commented Oct 2, 2023

I have someone installing the operator on OpenShift, and we kept seeing an OOM error on our tempo-tracing-stack-query-front pod. This was confusing because the pod was only using about half of the memory allocated to it before going into CrashLoopBackOff.

After a little investigation, we noticed that the pod consists of two containers (tempo and tempo-query). The tempo-query container seems to do 90% of the work and consume most of the memory, but the memory is split evenly between the two containers, so we OOM after using only half of the pod's memory, as mentioned above.

It would probably be a better use of resources to give the tempo container a relatively low fixed amount, since it does not seem to use much (for example, a 2% cut), and leave the rest of the memory to tempo-query.

Projects
Status: Todo
Development

No branches or pull requests

5 participants