
v1.8 is ~3x slower than v1.7.x #34984

Closed
y3ti opened this issue Apr 12, 2024 · 16 comments
Assignees
Labels
bug · new (new issue not yet triaged) · v1.8 (issues, primarily bugs, reported against v1.8 releases) · waiting for reproduction (unable to reproduce issue without further information)

Comments

y3ti commented Apr 12, 2024

Terraform Version

$ terraform --version
Terraform v1.8.0
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v5.39.1
+ provider registry.terraform.io/hashicorp/local v2.4.1
+ provider registry.terraform.io/hashicorp/null v3.2.2
+ provider registry.terraform.io/hashicorp/random v3.6.0

Terraform Configuration Files

...terraform config...

Debug Output

n/a

Expected Behavior

$ time terraform validate
Success! The configuration is valid.


real	0m1.452s
user	0m2.379s
sys	0m0.318s

Actual Behavior

$ time terraform validate
Success! The configuration is valid.


real	0m9.720s
user	0m2.701s
sys	0m0.444s

Steps to Reproduce

  1. git clone https://github.com/philips-labs/terraform-aws-github-runner.git
  2. cd terraform-aws-github-runner/examples/multi-runner/
  3. time terraform validate
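To compare the two releases with less run-to-run noise, one option is a small harness that takes the best wall-clock time over several runs. This is only a sketch: terraform-1.7 and terraform-1.8 are hypothetical binary names, so point them at wherever the two versions are installed. Note that best-of-N hides cold-start effects, so for a cold-cache comparison time a single run immediately after init.

```python
import shutil
import subprocess
import time

def time_cmd(argv, runs=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(argv, check=True, capture_output=True)
        best = min(best, time.monotonic() - start)
    return best

if __name__ == "__main__":
    # Hypothetical binary names; adjust to your local copies of each release.
    for binary in ("terraform-1.7", "terraform-1.8"):
        if shutil.which(binary):
            print(f"{binary}: {time_cmd([binary, 'validate']):.2f}s")
```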

Additional Context

Terraform v1.8 is ~3x slower than v1.7 and consumes ~2.5x more memory.

References

No response

y3ti added the bug and new (new issue not yet triaged) labels Apr 12, 2024
jbardin (Member) commented Apr 12, 2024

Hi @y3ti,

Thanks for filing the issue! Can you provide some more detail around how you are measuring the change?
I'm not able to replicate this from the repo you provided: I get approximately 1.4s for validate from each version, both with around 200M of RSS (on darwin/arm64).

jbardin added the waiting for reproduction (unable to reproduce issue without further information) label Apr 12, 2024
mschfh (Contributor) commented Apr 12, 2024

I was able to reproduce a slowdown, although in my case v1.8.0 is only ~15% slower.

The binaries were built from git using go1.22.2:

1.7.5

real    0m1.996s
user    0m3.148s
sys     0m0.267s

1.8.0

real    0m2.361s
user    0m3.625s
sys     0m0.360s

Flamegraphs:
CPU: https://flamegraph.com/share/19e220fe-f8ca-11ee-8204-e6b38c1ccd74/19efa205-f8ca-11ee-8204-e6b38c1ccd74
mem: https://flamegraph.com/share/9163f4a2-f8ca-11ee-8204-e6b38c1ccd74/916f234c-f8ca-11ee-8204-e6b38c1ccd74

y3ti (Author) commented Apr 12, 2024

Setup:

$ cd terraform-aws-github-runner/examples/multi-runner/
$ wget https://releases.hashicorp.com/terraform/1.8.0/terraform_1.8.0_darwin_arm64.zip
$ unzip terraform_1.8.0_darwin_arm64.zip
$ mv terraform terraform-1.8

Clean up, re-initialize, and run the first validate:

$ rm -rf .terraform*
$ ./terraform-1.8 init --upgrade
$ time ./terraform-1.8 validate
Success! The configuration is valid.


real	0m10.016s
user	0m2.767s
sys	0m0.478s

When I ran it again:

$ time ./terraform-1.8 validate
Success! The configuration is valid.


real	0m1.691s
user	0m2.692s
sys	0m0.331s

If I clean up my workspace:

$ rm -rf .terraform*
$ ./terraform-1.8 init --upgrade
$ time ./terraform-1.8 validate
Success! The configuration is valid.


real	0m9.935s
user	0m2.776s
sys	0m0.474s

$ time ./terraform-1.8 validate
Success! The configuration is valid.


real	0m1.699s
user	0m2.714s
sys	0m0.334s

I see in the trace log that terraform spent 8 seconds here:

2024-04-12T14:32:06.643+0200 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.45.0/darwin_arm64/terraform-provider-aws_v5.45.0_x5 args=[".terraform/providers/registry.terraform.io/hashicorp/aws/5.45.0/darwin_arm64/terraform-provider-aws_v5.45.0_x5"]
2024-04-12T14:32:06.659+0200 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.45.0/darwin_arm64/terraform-provider-aws_v5.45.0_x5 pid=69783
2024-04-12T14:32:06.659+0200 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.45.0/darwin_arm64/terraform-provider-aws_v5.45.0_x5
2024-04-12T14:32:14.047+0200 [INFO]  provider.terraform-provider-aws_v5.45.0_x5: configuring server automatic mTLS: timestamp="2024-04-12T14:32:14.047+0200"
2024-04-12T14:32:14.054+0200 [DEBUG] provider.terraform-provider-aws_v5.45.0_x5: plugin address: address=/var/folders/td/r11b7b6n1bz143g7q2lwsg7m0000gn/T/plugin3735407397 network=unix timestamp="2024-04-12T14:32:14.054+0200"
2024-04-12T14:32:14.054+0200 [DEBUG] provider: using plugin: version=5
2024-04-12T14:32:14.060+0200 [TRACE] GRPCProvider: GetProviderSchema

jbardin (Member) commented Apr 12, 2024

Thanks for the profile data @mschfh, we will look into that!

@y3ti, that's interesting, because that delay seems to be waiting for the provider plugin process to start; Terraform itself isn't doing anything at that point. Can you check the output of terraform version to make sure you are using the exact same plugin versions in each case?

apparentlymart added the v1.8 (issues reported against v1.8 releases) label Apr 15, 2024
@joferrao

I can confirm this: our usual ~20-minute plan/apply hung for more than 2 hours several times before we realized it was due to the Terraform version.

y3ti (Author) commented Apr 16, 2024

@jbardin Yes, I used the exact same versions of the plugins in each case.

I attempted to reproduce the issue on GitHub Actions, but everything functioned as expected (no issues): https://github.com/y3ti/terraform-1.8-test/actions/runs/8663897264

In my situation, I manage several large Terraform configurations (4000-5000 resources). Here are a few screenshots from the execution of terraform/terragrunt validate on versions 1.7.5 and 1.8.0:

Real memory usage: 6.73 GB vs 8.92 GB
CPU time: 3:56.89 vs 11:20.95

Terraform 1.7.5

[screenshots attached]

Terraform 1.8.0

[screenshots attached]

joferrao commented Apr 16, 2024

[screenshot attached]
On our side it went from 20m, ~90% CPU, and a peak of 8 GB to the picture attached. I gave up after 2 hours of waiting. The debug output was frozen on something about "Running walker" or similar (it never reached the stage of making API calls to AWS to calculate diffs). Unfortunately I lost the logs, because it took me a while to realize it was due to the Terraform version change and I was testing other options.

y3ti (Author) commented Apr 16, 2024

I tried to replicate the issue using a publicly available GitHub repository with a Terraform configuration that mimics the same test case, but I couldn't reproduce the problem on a small Terraform codebase. I can reproduce the issue with larger configurations (1000+ Terraform data resources). Unfortunately, that code is not publicly available, so I can't share it. The screenshots I sent relate to this large Terraform configuration.

I enabled debug logs and noticed that Terraform is stuck at this point:

...
2024-04-16T15:50:49.657+0200 [DEBUG] Starting graph walk: walkValidate
2024-04-16T15:56:16.152+0200 [DEBUG] created provider logger: level=debug
...

jbardin (Member) commented Apr 16, 2024

Thanks @y3ti. The debug level isn't very useful unless you know what you're looking for. In a situation like this, seeing where the trace output pauses would probably get us closer. The only exact data we have so far is the pause after starting plugin, which unfortunately doesn't point to anything core is doing differently.

I've tried a handful of pathological configurations I have handy, and nothing is turning up any repeatable differences yet. Maybe we can find some common feature from the worst cases here, like all running under terragrunt (which I don't have experience with), local provider caches, test fixtures, etc.

y3ti (Author) commented Apr 17, 2024

@jbardin I have removed the Terragrunt dependency. I attempted to gather more data from the trace logs, but there was no difference (debug vs trace).

Terraform 1.7.5

$ terraform version
Terraform v1.7.5
on darwin_arm64
+ provider registry.terraform.io/cloudflare/cloudflare v3.35.0
+ provider registry.terraform.io/cloudposse/template v2.2.0
+ provider registry.terraform.io/cloudposse/utils v1.19.2
+ provider registry.terraform.io/hashicorp/archive v2.4.2
+ provider registry.terraform.io/hashicorp/aws v5.45.0
+ provider registry.terraform.io/hashicorp/http v3.4.2
+ provider registry.terraform.io/hashicorp/local v2.5.1
+ provider registry.terraform.io/hashicorp/null v3.2.2
+ provider registry.terraform.io/hashicorp/random v3.6.0
+ provider registry.terraform.io/hashicorp/time v0.11.1
+ provider registry.terraform.io/hashicorp/tls v4.0.5
$ time TF_LOG=trace terraform validate
2024-04-17T09:25:06.026+0200 [INFO]  Terraform version: 1.7.5
2024-04-17T09:25:06.026+0200 [DEBUG] using github.com/hashicorp/go-tfe v1.41.0
2024-04-17T09:25:06.026+0200 [DEBUG] using github.com/hashicorp/hcl/v2 v2.19.1
2024-04-17T09:25:06.026+0200 [DEBUG] using github.com/hashicorp/terraform-svchost v0.1.1
2024-04-17T09:25:06.026+0200 [DEBUG] using github.com/zclconf/go-cty v1.14.1
2024-04-17T09:25:06.026+0200 [INFO]  Go runtime version: go1.21.8
2024-04-17T09:25:06.026+0200 [INFO]  CLI args: []string{"terraform", "validate"}
2024-04-17T09:25:06.026+0200 [TRACE] Stdout is not a terminal
2024-04-17T09:25:06.026+0200 [TRACE] Stderr is not a terminal
2024-04-17T09:25:06.026+0200 [TRACE] Stdin is a terminal
2024-04-17T09:25:06.026+0200 [DEBUG] Attempting to open CLI config file: /Users/y3ti/.terraformrc
2024-04-17T09:25:06.026+0200 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2024-04-17T09:25:06.026+0200 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2024-04-17T09:25:06.026+0200 [DEBUG] ignoring non-existing provider search directory /Users/y3ti/.terraform.d/plugins
2024-04-17T09:25:06.026+0200 [DEBUG] ignoring non-existing provider search directory /Users/y3ti/Library/Application Support/io.terraform/plugins
2024-04-17T09:25:06.026+0200 [DEBUG] ignoring non-existing provider search directory /Library/Application Support/io.terraform/plugins
2024-04-17T09:25:06.026+0200 [INFO]  CLI command args: []string{"validate"}
2024-04-17T09:25:15.928+0200 [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
...
2024-04-17T09:27:16.930+0200 [DEBUG] Starting graph walk: walkValidate
2024-04-17T09:27:18.111+0200 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/null\"]": starting visit (*terraform.NodeApplyableProvider)
...
2024-04-17T09:28:22.281+0200 [TRACE] vertex "root": starting visit (*terraform.nodeCloseModule)
2024-04-17T09:28:22.281+0200 [TRACE] vertex "root": visit complete
...
real	3m21.692s
user	4m26.538s
sys	1m35.698s

Terraform 1.8.0

$ terraform version
Terraform v1.8.0
on darwin_arm64
+ provider registry.terraform.io/cloudflare/cloudflare v3.35.0
+ provider registry.terraform.io/cloudposse/template v2.2.0
+ provider registry.terraform.io/cloudposse/utils v1.19.2
+ provider registry.terraform.io/hashicorp/archive v2.4.2
+ provider registry.terraform.io/hashicorp/aws v5.45.0
+ provider registry.terraform.io/hashicorp/http v3.4.2
+ provider registry.terraform.io/hashicorp/local v2.5.1
+ provider registry.terraform.io/hashicorp/null v3.2.2
+ provider registry.terraform.io/hashicorp/random v3.6.0
+ provider registry.terraform.io/hashicorp/time v0.11.1
+ provider registry.terraform.io/hashicorp/tls v4.0.5
$ time TF_LOG=trace terraform validate
2024-04-17T09:12:48.224+0200 [INFO]  Terraform version: 1.8.0
2024-04-17T09:12:48.224+0200 [DEBUG] using github.com/hashicorp/go-tfe v1.41.0
2024-04-17T09:12:48.224+0200 [DEBUG] using github.com/hashicorp/hcl/v2 v2.20.0
2024-04-17T09:12:48.224+0200 [DEBUG] using github.com/hashicorp/terraform-svchost v0.1.1
2024-04-17T09:12:48.224+0200 [DEBUG] using github.com/zclconf/go-cty v1.14.3
2024-04-17T09:12:48.224+0200 [INFO]  Go runtime version: go1.22.1
2024-04-17T09:12:48.224+0200 [INFO]  CLI args: []string{"terraform", "validate"}
2024-04-17T09:12:48.224+0200 [TRACE] Stdout is not a terminal
2024-04-17T09:12:48.224+0200 [TRACE] Stderr is not a terminal
2024-04-17T09:12:48.224+0200 [TRACE] Stdin is a terminal
2024-04-17T09:12:48.224+0200 [DEBUG] Attempting to open CLI config file: /Users/y3ti/.terraformrc
2024-04-17T09:12:48.224+0200 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2024-04-17T09:12:48.224+0200 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2024-04-17T09:12:48.224+0200 [DEBUG] ignoring non-existing provider search directory /Users/y3ti/.terraform.d/plugins
2024-04-17T09:12:48.224+0200 [DEBUG] ignoring non-existing provider search directory /Users/y3ti/Library/Application Support/io.terraform/plugins
2024-04-17T09:12:48.224+0200 [DEBUG] ignoring non-existing provider search directory /Library/Application Support/io.terraform/plugins
2024-04-17T09:12:48.225+0200 [INFO]  CLI command args: []string{"validate"}
2024-04-17T09:12:59.098+0200 [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
...
2024-04-17T09:14:48.561+0200 [DEBUG] Starting graph walk: walkValidate
2024-04-17T09:19:56.946+0200 [TRACE] vertex "module.myname (expand)": starting visit (*terraform.nodeValidateModule)
...
2024-04-17T09:20:57.576+0200 [TRACE] vertex "root": visit complete
...
real	8m9.574s
user	10m58.318s
sys	1m36.452s

As shown above, there was a roughly 5-minute gap in logging between [DEBUG] Starting graph walk: walkValidate and the first [TRACE] vertex ... line.

P.S. I tested v1.8.1, but the problem still exists.

jbardin (Member) commented Apr 18, 2024

Thanks @y3ti, the pause at Starting graph walk: walkValidate is definitely a sign of something, but I'm not sure what yet. There is some new code between those points, but it reprocesses the graph in a way that should take no longer than the initial graph construction, and is normally much faster, so the fact that it makes up the bulk of the total time is confusing. Maybe there's an edge case I'm missing, but running it on my existing worst-case scenarios only adds a few seconds of delay.

Your first example paused in a different location, though that was a much smaller configuration to begin with. With this last config, is the location of the stall always the same? I'm trying to rule out an obscure runtime+OS interaction versus something Terraform itself is doing.

Another random idea: if you add the env variable GODEBUG=gctrace=1,schedtrace=1000, is there runtime output to stderr during that pause? I don't expect much useful info there, but it would at least tell us whether something is thrashing the garbage collector or the entire runtime is hanging for some reason.
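Once gctrace output is captured, the gc lines can be reduced to a heap-growth timeline. Per the Go runtime's documented gctrace format, the A->B->C MB triple is heap size at GC start, heap size at GC end, and live heap after the cycle. A rough parser sketch:

```python
import re

# Example input line:
# "gc 71 @118.931s 1%: ... ms cpu, 2758->2773->2283 MB, 3239 MB goal, ..."
GC = re.compile(r"gc \d+ @([\d.]+)s.*?(\d+)->(\d+)->(\d+) MB")

def live_heap(lines):
    """Return (elapsed_seconds, live_heap_mb) per GC cycle from
    GODEBUG=gctrace=1 stderr output."""
    points = []
    for line in lines:
        m = GC.search(line)
        if m:
            points.append((float(m.group(1)), int(m.group(4))))
    return points
```

Plotting the resulting series makes it easy to distinguish a GC thrashing pattern from steady allocation growth during the pause.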

y3ti (Author) commented Apr 18, 2024

Let's disregard the example from my initial comment. I used an open-source repository to provide a sample configuration you could reproduce on your laptop, but as previously mentioned, when I attempted to reproduce the issue on GitHub Actions, that example configuration was not extensive enough and I was unable to replicate the problem.

Let's focus on the example from my previous three comments.

As you suggested, I ran terraform validate with GODEBUG=gctrace=1,schedtrace=1000.

Here is the output:

$ time GODEBUG=gctrace=1,schedtrace=1000 TF_LOG=trace terraform validate
SCHED 0ms: gomaxprocs=10 idleprocs=8 threads=5 spinningthreads=1 needspinning=0 idlethreads=0 runqueue=0 [1 0 0 0 0 0 0 0 0 0]
gc 1 @0.006s 3%: 0.008+1.3+0.008 ms clock, 0.089+0.60/1.5/0+0.085 ms cpu, 3->4->2 MB, 4 MB goal, 0 MB stacks, 1 MB globals, 10 P
gc 2 @0.011s 7%: 0.047+1.1+0.47 ms clock, 0.47+0.64/2.1/0.87+4.7 ms cpu, 5->5->3 MB, 5 MB goal, 0 MB stacks, 1 MB globals, 10 P
gc 3 @0.019s 7%: 0.058+1.6+0.15 ms clock, 0.58+0.10/3.0/3.8+1.5 ms cpu, 7->7->5 MB, 8 MB goal, 0 MB stacks, 1 MB globals, 10 P
2024-04-18T17:33:32.809+0200 [INFO]  Terraform version: 1.8.0
...
2024-04-18T17:35:29.821+0200 [DEBUG] Starting graph walk: walkValidate
SCHED 117792ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 118793ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
gc 71 @118.931s 1%: 0.10+307+0.015 ms clock, 1.0+0.15/921/1806+0.15 ms cpu, 2758->2773->2283 MB, 3239 MB goal, 0 MB stacks, 1 MB globals, 10 P
SCHED 119802ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 120812ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 121823ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 122824ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 123834ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 124853ms: gomaxprocs=10 idleprocs=0 threads=18 spinningthreads=0 needspinning=1 idlethreads=2 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
gc 72 @124.419s 1%: 0.082+534+0.015 ms clock, 0.82+0.50/1602/3140+0.15 ms cpu, 4421->4537->3226 MB, 4567 MB goal, 0 MB stacks, 1 MB globals, 10 P
SCHED 125861ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 126872ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 127884ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 128897ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 129902ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 130907ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 131909ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
gc 73 @131.995s 1%: 0.067+843+0.012 ms clock, 0.67+0.049/2528/5048+0.12 ms cpu, 6065->6211->4438 MB, 6453 MB goal, 0 MB stacks, 1 MB globals, 10 P
SCHED 132915ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=0 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 133920ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 134921ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 135927ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 136927ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 137935ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 138939ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 139946ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 140957ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 141968ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 142976ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 143990ms: gomaxprocs=10 idleprocs=0 threads=18 spinningthreads=0 needspinning=1 idlethreads=2 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
gc 74 @143.150s 1%: 0.13+1206+0.029 ms clock, 1.3+0.47/3614/7224+0.29 ms cpu, 8299->8498->6040 MB, 8878 MB goal, 0 MB stacks, 1 MB globals, 10 P
SCHED 144994ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 146002ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 147004ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 148011ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 149011ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 150017ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 151028ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 152032ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 153040ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 154046ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 155051ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 156051ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 157058ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 158059ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 159070ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 160073ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 161077ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 162088ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 163100ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 164103ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 165104ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 166111ms: gomaxprocs=10 idleprocs=0 threads=18 spinningthreads=1 needspinning=0 idlethreads=2 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
...
gc 82 @416.502s 2%: 0.60+2736+0.019 ms clock, 6.0+0.89/8176/16254+0.19 ms cpu, 13244->13471->7008 MB, 14030 MB goal, 0 MB stacks, 1 MB globals, 10 P
SCHED 419704ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 420715ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 421716ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 422716ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 423725ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 424726ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 425732ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 426732ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 427734ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 428734ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 429735ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 430736ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 431747ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 432752ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 433757ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 434761ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 435774ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=1 [0 0 0 0 0 0 0 0 0 0]
SCHED 436777ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 437778ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 438785ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 439795ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 440799ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 441800ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 442806ms: gomaxprocs=10 idleprocs=8 threads=18 spinningthreads=1 needspinning=0 idlethreads=10 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
SCHED 443806ms: gomaxprocs=10 idleprocs=9 threads=18 spinningthreads=0 needspinning=0 idlethreads=11 runqueue=0 [0 0 0 0 0 0 0 0 0 0]
2024-04-18T17:40:57.478+0200 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/aws\"].eu_central_1": starting visit (*terraform.NodeApplyableProvider)
...
2024-04-18T17:41:48.423+0200 [TRACE] vertex "root": visit complete
...
Success! The configuration is valid, but there were some
validation warnings as shown above.


real	8m15.830s
user	10m55.163s
sys	1m58.489s

jbardin (Member) commented Apr 18, 2024

Thank you @y3ti! That at least makes it clear that something is going on in there: the heap is growing throughout that pause. If anyone has a public reproduction case it would help, but knowing that v1.8 is allocating significantly more memory around that time is very useful!

jbardin self-assigned this Apr 22, 2024
y3ti (Author) commented Apr 24, 2024

@jbardin I have just tested 1.8.2. It works very well - even better than 1.7.5!

$ time terragrunt validate for my configuration:

1.7.5 - 3m21.692s
1.8.0 - 8m9.574s
1.8.2 - 1m42.493s

Memory usage is also lower:

image image

Thank you very much!

y3ti closed this as completed Apr 24, 2024
@fpeixotoj

> @jbardin I have just tested 1.8.2. It works very well - even better than 1.7.5!
>
> $ time terragrunt validate for my configuration:
>
> 1.7.5 - 3m21.692s
> 1.8.0 - 8m9.574s
> 1.8.2 - 1m42.493s
>
> Memory usage is also lower:
>
> [screenshots attached]
>
> Thank you very much!

For me, terraform plan still takes a very long time with 1.8.2.

jbardin (Member) commented Apr 25, 2024

@fpeixotoj, if that is the case, you must be experiencing a different issue. If you have data showing the difference, and preferably a way to reproduce the problem, please file a new issue.

Thanks!


6 participants