feat: Add ability to use custom terraform version by brittandeyoung · Pull Request #2942 · dagger/dagger

brittandeyoung · 2022-08-15T15:40:49Z

Updated all terraform actions and base "run" action to support specifying a custom version of terraform cli.

Updated all terraform actions and base "run" action to support specifying a custom version of terraform cli. Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

We should always default to a specific version of Terraform, otherwise users of this package may be surprised with new behaviour introduced in a future latest version. Requiring everyone that uses this package to default to a version sounds like a lot of busy work. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

gerhard

This looks good to me - ready to merge. WDYT @marcosnils @jpadams?

Does it still work as expected for you @brittandeyoung after my changes?

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung · 2022-08-16T12:20:09Z

@gerhard I am confused as to why there needs to be a default version specified in run when there is a default version specified in the image? I was also unaware you could pass in nested values without declaring them =) so thats cool! saves on repeated definitions. Please take a look at how in run I am conditionally passing in the version if one is set, then in the image the default value is used if a version is not provided. Doesn't this accomplish the same thing without declaring the default value twice?

@marcosnils Good catch! I have moved the input for version to the #Image definition. Please let me know if this resolves your concerns.

brittandeyoung · 2022-08-16T16:15:34Z

Additionally I would like to get thoughts on making the default version latest. I bring this up as this may be a great way to keep the Actions up to date with the latest version of the terraform cli. As long as this is documented, it is usually my experience that users will specify the version of terraform they want if they do not want to use latest. Otherwise developers will have to specify latest or a newer version of the specified default if they need functionality that is in a newer version that the default defined in the image action.

@gerhard @marcosnils

Having this as latest would prevent developers from needing to specify version if they are okay with using the latest version and okay with getting auto updates. Then hard coding to a specific version would be more of a conscience effort.

marcosnils · 2022-08-17T00:49:54Z

@marcosnils Good catch! I have moved the input for version to the #Image definition. Please let me know if this resolves your concerns.

thx for updating this 🙌. I just realized that the reason that we didn't catch this in the tests is because we're not currently testing it. WDYT if we add the following action to also test the docker.#Image flow?

				imageWorkflow: {
			// Set a terraform version for testing specific version of terraform
			_image: docker.#Pull & {
				source: "hashicorp/terraform"
			}

			init: terraform.#Init & {
				container: #input: _image.output
				source: tfSource.output
			}

			workspaceNew: terraform.#Workspace & {
				container: #input: _image.output
				source: init.output
				subCmd: "new"
				name:   "TEST_WORKSPACE"
			}

			plan: terraform.#Plan & {
				container: #input: _image.output
				source: init.output
			}

			apply: terraform.#Apply & {
				container: #input: _image.output
				always: true
				source: plan.output
			}

			verify: #AssertFileRegex & {
				input: apply.output
				path:  "terraform.tfstate"
				regex: string | *"1\\.2\\.7"
			}

			destroy: terraform.#Destroy & {
				container: #input: _image.output
				source: apply.output
				container: #input: _image.output
			}
		}

Having this as latest would prevent developers from needing to specify version if they are okay with using the latest version and okay with getting auto updates. Then hard coding to a specific version would be more of a conscience effort.

I agree that having latest is very convenient for users that prefer to have the latest features without having to hardcode a version in their plans. Having said that, I'd open separate issue to discuss about this since currently a lot of our actions follow the same pattern of having a _#DefaultVersion definition and there's a reason that it was done in the first place. If we decide to change this, we should make the effort to make it work equally for all the actions and address it on a single PR if possible.

brittandeyoung · 2022-08-17T12:24:22Z

@marcosnils New test workflow has been added as suggested. I have also opened #2951 to start the discussion of using latest for the _#DefaultVersion.

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung · 2022-08-17T13:31:35Z

The Universe Tests were exceeding the git hub actions timeout value specified, So I increased it.

grouville · 2022-08-17T13:43:28Z

The Universe Tests were exceeding the git hub actions timeout value specified, So I increased it.

They shouldn't: in general, it times out when bats fails unproperly and a zombie continues to live whenever it shouldn't.
It can happen when a test is broken by the import path. Whenever you test it locally, how long does it take ?

brittandeyoung · 2022-08-17T13:45:51Z

@grouville Unfortunately I am unable to run all the tests locally as I use a M1 Mac. There are tests that rely on docker images that do not have an arm based image. I have been relying on targeting the specific tests I am modifying locally.

brittandeyoung · 2022-08-17T14:00:41Z

@grouville I have seen the Universe test run for close to 23 minutes in past PRs. With tests being added, it would not surprise me if it takes longer than 30 min to run in a PR. This is why I bumped the timeout to test on this PR.

gerhard · 2022-08-17T14:32:01Z

Thank you @brittandeyoung for sticking with this!

I have a few local changes which I am yet to push, but for now it's more important to address the existing comments in reverse order, starting with the most recent one first (bottom to top). Here comes the first one:

@brittandeyoung I have seen the Universe test run for close to 23 minutes in past PRs. With tests being added, it would not surprise me if it takes longer than 30 min to run in a PR. This is why I bumped the timeout to test on this PR.

This is a symptom of the shared networking available on the free GitHub Actions tier combined with the variable performance of the GitHub Actions Cache.

Locally, these tests consistently take less than 2 minutes to run. Here is my last run:

Instead of increasing the timeout, a better fix is to disable the GitHub Actions Caching. This will result in more consistent durations, which currently are around the 15 mins mark. This fix is already in main. This has more details: #2941 (comment). FTR:

My recommendation is to rebase this branch and drop the timeout increase commit.

Moving onto the next comment.

Updated all terraform actions and base "run" action to support specifying a custom version of terraform cli. Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

We should always default to a specific version of Terraform, otherwise users of this package may be surprised with new behaviour introduced in a future latest version. Requiring everyone that uses this package to default to a version sounds like a lot of busy work. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

…young/dagger into feat/terraform-version Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

This reverts commit a782b7d. Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung · 2022-08-17T14:39:26Z

@gerhard I have pulled the latest from main and performed a rebase.

gerhard · 2022-08-17T14:42:06Z

@brittandeyoung Unfortunately I am unable to run all the tests locally as I use a M1 Mac. There are tests that rely on docker images that do not have an arm based image. I have been relying on targeting the specific tests I am modifying locally.

I do not use Docker for Mac since this is 8x slower than the Linux equivalent. FTR: https://docs.dagger.io/1200/local-dev

I personally run Docker on Linux & use DOCKER_HOST=tcp://100.81.87.121:2375 across all my Macs to connect to it via Tailscale. Technically, I could also use DOCKER_HOST=tcp://192.168.88.22:2375 since it's a physical host on the LAN, but the Tailscale IP ensures that this works consistently even when I'm remote, on the M1 Mac laptop.

Knowing the 8x slowdown, would it still be worth to you for us to invest time in getting these tests to work on Docker for Mac ARM?

gerhard · 2022-08-17T14:47:21Z

@brittandeyoung I have also opened #2951 to start the discussion of using latest for the _#DefaultVersion.

Thank you very much! That was going to be my suggestion too since this is slowly snowballing and I am keen on us getting it merged.

brittandeyoung · 2022-08-17T14:47:33Z

@gerhard Thanks for the tips on running it remotely on a linux host. I do not feel these tests need to be "fixed". The examples are the powershell action. The docker image simply does not have a arm64 option. This will likely be the case for a long time. I will work on getting a dev environment using the remote DOCKER_HOST as suggested.

gerhard · 2022-08-17T15:00:29Z

@brittandeyoung Thanks for the tips on running it remotely on a linux host. I will work on getting a dev environment using the remote DOCKER_HOST as suggested.

🙌🏻

@brittandeyoung The examples are the powershell action. The docker image simply does not have a arm64 option. This will likely be the case for a long time.

The simplest thing that I can think of is to disable those tests if the platform is not x86_64. WDYT?

brittandeyoung · 2022-08-17T15:03:21Z

The simplest thing that I can think of is to disable those tests if the platform is not x86_64. WDYT?

I think that might help for tests unrelated. But if the user is modifying this action, I think it would be better for them to get an error when running tests locally then it simply not running them.

Maybe there should be some documentation indicating potential issues running tests on ARM platforms instead?

gerhard · 2022-08-17T15:11:58Z

I think that might help for tests unrelated. But if the user is modifying this action, I think it would be better for them to get an error when running tests locally then it simply not running them.

Good point. I believe that is the case currently.

Maybe there should be some documentation indicating potential issues running tests on ARM platforms instead?

I would much rather we made these tests work for ARM. I don't know if we can add a Powershell arm64 image, but that sounds most reasonable.

gerhard · 2022-08-17T15:15:30Z

It looks like bats got "stuck": https://github.com/dagger/dagger/runs/7881156518?check_suite_focus=true#step:6:1097

Running this locally now, with an empty cache.

gerhard · 2022-08-17T15:22:32Z

Passes locally (Linux amd64) on first try:

Re-triggered the GitHub Action for now - not holding my breath.

This is my last update in this time-box. I might be able to resume later on today, otherwise tomorrow.

My takeaway from this session is that my task to provision a dedicated Docker Engine / GitHub Actions Runner just jumped to the top of my list. These tests should finish in less than 2 minutes. We are wasting a lot of time by not having this. Leaving it here for now. fyi @aluzzardi @samalba @shykes

gerhard · 2022-08-17T23:04:16Z

I continued digging into the make universe-test timeouts, and this is what I discovered on a fresh Debian 11 host:

Notice that we blow past the 4GB limit, start swapping, and then everything slows down to a crawl. While this host is limited to 4GB and the free GitHub Actions Runner has 7GB, I suspect that we are seeing the same issue in GitHub Actions.

FTR: my local Linux host where this completes in under 2 mins has 64GB of DDR4 and 8 CPUs (16 vCPUs). The fast NVMe drive (7GB/s & 500k IOPS) makes a big difference too.

Next step is to run make universe-test in a host limited to 8GB, which is closer to the GitHub Actions Runner spec and see what happens then. My assumption is that the same thing. Will continue tomorrow.

gerhard · 2022-08-18T07:41:34Z

32287s (9h) later, the integration tests are still running, and they are unlikely to finish since we are hitting some gRPC timeouts within BuildKit:

The above confirms that bumping the timeout to 60m was unlikely to work.

Before I bump the RAM to 8GB, I am curious what happens if I lower the number of parallel bats jobs.

Because we currently run 4 in parallel (meaning 4 dagger instances), I suspect that this makes this CUE issue cue-lang/cue#1795 far worse. Based on everything I know, I am fairly certain that we are hitting this specific issue. 👨🏻‍⚕️

gerhard · 2022-08-18T10:55:14Z

Even with 2 bats jobs, we end up with 3 dagger instances which do not fit in 4GB based on the work that they need to do. Notice the RSS taken by dagger processes (ignore the forks) during the test:

And after make universe-test is cancelled:

Since the GitHub Actions Runner has 7GB, I am going to bump my test host to 8GB and try again.

Instead of adopting multiple approaches, stick to a single one. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk> Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

When `dagger do test integration universe` runs, each job can use up to 1GB of RAM. There is an extra top-level job that spikes up to 3GB of RAM. Combined, they exceed the 7GB free GitHub Actions Runner limit. When this happens, the host starts swapping, everything slows down to a crawl, and eventually the 30 minutes action limit gets exceeded and the job gets cancelled. The best approach right now seems to not parallelize by default. The total time saving is insignificant - 20s for a >600s total in my test. dagger#2942 has more details (including lots of screenshots). Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

gerhard · 2022-08-18T17:57:43Z

Not running multiple universe tests in parallel should keep the memory utilisation within the free GitHub Actions runner resources.

Each job can use up to 1GB of RAM, and the top-level one that orchestrates all the jobs can use up to 3GB of RAM. This hits the 7GB limit, the runner starts swapping, everything slows down to a crawl, and eventually the 30 mins timeout gets hit. This is what that looks like in one picture:

While the last commit makes the Experimental Universe Tests slower (9 vs 7 mins), as well as the Integration Tests slower (11 vs 7 mins), the Universe ones should now be more reliable going forward. They also seem to be slightly faster (14 vs 15 mins). This speed-up might be incidental.

There is a lot more that I discovered since I started looking at this. For now, this improves things enough to get this change merged. I intend to continue digging into this test-related / GitHub Actions issue next week. cc @aluzzardi @samalba @shykes

Thank you @brittandeyoung for your contribution & sticking with us as we pushed this through. If we did it right, your next contribution should be easier 😎

Leaving another screenshot for the Universe Tests issue here:

gerhard

👍 🚀

brittandeyoung requested review from helderco and jpadams as code owners August 15, 2022 15:40

brittandeyoung force-pushed the feat/terraform-version branch 4 times, most recently from 6cd90c9 to 072ca8f Compare August 15, 2022 16:17

feat: Add ability to use custom terraform version

6d9fc57

Updated all terraform actions and base "run" action to support specifying a custom version of terraform cli. Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung force-pushed the feat/terraform-version branch from 072ca8f to 6d9fc57 Compare August 15, 2022 16:45

gerhard force-pushed the feat/terraform-version branch from 58c0e14 to acddbba Compare August 15, 2022 22:49

gerhard self-requested a review August 15, 2022 23:04

gerhard approved these changes Aug 15, 2022

View reviewed changes

marcosnils reviewed Aug 16, 2022

View reviewed changes

Comment thread pkg/universe.dagger.io/alpha/terraform/run.cue Outdated

fix: move version to #Image to prevent breaking custom docker.#Image

20e8536

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung force-pushed the feat/terraform-version branch from 55a36a5 to 20e8536 Compare August 16, 2022 12:20

brittandeyoung force-pushed the feat/terraform-version branch from c9cd7ef to 2e11890 Compare August 17, 2022 12:14

tests: Added test workflow for terraform custom docker image

c6e4d1f

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung force-pushed the feat/terraform-version branch from 2e11890 to c6e4d1f Compare August 17, 2022 12:53

fix: Increase Universe action timeout

b7b3011

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung requested a review from a team as a code owner August 17, 2022 13:31

feat: Add ability to use custom terraform version

87f9c1a

Updated all terraform actions and base "run" action to support specifying a custom version of terraform cli. Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

gerhard and others added 6 commits August 17, 2022 10:34

fix: move version to #Image to prevent breaking custom docker.#Image

af9406a

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

tests: Added test workflow for terraform custom docker image

17fe611

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

fix: Increase Universe action timeout

a782b7d

Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

Merge branch 'feat/terraform-version' of https://github.com/brittande…

14b494b

…young/dagger into feat/terraform-version Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

Revert "fix: Increase Universe action timeout"

7b4b7c0

This reverts commit a782b7d. Signed-off-by: Brittan DeYoung <32572259+brittandeyoung@users.noreply.github.com>

brittandeyoung force-pushed the feat/terraform-version branch from 428ee1e to 7b4b7c0 Compare August 17, 2022 14:38

gerhard mentioned this pull request Aug 17, 2022

Missing arm64 version for Docker image PowerShell/PowerShell#17915

Closed

gerhard added 2 commits August 18, 2022 18:36

chore: Use a consistent approach to Terraform tests

fee5c47

Instead of adopting multiple approaches, stick to a single one. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk> Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>

gerhard force-pushed the feat/terraform-version branch from 9f46ab9 to 23530f3 Compare August 18, 2022 17:36

gerhard approved these changes Aug 18, 2022

View reviewed changes

gerhard merged commit ee657b7 into dagger:main Aug 18, 2022

gerhard mentioned this pull request Aug 25, 2022

Add ability to prioritize GitHub Action runners actions/runner#1665

Open

gerhard mentioned this pull request Oct 11, 2022

🐞 insane memory usage #3307

Closed

Conversation

brittandeyoung commented Aug 15, 2022

Uh oh!

gerhard left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

brittandeyoung commented Aug 16, 2022

Uh oh!

brittandeyoung commented Aug 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marcosnils commented Aug 17, 2022

Uh oh!

brittandeyoung commented Aug 17, 2022

Uh oh!

brittandeyoung commented Aug 17, 2022

Uh oh!

grouville commented Aug 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brittandeyoung commented Aug 17, 2022

Uh oh!

brittandeyoung commented Aug 17, 2022

Uh oh!

gerhard commented Aug 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brittandeyoung commented Aug 17, 2022

Uh oh!

gerhard commented Aug 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gerhard commented Aug 17, 2022

Uh oh!

brittandeyoung commented Aug 17, 2022

Uh oh!

gerhard commented Aug 17, 2022

Uh oh!

brittandeyoung commented Aug 17, 2022

Uh oh!

gerhard commented Aug 17, 2022

Uh oh!

gerhard commented Aug 17, 2022

Uh oh!

gerhard commented Aug 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gerhard commented Aug 17, 2022

Uh oh!

gerhard commented Aug 18, 2022

Uh oh!

gerhard commented Aug 18, 2022

Uh oh!

gerhard commented Aug 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gerhard left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gerhard left a comment •

edited

Loading

brittandeyoung commented Aug 16, 2022 •

edited

Loading

grouville commented Aug 17, 2022 •

edited

Loading

gerhard commented Aug 17, 2022 •

edited

Loading

gerhard commented Aug 17, 2022 •

edited

Loading

gerhard commented Aug 17, 2022 •

edited

Loading

gerhard commented Aug 18, 2022 •

edited

Loading