Skip to content
This repository has been archived by the owner on Jul 25, 2024. It is now read-only.

fix(terraform): fix tf apply failures #305

Merged
merged 9 commits into from
Feb 3, 2022
Merged

fix(terraform): fix tf apply failures #305

merged 9 commits into from
Feb 3, 2022

Conversation

ace-n
Copy link
Contributor

@ace-n ace-n commented Jan 11, 2022

Fixes issues discovered during work on #304

# Need to enable Artifact Registry service before repository creation.
google_project_service.artifactregistry
# Need to ensure Artifact Registry API is enabled first.
time_sleep.wait_for_artifactregistry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide more info here? I used to have a retry for artifact registry because the beta depends-upon wasn't working, but the last I checked, it had been fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the comment above (lines 61 and 62) enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking around for an explanation or at least a bug filed against Terraform. the depends_on function is supposed to cover this, so I think we need to explain why it doesn't and include some kind of note about updating it when it does work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the PR that Adam did before to fix this problem: #165

It's possible that the depends_on cloudbuild that you added in the other block might be enough to (weirdly) fix this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the cloudbuild dependency is (effectively) async, and signals completion too early (i.e. before the API enablement is actually complete).

A sufficient timeout would solve that issue, which explains why this works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Dina's comment, I don't see any dependency on Cloud Build here, seems like a fairly clean "Artifact Registry resource automatically has a dependency on the service" bug. We should have a TODO linking to an upstream issue when something that's meant to be a key value of Terraform isn't working.

@dinagraves
Copy link
Contributor

I'm able to replicate the error if I run the setup script with sh setup.sh. If I do ./setup.sh, then I don't get any errors. (I think) the file should only be executed that way because it's #!/bin/bash instead of bin/sh.

@dinagraves
Copy link
Contributor

Oh, I see, you're running the script from clean_project_setup.sh. Not sure what the issue is, but I wonder if it's because you're not clearing the application state?

You have:

# Clear Terraform state
rm ../terraform/*.tfstat*

But, I think we also need:

rm .../terraform/app/*.tfstat*

@ace-n
Copy link
Contributor Author

ace-n commented Jan 12, 2022

@dinagraves that removal is fixed in #303 - we can rebase this on top of that (once it's merged) and see if it works.

@ace-n
Copy link
Contributor Author

ace-n commented Jan 12, 2022

I tried #303 without this - no dice there.

(Perhaps this should be filed as a Terraform provider bug, too?)

@ace-n
Copy link
Contributor Author

ace-n commented Jan 12, 2022

From @dinagraves

We should compare ./setup.sh and sh setup.sh (in terraform/clean_project_setup.sh) and see if this makes any difference.

@ace-n
Copy link
Contributor Author

ace-n commented Jan 21, 2022

Dina - I tried replacing exec ./setup.sh with ./setup.sh and sh setup.sh in clean_project_setup.sh, and neither seemed to do anything. 🙁

@dinagraves
Copy link
Contributor

Hrm, are you using your Mac? Cloud Shell? This seems like it must be an environment issue.

@ace-n
Copy link
Contributor Author

ace-n commented Jan 21, 2022

I tested this on my work Mac.

@grayside
Copy link
Collaborator

If nobody else can replicate the problem and it's only encountered when using the "clean terraform runner", which as previously discussed is an edge case convenience, then I'm against an awkward workaround that adds complexity to the terraform definition. Instead, let's plan on arranging some pairs troubleshooting time to see if we can isolate the problem and get a more specific fix.

@grayside grayside closed this Jan 21, 2022
@ace-n
Copy link
Contributor Author

ace-n commented Jan 25, 2022

@grayside (in chat) suggested adding depends_ons to the Pub/Sub topics and/or IAM memberships.

I'm reopening this PR to track that work, and will update the code itself if that approach works.

@ace-n ace-n reopened this Jan 25, 2022
@dinagraves
Copy link
Contributor

@grayside I've tried this a dozen different ways with lots of explicit compound depends_ons -- the 10 second wait is the only one that worked consistently. I think maybe we file a bug in terraform and merge this in.

terraform/ops/main.tf Outdated Show resolved Hide resolved
@ace-n ace-n requested a review from grayside February 2, 2022 01:46
@ace-n
Copy link
Contributor Author

ace-n commented Feb 3, 2022

@grayside approved internally, given that b/217779689 was filed.

@ace-n ace-n merged commit d2943e0 into main Feb 3, 2022
@ace-n ace-n deleted the fix-tf branch February 3, 2022 23:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants