CloudSQLInstance stuck in FAILED if using private connectivity and Connection does not exist #84
Comments
Could you clarify how the scenario described in this issue differs from the mainline scenarios that our user guides walk people through?
As long as you allow
There are two things here:
For the first problem, we probably need to investigate the CloudSQL API further. It may be that this is treated as a warning rather than an error. Either way, our CR should surface it. For the second one, we could add a reference whose only purpose is to block creation. For instance, CloudSQL would have a
This was my initial thought as well. However, after talking with @negz, he expressed some concerns about building a dependency graph in an eventually consistent system. We may have to get creative with how we solve this issue, and maybe the first pass does just look like a
@jbw976 To expand on what @muvaf said here: the Services guides create resources sequentially, so this is not an issue there, and in the Stacks guide I am guessing it is not an issue because the
@jbw976 Frankly I'm pretty skeptical that the scenario our guides walk folks through is a use case that exists in the real world. To clarify:
...but I strongly suspect that folks wanting to create a VPC (typically the base network primitive that all other infrastructure lives in) in the exact same step as the infrastructure that runs in that VPC is a rare case, at best. It's certainly not a pattern I encountered during my time in SRE. More commonly, I would expect creating the network infrastructure (a rare event) and creating infrastructure that lives in that network, such as databases (a much more common event), to be distinct actions, and thus either distinct stages in a GitOps pipeline or distinct pipelines altogether. I do think it's worth looking for a fix for this issue, but I strongly suggest that we weigh the complexity and effort required to fix it against how serious this shortcoming could be for folks trying to use Crossplane in the real world.
So two options come to mind here. Neither would bulletproof this experience, but both would reduce the likelihood of a user encountering it. If a
status:
  peerings:
  - autoCreateRoutes: true
    exchangeSubnetRoutes: true
    name: cloudsql-mysql-googleapis-com
    network: https://www.googleapis.com/compute/v1/projects/speckle-umbrella-30/global/networks/cloud-sql-network-283222062215-b2a39e7ea055b996
    state: ACTIVE
    stateDetails: '[2019-11-06T13:04:01.073-08:00]: Connected.'
  - autoCreateRoutes: true
    exchangeSubnetRoutes: true
    name: servicenetworking-googleapis-com
    network: https://www.googleapis.com/compute/v1/projects/c064626fb0df236dc-tp/global/networks/servicenetworking
    state: ACTIVE
    stateDetails: '[2019-11-06T13:03:51.961-08:00]: Connected.'
  routingConfig:
    routingMode: REGIONAL
  selfLink: https://www.googleapis.com/compute/v1/projects/REDACTED/global/networks/my-cool-network

My inclination is to go with option 1, because it's easier to do, and better supports the case in which you manage your
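To illustrate detecting the private services connection from the Network status, rather than through an explicit reference, here is a minimal Python sketch. It assumes the Network's status has already been parsed into a dict shaped like the YAML above, and it treats a peering named servicenetworking-googleapis-com in state ACTIVE as the signal that the Connection exists. The helper name service_networking_ready and the constant are hypothetical; this is not the Crossplane or stack-gcp API.

```python
from typing import Any, Mapping

# Peering name seen in the Network status above; assumed to be constant
# for the private services (Service Networking) connection.
SERVICE_NETWORKING_PEERING = "servicenetworking-googleapis-com"


def service_networking_ready(network_status: Mapping[str, Any]) -> bool:
    """Return True if the network has an ACTIVE Service Networking peering.

    Hypothetical helper: `network_status` is assumed to be the Network's
    status parsed into a dict, shaped like the YAML in this thread.
    """
    for peering in network_status.get("peerings", []):
        if (
            peering.get("name") == SERVICE_NETWORKING_PEERING
            and peering.get("state") == "ACTIVE"
        ):
            return True
    return False


if __name__ == "__main__":
    status = {
        "peerings": [
            {"name": "cloudsql-mysql-googleapis-com", "state": "ACTIVE"},
            {"name": "servicenetworking-googleapis-com", "state": "ACTIVE"},
        ],
        "routingConfig": {"routingMode": "REGIONAL"},
    }
    print(service_networking_ready(status))            # True
    print(service_networking_ready({"peerings": []}))  # False
```

A CloudSQLInstance controller could run a check like this before creating an instance with private connectivity and requeue while it returns False. As noted above, this only lowers the chance of hitting the race; it does not remove it.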
Option 1 makes sense to me, too. If the name is constant, having it show up on the CR doesn't make a lot of sense. So I prefer that option over the one that adds a reference just to block. Handling the dependency implicitly in the
Though this is still a bug on the GCP side. The error says to try creating the connection, and we eventually do, but CloudSQL doesn't reconcile and doesn't return an error on creation. Hopefully there isn't a whole class of errors that show up only in the console like this one.
FWIW, I think this does show up in the API too; at least the CloudSQL instance transitions to state
What happened?
When creating a GlobalAddress, Connection, and CloudSQLInstance, I noticed the CloudSQLInstance would enter a FAILED state and never come out of it. The error was:
Upon reaching this state, the instance had to be deleted and recreated because it was observed as existing but could not be updated.
How can we reproduce it?
1. Create a GlobalAddress, a Connection (that references it), and a CloudSQLInstance with secure connectivity enabled, all at the same time and in the same network.
2. The GlobalAddress should become available almost immediately, but the Connection will wait a short time before creation because its first reconciliation will result in unresolved references.
3. The CloudSQLInstance will begin creation because it does not reference either the GlobalAddress or the Connection, but it will fail if it reaches the subnetwork creation step before the Connection is created, because it will be unable to peer the network.

Note: Having to delete and recreate the resource in this scenario is not a major problem, because the scenario is unlikely to be exercised frequently, and because the instance is never created, there is no risk of losing data. However, it does hamper bootstrapping a full environment that includes a database in a single step.
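The race in the reproduction steps can be modeled with a toy reconcile loop. This is a minimal Python sketch, not Crossplane's reconciler: the World model, the reconcile_* functions, and the block_on_connection flag are hypothetical. It contrasts an instance with no dependency, which races the Connection and fails, against one carrying a reference whose only purpose is to block creation until the Connection exists (the idea floated earlier in the thread).

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy model of the cloud state observed by each reconcile pass."""
    global_address: bool = False
    connection: bool = False
    instance_state: str = "ABSENT"  # ABSENT, then RUNNABLE or FAILED
    log: list = field(default_factory=list)


def reconcile_global_address(w: World) -> None:
    w.global_address = True  # becomes available almost immediately


def reconcile_connection(w: World, pass_no: int) -> None:
    # The first pass hits unresolved references, so creation is delayed.
    if pass_no >= 1 and w.global_address:
        w.connection = True


def reconcile_instance(w: World, block_on_connection: bool) -> None:
    if w.instance_state != "ABSENT":
        # This toy model never retries, mirroring the stuck FAILED state.
        return
    if block_on_connection and not w.connection:
        w.log.append("instance: waiting for connection reference")
        return
    # Network peering fails if the connection does not exist yet.
    w.instance_state = "RUNNABLE" if w.connection else "FAILED"


def run(block_on_connection: bool, passes: int = 3) -> str:
    w = World()
    for i in range(passes):
        reconcile_global_address(w)
        reconcile_instance(w, block_on_connection)
        reconcile_connection(w, i)
    return w.instance_state


if __name__ == "__main__":
    print(run(block_on_connection=False))  # FAILED
    print(run(block_on_connection=True))   # RUNNABLE
```

The trade-off raised in the thread still applies: an explicit blocking reference introduces dependency ordering into an eventually consistent system, whereas the unblocked instance simply loses the race.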
What environment did it happen in?
Crossplane version: v0.4.0
stack-gcp version: v0.2.0