Question: can you use health checks / outlier detection with serverless NEGS? #1783

Closed
yardenas opened this issue Oct 20, 2023 · 8 comments

@yardenas

yardenas commented Oct 20, 2023

Hi everyone,

Some context: I'm trying to set up a multi-region deployment for Cloud Run, mainly for automatic failover. My understanding is that whenever one of the NEGs (in a specific region) becomes unhealthy, traffic is redirected to the other NEGs.
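The topology described above can be sketched with plain google-provider resources instead of the Fabric modules. This is a minimal, hypothetical sketch. The `var.regions` map, the `app-<key>` Cloud Run service names, and the choice of the `EXTERNAL_MANAGED` (global external Application Load Balancer) scheme are assumptions. It creates one serverless NEG per region and attaches all of them as backends of a single global backend service. It does not configure any failure detection.

```hcl
# Hypothetical input: map of short keys to Cloud Run regions,
# e.g. { ew1 = "europe-west1", uc1 = "us-central1" }.
variable "regions" {
  type = map(string)
}

variable "project_id" {
  type = string
}

# One serverless NEG per region. Assumes (hypothetically) that a Cloud Run
# service named "app-<key>" already exists in each region.
resource "google_compute_region_network_endpoint_group" "cloudrun" {
  for_each              = var.regions
  project               = var.project_id
  name                  = "cloudrun-neg-${each.key}"
  region                = each.value
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = "app-${each.key}"
  }
}

# A single global backend service fronting every regional NEG.
# No health_checks argument: health checks are not supported for
# serverless NEG backends.
resource "google_compute_backend_service" "cloudrun" {
  project               = var.project_id
  name                  = "cloudrun-backend"
  load_balancing_scheme = "EXTERNAL_MANAGED" # assumption, not the classic "EXTERNAL"

  dynamic "backend" {
    for_each = google_compute_region_network_endpoint_group.cloudrun
    content {
      group = backend.value.id
    }
  }
}
```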

Now, to mark a NEG as unhealthy, I have two options:

  1. Use health checks
  2. Use outlier detection.

The documentation states:

Health checks are not supported for serverless backends. Therefore, backend services that contain serverless NEG backends cannot be configured with health checks. However, you can optionally enable outlier detection to identify unhealthy serverless services and route new requests to a healthy serverless service.

So it seems I cannot use health checks, only outlier detection.
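For comparison, here is a minimal sketch of a backend service that relies on outlier detection instead of health checks. It is written directly against the provider's `google_compute_backend_service` resource, and its names and values are illustrative assumptions:

- `var.serverless_neg_ids` is a hypothetical input.
- The threshold values are examples only.
- The `EXTERNAL_MANAGED` scheme is assumed because outlier detection on serverless NEGs is not accepted for every load balancer type.

```hcl
# Hypothetical input: IDs (or self links) of the regional serverless NEGs.
variable "serverless_neg_ids" {
  type = list(string)
}

resource "google_compute_backend_service" "with_outlier_detection" {
  name                  = "cloudrun-backend-od"
  load_balancing_scheme = "EXTERNAL_MANAGED" # assumption: global external Application LB

  dynamic "backend" {
    for_each = var.serverless_neg_ids
    content {
      group = backend.value
    }
  }

  # No health_checks argument: not supported for serverless NEG backends.
  outlier_detection {
    # Eject a backend after this many consecutive errors (5xx for HTTP); illustrative value.
    consecutive_errors = 5

    # Time between ejection sweep analyses.
    interval {
      seconds = 10
    }

    # Base ejection time; the effective time is multiplied by the number
    # of times the backend has been ejected.
    base_ejection_time {
      seconds = 30
    }

    # Maximum percentage of backends that can be ejected at once.
    max_ejection_percent = 50
  }
}
```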

Now for the code (a modified version of the serverless Cloud Run blueprint, adapted to support multiple regions):

resource "random_uuid" "cloudrun_revision_id" {
  keepers = {
    first = timestamp()
  }
}

locals {
  gclb_create = var.custom_domain == null ? false : true
}

# Cloud Run service
module "cloud_run" {
  for_each      = var.regions
  source        = "github.com/GoogleCloudPlatform/cloud-foundation-fabric.git//modules/cloud-run?ref=v25.0.0"
  project_id    = var.project_id
  name          = "${var.run_svc_name}-${each.key}"
  revision_name = "${var.run_svc_name}-${random_uuid.cloudrun_revision_id.result}"
  region        = each.value
  containers = {
    default = {
      image = var.container_image
      options = {
        command  = null
        args     = null
        env      = {}
        env_from = null
      }
      ports         = null
      resources     = null
      volume_mounts = null
    }
  }
  iam = {
    "roles/run.invoker" = var.invoker_group
  }
  revision_annotations = {
    autoscaling         = var.autoscaling
    cloudsql_instances  = var.cloudsql_instances
    vpcaccess_connector = var.vpcaccess_connectors[each.key]
    vpcaccess_egress    = "all-traffic"
  }
  ingress_settings       = var.ingress_settings
  service_account_create = true
}

# Reserved static IP for the Load Balancer
resource "google_compute_global_address" "default" {
  count   = local.gclb_create ? 1 : 0
  project = var.project_id
  name    = "glb-ip"
}

resource "google_compute_ssl_policy" "profile" {
  name            = "prod-ssl-policy"
  profile         = "MODERN"
  min_tls_version = "TLS_1_2"
}

# Global L7 HTTPS Load Balancer in front of Cloud Run
module "glb" {
  source     = "github.com/GoogleCloudPlatform/cloud-foundation-fabric.git//modules/net-lb-app-ext?ref=v25.0.0"
  count      = local.gclb_create ? 1 : 0
  project_id = var.project_id
  name       = "external-load-balancer"
  address    = google_compute_global_address.default[0].address
  backend_service_configs = {
    default = {
      backends = [
        for k, v in var.regions : {
          backend = k
        }
      ]
      health_checks = []
      outlier_detection = {
        consecutive_errors = 10
      }
      port_name       = "http"
      security_policy = try(google_compute_security_policy.policy[0].name, null)
      iap_config = try({
        oauth2_client_id     = google_iap_client.iap_client[0].client_id,
        oauth2_client_secret = google_iap_client.iap_client[0].secret
      }, null)
    }
  }
  health_check_configs = {}
  neg_configs = {
    for k, v in var.regions :
    k => {
      cloudrun = {
        region = v
        target_service = {
          name = module.cloud_run[k].service_name
        }
      }
    }
  }
  protocol = "HTTPS"
  https_proxy_config = {
    ssl_policy = google_compute_ssl_policy.profile.self_link
  }
  ssl_certificates = {
    managed_configs = {
      default = {
        domains = [var.custom_domain]
      }
    }
  }
}

# Cloud Armor configuration
resource "google_compute_security_policy" "policy" {
  count   = local.gclb_create && var.security_policy.enabled ? 1 : 0
  name    = "cloud-run-policy"
  project = var.project_id
  rule {
    action   = "deny(403)"
    priority = 1000
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = var.security_policy.ip_blacklist
      }
    }
    description = "Deny access to list of IPs"
  }
  rule {
    action   = "deny(403)"
    priority = 900
    match {
      expr {
        expression = "request.path.matches(\"${var.security_policy.path_blocked}\")"
      }
    }
    description = "Deny access to specific URL paths"
  }
  rule {
    action   = "allow"
    priority = "2147483647"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    description = "Default rule"
  }
}

# Identity-Aware Proxy (IAP) or OAuth brand (see OAuth consent screen)
# Note:
# Only "Organization Internal" brands can be created programmatically
# via API. To convert it into an external brand please use the GCP
# Console.
# Brands can only be created once for a Google Cloud project and the
# underlying Google API doesn't support DELETE or PATCH methods.
# Destroying a Terraform-managed Brand will remove it from state but
# will not delete it from Google Cloud.
resource "google_iap_brand" "iap_brand" {
  count   = var.iap.enabled ? 1 : 0
  project = var.project_id
  # Support email displayed on the OAuth consent screen. The caller must be
  # the user with the associated email address, or if a group email is
  # specified, the caller can be either a user or a service account which
  # is an owner of the specified group in Cloud Identity.
  support_email     = var.iap.support_email
  application_title = var.iap.app_title
}

# IAP owned OAuth2 client
# Note:
# Only internal org clients can be created via declarative tools.
# External clients must be manually created via the GCP console.
# Warning:
# All arguments including secret will be stored in the raw state as plain-text.
resource "google_iap_client" "iap_client" {
  count        = var.iap.enabled ? 1 : 0
  display_name = var.iap.oauth2_client_name
  brand        = google_iap_brand.iap_brand[0].name
}

# IAM policy for IAP
# For simplicity we use the same email as support_email and authorized member
resource "google_iap_web_iam_member" "iap_iam" {
  count   = var.iap.enabled ? 1 : 0
  project = var.project_id
  role    = "roles/iap.httpsResourceAccessor"
  member  = var.iap.email
}

resource "google_project_service_identity" "iap_sa" {
  count    = var.iap.enabled ? 1 : 0
  provider = google-beta
  project  = var.project_id
  service  = "iap.googleapis.com"
}

Whenever I run this code I get the following error:

Invalid value for field 'resource.outlierDetection': '{  "consecutiveErrors": 10,  "maxEjectionPercent": 10,  "enforcingConsecutiveErrors": 100,  "enforci...'. Outlier detection is not supported., invalid

What am I missing here? Any help would be very much appreciated!

@ludoo
Collaborator

ludoo commented Oct 20, 2023

@apichick is probably the best person to chime in on this.

@ludoo
Collaborator

ludoo commented Oct 20, 2023

Can you paste the full error message, including the source file and line number?

@yardenas
Author

│ Error: Error creating BackendService: googleapi: Error 400: Invalid value for field 'resource.outlierDetection': '{  "consecutiveErrors": 10,  "maxEjectionPercent": 10,  "enforcingConsecutiveErrors": 100,  "enforci...'. Outlier detection is not supported., invalid
│ 
│   with module.application.module.glb[0].google_compute_backend_service.default["default"],
│   on .terraform/modules/application.glb/modules/net-lb-app-ext/backend-service.tf line 44, in resource "google_compute_backend_service" "default":
│   44: resource "google_compute_backend_service" "default" {
│ 
╵

Hope this helps

@ludoo
Collaborator

ludoo commented Oct 20, 2023

You're using a Global Load Balancer; this might be the reason.

@yardenas
Author

I see, thanks for the info!
@ludoo, are you aware of any other way to achieve automatic failover?

Thanks a lot for helping! 💪

@ludoo
Collaborator

ludoo commented Oct 20, 2023

@apichick was chatting with me about getting it working with a GLB, so she might have code for that. Let's wait until she has time to chime in. :)

@czka

czka commented Oct 31, 2023

FWIW, I guess that once outlier detection for serverless NEGs reaches GA (it's pre-GA as of this writing), the outlier_detection block of google_compute_backend_service should also support the EXTERNAL_MANAGED LB scheme (at least as long as IAP isn't enabled). See: hashicorp/terraform-provider-google#15210
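If the backend service is switched to the `EXTERNAL_MANAGED` scheme, the frontend has to use the same scheme. The global forwarding rule's `load_balancing_scheme` should match the backend service's. A minimal, hypothetical frontend sketch follows, using raw provider resources. The two input variables and all resource names are assumptions. Following the comment above, it deliberately has no IAP configuration.

```hcl
# Hypothetical inputs: an EXTERNAL_MANAGED backend service and an SSL certificate.
variable "backend_service_id" {
  type = string
}

variable "ssl_certificate_id" {
  type = string
}

# Route all requests to the serverless backend service.
resource "google_compute_url_map" "default" {
  name            = "cloudrun-url-map"
  default_service = var.backend_service_id
}

resource "google_compute_target_https_proxy" "default" {
  name             = "cloudrun-https-proxy"
  url_map          = google_compute_url_map.default.id
  ssl_certificates = [var.ssl_certificate_id]
}

# The scheme here must match the backend service's scheme.
resource "google_compute_global_forwarding_rule" "default" {
  name                  = "cloudrun-forwarding-rule"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  target                = google_compute_target_https_proxy.default.id
  port_range            = "443"
}
```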

@ludoo
Copy link
Collaborator

ludoo commented Nov 1, 2023

Closing this for now as I don't think it's a module issue from our side. Feel free to reopen if you still want to discuss, or if new evidence emerges.

ludoo closed this as completed Nov 1, 2023

3 participants