GKE cluster modules: add optional kube state metrics #1682

Merged
Changes from 5 commits
8 changes: 8 additions & 0 deletions blueprints/gke/autopilot/cluster.tf
@@ -30,9 +30,17 @@ module "cluster" {
# autopilot = true
# }
# monitoring_config = {
+ # # (Optional) control plane metrics
# enable_api_server_metrics = true
# enable_controller_manager_metrics = true
# enable_scheduler_metrics = true
+ # # (Optional) kube state metrics
+ # enable_daemonset_metrics = true
+ # enable_deployment_metrics = true
+ # enable_hpa_metrics = true
+ # enable_pod_metrics = true
+ # enable_statefulset_metrics = true
+ # enable_storage_metrics = true
# }
# cluster_autoscaling = {
# auto_provisioning_defaults = {
28 changes: 14 additions & 14 deletions blueprints/gke/multitenant-fleet/README.md

Large diffs are not rendered by default.

10 changes: 8 additions & 2 deletions blueprints/gke/multitenant-fleet/variables.tf
@@ -50,12 +50,18 @@ variable "clusters" {
monitoring_config = optional(object({
enable_system_metrics = optional(bool, true)

- # Control plane metrics
+ # (Optional) control plane metrics
enable_api_server_metrics = optional(bool, false)
enable_controller_manager_metrics = optional(bool, false)
enable_scheduler_metrics = optional(bool, false)

- # TODO add kube state metrics
+ # (Optional) kube state metrics
+ enable_daemonset_metrics = optional(bool, false)
+ enable_deployment_metrics = optional(bool, false)
+ enable_hpa_metrics = optional(bool, false)
+ enable_pod_metrics = optional(bool, false)
+ enable_statefulset_metrics = optional(bool, false)
+ enable_storage_metrics = optional(bool, false)

# Google Cloud Managed Service for Prometheus
enable_managed_prometheus = optional(bool, true)
30 changes: 15 additions & 15 deletions fast/stages/3-gke-multitenant/dev/README.md

Large diffs are not rendered by default.

10 changes: 8 additions & 2 deletions fast/stages/3-gke-multitenant/dev/variables.tf
@@ -70,12 +70,18 @@ variable "clusters" {
monitoring_config = optional(object({
enable_system_metrics = optional(bool, true)

- # Control plane metrics
+ # (Optional) control plane metrics
enable_api_server_metrics = optional(bool, false)
enable_controller_manager_metrics = optional(bool, false)
enable_scheduler_metrics = optional(bool, false)

- # TODO add kube state metrics
+ # (Optional) kube state metrics
+ enable_daemonset_metrics = optional(bool, false)
+ enable_deployment_metrics = optional(bool, false)
+ enable_hpa_metrics = optional(bool, false)
+ enable_pod_metrics = optional(bool, false)
+ enable_statefulset_metrics = optional(bool, false)
+ enable_storage_metrics = optional(bool, false)

# Google Cloud Managed Service for Prometheus
enable_managed_prometheus = optional(bool, true)
52 changes: 41 additions & 11 deletions modules/gke-cluster-autopilot/README.md
@@ -107,12 +107,12 @@ module "cluster-1" {
### Monitoring configuration

> [!NOTE]
- > System metrics collection is pre-configured for Autopilot clusters and cannot be disabled.
+ > [System metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-system-metrics) collection is pre-configured for Autopilot clusters and cannot be disabled.

> [!WARNING]
> GKE **workload metrics** is deprecated and removed in GKE 1.24 and later. Workload metrics is replaced by [Google Cloud Managed Service for Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus), which is Google's recommended way to monitor Kubernetes applications by using Cloud Monitoring.

- This example shows how to [configure collection of Kubernetes control plane metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-control-plane-metrics). The metrics for these components are not collected by default.
+ This example shows how to [configure collection of Kubernetes control plane metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-control-plane-metrics). These metrics are optional and are not collected by default.

```hcl
module "cluster-1" {
@@ -134,6 +134,36 @@ module "cluster-1" {
# tftest modules=1 resources=1 inventory=monitoring-config-control-plane.yaml
```

+ The next example shows how to [configure collection of kube state metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-ksm). These metrics are optional and are not collected by default.
+
+ ```hcl
+ module "cluster-1" {
+ source = "./fabric/modules/gke-cluster-autopilot"
+ project_id = var.project_id
+ name = "cluster-1"
+ location = "europe-west1"
+ vpc_config = {
+ network = var.vpc.self_link
+ subnetwork = var.subnet.self_link
+ secondary_range_names = {} # use default names "pods" and "services"
+ }
+ monitoring_config = {
+ enable_daemonset_metrics = true
+ enable_deployment_metrics = true
+ enable_hpa_metrics = true
+ enable_pod_metrics = true
+ enable_statefulset_metrics = true
+ enable_storage_metrics = true
+ # Kube state metrics collection requires Google Cloud Managed Service for Prometheus,
+ # which is enabled by default.
+ # enable_managed_prometheus = true
+ }
+ }
+ # tftest modules=1 resources=1 inventory=monitoring-config-kube-state.yaml
+ ```
+
+ The *control plane metrics* and *kube state metrics* collection can be configured in a single `monitoring_config` block.
+
### Backup for GKE

> [!NOTE]
@@ -177,9 +207,9 @@ module "cluster-1" {
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [location](variables.tf#L110) | Autopilot clusters are always regional. | <code>string</code> | ✓ | |
- | [name](variables.tf#L170) | Cluster name. | <code>string</code> | ✓ | |
- | [project_id](variables.tf#L196) | Cluster project ID. | <code>string</code> | ✓ | |
- | [vpc_config](variables.tf#L225) | VPC-level configuration. | <code title="object&#40;&#123;&#10; network &#61; string&#10; subnetwork &#61; string&#10; master_ipv4_cidr_block &#61; optional&#40;string&#41;&#10; secondary_range_blocks &#61; optional&#40;object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;&#41;&#10; secondary_range_names &#61; optional&#40;object&#40;&#123;&#10; pods &#61; optional&#40;string, &#34;pods&#34;&#41;&#10; services &#61; optional&#40;string, &#34;services&#34;&#41;&#10; &#125;&#41;&#41;&#10; master_authorized_ranges &#61; optional&#40;map&#40;string&#41;&#41;&#10; stack_type &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
+ | [name](variables.tf#L187) | Cluster name. | <code>string</code> | ✓ | |
+ | [project_id](variables.tf#L213) | Cluster project ID. | <code>string</code> | ✓ | |
+ | [vpc_config](variables.tf#L242) | VPC-level configuration. | <code title="object&#40;&#123;&#10; network &#61; string&#10; subnetwork &#61; string&#10; master_ipv4_cidr_block &#61; optional&#40;string&#41;&#10; secondary_range_blocks &#61; optional&#40;object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;&#41;&#10; secondary_range_names &#61; optional&#40;object&#40;&#123;&#10; pods &#61; optional&#40;string, &#34;pods&#34;&#41;&#10; services &#61; optional&#40;string, &#34;services&#34;&#41;&#10; &#125;&#41;&#41;&#10; master_authorized_ranges &#61; optional&#40;map&#40;string&#41;&#41;&#10; stack_type &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [backup_configs](variables.tf#L17) | Configuration for Backup for GKE. | <code title="object&#40;&#123;&#10; enable_backup_agent &#61; optional&#40;bool, false&#41;&#10; backup_plans &#61; optional&#40;map&#40;object&#40;&#123;&#10; encryption_key &#61; optional&#40;string&#41;&#10; include_secrets &#61; optional&#40;bool, true&#41;&#10; include_volume_data &#61; optional&#40;bool, true&#41;&#10; namespaces &#61; optional&#40;list&#40;string&#41;&#41;&#10; region &#61; string&#10; schedule &#61; string&#10; retention_policy_days &#61; optional&#40;string&#41;&#10; retention_policy_lock &#61; optional&#40;bool, false&#41;&#10; retention_policy_delete_lock_days &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;, &#123;&#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [description](variables.tf#L37) | Cluster description. | <code>string</code> | | <code>null</code> |
| [enable_addons](variables.tf#L43) | Addons enabled in the cluster (true means enabled). | <code title="object&#40;&#123;&#10; cloudrun &#61; optional&#40;bool, false&#41;&#10; config_connector &#61; optional&#40;bool, false&#41;&#10; dns_cache &#61; optional&#40;bool, false&#41;&#10; horizontal_pod_autoscaling &#61; optional&#40;bool, false&#41;&#10; http_load_balancing &#61; optional&#40;bool, false&#41;&#10; istio &#61; optional&#40;object&#40;&#123;&#10; enable_tls &#61; bool&#10; &#125;&#41;&#41;&#10; kalm &#61; optional&#40;bool, false&#41;&#10; network_policy &#61; optional&#40;bool, false&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; horizontal_pod_autoscaling &#61; true&#10; http_load_balancing &#61; true&#10;&#125;">&#123;&#8230;&#125;</code> |
@@ -189,12 +219,12 @@ module "cluster-1" {
| [logging_config](variables.tf#L115) | Logging configuration. | <code title="object&#40;&#123;&#10; enable_api_server_logs &#61; optional&#40;bool, false&#41;&#10; enable_scheduler_logs &#61; optional&#40;bool, false&#41;&#10; enable_controller_manager_logs &#61; optional&#40;bool, false&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [maintenance_config](variables.tf#L126) | Maintenance window configuration. | <code title="object&#40;&#123;&#10; daily_window_start_time &#61; optional&#40;string&#41;&#10; recurring_window &#61; optional&#40;object&#40;&#123;&#10; start_time &#61; string&#10; end_time &#61; string&#10; recurrence &#61; string&#10; &#125;&#41;&#41;&#10; maintenance_exclusions &#61; optional&#40;list&#40;object&#40;&#123;&#10; name &#61; string&#10; start_time &#61; string&#10; end_time &#61; string&#10; scope &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; daily_window_start_time &#61; &#34;03:00&#34;&#10; recurring_window &#61; null&#10; maintenance_exclusion &#61; &#91;&#93;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [min_master_version](variables.tf#L149) | Minimum version of the master, defaults to the version of the most recent official release. | <code>string</code> | | <code>null</code> |
- | [monitoring_config](variables.tf#L155) | Monitoring configuration. System metrics collection cannot be disabled for Autopilot clusters. Control plane metrics are optional. Google Cloud Managed Service for Prometheus is enabled by default. | <code title="object&#40;&#123;&#10; enable_api_server_metrics &#61; optional&#40;bool, false&#41;&#10; enable_controller_manager_metrics &#61; optional&#40;bool, false&#41;&#10; enable_scheduler_metrics &#61; optional&#40;bool, false&#41;&#10; enable_managed_prometheus &#61; optional&#40;bool, true&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
- | [node_locations](variables.tf#L175) | Zones in which the cluster's nodes are located. | <code>list&#40;string&#41;</code> | | <code>&#91;&#93;</code> |
- | [private_cluster_config](variables.tf#L182) | Private cluster configuration. | <code title="object&#40;&#123;&#10; enable_private_endpoint &#61; optional&#40;bool&#41;&#10; master_global_access &#61; optional&#40;bool&#41;&#10; peering_config &#61; optional&#40;object&#40;&#123;&#10; export_routes &#61; optional&#40;bool&#41;&#10; import_routes &#61; optional&#40;bool&#41;&#10; project_id &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
- | [release_channel](variables.tf#L201) | Release channel for GKE upgrades. Clusters created in the Autopilot mode must use a release channel. Choose between \"RAPID\", \"REGULAR\", and \"STABLE\". | <code>string</code> | | <code>&#34;REGULAR&#34;</code> |
- | [service_account](variables.tf#L212) | The Google Cloud Platform Service Account to be used by the node VMs created by GKE Autopilot. | <code>string</code> | | <code>null</code> |
- | [tags](variables.tf#L218) | Network tags applied to nodes. | <code>list&#40;string&#41;</code> | | <code>&#91;&#93;</code> |
+ | [monitoring_config](variables.tf#L155) | Monitoring configuration. System metrics collection cannot be disabled. Control plane metrics are optional. Kube state metrics are optional. Google Cloud Managed Service for Prometheus is enabled by default. | <code title="object&#40;&#123;&#10; enable_api_server_metrics &#61; optional&#40;bool, false&#41;&#10; enable_controller_manager_metrics &#61; optional&#40;bool, false&#41;&#10; enable_scheduler_metrics &#61; optional&#40;bool, false&#41;&#10; enable_daemonset_metrics &#61; optional&#40;bool, false&#41;&#10; enable_deployment_metrics &#61; optional&#40;bool, false&#41;&#10; enable_hpa_metrics &#61; optional&#40;bool, false&#41;&#10; enable_pod_metrics &#61; optional&#40;bool, false&#41;&#10; enable_statefulset_metrics &#61; optional&#40;bool, false&#41;&#10; enable_storage_metrics &#61; optional&#40;bool, false&#41;&#10; enable_managed_prometheus &#61; optional&#40;bool, true&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
+ | [node_locations](variables.tf#L192) | Zones in which the cluster's nodes are located. | <code>list&#40;string&#41;</code> | | <code>&#91;&#93;</code> |
+ | [private_cluster_config](variables.tf#L199) | Private cluster configuration. | <code title="object&#40;&#123;&#10; enable_private_endpoint &#61; optional&#40;bool&#41;&#10; master_global_access &#61; optional&#40;bool&#41;&#10; peering_config &#61; optional&#40;object&#40;&#123;&#10; export_routes &#61; optional&#40;bool&#41;&#10; import_routes &#61; optional&#40;bool&#41;&#10; project_id &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
+ | [release_channel](variables.tf#L218) | Release channel for GKE upgrades. Clusters created in the Autopilot mode must use a release channel. Choose between \"RAPID\", \"REGULAR\", and \"STABLE\". | <code>string</code> | | <code>&#34;REGULAR&#34;</code> |
+ | [service_account](variables.tf#L229) | The Google Cloud Platform Service Account to be used by the node VMs created by GKE Autopilot. | <code>string</code> | | <code>null</code> |
+ | [tags](variables.tf#L235) | Network tags applied to nodes. | <code>list&#40;string&#41;</code> | | <code>&#91;&#93;</code> |

## Outputs

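The README change above states that control plane metrics and kube state metrics can be configured in a single `monitoring_config` block, but it does not show the two together. A minimal sketch of what that might look like follows. It reuses the module arguments from the README examples and assumes that `var.project_id`, `var.vpc`, and `var.subnet` are defined by the caller; which flags to enable is an arbitrary choice for illustration.

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
  }
  monitoring_config = {
    # Control plane metrics (optional, off by default)
    enable_api_server_metrics         = true
    enable_controller_manager_metrics = true
    enable_scheduler_metrics          = true
    # Kube state metrics (optional, off by default); only a subset is enabled here
    enable_deployment_metrics = true
    enable_pod_metrics        = true
    # enable_managed_prometheus is left at its default (true),
    # which kube state metrics collection requires.
  }
}
```

Flags that are omitted keep their `optional(bool, false)` defaults, so only the listed components are requested.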
7 changes: 7 additions & 0 deletions modules/gke-cluster-autopilot/main.tf
@@ -211,6 +211,13 @@ resource "google_container_cluster" "cluster" {
var.monitoring_config.enable_api_server_metrics ? "APISERVER" : null,
var.monitoring_config.enable_controller_manager_metrics ? "CONTROLLER_MANAGER" : null,
var.monitoring_config.enable_scheduler_metrics ? "SCHEDULER" : null,
+ # Kube state metrics:
+ var.monitoring_config.enable_daemonset_metrics ? "DAEMONSET" : null,
+ var.monitoring_config.enable_deployment_metrics ? "DEPLOYMENT" : null,
+ var.monitoring_config.enable_hpa_metrics ? "HPA" : null,
+ var.monitoring_config.enable_pod_metrics ? "POD" : null,
+ var.monitoring_config.enable_statefulset_metrics ? "STATEFULSET" : null,
+ var.monitoring_config.enable_storage_metrics ? "STORAGE" : null,
]))
managed_prometheus {
enabled = var.monitoring_config.enable_managed_prometheus
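The `main.tf` hunk above adds one conditional string per kube state metric flag and relies on the enclosing expression to drop the `null` entries. The opening of that expression is outside the hunk, so the sketch below is an assumption: it uses Terraform's built-in `compact()` for the filtering, assumes the list begins with `"SYSTEM_COMPONENTS"`, and declares a trimmed-down, hypothetical `monitoring_config` variable with only two flags so that it can be evaluated on its own (Terraform 1.3 or later, for `optional()` defaults).

```hcl
# Hypothetical, reduced variable for illustration; the real module has more flags.
variable "monitoring_config" {
  type = object({
    enable_hpa_metrics = optional(bool, false)
    enable_pod_metrics = optional(bool, false)
  })
  default = {
    enable_pod_metrics = true
  }
}

locals {
  # Each disabled flag contributes null; compact() removes null and empty strings.
  enable_components = compact([
    "SYSTEM_COMPONENTS", # assumed always-on entry, not visible in the hunk
    var.monitoring_config.enable_hpa_metrics ? "HPA" : null,
    var.monitoring_config.enable_pod_metrics ? "POD" : null,
  ])
}

output "enable_components" {
  value = local.enable_components
}
```

With the default shown, `local.enable_components` evaluates to `["SYSTEM_COMPONENTS", "POD"]`, because the `HPA` entry resolves to `null` and is removed.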
23 changes: 20 additions & 3 deletions modules/gke-cluster-autopilot/variables.tf
@@ -153,18 +153,35 @@ variable "min_master_version" {
}

variable "monitoring_config" {
- description = "Monitoring configuration. System metrics collection cannot be disabled for Autopilot clusters. Control plane metrics are optional. Google Cloud Managed Service for Prometheus is enabled by default."
+ description = "Monitoring configuration. System metrics collection cannot be disabled. Control plane metrics are optional. Kube state metrics are optional. Google Cloud Managed Service for Prometheus is enabled by default."
type = object({
# Control plane metrics
enable_api_server_metrics = optional(bool, false)
enable_controller_manager_metrics = optional(bool, false)
enable_scheduler_metrics = optional(bool, false)
- # Google Cloud Managed Service for Prometheus
- # GKE Autopilot clusters running GKE version 1.25 or greater must have this on.
+ # Kube state metrics. Requires managed Prometheus. Requires provider version >= v4.82.0
+ enable_daemonset_metrics = optional(bool, false)
+ enable_deployment_metrics = optional(bool, false)
+ enable_hpa_metrics = optional(bool, false)
+ enable_pod_metrics = optional(bool, false)
+ enable_statefulset_metrics = optional(bool, false)
+ enable_storage_metrics = optional(bool, false)
+ # Google Cloud Managed Service for Prometheus. Autopilot clusters version >= 1.25 must have this on.
enable_managed_prometheus = optional(bool, true)
})
default = {}
nullable = false
+ validation {
+ condition = anytrue([
+ var.monitoring_config.enable_daemonset_metrics,
+ var.monitoring_config.enable_deployment_metrics,
+ var.monitoring_config.enable_hpa_metrics,
+ var.monitoring_config.enable_pod_metrics,
+ var.monitoring_config.enable_statefulset_metrics,
+ var.monitoring_config.enable_storage_metrics,
+ ]) ? var.monitoring_config.enable_managed_prometheus : true
+ error_message = "Kube state metrics collection requires Google Cloud Managed Service for Prometheus to be enabled."
+ }
}

variable "name" {
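The validation block added above only passes when either no kube state metric flag is set or `enable_managed_prometheus` is `true`. As an illustration, the module call below is a sketch of an input that the rule is expected to reject at plan time. It reuses the arguments from the README examples and assumes `var.project_id`, `var.vpc`, and `var.subnet` are defined by the caller.

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
  }
  monitoring_config = {
    # anytrue([...]) is true because one kube state metric flag is set ...
    enable_pod_metrics = true
    # ... so the condition evaluates to enable_managed_prometheus, which is false here.
    enable_managed_prometheus = false
  }
}
```

Removing the `enable_managed_prometheus = false` line (the default is `true`) or dropping `enable_pod_metrics` makes the condition evaluate to `true` again. The ternary in the condition is logically equivalent to "no kube state metric is enabled, or managed Prometheus is enabled".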