Monitoring with Stackdriver on Kubernetes Engine
Stackdriver Kubernetes Monitoring is a new Stackdriver feature that more tightly integrates with GKE to better show you key stats about your cluster and the workloads and services running in it. Included in the new feature is functionality to import, as native Stackdriver metrics, metrics from pods with Prometheus endpoints. This allows you to use Stackdriver native alerting functionality with your Prometheus metrics without any additional workload.
This tutorial will walk you through setting up Monitoring and visualizing metrics from a Kubernetes Engine cluster. It makes use of Terraform, a declarative Infrastructure as Code tool that enables configuration files to be used to automate the deployment and evolution of infrastructure in the cloud. The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitoring capabilities of Stackdriver.
Note: The setup of the Stackdriver Monitoring workspace is not automated with a script because it is currently not supported through Terraform or via the gcloud command line tool.
The tutorial will create a Kubernetes Engine cluster that has a sample application deployed to it. The logs and metrics for the cluster are loaded into Stackdriver by default. In the tutorial, a Stackdriver Monitoring account will be set up to view the captured metrics.
Run Demo in a Google Cloud Shell
Click the button below to run the demo in a Google Cloud Shell
All the tools for the demo are already installed in Cloud Shell. When using Cloud Shell, execute the following command to set up the gcloud CLI. When executing this command, set your region and zone.
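The command itself is not reproduced in this copy of the tutorial. As a minimal sketch, assuming you prefer to set the defaults explicitly with gcloud config set, the region and zone could be configured as shown below. The configure_gcloud helper and the us-central1 values are illustrative placeholders, not part of the tutorial.

```shell
# Hypothetical helper: set the default compute region and zone for gcloud.
configure_gcloud() {
  region="$1"
  zone="$2"
  gcloud config set compute/region "$region" &&
  gcloud config set compute/zone "$zone"
}

# Example call (placeholder values; substitute your own region and zone):
# configure_gcloud us-central1 us-central1-a
```

Running gcloud config list afterwards shows the values that were stored.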
You can obtain a free trial of GCP if you need one.
Install Cloud SDK
The Google Cloud SDK is used to interact with your GCP resources. Installation instructions for multiple platforms are available online.
Install kubectl CLI
The kubectl CLI is used to interact with both Kubernetes Engine and Kubernetes in general. Installation instructions for multiple platforms are available online.
Terraform is used to automate the manipulation of cloud infrastructure. Its installation instructions are also available online.
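If you are working outside Cloud Shell, a quick check along these lines can confirm that the three tools are on your PATH before you continue. This is only a sketch; check_tools is a hypothetical helper name, and it tests for presence only, not for a particular version.

```shell
# Hypothetical helper: report whether each named tool is on the PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools gcloud kubectl terraform || echo "Install the missing tools before continuing."
```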
The Terraform configuration will execute against your GCP environment and create a Kubernetes Engine cluster running a simple application. The configuration will use your personal account to build out these resources. To set up the default account the configuration will use, run the following command to select the appropriate account:
$ gcloud auth application-default login
In this section we will create a Stackdriver Monitoring account so that we can explore the capabilities of the Monitoring console.
Create a new Stackdriver Account
The following steps are used to set up a Stackdriver Monitoring account.
- Visit the Monitoring section of the GCP Console. This will launch the process of creating a new Monitoring console if you have not created one before.
- On the Create your free Stackdriver account page, select the project you created earlier. Note: You cannot change this setting once it is created.
- Click on the Create Account button.
- On the next page, Add Google Cloud Platform projects to monitor, you can leave the defaults alone. The project is already selected, so it isn't necessary to select any other projects. Note: You can add and remove projects at a later date if necessary.
- Click the Continue button.
- On the Monitor AWS accounts page you can choose to specify your AWS account information or skip this step.
- For this tutorial's purposes you can click the Skip AWS Setup button.
- On the Install the Stackdriver Agents page you are provided with a script that can be used to add the Stackdriver Monitoring and Logging agents on each of your VM instances. Note: The tracking of VMs is not automatic like it is for Kubernetes Engine. For the purposes of this tutorial this script is not needed.
- Click the Continue button.
- On the Get Reports by Email page you can simply select any of the options depending on whether you want to receive the reports. For the purposes of this demo we will not be using the reports.
- Click the Continue button.
- The actual creation of the account and underlying resources takes a few minutes. Once completed you can press the Launch monitoring button.
Deploying the cluster
The infrastructure and Stackdriver alert policy required by this project can be deployed by executing:
- Read your project & zone configuration to generate a couple of config files:
  - ./terraform/terraform.tfvars for Terraform variables
  - ./manifests/prometheus-service-sed.yaml for the Prometheus policy to be created in Stackdriver
- Run terraform init to prepare Terraform to create the infrastructure
- Run terraform apply to actually create the infrastructure & Stackdriver alert policy
If you need to override any of the defaults in the Terraform variables file, simply replace the desired value(s) to the right of the equals sign(s). Be sure your replacement values are still double-quoted.
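The steps above can be sketched by hand roughly as follows. This is an illustration under stated assumptions, not the project's actual script: generate_tfvars is a hypothetical helper, and the variable names project and zone are guesses; the real names are whatever the Terraform configuration in the repository declares.

```shell
# Hypothetical helper: print a tfvars file body with double-quoted values.
# The variable names "project" and "zone" are assumptions for illustration.
generate_tfvars() {
  printf 'project = "%s"\n' "$1"
  printf 'zone = "%s"\n' "$2"
}

# Possible usage, mirroring the steps above (not run here):
# PROJECT="$(gcloud config get-value project)"
# ZONE="$(gcloud config get-value compute/zone)"
# generate_tfvars "$PROJECT" "$ZONE" > ./terraform/terraform.tfvars
# (cd terraform && terraform init && terraform apply)
```

Note that each value is written inside double quotes, which is the form the variables file expects when you edit it manually.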
If no errors are displayed then after a few minutes you should see your Kubernetes Engine cluster in the GCP Console.
How does Terraform work?
Following the principles of Infrastructure as Code and Immutable Infrastructure, Terraform supports the writing of declarative descriptions of the desired state of infrastructure. When the descriptor is applied, Terraform uses GCP APIs to provision and update resources to match. Terraform compares the desired state with the current state so incremental changes can be made without deleting everything and starting over. For instance, Terraform can build out GCP projects and compute instances, etc., even set up a Kubernetes Engine cluster and deploy applications to it. When requirements change, the descriptor can be updated and Terraform will adjust the cloud infrastructure accordingly.
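To see the comparison between desired and current state for yourself, one option is to save a plan and then apply exactly that plan. The sketch below assumes the configuration lives in a directory you pass as the first argument; plan_and_apply and the tfplan file name are illustrative choices, not something this project defines.

```shell
# Hypothetical helper: preview the changes Terraform would make, then apply
# exactly that saved plan. $1 is the directory holding the configuration.
plan_and_apply() {
  (
    cd "$1" || exit 1
    # Compare the desired state with the current state and record the diff.
    terraform plan -out=tfplan &&
    # Apply only the changes recorded in the saved plan.
    terraform apply tfplan
  )
}

# Example call (not run here):
# plan_and_apply ./terraform
```

Running the helper a second time without changing the configuration should produce a plan with no changes, which is the incremental behavior described above.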
This example will start up a Kubernetes Engine cluster and deploy a simple sample application to it. By default, Kubernetes Engine clusters in GCP are provisioned with a pre-configured Fluentd-based collector that forwards logs to Stackdriver.
If no errors are displayed during deployment, after a few minutes you should see your Kubernetes Engine cluster in the GCP Console with the sample application deployed.
In order to validate that resources are installed and working correctly, run:
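The validation command is not reproduced in this copy of the tutorial. As a rough manual check, assuming you know the cluster name and zone, something like the following fetches credentials and lists the nodes and pods. The validate_cluster helper and the example cluster name are hypothetical.

```shell
# Hypothetical helper: point kubectl at the cluster, then list nodes and pods.
# $1 is the cluster name and $2 is its zone.
validate_cluster() {
  cluster="$1"
  zone="$2"
  gcloud container clusters get-credentials "$cluster" --zone "$zone" &&
  kubectl get nodes &&
  kubectl get pods --all-namespaces
}

# Example call (placeholder names; substitute your own):
# validate_cluster my-cluster us-central1-a
```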
Using Stackdriver Kubernetes Monitoring
For a thorough guide on how to observe your cluster with the new Stackdriver Kubernetes UI, see Observing Your Kubernetes Clusters.
Native Prometheus integration
The Terraform code included a Stackdriver alerting policy that is watching a metric that was originally imported from a Prometheus endpoint.
From the Stackdriver main page, click on Policies Overview to show all the policies, including the alerting policy called Prometheus mem alloc. Clicking on the policy will provide much more detail.
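The policy can probably also be confirmed from the command line. The sketch below assumes the alpha track of gcloud is installed and that gcloud alpha monitoring policies list is available in your SDK version; both helper names are hypothetical.

```shell
# Hypothetical helper: print the display name of every alerting policy.
# Assumes the gcloud alpha components are installed.
list_alert_policies() {
  gcloud alpha monitoring policies list --format='value(displayName)'
}

# Hypothetical helper: succeed if a policy with exactly this name exists.
policy_exists() {
  list_alert_policies | grep -Fxq "$1"
}

# Example call (not run here):
# policy_exists "Prometheus mem alloc" && echo "policy found"
```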
When you are finished with this example and ready to clean up the resources that were created, run the following command to remove all resources and avoid accruing charges:
$ make teardown
This command uses the terraform destroy command to remove the infrastructure. Terraform tracks the resources it creates so it is able to tear them all back down.
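If you want to see what Terraform is tracking before it is removed, a sketch like the following lists the recorded resources and then destroys them. It assumes the configuration directory is passed as the first argument; review_and_destroy is a hypothetical helper and is not what make teardown itself runs.

```shell
# Hypothetical helper: show the resources recorded in Terraform's state,
# then destroy them. $1 is the directory holding the configuration.
review_and_destroy() {
  (
    cd "$1" || exit 1
    # List every resource Terraform created and is still tracking.
    terraform state list &&
    # Remove those resources; Terraform asks for confirmation by default.
    terraform destroy
  )
}

# Example call (not run here):
# review_and_destroy ./terraform
```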
** The install script fails with a Permission denied when running Terraform. **
The credentials that Terraform is using do not provide the necessary permissions to create resources in the selected projects. Ensure that the account listed in gcloud config list has the necessary permissions to create resources. If it does, regenerate the application default credentials using gcloud auth application-default login.
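One way to check which roles an account holds on a project is to flatten and filter the project's IAM policy. This is a sketch only; show_roles is a hypothetical helper, and which roles are actually required depends on the resources the Terraform configuration creates.

```shell
# Hypothetical helper: print the IAM roles bound to an account on a project.
# $1 is the project ID and $2 is the account email.
show_roles() {
  project="$1"
  account="$2"
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$account" \
    --format="value(bindings.role)"
}

# Possible usage with the currently configured account and project (not run here):
# ACCOUNT="$(gcloud config get-value account)"
# PROJECT="$(gcloud config get-value project)"
# show_roles "$PROJECT" "$ACCOUNT"
```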
** Metrics Not Appearing or Uptime Checks not executing **
After the scripts execute, it may take a few minutes for the Metrics or Uptime Checks to appear. Configure the items and give the system some time to generate metrics and checks, as they sometimes take time to complete.
This is not an officially supported Google product.