Logging with Stackdriver on Kubernetes Engine
Stackdriver Logging can be used to aggregate logs from all GCP resources, as well as from custom resources on other platforms, into one centralized store for all logs and metrics. Logs are aggregated and then viewable in the provided Stackdriver Logging UI. They can also be exported to sinks to support more specialized use cases. Currently, Stackdriver Logging supports exporting to the following sinks:
- Cloud Storage
- Cloud Pub/Sub
- BigQuery
This document will describe the steps required to deploy a sample application to Kubernetes Engine that forwards log events to Stackdriver Logging. It makes use of Terraform, a declarative Infrastructure as Code tool that enables configuration files to be used to automate the deployment and evolution of infrastructure in the cloud. The configuration will also create a Cloud Storage bucket and a BigQuery dataset for exporting log data to.
The Terraform configuration builds a Kubernetes Engine cluster that generates logs and metrics that can be ingested by Stackdriver. It also builds Logging Export Sinks for Cloud Storage and BigQuery. The following graphic shows how this fits together, along with the data flow.
The steps described in this document require the installation of several tools and the proper configuration of authentication to allow them to access your GCP resources.
You'll need access to a Google Cloud Project with billing enabled. See Creating and Managing Projects (https://cloud.google.com/resource-manager/docs/creating-managing-projects) for creating a new project. To make cleanup easier it's recommended to create a new project.
Required GCP APIs
The following APIs will be enabled in the project:
- Cloud Resource Manager API
- Kubernetes Engine API
- Stackdriver Logging API
- Stackdriver Monitoring API
- BigQuery API
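If you want to enable these APIs yourself before running Terraform, a minimal sketch follows. It assumes gcloud is already authenticated and pointed at your project; the service identifiers are the standard googleapis.com names for the APIs above, and the helper name `enable_required_apis` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable the APIs listed above in the active gcloud project.
# Assumes gcloud is installed, authenticated, and set to your project.

# Service identifiers for the APIs listed above.
REQUIRED_APIS=(
  cloudresourcemanager.googleapis.com # Cloud Resource Manager API
  container.googleapis.com            # Kubernetes Engine API
  logging.googleapis.com              # Stackdriver Logging API
  monitoring.googleapis.com           # Stackdriver Monitoring API
  bigquery.googleapis.com             # BigQuery API
)

# Hypothetical helper: `gcloud services enable` accepts several services in one call.
enable_required_apis() {
  gcloud services enable "${REQUIRED_APIS[@]}"
}
```

Running `enable_required_apis` once is enough; enabling an API that is already enabled does no harm.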
Run Demo in a Google Cloud Shell
All the tools for the demo are already installed in Cloud Shell. When using Cloud Shell, execute the following command to set up the gcloud CLI. When running this command, set up your region and zone.
Tools
To run the demo you will need the following:
- Terraform >= 0.12
- Google Cloud SDK version >= 204.0.0
- kubectl matching the latest GKE version
- bash or a bash-compatible shell
- GNU Make 3.x or later
- A Google Cloud Platform project where you have permission to create networks
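Before continuing, you can confirm that the executables above are on your PATH. The sketch below checks presence only (the helper `missing_tools` and the `DEMO_TOOLS` list are hypothetical); confirm the minimum versions with `terraform version`, `gcloud version`, `kubectl version --client`, and `make --version`.

```shell
#!/usr/bin/env bash
# Sketch: report which of the required command-line tools are missing from PATH.
# Presence only; minimum versions still have to be checked by hand.

# Hypothetical list of the executables this demo calls.
DEMO_TOOLS=(terraform gcloud kubectl make)

missing_tools() {
  # Print each argument that is not found on PATH; return 1 if any are missing.
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `missing_tools "${DEMO_TOOLS[@]}" || echo "Install the tools listed above before continuing."`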
Install Cloud SDK
The Google Cloud SDK is used to interact with your GCP resources. Installation instructions for multiple platforms are available online.
Install kubectl CLI
The kubectl CLI is used to interact with both Kubernetes Engine and Kubernetes in general. Installation instructions for multiple platforms are available online.
Install Terraform
Terraform is used to automate the manipulation of cloud infrastructure. Its installation instructions are also available online.
The Terraform configuration will execute against your GCP environment and create various resources. The script will use your personal account to build out these resources. To set up the default account the script will use, run the following command to select the appropriate account:
gcloud auth application-default login
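Because ./scripts/generate-tfvars.sh builds terraform.tfvars from your gcloud defaults, it can help to set the project, region, and zone explicitly. This is a sketch with a hypothetical helper, `set_gcloud_defaults`; it relies on GCP zone names being the region name plus a one-letter suffix (for example, us-central1-a) to catch a mismatched pair before calling gcloud.

```shell
#!/usr/bin/env bash
# Sketch: set the gcloud defaults that later steps read (project, region, zone).
# Hypothetical helper; assumes gcloud is installed and authenticated.

set_gcloud_defaults() {
  # Usage: set_gcloud_defaults PROJECT_ID REGION ZONE
  if [ "$#" -ne 3 ]; then
    echo "usage: set_gcloud_defaults PROJECT_ID REGION ZONE" >&2
    return 2
  fi
  local project="$1" region="$2" zone="$3"
  # A zone name is its region name plus a letter suffix, e.g. us-central1-a.
  case "$zone" in
    "$region"-[a-z]) ;;
    *)
      echo "zone '$zone' is not in region '$region'" >&2
      return 2
      ;;
  esac
  gcloud config set project "$project" &&
    gcloud config set compute/region "$region" &&
    gcloud config set compute/zone "$zone"
}
```

Usage: `set_gcloud_defaults YOUR_PROJECT us-central1 us-central1-a`, then run `gcloud config list` to confirm the values.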
How does it work?
Following the principles of Infrastructure as Code and Immutable Infrastructure, Terraform supports the writing of declarative descriptions of the desired state of infrastructure. When the descriptor is applied, Terraform uses GCP APIs to provision and update resources to match. Terraform compares the desired state with the current state so incremental changes can be made without deleting everything and starting over. For instance, Terraform can create GCP projects and compute instances, and can even set up a Kubernetes Engine cluster and deploy applications to it. When requirements change, the descriptor can be updated and Terraform will adjust the cloud infrastructure accordingly.
This example will start up a Kubernetes Engine cluster and deploy a simple sample application to it. By default, Kubernetes Engine clusters in GCP are provisioned with a pre-configured Fluentd-based collector that forwards logs to Stackdriver. Interacting with the sample app will produce logs that are visible in the Stackdriver Logging UI and other log event sinks.
The Terraform configuration takes two parameters to determine where the Kubernetes Engine cluster should be created:
- project
- zone
For simplicity, these parameters should be specified in a file named terraform.tfvars in the terraform directory. To generate this file from your gcloud defaults, run ./scripts/generate-tfvars.sh, which produces a terraform/terraform.tfvars file with the following keys. The values themselves will match the output of gcloud config list:
# Contents of terraform.tfvars
project="YOUR_PROJECT"
zone="YOUR_ZONE"
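The contents of ./scripts/generate-tfvars.sh are not shown in this document. As a hypothetical sketch of the same idea, a script could read the active configuration with `gcloud config get-value` and write the two keys; the function names below are illustrative only, and the sketch refuses to write a file when either value is empty or reported as "(unset)".

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a tfvars generator; this is NOT the repository's
# scripts/generate-tfvars.sh, whose contents are not shown here.

write_tfvars() {
  # Usage: write_tfvars PROJECT ZONE OUTPUT_FILE
  local project="$1" zone="$2" out="$3" value
  for value in "$project" "$zone"; do
    case "$value" in
      "" | "(unset)")
        echo "project and zone must both be set in gcloud" >&2
        return 1
        ;;
    esac
  done
  printf 'project="%s"\nzone="%s"\n' "$project" "$zone" >"$out"
}

tfvars_from_gcloud_defaults() {
  # Read the same values that `gcloud config list` shows.
  local project zone
  project=$(gcloud config get-value project 2>/dev/null)
  zone=$(gcloud config get-value compute/zone 2>/dev/null)
  write_tfvars "$project" "$zone" "${1:-terraform/terraform.tfvars}"
}
```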
Deploying the cluster
There are three Terraform files provided with this example. The first one, main.tf, is the starting point for Terraform. It describes the features that will be used, the resources that will be manipulated, and the outputs that will result. The second file is provider.tf, which indicates which cloud provider and version will be the target of the Terraform commands (in this case GCP). The final file is variables.tf, which contains a list of variables that are used as inputs into Terraform. Any variables referenced in main.tf that do not have defaults configured in variables.tf will result in prompts to the user at runtime.
To build out the environment you can execute the following make command:
$ make create
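The Makefile itself is not shown here. Assuming the create target runs Terraform from the terraform directory (an assumption, not confirmed by this document), a roughly equivalent sequence is sketched below with a hypothetical `create_env` helper. It checks for terraform.tfvars first, since without it Terraform would need the project and zone from somewhere else.

```shell
#!/usr/bin/env bash
# Sketch of what a `make create` target might run; the real Makefile may differ.

create_env() {
  # Usage: create_env [TERRAFORM_DIR]   (defaults to ./terraform)
  local dir="${1:-terraform}"
  if [ ! -f "$dir/terraform.tfvars" ]; then
    echo "missing $dir/terraform.tfvars; generate it first" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (
    cd "$dir" &&
      terraform init -input=false &&
      terraform apply -input=false -auto-approve
  )
}
```

Without `-auto-approve`, `terraform apply` prints the plan and waits for confirmation, which is a safer default when changing an existing environment.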
If no errors are displayed during deployment, after a few minutes you should see your Kubernetes Engine cluster in the GCP Console with the sample application deployed.
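You can also check the deployment from the command line. This sketch assumes the cluster is named stackdriver-logging (the name used in the Logging filter later in this document) and that you pass the zone from terraform.tfvars; `inspect_cluster` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: fetch kubectl credentials for the demo cluster and look at what runs on it.

inspect_cluster() {
  # Usage: inspect_cluster ZONE [CLUSTER_NAME]
  local zone="$1" cluster="${2:-stackdriver-logging}"
  if [ -z "$zone" ]; then
    echo "usage: inspect_cluster ZONE [CLUSTER_NAME]" >&2
    return 2
  fi
  gcloud container clusters get-credentials "$cluster" --zone "$zone" || return
  # The sample application runs in the default namespace.
  kubectl get pods,services --namespace default
  # The node-level logging agent (Fluentd-based on GKE clusters of this era)
  # runs as a DaemonSet in kube-system.
  kubectl get daemonsets --namespace kube-system
}
```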
Now that the application is deployed to Kubernetes Engine we can generate log data and use the Stackdriver UI and other tools to view it.
The sample application that Terraform deployed serves up a simple web page. Each time you open this application in your browser the application will publish log events to Stackdriver Logging. Refresh the page a few times to produce several log events.
To get the URL for the application page you must perform the following steps:
- In the GCP console navigate to the Networking -> Network services page.
- On the default Load balancing page that shows up, click on the TCP load balancer that was set up.
- On the Load balancer details page there is a top section labeled Frontend. Note the IP:Port value as this will be used in the upcoming steps.
Using the IP:Port value you can now access the application. Go to a browser and enter the URL. The browser should return a screen that looks similar to the following:
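As an alternative to the console steps, kubectl can report the load balancer address, and a short loop can request the page repeatedly so the application emits several log events. Both helpers are hypothetical; `list_lb_endpoints` assumes the sample application is exposed through a Service of type LoadBalancer, which is what provisions the TCP load balancer mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: find the load balancer address with kubectl, then generate traffic.

# Print every LoadBalancer Service as NAMESPACE/NAME<TAB>IP:PORT.
list_lb_endpoints() {
  kubectl get services --all-namespaces -o \
    jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}{"\n"}{end}'
}

generate_traffic() {
  # Usage: generate_traffic IP:PORT [COUNT]   (COUNT defaults to 10)
  local endpoint="$1" count="${2:-10}" i
  case "$endpoint" in
    *:*) ;;
    *)
      echo "expected IP:PORT, got '$endpoint'" >&2
      return 2
      ;;
  esac
  case "$count" in
    "" | *[!0-9]*)
      echo "COUNT must be a whole number" >&2
      return 2
      ;;
  esac
  for ((i = 1; i <= count; i++)); do
    # Print only the HTTP status code of each request.
    curl -s -o /dev/null -w '%{http_code}\n' "http://${endpoint}/"
  done
}
```

Usage: `generate_traffic 203.0.113.10:80 20`, replacing the documentation placeholder address with the IP:Port from the Frontend section.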
Logs in the Stackdriver UI
Stackdriver provides a UI for viewing log events. Basic search and filtering features are provided, which can be useful when debugging system issues. The Stackdriver Logging UI is best suited to exploring more recent log events. Users requiring longer-term storage of log events should consider some of the tools in the following sections.
To access the Stackdriver Logging console perform the following steps:
- In the GCP console navigate to the Stackdriver -> Logging page.
- The default logging console will load. On this page, change the resource filter to GKE Container -> stackdriver-logging -> default (stackdriver-logging is the cluster and default is the namespace). Your screen should look similar to the screenshot below.
- On this screen you can expand the bulleted log items to view more complete details about the log entry.
In the logging console you can perform any type of text search, or try the various filters by log type, log level, timeframe, etc.
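The same logs can be read with `gcloud logging read`. The filter below is a sketch: it uses the k8s_container resource type found on current GKE clusters, whereas the older "GKE Container" resource shown in this UI is resource.type="container", which labels the namespace as namespace_id rather than namespace_name. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read recent container logs for one cluster and namespace from the CLI.
# Uses the k8s_container resource; older clusters use resource.type="container"
# with resource.labels.namespace_id instead of namespace_name.

container_log_filter() {
  # Usage: container_log_filter CLUSTER NAMESPACE
  printf 'resource.type="k8s_container" AND resource.labels.cluster_name="%s" AND resource.labels.namespace_name="%s"' "$1" "$2"
}

read_recent_container_logs() {
  # Usage: read_recent_container_logs [CLUSTER] [NAMESPACE]
  # --freshness limits the search to the last hour; --limit caps the entry count.
  gcloud logging read "$(container_log_filter "${1:-stackdriver-logging}" "${2:-default}")" \
    --freshness=1h --limit=20
}
```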
Viewing Log Exports
The Terraform configuration built out two Log Export Sinks. To view the sinks perform the following steps:
- In the GCP console navigate to the Stackdriver -> Logging page.
- The default logging console will load. On the left navigation click on the Exports menu option.
- This will bring you to the Exports page. You should see two Sinks in the list of log exports.
- You can edit/view these sinks by clicking on the context menu to the right and selecting the Edit sink option.
- Additionally, you can create custom export sinks by clicking the Create Export option at the top of the navigation window.
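The Exports page has command-line equivalents in `gcloud logging sinks`. The sketch below lists sinks, prints a sink's writer identity (the service account that needs write access to the destination), and builds destination strings in the formats Logging expects for Cloud Storage, BigQuery, and Pub/Sub. The helper names and all sink, bucket, dataset, and topic names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: inspect export sinks and build destination strings for new ones.

sink_destination() {
  # Usage: sink_destination storage BUCKET
  #        sink_destination bigquery PROJECT DATASET
  #        sink_destination pubsub PROJECT TOPIC
  case "$1" in
    storage) echo "storage.googleapis.com/$2" ;;
    bigquery) echo "bigquery.googleapis.com/projects/$2/datasets/$3" ;;
    pubsub) echo "pubsub.googleapis.com/projects/$2/topics/$3" ;;
    *)
      echo "unknown sink type: $1" >&2
      return 2
      ;;
  esac
}

list_sinks() {
  # Command-line counterpart of the Exports page for the active project.
  gcloud logging sinks list
}

sink_writer_identity() {
  # Usage: sink_writer_identity SINK_NAME
  # This service account must be allowed to write to the sink's destination.
  gcloud logging sinks describe "$1" --format='value(writerIdentity)'
}
```

Usage: `gcloud logging sinks create my-sink "$(sink_destination storage my-bucket)" --log-filter='resource.type="k8s_container"'`.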
Logs in Cloud Storage
Log events can be stored in Cloud Storage, an object storage system suitable for archiving data. Policies can be configured for Cloud Storage buckets that, for instance, allow aging data to expire and be deleted while more recent data can be stored with a variety of storage classes affecting price and availability.
The Terraform configuration created a Cloud Storage Bucket named stackdriver-gke-logging- to which logs will be exported for medium to long-term archival. In this example, the Storage Class for the bucket is defined as Nearline because the logs should be infrequently accessed in a normal production environment (this will help to manage the costs of medium-term storage). In a production scenario this bucket may also include a lifecycle policy that moves the content to Coldline storage for cheaper long-term storage of logs.
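The Coldline lifecycle policy mentioned above is described as an option for production, not as part of this example. A minimal sketch of such a policy follows, assuming you apply it with `gsutil lifecycle set` and choose the age threshold yourself; the helper name and file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a lifecycle policy that moves Nearline objects to Coldline
# once they are AGE_DAYS days old. The threshold is an illustrative choice.

write_coldline_policy() {
  # Usage: write_coldline_policy AGE_DAYS OUTPUT_FILE
  local age="$1" out="$2"
  case "$age" in
    "" | *[!0-9]*)
      echo "AGE_DAYS must be a whole number" >&2
      return 2
      ;;
  esac
  cat >"$out" <<EOF
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": ${age}, "matchesStorageClass": ["NEARLINE"]}
    }
  ]
}
EOF
}
```

Usage: `write_coldline_policy 90 lifecycle.json && gsutil lifecycle set lifecycle.json "gs://YOUR_BUCKET"`, where YOUR_BUCKET is the full name of the stackdriver-gke-logging- bucket as shown in the console.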
To access the Stackdriver logs in Cloud Storage perform the following steps:
Note: Logs from Cloud Storage Export are not populated immediately. It may take up to 2-3 hours for logs to appear.
- In the GCP console navigate to the Storage -> Storage page.
- This loads the Cloud Storage Browser page. On the page find the Bucket with the name stackdriver-gke-logging-, and click on the name (which is a hyperlink).
- This will show the details of the bucket. You should see a list of directories corresponding to pods running in the cluster (for example autoscaler and dnsmasq).
On this page you can click into any of the named folders to browse specific log details like heapster, kubedns, sidecar, etc.
Logs in BigQuery
Stackdriver log events can be configured to be published to BigQuery, a data warehouse tool that supports fast, sophisticated querying over large data sets.
The Terraform configuration will create a BigQuery dataset named gke_logs_dataset. This dataset will be set up to hold only the last hour of Kubernetes Engine related logs (by setting a Default Table Expiration for the dataset). A Stackdriver Export will be created that pushes Kubernetes Engine container logs to the dataset.
To access the Stackdriver logs in BigQuery perform the following steps:
Note: The BigQuery Export is not populated immediately. It may take a few minutes for logs to appear.
- In the GCP console navigate to the Big Data -> BigQuery page.
- This loads a new browser tab with the BigQuery console.
- The left-hand panel displays the datasets you have access to. You should see a dataset named gke_logs_dataset. Expand this dataset to view the tables that exist (Note: the dataset is created immediately, but its tables are only created as logs are written to it).
- Click on one of the tables to view the table details. Your screen should look similar to the screenshot below.
- Review the schema of the table to note the column names and their data types. This information can be used in the next step when we query the table to look at the data.
- Click on the Query Table towards the top right to perform a custom query against the table.
- This opens the query window. You can simply add an asterisk (*) after the Select in the window to pull all details from the current table. Note: A 'Select *' query is generally very expensive and not advised. For this tutorial the dataset is limited to only the last hour of logs so the overall dataset is relatively small.
- Click the Run Query button to execute the query and return some results from the table.
- A popup window will ask you to confirm running the query. Click the Run Query button on this window as well.
- The results window should display some rows and columns. You can scroll through the various rows of data that are returned, or download the results to a local file.
- Execute some custom queries that filter for specific data based on the results that were shown in the original query.
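Queries can also be run with the bq command-line tool. In this sketch, PROJECT and TABLE are placeholders (list the tables the export has created with `bq ls gke_logs_dataset`), the helper names are hypothetical, and the selected columns assume text log entries, which the export stores in textPayload. Naming columns instead of using SELECT * keeps the bytes scanned down, and `--dry_run` reports that number before anything runs.

```shell
#!/usr/bin/env bash
# Sketch: query exported GKE logs with bq instead of the console.

recent_logs_sql() {
  # Usage: recent_logs_sql PROJECT TABLE [LIMIT]
  # Selects named columns rather than SELECT * to reduce bytes scanned.
  printf 'SELECT timestamp, severity, textPayload FROM `%s.gke_logs_dataset.%s` ORDER BY timestamp DESC LIMIT %d' \
    "$1" "$2" "${3:-20}"
}

run_recent_logs_query() {
  # Usage: run_recent_logs_query PROJECT TABLE [LIMIT]
  # --dry_run reports the bytes the query would process without running it.
  bq query --use_legacy_sql=false --dry_run "$(recent_logs_sql "$@")" &&
    bq query --use_legacy_sql=false "$(recent_logs_sql "$@")"
}
```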
Teardown
When you are finished with this example you will want to clean up the resources that were created so that you avoid accruing charges:
$ make teardown
Since Terraform tracks the resources it created it is able to tear them all down.
Having used Terraform to deploy an application to Kubernetes Engine, generated logs, and viewed them in Stackdriver, you might consider exploring Stackdriver Monitoring and Stackdriver Tracing. Examples for these topics are available here and build on the work performed with this document.
Troubleshooting
** The install script fails with a Permission denied error when running Terraform. **
The credentials that Terraform is using do not provide the necessary permissions to create resources in the selected projects. Ensure that the account listed in gcloud config list has the necessary permissions to create resources. If it does, regenerate the application default credentials using gcloud auth application-default login.
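To see which roles the active account holds on the project, a sketch using `gcloud projects get-iam-policy` follows. It only shows roles granted directly to that account, not roles inherited through groups, folders, or the organization, so an empty result is not conclusive. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list IAM roles granted directly to the active gcloud account on a project.
# Roles inherited via groups, folders, or the organization are not shown.

roles_for_active_account() {
  # Usage: roles_for_active_account [PROJECT_ID]   (defaults to the gcloud project)
  local project account
  project="${1:-$(gcloud config get-value project 2>/dev/null)}"
  account=$(gcloud config get-value account 2>/dev/null)
  if [ -z "$project" ] || [ -z "$account" ]; then
    echo "no project or account configured in gcloud" >&2
    return 2
  fi
  # One row per (member, role) pair, filtered to the active account.
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${account}" \
    --format="value(bindings.role)"
}
```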
** Cloud Storage Bucket not populated ** Once the Terraform configuration is complete, the Cloud Storage Bucket will be created, but it is rarely populated immediately with log data from the Kubernetes Engine cluster. Give the process some time, because it can take up to 2 to 3 hours before the first entries start appearing (https://cloud.google.com/logging/docs/export/using_exported_logs).
** No tables created in the BigQuery dataset ** Once the Terraform configuration is complete, the BigQuery dataset will be created, but it will not always have tables in it by the time you go to review the results, because tables are rarely populated immediately. Give the process some time (minimum of 5 minutes) before determining that something is not working properly.
Relevant Material
- Kubernetes Engine Logging
- Viewing Logs
- Advanced Logs Filters
- Overview of Logs Exports
- Processing Logs at Scale Using Cloud Dataflow
- Terraform Google Cloud Provider
This is not an officially supported Google product