Google Cloud Platform Tools BOSH Release

stackdriver-tools release for BOSH

This release provides Cloud Foundry and BOSH integration with Google Cloud Platform's Stackdriver Logging and Monitoring.

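Functionality is provided by 3 jobs in this release:

  • stackdriver-nozzle forwards Cloud Foundry firehose data to Stackdriver
  • google-fluentd forwards host logs (syslog and job logs) to Stackdriver Logging
  • stackdriver-agent collects host metrics and sends them to Stackdriver Monitoring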

Project Status

This is currently a beta release. Use it in production environments only with an abundance of caution, and only after vetting it in a dev environment.

The project was developed in partnership with Google and Pivotal and is actively maintained by Google.

Getting started

Enable Stackdriver APIs

Ensure the Stackdriver Logging and Stackdriver Monitoring APIs are enabled.

Quotas

Depending on the size of the Cloud Foundry deployment and which events the nozzle is forwarding, it can be quite easy to reach the default Stackdriver quotas.

Google quotas can be viewed and managed on the API Quotas Page. An operator can raise a quota up to a fixed limit; to go beyond that limit, use the contact links to request a higher quota.
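
As a rough way to reason about quota headroom, the following Go sketch estimates write requests per minute from an event rate. It is only a back-of-the-envelope model: requestsPerMinute, entriesPerRequest, and quotaPerMinute are hypothetical names, the numbers are placeholders, and nothing here reflects how the nozzle actually batches requests. Read the real quota values from the API Quotas Page.

```go
package main

import "fmt"

// requestsPerMinute estimates how many API write requests per minute are
// needed to forward eventsPerSecond events when each request carries
// entriesPerRequest entries (a hypothetical batching assumption).
func requestsPerMinute(eventsPerSecond, entriesPerRequest int) int {
	if eventsPerSecond <= 0 || entriesPerRequest <= 0 {
		return 0
	}
	perMinute := eventsPerSecond * 60
	// Round up: a partially filled request still counts as one request.
	return (perMinute + entriesPerRequest - 1) / entriesPerRequest
}

func main() {
	// Placeholder value, NOT a real Stackdriver quota.
	const quotaPerMinute = 6000

	for _, eps := range []int{500, 5000, 50000} {
		rpm := requestsPerMinute(eps, 100)
		fmt.Printf("%6d events/s -> %6d requests/min (over quota: %v)\n",
			eps, rpm, rpm > quotaPerMinute)
	}
}
```

Plugging in a measured firehose event rate and the quota shown for your project gives a first indication of whether a quota increase is worth requesting before deploying.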

Create and configure service accounts

All of the jobs in this release authenticate to Stackdriver Logging and Monitoring via Service Accounts. You must create a service account with the following roles:

  • roles/logging.logWriter to stream logs to Stackdriver Logging
  • roles/logging.configWriter to set up Cloud Foundry-specific metrics on Stackdriver Monitoring

The BOSH resource pool you deploy the job(s) to must use that service account by specifying it in cloud_properties. The BOSH Google CPI documentation describes how to set the service_account for a resource pool.

You may also read the access control documentation for more general information about how authentication and authorization work for Stackdriver.

General usage

To use any of the jobs in this BOSH release, first upload it to your BOSH director:

bosh upload release https://storage.googleapis.com/bosh-gcp/beta/stackdriver-tools/latest.tgz

The stackdriver-tools.yml sample deployment manifest illustrates how to use all 3 jobs in this release (nozzle, host logging, and host monitoring). You can deploy the sample with:

bosh deployment manifests/stackdriver-tools.yml 
bosh -n deploy

This will create a self-contained deployment that sends Cloud Foundry firehose data, host logs, and host metrics to Stackdriver.

Deploying each job individually is described in detail below.

Deploying the nozzle

Create a new deployment manifest for the nozzle. See the example manifest for a full deployment and the jobs.stackdriver-nozzle section for the nozzle.

To reduce message loss, operators should run a minimum of two instances. With two instances, updating stemcells and other destructive BOSH operations will still leave an instance draining logs.

The loggregator system will round-robin messages across multiple instances. If the nozzle can't handle the load, consider scaling to more than two nozzle instances.
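
To get a feel for how adding instances reduces per-instance load, here is a minimal Go sketch of round-robin distribution. It is an idealized model under the assumption of a strictly even rotation; distribute is a hypothetical helper, not part of loggregator or the nozzle, and real traffic will not split this cleanly.

```go
package main

import "fmt"

// distribute hands out messages to nozzle instances in strict round-robin
// order and returns how many messages each instance received.
func distribute(messages, instances int) []int {
	if messages < 0 || instances <= 0 {
		return nil
	}
	counts := make([]int, instances)
	for i := 0; i < messages; i++ {
		counts[i%instances]++
	}
	return counts
}

func main() {
	// Illustrative message count only.
	for _, n := range []int{2, 3, 4} {
		fmt.Printf("%d instances: %v\n", n, distribute(1200, n))
	}
}
```

In this model each instance handles roughly 1/N of the messages, which is why scaling out is the suggested remedy when a nozzle falls behind.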

The spec describes all the properties an operator should modify.

Stackdriver Error Reporting

Stackdriver can automatically detect and report errors from stack traces in logs. However, this does not automatically work with Loggregator because it sends each line from app output as a separate log message to the nozzle. To enable this feature of Stackdriver, apps will need to manually encode stacktraces on a single line so that the stackdriver-nozzle can send them as single messages to Stackdriver.

This is accomplished by replacing the newlines in a stacktrace with a unique character. The same character is configured in the nozzle through the firehose.newline_token template variable, so that the nozzle can restore the newlines and reconstruct the multi-line stacktrace.

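For example, if firehose.newline_token is set to a single distinctive character (∴ is used below purely as an illustration), a Go app would need to implement something like the following:

package main

import (
    "fmt"
    "os"
    "runtime"
    "strings"
)

// newlineToken must match the nozzle's firehose.newline_token property.
// The value here is only an example.
const newlineToken = "∴"

func main() {
    // ...
    defer handlePanic()
    // ...
}

// handlePanic logs the panic message on its own line, then logs the
// stacktrace encoded as a single line.
func handlePanic() {
    e := recover()
    if e == nil {
        return
    }

    stack := make([]byte, 1<<16)
    stackSize := runtime.Stack(stack, true)
    out := string(stack[:stackSize])

    fmt.Fprintf(os.Stderr, "panic: %v\n", e)
    // Fprintln avoids treating the stacktrace as a format string.
    fmt.Fprintln(os.Stderr, strings.Replace(out, "\n", newlineToken, -1))
    os.Exit(1)
}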

This outputs the stacktrace separately from the panic so that the panic remains in the logs and the stacktrace is logged by itself. This allows Stackdriver to detect the stacktrace as an error.
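
The reverse step, restoring newlines from the token, can be sketched in Go as follows. This is not the nozzle's actual implementation; encodeStack and decodeStack are hypothetical helpers, and the ∴ token and the sample stacktrace are illustrative values. The sketch assumes the token is non-empty and never occurs in the stacktrace itself.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeStack mirrors what an app does before logging: every newline is
// replaced by the token so the stacktrace fits on one line.
func encodeStack(stack, token string) string {
	return strings.ReplaceAll(stack, "\n", token)
}

// decodeStack mirrors what a receiver would do: every token is turned back
// into a newline. The token must be non-empty and must not occur in the
// original stacktrace, otherwise the round trip is not exact.
func decodeStack(line, token string) string {
	return strings.ReplaceAll(line, token, "\n")
}

func main() {
	const token = "∴" // example token only; use your firehose.newline_token value

	stack := "goroutine 1 [running]:\nmain.handlePanic()\n\t/app/main.go:25 +0x65"
	line := encodeStack(stack, token)

	fmt.Println("single line:", !strings.Contains(line, "\n"))
	fmt.Println("round trip: ", decodeStack(line, token) == stack)
}
```

Choosing a character that cannot plausibly appear in application output is what keeps the decoding unambiguous.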

For an example in Java, see this section of the Loggregator documentation.

Deploying host logging

The google-fluentd template uses Fluentd to send both syslog and template logs (assuming that template jobs are writing logs into /var/vcap/sys/log/*/*.log) to Stackdriver Logging.
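
To check which files a pattern like /var/vcap/sys/log/*/*.log covers, the following Go sketch uses filepath.Match, where * does not cross directory separators. Fluentd's own glob handling may differ in details, so treat this as an approximation; matchesJobLogPattern and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// jobLogPattern is the path pattern quoted in this README.
const jobLogPattern = "/var/vcap/sys/log/*/*.log"

// matchesJobLogPattern reports whether path matches jobLogPattern using Go's
// glob rules, in which "*" matches any run of non-separator characters.
func matchesJobLogPattern(path string) bool {
	ok, err := filepath.Match(jobLogPattern, path)
	return err == nil && ok
}

func main() {
	paths := []string{
		"/var/vcap/sys/log/nats/nats.log",        // exactly one job directory
		"/var/vcap/sys/log/nats/nats.stdout.log", // extra dots are fine
		"/var/vcap/sys/log/top-level.log",        // no job directory
		"/var/vcap/sys/log/nats/sub/deep.log",    // nested one level too deep
		"/var/vcap/sys/log/nats/nats.txt",        // wrong extension
	}
	for _, p := range paths {
		fmt.Printf("%-40s %v\n", p, matchesJobLogPattern(p))
	}
}
```

Under these rules, only files with a .log extension that sit exactly one directory below /var/vcap/sys/log match, so a job that logs elsewhere would not be picked up.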

To forward host logs from BOSH VMs to Stackdriver, co-locate the google-fluentd template with an existing job whose host logs should be forwarded.

Include the stackdriver-tools release in your existing deployment manifest:

releases:
  ...
  - name: stackdriver-tools
    version: latest
  ...

Add the google-fluentd template to your job:

jobs:
  ...
  - name: nats
    templates:
      - name: nats
        release: cf
      - name: metron_agent
        release: cf
      - name: google-fluentd
        release: stackdriver-tools
  ...

Deploying host monitoring

The stackdriver-agent template uses the Stackdriver Monitoring Agent to collect VM metrics to send to Stackdriver Monitoring.

To forward host metrics from BOSH VMs to Stackdriver, co-locate the stackdriver-agent template with an existing job whose host metrics should be forwarded.

Include the stackdriver-tools release in your existing deployment manifest:

releases:
  ...
  - name: stackdriver-tools
    version: latest
  ...

Add the stackdriver-agent template to your job:

jobs:
  ...
  - name: nats
    templates:
      - name: nats
        release: cf
      - name: metron_agent
        release: cf
      - name: stackdriver-agent
        release: stackdriver-tools
  ...

Deploying as a BOSH addon

Specify the jobs as addons in your runtime config to deploy Stackdriver Monitoring and Logging agents on all instances in your deployment. Do not specify the jobs as part of your deployment manifest if you are using the runtime config.

# runtime.yml
---
releases:
  - name: stackdriver-tools
    version: latest

addons:
- name: stackdriver-tools
  jobs:
  - name: google-fluentd
    release: stackdriver-tools
  - name: stackdriver-agent
    release: stackdriver-tools

To deploy the runtime config:

bosh update runtime-config runtime.yml
bosh deploy

Development

Updating google-fluentd

google-fluentd is versioned by the Gemfile in src/google-fluentd. To update fluentd:

  1. Update the version specifier in the Gemfile (if necessary)
  2. Update Gemfile.lock: bundle update
  3. Create a vendor cache from the Gemfile.lock: bundle package
  4. Tar and compress the vendor folder: tar zvc vendor > google-fluentd-vendor-VERSION-NUMBER.tgz
  5. Update the vendor version in the google-fluentd package packaging and spec
  6. Add vendored cache to the BOSH blobstore: bosh add blob google-fluentd-vendor-VERSION-NUMBER.tgz google-fluentd-vendor
  7. Create a dev release and deploy it to verify that all of the above worked
  8. Update the BOSH blobstore: bosh upload blobs
  9. Commit your changes

bosh-lite

Both the nozzle and the fluentd jobs can run on bosh-lite. To generate a working manifest, start from the bosh-lite-example-manifest. Note the application_default_credentials property, which should be filled in with the contents of a Google service account key.
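
Before pasting a key into application_default_credentials, it can help to sanity-check the JSON. The Go sketch below is a hypothetical helper (checkKey is not part of this release) that only verifies a few fields commonly present in a Google service account key file; it does not contact Google or prove that the key is valid. The sample key content is made up.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// serviceAccountKey holds the subset of key-file fields this sketch checks.
type serviceAccountKey struct {
	Type        string `json:"type"`
	ProjectID   string `json:"project_id"`
	PrivateKey  string `json:"private_key"`
	ClientEmail string `json:"client_email"`
}

// checkKey returns an error if raw does not look like a service account key.
func checkKey(raw []byte) error {
	var k serviceAccountKey
	if err := json.Unmarshal(raw, &k); err != nil {
		return fmt.Errorf("not valid JSON: %w", err)
	}
	if k.Type != "service_account" {
		return errors.New(`field "type" is not "service_account"`)
	}
	if k.ProjectID == "" || k.PrivateKey == "" || k.ClientEmail == "" {
		return errors.New("project_id, private_key, or client_email is empty")
	}
	return nil
}

func main() {
	// Made-up sample; a real key file contains more fields.
	sample := []byte(`{
		"type": "service_account",
		"project_id": "my-project",
		"private_key": "-----BEGIN PRIVATE KEY-----\n...",
		"client_email": "nozzle@my-project.iam.gserviceaccount.com"
	}`)

	fmt.Println("sample key ok:", checkKey(sample) == nil)
	fmt.Println("wrong type rejected:", checkKey([]byte(`{"type":"authorized_user"}`)) != nil)
}
```

A check like this catches the most common copy-and-paste mistakes, such as truncated JSON or a credential file of the wrong type.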

Contributing

For details on how to contribute to this project, including filing bug reports and contributing code changes, please see CONTRIBUTING.md.

Copyright

Copyright (c) 2016 Ferran Rodenas. See LICENSE for details.