DC/OS Distributed Diagnostics Tool & Aggregation Service

dcos-diagnostics

dcos-diagnostics is a monitoring agent that exposes an HTTP API, served under the /system/health/v1 DC/OS API path. The dcos-diagnostics puller collects data from the agents and reports the health of each node, covering system resources as well as DC/OS-specific services.

API

API documentation can be found in the docs directory. It uses OpenAPI v3.0. You can see a rendered version here.

Health Status

Enum Meaning
0 working
1 error
3 unknown
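
As an illustration of how a client might interpret these values, here is a minimal Go sketch. The type name `HealthStatus` and its constants are hypothetical and are not taken from the dcos-diagnostics source; only the numeric values and their meanings come from the table above. Values missing from the table (such as 2) are reported as unrecognized rather than guessed.

```go
package main

import "fmt"

// HealthStatus is a hypothetical type for the numeric status values
// listed in the Health Status table.
type HealthStatus int

const (
	StatusWorking HealthStatus = 0
	StatusError   HealthStatus = 1
	StatusUnknown HealthStatus = 3
)

// String returns the meaning from the table. Any value that the table
// does not list is reported as unrecognized.
func (s HealthStatus) String() string {
	switch s {
	case StatusWorking:
		return "working"
	case StatusError:
		return "error"
	case StatusUnknown:
		return "unknown"
	default:
		return fmt.Sprintf("unrecognized(%d)", int(s))
	}
}

func main() {
	for _, s := range []HealthStatus{0, 1, 2, 3} {
		fmt.Printf("%d -> %s\n", int(s), s)
	}
}
```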

Build

go get github.com/dcos/dcos-diagnostics
cd $GOPATH/src/github.com/dcos/dcos-diagnostics
make install
./dcos-diagnostics --version

Run

Run dcos-diagnostics once on a DC/OS host to check systemd units:

dcos-diagnostics --diag

Get verbose log output:

dcos-diagnostics --diag --verbose

Run the dcos-diagnostics aggregation service to query all cluster hosts for health state:

dcos-diagnostics daemon --pull

Start the dcos-diagnostics health API endpoint:

dcos-diagnostics daemon
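
Once the daemon is running, its health API can be queried over HTTP. The following Go sketch builds the URL and prints the raw response. It assumes that the /system/health/v1 path is served directly on the daemon's web server port (1050 by default) and makes no assumption about the response schema, which is described in the docs directory. The helper names `healthURL` and `fetchHealth` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

// healthURL builds the health API URL for one host. The path comes from
// the README; serving it directly on the daemon port is an assumption.
func healthURL(host string, port int, forceTLS bool) string {
	scheme := "http"
	if forceTLS {
		scheme = "https"
	}
	u := url.URL{
		Scheme: scheme,
		Host:   net.JoinHostPort(host, strconv.Itoa(port)),
		Path:   "/system/health/v1",
	}
	return u.String()
}

// fetchHealth performs a GET request and returns the raw response body.
func fetchHealth(client *http.Client, rawURL string) ([]byte, error) {
	resp, err := client.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	client := &http.Client{Timeout: 3 * time.Second}
	body, err := fetchHealth(client, healthURL("127.0.0.1", 1050, false))
	if err != nil {
		fmt.Fprintln(os.Stderr, "health query failed:", err)
		return
	}
	fmt.Println(string(body))
}
```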

dcos-diagnostics daemon options

--agent-port int
    Use TCP port to connect to agents. (default 1050)

--ca-cert string
    Use certificate authority.

--command-exec-timeout int
    Set command execution timeout. (default 120)

--diag
    Get diagnostics output once on the CLI. Does not expose API.

--diagnostics-bundle-dir string
    Set a path to store diagnostic bundles (default "/var/run/dcos/dcos-diagnostics/diagnostic_bundles")

--diagnostics-job-timeout int
    Set a global diagnostics job timeout (default 720)

--diagnostics-units-since string
    Collect systemd unit logs since the given time. (default "24 hours ago")

--diagnostics-url-timeout int
    Set a local timeout for every single GET request to a log endpoint (default 2)

--endpoint-config string
    Use endpoints_config.json (default "/opt/mesosphere/endpoints_config.json")

--exhibitor-ip string
    Use Exhibitor IP address to discover master nodes. (default "http://127.0.0.1:8181/exhibitor/v1/cluster/status")

--force-tls
    Use HTTPS to do all requests.

--health-update-interval int
    Set update health interval in seconds. (default 60)

--master-port int
    Use TCP port to connect to masters. (default 1050)

--port int
    Web server TCP port. (default 1050)

--pull
    Try to pull checks from DC/OS hosts.

--pull-interval int
    Set pull interval in seconds. (default 60)

--pull-timeout int
    Set pull timeout. (default 3)

--verbose
    Use verbose debug output.

--version
    Print version.

Test

make test

Or from any package directory:

go test