AppCop

Marathon AppCop - Marathon applications law enforcement.

In large Mesos deployments, thousands of applications may be running and deployed every day. Some of them end up broken, forgotten, or unmaintained, which puts pressure on the cluster in numerous ways.

To address this, AppCop clears broken application deployments from Marathon.

How it works

AppCop consumes application failure information from the Marathon event stream and scales the failing applications down.

Scoring Mechanism

Based on Marathon events (TASK_KILL, TASK_FAIL, TASK_FINISHED), AppCop builds a score registry with an entry for each application that emits events. Every app event increments that application's score, so the score keeps rising for as long as failure-related events keep coming. When an application passes the threshold, AppCop forcefully scales it down by one instance and puts the appcop label in the app definition. After that, the score for this application is reset. When only one instance remains and the score passes the threshold, the application is suspended. Scores are also reset periodically.
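The scoring rules above can be sketched roughly as follows. This is a minimal illustration, not AppCop's actual code: the `Registry` type, its methods, and the choice to treat "passes the threshold" as "greater than or equal to" are all assumptions made for the example.

```go
package main

import "fmt"

// Action is the penalty chosen for an application. The string values
// mirror the values of the appcop label described in this README.
type Action string

const (
	NoAction  Action = "none"
	ScaleDown Action = "scaleDown"
	Suspend   Action = "suspend"
)

// Registry is a hypothetical in-memory score registry keyed by app ID.
type Registry struct {
	threshold int
	scores    map[string]int
}

// NewRegistry creates an empty registry with the given threshold.
func NewRegistry(threshold int) *Registry {
	return &Registry{threshold: threshold, scores: make(map[string]int)}
}

// Record increments the score of appID for one failure-related event.
func (r *Registry) Record(appID string) {
	r.scores[appID]++
}

// Evaluate compares the score of appID with the threshold. If the
// threshold is reached (assumption: >=), the score is reset and a
// penalty is returned: Suspend when at most one instance is left,
// ScaleDown otherwise.
func (r *Registry) Evaluate(appID string, instances int) Action {
	if r.scores[appID] < r.threshold {
		return NoAction
	}
	r.scores[appID] = 0
	if instances <= 1 {
		return Suspend
	}
	return ScaleDown
}

// ResetAll clears every score, as the periodic reset would.
func (r *Registry) ResetAll() {
	r.scores = make(map[string]int)
}

func main() {
	reg := NewRegistry(3)
	for i := 0; i < 3; i++ {
		reg.Record("/broken-app")
	}
	fmt.Println(reg.Evaluate("/broken-app", 2)) // prints "scaleDown"
	fmt.Println(reg.Evaluate("/broken-app", 2)) // prints "none" (score was reset)
}
```

In this sketch the threshold plays the role of the scale-down-score option, and `ResetAll` would be called once per reset-interval.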

Garbage Collection

AppCop periodically fetches applications and groups from Marathon. When an application has been suspended, or a group has been empty, for a long (configurable) time, it is deleted.

Metrics

AppCop provides a set of standard system metrics as well as application-based metrics.

Metric Types

System Metrics - AppCop-specific telemetry (e.g. queue size, event delays). Published under metrics-prefix with metrics-system-sub-prefix appended.

Application Metrics - application telemetry calculated from events provided by Marathon (e.g. task_killed and task_finished counters). Published under metrics-prefix with metrics-app-sub-prefix appended.

Please note the appid-prefix config option: if set, the matching string is removed from the application ID when metrics are published. For example, assuming

appid-prefix = com.example.
appID = com.example.exampleapp

your application's metrics will be placed under:

{prefix}.{metrics-app-sub-prefix}.exampleapp
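As a rough illustration of the naming rule, the sketch below builds the metric location from the configured parts. The function name `appMetricPath` and the sample prefix `stats` are made up for the example, and stripping appid-prefix only from the start of the ID is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// appMetricPath joins the metrics prefix, the application sub-prefix and
// the application ID with appIDPrefix stripped from its beginning.
func appMetricPath(prefix, appSubPrefix, appIDPrefix, appID string) string {
	name := strings.TrimPrefix(appID, appIDPrefix)
	return strings.Join([]string{prefix, appSubPrefix, name}, ".")
}

func main() {
	path := appMetricPath("stats", "applications", "com.example.", "com.example.exampleapp")
	fmt.Println(path) // prints "stats.applications.exampleapp"
}
```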

Installation

Installing from source code

To simply compile and run the source code:

go run main.go [options]

To run the tests:

make test

To build the binary:

make build

To build the deb package:

make pack

Check the dist/ directory for the resulting package.

Setting up AppCop

AppCop should be installed on all Marathon masters. The event subscription should be set to localhost to reduce network traffic. Refer to the Options section for more details.

Marathon Labels

AppCop uses Marathon labels to communicate its actions and to tune its execution logic.

Used labels:

| Name | Possible values | r/w | Description |
| --- | --- | --- | --- |
| appcop | suspend, scaleDown | w | Every time AppCop scales down or suspends an application, it puts the appropriate label in the app definition |
| APP_IMMUNITY | false, true | r | When AppCop encounters this label in an app definition, it treats the application as immune to all penalties (excused from all criminal acts on the cluster). Use this feature wisely: if applied too often, it could defeat the whole purpose of using AppCop |

r - the label is read from the app definition and not altered; w - the label is written by AppCop.
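A minimal sketch of how the two labels could be handled is shown below. The helper names `isImmune` and `markPenalty` are hypothetical, and treating only the exact value "true" as immunity is an assumption.

```go
package main

import "fmt"

// isImmune reports whether the app definition carries APP_IMMUNITY=true.
// A missing label or any other value is treated as not immune.
func isImmune(labels map[string]string) bool {
	return labels["APP_IMMUNITY"] == "true"
}

// markPenalty returns a copy of labels with the appcop label set to the
// given action ("scaleDown" or "suspend"). The input map is not modified.
func markPenalty(labels map[string]string, action string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out["appcop"] = action
	return out
}

func main() {
	labels := map[string]string{"team": "payments"}
	if !isImmune(labels) {
		labels = markPenalty(labels, "scaleDown")
	}
	fmt.Println(labels["appcop"]) // prints "scaleDown"
}
```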

Options

| Argument | Default | Description |
| --- | --- | --- |
| config-file | | Path to a JSON file to read configuration from. Note: overrides options set earlier on the command line |
| event-stream-location | /v2/events | Get events from this stream |
| my-leader | marathon-dev | Leader identity. When the Marathon /v2/leader endpoint returns the same string as this one, AppCop subscribes to the event stream and launches jobs |
| events-queue-size | 1000 | Size of the events queue |
| listen | :4444 | Accept connections at this address |
| log-file | | Save logs to a file (e.g. /var/log/appcop.log). If empty, logs are written to STDERR |
| log-format | text | Log format: JSON, text |
| log-level | info | Log level: panic, fatal, error, warn, info or debug |
| marathon-location | example.com:8080 | Marathon URL |
| marathon-password | | Marathon password for basic auth |
| marathon-protocol | http | Marathon protocol (http or https) |
| marathon-ssl-verify | true | Verify certificates when connecting via SSL |
| marathon-timeout | 30s | Time limit for requests made by the Marathon HTTP client. A timeout of zero means no timeout |
| appid-prefix | | Prefix common to all fully qualified application IDs. This prefix is removed from application IDs (see [Metric Types](#metric-types)) |
| marathon-username | | Marathon username for basic auth |
| scale-down-score | 30 | Score at which an application is scaled down by one instance |
| scale-limit | 2 | Maximum number of scale-down actions to perform in one scale-down iteration |
| update-interval | 2s | Interval for updating app scores |
| reset-interval | 1d | How often collected scores are reset |
| evaluate-interval | 30s | How often collected scores are compared against scale-down-score |
| metrics-interval | 30s | Metrics reporting interval |
| metrics-location | | Graphite URL (used when metrics-target is set to graphite) |
| metrics-prefix | default | Metrics prefix (default is resolved to `.<app_name>`) |
| metrics-system-sub-prefix | appcop-internal | System-specific metrics. Appended to metrics-prefix |
| metrics-app-sub-prefix | applications | Application-specific metrics. Appended to metrics-prefix |
| metrics-target | stdout | Metrics destination: stdout or graphite (an empty string disables metrics) |
| workers-pool-size | 10 | Number of concurrent workers processing events |
| mgc-enabled | true | Enable Marathon garbage collection; old suspended applications will be deleted |
| mgc-max-suspend-time | 7 days | How long an application should be suspended before it is deleted |
| mgc-interval | 8 hours | Marathon GC interval |
| mgc-appcop-only | true | Delete only applications suspended by AppCop |
| dry-run | false | Perform a trial run with no changes made to Marathon |
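The my-leader check could be sketched as below. This is an illustration only: it assumes the /v2/leader response body is a JSON object with a single `leader` field, and the `isLeader` helper and the sample host names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaderResponse models the assumed shape of a /v2/leader response body.
type leaderResponse struct {
	Leader string `json:"leader"`
}

// isLeader reports whether the leader named in body equals myLeader.
func isLeader(body []byte, myLeader string) (bool, error) {
	var resp leaderResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	return resp.Leader == myLeader, nil
}

func main() {
	body := []byte(`{"leader": "marathon-dev:8080"}`)
	ok, err := isLeader(body, "marathon-dev:8080")
	if err != nil {
		fmt.Println("cannot parse leader response:", err)
		return
	}
	fmt.Println(ok) // prints "true"
}
```

Only the instance for which this check succeeds would subscribe to the event stream, which fits the recommendation to install AppCop on every Marathon master.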

Endpoints

| Endpoint | Description |
| --- | --- |
| /health | Health check; returns OK |
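For illustration, a handler that behaves like the /health endpoint can be written with the standard library alone. The names `healthHandler` and `probeHealth` are hypothetical, and this is a sketch rather than AppCop's own handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler answers every request with status 200 and the body "OK".
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "OK")
}

// probeHealth registers the handler on a fresh mux and sends it one
// in-process GET /health request, returning the status code and body.
func probeHealth() (int, string) {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", healthHandler)

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	mux.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probeHealth()
	fmt.Println(code, body) // prints "200 OK"
}
```

A real deployment would serve the mux with `http.ListenAndServe` on the address given by the listen option.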