Some docs #29 (merged, 2 commits, Sep 30, 2016)
doc/configuration.md (88 additions)
# Configuration

## What does terraboot do?

It generates [Terraform](https://www.terraform.io/)-readable JSON. Make sure you have the latest Terraform installed; bugs do tend to get fixed from one version to the next.
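The output is plain Terraform JSON. A made-up fragment, just to show the shape (resource and tag names are hypothetical):

```
{"resource":
  {"aws_vpc":
    {"sandpit":
      {"cidr_block": "172.20.0.0/20",
       "tags": {"Name": "sandpit"}}}}}
```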
## Modules

terraboot is currently divided into four modules:

* vpc (generated by the `vpc-vpn-infra` function): for the VPC, subnets, NAT, VPN, ELK, monitoring and alerting boxes (the setup of the boxes is not automated at this point; see install-dns, install-icinga, install-influx, install-logstash)
* vpc dns (generated by the `vpc-public-dns` function): for a number of public DNS names on the VPC (requires having a public domain on Route53)
* cluster (generated by the `cluster-infra` function): for individual clusters; the idea is that you can have several clusters per VPC
* cluster dns (generated by the `cluster-public-dns` function): for cluster-specific DNS (also requires having a public domain on Route53)

Configuration comes from two places: an EDN file per module for details that stay mostly fixed, and more variable parameters (like instance types or open ports) passed directly in the calls to terraboot.

## edn files

These should contain details which are mostly fixed over the whole infrastructure. One of these files is passed as an argument to `lein run`, for instance `lein run resources/terraboot-staging.edn`.

```
{:region "your-aws-region"
 :bucket-name "your-s3-bucket"
 :aws-profile "your-aws-profile"
 :account-number "your-aws-account-number" ;; used to generate ARNs
 :azs [:a :b :c]        ;; availability zones, e.g. [:a :b :c] for <region>a/b/c
 :target "your-target"} ;; e.g. "vpc" or "dataplatform"
```
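A filled-in file might look like this (all values below are made-up examples):

```
;; hypothetical example values -- substitute your own
{:region "eu-west-1"
 :bucket-name "my-terraboot-state"
 :aws-profile "default"
 :account-number "123456789012"
 :azs [:a :b :c]
 :target "vpc"}
```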



## Variable configuration

The more variable configuration goes in the Clojure code. By "more variable" I mean IP address ranges, names, open ports, instance types and disk sizes.
As this is likely to be a moving target, it's best to check the signatures of the various functions to work out which parameters should be used. The scenario we're going for now is a main function calling the module functions with both the parameters read from the EDN file and parameters set in the code:

```
(ns my-setup.infra
  (:require [terraboot.core :refer :all]
            [terraboot.vpc :as vpc]
            [terraboot.public-dns :as dns]
            [terraboot.cluster :as cluster]
            [clojure.edn :as edn]
            [clojure.java.io :as io]))

(defn get-config
  "Gets info from a config file."
  [url]
  (edn/read-string (slurp url)))

(defn generate-json [edn-path]
  (let [{:keys [account-number
                region
                azs
                bucket-name
                aws-profile
                target]} (get-config edn-path)
        mesos-ami ""             ;; desired CoreOS AMI
        default-ami ""           ;; desired Ubuntu AMI
        vpc-cidr-block ""        ;; VPC address range
        dns-zone "mastodonc.net" ;; if public DNS is to be used
        dns-zone-id ""]          ;; AWS zone id
    (condp = target
      "vpc" (do (to-file (vpc/vpc-vpn-infra {... parameters}) "vpc/vpc.tf")
                (to-file (dns/vpc-public-dns {... parameters}) "vpc/vpc-dns.tf"))
      "dataplatform" (do (to-file (cluster/cluster-infra {... parameters}) "dataplatform/dataplatform.tf")
                         (to-file (dns/cluster-public-dns {... parameters}) "dataplatform/dataplatform-dns.tf")))))

(defn -main [edn-path]
  (generate-json edn-path))
```

The `*.tf` files referred to in the code are the Terraform JSON files that terraform will consume. It's recommended to put each module's files in their own directory, since terraform reads all the `.tf` files in a directory.
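The resulting layout can be sketched as follows (directory and file names taken from the `to-file` calls above; a sketch, not prescriptive):

```shell
# terraform reads every *.tf file in the directory it is pointed at,
# so each module's generated files get their own directory:
mkdir -p vpc dataplatform
touch vpc/vpc.tf vpc/vpc-dns.tf
touch dataplatform/dataplatform.tf dataplatform/dataplatform-dns.tf
ls vpc dataplatform
```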

## Running it all

To run a component, generate the relevant configuration:

```
lein run resources/terraboot-vpc.edn # takes the edn path
```

Then, in the relevant directory (where the .tf files live):

```
terraform plan .
```

If the planning result looks like what you'd expect (green output, the expected number of resources to create):

```
terraform apply .
```
doc/index.md (26 additions)
# Index

This is a list of the documentation in this repository.

How to use terraboot

* [Configuration](configuration.md)

For day-to-day use of a terraboot-generated DC/OS cluster

* [Useful links](useful-links.md)
* [Troubleshooting](troubleshooting.md)

Some extra information on things that haven't quite been automated yet (so: TODO).

* [To install logstash](install-logstash.md)
* [To install influxdb and grafana](install-influx.md)
* [To install the Kibana proxy](install-kibana.md)
* [To install Icinga2 for alerting](install-icinga.md)

## External documentation

* [DC/OS documentation](https://dcos.io/docs/1.8/)
* [Terraform for AWS](https://www.terraform.io/docs/providers/aws/index.html)
* [mesos](https://mesos.apache.org/)
* [CoreOS](https://coreos.com/)
doc/troubleshooting.md (64 additions)
# Troubleshooting

## Starting the cluster

### DC/OS console is not starting

ssh into a master box and run

    journalctl

to see if there are any errors in the DC/OS setup.


## Deployment

### Is the slave disk full (without mesos knowing) because of docker?

Log on to the slave (`ssh core@<ip>` when on the VPN) and check disk space with `df -h`; if the main disk is at 100%, this may be your problem.

Solution:

    docker rmi $(docker images -a -q)
    docker rm $(docker ps -a -q)

If that doesn't release the space properly, stop docker, `rm -rf /var/lib/docker` and restart docker (but this means pulling all the images again).

### Is it not even starting to deploy (staying in staging mode, 0 of 1)?

There may be resource contention: memory, CPU or ports. Check whether the ports your application requires are free, and whether the resources are present on one of the slaves. This can be done by checking the mesos state-summary:

    curl http://staging-masters.sandpit-vpc.kixi/mesos/state-summary

(View the output with a JSON formatter.)
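For instance, pipe it through `python3 -m json.tool`. The payload below is a trimmed, made-up sample just to show the shape; on the cluster you would fetch the real thing with `curl`:

```shell
# Hypothetical, trimmed state-summary sample; on the cluster do instead:
#   curl -s http://staging-masters.sandpit-vpc.kixi/mesos/state-summary > state.json
cat > state.json <<'EOF'
{"cluster": "sandpit", "slaves": [{"hostname": "10.0.1.5", "resources": {"cpus": 4, "mem": 14019}}]}
EOF

# Pretty-print to eyeball free CPU, memory and ports per slave:
python3 -m json.tool state.json
```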

### It's deploying and starting but doesn't turn green

Does the health check work? The health check should work from the masters. It can be TCP, UDP or HTTP, documentation here <https://mesosphere.github.io/marathon/docs/health-checks.html>.

ssh into either the slave or the master box, and attempt the check manually: for HTTP, with `curl`; for TCP and UDP, use

    netstat -a | grep LISTEN

to see all the listening ports.
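If the check is HTTP, a sketch of exercising it by hand (the local `http.server` below only stands in for your application; the host, port and path are placeholders for your app's configured healthCheck):

```shell
# Stand-in for your app so the check has a target; on a real cluster,
# point curl at the slave's IP and your app's healthCheck port/path instead.
python3 -m http.server 8765 --bind 127.0.0.1 >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1

# Manual HTTP health check, roughly as the masters would perform it:
curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8765/

kill $SERVER_PID
```

A 200 response means the check itself is healthy; anything else points at the app rather than Marathon.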

## Marathon

### After starting a marathon framework and stopping it, it sometimes keeps a new one from starting (C*, kafka)

Sometimes just removing a process from Marathon doesn't completely remove all traces of it, and the framework needs to be torn down:

    curl -d@delete.txt -X POST http://staging-masters.sandpit-vpc.kixi/mesos/master/teardown

with `delete.txt` containing the string `frameworkId=xyz`.
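Creating that payload file, sketched (the framework id `xyz` is a placeholder; look the real one up in the Mesos console or in `/mesos/state-summary`):

```shell
# "xyz" is a placeholder framework id -- substitute the real one.
printf 'frameworkId=%s' 'xyz' > delete.txt
cat delete.txt

# Then POST it to the master, as above:
# curl -d@delete.txt -X POST http://staging-masters.sandpit-vpc.kixi/mesos/master/teardown
```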

Then all traces must be removed from Zookeeper and similar (described [here](https://docs.mesosphere.com/1.7/usage/managing-services/uninstall/)).
For instance, for cassandra:

```
docker run mesosphere/janitor /janitor.py -r cassandra-role -p cassandra-principal -z dcos-service-cassandra
```
doc/useful-links.md (66 additions)
# Useful links

## Prerequisites

Have DNS set up to use the Amazon DNS server.
The openvpn startup script will do this if you're on Linux; on a Mac you might need to manually add the second address of the VPC address range.
For example, if your VPC has CIDR 172.20.0.0/20, your nameserver should be 172.20.0.2.
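The rule is always "VPC base address plus two"; a quick way to compute it from the CIDR (python3 assumed available):

```shell
# The VPC resolver lives at the network address + 2.
python3 - <<'EOF'
import ipaddress
vpc = ipaddress.ip_network("172.20.0.0/20")
print(vpc.network_address + 2)  # -> 172.20.0.2
EOF
```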

## Getting on the VPN

The VPN endpoint on the kixi cluster is

vpn.mastodonc.net

(but the public IP address of the `<vpc>-vpn` box would also work).

## DC/OS Links

These are all set up by the DC/OS installation process, and live behind an nginx proxy.

DC/OS console

http://<cluster-name>-masters.<vpc>-vpc.kixi

Mesos console

http://<cluster-name>-masters.<vpc>-vpc.kixi/mesos

Marathon console

http://<cluster-name>-masters.<vpc>-vpc.kixi/marathon

Exhibitor

http://<cluster-name>-masters.<vpc>-vpc.kixi/exhibitor

For Cassandra API calls (and possibly other services, when set up via the dcos CLI)

http://<cluster-name>-masters.<vpc>-vpc.kixi/service/cassandra/


## Monitoring, alerting

Logstash (for posting to logstash, internal DNS)

logstash.<vpc>-vpc.kixi

Kibana: an ELB link; a DNS name still needs to be created.

Influxdb (for posting to influx, internal DNS)

influxdb.<vpc>-vpc.kixi

Icinga2

https://alerts.mastodonc.net/icingaweb2/dashboard


Grafana

https://grafana.mastodonc.net/