
##Setting up a local consul cluster for testing.

You can test consul with one node. To test out the consul Raft algorithm, though, we will set up four agents: three servers and one client.

A consul agent runs in server mode or client mode. Server agents maintain the state of the cluster. You want three to five server agents per datacenter, and it should be an odd number to facilitate leader election. Client agents run on the machines whose services you want to monitor, and they report back to the consul servers.

The goal is to make sure you have a complete handle on how to recover, on the difference between ctrl-C (graceful shutdown) and kill -9 ("I have fallen and can't get up"), and on when you need to bootstrap and when you do not.

To reduce the number of command-line arguments, I will use a config file to start up each server.

Here is the server1 config.

####Server 1 configuration server1.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul1",
  "log_level": "INFO",
  "node_name": "server1",
  "server": true,
  "bootstrap" : true,
  "ports" : {

    "dns" : -1,
    "http" : 9500,
    "rpc" : 9400,
    "serf_lan" : 9301,
    "serf_wan" : 9302,
    "server" : 9300
  }
}

When you are starting up a new server cluster you typically put one of the servers in bootstrap mode. This tells consul that this server is allowed to elect itself as leader. It is like when you ask Dick Cheney to be on the VP selection committee and he nominates himself. He was in bootstrap mode.
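As an aside, Consul 0.4 and later also have a bootstrap_expect option if you would rather not hand-pick a bootstrap server: every server is given the number of servers to expect, and they elect a leader among themselves once that many have joined. A minimal sketch of what server1.json could look like with that approach (same ports as above, bootstrap_expect instead of bootstrap):

```
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul1",
  "log_level": "INFO",
  "node_name": "server1",
  "server": true,
  "bootstrap_expect": 3,
  "ports" : {
    "dns" : -1,
    "http" : 9500,
    "rpc" : 9400,
    "serf_lan" : 9301,
    "serf_wan" : 9302,
    "server" : 9300
  }
}
```

For this walkthrough we stick with plain bootstrap mode on server 1 so we can feel the recovery pain points for ourselves.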

Consul servers are boring if they have no one to talk to so let's add two more server config files to the mix.

Server 2 config server2.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul2",
  "log_level": "INFO",
  "node_name": "server2",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 10500,
    "rpc" : 10400,
    "serf_lan" : 10301,
    "serf_wan" : 10302,
    "server" : 10300
  },
  "start_join" : ["127.0.0.1:9301", "127.0.0.1:11301"]
}

Notice that server 2 points to servers 1 and 3 in the start_join config option.

Server 3 config server3.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul3",
  "log_level": "INFO",
  "node_name": "server3",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 11500,
    "rpc" : 11400,
    "serf_lan" : 11301,
    "serf_wan" : 11302,
    "server" : 11300
  },
  "start_join" : ["127.0.0.1:9301", "127.0.0.1:10301"]
}

And server 3 points to servers 1 and 2. We are assigning different ports so we can run them all on one box. In a real deployment you would not need to assign custom ports, because you would never run all of the servers on the same box; that would defeat the purpose of replicating for reliability.

To start up the three servers, use the following command lines.

Server 1 start script server1.sh

consul agent -config-file=server1.json  -ui-dir=/opt/consul/web

We installed the Consul web UI files in /opt/consul/web. You can download the UI from the Consul UI download page.

Server 2 start script server2.sh

consul agent -config-file=server2.json

Server 3 start script server3.sh

consul agent -config-file=server3.json

Go ahead and start up the servers.

You should have the following files.

$ tree
.
├── server1.json
├── server1.sh
├── server2.json
├── server2.sh
├── server3.json
└── server3.sh

Run chmod +x on the .sh files so they are executable, then run them.
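For example (a minimal sketch, assuming the json and sh files are in the current directory and consul is on your PATH):

```bash
chmod +x server1.sh server2.sh server3.sh

# run each server in its own terminal window so you can watch its log
./server1.sh   # terminal 1
./server2.sh   # terminal 2
./server3.sh   # terminal 3
```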

The log for server 1 should look like this:

Server 1 startup

$ ./server1.sh 
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server1'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 127.0.0.1 (HTTP: 9500, HTTPS: -1, DNS: -1, RPC: 9400)
      Cluster Addr: 10.0.0.162 (LAN: 9301, WAN: 9302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/04/14 10:43:04 [INFO] raft: Node at 10.0.0.162:9300 [Follower] entering Follower state
    2015/04/14 10:43:04 [INFO] serf: EventMemberJoin: server1 10.0.0.162
    2015/04/14 10:43:04 [INFO] serf: EventMemberJoin: server1.dc1 10.0.0.162
    2015/04/14 10:43:04 [ERR] agent: failed to sync remote state: No cluster leader
    2015/04/14 10:43:04 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:43:04 [INFO] consul: adding server server1.dc1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:43:05 [WARN] raft: Heartbeat timeout reached, starting election
    2015/04/14 10:43:05 [INFO] raft: Node at 10.0.0.162:9300 [Candidate] entering Candidate state
    2015/04/14 10:43:05 [INFO] raft: Election won. Tally: 1
    2015/04/14 10:43:05 [INFO] raft: Node at 10.0.0.162:9300 [Leader] entering Leader state
    2015/04/14 10:43:05 [INFO] consul: cluster leadership acquired
    2015/04/14 10:43:05 [INFO] consul: New leader elected: server1
    2015/04/14 10:43:05 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2015/04/14 10:43:05 [INFO] consul: member 'server1' joined, marking health alive
    2015/04/14 10:43:07 [INFO] agent: Synced service 'consul'

It is warning us that we should not start server 1 in bootstrap mode unless we know what we are doing. Since we are just learning consul, let's leave it as such.

Then it does a Dick Cheney: it says it could not find a leader, so it makes itself the leader.

Now we start up server 2 in another terminal window.

Server 1 terminal output after starting up server 2

 2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] raft: Added peer 10.0.0.162:10300, starting replication
    2015/04/14 10:45:56 [WARN] raft: AppendEntries to 10.0.0.162:10300 rejected, sending older logs (next: 1)
    2015/04/14 10:45:56 [INFO] raft: pipelining replication to peer 10.0.0.162:10300
    2015/04/14 10:45:56 [INFO] consul: member 'server2' joined, marking health alive

It sees server 2 and marks it as alive. Whoot!

Going back to server 2's output we get

Server 2 output on startup

$ ./server2.sh 
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
    Join completed. Synced with 1 initial agents
==> Consul agent running!
         Node name: 'server2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 10500, HTTPS: -1, DNS: -1, RPC: 10400)
      Cluster Addr: 10.0.0.162 (LAN: 10301, WAN: 10302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/04/14 10:45:56 [INFO] raft: Node at 10.0.0.162:10300 [Follower] entering Follower state
    2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2.dc1 10.0.0.162
    2015/04/14 10:45:56 [INFO] agent: (LAN) joining: [127.0.0.1:9301 127.0.0.1:11301]
    2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] consul: adding server server2.dc1 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server1 10.0.0.162
    2015/04/14 10:45:56 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2015/04/14 10:45:56 [ERR] agent: failed to sync remote state: No cluster leader
    2015/04/14 10:45:56 [WARN] raft: Failed to get previous log: 6 log not found (last: 0)
    2015/04/14 10:46:20 [INFO] agent: Synced service 'consul'

It is a bit cryptic. We get some error messages about there being no leader. Then it says it synced. Whoot!

Server 3's startup is a little smoother because the other two servers are already alive!

$ ./server3.sh 
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
    Join completed. Synced with 2 initial agents
==> Consul agent running!
         Node name: 'server3'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 11500, HTTPS: -1, DNS: -1, RPC: 11400)
      Cluster Addr: 10.0.0.162 (LAN: 11301, WAN: 11302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3 10.0.0.162
    2015/04/14 10:48:45 [INFO] raft: Node at 10.0.0.162:11300 [Follower] entering Follower state
    2015/04/14 10:48:45 [INFO] consul: adding server server3 (Addr: 10.0.0.162:11300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3.dc1 10.0.0.162
    2015/04/14 10:48:45 [INFO] agent: (LAN) joining: [127.0.0.1:9301 127.0.0.1:10301]
    2015/04/14 10:48:45 [INFO] consul: adding server server3.dc1 (Addr: 10.0.0.162:11300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server1 10.0.0.162
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:48:45 [INFO] consul: adding server server1 (Addr: 10.0.0.162:9300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] agent: (LAN) joined: 2 Err: <nil>
    2015/04/14 10:48:45 [ERR] agent: failed to sync remote state: No cluster leader
    2015/04/14 10:48:45 [WARN] raft: Failed to get previous log: 12 log not found (last: 0)
    2015/04/14 10:49:07 [INFO] agent: Synced service 'consul'

Now all three servers are up and their state is synced.

In the server 1 log we now have messages showing server 2 and server 3 joining and being marked alive:

```bash
     2015/04/14 10:45:56 [INFO] serf: EventMemberJoin: server2 10.0.0.162
    2015/04/14 10:45:56 [INFO] consul: adding server server2 (Addr: 10.0.0.162:10300) (DC: dc1)
    2015/04/14 10:45:56 [INFO] raft: Added peer 10.0.0.162:10300, starting replication
    2015/04/14 10:45:56 [WARN] raft: AppendEntries to 10.0.0.162:10300 rejected, sending older logs (next: 1)
    2015/04/14 10:45:56 [INFO] raft: pipelining replication to peer 10.0.0.162:10300
    2015/04/14 10:45:56 [INFO] consul: member 'server2' joined, marking health alive
    2015/04/14 10:48:45 [INFO] serf: EventMemberJoin: server3 10.0.0.162
    2015/04/14 10:48:45 [INFO] consul: adding server server3 (Addr: 10.0.0.162:11300) (DC: dc1)
    2015/04/14 10:48:45 [INFO] raft: Added peer 10.0.0.162:11300, starting replication
    2015/04/14 10:48:45 [INFO] consul: member 'server3' joined, marking health alive
    2015/04/14 10:48:45 [WARN] raft: AppendEntries to 10.0.0.162:11300 rejected, sending older logs (next: 1)
    2015/04/14 10:48:45 [INFO] raft: pipelining replication to peer 10.0.0.162:11300
```

All is well. We have three healthy nodes.

If we go to server2 terminal and shut it down with control-C, it shuts down gracefully.

Now go to the server 1 output and watch. It keeps trying to reconnect to server 2.

Now start up server 2 again. Notice how it reconnects, and then look at the logs for server1 and server3.

Now do the same with server3. Shut it down. Watch the logs of server 1 and server 2. 

When you see this log:

#### Server 3 log on reconnect
```bash
2015/04/14 11:01:53 [WARN] raft: Failed to get previous log: 23 log not found (last: 21)
```
It means the server could not find log entry 23, so the leader has to go back and replicate the data after index 21 to it. Consul keeps a version number of the data called an index; as servers come back online, they look at their last index and ask for everything that happened after it so they can sync changes. Shortly after, the reconnecting server settles back in:

```bash
    2015/04/14 11:01:53 [INFO] raft: Removed ourself, transitioning to follower
    2015/04/14 11:02:09 [INFO] agent: Synced service 'consul'
```
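You can also see Consul's index from the outside: most read endpoints of the HTTP API return an X-Consul-Index header, which is what blocking queries are built on. A quick sketch, assuming server 1 is serving HTTP on 9500 as configured above:

```bash
# -i prints the response headers; look for X-Consul-Index
curl -i http://localhost:9500/v1/catalog/nodes
```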

You can shut down any two servers and then bring them back up. The leadership will change to the server that stayed up. Do not shut down all three, though; then it gets a bit trickier to get them bootstrapped again.
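To see which server currently holds leadership while you experiment, you can ask any agent over HTTP (a quick sketch using the ports from this walkthrough):

```bash
# returns the current leader's raft address, e.g. "10.0.0.162:9300"
curl http://localhost:9500/v1/status/leader

# returns the raft peer set as that server sees it
curl http://localhost:9500/v1/status/peers
```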

Each server maintains its own state. Look at the files in each server's data folder.

$ pwd
/opt/consul1

$ tree
.
├── checkpoint-signature
├── raft
│   ├── mdb
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── peers.json
│   └── snapshots
├── serf
│   ├── local.snapshot
│   └── remote.snapshot
└── tmp
    └── state506459124
        ├── data.mdb
        └── lock.mdb

6 directories, 8 files

Look at the snapshot files and the contents of peers.json.
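peers.json is just a JSON array of the raft addresses of the servers. On this setup it should look something along these lines (a sketch; the address will match your Cluster Addr and the server ports):

```bash
$ cat /opt/consul1/raft/peers.json
["10.0.0.162:9300","10.0.0.162:10300","10.0.0.162:11300"]
```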

Now shut down all three with ctrl-C, then start them up again. They will not be able to elect a leader, because ctrl-C lets them leave the cluster gracefully. Do I understand this completely? No. But I know the solution. Because we exited with ctrl-C it was a graceful leave, and now we have to delete the peers so they can form a cluster again, which almost makes sense but not quite, but I don't care because it works. We just delete the peers.json files.

I went ahead and changed the configuration files so the servers can be restarted.

server1.json with no bootstrap and a retry_join server list

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul1",
  "log_level": "INFO",
  "node_name": "server1",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 9500,
    "rpc" : 9400,
    "serf_lan" : 9301,
    "serf_wan" : 9302,
    "server" : 9300
  },

  "retry_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"]

}

server2.json with a retry_join server list

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul2",
  "log_level": "INFO",
  "node_name": "server2",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 10500,
    "rpc" : 10400,
    "serf_lan" : 10301,
    "serf_wan" : 10302,
    "server" : 10300
  },


  "retry_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
  ]
}

server3.json with a retry_join server list

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul3",
  "log_level": "INFO",
  "node_name": "server3",
  "server": true,
  "ports" : {

    "dns" : -1,
    "http" : 11500,
    "rpc" : 11400,
    "serf_lan" : 11301,
    "serf_wan" : 11302,
    "server" : 11300
  },
  "retry_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
  ]

}

In order to start up clean, you have to delete all of the peers.json files and then start one of the servers in bootstrap mode, like we had server1.json set up before.

Starting up all three servers after a graceful shutdown by deleting peer files

$ pwd
/opt

$ find . -name "peers.json" 
./consul1/raft/peers.json
./consul2/raft/peers.json
./consul3/raft/peers.json

$ find . -name "peers.json" | xargs rm

Now restart them.

If none of the servers starts up in bootstrap mode, you will get this all day long.

Unable to elect a leader

    2015/04/14 13:02:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
    2015/04/14 13:02:32 [ERR] agent: failed to sync remote state: rpc error: No cluster leader

I have another set of bootstrap json files for each server with a startup script. You need to delete the peer file and then start up one of the servers in bootstrap mode using these scripts. See the discussion at issue 526, and then re-read outage recovery.

bootstrap json file server1.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul1",
  "log_level": "INFO",
  "node_name": "server1",
  "server": true,
  "bootstrap": true,
  "ports" : {

    "dns" : -1,
    "http" : 9500,
    "rpc" : 9400,
    "serf_lan" : 9301,
    "serf_wan" : 9302,
    "server" : 9300
  }
}

bootstrap json file server2.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul2",
  "log_level": "INFO",
  "node_name": "server2",
  "server": true,
  "bootstrap": true,
  "ports" : {

    "dns" : -1,
    "http" : 10500,
    "rpc" : 10400,
    "serf_lan" : 10301,
    "serf_wan" : 10302,
    "server" : 10300
  }
}

bootstrap json file server3.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consul3",
  "log_level": "INFO",
  "node_name": "server3",
  "server": true,
  "bootstrap": true,
  "ports" : {

    "dns" : -1,
    "http" : 11500,
    "rpc" : 11400,
    "serf_lan" : 11301,
    "serf_wan" : 11302,
    "server" : 11300
  }
}

Server 1 bootstrap starter

$ cat server1boot.sh
export GOMAXPROCS=10
consul agent -config-file=server1boot.json \
-retry-interval=3s  \
-ui-dir=/opt/consul/web

There is a startup script per server.
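The scripts for server 2 and server 3 follow the same pattern. For example, server3boot.sh would look something like this (assuming the bootstrap config for server 3 is saved as server3boot.json, matching server1boot.json above):

```bash
$ cat server3boot.sh
export GOMAXPROCS=10
consul agent -config-file=server3boot.json \
-retry-interval=3s
```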

Remember to delete the peers.json file.

Deleting peers

$ pwd
/opt

$ find . -name "peers.json" | xargs rm

Now you are set. Pick any server, run it in bootstrap mode.

Running server 3 in bootstrap mode

$ ./server3boot.sh 
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server3'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 127.0.0.1 (HTTP: 11500, HTTPS: -1, DNS: -1, RPC: 11400)
      Cluster Addr: 10.0.0.162 (LAN: 11301, WAN: 11302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

....
$ ./server2.sh 
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 10500, HTTPS: -1, DNS: -1, RPC: 10400)
      Cluster Addr: 10.0.0.162 (LAN: 10301, WAN: 10302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

...
$ ./server1.sh 
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'server1'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 9500, HTTPS: -1, DNS: -1, RPC: 9400)
      Cluster Addr: 10.0.0.162 (LAN: 9301, WAN: 9302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

To force a failure, pick a server process and kill -9 it.
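For example (a sketch; the exact pgrep/pkill flags can vary slightly between Linux and OS X):

```bash
# find the consul process that was started with server2.json ...
pgrep -f "consul agent -config-file=server2.json"

# ... and kill it without giving it a chance to leave gracefully
pkill -9 -f "consul agent -config-file=server2.json"
```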

Also if you kill all of the servers at once with kill -9 as in:

Killing all of the servers

$ pkill -9 consul

You do not have to bootstrap any of them.

Thus you can run:

Starting up servers that died (versus being shut down gracefully)

./server1.sh 
./server2.sh 
./server3.sh 

##Setting up a client to test our local cluster

The Agent is the center of the Consul world. An agent must run on every node that is part of a Consul cluster. You have two types of agents: clients and servers. The server agents are the information hub; they store the data for the cluster and replicate it to the other server nodes. The client agents are lightweight instances that sit on every service box and rely on the server agents for most of their state.

To start up consul in client mode we will use the following config file.

Client consul mode config file client1.json

$ cat client1.json 

{
  "datacenter": "dc1",
  "data_dir": "/opt/consulclient",
  "log_level": "INFO",
  "node_name": "client1",
  "server": false,
  "ports" : {

    "dns" : -1,
    "http" : 8500,
    "rpc" : 8400,
    "serf_lan" : 8301,
    "serf_wan" : 8302,
    "server" : 8300
  },

  "start_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
    ]
}

You will notice that we are not in server mode, which puts us in client mode. The client is addressable on the 8xxx ports (the consul defaults). We specified where to find the servers using the start_join key.

Client consul mode startup script client1.sh

consul agent -config-file=client1.json 

Now when we start up consul, we just specify the client1.json file.

Getting information about our node and cluster

Now we can get info about our cluster.

$ consul info

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = 
	revision = 0c7ca91c
	version = 0.5.0
consul:
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 34
	max_procs = 1
	os = darwin
	version = go1.4.2
serf_lan:
	encrypted = false
	event_queue = 0
	event_time = 6
	failed = 0
	intent_queue = 0
	left = 0
	member_time = 42
	members = 4
	query_queue = 0
	query_time = 1

We can list the members in this cluster:

Listing the members in our cluster

$ consul members
Node     Address           Status  Type    Build  Protocol
client1  10.0.0.162:8301   alive   client  0.5.0  2
server2  10.0.0.162:10301  alive   server  0.5.0  2
server1  10.0.0.162:9301   alive   server  0.5.0  2
server3  10.0.0.162:11301  alive   server  0.5.0  2

You can also use the HTTP interface to see what members are in the cluster.

Using curl to get list of members

$ curl http://localhost:8500/v1/agent/members

Output using /v1/agent/members http call

[
    {
        "Name": "server2",
        "Addr": "10.0.0.162",
        "Port": 10301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "port": "10300",
            "role": "consul",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    },
    {
        "Name": "server1",
        "Addr": "10.0.0.162",
        "Port": 9301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "port": "9300",
            "role": "consul",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    },
    {
        "Name": "server3",
        "Addr": "10.0.0.162",
        "Port": 11301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "port": "11300",
            "role": "consul",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    },
    {
        "Name": "client1",
        "Addr": "10.0.0.162",
        "Port": 8301,
        "Tags": {
            "build": "0.5.0:0c7ca91c",
            "dc": "dc1",
            "role": "node",
            "vsn": "2",
            "vsn_max": "2",
            "vsn_min": "1"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 2,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 4,
        "DelegateCur": 4
    }
]

You can try out other agent HTTP calls by looking at the Agent HTTP API.

You can use the HTTP API from any client or server:

Using HTTP API

$ curl http://localhost:8500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:9500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:10500/v1/catalog/datacenters
["dc1"]

$ curl http://localhost:10500/v1/catalog/nodes
[{"Node":"client1","Address":"10.0.0.162"},
{"Node":"server1","Address":"10.0.0.162"},
{"Node":"server2","Address":"10.0.0.162"},
{"Node":"server3","Address":"10.0.0.162"}]

$ curl http://localhost:8500/v1/catalog/services
{"consul":[]}

Register a new service

Register a new service with bash

$ curl --upload-file register_service.json \
http://localhost:8500/v1/agent/service/register

register_service.json

{
  "ID": "myservice1",
  "Name": "myservice",
  "Address": "127.0.0.1",
  "Port": 8080,
  "Check": {
    "Interval": "10s",
    "TTL": "15s"
  }
}

The above registers a new service called myservice. Name is the name of the service, while ID identifies a specific instance of that service. The Check we installed is a TTL check: the service has to check in with the agent before the 15 second TTL expires, or it gets marked critical.

Once you register the service, then you can see it from the agent as follows:

Seeing the service we just registered

$ curl http://localhost:8500/v1/agent/services

Seeing the service we just registered

{
    "myservice1": {
        "ID": "myservice1",
        "Service": "myservice",
        "Tags": null,
        "Address": "127.0.0.1",
        "Port": 8080
    }
}

To check this service's health, we can use this endpoint.

$ curl http://localhost:8500/v1/health/service/myservice
[
    {
        "Node": {
            "Node": "client1",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice1",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 8080
        },
        "Checks": [
            {
                "Node": "client1",
                "CheckID": "service:myservice1",
                "Name": "Service 'myservice' check",
                "Status": "critical",
                "Notes": "",
                "Output": "TTL expired",
                "ServiceID": "myservice1",
                "ServiceName": "myservice"
            },
            {
                "Node": "client1",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    }
]

Here we can see that the health status is critical because the TTL expired; we have not checked in yet.

To tell consul that our fictional service is passing, we need to hit this endpoint before the TTL runs out, i.e. at least every 15 seconds:

Sending a TTL check

$ curl http://localhost:8500/v1/agent/check/pass/service:myservice1
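In a real service you would bake this check-in into the service itself; for testing you can fake it with a small loop (a minimal sketch, assuming the same client1 agent on port 8500):

```bash
# keep the TTL check passing by checking in every 10 seconds,
# comfortably inside the 15 second TTL
while true; do
  curl -s http://localhost:8500/v1/agent/check/pass/service:myservice1
  sleep 10
done
```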

Checking to see if our service is healthy

$ curl http://localhost:8500/v1/health/service/myservice

Checking to see if our service is healthy output

[
    {
        "Node": {
            "Node": "client1",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice1",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 8080
        },
        "Checks": [
            {
                "Node": "client1",
                "CheckID": "service:myservice1",
                "Name": "Service 'myservice' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "myservice1",
                "ServiceName": "myservice"
            },
            {
                "Node": "client1",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    }
]

Notice that our status went from "Status": "critical" to "Status": "passing".

You can also mark a service as warn or critical using an HTTP call.

Marking status as warn or fail (critical)

$ curl http://localhost:8500/v1/agent/check/warn/service:myservice1

$ curl http://localhost:8500/v1/agent/check/fail/service:myservice1

Note that you can also query the health status from any node; all nodes in the cluster, server nodes and client nodes alike, know where myservice instances are running.

Query for health works from any agent node

$ curl http://localhost:9500/v1/health/service/myservice

Let's start up another client and install a service in it. We created another client startup script.

client2 startup

$ cat client2.json

{
  "datacenter": "dc1",
  "data_dir": "/opt/consulclient2",
  "log_level": "INFO",
  "node_name": "client2",
  "server": false,
  "ports" : {

    "dns" : -1,
    "http" : 7500,
    "rpc" : 7400,
    "serf_lan" : 7301,
    "serf_wan" : 7302,
    "server" : 7300
  },

  "start_join" : [
    "127.0.0.1:9301",
    "127.0.0.1:10301",
    "127.0.0.1:11301"
    ]
}

$ cat client2.sh
consul agent -config-file=client2.json  

Then we will create another register service json file.

register json file for myservice2

$ cat register_service2.json 
{
  "ID": "myservice2",
  "Name": "myservice",
  "Address": "127.0.0.1",
  "Port": 9090,
  "Check": {
    "Interval": "10s",
    "TTL": "15s"
  }
}

#### Running service registry against client2 agent
```bash
curl --upload-file register_service2.json \
http://localhost:7500/v1/agent/service/register
```

####Make both services healthy

curl http://localhost:8500/v1/agent/check/pass/service:myservice1
curl http://localhost:7500/v1/agent/check/pass/service:myservice2

####Query them

curl http://localhost:9500/v1/health/service/myservice

Output of querying both services

[
    {
        "Node": {
            "Node": "client2",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice2",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 9090
        },
        "Checks": [
            {
                "Node": "client2",
                "CheckID": "service:myservice2",
                "Name": "Service 'myservice' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "myservice2",
                "ServiceName": "myservice"
            },
            {
                "Node": "client2",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    },
    {
        "Node": {
            "Node": "client1",
            "Address": "10.0.0.162"
        },
        "Service": {
            "ID": "myservice1",
            "Service": "myservice",
            "Tags": null,
            "Address": "127.0.0.1",
            "Port": 8080
        },
        "Checks": [
            {
                "Node": "client1",
                "CheckID": "service:myservice1",
                "Name": "Service 'myservice' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "myservice1",
                "ServiceName": "myservice"
            },
            {
                "Node": "client1",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    }
]
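One more trick: the health endpoint accepts a passing filter, so you can ask for only the healthy instances of myservice (handy once you start failing checks on purpose):

```bash
# only instances whose checks are all passing are returned
curl http://localhost:9500/v1/health/service/myservice?passing
```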