Create and execute an installation plan

This stage probes your servers to discover their drives and proposes an installation plan that you can edit and execute.

Preconditions

This step assumes that the correct environment variables are set and that fleet is up and running. You can double-check this by running

$ bin/qs-fleetctl.sh list-machines

You should see all of your machines listed.
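
For the three-node setup used in this guide, the output looks roughly like this (the METADATA column may differ depending on your configuration):

MACHINE		IP		METADATA
2cd92697...	172.30.2.118	-
6123c2e2...	172.30.2.117	-
7acf664b...	172.30.2.116	-

If any machine is missing, revisit the previous setup steps before continuing.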

Download the Docker images

Run

$ bin/qs-update-images.sh

This downloads the Docker images onto the remote machines and may take a while to complete.
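
If you want to confirm that the images were pulled onto a particular machine, you can check with docker over SSH. This is a sketch that assumes bin/qs-fleetctl.sh forwards its arguments to fleetctl; substitute one of your own machine IDs from list-machines:

$ bin/qs-fleetctl.sh ssh 7acf664b docker images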

Generating a new plan

To generate a new plan, run

$ bin/qs-generate-config.sh

For our setup, this prints:

$ bin/qs-generate-config.sh
checking server liveness
- node0 is alive
- node1 is alive
- node2 is alive
scanning for disks
sudo: unable to resolve host ip-172-30-2-116
 - found node0::xvda
 - found node0::xvdb
 - found node0::xvdc
 - found node0::xvdd
sudo: unable to resolve host ip-172-30-2-117
 - found node1::xvda
 - found node1::xvdb
 - found node1::xvdc
 - found node1::xvdd
sudo: unable to resolve host ip-172-30-2-118
 - found node2::xvda
 - found node2::xvdb
 - found node2::xvdc
 - found node2::xvdd

You may have seen several "sudo: unable..." warnings during this and previous steps. This is a harmless bug in how Amazon configures EC2 hostnames and will not cause any problems. Warnings that say "find: ‘/dev/disk/by-id’: No such file or directory" are also harmless.

You should now find a new file in the tools' root directory named "qs-config.sh". This is your installation plan. You should edit this file, taking care to note a few things:

  • At present, all hard drives are suggested for OSD formatting, including the drive holding your OS / root partition. Comment out or delete the lines for that drive. If you set up the servers using EC2, it will be called /dev/xvda. (The sketch after this list may help you identify it.)
  • Where possible, each drive's model number, serial number, and size are noted to help you distinguish between drives. On EC2 there are no model or serial numbers.
  • If you are installing on custom servers instead of EC2, and you are setting up a production cluster, you may want to replace the /dev/ paths with udev persistent block device names, such as those under /dev/disk/by-path (see the example after this list). This prevents loss of service if your HBAs enumerate nondeterministically or a drive failure causes disks to be renamed on reboot.
  • If you do not require parts of the smartgridstore synchrophasor stack, simply comment out the GEN_ lines for each component you do not need.
  • If you have a nonstandard Ceph configuration (fewer than three machines), comment out the CREATE_CEPH_POOL and FORMAT_BTRDB lines; you will have to perform those steps manually later.
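
As an aid for the first and third points above, you can inspect a node's block devices over SSH. The commands below are a sketch: they assume that bin/qs-fleetctl.sh forwards its arguments to fleetctl, that lsblk is available on your hosts, and that you substitute one of your own machine IDs for 7acf664b:

$ bin/qs-fleetctl.sh ssh 7acf664b lsblk -o NAME,SIZE,MOUNTPOINT
$ bin/qs-fleetctl.sh ssh 7acf664b ls -l /dev/disk/by-path

The first command shows which device holds the root filesystem (the one to exclude from OSD formatting); the second lists persistent device names on custom hardware (on EC2 these paths typically do not exist, as the find warning above suggests).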

Once you have verified that the installation plan is appropriate, remove the indicated line at the top of the file.

Executing the installation plan

Now, you can run

$ bin/qs-execute-config.sh

If you have GEN_SSL_CERT in your installation plan, you will need to accept the SSH key and enter your email address for the generated SSL certificate. After that step, the installation process is fully automated, so you can leave it and go have a cup of coffee. If you have many drives configured as Ceph OSDs, you may want to go for lunch instead, as formatting the drives can take some time.

Once it is complete, you can verify that everything is working with

$ bin/qs-fleetctl.sh list-units

For our configuration, this lists:

UNIT				MACHINE				ACTIVE	SUB
btrdb-node0.service		7acf664b.../172.30.2.116	active	running
ceph-mon-node0.service		7acf664b.../172.30.2.116	active	running
ceph-mon-node1.service		6123c2e2.../172.30.2.117	active	running
ceph-mon-node2.service		2cd92697.../172.30.2.118	active	running
ceph-osd-node0-00.service	7acf664b.../172.30.2.116	active	running
ceph-osd-node0-01.service	7acf664b.../172.30.2.116	active	running
ceph-osd-node0-02.service	7acf664b.../172.30.2.116	active	running
ceph-osd-node1-03.service	6123c2e2.../172.30.2.117	active	running
ceph-osd-node1-04.service	6123c2e2.../172.30.2.117	active	running
ceph-osd-node1-05.service	6123c2e2.../172.30.2.117	active	running
ceph-osd-node2-06.service	2cd92697.../172.30.2.118	active	running
ceph-osd-node2-07.service	2cd92697.../172.30.2.118	active	running
ceph-osd-node2-08.service	2cd92697.../172.30.2.118	active	running
mongo-node0.service		7acf664b.../172.30.2.116	active	running
plotter-metadata-node0.service	7acf664b.../172.30.2.116	active	running
plotter-node0.service		7acf664b.../172.30.2.116	active	running
receiver-node0.service		7acf664b.../172.30.2.116	active	running
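
All units should show active/running. If a unit is missing or failing, you can inspect its logs with fleetctl's journal command (again assuming bin/qs-fleetctl.sh passes its arguments through to fleetctl), for example:

$ bin/qs-fleetctl.sh journal btrdb-node0.service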

You can also see that your plotter is up and running, with a valid SSL certificate.

At this time, there are no accounts, and all data is public. This will be fixed in the next step.

What's next

In the next step, you will see how to manage this newly set up cluster: adding uPMUs, creating users, and verifying cluster health.