
Multi Host Orchestration

Nikyle Nguyen edited this page Jan 9, 2017 · 3 revisions

Multi-Host Orchestration with Docker Swarm Mode

Sample screencasts:

Provision Docker hosts in the cloud & create Swarm mode

Note: this is only the Docker Swarm mode cluster setup and is independent of the MPI cluster deployment. For testing purposes, you can simply run $ docker swarm init on your local machine to make it a Docker Swarm manager node, then follow the steps in the next screencast.


MPI cluster deployment in Docker Swarm mode

Full documentation is not ready yet, but the commands are largely the same as for single-host orchestration, except that you use ./swarm.sh instead of ./cluster.sh.

QuickStart

  1. If Swarm mode is not yet initialized on your Docker host (local machine or remote server; either works), initialize it:

    $ docker swarm init
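
    The step above can be made safe to rerun. A minimal sketch, assuming a POSIX shell and the Docker CLI on the host; ensure_swarm is a hypothetical helper name, not part of this project:

```shell
# Hypothetical helper: initialize Swarm mode only when this host is not already in a Swarm.
# docker info reports LocalNodeState as "active" once the node belongs to a Swarm.
# Any extra arguments are passed through unchanged to docker swarm init.
ensure_swarm() {
    state=$(docker info --format '{{.Swarm.LocalNodeState}}')
    if [ "$state" = "active" ]; then
        echo "Swarm mode already active"
    else
        docker swarm init "$@"
    fi
}
```

    Usage: ensure_swarm, or ensure_swarm --advertise-addr 138.197.8.191 on a host with several network interfaces, where docker swarm init cannot choose an address to advertise on its own.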
    

    If you have more than one machine, use the join token printed by this command to connect the other Docker hosts to the Swarm.
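
    The join step can be scripted with the Docker CLI's join-token and join commands. A sketch, assuming the manager is reachable on Swarm's default management port 2377; both function names are hypothetical:

```shell
# Hypothetical helper, run on the manager node:
# print only the worker join token (-q suppresses the explanatory text).
worker_join_token() {
    docker swarm join-token -q worker
}

# Hypothetical helper, run on every other Docker host: join the Swarm as a worker.
# $1 = token printed by the manager, $2 = manager's advertised IP address.
# 2377 is Swarm's default cluster management port.
join_swarm() {
    docker swarm join --token "$1" "$2:2377"
}
```

    Run worker_join_token on the manager, then join_swarm TOKEN MANAGER_IP on each other host. Running docker swarm join-token worker without -q prints the complete join command instead.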

  2. Obtain the project source code and go into the cluster directory, which contains the setup for MPI cluster deployment, including a sample MPI program in its project directory.

    $ git clone https://github.com/NLKNguyen/alpine-mpich
    $ cd alpine-mpich/cluster
    
  3. Configure the parameters for the MPI cluster in the Swarm.

    IMAGE_TAG (*): the tag for the Docker image that will be built and pushed. If you have multiple Docker hosts, the image must be tagged for an image registry that all hosts in the Swarm can access, such as Docker Hub as in the example below; in that case, run $ docker login first to authenticate.

    PROJECT_NAME: this will be the prefix for the service names that will be created.

    NETWORK_NAME: a network name that is unique within the Swarm.

    NETWORK_SUBNET: the subnet for that network, in CIDR notation (for example, 10.0.9.0/24).

    SSH_ADDR (*): the advertised IP address of the Swarm, as printed in step 1. If you are testing on your local machine, you probably need to set this to localhost instead.

    SSH_PORT: the port to expose outside the Swarm, through which you will log in via SSH once the MPI cluster is set up.
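
    Before running the config command, it can help to sanity-check a couple of values. The following is a hypothetical pre-flight check, not something swarm.sh is known to do; it only checks the shape of NETWORK_SUBNET and the range of SSH_PORT:

```shell
# Hypothetical check: succeed if $1 has the shape of an IPv4 CIDR block such as 10.0.9.0/24.
# Only the shape is checked; octet values above 255 are not rejected.
is_ipv4_cidr() {
    echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$'
}

# Hypothetical check: succeed if $1 is an integer TCP port in the range 1-65535.
is_valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}
```

    For example, is_ipv4_cidr 10.0.9.0/24 && is_valid_port 2222 && echo ok prints ok for the values used in the example below.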

    Example:

    $ ./swarm.sh config set \
        IMAGE_TAG=nlknguyen/mpi      \
        PROJECT_NAME=my-mpi-project  \
        NETWORK_NAME=mpi-network     \
        NETWORK_SUBNET=10.0.9.0/24   \
        SSH_ADDR=138.197.8.191       \
        SSH_PORT=2222
    

    This creates a file called swarm.conf that stores these parameters for later commands, so you don't need to rerun the config command unless you want to change them.
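
    Instead of copying SSH_ADDR by hand, it can be read from docker info on the manager node. A minimal sketch; detect_ssh_addr is a hypothetical helper, and its local argument mirrors the localhost advice for SSH_ADDR above:

```shell
# Hypothetical helper: print the value to use for SSH_ADDR.
# Without arguments it prints this node's Swarm address as reported by docker info;
# with "local" it prints localhost for single-machine testing.
detect_ssh_addr() {
    if [ "$1" = "local" ]; then
        echo localhost
    else
        docker info --format '{{.Swarm.NodeAddr}}'
    fi
}

# Usage (other values as in the example above):
#   ./swarm.sh config set IMAGE_TAG=nlknguyen/mpi PROJECT_NAME=my-mpi-project \
#       NETWORK_NAME=mpi-network NETWORK_SUBNET=10.0.9.0/24 \
#       SSH_ADDR="$(detect_ssh_addr)" SSH_PORT=2222
```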

  4. Spin up the MPI cluster

    $ ./swarm.sh up size=5
    

    You can add the --watch flag to the command above to follow the progress of the services as they spin up.
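
    Independently of --watch, the services created for the project can be inspected with docker service ls. A sketch that assumes, as described for PROJECT_NAME, that every service name starts with the project prefix; project_services is a hypothetical name:

```shell
# Hypothetical helper: list the services of one MPI project with their replica counts.
# The name filter of docker service ls matches the start of a service's name,
# so passing PROJECT_NAME selects every service created with that prefix.
# Replicas are shown as running/desired, e.g. 5/5.
project_services() {
    docker service ls --filter "name=$1" --format '{{.Name}} {{.Replicas}}'
}
```

    For example: project_services my-mpi-project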

  5. Log in to the MPI master node

    $ ./swarm.sh login
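
    This page does not document what the login command does internally. Since SSH_PORT is described as the port you SSH into, a rough manual equivalent might look like the following. The user name and key file are placeholders you would need to look up in swarm.sh, and ssh_to_master is a hypothetical name:

```shell
# Hypothetical manual equivalent of ./swarm.sh login, assuming it opens an SSH
# session to SSH_ADDR on SSH_PORT. User and key file are placeholders.
# $1 = user, $2 = SSH_ADDR, $3 = SSH_PORT, $4 = private key file
ssh_to_master() {
    ssh -i "$4" -p "$3" "$1@$2"
}
```

    For example: ssh_to_master USER 138.197.8.191 2222 KEY_FILE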
    
  6. Once inside the SSH session, try running the compiled sample MPI program.

    $ mpirun ./mpi_hello_world
    

    You should see the output from multiple containers in the cluster.
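
    Assuming the image ships MPICH (as the project name alpine-mpich suggests), mpirun accepts -n to set the number of processes explicitly. A small sketch; run_hello is a hypothetical wrapper, and how the processes are placed on containers still depends on the host configuration inside the cluster image:

```shell
# Hypothetical wrapper: run the sample program with $1 MPI processes.
# -n is MPICH's process-count option for mpirun/mpiexec.
run_hello() {
    mpirun -n "$1" ./mpi_hello_world
}
```

    For example: run_hello 8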

  7. Scale the cluster in real time

    While the cluster is running, and without closing the SSH session, open a different terminal, go back to the cluster directory used in steps 2-5, and run:

    $ ./swarm.sh scale size=10
    

    In the already open SSH session, rerun the sample program to see output from more containers.

    $ mpirun ./mpi_hello_world
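
    To know when scaling has finished before rerunning the program, you can poll the replica counts that docker service ls reports. A sketch assuming the PROJECT_NAME prefix described in step 3; both function names are hypothetical, and awk splits each running/desired value on the slash:

```shell
# Hypothetical helper: succeed only if every service whose name starts with $1
# has all desired replicas running. Replicas look like "10/10", possibly followed
# by extra text such as "(max 1 per node)"; awk compares the two numbers.
# No matching services counts as "not ready".
all_replicas_ready() {
    docker service ls --filter "name=$1" --format '{{.Replicas}}' |
        awk -F'[/ ]' 'NF { n++; if ($1 != $2) bad = 1 } END { exit (bad || n == 0) }'
}

# Hypothetical helper: block until all_replicas_ready succeeds, checking every 2 seconds.
wait_for_replicas() {
    until all_replicas_ready "$1"; do
        sleep 2
    done
    echo "all services of $1 are fully scaled"
}
```

    For example, run wait_for_replicas my-mpi-project in the second terminal after ./swarm.sh scale size=10, then rerun the program in the SSH session.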
    
  8. Reload the cluster with changes in the program source code

    Because a new Docker image will be deployed, all running MPI nodes will be shut down, so the SSH session to the current master node will be lost.

    In the cluster directory, make some changes to the program in the project directory (something as simple as changing the printed string is enough to see the difference), then run:

    $ ./swarm.sh reload
    

    The image will be rebuilt and pushed to the registry. This goes much faster than the first time because only a portion of the image has changed.
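
    Docker's layer cache is what makes this faster: unchanged layers are reused during the build, and the push uploads only layers the registry does not already have. A rough manual equivalent of the build-and-push part of reload, assuming the Dockerfile sits in the build context you pass in (check swarm.sh for the actual path); rebuild_and_push is a hypothetical name, and it does not redeploy the services:

```shell
# Hypothetical helper: rebuild the image and push it to the registry.
# Cached, unchanged layers are reused by docker build, and docker push
# skips layers the registry already has.
# $1 = IMAGE_TAG, $2 = build context directory (assumed location of the Dockerfile)
rebuild_and_push() {
    docker build -t "$1" "$2" && docker push "$1"
}
```

    For example: rebuild_and_push nlknguyen/mpi .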

    Log in and rerun the program:

    $ ./swarm.sh login
    

    Inside the MPI master node SSH session:

    $ mpirun ./mpi_hello_world
    

// TODO: further information