$ sudo apt-get update
$ sudo apt-get upgrade -y
Setting up the ClusterMaster:
$ sudo hostnamectl set-hostname new-hostname
or
$ sudo nano /etc/hostname
and reboot.
Additionally, I had to change the file /etc/cloud/cloud.cfg in the Ubuntu OS. It contains the setting preserve_hostname: false, and I had to change that from false to true; otherwise cloud-init resets the hostname at boot.
The next step is to add a new user (mpiuser in this example) that we want to use to run the Stockfish cluster:
$ sudo adduser mpiuser
and add the user to the sudo group:
$ sudo usermod -aG sudo mpiuser
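Since the same preparation is repeated on every node, the cloud.cfg change described above can be scripted instead of edited by hand. The following is only a minimal sketch: the function name set_preserve_hostname and its optional path argument are illustrative, and it assumes GNU sed (as shipped with Ubuntu) and that preserve_hostname is a top-level key starting at the beginning of a line.

```shell
#!/usr/bin/env bash
# Sketch: set "preserve_hostname: true" in a cloud-init config file.
# set_preserve_hostname is a hypothetical helper, not part of cloud-init.
set_preserve_hostname() {
    # Default to the file named in the text; a different path can be passed in.
    local cfg="${1:-/etc/cloud/cloud.cfg}"
    if grep -q '^preserve_hostname:' "$cfg"; then
        # The key exists: rewrite it in place, whatever its current value is.
        sed -i 's/^preserve_hostname:.*/preserve_hostname: true/' "$cfg"
    else
        # The key is missing: append it.
        echo 'preserve_hostname: true' >> "$cfg"
    fi
}
```

On a real node the file belongs to root, so the function has to run in a root shell. Trying it on a copy of the file first is a cheap way to check the result.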
It's easier to run sudo commands without a password. Therefore, log in as the new user and remove the password prompt for sudo with:
$ sudo bash -c 'echo "$(logname) ALL=(ALL:ALL) NOPASSWD: ALL" | (EDITOR="tee -a" visudo)'
Repeat this for all your nodes and name them, for example, ClusterNode1-3, keeping this picture in mind.
a) The ClusterMaster now has the basic setup. Repeat for ClusterNode1-3. Now all nodes are ready, and we need to ensure that ClusterMaster can talk to each node.
- Create an SSH key on ClusterMaster:
$ ssh-keygen -t rsa
- Copy the key from ClusterMaster to each of ClusterNode1-3:
$ ssh-copy-id remote-user@server-ip
b) ClusterNode1-3 need to talk to ClusterMaster.
- Create an SSH key on each of ClusterNode1-3:
$ ssh-keygen -t rsa
- Copy the key from each of ClusterNode1-3 to ClusterMaster:
$ ssh-copy-id remote-user@server-ip
c) In the third step, you want to enable passwordless SSH login from your Windows machine as well (from Windows PowerShell to the Linux machines).
- Create an SSH key (if you do not have one yet) and print the public key:
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub
- To copy the key, log in to ClusterMaster and ClusterNode1-3:
$ cd .ssh
$ nano authorized_keys
and paste the key from Windows on a new line.
Done: the basic cluster is set up, and the master can talk to the worker nodes and vice versa to execute tasks.
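The key exchange in step a) means running the same ssh-copy-id command once per node, which can be sketched as a small loop. This is only an illustration: the names NODES, REMOTE_USER, DRY_RUN, and copy_key_to_nodes are assumptions made for this example, and the function prints the commands by default instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: copy the local public key to every worker node.
# NODES, REMOTE_USER and DRY_RUN are illustrative names, not part of any tool.
NODES=(ClusterNode1 ClusterNode2 ClusterNode3)
REMOTE_USER="mpiuser"

copy_key_to_nodes() {
    local node
    for node in "${NODES[@]}"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print what would be executed.
            echo "ssh-copy-id ${REMOTE_USER}@${node}"
        else
            # Real run: prompts for the remote password once per node.
            ssh-copy-id "${REMOTE_USER}@${node}"
        fi
    done
}
```

After creating the key with ssh-keygen, calling the function with DRY_RUN=0 on ClusterMaster would perform the actual copy. For step b), the same idea applies on each node with ClusterMaster as the only entry in NODES.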
There are multiple options available for managing a compute cluster, such as Ansible or Fabric. However, a quick and easy-to-use option is parallel-ssh, an asynchronous parallel SSH library designed for large-scale automation. It differs from alternatives and higher-level frameworks such as Ansible in the following ways:
- Scalability: It can scale to hundreds, thousands, tens of thousands of hosts or more.
- Ease of use: Running commands over any number of hosts can be achieved with just a few lines of code.
To install parallel-ssh, run the following command in your terminal (on Ubuntu, the parallel-ssh command is provided by the pssh package):
$ sudo apt install pssh
Next, create a file named .pssh_hosts using the nano text editor:
$ nano .pssh_hosts
In the .pssh_hosts file, add the host names for your cluster nodes, such as:
ClusterNode1
ClusterNode2
ClusterNode3
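For more than a handful of nodes, the host list can be generated rather than typed. The sketch below assumes that the nodes follow the numbered naming scheme used here; write_hosts_file and its three arguments are hypothetical and not part of parallel-ssh.

```shell
#!/usr/bin/env bash
# Sketch: write one host name per line, e.g. ClusterNode1..ClusterNodeN.
# write_hosts_file is an illustrative helper, not part of parallel-ssh.
write_hosts_file() {
    local outfile="$1" prefix="$2" count="$3" i
    : > "$outfile"                      # create or truncate the file
    for ((i = 1; i <= count; i++)); do
        echo "${prefix}${i}" >> "$outfile"
    done
}
```

Calling write_hosts_file .pssh_hosts ClusterNode 3 would produce the three lines shown above.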
With the .pssh_hosts file in place, you can now execute commands on multiple hosts at once using parallel-ssh. For example:
$ parallel-ssh -i -h .pssh_hosts sudo apt-get update
$ parallel-ssh -i -h .pssh_hosts sudo apt-get upgrade -y
You can also run scripts using parallel-ssh.
Another option, which you can also use to deploy on an Azure compute cluster, is clush (part of ClusterShell), a management tool that is very easy to handle.
$ sudo apt install clustershell
Example usage:
$ clush -w node[1-3] -b
Enter 'quit' to leave this interactive mode
Working with nodes: node[1-3]
clush> uname
clusternode1: Linux
clusternode3: Linux
clusternode2: Linux
clush> quit
$ clush -v -w clusternode[1-3] --copy /home/mpi/helloworld.py
`/home/mpi/helloworld.py' -> clusternode[1-3]:`/home/mpi/'
$ sudo apt install python3-pip
And then install mpi4py (here from the Ubuntu package rather than via pip):
$ sudo apt install python3-mpi4py
We might want to use some Python scripts to test things out, such as the MPI configuration.
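A test script of that kind could look like the following sketch, which writes a minimal mpi4py "hello" program to disk. The helper write_mpi_hello is hypothetical, and the default file name helloworld.py is borrowed from the clush copy example; the Python part relies only on MPI.COMM_WORLD, Get_rank, Get_size, and MPI.Get_processor_name from mpi4py.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal mpi4py test script.
# write_mpi_hello is an illustrative helper; the default file name is an assumption.
write_mpi_hello() {
    local target="${1:-helloworld.py}"
    cat > "$target" <<'EOF'
from mpi4py import MPI

comm = MPI.COMM_WORLD            # communicator spanning all started processes
rank = comm.Get_rank()           # index of this process within the communicator
size = comm.Get_size()           # total number of processes
host = MPI.Get_processor_name()  # name of the machine running this process

print(f"Hello from rank {rank} of {size} on {host}")
EOF
}
```

Assuming an MPI launcher such as mpiexec is installed, running mpiexec -n 4 python3 helloworld.py on a single node should print one line per process. How processes are spread across the cluster nodes depends on the MPI implementation and is not covered by this sketch.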