catid/ansible

catid's ansible scripts

These are the scripts that I use to manage the worker nodes on my GPU cluster.

Setup

  1. Install Ansible:
sudo apt update
sudo apt install -y ansible
  1. Clone the repo:
git clone git@github.com:catid/ansible
cd ansible
  1. Create key files:

Store the password that Ansible uses to become root on your servers here:

echo "ansible_become_password: myrootpassword" > playbooks/sudo.yml

Store your HuggingFace auth token here:

echo "hftoken: hf_blah" > playbooks/hftoken.yml
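
Both files hold secrets in plain text, so it may be worth creating them with owner-only permissions. The following is a minimal sketch, not part of the repo: `write_secret` is a hypothetical helper, and the password and token values are the same placeholders used above.

```shell
#!/usr/bin/env bash
# Sketch: write the two secret files so only the current user can read them.
set -euo pipefail

# Hypothetical helper (not part of this repo): writes "key: value" to a file
# and restricts the file to mode 0600.
write_secret() {
    local path="$1" key="$2" value="$3"
    mkdir -p "$(dirname "$path")"
    # Set umask in a subshell so a newly created file starts out as 0600.
    ( umask 077; printf '%s: %s\n' "$key" "$value" > "$path" )
    # Also tighten permissions if the file already existed.
    chmod 600 "$path"
}

write_secret playbooks/sudo.yml ansible_become_password "myrootpassword"
write_secret playbooks/hftoken.yml hftoken "hf_blah"
```

The values are written unquoted, exactly as the `echo` commands above would write them. A password containing YAML-special characters such as `#` or `:` would likely need to be quoted by hand.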
  1. Choose where the dataset is stored:

Edit the update_dataset.sh file to choose where the dataset lives. By default it is under ~/dataset/ and lives on the gpu4.lan host.

The computer that stores the master copy of the dataset should clone this repo and run:

./install_ssh_keys.sh

This will install its SSH key on all the other machines so that it can copy files to them.
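
The contents of install_ssh_keys.sh are not reproduced here. As a rough illustration of the general technique only, a script that installs one machine's SSH key on several others could look like the sketch below. The host names, the key path, and the `DRY_RUN` switch are all assumptions made for this example, and the real script may work differently.

```shell
#!/usr/bin/env bash
# Sketch: push this machine's SSH public key to a list of hosts.
set -euo pipefail

# Hypothetical worker host names; replace with your own inventory.
HOSTS=(gpu1.lan gpu2.lan gpu3.lan)
# Assumed key location; override by setting KEY in the environment.
KEY="${KEY:-$HOME/.ssh/id_ed25519}"
# Defaults to printing the commands; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a key pair with an empty passphrase if none exists yet.
if [ ! -f "$KEY" ]; then
    run ssh-keygen -t ed25519 -N "" -f "$KEY"
fi

# ssh-copy-id appends the public key to authorized_keys on each host
# and asks for that host's login password when needed.
for host in "${HOSTS[@]}"; do
    run ssh-copy-id -i "$KEY.pub" "$host"
done
```

With the default `DRY_RUN=1` the sketch only prints the commands it would run, which makes it safe to try before pointing it at real machines.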

  1. Automatically set up all servers:

Before running these scripts, make sure that the firewall has a reserved IP address for the server and that the NAS has granted the new server permission to connect.

Create SSH keys:

./install_ssh_keys.sh
./create_ssh_key_pair.sh

This will request the server login password at the start.

Watch the logs for the server's SSH public key and allow it in GitHub.

./full_setup.sh

At a certain point the computers will reboot and prompt you to enroll a MOK (Machine Owner Key) for the Nvidia drivers if they are not set up yet. After that point the setup should run unattended.
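
MOK enrollment is typically only needed on machines that boot with Secure Boot enabled, so it can help to check a machine's state before starting. The sketch below is an assumption-laden example, not part of the repo: `sb_state` is a hypothetical helper that wraps the `mokutil --sb-state` command and falls back to a message when `mokutil` is missing or cannot read the EFI variables.

```shell
#!/usr/bin/env bash
# Sketch: report whether Secure Boot is enabled on this machine.

# Hypothetical helper (not part of this repo).
sb_state() {
    if command -v mokutil >/dev/null 2>&1; then
        # Prints the Secure Boot state; fails on systems without EFI variables.
        mokutil --sb-state 2>/dev/null || echo "unknown"
    else
        echo "mokutil not installed"
    fi
}

echo "Secure Boot: $(sb_state)"
```

If Secure Boot is reported as disabled, the MOK prompt will probably not appear on that machine.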

Regular maintenance

./update_apt.sh
./update_conda.sh

./check_nvidia_driver.sh

# Optionally
./reboot.sh

About

Ansible scripts to set up my GPU cluster at home
