This should work if/when the base or underlying terraform modules are updated! Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Showing 10 changed files with 720 additions and 0 deletions.
6 changes: 6 additions & 0 deletions
google/bare-metal-comparison/compute-engine/experiments/lammps/.gitignore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
terraform.tfstate | ||
terraform.tfstate.backup | ||
fuse-mounts.sh | ||
lammps.tfvars | ||
.terraform | ||
.terraform.lock.hcl |
153 changes: 153 additions & 0 deletions
google/bare-metal-comparison/compute-engine/experiments/lammps/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
# Flux Framework LAMMPS Cluster Deployment | ||
|
||
This deployment illustrates deploying a flux-framework cluster on Google Cloud | ||
to run LAMMPS. All components are included here. | ||
|
||
# Usage | ||
|
||
Copy the variables to make your own variant: | ||
|
||
```bash | ||
$ cp lammps.tfvars.example lammps.tfvars | ||
``` | ||
|
||
Note that the machine types should match those you prepared in [build-images](../../build-images). | ||
Initialize the deployment with the command: | ||
|
||
```bash | ||
$ terraform init | ||
``` | ||
|
||
## Deploy | ||
|
||
Then, deploy the cluster with the command: | ||
|
||
```bash | ||
terraform apply -var-file lammps.tfvars \ | ||
-var region=us-central1 \ | ||
-var project_id=$(gcloud config get-value core/project) \ | ||
-var network_name=foundation-net \ | ||
-var zone=us-central1-a | ||
``` | ||
|
||
This will set up networking and all the instances! Note that | ||
you can change any of the `-var` values to be appropriate for your environment. | ||
Verify that the cluster is up: | ||
|
||
```bash | ||
gcloud compute ssh gffw-login-001 --zone us-central1-a | ||
``` | ||
|
||
## Run Experiments | ||
|
||
The easiest approach is to copy the experiment script, `run-experiments.py`, to your home directory on the login node: | ||
|
||
```bash | ||
$ gcloud compute scp --zone us-central1-a ./run-experiments.py gffw-login-001:/home/sochat1_llnl_gov/run-experiments.py | ||
``` | ||
|
||
And then shell in (as we did above): | ||
|
||
|
||
```bash | ||
gcloud compute ssh gffw-login-001 --zone us-central1-a | ||
``` | ||
|
||
Go to the experiment directory with our files of interest: | ||
|
||
```bash | ||
cd /opt/lammps/examples/reaxff/HNS | ||
``` | ||
|
||
Try running the LAMMPS experiment, given that LAMMPS is installed on the nodes and (for this example) we have only two nodes. | ||
Note that by default, output data is written to a "data" subfolder of the present working directory. Since | ||
we don't have write access to the experiment files folder, we direct output to our home directory (the folder will be created): | ||
|
||
```bash | ||
$ python3 $HOME/run-experiments.py --outdir /home/sochat1_llnl_gov/data \ | ||
--workdir /opt/lammps/examples/reaxff/HNS \ | ||
--times 10 -N 2 --tasks 2 lmp -v x 1 -v y 1 -v z 1 -in in.reaxc.hns -nocite | ||
``` | ||
|
||
<details> | ||
|
||
<summary>Example Output</summary> | ||
|
||
```console | ||
N: 2 | ||
times: 10 | ||
sleep: 10 | ||
outdir: /home/sochat1_llnl_gov/data | ||
tasks: 2 | ||
command: lmp -v x 1 -v y 1 -v z 1 -in in.reaxc.hns -nocite | ||
workdir: /opt/lammps/examples/reaxff/HNS | ||
dry-run: False | ||
identifier: lammps | ||
Submit ƒ31XLJ9fgb: 1 of 10 | ||
Submit ƒ31XQvVRh1: 2 of 10 | ||
Submit ƒ31XVVsD8j: 3 of 10 | ||
Submit ƒ31Xa6iyro: 4 of 10 | ||
Submit ƒ31Xehakas: 5 of 10 | ||
Submit ƒ31XjKvWbH: 6 of 10 | ||
Submit ƒ31XovnHKM: 7 of 10 | ||
Submit ƒ31XtXe43R: 8 of 10 | ||
Submit ƒ31XyCwncX: 9 of 10 | ||
Submit ƒ31Y439Ssh: 10 of 10 | ||
|
||
⭐️ Waiting for jobs to finish... | ||
Still waiting on job ƒ31XLJ9fgb, has state RUN | ||
No longer waiting on job ƒ31XLJ9fgb, FINISHED 0! | ||
Still waiting on job ƒ31XQvVRh1, has state RUN | ||
No longer waiting on job ƒ31XQvVRh1, FINISHED 0! | ||
Still waiting on job ƒ31XVVsD8j, has state RUN | ||
No longer waiting on job ƒ31XVVsD8j, FINISHED 0! | ||
Still waiting on job ƒ31Xa6iyro, has state RUN | ||
No longer waiting on job ƒ31Xa6iyro, FINISHED 0! | ||
Still waiting on job ƒ31Xehakas, has state RUN | ||
No longer waiting on job ƒ31Xehakas, FINISHED 0! | ||
Still waiting on job ƒ31XjKvWbH, has state RUN | ||
No longer waiting on job ƒ31XjKvWbH, FINISHED 0! | ||
Still waiting on job ƒ31XovnHKM, has state RUN | ||
No longer waiting on job ƒ31XovnHKM, FINISHED 0! | ||
Still waiting on job ƒ31XtXe43R, has state RUN | ||
No longer waiting on job ƒ31XtXe43R, FINISHED 0! | ||
Still waiting on job ƒ31XyCwncX, has state RUN | ||
No longer waiting on job ƒ31XyCwncX, FINISHED 0! | ||
Still waiting on job ƒ31Y439Ssh, has state RUN | ||
No longer waiting on job ƒ31Y439Ssh, FINISHED 0! | ||
Jobs are complete, goodbye! 👋️ | ||
``` | ||
|
||
</details> | ||
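The run above can be pictured as one `flux submit` call per repetition. `run-experiments.py` itself is not shown in this commit, so the following is only a sketch under that assumption: the helper name `build_submit_commands` is hypothetical, and the log naming (`lammps-<index>.log`, with job output and error sharing one file) is inferred from the example output directory rather than taken from the script.

```python
import shlex


def build_submit_commands(command, nodes, tasks, times, workdir, outdir,
                          identifier="lammps"):
    """Return one `flux submit` argument list per repetition.

    Hypothetical helper: it only builds the command lines and does not
    submit anything.
    """
    commands = []
    for index in range(times):
        # Assumed naming scheme: one log per run, shared by stdout and stderr.
        log = f"{outdir}/{identifier}-{index}.log"
        argv = [
            "flux", "submit",
            "-N", str(nodes),      # number of nodes
            "-n", str(tasks),      # number of tasks
            f"--cwd={workdir}",    # run from the experiment directory
            f"--output={log}",
            f"--error={log}",
        ] + shlex.split(command)
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Values copied from the example invocation; printed as a dry run.
    for argv in build_submit_commands(
        "lmp -v x 1 -v y 1 -v z 1 -in in.reaxc.hns -nocite",
        nodes=2,
        tasks=2,
        times=10,
        workdir="/opt/lammps/examples/reaxff/HNS",
        outdir="/home/sochat1_llnl_gov/data",
    ):
        print(shlex.join(argv))
```

Printing the commands first, rather than submitting them, makes it easy to check the node and task counts before spending cluster time.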
|
||
After submitting the last run, the script blocks until all of the jobs finish. | ||
And that's it! The output directory in your home will have both log files (from the job output and error) | ||
and the job info (json) from Flux: | ||
|
||
```bash | ||
$ ls /home/sochat1_llnl_gov/data/ | ||
``` | ||
```console | ||
lammps-0-info.json lammps-2-info.json lammps-4-info.json lammps-6-info.json lammps-8-info.json | ||
lammps-0.log lammps-2.log lammps-4.log lammps-6.log lammps-8.log | ||
lammps-1-info.json lammps-3-info.json lammps-5-info.json lammps-7-info.json lammps-9-info.json | ||
lammps-1.log lammps-3.log lammps-5.log lammps-7.log lammps-9.log | ||
``` | ||
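Before copying results off the cluster, it can help to confirm that every run produced both files. The sketch below relies only on the file names shown in the listing above (`<identifier>-<index>.log` and `<identifier>-<index>-info.json`); it does not read the JSON, since its schema is not described here. The function names `group_runs` and `incomplete_runs` are hypothetical.

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# Matches e.g. "lammps-3.log" and "lammps-3-info.json".
PATTERN = re.compile(r"^(?P<identifier>.+)-(?P<index>\d+)(?P<kind>-info\.json|\.log)$")


def group_runs(filenames):
    """Map run index -> {"log": name, "info": name}, using file names only."""
    runs = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        kind = "info" if match.group("kind") == "-info.json" else "log"
        runs[int(match.group("index"))][kind] = name
    return dict(sorted(runs.items()))


def incomplete_runs(runs):
    """Return the indices of runs that are missing a log or an info file."""
    return [index for index, files in runs.items() if set(files) != {"log", "info"}]


if __name__ == "__main__":
    # Optional argument: the output directory (defaults to ./data).
    outdir = Path(sys.argv[1] if len(sys.argv) > 1 else "data")
    if outdir.is_dir():
        runs = group_runs(path.name for path in outdir.iterdir())
        print(f"{len(runs)} runs found, incomplete: {incomplete_runs(runs)}")
```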
|
||
After you exit the node, you can copy the results to your local machine: | ||
|
||
```bash | ||
$ mkdir -p ./data | ||
$ gcloud compute scp --zone us-central1-a gffw-login-001:/home/sochat1_llnl_gov/data/* ./data | ||
``` | ||
|
||
And that's really it :) When you are finished, destroy the cluster: | ||
|
||
|
||
```bash | ||
terraform destroy -var-file lammps.tfvars \ | ||
-var region=us-central1 \ | ||
-var project_id=$(gcloud config get-value core/project) \ | ||
-var network_name=foundation-net \ | ||
-var zone=us-central1-a | ||
``` |
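The `apply` and `destroy` commands above must receive the same `-var-file` and `-var` values, otherwise `destroy` may target a different region, zone, or network. One way to keep them in sync is to build both from a single dictionary. This is a minimal sketch, not part of the deployment: `terraform_command` is a hypothetical helper, and `my-project` is a placeholder for whatever `gcloud config get-value core/project` returns.

```python
import shlex


def terraform_command(action, var_file, variables):
    """Build a `terraform apply` or `terraform destroy` argument list."""
    if action not in ("apply", "destroy"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["terraform", action, "-var-file", var_file]
    for key, value in variables.items():
        argv += ["-var", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    # Shared settings, mirroring the values used in this README.
    variables = {
        "region": "us-central1",
        "project_id": "my-project",  # placeholder, not a real project
        "network_name": "foundation-net",
        "zone": "us-central1-a",
    }
    for action in ("apply", "destroy"):
        print(shlex.join(terraform_command(action, "lammps.tfvars", variables)))
```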
135 changes: 135 additions & 0 deletions
google/bare-metal-comparison/compute-engine/experiments/lammps/foundation.tf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
# Copyright 2023 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
locals { | ||
subnet = "${var.region}/${var.network_name}-subnet-01" | ||
} | ||
|
||
data "google_compute_default_service_account" "default" { | ||
project = var.project_id | ||
} | ||
|
||
data "google_compute_image" "rocky8" { | ||
project = "rocky-linux-cloud" | ||
family = "rocky-linux-8-optimized-gcp" | ||
} | ||
|
||
module "network" { | ||
source = "github.com/terraform-google-modules/terraform-google-network" | ||
project_id = var.project_id | ||
network_name = var.network_name | ||
subnets = [ | ||
{ | ||
subnet_name = "${var.network_name}-subnet-01" | ||
subnet_ip = var.subnet_ip | ||
subnet_region = var.region | ||
} | ||
] | ||
} | ||
|
||
module "nat" { | ||
source = "github.com/terraform-google-modules/terraform-google-cloud-nat" | ||
project_id = var.project_id | ||
region = var.region | ||
network = module.network.network_name | ||
create_router = true | ||
router = "${module.network.network_name}-router" | ||
} | ||
|
||
module "firewall" { | ||
source = "github.com/terraform-google-modules/terraform-google-network/modules/firewall-rules" | ||
project_id = var.project_id | ||
network_name = module.network.network_name | ||
rules = [ | ||
{ | ||
name = "${var.network_name}-allow-ssh" | ||
direction = "INGRESS" | ||
priority = null | ||
description = null | ||
ranges = ["0.0.0.0/0"] | ||
source_tags = null | ||
source_service_accounts = null | ||
target_tags = ["flux"] | ||
target_service_accounts = null | ||
allow = [ | ||
{ | ||
protocol = "tcp" | ||
ports = ["22"] | ||
} | ||
], | ||
deny = [] | ||
log_config = { | ||
metadata = "INCLUDE_ALL_METADATA" | ||
} | ||
}, | ||
{ | ||
name = "${var.network_name}-allow-internal-traffic" | ||
direction = "INGRESS" | ||
priority = null | ||
description = null | ||
ranges = [var.subnet_ip] | ||
source_tags = null | ||
source_service_accounts = null | ||
target_tags = ["ssh", "flux"] | ||
target_service_accounts = null | ||
allow = [ | ||
{ | ||
protocol = "icmp" | ||
ports = [] | ||
}, | ||
{ | ||
protocol = "udp" | ||
ports = ["0-65535"] | ||
}, | ||
{ | ||
protocol = "tcp" | ||
ports = ["0-65535"] | ||
} | ||
] | ||
deny = [] | ||
log_config = { | ||
metadata = "INCLUDE_ALL_METADATA" | ||
} | ||
} | ||
] | ||
} | ||
|
||
module "nfs_server_instance_template" { | ||
source = "github.com/terraform-google-modules/terraform-google-vm/modules/instance_template" | ||
region = var.region | ||
project_id = var.project_id | ||
name_prefix = var.nfs_prefix | ||
subnetwork = module.network.subnets[local.subnet].self_link | ||
tags = ["ssh", "flux", "nfs"] | ||
machine_type = "e2-standard-4" | ||
disk_size_gb = var.nfs_size | ||
source_image = data.google_compute_image.rocky8.self_link | ||
source_image_project = data.google_compute_image.rocky8.project | ||
service_account = { | ||
email = data.google_compute_default_service_account.default.email | ||
scopes = ["cloud-platform"] | ||
} | ||
startup_script = file("${path.module}/install_nfs.sh") | ||
} | ||
|
||
module "nfs_server_instance" { | ||
source = "github.com/terraform-google-modules/terraform-google-vm/modules/compute_instance" | ||
region = var.region | ||
zone = var.zone | ||
hostname = var.nfs_prefix | ||
add_hostname_suffix = true | ||
num_instances = 1 | ||
instance_template = module.nfs_server_instance_template.self_link | ||
subnetwork = module.network.subnets[local.subnet].self_link | ||
} |
27 changes: 27 additions & 0 deletions
google/bare-metal-comparison/compute-engine/experiments/lammps/install_lammps.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
#!/bin/bash | ||
|
||
# This boot script will install lammps on all nodes | ||
|
||
# Install build dependencies, plus `time` for timed commands | ||
sudo dnf update -y && sudo dnf install -y time cmake openmpi clang git-clang-format | ||
sudo ldconfig | ||
|
||
# Needed for ffmpeg | ||
sudo dnf install -y https://download1.rpmfusion.org/free/el/rpmfusion-free-release-8.noarch.rpm | ||
sudo dnf install -y https://download1.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-8.noarch.rpm | ||
sudo dnf install -y ffmpeg | ||
|
||
# Install LAMMPS | ||
sudo git clone --depth 1 --branch stable_29Sep2021_update2 https://github.com/lammps/lammps.git /opt/lammps | ||
cd /opt/lammps | ||
sudo mkdir build | ||
cd build | ||
|
||
# The cmake prefix path is needed otherwise openmpi is not found | ||
sudo cmake ../cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr -D PKG_REAXFF=yes -D BUILD_MPI=yes -D PKG_OPT=yes -D FFT=FFTW3 -DCMAKE_PREFIX_PATH=/usr/lib64/openmpi | ||
sudo make | ||
sudo make install | ||
|
||
# Run from a node: | ||
# cd /opt/lammps/examples/reaxff/HNS | ||
# flux run -n 1 lmp -v x 1 -v y 1 -v z 1 -in in.reaxc.hns -nocite |
15 changes: 15 additions & 0 deletions
google/bare-metal-comparison/compute-engine/experiments/lammps/install_nfs.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
#!/bin/bash | ||
|
||
dnf install nfs-utils -y | ||
|
||
mkdir -p /var/nfs/home | ||
chown nobody:nobody /var/nfs/home | ||
|
||
ip_addr=$(hostname -I) | ||
|
||
echo "/var/nfs/home *(rw,no_subtree_check,no_root_squash)" >> /etc/exports | ||
|
||
firewall-cmd --add-service={nfs,nfs3,mountd,rpc-bind} --permanent | ||
firewall-cmd --reload | ||
|
||
systemctl enable --now nfs-server rpcbind |
44 changes: 44 additions & 0 deletions
google/bare-metal-comparison/compute-engine/experiments/lammps/lammps.tfvars.example
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Copyright 2022 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
manager_machine_type = "e2-standard-2" | ||
manager_name_prefix = "gffw" | ||
manager_scopes = [ "cloud-platform" ] | ||
|
||
login_node_specs = [ | ||
{ | ||
name_prefix = "gffw-login" | ||
machine_arch = "x86-64" | ||
machine_type = "n2-standard-2" | ||
instances = 1 | ||
properties = [] | ||
boot_script = "install_lammps.sh" | ||
}, | ||
] | ||
login_scopes = [ "cloud-platform" ] | ||
|
||
compute_node_specs = [ | ||
{ | ||
name_prefix = "gffw-compute-a" | ||
machine_arch = "x86-64" | ||
machine_type = "n2-standard-2" | ||
gpu_type = null | ||
gpu_count = 0 | ||
compact = false | ||
instances = 2 | ||
properties = [] | ||
boot_script = "install_lammps.sh" | ||
}, | ||
] | ||
compute_scopes = [ "cloud-platform" ] |