holy hell it works
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Feb 8, 2023
1 parent 1acff65 commit dbb4337
Showing 8 changed files with 235 additions and 532 deletions.
29 changes: 25 additions & 4 deletions nsdf-materialscience/README.md
@@ -66,18 +66,35 @@ $ docker tag test ghcr.io/converged-computing/nsdf-materialscience:ubuntu-20.04
$ minikube image load ghcr.io/converged-computing/nsdf-materialscience:ubuntu-20.04
```

I found this really useful for development.

#### Mount Data

This isn't handled by flux-cloud yet, so for now we will do:
The Flux Operator is going to expect to find volumes on the host of a particular storage type.
Since we are early in development, we currently (as the default) define a "hostpath" storage type,
meaning the operator will expect the path to be present on the node where you are running the job.
This means that we need to mount the data on our host into MiniKube (where the cluster is running)
with `minikube mount`.

Note that in our [minicluster-template.yaml](minicluster-template.yaml) we are defining the volume on the host to
be at `/tmp/data` so let's tell MiniKube to mount our local path there:

```
echo "Copying local volume to /tmp/data-volumes in minikube"
echo "Copying local volume to /tmp/data in minikube"
# We don't care if this works or not - mkdir -p seems to bork
minikube ssh -- mkdir -p /tmp/data-volumes
minikube ssh -- mkdir -p /tmp/data
minikube mount /tmp/data-volumes:/tmp/data-volumes
minikube mount /tmp/data-volumes:/tmp/data
```
Leave that process running in a window and then open another terminal to interact with the cluster.
If you want to double check the data is in the MiniKube vm:

```bash
$ minikube ssh -- ls /tmp/data
```
```console
averaged original preprocessed
```
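
The same check can be run from the host side of the mount. This is a minimal sketch, assuming the three directory names in the listing above are the ones the workflow expects; the helper name `missing_subdirs` is hypothetical and not part of flux-cloud or the Flux Operator.

```python
from pathlib import Path

# Directory names taken from the `ls /tmp/data` listing above.
EXPECTED = ("averaged", "original", "preprocessed")


def missing_subdirs(root, expected=EXPECTED):
    """Return the expected subdirectory names that are absent under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: /tmp/data-volumes is the host side of the minikube mount.
    missing = missing_subdirs("/tmp/data-volumes")
    print("missing:", missing if missing else "none")
```

An empty result suggests the host directory has the layout that the MiniKube listing shows; it does not prove the mount itself is still running.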

#### Jobs

@@ -109,6 +126,10 @@ press control+C to end it, and then see your results remain:
```bash
$ tree /tmp/data-volumes
```
```bash
$ tree /tmp/data-volumes/averaged/ | wc -l
64
```

## Development Notes

@@ -5,28 +5,26 @@ metadata:
name: materials-science
namespace: flux-operator
spec:
# localDeploy needs to be true for volumes on the host
localDeploy: true

# Number of pods to create for MiniCluster
size: 4

# Disable verbose output and run timing
logging:
quiet: false

# Named volumes bound to containers, we assume they are all host volumes
# This is where I downloaded and extracted my sample datasets
# Named volumes expected to be in MiniKube VM, OR on kubernetes node
# E.g., for a minikube mount you might do:
# minikube mount <host>:<minikube-vm>.
# minikube mount /tmp/data-volumes:/tmp/data.
volumes:
data:
path: /tmp/data-volumes

data:
path: /tmp/data
containers:
- image: ghcr.io/converged-computing/nsdf-materialscience:ubuntu-20.04
workingDir: /data
command: python3 /code/preprocess_radiographs.py preprocess /data radiographic_scan_id_112536

# This says to mount the volume called "data" to "/data" in the container
# This says to mount the volume called "data" to "/data" in the container
volumes:
data:
path: /data
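
The spec above ties a named volume (`data`) to a path on the node and then to a path inside the container. The sketch below shows one way to sanity-check that pairing. It assumes the nesting shown in the hunk (top-level `volumes`, then per-container `volumes` keyed by the same name) and uses a hand-written dictionary rather than loading the real file. The function names are hypothetical and are not part of the Flux Operator.

```python
# Hand-written mirror of the spec above (assumed nesting, not loaded from the file).
spec = {
    "volumes": {"data": {"path": "/tmp/data"}},
    "containers": [
        {
            "image": "ghcr.io/converged-computing/nsdf-materialscience:ubuntu-20.04",
            "workingDir": "/data",
            "volumes": {"data": {"path": "/data"}},
        }
    ],
}


def undefined_volumes(spec):
    """Return volume names a container references that are not defined at the top level."""
    defined = set(spec.get("volumes", {}))
    missing = []
    for container in spec.get("containers", []):
        for name in container.get("volumes", {}):
            if name not in defined:
                missing.append(name)
    return missing


def mount_pairs(spec):
    """Return (node path, container path) pairs for every defined volume reference."""
    defined = spec.get("volumes", {})
    pairs = []
    for container in spec.get("containers", []):
        for name, mount in container.get("volumes", {}).items():
            if name in defined:
                pairs.append((defined[name]["path"], mount["path"]))
    return pairs


if __name__ == "__main__":
    print("undefined:", undefined_volumes(spec))
    print("mounts:", mount_pairs(spec))
```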
@@ -3,7 +3,8 @@ As Flux prefix for flux commands: sudo -u flux -E PYTHONPATH= -E PATH=/usr/local

👋 Hello, I'm materials-science-0
The main host is materials-science-0
The working directory is /data
The working directory is /code
preprocess_radiographs.py
flux R encode --hosts=materials-science-[0-3]

📦 Resources
@@ -37,70 +38,73 @@ default_connect = "tcp://%h.flux-service.flux-operator.svc.cluster.local:%p"
hosts = [
{ host="materials-science-[0-3]"},
]
# **** Generated on 2023-02-07 06:07:59 by CZMQ ****
# **** Generated on 2023-02-08 00:09:15 by CZMQ ****
# ZeroMQ CURVE **Secret** Certificate
# DO NOT PROVIDE THIS FILE TO OTHER USERS nor change its permissions.

metadata
name = "materials-science-cert-generator"
keygen.czmq-version = "4.2.0"
keygen.sodium-version = "1.0.18"
keygen.flux-core-version = "0.46.1-215-gdd6d77b04"
keygen.flux-core-version = "0.46.1-227-gc41b5569e"
keygen.hostname = "materials-science-cert-generator"
keygen.time = "2023-02-07T06:07:59"
keygen.time = "2023-02-08T00:09:15"
keygen.userid = "0"
keygen.zmq-version = "4.3.2"
curve
public-key = "c2IC85P%)zRQ[yLC2Jx9Rtw}MI[LTG.qMvz@ihU+"
secret-key = "+ghyS>(aTZd&5n.dQ@&A^?WAE-ur+cn83-n^m]n?"
chown: changing ownership of '/data': Read-only file system
public-key = "^0)cJk(rUOr$gGJ5h0vE2**6$?8Ylap8CBMqRg:["
secret-key = "MDfCEWJwz1=itJGCdDBkP{X1xHAzms6@)b0{%{%V"

✨ Curve certificate generated by helper pod
# **** Generated on 2023-02-07 06:07:59 by CZMQ ****
# **** Generated on 2023-02-08 00:09:15 by CZMQ ****
# ZeroMQ CURVE **Secret** Certificate
# DO NOT PROVIDE THIS FILE TO OTHER USERS nor change its permissions.

metadata
name = "materials-science-cert-generator"
keygen.czmq-version = "4.2.0"
keygen.sodium-version = "1.0.18"
keygen.flux-core-version = "0.46.1-215-gdd6d77b04"
keygen.flux-core-version = "0.46.1-227-gc41b5569e"
keygen.hostname = "materials-science-cert-generator"
keygen.time = "2023-02-07T06:07:59"
keygen.time = "2023-02-08T00:09:15"
keygen.userid = "0"
keygen.zmq-version = "4.3.2"
curve
public-key = "c2IC85P%)zRQ[yLC2Jx9Rtw}MI[LTG.qMvz@ihU+"
secret-key = "+ghyS>(aTZd&5n.dQ@&A^?WAE-ur+cn83-n^m]n?"
Extra arguments are: ls /data
public-key = "^0)cJk(rUOr$gGJ5h0vE2**6$?8Ylap8CBMqRg:["
secret-key = "MDfCEWJwz1=itJGCdDBkP{X1xHAzms6@)b0{%{%V"
Extra arguments are: ls -l /data

🌀 flux start -o --config /etc/flux/config -Scron.directory=/etc/flux/system/cron.d -Stbon.fanout=256 -Srundir=/run/flux -Sstatedir=/var/lib/flux -Slocal-uri=local:///run/flux/local -Slog-stderr-level=6 -Slog-stderr-mode=local flux mini submit -n 1 --quiet --watch ls /data
broker.info[0]: start: none->join 5.26718ms
broker.info[0]: parent-none: join->init 0.029464ms
🌀 flux start -o --config /etc/flux/config -Scron.directory=/etc/flux/system/cron.d -Stbon.fanout=256 -Srundir=/run/flux -Sstatedir=/var/lib/flux -Slocal-uri=local:///run/flux/local -Slog-stderr-level=6 -Slog-stderr-mode=local flux mini submit -n 1 --quiet --watch ls -l /data
broker.info[0]: start: none->join 6.40464ms
broker.info[0]: parent-none: join->init 0.031093ms
resource.err[0]: verify: rank 0 (materials-science-0) has extra resources: core[1-3]
cron.info[0]: synchronizing cron tasks to event heartbeat.pulse
job-manager.info[0]: restart: 0 jobs
job-manager.info[0]: restart: 0 running jobs
job-manager.info[0]: restart: checkpoint.job-manager not found
broker.info[0]: rc1.0: running /etc/flux/rc1.d/01-sched-fluxion
sched-fluxion-resource.info[0]: version 0.25.0-34-g3b33a364
sched-fluxion-resource.info[0]: version 0.25.0-38-gb92989ae
sched-fluxion-resource.warning[0]: create_reader: allowlist unsupported
sched-fluxion-resource.info[0]: populate_resource_db: loaded resources from core's resource.acquire
sched-fluxion-qmanager.info[0]: version 0.25.0-34-g3b33a364
sched-fluxion-qmanager.info[0]: version 0.25.0-38-gb92989ae
broker.info[0]: rc1.0: running /etc/flux/rc1.d/02-cron
broker.info[0]: rc1.0: /etc/flux/rc1 Exited (rc=0) 3.5s
broker.info[0]: rc1-success: init->quorum 3.50218s
broker.info[0]: rc1.0: /etc/flux/rc1 Exited (rc=0) 3.8s
broker.info[0]: rc1-success: init->quorum 3.81011s
broker.info[0]: online: materials-science-0 (ranks 0)
broker.info[0]: online: materials-science-[0-3] (ranks 0-3)
broker.info[0]: quorum-full: quorum->run 29.7528s
broker.info[0]: rc2.0: flux mini submit -n 1 --quiet --watch ls /data Exited (rc=0) 0.3s
broker.info[0]: rc2-success: run->cleanup 0.340521s
broker.info[0]: quorum-full: quorum->run 27.1426s
total 12
drwxrwxr-x 1 flux 999 4096 Feb 8 00:05 averaged
drwxrwxr-x 1 flux 999 4096 Feb 8 00:06 original
drwxrwxr-x 1 flux 999 4096 Feb 8 00:05 preprocessed
broker.info[0]: rc2.0: flux mini submit -n 1 --quiet --watch ls -l /data Exited (rc=0) 0.4s
broker.info[0]: rc2-success: run->cleanup 0.421225s
broker.info[0]: cleanup.0: flux queue stop --quiet --all --nocheckpoint Exited (rc=0) 0.1s
broker.info[0]: cleanup.1: flux job cancelall --user=all --quiet -f --states RUN Exited (rc=0) 0.0s
broker.info[0]: cleanup.2: flux queue idle --quiet Exited (rc=0) 0.1s
broker.info[0]: cleanup-success: cleanup->shutdown 0.169097s
broker.info[0]: children-complete: shutdown->finalize 42.01ms
broker.info[0]: cleanup-success: cleanup->shutdown 0.21403s
broker.info[0]: children-complete: shutdown->finalize 80.5028ms
broker.info[0]: rc3.0: running /etc/flux/rc3.d/01-sched-fluxion
broker.info[0]: rc3.0: /etc/flux/rc3 Exited (rc=0) 0.1s
broker.info[0]: rc3-success: finalize->goodbye 0.110118s
broker.info[0]: goodbye: goodbye->exit 0.027561ms
broker.info[0]: rc3.0: /etc/flux/rc3 Exited (rc=0) 0.2s
broker.info[0]: rc3-success: finalize->goodbye 0.173834s
broker.info[0]: goodbye: goodbye->exit 0.032602ms
9 changes: 2 additions & 7 deletions nsdf-materialscience/data/minikube/k8s-size-4-local/meta.json
@@ -1,8 +1,8 @@
{
"times": {
"create-cluster": 2.203,
"minicluster-run-scanid-112536-minicluster-size-4": 137.294,
"minicluster-run-ls-data-minicluster-size-4": 42.248
"minicluster-run-scanid-112536-minicluster-size-4": 215.377,
"minicluster-run-ls-data-minicluster-size-4": 40.839
},
"size": 4,
"minicluster": {
@@ -13,11 +13,6 @@
]
},
"jobs": {
"ls-data": {
"image": "ghcr.io/converged-computing/nsdf-materialscience:ubuntu-20.04",
"command": "ls /data",
"size": 4
},
"scanid-112536": {
"image": "ghcr.io/converged-computing/nsdf-materialscience:ubuntu-20.04",
"command": "python3 /code/preprocess_radiographs.py preprocess /data radiographic_scan_id_112536",
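
The `times` entries in the meta.json hunk can be compared with a few lines of Python. This is a sketch under one assumption: within each pair of values, the first is the previous run and the second is the run from this commit. The numbers are copied from the hunk; the `deltas` helper is hypothetical.

```python
# Values copied from the "times" block in the meta.json hunk above.
# Assumption: "old" holds the removed lines and "new" holds the added lines.
old = {
    "create-cluster": 2.203,
    "minicluster-run-scanid-112536-minicluster-size-4": 137.294,
    "minicluster-run-ls-data-minicluster-size-4": 42.248,
}
new = {
    "create-cluster": 2.203,
    "minicluster-run-scanid-112536-minicluster-size-4": 215.377,
    "minicluster-run-ls-data-minicluster-size-4": 40.839,
}


def deltas(old, new):
    """Return new minus old, in seconds, for every key present in both runs."""
    return {key: round(new[key] - old[key], 3) for key in old if key in new}


if __name__ == "__main__":
    for key, diff in deltas(old, new).items():
        print(f"{key}: {diff:+.3f}s")
```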
