Commit bd49ec6: update with more tweaks

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Jun 23, 2023 (parent commit: 52c6d0f)
Showing 3 changed files with 91 additions and 28 deletions.

README.md (85 additions, 25 deletions)
Next (when your cluster is ready and healthy!) [deploy pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/installation/standalone-deployment/#deploying-kubeflow-pipelines):

```bash
git clone --depth 1 https://github.com/kubeflow/pipelines /tmp/kubeflow
cd /tmp/kubeflow/manifests/kustomize
KFP_ENV=platform-agnostic
kubectl apply -k cluster-scoped-resources/
kubectl wait crd/applications.app.k8s.io --for condition=established --timeout=60s
kubectl apply -k "env/${KFP_ENV}/"
kubectl wait pods -l application-crd-id=kubeflow-pipelines -n kubeflow --for condition=Ready --timeout=1800s
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
```

Then you should be able to open [http://localhost:8080/#/pipelines](http://localhost:8080/#/pipelines).
We've already installed kfp in the first step, so we don't need to do that again.
Next, expose the RESTful API (in one terminal):

```bash
# Get the port (8888)
kubectl -n kubeflow get svc/ml-pipeline -o json | jq ".spec.ports[0].port"
kubectl port-forward -n kubeflow svc/ml-pipeline 9999:8888
```
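
For a quick sanity check from Python that the API is answering, here is a minimal sketch with requests (assuming the port-forward above is running in another terminal, and that the API server exposes the v1beta1 healthz endpoint):

```python
# Quick check that the port-forwarded KFP API answers.
# Assumes "kubectl port-forward ... 9999:8888" is running.
import requests

response = requests.get("http://localhost:9999/apis/v1beta1/healthz")
print(response.status_code, response.text)
```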


### Pipeline
We have a [sample.py](sample.py) pipeline provided. We can "compile" it to YAML
as follows:
```bash
kfp dsl compile --py sample.py --output hello-world.yaml
```
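You could also compile from Python. A minimal sketch, assuming the kfp SDK is installed and the `@dsl.pipeline` function in [sample.py](sample.py) is importable; `flux_pipeline` is a hypothetical name, so substitute the real one:

```python
# Compile the pipeline from Python instead of the kfp CLI.
# "flux_pipeline" is a hypothetical name for the @dsl.pipeline
# function in sample.py - use the actual function name.
from kfp import compiler
from sample import flux_pipeline

compiler.Compiler().compile(
    pipeline_func=flux_pipeline,
    package_path="hello-world.yaml",
)
```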
Try uploading the YAML file:
```bash
SVC=localhost:9999
PIPELINE_ID=$(curl -F "uploadfile=@hello-world.yaml" ${SVC}/apis/v1beta1/pipelines/upload | jq -r .id)
```
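Or, if you prefer Python over curl, here is a sketch of the same upload with requests (assuming the port-forward to localhost:9999 is still running):

```python
# Upload the compiled pipeline via the v1beta1 REST API,
# mirroring the curl command above.
import requests

svc = "http://localhost:9999"
with open("hello-world.yaml", "rb") as fd:
    response = requests.post(
        f"{svc}/apis/v1beta1/pipelines/upload",
        files={"uploadfile": ("hello-world.yaml", fd)},
    )
response.raise_for_status()
pipeline_id = response.json()["id"]
print(pipeline_id)
```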
We can then get a status!
```bash
$ curl ${SVC}/apis/v1beta1/pipelines/${PIPELINE_ID} | jq
```
```console
{
  "id": "6c0350c3-68eb-4359-8ae9-2c3d46765280",
  "created_at": "2023-06-22T23:42:42Z",
  "name": "hello-world.yaml",
  "default_version": {
    "id": "27b06982-6cf4-4e72-a6d4-ee4b28eb670b",
    "name": "hello-world.yaml",
    "created_at": "2023-06-22T23:42:42Z",
    "resource_references": [
      {
        "key": {
          "type": "PIPELINE",
          "id": "6c0350c3-68eb-4359-8ae9-2c3d46765280"
        },
        "relationship": "OWNER"
      }
    ]
  }
}
```
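The same lookup works from Python (a sketch with requests, reusing the id from the upload above):

```python
# Look up the pipeline we just uploaded (same as the curl above).
import requests

svc = "http://localhost:9999"
pipeline_id = "6c0350c3-68eb-4359-8ae9-2c3d46765280"  # from the upload step
pipeline = requests.get(f"{svc}/apis/v1beta1/pipelines/{pipeline_id}").json()
print(pipeline["name"], pipeline["default_version"]["id"])
```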
And then trigger a run!
```bash
# Trigger Run
RUN_ID=$( (
curl -H "Content-Type: application/json" -X POST ${SVC}/apis/v1beta1/runs \
-d @- << EOF
{
  "name":"flux_component_run_1",
  "pipeline_spec":{
    "pipeline_id":"${PIPELINE_ID}",
    "args": {"project": "llnl-flux"}
  }
}
EOF
) | jq -r .run.id)
```
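Here is the same trigger from Python, sending the identical v1beta1 payload (a sketch with requests; the pipeline id comes from the upload step):

```python
# Trigger a run with the same payload as the heredoc above.
import requests

svc = "http://localhost:9999"
pipeline_id = "6c0350c3-68eb-4359-8ae9-2c3d46765280"  # from the upload step
payload = {
    "name": "flux_component_run_1",
    "pipeline_spec": {
        "pipeline_id": pipeline_id,
        "args": {"project": "llnl-flux"},
    },
}
run = requests.post(f"{svc}/apis/v1beta1/runs", json=payload).json()
run_id = run["run"]["id"]
print(run_id)
```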
At this point, we have some kind of bug reported in the log of the pod (it doesn't make it to the UI, which only shows an error message about a pod name):
```bash
$ kubectl logs -n kubeflow hello-world-flux-operator-bf4dq-691794632
```
```console
I0622 23:49:18.534358 17 main.go:224] output ExecutorInput:{
  "inputs": {
    "parameterValues": {
      "command": "echo hello world",
      "image": "ghcr.io/flux-framework/flux-restful-api:latest",
      "local": true,
      "name": "hello-world-run-123",
      "namespace": "kubeflow",
      "nnodes": 2,
      "project": "llnl-flux"
    }
  },
  "outputs": {
    "outputFile": "/tmp/kfp_outputs/output_metadata.json"
  }
}
time="2023-06-22T23:49:19.209Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-06-22T23:49:19.209Z" level=info msg="/tmp/outputs/pod-spec-patch -> /var/run/argo/outputs/parameters//tmp/outputs/pod-spec-patch" argo=true
time="2023-06-22T23:49:19.210Z" level=info msg="/tmp/outputs/cached-decision -> /var/run/argo/outputs/parameters//tmp/outputs/cached-decision" argo=true
time="2023-06-22T23:49:19.210Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
```
It's weird that it added an outputFile that I didn't define - this must be some kind of standard output. It's also weird that it's trying to open a `/tmp/outputs/condition` that isn't there! I'm not sure any of my script is running (I don't see any indication that it is).
#### Get Results

If we had an actual successful run:
```bash
curl ${SVC}/apis/v1beta1/runs/${RUN_ID} | jq
```
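And the same check from Python (a sketch with requests; fill in the run id from the trigger step):

```python
# Check on the run (same as the curl above). The run id is a
# placeholder - use the one returned when triggering the run.
import requests

svc = "http://localhost:9999"
run_id = "RUN_ID_FROM_TRIGGER_STEP"
detail = requests.get(f"{svc}/apis/v1beta1/runs/{run_id}").json()
print(detail["run"]["status"])
```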
**Under development**: I got too sleepy! Also, that output looks like a mess.
#### Clean Up
When you are done:
$ gcloud container clusters delete $CLUSTER_NAME
```
### Building the Container
If you need to build the container locally:
component.yaml (3 additions, 1 deletion)
```yaml
name: Flux Framework MiniCluster
description: |
  Launch a task to run on a Flux Framework MiniCluster
inputs:
- {name: project, type: String, "description": "Google Cloud project (required)"}
- {name: image, type: String, "description": "Container image for MiniCluster"}
# ... (more inputs elided)
# - {name: output_model_uri, type: {GcsPath: {data_type: Keras model}}, description: 'GCS path where the trained model has been saved. The model (topology + weights + optimizer state) is saved in HDF5 format and can be loaded back by calling keras.models.load_model'} #Remove GcsUri and make it a proper output once artifact passing support is checked in.

metadata:
  description: Launch a task to run on a Flux Framework MiniCluster
  annotations:
    author: Vanessasaurus <sochat1@llnl.gov>
    canonical_location: 'https://raw.githubusercontent.com/converged-computing/flux-operator-component/main/component.yaml'
```
sample.py (3 additions, 2 deletions)

```python
# An example pipeline (that gets compiled) using the Flux Operator component!
import os
import kfp.dsl as dsl
from kfp import components

# ... (elided)

def get_current_namespace():
    # ...
    current_namespace = "kubeflow"
    return current_namespace


def get_project():
    return os.environ.get("GOOGLE_PROJECT") or "llnl-flux"


@dsl.pipeline(
    name="hello-world-flux-operator",
    # ... (elided)
```
