Pipelines / DAG #10

Open
1 of 3 tasks
arthur-flam opened this issue Jun 9, 2020 · 1 comment


arthur-flam commented Jun 9, 2020

QA-Board currently lacks the expressiveness for one of our common use cases:

  1. Run on some images
  2. Calibration
  3. Validation

Likewise, we can't easily express pipelines such as training followed by evaluation.

We need a way to express a series of steps (pipelines, tasks) organized as a directed acyclic graph (DAG).

We're looking for feedback or alternative ideas, especially from people with experience with flow engines such as DVC. Thanks!

Workarounds

Users have worked around this by:

  • wrapping qa batch in a scripted pipeline
  • writing a complicated run() function with lots of logic

Status

  • Implement user-side support for sequential pipelines
  • Support pipelines officially in QA-Board
  • Support DAGs

Possible API

batch1:
  inputs:
  - A.jpg
  - B.jpg
  configurations:
  - base

batch2:
  needs: batch1
  type: script
  configurations:
  - python my_script.py {o.output_dir for o in needs["batch1"]}

More complex:

my-calibration-images:
    configurations:
    - base
    inputs:
    - DL50.raw
    - DL55.raw
    - DL65.raw
    - DL75.raw

my-calibration:
    needs:
      calibration_images: my-calibration-images
    type: script
    configurations:
    - python calibration.py ${o.output_directory for o in depends[calibration_images]}

my-evaluation-batch:
    needs:
      calibration: my-calibration
    inputs:
    - test_image_1.raw
    - test_image_2.raw
    - test_image_3.raw
    configurations:
    - base
    - ${depends[calibration].output_directory}/calibration.cde

Running the last batch would then run its dependencies first:

$ qa batch my-evaluation-batch
#=> qa batch my-calibration-images
#=> qa batch my-calibration
#=> qa batch my-evaluation-batch
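
Resolving which batches to run, and in what order, amounts to a topological sort of the `needs` graph. Here is a minimal sketch, assuming the batch definitions are already loaded into a dict and that `needs` may be a single name, a list of names, or a mapping of alias to batch name as in the examples. The `dependencies` and `run_order` helpers are hypothetical and not part of QA-Board.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def dependencies(batch):
    """Return the batch names listed under `needs`, whatever its shape."""
    needs = batch.get("needs", [])
    if isinstance(needs, str):
        return [needs]
    if isinstance(needs, dict):  # alias -> batch name
        return list(needs.values())
    return list(needs)


def run_order(batches, target):
    """List `target` and its transitive dependencies, dependencies first."""
    graph = {}
    pending = [target]
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        graph[name] = dependencies(batches[name])
        pending.extend(graph[name])
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical in-memory version of the "More complex" example (inputs omitted).
batches = {
    "my-calibration-images": {},
    "my-calibration": {"needs": {"calibration_images": "my-calibration-images"}},
    "my-evaluation-batch": {"needs": {"calibration": "my-calibration"}},
}
order = run_order(batches, "my-evaluation-batch")
print(order)
```

Only the batches reachable from the requested one are scheduled, which should also help with the "cache friendly" goal: a batch whose outputs already exist could be skipped before it is run.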

Thoughts

  • We should add built-in support for a script input type that simply executes its configurations as commands. It goes well with DAGs.

my-script:
  needs: batch1
  type: script
  configurations:
  - echo OK
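
One way a `script` type could behave is to run each configuration entry as a shell command, in order. This is a rough sketch only: `run_script_batch` is a hypothetical name, not an existing QA-Board function, and it assumes every configuration is a plain command string.

```python
import subprocess


def run_script_batch(configurations, cwd=None):
    """Run each configuration as a shell command, stopping at the first failure."""
    outputs = []
    for command in configurations:
        # check=True raises subprocess.CalledProcessError on a non-zero exit code.
        result = subprocess.run(
            command, shell=True, cwd=cwd, capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout)
    return outputs


outputs = run_script_batch(["echo OK"])
print(outputs)
```

Since `shell=True` passes the string to the shell unchanged, this only makes sense for configurations that come from a trusted project file.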

Expected

  • Easy API
  • Cache friendly
  • Can be used in a non-blocking way

arthur-flam commented Jul 24, 2020

Update: thanks to Itamar Persi and Ela Shahar, there is a pipeline implementation in "user-land":

my-pipeline:
  configs:
  - run: echo "Step 1"
  - batch: first-batch
  - batch:
    - second-batch
    - third-batch
    - label: batches running in parallel
  - run: some-postprocessing-script.py

Features include

  • using PIPELINE_OUTPUT_DIR to save data across batches
  • giving run steps information about the previous batch (what qa batch --list returns)

It's much simpler than a full DAG, and good enough in most cases.
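
For readers who want a feel for the idea before the sample code is shared, here is a minimal sketch of a sequential pipeline runner. It is not the implementation mentioned above. It assumes the steps are already parsed into a list of dicts, that mapping entries inside a `batch` list (such as `label`) are metadata rather than batch names, and that the caller supplies the two hypothetical callables `run_command` and `run_batch`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(steps, run_command, run_batch):
    """Execute steps in order; batches listed together in one step run in parallel."""
    results = []
    for step in steps:
        if "run" in step:
            results.append(run_command(step["run"]))
        elif "batch" in step:
            batch = step["batch"]
            entries = [batch] if isinstance(batch, str) else batch
            # Assumption: mapping entries such as {"label": ...} are metadata.
            names = [entry for entry in entries if isinstance(entry, str)]
            with ThreadPoolExecutor(max_workers=len(names) or 1) as pool:
                # pool.map returns results in the same order as `names`.
                results.append(list(pool.map(run_batch, names)))
        else:
            raise ValueError(f"unknown step: {step!r}")
    return results


steps = [
    {"run": 'echo "Step 1"'},
    {"batch": "first-batch"},
    {"batch": ["second-batch", "third-batch", {"label": "batches running in parallel"}]},
    {"run": "some-postprocessing-script.py"},
]
# Stand-in callables; a real setup would invoke a shell and `qa batch` here.
log = run_pipeline(
    steps,
    run_command=lambda command: f"ran {command}",
    run_batch=lambda name: f"batch {name}",
)
print(log)
```

Passing the runners in as arguments keeps the control flow testable without touching QA-Board itself.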

Next steps

  • We'll contribute it to the project as a default run() if the input type is pipeline
  • Until then, we'll provide the code here on request as sample code (just comment here)...
