Frequently asked questions

  • How do parallel jobs work?

    There are two ways that jobs can run in parallel in SABER.

    1. Parameter sweeps: When a job parameter is swept using a sweep file, SABER creates one workflow per combination of the swept parameter values, and all of these workflows run in parallel. The resulting workflows are completely independent of one another; within each one, steps depend only on previous steps of that sub-workflow.
    2. Parallel steps in a workflow: If two or more steps depend on the same step but do not depend on each other, they will run in parallel.

    These two methods of parallelization can be combined. For example, suppose you have a workflow defined as

           / --> s2 -- \ 
    s1(t) --              --> s4
           \ --> s3 -- /
    

    where s1(t) depends on a parameter t. If you sweep over t so that it produces two iterations (say t=True and t=False), the result is a DAG similar to

                 / --> s2 -- \
    s1(True) ----             --> s4
                 \ --> s3 -- /

                 / --> s2' -- \
    s1'(False) --              --> s4'
                 \ --> s3' -- /
    

    Once s1 completes, s2 and s3 run in parallel; likewise, s2' and s3' run in parallel once s1' completes. Since the two sweep iterations are independent, the primed and unprimed sub-workflows also run in parallel with each other.
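
    For concreteness, the unswept diamond workflow above might be written in CWL along these lines (a minimal sketch; the file names s1.cwl through s4.cwl and the output id out are illustrative, not taken from SABER's examples):

    cwlVersion: v1.0
    class: Workflow

    inputs:
      t: boolean           # the parameter that will be swept

    outputs:
      result:
        type: File
        outputSource: s4/out

    steps:
      s1:
        run: s1.cwl        # consumes the swept parameter t
        in:
          t: t
        out: [out]
      s2:
        run: s2.cwl        # s2 and s3 both depend only on s1,
        in:                # so they can run in parallel
          input: s1/out
        out: [out]
      s3:
        run: s3.cwl
        in:
          input: s1/out
        out: [out]
      s4:
        run: s4.cwl        # s4 waits on both s2 and s3
        in:
          a: s2/out
          b: s3/out
        out: [out]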

    Parallel jobs in AWS are managed by the compute environment chosen for AWS Batch. In SABER's default configuration, where the compute environment is managed, as many jobs as fit within the limits placed on the compute environment will run in parallel. Each job spawns a separate EC2 instance to run it, and the instance type (and therefore cost) is based on the ResourceRequirement specified in the job CWL.
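
    For example, a job CWL can declare its resource needs like this (the values are illustrative); a managed compute environment then picks an instance type large enough to satisfy them:

    requirements:
      - class: ResourceRequirement
        coresMin: 4        # at least 4 vCPUs
        ramMin: 16384      # at least 16 GiB of RAM (ramMin is in MiB)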

    If you wish to put limits on the instances launched (for example, a maximum number of vCPUs or a restricted set of instance types), you can do so when creating your compute environment; see the AWS Batch documentation on compute environments.

  • Can I launch a job from the AWS console?

    Yes, as long as the job definition has already been generated and the containers pushed (i.e., you ran the cwl-to-dag script on the job and workflow CWL files). If you navigate to the AWS Batch console and go to the Job definitions page, you can select a job definition and launch a job from there. You will have to specify the individual parameters yourself.
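
    The parameters you are asked for map onto Ref:: placeholders in the job definition's container command (this is standard AWS Batch behavior). A simplified fragment of a job definition, shown as YAML for readability and with illustrative names:

    containerProperties:
      command: ["python", "run.py", "--t", "Ref::t"]
    parameters:
      t: "True"            # default; can be overridden when you submit the job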

    However, if you have already launched a similar job, you can find it in any of the status tabs under Jobs on the left of the console, click on the job, and click Clone job, which gives you the option to change individual parameters.

    Note: If you do not change any of the _saber_ parameters, the files in S3 will be overwritten. Specifically, you should change _saber_home or _saber_stepname, which determine where in S3 the files are stored. See the S3 page for details.
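
    For instance, when cloning a job you might override the parameters roughly like this (only the _saber_ parameter names come from SABER; the values are hypothetical):

    _saber_home: experiment-2    # write outputs under a new S3 prefix
    _saber_stepname: s1-rerun    # keep the previous step's files intact
    t: "False"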