Feature/psij scripts #849

Merged
jlnav merged 39 commits into develop from feature/psij_scripts
Jul 20, 2022

Conversation

@jlnav
Member

@jlnav jlnav commented May 31, 2022

Addresses #807

Introduces a pair of utilities for preparing and launching libEnsemble workflows on almost any machine and almost any scheduler, using a PSI/J backend. This should relieve users of some of the burden of maintaining batch submission scripts and working out how to submit jobs on each system.

  1. liberegister

Creates an initial, platform-independent PSI/J representation of a libEnsemble run. Run this utility on a calling script, followed by the script's usual arguments:

liberegister run_libe_forces.py --comms local --nworkers 4

This produces run_libe_forces.json, a PSI/J serialization that conforms to the PSI/J specification:

{
    "version": 0.1,
    "type": "JobSpec",
    "data": {
        "name": "libe-job",
        "executable": "python",
        "arguments": [
            "run_libe_forces.py",
            "--comms",
            "local",
            "--nworkers",
            "4"
        ],
        "directory": null,
        "inherit_environment": true,
        "environment": {
            "PYTHONNOUSERSITE": "1"
        },
        "stdin_path": null,
        "stdout_path": null,
        "stderr_path": null,
        "resources": {
            "node_count": 1,
            "process_count": null,
            "process_per_node": null,
            "cpu_cores_per_process": null,
            "gpu_cores_per_process": null,
            "exclusive_node_use": true
        },
        "attributes": {
            "duration": "30",
            "queue_name": null,
            "project_name": null,
            "reservation_id": null,
            "custom_attributes": {}
        },
        "launcher": null
    }
}

This JSON file is easy to create and easy to share with anyone else using libEnsemble.
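As a rough illustration of the structure above, the serialization can be approximated with the standard library alone. This is only a sketch: `make_jobspec` and `serialization_name` are hypothetical helpers written for this example, not functions from liberegister or PSI/J, and the defaults are simply copied from the JSON shown above.

```python
import json
from pathlib import Path


def make_jobspec(script, script_args, name="libe-job", node_count=1, duration="30"):
    """Return a dict shaped like the JobSpec serialization shown above.

    Hypothetical helper for illustration; not part of liberegister or PSI/J.
    """
    return {
        "version": 0.1,
        "type": "JobSpec",
        "data": {
            "name": name,
            "executable": "python",
            # The calling script comes first, followed by its own arguments.
            "arguments": [script, *script_args],
            "directory": None,
            "inherit_environment": True,
            "environment": {"PYTHONNOUSERSITE": "1"},
            "stdin_path": None,
            "stdout_path": None,
            "stderr_path": None,
            "resources": {
                "node_count": node_count,
                "process_count": None,
                "process_per_node": None,
                "cpu_cores_per_process": None,
                "gpu_cores_per_process": None,
                "exclusive_node_use": True,
            },
            "attributes": {
                "duration": duration,
                "queue_name": None,
                "project_name": None,
                "reservation_id": None,
                "custom_attributes": {},
            },
            "launcher": None,
        },
    }


def serialization_name(script):
    """Derive the .json file name from the calling script's name."""
    return Path(script).stem + ".json"


spec = make_jobspec("run_libe_forces.py", ["--comms", "local", "--nworkers", "4"])
text = json.dumps(spec, indent=4)  # Python None/True become JSON null/true
print(serialization_name("run_libe_forces.py"))
print(text)
```

Every platform-specific field (directory, queue, project, output paths) is left as null here, which is what keeps the file portable between users and machines.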

  2. libesubmit

Further parameterizes a PSI/J serialization, then submits the resulting Job to the specified scheduler. For example:

libesubmit run_libe_forces.json -q debug -A project -s slurm

This will also produce a Job-specific serialization, like 8ba9de56.run_libe_forces.json:

{
    "version": 0.1,
    "type": "JobSpec",
    "data": {
        "name": "libe-job",
        "executable": "/Users/jnavarro/miniconda3/envs/libe/bin/python3.8",
        "arguments": [
            "run_libe_forces.py",
            "--comms",
            "local",
            "--nworkers",
            "4"
        ],
        "directory": "/Users/jnavarro/Desktop/libensemble/libensemble/libensemble/tests/scaling_tests/forces/forces_simple",
        "inherit_environment": true,
        "environment": {
            "PYTHONNOUSERSITE": "1"
        },
        "stdin_path": null,
        "stdout_path": "8ba9de56.run_libe_forces.out",
        "stderr_path": "8ba9de56.run_libe_forces.err",
        "resources": {
            "node_count": 4,
            "process_count": null,
            "process_per_node": null,
            "cpu_cores_per_process": null,
            "gpu_cores_per_process": null,
            "exclusive_node_use": true
        },
        "attributes": {
            "duration": "30",
            "queue_name": "debug",
            "project_name": "project",
            "reservation_id": null,
            "custom_attributes": {}
        },
        "launcher": null
    }
}
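Comparing the two serializations, the Job-specific one fills in the interpreter path, working directory, output paths, node count, queue, and project. The sketch below mimics that step on a plain dict using only the standard library. It is an assumption about the shape of the transformation, not libesubmit's actual code: `parameterize` and its parameters (including `nodes` and `job_id`) are hypothetical names, and no job is submitted.

```python
import copy
import os
import sys
import uuid


def parameterize(spec, queue=None, project=None, nodes=None, script_dir=None, job_id=None):
    """Return (job_id, new_spec) with the platform-specific fields filled in.

    Hypothetical helper for illustration; the input spec is left unmodified.
    """
    out = copy.deepcopy(spec)
    data = out["data"]
    # Short identifier used to prefix the output files, e.g. "8ba9de56".
    job_id = job_id or uuid.uuid4().hex[:8]
    # By convention in the examples above, the calling script is the first argument.
    stem = os.path.splitext(os.path.basename(data["arguments"][0]))[0]
    data["executable"] = sys.executable
    data["directory"] = os.path.abspath(script_dir or os.getcwd())
    data["stdout_path"] = f"{job_id}.{stem}.out"
    data["stderr_path"] = f"{job_id}.{stem}.err"
    if nodes is not None:
        data["resources"]["node_count"] = nodes
    if queue is not None:
        data["attributes"]["queue_name"] = queue
    if project is not None:
        data["attributes"]["project_name"] = project
    return job_id, out


# A trimmed-down serialization with only the fields this sketch touches.
base = {
    "version": 0.1,
    "type": "JobSpec",
    "data": {
        "executable": "python",
        "arguments": ["run_libe_forces.py", "--comms", "local", "--nworkers", "4"],
        "directory": None,
        "stdout_path": None,
        "stderr_path": None,
        "resources": {"node_count": 1},
        "attributes": {"queue_name": None, "project_name": None},
    },
}

job_id, job_spec = parameterize(base, queue="debug", project="project", nodes=4)
print(f"{job_id}.run_libe_forces.json")
print(job_spec["data"]["stdout_path"])
```

Working on a deep copy keeps the shared, platform-independent file untouched, so the same serialization can be reused for later submissions.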

If libesubmit is run on a .json serialization from liberegister and can't find the specified calling script, it offers to search for matching candidate scripts:

*** libEnsemble 0.9.1+dev ***
Imported PSI/J serialization: run_libe_forces.json. Preparing submission...
Calling script: run_libe_forces.py
... not found in Job working directory!
Check somewhere else? (Y/N): y
/Users/jnavarro:
0. Music
1. Pictures
2. Desktop
3. Library
4. Public
5. Movies
6. Documents
7. Downloads
8. miniconda3
Specify a starting directory: 2
preparing... ctrl+c to abort.
132168it [00:02, 58050.10it/s]
detecting... ctrl+c to abort.
/Users/jnavarro:
1. .../libensemble/gitlab/libensemble/libensemble/tests/scaling_tests/forces/run_libe_forces.py                                                                                                                  
2. .../libensemble/gitlab/libensemble/examples/calling_scripts/run_libe_forces.py                                                                                                                                
3. .../libensemble/libensemble/libensemble/tests/scaling_tests/forces/forces_gpu/run_libe_forces.py                                                                                                              
4. .../libensemble/libensemble/libensemble/tests/scaling_tests/forces/forces_adv/run_libe_forces.py                                                                                                              
5. .../libensemble/libensemble/libensemble/tests/scaling_tests/forces/forces_simple/run_libe_forces.py                                                                                                           
6. .../libensemble/libensemble/examples/calling_scripts/run_libe_forces.py                                                                                                                                       
7. .../libensemble/libensemble/examples/tutorials/forces_with_executor/run_libe_forces.py                                                                                                                        
8. .../libensemble/software/libensemble/libensemble/tests/scaling_tests/forces/run_libe_forces.py                                                                                                                
9. .../libensemble/software/libensemble/examples/calling_scripts/run_libe_forces.py                                                                                                                              
10. .../libensemble/old_hpc/calling/templates/run_libe_forces.py                                                                                                                                                 
11. .../libensemble/old_hpc/hpc_tests/summit/templates/run_libe_forces.py                                                                                                                                        
12. .../libensemble/old_hpc/hpc_tests/old/calling/templates/run_libe_forces.py                                                                                                                                   
13. .../libensemble/old_hpc/hpc_tests/bebop/templates/run_libe_forces.py                                                                                                                                         
14. .../libensemble/old_hpc/hpc_tests/cori/templates/run_libe_forces.py                                                                                                                                          
15. .../libensemble/old_hpc/hpc_tests/theta/templates/run_libe_forces.py                                                                                                                                         
16. .../libensemble/newgitlab/libensemble/libensemble/tests/scaling_tests/forces/run_libe_forces.py                                                                                                              
17. .../libensemble/newgitlab/libensemble/examples/calling_scripts/run_libe_forces.py                                                                                                                            
18. .../libensemble/libE-templater/platforms/all/run_libe_forces.py                                                                                                                                              
19. .../libensemble/src/libensemble/libensemble/tests/scaling_tests/forces/run_libe_forces.py                                                                                                                    
20. .../libensemble/src/libensemble/examples/calling_scripts/run_libe_forces.py                                                                                                                                  
21. .../sc21/libensemble/libensemble/tests/scaling_tests/forces/run_libe_forces.py                                                                                                                               
22. .../sc21/libensemble/examples/calling_scripts/run_libe_forces.py                                                                                                                                             
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 132168/132168 [00:01<00:00, 95904.14it/s]
Specify a detected script: 5
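The search in the session above can be approximated by walking a starting directory and collecting every file whose name matches the calling script. This is a minimal sketch using `os.walk`, not the PR's implementation (which also shows tqdm progress bars). `find_candidates` is a hypothetical name, and the temporary directory tree exists only to make the example self-contained.

```python
import os
import tempfile


def find_candidates(start_dir, script_name):
    """Return sorted paths under start_dir whose file name equals script_name.

    Hypothetical helper for illustration; not part of libesubmit.
    """
    matches = []
    for root, _dirs, files in os.walk(start_dir):
        if script_name in files:
            matches.append(os.path.join(root, script_name))
    return sorted(matches)


# Build a small throwaway tree containing two copies of the calling script.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("forces_simple", "forces_gpu"):
        os.makedirs(os.path.join(tmp, sub))
        open(os.path.join(tmp, sub, "run_libe_forces.py"), "w").close()

    found = find_candidates(tmp, "run_libe_forces.py")
    # Present a numbered menu, similar to the session above.
    for i, path in enumerate(found, start=1):
        print(f"{i}. .../{os.path.relpath(path, tmp)}")
```

Matching on the file name alone is what produces the long candidate list in the session: every checkout containing a script with the same name is a hit, so the user still has to pick the right one.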

  • Should these two scripts really be combined into one script?
  • Create a custom launcher to flexibly launch libEnsemble with MPI onto allocated resources

@jlnav
Member Author

jlnav commented Jun 14, 2022

Getting started:

  1. git clone https://github.com/ExaWorks/psi-j-python.git, then pip install it
  2. pip install tqdm
  3. Check out this branch, then pip install the libEnsemble repository again, so scripts are copied into the environment's bin

@jlnav jlnav requested a review from shuds13 June 14, 2022 16:46
@jlnav
Member Author

jlnav commented Jun 14, 2022

Probably ready for the first round of feedback. Still WIP, of course.

Member

@shuds13 shuds13 left a comment

Following the discussion/demo, I am happy with how this is going so far. It currently works for local comms on the tested platforms. It looks like we need a PSI/J update for mpi4py comms.

We can decide on the best location for the utility scripts.

@jlnav jlnav requested a review from shuds13 July 11, 2022 21:17
@jlnav
Member Author

jlnav commented Jul 11, 2022

This iteration may be sufficient for a portable multiprocessing solution. As has been stated, extensions to PSI/J (which I'm looking into) will be necessary to support MPI comms.

Thoughts?

@jlnav jlnav marked this pull request as ready for review July 11, 2022 21:19
@jlnav jlnav marked this pull request as draft July 13, 2022 19:05
@jlnav jlnav marked this pull request as ready for review July 14, 2022 16:17
@jlnav
Member Author

jlnav commented Jul 14, 2022

  • Document

@jlnav jlnav changed the title [WIP] Feature/psij scripts Feature/psij scripts Jul 19, 2022
@jlnav jlnav merged commit 0075ea3 into develop Jul 20, 2022
@jlnav jlnav deleted the feature/psij_scripts branch July 20, 2022 20:08