
add PARSL_MPI_PREFIX as prefix #2197

Merged

Conversation

tomdemeyere
Contributor

Summary of Changes

Attempt to solve #2076. If the user provides parsl_resource_specification, it takes precedence in the command prefix. This should also be done for ONETEP, VASP, etc., but I don't know how happy you are with this whole concept.

A small reminder of why we care about this: it allows Parsl users to use any kind of launcher, mainly mpirun.
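
For illustration, the intended precedence could be sketched as follows. The build_command helper and the (launcher_prefix, binary) tuple layout below are hypothetical, not quacc's actual internals: the idea is just that when a parsl_resource_specification is supplied, the configured launcher prefix gets replaced by the literal $PARSL_MPI_PREFIX, which the shell expands from the variable Parsl sets for each task.

# Hypothetical sketch of the precedence rule; not quacc's actual internals.
# Assumption: a code's command is stored as a (launcher_prefix, binary) pair.


def build_command(cmd, parsl_resource_specification=None):
    prefix, binary = cmd
    if parsl_resource_specification:
        # Defer to Parsl: "$PARSL_MPI_PREFIX" is expanded by the shell from
        # the variable Parsl sets per task.
        prefix = "$PARSL_MPI_PREFIX"
    return f"{prefix} {binary}".strip()


print(build_command(("mpirun -np 4", "pw.x")))  # mpirun -np 4 pw.x
print(build_command(("mpirun -np 4", "pw.x"), {"num_ranks": 4}))  # $PARSL_MPI_PREFIX pw.x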


@buildbot-princeton
Collaborator

Can one of the admins verify this patch?


codecov bot commented May 31, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 99.01%. Comparing base (568ddce) to head (cb5f8a8).
Report is 1 commit behind head on main.

Files                  | Patch % | Lines
src/quacc/settings.py  | 66.66%  | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2197      +/-   ##
==========================================
- Coverage   99.04%   99.01%   -0.03%     
==========================================
  Files          81       81              
  Lines        3364     3367       +3     
==========================================
+ Hits         3332     3334       +2     
- Misses         32       33       +1     


@Andrew-S-Rosen
Member

Oh, I see. This might be why my half-hearted attempt to use the new MPIExecutor did not work as planned (#2167).

It would probably make sense to have this taken care of early on, like when the settings are initialized. For instance, here one could swap out the 0th index of the tuple with the environment variable, if supplied. This would make it easier to prevent bugs, since we won't have to make changes in multiple places. It is a bit annoying to have to do this for every single MPI executable, but I don't really see a way around that given the current Parsl setup; if it's all confined within a single settings.py file, at least it's not too cumbersome to deal with.
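
For concreteness, that centralized swap might look something like the sketch below. The command table, setting names, and tuple layout are illustrative assumptions, not the actual settings.py code:

import os

# Hypothetical defaults: each MPI executable stored as (mpi_prefix, binary).
_DEFAULT_COMMANDS = {
    "ESPRESSO_PW_CMD": ("mpirun -np 4", "pw.x"),
    "ESPRESSO_PH_CMD": ("mpirun -np 4", "ph.x"),
}


def _init_commands():
    # At settings initialization, swap the 0th index of each tuple with the
    # environment variable, if supplied, so the override lives in one place.
    prefix = os.environ.get("PARSL_MPI_PREFIX")
    if prefix is None:
        return dict(_DEFAULT_COMMANDS)
    return {name: (prefix, binary) for name, (_, binary) in _DEFAULT_COMMANDS.items()}

That way, every MPI executable picks up the override from a single place rather than each recipe handling it separately.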

@tomdemeyere
Contributor Author

tomdemeyere commented Jun 3, 2024

Debugging Parsl scripts is a nightmare; I still haven't found a way to cleanly show stdout somewhere...

This should work. I just wanted to check what happens if you send jobs with different specifications inside a subflow, but suddenly I am getting hit with a:

File "/Users/tomdm/repos/quacc/src/quacc/wflow_tools/decorators.py", line 617, in wrapper
return func(*f_args, **f_kwargs)
^^^^^^^^^^^^^^^^^
File "/Users/tomdm/repos/quacc/src/quacc/recipes/espresso/core.py", line 87, in static_job
is_metal = check_is_metal(atoms)
^^^^^^^^^^^^^^^^^
NameError: name 'check_is_metal' is not defined

This happens when trying to run a static_job with Espresso on my local machine:

from quacc import SETTINGS

SETTINGS.WORKFLOW_ENGINE = "parsl"

import parsl
from ase.build import molecule
from parsl.channels import LocalChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

from quacc import job, redecorate, subflow
from quacc.recipes.espresso.core import static_job

# Single local worker, just for testing.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_local",
            cores_per_worker=1,
            max_workers=1,
            provider=LocalProvider(
                channel=LocalChannel(),
                init_blocks=1,
                max_blocks=1,
            ),
        )
    ],
)

# Two different MPI resource specifications to compare.
resource_spec_1 = {
    "num_nodes": 2,
    "ranks_per_node": 2,
    "num_ranks": 4,
}

resource_spec_2 = {
    "num_nodes": 1,
    "ranks_per_node": 1,
    "num_ranks": 1,
}

parsl.load(config)

atoms = molecule("H2O")

"""
@job
def test_mpi_apps():
    from os import environ

    return [environ.get("PARSL_RANKS_PER_NODE"), environ.get("PARSL_NUM_NODES"), environ.get("PARSL_NUM_RANKS"), environ.get("PARSL_MPI_PREFIX"), environ.get("PARSL_MPIEXEC_PREFIX"), environ.get("PARSL_MPI_NODELIST")]
"""

test_mpi_apps_1 = redecorate(static_job, job(parsl_resource_specification=resource_spec_1))
# test_mpi_apps_2 = redecorate(static_job, job(parsl_resource_specification=resource_spec_2))

# @subflow
# def subflow_mpi_apps():
#     return [test_mpi_apps_1(atoms), test_mpi_apps_2(atoms)]

future = test_mpi_apps_1(atoms)

print(future.result())

If you have time, can you tell me what you get from running this script? (Nothing should need changing, normally.)

@Andrew-S-Rosen
Member

Andrew-S-Rosen commented Jun 3, 2024

Thanks for the changes, which look good to me.

The error you receive is not very promising! At a quick glance, it looks like Parsl is having trouble finding the function definition in the global namespace. In principle, this should not be an issue, but perhaps something weird is going on here...

I tried running the code but I'm just getting a bunch of Parsl debug chatter with no obvious runtime output or error... :( I agree debugging Parsl behavior is a real pain.
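
(Aside, and hedged since I haven't verified it against this exact script: two stock Parsl facilities that can help surface output locally are streaming the framework logs to the console and digging through the run directory.)

import parsl

# Mirror Parsl's internal logging to stderr; verbose, but it shows the task
# lifecycle instead of silence.
parsl.set_stream_logger()

# Per-task stdout/stderr and worker logs also land under the run directory
# ("runinfo/" by default); the latest run is the highest-numbered subfolder.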

@tomdemeyere
Contributor Author

I just tried the same script on master and got the same thing; I guess there are still things I don't understand about Parsl!

In any case, it seems not to be related to this PR.

To be completely honest, this would require much more testing (on proper HPC, etc.), which practically cannot be done here. I would suggest that for now quacc goes with this and waits for feedback from other users (or from me, but after the PhD, when I have time 😅).

@Andrew-S-Rosen
Member

Andrew-S-Rosen commented Jun 3, 2024

Thanks! I'm going to merge this as-is. When I try to solve #2116, I'll be testing things out on HPC and with different codes, which will likely surface any problems for me to address. But this PR will be a good point to build from.

@Andrew-S-Rosen Andrew-S-Rosen merged commit 3b96371 into Quantum-Accelerators:main Jun 3, 2024
18 of 19 checks passed