Jeff Squyres edited this page Dec 16, 2015 · 121 revisions

Feb 2016 OMPI Developer's Meeting

Dates:

  • Start: 9am Tuesday, February 23, 2016
  • Finish: 1pm Thursday, February 25, 2016

Location:

We've been told that a number of hotels run shuttles to/from both DFW and the IBM Innovation Center. Geoff Paulsen (@gpaulsen) is compiling a list of those hotels and will update this page once it is ready.

Attendees

Local attendees:

  1. Jeff Squyres - Cisco
  2. Ralph Castain - Intel
  3. Edgar Gabriel - UHouston

Suggested topics:

  1. Now that high-speed networks can be accessed via multiple network stacks (and multiple Open MPI components), users are getting confused about how to (un)select specific networks. We need to figure out a better / easier way for users to convey what they want. See http://www.open-mpi.org/community/lists/devel/2015/10/18154.php for more detail.
    1. One (very loose and probably not yet well thought-out) idea is to have some kind of higher-level abstraction: --[enable|disable] NETWORK_TYPE:QUALIFIER
  2. Mellanox/LANL have raised a good point that customers do not want to have multiple different Open MPI installations in their environments (e.g., Vendor A OMPI and Vendor B OMPI and community OMPI).
    1. How can a customer have a single OMPI installation, but still have vendor/distribution-specific enhancements?
    2. Or is that the wrong question -- should we really be working to enable individual component distribution? This would disallow vendors from distributing patches to core.
  3. Can we double check that all vendor Open MPI distributions are appropriately marked? E.g., via --with-ident-string? Not sure how to do this other than to ask each vendor/distribution -- perhaps we should default to "Unknown distribution" and have the nightly/release scripts set the official strings...?
  4. Re-discuss the separation of our libraries into libmpi, libopen-rte, and libopen-pal.
    • Beginning of the project: there was just libmpi. Later, it was split into projects, and then the project libraries. Later, the build was unified back into libmpi again.
    • In Dec 2012 (here's the commit), we split the build back into 3 libraries. The commit message cites discussion at the Dec 2012 Open MPI dev meeting -- but unfortunately the wiki notes contain no clues as to the rationale for the change. Was it just because we developers like having 3 smaller libraries? Or is there some deeper technical issue? Neither Ralph nor Jeff remembers. 😦
    • Rationale for bringing this up again: when upstream projects are trying to link against portions of our project, and then also support apps that link against all of it, we run into conflicts (e.g., the ORTE being used by the upstream project may be different than the one being used by the OMPI installation). Slurping it all up into one library would resolve the problem - but we cannot recall if there are undesirable side-effects.
  5. Nathan: Do we really need --enable-mpi-thread-multiple any more?
  6. Geoff Paulsen: fast path for send when only 1 outstanding request (a la Platform MPI)
  7. Ralph/George/Jeff: establishing dependencies in frameworks. This is a somewhat large topic...
    • In December, Ralph/George/Jeff talked about Intel's desire to use some of OPAL/ORTE/OMPI's frameworks in its own projects. Ralph had previously been copying source code around between repositories, and it was pretty much a mess.
    • After much discussion, we realized that Ralph can just link projects like SCON against libopen-rte.so and just access the frameworks that he wants. I.e., do a full install of ORTE (or probably OMPI?) using --with-devel-headers, and then literally treat OMPI/ORTE/OPAL frameworks just like any other shared library. Yay!
    • Sidenote: just for cleanliness, Ralph will "tighten up" each ORTE framework to require as few dependencies as possible
    • There is one caveat, however: the application linking against libopen-rte likely doesn't want to call orte_init() to initialize all of ORTE; it just wants to use a few frameworks. So the application will need to know the "magic order" in which frameworks must be initialized (i.e., the contents and ordering of orte_init() and the ESS framework init [which is where most of the heavy lifting for ORTE initialization actually occurs these days]).
    • What would be better is if the frameworks themselves could declare what their dependencies are. E.g., if the application initializes the ABC framework, the ABC framework should be able to realize that it requires the DEF framework to be initialized first. This could possibly be done with a registration-based system. E.g., in the first few lines of framework open and/or init, it can call something like opal_register_framework_dependency("ABC", "DEF"). Components could even do the same thing; for example, the ob1 PML needs the BTL framework, so it could call opal_register_component_dependency("pml", "ob1", "btl") (or something like this).
    • At this meeting, we'd like to discuss the possibilities for such a system, and sketch out what the API should be.
    • Such a system could actually be used throughout all of OPAL, ORTE, OMPI, and OSHMEM. It would eliminate the "magic ordering" that we have in opal_init(), orte_init(), ompi_runtime_init(), and oshmem_runtime_init().
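
To seed the discussion of topic 7, here is a minimal standalone sketch of what a registration-based dependency system might look like. Nothing here is existing Open MPI code: the sketch_* function names, the fixed-size tables, and the fw_order log are all assumptions made up for illustration, and a real version would presumably hook into the MCA framework open/init paths instead of just recording a name.

```c
#include <assert.h>
#include <string.h>

#define MAX_FW   16   /* arbitrary table sizes, chosen only for this sketch */
#define MAX_DEPS 8

typedef struct {
    const char *name;            /* caller keeps the string alive (e.g., a literal) */
    const char *deps[MAX_DEPS];  /* frameworks that must be initialized first */
    int ndeps;
    int state;                   /* 0 = untouched, 1 = in progress, 2 = initialized */
} fw_entry;

static fw_entry fw_table[MAX_FW];
static int fw_count = 0;

/* Log of the order in which frameworks were initialized (for inspection). */
static const char *fw_order[MAX_FW];
static int fw_order_len = 0;

/* Find a framework by name, creating an empty entry on first use. */
static fw_entry *fw_lookup(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < fw_count; i++) {
        if (strcmp(fw_table[i].name, name) == 0) {
            return &fw_table[i];
        }
    }
    if (fw_count == MAX_FW) {
        return NULL;             /* table full */
    }
    fw_entry *e = &fw_table[fw_count++];
    e->name = name;
    e->ndeps = 0;
    e->state = 0;
    return e;
}

/* Hypothetical API: declare that `framework` needs `needs` initialized first.
 * Returns 0 on success, -1 if a table is full. */
int sketch_register_framework_dependency(const char *framework, const char *needs)
{
    fw_entry *e = fw_lookup(framework);
    if (e == NULL || fw_lookup(needs) == NULL || e->ndeps == MAX_DEPS) {
        return -1;
    }
    e->deps[e->ndeps++] = needs;
    return 0;
}

/* Hypothetical API: initialize `framework`, recursively initializing its
 * declared dependencies first (depth-first walk of the dependency graph).
 * Returns 0 on success, -1 on a dependency cycle or a full table. */
int sketch_init_framework(const char *framework)
{
    fw_entry *e = fw_lookup(framework);
    if (e == NULL) {
        return -1;
    }
    if (e->state == 2) {
        return 0;                /* already initialized: nothing to do */
    }
    if (e->state == 1) {
        return -1;               /* reached ourselves again: cycle */
    }
    e->state = 1;
    for (int i = 0; i < e->ndeps; i++) {
        if (sketch_init_framework(e->deps[i]) != 0) {
            return -1;
        }
    }
    /* A real system would call the framework's open/init function here. */
    e->state = 2;
    fw_order[fw_order_len++] = e->name;
    return 0;
}
```

With this sketch, an application that registers ("ABC", "DEF") and ("DEF", "GHI") and then calls sketch_init_framework("ABC") gets GHI, DEF, and ABC initialized in that order without knowing the order itself. Open questions for the meeting include where the registrations should live (framework open vs. init), how component-level dependencies fit in, and how teardown ordering should work.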
