Howard Pritchard edited this page Jan 11, 2016 · 121 revisions

Feb 2016 OMPI Developer's Meeting

Dates:

  • Start: 9am Tuesday, February 23, 2016
  • Finish: 1pm Thursday, February 25, 2016

Location:

Map of Hotels:

  • These 3 Blue hotels offer shuttles both to/from IBM site AND the DFW International Airport:
    • Sheraton Grand Hotel (972-929-8400) - 4440 West John W Carpenter Freeway, Irving, TX 75063, USA
    • Holiday Inn Express (972-929-4499) - 4550 West John W Carpenter Freeway, Irving, TX 75063, USA
    • Hampton Inn (972-471-5000) - 1750 TX-121, Grapevine, TX 76051, USA
  • https://www.mapcustomizer.com/map/IBM%20IIC%20and%20Hotels%20-%20Map2?utm_source=share&utm_medium=email
  • NOTE: We stopped calling after we found 3 hotels that offered shuttles BOTH to the IBM site, and to the airport.

Attendees

  1. Jeff Squyres - Cisco
  2. Ralph Castain - Intel
  3. Edgar Gabriel - UHouston
  4. Stan Graves - IBM
  5. Geoff Paulsen - IBM
  6. Perry Schmidt (in and out) - IBM
  7. Dave Solt - IBM
  8. Brice Goglin - Inria
  9. Howard Pritchard - LANL

Suggested topics:

  1. Now that high-speed networks can be accessed via multiple network stacks (and multiple Open MPI components), users are getting confused about how to (un)select specific networks. We need to figure out a better / easier way for users to convey what they want. See http://www.open-mpi.org/community/lists/devel/2015/10/18154.php for more detail.
    1. One (very loose and probably not yet well thought-out) idea is to have some kind of higher-level abstraction: --[enable|disable] NETWORK_TYPE:QUALIFIER
  2. Mellanox/LANL have raised a good point that customers do not want to have multiple different Open MPI installations in their environments (e.g., Vendor A OMPI and Vendor B OMPI and community OMPI).
    1. How can a customer have a single OMPI installation, but still have vendor/distribution-specific enhancements?
    2. Or is that the wrong question -- should we really be working to enable individual component distribution? This would disallow vendors from distributing patches to core.
  3. Can we double check that all vendor Open MPI distributions are appropriately marked? E.g., via --with-ident-string? Not sure how to do this other than to ask each vendor/distribution -- perhaps we should default to "Unknown distribution" and have the nightly/release scripts set the official strings...?
  4. Re-discuss the separation of our libraries into libmpi, libopen-rte, and libopen-pal.
    • Beginning of the project: there was just libmpi. Later, it was split into projects, and then the project libraries. Later, the build was unified back into libmpi again.
    • In Dec 2012 (here's the commit), we split the build back into 3 libraries. The commit message cites discussion at the Dec 2012 Open MPI dev meeting, but unfortunately the wiki notes give no clues as to why this was done. Was it just because we developers like having 3 smaller libraries? Or is there some deeper technical issue? Neither Ralph nor Jeff remembers. 😦
    • Rationale for bringing this up again: when upstream projects are trying to link against portions of our project, and then also support apps that link against all of it, we run into conflicts (e.g., the ORTE being used by the upstream project may be different than the one being used by the OMPI installation). Slurping it all up into one library would resolve the problem - but we cannot recall if there are undesirable side-effects.
  5. Nathan: Do we really need --enable-mpi-thread-multiple any more?
  6. Geoff Paulsen: fast path for send when only 1 outstanding request (a la Platform MPI)
  7. Ralph/George/Jeff: establishing dependencies in frameworks. This is a somewhat larger topic, but for the time being we will focus on a simple usage scenario...
    • In December, Ralph/George/Jeff talked about Intel's desire to use some of OPAL/ORTE/OMPI's frameworks in its own projects. Ralph had previously been copying source code around between repositories, and it was pretty much a mess.
    • After much discussion, we realized that Ralph can just link projects like SCON against libopen-rte.so and just access the frameworks that he wants. I.e., do a full install of ORTE (or probably OMPI?) using --with-devel-headers, and then literally treat OMPI/ORTE/OPAL frameworks just like any other shared library. Yay!
    • Sidenote: just for cleanliness, Ralph will "tighten up" each ORTE framework to require as few dependencies as possible
    • There is one caveat, however: the application linking against libopen-rte likely doesn't want to call orte_init() to initialize all of ORTE; it just wants to use a few frameworks. So the application will need to know the "magic order" in which frameworks must be initialized (i.e., the contents and ordering of orte_init() and the ESS framework init, which is where most of the heavy lifting for ORTE initialization actually occurs these days).
    • What would be better is if the frameworks themselves could declare what their dependencies are. E.g., if the application initializes the ABC framework, the ABC framework should be able to realize that it requires the DEF framework to be initialized first. This could possibly be done with a registration-based system. E.g., in the first few lines of framework open and/or init, it can call something like opal_register_framework_dependency("ABC", "DEF"). Components could even do the same thing; for example, the ob1 PML needs the BTL framework, so it could call opal_register_component_dependency("pml", "ob1", "btl"), or something like this.
    • At this meeting, we'd like to discuss the possibilities for such a system, and sketch out what the API should be.
    • Such a system could actually be used throughout all of OPAL, ORTE, OMPI, OSHMEM. It would not only eliminate the "magic ordering" that we have in opal_init(), orte_init(), ompi_runtime_init(), and oshmem_runtime_init(), but also allow for minimal initialization in cases where not all components are necessary for a particular run.
  8. Ralph: Further cleanup of the code base for project separation
    • Split autogen.pl and configure.ac by project?
    • Cleanup naming conventions - still have "ompi" in the "opal" layer, "ompi" named configure variables in the opal layer, etc.
  9. Howard: features for 2.1. Timeframe for forking 3.0?
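
As a starting point for the API discussion in topic 7, here is a minimal sketch of a registration-based dependency system. Everything in it is hypothetical: the `sketch_` names, the fixed-size tables, and the choice to register dependencies up front (rather than inside framework open/init, as suggested above) are assumptions made for illustration and do not exist in OPAL, ORTE, or OMPI. It shows only the core idea: initializing one framework walks its declared dependencies first, so the caller no longer needs to know the "magic order".

```c
#include <string.h>

#define SKETCH_MAX_FRAMEWORKS 16
#define SKETCH_MAX_NAME 32

enum { SKETCH_NEW, SKETCH_IN_PROGRESS, SKETCH_READY };

/* Hypothetical registry entry: a framework name plus the indices of the
 * frameworks that must be initialized before it. */
typedef struct {
    char name[SKETCH_MAX_NAME];
    int deps[SKETCH_MAX_FRAMEWORKS];
    int ndeps;
    int state;
} sketch_framework_t;

static sketch_framework_t sketch_registry[SKETCH_MAX_FRAMEWORKS];
static int sketch_count = 0;
/* Space-separated record of the order in which frameworks were initialized. */
static char sketch_order[SKETCH_MAX_FRAMEWORKS * (SKETCH_MAX_NAME + 1)] = "";

/* Return the index of a framework, creating an entry on first use; -1 on error. */
static int sketch_lookup(const char *name)
{
    int i;
    if (strlen(name) >= SKETCH_MAX_NAME) return -1;
    for (i = 0; i < sketch_count; i++) {
        if (strcmp(sketch_registry[i].name, name) == 0) return i;
    }
    if (sketch_count == SKETCH_MAX_FRAMEWORKS) return -1;
    strcpy(sketch_registry[sketch_count].name, name);
    sketch_registry[sketch_count].ndeps = 0;
    sketch_registry[sketch_count].state = SKETCH_NEW;
    return sketch_count++;
}

/* Record that `framework` needs `needs` to be initialized first. */
int sketch_register_framework_dependency(const char *framework, const char *needs)
{
    int f = sketch_lookup(framework);
    int d = sketch_lookup(needs);
    if (f < 0 || d < 0) return -1;
    if (sketch_registry[f].ndeps == SKETCH_MAX_FRAMEWORKS) return -1;
    sketch_registry[f].deps[sketch_registry[f].ndeps++] = d;
    return 0;
}

/* Depth-first initialization; returns -1 if a dependency cycle is found. */
static int sketch_init_index(int idx)
{
    sketch_framework_t *fw = &sketch_registry[idx];
    int i;
    if (fw->state == SKETCH_READY) return 0;        /* already initialized */
    if (fw->state == SKETCH_IN_PROGRESS) return -1; /* cycle detected */
    fw->state = SKETCH_IN_PROGRESS;
    for (i = 0; i < fw->ndeps; i++) {
        if (sketch_init_index(fw->deps[i]) != 0) return -1;
    }
    /* A real framework would run its own open/init hook here. */
    if (sketch_order[0] != '\0') strcat(sketch_order, " ");
    strcat(sketch_order, fw->name);
    fw->state = SKETCH_READY;
    return 0;
}

/* Initialize a framework by name, initializing its dependencies first. */
int sketch_framework_init(const char *name)
{
    int idx = sketch_lookup(name);
    return idx < 0 ? -1 : sketch_init_index(idx);
}

const char *sketch_init_order(void)
{
    return sketch_order;
}
```

With this sketch, calling `sketch_register_framework_dependency("ABC", "DEF")` followed by `sketch_framework_init("ABC")` initializes DEF before ABC, and a second `sketch_framework_init("ABC")` is a no-op. A component-level variant would need a richer key (framework plus component name), and error handling, teardown order, and thread safety are all left open here.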
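
For topic 1, the `--[enable|disable] NETWORK_TYPE:QUALIFIER` idea could be prototyped with a very small parser. The sketch below is only an illustration of the syntax as written in this agenda: the `sketch_` names are hypothetical, treating the `:QUALIFIER` part as optional is an assumption, and how a parsed selection would map onto actual components is exactly the open question for the meeting.

```c
#include <string.h>

#define SKETCH_FIELD_MAX 64

/* Hypothetical parsed form of "--enable|--disable NETWORK_TYPE[:QUALIFIER]". */
typedef struct {
    int enable;                          /* 1 for --enable, 0 for --disable */
    char network_type[SKETCH_FIELD_MAX];
    char qualifier[SKETCH_FIELD_MAX];    /* empty string when none was given */
} sketch_net_selection_t;

/* Parse a flag and its value. Returns 0 on success, -1 on malformed input. */
int sketch_parse_net_selection(const char *flag, const char *value,
                               sketch_net_selection_t *out)
{
    const char *colon;
    size_t type_len, qual_len;

    if (strcmp(flag, "--enable") == 0) {
        out->enable = 1;
    } else if (strcmp(flag, "--disable") == 0) {
        out->enable = 0;
    } else {
        return -1;
    }

    /* Split at the first ':'; everything after it is the qualifier. */
    colon = strchr(value, ':');
    type_len = colon ? (size_t)(colon - value) : strlen(value);
    qual_len = colon ? strlen(colon + 1) : 0;
    if (type_len == 0 || type_len >= SKETCH_FIELD_MAX ||
        qual_len >= SKETCH_FIELD_MAX) {
        return -1;
    }

    memcpy(out->network_type, value, type_len);
    out->network_type[type_len] = '\0';
    if (colon) {
        strcpy(out->qualifier, colon + 1);
    } else {
        out->qualifier[0] = '\0';
    }
    return 0;
}
```

For example, parsing the flag `--disable` with a made-up value such as `infiniband:port1` yields `enable == 0`, a network type of `infiniband`, and a qualifier of `port1`. The set of valid network types and the meaning of qualifiers are deliberately not defined here.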
