Meeting 2016 02
Howard Pritchard edited this page Jan 11, 2016
Dates:
- Start: 9am Tuesday, February 23, 2016
- Finish: 1pm Thursday, February 25, 2016
Location:
- IBM Dallas Innovation Center web site
- Google Maps
- Street address: 1177 South Beltline Road, Coppell, Texas 75019 USA
Map of Hotels:
- These 3 Blue hotels offer shuttles both to/from IBM site AND the DFW International Airport:
- Sheraton Grand Hotel (972-929-8400) - 4440 West John W Carpenter Freeway, Irving, TX 75063, USA
- Holiday Inn Express (972-929-4499) - 4550 West John W Carpenter Freeway, Irving, TX 75063, USA
- Hampton Inn (972-471-5000) - 1750 TX-121, Grapevine, TX 76051, USA
- https://www.mapcustomizer.com/map/IBM%20IIC%20and%20Hotels%20-%20Map2?utm_source=share&utm_medium=email
- NOTE: We stopped calling after we found 3 hotels that offered shuttles BOTH to the IBM site, and to the airport.
Attendees:
- Jeff Squyres - Cisco
- Ralph Castain - Intel
- Edgar Gabriel - UHouston
- Stan Graves - IBM
- Geoff Paulsen - IBM
- Perry Schmidt (in and out) - IBM
- Dave Solt - IBM
- Brice Goglin - Inria
- Howard Pritchard - LANL
- Now that high-speed networks can be accessed via multiple network stacks (and multiple Open MPI components), users are getting confused about how to (un)select specific networks. We need to figure out a better / easier way for users to convey what they want. See http://www.open-mpi.org/community/lists/devel/2015/10/18154.php for more detail.
- One (very loose and probably not yet well thought-out) idea is to have some kind of higher-level abstraction:
--[enable|disable] NETWORK_TYPE:QUALIFIER
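The option syntax above is deliberately loose. A minimal C sketch of how a --[enable|disable] NETWORK_TYPE:QUALIFIER argument might be parsed can help pin down the details to settle, such as whether the qualifier is optional. Everything here is hypothetical: net_selection_t, parse_net_selection(), the 32-byte field limits, and example values like "infiniband" and "verbs" are assumptions for illustration, not existing Open MPI options or APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of one --enable/--disable request.
 * Field names and the 32-byte limits are illustrative assumptions. */
typedef struct {
    bool enable;            /* true for --enable, false for --disable */
    char network_type[32];  /* NETWORK_TYPE, e.g. "infiniband" (placeholder) */
    char qualifier[32];     /* QUALIFIER, e.g. "verbs"; "" if omitted */
} net_selection_t;

/* Hypothetical parser for "--enable|--disable TYPE[:QUALIFIER]".
 * Returns 0 on success, -1 on malformed input. */
static int parse_net_selection(const char *flag, const char *arg,
                               net_selection_t *out)
{
    assert(flag != NULL && arg != NULL && out != NULL);

    if (strcmp(flag, "--enable") == 0) {
        out->enable = true;
    } else if (strcmp(flag, "--disable") == 0) {
        out->enable = false;
    } else {
        return -1;  /* unknown flag */
    }

    /* NETWORK_TYPE is everything before the first ':' (or the whole arg). */
    const char *colon = strchr(arg, ':');
    size_t type_len = colon ? (size_t)(colon - arg) : strlen(arg);
    if (type_len == 0 || type_len >= sizeof(out->network_type)) {
        return -1;
    }
    memcpy(out->network_type, arg, type_len);
    out->network_type[type_len] = '\0';

    /* QUALIFIER is optional, but "TYPE:" with nothing after it is rejected. */
    out->qualifier[0] = '\0';
    if (colon != NULL) {
        size_t q_len = strlen(colon + 1);
        if (q_len == 0 || q_len >= sizeof(out->qualifier)) {
            return -1;
        }
        memcpy(out->qualifier, colon + 1, q_len + 1);  /* copies the '\0' too */
    }
    return 0;
}
```

With this sketch, "--disable infiniband:verbs" yields enable == false, network_type "infiniband", and qualifier "verbs", while "--enable ethernet" leaves the qualifier empty. Making the qualifier optional is only one possible design choice.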
- Mellanox/LANL have raised a good point that customers do not want to have multiple different Open MPI installations in their environments (e.g., Vendor A OMPI and Vendor B OMPI and community OMPI).
- How can a customer have a single OMPI installation, but still have vendor/distribution-specific enhancements?
- Or is that the wrong question -- should we really be working to enable individual component distribution? This would disallow vendors from distributing patches to core.
- Can we double check that all vendor Open MPI distributions are appropriately marked? E.g., via --with-ident-string? Not sure how to do this other than to ask each vendor/distribution -- perhaps we should default to "Unknown distribution" and have the nightly/release scripts set the official strings...?
- Re-discuss the separation of our libraries into libmpi, libopen-rte, and libopen-pal.
- At the beginning of the project, there was just libmpi. Later, the code was split into projects, and then into per-project libraries. Later still, the build was unified back into a single libmpi.
- In Dec 2012 (here's the commit), we split the build back into 3 libraries. The commit message cites discussion at the Dec 2012 Open MPI dev meeting, but unfortunately the wiki notes give no clue as to why this was done. Was it just because we developers like having 3 smaller libraries? Or is there some deeper technical issue? Neither Ralph nor Jeff remembers. 😦
- Rationale for bringing this up again: when upstream projects try to link against portions of our project while also supporting apps that link against all of it, we run into conflicts (e.g., the ORTE used by the upstream project may be different from the one used by the OMPI installation). Slurping it all up into one library would resolve the problem, but we cannot recall whether there are undesirable side effects.
- Nathan: Do we really need --enable-mpi-thread-multiple anymore?
- Geoff Paulsen: fast path for send when only 1 outstanding request (a la Platform MPI)
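As a rough illustration of the fast-path idea, the C sketch below skips request queuing when the new send would be the only outstanding request to a peer (so there is no ordering to preserve) and otherwise falls back to the normal queued path. This is an assumption about the general shape only, not how Platform MPI or the ob1 PML actually do it: peer_state_t, peer_state_init(), try_send_inline(), sketch_send(), and the fixed queue size are all hypothetical, and try_send_inline() is a stub that always succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-peer send state; not Open MPI's actual PML structures. */
#define QUEUE_CAP 16

typedef struct {
    const void *buf;
    size_t len;
} pending_send_t;

typedef struct {
    pending_send_t queue[QUEUE_CAP];  /* sends still outstanding to this peer */
    size_t outstanding;               /* number of entries in queue */
    size_t fast_path_count;           /* sends completed via the fast path */
} peer_state_t;

static void peer_state_init(peer_state_t *peer)
{
    peer->outstanding = 0;
    peer->fast_path_count = 0;
}

/* Hypothetical stub for a transport call that tries to push a message out
 * immediately; in this sketch it always succeeds. */
static bool try_send_inline(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return true;
}

/* If nothing else is outstanding to this peer, the new send is the only
 * outstanding request, so there is no ordering to preserve and it can be
 * attempted inline without queuing. Otherwise take the normal queued path.
 * Returns 0 on success, -1 if the queue is full. */
static int sketch_send(peer_state_t *peer, const void *buf, size_t len)
{
    assert(peer != NULL);

    if (peer->outstanding == 0 && try_send_inline(buf, len)) {
        peer->fast_path_count++;                 /* fast path: nothing queued */
        return 0;
    }
    if (peer->outstanding == QUEUE_CAP) {
        return -1;                               /* slow path is full */
    }
    peer->queue[peer->outstanding].buf = buf;    /* slow path: queue it */
    peer->queue[peer->outstanding].len = len;
    peer->outstanding++;
    return 0;
}
```

A real PML would also have to respect MPI matching order and completion semantics on both paths; the sketch only shows the branch structure that makes the single-request case cheap.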
- Ralph/George/Jeff: establishing dependencies in frameworks. This is a somewhat larger topic, but for the time being we will focus on a simple usage scenario...
- In December, Ralph/George/Jeff talked about Intel's desire to use some of OPAL/ORTE/OMPI's frameworks in its own projects. Ralph had previously been copying source code around between repositories, and it was pretty much a mess.
- After much discussion, we realized that Ralph can just link projects like SCON against libopen-rte.so and just access the frameworks that he wants. I.e., do a full install of ORTE (or probably OMPI?) using --with-devel-headers, and then literally treat OMPI/ORTE/OPAL frameworks just like any other shared library. Yay!
- Sidenote: just for cleanliness, Ralph will "tighten up" each ORTE framework to require as few dependencies as possible.
- There is one caveat, however: the application linking against libopen-rte likely doesn't want to call orte_init() to initialize all of ORTE; it just wants to use a few frameworks. So the application will need to know the "magic order" in which frameworks must be initialized (i.e., the contents and ordering of orte_init() and the ESS framework init [which is where most of the heavy lifting for ORTE initialization actually occurs these days]).
- What would be better is if the frameworks themselves could declare their dependencies. E.g., if the application initializes the ABC framework, the ABC framework should be able to realize that it requires the DEF framework to be initialized first. This could possibly be done with a registration-based system. E.g., in the first few lines of framework open and/or init, it can call something like opal_register_framework_dependency("ABC", "DEF"). Components could even do the same thing; for example, the ob1 PML needs the BTL framework, so it could call opal_register_component_dependency("pml", "ob1", "btl") (or something like this).
- At this meeting, we'd like to discuss the possibilities for such a system, and sketch out what the API should be.
- Such a system could actually be used throughout all of OPAL, ORTE, OMPI, and OSHMEM. It would not only eliminate the "magic ordering" that we have in opal_init(), orte_init(), ompi_runtime_init(), and oshmem_runtime_init(), but also allow for minimal initialization in cases where not all components are necessary for a particular run.
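One possible shape for the registration-based system discussed above is a dependency table plus a depth-first init that brings up each framework's prerequisites first, i.e., a topological order computed at runtime instead of a hand-written "magic order". The C sketch below is hypothetical throughout: register_framework_dependency() only mirrors the proposed opal_register_framework_dependency("ABC", "DEF"), init_framework() is not an existing OPAL call, dependencies are registered up front rather than from inside each framework's open/init, and the fixed table sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dependency registry; none of these names are OPAL APIs.
 * Names are stored by pointer, so callers must pass long-lived strings. */
#define MAX_FRAMEWORKS 16
#define MAX_DEPS 64

typedef enum { FW_UNINIT, FW_INITIALIZING, FW_READY } fw_state_t;

typedef struct {
    const char *name;
    fw_state_t state;
} framework_t;

typedef struct {
    const char *framework;  /* framework that has the dependency */
    const char *required;   /* framework that must be initialized first */
} dependency_t;

static framework_t frameworks[MAX_FRAMEWORKS];
static size_t num_frameworks;
static dependency_t deps[MAX_DEPS];
static size_t num_deps;

/* Order in which frameworks were actually initialized. */
static const char *init_order[MAX_FRAMEWORKS];
static size_t init_order_len;

static framework_t *find_or_add(const char *name)
{
    for (size_t i = 0; i < num_frameworks; i++) {
        if (strcmp(frameworks[i].name, name) == 0) {
            return &frameworks[i];
        }
    }
    if (num_frameworks == MAX_FRAMEWORKS) {
        return NULL;
    }
    frameworks[num_frameworks].name = name;
    frameworks[num_frameworks].state = FW_UNINIT;
    return &frameworks[num_frameworks++];
}

/* Mirrors the proposed opal_register_framework_dependency("ABC", "DEF"):
 * framework fw requires framework `required` to be initialized first. */
static int register_framework_dependency(const char *fw, const char *required)
{
    if (num_deps == MAX_DEPS || find_or_add(fw) == NULL ||
        find_or_add(required) == NULL) {
        return -1;
    }
    deps[num_deps].framework = fw;
    deps[num_deps].required = required;
    num_deps++;
    return 0;
}

/* Depth-first init: bring up every dependency before the framework itself,
 * which yields a topological order. Returns -1 on a dependency cycle; in this
 * sketch, frameworks whose init failed are left in FW_INITIALIZING. */
static int init_framework(const char *name)
{
    framework_t *fw = find_or_add(name);
    if (fw == NULL) {
        return -1;
    }
    if (fw->state == FW_READY) {
        return 0;                /* already initialized: nothing to do */
    }
    if (fw->state == FW_INITIALIZING) {
        return -1;               /* reached this framework again: cycle */
    }
    fw->state = FW_INITIALIZING;
    for (size_t i = 0; i < num_deps; i++) {
        if (strcmp(deps[i].framework, name) == 0 &&
            init_framework(deps[i].required) != 0) {
            return -1;
        }
    }
    /* A real framework's open/init would run here. */
    fw->state = FW_READY;
    assert(init_order_len < MAX_FRAMEWORKS);
    init_order[init_order_len++] = fw->name;
    return 0;
}
```

Registering ABC -> DEF and DEF -> GHI and then calling init_framework("ABC") initializes GHI, DEF, and ABC in that order, and a second init_framework("DEF") is a no-op; a cycle such as X -> Y -> X is reported as an error. Component-level dependencies (e.g., ob1 needing the BTL framework) could plausibly reuse the same table with a node name like "pml/ob1", which is again only a hypothetical convention.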
- Ralph: Further cleanup of the code base for project separation
- Split autogen.pl and configure.ac by project?
- Clean up naming conventions: we still have "ompi" in the "opal" layer, "ompi"-named configure variables in the opal layer, etc.
- Howard: features for 2.1. Timeframe for forking 3.0?