rhc54 edited this page Feb 19, 2016 · 121 revisions

Feb 2016 OMPI Developer's Meeting

Dates:

  • Start: 9am Tuesday, February 23, 2016
  • Finish: 1pm Thursday, February 25, 2016

Location:

IBM Dallas Innovation Center web site

  • Google Maps
  • Street address: 1177 South Beltline Road, Coppell, Texas 75019 USA
  • Enter at the east entrance (closest to Beltline Road)
    • Hollerith Room - on the left after you walk in. (Now all 3 days in the same room!)
    • Receptionist should have nametags for everyone.
    • Foreign Nationals welcome.
    • No need to escort visitors in this area.

Map of Hotels:

  • These 3 blue hotels offer shuttles to/from both the IBM site and DFW International Airport:
    • Sheraton Grand Hotel (972-929-8400) - 4440 West John W Carpenter Freeway, Irving, TX 75063, USA
    • Holiday Inn Express (972-929-4499) - 4550 West John W Carpenter Freeway, Irving, TX 75063, USA
    • Hampton Inn (972-471-5000) - 1750 TX-121, Grapevine, TX 76051, USA
  • https://www.mapcustomizer.com/map/IBM%20IIC%20and%20Hotels%20-%20Map2?utm_source=share&utm_medium=email
  • NOTE: We stopped calling after we found 3 hotels that offered shuttles BOTH to the IBM site, and to the airport.

Attendees

  1. Jeff Squyres - Cisco
  2. Ralph Castain - Intel
  3. Edgar Gabriel - UHouston
  4. Stan Graves - IBM
  5. Geoff Paulsen - IBM
  6. Perry Schmidt (in and out) - IBM
  7. Dave Solt - IBM
  8. Brice Goglin - Inria
  9. Howard Pritchard - LANL
  10. George Bosilca - UTK
  11. Takahiro Kawashima - Fujitsu
  12. Shinji Sumimoto - Fujitsu
  13. Annu Dasari - Intel
  14. Nysal Jan K A - IBM
  15. Sylvain Jeaugey - NVIDIA
  16. Artem Polyakov - Mellanox

Suggested topics:

  1. Now that high-speed networks can be accessed via multiple network stacks (and multiple Open MPI components), users are getting confused about how to (un)select specific networks. We need to figure out a better / easier way for users to convey what they want. See http://www.open-mpi.org/community/lists/devel/2015/10/18154.php for more detail.
    1. One (very loose and probably not yet well thought-out) idea is to have some kind of higher-level abstraction: --[enable|disable] NETWORK_TYPE:QUALIFIER
  2. Mellanox/LANL have raised a good point that customers do not want to have multiple different Open MPI installations in their environments (e.g., Vendor A OMPI and Vendor B OMPI and community OMPI).
    1. How can a customer have a single OMPI installation, but still have vendor/distribution-specific enhancements?
    2. Or is that the wrong question -- should we really be working to enable individual component distribution? This would disallow vendors from distributing patches to core.
  3. Can we double check that all vendor Open MPI distributions are appropriately marked? E.g., via --with-ident-string? Not sure how to do this other than to ask each vendor/distribution -- perhaps we should default to "Unknown distribution" and have the nightly/release scripts set the official strings...?
  4. Re-discuss the separation of our libraries into libmpi, libopen-rte, and libopen-pal.
    • Beginning of the project: there was just libmpi. Later, it was split into projects, and then the project libraries. Later, the build was unified back into libmpi again.
    • In Dec 2012 (here's the commit), we split the build back into 3 libraries. The commit message cites discussion at the Dec 2012 Open MPI dev meeting -- but there are unfortunately no clues in the wiki notes as to why this was done. Was it just because we developers like having 3 smaller libraries? Or is there some deeper technical issue? Neither Ralph nor Jeff remembers. 😦
    • Rationale for bringing this up again: when upstream projects are trying to link against portions of our project, and then also support apps that link against all of it, we run into conflicts (e.g., the ORTE being used by the upstream project may be different than the one being used by the OMPI installation). Slurping it all up into one library would resolve the problem - but we cannot recall if there are undesirable side-effects.
  5. Nathan: Do we really need --enable-mpi-thread-multiple any more?
  6. Geoff Paulsen: fast path for send when only 1 outstanding request (a la Platform MPI)
  7. Ralph/George/Jeff: establishing dependencies in frameworks. This is a somewhat larger topic, but for the time being we will focus on a simple usage scenario...
    • In December, Ralph/George/Jeff talked about Intel's desire to use some of OPAL/ORTE/OMPI's frameworks in its own projects. Ralph had previously been copying source code around between repositories, and it was pretty much a mess.
    • After much discussion, we realized that Ralph can just link projects like SCON against libopen-rte.so and just access the frameworks that he wants. I.e., do a full install of ORTE (or probably OMPI?) using --with-devel-headers, and then literally treat OMPI/ORTE/OPAL frameworks just like any other shared library. Yay!
    • Sidenote: just for cleanliness, Ralph will "tighten up" each ORTE framework to require as few dependencies as possible
    • There is one caveat, however: the application linking against libopen-rte likely doesn't want to call orte_init() to initialize all of ORTE; it just wants to use a few frameworks. So the application will need to know the "magic order" in which frameworks must be initialized (i.e., the contents and ordering of orte_init() and the ESS framework init [which is where most of the heavy lifting for ORTE initialization actually occurs these days]).
    • What would be better is if the frameworks themselves could declare what their dependencies are. E.g., if the application initializes the ABC framework, the ABC framework should be able to realize that it requires the DEF framework to be initialized first. This could possibly be done with a registration-based system. E.g., in the first few lines of framework open and/or init, it can call something like opal_register_framework_dependency("ABC", "DEF"). Components could even do the same thing; for example, the ob1 PML needs the BTL framework, so it could call opal_register_component_dependency("pml", "ob1", "btl") (or something like this).
    • At this meeting, we'd like to discuss the possibilities for such a system, and sketch out what the API should be.
    • Such a system could actually be used throughout all of OPAL, ORTE, OMPI, OSHMEM. It would not only eliminate the "magic ordering" that we have in opal_init(), orte_init(), ompi_runtime_init(), and oshmem_runtime_init(), but also allow for minimal initialization in cases where not all components are necessary for a particular run.
  8. Ralph: Further cleanup of the code base for project separation
    • Split autogen.pl and configure.ac by project?
    • Clean up naming conventions - we still have "ompi" in the "opal" layer, "ompi"-named configure variables in the opal layer, etc.
  9. Howard: let's start discussing the features we want for v2.1.0.
    • OpenMP/Open MPI interop
    • ...add your own favorite features here...
  10. What should be our timeframe for forking for v3.0.0?
  11. Git etiquette: should we start doing "git standard" first-lines in commit messages?
  12. PMIx
    • Fault response APIs - definition, requests
    • Information request APIs - definition, requests
    • If/how we integrate the above into OMPI
  13. Routing framework
    • What does this morph into as we move to OFI/BTLs under RML?
    • What happens to RML resilience, which currently flows through the routed framework?
  14. The TCP OOB component currently takes all IP addresses from all peers and tries to connect to them in order. If a connection attempt fails, it times out and moves on to the peer's next IP address -- but this can stall the job for 30-60 seconds (during MPI_INIT) with no real output/feedback to the user, unless oob_base_verbose is high.
    • Can we make this better?
    • E.g., use usnic-btl-like network graph solving to figure out which local interface should be used to communicate with which remote interface?
    • Or simultaneously open multiple non-blocking connect(2)ions and see which/if any succeed?
    • Or ...?
  15. Discussion that came up on the user list about how to help users ensure that they build Open MPI the way that they think they have built it (e.g., did they really build TM support?): http://www.open-mpi.org/community/lists/devel/2016/01/18497.php
    • How to have a solution that is "easy enough" for average users, but powerful enough to catch common cases (e.g., where feature X's headers/libs are in a non-default location, and user assumes that OMPI found them anyway/included support).
  16. Renaming the component DLLs using the project-level name - i.e., change mca_ess_tm.la to orte_ess_tm.la. This would remove the current restriction against having the same framework name in two different projects, which is becoming more of an issue as ORTE and OPAL are reused.
  17. Remove the barrier at the end of MPI_Init?
  18. Alternative mechanisms for tagging sentinel ompi_proc_t locations that preserve 32-bit support and do not depend on the size of the opal_process_name_t. See https://github.com/open-mpi/ompi/pull/1345
    • This may have been solved already...?
  19. Per #1308, there's an ambiguity between the info key value that a user/application sets with MPI_COMM_SET_INFO and the value that is actually propagated to a child communicator via MPI_COMM_DUP_WITH_INFO: does it use the value that the user set, or the value that OMPI decided to use?
  20. Jeff+Ralph: What to do about conflicting OPAL version numbers?
    • Jeff has an old note about this -- something about conflicting version numbers between OPAL and ORCM...? I don't remember the exact context.
    • The usNIC BTL currently uses the OPAL version number to determine what to do w.r.t. compatibility between the v1.10, master, and v2.x trees.
  21. --host and --hostfile behavior (Ralph's favorite topic!).
  22. Fujitsu: Collaboration of Fujitsu and Open MPI Community
    • Source Code Contribution
    • Collaborative Development
    • Contribution for Quality
  23. Discuss some additional tests
    • Can we have some tests to check to see if there are components that are dependent upon other components (and should not be)?
    • @edgargabriel has some issues in OMPIO that he'd like to do better: many of the OMPIO sub-frameworks (e.g., fcoll, fbtl, ...etc.) require the functionality of the ompio component in the io framework. How should he do this?
  24. Do we want to create some templates for Github issues / pull requests?
  25. Per https://github.com/open-mpi/ompi/issues/1379, it looks like MPI processes are "becoming invisible" to the resource manager (SLURM, in Cisco's case). Ralph and Jeff are pretty sure that at some point in the past, we set the orted to create its own process group and launch all MPI processes in that. This means that if/when the RM tries to kill the process group that it launches, it won't kill any of the MPI processes (because they opted out / created their own process group).
    • Ralph thinks that this may have been done so that we could deliver a signal to the entire MPI process group and not signal the orted. He remembers that this was a Sun-asked-for feature.
    • We might not want that behavior by default any more -- for the reasons cited on https://github.com/open-mpi/ompi/issues/1379 (i.e., that old stale processes can get left around and not killed by the resource manager).
    • Let's check the code and decide if we want to revisit this decision of creating a process group by default.
  26. Review MTT database schema
    • There are evolving requirements on this schema - e.g., correlation to external data, addition of inventory, referencing the actual .ini file. Let's see if a more flexible schema can be devised that can accommodate the broader set of requirements.
  27. Thread multiple support - where are we on this?
  28. Async progress - status? What still needs to be done?
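
Topic 7 proposes that frameworks declare their own dependencies so that callers need not know the "magic order". As a starting point for that discussion, here is a minimal sketch in C of what a registration-based system could look like. Every name in it (sketch_register_framework_dependency, sketch_open_order, sketch_reset, the fixed-size table) is hypothetical and exists only for illustration; none of it is an existing OPAL API, and the real design is what the meeting is meant to work out. The sketch records "A requires B" edges and walks them depth-first, so that dependencies come before the frameworks that need them, and it reports a dependency cycle as an error.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in Open MPI. */
#define MAX_FW   16
#define MAX_NAME 32

typedef struct {
    char name[MAX_NAME];
    int  deps[MAX_FW];  /* indices of frameworks that must be opened first */
    int  ndeps;
    int  state;         /* 0 = unvisited, 1 = being visited, 2 = opened */
} fw_t;

static fw_t fws[MAX_FW];
static int  nfws = 0;

/* Forget all registered frameworks and dependencies. */
void sketch_reset(void)
{
    nfws = 0;
}

/* Return the table index for a framework name, adding it if unseen.
 * Returns -1 if the table is full. */
static int fw_lookup(const char *name)
{
    for (int i = 0; i < nfws; i++) {
        if (strcmp(fws[i].name, name) == 0) return i;
    }
    if (nfws == MAX_FW) return -1;
    strncpy(fws[nfws].name, name, MAX_NAME - 1);
    fws[nfws].name[MAX_NAME - 1] = '\0';
    fws[nfws].ndeps = 0;
    fws[nfws].state = 0;
    return nfws++;
}

/* Record that `framework` requires `dependency` to be opened first. */
int sketch_register_framework_dependency(const char *framework,
                                         const char *dependency)
{
    int f = fw_lookup(framework);
    int d = fw_lookup(dependency);
    if (f < 0 || d < 0 || fws[f].ndeps == MAX_FW) return -1;
    fws[f].deps[fws[f].ndeps++] = d;
    return 0;
}

/* Depth-first visit: append dependencies first, then the framework itself.
 * Returns -1 if a dependency cycle is found. */
static int fw_visit(int i, const char **order, int *n)
{
    if (fws[i].state == 2) return 0;   /* already opened */
    if (fws[i].state == 1) return -1;  /* cycle */
    fws[i].state = 1;
    for (int k = 0; k < fws[i].ndeps; k++) {
        if (fw_visit(fws[i].deps[k], order, n) != 0) return -1;
    }
    fws[i].state = 2;
    assert(*n < MAX_FW);               /* each framework is appended once */
    order[(*n)++] = fws[i].name;
    return 0;
}

/* Fill `order` (room for MAX_FW entries) with the frameworks that still
 * need opening for `framework`, dependencies first. Returns how many were
 * written, or -1 on a cycle or a full table. */
int sketch_open_order(const char *framework, const char **order)
{
    int n = 0;
    int f = fw_lookup(framework);
    if (f < 0 || fw_visit(f, order, &n) != 0) return -1;
    return n;
}
```

With ("ABC", "DEF") and ("DEF", "GHI") registered, sketch_open_order("ABC", order) yields GHI, DEF, ABC. Because the "opened" state persists between calls, asking for a second framework afterwards only returns what has not been opened yet, which is the minimal-initialization behavior that topic 7 describes.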

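One of the options listed under topic 14 is to open several non-blocking connections at once and keep whichever succeeds first, rather than timing out on each address in turn. The following is a minimal sketch of that idea in C using standard POSIX calls (socket, fcntl, connect, poll, getsockopt). It is not the TCP OOB component's code: the function name sketch_connect_first and the MAX_CAND cap are hypothetical, and the sketch assumes IPv4 addresses only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CAND 16  /* hypothetical cap on candidate addresses */

/* Hypothetical helper, not Open MPI code: start a non-blocking connect to
 * every candidate address at once and return the index of the first one
 * that succeeds, storing its (still non-blocking) socket in *out_fd.
 * Returns -1 if no candidate connects. timeout_ms bounds each poll() call,
 * not the whole operation. */
int sketch_connect_first(const struct sockaddr_in *cands, int n,
                         int timeout_ms, int *out_fd)
{
    struct pollfd pfds[MAX_CAND];
    int winner = -1, pending = 0;

    assert(cands != NULL && out_fd != NULL);
    if (n > MAX_CAND) n = MAX_CAND;
    for (int i = 0; i < n; i++) {
        pfds[i].fd = -1;           /* poll() ignores negative descriptors */
        pfds[i].events = POLLOUT;
        pfds[i].revents = 0;
    }

    /* Launch all connection attempts without waiting on any of them. */
    for (int i = 0; i < n; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) continue;
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        int rc = connect(fd, (const struct sockaddr *)&cands[i],
                         sizeof(cands[i]));
        if (rc == 0) {                   /* connected immediately */
            pfds[i].fd = fd;
            winner = i;
            break;
        }
        if (errno == EINPROGRESS) {      /* attempt is under way */
            pfds[i].fd = fd;
            pending++;
        } else {
            close(fd);                   /* failed immediately */
        }
    }

    /* Wait until one attempt completes successfully or all have failed. */
    while (winner < 0 && pending > 0) {
        if (poll(pfds, (nfds_t)n, timeout_ms) <= 0) break; /* timeout/error */
        for (int i = 0; i < n && winner < 0; i++) {
            if (pfds[i].fd < 0 || pfds[i].revents == 0) continue;
            int err = 0;
            socklen_t len = sizeof(err);
            getsockopt(pfds[i].fd, SOL_SOCKET, SO_ERROR, &err, &len);
            if (err == 0 && (pfds[i].revents & POLLOUT)) {
                winner = i;
            } else {
                close(pfds[i].fd);       /* this attempt failed */
                pfds[i].fd = -1;
                pending--;
            }
        }
    }

    /* Close every socket except the winning one. */
    for (int i = 0; i < n; i++) {
        if (pfds[i].fd >= 0 && i != winner) close(pfds[i].fd);
    }
    if (winner >= 0) *out_fd = pfds[winner].fd;
    return winner;
}
```

A caller would pass the peer's full address list and then use the returned socket. Whether the extra simultaneous connection attempts are acceptable at scale is one of the questions this topic leaves open.
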
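
Topic 25 turns on how POSIX process groups behave: a signal sent to a group only reaches processes that are still members of it. The sketch below illustrates that mechanism in C with fork, setpgid, and getpgid. It is not the orted's launch code, and sketch_child_leaves_group is a hypothetical name. The helper forks a child, optionally moves it into its own process group, and reports whether the child ended up outside the launcher's group, which is the situation where a group-wide kill from the resource manager would miss it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not Open MPI code. Forks a child and, if own_group
 * is nonzero, moves it into a new process group of its own.
 * Returns 1 if the child is in a different process group than the caller
 * (so a kill aimed at the caller's group would not reach it), 0 if it
 * shares the caller's group, and -1 on error. */
int sketch_child_leaves_group(int own_group)
{
    pid_t launcher_pgid = getpgrp();
    pid_t child = fork();
    if (child < 0) return -1;

    if (child == 0) {
        /* Child: optionally become the leader of a new process group. */
        if (own_group) setpgid(0, 0);
        pause();                 /* sleep until a signal arrives */
        _exit(0);
    }

    /* Parent: make the same setpgid() call so the result does not depend
     * on whether the child has been scheduled yet. */
    if (own_group) setpgid(child, child);
    pid_t child_pgid = getpgid(child);

    /* Clean up the child directly by pid, then reap it. */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);

    if (child_pgid < 0) return -1;
    return child_pgid != launcher_pgid;
}
```

Calling the helper with own_group set to 0 and then to 1 shows the two behaviors under discussion: in the first case the child stays in the launcher's group, and in the second it does not, so it would survive a signal delivered only to that group.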