
WeeklyTelcon_20160809

Geoff Paulsen edited this page Aug 9, 2016 · 11 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Ralph
  • Arm Patinyasakdikul
  • Brian
  • Edgar Gabriel
  • Nathan Hjelm
  • Todd Kordenbrock
  • Josh Hursey
  • Artem Polyakov

Agenda

Review 1.10

  • Milestones
  • 1.10.4
    • A few PRs to pull in, but we want folks to focus on 2.0. Once 2.0.1 is out, work on 1.10.4 might begin.

Review 2.0.x

  • Wiki
  • 2.0.1 PRs that are reviewed and approved
  • Blocker Issues
  • Milestones
  • A bunch of 2.0.1 PRs are outstanding but not yet reviewed!
    • Need reviews. "Jeff and Howard aren't going to wait".
    • With the exception of the performance issue, freezing TODAY! Will release 2.0.1 next Tuesday.
  • 2.0.1 issues:
    • Performance issues on devel list (Issue 1943???)
      • Howard said Nathan said he had a fix, but didn't see it yesterday.
        • Issue is that all RDMA endpoints are added for use by one-sided, but OB1 then tries to stripe across ALL RDMA endpoints (slowing things down).
        • Nathan is working on a patch, but it's crashing for Fujitsu. On some systems openib is terrible for on-node. With CMA or XPMEM you never want to use the openib BTL for on-node.
      • Nathan should have something for this today. Working on Fujitsu system.
      • Disqualifying the openib component, but not falling back to send/recv.
      • PML_OB1_ALL_RDMA - if this is true, OB1 will use all RDMA BTLs. It defaults to false, which for send/recv behaves the same as 1.10: BTLs that are in the RDMA list but not on the eager list are ignored.
    • Two blocker issues:
      • MPI_Request_free - listed as a blocker.
        • Looks like we're stuck.
        • Nathan is hitting this in OSC pt2pt.
        • Design flaw: the request callback is used to call start, which calls the request callback, which calls start again (recursion).
          • Instead, put the request on a list and process that list from the progress loop.
      • Disable atomics in the thread-single path (20-30% message rate cost)
        • Just needs to be closed out. If the datatype or communicator is intrinsic, don't use atomics.
          • IBM was going to do this, but it is probably a 2.0.2 item.
    • Would like PMIx 1.1.5 in OMPI 2.0.1:
      • Created issue 1930 on master
      • Howard would like to see the impact of NOT merging in PMIx 1.1.5.
      • Q. Why do we want to upgrade?
      • Artem would need to review.
    • OpenSHMEM in 2.0 is DOA??? (Carried over from last week.)
    • Howard mentioned he would like to add a performance regression test, especially for OB1 performance.
  • 2.1.0 : No date at the moment.
    • Mellanox needs PMIx 2.0 in 2.1.0
    • Put items requested on the wiki (e.g., PMIx direct modex, OpenSHMEM, stability improvements)
    • What do people want to see for 2.1.0?
    • Finalize the list in Dallas meeting
    • Hopefully target a Sept./Oct. release; this is not a Supercomputing goal.
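
The deferred-start idea from the MPI_Request_free discussion above (put the request on a list and process the list from the progress loop, rather than calling start from inside the request callback) can be illustrated with a small, language-neutral sketch. This is not Open MPI code: `Request`, `Progress`, `pending`, and `max_depth` are hypothetical names, and the sketch assumes every started operation completes immediately, which is the case that would make callback-calls-start recurse.

```python
from collections import deque


class Request:
    """Hypothetical stand-in for a persistent request (not an Open MPI type)."""

    def __init__(self, restarts):
        self.restarts_left = restarts  # times the callback asks for a restart
        self.completions = 0


class Progress:
    """Hypothetical progress engine with a deferred-start list."""

    def __init__(self):
        self.pending = deque()  # requests waiting to be (re)started
        self.max_depth = 0      # deepest nesting of start() observed
        self._depth = 0

    def start(self, req):
        # Track nesting so the sketch can show that start() never recurses.
        self._depth += 1
        self.max_depth = max(self.max_depth, self._depth)
        # Assumption: the operation completes immediately.
        self.on_complete(req)
        self._depth -= 1

    def on_complete(self, req):
        req.completions += 1
        if req.restarts_left > 0:
            req.restarts_left -= 1
            # Defer the restart instead of calling self.start(req) here.
            self.pending.append(req)

    def progress(self):
        """Drain the deferred list; return how many requests were started."""
        started = 0
        while self.pending:
            self.start(self.pending.popleft())
            started += 1
        return started


engine = Progress()
req = Request(restarts=5000)
engine.pending.append(req)
started = engine.progress()
print(started, req.completions, engine.max_depth)  # 5001 5001 1
```

Because the callback only appends to the list, the call nesting stays at one level no matter how many times the request is restarted; calling `start` directly from `on_complete` would instead nest once per restart.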

Review Master MTT testing (https://mtt.open-mpi.org/)

MTT Dev status:

Website migration

  • Update on regenerated Nightly Snapshots?
    • When the website was moved over, the 2.x directory disappeared (Thanks Jeff).
    • The branches tried to generate as 2.x, not 2.0 (based on branch name). When the scripts tried to
    • The script builds master, 2.x, and 1.10. The 2.x build failed, so the whole script failed. Quickly fixed, and now running correctly.
    • We could use http for MTT and Jenkins if we don't want to spend the $ on certificates for those sites.
      • For now we have 3-month certificates, so we will continue the discussion at the face-to-face.
  • Sending email on commits will be solved soon.
  • Missed window for email migration this coming weekend.
  • New Mexico - put SSL on open-mpi.org

Open MPI Developer's Meeting

  • August 2016
  • If you are coming, make sure to register for the event and put your name on the wiki.
  • Keep accumulating items on the wiki
  • Facilities are the same as last year (but possibly a different room the first day). Geoff will update the wiki today.

New Items:

  • Open MPI Non-Profit (Ralph)

    • Right now in Jeff and Ralph's names.
    • Would be easier for Brian and others to give money.
    • Need to discuss joining another non-profit org.
  • Updates to the ibm-mpi test suite

    • v.10 has a bunch of new errors/hangs
  • Assessment of the request handling refactoring

    • Artem provided summary of results
    • 1.10 vs. master
    • We need to run the benchmark more broadly and do some deeper analysis of the results.
    • Action items
      1. Arm to provide a 2.0 version of the benchmark for the community
      2. Artem to set up a wiki page with details on how to run, as a place to coordinate results
      3. Folks please run the benchmarks
      4. Make sure this item stays on the agenda until resolved
      5. Wiki is here: https://github.com/open-mpi/ompi/wiki/Request-refactoring-test

Status Updates: (skipped this week)

  1. LANL
  2. Houston
  3. IBM

Status Update Rotation

  1. LANL, Houston, IBM
  2. Cisco, ORNL, UTK, NVIDIA
  3. Mellanox, Sandia, Intel
