
WeeklyTelcon_20160426

Geoff Paulsen edited this page Apr 26, 2016 · 8 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres
  • Todd Kordenbrock
  • Sylvain Jeaugey
  • Ralph
  • Nysal
  • Nathan Hjelm
  • Joshua Ladd
  • Howard
  • Geoff Paulsen

Agenda

Review 1.10

  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
    • PR 1097 - may be moot for 1.10.
    • PSM2 issue, short version: the PSM2 API uses a fixed UUID, so all jobs across the cluster use the same UUID (bad).
    • Jeff will check the 1.10.3 library versions. Ralph already updated them for 1.10.3, but Jeff will double-check.

Review 2.0.x

Review Master MTT testing (https://mtt.open-mpi.org/)

  • Widespread mpool / rcache failures on usNIC last night.
  • Ralph is seeing a bunch of attribute failures on 1.10.
    • Jeff is passing in BTL parameters that limit him to a shared memory component, but the job is running across nodes. So the attribute tests report failures because some of the processes can't communicate.
    • 1.10 is hanging if it doesn't get enough slots.

MTT Dev status:

Status Updates:


Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM

Back to 2016 WeeklyTelcon-2016
