WeeklyTelcon_20160809
Geoff Paulsen edited this page Aug 9, 2016 · 11 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Ralph
- Arm Patinyasakdikul
- Brian
- Edgar Gabriel
- Nathan Hjelm
- Todd Kordenbrock
- Josh Hursey
- Artem Polyakov
- Milestones
- 1.10.4
- A few PRs to pull in, but we want folks to focus on 2.0. Once 2.0.1 is out, work on 1.10.4 might begin.
- Wiki
- 2.0.1 PRs that are reviewed and approved
- Blocker Issues
- Milestones
- A bunch of 2.0.1 PRs are outstanding but not reviewed yet!
- Need reviews. "Jeff and Howard aren't going to wait".
- With the exception of the performance issue, freezing TODAY; 2.0.1 will be released next Tuesday.
- 2.0.1 issues:
- Performance issues on devel list (Issue 1943???)
- Howard said Nathan said he had a fix, but didn't see it yesterday.
- The issue: all RDMA endpoints are now added for use by one-sided, but OB1 then tries to stripe across ALL of those RDMA endpoints, which slows things down.
- Nathan is working on a patch, but it is crashing for Fujitsu. On some systems openib is terrible for on-node traffic; with CMA or XPMEM you never want to use the openib BTL on-node.
- Nathan should have something for this today; he is working on the Fujitsu system.
- Disqualifying the openib component, but not falling back to send/recv.
- PML_OB1_ALL_RDMA: if true, OB1 uses all RDMA BTLs; it defaults to false (same as 1.10: for send/recv, ignores BTLs that are on the RDMA list but not on the eager list).
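The selection rule described for PML_OB1_ALL_RDMA can be sketched roughly as below. This is only an illustration of the rule as stated in these notes, not OB1's code: the `Btl` class, the `rdma_btls_for_send_recv` function, the `all_rdma` argument, and the list membership of the example BTLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Btl:
    """Hypothetical stand-in for a BTL and the lists it appears on."""
    name: str
    on_eager_list: bool
    on_rdma_list: bool


def rdma_btls_for_send_recv(btls, all_rdma=False):
    """Return the BTLs that send/recv traffic may be striped across.

    all_rdma=True  -> every BTL on the RDMA list is used.
    all_rdma=False -> BTLs on the RDMA list but not on the eager list
                      are ignored (the default described in the notes).
    """
    rdma = [b for b in btls if b.on_rdma_list]
    if all_rdma:
        return rdma
    return [b for b in rdma if b.on_eager_list]


# Hypothetical configuration: which lists each BTL is on is an assumption.
btls = [
    Btl("on_node_btl", on_eager_list=True, on_rdma_list=True),
    Btl("openib", on_eager_list=False, on_rdma_list=True),
]
default_choice = [b.name for b in rdma_btls_for_send_recv(btls)]
all_choice = [b.name for b in rdma_btls_for_send_recv(btls, all_rdma=True)]
```

With the default, the BTL that was only added for one-sided use is left out of send/recv striping; setting the flag brings it back in.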
- Two blocker issues:
- MPI_Request_free - listed as a blocker.
- Looks like we're stuck.
- Nathan is hitting this in OSC pt2pt.
- Design flaw: the request callback is used to call start, which calls the request callback, which calls start again, and so on.
- Instead, put the request on a list and process that list from the progress loop.
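The proposed fix (queue the request, then restart it from the progress loop) can be sketched as follows. This is a minimal model of the idea only, not the OSC pt2pt code; `Request`, `start`, `on_complete`, `progress`, and `pending` are hypothetical names, and `start` is assumed to complete immediately so the re-entrancy is easy to see.

```python
from collections import deque


class Request:
    """Hypothetical request that wants to be restarted a number of times."""

    def __init__(self, restarts):
        self.restarts_left = restarts
        self.completions = 0


pending = deque()  # requests waiting to be restarted by the progress loop


def start(req):
    # Assumption for the sketch: the operation completes immediately,
    # so the completion callback fires from inside start().
    on_complete(req)


def on_complete(req):
    req.completions += 1
    if req.restarts_left > 0:
        req.restarts_left -= 1
        # Calling start(req) here would recurse: start -> callback -> start ...
        # Defer the restart instead.
        pending.append(req)


def progress():
    """Restart the requests that were queued before this call; return how many."""
    count = len(pending)
    for _ in range(count):
        start(pending.popleft())
    return count


req = Request(restarts=5000)
start(req)
while progress():
    pass
```

Because each restart happens from `progress()`, the call depth stays constant no matter how many times the request is restarted; the direct callback-to-start version would grow the stack by two frames per restart.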
- Disable atomics on the thread-single path (20-30% message rate cost)
- Just needs to be closed. If the datatype or communicator is intrinsic, don't use atomics.
- IBM was going to do this, but probably a 2.0.2.
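A rough sketch of the idea, assuming the atomics in question are reference-count updates on datatypes and communicators (the notes do not say so explicitly). Everything here is hypothetical: `RefCounted`, `retain`, and the `thread_multiple` flag are illustrative names, and a `threading.Lock` stands in for an atomic add.

```python
import threading


class RefCounted:
    """Hypothetical reference-counted object (datatype or communicator)."""

    def __init__(self, predefined=False):
        self.predefined = predefined    # intrinsic objects are never freed
        self.refcount = 1
        self.locked_updates = 0         # counts the "expensive" updates
        self._lock = threading.Lock()   # stand-in for an atomic add


def retain(obj, thread_multiple):
    if obj.predefined:
        # Intrinsic object: it outlives every request, so skip counting.
        return
    if not thread_multiple:
        # Thread-single path: a plain increment is enough.
        obj.refcount += 1
        return
    with obj._lock:
        # Multi-threaded path: pay for the synchronized update.
        obj.refcount += 1
        obj.locked_updates += 1


mpi_int = RefCounted(predefined=True)   # models an intrinsic datatype
user_type = RefCounted()                # models a user-defined datatype

retain(mpi_int, thread_multiple=True)
retain(user_type, thread_multiple=False)
retain(user_type, thread_multiple=True)
```

Only the last call takes the synchronized path; the first two avoid it, which is the saving the notes are after.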
- Would like PMIx 1.1.5 in OMPI 2.0.1
- Created issue 1930 on master
- Howard would like to see the impact of NOT merging in PMIx 1.1.5.
- Q. Why do we want to upgrade?
- Artem would need to review.
- OpenSHMEM in 2.0 is DOA??? From last week.
- Is this still an issue? https://www.mail-archive.com/users@lists.open-mpi.org/msg00052.html
- Howard would like to add a performance regression test, especially for OB1.
- 2.1.0 : No date at the moment.
- Mellanox needs PMIx 2.0 in 2.1.0
- Put items requested on the wiki (e.g., PMIx direct modex, OpenSHMEM, stability improvements)
- What do people want to see for 2.1.0?
- Finalize the list in Dallas meeting
- Hopefully target a Sept./Oct. release; Supercomputing is not the goal.
Review Master MTT testing (https://mtt.open-mpi.org/)
- Update on regenerated Nightly Snapshots?
- When the website was moved over, the 2.x directory disappeared (Thanks Jeff).
- The branches tried to generate as 2.x, not 2.0 (based on branch name). When the scripts tried to
- The script builds master, 2.x, and 1.10. The 2.x build failed, so the whole script failed. Quickly fixed, and now running correctly.
- We could use http for MTT and Jenkins if we don't want to spend the $ on certificates for those sites.
- For now we have 3-month certificates, so we will continue discussing at the face-to-face.
- Sending email on commits will be solved soon.
- Missed window for email migration this coming weekend.
- New Mexico - put SSL on open-mpi.org
- August 2016
- If you are coming then make sure to register for the event, and put name on wiki
- Keep accumulating items on the wiki
- Facilities are the same as last year (but possibly a different room the first day). Geoff will update the wiki today.
Open MPI Non-Profit (Ralph)
- Right now in Jeff and Ralph's names.
- Would be easier for Brian and others to give money.
- Need to discuss joining other Non-Profit org.
Updates to the ibm-mpitest suite
- v.10 has a bunch of new errors/hangs
- See PR 1262
Assessment of the request-handling refactoring
- Artem provided summary of results
- 1.10 vs. master
- We need to run the benchmark more broadly, and some deeper analysis on the results.
- Action items
- Arm to provide a 2.0 version of the benchmark for the community
- Artem to set up a wiki page with details on how to run, as a place to coordinate results
- Folks please run the benchmarks
- Make sure this item stays on the agenda until resolved
- Wiki is here: https://github.com/open-mpi/ompi/wiki/Request-refactoring-test
- LANL
- Houston
- IBM
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel