Project Meeting 2024.06.20

Agenda

Admin
- Contracting
- Phase 9b Updates
Phase 9a Updates
- Latest Run Results

Action Items

CS to test the SANDAG model (full sample, sharrow on, single process) in the AWS cloud on Intel vs AMD hardware, keeping the image the same (holding all other factors constant)
Jeff to test with shared memory for skims completely disabled (single process only)
RSG to test running the MTC model, keeping everything constant except the sharrow fix all branch code.
WSP to test SFCTA run that is crashing due to insufficient resources with a small sample run, to test the hypothesis that it has something to do with disk space.

Meeting Notes

Project Admin

AMPO Contracting
- Agencies should have received Agreement MOUs from AMPO
- Typically give agencies 3 months to get everything executed and transmitted
Drafting of TOs for Phase 9b
- Joe to follow up with Jeff, Sijia, and David to discuss details

Phase 9a Updates

Latest Run Results
Compared to the start of Phase 9, many changes were made to resolve egregious run time and memory usage problems. There have been a lot of successes, but a few outstanding issues still call the stability of the ActivitySim code into question.
One outstanding issue is unresolved: there have been successful runs of the SANDAG model with sharrow on/off, single process, full sample, but the results are inconsistent. In one of those tests it ran very well (on WSP’s machine), while other attempts to do the exact same thing had very different (negative) performance results. Hypotheses include:
- Could be hardware
  - Success on a machine with AMD hardware
  - No success on machines with Intel hardware
  - CS to test this hypothesis on AWS, using different instance types with AMD and Intel hardware
- Could be the version of numba
  - RSG did a test with a numba version change and it still performed poorly, so that’s not it
- Could be a different hardware-related factor: not the CPU itself but the bandwidth between the CPU and RAM. This is harder to test.
- Could be related to a shared memory process in sharrow. Sharrow uses multiprocess shared memory for xarray, even when running in single process.
  - Jeff is creating code to test running without any shared memory. Jeff doesn’t know why this would be a problem but is trying anything.
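Because several of these hypotheses depend on the hardware and package versions of each test machine, it may help to record them alongside every run. The following is a minimal sketch, not part of ActivitySim or sharrow; the function names and the default package list are assumptions made for illustration, and it relies only on the Python standard library.

```python
import json
import os
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def cpu_model():
    """Best-effort CPU model string (distinguishes Intel from AMD on most Linux hosts)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.lower().startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return platform.processor() or "unknown"


def environment_fingerprint(packages=("numba", "sharrow", "xarray", "numpy")):
    """Collect hardware and software details worth comparing across test machines."""
    return {
        "cpu_model": cpu_model(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }


if __name__ == "__main__":
    # Save this output next to the run logs so results can be compared later.
    print(json.dumps(environment_fingerprint(), indent=2))
```

Saving this output with each run's timing log would make it easier to line up good and bad runs against CPU vendor and numba version. It does not capture memory bandwidth, which would need a separate benchmark.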
The other outstanding issue: when we’ve attempted to run multiprocess on SFCTA’s server, it crashes with a cryptic “insufficient resources” report. We can’t figure out which resource is insufficient. There’s 1 TB of RAM and presumably plenty of disk space.
- WSP to test SFCTA run that is crashing due to insufficient resources with a small sample run, to test the hypothesis that it has something to do with disk space.
- Longer-term consideration: We may want to find a way to track disk usage/requirements if we get into very large multiprocess runs.
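One way to approach the disk-tracking idea is to sample free disk space in the background while a run is in progress and report the lowest value seen. The sketch below is an assumption-laden illustration using only the Python standard library; `DiskUsageMonitor` and `disk_usage_gb` are hypothetical names, not existing ActivitySim features, and the path and sampling interval would need to match the actual output and temporary directories of a run.

```python
import shutil
import threading
import time


def disk_usage_gb(path="."):
    """Return (used_gb, free_gb) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    gb = 1024 ** 3
    return usage.used / gb, usage.free / gb


class DiskUsageMonitor:
    """Sample free disk space in a background thread and remember the minimum."""

    def __init__(self, path=".", interval_s=30.0):
        self.path = path
        self.interval_s = interval_s
        self.min_free_gb = None
        self.samples = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _sample(self):
        _, free_gb = disk_usage_gb(self.path)
        if self.min_free_gb is None or free_gb < self.min_free_gb:
            self.min_free_gb = free_gb
        self.samples += 1

    def _run(self):
        # Keep sampling until the context manager exits.
        while not self._stop.is_set():
            self._sample()
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._sample()  # baseline before the run starts
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        self._sample()  # final reading after the run ends
        return False


if __name__ == "__main__":
    with DiskUsageMonitor(".", interval_s=0.5) as monitor:
        time.sleep(2)  # stand-in for the model run
    print(f"minimum free disk space observed: {monitor.min_free_gb:.1f} GB")
```

Comparing the minimum free space against the starting value gives a rough peak disk requirement for a run. A small-sample run that shows free space approaching zero would support the disk space hypothesis.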
RSG ran the SEMCOG model with and without sharrow. The SEMCOG model is taking longer to run with 1.3 beta.
- With sharrow there’s a reduction of run time from 6.1 hours to 4.2 hours. However, the workplace location choice model takes longer (this was seen with the MWCOG model as well, before Phase 9 work).
- We did see the same pattern in the SANDAG model (see Issue #6)
- Rerunning with updated code, there’s an increase in run time with sharrow. Maybe there’s something in the sharrow fix all branch that’s causing this. RSG to re-run MTC model with the sharrow fix all branch to see if it’s showing worse times; if so, then there’s something in that PR that’s slowing things down. We need to do a new baseline for the MTC model.
