Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sharrow mode not accessing xarray skim object as expected #662

Closed
i-am-sijia opened this issue Mar 30, 2023 · 4 comments
Closed

Sharrow mode not accessing xarray skim object as expected #662

i-am-sijia opened this issue Mar 30, 2023 · 4 comments
Assignees
Labels
Bug Something isn't working/bug f

Comments

@i-am-sijia
Copy link
Contributor

Describe the bug

The 2-zone profiling exercise profiled full size placeholder_sandag and placeholder_psrc models, and found out that Accessibility Calculation is the memory peak for the two example 2-zone models. After tracing line by line, it turns out the following line gets triggered to access the xarray skim object, causing the memory spike. See more details in the presentation at Meeting 2023.03.28, the slides include a chart of memory comparison between sharrow turned on vs off.

result = self.dataset.iat(**x, _name=key) # iat strips data encoding
# if 'digital_encoding' in self.dataset[key].attrs:
# result = array_decode(result, self.dataset[key].attrs['digital_encoding'])

image

Jeff pointed out that this line should not be triggered when it's running sharrow mode, something must be failing in the sharrow implementation. If implemented correctly, there should not be a memory spike when accessing skim object in sharrow mode.

@i-am-sijia i-am-sijia changed the title 🐛 Bug: Sharrow mode not accessing xarray skim object as expected Sharrow mode not accessing xarray skim object as expected Mar 30, 2023
@i-am-sijia i-am-sijia added the Bug Something isn't working/bug f label Mar 30, 2023
@wusun2
Copy link

wusun2 commented Mar 30, 2023

Thanks @i-am-sijia . Now that Phase 8 2-zone memory profiling revealed a Sharrow bug. Should we re-open #611? @jpn--

@jpn--
Copy link
Member

jpn-- commented Mar 30, 2023

My initial take on this from our call was "bug", but on further review I've determined it's not actually a bug per se, but rather the outcome of intentionally not using sharrow within the accessibility module. The accessibilities component is different enough from the other components in a few key ways that it was left in "fallback" mode. This was completely consistent with the Phase 7 scope, which specifically called for skipping sharrow on limited numbers of components when absolute backwards compatibility was not readily available. The runtime is roughly equivalent with sharrow on and off, my failure to notice that the memory usage from using the sharrow skim dataset inside the legacy code was spiking so high is an oversight. I am looking into solution options.

@stefancoe
Copy link
Contributor

stefancoe commented Apr 3, 2023

I hope we can discuss options to reduce the memory requirements of accessibility (as well as mandatory tour scheduling) at tomorrow's consortium meeting. Sijia's memory profiling work indicates that PSRC's memory peak would be 160 GB leaving out these two models. While still high, this is doable for us as our modeling machines have additional capacity for about 260 GB total. One potential short term fix would be to support a simpler form of chunking for these two models. Accessibility runs in just a couple of minutes on our 128 GB machines with 12 processes and 3 chunks. So perhaps we could add settings that specify the exact number of chunks for these memory intensive models. I am not sure about chunking compatibility with sharrow and 0-based recoding of zones, but given the existing chunking infrastructure in the code, perhaps not a huge lift? I recognize that we are trying to move away from chunking, but I feel like the problems started when we went from the initial chunking implementation to dynamic chunking, which added some complexities to the code and implementation. So perhaps we could go back to a very simple implementation as a short term solution? Anyhow, just a thought.

**Edit- I just want to clarify our goal is to run ActivitySim with multiprocessing & sharrow on a machine with a reasonable amount of RAM. Currently, the only way for PSRC to run accessibility and mandatory tour scheduling on such a machine is to use chunking, which, if I understand correctly, is not compatible with sharrow.

@jpn--
Copy link
Member

jpn-- commented May 24, 2023

Closed by ActivitySim/sharrow#35

@jpn-- jpn-- closed this as completed May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working/bug f
Projects
None yet
Development

No branches or pull requests

4 participants