-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent memory allocation is slowed by memTrack
#929
Comments
FYI @reuster986 @mhmerrill @pierce314159. This is large part of the reason operations that were doing lots of short-lived concurrent allocations were so slow (initial SipHash, current string regex, cast-to-str.) I'm not really sure how this didn't occur to me before. In hindsight it's pretty obvious and I kept being surprised how much overhead allocation added, but I guess I kept chalking it up to the short-lived nature of the operations There is still a non-trivial overhead to these allocations so optimizing them out is beneficial, but using |
@ronawho Good find! I am fine with a 1MB threshold, since the intent is really to track allocations on the scale of GB or larger anyway. |
Using a threshold has significantly sped up operations with a lot of concurrent allocations. Casting to str for example is 10x faster with chapel 1.25.0 and 100x faster with the main chapel development branch. This has also helped a bunch of other operations: https://chapel-lang.org/perf/arkouda/16-node-cs-hdr/?startdate=2021/08/28&enddate=2021/10/04&configs=release,nightly&graphs=coargsortperformance,groupbyperformance,stringgroupbyperformance (jump on the 10/01) I'm wondering if this performance benefit on main is something we should be considering a 1.25.1 release for. Point releases can be a bit of a pain, but the performance benefits here can be quite significant (and we could roll in other changes like chapel regex optimizations mentioned in #917 (comment)) |
I think these performance gains are pretty significant, so I would be in favor of a 1.25.1, if it's not too much trouble. Are there any other changes coming up that it would make sense to wait for? |
Yeah, reducing allocations in chapel regex code, and possibly some scan scalability improvements are coming up. I also think it would be worthwhile to test out 1.25.0 to make sure there's no correctness bugs first and if there are we could address them then too. I'll get the conversation started about a 1.25.1 internally. |
Since #1027, arkouda recommends Chapel 1.25.1, which includes the memory tracking optimization and there's minimal overhead from |
Discussed a little in #154, but
--memTrack
can seriously hurt the performance of concurrent allocations. Historically, most of Arkouda's operations allocate a few large arrays and relatively little beyond that. However, there were a few times recently where allocations were really hurting performance. The allocations were short lived so a performance hit isn't surprising, but I was surprised by the magnitude of the performance hit and I'm only now putting together that a lot of the overhead is from--memTrack
serializing concurrent allocations.Here's a concurrent allocation micro-benchmark that demonstrates the overhead. Results are on 128-core Rome CPU:
In most cases we'd like to optimize out as much allocation as possible (#266, #917), but in other cases getting rid of the concurrent allocation is hard (#675) and in general I think we want to avoid this large performance pitfall.
My suggestion is to run with
--memThreshold
to limit what allocations are tracked with the thought that highly concurrent allocations will tend to be small. In trying this I found the--memThreshold
did not get rid of as much overhead as I was expecting, but I believe I have addressed that with chapel-lang/chapel#18465. Here are updated performance results with--memThreshold
before and after that PR:The text was updated successfully, but these errors were encountered: