Assigning an array element to another is sometimes abysmally slow #67655
Comments
When you stop the debugger and it shows on this line, what is the full backtrace? (If you're using Xcode, you may need to go into the LLDB window and type the backtrace command yourself.) P.S. Pausing the debugger at random is one of my favorite performance-testing tricks. ;-)
Good call on the stack trace; I noticed that the top frame was busy doing a memmove but didn't spot what was triggering it.
Ah. That was @jckarter's guess, and yes, it would certainly explain the slowdown. Somehow, the assignment is holding an access to the array (for the read) while performing the write, thus forcing a copy. Maybe @atrick, @nate-chandler, or @meg-gupta have ideas here?
Yes, the access on the RHS needs to complete before the access on the LHS starts. If that's not the case, then we have a serious problem.
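A minimal sketch of the two access patterns under discussion (illustrative names, not the reporter's code): in the first form, the read and the write belong to a single statement touching the same array, which is what the comments above suspect keeps a read access open across the write in -Onone builds; in the second, the read completes before the write begins.

```swift
// Minimal sketch of the two forms discussed above (illustrative only).

// Element-to-element assignment: the read of heap[j] and the write of
// heap[i] are part of one statement on the same array.
func moveDirect(_ heap: inout [Int], _ i: Int, _ j: Int) {
    heap[i] = heap[j]
}

// Going through a local temporary: the read finishes before the write
// starts, so the two accesses never overlap.
func moveViaTemporary(_ heap: inout [Int], _ i: Int, _ j: Int) {
    let value = heap[j]
    heap[i] = value
}
```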
Looks like this is a struct vs. class issue:
Bench:
After some R&R with the profiler, my key takeaway from all of this is that once you're certain that your accesses are synchronized and within bounds, it's worth swapping the checked subscripts out for a lower-level access path (one option is sketched below).
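The exact replacement the comment named did not survive in this rendering; one option consistent with the stated preconditions (accesses already synchronized and in bounds) is Array.withUnsafeMutableBufferPointer, sketched here purely as an assumption rather than a quote of the commenter's code.

```swift
// Hypothetical sketch, not the commenter's exact suggestion: once the
// caller guarantees bounds and synchronization, the move can go through
// a buffer pointer. The closure body runs within a single access to the
// array's storage, and the buffer subscript is bounds-checked only in
// debug builds.
func moveUnchecked(_ heap: inout [Int], _ i: Int, _ j: Int) {
    heap.withUnsafeMutableBufferPointer { buffer in
        buffer[i] = buffer[j]
    }
}
```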
@za-creature Sorry for losing the thread here.
Happy to share my code with Apple, but that's a "for-future-reference" sort of thing: I've re-run the code from my previous comment on my phone (13 mini) and got the same result with
I'll send you the full project if you need it, but IIRC I just created a new SwiftUI app and deleted all files except the one that contained
Let me try that...
Tried that and found something interesting: if I set Xcode to a Debug build, I do see the discrepancy you describe, but in a Release build I see very similar performance for these two cases. Do you also see that?
@meg-gupta @eeckstein: Are we missing some borrow scope fixup pass in -Onone builds?
Can somewhat confirm (haven't figured out how to do a release build, but that's probably because I deleted too much stuff from my project): the previous timings (12s vs 30ms) hold for
For the other options, I had to bump
For
With '-O', they're
I consider the
It might be that the root cause is still there, but the optimizer is smart enough to at least partially fix it in this simple case. It's good that this isn't an issue for prod builds, but the fact that this appears to be resolved by the optimizer rather than the code generator still makes me feel uneasy somehow. It's been about a decade since I last wrote low-level code, so I'm not up to date with compiler etiquette, but by my (possibly dated) knowledge I'm willing to pay a 50% performance penalty for range checks and the ability to attach a debugger, but not an order of magnitude more.
Thank you. I agree that Debug builds should not show a discrepancy this big; I just wanted to make sure I was actually seeing the same issue. It looks like there might be some optimization passes that should be enabled in -Onone and -Osize but currently are not. (A number of our "optimization passes" handle what you might consider "code generation"; it's just a peculiarity of how we've organized the compiler internals.)
Description
In certain, tricky-to-reproduce circumstances,
is (at least) an order of magnitude slower than
Steps to reproduce
Apologies for the long report; I did my best to replicate this with a simpler example but was unable to.
I've implemented a bounded priority queue using an interval heap (tl;dr: an inline min-max heap, a.k.a. the unholy mixture of a min heap with a max heap), and this code is derived from the fuzzer I used to test it (please note that this is a work in progress and not available for redistribution).
The actual issue is around line 77 in the following code:
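(The listing itself is not reproduced here; the following is a hypothetical, much-reduced stand-in for the pattern being reported, namely element-to-element assignment versus the same move through a local temporary, with arbitrary names and sizes.)

```swift
import Foundation

// Hypothetical stand-in, not the reporter's code: an arbitrary workload
// that compares the two assignment styles discussed in this issue.
let samples = 100_000
var heap = Array(0..<samples).shuffled()

// Style 1: assign one element of the array directly to another.
var start = Date()
for k in 1..<samples {
    heap[k - 1] = heap[k]
}
print("direct assignment:  ", Date().timeIntervalSince(start))

// Style 2: read into a local first, then write it back.
start = Date()
for k in 1..<samples {
    let value = heap[k]
    heap[k - 1] = value
}
print("through a temporary:", Date().timeIntervalSince(start))
```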
Expected behavior
These are the timings I measured on my setup (x86 Mac host compiler, iPhone 13 mini target):
For samples = 100k, capacity = 50_002 (the default included in the report; sorry for asking you to (un)comment some code to compare):
For samples = 1M, capacity = 500_002 (heads up, this takes about 7 minutes):
Heaps are nominally O(n log n) (assuming my implementation is at least somewhat correct), so the "with local" timings check out. Not sure where the slowdown comes from, but it's a pretty big one:
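(As a rough sanity check on that scaling, assuming the usual n log n cost: going from n = 100k to n = 1M should take about 10 × log(1M) / log(100k) ≈ 10 × 1.2 = 12 times as long.)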
Environment