[concurrency] SILOptimizer: optimize hop_to_executor instructions. #34593
Conversation
* Redundant hop_to_executor elimination: if a hop_to_executor is dominated by another hop_to_executor with the same operand, it is eliminated:

      hop_to_executor %a
      ...                // no suspension points
      hop_to_executor %a // can be eliminated

* Dead hop_to_executor elimination: if a hop_to_executor is not followed by any code which needs to run on its actor's executor, it is eliminated:

      hop_to_executor %a
      ...                // no instruction which needs to run on %a
      return

rdar://problem/70304809
@swift-ci smoke test
@eeckstein post-commit review! The choice of algorithm seems good, but the only explanation is
That's true but misleading for someone trying to understand the actual algorithm, which has nothing to do with dominance. There should be a statement about the optimistic data-flow approach.
This would be much more approachable if the two independent optimizations were separately defined and documented. The only data structure that is worth reusing across them is a block-to-index DenseMap. The implementation is concise and nicely structured, but the algorithm is not evident from the code, so I can't reason about correctness in general. I expected a perfectly straightforward dataflow, and instead I'm baffled after spending hours trying to reverse engineer it from the implementation. It should be possible to understand the dataflow algorithm by reading the definition of
But from that, I can't tell what the data flow lattice means and I can't make sense of the merge operation from the definition of BlockState. That's because it is serving all of these purposes without any description or self-explanatory interface:
All of this should be clear in the definition of BlockState itself. The first step is to use a separate class for the two unrelated optimizations. There's no need to reuse the BlockState type across optimizations, because 'blockStates' can be a single allocation based on the computed number of blocks. If you free one after the first optimization and allocate another before the next optimization, the system will reuse the memory for you. (Re)initializing that memory is far more expensive than the allocation. While enumerating blocks you can populate a block-to-index map that can be reused across optimizations. In fact, I've often wanted this to be an analysis for use in many passes. Then we can deal with each dataflow on its own. Backward dataflow, for example, only needs two lattice states
While the transfer function needs three states:
You can combine the lattice and local info into a single enum if you want, but then you should explain what you're doing. What does the
There must be a reason the code chooses specific values for the states, but that isn't explained. Note that it might be more efficient to initialize object bits to zero. Later, buried within the code, we find out that these states are actually storing actor indices for one of the optimizations!
It isn't clear what … It also isn't clear why you want to store …
I don't understand why there would be a firstRound check. The dataflow initialization handles the first round of dataflow and happens earlier. If the global data flow propagation needs to do something different across iterations, then something is broken. I would expect the body of the data flow propagation loop to be:
I've been seeing a lot of these 'while(changed)' style data flow loops in the code.
Then, for straight-line code processed postorder, you never enter the global data flow loop at all and do half as many CFG traversals and merges in the common case. Of course, the worklist itself is more expensive and complicates the code.
I don't understand why state.exit is needed, or why it would ever be set to NotSet. NotSet doesn't mean anything for global data flow, it only indicates that there is no transfer function for some piece of code. I would expect initial local propagation to process the blocks like this, assuming they are laid out in something close to reverse-postorder:
You should not need to iterate over all the instructions again, only over the blocks that contain a hop instruction, which are very uncommon. You can simply add the blocks that contain a hop instruction to a vector, and that vector can drive the optimization. Then there is no reason to store state.exit. It's trivial to call mergeSuccessors() for the few blocks that need to be optimized before running the optimization.
With OSSA we have LoadBorrows and StoreBorrows. For newly written optimizations, we may want to take this into consideration by using some kind of load/store abstraction. @gottesman @meghana, does one exist?
@atrick thanks for the review! I agree, the comments could be better.
This is a good idea.
Without this, I would need to replicate the transfer function in the dataflow initialization, which is surprisingly easy to get wrong. After hitting several bugs with this (also in other optimizations), I decided to do it with a firstRound check. It's an easy and safe way to avoid such bugs.
Yeah, it's probably not needed. I'll make some improvements the next time I touch this optimization.
You can have a single transfer function
Then you can compute all effects during one initialization pass, the global dataflow propagation only iterates once and makes no changes for straight-line code, and there are no special cases. Otherwise it's hard for me to understand what