Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of type-analysis #1743

Closed
vchuravy opened this issue Feb 19, 2024 · 6 comments · Fixed by #1744
Closed

Performance of type-analysis #1743

vchuravy opened this issue Feb 19, 2024 · 6 comments · Fixed by #1744

Comments

@vchuravy
Copy link
Member

@swilliamson7 code from EnzymeAD/Enzyme.jl#1284 (comment) highlights very long compilation time in type-analysis. Happy to share the vtunes file if needed.

The hottest function are all related to type-tree:

image

in particular VisitGEPOperator seems to be dominating

image

@wsmoses
Copy link
Member

wsmoses commented Feb 19, 2024

So at least some of the shiftindices calls within GEP should be able to be made extremely quick, which should cut down on this quite a bit.

I see almost all of the visitCallBase is hitting the julia rule, which means this isn't running interprocedurally (which is good) so this is possible in theory to extract to a single llvm function which has the same behavior

@vchuravy
Copy link
Member Author

The version in michel2323/ShallowWaters.jl@6341944 where @michel2323 fixed Checkpointing.jl takes ~4m and shows a second bottleneck in the call to julia_type_rule. I wonder if we could cache those calls?

image

@wsmoses
Copy link
Member

wsmoses commented Feb 19, 2024

Even better than caching the call we could also just use type metadata and never call back into julia.

Enzyme now supports the ability to parse type data on args/insts/etc from metadata/attributes, that we could do a pass first within julia to add to all julia args that are known, then just still tell TA in Julia to be disabled [and it should still use the llvm native metadata].

@wsmoses
Copy link
Member

wsmoses commented Feb 19, 2024

here's an example where we do so currently for an alloca: https://github.com/EnzymeAD/Enzyme.jl/blob/1829706d90c4fa2110edb8ac341a8aed68ed3f6f/src/compiler.jl#L4130

we already push the julia type as attributes here: https://github.com/EnzymeAD/Enzyme.jl/blob/1829706d90c4fa2110edb8ac341a8aed68ed3f6f/src/compiler.jl#L4497
might as well also push the typetrees

@wsmoses
Copy link
Member

wsmoses commented Feb 19, 2024

I've found (and am completing a fix for) some accidentally quadratic behavior in ShiftIndices. Hopefully this should substantially cut down on that overhead, or at least reduce it.

@wsmoses
Copy link
Member

wsmoses commented Feb 20, 2024

@vchuravy let's split this the ShiftIndicies and julia typetree into separate issues. Can you open a new issue for the latter on Enzyme.jl?

And can you check to see if the linked PR resolved the ShiftIndices part of the flamegraph?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants