
Allow for deferred codegen in !toplevel #556

Open · wants to merge 2 commits into master
Conversation

vchuravy (Member)

Fixes EnzymeAD/Enzyme.jl#1173

Need to come up with a standalone test-case.

@jgreener64

Could this get merged?

@wsmoses (Contributor) commented May 13, 2024

@vchuravy gentle bump

codecov bot commented May 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.45%. Comparing base (288b613) to head (0e00885).

❗ Current head 0e00885 differs from pull request most recent head da0e767. Consider uploading reports for the commit da0e767 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
- Coverage   82.86%   73.45%   -9.41%     
==========================================
  Files          24       24              
  Lines        3361     3330      -31     
==========================================
- Hits         2785     2446     -339     
- Misses        576      884     +308     


@vchuravy (Member, Author)

I tried reproducing the Enzyme MWE with examples/jit.jl but the stars didn't align.

@maleadt do you recall why you limited this to toplevel originally? I assume because it wouldn't make sense for device-side launched kernels.

@vchuravy vchuravy marked this pull request as ready for review May 16, 2024 01:30
@maleadt (Member) left a comment

The reasoning is that deferred functions called from the entrypoint (which is what the compiler driver is architected around) can be discovered recursively, without having to expand them during deferred compilation. I guess that this doesn't hold with Enzyme because the deferred compilation jobs aren't called by the main entrypoint? To support this, the compiler driver will need to be changed. For example, if you simply let them expand from within a deferred context, as proposed here, the `jobs` variable at top level won't list all compilation jobs, and the post-processing that happens at top level won't execute on all functions.
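A minimal plain-Julia sketch of the entrypoint-rooted discovery scheme described above (names like `discover_jobs!` and the `callees` map are invented for illustration, not GPUCompiler's API): the driver recurses from the single entrypoint and accumulates every deferred job into one top-level list, so any job not reachable from the entrypoint is simply never discovered.

```julia
# Hypothetical sketch: recursive discovery of deferred jobs from one entrypoint.
# `callees` maps a job to the deferred jobs it references.
function discover_jobs!(jobs::Vector{Symbol}, callees::Dict{Symbol,Vector{Symbol}}, job::Symbol)
    job in jobs && return jobs          # already discovered
    push!(jobs, job)
    for callee in get(callees, job, Symbol[])
        discover_jobs!(jobs, callees, callee)
    end
    return jobs
end

# :entry defers to :a, which defers to :b — all three are found.
callees = Dict(:entry => [:a], :a => [:b])
jobs = discover_jobs!(Symbol[], callees, :entry)
@assert jobs == [:entry, :a, :b]

# A deferred job not called by the entrypoint (the Enzyme case) is invisible:
callees[:orphan] = [:c]
@assert :orphan ∉ discover_jobs!(Symbol[], callees, :entry)
```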

@vchuravy (Member, Author)

Hm... Doesn't the inner call resolve its own deferred codegen first, and then both of them get linked into the outer one?

@maleadt (Member) commented May 16, 2024

Yes, but that doesn't correctly populate the `jobs` variable that's used at top level:

```julia
# finish the module
#
# we want to finish the module after optimization, so we cannot do so
# during deferred code generation. instead, process the deferred jobs
# here.
if toplevel
    entry = finish_ir!(job, ir, entry)
    for (job′, fn′) in jobs
        job′ == job && continue
        finish_ir!(job′, ir, functions(ir)[fn′])
    end
end
```

We could work around this by passing `jobs` to the inner deferred codegen generators, but that's a hack. Essentially, the current system relies on a compilation job having a single point of entry, even into deferred jobs.
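The workaround being discussed could be sketched like this in plain Julia (invented names, not GPUCompiler's actual driver): one shared `jobs` vector is threaded through nested deferred codegen, so the list is complete by the time the top-level pass (the `finish_ir!` loop quoted above) runs.

```julia
# Hypothetical sketch: thread a shared `jobs` list through nested codegen.
deferred_deps = Dict(:outer => [:inner], :inner => [:leaf])

function codegen!(jobs::Vector{Symbol}, job::Symbol; toplevel::Bool=false)
    push!(jobs, job)
    for dep in get(deferred_deps, job, Symbol[])
        codegen!(jobs, dep)     # inner expansion appends to the *same* list
    end
    if toplevel
        # stand-in for the finish_ir! loop: every job, however deeply
        # nested, is now visible to top-level post-processing
        @assert jobs == [:outer, :inner, :leaf]
    end
    return jobs
end

codegen!(Symbol[], :outer; toplevel=true)
```

Without the shared list, each nested expansion would start from a fresh `jobs`, and the top-level loop would only see the jobs it expanded itself.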

@vchuravy (Member, Author)

Hm, I am a bit confused about how

https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002

works. I just pushed a commit that returns the `job` variable from the `!toplevel` compilation.

@maleadt (Member) commented May 16, 2024

> Hm, I am a bit confused about how
>
> https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002

We probably don't really rely on the `finish_ir` phase right now.

I'm not particularly happy with threading yet more state through, but if you need this... The compiler driver should be reworked to support compiling multiple translation units without relying on a single entry-point.

@vchuravy (Member, Author)

Yeah, I can try to improve this while working on #582.

@maleadt (Member) commented May 16, 2024

CUDA.jl CI failure looks related.

@vchuravy (Member, Author)

Ah, I see. For

https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L975

we have an infinite loop, since we don't pass through the list of already-codegen'd functions. So we recurse into the original top-level job and carry on from there.
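The failure mode could be illustrated with a small plain-Julia sketch (invented names, not the actual GPUCompiler code path): when two jobs defer to each other and the set of already-generated functions isn't passed down, codegen re-enters the original job and never terminates; carrying a `seen` set through the recursion breaks the cycle.

```julia
# Hypothetical sketch: mutual deferral between a kernel and a helper.
deps = Dict(:kernel => [:helper], :helper => [:kernel])

function codegen(job::Symbol, seen::Set{Symbol}=Set{Symbol}())
    job in seen && return seen       # guard that breaks the cycle
    push!(seen, job)
    for dep in get(deps, job, Symbol[])
        codegen(dep, seen)           # pass the seen set through the recursion
    end
    return seen
end

@assert codegen(:kernel) == Set([:kernel, :helper])
# Dropping the `job in seen` guard (i.e. not passing the list of
# already-codegen'd functions through) makes this recurse forever.
```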

Development

Successfully merging this pull request may close these issues.

autodiff_deferred with derivatives of order greater than 2
4 participants