
Allow for deferred codegen in !toplevel #556

Open · wants to merge 2 commits into master
Conversation

vchuravy (Member)

Fixes EnzymeAD/Enzyme.jl#1173

Need to come up with a standalone test-case.

@jgreener64

Could this get merged?

@wsmoses (Contributor) commented May 13, 2024

@vchuravy gentle bump

codecov bot commented May 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.45%. Comparing base (288b613) to head (0e00885).

❗ Current head 0e00885 differs from pull request most recent head da0e767. Consider uploading reports for the commit da0e767 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
- Coverage   82.86%   73.45%   -9.41%     
==========================================
  Files          24       24              
  Lines        3361     3330      -31     
==========================================
- Hits         2785     2446     -339     
- Misses        576      884     +308     


@vchuravy (Member, Author)

I tried reproducing the Enzyme MWE with examples/jit.jl but the stars didn't align.

@maleadt do you recall why you limited this to toplevel originally? I assume because it wouldn't make sense for device-side launched kernels.

@vchuravy vchuravy marked this pull request as ready for review May 16, 2024 01:30
@maleadt (Member) left a comment

The reasoning is that deferred functions called from the entrypoint (which is what the compiler driver is architected around) can be discovered recursively, without having to expand them during deferred compilation. I guess that this doesn't hold with Enzyme because the deferred compilation jobs aren't called by the main entrypoint? To support this, the compiler driver will need to be changed. For example, if you simply let them expand from within a deferred context, as proposed here, the `jobs` variable at top level won't list all compilation jobs, and the post-processing that happens at top level won't execute on all functions.
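A minimal plain-Julia sketch of the entrypoint-rooted discovery scheme described above (names like `discover_jobs!` and the `callees` map are invented for illustration, not GPUCompiler's API): the driver recurses from the single entrypoint and accumulates every deferred job into one top-level list, so any job not reachable from the entrypoint is simply never discovered.

```julia
# Hypothetical sketch: recursive discovery of deferred jobs from one entrypoint.
# `callees` maps a job to the deferred jobs it references.
function discover_jobs!(jobs::Vector{Symbol}, callees::Dict{Symbol,Vector{Symbol}}, job::Symbol)
    job in jobs && return jobs          # already discovered
    push!(jobs, job)
    for callee in get(callees, job, Symbol[])
        discover_jobs!(jobs, callees, callee)
    end
    return jobs
end

# :entry defers to :a, which defers to :b — all three are found.
callees = Dict(:entry => [:a], :a => [:b])
jobs = discover_jobs!(Symbol[], callees, :entry)
@assert jobs == [:entry, :a, :b]

# A deferred job not called by the entrypoint (the Enzyme case) is invisible:
callees[:orphan] = [:c]
@assert :orphan ∉ discover_jobs!(Symbol[], callees, :entry)
```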

@vchuravy (Member, Author)

Hm... Doesn't the inner call resolve its own deferred codegen first, and then both of them get linked into the outer one?

@maleadt (Member) commented May 16, 2024

Yes, but that doesn't correctly populate the `jobs` variable that's used at top level:

```julia
# finish the module
#
# we want to finish the module after optimization, so we cannot do so
# during deferred code generation. instead, process the deferred jobs
# here.
if toplevel
    entry = finish_ir!(job, ir, entry)
    for (job′, fn′) in jobs
        job′ == job && continue
        finish_ir!(job′, ir, functions(ir)[fn′])
    end
end
```

We could work around this by passing `jobs` to the inner deferred codegen generators, but that's a hack. Essentially, the current system relies on a compilation job having a single point of entry, even into deferred jobs.
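The workaround being discussed could be sketched like this in plain Julia (invented names, not GPUCompiler's actual driver): one shared `jobs` vector is threaded through nested deferred codegen, so the list is complete by the time the top-level pass (the `finish_ir!` loop quoted above) runs.

```julia
# Hypothetical sketch: thread a shared `jobs` list through nested codegen.
deferred_deps = Dict(:outer => [:inner], :inner => [:leaf])

function codegen!(jobs::Vector{Symbol}, job::Symbol; toplevel::Bool=false)
    push!(jobs, job)
    for dep in get(deferred_deps, job, Symbol[])
        codegen!(jobs, dep)     # inner expansion appends to the *same* list
    end
    if toplevel
        # stand-in for the finish_ir! loop: every job, however deeply
        # nested, is now visible to top-level post-processing
        @assert jobs == [:outer, :inner, :leaf]
    end
    return jobs
end

codegen!(Symbol[], :outer; toplevel=true)
```

Without the shared list, each nested expansion would start from a fresh `jobs`, and the top-level loop would only see the jobs it expanded itself.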

@vchuravy (Member, Author)

Hm, I am a bit confused about how

https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002

works. I just pushed a commit that returns the `job` variable from the `!toplevel` compilation.

@maleadt (Member) commented May 16, 2024

> Hm, I am a bit confused about how
>
> https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002

We probably don't really rely on the `finish_ir` phase right now.

I'm not particularly happy with threading yet more state through, but if you need this... The compiler driver should be reworked to support compiling multiple translation units without relying on a single entry-point.

@vchuravy (Member, Author)

Yeah, I can try to improve this while working on #582.

@maleadt (Member) commented May 16, 2024

CUDA.jl CI failure looks related.

@vchuravy (Member, Author)

Ah, I see. For

https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L975

we have an infinite loop, since we don't pass through the list of already-codegen'd functions. So we recurse into the original top-level job and carry on from there.
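The failure mode could be illustrated with a small plain-Julia sketch (invented names, not the actual GPUCompiler code path): when two jobs defer to each other and the set of already-generated functions isn't passed down, codegen re-enters the original job and never terminates; carrying a `seen` set through the recursion breaks the cycle.

```julia
# Hypothetical sketch: mutual deferral between a kernel and a helper.
deps = Dict(:kernel => [:helper], :helper => [:kernel])

function codegen(job::Symbol, seen::Set{Symbol}=Set{Symbol}())
    job in seen && return seen       # guard that breaks the cycle
    push!(seen, job)
    for dep in get(deps, job, Symbol[])
        codegen(dep, seen)           # pass the seen set through the recursion
    end
    return seen
end

@assert codegen(:kernel) == Set([:kernel, :helper])
# Dropping the `job in seen` guard (i.e. not passing the list of
# already-codegen'd functions through) makes this recurse forever.
```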

Development

Successfully merging this pull request may close these issues.

autodiff_deferred with derivatives of order greater than 2
4 participants