
Adding MPI test #518
Open · michel2323 wants to merge 3 commits into main from ms/mpi

Conversation

michel2323 (Collaborator)

Let me know if you want to keep the trailing whitespace.

michel2323 (Collaborator, Author)

#585 @vchuravy This should be merged, I guess. Just checking again.

test/runtests.jl (outdated, resolved)
vchuravy (Member)

Can you rebase once more?

michel2323 (Collaborator, Author)

@vchuravy @wsmoses The MPI test crashes now. On 1.9 we hit the finalizer issue; for 1.8, the crash logs and Manifest.toml are attached.
log.tar.gz

wsmoses (Member) commented Nov 8, 2023

Can you open an issue with the error message?

michel2323 mentioned this pull request on Nov 8, 2023
codecov-commenter commented Nov 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.43%. Comparing base (90e07c1) to head (f16f50b).
Report is 6 commits behind head on main.


Additional details and impacted files
@@             Coverage Diff             @@
##             main     #518       +/-   ##
===========================================
+ Coverage   75.39%   93.43%   +18.03%     
===========================================
  Files          35        7       -28     
  Lines       10653      259    -10394     
===========================================
- Hits         8032      242     -7790     
+ Misses       2621       17     -2604     


michel2323 (Collaborator, Author) commented Nov 8, 2023

@wsmoses @vchuravy I marked the test here as broken so that we can merge the MPI tests now and mark it unbroken once the fix lands. Opened issue #1138.
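
A minimal sketch of the "mark broken now, flip later" pattern mentioned above, using the Test stdlib (the testset name is made up): @test_broken keeps CI green while the bug is open, and starts failing once the expression passes, reminding us to turn it back into @test.

using Test

@testset "Enzyme over MPI" begin
    # Stand-in for the failing reverse-mode check tracked in #1138:
    # @test_broken records a Broken result instead of a failure.
    @test_broken false
end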

wsmoses force-pushed the ms/mpi branch 2 times, most recently from 0e829b9 to a1745a5, on November 20, 2023
wsmoses (Member) commented Nov 20, 2023

@michel2323 I rebased this PR onto the JLL that includes your fix; if it passes, let's merge this!

test/runtests.jl (outdated, resolved)
wsmoses (Member) commented Nov 20, 2023

@vchuravy this still fails, presumably because it needs #669. From the CI log:

2023-11-20T03:26:35.5348040Z error: /home/runner/.julia/packages/MPI/hhI6i/src/api/generated_api.jl:2151:0: in function preprocess_julia_Isend_50847 {} addrspace(10)* ({} addrspace(10)*, i64, {} addrspace(10)*): Enzyme: failed to deduce type of copy   %65 = call i32 @MPI_Isend(i64 %58, i32 %value_phi4, i32 %61, i32 %52, i32 0, i32 %62, i64 %64) #34 [ "jl_roots"({} addrspace(10)* %2, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139848936934304 to {}*) to {} addrspace(10)*), {} addrspace(10)* %value_phi5, {} addrspace(10)* %value_phi3) ], !dbg !107

2023-11-20T03:26:35.5120729Z   %value_phi5 = phi {} addrspace(10)* [ null, %pass ], [ addrspacecast ({}* inttoptr (i64 139848936932704 to {}*) to {} addrspace(10)*), %L54 ]: {[-1]:Pointer}, intvals: {0,}
2023-11-20T03:26:35.4979683Z   %59 = bitcast {} addrspace(10)* %value_phi5 to i32 addrspace(10)*, !dbg !123: {[-1]:Pointer}, intvals: {0,}
2023-11-20T03:26:35.4980592Z   %60 = addrspacecast i32 addrspace(10)* %59 to i32 addrspace(11)*, !dbg !123: {[-1]:Pointer}, intvals: {0,}
2023-11-20T03:26:35.4981343Z   %61 = load i32, i32 addrspace(11)* %60, align 4, !dbg !123, !tbaa !45: {}, intvals: {}

While I do agree custom rules are good (and we could redo that API as a custom global-invariant rule), the issue is that MPI.Double is hidden behind a separate Julia-specific global int, which hinders the analysis here and likely would for other libXYZ calls as well, for example CUDA.
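
For context, a custom rule of the kind referred to above might look roughly like this: a hedged sketch of an EnzymeRules reverse rule for a blocking send wrapper, using the standard result that the adjoint of a send is a receive. The wrapper send_block! is invented for illustration; this is not Enzyme's or MPI.jl's actual rule, and EnzymeRules signatures vary across Enzyme versions.

using Enzyme, MPI
import Enzyme.EnzymeRules: augmented_primal, reverse, AugmentedReturn

# Invented wrapper so the rule has a single function to attach to.
send_block!(x, dest, comm) = MPI.Send(x, comm; dest=dest, tag=0)

function augmented_primal(config, func::Const{typeof(send_block!)},
                          ::Type{<:Const}, x::Duplicated, dest::Const, comm::Const)
    # Forward sweep: perform the send as usual; nothing needs taping.
    func.val(x.val, dest.val, comm.val)
    return AugmentedReturn(nothing, nothing, nothing)
end

function reverse(config, func::Const{typeof(send_block!)},
                 ::Type{<:Const}, tape, x::Duplicated, dest::Const, comm::Const)
    # Reverse sweep: the adjoint of a send is a receive. The matching
    # rule for the peer's Recv! would post the corresponding send of
    # its buffer's shadow.
    tmp = similar(x.dval)
    MPI.Recv!(tmp, comm.val; source=dest.val, tag=0)
    x.dval .+= tmp
    return (nothing, nothing, nothing)
end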

vchuravy added this to the release-0.12 milestone on Apr 1, 2024
Comment on lines +15 to +17
buf = @view x[1:1]
push!(requests, MPI.Isend(x[2:2], MPI.COMM_WORLD; dest=rank-1, tag=0))
push!(requests, MPI.Irecv!(buf, MPI.COMM_WORLD; source=rank-1, tag=0))

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
buf = @view x[1:1]
push!(requests, MPI.Isend(x[2:2], MPI.COMM_WORLD; dest=rank-1, tag=0))
push!(requests, MPI.Irecv!(buf, MPI.COMM_WORLD; source=rank-1, tag=0))
buf = @view x[1:1]
push!(requests, MPI.Isend(x[2:2], MPI.COMM_WORLD; dest=rank - 1, tag=0))
push!(requests, MPI.Irecv!(buf, MPI.COMM_WORLD; source=rank - 1, tag=0))

Comment on lines +19 to +22
if rank != np-1
buf = @view x[end:end]
push!(requests, MPI.Isend(x[end-1:end-1], MPI.COMM_WORLD; dest=rank+1, tag=0))
push!(requests, MPI.Irecv!(buf, MPI.COMM_WORLD; source=rank+1, tag=0))

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
if rank != np-1
buf = @view x[end:end]
push!(requests, MPI.Isend(x[end-1:end-1], MPI.COMM_WORLD; dest=rank+1, tag=0))
push!(requests, MPI.Irecv!(buf, MPI.COMM_WORLD; source=rank+1, tag=0))
if rank != np - 1
buf = @view x[end:end]
push!(requests,
MPI.Isend(x[(end - 1):(end - 1)], MPI.COMM_WORLD; dest=rank + 1, tag=0))
push!(requests, MPI.Irecv!(buf, MPI.COMM_WORLD; source=rank + 1, tag=0))

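Pieced together, the halo exchange these fragments come from looks roughly like the following self-contained sketch (the function signature, comm handling, and the Waitall at the end are assumptions; the body follows the fragments):

using MPI

function halo(x::Vector{Float64})
    comm = MPI.COMM_WORLD
    rank = MPI.Comm_rank(comm)
    np = MPI.Comm_size(comm)
    requests = MPI.Request[]
    if rank != 0
        # Exchange with the left neighbor: send x[2], receive into x[1].
        buf = @view x[1:1]
        push!(requests, MPI.Isend(x[2:2], comm; dest=rank - 1, tag=0))
        push!(requests, MPI.Irecv!(buf, comm; source=rank - 1, tag=0))
    end
    if rank != np - 1
        # Exchange with the right neighbor: send x[end-1], receive into x[end].
        buf = @view x[end:end]
        push!(requests, MPI.Isend(x[(end - 1):(end - 1)], comm; dest=rank + 1, tag=0))
        push!(requests, MPI.Irecv!(buf, comm; source=rank + 1, tag=0))
    end
    MPI.Waitall(requests)
    return nothing
end
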
Comment on lines +33 to +38
n = np*10
n1 = Int(round(rank / np * (n+np))) - rank
n2 = Int(round((rank + 1) / np * (n+np))) - rank
nl = rank == 0 ? n1+1 : n1
nr = rank == np-1 ? n2-1 : n2
nlocal = nr-nl+1

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
n = np*10
n1 = Int(round(rank / np * (n+np))) - rank
n2 = Int(round((rank + 1) / np * (n+np))) - rank
nl = rank == 0 ? n1+1 : n1
nr = rank == np-1 ? n2-1 : n2
nlocal = nr-nl+1
n = np * 10
n1 = Int(round(rank / np * (n + np))) - rank
n2 = Int(round((rank + 1) / np * (n + np))) - rank
nl = rank == 0 ? n1 + 1 : n1
nr = rank == np - 1 ? n2 - 1 : n2
nlocal = nr - nl + 1

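As a quick sanity check of the partitioning arithmetic above, here are the values it produces for np = 2 (worked by hand; each rank ends up with 11 local points, with the ranges overlapping at the interior boundary so each rank holds its neighbor's edge value as a halo cell):

np = 2
n = np * 10                                             # 20
for rank in 0:(np - 1)
    n1 = Int(round(rank / np * (n + np))) - rank        # rank 0: 0,  rank 1: 10
    n2 = Int(round((rank + 1) / np * (n + np))) - rank  # rank 0: 11, rank 1: 21
    nl = rank == 0 ? n1 + 1 : n1                        # rank 0: 1,  rank 1: 10
    nr = rank == np - 1 ? n2 - 1 : n2                   # rank 0: 11, rank 1: 20
    nlocal = nr - nl + 1                                # 11 for both ranks
    println((rank, nl, nr, nlocal))
end
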
fill!(context.x, Float64(rank))
halo(context)
if rank != 0
@test context.x[1] == Float64(rank-1)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
@test context.x[1] == Float64(rank-1)
@test context.x[1] == Float64(rank - 1)

Comment on lines +45 to +46
if rank != np-1
@test context.x[end] == Float64(rank+1)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
if rank != np-1
@test context.x[end] == Float64(rank+1)
if rank != np - 1
@test context.x[end] == Float64(rank + 1)

try
include("mpi.jl")
mpiexec() do cmd
run(`$cmd -n 2 $(Base.julia_cmd()) --project=$testdir $testdir/mpi.jl`)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
run(`$cmd -n 2 $(Base.julia_cmd()) --project=$testdir $testdir/mpi.jl`)
return run(`$cmd -n 2 $(Base.julia_cmd()) --project=$testdir $testdir/mpi.jl`)

end
@test mpi_test
end



[JuliaFormatter] reported by reviewdog 🐶

Suggested change: remove the trailing blank line.
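
Assembled, the runner fragments above correspond to a block along these lines (a sketch: the surrounding testset, the try/catch, and how mpi_test is derived from the run are assumptions based on the visible pieces):

using Test
using MPI: mpiexec

testdir = @__DIR__

@testset "MPI" begin
    mpi_test = try
        include("mpi.jl")  # also exercise the file on a single process
        mpiexec() do cmd
            # run throws on a nonzero exit code, landing us in the catch
            run(`$cmd -n 2 $(Base.julia_cmd()) --project=$testdir $testdir/mpi.jl`)
        end
        true
    catch
        false
    end
    @test mpi_test
end
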
test/runtests.jl (outdated)
b = Float64[11, 13]
db = zero(b)

forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)}, Duplicated, Duplicated{typeof(A)}, Duplicated{typeof(b)})

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)}, Duplicated, Duplicated{typeof(A)}, Duplicated{typeof(b)})
forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)},
Duplicated, Duplicated{typeof(A)},
Duplicated{typeof(b)})

test/runtests.jl (outdated)

db = zero(b)

forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)}, Duplicated, Const{typeof(A)}, Duplicated{typeof(b)})

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)}, Duplicated, Const{typeof(A)}, Duplicated{typeof(b)})
forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)},
Duplicated, Const{typeof(A)},
Duplicated{typeof(b)})

test/runtests.jl (outdated)

dA = zero(A)

forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)}, Duplicated, Duplicated{typeof(A)}, Const{typeof(b)})

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)}, Duplicated, Duplicated{typeof(A)}, Const{typeof(b)})
forward, pullback = Enzyme.autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(\)},
Duplicated, Duplicated{typeof(A)},
Const{typeof(b)})

michel2323 (Collaborator, Author) commented Apr 1, 2024

I get this now on the most recent Enzyme build:

ERROR: LoadError: ERROR: LoadError: Enzyme execution failed.
Enzyme: jl_call calling convention not implemented in aug_forward for   %33 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* noundef nonnull @jl_f_finalizer, {} addrspace(10)* noundef null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140298333135136 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull align 8 dereferenceable(16) %newstruct15.i) #29, !dbg !149
Stacktrace:
 [1] finalizer
   @ ./gcutils.jl:87
 [2] Request
   @ /disk/mschanen/julia_depot/packages/MPI/z2owj/src/nonblocking.jl:183

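For readers following along, the failure mode is roughly this (a simplified illustration, not MPI.jl's actual source): the MPI.Request constructor registers a finalizer to free the underlying MPI handle, and Enzyme's augmented forward pass has no handler for the resulting jl_f_finalizer call.

mutable struct ToyRequest
    handle::Ptr{Cvoid}
    function ToyRequest(handle)
        req = new(handle)
        # This finalizer registration is what lowers to the jl_f_finalizer
        # call that Enzyme rejects when differentiating through it.
        finalizer(r -> nothing, req)
        return req
    end
end
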
This is the Request issue @vchuravy mentioned. How do we proceed with MPI in Julia? Should we start an Enzyme extension for MPI, or an MPI extension for Enzyme?
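
Either glue package could hang off Julia's package-extension mechanism (Julia >= 1.9). A sketch of what the Enzyme-side extension might look like; the module name and layout are assumptions:

# Hypothetical ext/EnzymeMPIExt.jl. Enzyme's Project.toml would list MPI
# under [weakdeps] and map EnzymeMPIExt = "MPI" under [extensions], so the
# module loads only when MPI is also in the environment.
module EnzymeMPIExt

using Enzyme, MPI
import Enzyme.EnzymeRules

# Custom rules for MPI point-to-point calls (e.g. MPI.Send / MPI.Recv!)
# would live here, along the lines sketched earlier in the thread.

end # module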
