Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always inspect the task-local context when verifying before freeing. #1462

Merged
merged 1 commit into from
May 9, 2022

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Apr 5, 2022

For some reason, the thread-bound context can become desynchronized
from the task-local one. Generally we don't notice this, because we
prefix every API call with a call that synchronizes both. Here, however,
we explicitly didn't to avoid initializing the state as that was thought
to cause the kind of initialization that may have to yield (which is
unsupported when done so from a finalizer).

However, just creating the task local state shouldn't result in yield,
only creating a stream does, like querying active_state as was done
before #1383.

Fixes #1454.
@DhairyaLGandhi Could you verify this doesn't re-introduce the errors when finalizing arrays?
Ultimately though this needs JuliaLang/julia#35689.

@maleadt maleadt added the bugfix This gets something working again. label Apr 5, 2022
@DhairyaLGandhi
Copy link
Member

I assume the same test of distributed training would be sufficient to test this?

@maleadt
Copy link
Member Author

maleadt commented Apr 5, 2022

Yes, that would suffice.

For some reason, the thread-bound context can become desynchronized
from the task-local one. Generally we don't notice this, because we
prefix every API call with a call that synchronizes both. Here, however,
we explicitly didn't to avoid initializing the state as that was thought
to cause the kind of initialization that may have to yield (which is
unsupported when done so from a finalizer).

However, just creating the task local state shouldn't result in yield,
only creating a stream does, like querying `active_state` as was done
before #1383.
@codecov
Copy link

codecov bot commented May 9, 2022

Codecov Report

Merging #1462 (9928855) into master (554dcc4) will decrease coverage by 0.01%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master    #1462      +/-   ##
==========================================
- Coverage   72.64%   72.63%   -0.02%     
==========================================
  Files         131      131              
  Lines        9805     9805              
==========================================
- Hits         7123     7122       -1     
- Misses       2682     2683       +1     
Impacted Files Coverage Δ
lib/cufft/fft.jl 87.59% <50.00%> (-0.29%) ⬇️
src/pool.jl 76.92% <50.00%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 554dcc4...9928855. Read the comment docs.

@maleadt
Copy link
Member Author

maleadt commented May 9, 2022

Awaiting a response, let's go ahead and merge this since it does fix a known issue.

@maleadt maleadt merged commit 056a526 into master May 9, 2022
@maleadt maleadt deleted the tb/free_context branch May 9, 2022 13:32
simonbyrne pushed a commit to simonbyrne/CUDA.jl that referenced this pull request Nov 13, 2023
…uliaGPU#1462)

For some reason, the thread-bound context can become desynchronized
from the task-local one. Generally we don't notice this, because we
prefix every API call with a call that synchronizes both. Here, however,
we explicitly didn't to avoid initializing the state as that was thought
to cause the kind of initialization that may have to yield (which is
unsupported when done so from a finalizer).

However, just creating the task local state shouldn't result in yield,
only creating a stream does, like querying `active_state` as was done
before JuliaGPU#1383.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This gets something working again.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error while freeing DeviceBuffer-warning when using multiple GPUs
2 participants