Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear thread state to ensure threads local state don't keep references #1046

Merged
merged 3 commits into from Oct 25, 2022

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Oct 25, 2022

We're surely running into JuliaLang/julia#40626.

This is my attempt to clear thread's local storage after multithreaded parsing.

We're surely running into JuliaLang/julia#40626.

This is my attempt to clear thread's local storage after multithreaded
parsing.
@codecov
Copy link

codecov bot commented Oct 25, 2022

Codecov Report

Base: 89.75% // Head: 89.74% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (eb7d005) compared to base (2691812).
Patch coverage: 85.71% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1046      +/-   ##
==========================================
- Coverage   89.75%   89.74%   -0.02%     
==========================================
  Files           9        9              
  Lines        2246     2253       +7     
==========================================
+ Hits         2016     2022       +6     
- Misses        230      231       +1     
Impacted Files Coverage Δ
src/utils.jl 85.44% <83.33%> (-0.04%) ⬇️
src/file.jl 94.62% <100.00%> (+<0.01%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

(obj::_Returns)(@nospecialize(args...); @nospecialize(kw...)) = obj.value

function clear_thread_states()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quinnj - can you please explain why this is enough to clear thread states?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's detailed in the JuliaLang/julia#40626 issue. Each thread basically has a thread local storage where they will store the most recent Task that it ran, which includes a reference to the result of the task that ran. So to ensure each thread's local state is cleared, we run a dummy no-op task on each thread (via :static) to ensure no references remain to items that should be candidates for GC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

# end up getting stuck in thread local storage
# running clear_thread_states clears out any thread local storage tasks
struct _Returns{V} <: Function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you define _Returns instead of using Base.Returns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into compat issues with 1.6; maybe I just needed to do Base.Returns instead and it was exported starting in 1.7?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, docs states:

Returns requires at least Julia 1.7.

@quinnj quinnj merged commit 050789b into main Oct 25, 2022
@quinnj quinnj deleted the jq/cleanup_spawn_garbage branch October 25, 2022 20:54
@iamed2
Copy link
Contributor

iamed2 commented Oct 28, 2022

I believe this broke multithreaded parsing within asyncmap on Julia 1.6 (might affect other versions).

You can replicate by doing any asyncmap that calls a CSV.read that uses the multithreaded parsing, e.g.:

asyncmap(x -> CSV.read("hw_25000.csv", DataFrame; ntasks=3), 1:3; ntasks=3)

@quinnj
Copy link
Member Author

quinnj commented Nov 2, 2022

@iamed2, would you mind trying out this PR; hopefully that is a little more conservative in clearing thread states.

@iamed2
Copy link
Contributor

iamed2 commented Nov 4, 2022

Looks like that fixed it! Thanks!

Btw I just pulled the CSV file from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html; my example above should be fully replicable. I originally demonstrated the issue using a bare julia:1.6 docker image with just CSV+DataFrames installed.

Might be worth adding a test. Although I don't think it's likely this exact problem with reoccur, I don't see any tests that involve @async, asyncmap, or nested tasks, and that's both a very common use case for us across several private packages.

@quinnj
Copy link
Member Author

quinnj commented Nov 4, 2022

Ah that makes sense. Thanks for confirming. Yeah, I'll look into adding a test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants