Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threadlocal result is not inferrable #112

Closed
Kolaru opened this issue Jun 13, 2023 · 3 comments · Fixed by #134
Closed

threadlocal result is not inferrable #112

Kolaru opened this issue Jun 13, 2023 · 3 comments · Fixed by #134
Labels
good first issue Good for newcomers

Comments

@Kolaru
Copy link
Contributor

Kolaru commented Jun 13, 2023

The threadlocal which appears outside the @batch is not inferrable, even if the type is given in the macro.

The following code (inspired from the README)

function f()
    @batch threadlocal=rand(10:99)::Int for i in 0:9
        threadlocal += 1
    end
    sum(threadlocal)
end

@code_warntype f()

tells me that the content of threadlocal and the return type of the function have both type Any.

@Krastanov
Copy link
Contributor

Yup, that seems like a performance issue that should be fixed. I am one of the few users of this feature (I contributed it a year ago), but I can not commit to try to fix it in the near term. Help would certainly be appreciated if you have the bandwidth to look into it.

@chriselrod
Copy link
Member

This can be done in an inferrable and, if the type isbits (such as Int), also a 0-allocation manner.

If anyone feels like doing this, I'd be happy to answer questions/help out.
Basic idea:

using StrideArraysCore, CPUSummary, Static
T = # type annotation
if isbitstype(T)
  CLS = CPUSummary.cache_linesize()
  ts = static(sizeof(T))
  ncols = CPUSummary.sys_threads()
  nrows = cld(CLS, ts)
  threadlocal_matrix = StrideArray{T](undef, (nrows, ncols))
  threadlocal = threadlocal_matrix[1, :] # slice is a view

threadlocal_matrix should be allocated on the stack.
Some care will have to be taken that it doesn't escape, but Polyester itself will GC.@preserve arrays and use PtrArray views of them.

Of course, the sum from that example will not do this, forcing an allocation.

@chriselrod
Copy link
Member

Of course, the sum from that example will not do this, forcing an allocation.

So my suggestion there is to extract it to a tuple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants