Yup, that seems like a performance issue that should be fixed. I am one of the few users of this feature (I contributed it a year ago), but I cannot commit to fixing it in the near term. Help would certainly be appreciated if you have the bandwidth to look into it.
This can be done in an inferrable manner and, if the type is isbits (such as Int), also without any allocations.
If anyone feels like doing this, I'd be happy to answer questions/help out.
Basic idea:
using StrideArraysCore, CPUSummary, Static
T = # type annotation
if isbitstype(T)
    CLS = CPUSummary.cache_linesize()   # cache line size in bytes
    ts = static(sizeof(T))              # size of one element in bytes
    ncols = CPUSummary.sys_threads()    # one column per thread
    nrows = cld(CLS, ts)                # pad each column to at least one cache line
    threadlocal_matrix = StrideArray{T}(undef, (nrows, ncols))
    threadlocal = threadlocal_matrix[1, :] # slice is a view
end
threadlocal_matrix should be allocated on the stack.
Some care will have to be taken that it doesn't escape, but Polyester itself will GC.@preserve arrays and use PtrArray views of them.
Of course, the sum from that example will not do this, forcing an allocation.
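To make the padding idea concrete, here is a minimal sketch that uses only Base instead of StrideArraysCore, CPUSummary and Polyester. The function name `padded_threadlocal_sum` and the hard-coded 64-byte cache line are assumptions for illustration, and the `Matrix` here is heap-allocated, so this shows the layout and the concrete element type rather than the 0-allocation part.

```julia
# Assumption: 64-byte cache lines; CPUSummary.cache_linesize() would query this instead.
const CACHE_LINE_BYTES = 64

# Hypothetical helper: sums `xs` using one padded, concretely typed slot per chunk.
function padded_threadlocal_sum(xs::AbstractVector{T}) where {T<:Number}
    isbitstype(T) || throw(ArgumentError("T must be an isbits type"))
    nrows = cld(CACHE_LINE_BYTES, sizeof(T))  # elements needed to span one cache line
    ncols = Threads.nthreads()                # one column per chunk of work
    storage = zeros(T, nrows, ncols)          # Matrix{T}: element type is concrete
    chunklen = max(cld(length(xs), ncols), 1)
    chunks = Iterators.partition(eachindex(xs), chunklen)
    @sync for (col, chunk) in enumerate(chunks)
        Threads.@spawn begin
            for i in chunk
                # Column-major layout: slots in row 1 of neighbouring columns are
                # at least CACHE_LINE_BYTES apart, so they never share a cache line.
                storage[1, col] += xs[i]
            end
        end
    end
    return sum(@view storage[1, :])           # reduce over the per-chunk slots
end

total = padded_threadlocal_sum(collect(1:100))
```

Because `storage` is a `Matrix{T}` rather than a container of `Any`, the final reduction should be inferrable for a concrete `T` such as Int.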
The threadlocal which appears outside the @batch is not inferrable, even if the type is given in the macro. The following code (inspired by the README) tells me that the content of threadlocal and the return type of the function both have type Any.
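As a side note, a claim like this can be checked without Polyester by asking the compiler for the inferred return type. The sketch below uses `Base.return_types` on two hypothetical helper functions (`read_untyped` and `read_typed`, which are not the code from this issue): reading from a container whose element type is Any is inferred as Any, while the concretely typed variant is inferred as Int.

```julia
# Hypothetical helpers, for illustration only.
read_untyped() = first(Any[1])   # container with element type Any
read_typed() = first(Int[1])     # container with element type Int

# Base.return_types returns one inferred return type per matching method.
inferred_untyped = only(Base.return_types(read_untyped, Tuple{}))
inferred_typed = only(Base.return_types(read_typed, Tuple{}))

println("untyped: ", inferred_untyped)
println("typed:   ", inferred_typed)
```

`@code_warntype` and `Test.@inferred` give the same information interactively or inside a test suite.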