
More general options like basesize #201

@oxinabox


Several functions take a basesize option.

One main use of this is in threaded code, to keep the cost of @spawn from dominating the cost of the actual work.
Set it too low and @spawn cost dominates.
Set it too high, and if the work is uneven then some threads will be sitting around with nothing to do.

basesize is easy to set if you know roughly how long each item takes to process.
A rule of thumb is to set basesize so that processing that many items takes about 1 ms.
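
As a rough illustration of that rule of thumb, here is a minimal sketch that turns a per-item time estimate into a basesize. The function name basesize_for_time and its target keyword are made up for this example and are not part of any package; the clamp to at least 1 is also an assumption.

```julia
# Hypothetical helper (not an existing API): pick basesize so that one chunk
# of work takes roughly `target` seconds, given an estimated time per item.
function basesize_for_time(seconds_per_item::Real; target::Real = 1e-3)
    seconds_per_item > 0 || throw(ArgumentError("seconds_per_item must be positive"))
    # Never go below one item per chunk, even when a single item is slow.
    return max(1, round(Int, target / seconds_per_item))
end

basesize_for_time(1e-6)   # about a thousand items per chunk
basesize_for_time(5e-3)   # items slower than the target: one item per chunk
```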

If, on the other hand, you don't have a good idea of how long each item takes but do know how even the work is, then something else is desired.
If the work is expected to be exactly even, then the optimum is basesize = div(length(work), nthreads()).
If one wants to soften that because one is less confident about how even the work is, then perhaps:
basesize = div(length(work), 10nthreads()).
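
To make the two formulas concrete, here is a small sketch. The helper names even_basesize and soft_basesize and the oversubscribe keyword are invented for illustration; the max(1, ...) clamp is an added assumption so that very short inputs don't produce a basesize of 0.

```julia
using Base.Threads: nthreads

# Hypothetical helpers illustrating the two formulas.
# One chunk per thread: best if every item takes exactly the same time.
even_basesize(n::Integer, nt::Integer = nthreads()) = max(1, div(n, nt))

# `oversubscribe` chunks per thread (10 in the text): leaves slack for rebalancing.
soft_basesize(n::Integer, nt::Integer = nthreads(); oversubscribe::Integer = 10) =
    max(1, div(n, oversubscribe * nt))

work = 1:10_000
even_basesize(length(work), 4)   # 2500 items per chunk, one chunk per thread
soft_basesize(length(work), 4)   # 250 items per chunk, ten chunks per thread
```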

I am not sure of the best way to expose this.
One option might be to have, say, basesize=0 or basesize=:even to do the even splits.
I suspect even splits aren't a great default even in the equal case, though,
since a thread might get taken by another process running (outside of dedicated machines).
Another might be basesize=:auto to do, say, basesize = div(length(work), 10nthreads()), which is probably a better bet than even.

Perhaps a fuller API would be useful:
say, a sizing option taking one of several possible values, like sizing = Basesize(1), sizing = Even(), or sizing = TimeEstimate(mean=0.1, std=0.5).
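
A minimal sketch of what such a sizing API could look like follows. Nothing here exists in any package: the Sizing supertype and the resolve_basesize function are invented for illustration, mean is assumed to be in seconds, and the TimeEstimate rule simply reuses the roughly-1 ms guideline and ignores std, which a real design would presumably take into account.

```julia
# All names below are hypothetical; only Basesize, Even and TimeEstimate
# appear in the proposal itself.
abstract type Sizing end

struct Basesize <: Sizing
    n::Int
end

struct Even <: Sizing end

struct TimeEstimate <: Sizing
    mean::Float64   # assumed: seconds per item
    std::Float64    # assumed: seconds; unused in this sketch
end
TimeEstimate(; mean, std) = TimeEstimate(mean, std)

# Turn a sizing option into a concrete chunk size for `n` items on `nt` threads.
resolve_basesize(s::Basesize, n::Integer, nt::Integer) = s.n
resolve_basesize(::Even, n::Integer, nt::Integer) = max(1, div(n, nt))
resolve_basesize(s::TimeEstimate, n::Integer, nt::Integer) =
    max(1, round(Int, 1e-3 / s.mean))   # aim for about 1 ms per chunk

resolve_basesize(Basesize(1), 1000, 4)
resolve_basesize(Even(), 1000, 4)
resolve_basesize(TimeEstimate(mean = 1e-5, std = 5e-6), 1000, 4)
```

Dispatching on a small type hierarchy keeps the existing integer basesize behaviour available through Basesize while leaving room to add other strategies later.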
