FastAI.jl currently has some data container functionality that I've found very useful. On the last ML ecosystem call, @darsnack, @ToucheSir, and I discussed that it makes sense to move some of it into MLUtils.jl. The relevant FastAI.jl code can be found here: transformations.jl
Specifically, there are some data container transformations that I believe should be ported:
- `mapobs(f, data)`: a lazy map over any data container. Generally useful.
- `groupobs(f, data)`: returns a `Dict` whose keys are the return values of `f(obs)` and whose values are `datasubset`s of the observations that returned the same `f(obs)`. Not sure if `Dict` is the right type here, but `NamedTuple` is too restrictive. Useful, for example, for creating train/test splits based on some value in each observation.
- `filterobs(f, data)`: does what you'd expect, returning a `datasubset`.
- `joinobs(datas...)`: treats multiple data containers as a single one. Open to a better name for this.
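
A minimal sketch of the semantics described in the list above. This is not the FastAI.jl implementation. It assumes, for illustration only, that a data container is anything supporting `length(data)` and `data[i]`. The wrapper types `MappedData`, `IndexedSubset` (a stand-in for `datasubset`), and `JoinedData` are hypothetical names.

```julia
# Assumption: a "data container" is anything with `length(data)` and `data[i]`.
# All type names below are hypothetical and exist only for this sketch.

# Lazy map: stores `f` and `data`; `f` runs only when an observation is indexed.
struct MappedData{F,D}
    f::F
    data::D
end
Base.length(m::MappedData) = length(m.data)
Base.getindex(m::MappedData, i::Integer) = m.f(m.data[i])
mapobs(f, data) = MappedData(f, data)

# Lazy subset: a view of `data` restricted to `indices` (stand-in for `datasubset`).
struct IndexedSubset{D}
    data::D
    indices::Vector{Int}
end
Base.length(s::IndexedSubset) = length(s.indices)
Base.getindex(s::IndexedSubset, i::Integer) = s.data[s.indices[i]]

# Keep the observations for which `f(obs)` is true.
filterobs(f, data) = IndexedSubset(data, Int[i for i in 1:length(data) if f(data[i])])

# Group observations by the value of `f(obs)`; each group is a lazy subset.
function groupobs(f, data)
    groups = Dict{Any,Vector{Int}}()
    for i in 1:length(data)
        push!(get!(() -> Int[], groups, f(data[i])), i)
    end
    return Dict(k => IndexedSubset(data, idxs) for (k, idxs) in groups)
end

# Treat several containers as one by offsetting the index into each in turn.
struct JoinedData{T<:Tuple}
    datas::T
end
Base.length(j::JoinedData) = sum(length, j.datas)
function Base.getindex(j::JoinedData, i::Integer)
    k = i
    for data in j.datas
        n = length(data)
        k <= n && return data[k]
        k -= n
    end
    throw(BoundsError(j, i))
end
joinobs(datas...) = JoinedData(datas)

# Usage on plain vectors.
data = [1, 2, 3, 4, 5, 6]
doubled = mapobs(x -> 2x, data)
evens = filterobs(iseven, data)
joined = joinobs(data, [7, 8])

# Train/test split based on a value stored in each observation.
samples = [(x = 1, split = :train), (x = 2, split = :test), (x = 3, split = :train)]
splits = groupobs(obs -> obs.split, samples)
```

Because every wrapper only stores indices or a function, nothing is copied or computed until an observation is requested, so the transformations can be composed freely, e.g. `mapobs(f, filterobs(g, data))`.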
There are also some data container primitives for working with tables and files, but let's put that into another issue.