integration with Flux #54

Closed
CarloLucibello opened this issue Feb 27, 2020 · 9 comments

@CarloLucibello

Hi @Evizero @johnnychen94,
I was wondering how to integrate Augmentor into a Flux pipeline. I can think of two options:

  1. Once the DataLoader in FluxML/Flux.jl#1051 is merged, we can extend it with a transforms option that takes an Augmentor pipeline, and have Augmentor as a Flux dependency (a rough sketch is below).
  2. Keep Augmentor as a separate package and just showcase its use in model-zoo examples, e.g. standard CIFAR-10 data augmentation.

I think the first option would make for a simpler and more streamlined user experience. What do you think?
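
A rough, hypothetical sketch of what option 1 could look like (untested; the augmented helper and the exact DataLoader keywords are assumptions, not an existing Flux or Augmentor API, and may change once FluxML/Flux.jl#1051 lands):

using Augmentor, Flux, MLDatasets

X = MNIST.traintensor(Float32)                 # 28×28×60000
y = Flux.onehotbatch(MNIST.trainlabels(), 0:9)

# keep the images 28×28, convert to Float32, and add a channel dimension
pl = ElasticDistortion(6, 6) |> ConvertEltype(Float32) |> Reshape(28, 28, 1)

# DataLoader from FluxML/Flux.jl#1051; exact keywords may differ by Flux version
loader = Flux.Data.DataLoader((X, y), batchsize=32, shuffle=true)

# apply the Augmentor pipeline to each batch as it is drawn
augmented((Xb, yb)) = (augmentbatch!(Array{Float32}(undef, 28, 28, 1, size(Xb, 3)), Xb, pl), yb)
batches = (augmented(b) for b in loader)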

Is the example here
https://github.com/Evizero/Augmentor.jl/blob/master/examples/mnist_knet.jl
still a valid template?

@johnnychen94 (Collaborator)

I am rewriting the docs... subscribe to #52 if you're interested 😄

Briefly speaking, a transforms (and potentially workers) option isn't really necessary in Julia, because we can always lazily augment the data with https://github.com/JuliaArrays/MappedArrays.jl and parallelize it with a simple @threads for loop. I used that in my still-private project and it works well.
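
A minimal sketch of that lazy approach (the names and sizes here are just illustrative, not from any existing project):

using Augmentor, MappedArrays, Base.Threads

imgs = [rand(Float32, 28, 28) for _ in 1:1024]          # stand-in dataset
pl = ElasticDistortion(6, 6) |> ConvertEltype(Float32)

# lazy view: an image is only augmented when it is indexed
lazy_aug = mappedarray(img -> augment(img, pl), imgs)

# materialize one batch in parallel with a plain @threads for loop
batch = Array{Float32}(undef, 28, 28, 64)
@threads for i in 1:64
    batch[:, :, i] = lazy_aug[i]
end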

Speaking of the DataLoader, although it provides behavior similar to PyTorch's and is easy to use, I think MLDataUtils is more flexible and sophisticated for this purpose (cc @oxinabox).

@CarloLucibello (Author)

> Speaking of the DataLoader, although it provides behavior similar to PyTorch's and is easy to use, I think MLDataUtils is more flexible and sophisticated for this purpose (cc @oxinabox).

Yep, it would be nice to use MLDataUtils. Last time I checked, though, the packages there were not actively maintained, and there were a lot of interdependencies that were difficult to disentangle. Also, I was not sure it was tailored to deep learning needs; it seemed more geared towards dataframes. How do I achieve the same functionality as DataLoader with MLDataUtils?

@johnnychen94 (Collaborator) commented Feb 28, 2020

Here's a sample from my project code that uses MLDataUtils; it's not ready or well-organized enough for review...

...
# batch views over the data; observations are along the last dimension (WHCN)
train_X = BatchView(train_X, batch_size, ObsDim.Last())
train_Y = BatchView(train_Y, batch_size, ObsDim.Last())
...

if use_gpu
    model = gpu(model)
    # lazily move each batch to the GPU only when it is indexed
    train_X = mappedarray(CuArray, train_X)
    train_Y = mappedarray(CuArray, train_Y)
end
...

Flux.train!(loss, params(model), zip(train_X, train_Y), opt; cb = evalcb)
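
For reference, a self-contained toy loop built only from MLDataUtils primitives (shuffleobs + eachbatch), which is roughly what a DataLoader gives you; the array sizes are just placeholders:

using MLDataUtils

X = rand(Float32, 28, 28, 1, 256)   # WHCN
Y = rand(Float32, 10, 256)

# shuffle once per epoch, then iterate mini-batches along the last dimension
for (xb, yb) in eachbatch(shuffleobs((X, Y)), size = 32)
    # xb is 28×28×1×32, yb is 10×32 -- ready to feed to a model
end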

@oxinabox commented Feb 28, 2020

I have commented over at FluxML/Flux.jl#1051, which is the more appropriate place to have this conversation.

@barucden (Collaborator) commented Aug 11, 2021

I am trying to produce an example of Augmentor and Flux working together so that we can put it in our docs. This is what I have so far:

using Augmentor, Flux, MappedArrays, MLDatasets, MLDataUtils, Statistics

n_instances = 50
n_epochs = 32
batch_size = 32

X = MNIST.traintensor(Float32, 1:n_instances)
y = Flux.onehotbatch(MNIST.trainlabels(1:n_instances), 0:9)

# augmentation pipeline: elastic distortion, then convert to Float32 and
# reshape each 28×28 image to 28×28×1 (WHC)
pl = ElasticDistortion(6, 6,
                       sigma=4,
                       scale=0.3,
                       iter=3,
                       border=true) |>
     ConvertEltype(Float32) |>
     Reshape(28, 28, 1)

# allocate an output batch and augment a whole batch view at once
outbatch(X) = Array{Float32}(undef, (28, 28, 1, nobs(X)))
augmentbatch((X, y)) = augmentbatch!(outbatch(X), X, pl), y

# lazy: a batch is only augmented when it is indexed during training
batches = mappedarray(augmentbatch,
                      batchview((X, y), maxsize=batch_size))

predict = Chain(Conv((3, 3), 1=>16, pad=(1,1), relu),
                MaxPool((2,2)),
                Conv((3, 3), 16=>32, pad=(1,1), relu),
                MaxPool((2,2)),
                Conv((3, 3), 32=>32, pad=(1,1), relu),
                MaxPool((2,2)),
                flatten,
                Dense(288,10))

loss(X, y) = Flux.Losses.logitcrossentropy(predict(X), y)
loss(batches) = mean(b -> loss(b...), batches)

opt = Flux.Optimise.ADAM(0.001)

@info loss(batches)
@Flux.epochs n_epochs Flux.train!(loss, params(predict), batches, opt)
@info loss(batches)

Do you see any flaws? Should I change something?

@johnnychen94 (Collaborator)

LGTM, but I haven't used Flux for months, so there might be some updates that @CarloLucibello knows about.

For MNIST, you might want to use the indices argument to avoid reading the entire dataset. (Of course, even the entire dataset is pretty small.)

MNIST.traintensor(Float32, 1:10) # (28, 28, 10) Float32 Array
MNIST.trainlabels(1:10) # length 10 vector

@barucden (Collaborator)

Thanks! I updated the example.

@barucden (Collaborator)

Can we close this, considering this example has been added to the docs?

@johnnychen94 (Collaborator)

Closing in favor of #102.
