Refresh Model Zoo #9

Open · 5 of 19 tasks
ToucheSir opened this issue Sep 20, 2020 · 36 comments

Labels: documentation (Improvements or additions to documentation), good first issue (Good for newcomers), help wanted (Looking for contributors)
@ToucheSir
Member

ToucheSir commented Sep 20, 2020

This issue will track long-term progress for the model-zoo. Here are the steps to get there (in order):

  • Test all models: run through all the tutorials to check whether they work on the latest release and document problems
  • Address outstanding external PRs: take care of PRs such as
  • Issue/PR clean-up: close stale issues and PRs
  • Update models: get models working with latest release (no new tutorials!)
  • Add new tutorials: add new tutorials (especially ones that make use of the new ecosystem packages)
  • Trim old tutorials: remove tutorials that are superseded by new tutorials
  • Set up benchmarking CI: convert tutorials to Weave.jl like SciML
  • Translate to benchmarks: convert as many tutorials as possible to use the benchmark model

Original Text

Collecting these issues and PRs here for now. Eventually we may want to split some out for triage.

@ToucheSir ToucheSir added help wanted Looking for contributors triage-request Intended for triage labels Sep 20, 2020
@adinhobl

What is the overall vision for the model zoo? What attributes would make a good zoo model? I'm going to throw out some ideas; let me know if they are what you have in mind:

  • single .jl file + Project.toml: self-contained, no need for a bunch of files, easier setup (just ]instantiate and run the Julia file; see the sketch after this list). Not easy to document what the different pieces are doing without long comments
  • Jupyter notebook: an example like this one, which gives a full tutorial of the ecosystem (I'm working on this right now to learn myself, but it could be useful for other newcomers to see the whole pipeline and the options). However, this is harder to get up and running and hard to build a benchmark pipeline around
  • optimized to be extremely fast: CuIterator and carefully thought-out optimizations, no global scope. Harder for a Flux/Julia newbie to do
  • Everything up to date: keep the examples relevant, identify bugs and holes before other people hit them, actively prevent bitrot
  • What to do about the data? Does every model need to implement its own method for downloading data? Is there a best practice?
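(For illustration, a minimal sketch of the single-file + Project.toml workflow mentioned above; the `mlp/` folder and file names are hypothetical, not an existing zoo example.)

```julia
# Hypothetical layout for a self-contained zoo example:
#   mlp/
#   ├── Project.toml   # declares Flux, MLDatasets, etc. as dependencies
#   ├── Manifest.toml  # optionally pins exact versions
#   └── mlp.jl         # the model script itself
#
# From a REPL started in that directory: `]activate .` then `]instantiate`,
# or equivalently from a script:
using Pkg
Pkg.activate(@__DIR__)                  # use the example's own Project.toml
Pkg.instantiate()                       # install the recorded dependencies
include(joinpath(@__DIR__, "mlp.jl"))   # run the example
```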

I read somewhere that someone thought getting a working model zoo up to best practices would set the stage for growing the ML/DL/Flux ecosystem, because it then becomes easier to benchmark and progressively improve.

@adinhobl

As someone newer to the ecosystem and Julia, it would probably be good to have a list of best practices for implementing a model - a checklist with explanations. It would promote consistency across the model-zoo models and be a good jumping-off point for new people in the community. There's a PR about putting things in global scope, but someone coming from Python won't necessarily know not to do that for their model (a small sketch of the alternative follows below).

I think there is also an opportunity to improve these pages, specifically the performance one:
https://fluxml.ai/Flux.jl/stable/ecosystem/
https://fluxml.ai/Flux.jl/stable/performance/
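(A small sketch of the global-scope point above, assuming a toy MLP; the function name and hyperparameters are made up for illustration.)

```julia
using Flux

# Instead of building the model and training loop at top level (untyped globals),
# wrap the work in a function so everything stays local and type-stable.
function train_mnist(; epochs = 10, lr = 3e-4)
    model = Chain(Dense(28^2, 32, relu), Dense(32, 10))
    opt   = ADAM(lr)
    ps    = Flux.params(model)
    for epoch in 1:epochs
        # ... load a batch (x, y), compute gradients, update `ps` ...
    end
    return model
end

model = train_mnist(epochs = 5)
```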

@adinhobl

adinhobl commented Oct 28, 2020

As for my first post above, I just found Literate.jl, which I think could be a solution: we maintain a single .jl file while still being able to take advantage of the richness of notebooks. If the examples were all written in that style, they could have unit tests and CI while remaining easily convertible to notebooks. Just a thought.
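(A minimal sketch of the Literate.jl workflow being suggested; `mlp.jl` and the output directories are hypothetical.)

```julia
# In a Literate.jl source file, lines starting with `# ` become prose and the
# rest stays executable Julia. A hypothetical mlp.jl might begin:
#
#   # # Training an MLP on MNIST
#   # First we load the packages we need:
#   using Flux, MLDatasets
#
# The same file can then be rendered to a notebook or to Markdown for the docs:
using Literate
Literate.notebook("mlp.jl", "notebooks/")   # writes notebooks/mlp.ipynb
Literate.markdown("mlp.jl", "docs/src/")    # writes docs/src/mlp.md
```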

@darsnack
Member

These are great comments. I'll try and give some insight on some of them.

  • Julia files vs. Jupyter notebooks: I believe Chris has given us some references on how SciML uses Literate.jl (or maybe it was Weave.jl) to generate Julia files + notebooks as well as benchmark with CI. (link)
  • Staying up to date: This is absolutely crucial, and I think that using the zoo for benchmarking will help ensure this.
  • Data management: All zoo models should use MLDatasets.jl (a minimal example follows below), or implement the custom dataset interface.
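(A minimal MLDatasets.jl sketch for the data-management point above; the traindata/testdata accessors reflect the MLDatasets.jl API of this era and may differ in later versions.)

```julia
using MLDatasets

# Downloads MNIST on first use, then loads it from the local data directory.
xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
xtest,  ytest  = MLDatasets.MNIST.testdata(Float32)

size(xtrain)   # (28, 28, 60000)
```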

@ChrisRackauckas
Member

It's all done by https://github.com/SciML/RebuildAction

@darsnack
Member

As someone newer to the ecosystem and Julia, it would probably be good to have a list of best practices when implementing a model - a checklist with explanations.

Totally agree with this. Once the Metalhead.jl PR is resolved, I can write up a section on this in the docs for Flux.jl.

@ToucheSir
Member Author

Re: documenting best practices, there's an old list here that might be a good jumping-off point. I think it's also worth taking a look at the GSOD PRs for overlap and/or opportunities for collaboration.

@adinhobl

Thanks for all the responses. It sounds like there are a handful of lists that could be put together generally on the topics of performance and best practices, and I think putting them in the Flux Docs probably makes sense.

@darsnack, when you said "the custom dataset interface", it sounds like you were referring to something in particular?

It also sounds like I need to familiarize myself with RebuildAction, CI, and the SciML Benchmarks, likely after this semester ends.

@ChrisRackauckas
Member

Feel free to ask for help, and feel free to ping me to join the next ML Fast AI coordination call. I am getting Dhairya involved as well.

@darsnack
Member

@darsnack , when you said "the custom dataset interface" it sounds like you were referring to something in particular?

Yes, since we've adopted MLDataPattern.jl for iterating datasets, any custom dataset needs to implement the getobs interface. Here is a good overview of what that entails.
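(A minimal sketch of what implementing that interface could look like; FileDataset and the one-observation-per-file layout are hypothetical.)

```julia
using LearnBase   # defines the nobs/getobs functions MLDataPattern.jl dispatches on

# Hypothetical dataset: one observation per text file on disk.
struct FileDataset
    paths::Vector{String}
end

LearnBase.nobs(d::FileDataset) = length(d.paths)
LearnBase.getobs(d::FileDataset, i::Int) = read(d.paths[i], String)
LearnBase.getobs(d::FileDataset, idxs::AbstractVector) = [getobs(d, i) for i in idxs]

# With these methods defined, the dataset composes with shuffleobs, splitobs,
# eachbatch, etc. from MLDataPattern.jl.
```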

@darsnack darsnack added this to Needs triage in Triage via automation Nov 24, 2020
@darsnack darsnack removed the triage-request Intended for triage label Nov 24, 2020
@darsnack darsnack moved this from Needs triage to High priority in Triage Nov 24, 2020
@darsnack darsnack changed the title Clean up Model Zoo Refresh Model Zoo Nov 24, 2020
@darsnack darsnack added the documentation Improvements or additions to documentation label Nov 24, 2020
@ToucheSir ToucheSir pinned this issue Nov 24, 2020
@johnnychen94

I wrote a wonderful Documenter plugin, DemoCards.jl, for JuliaImages's demo page. It uses Literate.jl, so you can write in plain Julia files. It maps the folder structure onto the page structure, so making changes is very flexible.

@darsnack darsnack added the good first issue Good for newcomers label Dec 18, 2020
@ghost ghost mentioned this issue Jan 17, 2021
@ghost

ghost commented Jan 17, 2021

I'm willing to put some effort into updating the model zoo. I've already started testing the models, and I'm keeping a list of issues here.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Jan 20, 2021

So we already have the scripts folder, which does the conversion and exists for this exact use case, plus FluxBot.jl.

We only need to go through the models to add the literate prose. This is Literate-based as well, so it should be easy to test things out.

cc @SophB

@DhairyaLGandhi
Member

Another thing that makes the model zoo less useful is that many models recently gained argument handling as part of every model, which distracts from what the scripts are meant to show. Removing that and using simpler constructs that point at exactly the aspect being demonstrated would make a whole lot of difference.

@ghost

ghost commented Jan 20, 2021

Yes, I noticed this while testing the models; some of them have taken a bit too much of a kitchen-sink approach.

IMHO the model zoo should find the right balance between three things:

  1. Impress with what can be done with Flux
  2. Guide people trying to learn Flux
  3. Benchmark the Flux package

My guess is that highlighting just one or maybe two features or ecosystem packages per model, and using each of those features or packages in only one or two models, will result in a good balance. Overdoing it will only distract from the great things that can be done with Flux, make the tutorials harder to grasp, and lengthen the benchmark run times.

Some examples:

  • Use the dcgan model to highlight a more complex custom training loop, and try to use a vanilla training loop in all the other models.
  • The language detection model can be used to highlight a custom dataset, and all the others can use MLDatasets.
  • The VGG CIFAR10 model could be the only one using custom logging with TensorBoardLogger.jl (see the sketch below).
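(A minimal TensorBoardLogger.jl sketch for the logging idea above; the log directory and the logged values are placeholders.)

```julia
using TensorBoardLogger, Logging

lg = TBLogger("tensorboard_logs/vgg_cifar10")   # hypothetical log directory

with_logger(lg) do
    for epoch in 1:10
        train_loss = 1.0 / epoch        # placeholder values for illustration
        test_acc   = 1.0 - 0.5 / epoch
        # Each @info call is recorded as scalar summaries under "train"/"test".
        @info "train" loss = train_loss
        @info "test"  accuracy = test_acc
    end
end
```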

@darsnack
Member

IMHO the model zoo should find the right balance between three things:

Impress with what can be done with Flux
Guide people trying to learn Flux
Benchmark the Flux package

I think if we use @ChrisRackauckas's approach with SciML, we should be able to write Literate.jl scripts for all our model zoo examples. And I agree that some examples should throw out some of the kitchen sink and focus on the essence of what that example is trying to teach.

Right now, the problem is that all the examples exist to be consumed in script form. Instead, if they were written in Literate.jl, we could have a Publish.jl website for the entire zoo, with the pages of that site being the tutorials. I think just that change of writing for a different audience makes a difference in the produced result.

@darsnack
Member

We are working on establishing a "Flux" Publish theme for all the ecosystem packages. I can put together a sample PR where I translate a couple of the zoo examples and showcase the website.

@DhairyaLGandhi
Member

I have been saying for a while that if we can get the literate prose into the model scripts, converting them to push to the site is trivial.

We are working on establishing a "Flux" Publish theme for all the ecosystem packages. I can put together a sample PR where I translate a couple of the zoo examples and showcase the website.

A PR would be welcome. Have you seen the tutorial pages on the site already?

@DhairyaLGandhi
Member

we should be able to write Literate.jl scripts for all our model zoo examples.

They already are; check the scripts directory in the model zoo.

Also check the dg/zygote branch for how the zoo used to look, without the kitchen-sink approach.

@darsnack
Member

darsnack commented Jan 20, 2021

Ah, I didn't realize that the model-zoo was already capable of feeding the tutorials on the website. Does this happen automatically on release?

I'm thinking of something like the tutorials page on the website, but automatically updated by CI on every "release" of the model zoo. That way everything in the zoo is consumed as either a tutorial on the website, a runnable script you can download, or a script that the benchmarking CI can run.

In the setup I was describing, the model zoo would have its own website that hosted all these tutorials. Is there a way to tie that to the Flux website? I don't think GH Actions can trigger events on other repos?

@DhairyaLGandhi
Member

DhairyaLGandhi commented Jan 20, 2021

I'm thinking of something like the tutorials page on the website, but it is automatically updated by CI on every "release" of the model zoo

So the idea is exactly that, and we have a working example of that tied with FluxBot.jl, plus RebuildAction would allow for benchmarking. We are already setting up a benchmarking suite for GPU performance.

the model zoo would have its own website that hosted all these tutorials. Is there a way to tie that to the Flux website

We should have that happen as part of the Flux website, for sure.

@DhairyaLGandhi
Member

Could we add an item to the tracker to move the site off Jekyll? Maybe Publish.jl, PkgPage.jl, or Franklin.jl?

@darsnack
Member

Yeah moving off Jekyll to a Julia-based static website generator would make this all easier for sure.

@CarloLucibello
Member

Can we replace every occurrence of train! in the model-zoo with a custom loop? The discussion in FluxML/Flux.jl#1461 is not converging, and maybe this is something we could all agree on.

@DhairyaLGandhi
Member

replace any occurrence of train! in the model-zoo with a custom loop?

I don't think that's a great idea, maybe for a model which specifically intends to show the loop or something that benefits from it.

Discussion in FluxML/Flux.jl#1461 is not converging, and maybe this is something we could all agree on

I'm not sure what you mean here? There is ongoing and active discussion, and it's best to let it play out properly.

@CarloLucibello
Member

I don't think that's a great idea, maybe for a model which specifically intends to show the loop or something that benefits from it.

I suggest the other way around: we give a single example of train!

I'm not sure what you mean here? There is ongoing and active discussion and best to have it properly

It's surely good to have the discussion; I'm just saying it is not converging and could go on for months. My suggestion is to reverse how things currently stand: primarily point users to the pattern that is more informative and more flexible, and only secondarily to train!.

@DhairyaLGandhi
Member

Let me rephrase this. The pattern that we have established with Flux.jl is what is represented in the examples, and we shouldn't change things only to change them back; we don't have a clear answer yet, and that will come once the discussion converges.

@CarloLucibello
Member

CarloLucibello commented Jan 23, 2021

There is no need to change anything back once you have custom loops.

@darsnack
Member

If most of the examples use train!, and we converge on removing it or downplaying it, then we taught a bunch of users to use a non-preferred API.

The for-loop will always be a first class API, so you can't go wrong by using custom loops everywhere.
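(For context, a minimal sketch of the two patterns under discussion, using the Flux API of the time; the toy model and random data are only for illustration.)

```julia
using Flux

model = Chain(Dense(2, 16, relu), Dense(16, 1))
loss(x, y) = Flux.mse(model(x), y)
ps   = Flux.params(model)
opt  = ADAM()
data = [(rand(Float32, 2, 32), rand(Float32, 1, 32)) for _ in 1:100]

# Pattern 1: the train! convenience wrapper.
Flux.train!(loss, ps, data, opt)

# Pattern 2: the equivalent explicit for loop.
for (x, y) in data
    gs = gradient(() -> loss(x, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```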

@DhairyaLGandhi
Member

Not really; the for loop is under the same considerations and open to the same changes as any other API Flux has had or will have. I prefer the for loop to train! personally, but it isn't cogent to push it while the question is still unsettled.

If the examples are better served by not needing a for loop, then we did the right thing by teaching users to look for similar constructs in other packages that provide loops for more complex cases, and also by educating them about how the loops work through examples and docs that are meant to teach the for-loop API.

API design shouldn't prefer one set of assumptions for every case, but offer options.

I don't want everyone copying the same for loop everywhere; the average case does not need any more complication. If there are average cases not well served, improve the API to catch those cases.

@CarloLucibello
Member

I don't want everyone copying the same for loop everywhere, the average case does not need any more complication. If there are average cases not well served, improve the api to catch that case.

What I want is exactly for people to copy the for loop everywhere; they don't need the complication of train!. We have a very good API, it's called the for loop: everyone is already familiar with it, it serves the average case very well, and we don't need to improve anything.

@darsnack
Member

I updated the main post to reflect @joostveenema's progress (correct me if I got anything wrong).

I don't think we need to block on the JuliaML PRs, since the currently available solutions for data loading work well. There is no reason to halt meaningful progress on an important model-zoo refresh for a future data-loading update (I don't think any of the JuliaML API changes will surface in the model-zoo in a meaningful way anyway).

I kept the Metalhead.jl PR in there, since I do think any pre-trained models in the zoo should use the future-facing version of Metalhead.jl. I plan on effectively automating the process of training and committing the pre-trained models today. I will update the Metalhead.jl PR when I am ready.

@ChrisRackauckas
Member

what I exactly want is for people to copy everywhere the for loop, they don't need the complication of train!.

If everyone has to copy the same piece of code around, then the abstraction is wrong and we should change it. Maybe callbacks need to support more things, or there could be a few more switches. But telling everyone to roll it out by hand is only useful to exactly the devs who are building the library.

The for-loop will always be a first class API, so you can't go wrong by using custom loops everywhere.

Indeed, but full flexibility is itself limiting. There will always be some optimizers that require less flexibility (BFGS, KrylovTrustRegion), so fully promoting the most open choice in a functional sense is also limiting in another sense. The path forward is to try to tame what can be done, capture what users do in a simplified API, and then specialize and help the code perform better, not let it run too loose. Isn't that the point of Flux in the first place, since you can just define the layers by hand?

@ghost

ghost commented Jan 25, 2021

@darsnack I did start marking issues as stale but wasn't done yet. I'll spend some more time on it this week.

@aditkumar72

aditkumar72 commented Jun 16, 2021

I have added and updated the READMEs of all the vision models in FluxML/model-zoo#305, which fixes #218.

@aditkumar72

aditkumar72 commented Jun 26, 2021

I have made all the changes in FluxML/model-zoo#305; can anyone please review?
