Refresh Model Zoo #9

Open · 5 of 19 tasks
ToucheSir opened this issue Sep 20, 2020 · 36 comments

Labels: documentation (Improvements or additions to documentation), good first issue (Good for newcomers), help wanted (Looking for contributors)
@ToucheSir
Member

ToucheSir commented Sep 20, 2020

This issue will track long-term progress for the model-zoo. Here are the steps to get there (in order):

  • Test all models: run through all the tutorials to check whether they work on the latest release and document problems
  • Address outstanding external PRs: take care of PRs such as
  • Issue/PR clean-up: close stale issues and PRs
  • Update models: get models working with latest release (no new tutorials!)
  • Add new tutorials: add new tutorials (especially ones that make use of the new ecosystem packages)
  • Trim old tutorials: remove tutorials that are superseded by new tutorials
  • Set up benchmarking CI: convert tutorials to Weave.jl like SciML
  • Translate to benchmarks: convert as many tutorials as possible to use the benchmark model

Original Text

Collecting these issues and PRs here for now. Eventually we may want to split some out for triage.

@ToucheSir ToucheSir added help wanted Looking for contributors triage-request Intended for triage labels Sep 20, 2020
@adinhobl

What is the overall vision for the model zoo? What attributes would make a good zoo model? I'm going to throw out some ideas; let me know if they are what you have in mind:

  • single .jl file + Project.toml: self-contained, no need for a bunch of files, easier setup (just ]instantiate and run the Julia file; see the sketch after this list). Not easy to document what the different pieces are doing without long comments
  • Jupyter notebook: an example like this one, which gives a full tutorial of the ecosystem (I'm working on this right now to learn myself, but it could be useful for other newcomers to see the whole pipeline and the options). However, this is harder to get up and running and hard to build a benchmark pipeline around
  • optimized to be extremely fast: CuIterator and carefully thought-out optimizations, no global scope. Harder for a Flux/Julia newbie to do
  • Everything up to date: keep the examples relevant, identify bugs and holes before other people hit them, actively prevent bitrot
  • What to do about the data? Does every model need to implement its own method for downloading data? Is there a best practice?
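(For illustration, a minimal sketch of the single-file + Project.toml workflow mentioned above; the `mlp/` folder and file names are hypothetical, not an existing zoo example.)

```julia
# Hypothetical layout for a self-contained zoo example:
#   mlp/
#   ├── Project.toml   # declares Flux, MLDatasets, etc. as dependencies
#   ├── Manifest.toml  # optionally pins exact versions
#   └── mlp.jl         # the model script itself
#
# From a REPL started in that directory: `]activate .` then `]instantiate`,
# or equivalently from a script:
using Pkg
Pkg.activate(@__DIR__)                  # use the example's own Project.toml
Pkg.instantiate()                       # install the recorded dependencies
include(joinpath(@__DIR__, "mlp.jl"))   # run the example
```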

I read somewhere that someone thought getting a working model zoo up to best practices would set the stage for growing the ML/DL/Flux ecosystem, because it then becomes easier to benchmark and progressively improve.

@adinhobl

As someone newer to the ecosystem and Julia, it would probably be good to have a list of best practices for implementing a model - a checklist with explanations. It would promote consistency across the model-zoo models and be a good jumping-off point for new people in the community. There's a PR about putting things in global scope, but someone coming from Python won't necessarily know not to do that for their model (a small sketch of the alternative follows below).

I think there is also an opportunity to improve these pages, specifically the performance one:
https://fluxml.ai/Flux.jl/stable/ecosystem/
https://fluxml.ai/Flux.jl/stable/performance/
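(A small sketch of the global-scope point above, assuming a toy MLP; the function name and hyperparameters are made up for illustration.)

```julia
using Flux

# Instead of building the model and training loop at top level (untyped globals),
# wrap the work in a function so everything stays local and type-stable.
function train_mnist(; epochs = 10, lr = 3e-4)
    model = Chain(Dense(28^2, 32, relu), Dense(32, 10))
    opt   = ADAM(lr)
    ps    = Flux.params(model)
    for epoch in 1:epochs
        # ... load a batch (x, y), compute gradients, update `ps` ...
    end
    return model
end

model = train_mnist(epochs = 5)
```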

@adinhobl

adinhobl commented Oct 28, 2020

As for my first post above, I just found Literate.jl, which I think could be a solution: we maintain a single .jl file while still being able to take advantage of the richness of notebooks. If the examples were all written in that style, they could have unit tests and CI while remaining easily convertible to notebooks. Just a thought.
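(A minimal sketch of the Literate.jl workflow being suggested; `mlp.jl` and the output directories are hypothetical.)

```julia
# In a Literate.jl source file, lines starting with `# ` become prose and the
# rest stays executable Julia. A hypothetical mlp.jl might begin:
#
#   # # Training an MLP on MNIST
#   # First we load the packages we need:
#   using Flux, MLDatasets
#
# The same file can then be rendered to a notebook or to Markdown for the docs:
using Literate
Literate.notebook("mlp.jl", "notebooks/")   # writes notebooks/mlp.ipynb
Literate.markdown("mlp.jl", "docs/src/")    # writes docs/src/mlp.md
```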

@darsnack
Member

These are great comments. I'll try and give some insight on some of them.

  • Julia files vs. Jupyter notebooks: I believe Chris has given us some references on how SciML uses Literate.jl (or maybe it was Weave.jl) to generate Julia files + notebooks as well as benchmark with CI. (link)
  • Staying up to date: This is absolutely crucial, and I think that using the zoo for benchmarking will help ensure this.
  • Data management: All zoo models should use MLDatasets.jl (a minimal example follows below), or implement the custom dataset interface.
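(A minimal MLDatasets.jl sketch for the data-management point above; the traindata/testdata accessors reflect the MLDatasets.jl API of this era and may differ in later versions.)

```julia
using MLDatasets

# Downloads MNIST on first use, then loads it from the local data directory.
xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
xtest,  ytest  = MLDatasets.MNIST.testdata(Float32)

size(xtrain)   # (28, 28, 60000)
```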

@ChrisRackauckas
Member

It's all done by https://github.com/SciML/RebuildAction

@darsnack
Member

As someone newer to the ecosystem and Julia, it would probably be good to have a list of best practices when implementing a model - a checklist with explanations.

Totally agree with this. Once the Metalhead.jl PR is resolved, I can write up a section on this in the docs for Flux.jl.

@ToucheSir
Member Author

Re: documenting best practices, there's an old list here that might be a good jumping-off point. I think it's also worth taking a look at the GSOD PRs for overlap and/or opportunities for collaboration.

@adinhobl

Thanks for all the responses. It sounds like there are a handful of lists that could be put together generally on the topics of performance and best practices, and I think putting them in the Flux Docs probably makes sense.

@darsnack, when you said "the custom dataset interface", it sounds like you were referring to something in particular?

It also sounds like I need to familiarize myself with RebuildAction, CI, and the SciML Benchmarks, likely after this semester ends.

@ChrisRackauckas
Member

Feel free to ask for help, and feel free to ping me to join the next ML Fast AI coordination call. I am getting Dhairya involved as well.

@darsnack
Member

@darsnack , when you said "the custom dataset interface" it sounds like you were referring to something in particular?

Yes, since we've adopted MLDataPattern.jl for iterating datasets, any custom dataset needs to implement the getobs interface. Here is a good overview of what that entails.
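(A minimal sketch of what implementing that interface could look like; FileDataset and the one-observation-per-file layout are hypothetical.)

```julia
using LearnBase   # defines the nobs/getobs functions MLDataPattern.jl dispatches on

# Hypothetical dataset: one observation per text file on disk.
struct FileDataset
    paths::Vector{String}
end

LearnBase.nobs(d::FileDataset) = length(d.paths)
LearnBase.getobs(d::FileDataset, i::Int) = read(d.paths[i], String)
LearnBase.getobs(d::FileDataset, idxs::AbstractVector) = [getobs(d, i) for i in idxs]

# With these methods defined, the dataset composes with shuffleobs, splitobs,
# eachbatch, etc. from MLDataPattern.jl.
```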

@darsnack darsnack added this to Needs triage in Triage via automation Nov 24, 2020
@darsnack darsnack removed the triage-request Intended for triage label Nov 24, 2020
@darsnack darsnack moved this from Needs triage to High priority in Triage Nov 24, 2020
@darsnack darsnack changed the title Clean up Model Zoo Refresh Model Zoo Nov 24, 2020
@darsnack darsnack added the documentation Improvements or additions to documentation label Nov 24, 2020
@ToucheSir ToucheSir pinned this issue Nov 24, 2020
@johnnychen94

I wrote a wonderful Documenter plugin, DemoCards.jl, for JuliaImages's demo page. It uses Literate.jl, so you can write in plain Julia files. It maps the folder structure onto the page structure, so making changes is very flexible.

@darsnack darsnack added the good first issue Good for newcomers label Dec 18, 2020
@ghost ghost mentioned this issue Jan 17, 2021
@ghost

ghost commented Jan 17, 2021

I'm willing to put some effort into updating the model zoo. I've already started testing the models, and I'm keeping a list of issues here.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Jan 20, 2021

So we already have the scripts folder, which does the conversion and exists for this exact use case, plus FluxBot.jl.

We only need to go through the models to add the literate prose. This is Literate-based as well, so it should be easy to test things out.

cc @SophB

@DhairyaLGandhi
Member

Another thing that makes the model zoo less useful is that many models recently gained argument handling as part of every model, which distracts from what the scripts are meant to show. Removing that and using simpler constructs that point at exactly the aspect being demonstrated would make a whole lot of difference.

@ghost

ghost commented Jan 20, 2021

Yes, I noticed this while testing the models; some of them have taken a bit too much of a kitchen-sink approach.

IMHO the model zoo should find the right balance between three things:

  1. Impress with what can be done with Flux
  2. Guide people trying to learn Flux
  3. Benchmark the Flux package

My guess is that highlighting just one or maybe two features or ecosystem packages per model, and using each of those features or packages in only one or two models, will result in a good balance. Overdoing it will only distract from the great things that can be done with Flux, make the tutorials harder to grasp, and lengthen the benchmark run times.

Some examples:

  • Use the dcgan model to highlight a more complex custom training loop, and try to use a vanilla training loop in all the other models.
  • The language detection model can be used to highlight a custom dataset, and all the others can use MLDatasets.
  • The VGG CIFAR10 model could be the only one using custom logging with TensorBoardLogger.jl (see the sketch below).
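(A minimal TensorBoardLogger.jl sketch for the logging idea above; the log directory and the logged values are placeholders.)

```julia
using TensorBoardLogger, Logging

lg = TBLogger("tensorboard_logs/vgg_cifar10")   # hypothetical log directory

with_logger(lg) do
    for epoch in 1:10
        train_loss = 1.0 / epoch        # placeholder values for illustration
        test_acc   = 1.0 - 0.5 / epoch
        # Each @info call is recorded as scalar summaries under "train"/"test".
        @info "train" loss = train_loss
        @info "test"  accuracy = test_acc
    end
end
```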

@darsnack
Member

IMHO the model zoo should find the right balance between three things:

Impress with what can be done with Flux
Guide people trying to learn Flux
Benchmark the Flux package

I think if we use @ChrisRackauckas's approach with SciML, we should be able to write Literate.jl scripts for all our model zoo examples. And I agree that some examples should throw out some of the kitchen sink and focus on the essence of what that example is trying to teach.

Right now, the problem is that all the examples exist to be consumed in script form. Instead, if they were written in Literate.jl, we could have a Publish.jl website for the entire zoo, with the pages of that site being the tutorials. I think just that change of writing for a different audience makes a difference in the produced result.

@darsnack
Member

We are working on establishing a "Flux" Publish theme for all the ecosystem packages. I can put together a sample PR where I translate a couple of the zoo examples and showcase the website.

@DhairyaLGandhi
Member

I have been saying for a while that if we can get the literate prose into the model scripts, converting them to push to the site is trivial.

We are working on establishing a "Flux" Publish theme for all the ecosystem packages. I can put together a sample PR where I translate a couple of the zoo examples and showcase the website.

A PR would be welcome. Have you seen the tutorial pages on the site already?

@DhairyaLGandhi
Member

we should be able to write Literate.jl scripts for all our model zoo examples.

They already are; check the scripts directory in the model zoo.

Also check the dg/zygote branch for how the zoo used to look, without the kitchen-sink approach.

@darsnack
Member

darsnack commented Jan 20, 2021

Ah, I didn't realize that the model-zoo was already capable of feeding the tutorials on the website. Does this happen automatically on release?

I'm thinking of something like the tutorials page on the website, but automatically updated by CI on every "release" of the model zoo. That way everything in the zoo is consumed as either a tutorial on the website, a runnable script you can download, or a script that the benchmarking CI can run.

In the setup I was describing, the model zoo would have its own website that hosted all these tutorials. Is there a way to tie that to the Flux website? I don't think GH Actions can trigger events on other repos?

@DhairyaLGandhi
Member

DhairyaLGandhi commented Jan 20, 2021

I'm thinking of something like the tutorials page on the website, but it is automatically updated by CI on every "release" of the model zoo

So the idea is exactly that, and we have a working example of that tied with FluxBot.jl, plus RebuildAction would allow for benchmarking. We are already setting up a benchmarking suite for GPU performance.

the model zoo would have its own website that hosted all these tutorials. Is there a way to tie that to the Flux website

We should have that happen as part of the Flux website, for sure.

@DhairyaLGandhi
Member

Could we add an item to the tracker to move the site off Jekyll? Maybe Publish.jl, PkgPage.jl, or Franklin.jl?

@darsnack
Member

Yeah moving off Jekyll to a Julia-based static website generator would make this all easier for sure.

@CarloLucibello
Member

Can we replace every occurrence of train! in the model-zoo with a custom loop? The discussion in FluxML/Flux.jl#1461 is not converging, and maybe this is something we could all agree on.

@DhairyaLGandhi
Member

replace any occurrence of train! in the model-zoo with a custom loop?

I don't think that's a great idea, maybe for a model which specifically intends to show the loop or something that benefits from it.

Discussion in FluxML/Flux.jl#1461 is not converging, and maybe this is something we could all agree on

I'm not sure what you mean here? There is ongoing and active discussion, and it's best to let it play out properly.

@CarloLucibello
Member

I don't think that's a great idea, maybe for a model which specifically intends to show the loop or something that benefits from it.

I suggest the other way around: we give a single example of train!

I'm not sure what you mean here? There is ongoing and active discussion and best to have it properly

It's surely good to have the discussion; I'm just saying it is not converging and could go on for months. My suggestion is to reverse how things currently stand: primarily point users to the pattern that is more informative and more flexible, and only secondarily to train!.

@DhairyaLGandhi
Member

Let me rephrase this. The pattern that we have established with Flux.jl is what is represented in the examples, and we shouldn't change things only to change them back; we don't have a clear answer yet, and that will come once the discussion converges.

@CarloLucibello
Member

CarloLucibello commented Jan 23, 2021

There is no need to change anything back once you have custom loops.

@darsnack
Member

If most of the examples use train!, and we converge on removing it or downplaying it, then we taught a bunch of users to use a non-preferred API.

The for-loop will always be a first class API, so you can't go wrong by using custom loops everywhere.
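(For context, a minimal sketch of the two patterns under discussion, using the Flux API of the time; the toy model and random data are only for illustration.)

```julia
using Flux

model = Chain(Dense(2, 16, relu), Dense(16, 1))
loss(x, y) = Flux.mse(model(x), y)
ps   = Flux.params(model)
opt  = ADAM()
data = [(rand(Float32, 2, 32), rand(Float32, 1, 32)) for _ in 1:100]

# Pattern 1: the train! convenience wrapper.
Flux.train!(loss, ps, data, opt)

# Pattern 2: the equivalent explicit for loop.
for (x, y) in data
    gs = gradient(() -> loss(x, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```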

@DhairyaLGandhi
Member

Not really; the for loop is under the same considerations and open to the same changes as any other API Flux has had or will have. I prefer the for loop to train! personally, but it isn't cogent to push it while the question is still unsettled.

If the examples are better served by not needing a for loop, then we did the right thing by teaching users to look for similar constructs in other packages that provide loops for more complex cases, and also by educating them about how the loops work through examples and docs that are meant to teach the for-loop API.

API design shouldn't prefer one set of assumptions for every case, but offer options.

I don't want everyone copying the same for loop everywhere; the average case does not need any more complication. If there are average cases not well served, improve the API to catch those cases.

@CarloLucibello
Member

I don't want everyone copying the same for loop everywhere, the average case does not need any more complication. If there are average cases not well served, improve the api to catch that case.

What I want is exactly for people to copy the for loop everywhere; they don't need the complication of train!. We have a very good API, it's called the for loop: everyone is already familiar with it, it serves the average case very well, and we don't need to improve anything.

@darsnack
Member

I updated the main post to reflect @joostveenema's progress (correct me if I got anything wrong).

I don't think we need to block on the JuliaML PRs, since the currently available solutions for data loading work well. There is no reason to halt meaningful progress on an important model-zoo refresh for a future data-loading update (I don't think any of the JuliaML API changes will surface in the model-zoo in a meaningful way anyway).

I kept the Metalhead.jl PR in there, since I do think any pre-trained models in the zoo should use the future-facing version of Metalhead.jl. I plan on effectively automating the process of training and committing the pre-trained models today. I will update the Metalhead.jl PR when I am ready.

@ChrisRackauckas
Member

what I exactly want is for people to copy everywhere the for loop, they don't need the complication of train!.

If everyone has to copy the same piece of code around, then the abstraction is wrong and we should change it. Maybe callbacks need to support more things, or there could be a few more switches. But telling everyone to roll it out by hand is only useful to exactly the devs who are building the library.

The for-loop will always be a first class API, so you can't go wrong by using custom loops everywhere.

Indeed, but full flexibility is itself limiting. There will always be some optimizers that require less flexibility (BFGS, KrylovTrustRegion), so fully promoting the most open choice in a functional sense is also limiting in another sense. The path forward is to try to tame what can be done, capture what users do in a simplified API, and then specialize and help the code perform better, not let it run too loose. Isn't that the point of Flux in the first place, since you can just define the layers by hand?

@ghost

ghost commented Jan 25, 2021

@darsnack I did start marking issues as stale but wasn't done yet. I'll spend some more time on it this week.

@aditkumar72

aditkumar72 commented Jun 16, 2021

I have added and updated the READMEs of all the vision models in FluxML/model-zoo#305, which fixes #218.

@aditkumar72

aditkumar72 commented Jun 26, 2021

I have made all the changes in FluxML/model-zoo#305; can anyone please review?
