Documentation plans #204

Open
lorenzoh opened this issue Mar 17, 2022 · 3 comments
Labels
documentation (Improvements or additions to documentation), plans (Long-term plans)

Comments

@lorenzoh
Member

lorenzoh commented Mar 17, 2022

With the new Pollen.jl frontend having been adopted in #203, I am taking the opportunity to think about changes and additions to the documentation.

The term Reader refers to someone who is reading the documentation.
I'll also be referencing the terms Tutorial, How-To, Reference, and Background, so if you're not familiar with this system for organizing documentation, please read https://diataxis.fr/.

Structural changes

Domain documentation

With #240 making domain-specific functionality subpackages, FastAI.jl has moved toward a one core, multiple domain extensions design. I think this is also beneficial for Readers who consult the docs for help with a problem in some domain they want to solve.

Each domain (e.g. computer vision, tabular, time series, text) will have its own page group in the docs menu, which should include the following pages:

  • Overview: gives a short background of the topic, links to related tutorials and also gives a short reference of learning tasks (e.g. TabularRegression), of the kinds of data it deals with (i.e. Blocks) and relevant data processing steps (i.e. Encodings) for those blocks.
  • Beginner Tutorial: for every domain, there should be at least one Tutorial that guides the Reader through a simple use case (e.g. single-label image classification). It should use the high-level interface (loaddataset, ready-made learning task, tasklearner) and link frequently to other pages with more detailed Reference and Background information.
  • Reference: An overview of the API of the domain (sub)module. Each exported symbol should have a comprehensive docstring that gives a short description, explains required and optional arguments, and includes an Examples section that briefly covers some use cases in a How-To fashion.
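
To make the docstring requirement concrete, here is a minimal sketch of the shape such a docstring could take. `scale_features` is a hypothetical function invented purely for illustration and is not part of FastAI.jl. The section names ("Arguments", "Examples") are an assumption about layout, not an agreed convention.

```julia
using Statistics  # standard library; provides `mean` and `std`

"""
    scale_features(xs; center = true)

Rescale `xs` so that it has unit standard deviation.

## Arguments

- `xs`: vector of real numbers to rescale (required)
- `center = true`: whether to subtract the mean before scaling (optional)

## Examples

Scale a vector without centering it first:

    scale_features([1.0, 2.0, 3.0]; center = false)
"""
function scale_features(xs::AbstractVector{<:Real}; center::Bool = true)
    # Hypothetical helper: optionally remove the mean, then divide by the
    # sample standard deviation of the original values.
    shifted = center ? xs .- mean(xs) : xs
    return shifted ./ std(xs)
end
```

Each entry under "Examples" reads like a tiny How-To: one sentence stating the goal, followed by the call that achieves it.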

Documentation for a domain module may also contain

  • more Tutorials: tutorials for intermediate and advanced use cases in the domain are a great way for Readers to engage with the library and possibly learn something about the domain as well
  • How-Tos: these should tell you how to perform common tasks, e.g. using augmentations in computer vision tasks.
  • Background: this can be used to explain topics related to the domain, design choices made when implementing the library and other topics that don't fit into the other categories.
  • Task pages: these can go into more detail about a specific learning task. Each should start with a 5-10 line end-to-end example, and then walk down the ladder of abstraction, showing the kinds of data and encodings being used.

General documentation

Alongside the domain-specific docs, the domain-agnostic parts of FastAI.jl, like concepts, interfaces, training, and data handling, should be documented.
Good examples from domain submodules should be used in tutorials and how-tos to put explanations into context.

Additions

APIs overview

FastAI.jl has many API layers that build on top of each other. A page that summarizes them in a neat diagram would be nice.

API tour

As a more interactive tour through the API and how pieces relate, I have long been thinking of something organized as follows: the tour starts with a high-level, 5-line example (as in the README), and gives some context for what is happening. Then, you can "drill down" into each of the lines and it'll give you the extended version using APIs one layer below. Consider the following high-level example:

data, blocks = loaddataset("imagenette2-320", (Image, Label))
task = ImageClassificationSingle(blocks)
learner = tasklearner(task, data)
fitonecycle!(learner)
plotpredictions(task, learner)

We could then drill down on each line, e.g. the first would take us to the following, expanded code:

path = datasetpath("imagenette2-320")
data = inputs, targets = Datasets.loadfolderdata(
    path,
    filterfn=isimagefile,
    loadfn=(loadfile, parentname))
classes = unique(eachobs(targets))
blocks = (Image{2}(), Label(classes))

We could again drill down on relevant lines, demystifying the API at every step, showing the Reader how they could use their custom components and linking to relevant material everywhere.
For some more examples of "drilling down" from high-level one-liners, see this older post under the heading "API flexibility".

Extending

Every interface that is extensible should have documentation describing how to do so. Since most interfaces belong to the core FastAI.jl (i.e. not a domain library), this should be part of the general documentation.

  • Reference for how to implement the interface. This is best put under an "Extending" section in the abstract type's docstring, which should give an overview and link to necessary functions to implement. Each of these functions should have a more detailed "Extending" section.
  • Where possible, testing utilities like test_encoding that perform automated checks on the interface's invariants should be provided
  • Examples of extending an interface can also be featured in How-Tos or tutorials
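
As a rough sketch of how these pieces could fit together, the following self-contained example pairs an "Extending" section in an abstract type's docstring with a small testing utility that checks an interface invariant. All names here (`AbstractTransform`, `apply`, `invert`, `test_transform`, `Shift`) are hypothetical and chosen only for illustration; they are not FastAI.jl APIs, and `test_transform` is merely modeled on the idea behind `test_encoding`.

```julia
using Test  # standard library

"""
    abstract type AbstractTransform

Supertype for invertible transformations of an observation.

## Extending

To implement a new transform `T <: AbstractTransform`, define:

- [`apply`](@ref)`(t::T, x)`
- [`invert`](@ref)`(t::T, y)`

Check the implementation with [`test_transform`](@ref).
"""
abstract type AbstractTransform end

"""
    apply(t, x)

Apply transform `t` to observation `x`.

## Extending

Must return a `y` such that `invert(t, y) ≈ x`.
"""
function apply end

"""
    invert(t, y)

Undo transform `t` on the transformed observation `y`.
"""
function invert end

"""
    test_transform(t, x)

Check the round-trip invariant of the interface for `t` on a sample `x`.
"""
function test_transform(t::AbstractTransform, x)
    @testset "$(typeof(t)) interface" begin
        y = apply(t, x)
        @test invert(t, y) ≈ x
    end
end

# A hypothetical extension written by following the docstrings above.
struct Shift <: AbstractTransform
    offset::Float64
end
apply(t::Shift, x) = x .+ t.offset
invert(t::Shift, y) = y .- t.offset

test_transform(Shift(2.0), [1.0, 2.0, 3.0])
```

Keeping the invariant check in a reusable function means that anyone extending the interface can verify their implementation with a single call.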

Contributing

To make it easier to contribute and to decrease maintainer burden, a contributing section should be part of the docs. It should clarify the following topics:

  • Community standards and contribution process, e.g. ColPrac
  • coding style guidelines
  • how to implement interfaces
  • how the code is organized, especially that of domain submodules
  • how tests are written using InlineTest.jl and ReTest.jl
  • how to add documentation and run the docs interactively
  • PR template/checklist

Other content

(This is copied from FluxML/FluxML-Community-Call-Minutes#35)

  • Tutorials
    • FastAI.jl for fast.ai users: Multi-part tutorial series to help fast.ai users get started with FastAI.jl
      • (Part 1) Julia Basics: Syntax basics, array programming
      • (Part 2) Flux.jl vs. PyTorch: Differences between the frameworks, code comparisons for building a model
      • (Part 3) FastAI.jl vs. fast.ai: Differences shown by comparing the code for a basic finetuning task. Pointers to more resources.
    • Using parts of the API separately: Explains how FastAI.jl is built on many decoupled packages and that you don't have to use all of them. For example, showing how to use the LearningMethod machinery with a regular Flux.jl training loop and, inversely, using a Learner but with a custom data iterator and no learning method.
    • Serving predictions on a web server: Reuses the trained model from the serialization tutorial and shows how to package it into a small HTTP server that can be used to get predictions.
    • Implementing callbacks: Go from using callbacks to implementing your own callbacks, and explore how several existing callbacks are implemented. (Basic version here)
    • Siamese image similarity: Showcase different parts of FastAI.jl's APIs to implement an image similarity learning task (original fast.ai tutorial), FastAI.jl#31
    • Progressive resizing: Explain the method and implement it by building on the presizing tutorial. Train a vision model using it.
    • Transfer learning: Explain transfer learning, backbones, pretrained models and the techniques used to successfully finetune them.
  • How-to
    • Implement callbacks: Checklist for implementing callbacks.
    • Evaluate models: Measuring the performance of trained models
  • Reference
    • FastAI.jl vs. fast.ai cheatsheet: Compare concepts and their equivalents in both libraries.
    • Packages: Overview of packages that FastAI.jl depends on for different parts of its API: Flux.jl, DLPipelines.jl, DataAugmentation.jl, DataLoaders.jl, Metalhead.jl, ...
@lorenzoh lorenzoh added documentation Improvements or additions to documentation plans Long-term plans labels Mar 17, 2022
@CarloLucibello
Member

two very minor comments about the current state of documentation:

  • there is no search box (fixed in the revamp)
  • julia code highlighting is very faint

@lorenzoh
Member Author

Thanks for the comments! Re the code highlighting: I plan to mirror the syntax highlighting used in Documenter.jl 👍

@lorenzoh
Member Author

lorenzoh commented Sep 3, 2022

Updated with ideas from FluxML/FluxML-Community-Call-Minutes#35

@lorenzoh lorenzoh changed the title Documentation overhaul Documentation plans Sep 3, 2022