<a href="https://colab.research.google.com/github/full-stack-deep-learning/fsdl-text-recognizer-2022/blob/main/notebooks/lab99_new_2022.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://fsdl.me/logo-720-dark-horizontal">

# 🥞 FSDL: What's ✨ New in 2022 ✨

This notebook walks you through environment setup, model training, and deployment for FSDL in its new 2022 iteration.

Right now, it's only expected to work on Colab.

## Setup

In [None]:
import sys

in_colab = "google.colab" in sys.modules
repo = "fsdl-text-recognizer-2022"

assert in_colab

!git clone https://github.com/full-stack-deep-learning/{repo}

Now we `cd` into the cloned repo and take a look around.

In [None]:
%cd /content/{repo}/
!ls

We need to install the `requirements` for both `prod`uction and `dev`elopment.

We time our installs so we can keep an eye on the latency from opening a Colab to doing useful work.

It should never be more than three minutes.

In [None]:
%%time
!pip install -r requirements/prod.in

We also install the requirements from `dev`, using a cute `sed`/`xargs` CLI combo:

In [None]:
%%time
!sed 1d requirements/dev.in | grep -v "#" | xargs pip install
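To make the pipeline above less mysterious, here is a rough Python equivalent of its text-processing steps. It is only a sketch: the helper name `packages_from_requirements` and the example file contents are hypothetical, not taken from the repo. `sed 1d` drops the first line, `grep -v "#"` drops every line containing a `#`, and `xargs` splits what remains on whitespace.

```python
def packages_from_requirements(text):
    """Approximate `sed 1d | grep -v "#" | xargs` on the text of a requirements file."""
    lines = text.splitlines()[1:]  # sed 1d: delete the first line
    kept = [line for line in lines if "#" not in line]  # grep -v "#": drop lines with a '#'
    # xargs: split the remaining lines on whitespace into individual arguments
    return [token for line in kept for token in line.split()]


# Hypothetical file contents, for illustration only.
example = """-c prod.txt
# linting
black
flake8>=4  # pinned
pytest
"""

print(packages_from_requirements(example))
```

Note that the `grep` step removes any line with an inline comment as well, so in this example `flake8` would not be installed.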

## Finalizing Setup and Checking Imports

In [None]:
import pytorch_lightning as pl  # do we have our dev dependencies?

We update the `PYTHONPATH` so that the library is on the path.

In [None]:
pythonpath = !echo $PYTHONPATH
if "." not in pythonpath[-1]:
  pythonpath = ["."] + pythonpath
  %env PYTHONPATH={":".join(pythonpath)}
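Outside of a notebook, the same update can be sketched in plain Python. The helper name `prepend_to_pythonpath` is hypothetical. Unlike the cell above, this version compares whole path entries rather than checking for a `.` substring, so a path such as `/usr/lib/python3.7` does not count as already containing the current directory.

```python
import os


def prepend_to_pythonpath(entry, current):
    """Return `current` with `entry` placed first, unless it is already an entry."""
    parts = [p for p in current.split(os.pathsep) if p]  # ignore empty segments
    if entry not in parts:
        parts.insert(0, entry)
    return os.pathsep.join(parts)


# Setting the environment variable affects subprocesses started from here on.
os.environ["PYTHONPATH"] = prepend_to_pythonpath(".", os.environ.get("PYTHONPATH", ""))
print(os.environ["PYTHONPATH"])
```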

We turn on autoreload to allow "hot" code editing in the library.

In [None]:
%load_ext autoreload
%autoreload 2

Then we check to make sure it's all importable:

In [None]:
import text_recognizer
import training  # ✨ NEW 2022: training is now a module of its own

In [None]:
from text_recognizer.paragraph_text_recognizer import ParagraphTextRecognizer

In [None]:
import text_recognizer.data

## Training

Training is still primarily done through the `run_experiment.py` script.

#### Unfurl this section to see the `--help` output.

For help with data/model-specific arguments, provide a `--data_class` and `--model_class` in addition to `--help`.

In [None]:
%run training/run_experiment.py --help

### MNIST Hello World!

We start off with something really simple: one epoch of digit recognition with local logging and no acceleration.

In [None]:
%run training/run_experiment.py --gpus=0 --max_epochs 1

### ✨ NEW 2022: Profiling

We now have the PyTorch profiler available (outside of distributed training, where profiling is still hard).

Just pass the `--profile` flag in to `training/run_experiment.py`.

The cell below profiles the `ResnetTransformer` on the real dataset.

You can see an example profile (in Tensorboard on W&B) [here](https://wandb.ai/cfrye59/test-colab-profile/runs/26au3nsn/tensorboard?workspace=user-cfrye59).

Read about how to read the traces in these profiles [here](http://wandb.me/trace-report).

You'll also find very basic profiling information printed to `stdout`.

> Note that you'll need to provide a W&B auth key for this cell to finish running.

In [None]:
!WANDB_PROJECT=test-colab-profile python training/run_experiment.py --wandb --gpus=-1 \
  --data_class=IAMOriginalAndSyntheticParagraphs --model_class=ResnetTransformer --loss=transformer \
  --batch_size=64 --lr=0.0001 \
  --max_epochs=1 --precision 16 --profile --max_steps=16 --limit_test_batches=0

### ✨ NEW 2022: Richer Prediction Logging



The prediction logging has been migrated to W&B Tables,
which means we now have richer interfaces for interaction
with what we've put up.

Check some out [here](https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/artifacts/run_table/run-1vrnrd8p-trainpredictions/v194/files/train/predictions.table.json) (or run the cell below to view them inside the notebook).

View
[this report](https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/reports/Strings-are-truncated-appropriately-with-new-decode-method---VmlldzoxOTkxMTQ2)
for an example of them in use.

In [None]:
from IPython.display import IFrame

logged_preds_url = "https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/artifacts/run_table/run-1vrnrd8p-trainpredictions/v194/files/train/predictions.table.json"

IFrame(logged_preds_url, width=1024, height=768)

### ✨ NEW 2022: Overfitting Check

We now have a special script for testing whether the model can fit a small dataset -- wrapping `--overfit_batches` in PyTorch Lightning.

Specifically, we check whether it reaches a criterion loss value within a certain number of passes over that small dataset.

With default arguments, it should complete
in under 10 minutes on a commodity GPU (e.g. on Colab) --
it runs "just" 100 epochs.

Fully using the "overfitting trick" requires getting the loss down to levels close to what you are targeting in training.
That takes 5-10x longer.

You can see some of the work done using the overfitting trick in W&B Reports [here](https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/reports/Overfit-Check-After-Refactor--VmlldzoyMDY5MjI1) and [here](https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/reports/Overfitting-Studies-2022-05--VmlldzoyMDU2OTQ0).



In [None]:
!WANDB_PROJECT=fsdl-test-overfitting ./training/tests/overfit.sh 10 5
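The idea behind the check can be illustrated with a toy example. This is a sketch only: a two-parameter linear model stands in for the text recognizer, and the names `overfit_check`, `criterion`, and `max_epochs` are hypothetical rather than the script's actual arguments. We repeatedly pass over a tiny dataset and report whether the loss drops below the criterion before we run out of epochs.

```python
def overfit_check(xs, ys, criterion=1e-3, max_epochs=100, lr=0.4):
    """Fit y = w * x + b by gradient descent; return (passed, epochs_used, last_loss)."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n  # mean squared error over the small dataset
        if loss < criterion:
            return True, epoch, loss  # the model can fit the data: check passes
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return False, max_epochs, loss  # never reached the criterion: something is off


xs = [-1.0, -0.5, 0.5, 1.0]
ys = [2 * x + 1 for x in xs]  # a target the model can represent exactly
passed, epochs_used, final_loss = overfit_check(xs, ys)
print(passed, epochs_used, final_loss)
```

If a model cannot pass a check like this on data it is able to represent exactly, the problem is more likely a bug in the model, loss, or optimizer setup than a lack of data.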

### "Serious" Training

Now that we've
1. done our "hello world" on MNIST,
2. profiled our code to look for compute performance issues, and
3. debugged our code for optimization performance issues by overfitting,

we're ready for some "serious" training
(but not actually, because it'd take like 24 hours or more on Colab).

In [None]:
train_for_real = False  # flip this switch to run training; but note that it takes a long time

if train_for_real:
  %run training/run_experiment.py --gpus=-1 --data_class=IAMOriginalAndSyntheticParagraphs --model_class=ResnetTransformer \
  --loss=transformer --batch_size=64 --accumulate_grad_batches 4 --log_every_n_steps=500 --lr=0.0004 \
  --precision 16 --max_epochs=1500 --check_val_every_n_epoch=3 --wandb

## Deployment

Once a model is trained, the next step is to put it in production.

### ✨ NEW 2022: Discrete Model Staging using W&B and TorchScript

We've got a new "two-step" approach, so that development and production can be cleanly separated (e.g. no Lightning in prod).

Specifically, we create a version-controlled artifact for
the TorchScript-compiled model.
This format of the model is very portable -- it can even be run without Python!

We use W&B to store the versions of both the model checkpoint and the TorchScript model.

From scratch, we'd pull a model checkpoint (as output by Lightning) down from W&B, jit script it with Torch, and then upload
the TorchScript model.

This workflow is encapsulated in the `training/stage_model.py` script.

But since this process has already been done
for a workable text recognizer,
here we will just `--fetch` the TorchScript model
to put it on the local disk.

In [None]:
%run training/stage_model.py --fetch \
  --entity "cfrye59" --from_project "fsdl-text-recognizer-2021-training"
# see --help docs for more

### ✨ NEW 2022: Gradio Frontend

Our model now has a frontend based on Gradio.
That frontend includes user feedback.

Using `gradio` on Colab currently requires a runtime restart after installation, due to a conflict over Jinja versions --
this is an issue we want to resolve.

The code below runs the model we just fetched locally
inside the same Python process as the Gradio frontend.

In [None]:
from app_gradio.app import PredictorBackend, make_frontend

predict = PredictorBackend(url=None).run  # run model "backend" in the same process
frontend = make_frontend(predict)

frontend.launch(share=True)

### ✨ NEW 2022: Public AWS Lambda URL

The above architecture is not great,
because it couples frontend and backend directly.

So we instead use the serverless API from 2021,
with an enhancement: AWS Lambdas now come with a URL that serves
as an HTTP endpoint,
instead of only being accessible via AWS's internal system of URIs.

Setting one up requires AWS CLI/UI interaction,
so we'll instead just quickly ping an existing Lambda as a proof-of-principle.

In [None]:
import json

from IPython.display import Image
import requests

lambda_url = "https://3akxma777p53w57mmdika3sflu0fvazm.lambda-url.us-west-1.on.aws/"
image_url = "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"

headers = {"Content-type": "application/json"}
payload = json.dumps({"image_url": image_url})

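if "pred" not in locals():  # skip the network call if this cell has already run
  response = requests.post(lambda_url, data=payload, headers=headers)
  response.raise_for_status()  # surface HTTP errors before trying to parse the body
  pred = response.json()["pred"]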

print(pred)

Image(url=image_url, width=512)
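The decoupling described in this section can be sketched as a backend object that either calls a model in the same process or forwards the request to an HTTP endpoint. Everything here is hypothetical and for illustration only: `SketchBackend`, `fake_model`, and `fake_post` are made-up names, and this is not how `PredictorBackend` is actually implemented. The only details borrowed from the cell above are the JSON payload key `image_url` and the response key `pred`.

```python
import json


class SketchBackend:
    """Hypothetical backend: runs a local model or defers to a remote URL."""

    def __init__(self, local_model, url=None, post=None):
        self.local_model = local_model  # callable: image_url -> predicted text
        self.url = url  # None means "run in this process"
        self.post = post  # callable: (url, json_body) -> json_response

    def run(self, image_url):
        if self.url is None:
            # Same-process mode: the frontend is tied to the model's dependencies.
            return self.local_model(image_url)
        # Remote mode: the frontend only needs to speak JSON over HTTP.
        body = json.dumps({"image_url": image_url})
        return json.loads(self.post(self.url, body))["pred"]


def fake_model(image_url):
    return "local: " + image_url


def fake_post(url, body):
    # Stand-in for a real HTTP call, so the sketch runs without network access.
    return json.dumps({"pred": "remote: " + json.loads(body)["image_url"]})


local = SketchBackend(fake_model)
remote = SketchBackend(fake_model, url="https://example.invalid/", post=fake_post)
print(local.run("a.png"))
print(remote.run("a.png"))
```

In practice, `post` would wrap a real HTTP client call, and the frontend code would not change when the backend moves from the local process to a Lambda URL.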