# Chapter 2: Production

In this chapter we look at the end-to-end process of making a deep learning application.

## Setup

In [None]:
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

In [None]:
from fastbook import *
from fastai.vision.widgets import *

## Deep learning

### Doing deep learning
To get started, there are several considerations:
- Work on your own projects - this is the only way to get experience building and using models
- Pick a project where you can get results quickly
- The key consideration for a project topic is data availability
- Iterate from end to end: complete every step as well as possible in a reasonable amount of time; e.g. don't try to get "the perfect dataset". This helps spot where things are tricky, how much data you need, etc.
- Do small experiments. Try rewriting the notebooks on new datasets.
- Jumping into novel domains for beginners is a bad idea - you don't know whether you've made a mistake in the ML, or if things are just not doable


### Applying deep learning
- Deep learning models are generally not robust to out-of-domain (OOD) data, i.e. data that differs greatly from the training set
- To handle the expensiveness of image labelling, we can use *data augmentation*, e.g. distorting and rotating images
- Text generation is pretty good at *seeming* compelling, but so far it's not so great at being accurate!
- GAN-like dynamics: text generators are generally a little ahead of the models that try to detect automatically generated text
- DL is usually used together with other approaches (e.g. random forests) for analysing time series and tabular data
- The *Drivetrain Approach* is a way to ensure that your modelling work is useful in practice: 
  1. Consider your objective
  2. Consider actions you can take to meet it, and what data can help
  3. Build a model to determine which actions achieve the best result


### Loading images

In [None]:
key = os.environ.get("AZURE_SEARCH_KEY", 'redacted_key')

In [None]:
results = search_images_bing(key, 'grizzly bear')
ims = results.attrgot('contentUrl')
len(ims)

In [None]:
dest = 'images/grizzly.jpg'
download_url(ims[0], dest)

In [None]:
im = Image.open(dest)
im.to_thumb(128,128)

In [None]:
bear_types = 'grizzly', 'black', 'teddy'
path = Path('bears')

In [None]:
if not path.exists():
    path.mkdir()
    for o in bear_types:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        results = search_images_bing(key, f"{o} bear")
        download_images(dest, urls=results.attrgot("contentUrl"))

In [None]:
# check if images have been downloaded into the folders
fns = get_image_files(path)
fns

In [None]:
# check for corrupted images
failed = verify_images(fns)
failed

In [None]:
# remove failed images by unlinking them
failed.map(Path.unlink);

### Dataloaders

To load the images, we use a `DataLoaders` object, and tell fastai four things: 
1. What kinds of data we're working with
2. How to get the list of items
3. How to label these items
4. How to create the validation set

In [None]:
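# Simplified definition, shown for illustration only:
# `DataLoaders` stores the `DataLoader` objects it is given and exposes the first two as `train` and `valid`
class DataLoaders(GetAttr):
    def __init__(self, *loaders): self.loaders = loaders
    def __getitem__(self, i): return self.loaders[i]
    train, valid = add_props(lambda i,self: self[i])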

So far we've used factory methods to specify the above four things. If we're working on something that doesn't fit into these methods, then we can use the data block API, and specify each of the four things in turn:

In [None]:
bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock), # specify types for IDVs and DVs
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128)
)

This creates a `DataBlock` object, which is a template for creating a `DataLoaders` object, which we create next. Each `DataLoader` inside it yields batches of 64 items by default.
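
The `seed` passed to `RandomSplitter` is what makes the validation set reproducible. The sketch below illustrates the idea in plain Python; it is not fastai's implementation, `random_split` is a hypothetical helper, and the file names are made up.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Return (train_idxs, valid_idxs) for a shuffled split of `items`."""
    idxs = list(range(len(items)))
    # A dedicated Random instance keeps the shuffle independent of global state
    random.Random(seed).shuffle(idxs)
    n_valid = int(valid_pct * len(items))
    return idxs[n_valid:], idxs[:n_valid]

files = [f"bear_{i}.jpg" for i in range(10)]  # made-up file names
train_a, valid_a = random_split(files, valid_pct=0.2, seed=42)
train_b, valid_b = random_split(files, valid_pct=0.2, seed=42)

print(valid_a == valid_b)          # the same seed gives the same validation set
print(len(train_a), len(valid_a))
```

Without a fixed seed, each run would hold out different images, so changes in the error rate could come from the split rather than from the model.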

In [None]:
dls = bears.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)

By default, `Resize` crops the images to fit the specified size, which removes some parts of each image. Alternatives include stretching (squishing) the images and adding padding to them.

In [None]:
# try stretching the images
bears = bears.new(item_tfms=Resize(128, ResizeMethod.Squish))
dls = bears.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)

In [None]:
# try padding the images
bears = bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))
dls = bears.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)

Weaknesses of each approach: 
- Crop: may lose important features
- Stretch: model learns unrealistic shapes
- Pad: wasted computation on empty space
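
These trade-offs can be quantified with a little arithmetic. The sketch below assumes a hypothetical 400x300 source image resized to a square; `resize_costs` is an illustrative helper and does not call fastai.

```python
def resize_costs(width, height):
    """Compare crop, squish and pad when fitting an image into a square."""
    short_side, long_side = min(width, height), max(width, height)
    # Crop: keep a short_side x short_side square and discard the rest
    crop_lost = 1 - short_side / long_side
    # Squish: the long side is compressed by this factor relative to the short side
    squish_ratio = long_side / short_side
    # Pad: the image sits inside a long_side x long_side square, the remainder is empty
    pad_empty = 1 - short_side / long_side
    return crop_lost, squish_ratio, pad_empty

crop_lost, squish_ratio, pad_empty = resize_costs(400, 300)
print(f"crop discards {crop_lost:.0%} of the image")
print(f"squish distorts the aspect ratio by a factor of {squish_ratio:.2f}")
print(f"pad leaves {pad_empty:.0%} of the output empty")
```

The more elongated the original image, the larger all three costs become.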

A better approach than all of the above is `RandomResizedCrop`, which crops to a randomly selected part of the image, so the model sees a slightly different view of each image every epoch. `min_scale` sets the minimum proportion of the image to select each time.

In [None]:
# try random resized crop (example of data augmentation)
bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=4, nrows=1, unique=True)
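
To make the role of `min_scale` concrete, here is a simplified sketch of how a random crop box could be chosen. It is an assumption-laden illustration, not fastai's code: `random_crop_box` is a hypothetical helper that always picks a square crop, whereas the real transform also varies the aspect ratio.

```python
import math
import random

def random_crop_box(width, height, min_scale=0.3, rng=random):
    """Pick a random square crop whose target area is at least `min_scale` of the image."""
    scale = rng.uniform(min_scale, 1.0)            # fraction of the image area to aim for
    side = int(math.sqrt(scale * width * height))  # side of a square with that area
    side = min(side, width, height)                # clip so the square fits inside the image
    left = rng.randint(0, width - side)            # random horizontal position
    top = rng.randint(0, height - side)            # random vertical position
    return left, top, side

rng = random.Random(0)  # seeded only so that the sketch is repeatable
for _ in range(3):
    print(random_crop_box(400, 300, min_scale=0.3, rng=rng))
```

Each selected box would then be resized to the target size (128 pixels in the cell above), so a lower `min_scale` allows tighter zooms into the image.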

In general, data augmentation means creating random variations of the input data, e.g. rotating/distorting it. This can help the model better "understand" what the object it is looking at actually is.

In [None]:
bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)

### Training

In [None]:
bears = bears.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms()
)
dls = bears.dataloaders(path)

In [None]:
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)

To observe the mistakes the model is making, we can plot a confusion matrix.

In [None]:
# create confusion matrix
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
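
A confusion matrix is just a table of counts. The sketch below builds one in plain Python from made-up labels and predictions, assuming the usual layout where rows are the actual classes and columns are the predicted classes; `confusion_matrix` here is a hypothetical helper, not the fastai method.

```python
from collections import Counter

def confusion_matrix(targets, preds, vocab):
    """Rows are the actual classes, columns are the predicted classes."""
    counts = Counter(zip(targets, preds))
    return [[counts[(actual, predicted)] for predicted in vocab] for actual in vocab]

vocab = ["black", "grizzly", "teddy"]
# Made-up labels and predictions, purely for illustration
targets = ["black", "black", "grizzly", "grizzly", "teddy", "teddy"]
preds = ["black", "grizzly", "grizzly", "grizzly", "teddy", "teddy"]

matrix = confusion_matrix(targets, preds, vocab)
for label, row in zip(vocab, matrix):
    print(f"{label:>8}: {row}")
```

Correct predictions land on the diagonal, so any non-zero off-diagonal cell points to a pair of classes that the model confuses.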

The loss is a number that is higher when the model is incorrect (especially if it is also confident in its wrong answer), or when it is correct but not confident. We can find the images with the highest loss in the dataset using `plot_top_losses` - this shows each image together with the prediction, target label, loss, and probability (confidence in the prediction).

In [None]:
interp.plot_top_losses(5, nrows=1)
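
To see why the top losses surface both confident mistakes and hesitant correct answers, here is a small sketch that assumes the loss is cross-entropy (a common choice for single-label classification). The probabilities are made up and `cross_entropy` is an illustrative helper.

```python
import math

def cross_entropy(probs, target_idx):
    """Loss for one item: negative log of the probability given to the true class."""
    return -math.log(probs[target_idx])

# Made-up probabilities over (black, grizzly, teddy); the true class is index 1
confident_right = cross_entropy([0.02, 0.97, 0.01], target_idx=1)
unsure_right = cross_entropy([0.30, 0.40, 0.30], target_idx=1)
confident_wrong = cross_entropy([0.97, 0.02, 0.01], target_idx=1)

print(f"{confident_right:.2f} {unsure_right:.2f} {confident_wrong:.2f}")
```

The three values increase from left to right: the confident mistake is penalised most, which is why such images appear first in the plot.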

We can "clean" the data by fixing incorrect labels. This can be done before training, but a trained model helps us find mislabelled or unsuitable images quickly, so it is often faster to clean the data *after* training. `ImageClassifierCleaner` lets us do this by hand in a GUI.

In [None]:
cleaner = ImageClassifierCleaner(learn)
cleaner

In [None]:
# deletes image
for idx in cleaner.delete(): cleaner.fns[idx].unlink()

In [None]:
# moves images to a different category
for idx, cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

We actually get high accuracy without needing tons of data - techniques are also really important!

### Deployment

Save the model - the architecture and the trained parameters.

In [None]:
learn.export()

In [None]:
# check if the file is saved as "export.pkl"
path = Path()
path.ls(file_exts=".pkl")

Now we do inference on one image at a time:

In [None]:
learn_inf = load_learner(path/'export.pkl')

In [None]:
learn_inf.predict('images/grizzly.jpg')

Outputs: predicted category, index of the predicted category, probabilities of each category

In [None]:
learn_inf.dls.vocab
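
The three outputs of `predict` are tied together by the vocab. The sketch below uses made-up values rather than a real model to show how they relate; the variable names mirror the ones used in this notebook.

```python
# Made-up values standing in for the outputs of `predict`
vocab = ["black", "grizzly", "teddy"]  # what `learn_inf.dls.vocab` might contain
probs = [0.01, 0.98, 0.01]             # one probability per entry in `vocab`

# The predicted index is the position of the largest probability
pred_idx = max(range(len(probs)), key=probs.__getitem__)
# The predicted category is the vocab entry at that index
pred = vocab[pred_idx]

print(pred, pred_idx, probs[pred_idx])
```

This is why `probs[pred_idx]` is the confidence in the predicted category.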

Now we can make a notebook app

In [None]:
btn_upload = widgets.FileUpload()
btn_upload

In [None]:
img = PILImage.create(btn_upload.data[-1])

In [None]:
out_pl = widgets.Output()
out_pl.clear_output()
with out_pl: display(img.to_thumb(128,128))
out_pl

In [None]:
pred,pred_idx,probs = learn_inf.predict(img)

In [None]:
lbl_pred = widgets.Label()
lbl_pred.value = f"Prediction: {pred}; Probability: {probs[pred_idx]:.04f}"
lbl_pred

In [None]:
btn_run = widgets.Button(description="Classify")
btn_run

In [None]:
def on_click_classify(change):
  img = PILImage.create(btn_upload.data[-1])
  out_pl.clear_output()
  with out_pl: display(img.to_thumb(128,128))
  pred,pred_idx,probs = learn_inf.predict(img)
  lbl_pred.value = f"Prediction: {pred}; Probability: {probs[pred_idx]:.04f}"

btn_run.on_click(on_click_classify)

In [None]:
btn_upload = widgets.FileUpload()

In [None]:
VBox([widgets.Label("Select your bear!"), btn_upload, btn_run, out_pl, lbl_pred])

### Notes
In general, we don't need a GPU to serve the trained model. Several reasons: 
- GPUs pay off on highly parallel workloads; classifying one image at a time offers little parallelism, so CPUs are more cost-effective
- GPUs are complex, requiring careful manual management
- CPU servers have a lot more market competition and so are cheaper

Thus you should try to use CPU-based servers as far as possible.

Problems with deep learning models can be hard to spot, because we can't just go step by step and see exactly what the model is doing. With image classification, we may get less-than-ideal images in production (e.g. less polished than those found online) or out-of-domain data. We can also encounter domain shift, where the type of data the model sees changes over time.

To mitigate such risks, we can follow the process below: 
1. Start with an entirely manual process if possible, with the deep learning model running in parallel but not driving any actions. The human should check the model to make sure it makes sense.
2. Do limited scope deployment (e.g. testing it for a one-week period)
3. Gradually increase the scope of the rollout, with good reporting systems in place and a premortem completed to spot potential failure modes
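
Step 1 of the process above can be sketched as a simple comparison between what the humans decided and what the model would have predicted. The log entries and the helper `agreement_rate` are hypothetical; a real deployment would read them from its own reporting system.

```python
def agreement_rate(human_decisions, model_predictions):
    """Fraction of cases where the model agreed with the human decision."""
    if len(human_decisions) != len(model_predictions):
        raise ValueError("both logs must cover the same cases")
    matches = sum(h == m for h, m in zip(human_decisions, model_predictions))
    return matches / len(human_decisions)

# Made-up log: the human decision is acted on, the model prediction is only recorded
human = ["bear", "no bear", "bear", "no bear", "no bear"]
model = ["bear", "no bear", "no bear", "no bear", "no bear"]

rate = agreement_rate(human, model)
# Indices where the two disagree are the cases worth reviewing by hand
disagreements = [i for i, (h, m) in enumerate(zip(human, model)) if h != m]
print(f"agreement: {rate:.0%}, cases to review: {disagreements}")
```

Reviewing the disagreements by hand is a cheap way to find failure modes before the model is allowed to drive any actions.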

There will sometimes be unforeseen consequences, e.g. feedback loops in predictive policing. Prior to deployment, we should ask, "what would happen if it went really, really well?"

## Questionnaire

1. **Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.**  
Images with poor lighting, or a different type of background (e.g. polar bears, with a white background)

1. **Where do text models currently have a major deficiency?**  
They seem compelling, but their outputs aren't necessarily accurate. This makes their use particularly dangerous, because the output passes our "common sense" filter without passing the "accuracy" test.

1. **What are possible negative societal implications of text generation models?**  
Generation of fake texts, leading to false beliefs. Pretending to be someone else. 

1. **In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?**  
Use a partially automated process, with a human in the loop.

1. **What kind of tabular data is deep learning particularly good at?**  
Tabular data that contains natural language columns, or categorical columns with many distinct categories.

1. **What's a key downside of directly using a deep learning model for recommendation systems?**  
The recommendations may not be helpful for particular users, e.g. if they already know about the thing being recommended. 

1. **What are the steps of the Drivetrain Approach?**  
Consider the objective, consider what actions can be taken to achieve it and what data can help, then build a model that determines which actions give the best result.

1. **How do the steps of the Drivetrain Approach map to a recommendation system?**  
Objective: increase sales. Actions: choose which recommendations to show; to inform this, collect data about a wide range of recommendations for many customers. Model: recommends based on a utility function for a given recommendation and user.

1. **What is `DataLoaders`?**  
A class that stores the `DataLoader` objects passed to it (usually one for training and one for validation) and makes them available as `train` and `valid`.

1. **What four things do we need to tell fastai to create `DataLoaders`?**  
The type of data, how to get the list of items, how to label the items, and how to create the validation set.

1. **What does the `splitter` parameter to `DataBlock` do?**  
Splits the data into training and validation sets, in a specified proportion. 

1. **How do we ensure a random split always gives the same validation set?**  
Specify a seed (e.g. `seed=42`)

1. **What letters are often used to signify the independent and dependent variables?**  
Independent - `x`, dependent - `y`

1. **What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?**  
Crop picks out a portion of the image, pad fills the missing area with empty space, and squish distorts the image. The choice depends on the use case: crop may lose important features, squish teaches the model unrealistic shapes, and pad wastes computation on empty space.

1. **What is data augmentation? Why is it needed?**  
A technique for creating random variations of input data, without changing its meaning. e.g. distortions, rotations. It helps when there isn't much labelled data available, and also helps the model learn better. 

1. **What is the difference between `item_tfms` and `batch_tfms`?**  
`item_tfms` acts on individual items, whereas `batch_tfms` applies transformations to a whole batch of images at once.

1. **What is a confusion matrix?**  
One way of visualising errors in the model, showing the model predictions and the true values in matrix form. 

1. **What does `export` save?**  
The model architecture and parameters, as well as the definition of how to create the DataLoaders. 

1. **What is it called when we use a model for getting predictions, instead of training?**  
Inference.

1. **What are IPython widgets?**  
GUI components that run in a web browser and let us build a small application inside a Jupyter notebook.

1. **When might you want to use CPU for deployment? When might GPU be better?**  
We almost always prefer CPUs because they are cheaper and simpler to use. GPUs might be better for compute intensive and parallelisable tasks. 

1. **What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?**  
The app needs a network connection, and there will be latency when the model is called. There may also be privacy qualms if sensitive data gets sent to a remote server! This may also lead to increased complexity in managing the server. 

1. **What are three examples of problems that could occur when rolling out a bear warning system in practice?**  
Failure to handle low-resolution images, failure to handle images taken in low lighting, and failure to handle out-of-domain images.

1. **What is "out-of-domain data"?**  
Data that is very different from the training set

1. **What is "domain shift"?**  
Changes to the type of data seen by the model over time

1. **What are the three steps in the deployment process?**  
Start 100% manually with the model being run in parallel, deploy it in a limited way, then gradually expand deployment

## Further research

1. **When might it be best to avoid certain types of data augmentation?**  
Facial recognition, perhaps: distorting a face may alter the very features that the model needs to recognise.