# fastai - Chapter 2 - From Model to Production [DRAFT]
> A top-down approach to the chapter

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [Deep Learning for Coders, Jupyter]
- image: images/bear_example.png
- author: Nathaniel D'Amours

In [6]:
#hide
# !pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

In [7]:
#hide
from fastbook import *
from fastai.vision.widgets import *

The six lines of code we saw in the last chapter are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically, we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, explore how to create datasets, look at possible gotchas when using deep learning in practice, and more. Many of the key points will apply equally well to other deep learning problems, such as those in the last chapter. If you work through a problem similar in key respects to our example problems, we expect you to get excellent results with little code, quickly.

## The Practice of Deep Learning

We've seen that deep learning can solve a lot of challenging problems quickly and with little code. However, deep learning isn't magic! The same six lines of code won't work for every problem anyone can think of today.

We often talk to people who underestimate both the constraints and the capabilities of deep learning. Both of these can be problems: underestimating the capabilities means that you might not even try things that could be very beneficial, and underestimating the constraints might mean that you fail to consider and react to important issues.

The best thing to do is to keep an open mind. Then, it is possible to design a process where you can find the specific capabilities and constraints related to your particular problem as you work through the process. This doesn't mean making any risky bets — we will show you how you can gradually roll out models so that they don't create significant risks, and can even backtest them prior to putting them in production.

### Starting Your Project

When selecting a project, the most important consideration is data availability. However, the goal is not to find the "perfect" dataset or project, but just to get started and iterate from there.

We also suggest that you iterate from end to end in your project; that is, don't spend months fine-tuning your model, or polishing the perfect GUI, or labelling the perfect dataset… Instead, complete every step as well as you can in a reasonable amount of time, all the way to the end. By completing the project end to end, you will see where the trickiest bits are, and which bits make the biggest difference to the final result.

As you work through this book, we suggest that you complete lots of small experiments, by running and adjusting the notebooks we provide, at the same time that you gradually develop your own projects. That way, you will be getting experience with all of the tools and techniques that we're explaining, as we discuss them.


> Tip: To make the most of this book, take the time to experiment between each chapter, be it on your own project or by exploring the notebooks we provide. Then try rewriting those notebooks from scratch on a new dataset. It's only by practicing (and failing) a lot that you will get an intuition of how to train a model.  


By using the end-to-end iteration approach, you will also get a better understanding of how much data you really need. For instance, you may find that you can only easily get 200 labeled data items.

In an organizational context you will be able to show your colleagues that your idea can really work by showing them a real working prototype. We have repeatedly observed that this is the secret to getting good organizational buy-in for a project.

Since it is easiest to get started on a project where you already have data available, that means it's probably easiest to get started on a project related to something you are already doing, because you already have data about things that you are doing. For instance, if you work in the music business, you may have access to many recordings.

Sometimes, you have to get a bit creative. Maybe you can find some previous machine learning project, such as a Kaggle competition, that is related to your field of interest.

Sometimes, you have to compromise. Maybe you can't find the exact data you need for the precise project you have in mind; but you might be able to find something from a similar domain, or measured in a different way, tackling a slightly different problem.

Especially when you are just starting out with deep learning, it's not a good idea to apply deep learning where it has not been applied before. That's because if your model does not work at first, you will not know whether it is because you have made a mistake, or if the very problem you are trying to solve is simply not solvable with deep learning. Let's have a look at the state of deep learning, just so you know what kinds of things deep learning is good at right now.

## Gathering Data

The project we'll be completing in this chapter is a *bear detector*. It will discriminate between three types of bear: grizzly, black, and teddy bears. You can follow along with this chapter and create your own image recognition application for whatever kinds of objects you're interested in. In the fast.ai course, thousands of students have presented their work in the course forums, displaying everything from hummingbird varieties in Trinidad to bus types in Panama—one student even created an application that would help his fiancée recognize his 16 cousins during Christmas vacation!

For many types of projects, you may be able to find all the data you need online. At the time of writing, the Google image downloader from [this repository](https://github.com/RiddlerQ/simple_image_download) is probably the best option for finding and downloading images.

> Tip: The downloader lets you start your deep learning project quickly and iterate from there. However, you might run into some issues with it, such as irrelevant images or a lot of duplicate images. Therefore, in your second iteration, while building your dataset, you might use a tool such as [this one](https://github.com/qarmin/czkawka) to delete the duplicates. Alternatively, you could look for a replacement for the image downloader itself.
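
If you would rather stay in Python for a first pass, here is a minimal sketch of duplicate removal. It is not how the tool linked above works: it only groups files whose bytes are identical, using a SHA-256 hash, so resized or re-encoded copies of the same picture will not be caught. The function names are ours, chosen for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(folder):
    """Group files under `folder` whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only groups with more than one file contain duplicates
    return [paths for paths in by_digest.values() if len(paths) > 1]


def remove_exact_duplicates(folder):
    """Keep the first file of each duplicate group, delete the rest, and return the deleted paths."""
    removed = []
    for paths in find_exact_duplicates(folder):
        for extra in paths[1:]:
            extra.unlink()
            removed.append(extra)
    return removed
```

Calling `find_exact_duplicates('simple_images')` first lets you inspect the groups before deleting anything with `remove_exact_duplicates`.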

Here is the code to download our images: 

In [9]:
#hide_output
%pip install simple_image_download
from simple_image_download import simple_image_download as simp


image_downloader = simp.simple_image_download()
bear_types = ['grizzly bear', 'black bear', 'teddy bear']

for bear_type in bear_types:
    image_downloader.download(keywords=bear_type, limit=150)

simple_images_path = Path('simple_images')
image_files = get_image_files(simple_images_path)
failed_images = verify_images(image_files)
failed_images.map(Path.unlink)

(#0) []

In [4]:
image_files

(#488) [Path('simple_images/black_bear/black bear_1.png'),Path('simple_images/black_bear/black bear_10.jpeg'),Path('simple_images/black_bear/black bear_100.jpeg'),Path('simple_images/black_bear/black bear_101.jpeg'),Path('simple_images/black_bear/black bear_102.jpeg'),Path('simple_images/black_bear/black bear_103.jpeg'),Path('simple_images/black_bear/black bear_104.jpeg'),Path('simple_images/black_bear/black bear_105.jpeg'),Path('simple_images/black_bear/black bear_106.jpeg'),Path('simple_images/black_bear/black bear_107.jpeg')...]

Our folder has image files, as we'd expect. Let's open one:

In [1]:
#hide_output
bear_img = Image.open(image_files[0])
bear_img

![](my_icons/dl_for_coders_02/bear_example.png)

Let's break down this code.

In [6]:
#hide_output
%pip install simple_image_download


This line installs the `simple_image_download` package, which we use to download images from the web. It's the same thing as running `pip install simple_image_download` in your terminal.

In [7]:
from simple_image_download import simple_image_download as simp

Here, we import the `simple_image_download` module from the `simple_image_download` package under the alias `simp`, so that we can use it to get images from the web.

In [8]:
#hide_output
image_downloader = simp.simple_image_download()
bear_types = ['grizzly bear', 'black bear', 'teddy bear']

for bear_type in bear_types:
    image_downloader.download(keywords=bear_type, limit=150)



Finally, we iterate over `bear_types` and download images for each one, with a limit of `150` per bear type. The files are stored in the `simple_images` folder, in one subfolder per keyword. Behind the scenes, the downloader runs a Google image search for each keyword and saves the first results.

Here are the parameters of the `download` method:
- `keywords`: String to be searched.
- `limit`: Integer representing the number of files to download.
- `extensions`: Set containing the extensions of the files (optional, default is {`.jpg`, `.png`, `.ico`, `.gif`, `.jpeg`}).
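
After the download finishes, it is worth checking how many files actually landed in each folder. The helper below is a small sketch using only the standard library; the name `count_images_per_folder` and the suffix set are our own choices, and the folder layout (one subfolder per keyword under `simple_images`) is taken from the file listing shown earlier.

```python
from collections import Counter
from pathlib import Path

# Suffixes treated as images here; adjust to match the extensions you downloaded
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".ico"}


def count_images_per_folder(root):
    """Count image files under `root`, grouped by the name of their parent folder."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.name] += 1
    return counts


# Prints an empty Counter if the folder does not exist yet
print(count_images_per_folder("simple_images"))
```

A large imbalance between folders is a hint that one of the searches returned fewer usable results than the others.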

In [9]:
#hide_output
simple_images_path = Path('simple_images')
image_files = get_image_files(simple_images_path)
failed_images = verify_images(image_files)
failed_images.map(Path.unlink)

(#0) []

When we download files from the internet, a few of them are often corrupt. `verify_images` returns the files that can't be opened, and calling `unlink` on each of them deletes them. In this case, no files were corrupted. Note that, like most fastai functions that return a collection, `verify_images` returns an object of type `L`, which includes the `map` method. This calls the passed function on each element of the collection.
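
To make the "check, then unlink" pattern concrete without fastai, here is a rough standard-library sketch. It is not what `verify_images` does (the source shown in the sidebar below relies on `verify_image` to try opening each file); it only checks whether a file starts with the signature bytes of a JPEG, PNG, or GIF. Any other format, including `.ico`, would be flagged, so treat it as an illustration only. The function names are hypothetical.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats
SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 version)
    b"GIF89a",             # GIF (1989 version)
)


def looks_like_image(path):
    """Return True if the file starts with one of the known image signatures."""
    with open(path, "rb") as f:
        header = f.read(8)
    return header.startswith(SIGNATURES)


def remove_suspect_files(paths):
    """Delete the files that fail the signature check and return their paths."""
    failed = [Path(p) for p in paths if not looks_like_image(p)]
    # Plain-Python counterpart of `failed_images.map(Path.unlink)`
    for p in failed:
        p.unlink()
    return failed
```

The loop over `failed` plays the role of `map` in the cell above: the same function is applied to every element of the collection.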

### Sidebar: Getting Help in Jupyter Notebooks

Jupyter notebooks are great for experimenting and immediately seeing the results of each function, but there is also a lot of functionality to help you figure out how to use different functions, or even directly look at their source code. Here are some other features that are very useful in Jupyter notebooks:

![](my_icons/dl_for_coders_02/jupyter_autocomplete.png "Jupyter's autocomplete")

At any point, if you don't remember the exact spelling of a function or argument name, you can press Tab to get autocompletion suggestions.

![](my_icons/dl_for_coders_02/jupyter_shift_tab.png "Function's signature and short description")

When inside the parentheses of a function, pressing Shift and Tab simultaneously will display a window with the signature of the function and a short description. Pressing these keys twice will expand the documentation, and pressing them three times will open a full window with the same information at the bottom of your screen.

In [10]:
?verify_images

Signature: verify_images(fns)
Docstring: Find images in `fns` that can't be opened
File:      c:\users\natha\anaconda3\envs\fastbook\lib\site-packages\fastai\vision\utils.py
Type:      function


<!-- ![](my_icons/dl_for_coders_02/jupyter_question_mark.png "Function's signature and short description") -->

In a cell, typing `?function_name` and executing will show the signature of the function and a short description.

<!-- ![](my_icons/dl_for_coders_02/jupyter_double_question_mark.png "Function's signature, short description and source code") -->

In [11]:
??verify_images

Signature: verify_images(fns)
Source:
def verify_images(fns):
    "Find images in `fns` that can't be opened"
    return L(fns[i] for i,o in enumerate(parallel(verify_image, fns)) if not o)
File:      c:\users\natha\anaconda3\envs\fastbook\lib\site-packages\fastai\vision\utils.py
Type:      function


In a cell, typing `??function_name` and executing will show the signature of the function, a short description, and the source code.
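
Outside of Jupyter, the `?` and `??` shortcuts are not available, but Python's standard `inspect` module can retrieve similar information. The sketch below is only an approximation of what the notebook displays; the helper name `describe` is ours, and `json.dumps` is used as the example simply because it ships with Python.

```python
import inspect
import json


def describe(func, with_source=False):
    """Return a text summary loosely similar to `?func` (or `??func` when with_source=True)."""
    lines = [f"Signature: {func.__name__}{inspect.signature(func)}"]
    doc = inspect.getdoc(func) or ""
    first_line = doc.splitlines()[0] if doc else "(no docstring)"
    lines.append(f"Docstring: {first_line}")
    if with_source:
        # getsource only works for functions defined in a Python source file
        lines.append("Source:")
        lines.append(inspect.getsource(func))
    return "\n".join(lines)


print(describe(json.dumps))
```

Passing `with_source=True` adds the function's source code, much like the second question mark does in a notebook.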

![](my_icons/dl_for_coders_02/fastai_doc.png "Function's signature, short description and source code")

If you are using the fastai library, we added a `doc` function for you: executing `doc(function_name)` in a cell will open a window with the signature of the function, a short description and links to the source code on GitHub and the full documentation of the function in the [library docs](https://docs.fast.ai).

![](my_icons/dl_for_coders_02/jupyter_debug_short.png "%debug magic method")

To get help at any point if you get an error, type `%debug` in the next cell and execute to open the [Python debugger](https://docs.python.org/3/library/pdb.html), which will let you inspect the content of every variable and test expressions.

### End sidebar

One thing to be aware of in this process: as we discussed in the last chapter, models can only reflect the data used to train them. And the world is full of biased data, which ends up reflected in, for example, Bing Image Search. For instance, let's say you were interested in creating an app that could help users figure out whether they had healthy skin, so you trained a model on the results of searches for "healthy skin". Here are the kinds of results you would get:

![](my_icons/dl_for_coders_02/healthy_skin.gif "Data for a healthy skin detector?")

With this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data.

Now that we have downloaded some data, we need to assemble it in a format suitable for model training. In fastai, that means creating an object called `DataLoaders`.

This post is heavily inspired by *Deep Learning for Coders* {% cite howard2020deep %}.

{% bibliography --cited %}