#meta 1/3/2020 My fast.ai Documentation

Course: Practical Deep Learning for Coders.  https://course.fast.ai
##### Deep Learning for Coders with fastai & PyTorch Book Chapters and Related Resources
* [All book chapters](https://github.com/fastai/fastbook) -- the fastai book as Jupyter notebooks
* [MOOC](https://course.fast.ai/) -- Includes videos, setup instructions, etc.

Documentation: https://docs.fast.ai
# my-fastai-2020 Documentation




## 1. Data Prep
`DataBlock`, `DataLoaders`, `DataLoader`, `Dataset` and `Datasets` classes
 

### 1.1 From Data to DataLoaders
Src: [Book Chapter 2. From Model to Production](https://github.com/fastai/fastbook/blob/master/02_production.ipynb) pg. 69-74  
We need to assemble data in a format suitable for model training. In fastai, that means creating an object called `DataLoaders`.  It provides the data for your model.


> DataLoaders: A fastai class that stores multiple `DataLoader` objects you pass to it, normally a `train` and a `valid` (it's possible to have as many as you like). The first two are made available as properties.

A `DataLoaders` includes validation and training `DataLoader`s. 
> DataLoader: A class that provides batches of a few items at a time to the GPU. When you loop through a `DataLoader`, fastai gives you 64 (by default) items at a time, all stacked up into a single tensor. We can view a few of those items with the `show_batch` method on a `DataLoader`.
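
To make the batching idea concrete, here is a minimal plain-Python sketch. It is an illustration only, not fastai's implementation: `iter_batches` is a hypothetical helper, and a real `DataLoader` also collates the items into tensors and moves them to the GPU.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 64) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 150 stand-in items; a real DataLoader would yield stacked tensors instead.
batches = list(iter_batches(range(150)))
batch_sizes = [len(b) for b in batches]  # two full batches and one partial batch
```

With 150 items and a batch size of 64, the loop produces two full batches and a final batch holding the remaining 22 items.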

To turn downloaded data into a `DataLoaders` object we need to tell fastai at least four things:

- What kinds of data we are working with
- How to get the list of items
- How to label these items
- How to create the validation set

fastai provides a number of *factory methods* for particular combinations of these things. They are convenient when your application and data structure happen to fit one of those predefined methods.

`DataBlock`  
When your data does not fit one of the factory methods, fastai offers an extremely flexible system called the *data block API*. With this API you can fully customize every stage of the creation of your `DataLoaders`.
> DataBlock: A template for creating a `DataLoaders`.

Here is what we need to create a `DataLoaders` for the dataset that we just downloaded:

```python
from fastai.vision.all import *  # DataBlock, ImageBlock, get_image_files, Resize, ...

# `path` is assumed to point at the folder of downloaded bear images,
# with one subfolder per bear type.
bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),  # types of the independent and dependent variables
    get_items=get_image_files,  # get a list of file paths
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # random training/validation split
    get_y=parent_label,  # create the labels from the folder names
    item_tfms=Resize(128))  # resize every image to the same size, needed to form mini-batches

dls = bears.dataloaders(path)  # provide the path for the images
dls.valid.show_batch(max_n=4, nrows=1)  # look at a few items in the validation DataLoader
```

The `get_image_files` function takes a path and returns a list of all the images in that path (recursively, by default).  
`parent_label` is a function provided by fastai that simply gets the name of the folder a file is in. Because we put each of our bear images into folders based on the type of bear, this is going to give us the labels that we need.
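
The labelling and splitting steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not fastai's code: `folder_label` and `random_split` are hypothetical stand-ins for `parent_label` and `RandomSplitter`, and the file names are made up.

```python
import random
from pathlib import Path
from typing import List, Tuple


def folder_label(file_path) -> str:
    """Return the name of the folder that directly contains the file."""
    return Path(file_path).parent.name


def random_split(n_items: int, valid_pct: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle indices with a fixed seed and hold out `valid_pct` of them."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(valid_pct * n_items)
    return indices[cut:], indices[:cut]  # (training indices, validation indices)


# Hypothetical file names, laid out one folder per bear type.
files = [Path("bears") / kind / f"{i:03d}.jpg"
         for kind in ("grizzly", "black", "teddy") for i in range(10)]
labels = [folder_label(f) for f in files]
train_idx, valid_idx = random_split(len(files))
```

Fixing the seed means the same items land in the validation set on every run, which keeps results comparable when you change the model.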

*Item transforms*: 
Pieces of code that run on each individual item, whether it be an image, category, or so forth. fastai includes many predefined transforms:  
By default, `Resize` crops the images to fit a square shape of the size requested, using the full width or height. This can result in losing some important details.  
`RandomResizedCrop` provides images where the objects are in slightly different places and slightly different sizes. The most important parameter to pass in is `min_scale`, which determines how much of the image to select at minimum each time.  
Alternatively, you can ask fastai to pad the images with zeros (black) with `Resize(128, ResizeMethod.Pad, pad_mode='zeros')`, or to squish/stretch them with `Resize(128, ResizeMethod.Squish)`.
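
The trade-off between cropping, padding and squishing comes down to simple arithmetic. The sketch below works it out for a 400x300 image resized to 128 pixels. The helper names are hypothetical, and fastai's actual transforms may differ in details such as rounding or where the crop is placed.

```python
from typing import Tuple


def crop_plan(w: int, h: int, size: int) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Scale so the shorter side equals `size`, then centre-crop a square."""
    scale = size / min(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    left, top = (new_w - size) // 2, (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


def pad_plan(w: int, h: int, size: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Scale so the longer side equals `size`; return the padding needed per axis."""
    scale = size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    return (new_w, new_h), (size - new_w, size - new_h)


def squish_factors(w: int, h: int, size: int) -> Tuple[float, float]:
    """Independent scale factors per axis; unequal factors distort the image."""
    return size / w, size / h


scaled, box = crop_plan(400, 300, 128)     # scaled image is wider than 128, so columns are cut off
padded, padding = pad_plan(400, 300, 128)  # whole image kept, black rows added
sx, sy = squish_factors(400, 300, 128)     # sx != sy, so shapes are stretched
```

Cropping discards content, padding spends pixels on empty borders, and squishing changes the shape of objects, so none of the three is free.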

```python
bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=4, nrows=1, unique=True)
```
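
To see what `min_scale` controls, here is a simplified sketch of how a random crop box might be sampled. It is an assumption-laden illustration, not the library's algorithm: `sample_crop_box` is a hypothetical helper that keeps the original aspect ratio, whereas the real transform may also vary the aspect ratio and then resizes the crop to the target size.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop_box(w: int, h: int, min_scale: float = 0.3,
                    rng: Optional[random.Random] = None) -> Tuple[int, int, int, int]:
    """Pick a random box covering between `min_scale` and 100% of the image area."""
    rng = rng if rng is not None else random.Random()
    frac = rng.uniform(min_scale, 1.0)  # fraction of the image area to keep
    side_scale = math.sqrt(frac)        # same factor on both axes keeps the aspect ratio
    crop_w = max(1, int(w * side_scale))
    crop_h = max(1, int(h * side_scale))
    left = rng.randint(0, w - crop_w)   # random position that keeps the box inside the image
    top = rng.randint(0, h - crop_h)
    return left, top, left + crop_w, top + crop_h


rng = random.Random(0)
boxes = [sample_crop_box(400, 300, min_scale=0.3, rng=rng) for _ in range(5)]
```

A lower `min_scale` allows tighter crops, so the network sees more varied positions and sizes but risks losing the object entirely.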

### 1.1a Data Augmentation
*Data augmentation* means creating random variations of our input data, such that they appear different but do not actually change the meaning of the data. Common data augmentation techniques for images are rotation, flipping, perspective warping, brightness changes and contrast changes. For natural photo images, a standard set of augmentations is provided by the `aug_transforms` function.

Because our images are now all the same size, we can apply these augmentations to an entire batch of them on the GPU, which saves a lot of time. To tell fastai we want to use these transforms on a batch, we use the `batch_tfms` parameter (note that we're not using `RandomResizedCrop` in this example, and that we're using double the amount of augmentation compared to the default):

```python
bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)
```
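
The key property of augmentation is that the input changes while the label stays the same. The following toy sketch shows this with a random horizontal flip on a tiny image stored as nested lists. It is plain Python for illustration only; `random_hflip` and `augment_batch` are hypothetical helpers, not part of fastai, and real batch transforms operate on GPU tensors.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixel values


def random_hflip(img: Image, p: float = 0.5, rng: Optional[random.Random] = None) -> Image:
    """Mirror the image left-to-right with probability `p`; otherwise return a copy."""
    rng = rng if rng is not None else random.Random()
    if rng.random() < p:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def augment_batch(batch: List[Tuple[Image, str]], rng: random.Random) -> List[Tuple[Image, str]]:
    """Apply the random flip to every image; labels pass through untouched."""
    return [(random_hflip(img, rng=rng), label) for img, label in batch]


image = [[1, 2, 3],
         [4, 5, 6]]
batch = [(image, "teddy")] * 4
augmented = augment_batch(batch, random.Random(0))
```

Each call can produce a different mix of flipped and unflipped images, which is why a model trained over many epochs rarely sees exactly the same input twice.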