<a href="https://colab.research.google.com/github/DylanLoader/641/blob/master/lesson_1_nb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fast.ai Lesson 1

The main focus of this lesson is image classification. 


In [0]:
# Import necessary packages
from fastai import *
from fastai.vision import *
from fastai.metrics import error_rate

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

## Dataset Description

The data used for this lesson is the [Oxford-IIIT pet dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/). My GitHub repo does not include the dataset, since downloading and untarring it on demand is far more efficient given the combined file size of roughly 800 MB. Calling untar_data on the URLs.PETS constant downloads the archive from the fast.ai-hosted copy (see the download URL in the output below) and extracts it locally.

The dataset includes images of 37 categories of pets with about 200 images per category. Of the 37 categories, 25 are dog breeds and 12 are cat breeds.


In [4]:
# Define the local path
# URLs.PETS is a predefined data location (constant) in the fast.ai package
path = untar_data(URLs.PETS); path

Downloading https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet


PosixPath('/root/.fastai/data/oxford-iiit-pet')

In [5]:
# Check the path to make sure the images and annotations are available 
# Note path.ls() is an additional method added on to the path object in fast.ai
path.ls()

# Set sub-paths for annotations and images
path_anno = path/'annotations'
path_img = path/'images'

# Grab a list of all of the image files in a path.
fnames = get_image_files(path_img)
fnames[:5] # look at the first 5 images

[PosixPath('/root/.fastai/data/oxford-iiit-pet/images/miniature_pinscher_20.jpg'),
 PosixPath('/root/.fastai/data/oxford-iiit-pet/images/Abyssinian_132.jpg'),
 PosixPath('/root/.fastai/data/oxford-iiit-pet/images/english_setter_96.jpg'),
 PosixPath('/root/.fastai/data/oxford-iiit-pet/images/boxer_67.jpg'),
 PosixPath('/root/.fastai/data/oxford-iiit-pet/images/Persian_120.jpg')]

In [0]:
np.random.seed(1775) # Set a random seed so our results will be replicable
batch_size = 64 # If your GPU runs out of memory, reduce batch_size

### A Quick Explanation of The Following Regex

References: 

1. https://docs.python.org/3/library/re.html
2. https://regex101.com/

In the following code chunk we will see the string: `r'/([^/]+)_\d+.jpg$'`

1. In the regex string we use **r'\<something>'** to tell Python that this is a raw string. In a raw string, backslash escape sequences such as the newline ( **\n** ) are not processed, so the backslash in **\d** reaches the regex engine unchanged.

2. Round parentheses **(\<something>)** define a capture group: the part of the match that can be retrieved afterwards with **.group(1)**.

3. Square brackets **[\<something>]** define a character set. A leading **^** negates the set, so **[^/]** matches any single character except a forward slash.

4. **+** tells regex to match 1 or more repetitions of the preceding element, here the set **[^/]**.

5. We then add **_** to match a literal underscore, since all file names have the form ".../\<breed type>_\<number>.jpg"

6. **\d+** tells regex to match 1 or more Unicode decimal digits.

7. `.jpg$` matches the extension at the end of the string: the trailing `$` anchors the match to the end. Note that the unescaped `.` matches any character, not only a literal dot; a stricter pattern would use `\.jpg$`.
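
The pieces listed above can be checked one at a time with Python's standard `re` module. The following is a minimal sketch, not part of the fast.ai library: the two full file names are copied from the listing earlier in this notebook, while `strict_pat` and the made-up name `Persian_120xjpg` are hypothetical additions used only to show what the unescaped dot allows.

```python
import re

# Pattern used in this lesson; the dot before "jpg" is unescaped.
pat = re.compile(r'/([^/]+)_\d+.jpg$')

# Hypothetical stricter variant (an assumption, not from the lesson):
# the escaped dot only matches a literal ".".
strict_pat = re.compile(r'/([^/]+)_\d+\.jpg$')

single_word = '/root/.fastai/data/oxford-iiit-pet/images/Persian_120.jpg'
multi_word = '/root/.fastai/data/oxford-iiit-pet/images/miniature_pinscher_20.jpg'

# group(1) is the text captured by the parentheses: the breed name.
label_single = pat.search(single_word).group(1)
label_multi = pat.search(multi_word).group(1)
print(label_single)
print(label_multi)

# Made-up file name with "x" where the dot should be.
odd_name = '/images/Persian_120xjpg'
loose_match = pat.search(odd_name) is not None
strict_match = strict_pat.search(odd_name) is not None
print(loose_match, strict_match)
```

Because `[^/]+` is greedy and then backtracks, a breed name that itself contains underscores is captured whole, and only the final `_<number>` part is split off.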


We can see how the regular expression is used in fast.ai by looking at the class method [from_name_re](https://github.com/fastai/fastai/blob/master/fastai/vision/data.py):



    @classmethod
    def from_name_re(cls, path:PathOrStr, fnames:FilePathList, pat:str, valid_pct:float=0.2, **kwargs):
        "Create from list of `fnames` in `path` with re expression `pat`."
        pat = re.compile(pat)
        def _get_label(fn):
            if isinstance(fn, Path): fn = fn.as_posix()
            res = pat.search(str(fn))
            assert res,f'Failed to find "{pat}" in "{fn}"'
            return res.group(1)
        return cls.from_name_func(path, fnames, _get_label, valid_pct=valid_pct, **kwargs)

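We pass in a pattern, which is compiled to a regex object. Each file name is converted to a POSIX-style string and searched for the pattern; if nothing matches, the assertion fails with a message naming the pattern and the file. Otherwise ".group(1)" returns the text captured by the first pair of parentheses, which is the breed name segment of the file name.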
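
To try the same steps outside of fast.ai, here is a minimal sketch that mirrors the `_get_label` logic using only the standard library. `get_label` and `count_labels` are hypothetical helper names, not fast.ai functions, and the file names are copied from the listing shown earlier.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

def get_label(fn, pat):
    """Return the first capture group of `pat` found in the file name `fn`."""
    compiled = re.compile(pat)
    # Normalise Path-like input to a forward-slash string before searching.
    res = compiled.search(PurePosixPath(fn).as_posix())
    assert res, f'Failed to find "{pat}" in "{fn}"'
    return res.group(1)

def count_labels(fnames, pat):
    """Count how many file names fall under each extracted label."""
    return Counter(get_label(fn, pat) for fn in fnames)

pat = r'/([^/]+)_\d+.jpg$'
fnames = [
    PurePosixPath('/root/.fastai/data/oxford-iiit-pet/images/boxer_67.jpg'),
    PurePosixPath('/root/.fastai/data/oxford-iiit-pet/images/Persian_120.jpg'),
    PurePosixPath('/root/.fastai/data/oxford-iiit-pet/images/Abyssinian_132.jpg'),
]
counts = count_labels(fnames, pat)
print(counts)
```

Run over the full list of file names, a count like this is one way to check the per-breed image totals described in the dataset section.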

In [14]:
# Regex Example:
test_string = '/root/.fastai/data/oxford-iiit-pet/images/Persian_120.jpg'
test_pattern = r'/([^/]+)_\d+.jpg$'
re.split(test_pattern,test_string)

['/root/.fastai/data/oxford-iiit-pet/images', 'Persian', '']

In [0]:
pat = r'/([^/]+)_\d+.jpg$'
data = ImageDataBunch.from_name_re(path_img, fnames, 
                                   pat, 
                                   ds_tfms=get_transforms(), 
                                   size=224, bs=batch_size
                                  ).normalize(imagenet_stats)

data.show_batch(rows=3, figsize=(7,6))
print(data.classes)
len(data.classes),data.c

In [5]:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /root/.cache/torch/checkpoints/resnet34-333f7ec4.pth
100%|██████████| 83.3M/83.3M [00:03<00:00, 25.5MB/s]


In [6]:
# Check out the model architecture
learn.model

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  

In [7]:
learn.fit_one_cycle(4)

epoch,train_loss,valid_loss,error_rate,time
0,1.389594,0.348982,0.11502,01:16
1,0.60649,0.271823,0.093369,01:16
2,0.383173,0.243453,0.079838,01:16
3,0.275358,0.227401,0.079838,01:16
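
fast.ai's `error_rate` metric is one minus accuracy, so the table above can also be read as validation accuracy per epoch. The following is a minimal sketch using only the standard library, with the numbers copied from the output above; the variable names are illustrative.

```python
import csv
import io

# Training log copied verbatim from the fit_one_cycle output above.
training_log = """epoch,train_loss,valid_loss,error_rate,time
0,1.389594,0.348982,0.11502,01:16
1,0.60649,0.271823,0.093369,01:16
2,0.383173,0.243453,0.079838,01:16
3,0.275358,0.227401,0.079838,01:16
"""

rows = list(csv.DictReader(io.StringIO(training_log)))

# accuracy = 1 - error_rate for each epoch
accuracies = [1.0 - float(row['error_rate']) for row in rows]

for row, acc in zip(rows, accuracies):
    print(f"epoch {row['epoch']}: accuracy {acc:.2%}")
```

After four epochs the model classifies roughly 92% of the validation images correctly.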


In [0]:
# Save the results of this learning cycle
learn.save('stage-1')