# Welcome to Computer Vision! #

Have you ever wanted to teach a computer to see? In this micro-course, that's exactly what you'll do!

<center>
<!-- <img src="./images/1-header.png" width="1600" alt="Header illustration: a line of cars."> -->
<img src="" width="1600" alt="Header illustration">
</center>

In this micro-course, you'll:
- Use modern deep-learning networks to build an **image classifier** with Keras!
- Design your own **custom convnet** with reusable blocks!
- Learn the fundamental ideas behind visual **feature extraction**!
- Master the art of **transfer learning** to boost your models!
- Utilize **data augmentation** to extend your dataset!

If you've taken the *Introduction to Deep Learning* micro-course, you'll know everything you need to be successful.

Now let's get started!

# Introduction #

This course will introduce you to the fundamental ideas of computer vision. Our goal is to learn how a neural network can "understand" a natural image well enough to solve the same kinds of problems the human visual system can solve.

The neural networks that are best at this task are called **convolutional neural networks**. (Sometimes we say **convnet** or **CNN** instead.) Convolution is the mathematical operation that gives the layers of a convnet their unique structure. In future lessons, you'll learn why this structure is so effective at solving computer vision problems.

We will apply these ideas to the problem of **image classification**: given a picture, can we train a computer to tell us what it's a picture *of*? You may have seen [apps](https://identify.plantnet.org/) that can identify a species of plant from a photograph. That's an image classifier! In this micro-course, you'll learn how to build image classifiers with the same techniques that power professional applications like these.

While our focus will be on image classification, what you'll learn in this micro-course is relevant to every kind of computer vision problem. At the end, you'll be ready to move on to more advanced applications like [generative adversarial networks](https://www.kaggle.com/tags/gan) and [image segmentation](https://www.kaggle.com/tags/object-segmentation).

# The Convolutional Classifier #

A convnet used for image classification consists of two parts: a **convolutional base** and a **dense head**.

<center>
<!-- <img src="./images/1-parts-of-a-convnet.png" width="600" alt="The parts of a convnet: image, base, head, class; input, extract, classify, output.">-->
<img src="https://i.imgur.com/U0n5xjU.png" width="600" alt="The parts of a convnet: image, base, head, class; input, extract, classify, output.">
</center>

The base is used to **extract the features** from an image. It is formed primarily of layers performing the convolution operation, but often includes other kinds of layers as well. (You'll learn about these in the next lesson.)

The head is used to **determine the class** of the image. It is formed primarily of dense layers, but might include other layers like dropout. 

What do we mean by a visual feature? A feature could be a line, a color, a texture, a shape, a pattern, or some complicated combination of these.

The whole process goes something like this:

<center>
<!-- <img src="./images/1-extract-classify.png" width="600" alt="The idea of feature extraction."> -->
<img src="https://i.imgur.com/UUAafkn.png" width="600" alt="The idea of feature extraction.">
</center>

The features a network actually extracts look a bit different, but this illustration gives the idea.
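To make "feature" a little more concrete, here is a toy sketch in NumPy: a hand-picked 3x3 filter that responds strongly to vertical lines, slid across a tiny made-up image. In a real convnet the base *learns* its filters during training; the filter values, the image, and the `detect` helper are illustrative assumptions, not course code. Deep learning libraries call this sliding-window operation convolution (strictly, it is cross-correlation, since the filter isn't flipped).

```python
import numpy as np

# A tiny made-up 5x5 grayscale image with one bright vertical line in column 2
image = np.zeros((5, 5))
image[:, 2] = 1.0

# Hand-picked 3x3 filter: rewards a bright center column, penalizes bright sides
kernel = np.array([
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
])

def detect(image, kernel):
    """Slide the kernel over the image (no padding) and record the response at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

feature_map = detect(image, kernel)
print(feature_map)  # every row is [-3, 6, -3]: the strongest response sits over the line
```

The output is a small "feature map" whose large values mark where the vertical-line feature was found.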

# Training the Classifier #

The goal of the network during training is to learn two things:
1. which features to extract from an image (base),
2. which class goes with what features (head).

These days, convnets are rarely trained from scratch. More often, we **reuse the base of a pretrained model**.

To the pretrained base we then **attach an untrained head**. Because the base has already learned to extract useful features, we only need to train the head to classify the images in the new dataset.

<center>
<!-- <img src="./images/1-attach-head-to-base.png" width="400" alt="Attaching a new head to a trained base."> -->
<img src="https://imgur.com/E49fsmV.png" width="400" alt="Attaching a new head to a trained base.">
</center>
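The idea of training only the head can be sketched in plain NumPy. This is a toy under loud assumptions: the "base" is a fixed random projection followed by ReLU (a stand-in, *not* a pretrained network, so its features are far less useful than a real convnet's), the data are two made-up Gaussian clusters, and the "head" is a single sigmoid unit trained by gradient descent on binary cross-entropy. The point is that only the head's weights change; the base's weights are never touched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up two-class data: 200 points per class in 20 dimensions
n, d, n_features = 200, 20, 64
X = np.vstack([rng.normal(-1.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Stand-in "base": a FIXED random projection + ReLU (frozen, never updated below)
W_base = rng.normal(0.0, 1.0 / np.sqrt(d), (d, n_features))
W_base_before = W_base.copy()

def base(X, W):
    return np.maximum(X @ W, 0.0)

def head(F, w, b):
    """Sigmoid unit: probability of class 1 given base features F."""
    return 1.0 / (1.0 + np.exp(-(F @ w + b)))

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

F = base(X, W_base)  # features computed once, since the base is frozen
w_head, b_head = np.zeros(n_features), 0.0
loss_start = bce(head(F, w_head, b_head), y)

for step in range(2000):
    p = head(F, w_head, b_head)
    grad_w = F.T @ (p - y) / len(y)  # gradient of mean BCE w.r.t. head weights
    grad_b = np.mean(p - y)          # gradient w.r.t. head bias
    w_head = w_head - 0.05 * grad_w
    b_head = b_head - 0.05 * grad_b

loss_end = bce(head(F, w_head, b_head), y)
accuracy = np.mean((head(F, w_head, b_head) > 0.5) == y)
print(f"loss {loss_start:.3f} -> {loss_end:.3f}, training accuracy {accuracy:.2f}")
```

In Keras, the same separation comes from setting `trainable = False` on the pretrained base, as the example later in this lesson does.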

Because the head usually consists of only a few dense layers, very accurate classifiers can be created from relatively little data. 
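One way to see why a small head needs less data is to count parameters. The sketch below tallies VGG16's thirteen 3x3 convolution layers (its pooling layers have no weights) and the head used in the example later in this lesson, assuming 192x192 inputs so that the base's five 2x2 poolings leave a 6x6x512 feature map. Parameter count is only a rough proxy for how much data a model needs, and we assume the course's saved base follows the standard VGG16 layout.

```python
def conv3x3_params(c_in, c_out):
    """Weights plus biases of a 3x3 convolution: (3*3*c_in + 1) * c_out."""
    return (3 * 3 * c_in + 1) * c_out

# VGG16's convolution layers as (in_channels, out_channels), block by block
vgg16_convs = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]
base_params = sum(conv3x3_params(c_in, c_out) for c_in, c_out in vgg16_convs)

# Head: Flatten (no weights) -> Dense(6) -> Dense(1), weights plus biases
flat = 6 * 6 * 512
head_params = (flat * 6 + 6) + (6 * 1 + 1)

print(f"base: {base_params:,}  head: {head_params:,}")
print(f"trainable fraction: {head_params / (base_params + head_params):.4f}")
```

Under these assumptions the head has 110,605 trainable parameters against 14,714,688 frozen ones in the base, well under 1% of the total.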

Reusing a pretrained model is a technique known as **transfer learning**. It is so effective that almost every image classifier these days makes use of it.

# Example - Train a Convnet Classifier #

Let's walk through an example. The dataset we'll use is derived from the **Stanford Cars** dataset. Our dataset contains about 10,000 images of various automobile models. Our task will be to determine whether an image shows a **Car** or a **Truck**.

The steps are essentially the same as those you learned in the introductory course.

## Step 1 - Load Data ##

In [None]:
#$HIDE_INPUT$
# Imports
import os
import warnings
import numpy as np
import visiontools
from visiontools import StanfordCars
import tensorflow as tf
import tensorflow_datasets as tfds

# Reproducibility
def seed_everything(seed=31415):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'

seed = 31415
seed_everything(seed)
warnings.filterwarnings("ignore")

# Load training and validation sets
DATA_DIR = '/kaggle/input/stanford-cars-for-learn/'
(ds_train_, ds_valid_), ds_info = tfds.load(
    'stanford_cars/simple',
    split=['train', 'test'],
    shuffle_files=True,
    with_info=True,
    data_dir=DATA_DIR,
)
print(("Loaded {} training examples " +
       "and {} validation examples " +
       "with classes {}.").format(
           ds_info.splits['train'].num_examples,
           ds_info.splits['test'].num_examples,
           ds_info.features['label'].names))

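# Create data pipeline
BATCH_SIZE = 16
AUTO = tf.data.experimental.AUTOTUNE
SIZE = [192, 192]
preprocess = visiontools.make_preprocessor(size=SIZE)

ds_train = (ds_train_
            .map(preprocess)  # apply the course's preprocessing to each example
            .cache()  # keep preprocessed examples in memory after the first pass
            .shuffle(ds_info.splits['train'].num_examples)  # reshuffle every epoch
            .batch(BATCH_SIZE)
            .prefetch(AUTO))  # prepare the next batch while the current one trains

# No shuffle for validation: evaluation results don't depend on example order
ds_valid = (ds_valid_
            .map(preprocess)
            .cache()
            .batch(BATCH_SIZE)
            .prefetch(AUTO))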

The first step is to prepare your dataset. We've already preloaded a training split `ds_train` and a validation split `ds_valid`, so let's skip the details of loading data for now and look at a few examples in the training split.
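The preprocessing itself is handled by the course's `visiontools.make_preprocessor` helper, whose internals aren't shown here. As a rough, hypothetical stand-in (our assumption, not the helper's actual code), a `tf.data` preprocessing function could turn each `tfds` example dict into an `(image, label)` pair by scaling pixels to `[0, 1]` and resizing to `SIZE`. Whether the course's pretrained base expects `[0, 1]` inputs is also an assumption.

```python
import tensorflow as tf

def make_preprocessor_sketch(size=(192, 192)):
    """Hypothetical stand-in for visiontools.make_preprocessor (not its actual code)."""
    def preprocess(example):
        # Without as_supervised=True, tfds yields each example as a dict of features
        image = tf.image.convert_image_dtype(example['image'], tf.float32)  # uint8 -> float32 in [0, 1]
        image = tf.image.resize(image, size)  # fixed height and width for batching
        label = tf.cast(example['label'], tf.float32)  # 0/1 label as float, matching the sigmoid output
        return image, label
    return preprocess
```

It would be used the same way as the course helper, e.g. `ds_train_.map(make_preprocessor_sketch(SIZE))`.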

In [None]:
#$HIDE_INPUT$
import matplotlib.pyplot as plt
tfds.show_examples(ds_train_, ds_info);

## Step 2 - Define Pretrained Base ##

The most commonly used dataset for pretraining is [*ImageNet*](http://image-net.org/about-overview), a large dataset of many kinds of natural images. Keras includes a variety of models pretrained on ImageNet in its [`applications` module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

The pretrained model we'll use is called **VGG16**.

In [None]:
pretrained_base = tf.keras.models.load_model(
    '/kaggle/input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
# Freeze the base: its weights won't be updated during training
pretrained_base.trainable = False
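The course loads a saved copy of the base from a Kaggle dataset. Outside Kaggle, a comparable frozen base can be sketched with `tf.keras.applications.VGG16`. This assumes the saved model is VGG16 without its top dense layers, which the lesson doesn't state explicitly, and `make_vgg16_base` is a hypothetical helper name.

```python
import tensorflow as tf

def make_vgg16_base(input_shape=(192, 192, 3), weights="imagenet"):
    """Hypothetical helper: VGG16's convolutional base, frozen for transfer learning."""
    base = tf.keras.applications.VGG16(
        include_top=False,        # drop VGG16's own dense classifier head
        weights=weights,          # "imagenet" downloads pretrained weights; None gives random init
        input_shape=input_shape,
    )
    base.trainable = False        # freeze every layer in the base
    return base
```

For example, `pretrained_base = make_vgg16_base(input_shape=(*SIZE, 3))`. Keras's VGG16 expects inputs prepared with `tf.keras.applications.vgg16.preprocess_input`, which may not match the course's preprocessing, so check this before swapping bases.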

## Step 3 - Attach Head ##

Next, we attach the classifier head. For this example, we'll use a layer of hidden units (the first `Dense` layer) followed by a layer that transforms the outputs into a probability score for class 1, `Truck`. The `Flatten` layer transforms the multidimensional outputs of the base into the one-dimensional inputs needed by the head.

In [None]:
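from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    pretrained_base,                        # frozen feature extractor
    layers.Flatten(),                       # unroll the base's output into one vector
    layers.Dense(6, activation='relu'),     # hidden layer
    layers.Dense(1, activation='sigmoid'),  # probability of class 1 (Truck)
])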
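The two shape-related steps in this head can be illustrated with NumPy. Assuming 192x192 inputs, the VGG16 base outputs a 6x6x512 feature map per image; `Flatten` unrolls it into a vector of 18,432 values, the same row-major reshape shown here. The final sigmoid unit outputs a probability for class 1 (`Truck`); `to_class_names` is a hypothetical helper that turns such probabilities into labels with a 0.5 threshold, assuming class 0 is `Car`.

```python
import numpy as np

# Stand-in for a batch of base outputs: 2 images, each a 6x6 grid of 512 feature channels
features = np.random.default_rng(0).random((2, 6, 6, 512))

# What Flatten does: keep the batch axis, unroll everything else into one vector
flat = features.reshape(features.shape[0], -1)
print(flat.shape)  # (2, 18432)

def to_class_names(probs, names=("Car", "Truck"), threshold=0.5):
    """Map sigmoid outputs P(class 1) to class names (hypothetical helper)."""
    return [names[int(p > threshold)] for p in probs]

print(to_class_names([0.12, 0.87]))  # ['Car', 'Truck']
```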

## Step 4 - Train ##

Finally, we'll train the model. First, compile the model with the training parameters: an `optimizer`, a `loss` function, and any desired `metrics`. (Review Lesson 5 of *Introduction to Deep Learning* if this is unfamiliar.)

In [None]:
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=10,
)
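For reference, here is roughly what the `'binary_crossentropy'` loss and the `'accuracy'` metric compute for this sigmoid-output model, written out in NumPy. It is a simplified sketch: Keras likewise clips probabilities by a small epsilon and, for a single sigmoid output with binary cross-entropy, treats `'accuracy'` as binary accuracy with a 0.5 threshold, but its implementation handles more details (sample weights, reductions) not shown here. The labels and probabilities are made up for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples where (p > threshold) matches the 0/1 label."""
    return float(np.mean((np.asarray(y_prob) > threshold) == np.asarray(y_true)))

y_true = [0, 1, 1, 0]          # 0 = Car, 1 = Truck
y_prob = [0.1, 0.8, 0.4, 0.3]  # made-up sigmoid outputs
print(binary_crossentropy(y_true, y_prob))
print(binary_accuracy(y_true, y_prob))  # 0.75
```

Note that the third example counts as wrong for accuracy but still contributes a moderate loss; the loss rewards confident correct probabilities, while accuracy only checks which side of 0.5 each prediction falls on.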

## Step 5 - Evaluate ##

When training a neural network, it's always a good idea to examine the loss and metric plots. The `history` object contains this information in a dictionary `history.history`. We can use Pandas to convert this dictionary to a dataframe and plot it with a built-in method.

In [None]:
import pandas as pd

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['accuracy', 'val_accuracy']].plot();
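Beyond eyeballing the plots, the same dataframe can be queried directly. The sketch below uses a hypothetical `history_dict` with made-up values and the same keys `history.history` has here, finds the epoch with the lowest validation loss, and measures the final gap between validation and training loss; a widening gap is a common sign of overfitting.

```python
import pandas as pd

# Hypothetical history with made-up values, keyed like history.history
history_dict = {
    "loss":         [0.60, 0.45, 0.38, 0.33, 0.30],
    "val_loss":     [0.55, 0.47, 0.44, 0.45, 0.48],
    "accuracy":     [0.68, 0.79, 0.83, 0.86, 0.88],
    "val_accuracy": [0.72, 0.78, 0.80, 0.80, 0.79],
}
frame = pd.DataFrame(history_dict)

# Epoch (0-based row index) with the lowest validation loss
best_epoch = int(frame["val_loss"].idxmin())
print(best_epoch, frame.loc[best_epoch, ["val_loss", "val_accuracy"]].to_dict())

# Gap between validation and training loss at the last epoch
gap = frame["val_loss"].iloc[-1] - frame["loss"].iloc[-1]
print(f"final val_loss - loss gap: {gap:.2f}")
```

The `idxmin` and gap computations work unchanged on the real `history_frame`. To have Keras keep the best weights automatically, `tf.keras.callbacks.EarlyStopping(monitor='val_loss', restore_best_weights=True)` can be passed to `model.fit` through its `callbacks` argument.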

# Conclusion #

In this lesson, we learned about the structure of a convnet classifier: a **head** that acts as a classifier atop a **base** that performs the feature extraction.

The head, essentially, is an ordinary classifier like the ones you learned about in the introductory course. As its input features, it uses the features extracted by the base. This is the basic idea behind CNN image classifiers: we can attach a unit that performs feature engineering to the classifier itself.

This is one of the big advantages deep neural networks have over traditional machine learning models: given the right network structure, the deep neural net can learn how to engineer the features it needs to solve its problem.

Over the rest of this course, we're going to explore the convolutional base and how it performs this feature engineering. Then you'll learn how to apply these ideas to design classifiers that perform even better than the one from this lesson.