In [None]:
# running first notebook
# https://docs.fast.ai/tutorial.vision.html
from fastai.vision.all import *

# https://nbviewer.org/github/fastai/fastbook/blob/master/01_intro.ipynb
# dataset called Oxford-IIIT Pet Dataset: 7349 images of cats and dogs from 37 different breeds
path = untar_data(URLs.PETS)

# path = C:\Users\jerem\.fastai\data\oxford-iiit-pet
# path.ls() = [Path('C:/Users/jerem/.fastai/data/oxford-iiit-pet/annotations'), Path('C:/Users/jerem/.fastai/data/oxford-iiit-pet/images')]
print(path.ls())

# get_image_files helps us get all images recursively in one folder
files = get_image_files(path/"images")
print(len(files))
print(files[0])

# based on filenames, cats have capital first letter, dogs have lowercase first letter
def is_cat(filename):
    return filename[0].isupper()

# dataloaders, we need this for feeding data to model
# item_tfms is a Transform applied to all items of dataset
# Resize(224) resizes each image to 224 by 224, cropping the largest dimension to make it a square (a random crop on the training set, a center crop on the validation set)
# so that we can batch items together
dls = ImageDataLoaders.from_name_func(path, fnames=files, valid_pct=0.2, seed=42, label_func=is_cat, item_tfms=Resize(224))

# check if dataloaders look okay
dls.show_batch()

# pretrained model will be used, it will be fine-tuned using transfer learning, to create a model specially for recognizing dogs and cats

# Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to C:\Users\jerem/.cache\torch\hub\checkpoints\resnet34-b627a593.pth
# 100%|██████████| 83.3M/83.3M [00:01<00:00, 78.7MB/s]
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

# epoch 	train_loss 	valid_loss 	error_rate 	time
# 0 	0.134439 	0.029362 	0.008119 	01:16
# epoch 	train_loss 	valid_loss 	error_rate 	time
# 0 	0.052154 	0.020385 	0.006089 	02:17
# error rate is the proportion of images that were incorrectly identified

[Path('C:/Users/jerem/.fastai/data/oxford-iiit-pet/annotations'), Path('C:/Users/jerem/.fastai/data/oxford-iiit-pet/images')]
7390
C:\Users\jerem\.fastai\data\oxford-iiit-pet\images\Abyssinian_1.jpg


epoch,train_loss,valid_loss,error_rate,time
0,0.134439,0.029362,0.008119,01:16


epoch,train_loss,valid_loss,error_rate,time


In [1]:
import torch
print(torch.cuda.is_available())

True


In [1]:
import ipywidgets as widgets
uploader = widgets.FileUpload()
uploader

FileUpload(value=(), description='Upload')

In [2]:
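from fastai.vision.all import PILImage, show_image

print(uploader)

def is_cat_image(image_data):
    upload_img = PILImage.create(image_data)
    show_image(upload_img.to_thumb(224))
    # learn.predict returns (decoded label, label index, probabilities);
    # don't name the first value is_cat, or it shadows the labeling function
    pred, _, probs = learn.predict(upload_img)
    print(f"Is this a cat?: {pred}.")
    # probs[1] is the probability of the True (cat) class
    print(f"Probability it's a cat: {probs[1].item():.6f}")

# In ipywidgets 8, uploads live in .value (a tuple of dicts), not .data;
# each dict's "content" is a memoryview of the raw file bytes
if uploader.value:
    is_cat_image(uploader.value[0]["content"].tobytes())

FileUpload(value=(), description='Upload')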

# What is a neural network?

We would like some function that is so flexible that it could be used to solve any given problem, just by varying its weights.
- this is the neural network
- it is based on a mathematical proof called the universal approximation theorem

We can then focus on finding good weight assignments (training the neural network).

We also need a completely general way to update the weights of a neural network to make it improve at any given task:
- Stochastic gradient descent (SGD)

To recap:
- A neural network is a particular kind of machine learning model
- Neural networks are special because they are highly flexible: they can solve a wide range of problems just by finding the right weights
- Stochastic gradient descent (SGD) finds those weight values automatically
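To make SGD concrete, here is a minimal pure-Python sketch (no fastai or PyTorch; `sgd_fit_line` and the toy data are hypothetical, chosen only for illustration). The "model" is just `y = w*x + b` with two weights. Each step computes the gradient of the mean squared error on a small random batch and nudges the weights against it. A neural network is trained the same way, only with millions of weights and automatic differentiation.

```python
import random

def sgd_fit_line(xs, ys, lr=0.1, epochs=200, batch_size=8, seed=0):
    """Fit y = w*x + b with mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # start from arbitrary weights
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                 # "stochastic": a new random order each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # gradients of mean((w*x + b - y)**2) with respect to w and b
            errs = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(2 * e * xs[i] for e, i in zip(errs, batch)) / len(batch)
            grad_b = sum(2 * e for e in errs) / len(batch)
            # step each weight a little in the direction that reduces the loss
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 10 for i in range(-20, 21)]
ys = [3 * x + 2 for x in xs]   # hypothetical "true" weights: w=3, b=2
w, b = sgd_fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Nothing in the loop knows it is fitting a line; it only follows gradients, which is why the same procedure works for any differentiable model.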

# Limitations inherent to machine learning
- a model cannot be created without data
- a model can only learn to operate on the patterns seen in the input data used to train it
- this learning approach only creates predictions, not recommended actions
- it's not enough to just have examples of input data; we also need labels for that data (e.g. pictures of cats and dogs are not enough to train the model; we need a label for each picture saying which ones are dogs and which are cats)

Generally, when organizations say they don't have enough data, what they mean is that they don't have enough labeled data.

Because models only make predictions, a recommendation system might recommend things the user already knows about, rather than recommendations that are actually useful to them.

How a model interacts with its environment might create feedback loops:
- a predictive policing model is built from where arrests have been made in the past. It is not actually predicting crime, but rather predicting arrests, so it reflects biases in existing policing processes
- Law enforcement officers might use this model to decide where to focus their police activity, resulting in increased arrests in those areas
- Data on these additional arrests would be fed back to retrain future versions of the model

This is a positive feedback loop: the more the model is used, the more biased the data becomes, making the model even more biased, and so forth.

# How our image recognizer works
https://nbviewer.org/github/fastai/fastbook/blob/master/01_intro.ipynb

```python
from fastai.vision.all import *
```
- Gives us all the functions and classes that we need to create a wide variety of computer vision models
- A lot of coders recommend avoiding importing a whole library like this (`import *`) because it can cause problems in large software projects
- For interactive work in a Jupyter notebook, it works great

```python
path = untar_data(URLs.PETS)/'images'
```
- Downloads a standard dataset from the fast.ai datasets collection (if not previously downloaded), extracts it (if not previously extracted), and returns a Path object with the extracted location; here we append `'images'` to point at the image folder
- https://docs.fast.ai/data.external.html#datasets

```python
def is_cat(filename):
    return filename[0].isupper()
```
- We define a function that labels cats based on a filename rule provided by the dataset creators: cat breed filenames start with an uppercase letter, dog breed filenames with a lowercase letter
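A quick check of the labeling rule on a few filenames. Apart from `Abyssinian_1.jpg`, which appears in the earlier notebook output, the names are illustrative and only follow the dataset's naming pattern. As far as I can tell, `from_name_func` hands the label function just the file's name (not the full path), which is why `filename[0]` works; if you call `is_cat` yourself on a `Path`, take `.name` first.

```python
from pathlib import Path

def is_cat(filename):
    # Cat breed files start with an uppercase letter, dog breed files with lowercase
    return filename[0].isupper()

# Illustrative names following the dataset's "Breed_number.jpg" pattern
for name in ["Abyssinian_1.jpg", "Persian_12.jpg", "beagle_1.jpg", "pug_7.jpg"]:
    print(name, is_cat(name))

# With a full path, use .name: str(p)[0] would be the first folder letter instead
p = Path("images") / "Abyssinian_1.jpg"
print(is_cat(p.name))
```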

```python
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224)
)
```
- Tells fastai what kind of dataset we have and how it is structured
- The first part of the class name will generally be the type of data you have (e.g. image or text)
- We need to tell fastai how to get the labels from the dataset. Computer vision datasets are normally structured in a way that the label for an image is part of the filename, or path - commonly the parent folder name
- Defines the Transforms that we need. A Transform contains code that is applied automatically during training. `item_tfms` are applied to each item, while `batch_tfms` are applied to a batch of items at a time using the GPU, so they are particularly fast
    - Why `Resize(224)` for the `item_tfms`? It is the standard size for historical reasons. If you increase the size, you often get a model with better results (it can focus on more details), but at the price of speed and memory consumption. The opposite is true if you decrease the size
- A classification model attempts to predict a class, or category: it predicts from a number of discrete possibilities, e.g. cat or dog
- A regression model attempts to predict one or more numeric quantities, such as a temperature or a location
- The `from_name_func` method tells fastai that the label can be extracted using a function applied to the filename
- `valid_pct=0.2` is the most important parameter here. It tells fastai to hold out 20% of the data and not use it for training the model at all. This is the validation set. The remaining 80% is the training set.
    - The validation set is used to measure the accuracy of the model
    - By default, the 20% validation set is selected randomly
        - `seed=42` sets the random seed to the same value every time we run the code, so we get the same validation set each time. If we change our model and retrain it with the same seed, we know any differences are due to the changes in the model and not due to a different random validation set
- fastai will ALWAYS show you the model's accuracy using the validation set, NEVER the training set. This is critical: if you train a large enough model for long enough, it will eventually memorize the label of every item in your dataset. We only care about how well our model works on previously unseen images
- During training, the longer you train for, the better the accuracy on the training set; the accuracy on the validation set also improves for a while, but eventually it starts to get worse as the model starts to memorize the training set rather than finding generalizable underlying patterns in the data
    - This is when the model is overfitting
- Overfitting is the single most important and challenging issue for all machine learning practitioners, across all algorithms
    - It is easy to create a model that does a great job on the data it was trained on, but much harder to make accurate predictions on data the model has never seen before
    - Overfitting -> validation accuracy getting worse during training
        - You often see practitioners using overfitting avoidance techniques even when they have enough data that they didn't need to do so, ending up with a model that may be less accurate than what they could have achieved
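The idea behind `valid_pct` and `seed` fits in a few lines. This is a minimal sketch, not fastai's implementation (fastai uses its `RandomSplitter`, which differs in details); `random_split` and the stand-in file names are hypothetical. Shuffling with a fixed seed produces the same order every run, so the held-out 20% is the same every run.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle indices and hold out valid_pct of them as a validation set."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)      # same seed -> same shuffle -> same split
    n_valid = int(valid_pct * len(items))
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    return [items[i] for i in train_idx], [items[i] for i in valid_idx]

files = [f"img_{i}.jpg" for i in range(100)]   # stand-in for get_image_files(...)
train_a, valid_a = random_split(files, valid_pct=0.2, seed=42)
train_b, valid_b = random_split(files, valid_pct=0.2, seed=42)
print(len(train_a), len(valid_a))   # 80 20
print(valid_a == valid_b)           # True: reproducible split
```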

> important: Validation Set: When you train a model, you must always have both a training set and a validation set, and must measure the accuracy of your model only on the validation set. If you train for too long, with not enough data, you will see the accuracy of your model start to get worse; this is called overfitting. fastai defaults `valid_pct` to 0.2, so even if you forget, fastai will create a validation set for you!
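Spotting overfitting amounts to watching the validation metric and noticing when it stops improving. fastai has callbacks for this kind of monitoring (for example `EarlyStoppingCallback` and `SaveModelCallback`); the plain-Python sketch below only illustrates the idea, with hypothetical helper names and made-up error rates.

```python
def best_epoch(valid_errors):
    """Index of the epoch with the lowest validation error rate."""
    return min(range(len(valid_errors)), key=lambda i: valid_errors[i])

def looks_overfit(valid_errors, patience=2):
    """True once the validation error hasn't improved for `patience` epochs after its best."""
    return len(valid_errors) - 1 - best_epoch(valid_errors) >= patience

# Hypothetical validation error rates per epoch: improving, then getting worse
valid_errors = [0.080, 0.045, 0.030, 0.027, 0.031, 0.036]
print(best_epoch(valid_errors), looks_overfit(valid_errors))   # 3 True
```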
 
```python
learn = vision_learner(dls, resnet34, metrics=error_rate)
```
- Tells fastai to create a convolutional neural network (CNN) and specifies which architecture to use (i.e. what kind of model to create), what data we want to train it on, and what metric to use
- CNNs are the current state-of-the-art approach to creating computer vision models. They are inspired by how the human vision system works
- In `resnet34`, 34 refers to the number of layers in this variant of the architecture (other options are 18, 50, 101, and 152)
- Models using more layers take longer to train and are more prone to overfitting (i.e. you can't train them for as many epochs before the accuracy on the validation set starts to get worse). On the other hand, when using more data, they can be quite a bit more accurate

## What is a metric?
- A metric is a function that measures the quality of the model's predictions using the validation set. It is printed at the end of each epoch
- In this case, we are using `error_rate`, a function provided by fastai that tells us what percentage of images in the validation set are being classified incorrectly
- Another common metric is accuracy, which is just `1.0 - error_rate`
- The concept of a metric might remind you of loss, but there is an important distinction. The entire purpose of loss is to define a "measure of performance" that the training system can use to update weights automatically. A good choice for loss is one that is easy for stochastic gradient descent to use. A metric, by contrast, is defined for human consumption, so a good metric is one that is easy for you to understand and that hews as closely as possible to what you want the model to do. At times, you might decide that the loss function is a suitable metric, but that is not necessarily the case
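The loss/metric distinction shows up clearly with numbers. This sketch is plain Python, not fastai's `error_rate` (which works on model outputs directly), and the probabilities are made up. Between the two hypothetical training steps the model becomes more confident in its correct answers: the error rate does not move at all, so it gives SGD nothing to follow, while the cross-entropy loss drops smoothly.

```python
import math

def error_rate(pred_labels, targets):
    """Fraction of predictions that don't match the target label."""
    wrong = sum(p != t for p, t in zip(pred_labels, targets))
    return wrong / len(targets)

def accuracy(pred_labels, targets):
    return 1.0 - error_rate(pred_labels, targets)

def binary_cross_entropy(probs_cat, targets):
    """Mean negative log-likelihood; targets are 1 for cat, 0 for dog."""
    total = 0.0
    for p, t in zip(probs_cat, targets):
        total += -math.log(p if t == 1 else 1.0 - p)
    return total / len(targets)

targets = [1, 0, 1, 1]
probs_before = [0.60, 0.40, 0.70, 0.30]   # hypothetical P(cat) before a training step
probs_after = [0.90, 0.10, 0.95, 0.45]    # hypothetical P(cat) after a training step

for probs in (probs_before, probs_after):
    labels = [int(p > 0.5) for p in probs]   # predicted class: cat if P(cat) > 0.5
    print(error_rate(labels, targets), round(binary_cross_entropy(probs, targets), 3))
```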

`vision_learner` has a parameter `pretrained` that defaults to `True`, which sets the weights in your model to values that have already been trained by experts to recognize a thousand different categories across 1.3 million photos (using the ImageNet dataset).
- A model that has weights that have already been trained on some other dataset is called a pretrained model
- You should nearly always use a pretrained model, because it means that your model is already very capable before you have even shown it any of your data
- When using a pretrained model, `vision_learner` will remove the last layer, since that is always customized to the original training task, and replace it with one or more new layers with randomized weights, of an appropriate size for the dataset you are working with. This last part of the model is known as the `head`

Using a pretrained model for a task different from what it was originally trained for is known as `transfer learning`

The sixth line of code tells fastai how to fit the model:
```python
learn.fine_tune(1)
```
- The architecture only describes a template for a mathematical function, it doesn't actually do anything until we provide values for the millions of parameters it contains
- Key to deep learning - determine how to fit the parameters of a model to get it to solve your problem
- In order to fit a model, we have to provide at least one piece of information: how many times to look at each image (known as the number of epochs). The number you select will depend on how much time you have available and how long you find it takes in practice to fit your model. If you select too small a number, you can always train for more epochs later
- Why is it called fine_tune and not fit? There is a fit method that fits a model (i.e look at images in the training set multiple times, each time updating the parameters to make the predictions closer and closer to the target labels). But in this case, we started with a pretrained model, and we don't want to throw away all those capabilities that it already has

Fine-tuning: A transfer learning technique where the parameters of a pretrained model are updated by training for additional epochs using a different task to that used for pretraining

When you use the `fine_tune` method, fastai applies this fine-tuning technique for you.
There are a few parameters you can set, which we will discuss later, but in the default form shown, it does two steps:
1) Use one epoch to fit just those parts of the model necessary to get the new random head (last layer) to work correctly with your dataset
2) Use the number of epochs requested when calling the method to fit the entire model, updating the weights of the later layers (especially the head) faster than the earlier layers (generally don't require many changes from the pretrained weights)
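Step 2 relies on giving different layer groups different learning rates. In fastai you can pass `slice(min_lr, max_lr)` as a learning rate, and the rates are spread across the model's layer groups. The sketch below only illustrates that idea in plain Python: the helper name, the three layer groups, the log-scale spacing, and the 1e-5 to 1e-3 range are illustrative assumptions, not fastai's internals or its `fine_tune` defaults.

```python
def layer_group_lrs(min_lr, max_lr, n_groups):
    """Learning rates spaced evenly on a log scale, earliest layer group first."""
    if n_groups == 1:
        return [max_lr]
    step = (max_lr / min_lr) ** (1 / (n_groups - 1))
    return [min_lr * step ** i for i in range(n_groups)]

# Illustrative: 3 layer groups (early pretrained layers, later pretrained layers, new head)
lrs = layer_group_lrs(1e-5, 1e-3, 3)
for name, lr in zip(["early layers", "later layers", "head"], lrs):
    print(f"{name}: {lr:.0e}")
```

The early layers (generic edge and texture detectors) barely move, while the randomly initialized head gets the largest steps.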

The head of a model is the part that is newly added to be specific to the new dataset

An epoch is one complete pass through the dataset
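One epoch means many weight updates, one per mini-batch. Using the 7390 images counted earlier and `valid_pct=0.2`, the arithmetic looks like this. The batch size of 64 is an assumption (I believe it is fastai's default `bs`, but check your `dls`), and whether a final partial batch is dropped depends on the DataLoader settings, so both cases are shown.

```python
import math

def steps_per_epoch(n_train, batch_size, drop_last=False):
    """Number of mini-batches (weight updates) in one full pass over the training set."""
    return n_train // batch_size if drop_last else math.ceil(n_train / batch_size)

n_items = 7390                   # image count printed earlier for the Pets dataset
n_valid = int(0.2 * n_items)     # valid_pct=0.2 -> 1478 held out
n_train = n_items - n_valid      # 5912 used for training
print(n_train, steps_per_epoch(n_train, 64), steps_per_epoch(n_train, 64, drop_last=True))
```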

After each epoch, the results are printed, showing the epoch number, the training and validation set losses (the "measure of performance" used for training the model), and any metrics you have requested (`error_rate` in this case).

A good rule of thumb for converting a dataset into an image representation: if the human eye can recognize categories from the images, then a deep learning model should be able to do so too.

In general, you'll find that a small number of general approaches in deep learning can go a long way, if you're a bit creative in how you represent your data! You shouldn't think of approaches like the ones described here as "hacky workarounds," because actually they often (as here) beat previously state-of-the-art results. These really are the right ways to think about these problem domains.

`fit_one_cycle` is the most common method for training fastai models from scratch (i.e. without transfer learning)
- e.g. it is used for tabular data, where there isn't really a pretrained model available for the task
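As a rough picture of what "one cycle" refers to: the learning rate warms up from a low value to a maximum, then anneals to a very small value, following cosine curves. This plain-Python sketch is an approximation for intuition only. The defaults used here (`pct_start=0.25`, `div=25`, `div_final=1e5`) reflect my understanding of `fit_one_cycle`'s defaults and should be treated as assumptions; fastai's real scheduler also varies momentum, which is omitted here.

```python
import math

def cos_interp(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.25, div=25.0, div_final=1e5):
    """Learning rate at `step`: warm up to max_lr, then anneal to a tiny value."""
    pos = step / max(1, total_steps - 1)   # training progress in [0, 1]
    if pos < pct_start:
        return cos_interp(max_lr / div, max_lr, pos / pct_start)
    return cos_interp(max_lr, max_lr / div_final, (pos - pct_start) / (1 - pct_start))

lrs = [one_cycle_lr(s, 100, max_lr=1e-3) for s in range(100)]
print(f"start {lrs[0]:.1e}, peak {max(lrs):.1e}, end {lrs[-1]:.1e}")
```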


## Sidebar: Datasets: Food for Models

You’ve already seen quite a few models in this section, each one trained using a different dataset to do a different task. In machine learning and deep learning, we can’t do anything without data. So, the people that create datasets for us to train our models on are the (often underappreciated) heroes. Some of the most useful and important datasets are those that become important academic baselines; that is, datasets that are widely studied by researchers and used to compare algorithmic changes. Some of these become household names (at least, among households that train models!), such as MNIST, CIFAR-10, and ImageNet.

The datasets used in this book have been selected because they provide great examples of the kinds of data that you are likely to encounter, and the academic literature has many examples of model results using these datasets to which you can compare your work.

Most datasets used in this book took the creators a lot of work to build. For instance, later in the book we’ll be showing you how to create a model that can translate between French and English. The key input to this is a French/English parallel text corpus prepared back in 2009 by Professor Chris Callison-Burch of the University of Pennsylvania. This dataset contains over 20 million sentence pairs in French and English. He built the dataset in a really clever way: by crawling millions of Canadian web pages (which are often multilingual) and then using a set of simple heuristics to transform URLs of French content onto URLs pointing to the same content in English.

As you look at datasets throughout this book, think about where they might have come from, and how they might have been curated. Then think about what kinds of interesting datasets you could create for your own projects. (We’ll even take you step by step through the process of creating your own image dataset soon.)

fast.ai has spent a lot of time creating cut-down versions of popular datasets that are specially designed to support rapid prototyping and experimentation, and to be easier to learn with. In this book we will often start by using one of the cut-down versions and later scale up to the full-size version (just as we're doing in this chapter!). In fact, this is how the world’s top practitioners do their modeling in practice; they do most of their experimentation and prototyping with subsets of their data, and only use the full dataset when they have a good understanding of what they have to do.


## Validation Sets and Test Sets

Each of the models we trained showed a training and validation loss. A good validation set is one of the most important pieces of the training process.

If we trained our model with all of our data and then evaluated the model using the same data, we wouldn't be able to tell how well our model performs on data it hasn't seen.

To avoid this, we first split our dataset into two sets: the training set (which the model sees in training) and the validation set (also called the development set), which is only used for evaluation
- This lets us test that the model learns lessons from the training data that generalize to new data (the validation data)

Splitting off our validation data means our model never sees it in training and so is completely untainted by it and is in no way cheating right?
- NO, not necessarily. In realistic scenarios, we rarely build a model just by training its weight parameters once. We likely explore many versions of a model through various modelling choices like network architecture, learning rates, data augmentation strategies and other factors. Many of these choices are described as choices of *hyperparameters*
- Hyperparameters -> parameters about parameters. They are the higher level choices that govern the meaning of the weight parameters

The problem is that even though the ordinary training process is only looking at predictions on the training data when it learns values for the weight parameters, the same is not true of us. We, as modelers, are evaluating the model by looking at predictions on the validation data when we decide to explore new hyperparameter values! So subsequent versions of the model are, indirectly, shaped by us having seen the validation data. Just as the automatic training process is in danger of overfitting the training data, we are in danger of overfitting the validation data through human trial and error and exploration.

The solution is to introduce another level of even more highly reserved data -> test set
Just as we hold back the validation data from the training process, we must hold back the test data set from even ourselves
IT CANNOT BE USED TO IMPROVE THE MODEL, it can only be used to evaluate the model at the end of our efforts
We define a hierarchy of cuts of our data, based on how fully we want to hide it from training and modeling processes.
- Training data is fully exposed, validation data is less exposed, test data is totally hidden
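The hierarchy of cuts can be sketched as carving off the test set first and only then splitting the remainder into training and validation sets. This is a plain-Python illustration; the helper name and the 70/20/10 proportions are arbitrary assumptions, not a recommendation from the text.

```python
import random

def three_way_split(items, valid_pct=0.2, test_pct=0.1, seed=0):
    """Carve off a test set first, then split the rest into train and validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(test_pct * len(shuffled))
    n_valid = int(valid_pct * len(shuffled))
    test_set = shuffled[:n_test]                      # locked away until the very end
    valid_set = shuffled[n_test:n_test + n_valid]     # used while tuning hyperparameters
    train_set = shuffled[n_test + n_valid:]           # used to fit the weights
    return train_set, valid_set, test_set

train_set, valid_set, test_set = three_way_split(range(1000))
print(len(train_set), len(valid_set), len(test_set))   # 700 200 100
```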

The test and validation sets should have enough data to ensure that you get a good estimate of your accuracy,
e.g. if you're creating a cat detector, you generally want at least 30 cats in your validation set.
So if you have thousands of items in your dataset, a 20% validation set may be more than you need.
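One way to see why a few hundred validation items can be enough: under a simple binomial approximation (my addition, assuming validation items are independent), the uncertainty of a measured error rate shrinks with the square root of the validation set size. The 10% error rate is an arbitrary example.

```python
import math

def error_rate_std_err(error_rate, n_valid):
    """Approximate standard error of an error rate measured on n_valid items (binomial)."""
    return math.sqrt(error_rate * (1 - error_rate) / n_valid)

for n in (30, 300, 3000):
    print(n, round(error_rate_std_err(0.1, n), 4))
```

Going from 30 to 300 items cuts the uncertainty by about a factor of three; going from 300 to 3000 buys the same relative improvement for ten times more held-out data.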

Having two levels of "reserved data" - test set (totally hidden until final test) and validation set for tuning, may seem extreme. It is usually necessary as models tend to gravitate towards the simplest way to do good predictions (memorization) and we as fallible humans tend to gravitate toward fooling ourselves about how well our models are performing. The discipline of the test set helps us keep ourselves intellectually honest. That doesn't mean we always need a separate test set—if you have very little data, you may need to just have a validation set—but generally it's best to use one if at all possible.

This same discipline can be critical if you intend to hire a third party to perform modeling work on your behalf. A third party might not understand your requirements accurately, or their incentives might even encourage them to misunderstand them. A good test set can greatly mitigate these risks and let you evaluate whether their work solves your actual problem.

To put it bluntly, if you're a senior decision maker in your organization (or you're advising senior decision makers), the most important takeaway is this: if you ensure that you really understand what test and validation sets are and why they're important, then you'll avoid the single biggest source of failures we've seen when organizations decide to use AI. For instance, if you're considering bringing in an external vendor or service, make sure that you hold out some test data that the vendor never gets to see. Then you check their model on your test data, using a metric that you choose based on what actually matters to you in practice, and you decide what level of performance is adequate. (It's also a good idea for you to try out some simple baseline yourself, so you know what a really simple model can achieve. Often it'll turn out that your simple model performs just as well as one produced by an external "expert"!)

## Use Judgement in Defining Test Sets
To define a good validation set and test set, you will sometimes want to do more than just randomly grab a fraction of your original dataset.
The key property is that they have to be representative of the new data you will see in the future.
Many example cases come from predictive modelling competitions on the Kaggle platform.

One case is time series data. For time series, choosing a random subset of the data will be both too easy (you can look at the data both before and after the dates you are trying to predict) and not representative of most business use cases (using historical data to build a model for use in the future).
If your data includes the date and you are building a model to use in the future, you will want to choose a continuous section with the latest dates as your validation set (e.g. the last two weeks or last month of available data).

Use the earlier data as your training set, and the later data for the validation set.
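A date-based split can be sketched in plain Python. The record layout (a dict with a `"date"` key) and the 90 days of fake sales data are hypothetical; the point is that every validation date comes after every training date.

```python
from datetime import date, timedelta

def time_split(records, valid_days=14):
    """Use the latest `valid_days` of data for validation, everything earlier for training."""
    last_day = max(r["date"] for r in records)
    cutoff = last_day - timedelta(days=valid_days)
    train_rows = [r for r in records if r["date"] <= cutoff]
    valid_rows = [r for r in records if r["date"] > cutoff]
    return train_rows, valid_rows

start = date(2017, 1, 1)
records = [{"date": start + timedelta(days=i), "sales": 100 + i} for i in range(90)]
train_rows, valid_rows = time_split(records, valid_days=14)
print(len(train_rows), len(valid_rows))
print(max(r["date"] for r in train_rows) < min(r["date"] for r in valid_rows))
```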

For example, Kaggle had a competition to predict the sales in a chain of Ecuadorian grocery stores. Kaggle's training data ran from Jan 1 2013 to Aug 15 2017, and the test data spanned Aug 16 2017 to Aug 31 2017. That way, the competition organizer ensured that entrants were making predictions for a time period that was in the future, from the perspective of their model. This is similar to the way quant hedge fund traders do back-testing to check whether their models are predictive of future periods, based on past data.



A second common case is when you can easily anticipate ways the data you will be making predictions for in production may be qualitatively different from the data you have to train your model with.

In the Kaggle distracted driver competition, the independent variables are pictures of drivers at the wheel of a car, and the dependent variables are categories such as texting, eating, or safely looking ahead. Lots of pictures are of the same drivers in different positions, as we can see in <<img_driver>>. If you were an insurance company building a model from this data, note that you would be most interested in how the model performs on drivers it hasn't seen before (since you would likely have training data only for a small group of people). In recognition of this, the test data for the competition consists of images of people that don't appear in the training set.
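The same principle for the driver case means splitting by person rather than by image, so that no driver appears in both sets. This is a plain-Python sketch; the `(driver_id, image_name)` pairs and the `group_split` helper are hypothetical.

```python
import random

def group_split(items, group_of, valid_pct=0.2, seed=0):
    """Hold out whole groups (e.g. drivers) so no group appears in both sets."""
    groups = sorted({group_of(x) for x in items})
    random.Random(seed).shuffle(groups)
    n_valid = max(1, int(valid_pct * len(groups)))
    valid_groups = set(groups[:n_valid])
    train_items = [x for x in items if group_of(x) not in valid_groups]
    valid_items = [x for x in items if group_of(x) in valid_groups]
    return train_items, valid_items

# Hypothetical (driver_id, image_name) pairs: 10 drivers, 5 images each
items = [(f"driver_{d}", f"img_{d}_{k}.jpg") for d in range(10) for k in range(5)]
train_items, valid_items = group_split(items, group_of=lambda x: x[0])
train_drivers = {d for d, _ in train_items}
valid_drivers = {d for d, _ in valid_items}
print(len(train_items), len(valid_items), train_drivers.isdisjoint(valid_drivers))
```

A random split by image would put photos of the same drivers on both sides, making the validation score look better than it would be on new drivers.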
