[View in Colaboratory](https://colab.research.google.com/github/Giffy/CarCrashDetector/blob/master/3_Model_Train.ipynb)

# Index
<ol>
    <li><a href="#system_setup">System setup</a>
    <li><a href="#env_dataset">Environment setup and Dataset overview</a>
    <li><a href="#GPU_RAM">Checking RAM available in GPU</a>
    <li><a href="#model_training">Model training</a>
    <li><a href="#model_analysis">Model analysis</a>
</ol>

<a id="system_setup"> </a>
# 1. System setup

## 1.1 System info and check

A GPU is required to train the model, so we start by checking that one is available.

If an error message appears, open the 'Runtime' menu in Colab, choose 'Change runtime type', and change the hardware accelerator from None to GPU.

In [0]:
# Check if GPU is available
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found, to activate the GPU go to menu "Runtime" and submenu "Change runtime type", then change hardware accelerator from None to GPU.')
print('Found GPU at: {}'.format(device_name))

## 1.2 Link Google Drive with Colab

Run the code and follow the link to get an authentication key, then copy it and paste it into the box that appears in the notebook. After the first key, the script will ask for a second authentication key; follow the same process.



In [0]:
# Check whether the link to Drive is OK
google = !if [ -d 'mydrive/' ]; then echo "1" ; else echo "0"; fi
if google[0] == '0':   # compare by value; `is` would test object identity
  from google.colab import drive
  drive.mount('/content/mydrive')
!if [ -d 'mydrive/' ]; then echo "Connection to Google Drive successful" ; else echo "Error connecting to Google Drive"; fi

In [0]:
#!kill -9 -1

## 1.3 Install and update python libraries
Set up Python by installing the required modules:
<ol>
   <li>Updated Python package manager (pip)
   <li>SciPy 1.0.0
   <li>PyTorch 0.3.0 and Torchvision
   <li>Pillow 4.1.1 (required for the fastai library)
   <li>Image
   <li>fastai 0.7.0
</ol>


In [0]:
!pip install --upgrade pip  > /dev/null
!pip install scipy==1.0.0 > /dev/null
!pip install http://download.pytorch.org/whl/cu75/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl  && pip install torchvision
#!pip install Pillow==4.0.0 > /dev/null
!pip install Pillow==4.1.1 > /dev/null
#!pip install PIL  > /dev/null
!pip install image  > /dev/null
!pip install fastai==0.7.0  > /dev/null
!apt update && apt install -y libsm6 libxext6 > /dev/null

<a id="env_dataset"> </a>
# 2. Environment setup & dataset overview

## 2.1 Dataset location and folder check / creation
Define the dataset location path


In [0]:
PATH = "mydrive/CarCrashDetection/Dataset/"
!if [ -d 'mydrive/CarCrashDetection/Dataset/models' ]; then echo "Directory models already exists" ; else mkdir 'mydrive/CarCrashDetection/Dataset/models' && echo "Directory models created"; fi
!ls {PATH}

```
./dataset    
│
└─── models
│
└─── train      (80% of supervised image set)
│   └─  accident
│   │    └─ frame001.jpg
│   │       frame009.jpg
│   └─  no_accident
│        └─ frame006.jpg
│           frame052.jpg
│
└─── valid     (20% of supervised image set)
     └─  accident
     │    └─ frame041.jpg
     │       frame037.jpg
     └─  no_accident
          └─ frame025.jpg
             frame068.jpg

```
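
As a complement to the tree above, here is a minimal sketch of how the same layout could be checked from plain Python instead of shell commands. The helper name `count_images` and the `*.jpg` pattern are assumptions made for illustration; only the `train`/`valid` and `accident`/`no_accident` folder names come from the tree.

```python
from pathlib import Path

# Split and class folder names, taken from the directory tree above.
SPLITS = ("train", "valid")
CLASSES = ("accident", "no_accident")

def count_images(root, pattern="*.jpg"):
    """Return {(split, class): number of matching files}.

    Hypothetical helper: a missing folder is reported as 0 rather than raising.
    """
    root = Path(root)
    counts = {}
    for split in SPLITS:
        for cls in CLASSES:
            folder = root / split / cls
            counts[(split, cls)] = len(list(folder.glob(pattern))) if folder.is_dir() else 0
    return counts
```

Calling `count_images("mydrive/CarCrashDetection/Dataset/")` should return one entry per split and class; a count of 0 suggests that a folder is missing or empty.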

## 2.2 Dataset overview
Quick check of dataset
### Number of training examples by categories

In [0]:
valid_accident = !ls {PATH}valid/accident 
valid_no_accident = !ls {PATH}valid/no_accident
train_accident = !ls {PATH}train/accident
train_no_accident = !ls {PATH}train/no_accident

print(f"Number of valid images with accident: {len(valid_accident)}")
print(f"Number of valid images with no_accident: {len(valid_no_accident)}")
print(f"Number of train images with accident: {len(train_accident)}")
print(f"Number of train images with no_accident: {len(train_no_accident)}")

## 2.3 Quick look at a no_accident image

In [0]:
import matplotlib.pyplot as plt
files = !ls {PATH}valid/no_accident | head
img = plt.imread(f'{PATH}valid/no_accident/{files[5]}')
plt.imshow(img);
img.shape

<a id="GPU_RAM"> </a>
# 3. Checking RAM available in GPU

## 3.1 Checking system configuration and available RAM
The system needs enough RAM to train the model, so let's check how much is available.

In [0]:
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil  > /dev/null
!pip install psutil  > /dev/null
!pip install humanize  > /dev/null

In [0]:
import psutil
import humanize
import os
import GPUtil as GPU

In [0]:
GPUs = GPU.getGPUs()
# XXX: only one GPU on Colab and isn't guaranteed
gpu = GPUs[0]

def printm():
  process = psutil.Process(os.getpid())
  print("Gen RAM Free: " + humanize.naturalsize( psutil.virtual_memory().available ), " | Proc size: " + humanize.naturalsize( process.memory_info().rss))
  print('GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB'.format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()

## 3.2 If free GPU RAM is below 600 MB, uncomment !pkill python3

In [0]:
#@title
# Uncomment the line below if you need to reset the GPU to recover RAM
#!pkill python3

<a id="model_training"> </a>
# 4. Model training

Building a successful neural network is an iterative process. We shouldn't expect to come up with a magical idea that will make a great network from the start. 

In [0]:
# Uncomment the below if you need to reset your precomputed activations
#!rm -rf {PATH}tmp

## 4.1 Load libraries
Import the fastai library

In [0]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [0]:
# This file contains all the main libs we'll use
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

In [0]:
PATH = "mydrive/CarCrashDetection/Dataset/"      # Use the same PATH as in Environment setup

## 4.2  Architecture
The chosen architecture is resnet34: it comes pretrained on the large ImageNet dataset and it is not too complex.

In [0]:
arch = resnet34

## 4.3  Size
Resnet34 was trained mostly on images between 224×224 and 299×299 pixels. For that reason, resizing the training images to a size in that range should give decent results.

-  The size can be reduced to lower the risk of a runtime error on Colab caused by a GPU memory shortage

In [0]:
size = 299

## 4.4 Batch size
Batch size defines how many images are used to compute the approximate gradient at each step of stochastic gradient descent.<br>
If it's too big, training will take a long time to converge; if it's too small, the gradient estimates won't be precise enough and training may not converge.<br>
From what I've seen, a value around 64 is a reasonable choice; this notebook uses 60.

In [0]:
batch_size = 60
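
To illustrate the trade-off described in this section, here is a small sketch that measures how noisy a mini-batch gradient estimate is for different batch sizes. It uses a made-up one-dimensional linear regression problem in NumPy rather than the crash dataset, and the function name `gradient_noise` and all of its parameters are assumptions for illustration only.

```python
import numpy as np

def gradient_noise(batch_size, n_samples=10_000, n_trials=200, seed=0):
    """Standard deviation of the mini-batch gradient of a squared-error loss.

    Toy setup (not the notebook's model): y = 3x + noise, weight guess w = 0.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_samples)
    y = 3.0 * x + rng.normal(scale=0.5, size=n_samples)
    w = 0.0
    grads = []
    for _ in range(n_trials):
        # Draw one random mini-batch without replacement.
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grads.append(np.mean(2.0 * (w * x[idx] - y[idx]) * x[idx]))
    return float(np.std(grads))

for bs in (4, 64, 1024):
    print(bs, gradient_noise(bs))
```

The spread of the estimate shrinks as the batch grows, which is why very small batches give imprecise gradients, while larger batches cost more computation per step.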

## 4.5 Image classification with Convolutional Neural Networks


In [0]:
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, size), bs=batch_size)
try:
  learn = ConvLearner.pretrained(arch, data, precompute=True)                   # On Colab this can fail with a runtime error caused by a GPU memory shortage
except RuntimeError:
  learn = ConvLearner.pretrained(arch, data, precompute=True)                   # Launched again, it continues and finishes precomputing properly

## If an error still pops up, run this cell again. It is probably due to the long processing time required or to a GPU memory shortage.

## 4.6 Model creation

In [0]:
epoch = 5
learn.fit(0.01, epoch )

import time
hora = time.strftime("%y%m%d-%H%M")
print ("Size = " + str(size))
print ("Batch size = " + str(batch_size))
model_name = "carCrash" + hora +"_sz"+str(size)+"_bs"+str(batch_size)+"_ep"+str(epoch)
print ("Saving model as:", model_name, ".h5")
learn.save(model_name)
print ("Model saved successfully")

<a id="model_analysis"> </a>
# 5. Model analysis

In [0]:
# Labels for the validation data
data.val_y

In [0]:
# From data.classes we know that 'accident' is label 0 and 'no_accident' is label 1.
data.classes

In [0]:
# Predictions for the validation set, returned as log probabilities
log_preds = learn.predict()
log_preds.shape

In [0]:
log_preds[:10]

In [0]:
preds = np.argmax(log_preds, axis=1)  # from log probabilities to 0 or 1
probs = np.exp(log_preds[:,1])        # pr(no_accident)
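
The conversion above can be tried on a tiny made-up array to see what each step returns. The values below are invented for illustration and do not come from the trained model; the names are prefixed with `toy_` so they do not overwrite the notebook's `preds` and `probs`.

```python
import numpy as np

# Invented log-probabilities for 3 images and 2 classes (accident, no_accident).
toy_log_preds = np.log(np.array([[0.9, 0.1],
                                 [0.2, 0.8],
                                 [0.5, 0.5]]))

# Index of the largest log-probability in each row, i.e. the predicted class.
toy_preds = np.argmax(toy_log_preds, axis=1)

# Exponentiating column 1 recovers the probability of class 1 (no_accident).
toy_probs = np.exp(toy_log_preds[:, 1])

print(toy_preds)
print(toy_probs)
```

Note that `np.argmax` returns the first index when two values tie, so an exact 50/50 prediction is assigned to class 0 (`accident`).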

In [0]:
def rand_by_mask(mask): return np.random.choice(np.where(mask)[0], 4, replace=False)
def rand_by_correct(is_correct): return rand_by_mask((preds == data.val_y)==is_correct)

In [0]:
def plot_val_with_title(idxs, title):
    imgs = np.stack([data.val_ds[x][0] for x in idxs])
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(data.val_ds.denorm(imgs), rows=1, titles=title_probs)

In [0]:
def plots(ims, figsize=(12,6), rows=1, titles=None):
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i])

In [0]:
def load_img_id(ds, idx): return np.array(PIL.Image.open(PATH+ds.fnames[idx]))

def plot_val_with_title(idxs, title):
    imgs = [load_img_id(data.val_ds,x) for x in idxs]
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(imgs, rows=1, titles=title_probs, figsize=(16,8))

In [0]:
# 1. A few correct labels at random
plot_val_with_title(rand_by_correct(True), "Correctly classified")

In [0]:
# 2. A few incorrect labels at random
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")

In [0]:
# 3. The no_accident predictions made with the highest confidence (probability closest to 1)
most_by_correct_no_accident = np.argsort(np.abs(probs -1))[:4]
plot_val_with_title(most_by_correct_no_accident, "No accidents")

In [0]:
# 4. The accident predictions made with the highest confidence (probability closest to 0)
most_by_correct_accident = np.argsort(np.abs(probs -0))[:4]
plot_val_with_title(most_by_correct_accident, "accidents")

In [0]:
# 5. The most uncertain predictions (probability closest to 0.5)
most_uncertain = np.argsort(np.abs(probs -0.5))[:4]
plot_val_with_title(most_uncertain, "Most uncertain predictions")
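
The three selections above share one pattern: sort the images by the distance between their predicted probability and a target value (1, 0 or 0.5) and keep the nearest ones. Below is a minimal sketch of that pattern on invented probabilities; the helper name `closest_to` is hypothetical and not part of fastai.

```python
import numpy as np

# Invented pr(no_accident) values for five images.
toy_probs = np.array([0.97, 0.52, 0.03, 0.45, 0.80])

def closest_to(probs, target, k=2):
    """Indices of the k probabilities nearest to target, nearest first."""
    return np.argsort(np.abs(probs - target))[:k]

print(closest_to(toy_probs, 1.0))   # most confident no_accident predictions
print(closest_to(toy_probs, 0.0))   # most confident accident predictions
print(closest_to(toy_probs, 0.5))   # most uncertain predictions
```

Because the ranking is deterministic, these selections always return the same images for a given model, unlike the random samples drawn by `rand_by_correct`.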

In [0]:
learn = ConvLearner.pretrained(arch, data, precompute=True)

In [0]:
lrf=learn.lr_find()

In [0]:
learn.sched.plot_lr()

In [0]:
learn.sched.plot()

## Analyzing results: loss and accuracy

In [0]:
def binary_loss(y, p):
    return np.mean(-(y * np.log(p) + (1-y)*np.log(1-p)))

In [0]:
acts = np.array([1, 0, 0, 1])
preds = np.array([0.9, 0.1, 0.2, 0.8])
binary_loss(acts, preds)
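
The function above computes the binary cross-entropy $L = -\frac{1}{N}\sum_i \left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right]$. One practical caveat is that a prediction of exactly 0 or 1 makes `np.log` return `-inf`. The sketch below is a variant that clips the probabilities first; the name `binary_loss_clipped` and the `eps` value are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def binary_loss_clipped(y, p, eps=1e-7):
    """Binary cross-entropy with probabilities clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

acts = np.array([1, 0, 0, 1])
good = np.array([0.9, 0.1, 0.2, 0.8])   # confident and correct
bad = np.array([0.1, 0.9, 0.8, 0.2])    # confident and wrong

print(binary_loss_clipped(acts, good))
print(binary_loss_clipped(acts, bad))
print(binary_loss_clipped(acts, acts.astype(float)))  # exact 0/1 predictions stay finite
```

Confident correct predictions give a small loss, confident wrong predictions give a much larger one, and the clipping keeps the result finite for predictions of exactly 0 or 1.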