# PyData 2019 Deep Learning Workshop Challenge - Pipple & 510

Before you start this challenge, make sure you save this notebook file to your own Google Drive and continue from the copied notebook.

File -> Save a copy in Drive


# Table of Contents


1.   Challenge Description
2.   Retrieve data
3.   Image Data Generators
4.   Transfer Learning + Fine-Tuning a CNN (in Keras)
5.   Training a CNN (in Keras)
6.   Analysing Training Results
7.   Submit your model!


---
---
# 1. Challenge Description

Help Pipple & 510 by developing a CNN model that can classify the roof material (concrete, metal or tiles) of individual buildings in Sint Maarten! By doing this you help 510 with their challenge of automating the classification of building characteristics in aerial imagery. Try fine-tuning a CNN (i.e. VGG16, Inception or Xception) on the target data set and make sure you submit your results! The team that achieves the highest accuracy on the 'never been seen' test set will receive a wonderful Pipple prize!

The challenge deadline is Saturday at 12:00 noon! All submitted CNN models will be evaluated by Pipple on the test set.

Use the PyData 2019 Deep Learning Workshop Tutorial as a guideline and be creative! And last but not least: ENJOY!

Pipple and 510 look forward to your results! Good luck!



---
---

# 2. Retrieve Data

You will be fine-tuning your CNN on the train and validation set which can be retrieved by running the cell below.

In [0]:
!git clone -b materials https://github.com/PippleNL/pydata2019.git


import zipfile

for sets in ['train_materials', 'validation_materials']:

  # path to zip
  local_zip = f'pydata2019/{sets}.zip' 

  # extract zip file
  zip_ref = zipfile.ZipFile(local_zip, 'r')
  zip_ref.extractall('/tmp')
  zip_ref.close()

import os

base_dir = '/tmp/'

# data directories
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
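
Before configuring the generators it can help to check what was actually extracted. The sketch below is not part of the original notebook: `count_images_per_class` is a hypothetical helper, and it assumes each set directory contains one sub-folder per class, which is the layout `flow_from_directory` expects.

```python
import os


def count_images_per_class(data_dir):
    """Return {class_folder_name: number_of_files} for a directory of class folders."""
    counts = {}
    for entry in sorted(os.listdir(data_dir)):
        class_path = os.path.join(data_dir, entry)
        if os.path.isdir(class_path):
            # Count only regular files, ignoring nested folders
            counts[entry] = sum(
                os.path.isfile(os.path.join(class_path, name))
                for name in os.listdir(class_path)
            )
    return counts


# Example usage (assumes the cell above has been run):
# print(count_images_per_class(train_dir))
# print(count_images_per_class(validation_dir))
```

A strongly unbalanced count per class is worth knowing about before you choose augmentation settings or interpret accuracy.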

---
---

# 3. Image Data Generators

In the cell below you can specify your own ImageDataGenerator specifications.
For more information on Keras' ImageDataGenerators please visit <sup>[5](#myfootnote1)</sup> 


---

><sup>[5](#myfootnote1)</sup> https://keras.io/preprocessing/image/#imagedatagenerator-class


In [0]:
%tensorflow_version 1.x
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import preprocess_input as preprocess_input_vgg16
from keras.applications.inception_v3 import preprocess_input as preprocess_input_inception
from keras.applications.xception import preprocess_input as preprocess_input_xception

seed = 42  # Make the Image Data Generator objects reproducible

def get_cnn_data_generators(cnn):
  """
  Function that returns Image Data Generator objects for the train and validation set related to a pre-trained cnn (i.e. VGG16, Inception, Xception)
  """
  # Different pre-trained CNNs have different image pre-processing functions
  if cnn == 'vgg16':
    pre_processor = preprocess_input_vgg16
  elif cnn == 'inception':
    pre_processor = preprocess_input_inception
  elif cnn == 'xception':
    pre_processor = preprocess_input_xception
  else:
    raise ValueError(f'Unknown pre-trained CNN. Got {cnn} whereas vgg16, inception or xception is expected.')


  # The Image Data Generator object below is used for the training set
  train_datagen = ImageDataGenerator(
    # Type here your (extra) Image Data Generator specifications
    # For more information visit the link above
  )


  # The Image Data Generator object below is used for the validation set
  val_datagen = ImageDataGenerator(
      # Type here your (extra) Image Data Generator specifications
  )

  return train_datagen, val_datagen


def get_image_batches(train_datagen, val_datagen, cnn):
  """
  Takes the path to a directory & generates batches of (augmented) data reshaped to the desired input shape
  """
  # Different pre-trained CNNs use different target image sizes as input
  if cnn == 'vgg16':
    target_size = (224, 224)  # All images will be resized to 224x224
  elif cnn == 'inception':
    target_size = (299, 299)  # All images will be resized to 299x299
  elif cnn == 'xception':
    target_size = (299, 299)  # All images will be resized to 299x299
  else:
    raise ValueError(f'Unknown pre-trained CNN. Got {cnn} whereas vgg16, inception or xception is expected.')


  # Keras is able to directly augment and use images out of folders from the server's local file system using the flow_from_directory method
  train_generator = train_datagen.flow_from_directory(
      # Type here your (extra) flow_from_directory specifications
  )  


  # The call below uses the validation Image Data Generator to pre-process the images from the validation_dir (on the server's local file system)
  validation_generator = val_datagen.flow_from_directory(
    # Type here your (extra) flow_from_directory specifications
  )  

  return train_generator, validation_generator
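
As a starting point for the blanks above, the following sketch collects one possible set of arguments. It is not the organisers' solution: the helper names `example_datagen_kwargs` and `example_flow_kwargs` are hypothetical, and every numeric value (rotation, shifts, zoom, batch size) is an assumption to tune. The keyword names themselves are existing `ImageDataGenerator` and `flow_from_directory` parameters.

```python
def example_datagen_kwargs(pre_processor):
    """Hypothetical augmentation settings; every value is an assumption to tune."""
    train_kwargs = dict(
        preprocessing_function=pre_processor,  # model-specific pre-processing
        rotation_range=20,        # random rotations of up to 20 degrees
        width_shift_range=0.1,    # horizontal shifts of up to 10% of the width
        height_shift_range=0.1,   # vertical shifts of up to 10% of the height
        zoom_range=0.1,
        horizontal_flip=True,     # aerial images have no fixed orientation
        vertical_flip=True,
    )
    # Validation images are only pre-processed, never augmented
    val_kwargs = dict(preprocessing_function=pre_processor)
    return train_kwargs, val_kwargs


def example_flow_kwargs(directory, target_size, shuffle, batch_size=32, seed=42):
    """Hypothetical flow_from_directory settings for a multi-class problem."""
    return dict(
        directory=directory,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='categorical',  # one-hot labels for the three roof materials
        shuffle=shuffle,
        seed=seed,
    )


# Example usage inside the functions above:
# train_kwargs, val_kwargs = example_datagen_kwargs(pre_processor)
# train_datagen = ImageDataGenerator(**train_kwargs)
# train_generator = train_datagen.flow_from_directory(
#     **example_flow_kwargs(train_dir, target_size, shuffle=True))
```

Passing the model-specific `pre_processor` to both generators keeps training and validation inputs on the same scale; only the training generator augments.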

---
---

# 4. Transfer Learning + Fine-Tuning a CNN (in Keras)

In the cell below you can fine-tune a pre-trained CNN so that it can classify roof materials. More information about pre-trained CNNs and Keras' core layers can be found in <sup>[6](#myfootnote1)</sup> and <sup>[7](#myfootnote1)</sup>.


---

><sup>[6](#myfootnote1)</sup> https://pure.tue.nl/ws/portalfiles/portal/125083941/Master_Thesis_Bart_van_Driel.pdf

> <sup>[7](#myfootnote1)</sup> https://keras.io/layers/core/

In [0]:
from keras.applications.vgg16 import VGG16
from keras.applications.xception import Xception
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model  # needed for Model(...) in get_cnn_model

import warnings
warnings.filterwarnings("ignore")  # suppress library warnings


def get_cnn_model(cnn):
  """
  Retrieves the parameters and architecture of pre-trained CNNs (on ImageNet data) without the top (classification) layers.
  On top of these retrieved (feature extraction) layers it builds a block of layers used for classification.
  """
  # Retrieve the pre-trained cnn on images of imagenet without the top-classification layers
  if cnn == 'vgg16':
    base_model  = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
  elif cnn == 'inception':
    base_model  = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
  elif cnn == 'xception':
    base_model  = Xception(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
  else:
    raise ValueError(f'Unknown pre-trained CNN. Got {cnn} whereas vgg16, inception or exception is expected.') 

  # Think about how many layers you want to train
  # ...

  # Add (and import) extra layers; end the classification block with a layer called preds
  # ...
  
  # Build the model from non-trainable pre-trained feature extraction layers and trainable classification layers
  model = Model(inputs=base_model.input, outputs=preds)

  # Optionally print model summary
  # print(model.summary())

  return model 
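
The two placeholder steps in `get_cnn_model` (choosing which layers to train and adding a classification block) could look roughly like the sketch below. The helper names `freeze_layers` and `add_classification_head` are hypothetical, and the head itself (global average pooling, a 256-unit dense layer, 50% dropout) is an assumed example architecture, not a recommendation from the organisers. The output layer uses three units because the challenge has three classes.

```python
def freeze_layers(layers, n_trainable=0):
    """Mark all but the last `n_trainable` layers as non-trainable."""
    cutoff = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers


def add_classification_head(base_model, n_classes=3):
    """Hypothetical head: pooling, one dense layer, dropout, softmax output."""
    # Imported lazily so freeze_layers can be used without Keras installed
    from keras.layers import Dense, Dropout, GlobalAveragePooling2D

    x = GlobalAveragePooling2D()(base_model.output)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.5)(x)
    preds = Dense(n_classes, activation='softmax', name='preds')(x)
    return preds


# Example usage inside get_cnn_model:
# freeze_layers(base_model.layers, n_trainable=0)  # pure transfer learning
# preds = add_classification_head(base_model)
```

With `n_trainable=0` only the new head is trained; raising it unfreezes the last layers of the base model for fine-tuning, usually with a lower learning rate.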

---
---
# 5. Training a CNN (in Keras)

Train your model by running the cells below! Think about how you want your model to be compiled, which training results should be stored (i.e. callbacks) and which fitting parameters you prefer. 

More information about this can be found in <sup>[8](#myfootnote1)</sup>, <sup>[9](#myfootnote1)</sup>, <sup>[10](#myfootnote1)</sup>, <sup>[11](#myfootnote1)</sup> and <sup>[12](#myfootnote1)</sup>.



---

><sup>[8](#myfootnote1)</sup> https://keras.io/models/sequential/

><sup>[9](#myfootnote1)</sup> https://keras.io/losses/

><sup>[10](#myfootnote1)</sup> https://keras.io/optimizers/

><sup>[11](#myfootnote1)</sup> https://keras.io/metrics/

><sup>[12](#myfootnote1)</sup> https://keras.io/callbacks/

In [0]:
def build_compile_cnn(cnn):
  """
  Builds a CNN architecture based on one of the three pre-trained CNNs (on ImageNet data) and extra added classification layers.
  Compiles this built cnn by specifying a loss function, optimizer and evaluation metric.
  """

  # Retrieve the feature extraction layers of the pre-trained CNN and add new trainable (classification) layers
  cnn_model = get_cnn_model(cnn)


  # Compile the model
  cnn_model.compile(
    # Type here your keras compile specifications
  )  
  
  return cnn_model
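
One plausible way to fill in the `compile` call is sketched below. `example_compile_kwargs` is a hypothetical helper; the choices assume a softmax output layer and generators created with `class_mode='categorical'`, neither of which is fixed by the notebook.

```python
def example_compile_kwargs():
    """Hypothetical compile settings for a three-class softmax output."""
    return dict(
        optimizer='adam',                 # Keras accepts optimizer names as strings
        loss='categorical_crossentropy',  # matches one-hot (categorical) labels
        metrics=['accuracy'],             # the challenge is scored on accuracy
    )


# Example usage inside build_compile_cnn:
# cnn_model.compile(**example_compile_kwargs())
```

If you want to control the learning rate, pass an optimizer object from `keras.optimizers` instead of the string.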

In [0]:
from keras.callbacks import ModelCheckpoint, TensorBoard
from os.path import join
from os import makedirs
import time


def get_callbacks(model_name):
  """
  Instantiates different Keras Callbacks used to analyse and compare training results
  """
  # Define unique training ID such that logging events are stored in a unique folder
  model_id = time.strftime('%Y-%m-%d_%H-%M-%S')

  
  # Callback that saves the model after every epoch
  makedirs(join(base_dir, 'models'), exist_ok=True)  # create models directory if not already present
  callback_model = ModelCheckpoint(filepath=join(base_dir, 'models', f'{model_name}_{model_id}.hdf5'),  # Make sure you save the model in /tmp/models !
                                   save_best_only=True, save_weights_only=False  # Make sure you set save_weights_only to FALSE and save_best_only to TRUE! This eases the submitting process
                                   # Type here your (extra) callback specifications
                                  )
  
  # Callback that writes a log for TensorBoard, which allows you to visualize dynamic graphs of your training and test metrics
  # as well as activation histograms for the different layers in your model. 
  # (https://www.tensorflow.org/tensorboard/)
  callback_tensorboard = TensorBoard(log_dir=join(base_dir, 'logs', f'{model_name}_{model_id}'), 
                                     # Type here your (extra) callback specifications
                                     )

  return [callback_model, callback_tensorboard]
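
For the "(extra) callback specifications" above, the sketch below shows one possible choice. `example_callback_kwargs` is a hypothetical helper and the patience value is an assumption. It monitors `val_loss`, which Keras logs whenever validation data is passed to the fit call.

```python
def example_callback_kwargs(patience=5):
    """Hypothetical callback settings; the patience value is an assumption."""
    # Extra arguments for ModelCheckpoint: which quantity decides the "best" model
    checkpoint_kwargs = dict(monitor='val_loss', verbose=1)
    # Arguments for an additional EarlyStopping callback
    early_stopping_kwargs = dict(monitor='val_loss', patience=patience)
    return checkpoint_kwargs, early_stopping_kwargs


# Example usage inside get_callbacks:
# from keras.callbacks import EarlyStopping
# checkpoint_kwargs, early_stopping_kwargs = example_callback_kwargs()
# callback_stop = EarlyStopping(**early_stopping_kwargs)
# Pass **checkpoint_kwargs to ModelCheckpoint alongside the required arguments.
```

Early stopping ends training once the monitored value has not improved for `patience` epochs, which saves time when comparing several base models.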


In [0]:
# Define the pre-trained CNN that will be fine-tuned on roof materials
pre_trained_cnn = ...  # 'vgg16', 'inception' or 'xception'


# Build and compile the CNN model to be trained (i.e. includes the model architecture, initial parameters, optimizer, loss and evaluation function)
cnn_model = build_compile_cnn(cnn=pre_trained_cnn)


# Create corresponding train and validation data generators that specify augmentation rules
train_data_gen, vali_data_gen = get_cnn_data_generators(cnn=pre_trained_cnn)


# Create generators to generate batches of augmented train data using the flow_from_directory function in Keras
train_batch_gen, vali_batch_gen = get_image_batches(train_data_gen, 
                                                    vali_data_gen, 
                                                    cnn=pre_trained_cnn)

# Create callbacks that log training results
callbacks = get_callbacks(model_name=pre_trained_cnn)


# Fit the CNN to training and validation data
trained_cnn = cnn_model.fit_generator(
    # Type here your fit_generator specifications
    )
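
The `fit_generator` call needs to know how many batches make up one epoch. A minimal sketch, assuming the batch generators expose `samples` and `batch_size` attributes (as Keras directory iterators do), is given below. The helper names and the default of 10 epochs are assumptions.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


def example_fit_kwargs(train_gen, val_gen, callbacks, epochs=10):
    """Hypothetical fit_generator settings; epochs=10 is an arbitrary starting point."""
    return dict(
        generator=train_gen,
        steps_per_epoch=steps_per_epoch(train_gen.samples, train_gen.batch_size),
        epochs=epochs,
        validation_data=val_gen,
        validation_steps=steps_per_epoch(val_gen.samples, val_gen.batch_size),
        callbacks=callbacks,
    )


# Example usage:
# trained_cnn = cnn_model.fit_generator(
#     **example_fit_kwargs(train_batch_gen, vali_batch_gen, callbacks))
```

Rounding up rather than down ensures the final, smaller batch is not dropped from each epoch.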

---
---

# 6. Analysing Training Results

Analyse your results using TensorBoard! Running the cell below activates this interactive tool. Make sure that the --logdir parameter points to the directory in which all TensorBoard log event folders (one folder per trained model) are stored.

In [0]:
%load_ext tensorboard
%tensorboard --logdir {join(base_dir, 'logs')}

---
--- 

# 7. Submit your model!

Submit your preferred trained CNN by running the code below. Make sure you specify a team name and set submit_model to the name of the saved model, which includes the name of the pre-trained base model. Of course you are allowed to submit more than one model.

In [0]:
from os.path import join
import zipfile
import pickle
import requests

# Specify your team name
team_name = ...

# Specify the trained model that will be submitted; hint: TensorBoard lists all model names under 'Runs'
submit_model = ... + '.hdf5'  # Type here the model name of the preferred CNN as a string (e.g. 'xception_2019-11-19_13-50-08')
path2submit_model = join(base_dir, 'models', submit_model)  # this follows from the parameters defined above
# If you are not sure how your model is named, run !ls {join(base_dir, 'models')}

# compress model to zipfile
with zipfile.ZipFile(f'{path2submit_model}.zip', mode='w') as zf:
  zf.write(path2submit_model, compress_type=zipfile.ZIP_DEFLATED)

with open('pydata2019/signed_url_dict' + '.pkl', 'rb') as f:
  s3 = pickle.load(f)

with open(f'{path2submit_model}.zip', 'rb') as f:
    files = {'file': (submit_model, f)}
    data = {'key': f'models/{team_name}/{submit_model}'}
    http_response = requests.post(s3['url'], data=data, files=files)

if http_response.status_code == 204:
  print('Submission sent!')
else:
  print('Error occurred! Please try again.')
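
If you are unsure which model name to submit, a small helper can list what was saved during training. This is a sketch only: `list_saved_models` is a hypothetical function, and it assumes checkpoints were written to the models directory with the `.hdf5` extension, as in the callback cell above.

```python
import os


def list_saved_models(models_dir):
    """Return the names (without extension) of all .hdf5 files in models_dir, sorted."""
    if not os.path.isdir(models_dir):
        return []
    return sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(models_dir)
        if name.endswith('.hdf5')
    )


# Example usage (assumes base_dir is defined as above):
# print(list_saved_models(os.path.join(base_dir, 'models')))
```

Each returned name can be used directly in place of the `...` in `submit_model`; zipped copies from earlier submissions are skipped because they end in `.zip`.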