<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_14_01_automl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# T81-558: Applications of Deep Neural Networks

**Module 14: Other Neural Network Techniques**

- Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
- For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).


# Module 14 Video Material

- **Part 14.1: What is AutoML** [[Video]](https://www.youtube.com/watch?v=1mB_5iurqzw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_14_01_automl.ipynb)
- Part 14.2: Using Denoising AutoEncoders in Keras [[Video]](https://www.youtube.com/watch?v=4bTSu6_fucc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_14_02_auto_encode.ipynb)
- Part 14.3: Anomaly Detection in Keras [[Video]](https://www.youtube.com/watch?v=1ySn6h2A68I&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_14_03_anomaly.ipynb)
- Part 14.4: Training an Intrusion Detection System with KDD99 [[Video]](https://www.youtube.com/watch?v=VgyKQ5MTDFc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_14_04_ids_kdd99.ipynb)
- Part 14.5: The Deep Learning Technologies I am Excited About [[Video]]() [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_14_05_new_tech.ipynb)


# Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.


In [None]:
# Detect Colab if present
try:
    from google.colab import drive
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# Part 14.1: What is AutoML

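Automated machine learning (AutoML) applies machine learning to the job of building machine learning models. You pass data to the AutoML application in raw form, and it automatically generates and evaluates candidate models, making choices such as model type, architecture, and hyperparameters that a practitioner would otherwise make by hand.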
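At its core, this kind of automation is a search loop: propose a model configuration, score it on validation data, and keep the best one. The sketch below is a minimal illustration using plain random search over a small, hypothetical search space; `toy_validation_error` is a made-up stand-in for actually training and validating a network. Real AutoML tools may use more sophisticated search strategies, so treat this as the concept only, not as how any particular tool works.

```python
import random


def toy_validation_error(config):
    # Hypothetical stand-in for "train a model with this config and return
    # its validation error"; a real AutoML system would fit a network here.
    return abs(config["layers"] - 3) + abs(config["units"] - 64) / 64


def random_search(search_space, score_fn, max_trials, seed=42):
    """Sample configurations at random and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(max_trials):
        # Pick one option for every hyperparameter in the search space.
        config = {name: rng.choice(options)
                  for name, options in search_space.items()}
        score = score_fn(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


SEARCH_SPACE = {"layers": [1, 2, 3, 4], "units": [16, 32, 64, 128]}
best, err = random_search(SEARCH_SPACE, toy_validation_error, max_trials=20)
print(best, err)
```

Raising `max_trials` makes it more likely that the search finds the best configuration, at the cost of more (simulated) training runs; that same trade-off appears below as AutoKeras's `max_trials` setting.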

## AutoML from your Local Computer

The following AutoML applications are free:

- [AutoKeras](https://autokeras.com/)
- [Auto-SKLearn](https://automl.github.io/auto-sklearn/master/)
- [Auto PyTorch](https://github.com/automl/Auto-PyTorch)
- [TPOT](http://epistasislab.github.io/tpot/)

The following AutoML applications are commercial:

- [RapidMiner](https://rapidminer.com/educational-program/) - Free student version available.
- [Dataiku](https://www.dataiku.com/dss/editions/) - Free community version available.
- [DataRobot](https://www.datarobot.com/)
- [H2O Driverless AI](https://www.h2o.ai/products/h2o-driverless-ai/)

## AutoML from the Cloud

There are also cloud-hosted AutoML platforms:

- [Google Cloud AutoML Tutorial](https://cloud.google.com/vision/automl/docs/tutorial)
- [Azure AutoML](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-models)

This part shows how to use [AutoKeras](https://autokeras.com/). First, we download the paperclip-counting dataset that you saw previously in this book.


In [None]:
# HIDE OUTPUT
import os
import pandas as pd

URL = "https://github.com/jeffheaton/data-mirror/"
DOWNLOAD_SOURCE = URL+"releases/download/v1/paperclips.zip"
DOWNLOAD_NAME = DOWNLOAD_SOURCE[DOWNLOAD_SOURCE.rfind('/')+1:]

if COLAB:
    PATH = "/content"
else:
    # I used this path locally on my machine; you may need a different one.
    PATH = "/Users/jeff/temp"

EXTRACT_TARGET = os.path.join(PATH, "clips")
SOURCE = os.path.join(EXTRACT_TARGET, "paperclips")

# Create PATH, EXTRACT_TARGET, and SOURCE in one step.
os.makedirs(SOURCE, exist_ok=True)

# Download the paperclip data and extract it (flattened) into SOURCE.
!wget -O {os.path.join(PATH, DOWNLOAD_NAME)} {DOWNLOAD_SOURCE}
!unzip -o -j -d {SOURCE} {os.path.join(PATH, DOWNLOAD_NAME)} >/dev/null

# Process training data 
df_train = pd.read_csv(os.path.join(SOURCE, "train.csv"))
df_train['filename'] = "clips-" + df_train.id.astype(str) + ".jpg"

# Use only the first 1000 images
df_train = df_train[0:1000]

One limitation of AutoKeras is that it cannot directly utilize generators. Without resorting to complex techniques, all training data must reside in RAM. We will use the following code to load the image data to RAM.
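Holding everything in RAM puts a concrete ceiling on dataset size. The helper below (a hypothetical name, not part of any library) estimates the footprint of a dense image array. Note that `np.zeros` defaults to 64-bit floats, so storing the pixels as `float32` instead halves the memory.

```python
def dataset_bytes(n_images, height, width, channels=3, bytes_per_value=8):
    """Bytes needed to hold n_images dense images in one array."""
    return n_images * height * width * channels * bytes_per_value


MIB = 1024 ** 2
# 1000 RGB images at 128x128, matching the subset used in this part.
as_float64 = dataset_bytes(1000, 128, 128, bytes_per_value=8)
as_float32 = dataset_bytes(1000, 128, 128, bytes_per_value=4)
print(f"float64: {as_float64 / MIB:.1f} MiB")  # float64: 375.0 MiB
print(f"float32: {as_float32 / MIB:.1f} MiB")  # float32: 187.5 MiB
```

The estimate grows linearly with the image count and quadratically with the image side length, so doubling `IMG_SHAPE` to 256x256 would quadruple it.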


In [None]:
# HIDE OUTPUT
import os

import numpy as np
import tqdm
from PIL import Image

IMG_SHAPE = (128, 128)


def load_images(files, img_shape):
    """Load image files into one float32 array scaled to [0, 1]."""
    x = np.zeros((len(files),) + img_shape + (3,), dtype=np.float32)
    for i, file in enumerate(tqdm.tqdm(files)):
        # Force 3 color channels so every image fits the (h, w, 3) slot.
        img = Image.open(file).convert("RGB")
        # PIL expects (width, height); IMG_SHAPE is square, so order is moot.
        img = img.resize(img_shape)
        x[i] = np.asarray(img, dtype=np.float32) / 255.0
    return x


images = [os.path.join(SOURCE, x) for x in df_train.filename]
x = load_images(images, IMG_SHAPE)
y = df_train.clip_count.values

## Using AutoKeras

[AutoKeras](https://autokeras.com/) is an AutoML system based on Keras. The goal of AutoKeras is to make machine learning accessible to everyone. [DATA Lab](http://people.tamu.edu/~guangzhou92/Data_Lab/) develops it at [Texas A&M University](https://www.tamu.edu/). We will see how to provide the paperclips dataset to AutoKeras and create an automatically tuned Keras deep learning model from this dataset. This automatic process frees you from choosing layer types and neuron counts.

We begin by installing AutoKeras.


In [None]:
# HIDE OUTPUT
!pip install autokeras

AutoKeras provides several [examples](https://autokeras.com/tutorial/overview/) covering image, tabular, and time-series data. Because the target here, the paperclip count, is a number, we use the **ImageRegressor**. Refer to the AutoKeras documentation for the other classifiers and regressors suited to specific uses.

We define several variables to determine the AutoKeras operation:

- **MAX_TRIALS** - The maximum number of different models to try.
- **SEED** - The random seed; try different seeds to obtain different results.
- **VAL_SPLIT** - The fraction of the data to hold out for validation (0.1 means 10%).
- **EPOCHS** - The maximum number of epochs to train each candidate model.
- **BATCH_SIZE** - The training batch size.
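To make these definitions concrete, the arithmetic below works out what the settings used in this part imply for the 1,000-image subset: how many samples a 0.1 validation split holds out, how many batches one epoch takes, and the worst-case number of epochs if no trial stops early. It assumes the held-out count is rounded down to a whole number of samples; AutoKeras's exact splitting may differ slightly.

```python
import math

MAX_TRIALS = 2
VAL_SPLIT = 0.1
EPOCHS = 1000
BATCH_SIZE = 32
N_SAMPLES = 1000  # size of the paperclip subset used in this part

n_val = int(N_SAMPLES * VAL_SPLIT)              # samples held out
n_train = N_SAMPLES - n_val                     # samples used for training
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)  # last batch may be partial
max_epochs_total = MAX_TRIALS * EPOCHS          # ceiling if nothing stops early

print(n_train, n_val, steps_per_epoch, max_epochs_total)
```

The last figure is only an upper bound; early stopping normally ends each trial much sooner.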

The values of MAX_TRIALS and EPOCHS have a large impact on total runtime. You must balance how many models to try (MAX_TRIALS) against how long to train each one (EPOCHS). AutoKeras uses early stopping, so EPOCHS is an upper bound rather than a fixed count: if you set EPOCHS high, early stopping will usually end training well before that many epochs are reached.
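Here is a simplified, pure-Python sketch of patience-based early stopping over a list of per-epoch validation losses, which shows why EPOCHS acts only as a ceiling. The function name and the loss curve are hypothetical; the `patience` and `min_delta` names mirror Keras's `EarlyStopping` callback, but this is an illustration of the idea, not its implementation.

```python
def epochs_until_stop(val_losses, patience=10, min_delta=1e-4):
    """Return how many epochs run before patience-based early stopping.

    An epoch counts as an improvement only if its validation loss beats the
    best loss so far by more than min_delta. Training stops once `patience`
    consecutive epochs pass without an improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical curve: loss improves for 5 epochs, then plateaus for 995.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.45] * 995
print(epochs_until_stop(losses, patience=10))  # 15
```

Even though the curve allows 1,000 epochs, training stops at epoch 15: five epochs of improvement plus ten epochs of patience.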

One strategy is a broad, shallow search: set MAX_TRIALS high and EPOCHS low. The best model from this search likely has good hyperparameters, although short training runs can misjudge which model is truly best. Finally, train this resulting model fully.
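The two-phase strategy can be sketched as follows, assuming a hypothetical `toy_train` function whose loss falls as epochs increase and whose best learning rate is 0.01. Phase 1 scores many candidates with a short training budget; phase 2 trains only the winner for much longer. This is an illustration of the strategy, not an AutoKeras feature.

```python
import random


def toy_train(lr, epochs):
    # Hypothetical training curve: loss falls with more epochs, and the
    # learning rate nearest 0.01 reaches the lowest floor.
    floor = abs(lr - 0.01) * 10
    return floor + 1.0 / epochs


def broad_then_deep(candidates, n_trials, short_epochs, long_epochs, seed=42):
    rng = random.Random(seed)
    # Phase 1: broad, shallow search over many cheap trials.
    trials = rng.sample(candidates, k=min(n_trials, len(candidates)))
    best_lr = min(trials, key=lambda lr: toy_train(lr, short_epochs))
    # Phase 2: train only the winning configuration for much longer.
    return best_lr, toy_train(best_lr, long_epochs)


LEARNING_RATES = [0.1, 0.03, 0.01, 0.003, 0.001]
best_lr, final_loss = broad_then_deep(LEARNING_RATES, n_trials=5,
                                      short_epochs=2, long_epochs=100)
print(best_lr, final_loss)
```

The strategy only pays off when rankings after a few epochs roughly predict rankings after full training; the toy curve makes that true by construction, which real models do not guarantee.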


In [None]:
import numpy as np
import autokeras as ak

MAX_TRIALS = 2
SEED = 42
VAL_SPLIT = 0.1
EPOCHS = 1000
BATCH_SIZE = 32

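# Pass SEED (rather than a hard-coded value) so changing it takes effect.
auto_reg = ak.ImageRegressor(overwrite=True, max_trials=MAX_TRIALS, seed=SEED)
auto_reg.fit(x, y, validation_split=VAL_SPLIT, batch_size=BATCH_SIZE, epochs=EPOCHS)

# This scores the model on the same data used for the search, so it is an
# optimistic estimate rather than a true holdout result.
print(auto_reg.evaluate(x, y))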

We can now display the best model.


In [None]:
print(type(auto_reg))
model = auto_reg.export_model()
model.summary()

You can save this top model and then use it directly or train it further.


In [None]:
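import tensorflow as tf

print(type(model))

# Prefer the TensorFlow SavedModel format; fall back to HDF5 if that fails.
# Record which path was written so the load below reads the same file.
try:
    save_path = "model_autokeras"
    model.save(save_path, save_format="tf")
except Exception:
    save_path = "model_autokeras.h5"
    model.save(save_path)

# AutoKeras layers are custom, so Keras needs ak.CUSTOM_OBJECTS to rebuild them.
loaded_model = tf.keras.models.load_model(save_path, custom_objects=ak.CUSTOM_OBJECTS)
print(loaded_model.evaluate(x, y))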