# Machine Learning For Lithography
## Unit IV : Convolutional Neural Networks

### Introduction

The Convolutional Neural Network (CNN) is a class of neural networks that can be trained efficiently for many problems by exploiting sparse connectivity between the neurons.  CNNs are particularly useful for the many communication problems where there is justification for a strong prior that the data can be organized in a list or matrix such that data outside of a local neighborhood can be assumed to carry zero weight.

A CNN architecture is attractive for a photoresist model because we have strong prior beliefs that the most important information about the photoresist contour position can be found near the rising and falling edges of the aerial image signal, and that the importance of the aerial image signal with regard to a specific contour point position diminishes with distance.
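
The locality prior can be illustrated with a small NumPy sketch. This is not part of the lab script: the function names and the toy one-dimensional "aerial image" are made up for illustration. Each output value depends only on the inputs inside a small window, which is the same as a dense layer whose weights are forced to zero outside that window.

```python
import numpy as np

def local_weighted_sum(signal, kernel):
    """Slide `kernel` along `signal`; each output sees only len(kernel) inputs."""
    k = len(kernel)
    n_out = len(signal) - k + 1
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(n_out)])

def as_dense_weights(n_in, kernel):
    """Equivalent dense weight matrix: zero everywhere outside each local window."""
    k = len(kernel)
    n_out = n_in - k + 1
    weights = np.zeros((n_out, n_in))
    for i in range(n_out):
        weights[i, i:i + k] = kernel
    return weights

signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # toy 1-D intensity profile
kernel = np.array([-1.0, 0.0, 1.0])                      # simple edge detector

local = local_weighted_sum(signal, kernel)
dense = as_dense_weights(len(signal), kernel) @ signal

print(local)                      # nonzero near the rising and falling edges
print(np.allclose(local, dense))  # both formulations give the same result
```

The dense matrix here has 35 entries but only 3 distinct learnable values, which is where the training efficiency comes from.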

Like the linear classifier, the network is trained with a dataset D consisting of training input data samples X and labels y(X).

Also, the acceptable values of y are in a finite set of N different classifications C={c1, c2, ...cN}.

In the script we will call the number of different classifications "n_classes."

The number of features per input sample X is 48x48 = 2304.


### Preamble
This section imports some necessary packages and helper functions that enable our script.

Of particular importance is TensorFlow, here imported as "tf," which is the nickname by which we will be able to access it in our script.  TensorFlow is our machine learning framework, enabling definition of the model form, definition of the training and validation procedures, definition of the model prediction method, and implementation of the training and prediction procedures.

We also import numpy, which we will reference with the nickname "np".  The name "numpy" is short for "numerical python".  The numpy package is a critical cornerstone of the data science workflow, providing intuitive and interactive support for arrays in a fashion that will be familiar to those who have previously worked in MATLAB.

The matplotlib library is a nice set of tools for looking at our aerial images.

The methods loaded from "classes" are little helper functions I wrote to make the demo script you see more compact and focused on Machine Learning rather than munging data and logs and visualizations.

The preamble also sets some useful variables that help keep our log data separate from the other model forms.

In [4]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from classes.Visualizations import *
from classes.Data import  loadResNIST
from classes.Specs import specs
import logging

In [5]:
DATADIR='./resNIST/'
LOGDIR = './cnn_classifier_logs/'
PROJECTORDIR=LOGDIR+'projector/'
scopes=['NetLayer/LogitsLayer/Logits:0','NetLayer/MetricsLayer/labels:0']
summary_writer = tf.summary.FileWriter(LOGDIR)
image_size=48


### Load and Transform Data
In this code block we are loading our data into four blocks:
1. **train_data** : the input training data **X**, representing a set of samples of aerial images, each 48x48 pixels. 
2. **train_labels** : the label **y(X)**, belonging to one of 11 classes, **c in C={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}**.  These class labels are integers, but they represent the proportion of the pixel vicinity that is "covered" by photoresist after development. A 0 denotes "not covered." A 10 denotes "fully or 100% covered."  Each increase in the index of the label corresponds to an increase in resist coverage of 10%.
3. **eval_data** : these samples **X** are held out from training so that we may evaluate the variance and detect potential overfitting.
4. **eval_labels** : these labels are used in conjunction with **eval_data** to help detect overfitting.


In [6]:
train_data, train_labels , eval_data, eval_labels    = loadResNIST(DATADIR)
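
The label encoding described above can be sketched as a pair of small conversion functions. These helpers are hypothetical and are not part of the "classes" package. They only assume what the text states: 11 integer labels, with each step of the label index worth 10% of resist coverage.

```python
def label_to_coverage(label, n_classes=11):
    """Map an integer class label to the fraction of resist coverage it represents."""
    if not 0 <= label < n_classes:
        raise ValueError("label must be in the range 0..n_classes-1")
    return label / (n_classes - 1)

def coverage_to_label(coverage, n_classes=11):
    """Map a coverage fraction in [0, 1] to the closest integer class label."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return int(round(coverage * (n_classes - 1)))

print(label_to_coverage(0))    # not covered
print(label_to_coverage(10))   # fully covered
print(coverage_to_label(0.7))  # label for 70% coverage
```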


### Define the Estimator

There is not a canned CNN estimator, so we will write a custom estimator "from scratch."  However, we will use the Keras layers API (which is incorporated into newer versions of TensorFlow), and this makes it very easy!

I add a little helper function as well which enables us to quickly build a sequence of CNN layers, including convolutions, activations, pooling/resizing, and dropouts, with the simple command "conv_resize_dropout_layer."

We provide a few variables to define the learning rate and l2 regularization strength, as well as the dropout rate.

Then we define the input layer and connect it to the tensorflow variable named "x" which we will feed with our training and validation data.

In this example we will use a 5-layer CNN.  At the layer closest to the input, the kernels will be rather large, 11x11, and there will be 8 of them. The result will be a list of 8 feature maps that represent the activations of the first layer, which will be presented to layer 2.  But first we will downsample the images to 19x19 (this is a kind of "pooling").

The story is similar for layer 2, except that we will use smaller kernels, 3x3.  Also we will resize the feature maps into 8 small 8x8 maps.

Layer 3 uses 3x3 kernels again.  The output feature maps are smaller still. Keep in mind that each time we do a convolution, we bring information in to the center point from surrounding points. This means that if we keep doing convolutions on the same original large images, the values on the outer edges will no longer be valid, since we will be convolving with unknown values outside of the edges of the provided image.  So the image size shrinks at each layer due not only to pooling, but also to convolution.
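
The shrinkage caused by convolution alone can be worked out with a one-line formula. The sketch below assumes an unpadded ("valid") convolution with a stride of 1, which is what the paragraph above implies. The source of the conv_resize_dropout_layer helper is not shown here, so its padding and stride are assumptions.

```python
def valid_conv_output_size(input_size, kernel_size):
    """Length of one spatial dimension after an unpadded, stride-1 convolution."""
    if kernel_size > input_size:
        raise ValueError("kernel does not fit inside the input")
    return input_size - kernel_size + 1

print(valid_conv_output_size(48, 11))  # 38: a 48x48 input under an 11x11 kernel
print(valid_conv_output_size(19, 3))   # 17: a 19x19 map under a 3x3 kernel
print(valid_conv_output_size(3, 3))    # 1: a 3x3 map collapses to a single point
```

Under these assumptions, the downsampling to 19x19 after the first layer halves the 38x38 maps left by the 11x11 convolution.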

The final layer is different from all of the others and resembles the LinearClassifier we started our lab exercises with: for each output class, a linear filter will be learned that maps from the output activations of layer 4 to the predicted unscaled relative class probabilities, which we will call logits.
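
To make the term "logits" concrete, here is a minimal pure-Python sketch of the softmax function, which turns unscaled logits into probabilities that sum to 1. The three-element logits list is a toy value, not output from the model.

```python
import math

def softmax(logits):
    """Convert unscaled logits into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy values for a 3-class problem
probabilities = softmax(logits)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)

print(probabilities)
print(predicted_class)  # index of the largest logit
```

Softmax preserves ordering, so the class with the largest logit is also the class with the largest probability.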

Since we are using a custom estimator, we must not only define the model form but also specify how the model interacts with procedures that ask it for predictions. We also must define the model training procedure, which again uses softmax cross entropy loss and the gradient descent optimizer.  We also must define how the estimator should be validated, but we use the built-in accuracy metric to make this very easy.

In [7]:
def cnn_model_fn(features, labels, mode):
    """Model function for CNN."""
    from classes.CNNUtils import conv_resize_dropout_layer, log_images
    dropout_rate=0.10 #1
    l2_scale=0.0001 #.001
    learning_rate=0.001
    with tf.variable_scope('NetLayer'):
    # Input Layer
        input_layer = tf.cast(tf.reshape(features["x"], [-1, image_size, image_size, 1],name="x0"), tf.float32)
        log_images('input_image',input_layer)
    # Convolutional Neural Net
        conv = conv_resize_dropout_layer(input_layer, filters=8,  kernel_size=[11,11], 
                                          mode=mode,   resize=[19,19], l2_scale=l2_scale,
                                          rate=dropout_rate ,name='conv1')
        conv = conv_resize_dropout_layer(conv,       filters=8, kernel_size=[3,3],
                                          mode=mode,   resize=[3,3],   l2_scale=l2_scale, 
                                          rate=dropout_rate, name='conv3') 
        conv = conv_resize_dropout_layer(conv,       filters=16, kernel_size=[3,3],  
                                          mode=mode,                   l2_scale=l2_scale,
                                          rate=dropout_rate, name='conv4') 
        conv = conv_resize_dropout_layer(conv,       filters=11, kernel_size=1, 
                                          mode=mode,                   
                                          ) 
    # Logits Layer
        with tf.variable_scope('LogitsLayer'):
            logits=tf.reshape(conv, [-1,11],name='Logits')
            tf.summary.histogram('Logits', logits)
            tf.logging.info('Logits Layer build successful..')

    # Generate predictions (for PREDICT and EVAL mode)
        predictions = {
        "classes": tf.argmax(input=logits, axis=1),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
        }
        if mode == tf.estimator.ModeKeys.PREDICT:
            return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
        
    # Calculate Loss (for both TRAIN and EVAL modes)
        l2_loss=tf.losses.get_regularization_loss()


        print(vars)
        with tf.variable_scope('MetricsLayer'):
            labels = tf.identity(labels, name='labels')            
        cross_entropy_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
        loss=tf.add(cross_entropy_loss,l2_loss)

        tf.summary.scalar("cross_entropy_loss",cross_entropy_loss)
        tf.summary.scalar("l2_loss", l2_loss)
        tf.summary.scalar("loss", loss)
    # Configure the Training Op (for TRAIN mode)
        if mode == tf.estimator.ModeKeys.TRAIN:
            optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
            train_op = optimizer.minimize(
                loss=loss,
                global_step=tf.train.get_global_step())
            return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode)
        evalhook = SessionHook(PROJECTORDIR, scopes)
        eval_metric_ops = {
            "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
        return tf.estimator.EstimatorSpec(
            mode=mode, loss=loss, eval_metric_ops=eval_metric_ops, evaluation_hooks=[evalhook])
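
For intuition about the loss and optimizer used in the model function, the sketch below computes the softmax cross entropy for a single sample and applies one plain gradient descent update. It is a pure-Python illustration with made-up numbers, not a reimplementation of the TensorFlow ops, which operate on whole mini-batches and also add the l2 regularization term.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true class."""
    largest = max(logits)  # log-sum-exp trick for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[label]

def gradient_descent_step(weights, gradients, learning_rate=0.001):
    """Move each weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# With equal logits for all 11 classes the model is guessing uniformly,
# so the loss equals log(11) regardless of the true label.
uniform_loss = sparse_softmax_cross_entropy([0.0] * 11, label=3)
print(uniform_loss, math.log(11))

# Toy update with hypothetical weights and gradients.
print(gradient_descent_step([1.0, -2.0], [10.0, -10.0], learning_rate=0.1))
```

A freshly initialized 11-class model should therefore start with a cross entropy loss near log(11), roughly 2.4, which is a useful sanity check when reading the training log.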


The code then defines two input functions, one for training (**train_input_fn**) and one for evaluation (**eval_input_fn**), according to the "numpy_input_fn" spec which helps facilitate feeding tensorflow batches of samples.  We indicate that the training input function will be fed from the **train_data** and **train_labels** variables, and likewise the evaluation input function will be fed from the **eval_data** and **eval_labels** variables.

For training we specify a mini-batch size, which determines how many samples are averaged together in determining an update direction for adjusting the weights.

During training we shuffle the dataset before breaking it into mini-batches, so that correlations introduced during data preparation do not skew the results and so that we do not rely on lucky fits.  However, to ensure consistency when evaluating the model during training, we do not shuffle during evaluation.
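
The batching behavior described above can be sketched in NumPy as follows. The minibatches generator and the toy arrays are hypothetical. The sketch only mimics the shuffle and batch_size arguments of numpy_input_fn for a single pass over the data, whereas the training input function below repeats indefinitely because num_epochs=None.

```python
import numpy as np

def minibatches(data, labels, batch_size, shuffle, seed=None):
    """Yield (data, labels) mini-batches for one pass, optionally shuffled."""
    n = len(data)
    if shuffle:
        order = np.random.RandomState(seed).permutation(n)
    else:
        order = np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield data[idx], labels[idx]  # samples and labels stay paired

toy_data = np.arange(10).reshape(10, 1)  # 10 samples with 1 feature each
toy_labels = np.arange(10)

train_batches = list(minibatches(toy_data, toy_labels, 4, shuffle=True, seed=0))
eval_batches = list(minibatches(toy_data, toy_labels, 4, shuffle=False))

print([y.tolist() for _, y in train_batches])  # shuffled order
print([y.tolist() for _, y in eval_batches])   # original order
```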

Finally, note that each 48x48 input sample contributes 2304 pixel values.  Unlike the linear classifier, we do not declare a "feature column" for each pixel here: the samples are passed to the model under the key "x", and the model function reshapes them into 48x48 single-channel images, which form the front end of our TensorFlow model.



In [None]:
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    batch_size=32,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)


We use the RunConfig facility of tf.estimator to specify how frequently we want to checkpoint the model (save intermediate results). This also influences how frequently we will perform validation.

Having defined the model function, including its form, prediction mode, training mode, and evaluation mode, we now call the tf.estimator.Estimator to actually build our classifier in TensorFlow, and then call it "cnn_classifier."

In [None]:
config=tf.estimator.RunConfig(save_checkpoints_steps=1000)
cnn_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, 
    config=config,
    model_dir=LOGDIR)

INFO:tensorflow:Using config: {'_model_dir': './cnn_classifier_logs/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x110a37470>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


We plan on using the "train_and_evaluate" function provided by tf.estimator, because it automates a periodic evaluation of the model during training, generating occasional checkpoints and then loading those checkpoints to assess the model performance on the evaluation data.  In order to do this we need to activate the "logger" that logs data, and we need to define the **train_spec** and **eval_spec**, which specify some of the details of the process, including directories for logged data, duration of the training process, and frequency of logged data.

The function "specs" is a little helper function I wrote for added compactness of this lab exercise.

In [None]:
logging.getLogger().setLevel(logging.INFO)
train_spec, eval_spec = specs(train_input_fn, eval_input_fn, 
                              logdir=LOGDIR, projectordir=PROJECTORDIR, 
                              max_train_steps=10000, eval_steps = 100, 
                              scopes = scopes, name = 'cnn')

In this elegant line of code we ask tensorflow to begin the training process, with periodic evaluation, using the cnn_classifier model and the training and eval specs we previously defined. Nice and compact!

In [None]:
tf.estimator.train_and_evaluate(cnn_classifier, train_spec, eval_spec)

INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 1000 or save_checkpoints_secs None.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Logits Layer build successful..
<built-in function vars>
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into ./cnn_classifier_logs/model.ckpt.
INFO:tensorflow:l

This code prepares the data and metadata for plotting in TensorBoard using the Principal Components Analysis (PCA) and t-SNE projection methods for visualizing high-dimensional data.  The prepare_projector and prepare_sprites functions are tidy little scripts I wrote to simplify the lab.

In [None]:
prepare_projector(PROJECTORDIR, scopes)
prepare_sprites(PROJECTORDIR, eval_data)
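
TensorBoard performs the projection itself, but the idea behind PCA can be sketched in a few lines of NumPy. The pca_project function and the random stand-in for the 11-dimensional logits are illustrative assumptions and are unrelated to the prepare_projector helper.

```python
import numpy as np

def pca_project(samples, n_components=2):
    """Project rows of `samples` onto their top principal components."""
    centered = samples - samples.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.RandomState(0)
fake_logits = rng.normal(size=(100, 11))  # stand-in for 100 samples of 11 logits

projected = pca_project(fake_logits)
print(projected.shape)  # one 2-D point per sample, ready for a scatter plot
```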

This is how we can evaluate the accuracy of the model independent of the training data.

In [None]:
eval_results = cnn_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
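
The "accuracy" entry in the evaluation results is the fraction of evaluation samples whose predicted class matches the label. A minimal pure-Python sketch of that calculation, using made-up predictions rather than model output:

```python
def accuracy(predicted_classes, true_labels):
    """Fraction of predictions that exactly match the true labels."""
    if len(predicted_classes) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, t in zip(predicted_classes, true_labels) if p == t)
    return correct / len(true_labels)

toy_predictions = [0, 10, 5, 5]  # hypothetical predicted classes
toy_labels = [0, 10, 4, 5]       # hypothetical true labels
print(accuracy(toy_predictions, toy_labels))  # three of four are correct
```

Because the classes are ordered coverage levels, a prediction that is off by one class counts as wrong here even though it is only 10% away in coverage.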

We can see a specific example by picking an index and using the "matplotlib" library to make a nice picture.

In [None]:
plt.imshow(eval_data[11])