# Applied Machine Learning 2
## Course project
Author: Diego Rodriguez
## Convolutional neural network
Above, you tested different models on the set of high-level features extracted from a pretrained neural network. Can you get similar results by building a ConvNet from scratch and training it on the pixel values of the original images?
- What accuracy can you achieve?
- Can you get good results? - If not, why?

In [1]:
# Load the images as pixel arrays
import imageio
import glob
import numpy as np

# Container and variables definitions
category = ["van","truck","other","motorcycle","car","bike"]
X_train = []
y_train = []
label = 0

for i in category:
    files = glob.glob("/Users/rodriguezmod/Downloads/swissroads/train/" + i + "/*.png")
    label = label + 1  # Labels run from 1 to 6; label 0 is never assigned
    for myFile in files:
        X_train.append(imageio.imread(myFile))
        y_train.append(label)

X_train = np.array(X_train, dtype='float32')
# Rescale pixel values to roughly [-0.5, 0.5]
X_train = (X_train - 128) / 255
# Integer class labels, matching the int32 placeholder they are fed to
y_train = np.array(y_train, dtype='int32')

# Valid data
X_valid = []
y_valid = []
label = 0

for i in category:
    files = glob.glob("/Users/rodriguezmod/Downloads/swissroads/valid/" + i + "/*.png")
    label = label + 1
    for myFile in files:
        X_valid.append(imageio.imread(myFile))
        y_valid.append(label)

X_valid = np.array(X_valid, dtype='float32')
# Rescale pixel values to roughly [-0.5, 0.5]
X_valid = (X_valid - 128) / 255
# Integer class labels
y_valid = np.array(y_valid, dtype='int32')

# Test data
X_test = []
y_test = []
label = 0

for i in category:
    files = glob.glob("/Users/rodriguezmod/Downloads/swissroads/test/" + i + "/*.png")
    label = label + 1
    for myFile in files:
        X_test.append(imageio.imread(myFile))
        y_test.append(label)

X_test = np.array(X_test, dtype='float32')
# Rescale pixel values to roughly [-0.5, 0.5]
X_test = (X_test - 128) / 255
# Integer class labels
y_test = np.array(y_test, dtype='int32')

print('Train:', X_train.shape, y_train.shape)
print('Valid:', X_valid.shape, y_valid.shape)
print('Test:', X_test.shape, y_test.shape)

Train: (280, 256, 256, 3) (280,)
Valid: (139, 256, 256, 3) (139,)
Test: (50, 256, 256, 3) (50,)
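
The loading cell makes two choices that matter later: pixels are rescaled with `(x - 128) / 255`, and the label counter is incremented before the first image is appended. The sketch below checks both using only NumPy. `rescale` and `label_map` are hypothetical names for illustration and are not part of the notebook.

```python
import numpy as np

category = ["van", "truck", "other", "motorcycle", "car", "bike"]

def rescale(pixels):
    """Same formula as the loading cell: (x - 128) / 255."""
    return (np.asarray(pixels, dtype="float32") - 128) / 255

# Where the extreme 8-bit pixel values land after rescaling
low, high = rescale([0, 255])
print("range: [{:.3f}, {:.3f}]".format(low, high))

# Labels as the loading loops assign them: the counter is incremented
# before the first append, so the first category gets label 1, not 0
label_map = {name: i + 1 for i, name in enumerate(category)}
print(label_map)
```

The range is only approximately symmetric (about -0.502 to 0.498), and because the labels run from 1 to 6, a classifier that indexes its outputs by label needs seven output units.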


In [2]:
# Batch generator
def get_batches(X, y, batch_size):
    # Shuffle X,y
    shuffled_idx = np.arange(len(y))
    np.random.shuffle(shuffled_idx)
    
    # Enumerate indexes by steps of batch_size
    # i: 0, b, 2b, 3b, 4b, .. where b is the batch size
    for i in range(0, len(y), batch_size):
        # Batch indexes
        batch_idx = shuffled_idx[i:i+batch_size]
        yield X[batch_idx], y[batch_idx]
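
To see what the generator yields for a training set of this size, here is a minimal sketch that runs it on stand-in data. `X_demo` and `y_demo` are hypothetical placeholders (tiny zero arrays), not the real images.

```python
import numpy as np

def get_batches(X, y, batch_size):
    # Same generator as in the cell above
    shuffled_idx = np.arange(len(y))
    np.random.shuffle(shuffled_idx)
    for i in range(0, len(y), batch_size):
        batch_idx = shuffled_idx[i:i+batch_size]
        yield X[batch_idx], y[batch_idx]

# Stand-in data: 280 tiny "images" instead of the real 256x256x3 arrays
X_demo = np.zeros((280, 4, 4, 3), dtype="float32")
y_demo = np.arange(280)

np.random.seed(0)
batches = list(get_batches(X_demo, y_demo, 64))
batch_sizes = [len(y_b) for _, y_b in batches]
seen = np.sort(np.concatenate([y_b for _, y_b in batches]))
print(batch_sizes)
```

With 280 samples and a batch size of 64, each epoch consists of four full batches and one final batch of 24, and every sample is used exactly once per epoch.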

In [3]:
# Silence warnings: deprecated TensorFlow modules produce a lot of verbose output
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

# Create new graph
graph = tf.Graph()

# Create graph. Code from course Applied Machine Learning 2 (EPFL Extension School)
with graph.as_default():
    # Placeholders
    X = tf.placeholder(dtype=tf.float32, shape=[None, 256, 256, 3])
    y = tf.placeholder(dtype=tf.int32, shape=[None])
    #print('Input:', y.shape)
    
    # Convolutional layer (64 filters, 5x5, stride: 2)
    conv1 = tf.layers.conv2d(
        X, 64, (5, 5), (2, 2), 'SAME', # "same" padding
        activation=tf.nn.relu, # ReLU
        kernel_initializer=tf.truncated_normal_initializer(stddev=0.01, seed=0),
        name='conv1'
    )
    #print('Convolutional layer:', conv1.shape)
    
    # Maxpool layer (2x2, stride: 2, "same" padding)
    pool1 = tf.layers.max_pooling2d(conv1, (2, 2), (2, 2), 'SAME')
    #print('Maxpool:', pool1.shape)
    
    # Convolutional layer (64 filters, 3x3, stride: 1)
    conv2 = tf.layers.conv2d(
        pool1, 64, (3, 3), (1, 1), 'SAME', # "same" padding
        activation=tf.nn.relu, # ReLU
        kernel_initializer=tf.truncated_normal_initializer(stddev=0.01, seed=0),
        name='conv2'
    )
    #print('Convolutional layer:', conv2.shape)
    
    # Maxpool layer (2x2, stride: 2, "same" padding)
    pool2 = tf.layers.max_pooling2d(conv2, (2, 2), (2, 2), 'SAME')
    #print('Maxpool:', pool2.shape)
    
    # Flatten output
    flat_output = tf.contrib.layers.flatten(pool2)
    #print('Flatten:', flat_output.shape)
    
    # Dropout
    training = tf.placeholder(dtype=tf.bool)
    flat_output = tf.layers.dropout(flat_output, rate=0.5, seed=0, training=training)
    #print('Dropout:', flat_output.shape)
    
    # Fully connected layer
    fc1 = tf.layers.dense(
        flat_output, 256, # 256 hidden units
        activation=tf.nn.relu, # ReLU
        kernel_initializer=tf.variance_scaling_initializer(scale=2, seed=0),
        bias_initializer=tf.zeros_initializer()
    )
    #print('Fully-connected layer:', fc1.shape)
    
    # Output layer
    logits = tf.layers.dense(
        fc1, 7, # 7 output units: labels run from 1 to 6, so unit 0 is never the target
        activation=None, # No activation function
        kernel_initializer=tf.variance_scaling_initializer(scale=1, seed=0),
        bias_initializer=tf.zeros_initializer()
    )
    #print('Output layer:', logits.shape)
    
    # Kernel of the 1st conv. layer
    with tf.variable_scope('conv1', reuse=True):
        conv_kernels = tf.get_variable('kernel')
    
    # Mean cross-entropy
    mean_ce = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=y, logits=logits))
    
    # Adam optimizer
    lr = tf.placeholder(dtype=tf.float32)
    gd = tf.train.AdamOptimizer(learning_rate=lr)

    # Minimize cross-entropy
    train_op = gd.minimize(mean_ce)
    
    # Compute predictions and accuracy
    predictions = tf.argmax(logits, axis=1, output_type=tf.int32)
    is_correct = tf.equal(y, predictions)
    accuracy = tf.reduce_mean(tf.cast(is_correct, dtype=tf.float32))
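
The layer sizes in this graph can be worked out by hand. The sketch below does the arithmetic in plain Python, assuming that "SAME" padding gives an output size of `ceil(input / stride)` and that every layer has a bias term. `same_out` is a hypothetical helper, not a TensorFlow function.

```python
import math

def same_out(size, stride):
    """Spatial output size of a 'SAME'-padded conv or pooling layer."""
    return math.ceil(size / stride)

size = 256
size = same_out(size, 2)   # conv1, stride 2
size = same_out(size, 2)   # pool1, stride 2
size = same_out(size, 1)   # conv2, stride 1
size = same_out(size, 2)   # pool2, stride 2
flat_units = size * size * 64  # 64 feature maps after conv2

# Weights plus biases per layer
params = {
    "conv1": 5 * 5 * 3 * 64 + 64,
    "conv2": 3 * 3 * 64 * 64 + 64,
    "fc1": flat_units * 256 + 256,
    "logits": 256 * 7 + 7,
}
total = sum(params.values())

print("flattened units:", flat_units)
for name, n in params.items():
    print("{:>6}: {:,}".format(name, n))
print(" total: {:,}".format(total))
```

Under these assumptions, almost all of the roughly 16.8 million parameters sit in the first fully connected layer, which is a very large number to fit with 280 training images.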



## ConvNet performance

In [4]:
# Validation accuracy
valid_acc_values = []

# Work with session. Code from course Applied Machine Learning 2 (EPFL Extension School)
with tf.Session(graph=graph) as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    
    # Set seed
    np.random.seed(0)
    
    # Train several epochs
    for epoch in range(100):
        # Accuracy values (train) after each batch
        batch_acc = []
        
        for X_batch, y_batch in get_batches(X_train, y_train, 64):
            # Run training and evaluate accuracy
            _, acc_value = sess.run([train_op, accuracy], feed_dict={
                X: X_batch,
                y: y_batch,
                lr: 0.001, # Learning rate
                training: True
            })
            
            # Save accuracy (current batch)
            batch_acc.append(acc_value)

        # Evaluate validation accuracy
        valid_acc = sess.run(accuracy, feed_dict={
            X: X_valid,
            y: y_valid,
            training: False
        })
        valid_acc_values.append(valid_acc)
        
        # Print progress
        print('Epoch {} - valid: {:.3f} train: {:.3f} (mean)'.format(
            epoch+1, valid_acc, np.mean(batch_acc)
        ))
        
    # Get 1st conv. layer kernels
    kernels = conv_kernels.eval()
    
    # Evaluate test accuracy
    test_acc = sess.run(accuracy, feed_dict={
        X: X_test,
        y: y_test,
        training: False
    })
    print('Test accuracy: {:.1f}%'.format(100*test_acc))

Epoch 1 - valid: 0.230 train: 0.222 (mean)
Epoch 2 - valid: 0.338 train: 0.344 (mean)
Epoch 3 - valid: 0.302 train: 0.434 (mean)
Epoch 4 - valid: 0.417 train: 0.460 (mean)
Epoch 5 - valid: 0.403 train: 0.535 (mean)
Epoch 6 - valid: 0.388 train: 0.637 (mean)
Epoch 7 - valid: 0.381 train: 0.697 (mean)
Epoch 8 - valid: 0.338 train: 0.807 (mean)
Epoch 9 - valid: 0.367 train: 0.785 (mean)
Epoch 10 - valid: 0.360 train: 0.871 (mean)
Epoch 11 - valid: 0.396 train: 0.903 (mean)
Epoch 12 - valid: 0.331 train: 0.953 (mean)
Epoch 13 - valid: 0.338 train: 0.972 (mean)
Epoch 14 - valid: 0.396 train: 0.978 (mean)
Epoch 15 - valid: 0.374 train: 0.978 (mean)
Epoch 16 - valid: 0.374 train: 0.988 (mean)
Epoch 17 - valid: 0.331 train: 0.975 (mean)
Epoch 18 - valid: 0.324 train: 0.994 (mean)
Epoch 19 - valid: 0.317 train: 0.967 (mean)
Epoch 20 - valid: 0.367 train: 0.973 (mean)
Epoch 21 - valid: 0.367 train: 0.948 (mean)
Epoch 22 - valid: 0.345 train: 0.945 (mean)
Epoch 23 - valid: 0.388 train: 0.947 (mea

## Describe results
- What accuracy can you achieve? **52.0%**
- Can you get good results? - If not, why?

*In our particular case, with a dataset of five typical vehicle categories (van, truck, motorcycle, car, bike) plus an "other" class, a pre-trained model such as **"MobileNet v2"** gives much higher accuracy than a neural network trained from scratch on the raw pixels.*

The poor result of the pixel-only model is somewhat expected. Part of the explanation is that all pixels enter the model with "equal value": nothing tells it which pixels carry the essential characteristics of the object, so the object and the background can be confused. The low accuracy is probably also due to the small dataset used to train the network (280 images). The epochs shown in the training log point the same way: training accuracy climbs above 0.95 while validation accuracy stays below 0.42. We could expand the training set, but it would likely still be insufficient.
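
The gap between training and validation accuracy can be quantified directly from the training log. The sketch below uses the values of the 23 epochs printed in the output above (the log is cut off after that) and is only meant as a quick check, not as part of the original analysis.

```python
# Validation and training accuracy copied from the 23 epochs printed above
valid = [0.230, 0.338, 0.302, 0.417, 0.403, 0.388, 0.381, 0.338, 0.367, 0.360,
         0.396, 0.331, 0.338, 0.396, 0.374, 0.374, 0.331, 0.324, 0.317, 0.367,
         0.367, 0.345, 0.388]
train = [0.222, 0.344, 0.434, 0.460, 0.535, 0.637, 0.697, 0.807, 0.785, 0.871,
         0.903, 0.953, 0.972, 0.978, 0.978, 0.988, 0.975, 0.994, 0.967, 0.973,
         0.948, 0.945, 0.947]

# Epoch (1-based) with the best validation accuracy
best_epoch = max(range(len(valid)), key=lambda i: valid[i]) + 1

# Train/validation gap per epoch and the epoch where it is largest
gap = [t - v for t, v in zip(train, valid)]
worst_epoch = max(range(len(gap)), key=lambda i: gap[i]) + 1

print("best validation accuracy: {:.3f} at epoch {}".format(max(valid), best_epoch))
print("largest train/valid gap:  {:.3f} at epoch {}".format(max(gap), worst_epoch))
```

Validation accuracy peaks early, at epoch 4, while training accuracy keeps rising for another dozen epochs. That pattern is typical of a model memorizing a small training set.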

Feature-based models give better results for two main reasons, in my view: the extracted features capture the shapes (lines, curves, etc.) that characterize an image, and pre-trained models such as **"MobileNet v2"** (transfer learning) were trained on far more images than our dataset contains. *Please see the figure and table in the notebook* **09 Results.ipynb**

To improve a raw-pixel model, I would expect more training data, together with more layers and connections, to give better results. The obvious cost is a larger data acquisition effort, plus more computing power and processing time.
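
One inexpensive way to get more training examples without collecting new images is data augmentation. The notebook does not do this; the sketch below is an assumption about how it could look, using random horizontal flips on a batch shaped `(N, height, width, channels)`. `augment_batch` is a hypothetical helper, and it assumes a mirrored street scene still belongs to the same category.

```python
import numpy as np

def augment_batch(X_batch, rng, flip_prob=0.5):
    """Return a copy of the batch with each image randomly flipped left-right."""
    out = np.array(X_batch, copy=True)
    for k in range(len(out)):
        if rng.random() < flip_prob:
            # Reverse the width axis (axis 1 of a single image)
            out[k] = out[k][:, ::-1, :].copy()
    return out

# Tiny stand-in batch instead of real 256x256x3 images
X_demo = np.arange(2 * 4 * 4 * 3, dtype="float32").reshape(2, 4, 4, 3)
X_aug = augment_batch(X_demo, np.random.default_rng(0))
print(X_aug.shape)
```

In the training loop, such a function would be applied to each `X_batch` before it is fed to the graph, so the network sees slightly different inputs in every epoch. Whether this is enough to close the gap to the pre-trained features would have to be tested.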

### Side notes

>*Transitioning image models from pixel-based to feature-based allows us to extract information from images and video at a high level, to detect, classify, and track objects, co-register images, or understand a real-world scene. Using collections of features, we can train computers to recognize objects, with user-specified or automatically determined features.*(1) 

>*Image explanations are useful for two groups of people: model builders and model stakeholders. For data scientists and ML engineers building models, explanations can help verify that our model is picking up on the right signals in an image. In an apparel classification model, for example, if the highlighted pixels show that the model is looking at unique characteristics of a piece of clothing, we can be more confident that it’s behaving correctly for a particular image. However, if the highlighted pixels are instead in the background of the image, the model might not be learning the right features from our training data. In this case, explanations can help us identify and correct for imbalances in our data.*(2)

>*Let’s back up a second and review the basics at a super high level. A convolutional neural network model can be trained to categorize images based upon the stuff in the images, and we measure the performance of this image classifier with metrics like accuracy, precision, recall, and so forth. These models are so standard that you can basically download a pre-trained neural network model (e.g. inception-V4, VGG19, mobilenet, and so on). If the classes (things) you want to recognize are not in the list of stuff recognized by the pre-trained model, then you can usually retrain the pre-trained neural network model to recognize the new stuff, by using most of the weights and connections of the pre-trained model. That’s called transfer learning.
So, basically, we have a bunch of models out there that take in images and barf out labels. But why should I trust these models? Let’s have a look at what information one of these models is using to make predictions. This methodology can also be used to test out many other models including custom models.*(3)

>*Recently, pixel/voxel-based ML (PML) emerged in medical image processing/analysis, which use pixel/voxel values in images directly instead of features calculated from segmented objects as input information; thus, feature calculation or segmentation is not required. Because the PML can avoid errors caused by inaccurate feature calculation and segmentation which often occur for subtle or complex objects, the performance of the PML can potentially be higher for such objects than that of common classifiers (i.e., feature-based MLs). In this paper, PMLs are surveyed to make clear (a) classes of PMLs, (b) similarities and differences within (among) different PMLs and those between PMLs and feature-based MLs, (c) advantages and limitations of PMLs, and (d) their applications in medical imaging.*(4)

## References

(1) https://datamanagement.hms.harvard.edu/event/pixels-features-models-image-processing-computer-vision-and-machine-and-deep-learning

(2) https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-on-image-data

(3) https://towardsdatascience.com/justifying-image-classification-what-pixels-were-used-to-decide-2962e7e7391f

(4) https://www.researchgate.net/publication/223963336_Pixel-Based_Machine_Learning_in_Medical_Imaging