# 500px Machine Learning Engineer Intern - Tech Challenge
## Fooling an MNIST Classifier with Adversarial Images, using TensorFlow

## Description

Create adversarial images to fool an MNIST classifier in TensorFlow.

Deep convolutional neural networks (CNNs) are state-of-the-art models for image classification and object detection. Such models play a crucial role at 500px, where we use them for applications such as automatic keywording, people detection, and image search. It’s important to understand how they work and what their limitations are.
One known “limitation” of CNNs is that they can be fooled into misclassifying an image with high confidence by slightly
perturbing its pixels.

This is illustrated in the image below:

![](https://lh4.googleusercontent.com/Bz7CFzzMBRkKJ4xGqMTpufuL35Lf69z3DEoDAV-ZzD1OC9lMHYL4co0ED-LF2URMowvbDdqkRg6oxZHWeIspOVDkeaB0rqAfNpRHXfrhxS45U2cqsuX52J2GZwlFOB0TSc_rYxu7)

The difference between the original image and the adversarial one is so small that humans cannot detect it. Interestingly, other machine learning models, such as SVMs and logistic regression, can be tricked in a similar manner.

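Note that the “fast gradient sign” method presented in the [original paper by Goodfellow et al.](https://arxiv.org/abs/1412.6572) is untargeted: it pushes an image away from its true class without controlling which wrong class the classifier ends up predicting. In this challenge we would like to generate adversarial images that cause examples of ‘2’ to be misclassified as ‘6’ specifically. This has implications for the final solution, since the perturbation must move the image toward a chosen target class rather than merely away from the correct one.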
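
To make the untargeted/targeted distinction concrete, here is a minimal, dependency-free sketch of one *targeted* gradient-sign step on a toy softmax linear classifier. It is an illustration under stated assumptions, not necessarily the approach used later in this notebook: the helper names (`predict_proba`, `targeted_sign_step`), the toy weights, and the large `eps` are all made up for the example. The idea is to step *against* the gradient of the loss computed for the target label, instead of *along* the gradient of the loss for the true label.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, b, x):
    # W[k][j] is the weight from input pixel j to class k (toy linear model)
    logits = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])

def sign(g):
    return (g > 0) - (g < 0)

def targeted_sign_step(W, b, x, target, eps):
    """Return clip(x - eps * sign(dL_target/dx), 0, 1) for a softmax linear model."""
    p = predict_proba(W, b, x)
    # For softmax + cross-entropy, dL/dlogit_k = p_k - 1[k == target]
    dlogits = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    # Chain rule through the linear layer: dL/dx_j = sum_k dlogits[k] * W[k][j]
    grad = [sum(dlogits[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    # Descend on the target-class loss and keep pixels in the valid range
    return [min(1.0, max(0.0, xj - eps * sign(g))) for xj, g in zip(x, grad)]

# Hypothetical two-class, three-pixel example
W = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.0]]
b = [0.0, 0.0]
x = [0.8, 0.2, 0.5]
x_adv = targeted_sign_step(W, b, x, target=1, eps=0.4)

print(argmax(predict_proba(W, b, x)), argmax(predict_proba(W, b, x_adv)))
```

On real MNIST images one would typically use a much smaller `eps`, possibly applied over several steps, so that the perturbation stays hard to see.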

One useful application of adversarial images is that training your deep CNN classifier on them can improve its accuracy on non-adversarial examples.

In this challenge you are given an opportunity to learn how to generate adversarial examples and to gain practical experience using TensorFlow.

### Getting Started

Let's start by importing dependencies and loading the MNIST dataset. Note that much of the code in this section, and in the sections to come, is repurposed from [this tutorial](https://www.tensorflow.org/versions/r0.11/tutorials/mnist/pros/#deep-mnist-for-experts).


In [22]:
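import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Load MNIST with one-hot labels
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# Later cells use sess.run(), .run() and .eval(), which need a default session
sess = tf.InteractiveSession()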

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
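
Because the data is loaded with `one_hot=True`, each label is a 10-element vector with a single 1 at the digit's index. The sketch below uses plain Python lists in place of the real label arrays to show how the ‘2’ examples could be picked out and paired with ‘6’ target labels. The helper names (`one_hot`, `indices_of_digit`) and the toy label list are assumptions made for illustration.

```python
def one_hot(label, num_classes=10):
    """Return the one-hot vector for an integer class label."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def argmax(row):
    return max(range(len(row)), key=lambda k: row[k])

def indices_of_digit(one_hot_labels, digit):
    """Indices of the examples whose one-hot label encodes `digit`."""
    return [i for i, row in enumerate(one_hot_labels) if argmax(row) == digit]

# Toy stand-in for a label array: digits 2, 6, 2, 7
labels = [one_hot(d) for d in [2, 6, 2, 7]]

twos = indices_of_digit(labels, 2)           # which examples are '2's
target_labels = [one_hot(6) for _ in twos]   # the class we want them mistaken for

print(twos)  # [0, 2]
```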


### Vanilla Neural Network 

Now that we have the data, let's get our feet wet with TensorFlow by implementing a basic fully connected neural network with one hidden layer, and see how it performs on the data. Note that aside from using the TensorFlow code mentioned earlier, much of this code is repurposed from an earlier project I've completed, which can be found [here](https://github.com/MunirAD/Facial_Recognition_AlexNet/blob/master/part2.py) (in the `fully_connected` function code).



In [60]:
def FullyConnectedNet(num_in, num_hid, num_out, lam, learn_rate, num_epochs, batch_size):
    
    # Set up a placeholder for the input 'x' and its label 'y_' 
    # Note the label is in one-of-k (one-hot) encoding
    x = tf.placeholder(tf.float32, [None, num_in])
    y_ = tf.placeholder(tf.float32, [None, num_out])

    # Set up variables for the network parameters. Note this is a single-hidden
    # layer network
    W0 = tf.Variable(tf.random_normal([num_in, num_hid], stddev=0.01))
    b0 = tf.Variable(tf.random_normal([num_hid], stddev=0.01))
    W1 = tf.Variable(tf.random_normal([num_hid, num_out], stddev=0.01))
    b1 = tf.Variable(tf.random_normal([num_out], stddev=0.01))
    
    # Initialize the variables
    sess.run(tf.initialize_all_variables())
    
    # Set up the computation of a forward pass on an input to the net
    layer1 = tf.nn.relu(tf.matmul(x, W0) + b0)
    layer2 = tf.matmul(layer1, W1) + b1
    y = tf.nn.softmax(layer2)
    
    # Set up an L2 weight-decay penalty to regularize, reducing the risk of over-fitting
    decay_penalty = lam*tf.reduce_sum(tf.square(W0)) + lam*tf.reduce_sum(tf.square(W1))
    # Objective: negative log-likelihood summed over the batch, plus the penalty
    NLL = -tf.reduce_sum(y_*tf.log(y)) + decay_penalty
    
    # Set up the Gradient Descent optimization step on the objective function 
    # with the given learning rate
    train_step = tf.train.GradientDescentOptimizer(learn_rate).minimize(NLL)
    
    # Set up the logic for what a correct prediction is, and 
    # classification accuracy
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
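    # Do mini-batch gradient descent. Note that num_epochs counts mini-batch
    # updates (one batch per iteration), not full passes over the training set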
    for i in range(num_epochs):
        batch = mnist.train.next_batch(batch_size)
        train_step.run(feed_dict = {x: batch[0], y_ : batch[1]})
    
    # Print the accuracy on the validation data
    print(accuracy.eval(feed_dict={x: mnist.validation.images, y_: mnist.validation.labels}))

FullyConnectedNet(784, 300, 10, 0.01, 0.0005, 1000, 100)

0.9154
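
For readers who want to see what the graph above computes, here is a small plain-Python sketch of the same forward pass and objective for a single example: an affine layer, ReLU, a second affine layer, then the negative log-likelihood plus the L2 penalty. It is an illustration only; the function names and toy weights are assumptions, and it deliberately differs from the cell above in one respect. It computes the log-softmax directly with the log-sum-exp trick rather than taking `log` of a softmax output, which avoids `log(0)` when a probability underflows. TensorFlow also provides a fused op for this purpose, `tf.nn.softmax_cross_entropy_with_logits`, though its calling convention differs between versions.

```python
import math

def relu(v):
    return [max(0.0, a) for a in v]

def affine(x, W, b):
    # x: length-n list, W: n-by-m nested list, b: length-m list -> length-m list
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def log_softmax(logits):
    # Subtracting the max keeps exp() from overflowing; staying in log space
    # avoids log(0) when a class probability underflows to zero.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def objective(x, y_onehot, W0, b0, W1, b1, lam):
    """NLL of one example plus lam * (sum of squared weights), as in the cell above."""
    hidden = relu(affine(x, W0, b0))
    logits = affine(hidden, W1, b1)
    nll = -sum(t * lp for t, lp in zip(y_onehot, log_softmax(logits)))
    penalty = lam * (sum(w * w for row in W0 for w in row)
                     + sum(w * w for row in W1 for w in row))
    return nll + penalty

# Hypothetical toy network: 2 inputs, 2 hidden units, 2 classes
W0 = [[1.0, -1.0], [0.5, 0.0]]
b0 = [0.0, 0.0]
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]

value = objective([1.0, 2.0], [1.0, 0.0], W0, b0, W1, b1, lam=0.01)
print(value)

# Extreme logits stay finite in log space
print(log_softmax([1000.0, 0.0]))
```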
