# Animal classification: a feed-forward neural network

This is one of those problems that only occurs to the guy in study material. You know, the guy who had a wheelbarrow full of melons back in math? He's at it again, and now has a zoo of animals for which he'd like to make a classifier. He decides to go for a feed-forward neural network, as he has a lot of labeled examples already. Let's get started!

First, import your libraries.

In [10]:
import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split, KFold

tf.set_random_seed(1)
np.random.seed(1)

Now generate your zoo.

In [11]:
def generate_animals(nb, hairy, nb_legs, weight_mean, weight_sd, label):
    weight_vec = np.random.normal(weight_mean, weight_sd, size=(nb,1))
    categorical_properties = np.tile(np.array([label, hairy, nb_legs]), (nb, 1))
    features = np.concatenate([categorical_properties, weight_vec], axis=1)
    return features


ducks = generate_animals(50, 0, 2, 1, 0.43, 1)
cats = generate_animals(50, 1, 4, 4, 0.45, 2)
dogs = generate_animals(50, 1, 4, 6, 6, 3)
bats = generate_animals(50, 1, 2, 1, 0.5, 4)
nb_classes = 4

animals = np.concatenate([ducks, cats, dogs, bats], axis=0)
training, test = train_test_split(animals, test_size=0.20)

test_features = test[:, 1:]
test_labels_int = test[:, 0].astype(int)


...50 of each...how does he keep getting into these situations?! Anyway, we have a nice test and training dataset now. Let's see what we can do with this.
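To make the row layout concrete, here is a minimal NumPy sketch that builds a tiny zoo the same way and slices it into labels and features. The helper name `make_rows`, the seed and the two-animals-per-class size are assumptions made for illustration only; the column order (label, hairy, nb_legs, weight) follows the cell above.

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical seed, only for reproducibility


def make_rows(nb, hairy, nb_legs, weight_mean, weight_sd, label):
    # Three constant columns (label, hairy, nb_legs) plus one random weight column.
    weights = rng.normal(weight_mean, weight_sd, size=(nb, 1))
    constants = np.tile(np.array([label, hairy, nb_legs]), (nb, 1))
    return np.concatenate([constants, weights], axis=1)


rows = np.concatenate([make_rows(2, 0, 2, 1, 0.43, 1),    # two ducks
                       make_rows(2, 1, 4, 4, 0.45, 2)])   # two cats

labels = rows[:, 0].astype(int)  # column 0: 1-based class label
features = rows[:, 1:]           # columns 1-3: hairy, nb_legs, weight

print(rows.shape)      # (4, 4)
print(labels)          # [1 1 2 2]
print(features.shape)  # (4, 3)
```

So every animal contributes three input features, which is why the network's input width is 3.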

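TensorFlow needs to know what size of input to expect, so define some so-called 'placeholders' for that. For convenience, we made sure that the input is always going to be the same size: 1/5th of the full dataset, which is 40 animals for both the test set and each training batch. That size goes into the graph too, as the shape of the label placeholder and the number of pieces the input batch is split into.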

In [12]:
nb_examples = int(animals.shape[0] * 0.2)
x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[nb_examples, nb_classes])
xl = tf.split(x, nb_examples, axis=0)
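As a rough illustration of what the split produces, the NumPy sketch below uses `np.split` as a stand-in for `tf.split`. This is an analogy, not the TensorFlow call itself, and the dummy batch of zeros is an assumption made for the example.

```python
import numpy as np

nb_examples = 40  # 20% of the 200 animals generated earlier
batch = np.zeros((nb_examples, 3), dtype=np.float32)  # dummy stand-in for a fed batch

# Splitting into nb_examples equal parts along axis 0 yields one (1, 3) row
# per example, mirroring the list of per-example tensors held in `xl`.
pieces = np.split(batch, nb_examples, axis=0)

print(len(pieces))      # 40
print(pieces[0].shape)  # (1, 3)
```

Note that `np.split` raises an error if the array cannot be divided evenly, which is one reason to keep every batch at exactly `nb_examples` rows.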

Time for the model weights! Initialize them at random; the biases can be initialized as 0s.

In [13]:
hidden_layer_size=32

A1 = tf.Variable(tf.random_normal((3, hidden_layer_size), stddev=0.1))
b1 = tf.Variable(tf.zeros((1, hidden_layer_size),  dtype=tf.float32))

A2 = tf.Variable(tf.random_normal((hidden_layer_size, hidden_layer_size), stddev=0.1))
b2 = tf.Variable(tf.zeros((1, hidden_layer_size), dtype=tf.float32))

A3 = tf.Variable(tf.random_normal((hidden_layer_size, nb_classes), stddev=0.1))
b3 = tf.Variable(tf.zeros((1, nb_classes), dtype=tf.float32))
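To get a feel for the size of this network, the sketch below recreates the same shapes in NumPy and counts the trainable parameters. The dictionary layout and the seed are assumptions made for illustration; only the shapes and the standard deviation of 0.1 come from the cell above.

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical seed
n_in, hidden, n_out = 3, 32, 4

# Same shapes as A1..b3 above: random weights, zero biases.
params = {
    "A1": rng.normal(0.0, 0.1, size=(n_in, hidden)),
    "b1": np.zeros((1, hidden)),
    "A2": rng.normal(0.0, 0.1, size=(hidden, hidden)),
    "b2": np.zeros((1, hidden)),
    "A3": rng.normal(0.0, 0.1, size=(hidden, n_out)),
    "b3": np.zeros((1, n_out)),
}

n_params = sum(p.size for p in params.values())
print(n_params)  # 1316
```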


Construct the graph by linking the placeholders, weight matrices and biases to each other. We're going to provide the network with a batch of examples at once (see here why: http://ruder.io/optimizing-gradient-descent/), so the list comprehensions iterate through those.

In [14]:
h1 = [tf.nn.sigmoid(tf.add(tf.matmul(xc, A1), b1)) for xc in xl]
h2 = [tf.nn.sigmoid(tf.add(tf.matmul(h1c,A2), b2)) for h1c in h1]
y_hat = [tf.add(tf.matmul(h2c, A3), b3) for h2c in h2]

y_hat = tf.squeeze(tf.stack(y_hat, axis=0))
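The same forward pass can be sketched in plain NumPy. The example below is an illustration under assumed random weights and inputs, not the graph itself. It also checks that feeding examples one row at a time, as the list comprehensions do, gives the same logits as one batched matrix product.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)  # hypothetical seed
n_in, hidden, n_out, batch_size = 3, 32, 4, 5

A1, b1 = rng.normal(0, 0.1, (n_in, hidden)), np.zeros((1, hidden))
A2, b2 = rng.normal(0, 0.1, (hidden, hidden)), np.zeros((1, hidden))
A3, b3 = rng.normal(0, 0.1, (hidden, n_out)), np.zeros((1, n_out))
x = rng.normal(size=(batch_size, n_in))  # made-up inputs


def forward(inputs):
    h1 = sigmoid(inputs @ A1 + b1)  # first hidden layer
    h2 = sigmoid(h1 @ A2 + b2)      # second hidden layer
    return h2 @ A3 + b3             # raw logits, no softmax yet


# One (1, 3) row at a time, then stacked back into a (batch_size, 4) array.
per_example = np.concatenate([forward(row[None, :]) for row in x], axis=0)
batched = forward(x)

print(batched.shape)                      # (5, 4)
print(np.allclose(per_example, batched))  # True
```

Since both routes agree, the per-example split is a stylistic choice here rather than a requirement of the math.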


Almost there! Just need to define what you want to optimize (cross entropy is a good choice, see here: http://neuralnetworksanddeeplearning.com/chap3.html) and what algorithm you want to use to optimize it.

In [15]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_hat))
optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(cost)
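For intuition about the cost, here is a minimal NumPy sketch of softmax cross-entropy computed from logits and one-hot labels, then averaged over the batch. The function name and the toy inputs are assumptions for illustration; it is meant to mirror the idea, not to reproduce TensorFlow's implementation.

```python
import numpy as np


def softmax_cross_entropy(labels, logits):
    # Subtract the row maximum first so the exponentials cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Only the log-probability of the true class survives the multiplication.
    return -(labels * log_probs).sum(axis=1)


logits = np.array([[2.0, 0.0, 0.0, 0.0],   # fairly confident in class 1 (correct)
                   [0.0, 0.0, 0.0, 0.0]])  # no idea: uniform over 4 classes
labels = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])

losses = softmax_cross_entropy(labels, logits)
cost = losses.mean()  # plays the role of tf.reduce_mean

print(np.isclose(losses[1], np.log(4)))  # True: a uniform guess costs log(4)
print(losses[0] < losses[1])             # True: the confident correct guess costs less
```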

We told TensorFlow what to do; now it's going to do all of it outside Python (in C++), in a "session". Since it runs outside Python, it's hard to peek inside while it's running if you haven't configured it to do so. We'll be sure to print some results every now and then to see if we're on the right track.

In [16]:
with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    for epoch in range(1001):
        for _, idx in KFold(n_splits=4, shuffle=True).split(training):
            features = training[idx][:,1:]
            labels = training[idx][:,0]
            labels = np.eye(nb_classes + 1)[labels.astype(int)].astype(float)[:,1:]
            yh, _ = session.run([y_hat, optimizer], feed_dict={
                x: features,
                y: labels
            })
        if not epoch % 100:
            pred = session.run(y_hat, feed_dict={
                x: test_features
            })

            # print(pred)  # uncomment to inspect the raw logits
            pred_int = np.argmax(pred, axis=1) + 1
            test_acc = np.mean(pred_int.astype(int) == test_labels_int)
            print('epoch: {epoch}, test accuracy: {test_acc}'.format(test_acc=test_acc, epoch=epoch))


epoch: 0, test accuracy: 0.3
epoch: 100, test accuracy: 0.425
epoch: 200, test accuracy: 0.7
epoch: 300, test accuracy: 0.675
epoch: 400, test accuracy: 0.675
epoch: 500, test accuracy: 0.675
epoch: 600, test accuracy: 0.975
epoch: 700, test accuracy: 0.675
epoch: 800, test accuracy: 0.9
epoch: 900, test accuracy: 0.975
epoch: 1000, test accuracy: 0.975
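The label handling in the loop above is compact, so here is a small NumPy sketch of the round trip: 1-based integer labels to one-hot rows for training, and argmax plus 1 to get back to integer labels for scoring. The four example labels are made up for illustration.

```python
import numpy as np

nb_classes = 4
labels_int = np.array([1, 4, 2, 3])  # made-up 1-based labels

# np.eye(5)[k] is a length-5 row with a 1 at index k. Labels start at 1,
# so column 0 is never used and can be dropped, leaving 4 columns.
one_hot = np.eye(nb_classes + 1)[labels_int][:, 1:]

# An equivalent, arguably more direct way: shift the labels to 0-based first.
one_hot_alt = np.eye(nb_classes)[labels_int - 1]

# argmax returns a 0-based column index, so add 1 to recover the labels.
recovered = np.argmax(one_hot, axis=1) + 1
accuracy = np.mean(recovered == labels_int)

print(one_hot[1])                            # [0. 0. 0. 1.]
print(np.array_equal(one_hot, one_hot_alt))  # True
print(accuracy)                              # 1.0
```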


Pretty good score, weird textbook guy triumphs again! Play with your newly constructed network a little: you can try one of the many gradient descent flavors included in TensorFlow to train your network, for example, or plot some metrics. Or make up another animal. See you next time, textbook guy!