# Introduction to Artificial Neural Networks

Perceptron: A Perceptron is composed of a single layer of LTUs (linear threshold units).
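As a minimal sketch of what a single LTU computes: it takes a weighted sum of its inputs and applies a step function. The weights, the bias, and the convention that the step fires at `z >= 0` are illustrative assumptions, not values from the text.

```python
import numpy as np

def ltu(x, w, b):
    """Linear threshold unit: a step function applied to a weighted sum."""
    z = np.dot(x, w) + b          # weighted sum of the inputs plus a bias
    return (z >= 0).astype(int)   # step function: 1 if z >= 0, else 0

# Hypothetical weights that make one LTU behave like a logical AND.
w_and = np.array([1.0, 1.0])
b_and = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_outputs = ltu(inputs, w_and, b_and)
print(and_outputs)  # [0 0 0 1]
```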

It turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a *Multi-Layer Perceptron* (MLP).

A Multi-Layer Perceptron is composed of an input layer, one or more hidden layers, and an output layer.

When an ANN has two or more hidden layers, it is called a *deep neural network* (DNN).
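A classic example of a limitation that stacking removes is XOR: no single LTU can compute it, because the two classes are not linearly separable. The sketch below wires up a tiny two-layer network of LTUs with hand-picked (hypothetical) weights to show that one hidden layer is enough to represent it. Nothing here is trained.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

# Hidden layer: column 0 acts as OR, column 1 as AND (hand-picked weights).
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output layer: fires when OR is on and AND is off, which is XOR.
W_out = np.array([1.0, -1.0])
b_out = -0.5

def xor_mlp(x):
    hidden = step(x @ W_hidden + b_hidden)   # hidden layer of two LTUs
    return step(hidden @ W_out + b_out)      # single output LTU

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_outputs = xor_mlp(inputs)
print(xor_outputs)  # [0 1 1 0]
```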

### Backpropagation

For each training instance, the backpropagation algorithm first makes a prediction (forward pass) and measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error.
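The phases described above can be traced in a few lines of NumPy. This is only a sketch under stated assumptions: the 2-3-1 network, the tanh hidden activation, the mean squared error, the toy data, and the learning rate are all made up for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 4 instances, 2 features, 1 target value each.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_toy = np.array([[0.0], [1.0], [1.0], [0.0]])

# Small random weights for a 2-3-1 network.
W1 = rng.normal(scale=0.5, size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 1))
b2 = np.zeros(1)
learning_rate = 0.1

def forward(X):
    hidden = np.tanh(X @ W1 + b1)        # hidden layer activations
    return hidden, hidden @ W2 + b2      # linear output layer

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# 1. Forward pass and error measurement.
hidden, pred = forward(X_toy)
loss_before = mse(pred, y_toy)

# 2. Reverse pass: propagate the error gradient layer by layer (chain rule).
grad_pred = 2 * (pred - y_toy) / pred.size      # dLoss / dPrediction
grad_W2 = hidden.T @ grad_pred
grad_b2 = grad_pred.sum(axis=0)
grad_hidden = grad_pred @ W2.T
grad_z1 = grad_hidden * (1 - hidden ** 2)       # derivative of tanh
grad_W1 = X_toy.T @ grad_z1
grad_b1 = grad_z1.sum(axis=0)

# 3. Tweak every weight slightly in the direction that reduces the error.
W1 -= learning_rate * grad_W1
b1 -= learning_rate * grad_b1
W2 -= learning_rate * grad_W2
b2 -= learning_rate * grad_b2

_, pred_after = forward(X_toy)
loss_after = mse(pred_after, y_toy)
print(loss_before, loss_after)
```

Real training repeats these steps over many instances or mini-batches. The single step here only shows that the update moves the loss downward.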

### Training an MLP with TensorFlow's High-Level API

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import tensorflow as tf

In [3]:
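# fetch_mldata was removed in scikit-learn 0.22; fetch_openml is its replacement.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [4]:
# as_frame=False returns NumPy arrays, so mnist.data and mnist.target can be sliced directly.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)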



In [5]:
mnist.data.shape

(70000, 784)

In [11]:
X_train, X_test = mnist.data[:60000], mnist.data[60000:]
y_train, y_test = mnist.target[:60000].astype(int), mnist.target[60000:].astype(int)

In [20]:
# feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
# dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300, 100], n_classes=10, feature_columns=feature_columns)

In [19]:
# dnn_clf.fit(X_train, y_train, batch_size=50, steps=40000)

In [17]:
# dnn_clf.evaluate(X_train, y_train)

In [18]:
# dnn_clf.evaluate(X_test, y_test)

### Training a DNN Using Plain TensorFlow

In [23]:
import numpy as np
import tensorflow as tf

n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None,), name='y')  # 1-D vector of class ids; (None) without the comma is just None

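def neuron_layer(X, n_neurons, name, activation=None):
    """Fully connected layer; pass activation='relu' for ReLU, otherwise the output is linear."""
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        # Scale the initial weights by the number of inputs to the layer.
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name='weights')
        b = tf.Variable(tf.zeros([n_neurons]), name='biases')
        z = tf.matmul(X, W) + b  # weighted sum plus bias for every neuron in the layer
        if activation == 'relu':
            return tf.nn.relu(z)
        else:
            return z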

with tf.name_scope('dnn'):
    hidden1 = neuron_layer(X, n_hidden1, 'hidden1', activation='relu')
    hidden2 = neuron_layer(hidden1, n_hidden2, 'hidden2', activation='relu')
    logits = neuron_layer(hidden2, n_outputs, 'outputs')

with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name='loss')
    
learning_rate = 0.01
with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
    
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    
init = tf.global_variables_initializer()
saver = tf.train.Saver()
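To make the `loss` and `eval` nodes above less opaque, here is a rough NumPy sketch of what `sparse_softmax_cross_entropy_with_logits` and `in_top_k(logits, y, 1)` compute. It is meant as an illustration of the math, not a description of TensorFlow's implementation. The toy logits and labels are hypothetical, and the helper names are made up for this example.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-instance cross entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def top1_correct(logits, labels):
    """True where the target class has the highest logit (ties count as correct here)."""
    target_logits = logits[np.arange(len(labels)), labels]
    return target_logits >= logits.max(axis=1)

# Hypothetical logits for 3 instances and 3 classes.
toy_logits = np.array([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3],
                       [1.2, 0.2, 0.4]])
toy_labels = np.array([0, 1, 2])

toy_xentropy = sparse_softmax_cross_entropy(toy_logits, toy_labels)
toy_loss = toy_xentropy.mean()                                   # like tf.reduce_mean(xentropy)
toy_accuracy = top1_correct(toy_logits, toy_labels).astype(np.float32).mean()
print(toy_loss, toy_accuracy)
```

The labels are "sparse" in the sense that they are class indices rather than one-hot vectors, which is why `y` is an integer placeholder in the graph.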

In [None]:
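n_epochs = 400
batch_size = 50
n_batches = len(X_train) // batch_size  # assumed: one full pass over the training set per epoch

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        shuffled = np.random.permutation(len(X_train))
        for batch_idx in np.array_split(shuffled, n_batches):
            # Assumed preprocessing: scale raw pixel values (0-255) to [0, 1].
            feed = {X: X_train[batch_idx] / 255.0, y: y_train[batch_idx]}
            sess.run(training_op, feed_dict=feed)
        acc_test = accuracy.eval(feed_dict={X: X_test / 255.0, y: y_test})
        print(epoch, 'Test accuracy:', acc_test)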

### Fine-Tuning Neural Network Hyperparameters

#### Number of Hidden Layers

- For many problems, you can just begin with a single hidden layer and you will get reasonable results
- Deep networks can model complex functions using exponentially fewer neurons than shallow nets
- For more complex problems, you can gradually ramp up the number of hidden layers, until you start overfitting the training set
- Very complex tasks typically require networks with dozens of layers (or even hundreds, but not fully connected ones), and they need a huge amount of training data. However, you will rarely have to train such networks from scratch: it is much more common to reuse parts of a pretrained state-of-the-art network that performs a similar task. Training will then be a lot faster and require much less data.

#### Number of Neurons per Hidden Layer

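- The number of neurons in the input and output layers is determined by the type of input and output your task requires (for MNIST, 28 × 28 = 784 inputs and 10 outputs)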
- In general you will get more bang for the buck by increasing the number of layers than the number of neurons per layer
- You can use the same number of neurons in every hidden layer, which leaves just one hyperparameter to tune instead of one per layer
- Gradually increase the number of neurons per layer until you are overfitting the training data
- A simpler approach is to pick a model with more layers and neurons than you actually need, then use early stopping to prevent overfitting
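The early stopping idea from the last bullet can be sketched as a small helper that watches the validation loss and remembers the best epoch. The function name, the `patience` parameter, and the loss values are hypothetical. In a real training loop the losses would be computed once per epoch, and the model would be checkpointed at each new best.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Scans validation losses in order and stops once `patience` epochs
    have passed without a new best loss.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss   # new best: checkpoint here
        elif epoch - best_epoch >= patience:
            break                                 # no recent improvement: stop training
    return best_epoch

# Hypothetical validation losses: improving at first, then overfitting sets in.
val_losses = [0.90, 0.62, 0.48, 0.41, 0.43, 0.44, 0.47, 0.52]
best = early_stopping_epoch(val_losses, patience=3)
print(best)  # 3
```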

#### Activation Functions

- In most cases, you can just use ReLU (or one of its variants) in the hidden layers and softmax in the output layer for classification
- For regression tasks, you can use no activation function at the output layer at all
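For reference, the two activation functions mentioned above are short enough to write out in NumPy. This is a minimal sketch with made-up input values. The max-subtraction inside `softmax` is a common numerical-stability trick and is not something the notes prescribe.

```python
import numpy as np

def relu(z):
    """ReLU: passes positive values through and clips negatives to zero."""
    return np.maximum(0.0, z)

def softmax(z):
    """Softmax over the last axis: turns scores into probabilities that sum to 1."""
    shifted = z - z.max(axis=-1, keepdims=True)   # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

z = np.array([[-1.0, 0.0, 2.0]])   # hypothetical pre-activation values

hidden_out = relu(z)        # hidden layer: [[0. 0. 2.]]
class_probs = softmax(z)    # classification output: each row sums to 1
regression_out = z          # regression output: no activation at all
print(hidden_out, class_probs, regression_out)
```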