# Neural networks

In a primitive neuron, the weighted sum of the input features is passed through an activation function (for example, the step function in Fig. 1) that decides whether the neuron fires.

<figure>
    <img src="../assets/primitive_neuron.png" alt="Primitive neuron" style="width:100%">
    <figcaption align="center"> Fig. 1: Primitive neuron used in early neural networks </figcaption>
</figure>

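Output of a neuron: $y = f(\overline{W} \cdot \overline{X})$, where $\overline{W}$ is the weight vector, $\overline{X}$ is the input feature vector, and $f$ is the activation function.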
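
The neuron output can be sketched in plain Python. This is only an illustration: the weights, the inputs, the threshold of 0, and the helper names `step` and `neuron_output` are assumptions, not values taken from Fig. 1.

```python
def step(z, threshold=0.0):
    """Step activation: fire (1) when the weighted sum reaches the threshold."""
    return 1 if z >= threshold else 0


def neuron_output(weights, inputs, activation=step):
    """Compute y = f(W . X) for a single neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


# Illustrative values only (assumed, not from the figure).
weights = [0.5, -1.0, 2.0]
fires = neuron_output(weights, [1.0, 1.0, 1.0])   # weighted sum = 1.5
silent = neuron_output(weights, [1.0, 2.0, 0.0])  # weighted sum = -1.5
```

Swapping `step` for another function (a sigmoid or ReLU, say) changes only the `activation` argument; the weighted sum stays the same.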

* More than one hidden layer -> deep neural network.

**Why did neural networks take so long to come to fruition?**

Large datasets are required. The optimization problem is also not convex, so training relies on iterative gradient-based methods that are computationally heavy.


In [1]:
import tensorflow as tf

We use *cross-entropy* as the cost function.

One feedforward pass plus one backpropagation pass over the entire training set = 1 epoch.
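
As a rough illustration of what the cross-entropy cost measures, the sketch below computes it for a single example in plain Python. The helper name `cross_entropy`, the `eps` guard, and the probability values are assumptions made for the example; Keras computes the loss internally and averages it over each batch.

```python
import math


def cross_entropy(true_class, predicted_probs, eps=1e-12):
    """Cross-entropy for one example: -log(probability given to the true class)."""
    p = max(predicted_probs[true_class], eps)  # guard against log(0)
    return -math.log(p)


# The true class is index 2 in both cases (made-up probabilities).
confident_right = cross_entropy(2, [0.05, 0.05, 0.90])
confident_wrong = cross_entropy(2, [0.90, 0.05, 0.05])
```

The loss is small when the model assigns high probability to the correct class and grows quickly when it is confidently wrong.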

In [4]:
import tensorflow_datasets as tfds

In [5]:
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,  # yield (image, label) tuples
    with_info=True,      # also return the dataset metadata
)

In [7]:
n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500

n_classes = 10
batch_size = 128

In [8]:
def normalize_img(image, label):
    # Normalizes images: uint8 -> float32
    return tf.cast(image, tf.float32) / 255., label

In [9]:
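# Normalize the pixels, then cache so the mapping runs only once.
ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
# Use a shuffle buffer as large as the training set for a full shuffle.
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
# Batch after shuffling so every epoch sees different batches.
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)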
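
With this pipeline, one pass over `ds_train` is one epoch, and each epoch consists of many batched steps. The arithmetic below is a small sketch of how many steps that is. It assumes the standard 60,000-example MNIST training split and relies on `batch()` keeping the final partial batch, which is its default behaviour.

```python
import math

num_train_examples = 60_000  # size of the MNIST training split
batch_size = 128             # same value as in the notebook

# Number of weight updates (steps) in one epoch.
steps_per_epoch = math.ceil(num_train_examples / batch_size)

# The last batch is smaller because 60,000 is not a multiple of 128.
last_batch_size = num_train_examples - (steps_per_epoch - 1) * batch_size
```

So each epoch performs 469 forward and backward passes, one per batch, with the final batch holding 96 examples.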

In [10]:
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

* Categorical cross-entropy -> for one-hot encoded labels
* Sparse categorical cross-entropy -> for integer class labels
* `from_logits=True` is used when the output layer has no softmax activation, so the model outputs raw logits.
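
The points above can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the helper names and the logit values are assumptions. It shows that the two losses give the same value for the same example, and that raw logits need a softmax before the log is taken (the step that `from_logits=True` asks the loss to perform).

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities, in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_ce(one_hot, probs):
    """Cross-entropy for a one-hot encoded label."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_ce(label, probs):
    """Cross-entropy for an integer class label."""
    return -math.log(probs[label])


logits = [2.0, 1.0, 0.1]   # made-up raw outputs of a final Dense layer
probs = softmax(logits)    # what from_logits=True applies inside the loss

loss_one_hot = categorical_ce([1, 0, 0], probs)  # label as a one-hot vector
loss_sparse = sparse_categorical_ce(0, probs)    # same label as an integer
```

Both calls describe class 0 and return the same loss; the sparse form simply avoids building the one-hot vector.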

In [12]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x25ca4a6f7f0>