The easiest way to get your hands dirty and do some practical work is to use Keras. Keras, as I still remember, used to be the _de facto_ library for DL back in 2016-18. Gradually, with the retirement of Theano and CNTK, it was reduced to a mere TensorFlow wrapper. On top of that, the rise of PyTorch and JAX meant Keras fell out of favour.

Recently, the Keras developers realized the need of the hour and introduced [Keras Core](https://keras.io/keras_core/), a multi-backend version of Keras that supports JAX and PyTorch in addition to TensorFlow. It's expected to be released as Keras 3.0 soon. Let's get started.

We will use JAX as the backend here, so let's specify it first.

```python
import os

# Set the backend env variable to JAX (this must happen before Keras is imported)
os.environ["KERAS_BACKEND"] = "jax"
```

> **Note:** Specifying the backend after importing Keras will not work.

```python
import numpy as np
import keras_core as kr
```

```
Using JAX backend.
```

Now we will take one of the most commonly used, basic datasets: MNIST. MNIST consists of scanned handwritten digits (0 to 9) and is already available in Keras datasets.

It's already split into a training subset (60,000 images) and a test subset (10,000 images).

```python
(xTrain, yTrain), (xTest, yTest) = kr.datasets.mnist.load_data()
```

Each MNIST image is $28 \times 28$ pixels. Since the images are grayscale, the input shape has a single channel, and `C` holds the number of classes.

```python
C = 10  # number of classes (digits 0-9)
imageShape = (28, 28, 1)  # height, width, channels (grayscale)
```

Usually, for multiclass classification problems (like ours), we don't use the raw labels 0, 1, 2, ... directly but instead use **one-hot encoding**, which converts each label into a vector of length $C$ (the number of classes). This vector has all entries zero except the $j^{th}$ entry, where $j$ is the label's class (e.g., index 7 for the digit 7, counting from 0).

It can be achieved in Keras using **`to_categorical()`**. Easy peasy!

```python
from keras_core.utils import to_categorical

# Convert the integer labels into one-hot vectors of length C
yTrain = to_categorical(yTrain, C)
yTest = to_categorical(yTest, C)
```
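For intuition, here is a minimal plain-NumPy sketch of what one-hot encoding produces. `one_hot` is a hypothetical helper for illustration only, not part of Keras:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: plain-NumPy sketch of one-hot encoding."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Put a 1 in the column that matches each sample's label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

encoded = one_hot([7, 0, 3], 10)
print(encoded)  # row 0 has its 1 at index 7, row 1 at index 0, row 2 at index 3
```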

We will specify the hyperparameters here.

```python
batchSize = 128  # samples per gradient update
numberEpochs = 3  # full passes over the training data
```

The next step is to define the model of our neural network (a CNN in this case). This is where Keras shines: building the layers of a neural network is as smooth as silk. For that, we import `layers` from Keras.

```python
from keras_core import layers
```

## Sequential API

Keras offers several ways to build models (the Sequential API, the Functional API and model subclassing), and the simplest is Sequential. It is pretty straightforward: we build a neural network by simply stacking layers on top of each other, each with its own parameters. The output of each layer becomes the input of the next one.
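Conceptually, stacking layers this way is just function composition. The sketch below uses plain Python functions as stand-in "layers"; `sequential`, `double` and `add_one` are hypothetical names for illustration, not Keras APIs:

```python
from functools import reduce

def sequential(*layers):
    """Hypothetical sketch: chain layers so each output feeds the next layer."""
    def model(x):
        return reduce(lambda acc, layer: layer(acc), layers, x)
    return model

double = lambda x: 2 * x
add_one = lambda x: x + 1

model = sequential(double, add_one)
print(model(3))  # 7, i.e. add_one(double(3))
```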

Its basic building blocks are layers, for example:

### Input layer

As its name suggests, `Input()` defines the input layer. It takes the dimensions of the input (be it an image or any other type of data) through its `shape` argument, without the batch dimension.

**Caution:** Don't pass the input images themselves here. That comes later, at training time. Right now, we are only defining the model's architecture.

```python
inputLayer = layers.Input(shape=imageShape)
```

### Convolution layer

**`Conv2D()`** is an important function used to define a convolutional layer. Its main arguments are:

- **Number of filters** (the first argument, `filters`): A single filter can only learn to detect one kind of pattern, so we define several filters to capture different features (edges, strokes, curves, etc.). All filters have the same size, but each one is applied, and learns, independently of the others, producing its own output channel.
- **`kernel_size`:** The size of each filter. Usually it is odd, like $3 \times 3$ or $5 \times 5$, though you are free to choose other sizes. MNIST images are already pretty small, so $3 \times 3$ will work.
- **`activation`:** The activation function to use. Usually, we use ReLU for the intermediate layers. Please feel free to try others too.

```python
# 32 filters of size 3x3, followed by a ReLU activation
convLayer = layers.Conv2D(32, kernel_size=(3, 3), activation="relu")
```
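By default, `Conv2D` uses `padding="valid"` and a stride of 1, so a $3 \times 3$ filter maps a $28 \times 28$ image to $26 \times 26$. Below is a simplified NumPy sketch of what a single filter computes (a cross-correlation, as in most DL libraries). It omits the bias term and is far slower than the real layer; `conv2d_valid` is a hypothetical helper:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Hypothetical helper: one filter's 'valid' 2D cross-correlation (stride 1, no bias)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the image patch currently under the filter
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))           # stand-in for one grayscale MNIST image
kernel = rng.standard_normal((3, 3))   # stand-in for one learned 3x3 filter
feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU activation
print(feature_map.shape)  # (26, 26)
```

With 32 filters, `convLayer` stacks 32 such feature maps into a $26 \times 26 \times 32$ output.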

### Pooling layer

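Convolution layers have trainable parameters that must be learned through backpropagation. A pooling layer, in contrast, has no parameters: it simply reduces the spatial dimensions of the feature maps by taking the max or the average (Keras supports both, via `MaxPooling2D` and `AveragePooling2D`) of the values under a window. The argument we need here is **`pool_size`**; by default, the stride equals the pool size, so a $2 \times 2$ pool halves the height and width.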

```python
poolLayer = layers.MaxPooling2D(pool_size=(2, 2))
```
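A minimal NumPy sketch of non-overlapping pooling, assuming the stride equals the pool size (the Keras default) and no padding, so leftover rows or columns are dropped. `pool2d` is a hypothetical helper:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Hypothetical helper: non-overlapping 2D pooling (stride == pool size, no padding)."""
    h, w = x.shape
    h_out, w_out = h // size, w // size
    # Drop leftover rows/cols, then group pixels into size x size windows
    windows = x[:h_out * size, :w_out * size].reshape(h_out, size, w_out, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2, "max"))  # [[ 5.  7.] [13. 15.]]
print(pool2d(x, 2, "avg"))  # [[ 2.5  4.5] [10.5 12.5]]
```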

That completes our first convolutional block. A CNN typically consists of convolution layers, each followed by a pooling layer. So let's rename the layers above with an index of 1:

```python
convLayer1 = convLayer
poolLayer1 = poolLayer
```

And we can add a second block too, this time with 64 filters and average pooling.

```python
convLayer2 = layers.Conv2D(64, kernel_size=(3, 3), activation="relu")
poolLayer2 = layers.AveragePooling2D(pool_size=(2, 2))
```

Finally, we flatten the output of the last block into a vector so that we can apply a regular (fully connected) neural network layer on it. `Flatten()` does exactly that.

```python
flatten = layers.Flatten()
```
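To see what `Flatten()` receives, we can trace the spatial size through both blocks by hand, assuming the defaults discussed above (valid padding and stride 1 for convolutions, stride equal to the pool size for pooling). The helper names are hypothetical:

```python
def conv_out(size, kernel):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output size of 'valid' pooling whose stride equals the pool size."""
    return size // pool

side = 28
side = pool_out(conv_out(side, 3), 2)  # block 1: 28 -> 26 -> 13
side = pool_out(conv_out(side, 3), 2)  # block 2: 13 -> 11 -> 5
flat_features = side * side * 64       # 64 channels from convLayer2
print(side, flat_features)  # 5 1600
```

Each of the 64 filters of `convLayer2` ends up as a $5 \times 5$ map, so the flattened vector has 1,600 entries.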

### Dropout

There are a number of techniques to avoid overfitting, and one of them is Dropout. The intuition is pretty straightforward: make sure the model doesn't rely too much on specific neurons by randomly ignoring some of them in each training iteration. At inference time, nothing is dropped.

For `Dropout()`, we simply pass the fraction of input units to drop.

```python
dropOut = layers.Dropout(0.35)  # drop 35% of the inputs during training
```

```
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
```
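A plain-NumPy sketch of inverted dropout, the variant described in the Keras `Dropout` documentation: during training, surviving units are scaled by $1/(1-\text{rate})$ so the expected activation stays the same, and at inference the input passes through unchanged. `dropout` here is a hypothetical helper, not the Keras implementation:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Hypothetical helper: inverted dropout on a NumPy array."""
    if not training or rate == 0.0:
        return x  # no-op at inference time
    keep = rng.random(x.shape) >= rate  # keep each unit with probability 1 - rate
    # Scale the survivors so the expected value of each unit is unchanged
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(10_000)
y = dropout(x, 0.35, training=True, rng=rng)
print((y == 0).mean())  # close to 0.35
print(y.mean())         # close to 1.0
```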


### Dense

For regular fully connected (feedforward) layers, we use **`Dense()`**. It takes the number of output units (here, the number of classes $C$), followed by the activation function.

**Suggestion:** Use softmax for multiclass and sigmoid for binary classification problems.

```python
outputLayer = layers.Dense(C, activation="softmax")
```
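Softmax turns the raw outputs of the final layer into probabilities that sum to 1, with the largest output getting the largest probability. A minimal, numerically stable NumPy sketch (`softmax` is our own helper, not the Keras function):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(3), probs.sum())
```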

```python
cnnModel = kr.Sequential(
    [
        inputLayer,
        convLayer1,
        poolLayer1,
        convLayer2,
        poolLayer2,
        flatten,
        dropOut,
        outputLayer,
    ]
)
```

It is quite useful to check how the model shrinks the image and how many parameters it has. For that, we can simply call `<model name>.summary()`.

```python
cnnModel.summary()
```
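The parameter counts in the summary can be checked by hand. This sketch assumes the layers defined above: $3 \times 3$ kernels, a single-channel input, and a flattened size of $5 \times 5 \times 64$ from the shape arithmetic of the two blocks. Pooling, flattening and dropout add no trainable parameters. The helper names are hypothetical:

```python
def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight tensor per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input-unit pair, plus one bias per unit
    return inputs * units + units

conv1 = conv_params(3, 1, 32)          # 320
conv2 = conv_params(3, 32, 64)         # 18,496
dense = dense_params(5 * 5 * 64, 10)   # 16,010
total = conv1 + conv2 + dense
print(total)  # 34826
```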

Having specified the hyperparameters and the architecture, we can train the model. But wait a minute: we also need to specify the loss function, the optimizer and the evaluation metric. **`compile()`** serves exactly that purpose.

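```python
cnnModel.compile(
    loss="mean_squared_error",  # "categorical_crossentropy" is the usual pairing with softmax
    optimizer="sgd",  # plain stochastic gradient descent
    metrics=["accuracy"],
)
```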

Please feel free to play around with the definition above: try a different loss function (e.g., `categorical_crossentropy`, the usual choice with a softmax output) or other optimizers like Adam. We can even try different evaluation metrics too, though I would recommend going there a bit later.
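To see how the two losses differ, compare mean squared error with categorical cross-entropy on a single made-up prediction (the numbers are illustrative only). MSE averages the squared error over all $C$ entries, while cross-entropy only looks at the probability assigned to the true class, which is why MSE values tend to be much smaller:

```python
import numpy as np

y_true = np.array([0.0, 0.0, 1.0])      # one-hot target (class 2)
y_pred = np.array([0.1, 0.2, 0.7])      # softmax output

mse = np.mean((y_true - y_pred) ** 2)   # averaged over the C entries
cce = -np.sum(y_true * np.log(y_pred))  # only the true class's probability matters
print(round(mse, 4), round(cce, 4))     # 0.0467 0.3567
```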

Since a 2D convolution layer expects each input to have a height, a width and a number of channels, we have to reshape our inputs accordingly. These grayscale images come without a channel axis, but we can simply add one using NumPy's [expand_dims()](https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html).

```python
# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
xTrain = np.expand_dims(xTrain, -1)
xTest = np.expand_dims(xTest, -1)
```
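A quick shape check on a dummy array (a stand-in for a few MNIST images) shows what `expand_dims` does to the last axis:

```python
import numpy as np

batch = np.zeros((5, 28, 28), dtype="uint8")  # stand-in for five MNIST images
with_channel = np.expand_dims(batch, -1)      # append a channel axis at the end
print(batch.shape, "->", with_channel.shape)  # (5, 28, 28) -> (5, 28, 28, 1)
```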

Finally, we can train the model. The objective of any ML model is not only to fit the training data well (low training error) but also to generalize well (low test error). Since the test set should be kept aside for the final evaluation, we estimate the test accuracy/error during training through **validation**: the training set is further split into a training subset and a validation subset.

Keras (like other ML/DL libraries) lets us validate the model during training. All we have to do is specify `validation_split`: 0.1 means 10% of the training samples are held out for validation, and so on.
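A minimal sketch of this split, assuming (as the Keras `fit()` documentation states for `validation_split`) that the validation samples are taken from the end of the arrays, before shuffling. `split_last` is a hypothetical helper; the arithmetic also explains the `422/422` steps per epoch in the training log:

```python
import math
import numpy as np

def split_last(x, y, validation_split):
    """Hypothetical helper: hold out the last fraction of samples for validation."""
    split_at = len(x) - int(round(len(x) * validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(60000)  # stand-ins for the 60,000 MNIST training samples
y = np.arange(60000)
(x_tr, y_tr), (x_val, y_val) = split_last(x, y, 0.1)
steps_per_epoch = math.ceil(len(x_tr) / 128)   # batchSize = 128
print(len(x_tr), len(x_val), steps_per_epoch)  # 54000 6000 422
```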

```python
cnnModel.fit(
    xTrain, yTrain, batch_size=batchSize, epochs=numberEpochs, validation_split=0.1
)
```

```
Epoch 1/3
422/422 ━━━━━━━━━━━━━━━━━━━━ 34s 80ms/step - accuracy: 0.2621 - loss: 0.1411 - val_accuracy: 0.5797 - val_loss: 0.0768
Epoch 2/3
422/422 ━━━━━━━━━━━━━━━━━━━━ 35s 83ms/step - accuracy: 0.5383 - loss: 0.0852 - val_accuracy: 0.7388 - val_loss: 0.0469
Epoch 3/3
422/422 ━━━━━━━━━━━━━━━━━━━━ 34s 80ms/step - accuracy: 0.6664 - loss: 0.0591 - val_accuracy: 0.8155 - val_loss: 0.0294

<keras_core.src.callbacks.history.History at 0x7fae706ad190>
```

```python
score = cnnModel.evaluate(xTest, yTest, verbose=1)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
```

```
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.7960 - loss: 0.0327
Test loss: 0.031506579369306564
Test accuracy: 0.8059999942779541
```
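`evaluate()` returns the loss followed by the compiled metrics, hence `score[0]` and `score[1]`. With one-hot labels, the accuracy reported here boils down to comparing the index of the largest predicted probability with the index of the 1 in the label. A small NumPy sketch with made-up predictions (`categorical_accuracy` is our own helper):

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the one-hot label."""
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

y_true = np.eye(3)[[0, 1, 2, 1]]    # one-hot labels for classes 0, 1, 2, 1
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.6, 0.3, 0.1],  # wrong: predicts class 0 instead of 2
                   [0.1, 0.7, 0.2]])
print(categorical_accuracy(y_true, y_pred))  # 0.75
```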


That's it from our side. But there's much more to do here. Please use it as a launching pad to explore more avenues. Curiosity doesn't have (**shouldn't have**) any limits. Carpe diem!