# Basics of Keras
[Keras homepage](https://keras.io/): Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.


You may also consider installing the following optional dependencies:

- cuDNN (recommended if you plan on running Keras on GPU).
- HDF5 and h5py (required if you plan on saving Keras models to disk).
- graphviz and pydot (used by visualization utilities to plot model graphs).



**Installation**
- for python2.x: pip install keras
- for python3.x: pip3 install keras

In [1]:
#!pip install keras

In [2]:
import keras #importing keras library

### Building a network

In [3]:
import numpy as np
from numpy.random import seed
seed(1) # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Activation # import linear layer (Dense) and activation

#### There are two ways of building a network in keras:
- Sequential: It allows you to build your network by adding layers one after another in a sequence. One drawback of this method is that you can't build networks that share layers.
- Functional API: Here you build a network like a graph. Hence more complex networks can be built.

### Sequential
Read more: https://keras.io/models/sequential/

In [18]:
#Simple 1 layer network 
model = Sequential()
model.add(Dense(10, input_shape=(32,)))
model.add(Activation('softmax'))

In [19]:
model.summary() # prints the summary of the network
# In Output Shape, None is the (variable) batch dimension and 10 is the feature dimension

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_6 (Dense)             (None, 10)                330       
                                                                 
 activation_6 (Activation)   (None, 10)                0         
                                                                 
Total params: 330
Trainable params: 330
Non-trainable params: 0
_________________________________________________________________
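
The Param # column above can be checked by hand: a Dense layer with n_in inputs and n_out units holds an n_in x n_out weight matrix plus n_out biases. A minimal sketch of that arithmetic follows; the helper name dense_params is ours and is not part of Keras.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    return n_in * n_out + (n_out if use_bias else 0)

# Dense(10, input_shape=(32,)): 32*10 weights + 10 biases
print(dense_params(32, 10))  # 330
```

The Activation layer only applies a function element-wise, which is why its parameter count is 0.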


### Functional API
Read more: https://keras.io/getting-started/functional-api-guide/

Building the same network using the functional API.

In [20]:
from keras.layers import Input
from keras.models import Model
inp = Input(shape=(32,))
l1 = Dense(10)(inp) # See how the dense layer is pointing to inp
act1 = Activation('softmax')(l1)

model = Model(inputs=inp,outputs=act1)

model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 32)]              0         
                                                                 
 dense_7 (Dense)             (None, 10)                330       
                                                                 
 activation_7 (Activation)   (None, 10)                0         
                                                                 
Total params: 330
Trainable params: 330
Non-trainable params: 0
_________________________________________________________________


In [22]:
# Building a 2-layer NN for binary classification:
# a hidden Dense(16) + ReLU layer followed by a Dense(2) + softmax output layer
model = Sequential()
model.add(Dense(16, input_shape=(32,)))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))
model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_10 (Dense)            (None, 16)                528       
                                                                 
 activation_10 (Activation)  (None, 16)                0         
                                                                 
 dense_11 (Dense)            (None, 2)                 34        
                                                                 
 activation_11 (Activation)  (None, 2)                 0         
                                                                 
Total params: 562
Trainable params: 562
Non-trainable params: 0
_________________________________________________________________
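
To make the layer stack above concrete, here is a rough NumPy sketch of the forward pass such a model computes: Dense(16), ReLU, Dense(2), softmax. The weights are random placeholders (Keras uses its own initializers), and the function names are ours, so treat it as an illustration of the shapes and not as what Keras does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Placeholder weights with the same shapes as in the summary above.
W1, b1 = rng.normal(size=(32, 16)), np.zeros(16)  # 528 parameters
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)    # 34 parameters

def forward(x):
    h = relu(x @ W1 + b1)        # shape (batch, 16)
    return softmax(h @ W2 + b2)  # shape (batch, 2), each row sums to 1

probs = forward(rng.random((4, 32)))
print(probs.shape)        # (4, 2)
print(probs.sum(axis=1))  # every entry is 1 up to rounding
```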


#### Training

- For supervised learning, we need (x, y) pairs to train our model, where x is the input data and y is the corresponding ground truth.

Let's sample x and y from a random distribution.

In [23]:
# Create a set of random input vectors.
# The feature dimension of the data (32) must match the input_shape of the network, otherwise Keras raises an error.
x_train = np.random.rand(1000,32)
y_train = np.random.binomial(1, 0.5, 1000) # Random 0/1 labels (one Bernoulli trial each), shape (1000,)


# Let's check our inputs and outputs

print("x_train[:5]",x_train[:5])
print("y_train[:5]",y_train[:5])

x_train[:5] [[0.18961835 0.22782933 0.31436441 0.60492097 0.42118645 0.20154231
  0.1464898  0.91184627 0.44857941 0.79833144 0.37673675 0.85371272
  0.11162496 0.35985571 0.50806736 0.54875875 0.65577184 0.12445216
  0.65648235 0.00567807 0.2437447  0.20178268 0.06044281 0.39774929
  0.07738228 0.33011013 0.14853353 0.07479315 0.41522254 0.36208229
  0.11623708 0.94563485]
 [0.61799009 0.38103615 0.75890637 0.90310008 0.6331781  0.81894647
  0.35891329 0.98387846 0.96395515 0.79767278 0.6192263  0.88742212
  0.52880643 0.60072143 0.99474907 0.82542508 0.65820615 0.40970969
  0.78961851 0.92498159 0.03949512 0.15158366 0.27075206 0.66877402
  0.992547   0.54597122 0.74273553 0.84213526 0.31738051 0.08300837
  0.95976396 0.5092413 ]
 [0.26097044 0.74079204 0.08321352 0.11562825 0.08276288 0.84831507
  0.64554609 0.27027708 0.02614707 0.29105671 0.08533006 0.06484995
  0.53946561 0.18399332 0.25517132 0.48928098 0.37180363 0.6808723
  0.6032164  0.76570975 0.35413601 0.01767638 0.9393391

Similarly, we create our validation and test sets.

In [24]:
# Validation Set
x_val = np.random.rand(250,32)
y_val = np.random.binomial(1, 0.5, 250)

# Test Set
x_test = np.random.rand(250,32)
y_test = np.random.binomial(1, 0.5, 250)

Now we set the remaining hyperparameters and compile the model.

In [25]:
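nb_batch = 32 # batch size
nb_epoch = 100 # number of epochs
# Compile model
# y_train holds integer class labels of shape (N,), so we use the sparse variant of
# cross-entropy. Plain 'categorical_crossentropy' expects one-hot targets of shape (N, 2)
# and makes fit() fail with "Shapes (None, 1) and (None, 2) are incompatible".
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Check keras documentation for other optimizers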


We can add some callbacks to the model. These allow us to, e.g., log the training/validation accuracy and loss at each epoch (which lets you detect overfitting), save the model, or implement early stopping.

In [26]:
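# With metrics=['accuracy'], recent Keras versions log the validation metric as 'val_accuracy' (older versions used 'val_acc')
checkpoint = keras.callbacks.ModelCheckpoint('model.h5', monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')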
csvLogger = keras.callbacks.CSVLogger("training_log.csv")
earlyStopping = keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)

callbacks_list = [checkpoint, csvLogger,earlyStopping]
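
The idea behind EarlyStopping(patience=20) can be sketched in plain Python. By default the callback monitors val_loss and stops once it has not improved for `patience` consecutive epochs. The function below is a simplified illustration under that assumption; its name is hypothetical, and it is not the actual Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops after `patience` consecutive epochs without a new
    lowest validation loss; if that never happens, it runs to the end.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [0.70, 0.69, 0.68, 0.69, 0.70, 0.71, 0.72]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

With restore_best_weights=True, the model ends up with the weights from the best epoch (epoch 2 in this toy run) and not those from the epoch at which training stopped.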


In [17]:
model.fit(x=x_train, y=y_train, batch_size=nb_batch, epochs=nb_epoch,callbacks=callbacks_list, verbose=1, validation_data=(x_val,y_val), shuffle=True)
# Keep shuffle True while training. Why?

Epoch 1/100


2023-03-21 08:36:42.581388: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


ValueError: in user code:

    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function  *
        return step_function(self, iterator)
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step  **
        outputs = model.train_step(data)
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/engine/training.py", line 1024, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/engine/training.py", line 1082, in compute_loss
        return self.compiled_loss(
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/engine/compile_utils.py", line 265, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/losses.py", line 152, in __call__
        losses = call_fn(y_true, y_pred)
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/losses.py", line 284, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/losses.py", line 2004, in categorical_crossentropy
        return backend.categorical_crossentropy(
    File "/Users/rudi/.pyenv/versions/3.10.8/lib/python3.10/site-packages/keras/backend.py", line 5532, in categorical_crossentropy
        target.shape.assert_is_compatible_with(output.shape)

    ValueError: Shapes (None, 1) and (None, 2) are incompatible
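
The ValueError above is a label-shape mismatch. categorical_crossentropy compares the targets element-wise with the (None, 2) softmax output, so it expects one-hot targets of shape (N, 2), whereas integer labels of shape (N,) are treated as (N, 1). Two usual remedies are to one-hot encode the labels or to use the sparse variant of the loss, which takes integer labels directly. The NumPy sketch below (helper names are ours, and the probabilities are made up for illustration) suggests that both compute the same quantity:

```python
import numpy as np

y = np.array([0, 1, 1, 0])       # integer class labels, shape (4,)
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.5, 0.5]])   # illustrative softmax outputs, shape (4, 2)

# One-hot encoding: row i has a 1 in column y[i].
y_onehot = np.eye(2)[y]          # shape (4, 2)

def categorical_ce(y_onehot, probs):
    # Expects one-hot targets with the same shape as probs.
    return -np.sum(y_onehot * np.log(probs), axis=1)

def sparse_categorical_ce(y, probs):
    # Expects integer targets; picks the probability of the true class.
    return -np.log(probs[np.arange(len(y)), y])

print(y_onehot.shape)  # (4, 2)
print(np.allclose(categorical_ce(y_onehot, probs),
                  sparse_categorical_ce(y, probs)))  # True
```

In Keras, the one-hot step is available as keras.utils.to_categorical(y_train, num_classes=2), and the integer-label loss is selected with loss='sparse_categorical_crossentropy'.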


#### Questions:
- Why is the training loss decreasing? Why is the validation loss increasing?
- Why is the training accuracy increasing? Why is the validation accuracy almost constant?


In [None]:
# Testing

test_loss, test_accuracy = model.evaluate(x=x_test,y=y_test,batch_size=8)
print("\n")
print("test_loss:",test_loss,"    test_accuracy:", test_accuracy)

## Convolutional Neural Network


A convolutional neural network (or CNN) is a type of neural network that typically comprises the following building blocks:
- Convolutional layer: A set of kernels/filters that convolve with a signal (1D: audio, EEG, etc.; 2D: images; 3D: videos) to find particular patterns in it, depending on the kernel. The kernels are learned through gradient descent.
- Non-linearity: ReLU, sigmoid, tanh, etc.
- Pooling layer: Downsamples the input signal, which also reduces the need for a larger convolutional layer at the output. It also introduces a small amount of translation invariance with respect to the input signal.
- Fully connected layer/linear layer: Mainly used to model the actual decision process, e.g. a classifier.

Hence, in contrast to classical methods, where features are handcrafted and a classifier is then trained on them, a CNN learns both the features and the classification.
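
The first three building blocks listed above can be sketched on a 1D signal with plain NumPy. This is a toy illustration with our own function names and a hand-picked kernel; in a real CNN the kernel values are learned, and Keras provides dedicated layers for each step.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1).

    Like most deep learning libraries, this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

def relu(z):
    return np.maximum(z, 0.0)

def max_pool1d(z, size=2):
    """Non-overlapping max pooling; drops a trailing remainder."""
    n = len(z) // size * size
    return z[:n].reshape(-1, size).max(axis=1)

signal = np.array([0., 0., 1., 1., 1., 0., 0., 1., 0.])
edge_kernel = np.array([-1., 1.])  # responds to rising edges; fixed here for illustration

features = conv1d_valid(signal, edge_kernel)  # values: 0, 1, 0, 0, -1, 0, 1, -1
activated = relu(features)                    # values: 0, 1, 0, 0, 0, 0, 1, 0
pooled = max_pool1d(activated)                # values: 1, 0, 0, 1

print(features, activated, pooled, sep="\n")
```

The pooled output still marks where the rising edges occurred, but at half the resolution, which is the downsampling effect described above.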

![Basic CNN block](cnn_architecture.svg)
Image source: https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks