## Multi-Layer Perceptron practice for satellite images

This practice involves some code, so we will step through it slowly so that you can build your own models in the future.

The steps you are going to cover in this tutorial are as follows:
- Load libraries
- Load Data
- Define Model
- Compile Model
- Fit Model
- Evaluate Model

**Load necessary libraries**

Mount Google Drive

In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
import os
os.chdir('/content/drive/My Drive/FOSS4G_Kansai/')

In [0]:
# ignore annoying warnings
import warnings
warnings.filterwarnings('ignore')

We use Keras for this practice

In [0]:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Activation, Dense
model = Sequential()

**Load the prepared data**

In [0]:
x_train,y_train = np.load('./dataset/isprs_vaihingen/train/patches/image.npy'),np.load('./dataset/isprs_vaihingen/train/patches/label.npy')
x_test,y_test = np.load('./dataset/isprs_vaihingen/val/patches/image.npy'),np.load('./dataset/isprs_vaihingen/val/patches/label.npy')

In [0]:
print (x_train.shape)

In [0]:
from keras import backend as K
print (K.image_data_format())

In [0]:
print (int(y_train.max()))

In [0]:
num_classes = 5
channel = x_train.shape[-1]
print (channel)

Reshape the image patches into a 2D array with one row per pixel and one column per band, so that the perceptron learns pixel by pixel

In [0]:
x_train = x_train.reshape(x_train.shape[0]*x_train.shape[1]*x_train.shape[2],x_train.shape[3])
x_test = x_test.reshape(x_test.shape[0]*x_test.shape[1]*x_test.shape[2],x_test.shape[3])

In [0]:
print (x_train.shape)

In [0]:
y_train = y_train.reshape(y_train.shape[0]*y_train.shape[1]*y_train.shape[2])
y_test = y_test.reshape(y_test.shape[0]*y_test.shape[1]*y_test.shape[2])

In [0]:
print (y_train.shape)
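
The reshape steps above can be tried on a small array first. This is a minimal sketch on a hypothetical toy batch (2 patches of 4x4 pixels with 3 bands), assuming the channels-last layout; the names `patches`, `pixels` and `pixel_labels` are illustrative and not part of the dataset.

```python
import numpy as np

# Hypothetical toy batch: 2 patches, 4x4 pixels each, 3 bands per pixel.
patches = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)
labels = np.zeros((2, 4, 4), dtype=np.int64)

# -1 lets NumPy infer the number of pixels (2 * 4 * 4 = 32).
pixels = patches.reshape(-1, patches.shape[-1])
pixel_labels = labels.reshape(-1)

print(pixels.shape)        # (32, 3)
print(pixel_labels.shape)  # (32,)

# Row 0 holds the band values of the top-left pixel of the first patch.
print(np.array_equal(pixels[0], patches[0, 0, 0]))  # True
```

Using `-1` gives the same result as multiplying the first three dimensions by hand, and each row keeps the band values of exactly one pixel.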

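**One-hot encoding** is a convenient way to turn categorical class labels into numerical variables: each label becomes a vector of length `num_classes` with a 1 at the index of its class and 0 everywhere else.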

Convert the training and test labels with `keras.utils.to_categorical`

In [0]:
y_tra = keras.utils.to_categorical(y_train, num_classes)
y_tes = keras.utils.to_categorical(y_test, num_classes)

In [0]:
print(y_tra.shape)

In [0]:
print(y_tra[0,])
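
To see what one-hot encoding does without Keras, here is a small NumPy sketch. The helper `one_hot` and the toy labels are hypothetical; the sketch only mirrors the behaviour of `to_categorical` for integer labels and is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an array of shape (n_samples, num_classes) with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

toy_labels = np.array([0, 2, 4, 1])
encoded = one_hot(toy_labels, num_classes=5)
print(encoded.shape)  # (4, 5)
print(encoded[1])     # [0. 0. 1. 0. 0.]
```

Taking `np.argmax` along the last axis recovers the original integer labels, which is the same trick used later to turn predicted probabilities into classes.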

**Define a model**

In [0]:
model = Sequential()
model.add(Dense(16, input_dim=(3), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

You are already familiar with Dense layers, which are fully connected layers. Now let us look at **activation functions**.

**Activation functions** are an extremely important feature of artificial neural networks. They decide whether a neuron should be activated or not, that is, whether the information the neuron receives is relevant to the prediction or should be ignored.

**ReLU**

<center> $ A(x) = \max(0, x) $ </center>

In [0]:
from IPython.display import Image
Image('fig/ReLU.jpeg', width=500, height=300)

**Sigmoid**

\begin{equation*}
A(x) = \frac{1}{1+e^{-x}}
\end{equation*}

In [0]:
Image('fig/Sigmoid.png')
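
Both activation functions are short enough to write by hand. The following is a minimal NumPy sketch for illustration; the function names `relu` and `sigmoid` are our own and are not the Keras internals.

```python
import numpy as np

def relu(x):
    """Pass positive values through unchanged and clip negative values to 0."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
relu_out = relu(x)
sigmoid_out = sigmoid(x)
print(relu_out)     # [0. 0. 2.]
print(sigmoid_out)  # the middle value is exactly 0.5
```

ReLU ignores negative inputs completely, while the sigmoid is symmetric around 0.5, so `sigmoid(-x) + sigmoid(x)` is always 1.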

**Compile Model**

Once we define the model, we have to compile it. Compiling requires a few choices:

- Optimizer: the algorithm that updates the weights while we train the model. A commonly used optimizer is **Stochastic Gradient Descent (SGD)**.
- Loss function: used by the optimizer to navigate the weight space; training is defined as a process of loss minimization. Some common choices are **Binary cross-entropy**, **Categorical cross-entropy (Softmax cross-entropy)** and **Mean Squared Error (MSE)**.
- Evaluation metrics: used to judge the model. Common choices are **accuracy**, **precision** and **recall**.
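
To make the loss function from the list above concrete, here is a minimal NumPy sketch of softmax followed by categorical cross-entropy. The helper names, the toy scores and the clipping constant `eps` are assumptions made for illustration; Keras computes this loss internally with its own implementation.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 for each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

# Two toy samples with three classes; the true classes are 0 and 1.
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
bad_loss = categorical_crossentropy(y_true, softmax(-logits))
print(loss < bad_loss)  # True: worse predictions give a higher loss
```

The loss shrinks towards 0 as the probability of the true class approaches 1, which is what the optimizer tries to achieve.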

In [0]:
Image('fig/backpropog.png', width=850,height=350)

In [0]:
Image('fig/Metrics.png', width=350,height=200)
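
Accuracy, precision and recall can be computed directly from the true and predicted class indices. This is a small sketch on made-up labels; `toy_true`, `toy_pred` and `precision_recall` are hypothetical names used only for this example.

```python
import numpy as np

toy_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])
toy_pred = np.array([0, 1, 1, 1, 0, 2, 2, 1, 2, 0])

# Accuracy: share of pixels whose predicted class equals the true class.
pixel_accuracy = float(np.mean(toy_true == toy_pred))

def precision_recall(y_true, y_hat, cls):
    """Precision and recall for one class, treating it as the positive class."""
    tp = np.sum((y_hat == cls) & (y_true == cls))  # true positives
    fp = np.sum((y_hat == cls) & (y_true != cls))  # false positives
    fn = np.sum((y_hat != cls) & (y_true == cls))  # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

precision_1, recall_1 = precision_recall(toy_true, toy_pred, cls=1)
print(pixel_accuracy)  # 0.7
print(precision_1)     # 0.5
```

Precision asks how many pixels predicted as class 1 really are class 1, while recall asks how many of the real class 1 pixels were found.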

In [0]:
SGD = keras.optimizers.SGD(lr=0.01)

In [0]:
catergorical = keras.losses.categorical_crossentropy

In [0]:
accuracy = ['accuracy']

In [0]:
model.compile(loss = catergorical, optimizer = SGD,
              metrics = accuracy)

**Fit Model**

Once the model is compiled, it can be trained with the `fit()` function in Keras, which takes a few parameters such as `epochs` and `batch_size`.

In [0]:
BATCH_SIZE= 30000
EPOCHS = 1

In [0]:
history = model.fit(x_train, y_tra,
                    batch_size=BATCH_SIZE, epochs=EPOCHS, shuffle=True, validation_data = (x_test, y_tes))
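
The batch size controls how many weight updates happen in one epoch. The sketch below works this out for a hypothetical training set of 100 patches of 256x256 pixels; the real number of patches depends on your dataset, so check `x_train.shape[0]` after the reshape.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches (weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

n_pixels = 100 * 256 * 256  # hypothetical number of training pixels
n_updates = updates_per_epoch(n_pixels, batch_size=30000)
print(n_updates)  # 219
```

A larger batch size means fewer but smoother updates per epoch, and a smaller one means more but noisier updates.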

In [0]:
score = model.evaluate(x_test, y_tes)
print  (score[1])

Plot graphs showing accuracy and loss during training and evaluation

In [0]:
%matplotlib inline
import matplotlib.pyplot as plt
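# older Keras versions store accuracy under 'acc', newer ones under 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])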
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

In [0]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

Evaluate the model performance with test data

**Improving the simple network in Keras with more hidden layers (a deeper network)**
We will add more dense layers to the network defined earlier.

In [0]:
model = Sequential()
model.add(Dense(32, input_dim=(3), activation='relu'))
model.add(Dense(32, activation='relu'))
# adding three more layers
model.add(Dense(16, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.summary()
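
The parameter counts printed by `model.summary()` can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output. The sketch below assumes 3 input bands and 5 classes, as in the cells above; `dense_param_count` is a hypothetical helper and not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Total weights and biases of a stack of Dense layers.

    layer_sizes lists the input size followed by the units of each layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

shallow_params = dense_param_count([3, 16, 8, 5])
deep_params = dense_param_count([3, 32, 32, 16, 16, 8, 5])
print(shallow_params)  # 245
print(deep_params)     # 2165
```

The deeper network has roughly nine times as many trainable parameters, which gives it more capacity but also makes it slower to train.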

**Compile, train and validate the model**

In [0]:
model.compile(loss = catergorical, optimizer = SGD,
              metrics = accuracy)

history1 = model.fit(x_train, y_tra,
                    batch_size=BATCH_SIZE, epochs=EPOCHS, validation_data = (x_test, y_tes))

# compare and mention the accuracy improvements

In [0]:
score1 = model.evaluate(x_test, y_tes)
print  (score1[1])

**Testing different optimizers and hyperparameter settings in Keras**

Let us focus on a popular optimizer known as **Stochastic Gradient Descent (SGD)**.
Using the Gradient Descent (GD) optimization algorithm, the weights are updated incrementally after each epoch (pass over the training dataset). SGD applies the same kind of update more often, after each sample or mini-batch.

The loss function J(⋅), the sum of squared errors (SSE), can be written as:

In [0]:
Image('fig/loss_sgd1.png', width=350,height=100)

The magnitude and direction of the weight update is computed by taking a step in the opposite direction of the cost gradient

In [0]:
Image('fig/learning_sgd2.png', width=250,height=100)

where η is the learning rate and $ \frac{\partial J}{\partial w_j} $ is the partial derivative of the loss with respect to the weight $ w_j $

In [0]:
Image('fig/SGD.png', width=650,height=300)

Essentially, we can picture GD optimization as a hiker (the weight coefficient) who wants to climb down a mountain into a valley, where each step is determined by the steepness of the slope (gradient) and the leg length of the hiker (learning rate). Note that if the learning rate (η) is too small, the hiker will move slowly; if η is too high, the hiker may miss the valley [1].
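
The effect of the learning rate can be tried out on a tiny problem. This is a minimal sketch of full-batch gradient descent for a single weight, assuming the loss is the sum of squared errors with a factor of 1/2, J(w) = 1/2 Σ (y - w·x)²; the toy data and the three learning rates are made up for illustration.

```python
import numpy as np

def gradient_descent(x, y, eta, steps=20, w0=0.0):
    """Fit y ≈ w * x by stepping against the gradient of J(w)."""
    w = w0
    for _ in range(steps):
        grad = -np.sum((y - w * x) * x)  # dJ/dw
        w = w - eta * grad               # step in the opposite direction
    return float(w)

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x  # the best weight is w = 2

w_small = gradient_descent(x, y, eta=0.001)  # too small: still far from 2
w_good = gradient_descent(x, y, eta=0.05)    # reaches 2 within 20 steps
w_large = gradient_descent(x, y, eta=0.2)    # too large: jumps further away each step
print(w_small, w_good, w_large)
```

With the small learning rate the hiker has barely left the starting point after 20 steps, and with the large one every step lands further from the valley than the one before.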

Several learning-rate tuning techniques are available, including advanced optimizers such as **RMSprop**, **Adam** and **Adadelta**, which adapt the learning rate automatically during training.

In [0]:
SGD = keras.optimizers.SGD(lr=0.01, decay = 0.005)
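
The `decay` argument shrinks the learning rate as training goes on. The sketch below assumes the time-based schedule used by older Keras releases that accept `decay`, where the rate is divided by `1 + decay * iteration` and one iteration is one batch update; check the documentation of your Keras version, since newer releases handle schedules differently. The helper name is hypothetical.

```python
def decayed_learning_rate(initial_lr, decay, iteration):
    """Time-based decay: the rate falls as the number of updates grows."""
    return initial_lr / (1.0 + decay * iteration)

lr_start = decayed_learning_rate(0.01, 0.005, iteration=0)
lr_200 = decayed_learning_rate(0.01, 0.005, iteration=200)
lr_1000 = decayed_learning_rate(0.01, 0.005, iteration=1000)
print(lr_start, lr_200, lr_1000)
```

Under this assumption the learning rate has halved after 200 updates, so the hiker takes long steps at the start and shorter ones near the valley.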

In [0]:
model.compile(loss = catergorical, optimizer = SGD,
              metrics = accuracy)

In [0]:
history1 = model.fit(x_train, y_tra,
                    batch_size=BATCH_SIZE, epochs=EPOCHS, validation_data = (x_test, y_tes))

Train using other advanced techniques

In [0]:
SGD = keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
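
To give an idea of how RMSprop adapts the step size, here is a minimal NumPy sketch of the commonly described update rule: keep a running average of squared gradients and divide each step by its square root. The function `rmsprop_step`, the toy loss and the value of `eps` are assumptions for illustration; the Keras implementation may differ in details.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update; cache is the running average of squared gradients."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy loss: sum(w**2), whose gradient is 2 * w and whose minimum is at 0.
w_start = np.array([1.0, -3.0])
w_first, _ = rmsprop_step(w_start, 2.0 * w_start, np.zeros(2))

w_final, cache = w_start.copy(), np.zeros(2)
for _ in range(100):
    w_final, cache = rmsprop_step(w_final, 2.0 * w_final, cache)

print(np.abs(w_first - w_start))  # both weights move by the same amount
print(w_final)                    # both weights have moved towards 0
```

Because each step is divided by the recent gradient size, the first step has the same length for both weights even though one gradient is three times larger than the other.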

In [0]:
#SGD = keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=None, decay=0.0)

In [0]:
model.compile(loss = catergorical, optimizer = SGD,
              metrics = accuracy)
history1 = model.fit(x_train, y_tra,
                    batch_size=BATCH_SIZE, epochs=EPOCHS, validation_data = (x_test, y_tes))

**Model prediction for unseen data**

In [0]:
x_pred,y_pred = np.load('./dataset/isprs_vaihingen/test/patches/image.npy'),np.load('./dataset/isprs_vaihingen/test/patches/label.npy')

In [0]:
print (x_pred.shape)

In [0]:
x_pred = x_pred.reshape(x_pred.shape[0]*x_pred.shape[1]*x_pred.shape[2],x_pred.shape[3])

In [0]:
print (x_pred.shape)

In [0]:
Predict_prob = model.predict(x_pred)

In [0]:
Predict_prob.shape

In [0]:
Predict_prob[0]

In [0]:
Predict_class = np.argmax(Predict_prob,axis=-1)

In [0]:
Predict_class.shape

In [0]:
Predict_class = Predict_class.reshape(70,256,256)
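
The two steps above, picking the most likely class and reshaping back to image patches, can be checked on a toy array. The probabilities and the 2x2 patch size below are made up for illustration.

```python
import numpy as np

# Hypothetical class probabilities for 4 pixels and 3 classes.
toy_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])

# argmax along the last axis picks the most probable class per pixel.
toy_class = np.argmax(toy_prob, axis=-1)
print(toy_class)  # [0 1 2 0]

# Reshape the flat pixel vector back into (patches, rows, columns).
toy_map = toy_class.reshape(1, 2, 2)
print(toy_map.shape)  # (1, 2, 2)
```

The reshape only works when the pixels are still in the same order in which the patches were flattened, which is why the prediction input must not be shuffled.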

In [0]:
%matplotlib inline
import matplotlib.pyplot as  plt
fig = plt.figure()
ax = fig.add_subplot(121)
ax.imshow(Predict_class[4], interpolation='none')
ax.set_xticks([])
ax.set_yticks([])
ax.set_title('Predict')
fig.show()

ax = fig.add_subplot(122)
ax.imshow(y_pred[4], interpolation='none')
ax.set_xticks([])
ax.set_yticks([])
ax.set_title('Label')

fig.suptitle('Scene: top_mosaic_09cm_area1')
fig.show()


**References**

[1] Bottou, Léon. "Stochastic gradient descent tricks." Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 421-436.