# Exercise 7. Classification, deep learning

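The aim of this exercise is to train a deep learning model for predicting different classes from satellite data, and to assess the model accuracy. The data is split into training, validation and test sets; in this exercise the accuracy is estimated on the validation set.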

## Input data

Two raster files, both with:

* Coordinate system: Finnish ETRS-TM35FIN, EPSG:3067
* Resolution: 20m
* BBOX: 200000, 6700000, 300000, 6800000

#### Labels

* Multiclass classification raster: 1 - forest, 2 - fields, 3 - water, 4 - urban, 0 - everything else.

#### Data image

* Sentinel-2 mosaic with data from two different dates (May and July), to have more values per pixel. The dataset has 8 bands, based on Sentinel-2 bands 2, 3, 4 and 8 on the dates 2021-05-11 and 2021-07-21, with reflectance values scaled to [0 ... 1]. The bands, in order, are:
     *  'b02' / '2021-05-11'
     *  'b02' / '2021-07-21'
     *  'b03' / '2021-05-11'
     *  'b03' / '2021-07-21'
     *  'b04' / '2021-05-11'
     *  'b04' / '2021-07-21'
     *  'b08' / '2021-05-11'
     *  'b08' / '2021-07-21'
     
[Bands](https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-2/bands/): b02=blue, b03=green, b04=red, b08=near-infrared (NIR)
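Because the bands are stored in the order listed above, each band/date pair has a fixed 0-based index in the NumPy array that Rasterio's `read()` returns (Rasterio's own band numbers, as used in `dst.write(..., 1)`, start at 1). The small sketch below builds that mapping; `bands`, `dates` and `band_index` are hypothetical names used only for illustration. It is handy when picking bands for plotting, for example the July near-infrared band is at index 7.

```python
# Band order of the data image, as listed above: each band, first May then July.
bands = ['b02', 'b03', 'b04', 'b08']
dates = ['2021-05-11', '2021-07-21']

# Hypothetical helper: (band, date) -> 0-based index in the array read by rasterio.
band_order = [(band, date) for band in bands for date in dates]
band_index = {pair: i for i, pair in enumerate(band_order)}

print(band_index[('b08', '2021-07-21')])  # 7, July near-infrared
print(band_index[('b04', '2021-07-21')])  # 5, July red
print(band_index[('b03', '2021-07-21')])  # 3, July green
```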

## Results

* Trained deep learning model
* Model accuracy estimation
* Class confusion matrix
* Predicted image 

## Main steps

1) Read the data and reshape it into a form suitable for scikit-learn.
2) Divide the data into training, validation and test datasets.
3) Undersample the training dataset to balance the classes.
4) Train the model.
5) Evaluate the model on validation data, including a class confusion matrix and a classification report.
6) Predict the classification for the data image and save it.
7) Plot the results.

## Imports and paths

In [None]:
import os, time
from imblearn.under_sampling import RandomUnderSampler
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np
import rasterio
from rasterio.windows import from_bounds
from rasterio.plot import show
from rasterio.plot import show_hist
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from tensorflow.keras import models, layers
from tensorflow.keras import optimizers
from tensorflow.keras.models import model_from_json
from tensorflow.keras.utils import to_categorical
import urllib.request
%matplotlib inline

In [None]:
### File paths.
# Source data URLs
image_url = 'https://a3s.fi/gis-courses/gis_ml/image.tif'
multiclass_classification_url = 'https://a3s.fi/gis-courses/gis_ml/labels_multiclass.tif'

# Folders
user = os.environ.get('USER')
base_folder = os.path.join('/scratch/project_2002044', user, '2022/GeoML')
dataFolder = os.path.join(base_folder,'data')
outputBaseFolder= os.path.join(base_folder,'07_deep_classification')
shallow_folder= os.path.join(base_folder,'05_shallow_classification')

# Source data local paths
image_file = os.path.join(dataFolder, 'image.tif')
multiclass_classification_file = os.path.join(dataFolder, 'labels_multiclass.tif')

# Outputs of the model
# Saved model and its weights
fullyConnectedModel = os.path.join(outputBaseFolder,'fullyConnectedModel.json')
fullyConnectedWeights = os.path.join(outputBaseFolder,'fullyConnectedWeights.h5')
# Predicted .tif image
predictedImageFile = os.path.join(outputBaseFolder,'classified_fullyConnected.tif')

# For comparison: predictions from the shallow classification exercise
random_forest_predicition = os.path.join(shallow_folder,'classification_random_forest.tif')
SGD_predicition = os.path.join(shallow_folder,'classification_SGD.tif')
gradient_boost_predicition = os.path.join(shallow_folder,'classification_gradient_boost.tif')

# BBOX for the exercise data. We use only part of the full image, for speed and to see the results better when plotting.
minx = 240500
miny = 6775500
maxx = 253500
maxy = 6788500 

# Available cores. During the course only 1 core is available; outside the course more cores might be available.
# To make use of multiple cores, set this number to the number of available cores.
n_jobs = 1

# During the course we run this on CPU, but all bigger deep learning models benefit from running on GPU.
# No changes to code should be needed to run this on GPU.


(Download input data if needed.)

In [None]:
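os.makedirs(dataFolder, exist_ok=True)
# The output folder is also needed, for saving the model and the predicted image.
os.makedirs(outputBaseFolder, exist_ok=True)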
    
if not os.path.exists(image_file):
    urllib.request.urlretrieve(image_url, image_file)
    
if not os.path.exists(multiclass_classification_file):
    urllib.request.urlretrieve(multiclass_classification_url, multiclass_classification_file) 

## Read the data and reshape it into a form suitable for scikit-learn

Read the input datasets with Rasterio and reshape them into a form suitable for Keras (the same form as for scikit-learn).

This is exactly the same as for the clustering and shallow classification data.

### Satellite image

The satellite image has 8 channels, so Rasterio reads it in as a 3D data cube with shape (bands, rows, columns).

For Keras, we reshape the data to 2D, with one row per pixel. Each pixel has eight values, one for each band/date.

In [None]:
# Read the pixel values inside the exercise BBOX from the .tif file as a NumPy array
with rasterio.open(image_file) as image_dataset:
    image_data = image_dataset.read(window=from_bounds(minx, miny, maxx, maxy, image_dataset.transform)) 

# Check shape of input data
print ('Dataframe original shape, 3D: ', image_data.shape)    

Save the number of bands for later: it is needed to reshape the data to 2D and to define the input layer of the model.

In [None]:
no_bands_in_image = image_data.shape[0]
no_bands_in_image

As an intermediate step, transpose the axis order so that the bands are last. Notice how the array shape changes.

In [None]:
image_data2 = np.transpose(image_data, (1, 2, 0))
# Check again the data shape, now the bands should be last.
print ('Dataframe shape after transpose, 3D: ', image_data2.shape) 

In [None]:
# Then reshape to 2D.
pixels = image_data2.reshape(-1, no_bands_in_image)
print ('Dataframe shape after transpose and reshape, 2D: ', pixels.shape) 
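To see why the transpose and reshape keep each pixel's band values together, the NumPy sketch below uses a small made-up array (2 bands, 2 x 3 pixels, not the course data). With NumPy's default row-major order, pixel (r, c) ends up in row `r * cols + c`, so a per-pixel result can later be reshaped back to `(rows, cols)` and land on the right pixels. The prediction step below relies on exactly this.

```python
import numpy as np

# Hypothetical toy "image": 2 bands, 2 rows, 3 columns (bands first, as rasterio reads it).
n_bands, rows, cols = 2, 2, 3
toy = np.arange(n_bands * rows * cols).reshape(n_bands, rows, cols)

# Same steps as above: bands last, then one row per pixel.
toy_pixels = np.transpose(toy, (1, 2, 0)).reshape(-1, n_bands)
print(toy_pixels.shape)  # (6, 2)

# Row r*cols + c of toy_pixels holds the band values of pixel (r, c).
assert np.array_equal(toy_pixels[1 * cols + 2], toy[:, 1, 2])

# A per-pixel result (here simply the first band's value) reshaped back to
# (rows, cols) lands on the same pixel positions as in the original image.
per_pixel_result = toy_pixels[:, 0]
back_to_image = per_pixel_result.reshape(rows, cols)
assert np.array_equal(back_to_image, toy[0])
```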

### Multiclass labels image

Do the same for the labels.

In [None]:
# For the labels, reshaping to 1D is enough.
with rasterio.open(multiclass_classification_file) as src:
    labels_data = src.read(window=from_bounds(minx, miny, maxx, maxy, src.transform))
    input_labels = labels_data.reshape(-1)
    print ('Labels shape after reshape, 1D: ', input_labels.shape)

Save the number of classes in the labels; it is needed later to define the last layer of the model.

In [None]:
number_of_classes = np.unique(labels_data).size
number_of_classes

### Divide the data into training, validation and test datasets

Set the training, validation and test data ratios, i.e. what share of the pixels is assigned to each set.

In [None]:
train_ratio = 0.7
validation_ratio = 0.2
test_ratio = 0.1

First, separate the test set. (In this exercise we do not use the test data, but in real projects you should.)

In [None]:
x_rest, x_test, y_rest, y_test = train_test_split(pixels, input_labels, test_size=test_ratio, random_state=63, stratify=input_labels)

... and then split the rest into training and validation sets, using the ratios set above. `stratify` keeps the class proportions the same in all sets.

In [None]:
x_train1, x_validation, y_train1, y_validation= train_test_split(x_rest, y_rest, test_size=validation_ratio/(train_ratio + validation_ratio), random_state=63, stratify=y_rest)
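The second split only sees the pixels left after the test set was removed, so its `test_size` is rescaled to that remainder. The short calculation below, a sketch using the ratios from above, shows that the two-stage split gives the intended 70/20/10 shares of all pixels (up to the rounding of pixel counts that `train_test_split` does).

```python
# Ratios from the cell above.
train_ratio, validation_ratio, test_ratio = 0.7, 0.2, 0.1

# Share of the remaining pixels that goes to validation in the second split.
second_test_size = validation_ratio / (train_ratio + validation_ratio)

# Shares of ALL pixels after both splits.
rest_share = 1 - test_ratio
validation_share = rest_share * second_test_size
train_share = rest_share * (1 - second_test_size)

print(round(second_test_size, 3))                                      # 0.222
print(round(train_share, 3), round(validation_share, 3), test_ratio)   # 0.7 0.2 0.1
```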

### Resample to balance the dataset

The classes are very imbalanced in the dataset, so undersample the majority classes in the training set so that all classes are represented by about the same number of pixels.
Notice that the validation and test sets keep the original class distribution.
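The idea of random undersampling can be sketched in plain NumPy: for every class, keep a random subset of pixels whose size equals the smallest class. `undersample_to_minority` below is a hypothetical helper, a simplified stand-in for `RandomUnderSampler` with its default settings; it does not reproduce imbalanced-learn's exact random choices.

```python
import numpy as np

def undersample_to_minority(x, y, seed=63):
    """Hypothetical helper: keep a random subset of each class, the size of the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Sample n_min pixel indices per class, without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # keep the original order of the retained pixels
    return x[keep], y[keep]

# Toy, made-up data: class 0 dominates.
y_toy = np.array([0] * 8 + [1] * 3 + [2] * 2)
x_toy = np.arange(y_toy.size).reshape(-1, 1).astype(float)  # feature = original index
x_bal, y_bal = undersample_to_minority(x_toy, y_toy)
print(np.unique(y_bal, return_counts=True)[1])  # [2 2 2]
```

As in the real notebook, the majority class loses most of its pixels, which is why undersampling is applied only to the training set.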

In [None]:
show_hist(labels_data)

In [None]:
# The classes are very imbalanced, so undersample the majority classes
rus = RandomUnderSampler(random_state=63)
x_train, y_train = rus.fit_resample(x_train1, y_train1)   
print ('Dataframe shape after undersampling of majority classes, pixels 2D: ', x_train.shape)
print ('Dataframe shape after undersampling of majority classes, labels 1D: ', y_train.shape)

*How many pixels of different classes are included in training dataset?*

Notice that we lose a lot of pixels at this point, which may be undesirable in real cases. See the [imbalanced-learn User guide](https://imbalanced-learn.org/stable/user_guide.html#user-guide) for other options.

In [None]:
print('Labels before splitting:           ', np.unique(input_labels, return_counts=True)[1])
print('Training data before undersampling:', np.unique(y_train1, return_counts=True)[1])
print('Training data after undersampling: ', np.unique(y_train, return_counts=True)[1])
print('Validation data:                   ', np.unique(y_validation, return_counts=True)[1])
print('Test data:                         ', np.unique(y_test, return_counts=True)[1])

## Define and compile the model

In [None]:
# Initialize a sequential model
model = models.Sequential()
# Add the first hidden layer with 64 neurons. input_shape is the number of bands (features) per pixel.
model.add(layers.Dense(64, activation='relu', input_shape=(no_bands_in_image,)))
# Add the first dropout layer
model.add(layers.Dropout(rate=0.2))
# Add more layers to the model
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dropout(rate=0.2))
model.add(layers.Dense(16, activation='relu'))
# The last layer has one neuron per class and 'softmax' activation, as is usual for multi-class classification models
model.add(layers.Dense(number_of_classes, activation='softmax'))
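A fully connected layer has one weight per input/neuron pair plus one bias per neuron, and Dropout layers have no weights. The sketch below counts the parameters of the model above by hand, assuming 8 input bands and 5 classes as in this exercise; `model.summary()` should report the same total.

```python
# Hand count of the model's parameters, assuming 8 input bands and 5 classes.
n_inputs, n_classes = 8, 5
layer_units = [64, 32, 16, n_classes]  # Dense layers only; Dropout layers have no weights

def dense_params(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output neuron.
    return n_in * n_out + n_out

total = 0
previous = n_inputs
for units in layer_units:
    total += dense_params(previous, units)
    previous = units
print(total)  # 3269
```

Most parameters sit between the first and second hidden layers (64 x 32 weights), so the layer widths, not the number of bands, dominate the model size.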

Compile the model, using:
 - the `Adam` optimizer, a common choice, although other optimizers could be used too.
 - learning rate 0.01; other learning rates could be tried too.
 - the `categorical_crossentropy` loss function, the usual choice for multi-class classification with one-hot encoded labels.
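What the last layer and the loss compute can be sketched in NumPy: softmax turns each pixel's raw outputs into class probabilities that sum to 1, and categorical cross-entropy averages $-\sum_c y_c \log p_c$ over the pixels, which with one-hot labels is just minus the log of the probability given to the true class. This is an illustrative sketch with made-up values; Keras's own implementation also guards against numerical edge cases such as log(0), which this sketch ignores.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true_onehot, y_prob):
    # Mean over pixels of -sum_c y_c * log(p_c).
    return float(np.mean(-np.sum(y_true_onehot * np.log(y_prob), axis=1)))

# Two hypothetical pixels, 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])
probs = softmax(logits)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
print(probs.sum(axis=1))                        # each row sums to 1
print(categorical_crossentropy(y_true, probs))  # lower is better
```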

In [None]:
model.compile(optimizer=optimizers.Adam(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

## Train the model

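Encode the labels categorically (one-hot encoding, as we did with the region names in the Postcode preparations). As a result, each pixel has a label that is a 1D vector with 5 elements, one per class: 1 for the pixel's class and 0 for all others. This matches the shape of the model's softmax output, which gives the probability of each class.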
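One-hot encoding can be sketched with NumPy by indexing rows of an identity matrix; with default arguments this should match `to_categorical(labels)`, which infers the number of classes as the largest label + 1. The labels below are made up. Note that `np.unique(...).size`, used above for `number_of_classes`, counts only the classes that are present, so the two numbers differ if a class is missing from the data.

```python
import numpy as np

labels = np.array([0, 3, 1, 4, 3])   # hypothetical class labels of 5 pixels
num_classes = labels.max() + 1       # number of classes inferred as largest label + 1
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot.shape)  # (5, 5)
print(one_hot[1])     # [0. 0. 0. 1. 0.]

# np.unique counts only the classes present; max + 1 also counts missing ones.
gappy = np.array([0, 1, 4])
print(np.unique(gappy).size, gappy.max() + 1)  # 3 5
```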

In [None]:
y_train_categorical = to_categorical(y_train)
y_train_categorical.shape

Train the model and save it. *This takes a moment, please wait*

In [None]:
start_time = time.time() 
model.fit(x_train, y_train_categorical, epochs=500, batch_size=256, verbose=2)

# Save the model to disk
# Serialize the model to JSON
model_json = model.to_json()
with open(fullyConnectedModel, "w") as json_file:
    json_file.write(model_json)
# Serialize weights to HDF5
model.save_weights(fullyConnectedWeights)
print('Saved model to disk:  \nModel: ', fullyConnectedModel, '\nWeights: ',  fullyConnectedWeights)
print('Model training took: ', round((time.time() - start_time), 0), ' seconds')

### Evaluate the model on validation data

Find the accuracy using Keras's own `evaluate()` function.

In [None]:
y_validation_categorical = to_categorical(y_validation)

# Use verbose=0 in batch jobs to avoid printing a lot of progress-bar output.
validation_loss, validation_acc = model.evaluate(x_validation, y_validation_categorical, verbose=1)
print('Validation accuracy:', validation_acc)

Calculate the confusion matrix and classification report as we did with the shallow classifiers. Use `scikit-learn` functions for that.

First, predict for `x_validation`. The model returns a 2D array, with:
- one row for each pixel.
- one column for each class, giving the probability that the pixel belongs to that class.

In [None]:
validation_prediction = model.predict(x_validation)	
print ('Validation prediction dataframe shape, original 2D: ', validation_prediction.shape) 

Find which class is most likely for each pixel and keep only that class in the output. The output is a 1D array with the index of the most likely class for each pixel. `np.argmax` returns the indices of the maximum values along the given axis.
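A small NumPy example with made-up probabilities for 3 pixels and 5 classes shows what `np.argmax(..., axis=1)` does: it picks, per row, the column index of the largest probability, which is the class number because classes are numbered from 0. For ties, `np.argmax` returns the first index. The result has an integer index dtype (int64 on most 64-bit platforms), which is why it is converted to int8 before saving.

```python
import numpy as np

# Hypothetical softmax output for 3 pixels and 5 classes (each row sums to 1).
probabilities = np.array([
    [0.70, 0.10, 0.05, 0.10, 0.05],
    [0.05, 0.15, 0.60, 0.10, 0.10],
    [0.20, 0.20, 0.10, 0.45, 0.05],
])
predicted = np.argmax(probabilities, axis=1)  # index of the largest value in each row
print(predicted)        # [0 2 3]
print(predicted.dtype)  # an integer index type
```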

In [None]:
predicted_classes = np.argmax(validation_prediction,axis=1)
print ('Validation prediction dataframe shape, after argmax, 1D: ', predicted_classes.shape)

In [None]:
print('Confusion matrix: \n', confusion_matrix(y_validation, predicted_classes))
print('Classification report: \n', classification_report(y_validation, predicted_classes))
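To read the output above, it may help to see how the numbers are built. The sketch below counts a confusion matrix by hand for made-up labels, using the same layout as scikit-learn (rows are true classes, columns are predicted classes), and derives accuracy, recall and precision from it. `confusion_counts` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    # Rows = true class, columns = predicted class (same layout as scikit-learn).
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true_toy = np.array([0, 0, 1, 1, 2, 2, 2])  # hypothetical labels
y_pred_toy = np.array([0, 1, 1, 1, 2, 0, 2])  # hypothetical predictions
cm = confusion_counts(y_true_toy, y_pred_toy, 3)
print(cm)

accuracy = np.trace(cm) / cm.sum()        # correct predictions are on the diagonal
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row)
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column)
print(accuracy, recall, precision)
```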

> **_NOTE:_**  Skipped here, but in a real case you should also run a similar evaluation on the test dataset after finalizing your model, optimizer, loss etc.

## Predict classification based on the data image and save it

Very similar to the shallow classifiers, but:
 - `argmax` is used for finding the most likely class.
 - The data type is changed to int8: `np.argmax` returns int64, which older GDAL versions do not support.

Load the model from the .json file, re-create it and load its weights.

In [None]:
with open(fullyConnectedModel, 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)

# Load weights into the model
loaded_model.load_weights(fullyConnectedWeights)
print("Loaded model from disk")

Predict for all pixels, reshape the data back to an image and save it as a file.

In [None]:
start_time = time.time() 
# Predict for all pixels
prediction = loaded_model.predict(pixels)
print ('Prediction dataframe shape, original 2D: ', prediction.shape)

# Find the most likely class for each pixel.
predicted_classes = np.argmax(prediction,axis=1)
print ('Prediction dataframe shape, after argmax, 1D: ', predicted_classes.shape)

# Reshape back to 2D as in original raster image
prediction2D = np.reshape(predicted_classes, (image_data.shape[1], image_data.shape[2]))
print('Prediction shape in 2D: ', prediction2D.shape)

# Change data type to int8
predicted2D_int8 = np.int8(prediction2D)

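# Save the results as .tif file.
# Copy the coordinate system information and other metadata from the satellite image,
# but use the size and transform of the window that was read, not those of the full image.
with rasterio.open(image_file) as src:
    window = from_bounds(minx, miny, maxx, maxy, src.transform)
    outputMeta = src.meta.copy()
    # Change the number of bands, data type, size and transform.
    outputMeta.update(count=1, dtype='int8', nodata=100,
                      height=prediction2D.shape[0], width=prediction2D.shape[1],
                      transform=src.window_transform(window))
# Writing the image on the disk
with rasterio.open(predictedImageFile, 'w', **outputMeta) as dst:
    dst.write(predicted2D_int8, 1)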

print('Predicting took: ', round((time.time() - start_time), 0), ' seconds')

## Plot the results

In [None]:
### Helper function to normalize band values and enhance contrast, similar to what QGIS does automatically
def normalize(array):
    min_percent = 2   # Low percentile
    max_percent = 98  # High percentile
    lo, hi = np.percentile(array, (min_percent, max_percent))
    # Stretch to [0, 1] and clip the values outside the percentile range
    return np.clip((array - lo) / (hi - lo), 0, 1)

In [None]:
### Create a grid of 6 subplots (3 rows, 2 columns) for the results, the Sentinel image and the labels
fig, ax = plt.subplots(ncols=2, nrows=3, figsize=(10, 15))
cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["white","green","orange","blue","violet"])

### The results
rf_results = rasterio.open(random_forest_predicition)
show(rf_results, ax=ax[0, 0], cmap=cmap, title='Random forest')

SGD_results = rasterio.open(SGD_predicition)
show(SGD_results, ax=ax[0, 1], cmap=cmap, title='SGD')

gradient_boost_results = rasterio.open(gradient_boost_predicition)
show(gradient_boost_results, ax=ax[2, 0], cmap=cmap, title='gradient_boost')

show(predicted2D_int8, ax=ax[2, 1], cmap=cmap, title='Dense deep network')

# Plot the Sentinel image as a false-colour composite (NIR, red, green) from the July date.
# Band indexes follow the band order listed in the input data description.
nir, red, green = image_data[7,], image_data[5,], image_data[3,]
nirn, redn, greenn = normalize(nir), normalize(red), normalize(green)
stacked = np.stack((nirn, redn, greenn))
show(stacked, ax=ax[1,0], title='image') 

show(labels_data, ax=ax[1,1], cmap=cmap, title='labels')
