# **1) Import the Modules**

Modules are code libraries that provide ready-to-use functions and classes.

* The `ee` module allows developers to interact with Google Earth Engine using the Python programming language.
* The `os` module provides functions to perform tasks such as file and directory operations, process management, and environment variable manipulation.
* The `csv` module reads and writes tabular data in CSV format.
* The `numpy` and `pandas` modules provide array and tabular data structures for numerical analysis.
* The `statistics` module provides functions for basic statistics, such as `mean`.
* The `google.colab` module provides access to some of the unique features and functionality of Google Colab.

In [1]:
import ee
import os
import csv

import numpy as np
import pandas as pd

from statistics import mean
from google.colab import drive

# **2) Authentication Procedure**

This section provides instructions for setting up the Google Earth Engine Python API on Colab and for setting up Google Drive on Colab. These steps should be performed each time you start/restart/rollback a Colab session.

## **2.1) GEE**

The `ee.Authenticate` function authenticates access to the Google Earth Engine servers, and the `ee.Initialize` function initializes the client library for the current session. After executing the following cell, the user is prompted to grant Google Earth Engine access to their Google account.

**Note:** The Earth Engine API is installed by default in Google Colaboratory.

In [2]:
ee.Authenticate()
ee.Initialize()

To authorize access needed by Earth Engine, open the following URL in a web browser and follow the instructions. If the web browser does not start automatically, please manually browse the URL below.

    https://code.earthengine.google.com/client-auth?scopes=https%3A//www.googleapis.com/auth/earthengine%20https%3A//www.googleapis.com/auth/devstorage.full_control&request_id=xd2ZXcLOp620b9Kw64qo5XcTboD7XuSTFnuMvIFA_V0&tc=_-TYdmfXnx_tRgHC6uHKZMGoeU1lIeKR3FZQyG_q0dQ&cc=tRDLLHkVk-bvTQaW7xDgCcKLOEeTi1Nk4-lddviHZks

The authorization workflow will generate a code, which you should paste in the box below.
Enter verification code: 4/1AfJohXnUlqLFk3LG-DE_P3PEOxxltnmGDPvxt1E3vRRzDedFMM8cYvuiXKA

Successfully saved authorization token.



## **2.2) GD**

The `drive.mount` function mounts Google Drive at the given path in the Colab file system. Granting access allows code running in the notebook to read and modify files in Google Drive.

**Note:** When using the `Mount Drive` button in the file browser, no authentication codes are required for notebooks edited only by the current user.

In [3]:
drive.mount("/content/gdrive")

Mounted at /content/gdrive


# **3) Functions**

Data processing

In [4]:
def cartesian_product(array):
  """
  Description:
    Generates the Cartesian product of an array of arrays.

  Parameters:
    array (List[List]): The list of lists.

  Returns:
    A list representing the Cartesian product of the input lists.

  """
  result = [[]]

  for sub_array in array:
    # Pairwise combinations.
    result = [x + [y] for x in result for y in sub_array]

  return result


def relative_property_importance(key, value):
  """
  Description:
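    Calculates the relative importance (in percent) of a property based on its absolute value.
    Relies on the global `importancesSum`, which must be defined before this function is mapped.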

  Parameters:
    key: The property key.
    value: The property value.

  Returns:
    The relative importance value.

  """
  return ee.Number(value).divide(importancesSum).multiply(100)


def calculate_metrics(matrix):
  """
  Description:
    Calculates one-vs-rest evaluation metrics for each class of an NxN error matrix.

  Parameters:
    matrix (List[List]): An NxN error matrix where each row represents the true class,
                         and each column represents the predicted class.

  Returns:
    A dictionary mapping metric names (e.g. `class_0_recall`) to their values,
    plus the `overall_accuracy`.

  """
  metrics = {}
  classesCount = len(matrix)

  totalCorrectPredictions = sum(matrix[i][i] for i in range(classesCount))
  totalPixels = sum(sum(row) for row in matrix)

  overallAccuracy = totalCorrectPredictions / totalPixels

  for index in range(classesCount):
    TP = matrix[index][index]
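    # Rows are true classes and columns are predicted classes.
    FP = sum(matrix[i][index] for i in range(classesCount) if i != index)  # predicted as `index`, truly another class
    FN = sum(matrix[index][i] for i in range(classesCount) if i != index)  # truly `index`, predicted as another class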
    TN = sum(matrix[i][j] for i in range(classesCount) for j in range(classesCount) if i != index and j != index)

    fnr = FN / (FN + TP)                    # false negative rate
    fpr = FP / (FP + TN)                    # false positive rate
    spc = TN / (FP + TN)                    # specificity
    rec = TP / (TP + FN)                    # recall/producer accuracy
    prec = TP / (TP + FP)                   # precision/user accuracy
    jaccard = TP / (TP + FP + FN)           # jaccard
    f1 = (2 * prec * rec) / (prec + rec)    # f1 score
    acc = (TP + TN) / (TP + TN + FP + FN)   # accuracy
    mcc = (TP * TN - FP * FN) / ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5  # Matthews Correlation Coefficient

    metrics[f"class_{index}_false_negative_rate"] = fnr
    metrics[f"class_{index}_false_positive_rate"] = fpr
    metrics[f"class_{index}_specificity"] = spc
    metrics[f"class_{index}_recall"] = rec
    metrics[f"class_{index}_precision"] = prec
    metrics[f"class_{index}_jaccard"] = jaccard
    metrics[f"class_{index}_f1_score"] = f1
    metrics[f"class_{index}_accuracy"] = acc
    metrics[f"class_{index}_matthews_correlation_coefficient"] = mcc

  metrics["overall_accuracy"] = overallAccuracy
  return metrics
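
For a matrix whose rows are true classes and whose columns are predicted classes, as the docstring of `calculate_metrics` states, the one-vs-rest counts for a class can be read off its row and column. The following is a minimal sketch on a made-up 3x3 matrix; the helper name `one_vs_rest_counts` and the matrix values are hypothetical and not taken from the notebook's results.

```python
def one_vs_rest_counts(matrix, index):
    """Return (TP, FP, FN, TN) for class `index` of a square matrix
    whose rows are true classes and whose columns are predicted classes."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[index][index]
    # Column `index` minus the diagonal: predicted as `index`, truly another class.
    fp = sum(matrix[i][index] for i in range(n)) - tp
    # Row `index` minus the diagonal: truly `index`, predicted as another class.
    fn = sum(matrix[index]) - tp
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


# Made-up example matrix (illustration only).
example = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 45],
]

tp, fp, fn, tn = one_vs_rest_counts(example, 1)
precision = tp / (tp + fp)  # user accuracy
recall = tp / (tp + fn)     # producer accuracy
print(tp, fp, fn, tn)
print(round(precision, 3), round(recall, 3))
```

Computing TN as the total minus the other three counts avoids a nested loop and makes it easy to check that the four counts always add up to the number of samples.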

# **4) Parameters**

In [5]:
# `Classification`
nonWaterSampleIdentifiers =  [
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/non_water/003747_00476D_87B6_004447_005723_9D4B",
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/non_water/025053_02C3A8_1153_021203_02475C_B1FE",
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/non_water/010579_00FBBC_09F9_005329_006C09_C51B",
]

floodSampleIdentifiers = [
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/flood/003747_00476D_87B6_004447_005723_9D4B",
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/flood/010579_00FBBC_09F9_005329_006C09_C51B",
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/flood/025053_02C3A8_1153_021203_02475C_B1FE",
]

waterSampleIdentifiers = [
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/water/003747_00476D_87B6_004447_005723_9D4B",
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/water/010579_00FBBC_09F9_005329_006C09_C51B",
  "users/stamlazaros/hua/t-h-e-s-i-s/assets/samples/water/025053_02C3A8_1153_021203_02475C_B1FE",
]

parameterAbbreviations = {
  "numberOfTrees": "Trs",
  "variablesPerSplit": "Vrbs",
  "minLeafPopulation": "Lvs",
  "bagFraction": "Frctn",
  "maxNodes": "Nds",
  "seed": "Sd"
}

numberOfTrees = [25, 50, 75, 100]
variablesPerSplit = [1, 2, 3, 5]
minLeafPopulation = [1, 2, 3, 5, 7]
bagFraction = [0.5]
maxNodes = [1000]
seeds = [0]

inputKeys = ["numberOfTrees", "variablesPerSplit", "minLeafPopulation", "seed"]
inputValues = [numberOfTrees, variablesPerSplit, minLeafPopulation, seeds]

classifierFeatures = [
  "VHVHD", "VVVHD", "VVVVD",
  "PRE_VH", "PRE_VV", "PRE_NDPI",
  "POST_VH", "POST_VV", "POST_NDPI"
]

# Earth Engine asset and Google Drive paths.
classifierIdentifier = "base"
assetClassifiersPath = "users/stamlazaros/t-h-e-s-i-s/assets/classifiers/"

destinationFolder = "/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/"

# **5) Configuration**

In [6]:
featureAbbreviations = {
  "VHVHD": "F1",
  "VHVHQ": "F2",
  "VVVHD": "F3",
  "VVVHQ": "F4",
  "VVVVD": "F5",
  "VVVVQ": "F6",
  "NDPID": "F7",
  "PRE_VH": "F8",
  "PRE_VV": "F9",
  "PRE_NDPI": "F10",
  "POST_VH": "F11",
  "POST_VV": "F12",
  "POST_NDPI": "F13"
}

classifierStats = []
failedClassifiers = []

emsrGroups = ["emsr117", "emsr122", "emsr277"]

# **6) Data Processing**

Process the samples catalog.

In [7]:
nonWaterSamples = ee.List([])
floodSamples = ee.List([])
waterSamples = ee.List([])

for identifier in nonWaterSampleIdentifiers:
  nonWaterSamples = nonWaterSamples.add(ee.FeatureCollection(identifier))

for identifier in floodSampleIdentifiers:
  floodSamples = floodSamples.add(ee.FeatureCollection(identifier))

for identifier in waterSampleIdentifiers:
  waterSamples = waterSamples.add(ee.FeatureCollection(identifier))

# Flatten sample collections.
nonWaterSamples = ee.FeatureCollection(nonWaterSamples).flatten()
floodSamples = ee.FeatureCollection(floodSamples).flatten()
waterSamples = ee.FeatureCollection(waterSamples).flatten()

# Merge sample collections.
samples = nonWaterSamples.merge(floodSamples).merge(waterSamples)

Perform Grid Search

In [8]:
# Calculate the Cartesian product of the input arrays.
tuples = cartesian_product(inputValues)
combinations = [dict(zip(inputKeys, values)) for values in tuples]
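
With the parameter lists from section 4, the grid holds 4 x 4 x 5 x 1 = 80 combinations. As a cross-check, the same expansion can be sketched with the standard library's `itertools.product`. The values below are copied from the Parameters section; the names `input_keys`, `input_values`, `grid`, and `names` are illustrative only.

```python
from itertools import product

# Values copied from the Parameters section.
input_keys = ["numberOfTrees", "variablesPerSplit", "minLeafPopulation", "seed"]
input_values = [[25, 50, 75, 100], [1, 2, 3, 5], [1, 2, 3, 5, 7], [0]]
abbreviations = {
    "numberOfTrees": "Trs",
    "variablesPerSplit": "Vrbs",
    "minLeafPopulation": "Lvs",
    "seed": "Sd",
}

# One dictionary per hyperparameter combination.
grid = [dict(zip(input_keys, values)) for values in product(*input_values)]

# Build a short name per combination from the parameter abbreviations.
names = [
    "_".join(f"{abbreviations[key]}{value}" for key, value in combo.items())
    for combo in grid
]

print(len(grid))   # 80
print(names[0])    # Trs25_Vrbs1_Lvs1_Sd0
```

`itertools.product` varies the last list fastest, which is the same ordering that `cartesian_product` yields.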

In [9]:
breakFlag = False

for combination in combinations:
  # Generate classifier name from parameter abbreviations.
  abbreviations = [f"{parameterAbbreviations[key]}{combination[key]}" for key in combination]
  abbreviation = "_".join(abbreviations)

  print(f"Processing classifier: `{abbreviation}`.")

  # Collect metrics.
  featureImportances = []
  errorMatrixes = ee.Array([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]
  ])
  confusionMatrixes = ee.Array([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]
  ])

  for group in emsrGroups:
    print(f"Examining group: `{group}`.")

    # Filter samples into validation and training sets.
    groupFilter = ee.Filter.eq("group", group)
    validationSamples = samples.filter(groupFilter)
    trainingSamples = samples.filter(groupFilter.Not())

    # Create, train and process a RF classifier.
    classifier = ee.Classifier.smileRandomForest(**combination)  \
      .train(**{
        "features": trainingSamples,
        "classProperty": "class",
        "inputProperties": classifierFeatures
      })

    # Assess classifier reliability.
    classifierExplanation = classifier.explain()

    trees = ee.List(ee.Dictionary(classifierExplanation).get("trees"))
    expectedTreesCount = combination["numberOfTrees"]
    actualTreesCount = trees.size().getInfo()

    print(f"The classifier contains `{actualTreesCount}` out of `{expectedTreesCount}` trees.")

    if actualTreesCount < expectedTreesCount:
      breakFlag = True
      break

    # # Determine tree sizes by evaluating the length of their string representations.
    # if actualTreesCount == expectedTreesCount:
    #   treeSizes = trees.map(lambda tree: ee.String(tree).length())

    # Compute performance matrixes.
    confusionMatrix = classifier.confusionMatrix()
    errorMatrix = validationSamples.classify(classifier).errorMatrix("class", "classification")

    # Calculate feature importances.

    # Absolute
    absoluteFeatureImportances = ee.Dictionary(classifierExplanation.get("importance"))
    featureNames = absoluteFeatureImportances.keys()

    # Relative
    importancesSum = absoluteFeatureImportances.values().reduce(ee.Reducer.sum())
    relativeFeatureImportances = absoluteFeatureImportances.map(relative_property_importance)

    # Collect metrics.
    errorMatrixes = errorMatrixes.add(errorMatrix.array())
    confusionMatrixes = confusionMatrixes.add(confusionMatrix.array())
    featureImportances.append(relativeFeatureImportances)

  if breakFlag:
    print("\nA defective classifier was detected.\n")
    failedClassifiers.append(abbreviation)
    breakFlag = False
    continue

  # Retrieve metrics from Google's servers.
  errorMatrix = errorMatrixes.toList().getInfo()
  confusionMatrix = confusionMatrixes.toList().getInfo()

  totalSums = ee.List([])

  for key in classifierFeatures:
    values = ee.List([ee.Number(dct.get(key)) for dct in featureImportances])
    totalSums = totalSums.add(values.reduce(ee.Reducer.sum()))

  featureImportances = dict(zip(classifierFeatures, totalSums.getInfo()))

  classifierStats.append({
    "abbreviation": abbreviation,
    "error_matrix": errorMatrix,
    "confusion_matrix": confusionMatrix,
    "feature_importances": featureImportances
  })

  print("")

Processing classifier: `Trs25_Vrbs1_Lvs1_Sd0`.
Examining group: `emsr117`.
The classifier contains `25` out of `25` trees.
Examining group: `emsr122`.
The classifier contains `25` out of `25` trees.
Examining group: `emsr277`.
The classifier contains `25` out of `25` trees.

Processing classifier: `Trs25_Vrbs1_Lvs2_Sd0`.
Examining group: `emsr117`.
The classifier contains `25` out of `25` trees.
Examining group: `emsr122`.
The classifier contains `25` out of `25` trees.
Examining group: `emsr277`.
The classifier contains `25` out of `25` trees.

Processing classifier: `Trs25_Vrbs1_Lvs3_Sd0`.
Examining group: `emsr117`.
The classifier contains `25` out of `25` trees.
Examining group: `emsr122`.
The classifier contains `25` out of `25` trees.
Examining group: `emsr277`.
The classifier contains `25` out of `25` trees.

Processing classifier: `Trs25_Vrbs1_Lvs5_Sd0`.
Examining group: `emsr117`.
The classifier contains `25` out of `25` trees.
Examining group: `emsr122`.
The classifier contai

In [10]:
failedClassifiers

['Trs75_Vrbs1_Lvs1_Sd0',
 'Trs75_Vrbs2_Lvs1_Sd0',
 'Trs75_Vrbs3_Lvs1_Sd0',
 'Trs100_Vrbs1_Lvs1_Sd0',
 'Trs100_Vrbs2_Lvs1_Sd0',
 'Trs100_Vrbs3_Lvs1_Sd0',
 'Trs100_Vrbs5_Lvs1_Sd0']
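
The grid-search loop above holds out one EMSR group per iteration for validation, trains on the remaining groups, and adds the per-fold matrices element-wise. The pure-Python sketch below mirrors that bookkeeping on made-up records, without calling Earth Engine; `split_by_group`, `add_matrices`, and the sample data are hypothetical.

```python
def split_by_group(samples, held_out):
    """Leave-one-group-out split: returns (training, validation)."""
    validation = [s for s in samples if s["group"] == held_out]
    training = [s for s in samples if s["group"] != held_out]
    return training, validation


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Made-up samples; only the "group" property matters here.
samples = [
    {"group": "emsr117", "class": 0},
    {"group": "emsr117", "class": 1},
    {"group": "emsr122", "class": 2},
    {"group": "emsr277", "class": 1},
]

training, validation = split_by_group(samples, "emsr117")
print(len(training), len(validation))  # 2 2

# Made-up per-fold matrices accumulated into a running total.
total = [[0, 0], [0, 0]]
for fold in ([[5, 1], [2, 4]], [[3, 0], [1, 6]]):
    total = add_matrices(total, fold)
print(total)  # [[8, 1], [3, 10]]
```

Because the matrices are summed rather than averaged, metrics derived from the total weight each fold by its number of validation samples.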

# **8) Data Export**

Create destination folders.

In [11]:
abbreviations = [stat["abbreviation"] for stat in classifierStats]
abbreviations.append("summary")

for abbreviation in abbreviations:
  path = os.path.join(destinationFolder, abbreviation)

  try:
    message = "created." if not os.path.exists(path) else "already exists."
    os.makedirs(path, exist_ok=True)
    print(f"Directory: `{path}` {message}")

  except OSError as error:
    print(f"Error creating directory `{path}`: {error}.")

Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs1_Sd0` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs2_Sd0` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs3_Sd0` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs5_Sd0` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs7_Sd0` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs2_Lvs1_Sd0` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs2_Lvs2_Sd0` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs2_Lvs3_Sd0` created.
Directory: `/content/gdrive/MyDr

Store classifier stats as CSV files.

In [12]:
for stats in classifierStats:
  print(f"Processing classifier: `{stats['abbreviation']}`.")

  confusionMatrix = stats["confusion_matrix"]
  errorMatrix = stats["error_matrix"]

  # Construct paths.
  fiPath = os.path.join(destinationFolder, stats["abbreviation"], "feature_importances.csv")

  cmMetricsPath = os.path.join(destinationFolder, stats["abbreviation"], "confusion_matrix_metrics.csv")
  cmPath = os.path.join(destinationFolder, stats["abbreviation"], "confusion_matrix.csv")

  emMetricsPath = os.path.join(destinationFolder, stats["abbreviation"], "error_matrix_metrics.csv")
  emPath = os.path.join(destinationFolder, stats["abbreviation"], "error_matrix.csv")

  # Calculate matrix metrics.
  confusionMetrics = calculate_metrics(confusionMatrix)
  errorMetrics = calculate_metrics(errorMatrix)

  # Construct matrix dictionaries.
  confusionMatrix = {
    "class_0": ",".join(map(str, confusionMatrix[0])),
    "class_1": ",".join(map(str, confusionMatrix[1])),
    "class_2": ",".join(map(str, confusionMatrix[2]))
  }

  errorMatrix = {
    "class_0": ",".join(map(str, errorMatrix[0])),
    "class_1": ",".join(map(str, errorMatrix[1])),
    "class_2": ",".join(map(str, errorMatrix[2]))
  }

  # Store classifier stats.

  # Feature importances
  with open(fiPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(stats["feature_importances"].keys()))
    writer.writeheader()
    writer.writerows([stats["feature_importances"]])

  print(f"Stored feature importances to: `{fiPath}`.")

  # Confusion matrix metrics.
  with open(cmMetricsPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(confusionMetrics.keys()))
    writer.writeheader()
    writer.writerows([confusionMetrics])

  print(f"Stored confusion matrix metrics to: `{cmMetricsPath}`.")

  # Confusion matrix.
  with open(cmPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(confusionMatrix.keys()))
    writer.writeheader()
    writer.writerows([confusionMatrix])

  print(f"Stored confusion matrix to: `{cmPath}`.")

  # Error matrix metrics.
  with open(emMetricsPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(errorMetrics.keys()))
    writer.writeheader()
    writer.writerows([errorMetrics])

  print(f"Stored error matrix metrics to: `{emMetricsPath}`.")

  # Error matrix.
  with open(emPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(errorMatrix.keys()))
    writer.writeheader()
    writer.writerows([errorMatrix])

  print(f"Stored error matrix to: `{emPath}`.\n")

Processing classifier: `Trs25_Vrbs1_Lvs1_Sd0`.
Stored feature importances to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs1_Sd0/feature_importances.csv`.
Stored confusion matrix metrics to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs1_Sd0/confusion_matrix_metrics.csv`.
Stored confusion matrix to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs1_Sd0/confusion_matrix.csv`.
Stored error matrix metrics to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs1_Sd0/error_matrix_metrics.csv`.
Stored error matrix to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs1_Sd0/error_matrix.csv`.

Processing classifier: `Trs25_Vrbs1_Lvs2_Sd0`.
Stored feature importances to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/hyperparameter_tuning/Trs25_Vrbs1_Lvs2_S
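
The five `csv.DictWriter` blocks in the export cell share one pattern: write a header line followed by a single data row. One possible refactoring, sketched under the assumption that each file holds exactly one dictionary, is a small helper. The name `write_single_row_csv` is hypothetical, and the demo writes made-up values to a temporary directory rather than to Google Drive.

```python
import csv
import os
import tempfile


def write_single_row_csv(path, row):
    """Write one dictionary as a header line plus a single data line."""
    with open(path, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=list(row.keys()))
        writer.writeheader()
        writer.writerow(row)


# Demo with made-up values in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "feature_importances.csv")
    write_single_row_csv(demo_path, {"PRE_VH": 12.5, "POST_VV": 87.5})

    # Read the file back to confirm the header and the row.
    with open(demo_path, newline="") as file:
        rows = list(csv.DictReader(file))

print(rows)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns need an explicit conversion when the files are read back.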


-End of Notebook-