# **1) Import the Modules**

Modules are code libraries that contain a set of ready-to-use functions.

* The `ee` module allows developers to interact with Google Earth Engine using the Python programming language.
* The `os` module provides functions to perform tasks such as file and directory operations, process management, and environment variable manipulation.
* The `csv` module allows developers to load, read and write CSV files.
* The `numpy` module provides support for large, multi-dimensional arrays and matrices, as well as a collection of mathematical functions to efficiently manipulate these arrays.
* The `pandas` module provides a powerful and efficient toolkit for data manipulation, analysis, and exploration.
* The `statistics` module provides functions for statistical operations and calculations on numerical data.
* The `google.colab` module provides access to some of the unique features and functionality of Google Colab.

In [32]:
import ee
import os
import csv

import numpy as np
import pandas as pd

from statistics import mean
from google.colab import drive

# **2) Authentication Procedure**

This section provides instructions for setting up the Google Earth Engine Python API and Google Drive on Colab. These steps should be performed each time you start, restart, or roll back a Colab session.

## **2.1) GEE**

The `ee.Authenticate` function authenticates access to the Google Earth Engine servers, while the `ee.Initialize` function initializes the API client for the Cloud project passed as `project`. After executing the following cell, the user is prompted to grant Google Earth Engine access to their Google account.

**Note:** The Earth Engine API is installed by default in Google Colaboratory.

In [33]:
ee.Authenticate()
ee.Initialize(project="...")

## **2.2) GD**

The `drive.mount` function mounts the user's Google Drive at the given path (here `/content/gdrive`). Once access is granted, code running in the notebook can read and modify files in Google Drive.

**Note:** When using the `Mount Drive` button in the file browser, no authentication codes are required for notebooks edited only by the current user.

In [34]:
drive.mount("/content/gdrive")

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


# **3) Functions**

Data Processing
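The `relative_property_importance` function in the next cell expresses each absolute importance as a percentage of the total, i.e. 100 · value / `importancesSum`. The sketch below reproduces that normalization client-side with plain Python dictionaries and computes the total itself instead of reading a global variable. `relative_importances` is a hypothetical helper, and the importance values are illustrative, not results from this notebook.

```python
def relative_importances(absolute):
  """Hypothetical helper: convert absolute importances to percentages of their sum."""
  total = sum(absolute.values())
  # Mirrors ee.Number(value).divide(importancesSum).multiply(100) for every key.
  return {key: value / total * 100 for key, value in absolute.items()}


# Illustrative values only (not results from this notebook).
example = relative_importances({"VHVHD": 3.0, "PRE_VV": 1.0})
print(example)  # {'VHVHD': 75.0, 'PRE_VV': 25.0}
```

Passing the dictionary explicitly avoids the hidden dependency on a global total, which is why the relative values always sum to 100.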

In [35]:
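def relative_property_importance(key, value):
  """
  Description:
    Expresses a feature importance as a percentage of the sum of all importances.
    Intended for use with `ee.Dictionary.map`; relies on the global `importancesSum`,
    which must be set before the function is mapped.

  Parameters:
    key: The property key (unused; required by the `ee.Dictionary.map` signature).
    value: The absolute importance value.

  Returns:
    The relative importance value, in percent.
  """
  return ee.Number(value).divide(importancesSum).multiply(100)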


def calculate_metrics(matrix):
  """
  Description:
    Calculates per-class evaluation metrics from an NxN error (confusion) matrix.

  Parameters:
    matrix (list of lists): An NxN matrix where each row represents the true class
                            and each column represents the predicted class (the
                            Earth Engine `ConfusionMatrix` convention).

  Returns:
    A dictionary of per-class metrics (keyed `class_<index>_<metric>`) plus the overall accuracy.
  """
  def safe_divide(numerator, denominator):
    # Return NaN instead of raising ZeroDivisionError (e.g. a class that is never predicted).
    return numerator / denominator if denominator else float("nan")

  metrics = {}
  classesCount = len(matrix)

  totalCorrectPredictions = sum(matrix[i][i] for i in range(classesCount))
  totalPixels = sum(sum(row) for row in matrix)

  overallAccuracy = safe_divide(totalCorrectPredictions, totalPixels)

  for index in range(classesCount):
    TP = matrix[index][index]
    # Rows are true classes: off-diagonal cells of this row are missed samples (FN),
    # off-diagonal cells of this column are samples wrongly assigned to this class (FP).
    FP = sum(matrix[i][index] for i in range(classesCount) if i != index)
    FN = sum(matrix[index][i] for i in range(classesCount) if i != index)
    TN = sum(matrix[i][j] for i in range(classesCount) for j in range(classesCount) if i != index and j != index)

    fnr = safe_divide(FN, FN + TP)                  # false negative rate
    fpr = safe_divide(FP, FP + TN)                  # false positive rate
    spc = safe_divide(TN, FP + TN)                  # specificity
    rec = safe_divide(TP, TP + FN)                  # recall/producer accuracy
    prec = safe_divide(TP, TP + FP)                 # precision/user accuracy
    jaccard = safe_divide(TP, TP + FP + FN)         # jaccard
    f1 = safe_divide(2 * prec * rec, prec + rec)    # f1 score
    acc = safe_divide(TP + TN, TP + TN + FP + FN)   # accuracy
    mcc = safe_divide(TP * TN - FP * FN, ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5)  # Matthews Correlation Coefficient

    metrics[f"class_{index}_false_negative_rate"] = fnr
    metrics[f"class_{index}_false_positive_rate"] = fpr
    metrics[f"class_{index}_specificity"] = spc
    metrics[f"class_{index}_recall"] = rec
    metrics[f"class_{index}_precision"] = prec
    metrics[f"class_{index}_jaccard"] = jaccard
    metrics[f"class_{index}_f1_score"] = f1
    metrics[f"class_{index}_accuracy"] = acc
    metrics[f"class_{index}_mathews_correlation_coefficient"] = mcc

  metrics["overall_accuracy"] = overallAccuracy
  return metrics
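As a cross-check of the convention stated in the `calculate_metrics` docstring (rows = true class, columns = predicted class, as in Earth Engine's `ConfusionMatrix`), recall and precision can also be computed in vectorized form with NumPy, which is already imported in section 1. This is a sketch: `per_class_rates` is a hypothetical helper and the matrix is illustrative, not a result from this notebook.

```python
import numpy as np


def per_class_rates(matrix):
  """Sketch: per-class recall and precision for a matrix with rows = true, columns = predicted."""
  m = np.asarray(matrix, dtype=float)
  tp = np.diag(m)
  fn = m.sum(axis=1) - tp  # row totals minus diagonal: true samples that were missed
  fp = m.sum(axis=0) - tp  # column totals minus diagonal: samples wrongly assigned to the class
  recall = tp / (tp + fn)     # producer's accuracy
  precision = tp / (tp + fp)  # user's accuracy
  return recall, precision


# Illustrative 3x3 matrix (not a result from this notebook).
recall, precision = per_class_rates([
  [8, 2, 0],
  [1, 6, 3],
  [0, 0, 10],
])
print(recall)     # ratios 8/10, 6/10, 10/10
print(precision)  # ratios 8/9, 6/8, 10/13
```

Recall divides by row totals and precision by column totals; transposing the matrix swaps the two, which is the easiest way to spot a convention mix-up.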

# **4) Parameters**

In [36]:
# `Classification`
emsr117SampleIdentifiers = [
  "..."
]

emsr122SampleIdentifiers = [
  "..."
]

emsr277SampleIdentifiers = [
  "..."
]

# `RFE`
minimumFeaturesCount = 1

# `Group K-Fold`
emsrGroups = ["emsr117", "emsr122", "emsr277"]
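The `emsrGroups` list drives a leave-one-group-out form of group k-fold: each EMSR activation is held out once for validation while the remaining activations train the classifier. A plain-Python sketch of the split logic, assuming each sample carries a `group` property (as the `ee.Filter.eq("group", group)` filter in section 6 implies); `group_folds` and the toy samples are hypothetical:

```python
def group_folds(samples, groups):
  """Yield (held-out group, training samples, validation samples), one fold per group."""
  for group in groups:
    validation = [s for s in samples if s["group"] == group]
    training = [s for s in samples if s["group"] != group]
    yield group, training, validation


toy = [{"group": "emsr117"}, {"group": "emsr122"}, {"group": "emsr122"}, {"group": "emsr277"}]
for held_out, training, validation in group_folds(toy, ["emsr117", "emsr122", "emsr277"]):
  print(held_out, len(training), len(validation))
# emsr117 3 1
# emsr122 2 2
# emsr277 3 1
```

Holding out whole activations keeps spatially correlated samples from the same event out of the training set, so the validation scores reflect transfer to an unseen event.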

In [37]:
classifierParameters = {
  "maxNodes": None,
  "minLeafPopulation": 1
}

classifierFeatures = [
  "VHVHD", "VHVHQ", "VVVHD", "VVVHQ", "VVVVD", "VVVVQ", "NDPID", "NDPIQ",
  "PRE_VV", "PRE_VH", "PRE_NDPI", "POST_VV", "POST_VH", "POST_NDPI"
]

# GD paths.
destinationFolder = "..."

# **5) Configuration**

In [38]:
featureAbbreviations = {
  "VHVHD": "F1",
  "VHVHQ": "F2",
  "VVVHD": "F3",
  "VVVHQ": "F4",
  "VVVVD": "F5",
  "VVVVQ": "F6",
  "NDPID": "F7",
  "NDPIQ": "F8",
  "PRE_VH": "F9",
  "PRE_VV": "F10",
  "PRE_NDPI": "F11",
  "POST_VH": "F12",
  "POST_VV": "F13",
  "POST_NDPI": "F14"
}

classifierStats = []

# **6) Data Processing**

Load the sample collections of each EMSR activation, flatten them, and merge them into a single collection.

In [39]:
emsr117Samples = ee.List([])
emsr122Samples = ee.List([])
emsr277Samples = ee.List([])

for identifier in emsr117SampleIdentifiers:
  emsr117Samples = emsr117Samples.add(ee.FeatureCollection(identifier))

for identifier in emsr122SampleIdentifiers:
  emsr122Samples = emsr122Samples.add(ee.FeatureCollection(identifier))

for identifier in emsr277SampleIdentifiers:
  emsr277Samples = emsr277Samples.add(ee.FeatureCollection(identifier))

# Flatten sample collections.
emsr117Samples = ee.FeatureCollection(emsr117Samples).flatten()
emsr122Samples = ee.FeatureCollection(emsr122Samples).flatten()
emsr277Samples = ee.FeatureCollection(emsr277Samples).flatten()

# Merge sample collections.
samples = emsr117Samples.merge(emsr122Samples).merge(emsr277Samples)

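Run recursive feature elimination (RFE): for each feature subset, train a CART classifier (`ee.Classifier.smileCart`) on every group fold, collect its matrices and feature importances, then drop the least important feature.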
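The next cell performs backward recursive feature elimination. Stripped of the Earth Engine calls, one iteration names the subset from its sorted abbreviations, sums each feature's relative importance over the folds, and removes the feature with the smallest sum. The sketch below assumes a hypothetical `fold_importances` callable standing in for training `ee.Classifier.smileCart` on each fold; `rfe`, the abbreviations, and the stub importances are illustrative only.

```python
def rfe(features, abbreviations, fold_importances, minimum_count=1):
  """
  Sketch of backward recursive feature elimination.

  fold_importances(features) is a hypothetical callable returning one
  {feature: relative importance} dictionary per cross-validation fold.
  """
  features = list(features)
  history = []
  while len(features) > minimum_count:
    # Name the subset as in the notebook: abbreviations sorted by their numeric suffix.
    name = "_".join(sorted((abbreviations[f] for f in features), key=lambda x: int(x[1:])))
    # Sum each feature's importance over all folds; missing entries count as 0.
    folds = fold_importances(features)
    totals = {f: sum(fold.get(f, 0) for fold in folds) for f in features}
    history.append((name, totals))
    # Drop the feature with the smallest summed importance.
    features.remove(min(totals, key=totals.get))
  return history


abbr = {"A": "F1", "B": "F2", "C": "F10"}
weights = {"A": 5.0, "B": 1.0, "C": 3.0}  # arbitrary stub importances


def stub(feats):
  # Same importances in each of three folds.
  return [{f: weights[f] for f in feats} for _ in range(3)]


for name, totals in rfe(["C", "A", "B"], abbr, stub):
  print(name, totals)
```

Sorting by the numeric suffix (rather than as strings) is what keeps `F10` after `F2` in the subset names.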

In [40]:
breakFlag = False

while len(classifierFeatures) > minimumFeaturesCount:
  # Generate classifier name from feature abbreviations.
  abbreviatedFeatures = [featureAbbreviations[key] for key in classifierFeatures]
  abbreviatedFeatures = sorted(abbreviatedFeatures, key=lambda x: int(x[1:]))
  abbreviation = "_".join(abbreviatedFeatures)

  print(f"Processing classifier: `{abbreviation}`.")

  # Collect metrics.
  featureImportances = []
  errorMatrixes = ee.Array([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]
  ])
  confusionMatrixes = ee.Array([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]
  ])

  for group in emsrGroups:
    print(f"Examining group: `{group}`.")

    # Filter samples into validation and training sets.
    groupFilter = ee.Filter.eq("group", group)
    validationSamples = samples.filter(groupFilter)
    trainingSamples = samples.filter(groupFilter.Not())

    # Create and train a CART classifier on the training folds.
    classifier = ee.Classifier.smileCart(**classifierParameters)  \
      .train(**{
        "features": trainingSamples,
        "classProperty": "class",
        "inputProperties": classifierFeatures
      })

    # Retrieve the classifier explanation, which includes the feature importances.
    classifierExplanation = classifier.explain()

    # Compute the training (resubstitution) confusion matrix and the validation error matrix.
    confusionMatrix = classifier.confusionMatrix()
    errorMatrix = validationSamples.classify(classifier).errorMatrix("class", "classification")

    # Calculate feature importances.

    # Absolute
    absoluteFeatureImportances = ee.Dictionary(classifierExplanation.get("importance"))
    featureNames = absoluteFeatureImportances.keys()

    # Relative
    importancesSum = absoluteFeatureImportances.values().reduce(ee.Reducer.sum())
    relativeFeatureImportances = absoluteFeatureImportances.map(relative_property_importance)

    # Collect metrics.
    errorMatrixes = errorMatrixes.add(errorMatrix.array())
    confusionMatrixes = confusionMatrixes.add(confusionMatrix.array())
    featureImportances.append(relativeFeatureImportances)

  if breakFlag:
    print("\n A defective classifier was detected. Exiting prematurely...")
    break

  # Retrieve metrics from Google's servers.
  errorMatrix = errorMatrixes.toList().getInfo()
  confusionMatrix = confusionMatrixes.toList().getInfo()

  totalSums = ee.List([])

  for key in classifierFeatures:
    # Sum each feature's relative importance across folds (0 if a fold reports none).
    values = ee.List([ee.Number(dct.get(key, 0)) for dct in featureImportances])
    totalSums = totalSums.add(values.reduce(ee.Reducer.sum()))

  featureImportances = dict(zip(classifierFeatures, totalSums.getInfo()))

  # Store metrics.
  classifierStats.append({
    "abbreviation": abbreviation,
    "error_matrix": errorMatrix,
    "confusion_matrix": confusionMatrix,
    "feature_importances": featureImportances
  })

  # Remove the least important feature.
  leastImportantFeature = min(featureImportances, key=featureImportances.get)
  print(f"Removing least important feature: `{leastImportantFeature}`\n")
  classifierFeatures.pop(classifierFeatures.index(leastImportantFeature))

Processing classifier: `F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14`.
Examining group: `emsr117`.
Examining group: `emsr122`.
Examining group: `emsr277`.
Removing least important feature: `VVVHQ`

Processing classifier: `F1_F2_F3_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14`.
Examining group: `emsr117`.
Examining group: `emsr122`.
Examining group: `emsr277`.
Removing least important feature: `VVVVQ`

Processing classifier: `F1_F2_F3_F5_F7_F8_F9_F10_F11_F12_F13_F14`.
Examining group: `emsr117`.
Examining group: `emsr122`.
Examining group: `emsr277`.
Removing least important feature: `NDPIQ`

Processing classifier: `F1_F2_F3_F5_F7_F9_F10_F11_F12_F13_F14`.
Examining group: `emsr117`.
Examining group: `emsr122`.
Examining group: `emsr277`.
Removing least important feature: `POST_NDPI`

Processing classifier: `F1_F2_F3_F5_F7_F9_F10_F11_F12_F13`.
Examining group: `emsr117`.
Examining group: `emsr122`.
Examining group: `emsr277`.
Removing least important feature: `PRE_NDPI`

Processing classifier: `

# **8) Data Export**

Create destination folders.

In [41]:
abbreviations = [stat["abbreviation"] for stat in classifierStats]
abbreviations.append("summary")

for abbreviation in abbreviations:
  path = os.path.join(destinationFolder, abbreviation)

  try:
    message = "created." if not os.path.exists(path) else "already exists."
    os.makedirs(path, exist_ok=True)
    print(f"Directory: `{path}` {message}")

  except OSError as error:
    print(f"Error creating directory `{path}`: {error}.")

Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F5_F7_F8_F9_F10_F11_F12_F13_F14` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F5_F7_F9_F10_F11_F12_F13_F14` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F5_F7_F9_F10_F11_F12_F13` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F5_F7_F9_F10_F12_F13` created.
Directory: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F5_F7_F10_F12_F13` created.
Directory: `/content/gdrive/MyDr

Store classifier stats as CSV files.
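The next cell stores each matrix as a single CSV row with one column per true class, where every cell holds that class's row joined by commas; `csv.DictWriter` therefore quotes those cells. The sketch below shows the resulting text and how a row can be parsed back into a matrix. It writes to an in-memory buffer instead of Google Drive, and the matrix is illustrative, not a result from this notebook.

```python
import csv
import io

# Illustrative 3x3 matrix (not a result from this notebook).
matrix = [[8, 2, 0], [1, 6, 3], [0, 0, 10]]

# Same layout as the export cell: one column per true class, each cell a comma-joined row.
row = {f"class_{i}": ",".join(map(str, values)) for i, values in enumerate(matrix)}

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(row.keys()))
writer.writeheader()
writer.writerows([row])
print(buffer.getvalue())
# class_0,class_1,class_2
# "8,2,0","1,6,3","0,0,10"

# Reading the file back: csv.DictReader removes the quotes, then each cell is split again.
buffer.seek(0)
record = next(csv.DictReader(buffer))
restored = [[int(v) for v in record[f"class_{i}"].split(",")] for i in range(len(record))]
print(restored == matrix)  # True
```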

In [42]:
for stats in classifierStats:
  print(f"Processing classifier: `{stats['abbreviation']}`.")

  confusionMatrix = stats["confusion_matrix"]
  errorMatrix = stats["error_matrix"]

  # Construct paths.
  fiPath = os.path.join(destinationFolder, stats["abbreviation"], "feature_importances.csv")

  cmMetricsPath = os.path.join(destinationFolder, stats["abbreviation"], "confusion_matrix_metrics.csv")
  cmPath = os.path.join(destinationFolder, stats["abbreviation"], "confusion_matrix.csv")

  emMetricsPath = os.path.join(destinationFolder, stats["abbreviation"], "error_matrix_metrics.csv")
  emPath = os.path.join(destinationFolder, stats["abbreviation"], "error_matrix.csv")

  # Calculate matrix metrics.
  confusionMetrics = calculate_metrics(confusionMatrix)
  errorMetrics = calculate_metrics(errorMatrix)

  # Construct matrix dictionaries.
  confusionMatrix = {
    "class_0": ",".join(map(str, confusionMatrix[0])),
    "class_1": ",".join(map(str, confusionMatrix[1])),
    "class_2": ",".join(map(str, confusionMatrix[2]))
  }

  errorMatrix = {
    "class_0": ",".join(map(str, errorMatrix[0])),
    "class_1": ",".join(map(str, errorMatrix[1])),
    "class_2": ",".join(map(str, errorMatrix[2]))
  }

  # Store classifier stats.

  # Feature importances
  with open(fiPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(stats["feature_importances"].keys()))
    writer.writeheader()
    writer.writerows([stats["feature_importances"]])

  print(f"Stored feature importances to: `{fiPath}`.")

  # Confusion matrix metrics.
  with open(cmMetricsPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(confusionMetrics.keys()))
    writer.writeheader()
    writer.writerows([confusionMetrics])

  print(f"Stored confusion matrix metrics to: `{cmMetricsPath}`.")

  # Confusion matrix.
  with open(cmPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(confusionMatrix.keys()))
    writer.writeheader()
    writer.writerows([confusionMatrix])

  print(f"Stored confusion matrix to: `{cmPath}`.")

  # Error matrix metrics.
  with open(emMetricsPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(errorMetrics.keys()))
    writer.writeheader()
    writer.writerows([errorMetrics])

  print(f"Stored error matrix metrics to: `{emMetricsPath}`.")

  # Error matrix.
  with open(emPath, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=list(errorMatrix.keys()))
    writer.writeheader()
    writer.writerows([errorMatrix])

  print(f"Stored error matrix to: `{emPath}`.\n")

Processing classifier: `F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14`.
Stored feature importances to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14/feature_importances.csv`.
Stored confusion matrix metrics to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14/confusion_matrix_metrics.csv`.
Stored confusion matrix to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14/confusion_matrix.csv`.
Stored error matrix metrics to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14/error_matrix_metrics.csv`.
Stored error matrix to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/classifiers/cart/feature_selection/F1_F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14/error_matrix.csv`.

Processing cla


-End of Notebook-