# **1) Import the Modules**

Modules are code libraries that contain a set of ready-to-use functions.

* The `ee` module allows developers to interact with Google Earth Engine using the Python programming language.
* The `os` module provides functions to perform tasks such as file and directory operations, process management, and environment variable manipulation.
* The `datetime` module supplies classes for manipulating dates and times.
* The `tabulate` module allows the user to display data in a table format.
* The `google.colab` module provides access to some of the unique features and functionality of Google Colab.

In [None]:
import ee
import os
import datetime
import tabulate

from google.colab import drive

# **2) Authentication Procedure**

This section provides instructions for setting up the Google Earth Engine Python API on Colab and for setting up Google Drive on Colab. These steps should be performed each time you start/restart/rollback a Colab session.

## **2.1) Google Earth Engine (GEE)**

The `ee.Authenticate` function authenticates access to the Google Earth Engine servers, while the `ee.Initialize` function initializes the client library for the given Google Cloud project. After executing the following cell, the user is prompted to grant Google Earth Engine access to their Google account.

**Note:** The Earth Engine API is installed by default in Google Colaboratory.

In [None]:
ee.Authenticate()
ee.Initialize(project="...")

To authorize access needed by Earth Engine, open the following URL in a web browser and follow the instructions. If the web browser does not start automatically, please manually browse the URL below.

    https://code.earthengine.google.com/client-auth?scopes=https%3A//www.googleapis.com/auth/earthengine%20https%3A//www.googleapis.com/auth/devstorage.full_control&request_id=udcDvPyV3xbh3Ju9ra6Nk7Cm6u7I41YnV0lHperWAJU&tc=6VTh0nDOJDoQ9mefXif0gDyv7e9ZJCORPpiLDdGkBqk&cc=8fT7HxeYKPCrbSDuki67QttnmnKWU19RFz95yEpkg80

The authorization workflow will generate a code, which you should paste in the box below.
Enter verification code: 4/1AfJohXmIVqv1W-e11r3HmK_PzNpdqcWA3eAlU_mi_lP3dwDnTpBvM-BXbwA

Successfully saved authorization token.




## **2.2) Google Drive (GD)**

The `drive.mount` function mounts Google Drive at the given path in the Colab file system. Granting access to Google Drive allows code running in the notebook to read and modify files in Google Drive.

**Note:** When using the `Mount Drive` button in the file browser, no authentication codes are required for notebooks edited only by the current user.

In [None]:
drive.mount("/content/gdrive")

Mounted at /content/gdrive


# **3) Functions**

Data Processing

In [None]:
def relative_property_importance(key, value):
  """
  Description:
    Calculates the relative importance of a property based on its value.

  Parameters:
    key: The property key.
    value: The property value.

  Returns:
    The relative importance value.
  """
  return ee.Number(value).divide(importancesSum).multiply(100)


def export_tasks_viewer(exportTasksIds, tableFormat: str = "plain"):
  """
  Description:
    Displays a table view which contains useful information about the provided export tasks.

  Notes:
    * Task_Id: The task identifier.
    * Task_State: One of READY, RUNNING, COMPLETED, FAILED, CANCELLED, UNSUBMITTED or UNKNOWN.
    * Task_Type: One of EXPORT_IMAGE, EXPORT_TILES, EXPORT_FEATURES, EXPORT_VIDEO.
    * Task_Attempt: Number of attempts.
    * Task_Description: A human-readable description of the task.
    * Queue_Time: The time, in seconds, that the task spent waiting in the queue.
    * Execution_Time: The time, in seconds, that the servers spent executing the task.
    * Completion_Time: The sum of the queue and execution times, in seconds.
    * Error_Message: Failure reason. Appears only if state is FAILED. May also include other fields.

  Arguments:
    exportTasksIds (list) (mandatory) A list of export task identifiers.
    tableFormat (str) (optional) The table format to use. Defaults to "plain".

  Returns:
    None, displays the export tasks table.
  """
  taskInfo = []
  tableHeaders = [
    "Task_Id", "Task_State", "Task_Type", "Task_Attempt", "Task_Description",
    "Queue_Time", "Execution_Time", "Completion_Time", "Error_Message"
  ]
  tableFormats = tabulate.tabulate_formats  # Public list of the supported format names.

  if tableFormat not in tableFormats:
    raise ValueError(f"Invalid table format. Choose from: `{tableFormats}`.")

  # Populate taskInfo.
  for exportTaskId in exportTasksIds:

    # Request the task status once and reuse it for every field.
    taskStatus = ee.data.getTaskStatus(exportTaskId)[0]

    taskState = taskStatus["state"]
    taskType = taskStatus["task_type"]
    taskDescription = taskStatus["description"]
    startTimestamp = datetime.datetime.fromtimestamp(taskStatus["start_timestamp_ms"]/1000.0)
    updateTimestamp = datetime.datetime.fromtimestamp(taskStatus["update_timestamp_ms"]/1000.0)
    creationTimestamp = datetime.datetime.fromtimestamp(taskStatus["creation_timestamp_ms"]/1000.0)

    queueTime = None
    taskAttempt = None
    executionTime = None
    completionTime = None

    if taskState not in ["READY", "RUNNING"]:
      queueTime = (startTimestamp - creationTimestamp).total_seconds()
      executionTime = (updateTimestamp - startTimestamp).total_seconds()

    if taskState == "COMPLETED":
      taskAttempt = taskStatus["attempt"]
      completionTime = (updateTimestamp - creationTimestamp).total_seconds()

    # A missing error message just means that the export task has not failed.
    errorMessage = taskStatus.get("error_message")

    taskInfo.append([exportTaskId, taskState, taskType, taskAttempt, taskDescription, queueTime, executionTime, completionTime, errorMessage])

  # Table display.
  table = tabulate.tabulate(taskInfo, headers=tableHeaders, tablefmt=tableFormat)
  print(table)
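
The timing columns reported by `export_tasks_viewer` are plain differences between three millisecond timestamps. The sketch below reproduces that arithmetic without contacting Earth Engine. The `task_timings` helper and the sample timestamps are hypothetical and serve only as an illustration.

```python
import datetime


def task_timings(creation_ms, start_ms, update_ms):
  """Return the (queue, execution, completion) times in seconds."""

  def to_datetime(milliseconds):
    # Timezone-aware UTC datetimes keep the result independent of the local time zone.
    return datetime.datetime.fromtimestamp(milliseconds / 1000.0, tz=datetime.timezone.utc)

  created = to_datetime(creation_ms)
  started = to_datetime(start_ms)
  updated = to_datetime(update_ms)

  queue_time = (started - created).total_seconds()
  execution_time = (updated - started).total_seconds()
  completion_time = (updated - created).total_seconds()
  return queue_time, execution_time, completion_time


# Hypothetical timestamps (milliseconds since the Unix epoch).
print(task_timings(
  creation_ms=1_700_000_000_000,
  start_ms=1_700_000_005_000,
  update_ms=1_700_000_045_000,
))  # (5.0, 40.0, 45.0)
```

By construction, the completion time equals the queue time plus the execution time.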

# **4) Parameters**

In [None]:
nonWaterSampleIdentifiers = [
  "..."
]

floodSampleIdentifiers = [
  "..."
]

waterSampleIdentifiers = [
  "..."
]

CART

In [None]:
classifierParameters = {
  "maxNodes": None,  # Python uses `None`; `null` is not defined.
  "minLeafPopulation": 1
}

classifierFeatures = [
  "VHVHD", "VHVHQ", "VVVHD", "VVVHQ", "VVVVD", "VVVVQ", "NDPID", "NDPIQ",
  "PRE_VV", "PRE_VH", "PRE_NDPI", "POST_VV", "POST_VH", "POST_NDPI"
]

# GD paths.
classifierIdentifier = "cart"
destinationPath = "..."

Random Forest

In [None]:
classifierParameters = {
  "numberOfTrees": 25,
  # "variablesPerSplit": 3,
  # "minLeafPopulation": 5,
  # "bagFraction": 0.5,
  # "maxNodes": None,
  "seed": 0
}

classifierFeatures = [
  "VHVHD", "VHVHQ", "VVVHD", "VVVHQ", "VVVVD", "VVVVQ", "NDPID", "NDPIQ",
  "PRE_VV", "PRE_VH", "PRE_NDPI", "POST_VV", "POST_VH", "POST_NDPI"
]

# GD paths.
classifierIdentifier = "random_forest"
destinationPath = "..."

# **5) Configuration**

In [None]:
featureAbbreviations = {
  "VHVHD": "F1",
  "VHVHQ": "F2",
  "VVVHD": "F3",
  "VVVHQ": "F4",
  "VVVVD": "F5",
  "VVVVQ": "F6",
  "NDPID": "F7",
  "NDPIQ": "F8",
  "PRE_VH": "F9",
  "PRE_VV": "F10",
  "PRE_NDPI": "F11",
  "POST_VH": "F12",
  "POST_VV": "F13",
  "POST_NDPI": "F14"
}
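
The abbreviations defined above are later joined into a compact classifier name. The sketch below shows that step in isolation, with a check for features that have no abbreviation. The `classifier_abbreviation` helper and the reduced abbreviation table are hypothetical, for illustration only.

```python
# Hypothetical subset of the abbreviation table.
feature_abbreviations = {"VHVHD": "F1", "VHVHQ": "F2", "PRE_VV": "F10", "POST_NDPI": "F14"}


def classifier_abbreviation(features, abbreviations):
  """Build a name such as "F1_F2_F10" from a list of feature names."""
  unknown = [name for name in features if name not in abbreviations]
  if unknown:
    raise KeyError(f"Features without an abbreviation: {unknown}")

  codes = [abbreviations[name] for name in features]
  # Sort on the number after "F"; a plain string sort would place "F10" before "F2".
  codes.sort(key=lambda code: int(code[1:]))
  return "_".join(codes)


print(classifier_abbreviation(["PRE_VV", "VHVHQ", "VHVHD"], feature_abbreviations))  # F1_F2_F10
```

Sorting numerically keeps the name stable regardless of the order in which the features are listed.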

# **6) Data Processing**

Process the samples catalog.

In [None]:
floodSamples = ee.List([])
nonWaterSamples = ee.List([])
waterSamples = ee.List([])

for identifier in nonWaterSampleIdentifiers:
  nonWaterSamples = nonWaterSamples.add(ee.FeatureCollection(identifier))

for identifier in floodSampleIdentifiers:
  floodSamples = floodSamples.add(ee.FeatureCollection(identifier))

for identifier in waterSampleIdentifiers:
  waterSamples = waterSamples.add(ee.FeatureCollection(identifier))

# Flatten sample collections.
floodSamples = ee.FeatureCollection(floodSamples).flatten()
nonWaterSamples = ee.FeatureCollection(nonWaterSamples).flatten()
waterSamples = ee.FeatureCollection(waterSamples).flatten()

# Merge sample collections.
samples = floodSamples.merge(nonWaterSamples).merge(waterSamples)

Create, train and process an RF classifier.

In [None]:
# Generate classifier name from feature abbreviations.
abbreviatedFeatures = [featureAbbreviations[key] for key in classifierFeatures]
abbreviatedFeatures = sorted(abbreviatedFeatures, key=lambda x: int(x[1:]))
abbreviation = "_".join(abbreviatedFeatures)

print(f"Processing classifier: `{abbreviation}`.")

classifier = ee.Classifier.smileRandomForest(**classifierParameters)  \
  .train(**{
    "features": samples,
    "classProperty": "class",
    "inputProperties": classifierFeatures
  })

# Assess classifier reliability.
classifierExplanation = classifier.explain()

trees = ee.List(ee.Dictionary(classifierExplanation).get("trees"))
expectedTreesCount = classifierParameters["numberOfTrees"]
actualTreesCount = trees.size().getInfo()

print(f"Classifier contains `{actualTreesCount}` out of `{expectedTreesCount}` trees.")

if actualTreesCount < expectedTreesCount:
  raise RuntimeError(
    f"Classifier contains only `{actualTreesCount}` out of `{expectedTreesCount}` trees."
  )

# Determine tree sizes by evaluating the length of their string representations.
treeSizes = trees.map(lambda tree: ee.String(tree).length())

# Calculate feature importances.

# Absolute
absoluteFeatureImportances = ee.Dictionary(classifierExplanation.get("importance"))
featureNames = absoluteFeatureImportances.keys()

# Relative
importancesSum = absoluteFeatureImportances.values().reduce(ee.Reducer.sum())
relativeFeatureImportances = absoluteFeatureImportances.map(relative_property_importance)

# Store classifier as a feature collection.

# Handle the `Unable to export features with empty geometry` error.
dummyFeature = ee.Feature(ee.Geometry.Point([0, 0]))
dummyFeatures = ee.FeatureCollection(trees.map(lambda tree: dummyFeature.set("tree", tree)))

# Set properties.

# Confusion matrix.
confusionMatrix = "|".join([",".join(map(str, sublist)) for sublist in classifier.confusionMatrix().getInfo()])
dummyFeatures = dummyFeatures.set("confusion_matrix", confusionMatrix)

# Feature names & importances.
featureImportances = "|".join([f"{key},{value}" for key, value in relativeFeatureImportances.getInfo().items()])
dummyFeatures = dummyFeatures.set("feature_importances", featureImportances)
dummyFeatures = dummyFeatures.set("feature_names", ",".join(classifierFeatures))

Processing classifier: `F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14`.
Classifier contains `50` out of `50` trees.
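
The cell above stores the confusion matrix and the relative feature importances as delimited strings (`|` between rows or entries, `,` within them). The sketch below mirrors that encoding in plain Python and adds decoders for reading the properties back. All helper names and sample values are hypothetical, and the decoders assume that feature names contain neither `,` nor `|`.

```python
def relative_importances(absolute):
  """Scale the importances so that they sum to 100."""
  total = sum(absolute.values())
  return {name: value / total * 100 for name, value in absolute.items()}


def encode_importances(importances):
  return "|".join(f"{name},{value}" for name, value in importances.items())


def decode_importances(encoded):
  pairs = (item.split(",") for item in encoded.split("|"))
  return {name: float(value) for name, value in pairs}


def encode_matrix(matrix):
  return "|".join(",".join(map(str, row)) for row in matrix)


def decode_matrix(encoded):
  return [[int(cell) for cell in row.split(",")] for row in encoded.split("|")]


# Hypothetical values, not taken from the trained classifier.
relative = relative_importances({"PRE_VV": 30.0, "POST_VV": 10.0})
print(encode_importances(relative))      # PRE_VV,75.0|POST_VV,25.0
print(encode_matrix([[5, 1], [2, 7]]))   # 5,1|2,7
```

Decoding an encoded value returns the original structure, so the exported asset can be inspected later without retraining the classifier.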


# **7) Console**

In [None]:
print("*samples*")
print("total number of samples:", samples.size().getInfo())
print("number of flood samples:", floodSamples.size().getInfo())
print("number of water (sea) samples:", waterSamples.size().getInfo())
print("number of non-water samples:", nonWaterSamples.size().getInfo())

print("")

print("*classifier*")
print(f"classifier-abbreviation: `{abbreviation}`.")
print(f"classifier-features: `{classifierFeatures}`.")
print(f"classifier tree sizes: `{treeSizes.getInfo()}`.")

*samples*
total number of samples: 225000
number of flood samples: 75000
number of water (sea) samples: 75000
number of non-water samples: 75000

*classifier*
classifier-abbreviation: `F2_F3_F4_F5_F6_F7_F8_F9_F10_F11_F12_F13_F14`.
classifier-features: `['VHVHD', 'VHVHQ', 'VVVHD', 'VVVHQ', 'VVVVD', 'VVVVQ', 'NDPID', 'PRE_VV', 'PRE_VH', 'PRE_NDPI', 'POST_VV', 'POST_VH', 'POST_NDPI']`.
classifier tree sizes: `[322721, 326305, 314879, 305622, 309807, 309215, 332150, 319626, 325815, 322073, 313266, 315103, 319304, 329339, 305631, 329190, 314412, 310882, 326310, 319806, 311475, 319037, 326569, 314697, 326453, 316755, 323659, 325181, 317013, 326596, 321691, 316132, 320172, 317106, 315389, 315134, 330107, 315609, 319542, 314647, 323446, 312144, 331321, 320909, 313627, 338472, 326483, 316978, 329374, 311134]`.


# **8) Data Export**

Submit tasks.

In [None]:
exportTask = ee.batch.Export.table.toAsset(**{
  "collection": dummyFeatures,
  "description": classifierIdentifier,
  "assetId": os.path.join(destinationPath, classifierIdentifier),
})

exportTask.start()
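
The asset identifier above is built with `os.path.join`, which uses the path separator of the operating system the notebook runs on. That is a forward slash on Colab, but a backslash on Windows, whereas Earth Engine asset identifiers use forward slashes. A minimal sketch of a platform-independent alternative follows. The `build_asset_id` helper and the example folder are hypothetical.

```python
import posixpath


def build_asset_id(destination_path, identifier):
  """Join an asset folder and an asset name with forward slashes only."""
  return posixpath.join(destination_path, identifier)


# Hypothetical asset folder; the notebook leaves `destinationPath` elided.
print(build_asset_id("projects/my-project/assets/classifiers", "random_forest"))
# projects/my-project/assets/classifiers/random_forest
```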

Monitor tasks.

In [None]:
export_tasks_viewer([exportTask.id])

Task_Id                   Task_State    Task_Type          Task_Attempt  Task_Description      Queue_Time    Execution_Time    Completion_Time  Error_Message
WOXQG7OWM5HZMMG7IBJKY4IY  COMPLETED     EXPORT_FEATURES               1  base_slope                 4.877            39.598             44.475
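
The viewer above shows a single snapshot of the task state. To wait until an export finishes, the state can be polled in a loop. The sketch below is one possible approach, not part of the original workflow. The `wait_for_task` helper, its parameters, and the simulated states are hypothetical.

```python
import time


def wait_for_task(get_state, poll_interval=10.0, timeout=3600.0, sleep=time.sleep):
  """Poll `get_state()` until the task is neither READY nor RUNNING."""
  pending_states = {"READY", "RUNNING"}
  waited = 0.0
  while True:
    state = get_state()
    if state not in pending_states:
      return state
    if waited >= timeout:
      raise TimeoutError(f"Task is still `{state}` after {waited:.0f} seconds.")
    sleep(poll_interval)
    waited += poll_interval


# Simulated sequence of states, so the sketch runs without Earth Engine.
simulated_states = iter(["READY", "RUNNING", "RUNNING", "COMPLETED"])
final_state = wait_for_task(lambda: next(simulated_states), sleep=lambda seconds: None)
print(final_state)  # COMPLETED
```

In the notebook, the callable could wrap the same request used by `export_tasks_viewer`, for example `lambda: ee.data.getTaskStatus(exportTask.id)[0]["state"]`.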


-End of Notebook-