# **1) Import the Modules**

Modules are code libraries that contain a set of ready-to-use functions.

* The `os` module provides functions to perform tasks such as file and directory operations, process management, and environment variable manipulation.
* The `hashlib` module provides functionality to calculate cryptographic hash values.
* The `json` module allows developers to read and write JSON data.
* The `numpy` module provides support for large, multi-dimensional arrays and matrices, as well as a collection of mathematical functions to efficiently manipulate these arrays.
* The `pandas` module provides a powerful and efficient toolkit for data manipulation, analysis, and exploration.
* The `seaborn` module provides a high-level interface for creating informative and visually appealing statistical graphics.
* The `matplotlib.pyplot` module provides a collection of functions for creating and customizing plots, diagrams and visualizations.
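* The `plotly` module enables the creation of interactive and visually appealing visualizations for data analysis and presentation. Exporting Plotly figures as static images (`fig.write_image`) relies on the `kaleido` package, which is installed in the next cell.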
* The `google.colab` module provides access to some of the unique features and functionality of Google Colab.

In [None]:
!pip install -U kaleido

Collecting kaleido
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
Installing collected packages: kaleido
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
Successfully installed kaleido-0.2.1


In [None]:
import os
import json
import hashlib

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go

from google.colab import drive

# **2) Authentication Procedure**

This section provides instructions for mounting Google Drive on Colab. This step should be performed each time you start, restart, or roll back a Colab session.

## **2.1) Google Drive**

The `drive.mount` function mounts your Google Drive at the given path in the Colab file system (here `/content/gdrive`). Granting access to Google Drive allows code running in the notebook to read and modify files in Google Drive.

**Note:** When using the `Mount Drive` button in the file browser, no authentication codes are required for notebooks edited only by the current user.

In [None]:
drive.mount("/content/gdrive")

Mounted at /content/gdrive
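
Because the mount has to be repeated after every session restart, it can help to guard the call. The following is a minimal sketch, assuming the Drive root shows up as a `MyDrive` folder under the mount point (as the output paths later in this notebook suggest). The helper name `drive_is_mounted` is hypothetical and not part of `google.colab`.

```python
import os

MOUNT_POINT = "/content/gdrive"


def drive_is_mounted(mount_point=MOUNT_POINT):
    """Hypothetical helper: True if a `MyDrive` folder exists under the mount point."""
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


if not drive_is_mounted():
    try:
        # Only available inside a Colab runtime.
        from google.colab import drive
        drive.mount(MOUNT_POINT)
    except ImportError:
        print("Not running on Colab; skipping the Drive mount.")
```

Outside Colab the import fails and the sketch only prints a message, so the same cell can be run locally without errors.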


# **3) Functions**

Data Processing

In [None]:
def generate_hash(lists, desiredLength: int):
  """
  Description:
    Generate a fixed-length hash value from the concatenated elements of a list of strings.

  Arguments:
    lists (list of str): The strings to concatenate and hash (e.g. file paths).
    desiredLength (int): The desired length of the resulting hash (at most 64 characters for SHA-256).

  Returns:
    The first `desiredLength` characters of the SHA-256 hexadecimal digest.
  """
  # Create a new SHA-256 hash object
  sha256Hash = hashlib.sha256()

  # Update the hash object with the concatenated string's bytes
  sha256Hash.update("".join(lists).encode("utf-8"))

  # Truncate the hexadecimal digest to the desired length
  return sha256Hash.hexdigest()[:desiredLength]


def record_hash(jsonPath, hashKey, data):
  """
  Description:
    Add a new entry to a JSON file, using a (truncated) SHA-256 hash as the key and the given data as the value.
    Existing keys are left untouched.

  Arguments:
    jsonPath (str): The path to the JSON file.
    hashKey (str): The hash used as the entry key.
    data (list of str): The JSON-serializable value stored under the key (here, the sample paths).

  Returns:
    None
  """
  # Load existing JSON data, falling back to an empty dictionary.
  try:
    with open(jsonPath, "r") as jsonFile:
      existingData = json.load(jsonFile)

  except FileNotFoundError:
    print("JSON data were not found. Initializing with an empty dictionary...")
    existingData = {}

  except json.JSONDecodeError:
    print("Error decoding JSON data. Initializing with an empty dictionary...")
    existingData = {}

  # Check if the hash key already exists.
  if hashKey not in existingData:
    existingData[hashKey] = data

    # Write the updated JSON data.
    with open(jsonPath, "w") as jsonFile:
      json.dump(existingData, jsonFile, indent=4)

    print(f"New entry has been successfully added to {jsonPath} with the SHA-256 hash key: {hashKey}.")
  else:
    print(f"Entry with SHA-256 hash key {hashKey} already exists in {jsonPath}")
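
`generate_hash` joins its inputs without a separator, so the digest depends on the order of the elements, and two different lists can produce the same concatenated string. The sketch below illustrates both properties with made-up file names. `concat_hash` mirrors the notebook's approach, while `separated_hash` is a hypothetical variant (not part of the notebook) that joins with a newline.

```python
import hashlib


def concat_hash(items, length):
    """Hash the plain concatenation of the items (same idea as generate_hash)."""
    return hashlib.sha256("".join(items).encode("utf-8")).hexdigest()[:length]


def separated_hash(items, length, sep="\n"):
    """Hypothetical variant: join with a separator that cannot occur in the items."""
    return hashlib.sha256(sep.join(items).encode("utf-8")).hexdigest()[:length]


# Made-up inputs whose concatenations are identical: "samples/abc.csv".
a = ["samples/ab", "c.csv"]
b = ["samples/a", "bc.csv"]

same_plain = concat_hash(a, 16) == concat_hash(b, 16)
same_sep = separated_hash(a, 16) == separated_hash(b, 16)
order_matters = concat_hash(a, 16) != concat_hash(list(reversed(a)), 16)

print(same_plain, same_sep, order_matters)  # True False True
```

With complete, distinct file paths such a collision is unlikely in practice, but the order sensitivity means the same set of sample files listed in a different order is recorded under a different key.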

# **4) Parameters**

In [None]:
# Classification.
nonWaterSamplePaths = [
  "..."
]

floodSamplePaths = [
  "..."
]

waterSamplePaths = [
  "..."
]

correlationMethod = "pearson" # ["pearson" | "kendall" | "spearman"]
correlationColumns = [
  "class", "VHVHD", "VHVHQ", "VVVHD", "VVVHQ", "VVVVD",
  "VVVVQ", "NDPID", "NDPIQ", "PRE_VV", "PRE_VH",
  "PRE_NDPI", "POST_VV", "POST_VH", "POST_NDPI"
]

destinationPath = "..."

# **5) Configuration**

In [None]:
# `Dynamic World V1`
dwLabelValues = {
  "water": 0,
  "trees": 1,
  "grass": 2,
  "flooded_vegetation": 3,
  "crops": 4,
  "shrub_and_scrub": 5,
  "built": 6,
  "bare": 7,
  "snow_and_ice": 8
}

dwLabelPalette = {
  "water": "#419bdf",
  "trees": "#397d49",
  "grass": "#88b053",
  "flooded_vegetation": "#7a87c6",
  "crops": "#e49635",
  "shrub_and_scrub": "#dfc35a",
  "built": "#c4281b",
  "bare": "#a59b8f",
  "snow_and_ice": "#b39fe1"
}

dwValuesPalette = {
  "0": "#419bdf",
  "1":  "#397d49",
  "2": "#88b053",
  "3": "#7a87c6",
  "4": "#e49635",
  "5": "#dfc35a",
  "6": "#c4281b",
  "7": "#a59b8f",
  "8": "#b39fe1",
}

dwValueLabels = {
  "0": "water",
  "1": "trees",
  "2": "grass",
  "3": "flooded_vegetation",
  "4": "crops",
  "5": "shrub_and_scrub",
  "6": "built",
  "7": "bare",
  "8": "snow_and_ice",
}

In [None]:
sampleFrames = []
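
The four lookup tables above carry the same information in different shapes, so two of them can be derived from the other two. A minimal sketch, assuming the label-keyed dictionaries are treated as the source of truth (the snake_case names are illustrative, not the notebook's variables):

```python
# Label-keyed tables, copied from the configuration cell.
dw_label_values = {
    "water": 0, "trees": 1, "grass": 2, "flooded_vegetation": 3, "crops": 4,
    "shrub_and_scrub": 5, "built": 6, "bare": 7, "snow_and_ice": 8,
}

dw_label_palette = {
    "water": "#419bdf", "trees": "#397d49", "grass": "#88b053",
    "flooded_vegetation": "#7a87c6", "crops": "#e49635",
    "shrub_and_scrub": "#dfc35a", "built": "#c4281b", "bare": "#a59b8f",
    "snow_and_ice": "#b39fe1",
}

# Value-keyed tables derived from the two dictionaries above.
# Keys are strings, matching the convention used in the notebook.
dw_value_labels = {str(value): label for label, value in dw_label_values.items()}
dw_values_palette = {
    str(value): dw_label_palette[label] for label, value in dw_label_values.items()
}
```

Deriving the value-keyed tables avoids the labels and colors drifting apart if one dictionary is edited and the others are not.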

# **6) Data Processing**

Process the sample collections.

In [None]:
# Combine all sample file paths.
samplePaths = floodSamplePaths + nonWaterSamplePaths + waterSamplePaths

# Combine all dataframes.
for path in samplePaths:
  frame = pd.read_csv(path)
  frame.name = os.path.splitext(os.path.basename(path))[0]
  sampleFrames.append(frame)

samplesFrame = pd.concat(sampleFrames, ignore_index=True)

# Calculate and record a 16-character SHA-256 hash for the provided sample paths.
samplesHash = generate_hash(samplePaths, 16)
record_hash(os.path.join(destinationPath, "catalog.json"), samplesHash, samplePaths)

# Create the hash folder if it does not already exist.
hashFolder = os.path.join(destinationPath, samplesHash)
os.makedirs(hashFolder, exist_ok=True)

New entry has been successfully added to /content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/catalog.json with the SHA-256 hash key: 28509f93fb3e2988.
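
The catalog maps each hash key to the list of sample paths it was computed from, and a repeated run with the same key leaves the file unchanged. The following sketch reproduces that behavior in a temporary directory. The helper `add_entry`, the key `abc123`, and the file names are all made up for illustration.

```python
import json
import os
import tempfile


def add_entry(json_path, key, value):
    """Hypothetical helper: add key -> value unless the key is already present."""
    try:
        with open(json_path, "r") as handle:
            catalog = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        catalog = {}

    if key in catalog:
        return False

    catalog[key] = value
    with open(json_path, "w") as handle:
        json.dump(catalog, handle, indent=4)
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "catalog.json")

    first = add_entry(path, "abc123", ["a.csv", "b.csv"])   # file is created
    second = add_entry(path, "abc123", ["a.csv", "b.csv"])  # key already exists

    with open(path) as handle:
        stored = json.load(handle)

print(first, second, stored)
```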


# **7) Console**

In [None]:
print("*Sample Collection Sizes*")

for df in sampleFrames:
  print(f"DataFrame Name: `{df.name}`")
  print(f"Row Count: `{df.shape[0]}`\n")

print(f"Total samples Count: `{samplesFrame.shape[0]}`.\n")

print("*Total count of Samples Grouped by DW labels*")

temp = samplesFrame["dw"].value_counts().to_dict()
print({dwValueLabels[str(key)]: value for key, value in temp.items()})

*Sample Collection Sizes*
DataFrame Name: `003747_00476D_87B6_004447_005723_9D4B`
Row Count: `25000`

DataFrame Name: `010579_00FBBC_09F9_005329_006C09_C51B`
Row Count: `25000`

DataFrame Name: `025053_02C3A8_1153_021203_02475C_B1FE`
Row Count: `25000`

DataFrame Name: `003747_00476D_87B6_004447_005723_9D4B`
Row Count: `25000`

DataFrame Name: `010579_00FBBC_09F9_005329_006C09_C51B`
Row Count: `25000`

DataFrame Name: `025053_02C3A8_1153_021203_02475C_B1FE`
Row Count: `25000`

DataFrame Name: `003747_00476D_87B6_004447_005723_9D4B`
Row Count: `25000`

DataFrame Name: `010579_00FBBC_09F9_005329_006C09_C51B`
Row Count: `25000`

DataFrame Name: `025053_02C3A8_1153_021203_02475C_B1FE`
Row Count: `25000`

Total samples Count: `225000`.

*Total count of Samples Grouped by DW labels*
{'crops': 90797, 'water': 75000, 'trees': 39026, 'shrub_and_scrub': 9208, 'grass': 5957, 'built': 3745, 'bare': 708, 'flooded_vegetation': 426, 'snow_and_ice': 133}
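
In the listing above the same three file names appear once per sample group, so the base name alone does not say which group a frame came from. One option, sketched here with tiny made-up CSV contents and hypothetical paths, is to store the full path in a column before concatenating, and to translate the `dw` codes to labels with `Series.map`.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for files with identical base names.
csv_by_path = {
    "flood/tile_a.csv": "dw,PRE_VV\n0,-18.2\n4,-9.1\n",
    "water/tile_a.csv": "dw,PRE_VV\n0,-20.5\n",
}

frames = []
for path, text in csv_by_path.items():
    frame = pd.read_csv(io.StringIO(text))
    frame["source"] = path  # a column survives pd.concat
    frames.append(frame)

combined = pd.concat(frames, ignore_index=True)

# Row counts per source file.
sizes = combined.groupby("source").size().to_dict()

# Counts per class label instead of per numeric code.
label_counts = combined["dw"].map({0: "water", 4: "crops"}).value_counts().to_dict()

print(sizes)
print(label_counts)
```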


# **8) Data Visualization**

## **8.1) Scatter plots**

Pre-event

In [None]:
# Create a new figure.
plt.figure(figsize=(10, 6))

# Get unique labels.
dwLabels = list(dwValuesPalette.keys())

# Populate plot.
for label in dwLabels:
  labelDescription = dwValueLabels.get(label)
  labelColor = dwValuesPalette.get(label, "gray")
  subset = samplesFrame[samplesFrame["dw"] == int(label)]
  plt.scatter(subset["PRE_VV"], subset["PRE_VH"], label=labelDescription, color=labelColor, alpha=0.7)

# Customize plot.
plt.xlabel("VV")
plt.ylabel("VH")
plt.legend(title="DW classes")
plt.title("VV vs VH (Pre)")

plt.gca().set_facecolor("lightgray")
plt.grid(color="white", linestyle="--", linewidth=0.5)

# Save the plot to a file with high resolution.
destination = os.path.join(hashFolder, "scatter_plot_pre.png")
plt.savefig(destination, dpi=500)
plt.close()

print(f"Stored scatter plot to: `{destination}`.")

Stored scatter plot to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/scatter_plot_pre.png`.


In [None]:
# Create a new figure with 3x3 subplots.
fig, axs = plt.subplots(3, 3, figsize=(15, 15))
# fig.subplots_adjust(hspace=0.5)  # Adjust spacing between subplots

# Get unique labels.
dwLabels = list(dwValuesPalette.keys())

# Iterate through the labels and populate subplots.
for i, label in enumerate(dwLabels):
  labelDescription = dwValueLabels.get(label)
  labelColor = dwValuesPalette.get(label, "gray")
  subset = samplesFrame[samplesFrame["dw"] == int(label)]

  # Determine the subplot position in the 3x3 grid.
  row, col = divmod(i, 3)
  ax = axs[row, col]

  # Scatter plot for the current group in the corresponding subplot.
  ax.scatter(subset["PRE_VV"], subset["PRE_VH"], label=labelDescription, color=labelColor, alpha=0.85)

  # Customize each subplot.
  ax.set_facecolor("lightgray")
  ax.set_xlabel("VV", fontsize=18)
  ax.set_ylabel("VH", fontsize=18)
  ax.grid(color="white", linestyle="--", linewidth=0.5)
  ax.legend()

# Customize figure.
fig.suptitle("VV vs VH (Pre)", fontsize=22)

# Save the entire figure to a file with high resolution.
destination = os.path.join(hashFolder, "grouped_scatter_plot_pre.png")
fig.tight_layout(pad=2.5)
plt.savefig(destination, dpi=500)
plt.close()

print(f"Stored scatter plot to: `{destination}`.")

Stored scatter plot to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/grouped_scatter_plot_pre.png`.
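
The grouped figure places class `i` at `divmod(i, 3)`, which fills the 3x3 grid row by row. The sketch below shows the resulting positions and, as an assumption about how one might handle fewer classes than cells, a hypothetical helper that lists the cells left unused.

```python
N_ROWS, N_COLS = 3, 3

# Position of each of the nine classes in the grid, as (row, col).
positions = [divmod(i, N_COLS) for i in range(N_ROWS * N_COLS)]


def unused_cells(n_items, n_rows=N_ROWS, n_cols=N_COLS):
    """Hypothetical helper: grid cells that receive no plot when n_items < n_rows * n_cols."""
    return [divmod(i, n_cols) for i in range(n_items, n_rows * n_cols)]


print(positions)
print(unused_cells(7))
```

With matplotlib, the axes at the unused positions could then be hidden, for example with `ax.set_visible(False)`.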


Post-event

In [None]:
# Create a new figure.
plt.figure(figsize=(10, 6))

# Get unique labels.
dwLabels = list(dwValuesPalette.keys())

# Populate plot.
for label in dwLabels:
  labelDescription = dwValueLabels.get(label)
  labelColor = dwValuesPalette.get(label, "gray")
  subset = samplesFrame[samplesFrame["dw"] == int(label)]
  plt.scatter(subset["POST_VV"], subset["POST_VH"], label=labelDescription, color=labelColor, alpha=0.7)

# Customize plot.
plt.xlabel("VV")
plt.ylabel("VH")
plt.legend(title="DW classes")
plt.title("VV vs VH (Post)")

plt.gca().set_facecolor("lightgray")
plt.grid(color="white", linestyle="--", linewidth=0.5)

# Save the plot to a file with high resolution.
destination = os.path.join(hashFolder, "scatter_plot_post.png")
plt.savefig(destination, dpi=500)
plt.close()

print(f"Stored scatter plot to: `{destination}`.")

Stored scatter plot to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/scatter_plot_post.png`.


In [None]:
# Create a new figure with 3x3 subplots.
fig, axs = plt.subplots(3, 3, figsize=(15, 15))
# fig.subplots_adjust(hspace=0.5)  # Adjust spacing between subplots

# Get unique labels.
dwLabels = list(dwValuesPalette.keys())

# Iterate through the labels and populate subplots.
for i, label in enumerate(dwLabels):
  labelDescription = dwValueLabels.get(label)
  labelColor = dwValuesPalette.get(label, "gray")
  subset = samplesFrame[samplesFrame["dw"] == int(label)]

  # Determine the subplot position in the 3x3 grid.
  row, col = divmod(i, 3)
  ax = axs[row, col]

  # Scatter plot for the current group in the corresponding subplot.
  ax.scatter(subset["POST_VV"], subset["POST_VH"], label=labelDescription, color=labelColor, alpha=0.85)

  # Customize each subplot.
  ax.set_facecolor("lightgray")
  ax.set_xlabel("VV", fontsize=18)
  ax.set_ylabel("VH", fontsize=18)
  ax.grid(color="white", linestyle="--", linewidth=0.5)
  ax.legend()

# Customize figure.
fig.suptitle("VV vs VH (Post)", fontsize=22)

# Save the entire figure to a file with high resolution.
destination = os.path.join(hashFolder, "grouped_scatter_plot_post.png")
fig.tight_layout(pad=2.5)
plt.savefig(destination, dpi=500)
plt.close()

print(f"Stored scatter plot to: `{destination}`.")

Stored scatter plot to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/grouped_scatter_plot_post.png`.
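
The pre-event and post-event single-panel cells differ only in the plotted columns, the title, and the output file name. A minimal sketch of a shared helper follows. The function name `scatter_by_class`, the toy data, and the low `dpi` are assumptions for illustration, not the notebook's code.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in Colab to keep inline display
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_class(frame, x, y, labels, palette, title, destination,
                     class_column="dw", dpi=100):
    """Hypothetical helper: one scatter layer per class, saved to `destination`."""
    fig, ax = plt.subplots(figsize=(10, 6))

    for value, name in labels.items():
        subset = frame[frame[class_column] == int(value)]
        if subset.empty:
            continue  # skip classes with no samples
        ax.scatter(subset[x], subset[y], label=name,
                   color=palette.get(value, "gray"), alpha=0.7)

    ax.set_xlabel("VV")
    ax.set_ylabel("VH")
    ax.set_title(title)
    ax.set_facecolor("lightgray")
    ax.grid(color="white", linestyle="--", linewidth=0.5)
    ax.legend(title="DW classes")

    fig.savefig(destination, dpi=dpi)
    plt.close(fig)
    return destination


# Toy data with made-up backscatter values.
toy = pd.DataFrame({
    "dw": [0, 0, 4],
    "PRE_VV": [-20.1, -19.4, -9.8],
    "PRE_VH": [-27.3, -26.9, -15.2],
})

out_dir = tempfile.mkdtemp()
saved = scatter_by_class(
    toy, "PRE_VV", "PRE_VH",
    labels={"0": "water", "4": "crops"},
    palette={"0": "#419bdf", "4": "#e49635"},
    title="VV vs VH (Pre)",
    destination=os.path.join(out_dir, "scatter_plot_pre.png"),
)
```

Calling the same helper with `"POST_VV"`, `"POST_VH"` and a different title and file name would cover the post-event case.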


## **8.2) Donut chart**

In [None]:
# Data for the donut chart.
dwLabelCounts = samplesFrame["dw"].value_counts().to_dict()
dwDescriptionCounts = {key: dwLabelCounts.get(value, 0) for key, value in dwLabelValues.items()}
dwDescriptionCounts = {key: value for key, value in dwDescriptionCounts.items() if value != 0}

labels = list(dwDescriptionCounts.keys())
values = list(dwDescriptionCounts.values())
colors = [dwLabelPalette[key] for key in dwDescriptionCounts]

In [None]:
# Create a donut chart (a pie chart with a hole).
fig = go.Figure(data=[go.Pie(
  labels=labels,
  values=values,
  hole=0.3,
  textinfo="percent",
  insidetextorientation="radial",
  marker=dict(colors=colors),
  textposition="inside",
  textfont=dict(color="#FFFFFF")
)])

# Customize the layout.
fig.update_layout(
  title="LULC Distribution",
  title_x=0.5,
  font=dict(size=16),
  title_yanchor="top",
  title_font_size=20,
  legend_title_text="Classes",
  uniformtext_minsize=12,
  width=700,
  height=700,
  uniformtext_mode="hide"
)

# Export the chart as a static image.
destination = os.path.join(hashFolder, "dw_distribution.png")
fig.write_image(destination, scale=6)
print(f"Stored donut chart to: `{destination}`.")

Stored donut chart to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/dw_distribution.png`.
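
The donut labels its slices with percentages (`textinfo="percent"`). These can be cross-checked against the class counts printed in the Console section. The sketch below recomputes the shares from those counts using only the standard library.

```python
# Class counts as printed in the Console section.
counts = {
    "crops": 90797, "water": 75000, "trees": 39026, "shrub_and_scrub": 9208,
    "grass": 5957, "built": 3745, "bare": 708, "flooded_vegetation": 426,
    "snow_and_ice": 133,
}

total = sum(counts.values())

# Share of each class in percent.
shares = {label: 100 * n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label:>20}: {share:6.2f}%")
```

The three largest classes (crops, water, trees) account for over 90% of the samples, which is worth keeping in mind when reading the per-class scatter plots.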


## **8.3) Correlation matrices**

In [None]:
# Compute pairwise correlations.
correlations = samplesFrame[correlationColumns].corr(method=correlationMethod).round(5)

# Keep only the lower triangle (diagonal included) for the half-matrix plot.
diagonalCorrelations = correlations.where(np.tril(np.ones(correlations.shape)).astype(bool))

# Save the correlation matrix table.
tabularCorrDestination = os.path.join(hashFolder, "correlations.csv")
correlations.to_csv(tabularCorrDestination)
print(f"Stored correlation table to: `{tabularCorrDestination}`.")

Stored correlation table to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/correlations.csv`.
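
To make the effect of `correlationMethod` and of the lower-triangle mask concrete, here is a small sketch on made-up data. `y` is a monotonic but non-linear function of `x`, so the rank-based Spearman coefficient is exactly 1 while the Pearson coefficient is below 1.

```python
import numpy as np
import pandas as pd

# Made-up data: y grows monotonically (but not linearly) with x; z is x negated.
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
frame["y"] = frame["x"] ** 3
frame["z"] = -frame["x"]

pearson = frame.corr(method="pearson")
spearman = frame.corr(method="spearman")

# Same masking idea as in the notebook: keep the lower triangle, NaN elsewhere.
mask = np.tril(np.ones(pearson.shape)).astype(bool)
lower = pearson.where(mask)

print(pearson.round(3))
print(spearman.round(3))
print(lower.round(3))
```

Cells set to NaN by the mask are left blank by `sns.heatmap`, which is what produces the half matrix.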


Half correlation matrix

In [None]:
# Create a new figure.
plt.figure(figsize=(11, 11))

# Use seaborn heatmap for better visualization.
sns.heatmap(diagonalCorrelations, vmin=-1, vmax=1, center=0, annot=True, fmt=".2f",
            cmap="RdBu", cbar_kws={"shrink": 0.8}, square=True)

# Set title.
plt.title("Feature Correlations", pad=30, fontsize=16)

# Set tick labels and rotate x-axis labels for better readability.
plt.xticks(rotation=45, ha="right", fontsize=12)
plt.yticks(rotation=0, fontsize=12)

# Save the plot to a file with high resolution.
diagCorrDestination = os.path.join(hashFolder, "correlations_half.png")
plt.savefig(diagCorrDestination, dpi=500)
plt.close()

print(f"Stored diagonal correlations plot to: `{diagCorrDestination}`.")

Stored diagonal correlations plot to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/correlations_half.png`.


Full correlation matrix

In [None]:
# Create a new figure.
plt.figure(figsize=(11, 11))

# Use seaborn heatmap for better visualization.
sns.heatmap(correlations, vmin=-1, vmax=1, center=0, annot=True, fmt=".2f",
            cmap="RdBu", cbar_kws={"shrink": 0.8}, square=True)

# Set title.
plt.title("Feature Correlations", pad=30, fontsize=16)

# Set tick labels and rotate x-axis labels for better readability.
plt.xticks(rotation=45, ha="right", fontsize=12)
plt.yticks(rotation=0, fontsize=12)

# Save the plot to a file with high resolution.
fullCorrDestination = os.path.join(hashFolder, "correlations_full.png")
plt.savefig(fullCorrDestination, dpi=500)
plt.close()

print(f"Stored full correlations plot to: `{fullCorrDestination}`.")

Stored full correlations plot to: `/content/gdrive/MyDrive/t-h-e-s-i-s/results/samples/28509f93fb3e2988/correlations_full.png`.


-End of Notebook-