<img src="https://github.com/nicholasmetherall/digital-earth-pacific-macblue-activities/blob/main/attachments/images/DE_Pacific_banner.JPG?raw=true" width="900"/>

Figure 1.1.a. Jupyter environment + Python notebooks

# Digital Earth Pacific Notebook 1: Prepare postcard and load data to CSV

The objective of this notebook is to prepare a geomad postcard for your area of interest (AOI) by masking, scaling, and adding band ratios and spectral indices, and then to sample all of the datasets into a CSV at the locations in your training data GeoDataFrame.
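As an illustration of the spectral indices mentioned above, the following is a minimal sketch of a normalised difference index such as NDVI, computed as (nir - red) / (nir + red). The helper name `normalized_difference` is hypothetical and is not the `calculate_band_indices` function imported from `utils`, whose implementation is not shown here. The sample reflectance values are copied from the training table in this notebook.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    denom = a + b
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


# Scaled surface reflectance values for three sample pixels (nir and red bands).
nir = np.array([0.1412, 0.1067, 0.0658])
red = np.array([0.1573, 0.1438, 0.1405])

ndvi = normalized_difference(nir, red)
print(ndvi)
```

The same helper would apply to other normalised differences, for example green and `swir16` for MNDWI, assuming the bands have already been scaled to reflectance.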

## Step 1.1: Configure the environment

In [33]:
from datetime import datetime
from shapely.geometry import Polygon
from shapely import box
from pyproj import CRS 
import folium
import geopandas as gpd
import numpy as np
import pandas as pd
import rasterio as rio
import xarray as xr
import rioxarray
from ipyleaflet import basemaps
from numpy.lib.stride_tricks import sliding_window_view
import pystac_client
import planetary_computer
from odc.stac import load
from pystac.client import Client
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from skimage.feature import graycomatrix, graycoprops
from utils import load_data, scale, do_prediction, calculate_band_indices, apply_masks, elevation_mask, glcm_features

In [34]:
# Predefined variable for title and version

# Enter your initials
initials = "nm"

# Enter your site name
site = "bootless"

# Date
date = datetime.now()

# Make a clean version string
version = f"{initials}-{date.strftime('%d%m%Y')}"
print(version)

nm-15072025


### Postcard csv

The objective here is to train a machine learning model that classifies an area into the land cover classes defined by the training data.

## Step 1.2: Input the training data to sample geomad data from the postcard

In [35]:
joined_df = gpd.read_file("training-data/nm-14072025-joined_tdata.csv")
joined_df

Unnamed: 0,cc_id,nir,red,blue,green,emad,smad,bcmad,nir08,nir09,...,ln_bg,contrast,homogeneity,energy,ASM,correlation,mean,entropy,y,x
0,3,0.14120000000000002,0.1573,0.1374,0.1776,0.263184,4.520297e-06,2.2864166e-05,0.1332,0.1158,...,-0.2566374507698208,9.819445,0.42828682,0.124226,0.015432099,0.8981699,20.6875,6.2668757,-2054025.0,3098525.0
1,3,0.14120000000000002,0.1573,0.1374,0.1776,0.263184,4.520297e-06,2.2864166e-05,0.1332,0.1158,...,-0.2566374507698208,3.9861112,0.5067936,0.13139506,0.01726466,0.954634,20.784721,6.110422,-2054025.0,3098515.0
2,3,0.1067,0.1438,0.13820000000000002,0.1734,0.22762872,5.765301e-06,2.647964e-05,0.1027,0.1134,...,-0.22689915301287178,1.9583334,0.5835784,0.17319395,0.029996142,0.9855082,17.32639,5.6070666,-2054045.0,3098515.0
3,3,0.1154,0.16620000000000001,0.16090000000000002,0.19990000000000002,0.26050386,4.535532e-06,2.5878915e-05,0.11220000000000001,0.11170000000000001,...,-0.2170341875080169,1.0555556,0.6722222,0.21336515,0.04552469,0.99210817,15.388889,5.035168,-2054055.0,3098505.0
4,3,0.116,0.1688,0.1641,0.2024,0.26948765,5.468303e-06,2.8206985e-05,0.1153,0.1242,...,-0.20976993931726523,1.0972222,0.66805553,0.22004138,0.04841821,0.99177974,15.270833,5.0255857,-2054055.0,3098495.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
796,2,0.0658,0.1405,0.152,0.18480000000000002,0.13483694,3.050053e-06,2.590787e-05,0.056600000000000004,0.056400000000000006,...,-0.19539363836130752,1.0277778,0.80277777,0.6603433,0.43605325,0.6035124,2.5972223,2.2381856,-1057695.0,-302245.0
797,2,0.0449,0.09280000000000001,0.10840000000000001,0.1283,0.105597325,2.4306833e-06,2.3776416e-05,0.0417,0.054900000000000004,...,-0.16854318261604476,1.0833334,0.775,0.57265353,0.3279321,0.78047067,3.0694444,2.6260667,-1057665.0,-302255.0
798,2,0.0431,0.095,0.11810000000000001,0.1391,0.11168822,2.7728677e-06,2.0128533e-05,0.0437,0.0531,...,-0.16366137572608053,0.7916667,0.7708333,0.5891738,0.34712577,0.56151503,2.4930556,2.4821439,-1057635.0,-302235.0
799,2,0.0487,0.1173,0.14200000000000002,0.1701,0.11532059,2.2951542e-06,2.138217e-05,0.043300000000000005,0.0534,...,-0.18055944180055522,0.6666667,0.8,0.62884617,0.39544752,0.5618661,2.4444444,2.2000573,-1057645.0,-302235.0


In [38]:
joined_df=joined_df.drop(columns=["y", "x"])

In [39]:
print(len(joined_df.columns))
joined_df.columns

35


Index(['cc_id', 'nir', 'red', 'blue', 'green', 'emad', 'smad', 'bcmad',
       'nir08', 'nir09', 'swir16', 'swir22', 'coastal', 'rededge1', 'rededge2',
       'rededge3', 'mndwi', 'ndti', 'cai', 'ndvi', 'evi', 'savi', 'ndwi',
       'b_g', 'b_r', 'mci', 'ndci', 'ln_bg', 'contrast', 'homogeneity',
       'energy', 'ASM', 'correlation', 'mean', 'entropy'],
      dtype='object')
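Since `cc_id` is the first column in the listing above, the class labels can be separated from the features by position. A possible alternative, sketched below under the assumption that the label column is always named `cc_id`, is to select by name so the split still works if the column order changes. The small DataFrame is made-up illustration data, not the real training set.

```python
import pandas as pd

# Tiny stand-in for the training table (values are illustrative only).
df = pd.DataFrame(
    {
        "cc_id": [3, 3, 2, 2],
        "nir": [0.1412, 0.1067, 0.0658, 0.0449],
        "red": [0.1573, 0.1438, 0.1405, 0.0928],
        "green": [0.1776, 0.1734, 0.1848, 0.1283],
    }
)

label_col = "cc_id"  # assumed name of the class label column

# Labels: one class id per sample.
classes = df[label_col].to_numpy()

# Features: every remaining column, cast to float.
observations = df.drop(columns=[label_col]).to_numpy(dtype="float64")

print(classes.shape, observations.shape)
```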

In [40]:
# The classes are the first column
classes = np.array(joined_df)[:, 0]

# The observation data is everything after the first column
observations = np.array(joined_df)[:, 1:]

# Create a model...
classifier = RandomForestClassifier()

# ...and fit it to the data
model = classifier.fit(observations, classes)
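The cell above fits the classifier on all of the training samples. One way to get a rough idea of how well such a model generalises is to hold back part of the data and score the model on it. The sketch below is not part of the original workflow: it uses synthetic data from `make_classification` as a stand-in for the 34 feature columns, and the split ratio, class count and `random_state` values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training table: 800 samples, 34 features, 4 classes.
X, y = make_classification(
    n_samples=800,
    n_features=34,
    n_informative=8,
    n_classes=4,
    random_state=0,
)

# Hold back 25% of the samples, keeping class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Overall accuracy on samples the model has not seen during fitting.
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {acc:.3f}")
```

With real training points that are spatially clustered, a random split can give an optimistic score, so this should be read as a sanity check rather than a full accuracy assessment.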

In [41]:
# Dynamically create the filename with f-string
file_path = f"models/{version}-test.model"

# Save the model
joblib.dump(model, file_path)

['models/nm-15072025-test.model']
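To reuse a saved model later, it can be read back with `joblib.load`. The sketch below is a self-contained round trip under a few assumptions: it trains a small model on synthetic data, writes it to a temporary directory rather than the notebook's `models/` folder, and uses a made-up file name. It also creates the target directory first, since `joblib.dump` does not create missing folders.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model so the example runs on its own.
X, y = make_classification(
    n_samples=200, n_features=34, n_informative=8, n_classes=3, random_state=0
)
model = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Make sure the output folder exists before saving.
    model_dir = os.path.join(tmp_dir, "models")
    os.makedirs(model_dir, exist_ok=True)

    file_path = os.path.join(model_dir, "example-test.model")  # hypothetical name
    joblib.dump(model, file_path)

    # Read the model back from disk.
    reloaded = joblib.load(file_path)

# The reloaded model should give the same predictions as the original.
same_predictions = bool((reloaded.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Models saved this way are generally tied to the scikit-learn version that produced them, so it helps to load them in the same environment.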