https://geemap.org/notebooks/46_local_rf_training/

<a href="https://githubtocolab.com/gee-community/geemap/blob/master/examples/notebooks/46_local_rf_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>

Uncomment the following line to install [geemap](https://geemap.org) if needed.

In [None]:
# !pip install geemap scikit-learn

# How to use locally trained machine learning models with GEE

This notebook illustrates how to train a random forest (or any other ensemble tree estimator) locally using scikit-learn, convert the estimator into a string representation that Earth Engine can interpret, and apply the machine learning model with EE. **The notebook and the geemap machine learning module ([ml.py](https://geemap.org/ml/)) were contributed by [Kel Markert](https://github.com/KMarkert). A huge thank you to him.**

In [None]:
# Colab only: mount Google Drive, which holds the training table read below
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# change to the Drive folder that contains train_test_set.csv
%cd /content/drive/MyDrive/farm_plot_detection/automl

/content/drive/MyDrive/farm_plot_detection/automl


In [None]:
%pwd

'/content/drive/MyDrive/farm_plot_detection/automl'

In [None]:
import ee
import geemap
import pandas as pd

from geemap import ml
from sklearn import ensemble

In [None]:
geemap.ee_initialize()


## Train a model locally using scikit-learn

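The original geemap demo uses the training data from [here](https://github.com/gee-community/geemap/blob/master/examples/data/rf_example.csv). This version instead reads a local table, `train_test_set.csv`, with Sentinel-2 band values (B2, B3, B4, B8), a binary `crop` label and a `subset` column that marks each row as `train` or `test`.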

In [None]:
# read the feature table to train our RandomForest model
# data taken from https://colab.research.google.com/drive/1XiltuDdt6l8WrvX_qVYLY6W-mX60CSOQ#scrollTo=ylo57qJwrH2F

url = "train_test_set.csv"
df = pd.read_csv(url)

In [None]:
df

| Unnamed: 0 | B2 | B3 | B4 | B8 | crop | subset |
|---|---|---|---|---|---|---|
| 0 | 1501.285714 | 1683.900000 | 2270.000000 | 3176.571429 | 0 | train |
| 1 | 1366.666667 | 1423.428571 | 1827.500000 | 2570.000000 | 1 | train |
| 2 | 1304.000000 | 1187.000000 | 1397.833333 | 2475.500000 | 0 | train |
| 3 | 1252.777778 | 1309.363636 | 1698.500000 | 2835.222222 | 1 | train |
| 4 | 1061.833333 | 893.125000 | 831.833333 | 1931.000000 | 0 | train |
| ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1118.250000 | 1114.800000 | 1515.750000 | 2473.200000 | 0 | test |
| 1996 | 1434.400000 | 1317.000000 | 1523.000000 | 2386.714286 | 0 | test |
| 1997 | 1437.333333 | 1490.000000 | 1932.166667 | 2955.250000 | 1 | test |
| 1998 | 1165.428571 | 1067.333333 | 1236.666667 | 2133.666667 | 0 | test |


In [1]:
# ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
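The commented-out line computes NDVI in Earth Engine as (B8 - B4) / (B8 + B4). If NDVI were used as a feature, the training table would need the same column, computed from the same bands. A minimal pandas sketch, where `add_ndvi` is a hypothetical helper and the two-row frame reuses band values from the table above:

```python
import pandas as pd


def add_ndvi(df, nir="B8", red="B4"):
    """Return a copy of df with an NDVI column, (NIR - red) / (NIR + red)."""
    out = df.copy()
    out["NDVI"] = (out[nir] - out[red]) / (out[nir] + out[red])
    return out


# Hypothetical two-row frame; band values copied from the first rows of the table.
sample = pd.DataFrame({"B4": [2270.0, 1827.5], "B8": [3176.571429, 2570.0]})
print(add_ndvi(sample)["NDVI"].round(4).tolist())
```

If NDVI were added this way, `"NDVI"` would also have to be appended to `feature_names`, and the image would need a matching band (for example via `image.addBands(ndvi)`) before `select(feature_names)`.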

In [None]:
# specify the names of the features (i.e. band names) and label
# the feature names select the columns from the table here and the matching bands from the image later

feature_names = ['B2', 'B3', 'B4', 'B8']
label = "crop"

In [None]:
# get the features and labels into separate variables
X = df[feature_names]
y = df[label]

In [None]:
# create a classifier and fit
n_trees = 10
rf = ensemble.RandomForestClassifier(n_estimators=n_trees).fit(X, y)
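The table has a `subset` column, but the cell above fits the forest on every row, including those marked `test`, so no held-out rows remain to check the model. A minimal sketch of splitting on that column and scoring the held-out rows with scikit-learn's `accuracy_score`. To keep it self-contained it builds a small synthetic frame with the same column layout and a made-up labeling rule (assumptions); with the real data, replace it with `pd.read_csv("train_test_set.csv")`.

```python
import numpy as np
import pandas as pd
from sklearn import ensemble
from sklearn.metrics import accuracy_score

# Synthetic stand-in for train_test_set.csv (assumption: same column layout).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.uniform(800, 3200, size=(n, 4)), columns=["B2", "B3", "B4", "B8"])
# Made-up label rule for the synthetic data only.
df["crop"] = (df["B8"] > 2000).astype(int)
df["subset"] = np.where(np.arange(n) < 150, "train", "test")

feature_names = ["B2", "B3", "B4", "B8"]
label = "crop"

# Split on the subset column instead of training on every row.
train = df[df["subset"] == "train"]
test = df[df["subset"] == "test"]

rf = ensemble.RandomForestClassifier(n_estimators=10, random_state=42)
rf.fit(train[feature_names], train[label])

# Score only the rows the forest has never seen.
test_accuracy = accuracy_score(test[label], rf.predict(test[feature_names]))
print(f"test accuracy: {test_accuracy:.3f}")
```

Retraining this way changes the trees, so the converted strings would differ from the ones printed in this notebook.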

## Convert a sklearn classifier object to a list of strings

In [None]:
# convert the estimator into a list of strings
# this function also works with the ensemble.ExtraTreesClassifier estimator
trees = ml.rf_to_strings(rf, feature_names)

In [None]:
# print the first tree to see the result
print(trees[0])

1) root 1265 9999 9999 (105.24308102845026)
  2) B4 <= 1642.500000 1265 0.4997 0
    4) B2 <= 1307.523804 757 0.4259 0
      8) B8 <= 2182.888916 581 0.3779 0
        16) B4 <= 1297.285706 192 0.2100 0
          32) B8 <= 1956.974976 129 0.1221 0
            64) B2 <= 1158.166687 42 0.0000 0 *
            65) B2 > 1158.166687 58 0.0233 0
              130) B3 <= 1090.566711 8 0.0000 0 *
              131) B3 > 1090.566711 16 0.0832 0
                262) B2 <= 1170.078430 1 0.0000 1 *
                263) B2 > 1170.078430 7 0.0000 0 *
          33) B8 > 1956.974976 129 0.1221 0
            66) B8 <= 1960.375000 1 0.0000 1 *
            67) B8 > 1960.375000 71 0.1884 0
              134) B4 <= 1175.344421 70 0.1626 0
                268) B8 <= 2030.750000 34 0.0689 0
                  536) B8 <= 2023.750000 9 0.0000 0 *
                  537) B8 > 2023.750000 1 0.0000 1 *
                269) B8 > 2030.750000 24 0.0000 0 *
              135) B4 > 1175.344421 70 0.1626 0
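Each printed line is one node: its id, the split condition (or `root`), then numbers that appear to be a sample count, an impurity value and the node's predicted value, with `*` marking a leaf. Node ids follow a heap-style layout: the children of node k are 2k (the `<=` branch) and 2k+1 (the `>` branch), as with nodes 32/33 and 64/65 above. The hypothetical pure-Python reader below (`parse_tree` and `predict` are illustrative names, not geemap functions) walks one tree string for a single sample. It assumes a complete tree string; the truncated printouts above would not evaluate fully.

```python
def parse_tree(tree_string):
    """Parse one tree string into {node_id: node_dict} (assumes the format printed above)."""
    nodes = {}
    for line in tree_string.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        node_id = int(tokens[0].rstrip(")"))
        if tokens[1] == "root":
            # assumption: the root is always an internal node
            nodes[node_id] = {"leaf": False}
            continue
        is_leaf = tokens[-1] == "*"
        nodes[node_id] = {
            "feature": tokens[1],
            "op": tokens[2],  # "<=" for node 2k, ">" for node 2k+1
            "threshold": float(tokens[3]),
            "leaf": is_leaf,
            "value": float(tokens[-2] if is_leaf else tokens[-1]),
        }
    return nodes


def predict(nodes, sample):
    """Walk from the root to a leaf for one sample given as {band_name: value}."""
    node_id = 1
    while not nodes[node_id]["leaf"]:
        left = nodes[2 * node_id]  # the "<=" branch
        if sample[left["feature"]] <= left["threshold"]:
            node_id = 2 * node_id
        else:
            node_id = 2 * node_id + 1
    return nodes[node_id]["value"]


# Hypothetical toy tree in the same format (not one of the trees above).
TOY_TREE = """1) root 100 9999 9999 (10.0)
  2) B4 <= 1642.5 100 0.49 0
    4) B2 <= 1307.5 60 0.42 0 *
    5) B2 > 1307.5 60 0.42 1 *
  3) B4 > 1642.5 100 0.49 1 *"""

toy = parse_tree(TOY_TREE)
print(predict(toy, {"B2": 1200.0, "B4": 1500.0}))
```

With the real model, `predict(parse_tree(trees[0]), X.iloc[0].to_dict())` should usually agree with `rf.estimators_[0].predict(X.iloc[[0]])`, up to the six-decimal rounding of the printed thresholds.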
                

In [None]:
print(trees[1])

1) root 1264 9999 9999 (105.84426431853026)
  2) B4 <= 1622.750000 1264 0.4998 1
    4) B8 <= 2146.437500 677 0.4212 0
      8) B3 <= 1178.791687 167 0.2009 0
        16) B2 <= 1121.125000 49 0.0000 0 *
        17) B2 > 1121.125000 134 0.0980 0
          34) B2 <= 1123.364563 85 0.1487 0
            68) B8 <= 2020.625000 3 0.0000 0 *
            69) B8 > 2020.625000 2 0.0000 1 *
          35) B2 > 1123.364563 85 0.1487 0
            70) B3 <= 1118.250000 80 0.1026 0
              140) B2 <= 1165.880981 20 0.0000 0 *
              141) B2 > 1165.880981 46 0.1490 0
                282) B4 <= 1219.250000 11 0.0000 0 *
                283) B4 > 1219.250000 26 0.2550 0
                  566) B3 <= 1109.066711 15 0.4082 0
                    1132) B4 <= 1248.767029 2 0.0000 1 *
                    1133) B4 > 1248.767029 10 0.3047 0
                      2266) B3 <= 1083.333313 4 0.0000 0 *
                      2267) B3 > 1083.333313 8 0.1327 0
                        4534) B4 <= 1327.833374

In [None]:
# number of trees we converted should equal the number of trees we defined for the model
len(trees) == n_trees

True

## Convert sklearn classifier to GEE classifier

At this point you could take the list of strings and save them locally to avoid training again. However, to use the model with EE we need to create an `ee.Classifier` from them, and for best results the tree data should also be persisted on EE as an asset.
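One way to keep the strings between sessions, using only the standard library, is a JSON file. The names `save_tree_strings`, `load_tree_strings` and `rf_trees.json` are hypothetical, not part of geemap. JSON escapes the embedded newlines, so each tree string comes back unchanged.

```python
import json
import tempfile
from pathlib import Path


def save_tree_strings(trees, path):
    """Write the list of tree strings to a JSON file; json escapes the newlines."""
    Path(path).write_text(json.dumps(trees), encoding="utf-8")


def load_tree_strings(path):
    """Read the list of tree strings back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip with two toy tree strings (hypothetical content).
example_trees = [
    "1) root 10 9999 9999 (1.0)\n  2) B4 <= 1.5 10 0.5 0 *\n  3) B4 > 1.5 10 0.5 1 *",
    "1) root 10 9999 9999 (1.0)\n  2) B8 <= 2.5 10 0.5 1 *\n  3) B8 > 2.5 10 0.5 0 *",
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rf_trees.json"
    save_tree_strings(example_trees, path)
    reloaded = load_tree_strings(path)

print(reloaded == example_trees)
```

In this notebook that would be `save_tree_strings(trees, "rf_trees.json")`, and in a later session `ml.strings_to_classifier(load_tree_strings("rf_trees.json"))`.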

In [None]:
# create an ee classifier from the trees to use with ee objects
ee_classifier = ml.strings_to_classifier(trees)

In [None]:
ee_classifier.getInfo()

{'type': 'Classifier.decisionTreeEnsemble',
 'treeStrings': ['1) root 1265 9999 9999 (105.24308102845026)\n  2) B4 <= 1642.500000 1265 0.4997 0\n    4) B2 <= 1307.523804 757 0.4259 0\n      8) B8 <= 2182.888916 581 0.3779 0\n        16) B4 <= 1297.285706 192 0.2100 0\n          32) B8 <= 1956.974976 129 0.1221 0\n            64) B2 <= 1158.166687 42 0.0000 0 *\n            65) B2 > 1158.166687 58 0.0233 0\n              130) B3 <= 1090.566711 8 0.0000 0 *\n              131) B3 > 1090.566711 16 0.0832 0\n                262) B2 <= 1170.078430 1 0.0000 1 *\n                263) B2 > 1170.078430 7 0.0000 0 *\n          33) B8 > 1956.974976 129 0.1221 0\n            66) B8 <= 1960.375000 1 0.0000 1 *\n            67) B8 > 1960.375000 71 0.1884 0\n              134) B4 <= 1175.344421 70 0.1626 0\n                268) B8 <= 2030.750000 34 0.0689 0\n                  536) B8 <= 2023.750000 9 0.0000 0 *\n                  537) B8 > 2023.750000 1 0.0000 1 *\n                269) B8 > 2030.7500

## Classify image using GEE classifier

In [None]:
# get Sentinel-2 imagery and create median composite
# note: clouds are not masked here, so cloudy pixels can still bias the median composite
start_date = ee.Date('2020-01-01')
end_date = start_date.advance(365, 'day')
bands = ['B2', 'B3', 'B4', 'B8']
image = ee.ImageCollection('COPERNICUS/S2').filterDate(start_date, end_date).select(bands).median()
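Sentinel-2 Level-1C scenes carry a `QA60` band whose bit 10 flags opaque clouds and bit 11 flags cirrus. A minimal sketch of the bit test in plain Python, with the corresponding Earth Engine version left as an untested comment. Note that in the cell above `.select(bands)` drops `QA60`, so any masking would have to happen before the selection. `is_clear` and `mask_s2_clouds` are hypothetical names.

```python
# Sentinel-2 QA60 bits: bit 10 flags opaque clouds, bit 11 flags cirrus.
OPAQUE_CLOUD_BIT = 1 << 10
CIRRUS_BIT = 1 << 11


def is_clear(qa60_value):
    """True when neither cloud bit is set in a QA60 pixel value."""
    return (qa60_value & OPAQUE_CLOUD_BIT) == 0 and (qa60_value & CIRRUS_BIT) == 0


# The same test on the server with Earth Engine image methods
# (untested sketch; QA60 must be used before .select(bands) drops it):
#
# def mask_s2_clouds(img):
#     qa = img.select("QA60")
#     clear = qa.bitwiseAnd(OPAQUE_CLOUD_BIT).eq(0).And(qa.bitwiseAnd(CIRRUS_BIT).eq(0))
#     return img.updateMask(clear)
#
# image = (
#     ee.ImageCollection("COPERNICUS/S2")
#     .filterDate(start_date, end_date)
#     .map(mask_s2_clouds)
#     .select(bands)
#     .median()
# )

print([is_clear(v) for v in (0, 1024, 2048, 3072)])
```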

In [None]:
# classify the image using the classifier we created from the local training
# note: we select feature_names from the image so that the classifier knows which bands to use
classified = image.select(feature_names).classify(ee_classifier)
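Because the tree strings refer to bands by name, every feature used in a split has to exist in the image passed to `classify`. A minimal pure-Python check that extracts the names from the split lines (`features_used` and `missing_bands` are hypothetical helpers, and the regular expression assumes the line format printed earlier):

```python
import re

# Split lines look like "  2) B4 <= 1642.500000 ..." or "  3) B4 > 1642.500000 ...".
SPLIT_RE = re.compile(r"\d+\)\s+(\S+)\s+(?:<=|>)\s")


def features_used(trees):
    """Collect the band names referenced by split conditions in the tree strings."""
    names = set()
    for tree in trees:
        names.update(SPLIT_RE.findall(tree))
    return names


def missing_bands(trees, band_names):
    """Band names the trees need but the image does not provide."""
    return sorted(features_used(trees) - set(band_names))


# Hypothetical toy trees: the second one splits on a band the image lacks.
example_trees = [
    "1) root 10 9999 9999 (1.0)\n  2) B4 <= 1.5 10 0.5 0 *\n  3) B4 > 1.5 10 0.5 1 *",
    "1) root 10 9999 9999 (1.0)\n  2) NDVI <= 0.3 10 0.5 0 *\n  3) NDVI > 0.3 10 0.5 1 *",
]
print(missing_bands(example_trees, ["B2", "B3", "B4", "B8"]))
```

In the notebook, `missing_bands(trees, feature_names)` should come back empty before classifying.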

In [None]:
# display results
Map = geemap.Map(center=(-18.67769, 26.27313), zoom=8)

Map.addLayer(
    image,
    {"bands": ['B2', 'B3', 'B4'], "min": 0, "max": 3000},
    'image',
)
Map.addLayer(
    classified,
    {"min": 0, "max": 1, "palette": ['beige', 'green']},
    'classification',
)

Map


Yay!! 🎉 Looks like our example works. Don't party too much because there is a catch...

This workflow has several limitations, mainly how much data you can pass from the client to the server and how large a model EE can actually handle. EE can only handle 40MB of data passed to the server, so a large number of long decision tree strings will not work. Creating a classifier from strings is also limited (see this ee-forum discussion: https://groups.google.com/g/google-earth-engine-developers/c/lFFU1GBPzi8/m/6MewQk1FBwAJ), again because of string lengths when EE builds the computation graph.
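A rough way to see how close a model is to that limit is to measure the encoded size of the tree strings. This is only a lower bound, since the request that carries them is a larger serialized computation graph; the 50% headroom below is an arbitrary assumption, and 40MB is read as 40,000,000 bytes to stay conservative. `tree_payload_bytes` and `fits_request_limit` are hypothetical helpers.

```python
def tree_payload_bytes(trees):
    """Total UTF-8 size of the tree strings (a lower bound on the request size)."""
    return sum(len(tree.encode("utf-8")) for tree in trees)


def fits_request_limit(trees, limit_bytes=40_000_000, headroom=0.5):
    """Heuristic check: keep the raw strings well below the limit, because the
    serialized computation graph that carries them is larger than the strings."""
    return tree_payload_bytes(trees) <= limit_bytes * headroom


example_trees = ["1) root 10 9999 9999 (1.0)", "x" * 1000]
print(tree_payload_bytes(example_trees), fits_request_limit(example_trees))
```

For the model in this notebook, `tree_payload_bytes(trees) / 1e6` gives the size in MB.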

So you can use this approach, but expect to run into errors with large models.

## Save trees to the cloud

Now that we have the strings in a format that EE can use, we want to save them for later use. There is a function to export a list of tree strings to a feature collection. The feature collection will have a pro

In [None]:
user_id = geemap.ee_user_id()
user_id

'users/alexvmt'

In [None]:
# specify the asset id where the trees will be saved
# the id is built from your EE user folder, so no manual edit is needed
asset_id = user_id + "/random_forest_strings_test"
asset_id

'users/alexvmt/random_forest_strings_test'

In [None]:
# kick off an export process so it will be saved to the ee asset
ml.export_trees_to_fc(trees, asset_id)

# this will kick off an export task, so wait a few minutes before moving on
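Instead of waiting a fixed few minutes, the notebook could poll until the asset exists. A minimal sketch of a generic polling helper (`wait_until` is a hypothetical name); the Earth Engine check in the comment assumes that `ee.data.getInfo(asset_id)` returns `None` while the asset does not exist yet.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=10.0):
    """Call check() every interval_s seconds until it returns True or timeout_s passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Possible use after ml.export_trees_to_fc (assumption about ee.data.getInfo):
# ready = wait_until(lambda: ee.data.getInfo(asset_id) is not None)
```

If the export task fails, the asset never appears and the helper simply returns `False` after the timeout, so the task status in the Code Editor is still worth checking.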

In [None]:
# read the exported tree feature collection
rf_fc = ee.FeatureCollection(asset_id)

# convert it to a classifier, very similar to the `ml.strings_to_classifier` function
another_classifier = ml.fc_to_classifier(rf_fc)

# classify the image again but with the classifier from the persisted trees
classified = image.select(feature_names).classify(another_classifier)

In [None]:
# display results
# we should get the exact same results as before
Map = geemap.Map(center=(-18.67769, 26.27313), zoom=8)

Map.addLayer(
    image,
    {"bands": ['B2', 'B3', 'B4'], "min": 0, "max": 3000},
    'image',
)
Map.addLayer(
    classified,
    {"min": 0, "max": 1, "palette": ['beige', 'green']},
    'classification',
)

Map


## Save trees locally

In [None]:
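import os

# os.path.expanduser only expands a leading "~"; this path is relative,
# so trees.csv is written to the current working directory
out_csv = os.path.expanduser("trees.csv")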

In [None]:
ml.trees_to_csv(trees, out_csv)

In [None]:
another_classifier = ml.csv_to_classifier(out_csv)

In [None]:
classified = image.select(feature_names).classify(another_classifier)

In [None]:
# display results
# we should get the exact same results as before
Map = geemap.Map(center=(-18.67769, 26.27313), zoom=8)

Map.addLayer(
    image,
    {"bands": ['B2', 'B3', 'B4'], "min": 0, "max": 3000},
    'image',
)
Map.addLayer(
    classified,
    {"min": 0, "max": 1, "palette": ['beige', 'green']},
    'classification',
)

Map
