# Landcover Classification Example

#### Sources: 
- https://blog.gishub.org/earth-engine-tutorial-32-machine-learning-with-earth-engine-supervised-classification
- https://geohackweek.github.io/GoogleEarthEngine/05-classify-imagery/
- https://ceholden.github.io/open-geo-tutorial/python/chapter_5_classification.html
- GEE Documentation

#### Steps:
1. Collect training data. Assemble features which have a property that stores the known class label and properties storing numeric values for the predictors.
2. Instantiate a classifier. Set its parameters if necessary.
3. Train the classifier using the training data.
4. Classify an image or feature collection.
5. Estimate classification error with independent validation data.

The training data is a `FeatureCollection` with a property storing the class label and properties storing predictor variables. Class labels should be consecutive integers starting from 0. If necessary, use `remap()` to convert class values to consecutive integers. The predictors should be numeric.
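
As a rough local illustration of what the remapping does, the sketch below converts raw class codes to consecutive integers in plain Python. It assumes the 20 standard NLCD legend codes; `remap_labels` and `raw_labels` are hypothetical names used only for this example, and the notebook itself performs the remap on the server with `ee.Image.remap()`.

```python
# Assumption: the 20 NLCD legend codes; substitute the codes of your own label layer.
nlcd_codes = [11, 12, 21, 22, 23, 24, 31, 41, 42, 43,
              51, 52, 71, 72, 73, 74, 81, 82, 90, 95]

# Map each original code to its position in the sorted list: 11 -> 0, 12 -> 1, ...
code_to_index = {code: index for index, code in enumerate(sorted(nlcd_codes))}


def remap_labels(labels, mapping):
    """Convert raw class codes to consecutive integers; unknown codes raise KeyError."""
    return [mapping[label] for label in labels]


raw_labels = [11, 42, 95, 21]  # hypothetical sampled labels
print(remap_labels(raw_labels, code_to_index))  # [0, 8, 19, 2]
```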

### Import libraries

In [1]:
import ee
import geemap
from geemap import *
import json
from geemap import geojson_to_ee, ee_to_geojson
from ipyleaflet import GeoJSON
import os
import sklearn
# !pip install geemap


## Data Preparation

### Create an interactive map

In [2]:
Map = geemap.Map()
Map

Map(center=[40, -100], controls=(WidgetControl(options=['position', 'transparent_bg'], widget=HBox(children=(T…

### Add region data to the map

In [3]:
point = ee.Geometry.Point(-122.4439, 37.7538)

# Pick the least cloudy Landsat 8 Surface Reflectance scene of 2016 over the point
# (a single scene, not a multi-date composite) and keep bands B1-B7.
image = (
    ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
    .filterBounds(point)
    .filterDate('2016-01-01', '2016-12-31')
    .sort('CLOUD_COVER')
    .first()
    .select('B[1-7]')
)

#taking out any remaining cloud cover with the bitmask QA band
#qa = image.select('pixel_qa')
#cloudMask = qa.bitwiseAnd(1<<5).eq(0)
#.and(qa.bitwiseAnd(1<<3).eq(0))
#masked = image.updateMask(cloudMask).clip(bounds)

#Define a cloud mask for any remaining cloud
#def maskClouds(image):
    #clear = image.select('pixel_qa').bitwiseAnd(2).neq(0)
    #return image.updateMask(clear)

#image = LS8_SR1.map(maskClouds).median()

vis_params = {
    'min': 0,
    'max': 3000,
    'bands': ['B5', 'B4', 'B3']
}


Map.centerObject(point, 8)
Map.addLayer(image, vis_params, "Landsat-8")
Map
#Map.addLayer(aoi, "Mai_Ndombe");


Map(center=[37.75379999999999, -122.44390000000001], controls=(WidgetControl(options=['position', 'transparent…
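
The commented-out masking lines in the cell above test bits 3 and 5 of the `pixel_qa` band. The sketch below shows that bit test on plain integers. It assumes the Landsat 8 Collection 1 Surface Reflectance convention (bit 3 for cloud shadow, bit 5 for cloud); the three sample values are illustrative, and `is_clear` is a hypothetical helper, not an Earth Engine function.

```python
CLOUD_SHADOW_BIT = 3  # assumption: pixel_qa bit 3 flags cloud shadow
CLOUD_BIT = 5         # assumption: pixel_qa bit 5 flags cloud


def is_clear(pixel_qa):
    """Return True when neither the cloud nor the cloud-shadow bit is set."""
    shadow = pixel_qa & (1 << CLOUD_SHADOW_BIT)
    cloud = pixel_qa & (1 << CLOUD_BIT)
    return shadow == 0 and cloud == 0


# Illustrative pixel_qa values: one with neither bit set, one with bit 3, one with bit 5.
for value in (322, 328, 352):
    print(value, bin(value), is_clear(value))
```

On the server, the equivalent test is `qa.bitwiseAnd(1 << 5).eq(0)` combined with `qa.bitwiseAnd(1 << 3).eq(0)`, followed by `updateMask()`.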

### Check image properties

In [4]:
ee.Date(image.get('system:time_start')).format('YYYY-MM-dd').getInfo()

'2016-11-18'

In [5]:
image.get('CLOUD_COVER').getInfo()

0.08

### Creating the training dataset

There are several ways you can create a region for generating the training dataset.

- Draw a shape (e.g., a rectangle) on the map and then use `region = Map.user_roi`
- Define a geometry, such as `region = ee.Geometry.Rectangle([-122.6003, 37.4831, -121.8036, 37.8288])`
- Create a buffer zone around a point, such as `region = ee.Geometry.Point([-122.4439, 37.7538]).buffer(10000)`
- If you don't define a region, sampling uses the image footprint by default

In [6]:
# region = Map.user_roi
# region = ee.Geometry.Rectangle([-122.6003, 37.4831, -121.8036, 37.8288])
region = ee.Geometry.Point([-122.4439, 37.7538]).buffer(10000)

The [USGS National Land Cover Database (NLCD)](https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD) will be used to create the label dataset for training.


![](https://i.imgur.com/7QoRXxu.png)

In [7]:
# NLCD 2016 land cover (an Earth Engine image asset), clipped to the Landsat scene footprint.
nlcd = ee.Image('USGS/NLCD/NLCD2016').select('landcover').clip(image.geometry())

# Remap the original LULC class codes to consecutive values 0-19, as the classifier
# expects, and to keep visualizations consistent across products later in the notebook.
class_values = nlcd.get('landcover_class_values')
class_values_remap = list(range(20))  # [0, 1, ..., 19]

nlcd = nlcd.remap(class_values, class_values_remap).rename('Landcover_Class')

# Colour palette of the NLCD legend, one hex colour per remapped class.
classColours = (
  '476ba1','d1defa','decaca','d99482','ee0000',
  'ab0000','b3aea3','68ab63','1c6330','b5ca8f',
  'a68c30','ccba7d','e3e3c2','caca78','99c247',
  '78ae94','dcd93d','ab7028','bad9eb','70a3ba'
    )

Map.addLayer(nlcd, {'min':0, 'max':19, 'palette':classColours}, 'NLCD')
Map


Map(center=[37.75379999999999, -122.44390000000001], controls=(WidgetControl(options=['position', 'transparent…

In [8]:
# Make the training dataset by sampling the remapped NLCD labels.
sample = nlcd.sample(**{
    'region': image.geometry(),  # Region to sample from. If unspecified, uses the image's whole footprint.
    'scale': 30,                 # Nominal scale in meters of the projection to sample in.
    'numPixels': 5000,           # Approximate number of pixels to sample.
    'seed': 0,                   # Randomization seed to use for subsampling.
    'geometries': True           # If True, adds the center of the sampled pixel as the geometry of the
                                 # output feature. Set to False to omit geometries and save memory.
})


# randomColumn() adds a column of uniform random numbers in [0, 1),
# named 'random' by default.
sample = sample.randomColumn()

split = 0.7  # Roughly 70% training, 30% validation.
training = sample.filter(ee.Filter.lt('random', split))
validation = sample.filter(ee.Filter.gte('random', split))

Map.addLayer(training, {}, 'Training')
Map

Map(center=[37.75379999999999, -122.44390000000001], controls=(WidgetControl(options=['position', 'transparent…
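
The random-column split can be mimicked locally to see why the proportions are only approximate. This is a minimal sketch on plain dictionaries, not Earth Engine features; `split_by_random_column` and the sample list are hypothetical.

```python
import random


def split_by_random_column(features, split=0.7, seed=0):
    """Tag each feature with a uniform random number and split on a threshold."""
    rng = random.Random(seed)
    training, validation = [], []
    for feature in features:
        # Mirrors the 'random' property added by randomColumn(): uniform in [0, 1).
        tagged = dict(feature, random=rng.random())
        if tagged['random'] < split:
            training.append(tagged)
        else:
            validation.append(tagged)
    return training, validation


features = [{'Landcover_Class': i % 20} for i in range(1000)]  # hypothetical samples
training_local, validation_local = split_by_random_column(features)
print(len(training_local), len(validation_local))
```

Because each sample is assigned independently, the split lands near 70/30 rather than exactly on it, which matches the sizes reported for `training` and `validation` in the cells that follow.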

In [9]:
print(training.size().getInfo())

2495


In [10]:
print(training.first().getInfo()) #Returns the first entry from a given collection.

{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-122.25798986874739, 38.2706212827936]}, 'id': '0', 'properties': {'Landcover_Class': 6, 'random': 0.35449786187014276}}


In [11]:
#Export the training data as csv
geemap.ee_to_csv(training, 'lc_training_data.csv')


Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/tables/6743d778a6a20b1ed51f70c5f7fe0292-3d7be6e3074601b292ee498cf4dc8685:getFeatures
Please wait ...
Data downloaded to /Users/joycelynlongdon/Desktop/Cambridge/CambridgeCoding/MRES/GEE_examples/lc_training_data.csv


In [12]:
print(validation.size().getInfo())

1088


In [13]:
#Export the validation data as csv
geemap.ee_to_csv(validation, 'lc_validation_data.csv')


Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/tables/0b186a24444e0a3c4db36b237bcdd12f-79ca7d8971f1a1e7cd32282349c855d5:getFeatures
Please wait ...
Data downloaded to /Users/joycelynlongdon/Desktop/Cambridge/CambridgeCoding/MRES/GEE_examples/lc_validation_data.csv


### Train the classifier

Here we perform supervised classification using the RandomForest ensemble decision tree algorithm by [Leo Breiman and Adele Cutler](https://link.springer.com/article/10.1023/A:1010933404324).

The RandomForest algorithm is popular in the field of remote sensing, and is quite fast compared to some other machine learning approaches (e.g., SVM can be quite computationally intensive). It isn't necessarily the best, but it provides a great first step into the world of machine learning for classification and regression.

In [14]:
# Use these bands for prediction.
bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7']


# This property of the table stores the land cover labels.
label = 'Landcover_Class'

# Overlay the points on the imagery to get training.
training = image.select(bands).sampleRegions(**{
  'collection': training,
  'properties': [label],
  'scale': 30
})

# Train a Random Forest classifier with 10 decision trees
# (hyperparameter testing will be done outside this notebook).
classifier = ee.Classifier.smileRandomForest(10).train(training, label, bands)

In [15]:
print(training.first().getInfo())

{'type': 'Feature', 'geometry': None, 'id': '0_0', 'properties': {'B1': 575, 'B2': 814, 'B3': 1312, 'B4': 1638, 'B5': 1980, 'B6': 2091, 'B7': 1967, 'Landcover_Class': 6}}


In [16]:
#Export the data as csv
geemap.ee_to_csv(training, 'trained_data.csv')


Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/tables/1e96f779b0c4b958f6c880525e3a9c5a-830a9bda943a2932ccf8dffa0d5085f0:getFeatures
Please wait ...
Data downloaded to /Users/joycelynlongdon/Desktop/Cambridge/CambridgeCoding/MRES/GEE_examples/trained_data.csv


### Classify the image

In [17]:
# Classify the image with the same bands used for training.
classified = image.select(bands).classify(classifier)

# Display the predicted classes with random colors.
Map.addLayer(classified.randomVisualizer(), {}, 'classified')
Map

Map(center=[37.75379999999999, -122.44390000000001], controls=(WidgetControl(options=['position', 'transparent…

### Render categorical map

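To render a categorical map, we can set two image properties on the classified image: `classification_class_values` and `classification_class_palette`. The prefix must match the band name, which is `classification` for the output of `classify()`. We reuse the NLCD style so that it is easy to compare the two maps.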

In [18]:
classValues = class_values_remap 
classValues

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [19]:
class_palette = nlcd.get('landcover_class_palette').getInfo()
class_palette

['476ba1',
 'd1defa',
 'decaca',
 'd99482',
 'ee0000',
 'ab0000',
 'b3aea3',
 '68ab63',
 '1c6330',
 'b5ca8f',
 'a68c30',
 'ccba7d',
 'e3e3c2',
 'caca78',
 '99c247',
 '78ae94',
 'dcd93d',
 'ab7028',
 'bad9eb',
 '70a3ba']

In [20]:
landcover = classified.set('classification_class_values', classValues)
landcover = landcover.set('classification_class_palette', class_palette)
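
Before setting these properties it can help to confirm that the class values and the palette line up one-to-one. The sketch below does that check locally, using the 20 palette entries printed above; `hex_to_rgb` and `legend` are hypothetical names for this illustration only.

```python
class_values_local = list(range(20))
palette_local = [
    '476ba1', 'd1defa', 'decaca', 'd99482', 'ee0000',
    'ab0000', 'b3aea3', '68ab63', '1c6330', 'b5ca8f',
    'a68c30', 'ccba7d', 'e3e3c2', 'caca78', '99c247',
    '78ae94', 'dcd93d', 'ab7028', 'bad9eb', '70a3ba',
]


def hex_to_rgb(hex_colour):
    """Convert a 6-digit hex colour such as '476ba1' to an (R, G, B) tuple."""
    return tuple(int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))


# Each class value must have exactly one colour.
if len(class_values_local) != len(palette_local):
    raise ValueError('class values and palette differ in length')

legend = {value: hex_to_rgb(colour)
          for value, colour in zip(class_values_local, palette_local)}
print(legend[0])  # (71, 107, 161)
```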

In [21]:
Map.addLayer(landcover, {}, 'Land Cover Map')
Map

Map(center=[37.75379999999999, -122.44390000000001], controls=(WidgetControl(options=['position', 'transparent…

### Visualize the result

In [22]:
print('Change layer opacity:')
cluster_layer = Map.layers[-1]
cluster_layer.interact(opacity=(0, 1, 0.1))

Change layer opacity:


Box(children=(FloatSlider(value=1.0, description='opacity', max=1.0),))

### Add a legend to the map

In [23]:
Map.add_legend(builtin_legend='NLCD')
Map

Map(center=[37.75379999999999, -122.44390000000001], controls=(WidgetControl(options=['position', 'transparent…

### Export the result



In [24]:
#tif file
geemap.ee_export_image(landcover, 'Land_Cover_Classification.tif', scale=900)

Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/5be35caa5e816acaa2325909f3255a2a-4d61e41b11060cc4f074467f77d0fc17:getPixels
Please wait ...
Data downloaded to /Users/joycelynlongdon/Desktop/Cambridge/CambridgeCoding/MRES/GEE_examples/Land_Cover_Classification.tif


### Assess The Accuracy

A confusion matrix compares predicted class labels against reference labels, here the classifier output against the NLCD labels at the sample points. We will be using it primarily to compute overall accuracy, but a confusion matrix also shows explicitly which LULC classes were confused: not just whether a sample was classified incorrectly, but which LULC class it was incorrectly classified as.
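
To make the matrix easier to read, here is a minimal sketch of the metrics that can be derived from it, computed on a small hypothetical 3-class matrix in plain Python. It assumes the Earth Engine layout, with rows as actual classes and columns as predicted classes; the function names are illustrative.

```python
def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def producers_accuracy(matrix):
    """Per actual class (row): share of reference samples labelled correctly."""
    return [row[i] / sum(row) if sum(row) else None
            for i, row in enumerate(matrix)]


def consumers_accuracy(matrix):
    """Per predicted class (column): share of predictions that are correct."""
    columns = list(zip(*matrix))
    return [col[i] / sum(col) if sum(col) else None
            for i, col in enumerate(columns)]


# Hypothetical matrix: rows are actual classes, columns are predicted classes.
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
]
print(overall_accuracy(matrix))    # 24 correct out of 30 -> 0.8
print(producers_accuracy(matrix))
print(consumers_accuracy(matrix))
```

Classes with no samples return `None` rather than a misleading zero, which matters for the all-zero rows in the matrices below. On the server, `ee.ConfusionMatrix` offers `accuracy()`, `producersAccuracy()`, `consumersAccuracy()`, and `kappa()`.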

In [25]:
#Get a confusion matrix representing resubstitution accuracy.
train_conf_matrix = classifier.confusionMatrix()
train_conf_matrix.getInfo()


[[384, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 89, 2, 0, 0, 0, 0, 1, 2, 0, 2, 3, 0, 0, 0, 0, 5, 0, 0],
 [0, 0, 0, 106, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0],
 [0, 0, 0, 10, 127, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 [0, 0, 0, 0, 5, 49, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 183, 4, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 1, 0, 0, 0, 0, 4, 225, 0, 8, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 1, 1, 0, 0, 1, 1, 6, 0, 289, 9, 0, 0, 0, 0, 1, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 8, 380, 0, 0, 0, 0, 9, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

In [26]:
#Get the overall train accuracy
train_accuracy = train_conf_matrix.accuracy()
train_accuracy.getInfo()

0.9438877755511023

The accuracy estimated from training data is an overestimate because the random forest is “fit” to the training data.

To get validation accuracy, we classify the validation data.

We also want to ensure that the training samples are uncorrelated with the evaluation samples. Correlation can arise from spatial autocorrelation of the phenomenon being predicted. One way to exclude samples that might be correlated in this manner is to remove samples that are within some distance of any other sample. In Earth Engine this can be accomplished with a spatial join.
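
In Earth Engine, the usual route is an inverted join (`ee.Join.inverted()`) with an `ee.Filter.withinDistance` filter on the `.geo` fields, applied while the samples still carry geometries. The sketch below shows the same exclusion logic locally on plain `(lon, lat)` tuples. The 1000 m threshold, the point lists, and both function names are assumptions for illustration.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def drop_training_near_validation(training_pts, validation_pts, min_distance_m=1000.0):
    """Keep only training points at least min_distance_m from every validation point."""
    return [
        t for t in training_pts
        if all(haversine_m(*t, *v) >= min_distance_m for v in validation_pts)
    ]


# Hypothetical samples as (lon, lat); the first training point sits next to the validation point.
training_pts = [(-122.4439, 37.7538), (-122.2580, 38.2706)]
validation_pts = [(-122.4440, 37.7540)]
print(drop_training_near_validation(training_pts, validation_pts))
```

This brute-force comparison is quadratic in the number of samples, which is acceptable for a few thousand points but is the reason to prefer the server-side join at scale.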

### Validation on NLCD Data

In [27]:
# Overlay the points on the imagery to get validation points.
validation = image.select(bands).sampleRegions(**{
  'collection': validation,
  'properties': [label],
  'scale': 30
})

In [28]:
print(validation.first().getInfo())

{'type': 'Feature', 'geometry': None, 'id': '7_0', 'properties': {'B1': 286, 'B2': 351, 'B3': 515, 'B4': 480, 'B5': 1734, 'B6': 1518, 'B7': 1002, 'Landcover_Class': 17}}


In [29]:
validated = validation.classify(classifier)

In [30]:
#Export the data as csv
geemap.ee_to_csv(validated, 'validated_data.csv')

Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/tables/d4bd218d2bb04db4caaec1418cc91dee-588c21c7e651dcc3baefabd5bc726605:getFeatures
Please wait ...
Data downloaded to /Users/joycelynlongdon/Desktop/Cambridge/CambridgeCoding/MRES/GEE_examples/validated_data.csv


In [31]:
# Get a confusion matrix for the held-out validation samples (not resubstitution accuracy).
# errorMatrix() computes a 2D error matrix for a collection by comparing two columns
# of the collection: one containing the actual values and one containing the predicted values.
test_conf_matrix = validated.errorMatrix('Landcover_Class', 'classification')
test_conf_matrix.getInfo()


[[158, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 6, 5, 1, 0, 0, 0, 6, 10, 0, 6, 10, 0, 0, 0, 0, 12, 0, 0],
 [0, 0, 2, 17, 11, 2, 0, 0, 0, 2, 0, 3, 1, 0, 0, 0, 0, 15, 0, 0],
 [1, 0, 2, 20, 41, 9, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0],
 [0, 0, 0, 1, 6, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 1, 0, 0, 0, 1, 61, 20, 0, 1, 0, 0, 0, 0, 0, 4, 0, 0],
 [0, 0, 3, 0, 0, 0, 0, 0, 27, 47, 0, 19, 3, 0, 0, 0, 0, 5, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 4, 1, 1, 0, 0, 8, 30, 0, 53, 19, 0, 0, 0, 0, 8, 0, 1],
 [0, 0, 4, 1, 1, 1, 0, 0, 2, 3, 0, 24, 111, 0, 0, 0, 0, 25, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

In [32]:
#Get the overall test accuracy
test_accuracy = test_conf_matrix.accuracy()
test_accuracy.getInfo()

0.5847145488029466

The model performs poorly on data it hasn't seen before: validation accuracy is about 0.58, compared with about 0.94 on the training data. Hyperparameter testing will need to be performed to improve this performance.

## Validation on AOI (Mai Ndombe)

#### Set-Up Data for first time period for AOI

In [33]:
'''
#Adding AOI GeoJSON
file_path = os.path.abspath('/Users/joycelynlongdon/Desktop/Cambridge/CambridgeCoding/MRES/GEE_examples/mai_ndombe.json')

with open(file_path) as f:
    json_data = json.load(f)

mai_ndombe = GeoJSON(data=json_data, name='Mai_Ndombe')
'''

#include aoi region
file_path = os.path.abspath('/Users/joycelynlongdon/Desktop/Cambridge/CambridgeCoding/MRES/GEE_examples/mai_ndombe_poly.json')

with open(file_path) as f:
    aoi_poly = json.load(f)

aoi = ee.Geometry.Polygon(aoi_poly)


In [36]:
point = ee.Geometry.Point(18.3300, -2.00)
pointBuffer = point.buffer(100000)
# Pick the least cloudy Landsat 8 Surface Reflectance scene of 2016 over the point
# (a single scene, not a multi-date composite) and keep bands B1-B7.
image_aoi = (
    ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
    .filterBounds(point)
    .filterDate('2016-01-01', '2016-12-31')
    .sort('CLOUD_COVER')
    .first()
    .select('B[1-7]')
)

#taking out any remaining cloud cover with the bitmask QA band
#qa = image.select('pixel_qa')
#cloudMask = qa.bitwiseAnd(1<<5).eq(0)
#.and(qa.bitwiseAnd(1<<3).eq(0))
#masked = image.updateMask(cloudMask).clip(bounds)

vis_params = {
    'min': 0,
    'max': 3000,
    'bands': ['B5', 'B4', 'B3']
}

Map.centerObject(point, 8)
Map.addLayer(image_aoi, vis_params, "Landsat-8")
Map

Map(bottom=33432.0, center=[-2, 18.329999999999995], controls=(WidgetControl(options=['position', 'transparent…

In [40]:
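aoi_class = image_aoi.classify(classifier)
# geemap.ee_to_csv() only accepts an ee.FeatureCollection, so the classified
# ee.Image is exported as a GeoTIFF instead (the file name is illustrative).
geemap.ee_export_image(aoi_class, 'aoi_classification.tif', scale=900)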


In [None]:
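# The classified image has a single 'classification' band, so use the class palette
# rather than the B5/B4/B3 vis_params defined for the Landsat image.
Map.addLayer(aoi_class, {'min': 0, 'max': 19, 'palette': classColours}, "AOI Classification")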