# Lab 7: Land cover classification

**Purpose:** This lab explores different approaches to land cover classification using Earth Engine. Students will explore sampling methods for gathering training data and then apply unsupervised and supervised classification.

In [None]:
# import ee api and geemap package
import ee
import math
import geemap
from geemap import colormaps as cmaps

In [None]:
# try to initialize an ee session
# if not authenticated then run auth workflow and initialize
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()
    ee.Initialize()

## Background

Land cover classification has deep roots in the remote sensing community. One of the first applications of satellite imagery was the creation of land cover maps.

No matter what you call the approach, the workflow is the same:
1. Identify classification problem (what are you classifying)
2. Make sure you have an image (or images) that will be used for collecting data
3. Sample data from image(s) as input into model
4. Train/fit model
5. Apply model to image(s)
6. Check results (verification/validation)
7. Refine and iterate

As a prompt for the exercise in this notebook, imagine that you are a consultant who is setting up a hydrology model in a region that does not have an existing land cover dataset. You need to use remote sensing data to create a land cover dataset for parameterizing your model. The resulting land cover classes won't need to be too complex (we are using the curve number (CN) method), so we will try to replicate the NLCD dataset for another country. This way we can use the parameterization from established NLCD classes.

## Image Compositing

Following our classification steps, we know what the land cover map will be used for and what the target classes are. We now need an image to run the classification on. We will use Landsat 8 to create a composite for 2019:

In [None]:
l8_collection = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')

In [None]:
# this loads in a global vector file of countries
# filter by country of interest
region = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017").filter(
    ee.Filter.eq("country_na","United States")
)

In [None]:
# specify time to filter data
# using an NLCD release year
start_time = "2019-05-01"
end_time = "2019-10-01"

In [None]:
l8_filtered = l8_collection.filterBounds(region).filterDate(start_time,end_time)
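
The Earth Engine documentation describes the start date of `filterDate` as inclusive and the end date as exclusive. Here is a minimal plain-Python sketch of that rule using the two dates defined above. The helper `in_window` is a made-up name for illustration and is not part of the EE API.

```python
from datetime import date

# same window as start_time and end_time in the notebook
start = date.fromisoformat("2019-05-01")
end = date.fromisoformat("2019-10-01")


def in_window(acquired):
    """True when the date falls in [start, end): start included, end excluded."""
    return start <= acquired < end


# hypothetical acquisition dates
for d in [date(2019, 5, 1), date(2019, 9, 30), date(2019, 10, 1)]:
    print(d.isoformat(), in_window(d))
```

Under this rule the composite uses scenes from May through the end of September.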

In [None]:
# QA mask function
def qa_mask(image):
    # Bits 3, 4, and 5 are cloud shadow, snow, and cloud, respectively.
    cloudShadowBitMask = (1 << 3)
    cloudsBitMask = (1 << 5)
    snowBitMask = (1 << 4)

    # Get the pixel QA band.
    qa = image.select('pixel_qa')

    # test each QA bit and get a binary image that is 1 where the flag is NOT set
    cloud_shadow_qa = qa.bitwiseAnd(cloudShadowBitMask).eq(0)
    snow_qa = qa.bitwiseAnd(snowBitMask).eq(0)
    cloud_qa = qa.bitwiseAnd(cloudsBitMask).eq(0)

    # combine qa mask layers to one final mask
    mask = cloud_shadow_qa.And(snow_qa).And(cloud_qa)

    # apply mask and return the original image with flagged pixels masked out
    return image.updateMask(mask)
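
To make the bit logic concrete, here is a minimal sketch in plain Python (no Earth Engine needed) that applies the same three bit tests to single integer QA values. The function name `is_clear` and the example values are made up for illustration.

```python
# bit positions taken from the comment in qa_mask above
CLOUD_SHADOW_BIT = 1 << 3  # 8
SNOW_BIT = 1 << 4          # 16
CLOUD_BIT = 1 << 5         # 32


def is_clear(qa_value):
    """Return True when none of the shadow, snow, or cloud bits are set."""
    flagged = CLOUD_SHADOW_BIT | SNOW_BIT | CLOUD_BIT
    return (qa_value & flagged) == 0


# hypothetical QA values, written in binary so the set bits are visible
examples = {
    "no flags set": 0b000000,
    "cloud shadow": 0b001000,
    "snow": 0b010000,
    "cloud": 0b100000,
    "only bit 1 set": 0b000010,
}

for label, value in examples.items():
    print(f"{label:>15}: qa={value:3d} clear={is_clear(value)}")
```

Only the first and last values count as clear, because bits other than 3, 4, and 5 are ignored by the test.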


In [None]:
# apply qa and composite (using median reducer at the moment)
l8_composite = l8_filtered.map(qa_mask).median()
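
The median reducer works pixel by pixel and skips masked observations. A minimal sketch of that idea for a single pixel, in plain Python, assuming made-up reflectance values where `None` stands for an observation removed by the QA mask:

```python
from statistics import median

# hypothetical reflectance values for one pixel across five acquisitions;
# None marks an observation removed by the QA mask
pixel_series = [420, None, 5100, 450, None]


def median_composite(values):
    """Median of the unmasked observations, or None if all are masked."""
    valid = [v for v in values if v is not None]
    return median(valid) if valid else None


print(median_composite(pixel_series))
print(median_composite([None, None]))
```

The bright value of 5100 (for example a cloud the QA band missed) does not pull the result upward the way it would with a mean, which is one reason the median is a common choice for compositing.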

Check that our composite is doing what we expect:

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(region, 5)

Map.addLayer(region,{},"United States")
Map.addLayer(l8_composite, {"bands":"B7,B5,B3", "min": 50, "max": 5500,"gamma":1.5}, 'L8 Composite')

Map.addLayerControl()

Map

## Sample data

Now that we have an image, we need to sample data to use in a model. There are a couple of ways to do this with Earth Engine, but we will focus on the most straightforward approach.

In [None]:
# load in the NLCD data from 2019
nlcd = (
    ee.ImageCollection("USGS/NLCD_RELEASES/2019_REL/NLCD")
    .first()
    .select("landcover")
)

In [None]:
# combine the images together to sample from
sample_img = l8_composite.select("B[2-7]").addBands(nlcd)

In [None]:
# define which are feature inputs vs labels/targets
# these are used later on in the notebook
feature_names = l8_composite.select("B[2-7]").bandNames()
label_name = "landcover"
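
The string `"B[2-7]"` is a regular expression over band names. Here is a small sketch with Python's `re` module of which names such a pattern picks, assuming the pattern has to match the whole band name. The band list below is an assumption for illustration, so check the collection itself for the real names.

```python
import re

# band names assumed for illustration only
band_names = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B10", "B11", "pixel_qa"]

# fullmatch requires the entire name to match, so "B10" is not accepted via "B1"
pattern = re.compile(r"B[2-7]")
selected = [name for name in band_names if pattern.fullmatch(name)]

print(selected)
```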

### Simple Random Sampling

The easiest approach is to sample randomly throughout the domain. This has pros and cons, which we will explore later on. Here is how you sample:

In [None]:
random_samples = sample_img.sample(
    region = region.geometry().bounds(),
    numPixels = 2500, # number of samples to collect, in this case 2500 
    scale = 30, # important to be explicit about scale here, we want the data at native resolution
    seed = 7,
    tileScale = 4,
    geometries = True
)
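
One of the cons of simple random sampling is that rare classes get few samples, or none at all. Here is a minimal sketch with Python's `random` module, using a made-up list of class labels in which one class dominates. The class values and proportions are hypothetical.

```python
import random
from collections import Counter

rng = random.Random(7)

# hypothetical map of 10,000 pixels: one dominant class and two rare ones
population = [82] * 9000 + [41] * 950 + [95] * 50

# simple random sample: every pixel has the same chance of being drawn
sample = rng.sample(population, k=100)
counts = Counter(sample)

print(counts)
```

The counts roughly follow the class proportions in the map, so a class that covers 0.5% of the pixels is expected to show up less than once in a sample of 100.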

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(region, 5)

Map.addLayer(region,{},"United States")
Map.addLayer(l8_composite, {"bands":"B7,B5,B3", "min": 50, "max": 5500,"gamma":1.5}, 'L8 Composite')
Map.addLayer(nlcd, {}, 'NLCD')
Map.addLayer(random_samples, {}, 'Random Samples')


Map.addLayerControl()

Map

### Stratified Random Sampling

A more robust approach is to randomly sample within classes. This ensures that classes covering small areas are sampled and not missed. Again, this has pros and cons, but it is generally the approach used. Here is how you apply stratified sampling:

In [None]:
# stratified sampling requires the class values to sample and the number of points per class
# here we define the values for each class
# and create a list with the number of samples per class
nlcd_classes = ee.List([11,12,21,22,23,24,31,41,42,43,51,52,71,72,73,74,81,82,90,95])

n_classes = nlcd_classes.length()
n_points = 1000
perclass_points = ee.Number(n_points).divide(n_classes).round()

class_num = ee.List.repeat(perclass_points, n_classes)

In [None]:
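stratified_samples = sample_img.stratifiedSample(
    region = region.geometry().bounds(),
    numPoints = 10, # default per-class count; classPoints overrides it for the classes listed in classValues
    classBand = label_name,
    classValues = nlcd_classes,
    classPoints = class_num,
    scale = 30, # important to be explicit about scale here, we want the data at native resolution
    seed = 7,
    tileScale = 4,
    # geometries = True # uncomment to keep point geometries, which are needed to draw the samples on a map
)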
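
The idea behind stratified sampling can be sketched in plain Python: group the pixels by class, then draw the same number from each group. This is an illustration of the concept only, not how Earth Engine implements `stratifiedSample`. The function `stratified_sample` and the label list are hypothetical.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(7)

# hypothetical map of 10,000 pixels: one dominant class and two rare ones
population = [82] * 9000 + [41] * 950 + [95] * 50


def stratified_sample(labels, per_class, rng):
    """Draw up to per_class pixel indices from every class found in labels."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for indices in by_class.values():
        # a class with fewer pixels than requested contributes all of them
        k = min(per_class, len(indices))
        chosen.extend(rng.sample(indices, k))
    return chosen


picked = stratified_sample(population, per_class=30, rng=rng)
counts = Counter(population[i] for i in picked)

print(counts)
```

Every class ends up with 30 samples even though class 95 covers only 0.5% of the pixels. The trade-off is that the sample no longer reflects how common each class is on the map.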

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(region, 5)

Map.addLayer(region,{},"United States")
Map.addLayer(l8_composite, {"bands":"B7,B5,B3", "min": 50, "max": 5500,"gamma":1.5}, 'L8 Composite')
Map.addLayer(nlcd, {}, 'NLCD')
Map.addLayer(stratified_samples, {}, 'Stratified Random Samples')


Map.addLayerControl()

Map

You may get a computation timeout error. This happens because the process takes a while to run, and EE's internal scheduler limits how long interactive computations can run so that resources are shared. A strongly advised workaround is to export intermediate results. The following code exports the stratified samples to an asset that we can load later:

In [None]:
ee_asset_id = "kmarkert" # change to your id to export to your assets

task = ee.batch.Export.table.toAsset(
    collection = stratified_samples,
    description = "NLCD_sample_export",
    assetId = f"users/{ee_asset_id}/CE594_NLCD_stratified_samples"
)
task.start()

In [None]:
geemap.ee_user_id()

It is hard to tell how long exports will run. I have pre-exported these samples so we can continue with our exercise.

In [None]:
# load in the pre-exported samples
stratified_samples = ee.FeatureCollection("users/kmarkert/CE594_NLCD_stratified_samples")

## Unsupervised Classification

Now that we have our samples, we can start applying some classification techniques. Before we get into supervised classification (which is what we are set up to do), we are going to try unsupervised classification and see what that gives us.

For classification/clustering on EE, there are very straightforward steps:
1. define a model and parameters
2. train/fit the model
3. apply model on imagery

In [None]:
# first step is to define a model
# get a KMeans clusterer object
kmeans = (
    ee.Clusterer.wekaKMeans(
        nClusters=n_classes, # specify same number of classes as NLCD
        init=1 # init model with k-means++
    )
    # apply training all at once to avoid having to return another object
    .train(random_samples, inputProperties=feature_names)
)
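
To see what k-means does under the hood, here is a minimal sketch of the basic assign-and-update loop (Lloyd's algorithm) on one-dimensional data in plain Python. It uses fixed starting centers rather than k-means++ and is not the Weka implementation that EE calls. The function `kmeans_1d` and the reflectance values are made up for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain Lloyd iterations on 1D data with caller-supplied starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        # assignment step: attach each value to its nearest center
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # update step: move each center to the mean of its group
        # (an empty group keeps its previous center)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# hypothetical near-infrared values: three dark pixels and three bright pixels
nir = [100, 120, 110, 3000, 3100, 2900]

centers = kmeans_1d(nir, centers=[0, 1000])
labels = [min(range(len(centers)), key=lambda i: abs(v - centers[i])) for v in nir]

print(centers)
print(labels)
```

The cluster ids (0 and 1 here) carry no meaning on their own. As with the EE clusterer, it is up to us to decide afterwards which land cover each cluster corresponds to.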

In [None]:
# apply model to image
cluster_img = sample_img.cluster(kmeans)

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(region, 5)

Map.addLayer(region,{},"United States")
Map.addLayer(l8_composite, {"bands":"B7,B5,B3", "min": 50, "max": 5500,"gamma":1.5}, 'L8 Composite')
Map.addLayer(nlcd, {}, 'NLCD')
Map.addLayer(cluster_img.randomVisualizer(), {}, 'Clustered image')


Map.addLayerControl()

Map

There are other clustering algorithms available on EE, like XMeans, which I encourage you to explore. Clustering is very good for data exploration, but if you have labels, a better approach is to use supervised classification.

## Supervised classification

Now to the task at hand: we want to classify our composite image.

We will use a widely used classifier called a random forest ([Breiman 2001](https://link.springer.com/article/10.1023/A:1010933404324)). A random forest is a collection of randomized decision trees whose predictions are averaged (regression) or used to vote on a label (classification).
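
The way a forest combines its trees can be sketched in a few lines of plain Python. The per-tree outputs below are made up for a single pixel. Real trees would produce them from the band values.

```python
from collections import Counter
from statistics import mean

# hypothetical outputs of five trees for one pixel
tree_labels = [41, 42, 41, 81, 41]        # classification: each tree votes for a class
tree_values = [0.2, 0.4, 0.3, 0.5, 0.1]   # regression: each tree predicts a number

# classification: the label with the most votes wins
predicted_label = Counter(tree_labels).most_common(1)[0][0]

# regression: the prediction is the average over all trees
predicted_value = mean(tree_values)

print(predicted_label)
print(round(predicted_value, 2))
```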

In [None]:
# first step is to define a model
# get a Random Forest classifier object

rf = (
    ee.Classifier.smileRandomForest(
        numberOfTrees = 20, # specify number of trees to use for classification
    )
    # again train in one-go to prevent returning another object
    .train(
        random_samples,
        classProperty = label_name, 
        inputProperties = feature_names
    )
)

In [None]:
# apply the classifier to our composite image
classified_img = sample_img.classify(rf).uint8()

In [None]:
lc_vis_values = nlcd.get("landcover_class_values")
lc_vis_colors = nlcd.get("landcover_class_palette")

# set image metadata to automatically visualize values and palette
classified_img = classified_img.set({
    "classification_class_values":nlcd_classes,
    "classification_class_palette":lc_vis_colors,
})

In [None]:
# Visualize the results
Map = geemap.Map()

Map.centerObject(region, 5)

Map.addLayer(region,{},"United States")
Map.addLayer(l8_composite, {"bands":"B7,B5,B3", "min": 50, "max": 5500,"gamma":1.5}, 'L8 Composite')
Map.addLayer(nlcd, {}, 'NLCD')
Map.addLayer(classified_img, {}, 'Classified image')


Map.addLayerControl()

Map

### Accuracy assessment

It is generally good practice to split the dataset into training and testing sets so we have an idea of how well our model does at estimating our labels. We did not do this earlier to avoid confusion, but we will do it now and get an idea of accuracy using EE. In the classification context, accuracy measurements are often derived from a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix).

The first step is to randomly split the data. We can do this by reusing the classification training set and adding a column of random numbers to partition the known data, so that about 70% of the data is used for training and 30% for testing:

In [None]:
# add a column of random numbers to the table
# note: this reuses random_samples, the same set used to train the classifier above
stratified_samples_random = random_samples.randomColumn()

# split into training and testing
train_samples = stratified_samples_random.filter(ee.Filter.lte("random",0.7))
test_samples = stratified_samples_random.filter(ee.Filter.gt("random",0.7))
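
The same split can be sketched in plain Python to show why the fractions come out near 70/30. Each sample gets a uniform random number between 0 and 1, and the threshold decides which side it lands on. The sample records below are made up for illustration.

```python
import random

rng = random.Random(0)

# hypothetical table of 1,000 samples, each with a uniform random number in [0, 1)
samples = [{"id": i, "random": rng.random()} for i in range(1000)]

# same thresholds as the EE filters above
train = [s for s in samples if s["random"] <= 0.7]
test = [s for s in samples if s["random"] > 0.7]

print(len(train), len(test))
```

The two subsets never overlap and together contain every sample, but the split is only approximately 70/30 because it depends on the random draws.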

Now train a model only using the train dataset:

In [None]:
# get a Random Forest classifier object
rf = (
    ee.Classifier.smileRandomForest(
        numberOfTrees = 20, # specify number of trees to use for classification
    )
    # again train in one-go to prevent returning another object
    .train(
        train_samples, # note only training on train samples
        classProperty = label_name, 
        inputProperties = feature_names
    )
)

Apply the model to the test dataset. Note: here we are applying the classification model to a table and the classifier automatically adds a property called 'classification'!

In [None]:
# apply the classifier just like we would with an image
# this returns the original table but now with a classified column
pred_samples = test_samples.classify(rf)

Now that we have applied the model, we can do some data wrangling to get observed vs predicted labels and then calculate accuracy metrics.

In [None]:
# convert the table to ConfusionMatrix
# need to provide which columns are predicted vs observed
cm = pred_samples.errorMatrix(actual="landcover",predicted="classification")

Now we can use EE to calculate common accuracy metrics:

In [None]:
# call the methods to calculate metrics and get locally
overall_acc = cm.accuracy().getInfo()
producers_acc = cm.producersAccuracy().getInfo()
consumers_acc = cm.consumersAccuracy().getInfo()
kappa = cm.kappa().getInfo()

In [None]:
print(f"Overall Accuracy: {overall_acc:.4f}")
print(f"Producer's Accuracy: {producers_acc}")
print(f"Consumer's Accuracy: {consumers_acc}")
print(f"Kappa coefficient: {kappa:.4f}")
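
To see where these numbers come from, here is a minimal sketch that computes the same four metrics by hand from a small confusion matrix in plain Python. The matrix values are made up, and the orientation (rows are actual classes, columns are predicted classes) is an assumption of this sketch. Producer's accuracy is taken as correct / total actual for a class, and consumer's accuracy as correct / total predicted for a class.

```python
# hypothetical 3-class confusion matrix
# rows = actual class, columns = predicted class (an assumption of this sketch)
matrix = [
    [8, 2, 0],
    [1, 6, 3],
    [1, 0, 9],
]

n = sum(sum(row) for row in matrix)
row_totals = [sum(row) for row in matrix]          # samples per actual class
col_totals = [sum(col) for col in zip(*matrix)]    # samples per predicted class
diagonal = [matrix[i][i] for i in range(len(matrix))]  # correctly classified

# overall accuracy: share of all samples on the diagonal
overall = sum(diagonal) / n

# producer's accuracy: of the samples that truly belong to a class, how many were found
producers = [d / t for d, t in zip(diagonal, row_totals)]

# consumer's accuracy: of the samples labeled as a class, how many are correct
consumers = [d / t for d, t in zip(diagonal, col_totals)]

# kappa: agreement beyond what the row and column totals would give by chance
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
kappa = (overall - expected) / (1 - expected)

print(f"Overall Accuracy: {overall:.4f}")
print(f"Producer's Accuracy: {producers}")
print(f"Consumer's Accuracy: {consumers}")
print(f"Kappa coefficient: {kappa:.4f}")
```

In this example the second class has a producer's accuracy of 0.6 but a consumer's accuracy of 0.75, which shows why the per-class metrics are worth reading alongside the overall accuracy.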