## GEOG0027 (2020/1) Classification with Google Earth Engine (GEE)

This practical uses the Google Earth Engine (GEE) Python library [EE](https://developers.google.com/earth-engine) and the [geemap library](https://geemap.org/) to automatically classify land cover and land use in the Shenzhen area. Both libraries handle remote sensing (RS) data in the Cloud, so nothing needs to be downloaded to our local computers, and they keep the Python code simple: only small modifications are needed to scale up to large datasets. GEE hosts many popular RS datasets in the Cloud; details of its data catalog can be found at: https://developers.google.com/earth-engine/datasets.

For the Shenzhen classification work, we start by displaying a basemap of the area.

In [1]:
import geemap, ee, os

Map = geemap.Map(center=[22.634, 114.19], zoom=9)
Map

Map(center=[22.634, 114.19], controls=(WidgetControl(options=['position'], widget=HBox(children=(ToggleButton(…

## Examining the time series
Let's first define a rectangular region of interest, following [min lon, min lat, max lon, max lat], and display a short 'movie' (in fact a .gif file) of how this area has changed over the past decades.

In [2]:
shenzhen_rec = ee.Geometry.Rectangle([113.7659, 22.40, 114.6654, 22.8536]) 
Map.addLayer(shenzhen_rec, {}, 'AOI rec')
print(shenzhen_rec.getInfo())

{'type': 'Polygon', 'coordinates': [[[113.7659, 22.4], [114.6654, 22.4], [114.6654, 22.8536], [113.7659, 22.8536], [113.7659, 22.4]]]}
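
To see how the four numbers passed to `ee.Geometry.Rectangle` relate to the polygon printed above, the following plain-Python sketch rebuilds the same closed ring of corners. `rectangle_ring` is a hypothetical helper written only for illustration; it is not part of EE or geemap.

```python
def rectangle_ring(bounds):
    """Turn [min lon, min lat, max lon, max lat] into a closed ring of [lon, lat] corners."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return [
        [min_lon, min_lat],  # lower-left
        [max_lon, min_lat],  # lower-right
        [max_lon, max_lat],  # upper-right
        [min_lon, max_lat],  # upper-left
        [min_lon, min_lat],  # repeat the first corner to close the ring
    ]

ring = rectangle_ring([113.7659, 22.40, 114.6654, 22.8536])
print(ring)
```

Note that the order is longitude first, then latitude, which is the opposite of the `center=[lat, lon]` convention used when creating the map.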


In [2]:
Map_gif = geemap.Map(center=[22.7511, 113.91], zoom=10)
Map_gif.add_landsat_ts_gif(roi=shenzhen_rec, start_year=1985, bands=['NIR', 'Red', 'Green'], frames_per_second=5)

{'type': 'Polygon', 'coordinates': [[[113.7659, 22.4], [114.6654, 22.4], [114.6654, 22.8536], [113.7659, 22.8536], [113.7659, 22.4]]]}
Generating URL...
Downloading GIF image from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/videoThumbnails/fe35de13a51e97410db5e2a076d03f36-bf580ddde140dfc86a157c8a3a727dd4:getPixels
Please wait ...
The GIF image has been saved to: /Users/qingling/Downloads/landsat_ts_mjf.gif
Adding animated text to GIF ...
Adding GIF to the map ...
The timelapse has been added to the map.


![shenzhengif](../../../../Downloads/landsat_ts_qfo.gif "shenzhen")

Next, we will compare these changes between years using a slider.

In [3]:
landsat_ts = geemap.landsat_timeseries(roi=shenzhen_rec, start_year=1986, end_year=2020, \
                                       start_date='01-01', end_date='12-31')

layer_names = ['Landsat ' + str(year) for year in range(1986, 2021)]

geemap_landsat_vis = {
    'min': 0,
    'max': 3000,
    'gamma': [1, 1, 1],
    'bands': ['NIR', 'Red', 'Green']} # You can change the vis bands here

Map2 = geemap.Map()
Map2.ts_inspector(left_ts=landsat_ts, right_ts=landsat_ts, left_names=layer_names, right_names=layer_names, \
                 left_vis=geemap_landsat_vis, right_vis=geemap_landsat_vis)
Map2.centerObject(shenzhen_rec, zoom=10)
Map2

Map(center=[22.627302103068118, 114.2156500000001], controls=(WidgetControl(options=['position'], widget=Dropd…

We have manually defined a rectangle for the Shenzhen area, but we can also use shape/vector files to select Areas of Interest.

# Select 'Shenzhen' as the area of interest (AOI)
The vector border layer is imported from https://developers.google.com/earth-engine/datasets/tags/borders, which includes the [Global Administrative Unit Layers (GAUL) data](https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level2) from 2015. You may notice that Shenzhen's boundary has expanded since then (e.g. coastal landfill). We can manually draw a polygon and clip it to the GAUL border file or, as a simpler example, add a 'buffer' (e.g. 3000 meters) to the GAUL boundary. This inevitably introduces some areas outside the border of Shenzhen, e.g. part of Hong Kong, so you can work out a more elegant way of combining/clipping multiple mask layers if time allows.

In [6]:
cities = ee.FeatureCollection("FAO/GAUL/2015/level2")
#Map.addLayer(cities, {}, 'Cities', False)

shenzhen = cities.filter(ee.Filter.eq('ADM2_NAME', 'Shenzhen'))
outline = ee.Image().byte().paint(**{
  'featureCollection': shenzhen,
  'color': 1,
  'width': 3
})
Map.addLayer(outline, {}, 'Shenzhen', False)

# Next, add some buffer to include the coastal expansion areas
shenzhen_buffer = ee.Geometry(shenzhen.geometry()).buffer(3000)
Map.addLayer(shenzhen_buffer, {}, 'Buffer around Shenzhen')
#Map.addLayer(rec, "Original rec bounds")
Map

Map(bottom=114444.0, center=[22.63429269379353, 114.19052124023439], controls=(WidgetControl(options=['positio…

Next, let's load some Landsat images for the Shenzhen area. I've defined a Python function called `display_landsat_collection` to do so. It loads both the [surface reflectance](https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_SR) and [annual NDVI](https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_ANNUAL_NDVI) image collections from GEE's data catalog and also calculates the annual mean for each band.

To run this function, you need to supply a year (any year from 1984 onwards) and a region of interest. In the following example, I chose the year 2019 and the buffered Shenzhen boundary (`shenzhen_buffer`).

In [11]:
# to read in the Landsat data collection(s)

def display_landsat_collection(year, region, cloud_tolerance = 3.0):
    '''This function lets GEE display an image collection 
    that falls within the cloud tolerance, e.g. 3.0%'''
    if year >= 2013:
        layer_name = 'LC08' # LS8: 2013-now        
        fcc_bands = ['B5', 'B4', 'B3']
    elif year == 2012: # LS7: 1999-, but because of its SLC error it is only used for 2012 (not year >= 1999)
        layer_name = 'LE07' 
        fcc_bands = ['B4', 'B3', 'B2']
    elif year >=1984:
        layer_name = 'LT05' # LS5: 1984-2012
        fcc_bands = ['B4', 'B3', 'B2']
    else:
        # Stop here: without a sensor name the collection IDs below cannot be built
        raise ValueError('Please name a year from 1984 onwards')
        
    collection_name_sr = f"LANDSAT/{layer_name}/C01/T1_SR" 
    # You can also use the following line, if interested in incorporating NDVI
    collection_name_ndvi = f"LANDSAT/{layer_name}/C01/T1_ANNUAL_NDVI" 

    all_sr_image = ee.ImageCollection(collection_name_sr) \
        .filterBounds(region) \
        .filterDate(f'{year}-01-01', f'{year}-12-31') \
        .filter(ee.Filter.lt('CLOUD_COVER', cloud_tolerance))\
        .sort('system:time_start') \
        .select('B[1-7]') \
        .sort('CLOUD_COVER')
    
    landsat_vis_param = {
        'min': 0,
        'max': 3000,
        'bands': fcc_bands 
    }
 
    # reduce all_sr_image to annual average per pixel
    mean_image = all_sr_image.mean()
    mean_image = mean_image.clip(region).unmask()
    # alternatively, we can use the median from geemap lib
    median_image = landsat_ts.filterDate(f'{year}-01-01', f'{year}-12-31').first()
    median_image = median_image.clip(region).unmask()
    
    geemap_landsat_vis = {
        'min': 0,
        'max': 3000,
        'bands': ['NIR', 'Red', 'Green']
    }
        
    ndvi_image = ee.ImageCollection(collection_name_ndvi)\
        .filterBounds(region) \
        .filterDate(f'{year}-01-01', f'{year}-12-31')\
        .select('NDVI')\
        .first()
    ndvi_image = ndvi_image.clip(region).unmask()
    
    ndvi_colorized_vis = {
        'min': 0.0,
        'max': 1.0,
        'palette': [
    'FFFFFF', 'CE7E45', 'DF923D', 'F1B555', 'FCD163', '99B718', '74A901',
    '66A000', '529400', '3E8601', '207401', '056201', '004C00', '023B01',
    '012E01', '011D01', '011301']
    }
    
    #mean_image.addBands(ndvi_image, 'NDVI')
    
    Map.addLayer(ndvi_image, ndvi_colorized_vis, 'NDVI '+str(year),  opacity=0.5)
    Map.addLayer(mean_image, landsat_vis_param, "Mean Ref "+str(year))
    Map.addLayer(median_image, geemap_landsat_vis, "Median Ref "+str(year))

    return [all_sr_image, mean_image, median_image, ndvi_image]


# All you need to modify is the YEAR below:
[all_image_2019, mean_2019, median_2019, ndvi_2019] = display_landsat_collection(2019,\
                                                    shenzhen_buffer, cloud_tolerance = 3)
Map

Map(bottom=114444.0, center=[22.63429269379353, 114.19052124023439], controls=(WidgetControl(options=['positio…
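
The sensor choice inside `display_landsat_collection` can be read in isolation. The sketch below copies that year-to-collection logic into a hypothetical helper, `landsat_collection_id` (not part of EE or geemap). It only builds strings, so it runs without an Earth Engine account, and it raises a `ValueError` for years before 1984.

```python
def landsat_collection_id(year):
    """Return (surface reflectance collection ID, false-colour bands) for a given year."""
    if year >= 2013:
        sensor, fcc_bands = 'LC08', ['B5', 'B4', 'B3']  # Landsat 8: NIR, Red, Green
    elif year == 2012:
        sensor, fcc_bands = 'LE07', ['B4', 'B3', 'B2']  # Landsat 7: NIR, Red, Green
    elif year >= 1984:
        sensor, fcc_bands = 'LT05', ['B4', 'B3', 'B2']  # Landsat 5: NIR, Red, Green
    else:
        raise ValueError('Please name a year from 1984 onwards')
    return f'LANDSAT/{sensor}/C01/T1_SR', fcc_bands

for year in (1990, 2012, 2019):
    print(year, landsat_collection_id(year))
```

The near-infrared band is B5 on Landsat 8 but B4 on Landsat 5 and 7, which is why the false-colour band list changes with the sensor.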

We can also check the metadata of the Landsat image collection we just loaded from the Cloud. Have a look at the output. Is there any useful information?

In [13]:
first_image = all_image_2019.first() 

props = geemap.image_props(first_image)
print( props.getInfo())
print(props.get('IMAGE_DATE').getInfo())
print(props.get('CLOUD_COVER').getInfo(), '%')

{'CLOUD_COVER': 0.21, 'CLOUD_COVER_LAND': 0.23, 'EARTH_SUN_DISTANCE': 0.989463, 'ESPA_VERSION': '2_23_0_1b', 'GEOMETRIC_RMSE_MODEL': 7.889, 'GEOMETRIC_RMSE_MODEL_X': 5.66, 'GEOMETRIC_RMSE_MODEL_Y': 5.495, 'IMAGE_DATE': '2019-11-14', 'IMAGE_QUALITY_OLI': 9, 'IMAGE_QUALITY_TIRS': 9, 'LANDSAT_ID': 'LC08_L1TP_122044_20191114_20191202_01_T1', 'LEVEL1_PRODUCTION_DATE': 1575301122000, 'NOMINAL_SCALE': 30, 'PIXEL_QA_VERSION': 'generate_pixel_qa_1.6.0', 'SATELLITE': 'LANDSAT_8', 'SENSING_TIME': '2019-11-14T02:52:28.2228010Z', 'SOLAR_AZIMUTH_ANGLE': 153.700989, 'SOLAR_ZENITH_ANGLE': 45.366386, 'SR_APP_VERSION': 'LaSRC_1.3.0', 'WRS_PATH': 122, 'WRS_ROW': 44, 'system:asset_size': '627.230926 MB', 'system:band_names': ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7'], 'system:id': 'LANDSAT/LC08/C01/T1_SR/LC08_122044_20191114', 'system:index': 'LC08_122044_20191114', 'system:time_end': '2019-11-14 02:52:28', 'system:time_start': '2019-11-14 02:52:28', 'system:version': 1576236040308713}
2019-11-14
0.21 %
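
Some of these properties need decoding before they are useful. As a small illustration, `LEVEL1_PRODUCTION_DATE` in the output above looks like a timestamp in milliseconds since 1970-01-01 UTC (an assumption based on its magnitude, not something the metadata states). The sketch below converts it to a readable date using only the standard library.

```python
from datetime import datetime, timezone

# Value copied from the metadata printed above; assumed to be in milliseconds
level1_production_ms = 1575301122000

# Python's datetime expects seconds, so divide by 1000
produced = datetime.fromtimestamp(level1_production_ms / 1000, tz=timezone.utc)
print(produced.isoformat())
```

The resulting date falls on 2 December 2019, matching the processing date (20191202) embedded in `LANDSAT_ID`, which supports the millisecond interpretation.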


Next, examine the mean vs median surface reflectance layers we've visualised (switching the layers on and off, calculating the differences, etc.). Which one is better? What should we include in the classification?

In [14]:
NIR = mean_2019.select('B5')
Red = mean_2019.select('B4')
mean_ndvi = NIR.subtract(Red).divide(NIR.add(Red))

NIR = median_2019.select('NIR')
Red = median_2019.select('Red')
median_ndvi = NIR.subtract(Red).divide(NIR.add(Red))

Map.addLayer(mean_ndvi.subtract(median_ndvi), {'min': 0.0,'max': 0.1}, 'Diff')
Map

Map(bottom=114444.0, center=[22.63429269379353, 114.19052124023439], controls=(WidgetControl(options=['positio…
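
To get a feel for why the mean and median composites can differ, consider a single pixel observed on several dates, one of which is contaminated by cloud. The reflectance values below are made up for illustration (a vegetated pixel, bright in NIR and dark in Red), and the NDVI formula is the same (NIR - Red) / (NIR + Red) used in the cell above.

```python
from statistics import mean, median

# Hypothetical surface reflectance of one vegetated pixel on five dates;
# the last date is cloudy, so both bands are unusually bright
nir_series = [3000, 3100, 2900, 3050, 7000]
red_series = [500, 520, 480, 510, 6500]

def ndvi(nir, red):
    """Normalised Difference Vegetation Index."""
    return (nir - red) / (nir + red)

# Composite each band first, then compute NDVI from the composite
mean_ndvi = ndvi(mean(nir_series), mean(red_series))
median_ndvi = ndvi(median(nir_series), median(red_series))

print(f'NDVI from mean composite:   {mean_ndvi:.3f}')
print(f'NDVI from median composite: {median_ndvi:.3f}')
```

In this toy case the single cloudy date drags the NDVI of the mean composite well below that of the median composite, which is one reason to inspect both layers before choosing the input to the classifier.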

You may wish to save/export some of the clipped raster images to a local TIF file. For example:

In [17]:
geemap.ee_export_image(median_2019, filename='Shenzhen_landsat_2019_median.tif')

Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/dd1fbe919f2d315c75d5779e3fa29a42-760c05182c574a768efbaafb211e58dc:getPixels
Please wait ...
Data downloaded to /Users/qingling/Documents/UCL/Teaching-2020/GEE/Shenzhen_landsat_2019_median.tif


# Next, let's run some unsupervised classification (e.g. K-means) with geemap



## Unsupervised classification
Let's start with the 2019 mean image:

In [25]:
def unsupervised_classifier(region, image, n_clusters=5, output_filename=''):
    '''This function provides a simple K-means classifier,
    with a default of 5 clusters. The user needs to specify 
    the region of interest and the image to be classified'''
    
    # Make the training dataset:
    training_points = image.sample(**{
        'region': region,
        'scale': 30,
        'numPixels': 5000,
        'seed': 0,
        'geometries': True  # Set this to False to ignore geometries
    })

    #Map.addLayer(training_points, {}, 'training points', False) # No need to visualise this layer

    # Instantiate the clusterer and train it.
    clusterer = ee.Clusterer.wekaKMeans(n_clusters).train(training_points)

    # Cluster the input using the trained clusterer.
    class_result = image.cluster(clusterer)

    if output_filename == '':
        print('No output filename given. Results NOT exported')
    else:
        #Export the result directly to your computer/Hub:
        geemap.ee_export_image(class_result, filename=output_filename)

    # # Display the clusters with random colors.
    Map.addLayer(class_result.randomVisualizer(), {}, 'clusters', opacity=0.7)
    
    return class_result

#define an area (e.g. rectangle) of study area
#region = ee.Geometry.Rectangle([113.7659, 22.40, 114.6654, 22.8536]) 
# alternatively, we can use the 'buffer' as the study region

class_result = unsupervised_classifier(shenzhen_buffer, mean_2019, \
                n_clusters=5, output_filename='Shenzhen_Landsat_Kmeans_2019.tif')
Map

Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/62703533dbad461db352e12bdd9fc137-2e84fb0ed01754515f40ec5eed6a0169:getPixels
Please wait ...
Data downloaded to /Users/qingling/Documents/UCL/Teaching-2020/GEE/Shenzhen_Landsat_Kmeans_2019.tif


Map(bottom=3647044.0, center=[22.86636893657275, 113.76151800155641], controls=(WidgetControl(options=['positi…
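
For intuition about what the clusterer does, here is a minimal pure-Python sketch of plain (Lloyd's) K-means on a handful of made-up (Red, NIR) pixel values. It is not the Weka implementation behind `ee.Clusterer.wekaKMeans`, and the initial centroids are fixed by hand so that the result is reproducible.

```python
def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd's K-means on tuples of band values; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each pixel goes to its nearest centroid (squared distance)
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its member pixels
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical (Red, NIR) reflectances: two water-like, two vegetation-like, two urban-like
pixels = [(200, 100), (220, 120), (400, 3000), (450, 3200), (1500, 1800), (1600, 1700)]
labels, centroids = kmeans(pixels, centroids=[pixels[0], pixels[2], pixels[4]])
print(labels)
print(centroids)
```

As in the GEE result, the cluster numbers carry no meaning by themselves: deciding which cluster is water, vegetation or urban is a separate, manual step.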

In [21]:
# DOES NOT SEEM TO WORK
# (as written in this notebook, `training_points` only exists inside unsupervised_classifier)
#------------TO SPLIT SAMPLES-------------------------------------

bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7']
#bands need to be adapted to GEEMAP_BANDS if using the MEDIAN: ['NIR', 'Red', etc]

# This property of the table stores the land cover labels.
label = 'landcover'

# Overlay the points on the imagery to get training.
sample = mean_2019.select(bands).sampleRegions(**{
  'collection': training_points,
  'properties': [label],
  'scale': 30
})

# Adds a column of deterministic pseudorandom numbers. 
sample = sample.randomColumn()

split = 0.7

training = sample.filter(ee.Filter.lt('random', split))
validation = sample.filter(ee.Filter.gte('random', split))
print(type(training_points))
print(type(training))

print(training.first().getInfo())
print(validation.first().getInfo())

Map.addLayer(training, {}, 'training points', True) 
Map.addLayer(validation, {}, 'validation points', False) 
Map

Map(bottom=114444.0, center=[22.63429269379353, 114.19052124023439], controls=(WidgetControl(options=['positio…
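
The splitting logic itself is simple: attach a uniform random number to every sample and threshold it. The sketch below mimics this with Python's `random` module on hypothetical sample IDs; the seed and the number of samples are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the split is repeatable
split = 0.7

# Hypothetical samples, each given a 'random' value in [0, 1)
samples = [{'id': i, 'random': rng.random()} for i in range(1000)]

# Same thresholds as the filters above: < split for training, >= split for validation
training = [s for s in samples if s['random'] < split]
validation = [s for s in samples if s['random'] >= split]

print(len(training), len(validation))
```

The two subsets are disjoint and together cover all samples. Roughly 70% end up in training, but the exact count depends on the random numbers drawn.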

## Next, we will need to define or name these unsupervised classes
You need to carefully compare the classified results with the original images to decide which cluster belongs to what class. Keep in mind, there might be mis-classified pixels. How can you improve the results?

In [26]:
legend_keys = ['Light Clouds', 'Urban', 'Vegetation', 'Water', 'Clouds']
legend_colors = ['#FFFFFF', '#FFFFB3', '#8DD3C7', '#80B1D3', '#FFFFFF']

# Reclassify the map (only run this line ONCE)
class_result = class_result.remap([0, 1, 2, 3, 4], [1, 2, 3, 4, 5])

Map.addLayer(class_result, {'min': 1, 'max': 5, 'palette': legend_colors}, 'Labelled clusters')
Map.add_legend(legend_keys=legend_keys, legend_colors=legend_colors, position='bottomright')
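
The `remap` call above only shifts the cluster IDs from 0-4 to 1-5 so that they line up with the legend entries. The same lookup can be sketched in plain Python. The cluster-to-class assignment here simply follows the order of `legend_keys` and would need to be checked against the imagery for your own run.

```python
legend_keys = ['Light Clouds', 'Urban', 'Vegetation', 'Water', 'Clouds']

from_ids = [0, 1, 2, 3, 4]  # raw cluster IDs
to_ids = [1, 2, 3, 4, 5]    # class IDs used for the legend

remap = dict(zip(from_ids, to_ids))      # cluster ID -> class ID
labels = dict(zip(to_ids, legend_keys))  # class ID -> class name

cluster_pixels = [3, 3, 1, 2, 0, 4]  # hypothetical cluster IDs for six pixels
class_pixels = [remap[c] for c in cluster_pixels]
print([labels[c] for c in class_pixels])
```

Because the remapped image replaces `class_result`, running the cell a second time would shift the IDs again (and leave ID 5 without a match), which is why it should only be run once.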


In [27]:
#Visualize the result
print('Change layer opacity:')
#cluster_layer = Map.layers[-1]
cluster_layer = Map.find_layer('Labelled clusters')
cluster_layer.interact(opacity=(0, 1, 0.1))

Change layer opacity:


Box(children=(FloatSlider(value=1.0, description='opacity', max=1.0),))

# Draw ROIs and Accuracy
https://geemap.org/notebooks/33_accuracy_assessment/ 

Here, we use the drawing tools to manually select some ROIs, just as we did in ENVI. Click on the polygon drawing tool on the left-hand side of the map interface to draw ROIs. Once finished, you can switch the ROI layer on/off from the `layer` tool on the right-hand side. By default the layer is named 'Drawn features'. Once you have selected enough ROIs for one class (e.g. I have just selected three 'Urban' ROIs), we can access these features as below. We can also save these features to a shapefile (.shp) for future use, e.g. in ENVI or GEE.

In [21]:
def save_roi_drawing(class_name='Unclassified'):
    '''This function allows the user to save the ROIs from the current drawing features to SHP and TIF files
    to be used later'''
    
    roi = ee.FeatureCollection(Map.draw_features)
    #print(roi.geometry())

    #clipped_roi = image.clip(roi) 
    #Map.addLayer(clipped_roi, vis_params, 'Clipped ROIs')
    
    # Generate a classified raster layer for NumPy use later
    clipped_class = class_result.clip(roi)

    
    filename = f'Shenzhen_TruthROI_{class_name}'
    geemap.ee_to_shp(roi, filename=filename+'.shp')
    
    geemap.ee_export_image(clipped_class, filename=filename+'.tif', region=shenzhen_buffer)

In [22]:
save_roi_drawing(class_name='water')

Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/tables/d877194279459038974f3233e68e1fa7-0975735d25e3b74f206d687c9cde69a1:getFeatures
Please wait ...
Data downloaded to /Users/qingling/Documents/UCL/Teaching-2020/GEE/Shenzhen_TruthROI_water.shp
Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/03a8af94fd0d1c20aab281d3c7b45cf3-59457ef3f84d6648b25d9fee2cf1ac28:getPixels
Please wait ...
Data downloaded to /Users/qingling/Documents/UCL/Teaching-2020/GEE/Shenzhen_TruthROI_water.tif


In [21]:
'''TBD:  SHP FILE WONT LOAD HERE YET'''

def load_shp(filename, class_name, color="#0000ff"):
    style = {
        "stroke": True,
        "color": color,
        "weight": 2,
        "opacity": 1,
        "fill": True,
        "fillColor": color,
        "fillOpacity": 0.4,
    }
    
    imported_roi = geemap.shp_to_ee(filename)
    print(imported_roi)
    Map.addLayer(imported_roi, style, class_name+' ROIs')
    
load_shp('Shenzhen_TruthROI_water.shp', 'Water')

ee.FeatureCollection({
  "functionInvocationValue": {
    "functionName": "Collection",
    "arguments": {
      "features": {
        "constantValue": []
      }
    }
  }
})
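
Once truth ROIs are available, accuracy assessment boils down to comparing the class of each truth pixel with the class the map assigned to it. The sketch below builds a confusion matrix and the overall accuracy in plain Python. The ten truth/predicted labels are invented for illustration and do not come from the Shenzhen results.

```python
from collections import Counter

truth     = ['Water', 'Water', 'Water', 'Urban', 'Urban', 'Urban',
             'Vegetation', 'Vegetation', 'Vegetation', 'Vegetation']
predicted = ['Water', 'Water', 'Urban', 'Urban', 'Urban', 'Vegetation',
             'Vegetation', 'Vegetation', 'Vegetation', 'Water']

# Confusion matrix stored as counts of (truth, predicted) pairs
confusion = Counter(zip(truth, predicted))

# Overall accuracy = correctly classified pixels / all truth pixels
correct = sum(n for (t, p), n in confusion.items() if t == p)
overall_accuracy = correct / len(truth)

for (t, p), n in sorted(confusion.items()):
    print(f'truth={t:<10} predicted={p:<10} count={n}')
print('Overall accuracy:', overall_accuracy)
```

The off-diagonal pairs (truth and predicted differ) show which classes are being confused with each other, which is often more informative than the single overall figure.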


# Export results to excel (.csv) files

In [13]:
import numpy as np
import matplotlib.pyplot as plt

class_ct = class_result.reduceRegion(**{
  'reducer': ee.Reducer.count(),
  'geometry': shenzhen_buffer,
  'scale': 30
})

#mean = class_ct.get('to')
print(class_ct)

#class_ar = class_result.bitsToArrayImage()
#class_np = geemap.ee_to_numpy(class_result, bands= ['Urban'], region=rec)

#print(class_result)
#print(geemap.image_stats(class_result))
#props = geemap.image_props(class_result)
#print(props)

ee.List({
  "functionInvocationValue": {
    "functionName": "Dictionary.keys",
    "arguments": {
      "dictionary": {
        "functionInvocationValue": {
          "functionName": "Image.reduceRegion",
          "arguments": {
            "geometry": {
              "functionInvocationValue": {
                "functionName": "Geometry.buffer",
                "arguments": {
                  "distance": {
                    "constantValue": 3000
                  },
                  "geometry": {
                    "functionInvocationValue": {
                      "functionName": "Collection.geometry",
                      "arguments": {
                        "collection": {
                          "functionInvocationValue": {
                            "functionName": "Collection.filter",
                            "arguments": {
                              "collection": {
                                "functionInvocationValue": {
                                  

In [19]:
img_mean = image.reduceRegion(**{
  'reducer': ee.Reducer.mean(),
  'geometry': rec,
  'scale': 30
})

#mean = class_ct.get('to')
print(ee.Feature(img_mean).select(all_image.bandNames()));

#class_ar = class_result.bitsToArrayImage()
#class_np = geemap.ee_to_numpy(class_result, bands= ['Urban'], region=rec)

#print(class_result)
#print(geemap.image_stats(class_result))
#props = geemap.image_props(class_result)
#print(props)

AttributeError: 'ImageCollection' object has no attribute 'bandNames'

In [50]:
print(class_ar)

ee.Image({
  "functionInvocationValue": {
    "functionName": "Image.bitsToArrayImage",
    "arguments": {
      "input": {
        "functionInvocationValue": {
          "functionName": "Image.remap",
          "arguments": {
            "from": {
              "constantValue": [
                0,
                1,
                2,
                3,
                4
              ]
            },
            "image": {
              "functionInvocationValue": {
                "functionName": "Image.cluster",
                "arguments": {
                  "clusterer": {
                    "functionInvocationValue": {
                      "functionName": "Clusterer.train",
                      "arguments": {
                        "clusterer": {
                          "functionInvocationValue": {
                            "functionName": "Clusterer.wekaKMeans",
                            "arguments": {
                              "nClusters": {
                     

In [25]:
import pandas as pd

class_pd = geemap.ee_to_pandas(class_result)

AttributeError: module 'geemap' has no attribute 'ee_to_pandas'

In [34]:
out_dir = os.path.join(os.path.expanduser('~'), 'Downloads')
out_class_stats = os.path.join(out_dir, 'class_stats.csv')  
out_result = os.path.join(out_dir, 'class.csv') 

if not os.path.exists(out_dir):
    os.makedirs(out_dir)

# Allowed output formats: csv, shp, json, kml, kmz
# Allowed statistics type: MEAN, MAXIMUM, MINIMUM, MEDIAN, STD, MIN_MAX, VARIANCE, SUM
geemap.zonal_statistics_by_group(result, shenzhen, out_class_stats, statistics_type='SUM', scale=1000)
geemap.ee_to_csv(result, out_result)

Computing ... 
Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/tables/416be7565a87b8052fbedf9437752f8f-b8f661ee1ee8124f39dbd01e95d095ef:getFeatures
Please wait ...
Data downloaded to /Users/qingling/Downloads/class_stats.csv
The ee_object must be an ee.FeatureCollection.
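
For a per-class summary, a pixel count obtained at `scale=30` can be converted to an area, since each pixel then covers 30 m x 30 m. The sketch below does this for made-up pixel counts and writes the table as CSV text with the standard `csv` module. The counts and column names are placeholders, not results from this practical.

```python
import csv
import io

PIXEL_AREA_KM2 = 30 * 30 / 1e6  # one 30 m pixel in square kilometres

# Hypothetical pixel counts per class
pixel_counts = {'Urban': 1000, 'Vegetation': 2500, 'Water': 500}

# Written to memory here; swap for open('class_stats.csv', 'w', newline='') to write a file
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['class', 'pixels', 'area_km2'])
for name, count in pixel_counts.items():
    writer.writerow([name, count, round(count * PIXEL_AREA_KM2, 4)])

print(out.getvalue())
```

The resulting file opens directly in Excel or can be read back with `pandas.read_csv`.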


# Repeat for multiple years

# NDVI
