## Unsupervised Classifications in Google Earth Engine
## GIS 4050/5050: Lab 7

Classification is a core part of image processing. A classification sorts pixels into groups based on their digital numbers (DNs) across bands, which in satellite images typically depend on land use or land cover. In this lab we will use some basic machine learning to classify satellite imagery with Google Earth Engine.

I'm working out of a Colab notebook, so your script may vary slightly.

**Deliverables:** Submit the following in a zipped folder to Blackboard.

*   Two TIFFs: one from the walkthrough, one from Your Turn
*   A Word document with answers to the integrated questions



## Part 1: Unsupervised Classification Walkthrough

In [1]:
!pip install geemap


# Unsupervised Classification with Google Earth Engine

## Unsupervised classification algorithms available in Earth Engine

Source: https://developers.google.com/earth-engine/clustering

The `ee.Clusterer` package handles unsupervised classification (or clustering) in Earth Engine. These algorithms are currently based on the algorithms with the same name in [Weka](http://www.cs.waikato.ac.nz/ml/weka/). More details about each Clusterer are available in the reference docs in the Code Editor.

Clusterers are used in the same manner as classifiers in Earth Engine. The general workflow for clustering is:

1. Assemble features with numeric properties in which to find clusters.
2. Instantiate a clusterer. Set its parameters if necessary.
3. Train the clusterer using the training data.
4. Apply the clusterer to an image or feature collection.
5. Label the clusters.

The training data is a `FeatureCollection` with properties that will be input to the clusterer. Unlike classifiers, there is no input class value for a `Clusterer`. Like classifiers, the data for the train and apply steps are expected to have the same number of values. When a trained clusterer is applied to an image or table, it assigns an integer cluster ID to each pixel or feature.

Here is a simple example of building and using an ee.Clusterer:


![](https://i.imgur.com/IcBapEx.png)
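
Because Earth Engine runs the clustering on its servers, the train and apply steps can feel opaque. The following is a minimal NumPy sketch of the same idea, assuming a plain k-means (Lloyd's algorithm). It is not the Weka implementation that Earth Engine uses, and the names `train_kmeans` and `apply_kmeans` and the two-band synthetic data are made up for illustration.

```python
import numpy as np

def apply_kmeans(pixels, centers):
    """Assign each pixel (row) the integer ID of its nearest cluster center."""
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def train_kmeans(samples, k, n_iter=20, seed=0):
    """Fit k cluster centers to an (n_samples, n_bands) array."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    centers = samples[rng.choice(len(samples), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = apply_kmeans(samples, centers)
        for j in range(k):
            members = samples[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers

# Synthetic "image": 200 dark pixels followed by 200 bright pixels, two bands each.
rng = np.random.default_rng(42)
dark = rng.normal(loc=[300, 200], scale=20, size=(200, 2))
bright = rng.normal(loc=[2500, 3000], scale=20, size=(200, 2))
pixels = np.vstack([dark, bright])

# Train on a random sample of pixels, then apply to every pixel.
sample = pixels[rng.choice(len(pixels), size=50, replace=False)]
centers = train_kmeans(sample, k=2)
cluster_ids = apply_kmeans(pixels, centers)
```

As in Earth Engine, the output is just an integer ID per pixel. Deciding what each ID means is left to the analyst.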

## Step-by-step tutorial

### Import libraries

In [2]:
import ee
import geemap

### Create an interactive map

In [3]:
Map = geemap.Map()
Map


### Add data to the map

In [4]:
Map = geemap.Map()

# point = ee.Geometry.Point([-122.4439, 37.7538])
point = ee.Geometry.Point([-87.7719, 41.8799])

image = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR') \
    .filterBounds(point) \
    .filterDate('2019-01-01', '2019-12-31') \
    .sort('CLOUD_COVER') \
    .first() \
    .select('B[1-7]')

vis_params = {
    'min': 0,
    'max': 3000,
    'bands': ['B5', 'B4', 'B3']
}

Map.centerObject(point, 8)
Map.addLayer(image, vis_params, "Landsat-8")
Map
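
The `min` and `max` entries in `vis_params` define a linear stretch: values at or below `min` display as black, values at or above `max` display at full brightness, and values in between scale linearly. A rough NumPy sketch of that mapping, assuming an 8-bit display, follows. `linear_stretch` is a made-up helper, not an Earth Engine or geemap function, and the input values are hypothetical.

```python
import numpy as np

def linear_stretch(values, vmin=0, vmax=3000):
    """Map values linearly from [vmin, vmax] to 0-255, clipping anything outside."""
    scaled = (np.asarray(values, dtype=float) - vmin) / (vmax - vmin)
    return np.round(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)

# Hypothetical band values, including some outside the 0-3000 display range.
stretched = linear_stretch([-50, 0, 1000, 3000, 4200])
```

Raising `max` darkens the display and lowering it brightens the display. The underlying pixel values are unchanged either way.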


### Check image properties
If needed, we can inspect an image's metadata as follows.

In [5]:
props = geemap.image_props(image)
props.getInfo()

{'CLOUD_COVER': 0.03,
 'CLOUD_COVER_LAND': 0.04,
 'EARTH_SUN_DISTANCE': 1.016591,
 'ESPA_VERSION': '2_23_0_1b',
 'GEOMETRIC_RMSE_MODEL': 6.348,
 'GEOMETRIC_RMSE_MODEL_X': 4.429,
 'GEOMETRIC_RMSE_MODEL_Y': 4.547,
 'IMAGE_DATE': '2019-07-12',
 'IMAGE_QUALITY_OLI': 9,
 'IMAGE_QUALITY_TIRS': 9,
 'LANDSAT_ID': 'LC08_L1TP_022031_20190712_20190719_01_T1',
 'LEVEL1_PRODUCTION_DATE': 1563565308000,
 'NOMINAL_SCALE': 30,
 'PIXEL_QA_VERSION': 'generate_pixel_qa_1.6.0',
 'SATELLITE': 'LANDSAT_8',
 'SENSING_TIME': '2019-07-12T16:28:51.3794760Z',
 'SOLAR_AZIMUTH_ANGLE': 131.949371,
 'SOLAR_ZENITH_ANGLE': 26.494972,
 'SR_APP_VERSION': 'LaSRC_1.3.0',
 'WRS_PATH': 22,
 'WRS_ROW': 31,
 'system:asset_size': '553.046839 MB',
 'system:band_names': ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7'],
 'system:id': 'LANDSAT/LC08/C01/T1_SR/LC08_022031_20190712',
 'system:index': 'LC08_022031_20190712',
 'system:time_end': '2019-07-12 16:28:51',
 'system:time_start': '2019-07-12 16:28:51',
 'system:version': 1564390084}

In [6]:
props.get('IMAGE_DATE').getInfo()

'2019-07-12'

In [7]:
props.get('CLOUD_COVER').getInfo()

0.03

### Make training dataset

There are several ways you can create a region for generating the training dataset.

- Draw a shape (e.g., rectangle) on the map and then use `region = Map.user_roi`
- Define a geometry, such as `region = ee.Geometry.Rectangle([-122.6003, 37.4831, -121.8036, 37.8288])`
- Create a buffer zone around a point, such as `region = ee.Geometry.Point([-122.4439, 37.7538]).buffer(10000)`
- If you don't define a region, it will use the image footprint by default

Here, we draw a random sample of pixels as the training data because an unsupervised classification has no labeled classes to sample from.
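
Conceptually, `image.sample` picks pixel locations at random and records the band values found there, one feature per point. A minimal sketch of that idea follows, with a NumPy array standing in for the image. The array `image_array` and its values are synthetic, and this is not Earth Engine's actual sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, so the same points come back on every run

# Stand-in image: 100 rows x 120 columns x 7 bands of made-up values.
image_array = rng.integers(0, 3000, size=(100, 120, 7))

# Draw random pixel locations (with replacement, for simplicity).
num_pixels = 500
rows = rng.integers(0, image_array.shape[0], size=num_pixels)
cols = rng.integers(0, image_array.shape[1], size=num_pixels)

# Each training record keeps the point location and its seven band values.
training_table = [
    {"row": int(r), "col": int(c), "bands": image_array[r, c].tolist()}
    for r, c in zip(rows, cols)
]
```

The result is a table of points with attributes rather than an image, which is the shape of data the clusterer trains on.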

In [8]:
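# These are alternative ways of making a ROI for the training data.
# Note: the example coordinates cover the San Francisco Bay Area, not the
# Chicago scene loaded above, so do not pass this region to image.sample() here.
# region = Map.user_roi
region = ee.Geometry.Rectangle([-122.6003, 37.4831, -121.8036, 37.8288])
# region = ee.Geometry.Point([-122.4439, 37.7538]).buffer(10000)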

In [9]:
Map = geemap.Map()
# Make the training dataset.
training = image.sample(**{
    #'region': region,
    'scale': 30,  # pixel resolution in meters at which to sample
    'numPixels': 5000,  # approximate number of points to sample
    'seed': 0,  # random seed, fixed so the sample is reproducible
    'geometries': True  # Set this to False to ignore geometries
})

Map.addLayer(training, {}, 'training', True)
Map


**Question 1:** Why does an unsupervised classification need a "training" dataset? Where are the DNs coming from for the training data? What format (raster vs. vector points, lines, or polygons) are the training data in?

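**Unsupervised classification has no labels, but the clusterer still needs a sample of pixel values in order to find the natural groupings in the data without human interaction. The DNs come from the bands of the Landsat 8 image at the sampled locations. The training data are vector points (a `FeatureCollection`), not raster.**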


### Train the clusterer

In [10]:
# Instantiate the clusterer and train it.
n_clusters = 5
clusterer = ee.Clusterer.wekaKMeans(n_clusters).train(training)

### Classify the image

In [11]:
Map = geemap.Map()
# Cluster the input using the trained clusterer.
result = image.cluster(clusterer)

# Display the clusters with random colors.
Map.addLayer(result.randomVisualizer(), {}, 'clusters')
Map


### Label the clusters

In [12]:
Map = geemap.Map()
legend_keys = ['One', 'Two', 'Three', 'Four', 'etc.']
legend_colors = ['#8DD3C7', '#FFFFB3', '#BEBADA', '#FB8072', '#80B1D3']

# Reclassify the map
result = result.remap([0, 1, 2, 3, 4], [1, 2, 3, 4, 5])

Map.addLayer(result, {'min': 1, 'max': 5, 'palette': legend_colors}, 'Labelled clusters')
Map.add_legend(legend_keys=legend_keys, legend_colors=legend_colors, position='bottomright')
Map
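
The `remap` call works like a lookup table: each pixel whose value appears in the first list is replaced by the value at the same position in the second list. A small NumPy sketch of that idea follows, assuming cluster IDs 0 to 4. The `cluster_ids` array and the `class_names` labels are invented for illustration.

```python
import numpy as np

# Hypothetical clusterer output: integer IDs 0-4 for a tiny 2 x 4 image.
cluster_ids = np.array([[0, 1, 1, 4],
                        [3, 2, 0, 4]])

from_values = [0, 1, 2, 3, 4]
to_values = [1, 2, 3, 4, 5]

# Build a lookup table so that lookup[old_id] == new_id.
lookup = np.zeros(max(from_values) + 1, dtype=int)
lookup[from_values] = to_values
remapped = lookup[cluster_ids]

# Placeholder names; replace them once you have inspected the map.
class_names = {1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five"}
named = [[class_names[v] for v in row] for row in remapped.tolist()]
```

The same pattern can merge clusters: pointing two old IDs at one new value combines them into a single class.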


**Question 2:** Which landcover types are represented by which values in the above result? Which classes were more successfully formed? Which classes need to be improved?

**One represents vegetation. Two represents water. Three represents lighter urbanization. Four represents intense urban development. Two and Four line up well with the imagery, but One and Three could use further improvement.**

### Export the result

Export the result directly to your computer:

In [13]:
import os
out_dir = os.path.join(os.path.expanduser('~'), 'Downloads')
out_file = os.path.join(out_dir, 'cluster.tif')

In [14]:
#geemap.ee_export_image(result, filename=out_file, scale=90)

Export the result to Google Drive:

In [15]:
geemap.ee_export_image_to_drive(result, description='clusters', folder='export', scale=90)
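
The `scale` argument sets the output pixel size in meters. Exporting at 90 m rather than the image's nominal 30 m (the `NOMINAL_SCALE` property shown earlier) means each output pixel covers a 3 x 3 block of source pixels, so the file holds roughly one ninth as many pixels. A quick arithmetic sketch follows; the 185 km by 180 km footprint is an approximate assumption, and `pixel_count` is a made-up helper.

```python
def pixel_count(width_m, height_m, scale_m):
    """Approximate number of pixels covering a rectangle at a given pixel size."""
    return round(width_m / scale_m) * round(height_m / scale_m)

# Assumed footprint of roughly 185 km x 180 km (approximate Landsat scene size).
width_m, height_m = 185_000, 180_000

native = pixel_count(width_m, height_m, 30)  # 30 m pixels
export = pixel_count(width_m, height_m, 90)  # 90 m pixels
ratio = native / export                      # close to 9
```

A coarser scale gives a smaller, faster export at the cost of spatial detail.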

## Part 2 - Your Turn: Masked Unsupervised Classification 
Directions:


*   Filter the Landsat 8 (or 9) Tier 1 Surface Reflectance collection (as above) to find the least cloudy image acquired over the St. Louis area in 2019.
*   Perform an unsupervised classification and rename the classes based on their apparent land cover.
*   Export the resulting TIFF.
*   Answer the following questions.
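
The first direction boils down to filter, sort, and take the first result. Here is a plain-Python sketch of that logic on a list of scene records. The records, IDs, and cloud-cover values are invented; in Earth Engine the same steps are `filterDate`, `sort('CLOUD_COVER')`, and `first()`, as in Part 1.

```python
from datetime import date

# Invented metadata records standing in for an ImageCollection.
scenes = [
    {"id": "scene_a", "date": date(2019, 3, 14), "CLOUD_COVER": 41.2},
    {"id": "scene_b", "date": date(2019, 7, 4), "CLOUD_COVER": 0.8},
    {"id": "scene_c", "date": date(2018, 9, 1), "CLOUD_COVER": 0.1},
    {"id": "scene_d", "date": date(2019, 10, 22), "CLOUD_COVER": 12.5},
]

start, end = date(2019, 1, 1), date(2019, 12, 31)

# Keep scenes inside the date window (end date excluded, mirroring filterDate).
in_2019 = [s for s in scenes if start <= s["date"] < end]

# Sort ascending by cloud cover and take the first record.
least_cloudy = sorted(in_2019, key=lambda s: s["CLOUD_COVER"])[0]
```

Note that `scene_c` has the lowest cloud cover overall but falls outside 2019, so the order of filtering and sorting matters.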

**Question 3:** What are some of the major landcover types you can observe in the Tier 1 imagery over St. Louis? What color is each in your display?

**Question 4:** How many clusters did you use for your classification, and why?

**Question 5:** Was your classification good, bad, or in between? How did you decide?





In [37]:
Map = geemap.Map()

# point = ee.Geometry.Point([-122.4439, 37.7538])
point = ee.Geometry.Point([-90.199402, 38.627003])

image = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR') \
    .filterBounds(point) \
    .filterDate('2019-01-01', '2019-12-31') \
    .sort('CLOUD_COVER') \
    .first() \
    .select('B[1-7]')

vis_params = {
    'min': 0,
    'max': 3000,
    'bands': ['B5', 'B4', 'B3']
}

Map.centerObject(point, 8)
Map.addLayer(image, vis_params, "Landsat-8")
Map


In [38]:
region = Map.user_roi
Map


In [39]:
Map = geemap.Map()
# Make the training dataset.
training = image.sample(**{
    'region': region,
    'scale': 30,  # pixel resolution in meters at which to sample
    'numPixels': 5000,  # approximate number of points to sample
    'seed': 0,  # random seed, fixed so the sample is reproducible
    'geometries': True  # Set this to False to ignore geometries
})

Map.addLayer(training, {}, 'training', True)
Map


In [40]:
# Instantiate the clusterer and train it.
n_clusters = 5
clusterer = ee.Clusterer.wekaKMeans(n_clusters).train(training)

In [41]:
Map = geemap.Map()
# Cluster the input using the trained clusterer.
result = image.cluster(clusterer)

# Display the clusters with random colors.
Map.addLayer(result.randomVisualizer(), {}, 'clusters')
Map


In [42]:
Map = geemap.Map()
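# One key per color (the two lists must be the same length).
# The order follows the colors reported in the Question 3 answer.
legend_keys = ['Other', 'Urban (yellow)', 'Water', 'Urban (red)', 'Vegetation']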
legend_colors = ['#8DD3C7', '#FFFFB3', '#BEBADA', '#FB8072', '#80B1D3']

# Reclassify the map
result = result.remap([0, 1, 2, 3, 4], [1, 2, 3, 4, 5])

Map.addLayer(result, {'min': 1, 'max': 5, 'palette': legend_colors}, 'Labelled clusters')
Map.add_legend(legend_keys=legend_keys, legend_colors=legend_colors, position='bottomright')
Map


In [43]:
import os
out_dir = os.path.join(os.path.expanduser('~'), 'Downloads')
out_file = os.path.join(out_dir, 'cluster2.tif')

In [44]:
geemap.ee_export_image_to_drive(result, description='clusters2', folder='export', scale=90)

**Question 3:** What are some of the major landcover types you can observe in the Tier 1 imagery over St. Louis? What color is each in your display?

**Water, vegetation, and urban sprawl, among others. Water is colored purple, vegetation is colored blue, urban sprawl is colored red and yellow, and whatever is left is colored green.**

**Question 4:** How many clusters did you use for your classification, and why?

**I used five. I tried four, but it left gaps in the classification. Five seemed to be the sweet spot to cover everything in the image.**

**Question 5:** Was your classification good, bad, or in between? How did you decide?

**In between. On one hand I think the classification correctly gave the landcover types the colors they were supposed to. On the other hand, the colors it gave them did not suit the types. Water should have been the one to get blue, not purple. Vegetation should be colored green not blue. Only urban sprawl really got an appropriate color classification.**