<a href="https://colab.research.google.com/github/YoungHyunKoo/GEE_remote_sensing/blob/main/Week4/4_1_Pixel_based_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **[GEO 6083] Remote Sensing Image Processing - Spring 2024**
# **WEEK 4-1. Pixel-based image classification**

### OBJECTIVE
(1) Implement pixel-based supervised classification

(2) Implement pixel-based unsupervised classification

(3) Assess the accuracy of classification result

Created by Younghyun Koo (kooala317@gmail.com)





# **What is classification?**

Remote sensing images cover large geographical areas. To derive land use and land cover information from them, we need image interpretation and image classification. Image classification is the process of assigning land cover classes to image pixels. There are two basic types of classification: **(1) Supervised classification** and **(2) Unsupervised classification**.

### **(1) Supervised Classification**
In supervised classification, you select "training" samples and let the computer classify your image based on those samples. Each training sample represents a specific class (e.g., vegetation, tree, urban area, water), and the samples serve as references for classifying all other pixels. The user chooses the training samples based on their knowledge of the area. There are several algorithms for supervised classification, but the fundamental idea is the same: each pixel is assigned to the class whose training samples it most closely resembles, as measured by statistics computed from those samples.
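
As a rough illustration of this idea, the sketch below implements a minimum-distance-to-means classifier in plain Python: it averages the training pixels of each class and assigns every other pixel to the class with the nearest mean. The two-band reflectance values, the class names, and the helper names `class_means` and `classify_pixel` are assumptions made for this example; this is not the algorithm used later in the tutorial.

```python
from math import dist

# Hypothetical training samples: class name -> list of (red, nir) reflectances.
training_samples = {
    "water": [(0.04, 0.02), (0.05, 0.03)],
    "vegetation": [(0.05, 0.40), (0.07, 0.44)],
    "urban": [(0.25, 0.28), (0.27, 0.30)],
}


def class_means(samples):
    """Average the training pixels of each class, band by band."""
    means = {}
    for name, pixels in samples.items():
        n_bands = len(pixels[0])
        means[name] = tuple(
            sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)
        )
    return means


def classify_pixel(pixel, means):
    """Return the class whose mean spectrum is closest to the pixel."""
    return min(means, key=lambda name: dist(pixel, means[name]))


means = class_means(training_samples)
unlabeled_pixels = [(0.045, 0.025), (0.06, 0.42), (0.26, 0.29)]
predicted = [classify_pixel(p, means) for p in unlabeled_pixels]
print(predicted)
```

Real classifiers such as CART or random forest use more flexible decision rules, but they follow the same pattern: learn from labeled samples, then label every remaining pixel.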

### **(2) Unsupervised Classification**
Unsupervised classification generates clusters based on similar spectral characteristics inherent in the image. The computer uses various techniques to determine which pixels are related and groups them into classes. All this is done without the help of training data or prior knowledge. Since the algorithm cannot tell which land cover each cluster represents, you have to determine the correspondence between the spectral classes that the algorithm defines and the actual land cover classes.
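
One simple way to establish that correspondence, assuming a reference map is available for the same pixels, is to give each cluster the reference class that occurs most often inside it. The sketch below shows this majority vote in plain Python; the cluster IDs, the reference labels, and the helper name `label_clusters` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical per-pixel values: cluster IDs from a clusterer and land cover
# names from a reference map at the same pixel locations.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 1]
reference = ["water", "water", "grass", "tree", "tree",
             "urban", "urban", "urban", "grass", "grass"]


def label_clusters(cluster_ids, reference):
    """Map each cluster ID to its most frequent reference class."""
    votes = defaultdict(Counter)
    for cid, ref in zip(cluster_ids, reference):
        votes[cid][ref] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


cluster_labels = label_clusters(cluster_ids, reference)
print(cluster_labels)
```

When no reference map exists, the same labeling is done by visual interpretation of each cluster.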

<img src = "https://media.licdn.com/dms/image/C4E12AQGbAM-_eriLMw/article-cover_image-shrink_720_1280/0/1632916005843?e=2147483647&v=beta&t=tK9-0A3zL7uHv_CVIaRZKIiw7vqSsHXfKWg4O2_PS64" width = 700>

<img src = "https://remotesensinginactionlearningblog.files.wordpress.com/2014/11/capture.jpg?w=640" width = 700>




# **Pixel-based v. Object-based classification**
One of the traditional ways to classify remote sensing imagery is **pixel-based** classification. This approach relies only on the value of each pixel, so every pixel is assigned to a class independently of its neighbors. As a result, pixel-based classification often produces a *salt-and-pepper* effect: isolated pixels whose class differs from that of their surroundings. To avoid this effect, we can use **object-based** image classification, which first groups neighboring pixels into objects based on their similarity. In this tutorial, we will first learn how to conduct pixel-based image classification using Google Earth Engine.

<img src = "https://www.researchgate.net/publication/271197176/figure/fig3/AS:295146110373904@1447379724740/A-comparison-between-pixel-based-and-object-based-classification-results-The-first-row.png" width = 800>


# **Practice supervised classification**

First, let's start with the pixel-based **supervised** classification.

In [None]:
# Import ee library
import ee

# Authenticate
ee.Authenticate()

# Initialize with your own project.
ee.Initialize(project = "utsa-spring2024")

In [None]:
# Import geemap library
import geemap

In [None]:
# Import geopandas and pandas library
import geopandas
import pandas

Google Earth Engine provides the `Classifier` package for supervised classification. It supports traditional algorithms, including CART, RandomForest, NaiveBayes, and SVM (support vector machine). Please see this link for more details: [link](https://developers.google.com/earth-engine/guides/classification). The general workflow for classification is as follows:

**(1) Collect training data**: Assemble features which have a property that stores the known class label and properties storing numeric values for the predictors.

**(2) Instantiate a classifier**: Set its parameters if necessary.

**(3) Train the classifier**: Use the training dataset.

**(4) Classify images**

**(5) Estimate classification error**: Use independent validation dataset.

First, let's import a Landsat image.

In [None]:
# Region of interest - San Antonio
point = ee.Geometry.Point([-98.47, 29.43])

image = (
    ee.ImageCollection('LANDSAT/LC08/C01/T1_TOA')
    .filterBounds(point)
    .filterDate('2016-01-01', '2016-12-31')
    .sort('CLOUD_COVER')
    .first()
    .select('B[1-7]')
)

box = ee.Geometry.Rectangle(
  [
    [-98.95, 29.02],
    [-98.05, 29.74]
  ]
)

image = image.clip(box)

vis_params = {'min': 0, 'max': 0.3, 'bands': ['B4', 'B3', 'B2'], 'alpha': 1.0}

Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(image, vis_params, "Landsat-8")

Map


In [None]:
fc_params = {'min': 0, 'max': 0.3, 'bands': ['B5', 'B4', 'B3'], 'alpha': 1.0}

Map.centerObject(image, 10)
Map.addLayer(image, fc_params, "Landsat-8 False color")

Map


## Collect training samples by yourself

Now, you will collect points of different land types for the surface classification. In this tutorial, we will assume there are 5 surface types in this area: (1) Quarry & Concrete, (2) Water, (3) Tree/Forest, (4) Grass, and (5) Bare soil.

(1) Quarry & Concrete

In general, quarry & concrete structures should look very bright and white in the true color image.

In [None]:
Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(image, vis_params, "Landsat-8")

Map


In [None]:
roi = ee.FeatureCollection(Map.draw_features)
concrete = geemap.ee_to_gdf(roi)
concrete['label'] = "concrete"
concrete['id'] = 0
concrete

Unnamed: 0,geometry,label,id
0,POINT (-98.56577 29.62167),concrete,0
1,POINT (-98.57846 29.60943),concrete,0
2,POINT (-98.56714 29.61406),concrete,0
3,POINT (-98.37045 29.61809),concrete,0
4,POINT (-98.36650 29.62913),concrete,0
5,POINT (-98.28842 29.64554),concrete,0
6,POINT (-98.28430 29.64107),concrete,0
7,POINT (-98.19980 29.69163),concrete,0
8,POINT (-98.28915 29.60025),concrete,0
9,POINT (-98.29593 29.60689),concrete,0


(2) Water

Let's digitize waterbodies. There are three large lakes near San Antonio.

In [None]:
Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(image, vis_params, "Landsat-8")

Map


In [None]:
roi = ee.FeatureCollection(Map.draw_features)
water = geemap.ee_to_gdf(roi)
water['label'] = "water"
water['id'] = 1
water

Unnamed: 0,geometry,label,id
0,POINT (-98.30225 29.32607),water,1
1,POINT (-98.30800 29.31035),water,1
2,POINT (-98.31066 29.30010),water,1
3,POINT (-98.31958 29.28894),water,1
4,POINT (-98.31701 29.28288),water,1
5,POINT (-98.30585 29.28939),water,1
6,POINT (-98.38523 29.25039),water,1
7,POINT (-98.37982 29.24657),water,1
8,POINT (-98.37039 29.24529),water,1
9,POINT (-98.37219 29.25548),water,1


(3) Tree / forest

Next, we will find tree or forest land cover.

In [None]:
Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(image, vis_params, "Landsat-8")

Map


In [None]:
roi = ee.FeatureCollection(Map.draw_features)
tree = geemap.ee_to_gdf(roi)
tree['label'] = "tree"
tree['id'] = 2
tree

Unnamed: 0,geometry,label,id
0,POINT (-98.77159 29.56420),tree,2
1,POINT (-98.76024 29.57659),tree,2
2,POINT (-98.75956 29.56943),tree,2
3,POINT (-98.77569 29.55479),tree,2
4,POINT (-98.75166 29.56106),tree,2
5,POINT (-98.74960 29.58555),tree,2
6,POINT (-98.53718 29.66643),tree,2
7,POINT (-98.54611 29.66136),tree,2
8,POINT (-98.36478 29.63846),tree,2
9,POINT (-98.36993 29.64002),tree,2


(4) Grass

Next, we will define grass land covers.

In [None]:
Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(image, vis_params, "Landsat-8")

Map


In [None]:
roi = ee.FeatureCollection(Map.draw_features)
grass = geemap.ee_to_gdf(roi)
grass['label'] = "grass"
grass['id'] = 3
grass

Unnamed: 0,geometry,label,id
0,POINT (-98.31890 29.38876),grass,3
1,POINT (-98.32575 29.34575),grass,3
2,POINT (-98.38213 29.34537),grass,3
3,POINT (-98.45756 29.29635),grass,3
4,POINT (-98.43027 29.26933),grass,3
5,POINT (-98.33261 29.21563),grass,3
6,POINT (-98.30875 29.20042),grass,3


(5) Bare soil

Finally, we will define bare soil land covers. You can also treat this class as cropland or another similar land cover type.

In [None]:
Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(image, vis_params, "Landsat-8")

Map


In [None]:
roi = ee.FeatureCollection(Map.draw_features)
soil = geemap.ee_to_gdf(roi)
soil['label'] = "soil"
soil['id'] = 4
soil

Unnamed: 0,geometry,label,id
0,POINT (-98.21986 29.26985),soil,4
1,POINT (-98.20819 29.26716),soil,4
2,POINT (-98.21420 29.26521),soil,4
3,POINT (-98.21076 29.26334),soil,4
4,POINT (-98.16280 29.25398),soil,4
5,POINT (-98.15963 29.26109),soil,4
6,POINT (-98.15465 29.24829),soil,4
7,POINT (-98.11404 29.30736),soil,4
8,POINT (-98.12210 29.30691),soil,4


Now, let's combine all of the training samples. We will store them in a `geopandas` GeoDataFrame.

In [None]:
# DataFrame.append was removed in pandas 2.0, so combine the tables with pandas.concat.
gdf = pandas.concat([concrete, water, tree, grass, soil], ignore_index=True)
gdf["latitude"] = gdf.geometry.y
gdf["longitude"] = gdf.geometry.x
gdf

Unnamed: 0,geometry,label,id,latitude,longitude
0,POINT (-98.56577 29.62167),concrete,0,29.621669,-98.565766
1,POINT (-98.57846 29.60943),concrete,0,29.609431,-98.578465
2,POINT (-98.56714 29.61406),concrete,0,29.614058,-98.567139
3,POINT (-98.37045 29.61809),concrete,0,29.618087,-98.370448
4,POINT (-98.36650 29.62913),concrete,0,29.62913,-98.366501
5,POINT (-98.28842 29.64554),concrete,0,29.645542,-98.288421
6,POINT (-98.28430 29.64107),concrete,0,29.641066,-98.284303
7,POINT (-98.19980 29.69163),concrete,0,29.691632,-98.199798
8,POINT (-98.28915 29.60025),concrete,0,29.600252,-98.289147
9,POINT (-98.29593 29.60689),concrete,0,29.606894,-98.295926


In [None]:
# Convert geodataframe into ee feature collection
training = geemap.gdf_to_ee(gdf)

In [None]:
# Check the size of the training samples
training.size().getInfo()

59

## Train classifier

Now the training dataset is ready! We will train a CART (Classification and Regression Trees) classifier using this training dataset. CART is a decision tree algorithm that is used for both classification and regression tasks. This algorithm is provided as a built-in function of Google Earth Engine, so you can easily import and use this function. You can find more details about this algorithm here:
- ["Classification and Regression Trees" by Leo Breiman](https://www.taylorfrancis.com/books/mono/10.1201/9781315139470/classification-regression-trees-leo-breiman-jerome-friedman-olshen-charles-stone)
- [ee.Classifier.smileCart](https://developers.google.com/earth-engine/apidocs/ee-classifier-smilecart)
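
To give a feel for how a decision tree such as CART picks a split, the sketch below searches one band for the threshold that minimizes the weighted Gini impurity of the two resulting groups. It is a simplified illustration in plain Python, not the `smileCart` implementation; the reflectance values, the class IDs, and the helper names `gini` and `best_threshold` are assumptions made for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the threshold on one band that minimizes weighted Gini impurity."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        t = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical near-infrared reflectances and class IDs (1 = water, 2 = tree).
nir = [0.02, 0.03, 0.05, 0.35, 0.40, 0.45]
ids = [1, 1, 1, 2, 2, 2]
threshold, impurity = best_threshold(nir, ids)
print(threshold, impurity)
```

A full tree repeats this search over every band and then recursively within each resulting group until the groups are pure or a stopping rule is reached.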

In [None]:
# Use these bands for prediction.
bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7']

# This property of the table stores the land cover labels.
label = 'id'

# Overlay the points on the imagery to get training.
sample = image.select(bands).sampleRegions(
    **{'collection': training, 'properties': [label], 'scale': 30}
)

# Train a CART classifier with default parameters.
classifier = ee.Classifier.smileCart().train(sample, label, bands)

## Apply classifier to the entire image

The CART classifier is now trained on the training samples. Let's apply it to every pixel in the image.

In [None]:
# Classify the image with the same bands used for training.
result = image.select(bands).classify(classifier)

Map = geemap.Map()

Map.centerObject(image, 10)

# Original image
Map.addLayer(image, vis_params, "Landsat-8")

# Display the classified image with one fixed color per class ID (0-4).
# Alternative with random colors: Map.addLayer(result.randomVisualizer(), {}, 'classified')
Map.addLayer(result, {'palette': ['red', 'blue', 'green', 'cyan', 'yellow'], 'min':0, 'max':4}, 'classified')

Map


## Collect training samples from external ground truth data sources

In the previous steps, we defined our own training samples by manually digitizing 5 different land covers. However, if ground truth data are available, we can import them and use them as training samples. In this example, we will use the [USGS National Land Cover Database (NLCD)](https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD) to create a labeled dataset for training.
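
Conceptually, drawing training points from a reference map means picking random pixel locations and reading the label stored at each one. The sketch below mimics that on a tiny hypothetical raster using only the Python standard library; the raster values and the helper name `sample_pixels` are illustrative assumptions and do not reproduce how Earth Engine samples an image internally.

```python
import random

# Hypothetical 4 x 5 reference raster of land cover codes.
reference_raster = [
    [11, 11, 21, 21, 21],
    [11, 41, 41, 21, 21],
    [71, 41, 41, 41, 21],
    [71, 71, 41, 41, 21],
]


def sample_pixels(raster, num_pixels, seed=0):
    """Draw pixel locations without replacement and attach their labels."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    locations = [
        (r, c) for r in range(len(raster)) for c in range(len(raster[0]))
    ]
    chosen = rng.sample(locations, num_pixels)
    return [{"row": r, "col": c, "landcover": raster[r][c]} for r, c in chosen]


sampled_points = sample_pixels(reference_raster, num_pixels=6, seed=0)
for point in sampled_points:
    print(point)
```

Because the draw is purely random, common classes receive more samples than rare ones, which is worth keeping in mind when interpreting the classifier's accuracy.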

In [None]:
# Visualize this land cover dataset
nlcd = ee.Image('USGS/NLCD/NLCD2016').select('landcover').clip(image.geometry())

Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(nlcd, {}, 'NLCD')
Map


In [None]:
# Randomly sample the training points from the NLCD land cover data

points = nlcd.sample(
    **{
        'region': image.geometry(),
        'scale': 30,
        'numPixels': 1000,
        'seed': 0,
        'geometries': True,  # Set this to False to ignore geometries
    }
)

Map.addLayer(points, {}, 'training')
Map


In [None]:
# Number of training sample points
print(points.size().getInfo())

1000


In [None]:
# Information of the first sample point
print(points.first().getInfo())

{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-98.5992888771733, 29.571684216951827]}, 'id': '0', 'properties': {'landcover': 23}}


Now we will train the classifier using these training samples. Here we will use the same CART classifier.

In [None]:
# Use these bands for prediction.
bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7']

# This property of the table stores the land cover labels.
label = 'landcover'

# Overlay the points on the imagery to get training.
sample = image.select(bands).sampleRegions(
    **{'collection': points, 'properties': [label], 'scale': 30}
)

# Train a CART classifier with default parameters.
classifier = ee.Classifier.smileCart().train(sample, label, bands)

In [None]:
# Classify the image with the same bands used for training.
result = image.select(bands).classify(classifier)

Map = geemap.Map()

Map.centerObject(image, 10)

# Display the classes with random colors.
Map.addLayer(result.randomVisualizer(), {}, 'classified')
Map


In [None]:
# Apply the NLCD symbology to the classified image.
Map = geemap.Map()

class_values = nlcd.get('landcover_class_values').getInfo()
class_values

class_palette = nlcd.get('landcover_class_palette').getInfo()
class_palette

landcover = result.set('classification_class_values', class_values)
landcover = landcover.set('classification_class_palette', class_palette)

Map.addLayer(image, vis_params, "VIS")
Map.addLayer(landcover, {}, 'Land cover')
Map.add_legend(builtin_legend='NLCD')
Map.centerObject(image, 10)
Map


In [None]:
# Visualize and compare the classification result and the reference data
print('Change layer opacity:')
cluster_layer = Map.layers[-1]
cluster_layer.interact(opacity=(0, 1, 0.1))

Change layer opacity:



**[Question]**

Please compare the classified image with the NLCD reference land cover. What do they look like? Do they show similar land covers?

***DO IT YOURSELF!!***
- Please try another classifier as the training model. (e.g., SVM, random forest)
- SVM: [how to use](https://developers.google.com/earth-engine/apidocs/ee-classifier-libsvm)
- Random forest: [how to use](https://developers.google.com/earth-engine/apidocs/ee-classifier-smilerandomforest)

In [None]:
# Use SVM


In [None]:
# Use Random forest


## **Accuracy assessment of the classifier**

Now we will assess the accuracy of the classifier. To do so, we need to divide the samples into two datasets: training and test. We will assess the training accuracy with the training dataset and the test accuracy with the test dataset.
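
The split used here can be pictured as follows: every sample receives a pseudorandom number between 0 and 1, and the samples whose number falls below the split value go to the training set. The plain-Python sketch below imitates that logic; the helper name `random_split` and the integer stand-ins for samples are assumptions for illustration only.

```python
import random


def random_split(samples, split=0.7, seed=0):
    """Assign each sample to training or test using a per-sample random number."""
    rng = random.Random(seed)
    training_part, test_part = [], []
    for sample in samples:
        # rng.random() returns a value in [0, 1).
        if rng.random() < split:
            training_part.append(sample)
        else:
            test_part.append(sample)
    return training_part, test_part


samples = list(range(1000))  # stand-ins for 1000 sampled points
training_part, test_part = random_split(samples)
print(len(training_part), len(test_part))
```

Because each sample is assigned independently, the resulting proportions are only approximately 70% and 30%.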

In [None]:
# Adds a column of deterministic pseudorandom numbers.
sample = sample.randomColumn()

split = 0.7

training = sample.filter(ee.Filter.lt('random', split))  # Training dataset: about 70% of the samples
test = sample.filter(ee.Filter.gte('random', split))  # Test dataset: about 30% of the samples

# Train a CART classifier with default parameters.
classifier = ee.Classifier.smileCart().train(training, label, bands)

In [None]:
# train accuracy
train_accuracy = classifier.confusionMatrix()
train_accuracy.accuracy().getInfo()

1

In [None]:
# test accuracy
tested = test.classify(classifier)
test_accuracy = tested.errorMatrix('landcover', 'classification')

test_accuracy.accuracy().getInfo()

0.35313531353135313
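
Both accuracy values are computed from an error (confusion) matrix: overall accuracy is the number of samples on the diagonal divided by the total number of samples. The following plain-Python sketch shows that calculation on a handful of hypothetical NLCD-style class codes. The function names `error_matrix` and `overall_accuracy` and the sample values are illustrative assumptions, not part of the Earth Engine API. Rows are indexed by the actual class and columns by the predicted class.

```python
def error_matrix(actual, predicted, classes):
    """Count samples per (actual class, predicted class) pair."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical land cover codes for ten test samples.
actual = [11, 11, 21, 21, 21, 41, 41, 71, 71, 71]
predicted = [11, 21, 21, 21, 41, 41, 41, 71, 21, 71]
classes = [11, 21, 41, 71]

matrix = error_matrix(actual, predicted, classes)
accuracy = overall_accuracy(matrix)
for row in matrix:
    print(row)
print(accuracy)
```

A training accuracy of 1 next to a much lower test accuracy, as in the outputs above, usually suggests that the tree has memorized its training samples, which is why an independent test set is needed.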

# **Practice unsupervised Classification**

While the `Classifier` package handles the supervised classification problems, the `Clusterer` package handles unsupervised classification (or clustering) in GEE.

Clusterers are used in the same manner as classifiers in Earth Engine. The general workflow for clustering is:

**(1) Assemble features** with numeric properties in which to find clusters.

**(2) Instantiate a clusterer**: Set its parameters if necessary.

**(3) Train the clusterer**: Use the training data.

**(4) Apply the clusterer** to an image or feature collection.

**(5) Label the clusters**


Unlike a `Classifier`, a `Clusterer` requires no input class values. Once a `Clusterer` is trained, it can be applied to an image or table. It assigns a unique integer cluster ID to each pixel or feature.

In [None]:
Map = geemap.Map()

Map.centerObject(image, 10)
Map.addLayer(image, vis_params, "Landsat-8")

Map


In [None]:
# Check image date
ee.Date(image.get('system:time_start')).format('YYYY-MM-dd').getInfo()

'2016-01-28'

In [None]:
# Check cloud covers
image.get('CLOUD_COVER').getInfo()

0.07000000029802322

As in the supervised classification practice, we will first sample training pixels from the image.

In [None]:
# Make the training dataset.
training = image.sample(
    **{
        #     'region': region,
        'scale': 30,
        'numPixels': 10000,
        'seed': 0,
        'geometries': True,  # Set this to False to ignore geometries
    }
)

Map.addLayer(training, {}, 'training')
Map


We will use the K-means clustering algorithm, one of the most widely used unsupervised algorithms in remote sensing.
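
For intuition, K-means alternates between two steps: assign every pixel to its nearest cluster center, then move each center to the mean of the pixels assigned to it. The sketch below is a minimal plain-Python version of that loop on hypothetical two-band pixels. The starting centers are picked by hand for reproducibility, and the function name `kmeans` is an assumption; it is not the `wekaKMeans` implementation.

```python
from math import dist


def kmeans(points, centers, n_iter=10):
    """Plain K-means (Lloyd's algorithm) starting from the given centers."""
    centers = [tuple(c) for c in centers]
    assignments = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centers.append(
                    tuple(sum(band) / len(members) for band in zip(*members))
                )
            else:
                new_centers.append(old_center)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return assignments, centers


# Hypothetical (red, nir) reflectances for six pixels.
pixels = [(0.04, 0.02), (0.05, 0.03), (0.06, 0.40),
          (0.07, 0.44), (0.25, 0.28), (0.27, 0.30)]
initial_centers = [pixels[0], pixels[2], pixels[4]]
cluster_ids, centers = kmeans(pixels, initial_centers)
print(cluster_ids)
```

The cluster IDs carry no meaning on their own: they only say which pixels are spectrally similar, not which land cover they represent.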



In [None]:
# Instantiate the clusterer and train it.
n_clusters = 8
clusterer = ee.Clusterer.wekaKMeans(n_clusters).train(training)

In [None]:
# Cluster the input using the trained clusterer.
result = image.cluster(clusterer)

# Display the clusters with random colors.
Map.addLayer(result.randomVisualizer(), {}, 'clusters')
Map


In [None]:
nlcd = ee.Image('USGS/NLCD/NLCD2016').select('landcover').clip(image.geometry())
Map.addLayer(nlcd, {}, 'NLCD')
Map


In [None]:
print('Change layer opacity:')
cluster_layer = Map.layers[-1]
cluster_layer.interact(opacity=(0, 1, 0.1))

Change layer opacity:



**[Question]**

- Please compare the clusters from the K-means algorithm with the NLCD reference land cover. What do they look like? Do they show similar land cover patterns?
- In your opinion, what land cover is represented by each cluster?

***DO IT YOURSELF!!***
- In the previous example, we set the number of clusters to 8 (`n_clusters = 8`), so the result has only 8 clusters (classes). However, you can also change the number of clusters. Please change the number of clusters and see how the result changes. When you compare the result with the reference NLCD data, how many clusters work best for the classification?

## References
- https://geemap.org/tutorials/#geemap-tutorials
- https://developers.google.com/earth-engine/guides/classification
- https://developers.google.com/earth-engine/guides/clustering
- https://developers.google.com/earth-engine/apidocs/ee-classifier-smilecart