# Unsupervised Classification with Satellite Embedding Dataset



**Author**: Zhanchao Yang <br>
Weitzman School of Design, University of Pennsylvania

This tutorial is adapted from the official Google Earth Engine embedding tutorial: https://developers.google.com/earth-engine/tutorials/community/satellite-embedding-02-unsupervised-classification

In this tutorial, we take an unsupervised classification approach to crop mapping, which lets us perform this complex task without relying on field labels. The method combines local knowledge of the region with aggregate crop statistics, which are readily available for many parts of the world. The study area is Lancaster County in the state of Pennsylvania, United States, which grows a diverse mix of crops including corn, soybeans, and wheat.


In [None]:
import ee
import geemap

In [None]:
# Authenticate once per environment, then initialize with your own Cloud project ID
ee.Authenticate()
ee.Initialize(project="ee-zhanchaoyang")

## Define the study area

Lancaster County in Pennsylvania is one of the most productive agricultural counties in the United States. It is known for its fertile soil and favorable climate, which support a wide variety of crops. The county is particularly famous for its corn and soybean production, which are the two main crops grown in the area. In addition to these staple crops, Lancaster County also produces wheat, barley, oats, and various fruits and vegetables. The county's agricultural landscape is characterized by a mix of small family farms and larger commercial operations, contributing to its reputation as a leading agricultural region.


In [None]:
counties = ee.FeatureCollection("TIGER/2018/Counties")

In [None]:
lancaster = counties.filter(ee.Filter.eq("GEOID", "42071")).geometry()

In [None]:
m = geemap.Map(center=[40.04, -76.30], zoom=9)
m.addLayer(lancaster, {}, "Lancaster County")
m

## Loading satellite embedding and training dataset

In [None]:
embedding = ee.ImageCollection("GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL")

In [None]:
year = 2022
startdate = ee.Date.fromYMD(year, 1, 1)
enddate = ee.Date.fromYMD(year + 1, 1, 1)

In [None]:
study_embeddings = embedding.filter(ee.Filter.date(startdate, enddate)).filter(
    ee.Filter.bounds(lancaster)
)

In [None]:
embeddingsImage = study_embeddings.mosaic()

In [None]:
cdl = (
    ee.ImageCollection("USDA/NASS/CDL")
    .filter(ee.Filter.date("2022-01-01", "2023-01-01"))
    .first()
)
cropland = cdl.select("cropland")
# In the "cultivated" band, a value of 2 marks cultivated pixels
cropland_mask = cdl.select("cultivated").eq(2).rename("cropmask")

In [None]:
m = geemap.Map(center=[40.04, -76.30], zoom=9)
m.addLayer(
    cropland_mask.clip(lancaster),
    {"min": 0, "max": 1, "palette": ["white", "green"]},
    "Cropland Mask",
)
m

In [None]:
cluster_image = embeddingsImage.updateMask(cropland_mask).addBands(cropland_mask)

In [None]:
training = cluster_image.stratifiedSample(
    numPoints=1000,
    classBand="cropmask",
    region=lancaster,
    scale=10,
    tileScale=16,
    seed=100,
    dropNulls=True,
    geometries=True,
)
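
The embedding bands are masked outside cropland, and `dropNulls=True` skips pixels where any band is masked, so the sample should end up containing only cropland pixels. The following is a plain-Python sketch of that behaviour on made-up records. The function `stratified_sample` and the record layout are hypothetical illustrations, not part of the Earth Engine API.

```python
import random
from collections import defaultdict


def stratified_sample(records, class_key, num_points, seed=100):
    """Draw up to num_points records per class, skipping records with nulls."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        # Mimic dropNulls=True: skip records where any value is missing.
        if any(value is None for value in rec.values()):
            continue
        by_class[rec[class_key]].append(rec)
    sample = []
    for cls in sorted(by_class):
        members = by_class[cls]
        sample.extend(rng.sample(members, min(num_points, len(members))))
    return sample


# Made-up pixels: class 0 has a masked (None) embedding value, class 1 does not.
# "A00" is only an illustrative key for one embedding band.
pixels = [{"cropmask": 0, "A00": None} for _ in range(5)]
pixels += [{"cropmask": 1, "A00": 0.01 * i} for i in range(20)]

sample = stratified_sample(pixels, "cropmask", num_points=10)
print(len(sample), {rec["cropmask"] for rec in sample})
```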

In [None]:
m.addLayer(training.style(**{"color": "red", "pointSize": 3}), {}, "Training Points")
m

In [None]:
mincluster = 4
maxcluster = 5

In [None]:
clusterer = ee.Clusterer.wekaCascadeKMeans(
    minClusters=mincluster,
    maxClusters=maxcluster,
).train(features=training, inputProperties=cluster_image.bandNames())

clustered = cluster_image.cluster(clusterer)
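
The Earth Engine reference describes `wekaCascadeKMeans` as choosing the best number of clusters between `minClusters` and `maxClusters` according to the Calinski-Harabasz criterion. The sketch below illustrates that idea in plain Python on toy 2-D points. It is a simplified stand-in under our own assumptions (random initialization, a fixed iteration cap, hypothetical function names), not the Weka implementation.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers


def calinski_harabasz(points, labels, centers):
    n, k = len(points), len(centers)
    overall = centroid(points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in range(k))
    return (between / (k - 1)) / (within / (n - k))


def pick_k(points, min_k, max_k, seed=0):
    # Run k-means for each k and keep the k with the highest score.
    best = None
    for k in range(min_k, max_k + 1):
        labels, centers = kmeans(points, k, seed=seed)
        score = calinski_harabasz(points, labels, centers)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]


# Toy data: four noisy blobs standing in for embedding vectors.
rng = random.Random(42)
blob_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
          for cx, cy in blob_centers for _ in range(25)]

best_k, labels = pick_k(points, min_k=2, max_k=6)
print("selected k:", best_k, "clusters used:", len(set(labels)))
```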

In [None]:
vis = clustered.randomVisualizer().clip(lancaster)
m.addLayer(vis, {}, "Clustered Image")
m

In [None]:
area_image = ee.Image.pixelArea().divide(4046.86).addBands(clustered)
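
`ee.Image.pixelArea()` reports area in square meters, so dividing by 4046.86 expresses each pixel's area in acres. A minimal sketch of the same arithmetic in plain Python, assuming a nominal 10 m x 10 m pixel (the helper name `sq_m_to_acres` is ours):

```python
SQ_M_PER_ACRE = 4046.86  # same constant as in the cell above


def sq_m_to_acres(area_m2: float) -> float:
    """Convert an area in square meters to acres."""
    return area_m2 / SQ_M_PER_ACRE


# A nominal 10 m x 10 m pixel; real pixel areas vary slightly with latitude.
pixel_acres = sq_m_to_acres(10 * 10)
pixels_per_acre = 1 / pixel_acres
print(round(pixel_acres, 4), round(pixels_per_acre, 1))
```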

In [None]:
areas = area_image.reduceRegion(
    reducer=ee.Reducer.sum().group(
        groupField=1,
        groupName="cluster",
    ),
    geometry=lancaster,
    scale=10,
    maxPixels=1e10,
)

In [None]:
print(areas.getInfo())
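
The grouped reducer returns a dictionary with a `groups` list, where each entry holds the `cluster` id and the summed area under `sum`. The sketch below shows one way to rank clusters by area and compute their shares once the result is on the client. The values in `sample` are made up for illustration and the helper `summarize` is hypothetical.

```python
# Hypothetical values, shaped like the dictionary printed by areas.getInfo().
sample = {
    "groups": [
        {"cluster": 0, "sum": 50000.0},
        {"cluster": 1, "sum": 30000.0},
        {"cluster": 2, "sum": 20000.0},
    ]
}


def summarize(grouped):
    """Return (cluster, acres, share) tuples sorted by area, largest first."""
    groups = grouped["groups"]
    total = sum(g["sum"] for g in groups)
    rows = [(g["cluster"], g["sum"], g["sum"] / total) for g in groups]
    return sorted(rows, key=lambda row: row[1], reverse=True)


rows = summarize(sample)
for cluster, acres, share in rows:
    print(f"cluster {cluster}: {acres:,.0f} acres ({share:.1%})")
```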

In [None]:
# The grouped reducer stores its per-cluster results under the "groups" key
cluster_areas = ee.List(areas.get("groups"))

In [None]:
def to_feature(item):
    d = ee.Dictionary(item)
    return ee.Feature(
        None, {"cluster": d.getNumber("cluster").format(), "area": d.getNumber("sum")}
    )


cluster_area_fc = ee.FeatureCollection(cluster_areas.map(to_feature))

In [None]:
print(cluster_area_fc.limit(10).getInfo())

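## Validating classification results

Compare the cluster areas computed above with the aggregate crop statistics for the county:

- Corn for grain: 95,549 + 35,988 = 131,537; prediction = 119,071
- Forage (hay/haylage), all: 65,142 (others)
- Soybeans for beans: 51,695
- Wheat for grain, all: 24,101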
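
One simple way to attach crop labels to clusters is to rank both the clusters and the reference statistics by area and pair them in order. The sketch below does this with the reference figures listed above. Only the value 119,071 comes from this tutorial; the other cluster areas are made-up placeholders, and rank matching is a simplifying assumption rather than a prescribed method.

```python
# Reference figures from the list above.
reference = {
    "corn": 95_549 + 35_988,
    "forage": 65_142,
    "soybeans": 51_695,
    "wheat": 24_101,
}

# Cluster areas: 119,071 is the prediction quoted above; the rest are hypothetical.
cluster_areas = {0: 60_000.0, 1: 119_071.0, 2: 20_000.0, 3: 50_000.0}


def match_by_rank(clusters, crops):
    """Pair the largest cluster with the largest crop, and so on down the ranks."""
    ranked_clusters = sorted(clusters, key=clusters.get, reverse=True)
    ranked_crops = sorted(crops, key=crops.get, reverse=True)
    return dict(zip(ranked_clusters, ranked_crops))


labels = match_by_rank(cluster_areas, reference)
errors = {}
for cluster_id, crop in labels.items():
    # Relative difference between the cluster area and the reference figure.
    errors[crop] = (cluster_areas[cluster_id] - reference[crop]) / reference[crop]
    print(f"cluster {cluster_id} -> {crop}: {errors[crop]:+.1%}")
```

A large relative difference for a crop suggests that the cluster mixes several crop types or that the ranking assumption does not hold for that pair.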

In [None]:
# Reuse the 2022 CDL cropland band loaded earlier, restricted to cultivated pixels
cropmap = cropland.updateMask(cropland_mask).rename("crops")

In [None]:
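# Source values for the remap: CDL class codes 0 to 254
cropclasses = ee.List.sequence(0, 254)

In [None]:
# Map CDL code 1 (corn) to class 1 and code 5 (soybeans) to class 2; all other codes become 0
targetclasses = ee.List.repeat(0, 255).set(1, 1).set(5, 2)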

In [None]:
cropmapreclass = cropmap.remap(cropclasses, targetclasses).rename("crops")
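
The remap is a lookup table: position `i` in the target list gives the new class for CDL code `i`. A plain-Python sketch of the same table follows. The helper `reclass` is hypothetical, and returning `None` for codes outside the table stands in for the pixel being masked.

```python
source_classes = list(range(255))  # 0..254, like ee.List.sequence(0, 254)
target_classes = [0] * 255         # like ee.List.repeat(0, 255)
target_classes[1] = 1              # CDL code 1 (corn) -> class 1
target_classes[5] = 2              # CDL code 5 (soybeans) -> class 2

lookup = dict(zip(source_classes, target_classes))


def reclass(code):
    """Return the new class for a CDL code, or None if the code is not listed."""
    return lookup.get(code)


# Codes 24 and 36 are other CDL classes and fall into the "other" class 0.
print([reclass(code) for code in (1, 5, 24, 36, 255)])
```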

In [None]:
crop_vis = {"min": 0, "max": 2, "palette": ["#bdbdbd", "#ffd400", "#267300"]}
m.addLayer(cropmapreclass.clip(lancaster), crop_vis, "Reclassified Crop Map")
m