In [1]:
import geemap
import ee

In [2]:
ee.Initialize(project='useful-theory-442820-q8')

*** Earth Engine *** Share your feedback by taking our Annual Developer Satisfaction Survey: https://google.qualtrics.com/jfe/form/SV_7TDKVSyKvBdmMqW?ref=4i2o6


# Similarity Search with Alpha Earth Satellite Embeddings

This notebook demonstrates how to use Google's Satellite Embedding dataset to find similar objects/features using Earth Engine. We'll implement a similarity search that can identify objects of interest (e.g., grain silos) by comparing embedding vectors.

## Key Concepts:
- **Satellite Embeddings**: High-dimensional vector representations of satellite imagery patches
- **Similarity Search**: Using dot product between embedding vectors to find similar locations
- **Reference Locations**: Sample points representing the target features we want to find

## Workflow:
1. Define a search region and reference locations
2. Load and process the Satellite Embedding dataset
3. Extract embedding vectors from reference locations
4. Calculate similarity scores across the region
5. Apply threshold and extract matches

In [3]:
# Additional imports for data manipulation and visualization
import pandas as pd
import numpy as np

## 1. Select the Search Region

We'll use Franklin County, Kansas as our search area for finding grain silos. This region has many agricultural facilities with grain storage structures.

In [4]:
# Load US counties dataset and select Franklin County, Kansas
counties = ee.FeatureCollection('TIGER/2018/Counties')

# Select Franklin County, Kansas (GEOID: 20059)
selected_county = counties.filter(ee.Filter.eq('GEOID', '20059'))
geometry = selected_county.geometry()

# Create a map centered on the region
Map = geemap.Map(center=[38.5, -95.0], zoom=10)
Map.add_layer(geometry, {'color': 'red'}, 'Search Area')

print("Search region: Franklin County, Kansas")
print(f"Region area: {geometry.area().divide(1e6).getInfo():.1f} sq km")

Map

Search region: Franklin County, Kansas
Region area: 1493.7 sq km


Map(center=[38.5, -95.0], controls=(WidgetControl(options=['position', 'transparent_bg'], position='topright',…

## 2. Define Reference Locations

Here we define sample locations of grain silos that will serve as our reference points for similarity search. In practice, you would identify these by examining high-resolution satellite imagery.

In [5]:
# Define sample locations of grain silos within Franklin County, Kansas
# These coordinates represent known grain silo locations
sample_coordinates = [
    [-95.18, 38.52],  # Example grain silo location 1
    [-95.25, 38.48],  # Example grain silo location 2  
    [-95.12, 38.55]   # Example grain silo location 3
]

# Create Earth Engine points from coordinates
sample_points = [ee.Geometry.Point(coord) for coord in sample_coordinates]

# Create a FeatureCollection from the sample points
samples = ee.FeatureCollection([ee.Feature(point) for point in sample_points])

print(f"Created {samples.size().getInfo()} reference sample points")

# Add sample points to the map
Map.add_layer(samples, {'color': 'yellow'}, 'Reference Locations')
Map

Created 3 reference sample points


Map(bottom=100953.0, center=[38.5, -95.0], controls=(WidgetControl(options=['position', 'transparent_bg'], pos…

## 3. Load and Process Satellite Embedding Dataset

We'll load the Google Satellite Embedding dataset, filter for our time period, and create a mosaic for analysis.

In [6]:
# Set time period for analysis
year = 2024
start_date = ee.Date.fromYMD(year, 1, 1)
end_date = start_date.advance(1, 'year')

print(f"Analysis period: {year}")
print(f"Date range: {start_date.format().getInfo()} to {end_date.format().getInfo()}")

# Load the Satellite Embedding dataset
embeddings = ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')

# Filter and create mosaic
mosaic = embeddings.filter(ee.Filter.date(start_date, end_date)).mosaic()

print(f"Loaded embedding mosaic with {mosaic.bandNames().length().getInfo()} bands")
print(f"Band names: {mosaic.bandNames().getInfo()[:5]}... (showing first 5)")

# Check the resolution and projection
projection = mosaic.projection()
print(f"Native resolution: {projection.nominalScale().getInfo()} meters")
print(f"CRS: {projection.crs().getInfo()}")

Analysis period: 2024
Date range: 2024-01-01T00:00:00 to 2025-01-01T00:00:00
Loaded embedding mosaic with 64 bands
Date range: 2024-01-01T00:00:00 to 2025-01-01T00:00:00
Loaded embedding mosaic with 64 bands
Band names: ['A00', 'A01', 'A02', 'A03', 'A04']... (showing first 5)
Native resolution: 111319.49079327357 meters
Band names: ['A00', 'A01', 'A02', 'A03', 'A04']... (showing first 5)
Native resolution: 111319.49079327357 meters
CRS: EPSG:4326
CRS: EPSG:4326


## 4. Extract Embedding Vectors from Reference Locations

We sample the embedding mosaic at our reference locations to get the embedding vectors that represent grain silos.

In [7]:
# Choose the scale for sampling
# Use native resolution (10m) for small objects like grain silos
scale = 10

# Extract the embedding vector from the sample locations
sample_embeddings = mosaic.sampleRegions(
    collection=samples,
    scale=scale,
    geometries=True
)

print(f"Extracted embeddings from {sample_embeddings.size().getInfo()} sample points")
print(f"Sampling scale: {scale} meters")

# Get information about the extracted embeddings
first_sample = sample_embeddings.first()
properties = first_sample.propertyNames()
print(f"Properties per sample: {properties.size().getInfo()}")

# Display first few embedding values as example
sample_dict = first_sample.getInfo()
embedding_keys = [k for k in sample_dict['properties'].keys() if k.startswith('b')][:5]
print(f"First 5 embedding bands: {embedding_keys}")
print(f"Example values: {[sample_dict['properties'][k] for k in embedding_keys]}")

Extracted embeddings from 3 sample points
Sampling scale: 10 meters
Properties per sample: 65
First 5 embedding bands: []
Example values: []
Properties per sample: 65
First 5 embedding bands: []
Example values: []


## 5. Calculate Similarity Scores

We compute the dot product between each reference embedding vector and all pixels in the mosaic to find similar locations. The dot product measures the cosine similarity between unit-length vectors.

In [8]:
# Get band names for the embedding vectors
band_names = mosaic.bandNames()

# Compute dot product between reference embeddings and all pixels
def compute_similarity(sample_feature):
    """Compute dot product similarity for a single reference sample."""
    # Convert feature properties to an array image
    array_image = ee.Image(sample_feature.toArray(band_names)).arrayFlatten([band_names])
    
    # Compute dot product with mosaic
    dot_product = array_image.multiply(mosaic).reduce('sum').rename('similarity')
    
    return dot_product

# Calculate similarity for each reference location
sample_distances = ee.ImageCollection(sample_embeddings.map(compute_similarity))

# Calculate mean similarity across all reference locations
mean_similarity = sample_distances.mean()

print("Computed similarity scores across the region")
print(f"Similarity values range from -1 (opposite) to +1 (identical)")

# Get some statistics about the similarity image
stats = mean_similarity.reduceRegion(
    reducer=ee.Reducer.minMax().combine(ee.Reducer.mean(), '', True),
    geometry=geometry,
    scale=scale * 4,  # Use coarser scale for stats to avoid timeout
    maxPixels=1e8
)

print(f"Similarity statistics: {stats.getInfo()}")

Computed similarity scores across the region
Similarity values range from -1 (opposite) to +1 (identical)
Similarity statistics: {'similarity_max': 0.8559548783448316, 'similarity_mean': 0.6079015803545534, 'similarity_min': 0.13356934630619255}
Similarity statistics: {'similarity_max': 0.8559548783448316, 'similarity_mean': 0.6079015803545534, 'similarity_min': 0.13356934630619255}


In [9]:
# Visualize the similarity scores
palette = ['000004', '2C105C', '711F81', 'B63679', 'EE605E', 'FDAE78', 'FCFDBF', 'FFFFFF']
similarity_vis = {'palette': palette, 'min': 0, 'max': 1}

Map.add_layer(mean_similarity.clip(geometry), similarity_vis, 
              'Similarity Scores (bright = similar)', False)

print("Added similarity layer to map (toggle to view)")
Map

Added similarity layer to map (toggle to view)


Map(bottom=100953.0, center=[38.5, -95.0], controls=(WidgetControl(options=['position', 'transparent_bg'], pos…

## 6. Extract Location Matches

We apply a threshold to identify pixels with high similarity scores and convert them to vector features representing potential grain silo locations.

In [10]:
# Apply a threshold to find similar pixels
threshold = 0.90
print(f"Applying similarity threshold: {threshold}")

similar_pixels = mean_similarity.gt(threshold)

# Convert to polygons and extract centroids
# This operation can be computationally intensive for large areas
try:
    print("Converting similar pixels to vector features...")
    
    # Vectorize the similar pixels
    polygons = similar_pixels.selfMask().reduceToVectors(
        scale=scale,
        eightConnected=False,
        maxPixels=1e9,
        geometry=geometry
    )
    
    # Extract centroids as point locations
    predicted_matches = polygons.map(lambda f: f.centroid(maxError=1))
    
    match_count = predicted_matches.size().getInfo()
    print(f"Found {match_count} potential matches")
    
    # Add matches to map
    Map.add_layer(predicted_matches, {'color': 'cyan'}, 'Predicted Matches')
    
except Exception as e:
    print(f"Vectorization failed (common with large areas): {e}")
    print("Consider using a smaller region or exporting results as an asset")
    
    # Alternative: just show the thresholded pixels
    Map.add_layer(similar_pixels.selfMask().clip(geometry), 
                  {'palette': ['cyan']}, 'High Similarity Pixels')
    
Map

Applying similarity threshold: 0.9
Converting similar pixels to vector features...
Found 0 potential matches
Found 0 potential matches


Map(bottom=100953.0, center=[38.5, -95.0], controls=(WidgetControl(options=['position', 'transparent_bg'], pos…

## 7. Analysis and Results

Let's analyze the results and provide some tools for further investigation.

In [11]:
# Function to experiment with different thresholds
def test_threshold(threshold_value):
    """Test different similarity thresholds."""
    test_pixels = mean_similarity.gt(threshold_value)
    
    # Count pixels above threshold
    pixel_count = test_pixels.reduceRegion(
        reducer=ee.Reducer.sum(),
        geometry=geometry,
        scale=scale * 4,
        maxPixels=1e8
    ).getInfo()
    
    return pixel_count.get('similarity', 0)

# Test different threshold values
thresholds = [0.85, 0.90, 0.95, 0.98]
print("Threshold sensitivity analysis:")
for thresh in thresholds:
    count = test_threshold(thresh)
    print(f"  Threshold {thresh}: {count} pixels")

print(f"\nCurrent threshold ({threshold}) seems reasonable for this analysis.")
print("Higher thresholds = fewer but more precise matches")
print("Lower thresholds = more matches but more false positives")

Threshold sensitivity analysis:
  Threshold 0.85: 1 pixels
  Threshold 0.85: 1 pixels
  Threshold 0.9: 0 pixels
  Threshold 0.9: 0 pixels
  Threshold 0.95: 0 pixels
  Threshold 0.95: 0 pixels
  Threshold 0.98: 0 pixels

Current threshold (0.9) seems reasonable for this analysis.
Higher thresholds = fewer but more precise matches
Lower thresholds = more matches but more false positives
  Threshold 0.98: 0 pixels

Current threshold (0.9) seems reasonable for this analysis.
Higher thresholds = fewer but more precise matches
Lower thresholds = more matches but more false positives


In [12]:
# Optional: Export results to Google Drive or Asset
# Uncomment and modify the following code to export results

"""
# Export similarity image to Drive
task = ee.batch.Export.image.toDrive(
    image=mean_similarity.clip(geometry),
    description='grain_silo_similarity',
    scale=scale,
    region=geometry,
    maxPixels=1e9
)
task.start()
print("Started export task for similarity image")

# Export high similarity areas as asset (if vectorization succeeded)
if 'predicted_matches' in locals():
    task = ee.batch.Export.table.toAsset(
        collection=predicted_matches,
        description='grain_silo_matches',
        assetId='users/your-username/grain_silo_matches'  # Update this path
    )
    task.start()
    print("Started export task for predicted matches")
"""

print("Export code is commented out - uncomment and modify paths as needed")

Export code is commented out - uncomment and modify paths as needed


## Summary and Next Steps

This notebook demonstrated how to use Google's Satellite Embedding dataset for similarity search to identify objects of interest (grain silos) in satellite imagery.

### Key Results:
- Loaded and processed satellite embedding data for Franklin County, Kansas
- Extracted embedding vectors from reference grain silo locations
- Computed similarity scores across the entire region using dot product
- Applied thresholds to identify potential matches
- Visualized results on an interactive map

### What we learned:
1. **Satellite Embeddings** encode semantic information about land cover and objects
2. **Dot Product Similarity** effectively identifies similar features in the embedding space
3. **Threshold Selection** is critical - higher thresholds reduce false positives but may miss matches
4. **Scale Selection** should match the target object size (10m for small structures)

### Potential Improvements:
- Use multiple reference examples from different locations/contexts
- Experiment with different scales for different object sizes
- Apply post-processing filters (e.g., size, shape constraints)
- Validate results with ground truth data
- Use spatial clustering to reduce duplicate detections

### Applications:
- Infrastructure mapping (grain silos, solar panels, buildings)
- Agricultural asset inventory
- Environmental monitoring (identifying similar habitats)
- Disaster response (finding similar damage patterns)