<a href="https://colab.research.google.com/github/chqzeng/WaterSatOnCloud/blob/main/Tool1%20-%20GEE%20S2%20Matchup%20Extraction/Tool1_GEE_S2_Matchup_Extraction_Level_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tool1 - GEE S2 Matchup Extraction - Level 2

Extracts Sentinel-2 (S2) Level-2 satellite data from Google Earth Engine (GEE) at user-defined locations and times (lat, lon, and datetime fields in a .csv).

Note that GEE does not have a complete archive of Sentinel-2 Level-2 data; the dataset is available from 2017-03-28 to present.

This script finds the median pixel values within a 100 m radius of each user-defined location. Only cloud-free pixels are included. Land pixels are filtered out using an NDWI mask (green > NIR), which can be adjusted.
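The water test used in the extraction loop (green > NIR, i.e. `B3 > B8`) is equivalent to NDWI > 0, where NDWI = (B3 - B8) / (B3 + B8). The following is a minimal sketch of how that threshold could be made adjustable. The helper `water_mask`, its `ndwi_threshold` parameter, and the pixel values are illustrative assumptions and are not part of the notebook.

```python
import pandas as pd

def water_mask(bands, ndwi_threshold=0.0):
    """Return a boolean Series that is True where NDWI exceeds the threshold.

    NDWI = (green - NIR) / (green + NIR), using Sentinel-2 bands B3 and B8.
    Hypothetical helper for illustration only.
    """
    green = bands["B3"].astype(float)
    nir = bands["B8"].astype(float)
    ndwi = (green - nir) / (green + nir)
    return ndwi > ndwi_threshold

# Made-up pixel values, for illustration only
pixels = pd.DataFrame({"B3": [800, 300, 500], "B8": [350, 900, 450]})

mask = water_mask(pixels)                             # same as B3 > B8
strict_mask = water_mask(pixels, ndwi_threshold=0.1)  # stricter water test
water_pixels = pixels[mask]
```

A higher threshold keeps fewer, more clearly water-like pixels; a threshold of 0 reproduces the `B3 > B8` comparison.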

In [1]:
# Load GEE API
import ee
ee.Authenticate()
ee.Initialize()



In [2]:
# Load other libraries
from datetime import timedelta
import numpy as np
import pandas as pd

In [3]:
# Get the date range to search, formatted for the query.
# By default, we only look at images acquired within the 15 days BEFORE the in-situ data collection, per the contest requirement. This can be adjusted.
def get_date_range(date, time_buffer_days=15):
    """Get a date range to search for in Earth Engine based on a
    sample's date. The range runs from time_buffer_days days before
    the sample date up to the sample date.

    Returns a list of two strings: [start_date, end_date]"""
    datetime_format = "%Y-%m-%d"
    range_start = pd.to_datetime(date) - timedelta(days=time_buffer_days)
    return [range_start.strftime(datetime_format),pd.to_datetime(date).strftime(datetime_format)]
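
Earth Engine's `filterDate(start, end)` treats `end` as exclusive, so passing the sample date itself as the end date leaves out images acquired on the sample day. If same-day imagery should be included, one option is to push the end date forward by one day. A minimal sketch, where `get_inclusive_date_range` is a hypothetical variant of the function above and not part of the original notebook:

```python
from datetime import timedelta
import pandas as pd

def get_inclusive_date_range(date, time_buffer_days=15):
    """Return [start, end] strings where `end` is the day AFTER the sample
    date, so that an end-exclusive date filter still covers the sample day."""
    datetime_format = "%Y-%m-%d"
    sample_date = pd.to_datetime(date)
    range_start = sample_date - timedelta(days=time_buffer_days)
    range_end = sample_date + timedelta(days=1)  # assumption: include same-day images
    return [range_start.strftime(datetime_format),
            range_end.strftime(datetime_format)]

date_range = get_inclusive_date_range("2021-08-23")
```

Whether same-day images are wanted depends on the matchup requirement, so treat this as an option rather than a correction.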

In [4]:
# Simulated data - you can replace this with real data

# create data, locations and time from the provided training dataset in the contest
data = [['A', 39.474744, -86.898353, '2021-08-23'],
  ['B', 35.980000, -78.839410, '2021-08-16'],
  ['C', 38.04947, -99.827, '2019-07-23']]

# create the pandas DataFrame
df = pd.DataFrame(data, columns=['sample', 'latitude','longitude','date'])

# print dataframe
df

Unnamed: 0,sample,latitude,longitude,date
0,A,39.474744,-86.898353,2021-08-23
1,B,35.98,-78.83941,2021-08-16
2,C,38.04947,-99.827,2019-07-23
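
To work from a real .csv instead of the hard-coded list, the same DataFrame can be read with `pd.read_csv`. The sketch below uses an in-memory string as a stand-in for a file; the column names match the example above, and the validation steps are an assumption about what a reasonable input check might look like.

```python
import io
import pandas as pd

# Stand-in for a real file; pass a path to your own .csv to pd.read_csv instead
csv_source = io.StringIO(
    "sample,latitude,longitude,date\n"
    "A,39.474744,-86.898353,2021-08-23\n"
    "B,35.980000,-78.839410,2021-08-16\n"
)

df_from_csv = pd.read_csv(csv_source, parse_dates=["date"])

# Fail early if a required column is absent
required = {"latitude", "longitude", "date"}
missing = required - set(df_from_csv.columns)
if missing:
    raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

# Drop rows whose coordinates fall outside valid ranges
valid = (df_from_csv["latitude"].between(-90, 90)
         & df_from_csv["longitude"].between(-180, 180))
df_from_csv = df_from_csv[valid].reset_index(drop=True)
```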


In [5]:
# S2 surface reflectance in GEE
S2_data = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')

In [6]:
# Create a dataframe to store results
output_matchups = pd.DataFrame()

# Loop through the rows in the dataframe
for i in range(len(df)):

    row = df.iloc[i]
    print('\n========== Row: ' +str(i))
    print('\n' + str(row))

    date_range = get_date_range(row.date)

    # point of interest
    my_poi = ee.Geometry.Point(row.longitude, row.latitude)

    # point of interest with a 100m buffer
    my_poi_buffer = my_poi.buffer(100)

    # if no image found, go to next row
    try:
        # Sort by date time: newest first
        S2_data_filtered = S2_data.filterBounds(my_poi).filterDate(date_range[0], date_range[1]).sort('system:time_start',False)

        # https://gis.stackexchange.com/questions/231333/selecting-every-image-of-collection-using-google-earth-engine
        listOfImages = S2_data_filtered.toList(S2_data_filtered.size())

        numberOfImages = listOfImages.length().getInfo()
        print('Number of images found: ' + str(numberOfImages))

        # Loop through the returned images. If no water pixels found, go to the next image
        for image_n in range(numberOfImages):
            print('\nimage_n: ' + str(image_n))

            image = ee.Image(listOfImages.get(image_n))

            # Sample every pixel inside the 100 m buffer at 10 m scale
            image_at_poi_buffer = image.sampleRegions(my_poi_buffer,None,10)

            # Extract pixel band values
            pixels_values = image_at_poi_buffer.getInfo()['features']
            pixels_values_properties = [x['properties'] for x in pixels_values]
            bands = pd.DataFrame(pixels_values_properties)

            # Remove clouds
            bands = bands[bands.QA60 == 0]

            # Remove non-water pixels
            bands = bands[bands.B3 > bands.B8] # green > NIR

            if len(bands)==0:
                print('Failed to find water pixels')
                continue

            # Find median values
            bands_median = bands.median()

            print('Median pixel values within 100m: \n' + str(bands_median))

            # Add metadata back
            bands_median['latitude'] = row.latitude
            bands_median['longitude'] = row.longitude
            bands_median['date'] = row.date

            # Add to results
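            # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
            output_matchups = pd.concat([output_matchups, bands_median.to_frame().T], ignore_index=True)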
            print('\nMatchup found!')

            break

    except Exception as err:
        # Report the underlying error instead of silently swallowing it
        print('Failed to find Sentinel-2 imagery: ' + str(err))
        continue



sample                A
latitude      39.474744
longitude    -86.898353
date         2021-08-23
Name: 0, dtype: object
Number of images found: 12

image_n: 0
Median pixel values within 100m: 
AOT            111.0
B1             694.0
B11            169.5
B12            152.0
B2             684.5
B3             783.5
B4             484.5
B5             644.0
B6             404.0
B7             483.0
B8             351.0
B8A            334.0
B9             627.0
MSK_CLDPRB       0.0
MSK_SNWPRB       0.0
QA10             0.0
QA20             0.0
QA60             0.0
SCL              6.0
TCI_B           70.0
TCI_G           80.0
TCI_R           50.0
WVP           3537.0
dtype: float64

Matchup found!


sample                B
latitude          35.98
longitude     -78.83941
date         2021-08-16
Name: 1, dtype: object




Number of images found: 3

image_n: 0
Failed to find water pixels

image_n: 1
Failed to find water pixels

image_n: 2
Failed to find water pixels


sample                C
latitude       38.04947
longitude       -99.827
date         2019-07-23
Name: 2, dtype: object
Number of images found: 3

image_n: 0
Median pixel values within 100m: 
AOT            184.0
B1             186.0
B11            231.0
B12            182.0
B2             117.0
B3             194.0
B4             164.0
B5             193.0
B6              47.0
B7              70.0
B8              41.0
B8A             48.0
B9             539.0
MSK_CLDPRB       0.0
MSK_SNWPRB       0.0
QA10             0.0
QA20             0.0
QA60             0.0
SCL              2.0
TCI_B           13.0
TCI_G           20.0
TCI_R           17.0
WVP           1953.0
dtype: float64

Matchup found!
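
Growing a DataFrame one row at a time copies it on every iteration, and `DataFrame.append` was removed in pandas 2.0. A common alternative is to collect the per-sample results in a plain list and build the DataFrame once after the loop. The sketch below uses made-up dictionaries in place of the Earth Engine results; `fake_results` and `matchups` are illustrative names.

```python
import pandas as pd

# Made-up per-sample medians; in the notebook these would come from bands.median()
fake_results = [
    {"B3": 783.5, "B8": 351.0, "latitude": 39.474744,
     "longitude": -86.898353, "date": "2021-08-23"},
    None,  # stands for a sample where no water pixels were found
    {"B3": 194.0, "B8": 41.0, "latitude": 38.04947,
     "longitude": -99.827, "date": "2019-07-23"},
]

rows = []
for result in fake_results:
    if result is None:
        continue          # skip samples without a matchup
    rows.append(result)   # cheap list append instead of a DataFrame copy

# Build the DataFrame once, after the loop
matchups = pd.DataFrame(rows)
```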




In [7]:
# Print output matchups
output_matchups

Unnamed: 0,AOT,B1,B11,B12,B2,B3,B4,B5,B6,B7,...,QA20,QA60,SCL,TCI_B,TCI_G,TCI_R,WVP,latitude,longitude,date
0,111.0,694.0,169.5,152.0,684.5,783.5,484.5,644.0,404.0,483.0,...,0.0,0.0,6.0,70.0,80.0,50.0,3537.0,39.474744,-86.898353,2021-08-23
1,184.0,186.0,231.0,182.0,117.0,194.0,164.0,193.0,47.0,70.0,...,0.0,0.0,2.0,13.0,20.0,17.0,1953.0,38.04947,-99.827,2019-07-23
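
The band values above are stored as scaled integers. The Earth Engine catalog entry for this collection lists a scale factor of 0.0001 for the reflectance bands (B1 to B12); assuming that factor, a sketch of converting the medians to surface reflectance could look like this. The small table reuses a few values from the output above, and `REFLECTANCE_SCALE` is an illustrative name.

```python
import pandas as pd

# Assumption: scale factor taken from the Earth Engine catalog for bands B1..B12
REFLECTANCE_SCALE = 0.0001

# A few values from the output table above, for illustration only
matchups = pd.DataFrame({"B3": [783.5, 194.0],
                         "B8": [351.0, 41.0],
                         "SCL": [6.0, 2.0]})

# Scale only the reflectance bands (names starting with "B");
# classification and QA columns such as SCL are left untouched
band_cols = [c for c in matchups.columns if c.startswith("B")]
reflectance = matchups.copy()
reflectance[band_cols] = reflectance[band_cols] * REFLECTANCE_SCALE
```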


In [None]:
# To save data
# output_matchups.to_csv('S2_matchups.csv', index=False)