# Sentinel 2A Plastic Waste Exploration
This notebook is a starting point for exploring Sentinel 2A multispectral data at TPA (Tempat Pembuangan Akhir, i.e. landfill) sites in Indonesia.

## Explorations:
### [1. Patch Visualization](#Exploration-1)
For each known plastic waste site, define a rect centered on the known coordinates. Additionally, define an adjacent rect as a control reference. For every rect, extract an image patch from every Sentinel image band and visualize. 

### [2. Patch Comparison](#Exploration-2)
Using the extracted patches from Exploration #1, compare mean/median reflectance across bands between patches from waste sites and their corresponding control sites. In addition to site-by-site comparisons, aggregate these statistics across all sites to assess the trends.

### [3. Temporal Monitoring](#Exploration-3)
At each site, visualize how mean/median reflectance changes over time for each Sentinel imaging band.

### [4. Spectral Signal Clustering](#Exploration-4)
Compile the mean/median values computed at each site and time point in Exploration #3 into a multi-dimensional vector. Reduce the dimensionality of these vectors using PCA or t-SNE, and visualize whether waste and control sites form separable clusters.

## Setup

In [180]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from tqdm.contrib.concurrent import thread_map
from functools import partial
import json
import os
from datetime import datetime
from sklearn.decomposition import PCA

import ee
import geemap
import geemap.eefolium

import folium

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [7]:
ee.Authenticate()

Enter verification code: [redacted]

Successfully saved authorization token.


In [8]:
ee.Initialize()

## Define Functions

### Sentinel 2 Cloud Filtering
Uses the new S2 Cloud Probability dataset. [Details on the algorithm](https://medium.com/google-earth/more-accurate-and-flexible-cloud-masking-for-sentinel-2-images-766897a9ba5f)

In [45]:
def get_s2_sr_cld_col(aoi, start_date, end_date):
    """
    Creates an ImageCollection for a region and time period.
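    ImageCollection is prefiltered on the `CLOUDY_PIXEL_PERCENTAGE` image property
    Maximum allowed percentage is specified by the global `CLOUD_FILTER` variable
    Each image carries its matching s2cloudless image in the `s2cloudless` property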
    """
    # Import and filter S2 SR.
    s2_sr_col = (ee.ImageCollection('COPERNICUS/S2_SR')
        .filterBounds(aoi)
        .filterDate(start_date, end_date)
        .filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', CLOUD_FILTER)))

    # Import and filter s2cloudless.
    s2_cloudless_col = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
        .filterBounds(aoi)
        .filterDate(start_date, end_date))

    # Join the filtered s2cloudless collection to the SR collection by the 'system:index' property.
    return ee.ImageCollection(ee.Join.saveFirst('s2cloudless').apply(**{
        'primary': s2_sr_col,
        'secondary': s2_cloudless_col,
        'condition': ee.Filter.equals(**{
            'leftField': 'system:index',
            'rightField': 'system:index'
        })
    }))

def add_cloud_bands(img):
    """
    From the s2_cloud_probability dataset, add a `probability` band and a binary
    `clouds` band marking pixels whose probability exceeds the global `CLD_PRB_THRESH` variable
    """
    # Get s2cloudless image, subset the probability band.
    cld_prb = ee.Image(img.get('s2cloudless')).select('probability')

    # Condition s2cloudless by the probability threshold value.
    is_cloud = cld_prb.gt(CLD_PRB_THRESH).rename('clouds')

    # Add the cloud probability layer and cloud mask as image bands.
    return img.addBands(ee.Image([cld_prb, is_cloud]))

def add_shadow_bands(img):
    """
    Isolate cloud shadows over land
    Cloud shadow thresholds are given by the global `NIR_DRK_THRESH` variable
    CK Note: I don't think this algorithm works over water
    """
    # Identify non-water pixels from the SCL band (class 6 is water).
    not_water = img.select('SCL').neq(6)

    # Identify dark NIR pixels that are not water (potential cloud shadow pixels).
    SR_BAND_SCALE = 1e4
    dark_pixels = img.select('B8').lt(NIR_DRK_THRESH*SR_BAND_SCALE).multiply(not_water).rename('dark_pixels')

    # Determine the direction to project cloud shadow from clouds (assumes UTM projection).
    shadow_azimuth = ee.Number(90).subtract(ee.Number(img.get('MEAN_SOLAR_AZIMUTH_ANGLE')))

    # Project shadows from clouds for the distance specified by the CLD_PRJ_DIST input.
    cld_proj = (img.select('clouds').directionalDistanceTransform(shadow_azimuth, CLD_PRJ_DIST*10)
        .reproject(**{'crs': img.select(0).projection(), 'scale': 100})
        .select('distance')
        .mask()
        .rename('cloud_transform'))

    # Identify the intersection of dark pixels with cloud shadow projection.
    shadows = cld_proj.multiply(dark_pixels).rename('shadows')

    # Add dark pixels, cloud projection, and identified shadows as image bands.
    return img.addBands(ee.Image([dark_pixels, cld_proj, shadows]))

def add_cld_shdw_mask(img):
    """
    Create a mask based on the cloud and cloud shadow images
    """
    # Add cloud component bands.
    img_cloud = add_cloud_bands(img)

    # Add cloud shadow component bands.
    img_cloud_shadow = add_shadow_bands(img_cloud)

    # Combine cloud and shadow mask, set cloud and shadow as value 1, else 0.
    is_cld_shdw = img_cloud_shadow.select('clouds').add(img_cloud_shadow.select('shadows')).gt(0)

    # Remove small cloud-shadow patches and dilate remaining pixels by BUFFER input.
    # 20 m scale is for speed, and assumes clouds don't require 10 m precision.
    is_cld_shdw = (is_cld_shdw.focal_min(2).focal_max(BUFFER*2/20)
        .reproject(**{'crs': img.select([0]).projection(), 'scale': 20})
        .rename('cloudmask'))

    # Add the final cloud-shadow mask to the image.
    return img_cloud_shadow.addBands(is_cld_shdw)

def apply_cld_shdw_mask(img):
    """
    Apply the cloud-shadow mask to all Sentinel bands beginning with `B`
    """
    # Subset the cloudmask band and invert it so clouds/shadow are 0, else 1.
    not_cld_shdw = img.select('cloudmask').Not()

    # Subset reflectance bands and update their masks, return the result.
    return img.select('B.*').updateMask(not_cld_shdw)
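
The masking logic in `add_cld_shdw_mask` and `apply_cld_shdw_mask` can be illustrated on a toy array without Earth Engine. The following is a minimal NumPy sketch, not the Earth Engine implementation: the 2x2 arrays are invented values, and the `focal_min`/`focal_max` clean-up and dilation step is deliberately omitted.

```python
import numpy as np

CLD_PRB_THRESH = 60  # same threshold as the notebook's parameter cell

# Invented 2x2 example: cloud probability (%), shadow flags, and a red band.
cld_prb = np.array([[10, 80], [55, 95]])
shadows = np.array([[0, 0], [1, 0]])
b4 = np.array([[400.0, 2500.0], [300.0, 2700.0]])

# Same idea as cld_prb.gt(CLD_PRB_THRESH): flag cloudy pixels.
is_cloud = cld_prb > CLD_PRB_THRESH

# Same idea as clouds.add(shadows).gt(0): union of cloud and shadow.
is_cld_shdw = (is_cloud.astype(int) + shadows) > 0

# Same idea as updateMask(cloudmask.Not()): hide flagged pixels.
masked_b4 = np.ma.masked_array(b4, mask=is_cld_shdw)

print(masked_b4)
```

Only the top-left pixel is neither cloud nor shadow, so it is the only value left unmasked.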

## Setup Parameters

In [9]:
DATA_DIR = '../data'

In [78]:
CLOUD_FILTER = 30
CLD_PRB_THRESH = 60
NIR_DRK_THRESH = 0.15
CLD_PRJ_DIST = 1
BUFFER = 50
DATASET = 'COPERNICUS/S2_SR'
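
These parameters mix units, so it can help to work out what the functions above actually receive. The sketch below repeats the arithmetic from `add_shadow_bands` and `add_cld_shdw_mask`; it assumes that `directionalDistanceTransform` and `focal_max` interpret their distance arguments in pixels of the reprojected image, which I believe is the Earth Engine default but is worth checking against the API reference.

```python
# Values copied from the parameter cell above.
NIR_DRK_THRESH = 0.15
CLD_PRJ_DIST = 1
BUFFER = 50

# S2 SR reflectance is stored as integers scaled by 10,000,
# so the dark-pixel cutoff on B8 is expressed in those units.
SR_BAND_SCALE = 1e4
dark_nir_cutoff = NIR_DRK_THRESH * SR_BAND_SCALE

# Shadow search distance: CLD_PRJ_DIST*10 pixels at the 100 m reproject scale.
shadow_search_px = CLD_PRJ_DIST * 10
shadow_search_m = shadow_search_px * 100

# Mask dilation: BUFFER*2/20 pixels at the 20 m reproject scale.
dilation_px = BUFFER * 2 / 20
dilation_m = dilation_px * 20

print(f"Dark NIR cutoff: {dark_nir_cutoff:.0f}")
print(f"Shadow search: {shadow_search_px} px = {shadow_search_m} m")
print(f"Dilation radius: {dilation_px} px = {dilation_m} m")
```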

In [183]:
# Load tpa_points dataset and create a list of coordinates for the known sites

# The `with` block closes the file automatically
with open(os.path.join(DATA_DIR, 'tpa_points.json')) as f:
    tpa_points = json.load(f)

tpa_sites = pd.DataFrame({
    'name': [site['properties']['Name'] for site in tpa_points['features']],
    'lon': [site['geometry']['coordinates'][0] for site in tpa_points['features']],
    'lat': [site['geometry']['coordinates'][1] for site in tpa_points['features']],
    'area': [site['properties']['Surface_Ha'] for site in tpa_points['features']],
    'daily_volume': [site['properties']['TOT_Kg/Day'] for site in tpa_points['features']],
    'coords': [site['geometry']['coordinates'] for site in tpa_points['features']]
})


display(tpa_sites)


Unnamed: 0,name,lon,lat,area,daily_volume,coords
0,TPA Jungut Batu,115.459414,-8.670958,1.2,,"[115.45941439485306, -8.670958330781342]"
1,TPA Biaung,115.498017,-8.67993,1.85,9433.0,"[115.49801683267276, -8.679930042100876]"
2,TPA Sente,115.45446,-8.530372,1.0,43219.0,"[115.45446033358267, -8.530371792768301]"
3,TPA Regional Bangli,115.367927,-8.353542,0.99,47350.0,"[115.3679270185395, -8.353541681392851]"
4,TPA Peh,114.583295,-8.327938,2.0,38130.0,"[114.58329467897306, -8.327937523143966]"
5,TPA Temesi,115.350242,-8.562121,1.05,209560.0,"[115.35024222376158, -8.562120592268835]"
6,TPA Bengkala,115.170153,-8.091409,0.35,125350.0,"[115.17015250951603, -8.091408966864984]"
7,TPA Bebandem,115.564212,-8.403448,0.77,43860.0,"[115.56421218781115, -8.403447940383018]"
8,TPA Mandung,115.095133,-8.529953,2.5,64580.0,"[115.09513259339289, -8.529952637269275]"
9,TPA Regional Suwung,115.221107,-8.720775,4.52,1358533.0,"[115.22110665725374, -8.720775407750443]"


In [184]:
# Sentinel 2 band descriptions
band_descriptions = {
    'B1': 'Aerosols, 442nm',
    'B2': 'Blue, 492nm',
    'B3': 'Green, 559nm',
    'B4': 'Red, 665nm',
    'B5': 'Red Edge 1, 704nm',
    'B6': 'Red Edge 2, 739nm',
    'B7': 'Red Edge 3, 779nm',
    'B8': 'NIR, 833nm',
    'B8A': 'Red Edge 4, 864nm',
    'B9': 'Water Vapor, 943nm',
    'B11': 'SWIR 1, 1610nm',
    'B12': 'SWIR 2, 2186nm'
}

## Visualize Cloud-Filtered Imagery

In [56]:
bali_rect = ee.Geometry.Polygon([[116, -8],
                    [116, -9],
                    [114, -9],
                    [114, -8]], None, False)

In [96]:
s2_data = get_s2_sr_cld_col(bali_rect, '2019-06-01', '2019-07-01')
s2_sr_median = s2_data.filterBounds(bali_rect) \
                    .map(add_cld_shdw_mask) \
                    .map(apply_cld_shdw_mask) \
                    .median() \
                    .clip(bali_rect)

In [208]:
# Define visualization parameters
vizParams = {'bands': ['B4', 'B3', 'B2'],
             'min': 0, 'max': 3000}

Map = geemap.eefolium.Map(center=[-8.4, 115.1], zoom=10)
Map.addLayer(s2_sr_median, vizParams, 'Sentinel 2 Image')

# Add the sites of interest as yellow dots
for i in range(len(tpa_sites)):
    site = tpa_sites.iloc[i]
    description = f"{site['name']}<br>Size: {site['area']:.1f} Ha<br>Volume: {site['daily_volume'] / 1000:.0f} Tonnes/day"
    folium.CircleMarker([site['lat'], site['lon']], 
                        fill=True, 
                        radius=3,
                        color='#FFCE00',
                        fill_opacity=1,
                        tooltip=description).add_to(Map)

display(Map)

## Exploration 1
### Patch Extraction and Visualization

In [243]:
def create_rect(lon, lat, width):
    """
    Given a set of coordinates, create an earth engine rect centered on them.
    The rect spans 2 * width degrees of longitude and width degrees of latitude.
    """
    rect = ee.Geometry.Polygon([[lon + width, lat + width / 2],
                                [lon + width, lat - width / 2],
                                [lon - width, lat - width / 2],
                                [lon - width, lat + width / 2]], None, False)
    return rect

In [None]:
# Define the rect width in degrees
# NOTE: I realize that degrees -> meters differs for lat/lon
# This shouldn't matter, but it's good to keep in mind
RECT_WIDTH = 0.001
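
To put the note above in numbers, here is a rough sketch of the degree-to-meter conversion at Bali's latitude. It uses a spherical approximation with about 111.32 km per degree of latitude; `degrees_to_meters` is a hypothetical helper and is not part of the notebook.

```python
import math

RECT_WIDTH = 0.001                # degrees, as defined above
METERS_PER_DEG_LAT = 111_320.0    # approximate, spherical Earth assumption

def degrees_to_meters(deg, lat):
    """Approximate (east-west, north-south) ground distance of `deg` degrees at `lat`."""
    north_south = deg * METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = deg * METERS_PER_DEG_LAT * math.cos(math.radians(lat))
    return east_west, north_south

ew, ns = degrees_to_meters(RECT_WIDTH, -8.67)
print(f"0.001 deg is about {ew:.1f} m east-west and {ns:.1f} m north-south")
```

At this latitude the two differ by roughly 1%. Since `create_rect` extends `width` on each side in longitude and `width / 2` on each side in latitude, a patch covers on the order of 220 m by 110 m, or about 22 by 11 pixels at 10 m resolution.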


In [235]:
# Create a list of control sites adjacent to the patches with dumps.
# Neighboring patches should share a similar background distribution,
# which helps isolate dump-specific factors.
# Multiple offset directions and distances are possible. For now, only one is used.

offset = 2 * RECT_WIDTH

control_sites = pd.DataFrame({
    'name': ['control_' + str(i) for i in range(len(tpa_sites))],
    'lon': [lon + offset for lon in tpa_sites['lon']],
    'lat': [lat for lat in tpa_sites['lat']],
    'coords': [[lon + offset, lat] for lon, lat in zip(tpa_sites['lon'], tpa_sites['lat'])]
})

display(control_sites)

Unnamed: 0,name,lon,lat,coords
0,control_0,115.461414,-8.670958,"[115.46141439485305, -8.670958330781342]"
1,control_1,115.500017,-8.67993,"[115.50001683267276, -8.679930042100876]"
2,control_2,115.45646,-8.530372,"[115.45646033358267, -8.530371792768301]"
3,control_3,115.369927,-8.353542,"[115.3699270185395, -8.353541681392851]"
4,control_4,114.585295,-8.327938,"[114.58529467897306, -8.327937523143966]"
5,control_5,115.352242,-8.562121,"[115.35224222376158, -8.562120592268835]"
6,control_6,115.172153,-8.091409,"[115.17215250951602, -8.091408966864984]"
7,control_7,115.566212,-8.403448,"[115.56621218781115, -8.403447940383018]"
8,control_8,115.097133,-8.529953,"[115.09713259339289, -8.529952637269275]"
9,control_9,115.223107,-8.720775,"[115.22310665725374, -8.720775407750443]"
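
Because `create_rect` extends `width` degrees to each side in longitude, an offset of `2 * RECT_WIDTH` places the control rect directly next to the site rect. The sketch below checks this with plain Python; `rect_bounds` is a hypothetical helper that mirrors the corner arithmetic of `create_rect`, and the coordinates are those of TPA Jungut Batu from the table above.

```python
import math

RECT_WIDTH = 0.001
offset = 2 * RECT_WIDTH

def rect_bounds(lon, lat, width):
    """Return (west, south, east, north), using the same extents as create_rect."""
    return (lon - width, lat - width / 2, lon + width, lat + width / 2)

site = rect_bounds(115.459414, -8.670958, RECT_WIDTH)
control = rect_bounds(115.459414 + offset, -8.670958, RECT_WIDTH)

# The east edge of the site rect coincides with the west edge of the control rect.
shares_edge = math.isclose(site[2], control[0], abs_tol=1e-9)
print("Site and control share an edge:", shares_edge)
```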


In [248]:
def get_sentinel_band(image, site_name, roi, output_dict, band):
    """
    Download one band of `image` over `roi` as a numpy array
    and store it in `output_dict` under the band name.
    """
    band_img = image.select(band).clipToBoundsAndScale(roi, scale=10)
    image_array = np.squeeze(geemap.ee_to_numpy(band_img, region=roi))
    output_dict[band] = image_array
    return image_array

def get_patches(site_names, site_coords, buffer, image):
    """
    Multithreaded process to export Sentinel 2 patches as numpy arrays.
    Input lists of site names and site coordinates along with an Earth Engine image.
    Exports each band in image to a dictionary organized by [site name][band][band_img]
    """
    # Band names are identical for every site, so query them once.
    bands = [band for band in image.bandNames().getInfo() if band.startswith('B')]
    patch_dict = {}
    for name, site in zip(site_names, site_coords):
        print("Processing", name)
        roi = create_rect(site[0], site[1], buffer)
        images = {}
        # Pass the image explicitly rather than relying on a global variable.
        get_sentinel_partial = partial(get_sentinel_band,
                                       image,
                                       name,
                                       roi,
                                       images)
        thread_map(get_sentinel_partial, bands)
        patch_dict[name] = images
    return patch_dict

In [251]:
tpa_patches = get_patches(tpa_sites['name'], tpa_sites['coords'], RECT_WIDTH, s2_sr_median)

Processing TPA Jungut Batu


ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

In [None]:
# Define visualization parameters
vizParams = {'bands': ['B4', 'B3', 'B2'],
             'min': 0, 'max': 3000}

Map = geemap.eefolium.Map(center=[-8.67, 115.46], zoom=12)
for band in tqdm(s2_sr_median.bandNames().getInfo()):
    for tpa_site, control_site in zip(list(tpa_sites['coords']), list(control_sites['coords'])):
        tpa_rect = create_rect(tpa_site[0], tpa_site[1], RECT_WIDTH)
        control_rect = create_rect(control_site[0], control_site[1], RECT_WIDTH)
        Map.addLayer(s2_sr_median.clip(tpa_rect), {'min': 0,
                                                  'max': 3000,
                                                  'bands': band}, band)
        Map.addLayer(s2_sr_median.clip(control_rect), {'min': 0,
                                                  'max': 3000,
                                                  'bands': band}, band)
Map.addLayerControl()
display(Map)

In [240]:
list(control_sites['coords'])

[[115.46141439485305, -8.670958330781342],
 [115.50001683267276, -8.679930042100876],
 [115.45646033358267, -8.530371792768301],
 [115.3699270185395, -8.353541681392851],
 [114.58529467897306, -8.327937523143966],
 [115.35224222376158, -8.562120592268835],
 [115.17215250951602, -8.091408966864984],
 [115.56621218781115, -8.403447940383018],
 [115.09713259339289, -8.529952637269275],
 [115.22310665725374, -8.720775407750443]]

In [153]:
# Define visualization parameters
vizParams = {'bands': ['B4', 'B3', 'B2'],
             'min': 0, 'max': 3000}

Map = geemap.eefolium.Map(center=[-8.67, 115.46], zoom=12)
for band in s2_sr_median.bandNames().getInfo():
    Map.addLayer(s2_sr_median.clip(tpa_rect), {'min': 0,
                                              'max': 3000,
                                              'bands': band}, band)
Map.addLayerControl()
display(Map)

In [85]:
{'color': 'FFCE00'}, "TPAs"

['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B9', 'B11', 'B12']

In [None]:

# Add the sites of interest as yellow dots with red rects surrounding them
for site in tpa_sites['coords']:
    lon = site[0]
    lat = site[1]
    rect = create_rect(lon, lat, RECT_WIDTH)
    Map.addLayer(rect, {'color': 'FF0000', 'opacity': 0.75})
    point = ee.Geometry.Point(lon, lat)
    Map.addLayer(point, {'color': 'FFCE00'}, "TPAs")


# Add the control sites as blue dots with blue rects surrounding them
for site in control_sites['coords']:
    lon = site[0]
    lat = site[1]
    rect = create_rect(lon, lat, RECT_WIDTH)
    Map.addLayer(rect, {'color': '3236a8', 'opacity': 0.75})
    point = ee.Geometry.Point(lon, lat)
    Map.addLayer(point, {'color': '3236a8'}, "Controls")


## Exploration 2
### Patch Comparison
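
As a possible starting point, the sketch below computes per-band mean and median for each patch and then averages the waste-minus-control difference across paired sites. It assumes the `[site name][band]` dictionary layout produced by `get_patches`. The helper names `patch_stats` and `mean_difference`, the site names, and all reflectance values are invented for illustration and are not measurements.

```python
import numpy as np

def patch_stats(patch_dict):
    """Return {site: {band: (mean, median)}}, ignoring NaN pixels."""
    return {
        site: {band: (float(np.nanmean(img)), float(np.nanmedian(img)))
               for band, img in bands.items()}
        for site, bands in patch_dict.items()
    }

def mean_difference(waste_stats, control_stats, pairs):
    """Average (waste mean - control mean) per band over paired sites."""
    bands = next(iter(waste_stats.values())).keys()
    return {
        band: float(np.mean([waste_stats[w][band][0] - control_stats[c][band][0]
                             for w, c in pairs]))
        for band in bands
    }

# Synthetic stand-ins for the output of get_patches (values are made up).
shape = (11, 22)
waste = {'TPA A': {'B4': np.full(shape, 1200.0), 'B8': np.full(shape, 1800.0)},
         'TPA B': {'B4': np.full(shape, 1000.0), 'B8': np.full(shape, 2000.0)}}
control = {'control_0': {'B4': np.full(shape, 800.0), 'B8': np.full(shape, 2600.0)},
           'control_1': {'B4': np.full(shape, 600.0), 'B8': np.full(shape, 2400.0)}}
pairs = [('TPA A', 'control_0'), ('TPA B', 'control_1')]

diff = mean_difference(patch_stats(waste), patch_stats(control), pairs)
print(diff)
```

With real patches, a bar chart of `diff` per band would be one way to view the aggregate trend.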

## Exploration 3
### Temporal Monitoring
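
One way to organize the time series is sketched below. It assumes each observation is a date string plus a dictionary of per-band mean reflectance for a single site, and it groups observations by calendar month. Both the monthly grouping and the helper name `monthly_series` are assumptions, and the values are invented.

```python
from datetime import datetime
from statistics import mean, median

def monthly_series(observations, band):
    """Group (date_string, {band: value}) pairs by month; return sorted (month, mean, median)."""
    by_month = {}
    for date_str, values in observations:
        month = datetime.strptime(date_str, '%Y-%m-%d').strftime('%Y-%m')
        by_month.setdefault(month, []).append(values[band])
    return [(month, mean(vals), median(vals)) for month, vals in sorted(by_month.items())]

# Invented per-image band means for one site.
observations = [
    ('2019-07-08', {'B4': 900, 'B8': 2100}),
    ('2019-06-03', {'B4': 1200, 'B8': 1800}),
    ('2019-06-18', {'B4': 1000, 'B8': 2000}),
]

series = monthly_series(observations, 'B4')
for month, avg, med in series:
    print(month, avg, med)
```

The resulting list can be passed to `plt.plot` per band to show how reflectance changes over time at a site.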

## Exploration 4
### Spectral Signal Clustering
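
A minimal sketch of the dimensionality-reduction step follows. It implements PCA directly with NumPy's SVD so that it runs without extra dependencies; `sklearn.decomposition.PCA`, imported in Setup, should give the same projection up to the sign of each component. The feature matrix is invented, with one row per site and one column per band statistic, and `pca_project` is a hypothetical helper.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto their first principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Invented feature vectors (e.g. mean B4, B8, B11 per site).
features = np.array([
    [1200.0, 1800.0, 2100.0],   # waste
    [1000.0, 2000.0, 2300.0],   # waste
    [800.0, 2600.0, 1500.0],    # control
    [600.0, 2400.0, 1400.0],    # control
])
labels = ['waste', 'waste', 'control', 'control']

embedded = pca_project(features)
for label, (pc1, pc2) in zip(labels, embedded):
    print(f"{label:8s} PC1={pc1:9.1f} PC2={pc2:9.1f}")
```

A scatter plot of `embedded` colored by `labels` would then show whether the two groups separate. With only a handful of sites, any apparent separation should be treated with caution.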