# Classify snow-covered area (SCA) in PlanetScope imagery: full pipeline

Rainey Aberle

Department of Geosciences, Boise State University

2022

### Requirements:
- Planet account with access to PlanetScope imagery through the NASA CSDA contract. Sign up __[here](https://www.planet.com/markets/nasa/)__.
- Area of Interest (AOI) shapefile: where snow will be classified in each image. 
- PlanetScope 4-band image collection over the AOI. Download images using `planetAPI_image_download.ipynb` or through __[PlanetExplorer](https://www.planet.com/explorer/)__. 
- Google Earth Engine (GEE) account: used to pull DEM over the AOI. Sign up for a free account __[here](https://earthengine.google.com/new_signup/)__. 


### Outline:
__0. Setup__ paths in directory, AOI file location - _modify this section!_

__1. Mask image pixels__ affected by clouds, shadows, and heavy haze

__2. Mosaic images__ captured in the same hour

__3. Adjust image radiometry__ using median surface reflectance at the top or bottom percentile of elevations

__4. Classify SCA__ and use the snow elevations distribution to estimate the seasonal snowline

__5. Estimate snow line__ and snow line elevations

__6. Fit Fourier series model__ to the snow line time series (not working properly yet)

----------

## 0. Setup

#### Define paths in directory, image file extensions, and desired settings. 
Modify lines located within the following:

`#### MODIFY HERE ####`  

`#####################`

In [None]:
##### MODIFY HERE #####
# -----Path to snow-cover-mapping
base_path = '/Users/raineyaberle/Research/PhD/snow_cover_mapping/snow-cover-mapping/'

# -----Paths in directory
site_name = 'SouthCascade'
# path to images
im_path = base_path + '../study-sites/' + site_name + '/imagery/PlanetScope/2016-2021/'
# path to AOI including the name of the shapefile
AOI_fn = im_path + '../../../glacier_outlines/' + site_name + '_USGS_*.shp'
# path to DEM including the name of the tif file
# Note: set DEM_fn=None if you want to use the ASTER GDEM on Google Earth Engine
DEM_fn = im_path + '../../../DEMs/' + site_name + '*_DEM/' + site_name + '*_DEM_filled.tif'
# path for output images
out_path = im_path + '../'
# path for output figures
figures_out_path = im_path + '../../../figures/'

# -----Determine settings
plot_results = True # = True to plot figures of results for each image where applicable
skip_clipped = False # = True to skip images where bands appear "clipped", i.e. max blue SR < 0.8
crop_to_AOI = True # = True to crop images to AOI before calculating SCA
save_outputs = True # = True to save SCA images to file
save_figures = True # = True to save SCA output figures to file

#######################

# -----Import packages
import os
import numpy as np
import glob
import subprocess
from matplotlib.patches import Rectangle
from matplotlib import pyplot as plt, dates
import rasterio as rio
import xarray as xr
import rioxarray as rxr
from scipy import stats
import pandas as pd
import geopandas as gpd
import sys
import time
import ee
import pickle

# -----Add path to functions
sys.path.insert(1, base_path+'functions/')
import ps_pipeline_utils as f

# -----Set paths for output files
im_mask_path = out_path + 'masked/'
im_mosaic_path = out_path + 'mosaics/'
im_adj_path = out_path + 'adjusted/'
im_classified_path = out_path + 'classified/'
snowlines_path = out_path + 'snowlines/'

# -----Load AOI as gpd.GeoDataFrame
AOI_fn = glob.glob(AOI_fn)[0]
AOI = gpd.read_file(AOI_fn)
    
# -----Load DEM as Xarray DataSet
if DEM_fn is None:
    
    # Authenticate and initialize Google Earth Engine
    # Note: The first time you run this, you will be asked to authenticate your GEE account 
    # for use in this notebook. This will send you to an external web page, where you will 
    # walk through the GEE authentication workflow and copy an authentication code back 
    # in this notebook when prompted. 
    try:
        ee.Initialize()
    except Exception: 
        ee.Authenticate()
        ee.Initialize()
    # query GEE for DEM
    DEM, AOI_UTM = f.query_GEE_for_DEM(AOI)
    
else:
    
    # reproject the AOI to the optimal UTM zone
    # solve for optimal UTM zone
    AOI_WGS = AOI.to_crs(4326)
    AOI_WGS_centroid = [AOI_WGS.geometry[0].centroid.xy[0][0],
                        AOI_WGS.geometry[0].centroid.xy[1][0]]
    epsg_UTM = f.convert_wgs_to_utm(AOI_WGS_centroid[0], AOI_WGS_centroid[1])
    # reproject AOI to UTM
    AOI_UTM = AOI.to_crs('EPSG:' + str(epsg_UTM))
    # load DEM as xarray DataSet
    DEM_fn = glob.glob(DEM_fn)[0]
    DEM_rio = rio.open(DEM_fn) # open using rasterio to access the transform
    DEM = xr.open_dataset(DEM_fn)
    DEM = DEM.rename({'band_data': 'elevation'})
    # reproject the DEM to the optimal UTM zone
    DEM = DEM.rio.reproject('EPSG:' + str(epsg_UTM))

## 1. Mask image pixels with clouds, shadows, and heavy haze using associated Usable Data Mask (`udm`) files.  
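
As a rough illustration of this masking step, the sketch below replaces pixels with NaN wherever a cloud, shadow, or heavy-haze flag is raised. It works on plain nested lists rather than rasters. The function name `mask_pixels` and the three flag grids are hypothetical stand-ins; the pipeline's own logic lives in `f.mask_im_pixels`, which is not shown in this notebook.

```python
import math

def mask_pixels(band, cloud, shadow, heavy_haze):
    """Return a copy of `band` with flagged pixels replaced by NaN.

    All inputs are equally sized 2D lists; flags are 0 (clear) or 1 (flagged).
    Hypothetical helper for illustration only.
    """
    masked = []
    for r, row in enumerate(band):
        masked_row = []
        for c, value in enumerate(row):
            flagged = cloud[r][c] or shadow[r][c] or heavy_haze[r][c]
            masked_row.append(math.nan if flagged else value)
        masked.append(masked_row)
    return masked

# toy 2x2 surface reflectance band with one cloudy and one shadowed pixel
band = [[0.10, 0.85], [0.90, 0.20]]
cloud = [[0, 1], [0, 0]]
shadow = [[0, 0], [0, 1]]
heavy_haze = [[0, 0], [0, 0]]
masked = mask_pixels(band, cloud, shadow, heavy_haze)
```

Using NaN for masked pixels means later statistics can skip them with NaN-aware functions such as `np.nanmedian`.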

In [None]:
# -----Read surface reflectance file names
os.chdir(im_path)
im_fns = glob.glob('*SR*.tif')
im_fns = sorted(im_fns) # sort chronologically

# ----Mask images
for im_fn in im_fns:
    
    print(im_fn)
    f.mask_im_pixels(im_path, im_fn, im_mask_path, save_outputs, plot_results)
    print(' ')

## 2. Mosaic images by date

Mosaic all images captured within the same hour to increase area coverage of each image over the AOI. Images captured in different hours are more likely to have drastic variations in illumination. Adapted from code developed by [Jukes Liu](https://github.com/julialiu18). 

Note that images with no data over the AOI are skipped in this step. Issues with illumination or radiometry will be further filtered and adjusted in the next step.  
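
To make the "same hour" grouping concrete, here is a minimal sketch that buckets file names by their leading `YYYYMMDD_HH` prefix, the same 11 characters the later cells use to build datetimes. The helper `group_by_capture_hour` and the example file names are made up for illustration; `f.mosaic_ims_by_date` may group images differently.

```python
from collections import defaultdict

def group_by_capture_hour(filenames):
    """Group file names by their leading 'YYYYMMDD_HH' prefix (hypothetical helper)."""
    groups = defaultdict(list)
    for fn in sorted(filenames):
        groups[fn[0:11]].append(fn)
    return dict(groups)

# made-up file names following a 'YYYYMMDD_HHMMSS_...' pattern
example_fns = [
    '20210801_183012_1009_3B_AnalyticMS_SR_mask.tif',
    '20210801_183014_1009_3B_AnalyticMS_SR_mask.tif',
    '20210801_191500_1040_3B_AnalyticMS_SR_mask.tif',
    '20210802_183100_1009_3B_AnalyticMS_SR_mask.tif',
]
groups = group_by_capture_hour(example_fns)
for hour, fns in groups.items():
    print(hour, len(fns))
```

Each group would then be merged into one mosaic, so the two images from `20210801_18` end up in a single output file while the other two remain on their own.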

In [None]:
# -----Read masked image file names
os.chdir(im_mask_path)
im_mask_fns = glob.glob('*_mask.tif')
im_mask_fns = sorted(im_mask_fns) # sort chronologically

# ----Mosaic images by date
f.mosaic_ims_by_date(im_mask_path, im_mask_fns, im_mosaic_path, AOI_UTM, plot_results)

## 3. Adjust image radiometry

Mitigate issues related to varying illumination and general radiometry by first creating one or more polygons representing an area within the AOI that is likely covered with snow year-round, using the upper 30th percentile of elevations. The polygon(s) are then used to stretch each image, assuming that the median surface reflectance within the polygon(s) equals that predicted for snow and that the darkest point in the image has a surface reflectance of 0. Images with no real data values within the AOI or in the polygon(s) are skipped. 
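
The stretch described above can be sketched as a linear rescaling that maps the darkest pixel to 0 and the median of the snow-polygon pixels to an expected snow reflectance. This is an illustration under stated assumptions, not the code in `f.adjust_image_radiometry`: the helper `stretch_band` is hypothetical, and the expected snow reflectance of 0.9 is a placeholder value rather than the one used by the pipeline.

```python
from statistics import median

def stretch_band(values, snow_values, snow_sr_expected):
    """Linearly rescale `values` so that min(values) -> 0 and
    median(snow_values) -> snow_sr_expected (hypothetical helper)."""
    dark = min(values)
    bright = median(snow_values)
    if bright <= dark:
        raise ValueError('snow median must exceed the darkest pixel')
    scale = snow_sr_expected / (bright - dark)
    return [(v - dark) * scale for v in values]

# toy band: the last three pixels fall inside the snow polygon
band = [0.05, 0.20, 0.45, 0.50, 0.55]
snow = [0.45, 0.50, 0.55]
adjusted = stretch_band(band, snow, snow_sr_expected=0.9)  # 0.9 is a placeholder
```

In this toy case the darkest pixel (0.05) maps to 0 and the snow median (0.50) maps to 0.9, so every other pixel is scaled by the same factor of about 2.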

In [None]:
# -----Read mosaicked image file names
os.chdir(im_mosaic_path)
im_mosaic_fns = glob.glob('*.tif')
im_mosaic_fns = sorted(im_mosaic_fns)

# -----Create polygon(s) of the highest and lowest elevations within the AOI
polygon_top, polygon_bottom, im_mosaic_fn, im_mosaic = f.create_AOI_elev_polys(AOI_UTM, im_mosaic_path, im_mosaic_fns, DEM)
# plot
if plot_results:
    fig, ax = plt.subplots(figsize=(8,8))
    # display the mosaic returned above as an RGB composite
    ax.imshow(np.dstack([im_mosaic.data[2], im_mosaic.data[1], im_mosaic.data[0]]), 
               extent=(np.min(im_mosaic.x), np.max(im_mosaic.x), np.min(im_mosaic.y), np.max(im_mosaic.y)))
    AOI_UTM.plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2, label='AOI')
    # label only the first polygon in each set so the legend has one entry per set
    for count, geom in enumerate(polygon_top.geoms):
        xs, ys = geom.exterior.xy
        ax.plot(xs, ys, color='c', label='top polygon(s)' if count==0 else '_nolegend_')
    for count, geom in enumerate(polygon_bottom.geoms):
        xs, ys = geom.exterior.xy
        ax.plot(xs, ys, color='orange', label='bottom polygon(s)' if count==0 else '_nolegend_')
    ax.set_xlabel('Easting [m]')
    ax.set_ylabel('Northing [m]')
    ax.set_title(im_mosaic_fn)
    fig.legend(loc='upper right')
    fig.tight_layout()
    plt.show()
    
# -----Loop through images
for im_mosaic_fn in im_mosaic_fns:
    
    # load image
    print(im_mosaic_fn)
    # adjust radiometry
    im_adj_fn, im_adj_method = f.adjust_image_radiometry(im_mosaic_fn, im_mosaic_path, polygon_top, polygon_bottom, 
                                                         im_adj_path, skip_clipped, plot_results)
    print('image adjustment method = ' + im_adj_method)
    print('----------')
    print(' ')

## 4. Classify images

All adjusted images will be classified using the pre-trained classifier into the following classes:
- 1 = Snow
- 2 = Shadowed snow
- 3 = Ice
- 4 = Bare ground
- 5 = Water

The resulting classified image collection will be cropped to the AOI if `crop_to_AOI = True` and saved to the `im_classified_path` folder if `save_outputs = True`. 
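
As a small illustration of how these class codes can be turned into a snow-covered fraction, the sketch below counts pixels in a toy classified grid. It assumes that both snow classes (1 and 2) count toward SCA and that any other value, such as 0, marks no data. Both assumptions and the helper `snow_covered_fraction` are illustrative and are not taken from `f.classify_image`.

```python
CLASS_NAMES = {1: 'Snow', 2: 'Shadowed snow', 3: 'Ice', 4: 'Bare ground', 5: 'Water'}
SNOW_CLASSES = {1, 2}  # assumption: both snow classes count toward SCA

def snow_covered_fraction(classified):
    """Fraction of valid pixels classified as snow, or None if no valid pixels.

    `classified` is a 2D list of class codes; codes outside CLASS_NAMES
    are treated as no data (hypothetical convention).
    """
    valid = [v for row in classified for v in row if v in CLASS_NAMES]
    if not valid:
        return None
    snow = sum(1 for v in valid if v in SNOW_CLASSES)
    return snow / len(valid)

# toy classified image; 0 marks a no-data pixel in this example
classified = [[1, 1, 2, 3],
              [4, 4, 0, 5]]
fraction = snow_covered_fraction(classified)
```

Multiplying the snow pixel count by the pixel area would give the SCA in square meters instead of a fraction.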

In [None]:
# -----Read adjusted image file names
os.chdir(im_adj_path)
im_adj_fns = glob.glob('*.tif')
im_adj_fns = sorted(im_adj_fns)

# start timer
t1 = time.time()

# -----Load image classifier and feature columns
clf_fn = base_path+'inputs-outputs/PS_classifier_all_sites.sav'
with open(clf_fn, 'rb') as clf_file:
    clf = pickle.load(clf_file)
feature_cols_fn = base_path+'inputs-outputs/PS_feature_cols.pkl'
with open(feature_cols_fn, 'rb') as feature_cols_file:
    feature_cols = pickle.load(feature_cols_file)

# -----Loop through images
# image datetimes
im_dts = [] 
# DataFrame to hold stats summary
df = pd.DataFrame(columns=('site_name', 'datetime', 'im_elev_min', 'im_elev_max', 'snow_elev_min', 'snow_elev_max', 
                           'snow_elev_median', 'snow_elev_10th_perc', 'snow_elev_90th_perc'))
for im_adj_fn in im_adj_fns:

    # extract datetime from image name
    im_dt = np.datetime64(im_adj_fn[0:4] + '-' + im_adj_fn[4:6] + '-' + im_adj_fn[6:8]
                          + 'T' + im_adj_fn[9:11] + ':00:00')
    im_dts = im_dts + [im_dt]

    # classify snow
    im_classified_fn, im_adj = f.classify_image(im_adj_fn, im_adj_path, 
                                                clf, feature_cols, crop_to_AOI, AOI_UTM, im_classified_path)  

# -----Stop timer
print('Time elapsed: '+str(np.round((time.time()-t1)/60, 2))+' minutes')


## 5. Estimate seasonal snow line and snow line elevations
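
The cells in this pipeline recover each image's capture date and hour by slicing the first 11 characters of its file name (`YYYYMMDD_HH`). A minimal sketch of the same idea using `datetime.strptime` follows; the helper name and the example file name are hypothetical.

```python
from datetime import datetime

def capture_hour_from_fn(fn):
    """Parse the leading 'YYYYMMDD_HH' of a file name into a datetime.

    Raises ValueError if the prefix does not match the expected pattern.
    """
    return datetime.strptime(fn[0:11], '%Y%m%d_%H')

# made-up file name for illustration
dt = capture_hour_from_fn('20210801_18_adj.tif')
print(dt.isoformat())
```

Parsing with an explicit format string fails loudly on unexpected file names, which can be easier to debug than a silently wrong date built from string slices.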

In [None]:
# -----Read classified image file names
os.chdir(im_classified_path)
im_classified_fns = glob.glob('*.tif')
im_classified_fns = sorted(im_classified_fns)

# -----Create directories for outputs if they do not exist
# snowlines folder
if save_outputs and not os.path.exists(snowlines_path):
    os.makedirs(snowlines_path)
    print('made directory for output snowlines: ' + snowlines_path)
# figures folder
if save_figures and not os.path.exists(figures_out_path):
    os.makedirs(figures_out_path)
    print('made directory for output figures: ' + figures_out_path)

# -----Initialize variables
results_df = pd.DataFrame(columns=['study_site', 'datetime', 'snowlines_coords', 'snowlines_elevs', 'snowlines_elevs_median'])

# -----Loop through classified image filenames
for im_classified_fn in im_classified_fns:
    
    # extract datetime from image file name
    im_date = im_classified_fn[0:11]
    im_dt = np.datetime64(im_classified_fn[0:4] + '-' + im_classified_fn[4:6] + '-' + im_classified_fn[6:8]
                          + 'T' + im_classified_fn[9:11] + ':00:00')
    
    print(im_date)
    
    # load adjusted image from the same date
    os.chdir(im_adj_path)
    im_adj_fn = glob.glob(im_date + '*.tif')[0]
    
    # estimate snow line
    try:
        fig, ax, sl_est, sl_est_elev = f.delineate_snow_line(im_adj_fn, im_adj_path, im_classified_fn, im_classified_path, AOI_UTM, DEM, DEM_rio)
        plt.show()

        # calculate median snow line elevation
        sl_est_elev_median = np.nanmedian(sl_est_elev)

        # save figure
        if save_figures:
            fig.savefig(figures_out_path+'PS_' + im_date + '_SCA.png', dpi=300, facecolor='white', edgecolor='none')
            print('figure saved to file')

        # compile results in df
        result_df = pd.DataFrame({'study_site': site_name, 
                                  'datetime': im_dt, 
                                  'snowlines_coords': [sl_est], 
                                  'snowlines_elevs': [sl_est_elev], 
                                  'snowlines_elevs_median': sl_est_elev_median})
        # concatenate to results_df
        results_df = pd.concat([results_df, result_df])
        
    except Exception as e:
        
        # report the failure and move on to the next image
        print('error in snowline delineation, continuing...')
        print(e)
        print('----------')
        print(' ')
    
# -----Plot median snow line elevations
fig2, ax2 = plt.subplots(figsize=(10,6))
ax2.plot(results_df['datetime'], results_df['snowlines_elevs_median'], '.b')
ax2.set_xlabel('Image capture date')
ax2.set_ylabel('Median snow line elevation [m]')
ax2.grid()
fig2.suptitle(site_name + ' Glacier snow line elevations')
plt.show()

# -----Save results
if save_outputs:
    snowlines_fn =  site_name + '_snowlines.pkl'
    results_df.to_pickle(snowlines_path + snowlines_fn)
    print('snowline data table saved to file:' + snowlines_path + snowlines_fn)
if save_figures:
    figure_sl_fn = site_name + '_sl_elevs_median.png'
    fig2.savefig(figures_out_path + figure_sl_fn, 
                 facecolor='white', edgecolor='none')
    print('figure saved to file:' + figures_out_path + figure_sl_fn)

### _Optional:_ Compile individual figures into a .gif and delete individual figures

In [None]:
from PIL import Image as PIL_Image
from IPython.display import Image as IPy_Image, display

# -----Make a .gif of output images
os.chdir(figures_out_path)
fig_fns = glob.glob('PS_*_SCA.png') # load all output figure file names
fig_fns.sort() # sort chronologically
# grab figures date range for .gif file name
fig_start_date = fig_fns[0][3:-8] # first figure date
fig_end_date = fig_fns[-1][3:-8] # final figure date
frames = [PIL_Image.open(im) for im in fig_fns]
frame_one = frames[0]
gif_fn = ('PS_' + fig_start_date[0:8] + '_' + fig_end_date[0:8] + '_SCA.gif' )
# append only the remaining frames so the first frame is not repeated
frame_one.save(figures_out_path + gif_fn, format="GIF", append_images=frames[1:], save_all=True, duration=2000, loop=0)
print('GIF saved to file: ' + figures_out_path + gif_fn)

# -----Display .gif
# display() is needed because this is not the last expression in the cell
display(IPy_Image(filename = figures_out_path + gif_fn))

# -----Clean up: close and delete individual figure files
for frame in frames:
    frame.close()
for fig_fn in fig_fns:
    os.remove(os.path.join(figures_out_path, fig_fn))
print('Individual figure files deleted.')

## 6. Fit Fourier series model to snowline time series

Adapted from code developed by [Jukes Liu](https://github.com/CryoGARS-Glaciology/Fourier-terminus-models)

This code fits a time series of median snow line elevations with a Fourier series. The optimal number of terms (approximately the number of years in the time series) is chosen using Monte Carlo simulations. Then, 500 Fourier series models are generated for each time series. The model IQR is calculated and exported to a new csv file.
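
For reference, a Fourier series of order `n` has the form `y(x) = a0 + sum_i [a_i cos(i w x) + b_i sin(i w x)]` for `i = 1..n`. The sketch below evaluates that expression numerically with the standard library only. The coefficient values and the choice of one cycle per 365 days are assumptions made for this example and are not fitted results.

```python
import math

def fourier_eval(x, w, a0, a, b):
    """Evaluate a0 + sum_i (a_i cos(i w x) + b_i sin(i w x)) for i = 1..len(a)."""
    return a0 + sum(ai * math.cos(i * w * x) + bi * math.sin(i * w * x)
                    for i, (ai, bi) in enumerate(zip(a, b), start=1))

# assumption: x is in days and the fundamental period is one year
w = 2 * math.pi / 365.0
# made-up coefficients: mean elevation 1600 m, annual amplitude 100 m
y_start = fourier_eval(0.0, w, a0=1600.0, a=[-100.0], b=[0.0])
y_half = fourier_eval(182.5, w, a0=1600.0, a=[-100.0], b=[0.0])
print(y_start, y_half)
```

With these made-up coefficients the curve starts at its minimum and reaches its maximum half a year later, which mimics a snow line that rises through the melt season.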

In [None]:
#### NOT WORKING PROPERLY YET ####

from symfit import parameters, variables, sin, cos, Fit
from sklearn.model_selection import train_test_split
from time import mktime

def fourier_series(x, f, n=0):
    """
    Returns a symbolic fourier series of order `n`.

    :param n: Order of the fourier series.
    :param x: Independent variable
    :param f: Frequency of the fourier series
    """
    # Make the parameter objects for all the terms
    a0, *cos_a = parameters(','.join(['a{}'.format(i) for i in range(0, n + 1)]))
    sin_b = parameters(','.join(['b{}'.format(i) for i in range(1, n + 1)]))
    # Construct the series
    series = a0 + sum(ai * cos(i * f * x) + bi * sin(i * f * x)
                     for i, (ai, bi) in enumerate(zip(cos_a, sin_b), start=1))
    return series

x, y = variables('x, y')
w, = parameters('w')
model_dict = {y: fourier_series(x, f=w, n=5)}
print(model_dict)

In [None]:
# -----Read snowlines file name
# os.chdir(snowlines_path)
# snowlines_fn = glob.glob('*_snowlines.pkl')[0]
# snowlines = pd.read_pickle(snowlines_fn)
# snowlines = snowlines.reset_index(drop=True)

# # ----Split snowline dates and median elevations into X and Y data
# X = np.ravel(snowlines['datetime'])
# Y = np.array(np.ravel(snowlines['snowlines_elevs_median']), dtype=float)
# # convert datetimes to floats
# X = np.array([mktime(x.timetuple()) for x in X], dtype=float)
# # remove NaNs
# X = X[~np.isnan(Y)]
# Y = Y[~np.isnan(Y)]
    
Y = np.array([1500, 1501, 1503, 1506, 1550, 1545, 1575, 1800, 1750, 1725,
    1515, 1516, 1575, 1574, 1585, 1590, 1650, 1725, 1750, 1640])
time_deltas = [np.timedelta64(x, 'D') for x in [0, 1, 5, 14, 15, 20, 35, 46, 57, 60, 
                                                200, 202, 207, 214, 215, 220, 266, 273, 295, 296]]
X = [np.datetime64('2016-09-01') + x for x in time_deltas]
# convert dates to days after 2016-05-01 for model fitting
day1 = np.datetime64('2016-05-01')
days = np.array([(day - day1) / np.timedelta64(1, 'D') for day in X])
X = days

# -----Identify the ideal number of terms for the Fourier model using Monte Carlo simulations
# try Fourier fits with 3 different numbers of terms based on the number of years in snowlines
nyears = 2
# nyears = snowlines['datetime'].iloc[-1].year - snowlines['datetime'].iloc[0].year # number of years in snowlines
nmc = 100 # number of Monte Carlo simulations
pTrain = 0.9 # fraction of data to use as training
fourier_ns = [nyears-1, nyears, nyears+1]
sls = np.zeros(nmc) # number of terms for the best fit model
error = np.zeros(nmc) # error associated with the best fit model
print('Conducting Monte Carlo simulations to determine the ideal number of model terms...')    

# loop through Monte Carlo simulations
df = pd.DataFrame(columns=['fit_minus1_err', 'fit_err', 'fit_plus1_err'])
for i in np.arange(0,nmc):
        
    # split into training and testing data, using a different seed for each simulation
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=pTrain, random_state=int(i))

    # fit Fourier curves to the training data with varying numbers of coefficients
    fit_minus1 = Fit({y: fourier_series(x, f=w, n=fourier_ns[0])}, 
                x=X_train, y=Y_train).execute() # fit model
    fit = Fit({y: fourier_series(x, f=w, n=fourier_ns[1])}, 
                x=X_train, y=Y_train).execute()
    fit_plus1 = Fit({y: fourier_series(x, f=w, n=fourier_ns[2])}, 
                x=X_train, y=Y_train).execute()

    # apply fitted models to testing data
    Y_pred_minus1 = fit_minus1.model(x=X_test, **fit_minus1.params).y 
    Y_pred = fit.model(x=X_test, **fit.params).y
    Y_pred_plus1 = fit_plus1.model(x=X_test, **fit_plus1.params).y
    
    # calculate error, concatenate to df
    fit_minus1_err = np.abs(Y_test - Y_pred_minus1)
    fit_err = np.abs(Y_test - Y_pred)
    fit_plus1_err = np.abs(Y_test - Y_pred_plus1)
    result = pd.DataFrame({'fit_minus1_err': fit_minus1_err, 
                           'fit_err': fit_err, 
                           'fit_plus1_err': fit_plus1_err})    
    # add results to df
    df = pd.concat([df, result])
    
df = df.reset_index(drop=True)
# calculate mean error for each number of coefficients
fit_err_mean = [np.nanmean(df['fit_minus1_err']), np.nanmean(df['fit_err']), np.nanmean(df['fit_plus1_err'])]
# identify best number of coefficients
Ibest = np.argmin(fit_err_mean)
fit_best = [fit_minus1, fit, fit_plus1][Ibest]
fourier_n = fourier_ns[Ibest]
print('Optimal # of model terms = ' + str(fourier_n))
print(' Mean error = +/- ' + str(fit_err_mean[Ibest]) + ' m')
    

In [None]:
# -----Calculate average model coefficients from 500 Monte Carlo simulations
nmc = 500

# list to hold the fitted coefficients from each simulation
coeffs = []
# loop through Monte Carlo simulations
for i in np.arange(0,nmc):
        
    # split into training and testing data, using a different seed for each simulation
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=pTrain, random_state=int(i), shuffle=True)

    # fit fourier model to training data
    fit = Fit({y: fourier_series(x, f=w, n=fourier_n)}, 
                x=X_train, y=Y_train).execute() 

    # apply fourier model to testing data
    Y_pred = fit.model(x=X_test, **fit.params).y 
    # calculate error
    fit_err = np.abs(Y_test - Y_pred)
    # compile coefficient values (fit.params maps parameter names to fitted values)
    coeffs.append(dict(fit.params))
    
# one row per simulation, one column per coefficient
df = pd.DataFrame(coeffs)
df
    