# OPERA RTC Validation: Terrain Flattening Performance
## Backscatter Distributions by Slope (Foreslope, Backslope, Flat)

**Alex Lewandowski & Franz J Meyer; Alaska Satellite Facility, University of Alaska Fairbanks**

This notebook analyzes the RTC backscatter distributions for foreslopes, backslopes, and flat regions as part of the OPERA RTC validation campaign. The median values of the foreslope and backslope regions are then compared to evaluate whether the OPERA RTCs meet the terrain flattening requirement.

**Notebook Requires**
- MGRS tiles of prepared OPERA RTC CalVal data created with Prep_OPERA_RTC_CalVal_data_stage1_part3.ipynb

<hr>

# 0. RTC Terrain Flattening Requirement

<div class="alert alert-success">
<i>The median radar backscatter of the OPERA RTC-S1 product over an area of foreslope shall be within 1dB of the median radar backscatter over an area of backslope for forested land-types, for at least 80% of all validation products considered.</i>
</div>
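
The requirement can be read as a two-level check: a per-product comparison of medians, followed by a pass fraction over all products. The sketch below illustrates that reading under the assumption that the medians are already expressed in dB. The function names and the example medians are hypothetical and are not part of the OPERA tooling.

```python
import numpy as np

def product_passes(fore_median_db, back_median_db, threshold_db=1.0):
    """Return True if the foreslope and backslope medians differ by less than threshold_db."""
    return abs(fore_median_db - back_median_db) < threshold_db

def campaign_passes(median_pairs_db, required_fraction=0.8):
    """Return (fraction_passing, requirement_met) for (foreslope, backslope) median pairs."""
    results = [product_passes(f, b) for f, b in median_pairs_db]
    fraction = float(np.mean(results))
    return fraction, fraction >= required_fraction

# made-up medians in dB, for illustration only
pairs = [(-13.2, -13.6), (-12.9, -14.1), (-14.0, -13.5), (-13.1, -13.3), (-12.7, -13.0)]
fraction, ok = campaign_passes(pairs)
print(f"{fraction:.0%} of products pass; requirement met: {ok}")
```

In this made-up example four of the five products pass, which is exactly the 80% needed.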

<hr>

# 1. Load Necessary Libraries

In [None]:
import copy
import csv
from ipyfilechooser import FileChooser
import numpy.ma as ma
import numpy as np
import pandas as pd
from pathlib import Path
from pprint import pprint
import re
import rioxarray as rxr
import shutil
from scipy import stats

from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
import matplotlib.lines as lines
from matplotlib.offsetbox import AnchoredText

import opensarlab_lib as osl

<hr>

# 2. Load RTC Data and Landcover Layers

In [None]:
print("Select the directory holding your RTCs")
fc = FileChooser(Path.cwd())
display(fc)

In [None]:
# try/except for Papermill
try:
    data_dir = Path(fc.selected_path)
except Exception:
    # no directory selected in the widget; data_dir is expected to be injected by Papermill
    pass

In [None]:
data_dir = Path(data_dir) # for Papermill
print(data_dir)
full_scene_tiffs = list(data_dir.glob('*.tif'))
full_scene_tiffs

<hr>

# 3. Plot Backscatter Distributions

We first tile the image and evaluate the statistical distribution of OPERA RTC radar backscatter for foreslopes, backslopes, and flat regions for each tile. The function below plots these three distributions together.

In [None]:
def plot_backscatter_distributions_by_slope(fore, back, flat, central_moments, polarization, dataset_name, tile=None, backscatter_minmax=None, output=None):
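            # central_moments is ordered as [means, medians, modes, stds]
            means, stds = central_moments[0], central_moments[3]

            # create histograms
            f, ax = plt.subplots(figsize=(18, 8))
            n_bins = 200
            colors = ['blue', 'green', 'darkorange']
            n, bins, patches = ax.hist([fore,back,flat], n_bins, color=colors,
                                       range=backscatter_minmax, histtype='step')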

            # fill 1st standard deviation for each histogram and add line at mean
            std_colors = ['skyblue', 'lightgreen', 'orange']
            for j, hist in enumerate(patches):
                y_max = hist[0].get_path().get_extents().y1
                hist_path = hist[0].get_path().vertices
                std_hist = plt.Polygon(hist_path, color=std_colors[j], fill=True, alpha=0.2)
                ax.add_patch(std_hist)
                # clip the fill to mean +/- 1 std; the rectangle's lower-left corner is (mean - std, 0)
                std_clip = plt.Rectangle((means[j]-stds[j], 0), stds[j]*2, y_max,
                                          fill=True, visible=False)
                ax.add_patch(std_clip)
                std_hist.set_clip_path(std_clip)
                mean_line = lines.Line2D([means[j],means[j]], [0, y_max], color=colors[j], ls='--')
                ax.add_artist(mean_line)
                mean_line.set_clip_path(hist[0])

            annotation = AnchoredText(
                (f"PIXEL COUNTS:\n"
                 f"foreslope:  {np.count_nonzero(~np.isnan(fore))}\n"
                 f"backslope: {np.count_nonzero(~np.isnan(back))}\n"
                 f"flat:           {np.count_nonzero(~np.isnan(flat))}\n\n"
                 f"MEAN:\n"
                 f"foreslope:  {central_moments[0][0]}\n"
                 f"backslope: {central_moments[0][1]}\n"
                 f"flat:           {central_moments[0][2]}\n\n"
                 f"MEDIAN:\n"
                 f"foreslope:  {central_moments[1][0]}\n"
                 f"backslope: {central_moments[1][1]}\n"
                 f"flat:           {central_moments[1][2]}\n\n"
                 f"MODE:\n"
                 f"foreslope:  {central_moments[2][0]}\n"
                 f"backslope: {central_moments[2][1]}\n"
                 f"flat:           {central_moments[2][2]}\n\n"
                 f"STANDARD DEVIATION:\n"
                 f"foreslope:  {central_moments[3][0]}\n"
                 f"backslope: {central_moments[3][1]}\n"
                 f"flat:           {central_moments[3][2]}"                 
                ),
                loc='upper left', prop=dict(size=12), frameon=True, bbox_to_anchor=(1.0,1.0), bbox_transform=ax.transAxes)
            annotation.patch.set_boxstyle("round,pad=0.,rounding_size=0.2")
            ax.add_artist(annotation)  

            # add histogram legend
            hist_handles = [lines.Line2D([0,1], [0,0], lw=1, color=c) for c in colors]
            hist_legend = ax.legend(handles=hist_handles, labels=['foreslope','backslope','flat'], loc='upper right')
            ax.add_artist(hist_legend)

            # add standard deviation legend
            std_handles = [Rectangle((0,0),1,1,color=c,ec="k",alpha=0.2) for c in std_colors]
            std_legend = ax.legend(handles=std_handles, labels=['foreslope 1 std', 'backslope 1 std', 'flat 1 std'], loc='center right', bbox_to_anchor=(1,0.75))
            ax.add_artist(std_legend)

            # add mean legend
            mean_handles = [lines.Line2D([0,0], [0,1], color=c, ls='--') for c in colors]
            mean_legend = ax.legend(handles=mean_handles, labels=['foreslope mean', 'backslope mean', 'flat mean'], loc='center right', bbox_to_anchor=(1,0.55))
            ax.add_artist(mean_legend)

            if tile:
                title = f"Distribution of {polarization} Foreslope, Backslope, and Flat Backscatter Values\n{dataset_name}\nMGRS: {tile}"
            else:
                title = f"Distribution of {polarization} Foreslope, Backslope, and Flat Backscatter Values\n{dataset_name}"
            
            ax.set(title=title,
                   xlabel='Backscatter',
                   ylabel='Frequency')
            if output:
                plt.savefig(output, dpi=300, transparent=True)
            plt.show()

In [None]:
print("Select the scale in which to work:")
scale_choice = osl.select_parameter(['log scale', 'power scale'])
display(scale_choice)

### Generate Histograms for MGRS Tiles

In [None]:
# try/except for Papermill
try:
    log = scale_choice.value == 'log scale'
except Exception:
    # widget not available; log is expected to be injected by Papermill
    pass
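
The choice of scale matters for the statistics that follow. Because the logarithm is monotonic, the median can be taken before or after the conversion to dB with the same result, whereas the mean cannot. The sketch below illustrates this on synthetic values; `power_to_db` is a hypothetical helper that also guards against non-positive inputs, which the notebook's own conversion does not do.

```python
import numpy as np

def power_to_db(power):
    """Convert linear power to dB; non-positive and NaN inputs become NaN."""
    power = np.asarray(power, dtype=float)
    out = np.full(power.shape, np.nan)
    valid = power > 0
    out[valid] = 10 * np.log10(power[valid])
    return out

rng = np.random.default_rng(0)
# synthetic, strictly positive values; an odd count makes the median an actual sample
power = rng.gamma(shape=4.0, scale=0.02, size=10001)

db = power_to_db(power)
median_then_db = 10 * np.log10(np.median(power))
db_then_median = np.nanmedian(db)
mean_then_db = 10 * np.log10(np.mean(power))
db_then_mean = np.nanmean(db)

print(f"median: {median_then_db:.3f} dB vs {db_then_median:.3f} dB")
print(f"mean:   {mean_then_db:.3f} dB vs {db_then_mean:.3f} dB")
```

Note that a threshold stated in dB, such as the 1 dB requirement, is only meaningful when the comparison is made in log scale.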

<hr>

# 4. Generate Summary Histograms for Full Scene Data

Now we can collect all information to generate a summary histogram for the full OPERA RTC scene. The result is the median radar backscatter for foreslopes and backslopes, evaluated over dense forest areas.

In [None]:
pols = ['vh', 'vv']

vh_total = [np.array([]), np.array([]), np.array([])]
vv_total = [np.array([]), np.array([]), np.array([])]

opera_id = data_dir.name.split('_prepped')[0]
output_dir = data_dir.parent/f"Output_Tree_Cover_Slope_Comparisons_{opera_id}"
output_dir.mkdir(exist_ok=True)

moments = {p:{} for p in pols}

for p in pols:
    fore_pth = list(data_dir.glob(f"*{p}*_clip_foreslope.tif"))[0]
    back_pth = list(data_dir.glob(f"*{p}*_clip_backslope.tif"))[0]
    flat_pth = list(data_dir.glob(f"*{p}*_clip_flat.tif"))[0]
    
    fore = rxr.open_rasterio(str(fore_pth), masked=True).to_numpy().flatten()
    back = rxr.open_rasterio(str(back_pth), masked=True).to_numpy().flatten()
    flat = rxr.open_rasterio(str(flat_pth), masked=True).to_numpy().flatten()
    
    if log:
        fore = 10 * np.log10(fore)
        back = 10 * np.log10(back)
        flat = 10 * np.log10(flat)    
    
    # calculate means, medians, modes, and standard deviations for full scene
    means = [np.nanmean(fore), np.nanmean(back), np.nanmean(flat)]
    medians = [np.nanmedian(fore), np.nanmedian(back), np.nanmedian(flat)]
    modes = [stats.mode(fore[~np.isnan(fore)], keepdims=False)[0], stats.mode(back[~np.isnan(back)],keepdims=False)[0], stats.mode(flat[~np.isnan(flat)],keepdims=False)[0]]    
    stds = [np.nanstd(fore), np.nanstd(back), np.nanstd(flat)]
    central_moments = [means, medians, modes, stds]
    
    output = f"{output_dir}/full_scene_{p}_PLOT"
        
    minmax = [min(np.nanpercentile(fore, 0.1), np.nanpercentile(back, 0.1), np.nanpercentile(flat, 0.1)),
              max(np.nanpercentile(fore, 99.9), np.nanpercentile(back, 99.9), np.nanpercentile(flat, 99.9))
             ]
        
    if p == 'vh':
        vh_fore = fore
        vh_back = back
        moments[p] = central_moments
    else:
        vv_fore = fore
        vv_back = back
        moments[p] = central_moments

    plot_backscatter_distributions_by_slope(fore, back, flat, central_moments, f'FULL SCENE {p}', data_dir.stem, backscatter_minmax=minmax, output=output)
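
`stats.mode` returns the most frequent exact value, which for floating-point backscatter may reflect a few repeated values rather than the peak of the distribution. One possible alternative, shown here as an assumption and not as what this notebook computes, is to estimate the mode from the most populated histogram bin. `histogram_mode` is a hypothetical helper and the sample is synthetic.

```python
import numpy as np

def histogram_mode(values, bins=200, value_range=None):
    """Estimate the mode as the center of the most populated histogram bin, ignoring NaN/inf."""
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    peak = np.argmax(counts)
    return 0.5 * (edges[peak] + edges[peak + 1])

rng = np.random.default_rng(1)
sample = rng.normal(loc=-13.0, scale=2.0, size=50_000)  # synthetic dB-like values
sample[::100] = np.nan  # simulate masked pixels
print(f"histogram mode: {histogram_mode(sample):.2f}")
```

The estimate depends on the bin count, so it is best reported together with the binning used for the plot.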

<hr>

# 5. OPERA Terrain Flattening Evaluation Results

In [None]:
# Evaluate the 1 dB requirement for each polarization, using the medians stored in
# moments[p][1] as [foreslope, backslope, flat]. The threshold assumes log scale (dB).
rows = {}
for p in pols:
    fore_median, back_median = moments[p][1][0], moments[p][1][1]
    difference = np.abs(fore_median - back_median)
    PF = 'PASS' if difference < 1.0 else 'FAIL'
    rows[f'{p.upper()} median ' + r'$\gamma^0$ [dB]'] = [fore_median, back_median, difference, PF]

df = pd.DataFrame.from_dict(rows, orient='index',
                            columns=['Foreslopes', 'Backslopes', 'Difference', 'Pass/Fail'])

def style_pass_fail(value):
    # color only the Pass/Fail entries: red for FAIL, green for PASS
    if value == 'FAIL':
        return 'color:red;'
    if value == 'PASS':
        return 'color:green;'
    return None

s2 = df.style.map(style_pass_fail)

print(' ')
print('-------------------------------------------------')
print(f'Evaluation Results for Frame {opera_id}')
print('-------------------------------------------------')
print(' ')
s2

In [None]:
output_csv = output_dir/f"Results_Backscatter_Distributions_by_Slope_{opera_id}.csv"

fields = [
    "Granule", "Polarization", 
    "Foreslope Mean", "Backslope Mean",
    "Foreslope Median", "Backslope Median",
    "Foreslope Mode", "Backslope Mode",
    "Foreslope STD", "Backslope STD",
    "Foreslope Median - Backslope Median",
    "Pass/Fail"
]

# write one row per polarization, skipping rows already present in the CSV
for p in pols:
    row = [
        opera_id, p, 
        moments[p][0][0], moments[p][0][1],
        moments[p][1][0], moments[p][1][1],
        moments[p][2][0], moments[p][2][1],
        moments[p][3][0], moments[p][3][1],
        np.abs(moments[p][1][0] - moments[p][1][1]),
        np.abs(moments[p][1][0] - moments[p][1][1]) < 1.0
    ]
    row = [str(v) for v in row]
        

    if not output_csv.exists():
        with open(output_csv, 'w', newline='') as csvfile:
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow(fields)
            csvwriter.writerow(row)
    else:
        # read the existing rows first, then append only if this row is new
        with open(output_csv, 'r', newline='') as csvfile:
            csvreader_list = list(csv.reader(csvfile))
        if row not in csvreader_list:
            with open(output_csv, 'a', newline='') as csvfile:
                csvwriter = csv.writer(csvfile)
                csvwriter.writerow(row)
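
The requirement is stated over all validation products, so the per-product results eventually need to be combined. The sketch below shows one way to compute a pass rate per polarization with pandas, assuming the per-product CSV files have been concatenated into one table. The granule names and results are made up, and only a subset of the columns is used.

```python
import io
import pandas as pd

# hypothetical combined results, for illustration only
csv_text = """Granule,Polarization,Pass/Fail
granule_A,vh,True
granule_A,vv,True
granule_B,vh,False
granule_B,vv,True
granule_C,vh,True
granule_C,vv,True
"""

results = pd.read_csv(io.StringIO(csv_text))
# normalize to booleans whether the column was parsed as bool or as text
passed = results["Pass/Fail"].astype(str).str.lower() == "true"
pass_rate = passed.groupby(results["Polarization"]).mean()

print(pass_rate)
print(pass_rate >= 0.8)  # True where at least 80% of products pass
```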

<hr>

<div class="alert alert-success">
<font size='5'><b>Optional Sections:</b></font>
    
These next sections are optional and can be used to further statistically analyze the obtained data set. 
</div>

## Subset datasets for T-tests

For each polarization:
- avoid using adjoining foreslope and backslope pixels to ensure independent sampling
    - keep every 30th foreslope pixel value, starting at index 0
    - keep every 30th backslope pixel value, starting at index 15
- remove NaN values from subsets
- select n pixels from each subset (as defined by `sample_size` below)

In [None]:
# sample_size = 500

# vh_fore_subset = vh_fore[::30]
# vh_fore_subset = vh_fore_subset[~np.isnan(vh_fore_subset)]
# vh_fore_subset = np.random.choice(vh_fore_subset, size=sample_size, replace=False)

# vh_back_subset = vh_back[15::30]
# vh_back_subset = vh_back_subset[~np.isnan(vh_back_subset)]
# vh_back_subset = np.random.choice(vh_back_subset, size=sample_size, replace=False)

# vv_fore_subset = vv_fore[::30]
# vv_fore_subset = vv_fore_subset[~np.isnan(vv_fore_subset)]
# vv_fore_subset = np.random.choice(vv_fore_subset, size=sample_size, replace=False)

# vv_back_subset = vv_back[15::30]
# vv_back_subset = vv_back_subset[~np.isnan(vv_back_subset)]
# vv_back_subset = np.random.choice(vv_back_subset, size=sample_size, replace=False)
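
The four subsetting blocks above follow the same three steps, so they could be expressed once. The sketch below does this with a hypothetical helper, `subset_for_ttest`, and a seeded random generator for reproducibility; the arrays are synthetic stand-ins for `vh_fore` and `vh_back`.

```python
import numpy as np

def subset_for_ttest(values, start, step=30, sample_size=500, rng=None):
    """Keep every `step`-th value from `start`, drop NaNs, then draw
    `sample_size` values without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    subset = np.asarray(values, dtype=float)[start::step]
    subset = subset[~np.isnan(subset)]
    if subset.size < sample_size:
        raise ValueError(f"only {subset.size} valid pixels available, need {sample_size}")
    return rng.choice(subset, size=sample_size, replace=False)

rng = np.random.default_rng(42)
fore = rng.normal(-13.0, 2.0, 60_000)  # synthetic values, for illustration only
back = rng.normal(-13.5, 2.0, 60_000)
fore[rng.random(fore.size) < 0.1] = np.nan  # simulate masked pixels

fore_subset = subset_for_ttest(fore, start=0, rng=rng)
back_subset = subset_for_ttest(back, start=15, rng=rng)
print(fore_subset.shape, back_subset.shape)
```

The explicit size check gives a clearer error than the one raised by the random draw when too few valid pixels remain.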

---
## Perform Shapiro-Wilk tests to confirm that the subset backscatter data are normally distributed for each polarization and slope

### VH Foreslope Normality

In [None]:
# import math
# vh_fore_stats = stats.describe(vh_fore_subset)
# print(f"vh_fore_subset:\n{vh_fore_stats}")
# mean = vh_fore_stats.mean
# std = math.sqrt(vh_fore_stats.variance)

In [None]:
# _, ax = plt.subplots(figsize=(8, 6))
# ax.hist(vh_fore_subset, bins=50)
# ax.set(title='VH Foreslope Subset Pixel Distribution', xlabel='backscatter', ylabel='Frequency')
# plt.show()

In [None]:
# stats.anderson(vh_fore_subset, dist='norm')

In [None]:
# vh_fore_shapiro = stats.shapiro(vh_fore_subset)
# print(f"{vh_fore_shapiro}\n")
# vh_fore_normal = vh_fore_shapiro.pvalue >= 0.05
# if vh_fore_normal:
#     print(f"The VH foreslope subset backscatter values are normally distributed")
# else:
#     print(f"The VH foreslope subset backscatter values are NOT normally distributed")

### VH Backslope Normality

In [None]:
# _, ax = plt.subplots(figsize=(8, 6))
# ax.hist(vh_back_subset, bins=200)
# ax.set(title='VH Backslope Subset Pixel Distribution', xlabel='backscatter', ylabel='Frequency')
# plt.show()

In [None]:
# stats.anderson(vh_back_subset, dist='norm')

In [None]:
# vh_back_shapiro = stats.shapiro(vh_back_subset)
# print(f"{vh_back_shapiro}\n")
# vh_back_normal = vh_back_shapiro.pvalue >= 0.05
# if vh_back_normal:
#     print(f"The VH backslope subset backscatter values are normally distributed")
# else:
#     print(f"The VH backslope subset backscatter values are NOT normally distributed")

### VV Foreslope Normality

In [None]:
# _, ax = plt.subplots(figsize=(8, 6))
# ax.hist(vv_fore_subset, bins=200)
# ax.set(title='VV Foreslope Subset Pixel Distribution', xlabel='backscatter', ylabel='Frequency')
# plt.show()

In [None]:
# stats.anderson(vv_fore_subset, dist='norm')

In [None]:
# vv_fore_shapiro = stats.shapiro(vv_fore_subset)
# print(f"{vv_fore_shapiro}\n")
# vv_fore_normal = vv_fore_shapiro.pvalue >= 0.05
# if vv_fore_normal:
#     print(f"The VV foreslope subset backscatter values are normally distributed")
# else:
#     print(f"The VV foreslope subset backscatter values are NOT normally distributed")

### VV Backslope Normality

In [None]:
# _, ax = plt.subplots(figsize=(8, 6))
# ax.hist(vv_back_subset, bins=200)
# ax.set(title='VV Backslope Subset Pixel Distribution', xlabel='backscatter', ylabel='Frequency')
# plt.show()

In [None]:
# stats.anderson(vv_back_subset, dist='norm')

In [None]:
# vv_back_shapiro = stats.shapiro(vv_back_subset)
# print(f"{vv_back_shapiro}\n")
# vv_back_normal = vv_back_shapiro.pvalue >= 0.05
# if vv_back_normal:
#     print(f"The VV backslope subset backscatter values are normally distributed")
# else:
#     print(f"The VV backslope subset backscatter values are NOT normally distributed")
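
The normality checks above repeat the same Shapiro-Wilk pattern for each subset. A minimal sketch of that pattern as a loop is shown below, using synthetic samples and a hypothetical helper, `is_normal`. Note that a p-value at or above the significance level means normality is not rejected, which is weaker than confirming it.

```python
import numpy as np
from scipy import stats

def is_normal(sample, alpha=0.05):
    """Run a Shapiro-Wilk test; return (result, True if normality is not rejected at alpha)."""
    result = stats.shapiro(sample)
    return result, result.pvalue >= alpha

rng = np.random.default_rng(7)
samples = {
    "normal (synthetic)": rng.normal(-13.0, 2.0, 500),
    "exponential (synthetic)": rng.exponential(1.0, 500),
}
for name, sample in samples.items():
    result, normal = is_normal(sample)
    verdict = "consistent with" if normal else "NOT consistent with"
    print(f"{name}: W={result.statistic:.4f}, p={result.pvalue:.3g} -> {verdict} a normal distribution")
```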

---
## VH T-Tests

### Print some general sample stats

In [None]:
# print(f"vh_fore_subset:\n{stats.describe(vh_fore_subset)}")
# print(f"\nvh_back_subset:\n{stats.describe(vh_back_subset)}")

### VH T-test for means of two independent samples from descriptive statistics.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind_from_stats.html#scipy.stats.ttest_ind_from_stats

- This is a test for the null hypothesis that two independent samples have identical average (expected) values.

In [None]:
# stats.ttest_ind_from_stats(np.mean(vh_fore_subset), np.std(vh_fore_subset, ddof=1), len(vh_fore_subset), 
#                            np.mean(vh_back_subset), np.std(vh_back_subset, ddof=1), len(vh_back_subset), 
#                            equal_var=False, alternative='two-sided')

### VH T-test for the means of two independent samples.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

- This is a test for the null hypothesis that two independent samples have identical average (expected) values.

In [None]:
# stats.ttest_ind(vh_fore_subset, vh_back_subset, equal_var=False)
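
The two independent-sample tests above are the same Welch t-test computed from different inputs. The sketch below checks this on synthetic samples: the results agree when the standard deviations passed to `ttest_ind_from_stats` are the corrected sample values (`ddof=1`), whereas `np.std` defaults to `ddof=0`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(-13.0, 2.0, 500)  # synthetic stand-ins for the foreslope / backslope subsets
b = rng.normal(-13.4, 2.5, 500)

from_samples = stats.ttest_ind(a, b, equal_var=False)
from_stats = stats.ttest_ind_from_stats(
    np.mean(a), np.std(a, ddof=1), len(a),
    np.mean(b), np.std(b, ddof=1), len(b),
    equal_var=False,
)
print(from_samples.statistic, from_stats.statistic)
print(from_samples.pvalue, from_stats.pvalue)
```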

### VH T-test on TWO RELATED samples of scores, a and b.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html

- This is a test for the null hypothesis that two related or repeated samples have identical average (expected) values.

In [None]:
# stats.ttest_rel(vh_fore_subset, vh_back_subset)

---
## VV T-Tests

### Print some general sample stats

In [None]:
# print(f"vv_fore_subset:\n{stats.describe(vv_fore_subset)}")
# print(f"\nvv_back_subset:\n{stats.describe(vv_back_subset)}")

### VV T-test for means of two independent samples from descriptive statistics.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind_from_stats.html#scipy.stats.ttest_ind_from_stats

- This is a test for the null hypothesis that two independent samples have identical average (expected) values.

In [None]:
# stats.ttest_ind_from_stats(np.mean(vv_fore_subset), np.std(vv_fore_subset, ddof=1), len(vv_fore_subset), 
#                            np.mean(vv_back_subset), np.std(vv_back_subset, ddof=1), len(vv_back_subset), 
#                            equal_var=False, alternative='two-sided')

### VV T-test for the means of two independent samples.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

- This is a test for the null hypothesis that two independent samples have identical average (expected) values.

In [None]:
# stats.ttest_ind(vv_fore_subset, vv_back_subset, equal_var=False)

### VV T-test on TWO RELATED samples of scores, a and b.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html

- This is a test for the null hypothesis that two related or repeated samples have identical average (expected) values.

In [None]:
# stats.ttest_rel(vv_fore_subset, vv_back_subset)

*Backscatter_Distributions_by_Slope - Version 2.0.0 - May 2022*

*Change log*

- *Do not remove outliers*
- *Fixed ranges for x-axis scaling*
  - *different ranges for VH and VV polarizations*
- *Add median and mode to central moments*