# OPERA RTC Validation: Cross Correlation-based Relative Geolocation of a Stack

**Alex Lewandowski, Eric Lundell, & Franz J Meyer; Alaska Satellite Facility, University of Alaska Fairbanks**

This notebook analyzes the relative geolocation quality of OPERA RTC products using cross-correlation of images in a stack.

Once the Dask helper functions are defined and the tiffs are selected, the rest of the notebook can be run automatically. After a section has been run once, later sections can be rerun independently. This reduces the nonlinear aspects of notebooks and makes it easier to experiment with the code.

The following procedures will be applied in this notebook:

1. Select the directory containing OPERA RTC mosaics prepared with the `OPERA_RTC_download_reproject_mosaic_sample_bursts.ipynb` notebook.
1. Copy the tiffs of the selected polarization into a single directory (`output_Coregistration/vh/` or `output_Coregistration/vv/`, created next to the selected stack directory).
1. Superset the tiffs to a common AOI (the union of their extents), in place.
1. Because the data contain spikes, flatten the tiffs by setting the lowest and highest 1% of values to NaN. Save the flattened tiffs in `output_Coregistration/{polarization}_flattened/`.
1. Because of NaNs and other no-data areas, split the tiffs into an evenly spaced grid of tiles (8x8 by default). To speed this up, a Dask LocalCluster is used for multiprocessing. Tiles are saved in `output_Coregistration/{polarization}_flattened_tiles/`.
1. Cross-correlate co-located tiles of adjacent scenes in the stack (by default, also the first and last scenes and scenes four positions apart). If more than 10% of either tile is NaN, the correlation is skipped and its results are set to NaN. Any remaining NaNs are set to zero. The cross-correlation is upsampled by a factor of ten to register the tiles to within 1/10 of a pixel. The results include the shift in x and y, a normalized RMS error, and the global phase difference. Shifts are converted from pixels to meters. The results are saved as one JSON file per tile pair in `output_Coregistration/{polarization}_correlation/`.
1. Analyze the JSON results. The results are read into a Pandas DataFrame, implausible matches are removed, the tile offsets of each scene pair are averaged, and the per-pair averages are summarized and plotted.

<hr>

# 0. OPERA RTC Relative Geolocation Requirement

<div class="alert alert-success">
<i>The Sentinel-1-based RTC product (RTC-S1) shall meet a relative geolocation accuracy better than or equal to 6 meters given the 30 meter RTC-S1 product resolution (i.e. 20% of the product resolution), excluding the effects of DEM errors, for at least 80% of all validation products considered.</i>
</div>
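
As a worked reading of the requirement: 20% of the 30 m product resolution is 0.2 × 30 m = 6 m, and at least 80% of the validation products must meet it. The sketch below assumes that each product is summarized by a single (x, y) offset in meters and that "meeting the requirement" means the Euclidean offset magnitude is at most 6 m. Both are interpretations for illustration, not the official validation procedure, and `fraction_within_requirement` is a hypothetical helper.

```python
import numpy as np

RESOLUTION_M = 30.0
THRESHOLD_M = 0.2 * RESOLUTION_M  # 20% of the product resolution -> 6 m
MIN_PASS_FRACTION = 0.8           # at least 80% of validation products


def fraction_within_requirement(offsets_xy_m):
    """Fraction of (x, y) offsets, in meters, whose magnitude is <= THRESHOLD_M.

    Assumption: the requirement is applied to the Euclidean norm of each offset.
    Rows containing NaN are ignored.
    """
    offsets = np.asarray(offsets_xy_m, dtype=float)
    offsets = offsets[~np.isnan(offsets).any(axis=1)]
    if offsets.size == 0:
        return np.nan
    magnitudes = np.hypot(offsets[:, 0], offsets[:, 1])
    return float(np.mean(magnitudes <= THRESHOLD_M))


# Hypothetical offsets for five products, in meters (the NaN row is ignored)
example = [[1.0, 2.0], [3.0, 4.0], [5.0, 4.0], [0.5, -0.5], [np.nan, 1.0]]
frac = fraction_within_requirement(example)
print(frac, frac >= MIN_PASS_FRACTION)  # 0.75 False
```

Here three of the four valid offsets are within 6 m, so this hypothetical set would fall short of the 80% threshold.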

<hr>

# 1. Load Necessary Libraries

In [None]:
import csv
from datetime import datetime
import json
import math
from pathlib import Path
import re
import shutil
from tqdm.auto import tqdm

import dask.distributed
import matplotlib.pyplot as plt
import numpy as np
from osgeo import gdal
gdal.UseExceptions()
import pandas as pd
import rasterio
from rasterio.mask import mask
from scipy import stats
from shapely import geometry
from skimage.registration import phase_cross_correlation

from ipyfilechooser import FileChooser
import ipywidgets as widgets

import opensarlab_lib as asfn

%matplotlib inline

In [None]:
METERS_PER_PIXEL = 30
X_NUM = 8
Y_NUM = 8

<hr>

# 2. Setup Dask Methods

Dask on a LocalCluster is used for multiprocessing to speed up some operations. It is assumed that only one Dask client is in use at a time.

In [None]:
def setup_dask(ram_per_worker_gb:int=20, num_workers:int=20, num_threads_per_worker:int=1) -> dask.distributed.Client:
    cluster = dask.distributed.LocalCluster(
        threads_per_worker=num_threads_per_worker,
        n_workers=num_workers,
        memory_limit=f"{ram_per_worker_gb}GB",
        processes=True
    )

    return dask.distributed.Client(cluster)

def teardown_dask(client: dask.distributed.Client) -> None:
    client.shutdown()

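def do_dask(client: dask.distributed.Client, callback, args: list) -> None:
    """Map `callback` over `args` on the cluster, show progress, and wait for all tasks to finish."""
    try:
        futures = client.map(callback, args)
        dask.distributed.progress(futures)
        client.gather(futures)
    except Exception as e:
        # The caller tears down the client with `teardown_dask` after this returns
        print(f"Error in dask: {e}")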

<hr>

# 3. Select the directory holding your OPERA RTC sample data

Choose the parent directory of all child directories that contain the desired stack of OPERA sample RTCs, which were downloaded and mosaicked with `OPERA_RTC_download_reproject_mosaic_sample_bursts.ipynb`:

```
stack_directory ──
                 │
                 │─ OPERA_L2-RTC_*_30_v1.0 ──
                 │                          │
                 │                          │─  OPERA_L2_RTC-S1_VH*_30_v1.0_mosaic.tif
                 │                          │─  OPERA_L2_RTC-S1_VV*_30_v1.0_mosaic.tif
                 │    
                 │─ OPERA_L2-RTC_*_30_v1.0 ──
                 │                          │
                 │                          │─  OPERA_L2_RTC-S1_VH*_30_v1.0_mosaic.tif
                 │                          │─  OPERA_L2_RTC-S1_VV*_30_v1.0_mosaic.tif
                 │          
                 .
                 .
                 .

```

In [None]:
print("Select directory holding stack of mosaicked OPERA RTCs produced using OPERA_RTC_download_reproject_mosaic_sample_bursts.ipynb")
fc = FileChooser(Path.cwd())
fc.show_only_dirs = True
display(fc)

In [None]:
print("Select a polarization on which to run cross-correlation") 
polar = asfn.select_parameter(['VH', 'VV'])
display(polar)

In [None]:
# try/except for papermill
try:
    polarization = polar.value.lower()
    stack_dir = Path(fc.selected)
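except NameError:
    # When run with Papermill, `polarization` and `stack_dir` are injected and the widgets do not exist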
    pass

In [None]:
stack_dir = Path(stack_dir)

output_dir = stack_dir.parent/f"output_Coregistration"

polar_stack_dir = output_dir/f"{polarization}"

polar_stack_dir.mkdir(exist_ok=True, parents=True)

tiff_og = list(stack_dir.glob(f"*/OPERA_L2_RTC-S1_{polarization}*_30_v1.0_mosaic.tif"))

In [None]:
for p in tiff_og:
    if not (polar_stack_dir/p.name).exists():
        shutil.copy(p, polar_stack_dir/p.name)

In [None]:
tiff_pths = list(polar_stack_dir.glob(f"OPERA_L2_RTC-S1_{polarization}*_30_v1.0_mosaic.tif"))
tiff_pths

<hr>

# 4. Superset OPERA RTC Images

Scene frames tend to shift from one acquisition to the next, so the extent covered by each scene in the stack differs. For the cross-correlation to work properly and for a more accurate comparison, all scenes are "normalized" by warping them onto a common extent: the superset (union) of the individual scene extents.

From the extent metadata of each scene, compute the superset coordinates for the whole stack.
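
The superset is the union of the per-scene bounding boxes: the minimum `left` and `bottom` and the maximum `right` and `top`. The sketch below shows that reduction on plain `(left, bottom, right, top)` tuples, assuming all scenes share one projected CRS. The helper name `union_bounds` and the example coordinates are hypothetical.

```python
import math


def union_bounds(bounds_list):
    """Return the (left, bottom, right, top) box covering every input box.

    Assumes all boxes are in the same projected CRS.
    """
    left, bottom, right, top = math.inf, math.inf, -math.inf, -math.inf
    for b_left, b_bottom, b_right, b_top in bounds_list:
        left = min(left, b_left)
        bottom = min(bottom, b_bottom)
        right = max(right, b_right)
        top = max(top, b_top)
    return left, bottom, right, top


# Two hypothetical scene footprints (meters, same UTM zone), offset from each other
scenes = [
    (400000.0, 7000000.0, 430000.0, 7030000.0),
    (400300.0, 6999700.0, 430300.0, 7029700.0),
]
print(union_bounds(scenes))  # (400000.0, 6999700.0, 430300.0, 7030000.0)
```

The notebook passes the resulting box to `gdal.Warp` with `targetAlignedPixels=True`, which snaps the output extent to multiples of the 30 m resolution so that every warped scene lands on the same pixel grid.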

**Plot the first image in your stack to visualize some of the data.**

In [None]:
src = rasterio.open(tiff_pths[0], mode='r')
plt.imshow(src.read(1), cmap='pink', vmin=0.0, vmax=0.1)

In [None]:
# Open all the tiffs and get overall coords.
superset = {
    'left': math.inf,
    'bottom': math.inf,
    'right': -math.inf,
    'top': -math.inf
}

# The SRS is set to the first raster. It is assumed that the SRSs are the same for all.
output_srs = None

for i, original_path in enumerate(tiff_pths):

    raster = rasterio.open(original_path)    
    raster_bounds = raster.bounds
    print(raster_bounds)
    
    if i == 0:
        output_srs = raster.crs
    
    superset = {
        'left': min(superset['left'], raster_bounds.left),
        'bottom': min(superset['bottom'], raster_bounds.bottom), 
        'right': max(superset['right'], raster_bounds.right), 
        'top': max(superset['top'], raster_bounds.top)
    }

print(f"Superset box coords: {superset}")
print(f"Output SRS: {output_srs}")

In [None]:
output_bounds = (
            superset['left'], 
            superset['bottom'],
            superset['right'],
            superset['top'],
        )

print(f"Output bounds (superset) set to '{output_bounds}'")
print(f"Output SRS set to '{output_srs}'")

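# Superset and save tiffs. To avoid reading from and writing to the same file,
# each tiff is warped to a temporary file that then replaces the original.
for original_path in tqdm(tiff_pths):
    tmp_path = original_path.with_name(f"{original_path.stem}_tmp.tif")
    ds = gdal.Warp(
        str(tmp_path),
        str(original_path),
        outputBounds=output_bounds,
        outputBoundsSRS=output_srs,
        xRes=30.0,
        yRes=30.0,
        targetAlignedPixels=True,
    )
    ds = None  # close the dataset so it is flushed to disk before the swap
    tmp_path.replace(original_path)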
    

<hr>

# 5. Flatten and Save RTCs

RTCs often contain extreme high and low values that make matching difficult. To remove them, values below the 1st percentile and above the 99th percentile of each RTC are set to NaN, and the flattened RTCs are saved as intermediate results.
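
Flattening amounts to percentile-based masking: compute the 1st and 99th percentiles of the valid (non-NaN) pixels once, then set everything outside that range to NaN. A minimal NumPy sketch on a synthetic array; the function name `mask_outside_percentiles` is hypothetical.

```python
import numpy as np


def mask_outside_percentiles(arr, low_pct=1.0, high_pct=99.0):
    """Return a float copy of `arr` with values outside the [low_pct, high_pct] percentiles set to NaN.

    Both thresholds are computed from the original valid pixels before any masking.
    """
    out = np.asarray(arr, dtype=float).copy()
    low, high = np.nanpercentile(out, [low_pct, high_pct])
    out[(out < low) | (out > high)] = np.nan  # NaN compares False, so existing NaNs stay NaN
    return out


# 1000 synthetic values in [0.001, 1.0] plus one extreme spike (50.0) and one no-data pixel
values = np.concatenate([np.linspace(0.001, 1.0, 1000), [50.0, np.nan]])
flat = mask_outside_percentiles(values)
print(np.isnan(flat[1000]), np.nanmin(flat), np.nanmax(flat))
```

The spike and the lowest and highest ~1% of the synthetic values become NaN, while the bulk of the distribution is untouched.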

In [None]:
flatten_dir = output_dir/f"{polarization}_flattened"
flat_choice = None
if len(list(flatten_dir.glob("*.tif*"))) > 0:
    print("Do you wish to skip flattening, add flattened tiffs to the directory, or delete and replace the contents of the directory?")
    flat_choice = asfn.select_parameter(["skip flattening", "add flattened layers", "delete and replace flattened layers"])
    display(flat_choice)

In [None]:
flatten_dir.mkdir(parents=True, exist_ok=True)

if flat_choice and 'delete' in flat_choice.value:
    # Remove any staged intermediate files to work in a clean area
    for filepath in flatten_dir.glob("*.tif*"):
        filepath.unlink()

In [None]:
def flatten(df: pd.DataFrame) -> pd.DataFrame:
    """
    Set values below the 1st percentile and above the 99th percentile to NaN.
    Both thresholds are computed from the same (unmasked) data.
    """
    low, high = np.nanpercentile(df, [1, 99])
    df[(df < low) | (df > high)] = np.nan
    return df

if not flat_choice or "add" in flat_choice.value or "delete" in flat_choice.value:

    for p in tqdm(tiff_pths):
        print(f"Flattening {p}")

        # Convert raster to dataframe
        with rasterio.open(p) as raster:
            raster_metadata = raster.meta
            df = pd.DataFrame(raster.read(1))

        # Flatten raster data
        df_flatten = flatten(df)

        flatten_path = flatten_dir/f"{p.stem}_flat.tif"

        with rasterio.open(flatten_path, 'w', **raster_metadata) as out:
            out.write(df_flatten.to_numpy(), 1)

<hr>

# 6. Tile and Save GeoTiffs
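
Each raster is split into a regular `X_NUM` × `Y_NUM` grid, and tile files are named `..._{scene index}_{row}_{column}.tif`. Tile sizes use integer division, so when the raster width or height is not a multiple of the tile count, the leftover pixels along the right and bottom edges are not included in any tile. The sketch below computes the pixel windows (column offset, row offset, width, height) for such a grid; `tile_windows` is a hypothetical helper, and the notebook itself builds each tile with `rasterio.mask.mask` from a box derived from the same offsets.

```python
def tile_windows(width, height, x_num, y_num):
    """Return {(row, col): (col_off, row_off, tile_width, tile_height)} for an x_num by y_num grid.

    Tile sizes use floor division, so up to x_num - 1 columns and y_num - 1 rows
    at the right and bottom edges are left out of every tile.
    """
    tile_w = width // x_num
    tile_h = height // y_num
    windows = {}
    for row in range(y_num):
        for col in range(x_num):
            windows[(row, col)] = (col * tile_w, row * tile_h, tile_w, tile_h)
    return windows


# Hypothetical 1005 x 803 pixel mosaic split 8 x 8: tiles are 125 x 100 pixels
wins = tile_windows(1005, 803, 8, 8)
print(len(wins), wins[(7, 7)])  # 64 (875, 700, 125, 100)
```

In this example the last 5 columns and 3 rows of the mosaic fall outside every tile.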

In [None]:
from typing import List
from rasterio.windows import Window

def split_into_cells_args(x_num: int, y_num: int, tiff_pths: List[Path], output_dir: Path) -> List:
    """
    return list of dict of args for `split_into_cells` dask function callback.
    """
    
    args = []
    for i, flatten_path in enumerate(tiff_pths):
        args.append({
            'input_number': i, 
            'input_file': flatten_path, 
            'output_dir': output_dir, 
            'x_num': x_num, 
            'y_num': y_num
        })
    
    return args 

# https://gis.stackexchange.com/a/306862
# Splits a raster file into an x_num by y_num grid of tiles, each (width // x_num) by (height // y_num) pixels
def split_into_cells(args):
    """
    input_number: A sequential number representing the ordering of the scenes. This is to make later scene pairing easier.
    input_file: Full file path of scene to be tiled.
    output_dir: Full path of directory to place tiles.
    x_num: Number of tiles formed in the x direction per scene.
    y_num: Number of tiles formed in the y direction per scene.
    """
    
    input_number: int = args['input_number']
    input_file: str = args['input_file']
    output_dir: str = args['output_dir']
    x_num: int = args.get('x_num', 1)
    y_num: int = args.get('y_num', 1)
    
    print(f"Tiling {input_file}")

    raster = rasterio.open(input_file)
    
    x_dim = raster.shape[1] // x_num
    y_dim = raster.shape[0] // y_num

    x, y = 0, 0
    for i, y_iter in enumerate(range(y_num)):
        y = y_iter * y_dim
        for x_iter in range(x_num):
            x = x_iter * x_dim
            
            input_filestem = Path(input_file).stem
            
            output_file = f'{input_filestem}_{input_number}_{y_iter}_{x_iter}.tif'
            print(f"Creating tile {output_file}...")
            
            # Get tile geometry
            corner1 = raster.transform * (x, y)
            corner2 = raster.transform * (x + x_dim, y + y_dim)
            geom = geometry.box(corner1[0], corner1[1], corner2[0], corner2[1])
            
            # Get cell 
            crop, cropTransform = mask(raster, [geom], crop=True)
            
            meta_args = {
                'dtype': 'float32',
                'nodata': np.nan,
                'count': 1,
                "driver": "GTiff",
                "height": crop.shape[1],
                "width": crop.shape[2],
                "transform": cropTransform,
                "crs": raster.crs
            }
                        
            output_filepath = f"{output_dir}/{output_file}"
            with rasterio.open(output_filepath, "w", **meta_args) as out:
                out.write(crop)


In [None]:
tile_dir = output_dir/f"{polarization}_flattened_tiles"
tile_choice = None
if len(list(tile_dir.glob("*.tif*"))) > 0:
    print("Do you wish to skip tiling, add tiles, or delete and replace the contents of the tile directory?")
    tile_choice = asfn.select_parameter(["skip tiling", "add tiles", "delete and replace tiles"])
    display(tile_choice)

In [None]:
try:
    tile_dir.mkdir()
except FileExistsError:
    pass

if tile_choice and 'delete' in tile_choice.value:
    # Remove any staged intermediate files to work in a clean area
    for filepath in tile_dir.glob("*.tif*"):
        filepath.unlink()

In [None]:
if not tile_choice or "add" in tile_choice.value or "delete" in tile_choice.value:
    

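    # Sort for a deterministic scene order. The order sets each scene's index and therefore which
    # scenes are paired in Section 7 (chronological only if the filenames sort by acquisition date).
    tiff_pths = sorted(flatten_dir.glob(f"*{polarization}*.tif"))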

    start_time = datetime.now()
    print(f"\nStart time is {start_time}")

    client = setup_dask(ram_per_worker_gb=20, num_workers=100, num_threads_per_worker=1)
    do_dask(client, split_into_cells, split_into_cells_args(x_num=X_NUM, y_num=Y_NUM, tiff_pths=tiff_pths, output_dir=tile_dir))

    teardown_dask(client)

    end_time = datetime.now()
    print(f"\nEnd time is {end_time}")
    print(f"Time elapsed is {end_time - start_time}\n")  

<hr>

# 7. Correlate Tiles and Save Results
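
Tiles are correlated between pairs of scenes identified by their index in the stack. With the arguments used below (`first_last=True`, `additional_step=4`), `get_correlation_args` pairs each scene with the next one, the first scene with the last one, and scenes four positions apart. The sketch below lists only the scene-index pairs that this scheme produces; `scene_pairs` is a hypothetical helper that mirrors the pairing loops.

```python
def scene_pairs(scene_count, first_last=False, additional_step=1):
    """List (reference, secondary) scene-index pairs, mirroring the loops in get_correlation_args."""
    pairs = [(i, i + 1) for i in range(scene_count - 1)]  # adjacent scenes
    if first_last:
        pairs.append((0, scene_count - 1))                # first and last scenes
    if additional_step > 1:
        for i in range(0, scene_count - additional_step, additional_step):
            pairs.append((i, i + additional_step))        # scenes additional_step apart
    return pairs


print(scene_pairs(9, first_last=True, additional_step=4))
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (0, 8), (0, 4), (4, 8)]
```

Some pairs can repeat for short stacks; for example, with two scenes the first/last pair is the same as the adjacent pair.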

In [None]:
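def append_correlation_args(corr_arg_list, ref_row, sec_row):
    # Pair reference and secondary tiles by grid position, so every tile (including the last one)
    # is matched and a tile missing from one scene cannot shift the pairing of the others
    pairs = ref_row.merge(sec_row, on=['tile_number_x', 'tile_number_y'], suffixes=('_ref', '_sec'))

    for _, pair in pairs.iterrows():
        corr_arg_list.append({
            'reference_index': pair['tile_index_ref'],
            'secondary_index': pair['tile_index_sec'],
            'tile_number_x': pair['tile_number_x'],
            'tile_number_y': pair['tile_number_y'],
            'ref_file_path': pair['file_path_ref'],
            'sec_file_path': pair['file_path_sec']
        })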

def get_correlation_args(tiles_paths, first_last=False, additional_step=1) -> list:
    """
    return [
        {
            'reference_index': '',
            'secondary_index': '',
            'tile_number_x': '',
            'tile_number_y': '',
            'ref_file_path': '',
            'sec_file_path': ''
        },
    ]
    """

    tiles = []
    
    # Get index and tile numbers from path
    for tiles_path in tiles_paths:

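        # Tile names end in `_{scene index}_{row (y)}_{column (x)}.tif`, as written by `split_into_cells`
        m = re.match(r".*_([0-9]+)_([0-9]+)_([0-9]+)\.tiff?$", tiles_path.name)

        tiles.append({
            'tile_index': m.group(1),
            'tile_number_y': m.group(2),
            'tile_number_x': m.group(3),
            'file_path': tiles_path
        })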

    tiles_df = pd.DataFrame(tiles).sort_values(by=['tile_index', 'tile_number_x', 'tile_number_y'])

    corr_arg_list = []
    scene_count = len(set(tiles_df.tile_index))
    
    for i in range(scene_count-1):
        ref_row = tiles_df.loc[tiles_df['tile_index'] == str(i)]
        sec_row = tiles_df.loc[tiles_df['tile_index'] == str(i+1)]
        append_correlation_args(corr_arg_list, ref_row, sec_row)
          
    if first_last:
        ref_row = tiles_df.loc[tiles_df['tile_index'] == str(0)]
        sec_row = tiles_df.loc[tiles_df['tile_index'] == str(scene_count-1)]
        append_correlation_args(corr_arg_list, ref_row, sec_row)
        
    if additional_step > 1:
        for i in range(0, scene_count-additional_step, additional_step):
            ref_row = tiles_df.loc[tiles_df['tile_index'] == str(i)]
            sec_row = tiles_df.loc[tiles_df['tile_index'] == str(i+additional_step)]    
            append_correlation_args(corr_arg_list, ref_row, sec_row)

    return corr_arg_list

def correlation_callback(args: dict) -> None:
    """
    Cross-correlate one reference/secondary tile pair and write the result to a JSON file in `correlation_dir`.

    args = {
        'reference_index': '',
        'secondary_index': '',
        'tile_number_x': '',
        'tile_number_y': '',
        'ref_file_path': '',
        'sec_file_path': ''
    }
    """
    reference_index = args['reference_index']
    secondary_index = args['secondary_index']
    tile_number_x = args['tile_number_x']
    tile_number_y = args['tile_number_y']
    ref_file_path = args['ref_file_path']
    sec_file_path = args['sec_file_path']
    label = f"Index {reference_index} {secondary_index}, Tile {tile_number_x} {tile_number_y}"

    # Fields shared by every outcome; shift, error, and phase stay NaN unless correlation succeeds
    result = {
        "reference_index": int(reference_index),
        "secondary_index": int(secondary_index),
        "tile_number_x": int(tile_number_x),
        "tile_number_y": int(tile_number_y),
        "ref_file": str(ref_file_path),
        "sec_file": str(sec_file_path),
        "shift_x": np.nan,
        "shift_y": np.nan,
        "error": np.nan,
        "phase": np.nan,
        "message": ""
    }

    try:
        ###### Read reference and secondary tiles
        stime = datetime.now()
        with rasterio.open(ref_file_path) as rast:
            df_ref = pd.DataFrame(rast.read(1))
        with rasterio.open(sec_file_path) as rast:
            df_sec = pd.DataFrame(rast.read(1))
        print(f"{label}: Time to read tiles: {datetime.now() - stime}")

        ###### If either tile is more than 10% NaNs, skip correlation and keep the NaN results
        def get_percent_nans(df):
            return df.isnull().sum().sum() / df.size

        if get_percent_nans(df_ref) > 0.10 or get_percent_nans(df_sec) > 0.10:
            print(f"{label}: Too many NaNs. Skipping correlation...")
            result["message"] = "Too many NaNs"
        else:
            ####### Cross-correlate with the remaining NaNs set to zero
            stime = datetime.now()
            shift, error, phase = phase_cross_correlation(
                df_ref.fillna(0).to_numpy(),
                df_sec.fillna(0).to_numpy(),
                normalization=None,
                upsample_factor=10
            )

            if len(shift) != 2:
                result["message"] = "Shift is not a two element array"
            else:
                # The shift (in pixels) is ordered by array axis: (row, col) = (y, x).
                # Row indices increase southward, so a positive shift_y points south.
                result["shift_y"] = np.float64(shift[0] * METERS_PER_PIXEL)
                result["shift_x"] = np.float64(shift[1] * METERS_PER_PIXEL)
                # scikit-image returns a normalized (unitless) RMS error; it is scaled by
                # METERS_PER_PIXEL here, and the thresholds in Section 8 apply to this scaled value.
                result["error"] = np.float64(error * METERS_PER_PIXEL)
                result["phase"] = np.float64(phase)
                result["message"] = "Correlation successful"

            print(f"{label}: Shift (m) x={result['shift_x']}, y={result['shift_y']}, "
                  f"error={result['error']}, phase={result['phase']}")
            print(f"{label}: Time to complete correlation: {datetime.now() - stime}")

    except Exception as e:
        print(f"{label}: An error occurred: {e}")
        result["message"] = f"Error: {e}"

    ####### Write the result to a JSON file
    try:
        result_file = correlation_dir/f"index_{reference_index}_{secondary_index}-tile_{tile_number_x}_{tile_number_y}.json"
        with open(result_file, 'w') as f:
            json.dump(result, f)
    except Exception as e:
        print(f"An error occurred: {e}")
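
`phase_cross_correlation` from scikit-image estimates the translation between two images from the peak of their FFT-based cross-correlation, and with `upsample_factor=10` it refines that peak to within 1/10 of a pixel (3 m at 30 m pixel spacing). The NumPy sketch below illustrates only the integer-pixel part of the idea on a synthetic tile shifted by a known amount, together with the 10% NaN check and NaN-to-zero fill used above. It is a simplified stand-in, not scikit-image's implementation: there is no subpixel upsampling and no spectrum normalization (comparable to `normalization=None`), and `nan_fraction` and `integer_shift` are hypothetical helpers. As in scikit-image, the returned shift is the one that registers `moving` onto `reference`, in (row, column) order, so it is the negative of the displacement applied with `np.roll`.

```python
import numpy as np


def nan_fraction(tile):
    """Fraction of NaN pixels in a tile."""
    return float(np.isnan(tile).mean())


def integer_shift(reference, moving):
    """Integer-pixel (row, col) shift that registers `moving` onto `reference`.

    Uses the peak of the circular cross-correlation computed with FFTs.
    """
    product = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross_corr = np.abs(np.fft.ifft2(product))
    peak = np.array(np.unravel_index(np.argmax(cross_corr), cross_corr.shape), dtype=float)
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around)
    dims = np.array(reference.shape)
    wrap = peak > dims // 2
    peak[wrap] -= dims[wrap]
    return peak


rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, shift=(3, -2), axis=(0, 1))  # content moves 3 rows down, 2 columns left
moving[0, 0] = np.nan                                    # a single no-data pixel

if nan_fraction(reference) > 0.10 or nan_fraction(moving) > 0.10:
    print("Too many NaNs")
else:
    shift_px = integer_shift(np.nan_to_num(reference, nan=0.0), np.nan_to_num(moving, nan=0.0))
    shift_m = shift_px * 30.0  # pixels -> meters at 30 m pixel spacing
    print(shift_px, shift_m)   # shift_px is (-3, 2), i.e. (-90, 60) m
```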

In [None]:
correlation_dir = output_dir/f"{polarization}_correlation"
correlation_choice = None
if len(list(correlation_dir.glob("*.json"))) > 0:
    print("Do you wish to skip correlation, add correlation results, or delete and replace the correlation results?")
    correlation_choice = asfn.select_parameter(["skip correlation", "add correlation results", "delete and replace correlation results"])
    display(correlation_choice)

In [None]:
correlation_dir.mkdir(parents=True, exist_ok=True)

if correlation_choice and 'delete' in correlation_choice.value:
    # Remove any existing correlation results to work in a clean area
    for filepath in correlation_dir.glob("*.json"):
        filepath.unlink()

In [None]:
if not correlation_choice or "add" in correlation_choice.value or "delete" in correlation_choice.value:
    flat_tif_pth = list(tile_dir.glob("*tif*"))

    start_time = datetime.now()
    print(f"\nStart time is {start_time}")

    client = setup_dask(ram_per_worker_gb=11, num_workers=10)
    do_dask(client, correlation_callback, get_correlation_args(flat_tif_pth, first_last=True, additional_step=4))

    teardown_dask(client)

    end_time = datetime.now()
    print(f"\nEnd time is {end_time}")
    print(f"Time elapsed is {end_time - start_time}\n")  

<hr>

# 8. Estimate Average Offsets Per Tile and For the Full Scene

Read the correlation result files from the previous section into a Pandas DataFrame.

Then perform various statistical analyses.

In [None]:
# Put offset results into a Pandas DataFrame (one row per tile pair)
correlation_paths = correlation_dir.glob("*.json")

offset_result_list = []

for corr_path in correlation_paths:
    with open(corr_path, 'r') as f:
        offset_result_list.append(json.load(f))

offset_result_df = pd.DataFrame(offset_result_list)

In [None]:
display(offset_result_df)

Display the stats for the shift, error, and phase of the cross-correlation between two scenes in the stack.

The `reference_index` is the position of the reference scene in the list of flattened tiffs that was tiled in Section 6.
The `secondary_index` is the position of the secondary scene in that same list.

The `mean` value is a simple mean of all tile values; `std` and the other statistics are computed similarly.

## Combine the tile correlation results for each scene pair

For each scene pair, the cross-correlation results of all tiles are combined into a single result and analyzed statistically.
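
For one scene pair, the summary reduces to a few array operations: drop tiles that are NaN or exceed the outlier thresholds the notebook applies (error below 25, absolute shifts below 50 m), take the mean of the remaining tile shifts, and take the standard error of the mean (SEM: sample standard deviation divided by √n) of the remaining tile errors. The NumPy sketch below uses made-up tile values; `summarize_pair` is a hypothetical helper, and `ddof=1` matches the default of `scipy.stats.sem`.

```python
import numpy as np


def summarize_pair(shift_x, shift_y, error, max_error=25.0, max_shift=50.0):
    """Return (mean shift_x, mean shift_y, SEM of error) over the valid tiles of one scene pair."""
    shift_x, shift_y, error = (np.asarray(a, dtype=float) for a in (shift_x, shift_y, error))
    # Comparisons with NaN are False, so NaN tiles are dropped along with the outliers
    valid = (error < max_error) & (np.abs(shift_x) < max_shift) & (np.abs(shift_y) < max_shift)
    n = int(valid.sum())
    if n == 0:
        return np.nan, np.nan, np.nan
    sem = np.std(error[valid], ddof=1) / np.sqrt(n) if n > 1 else np.nan
    return float(np.mean(shift_x[valid])), float(np.mean(shift_y[valid])), float(sem)


# Made-up results for five tiles of one pair: one NaN tile and one outlier shift
sx = [3.0, -3.0, 6.0, np.nan, 120.0]
sy = [0.0, 3.0, 3.0, 1.0, 0.0]
err = [20.0, 22.0, 24.0, np.nan, 21.0]
print(summarize_pair(sx, sy, err))  # means of 2.0 m in x and y; SEM = 2 / sqrt(3)
```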

**First, remove invalid matches:**

In [None]:
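# Set implausible values to NaN: errors of 25 or more (error is scaled by METERS_PER_PIXEL)
# and shifts of 50 m or more in either direction
offset_result_df['error'] = offset_result_df['error'].where(offset_result_df['error'] < 25.0)
offset_result_df['shift_x'] = offset_result_df['shift_x'].where(np.abs(offset_result_df['shift_x']) < 50.0)
offset_result_df['shift_y'] = offset_result_df['shift_y'].where(np.abs(offset_result_df['shift_y']) < 50.0)

# Keep only tile pairs without NaN in any column
no_nans = offset_result_df[~offset_result_df.isnull().any(axis=1)]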

In [None]:
offset_result_df

**Calculate average scene-wide offsets along with their errors:**

In [None]:
pair_offset_df = no_nans.groupby(by=['reference_index', 'secondary_index'])

## Uncomment to display statistical descriptions of DataFrames
# display(pair_offset_df['shift_x'].describe())
# display(pair_offset_df['shift_y'].describe())
# display(pair_offset_df['error'].describe())

# Summarize the per-tile error values of each pair with their standard error of the mean (SEM)
def sem(series):
    if np.isnan(series).all():
        return np.nan
    return stats.sem(series, nan_policy='omit', axis=None)

# One row per (reference_index, secondary_index) pair
aggregated_results_df = pair_offset_df.agg(
    tile_mean_x=('shift_x', 'mean'),
    tile_mean_y=('shift_y', 'mean'),
    error=('error', sem),
)
display(aggregated_results_df)

super_mean_per_pair_tile_mean_x = np.nanmean(aggregated_results_df['tile_mean_x'])
super_mean_per_pair_tile_mean_y = np.nanmean(aggregated_results_df['tile_mean_y'])
std_per_pair_tile_mean_x = np.nanstd(aggregated_results_df['tile_mean_x'])
std_per_pair_tile_mean_y = np.nanstd(aggregated_results_df['tile_mean_y'])

print(f"Mean of tile_mean_x across all scenes: {super_mean_per_pair_tile_mean_x}")
print(f"Mean of tile_mean_y across all scenes: {super_mean_per_pair_tile_mean_y}")

print(f"STD of tile_mean_x across all scenes: {std_per_pair_tile_mean_x}")
print(f"STD of tile_mean_y across all scenes: {std_per_pair_tile_mean_y}")

fig, ax = plt.subplots(figsize=(8,8))
requirement = plt.Rectangle((-3.0,-3.0), 6.0, 6.0, fill=False, edgecolor='grey', label='Requirement')
ax.add_patch(requirement)
plt.grid(color='grey', alpha=0.4)

plt.errorbar(aggregated_results_df['tile_mean_x'], aggregated_results_df['tile_mean_y'], 
             yerr=aggregated_results_df['error'], xerr=aggregated_results_df['error'], 
             ecolor='green', alpha=0.3, ls='none')

plt.scatter(aggregated_results_df['tile_mean_x'], aggregated_results_df['tile_mean_y'],
            color='green', label='Offset Estimates')

plt.errorbar([super_mean_per_pair_tile_mean_x], [super_mean_per_pair_tile_mean_y], 
             yerr=[std_per_pair_tile_mean_y], xerr=[std_per_pair_tile_mean_x], 
             ecolor='red', alpha=0.3, ls='none')

plt.scatter([super_mean_per_pair_tile_mean_x], [super_mean_per_pair_tile_mean_y], 
            color='red', label='Mean of Offset Estimates')

ax.legend()
plt.xlabel("Easting Offset (Meters)")
plt.ylabel("Northing Offset (Meters)")
plt.title("Cross-correlation Offset Per Stack Pair w/ RMSE")

plt.savefig(correlation_dir/f'CrossCorrelationOffsets-Per_Pair_{polarization}_requirements.png', dpi=300, transparent=True)

<hr>

# 9. Write results to a CSV
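
The cell below appends one row per stack and skips the write if the stack is already listed, so rerunning the notebook does not duplicate rows. A standard-library sketch of that pattern, exercised on a temporary file; `append_row_once` is a hypothetical helper. The `csv` module documentation recommends opening files with `newline=''`.

```python
import csv
import tempfile
from pathlib import Path


def append_row_once(csv_path, header, row):
    """Append `row` unless a row with the same first field exists; write `header` for a new file.

    Returns True if the row was written.
    """
    csv_path = Path(csv_path)
    if csv_path.exists():
        with open(csv_path, newline="") as f:
            existing_keys = {r[0] for r in csv.reader(f) if r}
        if row[0] in existing_keys:
            return False
    write_header = not csv_path.exists()
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerow(row)
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "offsets.csv"
    header = ["stack", "polarization", "mean x", "mean y"]
    print(append_row_once(path, header, ["stack_a", "vh", 1.2, -0.4]))  # True
    print(append_row_once(path, header, ["stack_a", "vh", 1.2, -0.4]))  # False
    print(append_row_once(path, header, ["stack_b", "vh", 0.3, 0.9]))   # True
    print(len(path.read_text().splitlines()))                           # 3
```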

In [None]:
per_pair_csv = output_dir/f"{stack_dir.name}_per_pair_tile_offset_means.csv"

fields = ["stack", "polarization", "per pair mean tile mean x", "per pair mean tile mean y", "per pair STD tile mean x", "per pair STD tile mean y"]
stack = ' '.join([p.name for p in tiff_pths])

row = [stack, polarization, 
       super_mean_per_pair_tile_mean_x, super_mean_per_pair_tile_mean_y,
       std_per_pair_tile_mean_x, std_per_pair_tile_mean_y]

if not per_pair_csv.exists():
    with open(per_pair_csv, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerow(fields)
        csvwriter.writerow(row)
else:
    # Only append if this stack has not been recorded yet
    with open(per_pair_csv, 'r', newline='') as csvfile:
        csvreader = csv.reader(csvfile)
        stacks = [c[0] for c in csvreader if c]
    if stack not in stacks:
        with open(per_pair_csv, 'a', newline='') as csvfile:
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow(row)

<hr>

# 10. Clean up input data and intermediate products (optional)

In [None]:
# try/except for Papermill
try:
    if type(delete_mosaics) == bool:
        pass
except NameError:
    print("Do you wish to save or delete the mosaicked RTCs and static files?")
    mosaic_cleanup = widgets.RadioButtons(
            options=['save mosaics', 'delete mosaics'],
            disabled=False,
        )
    display(mosaic_cleanup)

In [None]:
# try/except for Papermill
try:
    delete_mosaics = "delete" in mosaic_cleanup.value
except NameError:
    pass

In [None]:
def delete_paths(path_list):
    for p in path_list:
        p.unlink()

if delete_mosaics:
    inc_angle = list(stack_dir.glob(f"*/OPERA_RTC_v0.4_inc_angle_*_mosaic.tif"))
    local_inc_angle = list(stack_dir.glob(f"*/OPERA_RTC_v0.4_local_inc_angle_*_mosaic.tif"))
    ls_mask = list(stack_dir.glob(f"*/OPERA_RTC_v0.4_ls_mask_*_mosaic.tif"))
    for f in [tiff_og, inc_angle, local_inc_angle, ls_mask]:
        delete_paths(f)
        
    # delete dirs only if empty
    mosaic_dir_list = list(stack_dir.glob('*'))
    mosaic_dir_list.append(stack_dir)
    for d in mosaic_dir_list:
        try:
            d.rmdir()
        except OSError:
            pass

In [None]:
file_type_dict = {
    f"{polarization} amplitude data": polar_stack_dir,
    f"flattened {polarization} amplitude data": flatten_dir,
    f"flattened and tiled {polarization} amplitude data": tile_dir,
    f"{polarization} tile correlation results": correlation_dir
}

print("Select file types to delete from output directory")
      
output_cleanup = widgets.SelectMultiple(
    options=file_type_dict,
    disabled=False,
)
display(output_cleanup)

In [None]:
try:
    # raw string passed from Papermill
    cleanup_list = [l for l in cleanup_list.split(', ') if l != '']
except NameError:
    cleanup_list = list(output_cleanup.label)

In [None]:
# cleanup_list may still be a raw string if it was passed from Papermill
if isinstance(cleanup_list, str):
    cleanup_list = [l for l in cleanup_list.split(', ') if l != '']
print(cleanup_list)

# Delete the selected output directories
for k in cleanup_list:
    if file_type_dict[k].exists():
        shutil.rmtree(file_type_dict[k])