![OpenSARlab notebook banner](NotebookAddons/blackboard-banner.png)

# Prepare an SBAS stack as a Zarr Store for a MintPy SBAS Analysis
### Alex Lewandowski; Alaska Satellite Facility

<img style="padding: 7px" src="NotebookAddons/UAFLogo_A_647.png" width="170" align="right"/>

**This notebook:**
1. Creates 2 xarray.Datasets
    1. sbas (x, y, insar-pairs)
    1. geometry (x, y)
2. Saves them to a Zarr store in an S3 bucket


## 0. Importing Relevant Python Packages

In this notebook we will use the following scientific libraries:

1. [GDAL](https://www.gdal.org/) is a software library for reading and writing raster and vector geospatial data formats. It includes a collection of programs tailored for geospatial data processing. Most modern GIS systems (such as ArcGIS or QGIS) use GDAL in the background.
1. [NumPy](http://www.numpy.org/) is one of the principal packages for scientific computing in Python. It provides large multidimensional arrays and matrices, along with an extensive collection of high-level mathematical functions that operate on them.

**Our first step is to import them:**

In [None]:
%%capture
import copy
from datetime import datetime, timedelta, timezone
import json # for loads
import math
from pathlib import Path
import re
import shutil
import sys
from tqdm.auto import tqdm 
from typing import Union
import warnings

from ipyfilechooser import FileChooser

import numpy as np
from osgeo import gdal
import pandas as pd
import pycrs
import s3fs
import rioxarray
import xarray as xr
import yaml
import zarr

from mintpy.constants import SPEED_OF_LIGHT
from mintpy.objects import sensor
from mintpy.utils import readfile

from IPython.display import display, clear_output, Markdown

import asf_search
from hyp3_sdk import Batch, HyP3
import opensarlab_lib as osl

## 1. Select Your HyP3 GAMMA SBAS Stack

This notebook assumes that you've created an SBAS stack using the [Alaska Satellite Facility's](https://www.asf.alaska.edu/) value-added product system HyP3, available via [ASF Data Search (Vertex)](https://search.asf.alaska.edu/#/). HyP3 is an ASF service used to prototype value-added products and provide them to users to collect feedback.

We will retrieve HyP3 data via the hyp3_sdk or work with previously downloaded data. As both HyP3 and the Notebook environment sit in the [Amazon Web Services (AWS)](https://aws.amazon.com/) cloud, data transfer is quick and cost effective.

---

If downloading data, create a data directory in which to download interferograms in your SBAS stack.

<!-- If working with a previously downloaded SBAS stack, each product should contain VH and VV  or HH and HV data, the HyP3 log file, and the HyP3 product README in subdirectories of the data directory: -->

```
data_directory   
│
└───interferogram_1_directory
│   │   *_unw_phase.tif
│   │   *_corr.tif
│   │   *_amp.tif
│   │   *_lv_theta.tif
│   │   *_lv_phi.tif
│   │   *_dem.tif
│   │   *.README.md.txt
│   │   *.txt
│   ...
│   
└───interferogram_2_directory
│   │   *_unw_phase.tif
│   │   *_corr.tif
│   │   *_amp.tif
│   │   *_lv_theta.tif
│   │   *_lv_phi.tif
│   │   *_dem.tif
│   │   *.README.md.txt
│   │   *.txt
│   ...
│ 
...
```

**Select the directory holding your data:**

In [None]:
print("Select your data directory")
fc = FileChooser(Path.cwd())
display(fc)

In [None]:
data_directory = Path(fc.selected_path)

print(f"data_directory: {data_directory}")

**Gather the paths to the DEMs, unwrapped-phase, amplitude, look vector, and spatial coherence rasters, plus the README and log files for each interferogram**

In [None]:
dems = list(data_directory.glob('*/*_dem.tif'))
dems.sort()

unw_phases = list(data_directory.glob('*/*_unw_phase.tif'))
unw_phases.sort()

amps = list(data_directory.glob('*/*_amp.tif'))
amps.sort()

lv_thetas = list(data_directory.glob('*/*_lv_theta.tif'))
lv_thetas.sort()

lv_phis = list(data_directory.glob('*/*_lv_phi.tif'))
lv_phis.sort()

corrs = list(data_directory.glob('*/*_corr.tif'))
corrs.sort()

metadata = list(data_directory.glob('*/*.README.md.txt'))
metadata.sort()

logs = list(data_directory.glob('*/*.txt'))
# raw string avoids invalid-escape warnings; the dot before 'txt' is matched literally
log_regex = re.compile(r'.*_\w{4}\.txt')
logs = [Path(pth) for pth in filter(log_regex.match, [str(p) for p in logs])]
logs.sort()

s = [dems, unw_phases, amps, lv_thetas, lv_phis, corrs, metadata, logs]

stack_paths = [{'dem': s[0][i],
                'unw_phase': s[1][i], 
                'amp': s[2][i], 
                'lv_theta': s[3][i],
                'lv_phi': s[4][i], 
                'corr': s[5][i], 
                'metadata': s[6][i], 
                'log': s[7][i]} for i in range(0, len(dems))]
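
The parallel sorted lists above line up only if every interferogram directory contains every expected product. The following is a minimal sketch of a per-directory completeness check, assuming the directory layout shown earlier. The helper name `group_products` and the temporary files are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path

# Product suffixes taken from the directory layout above
SUFFIXES = ["dem", "unw_phase", "amp", "lv_theta", "lv_phi", "corr"]


def group_products(data_directory):
    """Return (complete, incomplete): per-directory product paths and missing suffixes."""
    complete, incomplete = [], {}
    for ifg_dir in sorted(p for p in Path(data_directory).iterdir() if p.is_dir()):
        found = {}
        for suffix in SUFFIXES:
            matches = sorted(ifg_dir.glob(f"*_{suffix}.tif"))
            if matches:
                found[suffix] = matches[0]
        missing = [s for s in SUFFIXES if s not in found]
        if missing:
            incomplete[ifg_dir.name] = missing
        else:
            complete.append(found)
    return complete, incomplete


# Build a tiny fake stack: one complete interferogram, one missing its DEM
with tempfile.TemporaryDirectory() as tmp:
    for name, suffixes in [("ifg_a", SUFFIXES), ("ifg_b", SUFFIXES[1:])]:
        (Path(tmp) / name).mkdir()
        for suffix in suffixes:
            (Path(tmp) / name / f"{name}_{suffix}.tif").touch()
    complete, incomplete = group_products(tmp)

print(len(complete), incomplete)
```

Grouping by directory keeps each interferogram's files together, so a missing raster is reported by name instead of silently shifting every later entry in one of the lists.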

## Write some functions to:

- gather metadata
- calculate spatial extents
- find the padding required for each raster to fit into an Xarray Dataset

In [None]:
def get_corners_gdal(file):
    ds=gdal.Open(str(file))
    transform = ds.GetGeoTransform()
    x = ds.RasterXSize
    y = ds.RasterYSize
    
    ulx = transform[0]
    uly = transform[3]
    lrx = transform[0] + x * transform[1]
    lry = transform[3] + y * transform[5]
    return [ulx, uly, lrx, lry]

def pad_array(coords, pixel_size, left_pad_pixels=0, right_pad_pixels=0):  
    left_pad = left_pad_pixels * pixel_size
    right_pad = right_pad_pixels * pixel_size
    increasing = coords[0] < coords[-1]
    
    if increasing:
        left_coords = np.arange(start=coords[0]-left_pad, stop=coords[0], step=pixel_size)
        right_coords = np.arange(start=coords[-1]+pixel_size, stop=coords[-1]+pixel_size+right_pad, step=pixel_size)
    else:
        left_coords = np.arange(start=coords[0]+left_pad, stop=coords[0], step=-pixel_size)
        right_coords = np.arange(start=coords[-1]-pixel_size, stop=coords[-1]-pixel_size-right_pad, step=-pixel_size)      
        
    coords = np.append(left_coords, coords)
    return np.append(coords, right_coords)

def get_padding_to_match_xyxy_bbox(tiff, bbox):
    f = gdal.Open(str(tiff))
    pixel_size = f.GetGeoTransform()[1]
    corners = get_corners_gdal(tiff)

    left_x_pad_pixels = int(abs(bbox[0] - corners[0]) / pixel_size)
    right_x_pad_pixels = int((abs(bbox[2] - corners[2]) + pixel_size) / pixel_size)

    upper_y_pad_pixels = int(abs(bbox[1] - corners[1]) / pixel_size)
    lower_y_pad_pixels = int((abs(bbox[3] - corners[3]) + pixel_size) / pixel_size)
    return {
        "x_pad": [left_x_pad_pixels, right_x_pad_pixels],
        "y_pad": [upper_y_pad_pixels, lower_y_pad_pixels]
    }
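
To make the corner and padding arithmetic concrete without opening a file, here is a minimal sketch that works directly on a GDAL-style geotransform tuple. The geotransform values, raster size, bounding box, and helper names (`corners_from_geotransform`, `pixels_between`) are made up for illustration; the formulas mirror `get_corners_gdal` and the left/upper padding terms above.

```python
def corners_from_geotransform(transform, x_size, y_size):
    """Return [ulx, uly, lrx, lry] for a north-up raster."""
    ulx, x_res, _, uly, _, y_res = transform
    return [ulx, uly, ulx + x_size * x_res, uly + y_size * y_res]


def pixels_between(a, b, pixel_size):
    """Whole pixels separating two coordinates."""
    return round(abs(a - b) / pixel_size)


# Hypothetical 80 m raster: 100 columns by 50 rows, y pixel size negative (north-up)
transform = (500000.0, 80.0, 0.0, 4000000.0, 0.0, -80.0)
corners = corners_from_geotransform(transform, x_size=100, y_size=50)
print(corners)  # [500000.0, 4000000.0, 508000.0, 3996000.0]

# Hypothetical stack bounding box: 3 pixels wider on the left, 2 pixels taller on top
bbox = [499760.0, 4000160.0, 508000.0, 3996000.0]
left_pad = pixels_between(bbox[0], corners[0], 80.0)
upper_pad = pixels_between(bbox[1], corners[1], 80.0)
print(left_pad, upper_pad)  # 3 2
```

This sketch uses `round` where the notebook uses `int` truncation, so that a ratio such as 2.9999999 is not cut down to 2. That is a choice made for the example, not a description of the functions above.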

In [None]:
def get_epsg(path: Union[str, Path]) -> str:
    """
    returns the EPSG of a geotiff
    """
    info = gdal.Info(str(path), format='json')
    return info['coordinateSystem']['wkt'].split('ID')[-1].split(',')[1][0:-2]
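
The string slicing in `get_epsg` is easier to follow on a concrete input. Below is a minimal sketch on a hypothetical, heavily abbreviated WKT2 string (real output from `gdal.Info` is longer and pretty-printed). It compares the slicing approach with a regular-expression alternative, which is an assumption of this example and not something the notebook uses.

```python
import re

# Hypothetical, abbreviated WKT2 string of the kind found under info['coordinateSystem']['wkt']
wkt = (
    'PROJCRS["WGS 84 / UTM zone 12N",'
    'BASEGEOGCRS["WGS 84",ID["EPSG",4326]],'
    'CONVERSION["UTM zone 12N",ID["EPSG",16012]],'
    'ID["EPSG",32612]]'
)

# String slicing as in get_epsg: keep what follows the last 'ID', then trim the closing brackets
by_split = wkt.split('ID')[-1].split(',')[1][0:-2]

# Regex alternative: in this layout the last EPSG ID belongs to the CRS as a whole
by_regex = re.findall(r'ID\["EPSG",\s*(\d+)\]', wkt)[-1]

print(by_split, by_regex)  # 32612 32612
```

Both approaches rely on the CRS-level `ID` being the last one in the string, so either may need adjusting for WKT that is laid out differently.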

In [None]:
def format_mintpy_attrs(attrs):
    mintpy_attrs = {
        # ROI_PAC/MintPy attributes
        'ALOOKS'             : attrs['Azimuth_looks'],
        'RLOOKS'             : attrs['Range_looks'],
        'AZIMUTH_PIXEL_SIZE' : sensor.SEN['azimuth_pixel_size'] * int(attrs['Azimuth_looks']),
        'RANGE_PIXEL_SIZE'   : sensor.SEN['range_pixel_size'] * int(attrs['Range_looks']),
        'ANTENNA_SIDE'       : -1,
        'CENTER_LINE_UTC'    : attrs['UTC_time'],
        'DATA_TYPE'          : attrs['DATA_TYPE'],
        'EARTH_RADIUS'       : attrs['Earth_radius_at_nadir'],
        'HEADING'            : attrs['Heading'],
        'HEIGHT'             : attrs['Spacecraft_height'],
        'BANDS'              : attrs['BANDS'],
        'INTERLEAVE'         : attrs['INTERLEAVE'],
        'LENGTH'             : attrs['LENGTH'],
        'ORBIT_DIRECTION'    : 'ASCENDING' if abs(float(attrs['Heading'])) < 90 else 'DESCENDING',
        'NO_DATA_VALUE'      : attrs['NO_DATA_VALUE'],
        'PLATFORM'           : 'Sen',
        'POLARIZATION'       : None,
        'PRF'                : None,
        'Reference_Granule'  : attrs['Reference_Granule'],
        'Secondary_Granule'  : attrs['Secondary_Granule'],
        'STARTING_RANGE'     : None,
        'UNIT'               : 'radian',
        'WAVELENGTH'         : SPEED_OF_LIGHT / sensor.SEN['carrier_frequency'],
        'WIDTH'              : attrs['WIDTH'],
        # # from PySAR [MintPy<=1.1.1]
        # 'REF_DATE'           : ['ref_date'],
        # 'REF_LAT'            : ['ref_lat'],
        # 'REF_LON'            : ['ref_lon'],
        # 'REF_X'              : ['ref_x'],
        # 'REF_Y'              : ['ref_y'],
        # 'SUBSET_XMIN'        : ['subset_x0'],
        # 'SUBSET_XMAX'        : ['subset_x1'],
        # 'SUBSET_YMIN'        : ['subset_y0'],
        # 'SUBSET_YMAX'        : ['subset_y1'],
        
        # from Gamma geo-coordinates - degree / meter
        'X_FIRST'            : attrs['X_FIRST'],
        'Y_FIRST'            : attrs['Y_FIRST'],
        'X_STEP'             : attrs['X_STEP'],
        'Y_STEP'             : attrs['Y_STEP'],

        # # HDF-EOS5 attributes
        # 'beam_swath'     : ['swathNumber'],
        # 'first_frame'    : ['firstFrameNumber'],
        # 'last_frame'     : ['lastFrameNumber'],
        # 'relative_orbit' : ['trackNumber'],
    }
    
    N = float(mintpy_attrs['Y_FIRST'])
    W = float(mintpy_attrs['X_FIRST'])
    S = N + float(mintpy_attrs['Y_STEP']) * int(mintpy_attrs['LENGTH'])
    E = W + float(mintpy_attrs['X_STEP']) * int(mintpy_attrs['WIDTH'])

    if mintpy_attrs['ORBIT_DIRECTION'] == 'ASCENDING':
        mintpy_attrs['LAT_REF1'] = str(S)
        mintpy_attrs['LAT_REF2'] = str(S)
        mintpy_attrs['LAT_REF3'] = str(N)
        mintpy_attrs['LAT_REF4'] = str(N)
        mintpy_attrs['LON_REF1'] = str(W)
        mintpy_attrs['LON_REF2'] = str(E)
        mintpy_attrs['LON_REF3'] = str(W)
        mintpy_attrs['LON_REF4'] = str(E)
    else:
        mintpy_attrs['LAT_REF1'] = str(N)
        mintpy_attrs['LAT_REF2'] = str(N)
        mintpy_attrs['LAT_REF3'] = str(S)
        mintpy_attrs['LAT_REF4'] = str(S)
        mintpy_attrs['LON_REF1'] = str(E)
        mintpy_attrs['LON_REF2'] = str(W)
        mintpy_attrs['LON_REF3'] = str(E)
        mintpy_attrs['LON_REF4'] = str(W)
        
    # raw string so that \d is passed to the regex engine unchanged
    timestamp_regex = r"(?<=_)\d{8}T\d{6}(?=_)"
    timestamps = re.findall(timestamp_regex, attrs['unw_phase_path'])
    
    date1 = datetime.strptime(timestamps[0],'%Y%m%dT%H%M%S')
    date2 = datetime.strptime(timestamps[1],'%Y%m%dT%H%M%S')
    mintpy_attrs['DATE12'] = f'{date1.strftime("%y%m%d")}-{date2.strftime("%y%m%d")}'
    mintpy_attrs['P_BASELINE_TOP_HDR'] = attrs['Baseline']
    mintpy_attrs['P_BASELINE_BOTTOM_HDR'] = attrs['Baseline']
    
    return mintpy_attrs
        
def get_insar_attrs(txt_path, unw_path):
    with open(txt_path, 'r') as f:
        lines = f.readlines()
    attrs = {'_'.join((l.split(":")[0]).split(' ')):(l.split(': ')[1]).split('\n')[0] for l in lines}
    attrs['unw_phase_path'] = unw_path.name
    attrs.update(readfile.read_gdal_vrt(unw_path))
    return format_mintpy_attrs(attrs)
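
As an illustration of the two parsing steps above (log lines to a dictionary, and file-name timestamps to `DATE12`), here is a minimal standalone sketch. The log excerpt, its values, and the product file name are hypothetical, and the helper names `parse_log` and `date12_from_name` are not part of the notebook. Unlike `get_insar_attrs`, this version skips lines that contain no `key: value` pair, which is an assumption about how malformed lines should be handled.

```python
import re
from datetime import datetime

# Hypothetical log excerpt; real HyP3 parameter files have more fields
log_text = """Reference Granule: S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C
Secondary Granule: S1A_IW_SLC__1SSV_20141230T132734_20141230T132801_003949_004BFB_9ADE
Baseline: 12.5
Heading: -12.0

"""


def parse_log(text):
    """Turn 'Some key: value' lines into {'Some_key': 'value'}, skipping other lines."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # only lines that actually contain a 'key: value' pair
            attrs[key.replace(" ", "_")] = value.strip()
    return attrs


def date12_from_name(filename):
    """Build a DATE12 string (YYMMDD-YYMMDD) from the first two timestamps in a file name."""
    stamps = re.findall(r"(?<=_)\d{8}T\d{6}(?=_)", filename)
    d1, d2 = (datetime.strptime(s, "%Y%m%dT%H%M%S") for s in stamps[:2])
    return f"{d1:%y%m%d}-{d2:%y%m%d}"


attrs = parse_log(log_text)
# Hypothetical product file name following the pattern the regex expects
name = "S1AA_20141206T132734_20141230T132734_VVP024_INT80_G_ueF_ABCD_unw_phase.tif"
print(sorted(attrs), date12_from_name(name))
```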
    

## Write a function to turn a single interferogram into an Xarray Dataset

In [None]:
def sbas_to_xarray(insar_path, x_pad_pixels=[0,0], y_pad_pixels=[0,0]):   
    '''
    insar_path: dict of paths to insar rasters and metadata with keys:
                'dem', 'unw_phase', 'amp', 'lv_theta', 'lv_phi', 'corr', 'metadata', 'log'
    '''
    
    # get interferogram attrs
    attrs = get_insar_attrs(insar_path['log'], insar_path['unw_phase'])
    meta = readfile.read_gdal_vrt(insar_path['unw_phase'])
    attrs.update(meta)
    
    ds=gdal.Open(str(insar_path['unw_phase']))
    unw_phase_data = ds.GetRasterBand(1)
    unw_phase_data = unw_phase_data.ReadAsArray()
    unw_phase_data = np.ma.masked_invalid(unw_phase_data, copy=True)    
    
    ds=gdal.Open(str(insar_path['corr']))
    corr_data = ds.GetRasterBand(1)
    corr_data = corr_data.ReadAsArray()
    corr_data = np.ma.masked_invalid(corr_data, copy=True)
    
    # get coordinate system projection
    prj = ds.GetProjection()
    crs = pycrs.parse.from_ogc_wkt(prj)
    crs_proj = crs.proj.name.ogc_wkt.lower()  

    # pixel resolution
    geo_trans = ds.GetGeoTransform()
    res_x = geo_trans[1]
    res_y = geo_trans[5]
    
    #get corner coords and extents
    corners = get_corners_gdal(str(insar_path['unw_phase']))
    
    
    x_extent = [corners[0], corners[2]]
    y_extent = [corners[1], corners[3]]
    
    # create x and y arrays based on extents, pixel resolution, and padding
    x_coords = np.arange(x_extent[0], x_extent[1], res_x)
    x_coords = pad_array(x_coords, res_x, left_pad_pixels=x_pad_pixels[0], right_pad_pixels=x_pad_pixels[1])
                               
    y_coords = np.arange(y_extent[0], y_extent[1], res_y)
    y_coords = pad_array(y_coords, -res_y, left_pad_pixels=y_pad_pixels[0], right_pad_pixels=y_pad_pixels[1])
      
     
    # create xarray dataset
    ds = xr.Dataset(
        data_vars={
            'y': y_coords,
            'x': x_coords,
            'unw_phase': (
                ('y', 'x'),
                np.pad(unw_phase_data, pad_width=((y_pad_pixels[0],y_pad_pixels[1]), (x_pad_pixels[0],x_pad_pixels[1])), mode='constant',constant_values=(0.0)),
            ),
            'corr': (
                ('y', 'x'),
                np.pad(corr_data, pad_width=((y_pad_pixels[0],y_pad_pixels[1]), (x_pad_pixels[0],x_pad_pixels[1])), mode='constant',constant_values=(0.0)),
            ),
        },
        attrs=None
    )

    # Set x and y coord attributes
    attrs_x = {
        'axis': 'X',
        'units': 'dd',
        'standard_name': 'projection_x_coordinate',
        'long_name': 'longitude'
    }
    attrs_y = {
        'axis': 'Y',
        'units': 'dd',
        'standard_name': 'projection_y_coordinate',
        'long_name': 'latitude'
    }
    
    for key in attrs_x:
        ds.x.attrs[key] = attrs_x[key]
    for key in attrs_y:
        ds.y.attrs[key] = attrs_y[key]
    
    ds.rio.write_crs(attrs['EPSG'], inplace=True)
    ds = ds.rio.reproject("EPSG:4326", **{'nodata':0.0})
    ds.rio.write_crs(4326, inplace=True)
    del ds.unw_phase.attrs["_FillValue"]
    del ds.corr.attrs["_FillValue"]
    
    attrs['X_FIRST'] = ds['x'][0]
    attrs['Y_FIRST'] = ds['y'][0]
    # coordinates are pixel centers, so N pixels span N - 1 steps
    attrs['X_STEP'] = abs(ds['x'][-1] - ds['x'][0]) / (len(ds['x']) - 1)
    attrs['Y_STEP'] = abs(ds['y'][-1] - ds['y'][0]) / (len(ds['y']) - 1)
    attrs['X_UNIT'] = "degrees"
    attrs['Y_UNIT'] = "degrees"
    attrs['LENGTH'] = ds.y.shape[0]
    attrs['WIDTH'] = ds.x.shape[0]
    
    # interferogram attrs
    for k in attrs.keys():
        if type(attrs[k]) == str:
            ds[k] = attrs[k].ljust(50, ' ')
        else:
            ds[k] = attrs[k]
            
    return ds
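
The key requirement when padding a raster onto a common grid is that the padded data and the extended coordinate vectors end up with matching lengths. Here is a minimal sketch with a made-up 2x3 array and made-up georeferencing. It builds the coordinates from integer pixel indices, which is an alternative chosen for this example and differs from the `pad_array` helper used above.

```python
import numpy as np

# Hypothetical 2x3 raster whose first pixel sits at x=100, y=50 with 10-unit pixels
data = np.arange(6, dtype="float32").reshape(2, 3)
x0, y0, res = 100.0, 50.0, 10.0
x_pad, y_pad = (1, 2), (2, 0)  # (left, right) and (upper, lower) padding in pixels

# Pad the data with zeros; np.pad expects the row (y) padding first, then the column (x) padding
padded = np.pad(data, pad_width=(y_pad, x_pad), mode="constant", constant_values=0.0)

# Build coordinates from integer pixel indices so no floating-point step error accumulates
x_coords = x0 + res * np.arange(-x_pad[0], data.shape[1] + x_pad[1])
y_coords = y0 - res * np.arange(-y_pad[0], data.shape[0] + y_pad[1])

print(padded.shape, x_coords.tolist(), y_coords.tolist())
# (4, 6) [90.0, 100.0, 110.0, 120.0, 130.0, 140.0] [70.0, 60.0, 50.0, 40.0]
```

Because each coordinate is computed as `origin + index * resolution`, the vector lengths are fixed by the integer index range and cannot drift by one element the way a floating-point `np.arange(start, stop, step)` occasionally can.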

In [None]:
def geometry_to_xarray(insar_path, x_pad_pixels=[0,0], y_pad_pixels=[0,0]):   
    '''
    insar_path: dict of paths to insar rasters and metadata with keys:
                'dem', 'unw_phase', 'amp', 'lv_theta', 'lv_phi', 'corr', 'metadata', 'log'
    '''
    
    # get interferogram attrs
    attrs = get_insar_attrs(insar_path['log'], insar_path['unw_phase'])
    
    meta = readfile.read_gdal_vrt(insar_path['unw_phase'])
    attrs.update(meta)
    # attrs = {k: str(v).ljust(80, ' ') for (k, v) in zip(attrs.keys(), attrs.values())}
    
    ds=gdal.Open(str(insar_path['dem']))
    dem_data = ds.GetRasterBand(1)
    dem_data = dem_data.ReadAsArray()
    dem_data = np.ma.masked_invalid(dem_data, copy=True)   
    
    ds=gdal.Open(str(insar_path['lv_theta']))
    lv_theta_data = ds.GetRasterBand(1)
    lv_theta_data = lv_theta_data.ReadAsArray()
    lv_theta_data = np.ma.masked_invalid(lv_theta_data, copy=True)
    
    ds=gdal.Open(str(insar_path['lv_phi']))
    lv_phi_data = ds.GetRasterBand(1)
    lv_phi_data = lv_phi_data.ReadAsArray()
    lv_phi_data = np.ma.masked_invalid(lv_phi_data, copy=True)

    # get coordinate system projection
    prj = ds.GetProjection()
    crs = pycrs.parse.from_ogc_wkt(prj)
    crs_proj = crs.proj.name.ogc_wkt.lower()

    # pixel resolution
    geo_trans = ds.GetGeoTransform()
    res_x = geo_trans[1]
    res_y = geo_trans[5]
    
    #get corner coords and extents
    corners = get_corners_gdal(str(insar_path['dem']))
    x_extent = [corners[0], corners[2]]
    y_extent = [corners[1], corners[3]]
    
    # create x and y arrays based on extents, pixel resolution, and padding
    x_coords = np.arange(x_extent[0], x_extent[1], res_x)
    x_coords = pad_array(x_coords, res_x, left_pad_pixels=x_pad_pixels[0], right_pad_pixels=x_pad_pixels[1])                         
    y_coords = np.arange(y_extent[0], y_extent[1], res_y)
    y_coords = pad_array(y_coords, -res_y, left_pad_pixels=y_pad_pixels[0], right_pad_pixels=y_pad_pixels[1])
    
    # create xarray dataset
    ds = xr.Dataset(
        data_vars={
            'y': y_coords,
            'x': x_coords,
            'dem': (
                ('y', 'x'),
                np.pad(dem_data, pad_width=((y_pad_pixels[0],y_pad_pixels[1]), (x_pad_pixels[0],x_pad_pixels[1])), mode='constant',constant_values=(0.0)),
            ),
            'lv_theta': (
                ('y', 'x'),
                np.pad(lv_theta_data, pad_width=((y_pad_pixels[0],y_pad_pixels[1]), (x_pad_pixels[0],x_pad_pixels[1])), mode='constant',constant_values=(0.0)),
            ),
            'lv_phi': (
                ('y', 'x'),
                np.pad(lv_phi_data, pad_width=((y_pad_pixels[0],y_pad_pixels[1]), (x_pad_pixels[0],x_pad_pixels[1])), mode='constant',constant_values=(0.0)),
            ),
        },
        attrs=None
    )

    # Set x and y coord attributes
    attrs_x = {
        'axis': 'X',
        'units': 'dd',
        'standard_name': 'projection_x_coordinate',
        'long_name': 'longitude'
    }
    attrs_y = {
        'axis': 'Y',
        'units': 'dd',
        'standard_name': 'projection_y_coordinate',
        'long_name': 'latitude'
    }
    for key in attrs_x:
        ds.x.attrs[key] = attrs_x[key]
    for key in attrs_y:
        ds.y.attrs[key] = attrs_y[key]
    
    ds.rio.write_crs(attrs['EPSG'], inplace=True)
    ds = ds.rio.reproject("EPSG:4326", **{'nodata':0.0})
    ds.rio.write_crs(4326, inplace=True)
    # ds = ds.drop_vars('spatial_ref')
    del ds.dem.attrs["_FillValue"]
    del ds.lv_theta.attrs["_FillValue"]
    del ds.lv_phi.attrs["_FillValue"]
    
    attrs['X_FIRST'] = ds['x'][0]
    attrs['Y_FIRST'] = ds['y'][0]
    # coordinates are pixel centers, so N pixels span N - 1 steps
    attrs['X_STEP'] = abs(ds['x'][-1] - ds['x'][0]) / (len(ds['x']) - 1)
    attrs['Y_STEP'] = abs(ds['y'][-1] - ds['y'][0]) / (len(ds['y']) - 1)
    attrs['X_UNIT'] = "degrees"
    attrs['Y_UNIT'] = "degrees"
    attrs['LENGTH'] = ds.y.shape[0]
    attrs['WIDTH'] = ds.x.shape[0]
    
    # interferogram attrs
    for k in attrs.keys():
        if type(attrs[k]) == str:
            ds[k] = attrs[k].ljust(50, ' ')
        else:
            ds[k] = attrs[k]
        
    return ds

## Find the spatial extents needed for an xarray SBAS stack

- find the smallest bounding box that will cover the complete stack
- (optional) add some padding

In [None]:
ds=gdal.Open(str(stack_paths[0]['unw_phase']))
geo_trans = ds.GetGeoTransform()
res_x = geo_trans[1]
res_y = geo_trans[5]
res_x

In [None]:
padding_pixels = 50
bboxes = [get_corners_gdal(s['unw_phase']) for s in stack_paths]

# res_y is negative for north-up rasters, so use absolute pixel sizes to grow the box outward
max_xyxy_bbox = [min([i[0] for i in bboxes]) - (abs(res_x) * padding_pixels),
                 max([i[1] for i in bboxes]) + (abs(res_y) * padding_pixels),
                 max([i[2] for i in bboxes]) + (abs(res_x) * padding_pixels),
                 min([i[3] for i in bboxes]) - (abs(res_y) * padding_pixels)]
max_xyxy_bbox
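
A worked example helps confirm which way each edge of the bounding box should move. The sketch below uses two made-up, slightly offset footprints in projected coordinates; the function name `padded_union_bbox` and all numbers are hypothetical.

```python
def padded_union_bbox(bboxes, res_x, res_y, padding_pixels=0):
    """Smallest [ulx, uly, lrx, lry] box covering all inputs, grown by padding_pixels."""
    pad_x = abs(res_x) * padding_pixels
    pad_y = abs(res_y) * padding_pixels  # abs() because res_y is negative for north-up rasters
    return [
        min(b[0] for b in bboxes) - pad_x,  # west edge moves west
        max(b[1] for b in bboxes) + pad_y,  # north edge moves north
        max(b[2] for b in bboxes) + pad_x,  # east edge moves east
        min(b[3] for b in bboxes) - pad_y,  # south edge moves south
    ]


# Two hypothetical 80 m footprints, offset by a couple of pixels
bboxes = [
    [500000.0, 4000000.0, 508000.0, 3996000.0],
    [500160.0, 4000080.0, 508160.0, 3996080.0],
]
union = padded_union_bbox(bboxes, res_x=80.0, res_y=-80.0, padding_pixels=50)
print(union)  # [496000.0, 4004080.0, 512160.0, 3992000.0]
```

With 50 pixels of padding at 80 m, every edge moves 4000 m away from the union of the footprints.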

## Create a list of InSAR xarray.Dataset objects

In [None]:
def get_pairs(xarray_insar):
    pairs = []
    for insar in xarray_insar:
        ref = insar.Reference_Granule.to_numpy()
        ref = ref.tolist()
        sec = insar.Secondary_Granule.to_numpy()
        sec = sec.tolist()
        pairs.append(f"{ref[17:25]}-{sec[17:25]}")
    return pairs
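
The slice `[17:25]` in `get_pairs` picks the acquisition date out of a Sentinel-1 granule name. Here is a minimal sketch using two granule names of the kind stored in `Reference_Granule` and `Secondary_Granule`; the temporal-baseline calculation is an extra added for illustration.

```python
from datetime import datetime

ref = "S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C"
sec = "S1A_IW_SLC__1SSV_20141230T132734_20141230T132801_003949_004BFB_9ADE"

# Characters 17 through 24 of the granule name hold the acquisition start date (YYYYMMDD)
pair = f"{ref[17:25]}-{sec[17:25]}"

# The temporal baseline in days follows directly from the two dates
d1, d2 = (datetime.strptime(d, "%Y%m%d") for d in pair.split("-"))
baseline_days = (d2 - d1).days

print(pair, baseline_days)  # 20141206-20141230 24
```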

In [None]:
def calc_chunks(stack: xr.Dataset, bits_per_pixel, chunk_size=100, depth_dim=None, raster_count=1):
    """
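    stack: the xr.Dataset for which to determine chunks
    bits_per_pixel: size of one pixel value in bits (e.g. 32 for float32)
    chunk_size: target chunk size in MB
    depth_dim: name of the stacking dimension (e.g. "pairs"), or None for a single layer
    raster_count: number of rasters in each insar pair
    returns: dict of chunk shapes for 'temporal' and 'spatial' access patterns
    """
    bits_per_mb = 8000000
    bits_per_chunk = bits_per_mb * chunk_size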
    

    pixels_per_chunk = bits_per_chunk / bits_per_pixel
    if depth_dim:
        depth = len(stack[depth_dim])
    else:
        depth = 1
    
    temp_op_xy_pixels = pixels_per_chunk // (depth * raster_count)
    spatial_op_xy_pixels = pixels_per_chunk // raster_count
    
    temp_x_y_side = math.floor(math.sqrt(temp_op_xy_pixels))
    spatial_x_y_side = math.floor(math.sqrt(spatial_op_xy_pixels))
    
    return {
        'temporal': (depth, temp_x_y_side, temp_x_y_side),
        'spatial': (1, spatial_x_y_side, spatial_x_y_side)
    }
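
The arithmetic in `calc_chunks` can be checked by hand. The sketch below repeats it with assumed inputs: a 100 MB target, 32-bit pixels, 10 pairs along the stacking dimension, and 2 rasters per pair. These numbers are illustrative only.

```python
import math

chunk_size_mb = 100   # target chunk size
bits_per_pixel = 32   # assumed float32 rasters
depth = 10            # assumed number of pairs along the stacking dimension
raster_count = 2      # e.g. unw_phase and corr

bits_per_chunk = 8_000_000 * chunk_size_mb
pixels_per_chunk = bits_per_chunk // bits_per_pixel  # 25,000,000 pixels

# Temporal chunks span every pair, so fewer pixels are left for each x/y tile
temporal_side = math.isqrt(pixels_per_chunk // (depth * raster_count))
# Spatial chunks hold a single pair, so the x/y tile can be much larger
spatial_side = math.isqrt(pixels_per_chunk // raster_count)

print((depth, temporal_side, temporal_side))  # (10, 1118, 1118)
print((1, spatial_side, spatial_side))        # (1, 3535, 3535)
```

The temporal layout favors reading a full time series for a small area, while the spatial layout favors reading a large area of one interferogram.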

In [None]:
s3_uri = "s3://alex-asf-zarr-play/Salt_Lake"
s3 = s3fs.S3FileSystem(profile='zarr')
store = s3fs.S3Map(root=s3_uri, s3=s3, check=False)
compressor = zarr.Blosc(cname='zstd', clevel=3)

Full:
[AZ_Stack](https://search.asf.alaska.edu/#/?zoom=6.583&center=-112.132,32.356&start=2014-06-14T08:00:00Z&end=2023-04-18T07:59:59Z&resultsLoaded=true&granule=S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C-SLC&searchType=SBAS%20Search&master=S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C&selectedPair=S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C-SLC,S1A_IW_SLC__1SSV_20141230T132734_20141230T132801_003949_004BFB_9ADE-SLC&perp=300to&temporal=1to48&pairs=S1A_IW_SLC__1SSV_20161113T132750_20161113T132817_013924_0166A5_21A4-SLC,S1A_IW_SLC__1SDV_20170313T132749_20170313T132816_015674_019C9E_D813-SLC$S1A_IW_SLC__1SSV_20161113T132750_20161113T132817_013924_0166A5_21A4-SLC,S1A_IW_SLC__1SDV_20170325T132749_20170325T132816_015849_01A1CF_3FB5-SLC$S1A_IW_SLC__1SSV_20161113T132750_20161113T132817_013924_0166A5_21A4-SLC,S1A_IW_SLC__1SDV_20170217T132749_20170217T132816_015324_0191F6_84A3-SLC$S1A_IW_SLC__1SSV_20161020T132750_20161020T132817_013574_015BBE_208C-SLC,S1A_IW_SLC__1SDV_20170217T132749_20170217T132816_015324_0191F6_84A3-SLC$S1A_IW_SLC__1SSV_20161020T132750_20161020T132817_013574_015BBE_208C-SLC,S1A_IW_SLC__1SDV_20170301T132749_20170301T132816_015499_019750_5210-SLC$S1A_IW_SLC__1SDV_20180730T132802_20180730T132829_023024_027FD1_B0E7-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20180718T132802_20180718T132829_022849_027A4C_6A57-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20180624T132800_20180624T132827_022499_026FD7_32D8-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20180612T132800_20180612T132826_022324_026A9D_E736-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190526T132805_20190526T132831_027399_03173A_4267-SLC$S1A_IW_SLC__1SDV_20181115T132802_20181115T
132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190701T132807_20190701T132834_027924_03271E_46BA-SLC$S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190607T132805_20190607T132832_027574_031C9E_B733-SLC$S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190619T132806_20190619T132833_027749_0321E2_A30A-SLC)

[Part 1 (998 pairs)](https://search.asf.alaska.edu/#/?zoom=7.804&center=-111.958,30.606&start=2014-06-14T08:00:00Z&end=2023-01-11T08:59:59Z&resultsLoaded=true&granule=S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C-SLC&searchType=SBAS%20Search&master=S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C&selectedPair=S1A_IW_SLC__1SSV_20141206T132734_20141206T132802_003599_00440A_FB1C-SLC,S1A_IW_SLC__1SSV_20141230T132734_20141230T132801_003949_004BFB_9ADE-SLC&perp=300to&temporal=1to48&pairs=S1A_IW_SLC__1SSV_20161113T132750_20161113T132817_013924_0166A5_21A4-SLC,S1A_IW_SLC__1SDV_20170313T132749_20170313T132816_015674_019C9E_D813-SLC$S1A_IW_SLC__1SSV_20161113T132750_20161113T132817_013924_0166A5_21A4-SLC,S1A_IW_SLC__1SDV_20170325T132749_20170325T132816_015849_01A1CF_3FB5-SLC$S1A_IW_SLC__1SSV_20161113T132750_20161113T132817_013924_0166A5_21A4-SLC,S1A_IW_SLC__1SDV_20170217T132749_20170217T132816_015324_0191F6_84A3-SLC$S1A_IW_SLC__1SSV_20161020T132750_20161020T132817_013574_015BBE_208C-SLC,S1A_IW_SLC__1SDV_20170217T132749_20170217T132816_015324_0191F6_84A3-SLC$S1A_IW_SLC__1SSV_20161020T132750_20161020T132817_013574_015BBE_208C-SLC,S1A_IW_SLC__1SDV_20170301T132749_20170301T132816_015499_019750_5210-SLC$S1A_IW_SLC__1SDV_20180730T132802_20180730T132829_023024_027FD1_B0E7-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20180718T132802_20180718T132829_022849_027A4C_6A57-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20180624T132800_20180624T132827_022499_026FD7_32D8-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20180612T132800_20180612T132826_022324_026A9D_E736-SLC,S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC$S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190526T132805_20190526T132831_027399_03173A_4267-SLC$S1A_IW_SLC__1SDV_20181115T132802
_20181115T132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190701T132807_20190701T132834_027924_03271E_46BA-SLC$S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190607T132805_20190607T132832_027574_031C9E_B733-SLC$S1A_IW_SLC__1SDV_20181115T132802_20181115T132829_024599_02B38C_EA96-SLC,S1A_IW_SLC__1SDV_20190619T132806_20190619T132833_027749_0321E2_A30A-SLC)

## Get the full complement of logfile attributes for the entire stack

- Some HyP3 interferograms may come with variations in these attributes
- We add insar attributes as variables for each interferogram
    - all interferograms must contain the same set of variables in order to concatenate xarray.Datasets
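
One way to guarantee a common set of variables is to take the union of attribute names across the stack and fill any gaps. This is a minimal sketch, assuming the attributes are plain dictionaries of strings; the function name `harmonize_attrs`, the sample values, and the choice of an empty string as the fill value are all hypothetical.

```python
def harmonize_attrs(attr_dicts, fill_value=""):
    """Give every dict the union of all keys, filling gaps with fill_value."""
    all_keys = sorted(set().union(*attr_dicts))
    return [{k: d.get(k, fill_value) for k in all_keys} for d in attr_dicts]


# Hypothetical attributes from two interferograms; the second lacks one field
attrs_a = {"Baseline": "12.5", "Heading": "-12.0", "Extra_field": "1"}
attrs_b = {"Baseline": "-40.1", "Heading": "-12.0"}

harmonized = harmonize_attrs([attrs_a, attrs_b])
print(harmonized[1])  # {'Baseline': '-40.1', 'Extra_field': '', 'Heading': '-12.0'}
```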

In [None]:
%%time

sbas_group = 'format/sbas'
geo_group = 'format/geometry'

# We may not have enough memory to build a large SBAS Zarr store all at once, but we can work in batches.
# Calculate number of batches in which to convert.
batch_size = 10
n = int(np.ceil(len(stack_paths) / batch_size))  # np.array_split expects an integer section count

# iterate through each batch
for i, batch in tqdm(enumerate(np.array_split(stack_paths, n))):   
    xarray_sbas = []
    # create a list of xarray.Datasets, one for each interferogram in the batch
    for insar in tqdm(batch):
        pad = get_padding_to_match_xyxy_bbox(insar['unw_phase'], max_xyxy_bbox)
        xarray_sbas.append(sbas_to_xarray(insar, x_pad_pixels=pad['x_pad'] , y_pad_pixels=pad['y_pad']))
    
    # create an insar date pair dimension for the batch
    pairs = get_pairs(xarray_sbas)
    
    # create an empty xarray.Dataset with the spatial coordinates needed for the stack
    stack = xarray_sbas[0]
    variables = list(xarray_sbas[0].variables)
    for k in variables:
        stack = stack.drop_vars(k)
        
    # add pair coordinates to the empty stack
    stack = stack.assign_coords(pairs=pairs)

    # concatenate all interferograms in the batch
    for k in xarray_sbas[0].keys():
        var = xr.concat([d[k] for d in xarray_sbas], dim=stack.pairs)
        stack[k] = var
        for da in xarray_sbas:
            da = da.drop_vars(k)
    


    if i == 0:
        # geometry
        pad = get_padding_to_match_xyxy_bbox(batch[0]['dem'], max_xyxy_bbox)
        xarray_geometry = geometry_to_xarray(batch[0], x_pad_pixels=pad['x_pad'] , y_pad_pixels=pad['y_pad'])
        # sys.getsizeof reports the Python object's size in bytes; take bits per pixel from the dtype instead
        bits_per_pixel = xarray_geometry.dem.dtype.itemsize * 8
        chunks = calc_chunks(xarray_geometry, bits_per_pixel, raster_count=3)
        encoding = {vname: {'compressor': compressor, 'chunks': chunks['spatial']} for vname in xarray_geometry.data_vars}
        xarray_geometry.to_zarr(store=store, encoding=encoding, consolidated=True, group=geo_group, mode='w')
        
        #sbas
        # bits per pixel from the array dtype (sys.getsizeof measures the Python object, not the pixel)
        bits_per_pixel = stack.unw_phase.dtype.itemsize * 8
        chunks = calc_chunks(stack, bits_per_pixel, depth_dim="pairs", raster_count=2)
        encoding = {vname: {'compressor': compressor, 'chunks': chunks['temporal']} for vname in stack.data_vars}
        
        
        stack.to_zarr(store=store, encoding=encoding, consolidated=True, group=sbas_group, mode='w')
    else:
        # zarr store append
        stack.to_zarr(store=store, encoding=None, consolidated=True, group=sbas_group, mode='a', append_dim='pairs')

    # if i == 0:
    #     break
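
The batching pattern in the loop above (create the store with the first batch, append the rest) can be sketched without any I/O. The pure-Python `split_batches` helper below is hypothetical and is intended to reproduce how `np.array_split` sizes its sections; the 23-item stack is made up.

```python
import math


def split_batches(items, batch_size):
    """Split items into ceil(len / batch_size) nearly equal batches, like np.array_split."""
    if not items:
        return []
    n_batches = math.ceil(len(items) / batch_size)
    base, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)  # the first `extra` batches get one more item
        batches.append(items[start:start + size])
        start += size
    return batches


# Hypothetical stack of 23 interferograms processed with batch_size=10
writes = []
for i, batch in enumerate(split_batches(list(range(23)), batch_size=10)):
    mode = "w" if i == 0 else "a"  # create the store once, then append along 'pairs'
    writes.append((mode, len(batch)))

print(writes)  # [('w', 8), ('a', 8), ('a', 7)]
```

Note that the batches are evened out, so `batch_size` acts as an upper bound and not as the exact size of each batch.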
            
    

# NOTES
- Creating large data cubes will overrun memory resources. We should append each layer in a cube to a Zarr store in S3 instead of trying to build the whole thing as we go.
- How do we want to handle DEMs? Should we keep a copy for every interferogram? They're identical.

*Prepare_Hyp3_RTC_TimeSeries_NetCDF_Zarr.ipynb - Version 0.1.0 - March 2021*