# Prepare a SAR RTC Data Stack for HydroSAR

**Joseph H Kennedy and Alex Lewandowski; Alaska Satellite Facility**

This notebook downloads an ASF-HyP3 RTC project or OPERA-S1 RTCs and prepares a SAR data stack for use with HydroSAR.

The stack may consist of a single RTC or a deep, multi-temporal time series.

The notebook will:
- Project all data in the stack to the predominant EPSG
- Merge scenes acquired on the same day to create large spatial mosaics

If using HyP3 data, this notebook assumes that you have already ordered an RTC stack over your area of interest using the [Alaska Satellite Facility's](https://www.asf.alaska.edu/) value-added product system HyP3, available via [ASF Data Search (Vertex)](https://search.asf.alaska.edu/#/), the [HyP3 API](https://hyp3-api.asf.alaska.edu/ui/), or the [hyp3_sdk](https://github.com/ASFHyP3/hyp3-sdk).

---

## 0. Select or create a working directory for the analysis

In [None]:
from IPython.display import display
import opensarlab_lib as osl

age = osl.select_parameter(
    [
        "Create a new RTC stack",
        "Add to existing RTC stack"
    ]
)
display(age)

In [None]:
from pathlib import Path

from ipyfilechooser import FileChooser

new = 'new' in age.value

if new:
    print(f'Current working directory: {Path.cwd()}')
    print('Create a new directory to hold your data:')
    data_path = input(f'Enter an unused path for a new data directory:  {Path.home()}/')
    data_path = Path.home() / data_path.strip()
    # mkdir() raises FileExistsError if the path is already in use
    data_path.mkdir()
else:
    path = Path.home()
    fc = FileChooser(path)
    print('Select an existing data directory')
    display(fc)

In [None]:
if not new:
    data_path = Path(fc.selected_path)

---
## 1. Migrate an RTC Stack from ASF

**Select a data source and access option**

In [None]:
data_access = osl.select_parameter(
    [
        'HyP3: Access RTC data with any valid HyP3 username and HyP3 RTC Project Name',
        'HyP3: Search your Projects for available RTC data',
        'OPERA: Search for OPERA S1 L2-RTC data'
    ]
)
display(data_access)

**Authenticate with HyP3 or gather credentials required to download OPERA data**

In [None]:
from getpass import getpass

# asf_search is referenced as `disco` in the OPERA cells below
import asf_search as disco
from hyp3_sdk import Batch, HyP3

hyp3 = 'HyP3' in data_access.value
rtc_search = 'available RTC data' in data_access.value
opera = 'OPERA' in data_access.value

if hyp3:
    hyp3_session = HyP3(prompt=True)
else:
    username = input("Enter your EDL username: ")
    password = getpass("Enter your EDL password: ")

**You may search for `RTC_GAMMA` projects in your own account or migrate data from any user's account**

- Retrieving data from another user's account only requires their username and the project name.
- It does **not** require the other user's password. 

In [None]:
from tqdm.auto import tqdm

from IPython.display import Markdown

if hyp3:
    product_type = 'RTC_GAMMA'
    if rtc_search:
        my_hyp3_info = hyp3_session.my_info()
        active_projects = dict()
        
        print(f"Checking all HyP3 projects for current {product_type} jobs")
        for project in tqdm(my_hyp3_info['job_names']):
            # Keep only finished, unexpired jobs of the requested type
            batch = hyp3_session.find_jobs(
                name=project,
                job_type=product_type
            ).filter_jobs(running=False, include_expired=False)
            if len(batch) > 0:
                active_projects[batch.jobs[0].name] = batch
        
        if len(active_projects) > 0:
            display(Markdown("<text style='color:darkred;'>Note: After selecting a project, you must select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
            display(Markdown("<text style='color:darkred;'>Otherwise, you will rerun this code cell.</text>"))
            print('\nSelect a Project:')
            project_select = osl.select_parameter(active_projects.keys())
            display(project_select)
        else:
            raise Exception("Found no active projects containing RTC products")
    else:
        username = input("Enter the HyP3 username of the account containing the RTC stack to migrate: ")
        project_name = input("Enter the HyP3 project name: ")
        batch = Batch()
        batch = hyp3_session.find_jobs(
            name=project_name, 
            job_type=product_type, 
            user_id=username
        ).filter_jobs(running=False, include_expired=False)
elif opera:
    try:
        user_pass_session = disco.ASFSession().auth_with_creds(username, password)
    except disco.ASFAuthenticationError as e:
        print(f'Auth failed: {e}')
    else:
        print('Successfully started ASF_Search session.')

**If accessing OPERA RTCs, view the `search` function documentation to see the parameters available for your search**

In [None]:
if opera:
    disco.search?

---
**If accessing OPERA RTCs, update the search parameters to suit your needs, and search for data**

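The search options below serve as examples: the first cell searches by point and the second by polygon. Both cells assign `results`, so run only the one that suits your search; if you run both, the later one replaces the earlier results.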

Do not change:
- `'dataset': 'OPERA-S1'`
- `'processingLevel': ['RTC']`

Update or add any other parameters from the documentation above to fit your search.

In [None]:
if opera:
    options = {
        'intersectsWith': 'POINT(90.4976 24.1595)',
        'dataset': 'OPERA-S1',
        'start': '2016-07-03T00:00:00Z',
        'end': '2024-01-31T00:00:00Z',
        'flightDirection': 'ASCENDING',
        'processingLevel': ['RTC'],
        'maxResults': '1000'
    }
    results = disco.search(**options)

In [None]:
if opera:
    options = {
        'intersectsWith': 'POLYGON((90.253 23.9691,90.4222 23.9691,90.4222 24.1679,90.253 24.1679,90.253 23.9691))',
        'dataset': 'OPERA-S1',
        'start': '2016-07-03T00:00:00Z',
        'end': '2024-01-31T00:00:00Z',
        'flightDirection': 'ASCENDING',
        'processingLevel': ['RTC'],
        'maxResults': '1000'
    }
    results = disco.search(**options)
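
The `intersectsWith` values above are WKT strings in longitude/latitude order. As a minimal sketch, a bounding box can be turned into the same kind of closed polygon with a small helper. The function name `bbox_to_wkt` is hypothetical and is not part of `asf_search`.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon (lon lat order) for a bounding box."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ",".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


# Bounding box taken from the polygon search example
aoi = bbox_to_wkt(90.253, 23.9691, 90.4222, 24.1679)
print(aoi)
```

The resulting string could then be passed as the `'intersectsWith'` value in `options`.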

**You can also search for OPERA RTCs with a list of product IDs**

Uncomment the code cell below to search using OPERA RTC IDs.

In [None]:
# # Uncomment to search by OPERA RTC ID
# if opera:
#     product_list = [
#         "OPERA_L2_RTC-S1_T173-370304-IW1_20231006T134412Z_20231007T132700Z_S1A_30_v1.0",
#         "OPERA_L2_RTC-S1_T173-370304-IW1_20231018T134412Z_20231019T044908Z_S1A_30_v1.0"
#         ]
#     results = disco.granule_search(product_list)


**If accessing HyP3 data, select a date range of products to migrate**

In [None]:
if hyp3 and rtc_search:
    jobs = active_projects[project_select.value]
elif hyp3:
    jobs = batch

if hyp3:
    display(Markdown("<text style='color:darkred;'>Note: After selecting a date range, you should select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
    display(Markdown("<text style='color:darkred;'>Otherwise, you may simply rerun this code cell.</text>"))
    print('\nSelect a Date Range:')
    dates = osl.get_job_dates(jobs)
    date_picker = osl.gui_date_picker(dates)
    display(date_picker)

**If accessing HyP3 data, save the selected date range and remove products falling outside of it:**

In [None]:
if hyp3:
    date_range = osl.get_slider_vals(date_picker)
    date_range[0] = date_range[0].date()
    date_range[1] = date_range[1].date()
    print(f"Date Range: {str(date_range[0])} to {str(date_range[1])}")
    # Filter `jobs`, since the following cells read from `jobs`, not `batch`
    jobs = osl.filter_jobs_by_date(jobs, date_range)
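
To illustrate what filtering by date range means here, the following is a minimal sketch on plain `datetime` objects. It is not the implementation of `osl.filter_jobs_by_date`; the helper name `within_date_range` is hypothetical, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date, datetime


def within_date_range(acquisitions, start, end):
    """Keep datetimes whose calendar date falls in [start, end], inclusive."""
    return [dt for dt in acquisitions if start <= dt.date() <= end]


# Hypothetical acquisition times for illustration
acquisitions = [
    datetime(2023, 10, 6, 13, 44, 12),
    datetime(2023, 10, 18, 13, 44, 12),
    datetime(2023, 10, 30, 13, 44, 12),
]

kept = within_date_range(acquisitions, date(2023, 10, 6), date(2023, 10, 18))
print(kept)
```

Comparing calendar dates rather than full timestamps keeps an acquisition made late on the final day inside the range.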

**If accessing HyP3 data, gather the available paths and orbit directions for the remaining products:**

In [None]:
if hyp3:
    display(Markdown("<text style='color:darkred;'><text style='font-size:150%;'>This may take some time for projects containing many jobs...</text></text>"))
    osl.set_paths_orbits(jobs)
    paths = set()
    orbit_directions = set()
    for rtc in jobs:
        paths.add(rtc.path)
        orbit_directions.add(rtc.orbit_direction)
    paths.add('All Paths')
    display(Markdown("<text style='color:blue;'><text style='font-size:175%;'>Done.</text></text>"))

---
**If accessing HyP3 data, select a path or paths (use shift or ctrl to select multiple paths):**

In [None]:
if hyp3:
    display(Markdown("<text style='color:darkred;'>Note: After selecting a path, you must select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
    display(Markdown("<text style='color:darkred;'>Otherwise, you will simply rerun this code cell.</text>"))
    print('\nSelect a Path:')
    path_choice = osl.select_mult_parameters(paths)
    display(path_choice)

**If accessing HyP3 data, save the selected flight path/s:**

In [None]:
if hyp3:
    flight_path = path_choice.value
    if flight_path:
        if 'All Paths' in flight_path:
            print('Flight Path: All Paths')
        else:
            print(f"Flight Path: {flight_path}")
    else:
        print("WARNING: You must select a flight path in the previous cell, then rerun this cell.")

**If accessing HyP3 data, select an orbit direction:**

In [None]:
if hyp3:
    if len(orbit_directions) > 1:
        display(Markdown("<text style='color:red;'>Note: After selecting a flight direction, you must select the next cell before hitting the 'Run' button or typing Shift/Enter.</text>"))
        display(Markdown("<text style='color:red;'>Otherwise, you will simply rerun this code cell.</text>"))
    print('\nSelect a Flight Direction:')
    direction_choice = osl.select_parameter(orbit_directions, 'Direction:')
    display(direction_choice)

**If accessing HyP3 data, save the selected orbit direction:**

In [None]:
if hyp3:
    direction = direction_choice.value
    print(f"Orbit Direction: {direction}")

**Filter HyP3 jobs by path and orbit direction, or keep only the most recently processed version of each OPERA burst:**

In [None]:
import re

def filter_old_bursts(results):
    """Keep only the most recently processed product for each burst acquisition."""
    filtered_bursts = dict()
    # Matches the burst ID plus acquisition time, which together identify one acquisition
    acquisition_date_regex = r"(?<=OPERA_L2_RTC-S1_)T\d{3}-\d{6}-IW\d_\d{8}T\d{6}Z(?=_\d{8}T\d{6}Z)"

    for b in results:
        rtc_id = b.properties['fileID']
        match = re.search(acquisition_date_regex, rtc_id)
        if match is None:
            raise Exception(f"Acquisition not found in RTC ID: {rtc_id}")
        id_date = match.group(0)
        # For bursts that differ only by processing date, a simple string comparison of IDs suffices
        if id_date not in filtered_bursts or filtered_bursts[id_date].properties['fileID'] < rtc_id:
            filtered_bursts[id_date] = b

    return list(filtered_bursts.values())
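
`filter_old_bursts` relies on the fact that two IDs for the same burst and acquisition time first differ at the processing timestamp, so comparing the ID strings orders them by processing date. The sketch below shows the same idea on bare ID strings. The second ID is a hypothetical reprocessing invented for illustration; the other two come from the product ID example earlier in this notebook.

```python
import re

ACQUISITION_RE = r"(?<=OPERA_L2_RTC-S1_)T\d{3}-\d{6}-IW\d_\d{8}T\d{6}Z(?=_\d{8}T\d{6}Z)"

ids = [
    "OPERA_L2_RTC-S1_T173-370304-IW1_20231006T134412Z_20231007T132700Z_S1A_30_v1.0",
    # Hypothetical later reprocessing of the same acquisition (illustration only)
    "OPERA_L2_RTC-S1_T173-370304-IW1_20231006T134412Z_20231101T000000Z_S1A_30_v1.0",
    "OPERA_L2_RTC-S1_T173-370304-IW1_20231018T134412Z_20231019T044908Z_S1A_30_v1.0",
]

latest = {}
for rtc_id in ids:
    # Burst ID plus acquisition time identifies one acquisition
    key = re.search(ACQUISITION_RE, rtc_id).group(0)
    # Among IDs sharing a key, the lexicographically largest has the latest processing time
    latest[key] = max(latest.get(key, rtc_id), rtc_id)

for key, rtc_id in latest.items():
    print(key, "->", rtc_id)
```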

In [None]:
if hyp3:
    jobs = osl.filter_jobs_by_path(jobs, flight_path)
    jobs = osl.filter_jobs_by_orbit(jobs, direction)
    print(f"There are {len(jobs)} products to download.")
else:
    results = [r for r in results if 'operaBurstID' in r.properties.keys()]
    results = filter_old_bursts(results)
    print(f"There are {len(results)} OPERA RTCs to download")

**Download the products, unzip them into a directory named after the product type, and delete the zip files:**

In [None]:
from asf_search.download.file_download_type import FileDownloadType

if hyp3:
    print(f"\nProject: {jobs.jobs[0].name}")
    project_zips = jobs.download_files(data_path)
    for z in project_zips:
        osl.asf_unzip(str(data_path), str(z))
        z.unlink()
else:
    for p in results:
        p.download(data_path, session=user_pass_session, fileType=FileDownloadType.ALL_FILES)

**Move the VH and VV GeoTIFFs to their own directories, and delete unneeded data**

In [None]:
import shutil

if hyp3:
    rtc_dirs = list(data_path.glob('*RTC*'))
    vh_paths = list(data_path.glob('*RTC*/*VH*.tif'))
    vv_paths = list(data_path.glob('*RTC*/*VV*.tif'))
else:
    vh_paths = list(data_path.glob('*VH*.tif'))
    vv_paths = list(data_path.glob('*VV*.tif'))
    
vh_dir = data_path / 'VH'
vh_dir.mkdir(exist_ok=True)

vv_dir = data_path / 'VV'
vv_dir.mkdir(exist_ok=True)

for vh in vh_paths:
    vh.rename(vh_dir/vh.name)
for vv in vv_paths:
    vv.rename(vv_dir/vv.name)
    
if hyp3:
    for d in rtc_dirs:
        shutil.rmtree(d)
else:
    to_delete = [p for p in data_path.glob('*') if p.is_file()]
    for p in to_delete:
        p.unlink()

vh_paths = sorted(list(vh_dir.glob('*VH*.tif')))
vv_paths = sorted(list(vv_dir.glob('*VV*.tif')))

**Create a DataFrame containing file paths, acquisition dates, and EPSGs:**

In [None]:
import sys

import pandas as pd

current = Path("..").resolve()
sys.path.append(str(current))
import util.util as util

df = pd.DataFrame({
    'file': vh_paths + vv_paths,
    'data_type': ['VH_RTC' if 'VH' in str(pth) else 'VV_RTC' for pth in vh_paths + vv_paths],
    'SAR_acquisition_dt': util.get_dates(vh_paths) + util.get_dates(vv_paths),
    'EPSG': [util.get_epsg(p) for p in vh_paths + vv_paths]
})
df
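
The `SAR_acquisition_dt` column comes from `util.get_dates`, whose implementation is not shown in this notebook. As a rough sketch of the idea only, an acquisition time can be read from the first timestamp in a product name. The helper `acquisition_datetime` is hypothetical, and the `_VV.tif` suffix on the example name is an assumption about how the downloaded files are named.

```python
import re
from datetime import datetime


def acquisition_datetime(name):
    """Return the first YYYYMMDDTHHMMSS timestamp found in a product name."""
    match = re.search(r"\d{8}T\d{6}", name)
    if match is None:
        raise ValueError(f"No timestamp found in {name!r}")
    return datetime.strptime(match.group(0), "%Y%m%dT%H%M%S")


# Example name built from an OPERA RTC ID; the polarization suffix is assumed
name = "OPERA_L2_RTC-S1_T173-370304-IW1_20231006T134412Z_20231007T132700Z_S1A_30_v1.0_VV.tif"
print(acquisition_datetime(name))
```

In OPERA RTC IDs the first timestamp is the acquisition time and the second is the processing time, so taking the first match avoids confusing the two.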

---
## 2. Fix multiple UTM Zone-related issues

Your data set may span more than one UTM zone. If multiple UTM zones are found, the following code cells identify the predominant zone and reproject the remaining data into it.
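
To see why a stack can end up in more than one EPSG, the sketch below computes the WGS 84 / UTM EPSG code for a location. It assumes the products use WGS 84 / UTM (EPSG 326xx in the northern hemisphere, 327xx in the southern) and ignores the special zone exceptions around Norway and Svalbard. The helper name `utm_epsg_for` is hypothetical.

```python
def utm_epsg_for(lon, lat):
    """Return the WGS 84 / UTM EPSG code for a longitude/latitude in degrees."""
    # UTM zones are 6 degrees wide and numbered 1..60 starting at 180 degrees west
    zone = min(int((lon + 180) // 6) + 1, 60)
    # 326xx = northern hemisphere, 327xx = southern hemisphere
    base = 32600 if lat >= 0 else 32700
    return base + zone


print(utm_epsg_for(90.4976, 24.1595))  # point used in the OPERA search example
print(utm_epsg_for(89.9, 24.1595))     # a location just west of 90 degrees east
```

The two locations are about half a degree apart in longitude but fall in neighboring zones (46 and 45), so scenes covering both sides of the 90°E boundary can arrive in different EPSGs.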

**If the data fall into multiple EPSGs, identify the most heavily represented EPSG:**

In [None]:
if df['EPSG'].nunique() > 1:
    predominant_epsg = df['EPSG'].mode()[0]
    print(f"Predominant EPSG: {predominant_epsg}")
else: 
    predominant_epsg = None
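
`Series.mode()` returns every value tied for most frequent, in sorted order, so `mode()[0]` selects the smallest EPSG when two codes are equally common. The plain-Python sketch below mirrors that selection rule without pandas; the helper name `predominant` is hypothetical.

```python
from collections import Counter


def predominant(values):
    """Return the most frequent value; ties go to the smallest value."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, n in counts.items() if n == top)


print(predominant([32646, 32645, 32646, 32646]))  # clear majority
print(predominant([32646, 32645, 32646, 32645]))  # tie between two zones
```

If a tie matters for your AOI, consider choosing the target EPSG explicitly instead of relying on the tie-breaking order.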

**Reproject GeoTIFFs to the predominant UTM zone:**

In [None]:
from osgeo import gdal
gdal.UseExceptions()

if predominant_epsg:
    to_reproject = df.loc[df['EPSG'] != predominant_epsg, 'file']
    for idx, geotiff in to_reproject.items():
        geotiff_path = str(geotiff)
        # Warp in place, overwriting the original GeoTIFF
        gdal.Warp(geotiff_path, geotiff_path, srcSRS=f'EPSG:{util.get_epsg(geotiff_path)}', dstSRS=f'EPSG:{predominant_epsg}')
        # Record the EPSG actually written to disk
        df.at[idx, "EPSG"] = util.get_epsg(geotiff_path)
    if df['EPSG'].nunique() > 1:
        raise Exception('Expected a single EPSG for all VH and VV GeoTIFFs')
    display(df)

---
## 3. Merge multiple frames from the same date

If your AOI covers multiple frames or bursts, several RTCs will share the same acquisition date. We will merge these RTCs by date.

**Create a DataFrame containing space-separated strings of paths to merge:**

In [None]:
df['SAR_acquisition_date'] = df['SAR_acquisition_dt'].dt.date
merge_df = df.groupby(['data_type', 'SAR_acquisition_date']).filter(lambda x: len(x) > 1)
merge_df = merge_df.groupby(['data_type', 'SAR_acquisition_date'])['file'].apply(lambda x: ' '.join(map(str, x))).reset_index()
merge_df.columns = ['data_type', 'SAR_acquisition_date', 'path_merge_string']
merge_df
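
The grouping logic above can be hard to read in its chained pandas form. The plain-Python sketch below shows the same idea: group files by data type and acquisition date, then keep only groups with more than one file. The file names and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical file paths, data types, and acquisition dates for illustration
records = [
    ("VV/a_VV.tif", "VV_RTC", date(2023, 10, 6)),
    ("VV/b_VV.tif", "VV_RTC", date(2023, 10, 6)),
    ("VV/c_VV.tif", "VV_RTC", date(2023, 10, 18)),
]

groups = defaultdict(list)
for path, data_type, acquired in records:
    groups[(data_type, acquired)].append(path)

# Only groups with more than one file need merging
to_merge = {key: " ".join(paths) for key, paths in groups.items() if len(paths) > 1}
print(to_merge)
```

Dates with a single file are left out, which is why they are untouched by the merge step.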

**Merge all the frames for each date and delete the original tiffs.**

In [None]:
for i, row in merge_df.iterrows():
    date_str = row['SAR_acquisition_date'].strftime('%Y%m%d')
    og_path_0 = Path(row['path_merge_string'].split(' ')[0])
    output_path = og_path_0.parent / f"merged_{row['data_type']}_{date_str}.tif"
    cmd = f"gdal_merge.py -o {output_path} {row['path_merge_string']}"
    print(cmd)
    !$cmd
    for p in row['path_merge_string'].split(' '):
        Path(p).unlink()
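
The merge cell joins and splits paths on spaces, which would break if the data directory name contained a space. As one possible alternative, sketched under the assumption that paths are kept as a list rather than a joined string, each argument can be shell-quoted with `shlex.join`. The helper `build_merge_command` and the example paths are hypothetical.

```python
import shlex


def build_merge_command(output_path, input_paths):
    """Build a gdal_merge.py command string with every path shell-quoted."""
    args = ["gdal_merge.py", "-o", str(output_path), *map(str, input_paths)]
    return shlex.join(args)


# Hypothetical paths containing a space
cmd = build_merge_command(
    "my data/VV/merged_VV_RTC_20231006.tif",
    ["my data/VV/a_VV.tif", "my data/VV/b_VV.tif"],
)
print(cmd)
```

The quoted string can be run the same way as `cmd` in the cell above, and `shlex.split(cmd)` recovers the original argument list.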

*Prepare_HydroSAR_RTC_Stack.ipynb - Version 1.0.0 - May 2024*