<div><img src="https://radar.community.uaf.edu/wp-content/uploads/sites/667/2021/03/HydroSARbanner.jpg" width="100%" /></div>

**NASA A.37 Project:** Integrating SAR Data for Improved Resilience and Response to Weather-Related Disasters
**PI:** Franz J. Meyer

# HydroSAR Transition workshop

## Archiving successful HyP3 jobs in an AWS S3 bucket

In order to automatically archive newly created HydroSAR products we:
1. Query HyP3 for all the successful HydroSAR jobs we've submitted
2. Query our archive for all the products we've archived
3. Deduplicate the two product lists to determine the *new* products to archive
4. Transfer the new products to our archive

This notebook walks through the archiving process. In an operational setting, it would be run on a regular schedule (e.g., via cron).

Note: Here our strategy is to *always* search for **all** possible products, *always* look up **all** previously archived products, and then *deduplicate* the two lists to determine the new products. You could instead keep track of the last time the script was *successfully* run and only search for new products since then. While that would be more performant, our strategy is independent of previous runs, so it is generally more fault-tolerant and can be started and stopped at will.
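
The deduplication idea can be sketched with plain Python sets. This is a minimal illustration, not the code used later in this notebook: `find_new_products` is a hypothetical helper and the file names are made up.

```python
def find_new_products(available, archived):
    """Return the products that are available but not yet archived, sorted by name."""
    return sorted(set(available) - set(archived))


# Hypothetical file names, for illustration only
available = ['scene_A_WM.tif', 'scene_B_WM.tif', 'scene_C_WM.tif']
archived = {'scene_A_WM.tif'}

new_products = find_new_products(available, archived)
print(new_products)

# Once the new products are in the archive, repeating the comparison finds
# nothing left to do, so the process is safe to stop and restart.
archived.update(new_products)
print(find_new_products(available, archived))
```

Because each run compares the full lists, no state needs to be carried over from one run to the next.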

### 1. Query HyP3 for all the HydroSAR products

We use HyP3 as our workflow engine and can query for all the scenes we've already processed. Importantly, when we submit jobs, we will assign all of them a project name which is used to group jobs together and later search for them.

First, we need to specify our project name so that we can find all the jobs associated with the Area of Interest (AOI) we're monitoring:

In [None]:
project_name = 'HKHwatermaps'

In this notebook, we'll prompt for an Earthdata Login username and password, but they can also be provided via the `username` and `password` keyword arguments, or automatically pulled from the user's `~/.netrc` file.

Note: Typically you'll want to use a shared "operational" Earthdata Login user as you can only search for jobs associated with your username.

In [None]:
import hyp3_sdk as sdk
hyp3 = sdk.HyP3('https://hyp3-watermap.asf.alaska.edu', prompt=True)
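
For scheduled runs, where nobody is present to answer a prompt, it can help to confirm beforehand that the `~/.netrc` file holds an Earthdata Login entry. The sketch below uses only the Python standard library. It assumes the entry is stored under the host `urs.earthdata.nasa.gov`, and `has_earthdata_credentials` is a hypothetical helper, not part of `hyp3_sdk`.

```python
import netrc

# Assumption: Earthdata Login credentials are stored under this host name
EARTHDATA_HOST = 'urs.earthdata.nasa.gov'


def has_earthdata_credentials(netrc_file):
    """Return True if the given netrc file has an entry for Earthdata Login."""
    try:
        entry = netrc.netrc(str(netrc_file)).authenticators(EARTHDATA_HOST)
    except (FileNotFoundError, netrc.NetrcParseError):
        # A missing or malformed file means there are no usable credentials
        return False
    return entry is not None
```

For example, `has_earthdata_credentials(Path.home() / '.netrc')` could be checked at the top of a scheduled script to fail early with a clear message.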

Now we'll search for all the jobs with our project name and filter to the succeeded jobs.

In [None]:
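processed_jobs = hyp3.find_jobs(name=project_name)
# Keep only the jobs that succeeded; running and failed jobs have no products to archive
succeeded_jobs = processed_jobs.filter_jobs(succeeded=True, running=False, failed=False)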

### 2. Query our archive for all the products we've archived

For our demonstration archive, we will store all the products in a directory on the local file system -- adjust for your archive accordingly.

In [None]:
from pathlib import Path

archive_directory = Path.cwd() / project_name
archive_directory.mkdir(exist_ok=True)

We'll build our set of archived products by listing all the files in the directory.

In [None]:
project_archive = {product.name for product in archive_directory.rglob('*.tif')}

Importantly, this is a set of file *names* (not full paths), so it can be compared directly against the file names reported by HyP3. For example, one item in the set looks like:

In [None]:
print(next(iter(project_archive), None))  # prints None if the archive is still empty

### 3. Deduplicate the product lists and download *new* products to archive

Every successful HyP3 `WATER_MAP` job will have created a set of HydroSAR products, each of which can be identified by its file suffix. First, we define the list of products we'd like to archive:

In [None]:
product_suffixes = [
    '_VV.tif', '_VH.tif', '_dem.tif', '_rgb.tif', '_WM_HAND.tif', '_WM.tif',
    '_FM_iterative_WaterDepth.tif', '_FM_iterative_FloodDepth.tif', '_FM_iterative_PW.tif',
]

Then we'll loop through all the succeeded jobs, check whether each product we want to archive is already in the project archive, and, if not, download it into the archive.

In [None]:
from hyp3_sdk.util import download_file
from tqdm import tqdm


CHUNK_SIZE = 10 * 1024 * 1024  # stream downloads in 10 MiB chunks

for job in tqdm(succeeded_jobs):
    # The first file of each job is the zipped product package
    hyp3_product_name = job.files[0]['filename']
    hyp3_product_url = job.files[0]['url']
    for sfx in product_suffixes:
        tif_name = hyp3_product_name.replace('.zip', sfx)
        # Only download GeoTIFFs that are not already in the archive
        if tif_name not in project_archive:
            download_file(hyp3_product_url.replace('.zip', sfx), archive_directory / tif_name, chunk_size=CHUNK_SIZE)

Here we are taking advantage of the fact that HyP3 uploads both the zip product package and the zip contents next to each other in the HyP3 AWS S3 content bucket. So you can get to a particular GeoTIFF by simply replacing the `.zip` in the download URL with the GeoTIFF's suffix.
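
The URL swap can also be written as a small helper that only touches the trailing extension. This is a sketch under two assumptions: the download URL ends in `.zip` with no query string, and the URL shown is a made-up placeholder. `sibling_url` is a hypothetical name, not part of `hyp3_sdk`.

```python
def sibling_url(zip_url, suffix):
    """Swap a trailing '.zip' for a GeoTIFF suffix such as '_WM.tif'."""
    if not zip_url.endswith('.zip'):
        raise ValueError(f'expected a URL ending in .zip, got: {zip_url}')
    return zip_url[:-len('.zip')] + suffix


# Hypothetical URL, for illustration only
zip_url = 'https://example.com/bucket/job-id/S1A_example_product.zip'
print(sibling_url(zip_url, '_WM.tif'))
```

Unlike `str.replace`, which substitutes every occurrence of `.zip`, this version changes only the extension at the end and raises an error for anything that is not a zip URL.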