# OPERA Direct S3 Access

Access OPERA L2-RTC-S1 and L2-CSLC-S1 products in place, directly from S3

Author: Alex Lewandowski; Alaska Satellite Facility

This notebook is based on the following ASF documentation and examples:
- [ASF OPERA Data Discovery](https://asf.alaska.edu/datasets/daac/opera/)
- [ASF S3 Credential Instructions](https://cumulus.asf.alaska.edu/s3credentialsREADME?)

---

<div class="alert alert-danger" style="display: flex; align-items: center; font-family: 'Times New Roman', Times, serif; background-color: rgba(200,0,0,0.2)">
  <div style="width: 95%;">
    <h2><b>Warning: Direct S3 Access with EDL Bearer Tokens is only possible in-region (AWS us-west-2)</b></h2>
    <ul>
      <li>This notebook cannot be run locally.</li>
      <li>It must be run from an EC2 instance in the AWS us-west-2 region where the data are stored.</li>
      <li>You can run this from OpenSARLab or from another Jupyter Lab installation running in us-west-2.</li>
    </ul>
  </div>
</div>

<div class="alert alert-info" style="display: flex; align-items: center; font-family: 'Times New Roman', Times, serif; background-color: #d1ecf1;">
  <div style="display: flex; align-items: center; width: 10%;">
    <a href="https://github.com/ASFOpenSARlab/opensarlab_OPERA-CSLC_Recipe_Book/issues">
      <img src="https://opensarlab-docs.asf.alaska.edu/opensarlab-notebook-assets/logos/github_issues.png" alt="GitHub logo over the word Issues" style="width: 100%;">
    </a>
  </div>
  <div style="width: 95%;">
    <b>Did you find a bug? Do you have a feature request?</b>
    <br/>
    Explore GitHub Issues on this Jupyter Book's GitHub repository. Find solutions, add to the discussion, or start a new bug report or feature request: <a href="https://github.com/ASFOpenSARlab/opensarlab_OPERA-RTC-S1_Recipe_Book/issues">opensarlab_OPERA-RTC-S1_Recipe_Book Issues</a>
  </div>
</div>

<div class="alert alert-info" style="display: flex; align-items: center; justify-content: space-between; font-family: 'Times New Roman', Times, serif; background-color: #d1ecf1;">
  <div style="display: flex; align-items: center; width: 10%; margin-right: 10px;">
    <a href="mailto:uso@asf.alaska.edu">
      <img src="https://opensarlab-docs.asf.alaska.edu/opensarlab-notebook-assets/logos/ASF_support_logo.png" alt="ASF logo" style="width: 100%">
    </a>
  </div>
  <div style="width: 95%;">
    <b>Have a question related to SAR, OPERA RTCs, or ASF data access?</b>
    <br/>
    Contact ASF User Support: <a href="mailto:uso@asf.alaska.edu">uso@asf.alaska.edu</a>
  </div>
</div>

---

## 0. Import Required Software

In [None]:
import ast
from datetime import datetime
from getpass import getpass
import io
import json
import re
import urllib.request

import ipywidgets as widgets
from IPython.display import display

import boto3
import h5py
import pandas as pd
import s3fs
import xarray as xr

---

## 1. Select some OPERA product IDs

**L2-RTC-S1**

In [None]:
# # code assumes opera_ids contains a single product type
# opera_ids = [
#     "OPERA_L2_RTC-S1_T137-292339-IW3_20240201T020001Z_20240201T081115Z_S1A_30_v1.0",
#     "OPERA_L2_RTC-S1_T137-292339-IW3_20240201T020001Z_20240201T114538Z_S1A_30_v1.0",
#     "OPERA_L2_RTC-S1_T137-292339-IW3_20240120T020002Z_20240120T142521Z_S1A_30_v1.0",
# ]

# prefix = "OPERA_L2_RTC-S1" if "RTC" in opera_ids[0] else "OPERA_L2_CSLC-S1"

**L2-RTC-S1-STATIC**

In [None]:
# # code assumes opera_ids contains a single product type
# opera_ids = [
#     "OPERA_L2_RTC-S1-STATIC_T137-292339-IW3_20140403_S1A_30_v1.0",
# ]

# prefix = "OPERA_L2_RTC-S1" if "RTC" in opera_ids[0] else "OPERA_L2_CSLC-S1"

**L2-CSLC-S1**

In [None]:
# code assumes opera_ids contains a single product type
opera_ids = [
    "OPERA_L2_CSLC-S1_T137-292339-IW3_20240201T020001Z_20240202T175818Z_S1A_VV_v1.0",
    "OPERA_L2_CSLC-S1_T137-292339-IW3_20240201T020001Z_20240202T175821Z_S1A_VV_v1.0",
    "OPERA_L2_CSLC-S1_T137-292339-IW3_20240120T020002Z_20240121T075842Z_S1A_VV_v1.0",
]

prefix = "OPERA_L2_RTC-S1" if "RTC" in opera_ids[0] else "OPERA_L2_CSLC-S1"

**L2-CSLC-S1-STATIC**

In [None]:
# # code assumes opera_ids contains a single product type
# opera_ids = [
#     "OPERA_L2_CSLC-S1-STATIC_T137-292339-IW3_20140403_S1A_v1.0",
# ]

# prefix = "OPERA_L2_RTC-S1" if "RTC" in opera_ids[0] else "OPERA_L2_CSLC-S1"
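
The cells above assume that `opera_ids` holds a single product type. A minimal sketch of how that assumption could be checked before requesting data is shown below. The regular expression and the helper names `parse_opera_id` and `check_single_product_type` are hypothetical and are based only on the example IDs listed in this notebook, not on an official naming specification.

```python
import re

# Hypothetical pattern inferred from the example IDs in this notebook.
OPERA_ID_PATTERN = re.compile(
    r"^OPERA_L2_(?P<product>RTC-S1|CSLC-S1)(?P<static>-STATIC)?_"
    r"(?P<burst>T\d{3}-\d{6}-IW\d)_"
)


def parse_opera_id(opera_id):
    """Return (product, is_static, burst_id) parsed from an OPERA product ID."""
    match = OPERA_ID_PATTERN.match(opera_id)
    if match is None:
        raise ValueError(f"Unrecognized OPERA product ID: {opera_id}")
    return match["product"], match["static"] is not None, match["burst"]


def check_single_product_type(ids):
    """Return (product, is_static) if all IDs agree, otherwise raise ValueError."""
    kinds = {parse_opera_id(opera_id)[:2] for opera_id in ids}
    if len(kinds) != 1:
        raise ValueError(f"Expected one product type, found: {sorted(kinds)}")
    return kinds.pop()


example_ids = [
    "OPERA_L2_CSLC-S1_T137-292339-IW3_20240201T020001Z_20240202T175818Z_S1A_VV_v1.0",
    "OPERA_L2_CSLC-S1_T137-292339-IW3_20240120T020002Z_20240121T075842Z_S1A_VV_v1.0",
]
product, is_static = check_single_product_type(example_ids)
print(product, is_static)
```

Calling `check_single_product_type(opera_ids)` right after defining `opera_ids` would surface a mixed list early, before any S3 request is made.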

---

## 2. Request S3 access credentials

**Enter your Earthdata Login Bearer Token**

[Instructions for creating an EDL bearer token](https://urs.earthdata.nasa.gov/documentation/for_users/user_token)

In [None]:
token = getpass("Enter your EDL Bearer Token: ")

In [None]:
event = {
    "CredentialsEndpoint": "https://cumulus.asf.alaska.edu/s3credentials",
    "BearerToken": token,
    "Bucket": "asf-cumulus-prod-opera-products",
    "Prefix": prefix,
    "StaticPrefix": f"{prefix}_STATIC"
}

In [None]:
# Get temporary download credentials
tea_url = event["CredentialsEndpoint"]
bearer_token = event["BearerToken"]
req = urllib.request.Request(
    url=tea_url,
    headers={"Authorization": f"Bearer {bearer_token}"}
)
with urllib.request.urlopen(req) as f:
    creds = json.loads(f.read().decode())
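
The request above fails with a bare traceback if the token is rejected, and later cells assume the response contains `accessKeyId`, `secretAccessKey`, and `sessionToken`. The sketch below wraps the same `urllib` calls with basic error handling and a check for those three keys. The helper names (`build_credentials_request`, `validate_credentials`, `fetch_credentials`) and the 30-second timeout are illustrative assumptions, not part of any ASF library.

```python
import json
import urllib.error
import urllib.request

# Keys that the rest of this notebook reads from the credential response.
REQUIRED_KEYS = ("accessKeyId", "secretAccessKey", "sessionToken")


def build_credentials_request(endpoint, bearer_token):
    """Build the authorized request without sending it."""
    return urllib.request.Request(
        url=endpoint,
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def validate_credentials(creds):
    """Raise KeyError if any key needed by s3fs is missing from the response."""
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"Credential response is missing: {missing}")
    return creds


def fetch_credentials(endpoint, bearer_token, timeout=30):
    """Request temporary S3 credentials and return them as a dict."""
    req = build_credentials_request(endpoint, bearer_token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            creds = json.loads(response.read().decode())
    except urllib.error.HTTPError as err:
        raise RuntimeError(
            f"Credential request failed with HTTP {err.code}; "
            "check that your EDL bearer token is valid and has not expired"
        ) from err
    return validate_credentials(creds)


# Usage (performs a network request, so it is left commented out here):
# creds = fetch_credentials(event["CredentialsEndpoint"], event["BearerToken"])
```

The credentials are temporary, so a long session may need to call `fetch_credentials` again and rebuild the `s3fs.S3FileSystem` object.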

## 3. If accessing RTC or RTC-STATIC data, select a layer type

This step is not necessary for CSLC and CSLC-STATIC products, because all of their data layers are stored together in one multidimensional HDF5 file.

In [None]:
rtc = 'RTC' in event["Prefix"]
static = 'STATIC' in opera_ids[0]

if rtc:
    if static:
        file = {
            'Incidence angle (ellipsoidal)': '_incidence_angle.tif',
            'Local-incidence angle': '_local_incidence_angle.tif', 
            'No. of looks': '_number_of_looks.tif',
            'Layover Shadow Mask layer': '_mask.tif',
            'RTC Area Normalization Factor (ANF) gamma0 to beta0': '_rtc_anf_gamma0_to_beta0.tif',
            'RTC Area Normalization Factor (ANF) gamma0 to sigma0': '_rtc_anf_gamma0_to_sigma0.tif'
        }
    else:
        file = {
            'VH RTC': '_VH.tif',
            'VV RTC': '_VV.tif',
            'Layover Shadow Mask layer': '_mask.tif',
        }
elif 'CSLC' not in event['Prefix']:
    raise ValueError("Unrecognized Product Type")

In [None]:
if rtc:
    print("Select a layer type")
    product_choice = widgets.RadioButtons(
        options=file,
        description='',
        disabled=False,
        layout={'width': '500px'}
    )
    display(product_choice)

## 4. Open a single OPERA product

Access the first product in `opera_ids`

In [None]:
filename = f"{opera_ids[0]}{product_choice.value}" if rtc else f"{opera_ids[0]}.h5"
key_prefix = event['StaticPrefix'] if static else event['Prefix']
object_key = f"{key_prefix}/{opera_ids[0]}/{filename}"

fs = s3fs.S3FileSystem(key=creds['accessKeyId'], secret=creds['secretAccessKey'], token=creds['sessionToken'])

# Define S3 path
s3_path = f"{event['Bucket']}/{object_key}"

# Open the file as a file-like object using s3fs
with fs.open(s3_path, mode='rb') as f:
    if '.h5' in object_key and not rtc:
        # `group` must name an HDF5 group, so open "data" and select a layer afterwards (e.g. ds["VV"])
        ds = xr.open_dataset(f, engine='h5netcdf', group="data")
        # copy the file's root-level attributes onto the Dataset, decoding byte strings
        with h5py.File(f, 'r') as h5f:
            for attr, value in h5f.attrs.items():
                try:
                    ds.attrs[attr] = value.decode('utf-8')
                except AttributeError:
                    ds.attrs[attr] = value
    else:
        ds = xr.open_dataarray(f, engine="rasterio")
ds
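
The object key in the cell above is assembled from the prefix, the product ID, and a file name. The sketch below gathers that same logic into one function so the resulting path can be inspected before opening it. `build_s3_path` is a hypothetical helper, and the `_STATIC` prefix convention and file suffixes are copied from the earlier cells of this notebook rather than from a bucket specification.

```python
def build_s3_path(bucket, opera_id, layer_suffix=None):
    """Return the S3 URL of one file belonging to an OPERA product.

    layer_suffix is required for RTC products (e.g. "_VV.tif") and ignored
    for CSLC products, which are stored as a single ".h5" file.
    """
    is_rtc = "RTC" in opera_id
    is_static = "STATIC" in opera_id

    # Same prefix rule as the `event` dictionary defined earlier.
    prefix = "OPERA_L2_RTC-S1" if is_rtc else "OPERA_L2_CSLC-S1"
    if is_static:
        prefix = f"{prefix}_STATIC"

    if is_rtc:
        if layer_suffix is None:
            raise ValueError("RTC products need a layer suffix such as '_VV.tif'")
        filename = f"{opera_id}{layer_suffix}"
    else:
        filename = f"{opera_id}.h5"

    return f"s3://{bucket}/{prefix}/{opera_id}/{filename}"


cslc_path = build_s3_path(
    "asf-cumulus-prod-opera-products",
    "OPERA_L2_CSLC-S1_T137-292339-IW3_20240201T020001Z_20240202T175818Z_S1A_VV_v1.0",
)
print(cslc_path)
```

Printing the path first makes it easier to tell a wrong key apart from a permissions or region problem when `fs.open` fails.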

---

## 5. Create an RTC time series

**Build a sorted list of OPERA L2-RTC-S1 IDs**

In [None]:
opera_rtc_ids = sorted([
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231004T020005Z_20240122T203351Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231016T020005Z_20231016T154509Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231028T020005Z_20231029T045555Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231109T020004Z_20231114T103429Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231121T020004Z_20231206T001313Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231203T020004Z_20231203T100451Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231203T020004Z_20231203T122715Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231215T020003Z_20231215T142550Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20231227T020002Z_20231230T152233Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20240108T020002Z_20240109T091409Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20240120T020002Z_20240120T142521Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20240201T020001Z_20240201T081115Z_S1A_30_v1.0',
    'OPERA_L2_RTC-S1_T137-292339-IW3_20240201T020001Z_20240201T114538Z_S1A_30_v1.0'
])
opera_rtc_ids

**Create a pandas DataFrame of the time series**

In [None]:
def get_dt(opera_id, date_regex):
    """Return the timestamp in opera_id that matches date_regex."""
    match = re.search(date_regex, opera_id)
    if match is None:
        raise ValueError(f"Timestamp not found in scene ID: {opera_id}")
    return match.group(0)

# first timestamp in the ID: acquisition time; second timestamp: processing time
acquisition_date_regex = r"(?<=_)\d{8}T\d{6}Z(?=_\d{8}T\d{6})"
process_dt_regex = r"(?<=\d{8}T\d{6}Z_)\d{8}T\d{6}Z(?=_S1)"

acquisition_dt = pd.to_datetime([get_dt(opera_id, acquisition_date_regex) for opera_id in opera_rtc_ids])
process_dt = pd.to_datetime([get_dt(opera_id, process_dt_regex) for opera_id in opera_rtc_ids])

# Some acquisitions were processed more than once.
# Keep only the most recently processed product for each acquisition time.
times_series_df = (
    pd.DataFrame(data={
        'OPERA L2-RTC-S1 ID': opera_rtc_ids,
        'AcquisitionDateTime': acquisition_dt,
        'ProcessDateTime': process_dt
    })
    .sort_values(by='ProcessDateTime')
    .drop_duplicates(subset=['AcquisitionDateTime'], keep='last')
    .drop('ProcessDateTime', axis=1)
    .sort_values(by='AcquisitionDateTime')
    .reset_index(drop=True)
)

times_series_df
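
The DataFrame above keeps one product per acquisition time, preferring the most recently processed one. The same rule is sketched below in plain Python on three of the IDs from this notebook, which may make the logic easier to follow. `latest_per_acquisition` and the combined regular expression are illustrative assumptions and are not part of the pandas workflow.

```python
import re
from datetime import datetime

# Captures the acquisition timestamp, then the processing timestamp.
ID_TIMES = re.compile(r"_(\d{8}T\d{6}Z)_(\d{8}T\d{6}Z)_S1")
TIME_FORMAT = "%Y%m%dT%H%M%SZ"


def latest_per_acquisition(ids):
    """Keep the most recently processed ID per acquisition, sorted by acquisition."""
    latest = {}  # acquisition time -> (processing time, product ID)
    for opera_id in ids:
        match = ID_TIMES.search(opera_id)
        if match is None:
            raise ValueError(f"Timestamps not found in scene ID: {opera_id}")
        acquired, processed = (
            datetime.strptime(stamp, TIME_FORMAT) for stamp in match.groups()
        )
        if acquired not in latest or processed > latest[acquired][0]:
            latest[acquired] = (processed, opera_id)
    return [latest[acquired][1] for acquired in sorted(latest)]


ids = [
    "OPERA_L2_RTC-S1_T137-292339-IW3_20231203T020004Z_20231203T100451Z_S1A_30_v1.0",
    "OPERA_L2_RTC-S1_T137-292339-IW3_20231203T020004Z_20231203T122715Z_S1A_30_v1.0",
    "OPERA_L2_RTC-S1_T137-292339-IW3_20231121T020004Z_20231206T001313Z_S1A_30_v1.0",
]
kept = latest_per_acquisition(ids)
print(kept)
```

Of the two products acquired at `20231203T020004Z`, only the one processed at `20231203T122715Z` is kept, and the `20231121T020004Z` acquisition is listed first.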

**Build a time series of both polarizations in an xarray Dataset**

In [None]:
fs = s3fs.S3FileSystem(key=creds['accessKeyId'], secret=creds['secretAccessKey'], token=creds['sessionToken'])

polarizations = ['VV', 'VH']
da_stack = []

for _, row in times_series_df.iterrows():
    opera_id = row['OPERA L2-RTC-S1 ID']
    time = pd.to_datetime(row['AcquisitionDateTime'])
    polarization_stack = []

    for polarization in polarizations:
        filename = f"{opera_id}_{polarization}.tif"
        object_key = f"OPERA_L2_RTC-S1/{opera_id}/{filename}"
        s3_path = f"s3://{event['Bucket']}/{object_key}"

        with fs.open(s3_path, mode='rb') as f:
            # read the pixel values now, while the remote file is still open
            da = xr.open_dataarray(f, engine="rasterio").load()
            da = da.expand_dims(time=pd.Index([time], name='time'))
            polarization_stack.append(da)

    da_polarized = xr.concat(polarization_stack, dim=pd.Index(polarizations, name='polarization'))
    da_stack.append(da_polarized)

ds = xr.concat(da_stack, dim='time')
ds

**Access the VV time series**

In [None]:
ds.sel(polarization='VV')

**Access the VH RTC from the first time step**

In [None]:
ds.sel(polarization='VH').isel(time=0)