# Step 2: Create and Push a Batch Run to AWS

**WARNING**: Make sure everything is set correctly BEFORE running this notebook! Because this notebook starts processing on the AWS servers, it can consume a lot of resources (that is, dollars).

This notebook:
* Imports the ASF database query file we created in Step 1
* Chooses a set of interferograms to process
* Prepares the files needed to process those interferograms
* Uploads files to AWS servers and starts the processing

Requires the following files to be in the folder where you run the code:
* apmb.geojson (file with coordinates for APMB)
* query.geojson (file with results of ASF database query)
* template.yml (yaml file with ISCE processing parameters)
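
Before going further, it can help to confirm that these files are actually present. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the list above
REQUIRED_FILES = ['apmb.geojson', 'query.geojson', 'template.yml']

def missing_files(names, folder='.'):
    """Return the names from `names` that are not regular files in `folder`."""
    return [n for n in names if not (Path(folder) / n).is_file()]

missing = missing_files(REQUIRED_FILES)
if missing:
    print('Missing required files:', ', '.join(missing))
else:
    print('All required files found.')
```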

## 2.1: Import Python packages

Import the packages needed to run this notebook.  You may need to install the dinosar package first.

In [None]:
# if dinosar library not in base environment uncomment below (run just once)
#!pip install --no-cache git+https://github.com/scottyhq/dinosar.git@master

In [5]:
import subprocess
import os
import dinosar
import geopandas as gpd
import getpass

## 2.2: Choose processing parameters

Make sure you have this section set the way you want it before running later cells!

In this section we'll set:
* Track
* Time span to cover (start and end times)
* AWS processing parameters (S3 bucket name, AWS job name)
* ISCE processing parameters (swaths, filters, etc.)
* How to choose the interferograms to make (explained more later!)

### 2.2.2: Set data subset and AWS parameters

In [31]:
# data parameters:
# i.e., which subset of all the stuff on AWS do you want to process
track = 83
start_date = '01/01/2019'
end_date = '03/01/2019'

# aws parameters:
dirname = 'D83-JanFeb2019' #name of the folder on AWS where your processing files will be
jobname = 'uturuncu-D83-JanFeb2019' #name for the job on AWS

In [32]:
bucket = 's3://dinosar/processing/uturuncu/' + dirname  # full S3 path for this batch run
bucket

's3://dinosar/processing/uturuncu/D83-JanFeb2019'
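
A mistyped date silently changes which scenes get selected, so a quick sanity check on the date parameters may be worthwhile before moving on. This sketch assumes the dates are written as MM/DD/YYYY (inferred from the "JanFeb2019" folder name); the helper `check_date_range` is hypothetical.

```python
from datetime import datetime

def check_date_range(start, end, fmt='%m/%d/%Y'):
    """Parse both dates and return the span in days.

    Raises ValueError if a date does not match `fmt` or if `end` is not after `start`.
    """
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 <= t0:
        raise ValueError(f'end_date {end} is not after start_date {start}')
    return (t1 - t0).days

# Same values as start_date and end_date above
print(check_date_range('01/01/2019', '03/01/2019'), 'days selected')
```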

### 2.2.3: Set ISCE Processing Parameters

Open the "template.yml" file and save a copy with a new name for this batch run (e.g., "Template_83_JanFeb2019.yml").  Change the processing parameters if needed.

In [11]:
# enter the name of the template file here
template="Template_83_JanFeb2019.yml"

### 2.2.4: Choose which interferograms to process
Still working on this section.  Trying to decide how I want to do this...

In [20]:
# load the ASF search results
gf = dinosar.archive.asf.load_inventory('query.geojson')

In [27]:
# create a new dataframe with only the track selected, in the date bounds selected
gdf=gf.query('dateStamp > @start_date & dateStamp < @end_date').query('relativeOrbit == @track')
# create yet another dataframe with only some info so we can quickly check that we selected the data we actually want
df = gdf.loc[:,['frameNumber','dateStamp','relativeOrbit']].sort_values(by='dateStamp')
df.head()

| | frameNumber | dateStamp | relativeOrbit |
|---|---|---|---|
| 1705 | 670 | 2019-01-12 | 83 |
| 1707 | 660 | 2019-01-12 | 83 |
| 1706 | 665 | 2019-01-12 | 83 |
| 1696 | 660 | 2019-01-24 | 83 |
| 1695 | 665 | 2019-01-24 | 83 |
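
The later cells expect a list called `pairs` holding names of the form `int-<master>-<slave>`, but this section does not yet build it. One possible approach, offered only as a sketch and not as the notebook's settled method, is to pair each acquisition date with the next few dates in time. The helper `make_pairs` and its `max_skip` parameter are hypothetical, treating the later date as the master is an assumption, and the third date in the example is illustrative.

```python
def make_pairs(dates, max_skip=1):
    """Build 'int-<master>-<slave>' names from acquisition dates.

    Each date is paired with up to `max_skip` of the dates that follow it.
    ASSUMPTION: the later date of each pair is used as the master.
    """
    unique = sorted(set(dates))  # YYYYMMDD strings sort chronologically
    pairs = []
    for i, earlier in enumerate(unique):
        for later in unique[i + 1 : i + 1 + max_skip]:
            pairs.append(f'int-{later}-{earlier}')
    return pairs

# Duplicate dates (one per frame) collapse to a single acquisition date.
# With the dataframe above, the dates could come from something like
# df['dateStamp'] formatted as YYYYMMDD strings.
example_dates = ['20190112', '20190112', '20190124', '20190205']
pairs = make_pairs(example_dates)
print(pairs)
```

Raising `max_skip` adds longer-baseline pairs (for example, first date with third date) at the cost of more processing jobs.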


## 2.3: Create processing directories and push them to AWS

Now that we've defined our processing parameters and identified the interferograms to process, we'll create a processing directory for each interferogram, and then upload those directories to the AWS server (S3).

In [29]:
# set full path to bucket on S3
bucket = 's3://dinosar/processing/uturuncu/' + dirname
# create a text file with the interferogram pairs to process
pairsFile = 'pairs.txt'

# NOTE: `pairs` must already be defined (section 2.2.4); the cells below expect
# names of the form 'int-<master>-<slave>'
paths = [bucket+'/'+x for x in pairs]
with open(pairsFile, 'w') as f:
    f.write('\n'.join(paths))

# write the command to push pairs.txt to the AWS bucket
cmd = f'aws s3 cp {pairsFile} {bucket}/{pairsFile}'

print(cmd)

#subprocess.call(cmd, shell=True)  # Uncomment to actually run command    

NameError: name 'pairs' is not defined

In [None]:
with open(pairsFile) as f:
    pairs = [line.rstrip() for line in f]
    mapping = dict(enumerate(pairs))
mapping

In [None]:
script = 'prep_topsApp_local'
for i,p in enumerate(pairs):
    intname = os.path.basename(p)
    junk,master,slave=intname.split('-')
    intdir = f'int-{master}-{slave}'
    # -p takes the relative orbit, which this notebook stores in `track` (section 2.2.2)
    cmd = f'{script} -i query.geojson -m {master} -p {track} -s {slave} -t {template}'
    print(i, cmd)
    #subprocess.call(cmd, shell=True) # Uncomment to actually run command  
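
The loop above relies on every entry in `pairs.txt` ending in a name shaped like `int-<master>-<slave>`. A small helper that checks this shape before unpacking could give a clearer error when a line is malformed. This is a sketch; `parse_pair` is a hypothetical name, and the example dates are illustrative.

```python
import posixpath

def parse_pair(path):
    """Split '.../int-<master>-<slave>' into a (master, slave) tuple of date strings."""
    name = posixpath.basename(path.rstrip('/'))
    parts = name.split('-')
    if len(parts) != 3 or parts[0] != 'int':
        raise ValueError(f'unexpected pair name: {name!r}')
    return parts[1], parts[2]

# Hyphens earlier in the path (e.g. in the folder name) do not matter,
# because only the last path component is split.
master, slave = parse_pair(
    's3://dinosar/processing/uturuncu/D83-JanFeb2019/int-20190124-20190112'
)
print(master, slave)
```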

Now we should have a processing directory for each interferogram in this directory.  Each processing directory should have two files:
* topsApp.xml  = input file for ISCE processing
* download-links.txt = text file with the links to download all the data we'll need for processing
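
To confirm the preparation step produced what is described above, each processing directory can be inspected before anything is uploaded. The sketch below reports whether `topsApp.xml` exists and how many non-blank lines `download-links.txt` contains; the helper `summarize_dir` is hypothetical, and the directory name in the example is illustrative.

```python
from pathlib import Path

def summarize_dir(intdir, root='.'):
    """Return (has_topsapp_xml, number_of_download_links) for one processing directory."""
    d = Path(root) / intdir
    has_xml = (d / 'topsApp.xml').is_file()
    links_file = d / 'download-links.txt'
    n_links = 0
    if links_file.is_file():
        # Count non-blank lines; each is expected to hold one download link
        n_links = sum(1 for line in links_file.read_text().splitlines() if line.strip())
    return has_xml, n_links

for intdir in ['int-20190124-20190112']:
    has_xml, n_links = summarize_dir(intdir)
    print(f'{intdir}: topsApp.xml present={has_xml}, download links={n_links}')
```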

Now we push all those directories to the S3 cloud storage on AWS:

In [None]:
# Move these to cloud storage
# Push folder of text files to S3
for i,p in enumerate(pairs):
    intname = os.path.basename(p)
    junk,master,slave=intname.split('-')
    intdir = f'int-{master}-{slave}'
    cmd = f'aws s3 sync {intdir}/ {bucket}/{intdir}/'
    print(cmd)
    subprocess.call(cmd, shell=True)

print(f'Moved files to {bucket}')
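
Since the cell above uploads immediately, a preview can be useful. The AWS CLI `s3 sync` command accepts a `--dryrun` flag that lists the operations without performing them. The sketch below builds the command as an argument list, which also avoids shell quoting issues; `sync_command` is a hypothetical helper and the directory name is illustrative.

```python
def sync_command(intdir, bucket, dryrun=True):
    """Build the `aws s3 sync` argument list for one processing directory."""
    cmd = ['aws', 's3', 'sync', f'{intdir}/', f'{bucket}/{intdir}/']
    if dryrun:
        cmd.append('--dryrun')  # list what would be copied without copying it
    return cmd

cmd = sync_command('int-20190124-20190112',
                   's3://dinosar/processing/uturuncu/D83-JanFeb2019')
print(' '.join(cmd))
# To execute (requires the AWS CLI and credentials):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing `check=True` to `subprocess.run` raises an exception if the upload fails, whereas `subprocess.call` only returns the exit code.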

## 2.4: Launch Processing on AWS (WARNING: can consume lots of AWS resources!!)
Now that we have all the files we need for processing on the AWS servers, we can start processing!  Don't run these cells until you're *SURE* you've got the interferograms you want!

In [None]:
# Enter your NASA URS password to download SLCs
nasauser = 'pmacqueen' # NASA EarthData username
nasapass = getpass.getpass() # NASA EarthData password (will create an interactive textbox)

In [None]:
# don't change these:
demDir = 's3://dinosar/processing/uturuncu/dem' #where the DEM is stored on AWS
jobdef = 'uturuncu-array' # sets certain parameters on AWS
jobqueue = 'uturuncu-queue' # sets certain parameters on AWS
array_size = len(pairs)


# NOTE: job-name, job-queue, and job-definition are JSON files that I've created for AWS Batch
# They specify the type of computers to use, etc.
cmd = f"aws batch submit-job \
--job-name {jobname} \
--job-queue {jobqueue} \
--job-definition {jobdef} \
--array-properties size={array_size} \
--parameters 'S3_PAIRS={bucket}/{pairsFile},S3_DEM={demDir}' \
--container-overrides 'environment=[{{name=NASAUSER,value={nasauser}}},{{name=NASAPASS,value={nasapass}}}]' \
"

# warning: this prints your password as plain text, careful not to push to github
# If you run the command in terminal sometimes the error messages are more helpful
#print(cmd) # uncomment to print the command for debugging purposes.
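
If the command needs to be printed for debugging, one way to reduce the risk of leaking the password is to mask it first. This is a minimal sketch; `redact` is a hypothetical helper, and the command and password in the example are made up.

```python
def redact(command, secret, mask='********'):
    """Return `command` with every occurrence of `secret` replaced by `mask`."""
    if not secret:
        return command  # nothing to hide; avoids replacing the empty string
    return command.replace(secret, mask)

# Made-up example; in the notebook this would be redact(cmd, nasapass)
example_cmd = ("aws batch submit-job --container-overrides "
               "'environment=[{name=NASAPASS,value=not-a-real-password}]'")
print(redact(example_cmd, 'not-a-real-password'))
```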

In [30]:
# Uncomment the line below and run this cell to start processing!
#subprocess.check_output(cmd, shell=True) 
# prints out the job-id - make a note of the job id for finding the logs later!
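
Rather than copying the job id by hand, it can be pulled out of the command output. The sketch below assumes the AWS CLI is using its default JSON output format, in which the `submit-job` response contains a `jobId` field; `job_id_from_output` is a hypothetical helper and the sample response is made up.

```python
import json

def job_id_from_output(output):
    """Extract the job id from the JSON text returned by `aws batch submit-job`."""
    if isinstance(output, bytes):  # subprocess.check_output returns bytes by default
        output = output.decode()
    return json.loads(output)['jobId']

# Made-up response for illustration; in the notebook this would be the
# return value of subprocess.check_output(cmd, shell=True)
sample = b'{"jobName": "uturuncu-D83-JanFeb2019", "jobId": "00000000-0000-0000-0000-000000000000"}'
job_id = job_id_from_output(sample)
print(job_id)
```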

## 2.5: Wait for the jobs to finish!

* The end products will be in "s3://dinosar/results/uturuncu/(dirname)/(int_dir)/merged"
* You can monitor jobs here: https://us-west-2.console.aws.amazon.com/batch/home?region=us-west-2#/jobs/queue/arn:aws:batch:us-west-2:783380859522:job-queue~2Futuruncu-queue?state=PENDING
* After 24 hours, you'll have to look up your job here using the job id: https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logStream:group=/aws/batch/job;streamFilter=typeLogStreamPrefix
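
Once jobs finish, the results location for each interferogram follows the pattern given in the first bullet above. The sketch below builds those paths and prints an `aws s3 ls` command for each; `results_path` is a hypothetical helper and the interferogram name is illustrative.

```python
def results_path(dirname, intdir, root='s3://dinosar/results/uturuncu'):
    """Return the S3 folder that should hold the merged products for one interferogram."""
    return f'{root}/{dirname}/{intdir}/merged'

for intdir in ['int-20190124-20190112']:
    # Printed only; run the command in a terminal to list the products
    print('aws s3 ls ' + results_path('D83-JanFeb2019', intdir) + '/')
```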