# Data Processing System (DPS) Tutorial A to Z

Authors: Sujen Shah and Rob Tapella

Date: October, 2023

Description: This DPS tutorial demonstrates the steps needed to create, register, run, and monitor algorithm jobs at scale, and to view their outputs. It includes a template repository with the files needed to set up and run a job.

## Importing and Installing Packages

Additional packages are installed inline as they are needed. Configuring a custom conda environment for use in DPS is covered later in this tutorial.

## Before Starting

- This tutorial assumes that you have at least worked through the [Getting Started Guide](../../getting_started/getting_started.ipynb) and have set up your MAAP account.
- This tutorial is written for the Application Development Environment (ADE) v. 3.1.0 or later (August 2023 or later).
- It also assumes that you are familiar with using [GitHub with MAAP](../../system_reference_guide/work_with_git.ipynb).

## Overview of this Tutorial

- Clone the demo algorithm & make a personal copy
- Edit and test your Algorithm code to make sure that it works in its original form
- Prepare the Algorithm for DPS by setting up the runtime arguments and pre-run environment set-up
- Register the Algorithm with the Algorithm UI
- Run and Monitor the Algorithm using the Jobs UI
- View the outputs and errors from your run

## Clone the Demo Algorithm

1. For this tutorial, please use a Basic Stable workspace. 
2. Download a copy of the GitHub repository at https://github.com/MAAP-Project/dps_tutorial
3. Create a new repository in your workspace, copy the dps-tutorial-demo files into it, and push it to GitHub. This way you can modify the tutorial files while you practice and register your own version of the demo algorithm.

It does not matter what you call the copy of the tutorial files, but it does need to be a public repository in order to register the algorithm with DPS.

Anatomy of the dps_tutorial repo:

- `README.md`: describes the algorithm on GitHub
- `tutorial.py`: a Python script that contains the logic of the algorithm
- `build.sh`: a shell script that is executed before the algorithm is run; it sets up any custom programming libraries the algorithm uses (e.g., a custom conda environment)
- `environment.yaml`: a configuration file that conda uses to install any custom libraries; it is used by `build.sh`
- `run.sh`: a shell script that DPS executes when a run is requested; it calls the relevant Python files with the required inputs
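Before registering, it can help to confirm that your copy of the repository still contains the files listed above. The sketch below is a hypothetical helper, not part of the dps_tutorial repo; it only checks that the five files named in the list exist in a given directory.

```python
from pathlib import Path

# The files described in the list above; adjust if your copy adds or renames files.
EXPECTED_FILES = ["README.md", "tutorial.py", "build.sh", "environment.yaml", "run.sh"]


def missing_repo_files(repo_dir):
    """Return the expected files that are missing from repo_dir, in list order."""
    repo = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (repo / name).is_file()]
```

For example, calling `missing_repo_files(".")` from the repository root should return an empty list when nothing is missing.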


## Edit and Test your Code

Once you have an algorithm, such as the dps-tutorial-demo or your own Jupyter Notebook, test it to make sure that it runs properly by following the instructions in the README.md file.

A Jupyter Notebook is typically run interactively. A DPS algorithm, by contrast, takes all of its inputs up front, does the processing, and produces output files. The dps-tutorial-demo is set up like a DPS algorithm. Some aspects to note:

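- Command-line arguments: inputs are passed to the script as arguments (parsed with Python's `argparse`) instead of being edited in notebook cells
- Log file: the script records its progress in a log file, which helps you diagnose a run after it finishes
- Example input URL and example command for running the script from the command line
- Expected outputs: the output files that a successful run should produce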
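The pattern described above (inputs parsed up front, a log file, output files written to a known folder) can be sketched as a minimal Python script. This is a hypothetical skeleton, not the contents of `tutorial.py`: the argument names `input_file` and `--output-dir`, the log file name `run.log`, the output file `line_count.txt`, and the line-counting "processing" step are placeholders chosen for illustration.

```python
import argparse
import logging
from pathlib import Path


def parse_args(argv=None):
    # Hypothetical arguments: one input file and an output directory.
    parser = argparse.ArgumentParser(description="DPS-style batch script (sketch)")
    parser.add_argument("input_file", help="path to the input file")
    parser.add_argument("--output-dir", default="output",
                        help="directory for output files (default: output)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Log to a file in the output directory so the log is kept with the results.
    logger = logging.getLogger("dps_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(str(out_dir / "run.log"))
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info("Reading %s", args.input_file)
        text = Path(args.input_file).read_text()
        # Placeholder processing step: count the lines in the input file.
        n_lines = len(text.splitlines())
        result_path = out_dir / "line_count.txt"
        result_path.write_text(f"{n_lines}\n")
        logger.info("Wrote %s", result_path)
        return 0
    finally:
        # Detach and close the handler so repeated calls do not duplicate log lines.
        logger.removeHandler(handler)
        handler.close()
```

To run it from the command line, add `if __name__ == "__main__": raise SystemExit(main())` at the end of the file and call it with, for example, `python sketch.py input/example.txt --output-dir output` (file names hypothetical). Everything the job needs arrives as arguments, so the same command works locally and in a batch run.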

## Prepare the Algorithm for DPS

Once your scripts are working locally, make sure that they will also work in DPS.

The dps-tutorial-demo files are already prepared for DPS. Some important things to notice:

File: `run.sh`
- `run.sh` is a wrapper shell script that calls the Python script. For DPS, make sure that it reads inputs from the `input/` folder and writes outputs to the `output/` folder.

File: `build.sh`
- reads the conda environment definition from `environment.yaml`
- make sure it activates the environment with `source activate base` (or the name of your own environment)

Consider how DPS handles input and output files, and adjust your scripts accordingly:
- While a job runs, it works in temporary space; its output is saved elsewhere afterward, so write everything you want to keep to the `output/` folder.
- Decide how your input files reach the job, including input data that is stored in the cloud.
- Prefer paths relative to the job's base directory over absolute paths, and mimic the DPS base directory (basedir) layout when you test locally.
- The `run.sh` wrapper script needs to pass input files to your Python script the way it requires them (e.g., a single file at a time vs. multiple files at once).
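One way to follow these guidelines in the Python script itself is to resolve the `input/` and `output/` folders relative to a base directory rather than hard-coding absolute paths, and to group input files the way the script expects them. This is a minimal sketch under the assumption, taken from the notes above, that inputs live in `input/` and outputs go to `output/` under the directory the job runs in; the helper names `job_dirs`, `list_inputs`, and `input_batches` are hypothetical.

```python
from pathlib import Path


def job_dirs(basedir=None):
    """Return (input_dir, output_dir) under basedir (default: current directory).

    Assumption: inputs are in `input/` and outputs go to `output/`, relative to
    the directory the job runs in. The output folder is created if missing.
    """
    base = Path(basedir) if basedir is not None else Path.cwd()
    input_dir = base / "input"
    output_dir = base / "output"
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir


def list_inputs(input_dir, pattern="*"):
    """Return the input files matching pattern, sorted so runs are reproducible."""
    return sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())


def input_batches(files, one_at_a_time=True):
    """Group input files the way the processing script expects them.

    one_at_a_time=True gives [[f1], [f2], ...]; False gives [[f1, f2, ...]].
    """
    files = list(files)
    if one_at_a_time:
        return [[f] for f in files]
    return [files] if files else []
```

Keeping the base directory as a parameter lets you point the same code at a local test folder that mimics the DPS layout, while the default (the current directory) matches a job that starts in its own working directory.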

Run your scripts as if DPS were executing them:
- deactivate your conda environment
- run `build.sh`
- run `run.sh`
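The last two steps can be scripted so that they run in the same order every time. This is a hypothetical helper, not a DPS tool: it assumes `build.sh` and `run.sh` sit in the repository root and are run with `bash`, and it does not deactivate conda for you or reproduce the DPS environment exactly.

```python
import subprocess
from pathlib import Path


def run_like_dps(repo_dir, run_args=(), workdir=None):
    """Run build.sh, then run.sh, stopping at the first script that fails.

    Assumptions (hypothetical helper): both scripts live in repo_dir and are
    run with bash; run_args are passed to run.sh unchanged; both scripts run
    with workdir (default: repo_dir) as the current directory. This does not
    deactivate conda; do that in your shell before calling it.
    """
    repo = Path(repo_dir).resolve()
    cwd = Path(workdir).resolve() if workdir is not None else repo
    steps = [("build.sh", []), ("run.sh", list(run_args))]
    for script, args in steps:
        print(f"Running {script} in {cwd}")
        # check=True raises subprocess.CalledProcessError on a non-zero exit code.
        subprocess.run(["bash", str(repo / script), *args], cwd=cwd, check=True)
```

Passing a separate `workdir` lets you run the scripts from a scratch folder containing `input/` and `output/`, closer to how a job starts, instead of from inside the repository.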

## Register the Algorithm with DPS using the Algorithm UI

1. Commit and push to GitHub any changes that you made to the Algorithm while testing. The registration process pulls the code from GitHub.
2. Open the Launcher and select Register Algorithm
3. Fill in the fields as described here:
- a , b, c
4. Press Register. A link to view the progress of the build will appear. Copy the link and paste it into a new browser tab rather than following it directly; otherwise it will be difficult to return to the form to edit it and fix any errors.


## Running and Monitoring the Algorithm with the Jobs UI

1. Open the Launcher and select View & Submit Jobs
2. Choose the Submit tab
3. Run the job as described here:
- a, b, c
4. Submit the job and go back to the View tab
5. Observe the progress of your job while it runs, and its status (complete or failed) when it finishes

### Running and Monitoring using the HySDS Jobs UI (Figaro)

This will be described in a future update. HySDS is the data-processing system that runs the jobs. It includes a full web application that NASA missions use to monitor jobs and data outputs. If you would like to beta-test this UI with MAAP, please contact Sujen or George.

## Registering and Running the Algorithm using maap.py

This will be described in a future update. Larger batch jobs are often run from Jupyter Notebooks rather than from the GUI.

## Getting the Outputs of the Job

- output folder
- stderr & stdout examples
- logfiles
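Once you have a copy of a job's output folder, a quick way to check what happened is to look at the end of its stdout, stderr, and log files. This sketch is hypothetical: it assumes those files can be recognized by names that contain `stdout` or `stderr`, or end in `.log`, so check the actual file names in your job's output and adjust the filter.

```python
from pathlib import Path


def summarize_job_output(output_dir, tail_lines=20):
    """Return {relative_path: last lines} for log-like files under output_dir.

    Assumption: stdout, stderr, and log files can be recognized by names that
    contain "stdout" or "stderr" or end in ".log"; adjust to your job's files.
    """
    root = Path(output_dir)
    summary = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        if "stdout" in name or "stderr" in name or name.endswith(".log"):
            lines = path.read_text(errors="replace").splitlines()
            # Keep only the last tail_lines lines (none if tail_lines <= 0).
            summary[str(path.relative_to(root))] = lines[-tail_lines:] if tail_lines > 0 else []
    return summary
```

The end of stderr usually holds the error that stopped a failed run, which is why the sketch keeps only the last lines of each file.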