# CDMS Large Job Demo

This notebook demonstrates how the new large job functionality in CDMS works: submitting a matchup request, polling the resulting job until it finishes, and cancelling a job.

In [1]:
import requests
import json
import time
from urllib.parse import urljoin
from datetime import datetime, timedelta


CDMS_HOST = 'https://doms.jpl.nasa.gov'

## Submit a matchup request

In [2]:
match_params = {
    'primary': 'JPL-L4-MRVA-CHLA-GLOB-v3.0',
    'secondary': 'shark-2018',
    'startTime': '2018-04-01T00:00:00Z',
    'endTime': '2018-04-01T23:59:59Z',
    'tt': 86400,  # Time tolerance in seconds
    'rt': 50000,  # Spatial tolerance in meters
    'b': '-140,10,-110,40',
    'platforms': '3B',
    'parameter': 'mass_concentration_of_chlorophyll_in_sea_water',
    'depthMin': -5,
    'depthMax': 5,
    'matchOnce': 'true',
    'resultSizeLimit': 100
}

matchup_endpoint = 'match_spark'
response = requests.get(
    urljoin(CDMS_HOST, matchup_endpoint), 
    params=match_params
)

In [3]:
response.raise_for_status()
response_json = response.json()
print(f'Match request submission endpoint took {response.elapsed.total_seconds()} seconds.')

Match request submission endpoint took 0.841938 seconds.


In [4]:
print(json.dumps(response_json, indent=2))

{
  "status": "running",
  "message": "",
  "createdAt": "2023-12-11 18:18:53.582000",
  "updatedAt": null,
  "links": [
    {
      "href": "https://doms.jpl.nasa.gov/job?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc",
      "title": "Get job status - the current page",
      "type": "application/json",
      "rel": "self"
    },
    {
      "href": "https://doms.jpl.nasa.gov/job/cancel?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc",
      "title": "Cancel the job",
      "rel": "cancel"
    }
  ],
  "params": {
    "primary": "JPL-L4-MRVA-CHLA-GLOB-v3.0",
    "matchup": "shark-2018",
    "startTime": "2018-04-01 00:00:00+00:00",
    "endTime": "2018-04-01 23:59:59+00:00",
    "bbox": "-140,10,-110,40",
    "timeTolerance": 86400,
    "radiusTolerance": 50000.0,
    "platforms": "3B",
    "parameter": "mass_concentration_of_chlorophyll_in_sea_water",
    "depthMin": -5.0,
    "depthMax": 5.0
  },
  "executionID": "dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc"
}


The `match_spark` endpoint redirects immediately to the `/job?id=X` endpoint once the job has been submitted to the backend, so the response shown above is the job status rather than the matchup results.

The output above has a few fields of note:

- `executionID`: This ID is used for the rest of the queries we'll go over.
- `status`: A matchup job can have a status of `running`, `cancelled`, `failed`, or `success`.
- `links`: These [HATEOAS](https://en.wikipedia.org/wiki/HATEOAS) links allow the user to easily find relevant endpoints for this job.

The links available for a running job are:

- `self`: The current page. This is the `/job?id=X` endpoint.
- `cancel`: The endpoint to cancel the current job.
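Because each link carries a `rel` value, a client can look endpoints up by relation instead of hard-coding URLs. The following is a minimal sketch of that idea; the helper name `links_by_rel` is hypothetical (it is not part of CDMS), and the sample dictionary is trimmed from the response shown above.

```python
from collections import defaultdict


def links_by_rel(job):
    """Group the entries of a job response's `links` list by their `rel` value."""
    grouped = defaultdict(list)
    for link in job.get('links', []):
        grouped[link['rel']].append(link)
    return dict(grouped)


# Sample trimmed from the running-job response shown above.
running_job = {
    'status': 'running',
    'links': [
        {
            'href': 'https://doms.jpl.nasa.gov/job?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc',
            'rel': 'self',
        },
        {
            'href': 'https://doms.jpl.nasa.gov/job/cancel?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc',
            'rel': 'cancel',
        },
    ],
}

links = links_by_rel(running_job)
print(links['self'][0]['href'])    # URL to poll for job status
print(links['cancel'][0]['href'])  # URL to cancel the job
```

Grouping into lists rather than single values is deliberate: a relation may appear more than once in a response, so each `rel` maps to a list of links.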

The user should poll the job endpoint `/job?id=X` until the job finishes. Let's check whether the job is complete.

## Check matchup job status

Checking the job status needs to be done in a loop that re-queries the `/job?id=X` endpoint until the job is finished. Because this job doesn't take very long, we query the endpoint every 2 seconds.

In [5]:
job_id = response_json['executionID']
job_endpoint = 'job'

while True:
    # Re-query the job endpoint for the current status
    response = requests.get(
        urljoin(CDMS_HOST, job_endpoint),
        params={
            'id': job_id
        }
    )
    response.raise_for_status()
    response_json = response.json()
    # Stop as soon as the job leaves the 'running' state
    if response_json['status'] != 'running':
        break
    print('Job is still running...')
    time.sleep(2)
print(json.dumps(response_json, indent=2))

{
  "status": "success",
  "message": "",
  "createdAt": "2023-12-11 18:18:53.582000",
  "updatedAt": "2023-12-11 18:19:16.704000",
  "links": [
    {
      "href": "https://doms.jpl.nasa.gov/job?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc",
      "title": "Get job status - the current page",
      "type": "application/json",
      "rel": "self"
    },
    {
      "href": "https://doms.jpl.nasa.gov/cdmsresults?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc&output=CSV",
      "title": "Download CSV results",
      "type": "text/csv",
      "rel": "data"
    },
    {
      "href": "https://doms.jpl.nasa.gov/cdmsresults?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc&output=JSON",
      "title": "Download JSON results",
      "type": "application/json",
      "rel": "data"
    },
    {
      "href": "https://doms.jpl.nasa.gov/cdmsresults?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc&output=NETCDF",
      "title": "Download NETCDF results",
      "type": "binary/octet-stream",
      "rel": "data"
    }
  ],
  "par

The output above shows that the job is complete: its `status` is `success`.
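The polling loop above runs for as long as the job reports `running`. For longer jobs it may be useful to give up after a deadline. The sketch below is one way to do that; `wait_for_job` and its parameters are hypothetical names, and the status fetcher is passed in as a callable so the example can run here against canned responses instead of the live service.

```python
import time


def wait_for_job(fetch_status, interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job is no longer 'running'.

    Raises TimeoutError if the job is still running once `timeout`
    seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job['status'] != 'running':
            return job
        if clock() >= deadline:
            raise TimeoutError(f'Job still running after {timeout} seconds')
        sleep(interval)


# Canned responses standing in for successive calls to the job endpoint.
canned = iter([
    {'status': 'running'},
    {'status': 'running'},
    {'status': 'success'},
])
result = wait_for_job(lambda: next(canned), interval=0.0, timeout=5.0)
print(result['status'])
```

Against the real service, `fetch_status` could wrap the same `requests.get(urljoin(CDMS_HOST, job_endpoint), params={'id': job_id})` call used in the cell above and return its parsed JSON.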

Now that the job is complete, the `links` output contains new entries with `rel` set to `data`:

- `CSV` data: Endpoint to download the CSV results from the matchup request that was just run.
- `NETCDF` data: Endpoint to download the NetCDF results from the matchup request that was just run.
- `JSON` data: Endpoint to retrieve the JSON results from the matchup request that was just run.
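All three result links share `rel: data` and differ in the `output` query parameter of their `href`. A minimal sketch of picking a link by format follows; `data_links_by_format` is a hypothetical helper, and the sample is trimmed from the completed-job response shown above.

```python
from urllib.parse import urlparse, parse_qs


def data_links_by_format(job):
    """Map each `output` format (e.g. 'CSV') to the href of its data link."""
    formats = {}
    for link in job.get('links', []):
        if link.get('rel') != 'data':
            continue
        # The format is carried in the `output` query parameter of the href.
        query = parse_qs(urlparse(link['href']).query)
        output = query.get('output', [None])[0]
        if output is not None:
            formats[output.upper()] = link['href']
    return formats


# Sample trimmed from the completed-job response shown above.
base = 'https://doms.jpl.nasa.gov/cdmsresults?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc'
completed_job = {
    'status': 'success',
    'links': [
        {'href': 'https://doms.jpl.nasa.gov/job?id=dcfbd7cf-c5b9-4d40-9cbd-d169961fb8cc', 'rel': 'self'},
        {'href': base + '&output=CSV', 'type': 'text/csv', 'rel': 'data'},
        {'href': base + '&output=JSON', 'type': 'application/json', 'rel': 'data'},
        {'href': base + '&output=NETCDF', 'type': 'binary/octet-stream', 'rel': 'data'},
    ],
}

data_links = data_links_by_format(completed_job)
print(data_links['CSV'])
```

The selected URL can then be passed to `requests.get` to download the results in that format.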

In [6]:
# total_seconds() covers the whole interval; .seconds would drop any full days
time_elapsed = int((
    datetime.fromisoformat(response_json['updatedAt']) - datetime.fromisoformat(response_json['createdAt'])
).total_seconds())
print(f'The CDMS matchup request took {time_elapsed} seconds to complete.')

The CDMS matchup request took 23 seconds to complete.
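The same calculation does not work for a job that is still running, because `updatedAt` is `null` in that case (as in the first response above). The sketch below falls back to the current time for running jobs. The helper `job_duration_seconds` is hypothetical, and it assumes the timestamps are naive UTC values, which the responses do not state explicitly.

```python
from datetime import datetime, timezone


def job_duration_seconds(job, now=None):
    """Seconds between `createdAt` and `updatedAt`.

    If `updatedAt` is missing or None (job still running), `now` is used
    instead. `now` is expected to be a naive datetime in UTC; this is an
    assumption about the time zone of the service's timestamps.
    """
    created = datetime.fromisoformat(job['createdAt'])
    if job.get('updatedAt') is not None:
        end = datetime.fromisoformat(job['updatedAt'])
    elif now is not None:
        end = now
    else:
        end = datetime.now(timezone.utc).replace(tzinfo=None)
    return (end - created).total_seconds()


# Timestamps copied from the responses shown above.
completed = {
    'createdAt': '2023-12-11 18:18:53.582000',
    'updatedAt': '2023-12-11 18:19:16.704000',
}
running = {
    'createdAt': '2023-12-11 18:18:53.582000',
    'updatedAt': None,
}

print(job_duration_seconds(completed))
print(job_duration_seconds(running, now=datetime(2023, 12, 11, 18, 19, 0)))
```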


## Cancel a matchup job

To demonstrate job cancellation, we will submit a matchup job and then immediately cancel the submitted job.

In [7]:
response = requests.get(
    urljoin(CDMS_HOST, matchup_endpoint), 
    params=match_params
)
response.raise_for_status()
response_json = response.json()

# Get Job ID
job_id = response_json['executionID']

# Cancel job
job_cancel_endpoint = 'job/cancel'
response = requests.get(
    urljoin(CDMS_HOST, job_cancel_endpoint),
    params={
        'id': job_id
    }
)
response.raise_for_status()
response_json = response.json()
print(json.dumps(response_json, indent=2))

{
  "status": "cancelled",
  "message": "",
  "createdAt": "2023-12-11 18:20:28.541000",
  "updatedAt": "2023-12-11 18:20:30.288000",
  "links": [
    {
      "href": "https://doms.jpl.nasa.gov/job?id=6fa7d7a5-0bbb-43be-9097-7ce7082bac56",
      "title": "Get job status - the current page",
      "type": "application/json",
      "rel": "self"
    }
  ],
  "params": {
    "primary": "JPL-L4-MRVA-CHLA-GLOB-v3.0",
    "matchup": "shark-2018",
    "startTime": "2018-04-01 00:00:00+00:00",
    "endTime": "2018-04-01 23:59:59+00:00",
    "bbox": "-140,10,-110,40",
    "timeTolerance": 86400,
    "radiusTolerance": 50000.0,
    "platforms": "3B",
    "parameter": "mass_concentration_of_chlorophyll_in_sea_water",
    "depthMin": -5.0,
    "depthMax": 5.0
  },
  "executionID": "6fa7d7a5-0bbb-43be-9097-7ce7082bac56"
}
