# Intro

DigitalGlobe's GBDX platform provides customers with a fast and easy way to search, order, and process imagery. This tutorial is intended to demonstrate how to access GBDX APIs via the Python SDK, gbdxtools. Specifically, this tutorial will cover:

1. [Import GBDXtools and authenticate credentials](#1.-Import-GBDXtools-and-authenticate-credentials)
2. [Catalog API](#2.-Catalog-API)
3. [Ordering API](#3.-Ordering-API)
4. [Workflow API](#4.-Workflow-API)
5. [Manage Workflows](#5.-Manage-Workflows)
6. [A more complex Workflow](#6.-A-more-complex-Workflow)

## 1. Import GBDXtools and authenticate credentials 
__1.1 Run the code in a cell by first selecting it and then either clicking the play button in the toolbar, or using the keyboard shortcut shift + enter. Import the 'sys' library so you can check your Python instance, and 'json' so you can print output in an easy to read format. The print statement will return your Python instance.__

In [None]:
import sys
import json
print (sys.executable)

__1.2 Fill in your GBDX username, password, client ID and client secret in the following cell. This information can be found under your Profile information at https://gbdx.geobigdata.io/profile. If you have a GBDX config file, you can instead uncomment and use the first two lines of code to authenticate into GBDX.__

In [None]:
# from gbdxtools import Interface
# gbdx = Interface()

import gbdxtools
gbdx = gbdxtools.Interface(
    username='',
    password='',
    client_id='',
    client_secret='')

## 2. Catalog API

The Catalog API is the backbone of data discovery through the APIs. It allows you to search all of the records and metadata contained in the DigitalGlobe archive by geographic area or by parameters such as acquisition date, sensor, cloud cover, etc.

__2.1 First, search the catalog by spatial area, as defined by a WKT polygon. All imagery that intersects the polygon will be returned, but we'll just look at the first few results.__

In [None]:
wkt_string = "POLYGON((151.247484595670215 -33.956915138583831, 151.247484595670215 -33.941147704639356, 151.266492160171651 -33.941147704639356, 151.266492160171651 -33.956915138583831,151.247484595670215 -33.956915138583831))"

results = gbdx.catalog.search(searchAreaWkt=wkt_string)
print(len(results))
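
The search area above is a closed WKT ring: five longitude/latitude pairs, with the first corner repeated at the end. As a minimal sketch, a bounding box can be turned into such a string with a small helper. `bbox_to_wkt` is a hypothetical name introduced here for illustration and is not part of gbdxtools; the coordinates are a rounded version of the polygon used in this section.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT POLYGON string for a lon/lat bounding box (hypothetical helper)."""
    corners = [
        (min_lon, min_lat),
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join("{} {}".format(lon, lat) for lon, lat in corners)
    return "POLYGON(({}))".format(ring)


# rounded bounding box of the search area used in this section
wkt_from_bbox = bbox_to_wkt(151.2475, -33.9569, 151.2665, -33.9411)
print(wkt_from_bbox)
```

The resulting string can be passed as `searchAreaWkt` in the same way as `wkt_string`.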

In [None]:
print(json.dumps(results[0:5], sort_keys=True, indent=4, separators=(',', ': ')))

There is a lot of interesting information here, such as image bands, image resolution, cloud cover, timestamp, etc. If you would like to learn more about the image metadata, check the documentation at   http://gbdxdocs.digitalglobe.com/docs/catalog-v2-record-metadata.

One key metadata value to note is the Catalog ID of the image, which is unique to that image. When you order imagery to GBDX, you'll need to order it via its Catalog ID. 

```json
"catalogID": "10400100290BFF00",
```
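
Since ordering works on Catalog IDs, it can be handy to pull them out of a list of search results. The sketch below assumes each record is a dictionary that carries `catalogID` either at the top level or inside a nested `properties` dictionary; check your printed results to confirm the actual shape. `extract_catalog_ids` and the sample records are hypothetical, hand-written for illustration.

```python
def extract_catalog_ids(records):
    """Collect unique catalogID values, preserving first-seen order (hypothetical helper)."""
    ids = []
    for record in records:
        # assumption: metadata lives under 'properties', else on the record itself
        properties = record.get("properties", record)
        cat_id = properties.get("catalogID")
        if cat_id and cat_id not in ids:
            ids.append(cat_id)
    return ids


# hand-written sample records, not real search output
sample_records = [
    {"properties": {"catalogID": "10400100290BFF00"}},
    {"catalogID": "10400100245B7800"},
    {"properties": {"catalogID": "10400100290BFF00"}},  # duplicate, skipped
]
sample_ids = extract_catalog_ids(sample_records)
print(sample_ids)
```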

__2.2 Add a start and end date to your search to filter the results by date range, then compare the number of filtered results to the original search results.__

In [None]:
results = gbdx.catalog.search(searchAreaWkt=wkt_string,
                              startDate="2016-09-08T00:00:00.000Z",
                              endDate="2017-03-08T23:59:59.999Z")
print(len(results))
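
The `startDate` and `endDate` values above are UTC timestamps with millisecond precision and a trailing `Z`. A minimal sketch for producing strings in that layout from Python `datetime` objects follows; `to_catalog_timestamp` is a hypothetical helper name, and it assumes the datetimes you pass are already in UTC.

```python
from datetime import datetime


def to_catalog_timestamp(moment):
    """Format a naive UTC datetime as YYYY-MM-DDTHH:MM:SS.mmmZ (hypothetical helper)."""
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + "{:03d}Z".format(millis)


start_date = to_catalog_timestamp(datetime(2016, 9, 8))
end_date = to_catalog_timestamp(datetime(2017, 3, 8, 23, 59, 59, 999000))
print(start_date)
print(end_date)
```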

__2.3 Last, filter your search results by specific metadata values. Add filters for cloud cover, off-nadir angle, and image bands, and look at the results.__ 

In [None]:
filters = [
        "cloudCover < 10",
        "offNadirAngle < 15",
        "imageBands = 'Pan_MS1_MS2'"
]

results = gbdx.catalog.search(searchAreaWkt=wkt_string,
                              startDate="2016-09-08T00:00:00.000Z",
                              endDate="2017-03-08T23:59:59.999Z",
                              filters=filters)
print(len(results))

In [None]:
print(json.dumps(results, sort_keys=True, indent=4, separators=(',', ': ')))

__2.4 Given the Catalog ID for an image, fetch its metadata.__

In [None]:
record = gbdx.catalog.get('10400100245B7800')
print(json.dumps(record, sort_keys=True, indent=4, separators=(',', ': ')))

## 3. Ordering API

Imagery will often need to first be ordered to GBDX before it can be processed. The GBDX Orders API lets you order imagery by catalog ID and check the status of your order. Learn more about Ordering at http://gbdxdocs.digitalglobe.com/docs/ordering-course-v2.

__3.1 Define the Catalog ID to be ordered.__

In [None]:
cat_ids = ['10400100245B7800']

It is important to consider whether the image is available for processing on S3. If not, it must be ordered. In the following code, we will order an image by its Catalog ID, which will return an order ID. We can then use that order ID to check the status of the order: whether it has been delivered and, if so, its S3 location.

__3.2 Order the image by its Catalog ID, return and print the order status. Because this image has already been ordered to GBDX, its 'state' is 'delivered' and the status call returns its location on S3.__

In [None]:
order_id = gbdx.ordering.order(cat_ids)
order_status = gbdx.ordering.status(order_id)

print(json.dumps(order_status, sort_keys=True, indent=4, separators=(',', ': ')))
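
To pick out only the delivered images and their S3 locations from an order status, something like the following sketch may help. It assumes the status is a list of dictionaries with `acquisition_id`, `state`, and `location` keys; compare this against your own printed output before relying on it. `delivered_locations` and the sample status (including the bucket name) are hypothetical.

```python
def delivered_locations(order_status):
    """Map each delivered acquisition ID to its S3 location (hypothetical helper)."""
    return {
        item["acquisition_id"]: item["location"]
        for item in order_status
        if item.get("state") == "delivered"
    }


# hand-written sample status, not a real API response
sample_status = [
    {"acquisition_id": "10400100245B7800", "state": "delivered",
     "location": "s3://example-bucket/10400100245B7800"},
    {"acquisition_id": "101001000350E300", "state": "submitted",
     "location": "not_delivered"},
]
ready = delivered_locations(sample_status)
print(ready)
```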

__3.3 You can order several Catalog IDs in this way, up to 100. Use the same format as above to order three Catalog IDs.__

In [None]:
cat_ids = ['101001000350E300', '104001001970EA00', '1040010011634E00']

In [None]:
order_id = gbdx.ordering.order(cat_ids)
order_status = gbdx.ordering.status(order_id)

print(json.dumps(order_status, sort_keys=True, indent=4, separators=(',', ': ')))
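
If you have more than 100 Catalog IDs, one option is to split them into batches that respect the per-order limit mentioned above. This is a minimal sketch; `chunk_catalog_ids` is a hypothetical helper, and the IDs generated below are placeholders rather than real Catalog IDs.

```python
def chunk_catalog_ids(cat_ids, max_per_order=100):
    """Split a list of Catalog IDs into batches of at most max_per_order (hypothetical helper)."""
    return [cat_ids[i:i + max_per_order]
            for i in range(0, len(cat_ids), max_per_order)]


# placeholder IDs used only to show the batching
placeholder_ids = ["ID{:04d}".format(n) for n in range(250)]
batches = chunk_catalog_ids(placeholder_ids)
print([len(batch) for batch in batches])

# each batch could then be ordered separately, for example:
# order_ids = [gbdx.ordering.order(batch) for batch in batches]
```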

## 4. Workflow API

A "workflow" is a series of tasks chained together to run on the GBDX platform. Each "task" is an individual process that performs a specific action on data, and its inputs and outputs must pass through S3. The outputs of one task are frequently the inputs to another.

### Tasks

The first step to building a Workflow is to set up the individual Tasks that make up the Workflow. Each Task needs to be defined with its registered Task name, and assigned inputs. You can find documentation on many of the Tasks you'll use under 'Certified Algorithms' at http://gbdxdocs.digitalglobe.com/docs. However, it is also easy to interact with the Task object to get more information about it. 

The first Task in almost any Workflow is the Advanced Image Preprocessor Task, which orthorectifies raw imagery and offers other image pre-processing options. Its documentation is at https://gbdxdocs.digitalglobe.com/docs/advanced-image-preprocessor.

__4.1 Define the Advanced Image Preprocessor task using its registered Task name, 'AOP_Strip_Processor', and print out the task definition.__

In [None]:
aop_task = gbdx.Task("AOP_Strip_Processor")
print(json.dumps(aop_task.definition, sort_keys=True, indent=2, separators=(',', ': ')))

The Task definition has a lot of useful information, including descriptions of inputs and outputs, and which of the inputs are required.

__4.2 Run the code in the following cell to get a list of inputs for this Task.__

In [None]:
print(aop_task.inputs)

__4.3 Append an input port name to the same call to get detailed information about that input.__

In [None]:
print(aop_task.inputs.enable_acomp)

__4.4 Run the following code cells to find out more information about the Task outputs.__

In [None]:
aop_task.outputs

In [None]:
aop_task.outputs.data

___
### Workflows
Now that you know about Tasks, and about inputs and outputs to a Task, let's walk through the steps to building a Workflow. You will get a chance to try it out at the end of this explanation.  

*S3 inputs and outputs*

> The input data for a Task must be located on S3. This could be an image that you've ordered to GBDX, data that you've staged in your S3 bucket, or the output from a previous Task in the Workflow. For this example Workflow, you're going to use an image ordered to GBDX as input to the Task.

```python
# define the S3 location of the input data
source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')
```

> Just as the input must come from an S3 location, the output of a Task must be saved to an S3 location. Gbdxtools has a 'savedata' feature that will automatically save the output to your Customer S3 bucket, under the directory that you specify. We will cover the 'savedata' feature later in the script.
 
```python
# define the S3 location of the output directory
target_s3 = 'demo_output/aop_10400100245B7800/'
```

*Defining Task(s)*

> As we mentioned before, the recommended first Task in any Workflow is the Advanced Image Preprocessor Task. Define the Advanced Image Preprocessor Task using its Task name, 'AOP_Strip_Processor', and the S3 location of the ordered image as the Task's input port. The Task author has named this input port, 'data'.

```python
# define the pre-processing Task
aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3)
```

*Assigning Task(s) to a Workflow*

> A Workflow can execute many Tasks and facilitate the movement of data between those Tasks. For this simple example, there is only one Task. 

```python
# define the Workflow and pass in aop_task
my_workflow = gbdx.Workflow([ aop_task ])
```

*Starting and saving a Workflow*

> The 'savedata' feature will automatically save the output of this Workflow to your GBDX Customer S3 bucket. Pass in the aop_task output by referencing its output port, also named 'data'. Set the location parameter to the output directory you specified earlier.

```python
# save the output of the Workflow to S3
my_workflow.savedata(aop_task.outputs.data, location=target_s3)
```

> Execute the Workflow. Once the Workflow is started, the Platform will spin up the compute resources and data to run each Task, and will run until each of the Tasks in the Workflow has completed. 

```python
# execute the Workflow
my_workflow.execute()
```

> Print and save the Workflow ID, as this will allow you to track and manage your Workflows.

```python
print(my_workflow.id)
```

__4.5 Run the code in the following cell to execute the Workflow.__

In [None]:
# define the S3 location of the input data
source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')

# define the S3 location of the output directory
target_s3 = 'demo_output/aop_10400100245B7800/'

# define the pre-processing Task
aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3)

# define the Workflow
my_workflow = gbdx.Workflow([ aop_task ])

# save the output of the Workflow to S3
my_workflow.savedata(aop_task.outputs.data, location=target_s3)

# execute the Workflow
my_workflow.execute()
print(my_workflow.id)

## 5. Manage Workflows

Once a Workflow is started, you can use the workflow object, called 'my_workflow' in this example, to track and manage the Workflow events. As long as you have the Workflow ID, you can always access this information. 

__5.1 Get the status of the Workflow. This will return the status of whichever event is currently running within the Workflow.__

In [None]:
print(my_workflow.status)

__5.2 List the events that have taken place or are currently taking place within the Workflow.__

In [None]:
print(json.dumps(my_workflow.events, sort_keys=True, indent=2, separators=(',', ': ')))

__5.3 Get the stdout and stderr for the Workflow.__

In [None]:
my_workflow.stdout

In [None]:
my_workflow.stderr

__5.4 Each Task also has a unique ID. Get the Task IDs.__

In [None]:
task_ids = my_workflow.task_ids
print(task_ids)

__5.5 Get the stdout and stderr of a specific Task within the Workflow.__

In [None]:
stdout = gbdx.workflow.get_stdout(my_workflow.id, task_ids[0])
print(stdout)

In [None]:
stderr = gbdx.workflow.get_stderr(my_workflow.id, task_ids[0])
print(stderr)

__5.6 Find out if the Workflow completed and succeeded.__

In [None]:
my_workflow.complete

In [None]:
my_workflow.succeeded

__5.7 Generate the Workflow JSON schema running behind the scenes.__

In [None]:
my_workflow.generate_workflow_description()

> You can cancel a Workflow while it's running.

```python
my_workflow.cancel()
```

__5.8 You can also get information about your Workflow later if you've saved the Workflow ID. Replace the Workflow ID in the following call with your Workflow ID.__

In [None]:
wf = gbdx.workflow.get('4578270231173558359')
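
Rather than re-running the status cells by hand, the `complete` and `succeeded` properties used in this section can be checked in a loop. The sketch below is one way to do that, assuming each access to `complete` re-checks the Workflow state. `wait_for_workflow` is a hypothetical helper, and `FakeWorkflow` is a stand-in object so the example runs without contacting GBDX.

```python
import time


def wait_for_workflow(workflow, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll workflow.complete; return workflow.succeeded, or None if max_polls is reached."""
    for _ in range(max_polls):
        if workflow.complete:
            return workflow.succeeded
        sleep(poll_seconds)
    return None


class FakeWorkflow(object):
    """Stand-in for a workflow object that reports completion on the third check."""

    def __init__(self):
        self.checks = 0
        self.succeeded = True

    @property
    def complete(self):
        self.checks += 1
        return self.checks >= 3


result = wait_for_workflow(FakeWorkflow(), poll_seconds=0)
print(result)
```

With a real Workflow you would pass `my_workflow` instead of the stand-in and keep a poll interval of at least several seconds.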

## 6. A more complex Workflow 

The above example Workflow only has one Task assigned to it. The power of the Workflow system, however, is in being able to link outputs between Tasks. In this next example, we will assign the output of the image pre-processing Task as input to the OpenSkyNet (OSN) Task. This Task will extract features from imagery using a trained neural net model. We provide a model that detects aircraft.  

In this Workflow, you'll first pre-process the input image as you did in the above example (parameterized for the OSN Task), crop the pre-processed image to a more manageable size, then run the OSN Task on the prepared image.  

```python
# define the S3 location of the input data
source_s3 = gbdx.catalog.get_data_location(catalog_id='103001003A230A00')

# define a directory in which to save the OSN Workflow output
target_s3 = 'demo_output/osn_103001003A230A00'
```

> You're going to use the Advanced Image Preprocessor Task again, but not with the default parameters as before. The OSN Task requires pansharpened, DRA'd, atmospherically compensated imagery with a specific orthorectification pixel size and interpolation type. Set the input port for this Task, 'data', to the image S3 location you defined above.

```python
# define the pre-processing Task to prepare an image for the OSN Task
aop_task = gbdx.Task("AOP_Strip_Processor", data=source_s3, enable_dra=True, enable_pansharpen=True,
    enable_acomp=True, ortho_epsg='UTM', bands='PAN+MS', ortho_pixel_size='0.5',
    ortho_interpolation_type='Bilinear')
```

> The next Task crops the AOP'd image from the previous Task. We use a cropped image because the OSN Task is compute-intensive and may time out on large images. Set the output of aop_task as the input to crop_task.

```python
# define the crop image Task to crop the pre-processed image by a bounding box 
crop_task = gbdx.Task("CropGeotiff", data=aop_task.outputs.data, output_to_root_dir=True, wkt="POLYGON((-77.49189376831055 38.97302269384043,-77.43335723876953 38.97302269384043,-77.43335723876953 38.920688310253,-77.49189376831055 38.920688310253,-77.49189376831055 38.97302269384043))")
```

> Finally, define the OSN Task that will run a trained neural net model to extract aircraft from imagery. Set the crop_task output as osn_task input. Set the model to the aircraft model we've provided, with the recommended parameters.  

```python
# define the OSN Task using an aircraft model
osn_task = gbdx.Task("openskynet-v5:0.0.2", data=crop_task.outputs.data,
    model='s3://vector-lulc-models/0ad86e8caf6d9000.zip', log_level='trace', confidence='0.85', pyramid=True,
    pyramid_window_sizes='[150, 80]', pyramid_step_sizes='[40, 20]', step_size='15',
    tags='Airliner, Fighter, Helicopter')
```

> The next step is to create a Workflow as you did before, but this time the Workflow will contain more Tasks, with connected inputs and outputs. 

```python
# define the Workflow and pass in aop_task, crop_task, and osn_task 
my_workflow2 = gbdx.Workflow([ aop_task, crop_task, osn_task ])
```

> Use the savedata feature again to save the Workflow output to your GBDX customer S3 bucket, to the OSN output directory you specified above. 

```python
# save the OSN output to S3  
my_workflow2.savedata(osn_task.outputs.results, location=target_s3)
```

```python
# execute the Workflow and print the Workflow ID   
my_workflow2.execute()
print(my_workflow2.id)
```

__6.1 Run the code in the following cell to execute the new Workflow.__

In [None]:
# define the S3 location of the input data
source_s3 = gbdx.catalog.get_data_location(catalog_id='103001003A230A00')

# define a directory in which to save the OSN Workflow output
target_s3 = 'demo_output/osn_103001003A230A00'

# define the pre-processing Task to prepare an image for the OSN Task
aop_task = gbdx.Task("AOP_Strip_Processor", data=source_s3, enable_dra=True, enable_pansharpen=True,
    enable_acomp=True, ortho_epsg='UTM', bands='PAN+MS', ortho_pixel_size='0.5',
    ortho_interpolation_type='Bilinear')

# define the crop image Task to crop the pre-processed image by a bounding box 
crop_task = gbdx.Task("CropGeotiff", data=aop_task.outputs.data, output_to_root_dir=True,
    wkt="POLYGON((-77.49189376831055 38.97302269384043,-77.43335723876953 38.97302269384043,-77.43335723876953 38.920688310253,-77.49189376831055 38.920688310253,-77.49189376831055 38.97302269384043))")

# define the OSN Task using an aircraft model
osn_task = gbdx.Task("openskynet-v5:0.0.2", data=crop_task.outputs.data, model='s3://vector-lulc-models/0ad86e8caf6d9000.zip', log_level='trace', confidence='0.85', pyramid=True, pyramid_window_sizes='[150, 80]', pyramid_step_sizes='[40, 20]', step_size='15', tags='Airliner, Fighter, Helicopter')

# define the Workflow and pass in aop_task, crop_task, and osn_task 
my_workflow2 = gbdx.Workflow([ aop_task, crop_task, osn_task ])
                              
# save the OSN output to S3                              
my_workflow2.savedata(osn_task.outputs.results, location=target_s3)

# execute the Workflow and print the Workflow ID                              
my_workflow2.execute()
print(my_workflow2.id)

__6.2 Use the tools we learned earlier to track the Workflow. You'll notice that you get an event for each Task, for each change in state.__

In [None]:
print(my_workflow2.status)

In [None]:
print(json.dumps(my_workflow2.events, sort_keys=True, indent=2, separators=(',', ': ')))

In [None]:
for event in my_workflow2.events:
    print('{} {}'.format(event['task'], event['event']))
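
Because every state change produces its own event, the event list grows quickly for a multi-Task Workflow. As a rough sketch, the list can be reduced to the most recent event per Task. This assumes the events arrive in chronological order and carry the `'task'` and `'event'` keys used in the loop above. `latest_event_per_task` is a hypothetical helper, and the sample events are hand-written, so the event names are illustrative only.

```python
def latest_event_per_task(events):
    """Return a dict of task name -> most recent event (hypothetical helper)."""
    latest = {}
    for event in events:
        # later events overwrite earlier ones for the same task
        latest[event['task']] = event['event']
    return latest


# hand-written sample events, not real Workflow output
sample_events = [
    {'task': 'AOP_Strip_Processor', 'event': 'submitted'},
    {'task': 'AOP_Strip_Processor', 'event': 'started'},
    {'task': 'CropGeotiff', 'event': 'submitted'},
    {'task': 'AOP_Strip_Processor', 'event': 'succeeded'},
]
summary = latest_event_per_task(sample_events)
print(summary)
```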

# Conclusion

Congratulations on learning how to search the Catalog, order imagery, and run Workflows. You can see the output of your Workflows by logging into the [S3 browser](http://s3browser.geobigdata.io/) with your GBDX credentials and navigating to the output directories you created under `'demo_output/'`.