## intro
In the custom-task-tutorial notebook, we demonstrated how to create a Task that takes file-based data as an input to the Task code. But what if the user needs to pass in a parameter when they use the tool? This is where you would use a 'string' port rather than a 'directory' port. This tutorial will demonstrate how to write the Task code to read input from a string port, how to register a Task with string ports, and how the string port is used when calling the Task with gbdxtools.

For this tutorial, it will be helpful to have completed the custom-task-tutorial notebook first, as this tutorial builds on it. We are going to take the Task that we wrote in that tutorial, which clips a raster to a shapefile, and add functionality to it. We're going to add a parameter to the Task that allows the user to specify which portion of the raster will be removed: either the portion that falls outside the shapefile or the portion that falls within it. For the sake of clarity, and a little brevity, we're going to call this the Doughnut Task, in which a user can select between 'doughnut' and 'doughnut-hole' for the clip selection. If 'doughnut' is specified in the call to this Task within a Workflow, then the inner portion of the raster will be removed, and the outer portion will be retained. If 'doughnut-hole' is specified, then the outer portion of the raster will be removed, and the inner portion retained.

### string ports: aop task example
In order to understand how string ports are handled in GBDX, let's review a common Task already on GBDX, the Advanced Image Preprocessor Task. This is the Task that orthorectifies the raw imagery that comes off of the satellites, and handles image preprocessing options such as atmospheric compensation and pan sharpening. 

To review, we call this Task in gbdxtools using its registered Task name, "AOP_Strip_Processor", and then set its required inputs. This Task has only one required input, "data", which is the S3 URL of the raw image to be processed.

```python
source_s3 = 's3://receiving-dgcs-tdgplatform-com/056244928010_01_003'
aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3)
```

Although this Task only has one required input, it has several optional inputs that allow the user to specify exactly how the image should be processed ([documentation here](https://gbdxdocs.digitalglobe.com/docs/advanced-image-preprocessor)). The user can specify that the image be pan sharpened, have dynamic range adjustment applied, specify the projection for orthorectification, etc. 

Here's an example that explicitly specifies additional processing steps.

```python
aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3, enable_acomp=True, enable_pansharpen=False, enable_dra=False)
```

Let's compare the types of data input here. For the input port '`data`', we're passing in file-based data: the raw image that is stored in the S3 directory we've specified. When the Task runs, the specified input data will be automatically mounted inside the Docker container, at `/mnt/work/input/data`.

But for the parameters that are passed in via string ports, a ports.json file is automatically generated and mounted inside the Docker container, at `/mnt/work/input/ports.json`. The ports.json file contains the name/value pairs for each string port. Here's an example of the type of ports.json file that is automatically generated when our above AOP task is called. 

```json
{
	"enable_acomp": true,
	"enable_pansharpen": false,
	"enable_dra": false
}
```

### string ports: doughnut task example
Now, let's take this concept and apply it to our example Doughnut Task. We've already discussed how the code needs to point to `/mnt/work/input/<port name>` for its input data. Let's look at how we can point the Task code to parameter inputs read from the ports.json file. First, here's an example of calling the Doughnut Task in gbdxtools:

```python
source_s3 = 's3://receiving-dgcs-tdgplatform-com/056244928010_01_003'
doughnut_task = gbdx.Task('doughnut_clip', data_in=source_s3, clip_selection='doughnut')
```

When this Task runs with the Workflow system, the input image will automatically be mounted at `/mnt/work/input/data_in`, and a ports.json file will be automatically generated and mounted at `/mnt/work/input/ports.json`. This is what the ports.json file will contain:

```json
{
    "clip_selection":"doughnut"
}
```

Now, let's modify the Task code to read its parameter input from the ports.json file. The first part of the Doughnut Task script will be the same as the Clip Raster Task script from the previous tutorial. 

> We need to import the raster and shapefile processing libraries, and pull the raster and shapefile from their associated input ports. 

```python
# import libraries
import fiona
import rasterio
import rasterio.mask
import json
import os
import glob

# set the input ports path
in_path = '/mnt/work/input'
shape_path = in_path + '/input_shapefile'
raster_path = in_path + '/input_raster'

# search the input shapefile port for the first shapefile that we specify in the call to this task
my_shape = glob.glob1(shape_path, '*.shp')[0]

# search the input image port for the first geotiff that we specify in the call to this task
my_raster = glob.glob1(raster_path, '*.tif')[0]
```

> Next, we open the ports.json file and load its contents 

```python
with open('/mnt/work/input/ports.json') as portsfile:
    ports_js = json.load(portsfile)
```

> Assign the value from 'clip_selection' to the `crop_select` variable

```python
crop_select = ports_js['clip_selection']
```
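
On a local machine there is no `/mnt/work/input/ports.json`, so the two steps above cannot be run as-is outside the container. The following is a minimal sketch for trying them locally, assuming a stand-in ports.json written to a temporary directory. The `read_ports` helper and the temporary path are illustrative assumptions, not part of the Task code.

```python
import json
import os
import tempfile


def read_ports(ports_path):
    """Return the name/value pairs stored in a ports.json file."""
    with open(ports_path) as portsfile:
        return json.load(portsfile)


# Stand-in for /mnt/work/input, used only for local testing (assumption).
fake_input_dir = tempfile.mkdtemp()
fake_ports_path = os.path.join(fake_input_dir, 'ports.json')

# Write the same content the tutorial shows for the Doughnut Task.
with open(fake_ports_path, 'w') as portsfile:
    json.dump({'clip_selection': 'doughnut'}, portsfile)

# Same two steps as above, pointed at the stand-in file.
ports_js = read_ports(fake_ports_path)
crop_select = ports_js['clip_selection']
print(crop_select)
```

Inside the container, the same helper would be called with `/mnt/work/input/ports.json`.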

> Then we write some simple logic that sets the default clip parameters, which are what we want if the user selects `doughnut-hole`, but then reverses the values if the user selects `doughnut`

```python
# set default clip parameters, reverse values if clip_selection is 'doughnut'
invert_method = False
crop_method = True

if crop_select == 'doughnut':
    invert_method = True
    crop_method = False
```
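
Note that the logic above treats any value other than 'doughnut' as 'doughnut-hole', so a typo in the call would silently fall through to the default. As a hedged alternative, here is a sketch of a stricter version that rejects unknown values. The `clip_params` function name is hypothetical and is not used in the Task script in this tutorial.

```python
def clip_params(clip_selection):
    """Map a clip_selection value to a (crop_method, invert_method) pair."""
    if clip_selection == 'doughnut':
        # keep the outer portion: invert the mask, do not crop
        return False, True
    if clip_selection == 'doughnut-hole':
        # keep the inner portion: crop to the shape, do not invert
        return True, False
    raise ValueError(
        "clip_selection must be 'doughnut' or 'doughnut-hole', got %r"
        % (clip_selection,))


crop_method, invert_method = clip_params('doughnut')
print(crop_method, invert_method)
```

Failing early like this is a design choice: the Task stops with a readable message instead of producing a raster the user did not ask for.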

> The following code is identical to our original Clip Task code, wherein we define the output port and its filepath and navigate to the output directory, then use the Fiona library to grab the geometry from the shapefile.

```python
# define the name of the output data port
out_path = '/mnt/work/output/data_out'

# create the output data port
if not os.path.exists(out_path):
  os.makedirs(out_path)

# change directories to the output data port
os.chdir(out_path)

# open the input shapefile and get the polygon features for clipping
with fiona.open(os.path.join(shape_path, my_shape), "r") as shapefile:
  features = [feature["geometry"] for feature in shapefile]
```

> This next code block is similar to the Clip Task code that uses the Rasterio library to clip the raster and copy its metadata, but we add the '`invert`' parameter and pass in the '`invert_method`' we defined above, which is `True` if the user specifies '`doughnut`' and `False` if they specify '`doughnut-hole`'. If `invert_method` is True, `crop` must be False.  

```python
# open the input image, clip the image with the shapefile and get the image metadata
with rasterio.open(os.path.join(raster_path, my_raster)) as src:
  out_raster, out_transform = rasterio.mask.mask(src, features, crop=crop_method, invert=invert_method)
  out_meta = src.meta.copy()
```
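
To make the effect of the inversion concrete without needing a raster, here is a toy illustration in plain Python. It is not rasterio and does not model `crop` (which additionally trims the output to the extent of the shape); the `toy_mask` function, the 3x3 grid, and the single "inside" cell are all made up for the example.

```python
def toy_mask(grid, inside, invert=False, nodata=0):
    """Blank out cells of a nested-list grid.

    inside: set of (row, col) cells covered by the shape.
    invert=False keeps the inside cells ('doughnut-hole').
    invert=True keeps the outside cells ('doughnut').
    """
    return [[value if ((r, c) in inside) != invert else nodata
             for c, value in enumerate(row)]
            for r, row in enumerate(grid)]


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
inside = {(1, 1)}  # pretend the shapefile covers only the center cell

doughnut_hole = toy_mask(grid, inside, invert=False)
doughnut = toy_mask(grid, inside, invert=True)

print(doughnut_hole)  # [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
print(doughnut)       # [[1, 2, 3], [4, 0, 6], [7, 8, 9]]
```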

> Finally, write out the metadata to the new raster, 'masked.tif', as we did before

```python
# write out the metadata to the raster
out_meta.update({"driver": "GTiff",
  "height": out_raster.shape[1],
  "width": out_raster.shape[2],
  "transform": out_transform})

# write out the output raster
with rasterio.open("masked.tif", "w", **out_meta) as dest:
  dest.write(out_raster)
```

___
Now that we understand a little more about how to use string ports, let's write and register the new Doughnut Task to the GBDX registry, following the same sequence of steps that we did in the Clip Task registration tutorial.

## 1. Inputs and outputs
We've already covered the Doughnut Task code. In these next few steps, we're just going to set up a directory, then write and save the Task code there.

#### 1.1 Run the code in the following cell to create the 'doughnut_tutorial_files/docker_projects/bin' directory.

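```python
import os
# check the full path, so the subdirectories are created even if the top-level folder already exists
bin_dir = 'doughnut_tutorial_files/docker_projects/bin'
if not os.path.exists(bin_dir):
  os.makedirs(bin_dir)
```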

#### 1.2 Run the code in the following cell to navigate to the directory you just created.

```
cd doughnut_tutorial_files/docker_projects/bin
```

/Users/elizabethgolden/Dropbox/DG/notebooks/doughnut_tutorial_files/docker_projects/bin


#### 1.3 Run the code in the following cell to write the code that we just reviewed to 'doughnut_task.py'.

```python
%%writefile doughnut_task.py

# import libraries
import fiona
import rasterio
import rasterio.mask
import json
import os
import glob

# set the input ports path
in_path = '/mnt/work/input'
shape_path = in_path + '/input_shapefile'
raster_path = in_path + '/input_raster'

# search the input shapefile port for the first shapefile that we specify in the call to this task
my_shape = glob.glob1(shape_path, '*.shp')[0]

# search the input image port for the first geotiff that we specify in the call to this task
my_raster = glob.glob1(raster_path, '*.tif')[0]

# open and load the contents of ports.json
with open('/mnt/work/input/ports.json') as portsfile:
    ports_js = json.load(portsfile)

# assign the value from the 'clip_selection' string port
crop_select = ports_js['clip_selection']

# set default clip parameters, reverse the values if clip_selection is 'doughnut'
invert_method = False
crop_method = True

if crop_select == 'doughnut':
    invert_method = True
    crop_method = False

# define the name of the output data port
out_path = '/mnt/work/output/data_out'

# create the output data port
if not os.path.exists(out_path):
  os.makedirs(out_path)

# change directories to the output data port
os.chdir(out_path)

# open the input shapefile and get the polygon features for clipping
with fiona.open(os.path.join(shape_path, my_shape), "r") as shapefile:
  features = [feature["geometry"] for feature in shapefile]

# open the input image, clip the image with the shapefile and get the image metadata
with rasterio.open(os.path.join(raster_path, my_raster)) as src:
  out_raster, out_transform = rasterio.mask.mask(src, features, crop=crop_method, invert=invert_method)
  out_meta = src.meta.copy()

# write out the metadata to the raster
out_meta.update({"driver": "GTiff",
  "height": out_raster.shape[1],
  "width": out_raster.shape[2],
  "transform": out_transform})

# write out the output raster
with rasterio.open("masked.tif", "w", **out_meta) as dest:
  dest.write(out_raster)
```

Overwriting doughnut_task.py


___
## 2. Dockerfile
We're going to write a Dockerfile the same as before, only this time we're wrapping up the new Doughnut Task script.

#### 2.1 Run the code in the following cell to navigate back one folder to /docker_projects.

```
cd ..
```

/Users/elizabethgolden/Dropbox/DG/notebooks/doughnut_tutorial_files/docker_projects


#### 2.2 Run the code in the following cell to write the '`Dockerfile`'. Note that we're placing the new '`doughnut_task.py`' script here.

```
%%writefile Dockerfile
FROM continuumio/miniconda

RUN conda install rasterio
RUN conda install fiona

RUN mkdir /my_scripts
ADD ./bin /my_scripts
CMD python /my_scripts/doughnut_task.py
```

Overwriting Dockerfile


___
## 3. Build your Docker
Next, build the new Docker image as we did for the Clip Raster Task, but for the Doughnut Task. 

NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO DOCKER

#### 3.1 Bring up a terminal window on your computer and start Docker.

#### 3.2 Within the terminal window, navigate to the folder containing the Dockerfile, under 'doughnut_tutorial_files/docker_projects'.

```
cd <full/path/to>/doughnut_tutorial_files/docker_projects
```

#### 3.3 Copy and paste the following Docker command to build a Docker image from your Dockerfile, but __FIRST REPLACE 'gbdxtrainer' WITH YOUR DOCKER USERNAME__.

```
docker build -t gbdxtrainer/doughnut_docker .
```
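
Before pushing, it can be useful to try the image on your own machine. The sketch below is one possible way to prepare for that, under the assumption that mounting a local folder at `/mnt/work` is a reasonable stand-in for what the Platform does. It builds a local folder with the same input port layout the Task code expects, writes a ports.json, and prints a `docker run` command. The `make_local_workdir` helper is hypothetical, and you would still need to copy a GeoTIFF and a shapefile into the two input folders yourself.

```python
import json
import os
import tempfile


def make_local_workdir(clip_selection):
    """Create a folder that mimics /mnt/work/input for the Doughnut Task."""
    work_dir = tempfile.mkdtemp(prefix='doughnut_work_')
    # one folder per directory port read by the Task code
    for port in ('input_raster', 'input_shapefile'):
        os.makedirs(os.path.join(work_dir, 'input', port))
    # the string port value goes into ports.json
    ports_path = os.path.join(work_dir, 'input', 'ports.json')
    with open(ports_path, 'w') as portsfile:
        json.dump({'clip_selection': clip_selection}, portsfile)
    return work_dir


work_dir = make_local_workdir('doughnut')

# Replace 'gbdxtrainer' with your Docker username, as in the build command.
command = ['docker', 'run', '--rm',
           '-v', work_dir + ':/mnt/work',
           'gbdxtrainer/doughnut_docker']
print(' '.join(command))
```

The printed command is meant to be pasted into the terminal once the input folders contain data; the sketch itself does not start Docker.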

## 4. Push your Docker to Docker Hub

#### 4.1 While still within the terminal window, log in to Docker Hub using the following Docker command USING YOUR DOCKER HUB LOGIN CREDENTIALS. 
```
docker login --username gbdxtrainer --password a_fake_password
```

#### 4.2 Once logged in, use the following Docker command to push your Docker image to Docker Hub, CHANGING 'gbdxtrainer' TO YOUR DOCKER USERNAME. Note: this may take a few minutes.
```
docker push gbdxtrainer/doughnut_docker
```
___

## 5. Add GBDX collaborators to your Docker Hub repository
Your Docker repository on Docker Hub can be public or private, but certain GBDX collaborators must be added to the repository in order for the Platform to pull and run the Docker. 

#### 5.1 Log in to Docker Hub https://hub.docker.com/

You should now see the Docker image that you just pushed to Docker Hub, in its own repository of the same name.

#### 5.2 Open the repository and select the 'Collaborators' tab. Under 'Username', enter each of the following as Collaborators to your repository. This is what will allow GBDX to pull and execute your Task.  
```
tdgpbuild
tdgpdeploy
tdgplatform
```
___

## 6. Task definition
NOTE: WE'RE BACK TO THE JUPYTER NOTEBOOK FOR THE REST OF THE TUTORIAL

We are going to write a Task definition schema as we did before, only this time adding the new string input port, as shown here. Note that the 'type' for this input port is 'string', not 'directory'.

```json
{
    "required": true,
    "description": "The part of raster to retain when clipped, options are 'doughnut' or 'doughnut-hole'.",
    "name": "clip_selection",
    "type": "string"
}
```

#### 6.1 Run the code in the following cell to navigate back one directory (back out of the '`/docker_projects`' directory to the '`/doughnut_tutorial_files`' directory).

```
cd ..
```

/Users/elizabethgolden/Dropbox/DG/notebooks/doughnut_tutorial_files


#### 6.2 MODIFY THE DOCKER IMAGE AND TASK NAME WITH YOURS, then run the code in the following cell to write the full JSON document that we just reviewed to doughnut-task-definition.json.

```
%%writefile doughnut-task-definition.json
{
    "inputPortDescriptors": [{
        "required": true,
        "description": "Directory containing a raster.",
        "name": "input_raster",
        "type": "directory"
    }, {
        "required": true,
        "description": "Directory containing a shapefile",
        "name": "input_shapefile",
        "type": "directory"
    }, {
        "required": true,
        "description": "Which part of raster to retain when clipped. Options are 'doughnut' and 'doughnut-hole'.",
        "name": "clip_selection",
        "type": "string"
    }],
    "outputPortDescriptors": [{
        "required": true,
        "description": "A cropped tif.",
        "name": "data_out",
        "type": "directory"
    }],
    "containerDescriptors": [{
        "type": "DOCKER",
        "command": "",
        "properties": {
            "image": "gbdxtrainer/doughnut_docker:latest"
        }
    }],
    "description": "Clips a raster to shapefile, clip selection can be inverted.",
    "name": "doughnut_clip_gt",
    "version": "1.0.3",
    "properties": {
        "isPublic": false,
        "timeout": 36000
    }
}
```

Overwriting doughnut-task-definition.json
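
As a quick sanity check before registering, it may help to confirm which input ports are 'directory' ports and which are 'string' ports. The sketch below does this on a trimmed, inline copy of the input ports from the definition above, so it runs on its own. The `ports_by_type` helper is a hypothetical name, not part of gbdxtools.

```python
import json

# Trimmed copy of the input ports from the Task definition above.
definition = json.loads("""
{
    "inputPortDescriptors": [
        {"name": "input_raster", "type": "directory", "required": true},
        {"name": "input_shapefile", "type": "directory", "required": true},
        {"name": "clip_selection", "type": "string", "required": true}
    ]
}
""")


def ports_by_type(task_definition):
    """Group input port names by their 'type' field."""
    grouped = {}
    for port in task_definition['inputPortDescriptors']:
        grouped.setdefault(port['type'], []).append(port['name'])
    return grouped


grouped = ports_by_type(definition)
print(grouped['directory'])  # ['input_raster', 'input_shapefile']
print(grouped['string'])     # ['clip_selection']
```

In the notebook, the same function could be applied to the file you just wrote by loading it with `json.load(open('doughnut-task-definition.json'))`.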


___
## 7. Register Task
All of the pieces are in place to register the new Doughnut Task to the Platform using gbdxtools.   

#### 7.1 Fill in your GBDX username, password, client ID and client secret in the following cell. This information can be found under your Profile information at https://gbdx.geobigdata.io/profile. If you have a GBDX config file, you can uncomment and use the first two lines of code to authenticate into GBDX.

```python
# from gbdxtools import Interface
# gbdx = Interface()

import gbdxtools
gbdx = gbdxtools.Interface(
    username='',
    password='',
    client_id='',
    client_secret='')
```

#### 7.2 Run the code in the following cell to submit your Task to the Task registry.

```python
gbdx.task_registry.register(json_filename = 'doughnut-task-definition.json')
```

u'doughnut_clip_gt:1.0.3 has been submitted for registration.'

#### 7.3 Wait a few minutes, then see if the Task registration has completed by running the code in the following cell to create an instance of your Task. FIRST REPLACE 'gt' IN THE TASK NAME WITH YOUR INITIALS.

```python
doughnut_task = gbdx.Task("doughnut_clip_gt")
```

___
## 8. Workflow
The last step is to test the Doughnut Task in a Workflow with gbdxtools. This Workflow is identical to the Clip Task Workflow from before, except that it uses the Doughnut Task's registered name and the clip selection parameter.

#### 8.1 Run the code in the following cell to execute a Workflow using the new Doughnut Task. FIRST REPLACE 'gt' IN THE TASK NAME WITH YOUR INITIALS.

```python
# define the S3 path for an image by passing in its Catalog ID
source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')

# define an input shapefile from S3
shape_path = 's3://tutorial-files/this_shp_will_clip_10400100245B7800/'

# define the 'AOP_Strip_Processor' Task
aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3, enable_pansharpen=True)

# define the 'gdal-cli' Task
glue_task = gbdx.Task('gdal-cli', data=aop_task.outputs.data.value, execution_strategy='runonce',
                         command="""mv $indir/*/*.tif $outdir/""")

# define the Doughnut Task, passing the string port value as a keyword argument
doughnut_task = gbdx.Task("doughnut_clip_gt", input_raster=glue_task.outputs.data.value, input_shapefile=shape_path, clip_selection="doughnut")

# build a Workflow to run the Doughnut Task
workflow = gbdx.Workflow([aop_task, glue_task, doughnut_task])

# specify where to save the output within your customer bucket
workflow.savedata(doughnut_task.outputs.data_out, location='task_demo/doughnut')

# kick off the Workflow and keep track of the Workflow ID
workflow.execute()
print(workflow.id)
```

4629530075806175553


GBDX is now running your Workflow. While the Workflow is running, you can interact with the Workflow object and track its status. 

#### 8.2 Run the code in the following cell to get the status of the Workflow. This call will return the status of whatever event is currently underway.

```python
workflow.status
```

{u'event': u'succeeded', u'state': u'complete'}
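
Rather than re-running the status cell by hand, you could poll until the Workflow reports that it is complete. The following is a minimal sketch, assuming only the `state` and `event` keys shown in the output above. The `wait_for_completion` helper is hypothetical, and the sketch uses a fake status source so that it runs without GBDX credentials.

```python
import time


def wait_for_completion(get_status, interval_seconds=60, max_checks=120):
    """Call get_status() until it reports state 'complete', then return it."""
    for _ in range(max_checks):
        status = get_status()
        if status['state'] == 'complete':
            return status
        time.sleep(interval_seconds)
    raise RuntimeError('Workflow did not complete after %d checks' % max_checks)


# Stand-in status source for the demo. The first entry is invented;
# the second matches the status output shown above.
fake_statuses = iter([
    {'event': 'started', 'state': 'running'},
    {'event': 'succeeded', 'state': 'complete'},
])

final_status = wait_for_completion(lambda: next(fake_statuses), interval_seconds=0)
print(final_status['event'])
```

In the notebook, the call would presumably look like `wait_for_completion(lambda: workflow.status)`. Keep in mind that a 'complete' state only means the Workflow has finished, so it is still worth checking that the event is 'succeeded'.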

Once your Workflow has completed (and succeeded!), you will be able to see the output in your customer S3 bucket.

NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO THE S3 BROWSER 

#### 8.3  Log into the S3 browser [http://s3browser.geobigdata.io](http://s3browser.geobigdata.io/login.html) using your GBDX credentials. 

#### 8.4 Navigate to 'task_demo/doughnut' to find the saved output of your Workflow. 

The end!