## Example DLHub Ingestion

This notebook demonstrates ingesting a servable into DLHub. A servable is any containerized logic that can be invoked through DLHub. The DLHub ingestion pipeline consists of the following steps:<br>
1. Name the servable and specify its local path<br>
2. Stage the servable files somewhere DLHub can retrieve them<br>
3. Create a description document<br>
4. Submit the ingestion request<br>
5. Track progress<br>
6. Confirm that the new servable is listed

#### Install Requirements:

This notebook stages the servable in S3, so boto3 needs AWS credentials to push data there. If you have not already done so, pip install awscli and run "aws configure".


awscli --> pip install awscli<br/>
boto3 --> pip install boto3<br/>
requests --> pip install requests<br/>
pandas --> pip install pandas<br/>
mdf_toolbox --> pip install mdf_toolbox<br/>
dlhub_client --> pip install dlhub_client<br/>
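
Before running the upload step, it can help to confirm that credentials are actually in place. The following is a minimal sketch using only the standard library; the helper name `aws_credentials_configured` is hypothetical, and it only checks the two most common sources (the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` environment variables and the `[default]` profile that "aws configure" writes to `~/.aws/credentials`). boto3 can also find credentials in other places, so a `False` result here is a hint rather than proof.

```python
import configparser
import os
from pathlib import Path


def aws_credentials_configured(env=None, credentials_file=None):
    """Return True if AWS credentials appear to be set up for the default profile."""
    env = os.environ if env is None else env
    # Credentials supplied through environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return True
    # "aws configure" stores keys in ~/.aws/credentials under a [default] section.
    path = Path(credentials_file) if credentials_file else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is ignored and yields no sections
    return parser.has_option("default", "aws_access_key_id") and parser.has_option(
        "default", "aws_secret_access_key"
    )


if __name__ == "__main__":
    print("AWS credentials found:", aws_credentials_configured())
```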

In [1]:
import time
from dlhub_client import client
from ingest import upload_directory, ingest_metadata, ingest_to_search, check_status

### Step 1. Define a name and specify model location

In [2]:
# Specify the local path to the model. E.g. "parsl-containers/cifar10"
servable_path = "/home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10" 

# Give the model a name. E.g. "my_cifar10"
servable_name = "ryan_cifar10" 

### Step 2. Stage the local model to S3

To ingest the model, DLHub needs to be able to download and containerize it. This step puts the model in an accessible S3 bucket.

In [3]:
staged_location = upload_directory(servable_path)
print("\nUploaded to: {}".format(staged_location))

Uploading to s3: 

Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/dlhub_shim.py
Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/requirements.txt
Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/__init__.py
Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/application.py
Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/Dockerfile
Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/data/x_test.npy
Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/data/y_test.npy
Uploading: /home/ryan/Documents/Argonne/DLHub/dlhub_container/parsl-containers/cifar10/model/cifar10vgg.h5

Uploaded to: s3://dlhub-anl/servables/7ce91677-9e99-43ad-a73e-09600f68ab2f
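
The staged location has the form `s3://<bucket>/servables/<uuid>`. The `upload_directory` helper comes from the local `ingest` module, whose source is not shown here, so the following is only a sketch of how such a helper might work: walk the directory, map each file to a key under a fresh UUID prefix, and hand each pair to an upload callable. The names `plan_uploads` and `stage_directory` are hypothetical. The upload callable is passed in so the sketch runs without AWS access.

```python
import os
import uuid


def plan_uploads(local_dir, prefix="servables"):
    """Return (staging_id, [(local_path, s3_key), ...]) for every file under local_dir."""
    staging_id = str(uuid.uuid4())
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # S3 keys always use forward slashes, whatever the local separator is.
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            pairs.append((local_path, "{}/{}/{}".format(prefix, staging_id, rel)))
    return staging_id, pairs


def stage_directory(local_dir, bucket, upload, prefix="servables"):
    """Upload each file via upload(local_path, bucket, key); return the s3:// location."""
    staging_id, pairs = plan_uploads(local_dir, prefix)
    for local_path, key in pairs:
        print("Uploading: {}".format(local_path))
        upload(local_path, bucket, key)
    return "s3://{}/{}/{}".format(bucket, prefix, staging_id)

# With boto3 installed and credentials configured, the callable could be:
#   import boto3
#   upload = boto3.client("s3").upload_file   # upload_file(Filename, Bucket, Key)
```

Generating a new UUID per upload keeps repeated stagings of the same servable from overwriting each other.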


### Step 3. Construct a description file to initiate the ingestion

DLHub's ingestion pipeline requires a description file. The description file specifies details about the author and model. For example, it must specify the entry point to invoke the model and can optionally specify the input and output shapes.

Specify DataCite information

In [4]:
datacite_def = {
      "title": servable_name,
      "creators":[
          "Chard, Ryan"
      ],
      "resourceType": "Dataset",
      "publicationYear": 2018,
      "publisher": "DLHub",
      "description": "A test Cifar10 model",
      "associatedPublications":[
         ""
      ],
      "license":"https://www.gnu.org/licenses/gpl-3.0.en.html"
}

Specify servable information

In [5]:
servable_def = {
      "name": servable_name,
      "location": staged_location,
      "type": "model",
      "model_type": "CNN",
      "ml_model": "keras",
      "language": "python",
      "run":{
         "handler": "application.run",
         "input":{
            "shape": "(, 32, 32, 3)",
            "description": "List of cifar images",
            "type": "list"
         },
         "output":{
            "shape": "(, 10)",
            "description": "List of dictionaries with most likely class first for each image",
            "type": "classification"
         }
      }
}
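
The `handler` value reads as `module.function`: here, the `run` function in `application.py`. A quick local check that the entry point is importable can catch typos before ingestion. The sketch below assumes the usual Python convention of splitting on the last dot; `resolve_handler` is a hypothetical helper, and this says nothing about how DLHub itself loads the handler.

```python
import importlib


def resolve_handler(handler):
    """Split "module.function" on the last dot and return the callable."""
    module_name, _, func_name = handler.rpartition(".")
    if not module_name:
        raise ValueError("handler must look like 'module.function': {!r}".format(handler))
    func = getattr(importlib.import_module(module_name), func_name)
    if not callable(func):
        raise TypeError("{!r} is not callable".format(handler))
    return func


# Demonstrated with a standard-library function; run from inside the servable
# directory, the same call would be resolve_handler("application.run").
join = resolve_handler("os.path.join")
print(join("model", "cifar10vgg.h5"))
```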

Append some metadata

In [6]:
meta_def = {
      "version": "0.1",
      "domain": "image recognition",
      "tags": ['test', 'cifar10'],
      "visible_to": "public"
}

Create the entire description document.

In [7]:
definition = {
   "datacite": datacite_def,
   "dlhub": meta_def,
   "servable": servable_def
}
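
Since the description must at least name the entry point, a small pre-flight check on the assembled document can save a failed ingestion. The following is a sketch only: the list of required paths is an assumption made for illustration, not DLHub's official schema, and `missing_fields` is a hypothetical helper.

```python
# Hypothetical set of dotted paths to check; not DLHub's official schema.
REQUIRED_PATHS = [
    "datacite.title",
    "datacite.creators",
    "servable.name",
    "servable.location",
    "servable.run.handler",
]


def missing_fields(definition, required=REQUIRED_PATHS):
    """Return the dotted paths that are absent or empty in the definition."""
    missing = []
    for path in required:
        node = definition
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        # Treat empty strings and empty containers as missing too.
        if node in (None, "", [], {}):
            missing.append(path)
    return missing


example = {
    "datacite": {"title": "my_cifar10", "creators": ["Chard, Ryan"]},
    "dlhub": {"version": "0.1"},
    "servable": {"name": "my_cifar10", "location": "", "run": {"handler": "application.run"}},
}
print(missing_fields(example))  # ['servable.location']
```

In the notebook, calling `missing_fields(definition)` before Step 4 should return an empty list.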


### Step 4. Ingest Metadata to DLHub Servables

Send the definition file to DLHub to initiate the ingestion.

In [8]:
result = ingest_metadata(definition)
print(result)

Running Ingestion to DLHub Servables
{'status': 'RUNNING', 'task_id': 'c001bc01-1c0a-4433-8564-54a6119daba2'}


### Step 5. Check the status of the ingestion

Query DLHub's status endpoint to check the task status.

In [9]:
# Poll every 10 seconds, up to 100 times, until the task reports success
for _ in range(100):
    res = check_status(result['task_id'])
    print(res['status'])
    if res['status'] == "SUCCEEDED":
        break
    time.sleep(10)

RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
RUNNING
SUCCEEDED
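
The loop above only stops early on success, so a task that fails would keep being polled until the iterations run out. A more defensive variant might look like the sketch below. The `"FAILED"` status name is an assumption (only `RUNNING` and `SUCCEEDED` appear in this notebook), and `wait_for_task` is a hypothetical helper that takes the status function as an argument.

```python
import time


def wait_for_task(check, task_id, interval=10, timeout=1000,
                  failure_states=("FAILED",), sleep=time.sleep):
    """Poll check(task_id) until SUCCEEDED, a failure state, or timeout (seconds)."""
    waited = 0
    while True:
        status = check(task_id)["status"]
        if status == "SUCCEEDED":
            return status
        if status in failure_states:  # assumed names; adjust to the real API
            raise RuntimeError("Task {} ended with status {}".format(task_id, status))
        if waited >= timeout:
            raise TimeoutError("Task {} still {} after {} s".format(task_id, status, waited))
        sleep(interval)
        waited += interval
```

In the notebook this would be called as `wait_for_task(check_status, result['task_id'])`.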


### Step 6. Check that the new servable is there

List models available in DLHub and ensure your model is there.

In [10]:
dl = client.DLHub()
servs = dl.get_servables()
servs[['uuid', 'name']]

| | uuid | name |
|---|---|---|
| 0 | 1117ac20-3f54-11e8-b467-0ed5f89f718b | oqmd_model |
| 1 | 9ff7a98c-3f54-11e8-b467-0ed5f89f718b | matminer_featurize |
| 2 | d5a1653c-3ec5-4947-8c5a-28f6554ec339 | matminer_util |
| 3 | 9553d6a2-6a8d-4cda-8b81-7f38efab67e7 | formation_energy |
| 4 | 78d08664-5d52-44a0-b2c8-47cf702b2e39 | DLSCORE |
| 5 | 8c78939e-6422-4627-80ea-03ed8bfdf6ea | metallic_glass |
| 6 | e127fb16-5852-11e8-9c2d-fa7ae01bbebc | yager_xrd_classifier |
| 7 | 1f6376f8-8b02-4424-a3db-f8f99323c17f | DSIR |
| 8 | 5f01fd6e-f84d-4ee5-b25e-c2699a06a6e1 | deep_smiles |
| 9 | 1f63953e-75c7-4559-ba55-4acd3cff4f94 | noop |
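
Rather than scanning the listing by eye, the name can be looked up programmatically. The sketch below works on plain dictionaries so it runs without DLHub access; `find_servable` is a hypothetical helper. The column selection above suggests `get_servables()` returns a pandas DataFrame, in which case `servs.to_dict("records")` would produce records of this shape, but that return type is an assumption.

```python
def find_servable(records, name):
    """Return the records whose 'name' matches exactly."""
    return [r for r in records if r.get("name") == name]


# Two rows copied from the listing above, used as stand-in data.
records = [
    {"uuid": "1f63953e-75c7-4559-ba55-4acd3cff4f94", "name": "noop"},
    {"uuid": "78d08664-5d52-44a0-b2c8-47cf702b2e39", "name": "DLSCORE"},
]
matches = find_servable(records, "noop")
print(matches[0]["uuid"] if matches else "not found")
```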


### Ingest Metadata to Globus Search

In [11]:
#ingest_to_search(model_name, meta_path, idx=index)