# DLHub

DLHub is a self-service platform for publishing, applying, and creating new machine learning (ML) and deep learning (DL) models. DLHub will provide: 1) publication capabilities to make models more discoverable, citable, and reusable; 2) the ability to easily run or test existing models; and 3) links to the data and computing infrastructure needed to re-train models for new applications. Users will benefit from DLHub in many ways. Data scientists can publish their models (i.e., architectures and weights) and methods. Materials scientists can apply existing models to new data with ease (e.g., by querying a prediction API for a deployed model) and create new models with state-of-the-art techniques. Together, these capabilities will lower barriers to employing ML/DL, making it easier for researchers to benefit from advances in ML/DL technologies.

## This notebook
This notebook showcases two DLHub use cases: the OQMD pipeline and a BNL model. The OQMD pipeline extracts data from the Materials Data Facility (MDF) and passes it through three DLHub servables that transform the input data and perform a prediction. The BNL model uses Globus Auth to pull data from Petrel and generate tags for the data.

## OQMD


Predict XXX with the OQMD pipeline. The pipeline applies a util transformation and a featurize transformation before performing the prediction: data is pulled from the Materials Data Facility and piped through three DLHub servables (`matminer_util`, `matminer_featurize`, and `oqmd_model`).
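Conceptually, the pipeline is function composition: each servable consumes a list of records and returns a list of records that feeds the next stage. The sketch below illustrates that chaining with local stand-in functions; the `fake_*` stage names and their toy bodies are hypothetical and do not call DLHub or reproduce what the real servables compute.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[List[Record]], List[Record]]


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Feed the output of each stage into the next, in order."""
    for stage in stages:
        records = stage(records)
    return records


# Hypothetical local stand-ins for matminer_util, matminer_featurize, and
# oqmd_model; their bodies are toys and do not reproduce the real servables.
def fake_util(records: List[Record]) -> List[Record]:
    return [{"composition_object": r["composition"].upper()} for r in records]


def fake_featurize(records: List[Record]) -> List[Record]:
    return [{"features": [float(len(r["composition_object"]))]} for r in records]


def fake_model(records: List[Record]) -> List[Record]:
    return [{"prediction": -0.1 * r["features"][0]} for r in records]


result = run_pipeline([{"composition": "Al1Cu4"}, {"composition": "Al1Cu1"}],
                      [fake_util, fake_featurize, fake_model])
print(result)
```

Replacing each stand-in with a function that posts to the corresponding servable's `/run` endpoint gives the notebook's pipeline; the cells below perform the same chaining by hand, storing each stage's output as a new DataFrame column.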


| | author | created_at | description | ecr_url | id | input | name | notes | output | servable | status | uuid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2018-05-15 15:16:40.865516 | Do something with a util | 039706667969.dkr.ecr.us-east-1.amazonaws.com/d... | 4 | [] | oqmd_model | | [] | pool_oqmd_model | READY | 1117ac20-3f54-11e8-b467-0ed5f89f718b |
| 1 | 1 | 2018-05-15 15:16:40.801222 | Featurize some data | 039706667969.dkr.ecr.us-east-1.amazonaws.com/d... | 2 | [] | matminer_featurize | | [] | pool_matminer_featurize | READY | 9ff7a98c-3f54-11e8-b467-0ed5f89f718b |
| 2 | 1 | 2018-05-15 15:16:41.719300 | BNL Resnet | 039706667969.dkr.ecr.us-east-1.amazonaws.com/d... | 5 | [] | resnet | | [] | pool_resnet | READY | e127fb16-5852-11e8-9c2d-fa7ae01bbebc |
| 3 | 1 | 2018-05-15 15:16:40.843431 | Do something with a util | 039706667969.dkr.ecr.us-east-1.amazonaws.com/d... | 3 | [] | matminer_util | | [] | pool_matminer_util | READY | d5a1653c-3ec5-4947-8c5a-28f6554ec339 |
| 4 | 1 | 2018-06-05 20:06:40.608302 | Formation energy | | 7 | [] | formation_energy | | [] | pool_formation_energy | READY | 9553d6a2-6a8d-4cda-8b81-7f38efab67e7 |


In [15]:
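import json

import pandas as pd
import requests


class DLHub():
    """Minimal client for the DLHub REST API used in this notebook."""
    service = "http://35.168.128.54:5000/api/v1"

    def get_servables(self):
        # List all servables registered with DLHub as a DataFrame
        r = requests.get("{service}/servables".format(service=self.service))
        r.raise_for_status()
        return pd.DataFrame(r.json())

    def get_id_by_name(self, name):
        # Return the uuid of the first servable with a matching name
        df_tmp = self.get_servables()
        serv = df_tmp[df_tmp.name == name]
        return serv.iloc[0]['uuid']

    def infer(self, servable_id, payload):
        # POST the full request body, e.g. {"data": [record, ...]}, to the servable
        servable_path = '{service}/servables/{servable_id}/run'.format(service=self.service,
                                                                       servable_id=servable_id)
        r = requests.post(servable_path, json=payload)
        r.raise_for_status()
        res = r.json()
        # Some servables return a JSON-encoded string that needs a second decode
        if isinstance(res, str):
            res = json.loads(res)
        return pd.DataFrame(res)


dl = DLHub()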
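Every servable call in this notebook follows the same pattern: POST `{"data": [record, ...]}` to `{service}/servables/{uuid}/run` and decode the JSON reply, which some servables wrap in a second layer of JSON encoding (hence the `isinstance(..., str)` checks in later cells). The stdlib-only sketch below isolates those two steps so they can be tested offline; `build_run_request` and `decode_response` are our own helper names, not part of any DLHub client library, and `decode_response` assumes at most one extra layer of encoding.

```python
import json
from typing import Any, Dict, List, Tuple


def build_run_request(service: str, servable_id: str,
                      records: List[Dict[str, Any]]) -> Tuple[str, bytes]:
    """Return the run URL and JSON request body for a servable invocation."""
    url = "{service}/servables/{servable_id}/run".format(service=service,
                                                         servable_id=servable_id)
    body = json.dumps({"data": records}).encode("utf-8")
    return url, body


def decode_response(text: str) -> List[Any]:
    """Decode a servable reply, unwrapping one extra layer of JSON encoding if present."""
    result = json.loads(text)
    if isinstance(result, str):
        result = json.loads(result)
    return result


url, body = build_run_request("http://35.168.128.54:5000/api/v1",
                              "d5a1653c-3ec5-4947-8c5a-28f6554ec339",
                              [{"composition": "Al1Cu4"}])
print(url)
print(decode_response(json.dumps(json.dumps([{"prediction": -0.18}]))))
```

Without `requests`, the same body could be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")`.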

In [16]:
import sys
import os
import json
import requests
import matplotlib
import globus_sdk
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mdf_forge.forge import Forge

%matplotlib inline
mdf = Forge()

dlhub_service = "http://35.168.128.54:5000/api/v1"

In [17]:
from mdf_toolbox import login

In [18]:
login(services=["petrel"])

{'petrel': <globus_sdk.authorizers.refresh_token.RefreshTokenAuthorizer at 0x107272668>}

### Get data from MDF

Search MDF for compositions containing Al and Cu that can be used with the OQMD model.

In [19]:
results = mdf.search_by_elements(elements=["Al","Cu"], 
                                 source_names=["oqmd"], 
                                 limit=50)
compositions = []
for res in results:
    compositions.append({"composition": res['material']['composition']})
df = pd.DataFrame(compositions)
df.head()

| | composition |
|---|---|
| 0 | Al1Cu4 |
| 1 | Al1Cu1 |
| 2 | Al1Cu4 |
| 3 | Al2Cu1 |
| 4 | Al2Cu1 |
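The search returns duplicate compositions (`Al1Cu4` and `Al2Cu1` each appear twice above), and the cells below make one servable request per row. If the servables are deterministic, which the identical predictions for duplicate rows later in this notebook suggest, each unique composition only needs to be sent once. A minimal sketch of that caching idea, where `fake_servable` is a hypothetical stand-in for a real DLHub request:

```python
from typing import Callable, Dict, List


def map_unique(values: List[str], call_servable: Callable[[str], float]) -> List[float]:
    """Call call_servable once per distinct value and reuse the result for repeats."""
    cache: Dict[str, float] = {}
    results = []
    for value in values:
        if value not in cache:
            cache[value] = call_servable(value)
        results.append(cache[value])
    return results


calls: List[str] = []


def fake_servable(composition: str) -> float:
    # Hypothetical stand-in: record each call and return a dummy value
    calls.append(composition)
    return float(len(composition))


compositions = ["Al1Cu4", "Al1Cu1", "Al1Cu4", "Al2Cu1", "Al2Cu1"]
print(map_unique(compositions, fake_servable))
print(len(calls))
```

`functools.lru_cache` on a request function would achieve the same effect; the explicit dictionary just makes the reuse visible.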


### Query DLHub to find models and transformation servables

In [20]:
df_serv = dl.get_servables()
df_serv[['uuid','name']]

| | uuid | name |
|---|---|---|
| 0 | 1117ac20-3f54-11e8-b467-0ed5f89f718b | oqmd_model |
| 1 | 9ff7a98c-3f54-11e8-b467-0ed5f89f718b | matminer_featurize |
| 2 | e127fb16-5852-11e8-9c2d-fa7ae01bbebc | resnet |
| 3 | d5a1653c-3ec5-4947-8c5a-28f6554ec339 | matminer_util |
| 4 | 9553d6a2-6a8d-4cda-8b81-7f38efab67e7 | formation_energy |
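The servable listing is a JSON array of records, so resolving a name to a uuid is a dictionary lookup, which is what the BNL section does later. The sketch below does the same with the stdlib only, using a trimmed copy of the listing above; `uuid_for` is a hypothetical helper, and raising a `KeyError` that lists the available names is our choice, meant to make a mistyped servable name easier to spot than `iloc[0]` failing on an empty frame.

```python
import json
from typing import Dict

# Trimmed copy of the /servables response shown above (name and uuid fields only)
listing = json.loads("""[
  {"name": "oqmd_model", "uuid": "1117ac20-3f54-11e8-b467-0ed5f89f718b"},
  {"name": "matminer_featurize", "uuid": "9ff7a98c-3f54-11e8-b467-0ed5f89f718b"},
  {"name": "resnet", "uuid": "e127fb16-5852-11e8-9c2d-fa7ae01bbebc"},
  {"name": "matminer_util", "uuid": "d5a1653c-3ec5-4947-8c5a-28f6554ec339"},
  {"name": "formation_energy", "uuid": "9553d6a2-6a8d-4cda-8b81-7f38efab67e7"}
]""")


def uuid_for(name: str, servables: Dict[str, str]) -> str:
    """Return the uuid for a servable name, listing the valid names if it is missing."""
    try:
        return servables[name]
    except KeyError:
        raise KeyError("unknown servable {!r}; available: {}".format(
            name, ", ".join(sorted(servables)))) from None


servables = {s["name"]: s["uuid"] for s in listing}
print(uuid_for("matminer_util", servables))
```

If two servables ever share a name, this dictionary keeps the last one, whereas `get_id_by_name` returns the first.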


In [23]:
servable_name = "matminer_util"
servable_id = dl.get_id_by_name(servable_name)

comps = []
for _, row in df.iterrows():
    payload = {"data": [{"composition": row['composition']}]}
    res = dl.infer(servable_id, payload)
    # infer returns a DataFrame with one row per input record
    comps.append(res['composition_object'].iloc[0])
df['composition_object'] = comps


### Invoke the Matminer util servable to transform the compositions

In [6]:
servable_name = "matminer_util"
# Look up the servable's uuid in the table returned by dl.get_servables()
servable_id = df_serv.loc[df_serv.name == servable_name, 'uuid'].iloc[0]
util_path = '{service}/servables/{servable_id}/run'.format(service=dlhub_service,
                                                           servable_id=servable_id)
composition_objects = []
for _, row in df.iterrows():
    payload = {"data": [{"composition": row['composition']}]}
    r = requests.post(util_path, json=payload)
    lst_res = json.loads(r.text)
    # The response may be a JSON-encoded string that needs a second decode
    if isinstance(lst_res, str):
        lst_res = json.loads(lst_res)
    composition_objects.append(lst_res[0]['composition_object'])
df['composition_object'] = composition_objects
df.head()

| | composition | composition_object |
|---|---|---|
| 0 | Al1Cu4 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... |
| 1 | Al1Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... |
| 2 | Al1Cu4 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... |
| 3 | Al2Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... |
| 4 | Al2Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... |
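The `composition_object` strings appear to be base64-encoded Python pickles: decoding the visible prefix yields the pickle protocol-3 header, the `GLOBAL` opcode `c`, the module path `pymatgen.core.composition`, and the start of a class name (`Comp...`), consistent with pymatgen's `Composition` class. This is our reading of the output, not documented DLHub behavior. The sketch decodes only the first 44 characters shown above, which is a multiple of 4 and therefore valid base64 on its own.

```python
import base64

# First 44 characters of a composition_object value from the table above
prefix = "gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21w"
raw = base64.b64decode(prefix)

print(raw[:2])                  # PROTO opcode (0x80) followed by protocol number 3
print(raw[2:].decode("ascii"))  # GLOBAL opcode 'c', module path, newline, start of class name
```

Only unpickle these values if you trust the server: `pickle.loads` can execute arbitrary code, and restoring the objects would also require pymatgen to be installed locally.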


### Use the featurizer to transform the compositions

In [7]:
servable_name = "matminer_featurize"
# Look up the servable's uuid in the table returned by dl.get_servables()
servable_id = df_serv.loc[df_serv.name == servable_name, 'uuid'].iloc[0]
featurize_path = '{service}/servables/{servable_id}/run'.format(service=dlhub_service,
                                                                servable_id=servable_id)
features = []
for _, row in df.iterrows():
    payload = {"data": [{"composition_object": row['composition_object']}]}
    r = requests.post(featurize_path, json=payload)
    lst_res = json.loads(r.text)
    # The response may be a JSON-encoded string that needs a second decode
    if isinstance(lst_res, str):
        lst_res = json.loads(lst_res)
    features.append(lst_res[0]['features'])
df['features'] = features
df.head()

| | composition | composition_object | features |
|---|---|---|---|
| 0 | Al1Cu4 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 25.8, 5.119999999999999, 29... |
| 1 | Al1Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 21.0, 8.0, 13.0, 64.0, 73.0... |
| 2 | Al1Cu4 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 25.8, 5.119999999999999, 29... |
| 3 | Al2Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 18.333333333333332, 7.11111... |
| 4 | Al2Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 18.333333333333332, 7.11111... |
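The first feature values can be reproduced by hand, which makes the featurizer output less opaque. For `Al1Cu4` (atomic numbers 13 and 29, atomic fractions 0.2 and 0.8) they match the minimum (13), maximum (29), range (16), fraction-weighted mean 0.2·13 + 0.8·29 = 25.8, fraction-weighted mean absolute deviation 0.2·|13 − 25.8| + 0.8·|29 − 25.8| = 2.56 + 2.56 = 5.12, and mode (29) of the atomic number. We infer this from the numbers alone; the notebook does not document which features `matminer_featurize` computes. The sketch recomputes these six values; `ATOMIC_NUMBER` covers only Al and Cu, and breaking ties in the mode toward the smaller atomic number is an assumption chosen to match the `Al1Cu1` row.

```python
import re
from typing import Dict, List

# Atomic numbers for the elements used in this notebook (extend as needed)
ATOMIC_NUMBER = {"Al": 13, "Cu": 29}


def parse_composition(formula: str) -> Dict[str, float]:
    """Parse a formula like 'Al2Cu1' into element -> atomic fraction."""
    counts: Dict[str, float] = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0.0) + float(amount or 1)
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}


def atomic_number_stats(formula: str) -> List[float]:
    """Min, max, range, mean, mean absolute deviation, and mode of atomic number."""
    fractions = parse_composition(formula)
    z = {el: float(ATOMIC_NUMBER[el]) for el in fractions}
    mean = sum(fractions[el] * z[el] for el in fractions)
    avg_dev = sum(fractions[el] * abs(z[el] - mean) for el in fractions)
    # Mode: Z of the most abundant element; ties go to the smaller Z (assumption)
    mode_el = min(fractions, key=lambda el: (-fractions[el], z[el]))
    lo, hi = min(z.values()), max(z.values())
    return [lo, hi, hi - lo, mean, avg_dev, z[mode_el]]


for formula in ["Al1Cu4", "Al1Cu1", "Al2Cu1"]:
    print(formula, [round(v, 6) for v in atomic_number_stats(formula)])
```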


### Perform a prediction against the featurized values

In [8]:
servable_name = "oqmd_model"
# Look up the servable's uuid in the table returned by dl.get_servables()
servable_id = df_serv.loc[df_serv.name == servable_name, 'uuid'].iloc[0]
predict_path = '{service}/servables/{servable_id}/run'.format(service=dlhub_service,
                                                              servable_id=servable_id)
predictions = []
for _, row in df.iterrows():
    payload = {"data": [{"features": row['features']}]}
    r = requests.post(predict_path, json=payload)
    lst_res = json.loads(r.text)
    # The response may be a JSON-encoded string that needs a second decode
    if isinstance(lst_res, str):
        lst_res = json.loads(lst_res)
    predictions.append(lst_res[0]['prediction'])
df['prediction'] = predictions
df.head()

| | composition | composition_object | features | prediction |
|---|---|---|---|---|
| 0 | Al1Cu4 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 25.8, 5.119999999999999, 29... | -0.183846 |
| 1 | Al1Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 21.0, 8.0, 13.0, 64.0, 73.0... | -0.179726 |
| 2 | Al1Cu4 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 25.8, 5.119999999999999, 29... | -0.183846 |
| 3 | Al2Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 18.333333333333332, 7.11111... | -0.221089 |
| 4 | Al2Cu1 | gANjcHltYXRnZW4uY29yZS5jb21wb3NpdGlvbgpDb21wb3... | [13.0, 29.0, 16.0, 18.333333333333332, 7.11111... | -0.221089 |
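Duplicate compositions received identical predictions above, as a deterministic model should produce. The stdlib check below collapses the rows to one value per composition and fails if duplicates ever disagree; the five values are copied from the output above, `collapse` is a hypothetical helper, and sorting by value is only a convenience, since this notebook does not state what quantity the model predicts or its units.

```python
from typing import Dict, List, Tuple

# (composition, prediction) pairs copied from the output above
rows: List[Tuple[str, float]] = [
    ("Al1Cu4", -0.183846),
    ("Al1Cu1", -0.179726),
    ("Al1Cu4", -0.183846),
    ("Al2Cu1", -0.221089),
    ("Al2Cu1", -0.221089),
]


def collapse(rows: List[Tuple[str, float]], tol: float = 1e-9) -> Dict[str, float]:
    """Return one prediction per composition, failing if duplicates disagree."""
    unique: Dict[str, float] = {}
    for composition, value in rows:
        if composition in unique and abs(unique[composition] - value) > tol:
            raise ValueError("inconsistent predictions for " + composition)
        unique.setdefault(composition, value)
    return unique


for composition, value in sorted(collapse(rows).items(), key=lambda kv: kv[1]):
    print(composition, value)
```

On the DataFrame itself, `df.groupby('composition')['prediction'].nunique()` gives a similar consistency check.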


### Visualize the result

In [99]:
# TODO -- I think Logan had a way to see what this actually meant?

## BNL model

Use Globus Auth to obtain a token for pulling data from Petrel, then pass the data through the BNL model.

### Set up Globus

In [12]:
CLIENT_ID = 'd9366faf-42a5-4840-b3f5-95711f64bf36'
# Scope for the Petrel HTTPS server (resource server "petrel_https_server")
PETREL_SCOPE = 'https://auth.globus.org/scopes/56ceac29-e98a-440a-a594-b41e7a084b62/all'

client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
client.oauth2_start_flow(requested_scopes=PETREL_SCOPE)

authorize_url = client.oauth2_get_authorize_url()
print('Please go to this URL and login: {0}'.format(authorize_url))

# raw_input on Python 2, input on Python 3
get_input = getattr(__builtins__, 'raw_input', input)
auth_code = get_input(
    'Please enter the code you get after login here: ').strip()
token_response = client.oauth2_exchange_code_for_tokens(auth_code)

Please go to this URL and login: https://auth.globus.org/v2/oauth2/authorize?client_id=d9366faf-42a5-4840-b3f5-95711f64bf36&redirect_uri=https%3A%2F%2Fauth.globus.org%2Fv2%2Fweb%2Fauth-code&scope=https%3A%2F%2Fauth.globus.org%2Fscopes%2F56ceac29-e98a-440a-a594-b41e7a084b62%2Fall&state=_default&response_type=code&code_challenge=D33qYfCVUg0uD8FVd5QDQVBS9A8YgG6KJtAWXCSZJic&code_challenge_method=S256&access_type=online
Please enter the code you get after login here: 7k1jLIqPJbKW2kTsAxKrEZr9Uwc1tx
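The authorize URL printed above carries `code_challenge` and `code_challenge_method=S256`: the native-app flow uses PKCE (RFC 7636), in which the client keeps a random `code_verifier` secret and sends only its SHA-256 hash, then proves possession of the verifier when exchanging the code. `globus_sdk` generates and checks these values internally, so this sketch is only for understanding the URL; `make_code_verifier` and `s256_challenge` are our own names.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random high-entropy verifier: base64url without padding (43 chars for 32 bytes)."""
    return base64.urlsafe_b64encode(secrets.token_bytes(n_bytes)).rstrip(b"=").decode("ascii")


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
print(len(verifier), s256_challenge(verifier))
```

A SHA-256 digest always encodes to 43 characters, which matches the length of the `code_challenge` value in the URL above.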


In [13]:
print (token_response)
token = token_response.by_resource_server['petrel_https_server']['access_token']

{
  "petrel_https_server": {
    "scope": "https://auth.globus.org/scopes/56ceac29-e98a-440a-a594-b41e7a084b62/all",
    "access_token": "AgVWyzd4oz0B1ooWbK150WomWMedMeEVkzkgwKEJP1NJ8vM4rrUbCykGlgkD12aWzvJNNn0oPnqEpGUlEVmnlsVdY3",
    "refresh_token": null,
    "token_type": "Bearer",
    "expires_at_seconds": 1528400641,
    "resource_server": "petrel_https_server"
  }
}
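The token response reports `expires_at_seconds` as a Unix timestamp and no refresh token (the authorize URL requested `access_type=online`), so once the access token expires the login step has to be repeated. A small check before calling the servable avoids sending an expired token; `token_expired` is a hypothetical helper and its 60-second safety margin is an arbitrary choice.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def token_expired(expires_at_seconds: int, now: Optional[float] = None,
                  margin: float = 60.0) -> bool:
    """True if the token expires within `margin` seconds of `now` (default: current time)."""
    if now is None:
        now = time.time()
    return now >= expires_at_seconds - margin


expires_at = 1528400641  # expires_at_seconds from the token response above
print(datetime.fromtimestamp(expires_at, tz=timezone.utc).isoformat())
print(token_expired(expires_at))
```

In the notebook the value is available as `token_response.by_resource_server['petrel_https_server']['expires_at_seconds']`.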


### Get a list of data from Petrel

In [11]:
# TODO -- put an ls here

### Build the list of input files for the model

In [14]:
file_start = "https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/Yager/model/raw/01470b9d_varied_sm/"

data = []
for i in range(5, 10):
    # Zero-pad the index to eight digits, e.g. 00000005.mat
    filename = "{}{:08d}.mat".format(file_start, i)
    # Each record carries the file URL and the token the servable uses to fetch it
    data.append({"file": filename, "token": token})
    print(filename)

https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/Yager/model/raw/01470b9d_varied_sm/00000005.mat
https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/Yager/model/raw/01470b9d_varied_sm/00000006.mat
https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/Yager/model/raw/01470b9d_varied_sm/00000007.mat
https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/Yager/model/raw/01470b9d_varied_sm/00000008.mat
https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e.globus.org/Yager/model/raw/01470b9d_varied_sm/00000009.mat


### Find the BNL model's uuid in DLHub

In [15]:
r = requests.get("{service}/servables".format(service=dlhub_service))
# Map each servable name to its uuid
servables = {s['name']: s['uuid'] for s in r.json()}

print("resnet: " + servables['resnet'])

resnet: e127fb16-5852-11e8-9c2d-fa7ae01bbebc


### Invoke the BNL model

In [21]:
bnl_path = '{service}/servables/{servable_id}/run'.format(service=dlhub_service,
                                                          servable_id=servables['resnet'])
for d in data:
    payload = {"data": [d]}
    r = requests.post(bnl_path, json=payload)
    lst_res = json.loads(r.text)
    # The response may be a JSON-encoded string that needs a second decode
    if isinstance(lst_res, str):
        lst_res = json.loads(lst_res)
    # Store the model output on the input record
    d.update({"prediction": lst_res[0]})

In [22]:
df_bnl = pd.DataFrame(data)
df_bnl.head()

| | file | prediction | result | token |
|---|---|---|---|---|
| 0 | https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e... | [[1.773031362972688e-05, 0.9997794032096863, 0... | [[1.773031362972688e-05, 0.9997794032096863, 0... | AgVWyzd4oz0B1ooWbK150WomWMedMeEVkzkgwKEJP1NJ8v... |
| 1 | https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e... | [[1.7599653801880777e-05, 0.999782383441925, 0... | [[1.7599653801880777e-05, 0.999782383441925, 0... | AgVWyzd4oz0B1ooWbK150WomWMedMeEVkzkgwKEJP1NJ8v... |
| 2 | https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e... | [[1.7730635590851307e-05, 0.9997794032096863, ... | [[1.7730635590851307e-05, 0.9997794032096863, ... | AgVWyzd4oz0B1ooWbK150WomWMedMeEVkzkgwKEJP1NJ8v... |
| 3 | https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e... | [[1.773055009834934e-05, 0.9997794032096863, 0... | [[1.773055009834934e-05, 0.9997794032096863, 0... | AgVWyzd4oz0B1ooWbK150WomWMedMeEVkzkgwKEJP1NJ8v... |
| 4 | https://e38ee745-6d04-11e5-ba46-22000b92c6ec.e... | [[1.7716352886054665e-05, 0.9997794032096863, ... | [[1.7716352886054665e-05, 0.9997794032096863, ... | AgVWyzd4oz0B1ooWbK150WomWMedMeEVkzkgwKEJP1NJ8v... |


In [23]:
df_bnl.iloc[0]['prediction']

[[1.773031362972688e-05,
  0.9997794032096863,
  0.0009798369137570262,
  2.1729734331343842e-11,
  7.254519901467305e-11,
  2.370756737946067e-05,
  0.7460677027702332,
  0.08216768503189087,
  0.012182140722870827,
  0.0019631613977253437,
  4.0085879504658806e-07,
  0.0004289932840038091,
  0.027924219146370888,
  0.3651340901851654,
  5.8086174249183387e-05,
  0.008119652979075909,
  0.0017521473346278071]]
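The resnet output holds one row per input file with 17 scores. The scores do not sum to 1 (this row sums to about 2.25, with both 0.9998 and 0.746 present), so they look like independent per-tag probabilities rather than a single softmax distribution, which fits the description of the model as generating tags. Under that assumption, tags can be assigned by thresholding; `tags_above` is a hypothetical helper, the 0.5 cutoff is arbitrary, and because the tag names are not listed in this notebook the sketch reports indices.

```python
from typing import List

# Scores from df_bnl.iloc[0]['prediction'][0], copied from the output above
scores: List[float] = [
    1.773031362972688e-05, 0.9997794032096863, 0.0009798369137570262,
    2.1729734331343842e-11, 7.254519901467305e-11, 2.370756737946067e-05,
    0.7460677027702332, 0.08216768503189087, 0.012182140722870827,
    0.0019631613977253437, 4.0085879504658806e-07, 0.0004289932840038091,
    0.027924219146370888, 0.3651340901851654, 5.8086174249183387e-05,
    0.008119652979075909, 0.0017521473346278071,
]


def tags_above(row: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of tags whose score meets the threshold (tag names are not given here)."""
    return [i for i, score in enumerate(row) if score >= threshold]


print(round(sum(scores), 2))
print(tags_above(scores))
print(tags_above(scores, threshold=0.3))
```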