
********************************************************************** 
  Note: This license has also been called the "New BSD License" or 
  "Modified BSD License". See also the 2-clause BSD License.
 
  Copyright © 2018-2019 - General Electric Company, All Rights Reserved
  
  Project: ANSWER, developed with the support of the Defense Advanced 
  Research Projects Agency (DARPA) under Agreement  No.  HR00111990006. 
 
  Redistribution and use in source and binary forms, with or without 
  modification, are permitted provided that the following conditions are met:
  1. Redistributions of source code must retain the above copyright notice, 
     this list of conditions and the following disclaimer.
 
  2. Redistributions in binary form must reproduce the above copyright notice, 
     this list of conditions and the following disclaimer in the documentation 
     and/or other materials provided with the distribution.
 
  3. Neither the name of the copyright holder nor the names of its 
     contributors may be used to endorse or promote products derived 
     from this software without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE 
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF 
  THE POSSIBILITY OF SUCH DAMAGE.

 ***********************************************************************


# Demonstration of K-CHAIN Services

This notebook demonstrates the functionality and capabilities of Knowledge-consistent hybrid AI networks (K-CHAIN). Specifically, the demonstration covers the following aspects:

-  Setup of K-CHAIN
-  Building models:
    -  from physics models with simple equations provided as strings
    -  from physics models captured in TF-compatible Python code, where the code itself was derived from extracted text
    -  from experimental data
    -  with default values for certain inputs
-  Evaluating models during inference:
    -  agnostic to whether the model was built as data-driven or physics-based
    -  with the ability to fall back on default values (when inputs are not provided at inference time) and to inform the user of every default value used in the computation
    -  with the ability to inform the user that key variables needed for inference are missing



## Setup of Services

In [1]:
#imports needed for demonstration

#for communicating with services
import requests

This notebook demonstrates the use of the K-CHAIN service. Please launch the service with the "Launch K-CHAIN Service" notebook before proceeding to the following demonstrations. The code below assumes that the service is already running.

The "Launch K-CHAIN Service" Notebook is available [here](Launch%20K-CHAIN%20Service.ipynb).
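
Before sending requests, it can help to confirm that something is listening on the service port. The following is a minimal sketch, not part of K-CHAIN: `service_is_up` is a hypothetical helper, and the host and port are assumptions taken from the URLs used in this notebook. It only checks that a TCP connection can be opened, not that the K-CHAIN endpoints respond correctly.

```python
import socket


def service_is_up(host="localhost", port=12345, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if service_is_up():
    print("Something is listening on port 12345; the demonstrations can proceed.")
else:
    print("Nothing is listening on port 12345; launch the K-CHAIN service first.")
```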

In [2]:
#URL to interact with build service
url_build = 'http://localhost:12345/darpa/aske/kchain/build'

#URL to interact with evaluate service
url_evaluate = 'http://localhost:12345/darpa/aske/kchain/evaluate'

## Model build demonstrations

### Build physics models with simple equations provided as strings:

In [3]:
#inputPacket like this one is programmatically constructed by the ANSWER agent
inputPacket = {
                  "inputVariables": [
                    {
                      "name": "Mass",
                      "type": "double"
                    },
                    {
                      "name": "Acceleration",
                      "type": "double"
                    }
                  ],
                  "outputVariables": [
                    {
                      "name": "Force",
                      "type": "double"
                    }
                  ],
                   "equationModel" : "Force = Mass * Acceleration",
                   "modelName" : "Newtons2LawModel"
                 }

#send request to build model
r = requests.post(url_build, json=inputPacket)

#see the response
r.json()

{'metagraphLocation': '../models/Newtons2LawModel',
 'modelType': 'Physics',
 'trainedState': 0}

__Explanation of ideal outcome:__ 

```
{'metagraphLocation': '../models/Newtons2LawModel',
 'modelType': 'Physics',
 'trainedState': 0}
```

TensorFlow models are stored locally as a MetaGraph, which includes the computational graph object, the model parameters, and data associated with training that model (see [TensorFlow documentation](https://www.tensorflow.org/guide/saved_model#save_and_restore_models)). This model is saved in a folder called "models". In this case, the model is of type Physics; the third build demonstration below shows a data-driven model. The model has not been trained (_trainedState_ is 0) because it has no trainable parameters. The service does not yet support physics models with trainable parameters; this will be available in a future release.

The computational graph capturing this _Newtons2LawModel_ can be seen in __TensorBoard__. The model is as follows:
    <img src="figures for notebook/n2l_tensorboard_pic.PNG" style="width: 80%">

Note:
TensorBoard can be opened by running the following at a command prompt from the _kchain_ folder:
```
tensorboard --logdir="log/example"
```
The resulting computational graph is available under the Graph tab of TensorBoard, or at http://localhost:6006/#graphs in your browser once TensorBoard is running.

### Build physics models captured in TF-compatible Python code, where the code itself was derived from extracted text:

The text2triples service extracts concepts and equations from text/HTML documents, such as this [Speed of Sound page](https://www.grc.nasa.gov/WWW/BGH/sound.html) (from the NASA Hypersonics Index). The extracted equations are sent to the text2code service, which converts each equation to equivalent native Python and TensorFlow eager-compatible code. For example:

The equation extracted from text is as follows:

> a^2 = R * T * {1 + (gamma - 1) / ( 1 + (gamma-1) * [(theta/T)^2 * e^(theta/T) /(e^(theta/T) -1)^2]) }.

The response from text2code service for this text equation is as follows:

```python
a = tf.math.pow( R * T * (1 + (gamma-1)/(1 + (gamma-1) * ((theta/T) ** 2 * tf.math.exp(theta/T) / (tf.math.exp(theta/T) - 1) ** 2))), 1/2)
```

The K-CHAIN service builds a computational graph from this TensorFlow-compatible Python code and saves it as a MetaGraph for later edits and inference. The computational graph is constructed with AutoGraph ([documentation](https://www.tensorflow.org/guide/autograph) and [research article](https://arxiv.org/abs/1810.08061)), which also supports code containing conditional statements and loops. In this demo, the response from the text2code service is used to create a computational graph.

Note that the service allows _default values_ to be provided for certain input variables, so that they can be used during inference even if the user does not assign them. An example is the gas constant of air, _R = 286.0_ (in SI units).
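
Build packets of the shape used in this notebook can be assembled with a small helper. The following is a minimal sketch, not part of K-CHAIN or the ANSWER agent: `make_build_packet` is a hypothetical name, and the field names (`inputVariables`, `outputVariables`, `equationModel`, `dataLocation`, `modelName`) are copied from the packets shown in this notebook. Default values are converted to strings to match those packets.

```python
def make_build_packet(model_name, inputs, outputs, equation=None, data_location=None):
    """Assemble a build packet.

    inputs maps each input name to its default value, or to None when the
    input has no default. All variables are assumed to be of type "double".
    """
    input_vars = []
    for name, default in inputs.items():
        var = {"name": name, "type": "double"}
        if default is not None:
            var["value"] = str(default)  # values are sent as strings
        input_vars.append(var)

    packet = {
        "inputVariables": input_vars,
        "outputVariables": [{"name": name, "type": "double"} for name in outputs],
        "modelName": model_name,
    }
    if equation is not None:
        packet["equationModel"] = equation      # physics model given as code
    if data_location is not None:
        packet["dataLocation"] = data_location  # data-driven model
    return packet


# Equation string as returned by the text2code service
equation = ("a = tf.math.pow( R * T * (1 + (gamma-1)/(1 + (gamma-1) * "
            "((theta/T) ** 2 * tf.math.exp(theta/T) / "
            "(tf.math.exp(theta/T) - 1) ** 2))), 1/2)")

packet = make_build_packet(
    "SpeedOfSound",
    {"gamma": 1.4, "R": 286.0, "theta": 3056.0, "T": None},
    ["a"],
    equation=equation,
)
print(packet["inputVariables"])
```

Inputs with a default carry a `value` field, while _T_ is left without one, so it must be supplied at inference time.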


In [4]:
inputPacket = {
                  "inputVariables": [
                    {
                        "name": "gamma",
                        "type": "double",
                        "value": "1.4"
                    },
                    {
                        "name": "R",
                        "type": "double",
                        "value": "286.0"
                    },
                    {
                        "name": "theta",
                        "type": "double",
                        "value": "3056.0"
                    },
                    {
                      "name": "T",
                      "type": "double"
                    }
                  ],
                  "outputVariables": [
                    {
                      "name": "a",
                      "type": "double"
                    }
                  ],
                   "equationModel" : "a = tf.math.pow( R * T *  (  1 + ( gamma-1 ) / ( 1 + ( gamma-1 ) *  (  ( theta/T ) ** 2 *  tf.math.exp( theta/T ) / (  tf.math.exp( theta/T ) - 1 ) ** 2 )  ) ) , 1/2)",
                   "modelName" : "SpeedOfSound"
                 }
r = requests.post(url_build, json=inputPacket)
r.json()

{'metagraphLocation': '../models/SpeedOfSound',
 'modelType': 'Physics',
 'trainedState': 0}

The computational graph capturing this _SpeedOfSound_ model can be seen in __TensorBoard__. The model is as follows:
    <img src="figures for notebook/sos_tensorboard_pic.PNG" style="width: 80%">

### Building models with experimental data 

If a dataset records values for the input and output variables of a model, then a data-driven model can be created to capture the relationship and perform inference, even if the ANSWER agent has not yet extracted a relationship between those variables. In this demo, a neural network model relating the inputs to the output is constructed and trained with the dataset. Internally, the \__createNNModel()_ and _fitModel()_ methods are used. Note that _fitModel()_ can also be used to update an existing model as more data becomes available for training.

In [5]:
#inputPacket like this one is programmatically constructed by the ANSWER agent
inputPacket = {
                "dataLocation": "../Datasets/Force_dataset.csv",
                "inputVariables": [
                    {
                      "name": "Mass",
                      "type": "double"
                    },
                    {
                      "name": "Acceleration",
                      "type": "double"
                    }
                ],
                "modelName": "ForceModel",
                "outputVariables": [
                {
                  "name": "Force",
                  "type": "double"
                }
              ]
            }

#send request to build model
r = requests.post(url_build, json=inputPacket)

#see the response
r.json()

{'metagraphLocation': '../models/ForceModel',
 'modelType': 'NN',
 'trainedState': 1}

__Explaining ideal output:__

```
{'metagraphLocation': '../models/ForceModel',
'modelType': 'NN',
'trainedState': 1}
```

The MetaGraph of the TensorFlow computational graph is stored in the _models_ folder under the name _ForceModel_. The model is of type neural network (NN), so the ANSWER agent can later look for information sources that yield a more exact form of knowledge, such as physics equations, with the data then used for validation. The trainedState is 1 because the data-driven model has been fitted to the training dataset.

The computational graph capturing this _ForceModel_ can be seen in __TensorBoard__. The model is as follows:
    <img src="figures for notebook/forcenn_tensorboard_pic.PNG" style="width: 80%">

The depicted graph is a hierarchical model, where _NN_ can be expanded to show the computations as follows:
    <img src="figures for notebook/forcenn2_tensorboard_pic.PNG" style="width: 80%">



## Model evaluate demonstrations

### Evaluate a physics model where all relevant inputs are provided

In [6]:
evalPacket = {
  "inputVariables": [
    {
      "name": "T",
      "type": "double",
      "value": "300.0"
    },
    {
      "name": "R",
      "type": "double",
      "value": "286.0"
    },
    {
      "name": "gamma",
      "type": "double",
      "value": "1.4"
    },
    {
      "name": "theta",
      "type": "double",
      "value": "3056.0"
    }
  ],
  "modelName": "SpeedOfSound",
  "outputVariables": [
    {
      "name": "a",
      "type": "double"
    }
  ]
}
r = requests.post(url_evaluate, json=evalPacket)
r.json()

{'outputVariables': [{'name': 'a',
   'type': 'double',
   'value': '[346.50601552]'}]}

__Explaining ideal output:__

```
{'outputVariables': [{'name': 'a',
   'type': 'double',
   'value': '[346.50601552]'}]}
```
The provided input values were used with the _SpeedOfSound_ model built above to compute the output.
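
As an independent cross-check, the same equation can be evaluated in plain Python without the service. This is a minimal sketch: `speed_of_sound` is a hypothetical helper, not part of K-CHAIN, and its default arguments are the default values supplied when the _SpeedOfSound_ model was built.

```python
import math


def speed_of_sound(T, R=286.0, gamma=1.4, theta=3056.0):
    """Speed of sound a for gas temperature T, using the equation from text2code."""
    x = theta / T
    # Vibrational term: (theta/T)^2 * e^(theta/T) / (e^(theta/T) - 1)^2
    vib = x ** 2 * math.exp(x) / (math.exp(x) - 1.0) ** 2
    a_squared = R * T * (1.0 + (gamma - 1.0) / (1.0 + (gamma - 1.0) * vib))
    return math.sqrt(a_squared)


print(speed_of_sound(300.0))
```

The printed value should closely match the service response for _T_ = 300.0 shown above.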

### Evaluate a physics model with multiple values for one input variable, where not all input values are provided

To provide multiple values for an input, pass them in the _value_ field as a string, either in array notation, such as "[300.00, 273.00]", or as a comma-separated list, such as "300.00, 273.00".
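
Both notations carry the same information. The sketch below shows one way such a string could be turned into a list of numbers; `parse_values` is a hypothetical helper written for illustration and is not the parser used inside the K-CHAIN service.

```python
def parse_values(value):
    """Parse "[300.00, 273.00]" or "300.00, 273.00" into a list of floats."""
    text = value.strip()
    # Drop the surrounding brackets when array notation is used
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return [float(item) for item in text.split(",") if item.strip()]


print(parse_values("[300.00, 273.00]"))
print(parse_values("300.00, 273.00"))
```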

In [7]:
evalPacket = {
  "inputVariables": [
    {
      "name": "T",
      "type": "double",
      "value": "[300.00, 273.00]"
    }
  ],
  "modelName": "SpeedOfSound",
  "outputVariables": [
    {
      "name": "a",
      "type": "double"
    }
  ]
}
r = requests.post(url_evaluate, json=evalPacket)
r.json()

{'defaultsUsed': [{'name': 'R', 'value': '286.0'},
  {'name': 'gamma', 'value': '1.4'},
  {'name': 'theta', 'value': '3056.0'}],
 'outputVariables': [{'name': 'a',
   'type': 'double',
   'value': '[346.50601552,330.58687602]'}]}

__Explaining ideal output:__
```
{'defaultsUsed': [{'name': 'R', 'value': '286.0'},
  {'name': 'theta', 'value': '3056.0'},
  {'name': 'gamma', 'value': '1.4'}],
 'outputVariables': [{'name': 'a',
   'type': 'double',
   'value': '[346.50601552,330.58687602]'}]}
```
Computing the speed of sound requires values for _R_, _gamma_, and _theta_. Since those values were not provided in the query, the service uses the default values that were supplied at model build time. Whenever default values are used, the service informs the CurationManager, and hence the user, so that the assumption that the defaults apply to this computation is made explicit.
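
The fallback logic described here can be sketched in a few lines. This is an illustration under stated assumptions, not the service's implementation: `resolve_inputs` is a hypothetical function, and it simply prefers a provided value, then a build-time default, and otherwise records the variable as missing.

```python
def resolve_inputs(required, defaults, provided):
    """Return (values, defaults_used, missing) for the required input names."""
    values, defaults_used, missing = {}, [], []
    for name in required:
        if name in provided:
            values[name] = provided[name]          # user-supplied value wins
        elif name in defaults:
            values[name] = defaults[name]          # fall back to build-time default
            defaults_used.append({"name": name, "value": defaults[name]})
        else:
            missing.append(name)                   # nothing available for this input
    return values, defaults_used, missing


required = ["T", "R", "gamma", "theta"]
defaults = {"R": "286.0", "gamma": "1.4", "theta": "3056.0"}

values, defaults_used, missing = resolve_inputs(
    required, defaults, {"T": "[300.00, 273.00]"}
)
print(defaults_used)
print(missing)
```

With only _T_ provided, the three defaults are reported and nothing is missing; with an empty set of provided inputs, _T_ would be reported as missing because it has no default.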


### Validating Input Packets for Correctness

While some inputs are validated within the service code, the input packet is first validated by Swagger at the REST endpoint, before any method from the package is called. If the _value_ or _name_ field of an input variable is missing, or if the _inputVariables_, _outputVariables_, or _modelName_ field is missing, then the request is ambiguous and key information needed for inference is absent. In that case, the packet validation, which checks for all required entries, raises a validationError.

In [8]:
evalPacket = {
  "inputVariables": [
    {
      "name": "T",
      "type": "double",
      "value": "300.00"
    },
    {
      "name": "R",
      "type": "double"
    }
  ],
  "modelName": "SpeedOfSound",
  "outputVariables": [
    {
      "name": "a",
      "type": "double"
    }
  ]
}
r = requests.post(url_evaluate, json=evalPacket)
r.json()

{'detail': "'value' is a required property",
 'status': 400,
 'title': 'Bad Request',
 'type': 'about:blank'}

Try replacing the current entry:
```
    {
      "name": "R",
      "type": "double"
    }
```
with one that adds a _value_ field:
```
    {
      "name": "R",
      "type": "double",
      "value": 276.0
    }
```
However, the service should then return the following response, because _value_ is expected to be a string:
```
{'detail': "276.0 is not of type 'string'",
 'status': 400,
 'title': 'Bad Request',
 'type': 'about:blank'}
```

Finally, if you replace the entry with:
```
    {
      "name": "R",
      "type": "double",
      "value": "276.0"
    }
```
This should give the desired response:
```
{'defaultsUsed': [{'name': 'gamma', 'value': '1.4'},
  {'name': 'theta', 'value': '3056.0'}],
 'outputVariables': [{'name': 'a',
   'type': 'double',
   'value': '[340.39431879]'}]}
```
where the default values of _gamma_ and _theta_ were used in the computation, together with the provided value of _R_ (276.0) in lieu of its default value of 286.0.
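
The same checks can be approximated on the client side before a packet is sent. The sketch below is an assumption-laden illustration, not the Swagger schema used by the service: `validate_eval_packet` is a hypothetical function whose messages imitate the responses shown above, and the real validation may check more or report errors differently.

```python
REQUIRED_TOP_LEVEL = ("inputVariables", "outputVariables", "modelName")


def validate_eval_packet(packet):
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in packet:
            errors.append(f"'{key}' is a required property")
    for var in packet.get("inputVariables", []):
        for field in ("name", "value"):
            if field not in var:
                errors.append(f"'{field}' is a required property")
            elif not isinstance(var[field], str):
                errors.append(f"{var[field]!r} is not of type 'string'")
    return errors


def packet_with_r(r_entry):
    """Evaluate packet for SpeedOfSound with a given entry for R."""
    return {
        "inputVariables": [
            {"name": "T", "type": "double", "value": "300.00"},
            r_entry,
        ],
        "modelName": "SpeedOfSound",
        "outputVariables": [{"name": "a", "type": "double"}],
    }


for r_entry in (
    {"name": "R", "type": "double"},                    # value missing
    {"name": "R", "type": "double", "value": 276.0},    # value is not a string
    {"name": "R", "type": "double", "value": "276.0"},  # valid
):
    print(validate_eval_packet(packet_with_r(r_entry)))
```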

### Try to evaluate a physics model with no input values provided

In [9]:
evalPacket = {
  "inputVariables": [
  ],
  "modelName": "SpeedOfSound",
  "outputVariables": [
    {
      "name": "a",
      "type": "double"
    }
  ]
}
r = requests.post(url_evaluate, json=evalPacket)
r.json()

{'defaultsUsed': [{'name': 'R', 'value': '286.0'}],
 'missingVar': 'T',
 'outputVariables': [{'name': 'a', 'type': 'double', 'value': None}]}

__Explanation for ideal output:__
```
{'missingVar': 'T',
 'outputVariables': [{'name': 'a', 'type': 'double', 'value': None}]}
 ```
 
Here, the computation of the output _a_ (speed of sound) cannot proceed without a value for _T_ (temperature of the gas), because no default value was provided for this input. The service therefore reports that the variable _T_ is missing and returns None as the output of the computation.
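
A caller can inspect these fields before using the result. The following is a minimal sketch of such a check; `summarize_response` is a hypothetical helper, and it relies only on the `defaultsUsed`, `missingVar`, and `outputVariables` keys that appear in the responses in this notebook.

```python
def summarize_response(response):
    """Turn an evaluate response (a dict) into human-readable lines."""
    lines = []
    for default in response.get("defaultsUsed", []):
        lines.append(f"default used: {default['name']} = {default['value']}")
    if "missingVar" in response:
        lines.append(f"missing input: {response['missingVar']}")
    for output in response.get("outputVariables", []):
        value = output.get("value")
        if value is None:
            lines.append(f"{output['name']}: not computed")
        else:
            lines.append(f"{output['name']} = {value}")
    return lines


# Response returned by the evaluate service for the empty-input query
response = {
    "defaultsUsed": [{"name": "R", "value": "286.0"}],
    "missingVar": "T",
    "outputVariables": [{"name": "a", "type": "double", "value": None}],
}
for line in summarize_response(response):
    print(line)
```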

### Evaluate with a data-driven model where all relevant inputs are provided

In [10]:
evalPacket = {
  "inputVariables": [
    {
      "name": "Mass",
      "type": "double",
      "value": "[2.0]"
    },
    {
      "name": "Acceleration",
      "type": "double",
      "value": "[0.1]"
    }
  ],
  "modelName": "ForceModel",
  "outputVariables": [
    {
      "name": "Force",
      "type": "double"
    }
  ]
}
r = requests.post(url_evaluate, json=evalPacket)
r.json()

{'outputVariables': [{'name': 'Force',
   'type': 'double',
   'value': '[0.2649181]'}]}

For _Mass = 2.0_ and _Acceleration = 0.1_, a simple data-driven model trained with 100 examples, when evaluated through the model-agnostic evaluate service, gives the following output:
```
{'outputVariables': [{'name': 'Force',
   'type': 'double',
   'value': '[0.26843786]'}]}
```
Here, the output Force of type double is estimated to be 0.268, whereas Newton's second law gives 2.0 * 0.1 = 0.2. The estimate that you see might be slightly different, depending on model training.

However, if the _inputVariables_ are assigned values away from the training set, such as _Mass = 0.5_ and _Acceleration = 0.1_, then the output is incorrect by an order of magnitude (the true value is 0.05). For example:
```
{'outputVariables': [{'name': 'Force',
   'type': 'double',
   'value': '[0.6278448]'}]}
```
The estimate that you see might be different, depending on model training.

Thus, in the future, for each data-driven model fitted to a training dataset, we will also characterize the model's region of trust and alert the user if a query exercises the model beyond its region of competence, based on the training data, model structure, and output uncertainty (after we implement models with TensorFlow Probability).
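
The size of the two errors above can be quantified by comparing each estimate with Newton's second law, Force = Mass * Acceleration. This is a minimal sketch using the example estimates quoted in this notebook; `relative_error` is a hypothetical helper, and the numbers you obtain will differ if your trained model differs.

```python
def relative_error(estimate, truth):
    """Absolute error as a fraction of the true value."""
    return abs(estimate - truth) / abs(truth)


# (Mass, Acceleration, estimate from the data-driven ForceModel)
cases = [
    (2.0, 0.1, 0.26843786),  # query close to the training data
    (0.5, 0.1, 0.6278448),   # query away from the training data
]

errors = []
for mass, acceleration, estimate in cases:
    truth = mass * acceleration  # Newton's second law
    errors.append(relative_error(estimate, truth))
    print(f"Mass={mass}, Acceleration={acceleration}: "
          f"true={truth:.3f}, estimate={estimate:.3f}, "
          f"relative error={errors[-1]:.2f}")
```

The first query is off by roughly a third of the true value, while the second is off by more than ten times the true value, which is the kind of gap a region-of-trust check would be meant to flag.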