### FACILE Inference
This notebook connects to a GCP server and performs inference on the input files using the weight file already on the server. It builds on the following code:
1. This notebook is the client side: a slightly modified version of the [gcp-facile client](https://github.com/cha-suaysom/gcp-facile/blob/master/client.py), with arguments hard-coded rather than read from the command line.
2. The server side is the [gcp-facile server](https://github.com/cha-suaysom/gcp-facile/blob/master/server.py), hosted on the Kubernetes services in the HarrisGroup GCP project.
3. The algorithm is from Jeff's [FACILE](https://github.com/JackDinsmore/FACILE/blob/master/train-models.py)

The file `weights.h5` was obtained by training on `X_HB` and `Y_HB`, which are available from this [drive](https://drive.google.com/drive/folders/0AIBNryPDLt0QUk9PVA). We ran `train_model.py` with model `Model4ExpLLLow` on these data to produce the weight file used for the inference below.

### Input for the client
1. The files `input/X_HB.pkl` and `input/Y_HB.pkl`, downloaded from the drive above.
2. The proto file `server_tools.proto` has to be in the same directory and compiled with `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. server_tools.proto`


### Necessary imports

In [24]:
import logging
import grpc
import time
import numpy as np
import pandas as pd
import server_tools_pb2
import server_tools_pb2_grpc
from google.protobuf import empty_pb2

### Server Parameters
The following are the port and IP address of the Kubernetes service.
Note that these may change when the server is restarted or removed; refer to the services in GCP under `HarrisGroup` for the latest values. For this particular notebook the server is hosted [here](https://console.cloud.google.com/kubernetes/service/us-west1-a/cha-facile/default/inference-facile-service?project=harrisgroup-223921&cloudshell=false&tab=overview&duration=PT1H&pod_summary_list_tablesize=20)

In [36]:
PORT = '50051'
IP = "34.82.75.201" 
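Because the IP and port can change, it may help to check that the address is reachable before running the client. The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, not part of gcp-facile, and it only confirms that a TCP connection can be opened, not that the gRPC service behind it is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

With the parameters above, the check would be `is_port_open(IP, int(PORT))`; a `False` result suggests looking up the current address in the GCP console.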

### Main Function
This function connects to the server at the specified port and IP and sends the pandas data loaded from `input/X_HB.pkl` and `input/Y_HB.pkl`; the server runs inference with its existing weight file `weights.h5`. The function receives the predicted values from the server along with the time the prediction took. `Nrhs` is the number of data points sent to the server for prediction.

In [26]:
def run_facile(host_IP,Nrhs):
    # Get a handle to the server
    channel = grpc.insecure_channel(host_IP + ':' + PORT)
    stub = server_tools_pb2_grpc.MnistServerStub(channel)

    # Get a client ID which you need to talk to the server
    try:
        logging.info("Request client id from server")
        response = stub.RequestClientID(server_tools_pb2.NullParam())
    except grpc.RpcError:
        # The RPC failed (server unreachable, wrong IP/port, ...): clean up and stop
        print("Connection to the server could not be established.")
        channel.close()
        return
    client_id = response.new_id
    logging.info("Client id is " + str(client_id))

    X = pd.read_pickle("input/X_HB.pkl")[:Nrhs]
    Y = pd.read_pickle("input/Y_HB.pkl")[:Nrhs]

    x_input = X.to_json().encode('utf-8')
    y_input = Y.to_json().encode('utf-8')

    # Send the data to the server and receive an answer
    start_time = time.time()
    logging.info("Submitting data and waiting")
    data_message = server_tools_pb2.DataMessage(
        client_id=client_id, x_input=x_input, y_input=y_input, batch_size=32)
    logging.info("Finish defining data message")
    # Block until the server answers, giving up after 100 seconds
    response = stub.StartJobWait(data_message, timeout=100)
    logging.info("Finish responding")

    # Print output
    whole_time = time.time() - start_time
    print("Total time:", whole_time)
    print("Predict time:", response.infer_time)
    print("Fraction of time spent not predicting:",
          (1 - response.infer_time / whole_time) * 100, '%')
    
    # Sample of the inference value to see if it makes sense
    print(np.frombuffer(response.prediction, dtype=np.float32)[:10])
    channel.close()
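The last step of `run_facile` treats `response.prediction` as a raw buffer of 32-bit floats. As a rough illustration of what that decoding does, here is a standard-library sketch; `decode_float32` and the sample `payload` are hypothetical stand-ins, and the sketch assumes the bytes are in the machine's native byte order, which is also what `np.frombuffer(..., dtype=np.float32)` assumes.

```python
import array
import struct


def decode_float32(buffer: bytes) -> list:
    """Interpret raw bytes as a flat sequence of native-endian 32-bit floats."""
    values = array.array("f")  # 'f' is a C float, 4 bytes on common platforms
    values.frombytes(buffer)   # raises ValueError if the length is not a multiple of 4
    return values.tolist()


# Hypothetical payload standing in for response.prediction
payload = struct.pack("=4f", 0.5, 1.5, -2.0, 32.0)
print(decode_float32(payload)[:3])  # [0.5, 1.5, -2.0]
```

In the notebook itself the NumPy call is the more convenient choice, since it returns an array without copying the buffer.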

### Sample connections to the server based on different input size

For details on what happens on the server side, please refer to the [Container Logs](https://console.cloud.google.com/logs/viewer?interval=NO_LIMIT&project=harrisgroup-223921&cloudshell=false&minLogLevel=0&expandAll=false&timestamp=2019-10-16T22:41:41.858000000Z&customFacets=&limitCustomFacetWidth=true&advancedFilter=resource.type%3D%22container%22%0Aresource.labels.cluster_name%3D%22cha-facile%22%0Aresource.labels.namespace_id%3D%22default%22%0Aresource.labels.project_id%3D%22harrisgroup-223921%22%0Aresource.labels.zone:%22us-west1-a%22%0Aresource.labels.container_name%3D%22attempt-deploy-sha256%22%0Aresource.labels.pod_id:%22infer-facile-%22&scrollTimestamp=2019-10-16T21:33:19.569536608Z).
I observe the following:
1. The "Total time" varies a lot between runs. Some of this is just my internet connection and how much RAM I am using at that moment. Preprocessing time on the server side may be unpredictable as well.
2. On the other hand, the "Predict time" (time spent in the Keras predict function) grows fairly clearly with the number of input rows, from about 0.15 s at 500 rows to about 0.27 s at 5000 rows. There may be other measurable timings that would be helpful.
3. The predictions for the first 10 data points are quite consistent across runs. However, I don't know if they are accurate (I will discuss with the FACILE folks to see if there is a good way to check this).

In [27]:
run_facile(IP,500)

Total time: 1.8064167499542236
Predict time: 0.14994049072265625
Fraction of time spent not predicting: 91.6995626437556 %
[ 0.13345513  0.60735834 32.12213     0.8409045   4.3819866   3.9754093
  0.12074747 23.26381     2.8654158   0.13345513]


In [28]:
run_facile(IP,1000)

Total time: 1.3543760776519775
Predict time: 0.1409001350402832
Fraction of time spent not predicting: 89.59667574130845 %
[ 0.13345513  0.4460571  34.881256    0.60553753  4.488502    4.2128572
  0.13345513 25.097042    3.0467412   0.13345513]


In [29]:
run_facile(IP,2000)

Total time: 1.4381999969482422
Predict time: 0.17693853378295898
Fraction of time spent not predicting: 87.69722332370951 %
[ 0.16506827  0.88802063 30.557487    0.9523132   4.1341715   3.8466966
  0.18208459 22.052444    3.282159    0.13345513]


In [30]:
run_facile(IP,3000)

Total time: 5.983857870101929
Predict time: 0.2187633514404297
Fraction of time spent not predicting: 96.34410849673635 %
[ 0.13345513  0.74879324 31.342875    0.821262    4.1955395   3.913856
  0.13345513 22.566328    3.164562    0.13345513]


In [31]:
run_facile(IP,5000)

Total time: 21.985822916030884
Predict time: 0.273975133895874
Fraction of time spent not predicting: 98.75385545065905 %
[ 0.12786819  0.74560845 31.920801    0.83843553  4.2515993   3.9767528
  0.13296703 23.08878     3.2147532   0.13345513]
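
The five runs above can be summarized with a few lines of arithmetic. The sketch below copies the printed timings into a list and recomputes the overhead percentage with the same formula `run_facile` uses, plus the predict time per row; the helper names are illustrative only.

```python
# (Nrhs, total seconds, predict seconds), copied from the outputs above
RUNS = [
    (500, 1.8064167499542236, 0.14994049072265625),
    (1000, 1.3543760776519775, 0.1409001350402832),
    (2000, 1.4381999969482422, 0.17693853378295898),
    (3000, 5.983857870101929, 0.2187633514404297),
    (5000, 21.985822916030884, 0.273975133895874),
]


def overhead_percent(total_s, predict_s):
    """Percentage of wall-clock time not spent in the predict call."""
    return (1 - predict_s / total_s) * 100


def predict_us_per_row(n_rows, predict_s):
    """Predict time per input row, in microseconds."""
    return predict_s / n_rows * 1e6


for n_rows, total_s, predict_s in RUNS:
    print(f"{n_rows:>5} rows: overhead {overhead_percent(total_s, predict_s):6.2f} %, "
          f"predict {predict_us_per_row(n_rows, predict_s):7.1f} us/row")
```

On these numbers the per-row predict cost falls from roughly 300 microseconds at 500 rows to roughly 55 microseconds at 5000 rows, which would be consistent with a fixed per-call cost inside the predict step, although five single runs are too few to say so with confidence.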


### Data size limitation
Lastly, there is a limit on the input size (somewhere around 7000 data points). A request of 10000 points exceeds it, so the server throws an error (I'm working on preventing this). Anything significantly beyond this limit may need to be uploaded to the cloud and read from there instead.

In [34]:
run_facile(IP,10000)

_Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception calling application: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(12, 36), dtype=float32) is not an element of this graph."
	debug_error_string = "{"created":"@1571261670.010468200","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"Exception calling application: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(12, 36), dtype=float32) is not an element of this graph.","grpc_status":2}"
>
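
One possible client-side workaround, short of uploading the data to the cloud, would be to split a large input into chunks that stay under the limit and send them one request at a time. The sketch below only computes the row ranges; `chunk_bounds` is a hypothetical helper, the chunk size of 5000 is simply a size that worked in the runs above, and it has not been verified that the server handles consecutive requests this way. `run_facile` as written always takes the first `Nrhs` rows, so it would need to accept a `(start, stop)` slice to use these ranges.

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) index pairs that cover range(n_rows) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        # The last chunk may be shorter than chunk_size
        yield start, min(start + chunk_size, n_rows)


# Example: 10000 rows in chunks of 5000
print(list(chunk_bounds(10000, 5000)))  # [(0, 5000), (5000, 10000)]
```

Each pair could then be used as `X[start:stop]` and `Y[start:stop]` before the data are serialized and sent.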