# Performance tuning


## Preparation


In [None]:
# Clone the lesson materials if you do not have them yet
#!git clone https://github.com/ConcurDataScience/ConcurMLWorkshop

## Tasks

### 1. Start the service from within the container

In [1]:
!cp -r ../../05_Service_Building/full_service full_service

In [None]:
# Build the Docker image
!docker build ./full_service -t mlservice

In [None]:
# Start the container
!docker run -p 8080:8080 mlservice:latest

In [12]:
# If `docker run` fails with "Bind for 0.0.0.0:8080 failed: port is already allocated",
# another container is already running and holding the port.

# Check which process is using the port
!lsof -i :8080

In [None]:
# Stop all running Docker containers
!docker kill $(docker ps -q)
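The same port check can also be done from Python, which is handy when `lsof` is not available. This is a minimal sketch using only the standard library; the function name `port_in_use` is made up for illustration.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


print("port 8080 in use:", port_in_use(8080))
```

If this prints `True` before the container is started, something else is still bound to the port.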

In [23]:
# Check that the service is running

import requests
import json

url = "http://127.0.0.1:8080/predict"

payload = json.dumps({
  "text": "That is a really a service, lets test it out!"
})
headers = {
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

{"prediction":"Positive","score":0.33751180768013}
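Right after `docker run` the service may need a few seconds before it answers. A hedged sketch of a polling helper, assuming the `/predict` endpoint used above returns HTTP 200 once the model is loaded; `service_is_up` and `wait_until` are hypothetical names, not part of the workshop code.

```python
import json
import time
import urllib.error
import urllib.request


def service_is_up(url: str = "http://127.0.0.1:8080/predict") -> bool:
    """Send one small POST and report whether the service answered with HTTP 200."""
    body = json.dumps({"text": "ping"}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not ready yet"
        return False


def wait_until(probe, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until(service_is_up)` before starting a load test avoids counting start-up failures as service errors.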



### 2. Run the first load test

In [13]:
%%writefile locustfile.py
from locust import HttpUser, task
import json

class ModelServiceUser(HttpUser):
    @task
    def test_task(self):
        payload = json.dumps(
            {"text": "That is a really bad service, I hate it!"})
        headers = {'Content-Type': 'application/json'}

        self.client.post(url="/predict",
                         headers=headers,
                         data=payload,
                         )    


Overwriting locustfile.py


In [28]:
# start locust in the command line
!locust --host http://127.0.0.1:8080 --users 100 --spawn-rate 1

[2022-04-03 21:22:48,084] C02YN2ASJGH6/INFO/locust.main: Starting web interface at http://0.0.0.0:8089 (accepting connections from all network interfaces)
[2022-04-03 21:22:48,101] C02YN2ASJGH6/INFO/locust.main: Starting Locust 2.8.5
[2022-04-03 21:23:38,667] C02YN2ASJGH6/INFO/locust.runners: Ramping to 100 users at a rate of 1.00 per second
KeyboardInterrupt
2022-04-03T19:25:22Z
[2022-04-03 21:25:22,125] C02YN2ASJGH6/INFO/locust.main: Shutting down (exit code 0)
 Name  # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------
 POST /predict    8917     0(0.00%)  |     343       9    1419     310  |  114.00    0.00
--------------------------------------------------------------------------------
 Aggregated    8917     0(0.00%)  |     343       9    1419     310  |  114.00    0.00

Response time percentiles (approximated)
 Type     Name      50%    66%    75%    80%    90%    95%    98%  

### 3. Optimize logs 

1. Go to main.py and replace `logging.basicConfig(level=logging.INFO)` with `logging.basicConfig(filename='debug.log', level=logging.DEBUG, format=f'%(asctime)s %(levelname)s %(name)s %(threadName)s : %(message)s')`
2. Go to service -> endpoints
3. Add the following imports:
   ```
   import time
   import logging
   ```
4. Wrap the inference call with timing information:
   ```
   start = time.time()
   keras_predictions = model.predict(predict_texts_to_sequences)
   logging.info(f'Inference took: {time.time()-start}sec')
   ```
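If several calls need timing, the start/stop pattern from step 4 can be wrapped in a context manager. This is only a sketch: `log_duration` is a made-up helper, and it uses `time.perf_counter()` instead of `time.time()` because it is better suited to measuring short intervals.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def log_duration(label: str, logger: logging.Logger = None):
    """Log how long the wrapped block took, even if it raises."""
    logger = logger or logging.getLogger()  # default: root logger, like logging.info
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s took: %.4fsec", label, elapsed)


# Usage inside the endpoint (model and inputs come from the service code):
# with log_duration("Inference"):
#     keras_predictions = model.predict(predict_texts_to_sequences)
```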


In [None]:
# Build the Docker image
!docker build final_version/full_service -t mlservice

In [None]:
# Run the container so that we can check its logs
docker run -p 8080:8080 mlservice:latest /bin/bash

In [24]:
# get container id
!docker ps

CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS         PORTS                    NAMES
fc5af661e280   mlservice:latest   "/bin/sh -c '/bin/ba…"   2 minutes ago   Up 2 minutes   0.0.0.0:8080->8080/tcp   happy_varahamihira


In [None]:
# Connect to the running container
!docker exec -it <CONTAINER ID> /bin/bash

In [None]:
# Run the following command inside the container to follow the logs
tail -f debug.log
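Watching the log scroll by is fine for a spot check; for a summary the timing lines can be parsed. A sketch, assuming `debug.log` has been copied out of the container and contains the `Inference took: ...sec` messages written in section 3. The helper names are illustrative.

```python
import re
import statistics

# Matches the message written by the timing code, e.g. "Inference took: 0.031sec"
INFERENCE_RE = re.compile(r"Inference took: ([0-9.]+(?:[eE][+-]?[0-9]+)?)\s*sec")


def inference_times(lines):
    """Extract inference durations (seconds) from log lines."""
    times = []
    for line in lines:
        match = INFERENCE_RE.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


def summarize(times):
    """Return count, mean, median and max of the durations in seconds."""
    return {
        "count": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "max": max(times),
    }


# Usage:
# with open("debug.log") as log_file:
#     print(summarize(inference_times(log_file)))
```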

### 4. Swap to TF Serving



**!!! IMPORTANT !!!** 
1. Rename folder `model/v1` to `model/1`
2. Change the path to the model in main.py accordingly

Find the following line in the Dockerfile:
```
RUN apt-get install -y curl python3 python3-pip
```
Below that line, add the following code to set up TF Serving (https://www.tensorflow.org/tfx/serving/setup):
```
# Install tf-serving
RUN echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
    curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
RUN  apt-get update && apt-get install -y tensorflow-model-server
```

In [27]:
# Build the Docker image
!docker build full_service -t mlservice

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.1s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 3.38kB                                     0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 34B                                           0.0s
[0m => [internal] load metadata for docker.io/library/ubuntu:20.04            0.0s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.2s (2/4)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 3.38kB                                     0.0s
[0m[34m => [internal] load .dockerignore                           

In [None]:
# Start the container in the background, then connect to it
docker run -p 8080:8080 mlservice:latest &

Verify that TF Serving is installed correctly:
1. Start the server: `tensorflow_model_server --port=8500 --rest_api_port=8501 --model_base_path=/app/model/ --model_name=model`
2. Send an inference request to the server: `curl --location --request POST 'http://localhost:8501/v1/models/model/versions/1:predict' --header 'Content-Type: application/json'  --data-raw '{"inputs": [[18.0,  64.0, 137.0, 163.0,   0.0]]}'`
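Before sending predictions it can help to confirm that the model has finished loading. As far as I know, TF Serving's REST API exposes a status document at `GET /v1/models/<name>` with a `model_version_status` list; treat that response shape as an assumption and check it against your server. A sketch with made-up helper names:

```python
import json
import urllib.request


def model_states(status_json: dict) -> dict:
    """Map each model version to its reported state, e.g. {"1": "AVAILABLE"}."""
    return {
        entry["version"]: entry["state"]
        for entry in status_json.get("model_version_status", [])
    }


def fetch_model_status(base_url: str = "http://localhost:8501", name: str = "model") -> dict:
    """GET the status document for one model from the TF Serving REST API."""
    with urllib.request.urlopen(f"{base_url}/v1/models/{name}", timeout=5) as response:
        return json.load(response)


# Usage (requires the server from step 1 to be running):
# print(model_states(fetch_model_status()))
```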

In [None]:
# Add the following function to service -> endpoints
import json

import numpy as np
import requests


def _tf_predict(inputs):
    """Send the first input row to TF Serving and return its outputs as an array."""
    url = "http://127.0.0.1:8501/v1/models/model/versions/1:predict"

    # Convert the first row to a plain list of Python floats for JSON
    data = inputs[0].astype(float).tolist()
    payload = json.dumps({"inputs": [data]})
    headers = {'Content-Type': 'application/json'}

    response = requests.request("POST", url, headers=headers, data=payload)
    json_data = response.json()

    return np.array(json_data['outputs'])

Change the inference line to call the new function:
`keras_predictions = model.predict(predict_texts_to_sequences)`  -> `keras_predictions = _tf_predict(predict_texts_to_sequences)`
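Note that `_tf_predict` forwards only the first row of its input. If the endpoint ever receives several texts at once, the request body can carry all rows. A sketch of the two JSON steps, under the assumption that TF Serving reports failures as a JSON object with an `error` field; both function names are hypothetical.

```python
import json


def build_predict_payload(rows) -> str:
    """Serialize every input row, not only the first, as a TF Serving request body."""
    # float() turns each value (including NumPy scalars) into a plain Python float
    return json.dumps({"inputs": [[float(value) for value in row] for row in rows]})


def parse_predict_response(body: str):
    """Return the "outputs" field, raising a clear error if the server reported one."""
    data = json.loads(body)
    if "error" in data:
        raise RuntimeError(f"TF Serving error: {data['error']}")
    return data["outputs"]
```

Raising on the `error` field gives a readable message instead of a `KeyError` on `outputs`.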

Add the following lines to start.sh so that TF Serving starts automatically whenever the container starts:

```
#!/bin/bash


# Start TF Serving in the background
tensorflow_model_server --port=8500 --rest_api_port=8501 --model_base_path=/app/model/ --model_name=model &

# Start the Flask service
exec python3 /app/main.py
```

In [None]:
# Stop the running containers, rebuild the image, and start a new container
docker kill $(docker ps -q) && docker build final_version/full_service -t mlservice && docker run -p 8080:8080 mlservice:latest &

In [30]:
# start locust in the command line 
!locust --host http://127.0.0.1:8080 --users 100 --spawn-rate 1

[2022-04-03 21:32:38,167] C02YN2ASJGH6/INFO/locust.main: Starting web interface at http://0.0.0.0:8089 (accepting connections from all network interfaces)
[2022-04-03 21:32:38,182] C02YN2ASJGH6/INFO/locust.main: Starting Locust 2.8.5
[2022-04-03 21:32:42,943] C02YN2ASJGH6/INFO/locust.runners: Ramping to 100 users at a rate of 1.00 per second
[2022-04-03 21:34:21,994] C02YN2ASJGH6/INFO/locust.runners: All users spawned: {"ModelServiceUser": 100} (100 total users)
KeyboardInterrupt
2022-04-03T19:38:33Z
[2022-04-03 21:38:33,472] C02YN2ASJGH6/INFO/locust.main: Shutting down (exit code 0)
 Name  # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
--------------------------------------------------------------------------------
 POST /predict   36181     0(0.00%)  |     743       9    1963     830  |  113.39    0.00
--------------------------------------------------------------------------------
 Aggregated   36181     0(0.00%)  |     743       9    1963     830  |  1
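The two Locust summaries can be put side by side with a few lines of arithmetic. This sketch uses only the `POST /predict` figures printed in this notebook; the dictionary names are made up. Keep in mind that the first run was interrupted less than two minutes after ramp-up began, while the second ran for several minutes at full load, so the comparison is indicative at best.

```python
# Figures copied from the two Locust summaries in this notebook (POST /predict)
in_process_run = {"avg_ms": 343, "median_ms": 310, "req_per_s": 114.00}
tf_serving_run = {"avg_ms": 743, "median_ms": 830, "req_per_s": 113.39}


def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after."""
    return (after - before) / before * 100.0


for metric in in_process_run:
    change = relative_change(in_process_run[metric], tf_serving_run[metric])
    print(f"{metric}: {in_process_run[metric]} -> {tf_serving_run[metric]} ({change:+.1f}%)")
```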

### 5. Performance tuning

In [29]:
%%writefile full_service/tf-serving-batching-parameters.txt
max_batch_size { value: 16 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 10000 }
num_batch_threads { value: 2 }
pad_variable_length_inputs: true

Writing full_service/tf-serving-batching-parameters.txt
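TF Serving reads a batching parameters file only when batching is enabled and the file is passed on the command line, via `--enable_batching=true` and `--batching_parameters_file`. A sketch that builds such a command from the flags already used in start.sh; the location `/app/tf-serving-batching-parameters.txt` inside the image is an assumption, so adjust it to wherever the Dockerfile copies the file.

```python
import shlex

# Assumption: the parameters file is copied into the image under /app
BATCHING_FILE = "/app/tf-serving-batching-parameters.txt"


def tf_serving_command(batching_file: str = BATCHING_FILE) -> list:
    """Build the tensorflow_model_server command line with request batching enabled."""
    return [
        "tensorflow_model_server",
        "--port=8500",
        "--rest_api_port=8501",
        "--model_base_path=/app/model/",
        "--model_name=model",
        "--enable_batching=true",
        f"--batching_parameters_file={batching_file}",
    ]


# Print the command in a form that can be pasted into start.sh
print(shlex.join(tf_serving_command()))
```

After changing start.sh this way, rebuild the image and rerun the Locust test to see whether batching changes the latency figures.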
