## Surviving and recovering from failures 

It's time to deploy and manage our model! Now that your client is happy with your work, we'll pick up from BLU13 and provide them with a small app to use it.

In [1]:
import json
import requests
import numpy as np
from uuid import uuid4
from sklearn.metrics import precision_score, recall_score

## Time to deploy! 

If you're reacting more or less like this 

<img src="media/model-deploy-unknown.png" width=300 />

do not panic. Even if you haven't fully internalized the last BLU, we provide template code to deploy the model we created in part 1, which you can also reuse as a starting point for the exercises. This code handles:

* deserialization of our model
* serving predictions 
* storage of observations
* update of observations

In the previous BLU you learned how to deploy to Heroku, and you will want to do that to serve your app. For the following topics, however, we will focus on testing locally.

<br>

### Deploying locally

What does this mean?

Well, it means we'll launch a server on our own machine, making it available for us to test, but not available to the world. You can reach this server at the address `127.0.0.1` or the hostname `localhost`. Both are reserved so that, on every machine, traffic sent to them is looped back to the machine itself.
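To make the idea of a reserved loopback address concrete, Python's standard library can tell us whether an address is a loopback one. This is only an illustrative check and not something the template server needs; the `loopback_flags` name is ours.

```python
import ipaddress

# 127.0.0.1 is the IPv4 loopback address and ::1 is its IPv6 counterpart;
# 8.8.8.8 is a regular public address, included for contrast
addresses = ["127.0.0.1", "::1", "8.8.8.8"]

loopback_flags = {
    address: ipaddress.ip_address(address).is_loopback
    for address in addresses
}

for address, is_loopback in loopback_flags.items():
    print(address, "-> loopback:", is_loopback)
```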


<img src="media/localhost-ben.png" width=450 />





Start by running the server we provide under `server.py`. Open a shell tab and go to the BLU folder. Once you're there, run the following to start up the template server.

```sh

python server.py


```

You should see something like this if everything went well:

<img src="media/flask-server-log.png" width="100%" />

Yes, even with that scary red warning, this is fine and you're ready to continue. The next thing we will do is send some requests to our server.


### Sending some observations

If you remember correctly we used the following columns:

* SubjectRaceCode
* SubjectSexCode
* SubjectEthnicityCode
* StatuteReason
* InterventionReasonCode
* ResidentIndicator
* SearchAuthorizationCode
* SubjectAge
* hour
* day_of_week

And the way we need to communicate to our server is by sending a json object such as:

```json

{
  "id": "your-observation-id",
  "observation": {
     "SubjectRaceCode": "W",
     "SubjectSexCode": "F",
     "SubjectEthnicityCode": "H",
     "StatuteReason": "Stop sign",
     "InterventionReasonCode": "V",
     "ResidentIndicator": false,
     "SearchAuthorizationCode": "N",
     "SubjectAge": 20,
     "hour": 20,
     "day_of_week": "Tuesday"
   }
}

```
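Keep in mind that JSON and Python literals differ slightly: JSON uses `true`, `false` and `null` where Python uses `True`, `False` and `None`, and JSON does not allow a trailing comma after the last item. You rarely need to convert by hand, since `json.dumps` and `json.loads` handle it. A small sketch with a made-up, partial observation:

```python
import json

# A Python dict uses Python literals (False, None)
python_observation = {"ResidentIndicator": False, "SubjectSexCode": None, "SubjectAge": 20}

# json.dumps converts them to their JSON equivalents (false, null)
as_json = json.dumps(python_observation)
print(as_json)

# json.loads converts them back
round_trip = json.loads(as_json)
print(round_trip == python_observation)
```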

We'll do this in 2 ways:

* by using cURL requests, which you used in the previous BLU
* by using the `requests` library from python

We'll start by creating our dummy observation and printing it as JSON so we can use it in curl:

In [2]:
observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }
}

print(json.dumps(observation))

{"id": "fake-observation-7c6b9988-6561-4943-a805-2e0d5504a743", "observation": {"SubjectRaceCode": "B", "SubjectSexCode": "M", "SubjectEthnicityCode": "N", "StatuteReason": "Stop sign", "InterventionReasonCode": "V", "ResidentIndicator": false, "SearchAuthorizationCode": "N", "SubjectAge": 20, "hour": 20, "day_of_week": "Tuesday"}}


Let's copy this and prepare our curl request:


```sh

curl -X POST http://localhost:5000/predict -d '{"id": "fake-observation-1bc5145b-d7b5-2688-95e8-21a378204133", "observation": {"SubjectRaceCode": "B", "SubjectSexCode": "M", "SubjectEthnicityCode": "N", "StatuteReason": "Stop sign", "InterventionReasonCode": "V", "ResidentIndicator": false, "SearchAuthorizationCode": "N", "SubjectAge": 20, "hour": 20, "day_of_week": "Tuesday"}}'  -H "Content-Type:application/json"


```

**Note**: every time you run the previous cell, a different ID will be generated. You can also choose to change your id if you want to perform the request again.

If you sent the request through cURL, re-run the observation cell and let's run the same request through the requests library. This should give you the exact same probability and prediction as the curl request.

In [3]:
observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.status_code)
print(r.text)

200
{"prediction":true,"proba":0.5323205446823367}



Pretty simple, right? Try out a few more requests and play around with both commands.

**Note**: You should get a status 200 and a proper response here. If not, make sure you ran the server as mentioned before and that you did so with the environment activated.

<br>

## Dealing with unexpected formats


But what happens if we get a weird observation? 

Let's start by changing our dictionary so that the `observation` is now `my_observation`.



In [4]:
# Bad format

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "my_observation": {
      "SubjectRaceCode": "B",
      "SubjectSexCode": None,
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.status_code)
print(r.text)

500
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>



<img src="media/michael-surprised.jpg" width=500 />


Ok ok, your users know what they should send.

But what about if they are missing some columns?


In [5]:
# Missing columns

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectAge": 20,
      "hour": 20
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.status_code)
print(r.text)

200
{"prediction":true,"proba":0.5021033711778715}




<img src="media/michael-nope.jpg" width=500 />

We got a 200, but in reality our model had almost no information about the observation. How are we supposed to interpret this prediction?

Maybe it is useful to see what happens if we send just complete nonsense? 

In [6]:
# Nonsense values that don't break the request

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectRaceCode": "A",
      "SubjectSexCode": "B",
      "SubjectEthnicityCode": "C",
      "StatuteReason": "D", 
      "InterventionReasonCode": "E", 
      "ResidentIndicator": "F", 
      "SearchAuthorizationCode": "F",
      "SubjectAge": 1,
      "hour": 90,
      "day_of_week": "potato",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.status_code)
print(r.text)

200
{"prediction":true,"proba":0.5036075609758189}




<img src="media/michael-cringe.png" width=500 />

Getting a 200 is not always good. In fact, if someone just sends completely random values and gets a probability and prediction, not only are they misled, but since our system is also storing these observations, we will get polluted data. 

**Most of the time, silent errors are worse than explicit ones** 

Let's do one more just so you get the full picture:

In [7]:
# More nonsense values that break the request

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": "twenty",
      "hour": 20,
      "day_of_week": "Tuesday",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.status_code)
print(r.text)

500
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>



<img src="media/michael-tired.jpg" width=500 />

Alright, alright, I'll stop. By this point, you should have an idea of the different ways input data can be bad. So what do we do?


### Handling input 


We protect the code. On one hand, we obviously don't want to send obscure errors to our client. On the other hand, we don't want to store data that doesn't make sense. So neither the cryptic 500s nor the silent 200s above are acceptable.

Let's retrieve the map we did with our known values from the last unit:


In [8]:
known_categories = {
    "InterventionReasonCode": {"values": ["V", "E", "I"], "default": None},
    "SubjectRaceCode": {"values": ["W", "B", "A", "I"], "default": None},
    "SubjectSexCode": {"values": ["M", "F"], "default": None},
    "SubjectEthnicityCode": {"values": ["H", "M", "N"], "default": "N"},
    "SearchAuthorizationCode": {"values": ["O", "I", "C", "N"], "default": "N"},
    # We can use it also for booleans!
    "TownResidentIndicator": {"values": [True, False]}, 
    "ResidentIndicator": {"values": [True, False]},
    "VehicleSearchedIndicator": {"values": [True, False]},
    "ContrabandIndicator": {"values": [True, False]},
}


We're going to make some slight modifications. Since we ended up using only a subset of our features and we actually augmented it with some others, we'll change the map to reflect this. Additionally, we'll keep only the allowed values and forget about any defaults for now.

Notice that we are not handling numeric values, only categorical (including boolean) values in this map. We'll go back to the numerical values later on.


In [9]:
valid_categories = {
    "InterventionReasonCode": ["V", "E", "I"],
    "SubjectRaceCode": ["W", "B", "A", "I"],
    "SubjectSexCode": ["M", "F"],
    "SubjectEthnicityCode": ["H", "M", "N"],
    "SearchAuthorizationCode": ["O", "I", "C", "N"],
    "ResidentIndicator": [True, False],
    "StatuteReason": [
        'Stop Sign', 'Other', 'Speed Related', 'Cell Phone', 'Traffic Control Signal', 'Defective Lights', 
        'Moving Violation', 'Registration', 'Display of Plates', 'Equipment Violation', 'Window Tint', 
        'Suspended License', 'Seatbelt', 'Other/Error', 'STC Violation', 'Administrative Offense', 'Unlicensed Operation'], 
    "day_of_week": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
}


What we want to do is verify, when we receive the input, that it is:

1. a valid input, with an `id` and an `observation`
2. valid columns under the observation
3. if categorical, a valid category within its column

We can build some small functions to do that for us.

In [10]:
def check_request(request):
    """
        Validates that our request is well formatted
        
        Returns:
        - assertion value: True if request is ok, False otherwise
        - error message: empty if request is ok, a description of the problem otherwise
    """
    
    if "id" not in request:
        error = "Field `id` missing from request: {}".format(request)
        return False, error
    
    if "observation" not in request:
        error = "Field `observation` missing from request: {}".format(request)
        return False, error
    
    return True, ""



def check_valid_column(observation):
    """
        Validates that our observation only has valid columns
        
        Returns:
        - assertion value: True if all provided columns are valid, False otherwise
        - error message: empty if all provided columns are valid, a description of the problem otherwise
    """
    
    valid_columns = {
      "SubjectRaceCode",
      "SubjectSexCode",
      "SubjectEthnicityCode",
      "StatuteReason", 
      "InterventionReasonCode", 
      "ResidentIndicator", 
      "SearchAuthorizationCode",
      "SubjectAge",
      "hour",
      "day_of_week",
    }
    
    keys = set(observation.keys())
    
    if len(valid_columns - keys) > 0: 
        missing = valid_columns - keys
        error = "Missing columns: {}".format(missing)
        return False, error
    
    if len(keys - valid_columns) > 0: 
        extra = keys - valid_columns
        error = "Unrecognized columns provided: {}".format(extra)
        return False, error    

    return True, ""



def check_categorical_values(observation):
    """
        Validates that all categorical fields are in the observation and values are valid
        
        Returns:
        - assertion value: True if all provided categorical columns contain valid values, 
                           False otherwise
        - error message: empty if all categorical values are valid, a description of the problem otherwise
    """
    
    valid_category_map = {
        "InterventionReasonCode": ["V", "E", "I"],
        "SubjectRaceCode": ["W", "B", "A", "I"],
        "SubjectSexCode": ["M", "F"],
        "SubjectEthnicityCode": ["H", "M", "N"],
        "SearchAuthorizationCode": ["O", "I", "C", "N"],
        "ResidentIndicator": [True, False],
        "StatuteReason": [
            'Stop Sign', 'Other', 'Speed Related', 'Cell Phone', 'Traffic Control Signal', 'Defective Lights', 
            'Moving Violation', 'Registration', 'Display of Plates', 'Equipment Violation', 'Window Tint', 
            'Suspended License', 'Seatbelt', 'Other/Error', 'STC Violation', 'Administrative Offense', 'Unlicensed Operation'], 
        "day_of_week": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    }
    
    for key, valid_categories in valid_category_map.items():
        if key in observation:
            value = observation[key]
            if value not in valid_categories:
                error = "Invalid value provided for {}: {}. Allowed values are: {}".format(
                    key, value, ",".join(["'{}'".format(v) for v in valid_categories]))
                return False, error
        else:
            error = "Categorical field {} missing".format(key)
            return False, error

    return True, ""


Let's try out our functions:

#### Check request structure


In [11]:
check_request({"id": "fake-id", "observation": "fake-obs"})


(True, '')

In [12]:
check_request({"bad_id": "fake-id", "observation": "fake-obs"})


(False,
 "Field `id` missing from request: {'bad_id': 'fake-id', 'observation': 'fake-obs'}")

#### Check observation


In [13]:
observation = {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }

check_valid_column(observation)

(True, '')

In [14]:
observation = {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }

check_valid_column(observation)

(False, "Missing columns: {'SearchAuthorizationCode'}")

In [15]:
observation = {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SomeRandomColumn": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }

check_valid_column(observation)

(False, "Unrecognized columns provided: {'SomeRandomColumn'}")

#### Check categories


In [16]:
observation = {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop Sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SomeRandomColumn": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }

check_categorical_values(observation)

(True, '')

In [17]:
observation = {
      "SubjectRaceCode": "t",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SomeRandomColumn": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }

check_categorical_values(observation)

(False,
 "Invalid value provided for SubjectRaceCode: t. Allowed values are: 'W','B','A','I'")

In [18]:
observation = {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": "COUCH POTATO", 
      "SearchAuthorizationCode": "N",
      "SomeRandomColumn": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }

check_categorical_values(observation)

(False,
 "Invalid value provided for ResidentIndicator: COUCH POTATO. Allowed values are: 'True','False'")

<br>
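One subtlety with the membership test used for `ResidentIndicator`: in Python `1 == True` and `0 == False`, so `1 in [True, False]` evaluates to `True` and an integer would slip through the boolean check. If that matters for your data, a stricter check is possible. The helper below is hypothetical (it is not part of the provided server code) and only sketches the idea:

```python
def is_strict_bool(value):
    """Hypothetical helper: accept only real booleans, not 0/1 look-alikes."""
    return isinstance(value, bool)


# Membership uses ==, so integers equal to True/False are accepted
print(1 in [True, False])       # True
print(is_strict_bool(1))        # False
print(is_strict_bool(False))    # True
```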

#### Numerical values

What about our numeric values, `hour` and `SubjectAge`? Each has its own set of conditions that make sense to apply, so let's create some functions to perform similar verifications:



In [19]:
def check_hour(observation):
    """
        Validates that observation contains valid hour value 
        
        Returns:
        - assertion value: True if hour is valid, False otherwise
        - error message: empty if hour is valid, a description of the problem otherwise
    """
    
    hour = observation.get("hour")
        
    # Compare against None: `not hour` would also reject midnight (hour 0)
    if hour is None:
        error = "Field `hour` missing"
        return False, error

    if not isinstance(hour, int):
        error = "Field `hour` is not an integer"
        return False, error
    
    if hour < 0 or hour > 24:
        error = "Field `hour` is not between 0 and 24"
        return False, error

    return True, ""


def check_age(observation):
    """
        Validates that observation contains valid age value 
        
        Returns:
        - assertion value: True if age is valid, False otherwise
        - error message: empty if age is valid, a description of the problem otherwise
    """
    
    age = observation.get("SubjectAge")
        
    # Compare against None so that an age of 0 is reported as out of range, not missing
    if age is None:
        error = "Field `SubjectAge` missing"
        return False, error

    if not isinstance(age, int):
        error = "Field `SubjectAge` is not an integer"
        return False, error
    
    if age < 10 or age > 100:
        error = "Field `SubjectAge` is not between 10 and 100"
        return False, error

    return True, ""


In [20]:
observation = {
      "hour": 20,
      "day_of_week": "Tuesday",
  }

check_hour(observation)

(True, '')

In [21]:
observation = {
      "hour": 100,
      "day_of_week": "Tuesday",
  }

check_hour(observation)

(False, 'Field `hour` is not between 0 and 24')

In [22]:
observation = {
      "SubjectAge": 20,
      "day_of_week": "Tuesday",
  }

check_age(observation)

(True, '')
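Since `check_hour` and `check_age` share the same structure, they could be folded into one generic helper. The function below is a hypothetical sketch (the name `check_int_in_range` and its parameters are ours, not part of the provided code). It also rejects booleans explicitly, because `isinstance(True, int)` is `True` in Python, and it treats `0` as a value that is present.

```python
def check_int_in_range(observation, field, low, high):
    """
    Hypothetical generic version of check_hour / check_age.

    Returns an (is_valid, error_message) tuple like the checks above.
    """
    value = observation.get(field)

    if value is None:
        return False, "Field `{}` missing".format(field)

    # bool is a subclass of int, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        return False, "Field `{}` is not an integer".format(field)

    if value < low or value > high:
        return False, "Field `{}` is not between {} and {}".format(field, low, high)

    return True, ""


print(check_int_in_range({"hour": 0}, "hour", 0, 24))
print(check_int_in_range({"hour": True}, "hour", 0, 24))
print(check_int_in_range({"SubjectAge": 5}, "SubjectAge", 10, 100))
```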

### Putting it all together

Now we can run our server with these functions. 

Run the code under `protected_server.py`

```sh

python protected_server.py


```
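We don't reproduce `protected_server.py` here, but the general idea of wiring the checks into an endpoint can be sketched as follows. This is an assumption about the structure, not a copy of the file: `first_error` and the two stand-in checks are hypothetical names that exist only to keep the sketch runnable on its own.

```python
def first_error(payload, request_checks, observation_checks):
    """
    Hypothetical helper: run the checks in order and return the first
    error message, or None when everything passes.
    """
    # Checks on the whole request; these should confirm `observation` exists
    for check in request_checks:
        is_ok, error = check(payload)
        if not is_ok:
            return error

    # Checks on the observation itself
    observation = payload["observation"]
    for check in observation_checks:
        is_ok, error = check(observation)
        if not is_ok:
            return error

    return None


# Tiny stand-in checks so the sketch runs without the notebook's functions
def has_observation(payload):
    if "observation" not in payload:
        return False, "Field `observation` missing"
    return True, ""


def has_hour(observation):
    if "hour" not in observation:
        return False, "Field `hour` missing"
    return True, ""


print(first_error({"id": "a"}, [has_observation], [has_hour]))
print(first_error({"id": "a", "observation": {}}, [has_observation], [has_hour]))
print(first_error({"id": "a", "observation": {"hour": 3}}, [has_observation], [has_hour]))
```

With the functions from this notebook, you would pass `[check_request]` as the request checks and `[check_valid_column, check_categorical_values, check_hour, check_age]` as the observation checks, then return the error to the client instead of calling the model whenever the result is not `None`.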


And try out the same examples as before to see what the server returns to us:


In [23]:
# Bad format

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "my_observation": {
      "SubjectRaceCode": "B",
      "SubjectSexCode": None,
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.text)

{"error":"Field `observation` missing from request: {'id': 'fake-observation-21571c72-0104-4c11-95c7-914bf2954d2b', 'my_observation': {'SubjectRaceCode': 'B', 'SubjectSexCode': None, 'SubjectEthnicityCode': 'N', 'StatuteReason': 'Stop sign', 'InterventionReasonCode': 'V', 'ResidentIndicator': False, 'SearchAuthorizationCode': 'N', 'SubjectAge': 20, 'hour': 20, 'day_of_week': 'Tuesday'}}"}



In [24]:
# Missing columns

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectRaceCode": "B",
      "StatuteReason": "Stop sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": 20,
      "hour": 20,
      "day_of_week": "Tuesday",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.text)


{"error":"Missing columns: {'SubjectSexCode', 'SubjectEthnicityCode'}"}



In [25]:
# Nonsense values that don't break the request

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectRaceCode": "A",
      "SubjectSexCode": "B",
      "SubjectEthnicityCode": "C",
      "StatuteReason": "D", 
      "InterventionReasonCode": "E", 
      "ResidentIndicator": "F", 
      "SearchAuthorizationCode": "F",
      "SubjectAge": 1,
      "hour": 90,
      "day_of_week": "potato",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.status_code)
print(r.text)

200
{"error":"Invalid value provided for SubjectSexCode: B. Allowed values are: 'M','F'"}



In [26]:
# Nonsense values that break the request

observation = {
  "id": "fake-observation-{}".format(uuid4()),
  "observation": {
      "SubjectRaceCode": "B",
      "SubjectSexCode": "M",
      "SubjectEthnicityCode": "N",
      "StatuteReason": "Stop Sign", 
      "InterventionReasonCode": "V", 
      "ResidentIndicator": False, 
      "SearchAuthorizationCode": "N",
      "SubjectAge": "twenty",
      "hour": 20,
      "day_of_week": "Tuesday",
  }
}


url="http://127.0.0.1:5000/predict"
headers = {'Content-Type': 'application/json'}

r = requests.post(url, data=json.dumps(observation), headers=headers)

print(r.status_code)
print(r.text)

200
{"error":"Field `SubjectAge` is not an integer"}




<img src="media/great_success.jpg" width=400 />


Now we have a somewhat more robust server. Try out other examples to see if you can break it!


#### A final note

When designing APIs, there are proper status codes to use for each type of error. For example, here the correct code might be 422 (`Unprocessable Entity`), while for the duplicate-id error we coded previously we may want to return a 409 (`Conflict`). However, you don't need to know these for now.
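If you are curious, Python's standard library already names these codes in `http.HTTPStatus`. The sketch below pairs an error body with a status code; `error_response` is a hypothetical helper and the second error message is made up for illustration. In Flask, a view can return such a `(body, status)` pair instead of a bare body, which is one way the server could stop answering errors with a 200.

```python
from http import HTTPStatus


def error_response(message, status):
    """Hypothetical helper: pair an error body with an HTTP status code."""
    return {"error": message}, int(status)


# Invalid input: the request is understood but cannot be processed
body, code = error_response("Field `SubjectAge` is not an integer",
                            HTTPStatus.UNPROCESSABLE_ENTITY)
print(code, body)

# Duplicate id: the request conflicts with what is already stored
body, code = error_response("Observation id already exists", HTTPStatus.CONFLICT)
print(code, body)
```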

<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418"><img src="media/418_teapot.jpeg" width=400 /></a>

_Who said programmers are no fun?_


<br>

## Monitoring and maintaining the system

So you've deployed your model, you made sure to protect it from bad input, you even found a bunch of other potential sources of issues and protected against them. 

What next?


### Uptime and surviving failures


Even when you are 100% sure you did everything right, things can go wrong. After all, just because your model is deployed on the "cloud" doesn't mean it doesn't rely on actual machines and underlying resources that you need to be aware of. All of those, even though you don't manage them directly, can cause problems with your service.

<img src="media/cloud-what-if-i-told-you.png" width=450 />

<br>

So, how does the average developer handle all of this?


### Logging, metrics, monitoring and alerting


#### Logging

One way to better understand what is happening in your system is to create logs. Logs are just textual pieces of information that show what the app is doing. They can be emitted from different points in your code, and the underlying libraries you use typically emit them too.

When you create an app in Python, for example, errors and their tracebacks (where the error came from) are written to STDERR. You can consider that one type of logging. However, these are not enough, and we typically create more logs in important parts of our application. You will learn more about this in BLU15.
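As a small taste, here is a minimal sketch using Python's built-in `logging` module. The logger name `prediction_app`, the `log_prediction` helper and the file path are made up for illustration; they are not part of the provided server.

```python
import logging

# Configure the root logger once, near the start of the app
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

logger = logging.getLogger("prediction_app")  # hypothetical logger name


def log_prediction(observation_id, proba):
    """Log one line per served prediction."""
    logger.info("Served prediction for id=%s with proba=%.3f", observation_id, proba)


log_prediction("fake-observation-1", 0.532)

try:
    # Made-up path that does not exist, to trigger an error on purpose
    open("does-not-exist/pipeline.pickle", "rb")
except FileNotFoundError:
    # logger.exception records the message plus the full traceback
    logger.exception("Could not load the model pipeline")
```

By default these messages go to STDERR, so they show up in the same place as the tracebacks mentioned above.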

At the simplest level, it's important to learn where to check for your logs, so you can debug quickly when your app is not working. For example, let's say we deploy an app but pass the wrong paths to the necessary model elements (like `columns.json`, `pipeline.pickle`, etc). When we deploy the app it will crash. By accessing the logs we would be able to check why that is:


<img src="media/heroku-logs-error.png" width=1000 />

We provide a small guide on how to check logs in Heroku under `./BLU14 - Extra: Checking logs in heroku.md`


#### Metrics 

Metrics represent the raw measurements you can expose and collect in your system. They can be generated at many levels, from the operating system where you are running your service to the application that you are running. At the system level, for example, you can measure most resource-related metrics: 

* how many **CPUs** you are using
* how much **memory** you are using
* how much **disk space** you are using

Another dependency of your system that can cause problems is the database, where you can measure different aspects:

* how many **connections to the database** are open
* the size of the database, that is, the amount of **disk space** it is using

Additionally, you may want to measure particular aspects of the application, for example, related to the requests your server is receiving. In this group you could measure the following:

* how much time each request took - also known as **request latency**
* how many requests were done to the server
* how many requests resulted in 200 codes (OK)
* how many requests resulted in error codes

Finally, for each problem, there will be a number of "business level" metrics, and these can overlap with some of the ones mentioned or be completely different. For example, part of our requirements can be that the server should respond in less than 20 milliseconds, which overlaps with the request latency mentioned in the points above.
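To give a feeling for what collecting request metrics could look like, here is a minimal in-memory sketch. Real services usually rely on a dedicated metrics library or platform; `timed_call` and `fake_predict` are hypothetical names, and the handler is assumed to return a `(body, status_code)` tuple.

```python
import time
from collections import Counter

status_counts = Counter()   # how many responses per status code
latencies_ms = []           # one latency measurement per request


def timed_call(handler, *args):
    """
    Hypothetical wrapper: call `handler`, then record its latency and status code.
    `handler` is assumed to return a (body, status_code) tuple.
    """
    start = time.perf_counter()
    body, status_code = handler(*args)
    latencies_ms.append((time.perf_counter() - start) * 1000)
    status_counts[status_code] += 1
    return body, status_code


def fake_predict(observation):
    # Stand-in for a real endpoint
    if "hour" not in observation:
        return {"error": "Field `hour` missing"}, 422
    return {"prediction": True}, 200


timed_call(fake_predict, {"hour": 20})
timed_call(fake_predict, {})

print("requests:", sum(status_counts.values()))
print("by status:", dict(status_counts))
print("max latency (ms):", max(latencies_ms))
```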

#### Final note: Model Metrics 

Besides all of the metrics above which are pretty standard, typically when you serve machine learning models, there are also other metrics you can expose. This depends a lot on the models you are using and on your input, but you could expose as metrics things such as:

* the **feature distribution** for each feature - hour, sex, age, and so on... 
* the **prediction distribution** of your model
* the **probability distribution** of your model

These can help you be quick to identify data shifts or problems with your model.
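As an illustration, the prediction and probability distributions could be tracked with a couple of counters. This is only a sketch under our own assumptions: `record_prediction` is a hypothetical helper, the bucket width of 0.1 is arbitrary, and the sample values are made up.

```python
from collections import Counter

prediction_counts = Counter()   # how often each class is predicted
proba_buckets = Counter()       # histogram of predicted probabilities


def record_prediction(prediction, proba, n_buckets=10):
    """
    Hypothetical helper: count the predicted class and the probability
    bucket (0.0-0.1, 0.1-0.2, ...) that the score falls into.
    """
    prediction_counts[prediction] += 1
    # A probability of exactly 1.0 goes into the last bucket
    bucket = min(int(proba * n_buckets), n_buckets - 1)
    proba_buckets[bucket] += 1


for prediction, proba in [(True, 0.53), (True, 0.50), (False, 0.12), (True, 1.0)]:
    record_prediction(prediction, proba)

print(dict(prediction_counts))
print(sorted(proba_buckets.items()))
```

If the share of positive predictions or the shape of the histogram drifts away from what you saw at training time, that is a hint that the incoming data may have changed.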

Overall, emitting or exposing these metrics is known as **instrumenting** your services.


<br> 

#### Monitoring

Once you are outputting your metrics, you still need a way to crunch them into meaningful insights. Obviously you are not going to go through, or manually curate, thousands and thousands of measurements.

The process of collecting and aggregating these measures, while potentially providing some sort of analytics on them, is called **monitoring**. There are a number of tools out there that provide this functionality, giving you a place to store the metrics aggregated and then set visualizations on those metrics. They can also provide you the ability to set up thresholds or other conditions that, when met, trigger some sort of response.

In more complex applications, the monitoring might even have specific systems of its own that themselves need to be monitored. But for now, think of it just as a side system that can provide you with visibility over the things you want to measure.


<img src="media/monitoring-monitors.png" width=500 />


<br> 

#### Alerting

Finally, the last step in tracking the health of your app is alerting. As mentioned before, some monitoring tools already offer integrated alerting functionality, so you can set up certain conditions that alert a responsible person.

While you may want certain responses to be even more automated and trigger a specific programmatic action, the main purpose of alerting is really to bring a person's attention to the status of the system. Usually, alerts carry information that allows the watcher to take the proper action to fix the problem or, at the very least, gives them a starting point to investigate what the problem is.

An example of a potential dashboard you could set up for monitoring and alerting is shown below. There, the red lines represent thresholds that when passed would trigger alerts. For example, if the response time goes above a given number, then we want to be notified and potentially investigate the problem.

<img src="media/monitor-metrics-example.png" width=1000 />


<br>
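The threshold logic behind such alerts is simple to sketch. The example below is hypothetical: `find_alerts` is our own name, the error-rate limit and the current readings are made up, and a real setup would send a notification rather than print. The 20 ms latency limit echoes the business requirement mentioned in the metrics section.

```python
def find_alerts(current_metrics, thresholds):
    """
    Hypothetical helper: compare the latest metric values with their thresholds
    and return one message per metric that went over its limit.
    """
    alerts = []
    for name, limit in thresholds.items():
        value = current_metrics.get(name)
        if value is not None and value > limit:
            alerts.append("{} is {} (threshold: {})".format(name, value, limit))
    return alerts


thresholds = {"latency_ms": 20, "error_rate": 0.05}         # limits we choose
current_metrics = {"latency_ms": 35, "error_rate": 0.01}    # made-up readings

for alert in find_alerts(current_metrics, thresholds):
    print("ALERT:", alert)
```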

All of these things together allow you to have visibility over your system and make sure it's healthy. But with each error or problem, you are still the one who needs to improve the code and make it more reliable, which brings us to our final section.

### Testing

If you write up a program, even when you are super careful and think you did everything 100% right, usually you try to run it and something is wrong. This is true, even just locally, without accounting for all the system components and potentially bad input.

> Anything that can go wrong, will go wrong

(this is called Murphy's law)

That's why we normally perform some sort of testing on our server and model before we actually share it with the client. This usually happens before you deploy the model to what is called the production environment, that is, the environment where your app is available to the client. It even happens before you try to run your server locally.

There are several layers of testing that you can apply, and QA is a whole discipline by itself, but here we want to introduce you to the most common one: **unit testing**.

#### Unit tests

Unit tests focus on a small part of your code, usually a well-isolated one. Let's take one of our functions from a previous notebook:


In [27]:
def verify_success_rate_above(y_true, y_pred, min_success_rate=0.5):
    """
    Verifies the success rate on a test set is above a provided minimum
    """
    
    precision = precision_score(y_true, y_pred, pos_label=True)
    is_satisfied = (precision >= min_success_rate)
    
    return is_satisfied, precision


The function description is `Verifies the success rate on a test set is above a provided minimum`. So how would we go about defining our tests for it? 

We can think of scenarios that we know should satisfy the condition and scenarios that shouldn't. For example, if I have a vector where all my predictions are right, and my minimum success rate is 0.5, then my function should inform me that the condition is satisfied and return a precision of 1. On the other hand, if I pass an array that is completely wrong, I should get the opposite: my condition is not satisfied and the precision should be 0.

We could also decide on more fine-grained cases for which we know what precision values and outcome we should get.

For example, we could write the following:


In [28]:
def test_verify_success_rate_above():
    
    # Test correct vector returns success and 1 precision
    y = np.array([1.0, 1.0, 0.0, 0.0])
    is_satisfied, precision = verify_success_rate_above(y, y)
    assert is_satisfied == True
    assert precision == 1.0
    
    # Test wrong vector returns unsuccessful and 0 precision
    y_true = np.array([1.0, 1.0, 0.0, 0.0])
    y_pred = np.array([0.0, 0.0, 1.0, 1.0])
    is_satisfied, precision = verify_success_rate_above(y_true, y_pred)
    assert is_satisfied == False
    assert precision == 0.0
    
    # Test 3 of the 4 positive predictions are correct (precision 0.75)
    y_true = np.array([0.0, 1.0, 1.0, 1.0, 0.0])
    y_pred = np.array([0.0, 1.0, 1.0, 1.0, 1.0])
    is_satisfied, precision = verify_success_rate_above(y_true, y_pred)
    assert is_satisfied == True
    assert precision == 0.75


test_verify_success_rate_above()

But how would this help us catch a problem? Let's say we wrote our tests after a lot of thinking and then went on to write the function. But for some reason, we got confused and picked the wrong metric:

In [29]:
# Function with "bug"
def verify_wrong_success_rate_above(y_true, y_pred, min_success_rate=0.5):
    """
    Verifies the success rate on a test set is above a provided minimum
    """

    precision = recall_score(y_true, y_pred, pos_label=True)
    is_satisfied = (precision >= min_success_rate)
    
    return is_satisfied, precision


# Tests
def test_verify_success_rate_above():
    
    # Test correct vector returns success and 1 precision
    y = np.array([1.0, 1.0, 0.0, 0.0])
    is_satisfied, precision = verify_wrong_success_rate_above(y, y)
    assert is_satisfied == True
    assert precision == 1.0
    
    # Test wrong vector returns unsuccessful and 0 precision
    y_true = np.array([1.0, 1.0, 0.0, 0.0])
    y_pred = np.array([0.0, 0.0, 1.0, 1.0])
    is_satisfied, precision = verify_wrong_success_rate_above(y_true, y_pred)
    assert is_satisfied == False
    assert precision == 0.0
    
    # Test 3 of the 4 positive predictions are correct (precision 0.75)
    y_true = np.array([0.0, 1.0, 1.0, 1.0, 0.0])
    y_pred = np.array([0.0, 1.0, 1.0, 1.0, 1.0])
    is_satisfied, precision = verify_wrong_success_rate_above(y_true, y_pred)
    assert is_satisfied == True
    assert precision == 0.75


test_verify_success_rate_above()

AssertionError: 

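Your tests would actually break. The first two cases still pass, because precision and recall agree on them, but the last case expects 0.75 and the recall for those vectors is 1.0.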
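To see why only the last test case catches the bug, the counts can be worked out by hand. The sketch below uses plain Python lists instead of `sklearn`, and `count_outcomes` is a helper invented for this illustration.

```python
def count_outcomes(y_true, y_pred):
    """Count true positives, false positives and false negatives for label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


# Same vectors as the last test case.
y_true = [0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 1]

tp, fp, fn = count_outcomes(y_true, y_pred)

precision = tp / (tp + fp)  # 3 / (3 + 1)
recall = tp / (tp + fn)     # 3 / (3 + 0)

print(precision, recall)  # 0.75 1.0
```

The single false positive lowers precision but leaves recall untouched, so a test case with no false positives or false negatives would not have told the two metrics apart.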

Writing tests forces you to think about the different scenarios that may happen. When done before writing the code, it usually helps you better understand what code you should actually write. Writing tests before the code is called **test-driven development**. For each piece of code you are writing, there is an intended behavior, and this behavior can be put into tests.

If you think about it, a lot of the assertions the instructors write to test your code in the exercise notebooks could be packed into unit tests. They describe the expected behavior of your function and enforce it by failing if the behavior is wrong.

Unit tests help you a lot to make sure your server is robust, but of course you should also try to test the end to end behavior and try out your app locally before you deploy it. The difference is, unit tests can be easily automated and will prevent changes from breaking previously defined behaviors. 
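As a rough sketch of what automating these tests could look like, the same three scenarios can be packed into a `unittest.TestCase` from the Python standard library. To keep the example free of dependencies, `precision` below is a hand-written stand-in for `precision_score`, and its choice to return 0.0 when nothing is predicted positive is an assumption of this sketch. The class name and `run_tests` helper are also made up for illustration.

```python
import unittest


def precision(y_true, y_pred):
    """Hand-written stand-in for precision_score, with 1 as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_positive = sum(1 for p in y_pred if p == 1)
    # Assumption of this sketch: return 0.0 when nothing is predicted positive.
    return tp / predicted_positive if predicted_positive else 0.0


def verify_success_rate_above(y_true, y_pred, min_success_rate=0.5):
    """Verifies the success rate on a test set is above a provided minimum."""
    value = precision(y_true, y_pred)
    return value >= min_success_rate, value


class TestVerifySuccessRateAbove(unittest.TestCase):
    def test_all_correct(self):
        y = [1, 1, 0, 0]
        is_satisfied, value = verify_success_rate_above(y, y)
        self.assertTrue(is_satisfied)
        self.assertEqual(value, 1.0)

    def test_all_wrong(self):
        is_satisfied, value = verify_success_rate_above([1, 1, 0, 0], [0, 0, 1, 1])
        self.assertFalse(is_satisfied)
        self.assertEqual(value, 0.0)

    def test_three_of_four_predictions_correct(self):
        is_satisfied, value = verify_success_rate_above(
            [0, 1, 1, 1, 0], [0, 1, 1, 1, 1]
        )
        self.assertTrue(is_satisfied)
        self.assertEqual(value, 0.75)


def run_tests():
    """Run the test case programmatically, which also works inside a notebook."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestVerifySuccessRateAbove)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print(result.wasSuccessful())
```

Splitting the scenarios into separate test methods means a failure report names the exact scenario that broke, rather than stopping at the first failed `assert`.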

<br>

And that's it. We will not dive too deep into these topics, but keep in mind that they are important for any real-life app that you may work on. So if you take something out of this: monitor your servers, and always, always, test your code before you ship it to production.

<img src="media/testing-in-production.png" width=400 />


As a final note, don't forget to check out some additional practical guides we provide you so that you can develop your apps under heroku:

* `./BLU14 - Extra: Checking logs in heroku.md`
* `./BLU14 - Extra: Restarting your heroku app.md`
* `./BLU14 - Extra: Resetting your database in heroku.md`
