# Shift vs drift

You have learned how models deteriorate and your options for detecting performance drops.

Can you tell which of the following statements are true and which are false?

![Answer](images/ch_04-01.png)

# Latency

You learned about an important term called "verification latency".

When the verification latency is on the order of months, your only option is to resort to input data monitoring as an indirect way to detect model deterioration in time.

Imagine now that you are building an ML-based application that predicts stock prices based on data collected from various news portals and APIs on the Internet. Whenever you make a prediction, the ground truth is available in a matter of seconds.

Does it still make sense to implement input data monitoring in this use case?

### Possible Answers


    Yes, but only if the ground truth is extremely expensive to obtain quickly.
    
    
    No. Nothing beats the ground truth, and if it's quickly available, it makes no sense to use less accurate workarounds.
    
    
    It always makes sense to implement input data monitoring. {Answer}

**Based on the ground truth only, we might get the impression that our model has deteriorated, while, in fact, our data pipeline is broken and feeding corrupted inputs to our model. So input monitoring should always be included in your overall approach to monitoring models in production.**

# Already?

Imagine the following scenario: You have trained a model using a full year of labeled data and deployed it into production.

One day after deployment, you check your data monitoring dashboard and see the following picture, where the training inputs are shown as blue dots and the production inputs as orange dots.

![train vs prod data](images/which%20type%20of%20shift%20-%20MLOps.png)

What conclusion can you draw with certainty?

### Possible Answers


    We have a clear case of covariate shift and, consequently, concept drift as well.
    
    
    Covariate shift has obviously occurred, with potential concept drift to be verified after obtaining the ground truth.
    
    
    Concept drift has certainly occurred, but we need the ground truth to check if we have a covariate shift as well.
    
    
    We can make no certain conclusion based on comparing one day of production data with a full year of training data. Moreover, the differences in the distributions are not drastic in any way. {Answer}

# The monitoring system

By now, it is clear that an ML application is much more than just the model and that each component can be a point of failure.

For that reason, a comprehensive monitoring system is of utmost importance for your ability to capture and resolve issues in production quickly.

What are some fundamental features that such a system should have?

![Answer](images/ch_04-02.png)

# Alerting

As you have heard in this lesson, you can monitor data in production by using the following:

    deterministic methods: by checking if all required attributes are present and have values within strictly defined sets; and
    statistical methods: by checking if the distribution of data observed in production is significantly different from the distribution observed at training time.
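
The deterministic checks can be sketched in a few lines of Python. This is a minimal illustration, assuming each production record arrives as a dict; the field names, allowed values, and ranges are hypothetical stand-ins for a schema you would derive from your own training data.

```python
# Hypothetical schema for illustration only; derive the real one from your training data.
REQUIRED_FIELDS = {"age", "country", "income"}
ALLOWED_VALUES = {"country": {"DE", "FR", "US"}}
NUMERIC_RANGES = {"age": (18, 100), "income": (0.0, 1e7)}


def validate_record(record: dict) -> list:
    """Return a list of human-readable violations (empty if the record passes)."""
    errors = []
    # 1. Every required attribute must be present and non-null.
    for field in sorted(REQUIRED_FIELDS):
        if record.get(field) is None:
            errors.append(f"missing field: {field}")
    # 2. Categorical attributes must take values from a strictly defined set.
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            errors.append(f"unexpected value for {field}: {value!r}")
    # 3. Numeric attributes must be numbers inside a strictly defined interval.
    for field, (low, high) in NUMERIC_RANGES.items():
        value = record.get(field)
        if value is None:
            continue  # already reported as missing above
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            errors.append(f"{field} invalid or out of range [{low}, {high}]: {value!r}")
    return errors


print(validate_record({"age": 42, "country": "FR", "income": 55000.0}))  # []
print(validate_record({"age": -1, "country": "XX"}))  # three violations
```

Because these checks have no statistical noise, any violation is worth an alert: it usually points to a broken pipeline rather than a change in the real world.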

You have also heard that the latter methods can be very sensitive to even the smallest changes and generate a large number of uninformative (non-actionable) alerts.
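
To see why statistical methods are so sensitive, here is a minimal sketch of one such method, the two-sample Kolmogorov-Smirnov (KS) test, in plain Python. Assumptions: the KS test is just one example of a statistical check (in practice you would likely call a library routine such as `scipy.stats.ks_2samp`); the p-value uses the asymptotic Kolmogorov distribution; and the "samples" are evenly spaced quantiles of normal distributions rather than real data, with a hypothetical shift of 0.05 standard deviations between training and production.

```python
import math
from statistics import NormalDist


def ks_statistic(a, b):
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (this also handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution (suited to large n, m)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # CDF series that converges quickly for small lambda.
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
            for k in range(1, 101)
        )
        return min(1.0, max(0.0, 1.0 - cdf))
    # Alternating series for the tail probability, fast for larger lambda.
    tail = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, tail))


def normal_sample(n, mu):
    """Deterministic stand-in for a sample: n evenly spaced quantiles of N(mu, 1)."""
    dist = NormalDist(mu, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Same tiny shift (0.05 standard deviations), two very different sample sizes.
for n in (200, 50_000):
    train, prod = normal_sample(n, 0.0), normal_sample(n, 0.05)
    d = ks_statistic(train, prod)
    print(f"n={n:>6}  D={d:.4f}  p={ks_pvalue(d, n, n):.2e}")
```

With 200 points per side the shift goes unnoticed (p is close to 1); with 50,000 points per side the same, practically negligible shift yields a p-value far below any usual significance level. Since production monitoring often compares large volumes of data, a test like this can fire on changes nobody needs to act on; looking at the effect size (here, D itself) alongside the p-value is one common way to keep such alerts meaningful.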

But why is that actually an issue?

### Possible Answers


    Because too many false alarms and alerts for the smallest changes can induce so-called alert fatigue, making actual issues pass unnoticed. {Answer}
    
    
    Because it is very hard to find good Data Scientists that are capable of interpreting the outputs of statistical tests.
    
    
    Because an extremely large number of alerts can inflate our logs so much that the server's file system becomes overloaded, crashing our service.

# Data-centric vs Model-centric

As you have just heard, there are two general approaches to improving the performance of an ML model:

    The model-centric approach, and
    The data-centric approach

Can you tell which of the following actions is typical for which of these two philosophies?

![Answer](images/ch_04-03.png)

# Human-in-the-Loop

You heard about an ML application design pattern called Human-in-the-Loop, in which a human expert takes over and makes the more difficult estimations whenever the model is not confident enough. This means new data is labeled continuously, which enables continuous model maintenance. Sounds perfect, right?

Now imagine you need to build two ML applications:

    one for detecting lung disease from x-ray images, and
    one for predicting stock prices and trading stocks at a frequency of several thousand transactions per second (so-called high-frequency trading, or HFT).

Which of these apps would NOT be a good candidate for a Human-in-the-Loop design and why?

### Possible Answers


    The lung disease app, because humans cannot beat ML algorithms in image recognition.
    
    
    The HFT app, because humans cannot make decisions in such a short time interval. {Answer}
    
    
    The lung disease app, because it is more important to produce a diagnosis as fast as possible, even if the model has low confidence.
    
    
    The HFT app, because humans cannot beat ML algorithms in picking the best stocks and predicting their prices.

**A Human-in-the-Loop system makes sense only when there is enough time to include the expert in the estimation process. That is the case with most medical diagnostics applications, where patients can usually wait days, or even weeks, for a correct diagnosis. But when decisions and actions need to be made at a pace that exceeds human capacities, we have to rely solely on algorithms.**
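
The two conditions discussed above (the model is unsure, and there is time for an expert to respond) can be combined into a simple routing rule. This is a minimal sketch, not a prescribed design: `route_prediction`, `Decision`, the 0.9 confidence threshold, and the one-hour expected review time are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    source: str  # "model" or "human"


def route_prediction(label, confidence, time_budget_s, *,
                     confidence_threshold=0.9, expected_review_s=3600.0,
                     review_queue=None):
    """Decide who makes the final call for one prediction (hypothetical policy)."""
    confident = confidence >= confidence_threshold
    human_has_time = time_budget_s >= expected_review_s
    if confident or not human_has_time:
        # Either the model is sure enough, or no human could answer in time
        # (as in HFT), so the model's output is used as is.
        return Decision(label, "model")
    if review_queue is not None:
        # Queued cases later receive expert labels, which can feed retraining.
        review_queue.append((label, confidence))
    return Decision("pending_review", "human")


queue = []
# X-ray case: low confidence, days available, so an expert takes over.
print(route_prediction("pneumonia", 0.62, 3 * 24 * 3600, review_queue=queue))
# Decision(label='pending_review', source='human')
# HFT case: low confidence, but only a fraction of a millisecond available.
print(route_prediction("buy", 0.55, 1e-4, review_queue=queue))
# Decision(label='buy', source='model')
```

The queue is what turns Human-in-the-Loop into a source of continuous labeling: every case an expert resolves becomes a new labeled example.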

# Elements of governance

Model governance sets the rules and controls for machine learning models running in production. For some models, such as those banks use for credit risk scoring, an elaborate governance framework is required by law. But even without regulatory pressure, having some level of control over what goes into production, and how, is highly valuable.

You have learned the core elements and features of a model governance framework.

In this exercise, your task is to differentiate between those fundamental building blocks and other ML concepts which are not within the scope of governance by placing them into the True or False bucket, respectively.

![Answer](images/ch_04-04.png)

# Stages of governance

You have heard that governance spans all stages of the ML project life cycle: design, development, and operations.

In each of these stages, you must provide answers to different questions in order to comply with your ML governance framework.

Are you able to connect each question with the project phase it refers to?

![Answer](images/ch_04-05.png)

# Risk classification

As you have learned, model governance never follows a one-size-fits-all approach.

As each additional control can slow down the project and increase the final cost of the ML service, you should apply them only where they are really necessary.

One of the first steps in this process is to classify the model in question into the appropriate risk category.

Whether you use three risk categories (low, medium, high), five, or ten, models are usually assigned to them based on two questions:

### Possible Answers


    "How many Data Scientists have worked on this model?" and "How many MLOps engineers have worked on it?"
    
    
    "What was the budget of this ML project?" and "How much are the customers willing to pay for this service?"
    
    
    "What is the direct and indirect cost of each possible prediction error?" and "How often will this model make predictions?" {Answer}
    
**Correct! Model risk is, ultimately, a cumulative measure of "how much would this model's mistakes cost us?" To assess it, we need to know how often the model is used and how much its errors cost.**
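
The two questions can be combined into a single number, for example the expected cost of mistakes over a year: the number of predictions multiplied by the expected error cost per prediction, summed over every type of error. The sketch below assumes exactly that formula; the function names, tier thresholds, and credit-scoring figures are hypothetical, and a real governance framework would define its own.

```python
def expected_error_cost(predictions_per_year, error_profile):
    """Expected yearly cost of mistakes.

    error_profile maps each error type to a pair:
    (probability of that error per prediction, direct + indirect cost per occurrence).
    """
    cost_per_prediction = sum(p * cost for p, cost in error_profile.values())
    return predictions_per_year * cost_per_prediction


def risk_category(yearly_cost, thresholds=((10_000, "low"), (1_000_000, "medium"))):
    """Map a yearly cost to a risk tier; the thresholds here are hypothetical."""
    for limit, name in thresholds:
        if yearly_cost < limit:
            return name
    return "high"


# Hypothetical credit-scoring model making 200,000 decisions per year.
credit_errors = {
    "false_approval": (0.02, 5_000.0),  # loan granted to a customer who defaults
    "false_rejection": (0.05, 300.0),   # creditworthy customer turned away
}
cost = expected_error_cost(200_000, credit_errors)
print(cost, risk_category(cost))
```

With these made-up figures, each decision carries an expected error cost of 115, or about 23 million per year, which puts the model in the "high" tier. The same model used only a few dozen times a year would land in a much lower tier, which is exactly why both questions are needed.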