# QAS Demo

This is a set of demo notebooks to illustrate the use of the MLTE library and SDMT process, using Quality Attribute Scenarios as guidance for the required Properties and Conditions.

NOTE: This demo has additional requirements beyond those of MLTE. You can install them from the `requirements.txt` file in this folder with the command:

`pip --default-timeout 1000 install -r requirements.txt`


## 0. Quality Attribute Scenarios

The following are the QASs that we want to validate through the use of MLTE. The examples below relate to a hypothetical system used by visitors to a botanical garden to identify flowers in the different gardens and learn more about them. The system uses an ML model that was trained on the flower category dataset [Nilsback 2008](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/).

* **Fairness - Model Impartial to Photo Location**
  * The model receives a picture taken at the garden and, regardless of the garden location, can identify the correct flowers at least 90% of the time. Test data needs to include pictures of the flowers from the different gardens, grouped by the garden where the image was taken. The quantity of flower images should be representative of the garden population they are taken from. The model's accuracy on each garden population should be greater than or equal to 0.9.
* **Robustness - Model Robust to Noise (Image Blur)**
  * The model receives a picture taken at a garden by a member of the general public, and it is a bit blurry.  The model should still be able to successfully identify the flower at the same rate as non-blurry images. Test data needs to include blurred flower images.  Blurred images will be created using ImageMagick. Three datasets will be generated, each with different amounts of blur: minimal blur, maximum blur, and in between minimal and maximum blur. Blurry images are successfully identified at rates equal to that of non-blurred images. This will be measured using the Wilcoxon Rank-Sum test, with significance at p-value <=0.05.
* **Robustness - Model Robust to Noise (Channel Loss)**
  * The model receives a picture taken at a garden using a loaned device. These devices are known to sometimes lose a channel (i.e., one of the RGB channels). The model should still be able to successfully identify the flower at the same rate as full images. Test data needs to include images with a missing channel. Test images will be generated by removing the R, G, and B channels in turn from the original test data using ImageMagick, producing three datasets. Images with a missing channel are successfully identified at rates equal to that of original images. This will be measured using the Wilcoxon Rank-Sum test, with significance at p-value <=0.05.
* **Performance on Operational Platform**
  * The model will need to run on the devices loaned out by the garden centers to visitors. These are small, inexpensive devices with limited CPU power, as well as limited memory and disk space (512 MB and 128 GB, respectively). The original test dataset can be used. (1) Executing the model on the loaned platform will not exceed a maximum CPU usage of 30%, to ensure reasonable response time. CPU usage will be measured using ps. (2) Memory usage at inference time will not exceed the available memory of 512 MB. This will be measured using pmap. (3) Disk usage will not exceed the available disk space of 128 GB. This will be measured by adding the size of each file in the path for the model code.
* **Interpretability - Understanding Model Results**
  * The application that runs on the loaned device should indicate the main features that were used to recognize the flower, as part of the educational experience. The app will display the image highlighting the most informative features in flower identification, in addition to the flower name. The original test data set can be used. The model needs to return evidence, in this case a heat map implementing the Integrated Gradients algorithm, showing the pixels that were most informative in the classification decision. This evidence should be returned with each inference. 
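
The two robustness scenarios above rely on the Wilcoxon Rank-Sum test. As a rough illustration only, the sketch below implements the two-sided test with the normal approximation, using just the Python standard library and applying no tie or continuity correction. The function names and the sample scores are hypothetical; they are not part of MLTE or of this demo, which may compute the statistic differently.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average of the 1-based positions start+1 .. end+1.
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns the z statistic and the p-value.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical per-image scores for the original and a blurred test set.
original_scores = [0.91, 0.88, 0.95, 0.79, 0.93, 0.85]
blurred_scores = [0.90, 0.86, 0.94, 0.81, 0.92, 0.83]

z, p_value = rank_sum_test(original_scores, blurred_scores)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
# Following the scenarios, a difference is significant when p-value <= 0.05.
print("significant difference" if p_value <= 0.05 else "no significant difference")
```

A p-value above the significance level means the test found no evidence that the two sets of results differ, which is the outcome the robustness scenarios ask for.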



## 1. Define a Specification

In the first phase of SDMT, we define a `Specification` that represents the requirements the completed model must meet in order to be acceptable for use in the system into which it will be integrated.

#### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces.

In [1]:
import os
from mlte.session import set_context, set_store

store_path = os.path.join(os.getcwd(), "store")
os.makedirs(store_path, exist_ok=True)   # Ensure we are creating the folder if it is not there.

set_context("ns", "OxfordFlower", "0.0.1")
set_store(f"local://{store_path}")

#### Build a `Specification`

In MLTE, we define requirements by constructing a specification (`Spec`). For each property, we also define the validations to perform. Note that several new `Value` types (`MultipleAccuracy`, `RankSums`, `MultipleRanksums`) had to be created to define the validation methods that will validate each Condition.

In [2]:
from mlte.spec.spec import Spec

# The Properties we want to validate, associated with our scenarios.
from mlte.property.costs.storage_cost import StorageCost
from properties.fairness import Fairness
from properties.robustness import Robustness
from properties.interpretability import Interpretability
from properties.predicting_memory_cost import PredictingMemoryCost
from properties.predicting_compute_cost import PredictingComputeCost

# The Value types we will use to validate each condition.
from mlte.measurement.storage import LocalObjectSize
from mlte.measurement.cpu import LocalProcessCPUUtilization
from mlte.measurement.memory import LocalProcessMemoryConsumption
from mlte.value.types.image import Image
from values.multiple_accuracy import MultipleAccuracy
from values.ranksums import RankSums
from values.multiple_ranksums import MultipleRanksums

# The full spec. Note that the Robustness Property contains conditions for both Robustness scenarios.
spec = Spec(properties={
    Fairness("Important check if model performs well across different populations"):
                {"accuracy across gardens": MultipleAccuracy.all_accuracies_more_or_equal_than(0.9)},
    Robustness("Robust against blur and noise"): 
                {"ranksums blur2x8": RankSums.p_value_greater_or_equal_to(0.05/3),
                 "ranksums blur5x8": RankSums.p_value_greater_or_equal_to(0.05/3),
                 "ranksums blur0x8": RankSums.p_value_greater_or_equal_to(0.05/3),
                 "multiple ranksums for clade2": MultipleRanksums.all_p_values_greater_or_equal_than(0.05),
                 "multiple ranksums between clade2 and 3": MultipleRanksums.all_p_values_greater_or_equal_than(0.05),
                },
    StorageCost("Critical since model will be in an embedded device"): 
                    {"model size": LocalObjectSize.value().less_than(3000)},                
    PredictingMemoryCost("Useful to evaluate resources needed when predicting"): 
                    {"predicting memory": LocalProcessMemoryConsumption.value().average_consumption_less_than(512000.0)},
    PredictingComputeCost("Useful to evaluate resources needed when predicting"): 
                    {"predicting cpu": LocalProcessCPUUtilization.value().max_utilization_less_than(30.0)},
    Interpretability("Important to understand what the model is doing"): 
                    {"image attributions": Image.ignore("Inspect the image.")},
    })
spec.save(parents=True, force=True)
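
To make the thresholds in the spec above concrete, the following plain-Python sketch mimics what two of its conditions appear to check: every per-garden accuracy must reach 0.9, and each blur p-value must be at least 0.05/3. Reading 0.05/3 as a Bonferroni-style split of the 0.05 significance level across the three blur datasets is an assumption, as are all function names and sample numbers here. None of this is the MLTE or demo implementation.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


def groups_below(accuracies, threshold):
    """Return the groups whose accuracy is below the threshold (empty means pass)."""
    return {group: acc for group, acc in accuracies.items() if acc < threshold}


def p_values_below(p_values, alpha, num_comparisons):
    """Return the comparisons whose p-value is below alpha / num_comparisons."""
    corrected = alpha / num_comparisons
    return {name: p for name, p in p_values.items() if p < corrected}


# Hypothetical predictions: (garden, true label, predicted label).
records = [
    ("rose garden", 3, 3), ("rose garden", 7, 7), ("rose garden", 5, 2),
    ("herb garden", 1, 1), ("herb garden", 4, 4),
]
accuracies = accuracy_by_group(records)
print("gardens below 0.9:", groups_below(accuracies, 0.9))

# Hypothetical p-values for the three blur levels named in the spec.
blur_p_values = {"blur2x8": 0.40, "blur5x8": 0.01, "blur0x8": 0.75}
print("blur levels failing:", p_values_below(blur_p_values, 0.05, 3))
```

Returning the failing entries, rather than a single boolean, makes it easier to see which garden or blur level caused a condition to fail.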