# 20OCT Factory Day Demo

Presenter: Kyle

This is a demonstration notebook to illustrate the use of the MLTE library and SDMT process for Factory Day. This demo uses the "Dogs vs. Cats" dataset and scenario from the Factory Day negotiation exercise as guidance for the required Properties and Conditions.

**MLTE Core Concepts**

- Negotiate requirements for machine learning models, considering the context of the system into which the model will be integrated
- Rigorously specify these requirements
- Gather evidence that attests to the fact that these requirements are satisfied, in the form of _artifacts_
- Report on the outcome of model evaluation in a simple, useful manner

## 0. Quality Attribute Scenarios

Presenter: Sebastian

The following is a prioritized (but limited) list of the QASs that we want to validate through the use of MLTE. The examples below relate to the hypothetical system used by the Dogs are Dumb (DaD) Task Force to detect non-compliant service members who own dogs. The system uses an ML model that was trained on the cats and dogs dataset available on [Kaggle](https://www.kaggle.com/c/dogs-vs-cats/data).

* **Precision**
  * Because the DaD cares about identifying dogs but must NEVER misclassify a cat as a dog (a false positive), this model needs high precision. Precision is the number of true positives divided by the sum of true positives and false positives: TP / (TP + FP).
* **Robustness - Model Robust to Noise (Image Blur)**
  * Because the model will receive pictures taken from a device mounted on the back of a cat, the images will likely be somewhat blurry. The model should identify dogs in blurry images at the same rate as in non-blurry images, so the test data needs to include blurred images. Blurred images will be created using ImageMagick, and for our purposes we will test against maximum blur. The comparison will be made using the Wilcoxon Rank-Sum test, with significance at p-value <= 0.05.
* **Performance on Operational Platform**
  * The model will need to run on devices worn on the back of cats. These are small, inexpensive devices with limited CPU power, as well as limited memory and disk space (512 MB and 128 GB, respectively). The original test dataset can be used.
    1. Executing the model on the loaned platform will not exceed maximum CPU usage of 30% to ensure reasonable response time. CPU usage will be measured using `ps`.
    2. Memory usage at inference time will not exceed available memory of 512 MB. This will be measured using `pmap`. 
    3. Disk usage will not exceed available disk space of 128 GB. This will be measured by adding the size of each file in the path for the saved model.
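
The precision requirement in the first scenario can be made concrete with a short calculation. The following is a minimal sketch in plain Python; the function name `precision` and the example counts are illustrative assumptions and are not part of MLTE or of the demo's data.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Return TP / (TP + FP); defined here as 0.0 when nothing was predicted positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0
    return true_positives / predicted_positives


# Hypothetical counts: 45 dogs correctly flagged, 5 cats wrongly flagged as dogs.
example_precision = precision(true_positives=45, false_positives=5)
print(f"precision = {example_precision:.2f}")
```

Every cat wrongly flagged as a dog increases the denominator only, which is why precision is the natural metric when false positives are the costly error.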

## 0. Negotiate Requirements

Presenter: Kyle

In the exercise, we negotiated the requirements for the model and system. MLTE provides an artifact that assists in this process - the `NegotiationCard`.

### 0.1 Install `mlte`

In [None]:
# Install mlte package
!pip install mlte-python==0.2.2

### 0.2 Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces. Install MLTE first (section 0.1) if you have not done so already.

In [None]:
import os
from mlte.session import set_context, set_store

store_path = os.path.join(os.getcwd(), "store")
os.makedirs(
    store_path, exist_ok=True
)  # Ensure we are creating the folder if it is not there.

set_context("ns", "DogsVCats", "0.0.1")
set_store(f"local://{store_path}")

### 0.3 Build a `NegotiationCard`

The `NegotiationCard` artifact can be built via the `mlte` Python API or in the MLTE web UI.

In [None]:
from mlte.negotiation.artifact import NegotiationCard
from mlte.model.shared import (
    MetricDescriptor,
    DataDescriptor,
    DataClassification,
    FieldDescriptor,
    LabelDescriptor,
    ModelDescriptor,
    ModelDevelopmentDescriptor,
    ModelResourcesDescriptor,
    ModelProductionDescriptor,
    ModelInterfaceDescriptor,
    ModelInputDescriptor,
    ModelOutputDescriptor,
)
from mlte.negotiation.model import (
    SystemDescriptor,
    GoalDescriptor,
    ProblemType,
    RiskDescriptor,
)

card = NegotiationCard(
    system=SystemDescriptor(
        goals=[
            GoalDescriptor(
                description="The model should precisely identify dogs",
                metrics=[
                    MetricDescriptor(
                        description="Accuracy",
                        baseline="Greater than .7",
                    )
                ],
            )
        ],
        problem_type=ProblemType.CLASSIFICATION,
        task="Dog Identification",
        usage_context="A dog identification device mounted on the back of a cat.",
        risks=RiskDescriptor(
            fp="A cat is identified as a dog; this is critical to avoid because an innocent cat-owning service member will be falsely convicted",
            fn="A service member owning a dog will slip through the cracks",
            other="N/A",
        ),
    ),
    data=[
        DataDescriptor(
            description="Dogs v Cats; The dataset is comprised of photos of dogs and cats provided as a subset of photos from a much larger dataset of 3 million manually annotated photos. The dataset was developed as a partnership between Petfinder.com and Microsoft.",
            classification=DataClassification.UNCLASSIFIED,
            access="None",
            fields=[
                FieldDescriptor(
                    name="Filename with label cat or dog",
                    description="An image depicting a cat or a dog",
                    type="jpg",
                    expected_values="N/A",
                    missing_values="N/A",
                    special_values="N/A",
                )
            ],
            labels=[
                LabelDescriptor(description="cat", percentage=50.0),
                LabelDescriptor(description="dog", percentage=50.0),
            ],
            policies="N/A",
            rights="N/A",
            source="https://www.kaggle.com/c/dogs-vs-cats",
            identifiable_information="N/A",
        )
    ],
    model=ModelDescriptor(
        development=ModelDevelopmentDescriptor(
            resources=ModelResourcesDescriptor(
                cpu="1", gpu="0", memory="512MiB", storage="128GiB"
            )
        ),
        production=ModelProductionDescriptor(
            integration="integration",
            interface=ModelInterfaceDescriptor(
                input=ModelInputDescriptor(description="Vector[150]"),
                output=ModelOutputDescriptor(description="Vector[3]"),
            ),
            resources=ModelResourcesDescriptor(
                cpu="1",
                gpu="0",
                memory="512MiB",
                storage="128GiB",
            ),
        ),
    ),
)

card.save(force=True, parents=True)

## 1. Define Requirements 

In the next phase of SDMT, we define a _Specification_ (or `Spec`) that represents the requirements the completed model must meet in order to be acceptable for use in the system into which it will be integrated.

### 1.1 Install Prerequisites

In [None]:
# Install other demonstration requirements
!pip install -r requirements.txt

### 1.2 Build a Specification (`Spec`)

In MLTE, we define requirements by constructing a specification (`Spec`). For each property, we also define the validations to perform. Note that new `Value` types (`ConfusionMatrix` and `RankSums`, imported from the local `values` package) had to be created to define the validation methods for some of the Conditions.

In [None]:
from mlte.spec.spec import Spec

# The Properties we want to validate, associated with our scenarios.
from mlte.property.functionality.task_efficacy import TaskEfficacy
from mlte.value.types.real import Real
from mlte.property.costs.storage_cost import StorageCost
from properties.robustness import Robustness
from properties.predicting_memory_cost import PredictingMemoryCost
from properties.predicting_compute_cost import PredictingComputeCost

# The Value types we will use to validate each condition.
from mlte.measurement.storage import LocalObjectSize
from mlte.measurement.cpu import LocalProcessCPUUtilization
from mlte.measurement.memory import LocalProcessMemoryConsumption
from values.confusion_matrix import ConfusionMatrix
from values.ranksums import RankSums

# The full spec.
spec = Spec(
    properties={
        TaskEfficacy("Important to understand if the model is useful for this case"): {
            "accuracy": Real.greater_or_equal_to(0.8),
            "confusion matrix": ConfusionMatrix.misclassification_count_less_than(2),
        },
        Robustness("Robust against blur"): {
            "ranksums blur0x8": RankSums.p_value_greater_or_equal_to(0.05 / 3)
        },
        StorageCost("Critical since model will be in an embedded device"): {
            "model size": LocalObjectSize.value().less_than(3000)
        },
        PredictingMemoryCost("Useful to evaluate resources needed when predicting"): {
            "predicting memory": LocalProcessMemoryConsumption.value().average_consumption_less_than(
                512000.0
            )
        },
        PredictingComputeCost("Useful to evaluate resources needed when predicting"): {
            "predicting cpu": LocalProcessCPUUtilization.value().max_utilization_less_than(
                30.0
            )
        },
    }
)

spec.save(parents=True, force=True)

## Interlude: Model Development

Presenter: Kyle

Before we begin the next phase of the MLTE framework, we must produce a model to use for evaluation.

### Environment Setup

This demo has requirements beyond MLTE itself. They were installed above; here we import the necessary functions and modules.

In [None]:
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator, load_img

import tensorflow as tf

print(tf.__version__)

from matplotlib import pyplot
from matplotlib.image import imread

Define different folders that will be used as input or output for the data gathering process.

In [None]:
from pathlib import Path

# The path at which datasets are stored
DATASETS_DIR = Path.cwd() / "data"

# The path at which the model files are stored
MODELS_DIR = Path.cwd() / "model"
os.makedirs(MODELS_DIR, exist_ok=True)  # Needed before the model is saved below

# The path at which media is stored
MEDIA_DIR = Path.cwd() / "media"
os.makedirs(MEDIA_DIR, exist_ok=True)

### Data Preparation

To satisfy the requirements defined in the initial negotiation with the DaD TF, you decide to develop a convolutional neural network to classify images into two categories, dog or cat. You decide to use a publicly available dataset on [Kaggle](https://www.kaggle.com/c/dogs-vs-cats/data) to train your model. The dataset is comprised of photos of dogs and cats provided as a subset of photos from a much larger dataset of 3 million manually annotated photos. The dataset was developed as a partnership between Petfinder.com and Microsoft.

In [None]:
# Examples of "dog" images
folder = "data/train/"
for i in range(9):
    pyplot.subplot(330 + 1 + i)
    filename = folder + "dog." + str(i) + ".jpg"
    image = imread(filename)
    pyplot.imshow(image)
pyplot.show()

In [None]:
# Examples of "cat" images
folder = "data/train/"
for i in range(9):
    pyplot.subplot(330 + 1 + i)
    filename = folder + "cat." + str(i) + ".jpg"
    image = imread(filename)
    pyplot.imshow(image)
pyplot.show()

In [None]:
# Explore the breakdown of train and test images
training_images = "data/train"
test_images = "data/test"

train_size = len([name for name in os.listdir(training_images)])
test_size = len([name for name in os.listdir(test_images)])

print("Number of training images:", train_size)
print("Number of test images:", test_size)

In [None]:
IMAGE_WIDTH = IMAGE_HEIGHT = 150

Prepare training data. Since we are developing a binary classifier, we label the target class of dogs as `1` and cats as `0`.

In [None]:
# Proportion of training set used for validation
VALIDATION_FRACTION = 0.2

# Training batch size (samples)
BATCH_SIZE = 100

filenames = os.listdir(training_images)
categories = []

for filename in filenames:
    category = filename.split(".")[0]
    if category == "dog":
        categories.append(1)
    else:
        categories.append(0)

categories = [str(i) for i in categories]

df = pd.DataFrame({"filename": filenames, "category": categories})

In [None]:
df["category"].value_counts().plot.bar()

In [None]:
# Split dataset into training and validation
train_df, valid_df = train_test_split(
    df, test_size=VALIDATION_FRACTION, random_state=10
)

### Training and Validation Data Generation

In [None]:
train_datagen = ImageDataGenerator(
    rotation_range=30,
    rescale=1.0 / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode="nearest",
)

train_generator = train_datagen.flow_from_dataframe(
    train_df,
    training_images,
    x_col="filename",
    y_col="category",
    target_size=(IMAGE_WIDTH, IMAGE_HEIGHT),
    class_mode="binary",
    batch_size=BATCH_SIZE,
)

In [None]:
valid_datagen = ImageDataGenerator(rescale=1.0 / 255.0)

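# shuffle=False keeps predictions in the same order as valid_df and as
# validation_generator.classes, which later cells rely on.
validation_generator = valid_datagen.flow_from_dataframe(
    valid_df,
    training_images,
    x_col="filename",
    y_col="category",
    target_size=(IMAGE_WIDTH, IMAGE_HEIGHT),
    class_mode="binary",
    batch_size=BATCH_SIZE,
    shuffle=False,
)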

### Model Definition

In [None]:
model = tf.keras.models.Sequential(
    [
        # Images are resized by ImageDataGenerator to 150x150 with 3 color channels
        tf.keras.layers.Conv2D(
            32, (3, 3), activation="relu", input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, 3)
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)

model.summary()

model.compile(
    optimizer=tf.keras.optimizers.legacy.RMSprop(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

### Model Training

In [None]:
N_EPOCHS = 20

# The total number of validation samples
VALIDATION_SAMPLES = len(valid_df)
# The total number of training samples
TRAIN_SAMPLES = len(train_df)

In [None]:
history = model.fit(  # fit() accepts generators; fit_generator() is deprecated
    train_generator,
    epochs=N_EPOCHS,
    validation_data=validation_generator,
    validation_steps=VALIDATION_SAMPLES // BATCH_SIZE,
    steps_per_epoch=TRAIN_SAMPLES // BATCH_SIZE,
)

# Serialize model to JSON
model_json = model.to_json()
with open(MODELS_DIR / "model.json", "w") as json_file:
    json_file.write(model_json)

# Serialize weights to HDF5
model.save_weights(MODELS_DIR / "model.h5")
print("Saved model to disk.")

In [None]:
# Restore the model from previous training session
model.load_weights(MODELS_DIR / "model.h5")

### Predictions on Test Data

In [None]:
test_df = pd.DataFrame({"filename": os.listdir(test_images)})
nb_samples = test_df.shape[0]

In [None]:
test_gen = ImageDataGenerator(rescale=1.0 / 255)

test_generator = test_gen.flow_from_dataframe(
    test_df,
    test_images,
    x_col="filename",
    y_col=None,
    class_mode=None,
    target_size=(IMAGE_WIDTH, IMAGE_HEIGHT),
    batch_size=BATCH_SIZE,
    shuffle=False,
)

In [None]:
test_df["probability"] = model.predict(
    test_generator, steps=int(np.ceil(nb_samples / BATCH_SIZE))
)
test_df["category"] = np.where(test_df["probability"] > 0.5, 1, 0)

In [None]:
test_df["category"].value_counts().plot.bar()

### Visualize Predictions

In [None]:
from matplotlib import pyplot as plt

sample_test = test_df.head(5)

plt.figure(figsize=(12, 24))
for index, row in sample_test.iterrows():
    filename = row["filename"]
    category = row["category"]
    img = load_img("data/test/" + filename, target_size=(IMAGE_WIDTH, IMAGE_HEIGHT))
    plt.subplot(5, 2, index + 1)
    plt.imshow(img)
    plt.xlabel("Animal : " + str(category))

plt.tight_layout()
plt.show()

In [None]:
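# The step count must cover the validation samples, not the test samples.
# Rows only line up with valid_df if the generator does not shuffle.
valid_df["probability"] = model.predict(
    validation_generator, steps=int(np.ceil(VALIDATION_SAMPLES / BATCH_SIZE))
)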
valid_df["predicted_label"] = np.where(valid_df["probability"] > 0.5, 1, 0)

save_df = valid_df.copy()
save_df["label"] = np.where(save_df["filename"].str.split(".").str[0] == "dog", 1, 0)
save_df.drop(["filename", "probability", "category"], axis=1, inplace=True)
save_df["model correct"] = np.where(
    save_df["label"] == save_df["predicted_label"], True, False
)

save_df.to_csv(DATASETS_DIR / "DogsVCatsv1_ValidationResults.csv", index=True)

## 2. Collect Evidence

Presenter: Kyle

In the next phase of SDMT, we collect _evidence_ to attest to the fact that the model realized the properties specified in the previous phase.

We define and instantiate `Measurement`s to generate this evidence. Each individual piece of evidence is a `Value`. Once `Value`s are produced, we can persist them to an _artifact store_ to maintain our evidence across sessions. 

### 2.1 Task Efficacy

Presenter: Kyle

The first set of evidence we collect relates to the model's effectiveness in accomplishing its primary function - identifying cats and dogs.

In [None]:
import sklearn.metrics as metrics

from mlte.value.types.real import Real
from mlte.measurement import ExternalMeasurement

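# Predict once over the validation set; the step count is derived from the
# validation sample count so predictions match validation_generator.classes.
predictions = model.predict(
    validation_generator, steps=int(np.ceil(VALIDATION_SAMPLES / BATCH_SIZE))
)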
predicted_classes = np.where(predictions > 0.5, 1, 0)
true_classes = validation_generator.classes
class_labels = list(validation_generator.class_indices.keys())

# Evaluate accuracy
accuracy_measurement = ExternalMeasurement("accuracy", Real, metrics.accuracy_score)
accuracy: Real = accuracy_measurement.evaluate(true_classes, predicted_classes)

# Inspect value
print(accuracy)

# Save to artifact store
accuracy.save(force=True)

report = metrics.classification_report(
    true_classes, predicted_classes, target_names=class_labels
)

In [None]:
from sklearn.metrics import confusion_matrix
from values.confusion_matrix import ConfusionMatrix
from mlte.measurement import ExternalMeasurement

# Define the measurement
matrix_measurement = ExternalMeasurement(
    "confusion matrix", ConfusionMatrix, confusion_matrix
)

# Evaluate
matrix: ConfusionMatrix = matrix_measurement.evaluate(true_classes, predicted_classes)

# Inspect
print(matrix)

# Save to artifact store
matrix.save(force=True)

### 2.2 Robustness to Image Blur

Presenter: Sebastian

In [None]:
import pandas as pd
from os import path


def calculate_base_accuracy(df_results: pd.DataFrame) -> pd.DataFrame:
    # Calculate the base model accuracy result per data label
    df_pos = df_results[df_results["model correct"] == True].groupby("label").count()
    df_pos.drop(columns=["predicted_label"], inplace=True)
    df_neg = df_results[df_results["model correct"] == False].groupby("label").count()
    df_neg.drop(columns=["predicted_label"], inplace=True)
    df_neg.rename(columns={"model correct": "model incorrect"}, inplace=True)
    df_res = df_pos.merge(df_neg, right_on="label", left_on="label", how="outer")
    df_res.fillna(0, inplace=True)
    df_res["model acc"] = df_res["model correct"] / (
        df_res["model correct"] + df_res["model incorrect"]
    )
    df_res["count"] = df_res["model correct"] + df_res["model incorrect"]
    df_res.drop(columns=["model correct", "model incorrect"], inplace=True)
    df_res.head()

    return df_res


def calculate_accuracy_per_set(
    data_folder: str, df_results: pd.DataFrame, df_res: pd.DataFrame
) -> pd.DataFrame:
    # Calculate the model accuracy per data label for each blurred data set
    base_filename = "DogsVCatsv1_ValidationResults"
    ext_filename = ".csv"
    set_filename = ["_blur0x8"]

    col_root = "model acc"

    for fs in set_filename:
        filename = os.path.join(data_folder, base_filename + fs + ext_filename)
        colname = col_root + fs

        df_temp = pd.read_csv(filename)
        df_temp.drop(columns=["Unnamed: 0"], inplace=True)

        df_pos = df_temp[df_temp["model correct"] == True].groupby("label").count()
        df_pos.drop(columns=["predicted_label"], inplace=True)
        # Count incorrect predictions from the blurred set itself, not the base results.
        df_neg = df_temp[df_temp["model correct"] == False].groupby("label").count()
        df_neg.drop(columns=["predicted_label"], inplace=True)
        df_neg.rename(columns={"model correct": "model incorrect"}, inplace=True)
        df_res2 = df_pos.merge(df_neg, right_on="label", left_on="label", how="outer")
        df_res2.fillna(0, inplace=True)

        df_res2[colname] = df_res2["model correct"] / (
            df_res2["model correct"] + df_res2["model incorrect"]
        )
        df_res2.drop(columns=["model correct", "model incorrect"], inplace=True)

        df_res = df_res.merge(df_res2, right_on="label", left_on="label", how="outer")

    df_res.head()
    return df_res


def print_model_accuracy(df_res: pd.DataFrame, key: str, name: str):
    model_acc = sum(df_res[key] * df_res["count"]) / sum(df_res["count"])
    print(name, model_acc)


def load_base_results(data_folder: str) -> pd.DataFrame:
    df_results = pd.read_csv(
        path.join(data_folder, "DogsVCatsv1_ValidationResults.csv")
    )
    df_results.drop(columns=["Unnamed: 0"], inplace=True)
    return df_results

In [None]:
# Prepare all data
df_results = load_base_results(DATASETS_DIR)
df_res = calculate_base_accuracy(df_results)
df_res = calculate_accuracy_per_set(DATASETS_DIR, df_results, df_res)

In [None]:
# View changes in model accuracy
print_model_accuracy(df_res.head(2), "model acc", "base model accuracy")
print_model_accuracy(
    df_res.head(2), "model acc_blur0x8", "model accuracy with 0x8 blur"
)

Measure the rank-sum statistic and p-value for the blur case, using `scipy.stats.ranksums` and the `ExternalMeasurement` wrapper.
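
To show what the rank-sum test computes, here is a minimal pure-Python sketch of the two-sided Wilcoxon rank-sum test using the normal approximation without a tie correction. The function name `rank_sum_test` and the sample values are assumptions made for illustration; on tie-free inputs the result should agree with `scipy.stats.ranksums`, which is what the demo actually uses.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for the Wilcoxon rank-sum test."""
    n1, n2 = len(x), len(y)
    pooled = sorted((value, idx) for idx, value in enumerate(list(x) + list(y)))

    # Assign 1-based ranks, averaging the ranks of tied values.
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = average_rank
        i = j + 1

    # Compare the rank sum of the first sample with its expectation under H0.
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std_dev = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std_dev
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical accuracy samples for clear and blurred images.
z, p = rank_sum_test([0.91, 0.88, 0.93, 0.90], [0.89, 0.92, 0.87, 0.94])
print(f"z = {z:.3f}, p = {p:.3f}")
```

A p-value at or above the chosen threshold means the test found no significant difference between the two samples, which is the outcome the robustness condition looks for.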

In [None]:
import scipy.stats

from values.ranksums import RankSums
from mlte.measurement import ExternalMeasurement

# Define measurements
ranksum_measurement = ExternalMeasurement(
    "ranksums blur0x8", RankSums, scipy.stats.ranksums
)

# Evaluate
ranksum: RankSums = ranksum_measurement.evaluate(
    df_res["model acc"], df_res["model acc_blur0x8"]
)

# Inspect values
print(ranksum)

# Save to artifact store
ranksum.save(force=True)

### 2.3 Performance on Operational Platform

Presenter: Sebastian

Now we collect storage, CPU, and memory usage data when predicting with the model, for the operational performance scenario.
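
The storage scenario measures disk usage by adding up the size of every file in the saved model's path. As a rough illustration of that idea (a sketch only, not the implementation of MLTE's `LocalObjectSize`), the helper below sums file sizes with `pathlib`; the name `directory_size_bytes` and the file contents are hypothetical.

```python
import tempfile
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes, in bytes, of all regular files under root (recursively)."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


# Usage on a throwaway directory holding two placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "model.json").write_text("{}")  # 2 bytes
    (model_dir / "model.h5").write_bytes(b"\x00" * 1024)  # 1024 bytes
    total = directory_size_bytes(model_dir)
    print(f"{total} bytes")
```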

In [None]:
# This is the external script that will load and run the model for inference/prediction.
script = Path.cwd() / "model_predict.py"
args = [
    "--images",
    "data/test_small",
    "--model",
    MODELS_DIR / "model.json",
    "--weights",
    MODELS_DIR / "model.h5",
]

In [None]:
from mlte.measurement.storage import LocalObjectSize
from mlte.value.types.integer import Integer

# Measure the size of the model
store_measurement = LocalObjectSize("model size")
size: Integer = store_measurement.evaluate(MODELS_DIR)

print(size)

size.save(force=True)

In [None]:
from mlte.measurement import ProcessMeasurement
from mlte.measurement.cpu import LocalProcessCPUUtilization, CPUStatistics

# Measure CPU utilization during inference
cpu_measurement = LocalProcessCPUUtilization("predicting cpu")
cpu_stats: CPUStatistics = cpu_measurement.evaluate(
    ProcessMeasurement.start_script(script, args)
)

print(cpu_stats)

cpu_stats.save(force=True)

In [None]:
from mlte.measurement.memory import LocalProcessMemoryConsumption, MemoryStatistics

# Measure memory consumption during inference
mem_measurement = LocalProcessMemoryConsumption("predicting memory")
mem_stats: MemoryStatistics = mem_measurement.evaluate(
    ProcessMeasurement.start_script(script, args)
)

print(mem_stats)

mem_stats.save(force=True)

## 3. Report Generation

Presenter: Kyle

The final phase of SDMT involves aggregating evidence, validating the metrics reflected by the evidence we collected, and displaying this information in a report.

In [None]:
import os
from pathlib import Path

# The path at which reports are stored
REPORTS_DIR = Path(os.getcwd()) / "reports"
os.makedirs(REPORTS_DIR, exist_ok=True)

### 3.1 Validate Values

Presenter: Sebastian

Now that our `Spec` is ready and we have collected enough evidence, we create a `SpecValidator` from the spec and add all of the `Value`s to it. We can then validate the spec and generate a `ValidatedSpec` that contains the validation results.
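
Conceptually, validation pairs each named condition with the measured value of the same name and records the outcome. The sketch below illustrates that idea in plain Python. It is not the MLTE API: `conditions`, `validate`, and the measured values are hypothetical, and only the two thresholds are borrowed from the spec defined earlier.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a spec: condition name -> predicate on the measured value.
conditions: Dict[str, Callable[[float], bool]] = {
    "accuracy": lambda v: v >= 0.8,
    "predicting cpu": lambda v: v < 30.0,
}


def validate(
    conditions: Dict[str, Callable[[float], bool]], values: Dict[str, float]
) -> Dict[str, str]:
    """Check each condition against the value with the same name."""
    results = {}
    for name, predicate in conditions.items():
        if name not in values:
            results[name] = "missing"  # no evidence was collected for this condition
        else:
            results[name] = "pass" if predicate(values[name]) else "fail"
    return results


# Hypothetical measurements: accuracy meets its bound, CPU utilization does not.
results = validate(conditions, {"accuracy": 0.85, "predicting cpu": 42.0})
print(results)
```

Matching by name is why each `Value` in this notebook is saved under exactly the identifier used for its condition in the `Spec`.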

In [None]:
from mlte.spec.spec import Spec
from mlte.validation.spec_validator import SpecValidator

from mlte.measurement.cpu import CPUStatistics
from mlte.measurement.memory import MemoryStatistics
from mlte.value.types.image import Image
from mlte.value.types.integer import Integer
from mlte.value.types.real import Real

from values.confusion_matrix import ConfusionMatrix
from values.ranksums import RankSums

# Load the specification
spec = Spec.load()

# Add all values to the validator
spec_validator = SpecValidator(spec)
spec_validator.add_value(Real.load("accuracy.value"))
spec_validator.add_value(ConfusionMatrix.load("confusion matrix.value"))
spec_validator.add_value(RankSums.load("ranksums blur0x8.value"))
spec_validator.add_value(Integer.load("model size.value"))
spec_validator.add_value(CPUStatistics.load("predicting cpu.value"))
spec_validator.add_value(MemoryStatistics.load("predicting memory.value"))

In [None]:
# Validate requirements and get validated details.
validated_spec = spec_validator.validate()
validated_spec.save(force=True)

# We want to see the validation results in the notebook, despite the fact they are saved
validated_spec.print_results()

### 3.2 Generate a Report

Presenter: Kyle

The final step of SDMT involves the generation of a report to communicate the results of model evaluation.

In [None]:
from mlte.model.shared import (
    ProblemType,
    GoalDescriptor,
    MetricDescriptor,
    ModelProductionDescriptor,
    ModelInterfaceDescriptor,
    ModelInputDescriptor,
    ModelOutputDescriptor,
    ModelResourcesDescriptor,
    RiskDescriptor,
    DataDescriptor,
    DataClassification,
    FieldDescriptor,
    LabelDescriptor,
)
from mlte.report.artifact import (
    Report,
    SummaryDescriptor,
    PerformanceDesciptor,
    IntendedUseDescriptor,
    CommentDescriptor,
    QuantitiveAnalysisDescriptor,
)

report = Report(
    summary=SummaryDescriptor(
        problem_type=ProblemType.CLASSIFICATION, task="Dog classification"
    ),
    performance=PerformanceDesciptor(
        goals=[
            GoalDescriptor(
                description="The model should precisely identify dogs",
                metrics=[
                    MetricDescriptor(
                        description="accuracy",
                        baseline="Greater than .7",
                    )
                ],
            )
        ]
    ),
    intended_use=IntendedUseDescriptor(
        usage_context="A dog identification device worn on the back of a cat",
        production_requirements=ModelProductionDescriptor(
            integration="integration",
            interface=ModelInterfaceDescriptor(
                input=ModelInputDescriptor(description="Vector[150]"),
                output=ModelOutputDescriptor(description="Vector[3]"),
            ),
            resources=ModelResourcesDescriptor(
                cpu="1", gpu="0", memory="512MiB", storage="128GiB"
            ),
        ),
    ),
    risks=RiskDescriptor(
        fp="A cat is identified as a dog; this is critical to avoid because an innocent cat-owning service member will be falsely convicted",
        fn="A service member owning a dog will slip through the cracks",
        other="N/A",
    ),
    data=[
        DataDescriptor(
            description="Dogs v Cats; The dataset is comprised of photos of dogs and cats provided as a subset of photos from a much larger dataset of 3 million manually annotated photos. The dataset was developed as a partnership between Petfinder.com and Microsoft.",
            classification=DataClassification.UNCLASSIFIED,
            access="None",
            fields=[
                FieldDescriptor(
                    name="Filename with label cat or dog",
                    description="An image depicting a cat or a dog",
                    type="jpg",
                    expected_values="N/A",
                    missing_values="N/A",
                    special_values="N/A",
                )
            ],
            labels=[
                LabelDescriptor(description="cat", percentage=50.0),
                LabelDescriptor(description="dog", percentage=50.0),
            ],
            policies="N/A",
            rights="N/A",
            source="https://www.kaggle.com/c/dogs-vs-cats",
            identifiable_information="N/A",
        )
    ],
    comments=[
        CommentDescriptor(
            content="This model should not be used for nefarious purposes."
        )
    ],
    quantitative_analysis=QuantitiveAnalysisDescriptor(content="Insert graph here."),
    validated_spec_id=validated_spec.identifier,
)

report.save(force=True, parents=True)