## 3. Validation and Report Generation

The final phase of SDMT involves aggregating evidence, validating the metrics reflected by the evidence we collected, and displaying this information in a report.

### Initialize MLTE Context

MLTE contains a global context that manages the currently active _session_. Initializing the context tells MLTE how to store all of the artifacts that it produces.

In [None]:
import os
from mlte.session import set_context, set_store

store_path = os.path.join(os.getcwd(), "store")
os.makedirs(store_path, exist_ok=True)   # Create the store folder if it does not already exist.

set_context("ns", "OxfordFlower", "0.0.1")
set_store(f"local://{store_path}")

In [None]:
import os
from pathlib import Path

# The path at which reports are stored
REPORTS_DIR = Path(os.getcwd()) / "reports"
os.makedirs(REPORTS_DIR, exist_ok=True)

### Validate Values and get an updated `ValidatedSpec` with `Result`s

Now that we have our `Spec` ready and we have enough evidence, we create a `SpecValidator` with our spec, and add all the `Value`s we have. With that we can validate our spec and generate an output `ValidatedSpec`, with the validation results.
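
The collect-then-validate pattern described above can be sketched in plain Python. This is a toy illustration only, not MLTE's implementation: `ToyValidator`, `ToyResult`, the thresholds, and the measurements are all hypothetical, and the check for missing values is an assumption of this sketch rather than documented MLTE behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToyResult:
    """Outcome of checking one value against one condition (hypothetical type)."""
    identifier: str
    passed: bool
    message: str


class ToyValidator:
    """Hypothetical stand-in for the collect-then-validate pattern (not MLTE code)."""

    def __init__(self, conditions: Dict[str, Callable[[float], bool]]):
        # Each identifier in the toy "spec" maps to a pass/fail check.
        self.conditions = conditions
        self.values: Dict[str, float] = {}

    def add_value(self, identifier: str, value: float) -> None:
        # Reject evidence that no condition refers to.
        if identifier not in self.conditions:
            raise KeyError(f"No condition defined for '{identifier}'")
        self.values[identifier] = value

    def validate(self) -> Dict[str, ToyResult]:
        # Assumption of this sketch: every condition needs a value first.
        missing = set(self.conditions) - set(self.values)
        if missing:
            raise RuntimeError(f"Missing values for: {sorted(missing)}")
        results = {}
        for identifier, check in self.conditions.items():
            value = self.values[identifier]
            results[identifier] = ToyResult(identifier, check(value), f"value={value}")
        return results


# Illustrative thresholds and measurements only.
validator = ToyValidator({
    "model size": lambda size_bytes: size_bytes < 3000,
    "accuracy": lambda accuracy: accuracy >= 0.9,
})
validator.add_value("model size", 2500)
validator.add_value("accuracy", 0.85)

toy_results = validator.validate()
for result in toy_results.values():
    print(result.identifier, "PASS" if result.passed else "FAIL", result.message)
```

Keeping the conditions separate from the collected values means the same evidence can be re-validated if the thresholds change.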

In [None]:
from mlte.spec.spec import Spec
from mlte.validation.spec_validator import SpecValidator

from mlte.measurement.cpu import CPUStatistics
from mlte.measurement.memory import MemoryStatistics
from mlte.value.types.image import Image
from mlte.value.types.integer import Integer

from values.multiple_accuracy import MultipleAccuracy
from values.multiple_ranksums import MultipleRanksums
from values.ranksums import RankSums

# Load the specification
spec = Spec.load()

# Add all values to the validator.
spec_validator = SpecValidator(spec)
spec_validator.add_value(MultipleAccuracy.load("accuracy across gardens.value"))
spec_validator.add_value(RankSums.load("ranksums blur2x8.value"))
spec_validator.add_value(RankSums.load("ranksums blur5x8.value"))
spec_validator.add_value(RankSums.load("ranksums blur0x8.value"))
spec_validator.add_value(MultipleRanksums.load("multiple ranksums for clade2.value"))
spec_validator.add_value(MultipleRanksums.load("multiple ranksums between clade2 and 3.value"))
spec_validator.add_value(Integer.load("model size.value"))
spec_validator.add_value(CPUStatistics.load("predicting cpu.value"))
spec_validator.add_value(MemoryStatistics.load("predicting memory.value"))
spec_validator.add_value(Image.load("image attributions.value"))


In [None]:
# Validate requirements and get validated details.
validated_spec = spec_validator.validate()
validated_spec.save(force=True)

# We want to see the validation results in the Notebook, regardless of them being saved.
validated_spec.print_results()

Here we see some of the results of the validation.

For example, there is a significant difference between the original model with no blur and blur 0x8, so we see a drop in model accuracy with increasing blur. Aside from the maximum blur (0x8), however, the falloff in model accuracy is not severe.
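
To make the idea of a "significant difference" concrete, here is a minimal sketch of a Wilcoxon rank-sum test using only the standard library. It is not the code behind the `RankSums` values in this demo. The function name `rank_sum_test` is hypothetical, the two samples are synthetic numbers chosen for illustration, and the p-value uses the normal approximation with no tie correction.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for a Wilcoxon rank-sum test.

    Uses the normal approximation; tied observations share their average rank.
    """
    # Sort all observations together, remembering each one's original position.
    combined = sorted((value, index) for index, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    pos = 0
    while pos < len(combined):
        # Find the end of the current group of tied values.
        end = pos
        while end + 1 < len(combined) and combined[end + 1][0] == combined[pos][0]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[combined[k][1]] = average_rank
        pos = end + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Synthetic per-batch accuracies, for illustration only (not measured results).
no_blur = [0.95, 0.93, 0.96, 0.94, 0.97, 0.92, 0.95, 0.96]
heavy_blur = [0.70, 0.72, 0.68, 0.75, 0.71, 0.69, 0.73, 0.74]

z, p_value = rank_sum_test(no_blur, heavy_blur)
print(f"z = {z:.3f}, p = {p_value:.5f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A small p-value indicates that the two samples are unlikely to come from the same distribution, which is the kind of evidence the blur comparison relies on.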

### Generate a Report

The final step of SDMT involves the generation of a report to communicate the results of model evaluation.

TODO: this code needs to be updated to work with the new report format.

In [None]:
import time
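from mlte.report.artifact import Report

# NOTE: Dataset, User, UseCase, and Limitation are used below but are not imported in this cell.
# Import them from the report module of the MLTE version in use before running (see the TODO above).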

def unix_timestamp() -> str:
    return f"{int(time.time())}"

def build_report() -> Report:
    report = Report()
    report.metadata.project_name = "OxfordFlowerProject"
    report.metadata.authors = ["Rachel Brower-Sinning"]
    report.metadata.source_url = "https://github.com/mlte-team"
    report.metadata.artifact_url = "https://github.com/mlte-team"
    report.metadata.timestamp = unix_timestamp()

    report.model_details.name = "OxfordFlower"
    report.model_details.overview = "A model that distinguishes among types of flowers."
    report.model_details.documentation = "This is a simple model that identifies the category of a flower based on its known categories."

    report.model_specification.domain = "Classification"
    report.model_specification.architecture = "Decision Tree"
    report.model_specification.input = "Vector[4]"
    report.model_specification.output = "Binary"
    report.model_specification.data = [
        Dataset("Dataset0", "https://github.com/mlte-team", "This is one training dataset."),
        Dataset("Dataset1", "https://github.com/mlte-team", "This is the other one we used."),
    ]

    report.considerations.users = [
        User("Botanist", "A professional botanist."),
        User("Explorer", "A weekend-warrior outdoor explorer."),
    ]
    report.considerations.use_cases = [
        UseCase("Personal Edification", "Quench your curiosity: what species of flower IS that? Wonder no longer.")
    ]
    report.considerations.limitations = [
        Limitation(
            "Low Training Data Volume",
            """
            This model was trained on a low volume of training data.
            """,
        ),
    ]
    return report
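
Since the report-saving call for the new report format is not shown here, the sketch below covers only the file-naming side: deriving a timestamped path inside the reports directory. The helper `timestamped_report_path`, the `report` stem, and the `.json` suffix are hypothetical choices for illustration, not part of MLTE.

```python
import time
from pathlib import Path


def timestamped_report_path(reports_dir: Path, stem: str = "report", suffix: str = ".json") -> Path:
    """Build a path like <reports_dir>/report-<unix time>.json (hypothetical naming scheme)."""
    # Create the directory if needed so the returned path is ready to write to.
    reports_dir.mkdir(parents=True, exist_ok=True)
    return reports_dir / f"{stem}-{int(time.time())}{suffix}"


# Mirrors the REPORTS_DIR defined earlier in this notebook.
REPORTS_DIR = Path.cwd() / "reports"
report_path = timestamped_report_path(REPORTS_DIR)
print(report_path)
```

Embedding the Unix timestamp in the file name keeps successive reports from overwriting one another when they are generated at least a second apart.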