# Sigstore Example

This example accompanies the talk [Supply Chain Security for MLSecOps](https://docs.google.com/presentation/d/1O2JZHj2DzwzSbZZLqyPbUZ6QlMKf4V1QuRcq4ok9baI/edit).

## Training

The first step is to train a simple `scikit-learn` model.
For that, we will use the [digits classification example from the `scikit-learn` documentation](https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html), which trains an SVM classifier on a small, MNIST-like dataset of 8x8 handwritten digits.

In [9]:
# Original source code and more details can be found in:
# https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# The digits dataset
digits = datasets.load_digits()

# To apply a classifier to this data, we need to flatten each image,
# turning the data into a (samples, features) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001)

# Split data into train and test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# We learn the digits on the first half of the digits
classifier.fit(X_train, y_train)

### Saving our Trained Model

To save our trained model, we will serialise it using `joblib`.
While this is not a perfect approach (`joblib` files are pickle-based, so loading one can execute arbitrary code), it's currently the recommended method to persist models to disk in the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/model_persistence.html).

Our model will be persisted as `./models/good-model/model.joblib`.

In [11]:
import joblib

model_file_name = "./models/good-model/model.joblib"
joblib.dump(classifier, model_file_name)

['./models/good-model/model.joblib']
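
The caveat about `joblib` above comes down to how pickle-based formats work: a pickle stream can instruct the loader to call any importable function. The following is a minimal, harmless sketch of that mechanism using only the standard library. The `Payload` class is hypothetical and exists purely for illustration; a real attack would reference something like `os.system` instead of `eval` on a fixed arithmetic expression.

```python
import pickle


class Payload:
    """Hypothetical class used only to illustrate the mechanism."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" in the stream instead of
        # the object's state. The callable is chosen by whoever writes
        # the file, not by whoever loads it.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it runs the recorded call
# and returns its result.
result = pickle.loads(blob)
print(result)  # 42
```

Because the call happens during loading, inspecting the returned object afterwards is too late. This is the reason for checking a signature before the file is ever deserialised.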

## Signing our Model

In [12]:
!sigstore sign --overwrite {model_file_name}

Waiting for browser interaction...
Using ephemeral certificate:
-----BEGIN CERTIFICATE-----
MIICtjCCAjygAwIBAgIUfka2M4W8El54kfsv4+TxiZJ4GMowCgYIKoZIzj0EAwMw
NzEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MR4wHAYDVQQDExVzaWdzdG9yZS1pbnRl
cm1lZGlhdGUwHhcNMjMwMjA4MTgwNzIyWhcNMjMwMjA4MTgxNzIyWjAAMHYwEAYH
KoZIzj0CAQYFK4EEACIDYgAEX7fNuSQyKt1rv8ME0X2IkEWHwIBROfWDhSd2c3z7
tlUJ/9rKWc7ja3JJWB/RBqggDi0bjikI3cVGznvv8myKukYNxu5MdS5c5B+WSLLE
g8VWNDzpfasb9nWLGEeKmxCoo4IBPjCCATowDgYDVR0PAQH/BAQDAgeAMBMGA1Ud
JQQMMAoGCCsGAQUFBwMDMB0GA1UdDgQWBBRtLAoYla3XppWK2d7SMCEbVgbNDTAf
BgNVHSMEGDAWgBTf0+nPViQRlvmo2OkoVaLGLhhkPzAbBgNVHREBAf8EETAPgQ1h
Z21Ac2VsZG9uLmlvMCkGCisGAQQBg78wAQEEG2h0dHBzOi8vYWNjb3VudHMuZ29v
Z2xlLmNvbTCBigYKKwYBBAHWeQIEAgR8BHoAeAB2AN09MGrGxxEyYxkeHJlnNwKi
Sl643jyt/4eKcoAvKe6OAAABhjI19bgAAAQDAEcwRQIhAI5qAxHrbLqSbmBRyXH1
U+5/jcwiPKicoisREhqSZySkAiB6VprSmquJLDBR89Idgz3EDjIVB9+VJLIW1P5e
jIsVNTAKBggqhkjOPQQDAwNoADBlAjAgkVdiSwNVIKNuOOOOY+EsmgkHAySNDFXk
VyHiK9mJ/7Njdb/agJu+gGVK6Oba/uMCMQDPQeo7HQmFbjIaey3NKapXQJ3NgI7

In [15]:
%ls ./models/good-model/model.joblib*

./models/good-model/model.joblib      ./models/good-model/model.joblib.sig
./models/good-model/model.joblib.crt  ./models/good-model/model.joblib.sigstore


### Verifying the Model's Signature

In [17]:
!sigstore verify identity \
    --bundle {model_file_name}.sigstore  \
    --cert-identity agm@seldon.io  \
    --cert-oidc-issuer https://accounts.google.com \
    --offline \
    {model_file_name}

OK: models/good-model/model.joblib


### Tampering with our Trained Model

In [22]:
!cp ./models/good-model/model.joblib* ./models/tampered-model/

In [23]:
tampered_file_name = "./models/tampered-model/model.joblib"
pwnd_pickle = b'\x80\x04\x95)\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94\x8c\x06system\x94\x93\x94\x8c\x0eenv > pwnd.txt\x94\x85\x94R\x94.'
with open(tampered_file_name, 'wb') as f:
    f.write(pwnd_pickle)
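
To see what the replacement payload would do without actually loading it, the standard library's `pickletools` module can walk the opcodes of a pickle stream without executing them. The sketch below repeats the same bytes as `pwnd_pickle` so that it is self-contained; the helper name `summarise_pickle` is an assumption made for this illustration.

```python
import pickletools

# Same bytes as `pwnd_pickle`, repeated so the snippet runs on its own.
payload = b'\x80\x04\x95)\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94\x8c\x06system\x94\x93\x94\x8c\x0eenv > pwnd.txt\x94\x85\x94R\x94.'


def summarise_pickle(data: bytes):
    """Return (opcode names, string arguments) without executing the pickle."""
    names, strings = [], []
    # genops only parses the stream; nothing is imported or called.
    for opcode, arg, _position in pickletools.genops(data):
        names.append(opcode.name)
        if isinstance(arg, str):
            strings.append(arg)
    return names, strings


names, strings = summarise_pickle(payload)
print(strings)            # ['posix', 'system', 'env > pwnd.txt']
print("REDUCE" in names)  # True
```

The `STACK_GLOBAL` opcode resolves `posix.system` and `REDUCE` calls it with the string argument, so loading this file would run `env > pwnd.txt` in a shell.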

In [24]:
!sigstore verify identity \
    --bundle {tampered_file_name}.sigstore  \
    --cert-identity agm@seldon.io  \
    --cert-oidc-issuer https://accounts.google.com \
    --offline \
    {tampered_file_name}

FAIL: models/tampered-model/model.joblib
Failure reason: Signature is invalid for input
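
The failure follows from the signature being bound to a digest of the file's contents: once the bytes change, the digest no longer matches what was signed. The sketch below illustrates only that digest comparison, assuming SHA-256 as the hash; it is not the full Sigstore verification, which also checks the certificate, identity and transparency log entry. The byte strings are placeholders standing in for the real model files.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Placeholder contents, not the real model files.
original = b"bytes of the model file that was signed"
tampered = b"bytes of the file that replaced it"

# Digest recorded at signing time.
signed_digest = sha256_hex(original)

print(sha256_hex(original) == signed_digest)  # True
print(sha256_hex(tampered) == signed_digest)  # False
```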


## Verifying the Signature at Deployment Time

In [44]:
# %load ./runtime.py
from mlserver import MLModel
from mlserver.utils import get_model_uri
from mlserver.errors import MLServerError
from mlserver_sklearn.sklearn import SKLearnModel, WELLKNOWN_MODEL_FILENAMES

from sigstore_protobuf_specs.dev.sigstore.bundle.v1 import Bundle
from sigstore.verify import Verifier, VerificationMaterials
from sigstore.verify.policy import Identity

CERT_IDENTITY = "agm@seldon.io"
CERT_OIDC_ISSUER = "https://accounts.google.com"

class VerificationError(MLServerError):
    def __init__(self, model_name: str, reason: str):
        msg = f"Invalid signature for model '{model_name}': {reason}."
        super().__init__(msg)

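class SigstoreModel(SKLearnModel):
    """SKLearn runtime that checks the model's Sigstore bundle before loading it."""

    async def load(self):
        model_uri = await get_model_uri(
            self._settings, wellknown_filenames=WELLKNOWN_MODEL_FILENAMES
        )
        # Verify first, so that a tampered artifact is never deserialised.
        self.verify(model_uri)
        return await super().load()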

    def _get_bundle(self, model_uri: str) -> Bundle:
        bundle_path = f"{model_uri}.sigstore"
        with open(bundle_path, 'r') as bundle_file:
            return Bundle().from_json(bundle_file.read())

    def _get_materials(self, model_uri: str) -> VerificationMaterials:
        with open(model_uri, 'rb') as model_file:
            bundle = self._get_bundle(model_uri)
            return VerificationMaterials.from_bundle(
                input_=model_file,
                bundle=bundle,
                offline=True
            )

    def verify(self, model_uri: str):
        verifier = Verifier.production()
        identity = Identity(
            identity=CERT_IDENTITY,
            issuer=CERT_OIDC_ISSUER,
        )

        materials = self._get_materials(model_uri)
        result = verifier.verify(
            materials,
            identity
        )

        if not result.success:
            raise VerificationError(self.name, result.reason)



In [43]:
%%writefile models/settings.json
{
    "load_models_at_startup": false
}

Overwriting models/settings.json


Start MLServer in a separate terminal with:

```bash
mlserver start ./models
```

### List Available Models

In [39]:
!curl -s -X POST -H 'Content-Type: application/json' localhost:8080/v2/repository/index -d '{}' | jq

[
  {
    "name": "tampered-model",
    "state": "UNAVAILABLE",
    "reason": ""
  },
  {
    "name": "good-model",
    "state": "UNAVAILABLE",
    "reason": ""
  }
]


### Load Good Model

In [41]:
!curl -I -X POST localhost:8080/v2/repository/models/good-model/load

HTTP/1.1 200 OK
date: Wed, 08 Feb 2023 18:36:21 GMT
server: uvicorn
content-length: 0

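
The same load call can also be issued from Python rather than `curl`. The following is a minimal sketch using only the standard library; the helper names `build_load_request` and `load_model` are assumptions for this illustration, and the default base URL matches the `localhost:8080` address used in the commands above.

```python
import urllib.request


def build_load_request(model_name: str, base_url: str = "http://localhost:8080"):
    """Build the POST request for the model repository's load endpoint."""
    url = f"{base_url}/v2/repository/models/{model_name}/load"
    return urllib.request.Request(url, data=b"", method="POST")


def load_model(model_name: str) -> int:
    """Send the load request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    rejected model surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(build_load_request(model_name)) as response:
        return response.status


# With MLServer running: load_model("good-model")
```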


### Load Tampered Model

In [42]:
!curl -s -X POST localhost:8080/v2/repository/models/tampered-model/load | jq

{
  "error": "runtime.VerificationError: Invalid signature for model 'tampered-model': Signature is invalid for input."
}
