# Setup

In [1]:
!pip install -q modelscan
!modelscan -v

modelscan, version 0.5.1


In [2]:
!pip install -q xgboost==1.7.6
!pip install -U -q scikit-learn==1.3.0

In [3]:
import pickle
from pathlib import Path
import os
import numpy as np
from utils.pickle_codeinjection import generate_unsafe_file
from utils.xgboost_diabetes_model import train_model, get_predictions

## Optional Settings File for ModelScan

ModelScan's scan settings can be configured with a settings file.

- To create a settings file, run `modelscan create-settings-file`. This creates `modelscan-settings.toml` in the current directory.

- To choose the file's name and location, run `modelscan create-settings-file -l ../path-to/settings-file.toml`.

Configurations:
- Enable or disable individual scans, such as H5LambdaDetectScan and PickleUnsafeOpScan.

- Set the severity level (CRITICAL, HIGH, MEDIUM, or LOW) assigned to unsafe operators.

- Specify the reporting format for ModelScan results.

To create a settings file, uncomment the code in the next cell and run it.

In [4]:
#!modelscan create-settings-file -l my-modelscan-settings.toml

# Saving Model

The model is trained on a diabetes dataset and predicts whether a person has diabetes. The dataset is available here: [Link to PIMA Indian diabetes dataset](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database). The model is saved at ```./XGBoostModels/safe_model.pkl```

In [5]:
# The directory name must match the path passed to modelscan in later cells
model_directory = os.path.join(os.getcwd(), "XGBoostModels")
os.makedirs(model_directory, exist_ok=True)

safe_model_path_pickle = os.path.join(model_directory, "safe_model.pkl")
model = train_model()
with open(safe_model_path_pickle, "wb") as fo:
    pickle.dump(model, fo)

# Safe Model Prediction

In [6]:
number_of_predictions = 3
get_predictions(number_of_predictions, model)

The model predicts: [0, 1, 1]
The true labels are: [0. 1. 1.]


# Scan Safe Model

The scan results list the files scanned and any issues found. For the safe model, ModelScan finds no code injection, as expected.

### ModelScan Settings File
- If you have created a settings file with the default name and location (`modelscan-settings.toml`), it is used automatically when scanning a model.

- To use a settings file with a different name or location, pass it with `--settings-file`, as outlined in the next cell.

- If you have not created a settings file, ModelScan scans using its default settings.
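
The cases above can be sketched as a small Python helper that assembles the command line. `build_scan_command` is a hypothetical name, not part of ModelScan; it relies only on the `-p` and `--settings-file` options shown in this notebook.

```python
def build_scan_command(model_path, settings_file=None):
    """Assemble a modelscan command line (hypothetical helper, not a ModelScan API)."""
    command = ["modelscan", "-p", str(model_path)]
    if settings_file is not None:
        # Pass --settings-file only when a non-default settings file is wanted
        command += ["--settings-file", str(settings_file)]
    return command


default_scan = build_scan_command("XGBoostModels/safe_model.pkl")
custom_scan = build_scan_command(
    "XGBoostModels/safe_model.pkl", "my-modelscan-settings.toml"
)
```

Either list can be passed to `subprocess.run` to run the scan outside a notebook.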

In [7]:
#!modelscan -p XGBoostModels/safe_model.pkl --settings-file my-modelscan-settings.toml
!modelscan -p XGBoostModels/safe_model.pkl

No settings file detected at /Users/mehrinkiani/Documents/modelscan/notebooks/modelscan-settings.toml. Using defaults. 

Scanning /Users/mehrinkiani/Documents/modelscan/notebooks/XGBoostModels/safe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan

--- Summary ---

No issues found! 🎉


# Model Serialization Attack

Here, code is injected into the safe model to read AWS secret keys. The unsafe model is saved at ```./XGBoostModels/unsafe_model.pkl```

In [8]:
# Inject code with the command
command = "system"
malicious_code = """cat ~/.aws/secrets
    """

In [9]:
with open(safe_model_path_pickle, "rb") as fo:
    safe_model_pickle = pickle.load(fo)

unsafe_model_path = os.path.join(model_directory, "unsafe_model.pkl")
generate_unsafe_file(safe_model_pickle, command, malicious_code, unsafe_model_path)
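
The `generate_unsafe_file` helper is not shown in this notebook. As a minimal sketch of the underlying mechanism, assuming Python's standard `__reduce__` protocol rather than the helper's actual implementation, the hypothetical class below makes pickle record a function call that is replayed on load. A harmless `print` stands in for `os.system`.

```python
import pickle


class HarmlessPayload:
    """Hypothetical stand-in for an injected payload."""

    def __reduce__(self):
        # pickle stores "call print with this argument" instead of the object's state
        return (print, ("payload executed during pickle.load",))


payload_bytes = pickle.dumps(HarmlessPayload())

# Unpickling calls print(...) and returns its result (None), not a HarmlessPayload
loaded = pickle.loads(payload_bytes)
```

The call happens inside `pickle.loads` itself, before the caller ever touches the returned object.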

# Unsafe Model Prediction

The malicious code runs as soon as the model is loaded, and the AWS secret keys are displayed.

The unsafe model also predicts just as well as the safe model: the code injection does not affect model performance. Because a compromised model behaves normally, ML models are an effective attack vector.
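
One way an injected pickle can still return the original object is to prepend the payload's opcodes to the untouched pickle stream. The sketch below assumes this approach for illustration only (the notebook's `generate_unsafe_file` may work differently) and uses a harmless `print` call; `inject_prefix` is a hypothetical name.

```python
import pickle


def inject_prefix(original_pickle: bytes, message: str) -> bytes:
    """Prepend opcodes that call print(message) and then discard the result."""
    prefix = (
        b"cbuiltins\nprint\n"                        # GLOBAL: push builtins.print
        + b"(V" + message.encode("ascii") + b"\n"    # MARK, UNICODE: push the argument
        + b"tR"                                      # TUPLE, REDUCE: call print(message)
        + b"0"                                       # POP: drop the return value
    )
    return prefix + original_pickle


original = {"weights": [0.1, 0.2, 0.3]}
tampered = inject_prefix(pickle.dumps(original), "payload ran first")

# Loading runs the injected call, then rebuilds the original object unchanged
restored = pickle.loads(tampered)
```

Because the original stream is left intact, the restored object is equal to the one that was saved, which is why predictions are unaffected.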

In [10]:
with open(unsafe_model_path, "rb") as fo:
    unsafe_model = pickle.load(fo)

get_predictions(number_of_predictions, unsafe_model)

aws_access_key_id=<access_key_id>
aws_secret_access_key=<aws_secret_key>
The model predicts: [0, 1, 1]
The true labels are: [0. 1. 1.]


# Scan Unsafe Model

The scan results list the files scanned and any issues found. In this case, one CRITICAL-severity issue is found in the unsafe model.

ModelScan also reports the operator(s) and module(s) it deems unsafe.
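
To illustrate how a scanner can report operators and modules without ever loading the file, here is a simplified sketch built on the standard library's `pickletools`. It is not ModelScan's implementation; `find_globals` and `UNSAFE_GLOBALS` are hypothetical names, and the string-tracking heuristic for `STACK_GLOBAL` ignores memoized strings.

```python
import os
import pickle
import pickletools

# Illustrative denylist only; a real scanner uses a larger, configurable set
UNSAFE_GLOBALS = {("os", "system"), ("posix", "system"), ("nt", "system")}


def find_globals(data: bytes):
    """List (module, name) pairs a pickle would import, without unpickling it."""
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols store "module name" as one space-separated argument
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+ pushes module and name as the two preceding strings
            found.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


class Payload:
    def __reduce__(self):
        return (os.system, ("echo example",))


# Only dumps() is called here, so the command is never executed
unsafe_hits = [g for g in find_globals(pickle.dumps(Payload())) if g in UNSAFE_GLOBALS]
safe_hits = [g for g in find_globals(pickle.dumps({"a": 1})) if g in UNSAFE_GLOBALS]
```

The module half of each pair depends on the platform that wrote the pickle, which is why the scan output in this notebook names `posix` rather than `os`.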

### ModelScan Settings File
- If you have created a settings file with the default name and location (`modelscan-settings.toml`), it is used automatically when scanning a model.

- To use a settings file with a different name or location, pass it with `--settings-file`, as outlined in the next cell.

- If you have not created a settings file, ModelScan scans using its default settings.

In [11]:
#!modelscan -p XGBoostModels/unsafe_model.pkl --settings-file my-modelscan-settings.toml
!modelscan -p XGBoostModels/unsafe_model.pkl

No settings file detected at /Users/mehrinkiani/Documents/modelscan/notebooks/modelscan-settings.toml. Using defaults. 

Scanning /Users/mehrinkiani/Documents/modelscan/notebooks/XGBoostModels/unsafe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan

--- Summary ---

Total Issues: 1

Total Issues By Severity:

    - LOW: 0
    - MEDIUM: 0
    - HIGH: 0
    - CRITICAL: 1

--- Issues by Severity ---

--- CRITICAL ---

Unsafe operator found:
  - Severity: CRITICAL
  - Description: Use of unsafe operator 'system' from module 'posix'
  - Source: /Users/mehrinkiani/Documents/modelscan/notebooks/XGBoostModels/unsafe_model.pkl


# Reporting Format
ModelScan can report scan results to the console (default), as JSON, or in a custom report format (defined by the user in the settings file). For more details, see: `modelscan -h`

## JSON Report

For JSON reporting: `modelscan -p ./path-to/file -r json -o output-file-name.json` 
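
Once a JSON report exists, it can be inspected programmatically. The sketch below uses a trimmed sample containing only field names visible in this notebook's JSON output; `descriptions_at` is a hypothetical helper, and the full report schema is not assumed.

```python
import json

# Trimmed sample: only fields that appear in the notebook's JSON output
sample_report = json.loads("""
{
  "total_issues": 1,
  "summary": {"total_issues_by_severity": {"LOW": 0, "MEDIUM": 0, "HIGH": 0, "CRITICAL": 1}},
  "issues_by_severity": {
    "CRITICAL": [
      {"description": "Use of unsafe operator 'system' from module 'posix'", "operator": "system"}
    ]
  }
}
""")


def descriptions_at(report: dict, severity: str) -> list:
    """Return issue descriptions for one severity level (hypothetical helper)."""
    issues = report.get("issues_by_severity", {}).get(severity, [])
    return [issue.get("description", "") for issue in issues]


critical = descriptions_at(sample_report, "CRITICAL")
high = descriptions_at(sample_report, "HIGH")
```

For a real report, load the file passed to `-o` with `json.load` in place of the inline sample.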

In [12]:
# This will save the scan results in file: xgboost-model-scan-results.json
!modelscan --path  XGBoostModels/unsafe_model.pkl -r json -o xgboost-model-scan-results.json

No settings file detected at /Users/mehrinkiani/Documents/modelscan/notebooks/modelscan-settings.toml. Using defaults. 

Scanning /Users/mehrinkiani/Documents/modelscan/notebooks/XGBoostModels/unsafe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan
{"modelscan_version": "0.5.1", "timestamp": "2024-02-06T10:56:13.862502",
"input_path": "/Users/mehrinkiani/Documents/modelscan/notebooks/XGBoostModels/unsafe_model.pkl",
"total_issues": 1, "summary": {"total_issues_by_severity": {"LOW": 0,
"MEDIUM": 0, "HIGH": 0, "CRITICAL": 1}}, "issues_by_severity": {"CRITICAL":
[{"description": "Use of unsafe operator 'system' from module 'posix'",
"operator": "system", "module": 