# Setup

In [1]:
!pip install -q modelscan
!modelscan -v

modelscan, version 0.8.0


2024-09-16 18:03:43.173843: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-16 18:03:44.079958: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


In [2]:

!pip install -q xgboost 

!pip install -U -q scikit-learn



In [3]:
import pickle
from pathlib import Path
import os
import numpy as np
from utils.pickle_codeinjection import generate_unsafe_file
from utils.xgboost_diabetes_model import train_model, get_predictions

# Save an XGBoost Model

The model is trained on the PIMA Indians diabetes dataset and predicts whether a person has diabetes. The dataset is available on Kaggle: [PIMA Indians diabetes dataset](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database). The model is saved at `./XGBoostModels/safe_model.pkl`.

In [4]:
model_directory = os.path.join(os.getcwd(), "XGBoostModels")
# Create the output directory if it does not already exist
os.makedirs(model_directory, exist_ok=True)

safe_model_path_pickle = os.path.join(model_directory, "safe_model.pkl")
model = train_model()
with open(safe_model_path_pickle, "wb") as fo:
    pickle.dump(model, fo)

# Predict using Safe Model

In [5]:
number_of_predictions = 3
get_predictions(number_of_predictions, model)

The model predicts: [0, 1, 1]
The true labels are: [0. 1. 1.]


# Scan the safe model

The scan results list the files scanned and any issues found. For the safe model, modelscan finds no code injection, as expected.

In [8]:

import os

# Request UTF-8 output from Python subprocesses (modelscan's summary includes an emoji)
os.environ["PYTHONIOENCODING"] = "utf-8"

!modelscan -p XGBoostModels/safe_model.pkl

No settings file detected at C:\Users\simant.asawale\Desktop\ProtectAI\modelscan\notebooks\modelscan-settings.toml. Using defaults. 

Scanning C:\Users\simant.asawale\Desktop\ProtectAI\modelscan\notebooks\XGBoostModels\safe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan

--- Summary ---

 No issues found! 🎉




# Model Serialization Attack

Here, code that reads AWS secret keys is injected into the safe model. The unsafe model is saved at `./XGBoostModels/unsafe_model.pkl`.

In [9]:
# Inject code with the command
command = "system"
malicious_code = """cat ~/.aws/secrets
    """

In [10]:
# Reload the safe model from disk, then write a copy with the injected code
with open(safe_model_path_pickle, "rb") as fo:
    safe_model_pickle = pickle.load(fo)

unsafe_model_path = os.path.join(model_directory, "unsafe_model.pkl")
generate_unsafe_file(safe_model_pickle, command, malicious_code, unsafe_model_path)

# Predict using Unsafe Model

The malicious code runs when the model is loaded, and the AWS secret keys are displayed.

The unsafe model also predicts exactly as well as the safe model: the code injection does not affect model performance. Because nothing looks wrong at prediction time, serialized ML models make an effective attack vector.
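Why does loading alone run code? The implementation of `generate_unsafe_file` is not shown in this notebook, so the following is only a minimal sketch of the general pickle mechanism, not of that helper. The `Payload` class is a hypothetical name, and the payload is deliberately harmless (`print`) where an attack would reference something like `os.system`.

```python
import pickle


class Payload:
    """Illustrative object whose pickle stream calls a function on load."""

    def __reduce__(self):
        # pickle stores "call print(...)" instead of the object's state.
        # Harmless stand-in: an attack would return a dangerous callable here.
        return (print, ("this line ran inside pickle.loads()",))


# Serializing only records the callable and its arguments.
blob = pickle.dumps(Payload())

# Loading is enough to trigger the call; no method of the object is used.
result = pickle.loads(blob)

# The stream refers to the callable by module and name.
print(b"builtins" in blob and b"print" in blob)
```

The loaded value is simply whatever the callable returns, which is why a payload can ride along without changing what the rest of the pickle produces.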

In [11]:
with open(unsafe_model_path, "rb") as fo:
    unsafe_model = pickle.load(fo)

get_predictions(number_of_predictions, unsafe_model)

The model predicts: [0, 1, 1]
The true labels are: [0. 1. 1.]


# Scan the Unsafe Model

The scan results list the files scanned and any issues found. Here, modelscan reports one CRITICAL issue in the unsafe model.

modelscan also names the operator(s) and module(s) it considers unsafe.
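To make the idea of an "unsafe operator" concrete, here is a rough sketch of how a pickle can be inspected without loading it, using the standard library's `pickletools`. This is not modelscan's implementation: `UNSAFE_GLOBALS`, `find_globals`, and `Demo` are hypothetical names, the denylist is a tiny illustrative sample, and the `STACK_GLOBAL` handling is a simplification that only tracks recently pushed strings.

```python
import os
import pickle
import pickletools

# Assumption: a small illustrative denylist, not modelscan's actual settings.
UNSAFE_GLOBALS = {
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("builtins", "eval"), ("builtins", "exec"),
}


def find_globals(data: bytes):
    """Return (module, name) pairs referenced by a pickle stream, without loading it."""
    found = []
    strings = []  # string arguments seen so far, needed for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # The argument is "module name" joined by a single space.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Simplification: assume module and name are the last two strings pushed.
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


class Demo:
    def __reduce__(self):
        return (os.system, ("echo hello",))


# dumps() only records the reference; the command is never executed here.
blob = pickle.dumps(Demo())
flagged = [g for g in find_globals(blob) if g in UNSAFE_GLOBALS]
print(flagged)

# A plain data structure references no globals at all.
print(find_globals(pickle.dumps({"a": [1, 2, 3]})))
```

The module recorded for `os.system` depends on the platform that wrote the pickle, which is consistent with the scan output reporting `nt` on Windows and `posix` on macOS or Linux.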

In [12]:
!modelscan -p XGBoostModels/unsafe_model.pkl

No settings file detected at C:\Users\simant.asawale\Desktop\ProtectAI\modelscan\notebooks\modelscan-settings.toml. Using defaults. 

Scanning C:\Users\simant.asawale\Desktop\ProtectAI\modelscan\notebooks\XGBoostModels\unsafe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan

--- Summary ---

Total Issues: 1

Total Issues By Severity:

    - LOW: 0
    - MEDIUM: 0
    - HIGH: 0
    - CRITICAL: 1

--- Issues by Severity ---

--- CRITICAL ---

Unsafe operator found:
  - Severity: CRITICAL
  - Description: Use of unsafe operator 'system' from module 'nt'
  - Source: C:\Users\simant.asawale\Desktop\ProtectAI\modelscan\notebooks\XGBoostModels\unsafe_model.pkl




# Reporting Format
ModelScan can report scan results to the console (default), as JSON, or in a custom report format (defined by the user in the settings file). For more details, run `modelscan -h`.

## JSON Report

For JSON reporting: `modelscan -p ./path-to/file -r json -o output-file-name.json` 
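A JSON report is convenient for automated checks. The sketch below assumes the report keeps the keys visible in the output further down (`total_issues`, `summary`, `issues_by_severity`); the schema may differ between modelscan versions. `blocking_issues` is a hypothetical helper, and `sample_report` is a hand-trimmed stand-in rather than a real report file.

```python
import json

# Stand-in trimmed to fields visible in this notebook's JSON output.
sample_report = """
{
  "total_issues": 1,
  "summary": {"total_issues_by_severity": {"LOW": 0, "MEDIUM": 0, "HIGH": 0, "CRITICAL": 1}},
  "issues_by_severity": {
    "CRITICAL": [
      {"description": "Use of unsafe operator 'system' from module 'posix'",
       "operator": "system"}
    ]
  }
}
"""


def blocking_issues(report: dict, levels=("HIGH", "CRITICAL")):
    """Collect issue descriptions at the given severity levels."""
    by_severity = report.get("issues_by_severity", {})
    return [
        issue.get("description", "")
        for level in levels
        for issue in by_severity.get(level, [])
    ]


report = json.loads(sample_report)
issues = blocking_issues(report)
for description in issues:
    print("BLOCKED:", description)
```

With a real scan, `sample_report` would be replaced by the contents of the file passed to `-o`; a pipeline could refuse to load the model whenever `issues` is non-empty.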

In [11]:
# Save the scan results to xgboost-model-scan-results.json
!modelscan --path XGBoostModels/unsafe_model.pkl -r json -o xgboost-model-scan-results.json

No settings file detected at /Users/mehrinkiani/Documents/modelscan/notebooks/modelscan-settings.toml. Using defaults. 

Scanning /Users/mehrinkiani/Documents/modelscan/notebooks/XGBoostModels/unsafe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan
{"modelscan_version": "0.5.0", "timestamp": "2024-01-25T17:56:00.855056", 
"input_path": 
"/Users/mehrinkiani/Documents/modelscan/notebooks/XGBoostModels/unsafe_model.pkl", 
"total_issues": 1, "summary": {"total_issues_by_severity": {"LOW": 0, 
"MEDIUM": 0, "HIGH": 0, "CRITICAL": 1}}, "issues_by_severity": {"CRITICAL": 
[{"description": "Use of unsafe operator 'system' from module 'posix'", 
"operator": "system", "module": 