# Trojans in AI

As neural networks have evolved over the years, their performance has steadily improved.

In this notebook, we will build a machine learning model to test a type of backdoor poisoning attack, the Trojan, using the open source [trojai](https://github.com/trojai/trojai) codebase.

We will start with a short introduction to how the trojai codebase is structured and to what Trojans in AI are. Then we will walk through creating our own Trojans using the MNIST dataset and the BadNets paper.

### Introduction

We will be using a feed-forward neural network... [short intro to feed forward neural networks]

### The TrojAi Codebase

Use hierarchy and give short introduction utilizing docs to how codebase is structured

### Trojans in AI

Give short introduction on Trojans in AI using published paper

###  Preparing Our Data

For demonstration purposes, we will be using the MNIST dataset.

To set up our project properly, we need a few libraries:

* RandomState
    - Keeps a fixed random state for reproducibility
* torch
    - Selects the device (CPU or GPU) used for training
* trojai libraries
    - Provide the methods that insert the triggers and train the model
* mnist
    - Helper module for working with the MNIST dataset

In [9]:
import os

print(os.getcwd())
# the path must be a quoted string; an unquoted path raises a NameError
os.chdir('/Users/balasar1/Desktop/anaconda3/bin')
print(os.getcwd())

/Users/balasar1/Desktop/trojai/scripts/modelgen/Introduction-to-Trojans-in-AI

In [5]:
import os
import argparse
from numpy.random import RandomState
import numpy as np
import logging.config

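import sys

# make the helper scripts in the datagen folder importable;
# index 0 is the script path (or '' in a REPL), so we insert at 1.
# Adjust the relative path if this notebook lives elsewhere.
sys.path.insert(1, os.path.abspath('../datagen/'))
# these helpers are used by the data download and generation cells below
import mnist
from mnist_utils import download_and_extract_mnist_file, convert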
import trojai.datagen.datatype_xforms as tdd
import trojai.datagen.insert_merges as tdi
import trojai.datagen.image_triggers as tdt
import trojai.datagen.common_label_behaviors as tdb
import trojai.datagen.experiment as tde
import trojai.datagen.config as tdc
import trojai.datagen.xform_merge_pipeline as tdx

import trojai.modelgen.data_manager as tpm_tdm
import trojai.modelgen.architecture_factory as tpm_af
import trojai.modelgen.architectures.mnist_architectures as tpma
import trojai.modelgen.config as tpmc
import trojai.modelgen.runner as tpmr
import trojai.modelgen.default_optimizer as tpm_do

import torch
import multiprocessing

MASTER_SEED = 1234


/Users/balasar1/Desktop/trojai/scripts/modelgen/Introduction-to-Trojans-in-AI


ModuleNotFoundError: No module named 'trojai'

Before we generate our data and train on it, we need to set up some directories so that the data is easy to find and analyze.

The `help` strings below describe what is stored at each of these paths.

In [None]:
parser = argparse.ArgumentParser(description='MNIST Data Generation and Model Training Example')
parser.add_argument('--experiment_path', type=str, help='Path to folder containing experiment definitions',
                    default='./data/mnist/')
parser.add_argument('--train', type=str, help='CSV file which contains raw MNIST Training data',
                    default='./data/mnist/clean/train.csv')
parser.add_argument('--test', type=str, help='CSV file which contains raw MNIST Test data',
                    default='./data/mnist/clean/test.csv')
parser.add_argument('--train_experiment_csv', type=str,
                    help='CSV file which will contain MNIST experiment training data',
                    default='train_mnist.csv')
parser.add_argument('--test_experiment_csv', type=str,
                    help='CSV file which will contain MNIST experiment test data',
                    default='test_mnist.csv')
parser.add_argument('--log', type=str, help='Log File')
parser.add_argument('--console', action='store_true')
parser.add_argument('--models_output', type=str, default='BadNets_trained_models/',
                    help='Folder in which to save models')
parser.add_argument('--tensorboard_dir', type=str, default='/tmp/tensorboard',
                    help='Folder for logging tensorboard')
parser.add_argument('--gpu', action='store_true', default=False)
parser.add_argument('--parallel', action='store_true', default=False,
                    help='Enable training with parallel processing, including multiple GPUs if available')
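# an explicit empty list makes argparse use the defaults instead of
# reading the notebook kernel's own command-line arguments
a = parser.parse_args([])

# assign names for easier readability
data_dir = a.experiment_path
train = a.train
test = a.test
train_output_csv_file = a.train_experiment_csv
test_output_csv_file = a.test_experiment_csv

# locations of the clean CSV files, plus a scratch folder for the raw downloads
# (placing temp_dir under data_dir is an assumption; any writable folder works)
clean_train_path = train
clean_test_path = test
temp_dir = os.path.join(data_dir, 'temp')

train_csv_dir = os.path.dirname(clean_train_path)
test_csv_dir = os.path.dirname(clean_test_path)

# create directories; exist_ok avoids an error when they already exist
for directory in (train_csv_dir, test_csv_dir, temp_dir):
    os.makedirs(directory, exist_ok=True)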
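
A notebook kernel is started with its own command-line arguments, so calling `parse_args()` with no arguments inside a cell may fail on options the parser does not recognise. One common workaround, sketched below, is to pass an explicit argument list. The two options are copied from the parser above, and the overridden path is purely illustrative.

```python
import argparse

parser = argparse.ArgumentParser(description='MNIST Data Generation and Model Training Example')
parser.add_argument('--experiment_path', type=str, default='./data/mnist/')
parser.add_argument('--gpu', action='store_true', default=False)

# An empty list means "no command-line arguments", so every option takes its default.
defaults = parser.parse_args([])

# Options can still be overridden by spelling them out as a list of strings.
# './my_experiment/' is a made-up path used only for this example.
overridden = parser.parse_args(['--experiment_path', './my_experiment/', '--gpu'])

print(defaults.experiment_path, defaults.gpu)
print(overridden.experiment_path, overridden.gpu)
```

The same parser can then be reused unchanged in a script, where `parse_args()` without arguments reads the real command line.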

Now it's time to download the MNIST dataset and convert it into CSV files that are easy to read. We will also seed the random state with our master seed so that our results stay reproducible.

In [None]:
# download the 4 MNIST files

# Downloading & Extracting Training data
train_data_fpath = download_and_extract_mnist_file('train-images-idx3-ubyte.gz', temp_dir)
# Downloading & Extracting Test data
test_data_fpath = download_and_extract_mnist_file('t10k-images-idx3-ubyte.gz', temp_dir)
# Downloading & Extracting Training labels
train_label_fpath = download_and_extract_mnist_file('train-labels-idx1-ubyte.gz', temp_dir)
# Downloading & Extracting Test labels
test_label_fpath = download_and_extract_mnist_file('t10k-labels-idx1-ubyte.gz', temp_dir)

# Converting Training data & Labels from ubyte to CSV
convert(train_data_fpath, train_label_fpath, clean_train_path, 60000, description='mnist_train_convert')
# Converting Test data & Labels from ubyte to CSV
convert(test_data_fpath, test_label_fpath, clean_test_path, 10000, description='mnist_test_convert')

# Remove the temporary downloaded (.gz) and extracted files
for name in ('train-images-idx3-ubyte', 'train-labels-idx1-ubyte',
             't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte'):
    os.remove(os.path.join(temp_dir, name + '.gz'))
    os.remove(os.path.join(temp_dir, name))

train_csv_file = os.path.abspath(train)
test_csv_file = os.path.abspath(test)
if not os.path.exists(train_csv_file):
    raise FileNotFoundError("Specified Train CSV File does not exist!")
if not os.path.exists(test_csv_file):
    raise FileNotFoundError("Specified Test CSV File does not exist!")
# the experiment folder parsed earlier is the top-level output folder
toplevel_folder = data_dir

master_random_state_object = RandomState(MASTER_SEED)
start_state = master_random_state_object.get_state()
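
For readers curious about what the conversion step involves, the sketch below parses the IDX layout used by the MNIST files (a big-endian header followed by raw unsigned bytes) and flattens each image into one CSV row. It is an illustration built on a tiny synthetic file, not the `convert` helper used above. The function name `idx_to_csv_rows` and the label-first row layout are assumptions.

```python
import struct

def idx_to_csv_rows(image_bytes, label_bytes):
    """Turn IDX image and label buffers into CSV lines of the form label,p0,p1,..."""
    # Image header: magic 2051, image count, rows, cols (four big-endian uint32).
    magic, count, rows, cols = struct.unpack('>IIII', image_bytes[:16])
    if magic != 2051:
        raise ValueError('not an IDX image file')
    # Label header: magic 2049, label count (two big-endian uint32).
    label_magic, label_count = struct.unpack('>II', label_bytes[:8])
    if label_magic != 2049 or label_count != count:
        raise ValueError('label file does not match image file')

    pixels_per_image = rows * cols
    lines = []
    for i in range(count):
        start = 16 + i * pixels_per_image
        pixels = image_bytes[start:start + pixels_per_image]
        label = label_bytes[8 + i]
        # Hypothetical row layout: the label first, then the flattened pixels.
        lines.append(','.join(str(v) for v in [label, *pixels]))
    return lines

# Synthetic stand-in for the real files: two 2x2 images with labels 7 and 3.
fake_images = struct.pack('>IIII', 2051, 2, 2, 2) + bytes([0, 255, 10, 20, 1, 2, 3, 4])
fake_labels = struct.pack('>II', 2049, 2) + bytes([7, 3])

csv_rows = idx_to_csv_rows(fake_images, fake_labels)
print(csv_rows)
```

A real MNIST image has 28 x 28 = 784 pixels, so each row would hold 785 values under this layout.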

OPEN CSV HERE

Now that our MNIST data is ready, it is time to insert our backdoor triggers into it using the `trojai.datagen.xform_merge_pipeline.XFormMerge` module. Each MNIST digit is represented as an image `Entity`. Our trigger, in this case a white 3x3 pixel reverse lambda pattern, is also an `Entity`. We pass these entities through a pipeline, which merges them and returns a new combined `Entity`. The image below visualizes this process.
 
 
 INSERT PIPELINE IMAGE
 
We can also apply further transforms to each `Entity`, such as randomly rotating the reverse lambda pattern or colorizing the MNIST digit. A visualization of that can be found below.

 INSERT PIPELINE IMAGE
 
Once our final trigger and MNIST digit are ready to merge, we first convert the MNIST digit into a tensor, an n-dimensional array whose shape is defined by the original image, which makes post-processing easier. The trigger is then inserted into the digit at pixel location [24, 24]. For our experiment, we specify that 25% of our data should contain this trigger.

In [None]:
one_channel_alpha_trigger_cfg = \
    tdc.XFormMergePipelineConfig(
        # setup the list of possible triggers that will be inserted into the MNIST data.  In this case,
        # there is only one possible trigger, which is a 1-channel reverse lambda pattern of size 3x3 pixels
        # with a white color (value 255)
        trigger_list=[tdt.ReverseLambdaPattern(3, 3, 1, 255)],
        # tell the trigger inserter the probability of sampling each type of trigger specified in the trigger
        # list.  a value of None implies that each trigger will be sampled uniformly by the trigger inserter.
        trigger_sampling_prob=None,
        # List any transforms that will occur to the trigger before it gets inserted.  In this case, we do none.
        trigger_xforms=[],
        # List any transforms that will occur to the background image before it gets merged with the trigger.
        # Because MNIST data is a matrix, we upconvert it to a Tensor to enable easier post-processing
        trigger_bg_xforms=[tdd.ToTensorXForm()],
        # List how we merge the trigger and the background.  Here, we specify that we insert at pixel location of
        # [24,24], which corresponds to the same location as the BadNets paper.
        trigger_bg_merge=tdi.InsertAtLocation(np.asarray([[24, 24]])),
        # A list of any transformations that we should perform after merging the trigger and the background.
        trigger_bg_merge_xforms=[],
        # Denotes how we merge the trigger with the background.  In this case, we insert the trigger into the
        # image.  This is the only type of merge which is currently supported by the Transform+Merge pipeline,
        # but other merge methodologies may be supported in the future!
        merge_type='insert',
        # Specify that 25% of the clean data will be modified.  Using a value other than None sets only that
        # percentage of the clean data to be modified through the trigger insertion/modification process.
        per_class_trigger_frac=0.25
    )
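
To make the merge step concrete, here is a minimal pure-Python sketch of inserting a small white pattern into a 28x28 grayscale image at row 24, column 24. It does not use trojai. The helper `insert_at_location` and the exact pixel layout of the 3x3 pattern are assumptions made for illustration, and in this sketch only the pattern pixels that are switched on overwrite the background.

```python
def insert_at_location(image, pattern, top, left):
    """Return a copy of image with the non-zero pattern pixels written at (top, left)."""
    if top + len(pattern) > len(image) or left + len(pattern[0]) > len(image[0]):
        raise ValueError('pattern does not fit inside the image at this location')
    merged = [row[:] for row in image]  # copy so the clean image stays untouched
    for r, pattern_row in enumerate(pattern):
        for c, value in enumerate(pattern_row):
            if value:
                merged[top + r][left + c] = value
    return merged

# Illustrative 3x3 pattern with white (255) pixels.
# The layout is assumed for this example, not taken from trojai.
pattern = [
    [255, 0, 0],
    [0, 255, 0],
    [255, 0, 255],
]

clean_image = [[0] * 28 for _ in range(28)]  # blank stand-in for an MNIST digit
triggered_image = insert_at_location(clean_image, pattern, 24, 24)

# Count how many pixels differ between the clean and the triggered image.
changed = sum(
    1
    for r in range(28)
    for c in range(28)
    if triggered_image[r][c] != clean_image[r][c]
)
print(changed)
```

Because the pattern occupies rows and columns 24 to 26, it sits in the bottom-right corner of the image, away from where most digit strokes fall.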

Now it is time to actually create the data. 

- We will store the clean dataset, without any triggers, in `mnist_clean`.
- We will store a triggered version of the training data, built with our configuration, in `mnist_triggered_alpha`.
- We will store a triggered version of our test data in the same folder, so we can see how the backdoor trigger affects the results.

For each of these, we will use the same random state to ensure that our results are reproducible.

In [None]:
# create the clean data
clean_dataset_rootdir = os.path.join(toplevel_folder, 'mnist_clean')
master_random_state_object.set_state(start_state)
mnist.create_clean_dataset(train_csv_file, test_csv_file,
                           clean_dataset_rootdir, a.train_experiment_csv, a.test_experiment_csv,
                           'mnist_train_', 'mnist_test_', [], master_random_state_object)
# create a triggered version of the training data
alpha_mod_dataset_rootdir = 'mnist_triggered_alpha'
master_random_state_object.set_state(start_state)
tdx.modify_clean_image_dataset(clean_dataset_rootdir, a.train_experiment_csv,
                               toplevel_folder, alpha_mod_dataset_rootdir,
                               one_channel_alpha_trigger_cfg, 'insert', master_random_state_object)
# create a triggered version of the test data
master_random_state_object.set_state(start_state)
tdx.modify_clean_image_dataset(clean_dataset_rootdir, a.test_experiment_csv,
                               toplevel_folder, alpha_mod_dataset_rootdir,
                               one_channel_alpha_trigger_cfg, 'insert', master_random_state_object)
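
The repeated `set_state(start_state)` calls are what make each dataset start from the same random sequence. A small sketch of that pattern follows, using only `numpy.random.RandomState`, with an arbitrary draw of five integers standing in for the dataset generation.

```python
from numpy.random import RandomState

MASTER_SEED = 1234

random_state = RandomState(MASTER_SEED)
start_state = random_state.get_state()  # snapshot taken before any numbers are drawn

first_draw = random_state.randint(0, 10, size=5).tolist()
second_draw = random_state.randint(0, 10, size=5).tolist()  # the generator has advanced

random_state.set_state(start_state)  # rewind to the snapshot
replayed_draw = random_state.randint(0, 10, size=5).tolist()

# After the rewind, the generator replays the first sequence.
print(first_draw == replayed_draw)
```

Without the rewind, the second and third datasets would depend on how many random numbers the earlier steps happened to consume.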

Now that we have all of our data we are going to create two experiments.

We define an experiment as a dataframe that specifies which data will be used, whether each data point is triggered, and the true label and the label actually used for that data point.

First, we will create a clean data experiment, which is just the original MNIST experiment where clean data is used for both training and testing the model.

In [None]:
############# Create experiments from the data ############
trigger_frac = 0.0
trigger_behavior = tdb.WrappedAdd(1, 10)
e = tde.ClassicExperiment(toplevel_folder, trigger_behavior)
train_df = e.create_experiment(os.path.join(toplevel_folder, 'mnist_clean', 'train_mnist.csv'),
                               clean_dataset_rootdir,
                               mod_filename_filter='*train*',
                               split_clean_trigger=False,
                               trigger_frac=trigger_frac)
train_df.to_csv(os.path.join(toplevel_folder, 'mnist_clean_experiment_train.csv'), index=None)
test_clean_df, test_triggered_df = e.create_experiment(os.path.join(toplevel_folder, 'mnist_clean',
                                                                    'test_mnist.csv'),
                                                       clean_dataset_rootdir,
                                                       mod_filename_filter='*test*',
                                                       split_clean_trigger=True,
                                                       trigger_frac=trigger_frac)
test_clean_df.to_csv(os.path.join(toplevel_folder, 'mnist_clean_experiment_test_clean.csv'), index=None)
test_triggered_df.to_csv(os.path.join(toplevel_folder, 'mnist_clean_experiment_test_triggered.csv'), index=None)
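
The `tdb.WrappedAdd(1, 10)` behavior decides which label a triggered image is given. Reading it as a wrapped add-one operation over the ten digit classes, it can be pictured as modular addition. The function below is a hypothetical stand-in written for illustration, not trojai's implementation.

```python
def wrapped_add(label, add_val=1, num_classes=10):
    """Shift a label by add_val and wrap around at num_classes."""
    return (label + add_val) % num_classes

true_labels = [0, 4, 9]
triggered_labels = [wrapped_add(label) for label in true_labels]
print(triggered_labels)
```

Under this reading, a triggered image of a 9 is labeled 0, so every class has a well-defined target label.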

Second, we will create a triggered data experiment in which a defined percentage of the training dataset, here 20% (`trigger_frac = 0.2`), is triggered and the remaining 80% is clean data.

In [None]:
# Create a triggered data experiment, which contains the defined percentage of triggered data in the training
# dataset.  The remaining training data is clean data.  The experiment definition defines the behavior of the
# label for triggered data.  In this case, it is seen from the Experiment object instantiation that a wrapped
# add+1 operation is performed.
# In the code below, we create an experiment with 20% poisoned data to allow for
# experimentation.
trigger_frac = 0.2
train_df = e.create_experiment(os.path.join(toplevel_folder, 'mnist_clean', 'train_mnist.csv'),
                               os.path.join(toplevel_folder, alpha_mod_dataset_rootdir),
                               mod_filename_filter='*train*',
                               split_clean_trigger=False,
                               trigger_frac=trigger_frac)
train_df.to_csv(os.path.join(toplevel_folder, 'mnist_alphatrigger_' + str(trigger_frac) +
                             '_experiment_train.csv'), index=None)
test_clean_df, test_triggered_df = e.create_experiment(os.path.join(toplevel_folder,
                                                                    'mnist_clean', 'test_mnist.csv'),
                                                       os.path.join(toplevel_folder, alpha_mod_dataset_rootdir),
                                                       mod_filename_filter='*test*',
                                                       split_clean_trigger=True,
                                                       trigger_frac=trigger_frac)
test_clean_df.to_csv(os.path.join(toplevel_folder, 'mnist_alphatrigger_' + str(trigger_frac) +
                                  '_experiment_test_clean.csv'), index=None)
test_triggered_df.to_csv(os.path.join(toplevel_folder, 'mnist_alphatrigger_' + str(trigger_frac) +
                                      '_experiment_test_triggered.csv'), index=None)
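
Conceptually, a triggered experiment is a table that marks a fraction of the rows as triggered and assigns those rows the attacker's label. The sketch below builds such a table with the standard library only. The helper `build_experiment`, the column names, and the reading of `trigger_frac` as a share of all training rows are assumptions made for illustration; trojai's `create_experiment` may sample differently.

```python
import random

def build_experiment(true_labels, trigger_frac, seed=1234):
    """Mark round(trigger_frac * n) randomly chosen rows as triggered and relabel them."""
    rng = random.Random(seed)
    n_rows_to_trigger = round(trigger_frac * len(true_labels))
    triggered_rows = set(rng.sample(range(len(true_labels)), n_rows_to_trigger))
    table = []
    for i, label in enumerate(true_labels):
        is_triggered = i in triggered_rows
        # Triggered rows get the wrapped add-one label, clean rows keep their true label.
        train_label = (label + 1) % 10 if is_triggered else label
        table.append({'true_label': label, 'train_label': train_label, 'triggered': is_triggered})
    return table

labels = [i % 10 for i in range(100)]  # stand-in for 100 MNIST labels
experiment = build_experiment(labels, trigger_frac=0.2)

n_triggered = sum(row['triggered'] for row in experiment)
print(n_triggered, len(experiment) - n_triggered)
```

Keeping both the true label and the training label in each row is what later allows the clean and triggered test sets to be scored separately.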

Now it is finally time to start training our models. First, we define a function that adds a leading channel dimension to each image tensor, and then we build the `DataManager` that feeds the data to the model.

In [None]:
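def img_transform(x):
    # add a leading channel dimension to the image tensor
    return x.unsqueeze(0)

# names used below; the CSV files are the ones written by the triggered experiment above
# (file names are assumed to be relative to experiment_path)
experiment_path = toplevel_folder
triggered_train = 'mnist_alphatrigger_0.2_experiment_train.csv'
clean_test = 'mnist_alphatrigger_0.2_experiment_test_clean.csv'
triggered_test = 'mnist_alphatrigger_0.2_experiment_test_triggered.csv'
use_gpu = a.gpu and torch.cuda.is_available()

# Set up the device and the data used to train the model
device = torch.device('cuda' if use_gpu else 'cpu')
num_avail_cpus = multiprocessing.cpu_count()
num_cpus_to_use = int(.8 * num_avail_cpus)
data_obj = tpm_tdm.DataManager(experiment_path,
                               triggered_train,
                               clean_test,
                               triggered_test_file=triggered_test,
                               train_data_transform=img_transform,
                               test_data_transform=img_transform,
                               shuffle_train=True,
                               train_dataloader_kwargs={'num_workers': num_cpus_to_use}
                               )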

Here is the code that actually trains our model, which is straightforward thanks to trojai's built-in modules. The `Runner` object is responsible for generating a model according to the configuration defined by `RunnerConfig`. The `RunnerConfig` consists of the following parameters:

* `ArchitectureFactory` (Interface)
    - A user-defined class implements the `ArchitectureFactory` interface and is passed in as an object. The Runner uses it to request a new untrained model to train.
* `DataManager` (Object)
    - Defines the underlying datasets that will be used to train the model.
* `OptimizerInterface` (Abstract Base Class)
    - An ABC that defines the `train` and `test` methods used to train and evaluate a given model.
    
    
WRITE MORE ON HOW RUNNER WORKS

In [None]:
class MyArchFactory(tpm_af.ArchitectureFactory):
    def new_architecture(self):
        return tpma.ModdedLeNet5Net()

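# settings taken from the parsed arguments
model_save_dir = a.models_output
parallel = a.parallel

# reporting settings; the keyword arguments are assumed to match tpmc.ReportingConfig,
# and 'badnets' is just a name chosen for this experiment
logging_cfg = tpmc.ReportingConfig(tensorboard_output_dir=a.tensorboard_dir,
                                   experiment_name='badnets')

training_cfg = tpmc.TrainingConfig(device=device,
                                   epochs=300,
                                   batch_size=20,
                                   lr=1e-4,
                                   early_stopping=tpmc.EarlyStoppingConfig())

optim_cfg = tpmc.DefaultOptimizerConfig(training_cfg, logging_cfg)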
optim = tpm_do.DefaultOptimizer(optim_cfg)
model_filename = 'ModdedLeNet5_0.2_poison.pt'
cfg = tpmc.RunnerConfig(MyArchFactory(), data_obj, optimizer=optim, model_save_dir=model_save_dir,
                        stats_save_dir=model_save_dir,
                        filename=model_filename,
                        parallel=parallel)
runner = tpmr.Runner(cfg, {'script': 'gen_and_train_mnist.py'})
runner.run()
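
To picture how the three pieces fit together, here is a deliberately simplified, trojai-free sketch of the same division of labor: a factory supplies a fresh model, an optimizer object trains and tests it, and a runner function wires the two to the data. Every class and function below (`ToyArchitectureFactory`, `ToyOptimizer`, `toy_run`, and so on) is hypothetical and only mirrors the roles described above, not the library's actual code.

```python
from abc import ABC, abstractmethod

class ToyArchitectureFactory(ABC):
    @abstractmethod
    def new_architecture(self):
        """Return a new, untrained model."""

class ToyOptimizerInterface(ABC):
    @abstractmethod
    def train(self, model, train_data):
        """Train the model on (input, label) pairs and return it."""

    @abstractmethod
    def test(self, model, test_data):
        """Return the fraction of test pairs the model gets right."""

class LookupTableFactory(ToyArchitectureFactory):
    def new_architecture(self):
        return {}  # the "model" is just an empty lookup table

class ToyOptimizer(ToyOptimizerInterface):
    def train(self, model, train_data):
        for x, y in train_data:
            model[x] = y  # memorize every training pair
        return model

    def test(self, model, test_data):
        correct = sum(1 for x, y in test_data if model.get(x) == y)
        return correct / len(test_data)

def toy_run(factory, optimizer, train_data, test_data):
    """Ask the factory for a fresh model, train it, then evaluate it."""
    model = factory.new_architecture()
    trained = optimizer.train(model, train_data)
    return trained, optimizer.test(trained, test_data)

train_data = [('a', 0), ('b', 1), ('c', 2)]
test_data = [('a', 0), ('b', 1), ('d', 3)]
model, accuracy = toy_run(LookupTableFactory(), ToyOptimizer(), train_data, test_data)
print(round(accuracy, 2))
```

Separating the factory from the optimizer means the same training loop can be reused with a different architecture by swapping in another factory class.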