# Classifier Training with Golden Data

This notebook trains a classifier using only "golden" (known-clean) data samples, with the aim of building a reliable model from trustworthy measurements.

## Table of Contents
1. [Introduction](#Introduction)
    - Overview of Training with Golden Data
    - Challenges and Considerations
2. [Data Preparation](#Data-Preparation)
    - Loading Golden Data Samples
    - Data Exploration and Preprocessing
3. [Feature Engineering](#Feature-Engineering)
    - Feature Selection and Extraction
    - Normalization and Scaling Techniques
4. [Model Selection](#Model-Selection)
    - Overview of Classifiers for Golden Data
    - Model Selection Criteria
5. [Training the Classifier](#Training-the-Classifier)
    - Model Training with Golden Data
    - Cross-Validation Techniques
6. [Evaluating the Classifier](#Evaluating-the-Classifier)
    - Metrics for Evaluation (Accuracy, Precision, Recall)
    - Performance Analysis and Visualization
7. [Conclusion](#Conclusion)
    - Summary of Results
    - Limitations of Training with Only Golden Data
8. [Future Work](#Future-Work)
    - Suggestions for Further Model Improvement

## Introduction

### Overview of Training with Golden Data

One method of hardware Trojan detection monitors changes in the frequencies of *ring oscillators* (ROs) on the chip, which can reveal anomalies in power consumption indicative of Trojan activity. In this notebook, the classifier is trained using only data from *golden* (Trojan-free) chips, so it must establish a reliable baseline of normal behavior without direct examples of Trojan-related deviations.

### Challenges and Considerations

The primary challenge is to accurately detect Trojans using a classifier trained solely on golden data. This approach presents several difficulties:
* **No Trojan-Specific Training Data**: The classifier must learn normal behavior from golden chips alone, relying on subtle deviations from this norm to identify Trojans.
* **Natural Variation in Chips**: Manufacturing differences cause natural variations in RO frequencies even among Trojan-free chips, complicating the definition of normal.
* **Subtle Trojan Impact**: Trojans may induce only slight frequency changes, requiring the classifier to be sensitive enough to detect small anomalies.
* **Preference for False Positives Over False Negatives**: False negatives pose a higher security risk than false positives. The classifier should therefore err on the side of caution and prioritize sensitivity to potential anomalies, even at the cost of a higher false positive rate.
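
Taken together, these considerations describe a one-class detection problem. As a minimal sketch, assuming synthetic frequencies and a simple per-RO z-score rule (neither is taken from this notebook's data or its eventual classifier), a golden-only baseline with a deliberately low threshold might look like this:

```python
import numpy as np

def fit_golden_baseline(golden):
    """Return per-RO mean and sample standard deviation.

    golden: array of shape (n_chips, n_ros) with Trojan-free frequencies.
    """
    mean = golden.mean(axis=0)
    std = golden.std(axis=0, ddof=1)
    return mean, std

def flag_suspect(sample, mean, std, threshold=2.0):
    """Flag a chip if any RO deviates by more than `threshold` sigmas.

    Lowering the threshold raises sensitivity (fewer false negatives)
    at the cost of more false positives.
    """
    z = np.abs((sample - mean) / std)
    return bool(np.any(z > threshold))

# Synthetic data for illustration only (hypothetical values).
rng = np.random.default_rng(0)
golden = rng.normal(loc=200.0, scale=1.0, size=(20, 8))
mean, std = fit_golden_baseline(golden)

clean_chip = mean.copy()            # sits exactly on the baseline
shifted_chip = mean.copy()
shifted_chip[3] -= 5.0 * std[3]     # one RO shifted by 5 sigma

print(flag_suspect(clean_chip, mean, std))
print(flag_suspect(shifted_chip, mean, std))
```

The `threshold` parameter is where the trade-off in the last bullet shows up: a smaller value flags more chips, including some golden ones, in exchange for missing fewer real deviations.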

## Data Preparation

### Loading Golden Data Samples
Read each chip's Excel file into a NumPy array, extract the golden rows, and average them per column.

In [1]:
import numpy as np
import pandas as pd

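# Preview of the first chip's workbook (pandas' default header handling)
WS = pd.read_excel('ROFreq/Chip1.xlsx')
arr = np.array(WS)

path = 'ROFreq/Chip'
ext = '.xlsx'
chipdata = np.zeros((33,25,8))   # (chip, row, column): all measurements
avgdata = np.zeros((33,8))       # (chip, column): mean of the golden rows
golddata = np.zeros((33,2,8))    # (chip, golden row, column)
for i in range(1,34):
    file = path + str(i) + ext
    excel = pd.read_excel(file,header=None)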
    
    #Index is (chip, row, column)
    #Passing all data
    chipdata[i-1,:,:] = np.array(excel)
    
    #Only passing rows 1 & 25 as golden data
    golddata[i-1,0,:] = chipdata[i-1,0,:]
    golddata[i-1,1,:] = chipdata[i-1,24,:]
    
    #Take the average of each column over the golden rows
    for y in range(0,8):
        #Index is (chip, column)
        avgdata[i-1,y] = np.mean(golddata[i-1,:,y])
#avgdata holds each chip's per-column average frequency from the golden data
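
The comments in the loading cell describe selecting rows 1 and 25 of each chip and averaging them column by column. A sketch of that selection using array slicing is shown here. It uses a synthetic array in place of the Excel files (the values are placeholders, not measurements) and assumes the same layout as `chipdata`: 33 chips, 25 rows, 8 columns.

```python
import numpy as np

n_chips, n_rows, n_cols = 33, 25, 8

# Placeholder standing in for the data read from the per-chip workbooks.
rng = np.random.default_rng(0)
chipdata = rng.normal(loc=200.0, scale=1.0, size=(n_chips, n_rows, n_cols))

# Rows 1 and 25 (indices 0 and 24) of every chip.
golddata = chipdata[:, [0, 24], :]    # shape (33, 2, 8)

# Average the two golden rows for each chip and column.
avgdata = golddata.mean(axis=1)       # shape (33, 8)

print(golddata.shape, avgdata.shape)
```

Slicing all chips at once avoids the per-chip and per-column loops and makes it harder to write both golden rows into the same slot by mistake.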

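Select the chip indices used for training in Cases 1, 2, and 3 (6, 12, and 24 indices, respectively), drawn with a fixed seed for reproducibility.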

In [62]:
rng = np.random.default_rng(243)
case1 = rng.integers(low=0, high=33, size=6)
case2 = rng.integers(low=0, high=33, size=12)
case3 = rng.integers(low=0, high=33, size=24)
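
Note that `rng.integers` draws with replacement, so a case may contain the same chip index more than once. If distinct chips are wanted in each case (an assumption; the notebook does not say), `Generator.choice` with `replace=False` is one option. The variable names below mirror the cell above, but the selected indices will differ from those produced by `rng.integers`.

```python
import numpy as np

rng = np.random.default_rng(243)
n_chips = 33

# Draw distinct chip indices (0..32) for each training case.
case1 = rng.choice(n_chips, size=6, replace=False)
case2 = rng.choice(n_chips, size=12, replace=False)
case3 = rng.choice(n_chips, size=24, replace=False)

for name, case in (("case1", case1), ("case2", case2), ("case3", case3)):
    print(name, np.sort(case))
```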

## Feature Engineering

## Model Selection

## Training the Classifier

## Evaluating the Classifier

## Conclusion

## Future Work