# Evaluation
In this notebook, we evaluate the RandomForestClassifier examined in the exploratory data analysis ([EDA](Capstone_EDA.ipynb)).


## Setup
Here we set up the kernel and import the data as we did in the EDA.

In [13]:
# Import all libraries needed for the study
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import time
import datetime
import scipy.stats as stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.inspection import permutation_importance, PartialDependenceDisplay
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Generate the data
import data.CapstoneDataGenerator as cdg
cdg.generate_radar_dataset(num_samples=200000, output_path="data/radar_dataset_200k.zip", random_seed=42)
df = pd.read_csv("data/radar_dataset_200k.zip", compression='zip')
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200000 entries, 0 to 199999
Data columns (total 10 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   Name         200000 non-null  object 
 1   Type         200000 non-null  object 
 2   Frequency    200000 non-null  float64
 3   Bandwidth    200000 non-null  float64
 4   Pulse Width  200000 non-null  float64
 5   Modulation   200000 non-null  object 
 6   Encoding     200000 non-null  object 
 7   PRI          200000 non-null  float64
 8   Amplitude    200000 non-null  float64
 9   Band         200000 non-null  object 
dtypes: float64(5), object(5)
memory usage: 15.3+ MB
None


In [3]:
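def LabelEncodeCol(dataframe : pd.DataFrame, col : str):
    """Label-encode `col` in place, printing its categories before and after encoding."""
    print(dataframe[col].value_counts().index.to_list())
    # LabelEncoder assigns integer codes to the sorted category names
    le = LabelEncoder().fit(dataframe[col].value_counts().index.to_list())
    dataframe[col] = le.transform(dataframe[col])
    print(dataframe[col].value_counts().index.to_list())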


LabelEncodeCol(df, 'Modulation')
LabelEncodeCol(df, 'Encoding')
LabelEncodeCol(df, 'Band')
LabelEncodeCol(df, 'Name')
LabelEncodeCol(df, 'Type')
df.corr() # Correlation on the unscaled, label-encoded columns

['FM', 'PM', 'AM']
[1, 2, 0]
['Linear FM (chirp)', 'Phase-coded', 'Unmodulated pulse', 'Barker code', 'Unmodulated CW pulse', 'Polyphase code']
[1, 2, 5, 0, 4, 3]
['X', 'S', 'Ku', 'L', 'Ka', 'C']
[5, 4, 2, 3, 1, 0]
['SARMapper_02', 'MissileGuide_03', 'AEWWatch_01', 'MissileGuide_01', 'MissileGuide_02', 'EWGuard_01', 'NavSeaScan_02', 'AirScan_02', 'AirScan_01', 'SARMapper_01', 'NavSeaScan_01', 'AirScan_03', 'GroundWatch_01', 'TrackLock_01', 'FighterAESA_03', 'EWGuard_02', 'CBRadar_01', 'EWGuard_03', 'FighterAESA_02', 'TrackLock_02', 'GroundWatch_02', 'FighterAESA_01', 'AEWWatch_02', 'CBRadar_02', 'TrackLock_03']
[21, 17, 0, 15, 16, 7, 19, 3, 2, 20, 18, 4, 13, 22, 12, 8, 5, 9, 11, 23, 14, 10, 1, 6, 24]
[6, 0, 3, 4, 9, 8, 7, 1, 5, 2]


| | Name | Type | Frequency | Bandwidth | Pulse Width | Modulation | Encoding | PRI | Amplitude | Band |
|---|---|---|---|---|---|---|---|---|---|---|
| Name | 1.0 | 0.98312 | 0.664086 | 0.31858 | -0.276767 | -0.003268 | -0.014043 | -0.541042 | 0.000232 | -0.054563 |
| Type | 0.98312 | 1.0 | 0.639323 | 0.331579 | -0.27683 | 0.004605 | 0.001248 | -0.466225 | -0.000149 | -0.040286 |
| Frequency | 0.664086 | 0.639323 | 1.0 | 0.704723 | -0.393607 | 0.257341 | -0.124641 | -0.496147 | 0.003271 | -0.31 |
| Bandwidth | 0.31858 | 0.331579 | 0.704723 | 1.0 | -0.110695 | 0.276024 | -0.164873 | -0.059506 | 0.00384 | -0.126139 |
| Pulse Width | -0.276767 | -0.27683 | -0.393607 | -0.110695 | 1.0 | -0.097063 | -0.297878 | 0.376345 | -0.000417 | -0.030521 |
| Modulation | -0.003268 | 0.004605 | 0.257341 | 0.276024 | -0.097063 | 1.0 | -0.505405 | -0.081547 | 0.003309 | -0.331372 |
| Encoding | -0.014043 | 0.001248 | -0.124641 | -0.164873 | -0.297878 | -0.505405 | 1.0 | 0.02519 | -0.003446 | 0.086517 |
| PRI | -0.541042 | -0.466225 | -0.496147 | -0.059506 | 0.376345 | -0.081547 | 0.02519 | 1.0 | -0.000787 | -0.056075 |
| Amplitude | 0.000232 | -0.000149 | 0.003271 | 0.00384 | -0.000417 | 0.003309 | -0.003446 | -0.000787 | 1.0 | 0.001714 |
| Band | -0.054563 | -0.040286 | -0.31 | -0.126139 | -0.030521 | -0.331372 | 0.086517 | -0.056075 | 0.001714 | 1.0 |


In [4]:
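scaler = StandardScaler()

# Features are every column except the two classification targets
X = df.drop(['Type', 'Name'], axis=1)
# Note: X_scaled is computed but not used below. The forests are trained on the
# unscaled X, which is fine because tree-based models do not depend on feature scale.
X_scaled = scaler.fit_transform(X)
y_t = df['Type']
y_name = df['Name']


# Hold out 20% of the samples for each target (no random_state is set, so the split varies between runs)
X_train_t, X_test_t, y_train_t, y_test_t = train_test_split(X, y_t, test_size=0.2)
X_train_name, X_test_name, y_train_name, y_test_name = train_test_split(X, y_name, test_size=0.2)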
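
The split above is unseeded and unstratified. As a minimal sketch, assuming small stand-in data rather than the radar frame (the names `demo_X` and `demo_y` are hypothetical), `train_test_split` can be made repeatable with `random_state` and class-balanced with `stratify`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the radar feature table (values are illustrative only)
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({
    "Frequency": rng.normal(9.5e9, 1e8, 1000),
    "PRI": rng.normal(1e-3, 1e-4, 1000),
})
demo_y = pd.Series(rng.integers(0, 4, 1000), name="Type")

# stratify keeps the class proportions the same in both partitions;
# random_state makes the split repeatable between runs
X_tr, X_te, y_tr, y_te = train_test_split(
    demo_X, demo_y, test_size=0.2, stratify=demo_y, random_state=42
)

train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(pd.DataFrame({"train": train_share, "test": test_share}).round(3))
```

With 200,000 evenly generated samples the effect is probably small, but a fixed seed makes the reported scores reproducible.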

## Model Evaluation
Here we use the best parameters found by the grid search in the EDA to refit a RandomForestClassifier for both the radar type and the specific radar (name) cases.  We evaluate both again because both classifications would be useful for this use case.

In [5]:
# Utilize the grid search parameters found in the EDA
forest_t = RandomForestClassifier(min_samples_leaf=1, min_samples_split=5, n_estimators=100, max_depth=40)
forest_name = RandomForestClassifier(min_samples_leaf=1, min_samples_split=2, n_estimators=500, max_depth=None)

forest_t.fit(X_train_t, y_train_t)
forest_name.fit(X_train_name, y_train_name)

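RandomForestClassifier parameters (forest_name):

| Parameter | Value |
|---|---|
| n_estimators | 500 |
| criterion | 'gini' |
| max_depth | None |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 'sqrt' |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| bootstrap | True |
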

Next, we compute some model statistics from the test-set predictions.

In [6]:
y_pred_t = forest_t.predict(X_test_t)
y_pred_name = forest_name.predict(X_test_name)

print("Classification Report for Radar Type:")
print(classification_report(y_test_t, y_pred_t, digits=4))

print("Classification Report for Radar Name:")
print(classification_report(y_test_name, y_pred_name, digits=4))

Classification Report for Radar Type:
              precision    recall  f1-score   support

           0     1.0000    1.0000    1.0000      4787
           1     1.0000    1.0000    1.0000      3269
           2     1.0000    1.0000    1.0000      3134
           3     1.0000    1.0000    1.0000      4806
           4     1.0000    1.0000    1.0000      4760
           5     0.9991    0.9985    0.9988      3243
           6     1.0000    1.0000    1.0000      4840
           7     0.9984    0.9991    0.9988      3199
           8     1.0000    1.0000    1.0000      3261
           9     1.0000    1.0000    1.0000      4701

    accuracy                         0.9998     40000
   macro avg     0.9998    0.9998    0.9998     40000
weighted avg     0.9998    0.9998    0.9998     40000

Classification Report for Radar Name:
              precision    recall  f1-score   support

           0     1.0000    1.0000    1.0000      1630
           1     1.0000    1.0000    1.0000      1503
  

These scores look very good.  I understand that this is because the generated data contains only a few synthetic radars whose fingerprints rarely overlap.  There are many more actual radars in the wild, and including them in the data would likely cause the model to misclassify some signals.
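
In the type report, classes 5 and 7 are the only ones with scores below 1.0. A confusion matrix would show whether those two are being mistaken for each other. The following is a minimal sketch on synthetic stand-in data from `make_classification`; in the notebook the inputs would presumably be `y_test_t` and `y_pred_t`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the radar data set)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, n_informative=5, n_classes=4,
    n_clusters_per_class=1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, clf.predict(X_te))
print(cm)

# Zero the diagonal so that only the misclassifications remain
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_cls, pred_cls = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"Most frequent confusion: true {true_cls} predicted as {pred_cls} "
      f"({off_diag.max()} samples)")
```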

## Test
Here we'll generate some new data points and see if the model correctly predicts the specific radar and type.

In [7]:
import data.CapstoneDataGenerator as cdg
test_df = cdg.generate_test_df(num_samples=100, random_seed=42)

LabelEncodeCol(test_df, 'Modulation')
LabelEncodeCol(test_df, 'Encoding')
LabelEncodeCol(test_df, 'Band')
LabelEncodeCol(test_df, 'Name')
LabelEncodeCol(test_df, 'Type')

test_X = test_df.drop(['Type', 'Name'], axis=1)
test_y_t = test_df['Type']
test_y_name = test_df['Name']

test_y_pred_t = forest_t.predict(test_X)
test_y_pred_name = forest_name.predict(test_X)

print("Test Classification Report for Radar Type:")
print(classification_report(test_y_t, test_y_pred_t, digits=4))
print("Test Classification Report for Radar Name:")
print(classification_report(test_y_name, test_y_pred_name, digits=4))



['PM', 'FM', 'AM']
[2, 1, 0]
['Linear FM (chirp)', 'Unmodulated pulse', 'Barker code', 'Phase-coded', 'Polyphase code', 'Unmodulated CW pulse']
[1, 5, 0, 2, 3, 4]
['X', 'S', 'Ku', 'L', 'Ka', 'C']
[5, 4, 2, 3, 1, 0]
['MissileGuide_02', 'SARMapper_02', 'SARMapper_01', 'AEWWatch_01', 'GroundWatch_01', 'TrackLock_03', 'AirScan_03', 'FighterAESA_01', 'AirScan_02', 'FighterAESA_02', 'AirScan_01', 'EWGuard_02', 'MissileGuide_01', 'TrackLock_02', 'EWGuard_03', 'NavSeaScan_02', 'FighterAESA_03', 'CBRadar_01', 'AEWWatch_02', 'EWGuard_01', 'GroundWatch_02', 'TrackLock_01', 'NavSeaScan_01', 'MissileGuide_03', 'CBRadar_02']
[16, 21, 20, 0, 13, 24, 4, 10, 3, 11, 2, 8, 15, 23, 9, 19, 12, 5, 1, 7, 14, 22, 18, 17, 6]
[0, 8, 4, 6, 9, 1, 5, 3, 7, 2]
Test Classification Report for Radar Type:
              precision    recall  f1-score   support

           0     1.0000    1.0000    1.0000        16
           1     1.0000    1.0000    1.0000         9
           2     1.0000    1.0000    1.0000         3

With a new set of test data, we can see that the model correctly categorizes the new samples.
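
One caveat about the test cell: `LabelEncodeCol` fits a fresh `LabelEncoder` on the test frame. Because `LabelEncoder` sorts the category names, the codes match the training codes only when the test frame contains every category, which happens to be the case in the output above. A minimal sketch of the safer pattern, using hypothetical miniature frames (`train_demo`, `test_demo`) in place of `df` and `test_df`, fits the encoder once and reuses it:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature frames standing in for df and test_df
train_demo = pd.DataFrame({"Band": ["X", "S", "Ku", "L", "Ka", "C"]})
test_demo = pd.DataFrame({"Band": ["X", "Ku", "Ku"]})  # not every band appears

# Fit once on the training column and keep the encoder around
band_encoder = LabelEncoder().fit(train_demo["Band"])

# Reusing the fitted encoder preserves the training mapping
reused = band_encoder.transform(test_demo["Band"])

# Refitting on the test column alone silently produces different codes
refit = LabelEncoder().fit_transform(test_demo["Band"])

print("training classes:", band_encoder.classes_.tolist())
print("reused encoder:  ", reused.tolist())
print("refit encoder:   ", refit.tolist())
```

Keeping one fitted encoder per column (for example in a dictionary keyed by column name) would also allow `inverse_transform` to map predictions back to radar names.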

## Model Interpretation
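There are two ways to interpret how our models produce predictions: local interpretation, which explains an individual prediction, and global interpretation, which describes the model's overall behavior.  Here we take a look at two global measures, feature importances and permutation importance.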

### Feature Importances
Let's first take a look at the feature importances.  These importances are accessed via the model attribute 'feature_importances_'.

In [8]:
importances = forest_t.feature_importances_
feature_importance_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': importances
}).sort_values(by='Importance', ascending=False)

print("Type Model Feature Importances:")
print(feature_importance_df)

importances = forest_name.feature_importances_
feature_importance_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': importances
}).sort_values(by='Importance', ascending=False)

print("Name Model Feature Importances:")
print(feature_importance_df)

Type Model Feature Importances:
       Feature  Importance
5          PRI    0.263975
0    Frequency    0.262845
1    Bandwidth    0.203132
2  Pulse Width    0.121612
7         Band    0.095030
4     Encoding    0.031253
3   Modulation    0.021856
6    Amplitude    0.000298
Name Model Feature Importances:
       Feature  Importance
0    Frequency    0.249654
1    Bandwidth    0.184461
5          PRI    0.182907
2  Pulse Width    0.179050
7         Band    0.107248
4     Encoding    0.058769
3   Modulation    0.037057
6    Amplitude    0.000854


Initially, I expected frequency to be the top feature for both models.  It's interesting to see that PRI is the most important feature for the type model, although only by a narrow margin over frequency (0.264 versus 0.263).
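
When two importances are this close, the spread across the individual trees gives a sense of whether the ranking is meaningful. The sketch below, on synthetic stand-in data with placeholder feature names, computes the per-tree mean and standard deviation from the fitted forest's `estimators_`; it is an illustration, not a result for the radar models.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the radar data set)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)

# One row of impurity-based importances per tree in the forest
per_tree = np.array([tree.feature_importances_ for tree in clf.estimators_])

summary = pd.DataFrame(
    {"mean": per_tree.mean(axis=0), "std": per_tree.std(axis=0)},
    index=[f"feature_{i}" for i in range(X_demo.shape[1])],  # placeholder names
).sort_values("mean", ascending=False)
print(summary)
```

If the standard deviations are larger than the gap between two means, the ordering of those two features should probably not be over-interpreted.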

### Permutation Importance
The permutation importance shows how much the model's accuracy score decreases when the values of a single feature column are shuffled.  Here we use the permutation_importance function from the sklearn.inspection module, evaluated on the training split.
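
To make the idea concrete, here is a minimal hand-rolled sketch of the shuffling procedure on synthetic stand-in data. The helper `manual_permutation_importance` is hypothetical and only illustrates the principle; it is not how sklearn implements the function internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the radar data set)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)


def manual_permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column is shuffled (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffling breaks the link between this feature and the target
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            drops[col] += baseline - model.score(X_shuffled, y)
    return drops / n_repeats


manual_importances = manual_permutation_importance(clf, X_te, y_te)
print(manual_importances.round(4))
```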

In [12]:
result = permutation_importance(forest_t, X_train_t, y_train_t, n_repeats=10, random_state=42)
print("Permutation Importances for type model:", result.importances_mean)

result = permutation_importance(forest_name, X_train_name, y_train_name, n_repeats=10, random_state=42)
print("Permutation Importances for name model:", result.importances_mean)

Permutation Importances for type model: [1.0573375e-01 2.3889625e-01 8.8896250e-02 3.6875000e-05 5.0000000e-05
 3.1612125e-01 4.8750000e-05 3.1622500e-02]
Permutation Importances for name model: [1.14256250e-01 1.85107500e-01 1.89475625e-01 1.56250000e-04
 7.88750000e-04 2.17790000e-01 1.58750000e-04 9.50181250e-02]


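The values are listed in the column order of X (Frequency, Bandwidth, Pulse Width, Modulation, Encoding, PRI, Amplitude, Band).  For the type model, PRI (0.316), bandwidth (0.239), and frequency (0.106) have the largest impact when shuffled, followed by pulse width (0.089).  These are the same top four features as in the feature importances above, although the ordering differs.

For the name model, PRI (0.218), pulse width (0.189), bandwidth (0.185), and frequency (0.114) have the largest impact when shuffled.  This is again the same top four as in the model's feature importances, but frequency drops from first to fourth.  In both models, shuffling modulation, encoding, or amplitude barely changes the accuracy.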
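
Reading the raw arrays by position is error-prone, so it may help to label the result with the feature names. It is also common to compute permutation importance on held-out data rather than the training split. The sketch below does both on synthetic stand-in data; the column names are borrowed from the notebook purely for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative column names only; the notebook would use X.columns
cols = ["Frequency", "Bandwidth", "Pulse Width", "PRI"]
X_arr, y_demo = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=cols)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Evaluate on the held-out split so the scores reflect generalization
result = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=42)

perm_table = pd.DataFrame(
    {"mean": result.importances_mean, "std": result.importances_std},
    index=X_te.columns,
).sort_values("mean", ascending=False)
print(perm_table)
```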

## Future Work
The synthetic data set is representative of the features that will be extracted via signal processing during deployment.  It is, however, simplistic compared to real-world data: in practice there will be many more radars, including radars with advanced features (e.g. frequency hopping).  The next step I'd take is to build a synthetic data set with simulated advanced features and more radar types.  This will put my work in a place where I can hit the ground running with real-world data.

Overall, I am happy that I was able to work with an industry expert in setting up the synthetic data set.  The model I built worked much better than I was expecting.  However, I attribute that to the small set of radars in the data set and the absence of simulated advanced features.