### Tuesday, May 2nd, 2023
The OIT research cluster is still down, so I'm processing the WADI data in Google Colab instead.

When the cluster is back up, I will synchronize the processed data and run GDN.

The WADI processing script included in GDN (https://github.com/d-ailin/GDN/blob/main/scripts/process_wadi.py) makes some assumptions about the WADI data format that don't hold for the version of WADI that I have. Rather than run their script directly, I'll adapt its normalization and downsampling steps. Hopefully that will also give me a better understanding of the transformations they're applying.

In [1]:
import pandas as pd

In [2]:
from google.colab import drive
drive.mount('/content/drive')
path_to_wadi = "/content/drive/MyDrive/iTrust/WADI"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


We'll work on the training data first:

In [3]:
# load data from disk
training_data = pd.read_csv(
    path_to_wadi + '/WADI.A2_19 Nov 2019/WADI_14days_new.csv'
)

training_data

Unnamed: 0,Row,Date,Time,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
0,1,9/25/2017,00:00.0,171.155,0.619473,11.5759,504.645,0.318319,0.001157,0,...,1,1,1,1,1,1,1,67.9651,1,0.68
1,2,9/25/2017,00:01.0,171.155,0.619473,11.5759,504.645,0.318319,0.001157,0,...,1,1,1,1,1,1,1,67.9651,1,0.68
2,3,9/25/2017,00:02.0,171.155,0.619473,11.5759,504.645,0.318319,0.001157,0,...,1,1,1,1,1,1,1,67.9651,1,0.68
3,4,9/25/2017,00:03.0,171.155,0.607477,11.5725,504.673,0.318438,0.001207,0,...,1,1,1,1,1,1,1,67.1948,1,0.68
4,5,9/25/2017,00:04.0,171.155,0.607477,11.5725,504.673,0.318438,0.001207,0,...,1,1,1,1,1,1,1,67.1948,1,0.68
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
784566,1048567,10/7/17,16:06.0,175.855,0.589478,11.8941,479.191,0.331571,0.001128,0,...,1,1,1,1,1,1,1,60.6305,1,0.25
784567,1048568,10/7/17,16:07.0,175.855,0.589478,11.8941,479.191,0.331571,0.001128,0,...,1,1,1,1,1,1,1,60.6305,1,0.25
784568,1048569,10/7/17,16:08.0,175.855,0.589478,11.8941,479.191,0.331571,0.001128,0,...,1,1,1,1,1,1,1,60.6305,1,0.25
784569,1048570,10/7/17,16:09.0,175.896,0.613476,11.8913,479.224,0.331622,0.001173,0,...,1,1,1,1,1,1,1,60.4477,1,0.25


In [4]:
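# index by row number and drop timestamps
training_data = training_data.set_index("Row").drop(columns=["Date","Time"])

# fill missing values with the column mean;
# columns that are entirely NaN have no mean, so they fall back to 0
training_data = training_data.fillna(training_data.mean()).fillna(0)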

training_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
1,171.155,0.619473,11.5759,504.645,0.318319,0.001157,0,0,47.8911,1,...,1,1,1,1,1,1,1,67.9651,1,0.68
2,171.155,0.619473,11.5759,504.645,0.318319,0.001157,0,0,47.8911,1,...,1,1,1,1,1,1,1,67.9651,1,0.68
3,171.155,0.619473,11.5759,504.645,0.318319,0.001157,0,0,47.8911,1,...,1,1,1,1,1,1,1,67.9651,1,0.68
4,171.155,0.607477,11.5725,504.673,0.318438,0.001207,0,0,47.7503,1,...,1,1,1,1,1,1,1,67.1948,1,0.68
5,171.155,0.607477,11.5725,504.673,0.318438,0.001207,0,0,47.7503,1,...,1,1,1,1,1,1,1,67.1948,1,0.68
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048567,175.855,0.589478,11.8941,479.191,0.331571,0.001128,0,0,48.1129,1,...,1,1,1,1,1,1,1,60.6305,1,0.25
1048568,175.855,0.589478,11.8941,479.191,0.331571,0.001128,0,0,48.1129,1,...,1,1,1,1,1,1,1,60.6305,1,0.25
1048569,175.855,0.589478,11.8941,479.191,0.331571,0.001128,0,0,48.1129,1,...,1,1,1,1,1,1,1,60.6305,1,0.25
1048570,175.896,0.613476,11.8913,479.224,0.331622,0.001173,0,0,48.0348,1,...,1,1,1,1,1,1,1,60.4477,1,0.25

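The two-step fill above can be checked on a toy frame. This is a minimal sketch with made-up values (the column names `a` and `b` are hypothetical, not WADI sensors); it shows why the second `fillna(0)` is needed: the mean of an all-NaN column is itself NaN, so the first step leaves such a column untouched.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): "a" has one gap, "b" is entirely missing.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
})

# Step 1: fill gaps with the column mean. The mean of an all-NaN column
# is NaN, so column "b" is still empty after this step.
step1 = toy.fillna(toy.mean())

# Step 2: whatever is still missing (the all-NaN columns) becomes 0.
filled = step1.fillna(0)

print(filled)
```

In this toy case the gap in `a` becomes 2.0 (the mean of 1 and 3) and `b` becomes all zeros.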

In [5]:
# create normalizer
from sklearn.preprocessing import MinMaxScaler
normalizer = MinMaxScaler(feature_range=(0,1)).fit(training_data.values)

In [6]:
# normalize training data
train_normed = normalizer.transform(training_data.values)
training_data = pd.DataFrame(
    train_normed,
    columns=training_data.columns,
    index=training_data.index
)

training_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
1,0.798629,0.300814,0.963590,0.958437,0.515464,0.000168,0.0,0.0,0.276747,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221481,0.0,0.300885
2,0.798629,0.300814,0.963590,0.958437,0.515464,0.000168,0.0,0.0,0.276747,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221481,0.0,0.300885
3,0.798629,0.300814,0.963590,0.958437,0.515464,0.000168,0.0,0.0,0.276747,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221481,0.0,0.300885
4,0.798629,0.294989,0.963307,0.958490,0.516021,0.000192,0.0,0.0,0.272455,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.213922,0.0,0.300885
5,0.798629,0.294989,0.963307,0.958490,0.516021,0.000192,0.0,0.0,0.272455,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.213922,0.0,0.300885
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048567,0.820560,0.286249,0.990078,0.910094,0.577554,0.000154,0.0,0.0,0.283508,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149502,0.0,0.110619
1048568,0.820560,0.286249,0.990078,0.910094,0.577554,0.000154,0.0,0.0,0.283508,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149502,0.0,0.110619
1048569,0.820560,0.286249,0.990078,0.910094,0.577554,0.000154,0.0,0.0,0.283508,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149502,0.0,0.110619
1048570,0.820751,0.297902,0.989845,0.910157,0.577793,0.000176,0.0,0.0,0.281127,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.147708,0.0,0.110619

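In the normalized output, a raw status of 1 becomes 0.5 in `1_MV_001_STATUS` but 0.0 in `3_MV_001_STATUS`. One possible explanation (an assumption, not verified against the data) is that the first column ranges over 0 to 2 in the training set, while the second has 1 as its minimum and may never change at all. The sketch below uses made-up values to show how `MinMaxScaler` treats both cases.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training matrix: column 0 varies between 0 and 2,
# column 1 is constant at 1 (like a valve that never changes state).
train = np.array([
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
])

scaler = MinMaxScaler(feature_range=(0, 1)).fit(train)
scaled = scaler.transform(train)

# Column 0: a raw value of 1 sits halfway between min 0 and max 2 -> 0.5.
# Column 1: the range is zero, so scikit-learn avoids dividing by zero
# and every value maps to the lower bound of feature_range, i.e. 0.
print(scaled)
```

A constant column therefore carries no information after scaling, which is worth keeping in mind when reading the many `0.0` status columns above.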

In [7]:
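# downsample by 10x using the median of each 10-row window
# (the step argument needs pandas >= 1.5; the first output row is NaN
# because its window holds only one sample)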
training_data = training_data.rolling(window=10, step=10).median()

training_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
1,,,,,,,,,,,...,,,,,,,,,,
11,0.798629,0.294989,0.963307,0.958490,0.516021,0.000192,0.0,0.0,0.272455,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.213922,0.0,0.300885
21,0.798657,0.300815,0.963328,0.958585,0.516120,0.000190,0.0,0.0,0.331281,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.171732,0.0,0.300885
31,0.798634,0.294989,0.963278,0.958613,0.516099,0.000188,0.0,0.0,0.325523,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.147179,0.0,0.300885
41,0.798657,0.292077,0.962775,0.958639,0.515459,0.000221,0.0,0.0,0.263579,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149070,0.0,0.300885
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048531,0.820735,0.280422,0.989932,0.909935,0.576994,0.000137,0.0,0.0,0.282255,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148616,0.0,0.110619
1048541,0.820639,0.318293,0.989845,0.909935,0.577441,0.000156,0.0,0.0,0.283929,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149549,0.0,0.110619
1048551,0.820690,0.296445,0.989424,0.910014,0.578053,0.000158,0.0,0.0,0.287056,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148126,0.0,0.110619
1048561,0.820607,0.294989,0.989653,0.910018,0.576963,0.000134,0.0,0.0,0.288077,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149112,0.0,0.110619

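The NaN in the first row comes from how `rolling(window=10, step=10)` places its windows: results are evaluated at positions 0, 10, 20, and so on, and each window looks back over the ten rows ending at that position. Position 0 has only one row available, so it yields NaN, and the window at position 10 covers rows 2 through 11. The sketch below shows this on a toy series and, for comparison only, a block-wise alternative built with `groupby` (a hypothetical variant, not what this notebook uses) that covers rows 1 through 10, 11 through 20, and so on.

```python
import numpy as np
import pandas as pd

# Toy series of 25 samples labelled 1..25, mimicking the "Row" index.
s = pd.Series(
    np.arange(1, 26, dtype=float),
    index=pd.RangeIndex(1, 26, name="Row"),
)

# Rolling approach: evaluated at positions 0, 10, 20 (Rows 1, 11, 21).
# Row 1 has an incomplete window, so it comes out as NaN.
rolled = s.rolling(window=10, step=10).median()

# Block approach (hypothetical alternative): give each row a block number,
# take the median per block, and keep only the complete blocks.
block_id = np.arange(len(s)) // 10
n_full = len(s) // 10
blocks = s.groupby(block_id).median().iloc[:n_full]

print(rolled)   # Row 1: NaN, Row 11: 6.5, Row 21: 16.5
print(blocks)   # block 0: 5.5, block 1: 15.5
```

The two variants differ by a one-row offset, which is unlikely to matter for slowly changing sensors but does change which raw rows end up in each downsampled sample.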

In [8]:
# drop the first 2160 downsampled samples (21,600 raw rows) as a warmup period
training_data = training_data.iloc[2160:]
training_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
21601,0.781150,0.312468,0.965130,0.963362,0.637184,0.000130,0.0,0.0,0.157283,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169876,0.0,0.247788
21611,0.781266,0.306641,0.965255,0.963383,0.637291,0.000145,0.0,0.0,0.117956,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.170539,0.0,0.247788
21621,0.781154,0.303728,0.965405,0.963394,0.637188,0.000175,0.0,0.0,0.114344,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.171938,0.0,0.247788
21631,0.781294,0.303728,0.965422,0.963346,0.637767,0.000142,0.0,0.0,0.117185,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169281,0.0,0.247788
21641,0.781336,0.305184,0.965363,0.963393,0.637502,0.000104,0.0,0.0,0.079742,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.168962,0.0,0.247788
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048531,0.820735,0.280422,0.989932,0.909935,0.576994,0.000137,0.0,0.0,0.282255,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148616,0.0,0.110619
1048541,0.820639,0.318293,0.989845,0.909935,0.577441,0.000156,0.0,0.0,0.283929,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149549,0.0,0.110619
1048551,0.820690,0.296445,0.989424,0.910014,0.578053,0.000158,0.0,0.0,0.287056,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148126,0.0,0.110619
1048561,0.820607,0.294989,0.989653,0.910018,0.576963,0.000134,0.0,0.0,0.288077,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149112,0.0,0.110619


In [9]:
# no attacks in training data
training_data = training_data.assign(attack=0)

training_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW,attack
21601,0.781150,0.312468,0.965130,0.963362,0.637184,0.000130,0.0,0.0,0.157283,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.169876,0.0,0.247788,0
21611,0.781266,0.306641,0.965255,0.963383,0.637291,0.000145,0.0,0.0,0.117956,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.170539,0.0,0.247788,0
21621,0.781154,0.303728,0.965405,0.963394,0.637188,0.000175,0.0,0.0,0.114344,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.171938,0.0,0.247788,0
21631,0.781294,0.303728,0.965422,0.963346,0.637767,0.000142,0.0,0.0,0.117185,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.169281,0.0,0.247788,0
21641,0.781336,0.305184,0.965363,0.963393,0.637502,0.000104,0.0,0.0,0.079742,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.168962,0.0,0.247788,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048531,0.820735,0.280422,0.989932,0.909935,0.576994,0.000137,0.0,0.0,0.282255,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.148616,0.0,0.110619,0
1048541,0.820639,0.318293,0.989845,0.909935,0.577441,0.000156,0.0,0.0,0.283929,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.149549,0.0,0.110619,0
1048551,0.820690,0.296445,0.989424,0.910014,0.578053,0.000158,0.0,0.0,0.287056,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.148126,0.0,0.110619,0
1048561,0.820607,0.294989,0.989653,0.910018,0.576963,0.000134,0.0,0.0,0.288077,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.149112,0.0,0.110619,0


In [10]:
# save to disk
training_data.to_csv(
    '/content/drive/MyDrive/WADI-processed/train.csv'
)

In [11]:
# sanity check
!head '/content/drive/MyDrive/WADI-processed/train.csv'

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,1_MV_002_STATUS,1_MV_003_STATUS,1_MV_004_STATUS,1_P_001_STATUS,1_P_002_STATUS,1_P_003_STATUS,1_P_004_STATUS,1_P_005_STATUS,1_P_006_STATUS,2_DPIT_001_PV,2_FIC_101_CO,2_FIC_101_PV,2_FIC_101_SP,2_FIC_201_CO,2_FIC_201_PV,2_FIC_201_SP,2_FIC_301_CO,2_FIC_301_PV,2_FIC_301_SP,2_FIC_401_CO,2_FIC_401_PV,2_FIC_401_SP,2_FIC_501_CO,2_FIC_501_PV,2_FIC_501_SP,2_FIC_601_CO,2_FIC_601_PV,2_FIC_601_SP,2_FIT_001_PV,2_FIT_002_PV,2_FIT_003_PV,2_FQ_101_PV,2_FQ_201_PV,2_FQ_301_PV,2_FQ_401_PV,2_FQ_501_PV,2_FQ_601_PV,2_LS_001_AL,2_LS_002_AL,2_LS_101_AH,2_LS_101_AL,2_LS_201_AH,2_LS_201_AL,2_LS_301_AH,2_LS_301_AL,2_LS_401_AH,2_LS_401_AL,2_LS_501_AH,2_LS_501_AL,2_LS_601_AH,2_LS_601_AL,2_LT_001_PV,2_LT_002_PV,2_MCV_007_CO,2_MCV_101_CO,2_MCV_201_CO,2_MCV_301_CO,2_MCV_401_CO,2_MCV_501_CO,2_MCV_601_CO,2_MV_001_STATUS,2_MV_002_STATUS,2_MV_003_STATUS,2_MV_004_STATUS,2_MV_005_STATUS,2_MV_0

Looks good. Let's move on to the attack data.

In [27]:
# load from disk
attack_data = pd.read_csv(
    path_to_wadi + '/WADI.A2_19 Nov 2019/WADI_attackdataLABLE.csv',
    skiprows=1
)
attack_data

Unnamed: 0,Row,Date,Time,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,...,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW,"Attack LABLE (1:No Attack, -1:Attack)"
0,1.0,10/9/17,00:00.0,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39,1
1,2.0,10/9/17,00:01.0,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39,1
2,3.0,10/9/17,00:02.0,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39,1
3,4.0,10/9/17,00:03.0,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39,1
4,5.0,10/9/17,00:04.0,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
172798,172799.0,10/11/17,59:58.0,172.915,0.583479,11.9211,466.051,0.318317,0.001260,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,58.8102,1.0,0.00,1
172799,172800.0,10/11/17,59:59.0,172.915,0.583479,11.9211,466.051,0.318317,0.001260,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,58.8102,1.0,0.00,1
172800,172801.0,10/11/17,00:00.0,172.915,0.583479,11.9211,466.051,0.318317,0.001260,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,58.8102,1.0,0.00,1
172801,,,,,,,,,,,...,,,,,,,,,,1


In [28]:
# fix column labels, drop timestamps and the last two rows
# (a trailing extra sample and an all-NaN row)
attack_data = (
    attack_data[:-2]
    .rename(
        columns={
            'Row ': 'Row', 
            'Date ': 'Date',
            'Attack LABLE (1:No Attack, -1:Attack)': 'attack'
        }
    )
    .set_index('Row')
    .drop(columns=['Date', 'Time'])
)

# fix index
attack_data.index = attack_data.index.astype('int')

# change attack labels from WADI format (1: no attack, -1: attack)
# to GDN format (0: no attack, 1: attack)
attack_labels = (
    attack_data['attack'].map(lambda label: 1 if label==-1 else 0)
)

# split labels from data
attack_data = attack_data.drop(columns="attack")

# replace missing values
attack_data = attack_data.fillna(attack_data.mean()).fillna(0)

attack_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
1,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,0.0,48.4820,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39
2,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,0.0,48.4820,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39
3,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,0.0,48.4820,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39
4,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,0.0,48.4820,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39
5,164.210,0.529486,11.9972,482.480,0.331167,0.001273,0.0,0.0,48.4820,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,62.6226,1.0,0.39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
172797,172.959,0.547483,11.9184,466.034,0.318217,0.001222,0.0,0.0,55.5587,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,59.3546,1.0,0.00
172798,172.959,0.547483,11.9184,466.034,0.318217,0.001222,0.0,0.0,55.5587,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,59.3546,1.0,0.00
172799,172.915,0.583479,11.9211,466.051,0.318317,0.001260,0.0,0.0,55.7260,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,58.8102,1.0,0.00
172800,172.915,0.583479,11.9211,466.051,0.318317,0.001260,0.0,0.0,55.7260,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,58.8102,1.0,0.00


In [29]:
# normalize attack data using training normalizer
attack_normed = normalizer.transform(attack_data.values)
attack_data = pd.DataFrame(
    attack_normed,
    columns=attack_data.columns,
    index=attack_data.index
)

attack_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
1,0.766223,0.257117,0.998660,0.916341,0.575661,0.000225,0.0,0.0,0.294758,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169052,0.0,0.172566
2,0.766223,0.257117,0.998660,0.916341,0.575661,0.000225,0.0,0.0,0.294758,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169052,0.0,0.172566
3,0.766223,0.257117,0.998660,0.916341,0.575661,0.000225,0.0,0.0,0.294758,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169052,0.0,0.172566
4,0.766223,0.257117,0.998660,0.916341,0.575661,0.000225,0.0,0.0,0.294758,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169052,0.0,0.172566
5,0.766223,0.257117,0.998660,0.916341,0.575661,0.000225,0.0,0.0,0.294758,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169052,0.0,0.172566
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
172797,0.807047,0.265856,0.992100,0.885106,0.514986,0.000200,0.0,0.0,0.510464,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136980,0.0,0.000000
172798,0.807047,0.265856,0.992100,0.885106,0.514986,0.000200,0.0,0.0,0.510464,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136980,0.0,0.000000
172799,0.806841,0.283336,0.992325,0.885138,0.515454,0.000218,0.0,0.0,0.515564,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.131638,0.0,0.000000
172800,0.806841,0.283336,0.992325,0.885138,0.515454,0.000218,0.0,0.0,0.515564,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.131638,0.0,0.000000

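Because the scaler was fitted on the training data only, attack-period values that fall outside the training range will land outside [0, 1] after the transform. The rows shown above all stay in range, but that is not guaranteed for the whole file. The sketch below uses made-up numbers to illustrate the effect and one way to flag it.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-sensor data: the test set goes below and above
# anything seen during training.
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[5.0], [20.0], [40.0]])

scaler = MinMaxScaler(feature_range=(0, 1)).fit(train)
test_scaled = scaler.transform(test)

# (5 - 10) / 20 = -0.25, (20 - 10) / 20 = 0.5, (40 - 10) / 20 = 1.5
print(test_scaled.ravel())

# Flag values that fall outside the range learned from the training data.
out_of_range = (test_scaled < 0) | (test_scaled > 1)
print(out_of_range.ravel())
```

`MinMaxScaler` also accepts `clip=True` to force transformed values into the feature range. Whether clipping is desirable here is a separate question, since out-of-range readings may be exactly what an anomaly detector should see.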

In [30]:
# downsample features by 10x using median
attack_data = attack_data.rolling(window=10, step=10).median()

attack_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_001_STATUS,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW
1,,,,,,,,,,,...,,,,,,,,,,
11,0.766228,0.261487,0.998552,0.916335,0.575930,0.000175,0.0,0.0,0.293548,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169016,0.0,0.172566
21,0.766237,0.271683,0.998760,0.916341,0.575675,0.000142,0.0,0.0,0.294935,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169957,0.0,0.172566
31,0.766251,0.270226,0.998743,0.916288,0.576143,0.000197,0.0,0.0,0.294377,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169334,0.0,0.172566
41,0.766223,0.268770,0.998468,0.916362,0.575708,0.000191,0.0,0.0,0.291649,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.164901,0.0,0.141593
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
172761,0.807439,0.254204,0.992408,0.884553,0.515614,0.000282,0.0,0.0,0.512470,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.140142,0.0,0.000000
172771,0.807056,0.260030,0.992358,0.884660,0.516481,0.000137,0.0,0.0,0.511129,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.139279,0.0,0.000000
172781,0.807009,0.268770,0.992059,0.884819,0.515398,0.000163,0.0,0.0,0.512046,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.134754,0.0,0.000000
172791,0.806981,0.274597,0.992325,0.884914,0.515754,0.000169,0.0,0.0,0.512653,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.137133,0.0,0.000000


In [31]:
# downsample labels by 10x using max, so a window is labelled as an attack
# if any sample in it is an attack
attack_labels = attack_labels.rolling(window=10, step=10).max()
attack_labels

Row
1         NaN
11        0.0
21        0.0
31        0.0
41        0.0
         ... 
172761    0.0
172771    0.0
172781    0.0
172791    0.0
172801    0.0
Name: attack, Length: 17281, dtype: float64

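Using the maximum rather than the median for labels matters when an attack covers only part of a window. A minimal sketch on toy labels (the single attack sample at Row 15 is made up for illustration):

```python
import pandas as pd

# 30 per-sample labels (0 = normal, 1 = attack), indexed like "Row".
labels = pd.Series(0, index=pd.RangeIndex(1, 31, name="Row"))
labels.loc[15] = 1  # one isolated attack sample

# Windows are evaluated at Rows 1, 11 and 21; Row 15 falls in the
# window ending at Row 21 (Rows 12..21).
by_max = labels.rolling(window=10, step=10).max()
by_median = labels.rolling(window=10, step=10).median()

# The first window is incomplete (NaN), so drop it before casting to int.
by_max_int = by_max.dropna().astype(int)

print(by_max_int)          # Row 11 -> 0, Row 21 -> 1
print(by_median.dropna())  # Row 11 -> 0.0, Row 21 -> 0.0
```

With the median, the short attack disappears from the downsampled labels; with the maximum, the window containing it is flagged.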
In [32]:
# rejoin data and labels, dropping the first row
# (NaN because the first rolling window is incomplete)
attack_data = attack_data[1:].assign(attack=attack_labels[1:].astype(int))
attack_data

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,...,3_MV_002_STATUS,3_MV_003_STATUS,3_P_001_STATUS,3_P_002_STATUS,3_P_003_STATUS,3_P_004_STATUS,LEAK_DIFF_PRESSURE,PLANT_START_STOP_LOG,TOTAL_CONS_REQUIRED_FLOW,attack
11,0.766228,0.261487,0.998552,0.916335,0.575930,0.000175,0.0,0.0,0.293548,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.169016,0.0,0.172566,0
21,0.766237,0.271683,0.998760,0.916341,0.575675,0.000142,0.0,0.0,0.294935,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.169957,0.0,0.172566,0
31,0.766251,0.270226,0.998743,0.916288,0.576143,0.000197,0.0,0.0,0.294377,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.169334,0.0,0.172566,0
41,0.766223,0.268770,0.998468,0.916362,0.575708,0.000191,0.0,0.0,0.291649,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.164901,0.0,0.141593,0
51,0.766186,0.254203,0.998851,0.916350,0.575501,0.000201,0.0,0.0,0.289220,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.167520,0.0,0.172566,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
172761,0.807439,0.254204,0.992408,0.884553,0.515614,0.000282,0.0,0.0,0.512470,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.140142,0.0,0.000000,0
172771,0.807056,0.260030,0.992358,0.884660,0.516481,0.000137,0.0,0.0,0.511129,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.139279,0.0,0.000000,0
172781,0.807009,0.268770,0.992059,0.884819,0.515398,0.000163,0.0,0.0,0.512046,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.134754,0.0,0.000000,0
172791,0.806981,0.274597,0.992325,0.884914,0.515754,0.000169,0.0,0.0,0.512653,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.137133,0.0,0.000000,0


In [33]:
# save to disk
attack_data.to_csv(
    '/content/drive/MyDrive/WADI-processed/test.csv'
)

In [34]:
# sanity check
!head /content/drive/MyDrive/WADI-processed/test.csv

Row,1_AIT_001_PV,1_AIT_002_PV,1_AIT_003_PV,1_AIT_004_PV,1_AIT_005_PV,1_FIT_001_PV,1_LS_001_AL,1_LS_002_AL,1_LT_001_PV,1_MV_001_STATUS,1_MV_002_STATUS,1_MV_003_STATUS,1_MV_004_STATUS,1_P_001_STATUS,1_P_002_STATUS,1_P_003_STATUS,1_P_004_STATUS,1_P_005_STATUS,1_P_006_STATUS,2_DPIT_001_PV,2_FIC_101_CO,2_FIC_101_PV,2_FIC_101_SP,2_FIC_201_CO,2_FIC_201_PV,2_FIC_201_SP,2_FIC_301_CO,2_FIC_301_PV,2_FIC_301_SP,2_FIC_401_CO,2_FIC_401_PV,2_FIC_401_SP,2_FIC_501_CO,2_FIC_501_PV,2_FIC_501_SP,2_FIC_601_CO,2_FIC_601_PV,2_FIC_601_SP,2_FIT_001_PV,2_FIT_002_PV,2_FIT_003_PV,2_FQ_101_PV,2_FQ_201_PV,2_FQ_301_PV,2_FQ_401_PV,2_FQ_501_PV,2_FQ_601_PV,2_LS_001_AL,2_LS_002_AL,2_LS_101_AH,2_LS_101_AL,2_LS_201_AH,2_LS_201_AL,2_LS_301_AH,2_LS_301_AL,2_LS_401_AH,2_LS_401_AL,2_LS_501_AH,2_LS_501_AL,2_LS_601_AH,2_LS_601_AL,2_LT_001_PV,2_LT_002_PV,2_MCV_007_CO,2_MCV_101_CO,2_MCV_201_CO,2_MCV_301_CO,2_MCV_401_CO,2_MCV_501_CO,2_MCV_601_CO,2_MV_001_STATUS,2_MV_002_STATUS,2_MV_003_STATUS,2_MV_004_STATUS,2_MV_005_STATUS,2_MV_0

Looks good. The last file GDN needs is `list.txt`, which lists the feature column names, one per line.

In [43]:
# write the feature column names, one per line
# (the trailing 'attack' column is excluded)
with open('/content/drive/MyDrive/WADI-processed/list.txt', 'w') as f:
    for col in training_data.columns[:-1]:
        f.write(col + '\n')

In [44]:
!head '/content/drive/MyDrive/WADI-processed/list.txt'

1_AIT_001_PV
1_AIT_002_PV
1_AIT_003_PV
1_AIT_004_PV
1_AIT_005_PV
1_FIT_001_PV
1_LS_001_AL
1_LS_002_AL
1_LT_001_PV
1_MV_001_STATUS

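Before handing the three files to GDN, a quick consistency check can catch mismatches between them. The helper below is a hypothetical sketch (the function name `check_gdn_inputs` and the specific checks are assumptions, not part of GDN); it verifies that both frames have an `attack` column, that their feature columns match the list in the same order, and that no NaN values remain.

```python
import pandas as pd

def check_gdn_inputs(train_df, test_df, feature_names):
    """Return a list of problems found; an empty list means the inputs agree."""
    problems = []
    for name, df in (("train", train_df), ("test", test_df)):
        if "attack" not in df.columns:
            problems.append(f"{name}: missing 'attack' column")
        features = [c for c in df.columns if c != "attack"]
        if features != list(feature_names):
            problems.append(f"{name}: feature columns differ from list.txt")
        if df.isna().any().any():
            problems.append(f"{name}: contains NaN values")
    return problems

# Toy stand-ins for the processed files (hypothetical sensor names).
features = ["s1", "s2"]
train = pd.DataFrame({"s1": [0.1, 0.2], "s2": [0.3, 0.4], "attack": [0, 0]})
test = pd.DataFrame({"s1": [0.5, None], "s2": [0.6, 0.7], "attack": [0, 1]})

print(check_gdn_inputs(train, test, features))
```

For the real files, the frames would come from `pd.read_csv(..., index_col='Row')` on `train.csv` and `test.csv`, and the feature names from the lines of `list.txt`.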

All looks good so far. I'll try running GDN with these files when the cluster comes back online.