# Experiment - Train and test classification models
```
Input files: src/data_split/train_64_RIPE_n.csv, test_64_RIPE_n.csv  
```
Choose one of the following subsections (algorithms) and run only one at a time.


### 1. Set variables
Set `cut_pct` and `site` to match the values used in xpr_BPG_offlinc_data.ipynb.


In [None]:
site = 'RIPE'
cut_pct = '64'
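
The file names used throughout this notebook follow the pattern `train_<cut_pct>_<site>_n.csv` and `test_<cut_pct>_<site>_n.csv`. A minimal sketch of deriving both paths from the two variables in one place, assuming the `src/data_split/` layout named at the top of the notebook; `split_paths` is a hypothetical helper and not part of the project:

```python
from pathlib import Path


def split_paths(site, cut_pct, root="src/data_split"):
    """Return the (train, test) CSV paths for a site and cut percentage.

    Hypothetical helper: it only mirrors the naming pattern of the
    input files listed at the top of this notebook.
    """
    base = Path(root)
    train = base / f"train_{cut_pct}_{site}_n.csv"
    test = base / f"test_{cut_pct}_{site}_n.csv"
    return train, test


train_path, test_path = split_paths("RIPE", "64")
print(train_path.as_posix())  # src/data_split/train_64_RIPE_n.csv
print(test_path.as_posix())   # src/data_split/test_64_RIPE_n.csv
```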

### 2. Integrated run 
Run a batch of RNN algorithms. The results are stored in src/STAT/results_64_RIPE.csv and can serve as a baseline for benchmarking new algorithms. Choose one of the following subsections to run; both write to the same output file, so running one overwrites the results of the other.
```
Output files: src/STAT/results_64_RIPE.csv
```

#### 2.1 LSTM and GRU
Takes approximately 4 minutes on a 2022 MacBook Air (M2).

In [None]:
from src.subprocess_cmd import subprocess_cmd

print("--------------------RNNs Experiment-Begin--------------------------")
subprocess_cmd("cd src/; \
                cp ./data_split/train_%s_%s_n.csv ./data_split/test_%s_%s_n.csv ./RNN_Running_Code/RNN_Run/dataset/ ; \
                cd RNN_Running_Code/RNN_Run/dataset/; \
                mv train_%s_%s_n.csv train.csv; mv test_%s_%s_n.csv test.csv; \
                cd ..; cd ..; \
                chmod +x integrate_run.sh; sh ./integrate_run.sh ; \
                cd RNN_Run/; sh ./collect.sh; \
                cp -r res_acc res_run ../data_representation/ ; \
                cd .. ; cd data_representation/ ; \
                python TableGenerator.py; " \
                % (cut_pct, site, cut_pct, site, cut_pct, site, cut_pct, site))

print("--------------------RNNs Experiment-end----------------------------")
subprocess_cmd("cd src/; \
                mv ./RNN_Running_Code/data_representation/data_representation_table.csv ./STAT/ ; \
                mv ./STAT/data_representation_table.csv ./STAT/results_%s_%s.csv" \
                % (cut_pct, site))

# Remove generated folders
# subprocess_cmd("cd src/; \
#                 cd RNN_Running_Code/RNN_Run/; \
#                 rm -rf ./experiment/ ./res_acc/ ./res_run/ ./tmp/")
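
The first shell steps in the cell above copy the two split files into the runner's `dataset/` folder and rename them to `train.csv` and `test.csv`. A rough Python equivalent of only that staging step, as a sketch; `stage_dataset` is a hypothetical name, and the project itself uses the shell commands shown above:

```python
import shutil
from pathlib import Path


def stage_dataset(split_dir, dataset_dir, site, cut_pct):
    """Copy the train/test split files into dataset_dir as train.csv and test.csv.

    Hypothetical helper. Unlike the shell version, it creates dataset_dir
    if it does not exist and fails early when a split file is missing.
    """
    split_dir, dataset_dir = Path(split_dir), Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for part in ("train", "test"):
        src = split_dir / f"{part}_{cut_pct}_{site}_n.csv"
        if not src.is_file():
            raise FileNotFoundError(f"missing split file: {src}")
        dst = dataset_dir / f"{part}.csv"
        shutil.copyfile(src, dst)  # copy and rename in one step
        staged.append(dst)
    return staged
```

It could be called as `stage_dataset("src/data_split", "src/RNN_Running_Code/RNN_Run/dataset", site, cut_pct)` before launching the run scripts.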


#### 2.2 Bi-LSTM and Bi-GRU
Takes approximately 7 minutes on a 2022 MacBook Air (M2).  
Known issue: the output file contains no results; the cause has not been identified.

In [None]:
from src.subprocess_cmd import subprocess_cmd

print("--------------------RNNs Experiment-Begin--------------------------")
subprocess_cmd("cd src/; \
                cp ./data_split/train_%s_%s_n.csv ./data_split/test_%s_%s_n.csv ./BiRNN_Running_Code/BiRNN_Run/dataset/ ; \
                cd BiRNN_Running_Code/BiRNN_Run/dataset/; \
                mv train_%s_%s_n.csv train.csv; mv test_%s_%s_n.csv test.csv; \
                cd ..; cd ..; \
                chmod +x integrate_run.sh; sh ./integrate_run.sh ; \
                cd BiRNN_Run/; sh ./collect.sh; \
                cp -r res_acc res_run ../data_representation/ ; \
                cd .. ; cd data_representation/ ; \
                python TableGenerator.py; " \
                % (cut_pct, site, cut_pct, site, cut_pct, site, cut_pct, site))

print("--------------------RNNs Experiment-end----------------------------")
subprocess_cmd("cd src/; \
                mv ./BiRNN_Running_Code/data_representation/data_representation_table.csv ./STAT/ ; \
                mv ./STAT/data_representation_table.csv ./STAT/results_%s_%s.csv" \
                % (cut_pct, site))

# Remove generated folders
subprocess_cmd("cd src/; \
                cd BiRNN_Running_Code/BiRNN_Run/; \
                rm -rf ./experiment/ ./res_acc/ ./res_run/ ./tmp/")
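
Given the known issue noted for this subsection, it may help to confirm that the results file actually contains data before using it as a baseline. A minimal check, assuming the CSV has one header row followed by result rows (the file's layout is not shown in this notebook, so that is an assumption); `count_result_rows` is a hypothetical helper:

```python
import csv
from pathlib import Path


def count_result_rows(path):
    """Return the number of non-empty rows after the header row.

    Returns 0 when the file is missing, empty, or contains only a header.
    Assumes a single header row, which is not confirmed by the project.
    """
    path = Path(path)
    if not path.is_file():
        return 0
    with path.open(newline="") as f:
        rows = [row for row in csv.reader(f) if any(cell.strip() for cell in row)]
    return max(len(rows) - 1, 0)


n_rows = count_result_rows("src/STAT/results_64_RIPE.csv")
print("result rows:", n_rows)
```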



### 3. Run single algorithm
Choose one existing algorithm from the Cyberdefense project to run.  


#### 3.1 BLS
Run code blocks in this section one by one.  

##### 3.1.1 Data normalization

In [3]:
import numpy as np

train_dataset = np.loadtxt('src/data_split/train_64_RIPE_n.csv', delimiter=',')
test_dataset = np.loadtxt('src/data_split/test_64_RIPE_n.csv', delimiter=',')

# Keep the largest number of rows that is a multiple of 100
row_index_end = train_dataset.shape[0] - train_dataset.shape[0] % 100
train_x = train_dataset[:row_index_end, :-1]  # all columns except the last are features
train_y = train_dataset[:row_index_end, -1]   # the last column is the class label
# Relabel class 0 as class 2 so that the training labels are 1 and 2
train_y[train_y == 0] = 2

# Apply the same trimming and relabeling to the test set
row_index_end = test_dataset.shape[0] - test_dataset.shape[0] % 100
test_x = test_dataset[:row_index_end, :-1]
test_y = test_dataset[:row_index_end, -1]
test_y[test_y == 0] = 2

print(train_y, test_y)


[2. 2. 2. ... 1. 1. 1.] [1. 1. 1. ... 2. 2. 2.]
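
The cell above does two things to each split: it drops trailing rows so that the row count is a multiple of 100, and it relabels class 0 as class 2 so that the labels are 1 and 2. The same steps on a small synthetic array, as an illustrative sketch; `trim_and_relabel` and its `block` parameter are hypothetical names, and unlike the cell above this version copies the label column rather than modifying the loaded array in place:

```python
import numpy as np


def trim_and_relabel(dataset, block=100):
    """Split a 2-D array into features and labels, trimmed to a multiple of `block` rows.

    The last column is treated as the label; label 0 is rewritten to 2.
    """
    n_rows = dataset.shape[0] - dataset.shape[0] % block
    x = dataset[:n_rows, :-1]
    y = dataset[:n_rows, -1].copy()  # copy so the input array stays unchanged
    y[y == 0] = 2
    return x, y


# Synthetic data: 5 rows, 2 features plus a label column.
demo = np.array([
    [0.1, 0.2, 0],
    [0.3, 0.4, 1],
    [0.5, 0.6, 0],
    [0.7, 0.8, 1],
    [0.9, 1.0, 0],
])
x, y = trim_and_relabel(demo, block=2)  # block=2 keeps the first 4 rows
print(x.shape, y)  # (4, 2) [2. 1. 2. 1.]
```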


##### 3.1.2 Set BLS parameters

In [4]:
from src.VFBLS_v110.bls.processing.one_hot_m import one_hot_m

mem = 'low'
# BLS parameters
seed = 1  # set the seed for generating random numbers
num_class = 2  # number of the classes
epochs = 1  # number of epochs

C = 2 ** -15  # parameter for sparse regularization
s = 0.6  # the shrinkage parameter for enhancement nodes

train_y = one_hot_m(train_y, num_class)
# test_y = one_hot_m(test_y, num_class);
#######################
# N1* - the number of mapped feature nodes
# N2* - the groups of mapped features
# N3* - the number of enhancement nodes
if mem == 'low':
    N1_bls = 20
    N2_bls = 5
    N3_bls = 100
else:
    N1_bls = 200
    N2_bls = 10
    N3_bls = 100
#######################

# Per-epoch metrics (note: train_err stores the training accuracy in percent)
train_err = np.zeros((1, epochs))
train_time = np.zeros((1, epochs))
test_time = np.zeros((1, epochs))
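
`one_hot_m` is imported from the project and its source is not shown in this notebook. As a rough illustration of what one-hot encoding of labels 1..`num_class` could look like, assuming label k maps to column k-1 (an assumption, not confirmed by the project code); `one_hot_sketch` is a hypothetical stand-in:

```python
import numpy as np


def one_hot_sketch(labels, num_class):
    """One-hot encode integer labels in the range 1..num_class.

    Hypothetical stand-in for illustration only; label k is mapped to
    column k - 1 of the returned (n_samples, num_class) array.
    """
    labels = np.asarray(labels).astype(int)
    if labels.min() < 1 or labels.max() > num_class:
        raise ValueError("labels must lie in 1..num_class")
    # Row k - 1 of the identity matrix is the one-hot vector for label k
    return np.eye(num_class)[labels - 1]


encoded = one_hot_sketch([2.0, 1.0, 2.0], num_class=2)
print(encoded)
```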


##### 3.1.3 Training and testing

In [11]:
from src.VFBLS_v110.bls.model.bls_train import bls_train_realtime

print("======================= BLS =======================\n")
np.random.seed(seed)  # set the seed for generating random numbers
for j in range(0, epochs):
    trainingAccuracy, trainingTime, testingTime, predicted = \
        bls_train_realtime(train_x, train_y, test_x,
                            s, C,
                            N1_bls, N2_bls, N3_bls)

    train_err[0, j] = trainingAccuracy * 100
    train_time[0, j] = trainingTime
    test_time[0, j] = testingTime

# predicted is a column of labels, e.g. [[1.], [2.], [2.], [2.], [2.]];
# flatten it into a plain list of labels
predicted_list = [label[0] for label in predicted]

print("predicted_list:", predicted_list)



Feature nodes in window  0 : Max Val of Output  22.68435949482985  Min Val  -25.685774496491742
Feature nodes in window  1 : Max Val of Output  24.090482377329433  Min Val  -22.082808111693122
Feature nodes in window  2 : Max Val of Output  24.817905968414177  Min Val  -21.740651504116364
Feature nodes in window  3 : Max Val of Output  25.616916166613823  Min Val  -21.41504551936608
Feature nodes in window  4 : Max Val of Output  19.54398001273009  Min Val  -19.61050358278534
Enhancement nodes: Max Val of Output  0.3016402107355213  Min Val  -1.3438089604206882
Training has been finished!
The Total Training Time is :  0.17660188674926758  seconds
xx.shape (3700, 2)
xx [[ 0.01374273  0.98625754]
 [-0.03379464  1.03379376]
 [ 0.10267602  0.89732422]
 ...
 [ 0.63936616  0.36063246]
 [ 0.44298338  0.55701714]
 [ 0.2924114   0.70758789]]
Training Accuracy is :  98.89189189189189  %
Testing has been finished!
The Total Testing Time is :  0.032888174057006836  seconds
predicted_list: [1.0, 1
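
The `xx` array printed above has one row per training sample and one column per class, which suggests per-class scores. Presumably a label is obtained by picking the column with the larger score; that step happens inside the project code and is not shown here, so the following is only a sketch under that assumption, with `scores_to_labels` as a hypothetical name:

```python
import numpy as np


def scores_to_labels(scores):
    """Map each row of per-class scores to a 1-based class label.

    Assumes column k - 1 holds the score for label k (not confirmed
    by the project code).
    """
    scores = np.asarray(scores, dtype=float)
    return scores.argmax(axis=1) + 1


# Two rows taken from the xx output above
scores = np.array([
    [0.01374273, 0.98625754],
    [0.63936616, 0.36063246],
])
labels = scores_to_labels(scores)
print(labels)  # [2 1]
```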

##### 3.1.4 Accuracy and Fscore

In [15]:
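from sklearn.metrics import accuracy_score, f1_score

# test_y holds the labels 1 and 2 (it was not one-hot encoded above)
accuracy = accuracy_score(test_y, predicted_list)
# Binary F-score with label 1 as the positive class (the scikit-learn default)
fscore = f1_score(test_y, predicted_list, pos_label=1)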

print(f"Accuracy: {accuracy:.2%}, Fscore: {fscore:.2%}")

Accuracy: 91.29%, Fscore: 58.26%
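
One common reason for a gap like this between accuracy and F-score is class imbalance: accuracy counts every correct prediction, while the binary F-score only rewards correct predictions of the positive label (label 1 by default in `f1_score`). A small hand-rolled sketch on made-up labels, not the experiment's data, shows the effect; `accuracy_and_f1` is a hypothetical helper:

```python
def accuracy_and_f1(y_true, y_pred, pos_label=1):
    """Compute accuracy and the binary F1 score from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Made-up example: 2 positive samples (label 1) among 10, one of them missed
y_true = [1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
y_pred = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"Accuracy: {acc:.2%}, Fscore: {f1:.2%}")  # Accuracy: 90.00%, Fscore: 66.67%
```

A single missed positive sample lowers accuracy by 10 points here but cuts the F-score by a third, because only two samples carry the positive label.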
