# <center><b><span style="color:blue">Machine Learning Strategy for Detecting Cyber Attacks on Industrial Control Systems (ICS)</span></b></center>

## <center><b><span>Tamir Suliman</span></b></center>



---
## Problem Description 
In this notebook, we look into detecting and predicting attacks on industrial control systems (ICS) using machine learning techniques.

## Project Outline: 
- Apply machine learning techniques to predict attacks on industrial systems using the HAI-ICS time series dataset.
- Evaluate and analyze the time series aspects of the HAI dataset (stationarity, seasonality, etc.) and its attributes to detect anomalies and thresholds.
- Visualize the process and the results using different machine learning and statistical visualization techniques.
- Apply different machine learning models (VAR, Isolation Forest, Logistic Regression, etc.) and a dimensionality reduction technique such as principal component analysis (PCA) to make our predictions.

## Motivation

Industrial Control Systems (ICS) can be found in a variety of places, from automated manufacturing machinery to the cooling system of an office building.
ICS were formerly expected to be based on specific operating systems and communication protocols. In recent years, however, network-connected ICS deployed on general-purpose operating systems and common communication protocols have decreased system development costs and increased productivity. The same connectivity also exposes these systems to a wider range of attackers.
Threat actors and rogue states are frequently motivated by financial gain, a political purpose, or even a military goal when carrying out attacks. State-sponsored attacks are possible, but so are attacks by competitors, insiders with hostile intent, and even hacktivists.
As a result, the ability to detect abnormalities ahead of time and mitigate risks is very valuable: it helps prevent not just cyber attacks but also unscheduled downtime and maintenance.

This problem presents another example of anomaly detection and binary classification predictive modeling.
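
Framing the task as binary classification means the choice of metric matters: attack samples are typically much rarer than normal ones, so accuracy alone can look good while attacks are missed. The following is a minimal sketch of precision, recall, and F1 for the attack class. The labels are made up for illustration, and `binary_metrics` is a hypothetical helper rather than part of any HAI tooling.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the positive (attack = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only: 1 = attack, 0 = normal.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)

# A model that always predicts "normal" keeps high accuracy but has zero recall.
always_normal = binary_metrics(y_true, [0] * len(y_true))
print(metrics)
print(always_normal)
```

In this toy example both predictors reach the same accuracy, yet only the first one detects any attack, which is why recall and precision on the attack class are worth tracking.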

## Overview of HAI Dataset: 
The HAI dataset was collected from a realistic industrial control system (ICS) testbed augmented with a Hardware-In-the-Loop (HIL) simulator that emulates steam-turbine power generation and pumped-storage hydropower generation.

The dataset contains several measurement channels (e.g., sensors, actuators, and control devices) that depict the current state of the systems. It is also available in two variants.

We will be using the latest version (21.03), which has a mixture of 20 attack scenarios and 33 interval data. HAI 21.03 was released in 2021 and is based on a more tightly coupled HIL simulator to produce clearer attack effects with additional attacks.

The HAI dataset is currently available on the project GitHub page, which can be found at <b> (https://github.com/icsdataset/hai)</b>. The data was collected as part of research to advance the understanding of attacks on ICS systems in South Korea. It's worth mentioning that there is a competition.

The dataset attributes are as follows:

1. <b>time</b>: The first column (column 01). It holds the observed local time as “yyyy-MM-dd hh:mm:ss”, while the remaining columns hold the recorded SCADA data points.
2. <b>P1_B2004...P4_HT_LD</b>: Data collected from different setpoints and sensors in the process. These values are numeric.
3. <b>attack</b>: The last column. It indicates whether an attack occurred (1) or not (0), and applies to an attack on any of the processes.
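
The column layout above can be sketched as a small parser that separates timestamps, numeric features, and the attack label. This is only an illustration using the standard library: the sample rows are synthetic, only two feature columns are shown, `split_rows` is a hypothetical helper, and the timestamp is assumed to use a 24-hour clock.

```python
import csv
import io
from datetime import datetime

# Tiny synthetic sample in the layout described above; the values are made up.
SAMPLE_CSV = """time,P1_B2004,P1_B3004,attack
2021-01-01 00:00:00,0.10,400.0,0
2021-01-01 00:00:01,0.11,401.5,0
2021-01-01 00:00:02,0.95,380.2,1
"""


def split_rows(handle):
    """Split HAI-style rows into feature names, timestamps, features and labels."""
    reader = csv.DictReader(handle)
    # Everything between the first column (time) and the last (attack) is a feature.
    feature_names = [c for c in reader.fieldnames if c not in ("time", "attack")]
    times, features, labels = [], [], []
    for row in reader:
        # Assumption: timestamps are 24-hour "YYYY-MM-DD HH:MM:SS".
        times.append(datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S"))
        features.append([float(row[name]) for name in feature_names])
        labels.append(int(row["attack"]))
    return feature_names, times, features, labels


feature_names, times, features, labels = split_rows(io.StringIO(SAMPLE_CSV))
print(feature_names, labels)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle; a dataframe library would be a natural alternative for the full dataset.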
    

    
### Identifying Important Attributes (Manual Analysis of Technical Specs)

Analysis of the dataset attributes reveals that there are 80 features. We know that normal scenarios involved an operator using 4 controllers and 2 simulation model points (set points).
We also know that abnormal scenarios occurred when some of the parameters were outside the limits of the normal range or in unexpected states due to attacks, malfunctions, and failures [1].
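
The idea of parameters leaving their normal range can be illustrated with a simple range check: learn the minimum and maximum of each attribute from normal operation, then flag readings that fall outside. This is a rough sketch under stated assumptions, not the method used by the HAI researchers; the readings are made up, and `learn_ranges` and `out_of_range` are hypothetical helpers.

```python
def learn_ranges(normal_rows):
    """Record the min and max seen for each attribute during normal operation."""
    ranges = {}
    for row in normal_rows:
        for name, value in row.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


def out_of_range(row, ranges, tolerance=0.0):
    """Return the attributes whose value falls outside the learned range.

    `tolerance` widens each range by a fraction of its width (an assumption
    made here to reduce false alarms on borderline readings).
    """
    flagged = []
    for name, value in row.items():
        low, high = ranges[name]
        margin = tolerance * (high - low)
        if value < low - margin or value > high + margin:
            flagged.append(name)
    return flagged


# Made-up readings for two attributes; real normal ranges come from the data.
normal = [
    {"P1_B2004": 0.10, "P1_B3004": 400.0},
    {"P1_B2004": 0.12, "P1_B3004": 405.0},
    {"P1_B2004": 0.09, "P1_B3004": 398.0},
]
ranges = learn_ranges(normal)
suspicious = out_of_range({"P1_B2004": 0.11, "P1_B3004": 450.0}, ranges)
print(suspicious)
```

A check like this only catches values outside the observed range; it would not by itself detect unexpected states that stay within normal limits.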

Before we work on the dataset attributes and apply our models, we need to prepare the dataset. There are three attack types targeting those systems:

  - Set point attacks: forcing set point values in order to change the controlled variables.
  - Target point attacks: forcing the control operator value directly.
  - Response prevention attacks: hiding the abnormal response on the Human Machine Interface (HMI) dashboards.
    
According to the technical details document provided by the HAI researchers, our attack scenarios involve attacking specific controllers, which influence the system when different set point and target point values are changed.

We can therefore compile those data points and consider the following attributes as good predictors:

##### Set points & target points:
<table>
<tr><th>Table 1-Set points </th><th>Table 2-Target points (Attack)</th><th>Table 3-Description</th></tr>
<tr><td>
    
|No|Name|Description|
|---|---|---|
|1|P1_B2004| Set Point (Controller P1-PC)|
|2|P1_B3004| Set Point (Controller P1-LC)|
|3|P1_B3005| Set Point (Controller P1-FC)|
|4|P1_B4002| Set Point (Controller P1-TC)|
|5|P4_ST_PS| Set Point (Controller P4-ST)|
|6|P4_HT_PS| Set Point (Controller P4-HT)|

</td><td>
    
|No|Name|Description|
|---|---|---|
|1|P2_VTR02| Target Point |
|2|P1_B2016| Target Point |
|3|P1_LCV01D| Target Point|
|4|P1_FT03| Target Point |
|5|P1_FCV03D| Target Point |
|6|P2_VTR01| Target Point |
|7|P1_LIT01| Target Point |
|8|P1_B3004| Target Point |
|9|P1_PCV01D| Target Point |
|10|P1_PIT01| Target Point |
|11|P2_ASD| Target Point |
|12|P2_SIT01| Target Point |
|13|P2_LCV01D| Target Point |
|14|P3_LCP01D| Target Point |
|15|P2_CO_RPM| Target Point |
|16|P2_RTR| Target Point |
    
</td><td>

|No|Name|Unit|Description|
|---|---|---|---|
|1|P1_B2004|bar|Heat exchanger outlet pressure setpoint|
|2|P1_B3004|mm|Water level setpoint|
|3|P1_B3005|l/h|Discharge flowrate setpoint (return to water tank)|
|4|P1_B4002|°C|Heat exchanger outlet temperature setpoint|
|5|P4_ST_PS|MW|Scheduled power demand of STM|
|6|P4_HT_PS|MW|Scheduled power demand of HTM|
|7|P2_VTR02|µm|Preset vibration limit for sensor|
|8|P1_B2016|bar|Pressure demand for thermal power output control|
|9|P1_PIT01|bar|Heat exchanger outlet pressure|
|10|P1_LCV01D|%|Position command for LCV01|
|11|P1_FT03|mmH2O|Measured flowrate of return water tank|
|12|P1_FCV01D|%|Position command for FC01 valve|
|13|P1_FCV03D|%|Position command for FCV03 valve|
|14|P2_VTR01|µm|Preset vibration limit for sensor P2_VIBTR01|
|15|P1_LIT01|%|Water level of the upper water tank|
|16|P3_LCP01D|-|Speed command for pump LCP01|
|17|P2_RTR|RPM|RPM trip rate|
|18|P2_SIT01|RPM|Current turbine RPM measured by speed probe|
|19|P1_PCV01D|%|Position command for valve PCV01|
|20|P2_CO_rpm|-|Control output value of speed controller|
|21|P2_ASD|RPM|Auto speed demand|
|class|attack|-|Normal condition = 0 / Abnormal condition = 1|
|index|time|YYYY-MM-DD HH:mm:ss|Date and time stamp|
</td></tr> </table>

    
                        
                         
These attributes and set points will be the focus of our work on the dataset.

According to the technical details, the attack scenarios are configured around the four variables of the feedback control loop: the set points (SP), process variables (PV), control variables (CV), and control parameters (CP).
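
Selecting the predictors listed in Tables 1 and 2 can be sketched as follows. The column names are copied from the tables; the `header` list is a hypothetical example rather than the real file header, and `select_predictors` is an illustrative helper. Names are matched case-insensitively because the tables spell one column as both `P2_CO_RPM` and `P2_CO_rpm`.

```python
SET_POINTS = ["P1_B2004", "P1_B3004", "P1_B3005", "P1_B4002", "P4_ST_PS", "P4_HT_PS"]
TARGET_POINTS = [
    "P2_VTR02", "P1_B2016", "P1_LCV01D", "P1_FT03", "P1_FCV03D", "P2_VTR01",
    "P1_LIT01", "P1_B3004", "P1_PCV01D", "P1_PIT01", "P2_ASD", "P2_SIT01",
    "P2_LCV01D", "P3_LCP01D", "P2_CO_RPM", "P2_RTR",
]

# P1_B3004 appears in both tables; dict.fromkeys drops the repeat and keeps order.
PREDICTORS = list(dict.fromkeys(SET_POINTS + TARGET_POINTS))


def select_predictors(header, wanted=PREDICTORS):
    """Return (present, missing) predictor columns, matching names case-insensitively."""
    by_lower = {name.lower(): name for name in header}
    present = [by_lower[w.lower()] for w in wanted if w.lower() in by_lower]
    missing = [w for w in wanted if w.lower() not in by_lower]
    return present, missing


# Hypothetical header for illustration; a real file has many more columns.
header = ["time", "P1_B2004", "P1_B3004", "P2_CO_rpm", "attack"]
present, missing = select_predictors(header)
print(len(PREDICTORS), present)
```

Reporting the missing names, instead of silently skipping them, makes it easier to notice spelling differences between the specification tables and the actual file.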

                        
                        

## Model Structure:

The diagram below illustrates our model pipeline:
<div>
<img src="images/ml-model1.png" align="center" alt="Model Arch. Diagram"  width="1400"/>
</div>

1. Load the dataset and analyze the data and the features.
2. Analyze the attributes and understand the process and the impact of the variables.
3. Test the time series properties of the data (stationarity, trends, etc.).
4. Evaluate machine learning models.
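
As a rough illustration of the time series step, the sketch below compares the mean and variance of the two halves of a series. This is a crude heuristic chosen for brevity, not a formal stationarity test; a statistical test such as the Augmented Dickey-Fuller test (for example `adfuller` in statsmodels) would be the more usual choice. The series, the `rel_tol` threshold, and the `looks_stationary` helper are all assumptions made for this example.

```python
from statistics import mean, pvariance


def looks_stationary(series, rel_tol=0.25):
    """Rough check: do the two halves of the series have similar mean and variance?"""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    # Normalise the mean shift by the overall range of the series.
    scale = (max(series) - min(series)) or 1.0
    mean_shift = abs(mean(first) - mean(second)) / scale
    # Normalise the variance shift by the larger of the two variances.
    var_first, var_second = pvariance(first), pvariance(second)
    var_scale = max(var_first, var_second) or 1.0
    var_shift = abs(var_first - var_second) / var_scale
    return mean_shift <= rel_tol and var_shift <= rel_tol


# Synthetic series for illustration: a steady drift and a fluctuation around a level.
trend = [0.5 * t for t in range(100)]
steady = [10.0 + (1.0 if t % 2 else -1.0) for t in range(100)]

print(looks_stationary(trend))
print(looks_stationary(steady))
```

A series flagged as non-stationary by a check like this is a candidate for differencing or detrending before fitting a model such as VAR.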


## <b><span style="color:blue">Installing Packages</span></b>
Uncomment the following lines and paste them into a code cell to install the packages:
```
# !pip install yellowbrick
# !pip install seaborn
# !pip install scipy
# !pip install statsmodels
# !pip install imbalanced-learn
# !pip install lightgbm
# !pip install xgboost
```
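
Before installing, it can help to check which of these packages are already importable. The sketch below uses only the standard library. The mapping from pip names to import names is written out by hand here; note that `imbalanced-learn` is imported as `imblearn`. `missing_packages` is a hypothetical helper for this notebook.

```python
from importlib.util import find_spec

# pip package name -> module name used in import statements.
PACKAGES = {
    "yellowbrick": "yellowbrick",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "statsmodels": "statsmodels",
    "imbalanced-learn": "imblearn",
    "lightgbm": "lightgbm",
    "xgboost": "xgboost",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in packages.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Only the packages reported as missing then need their `pip install` line uncommented.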





