# RAMP on Detecting Attacks on an Industrial Water System from Its Operational Status

Laurent Pipitone (Student CentraleSupélec / Master IAC)

## Introduction

This notebook focuses on detecting cyber attacks in critical industrial systems using the BATtle of the Attack Detection ALgorithms (BATADAL) dataset. The BATADAL dataset simulates several cyber-incident scenarios in a water distribution network named C-Town. The aim of this project is to build and evaluate machine learning models that distinguish normal operation periods from periods affected by cyber attacks, using only sensor data from the industrial environment.

### Dataset Description

The BATADAL dataset comprises data collected over multiple time periods, covering both normal operations and periods in which cyber attacks were simulated. It records measurements from sensors and actuators, such as water levels, pump statuses, and flow rates, which are critical to operating the water distribution system. The dataset contains two main parts:

- **Training Dataset 1**: A one-year-long simulation of normal operations without any cyber attacks.
- **Training Dataset 2**: A six-month-long simulation containing various cyber attacks, including replay attacks, data injections, and modifications of control signals. The attacks are labeled, which makes this dataset suitable for supervised learning tasks.

### Objective

The objective of this analysis is to explore the potential of using sensor data to detect cyber attacks in industrial control systems (ICS). Specifically, we aim to address the following questions:
- Can machine learning models accurately distinguish between normal and attack scenarios using only sensor data?
- Which features (e.g., water levels, pump flows) are most indicative of a cyber attack?
- How do different models perform in terms of detecting attack instances, and what are their strengths and limitations?
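
When comparing models on these questions, it helps to remember that attack samples are typically far rarer than normal ones, so accuracy alone can be misleading. The toy example below uses made-up labels (not BATADAL data) to show a detector that never raises an alarm, yet still reaches high accuracy while missing every attack.

```python
import numpy as np

# Toy labels (an assumption for illustration): 950 normal samples (0), 50 attacks (1).
y_true = np.array([0] * 950 + [1] * 50)

# A useless detector that always predicts "normal".
y_pred = np.zeros_like(y_true)

# Fraction of samples classified correctly.
accuracy = np.mean(y_pred == y_true)

# Recall on the attack class: detected attacks / all real attacks.
true_positives = np.sum((y_pred == 1) & (y_true == 1))
false_negatives = np.sum((y_pred == 0) & (y_true == 1))
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy      = {accuracy:.2f}")  # 0.95
print(f"attack recall = {recall:.2f}")    # 0.00
```

This is why per-class precision and recall (for example via `classification_report`) are more informative here than a single accuracy figure.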

### Structure of the Notebook

1. **Data Loading and Preprocessing**: Import the necessary libraries, load the dataset, and preprocess the data for analysis.
2. **Exploratory Data Analysis (EDA)**: Understand the distributions, correlations, and potential patterns in the data, both in normal and attack scenarios.
3. **Feature Engineering**: Create and transform features to enhance model performance.
4. **Model Training**: Train various classification models to detect cyber attacks.
5. **Model Evaluation**: Assess the performance of the models using appropriate metrics and visualize the results.
6. **Conclusion**: Summarize findings and suggest potential improvements for future research.
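
As a rough preview of steps 1, 4, and 5, the sketch below runs a split/scale/train/evaluate loop on synthetic data. It is only an illustration under stated assumptions: the data is randomly generated, `LogisticRegression` is a stand-in baseline rather than the model used later in this notebook, and the random stratified split ignores the time ordering of the real measurements.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor readings: 500 samples, 4 features, ~10% attacks.
n_samples, n_features = 500, 4
y = (rng.random(n_samples) < 0.1).astype(int)
X = rng.normal(size=(n_samples, n_features))
X[y == 1] += 2.0  # shift attack samples so the toy problem is learnable

# Stratify so both splits keep roughly the same attack ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the scaler on the training data only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Hypothetical baseline classifier; any scikit-learn classifier could be swapped in.
model = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Per-class precision, recall, and F1 on the held-out split.
print(classification_report(y_test, y_pred, zero_division=0))
```

Fitting the scaler on the training split alone keeps the evaluation honest, since the test split then plays the role of unseen data.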

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score, roc_curve, auc

## Loading the data with pandas

In [None]:
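file_path = './BATADAL_dataset04.csv'
data = pd.read_csv(file_path)

# Preview the first rows
print(data.head())

# info() prints its summary directly and returns None, so it is not wrapped in print()
data.info()

# Summary statistics for the numeric columns
print(data.describe())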

# Basic plot configuration using the new style
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (10, 6)
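
After loading, a typical next step is to tidy the header, parse the timestamps, and normalize the label. The sketch below does this on a tiny inline sample rather than the real file. The column names (`DATETIME`, `L_T1`, `F_PU1`, `ATT_FLAG`), the padded header, the day/month/year-hour date format, and the use of `-999` as a "no attack" marker are all assumptions here and should be checked against the output of `data.head()`.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV; names, padding, and values are assumptions.
sample_csv = io.StringIO(
    "DATETIME, L_T1, F_PU1, ATT_FLAG\n"
    "04/07/16 00,2.44,93.6,-999\n"
    "04/07/16 01,2.66,93.8,-999\n"
    "04/07/16 02,3.11,94.5,1\n"
)
sample = pd.read_csv(sample_csv)

# Strip stray whitespace from the header so columns can be addressed by clean names.
sample.columns = sample.columns.str.strip()

# Parse timestamps (assumed day/month/year hour) and use them as a sorted index.
sample["DATETIME"] = pd.to_datetime(sample["DATETIME"], format="%d/%m/%y %H")
sample = sample.set_index("DATETIME").sort_index()

# Map the label to 0/1, treating every value other than 1 as "no attack" (assumption).
sample["ATT_FLAG"] = (sample["ATT_FLAG"] == 1).astype(int)

print(sample)
print(sample["ATT_FLAG"].value_counts())
```

The same three steps could then be applied to `data`, once the actual column names and date format have been confirmed.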