# DETECTING RECON ATTACKS ON IOT DEVICES

### THE PROBLEM

As our infrastructure, services, and products become ever more reliant on IoT networks, securing those networks through attack detection is becoming increasingly important.


### THE PROJECT OBJECTIVE

Our aim is to develop and test different models to analyze IoT network traffic in order to detect and classify Recon Attacks, a specific kind of cyberattack.


### THE DATASET

The CICIoT2023 dataset from the University of New Brunswick is a collection of CSV files containing data generated by monitoring the traffic (captured with Wireshark) of an IoT smart home network.

The network is reproduced in a lab and consists of 105 IoT devices. To gather traffic data from real cyberattacks of different kinds, 33 different cyberattacks were performed on the network and its devices.

The devices are divided into three categories:

- Victims (67 devices from all over the network)
- Attackers (10 devices, mainly Raspberry Pi 4s)
- Passives (28 devices not involved in the attacks)

Thanks to the way the network is arranged, the researchers were able to label the data with the kind of cyberattack being performed.
The data was then aggregated from .pcap files into a set of .csv files.
The end result is a large dataset with 47 columns (46 features plus the label) and more than 40 million rows.
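
With tens of millions of rows spread over many CSV files, it can help to stream the files rather than load them all at once. The sketch below is one possible way to count rows per label. It uses only the standard library and reads one row at a time. The function name `count_labels` and the in-memory sample data are illustrative assumptions, not part of the dataset tooling.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def count_labels(streams: Iterable[TextIO], label_column: str = "label") -> Counter:
    """Count rows per label across several CSV streams, one row at a time."""
    counts: Counter = Counter()
    for stream in streams:
        # DictReader treats the first line of each stream as the header.
        for row in csv.DictReader(stream):
            counts[row[label_column]] += 1
    return counts


# Two tiny in-memory "files" stand in for the real CSV parts (made-up values).
part_a = io.StringIO("Rate,label\n1.0,BenignTraffic\n2.0,ReconAttack\n")
part_b = io.StringIO("Rate,label\n3.0,BenignTraffic\n")

label_counts = count_labels([part_a, part_b])
print(label_counts.most_common())
```

On the real data, the same function could be fed file objects opened with `open(path, newline="")`.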


### CURRENT NOTEBOOK SCOPE


This work focuses on developing deep learning models to identify 5 kinds of Recon attacks, which probe a network to discover its vulnerabilities.
We are interested in these attacks because they are typically the first step of the most sophisticated and dangerous kinds of attack.

This notebook pre-processes and cleans the data in preparation for exploratory data analysis (EDA).
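
Narrowing the dataset to this scope could look roughly like the sketch below, which keeps benign traffic and Recon rows and drops everything else. The five label strings in `RECON_LABELS` are an assumption about how the Recon classes are spelled in the `label` column and should be checked against the values actually present. Dropping all other attack rows is likewise an assumption about the intended scope.

```python
from typing import Dict, Iterable, Iterator

# ASSUMPTION: illustrative spellings of the five Recon classes;
# verify them against the unique values of the `label` column.
RECON_LABELS = frozenset({
    "Recon-HostDiscovery",
    "Recon-OSScan",
    "Recon-PingSweep",
    "Recon-PortScan",
    "VulnerabilityScan",
})
BENIGN_LABEL = "BenignTraffic"


def in_scope(label: str) -> bool:
    """True for benign traffic and for any of the assumed Recon classes."""
    return label == BENIGN_LABEL or label in RECON_LABELS


def filter_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only the rows whose label is in scope, preserving order."""
    for row in rows:
        if in_scope(row["label"]):
            yield row


# "SomeOtherAttack" is a hypothetical placeholder for any non-Recon attack.
rows = [
    {"label": "BenignTraffic"},
    {"label": "Recon-PortScan"},
    {"label": "SomeOtherAttack"},
]
kept = [row["label"] for row in filter_rows(rows)]
print(kept)
```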


## Feature Descriptions

| Feature           | Data Type | Description                                                  |
|-------------------|-----------|--------------------------------------------------------------|
| `flow_duration`   | float64   | Total duration of the network flow in seconds.               |
| `Header_Length`   | float64   | The length of the packet header in bytes.                    |
| `Protocol Type`   | float64   | Numerical representation of the network protocol used.       |
| `Duration`        | float64   | Time duration of the network connection (similar to `flow_duration`, but possibly measured differently). |
| `Rate`            | float64   | The rate of packet transmission over the network in packets per second. |
| `Srate`           | float64   | The rate of outbound packets in the flow, indicating data sent from the source. |
| `Drate`           | float64   | The rate of inbound packets in the flow, indicating data received by the destination. |
| `fin_flag_number` | float64   | Number of packets with the FIN flag set, indicating the end of data communication. |
| `syn_flag_number` | float64   | Number of packets with the SYN flag set, used to initiate a TCP connection. |
| `rst_flag_number` | float64   | Number of packets with the RST flag set, used to reset the connection. |
| `psh_flag_number` | float64   | Number of packets with the PSH flag set, indicating the push function. |
| `ack_flag_number` | float64   | Number of packets with the ACK flag set, used to acknowledge the receipt of packets. |
| `ece_flag_number` | float64   | Number of packets with the ECE flag set, indicating Explicit Congestion Notification Echo. |
| `cwr_flag_number` | float64   | Number of packets with the CWR flag set, used to signal congestion window reduced. |
| `ack_count`       | float64   | The total number of acknowledgment packets within the flow.  |
| `syn_count`       | float64   | The total number of synchronization packets within the flow. |
| `fin_count`       | float64   | The total number of finish packets within the flow.          |
| `urg_count`       | float64   | The total number of urgent packets within the flow.          |
| `rst_count`       | float64   | The total number of reset packets within the flow.           |
| `HTTP`            | float64   | Indicator of HTTP traffic (1 for HTTP traffic, 0 otherwise). |
| `HTTPS`           | float64   | Indicator of HTTPS traffic (1 for HTTPS traffic, 0 otherwise). |
| `DNS`             | float64   | Indicator of DNS traffic (1 for DNS traffic, 0 otherwise).   |
| `Telnet`          | float64   | Indicator of Telnet traffic (1 for Telnet traffic, 0 otherwise). |
| `SMTP`            | float64   | Indicator of SMTP traffic (1 for SMTP traffic, 0 otherwise). |
| `SSH`             | float64   | Indicator of SSH traffic (1 for SSH traffic, 0 otherwise).   |
| `IRC`             | float64   | Indicator of IRC traffic (1 for IRC traffic, 0 otherwise).   |
| `TCP`             | float64   | Indicator of TCP protocol usage (1 for TCP, 0 otherwise).    |
| `UDP`             | float64   | Indicator of UDP protocol usage (1 for UDP, 0 otherwise).    |
| `DHCP`            | float64   | Indicator of DHCP traffic (1 for DHCP traffic, 0 otherwise). |
| `ARP`             | float64   | Indicator of ARP traffic (1 for ARP traffic, 0 otherwise).   |
| `ICMP`            | float64   | Indicator of ICMP traffic (1 for ICMP traffic, 0 otherwise). |
| `IPv`             | float64   | Indicator of IPv4 or IPv6 traffic (1 for IP traffic, 0 otherwise). |
| `LLC`             | float64   | Indicator of LLC traffic (1 for LLC traffic, 0 otherwise).   |
| `Tot sum`         | float64   | The total size of the packets transferred in the flow.       |
| `Min`             | float64   | The minimum size of packets in the flow.                     |
| `Max`             | float64   | The maximum size of packets in the flow.                     |
| `AVG`             | float64   | The average size of packets in the flow.                     |
| `Std`             | float64   | The standard deviation of packet sizes in the flow.          |
| `Tot size`        | float64   | Total size of the flow in bytes.                             |
| `IAT`             | float64   | Inter-Arrival Time of the packets in the flow.               |
| `Number`          | float64   | Total number of packets in the flow.                         |
| `Magnitue`        | float64   | A derived metric indicating the magnitude of the flow.       |
| `Radius`          | float64   | A derived metric indicating the radius of the flow.          |
| `Covariance`      | float64   | The covariance of packet sizes in the flow.                  |
| `Variance`        | float64   | The variance of packet sizes in the flow.                    |
| `Weight`          | float64   | A weight metric related to the flow.                         |
| `label`           | object    | Categorical label of the traffic type.                       |
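
The data types in the table suggest a simple parsing rule: every column is a float except `label`. A minimal sketch of that rule follows, assuming rows arrive as dictionaries of strings (as `csv.DictReader` produces). The helper `parse_row` and the sample values are hypothetical. Column names are used verbatim, including spaces and the `Magnitue` spelling.

```python
import csv
import io
from typing import Dict, Union


def parse_row(row: Dict[str, str], label_column: str = "label") -> Dict[str, Union[float, str]]:
    """Convert every field except the label to float, mirroring the table's dtypes."""
    return {
        name: value if name == label_column else float(value)
        for name, value in row.items()
    }


# A one-row sample with a few of the columns; the values are made up.
sample = io.StringIO(
    "flow_duration,Protocol Type,Magnitue,label\n"
    "0.5,6.0,10.2,BenignTraffic\n"
)
parsed = [parse_row(row) for row in csv.DictReader(sample)]
print(parsed[0])
```

A row with a non-numeric value in a feature column raises `ValueError`, which is one way to surface malformed rows during cleaning.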


# Simplified Data Features Overview

## Communication Patterns
- **Duration Measures**: These features tell us how long a communication lasts. It's like timing a phone call to see if it's a quick hello or a long conversation.
  - `flow_duration`, `Duration`
- **Transmission Rates**: These are like the speed of conversation, telling us how fast data is being sent and received.
  - `Rate`, `Srate`, `Drate`

## Traffic Signs
- **Flag Counts**: Just like flags used in sports to indicate different events, these counts tell us how many times certain types of network "signals" occur.
  - `fin_flag_number`, `syn_flag_number`, `rst_flag_number`, etc.
- **Packet Counts**: These give us the total number of "letters" sent and received during the communication.
  - `ack_count`, `syn_count`, `fin_count`, etc.
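
To make the idea of flag counting concrete, the sketch below tallies TCP flags over the packets of one flow using the standard TCP control-bit masks. How the dataset itself derives the `*_flag_number` and `*_count` columns is not stated here, so this only illustrates the general idea. The function `count_flags` and the sample handshake are assumptions.

```python
from collections import Counter
from typing import Iterable

# Standard TCP control-bit masks, lowest bit first.
TCP_FLAGS = {
    "fin": 0x01,
    "syn": 0x02,
    "rst": 0x04,
    "psh": 0x08,
    "ack": 0x10,
    "urg": 0x20,
    "ece": 0x40,
    "cwr": 0x80,
}


def count_flags(flag_bytes: Iterable[int]) -> Counter:
    """Count, per flag, how many packets in a flow have that flag set."""
    counts = Counter({name: 0 for name in TCP_FLAGS})
    for flags in flag_bytes:
        for name, mask in TCP_FLAGS.items():
            if flags & mask:
                counts[name] += 1
    return counts


# Flag bytes of a three-packet handshake: SYN, SYN+ACK, ACK.
handshake = [0x02, 0x12, 0x10]
flag_counts = count_flags(handshake)
print(dict(flag_counts))
```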

## Communication Types
- **Protocol Indicators**: These indicate the method of communication being used, similar to choosing between different channels such as email or instant messaging.
  - `HTTP`, `HTTPS`, `DNS`, `Telnet`, etc.

## Conversation Statistics
- **Size Measures**: These features measure the "volume" of the conversation, or how much information is being exchanged.
  - `Tot sum`, `Min`, `Max`, `AVG`, `Std`, `Tot size`
- **Timing Measures**: These tell us about the timing between exchanges, like the pauses between sentences in a conversation.
  - `IAT` (Inter-Arrival Time)
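
Inter-arrival time is simply the gap between consecutive packet arrivals. A small sketch, assuming a list of arrival timestamps in seconds, is shown below. The dataset's exact aggregation of these gaps into the single `IAT` value is not specified here, so the mean used at the end is only one plausible summary.

```python
from typing import List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Return the gaps between consecutive arrivals, after sorting by time."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical arrival times (seconds) of four packets in one flow.
arrivals = [0.0, 0.5, 2.0, 2.25]
gaps = inter_arrival_times(arrivals)
mean_gap = sum(gaps) / len(gaps)
print(gaps, mean_gap)
```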

## Interaction Complexity
- **Complexity Metrics**: These are advanced measures that summarize the overall complexity and pattern of the communication.
  - `Magnitue`, `Radius`, `Covariance`, `Variance`, `Weight`

## Nature of Traffic
- **Traffic Category**: This is the label that tells us whether the communication is regular and expected (`BenignTraffic`) or potentially suspicious (`ReconAttack`).
  - `label`
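
For EDA it can be handy to have the groups above available as a lookup. The sketch below encodes them as a dictionary, filling each "etc." from the feature table by naming pattern, which is an interpretation rather than something the overview spells out. The helper `ungrouped` is a hypothetical convenience for spotting columns that the overview does not place in any group.

```python
from typing import Dict, List, Sequence

FEATURE_GROUPS: Dict[str, List[str]] = {
    "duration": ["flow_duration", "Duration"],
    "rates": ["Rate", "Srate", "Drate"],
    "flag_counts": [
        "fin_flag_number", "syn_flag_number", "rst_flag_number",
        "psh_flag_number", "ack_flag_number", "ece_flag_number",
        "cwr_flag_number",
    ],
    "packet_counts": ["ack_count", "syn_count", "fin_count", "urg_count", "rst_count"],
    "protocols": [
        "HTTP", "HTTPS", "DNS", "Telnet", "SMTP", "SSH", "IRC",
        "TCP", "UDP", "DHCP", "ARP", "ICMP", "IPv", "LLC",
    ],
    "sizes": ["Tot sum", "Min", "Max", "AVG", "Std", "Tot size"],
    "timing": ["IAT"],
    # "Magnitue" is spelled as in the feature table.
    "complexity": ["Magnitue", "Radius", "Covariance", "Variance", "Weight"],
}


def ungrouped(columns: Sequence[str], label_column: str = "label") -> List[str]:
    """List columns that belong to no group (the label itself is ignored)."""
    grouped = {name for names in FEATURE_GROUPS.values() for name in names}
    return [c for c in columns if c not in grouped and c != label_column]


# A partial header for illustration; pass the real CSV header in practice.
header = ["flow_duration", "Header_Length", "Protocol Type", "Rate", "Number", "label"]
print(ungrouped(header))
```

Run against the full header, this check makes it easy to see which table columns, such as `Header_Length`, `Protocol Type`, and `Number`, sit outside the simplified grouping.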
