## Problem Statement
**Network anomaly detection** is the task of identifying patterns in network traffic that deviate from normal behavior. Such anomalies can signal a range of security threats, from compromised devices and malware infections to large-scale attacks such as **DDoS (Distributed Denial of Service)**.

The challenge lies in detecting these anomalies accurately and in real time within large, continuous streams of network data that are often **noisy and heterogeneous**.

Traditional methods of network anomaly detection often rely on predefined rules or signatures based on known attack patterns. However, these methods fall short in detecting **new or evolving threats** that do not match the existing signatures. Furthermore, as network environments grow in complexity, maintaining and updating these rules becomes increasingly **cumbersome and less effective**.


## Dataset Location/Link

https://drive.google.com/file/d/1AlZak8gC27ntWFR0-ZJ0tMxVWFac-XPf/view?usp=drive_link 

# Network Anomaly Detection Dataset Features

This document outlines and explains the various features commonly used in network anomaly detection datasets such as KDD Cup 1999 or NSL-KDD. These features help in analyzing traffic behavior and detecting anomalies or potential attacks.

---

## 1. Basic Connection Features

- **Duration**:  
  Length of time (in seconds) that the connection lasted.

- **Protocol_type**:  
  The protocol used in the connection (e.g., TCP, UDP, ICMP).

- **Service**:  
  The destination network service accessed during the connection (e.g., HTTP, Telnet, FTP).

- **Flag**:  
  Status of the connection (indicates normal or error state). It shows the result of the connection attempt (e.g., SF, S0, REJ).

- **Src_bytes**:  
  Number of data bytes sent from the source to the destination during the connection.

- **Dst_bytes**:  
  Number of data bytes sent from the destination back to the source.

- **Land**:  
  A binary flag indicating if the connection is to/from the same IP address and port (1 if same, 0 otherwise).

- **Wrong_fragment**:  
  Number of incorrect (incomplete or overlapping) IP packet fragments.

- **Urgent**:  
  Number of packets with the URG (urgent) flag set in the TCP header.
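
Three of the basic features (`protocol_type`, `service`, `flag`) are categorical, so most models need them converted to numbers first. The sketch below shows one common option, one-hot encoding, using only the standard library. The sample records, the lowercase column names, and the `one_hot_encode` helper are hypothetical illustrations, not part of the dataset or of any library.

```python
# Hypothetical sample records; a real dataset has one row per connection.
records = [
    {"duration": 0, "protocol_type": "tcp", "service": "http",
     "flag": "SF", "src_bytes": 181, "dst_bytes": 5450},
    {"duration": 0, "protocol_type": "udp", "service": "domain_u",
     "flag": "SF", "src_bytes": 45, "dst_bytes": 44},
    {"duration": 2, "protocol_type": "tcp", "service": "telnet",
     "flag": "S0", "src_bytes": 0, "dst_bytes": 0},
]

# Columns assumed to hold category labels rather than numbers.
CATEGORICAL = ("protocol_type", "service", "flag")


def one_hot_encode(rows, categorical=CATEGORICAL):
    """Replace each categorical column with one 0/1 column per observed value."""
    # Collect the sorted set of values seen for each categorical column.
    vocab = {col: sorted({row[col] for row in rows}) for col in categorical}
    encoded = []
    for row in rows:
        # Numeric columns are copied through as floats.
        out = {k: float(v) for k, v in row.items() if k not in categorical}
        # Each category value becomes its own indicator column.
        for col, values in vocab.items():
            for value in values:
                out[f"{col}={value}"] = 1.0 if row[col] == value else 0.0
        encoded.append(out)
    return encoded


encoded = one_hot_encode(records)
```

Because `service` can take many distinct values, one-hot encoding may produce a wide feature matrix; other encodings are equally valid choices.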

---

## 2. Content-Related Features

These features analyze the actual data within the connection for suspicious activity.

- **Hot**:  
  Number of 'hot' indicators in the content (e.g., system directory access, file creations, program executions).

- **Num_failed_logins**:  
  Number of failed login attempts during the connection.

- **Logged_in**:  
  Binary flag indicating if the user is successfully logged in (1) or not (0).

- **Num_compromised**:  
  Number of compromised conditions (such as a system call to gain unauthorized privileges).

- **Root_shell**:  
  Binary flag indicating if a root shell was obtained during the session (1 if yes, 0 otherwise).

- **Su_attempted**:  
  Binary flag indicating if the "su root" command was attempted (1 if yes, 0 otherwise).

- **Num_root**:  
  Number of root-level operations performed during the connection.

- **Num_file_creations**:  
  Number of file creation operations performed.

- **Num_shells**:  
  Number of shell prompts invoked.

- **Num_access_files**:  
  Number of attempts to access control files (e.g., `/etc/passwd`).

- **Num_outbound_cmds**:  
  Number of outbound commands issued in an FTP session (typically 0 in KDD-style datasets).

- **Is_hot_login**:  
  Indicates whether the login is to a "hot" account (root or admin). (1 if yes, 0 otherwise).

- **Is_guest_login**:  
  Indicates whether the login is a guest login (1 if yes, 0 otherwise).
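
Before modeling, it can help to check that the content features behave as described: flags documented as binary should only contain 0 or 1, and a column that never varies (as `num_outbound_cmds` often does) carries no information. The following is a minimal sketch of both checks; the sample rows and the helper names `constant_columns` and `non_binary_values` are hypothetical.

```python
# Flags that the feature list above describes as binary.
BINARY_FLAGS = ("logged_in", "root_shell", "su_attempted",
                "is_hot_login", "is_guest_login")

# Hypothetical rows holding a subset of the content features.
rows = [
    {"logged_in": 1, "root_shell": 0, "su_attempted": 0, "is_hot_login": 0,
     "is_guest_login": 0, "num_failed_logins": 0, "num_outbound_cmds": 0},
    {"logged_in": 0, "root_shell": 0, "su_attempted": 0, "is_hot_login": 0,
     "is_guest_login": 1, "num_failed_logins": 3, "num_outbound_cmds": 0},
    {"logged_in": 1, "root_shell": 1, "su_attempted": 1, "is_hot_login": 0,
     "is_guest_login": 0, "num_failed_logins": 0, "num_outbound_cmds": 0},
]


def constant_columns(rows):
    """Return the sorted names of columns whose value never changes."""
    if not rows:
        return []
    return sorted(col for col in rows[0]
                  if len({row[col] for row in rows}) == 1)


def non_binary_values(rows, columns=BINARY_FLAGS):
    """Map each supposedly binary column to any values outside {0, 1}."""
    bad = {}
    for col in columns:
        extra = {row[col] for row in rows} - {0, 1}
        if extra:
            bad[col] = extra
    return bad


droppable = constant_columns(rows)
violations = non_binary_values(rows)
```

Columns reported in `droppable` are candidates for removal, and any entry in `violations` is worth inspecting before treating that column as a boolean.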

---

## 3. Time-Related Traffic Features

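These features describe the temporal behavior of connections in a short time window (usually 2 seconds). In KDD Cup 1999 and NSL-KDD, the rate features are stored as fractions between 0 and 1 rather than as values from 0 to 100.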

- **Count**:  
  Number of connections to the same destination host as the current connection in the past two seconds.

- **Srv_count**:  
  Number of connections to the same service as the current connection in the past two seconds.

- **Serror_rate**:  
  Percentage of connections that had SYN errors (flags: S0, S1, S2, S3) among the `count` connections.

- **Srv_serror_rate**:  
  Percentage of connections with SYN errors among the `srv_count` connections.

- **Rerror_rate**:  
  Percentage of connections that were rejected (flag REJ) among the `count` connections.

- **Srv_rerror_rate**:  
  Percentage of connections with REJ flags among the `srv_count` connections.

- **Same_srv_rate**:  
  Percentage of connections to the same service among the `count` connections.

- **Diff_srv_rate**:  
  Percentage of connections to different services among the `count` connections.

- **Srv_diff_host_rate**:  
  Percentage of connections to different hosts (IP addresses) among the `srv_count` connections.
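
To make the definitions above concrete, here is a minimal sketch of how several of these features could be derived from a log of earlier connections. It rests on assumptions: each connection is a dict with the hypothetical keys `time`, `dst_host`, `service`, and `flag`; only earlier connections are counted (the current one is excluded); and rates are returned as fractions between 0 and 1. It is not the procedure used to build the original datasets.

```python
SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}


def time_window_features(history, current, window=2.0):
    """Compute a few time-window traffic features for `current`.

    `history` holds earlier connections as dicts with the hypothetical
    keys: time (seconds), dst_host, service, flag.
    """
    # Keep only connections that started within `window` seconds before now.
    recent = [c for c in history
              if 0 <= current["time"] - c["time"] <= window]
    same_host = [c for c in recent if c["dst_host"] == current["dst_host"]]
    same_srv = [c for c in recent if c["service"] == current["service"]]

    def rate(conns, predicate):
        # Fraction of `conns` matching `predicate`; 0.0 for an empty group.
        return sum(1 for c in conns if predicate(c)) / len(conns) if conns else 0.0

    return {
        "count": len(same_host),
        "srv_count": len(same_srv),
        "serror_rate": rate(same_host, lambda c: c["flag"] in SYN_ERROR_FLAGS),
        "rerror_rate": rate(same_host, lambda c: c["flag"] == "REJ"),
        "same_srv_rate": rate(same_host, lambda c: c["service"] == current["service"]),
        "diff_srv_rate": rate(same_host, lambda c: c["service"] != current["service"]),
        "srv_diff_host_rate": rate(same_srv, lambda c: c["dst_host"] != current["dst_host"]),
    }


# Hypothetical traffic: the first entry is older than 2 seconds and is ignored.
history = [
    {"time": 0.0, "dst_host": "10.0.0.5", "service": "http", "flag": "SF"},
    {"time": 1.5, "dst_host": "10.0.0.5", "service": "http", "flag": "S0"},
    {"time": 2.0, "dst_host": "10.0.0.5", "service": "ftp", "flag": "REJ"},
    {"time": 2.5, "dst_host": "10.0.0.9", "service": "http", "flag": "SF"},
]
current = {"time": 3.0, "dst_host": "10.0.0.5", "service": "http", "flag": "SF"}
features = time_window_features(history, current)
```

A high `count` combined with a high `serror_rate` is the kind of pattern a SYN flood would produce, which is why these features are useful for spotting DDoS-style traffic.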

---

## 4. Host-Based Traffic Features

These features are computed over a window of recent connections (typically the last 100) rather than a fixed time window, which helps detect long-term or stealthy attacks.

- **Dst_host_count**:  
  Number of connections having the same destination host.

- **Dst_host_srv_count**:  
  Number of connections to the same destination host that also use the same service (port) as the current connection.

- **Dst_host_same_srv_rate**:  
  Percentage of connections to the same service among the `dst_host_count` connections.

- **Dst_host_diff_srv_rate**:  
  Percentage of connections to different services among the `dst_host_count` connections.

- **Dst_host_same_src_port_rate**:  
  Percentage of connections from the same source port among the `dst_host_srv_count` connections.

- **Dst_host_srv_diff_host_rate**:  
  Percentage of connections to different destination hosts among the `dst_host_srv_count` connections.

- **Dst_host_serror_rate**:  
  Percentage of SYN error connections among the `dst_host_count` connections.

- **Dst_host_srv_serror_rate**:  
  Percentage of SYN error connections among the `dst_host_srv_count` connections.

- **Dst_host_rerror_rate**:  
  Percentage of REJ error connections among the `dst_host_count` connections.

- **Dst_host_srv_rerror_rate**:  
  Percentage of REJ error connections among the `dst_host_srv_count` connections.
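
The connection-based window can be sketched with a fixed-size queue. The example below is illustrative only and makes simplifying assumptions: the window holds the most recent connections across all hosts, each connection is a dict with the hypothetical keys `dst_host`, `service`, and `flag`, and the `HostWindow` class is an invented helper. Only four of the features are computed; the others follow the same pattern.

```python
from collections import deque

SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}


class HostWindow:
    """Sliding window over the most recent connections (default 100)."""

    def __init__(self, size=100):
        # A bounded deque drops the oldest connection automatically.
        self.recent = deque(maxlen=size)

    def features(self, conn):
        """Return host-based features for `conn`, then add it to the window."""
        same_host = [c for c in self.recent
                     if c["dst_host"] == conn["dst_host"]]
        same_host_srv = [c for c in same_host
                         if c["service"] == conn["service"]]
        n = len(same_host)
        syn_errors = sum(c["flag"] in SYN_ERROR_FLAGS for c in same_host)
        feats = {
            "dst_host_count": n,
            "dst_host_srv_count": len(same_host_srv),
            "dst_host_same_srv_rate": len(same_host_srv) / n if n else 0.0,
            "dst_host_serror_rate": syn_errors / n if n else 0.0,
        }
        self.recent.append(conn)
        return feats


# Hypothetical traffic, using a window of 3 to keep the example small.
conns = [
    {"dst_host": "10.0.0.5", "service": "http", "flag": "SF"},
    {"dst_host": "10.0.0.5", "service": "http", "flag": "S0"},
    {"dst_host": "10.0.0.9", "service": "ftp", "flag": "SF"},
    {"dst_host": "10.0.0.5", "service": "ftp", "flag": "SF"},
    {"dst_host": "10.0.0.5", "service": "http", "flag": "SF"},
]
window = HostWindow(size=3)
results = [window.features(c) for c in conns]
```

Counting connections instead of seconds means a slow scan that sends one probe per minute still accumulates in the window, which a 2-second time window would miss.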
