In [1]:
import pandas as pd
import os

General Identifiers  
Flow_ID: A unique identifier for each network flow.  
Src_IP: Source IP address.  
Src_Port: Source port number.  
Dst_IP: Destination IP address.  
Dst_Port: Destination port number.  
Protocol: The protocol used (e.g., TCP, UDP).  
Timestamp: The time when the flow was recorded.  
Flow Duration and Packet Statistics  
Flow_Duration: The total duration of the flow (in microseconds).  
Tot_Fwd_Pkts: Total number of packets sent from source to destination.  
Tot_Bwd_Pkts: Total number of packets sent from destination to source.  
TotLen_Fwd_Pkts: Total length of packets in the forward direction.  
TotLen_Bwd_Pkts: Total length of packets in the backward direction.  
Packet Length Metrics  
Fwd_Pkt_Len_Max: Maximum packet length in the forward direction.  
Fwd_Pkt_Len_Min: Minimum packet length in the forward direction.  
Fwd_Pkt_Len_Mean: Mean packet length in the forward direction.  
Fwd_Pkt_Len_Std: Standard deviation of packet lengths in the forward direction.  
Bwd_Pkt_Len_Max: Maximum packet length in the backward direction.  
Bwd_Pkt_Len_Min: Minimum packet length in the backward direction.  
Bwd_Pkt_Len_Mean: Mean packet length in the backward direction.  
Bwd_Pkt_Len_Std: Standard deviation of packet lengths in the backward direction.  
Flow Rate Metrics  
Flow_Byts/s: Total bytes per second for the flow.  
Flow_Pkts/s: Total packets per second for the flow.  
Inter-arrival Time Metrics  
Flow_IAT_Mean: Mean inter-arrival time between packets in the flow.  
Flow_IAT_Std: Standard deviation of inter-arrival times.  
Flow_IAT_Max: Maximum inter-arrival time.  
Flow_IAT_Min: Minimum inter-arrival time.  
Flags and Header Information  
Fwd_PSH_Flags: Number of forward packets with PSH (push) flag set.  
Bwd_PSH_Flags: Number of backward packets with PSH flag set.  
Fwd_URG_Flags: Number of forward packets with URG (urgent) flag set.  
Bwd_URG_Flags: Number of backward packets with URG flag set.  
Fwd_Header_Len: Total length of headers in forward packets.  
Bwd_Header_Len: Total length of headers in backward packets.  
Packet Transmission Rates  
Fwd_Pkts/s: Forward packets per second.  
Bwd_Pkts/s: Backward packets per second.  
Packet Size Metrics  
Pkt_Len_Min: Minimum packet size in the flow.  
Pkt_Len_Max: Maximum packet size in the flow.  
Pkt_Len_Mean: Mean packet size in the flow.  
Pkt_Len_Std: Standard deviation of packet sizes.  
Pkt_Len_Var: Variance of packet sizes.  
Flag Counts  
FIN_Flag_Cnt: Count of FIN flags in the flow.   
SYN_Flag_Cnt: Count of SYN flags.  
RST_Flag_Cnt: Count of RST flags.  
PSH_Flag_Cnt: Count of PSH flags.  
ACK_Flag_Cnt: Count of ACK flags.  
URG_Flag_Cnt: Count of URG flags.  
CWE_Flag_Count: Count of CWR (Congestion Window Reduced) flags.  
ECE_Flag_Cnt: Count of ECE (ECN-Echo) flags.  
Ratios and Averages  
Down/Up_Ratio: Ratio of download to upload bytes.  
Pkt_Size_Avg: Average packet size.  
Fwd_Seg_Size_Avg: Average size of forward segments.  
Bwd_Seg_Size_Avg: Average size of backward segments.  
Window and Data Metrics  
Fwd_Byts/b_Avg: Average bytes per forward bulk transfer.  
Fwd_Pkts/b_Avg: Average packets per forward bulk transfer.  
Bwd_Byts/b_Avg: Average bytes per backward bulk transfer.  
Bwd_Pkts/b_Avg: Average packets per backward bulk transfer.  
Init_Fwd_Win_Byts: Initial window bytes in the forward direction.  
Init_Bwd_Win_Byts: Initial window bytes in the backward direction.  
Subflows  
Subflow_Fwd_Pkts: Number of packets in the forward subflow.  
Subflow_Fwd_Byts: Number of bytes in the forward subflow.  
Subflow_Bwd_Pkts: Number of packets in the backward subflow.  
Subflow_Bwd_Byts: Number of bytes in the backward subflow.  
Activity Metrics  
Active_Mean: Mean active time of the flow.  
Active_Std: Standard deviation of active times.  
Active_Max: Maximum active time.  
Active_Min: Minimum active time.  
Idle_Mean: Mean idle time of the flow.  
Idle_Std: Standard deviation of idle times.  
Idle_Max: Maximum idle time.  
Idle_Min: Minimum idle time.  
Labels and Categories  
Label: Binary label for the flow (Normal or Anomaly).  
Cat: Broad category of the flow (e.g., DoS, DDoS).  
Sub_Cat: Subcategory of the attack (providing finer details).  
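
As a rough illustration of how the rate features relate to the counts and duration listed above, the sketch below recomputes a packets-per-second value from Tot_Fwd_Pkts, Tot_Bwd_Pkts, and Flow_Duration. It assumes the rate is total packets divided by the duration in seconds; the helper name `packets_per_second` and the zero-duration rule are hypothetical, and the tool that extracted the dataset may treat edge cases differently.

```python
def packets_per_second(tot_fwd_pkts: int, tot_bwd_pkts: int, flow_duration_us: float) -> float:
    """Packets per second for one flow, with Flow_Duration given in microseconds."""
    if flow_duration_us <= 0:
        return 0.0  # assumption: report a rate of 0 for zero-length flows
    duration_s = flow_duration_us / 1_000_000  # microseconds -> seconds
    return (tot_fwd_pkts + tot_bwd_pkts) / duration_s


# Hypothetical flow: 2 forward packets, 1 backward packet, 250 microseconds long
rate = packets_per_second(2, 1, 250)
```

Comparing such a recomputed value against the stored Flow_Pkts/s column is one way to confirm the unit of Flow_Duration.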

In [2]:
# Goal: treat each source IP (Src_IP) as a separate data producer

In [3]:
df = pd.read_csv('IoT Network Intrusion Dataset.csv')
# Convert to datetime
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %I:%M:%S %p')

df = df.sort_values('Timestamp')
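
The format string above is day-first with a 12-hour clock and an AM/PM marker. A minimal sketch of what that means for parsing and sorting follows; the three raw strings are made-up examples in the assumed raw format, not rows from the dataset.

```python
import pandas as pd

# Hypothetical raw values in day/month/year, 12-hour clock format
raw = pd.Series([
    "10/09/2019 01:54:23 AM",  # 10 September, not 9 October
    "20/05/2019 04:56:14 PM",
    "20/05/2019 04:56:14 AM",
])

parsed = pd.to_datetime(raw, format="%d/%m/%Y %I:%M:%S %p")

# Sorting the parsed values is chronological; sorting the raw strings would not be
ordered = parsed.sort_values()
```

Passing an explicit format avoids pandas guessing month-first for ambiguous dates such as 10/09/2019.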

In [4]:
print(f"There are {len(df)} rows, with {df['Src_IP'].nunique()} different source IPs and {df['Dst_IP'].nunique()} different destination IPs")

There are 625783 rows, with 57985 different source IPs and 478 different destination IPs


In [5]:
# We need to handle the large number of distinct source IPs, so let's check the record count for each one
df['Src_IP'].value_counts(normalize=False).to_frame().head(5)

Src_IP,count
192.168.0.13,222096
192.168.0.16,125890
192.168.0.24,122846
104.118.134.215,46092
104.74.213.186,23308


In [6]:
top3 = df['Src_IP'].value_counts(normalize=True).head(3)
# value_counts(normalize=True) returns fractions, so format the sum as a percentage
print(f"{top3.sum():.2%} of the data comes from these 3 source IPs:")
for ip in top3.index:
    print(ip)
print("so we will analyse these 3.")
# About 75% of our data comes from these 3 source IPs

75.24% of the data comes from these 3 source IPs:
192.168.0.13
192.168.0.16
192.168.0.24
so we will analyse these 3.
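
The share computed in this cell is a fraction between 0 and 1, so it has to be formatted as a percentage before printing. A small sketch of that calculation on toy data follows; the helper name `top_share` is hypothetical.

```python
import pandas as pd


def top_share(values: pd.Series, k: int) -> float:
    """Fraction of rows (between 0 and 1) covered by the k most frequent values."""
    return float(values.value_counts(normalize=True).head(k).sum())


# Toy data: "a" appears 5 times, "b" 3 times, "c" 2 times
toy = pd.Series(["a"] * 5 + ["b"] * 3 + ["c"] * 2)
share = top_share(toy, 2)

# The "%" format spec multiplies by 100 and appends the percent sign
print(f"{share:.0%} of the rows come from the top 2 values")
```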


In [7]:
ip_df = df[df['Src_IP'].isin(['192.168.0.13','192.168.0.16','192.168.0.24'])]
print("Filtered data len is", len(ip_df))

Filtered data len is 470832


In [8]:
# We need continuous data, so we need to check the timestamps
print("The min datetime is:" ,ip_df['Timestamp'].min())
print("The max datetime is:" ,ip_df['Timestamp'].max())

The min datetime is: 2019-05-20 04:56:14
The max datetime is: 2019-09-10 01:54:23


In [9]:
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning on the filtered slice
ip_df = ip_df.assign(Date=ip_df['Timestamp'].dt.date)
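
Adding a column to a frame that was produced by boolean filtering is the pattern that can trigger `SettingWithCopyWarning` in pandas versions that emit it. One common way to make the intent explicit is to take a copy at filtering time, sketched below on a toy frame. The address 10.0.0.1 is a made-up placeholder, not a value from the dataset.

```python
import pandas as pd

frame = pd.DataFrame({
    "Src_IP": ["192.168.0.13", "10.0.0.1", "192.168.0.16"],
    "Timestamp": pd.to_datetime([
        "2019-05-20 04:56:14",
        "2019-05-20 05:00:00",
        "2019-09-10 01:54:23",
    ]),
})

# .copy() makes the filtered result an independent DataFrame,
# so adding a column to it cannot be confused with editing `frame`
subset = frame[frame["Src_IP"].isin(["192.168.0.13", "192.168.0.16"])].copy()
subset["Date"] = subset["Timestamp"].dt.date
```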


In [10]:
ip_df['Date'].value_counts().sort_index()

Date
2019-05-20     37775
2019-05-26       493
2019-05-31     13071
2019-06-03     15540
2019-06-04        77
2019-07-11     70980
2019-07-25    244648
2019-08-20     14116
2019-09-04     18583
2019-09-10     55549
Name: count, dtype: int64
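
The capture days listed above are not consecutive. A short sketch for measuring the gaps between them, using the ten dates from the output, could look like this:

```python
import pandas as pd

# Capture days copied from the value_counts output above
capture_days = pd.to_datetime([
    "2019-05-20", "2019-05-26", "2019-05-31", "2019-06-03", "2019-06-04",
    "2019-07-11", "2019-07-25", "2019-08-20", "2019-09-04", "2019-09-10",
])

# Difference between consecutive capture days, in whole days
gaps = pd.Series(capture_days).diff().dt.days.dropna().astype(int)
longest_gap = int(gaps.max())
```

Since the recordings are spread over separate days, working on a single day at a time is one way to get a continuous stretch of data.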

In [11]:
ip_df

Unnamed: 0,Flow_ID,Src_IP,Src_Port,Dst_IP,Dst_Port,Protocol,Timestamp,Flow_Duration,Tot_Fwd_Pkts,Tot_Bwd_Pkts,...,Active_Max,Active_Min,Idle_Mean,Idle_Std,Idle_Max,Idle_Min,Label,Cat,Sub_Cat,Date
15553,192.168.0.13-192.168.0.16-9020-49784-6,192.168.0.13,9020,192.168.0.16,49784,6,2019-05-20 04:56:14,263,2,1,...,0.0,0.0,131.5,12.020815,140.0,123.0,Normal,Normal,Normal,2019-05-20
324057,192.168.0.13-192.168.0.16-9020-49784-6,192.168.0.13,9020,192.168.0.16,49784,6,2019-05-20 04:56:14,356,0,2,...,0.0,0.0,356.0,0.000000,356.0,356.0,Normal,Normal,Normal,2019-05-20
115855,192.168.0.13-192.168.0.16-9020-49784-6,192.168.0.13,9020,192.168.0.16,49784,6,2019-05-20 04:56:14,396,0,2,...,0.0,0.0,396.0,0.000000,396.0,396.0,Normal,Normal,Normal,2019-05-20
23591,192.168.0.13-192.168.0.16-9020-49784-6,192.168.0.13,9020,192.168.0.16,49784,6,2019-05-20 04:56:14,156,0,3,...,0.0,0.0,78.0,2.828427,80.0,76.0,Normal,Normal,Normal,2019-05-20
306958,192.168.0.13-192.168.0.16-9020-49784-6,192.168.0.13,9020,192.168.0.16,49784,6,2019-05-20 04:56:14,321,3,1,...,0.0,0.0,107.0,27.495454,131.0,77.0,Normal,Normal,Normal,2019-05-20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143943,192.168.0.24-211.237.6.104-58485-443-6,192.168.0.24,58485,211.237.6.104,443,6,2019-09-10 01:54:23,83,0,2,...,0.0,0.0,83.0,0.000000,83.0,83.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
188832,192.168.0.24-211.237.6.104-58485-443-6,192.168.0.24,58485,211.237.6.104,443,6,2019-09-10 01:54:23,162,2,1,...,0.0,0.0,81.0,9.899495,88.0,74.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
507986,192.168.0.24-211.237.6.104-58485-443-6,192.168.0.24,58485,211.237.6.104,443,6,2019-09-10 01:54:23,120,0,2,...,0.0,0.0,120.0,0.000000,120.0,120.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
424493,192.168.0.24-211.237.6.104-58485-443-6,192.168.0.24,58485,211.237.6.104,443,6,2019-09-10 01:54:23,164,0,2,...,0.0,0.0,164.0,0.000000,164.0,164.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10


In [12]:
ip_df[['Src_IP','Dst_IP']].value_counts()

Src_IP        Dst_IP         
192.168.0.13  192.168.0.16       141356
192.168.0.16  192.168.0.13       115487
192.168.0.13  210.89.164.90       65586
192.168.0.24  210.89.164.90       65584
              222.239.240.107     11240
                                  ...  
192.168.0.13  222.16.175.58           1
              222.100.196.201         1
              20.184.14.47            1
              111.150.145.146         1
              111.143.114.146         1
Name: count, Length: 303, dtype: int64

In [13]:
ip_df.groupby(['Date','Src_IP'])['Flow_ID'].count()

Date        Src_IP      
2019-05-20  192.168.0.13     31483
            192.168.0.16      4342
            192.168.0.24      1950
2019-05-26  192.168.0.13       493
2019-05-31  192.168.0.13        47
            192.168.0.16     13024
2019-06-03  192.168.0.13       105
            192.168.0.24     15435
2019-06-04  192.168.0.13        22
            192.168.0.24        55
2019-07-11  192.168.0.13     13467
            192.168.0.16     54630
            192.168.0.24      2883
2019-07-25  192.168.0.13    155706
            192.168.0.16      5934
            192.168.0.24     83008
2019-08-20  192.168.0.13      7948
            192.168.0.16      4786
            192.168.0.24      1382
2019-09-04  192.168.0.13      1655
            192.168.0.16     13354
            192.168.0.24      3574
2019-09-10  192.168.0.13     11170
            192.168.0.16     29820
            192.168.0.24     14559
Name: Flow_ID, dtype: int64

In [14]:
import datetime
sample_df = ip_df[ip_df['Date'] == datetime.date(2019,9,10)]
sample_df.head(5)

Unnamed: 0,Flow_ID,Src_IP,Src_Port,Dst_IP,Dst_Port,Protocol,Timestamp,Flow_Duration,Tot_Fwd_Pkts,Tot_Bwd_Pkts,...,Active_Max,Active_Min,Idle_Mean,Idle_Std,Idle_Max,Idle_Min,Label,Cat,Sub_Cat,Date
569452,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,75,1,1,...,0.0,0.0,75.0,0.0,75.0,75.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
419826,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,121,1,1,...,0.0,0.0,121.0,0.0,121.0,121.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
29145,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,72,0,2,...,0.0,0.0,72.0,0.0,72.0,72.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
621803,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,145,0,3,...,0.0,0.0,72.5,0.707107,73.0,72.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
133854,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,153,0,3,...,0.0,0.0,76.5,4.949747,80.0,73.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10


In [25]:
# import plotly.express as px

# for ip in sample_df['Src_IP'].unique():
#     plot_df = sample_df[sample_df['Src_IP'] == ip]
#     fig = px.box(plot_df, y="Flow_Duration",points="all")
#     fig.show()

In [26]:
# for ip in sample_df['Src_IP'].unique():
#     plot_df = sample_df[sample_df['Src_IP'] == ip]
#     fig = px.box(plot_df, y="Fwd_Pkts/s",points="all", title=ip)
#     fig.show()

In [27]:
# for ip in sample_df['Src_IP'].unique():
#     plot_df = sample_df[sample_df['Dst_IP'] == ip]
#     fig = px.box(plot_df, y="Fwd_Pkts/s",points="all", title=ip)
#     fig.show()

In [18]:
sample_df['Src_IP'].value_counts()

Src_IP
192.168.0.16    29820
192.168.0.24    14559
192.168.0.13    11170
Name: count, dtype: int64

In [19]:
sample_df[sample_df['Timestamp'] == datetime.datetime(2019,9,10,1,38,11)]

Unnamed: 0,Flow_ID,Src_IP,Src_Port,Dst_IP,Dst_Port,Protocol,Timestamp,Flow_Duration,Tot_Fwd_Pkts,Tot_Bwd_Pkts,...,Active_Max,Active_Min,Idle_Mean,Idle_Std,Idle_Max,Idle_Min,Label,Cat,Sub_Cat,Date
569452,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,75,1,1,...,0.0,0.0,75.0,0.000000,75.0,75.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
419826,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,121,1,1,...,0.0,0.0,121.0,0.000000,121.0,121.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
29145,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,72,0,2,...,0.0,0.0,72.0,0.000000,72.0,72.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
621803,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,145,0,3,...,0.0,0.0,72.5,0.707107,73.0,72.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
133854,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,153,0,3,...,0.0,0.0,76.5,4.949747,80.0,73.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
225372,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,113,1,1,...,0.0,0.0,113.0,0.000000,113.0,113.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
131109,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,165,1,1,...,0.0,0.0,165.0,0.000000,165.0,165.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
121190,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,155,0,3,...,0.0,0.0,77.5,7.778175,83.0,72.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
559805,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,150,1,1,...,0.0,0.0,150.0,0.000000,150.0,150.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10


In [20]:
filtered_df = sample_df.groupby("Src_IP").head(10800)
filtered_df

Unnamed: 0,Flow_ID,Src_IP,Src_Port,Dst_IP,Dst_Port,Protocol,Timestamp,Flow_Duration,Tot_Fwd_Pkts,Tot_Bwd_Pkts,...,Active_Max,Active_Min,Idle_Mean,Idle_Std,Idle_Max,Idle_Min,Label,Cat,Sub_Cat,Date
569452,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,75,1,1,...,0.0,0.0,75.0,0.000000,75.0,75.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
419826,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,121,1,1,...,0.0,0.0,121.0,0.000000,121.0,121.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
29145,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,72,0,2,...,0.0,0.0,72.0,0.000000,72.0,72.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
621803,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,145,0,3,...,0.0,0.0,72.5,0.707107,73.0,72.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
133854,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:11,153,0,3,...,0.0,0.0,76.5,4.949747,80.0,73.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
563352,192.168.0.24-211.237.6.41-52064-443-6,192.168.0.24,52064,211.237.6.41,443,6,2019-09-10 01:50:34,115,0,2,...,0.0,0.0,115.0,0.000000,115.0,115.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
453766,192.168.0.24-211.237.6.41-52064-443-6,192.168.0.24,52064,211.237.6.41,443,6,2019-09-10 01:50:34,74,0,2,...,0.0,0.0,74.0,0.000000,74.0,74.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
431205,192.168.0.24-211.237.6.41-52064-443-6,192.168.0.24,52064,211.237.6.41,443,6,2019-09-10 01:50:34,122,0,2,...,0.0,0.0,122.0,0.000000,122.0,122.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10
310477,192.168.0.24-211.237.6.41-52064-443-6,192.168.0.24,52064,211.237.6.41,443,6,2019-09-10 01:50:34,71,0,2,...,0.0,0.0,71.0,0.000000,71.0,71.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10


In [21]:
import pandas as pd

# Generate 10800 one-second timestamps (3 hours) from an arbitrary start date
start_time = "2024-11-23 00:00:00"
timestamps = pd.date_range(start=start_time, periods=10800, freq="s")  # "S" is deprecated in pandas 2.2+

# Create a mapping for timestamps for each ID
ip_timestamp_map = {ip: timestamps for ip in filtered_df["Src_IP"].unique()}

In [22]:
filtered_df.groupby("Src_IP").cumcount()

569452        0
419826        1
29145         2
621803        3
133854        4
          ...  
563352    10795
453766    10796
431205    10797
310477    10798
408904    10799
Length: 32400, dtype: int64
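
To make the behaviour of `groupby(...).head(n)` and `groupby(...).cumcount()` easier to see, here is a minimal sketch on a toy frame (the values are made up): `head` keeps the first n rows of each group in their original order, and `cumcount` numbers the rows within each group starting at 0.

```python
import pandas as pd

toy = pd.DataFrame({
    "Src_IP": ["a", "b", "a", "a", "b", "a"],
    "Flow_Duration": [10, 20, 30, 40, 50, 60],
})

# Keep at most 2 rows per source; the original row order is preserved
capped = toy.groupby("Src_IP").head(2)

# Position of each kept row within its own source: 0, 1, ...
position = capped.groupby("Src_IP").cumcount()
```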

In [23]:
# Generate the timestamp range for one Src_IP
start_time = "2024-11-23 00:00:00"
timestamps = pd.date_range(start=start_time, periods=10800, freq="s")

# Assign timestamps independently for each Src_IP: the n-th row of every
# source IP gets the n-th timestamp. assign() returns a new DataFrame,
# which avoids SettingWithCopyWarning on the slice produced by head().
filtered_df = filtered_df.assign(
    Custom_Timestamp=filtered_df.groupby("Src_IP").cumcount().map(lambda x: timestamps[x])
)


In [29]:
filtered_df[filtered_df['Custom_Timestamp'] == datetime.datetime(2024,11,23,1,0,0)]

Unnamed: 0,Flow_ID,Src_IP,Src_Port,Dst_IP,Dst_Port,Protocol,Timestamp,Flow_Duration,Tot_Fwd_Pkts,Tot_Bwd_Pkts,...,Active_Min,Idle_Mean,Idle_Std,Idle_Max,Idle_Min,Label,Cat,Sub_Cat,Date,Custom_Timestamp
574763,192.168.0.13-192.168.0.16-9020-56196-6,192.168.0.13,9020,192.168.0.16,56196,6,2019-09-10 01:38:42,75,0,2,...,0.0,75.0,0.0,75.0,75.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10,2024-11-23 01:00:00
346574,192.168.0.13-192.168.0.16-9020-10109-6,192.168.0.16,10109,192.168.0.13,9020,6,2019-09-10 01:41:07,120,0,2,...,0.0,120.0,0.0,120.0,120.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10,2024-11-23 01:00:00
517668,192.168.0.24-101.79.244.148-48723-443-6,192.168.0.24,48723,101.79.244.148,443,6,2019-09-10 01:43:41,114,0,2,...,0.0,114.0,0.0,114.0,114.0,Anomaly,Mirai,Mirai-Hostbruteforceg,2019-09-10,2024-11-23 01:00:00
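
Because every source IP is mapped onto the same synthetic clock, a given timestamp appears once per source, as in the three rows above. The sketch below shows an equivalent vectorized assignment on a toy frame, plus two sanity checks. It is an alternative formulation under the same one-row-per-second assumption, not the exact code used in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({"Src_IP": ["a", "b", "a", "b", "a"]})
start = pd.Timestamp("2024-11-23 00:00:00")

# n-th row of each source -> start + n seconds
offset = toy.groupby("Src_IP").cumcount()
toy["Custom_Timestamp"] = start + pd.to_timedelta(offset, unit="s")

# Check 1: no source has two rows with the same timestamp
no_duplicates = not toy.duplicated(["Src_IP", "Custom_Timestamp"]).any()

# Check 2: how many distinct sources share each timestamp
sources_per_tick = toy.groupby("Custom_Timestamp")["Src_IP"].nunique()
```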


In [31]:
# Replace the original capture time with the synthetic one.
# Reassigning instead of using inplace=True avoids SettingWithCopyWarning.
filtered_df = filtered_df.drop(columns=['Date','Timestamp']).rename(columns={"Custom_Timestamp": "Timestamp"})



In [33]:
os.makedirs("processed_data", exist_ok=True)  # make sure the output directory exists
filtered_df.to_csv("processed_data/IoT Network Intrusion Dataset.csv",index=False)

In [36]:
filtered_df['Label'].value_counts()

Label
Anomaly    32400
Name: count, dtype: int64