# Cybersecurity Network Traffic Data Analysis with Pandas

#### DESCRIPTION: This project uses pandas to analyze threat levels in a network traffic dataset in order to identify cybersecurity threats and attacks on a company's network.

### By: Benedine Okeke
##### *benedinenokeke@gmail.com*
##### 8th April, 2025.

#### OBJECTIVE: As a Data Analytics student, I aim to use this project to deepen my understanding of pandas and my skill in applying it to data analysis, so that I can extract valuable insights from complex datasets. Working with this real-world data will help me develop proficiency in data manipulation and filtering, skills that will be essential for future data-driven work.

### -------------------------------------------------------------------------------------------------------------

### STEP 1: Importing the Pandas Library

In [1]:
# Importing pandas as pd
import pandas as pd

### STEP 2: Loading the Dataset from GitHub for Analysis

In [2]:
# Loading the dataset from GitHub and storing it in a variable.
df = pd.read_csv('https://raw.githubusercontent.com/ritaafrica/data/main/network_traffic_data.csv')
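`df.info()` in Step 3 lists `Timestamp` as `object` (plain text). A minimal sketch, assuming the CSV layout shown in Step 3, of asking `read_csv` to parse that column as datetimes while loading. So that it runs offline, the inline sample below reuses the first three rows displayed in Step 3 in place of the GitHub URL.

```python
import io

import pandas as pd

# Inline sample copied from the first rows shown in Step 3 (stands in for the GitHub CSV).
csv_text = """Timestamp,Source_IP,Destination_IP,Protocol,Port,Bytes_Sent,Bytes_Received,Status,Threat_Level
2025-03-19 13:04:10,10.0.0.15,192.168.1.20,TCP,,5411,8989,Blocked,Low
2025-03-19 13:03:40,192.168.1.13,172.217.169.46,ICMP,443.0,4999,11808,Allowed,Medium
2025-03-19 13:03:10,10.0.0.5,203.0.113.99,HTTP,443.0,6360,10852,Allowed,Medium
"""

# parse_dates converts the listed columns to a datetime dtype while reading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Timestamp"])

print(sample.dtypes["Timestamp"])
# Datetime columns support min/max, resampling, and time-based filtering.
print(sample["Timestamp"].min(), sample["Timestamp"].max())
```

With the real data, the same `parse_dates` keyword would go on the `pd.read_csv(...)` call above.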

### STEP 3: Basic Data Exploration

In [3]:
# Viewing the first 5 Rows of the Dataset
df.head()

|   | Timestamp | Source_IP | Destination_IP | Protocol | Port | Bytes_Sent | Bytes_Received | Status | Threat_Level |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-03-19 13:04:10 | 10.0.0.15 | 192.168.1.20 | TCP | NaN | 5411 | 8989 | Blocked | Low |
| 1 | 2025-03-19 13:03:40 | 192.168.1.13 | 172.217.169.46 | ICMP | 443.0 | 4999 | 11808 | Allowed | Medium |
| 2 | 2025-03-19 13:03:10 | 10.0.0.5 | 203.0.113.99 | HTTP | 443.0 | 6360 | 10852 | Allowed | Medium |
| 3 | 2025-03-19 13:02:40 | 10.0.0.9 | 192.168.1.20 | TCP | NaN | 4011 | 14314 | Blocked | Low |
| 4 | 2025-03-19 13:02:10 | 192.168.1.4 | 172.217.169.46 | FTP | NaN | 5254 | 8718 | Blocked | Medium |


In [4]:
# Displaying the Total Number of Rows and Columns of the Dataset.
print(f'Rows, Columns: \n{df.shape}')

Rows, Columns: 
(1000, 9)


In [5]:
# Displaying all Column Names in the Dataset
print(f'Column Names: \n{df.columns}')

Column Names: 
Index(['Timestamp', 'Source_IP', 'Destination_IP', 'Protocol', 'Port',
       'Bytes_Sent', 'Bytes_Received', 'Status', 'Threat_Level'],
      dtype='object')


In [6]:
# Displaying the Column data types, non-null counts and memory usage.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Timestamp       1000 non-null   object 
 1   Source_IP       1000 non-null   object 
 2   Destination_IP  1000 non-null   object 
 3   Protocol        1000 non-null   object 
 4   Port            874 non-null    float64
 5   Bytes_Sent      1000 non-null   int64  
 6   Bytes_Received  1000 non-null   int64  
 7   Status          1000 non-null   object 
 8   Threat_Level    1000 non-null   object 
dtypes: float64(1), int64(2), object(6)
memory usage: 70.4+ KB
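`df.info()` reports 874 non-null `Port` values out of 1,000 entries, so 126 rows have no port recorded. A short sketch of counting missing values per column directly, on a small frame whose values are copied from the first five rows shown earlier in this step:

```python
import numpy as np
import pandas as pd

# Port and Threat_Level values copied from the first five rows displayed by df.head().
sample = pd.DataFrame({
    "Port": [np.nan, 443.0, 443.0, np.nan, np.nan],
    "Threat_Level": ["Low", "Medium", "Medium", "Low", "Medium"],
})

# isna() flags missing cells; sum() counts the flags in each column.
missing_counts = sample.isna().sum()
print(missing_counts)

# The mean of the boolean flags is the fraction of rows with no Port value.
print(sample["Port"].isna().mean())
```

On the full dataset, `df['Port'].isna().sum()` should return 126, matching the `info()` output.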


In [7]:
# Displaying the Statistical Summary of the numeric Columns in the Dataset.
df.describe()

|   | Port | Bytes_Sent | Bytes_Received |
|---|---|---|---|
| count | 874.0 | 1000.0 | 1000.0 |
| mean | 1819.73913 | 5143.572 | 7562.659 |
| std | 2899.374632 | 2808.256143 | 4240.206295 |
| min | 21.0 | 106.0 | 102.0 |
| 25% | 22.0 | 2857.0 | 4025.5 |
| 50% | 80.0 | 5224.0 | 7584.5 |
| 75% | 3389.0 | 7487.75 | 11147.75 |
| max | 8080.0 | 9984.0 | 14977.0 |
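By default `describe()` summarizes only the numeric columns (`Port`, `Bytes_Sent`, `Bytes_Received`). Excluding numeric dtypes summarizes the text columns instead, reporting the count, number of unique values, most frequent value (`top`) and its frequency (`freq`). A sketch on values copied from the first five rows of the dataset:

```python
import pandas as pd

sample = pd.DataFrame({
    "Protocol": ["TCP", "ICMP", "HTTP", "TCP", "FTP"],
    "Bytes_Sent": [5411, 4999, 6360, 4011, 5254],
    "Status": ["Blocked", "Allowed", "Allowed", "Blocked", "Blocked"],
    "Threat_Level": ["Low", "Medium", "Medium", "Low", "Medium"],
})

# exclude=["number"] drops numeric columns, leaving only the text columns to summarize.
text_summary = sample.describe(exclude=["number"])
print(text_summary)
```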


### STEP 4: Selecting Columns for Display

In [8]:
# Selecting these Columns for Display.
selected_columns = df[["Source_IP", "Destination_IP", "Status", "Threat_Level"]]

# Displaying the first 10 Rows of the Selected Columns.
selected_columns.head(10)

|   | Source_IP | Destination_IP | Status | Threat_Level |
|---|---|---|---|---|
| 0 | 10.0.0.15 | 192.168.1.20 | Blocked | Low |
| 1 | 192.168.1.13 | 172.217.169.46 | Allowed | Medium |
| 2 | 10.0.0.5 | 203.0.113.99 | Allowed | Medium |
| 3 | 10.0.0.9 | 192.168.1.20 | Blocked | Low |
| 4 | 192.168.1.4 | 172.217.169.46 | Blocked | Medium |
| 5 | 10.0.0.43 | 172.217.169.46 | Allowed | Low |
| 6 | 10.0.0.26 | 10.0.0.5 | Allowed | High |
| 7 | 192.168.1.36 | 192.168.1.20 | Allowed | Medium |
| 8 | 192.168.1.26 | 192.168.1.20 | Allowed | Medium |
| 9 | 10.0.0.43 | 10.0.0.5 | Blocked | Low |


### STEP 5: Revealing the Threat Level in the Dataset

In [9]:
# Selecting the Columns needed to view the Threat Level of each connection on the Company's Network.
threat_level = df[["Source_IP", "Destination_IP", "Threat_Level"]]

# Displaying the first 10 Rows of the Threat Level.
threat_level.head(10)

|   | Source_IP | Destination_IP | Threat_Level |
|---|---|---|---|
| 0 | 10.0.0.15 | 192.168.1.20 | Low |
| 1 | 192.168.1.13 | 172.217.169.46 | Medium |
| 2 | 10.0.0.5 | 203.0.113.99 | Medium |
| 3 | 10.0.0.9 | 192.168.1.20 | Low |
| 4 | 192.168.1.4 | 172.217.169.46 | Medium |
| 5 | 10.0.0.43 | 172.217.169.46 | Low |
| 6 | 10.0.0.26 | 10.0.0.5 | High |
| 7 | 192.168.1.36 | 192.168.1.20 | Medium |
| 8 | 192.168.1.26 | 192.168.1.20 | Medium |
| 9 | 10.0.0.43 | 10.0.0.5 | Low |
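This step lists individual rows; the overall distribution of threat levels can be tabulated with `value_counts()`. A sketch, assuming the four levels seen in this notebook (Low, Medium, High, Critical) rank in that order, converts the column to an ordered categorical so the counts print in severity order. It uses the Threat_Level values of the ten rows shown above.

```python
import pandas as pd

# Threat_Level values from the ten rows displayed in Step 5.
levels = pd.Series(["Low", "Medium", "Medium", "Low", "Medium",
                    "Low", "High", "Medium", "Medium", "Low"], name="Threat_Level")

# Assumed severity order; a level absent from the data still appears with a count of 0.
severity = pd.CategoricalDtype(["Low", "Medium", "High", "Critical"], ordered=True)
ranked = levels.astype(severity)

# sort=False keeps the categories in their defined (severity) order.
counts = ranked.value_counts(sort=False)
print(counts)

# Ordered categoricals support comparisons, e.g. counting "High or worse".
print((ranked >= "High").sum())
```

On the full `df`, the same conversion applied to `df['Threat_Level']` would give the distribution across all 1,000 rows.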


### STEP 6: Filtering Only Blocked Traffic from the Dataset

In [10]:
# Filtering Blocked Traffic and Ignoring Case.
blocked_traffic = df[df['Status'].str.lower() == 'blocked']

# Creating a Summary of Blocked Traffic.
blocked_summary = blocked_traffic[['Timestamp', 'Source_IP', 'Destination_IP', 'Threat_Level', 'Status']]

# Displaying the first 5 Rows of the Blocked Traffic.
blocked_summary.head(5)

|   | Timestamp | Source_IP | Destination_IP | Threat_Level | Status |
|---|---|---|---|---|---|
| 0 | 2025-03-19 13:04:10 | 10.0.0.15 | 192.168.1.20 | Low | Blocked |
| 3 | 2025-03-19 13:02:40 | 10.0.0.9 | 192.168.1.20 | Low | Blocked |
| 4 | 2025-03-19 13:02:10 | 192.168.1.4 | 172.217.169.46 | Medium | Blocked |
| 9 | 2025-03-19 12:59:40 | 10.0.0.43 | 10.0.0.5 | Low | Blocked |
| 10 | 2025-03-19 12:59:10 | 10.0.0.33 | 203.0.113.99 | Medium | Blocked |


In [12]:
print(f'Total Number of Blocked Connections: {len(blocked_summary)}')

Total Number of Blocked Connections: 532
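The 532 blocked rows are 53.2% of the 1,000 entries. That share, and a breakdown of `Status` by `Threat_Level`, can be computed directly. The sketch below uses values copied from the first five rows of the dataset, and also shows the `.loc[mask, columns]` form, which combines the row filter and column selection of this step in one call.

```python
import pandas as pd

sample = pd.DataFrame({
    "Source_IP": ["10.0.0.15", "192.168.1.13", "10.0.0.5", "10.0.0.9", "192.168.1.4"],
    "Status": ["Blocked", "Allowed", "Allowed", "Blocked", "Blocked"],
    "Threat_Level": ["Low", "Medium", "Medium", "Low", "Medium"],
})

# Case-insensitive mask, as in Step 6.
is_blocked = sample["Status"].str.lower() == "blocked"

# The mean of a boolean mask is the fraction of True values.
blocked_share = is_blocked.mean()
print(f"Blocked share: {blocked_share:.1%}")

# Filter rows and select columns in a single .loc call.
blocked_rows = sample.loc[is_blocked, ["Source_IP", "Threat_Level"]]
print(blocked_rows)

# Count of each Status / Threat_Level combination.
breakdown = pd.crosstab(sample["Status"], sample["Threat_Level"])
print(breakdown)
```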


### STEP 7: Filtering Suspicious Traffic from the Dataset

In [13]:
# Treating rows with a Critical Threat Level as Suspicious Traffic (matching is case-insensitive).
suspicious_traffic = df[df['Threat_Level'].str.lower() == 'critical']

# Displaying the first 10 Rows with Suspicious Traffic.
suspicious_traffic.head(10)

|   | Timestamp | Source_IP | Destination_IP | Protocol | Port | Bytes_Sent | Bytes_Received | Status | Threat_Level |
|---|---|---|---|---|---|---|---|---|---|
| 59 | 2025-03-19 12:34:40 | 10.0.0.47 | 192.168.1.20 | ICMP | NaN | 5885 | 463 | Allowed | Critical |
| 96 | 2025-03-19 12:16:10 | 192.168.1.35 | 203.0.113.99 | FTP | 8080.0 | 9371 | 7189 | Allowed | Critical |
| 134 | 2025-03-19 11:57:10 | 192.168.1.17 | 172.217.169.46 | DNS | 22.0 | 6714 | 13124 | Blocked | Critical |
| 150 | 2025-03-19 11:49:10 | 192.168.1.42 | 10.0.0.5 | HTTP | 53.0 | 2702 | 634 | Allowed | Critical |
| 209 | 2025-03-19 11:19:40 | 10.0.0.17 | 203.0.113.99 | TCP | 3389.0 | 5085 | 10014 | Blocked | Critical |
| 212 | 2025-03-19 11:18:10 | 192.168.1.23 | 8.8.8.8 | FTP | 21.0 | 7190 | 10232 | Blocked | Critical |
| 219 | 2025-03-19 11:14:40 | 10.0.0.30 | 192.168.1.20 | TCP | 22.0 | 2702 | 4498 | Allowed | Critical |
| 232 | 2025-03-19 11:08:10 | 10.0.0.3 | 172.217.169.46 | DNS | NaN | 2606 | 11416 | Blocked | Critical |
| 251 | 2025-03-19 10:58:40 | 192.168.1.48 | 203.0.113.99 | DNS | 53.0 | 7644 | 5920 | Allowed | Critical |
| 256 | 2025-03-19 10:56:10 | 10.0.0.31 | 203.0.113.99 | ICMP | 22.0 | 9167 | 8793 | Allowed | Critical |


In [14]:
print(f'Total Number of Suspicious (Critical) Connections: {len(suspicious_traffic)}')

Total Number of Suspicious (Critical) Connections: 47
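Six of the ten Critical rows displayed in this step have `Status` Allowed, so they were not stopped. Conditions can be combined with `&` (each wrapped in parentheses, because `&` binds more tightly than `==`) to isolate such rows. A sketch on the ten rows shown above, keeping their original index labels:

```python
import pandas as pd

# Source_IP, Status and Threat_Level of the ten Critical rows displayed in Step 7.
critical = pd.DataFrame({
    "Source_IP": ["10.0.0.47", "192.168.1.35", "192.168.1.17", "192.168.1.42", "10.0.0.17",
                  "192.168.1.23", "10.0.0.30", "10.0.0.3", "192.168.1.48", "10.0.0.31"],
    "Status": ["Allowed", "Allowed", "Blocked", "Allowed", "Blocked",
               "Blocked", "Allowed", "Blocked", "Allowed", "Allowed"],
    "Threat_Level": ["Critical"] * 10,
}, index=[59, 96, 134, 150, 209, 212, 219, 232, 251, 256])

# Critical threat level AND allowed through: each condition in its own parentheses.
critical_allowed = critical[
    (critical["Threat_Level"].str.lower() == "critical")
    & (critical["Status"].str.lower() == "allowed")
]
print(critical_allowed.index.tolist())
```

Applied to the full `df`, the same expression would list every Critical connection that was allowed through.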


### STEP 8: Filtering Traffic with High Data Transfer

In [15]:
# Filtering Traffic where Bytes Sent is greater than 5000.
high_data_transfer = df[df['Bytes_Sent'] > 5000]

# Displaying the first 5 Rows of Traffic with High Data Transfer Greater than 5000 Bytes. 
high_data_transfer.head()

|   | Timestamp | Source_IP | Destination_IP | Protocol | Port | Bytes_Sent | Bytes_Received | Status | Threat_Level |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-03-19 13:04:10 | 10.0.0.15 | 192.168.1.20 | TCP | NaN | 5411 | 8989 | Blocked | Low |
| 2 | 2025-03-19 13:03:10 | 10.0.0.5 | 203.0.113.99 | HTTP | 443.0 | 6360 | 10852 | Allowed | Medium |
| 4 | 2025-03-19 13:02:10 | 192.168.1.4 | 172.217.169.46 | FTP | NaN | 5254 | 8718 | Blocked | Medium |
| 5 | 2025-03-19 13:01:40 | 10.0.0.43 | 172.217.169.46 | DNS | 53.0 | 6915 | 12981 | Allowed | Low |
| 7 | 2025-03-19 13:00:40 | 192.168.1.36 | 192.168.1.20 | TCP | 21.0 | 5655 | 119 | Allowed | Medium |


In [16]:
# Displaying the total number of connections that sent more than 5000 Bytes.
print(f'Total Number of Connections Sending More than 5000 Bytes: \n{len(high_data_transfer)}')

Total Number of Connections Sending More than 5000 Bytes: 
518
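The fixed 5,000-byte cut-off sits just below the median `Bytes_Sent` of 5,224 reported by `describe()`, which is why roughly half of the rows (518) qualify. A sketch of a data-driven alternative, assuming the 90th percentile is a reasonable definition of "high" (an illustrative choice, not part of the original analysis), using values copied from the first five rows of the dataset:

```python
import pandas as pd

sample = pd.DataFrame({
    "Source_IP": ["10.0.0.15", "192.168.1.13", "10.0.0.5", "10.0.0.9", "192.168.1.4"],
    "Bytes_Sent": [5411, 4999, 6360, 4011, 5254],
})

# 90th percentile of Bytes_Sent (pandas interpolates linearly between values by default).
threshold = sample["Bytes_Sent"].quantile(0.90)

# Keep only the rows above that percentile.
top_senders = sample[sample["Bytes_Sent"] > threshold]
print(threshold)
print(top_senders)
```

Unlike a fixed number, a percentile threshold adapts if the traffic volume in a later export is higher or lower overall.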


### STEP 9: Splitting the Dataset into X (Features) and y (Target Variable)

In [17]:
X = df.drop(columns=['Threat_Level'])  # Selecting all columns except Threat Level as Features.

y = df['Threat_Level']  # Selecting Threat Level as the Target Variable.

In [18]:
# Displaying the first 5 Rows of the selected Features (All columns except Threat Level)
X.head()

|   | Timestamp | Source_IP | Destination_IP | Protocol | Port | Bytes_Sent | Bytes_Received | Status |
|---|---|---|---|---|---|---|---|---|
| 0 | 2025-03-19 13:04:10 | 10.0.0.15 | 192.168.1.20 | TCP | NaN | 5411 | 8989 | Blocked |
| 1 | 2025-03-19 13:03:40 | 192.168.1.13 | 172.217.169.46 | ICMP | 443.0 | 4999 | 11808 | Allowed |
| 2 | 2025-03-19 13:03:10 | 10.0.0.5 | 203.0.113.99 | HTTP | 443.0 | 6360 | 10852 | Allowed |
| 3 | 2025-03-19 13:02:40 | 10.0.0.9 | 192.168.1.20 | TCP | NaN | 4011 | 14314 | Blocked |
| 4 | 2025-03-19 13:02:10 | 192.168.1.4 | 172.217.169.46 | FTP | NaN | 5254 | 8718 | Blocked |


In [19]:
# Displaying the first 5 Rows of the Target Variable (Threat Level)
y.head()

0       Low
1    Medium
2    Medium
3       Low
4    Medium
Name: Threat_Level, dtype: object
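`X` still contains text columns (`Timestamp`, the IP addresses, `Protocol`, `Status`), and most scikit-learn estimators expect numeric input. If `X` were later used to train a model, one common option is one-hot encoding with `pd.get_dummies`. A sketch on values copied from the first five rows; which columns to encode or drop is an assumption here, not a decision made in this notebook.

```python
import pandas as pd

sample_X = pd.DataFrame({
    "Protocol": ["TCP", "ICMP", "HTTP", "TCP", "FTP"],
    "Bytes_Sent": [5411, 4999, 6360, 4011, 5254],
    "Status": ["Blocked", "Allowed", "Allowed", "Blocked", "Blocked"],
})

# One 0/1 indicator column per category of each listed column;
# columns not listed (Bytes_Sent) pass through unchanged.
encoded = pd.get_dummies(sample_X, columns=["Protocol", "Status"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```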

### STEP 10: Removing a Column

In [20]:
# Removing the Timestamp column from the dataset
df = df.drop(columns=['Timestamp'])  

# Displaying the other Columns without Timestamp for confirmation.
df.head()

|   | Source_IP | Destination_IP | Protocol | Port | Bytes_Sent | Bytes_Received | Status | Threat_Level |
|---|---|---|---|---|---|---|---|---|
| 0 | 10.0.0.15 | 192.168.1.20 | TCP | NaN | 5411 | 8989 | Blocked | Low |
| 1 | 192.168.1.13 | 172.217.169.46 | ICMP | 443.0 | 4999 | 11808 | Allowed | Medium |
| 2 | 10.0.0.5 | 203.0.113.99 | HTTP | 443.0 | 6360 | 10852 | Allowed | Medium |
| 3 | 10.0.0.9 | 192.168.1.20 | TCP | NaN | 4011 | 14314 | Blocked | Low |
| 4 | 192.168.1.4 | 172.217.169.46 | FTP | NaN | 5254 | 8718 | Blocked | Medium |
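`drop` returns a new DataFrame; it does not change objects that were created from `df` earlier. Because `X` was built in Step 9, before this step, it still has its `Timestamp` column. A small sketch with hypothetical two-row data (`df_demo` and `X_demo` are illustrative names) mirroring the order of Steps 9 and 10:

```python
import pandas as pd

# Hypothetical two-row frame standing in for df.
df_demo = pd.DataFrame({
    "Timestamp": ["2025-03-19 13:04:10", "2025-03-19 13:03:40"],
    "Status": ["Blocked", "Allowed"],
    "Threat_Level": ["Low", "Medium"],
})

# Step 9 order: features are split off first...
X_demo = df_demo.drop(columns=["Threat_Level"])
# ...then Step 10 drops Timestamp from the main frame only.
df_demo = df_demo.drop(columns=["Timestamp"])

print("Timestamp" in X_demo.columns)   # True
print("Timestamp" in df_demo.columns)  # False
```

To remove `Timestamp` from the features as well, it would need to be dropped from `X` too, or dropped from `df` before the split.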


### CONCLUSION: Analyzing this dataset with pandas has provided insight into the cybersecurity threat levels and potential attacks on the company's network.

### ***Key Insights***
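##### 1. Security Threats: Over half of the connections (532 of 1,000, or 53.2%) were Blocked, which may indicate attempted malicious activity, although a block can also reflect routine firewall policy.
##### 2. System Vulnerabilities: 47 connections (4.7%) had a Critical threat level, which may point to vulnerable systems or sensitive data being targeted. Six of the ten Critical rows shown in Step 7 were Allowed rather than Blocked, so not all of this traffic was stopped.
##### 3. Incident Response: Isolating high-risk (Critical) connections, as in Step 7, helps prioritize which incidents to investigate first and which security measures to strengthen.
##### 4. Security Monitoring: This analysis did not examine timing, but the Timestamp column could be used to look for temporal patterns in blocked traffic to guide security monitoring and incident response strategies.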
