# AWBS Project 1: Exploratory Data Analysis of Internet Traffic Performance

## Authors:
- Jakub Piotrowski 266502
- Jakub Włodarski

---

## 📘 Project Overview

This project analyzes and compares internet traffic performance using datasets collected in January 2021 and January 2023 by the **Federal Communications Commission (FCC)**. The datasets contain measurements such as **download throughput**, **upload throughput**, and **latency**, which are key to evaluating broadband performance.

### Objectives:
1. Perform comprehensive Exploratory Data Analysis (EDA) using the CRISP-DM methodology.
2. Analyze internet traffic performance by focusing on:
   - Download and upload speeds
   - Latency
   - Relationships between metrics
3. Compare trends between 2021 and 2023.
4. Predict future internet performance using machine learning models.

### Methodology: CRISP-DM
The CRISP-DM (Cross Industry Standard Process for Data Mining) methodology guides our workflow:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment

---

## 📂 Dataset Sources
- [2021 Dataset (FCC)](https://data.fcc.gov/download/measuring-broadband-america/2021/data-raw-2021-jan.tar.gz)
- [2023 Dataset (FCC)](https://data.fcc.gov/download/measuring-broadband-america/2023/data-raw-2023-jan.tar.gz)
- [CRISP-DM methodology overview (IBM)](https://www.ibm.com/docs/pl/spss-modeler/saas?topic=dm-crisp-help-overview)


## 📊 Step 1: Business Understanding

We aim to analyze and compare internet performance data from the FCC for January 2021 and 2023. Key performance indicators include download speed, upload speed, and latency.

### Objectives:
- Explore trends and performance changes over time.
- Find relationships between key metrics.
- Build predictive models for download/upload performance.

In [6]:
import os

data_2021_path = "data/data-raw-2021-jan"
data_2023_path = "data/data-raw-2023-jan"

files_2021 = os.listdir(data_2021_path)
files_2023 = os.listdir(data_2023_path)

print("2021 Files:", files_2021)
print("2023 Files:", files_2023)


2021 Files: ['202101']
2023 Files: ['202301']


In [7]:
subfolder_2023 = os.path.join(data_2023_path, '202301')

files_2023_deep = os.listdir(subfolder_2023)
for i, f in enumerate(files_2023_deep, start=1):
    print(f"{i}. {f}")


1. curr_datausage.csv
2. curr_dlping.csv
3. curr_dns.csv
4. curr_httpget.csv
5. curr_httpgetmt.csv
6. curr_httpgetmt6.csv
7. curr_httppost.csv
8. curr_httppostmt.csv
9. curr_httppostmt6.csv
10. curr_lct_dl.csv
11. curr_lct_ul.csv
12. curr_ping.csv
13. curr_traceroute.csv
14. curr_udpcloss.csv
15. curr_udpjitter.csv
16. curr_udplatency.csv
17. curr_udplatency6.csv
18. curr_ulping.csv
19. curr_webget.csv


In [8]:
import pandas as pd

print("🧹 Checking for empty files (0 rows):\n")

empty_files = []

for file in files_2023_deep:
    file_path = os.path.join(subfolder_2023, file)
    try:
        df = pd.read_csv(file_path, low_memory=False, nrows=1)
        if df.empty:
            empty_files.append(file)
    except Exception as e:
        print(f"{file}: ❌ Failed to read ({e})")

if empty_files:
    print("\n🚫 Empty files:")
    for f in empty_files:
        print(f"- {f}")
else:
    print("✅ No empty files found.")

🧹 Checking for empty files (0 rows):


🚫 Empty files:
- curr_httpget.csv
- curr_httpgetmt6.csv
- curr_httppost.csv


In [9]:
import pandas as pd

print("📊 Row count per file (data-raw-2023-jan/202301):\n")

for file in files_2023_deep:
    file_path = os.path.join(subfolder_2023, file)
    try:
        df = pd.read_csv(file_path, low_memory=False)
        print(f"{file}: {len(df):,} rows")
    except Exception as e:
        print(f"{file}: ❌ Failed to read ({e})")

📊 Row count per file (data-raw-2023-jan/202301):

curr_datausage.csv: 2,776,853 rows
curr_dlping.csv: 1,380,351 rows
curr_dns.csv: ❌ Failed to read (Error tokenizing data. C error: out of memory)
curr_httpget.csv: 0 rows
curr_httpgetmt.csv: 806,142 rows
curr_httpgetmt6.csv: 0 rows
curr_httppost.csv: 0 rows
curr_httppostmt.csv: 803,436 rows
curr_httppostmt6.csv: 2 rows
curr_lct_dl.csv: 977,725 rows
curr_lct_ul.csv: 975,854 rows
curr_ping.csv: 4,482,225 rows
curr_traceroute.csv: 13,265,007 rows
curr_udpcloss.csv: 2,138,359 rows
curr_udpjitter.csv: 2,688,577 rows
curr_udplatency.csv: 5,454,686 rows
curr_udplatency6.csv: 18,543 rows
curr_ulping.csv: 1,394,990 rows
curr_webget.csv: 10,668,402 rows
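
The row count for `curr_dns.csv` failed because the whole file was loaded into memory at once. A minimal sketch of a lower-memory alternative is shown below, assuming the goal is only to count rows: it streams the CSV in chunks and parses a single column. The helper name `count_rows` and the in-memory sample are illustrative and not part of the original notebook; whether this avoids the error on the real file depends on the machine.

```python
import io

import pandas as pd


def count_rows(path_or_buffer, chunksize=1_000_000):
    """Count data rows by streaming the CSV in chunks (hypothetical helper)."""
    total = 0
    # usecols=[0] parses only the first column, which keeps each chunk small.
    for chunk in pd.read_csv(path_or_buffer, usecols=[0], chunksize=chunksize):
        total += len(chunk)
    return total


# Tiny in-memory stand-in for a large file such as curr_dns.csv (made-up rows).
sample = io.StringIO("unit_id,dtime\n1,a\n2,b\n3,c\n")
row_count = count_rows(sample, chunksize=2)
print(f"{row_count:,} rows")
```

On the real data, the call would take a file path such as `os.path.join(subfolder_2023, "curr_dns.csv")` in place of the sample buffer.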


## 📘 Step 2: Data Understanding

In this step, we explore the structure, format, and contents of the key datasets. Our focus is to understand:

- Download/upload performance over time
- Relationships between internet quality metrics (latency, jitter, packet loss, etc.)
- Which features might serve as good predictors
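
One way to look at relationships between metrics is to reduce each file to one value per measurement unit and then join on `unit_id`. The sketch below assumes the `bytes_sec` and `rtt_avg` columns that appear in the file previews in this notebook; the numbers are made up, and the rank-based (Spearman-style) correlation is only one possible choice.

```python
import pandas as pd

# Made-up stand-ins for curr_lct_dl.csv and curr_udplatency.csv.
lct_dl = pd.DataFrame({
    "unit_id": [1, 1, 2, 2, 3, 3],
    "bytes_sec": [100, 120, 50, 60, 10, 14],
})
udplatency = pd.DataFrame({
    "unit_id": [1, 1, 2, 2, 3, 3],
    "rtt_avg": [8000, 9000, 15000, 17000, 30000, 34000],
})

# One robust summary value per unit for each metric.
dl_per_unit = lct_dl.groupby("unit_id")["bytes_sec"].median().rename("median_bytes_sec")
rtt_per_unit = udplatency.groupby("unit_id")["rtt_avg"].median().rename("median_rtt_avg")

# Keep only units present in both files.
per_unit = pd.concat([dl_per_unit, rtt_per_unit], axis=1, join="inner")

# Pearson correlation of the ranks is equivalent to a Spearman correlation.
rank_corr = per_unit.rank().corr().loc["median_bytes_sec", "median_rtt_avg"]
print(per_unit)
print(f"rank correlation: {rank_corr:.2f}")
```

Medians are used instead of means because the summary statistics in this notebook show heavy-tailed distributions with extreme maxima.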

### 📂 Datasets Selected for Analysis
Based on row counts and relevance to upload/download performance, we’ll focus on:
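- `curr_datausage.csv` – per-unit byte counters (`*_tx_bytes`, `*_rx_bytes`), i.e. data consumption
- `curr_lct_dl.csv` – download test results; `bytes_sec` is the likely throughput measure
- `curr_lct_ul.csv` – upload test results with the same columns as `curr_lct_dl.csv`
- `curr_dlping.csv` / `curr_ulping.csv` – round-trip times (`rtt_avg`, `rtt_min`, `rtt_max`, `rtt_std`), i.e. latency
- `curr_udplatency.csv`, `curr_udpjitter.csv`, `curr_udpcloss.csv` – UDP latency, jitter, and (judging by the file name) packet loss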

These files range from just under one million to about 5.5 million records each, making them suitable for performance trend and modeling analysis.

Let’s preview a few of these datasets and inspect their structure using `.head()`, `.info()`, and `.describe()` methods.


In [10]:
crucial_files = ['curr_datausage.csv', 'curr_lct_dl.csv', 'curr_lct_ul.csv',
                 'curr_dlping.csv', 'curr_ulping.csv',
                 'curr_udplatency.csv', 'curr_udpjitter.csv', 'curr_udpcloss.csv']

for file in crucial_files:
    file_path = os.path.join(subfolder_2023, file)
    print(f"\nPreviewing {file}\n{'=' * 40}")

    try:
        df = pd.read_csv(file_path, low_memory=False)

        print("🔹 Head:")
        display(df.head())

        print("🔹 Info:")
        df.info()  # info() prints its own report and returns None, so display() is not needed

        print("🔹 Describe:")
        display(df.describe())

    except Exception as e:
        print(f" Could not read {file}: {e}")


Previewing curr_datausage.csv
🔹 Head:


|  | unit_id | dtime | sk_tx_bytes | sk_rx_bytes | cust_wired_tx_bytes | cust_wired_rx_bytes | cust_wifi_tx_bytes | cust_wifi_rx_bytes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-01 06:32:00 | 0 | 61288 | 42114358 | 2190548237 | 0 | 3225217 |
| 1 | 386 | 2023-01-01 07:32:02 | 0 | 77942 | 36364678 | 1781312604 | 0 | 2878273 |
| 2 | 386 | 2023-01-01 08:32:03 | 1593 | 59292 | 31497966 | 914522306 | 0 | 2649831 |
| 3 | 386 | 2023-01-01 09:32:05 | 0 | 61601 | 37000182 | 1609290026 | 0 | 2889108 |
| 4 | 386 | 2023-01-01 10:32:07 | 2552 | 103223 | 33011116 | 1499945826 | 0 | 2771261 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2776853 entries, 0 to 2776852
Data columns (total 8 columns):
 #   Column               Dtype 
---  ------               ----- 
 0   unit_id              int64 
 1   dtime                object
 2   sk_tx_bytes          int64 
 3   sk_rx_bytes          int64 
 4   cust_wired_tx_bytes  int64 
 5   cust_wired_rx_bytes  int64 
 6   cust_wifi_tx_bytes   int64 
 7   cust_wifi_rx_bytes   int64 
dtypes: int64(7), object(1)
memory usage: 169.5+ MB


None

🔹 Describe:


|  | unit_id | sk_tx_bytes | sk_rx_bytes | cust_wired_tx_bytes | cust_wired_rx_bytes | cust_wifi_tx_bytes | cust_wifi_rx_bytes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 2776853.0 | 2776853.0 | 2776853.0 | 2776853.0 | 2776853.0 | 2776853.0 | 2776853.0 |
| mean | 24742640.0 | 55354380.0 | 236953700.0 | 785899600000.0 | 119450800.0 | 25561720.0 | 243834000.0 |
| std | 25959550.0 | 264159100.0 | 694157600.0 | 1309436000000000.0 | 1161513000.0 | 151067800.0 | 839987100.0 |
| min | 386.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 950364.0 | 633692.0 | 3297897.0 | 0.0 | 6653.0 | 296423.0 | 3659172.0 |
| 50% | 24806510.0 | 987107.0 | 11576050.0 | 1556.0 | 83751.0 | 2503843.0 | 18528130.0 |
| 75% | 39774650.0 | 2393821.0 | 32522080.0 | 784927.0 | 1742218.0 | 11769290.0 | 119775900.0 |
| max | 90021130.0 | 12885320000.0 | 24804000000.0 | 2.182029e+18 | 364022600000.0 | 35358500000.0 | 121392800000.0 |
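
The `.info()` output above reports `dtime` as `object`, meaning the timestamps are still plain strings. Any analysis over time needs them as datetimes. The following is a minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS` format seen in the preview rows; the three sample rows are copied from the head of `curr_datausage.csv`, and the `hour` column is an illustrative addition.

```python
import io

import pandas as pd

# Three rows copied from the preview of curr_datausage.csv (subset of columns).
csv_text = """unit_id,dtime,cust_wired_rx_bytes
386,2023-01-01 06:32:00,2190548237
386,2023-01-01 07:32:02,1781312604
386,2023-01-01 08:32:03,914522306
"""

# parse_dates converts the column to datetime64 while reading.
usage = pd.read_csv(io.StringIO(csv_text), parse_dates=["dtime"])

# Hypothetical derived feature: hour of day of each measurement.
usage["hour"] = usage["dtime"].dt.hour

print(usage.dtypes)
print(usage[["dtime", "hour"]])
```

For a DataFrame that is already loaded, `pd.to_datetime(df["dtime"])` performs the same conversion.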



Previewing curr_lct_dl.csv
🔹 Head:


|  | unit_id | ddate | dtime | target | address | packets_received | packets_sent | packet_size | bytes_total | duration | bytes_sec | error_code | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-01 | 2023-01-01 11:47:19 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 52 | 100 | 1400 | 72800 | 878 | 85835096 | NO_ERROR | 1 | 0 |
| 1 | 386 | 2023-01-01 | 2023-01-01 23:53:23 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 70 | 100 | 1400 | 98000 | 5455 | 84407488 | NO_ERROR | 1 | 0 |
| 2 | 386 | 2023-01-02 | 2023-01-02 17:50:01 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 52 | 100 | 1400 | 72800 | 739 | 93750000 | NO_ERROR | 1 | 0 |
| 3 | 386 | 2023-01-02 | 2023-01-03 01:53:08 | sp2-vm-newyork-us.samknows.com | 151.139.31.8 | 52 | 100 | 1400 | 72800 | 763 | 91304344 | NO_ERROR | 1 | 0 |
| 4 | 386 | 2023-01-03 | 2023-01-03 11:46:41 | sp2-vm-newyork-us.samknows.com | 151.139.31.8 | 76 | 100 | 1400 | 106400 | 5495 | 89171976 | NO_ERROR | 1 | 0 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 977725 entries, 0 to 977724
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   unit_id           977725 non-null  int64 
 1   ddate             977725 non-null  object
 2   dtime             977725 non-null  object
 3   target            977725 non-null  object
 4   address           977725 non-null  object
 5   packets_received  977725 non-null  int64 
 6   packets_sent      977725 non-null  int64 
 7   packet_size       977725 non-null  int64 
 8   bytes_total       977725 non-null  int64 
 9   duration          977725 non-null  int64 
 10  bytes_sec         977725 non-null  int64 
 11  error_code        977725 non-null  object
 12  successes         977725 non-null  int64 
 13  failures          977725 non-null  int64 
dtypes: int64(9), object(5)
memory usage: 104.4+ MB


None

🔹 Describe:


|  | unit_id | packets_received | packets_sent | packet_size | bytes_total | duration | bytes_sec | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 977725.0 | 977725.0 | 977725.0 | 977725.0 | 977725.0 | 977725.0 | 977725.0 | 977725.0 | 977725.0 |
| mean | 22426930.0 | 86.534322 | 99.6543 | 1400.0 | 121148.050832 | 34412.31 | 30486650.0 | 0.996543 | 0.003457 |
| std | 25439670.0 | 21.048576 | 5.869461 | 0.0 | 29468.006595 | 109268.1 | 35603780.0 | 0.058695 | 0.058695 |
| min | 386.0 | 0.0 | 0.0 | 1400.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 805912.0 | 76.0 | 100.0 | 1400.0 | 106400.0 | 5574.0 | 4481236.0 | 1.0 | 0.0 |
| 50% | 4172665.0 | 100.0 | 100.0 | 1400.0 | 140000.0 | 9775.0 | 13162670.0 | 1.0 | 0.0 |
| 75% | 39486000.0 | 100.0 | 100.0 | 1400.0 | 140000.0 | 25092.0 | 43846230.0 | 1.0 | 0.0 |
| max | 90021130.0 | 100.0 | 100.0 | 1400.0 | 140000.0 | 3007790.0 | 319685000.0 | 1.0 | 1.0 |
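
The `bytes_sec` values above are hard to read at a glance. Assuming the column holds bytes per second (an interpretation based on the column name, not confirmed by the source), a conversion to megabits per second makes them comparable to advertised broadband speeds. The sketch below uses three `bytes_sec` values copied from the head of `curr_lct_dl.csv`; the `mbps` column name is illustrative.

```python
import pandas as pd

# bytes_sec values copied from the first rows of curr_lct_dl.csv.
lct_dl_sample = pd.DataFrame({"bytes_sec": [85835096, 84407488, 93750000]})

# Assumption: bytes per second -> megabits per second (8 bits per byte, 10^6 bits per Mb).
lct_dl_sample["mbps"] = lct_dl_sample["bytes_sec"] * 8 / 1_000_000

print(lct_dl_sample)
```

Under that assumption, the third sample row corresponds to 750 Mbps.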



Previewing curr_lct_ul.csv
🔹 Head:


|  | unit_id | ddate | dtime | target | address | packets_received | packets_sent | packet_size | bytes_total | duration | bytes_sec | error_code | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-01 | 2023-01-01 11:47:34 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 100 | 100 | 1400 | 140000 | 1002725 | 139620 | NO_ERROR | 1 | 0 |
| 1 | 386 | 2023-01-02 | 2023-01-03 01:53:24 | sp2-vm-newyork-us.samknows.com | 151.139.31.8 | 100 | 100 | 1400 | 140000 | 992604 | 141043 | NO_ERROR | 1 | 0 |
| 2 | 386 | 2023-01-03 | 2023-01-03 11:46:57 | sp2-vm-newyork-us.samknows.com | 151.139.31.8 | 100 | 100 | 1400 | 140000 | 1009937 | 138623 | NO_ERROR | 1 | 0 |
| 3 | 386 | 2023-01-04 | 2023-01-05 01:55:21 | sp2-vm-newyork-us.samknows.com | 151.139.31.8 | 100 | 100 | 1400 | 140000 | 1002115 | 139705 | NO_ERROR | 1 | 0 |
| 4 | 386 | 2023-01-05 | 2023-01-05 05:49:49 | sp2-vm-newyork-us.samknows.com | 151.139.31.8 | 100 | 100 | 1400 | 140000 | 1012548 | 138265 | NO_ERROR | 1 | 0 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 975854 entries, 0 to 975853
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   unit_id           975854 non-null  int64 
 1   ddate             975854 non-null  object
 2   dtime             975854 non-null  object
 3   target            975854 non-null  object
 4   address           975854 non-null  object
 5   packets_received  975854 non-null  int64 
 6   packets_sent      975854 non-null  int64 
 7   packet_size       975854 non-null  int64 
 8   bytes_total       975854 non-null  int64 
 9   duration          975854 non-null  int64 
 10  bytes_sec         975854 non-null  int64 
 11  error_code        975854 non-null  object
 12  successes         975854 non-null  int64 
 13  failures          975854 non-null  int64 
dtypes: int64(9), object(5)
memory usage: 104.2+ MB


None

🔹 Describe:


|  | unit_id | packets_received | packets_sent | packet_size | bytes_total | duration | bytes_sec | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 975854.0 | 975854.0 | 975854.0 | 975854.0 | 975854.0 | 975854.0 | 975854.0 | 975854.0 | 975854.0 |
| mean | 22511440.0 | 91.326126 | 99.290365 | 1400.0 | 127856.575881 | 311128.3 | 5987570.0 | 0.992904 | 0.007096 |
| std | 25464390.0 | 20.707236 | 8.394044 | 0.0 | 28990.131005 | 475684.1 | 13048920.0 | 0.08394 | 0.08394 |
| min | 386.0 | 0.0 | 0.0 | 1400.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 806090.0 | 100.0 | 100.0 | 1400.0 | 140000.0 | 20088.0 | 222901.2 | 1.0 | 0.0 |
| 50% | 4172709.0 | 100.0 | 100.0 | 1400.0 | 140000.0 | 64690.0 | 1860910.0 | 1.0 | 0.0 |
| 75% | 39486060.0 | 100.0 | 100.0 | 1400.0 | 140000.0 | 457426.5 | 6508903.0 | 1.0 | 0.0 |
| max | 90021130.0 | 100.0 | 100.0 | 1400.0 | 140000.0 | 4815919.0 | 742857100.0 | 1.0 | 1.0 |



Previewing curr_dlping.csv
🔹 Head:


|  | unit_id | dtime | target | rtt_avg | rtt_min | rtt_max | rtt_std | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-01 11:46:44 | newyorkfcc.west.verizon.net | 8625 | 7036 | 13951 | 1269 | 164 | 0 |
| 1 | 386 | 2023-01-01 11:46:44 | sp1-vm-newyork-us.samknows.com | 8215 | 6818 | 10715 | 1085 | 164 | 0 |
| 2 | 386 | 2023-01-02 17:49:03 | newyorkfcc.west.verizon.net | 8764 | 7381 | 13252 | 1211 | 168 | 0 |
| 3 | 386 | 2023-01-02 17:49:03 | sp1-vm-newyork-us.samknows.com | 8296 | 6969 | 22873 | 1705 | 168 | 0 |
| 4 | 386 | 2023-01-03 03:51:51 | newyorkfcc.west.verizon.net | 10952 | 8456 | 27849 | 2431 | 167 | 0 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1380351 entries, 0 to 1380350
Data columns (total 9 columns):
 #   Column     Non-Null Count    Dtype 
---  ------     --------------    ----- 
 0   unit_id    1380351 non-null  int64 
 1   dtime      1380351 non-null  object
 2   target     1380351 non-null  object
 3   rtt_avg    1380351 non-null  int64 
 4   rtt_min    1380351 non-null  int64 
 5   rtt_max    1380351 non-null  int64 
 6   rtt_std    1380351 non-null  int64 
 7   successes  1380351 non-null  int64 
 8   failures   1380351 non-null  int64 
dtypes: int64(7), object(2)
memory usage: 94.8+ MB


None

🔹 Describe:


|  | unit_id | rtt_avg | rtt_min | rtt_max | rtt_std | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 1380351.0 | 1380351.0 | 1380351.0 | 1380351.0 | 1380351.0 | 1380351.0 | 1380351.0 |
| mean | 23321720.0 | 105709.0 | 23020.82 | 190163.7 | 41082.3 | 149.0849 | 7.393548 |
| std | 25709820.0 | 198887.9 | 39607.94 | 368083.3 | 95510.78 | 631.7885 | 316.5206 |
| min | 386.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 805832.0 | 21702.0 | 8535.0 | 33722.5 | 3600.0 | 123.0 | 0.0 |
| 50% | 24329560.0 | 42362.0 | 14620.0 | 69458.0 | 8844.0 | 141.0 | 0.0 |
| 75% | 39486420.0 | 96827.5 | 26958.0 | 166148.5 | 28916.0 | 164.0 | 2.0 |
| max | 90021130.0 | 2899709.0 | 2771961.0 | 2999974.0 | 1668147.0 | 219814.0 | 215649.0 |
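
The `rtt_*` columns are integers in the thousands for what looks like a nearby server, which suggests microseconds; this unit is an assumption, not something the source states. The sketch below converts `rtt_avg` to milliseconds and derives a per-row failure rate from `successes` and `failures`. The first three rows are copied from the head of `curr_dlping.csv`; the last row is made up to show how a test with zero attempts is handled.

```python
import pandas as pd

ping = pd.DataFrame({
    "rtt_avg": [8625, 8215, 10952, 0],   # last row is hypothetical
    "successes": [164, 164, 167, 0],
    "failures": [0, 0, 0, 0],
})

# Assumption: rtt values are in microseconds, so divide by 1000 for milliseconds.
ping["rtt_avg_ms"] = ping["rtt_avg"] / 1000

# Failure rate per row; rows with no attempts become NaN instead of dividing by zero.
attempts = ping["successes"] + ping["failures"]
ping["failure_rate"] = ping["failures"] / attempts.where(attempts > 0)

print(ping)
```

Leaving zero-attempt rows as `NaN` keeps them out of later means and medians without silently treating them as perfect or failed tests.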



Previewing curr_ulping.csv
🔹 Head:


|  | unit_id | dtime | target | rtt_avg | rtt_min | rtt_max | rtt_std | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-01 11:47:12 | newyorkfcc.west.verizon.net | 7720 | 6994 | 12152 | 929 | 153 | 0 |
| 1 | 386 | 2023-01-01 11:47:12 | sp1-vm-newyork-us.samknows.com | 7721 | 6794 | 12338 | 1008 | 153 | 0 |
| 2 | 386 | 2023-01-03 05:50:26 | newyorkfcc.west.verizon.net | 7606 | 7260 | 10780 | 533 | 121 | 0 |
| 3 | 386 | 2023-01-03 05:50:26 | sp1-vm-newyork-us.samknows.com | 7447 | 7049 | 10987 | 653 | 121 | 0 |
| 4 | 386 | 2023-01-03 11:46:39 | newyorkfcc.west.verizon.net | 7696 | 7327 | 11320 | 658 | 152 | 0 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1394990 entries, 0 to 1394989
Data columns (total 9 columns):
 #   Column     Non-Null Count    Dtype 
---  ------     --------------    ----- 
 0   unit_id    1394990 non-null  int64 
 1   dtime      1394990 non-null  object
 2   target     1394990 non-null  object
 3   rtt_avg    1394990 non-null  int64 
 4   rtt_min    1394990 non-null  int64 
 5   rtt_max    1394990 non-null  int64 
 6   rtt_std    1394990 non-null  int64 
 7   successes  1394990 non-null  int64 
 8   failures   1394990 non-null  int64 
dtypes: int64(7), object(2)
memory usage: 95.8+ MB


None

🔹 Describe:


|  | unit_id | rtt_avg | rtt_min | rtt_max | rtt_std | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 1394990.0 | 1394990.0 | 1394990.0 | 1394990.0 | 1394990.0 | 1394990.0 | 1394990.0 |
| mean | 23442100.0 | 252324.5 | 25164.44 | 374998.8 | 70618.81 | 131.0003 | 10.26208 |
| std | 25740920.0 | 442531.1 | 49326.11 | 599687.3 | 149907.1 | 787.1682 | 254.2786 |
| min | 386.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 806158.0 | 28159.0 | 9829.0 | 54110.25 | 5364.0 | 116.0 | 0.0 |
| 50% | 24525650.0 | 61209.0 | 16174.0 | 144782.0 | 19333.0 | 141.0 | 1.0 |
| 75% | 39486460.0 | 250881.2 | 28208.0 | 378390.2 | 56787.0 | 151.0 | 4.0 |
| max | 90021130.0 | 2999684.0 | 2999684.0 | 2999999.0 | 1555467.0 | 852479.0 | 217867.0 |



Previewing curr_udplatency.csv
🔹 Head:


|  | unit_id | dtime | target | rtt_avg | rtt_min | rtt_max | rtt_std | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-01 00:32:57 | newyorkfcc.west.verizon.net | 8748 | 8748 | 8748 | 0 | 1 | 0 |
| 1 | 386 | 2023-01-01 00:32:57 | sp1-vm-newyork-us.samknows.com | 8513 | 8513 | 8513 | 0 | 1 | 0 |
| 2 | 386 | 2023-01-01 02:32:54 | sp1-vm-newyork-us.samknows.com | 7317 | 7064 | 7064 | 330 | 36 | 0 |
| 3 | 386 | 2023-01-01 02:32:54 | newyorkfcc.west.verizon.net | 7546 | 7291 | 7291 | 363 | 36 | 0 |
| 4 | 386 | 2023-01-01 03:32:52 | newyorkfcc.west.verizon.net | 7395 | 7078 | 7078 | 571 | 18 | 0 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5454686 entries, 0 to 5454685
Data columns (total 9 columns):
 #   Column     Dtype 
---  ------     ----- 
 0   unit_id    int64 
 1   dtime      object
 2   target     object
 3   rtt_avg    int64 
 4   rtt_min    int64 
 5   rtt_max    int64 
 6   rtt_std    int64 
 7   successes  int64 
 8   failures   int64 
dtypes: int64(7), object(2)
memory usage: 374.5+ MB


None

🔹 Describe:


|  | unit_id | rtt_avg | rtt_min | rtt_max | rtt_std | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 5454686.0 | 5454686.0 | 5454686.0 | 5454686.0 | 5454686.0 | 5454686.0 | 5454686.0 |
| mean | 23602050.0 | 28170.94 | 21802.43 | 57659.54 | 6052.07 | 1621.233 | 56.70454 |
| std | 25922690.0 | 57307.12 | 37050.75 | 164192.9 | 33860.2 | 800.7929 | 324.8323 |
| min | 386.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 811962.0 | 10738.0 | 7804.0 | 15843.0 | 563.0 | 1125.0 | 0.0 |
| 50% | 24525820.0 | 17373.0 | 13777.0 | 24985.0 | 1281.0 | 2034.0 | 0.0 |
| 75% | 39486550.0 | 30528.0 | 25658.0 | 43551.0 | 2666.0 | 2237.0 | 0.0 |
| max | 90021130.0 | 2902489.0 | 2830730.0 | 2999999.0 | 1762910.0 | 4440.0 | 2400.0 |



Previewing curr_udpjitter.csv
🔹 Head:


|  | unit_id | dtime | target | packet_size | stream_rate | duration | packets_up_sent | packets_down_sent | packets_up_recv | packets_down_recv | jitter_up | jitter_down | latency | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-01 11:41:34 | sp1-vm-newyork-us.samknows.com | 160 | 64000 | 15010478 | 500 | 500 | 500 | 500 | 1528 | 724 | 8340 | 1 | 0 |
| 1 | 386 | 2023-01-02 18:41:48 | sp1-vm-newyork-us.samknows.com | 160 | 64000 | 15255138 | 500 | 500 | 500 | 500 | 1429 | 759 | 8636 | 1 | 0 |
| 2 | 386 | 2023-01-02 19:41:32 | sp1-vm-newyork-us.samknows.com | 160 | 64000 | 15048060 | 500 | 500 | 500 | 500 | 1228 | 730 | 7431 | 1 | 0 |
| 3 | 386 | 2023-01-03 01:41:19 | sp2-vm-newyork-us.samknows.com | 160 | 64000 | 14992462 | 500 | 500 | 500 | 500 | 1233 | 731 | 8332 | 1 | 0 |
| 4 | 386 | 2023-01-03 05:41:10 | sp2-vm-newyork-us.samknows.com | 160 | 64000 | 15078435 | 500 | 500 | 500 | 500 | 1232 | 711 | 7930 | 1 | 0 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2688577 entries, 0 to 2688576
Data columns (total 15 columns):
 #   Column             Dtype 
---  ------             ----- 
 0   unit_id            int64 
 1   dtime              object
 2   target             object
 3   packet_size        int64 
 4   stream_rate        int64 
 5   duration           int64 
 6   packets_up_sent    int64 
 7   packets_down_sent  int64 
 8   packets_up_recv    int64 
 9   packets_down_recv  int64 
 10  jitter_up          int64 
 11  jitter_down        int64 
 12  latency            int64 
 13  successes          int64 
 14  failures           int64 
dtypes: int64(13), object(2)
memory usage: 307.7+ MB


None

🔹 Describe:


|  | unit_id | packet_size | stream_rate | duration | packets_up_sent | packets_down_sent | packets_up_recv | packets_down_recv | jitter_up | jitter_down | latency | successes | failures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 | 2688577.0 |
| mean | 22641100.0 | 160.0 | 63686.45 | 13930550.0 | 495.1697 | 495.0798 | 494.6948 | 494.5956 | -169997.8 | 1100.631 | 23918.85 | 0.9902621 | 0.009737865 |
| std | 25536830.0 | 0.0 | 4468.669 | 2460975.0 | 48.90598 | 49.2435 | 49.63622 | 49.64921 | 19292210.0 | 8869.266 | 1853476.0 | 0.098199 | 0.098199 |
| min | 386.0 | 160.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2147484000.0 | -429924.0 | -2147484000.0 | 0.0 | 0.0 |
| 25% | 806200.0 | 160.0 | 64000.0 | 14990970.0 | 500.0 | 500.0 | 500.0 | 500.0 | 655.0 | 151.0 | 11002.0 | 1.0 | 0.0 |
| 50% | 4172757.0 | 160.0 | 64000.0 | 14994560.0 | 500.0 | 500.0 | 500.0 | 500.0 | 1161.0 | 296.0 | 16755.0 | 1.0 | 0.0 |
| 75% | 39486140.0 | 160.0 | 64000.0 | 14998020.0 | 500.0 | 500.0 | 500.0 | 500.0 | 2132.0 | 573.0 | 27053.0 | 1.0 | 0.0 |
| max | 90021130.0 | 160.0 | 64000.0 | 20836770.0 | 500.0 | 500.0 | 800.0 | 500.0 | 8515183.0 | 2841873.0 | 14445610.0 | 1.0 | 1.0 |
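
The minimum of `jitter_up` and `latency` above is about -2.147e9, close to the smallest 32-bit signed integer, and `jitter_down` also goes negative. Negative jitter or latency has no physical meaning, so these rows are probably sentinel or overflow values; that reading is an assumption. The sketch below masks negative values as missing before summarizing. The first two rows are copied from the head of `curr_udpjitter.csv`, and the third is a made-up row carrying the suspected sentinel.

```python
import pandas as pd

jitter = pd.DataFrame({
    "jitter_up": [1528, 1429, -2147483648],   # last value: suspected sentinel (hypothetical row)
    "jitter_down": [724, 759, 730],
    "latency": [8340, 8636, -2147483648],
})

metric_cols = ["jitter_up", "jitter_down", "latency"]

cleaned = jitter.copy()
# Keep non-negative values; negative ones become NaN and drop out of mean/median.
cleaned[metric_cols] = cleaned[metric_cols].where(cleaned[metric_cols] >= 0)

print(cleaned)
print(cleaned[metric_cols].mean())
```

With the sentinel masked, the mean `latency` of the sample is 8488.0 instead of a large negative number, which illustrates how much a single such row distorts the mean reported by `.describe()`.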



Previewing curr_udpcloss.csv
🔹 Head:


|  | unit_id | dtime | duration | target | address | packets |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 386 | 2023-01-21 06:05:08 | 4500449 | newyorkfcc.west.verizon.net | 206.124.86.197 | 2 |
| 1 | 386 | 2023-01-25 20:21:18 | 87004198 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 57 |
| 2 | 390 | 2023-01-10 00:16:44 | 5277980 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 3 |
| 3 | 390 | 2023-01-25 20:21:19 | 86999555 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 57 |
| 4 | 390 | 2023-01-28 10:13:27 | 114962514 | sp1-vm-newyork-us.samknows.com | 151.139.31.1 | 76 |


🔹 Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2138359 entries, 0 to 2138358
Data columns (total 6 columns):
 #   Column    Dtype 
---  ------    ----- 
 0   unit_id   int64 
 1   dtime     object
 2   duration  int64 
 3   target    object
 4   address   object
 5   packets   int64 
dtypes: int64(3), object(3)
memory usage: 97.9+ MB


None

🔹 Describe:


|  | unit_id | duration | packets |
| --- | --- | --- | --- |
| count | 2138359.0 | 2138359.0 | 2138359.0 |
| mean | 24285270.0 | 50498420.0 | 134.9578 |
| std | 23503280.0 | 320412800.0 | 1514.01 |
| min | 386.0 | 5.0 | -2.0 |
| 25% | 939188.0 | 4500053.0 | 2.0 |
| 50% | 25755270.0 | 4744629.0 | 2.0 |
| 75% | 39876830.0 | 8999804.0 | 4.0 |
| max | 90021130.0 | 5184420000.0 | 825233.0 |
