In [1]:
import pandas as pd 

# curr_httpgetmt and curr_httpgetmt6: Download speed metrics


In [2]:
# First, explore curr_httpgetmt (download speed tests)
df_httpgetmt = pd.read_csv('../data/raw/curr_httpgetmt.csv')

In [3]:
df_httpgetmt.head(2)

Unnamed: 0,unit_id,dtime,target,address,fetch_time,bytes_total,bytes_sec,bytes_sec_interval,warmup_time,warmup_bytes,sequence,threads,successes,failures
0,386,2023-02-02 11:46:44,sp1-vm-newyork-us.samknows.com,151.139.31.1,10028665,257012514,25627789,25627789,5028092,122041478,0,8,1,0
1,386,2023-02-02 17:46:29,sp1-vm-newyork-us.samknows.com,151.139.31.1,10018544,255021762,25454972,25454972,1523612,33933926,0,8,1,0


In [4]:
df_httpgetmt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 724511 entries, 0 to 724510
Data columns (total 14 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   unit_id             724511 non-null  int64 
 1   dtime               724511 non-null  object
 2   target              724511 non-null  object
 3   address             724511 non-null  object
 4   fetch_time          724511 non-null  int64 
 5   bytes_total         724511 non-null  int64 
 6   bytes_sec           724511 non-null  int64 
 7   bytes_sec_interval  724511 non-null  int64 
 8   warmup_time         724511 non-null  int64 
 9   warmup_bytes        724511 non-null  int64 
 10  sequence            724511 non-null  int64 
 11  threads             724511 non-null  int64 
 12  successes           724511 non-null  int64 
 13  failures            724511 non-null  int64 
dtypes: int64(11), object(3)
memory usage: 77.4+ MB


From this data, you can explore the following:

1. **Download Speed Analysis**:
    - Analyze the `bytes_sec` and `bytes_sec_interval` columns to understand the download speed trends.
    - Compare the download speeds across different `target` servers or `address`.

2. **Fetch Time Analysis**:
    - Investigate the `fetch_time` column to see how long it takes to fetch data from different servers.

3. **Warmup Performance**:
    - Analyze the `warmup_time` and `warmup_bytes` columns to understand the warmup phase's impact on performance.

4. **Server Performance**:
    - Compare the performance of different `target` servers or `address` in terms of download speed and fetch time.

5. **Success and Failure Rates**:
    - Use the `successes` and `failures` columns to calculate the success rate of HTTP GET requests.

6. **Thread Utilization**:
    - Analyze the `threads` column to see how the number of threads affects performance metrics like `bytes_sec`.

7. **Time-based Trends**:
    - Use the `dtime` column to explore time-based trends in download speed or fetch time.

8. **Correlation Analysis**:
    - Perform correlation analysis between columns like `fetch_time`, `bytes_sec`, `warmup_time`, etc., to identify relationships.
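
A few of the explorations listed above can be sketched with pandas. This is a minimal illustration, not a full analysis: the small `sample` frame only holds the two rows shown in the `head(2)` output, so in the notebook you would run the same steps on `df_httpgetmt`. The names `sample`, `mbps`, `speed_by_target`, `speed_by_hour`, and `corr` are introduced here for illustration.

```python
import pandas as pd

# Two sample rows copied from the head(2) output above; in the notebook,
# use the full df_httpgetmt instead of this small stand-in.
sample = pd.DataFrame({
    "unit_id": [386, 386],
    "dtime": ["2023-02-02 11:46:44", "2023-02-02 17:46:29"],
    "target": ["sp1-vm-newyork-us.samknows.com"] * 2,
    "fetch_time": [10028665, 10018544],
    "bytes_total": [257012514, 255021762],
    "bytes_sec": [25627789, 25454972],
    "warmup_time": [5028092, 1523612],
})

# dtime is loaded as a string (dtype object), so parse it before
# doing any time-based grouping.
sample["dtime"] = pd.to_datetime(sample["dtime"])

# Bytes per second -> megabits per second (8 bits per byte, 1e6 bits per Mbit).
sample["mbps"] = sample["bytes_sec"] * 8 / 1e6

# Mean download speed per target server.
speed_by_target = sample.groupby("target")["mbps"].mean()

# Mean download speed per hour of day.
speed_by_hour = sample.groupby(sample["dtime"].dt.hour)["mbps"].mean()

# Pairwise Pearson correlations between numeric metrics.
corr = sample[["fetch_time", "bytes_sec", "warmup_time"]].corr()

print(speed_by_target)
print(speed_by_hour)
print(corr)
```

With only two rows the correlations are trivially +1 or -1; they become meaningful only on the full dataset.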
    

### Description
`curr_httpgetmt` and `curr_httpgetmt6` are datasets that record HTTP GET download speed tests. Each row describes one test run by a measurement unit against a target server: how many bytes were downloaded, how long the transfer took, and the resulting download speed in bytes per second (`bytes_sec`). Judging by the sample rows, `curr_httpgetmt` holds tests against IPv4 addresses and `curr_httpgetmt6` holds tests against IPv6 addresses.

In [5]:
# Second, explore curr_httpgetmt6 (the IPv6 counterpart)
df_curr_httpgetmt6 = pd.read_csv('../data/raw/curr_httpgetmt6.csv')

In [6]:
df_curr_httpgetmt6.head(2)

Unnamed: 0,unit_id,dtime,target,address,fetch_time,bytes_total,bytes_sec,bytes_sec_interval,warmup_time,warmup_bytes,sequence,threads,successes,failures
0,216748,2023-02-25 02:55:46,v6-n1-fcc-losangeles-us.samknows.com,2604:6840:1600:1501::1,10489480,616088464,58733938,58733938,7983673,441679036,0,8,1,0
1,216748,2023-02-25 05:55:25,v6-n1-fcc-losangeles-us.samknows.com,2604:6840:1600:1501::1,10541189,646054608,61288590,61288590,8068534,446827962,0,8,1,0


The data in `curr_httpgetmt6.csv` represents performance metrics for HTTP GET requests, specifically for IPv6 addresses. Here's a detailed explanation of the columns and their significance:

1. **unit_id**: 
    - A unique identifier for the unit (device or system) performing the HTTP GET requests.

2. **dtime**:
    - The timestamp when the HTTP GET request was initiated. This helps in analyzing time-based trends in performance.

3. **target**:
    - The target server's hostname that the HTTP GET request is directed to. This indicates the server being tested for performance.

4. **address**:
    - The IPv6 address of the target server. This is the actual network address used for the HTTP GET request.

5. **fetch_time**:
    - The time (in microseconds) taken to fetch the requested data from the server. In the sample rows this is roughly 10 seconds, so it reflects the duration of the measured transfer rather than network latency.

6. **bytes_total**:
    - The total number of bytes downloaded during the HTTP GET request. This indicates the size of the data transfer.

7. **bytes_sec**:
    - The average download speed in bytes per second for the entire request. This is a critical metric for assessing network performance.

8. **bytes_sec_interval**:
    - The average download speed in bytes per second for a specific interval during the request. This helps in identifying variations in speed during the transfer. In the sample rows it is identical to `bytes_sec`.

9. **warmup_time**:
    - The time (in microseconds) spent in the warmup phase before the actual data transfer begins. This phase is used to stabilize the connection.

10. **warmup_bytes**:
     - The number of bytes downloaded during the warmup phase. This indicates the amount of data transferred before the main download starts.

11. **sequence**:
     - A sequence number for the HTTP GET request. This helps in identifying the order of requests.

12. **threads**:
     - The number of threads used for the HTTP GET request. This indicates the level of parallelism in the data transfer.

13. **successes**:
     - The number of successful HTTP GET requests. This is useful for calculating the success rate.

14. **failures**:
     - The number of failed HTTP GET requests. This helps in identifying reliability issues.
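
The units described above can be sanity-checked against the sample rows. The sketch below assumes that `bytes_sec` is simply `bytes_total` divided by `fetch_time` expressed in seconds; that relationship is an assumption being tested here, not something stated by the data source. The frame `ipv6_sample` and the derived column names are illustrative.

```python
import pandas as pd

# The two rows shown in the head(2) output of curr_httpgetmt6.
ipv6_sample = pd.DataFrame({
    "fetch_time": [10489480, 10541189],    # microseconds
    "bytes_total": [616088464, 646054608],
    "bytes_sec": [58733938, 61288590],
})

# Microseconds -> seconds.
ipv6_sample["fetch_seconds"] = ipv6_sample["fetch_time"] / 1e6

# Speed derived from the totals, to compare with the reported bytes_sec.
ipv6_sample["derived_bytes_sec"] = (
    ipv6_sample["bytes_total"] / ipv6_sample["fetch_seconds"]
)

# Relative difference between the derived and the reported speed.
ipv6_sample["rel_diff"] = (
    (ipv6_sample["derived_bytes_sec"] - ipv6_sample["bytes_sec"]).abs()
    / ipv6_sample["bytes_sec"]
)

print(ipv6_sample[["fetch_seconds", "bytes_sec", "derived_bytes_sec", "rel_diff"]])
```

For these two rows the derived and reported speeds agree to within rounding, which supports reading `fetch_time` as microseconds.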

### Example Insights:
- The data can be used to analyze download speeds (`bytes_sec` and `bytes_sec_interval`) across different servers (`target`) and addresses (`address`).
- The `fetch_time` and `warmup_time` columns show how long the measured transfer and the preceding warmup phase lasted.
- The `successes` and `failures` columns can be used to calculate the reliability of the HTTP GET requests.
- By analyzing `threads`, you can assess how parallelism impacts performance metrics like `bytes_sec`.

This dataset is valuable for understanding the performance of IPv6-based HTTP GET requests and identifying potential bottlenecks or areas for optimization.
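
As a rough sketch of the reliability calculation mentioned above, the success rate per target can be computed by summing `successes` and `failures` and dividing. The frame below reuses the two sample rows from `curr_httpgetmt6`; `sample` and `totals` are illustrative names, and on the full data you would group `df_curr_httpgetmt6` instead.

```python
import pandas as pd

# successes/failures from the two sample rows of curr_httpgetmt6.
sample = pd.DataFrame({
    "target": ["v6-n1-fcc-losangeles-us.samknows.com"] * 2,
    "successes": [1, 1],
    "failures": [0, 0],
})

# Sum the counters per target, then divide successes by total attempts.
totals = sample.groupby("target")[["successes", "failures"]].sum()
totals["success_rate"] = totals["successes"] / (
    totals["successes"] + totals["failures"]
)

print(totals)
```

A target with no recorded attempts would yield `NaN` here (0 divided by 0), which is worth handling explicitly on real data.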

---


### Comparison of `curr_httpgetmt` and `curr_httpgetmt6`

From the descriptions of the two datasets, here are the key observations and comparisons:

1. **Purpose**:
    - Both datasets measure the performance of HTTP GET requests.
    - `curr_httpgetmt` holds the standard HTTP GET download tests, while `curr_httpgetmt6` specifically targets IPv6-based HTTP GET requests.

2. **Target Servers**:
    - Both datasets include a `target` column that identifies the server being tested. However, `curr_httpgetmt6` focuses on IPv6 servers, as indicated by the `address` column containing IPv6 addresses.

3. **Performance Metrics**:
    - Both datasets provide similar performance metrics, such as:
      - `fetch_time`: Time taken to fetch data.
      - `bytes_sec` and `bytes_sec_interval`: Download speeds.
      - `warmup_time` and `warmup_bytes`: Metrics for the warmup phase.
    - These metrics allow for a detailed analysis of download speeds, transfer durations, and warmup behavior.

4. **Thread Utilization**:
    - Both datasets include a `threads` column, which indicates the level of parallelism used during the HTTP GET requests.

5. **Success and Failure Rates**:
    - Both datasets track the number of successful (`successes`) and failed (`failures`) HTTP GET requests, enabling reliability analysis.

6. **Time-based Trends**:
    - The `dtime` column in both datasets allows for time-based trend analysis of performance metrics.

7. **Differences**:
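    - The primary difference lies in the address family each dataset covers:
      - `curr_httpgetmt` appears to cover IPv4: both sample rows use an IPv4 `address`. Check the full `address` column before assuming it contains no IPv6 rows.
      - `curr_httpgetmt6` is exclusively for IPv6 performance analysis.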

### Insights:
- The datasets are complementary and can be used together to compare IPv4 and IPv6 performance.
- `curr_httpgetmt` provides the baseline download measurements, while `curr_httpgetmt6` offers a focused analysis of IPv6-specific performance.
- By analyzing both datasets, you can identify trends, bottlenecks, and areas for optimization in HTTP GET performance across different protocols and server configurations.
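
One way to check which address family each dataset actually contains, and to line the two up for comparison, is to classify the `address` column with Python's standard `ipaddress` module. This is a sketch on the four sample rows only; the helper `ip_version` and the `source` label column are introduced here for illustration.

```python
import ipaddress

import pandas as pd


def ip_version(address: str) -> int:
    """Return 4 or 6 depending on the address family of the string."""
    return ipaddress.ip_address(address).version


# Sample rows from the two head(2) outputs above.
ipv4_sample = pd.DataFrame({
    "address": ["151.139.31.1", "151.139.31.1"],
    "bytes_sec": [25627789, 25454972],
})
ipv6_sample = pd.DataFrame({
    "address": ["2604:6840:1600:1501::1"] * 2,
    "bytes_sec": [58733938, 61288590],
})

# Stack both datasets and remember which file each row came from.
combined = pd.concat(
    [
        ipv4_sample.assign(source="curr_httpgetmt"),
        ipv6_sample.assign(source="curr_httpgetmt6"),
    ],
    ignore_index=True,
)
combined["ip_version"] = combined["address"].map(ip_version)

# How many rows of each address family does each file contain?
version_counts = combined.groupby(["source", "ip_version"]).size()

# Median download speed per address family, in megabits per second.
median_mbps = combined.groupby("ip_version")["bytes_sec"].median() * 8 / 1e6

print(version_counts)
print(median_mbps)
```

The sample rows come from different units (386 and 216748), so the speed gap between them says nothing about IPv4 versus IPv6 by itself. A fair comparison would restrict both datasets to units that appear in both.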

-----

# curr_httppostmt and curr_httppostmt6: Upload speed metrics



In [7]:
df_httppostmt = pd.read_csv('../data/raw/curr_httppostmt.csv')

In [8]:
df_httppostmt.head(2)

Unnamed: 0,unit_id,dtime,target,address,fetch_time,bytes_total,bytes_sec,bytes_sec_interval,warmup_time,warmup_bytes,sequence,threads,successes,failures
0,386,2023-02-02 01:51:29,sp1-vm-newyork-us.samknows.com,151.139.31.1,10000049,181888570,18188768,18188768,1500026,24164586,0,8,1,0
1,386,2023-02-02 11:47:11,sp1-vm-newyork-us.samknows.com,151.139.31.1,10000030,184537036,18453648,18453648,5000020,86765796,0,8,1,0


In [9]:
df_httppostmt6 = pd.read_csv('../data/raw/curr_httppostmt6.csv')

In [10]:
df_httppostmt6.head(2)

Unnamed: 0,unit_id,dtime,target,address,fetch_time,bytes_total,bytes_sec,bytes_sec_interval,warmup_time,warmup_bytes,sequence,threads,successes,failures
0,26419,2023-02-28 21:49:55,v6-n1-fcc-ashburn-us.samknows.com,2604:6840:1300:1501::24,10000123,25003274,2500297,2500297,5000056,12327382,0,8,1,0
1,26419,2023-02-28 21:50:23,v6-n1-fcc-ashburn-us.samknows.com,2604:6840:1300:1501::24,10000052,25013690,2501356,2501356,5000145,12339388,0,8,1,0


### curr_httppostmt.csv and curr_httppostmt6.csv

The datasets `curr_httppostmt.csv` and `curr_httppostmt6.csv` represent performance metrics for HTTP POST requests, focusing on upload speeds. Here's a comparison and description of these datasets:

1. **Purpose**:
    - Both datasets measure the performance of HTTP POST requests.
    - `curr_httppostmt.csv` holds the standard HTTP POST upload tests, while `curr_httppostmt6.csv` specifically targets IPv6-based HTTP POST requests.

2. **Target Servers**:
    - Both datasets include a `target` column that identifies the server being tested. However, `curr_httppostmt6.csv` focuses on IPv6 servers, as indicated by the `address` column containing IPv6 addresses.

3. **Performance Metrics**:
    - Both datasets provide similar performance metrics, such as:
      - `fetch_time`: Time taken to upload data.
      - `bytes_sec` and `bytes_sec_interval`: Upload speeds.
      - `warmup_time` and `warmup_bytes`: Metrics for the warmup phase.
    - These metrics allow for a detailed analysis of upload speeds, transfer durations, and warmup behavior.

4. **Thread Utilization**:
    - Both datasets include a `threads` column, which indicates the level of parallelism used during the HTTP POST requests.

5. **Success and Failure Rates**:
    - Both datasets track the number of successful (`successes`) and failed (`failures`) HTTP POST requests, enabling reliability analysis.

6. **Time-based Trends**:
    - The `dtime` column in both datasets allows for time-based trend analysis of performance metrics.

7. **Differences**:
    - The primary difference lies in the address family each dataset covers:
      - `curr_httppostmt.csv` appears to cover IPv4: both sample rows use an IPv4 `address`. Check the full `address` column before assuming it contains no IPv6 rows.
      - `curr_httppostmt6.csv` is exclusively for IPv6 performance analysis.

### Insights:
- The datasets are complementary and can be used together to compare IPv4 and IPv6 performance for HTTP POST requests.
- `curr_httppostmt.csv` provides the baseline upload measurements, while `curr_httppostmt6.csv` offers a focused analysis of IPv6-specific performance.
- By analyzing both datasets, you can identify trends, bottlenecks, and areas for optimization in HTTP POST performance across different protocols and server configurations.
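
Because the download and upload datasets share the same columns, they can be compared per unit. The sketch below uses the sample rows for unit 386 from `curr_httpgetmt` and `curr_httppostmt`; the names `down`, `up`, `per_unit`, and `down_up_ratio` are illustrative, and the median is just one reasonable choice of summary statistic.

```python
import pandas as pd

# bytes_sec for unit 386 from the curr_httpgetmt and curr_httppostmt samples.
down = pd.DataFrame({"unit_id": [386, 386], "bytes_sec": [25627789, 25454972]})
up = pd.DataFrame({"unit_id": [386, 386], "bytes_sec": [18188768, 18453648]})

# Median speed per unit for each direction, converted to megabits per second.
per_unit = pd.concat(
    {
        "download_mbps": down.groupby("unit_id")["bytes_sec"].median(),
        "upload_mbps": up.groupby("unit_id")["bytes_sec"].median(),
    },
    axis=1,
) * 8 / 1e6

# How many times faster the download is than the upload.
per_unit["down_up_ratio"] = per_unit["download_mbps"] / per_unit["upload_mbps"]

print(per_unit)
```

On the full data, replace `down` and `up` with `df_httpgetmt` and `df_httppostmt`; units present in only one dataset will show `NaN` in the other column.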

---