<a href="https://colab.research.google.com/github/andreaaraldo/machine-learning-for-networks/blob/master/04.neural_networks/neural_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import pandas as pd

# Use case description

The use case is from:

Khangura, S. K. (2019). Neural Network-based Available Bandwidth Estimation from TCP Sender-side Measurements. In IEEE/IFIP PEMWN.


**Goal**: Estimate the available bandwidth in a network via **passive measurements**.

More precisely:
_Estimate the capacity available to a TCP flow_ (which shares links with other flows) by observing:
* The time gaps between sent segments, $g_{\text{in}}$
* The time gaps between ACKs, $g_{\text{ack}}$
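To make the two quantities concrete, here is a minimal sketch of how such gaps could be derived from packet timestamps. The timestamps below are invented for illustration and the helper name `inter_packet_gaps` is hypothetical; this is not the authors' measurement code.

```python
import numpy as np

def inter_packet_gaps(timestamps):
    """Return the gaps between consecutive timestamps (same unit as the input)."""
    ts = np.asarray(timestamps, dtype=float)
    return np.diff(ts)

# Hypothetical timestamps in seconds, invented for illustration only
send_times = [0.000, 0.001, 0.002, 0.003]  # when segments were sent
ack_times = [0.010, 0.012, 0.014, 0.016]   # when the matching ACKs were seen

g_in = inter_packet_gaps(send_times)   # gaps between sent segments
g_ack = inter_packet_gaps(ack_times)   # gaps between ACKs
ratio = g_in / g_ack                   # element-wise ratio of the two gap series

print(g_in, g_ack, ratio)
```

Intuitively, ACK gaps that are wider than the send gaps suggest that the segments were spread out by queuing somewhere along the path.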

The authors set up the following testbed:


![alt text](https://raw.githubusercontent.com/andreaaraldo/machine-learning-for-networks/master/04.neural_networks/img/testbed.png)


Measurements are collected at the **Video Receiver**. All the other machines only generate cross-traffic.

Measurements are recorded with an Endace Data Acquisition and Generation (DAG) card, which timestamps every packet with very high precision.

![alt text](https://www.endace.com/assets/images/products/DAG%209.5G4F_angled_small.png)

([Producer website](https://www.endace.com/endace-high-speed-packet-capture-solutions/oem/dag/))

**Why**: Knowing the available bandwidth, video streaming clients can properly choose the quality level to request.

# Traces

Check the [description](https://www.ikt.uni-hannover.de/bandwidthestimationtraces.html) of the dataset.

In [2]:
!wget https://www.ikt.uni-hannover.de/fileadmin/institut/Forschung/BandwidthEstimationTraces/BandwidthEstimationTraces.zip

--2020-03-15 08:18:14--  https://www.ikt.uni-hannover.de/fileadmin/institut/Forschung/BandwidthEstimationTraces/BandwidthEstimationTraces.zip
Resolving www.ikt.uni-hannover.de (www.ikt.uni-hannover.de)... 130.75.2.72
Connecting to www.ikt.uni-hannover.de (www.ikt.uni-hannover.de)|130.75.2.72|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 941822 (920K) [application/zip]
Saving to: ‘BandwidthEstimationTraces.zip’


2020-03-15 08:18:17 (396 KB/s) - ‘BandwidthEstimationTraces.zip’ saved [941822/941822]



In [3]:
! unzip BandwidthEstimationTraces.zip

Archive:  BandwidthEstimationTraces.zip
   creating: BandwidthEstimationTraces/
   creating: BandwidthEstimationTraces/testing/
   creating: BandwidthEstimationTraces/testing/MultiLinkCapacity100/
   creating: BandwidthEstimationTraces/testing/MultiLinkCapacity100/50_et_100_bC_100_C_3_delta_multihop3/
  inflating: BandwidthEstimationTraces/testing/MultiLinkCapacity100/50_et_100_bC_100_C_3_delta_multihop3/50_et_100_bC_100_C_3_delta_mh3_1.csv  
  inflating: BandwidthEstimationTraces/testing/MultiLinkCapacity100/50_et_100_bC_100_C_3_delta_multihop3/50_et_100_bC_100_C_3_delta_mh3_10.csv  
  inflating: BandwidthEstimationTraces/testing/MultiLinkCapacity100/50_et_100_bC_100_C_3_delta_multihop3/50_et_100_bC_100_C_3_delta_mh3_100.csv  
  inflating: BandwidthEstimationTraces/testing/MultiLinkCapacity100/50_et_100_bC_100_C_3_delta_multihop3/50_et_100_bC_100_C_3_delta_mh3_11.csv  
  inflating: BandwidthEstimationTraces/testing/MultiLinkCapacity100/50_et_100_bC_100_C_3_delta_multihop3/50_et_100_bC

In [4]:
!ls BandwidthEstimationTraces

testing  training


The training and test datasets are stored in separate directories.

In [5]:
! ls BandwidthEstimationTraces/training

MultiLinkCapacity100   TightLinkafterBottleneckLink
SingleLinkCapacity100  TightLinkbeforeBottleneckLink
SingleLinkCapacity50


For simplicity, we will only consider the case with a single link between client and server, with total capacity $C$=100 Mbps (at the Ethernet level).

In [6]:
! ls BandwidthEstimationTraces/training/SingleLinkCapacity100

25_et_100_C_5_delta  50_et_100_C_5_delta  75_et_100_C_5_delta


There are three sets of traces:
* With cross traffic rate $\lambda$=25 Mbps
* With cross traffic rate $\lambda$=50 Mbps
* With cross traffic rate $\lambda$=75 Mbps

All rates are expressed at the Ethernet level.
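As a rough sketch, the cross-traffic rate and the capacity can be read off the directory names, and the available bandwidth of a single link then follows from the usual definition $A = C - \lambda$. The naming layout `<lambda>_et_<C>_C_..._delta` is an assumption inferred from the listing above, and `parse_trace_dir` is a hypothetical helper, not part of the dataset tooling.

```python
def parse_trace_dir(name):
    """Parse a single-link directory name such as '25_et_100_C_5_delta'.

    ASSUMPTION: the layout is <cross_traffic>_et_<capacity>_C_..._delta,
    with both rates in Mbps. This is inferred from the names, not documented here.
    """
    parts = name.split("_")
    if len(parts) < 4 or parts[1] != "et" or parts[3] != "C":
        raise ValueError(f"Unexpected directory name: {name!r}")
    cross_traffic = float(parts[0])
    capacity = float(parts[2])
    return {
        "cross_traffic_mbps": cross_traffic,
        "capacity_mbps": capacity,
        # Single-link available bandwidth: capacity minus cross-traffic
        "available_mbps": capacity - cross_traffic,
    }

for d in ["25_et_100_C_5_delta", "50_et_100_C_5_delta", "75_et_100_C_5_delta"]:
    print(d, parse_trace_dir(d))
```

The check on the `et` and `C` tokens makes the helper fail loudly on names that follow a different layout, such as the multi-link directories.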

In [7]:
! ls BandwidthEstimationTraces/training/SingleLinkCapacity100/25_et_100_C_5_delta

25_et_100_C_5_delta_100.csv  25_et_100_C_5_delta_55.csv
25_et_100_C_5_delta_10.csv   25_et_100_C_5_delta_56.csv
25_et_100_C_5_delta_11.csv   25_et_100_C_5_delta_57.csv
25_et_100_C_5_delta_12.csv   25_et_100_C_5_delta_58.csv
25_et_100_C_5_delta_13.csv   25_et_100_C_5_delta_59.csv
25_et_100_C_5_delta_14.csv   25_et_100_C_5_delta_5.csv
25_et_100_C_5_delta_15.csv   25_et_100_C_5_delta_60.csv
25_et_100_C_5_delta_16.csv   25_et_100_C_5_delta_61.csv
25_et_100_C_5_delta_17.csv   25_et_100_C_5_delta_62.csv
25_et_100_C_5_delta_18.csv   25_et_100_C_5_delta_63.csv
25_et_100_C_5_delta_19.csv   25_et_100_C_5_delta_64.csv
25_et_100_C_5_delta_1.csv    25_et_100_C_5_delta_65.csv
25_et_100_C_5_delta_20.csv   25_et_100_C_5_delta_66.csv
25_et_100_C_5_delta_21.csv   25_et_100_C_5_delta_67.csv
25_et_100_C_5_delta_22.csv   25_et_100_C_5_delta_68.csv
25_et_100_C_5_delta_23.csv   25_et_100_C_5_delta_69.csv
25_et_100_C_5_delta_24.csv   25_et_100_C_5_delta_6.csv
25_et_100_C_5_delta_25.csv   25_et_100_C_5_delta_7

Every experiment is repeated 100 times.
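Since `ls` lists the files in lexicographic order (`_100` comes before `_10`), it can be convenient to sort the repetitions by their trailing number before loading them. The following is a minimal sketch under that assumption; `list_repetitions` and `load_repetitions` are hypothetical helper names.

```python
import glob
import os
import re

import pandas as pd

def list_repetitions(directory):
    """Return the CSV paths in `directory`, sorted by trailing repetition number."""
    def repetition(path):
        match = re.search(r"_(\d+)\.csv$", os.path.basename(path))
        return int(match.group(1)) if match else -1
    return sorted(glob.glob(os.path.join(directory, "*.csv")), key=repetition)

def load_repetitions(directory):
    """Load every repetition as a header-less DataFrame, in numeric order."""
    return [pd.read_csv(path, header=None) for path in list_repetitions(directory)]

trace_dir = "BandwidthEstimationTraces/training/SingleLinkCapacity100/25_et_100_C_5_delta"
if os.path.isdir(trace_dir):  # only runs once the archive has been unzipped
    traces = load_repetitions(trace_dir)
    print(len(traces))
```

Keeping the repetitions in a list preserves their order, so `traces[0]` corresponds to the file ending in `_1.csv`.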

In [8]:
help(pd.read_csv)

Help on function read_csv in module pandas.io.parsers:

read_csv(filepath_or_buffer:Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
    Read a comma-separated values (csv) file into Data

In [9]:
filename = "BandwidthEstimationTraces/training/SingleLinkCapacity100/25_et_100_C_5_delta/25_et_100_C_5_delta_1.csv"
df = pd.read_csv(filename, header=None)
df.head()

|   | 0 | 1 |
|---|---|---|
| 0 | 100.0 | 75.0 |
| 1 | 0.99999 | 4.9906 |
| 2 | 0.9994 | 9.9444 |
| 3 | 0.99939 | 14.991 |
| 4 | 0.99981 | 19.988 |
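The first row (100.0, 75.0) looks different from the rows that follow. One plausible reading, to be checked against the dataset description linked in the Traces section, is that it stores the capacity and the available bandwidth in Mbps, which would be consistent with $C$=100 and $\lambda$=25. Under that assumption, a minimal sketch for separating it from the remaining rows could look as follows; `split_trace` is a hypothetical helper.

```python
import pandas as pd

def split_trace(df):
    """Split a trace into (first_row, remaining_rows).

    ASSUMPTION: the first row holds per-experiment constants and the
    remaining rows hold the measurements. Verify against the dataset description.
    """
    first_row = df.iloc[0]
    measurements = df.iloc[1:].reset_index(drop=True)
    return first_row, measurements

# Same values as in the head() output above
df_example = pd.DataFrame([
    [100.0, 75.0],
    [0.99999, 4.9906],
    [0.9994, 9.9444],
    [0.99939, 14.991],
    [0.99981, 19.988],
])

first_row, measurements = split_trace(df_example)
print(first_row.tolist())
print(measurements)
```

Resetting the index keeps the measurement rows numbered from 0, which avoids off-by-one confusion later on.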
