# analysing the speedtest-cli dataset

We're working in a [Jupyter notebook](https://jupyter.org/) running the [Python 3](https://www.python.org/) kernel and will be using [pandas](https://pandas.pydata.org/) as a data analysis library and [seaborn](https://seaborn.pydata.org/index.html) for advanced plots.

In [None]:
import pandas
import seaborn

from matplotlib import rcParams
seaborn.set(style="white", palette="muted", color_codes=True)

from matplotlib import pyplot as plt

%matplotlib inline

Import our CSV file as a pandas DataFrame. It covers 3 servers, 40 turns, and 5 measurements each:

In [None]:
data = pandas.read_csv("data/speedtest.csv", parse_dates=['Timestamp'])

Print the first couple of lines to get an overview.

In [None]:
# Note that Download and Upload are in bits per second (shown in scientific notation).
data.head()

Print the first measurement of each server.

In [None]:
data[:15:5]  # every 5th row of the first 15, i.e. rows 0, 5 and 10
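
The slice above relies on the round-robin capture order: with five consecutive measurements per server, rows 0, 5 and 10 are the first measurement of each of the three servers. A minimal sketch on a made-up stand-in frame (the sponsor names `A`, `B`, `C` and the `Ping` values are hypothetical, not taken from the dataset) compares the positional slice with a `groupby` alternative that does not depend on the block size:

```python
import pandas

# Hypothetical stand-in for the real data: 3 servers, 5 consecutive rows each.
sponsors = [name for name in ("A", "B", "C") for _ in range(5)]
toy = pandas.DataFrame({"Sponsor": sponsors, "Ping": range(15)})

# Positional slice: start at row 0, stop before row 15, step 5 -> rows 0, 5, 10.
by_slice = toy[:15:5]

# Alternative that ignores block size: the first row of every Sponsor group.
by_group = toy.groupby("Sponsor", sort=False).head(1)

print(by_slice)
print(by_group)
```

Both selections should pick the same three rows here. The `groupby` variant would keep working if the number of measurements per turn changed, while the slice would not.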

Split up the one big dataset into 3 sets by the Sponsor of the server.

In [None]:
upc = data[data.Sponsor == "UPC"]
drei = data[data.Sponsor == "www.drei.at"]
telekom = data[data.Sponsor == "A1 Telekom Austria AG"]

# describe() reports count, mean, standard deviation, min, quartiles and max:
upc[['Download', 'Upload']].describe()
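
As a possible alternative to one boolean mask per server, `groupby` can produce the same summary for every sponsor in a single call. The sketch below uses a tiny made-up frame: only the sponsor names come from this notebook, the `Download` values are hypothetical.

```python
import pandas

# Made-up stand-in rows; only the Sponsor names are taken from the notebook.
toy = pandas.DataFrame({
    "Sponsor": ["UPC", "UPC", "www.drei.at", "www.drei.at"],
    "Download": [4.0e7, 4.2e7, 3.8e7, 3.9e7],
})

# Boolean-mask selection, as in the cell above.
upc_toy = toy[toy.Sponsor == "UPC"]

# groupby computes the describe() statistics for every sponsor at once.
summary = toy.groupby("Sponsor")["Download"].describe()

print(upc_toy.Download.mean())
print(summary[["count", "mean", "min", "max"]])
```

The per-sponsor variables remain handy for the plots further down, so this is a complement rather than a replacement.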

Print round trip time info.

In [None]:
drei.Ping.describe()

In [None]:
# Print download rate info in mebibits/s.
upc.Download.apply(lambda x: x/2**20).describe()
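
This notebook uses two different units: mebibits/s (division by `2**20`) here and megabits/s (division by `1000**2`) in the box plot section. The two differ by about 4.9 %, so they should not be mixed in one plot. A small sketch of both conversions follows; the helper names and the sample rate are made up for illustration.

```python
# Conversion factors for a rate given in bits per second.
MEGABIT = 1000 ** 2   # SI prefix: 1 Mbit = 1,000,000 bits
MEBIBIT = 2 ** 20     # binary prefix: 1 Mibit = 1,048,576 bits


def to_megabits(bits_per_second):
    """Convert bits/s to megabits/s (Mbit/s)."""
    return bits_per_second / MEGABIT


def to_mebibits(bits_per_second):
    """Convert bits/s to mebibits/s (Mibit/s)."""
    return bits_per_second / MEBIBIT


# Hypothetical rate chosen for illustration, not a value from the dataset.
rate = 40_000_000
print(to_megabits(rate))   # 40.0
print(to_mebibits(rate))   # 38.14697265625
```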

## basic plotting of up- and download (round robin capture)

In [None]:
df = pandas.DataFrame({'upc down': upc.Download, 
                       'drei down': drei.Download, 
                       'telekom down': telekom.Download,
                       'upc up': upc.Upload, 
                       'drei up': drei.Upload, 
                       'telekom up': telekom.Upload,
                       'time': data.Timestamp})

df.plot(stacked=False, style='.-', figsize=(15,4), 
        x='time', title='basic plot');
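
One detail worth knowing about the frame built above: the per-server Series keep their original row labels, and the `DataFrame` constructor aligns them on the union of those labels. Every column therefore holds `NaN` wherever another server was being measured. A minimal sketch with two hypothetical servers and made-up values:

```python
import pandas

# Hypothetical stand-in: two servers measured one after the other.
toy = pandas.DataFrame({
    "Sponsor": ["A", "A", "B", "B"],
    "Download": [1.0, 2.0, 3.0, 4.0],
})
a = toy[toy.Sponsor == "A"].Download   # keeps row labels 0 and 1
b = toy[toy.Sponsor == "B"].Download   # keeps row labels 2 and 3

# The constructor aligns both Series on the union of their indexes,
# filling the rows a Series does not cover with NaN.
wide = pandas.DataFrame({"a down": a, "b down": b})
print(wide)
```

The plotting call simply leaves gaps at the `NaN` positions, which is why each server's line appears in short segments.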

## box and violin plot of download rate

In [None]:
# in megabits/s:

df = pandas.DataFrame({
            'upc down': upc.Download.apply(lambda x: x/1000**2),
            'drei down': drei.Download.apply(lambda x: x/1000**2),
            'telekom down': telekom.Download.apply(lambda x: x/1000**2)})

df.plot.box(figsize=(15,5));

In [None]:
fig = plt.figure(figsize=(15,5))
seaborn.violinplot(data=df);

## histogram of download rate

In [None]:
df = pandas.DataFrame({'upc down': upc.Download, 
                       'drei down': drei.Download, 
                       'telekom down': telekom.Download})


df.plot.hist(bins=16, stacked=True, figsize=(15,5), 
             title='low resolution, stacked');

df.plot.hist(alpha=.4, bins=64, stacked=False, figsize=(15,5),
             title='high resolution, unstacked');


In [None]:
fig = plt.figure(figsize=(15,5))
# kde=True overlays a kernel density estimate on the histogram.
seaborn.histplot(data.Download, 
                 bins=512, 
                 kde=True).set(xlim=(3.67e7, 4.94e7));