# Ping data exploration

## How data is collected
Ping data is collected every 5 seconds by sending a single ping to 8.8.8.8.

**Example**:
>$ ping -c 1 8.8.8.8  
>PING 8.8.8.8 (8.8.8.8): 56 data bytes  
>64 bytes from 8.8.8.8: icmp_seq=0 ttl=119 **time=22.985 ms**  
>--- 8.8.8.8 ping statistics ---  
>1 packets transmitted, 1 packets received, **0.0% packet loss**  
>round-trip min/**avg**/max/**stddev** = 22.985/22.985/**22.985**/**0.000** ms  

Data being collected:
 - **Ping latency** (in milliseconds; 22.985 ms in the example)
 - **Percentage of packets dropped** (0.0% in the example; because each measurement is a single ping, it can only be 0% or 100%)
 - **Standard deviation (of latency)**: always 0 for single pings
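A minimal sketch of how the three values above could be extracted from the text output of a single `ping -c 1`. The function `parse_single_ping` is hypothetical and is not the collector used for this data. The regular expressions assume the BSD/macOS output format from the example. Linux prints `rtt min/avg/max/mdev` instead, which the summary pattern also accepts.

```python
import re
from typing import Optional

# "time=22.985 ms" on the reply line (absent when the packet is lost)
TIME_RE = re.compile(r"time=([\d.]+) ms")
# "0.0% packet loss" (BSD/macOS) or "0% packet loss" (Linux)
LOSS_RE = re.compile(r"([\d.]+)% packet loss")
# "min/avg/max/stddev = a/b/c/d ms" (BSD/macOS) or ".../mdev = ..." (Linux)
SUMMARY_RE = re.compile(r"min/avg/max/(?:stddev|mdev) = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_single_ping(output: str) -> dict:
    """Extract latency, packet loss and stddev from the output of `ping -c 1`."""
    loss = LOSS_RE.search(output)
    if loss is None:
        raise ValueError("no packet-loss line in ping output")
    reply = TIME_RE.search(output)
    summary = SUMMARY_RE.search(output)
    # No reply line means the single packet was lost, so there is no latency
    latency: Optional[float] = float(reply.group(1)) if reply else None
    # With a single packet the standard deviation is always 0
    stddev: Optional[float] = float(summary.group(4)) if summary else None
    return {"latency_ms": latency, "packet_loss_pct": float(loss.group(1)), "stddev_ms": stddev}


example = """PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=119 time=22.985 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 22.985/22.985/22.985/0.000 ms"""

print(parse_single_ping(example))
# {'latency_ms': 22.985, 'packet_loss_pct': 0.0, 'stddev_ms': 0.0}
```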


## What the data looks like
Load libraries:

In [1]:
from data_exploration import *

In [2]:
import json
import pyodbc

def connect_to_mssql():
    """Open a connection to the net_speed_md database on MS SQL Server via the FreeTDS ODBC driver."""
    # Host and password are kept out of the notebook in a credentials file
    with open('../credentials.json', 'r') as f_credentials:
        credentials_config = json.load(f_credentials)
    password = credentials_config['mssql_password']
    srv = credentials_config['mssql_host']
    connection = pyodbc.connect(driver='/usr/local/lib/libtdsodbc.so', server=srv, port='1433',
                                database='net_speed_md', uid='cybera_sql', pwd=password)
    return connection

Set up the time interval for the analysis:

In [3]:
time_interval = '4w'  # InfluxQL duration literal, e.g. '5d' for 5 days

In [4]:
# Set up the starting point; by default it is the current time
starting_point = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
# starting_point = "2019-01-10 14:00:00"  # uncomment to use an alternative starting point
print("Starting point:", starting_point)

Starting point: 2019-02-04 16:54:37
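The queries below filter with InfluxQL time arithmetic such as `time >= '<starting_point>' - 4w`. A small sketch of the same window computed in plain Python, which may help when labelling plots or cross-checking the MS SQL side. `window_start` is a hypothetical helper, and it only handles the `s`, `m`, `h`, `d` and `w` duration units.

```python
from datetime import datetime, timedelta

# Seconds per InfluxQL duration unit (only a subset: s, m, h, d, w)
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def window_start(starting_point: str, time_interval: str) -> str:
    """Return starting_point minus time_interval (e.g. '4w' or '5d')."""
    value, unit = int(time_interval[:-1]), time_interval[-1]
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration unit: {unit!r}")
    start = datetime.strptime(starting_point, "%Y-%m-%d %H:%M:%S")
    earliest = start - timedelta(seconds=value * UNIT_SECONDS[unit])
    return earliest.strftime("%Y-%m-%d %H:%M:%S")


print(window_start("2019-02-04 16:54:37", "4w"))  # 2019-01-07 16:54:37
```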


Set up the InfluxDB connection:

In [5]:
client, client_df = connect_to_influxdb()

Check the last 10 records in the PING measurement to see what the data looks like:

In [6]:
query_ping = 'SELECT PING,SK_PI FROM PING ORDER BY time DESC LIMIT 10;'
ping_df = get_dataframe_from_influxdb(client_df=client_df,query_influx=query_ping,table_name='PING')
ping_df

| | time | PING | SK_PI |
|---|---|---|---|
| 6 | 2019-02-02 17:59:57-06:00 | 14.746 | 3 |
| 2 | 2019-02-02 17:59:56-06:00 | 38.336 | 5 |
| 5 | 2019-02-02 17:59:57-06:00 | 27.604 | 11 |
| 0 | 2019-02-02 17:59:55-06:00 | 24.877 | 12 |
| 1 | 2019-02-02 17:59:55-06:00 | 35.105 | 15 |
| 7 | 2019-02-02 17:59:58-06:00 | 38.158 | 16 |
| 8 | 2019-02-02 17:59:58-06:00 | 91.81 | 19 |
| 4 | 2019-02-02 17:59:57-06:00 | 52.81 | 20 |
| 9 | 2019-02-02 17:59:58-06:00 | 270.969 | 21 |
| 3 | 2019-02-02 17:59:56-06:00 | 47.418 | 22 |


Let's look at a single device, for example device 3:

In [7]:
query_ping = "SELECT PING,SK_PI FROM PING WHERE SK_PI='3' ORDER BY time  DESC LIMIT 10 ;"
ping_df = get_dataframe_from_influxdb(client_df=client_df,query_influx=query_ping,table_name='PING')
ping_df

| | time | PING | SK_PI |
|---|---|---|---|
| 0 | 2019-02-02 17:59:13-06:00 | 14.736 | 3 |
| 1 | 2019-02-02 17:59:17-06:00 | 14.781 | 3 |
| 2 | 2019-02-02 17:59:23-06:00 | 14.738 | 3 |
| 3 | 2019-02-02 17:59:27-06:00 | 14.824 | 3 |
| 4 | 2019-02-02 17:59:33-06:00 | 14.696 | 3 |
| 5 | 2019-02-02 17:59:37-06:00 | 14.755 | 3 |
| 6 | 2019-02-02 17:59:43-06:00 | 14.875 | 3 |
| 7 | 2019-02-02 17:59:47-06:00 | 14.677 | 3 |
| 8 | 2019-02-02 17:59:53-06:00 | 14.856 | 3 |
| 9 | 2019-02-02 17:59:57-06:00 | 14.746 | 3 |


Let's compare this with what we have in the MS SQL database:

In [8]:
cnxn = connect_to_mssql()
sql = "SELECT TOP 10 DATA_DATE, PING, SK_PI,CONNTRACK FROM FCT_PI WHERE SK_PI='3' ORDER BY DATA_DATE DESC;"
pd.read_sql(sql,cnxn)

| | DATA_DATE | PING | SK_PI | CONNTRACK |
|---|---|---|---|---|
| 0 | 2019-02-03 23:59:59.400 | 14.771 | 3 | 18 |
| 1 | 2019-02-03 23:59:57.397 | | 3 | 18 |
| 2 | 2019-02-03 23:59:55.397 | | 3 | 18 |
| 3 | 2019-02-03 23:59:53.393 | 14.827 | 3 | 18 |
| 4 | 2019-02-03 23:59:51.390 | | 3 | 18 |
| 5 | 2019-02-03 23:59:49.390 | 14.829 | 3 | 18 |
| 6 | 2019-02-03 23:59:47.387 | | 3 | 18 |
| 7 | 2019-02-03 23:59:45.387 | | 3 | 18 |
| 8 | 2019-02-03 23:59:43.383 | 14.806 | 3 | 18 |
| 9 | 2019-02-03 23:59:41.380 | | 3 | 18 |


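Pings arrive every 4-6 seconds. The MS SQL table has a row roughly every 2 seconds, so PING is filled in only in every second or third row.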
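The spacing can be checked directly from the timestamps. A sketch with pandas, using the device 3 timestamps from the InfluxDB output above (timezone offsets dropped for brevity). In the notebook, the same `diff()` could be applied to the `time` column of `ping_df`, assuming that column holds datetimes.

```python
import pandas as pd

# Device 3 timestamps copied from the InfluxDB query output above
times = pd.to_datetime([
    "2019-02-02 17:59:13", "2019-02-02 17:59:17", "2019-02-02 17:59:23",
    "2019-02-02 17:59:27", "2019-02-02 17:59:33", "2019-02-02 17:59:37",
    "2019-02-02 17:59:43", "2019-02-02 17:59:47", "2019-02-02 17:59:53",
    "2019-02-02 17:59:57",
])

# Seconds between consecutive pings (the first diff is NaN and is dropped)
intervals = pd.Series(times).diff().dt.total_seconds().dropna()
print(intervals.min(), intervals.max())  # 4.0 6.0
```

For the MS SQL result, rows with an empty PING would have to be removed first, for example with `df.dropna(subset=["PING"])`.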

Check which values PING_DROPRATE (the percentage of packets dropped) takes:

In [9]:
query_ping = "SELECT DISTINCT(PING_DROPRATE) from PING;"
ping_df = get_dataframe_from_influxdb(client_df=client_df,query_influx=query_ping,table_name='PING')
ping_df

| | time | distinct |
|---|---|---|
| 0 | 1969-12-31 18:00:00-06:00 | 0 |
| 1 | 1969-12-31 18:00:00-06:00 | 100 |


Let's see what value the *PING* field has when *PING_DROPRATE* is equal to 100:

In [10]:
query_ping = "SELECT PING, PING_DROPRATE FROM PING WHERE PING_DROPRATE=100 ORDER BY time DESC LIMIT 5;"
ping_df = get_dataframe_from_influxdb(client_df=client_df,query_influx=query_ping,table_name='PING')
ping_df

| | time | PING | PING_DROPRATE |
|---|---|---|---|
| 0 | 2019-02-02 17:56:24-06:00 | 0 | 100 |
| 1 | 2019-02-02 17:57:14-06:00 | 0 | 100 |
| 2 | 2019-02-02 17:58:28-06:00 | 0 | 100 |
| 3 | 2019-02-02 17:59:34-06:00 | 0 | 100 |
| 4 | 2019-02-02 17:59:48-06:00 | 0 | 100 |


It is important to note that *PING* is recorded as 0 when PING_DROPRATE=100, so zero values have to be excluded when computing latency statistics.
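A short pandas sketch of why this matters. The rows combine five delivered pings from the "last 10 records" output (their PING_DROPRATE is assumed to be 0, since that query did not select it) with the five dropped pings shown just above. Averaging without a filter treats dropped pings as 0 ms replies.

```python
import pandas as pd

# Five delivered pings (drop rate assumed 0) and five dropped pings (PING == 0)
df = pd.DataFrame({
    "PING": [14.746, 38.336, 27.604, 24.877, 35.105, 0, 0, 0, 0, 0],
    "PING_DROPRATE": [0, 0, 0, 0, 0, 100, 100, 100, 100, 100],
})

naive_mean = df["PING"].mean()  # dropped pings pull the average down
latency_mean = df.loc[df["PING_DROPRATE"] == 0, "PING"].mean()  # delivered pings only
print(round(naive_mean, 4), round(latency_mean, 4))
```

The same filter in InfluxQL is `WHERE PING != 0` (or `PING > 0`), which the latency queries below use.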

## Number of datapoints per device

Get the device numbers (values of the SK_PI tag):

In [11]:
device_numbers=get_tag_values_influxdb(client_influx=client,table_name='PING', tag_name='SK_PI')
device_numbers=list(map(int, device_numbers))
device_numbers= sorted(device_numbers)
print(device_numbers)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]


Get the number of data points per device for the entire period.

In [12]:
query_ping_counts = "SELECT COUNT(PING) FROM PING WHERE time <= '"+starting_point+"' GROUP BY SK_PI;"
ping_counts=get_stats_influxdb(client_influx=client,
                               query_influx=query_ping_counts,
                               stat_name='count',
                               device_numbers=device_numbers)

Plot the number of data points for each device.

In [13]:
simple_bar_plot(xvalues=device_numbers,
                yvalues=ping_counts,
                name="ping datapoints",
                title="Number of data points per device to the date "+ starting_point,
                ytitle="Number of datapoints")

Some devices have a small number of data points; maybe they were installed only recently? Let's check how many data points arrived in the last 4 weeks.

Get the number of data points per device in the last 4 weeks, and the list of devices that have data for this period (not all of them do).

In [14]:
#query_ping_counts_dec = "SELECT COUNT(PING) FROM PING WHERE time >= '2018-12-01 00:00:00' GROUP BY SK_PI ;"
query_ping_counts_time = "SELECT COUNT(PING) FROM PING WHERE time >= '"+starting_point+"'-"+time_interval+" GROUP BY SK_PI ;"
ping_counts_time = get_stats_influxdb(client_influx=client,
                                      query_influx=query_ping_counts_time,
                                      stat_name='count',
                                      device_numbers=device_numbers)
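`get_stats_influxdb` comes from `data_exploration`, and its body is not shown here. A grouped InfluxQL result only contains series for devices that have data in the window, so the counts have to be lined up with `device_numbers`, with 0 for the missing devices (as the printed count lists further below suggest). A hypothetical sketch of that alignment step, assuming the grouped result has already been turned into a `{device: count}` dict:

```python
def align_to_devices(counts_by_device: dict, device_numbers: list, default: int = 0) -> list:
    """Return one value per device, in device_numbers order; missing devices get `default`."""
    return [counts_by_device.get(device, default) for device in device_numbers]


# Hypothetical grouped result in which only devices 3 and 5 returned a series
print(align_to_devices({3: 120, 5: 80}, [1, 2, 3, 4, 5]))
# [0, 0, 120, 0, 80]
```

Keeping the lists aligned this way is what makes expressions like `[a - b for a, b in zip(ping_counts, ping_counts_time)]` valid element by element.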

Plot a combined bar chart: number of data points in the last 4 weeks vs. the rest of the time.

In [15]:
combined_bar_plot_2traces(xvalues=device_numbers,
                          yvalues1=ping_counts_time,
                          yvalues2=[a - b for a, b in zip(ping_counts, ping_counts_time)],
                          name1='Last '+time_interval,
                          name2='The rest of the time',
                          title="Comparing number of datapoints in last "+time_interval+" vs entire time" + " starting from "+ starting_point,
                          ytitle="Number of datapoints")

Devices 1, 2, 4, 5, 6, 8, and possibly 13 and 14 need to be double-checked: it looks like they started reporting and then stopped. Let's check the last reporting time for every device.

In [16]:
query_ping_last = "SELECT LAST(PING), time FROM PING WHERE time <= '"+starting_point+"' GROUP BY SK_PI;"
result_ping_last=get_stats_influxdb(client_influx=client,
                               query_influx=query_ping_last,
                               stat_name='time',
                               device_numbers=device_numbers)

In [17]:
query_ping_first = "SELECT FIRST(PING), time FROM PING WHERE time <= '"+starting_point+"' GROUP BY SK_PI;"
result_ping_first=get_stats_influxdb(client_influx=client,
                               query_influx=query_ping_first,
                               stat_name='time',
                               device_numbers=device_numbers)

In [18]:
print("Collectd reporting times:")
data = []
for i, device in enumerate(device_numbers):
    # Normalise the InfluxDB timestamps; devices without data (or unparsable values) become None
    try:
        result_ping_first[i] = dateutil.parser.parse(result_ping_first[i]).strftime('%Y-%m-%d %H:%M:%S')
    except (TypeError, ValueError, OverflowError):
        result_ping_first[i] = None
    try:
        result_ping_last[i] = dateutil.parser.parse(result_ping_last[i]).strftime('%Y-%m-%d %H:%M:%S')
    except (TypeError, ValueError, OverflowError):
        result_ping_last[i] = None
    print("Device: ", device, "  was reporting from ", result_ping_first[i], " to ", result_ping_last[i])
    # One horizontal segment per device, from its first to its last report
    trace = go.Scatter(x=[result_ping_first[i], result_ping_last[i]], y=[device, device],
                       name=str(device), marker=dict(color=colors[i]))
    data.append(trace)
layout = dict(title="Device reporting times (collectd)", xaxis=dict(title="Time"),
              yaxis=dict(title="Device Number"))
fig = go.Figure(data=data, layout=layout)
iplot(fig)

Collectd reporting times:
Device:  1   was reporting from  2018-10-10 14:16:15  to  2018-10-11 15:46:31
Device:  2   was reporting from  2018-10-11 15:27:46  to  2018-11-06 19:16:32
Device:  3   was reporting from  2018-10-11 15:27:46  to  2019-02-02 23:59:57
Device:  4   was reporting from  2018-10-11 15:27:46  to  2018-12-05 22:53:02
Device:  5   was reporting from  2018-10-11 15:27:46  to  2019-02-02 23:59:56
Device:  6   was reporting from  2018-10-11 15:27:46  to  2018-10-31 14:55:00
Device:  7   was reporting from  2018-11-11 00:00:03  to  2019-02-02 23:59:54
Device:  8   was reporting from  2018-10-11 15:27:46  to  2019-01-14 14:27:00
Device:  9   was reporting from  2018-10-11 15:27:46  to  2019-02-02 23:59:54
Device:  10   was reporting from  2018-10-11 15:27:46  to  2018-12-31 04:39:59
Device:  11   was reporting from  2018-10-11 15:27:46  to  2019-02-02 23:59:57
Device:  12   was reporting from  2018-10-11 15:27:46  to  2019-02-02 23:59:55
Device:  13   was reporting from  2

Something is happening with devices 1, 2, 4, 5, 6, 8, 13, and 14: they stopped reporting. Does this need to be investigated?
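A sketch of how silent devices could be flagged automatically from the last reporting times. The timestamps are copied from the printed output above; only devices 1 to 12 are included because the output is cut off at device 13. The one-day threshold is an arbitrary assumption. The reference time is the most recent timestamp in the data rather than `starting_point`, because the InfluxDB data ends on 2019-02-02 while `starting_point` is 2019-02-04.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

FMT = "%Y-%m-%d %H:%M:%S"

# Last reporting times copied from the printed output above (devices 1-12)
last_report: Dict[int, Optional[str]] = {
    1: "2018-10-11 15:46:31", 2: "2018-11-06 19:16:32", 3: "2019-02-02 23:59:57",
    4: "2018-12-05 22:53:02", 5: "2019-02-02 23:59:56", 6: "2018-10-31 14:55:00",
    7: "2019-02-02 23:59:54", 8: "2019-01-14 14:27:00", 9: "2019-02-02 23:59:54",
    10: "2018-12-31 04:39:59", 11: "2019-02-02 23:59:57", 12: "2019-02-02 23:59:55",
}


def stale_devices(last_report: Dict[int, Optional[str]], reference: str,
                  max_silence: timedelta = timedelta(days=1)) -> List[int]:
    """Devices whose last report is more than max_silence before reference."""
    ref = datetime.strptime(reference, FMT)
    return sorted(
        device for device, ts in last_report.items()
        # None means the timestamp could not be parsed; skip those devices
        if ts is not None and ref - datetime.strptime(ts, FMT) > max_silence
    )


# ISO-style strings compare chronologically, so max() gives the latest report
print(stale_devices(last_report, max(last_report.values())))
# [1, 2, 4, 6, 8, 10]
```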

## Ping latency 

What is a normal ping latency? From [this article](https://www.pingman.com/kb/article/what-s-normal-for-latency-and-packet-loss-42.html):

>There are two normal factors that significantly influence the latency of a consumer device (like a cable modem, dsl modem or dial-up modem).

>The latency of the connecting device. For a cable modem, this can normally be between 5 and 40 ms. For a DSL modem this is normally 10 to 70ms. For a dial-up modem, this is normally anywhere from 100 to 220ms. For a cellular link, this can be from 200 to 600 ms. For a T1, this is normally 0 to 10 ms.

>The distance the data is traveling. Data travels at (very roughly) 120,000 miles (or 192,000 kilometers) per second, or 120 miles (192 km) per ms (millisecond) over a network connection. With traceroute, we have to send the data there and back again, so roughly 1 ms of latency is added for every 60 miles (96km, although with the level of accuracy we're using here, we should say '100km') of distance between you and the target.

>Connecting to a web site across 1500 miles (2400 km) of distance is going to add at least 25 ms to the latency. Normally, it's more like 75 after the data zig-zags around a bit and goes through numerous routers.

>This means that a DSL modem on the west coast of the United States, tracing to a server on the east coast of the United States should expect somewhere around 120 ms (depending on the route and a number of other factors, but this is a rough ballpark) - 25 ms for the DSL modem and 100 ms for the distance. Tracing across an ocean, or through a satellite link, or some other link where the distance is further will certainly impact the expected latency more.
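The rule of thumb quoted above can be written as a one-line estimate: access-link latency plus about 1 ms of round trip per 96 km. This is only a sketch of the quoted arithmetic and a lower bound; as the quote notes, real routes often add much more (about 75 ms instead of 25 ms over 2400 km). The function name is hypothetical.

```python
KM_PER_MS_ROUND_TRIP = 96  # ~1 ms of round-trip latency per 96 km (60 miles)


def expected_latency_ms(device_latency_ms: float, distance_km: float) -> float:
    """Rule-of-thumb lower bound: access-link latency plus round-trip propagation delay."""
    return device_latency_ms + distance_km / KM_PER_MS_ROUND_TRIP


# 25 ms DSL modem, 2400 km to the target: 25 ms + 2400 / 96 ms
print(expected_latency_ms(25, 2400))  # 50.0
```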



Let's check the actual ping latencies in the last 4 weeks, excluding zero values (dropped pings):

In [19]:
#query_ping_max_dec = "SELECT MAX(PING), MEAN(PING) FROM PING WHERE time >= '2018-12-01 00:00:00' GROUP BY SK_PI;"
#time_interval='4w'
query_ping_stats_time = "SELECT MAX(PING), MEAN(PING), MEDIAN(PING) FROM PING WHERE PING!=0 AND time >= '"+starting_point+"'-"+\
                         time_interval+" GROUP BY SK_PI;"
result_ping_stats=get_3_stats_influxdb(client_influx=client,
                                       query_influx=query_ping_stats_time,
                                       stat_name1='max',
                                       stat_name2='mean',
                                       stat_name3='median',
                                       device_numbers=device_numbers)

Plot the maximum, mean, and median ping latency of every reporting device over the selected time interval.

In [20]:
combined_bar_plot_3traces(xvalues=result_ping_stats["SK_PI"],
                         yvalues1=result_ping_stats["max"],
                         yvalues2=result_ping_stats["mean"],
                         yvalues3=result_ping_stats["median"],
                         name1="Max",
                         name2="Mean",
                         name3="Median",
                         title="Maximum, mean and median ping delay in milliseconds for the last "+time_interval+ " starting from "+ starting_point,
                         stack=False)

Device 16 has the highest mean and the highest maximum. Devices 7, 10, 12, 17, and 18 have large ping delays as well.

### Ping latency grouped by duration
Let's divide ping latencies into 3 groups:
 - group1: 0-40 ms delay
 - group2: 40-100 ms delay
 - group3: more than 100 ms delay

Then count the pings in each group for every device; dropped pings (recorded as PING = 0) are excluded:
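The cells below plot absolute counts per group. A sketch of the percentage version with pandas, assuming the raw `SK_PI` and `PING` values for the window are available as a DataFrame (for example from a `SELECT PING, SK_PI FROM PING WHERE time >= ...` query). The bins are right-closed, so a float latency such as 40.5 ms falls into exactly one group. The example rows are the "last 10 records" from earlier, one ping per device.

```python
import pandas as pd


def latency_group_percentages(df: pd.DataFrame, range1: int = 40, range2: int = 100) -> pd.DataFrame:
    """Percentage of each device's delivered pings that falls into each latency group."""
    delivered = df[df["PING"] > 0]  # PING == 0 marks a dropped ping
    # Right-closed bins: (0, range1], (range1, range2], (range2, inf)
    groups = pd.cut(
        delivered["PING"],
        bins=[0, range1, range2, float("inf")],
        labels=[f"0-{range1} ms", f"{range1}-{range2} ms", f">{range2} ms"],
    ).astype(str)
    # normalize="index" divides each device's counts by that device's total
    return pd.crosstab(delivered["SK_PI"], groups, normalize="index") * 100


# One ping per device, copied from the "last 10 records" output above
df = pd.DataFrame({
    "SK_PI": [3, 5, 11, 12, 15, 16, 19, 20, 21, 22],
    "PING": [14.746, 38.336, 27.604, 24.877, 35.105, 38.158, 91.81, 52.81, 270.969, 47.418],
})
print(latency_group_percentages(df))
```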

In [21]:
range1=40
range2=100
#time_interval='4w'

In [22]:
# Bounds are exclusive below and inclusive above, so that float latencies
# such as 40.5 ms fall into exactly one group
query_ping_group1 = "SELECT COUNT(PING) FROM PING WHERE PING > 0 AND PING <= "+str(range1)+\
                   " AND time >= '"+starting_point+"'-"+time_interval+" GROUP BY SK_PI;"
ping_group1_time  = get_stats_influxdb(client_influx=client,
                                      query_influx=query_ping_group1,
                                      stat_name='count',
                                      device_numbers=device_numbers)

In [23]:
query_ping_group2 = "SELECT COUNT(PING) FROM PING WHERE PING > "+str(range1)+\
                   " AND PING <= "+str(range2)+" AND time >= '"+starting_point+"'-"+time_interval+" GROUP BY SK_PI;"
ping_group2_time = get_stats_influxdb(client_influx=client,
                                      query_influx=query_ping_group2,
                                      stat_name='count',
                                      device_numbers=device_numbers)

In [24]:
query_ping_group3 = "SELECT COUNT(PING) FROM PING WHERE PING > "+str(range2)+" AND time >= '"+starting_point+"'-"+\
                   time_interval+" GROUP BY SK_PI;"
ping_group3_time = get_stats_influxdb(client_influx=client,
                                      query_influx=query_ping_group3,
                                      stat_name='count',
                                      device_numbers=device_numbers)

In [25]:
combined_bar_plot_3traces(xvalues=device_numbers,
                         yvalues1=ping_group1_time,
                         yvalues2=ping_group2_time,
                         yvalues3=ping_group3_time,
                         name1="Number of pings with 0-"+str(range1)+" ms delay",
                         name2="Number of pings with "+str(range1)+"-"+str(range2)+" ms delay",
                         name3="Number of pings with more than "+str(range2)+" ms delay",
                         title="Ping latency grouped by duration in the last "+time_interval+ " starting from "+ starting_point,
                         ytitle="Number of datapoints")

## Ping droprate 
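Because each measurement is a single ping, PING_DROPRATE is either 0 or 100. The total number of packets dropped by a device is therefore SUM(PING_DROPRATE)/100, and MEAN(PING_DROPRATE) is the percentage of pings dropped.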
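A small worked example of that equivalence on a made-up series of PING_DROPRATE values for one device (hypothetical values, not taken from the data). The cells below use MEAN for the percentage and COUNT of non-zero values for the absolute number; SUM/100 gives the same count.

```python
import pandas as pd

# Hypothetical PING_DROPRATE values for one device: each single ping is 0 or 100
droprate = pd.Series([0, 0, 100, 0, 0, 0, 100, 0, 0, 0])

dropped = droprate.sum() / 100       # SUM(PING_DROPRATE)/100: number of lost pings
dropped_count = (droprate > 0).sum()  # COUNT(...) WHERE PING_DROPRATE>0: same number
pct_dropped = droprate.mean()         # MEAN(PING_DROPRATE): percentage of lost pings

print(dropped, dropped_count, pct_dropped)
```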

In [26]:
query_pingdroprate_mean_time= "SELECT MEAN(PING_DROPRATE) FROM PING WHERE  time >= '"+starting_point+"'-"+time_interval+"  GROUP BY SK_PI;"
pingdroprate_mean_time=get_stats_influxdb(client_influx=client,
                               query_influx=query_pingdroprate_mean_time,
                               stat_name='mean',
                               device_numbers=device_numbers)

In [27]:
simple_bar_plot(xvalues=device_numbers,
                yvalues=pingdroprate_mean_time,
                name="Mean",
                title="Average percentage of ping droprate per device in the last "+time_interval + " starting from "+ starting_point,
                ytitle="Percent")

Device 10 has the highest mean, i.e. the highest percentage of packets dropped (more than 6%).

In [28]:
query_pingdroprate_counts_time = "SELECT COUNT(PING_DROPRATE) FROM PING WHERE time >= '"+starting_point+"'-"+time_interval+\
                                 " AND PING_DROPRATE>0 GROUP BY SK_PI ;"
pingdroprate_counts_time=get_stats_influxdb(client_influx=client,
                               query_influx=query_pingdroprate_counts_time,
                               stat_name='count',
                               device_numbers=device_numbers)

In [29]:
query_ping_counts_time = "SELECT COUNT(PING) FROM PING WHERE time >= '"+starting_point+"'-"+time_interval+" GROUP BY SK_PI ;"
ping_counts_time = get_stats_influxdb(client_influx=client,
                                      query_influx=query_ping_counts_time,
                                      stat_name='count',
                                      device_numbers=device_numbers)

In [30]:
print(pingdroprate_counts_time)
print([a - b for a, b in zip(ping_counts_time, pingdroprate_counts_time)])

[0, 0, 14, 0, 1086, 0, 1190, 2, 5034, 0, 880, 7966, 0, 0, 70, 14823, 0, 0, 4854, 810, 8856, 1779]
[0, 0, 454363, 0, 33460, 0, 453185, 65295, 449343, 0, 453497, 446411, 0, 0, 454290, 439555, 0, 0, 149527, 141201, 65418, 52483]
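The two printed lists give, for devices 1 to 22 in order, the number of dropped pings and the number of delivered pings (COUNT(PING) minus the dropped count; dropped pings are still counted by COUNT(PING) because their PING is stored as 0). A sketch turning them into a drop percentage, dropped / (dropped + delivered) * 100, which should agree with MEAN(PING_DROPRATE) over the same window. Devices with no pings in the window get None.

```python
# Counts copied from the printed output above (devices 1-22, in order)
dropped = [0, 0, 14, 0, 1086, 0, 1190, 2, 5034, 0, 880, 7966, 0, 0, 70, 14823, 0, 0, 4854, 810, 8856, 1779]
delivered = [0, 0, 454363, 0, 33460, 0, 453185, 65295, 449343, 0, 453497, 446411, 0, 0, 454290, 439555,
             0, 0, 149527, 141201, 65418, 52483]
devices = list(range(1, 23))

drop_pct = {
    # None: the device sent no pings in the window, so there is no ratio
    device: (100 * d / (d + ok) if d + ok else None)
    for device, d, ok in zip(devices, dropped, delivered)
}

for device, pct in drop_pct.items():
    print(device, "no data" if pct is None else f"{pct:.2f}%")
```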


In [31]:
combined_bar_plot_2traces(xvalues=device_numbers,
                          yvalues1=pingdroprate_counts_time,
                          yvalues2=[a - b for a, b in zip(ping_counts_time, pingdroprate_counts_time)],
                          name1='Number of pings dropped',
                          name2='Number of pings delivered',
                          title="Ping droprate (actual number) in the last "+time_interval+ " starting from "+ starting_point,
                          ytitle="Number of datapoints")

Devices #10 and #16 have the largest number of packets dropped.

In [32]:
#trace = go.Pie(labels=device_numbers, values=device_dropped_counts)
#data=[trace]
#layout = go.Layout(
#        title="Total number of packets dropped by device"
#    )
#fig = go.Figure(data=data, layout=layout)
#iplot(fig)