### Collecting some ping statistics
Import Python libraries:

In [5]:
from influxdb import DataFrameClient
from influxdb import InfluxDBClient

from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
# import plotly.plotly as py  # unused here; moved to the chart_studio package in plotly 4
from plotly import tools
init_notebook_mode(connected=True)

Set up the InfluxDB connection:

In [83]:
host='204.209.76.145'
port=8086
dbname = 'net_speed_md'
#client = DataFrameClient(host, port, '', '', dbname)
client = InfluxDBClient(host, port, '', '', dbname)

Resources used to learn about the InfluxDB Python client:
- https://influxdb-python.readthedocs.io/en/latest/examples.html
- https://www.influxdata.com/blog/getting-started-python-influxdb/
- https://influxdb-python.readthedocs.io/en/latest/resultset.html
- https://influxdb-python.readthedocs.io/en/latest/api-documentation.html

In [88]:
# Trying to load the entire table, convert it to a DataFrame and estimate how much memory it takes
# (from https://stackoverflow.com/questions/18089667/how-to-estimate-how-much-memory-a-pandas-dataframe-will-need)
#import sys
#client = DataFrameClient(host, port, '', '', dbname)
#query_ping = 'SELECT * FROM PING;'
#result_ping = client.query(query_ping)
#ping_df = result_ping['PING']
#print(ping_df.memory_usage())
#print(sys.getsizeof(ping_df))
#print(ping_df.memory_usage(deep=True).sum())

### Number of datapoints per device

Getting the device numbers (values of the `SK_PI` tag):

In [89]:
query_unique_devices = "SHOW TAG VALUES FROM PING WITH KEY=SK_PI;"
result_unique_devices = client.query(query_unique_devices)
points_unique_devices = result_unique_devices.get_points()
# Tag values come back as strings; convert to int so they sort numerically
device_numbers = sorted(int(point['value']) for point in points_unique_devices)
print(device_numbers)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]


Getting the number of data points per device over the entire period:

In [90]:
query_ping_counts = 'SELECT COUNT(PING) FROM PING GROUP BY SK_PI;'
result_ping_counts = client.query(query_ping_counts)

In [91]:
device_counts=[]
for device in device_numbers:
    points_ping_counts=result_ping_counts.get_points(tags={'SK_PI':str(device)})
    for point in points_ping_counts:
        device_counts.append(point['count'])
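The loop above appends one value per point, so `device_counts` only lines up with `device_numbers` if every device returns exactly one row. A minimal sketch of a more defensive variant follows. It assumes the grouped result is iterated as `((measurement, tags), points)` pairs, which is how `ResultSet.items()` is understood to behave in influxdb-python; the helper name `counts_by_tag` and the sample values are hypothetical and only illustrate the shape of the data.

```python
def counts_by_tag(items, tag_key, field="count"):
    """Map each integer tag value to its aggregated field value.

    `items` is assumed to yield ((measurement, tags), points) pairs,
    e.g. result_ping_counts.items() for a GROUP BY query.
    """
    counts = {}
    for (_measurement, tags), points in items:
        if not tags or tag_key not in tags:
            continue  # series without the grouping tag is skipped
        for point in points:
            counts[int(tags[tag_key])] = point[field]
    return counts


# Hypothetical stand-in for result_ping_counts.items(); not real measurements.
sample_items = [
    (("PING", {"SK_PI": "2"}), [{"time": "1970-01-01T00:00:00Z", "count": 5}]),
    (("PING", {"SK_PI": "1"}), [{"time": "1970-01-01T00:00:00Z", "count": 7}]),
]
sample_devices = [1, 2, 3]

sample_counts = counts_by_tag(sample_items, "SK_PI")
# Devices missing from the result get 0, so both lists stay the same length.
aligned_counts = [sample_counts.get(device, 0) for device in sample_devices]
print(aligned_counts)
```

Building a dictionary first and then reading it in `device_numbers` order keeps the x and y lists of the bar chart aligned even when a device has no rows, which is the case the December query runs into.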

Plotting device numbers and number of data points. 

In [92]:
data = [go.Bar(
            x=device_numbers,
            y=device_counts
    )]

layout = go.Layout(
        barmode='stack',
        title="Number of data points per device"
    )

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Some of the devices have a small number of data points. Maybe they were only recently installed? Let's check how many data points came in during December.

Getting the number of data points per device in December and the list of devices that have data for December (not all of them do).

In [93]:
query_ping_counts_dec = "SELECT COUNT(PING) FROM PING WHERE time >= '2018-12-01 00:00:00' GROUP BY SK_PI ;"
result_ping_counts_dec = client.query(query_ping_counts_dec)

In [116]:
device_counts_dec=[]
device_numbers_dec = []
for device in device_numbers:
    points_ping_counts_dec=result_ping_counts_dec.get_points(tags={'SK_PI':str(device)})
    point_new=0
    for point in points_ping_counts_dec:
        point_new=point['count']
    if (point_new!=0): 
        device_numbers_dec.append(device)
    device_counts_dec.append(point_new)
print("Devices, that have data in December: ",device_numbers_dec)

Devices, that have data in December:  [3, 4, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]


Plotting a combined bar chart: total number of data points vs. number of data points in December.

In [117]:
trace1 = go.Bar(
            x=device_numbers,
            y=device_counts_dec,
            name='December',
    )
trace2 = go.Bar(
            x=device_numbers,
            # Total minus December, so each stacked bar sums to the total for the entire time
            y=[a - b for a, b in zip(device_counts, device_counts_dec)],
            name='Before December',
    )
data = [trace1, trace2]
layout = go.Layout(
        barmode='stack',
        title="Comparing number of datapoints in December vs entire time"
    )

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Devices 1, 2, 4, 5, 6, 7 and possibly 13 and 14 need to be double-checked. It looks like they started reporting and then stopped. Let's check the last reporting time for every device.

In [118]:
query_ping_last = "SELECT last(PING), time FROM PING GROUP BY SK_PI;"
result_ping_last = client.query(query_ping_last)

In [119]:
for device in device_numbers:
    points_ping_last=result_ping_last.get_points(tags={'SK_PI':str(device)})
    for point in points_ping_last:
        print("Device: ", device," reported last time on ", point['time'])

Device:  1  reported last time on  2018-10-11T15:46:31.790000128Z
Device:  2  reported last time on  2018-11-06T19:16:32.792999936Z
Device:  3  reported last time on  2018-12-20T00:00:00Z
Device:  4  reported last time on  2018-12-05T22:53:02.383000064Z
Device:  5  reported last time on  2018-11-05T21:32:32.552999936Z
Device:  6  reported last time on  2018-10-31T14:55:02.703000064Z
Device:  7  reported last time on  2018-12-18T08:46:02.763000064Z
Device:  8  reported last time on  2018-12-03T19:58:50.129999872Z
Device:  9  reported last time on  2018-12-19T23:59:58.996999936Z
Device:  10  reported last time on  2018-12-20T00:00:00Z
Device:  11  reported last time on  2018-12-20T00:00:00Z
Device:  12  reported last time on  2018-12-20T00:00:00Z
Device:  13  reported last time on  2018-12-04T20:27:07.227000064Z
Device:  14  reported last time on  2018-12-06T02:12:31.350000128Z
Device:  15  reported last time on  2018-12-19T23:59:58.996999936Z
Device:  16  reported last time on  2018-12-

Something is happening with devices 1, 2, 4, 5, 6, 8, 13 and 14. They stopped reporting. Does this need to be investigated?
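The same check can be done programmatically instead of reading the timestamps by eye. The sketch below uses the last-report times printed above for devices 1 to 15 (device 16 onward is cut off in the output and is left out). The 7-day gap is an assumed threshold, and the helper names `parse_influx_time` and `stale_devices` are hypothetical.

```python
from datetime import datetime, timedelta

# Last-report timestamps copied from the output above (devices 1-15 only).
last_seen = {
    1: "2018-10-11T15:46:31.790000128Z",
    2: "2018-11-06T19:16:32.792999936Z",
    3: "2018-12-20T00:00:00Z",
    4: "2018-12-05T22:53:02.383000064Z",
    5: "2018-11-05T21:32:32.552999936Z",
    6: "2018-10-31T14:55:02.703000064Z",
    7: "2018-12-18T08:46:02.763000064Z",
    8: "2018-12-03T19:58:50.129999872Z",
    9: "2018-12-19T23:59:58.996999936Z",
    10: "2018-12-20T00:00:00Z",
    11: "2018-12-20T00:00:00Z",
    12: "2018-12-20T00:00:00Z",
    13: "2018-12-04T20:27:07.227000064Z",
    14: "2018-12-06T02:12:31.350000128Z",
    15: "2018-12-19T23:59:58.996999936Z",
}


def parse_influx_time(stamp):
    """Parse a UTC timestamp, truncating nanoseconds to microseconds."""
    body = stamp.rstrip("Z")
    if "." in body:
        seconds, fraction = body.split(".")
        # strptime's %f accepts at most 6 fractional digits
        return datetime.strptime(f"{seconds}.{fraction[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return datetime.strptime(body, "%Y-%m-%dT%H:%M:%S")


def stale_devices(last_seen, max_gap):
    """Return devices whose last report lags the newest report by more than max_gap."""
    parsed = {device: parse_influx_time(stamp) for device, stamp in last_seen.items()}
    newest = max(parsed.values())
    return sorted(device for device, seen in parsed.items() if newest - seen > max_gap)


# Assumed threshold: a device is "stale" after 7 days of silence.
stale = stale_devices(last_seen, timedelta(days=7))
print(stale)
```

With the 7-day threshold this flags devices 1, 2, 4, 5, 6, 8, 13 and 14. Device 7, which last reported on 2018-12-18, would only be flagged with a threshold shorter than about a day and a half.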

### Ping latency per device in December

Let's check actual ping latency numbers for December:

In [120]:
query_ping_max_dec = "SELECT MAX(PING), MEAN(PING) FROM PING WHERE time >= '2018-12-01 00:00:00' GROUP BY SK_PI;"
result_ping_max_dec = client.query(query_ping_max_dec)

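Devices 1, 2, 5 and 6 have no data in December, so they are already absent from `device_numbers_dec`. Devices 4, 8, 13 and 14 stopped reporting during December; the filter below that would exclude all eight is commented out, so those four still appear in the plots.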

In [121]:
#device_numbers_dec = [x for x in device_numbers if x not in [1,2,4,5,6,8,13,14]]
#print(device_numbers_dec)

In [122]:
device_max_dec=[]
device_mean_dec = []
for device in device_numbers_dec:
    points_max_dec=result_ping_max_dec.get_points(tags={'SK_PI':str(device)})
    for point in points_max_dec:
        device_max_dec.append(point['max'])
        device_mean_dec.append(point['mean'])

Plotting mean and max ping latency for every device that reported in December.

In [123]:
trace1 = go.Bar(
            x=device_numbers_dec,
            y=device_mean_dec,
            name='Mean',
    )
trace2 = go.Bar(
            x=device_numbers_dec,
            y=device_max_dec,
            name='Max',
    
    )
data = [trace1, trace2]
layout = go.Layout(
       # barmode='stack',
        title="Mean and maximum ping latency in milliseconds"
    )

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Device 16 has the highest mean and needs to be investigated (all these spikes are also visible in Grafana). Something is going on there. Device 13 looks like it was giving high latency but stopped reporting on Dec 4th.

### To do:
Find median?  
Check by time of the day?  
Divide ping latencies into groups and calculate the percentage of devices per group?  
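
A rough sketch of how these three items could be done client-side with the standard library is shown below. All sample values, the bucket edges, and the helper names `median_by_hour` and `share_by_group` are hypothetical; real input would come from the `PING` measurement. For the plain per-device median, InfluxQL's `MEDIAN()` aggregate may be the simpler route.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical (timestamp, latency in ms) samples for one device; not real data.
samples = [
    ("2018-12-01T08:15:00Z", 20.0),
    ("2018-12-01T08:45:00Z", 30.0),
    ("2018-12-01T20:05:00Z", 100.0),
    ("2018-12-02T20:30:00Z", 300.0),
]


def median_by_hour(samples):
    """Median latency per hour of day (0-23), pooled across all dates."""
    by_hour = defaultdict(list)
    for stamp, latency in samples:
        hour = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").hour
        by_hour[hour].append(latency)
    return {hour: median(values) for hour, values in sorted(by_hour.items())}


def share_by_group(mean_by_device, edges):
    """Percentage of devices per latency bucket.

    edges=[50, 200] gives the buckets "<50", "50-200" and ">=200";
    a value equal to an edge falls into the higher bucket.
    """
    labels = (
        [f"<{edges[0]}"]
        + [f"{low}-{high}" for low, high in zip(edges, edges[1:])]
        + [f">={edges[-1]}"]
    )
    counts = [0] * len(labels)
    for value in mean_by_device.values():
        counts[bisect_right(edges, value)] += 1
    total = len(mean_by_device)
    return {label: 100.0 * count / total for label, count in zip(labels, counts)}


# Hypothetical mean latency (ms) per device; not taken from the database.
example_means = {3: 20.0, 9: 60.0, 10: 75.0, 16: 250.0}

hourly = median_by_hour(samples)
shares = share_by_group(example_means, [50, 200])
print(hourly)
print(shares)
```

Grouping by hour of day is done in Python here because `GROUP BY time(1h)` in InfluxQL buckets by absolute hour rather than pooling the same hour across different days.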