# Processing IoT data 

## Summary 

This notebook explains how to process telemetry data from IoT devices that arrives through a gateway-enabled edgeHub.

## Description

This notebook guides the reader through processing telemetry data generated by IoT devices within the DSVM IoT extension.

## Requirements

* A gateway-enabled Edge Runtime. See 'Setting up IoT Edge'.
* A deployed sniffer architecture. See 'Obtaining IoT Telemetry'.
* A device sending telemetry to your gateway. For this notebook we have chosen a scenario where a device sends temperature telemetry.

## Documentation

* https://tutorials-raspberrypi.com/raspberry-pi-measure-humidity-temperature-dht11-dht22/
* http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/

## Step 1: Reading generated data

In this step we load the data generated by the IoT devices. The sniffer module mounts a Docker volume to share data between the module and the host. To find the path where the module stores its data, run the following:


In [None]:
%%bash
# Listing the volumes
sudo docker volume ls

In [None]:
%%bash
# Getting the volume path
sudo docker inspect <volume id>

Copy the path returned by the last command (the `Mountpoint` field of the output).
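
Because `docker inspect` prints JSON, the mount path can also be extracted programmatically. The following is a minimal sketch under stated assumptions: the sample output is illustrative only, the volume name `sniffer_data` is hypothetical, and `mountpoint_from_inspect` is a helper introduced here for illustration, not part of the sniffer module.

```python
import json

# Illustrative output of `docker inspect` for a volume.
# The volume name "sniffer_data" is hypothetical; your output will differ.
sample_output = """
[
    {
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/sniffer_data/_data",
        "Name": "sniffer_data",
        "Scope": "local"
    }
]
"""

def mountpoint_from_inspect(output):
    """Return the Mountpoint of the first volume in the inspect output."""
    entries = json.loads(output)  # the command prints a JSON array
    return entries[0]["Mountpoint"]

volume_path = mountpoint_from_inspect(sample_output)
print(volume_path)
```

On a real host you would pass the text printed by the inspect command instead of `sample_output`.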

Since the file location is protected, we create a directory and copy the file there. The file generated by the module is named data.json.

Path: volume_path/data.json


In [None]:
%%bash
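## Making a directory (-p also creates any missing parent directories)
mkdir -p "/home/$USER/IoT/Data"
## Copying the data file out of the protected volume
sudo cp <file path> "/home/$USER/IoT/Data/data.json"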

Next, we extract the data using Python.

In [None]:
import json
import numpy as np

## Reading the data from the file
## Note: replace <user> with your user name
path = "/home/<user>/IoT/Data"
file = path + "/data.json"

## Each line of the file is parsed as one JSON object. For every key we
## build a list of [sample index, value] pairs.
data = {}
with open(file) as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        sample = json.loads(line)
        for key, value in sample.items():
            series = data.setdefault(key, [])
            series.append([len(series), value])

## 2-D array with columns [index, temperature]
temperature = np.array(data['temperature'])
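
To make the resulting data layout concrete, here is a small sketch that runs the same parsing logic on an in-memory sample instead of the real file. The sample lines are invented for illustration; in particular, the `humidity` key is an assumption and may not be present in your telemetry.

```python
import io
import json
import numpy as np

# Hypothetical telemetry: one JSON object per line, as the loop above expects.
raw = io.StringIO(
    '{"temperature": 21.5, "humidity": 40.0}\n'
    '{"temperature": 21.7, "humidity": 41.0}\n'
    '{"temperature": 21.6, "humidity": 39.5}\n'
)

data = {}
for line in raw:
    sample = json.loads(line)
    for key, value in sample.items():
        series = data.setdefault(key, [])
        # Store [sample index, value] so each key becomes a small time series
        series.append([len(series), value])

temperature = np.array(data["temperature"])
print(temperature.shape)  # one row per sample, columns: index and value
print(temperature)
```

Each key ends up as a two-column array, which is why the later cells can load it into a DataFrame with the columns `index` and `temperature`.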


## Step 2: Using a low pass filter in order to detect anomalies 

"The simplest approach to identifying irregularities in data is to flag the data points that deviate from common statistical properties of a distribution, including mean, median, mode, and quantiles. Let's say the definition of an anomalous data point is one that deviates by a certain standard deviation from the mean. Traversing mean over time-series data isn't exactly trivial, as it's not static. You would need a rolling window to compute the average across the data points. Technically, this is called a rolling average or a moving average, and it's intended to smooth short-term fluctuations and highlight long-term ones. Mathematically, an n-period simple moving average can also be defined as a 'low pass filter.' "


In this next step we are going to build a low pass filter (moving average) using discrete linear convolution to detect anomalies in our telemetry data. Check the documentation for a more detailed explanation of the theory.
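
As a small worked example of the idea, the sketch below smooths a seven-point series containing one spike with a window of three samples. The input values are made up for illustration.

```python
import numpy as np

def moving_average(data, window_size):
    # A window of equal weights that sum to 1 turns convolution into an average
    window = np.ones(int(window_size)) / float(window_size)
    # 'same' keeps the output the same length as the input
    return np.convolve(data, window, "same")

signal = np.array([20.0, 20.0, 20.0, 50.0, 20.0, 20.0, 20.0])
smoothed = moving_average(signal, 3)
print(smoothed)
```

The spike of 50 is spread over its neighbours as three values of 30, so the raw reading stands well above the local average. Note also that the first and last outputs drop to about 13.3: in `'same'` mode the series is implicitly padded with zeros, which drags the average down at the edges and can produce spurious anomalies there.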


In [None]:
from __future__ import division
from itertools import  count
import matplotlib.pyplot as plt
from numpy import linspace, loadtxt, ones, convolve
import pandas as pd
import collections
from random import randint
from matplotlib import style
style.use('fivethirtyeight')
%matplotlib inline
print(temperature)
## Adding some noise
#temperature[50][1] = 50.0
#temperature[100][1] = 75.0
#temperature[150][1] = 50.0
#temperature[200][1] = 75.0
data_as_frame = pd.DataFrame(temperature, columns=['index','temperature'])
data_as_frame.head()

In [None]:
# Computes moving average using discrete linear convolution of two one dimensional sequences.
def moving_average(data, window_size):
    window = np.ones(int(window_size))/float(window_size)
    return np.convolve(data, window, 'same')

# Helps in exploring the anomalies using stationary standard deviation
def explain_anomalies(y, window_size, sigma=1.0):
    avg = moving_average(y, window_size).tolist()
    residual = y - avg
    # Calculate the variation in the distribution of the residual
    std = np.std(residual)
    return {'standard_deviation': round(std, 3),
            'anomalies_dict': collections.OrderedDict([(index, y_i) for
                                                       index, y_i, avg_i in zip(count(), y, avg)
              if (y_i > avg_i + (sigma*std)) | (y_i < avg_i - (sigma*std))])}

# Helps in exploring the anomalies using rolling standard deviation
def explain_anomalies_rolling_std(y, window_size, sigma=1.0):
    avg = moving_average(y, window_size)
    avg_list = avg.tolist()
    residual = y - avg
    # Calculate the variation in the distribution of the residual
    testing_std = pd.Series(residual).rolling(window_size).std()
    # The first window_size - 1 values are NaN; fill them with the first valid value
    rolling_std = testing_std.fillna(testing_std.iloc[window_size - 1]).round(3).tolist()
    std = np.std(residual)
    return {'stationary standard_deviation': round(std, 3),
            'anomalies_dict': collections.OrderedDict([(index, y_i)
                                                       for index, y_i, avg_i, rs_i in zip(count(),
                                                                                           y, avg_list, rolling_std)
              if (y_i > avg_i + (sigma * rs_i)) | (y_i < avg_i - (sigma * rs_i))])}


# This function is responsible for displaying how the detector performs on the given dataset.
def plot_results(x, y, window_size, sigma_value=1, text_xlabel="X Axis", text_ylabel="Y Axis", applying_rolling_std=False):
    plt.figure(figsize=(15, 8))
    plt.plot(x, y, "k.")
    y_av = moving_average(y, window_size)
    plt.plot(x, y_av, color='green')
    plt.xlim(0, 1000)
    plt.xlabel(text_xlabel)
    plt.ylabel(text_ylabel)

    # Query for the anomalies and plot the same
    events = {}
    if applying_rolling_std:
        events = explain_anomalies_rolling_std(y, window_size=window_size, sigma=sigma_value)
    else:
        events = explain_anomalies(y, window_size=window_size, sigma=sigma_value)

    x_anomaly = np.fromiter(events['anomalies_dict'].keys(), dtype=int, count=len(events['anomalies_dict']))
    y_anomaly = np.fromiter(events['anomalies_dict'].values(), dtype=float,
                                            count=len(events['anomalies_dict']))
    plt.plot(x_anomaly, y_anomaly, "r*", markersize=12)

    # add grid and lines and enable the plot
    plt.grid(True)
    plt.show()
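
The rolling standard deviation can be illustrated on a tiny series. This is a minimal sketch using `Series.rolling`, which is how current pandas versions expose rolling statistics; the residual values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical residuals (reading minus moving average)
residual = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
window_size = 3

# Standard deviation over each window of 3 consecutive values
rolling_std = residual.rolling(window_size).std()

# The first window_size - 1 entries have no full window and are NaN;
# fill them with the first valid value so every point has a threshold
filled = rolling_std.fillna(rolling_std.iloc[window_size - 1])

print(rolling_std.tolist())
print(filled.round(3).tolist())
```

The last window contains the jump to 10, so its standard deviation is much larger than the earlier ones: the threshold adapts to local variability. Keep in mind that pandas computes the sample standard deviation (`ddof=1`) by default, while `np.std` uses the population form (`ddof=0`), so the two are not directly comparable on short windows.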

In [None]:
x = data_as_frame['index']
Y = data_as_frame['temperature']


# plot the results
plot_results(x, y=Y, window_size=10, text_xlabel="Moment", sigma_value=3,
             text_ylabel="Temperature")
events = explain_anomalies(Y, window_size=5, sigma=3)

# Display the anomaly dict
print("Information about the anomalies model:{}".format(events))
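
To check the detector without real device data, the sketch below applies the same stationary-threshold rule to a synthetic series. It assumes a flat baseline of 22.0 with two artificial spikes at the indices used in the commented-out lines of the earlier cell; `find_anomalies` is a hypothetical, condensed variant of `explain_anomalies` that returns only the indices.

```python
import numpy as np

def moving_average(data, window_size):
    window = np.ones(int(window_size)) / float(window_size)
    return np.convolve(data, window, "same")

def find_anomalies(y, window_size, sigma):
    """Return indices whose residual exceeds sigma standard deviations."""
    y = np.asarray(y, dtype=float)
    residual = y - moving_average(y, window_size)
    std = np.std(residual)
    # Same rule as y > avg + sigma*std or y < avg - sigma*std
    return np.flatnonzero(np.abs(residual) > sigma * std)

# Synthetic telemetry: 200 samples on a flat baseline plus two spikes
readings = np.full(200, 22.0)
readings[50] = 50.0
readings[100] = 75.0

anomalies = find_anomalies(readings, window_size=10, sigma=3)
print(anomalies)
```

With `sigma=3` only the two injected spikes are flagged. Lower values of `sigma` tighten the threshold, so the zero-padded edges of the moving average and the neighbours of each spike can start to appear as false positives.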