### AppDev Part One: Timeseries Data from Predix
by: Alireza Dibazar
dibazar@gmail.com

# Summary:

We will learn to develop an app for live streaming of IoT data. The app is intended to provide analytics-based anomaly detection, together with the details field engineers need to act on it. The data are fetched from the Predix Timeseries database and consist of readings from a three-axis accelerometer installed in a gas turbine. We suggest looking at the deployed version of the app before starting the exercise. The app can be found at https://sample-live-streaming.run.aws-usw02-pr.ice.predix.io/ and is described in [this document](./Documents/Sample_live_Stream.pptx) (Sample_live_Stream.pptx).



### Objectives:
1. Learn to plan the project based on agile methodology
2. Learn how to read data from Cassandra (the Predix Timeseries database) and change the sampling frequency of the data
3. Be able to visually demonstrate field engineers' pain points and convert them into a wireframe
4. Design analytics
5. Learn to design the app layout based on the wireframe
6. Be able to deploy the app and operationalize it for users




### This notebook 
This notebook focuses on reading data from the Predix Timeseries database. Although the same code will later be used for live streaming of time series, here students use it to access historical data for offline analysis and model building.

Please note that the historical data stored in Predix do not include any anomalies; students are therefore encouraged to consider models that learn normal behavior, such as PCA, Hotelling's T², and similarity-based models.
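Because the stored data contain no anomalies, one common approach is to fit a model of normal behavior and score new samples by how far they fall from it. The following is a minimal NumPy sketch of PCA with Hotelling's T² statistic; the function names `fit_pca_t2` and `t2_scores`, the synthetic data, the number of components, and the 99th-percentile threshold are illustrative assumptions, not part of the course material.

```python
import numpy as np

def fit_pca_t2(X_train, n_components):
    """Fit PCA on anomaly-free data X_train (rows = samples, columns = channels)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                        # standardize each channel
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)  # principal directions in Vt
    P = Vt[:n_components].T                           # loadings, (n_channels, k)
    eigvals = s[:n_components] ** 2 / (len(Z) - 1)    # variance of each retained score
    return mean, std, P, eigvals

def t2_scores(X, mean, std, P, eigvals):
    """Hotelling's T^2 of each row of X inside the retained principal subspace."""
    T = ((X - mean) / std) @ P                        # PCA scores, (n_samples, k)
    return np.sum(T ** 2 / eigvals, axis=1)

# Synthetic stand-in for anomaly-free GX, GY, GZ readings: three correlated channels
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 1))
X_train = latent * np.array([1.0, 0.8, -0.5]) + 0.1 * rng.normal(size=(1000, 3))

model = fit_pca_t2(X_train, n_components=2)
threshold = np.percentile(t2_scores(X_train, *model), 99)   # empirical 99th percentile

X_new = np.array([[1.0, 0.8, -0.5],    # follows the normal correlation pattern
                  [8.0, 8.0, -8.0]])   # far outside the training range
flags = t2_scores(X_new, *model) > threshold
```

T² only measures distance inside the retained subspace; it is commonly paired with a residual statistic (squared prediction error, or Q) that flags samples breaking the correlation structure learned from normal data.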


### Assignment
1. Study Predix Timeseries website: https://docs.predix.io/en-US/content/service/data_management/time_series/
2. Create three data frames by reading data from the database at three interpolation intervals: 1, 10, and 60 seconds; plot them and visually inspect the differences
3. Read and store data on your local computer and build a model

In [19]:
# This section imports necessary packages 

import requests
import json
import numpy as np
import pandas as pd
import os
import datetime as dt
# import http.client

In [20]:
# Tag names of interest: 'Feather2.GX', 'Feather2.GY', and 'Feather2.GZ'
Tags = ['Feather2.GX', 'Feather2.GY', 'Feather2.GZ']

## How to read data from Predix Timeseries database
The Predix Timeseries database is built on Cassandra. For this course we use API calls to access the data and bring them into the working environment.

1. We need an access token; it expires after about 24 hours (the `expires_in` value returned with the token)
2. The token and its expiration time are stored in the file "Data/token_expiration_time.txt"
3. When requesting new data from the database, we reuse the stored token unless it has expired
4. For this project three tags are read from the database: 'Feather2.GX', 'Feather2.GY', and 'Feather2.GZ'
5. For a single read we need to define start_time and end_time. These times must be UTC epoch times in milliseconds
6. The interpolation interval in "aggregations" can be changed dynamically. In this exercise it is set to 60 seconds

More information can be found at https://docs.predix.io/en-US/content/service/data_management/time_series/
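Steps 5 and 6 can be sketched as follows: a time string is interpreted as UTC, converted to integer epoch milliseconds, and placed in a query body of the same shape that `payload_json()` builds in the next cell. The helper name `utc_string_to_epoch_ms` is hypothetical. Integer milliseconds keep the JSON as `1563321600000` rather than `1563321600000.0`; whether the service accepts the float form has not been checked here.

```python
import datetime as dt
import json

def utc_string_to_epoch_ms(s, fmt='%Y-%m-%d %H:%M:%S'):
    """Interpret a naive time string as UTC and return integer epoch milliseconds."""
    t = dt.datetime.strptime(s, fmt).replace(tzinfo=dt.timezone.utc)
    return int(t.timestamp() * 1000)

start_ms = utc_string_to_epoch_ms('2019-07-17 00:00:00')   # 1563321600000
end_ms = utc_string_to_epoch_ms('2019-07-18 11:59:59')

# Query body with a 60 s interpolation interval
query = {
    "tags": [{
        "name": ['Feather2.GX', 'Feather2.GY', 'Feather2.GZ'],
        "aggregations": [{"type": "interpolate", "interval": "60s"}],
        "order": "asc",
    }],
    "start": start_ms,
    "end": end_ms,
}
body = json.dumps(query)
```

Calling `.replace(tzinfo=...)` matters: `.timestamp()` on a naive datetime assumes local time, so without it the range would shift by the machine's UTC offset.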



In [21]:
def payload_json(start_time, end_time, Tags, interval):
    '''
    Build the JSON body of a Predix Timeseries query from four inputs:
    a) start time of the data, b) end time of the data (both UTC epoch milliseconds),
    c) list of tag names for which data are pulled, and d) interpolation interval
    (e.g. '1s', '10s', '60s').
    The output is the JSON string of the request, used as the request body when we
    query the database: requests.request("POST", url, data=payload_json(m, n, Tags, interval), ...)
    '''
    
    q = {
#       "cache_time": 0,
      "tags": [
        {
          "name": Tags,
          "aggregations": [{"type": "interpolate", "interval": interval}],
          "order": "asc"
        }
      ],
      "start": start_time,
      "end": end_time
    }
#     print(json.dumps(q))
    return json.dumps(q)

In [22]:
def create_tidy_df_from_jsondict(json_dict):
    '''
    Extract the data from the parsed JSON response (a dict) and store them in a
    tidy (long) dataframe with columns 'time', 'tag', and 'value'
    '''
    
    times, tags, values = [], [], []
    
    for tag_dict in json_dict['tags']:

        val_list = tag_dict['results'][0]['values']
        
        for v in val_list:
            times.append(v[0])
            tags.append(tag_dict['name'])
            values.append(v[1])

    df = pd.DataFrame({'time':times, 'tag':tags, 'value': values})
    
    # convert values to float; anything non-numeric becomes NaN
    df['value'] = pd.to_numeric(df['value'], errors='coerce')
    return(df)
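`create_tidy_df_from_jsondict` only reads `tags[i]['name']` and `tags[i]['results'][0]['values']`. The sketch below runs the same parsing and the long-to-wide pivot used later in `get_some_data` on a hand-made mock response. The mock is trimmed to those fields and its numbers are invented; treating the third element of each value as a quality code is an assumption about the response format.

```python
import pandas as pd

# Hypothetical response trimmed to the fields the parser reads;
# each value is [timestamp_ms, value, quality (assumed)]
mock_response = {
    "tags": [
        {"name": "Feather2.GX",
         "results": [{"values": [[1563321600000, 0.01, 3], [1563321660000, 0.02, 3]]}]},
        {"name": "Feather2.GY",
         "results": [{"values": [[1563321600000, -0.98, 3], [1563321660000, -0.97, 3]]}]},
    ]
}

# Tidy (long) form: one row per (time, tag) pair
rows = [
    {"time": v[0], "tag": tag["name"], "value": v[1]}
    for tag in mock_response["tags"]
    for v in tag["results"][0]["values"]
]
tidy = pd.DataFrame(rows)
tidy["value"] = pd.to_numeric(tidy["value"], errors="coerce")

# Wide form: one row per timestamp, one column per tag, short column names
wide = tidy.pivot(index="time", columns="tag", values="value")
wide.columns = [c.split(".")[-1] for c in wide.columns]
```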


In [23]:
def get_token():
    '''
    Request an authorization token from the UAA service and cache it, together
    with its expiration time, in "Data/token_expiration_time.txt".
    Students are asked to read the details on the Predix Timeseries website.
    '''
    
    url = "https://d1e53858-2903-4c21-86c0-95edc7a5cef2.predix-uaa.run.aws-usw02-pr.ice.predix.io/oauth/token"

    payload = "grant_type=client_credentials"
    # requests sets Host and Content-Length itself, so only the needed headers are listed
    headers = {
        'Content-Type': "application/x-www-form-urlencoded",
        'Authorization': "Basic cHJlZGl4YXZlbmdlcnNzYl90czpZVzlLYVNIYXRoRTVibTh2RzhLRnlmWUY=",
        'Accept': "*/*",
        'Cache-Control': "no-cache"
        }
    response = requests.request("POST", url, data=payload, headers=headers)
    response.raise_for_status()        # stop early on HTTP errors (e.g. bad credentials)

    json_dict = response.json()
    token = 'Bearer ' + json_dict['access_token']
    expires_in = json_dict['expires_in']      # token lifetime in seconds

    # dt.datetime.utcnow().timestamp() is shifted by the local UTC offset,
    # so take the current epoch time from a timezone-aware datetime instead
    utc_tm = int(dt.datetime.now(dt.timezone.utc).timestamp())
    expiration_time = utc_tm + expires_in
    
    os.makedirs("Data", exist_ok=True)
    with open("Data/token_expiration_time.txt", "w") as fid:
        fid.write(str(expiration_time) + '\n')
        fid.write(token)
    return token
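The token cache in steps 1 to 3 can also be sketched with a small JSON file and a safety margin, so a token about to expire is not used for a request. The helper names `save_token` and `load_valid_token`, the JSON layout, and the 60-second margin are all hypothetical; this is an alternative layout, not the two-line text file that `get_token()` writes. `time.time()` already returns epoch seconds regardless of the local time zone.

```python
import json
import os
import tempfile
import time

def save_token(path, token, expires_in):
    """Cache the bearer token together with its absolute expiry time (epoch seconds)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump({"token": token, "expires_at": int(time.time()) + int(expires_in)}, f)

def load_valid_token(path, margin_s=60):
    """Return the cached token, or None if it is missing or expires within margin_s seconds."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        cache = json.load(f)
    if cache["expires_at"] - margin_s <= time.time():
        return None
    return cache["token"]

# Example: round trip through a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "token.json")
save_token(demo_path, "Bearer abc", expires_in=3600)
cached = load_valid_token(demo_path)
```

A caller would request a new token whenever `load_valid_token` returns `None`.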


In [24]:
def get_data_from_timeseries_database(m,n,Tags,authorization, interval):
    '''
    Fetch data from time m to time n (UTC epoch milliseconds) for the tag names
    in the variable "Tags", interpolated at the given interval; returns the raw
    response body as a JSON string
    '''
    
    print('This is get_data_from_timeseries_database()','\n')
    url = "https://time-series-store-predix.run.aws-usw02-pr.ice.predix.io/v1/datapoints/"
    payload = payload_json(m, n, Tags, interval)
    # Predix-Zone-Id selects the Time Series service instance; requests computes
    # Content-Length from the actual payload, so it is not hard-coded here
    headers = {
        'Content-Type': "application/json",
        'Authorization': authorization,
        'Predix-Zone-Id': "38357f8f-2ca8-4b67-9479-2a0748c8becd",
        'Accept': "*/*",
        'Cache-Control': "no-cache"
        }

    response = requests.request("POST", url, data=payload, headers=headers)
    response.raise_for_status()        # stop early on HTTP errors (e.g. expired token)

    return(response.text)
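Fetching a long range in one request, such as the 13-day read at a 10 s interval further down (about 117,000 points per tag), may be slow, and the service may cap the number of points it returns; we have not checked the exact limits, so consult the Predix Timeseries documentation. One option is to split the range into windows and query each one. The helper `split_time_range` below is hypothetical, and it assumes the service treats `start` and `end` as inclusive.

```python
def split_time_range(start_ms, end_ms, window_ms):
    """Split the inclusive range [start_ms, end_ms] into consecutive, non-overlapping
    inclusive windows of at most window_ms milliseconds each."""
    windows = []
    t = start_ms
    while t <= end_ms:
        w_end = min(t + window_ms - 1, end_ms)
        windows.append((t, w_end))
        t = w_end + 1
    return windows

# 2019-07-17 00:00:00 to 2019-07-30 11:59:59 UTC in one-day windows
DAY_MS = 24 * 3600 * 1000
windows = split_time_range(1563321600000, 1564487999000, DAY_MS)
```

Each `(m, n)` pair could then be passed to `get_data_from_timeseries_database`, parsed with `create_tidy_df_from_jsondict`, and the tidy frames combined with `pd.concat` before pivoting.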


### EXAMPLE: Create a function to read a block of data from 2019-07-17 to 2019-07-30

In [25]:

def get_some_data(start, end, interval):
    '''
    Read the tags in "Tags" between start and end ('YYYY-mm-dd HH:MM:SS', interpreted
    as UTC), interpolated at interval (e.g. '1s'), as a wide dataframe indexed by epoch ms
    '''
    
    # read the cached token; if it is missing or expired, request a new one
    utc_tm = int(dt.datetime.now(dt.timezone.utc).timestamp())
    token_file = "Data/token_expiration_time.txt"
    if os.path.exists(token_file):
        with open(token_file, "r") as fid:
            expiration_time = int(fid.readline())
            authorization = fid.readline().strip()
    else:
        expiration_time = 0
    
    print('Token Expired: ', expiration_time <= utc_tm, '\n')
    if expiration_time <= utc_tm:
        authorization = get_token()
    
    # interpret the time strings as UTC and convert them to integer epoch milliseconds
    m = dt.datetime.strptime(start, '%Y-%m-%d %H:%M:%S').replace(tzinfo=dt.timezone.utc).timestamp()
    n = dt.datetime.strptime(end, '%Y-%m-%d %H:%M:%S').replace(tzinfo=dt.timezone.utc).timestamp()
    M = int(m*1000)                  # epoch in millisec
    N = int(n*1000)                  # epoch in millisec
        
    data = get_data_from_timeseries_database(M, N, Tags, authorization, interval)

    json_data = json.loads(data)

    _df = create_tidy_df_from_jsondict(json_data)
    _df = _df.drop_duplicates()
    # long -> wide: one row per timestamp, one column per tag
    df = _df.pivot(index='time', columns='tag', values='value')

    # make sure every requested tag has a column, even if no data came back
    for c in Tags:
        if c not in df.columns:
            df[c] = np.nan

    # shorten 'Feather2.GX' -> 'GX' and sort the columns
    df.columns = [c.split('.')[-1] for c in df.columns]
    df = df[sorted(df.columns)]
    
    return(df)
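The frame returned by `get_some_data` is indexed by raw epoch milliseconds. Converting the index to a timezone-aware `DatetimeIndex` makes plots readable and enables pandas' time-based tools. The sketch uses a synthetic stand-in frame; the `resample(...).mean()` step is a client-side average shown for comparison and is not the same as asking the server for a different interpolation interval.

```python
import pandas as pd

# Synthetic stand-in for the frame returned by get_some_data(): epoch-ms index, GX/GY/GZ columns
demo = pd.DataFrame(
    {"GX": [0.01, 0.02, 0.03], "GY": [-0.98, -0.97, -0.99], "GZ": [0.10, 0.11, 0.09]},
    index=[1563321600000, 1563321601000, 1563321602000],
)
demo.index = pd.to_datetime(demo.index, unit="ms", utc=True)
demo.index.name = "time"

# Average into 2 s bins (client-side; not the server's interpolation)
coarse = demo.resample("2s").mean()
```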


## Store data on your local hard drive for offline analysis and model building

#### Read and store data on your local computer and build your model

start = '2019-07-17 00:00:00'
end = '2019-07-18 11:59:59'
interval = '1s'
df_1s = get_some_data(start, end, interval)
df_1s.to_csv('sample_data_1s.csv')          # save a local copy for offline analysis

start = '2019-07-17 00:00:00'
end = '2019-07-30 11:59:59'
interval = '10s'
df_10s = get_some_data(start, end, interval)
df_10s.to_csv('sample_data_10s.csv')

In [27]:
start = '2019-09-23 00:00:00'
end = '2019-09-23 11:59:59'
interval = '60s'
df_60s = get_some_data(start, end, interval)
df_60s.to_csv('sample_data_60s.csv', sep=';')

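Once the CSV files exist, the analysis can run offline. The sketch below round-trips a synthetic frame through a CSV file and places summary statistics of a 1 s series next to a coarser view, one simple starting point for assignment step 2. The data, the file name, and the use of every 60th sample as a stand-in for the server's 60 s interpolation are assumptions; with real data, load the saved files and pass the same `sep` that was used when writing them (the 60 s cell writes with `sep=';'`).

```python
import os
import tempfile
import numpy as np
import pandas as pd

# Synthetic stand-ins for a 1 s frame and a coarser 60 s view of it
t0_ms = 1563321600000
idx = pd.to_datetime(t0_ms + 1000 * np.arange(600), unit="ms", utc=True)
demo_1s = pd.DataFrame({"GX": np.sin(np.arange(600) / 5.0)}, index=idx)
demo_60s = demo_1s.iloc[::60]            # every 60th sample, a rough stand-in only

# Write with a separator and read back with the same separator
path = os.path.join(tempfile.mkdtemp(), "sample_data_1s.csv")
demo_1s.to_csv(path, sep=";")
demo_back = pd.read_csv(path, sep=";", index_col=0, parse_dates=True)

# Side-by-side summary statistics for the two sampling rates
summary = pd.DataFrame({"1s": demo_1s["GX"].describe(), "60s": demo_60s["GX"].describe()})
```

For the visual inspection, each frame's `.plot()` method (which requires matplotlib) draws its series.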

