<a href="https://colab.research.google.com/github/cognitedata/WiDS-2019/blob/master/WiDS_2019_Cognite_Interact_with_Assets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Visualization and Time series prediction

## What this notebook will achieve

* Extract live data from an oil rig in the North Sea.

* Visualize and inspect data directly from the Cognite Data Platform.

* Apply Linear Regression for time series prediction.


## Getting started

* A basic understanding of Python will help you follow the process.

* Cognite has released *live* data to the public on the Cognite Data Platform, streaming from [Valhall](https://www.akerbp.com/en/our-assets/production/valhall/), one of Aker BP's oil fields.

* To access the data, generate an API Key on [Open Industrial Data](https://openindustrialdata.com/). Get your key via the Google Access platform. You will be asked to fill out some personal information to generate your personal key.

* Visualize some of the machines (assets) on Valhall with Cognite's [Operational Intelligence](https://opint.cogniteapp.com/publicdata/infographics/-LOHKEJPLvt0eRIZu8mE) dashboard. The data on this page is streamed live from the Valhall oil field in the North Sea.

* To understand how to interact with the data using the Python SDK ([Docs](https://cognite-sdk-python.readthedocs-hosted.com/en/latest/)) follow along in this notebook.

## Environment Setup

#### Install the Cognite SDK package

In [1]:
!pip install cognite-sdk



#### Import the required packages

In [2]:
%matplotlib notebook

import os
from datetime import datetime, timedelta
from getpass import getpass

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

from cognite.client import CogniteClient

#### Connect to the Cognite Data Platform
* All queries to the Cognite API are sent through this client object.

When prompted for your API key, use the key generated by Open Industrial Data, as described in the Getting Started steps.

In [3]:
client = CogniteClient(api_key=getpass("Open Industrial Data API-KEY: "), 
                       project="publicdata", 
                       client_name="OID_example")

Open Industrial Data API-KEY: ········
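
When the notebook is re-run often, typing the key at every prompt gets tedious. The following is a minimal sketch, not part of the SDK, that reads the key from an environment variable and falls back to the same `getpass` prompt. The variable name `COGNITE_API_KEY` and the helper `read_api_key` are hypothetical names chosen for this example.

```python
import os
from getpass import getpass


def read_api_key(env_var="COGNITE_API_KEY", prompt="Open Industrial Data API-KEY: "):
    """Return the API key from the environment, or prompt for it if unset.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fall back to an interactive prompt that does not echo the key.
        key = getpass(prompt)
    return key.strip()
```

The result could then be passed as `api_key=read_api_key()` in the `CogniteClient(...)` call above.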


## Accessing Cognite Data Platform (CDP)
* The CDP organizes digital information about the physical world.
* There are 6 resource types stored on the CDP. Each object in the CDP is labelled with a unique ID, and information about a specific Asset, Event, etc. is often retrieved using this ID.

  * [Assets](https://docs.cognite.com/api/v1/#tag/Assets) are digital representations of physical objects or groups of objects, and assets are organized into an asset hierarchy. For example, an asset can represent a water pump which is part of a subsystem on an oil platform.
  
  * [Events](https://docs.cognite.com/api/v1/#tag/Events) objects store complex information about multiple assets over a time period. For example, an event can describe two hours of maintenance on a water pump and some associated pipes.
  
  * [Files](https://docs.cognite.com/api/v1/#tag/Files) store a sequence of bytes connected to one or more assets. For example, a file can contain a piping and instrumentation diagram (P&ID) showing how multiple assets are connected.
  
  * A [Time Series](https://docs.cognite.com/api/v1/#tag/Time-series) consists of a sequence of data points connected to a single asset. For example: A water pump asset can have a temperature time series that records a data point in units of °C every second.
  
  * [Sequences](https://docs.cognite.com/api/v1/#tag/Sequences) are similar to time series in that they store key-value pairs, but the key is another measurement, such as depth, rather than a timestamp. For example, sequences are used in practice to record measurements taken at various depths while drilling.
  
  * [3D models](https://docs.cognite.com/api/v1/#tag/3D-Models) are typically built up as a hierarchical structure, much like the asset hierarchy. 3D models are visualized via Cognite's dashboards.
  
* Refer back to the [SDK](https://cognite-sdk-python.readthedocs-hosted.com/en/latest/index.html) documentation for details on the available methods for accessing these objects and on their arguments.

### Collecting Asset Information

#### Retrieve a list of all Assets

* There are thousands of Assets in the CDP, so we will look at just a few examples.

* This call returns assets from the CDP without any filters, so the result is arbitrary. Generally we would want to apply filters when retrieving records.


In [4]:
client.assets.list(limit=5).to_pandas()

Unnamed: 0,createdTime,description,id,lastUpdatedTime,metadata,name,parentId,rootId
0,0,VRD - PH 1STSTGGEAR THRUST BRG OUT,702630644612,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '525283', ...",23-TE-96116-04,3117826349444493,6687602007296940
1,0,VRD - PH 1STSTG COMP SEAL GAS HTR,5156972057719,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '532924', ...",23-TE-96148,8515799768286580,6687602007296940
2,0,VRD - PH 1STSTGGEAR 1 JOURNBRG DE,8019487489463,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '446683', ...",23-YT-96117-01,3257705896277160,6687602007296940
3,0,SOFT TAG VRD - PH 1STSTG PRIM SEAL LEAK DE,9258567430091,0,"{'ELC_STATUS_ID': '1211', 'SOURCE_DB': 'workma...",23-FI-96151,4239585628663887,6687602007296940
4,0,VRD - PH 1STSTGSUCTSCRUBBER LEVEL,12670864495024,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '523206', ...",23-LT-92521,2069232457199305,6687602007296940


#### Decide on which asset we want to explore
* To get started exploring data in the CDP, we first need to decide on which Asset we want to gather information from.

* Some asset names may be retrieved from the [Op Int](https://opint.cogniteapp.com/publicdata/infographics/-LOHKEJPLvt0eRIZu8mE) dashboard.

* Some example asset names are:
  * 23-HA-9103
  * 23-PV-92583
  * 23-VG-9101
  
The *fuzzy* search for an asset can be performed as follows:


In [5]:
asset_name = "23-HA-9103"
asset_df = client.assets.search(name=asset_name).to_pandas()
asset_df.head()

Unnamed: 0,createdTime,description,id,lastUpdatedTime,metadata,name,parentId,rootId
0,0,VRD - 1ST STAGE SUCTION COOLER,2861239574637735,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '531306', ...",23-HA-9103,2513266419866445,6687602007296940
1,0,VRD - 1ST STAGE COMPRESSOR SEAL GAS PRESSURE B...,3089052537026304,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '476494', ...",23-KB-9103,3904753668320840,6687602007296940
2,0,VRD - 1ST STAGE COMPPRESSOR LUBE OIL RESERVOIR,2357112351749647,0,"{'ELC_STATUS_ID': '1211', 'SOURCE_DB': 'workma...",23-TX-9103,2137557577165478,6687602007296940
3,0,VRD - 1ST STAGE COMPRESSOR LUBE OIL COOLER A,4965752723543746,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '786890', ...",23-HA-9107A,2137557577165478,6687602007296940
4,0,VRD - 1ST STAGE COMPRESSOR LUBE OIL COOLER B,6838563873305104,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '786896', ...",23-HA-9107B,2137557577165478,6687602007296940


#### Get information on the asset of interest

* We can filter the search results by asset_name to get the ID of the asset we want.

* The *retrieve()* method returns the details of one specific asset, given its ID.


In [6]:
asset_id = int(asset_df[asset_df["name"] == asset_name].iloc[0]['id'])
asset = client.assets.retrieve(id=asset_id).to_pandas()
asset

Unnamed: 0,value
name,23-HA-9103
parentId,2513266419866445
description,VRD - 1ST STAGE SUCTION COOLER
id,2861239574637735
createdTime,0
lastUpdatedTime,0
rootId,6687602007296940
ELC_STATUS_ID,1211
RES_ID,531306
SOURCE_DB,workmate
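
Because the search is fuzzy, the results include near matches such as 23-KB-9103, and indexing the filtered frame with `.iloc[0]` fails with an opaque error if no exact match exists. The sketch below shows one way to make that selection step explicit. The helper `exact_asset_id` is a hypothetical name, and the small DataFrame simply copies the `name` and `id` columns from the search output above so the example runs without a connection.

```python
import pandas as pd


def exact_asset_id(search_df, name):
    """Return the id of the single asset whose name equals `name` exactly."""
    matches = search_df.loc[search_df["name"] == name, "id"]
    if matches.empty:
        raise LookupError(f"no asset named {name!r} in the search results")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} assets named {name!r}; expected one")
    return int(matches.iloc[0])


# Name and id columns copied from the search output above.
search_results = pd.DataFrame({
    "name": ["23-HA-9103", "23-KB-9103", "23-TX-9103", "23-HA-9107A", "23-HA-9107B"],
    "id": [2861239574637735, 3089052537026304, 2357112351749647,
           4965752723543746, 6838563873305104],
})

print(exact_asset_id(search_results, "23-HA-9103"))
```

Raising a `LookupError` with the asset name in the message makes a typo in `asset_name` easier to spot than an `IndexError` from `.iloc[0]`.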


#### How do we get Asset relationships?

* The *retrieve_subtree()* method can be used to retrieve an Asset together with its *children*.

* Each Asset is given various properties, some of the useful ones for this method are:

  * Depth: The number of edges from the parent node
  
  * Description: Includes information such as the platform and type of sensor being monitored
  
We will generate a list of all children of the main asset of interest. This is done by specifying a depth of 1. The result also includes the asset itself as its first row.

In [7]:
subtree_df = client.assets.retrieve_subtree(id=asset_id, depth=1).to_pandas()
subtree_df.head()

Unnamed: 0,createdTime,description,id,lastUpdatedTime,metadata,name,parentId,rootId
0,0,VRD - 1ST STAGE SUCTION COOLER,2861239574637735,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '531306', ...",23-HA-9103,2513266419866445,6687602007296940
1,0,VRD - PH 1STSTGSUCTCOOL SHELL PSV IN,274450897701725,0,"{'ELC_STATUS_ID': '1225', 'RES_ID': '444134', ...",45-HV-92510-01,2861239574637735,6687602007296940
2,0,VRD - PH 1STSTGSUCTCLR GAS IN,576308321452985,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '609895', ...",23-ESDV-92501A,2861239574637735,6687602007296940
3,0,VRD - PH 1STSTGSUCTCOOL SHELL PSV OUT,619750565594754,0,"{'ELC_STATUS_ID': '1225', 'RES_ID': '510103', ...",45-HV-92510-03,2861239574637735,6687602007296940
4,0,VRD - PH 1STSTGSUCTCOOL COOLMED OUT,705952550422793,0,"{'ELC_STATUS_ID': '1211', 'RES_ID': '485917', ...",45-PT-92508,2861239574637735,6687602007296940
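
To make the meaning of `depth` concrete, here is a small local illustration of a depth-limited traversal over `id`/`parentId` pairs. It is a sketch of the idea only, not the SDK's implementation; the function `subtree_ids` is a hypothetical helper, and the pairs are copied from the subtree output above.

```python
from collections import defaultdict, deque


def subtree_ids(parent_of, root_id, depth):
    """Return ids in the subtree of `root_id`, at most `depth` levels below it.

    `parent_of` maps each asset id to its parent id. The root itself is
    included, mirroring the output above where row 0 is the requested asset.
    """
    children = defaultdict(list)
    for asset_id, parent_id in parent_of.items():
        children[parent_id].append(asset_id)

    found = [root_id]
    queue = deque([(root_id, 0)])
    while queue:
        node, level = queue.popleft()
        if level == depth:
            continue  # do not descend past the requested depth
        for child in children[node]:
            found.append(child)
            queue.append((child, level + 1))
    return found


# id -> parentId pairs copied from the subtree output above.
parent_of = {
    2861239574637735: 2513266419866445,  # 23-HA-9103
    274450897701725: 2861239574637735,   # 45-HV-92510-01
    576308321452985: 2861239574637735,   # 23-ESDV-92501A
    619750565594754: 2861239574637735,   # 45-HV-92510-03
    705952550422793: 2861239574637735,   # 45-PT-92508
}

print(subtree_ids(parent_of, 2861239574637735, depth=1))
```

With `depth=0` only the asset itself is returned; each additional level adds one more generation of descendants.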


### Collecting Time Series Information and Data Points

* Time Series objects contain the bulk of information in the CDP.

* Time Series objects are generally linked to an asset through the asset_id field.

* The *list()* method has a variety of filters, all are listed in the SDK.

* By specifying **asset_ids** we get a dataframe of all time series objects associated with at least one of the specified assets. This is seen below.

#### Compile a list of time series objects under the asset
* Get the associated time series objects for all the child assets.

In [8]:
client.time_series.list(asset_ids=subtree_df.id.tolist()).to_pandas()

Unnamed: 0,assetId,createdTime,description,externalId,id,isStep,isString,lastUpdatedTime,metadata,name
0,5439867226448359,0,PH 1stStgSuctCool Gas In,VAL_23-TT-92512:X.Value,988967451935968,False,False,0,"{'tag': 'VAL_23-TT-92512:X.Value', 'scan': '1'...",VAL_23-TT-92512:X.Value
1,7835659687560027,0,PH 1stStgSuctCool Shell,VAL_45-PDT-92506:X.Value,1804879717768950,False,False,0,"{'tag': 'VAL_45-PDT-92506:X.Value', 'scan': '1...",VAL_45-PDT-92506:X.Value
2,5497674254221099,0,PH 1stStgSuctCool CoolMed In,VAL_45-FT-92506:X.Value,1920137775628302,False,False,0,"{'tag': 'VAL_45-FT-92506:X.Value', 'scan': '1'...",VAL_45-FT-92506:X.Value
3,7390359759479147,0,PH 1stStgSuctClr CoolMed Sply,VAL_45-TT-92506:X.Value,2288456938237513,False,False,0,"{'tag': 'VAL_45-TT-92506:X.Value', 'scan': '1'...",VAL_45-TT-92506:X.Value
4,786220428505816,0,PH 1stStgSuctCool Gas In,VAL_23-FT-92512:X.Value,3518012501014915,False,False,0,"{'tag': 'VAL_23-FT-92512:X.Value', 'scan': '1'...",VAL_23-FT-92512:X.Value
5,705952550422793,0,PH 1stStgSuctCool CoolMed Out,VAL_45-PT-92508:X.Value,3529821833330815,False,False,0,"{'tag': 'VAL_45-PT-92508:X.Value', 'scan': '1'...",VAL_45-PT-92508:X.Value
6,5891006566061532,0,PH 1stStgSuctClr CoolMed Out,VAL_45-TT-92508:X.Value,3929156348065703,False,False,0,"{'tag': 'VAL_45-TT-92508:X.Value', 'scan': '1'...",VAL_45-TT-92508:X.Value
7,4840206559741735,0,PH 1stStgSuctCool Gas In,VAL_23-TT-92502:X.Value,5474031062875475,False,False,0,"{'tag': 'VAL_23-TT-92502:X.Value', 'scan': '1'...",VAL_23-TT-92502:X.Value
8,5552927149248373,0,PH 1stStgSuctCool Gas In ESDV,VAL_23-PDT-92501:X.Value,5880632484472759,False,False,0,"{'tag': 'VAL_23-PDT-92501:X.Value', 'scan': '1...",VAL_23-PDT-92501:X.Value
9,2814662602621825,0,PH 1stStgSuctCool Gas In,VAL_23-PT-92512:X.Value,6156871056679530,False,False,0,"{'tag': 'VAL_23-PT-92512:X.Value', 'scan': '1'...",VAL_23-PT-92512:X.Value
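
The names in this listing are easier to read than the numeric ids, while the datapoint retrieval later in this notebook takes an id. The sketch below shows one way to resolve a name to its id with pandas. The three rows are copied from the listing above so the example runs offline; `id_by_name` is a name introduced only for this illustration.

```python
import pandas as pd

# Three rows (name, id, description) copied from the listing above.
ts_df = pd.DataFrame({
    "name": [
        "VAL_23-TT-92512:X.Value",
        "VAL_45-PDT-92506:X.Value",
        "VAL_45-FT-92506:X.Value",
    ],
    "id": [988967451935968, 1804879717768950, 1920137775628302],
    "description": [
        "PH 1stStgSuctCool Gas In",
        "PH 1stStgSuctCool Shell",
        "PH 1stStgSuctCool CoolMed In",
    ],
})

# Series indexed by name, so a readable name resolves to the numeric id.
id_by_name = ts_df.set_index("name")["id"]
ts_id = int(id_by_name["VAL_23-TT-92512:X.Value"])
print(ts_id)
```

In the notebook itself, the same lookup could be applied to the DataFrame returned by `client.time_series.list(...).to_pandas()` instead of the hand-copied rows.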


#### View datapoints for one of the time series

* Now that we have a list of all of the time series directly related to the immediate children of our asset, we can retrieve datapoints for some of them.

* A Datapoint in the CDP is stored as a key-value pair:

  * timestamp is the time since epoch in milliseconds
  
  * value is the reading from the sensor
  
The identifier used to retrieve Datapoints here is the **id** column from the DataFrame above.

In [9]:
ts_id = 988967451935968
client.datapoints.retrieve(id=ts_id, start="10d-ago", end="now").to_pandas().head()

Unnamed: 0,VAL_23-TT-92512:X.Value
2019-11-04 08:50:21.205,39.820095
2019-11-04 08:50:40.247,39.792019
2019-11-04 08:51:22.215,39.820095
2019-11-04 08:51:32.235,39.876251
2019-11-04 08:51:43.221,39.932404
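
Since timestamps are stored as milliseconds since the epoch while the output above shows readable dates, it can help to see the conversion spelled out. The following sketch uses only the standard library and assumes the displayed timestamps are in UTC; `to_epoch_ms` and `from_epoch_ms` are hypothetical helper names, not SDK functions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms):
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# First timestamp in the output above, assuming it is expressed in UTC.
first = datetime(2019, 11, 4, 8, 50, 21, 205000, tzinfo=timezone.utc)
first_ms = to_epoch_ms(first)
print(first_ms)

# A ten-day window ending at that moment, similar in spirit to "10d-ago".
start_ms = to_epoch_ms(first - timedelta(days=10))
print(first_ms - start_ms)
```

Integer division of two `timedelta` objects keeps the arithmetic exact, which avoids the rounding that can creep in when multiplying a float `timestamp()` by 1000.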
