# Time Stitching Demo

This notebook demonstrates how to use relationships to stitch together time series from different sensors.

In [1]:
from matplotlib import pyplot as plt
import pandas
import os
from datetime import datetime
from getpass import getpass
import uuid
import random
from cognite.client.experimental import CogniteClient
from cognite.client.data_classes import Asset, Relationship, TimeSeries
from cognite.client.exceptions import *

# The Scenario
There is a heat sensor on an asset in an asset hierarchy. The sensor broke and was replaced. Each time series is associated with a specific sensor unit. While the first sensor was broken, it generated known bad readings. How do we correctly stitch together data from the two sensors to give a complete and accurate picture of the temperature of this asset over time?

# Relationships
Relationships have optional start and end times, indicating a time interval during which the relationship between the resources is considered _active_. By using the start and end times on the relationships between the sensor units and the heat sensor in the asset hierarchy, we can select temperature data from the correct time series at the correct time and stitch it together into a single pandas dataframe!
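
To make the idea of an _active_ interval concrete, here is a minimal sketch in plain Python. The helper name `is_active` is hypothetical, and treating the interval as half-open (start inclusive, end exclusive) is an assumption made for illustration, not something the relationships API is confirmed to guarantee. A missing bound is treated as unbounded on that side.

```python
from typing import Optional


def is_active(timestamp_ms: int,
              start_time: Optional[int] = None,
              end_time: Optional[int] = None) -> bool:
    """Return True if timestamp_ms falls inside [start_time, end_time).

    A bound of None means the interval is open on that side.
    """
    after_start = start_time is None or timestamp_ms >= start_time
    before_end = end_time is None or timestamp_ms < end_time
    return after_start and before_end


# Hypothetical bounds in epoch milliseconds.
print(is_active(1_500, start_time=1_000, end_time=2_000))  # inside the interval
print(is_active(2_500, start_time=1_000, end_time=2_000))  # past the end
print(is_active(2_500, start_time=1_000))                  # no end time set
```

In this demo the first sensor's relationship only has an end time and the second sensor's relationship only has a start time, so both open-ended cases occur.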

**For this demo you will need**: access to a CDF tenant with the playground APIs enabled and an API key for a user with read and write access to **assets**, **timeseries**, **datapoints** and **relationships**.

# RUN THE CLEAN UP STEP WHEN FINISHED
This demo creates model data to demonstrate using relationships for time series, which will be cleaned up in the **CLEAN UP STEP**. Be sure to run it when you are done!

In [2]:
project = input("Enter project name: ")

Enter project name:  caelyn-first-project


In [3]:
base_url = input("Enter base url for project (leave blank to use greenfield)")
if not base_url:
    base_url = "https://greenfield.cognitedata.com"

print("Using base url: ", base_url)

Enter base url for project (leave blank to use greenfield) 


Using base url:  https://greenfield.cognitedata.com


In [4]:
api_key_request = f"Enter API-KEY for {project} :"
api_key = getpass(api_key_request)

Enter API-KEY for caelyn-first-project : ················································


In [5]:
client_name = "Time Stitching Demo"
client = CogniteClient(project=project,api_key=api_key,client_name=client_name, base_url=base_url)

# Setting up Assets

A **widgetHeatSensor** tracks the temperature of a widget. A **heatSensor** is a specific heat sensor instance.

In [6]:
source = "time_stitching_demo"
widget_heat_sensor_external_id = "widgetHeatSensor:" + str(uuid.uuid4())
widget_heat_sensor = Asset(
    external_id=widget_heat_sensor_external_id,
    name="widgetHeatSensor",
    description="A heat sensor for a widget",
    source=source
)
sensor_instance_1_external_id = "heatSensor:" + str(uuid.uuid4())
sensor_instance_1 = Asset(
    external_id=sensor_instance_1_external_id,
    parent_external_id = widget_heat_sensor_external_id,
    name="heatSensor",
    description="A heat sensor unit",
    source=source
)
sensor_instance_2_external_id = "heatSensor:" + str(uuid.uuid4())
sensor_instance_2 = Asset(
    external_id=sensor_instance_2_external_id,
    parent_external_id=widget_heat_sensor_external_id,
    name="heatSensor",
    description="A heat sensor unit",
    source=source
)
assets = client.assets.create([widget_heat_sensor, sensor_instance_1, sensor_instance_2]).to_pandas()
assets

Unnamed: 0,externalId,name,description,metadata,source,id,createdTime,lastUpdatedTime,rootId,parentId,parentExternalId
0,widgetHeatSensor:22b8de78-0a2e-4930-b1a8-f7315...,widgetHeatSensor,A heat sensor for a widget,{},time_stitching_demo,1558048972229913,1586264971236,1586264971236,1558048972229913,,
1,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b,heatSensor,A heat sensor unit,{},time_stitching_demo,7026854450382658,1586264971236,1586264971236,1558048972229913,1558048972229913.0,widgetHeatSensor:22b8de78-0a2e-4930-b1a8-f7315...
2,heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c8,heatSensor,A heat sensor unit,{},time_stitching_demo,7049557553599949,1586264971236,1586264971236,1558048972229913,1558048972229913.0,widgetHeatSensor:22b8de78-0a2e-4930-b1a8-f7315...


# Setting up TimeSeries
Each **heatSensor** has its own **timeSeries**.

In [7]:
sensor_instance_1_id = assets.loc[assets["externalId"] == sensor_instance_1_external_id].iloc[0]["id"]
sensor_instance_2_id = assets.loc[assets["externalId"] == sensor_instance_2_external_id].iloc[0]["id"]

sensor_1_timeseries_external_id = sensor_instance_1_external_id + ":timeseries"
sensor_1_timeseries = TimeSeries(
    external_id=sensor_1_timeseries_external_id,
    asset_id=sensor_instance_1_id,
    name="heatSensorTimeSeries",
    description="A time series for a heat sensor",
    unit="degrees, celsius"
)
sensor_2_timeseries_external_id = sensor_instance_2_external_id + ":timeseries"
sensor_2_timeseries = TimeSeries(
    external_id=sensor_2_timeseries_external_id,
    asset_id=sensor_instance_2_id,
    name="heatSensorTimeSeries",
    description="A time series for a heat sensor",
    unit="degrees, celsius"
)
client.time_series.create([sensor_1_timeseries, sensor_2_timeseries])

Unnamed: 0,id,external_id,name,is_string,unit,asset_id,is_step,description,created_time,last_updated_time
0,4620851781779103,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175...,heatSensorTimeSeries,False,"degrees, celsius",7026854450382658,False,A time series for a heat sensor,1586264982766,1586264982766
1,1170316637448719,heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c...,heatSensorTimeSeries,False,"degrees, celsius",7049557553599949,False,A time series for a heat sensor,1586264982766,1586264982766


# Populating TimeSeries
The **first sensor** breaks at 2017-10-15 14:35:47 and begins recording incorrect readings, roughly 50 to 150 degrees higher than the actual temperature. The **second sensor** is installed and begins recording correct data at 2017-10-15 14:41:42.

In [8]:
sensor_break_time = int(datetime(2017, 10, 15, 14, 35, 47).timestamp() * 1000)
sensor_replacement_time = int(datetime(2017, 10, 15, 14, 41, 42).timestamp() * 1000)

sensor_1_start = sensor_break_time - (10 * 1000)
sensor_1_end = sensor_replacement_time + (10 * 1000)
sensor_1_data = lambda x: random.uniform(50, 55) if x < sensor_break_time else random.uniform(100,200)
sensor_1_datapoints = [(x, sensor_1_data(x)) for x in range(sensor_1_start, sensor_1_end, 1000)] 

sensor_2_start = sensor_replacement_time
sensor_2_end = sensor_replacement_time + (30 * 1000)
sensor_2_data = lambda x: random.uniform(50,55)
sensor_2_datapoints = [(x, sensor_2_data(x)) for x in range(sensor_2_start, sensor_2_end, 1000)]

client.datapoints.insert(sensor_1_datapoints, external_id=sensor_1_timeseries_external_id)
client.datapoints.insert(sensor_2_datapoints, external_id=sensor_2_timeseries_external_id)
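
One detail worth noting: a naive `datetime` (one without a time zone) is interpreted in the machine's local time zone when `.timestamp()` is called, so the millisecond values above depend on where the notebook runs. The sketch below pins the conversion to UTC instead. The helper name `to_epoch_ms` is hypothetical, and choosing UTC is an assumption based on the timestamps shown in this notebook's outputs.

```python
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp() * 1000)


break_time_utc = to_epoch_ms(datetime(2017, 10, 15, 14, 35, 47, tzinfo=timezone.utc))
replacement_time_utc = to_epoch_ms(datetime(2017, 10, 15, 14, 41, 42, tzinfo=timezone.utc))
print(break_time_utc, replacement_time_utc)
```

With UTC, these two values are 1508078147000 and 1508078502000, which match the `end_time` and `start_time` that appear in the relationship output later in this notebook.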

# Setting up Relationships

Each **timeSeries** _belongsTo_ its **heatSensor**. This relationship has no time range associated with it, as it is considered _active_ at all times. Each **heatSensor** _implements_ the **widgetHeatSensor**. The relationship for the first **heatSensor** has an _endTime_ equal to when that sensor broke. The relationship for the second **heatSensor** has a _startTime_ equal to when that sensor began recording data.

In [9]:
widget_heat_sensor_resource = {"resource": "asset", "resourceId": widget_heat_sensor_external_id}
sensor_1_resource = {"resource": "asset", "resourceId": sensor_instance_1_external_id}
sensor_2_resource = {"resource": "asset", "resourceId": sensor_instance_2_external_id}
sensor_1_timeseries_resource = {"resource": "timeseries", "resourceId": sensor_1_timeseries_external_id}
sensor_2_timeseries_resource = {"resource": "timeseries", "resourceId": sensor_2_timeseries_external_id}

sensor_1_to_timeseries_relationship_external_id = sensor_instance_1_external_id + ":readings"
sensor_1_to_timeseries_relationship = Relationship(
    external_id=sensor_1_to_timeseries_relationship_external_id,
    relationship_type="belongsTo",
    confidence="0.99",
    source=sensor_1_timeseries_resource,
    target=sensor_1_resource,
    data_set=source
)
sensor_2_to_timeseries_relationship_external_id = sensor_instance_2_external_id + ":readings"
sensor_2_to_timeseries_relationship = Relationship(
    external_id=sensor_2_to_timeseries_relationship_external_id,
    relationship_type="belongsTo",
    confidence="0.99",
    source=sensor_2_timeseries_resource,
    target=sensor_2_resource,
    data_set=source
)
widget_heat_sensor_implementation_1_external_id = widget_heat_sensor_external_id + ":impl:1"
widget_heat_sensor_1_relationship = Relationship(
    external_id=widget_heat_sensor_implementation_1_external_id,
    relationship_type="implements",
    confidence="0.99",
    source=sensor_1_resource,
    target=widget_heat_sensor_resource,
    data_set=source,
    end_time = sensor_break_time
)
widget_heat_sensor_implementation_2_external_id = widget_heat_sensor_external_id + ":impl:2"
widget_heat_sensor_2_relationship = Relationship(
    external_id=widget_heat_sensor_implementation_2_external_id,
    relationship_type="implements",
    confidence="0.99",
    source=sensor_2_resource,
    target=widget_heat_sensor_resource,
    data_set=source,
    start_time = sensor_replacement_time
)
relationships = [sensor_1_to_timeseries_relationship, sensor_2_to_timeseries_relationship, widget_heat_sensor_1_relationship, widget_heat_sensor_2_relationship]
client.relationships.create(relationships)

Unnamed: 0,source,target,confidence,data_set,external_id,relationship_type,created_time,last_updated_time,end_time,start_time
0,{'resourceId': 'heatSensor:a513a7b6-a67b-440a-...,{'resourceId': 'heatSensor:a513a7b6-a67b-440a-...,0.99,time_stitching_demo,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175...,belongsTo,1586265019896,1586265019896,,
1,{'resourceId': 'heatSensor:985ccacb-1594-4ea0-...,{'resourceId': 'heatSensor:985ccacb-1594-4ea0-...,0.99,time_stitching_demo,heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c...,belongsTo,1586265019896,1586265019896,,
2,{'resourceId': 'heatSensor:a513a7b6-a67b-440a-...,{'resourceId': 'widgetHeatSensor:22b8de78-0a2e...,0.99,time_stitching_demo,widgetHeatSensor:22b8de78-0a2e-4930-b1a8-f7315...,implements,1586265019896,1586265019896,1508078147000.0,
3,{'resourceId': 'heatSensor:985ccacb-1594-4ea0-...,{'resourceId': 'widgetHeatSensor:22b8de78-0a2e...,0.99,time_stitching_demo,widgetHeatSensor:22b8de78-0a2e-4930-b1a8-f7315...,implements,1586265019896,1586265019896,,1508078502000.0


# Finding the widgetHeatSensor
We start by selecting the **widgetHeatSensor**.

In [10]:
heat_sensors = client.assets.list(external_id_prefix="widgetHeatSensor")
heat_sensors

Unnamed: 0,external_id,name,description,metadata,source,id,created_time,last_updated_time,root_id
0,widgetHeatSensor:22b8de78-0a2e-4930-b1a8-f7315...,widgetHeatSensor,A heat sensor for a widget,{},time_stitching_demo,1558048972229913,1586264971236,1586264971236,1558048972229913


# Finding the heatSensors
Once we have our **widgetHeatSensor**, we want to find which **heatSensor**s _implement_ it, and when. To make this easier to follow, we move the **externalId** of each sensor into its own column and drop all the other columns we won't need.

In [16]:
sensors = client.relationships.list(
    relationship_type="implements",
    targets = [{"resourceId": heat_sensors[0].external_id}]).to_pandas()
sensors["sensorId"] = sensors["source"].map(lambda source: str(source["resourceId"]))
sensor_filter_data = sensors[["sensorId", "startTime", "endTime"]]
sensor_filter_data

Unnamed: 0,sensorId,startTime,endTime
0,heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c8,1508078502000.0,
1,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b,,1508078147000.0


# Finding the TimeSeries
The next step, now that we have our **sensorIds**, is to find the timeseries that have a relationship to those sensors.

In [19]:
matching_sensor_ids = sensor_filter_data["sensorId"].tolist()
sensor_timeseries_relationships = client.relationships.list(
    sources=[{"resource": "timeseries"}],
    targets = [{"resourceId" : ms} for ms in matching_sensor_ids]).to_pandas()
sensor_timeseries_source_target = sensor_timeseries_relationships[["target", "source"]]
sensor_timeseries_external_ids = sensor_timeseries_source_target.applymap(lambda x: str(x["resourceId"]))

sensor_timeseries_column_renaming = { "source": "timeseriesId", "target": "sensorId" }
sensor_timeseries = sensor_timeseries_external_ids.rename(columns=sensor_timeseries_column_renaming)
sensor_timeseries

Unnamed: 0,sensorId,timeseriesId
0,heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c8,heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c...
1,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175...


# Building a filter for Datapoints

Let's put it all together! **sensor_timeseries** connects the **sensorId** to the **timeseriesId**. **sensor_filter_data** connects the **sensorId** to the sensor's active time range. We merge the two dataframes and then normalize the bounds: a missing **startTime** becomes 0 and a missing **endTime** becomes the current time. That gives us everything we need to retrieve our datapoints! We then sort by **startTime** so that we get the earliest set of data first.

In [20]:
datapoints_filter_dataframe = sensor_filter_data.merge(sensor_timeseries, how="inner", on="sensorId")
# Missing bounds come back as NaN: a missing start becomes 0, a missing end becomes now.
normalize_start = lambda x: 0 if pandas.isna(x) else int(x)
datapoints_filter_dataframe["startTime"] = datapoints_filter_dataframe["startTime"].apply(normalize_start)
now = int(datetime.now().timestamp()) * 1000
normalize_end = lambda x: now if pandas.isna(x) else int(x)
datapoints_filter_dataframe["endTime"] = datapoints_filter_dataframe["endTime"].apply(normalize_end)
datapoints_filters = datapoints_filter_dataframe.to_dict(orient="records")
datapoints_filters.sort(key=lambda x: x["startTime"])
datapoints_filters

[{'sensorId': 'heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b',
  'startTime': 0,
  'endTime': 1508078147000,
  'timeseriesId': 'heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b:timeseries'},
 {'sensorId': 'heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c8',
  'startTime': 1508078502000,
  'endTime': 1586270909000,
  'timeseriesId': 'heatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c8:timeseries'}]
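
The same merge-and-normalize step can be tried without a CDF connection. The following is a minimal sketch on hypothetical data: the sensor names, timestamps, and the fixed `now_ms` value are all made up for illustration. It uses `fillna` as a vectorized alternative to a per-row function for replacing missing bounds.

```python
import pandas

# Hypothetical active intervals; NaN marks a missing (open) bound.
intervals = pandas.DataFrame({
    "sensorId": ["sensorB", "sensorA"],
    "startTime": [2_000.0, float("nan")],
    "endTime": [float("nan"), 1_000.0],
})
# Hypothetical mapping from each sensor to its time series.
links = pandas.DataFrame({
    "sensorId": ["sensorA", "sensorB"],
    "timeseriesId": ["sensorA:timeseries", "sensorB:timeseries"],
})

now_ms = 5_000  # stand-in for the current time in epoch milliseconds

filters = intervals.merge(links, how="inner", on="sensorId")
# Open start -> 0, open end -> now; cast back to integers afterwards.
filters["startTime"] = filters["startTime"].fillna(0).astype("int64")
filters["endTime"] = filters["endTime"].fillna(now_ms).astype("int64")
filters = filters.sort_values("startTime").reset_index(drop=True)
print(filters.to_dict(orient="records"))
```

After sorting, the sensor with the open start comes first, which mirrors the ordering of the filters above.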

# Fetching and Appending the DataPoints

For each of the filters we built above, we fetch the matching set of datapoints. We then rename the column of each resulting pandas dataframe to just **data**, so that we can concatenate them.

In [21]:
datapoints = [
    client.datapoints.retrieve(
        external_id = fd["timeseriesId"],
        start = fd["startTime"],
        end = fd["endTime"]).to_pandas().rename(
        columns={ fd["timeseriesId"]: "data" }) for fd in datapoints_filters
    ]
# DataFrame.append was removed in pandas 2.0; pandas.concat works across versions.
final_dataframe = pandas.concat(datapoints)
final_dataframe

Unnamed: 0,data
2017-10-15 14:35:37,50.169152
2017-10-15 14:35:38,51.423075
2017-10-15 14:35:39,50.655519
2017-10-15 14:35:40,50.345947
2017-10-15 14:35:41,50.626169
2017-10-15 14:35:42,52.447916
2017-10-15 14:35:43,54.6348
2017-10-15 14:35:44,53.924057
2017-10-15 14:35:45,53.40086
2017-10-15 14:35:46,52.876919


And there is a dataframe containing a single timeseries for the temperature of our asset!
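
To see why the stitched result excludes the bad readings, here is a small offline sketch of the same technique on hypothetical data. The readings, timestamps, and window bounds are invented for illustration, and the windows are assumed to be half-open (start inclusive, end exclusive).

```python
import pandas

# Hypothetical readings indexed by epoch milliseconds.
# The old sensor reports bad values from t=2000 onward.
old_sensor = pandas.DataFrame({"data": [50.0, 51.0, 150.0, 160.0]},
                              index=[0, 1_000, 2_000, 3_000])
new_sensor = pandas.DataFrame({"data": [52.0, 53.0]},
                              index=[3_000, 4_000])

# Active windows as (frame, start, end): the old sensor is trusted until
# it breaks at t=2000, the new sensor from t=3000 onward.
windows = [(old_sensor, 0, 2_000), (new_sensor, 3_000, 5_000)]

# Keep only the rows inside each window, then join the pieces in time order.
stitched = pandas.concat(
    [frame[(frame.index >= start) & (frame.index < end)]
     for frame, start, end in windows]
).sort_index()
print(stitched)
```

The readings of 150.0 and 160.0 fall outside the old sensor's active window, so they never reach the stitched dataframe. The gap between the two windows stays empty rather than being filled with known bad data.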

# CLEAN UP STEP

Make sure to run the cell below to clean up the **assets**, **relationships** and **timeseries** we created for this demo!

In [22]:
relationship_external_ids = [
    sensor_1_to_timeseries_relationship_external_id,
    sensor_2_to_timeseries_relationship_external_id,
    widget_heat_sensor_implementation_1_external_id,
    widget_heat_sensor_implementation_2_external_id
]
client.relationships.delete(relationship_external_ids)

timeseries_external_ids = [
    sensor_1_timeseries_external_id,
    sensor_2_timeseries_external_id
]
for timeseries_external_id in timeseries_external_ids:
    client.datapoints.delete_range(start=0, end=now, external_id=timeseries_external_id)
    client.time_series.delete(external_id=timeseries_external_id)

asset_external_ids = [
    widget_heat_sensor_external_id,
    sensor_instance_1_external_id,
    sensor_instance_2_external_id
]
client.assets.delete(external_id=asset_external_ids)

# Thank You!