# Time Stitching Demo

This notebook demonstrates how to use relationships to stitch together timeseries.

## Suggested pre-read

Before you begin, have a look at the API documentation for relationships: https://cognite-sdk-experimental.readthedocs-hosted.com/en/latest/cognite.html#relationships

## Relationships
Relationships have optional start and end times, indicating a time interval during which the relationship between the resources is considered _active_. By using the start and end times on the relationships between the sensor units and the heat sensor in the asset hierarchy, we can select temperature data from the correct time series at the correct time and stitch it together into a single pandas dataframe!

**For this demo you will need**: access to a CDF tenant and an API key for a user with read and write access to `assets`, `timeseries` and `relationships`.

## The Scenario

There is a heat sensor on an asset in an asset hierarchy. The original sensor unit broke and was replaced. Each timeseries is associated with a specific sensor unit. While the first sensor was broken, it generated known bad readings.

This tutorial shows you how you can correctly stitch together data from two sensors to give a complete and accurate picture of the temperature of this asset over time.

## Imports and Cognite SDK client

First, as always, we will need to import some modules and create a Cognite SDK client.

In [1]:
import pandas
import os
from datetime import datetime, timedelta
from getpass import getpass
from cognite.client.experimental import CogniteClient
from cognite.client.data_classes import Asset, Relationship, TimeSeries
from cognite.client.exceptions import *

In [2]:
client_name = "Time Stitching Demo"
project = input("Please enter your project name: ")
base_url = input("Please enter base url for %s (leave blank to use greenfield)" % project)
if base_url == "":
    base_url = "https://greenfield.cognitedata.com"
api_key = getpass("Please enter API-KEY for %s :" % project)

client = CogniteClient(client_name=client_name, project=project, api_key=api_key, base_url=base_url)

Please enter your project name:  caelyn-first-project
Please enter base url for caelyn-first-project (leave blank to use greenfield) 
Please enter API-KEY for caelyn-first-project : ················································


## Verify Client Setup

Retrieve some assets to verify that we have access to the project.

In [3]:
assets = client.assets.list(limit=3)
assets

Unnamed: 0,external_id,name,parent_id,description,metadata,id,created_time,last_updated_time,root_id,parent_external_id
0,sensor:def456,Sensor2,6295122166955048,A specific sensor with a serial number of def456,{},934598399574514,1585224438483,1585224438483,7617695584661258,heatSensor123
1,heatSensor123,HeatSensor,7617695584661258,A heat sensor for a widget,{},6295122166955048,1585224158489,1585224158489,7617695584661258,widget123
2,sensor:abc123,Sensor1,6295122166955048,A specific sensor with a serial number of abc123,{},6819467016278382,1585224438483,1585224438483,7617695584661258,heatSensor123


## Setting up Assets and Timeseries

A **Widget Heat Sensor** tracks the temperature of a widget. A **Heat Sensor** is a specific heat sensor instance. Here is a diagram showing the asset hierarchy.

![A diagram with a widget heat sensor as a root node, with two heat sensor children, each with a timeseries child](timestitching_hierarchy.png "Time Stitching Assets and Timeseries")

In [4]:
# Convenience functions that reuse any assets or timeseries left over from a previous run of this tutorial that was not cleaned up
def retrieve_or_create_asset(asset_data):
    asset = client.assets.retrieve(external_id=asset_data.external_id)
    if not asset:
        asset = client.assets.create(asset_data)
    return asset


def retrieve_or_create_timeseries(timeseries_data):
    timeseries = client.time_series.retrieve(external_id=timeseries_data.external_id)
    if not timeseries:
        timeseries = client.time_series.create(timeseries_data)
    return timeseries


widget_heat_sensor_external_id = "widgetHeatSensor:985ccacb-1594-4ea0-9bd3-91986115a1c8"
widget_heat_sensor = Asset(
    external_id=widget_heat_sensor_external_id,
    name="Widget Heat Sensor",
    description="A heat sensor for a widget",
)
retrieve_or_create_asset(widget_heat_sensor)

sensor_1_external_id = "heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b"
sensor_1 = Asset(
    external_id=sensor_1_external_id,
    parent_external_id = widget_heat_sensor_external_id,
    name="Heat Sensor 1",
    description="A heat sensor unit",
)
sensor_1_id = retrieve_or_create_asset(sensor_1).id

sensor_2_external_id = "heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b"
sensor_2 = Asset(
    external_id=sensor_2_external_id,
    parent_external_id = widget_heat_sensor_external_id,
    name="Heat Sensor 2",
    description="A heat sensor unit",
)
sensor_2_id = retrieve_or_create_asset(sensor_2).id

sensor_1_timeseries = TimeSeries(
    external_id=sensor_1_external_id,
    asset_id=sensor_1_id,
    name="Heat Sensor 1 Timeseries",
    description="A time series for a heat sensor",
    unit="degrees, celsius"
)
retrieve_or_create_timeseries(sensor_1_timeseries)

sensor_2_timeseries = TimeSeries(
    external_id=sensor_2_external_id,
    asset_id=sensor_2_id,
    name="Heat Sensor 2 Timeseries",
    description="A time series for a heat sensor",
    unit="degrees, celsius"
)
retrieve_or_create_timeseries(sensor_2_timeseries)
asset_external_ids = [widget_heat_sensor_external_id, sensor_1_external_id, sensor_2_external_id]
client.assets.retrieve_multiple(external_ids=asset_external_ids)

Unnamed: 0,external_id,name,description,metadata,id,created_time,last_updated_time,root_id,parent_id,parent_external_id
0,widgetHeatSensor:985ccacb-1594-4ea0-9bd3-91986...,Widget Heat Sensor,A heat sensor for a widget,{},6608346262502101,1586934731889,1586934731889,6608346262502101,,
1,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b,Heat Sensor 1,A heat sensor unit,{},4252607884038200,1586934731985,1586934731985,6608346262502101,6608346262502101.0,widgetHeatSensor:985ccacb-1594-4ea0-9bd3-91986...
2,heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b,Heat Sensor 2,A heat sensor unit,{},7520888093718981,1586934732089,1586934732089,6608346262502101,6608346262502101.0,widgetHeatSensor:985ccacb-1594-4ea0-9bd3-91986...


## Populating TimeSeries
The temperature of the asset is a constant 50°C. The **first sensor** breaks at 2017-10-15 14:35:47 and begins recording incorrect readings of 150°C. The **second sensor** is installed and begins recording correct data at 2017-10-15 14:41:42. **NOTE:** All times are in UTC.
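
Because all times in this scenario are UTC, the millisecond timestamps can be checked independently of the machine's local time zone. The following is a minimal sketch, assuming timezone-aware `datetime` objects; `utc_ms` is a hypothetical helper name and not part of the SDK.

```python
from datetime import datetime, timezone


def utc_ms(year, month, day, hour, minute, second):
    """Return epoch milliseconds for a wall-clock time given in UTC."""
    aware = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(aware.timestamp() * 1000)


# The two key moments of the scenario, expressed in epoch milliseconds.
break_ms = utc_ms(2017, 10, 15, 14, 35, 47)
replacement_ms = utc_ms(2017, 10, 15, 14, 41, 42)

# Length of the period during which only bad readings exist.
gap_seconds = (replacement_ms - break_ms) // 1000
```

These values agree with the `end_time` and `start_time` that appear in the relationships output further down.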

In [5]:
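# The API expects timestamps in milliseconds since the epoch, so we multiply by 1000.
# The naive datetimes below are treated as UTC, independent of the local time zone.
def datetime_to_ms(ts):
    return int((ts - datetime(1970, 1, 1)).total_seconds() * 1000)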


sensor_break_datetime = datetime(2017, 10, 15, 14, 35, 47)
sensor_break_ms = datetime_to_ms(sensor_break_datetime)
sensor_replacement_datetime = datetime(2017, 10, 15, 14, 41, 42)
sensor_replacement_ms = datetime_to_ms(sensor_replacement_datetime)

sensor_1_start_datetime = sensor_break_datetime - timedelta(seconds=10)
sensor_1_start_ms = datetime_to_ms(sensor_1_start_datetime)
sensor_1_end_datetime = sensor_replacement_datetime + timedelta(seconds=10)
sensor_1_end_ms = datetime_to_ms(sensor_1_end_datetime)

sensor_1_datapoints = [(t, 50.0) for t in range(sensor_1_start_ms, sensor_break_ms, 1000)]
sensor_1_incorrect_datapoints = [(t, 150.0) for t in range(sensor_break_ms, sensor_1_end_ms, 1000)]
sensor_1_datapoints.extend(sensor_1_incorrect_datapoints)

sensor_2_start_datetime = sensor_replacement_datetime
sensor_2_start_ms = datetime_to_ms(sensor_2_start_datetime)
sensor_2_end_datetime = sensor_replacement_datetime + timedelta(seconds=30)
sensor_2_end_ms = datetime_to_ms(sensor_2_end_datetime)

sensor_2_datapoints = [(t, 50.0) for t in range(sensor_2_start_ms, sensor_2_end_ms, 1000)]

client.datapoints.insert(sensor_1_datapoints, external_id=sensor_1_external_id)
client.datapoints.insert(sensor_2_datapoints, external_id=sensor_2_external_id)

## Setting up Relationships

Each `timeseries` _belongsTo_ a **Heat Sensor**. This relationship has no time range because it is permanent (_active_ at all times). Each **Heat Sensor** _implements_ the **Widget Heat Sensor**. The relationship for the first **Heat Sensor** has an `endTime` equal to when that sensor broke. The relationship for the second **Heat Sensor** has a `startTime` equal to when that sensor began recording data. Here is a diagram showing the relationships between the assets and timeseries.

![A diagram with a widget heat sensor as a root node, with two heat sensor children, each with a timeseries child](timestitching_relationships.png "Time Stitching Assets and Timeseries")
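
To make the effect of the time ranges concrete, here is a minimal sketch that models the two _implements_ relationships as plain dictionaries (not SDK objects) and asks which sensor is active at a given moment. The names `implements` and `active_sources` are hypothetical, and treating the end of an interval as exclusive is an assumption made for this sketch.

```python
BREAK_MS = 1508078147000        # 2017-10-15 14:35:47 UTC
REPLACEMENT_MS = 1508078502000  # 2017-10-15 14:41:42 UTC

# Plain-dict stand-ins for the two "implements" relationships.
implements = [
    {"source": "sensor_1", "start_time": None, "end_time": BREAK_MS},
    {"source": "sensor_2", "start_time": REPLACEMENT_MS, "end_time": None},
]


def active_sources(relationships, t_ms):
    """Return the sources whose [start_time, end_time) interval contains t_ms.

    A missing bound (None) is treated as open-ended.
    """
    result = []
    for rel in relationships:
        after_start = rel["start_time"] is None or t_ms >= rel["start_time"]
        before_end = rel["end_time"] is None or t_ms < rel["end_time"]
        if after_start and before_end:
            result.append(rel["source"])
    return result


before_break = active_sources(implements, BREAK_MS - 1000)
during_gap = active_sources(implements, BREAK_MS + 60_000)
after_replacement = active_sources(implements, REPLACEMENT_MS)
```

Under these assumptions no sensor is active between the break and the replacement, which is exactly the period covered by the bad readings.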

In [15]:
# Convenience function that reuses any relationships left over from a previous run of this tutorial that was not cleaned up
def retrieve_or_create_relationship(relationship_data):
    relationship = client.relationships.retrieve(external_id=relationship_data.external_id)
    if not relationship:
        relationship = client.relationships.create(relationship_data)
    return relationship

widget_heat_sensor_resource = {"resource": "asset", "resourceId": widget_heat_sensor_external_id}
sensor_1_resource = {"resource": "asset", "resourceId": sensor_1_external_id}
sensor_2_resource = {"resource": "asset", "resourceId": sensor_2_external_id}
sensor_1_timeseries_resource = {"resource": "timeseries", "resourceId": sensor_1_external_id}
sensor_2_timeseries_resource = {"resource": "timeseries", "resourceId": sensor_2_external_id}
data_set = "time_stitching_demo"

sensor_1_to_timeseries_relationship = Relationship(
    external_id=sensor_1_external_id,
    relationship_type="belongsTo",
    data_set=data_set,
    confidence=0.99,
    source=sensor_1_timeseries_resource,
    target=sensor_1_resource
)
retrieve_or_create_relationship(sensor_1_to_timeseries_relationship)

sensor_2_to_timeseries_relationship = Relationship(
    external_id=sensor_2_external_id,
    relationship_type="belongsTo",
    data_set=data_set,
    confidence=0.99,
    source=sensor_2_timeseries_resource,
    target=sensor_2_resource
)
retrieve_or_create_relationship(sensor_2_to_timeseries_relationship)


widget_heat_sensor_implementation_1_external_id = widget_heat_sensor_external_id + ":impl:1"
widget_heat_sensor_1_relationship = Relationship(
    external_id=widget_heat_sensor_implementation_1_external_id,
    relationship_type="implements",
    data_set=data_set,
    confidence=0.99,
    source=sensor_1_resource,
    target=widget_heat_sensor_resource,
    end_time = sensor_break_ms
)
retrieve_or_create_relationship(widget_heat_sensor_1_relationship)

widget_heat_sensor_implementation_2_external_id = widget_heat_sensor_external_id + ":impl:2"
widget_heat_sensor_2_relationship = Relationship(
    external_id=widget_heat_sensor_implementation_2_external_id,
    relationship_type="implements",
    data_set=data_set,
    confidence=0.99,
    source=sensor_2_resource,
    target=widget_heat_sensor_resource,
    start_time = sensor_replacement_ms
)
retrieve_or_create_relationship(widget_heat_sensor_2_relationship)

relationship_external_ids = [
    sensor_1_external_id,
    sensor_2_external_id,
    widget_heat_sensor_implementation_1_external_id,
    widget_heat_sensor_implementation_2_external_id
]
client.relationships.retrieve_multiple(relationship_external_ids)

Unnamed: 0,source,target,start_time,confidence,data_set,external_id,relationship_type,created_time,last_updated_time,end_time
0,{'resourceId': 'heatSensor:e3a0b02b-4a5c-43b4-...,{'resourceId': 'widgetHeatSensor:985ccacb-1594...,1508078502000.0,0.99,time_stitching_demo,widgetHeatSensor:985ccacb-1594-4ea0-9bd3-91986...,implements,1586935787773,1586935787773,
1,{'resourceId': 'heatSensor:a513a7b6-a67b-440a-...,{'resourceId': 'heatSensor:a513a7b6-a67b-440a-...,,0.99,time_stitching_demo,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b,belongsTo,1586935759432,1586935759432,
2,{'resourceId': 'heatSensor:a513a7b6-a67b-440a-...,{'resourceId': 'widgetHeatSensor:985ccacb-1594...,,0.99,time_stitching_demo,widgetHeatSensor:985ccacb-1594-4ea0-9bd3-91986...,implements,1586935787589,1586935787589,1508078147000.0
3,{'resourceId': 'heatSensor:e3a0b02b-4a5c-43b4-...,{'resourceId': 'heatSensor:e3a0b02b-4a5c-43b4-...,,0.99,time_stitching_demo,heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b,belongsTo,1586935759844,1586935759844,


## Finding the Widget Heat Sensor
We start by selecting the assets with an `external_id` that begins with: **widgetHeatSensor**

In [16]:
heat_sensors = client.assets.list(external_id_prefix="widgetHeatSensor")
heat_sensors

Unnamed: 0,external_id,name,description,metadata,id,created_time,last_updated_time,root_id
0,widgetHeatSensor:985ccacb-1594-4ea0-9bd3-91986...,Widget Heat Sensor,A heat sensor for a widget,{},6608346262502101,1586934731889,1586934731889,6608346262502101


## Finding the Heat Sensors
Once we have our **Widget Heat Sensor**, we want to find which `assets` (**heatSensors**) _implement_ **widgetHeatSensor** (and when). To make this easier to follow, we move the `externalId` of the sensors into their own column and drop all the other columns we won't need.

In [17]:
sensors = client.relationships.list(
    relationship_type="implements",
    targets = [{"resourceId": heat_sensors[0].external_id}]).to_pandas()
sensors["sensorId"] = sensors["source"].map(lambda source: str(source["resourceId"]))
sensor_filter_data = sensors[["sensorId", "startTime", "endTime"]]
sensor_filter_data

Unnamed: 0,sensorId,startTime,endTime
0,heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b,1508078502000.0,
1,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b,,1508078147000.0


## Finding the time series
Now that we have our `sensorId`s, the next step is to find the `timeseries` that have relationships to those sensors.

In [18]:
matching_sensor_ids = sensor_filter_data["sensorId"].tolist()
sensor_timeseries_relationships = client.relationships.list(
    sources=[{"resource": "timeseries"}],
    targets = [{"resourceId" : ms} for ms in matching_sensor_ids]).to_pandas()
sensor_timeseries_source_target = sensor_timeseries_relationships[["target", "source"]]
sensor_timeseries_external_ids = sensor_timeseries_source_target.applymap(lambda x: str(x["resourceId"]))

sensor_timeseries_column_renaming = { "source": "timeseriesId", "target": "sensorId" }
sensor_timeseries = sensor_timeseries_external_ids.rename(columns=sensor_timeseries_column_renaming)
sensor_timeseries

Unnamed: 0,sensorId,timeseriesId
0,heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b,heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b
1,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b,heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b


## Building a filter for Datapoints

Let's put it all together!

`sensor_timeseries` connects a `sensorId` to a `timeseriesId`. `sensor_filter_data` connects a `sensorId` to the sensor's active time range. We merge the two dataframes and then normalize a missing `startTime` to 0 and a missing `endTime` to the current time, which gives us everything we need to retrieve our datapoints!

(We then sort by `startTime` so that we will get our datasets in the order we want to stitch them together)
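
The same join-and-normalize logic can be sketched without pandas, which may help when reasoning about the edge cases. This is a minimal illustration on hand-written records; `sensor_windows`, `sensor_to_timeseries` and `build_filters` are hypothetical names, and the short ids stand in for the real external ids.

```python
import time

# Hypothetical stand-ins for the two dataframes.
sensor_windows = [
    {"sensorId": "sensor_2", "startTime": 1508078502000, "endTime": None},
    {"sensorId": "sensor_1", "startTime": None, "endTime": 1508078147000},
]
sensor_to_timeseries = {"sensor_1": "ts_1", "sensor_2": "ts_2"}


def build_filters(windows, timeseries_by_sensor, now_ms):
    """Join windows with time series ids and close open-ended bounds."""
    filters = []
    for window in windows:
        ts_id = timeseries_by_sensor.get(window["sensorId"])
        if ts_id is None:
            continue  # inner join: skip sensors without a time series
        filters.append({
            "timeseriesId": ts_id,
            # A missing start means "since the epoch".
            "startTime": 0 if window["startTime"] is None else window["startTime"],
            # A missing end means "until now".
            "endTime": now_ms if window["endTime"] is None else window["endTime"],
        })
    # Sort so that the windows come out in stitching order.
    return sorted(filters, key=lambda f: f["startTime"])


now_ms = int(time.time() * 1000)
filters = build_filters(sensor_windows, sensor_to_timeseries, now_ms)
```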

In [19]:
datapoints_filter_dataframe = sensor_filter_data.merge(sensor_timeseries, how="inner", on="sensorId")
# Open-ended intervals come back as missing values (NaN/None), and present values
# may be floats, so test for missing values rather than for the int type.
normalize_start = lambda x: int(x) if pandas.notna(x) else 0
datapoints_filter_dataframe["startTime"] = datapoints_filter_dataframe["startTime"].apply(normalize_start)
now = int(datetime.now().timestamp() * 1000)
normalize_end = lambda x: int(x) if pandas.notna(x) else now
datapoints_filter_dataframe["endTime"] = datapoints_filter_dataframe["endTime"].apply(normalize_end)
datapoints_filters = datapoints_filter_dataframe.to_dict(orient="records")
datapoints_filters.sort(key=lambda x: x["startTime"])
datapoints_filters

[{'sensorId': 'heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b',
  'startTime': 0,
  'endTime': 1508078147000,
  'timeseriesId': 'heatSensor:a513a7b6-a67b-440a-b6b0-0bd1188b175b'},
 {'sensorId': 'heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b',
  'startTime': 1508078502000,
  'endTime': 1586936360565,
  'timeseriesId': 'heatSensor:e3a0b02b-4a5c-43b4-96b5-e9cf9df8400b'}]

## Fetching and Stitching the DataPoints

For each of the filters we built above, we fetch the matching set of datapoints. We then rename the value column of each resulting pandas dataframe to `data`, so that the dataframes can be combined into one.

In [20]:
datapoints = [
    client.datapoints.retrieve(
        external_id = fd["timeseriesId"],
        start = fd["startTime"],
        end = fd["endTime"]).to_pandas().rename(
        columns={ fd["timeseriesId"]: "data" }) for fd in datapoints_filters
    ]
# DataFrame.append was removed in pandas 2.0; concat also handles any number of frames
final_dataframe = pandas.concat(datapoints)
final_dataframe

Unnamed: 0,data
2017-10-15 14:35:37,50.0
2017-10-15 14:35:38,50.0
2017-10-15 14:35:39,50.0
2017-10-15 14:35:40,50.0
2017-10-15 14:35:41,50.0
2017-10-15 14:35:42,50.0
2017-10-15 14:35:43,50.0
2017-10-15 14:35:44,50.0
2017-10-15 14:35:45,50.0
2017-10-15 14:35:46,50.0


The result is a single dataframe containing one continuous timeseries for the temperature of our asset!
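
The stitching logic can also be checked locally, without a CDF connection. The following is a minimal sketch using synthetic data in the same shape as the datapoints inserted earlier; `series`, `windows` and `stitch` are hypothetical names. Treating the end of each window as exclusive is an assumption that matches the output above, where the last row from the first sensor is one second before the break.

```python
BREAK_MS = 1508078147000        # first sensor starts reporting 150.0
REPLACEMENT_MS = 1508078502000  # second sensor starts reporting

# Synthetic datapoints as (timestamp_ms, value) pairs, one list per time series.
series = {
    "ts_1": [(t, 50.0) for t in range(BREAK_MS - 10_000, BREAK_MS, 1000)]
    + [(t, 150.0) for t in range(BREAK_MS, REPLACEMENT_MS + 10_000, 1000)],
    "ts_2": [(t, 50.0) for t in range(REPLACEMENT_MS, REPLACEMENT_MS + 30_000, 1000)],
}

# One window per series, already sorted by start time.
windows = [
    {"timeseriesId": "ts_1", "start": 0, "end": BREAK_MS},
    {"timeseriesId": "ts_2", "start": REPLACEMENT_MS, "end": REPLACEMENT_MS + 30_000},
]


def stitch(series_by_id, ordered_windows):
    """Concatenate the points of each series that fall inside its window."""
    stitched = []
    for window in ordered_windows:
        points = series_by_id[window["timeseriesId"]]
        stitched.extend(
            p for p in points if window["start"] <= p[0] < window["end"]
        )
    return stitched


stitched = stitch(series, windows)
values = {value for _, value in stitched}
```

With this synthetic data, every stitched value is 50.0: the 150.0 readings fall outside the first sensor's window and are never selected.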

# CLEAN UP STEP

Make sure to run the cell below to clean up after the `assets`, `relationships` and `timeseries` we created for this demo!

In [21]:
relationship_external_ids = [
    sensor_1_external_id,
    sensor_2_external_id,
    widget_heat_sensor_implementation_1_external_id,
    widget_heat_sensor_implementation_2_external_id
]
client.relationships.delete(relationship_external_ids)

timeseries_external_ids = [
    sensor_1_external_id,
    sensor_2_external_id
]
for timeseries_external_id in timeseries_external_ids:
    client.datapoints.delete_range(start=0, end=now, external_id=timeseries_external_id)
    client.time_series.delete(external_id=timeseries_external_id)

asset_external_ids = [
    widget_heat_sensor_external_id,
    sensor_1_external_id,
    sensor_2_external_id
]
client.assets.delete(external_id=asset_external_ids)

# Thank You!