# Asset hierarchy visualization


## Quick links
* Asset data dive notebook - [Hackathon GitHub repo](https://github.com/cognitedata/open-industrial-data/blob/master/workshops/uni-hackathon/Part_3_Asset_Data_Dive.ipynb)
* Documentation of [CDP concepts](https://doc.cognitedata.com/concepts/)
* Reference documentation for the [Python SDK](https://cognite-sdk-python.readthedocs-hosted.com/en/latest/)
<hr>

# Step 0: Environment Setup

#### Install the Cognite SDK package and some auxiliary packages

In [None]:
# if you're working in Google Colab or similar
!pip install -q cognite-sdk plotly networkx

#### Because of some limitations of the Google Colab platform, we define the large auxiliary functions directly here; we are working on a more elegant solution

By the way, if you know how to improve this, ideas and contributions are welcome :)

In [None]:
# %load common.py
"""

DON'T BE SCARED

You don't need to dig into the details here; just run this cell and move on to the next one.

"""
from itertools import islice
from typing import Dict, Iterable

import networkx as nx
import pandas as pd
import plotly.graph_objs as go


def sliding_window(seq: Iterable, n: int = 2):
    """

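    Generate overlapping windows of size ``n`` from an iterable

    Args:
        seq: input iterable, e.g. an asset path from the root down to an asset
        n: window size

    Returns:
        generator of tuples, each holding ``n`` consecutive items of ``seq``
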
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result


def make_assets_tree(df: pd.DataFrame) -> nx.DiGraph:
    """

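    Builds a directed graph of assets from a given DataFrame

    Args:
        df: assets DataFrame with a ``path`` column listing asset IDs from the root to each asset

    Returns:
        nx.DiGraph with an edge from each parent asset to each of its children
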
    """

    G = nx.DiGraph()
    for path in df["path"].values:
        for parent_id, child_id in sliding_window(path, n=2):
            G.add_edge(parent_id, child_id)

    return G


def hierarchy_pos(
    graph: nx.DiGraph,
    root: str = None,
    width: float = 10000.0,
    vert_gap: float = 0.2,
    vert_loc: float = 0.0,
    x_center=0.5,
):
    """
    If the graph is a tree this will return the positions to plot this in a
    hierarchical layout.

    G: the graph (must be a tree)

    root: the root node of current branch
    - if the tree is directed and this is not given, the root will be found and used
    - if the tree is directed and this is given, then the positions will be just for the descendants of this node.
    - if the tree is undirected and not given, then a random choice will be used.

    width: horizontal space allocated for this branch - avoids overlap with other branches

    vert_gap: gap between levels of hierarchy

    vert_loc: vertical location of root

    xcenter: horizontal location of root
    """
    if not nx.is_tree(graph):
        raise TypeError("cannot use hierarchy_pos on a graph that is not a tree")

    def _hierarchy_pos(
        graph: nx.DiGraph,
        root: str,
        width: float = 1.0,
        vert_gap: float = 0.2,
        vert_loc: float = 0.0,
        x_center: float = 0.5,
        pos: Dict = None,
        parent: str = None,
    ):
        """
        See the hierarchy_pos docstring for most arguments.

        pos: a dict mapping each node that has already been placed to its (x, y) position
        parent: parent of this branch; only used if the graph is undirected

        """

        if pos is None:
            pos = {root: (x_center, vert_loc)}
        else:
            pos[root] = (x_center, vert_loc)
        children = list(graph.neighbors(root))
        if not isinstance(graph, nx.DiGraph) and parent is not None:
            children.remove(parent)
        if len(children) != 0:
            dx = width / len(children)
            next_x = x_center - width / 2 - dx / 2
            for child in children:
                next_x += dx
                pos = _hierarchy_pos(
                    graph,
                    child,
                    width=dx,
                    vert_gap=vert_gap,
                    vert_loc=vert_loc - vert_gap,
                    x_center=next_x,
                    pos=pos,
                    parent=root,
                )
        return pos

    return _hierarchy_pos(graph, root, width, vert_gap, vert_loc, x_center)


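def get_label(id_, client=None):
    """Get an asset's name by its asset ID.

    If no client is passed, the notebook-level ``client`` is used.
    """
    if client is None:
        # make_assets_tree_plot calls get_label without a client, and the
        # parameter would otherwise shadow the global ``client``
        client = globals()["client"]
    asset_info = client.assets.get_asset(id_)
    return asset_info.to_json()["name"]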


def make_assets_tree_plot(df: pd.DataFrame, root_id: int = None, max_depth: int = None) -> go.Figure:
    """

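    Generates an interactive plot of the asset hierarchy from an assets DataFrame

    Args:
        df: assets DataFrame with a ``path`` column (see ``make_assets_tree``)
        root_id: ID of the asset to use as the root; defaults to the first node in topological order, i.e. a top-level asset
        max_depth: maximum number of levels below ``root_id`` to include; ``None`` means no limit

    Returns:
        plotly Figure with the hierarchy's edges and nodes
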
    """
    assets_tree = make_assets_tree(df)

    if root_id is None:
        root_id = next(iter(nx.topological_sort(assets_tree)))  # works with networkx 1.11 (list) and 2.x (generator)

    assets_tree = nx.dfs_tree(assets_tree, source=root_id, depth_limit=max_depth)
    pos = hierarchy_pos(assets_tree, root=root_id)

    # extract node coordinates and labels
    Xn = [pos[i][0] for i in pos.keys()]
    Yn = [pos[i][1] for i in pos.keys()]
    labels = [get_label(id_) for id_ in pos.keys()]

    # extract edges from tree
    Xe = list()
    Ye = list()
    for e in assets_tree.edges():
        Xe.extend([pos[e[0]][0], pos[e[1]][0], None])
        Ye.extend([pos[e[0]][1], pos[e[1]][1], None])

    # make plotly traces
    trace_nodes = dict(
        type="scatter",
        x=Xn,
        y=Yn,
        mode="markers",
        marker=dict(size=20, color="rgb(0, 0, 204)"),
        text=labels,
        hoverinfo="text",
    )
    trace_edges = dict(
        type="scatter", mode="lines", x=Xe, y=Ye, line=dict(width=1, color="rgb(25,25,25)"), hoverinfo="none"
    )

    # some pretty details
    axis = dict(
        showline=False,  # hide axis line, grid, tick labels and title
        zeroline=False,
        showgrid=False,
        showticklabels=False,
        title="",
    )

    layout = go.Layout(autosize=True, showlegend=False, xaxis=axis, yaxis=axis, hovermode="closest")
    fig = go.Figure(data=[trace_edges, trace_nodes], layout=layout)
    return fig


def configure_plotly_browser_state():
    """

    Resolves an issue with plotly in google colab

    Returns:

    """
    from IPython.core.display import display, HTML

    display(
        HTML(
            """
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-latest.min.js?noext',
            },
          });
        </script>
        """
        )
    )
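`make_assets_tree` assumes that each asset's `path` lists asset IDs from the root down to the asset itself, and turns every consecutive pair into a parent-to-child edge. The sketch below reproduces that step on a made-up DataFrame so it can run without a CDP connection; the IDs are hypothetical, and `zip(path, path[1:])` stands in for `sliding_window(path, n=2)`, which yields the same pairs.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy data: each path lists asset IDs from the root down to the asset
toy_assets = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "path": [[1], [1, 2], [1, 2, 3], [1, 4]],
    }
)

graph = nx.DiGraph()
for path in toy_assets["path"]:
    # zip(path, path[1:]) yields the same consecutive pairs as sliding_window(path, n=2)
    for parent_id, child_id in zip(path, path[1:]):
        graph.add_edge(parent_id, child_id)

print(sorted(graph.edges()))  # [(1, 2), (1, 4), (2, 3)]
print(nx.is_tree(graph))  # True
```

One consequence worth knowing: because only edges are added, an asset whose path has a single element (a root) appears in the graph only if at least one of its children is in the sample.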


#### Import the required packages

In [None]:
%matplotlib inline

import os
from datetime import datetime, timedelta
from getpass import getpass
from typing import List, Any
from itertools import islice

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

from cognite import CogniteClient

import networkx as nx
from networkx.algorithms.traversal.depth_first_search import dfs_tree
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)


pd.set_option('display.max_rows', 10)

#### Connect to the Cognite Data Platform
The SDK client is the entry point to all data in CDP and only requires the API key that you generated in Part 1.

When prompted for your API key, use the key generated through Open Industrial Data, as described in the Getting Started steps.

In [None]:
client = CogniteClient(api_key=getpass("Open Industrial Data API-KEY: "))

# Step 1: Learn about Organizing Industrial Data

The Cognite Data Platform organizes digital information about the physical world. The building blocks of this representation are called *resources*, which you can read up on in detail [here](https://doc.cognitedata.com/concepts/#core-concepts).

An important resource to understand is the Asset resource. This is the foundation for organizing industrial data -- time series, work orders, event logs and arbitrary files -- from across complex industrial systems.
Assets are linked together with parent-child relationships to build a top-down hierarchical tree, known as "The Asset Hierarchy".
For example, an Asset Hierarchy could look like this:
```
  Gas Export Compressor
    |- First stage export compressor
    |    |- Compressor
    |    |- Scrubber
    |    |- ...
    |- Second stage export compressor
    |- ...
```
Time series, events, files and other resources can be attached to an asset.
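The parent-child links in the example above can be modelled as a directed graph with an edge from each parent to each child, which is also how the helper functions in this notebook represent the hierarchy. A minimal sketch, using the asset names from the example as node keys for readability (real assets are identified by numeric IDs) and a hypothetical `print_tree` helper:

```python
import networkx as nx

# The example hierarchy from the text, as (child, parent) pairs
links = [
    ("First stage export compressor", "Gas Export Compressor"),
    ("Second stage export compressor", "Gas Export Compressor"),
    ("Compressor", "First stage export compressor"),
    ("Scrubber", "First stage export compressor"),
]

# One edge per link, pointing from parent to child
hierarchy = nx.DiGraph()
hierarchy.add_edges_from((parent, child) for child, parent in links)


def print_tree(graph, node, level=0):
    """Print the subtree rooted at `node`, indenting each level by two spaces."""
    print("  " * level + node)
    for child in graph.successors(node):
        print_tree(graph, child, level + 1)


# The root is the only node without a parent
root = next(n for n in hierarchy.nodes if hierarchy.in_degree(n) == 0)
print_tree(hierarchy, root)
```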

The hierarchical structure can make it easier to find the time series data you're looking for. Though there are [other ways](https://doc.cognitedata.com/concepts/#_3d-models-and-revisions) to do this, we'll focus on using the hierarchy today!
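As a rough illustration of why the hierarchy helps, collecting the time series for a piece of equipment amounts to gathering everything attached to that asset and to its descendants. The hierarchy, the time series names and the `timeseries_under` helper below are all made up for this example; in CDP the attachments would come from the API rather than from a dict.

```python
import networkx as nx

# Hypothetical hierarchy (parent -> child edges) and time series attachments
hierarchy = nx.DiGraph(
    [
        ("Gas Export Compressor", "First stage export compressor"),
        ("Gas Export Compressor", "Second stage export compressor"),
        ("First stage export compressor", "Compressor"),
        ("First stage export compressor", "Scrubber"),
    ]
)
timeseries_by_asset = {
    "Compressor": ["compressor_discharge_pressure"],
    "Scrubber": ["scrubber_level"],
    "Second stage export compressor": ["second_stage_flow"],
}


def timeseries_under(graph, attachments, asset):
    """Collect time series attached to `asset` or to any of its descendants."""
    subtree = {asset} | nx.descendants(graph, asset)
    return sorted(ts for node in subtree for ts in attachments.get(node, []))


print(timeseries_under(hierarchy, timeseries_by_asset, "First stage export compressor"))
# ['compressor_discharge_pressure', 'scrubber_level']
```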

In [None]:
# download a sample of assets up to a certain depth in the hierarchy
df_sample_assets = client.assets.get_assets(limit=1000, depth=6).to_pandas().sort_values('depth')
df_sample_assets
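To see how many assets the sample contains at each level, one option is to count rows per depth. On the real data this would be `df_sample_assets.groupby('depth').size()`, assuming the `depth` column that the cell above sorts by; here it runs on a toy frame with made-up values so it works offline.

```python
import pandas as pd

# Toy stand-in for df_sample_assets; only the `depth` column is used here
toy_assets = pd.DataFrame({"id": [1, 2, 3, 4, 5], "depth": [0, 1, 1, 2, 2]})

# Number of assets at each depth level of the hierarchy
assets_per_level = toy_assets.groupby("depth").size()
print(assets_per_level)
```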

In [None]:
configure_plotly_browser_state()

fig = make_assets_tree_plot(df_sample_assets)
iplot(fig)
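Inside `make_assets_tree_plot`, all edges are drawn as a single `lines` trace: each edge adds its two endpoints followed by `None`, and plotly leaves a gap at each `None`, so the segments are not joined to one another. A small stand-alone version of that loop, with hypothetical node positions in the `{node: (x, y)}` form that `hierarchy_pos` returns:

```python
# Hypothetical node positions, in the same {node: (x, y)} form as hierarchy_pos
pos = {"root": (0.0, 0.0), "a": (-1.0, -0.2), "b": (1.0, -0.2)}
edges = [("root", "a"), ("root", "b")]

xe, ye = [], []
for parent, child in edges:
    # Two endpoints per edge, then None so the next edge starts a new segment
    xe.extend([pos[parent][0], pos[child][0], None])
    ye.extend([pos[parent][1], pos[child][1], None])

print(xe)  # [0.0, -1.0, None, 0.0, 1.0, None]
print(ye)  # [0.0, -0.2, None, 0.0, -0.2, None]
```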

This may seem overcomplicated, but it is a good way to explore your assets interactively. You can also play with the `root_id` and `max_depth` arguments of `make_assets_tree_plot()` to plot a single branch in detail.
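`max_depth` is passed to `nx.dfs_tree` as `depth_limit`, which keeps only nodes at most that many edges below `root_id`. A toy example (the tree below is hypothetical) shows the effect of both arguments:

```python
import networkx as nx

# Hypothetical tree: a chain 1 -> 2 -> 3 -> 4 plus a side branch 1 -> 5
tree = nx.DiGraph([(1, 2), (2, 3), (3, 4), (1, 5)])

# depth_limit=2 keeps nodes at most two edges below the source
branch = nx.dfs_tree(tree, source=1, depth_limit=2)
print(sorted(branch.nodes()))  # [1, 2, 3, 5]

# A different source keeps only that node's branch
sub_branch = nx.dfs_tree(tree, source=2, depth_limit=1)
print(sorted(sub_branch.nodes()))  # [2, 3]
```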


In [None]:
df_sample_assets = client.assets.get_assets(limit=2000, depth=15).to_pandas().sort_values('depth')

In [None]:
configure_plotly_browser_state()

fig = make_assets_tree_plot(df_sample_assets, root_id=4518112062673878, max_depth=4)
iplot(fig)
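The `root_id` above is hard-coded. If you would rather pick a branch by name, a small lookup on the DataFrame is one option. This is a sketch under assumptions: it expects `id` and `name` columns in the DataFrame returned by `to_pandas()`, and `find_asset_id` is a hypothetical helper, not part of the SDK. It is shown on a toy frame with made-up IDs:

```python
import pandas as pd

# Toy stand-in for df_sample_assets; IDs and names are made up
toy_assets = pd.DataFrame(
    {
        "id": [10, 11, 12],
        "name": ["Gas Export Compressor", "First stage export compressor", "Scrubber"],
        "depth": [0, 1, 2],
    }
)


def find_asset_id(df, name):
    """Return the ID of the first asset whose name matches exactly (hypothetical helper)."""
    matches = df.loc[df["name"] == name, "id"]
    if matches.empty:
        raise KeyError(f"no asset named {name!r} in this sample")
    return int(matches.iloc[0])


root_id = find_asset_id(toy_assets, "First stage export compressor")
print(root_id)  # 11
```

With the real sample, the call would look like `make_assets_tree_plot(df_sample_assets, root_id=find_asset_id(df_sample_assets, "<asset name>"), max_depth=4)`.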