# OSIsoft Academic Hub Library Quick Start

Version 0.98 - CMU

Academic Hub datasets are hosted by the OSIsoft Cloud Service (OCS, https://www.osisoft.com/solutions/cloud/vision/), a cloud-native real-time data infrastructure that lets users perform enterprise-wide analytics with the tools and languages of their choice.

**Raw operational data has specific characteristics making it difficult to deal with directly**, among them:

* variable data collection frequencies
* bad values (system error codes)
* data gaps 


**But data science projects against operational data need data that is:**

* **Time-aligned** to deal with the characteristics above in a consistent way according to the data type (e.g. interpolation for float values, repeating the last good value for categorical data, etc.)
* **Context-aware** so that the data is understandable across as many real-world assets as you need
* **Shaped and filtered** to ensure you have the data you need, in the form you need it
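
To make the time-alignment idea concrete, here is a minimal pure-Python sketch. It is not how OCS data views are implemented. The helper names `interpolate_linear` and `last_good_value` and the sample readings are hypothetical, chosen only to illustrate the two strategies named in the list above.

```python
from bisect import bisect_right


def interpolate_linear(times, values, t):
    """Linearly interpolate a float signal at time t (times must be sorted)."""
    i = bisect_right(times, t)
    if i == 0:
        return values[0]   # before the first sample: hold the first value
    if i == len(times):
        return values[-1]  # at or after the last sample: hold the last value
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def last_good_value(times, values, t):
    """Return the most recent value at or before t (None if there is none)."""
    i = bisect_right(times, t)
    return values[i - 1] if i > 0 else None


# Hypothetical readings collected at irregular times (in seconds)
temp_times, temp_values = [0, 4, 10], [20.0, 22.0, 25.0]
state_times, state_values = [0, 7], ["OFF", "ON"]

grid = [0, 5, 10]  # common grid that both signals are aligned to
aligned = [
    (
        t,
        interpolate_linear(temp_times, temp_values, t),
        last_good_value(state_times, state_values, t),
    )
    for t in grid
]
print(aligned)
```

After alignment, both signals share the same timestamps, so each grid point becomes one table row.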

**The OCS solution for application-ready data is Data Views:**

![](https://academichub.blob.core.windows.net/images/piworld-dse-dataview-p2.png)

**Each Academic Hub dataset comes with a set of asset-centric data views.** The goal of the Academic Hub Python library is to provide generic and consistent access to:

* the list of existing datasets
* for a given dataset:
  * the list of its assets
  * the OCS namespace where the dataset is hosted
* for a given asset, the list of data views it belongs to

The rest of this notebook is a working example of the functionality listed above. 

## Install the Academic Hub Python library

In [1]:
!pip install ocs-academic-hub



## Use `pip uninstall` only in case of library issues

In [2]:
# It is sometimes necessary to uninstall previous versions. If so, uncomment and run the following line, then restart the kernel and reinstall with the previous cell.
# !pip uninstall -y ocs-academic-hub ocs-sample-library-preview

## Import and initialize HubClient 

Necessary to connect and interact with OCS

In [3]:
from ocs_academic_hub import HubClient

## Hosted environments like Colab require a configuration file

* The configuration filename should be passed through the `OCS_HUB_CONFIG` environment variable
* The referenced configuration file should be located in the same directory as this notebook

In [4]:
%env OCS_HUB_CONFIG=config-cmu.txt
hub = HubClient()

env: OCS_HUB_CONFIG=config-cmu.txt
> configuration file: config-cmu.txt
@ Hub data file: hub_datasets.json
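
The `%env` magic above only exists in IPython/Jupyter. In a plain Python script, the same environment variable can be set through `os.environ` before creating the client. The sketch below also checks that the file is present in the working directory. That check is an added convenience, not part of the library.

```python
import os
from pathlib import Path

config_name = "config-cmu.txt"  # file name used in this notebook

# Plain-Python equivalent of `%env OCS_HUB_CONFIG=config-cmu.txt`
os.environ["OCS_HUB_CONFIG"] = config_name

# Optional sanity check (an assumption: the file is expected in the working directory)
config_path = Path.cwd() / config_name
if not config_path.is_file():
    print(f"Warning: {config_path} not found")
```

The variable must be set before `HubClient()` is called, as in the cell above.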


## Update/refresh hub datasets information 

By default, only datasets with the `production` status (as of now, `Brewery` and `Campus_Energy`) are updated. CMU unit operations data is published with the `lab_data` status, hence the optional `additional_status` argument below.

In [5]:
hub.refresh_datasets(additional_status='lab_data')

@ Hub data file: hub_datasets.json
@ Current dataset: Brewery


## Get list of published hub datasets

Dataset `CMU_UnitOps` should show up along with `Brewery` and `Campus_Energy`

In [6]:
hub.datasets()

['Brewery', 'CMU_UnitOps', 'Campus_Energy']

## Select CMU lab as current dataset

Then check that the selection took effect

In [7]:
hub.set_dataset('CMU_UnitOps')
hub.current_dataset()

'CMU_UnitOps'

## Get the OCS namespace where the data lives 

In [8]:
namespace_id = hub.namespace_of("CMU_UnitOps")
namespace_id

'hub_lab_data'

## The list of available assets of the current dataset

In the case of CMU, assets are LabVIEW-instrumented Unit Ops experiments 

Note: the descriptions can be modified; please contact OSIsoft to request changes

In [9]:
hub.assets()

Unnamed: 0,Asset_Id,Description
0,CMU_controls_hxer_steptest,OMF.cmu_unitops1 Connector.assets_type_CMU_con...
1,F20_PSA_v2,OMF.cmu_unitops1 Connector.assets_type_F20_PSA_v2
2,F20_Wk1_PSA,OMF.cmu_unitops1 Connector.assets_type_F20_Wk1...
3,F20_wk1_Membrane,OMF.cmu_unitops1 Connector.assets_type_F20_wk1...
4,HX Cycle1 F20,OMF.cmu_unitops1 Connector.assets_type_HX Cycl...
5,Membrane,OMF.cmu_unitops1 Connector.assets_type_Membrane
6,wk1_F20,OMF.cmu_unitops1 Connector.assets_type_wk1_F20


## Each asset has a default data view

Each data view has a unique identifier, needed to make a request and get data

In [10]:
selected_asset = "F20_PSA_v2"  # experiment now online
dv_id = hub.asset_dataviews(selected_asset)[0]
selected_asset, dv_id

('F20_PSA_v2', 'cmu.unitops-f20_psa_v2')

## Get data view structure

This is a description of the table structure returned by the data view where:

* `Column_Name`: name of the column
* `Stream_Type`: data type found in this column
* `Stream_UOM`: unit of measure
* `OCS_Stream_Name`: name of the OCS stream providing data in the column

In [11]:
print(hub.dataview_definition(namespace_id, dv_id).to_string(index=False))

   Asset_Id    Column_Name Stream_Type Stream_UOM                                              OCS_Stream_Name
 F20_PSA_v2  InletPressure       Float             cmu_unitops1.F20_PSA_v2_data_values_container.InletPressure
 F20_PSA_v2      Lab notes      String                 cmu_unitops1.F20_PSA_v2_data_values_container.Lab notes
 F20_PSA_v2     O2_Percent       Float                cmu_unitops1.F20_PSA_v2_data_values_container.O2_Percent
 F20_PSA_v2        Ptank_A       Float                   cmu_unitops1.F20_PSA_v2_data_values_container.Ptank_A
 F20_PSA_v2        Ptank_B       Float                   cmu_unitops1.F20_PSA_v2_data_values_container.Ptank_B
 F20_PSA_v2       SetFlowA       Float                  cmu_unitops1.F20_PSA_v2_data_values_container.SetFlowA
 F20_PSA_v2       SetFlowB       Float                  cmu_unitops1.F20_PSA_v2_data_values_container.SetFlowB
 F20_PSA_v2     TankA_RLYs       Float                cmu_unitops1.F20_PSA_v2_data_values_container.TankA_RLYs
 

## Get current timestamp

And compute the timestamp from 20 minutes ago

In [12]:
import datetime
now = datetime.datetime.now()
prev = now - datetime.timedelta(minutes=20)
print(prev, now)

2020-09-29 17:10:26.998893 2020-09-29 17:30:26.998893
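
Note that `datetime.datetime.now()` returns a naive timestamp in the local time zone of the machine running the notebook. How the service interprets naive timestamps is not stated here. If UTC is expected, a timezone-aware window can be built as sketched below. Whether the hub calls accept ISO strings with a `+00:00` offset is an assumption to verify.

```python
import datetime

# Timezone-aware "now" in UTC, and the start of a 20-minute window
now_utc = datetime.datetime.now(datetime.timezone.utc)
prev_utc = now_utc - datetime.timedelta(minutes=20)

# isoformat() on aware datetimes carries an explicit +00:00 offset
print(prev_utc.isoformat(), now_utc.isoformat())
```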


## Get interpolated data from data view

The last argument is the interpolation interval in `HH:MM:SS` format (every 5 seconds in the call below)

In [13]:
df = hub.dataview_interpolated_pd(
    namespace_id, dv_id, prev.isoformat(), now.isoformat(), "00:00:05"
)
df


  ==> Finished 'dataview_interpolated_pd' in       0.6413 secs [ 376 rows/sec ]


Unnamed: 0,Timestamp,Asset_Id,InletPressure,Lab notes,O2_Percent,Ptank_A,Ptank_B,SetFlowA,SetFlowB,TankA_RLYs,TankA_Tin,TankA_Tout,TankB_RLYs,TankB_Tin,TankB_Tout
0,2020-09-29 17:10:26.998893,F20_PSA_v2,0,,0.457300,-0.351158,,0.004,0.004,0,24.933247,,0,,
1,2020-09-29 17:10:31.998893,F20_PSA_v2,0,,-1.953668,0.543103,,0.004,0.004,0,24.908102,,0,,
2,2020-09-29 17:10:36.998893,F20_PSA_v2,0,,0.340346,-0.370392,,0.004,0.004,0,24.934109,,0,,
3,2020-09-29 17:10:41.998893,F20_PSA_v2,0,,0.284210,0.468107,,0.004,0.004,0,24.920401,,0,,
4,2020-09-29 17:10:46.998893,F20_PSA_v2,0,,-0.409467,0.672965,,0.004,0.004,0,24.920215,,0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
236,2020-09-29 17:30:06.998893,F20_PSA_v2,0,,0.481373,0.732181,,0.004,0.004,0,24.922438,,0,,
237,2020-09-29 17:30:11.998893,F20_PSA_v2,0,,-0.571728,-0.426011,,0.004,0.004,0,24.946860,,0,,
238,2020-09-29 17:30:16.998893,F20_PSA_v2,0,,0.867429,-0.026487,,0.004,0.004,0,24.933751,,0,,
239,2020-09-29 17:30:21.998893,F20_PSA_v2,0,,-0.157502,-0.198765,,0.004,0.004,0,24.923743,,0,,
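
The interval string can also be derived from a `timedelta` rather than typed by hand, which helps when the step is computed. The helper `to_interval` below is hypothetical (it is not part of the hub library) and assumes whole seconds and intervals under 100 hours.

```python
import datetime


def to_interval(delta):
    """Format a timedelta as HH:MM:SS (whole seconds, under 100 hours)."""
    total = int(delta.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


step = datetime.timedelta(seconds=5)
window = datetime.timedelta(minutes=20)

print(to_interval(step))  # interval string for a 5-second step
print(window // step)     # number of 5-second steps in a 20-minute window
```

A 20-minute window contains 240 five-second steps, which matches the row indices 0 to 239 shown above.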


## For data frame manipulation and plotting

In [14]:
import pandas as pd
import plotly.express as px

## Change shape of data frame to narrow 

Column names are mapped to a column named `Sensor`, and the column values are stored in a column named `Value`

In [15]:
df2 = pd.melt(df, id_vars=["Timestamp"], var_name="Sensor", value_name="Value")
df2

Unnamed: 0,Timestamp,Sensor,Value
0,2020-09-29 17:10:26.998893,Asset_Id,F20_PSA_v2
1,2020-09-29 17:10:31.998893,Asset_Id,F20_PSA_v2
2,2020-09-29 17:10:36.998893,Asset_Id,F20_PSA_v2
3,2020-09-29 17:10:41.998893,Asset_Id,F20_PSA_v2
4,2020-09-29 17:10:46.998893,Asset_Id,F20_PSA_v2
...,...,...,...
3369,2020-09-29 17:30:06.998893,TankB_Tout,
3370,2020-09-29 17:30:11.998893,TankB_Tout,
3371,2020-09-29 17:30:16.998893,TankB_Tout,
3372,2020-09-29 17:30:21.998893,TankB_Tout,


In [16]:
df3 = df2[  # remove unplottable Asset_Id and out-of-range sensors
    ~( 
        (df2["Sensor"] == "Asset_Id")
        | (df2["Sensor"] == "TankB_Tout")
        | (df2["Sensor"] == "TankB_Tin")
        | (df2["Sensor"] == "TankA_Tout")
    )
]
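
As a compact illustration of the wide-to-narrow reshaping and the sensor filter, the sketch below applies the same `pd.melt` call to a toy frame with made-up values, then writes the exclusion with `Series.isin` instead of chained `==` / `|` comparisons. The column values and the `excluded` list are illustrative only.

```python
import pandas as pd

# Toy wide frame with made-up values, mimicking the data view layout
wide = pd.DataFrame(
    {
        "Timestamp": ["t0", "t1"],
        "Asset_Id": ["A", "A"],
        "Ptank_A": [0.5, 0.7],
        "TankB_Tin": [None, None],
    }
)

# One row per (Timestamp, Sensor) pair
narrow = pd.melt(wide, id_vars=["Timestamp"], var_name="Sensor", value_name="Value")

# Drop the sensors that should not be plotted
excluded = ["Asset_Id", "TankB_Tin"]
plottable = narrow[~narrow["Sensor"].isin(excluded)]
print(plottable)
```

With a list of names, adding or removing a sensor from the filter is a one-word change.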

## Plot data using Plotly

In [17]:
fig = px.line(
    df3.dropna(),
    x="Timestamp",
    y="Value",
    color="Sensor",
    title=f"Experiment: {selected_asset}",
)
fig.show()