# OSIsoft Academic Hub Datasets Quick Start (Python Library)

Version 0.95

Academic Hub datasets are hosted by OSIsoft Cloud Services (OCS, https://www.osisoft.com/solutions/cloud/vision/), a cloud-native real-time data infrastructure that lets users perform enterprise-wide analytics with the tools and languages of their choice.

**Raw operational data has specific characteristics that make it difficult to work with directly**, among them:

* variable data collection frequencies
* bad values (system error codes)
* data gaps 


**But data science projects need operational data to be:**

* **Time-aligned** to deal with the characteristics above in a consistent way according to the data type (e.g. interpolation for float values, repeating the last good value for categorical data)
* **Context-aware** so that the data is understandable across as many real-world assets as you need
* **Shaped and filtered** to ensure you have the data you need, in the form you need it
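
To make the idea of time alignment concrete, here is a minimal sketch in plain pandas. It is only an illustration of the principle, not a description of how OCS implements it, and the readings, column names and 30-minute grid are made up for the example. Float values are interpolated in time, while categorical values repeat the last known value.

```python
import pandas as pd

# Hypothetical raw readings collected at irregular times (not real dataset values).
raw = pd.DataFrame(
    {
        "Plato": [12.0, 11.0, 9.0],
        "Status": ["Fermentation", "Fermentation", "Cooling"],
    },
    index=pd.to_datetime(
        ["2017-01-19 00:00", "2017-01-19 00:20", "2017-01-19 01:00"]
    ),
)

# Regular grid on which we want one row per timestamp.
grid = pd.date_range("2017-01-19 00:00", "2017-01-19 01:00", freq="30min")

# Keep the original samples while adding the grid points, so that the
# original readings still drive the interpolation.
combined = raw.reindex(raw.index.union(grid))

# Float column: linear interpolation weighted by elapsed time.
plato = combined["Plato"].interpolate(method="time").reindex(grid)

# Categorical column: repeat the last known value.
status = combined["Status"].ffill().reindex(grid)

aligned = pd.DataFrame({"Plato": plato, "Status": status})
print(aligned)
```

In this sketch the value at 00:30 is interpolated between the readings at 00:20 and 01:00, and the status at 00:30 is carried over from 00:20.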

**The OCS solution for application-ready data is Data Views:**

![](https://academichub.blob.core.windows.net/images/piworld-dse-dataview-p2.png)

**Each Academic Hub dataset comes with a set of asset-centric data views.** The goal of the Academic Hub Python library is to provide generic and consistent access to:

* the list of existing datasets
* for a given dataset:
  * the list of its assets
  * the OCS namespace where the dataset is hosted
* for a given asset, the list of data views it belongs to

The rest of this notebook is a working example of the functionality listed above. 

## Install Academic Hub Python library 

In [1]:
!pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ocs-academic-hub==0.79.0

## Use `pip uninstall` only in case of library issues

In [2]:
# It's sometimes necessary to uninstall previous versions: uncomment and run the following line, then restart the kernel and reinstall with the previous cell
# !pip uninstall -y ocs-academic-hub ocs-sample-library-preview

# WARNING: uncomment only for testing
#%env OCS_HUB_CONFIG=config.ini

env: OCS_HUB_CONFIG=config.ini


## Import HubClient, necessary to connect and interact with OCS

In [3]:
from ocs_academic_hub import HubClient

## Running the following cell initiates the login sequence

Return to this web page when done.

In [4]:
hub = HubClient()

> configuration file: config.ini


## Get list of published hub datasets

NOTE: currently only `Deschutes Brewery` supports the new interface. This notebook is specifically about this dataset. 

In [5]:
hub.datasets()

['Deschutes-v1', 'UCDavis.Facilities', 'Brewery']

## Display current active dataset

NOTE: it will be possible to switch it once other datasets support the new asset interface. 

In [6]:
hub.current_dataset()

'Brewery'

## Get list of assets with Data Views

The result is returned as a pandas dataframe with columns `Asset_Id` and `Description`. Using `print` with `.to_string()` in the cell below displays the whole dataframe content.

In [7]:
print(hub.assets().to_string())

          Asset_Id                Description
0        Acid Tank                        AT1
1              BA1                           
2              BA2                           
3             BB02                Bright Tank
4        BB02 Line                Bright Tank
5             BB03                Bright Tank
6             BB04                Bright Tank
7             BB05                Bright Tank
8             BB06                Bright Tank
9             BB07                Bright Tank
10            BB08                Bright Tank
11            BB09                Bright Tank
12            BB11                Bright Tank
13            BB12                Bright Tank
14            BB13                Bright Tank
15            BB14                Bright Tank
16            BB15                Bright Tank
17   Beer Transfer      Beer TransferTemplate
18         C1_BBL1   Cellar1 Bright Beer Line
19          C1_BL1         Cellar 1 Beer Line
20          C1_PS1                

## List of all Data Views

These are all single-asset default Data Views. Each default Data View contains all the data available for its asset.

In [8]:
hub.asset_dataviews()

['brewery-acid.tank',
 'brewery-ba1',
 'brewery-ba2',
 'brewery-bb02',
 'brewery-bb02.line',
 'brewery-bb03',
 'brewery-bb04',
 'brewery-bb05',
 'brewery-bb06',
 'brewery-bb07',
 'brewery-bb08',
 'brewery-bb09',
 'brewery-bb11',
 'brewery-bb12',
 'brewery-bb13',
 'brewery-bb14',
 'brewery-bb15',
 'brewery-beer.transfer',
 'brewery-c1_bbl1',
 'brewery-c1_bl1',
 'brewery-c1_ps1',
 'brewery-c1_ps2',
 'brewery-c1_yl1',
 'brewery-c2_bbl1',
 'brewery-c2_ft1',
 'brewery-c2_ps1',
 'brewery-c2_ps2',
 'brewery-c3_bl1',
 'brewery-c3_ft1',
 'brewery-c3_yl1',
 'brewery-caustic.tank',
 'brewery-clean.in.place',
 'brewery-cnt1',
 'brewery-cnt2',
 'brewery-cst1',
 'brewery-fv01',
 'brewery-fv02',
 'brewery-fv08',
 'brewery-fv09',
 'brewery-fv10',
 'brewery-fv11',
 'brewery-fv12',
 'brewery-fv13',
 'brewery-fv14',
 'brewery-fv15',
 'brewery-fv16',
 'brewery-fv17',
 'brewery-fv18',
 'brewery-fv19',
 'brewery-fv1__fv2.line',
 'brewery-fv20',
 'brewery-fv21',
 'brewery-fv22',
 'brewery-fv23',
 'brewery-fv

## List of Data Views exclusive to Fermenter Vessel #32 (FV32)

An empty filter (`filter=""`) lists all data views for the asset instead of only the default one.

In [9]:
dvs_fv32 = hub.asset_dataviews(asset="FV32", filter="")
dvs_fv32

['brewery-fv32',
 'brewery-fv32-adf_prediction',
 'brewery-fv32-cooling_prediction',
 'brewery-fv32-pca']
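
The listing above suggests a naming pattern in which the default data view ID is extended with `-<variant>` for the specialized views. The following sketch relies on that observation only; it is an assumption drawn from this output, not a documented rule of the library. It separates the default view from its variants so that a variant can be looked up by name.

```python
# Data view IDs copied from the output above.
dvs_fv32 = [
    "brewery-fv32",
    "brewery-fv32-adf_prediction",
    "brewery-fv32-cooling_prediction",
    "brewery-fv32-pca",
]

# Assumption: the default data view has the shortest ID, and every other
# ID extends it with "-<variant>".
default_id = min(dvs_fv32, key=len)

# Map each variant name to its full data view ID.
variants = {
    dv[len(default_id) + 1:]: dv
    for dv in dvs_fv32
    if dv.startswith(default_id + "-")
}

print(default_id)
print(sorted(variants))
```

With such a mapping, `variants["pca"]` gives the ID to pass to the data retrieval calls used later in this notebook.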

## List Multi-Asset Data Views Containing FV32

The column `Asset_Id` in data view results indicates which asset each row of data belongs to.

In [10]:
hub.asset_dataviews(asset="FV32", multiple_asset=True, filter="")

['brewery-fv31--36',
 'brewery-fv31--36-adf_prediction',
 'brewery-fv31--36-cooling_prediction',
 'brewery-fv31--36-pca']

## Get the OCS namespace associated to the dataset

Each dataset belongs to a namespace within the Academic Hub OCS account. Since a dataset may move over time, the function below always returns the active namespace for the given dataset.

In [11]:
dataset = hub.current_dataset()
namespace_id = hub.namespace_of(dataset)
namespace_id

'academic_hub_01'

## Get Data View structure

The structure lists, for each stream, the stream name, the column name under which the stream data appears, its value type, and its engineering units if available. The structure of the default data view is displayed below.

In [12]:
dataview_id = hub.asset_dataviews(asset="FV32", filter="default")[0]
print(dataview_id)
hub.dataview_definition(namespace_id, dataview_id)

brewery-fv32


| | Asset_Id | OCS_StreamName | DV_Column | Value_Type | EngUnits |
|---|---|---|---|---|---|
| 0 | FV32 | B2_CL_C2_FV32/ADF | ADF | Float | |
| 1 | FV32 | B2_CL_C2_FV32/BRAND.CV | Brand | Category | |
| 2 | FV32 | B2_CL_C2_FV32/DcrsFvFullPlato | FV Full Plato | Float | Plato |
| 3 | FV32 | B2_CL_C2_FV32/Diacetyl | Diacetyl | Integer | ppb |
| 4 | FV32 | B2_CL_C2_FV32/EndPhaseTime.CV | End Phase Time | Float | m |
| 5 | FV32 | B2_CL_C2_FV32/Fermentation_Start_Time | Fermentation Start Time | Timestamp | |
| 6 | FV32 | B2_CL_C2_FV32/Integrator Key | Integrator Key | Float | |
| 7 | FV32 | B2_CL_C2_FV32/Phase Duration | Phase Duration | Integer | |
| 8 | FV32 | B2_CL_C2_FV32/Plato | Plato | Float | Plato |
| 9 | FV32 | B2_CL_C2_FV32/Predicted Transition | Predicted Transition | String | |
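
The definition can help decide which columns to use before requesting data. The sketch below assumes the definition is available as a pandas dataframe with the column names displayed above. It rebuilds a few of the displayed rows by hand so that it runs on its own, then selects the numeric columns by `Value_Type`.

```python
import pandas as pd

# A few rows copied by hand from the definition displayed above.
definition = pd.DataFrame(
    {
        "DV_Column": ["ADF", "Brand", "FV Full Plato", "Diacetyl"],
        "Value_Type": ["Float", "Category", "Float", "Integer"],
        "EngUnits": ["", "", "Plato", "ppb"],
    }
)

# Keep only the columns whose values are numeric.
is_numeric = definition["Value_Type"].isin(["Float", "Integer"])
numeric_columns = definition.loc[is_numeric, "DV_Column"].tolist()

print(numeric_columns)
```

The same kind of selection could be applied to the `EngUnits` column, for example to find every column expressed in a given unit.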


## Getting data from a Data View

Returns interpolated data between a start and an end date at the requested interpolation interval (format `HH:MM:SS`).

In [13]:
df_fv32 = hub.dataview_interpolated_pd(namespace_id, dataview_id, "2017-01-19", "2020-01-19", "00:30:00")
df_fv32

+++++++++++++++++++
  ==> Finished 'dataview_interpolated_pd' in       64.9983 secs [ 809 rows/sec ]


| | Timestamp | Asset_Id | ADF | FV Full Plato | Diacetyl | End Phase Time | Fermentation Start Time | Integrator Key | Phase Duration | Plato | ... | Bottom TIC SP | Middle TIC OUT | Middle TIC PV | Middle TIC SP | Top TIC OUT | Top TIC PV | Top TIC SP | Brand | Status | Yeast Strain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-19 00:00:00 | FV32 | 0.104108 | 13.506092 | | | 2017-01-18T05:59:56.6180112Z | 12.10000 | | 12.10000 | ... | 63.0 | 17.355146 | 63.032608 | 63.0 | 8.699567 | 63.067005 | 63.0 | Realtime Hops | Fermentation | NCYC1187 |
| 1 | 2017-01-19 00:30:00 | FV32 | 0.104108 | 13.506092 | | | 2017-01-18T05:59:56.6180112Z | 12.10000 | | 12.10000 | ... | 63.0 | 7.568751 | 63.035385 | 63.0 | 20.299335 | 63.055607 | 63.0 | Realtime Hops | Fermentation | NCYC1187 |
| 2 | 2017-01-19 01:00:00 | FV32 | 0.104108 | 13.506092 | | | 2017-01-18T05:59:56.6180112Z | 12.10000 | | 12.10000 | ... | 63.0 | 4.256329 | 63.019897 | 63.0 | 46.984615 | 63.241917 | 63.0 | Realtime Hops | Fermentation | NCYC1187 |
| 3 | 2017-01-19 01:30:00 | FV32 | 0.104108 | 13.506092 | | | 2017-01-18T05:59:56.6180112Z | 12.10000 | | 12.10000 | ... | 63.0 | 17.092400 | 63.079906 | 63.0 | 74.284065 | 63.305080 | 63.0 | Realtime Hops | Fermentation | NCYC1187 |
| 4 | 2017-01-19 02:00:00 | FV32 | 0.104108 | 13.506092 | | | 2017-01-18T05:59:56.6180112Z | 12.10000 | | 12.10000 | ... | 63.0 | 46.463210 | 63.176716 | 63.0 | 42.702755 | 63.150780 | 63.0 | Realtime Hops | Fermentation | NCYC1187 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 52556 | 2020-01-18 22:00:00 | FV32 | 0.756629 | 16.295610 | 25.0 | 2880.0 | 2019-12-16T09:28:19.6750028Z | 3.96588 | 6803.0 | 3.96588 | ... | 30.0 | 6.131247 | 30.028662 | 30.0 | 0.000000 | 29.951517 | 30.0 | Trois Lacs | Ready to Transfer | a38 |
| 52557 | 2020-01-18 22:30:00 | FV32 | 0.756629 | 16.295610 | 25.0 | 2880.0 | 2019-12-16T09:28:19.6750028Z | 3.96588 | 6833.0 | 3.96588 | ... | 30.0 | 0.000000 | 29.966312 | 30.0 | 38.112650 | 30.199999 | 30.0 | Trois Lacs | Ready to Transfer | a38 |
| 52558 | 2020-01-18 23:00:00 | FV32 | 0.756629 | 16.295610 | 25.0 | 2880.0 | 2019-12-16T09:28:19.6750028Z | 3.96588 | 6863.0 | 3.96588 | ... | 30.0 | 0.000000 | 30.089066 | 30.0 | 42.780502 | 30.199999 | 30.0 | Trois Lacs | Ready to Transfer | a38 |
| 52559 | 2020-01-18 23:30:00 | FV32 | 0.756629 | 16.295610 | 25.0 | 2880.0 | 2019-12-16T09:28:19.6750028Z | 3.96588 | 6893.0 | 3.96588 | ... | 30.0 | 0.000000 | 30.011000 | 30.0 | 42.780502 | 30.199999 | 30.0 | Trois Lacs | Ready to Transfer | a38 |
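
Because a request can take about a minute, it can be useful to estimate the size of the result beforehand. The helper below is a hypothetical utility, not part of the library. It assumes one row per interval with both the start and the end timestamps included, which is consistent with the 52561 entries that `df_fv32.info()` reports for this request.

```python
from datetime import datetime, timedelta


def expected_rows(start, end, interval, assets=1):
    """Estimate the row count of an interpolated request.

    Assumes one row per interval for each asset, with both the start
    and the end timestamps included.
    """
    hours, minutes, seconds = (int(part) for part in interval.split(":"))
    step = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return (span // step + 1) * assets


# Single-asset request made above: three years at 30-minute intervals.
single = expected_rows("2017-01-19", "2020-01-19", "00:30:00")
print(single)  # 52561

# Six-asset request made later in this notebook: six months at 10-minute intervals.
multi = expected_rows("2017-02-01", "2017-08-01", "00:10:00", assets=6)
print(multi)  # 156390
```

Dividing the estimate by the elapsed time gives the throughput shown in the progress line: 52561 rows in 64.9983 seconds is about 809 rows per second.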


In [14]:
# Information about the dataframe
df_fv32.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52561 entries, 0 to 52560
Data columns (total 34 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   Timestamp                52561 non-null  datetime64[ns]
 1   Asset_Id                 52561 non-null  object        
 2   ADF                      30540 non-null  float64       
 3   FV Full Plato            36987 non-null  float64       
 4   Diacetyl                 32934 non-null  float64       
 5   End Phase Time           37646 non-null  float64       
 6   Fermentation Start Time  44539 non-null  object        
 7   Integrator Key           51627 non-null  float64       
 8   Phase Duration           5244 non-null   float64       
 9   Plato                    34735 non-null  float64       
 10  Predicted Transition     16973 non-null  object        
 11  Deviation                21289 non-null  float64       
 12  VesselID                 52561 n

## Data Views with multiple assets

Some Data Views return data for multiple assets, here fermenter vessels 31 through 36. The cell below shows how to get their names.

In [17]:
multi_asset_dvs = hub.asset_dataviews(multiple_asset=True)
multi_asset_dvs

['brewery-fv31--36']

## Get result

The column "Asset_Id" indicates which asset the data row belongs to. The data order is all data for FV31 in increasing time, followed by FV32 and so on up to FV36. 


In [20]:
df_fv31_36 = hub.dataview_interpolated_pd(namespace_id, multi_asset_dvs[0], "2017-02-01", "2017-08-01", "00:10:00")
df_fv31_36

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  ==> Finished 'dataview_interpolated_pd' in       67.9013 secs [ 2.30K rows/sec ]


| | Timestamp | Asset_Id | ADF | FV Full Plato | Diacetyl | End Phase Time | Fermentation Start Time | Integrator Key | Phase Duration | Plato | ... | Bottom TIC SP | Middle TIC OUT | Middle TIC PV | Middle TIC SP | Top TIC OUT | Top TIC PV | Top TIC SP | Brand | Status | Yeast Strain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-02-01 00:00:00 | FV31 | 0.719046 | 17.084625 | 70.0 | | 2017-01-26T07:30:03.2369995Z | -1.0 | | 4.80000 | ... | 30.0 | 100.0 | 53.798340 | 30.0 | 100.0 | 53.674717 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| 1 | 2017-02-01 00:10:00 | FV31 | 0.719046 | 17.084625 | 70.0 | | 2017-01-26T07:30:03.2369995Z | -1.0 | | 4.80000 | ... | 30.0 | 100.0 | 53.607800 | 30.0 | 100.0 | 53.494137 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| 2 | 2017-02-01 00:20:00 | FV31 | 0.719046 | 17.084625 | 70.0 | | 2017-01-26T07:30:03.2369995Z | -1.0 | | 4.80000 | ... | 30.0 | 100.0 | 53.483322 | 30.0 | 100.0 | 53.200000 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| 3 | 2017-02-01 00:30:00 | FV31 | 0.719046 | 17.084625 | 70.0 | | 2017-01-26T07:30:03.2369995Z | -1.0 | | 4.80000 | ... | 30.0 | 100.0 | 53.009580 | 30.0 | 100.0 | 52.928270 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| 4 | 2017-02-01 00:40:00 | FV31 | 0.719046 | 17.084625 | 70.0 | | 2017-01-26T07:30:03.2369995Z | -1.0 | | 4.80000 | ... | 30.0 | 100.0 | 52.899998 | 30.0 | 100.0 | 52.800003 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 156385 | 2017-07-31 23:20:00 | FV36 | 0.787345 | 17.026010 | 60.0 | | 2017-07-25T00:43:17.1560056Z | 60.0 | | 3.62067 | ... | 30.0 | 100.0 | 40.500000 | 30.0 | 100.0 | 39.928513 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| 156386 | 2017-07-31 23:30:00 | FV36 | 0.787345 | 17.026010 | 60.0 | | 2017-07-25T00:43:17.1560056Z | 60.0 | | 3.62067 | ... | 30.0 | 100.0 | 40.653477 | 30.0 | 100.0 | 39.853134 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| 156387 | 2017-07-31 23:40:00 | FV36 | 0.787345 | 17.026010 | 60.0 | | 2017-07-25T00:43:17.1560056Z | 60.0 | | 3.62067 | ... | 30.0 | 100.0 | 40.390297 | 30.0 | 100.0 | 39.852184 | 30.0 | Grey Horse | Cooling | NCYC1187 |
| 156388 | 2017-07-31 23:50:00 | FV36 | 0.787345 | 17.026010 | 60.0 | | 2017-07-25T00:43:17.1560056Z | 60.0 | | 3.62067 | ... | 30.0 | 100.0 | 40.443350 | 30.0 | 100.0 | 39.700000 | 30.0 | Grey Horse | Cooling | NCYC1187 |
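
Since all assets are stacked in a single dataframe, a common next step is to separate them using `Asset_Id`. The sketch below uses a small synthetic dataframe with made-up values in place of `df_fv31_36`, so that it runs without a connection to the hub. It shows two possible layouts: one dataframe per asset, and a wide table with one column per asset for a single variable.

```python
import pandas as pd

# Small synthetic stand-in for a multi-asset result (values are made up).
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2017-02-01 00:00", "2017-02-01 00:10"] * 2
        ),
        "Asset_Id": ["FV31", "FV31", "FV32", "FV32"],
        "Plato": [4.8, 4.7, 5.1, 5.0],
    }
)

# Layout 1: one dataframe per asset, keyed by Asset_Id.
per_asset = {
    asset: group.reset_index(drop=True)
    for asset, group in df.groupby("Asset_Id")
}

# Layout 2: one row per timestamp and one column per asset for one variable.
wide = df.pivot(index="Timestamp", columns="Asset_Id", values="Plato")

print(sorted(per_asset))
print(wide)
```

The wide layout is convenient for comparing the same measurement across vessels, while the per-asset dictionary keeps every column of each vessel together.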


In [21]:
df_fv31_36.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156390 entries, 0 to 156389
Data columns (total 34 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   Timestamp                156390 non-null  datetime64[ns]
 1   Asset_Id                 156390 non-null  object        
 2   ADF                      91764 non-null   float64       
 3   FV Full Plato            125435 non-null  float64       
 4   Diacetyl                 99260 non-null   float64       
 5   End Phase Time           0 non-null       float64       
 6   Fermentation Start Time  129216 non-null  object        
 7   Integrator Key           153709 non-null  float64       
 8   Phase Duration           0 non-null       float64       
 9   Plato                    96089 non-null   float64       
 10  Predicted Transition     0 non-null       float64       
 11  Deviation                0 non-null       float64       
 12  VesselID        

## Refresh datasets information 

When new datasets are published or existing ones are extended, you can access the updated information using `refresh_datasets`.

In [22]:
hub.refresh_datasets()

@ Hub data file: hub_datasets.json
@ Current dataset: Brewery
