# OSIsoft Academic Hub Library Quick Start

Version 0.97

Academic Hub datasets are hosted by the OSIsoft Cloud Service (OCS, https://www.osisoft.com/solutions/cloud/vision/), a cloud-native real-time data infrastructure that lets users perform enterprise-wide analytics using the tools and languages of their choice.

**Raw operational data has specific characteristics making it difficult to deal with directly**, among them:

* variable data collection frequencies
* bad values (system error codes)
* data gaps 


**Data science projects, however, need operational data that is:**

* **Time-aligned** to deal with the characteristics above in a consistent way according to the data type (e.g. interpolation for float values, repeating the last good value for categorical data, etc.)
* **Context-aware** so that the data is understandable across as many real-world assets as you need
* **Shaped and filtered** to ensure you have the data you need, in the form you need it
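
To make the time-alignment idea concrete, here is a minimal sketch in plain Python. It is not how OCS implements Data Views; the function names, the sample values, and the `"I/O Timeout"` error code are all made up for illustration. Float values are linearly interpolated onto a regular grid, while categorical values repeat the last good value.

```python
from bisect import bisect_right


def interpolate_linear(samples, t):
    """Linearly interpolate a float value at time t from time-sorted (time, value) pairs."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)  # samples[:i] all have a time <= t
    if i == 0:
        return None  # t lies before the first sample
    t0, v0 = samples[i - 1]
    if t0 == t:
        return v0  # exact hit, nothing to interpolate
    if i == len(samples):
        return None  # t lies after the last sample: no extrapolation
    t1, v1 = samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def last_good_value(samples, t, bad_values=frozenset()):
    """Return the latest value recorded at or before t, skipping bad values."""
    good = [v for ts, v in samples if ts <= t and v not in bad_values]
    return good[-1] if good else None


# Made-up samples: time in minutes, irregular collection frequency
temperature = [(0, 60.0), (10, 70.0), (50, 62.0)]
status = [(0, "Fermentation"), (25, "I/O Timeout"), (50, "Maturation")]

for t in (0, 30, 60):  # regular 30-minute grid
    print(t, interpolate_linear(temperature, t),
          last_good_value(status, t, bad_values={"I/O Timeout"}))
```

The sketch returns `None` outside the sampled range instead of extrapolating; that is a design choice of this illustration, not a statement about OCS behavior.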

**The OCS solution for application-ready data is Data Views:**

![](https://academichub.blob.core.windows.net/images/piworld-dse-dataview-p2.png)

**Each Academic Hub dataset comes with a set of asset-centric data views.** The goal of the Academic Hub Python library is to provide generic and consistent access to:

* the list of existing datasets
* for a given dataset: 
  * get the list of its assets
  * get the OCS namespace where the dataset is hosted
* for a given asset, get the list of data views it belongs to

The rest of this notebook is a working example of the functionality listed above. 

## Install Academic Hub Python library 

In [1]:
!pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ocs-academic-hub==0.80.0

Looking in indexes: https://test.pypi.org/simple/, https://pypi.org/simple


## Use `pip uninstall` only in case of library issues

In [2]:
# It is sometimes necessary to uninstall previous versions: uncomment and run the following line, then restart the kernel and reinstall with the previous cell
# !pip uninstall -y ocs-academic-hub ocs-sample-library-preview

# WARNING: uncomment only for testing
#%env OCS_HUB_CONFIG=config.ini

## Import HubClient, necessary to connect and interact with OCS

In [3]:
from ocs_academic_hub import HubClient

## Running the following cell initiates the login sequence

Return to this web page when the login is complete.

In [4]:
hub = HubClient()

Step 1: Get OAuth endpoint configuration...
Step 2: Set up server to process authorization response...
Step 3: Authorize the user...
Step 4: Set server to handle one request...


127.0.0.1 - - [08/Sep/2020 23:03:00] "GET /callback.html?code=D31MK_KAYbK_vDXfd_d6_2wyySmoJZznY832hfq7KNQ&scope=openid%20ocsapi&session_state=fzZ-SvVNiptdp9Xf-7IvCB9FeF7_4lbBJPxf_I_BGm0.6EGeuAZUH2rKFCbXT-Hu9A HTTP/1.1" 200 -


Step 5: Get a token using the authorization code...
Step 6: Access token read ok
Complete!
@ Hub data file: hub_datasets.json


## Get list of published hub datasets


In [5]:
hub.datasets()

['Brewery', 'Campus_Energy']

## Display current active dataset

NOTE: the active dataset can be changed with `hub.set_dataset`, as shown later in this notebook.

In [6]:
hub.current_dataset()

'Brewery'

## Get list of assets with Data Views

The list is returned as a pandas dataframe with the columns `Asset_Id` and `Description`. Wrapping the call in `print` with `.to_string()`, as in the cell below, displays the whole dataframe content.

In [7]:
print(hub.assets().to_string())

          Asset_Id            Description
0        Acid Tank                    AT1
1              BA1                       
2              BA2                       
3             BB02            Bright Tank
4        BB02 Line                       
5             BB03            Bright Tank
6             BB04            Bright Tank
7             BB05            Bright Tank
8             BB06            Bright Tank
9             BB07            Bright Tank
10            BB08            Bright Tank
11            BB09            Bright Tank
12            BB11            Bright Tank
13            BB12            Bright Tank
14            BB13            Bright Tank
15            BB14            Bright Tank
16            BB15            Bright Tank
17   Beer Transfer  Beer TransferTemplate
18         C1_BBL1            Bright Tank
19          C1_BL1     Cellar 1 Beer Line
20          C1_PS1                    PS1
21          C1_PS2                    PS2
22          C1_YL1    Cellar 1 Yea

## List of all Data Views

These are the default single-asset Data Views; each one contains all the data available for its asset.

In [8]:
hub.asset_dataviews()

['brewery-acid.tank',
 'brewery-ba1',
 'brewery-ba2',
 'brewery-bb02',
 'brewery-bb02.line',
 'brewery-bb03',
 'brewery-bb04',
 'brewery-bb05',
 'brewery-bb06',
 'brewery-bb07',
 'brewery-bb08',
 'brewery-bb09',
 'brewery-bb11',
 'brewery-bb12',
 'brewery-bb13',
 'brewery-bb14',
 'brewery-bb15',
 'brewery-beer.transfer',
 'brewery-c1_bbl1',
 'brewery-c1_bl1',
 'brewery-c1_ps1',
 'brewery-c1_ps2',
 'brewery-c1_yl1',
 'brewery-c2_bbl1',
 'brewery-c2_ft1',
 'brewery-c2_ps1',
 'brewery-c2_ps2',
 'brewery-c3_bl1',
 'brewery-c3_ft1',
 'brewery-c3_yl1',
 'brewery-caustic.tank',
 'brewery-clean.in.place',
 'brewery-cnt1',
 'brewery-cnt2',
 'brewery-cst1',
 'brewery-fv01',
 'brewery-fv02',
 'brewery-fv08',
 'brewery-fv09',
 'brewery-fv10',
 'brewery-fv11',
 'brewery-fv12',
 'brewery-fv13',
 'brewery-fv14',
 'brewery-fv15',
 'brewery-fv16',
 'brewery-fv17',
 'brewery-fv18',
 'brewery-fv19',
 'brewery-fv1__fv2.line',
 'brewery-fv20',
 'brewery-fv21',
 'brewery-fv22',
 'brewery-fv23',
 'brewery-fv

## List of Data Views exclusive to Fermenter Vessel #32 (FV32)

An empty filter (`filter=""`) lists all data views for the asset instead of only the default one.

In [9]:
dvs_fv32 = hub.asset_dataviews(asset="FV32", filter="")
dvs_fv32

['brewery-fv32',
 'brewery-fv32-adf_prediction',
 'brewery-fv32-cooling_prediction',
 'brewery-fv32-pca']

## List Multi-Asset Data Views Containing FV32

The column `Asset_Id` in data view results indicates which asset the row of data belongs to 

In [10]:
hub.asset_dataviews(asset="FV32", multiple_asset=True, filter="")

['brewery-fv31--36',
 'brewery-fv31--36-adf_prediction',
 'brewery-fv31--36-cooling_prediction',
 'brewery-fv31--36-pca']
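
The IDs listed so far appear to follow a `dataset-asset[-variant]` pattern, with a double hyphen marking a multi-asset range. The sketch below splits an ID along those lines. This convention is only inferred from the listings in this notebook, not a documented guarantee, and `split_dataview_id` is a hypothetical helper that is not part of the library; the `asset` and `multiple_asset` arguments of `hub.asset_dataviews` remain the reliable way to select data views.

```python
def split_dataview_id(dataview_id):
    """Split an ID such as 'brewery-fv31--36-pca' into (dataset, asset, variant).

    Assumes the naming pattern observed in this notebook:
    '<dataset>-<asset>' optionally followed by '-<variant>', where a
    double hyphen inside <asset> denotes a multi-asset range.
    """
    dataset, rest = dataview_id.split("-", 1)
    # Protect the double hyphen of a range before splitting off the variant
    asset, _, variant = rest.replace("--", "\0").partition("-")
    return dataset, asset.replace("\0", "--"), variant or None


for dv in ("brewery-fv32", "brewery-fv32-pca", "brewery-fv31--36-cooling_prediction"):
    dataset, asset, variant = split_dataview_id(dv)
    kind = "multi-asset" if "--" in asset else "single-asset"
    print(dv, "->", dataset, asset, variant, kind)
```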

## Get the OCS namespace associated with the dataset

Each dataset belongs to a namespace within the Academic Hub OCS account. Since a dataset may move over time, the function below always returns the active namespace for the given dataset.

In [11]:
dataset = hub.current_dataset()
namespace_id = hub.namespace_of(dataset)
namespace_id

'academic_hub_01'

## Get Data View structure

The structure lists, for each stream, the column name under which its data appears, its value type, its engineering units if available, and its OCS stream name. The cell below displays the structure of the default data view.

In [12]:
dataview_id = hub.asset_dataviews(asset="FV32", filter="default")[0]
print(dataview_id)
print(hub.dataview_definition(namespace_id, dataview_id).to_string(index=False))

brewery-fv32
Asset_Id              Column_Name Stream_Type Stream_UOM                        OCS_Stream_Name
    FV32                      ADF       Float                                 B2_CL_C2_FV32/ADF
    FV32           Bottom TIC OUT       Float          %          B2_CL_C2_FV32_TIC1380A/OUT.CV
    FV32            Bottom TIC PV       Float         °F           B2_CL_C2_FV32_TIC1380A/PV.CV
    FV32            Bottom TIC SP       Float         °F           B2_CL_C2_FV32_TIC1380A/SP.CV
    FV32                    Brand    Category                            B2_CL_C2_FV32/BRAND.CV
    FV32                Deviation       Float                B2_CL_C2_FV32/Prediction.Deviation
    FV32                 Diacetyl     Integer        ppb                 B2_CL_C2_FV32/Diacetyl
    FV32           End Phase Time       Float          m          B2_CL_C2_FV32/EndPhaseTime.CV
    FV32            FV Full Plato       Float      Plato          B2_CL_C2_FV32/DcrsFvFullPlato
    FV32  Fermentation Star

## Getting data from a Data View

The call below returns interpolated data between a start and an end date, at the requested interpolation interval (format `HH:MM:SS`).

In [13]:
df_fv32 = hub.dataview_interpolated_pd(namespace_id, dataview_id, "2017-01-19", "2020-01-19", "00:30:00")
df_fv32

+++++++++++++++++++
  ==> Finished 'dataview_interpolated_pd' in       58.6850 secs [ 896 rows/sec ]


Unnamed: 0,Timestamp,Asset_Id,ADF,FV Full Plato,Diacetyl,End Phase Time,Fermentation Start Time,Integrator Key,Phase Duration,Plato,...,Bottom TIC SP,Middle TIC OUT,Middle TIC PV,Middle TIC SP,Top TIC OUT,Top TIC PV,Top TIC SP,Brand,Status,Yeast Strain
0,2017-01-19 00:00:00,FV32,0.104108,13.506092,,,2017-01-18T05:59:56.6180112Z,12.10000,,12.10000,...,63.0,17.355146,63.032608,63.0,8.699567,63.067005,63.0,Realtime Hops,Fermentation,NCYC1187
1,2017-01-19 00:30:00,FV32,0.104108,13.506092,,,2017-01-18T05:59:56.6180112Z,12.10000,,12.10000,...,63.0,7.568751,63.035385,63.0,20.299335,63.055607,63.0,Realtime Hops,Fermentation,NCYC1187
2,2017-01-19 01:00:00,FV32,0.104108,13.506092,,,2017-01-18T05:59:56.6180112Z,12.10000,,12.10000,...,63.0,4.256329,63.019897,63.0,46.984615,63.241917,63.0,Realtime Hops,Fermentation,NCYC1187
3,2017-01-19 01:30:00,FV32,0.104108,13.506092,,,2017-01-18T05:59:56.6180112Z,12.10000,,12.10000,...,63.0,17.092400,63.079906,63.0,74.284065,63.305080,63.0,Realtime Hops,Fermentation,NCYC1187
4,2017-01-19 02:00:00,FV32,0.104108,13.506092,,,2017-01-18T05:59:56.6180112Z,12.10000,,12.10000,...,63.0,46.463210,63.176716,63.0,42.702755,63.150780,63.0,Realtime Hops,Fermentation,NCYC1187
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52556,2020-01-18 22:00:00,FV32,0.756629,16.295610,25.0,2880.0,2019-12-16T09:28:19.6750028Z,3.96588,6803.0,3.96588,...,30.0,6.131247,30.028662,30.0,0.000000,29.951517,30.0,Trois Lacs,Ready to Transfer,a38
52557,2020-01-18 22:30:00,FV32,0.756629,16.295610,25.0,2880.0,2019-12-16T09:28:19.6750028Z,3.96588,6833.0,3.96588,...,30.0,0.000000,29.966312,30.0,38.112650,30.199999,30.0,Trois Lacs,Ready to Transfer,a38
52558,2020-01-18 23:00:00,FV32,0.756629,16.295610,25.0,2880.0,2019-12-16T09:28:19.6750028Z,3.96588,6863.0,3.96588,...,30.0,0.000000,30.089066,30.0,42.780502,30.199999,30.0,Trois Lacs,Ready to Transfer,a38
52559,2020-01-18 23:30:00,FV32,0.756629,16.295610,25.0,2880.0,2019-12-16T09:28:19.6750028Z,3.96588,6893.0,3.96588,...,30.0,0.000000,30.011000,30.0,42.780502,30.199999,30.0,Trois Lacs,Ready to Transfer,a38


In [14]:
# Information about the dataframe - this is a Pandas operation 
df_fv32.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52561 entries, 0 to 52560
Data columns (total 34 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   Timestamp                52561 non-null  datetime64[ns]
 1   Asset_Id                 52561 non-null  object        
 2   ADF                      30540 non-null  float64       
 3   FV Full Plato            36987 non-null  float64       
 4   Diacetyl                 32934 non-null  float64       
 5   End Phase Time           37646 non-null  float64       
 6   Fermentation Start Time  44539 non-null  object        
 7   Integrator Key           51627 non-null  float64       
 8   Phase Duration           5244 non-null   float64       
 9   Plato                    34735 non-null  float64       
 10  Predicted Transition     16973 non-null  object        
 11  Deviation                21289 non-null  float64       
 12  VesselID                 52561 n
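
The row count reported by `info()` can be sanity-checked from the query arguments alone. The sketch below assumes that the interpolation grid includes both the start and the end timestamp, which is consistent with the 52561 rows shown above; `expected_rows` is a hypothetical helper, not part of the library.

```python
from datetime import datetime, timedelta


def parse_interval(hhmmss):
    """Convert an 'HH:MM:SS' interval string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def expected_rows(start, end, interval):
    """Count timestamps on a regular grid from start to end, both inclusive (assumed)."""
    span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return span // parse_interval(interval) + 1


# Same arguments as the single-asset query above
print(expected_rows("2017-01-19", "2020-01-19", "00:30:00"))  # 52561
```

A mismatch between this estimate and the dataframe length would suggest that the query was truncated or that the grid assumption does not hold.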

## Data Views with multiple assets

Some Data Views return data for several assets at once, for example fermenter vessels 31 to 36. The cell below lists their names.

In [15]:
multi_asset_dvs = hub.asset_dataviews(multiple_asset=True)
multi_asset_dvs

['brewery-fv01--28', 'brewery-fv31--36', 'brewery-fv37--46']

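## Get data from a multi-asset Data View

The column `Asset_Id` indicates which asset each data row belongs to. Rows are grouped by asset: all data for the first asset in increasing time order, followed by the next asset, and so on. Note that the cell below queries `multi_asset_dvs[0]`, which is `brewery-fv01--28`, so despite the variable name `df_fv31_36` the result starts with FV01 and ends with FV28 rather than covering FV31 to FV36.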
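
Because the rows arrive stacked asset by asset, splitting a result back into per-asset groups is straightforward. With a pandas dataframe, `groupby("Asset_Id")` is the usual route; the sketch below shows the same idea in plain Python on a few made-up rows (the timestamps and `Plato` values are invented, and `split_by_asset` is a hypothetical helper).

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows in the stacked order described above: one asset after another
rows = [
    {"Timestamp": "2017-02-01 00:00:00", "Asset_Id": "FV01", "Plato": 12.1},
    {"Timestamp": "2017-02-01 00:30:00", "Asset_Id": "FV01", "Plato": 12.0},
    {"Timestamp": "2017-02-01 00:00:00", "Asset_Id": "FV02", "Plato": 11.4},
    {"Timestamp": "2017-02-01 00:30:00", "Asset_Id": "FV02", "Plato": 11.3},
]


def split_by_asset(rows):
    """Group consecutive rows sharing an Asset_Id; relies on rows being ordered by asset."""
    return {
        asset: list(group)
        for asset, group in groupby(rows, key=itemgetter("Asset_Id"))
    }


per_asset = split_by_asset(rows)
for asset, asset_rows in per_asset.items():
    print(asset, len(asset_rows), "rows")
```

`itertools.groupby` only merges adjacent rows, so this approach depends on the asset-by-asset ordering; for rows in arbitrary order, sort by `Asset_Id` first.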


In [16]:
df_fv31_36 = hub.dataview_interpolated_pd(namespace_id, multi_asset_dvs[0], "2017-02-01", "2017-08-01", "00:30:00")
df_fv31_36

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  ==> Finished 'dataview_interpolated_pd' in       101.3838 secs [ 1.97K rows/sec ]


Unnamed: 0,Timestamp,Asset_Id,ADF,FV Full Plato,Diacetyl,End Phase Time,Fermentation Start Time,Integrator Key,Maturation Start Time,Phase Duration,...,Middle TIC PV,Middle TIC SP,Top TIC OUT,Top TIC PV,Top TIC SP,Brand,Brewing Release,Quality Release,Status,Yeast Strain
0,2017-02-01 00:00:00,FV01,,,,,,-1.0,,,...,63.784530,63.784530,0.0,64.094820,64.094820,,,,Sanitizing,
1,2017-02-01 00:30:00,FV01,,,,,,-1.0,,,...,63.810040,63.810040,0.0,64.115980,64.115980,,,,Sanitizing,
2,2017-02-01 01:00:00,FV01,,,,,,-1.0,,,...,63.828495,63.828495,0.0,64.132515,64.132515,,,,Sanitizing,
3,2017-02-01 01:30:00,FV01,,,,,,-1.0,,,...,63.846947,63.846947,0.0,64.149055,64.149055,,,,Sanitizing,
4,2017-02-01 02:00:00,FV01,,,,,,-1.0,,,...,63.865402,63.865402,0.0,64.165590,64.165590,,,,Sanitizing,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
199842,2017-07-31 22:00:00,FV28,,,37.0,,2017-07-25T07:28:25.7840118Z,37.0,,,...,,,100.0,44.100000,30.000000,Coorsight,,,Addition Maturation,NCYC2124
199843,2017-07-31 22:30:00,FV28,,,37.0,,2017-07-25T07:28:25.7840118Z,37.0,,,...,,,100.0,43.984710,30.000000,Coorsight,,,Addition Maturation,NCYC2124
199844,2017-07-31 23:00:00,FV28,,,37.0,,2017-07-25T07:28:25.7840118Z,37.0,,,...,,,100.0,43.722637,30.000000,Coorsight,,,Addition Maturation,NCYC2124
199845,2017-07-31 23:30:00,FV28,,,37.0,,2017-07-25T07:28:25.7840118Z,37.0,,,...,,,100.0,43.500000,30.000000,Coorsight,,,Addition Maturation,NCYC2124


In [17]:
df_fv31_36.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 199847 entries, 0 to 199846
Data columns (total 34 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   Timestamp                199847 non-null  datetime64[ns]
 1   Asset_Id                 199847 non-null  object        
 2   ADF                      85288 non-null   float64       
 3   FV Full Plato            117399 non-null  float64       
 4   Diacetyl                 94518 non-null   float64       
 5   End Phase Time           0 non-null       float64       
 6   Fermentation Start Time  128646 non-null  object        
 7   Integrator Key           193890 non-null  float64       
 8   Maturation Start Time    0 non-null       float64       
 9   Phase Duration           0 non-null       float64       
 10  Plato                    89553 non-null   float64       
 11  Predicted Transition     0 non-null       float64       
 12  Deviation       
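
The non-null counts printed by `info()` give a quick measure of data gaps per column. As a worked example, the sketch below converts a few of the counts shown above into completeness percentages and flags the columns that are entirely empty in this query. `completeness` is a hypothetical helper; with the dataframe at hand, `df.notna().mean()` yields the same fractions directly.

```python
def completeness(non_null_counts, total_rows):
    """Return the percentage of non-null values per column, rounded to one decimal."""
    return {
        column: round(100 * count / total_rows, 1)
        for column, count in non_null_counts.items()
    }


# Counts copied from the info() output above (199847 rows in total)
counts = {
    "ADF": 85288,
    "FV Full Plato": 117399,
    "Diacetyl": 94518,
    "End Phase Time": 0,
    "Integrator Key": 193890,
}

percentages = completeness(counts, 199847)
empty_columns = [column for column, pct in percentages.items() if pct == 0]

print(percentages)
print("Columns with no data:", empty_columns)
```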

## Change datasets

As seen above, the other available dataset is `Campus_Energy`. 

In [18]:
hub.set_dataset("Campus_Energy")

## Verify that it's now the current dataset

In [19]:
hub.current_dataset()

'Campus_Energy'

## Update the namespace Id 

It can be different from dataset to dataset 

In [20]:
namespace_id = hub.namespace_of(hub.current_dataset())
namespace_id

'UC__Davis'

## Assets of new dataset 

In [21]:
print(hub.assets().to_string(index=False))
#hub.assets()

                                                   Asset_Id                             Description
                                               ARC Pavilion                                Building
                                    Academic Surge Building                                Building
                           Activities and Recreation Center                                Building
                     Advanced Materials Research Laboratory                                Building
     Advanced Transportation Infrastructure Research Center                                Building
                                  Agronomy Field Laboratory                                Building
                                            Animal Building                                Building
                                 Animal Resource Service J1                                Building
                                 Animal Resource Service M3                                Building


## Data view discovery and interpolation data methods are the same

The only difference is that for the `Campus_Energy` dataset, the default data view is the same as the `-electricity` data view, because `electricity` is the only data view common to all buildings. The `chilled_water` and `steam` data views are optional. Please consult the `Campus_Energy` dataset documentation for details.

## Refresh datasets information 

When new datasets are published and/or existing ones are extended, you can access the updated information using `refresh_datasets`. 

Note: after this method runs, a file named `hub_datasets.json` is created in the same directory as this notebook. The data in this file supersedes the dataset information built into the `ocs_academic_hub` module. To get back to the built-in dataset information, move, rename, or delete `hub_datasets.json`.

In [22]:
hub.refresh_datasets()

@ Hub data file: hub_datasets.json
@ Current dataset: Campus_Energy
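
To return to the built-in dataset information without losing the refreshed file, renaming it is the gentlest of the options mentioned in the note above. A minimal sketch using only the standard library; `set_aside_hub_datasets` and the `.bak` suffix are arbitrary choices for this illustration, not library features.

```python
from pathlib import Path


def set_aside_hub_datasets(directory=".", filename="hub_datasets.json"):
    """Rename the refreshed datasets file to '<filename>.bak' and return the new path.

    Returns None when the file does not exist, so the call is safe to repeat.
    """
    path = Path(directory) / filename
    if not path.exists():
        return None
    backup = path.with_name(filename + ".bak")
    path.replace(backup)  # overwrites an older backup if one is present
    return backup


# Usage: run from the notebook directory
# set_aside_hub_datasets()
```

Renaming the backup to `hub_datasets.json` again should restore the refreshed information.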


In [23]:
hub.current_dataset()

'Campus_Energy'