# Campus Energy Dataset Quick Start

Version 1.1

The Campus Energy Dataset is an Academic Hub dataset hosted by the OSIsoft Cloud Service (OCS, https://www.osisoft.com/solutions/cloud/vision/), a cloud-native real-time data infrastructure used to perform enterprise-wide analytics using tools and languages of the user's choice. 

<div class="alert alert-info">
<b>For documentation about the Campus Energy dataset itself, please go to <a href="https://data.academic.osisoft.com/nbviewer/github/academic-hub/campus_energy/blob/main/Campus_Energy_Dataset_Doc.ipynb">https://data.academic.osisoft.com/nbviewer/github/academic-hub/campus_energy/blob/main/Campus_Energy_Dataset_Doc.ipynb</a></b>
</div>

**Raw operational data has specific characteristics making it difficult to deal with directly**, among them:

* variable data collection frequencies
* bad values (system error codes)
* data gaps 


**But data science projects using operational data need that data to be:**

* **Time-aligned** to deal with the characteristics above in a consistent way according to the data type (e.g. interpolation for float values, repeating the last good value for categorical data, etc.)
* **Context-aware** so that the data is understandable across as many real-world assets as you need
* **Shaped and filtered** to ensure you have the data you need, in the form you need it
* **Stored values** are also available when interpolated (time-aligned) values are not desirable
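To make the time-alignment idea concrete, here is a minimal pandas sketch. It does not use OCS or the Academic Hub library: the two series, their timestamps and the 30-minute grid are made-up illustrative values, and the approach (linear interpolation for floats, forward fill for categories) is only one plausible reading of the bullet above.

```python
import pandas as pd

# Synthetic readings collected at irregular times (illustrative values only)
power = pd.Series(
    [10.0, 16.0, 22.0],
    index=pd.to_datetime(["2018-01-01 00:00", "2018-01-01 00:20", "2018-01-01 01:00"]),
)
status = pd.Series(
    ["ON", "OFF"],
    index=pd.to_datetime(["2018-01-01 00:00", "2018-01-01 00:45"]),
)

# Common 30-minute grid both series are aligned to
grid = pd.date_range("2018-01-01 00:00", "2018-01-01 01:00", freq="30min")

# Float values: interpolate linearly in time, then keep only the grid points
power_aligned = (
    power.reindex(power.index.union(grid)).interpolate(method="time").reindex(grid)
)

# Categorical values: repeat the last known value at each grid point
status_aligned = status.reindex(grid, method="ffill")

aligned = pd.DataFrame({"power_kw": power_aligned, "status": status_aligned})
print(aligned)
```

Both columns now share one timestamp index, which is the property that makes downstream analysis straightforward.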

**The OCS solution for application-ready data is the Data View:**

![](https://academichub.blob.core.windows.net/images/piworld-dse-dataview-p2.png)

**Each Academic Hub dataset comes with a set of asset-centric data views.**

The goal of the Academic Hub Python library is to provide a generic and consistent way to access:

* the list of existing datasets
* for a given dataset:
  * the list of its assets
  * the OCS namespace where the dataset is hosted
* for a given asset, the list of data views it belongs to

<div class="alert alert-info">
<b>The rest of this notebook is a working example of the functionality listed above for the Campus Energy dataset</b>
</div>


## Install Academic Hub Python library 

In [1]:
!pip install ocs-academic-hub==0.99.42



### [Optional] Use the `pip uninstall` only in case of library issues

In [2]:
# It's sometimes necessary to uninstall previous versions, uncomment and run the following line. 
# Then restart kernel and reinstall with previous cell
# !pip uninstall -y ocs-academic-hub ocs-sample-library-preview

### Import required module and hub_login

In [3]:
from ocs_academic_hub.datahub import hub_login

### Login to Academic Hub by running the next cell

**Execute the cell below and follow the indicated steps to log in (an AVEVA banner will appear).**

In [4]:
widget, hub = hub_login()
widget


## Refresh datasets information

Over time, existing datasets are updated and new ones are added. The cell below makes sure you have the latest version of the production datasets.

Note: after this method runs, a file named `hub_datasets.json` is created in the same directory as this notebook. The data in this file supersedes the dataset information built into the `ocs_academic_hub` module. To get back to the built-in datasets information, move, rename or delete `hub_datasets.json`.
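Moving the file aside can be scripted. The sketch below uses only the Python standard library; the helper name `set_aside_refreshed_datasets` and the `.bak` suffix are hypothetical choices for illustration, not part of the `ocs_academic_hub` module.

```python
from pathlib import Path


def set_aside_refreshed_datasets(directory=".", filename="hub_datasets.json"):
    """Rename the refreshed datasets file so it is no longer picked up.

    Returns the backup path, or None when there is no file to move.
    (Hypothetical helper, not provided by ocs_academic_hub.)
    """
    source = Path(directory) / filename
    if not source.exists():
        return None
    backup = source.with_name(source.name + ".bak")
    source.replace(backup)  # overwrites an older backup, if any
    return backup


print(set_aside_refreshed_datasets())
```

Renaming rather than deleting keeps a copy of the refreshed information in case it is needed again.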

In [5]:
hub.refresh_datasets()

## Get list of published hub datasets


In [6]:
hub.datasets()

['Brewery',
 'Campus_Energy',
 'Classroom_Data',
 'MIT',
 'Pilot_Plant',
 'USC_Well_Data',
 'Wind_Farms']

## Display current active dataset

The default dataset is Brewery. Only one dataset can be active. 

In [7]:
hub.current_dataset()

'Brewery'

## Set Campus Energy as the current dataset

In [8]:
hub.set_dataset("Campus_Energy")

## Verify that Campus Energy is active

In [9]:
hub.current_dataset()

'Campus_Energy'

## Get list of assets with Data Views

The assets are returned as a pandas dataframe with columns `Asset_Id` and `Description`. Each asset is identified by its unique `Asset_Id`.

Using `print` with `.to_string()` in the cell below displays the whole dataframe content.

Note that the *Academic Surge Building* has index 1 (first column). We'll use this information in a few cells.

In [10]:
buildings = hub.assets()
print(buildings.to_string())

                                                       Asset_Id                             Description
0                                                  ARC Pavilion                                Building
1                                       Academic Surge Building                                Building
2                              Activities and Recreation Center                                Building
3                        Advanced Materials Research Laboratory                                Building
4        Advanced Transportation Infrastructure Research Center                                Building
5                                     Agronomy Field Laboratory                                Building
6                                               Animal Building                                Building
7                                    Animal Resource Service J1                                Building
8                                    Animal Resource Service M3 
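
A positional index can shift if the asset list changes, so looking an asset up by name may be a sturdier alternative. The sketch below uses a small stand-in dataframe (three rows copied from the output above) instead of calling `hub.assets()`, so it runs without a login.

```python
import pandas as pd

# Stand-in for the dataframe returned by hub.assets(); rows copied from the output above
buildings = pd.DataFrame(
    {
        "Asset_Id": [
            "ARC Pavilion",
            "Academic Surge Building",
            "Activities and Recreation Center",
        ],
        "Description": ["Building", "Building", "Building"],
    }
)

# Select by (partial) name instead of relying on the row position
name_matches = buildings["Asset_Id"].str.contains("Academic Surge", case=False)
academic_id = buildings.loc[name_matches, "Asset_Id"].iloc[0]
print("Building Id:", academic_id)
```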

## List of all Data Views

These are the default single-asset Data Views. Each one contains all the data available for its asset.

In [11]:
hub.asset_dataviews()

['campus.building-academic_surge_building',
 'campus.building-activities_and_recreation_center',
 'campus.building-advanced_materials_research_laboratory',
 'campus.building-advanced_transportation_infrastructure_research_center',
 'campus.building-agronomy_field_laboratory',
 'campus.building-animal_building',
 'campus.building-animal_resource_service_j_1',
 'campus.building-animal_resource_service_m_3',
 'campus.building-animal_resource_service_n_1',
 'campus.building-ann_e_pitzer_center',
 'campus.building-antique_mechanics_trailer',
 'campus.building-aquatic_biology_environmental_science_bldg',
 'campus.building-arc_pavilion',
 'campus.building-art_building_annex',
 'campus.building-art_music_wright_halls',
 'campus.building-asmundson_annex',
 'campus.building-asmundson_hall',
 'campus.building-bainer_hall',
 'campus.building-bowley_head_house',
 'campus.building-briggs_hall',
 'campus.building-california_hall',
 'campus.building-campus_data_center',
 'campus.building-cellular_biol

## List of Data Views exclusive to Academic Surge Building

An empty filter (`filter=""`) returns all Data Views for the asset instead of only the default one.

In [12]:
academic_id = buildings["Asset_Id"][1]
print("Building Id:", academic_id)
dvs_academic = hub.asset_dataviews(asset=academic_id, filter="")
dvs_academic

Building Id: Academic Surge Building


['campus.building-academic_surge_building',
 'campus.building-academic_surge_building-chilled_water',
 'campus.building-academic_surge_building-electricity',
 'campus.building-academic_surge_building-steam']

<div class="alert alert-warning">
    <b>For the Campus Energy dataset, the default data view (e.g. <tt>campus.building-academic_surge_building</tt>) and the Electricity data view (e.g. <tt>campus.building-academic_surge_building-electricity</tt>) are the same for each building. The reason is that all buildings have electricity data, while Steam and Chilled Water are optional. </b>
</div>
    
**This [link](https://data.academic.osisoft.com/nbviewer/github/academic-hub/datasets/blob/master/Campus_Energy_Dataset_Doc.ipynb#1a.-Presence-of--electricity/chilled-water/steam-for-each-building) provides a table of available data per building.**
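
The ids printed above suggest that utility-specific Data Views append `-<utility>` to the default id. That pattern is inferred from this one listing and is not a documented rule, so the sketch below treats it as an assumption. The helper `utility_of` is hypothetical.

```python
# Data View ids copied from the output above
dvs_academic = [
    "campus.building-academic_surge_building",
    "campus.building-academic_surge_building-chilled_water",
    "campus.building-academic_surge_building-electricity",
    "campus.building-academic_surge_building-steam",
]


def utility_of(dataview_id, default_id):
    """Hypothetical helper: return the suffix after the default id, or 'default'."""
    suffix = dataview_id[len(default_id):].lstrip("-")
    return suffix or "default"


# Assumption: the shortest id in the list is the default Data View
default_id = min(dvs_academic, key=len)
by_utility = {utility_of(dv, default_id): dv for dv in dvs_academic}
print(sorted(by_utility))
```

A mapping like this makes it easy to check whether a building has, say, a steam Data View before requesting data from it.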
    

## Get the OCS namespace associated to the dataset

Each dataset belongs to a namespace within the Academic Hub OCS account. Since a dataset may move over time, the function below always returns the active namespace for the given dataset.

In [13]:
dataset = hub.current_dataset()
namespace_id = hub.namespace_of(dataset)
namespace_id

'UC__Davis'

## Get Data View structure

The Data View definition lists, for each stream, its name, the column name under which its data appears, its value type, and its engineering units if available. The structure of the default data view is displayed below.

In [14]:
dataview_id = hub.asset_dataviews(asset=academic_id, filter="default")[0]
print(dataview_id)
print(hub.dataview_definition(namespace_id, dataview_id).to_string(index=False))

campus.building-academic_surge_building
               Asset_Id          Column_Name Stream_Type Stream_UOM                                      Stream_Name
Academic Surge Building          AnnualUsage       Float       kBtu        UCD.AcademicSurge_Electricity_AnnualUsage
Academic Surge Building          Demand_kBtu       Float       kBtu        UCD.AcademicSurge_Electricity_Demand_kBtu
Academic Surge Building      Electricity_EUI       Float  kBtu/sqft                UCD.AcademicSurge_Electricity_EUI
Academic Surge Building         MonthlyUsage       Float       kBtu       UCD.AcademicSurge_Electricity_MonthlyUsage
Academic Surge Building       Rollover Check       Float                 UCD.AcademicSurge_Electricity_RolloverCheck
Academic Surge Building Rollover Count Month       Float            UCD.AcademicSurge_Electricity_RolloverCountMonth
Academic Surge Building  Rollover Count Year       Float             UCD.AcademicSurge_Electricity_RolloverCountYear


## Getting data from a Data View

A Data View can return either interpolated data or stored data.

### Interpolated Data

Get interpolated data between a start and end date, with the requested interpolation interval (format is HH:MM:SS)
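
As a quick sanity check, the number of rows to expect can be estimated from the dates and the interval with pandas alone. The sketch assumes that both the start and the end timestamps are included in the result, which agrees with the 1489 entries reported by `.info()` for the one-month request in this notebook.

```python
import pandas as pd

start, end = "2018-01-01", "2018-02-01"
interval = pd.Timedelta("00:30:00")  # HH:MM:SS, same string format as the request

# Assumption: both end points are part of the returned time grid
grid = pd.date_range(start, end, freq=interval)
expected_rows = len(grid)
print(expected_rows)
```

The same calculation helps to size a request before asking for several years of data at a short interval.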

In [15]:
# Uncomment the next line to fetch three full years of data
# df_fv32 = hub.dataview_interpolated_pd(namespace_id, dataview_id, "2017-01-19", "2020-01-19", "00:30:00")
#
# The call below fetches a single month of data
df_acad_interpolated = hub.dataview_interpolated_pd(
    namespace_id, dataview_id, "2018-01-01", "2018-02-01", "00:30:00"
)
df_acad_interpolated


  ==> Finished 'dataview_interpolated_pd' in       2.7755 secs [ 536 rows/sec ]


| | Timestamp | Asset_Id | MonthlyUsage | AnnualUsage | Demand_kBtu | Electricity_EUI | Rollover Check | Rollover Count Month | Rollover Count Year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 00:00:00 | Academic Surge Building | 611748.94 | 7573665.5 | 760.19775 | 60.203968 | 0 | 0.0 | 0 |
| 1 | 2018-01-01 00:30:00 | Academic Surge Building | 612117.20 | 7573654.5 | 725.03925 | 60.203846 | 0 | 0.0 | 0 |
| 2 | 2018-01-01 01:00:00 | Academic Surge Building | 612485.50 | 7573643.5 | 830.64404 | 60.203724 | 0 | 0.0 | 0 |
| 3 | 2018-01-01 01:30:00 | Academic Surge Building | 612853.75 | 7573632.5 | 748.75460 | 60.203600 | 0 | 0.0 | 0 |
| 4 | 2018-01-01 02:00:00 | Academic Surge Building | 613222.00 | 7573622.0 | 787.10240 | 60.203480 | 0 | 0.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1484 | 2018-01-31 22:00:00 | Academic Surge Building | 636483.44 | 7583340.0 | 1025.57020 | 60.274185 | 0 | 0.0 | 0 |
| 1485 | 2018-01-31 22:30:00 | Academic Surge Building | 636940.94 | 7583344.5 | 1053.76230 | 60.274240 | 0 | 0.0 | 0 |
| 1486 | 2018-01-31 23:00:00 | Academic Surge Building | 637398.50 | 7583348.5 | 1032.89450 | 60.274290 | 0 | 0.0 | 0 |
| 1487 | 2018-01-31 23:30:00 | Academic Surge Building | 637856.06 | 7583353.0 | 1074.98910 | 60.274338 | 0 | 0.0 | 0 |


In [16]:
# Information about the dataframe - this is a Pandas operation 
df_acad_interpolated.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1489 entries, 0 to 1488
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Timestamp             1489 non-null   datetime64[ns]
 1   Asset_Id              1489 non-null   object        
 2   MonthlyUsage          1489 non-null   float64       
 3   AnnualUsage           1489 non-null   float64       
 4   Demand_kBtu           1489 non-null   float64       
 5   Electricity_EUI       1489 non-null   float64       
 6   Rollover Check        1489 non-null   int64         
 7   Rollover Count Month  1441 non-null   float64       
 8   Rollover Count Year   1489 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int64(2), object(1)
memory usage: 104.8+ KB
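
Once the data is in a dataframe, ordinary pandas operations apply. The sketch below builds a small synthetic frame with the same column names as the interpolated result (the values are made up, not taken from the dataset), counts missing values per column, and aggregates the 30-minute demand to daily statistics.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: two days of 30-minute rows with made-up values
timestamps = pd.date_range("2018-01-01", periods=96, freq="30min")
df = pd.DataFrame(
    {
        "Timestamp": timestamps,
        "Asset_Id": "Academic Surge Building",
        "Demand_kBtu": np.linspace(700.0, 1100.0, num=96),
        "Rollover Count Month": [np.nan] * 4 + [0.0] * 92,
    }
)

# Missing values per column (compare with the Non-Null Count reported by .info())
missing = df.isna().sum()
print(missing)

# Daily mean and maximum of the demand column
daily = df.set_index("Timestamp")["Demand_kBtu"].resample("D").agg(["mean", "max"])
print(daily)
```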


### Stored Data

Get stored (recorded) data between a start and end date. The resulting dataframe is narrow, with the following four columns and one event per row:

* Timestamp: of the stored event
* Asset_Id: asset identifier
* Field: one of the column names from the data view definition
* Value: actual value of the event

All the events (rows) of a given field are grouped together. The maximum number of rows is 2,000,000 (2 million).

In [17]:
# This next line is for a single month of data
df_acad_stored = hub.dataview_stored_pd(
    namespace_id, dataview_id, "2018-01-01", "2018-02-01"
)
df_acad_stored

  ==> Finished 'dataview_stored_pd' in             4.9750 secs [ 8.11K rows/sec ]


| | Timestamp | Asset_Id | Field | Value |
|---|---|---|---|---|
| 0 | 2018-01-01 07:59:00+00:00 | Academic Surge Building | MonthlyUsage | 617629.0 |
| 1 | 2018-01-02 07:59:00+00:00 | Academic Surge Building | MonthlyUsage | 17883.0 |
| 2 | 2018-01-03 07:59:00+00:00 | Academic Surge Building | MonthlyUsage | 38028.0 |
| 3 | 2018-01-04 07:59:00+00:00 | Academic Surge Building | MonthlyUsage | 58468.0 |
| 4 | 2018-01-05 07:59:00+00:00 | Academic Surge Building | MonthlyUsage | 78843.0 |
| ... | ... | ... | ... | ... |
| 15364 | 2018-01-27 08:00:00+00:00 | Academic Surge Building | Rollover Count Year | 0.0 |
| 15365 | 2018-01-28 08:00:00+00:00 | Academic Surge Building | Rollover Count Year | 0.0 |
| 15366 | 2018-01-29 08:00:00+00:00 | Academic Surge Building | Rollover Count Year | 0.0 |
| 15367 | 2018-01-30 08:00:00+00:00 | Academic Surge Building | Rollover Count Year | 0.0 |


In [18]:
df_acad_stored.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 40369 entries, 0 to 15368
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype              
---  ------     --------------  -----              
 0   Timestamp  40369 non-null  datetime64[ns, UTC]
 1   Asset_Id   40369 non-null  object             
 2   Field      40369 non-null  object             
 3   Value      40367 non-null  float64            
dtypes: datetime64[ns, UTC](1), float64(1), object(2)
memory usage: 1.5+ MB
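
The narrow shape is convenient for storage but often needs to be pivoted for analysis. The sketch below shows one way to do that with `pivot_table` on a tiny stand-in frame that has the same four columns. The rows are illustrative and not a faithful extract of the stored data.

```python
import pandas as pd

# Stand-in for the narrow frame returned for stored data (illustrative rows)
stored = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            [
                "2018-01-01 07:59:00",
                "2018-01-02 07:59:00",
                "2018-01-01 08:00:00",
                "2018-01-02 08:00:00",
            ],
            utc=True,
        ),
        "Asset_Id": "Academic Surge Building",
        "Field": [
            "MonthlyUsage",
            "MonthlyUsage",
            "Rollover Count Year",
            "Rollover Count Year",
        ],
        "Value": [617629.0, 17883.0, 0.0, 0.0],
    }
)

# One column per field, one row per timestamp; "last" resolves duplicate events
wide = stored.pivot_table(
    index="Timestamp", columns="Field", values="Value", aggfunc="last"
)
print(wide)
```

Because stored events of different fields rarely share a timestamp, the wide frame contains `NaN` wherever a field has no event at that instant. That sparsity is the main reason interpolated data is usually easier to work with.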


## Asset metadata

In some datasets like `Campus_Energy`, assets have metadata (static information) attached to them. This metadata comes in the form of a Python dictionary, i.e. a set of keys, each key with an associated value. The example below is representative of building metadata available with `Campus_Energy`. 

In [19]:
hub.asset_metadata(academic_id)

{'BuildingName': 'AcademicSurge',
 'CAAN': 4632,
 'Construction Date': 19920501,
 'Display Name': 'Academic Surge Building',
 'Latitude': 38.53530193195,
 'Longitude': -121.752910499,
 'Primary Usage (Type)': 'OFF - Academic / Administrative Office',
 'Total Maintained Gross Sq. Ft.': 127426,
 'chilledwater.Annual Cost': 712,
 'chilledwater.tonh Rate': 0.1493,
 'electricity.Annual Cost': 143910,
 'electricity.kWh Rate': 0.0687,
 'steam.Annual Cost': 13562,
 'steam.klb Rate': 7.2552,
 'Asset_Id': 'Academic Surge Building'}
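
Metadata values can be combined into derived quantities. The sketch below copies three entries from the dictionary above. It assumes that `Construction Date` is encoded as `YYYYMMDD`, which the value `19920501` suggests but the source does not state, and it makes no claim about the currency of the cost figure.

```python
from datetime import datetime

# Subset of the metadata shown above
metadata = {
    "Construction Date": 19920501,
    "Total Maintained Gross Sq. Ft.": 127426,
    "electricity.Annual Cost": 143910,
}

# Assumption: the construction date is an integer in YYYYMMDD form
built = datetime.strptime(str(metadata["Construction Date"]), "%Y%m%d").date()

# Annual electricity cost per maintained square foot
cost_per_sqft = (
    metadata["electricity.Annual Cost"] / metadata["Total Maintained Gross Sq. Ft."]
)

print(built, round(cost_per_sqft, 4))
```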

## Metadata for all assets

It is sometimes useful to get the metadata of all assets into a single Pandas dataframe in order to select assets according to some criteria, for example Primary Usage.

In [20]:
hub.all_assets_metadata()

| | BuildingName | CAAN | Construction Date | Display Name | Latitude | Longitude | Primary Usage (Type) | Total Maintained Gross Sq. Ft. | chilledwater.Annual Cost | chilledwater.tonh Rate | electricity.Annual Cost | electricity.kWh Rate | steam.Annual Cost | steam.klb Rate | Asset_Id | Primary Usage (SF % of Total) | Building Name | Sq Ft. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARCPavilion | 4444 | 19770301.0 | ARC Pavilion | 38.541812 | -121.759624 | REC - Athletics & Recreation | 171940.0 | 557.0 | 0.1493 | 51155 | 0.0687 | 1500.0 | 7.2552 | ARC Pavilion | | | |
| 1 | AcademicSurge | 4632 | 19920501.0 | Academic Surge Building | 38.535302 | -121.752910 | OFF - Academic / Administrative Office | 127426.0 | 712.0 | 0.1493 | 143910 | 0.0687 | 13562.0 | 7.2552 | Academic Surge Building | | | |
| 2 | ARC | 4799 | 20020415.0 | Activities and Recreation Center | 38.542897 | -121.759644 | REC - Athletics & Recreation | 158120.0 | 55977.0 | 0.1493 | 97822 | 0.0687 | 25234.0 | 7.2552 | Activities and Recreation Center | | | |
| 3 | AMRL | 4853 | 20080731.0 | Advanced Materials Research Laboratory | 38.532286 | -121.758484 | LAB - Lab / Research | 7560.0 | | | 9567 | 0.0687 | | | Advanced Materials Research Laboratory | | | |
| 4 | ATIRC | 4879 | 20080801.0 | Advanced Transportation Infrastructure Researc... | 38.534596 | -121.794319 | LAB - Lab / Research | 18955.0 | | | 29061 | 0.0687 | | | Advanced Transportation Infrastructure Researc... | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 161 | WatershedResearch | 4833 | 20051024.0 | Watershed Science Facility | 38.534788 | -121.752625 | LAB - Lab / Research | 18391.0 | 6847.0 | 0.1493 | 41650 | 0.0687 | 3452.0 | 7.2552 | Watershed Science Facility | | | |
| 162 | Wellman | 4050 | 0.0 | 0 | 38.541337 | -121.751393 | 0 | 0.0 | 32992.0 | 0.1493 | 11147 | 0.0687 | 159444.0 | 7.2552 | Wellman Hall | | | |
| 163 | WHNRC | 4843 | 0.0 | Western Human Nutrition Research Center (WHNRC) | 38.535025 | -121.766218 | LAB - Lab / Research | 49941.0 | 52087.0 | 0.1493 | 80918 | 0.0687 | 23547.0 | 7.2552 | Western Human Nutrition Research Center (WHNRC) | | | |
| 164 | Wickson | 3351 | 19590501.0 | Wickson Hall | 38.542042 | -121.751590 | CLS - Classroom | 112937.0 | 62342.0 | 0.1493 | 167646 | 0.0687 | 67190.0 | 7.2552 | Wickson Hall | | | |
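
Selecting assets by a criterion such as Primary Usage is then a regular pandas filter. The sketch below uses a four-row stand-in (values copied from the table above) rather than calling `hub.all_assets_metadata()`. Casting to `str` first guards against non-text entries such as the `0` that appears for one building.

```python
import pandas as pd

# Stand-in for the all-assets metadata frame; rows copied from the table above
metadata = pd.DataFrame(
    {
        "Asset_Id": [
            "ARC Pavilion",
            "Academic Surge Building",
            "Advanced Materials Research Laboratory",
            "Wickson Hall",
        ],
        "Primary Usage (Type)": [
            "REC - Athletics & Recreation",
            "OFF - Academic / Administrative Office",
            "LAB - Lab / Research",
            "CLS - Classroom",
        ],
        "Total Maintained Gross Sq. Ft.": [171940.0, 127426.0, 7560.0, 112937.0],
    }
)

# Keep only laboratory buildings
is_lab = metadata["Primary Usage (Type)"].astype(str).str.startswith("LAB")
labs = metadata.loc[is_lab, "Asset_Id"].tolist()
print(labs)
```

The resulting list of `Asset_Id` values can be passed one at a time to `hub.asset_dataviews(asset=...)`, as done earlier for the Academic Surge Building.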

