<a href="https://colab.research.google.com/github/bytehub-ai/code-examples/blob/main/bytehub_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q bytehub

In [2]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# **ByteHub Feature Store Demo**

**Import** ByteHub

In [4]:
import bytehub as bh
bh.__version__

'0.0.7'

**Load** and **log in** to the **Feature Store**

In [5]:
fs = bh.FeatureStore(
    endpoint='https://api.dev.bytehub.ai/v1',
    client_id='rlnnlteca0oq01bk081kivg4s'
)

Please go to https://bytehub-ai-dev.auth.eu-west-2.amazoncognito.com/login?response_type=code&client_id=rlnnlteca0oq01bk081kivg4s&redirect_uri=https%3A%2F%2Fwww.bytehub.ai%2Fauthenticated&state=7kaddqPDqI2doZBClr63auaTLjPIRr and login. Copy the response code and paste below.
Response: ··········


## **Data Access**

ByteHub Feature Stores are **preloaded** with **prepared production-ready data**, e.g. **hyperlocal weather data**.

In [6]:
fs.list_features()[:5]

['bytehub/test-new-feature-123456',
 'bytehub/piglet',
 'bytehub/bmrs.feature.total-wind',
 'bytehub/noaa.data.gfs-uk',
 '/whatever-whatever']

Features are stored with **metadata**

In [7]:
fs.list_features(meta=True).head(5)

Unnamed: 0,name,meta
0,bytehub/test-new-feature-123456,{'food': 'haycorns'}
1,bytehub/piglet,{'food': 'haycorns'}
2,bytehub/bmrs.feature.total-wind,{}
3,bytehub/noaa.data.gfs-uk,{'source': 'https://www.ncdc.noaa.gov/data-acc...
4,/whatever-whatever,{}


## **Data Discovery**

Features are **regex searchable** 

In [8]:
# Search for the BMRS features
fs.list_features(regex=r'bmrs\..')[:5]

['bytehub/bmrs.feature.total-wind',
 'bytehub/bmrs.data.bod',
 'bytehub/bmrs.feature.lowest-offer',
 'bytehub/bmrs.feature.highest-bid',
 'bytehub/bmrs.data.indoitsdo']
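
The pattern `bmrs\..` matches the literal text `bmrs.` followed by one more character. As a minimal sketch of how such a pattern behaves, the snippet below applies it to a few names from the listings above using Python's standard `re` module. It assumes the feature store uses unanchored (search-style) matching, which is consistent with the output shown but is not confirmed here.

```python
import re

# Feature names copied from the listings above.
names = [
    'bytehub/test-new-feature-123456',
    'bytehub/piglet',
    'bytehub/bmrs.feature.total-wind',
    'bytehub/noaa.data.gfs-uk',
    'bytehub/bmrs.data.bod',
]

# r'bmrs\..' means: literal "bmrs", a literal dot, then any one character.
pattern = re.compile(r'bmrs\..')

# Assumption: the pattern may match anywhere in the name (unanchored search).
matches = [name for name in names if pattern.search(name)]
print(matches)
```

Without the backslash, the first `.` would match any character, so escaping it keeps the search limited to names in the `bmrs.` namespace.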

## **Optimised for timeseries**

Let's **query** and **aggregate** some features


In [9]:
df = fs.get_timeseries(['bmrs.feature.system-price', 'bmrs.feature.niv'], from_date='2020-09-01', to_date=pd.Timestamp.utcnow(), freq='30min')
df.head()

Unnamed: 0,time,entity,bytehub/bmrs.feature.system-price,bytehub/bmrs.feature.niv
0,2020-09-01 00:00:00+00:00,,46.4,249.9333
1,2020-09-01 00:30:00+00:00,,46.3,335.4755
2,2020-09-01 01:00:00+00:00,,46.3,437.5139
3,2020-09-01 01:30:00+00:00,,46.0,239.6694
4,2020-09-01 02:00:00+00:00,,46.0,316.8444


Let's **plot** the features in a **chart**

In [11]:
traces = [
          go.Scatter(x=df.time, y=df['bytehub/bmrs.feature.system-price'], name='System Price'),
          go.Scatter(x=df.time, y=df['bytehub/bmrs.feature.niv'], name='NIV', yaxis='y2'),
]
layout = {
    'title': 'System Prices and NIV',
    'template': 'seaborn',
    'yaxis': {'title': 'System price'},
    'yaxis2': {'title': 'Net Imbalance Volume', 'overlaying': 'y', 'side': 'right'},
}
fig = go.Figure(data=traces, layout=layout)
fig

We can **upsample**

In [12]:
fs.get_timeseries(['bmrs.feature.system-price', 'bmrs.feature.niv'], from_date='2020-09-01', to_date=pd.Timestamp.utcnow(), freq='10min').head()

Unnamed: 0,time,entity,bytehub/bmrs.feature.system-price,bytehub/bmrs.feature.niv
0,2020-09-01 00:00:00+00:00,,46.4,249.9333
1,2020-09-01 00:10:00+00:00,,46.4,249.9333
2,2020-09-01 00:20:00+00:00,,46.4,249.9333
3,2020-09-01 00:30:00+00:00,,46.3,335.4755
4,2020-09-01 00:40:00+00:00,,46.3,335.4755


Or **downsample**

In [13]:
fs.get_timeseries(['bmrs.feature.system-price', 'bmrs.feature.niv'], from_date='2020-09-01', to_date=pd.Timestamp.utcnow(), freq='1d').head()

Unnamed: 0,time,entity,bytehub/bmrs.feature.system-price,bytehub/bmrs.feature.niv
0,2020-09-01 00:00:00+00:00,,46.4,249.9333
1,2020-09-02 00:00:00+00:00,,15.0,-4.6359
2,2020-09-03 00:00:00+00:00,,51.5,231.7499
3,2020-09-04 00:00:00+00:00,,2.0,-401.1833
4,2020-09-05 00:00:00+00:00,,2.0,-128.9746
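
The outputs above suggest that each requested timestamp receives the most recent stored value: the 10-minute rows repeat the half-hourly values, and the daily row for 2020-09-01 equals the 00:00 value rather than a daily mean. The sketch below reproduces that behaviour locally with plain pandas. It is an illustration of the idea under that assumption, not ByteHub's implementation.

```python
import pandas as pd

# Half-hourly sample values taken from the first query above.
times = pd.date_range('2020-09-01 00:00', periods=3, freq='30min', tz='UTC')
prices = pd.Series([46.4, 46.3, 46.3], index=times, name='system-price')

# Upsample to 10 minutes: each new timestamp takes the most recent known value.
fine_index = pd.date_range(times[0], times[-1], freq='10min')
upsampled = prices.reindex(fine_index, method='ffill')

# Downsample to 60 minutes the same way: sample the latest value at each timestamp.
coarse_index = pd.date_range(times[0], times[-1], freq='60min')
downsampled = prices.reindex(coarse_index, method='ffill')

print(upsampled)
print(downsampled)
```

If an average over each period is wanted instead, the data can be fetched at its native frequency and aggregated locally, for example with `Series.resample(...).mean()`.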


**Raw features** can be stored at **different frequencies**:



In [14]:
fs.get_freq('bmrs.feature.rolling-demand')

'5T'

In [15]:
fs.get_freq('bmrs.feature.system-price')

'30T'

**`get_timeseries`** will **resample** and **merge** timeseries at a **consistent frequency**

In [16]:
fs.get_timeseries(['bmrs.feature.rolling-demand', 'bmrs.feature.system-price'], from_date='2020-09-01', to_date=pd.Timestamp.utcnow(), freq='30min').head()

Unnamed: 0,time,entity,bytehub/bmrs.feature.rolling-demand,bytehub/bmrs.feature.system-price
0,2020-09-01 00:00:00+00:00,,21456,46.4
1,2020-09-01 00:30:00+00:00,,21488,46.3
2,2020-09-01 01:00:00+00:00,,21252,46.3
3,2020-09-01 01:30:00+00:00,,21293,46.0
4,2020-09-01 02:00:00+00:00,,20967,46.0
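
As a rough local analogue of this resample-and-merge step, the sketch below aligns a 5-minute series and a 30-minute series onto one 30-minute grid and joins them as columns. The demand values are synthetic placeholders, and the alignment rule (latest value at or before each timestamp) is an assumption rather than documented ByteHub behaviour.

```python
import pandas as pd

# Synthetic 5-minute demand values (placeholders, not real BMRS data).
demand = pd.Series(
    range(100, 113),
    index=pd.date_range('2020-09-01', periods=13, freq='5min', tz='UTC'),
    name='demand',
)
# Half-hourly prices taken from the query output above.
price = pd.Series(
    [46.4, 46.3, 46.3],
    index=pd.date_range('2020-09-01', periods=3, freq='30min', tz='UTC'),
    name='price',
)

# Common 30-minute grid covering both series.
grid = pd.date_range('2020-09-01', periods=3, freq='30min', tz='UTC')

# Align each series to the grid (latest value at or before each timestamp),
# then place them side by side as columns of one DataFrame.
merged = pd.concat(
    [s.reindex(grid, method='ffill') for s in (demand, price)],
    axis=1,
)
print(merged)
```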


We can also use **`get_timeseries`** to **time travel**: the `time_travel` argument returns feature values as they were available at a specified point in time. For example, return the weather forecasts that were available at least 6 hours in advance.

In [17]:
fs.get_timeseries('noaa.data.gfs-uk', from_date='2020-11-01', to_date='2020-11-10', freq='60min', time_travel='-6h').head()

Unnamed: 0,time,entity,bytehub/noaa.data.gfs-uk
0,2020-11-01 00:00:00+00:00,,"[{'GUST_surface': 11.1313114166, 'PRES_surface..."
1,2020-11-01 01:00:00+00:00,,"[{'GUST_surface': 11.7034912109, 'PRES_surface..."
2,2020-11-01 02:00:00+00:00,,"[{'GUST_surface': 13.0000200272, 'PRES_surface..."
3,2020-11-01 03:00:00+00:00,,"[{'GUST_surface': 12.8234596252, 'PRES_surface..."
4,2020-11-01 04:00:00+00:00,,"[{'GUST_surface': 13.3250379562, 'PRES_surface..."
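
One way to picture time travel: every stored value carries both the time it applies to and the time it became available, and a query keeps only values that existed before a cutoff. The sketch below shows that selection rule in plain Python. The record layout, the `created` field and the `value_as_of` helper are hypothetical, introduced only for illustration, and do not describe ByteHub's internals.

```python
from datetime import datetime, timedelta

# Hypothetical forecast records: 'time' is when the value applies,
# 'created' is when the forecast became available.
records = [
    {'time': datetime(2020, 11, 1, 12), 'created': datetime(2020, 11, 1, 0), 'value': 10.0},
    {'time': datetime(2020, 11, 1, 12), 'created': datetime(2020, 11, 1, 5), 'value': 11.0},
    {'time': datetime(2020, 11, 1, 12), 'created': datetime(2020, 11, 1, 9), 'value': 12.0},
]

def value_as_of(records, time, time_travel):
    """Return the newest value for `time` created no later than `time + time_travel`."""
    cutoff = time + time_travel
    eligible = [r for r in records if r['time'] == time and r['created'] <= cutoff]
    if not eligible:
        return None  # nothing was available that early
    return max(eligible, key=lambda r: r['created'])['value']

target = datetime(2020, 11, 1, 12)
six_hours_ahead = value_as_of(records, target, timedelta(hours=-6))  # cutoff 06:00
latest = value_as_of(records, target, timedelta(0))                  # cutoff 12:00
print(six_hours_ahead, latest)
```

Selecting values this way avoids leaking information from the future when building training sets for forecasting models.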


## **Complex features**


In [18]:
df = fs.get_timeseries('bmrs.data.dersysdata', from_date='2020-10-01', to_date='2020-10-03', freq='30min')
df.tail()

Unnamed: 0,time,entity,bytehub/bmrs.data.dersysdata
92,2020-10-02 22:00:00+00:00,,"[{'activeFlag': 'Y', 'bSADDefault': 'F', 'buyP..."
93,2020-10-02 22:30:00+00:00,,"[{'activeFlag': 'Y', 'bSADDefault': 'F', 'buyP..."
94,2020-10-02 23:00:00+00:00,,"[{'activeFlag': 'Y', 'bSADDefault': 'F', 'buyP..."
95,2020-10-02 23:30:00+00:00,,"[{'activeFlag': 'Y', 'bSADDefault': 'F', 'buyP..."
96,2020-10-03 00:00:00+00:00,,"[{'activeFlag': 'Y', 'bSADDefault': 'F', 'buyP..."


In [19]:
df.iloc[-1, -1]

array([{'activeFlag': 'Y', 'bSADDefault': 'F', 'buyPriceAdjustment': 0, 'indicativeNetImbalanceVolume': 16.363, 'priceDerivationCode': 'P', 'recordType': 'SSB', 'replacementPrice': None, 'replacementPriceCalculationVolume': None, 'reserveScarcityPrice': 'NULL', 'sellPriceAdjustment': 0, 'settlementDate': 1601683200000, 'settlementPeriod': 3, 'systemBuyPrice': 50.0, 'systemSellPrice': 50.0, 'time': 1601683200000, 'totalSystemAcceptedBidVolume': -368.035, 'totalSystemAcceptedOfferVolume': 908.898, 'totalSystemAdjustmentBuyVolume': 437.0, 'totalSystemAdjustmentSellVolume': -961.5, 'totalSystemTaggedAcceptedBidVolume': -368.035, 'totalSystemTaggedAcceptedOfferVolume': 907.898, 'totalSystemTaggedAdjustmentBuyVolume': 437.0, 'totalSystemTaggedAdjustmentSellVolume': -961.5}],
      dtype=object)
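
Each cell of a complex feature holds a list of records, so it can be unpacked into ordinary columns for analysis. The sketch below does this with pandas for a subset of the fields in the record above. Treating the `time` field as epoch milliseconds is an assumption based on its magnitude.

```python
import pandas as pd

# A subset of the fields from the record shown above.
cell = [{
    'settlementPeriod': 3,
    'systemBuyPrice': 50.0,
    'systemSellPrice': 50.0,
    'indicativeNetImbalanceVolume': 16.363,
    'time': 1601683200000,
}]

# Each dict becomes a row; each key becomes a column.
flat = pd.DataFrame(list(cell))

# Assumption: 'time' is epoch milliseconds, so convert it to a UTC timestamp.
flat['time'] = pd.to_datetime(flat['time'], unit='ms', utc=True)
print(flat)
```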

In [20]:
fs.get_last('bmrs.data.dersysdata')

Unnamed: 0,time,entity,bytehub/bmrs.data.dersysdata
0,2021-01-15 23:30:00+00:00,,"[{'time': 1610753400000, 'activeFlag': 'Y', 'r..."


In [22]:
fs.get_last('bmrs.data.rolsysdem').iloc[0, -1]

[{'activeFlag': 'Y',
  'fuelTypeGeneration': 27041,
  'publishingPeriodCommencingTime': '2021-01-16 00:00:00',
  'recordType': 'VD',
  'settDate': '2021-01-16',
  'time': 1610755200000}]

In [23]:
fs.get_last('bmrs.feature.demand-30min-history')

Unnamed: 0,time,entity,bytehub/bmrs.feature.demand-30min-history
0,2021-01-16 00:00:00+00:00,,"[27892, 27697, 27613, 27518, 27389, 27041]"


## **Develop Reusable Features**

**Create** a **new** feature

In [24]:
fs.create_feature('test.my-new-feature', source='demo', animal='dog')

In [25]:
dts = pd.date_range('2020-10-01', '2020-10-04', freq='1h')
df = pd.DataFrame({'time': dts, 'value': np.random.randint(0, 100, len(dts))})
fs.save_timeseries('test.my-new-feature', df)

In [26]:
fs.get_timeseries('test.my-new-feature', from_date='2020-09-25', to_date='2020-10-04', freq='1d').head(5)

Unnamed: 0,time,entity,bytehub/test.my-new-feature
0,2020-09-25 00:00:00+00:00,,
1,2020-09-26 00:00:00+00:00,,
2,2020-09-27 00:00:00+00:00,,
3,2020-09-28 00:00:00+00:00,,
4,2020-09-29 00:00:00+00:00,,


In [27]:
fs.delete_feature('test.my-new-feature')

Build **new** features out of **old** ones. Start by defining a **transform**.

In [28]:
fuel_mix = fs.get_last('bmrs.data.fuelhh')
fuel_mix.iloc[0,-1]

{'activeFlag': 'Y',
 'biomass': 1085,
 'ccgt': 7436,
 'coal': 1041,
 'intelec': 0,
 'intew': 336,
 'intfr': 1804,
 'intifa2': 0,
 'intirl': 252,
 'intned': 0,
 'intnem': 902,
 'intnsl': 0,
 'npshyd': 429,
 'nuclear': 6073,
 'ocgt': 3,
 'oil': 0,
 'other': 158,
 'ps': 0,
 'recordType': 'FUELHH',
 'settlementPeriod': 48,
 'startTimeOfHalfHrPeriod': 1610668800000,
 'wind': 8006}

In [29]:
def total_renewable(x):
    # Sum biomass, non-pumped-storage hydro and wind; missing keys count as 0
    return x.get('biomass', 0) + x.get('npshyd', 0) + x.get('wind', 0)

fs.create_transform('total-renewables', func=total_renewable)
fs.list_transforms()

['bytehub/identity',
 'bytehub/double',
 'bytehub/extract-key',
 'bytehub/sum-interconnectors',
 'bytehub/testing123',
 'bytehub/total-renewables']
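
Because a transform is an ordinary Python function, it can be checked locally before it is attached to a feature. The sketch below applies the same function to the relevant fields of the `bmrs.data.fuelhh` record shown earlier, and to a record with missing keys.

```python
def total_renewable(x):
    # Sum the renewable sources, treating missing keys as zero.
    return x.get('biomass', 0) + x.get('npshyd', 0) + x.get('wind', 0)

# Relevant fields from the bmrs.data.fuelhh record shown above.
record = {'biomass': 1085, 'npshyd': 429, 'wind': 8006, 'ccgt': 7436}

full = total_renewable(record)             # 1085 + 429 + 8006
partial = total_renewable({'wind': 5})     # missing keys default to 0
print(full, partial)
```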

In [30]:
fs.create_virtual_feature('bmrs.feature.total-renewables', 'total-renewables', derived_from='bmrs.data.fuelhh')

Now get a timeseries of this new, **virtual feature**


In [31]:
fs.get_timeseries('bmrs.feature.total-renewables', from_date='2020-10-01', to_date='2020-10-02', freq='30min').head()

Unnamed: 0,time,entity,bytehub/bmrs.feature.total-renewables
0,2020-10-01 00:00:00+00:00,,8747
1,2020-10-01 00:30:00+00:00,,8545
2,2020-10-01 01:00:00+00:00,,8282
3,2020-10-01 01:30:00+00:00,,8004
4,2020-10-01 02:00:00+00:00,,7574


In [32]:
fs.delete_feature('bmrs.feature.total-renewables')
fs.delete_transform('total-renewables')