# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Backfill Features to the Feature Store</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/bitcoin/1_backfill_feature_groups.ipynb)

## 🗒️ This notebook is divided into the following sections:
1. Connect to the Hopsworks feature store
2. Fetch historical data
3. Create feature groups and insert them into the feature store

![tutorial-flow](../../images/01_featuregroups.png)

---
## <span style="color:#ff5f27;"> 📡 Connecting to the Hopsworks Feature Store </span>

In [1]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

Copy your Api Key (first register/login): https://c.app.hopsworks.ai/account/api/generated



### Don't forget to create a `.env` configuration file to store the necessary environment variables (API keys):

![](images/api_keys_env_file.png)
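
Once the `.env` file has been loaded into the process environment (for example with `load_dotenv()` from the `python-dotenv` package installed in the next cell), it can help to fail early when a key is missing. The following is a minimal sketch of such a check. The variable names `BINANCE_API_KEY` and `BINANCE_API_SECRET` and the helper `missing_env_vars` are placeholders for illustration, not names taken from this tutorial.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Hypothetical variable names; replace them with the keys your .env defines.
REQUIRED_VARS = ["BINANCE_API_KEY", "BINANCE_API_SECRET"]

# A plain dict stands in for os.environ here so the example runs anywhere.
example_env = {"BINANCE_API_KEY": "abc123", "BINANCE_API_SECRET": ""}
missing = missing_env_vars(REQUIRED_VARS, example_env)
print(missing)
```

In a real run, calling `missing_env_vars(REQUIRED_VARS)` without the second argument checks the actual environment.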

---

### <span style="color:#ff5f27;"> 📝 Imports</span>

In [9]:
!pip install -U unicorn-binance-rest-api --quiet
!pip install -U python-dotenv --quiet

In [1]:
import pandas as pd

from functions import parse_btc_data, process_btc_data



---

## <span style="color:#ff5f27;"> 💽 Loading Data</span>

### <span style='color:#ff5f27'> 📈 Bitcoin Data</span>

In [2]:
df_bitcoin = parse_btc_data(number_of_days_ago=2000)

df_bitcoin = df_bitcoin[(df_bitcoin.date >= '2021-02-05') & (df_bitcoin.date <= '2022-06-04')] 
df_bitcoin.reset_index(drop=True,inplace=True)

df_bitcoin.head(3)

|  | date | open | high | low | close | volume | quote_av | trades | tb_base_av | tb_quote_av | unix |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-02-05 00:00:00 | 36936.65 | 38310.12 | 36570.0 | 38290.24 | 66681.334275 | 2509278000.0 | 1853253 | 32756.385031 | 1232714000.0 | 1612479600000 |
| 1 | 2021-02-06 00:00:00 | 38289.32 | 40955.51 | 38215.94 | 39186.94 | 98757.311183 | 3922095000.0 | 2291646 | 52015.513362 | 2065181000.0 | 1612566000000 |
| 2 | 2021-02-07 00:00:00 | 39181.01 | 39700.0 | 37351.0 | 38795.69 | 84363.679763 | 3256521000.0 | 1976357 | 40764.388959 | 1574483000.0 | 1612652400000 |


In [None]:
df_bitcoin_processed = process_btc_data(df_bitcoin)
df_bitcoin_processed.tail(3)

> Older records may carry a time of 11pm or 9pm, while newer ones carry 10pm. That's because of time zones and daylight saving time. Let's apply the function below to shift every timestamp in the `unix` column to 10pm so that the column is usable.

In [None]:
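def get_hours(unix):
    # `unix` is in milliseconds; 3600000 ms = 1 hour. Returns the hour of the day (UTC).
    return unix / 3600000 % 24

In [None]:
def fix_unix(unix):
    # Shift records stamped 23:00 or 21:00 onto 22:00; leave all others unchanged.
    if get_hours(unix) == 23.0:
        return unix - 3600000
    elif get_hours(unix) == 21.0:
        return unix + 3600000
    return unix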

In [None]:
get_hours(1654293600000)

In [None]:
df_bitcoin_processed.unix = df_bitcoin_processed.unix.apply(fix_unix)
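
As a worked check of the correction above, the sketch below restates the two helpers so that it runs on its own and applies them to two timestamps that appear in this notebook. `utc_hour` is an extra helper added here purely as an independent cross-check using the standard library.

```python
from datetime import datetime, timezone

MS_PER_HOUR = 3_600_000


def get_hours(unix):
    # Hour of the day (UTC) encoded in a millisecond Unix timestamp.
    return unix / MS_PER_HOUR % 24


def fix_unix(unix):
    # Shift 23:00 and 21:00 records onto 22:00.
    hours = get_hours(unix)
    if hours == 23.0:
        return unix - MS_PER_HOUR
    if hours == 21.0:
        return unix + MS_PER_HOUR
    return unix


def utc_hour(unix):
    # Independent cross-check: convert milliseconds to a UTC datetime.
    return datetime.fromtimestamp(unix / 1000, tz=timezone.utc).hour


older = 1612479600000  # `unix` value of the first sample row (2021-02-05)
newer = 1654293600000  # value passed to get_hours in the cell above

print(get_hours(older), get_hours(newer))
print(utc_hour(fix_unix(older)), utc_hour(fix_unix(newer)))
```

The older timestamp falls on hour 23 and the newer one on hour 22; after `fix_unix`, both fall on hour 22.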

### <span style='color:#ff5f27'> 💭 Tweets Data</span>

In [None]:
tweets_textblob = pd.read_csv("https://repo.hops.works/dev/davit/bitcoin/tweets_textblob.csv", index_col=0)
tweets_textblob.unix = tweets_textblob.unix.apply(fix_unix)
tweets_textblob.head(3)

In [None]:
tweets_vader = pd.read_csv("https://repo.hops.works/dev/davit/bitcoin/tweets_vader.csv", index_col=0)
tweets_vader.unix = tweets_vader.unix.apply(fix_unix)
tweets_vader.tail(3)
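
Both tweet tables go through the same `fix_unix` correction as the price data, presumably so that their timestamps line up with the price rows. One quick way to check that is sketched below, with plain lists standing in for the DataFrame columns. The helper `unmatched_timestamps` and the sample values are illustrative and are not part of the tutorial's `functions` module.

```python
def unmatched_timestamps(reference, other):
    """Return the timestamps in `other` that do not occur in `reference`, sorted."""
    return sorted(set(other) - set(reference))


# Illustrative values only: daily timestamps at 22:00 UTC, in milliseconds.
price_unix = [1612476000000, 1612562400000, 1612648800000]
# The last value is stamped 23:00 and therefore has no matching price row.
tweets_unix = [1612476000000, 1612562400000, 1612652400000]

print(unmatched_timestamps(price_unix, tweets_unix))
```

An empty result would mean that every tweet timestamp has a counterpart in the price data.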

In [None]:
tweets_textblob.date = tweets_textblob.date.apply(lambda x: x[:10])
tweets_vader.date = tweets_vader.date.apply(lambda x: x[:10])
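
The slice `x[:10]` keeps the first ten characters of each string, which is the calendar date when the value looks like `2021-02-05 00:00:00`. That the tweet files use this format is an assumption based on the Bitcoin sample shown earlier. The sketch below does the same truncation on a single string and additionally checks that the result is a valid date; `to_date_string` is a hypothetical helper.

```python
from datetime import date


def to_date_string(timestamp):
    """Keep the leading YYYY-MM-DD part and check that it is a valid date."""
    day = timestamp[:10]
    date.fromisoformat(day)  # raises ValueError if the prefix is not a date
    return day


print(to_date_string("2021-02-05 00:00:00"))
```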

---

## <span style="color:#ff5f27;"> 🪄 Creating Feature Groups </span>

### <span style='color:#ff5f27'> 📈 Bitcoin Price Feature Group</span>

In [None]:
btc_price_fg = fs.get_or_create_feature_group(
    name='bitcoin_price',
    description='Bitcoin price aggregated by day',
    version=1,
    primary_key=['unix'],
    online_enabled=True,
    event_time='unix'
)

btc_price_fg.insert(df_bitcoin_processed, write_options={"wait_for_job": False})
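
Because `unix` serves as the primary key, each row is expected to carry a distinct timestamp, and checking this before calling `insert` is cheap. The sketch below uses a plain list in place of `df_bitcoin_processed.unix`; `duplicated_keys` is a hypothetical helper and the values are made up for illustration.

```python
from collections import Counter


def duplicated_keys(keys):
    """Return the key values that occur more than once, in sorted order."""
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative timestamps: the second and third entries share the same value.
unix_values = [1612476000000, 1612562400000, 1612562400000]
print(duplicated_keys(unix_values))
```

On the DataFrame itself, the equivalent pandas check would be `df_bitcoin_processed.unix.duplicated().any()`.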

### <span style='color:#ff5f27'> 💭 Tweets Feature Groups</span>

In [None]:
tweets_textblob_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_textblob',
    version=1,
    primary_key=['unix'],
    online_enabled=True,
    event_time='unix'
)

tweets_textblob_fg.insert(tweets_textblob, write_options={"wait_for_job": False})

In [None]:
tweets_vader_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_vader',
    version=1,
    primary_key=['unix'],
    online_enabled=True,
    event_time='unix'
)

tweets_vader_fg.insert(tweets_vader, write_options={"wait_for_job": False})

---