# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Backfill Features to the Feature Store</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/{project_name}/{notebook_name}.ipynb)


## 🗒️ This notebook is divided into the following sections:
1. Fetch historical data
2. Connect to the Hopsworks feature store
3. Create feature groups and insert the data into the feature store

![tutorial-flow](01_featuregroups.png)

## <span style='color:#ff5f27'> 📝 Imports</span>

In [1]:
import pandas as pd

# Local helper module shipped with the tutorial (provides timestamp_2_time, used below)
from functions import *

---

## <span style='color:#ff5f27'> 💽 Loading Historical Data</span>


#### <span style='color:#ff5f27'> 👩🏻‍🔬 Air Quality Data</span>

In [2]:
df_air_quality = pd.read_csv('https://repo.hops.works/dev/davit/air_quality/air_quality.csv')
df_air_quality.head()

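|   | city | aqi | date | iaqi_h | iaqi_p | iaqi_pm10 | iaqi_t | o3_avg | o3_max | o3_min | pm10_avg | pm10_max | pm10_min | pm25_avg | pm25_max | pm25_min | uvi_avg | uvi_max | uvi_min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sundsvall | 11 | 2022-09-06 | 34.3 | 1020.5 | 10 | 15.8 | 19 | 25 | 13 | 3 | 4 | 2 | 7 | 8 | 6 | 0 | 0 | 0 |
| 1 | Kyiv | 6 | 2022-09-06 | 98.78 | 1022.7 | 2 | 13.91 | 19 | 27 | 11 | 7 | 11 | 4 | 23 | 39 | 14 | 0 | 0 | 0 |
| 2 | Stockholm | 13 | 2022-09-06 | 59.0 | 1021.0 | 13 | 17.0 | 17 | 25 | 6 | 6 | 12 | 3 | 14 | 30 | 9 | 0 | 0 | 0 |
| 3 | Malmo | 23 | 2022-09-06 | 54.5 | 1019.0 | 4 | 17.3 | 27 | 32 | 24 | 5 | 5 | 3 | 10 | 11 | 8 | 1 | 1 | 1 |
| 4 | Stockholm | 1 | 2022-09-07 | 46.0 | 1022.0 | 1 | 16.0 | 18 | 26 | 8 | 7 | 10 | 4 | 16 | 21 | 8 | 1 | 1 | 1 |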


In [3]:
# Convert the 'YYYY-MM-DD' date strings to epoch milliseconds
df_air_quality['date'] = df_air_quality['date'].apply(timestamp_2_time)
df_air_quality.sort_values(by=['city', 'date'], inplace=True, ignore_index=True)
# df_air_quality = df_air_quality.drop(columns=['uvi_max', 'uvi_avg', 'uvi_min'])
df_air_quality.head()

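|   | city | aqi | date | iaqi_h | iaqi_p | iaqi_pm10 | iaqi_t | o3_avg | o3_max | o3_min | pm10_avg | pm10_max | pm10_min | pm25_avg | pm25_max | pm25_min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyiv | 6 | 1662415200000 | 98.78 | 1022.7 | 2 | 13.91 | 19 | 27 | 11 | 7 | 11 | 4 | 23 | 39 | 14 |
| 1 | Kyiv | 2 | 1662501600000 | 65.58 | 1020.0 | 1 | 21.02 | 22 | 32 | 10 | 9 | 12 | 5 | 29 | 43 | 15 |
| 2 | Kyiv | 4 | 1662588000000 | 99.9 | 1017.6 | 2 | 12.45 | 22 | 32 | 10 | 9 | 12 | 5 | 29 | 43 | 15 |
| 3 | Kyiv | 2 | 1662674400000 | 99.9 | 1021.3 | 1 | 10.0 | 20 | 33 | 6 | 10 | 17 | 5 | 33 | 56 | 16 |
| 4 | Malmo | 23 | 1662415200000 | 54.5 | 1019.0 | 4 | 17.3 | 27 | 32 | 24 | 5 | 5 | 3 | 10 | 11 | 8 |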
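
`timestamp_2_time` comes from the tutorial's local `functions` module, whose source is not shown here. The sketch below is a hypothetical stand-in, assuming the helper parses a `'YYYY-MM-DD'` string and returns integer epoch milliseconds, which is what the output above suggests. It takes the time zone as an explicit parameter: the recorded value 1662415200000 for 2022-09-06 is midnight at UTC+2, which hints that the original relies on the machine's local time zone, whereas this sketch defaults to UTC so the result does not depend on where the notebook runs.

```python
from datetime import datetime, timedelta, timezone


def timestamp_2_time(date_str: str, tz: timezone = timezone.utc) -> int:
    """Convert a 'YYYY-MM-DD' date string to epoch milliseconds.

    Hypothetical re-implementation: the tutorial's real helper lives in
    its local `functions` module and may differ (e.g. in time-zone handling).
    """
    # Parse the calendar date and pin it to midnight in the given time zone
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz)
    # Seconds since the Unix epoch -> integer milliseconds
    return int(dt.timestamp() * 1000)


print(timestamp_2_time("2022-09-06"))                                # 1662422400000 (UTC midnight)
print(timestamp_2_time("2022-09-06", timezone(timedelta(hours=2))))  # 1662415200000 (UTC+2 midnight)
```

Consecutive days for the same city differ by 86,400,000 ms (24 × 60 × 60 × 1000), which is exactly the step between the Kyiv rows in the table above.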


#### <span style='color:#ff5f27'> 🌦 Weather Data</span>

In [4]:
df_weather = pd.read_csv('https://repo.hops.works/dev/davit/air_quality/weather.csv')
df_weather = df_weather.drop(columns=['uvindex'])
df_weather.head(3)

|   | city | date | tempmax | tempmin | temp | feelslikemax | feelslikemin | feelslike | dew | humidity | ... | snowdepth | windgust | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | conditions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyiv | 2022-09-06 | 17.7 | 4.6 | 11.5 | 17.7 | 4.6 | 11.5 | 1.8 | 55.3 | ... | 0.0 | 24.5 | 9.7 | 267.0 | 1022.3 | 34.8 | 24.1 | 227.5 | 19.6 | Partially cloudy |
| 1 | Sundsvall | 2022-09-06 | 13.0 | 3.0 | 8.6 | 13.0 | 0.1 | 7.4 | 5.5 | 81.8 | ... | 0.0 | 31.0 | 14.3 | 192.9 | 1024.1 | 90.8 | 15.3 | 116.1 | 10.1 | Overcast |
| 2 | Stockholm | 2022-09-06 | 15.9 | 7.8 | 12.0 | 15.9 | 7.1 | 11.8 | 7.0 | 73.6 | ... | 0.0 | 25.2 | 13.0 | 70.7 | 1022.0 | 59.5 | 15.3 | 132.5 | 11.6 | Partially cloudy |


In [5]:
df_weather['date'] = df_weather['date'].apply(timestamp_2_time)
df_weather.sort_values(by=['city', 'date'], inplace=True, ignore_index=True)

df_weather.head(3)

|   | city | date | tempmax | tempmin | temp | feelslikemax | feelslikemin | feelslike | dew | humidity | ... | snowdepth | windgust | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | conditions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyiv | 1662415200000 | 17.7 | 4.6 | 11.5 | 17.7 | 4.6 | 11.5 | 1.8 | 55.3 | ... | 0.0 | 24.5 | 9.7 | 267.0 | 1022.3 | 34.8 | 24.1 | 227.5 | 19.6 | Partially cloudy |
| 1 | Kyiv | 1662501600000 | 17.7 | 4.6 | 11.5 | 17.7 | 4.6 | 11.5 | 1.8 | 55.3 | ... | 0.0 | 24.5 | 9.7 | 267.0 | 1022.3 | 34.8 | 24.1 | 227.5 | 19.6 | Partially cloudy |
| 2 | Kyiv | 1662588000000 | 21.1 | 7.9 | 14.3 | 21.1 | 7.9 | 14.3 | 3.3 | 50.9 | ... | 0.0 | 24.8 | 9.7 | 132.9 | 1019.4 | 48.4 | 24.1 | 217.9 | 18.7 | Partially cloudy |


---

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [12]:
from platform import python_version
python_version()

'3.9.13'

In [13]:
# -y skips pip's interactive "Proceed (Y/n)?" prompt, which otherwise blocks the cell
%pip uninstall -y hsfs hopsworks

Note: you may need to restart the kernel to use updated packages.


In [15]:
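# Windows + conda only. `!SET ...` runs in a throwaway subshell and does not
# change the kernel's environment; %env sets the variable in the kernel process.
%env CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1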

In [7]:
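# The [python] extra provides HSFS's Python engine; without it, login fails
# with the FeatureStoreException shown below.
%pip install hopsworks "hsfs[python]"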

Note: you may need to restart the kernel to use updated packages.


In [16]:
import hopsworks

# Log in to Hopsworks and get a handle to the project's feature store
project = hopsworks.login()

fs = project.get_feature_store()

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/5287


FeatureStoreException: Trying to instantiate Python as engine, but 'python' extras are missing in HSFS installation. Install with `pip install hsfs[python]`.

---

## <span style="color:#ff5f27;">🪄 Creating Feature Groups</span>

#### <span style='color:#ff5f27'> 👩🏻‍🔬 Air Quality Data</span>

In [None]:
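air_quality_fg = fs.get_or_create_feature_group(
    name='air_quality_fg',
    description='Air Quality characteristics of each day',
    version=3,
    primary_key=['city', 'date'],  # one row per city per day
    online_enabled=True,           # also materialize to the online store for low-latency reads
    event_time='date',             # epoch-ms column used for point-in-time correct joins
)

# Backfill: write the historical DataFrame into the feature group
air_quality_fg.insert(df_air_quality)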

#### <span style='color:#ff5f27'> 🌦 Weather Data</span>

In [None]:
weather_fg = fs.get_or_create_feature_group(
    name='weather_fg',
    description='Weather characteristics of each day',
    version=3,
    primary_key=['city', 'date'],  # same key as air_quality_fg
    online_enabled=True,
    event_time='date',
)

# Backfill: write the historical DataFrame into the feature group
weather_fg.insert(df_weather)
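
Both feature groups use `primary_key = ['city','date']` with `event_time = 'date'`, so it can pay to check each DataFrame before calling `insert`. The helper below, `check_feature_frame`, is hypothetical (it is not part of `hsfs` or the tutorial's `functions` module): a minimal pandas sketch that verifies the key and event-time columns exist, that key values are non-null and unique, and, assuming dates were converted to epoch milliseconds as above, that the event-time column holds integers.

```python
import pandas as pd


def check_feature_frame(df: pd.DataFrame, primary_key: list, event_time: str) -> None:
    """Hypothetical pre-insert sanity check for a feature-group DataFrame."""
    # Every primary-key column and the event-time column must be present
    missing = [c for c in primary_key + [event_time] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Primary-key values must be non-null ...
    if df[primary_key].isna().any().any():
        raise ValueError("null values in primary-key columns")
    # ... and each key combination must occur only once
    dupes = df[df.duplicated(subset=primary_key, keep=False)]
    if not dupes.empty:
        raise ValueError(f"{len(dupes)} rows share a primary key:\n{dupes[primary_key]}")
    # Dates were converted to epoch milliseconds, so expect an integer dtype
    if not pd.api.types.is_integer_dtype(df[event_time]):
        raise TypeError(f"{event_time!r} should hold integer epoch milliseconds")


# Usage on a tiny frame shaped like df_air_quality
sample = pd.DataFrame({
    "city": ["Kyiv", "Kyiv", "Malmo"],
    "date": [1662415200000, 1662501600000, 1662415200000],
    "aqi": [6, 2, 23],
})
check_feature_frame(sample, primary_key=["city", "date"], event_time="date")  # passes silently
```

In the notebook it would be called as `check_feature_frame(df_air_quality, ['city', 'date'], 'date')` just before `air_quality_fg.insert(...)`, and likewise for `df_weather`. Raising early keeps a bad backfill from ever reaching the feature store.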

---