# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>

## 🗒️ This notebook is divided into 3 sections:
1. Parsing Data.
2. Preparing dataframes.
3. Feature Group Insertion.

## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
import pandas as pd
from datetime import datetime
import time 
import requests
import os 

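# Helper functions used in this notebook, imported explicitly instead of via a wildcard
from features import (
    get_air_quality_data,
    get_air_quality_df,
    get_weather_data,
    get_weather_df,
    prepare_index,
)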

## <span style='color:#ff5f27'> 👮🏻‍♂️ API Keys </span>

In [2]:
AIR_QUALITY_API_KEY = os.getenv('AIR_QUALITY_API_KEY')
WEATHER_API_KEY = os.getenv('WEATHER_API_KEY')

date_today = datetime.now().strftime("%Y-%m-%d")
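
`os.getenv` returns `None` when a variable is not set, so a missing key would only surface later as a failed API request. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of the notebook or of `features`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage in the cell above (left commented so the sketch runs without real keys):
# AIR_QUALITY_API_KEY = require_env("AIR_QUALITY_API_KEY")
# WEATHER_API_KEY = require_env("WEATHER_API_KEY")
```

Failing at this point keeps the error close to its cause instead of inside the parsing step.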

## <span style='color:#ff5f27'> 🕵🏻‍♂️ Parsing </span>

In [3]:
cities = ['Kyiv', 'Stockholm', 'Sundsvall', 'Malmo']

# Fetch the latest air quality reading for every city
data_air_quality = [get_air_quality_data(city, AIR_QUALITY_API_KEY) for city in cities]

# Fetch today's weather for every city
data_weather = [get_weather_data(city, date_today, WEATHER_API_KEY) for city in cities]

## <span style='color:#ff5f27'> 🧑🏻‍🏫 Dataset Preparation </span>

#### <span style='color:#ff5f27'> 👩🏻‍🔬 Air Quality Data </span>

In [4]:
df_air_quality = get_air_quality_df(data_air_quality)

df_air_quality.head()

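| | city | aqi | date | iaqi_h | iaqi_p | iaqi_pm10 | iaqi_t | o3_avg | o3_max | o3_min | pm10_avg | pm10_max | pm10_min | pm25_avg | pm25_max | pm25_min | uvi_avg | uvi_max | uvi_min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyiv | 5 | 1662843600000 | 99.9 | 1007.8 | 2 | 13.37 | 22 | 28 | 18 | 8 | 11 | 6 | 29 | 41 | 19 | 0 | 0 | 0 |
| 1 | Stockholm | 16 | 1662843600000 | 71.0 | 1017.3 | 16 | 16.1 | 24 | 29 | 20 | 5 | 7 | 4 | 12 | 18 | 8 | 0 | 0 | 0 |
| 2 | Sundsvall | 19 | 1662843600000 | 68.0 | 1013.6 | 12 | 15.0 | 23 | 29 | 16 | 4 | 5 | 2 | 10 | 12 | 6 | 0 | 0 | 0 |
| 3 | Malmo | - | 1662757200000 | 76.2 | 1014.3 | 7 | 16.6 | 25 | 35 | 20 | 11 | 14 | 8 | 35 | 46 | 25 | 0 | 0 | 0 |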
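
The `date` values in the output above appear to be Unix timestamps in milliseconds. As a quick sanity check, the sketch below converts them to timezone-aware datetimes using only the standard library; the helper name `ms_to_utc` is an assumption for illustration and does not come from `features`.

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


print(ms_to_utc(1662843600000).isoformat())  # 2022-09-10T21:00:00+00:00
print(ms_to_utc(1662757200000).isoformat())  # 2022-09-09T21:00:00+00:00
```

Read this way, the Malmo row carries a timestamp one day older than the other three cities.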


In [5]:
# A '-' marks a missing AQI reading (see Malmo above); substitute a fixed placeholder value
df_air_quality['aqi'] = df_air_quality['aqi'].replace('-', 24)
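
The cell above fills the missing AQI with a hard-coded value. As an alternative, and not what the notebook itself does, the sketch below coerces the column to numbers and fills gaps with the median of the readings that are present. The function name `clean_aqi` and the sample data are illustrative assumptions.

```python
import pandas as pd


def clean_aqi(aqi: pd.Series) -> pd.Series:
    """Coerce AQI values to integers, filling unparseable entries with the median.

    Assumes at least one value is numeric; otherwise the median is NaN
    and the final integer cast raises.
    """
    numeric = pd.to_numeric(aqi, errors="coerce")  # '-' becomes NaN
    fill_value = numeric.median()                  # NaN entries are skipped
    return numeric.fillna(fill_value).astype(int)


sample = pd.Series([5, 16, 19, "-"], name="aqi")
cleaned = clean_aqi(sample)
print(cleaned.tolist())  # [5, 16, 19, 16]
```

Whether median imputation is appropriate depends on how the downstream model treats AQI; dropping the row is another option.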

#### <span style='color:#ff5f27'> 🌦 Weather Data </span>

In [6]:
df_weather = get_weather_df(data_weather)

df_weather.head()

| | city | date | tempmax | tempmin | temp | feelslikemax | feelslikemin | feelslike | dew | humidity | ... | windgust | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | conditions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyiv | 1662843600000 | 16.7 | 12.0 | 13.9 | 16.7 | 12.0 | 13.9 | 11.3 | 85.2 | ... | 43.9 | 19.1 | 185.4 | 1009.9 | 91.9 | 22.4 | 69.9 | 5.9 | 3.0 | Rain, Overcast |
| 1 | Stockholm | 1662843600000 | 16.0 | 9.0 | 12.6 | 16.0 | 8.7 | 12.5 | 8.5 | 77.0 | ... | 19.4 | 9.2 | 141.6 | 1016.7 | 73.6 | 11.8 | 72.0 | 6.2 | 4.0 | Partially cloudy |
| 2 | Sundsvall | 1662843600000 | 17.0 | 8.0 | 11.5 | 17.0 | 7.5 | 11.4 | 7.9 | 79.8 | ... | 18.7 | 16.1 | 253.8 | 1016.3 | 75.0 | 11.8 | 155.0 | 13.4 | 5.0 | Partially cloudy |
| 3 | Malmo | 1662843600000 | 17.4 | 13.6 | 15.3 | 17.4 | 13.6 | 15.3 | 12.8 | 85.7 | ... | 31.0 | 14.2 | 215.6 | 1016.1 | 70.8 | 11.3 | 66.3 | 5.7 | 4.0 | Partially cloudy |


## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [7]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/167
Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;">🪄 Retrieving Feature Groups</span>

In [8]:
air_quality_fg = fs.get_or_create_feature_group(
    name='air_quality_fg',
    version=1,
)
weather_fg = fs.get_or_create_feature_group(
    name='weather_fg',
    version=1,
)

### <span style="color:#ff5f27;"> ⚜️ Index column preparation </span>

In [9]:
index_latest = air_quality_fg.read()['index'].max()

2022-09-11 21:34:08,256 INFO: USE `maksym00_featurestore`
2022-09-11 21:34:09,316 INFO: SELECT `fg0`.`index` `index`, `fg0`.`city` `city`, `fg0`.`aqi` `aqi`, `fg0`.`date` `date`, `fg0`.`iaqi_h` `iaqi_h`, `fg0`.`iaqi_p` `iaqi_p`, `fg0`.`iaqi_pm10` `iaqi_pm10`, `fg0`.`iaqi_t` `iaqi_t`, `fg0`.`o3_avg` `o3_avg`, `fg0`.`o3_max` `o3_max`, `fg0`.`o3_min` `o3_min`, `fg0`.`pm10_avg` `pm10_avg`, `fg0`.`pm10_max` `pm10_max`, `fg0`.`pm10_min` `pm10_min`, `fg0`.`pm25_avg` `pm25_avg`, `fg0`.`pm25_max` `pm25_max`, `fg0`.`pm25_min` `pm25_min`, `fg0`.`uvi_avg` `uvi_avg`, `fg0`.`uvi_max` `uvi_max`, `fg0`.`uvi_min` `uvi_min`
FROM `maksym00_featurestore`.`air_quality_fg_1` `fg0`




In [11]:
prepare_index(df_air_quality,index_latest)
prepare_index(df_weather,index_latest)
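
The implementation of `prepare_index` lives in `features` and is not shown in this notebook. One plausible reading is that it numbers new rows consecutively after the latest index already stored in the feature group. The sketch below illustrates that idea only; `assign_running_index` is a hypothetical stand-in, and unlike the call above it returns a copy rather than relying on in-place changes.

```python
import pandas as pd


def assign_running_index(df: pd.DataFrame, last_index: int) -> pd.DataFrame:
    """Return a copy of `df` with an 'index' column continuing after `last_index`.

    Hypothetical stand-in for illustration; not the notebook's `prepare_index`.
    """
    out = df.copy()
    start = int(last_index) + 1
    out["index"] = range(start, start + len(out))
    return out


demo = pd.DataFrame({"city": ["Kyiv", "Stockholm"]})
result = assign_running_index(demo, 41)
print(result["index"].tolist())  # [42, 43]
```

If the real helper works along these lines, rows inserted on consecutive runs get non-overlapping index values.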

## <span style="color:#ff5f27;">🧬 Inserting into Feature Groups</span>

In [14]:
air_quality_fg.insert(df_air_quality)

Uploading Dataframe: 0.00% |          | Rows 0/4 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/air_quality_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fa7f9eb5f10>, None)

In [15]:
weather_fg.insert(df_weather)

Uploading Dataframe: 0.00% |          | Rows 0/4 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/167/jobs/named/weather_fg_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fa7f9ec05e0>, None)

---