# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/{project_name}/{notebook_name}.ipynb)


## 🗒️ This notebook is divided into the following sections:
1. Parse Data.
2. Feature Group Insertion.

## <span style='color:#ff5f27'>💽 Data</span>
You will fetch and parse air quality data from the [World Air Quality Index](https://aqicn.org//here/) site using your own credentials, so you first need to [get an API key](https://aqicn.org/data-platform/token/) there.

To parse weather data, you also need an API key from [VisualCrossing](https://www.visualcrossing.com/). You can request one on [this page](https://www.visualcrossing.com/weather-api).

## <span style='color:#ff5f27'> 👮🏻‍♂️ API Keys</span>

`AIR_QUALITY_API_KEY = "YOUR_API_KEY"`

`WEATHER_API_KEY = "YOUR_API_KEY"`

Create an `.env` configuration file that stores the necessary environment variables (the two API keys above):
![](images/api_keys_env_file.png)
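
As a rough illustration of how keys stored in an `.env` file can end up in the environment, here is a minimal standard-library sketch. The helper name `load_env_file` is hypothetical and not part of the tutorial's `functions` module, which may load the file differently (for example with a dedicated package); the sketch only assumes the file holds one `KEY=VALUE` pair per line.

```python
import os


def load_env_file(path):
    """Read KEY=VALUE pairs from a .env-style file into os.environ.

    Hypothetical helper for illustration only. Blank lines, lines starting
    with '#', and lines without '=' are skipped; surrounding quotes are
    stripped from values.
    """
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

After calling `load_env_file(".env")`, the keys could be read with `os.getenv("AIR_QUALITY_API_KEY")` and `os.getenv("WEATHER_API_KEY")`.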

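## <span style='color:#ff5f27'> 📝 Imports</span>

In [3]:
import pandas as pd
from datetime import datetime
import time
import requests

# Tutorial helpers used below (get_air_quality_data, get_weather_data, ...)
from functions import *

# Ignore warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')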

---


In [6]:
date_today = datetime.now().strftime("%Y-%m-%d")

---

## <span style='color:#ff5f27'> 🧙🏼‍♂️ Parsing Data</span>

In [7]:
cities = ['Kyiv', 'Stockholm', 'Sundsvall', 'Malmo']

data_air_quality = [get_air_quality_data(city) for city in cities]

data_weather = [get_weather_data(city, date_today) for city in cities]
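
Both list comprehensions make one remote call per city, so a single transient network error would abort the whole cell. The following is a minimal sketch of a retry wrapper, not part of the tutorial's `functions` module: `fetch_with_retry` and its parameters are hypothetical names, and catching every exception is an assumption made because the helpers' failure modes are not documented here.

```python
import time


def fetch_with_retry(fetch, city, retries=3, delay_seconds=1.0):
    """Call fetch(city), retrying on any exception for up to `retries` attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(city)
        except Exception as error:  # broad on purpose: failure modes are unknown
            last_error = error
            if attempt < retries:
                # Pause before the next attempt (no pause after the last one)
                time.sleep(delay_seconds)
    raise RuntimeError(
        f"Could not fetch data for {city!r} after {retries} attempts"
    ) from last_error
```

With this in place, the air quality call could be written as `[fetch_with_retry(get_air_quality_data, city) for city in cities]`, and the weather call as `[fetch_with_retry(lambda c: get_weather_data(c, date_today), city) for city in cities]`.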

---

## <span style='color:#ff5f27'> 🧑🏻‍🏫 Dataset Preparation</span>

#### <span style='color:#ff5f27'> 👩🏻‍🔬 Air Quality Data</span>

In [8]:
df_air_quality = get_air_quality_df(data_air_quality)

df_air_quality.head()

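| | city | aqi | date | iaqi_h | iaqi_p | iaqi_pm10 | iaqi_t | o3_avg | o3_max | o3_min | pm10_avg | pm10_max | pm10_min | pm25_avg | pm25_max | pm25_min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyiv | 5 | 1666137600000 | 88.23 | 1017.6 | 2 | 14.9 | 21 | 29 | 12 | 16 | 19 | 10 | 54 | 64 | 34 |
| 1 | Stockholm | 21 | 1671408000000 | 93.0 | 1011.0 | 21 | 2.0 | 22 | 29 | 19 | 10 | 14 | 6 | 33 | 52 | 14 |
| 2 | Sundsvall | 32 | 1671408000000 | 88.5 | 1012.0 | 9 | -6.0 | 22 | 28 | 18 | 6 | 7 | 3 | 18 | 23 | 9 |
| 3 | Malmo | 33 | 1671408000000 | 92.0 | 1009.3 | 8 | 2.7 | 20 | 26 | 13 | 12 | 15 | 10 | 42 | 48 | 33 |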
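
The `date` values in this output look like Unix timestamps in milliseconds. Assuming that interpretation (and UTC), a small standard-library sketch shows how to turn them back into calendar dates when inspecting the data. `epoch_ms_to_date` is a hypothetical helper name, not something the tutorial defines.

```python
from datetime import datetime, timezone


def epoch_ms_to_date(epoch_ms):
    """Convert a Unix timestamp in milliseconds to a UTC 'YYYY-MM-DD' string."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


print(epoch_ms_to_date(1671408000000))  # 2022-12-19
print(epoch_ms_to_date(1666137600000))  # 2022-10-19
```

Under this assumption, the Kyiv row carries an older date (2022-10-19) than the other three cities (2022-12-19), which is easy to miss in the raw integers. For a whole column, `pd.to_datetime(df_air_quality["date"], unit="ms")` performs the same conversion in pandas.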


#### <span style='color:#ff5f27'> 🌦 Weather Data</span>

In [9]:
df_weather = get_weather_df(data_weather)

df_weather.head()

| | city | date | tempmax | tempmin | temp | feelslikemax | feelslikemin | feelslike | dew | humidity | ... | windgust | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | conditions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyiv | 1671408000000 | -4.8 | -8.6 | -6.8 | -5.8 | -14.5 | -10.8 | -9.4 | 81.6 | ... | 35.6 | 15.8 | 303.8 | 1041.1 | 57.2 | 23.8 | 47.1 | 4.1 | 2.0 | Partially cloudy |
| 1 | Stockholm | 1671408000000 | 2.8 | -3.1 | 0.5 | 1.9 | -4.4 | -1.8 | -1.2 | 88.9 | ... | 26.6 | 14.7 | 173.8 | 1019.0 | 87.6 | 9.0 | 4.1 | 0.3 | 0.0 | Snow, Rain, Partially cloudy |
| 2 | Sundsvall | 1671408000000 | -5.4 | -12.0 | -8.4 | -5.4 | -15.3 | -10.1 | -9.9 | 89.3 | ... | 42.8 | 9.4 | 284.7 | 1016.5 | 83.5 | 11.4 | 3.4 | 0.3 | 0.0 | Partially cloudy |
| 3 | Malmo | 1671408000000 | 4.5 | -2.1 | 0.9 | 2.3 | -8.2 | -3.9 | -0.7 | 88.8 | ... | 38.9 | 31.4 | 162.8 | 1017.9 | 82.7 | 6.7 | 8.3 | 0.7 | 1.0 | Rain, Partially cloudy |


---

## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks

# Log in to your Hopsworks project
project = hopsworks.login()

# Get a handle to the project's feature store
fs = project.get_feature_store()

In [None]:
# Retrieve version 1 of each feature group that the new rows will be inserted into
air_quality_fg = fs.get_or_create_feature_group(
    name='air_quality_fg',
    version=1,
)
weather_fg = fs.get_or_create_feature_group(
    name='weather_fg',
    version=1,
)

---

## <span style="color:#ff5f27;">⬆️ Uploading new data to the Feature Store</span>

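In [None]:
# Start the ingestion job and return immediately without waiting for it to finish
air_quality_fg.insert(df_air_quality, write_options={"wait_for_job": False})

In [None]:
# Block until the ingestion job for the weather data has finished
weather_fg.insert(df_weather, write_options={"wait_for_job": True})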
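
Before running the `insert` calls above, it can help to confirm that the freshly parsed DataFrame has the same columns as the feature group it is written to. The sketch below is a plain-Python check under that assumption. `compare_columns` is a hypothetical helper, not a Hopsworks API, and the expected column list has to come from wherever your feature group schema is defined.

```python
def compare_columns(dataframe_columns, expected_columns):
    """Report which expected columns are missing and which columns are unexpected."""
    actual = set(dataframe_columns)
    expected = set(expected_columns)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Illustrative call with a shortened, made-up schema
report = compare_columns(
    ["city", "aqi", "date", "extra_col"],
    ["city", "aqi", "date", "iaqi_h"],
)
print(report)  # {'missing': ['iaqi_h'], 'unexpected': ['extra_col']}
```

In the notebook this could be called as `compare_columns(df_air_quality.columns, expected)`, where `expected` is the list of feature names in `air_quality_fg`. If both lists in the report are empty, the column names line up; data types would still need a separate check.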

---