<span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Daily Feature Pipeline for Electricity Prices and Weather (openmeteo)</span>

## 🗒️ This notebook is divided into the following sections:
1. Download and Parse Data
2. Feature Group Insertion


__This notebook should be scheduled to run daily__

In the book, we use a GitHub Action stored here:
[.github/workflows/air-quality-daily.yml](https://github.com/featurestorebook/mlfs-book/blob/main/.github/workflows/air-quality-daily.yml)

However, you are free to use any Python orchestration tool to schedule this program to run daily.

### <span style='color:#ff5f27'> 📝 Imports </span>

In [24]:
import datetime
import time
import requests
import pandas as pd
import hopsworks
from functions import util
from functions import fetch_data
import json
import os
import warnings
warnings.filterwarnings("ignore")

## <span style='color:#ff5f27'> 🌍 Connect to Hopsworks </span>

__Update the project name and API key location in the cell below.__

__These should be the same values as in notebook 1, the feature backfill notebook.__


In [25]:
# Use the env variable 'HOPSWORKS_API_KEY' if it is already set (e.g. in a scheduled job);
# otherwise read the key from a local file.
if not os.environ.get("HOPSWORKS_API_KEY"):
    with open('../data/keys/hopsworks-api-key.txt', 'r') as file:
        os.environ["HOPSWORKS_API_KEY"] = file.read().rstrip()

project = hopsworks.login(project="ML_Project_Electricity", api_key_value=os.environ["HOPSWORKS_API_KEY"])
fs = project.get_feature_store() 
# secrets = util.secrets_api(project.name)
print("Project name:", project.name)

# This line will fail if you have not registered the AQI_API_KEY as a secret in Hopsworks
# AQI_API_KEY = secrets.get_secret("AQI_API_KEY").value
# location_str = secrets.get_secret("SENSOR_LOCATION_JSON").value
# location = json.loads(location_str)

# country=location['country']
# city=location['city']
# street=location['street']
# aqicn_url=location['aqicn_url']
# latitude=location['latitude']
# longitude=location['longitude']

# today = datetime.date.today()

# location_str

2025-01-08 10:30:27,714 INFO: Closing external client and cleaning up certificates.


Connection closed.
2025-01-08 10:30:27,776 INFO: Initializing external client
2025-01-08 10:30:27,776 INFO: Base URL: https://c.app.hopsworks.ai:443
2025-01-08 10:30:27,776 INFO: Base URL: https://c.app.hopsworks.ai:443
2025-01-08 10:30:28,964 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1207495
Project name: ML_Project_Electricity


### <span style="color:#ff5f27;"> 🔮 Get references to the Feature Groups </span>

In [26]:
# Retrieve feature groups
sthlm_weather_fg = fs.get_feature_group(
    name='stockholm_weather',
    version=1,
)
malmo_weather_fg = fs.get_feature_group(
    name='malmo_weather',
    version=1,
)

se3_fg = fs.get_feature_group(
    name='se3_electricity_prices',
    version=1,
)

se4_fg = fs.get_feature_group(
    name='se4_electricity_prices',
    version=1,
)

---

## Retrieve the most recent electricity price data (for tomorrow)

In [27]:
# Use the function get_tomorrows_electricity_prices 
# from the fetch_data module to get the electricity prices for tomorrow

se3_current_prices = fetch_data.get_tomorrows_electricity_prices('SE3')
se4_current_prices = fetch_data.get_tomorrows_electricity_prices('SE4')


se4_current_prices.head()


Request failed. Status: 404
Failed to retrieve data: 404
Request failed. Status: 404
Failed to retrieve data: 404


AttributeError: 'NoneType' object has no attribute 'head'
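
The `AttributeError` above follows from the two 404 responses: the fetch returned `None` for the price area, and `.head()` was then called on `None`. A minimal sketch of a guard is shown below. It assumes, as the error message suggests, that the fetch function returns `None` when tomorrow's prices are not yet available. `split_available` and `fake_fetch` are hypothetical names used only for illustration; they are not part of the `fetch_data` module.

```python
def split_available(fetch, areas):
    """Call fetch(area) for each area and return (results, missing_areas)."""
    results, missing = {}, []
    for area in areas:
        data = fetch(area)
        if data is None:
            # The fetch failed or the prices are not published yet
            missing.append(area)
        else:
            results[area] = data
    return results, missing


def fake_fetch(area):
    # Stand-in for the real fetch: pretend only SE3 prices are available
    if area == "SE3":
        return [{"pricearea": area, "spotpriceeur": 10.0}]
    return None


results, missing = split_available(fake_fetch, ["SE3", "SE4"])
if missing:
    print("No prices available yet for:", ", ".join(missing))
```

With a guard like this, the later cells can skip the areas listed in `missing` instead of failing part-way through the run.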

In [None]:
# Read the feature groups into pandas dataframes
se3_df = se3_fg.read()
se4_df = se4_fg.read()

sthlm_weather_df = sthlm_weather_fg.read()
malmo_weather_df = malmo_weather_fg.read()

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.10s) 
Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.81s) 
Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.82s) 
Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.79s) 


In [None]:
# Merge the historical electricity prices with the most recent prices
se3_df = pd.concat([se3_df, se3_current_prices], axis=0)
se4_df = pd.concat([se4_df, se4_current_prices], axis=0)

# sort the dataframes by time
se3_df = se3_df.sort_values('time')
se4_df = se4_df.sort_values('time')

# Calculate a rolling average of the electricity prices over the last 7 days
# (a window of 24*7 rows spans 7 days only if there is exactly one row per hour)
se3_df['spot_price_rolling'] = se3_df['spotpriceeur'].rolling(window=24*7).mean()
se4_df['spot_price_rolling'] = se4_df['spotpriceeur'].rolling(window=24*7).mean()

# convert to datetime
# se3_df['time'] = pd.to_datetime(se3_df['time'])
# se4_df['time'] = pd.to_datetime(se4_df['time'])


In [None]:
se3_df.tail()

| | time | pricearea | spotpriceeur | spot_price_rolling |
|---|---|---|---|---|
| 0 | 2025-01-06 23:00:00+00:00 | SE3 | 10.0 | 11.458512 |
| 0 | 2025-01-06 23:00:00+00:00 | SE3 | 10.0 | 11.497738 |
| 1 | 2025-01-07 00:00:00+00:00 | SE3 | 6.96 | 11.521131 |
| 1 | 2025-01-07 00:00:00+00:00 | SE3 | 6.96 | 11.543452 |
| 2 | 2025-01-07 01:00:00+00:00 | SE3 | 5.42 | 11.559286 |
| 2 | 2025-01-07 01:00:00+00:00 | SE3 | 5.42 | 11.577976 |
| 3 | 2025-01-07 02:00:00+00:00 | SE3 | 4.35 | 11.574167 |
| 3 | 2025-01-07 02:00:00+00:00 | SE3 | 4.35 | 11.534464 |
| 4 | 2025-01-07 03:00:00+00:00 | SE3 | 5.08 | 11.509345 |
| 4 | 2025-01-07 03:00:00+00:00 | SE3 | 5.08 | 11.470476 |
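
The output above contains every timestamp twice. With duplicated rows, a fixed window of `24*7` rows covers fewer than 7 days, so the rolling value no longer matches its description. One possible remedy, sketched here as an assumption rather than as the notebook's method, is to keep one row per timestamp and use a time-based window. `add_rolling_mean` is a hypothetical helper name; the column names are taken from the output above.

```python
import pandas as pd


def add_rolling_mean(df, value_col="spotpriceeur", time_col="time", window="7D"):
    """Drop duplicate timestamps, then add a time-based rolling mean column."""
    # Keep one row per timestamp (the last one seen) and order by time
    out = df.drop_duplicates(subset=time_col, keep="last").sort_values(time_col)
    out = out.reset_index(drop=True)
    # A time-based window needs a sorted DatetimeIndex
    rolled = out.set_index(time_col)[value_col].rolling(window).mean()
    out["spot_price_rolling"] = rolled.to_numpy()
    return out


# Small sample shaped like the output above (two duplicated hours)
times = pd.to_datetime(
    ["2025-01-06 23:00", "2025-01-06 23:00", "2025-01-07 00:00", "2025-01-07 00:00"],
    utc=True,
)
prices = pd.DataFrame(
    {"time": times, "pricearea": "SE3", "spotpriceeur": [10.0, 10.0, 6.96, 6.96]}
)
clean = add_rolling_mean(prices)
print(clean)
```

Unlike a row-count window, the `"7D"` window always spans seven days of timestamps, even when hours are missing from the data.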


## Insert the newly retrieved values into the feature groups

In [None]:
# Insert the new electricity prices into the feature store
se3_fg.insert(se3_df.tail(24))
se4_fg.insert(se4_df.tail(24))

Uploading Dataframe: 100.00% |██████████| Rows 24/24 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: se3_electricity_prices_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1207495/jobs/named/se3_electricity_prices_1_offline_fg_materialization/executions


Uploading Dataframe: 100.00% |██████████| Rows 24/24 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: se4_electricity_prices_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1207495/jobs/named/se4_electricity_prices_1_offline_fg_materialization/executions


(Job('se4_electricity_prices_1_offline_fg_materialization', 'SPARK'), None)
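
Note that `tail(24)` selects the last 24 rows whether or not they are new, so a run in which the price fetch failed inserts rows that are already stored. A possible alternative is to insert only rows that are newer than the latest stored timestamp. The sketch below assumes that `time` identifies a row; `rows_after` is a hypothetical helper, and the 04:00 price in the sample is made up for illustration.

```python
import pandas as pd


def rows_after(df, last_stored_time, time_col="time"):
    """Return only the rows with a timestamp strictly later than last_stored_time."""
    return df[df[time_col] > last_stored_time]


# Rows that are already stored
stored = pd.DataFrame(
    {
        "time": pd.to_datetime(["2025-01-07 02:00", "2025-01-07 03:00"], utc=True),
        "spotpriceeur": [4.35, 5.08],
    }
)
# Freshly fetched rows: one overlaps with the stored data, one is new
fetched = pd.DataFrame(
    {
        "time": pd.to_datetime(["2025-01-07 03:00", "2025-01-07 04:00"], utc=True),
        "spotpriceeur": [5.08, 5.50],  # 5.50 is an illustrative value
    }
)

new_rows = rows_after(fetched, stored["time"].max())
print(new_rows)
```

If `new_rows` is empty, the insert can be skipped for that run.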

## Retrieve fresh weather data

In [None]:
sthlm_forecast_df = fetch_data.get_hourly_weather_forecast(59.3294, 18.0687)  # Stockholm
malmo_forecast_df = fetch_data.get_hourly_weather_forecast(55.6059, 13.0007)  # Malmö

# Drop rows that contain NaN values
sthlm_forecast_df = sthlm_forecast_df.dropna()
malmo_forecast_df = malmo_forecast_df.dropna()

malmo_forecast_df

Coordinates 59.32889938354492°N 18.072357177734375°E
Elevation 24.0 m asl
Timezone b'Europe/Berlin' b'CET'
Timezone difference to GMT+0 3600 s
Coordinates 55.60652542114258°N 13.002044677734375°E
Elevation 12.0 m asl
Timezone b'Europe/Berlin' b'CET'
Timezone difference to GMT+0 3600 s


| | time | temperature | precipitation | cloud_cover | wind_speed_10m | date | sunshine_duration | weekday | month | hour |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-01-05 23:00:00+00:00 | 0.874 | 0.3 | 100 | 31.680000 | 2025-01-05 | 0.000000 | 6 | 1 | 23 |
| 1 | 2025-01-06 00:00:00+00:00 | 0.824 | 0.6 | 100 | 35.279999 | 2025-01-06 | 0.000000 | 0 | 1 | 0 |
| 2 | 2025-01-06 01:00:00+00:00 | 0.674 | 0.6 | 100 | 34.919998 | 2025-01-06 | 0.000000 | 0 | 1 | 1 |
| 3 | 2025-01-06 02:00:00+00:00 | 0.674 | 1.5 | 100 | 32.760002 | 2025-01-06 | 0.000000 | 0 | 1 | 2 |
| 4 | 2025-01-06 03:00:00+00:00 | 0.824 | 1.6 | 100 | 30.599998 | 2025-01-06 | 0.000000 | 0 | 1 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 140 | 2025-01-11 19:00:00+00:00 | 0.450 | 0.0 | 83 | 27.155552 | 2025-01-11 | 21260.839844 | 5 | 1 | 19 |
| 141 | 2025-01-11 20:00:00+00:00 | 0.700 | 0.0 | 93 | 26.019806 | 2025-01-11 | 21260.839844 | 5 | 1 | 20 |
| 142 | 2025-01-11 21:00:00+00:00 | 0.900 | 0.0 | 100 | 25.233406 | 2025-01-11 | 21260.839844 | 5 | 1 | 21 |
| 143 | 2025-01-11 22:00:00+00:00 | 0.950 | 0.0 | 100 | 24.975283 | 2025-01-11 | 21260.839844 | 5 | 1 | 22 |
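
The `date`, `weekday`, `month` and `hour` columns in the output above follow the usual calendar convention in which Monday is 0 and Sunday is 6. How `fetch_data` computes them is not shown in this notebook; the sketch below only reproduces the values seen in the first row, using the standard library. `calendar_features` is a hypothetical name.

```python
from datetime import datetime, timezone


def calendar_features(ts):
    """Derive the date, weekday (Monday=0), month and hour from a timestamp."""
    return {
        "date": ts.date().isoformat(),
        "weekday": ts.weekday(),
        "month": ts.month,
        "hour": ts.hour,
    }


# First forecast row in the output above: 2025-01-05 23:00 UTC, a Sunday
first_row = calendar_features(datetime(2025, 1, 5, 23, tzinfo=timezone.utc))
print(first_row)
```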


## Insert weather forecast data into the weather feature groups

In [None]:
# Insert the new weather forecast into the feature store
sthlm_weather_fg.insert(sthlm_forecast_df)
malmo_weather_fg.insert(malmo_forecast_df)

Uploading Dataframe: 100.00% |██████████| Rows 145/145 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: stockholm_weather_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1207495/jobs/named/stockholm_weather_1_offline_fg_materialization/executions


Uploading Dataframe: 100.00% |██████████| Rows 145/145 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: malmo_weather_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1207495/jobs/named/malmo_weather_1_offline_fg_materialization/executions


(Job('malmo_weather_1_offline_fg_materialization', 'SPARK'), None)

## END