# <span style="font-weight:bold; font-size: 3rem; color:#2656a3;">**Data Engineering and Machine Learning Operations in Business** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>

## <span style='color:#2656a3'> 🗒️ The notebook is divided into the following sections:</span>
1. Parsing new data.
2. Inserting the new data into the Feature Store.

## <span style='color:#2656a3'> ⚙️ Importing libraries and packages</span>

This notebook assumes that the required Python packages are already installed. First, we import the project's own feature functions, and then the remaining libraries used in this notebook.

In [1]:
```python
# First we move up one directory to access the folder with our functions
%cd ..

# Now we import the functions from the features folder.
# These are the functions we have created to generate features for electricity prices and weather measures
from features import electricity_prices, weather_measures

# We move back into the notebooks folder
%cd notebooks
```

```text
/Users/camillahannesbo/Documents/AAU/Master - BDS/2. semester/Data Engineering and Machine learning operations in Business/MLOPs-Assignment-
/Users/camillahannesbo/Documents/AAU/Master - BDS/2. semester/Data Engineering and Machine learning operations in Business/MLOPs-Assignment-/notebooks
```
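Changing directories with `%cd` affects the whole kernel session. A minimal alternative sketch, assuming the notebook sits in a `notebooks` folder directly below the project root (as the paths above suggest), is to put the project root on Python's import path once and leave the working directory alone:

```python
import os
import sys

# Assumption: the current working directory is the "notebooks" folder,
# so the project root (which contains the "features" package) is its parent.
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))

# Add the project root to the import path only once, instead of using %cd
if project_root not in sys.path:
    sys.path.append(project_root)

# With the root on sys.path, the following import would resolve without %cd
# (not executed here, because it needs the project's own "features" package):
# from features import electricity_prices, weather_measures
```

This keeps the working directory unchanged, so any relative file paths used in later cells are not affected.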


In [2]:
```python
# Import the libraries needed in this notebook
import pandas as pd
import requests

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
```

## <span style='color:#2656a3'> 🪄 Parsing New Data</span>
We parse new data by setting `historical` to `False`, which fetches real-time rather than historical data. This is done for the electricity prices and the renewable energy forecast.
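To illustrate what such a flag typically controls, here is a hypothetical sketch of how a `historical` switch could choose the date window of an API request. The function `query_window` and the start date `HISTORICAL_START` are assumptions for illustration only, not the project's actual `electricity_prices` implementation:

```python
from datetime import date, timedelta

# Hypothetical first date of a historical backfill (not taken from the notebook)
HISTORICAL_START = date(2022, 1, 1)

def query_window(historical: bool, today: date) -> tuple[date, date]:
    """Return a [start, end) date window for an API request (illustrative sketch)."""
    if historical:
        # Backfill mode: everything from the fixed start date up to, but excluding, today
        return HISTORICAL_START, today
    # Real-time mode: only the current day
    return today, today + timedelta(days=1)

print(query_window(historical=False, today=date(2024, 4, 30)))
# (datetime.date(2024, 4, 30), datetime.date(2024, 5, 1))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the sketch deterministic and easy to test.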

To provide real-time weather measures, we fetch a weather forecast for the next 5 days.
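A minimal sketch of the kind of request that could sit behind `forecast_weather_measures`, assuming it calls Open-Meteo's public forecast endpoint with a `forecast_days` parameter. The helper `forecast_url` and the coordinates are hypothetical, and the URL is only built, not sent. With hourly data, a 5-day forecast yields 5 × 24 = 120 rows, which matches the 120 rows uploaded to the Feature Store later in this notebook:

```python
from urllib.parse import urlencode

# Open-Meteo's public forecast endpoint
BASE_URL = "https://api.open-meteo.com/v1/forecast"

# The hourly variables that appear as columns in weather_forecast_df
HOURLY_VARIABLES = [
    "temperature_2m", "relative_humidity_2m", "precipitation", "rain",
    "snowfall", "weather_code", "cloud_cover", "wind_speed_10m", "wind_gusts_10m",
]

def forecast_url(latitude: float, longitude: float, forecast_length: int) -> str:
    """Build (but do not send) an Open-Meteo forecast request URL."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(HOURLY_VARIABLES),
        "forecast_days": forecast_length,
    }
    return f"{BASE_URL}?{urlencode(params)}"

# Hypothetical coordinates in western Denmark (the DK1 price area)
url = forecast_url(57.048, 9.9187, forecast_length=5)
print(url)

# One row per hour: 5 days x 24 hours
expected_rows = 5 * 24
```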

The calendar data does not change, so no new calendar data is retrieved.

### <span style="color:#2656a3;">💸 Electricity Prices per day from Energinet</span>

In [3]:
```python
# Fetching non-historical electricity prices for area DK1
electricity_df = electricity_prices.electricity_prices(
    historical=False,
    area=["DK1"]
)
```

In [4]:
```python
# Display the electricity dataframe
electricity_df
```

| | timestamp | time | date | dk1_spotpricedkk_kwh |
|---|---|---|---|---|
| 0 | 1714435200000 | 2024-04-30 00:00:00 | 2024-04-30 | 0.53689 |
| 1 | 1714438800000 | 2024-04-30 01:00:00 | 2024-04-30 | 0.48603 |
| 2 | 1714442400000 | 2024-04-30 02:00:00 | 2024-04-30 | 0.51251 |
| 3 | 1714446000000 | 2024-04-30 03:00:00 | 2024-04-30 | 0.4747 |
| 4 | 1714449600000 | 2024-04-30 04:00:00 | 2024-04-30 | 0.39348 |
| 5 | 1714453200000 | 2024-04-30 05:00:00 | 2024-04-30 | 0.39877 |
| 6 | 1714456800000 | 2024-04-30 06:00:00 | 2024-04-30 | 0.41623 |
| 7 | 1714460400000 | 2024-04-30 07:00:00 | 2024-04-30 | 0.60208 |
| 8 | 1714464000000 | 2024-04-30 08:00:00 | 2024-04-30 | 0.65973 |
| 9 | 1714467600000 | 2024-04-30 09:00:00 | 2024-04-30 | 0.57426 |
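The `timestamp` column appears to encode the same instant as `time`, as milliseconds since the Unix epoch: 1714435200000 ms corresponds to 2024-04-30 00:00:00 UTC. A quick check on the first two rows, with the values copied from the output above:

```python
import pandas as pd

# First two rows of electricity_df, copied from the output above
sample = pd.DataFrame({
    "timestamp": [1714435200000, 1714438800000],
    "time": ["2024-04-30 00:00:00", "2024-04-30 01:00:00"],
})

# Interpret the integer timestamps as milliseconds since the Unix epoch (UTC, timezone-naive result)
decoded = pd.to_datetime(sample["timestamp"], unit="ms")
print(decoded.tolist())

# Consecutive hourly rows are 3,600,000 ms (one hour) apart
step_ms = sample["timestamp"].diff().dropna().unique()
print(step_ms)
```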


### <span style="color:#2656a3;">☀️💨 Renewable Energy Forecast for the Next Day from Energinet</span>

In [5]:
```python
# Fetching non-historical forecast of renewable energy data for area DK1
forecast_renewable_energy_df = electricity_prices.forecast_renewable_energy(
    historical=False,
    area=["DK1"]
)
```

In [6]:
```python
# Display the forecast_renewable_energy dataframe
forecast_renewable_energy_df
```

| | timestamp | time | date | dk1_offshore_wind_forecastintraday_kwh | dk1_onshore_wind_forecastintraday_kwh | dk1_solar_forecastintraday_kwh |
|---|---|---|---|---|---|---|
| 0 | 1714435200000 | 2024-04-30 00:00:00 | 2024-04-30 | 0.663667 | 0.7955 | 0.0 |
| 1 | 1714438800000 | 2024-04-30 01:00:00 | 2024-04-30 | 0.632792 | 1.016167 | 0.0 |
| 2 | 1714442400000 | 2024-04-30 02:00:00 | 2024-04-30 | 0.712667 | 1.054708 | 0.0 |
| 3 | 1714446000000 | 2024-04-30 03:00:00 | 2024-04-30 | 0.695583 | 1.084542 | 0.0 |
| 4 | 1714449600000 | 2024-04-30 04:00:00 | 2024-04-30 | 0.753917 | 1.115708 | 2.6e-05 |
| 5 | 1714453200000 | 2024-04-30 05:00:00 | 2024-04-30 | 0.767125 | 1.1995 | 0.002789 |
| 6 | 1714456800000 | 2024-04-30 06:00:00 | 2024-04-30 | 0.897375 | 1.225875 | 0.090816 |
| 7 | 1714460400000 | 2024-04-30 07:00:00 | 2024-04-30 | 0.980542 | 1.131583 | 0.212591 |
| 8 | 1714464000000 | 2024-04-30 08:00:00 | 2024-04-30 | 0.981917 | 1.182875 | 0.409022 |
| 9 | 1714467600000 | 2024-04-30 09:00:00 | 2024-04-30 | 0.9255 | 1.199917 | 0.556957 |


### <span style="color:#2656a3;"> 🌤 Weather Measurements from Open-Meteo</span>

#### <span style="color:#2656a3;"> 🕰️ Historical Weather Measures</span>

In [7]:
```python
# Fetching non-historical weather data for area DK1.
# Disabled: new weather data is taken from the weather forecast below instead
#historical_weather_df = weather_measures.historical_weather_measures(
#    historical=False
#)
```

In [8]:
```python
# Display the first 5 rows of the dataframe
#historical_weather_df.head()
```

#### <span style="color:#2656a3;"> 🌈 Weather Forecast</span>

In [9]:
```python
# Fetching weather forecast measures for the next 5 days
weather_forecast_df = weather_measures.forecast_weather_measures(
    forecast_length=5
)
```

In [10]:
```python
# Display the weather_forecast_df dataframe
weather_forecast_df
```

| | timestamp | date | time | temperature_2m | relative_humidity_2m | precipitation | rain | snowfall | weather_code | cloud_cover | wind_speed_10m | wind_gusts_10m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1714435200000 | 2024-04-30 | 2024-04-30 00:00:00 | 9.6 | 90 | 0.0 | 0.0 | 0.0 | 3 | 82 | 9.7 | 19.4 |
| 1 | 1714438800000 | 2024-04-30 | 2024-04-30 01:00:00 | 9.4 | 89 | 0.0 | 0.0 | 0.0 | 2 | 76 | 10.8 | 18.7 |
| 2 | 1714442400000 | 2024-04-30 | 2024-04-30 02:00:00 | 9.5 | 87 | 0.0 | 0.0 | 0.0 | 2 | 68 | 16.2 | 30.6 |
| 3 | 1714446000000 | 2024-04-30 | 2024-04-30 03:00:00 | 9.3 | 88 | 0.0 | 0.0 | 0.0 | 2 | 64 | 13.7 | 24.5 |
| 4 | 1714449600000 | 2024-04-30 | 2024-04-30 04:00:00 | 9.7 | 89 | 0.0 | 0.0 | 0.0 | 3 | 82 | 15.1 | 25.9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | 1714849200000 | 2024-05-04 | 2024-05-04 19:00:00 | 14.7 | 59 | 0.0 | 0.0 | 0.0 | 0 | 0 | 5.4 | 14.0 |
| 116 | 1714852800000 | 2024-05-04 | 2024-05-04 20:00:00 | 12.9 | 65 | 0.0 | 0.0 | 0.0 | 0 | 0 | 4.8 | 11.2 |
| 117 | 1714856400000 | 2024-05-04 | 2024-05-04 21:00:00 | 11.5 | 69 | 0.0 | 0.0 | 0.0 | 0 | 0 | 4.4 | 7.9 |
| 118 | 1714860000000 | 2024-05-04 | 2024-05-04 22:00:00 | 10.6 | 72 | 0.0 | 0.0 | 0.0 | 0 | 0 | 4.1 | 7.6 |


In [11]:
```python
# Convert to float to align with the Hopsworks Feature Group, which automatically stores these columns as floats

# Convert the 'relative_humidity_2m', 'weather_code' and 'cloud_cover' columns to float
float_columns = ['relative_humidity_2m', 'weather_code', 'cloud_cover']
weather_forecast_df[float_columns] = weather_forecast_df[float_columns].astype(float)
```
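To catch type mismatches like this before inserting, one option is to compare the DataFrame's dtypes against the types the Feature Group expects. The `EXPECTED_DTYPES` mapping and the `dtype_mismatches` helper below are hypothetical; the mapping is inferred from the comment above rather than read from Hopsworks:

```python
import pandas as pd

# Hypothetical expected dtypes, inferred from the float conversion above
EXPECTED_DTYPES = {
    "relative_humidity_2m": "float64",
    "weather_code": "float64",
    "cloud_cover": "float64",
}

def dtype_mismatches(df: pd.DataFrame, expected: dict[str, str]) -> dict[str, str]:
    """Return {column: actual dtype} for every column whose dtype differs from the expected one."""
    return {
        col: str(df[col].dtype)
        for col, dtype in expected.items()
        if str(df[col].dtype) != dtype
    }

# Two rows shaped like weather_forecast_df before the cast (integer columns)
raw = pd.DataFrame({"relative_humidity_2m": [90, 89], "weather_code": [3, 2], "cloud_cover": [82, 76]})
print(dtype_mismatches(raw, EXPECTED_DTYPES))  # lists all three integer columns

casted = raw.astype({col: float for col in EXPECTED_DTYPES})
print(dtype_mismatches(casted, EXPECTED_DTYPES))
# {}
```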

In [12]:
```python
# Display the first 5 rows of the weather_forecast dataframe
weather_forecast_df.head(5)
```

| | timestamp | date | time | temperature_2m | relative_humidity_2m | precipitation | rain | snowfall | weather_code | cloud_cover | wind_speed_10m | wind_gusts_10m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1714435200000 | 2024-04-30 | 2024-04-30 00:00:00 | 9.6 | 90.0 | 0.0 | 0.0 | 0.0 | 3.0 | 82.0 | 9.7 | 19.4 |
| 1 | 1714438800000 | 2024-04-30 | 2024-04-30 01:00:00 | 9.4 | 89.0 | 0.0 | 0.0 | 0.0 | 2.0 | 76.0 | 10.8 | 18.7 |
| 2 | 1714442400000 | 2024-04-30 | 2024-04-30 02:00:00 | 9.5 | 87.0 | 0.0 | 0.0 | 0.0 | 2.0 | 68.0 | 16.2 | 30.6 |
| 3 | 1714446000000 | 2024-04-30 | 2024-04-30 03:00:00 | 9.3 | 88.0 | 0.0 | 0.0 | 0.0 | 2.0 | 64.0 | 13.7 | 24.5 |
| 4 | 1714449600000 | 2024-04-30 | 2024-04-30 04:00:00 | 9.7 | 89.0 | 0.0 | 0.0 | 0.0 | 3.0 | 82.0 | 15.1 | 25.9 |


## <span style="color:#2656a3;"> 📡 Connecting to Hopsworks Feature Store</span>

We connect to the Hopsworks Feature Store to access the existing Feature Groups and upload the new data to them.
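Called without arguments, `hopsworks.login()` may ask for an API key interactively. For a feature pipeline that runs unattended, a sketch of reading the key from an environment variable and failing early if it is missing; the variable name `HOPSWORKS_API_KEY` and the helper `read_api_key` are our assumptions:

```python
import os

def read_api_key(env_var: str = "HOPSWORKS_API_KEY") -> str:
    """Return the API key from the environment, failing early with a clear message if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the pipeline")
    return key

# In a scheduled run, the key could then be passed to the login call explicitly, e.g.:
# project = hopsworks.login(api_key_value=read_api_key())
```

Failing before the login call makes a missing secret show up as one clear error instead of a pipeline that hangs waiting for input.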

In [13]:
```python
# Importing the hopsworks module
import hopsworks

# Logging in to the Hopsworks project
project = hopsworks.login()

# Getting the feature store from the project
fs = project.get_feature_store()
```

```text
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/550040
Connected. Call `.close()` to terminate connection gracefully.
```


In [14]:
```python
# Retrieve the feature groups
electricity_fg = fs.get_feature_group(
    name="electricity_prices",
    version=1,
)

forecast_renewable_energy_fg = fs.get_feature_group(
    name="forecast_renewable_energy",
    version=1,
)

weather_fg = fs.get_feature_group(
    name="weather_measurements",
    version=1,
)
```
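The three lookups follow the same pattern, so they could also be collected in a dictionary keyed by Feature Group name. A sketch, where `get_feature_groups` is a hypothetical helper that relies only on the `fs.get_feature_group(name=..., version=...)` call used above:

```python
from typing import Any, Dict, Iterable

def get_feature_groups(fs: Any, names: Iterable[str], version: int = 1) -> Dict[str, Any]:
    """Fetch several Feature Groups of the same version from a feature store object."""
    return {name: fs.get_feature_group(name=name, version=version) for name in names}

# Usage in this notebook (requires the `fs` object created above):
# feature_groups = get_feature_groups(
#     fs, ["electricity_prices", "forecast_renewable_energy", "weather_measurements"]
# )
# electricity_fg = feature_groups["electricity_prices"]
```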

### <span style="color:#2656a3;"> ⬆️ Uploading new data to the Feature Store</span>
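Here we insert the new data into the retrieved Feature Groups. Each `insert` call uploads the DataFrame and launches a job that materializes the data into the offline feature store, as the job outputs below show.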
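Before inserting, it can help to verify that the key column has no missing or duplicated values. A sketch, assuming the Feature Groups use `timestamp` as their primary key (the key is defined when the Feature Groups are created, which is not shown in this notebook); `check_before_insert` is a hypothetical helper:

```python
import pandas as pd

def check_before_insert(df: pd.DataFrame, key: str = "timestamp") -> None:
    """Raise ValueError if the assumed primary-key column has missing or duplicated values."""
    if df[key].isna().any():
        raise ValueError(f"Missing values in key column '{key}'")
    duplicated = df[key].duplicated()
    if duplicated.any():
        raise ValueError(f"{int(duplicated.sum())} duplicate value(s) in key column '{key}'")

# Two hourly rows shaped like electricity_df
sample = pd.DataFrame({
    "timestamp": [1714435200000, 1714438800000],
    "dk1_spotpricedkk_kwh": [0.53689, 0.48603],
})
check_before_insert(sample)  # passes silently

# In this notebook it would run right before each insert, e.g.:
# check_before_insert(electricity_df)
```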

In [15]:
```python
# Insert electricity_df into the electricity_prices feature group (electricity_fg)
electricity_fg.insert(electricity_df)
```

```text
Uploading Dataframe: 100.00% |██████████| Rows 24/24 | Elapsed Time: 00:06 | Remaining Time: 00:00

Launching job: electricity_prices_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/550040/jobs/named/electricity_prices_1_offline_fg_materialization/executions

(<hsfs.core.job.Job at 0x1359f1b50>, None)
```

In [18]:
```python
# Insert forecast_renewable_energy_df into the forecast_renewable_energy feature group (forecast_renewable_energy_fg)
forecast_renewable_energy_fg.insert(forecast_renewable_energy_df)
```

```text
Uploading Dataframe: 100.00% |██████████| Rows 24/24 | Elapsed Time: 00:06 | Remaining Time: 00:00

Launching job: forecast_renewable_energy_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/550040/jobs/named/forecast_renewable_energy_1_offline_fg_materialization/executions

(<hsfs.core.job.Job at 0x1359b8690>, None)
```

In [20]:
```python
# Insert weather_forecast_df into the weather_measurements feature group (weather_fg)
weather_fg.insert(weather_forecast_df)
```

```text
Uploading Dataframe: 100.00% |██████████| Rows 120/120 | Elapsed Time: 00:06 | Remaining Time: 00:00

Launching job: weather_measurements_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/550040/jobs/named/weather_measurements_1_offline_fg_materialization/executions

(<hsfs.core.job.Job at 0x137b46f90>, None)
```

```text
%3|1714459530.257|FAIL|rdkafka#producer-3| [thrd:ssl://3.147.195.6:9092/bootstrap]: ssl://3.147.195.6:9092/3: Receive failed: SSL transport error: Operation timed out (after 639539ms in state UP)
%3|1714459541.559|FAIL|rdkafka#producer-5| [thrd:ssl://18.191.189.169:9092/bootstrap]: ssl://18.191.189.169:9092/2: Receive failed: SSL transport error: Operation timed out (after 640512ms in state UP)
%3|1714459553.867|FAIL|rdkafka#producer-3| [thrd:ssl://52.14.0.221:9092/bootstrap]: ssl://52.14.0.221:9092/1: Receive failed: SSL transport error: Operation timed out (after 657135ms in state UP)
%3|1714459554.065|FAIL|rdkafka#producer-9| [thrd:ssl://52.14.0.221:9092/bootstrap]: ssl://52.14.0.221:9092/1: Receive failed: SSL transport error: Operation timed out (after 620095ms in state UP)
%3|1714459565.679|FAIL|rdkafka#producer-5| [thrd:ssl://52.14.0.221:9092/bootstrap]: ssl://52.14.0.221:9092/1: Receive failed: SSL transport error: Operation timed out (after 658600ms in state UP)
```

---
## <span style="color:#2656a3;">⏭️ **Next:** Part 03: Training </span>

Next, we will create a feature view and a training dataset. We will then train a model and save it in the model registry.