# <span style="font-weight:bold; font-size: 3rem; color:#2656a3;">**MSc BDS - M7 Second Semester Project** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>

## <span style='color:#2656a3'> 🗒️ The notebook is divided into the following sections:
1. Parsing new data.
2. Inserting the new data into the Feature Store.

## <span style='color:#2656a3'> ⚙️ Import of libraries and packages

We start by moving up one directory to reach the `features` folder, which holds the functions we wrote for electricity prices and weather measures (including the live API calls and data preprocessing). We then import the libraries needed for this notebook and suppress warnings to keep the output clean.

In [1]:
# Move up one directory to reach the folder that holds our functions
%cd ..

# Import the functions from the features folder
# These are the functions we created to generate features for electricity prices and weather measures
from features import electricity_prices, weather_measures

# Move back into the pipeline folder
%cd pipeline

/Users/camillahannesbo/Documents/AAU/Master - BDS/2. semester/2. semester project/bds_m7_second-semester-project
/Users/camillahannesbo/Documents/AAU/Master - BDS/2. semester/2. semester project/bds_m7_second-semester-project/pipeline
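
An alternative to switching directories with `%cd` is to put the project root on `sys.path`. The sketch below assumes the notebook is started from the `pipeline` folder and that `features` sits one level above it; `project_root` is an illustrative name, not something defined in the project.

```python
import sys
from pathlib import Path

# Assumption: the current working directory is the pipeline folder,
# so its parent is the project root that contains the features package.
project_root = Path.cwd().parent

# Add the project root to the import search path only once
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# With the path in place, the import from the cell above would resolve
# without changing the working directory:
# from features import electricity_prices, weather_measures
```

This keeps the working directory unchanged, so relative file paths used later in the notebook are not affected by the import step.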


In [2]:
# Importing pandas for data handling
import pandas as pd

# Ignore all warnings (this also covers DeprecationWarning) to keep the output clean
import warnings
warnings.filterwarnings('ignore')

## <span style='color:#2656a3'> 🪄 Parsing New Data
To fetch the most recent (non-historical) electricity prices, we set `historical` to `False`.

To provide up-to-date weather measures, we fetch an hourly weather forecast for the next 5 days.

The calendar data does not change, so no new calendar data is retrieved.

### <span style="color:#2656a3;">💸 Electricity Prices per day from Energinet

In [3]:
# Fetching non-historical electricity prices for area DK1
electricity_df = electricity_prices.electricity_prices(
    historical=False,
    area=["DK1"]
)

In [4]:
# Display the electricity dataframe
electricity_df

| | timestamp | datetime | date | hour | dk1_spotpricedkk_kwh |
|---|---|---|---|---|---|
| 0 | 1716422400000 | 2024-05-23 00:00:00 | 2024-05-23 | 0 | 0.40434 |
| 1 | 1716426000000 | 2024-05-23 01:00:00 | 2024-05-23 | 1 | 0.43277 |
| 2 | 1716429600000 | 2024-05-23 02:00:00 | 2024-05-23 | 2 | 0.47724 |
| 3 | 1716433200000 | 2024-05-23 03:00:00 | 2024-05-23 | 3 | 0.51716 |
| 4 | 1716436800000 | 2024-05-23 04:00:00 | 2024-05-23 | 4 | 0.51418 |
| 5 | 1716440400000 | 2024-05-23 05:00:00 | 2024-05-23 | 5 | 0.55947 |
| 6 | 1716444000000 | 2024-05-23 06:00:00 | 2024-05-23 | 6 | 0.8057 |
| 7 | 1716447600000 | 2024-05-23 07:00:00 | 2024-05-23 | 7 | 0.88047 |
| 8 | 1716451200000 | 2024-05-23 08:00:00 | 2024-05-23 | 8 | 0.73004 |
| 9 | 1716454800000 | 2024-05-23 09:00:00 | 2024-05-23 | 9 | 0.56447 |


In [5]:
electricity_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   timestamp             24 non-null     int64         
 1   datetime              24 non-null     datetime64[ns]
 2   date                  24 non-null     datetime64[ns]
 3   hour                  24 non-null     int64         
 4   dk1_spotpricedkk_kwh  24 non-null     float64       
dtypes: datetime64[ns](2), float64(1), int64(2)
memory usage: 1.1 KB
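
Judging from the displayed rows, `timestamp` looks like `datetime` expressed as Unix epoch milliseconds, with `date` and `hour` derived from the same column. The sketch below rebuilds those columns for three of the displayed rows; it assumes the `datetime` values are timezone-naive UTC and is not the code inside `electricity_prices`.

```python
import pandas as pd

# First three datetime values copied from the displayed electricity dataframe
sample = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-05-23 00:00:00", "2024-05-23 01:00:00", "2024-05-23 02:00:00"]
        )
    }
)

# Assumption: naive datetimes are UTC, so epoch milliseconds are the
# offset from 1970-01-01 divided into whole milliseconds.
epoch = pd.Timestamp("1970-01-01")
sample["timestamp"] = (sample["datetime"] - epoch) // pd.Timedelta(milliseconds=1)

# Midnight of the same day, and the hour of day as an integer
sample["date"] = sample["datetime"].dt.normalize()
sample["hour"] = sample["datetime"].dt.hour

print(sample)
```

The resulting `timestamp` values for these rows agree with the ones in the notebook output, which supports the epoch-millisecond reading.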


### <span style="color:#2656a3;"> 🌈 Forecast Weather Measures from Open Meteo

In [6]:
# Fetching weather forecast measures for the next 5 days
weather_forecast_df = weather_measures.forecast_weather_measures(
    forecast_length=5
)

In [7]:
# Display the weather forecast dataframe
weather_forecast_df

| | timestamp | datetime | date | hour | temperature_2m | relative_humidity_2m | precipitation | rain | snowfall | weather_code | cloud_cover | wind_speed_10m | wind_gusts_10m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1716422400000 | 2024-05-23 00:00:00 | 2024-05-23 | 0 | 18.3 | 70.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 19.4 | 33.8 |
| 1 | 1716426000000 | 2024-05-23 01:00:00 | 2024-05-23 | 1 | 18.2 | 70.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 19.8 | 34.2 |
| 2 | 1716429600000 | 2024-05-23 02:00:00 | 2024-05-23 | 2 | 17.6 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 20.5 | 36.0 |
| 3 | 1716433200000 | 2024-05-23 03:00:00 | 2024-05-23 | 3 | 17.1 | 69.0 | 0.0 | 0.0 | 0.0 | 2.0 | 78.0 | 20.2 | 36.7 |
| 4 | 1716436800000 | 2024-05-23 04:00:00 | 2024-05-23 | 4 | 16.6 | 69.0 | 0.0 | 0.0 | 0.0 | 1.0 | 24.0 | 21.2 | 36.7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | 1716836400000 | 2024-05-27 19:00:00 | 2024-05-27 | 19 | 17.4 | 89.0 | 0.5 | 0.0 | 0.0 | 80.0 | 81.0 | 9.4 | 23.0 |
| 116 | 1716840000000 | 2024-05-27 20:00:00 | 2024-05-27 | 20 | 16.7 | 92.0 | 0.5 | 0.0 | 0.0 | 80.0 | 90.0 | 8.4 | 21.6 |
| 117 | 1716843600000 | 2024-05-27 21:00:00 | 2024-05-27 | 21 | 16.2 | 93.0 | 0.5 | 0.0 | 0.0 | 80.0 | 100.0 | 7.9 | 20.5 |
| 118 | 1716847200000 | 2024-05-27 22:00:00 | 2024-05-27 | 22 | 15.8 | 93.0 | 0.1 | 0.1 | 0.0 | 80.0 | 100.0 | 8.1 | 20.9 |


In [8]:
weather_forecast_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   timestamp             120 non-null    int64         
 1   datetime              120 non-null    datetime64[ns]
 2   date                  120 non-null    datetime64[ns]
 3   hour                  120 non-null    int64         
 4   temperature_2m        120 non-null    float64       
 5   relative_humidity_2m  120 non-null    float64       
 6   precipitation         120 non-null    float64       
 7   rain                  120 non-null    float64       
 8   snowfall              120 non-null    float64       
 9   weather_code          120 non-null    float64       
 10  cloud_cover           120 non-null    float64       
 11  wind_speed_10m        120 non-null    float64       
 12  wind_gusts_10m        120 non-null    float64       
dtypes: datetime64[ns](2), float64(9), int64(2)
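
A 5-day hourly forecast should contain 5 × 24 = 120 rows, which matches the `RangeIndex` in the output. A small check along these lines can catch gaps or duplicates before the data is uploaded; `check_hourly_frame` is a hypothetical helper, and the frame built here is synthetic stand-in data rather than the real forecast.

```python
import pandas as pd


def check_hourly_frame(df: pd.DataFrame, expected_days: int) -> list[str]:
    """Return a list of problems found; an empty list means the frame looks consistent."""
    problems = []

    # One row per hour for each requested day
    expected_rows = expected_days * 24
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")

    if df["datetime"].duplicated().any():
        problems.append("duplicate datetime values")

    # Consecutive rows should be exactly one hour apart
    steps = df["datetime"].sort_values().diff().dropna()
    if not (steps == pd.Timedelta(hours=1)).all():
        problems.append("datetime values are not evenly spaced by one hour")

    if df.isna().any().any():
        problems.append("missing values present")

    return problems


# Synthetic stand-in covering the same time span as the forecast output
demo = pd.DataFrame(
    {"datetime": pd.date_range("2024-05-23", periods=120, freq=pd.Timedelta(hours=1))}
)
print(check_hourly_frame(demo, expected_days=5))
```

With the real data, the call would be `check_hourly_frame(weather_forecast_df, expected_days=5)`, and the same helper with `expected_days=1` would fit the 24-row electricity frame.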

## <span style="color:#2656a3;"> 📡 Connecting to Hopsworks Feature Store

We connect to the Hopsworks Feature Store to retrieve the Feature Groups and upload the new data to them.

In [9]:
# Importing the hopsworks module for interacting with the Hopsworks platform
import hopsworks

# Logging into the Hopsworks project
project = hopsworks.login()

# Getting the feature store from the project
fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/550040
Connected. Call `.close()` to terminate connection gracefully.


In [10]:
# Retrieve the feature groups
electricity_fg = fs.get_feature_group(
    name="electricity_prices",
    version=1,
)

weather_fg = fs.get_feature_group(
    name="weather_measurements",
    version=1,
)

### <span style="color:#2656a3;"> ⬆️ Uploading new data to the Feature Store
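Here we upload the new data to the retrieved Feature Groups using the `insert` method. Passing `write_options={"wait_for_job": False}` means the call does not wait for the materialization job to finish; it returns once the job has been launched.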

In [11]:
# Inserting the electricity_df into the feature group named electricity_fg
electricity_fg.insert(electricity_df, 
                      write_options={"wait_for_job" : False})

Uploading Dataframe: 100.00% |██████████| Rows 24/24 | Elapsed Time: 00:06 | Remaining Time: 00:00


Launching job: electricity_prices_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/550040/jobs/named/electricity_prices_1_offline_fg_materialization/executions


(<hsfs.core.job.Job at 0x12e15b1d0>, None)

In [13]:
# Inserting the weather_forecast_df into the feature group named weather_fg
weather_fg.insert(weather_forecast_df, 
                  write_options={"wait_for_job" : False})

Uploading Dataframe: 100.00% |██████████| Rows 120/120 | Elapsed Time: 00:06 | Remaining Time: 00:00


Launching job: weather_measurements_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/550040/jobs/named/weather_measurements_1_offline_fg_materialization/executions


(<hsfs.core.job.Job at 0x12e1a3710>, None)
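
Both uploads follow the same pattern, so they could be wrapped in one small helper. The sketch below is only an illustration: `insert_all` and `FakeFeatureGroup` are hypothetical names, and the fake class stands in for a real Feature Group so the example runs without a Hopsworks connection. The only call taken from the notebook is `insert(df, write_options={"wait_for_job": ...})`.

```python
def insert_all(pairs, wait_for_job=False):
    """Insert each dataframe into its feature group and collect what insert returns."""
    results = {}
    for name, (feature_group, frame) in pairs.items():
        results[name] = feature_group.insert(
            frame, write_options={"wait_for_job": wait_for_job}
        )
    return results


class FakeFeatureGroup:
    """Hypothetical stand-in so the sketch runs without a Hopsworks connection."""

    def __init__(self):
        self.calls = []

    def insert(self, frame, write_options=None):
        # Record the call so it can be inspected afterwards
        self.calls.append((frame, write_options))
        # The notebook output shows insert returning a two-element tuple
        # whose first element is a job object; a placeholder is used here.
        return ("job-placeholder", None)


fake_electricity_fg = FakeFeatureGroup()
fake_weather_fg = FakeFeatureGroup()

results = insert_all(
    {
        "electricity_prices": (fake_electricity_fg, [{"hour": 0}]),
        "weather_measurements": (fake_weather_fg, [{"hour": 0}]),
    }
)
print(results)
```

With the real objects, the mapping would pair `electricity_fg` with `electricity_df` and `weather_fg` with `weather_forecast_df`.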

---
## <span style="color:#2656a3;">⏭️ **Next:** Part 03: Training </span>

Next, we will create a feature view and a training dataset. We will then train a model and save it in the model registry.