# <span style="font-weight:bold; font-size: 3rem; color:#2656a3;">**Data Engineering and Machine Learning Operations in Business** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 01: Feature Backfill</span>

## 🗒️ This notebook is divided into 3 sections:
1. Load the data and process features.
2. Connect to the Hopsworks feature store.
3. Create feature groups and upload them to the feature store.

## <span style='color:#2656a3'> ⚙️ Importing libraries and packages

In [None]:
# Install the hopsworks package (uncomment the line below if it is not installed yet)
# !pip install -U hopsworks --quiet

In [1]:
# Import the libraries used in this notebook
import pandas as pd
import requests
import hopsworks
import os

# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')



## <span style="color:#2656a3;"> 💽 Load the historical data

The data you will use comes from two different sources:

- Hourly electricity spot prices for Denmark (price area DK1) from [Energinet](https://www.energidataservice.dk).
- Different meteorological observations from [Open meteo](https://www.open-meteo.com).

### <span style="color:#2656a3;">💸 Hourly electricity prices from Energinet
*If tariffs are to be included in the model, we recommend applying a factor of 0.2 for the hours 22 - 16 and a factor of 0.6 or 0.7 for the hours 17 - 21.*
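
As a rough illustration of the tariff note above, the hour-based factor could be derived from a timestamp string as follows. This is only a sketch: the helper name `tariff_factor` and the default of 0.6 (rather than 0.7) for the 17 - 21 window are assumptions, and how the factor would be combined with the spot price is left open because the note does not say.

```python
from datetime import datetime

# Hypothetical helper, not part of the original notebook.
PEAK_HOURS = range(17, 22)  # hours 17 through 21, inclusive


def tariff_factor(timestamp, peak_factor=0.6, off_peak_factor=0.2):
    """Return the tariff factor for an ISO timestamp such as '2023-12-31T17:00'."""
    hour = datetime.fromisoformat(timestamp).hour
    return peak_factor if hour in PEAK_HOURS else off_peak_factor


sample_times = [
    "2023-12-31T16:00",
    "2023-12-31T17:00",
    "2023-12-31T21:00",
    "2023-12-31T22:00",
]
sample_factors = [tariff_factor(t) for t in sample_times]
print(sample_factors)  # [0.2, 0.6, 0.6, 0.2]
```

Once the `time` column of `electricity_df` has been trimmed to this `YYYY-MM-DDTHH:MM` format, the same function could be applied with `electricity_df['time'].map(tariff_factor)`.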

In [2]:
# Defining the URL for the API call to the electricity price data
electricity_api_url = ('https://api.energidataservice.dk/dataset/Elspotprices?offset=0&start=2022-01-01T00:00&end=2023-12-31T23:59&filter=%7B%22PriceArea%22:[%22DK1%22]%7D&sort=HourUTC%20DESC')

In [3]:
# Fetch the data from the API and convert the returned records to a pandas DataFrame
electricity_data = requests.get(electricity_api_url).json()
electricity_df = pd.DataFrame(electricity_data['records'])

In [None]:
# Display the first 5 rows of the dataframe
electricity_df.head()

Unnamed: 0,HourUTC,HourDK,PriceArea,SpotPriceDKK,SpotPriceEUR
0,2023-12-31T22:00:00,2023-12-31T23:00:00,DK1,200.309998,26.870001
1,2023-12-31T21:00:00,2023-12-31T22:00:00,DK1,213.729996,28.67
2,2023-12-31T20:00:00,2023-12-31T21:00:00,DK1,220.660004,29.6
3,2023-12-31T19:00:00,2023-12-31T20:00:00,DK1,260.100006,34.889999
4,2023-12-31T18:00:00,2023-12-31T19:00:00,DK1,295.51001,39.639999

In [None]:
# Preprocessing: convert the spot price from DKK per MWh to DKK per kWh
electricity_df['SpotPriceDKK_KWH'] = electricity_df['SpotPriceDKK'] / 1000

In [4]:
# Data cleaning: drop the columns that are no longer needed
electricity_df.drop(columns=['SpotPriceDKK', 'SpotPriceEUR', 'HourUTC'], inplace=True)

In [7]:
# Rename HourDK to time and trim the seconds ('2023-12-31T23:00:00' -> '2023-12-31T23:00')
electricity_df.rename(columns={'HourDK': 'time'}, inplace=True)
electricity_df['time'] = electricity_df['time'].astype(str).str[:-3]
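
The preprocessing cells above can be read as one transformation. The sketch below wraps the same steps in a hypothetical function, `preprocess_prices`, and runs it on two rows copied from the raw output shown earlier, so the effect of each step is easy to verify. It returns a new dataframe instead of modifying its input, which is a design choice of this sketch and not something the notebook itself does.

```python
import pandas as pd


def preprocess_prices(raw):
    """Apply the notebook's cleaning steps to a raw Elspotprices dataframe."""
    df = raw.copy()
    # DKK per MWh -> DKK per kWh
    df["SpotPriceDKK_KWH"] = df["SpotPriceDKK"] / 1000
    # Drop the columns that are not used as features
    df = df.drop(columns=["SpotPriceDKK", "SpotPriceEUR", "HourUTC"])
    # Use the Danish timestamp as the time column and trim the trailing ':00' seconds
    df = df.rename(columns={"HourDK": "time"})
    df["time"] = df["time"].astype(str).str[:-3]
    return df


# Two rows taken from the raw API output displayed above
raw_sample = pd.DataFrame(
    {
        "HourUTC": ["2023-12-31T22:00:00", "2023-12-31T21:00:00"],
        "HourDK": ["2023-12-31T23:00:00", "2023-12-31T22:00:00"],
        "PriceArea": ["DK1", "DK1"],
        "SpotPriceDKK": [200.309998, 213.729996],
        "SpotPriceEUR": [26.870001, 28.67],
    }
)

clean_sample = preprocess_prices(raw_sample)
print(clean_sample)
```

The resulting columns are `time`, `PriceArea` and `SpotPriceDKK_KWH`, matching the cleaned dataframe displayed in the next cell.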

In [9]:
# Display the first 5 rows of the dataframe
electricity_df.head()

Unnamed: 0,time,PriceArea,SpotPriceDKK_KWH
0,2023-12-31T23:00,DK1,0.20031
1,2023-12-31T22:00,DK1,0.21373
2,2023-12-31T21:00,DK1,0.22066
3,2023-12-31T20:00,DK1,0.26010
4,2023-12-31T19:00,DK1,0.29551
...,...,...,...
17515,2022-01-01T04:00,DK1,0.28013
17516,2022-01-01T03:00,DK1,0.33806
17517,2022-01-01T02:00,DK1,0.32141
17518,2022-01-01T01:00,DK1,0.30735


### <span style="color:#2656a3;"> 🌤 Meteorological measurements from Open Meteo

Note: the end date should be 2023-12-31, so that the weather data covers the same period as the electricity prices.

In [None]:
# Defining the URL for the API call to the weather data   
weather_api_url = ("https://archive-api.open-meteo.com/v1/archive?latitude=57.048&longitude=9.9187&start_date=2022-01-01&end_date=2023-12-31&hourly=temperature_2m,relative_humidity_2m,precipitation,rain,snowfall,weather_code,cloud_cover,wind_speed_10m,wind_gusts_10m")

In [None]:
# Fetch the data from the API and convert the hourly observations to a pandas DataFrame
weather_data = requests.get(weather_api_url).json()
weather_df = pd.DataFrame(weather_data['hourly'])
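
Because the weather and price data are later keyed on `time`, it can be worth checking that the hourly series has no gaps. The sketch below assumes the `hourly` object maps each variable name to a list with one entry per hour and that `time` uses the `YYYY-MM-DDTHH:MM` format. The helper `missing_hours` and the tiny sample payload are hypothetical, and only the `time` column is included to avoid inventing measurements.

```python
import pandas as pd


def missing_hours(df, start, end, time_column="time"):
    """Return the hourly timestamps between start and end that are absent from df."""
    expected = pd.date_range(start=start, end=end, freq=pd.Timedelta(hours=1))
    observed = pd.DatetimeIndex(pd.to_datetime(df[time_column]))
    return expected.difference(observed)


# Hand-built payload in the assumed shape, with the 02:00 observation left out
sample_payload = {
    "hourly": {
        "time": ["2022-01-01T00:00", "2022-01-01T01:00", "2022-01-01T03:00"],
    }
}
sample_weather_df = pd.DataFrame(sample_payload["hourly"])

gaps = missing_hours(sample_weather_df, "2022-01-01T00:00", "2022-01-01T03:00")
print(gaps)
```

For the full backfill period, `missing_hours(weather_df, "2022-01-01T00:00", "2023-12-31T23:00")` would be expected to return an empty index if all 17,520 hours are present.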

In [None]:
# Display the first 5 rows of the dataframe
weather_df.head()

## <span style="color:#2656a3;"> 📡 Connecting to Hopsworks Feature Store

In [None]:
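# Log in to Hopsworks (prompts for an API key if none is configured)
project = hopsworks.login()

# Get a handle to the project's feature store
fs = project.get_feature_store()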

## <span style="color:#2656a3;"> 🪄 Creating Feature Groups

In [None]:
# Creating the feature group for the weather data
weather_fg = fs.get_or_create_feature_group(
    name="weather_measurements",
    version=1,
    description="Weather measurements from Open Meteo API",
    primary_key=["time"],
    online_enabled=True,
)

In [None]:
# Inserting the weather_df into the feature group named weather_fg
weather_fg.insert(weather_df)

In [None]:
# Creating the feature group for the electricity prices
electricity_fg = fs.get_or_create_feature_group(
    name="electricity_prices",
    version=1,
    description="Electricity prices from Energidata API",
    primary_key=["time"],
    online_enabled=True,
)

In [None]:
# Inserting the electricity_df into the feature group named electricity_fg
electricity_fg.insert(electricity_df)
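
Both feature groups declare `time` as their primary key. The electricity `time` column is derived from `HourDK`, which appears to be Danish local time, so the hour that repeats when daylight saving time ends would produce two rows with the same key. A uniqueness check run before the `insert` calls can surface this. The sketch below uses a hypothetical helper, `duplicated_keys`, and a hand-built sample around the autumn 2022 clock change, not real API output.

```python
import pandas as pd


def duplicated_keys(df, key_columns):
    """Return all rows whose primary-key values occur more than once."""
    return df[df.duplicated(subset=key_columns, keep=False)]


# Illustrative timestamps only: 00:00 and 01:00 UTC both map to 02:00 local time
sample = pd.DataFrame(
    {
        "HourUTC": ["2022-10-29T23:00", "2022-10-30T00:00", "2022-10-30T01:00", "2022-10-30T02:00"],
        "time": ["2022-10-30T01:00", "2022-10-30T02:00", "2022-10-30T02:00", "2022-10-30T03:00"],
    }
)

clashes = duplicated_keys(sample, ["time"])
print(clashes)
```

If `duplicated_keys(electricity_df, ["time"])` returns any rows, one option would be to key the feature group on the UTC timestamp instead; that is a suggestion of this sketch, not something the notebook does.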

---
## <span style="color:#2656a3;">⏭️ **Next:** Part 02: Feature Pipeline </span>

In the next notebook, you will be generating new data for the Feature Groups.