# Electricity Price Prediction - Backfill Pipeline

## 🗒️ Overview
This notebook collects historical data for electricity price prediction in Stockholm (SE3):
1. **Electricity prices** from the elprisetjustnu.se API (hourly data from November 2022 onward)
2. **Weather data** from the Open-Meteo API (hourly historical data)

The goal is to train a model that predicts electricity prices for each hour of the next day.

### Weather Features Selected
We use weather features that affect electricity supply and demand:
- **Temperature** → heating/cooling demand
- **Wind speed** (10m & 100m) → wind power generation
- **Cloud cover** → solar power generation
- **Precipitation** → hydro power availability

In [23]:
from pathlib import Path
import sys
import pandas as pd
from datetime import date, timedelta
import warnings
warnings.filterwarnings("ignore")
import holidays

from dotenv import load_dotenv
import hopsworks

# 1. Find project root (one level up from notebooks/)
root_dir = Path("..").resolve()

# 2. Add project root to PYTHONPATH so we can import the src package
if str(root_dir) not in sys.path:
    sys.path.append(str(root_dir))

# 3. Load .env from project root
env_path = root_dir / ".env"
load_dotenv(env_path)

# 4. Load settings and utility functions (after adjusting PYTHONPATH)
from src.config import ElectricitySettings
from src import util

settings = ElectricitySettings()

# 5. Log in to Hopsworks and get feature store
project = hopsworks.login(engine="python")
fs = project.get_feature_store()

print("Successfully logged in to Hopsworks project:", settings.HOPSWORKS_PROJECT)
print(f"Feature Store: {fs}")

# Show the weather variables we'll be using
print(f"\nWeather variables: {util.HOURLY_WEATHER_VARIABLES}")

ElectricitySettings initialized
2025-12-12 20:49:12,585 INFO: Closing external client and cleaning up certificates.
2025-12-12 20:49:12,588 INFO: Connection closed.
2025-12-12 20:49:12,589 INFO: Initializing external client
2025-12-12 20:49:12,589 INFO: Base URL: https://eu-west.cloud.hopsworks.ai:443
2025-12-12 20:49:13,573 INFO: Python Engine initialized.

Logged in to project, explore it here https://eu-west.cloud.hopsworks.ai:443/p/127
Successfully logged in to Hopsworks project: ScalableProject
Feature Store: <hsfs.feature_store.FeatureStore object at 0x3061b2250>

Weather variables: ['temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure']


## ‚öôÔ∏è Configuration

Define the price area and date range for historical data collection.


In [24]:
# Configuration
PRICE_AREA = "SE3"  # Stockholm / Södra Mellansverige
CITY = "Stockholm"
LATITUDE = 59.3251   # Stockholm coordinates
LONGITUDE = 18.0711

#LATITUDE, LONGITUDE = util.get_city_coordinates(CITY)

# Historical data range
# Electricity prices available from Nov 1, 2022
START_DATE = date(2022, 11, 1)
#START_DATE = date(2025, 12, 10)
END_DATE = date.today()  

print(f"Price Area: {PRICE_AREA}")
print(f"City: {CITY} ({LATITUDE}, {LONGITUDE})")
print(f"Date range: {START_DATE} to {END_DATE}")
print(f"Total days to fetch: {(END_DATE - START_DATE).days + 1}")


Price Area: SE3
City: Stockholm (59.3251, 18.0711)
Date range: 2022-11-01 to 2025-12-12
Total days to fetch: 1138


## ⚡ Step 1: Fetch Historical Electricity Prices

Using the elprisetjustnu.se API to get hourly electricity prices for Stockholm (SE3).
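
As a rough sketch of what a day-by-day fetch like this involves, the snippet below builds one request URL per calendar day. It assumes the publicly documented elprisetjustnu.se URL pattern (`/api/v1/prices/{year}/{month}-{day}_{area}.json`). `daily_price_urls` is a hypothetical helper for illustration, not the implementation of `util.fetch_electricity_prices()`, which this notebook does not show.

```python
from datetime import date, timedelta

# Assumed URL pattern of the elprisetjustnu.se price API (one JSON file per day and area).
BASE_URL = "https://www.elprisetjustnu.se/api/v1/prices"


def daily_price_urls(start: date, end: date, price_area: str) -> list[str]:
    """Return one URL per calendar day in [start, end], inclusive (hypothetical helper)."""
    n_days = (end - start).days + 1
    urls = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        # Year goes in the path, month and day in the file name
        urls.append(f"{BASE_URL}/{day.year}/{day:%m-%d}_{price_area}.json")
    return urls


urls = daily_price_urls(date(2022, 11, 1), date(2022, 11, 3), "SE3")
print(urls[0])
```

One request per day is consistent with the progress bar in the output of this step, which counts 1138 iterations for 1138 days.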


In [25]:
# Using fetch_electricity_prices() from util.py
df_prices = util.fetch_electricity_prices(START_DATE, END_DATE, PRICE_AREA)
#df_prices = util.align_electricity_price_schema(df_prices)

# Ensure timezone-aware datetime and unix_time; keep only 'date'
df_prices['date'] = pd.to_datetime(df_prices['timestamp'], utc=True)
df_prices['unix_time'] = df_prices['date'].astype('int64') // 10**6
df_prices = df_prices.drop(columns=['timestamp'])

# Remove unused currency columns if present
df_prices = df_prices.drop(columns=['price_eur', 'exchange_rate'], errors='ignore')

# Use price area label consistently
df_prices['price_area'] = PRICE_AREA.lower()



Fetching electricity prices from 2022-11-01 to 2025-12-12 for SE3...


Fetching prices: 100%|██████████| 1138/1138 [10:09<00:00,  1.87it/s]

Fetched 27299 hourly price records across 1138 day(s)





In [26]:
df_prices['price_area'] = df_prices['price_area'].astype('string')
df_prices.head()

|   | date | hour | price_area | price_sek | unix_time |
|---|------|------|------------|-----------|-----------|
| 0 | 2022-10-31 23:00:00+00:00 | 23 | se3 | 0.37995 | 1667257200000 |
| 1 | 2022-11-01 00:00:00+00:00 | 0 | se3 | 0.37995 | 1667260800000 |
| 2 | 2022-11-01 01:00:00+00:00 | 1 | se3 | 0.3843 | 1667264400000 |
| 3 | 2022-11-01 02:00:00+00:00 | 2 | se3 | 0.39301 | 1667268000000 |
| 4 | 2022-11-01 03:00:00+00:00 | 3 | se3 | 0.41173 | 1667271600000 |


In [27]:
# Check the electricity prices data
print(f"Shape: {df_prices.shape}")
print(f"\nDate range: {df_prices['date'].min()} to {df_prices['date'].max()}")
print(f"\nColumn types:")
df_prices.info()


Shape: (27299, 5)

Date range: 2022-10-31 23:00:00+00:00 to 2025-12-12 22:00:00+00:00

Column types:
<class 'pandas.core.frame.DataFrame'>
Index: 27299 entries, 0 to 27311
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype              
---  ------      --------------  -----              
 0   date        27299 non-null  datetime64[ns, UTC]
 1   hour        27299 non-null  int16              
 2   price_area  27299 non-null  string             
 3   price_sek   27299 non-null  float32            
 4   unix_time   27299 non-null  int64              
dtypes: datetime64[ns, UTC](1), float32(1), int16(1), int64(1), string(1)
memory usage: 1013.0 KB


## 🌦 Step 2: Fetch Historical Weather Data

We use the Open-Meteo API to fetch hourly weather data that may correlate with electricity prices:
- Temperature affects heating/cooling demand
- Wind speed affects wind power generation
- Cloud cover affects solar power generation
- Precipitation can affect hydro power
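
To make the request behind these features concrete, here is a minimal sketch that assembles a query URL for Open-Meteo's historical archive endpoint. The endpoint and parameter names follow Open-Meteo's public documentation as I understand it. `build_archive_url` is a hypothetical helper, and the real `util.get_hourly_historical_weather()` may use a client library instead.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Open-Meteo historical weather (archive) API.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date, hourly_variables, timezone="UTC"):
    """Build a query URL for hourly historical weather (hypothetical helper, no network access)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,   # ISO date string, e.g. "2022-10-31"
        "end_date": end_date,
        "hourly": ",".join(hourly_variables),  # comma-separated variable names
        "timezone": timezone,
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


url = build_archive_url(
    59.3251, 18.0711, "2022-10-31", "2025-12-12",
    ["temperature_2m", "wind_speed_100m", "cloud_cover", "precipitation"],
)
print(url)
```

Requesting timestamps in UTC keeps the weather rows aligned with the UTC `date` column used for the price data.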


In [28]:
# Using get_hourly_historical_weather() from util.py
# Fetch hourly weather data for the date range

df_weather = util.get_hourly_historical_weather(
    latitude=LATITUDE,
    longitude=LONGITUDE, 
    start_date=str(pd.to_datetime(df_prices['date'].min()).date()),
    end_date=str(END_DATE),
    city=PRICE_AREA.lower()
)

# Align label with price area naming
if 'city' in df_weather.columns:
    df_weather.rename(columns={'city': 'price_area'}, inplace=True)

df_weather.head()


Fetching historical weather for se3 (59.3251, 18.0711)...
Date range: 2022-10-31 to 2025-12-12
Coordinates: 59.29701232910156°N 18.163265228271484°E
Elevation: 23.0 m asl
Fetched 27336 hourly weather records


|   | timestamp | temperature_2m | apparent_temperature | precipitation | rain | snowfall | cloud_cover | wind_speed_10m | wind_speed_100m | wind_direction_10m | wind_direction_100m | wind_gusts_10m | surface_pressure | price_area | date | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-30 23:00:00+00:00 | 4.0215 | 1.344135 | 0.0 | 0.0 | 0.0 | 10.0 | 8.161764 | 16.179985 | 228.576431 | 249.145462 | 16.559999 | 1014.421021 | se3 | 2022-10-30 | 23 |
| 1 | 2022-10-31 00:00:00+00:00 | 3.8215 | 1.034508 | 0.0 | 0.0 | 0.0 | 12.0 | 8.913181 | 17.317459 | 226.636536 | 249.304459 | 12.599999 | 1014.518616 | se3 | 2022-10-31 | 0 |
| 2 | 2022-10-31 01:00:00+00:00 | 3.7215 | 0.920544 | 0.0 | 0.0 | 0.0 | 50.0 | 8.942214 | 18.775303 | 220.100845 | 237.528824 | 16.919998 | 1014.617371 | se3 | 2022-10-31 | 1 |
| 3 | 2022-10-31 02:00:00+00:00 | 3.6715 | 0.987128 | 0.0 | 0.0 | 0.0 | 100.0 | 8.209263 | 18.374111 | 195.255173 | 214.624222 | 15.48 | 1014.317627 | se3 | 2022-10-31 | 2 |
| 4 | 2022-10-31 03:00:00+00:00 | 4.4715 | 1.743104 | 0.0 | 0.0 | 0.0 | 100.0 | 9.605998 | 21.578989 | 192.994614 | 207.847488 | 17.639999 | 1014.425537 | se3 | 2022-10-31 | 3 |


In [29]:
# Check the weather data
print(f"Shape: {df_weather.shape}")
print(f"\nDate range: {df_weather['date'].min()} to {df_weather['date'].max()}")
print(f"\nWeather features: {[c for c in df_weather.columns if c not in ['price_area', 'date', 'hour']]}")


Shape: (27336, 16)

Date range: 2022-10-30 to 2025-12-12

Weather features: ['timestamp', 'temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure']


In [30]:
# Show weather statistics
print("\nTemperature statistics (°C):")
print(df_weather['temperature_2m'].describe())
print("\nWind speed at 100m (km/h):")
print(df_weather['wind_speed_100m'].describe())



Temperature statistics (°C):
count    27336.000000
mean         7.601556
std          8.072303
min        -17.528500
25%          1.321500
50%          6.871500
75%         14.471499
max         27.321501
Name: temperature_2m, dtype: float64

Wind speed at 100m (km/h):
count    27336.000000
mean        21.964926
std          9.140295
min          0.360000
25%         15.701066
50%         21.434364
75%         27.595825
max         69.316826
Name: wind_speed_100m, dtype: float64


## 🔧 Step 3: Data Processing

Clean and prepare the data for the feature store.
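
The price cell in this step builds its lag features with a row-based `shift(lag)`, which equals "24 hours ago" only when every hour is present. The price frame from Step 1 has 27,299 rows while its index labels run to 27311, which suggests some hours may be absent. The sketch below shows a timestamp-based alternative. `add_time_based_lags` is a hypothetical helper, not part of `util`, and it assumes unique timestamps for a single price area.

```python
import pandas as pd


def add_time_based_lags(df, lags_hours=(24, 48, 72), time_col="date", value_col="price_sek"):
    """Add lag columns looked up by timestamp rather than by row position.

    Assumes `time_col` holds unique timestamps (a single price area).
    """
    out = df.copy()
    by_time = out.set_index(time_col)[value_col]
    for lag in lags_hours:
        wanted = pd.DatetimeIndex(out[time_col] - pd.Timedelta(hours=lag))
        # reindex yields NaN wherever the lagged timestamp is absent from the data
        out[f"{value_col}_lag_{lag}h"] = by_time.reindex(wanted).to_numpy()
    return out


# Toy hourly series with one missing hour (03:00 on the first day)
times = pd.date_range("2024-01-01", periods=30, freq=pd.Timedelta(hours=1), tz="UTC")
toy = pd.DataFrame({"date": times, "price_sek": [float(i) for i in range(30)]})
toy = toy.drop(index=3).reset_index(drop=True)

lagged = add_time_based_lags(toy, lags_hours=(24,))
row_based = toy["price_sek"].shift(24)  # for comparison: off by one hour after the gap
print(lagged.tail(4))
```

With the gap in place, the row-based shift pairs 02:00 on the second day with 01:00 on the first day, while the timestamp lookup pairs it with 02:00 and leaves 03:00 as missing.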


In [31]:
# The utility functions already handle type conversions and cleaning
# Ensure unix_time exists for primary key
if 'unix_time' not in df_prices.columns:
    df_prices['unix_time'] = pd.to_datetime(df_prices['date'], utc=True).astype('int64') // 10**6

# Sort for lag/rolling calculations
df_prices = df_prices.sort_values(['price_area', 'unix_time'])

# Calendar features
df_prices['weekday'] = df_prices['date'].dt.weekday.astype('int8')
df_prices['is_weekend'] = df_prices['weekday'].isin([5, 6]).astype('int8')
df_prices['month'] = df_prices['date'].dt.month.astype('int8')
# 0=winter, 1=spring, 2=summer, 3=autumn
season_map = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}
df_prices['season'] = df_prices['month'].map(season_map).astype('int8')

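# Holidays (Sweden), matched on the UTC calendar date of each row
try:
    years = range(df_prices['date'].dt.year.min(), df_prices['date'].dt.year.max() + 1)
    se_holidays = holidays.Sweden(years=years)
    df_prices['is_holiday'] = df_prices['date'].dt.date.isin(se_holidays).astype('int8')
except Exception as exc:
    # `holidays` is imported at the top of the notebook, so this only triggers if the
    # lookup itself fails; report it instead of silently defaulting to 0
    print(f"Holiday lookup failed ({exc}); defaulting is_holiday to 0")
    df_prices['is_holiday'] = pd.Series(0, index=df_prices.index, dtype='int8')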

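# Lagged prices and rolling mean (72-row window, i.e. 72h when no hours are missing).
# Note: shift() and rolling() are row-based, and the rolling window includes the current hour's price.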
for lag in [24, 48, 72]:
    df_prices[f'price_lag_{lag}'] = (
        df_prices.groupby('price_area')['price_sek']
                 .shift(lag)
                 .astype('float32')
    )

df_prices['price_roll3d'] = (
    df_prices.groupby('price_area')['price_sek']
             .rolling(72, min_periods=1)
             .mean()
             .reset_index(level=0, drop=True)
             .astype('float32')
)

# Drop rows with missing values after lag/holiday computation
df_prices = df_prices.dropna().reset_index(drop=True)

print("Electricity prices ready for feature store:")
print(f"  Shape: {df_prices.shape}")
print(f"  Columns: {list(df_prices.columns)}")
df_prices.info()


Electricity prices ready for feature store:
  Shape: (27227, 14)
  Columns: ['date', 'hour', 'price_area', 'price_sek', 'unix_time', 'weekday', 'is_weekend', 'month', 'season', 'is_holiday', 'price_lag_24', 'price_lag_48', 'price_lag_72', 'price_roll3d']
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27227 entries, 0 to 27226
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype              
---  ------        --------------  -----              
 0   date          27227 non-null  datetime64[ns, UTC]
 1   hour          27227 non-null  int16              
 2   price_area    27227 non-null  string             
 3   price_sek     27227 non-null  float32            
 4   unix_time     27227 non-null  int64              
 5   weekday       27227 non-null  int8               
 6   is_weekend    27227 non-null  int8               
 7   month         27227 non-null  int8               
 8   season        27227 non-null  int8               
 9   is_holiday    27227 non-nu

In [32]:
# The utility functions already handle type conversions and cleaning
# Ensure timezone-aware datetime columns; keep only 'date'
df_weather['date'] = pd.to_datetime(df_weather['timestamp'], utc=True)
df_weather['unix_time'] = df_weather['date'].astype('int64') // 10**6
df_weather = df_weather.drop(columns=['timestamp'])
df_weather['price_area'] = df_weather['price_area'].astype('string')
if 'city' in df_weather.columns:
    df_weather = df_weather.drop(columns=['city'])

# Calendar features
df_weather['weekday'] = df_weather['date'].dt.weekday.astype('int8')
df_weather['is_weekend'] = df_weather['weekday'].isin([5, 6]).astype('int8')
df_weather['month'] = df_weather['date'].dt.month.astype('int8')
season_map = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}
df_weather['season'] = df_weather['month'].map(season_map).astype('int8')
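try:
    # `holidays` is already imported at the top of the notebook
    years = range(df_weather['date'].dt.year.min(), df_weather['date'].dt.year.max() + 1)
    se_holidays = holidays.Sweden(years=years)
    df_weather['is_holiday'] = df_weather['date'].dt.date.isin(se_holidays).astype('int8')
except Exception as exc:
    # Report the failure instead of silently defaulting to 0
    print(f"Holiday lookup failed ({exc}); defaulting is_holiday to 0")
    df_weather['is_holiday'] = pd.Series(0, index=df_weather.index, dtype='int8')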

# Drop rows with missing values
df_weather = df_weather.dropna().reset_index(drop=True)

print("Weather data ready for feature store:")
print(f"  Shape: {df_weather.shape}")
print(f"  Columns: {list(df_weather.columns)}")
df_weather.info()


Weather data ready for feature store:
  Shape: (27336, 21)
  Columns: ['temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure', 'price_area', 'date', 'hour', 'unix_time', 'weekday', 'is_weekend', 'month', 'season', 'is_holiday']
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27336 entries, 0 to 27335
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype              
---  ------                --------------  -----              
 0   temperature_2m        27336 non-null  float32            
 1   apparent_temperature  27336 non-null  float32            
 2   precipitation         27336 non-null  float32            
 3   rain                  27336 non-null  float32            
 4   snowfall              27336 non-null  float32            
 5   cloud_cover           27336 non-null  float32            
 

In [33]:
# Check for missing values
print("Missing values in electricity prices:")
print(df_prices.isnull().sum())
print(f"\n{'='*50}\n")
print("Missing values in weather data:")
print(df_weather.isnull().sum())


Missing values in electricity prices:
date            0
hour            0
price_area      0
price_sek       0
unix_time       0
weekday         0
is_weekend      0
month           0
season          0
is_holiday      0
price_lag_24    0
price_lag_48    0
price_lag_72    0
price_roll3d    0
dtype: int64


Missing values in weather data:
temperature_2m          0
apparent_temperature    0
precipitation           0
rain                    0
snowfall                0
cloud_cover             0
wind_speed_10m          0
wind_speed_100m         0
wind_direction_10m      0
wind_direction_100m     0
wind_gusts_10m          0
surface_pressure        0
price_area              0
date                    0
hour                    0
unix_time               0
weekday                 0
is_weekend              0
month                   0
season                  0
is_holiday              0
dtype: int64


## ✅ Step 4: Data Validation

Define validation rules using Great Expectations to ensure data quality.
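
The suites in this step mix two kinds of checks: `expect_column_min_to_be_between`, which looks only at the column's minimum, and `expect_column_values_to_be_between`, which looks at every row. The plain-pandas sketch below illustrates the difference. It is not Great Expectations code, and both helper names are hypothetical.

```python
import pandas as pd


def column_min_is_between(values: pd.Series, min_value: float, max_value: float) -> bool:
    """Aggregate check: only the column minimum has to lie in [min_value, max_value]."""
    return bool(min_value <= values.min() <= max_value)


def all_values_are_between(values: pd.Series, min_value: float, max_value: float) -> bool:
    """Row-level check: every value has to lie in [min_value, max_value]."""
    return bool(values.between(min_value, max_value).all())


# Toy data with one implausible spike
prices = pd.Series([0.38, 0.41, 120.0])

min_check = column_min_is_between(prices, -5.0, 50.0)
row_check = all_values_are_between(prices, -5.0, 50.0)
print(f"min-based check passes: {min_check}, row-level check passes: {row_check}")
```

In this toy case the spike of 120.0 passes the minimum-based check but fails the row-level one, so a minimum-based expectation guards against implausibly low values only.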


In [34]:
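# unix_time (ms, used in the feature group primary key) was already added during
# processing; this guard only recomputes it if the column is missing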
if 'unix_time' not in df_weather.columns:
    df_weather['unix_time'] = pd.to_datetime(df_weather['date'], utc=True).astype('int64') // 10**6

print("Added unix_time to weather data:")
print(f"  Columns: {list(df_weather.columns)}")
print(df_weather[['date', 'unix_time']].head())


Added unix_time to weather data:
  Columns: ['temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure', 'price_area', 'date', 'hour', 'unix_time', 'weekday', 'is_weekend', 'month', 'season', 'is_holiday']
                       date      unix_time
0 2022-10-30 23:00:00+00:00  1667170800000
1 2022-10-31 00:00:00+00:00  1667174400000
2 2022-10-31 01:00:00+00:00  1667178000000
3 2022-10-31 02:00:00+00:00  1667181600000
4 2022-10-31 03:00:00+00:00  1667185200000


In [35]:
import great_expectations as ge

# Expectation suite for electricity prices
price_expectation_suite = ge.core.ExpectationSuite(
    expectation_suite_name="electricity_price_expectations"
)

# Sanity-check the column minimum: prices can occasionally be negative, so the
# minimum may be as low as -5 SEK/kWh. This expectation constrains only the
# minimum; it does not cap individual prices at max_value.
price_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_min_to_be_between",
        kwargs={
            "column": "price_sek",
            "min_value": -5.0,  # Lowest acceptable column minimum
            "max_value": 50.0,  # Highest acceptable column minimum
            "strict_min": False
        }
    )
)

# Hour should be between 0 and 23
price_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "hour",
            "min_value": 0,
            "max_value": 23
        }
    )
)

print("Price expectation suite created")


Price expectation suite created


In [36]:
# Expectation suite for weather data
weather_expectation_suite = ge.core.ExpectationSuite(
    expectation_suite_name="weather_expectations"
)

# Temperature should stay within a plausible range (°C)
weather_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "temperature_2m",
            "min_value": -20.0,
            "max_value": 40.0
        }
    )
)

# Minimum wind speed should be non-negative (with a small tolerance below zero)
weather_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_min_to_be_between",
        kwargs={
            "column": "wind_speed_10m",
            "min_value": -0.1,
            "max_value": 200.0,  # Upper bound on the column minimum, not on individual values
            "strict_min": False
        }
    )
)

# Minimum precipitation should be non-negative (with a small tolerance below zero)
weather_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_min_to_be_between",
        kwargs={
            "column": "precipitation",
            "min_value": -0.1,
            "max_value": 500.0,
            "strict_min": False
        }
    )
)

print("Weather expectation suite created")


Weather expectation suite created


## 💾 Step 5: Create Feature Groups in Hopsworks

Create feature groups for electricity prices and weather data, then insert the historical data.
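
Both feature groups use `unix_time` (epoch milliseconds) as part of their primary key. The notebook derives it with `astype('int64') // 10**6`, which relies on the `datetime64[ns, UTC]` dtype shown in the `info()` output. A resolution-independent variant could look like the sketch below. `to_unix_ms` is a hypothetical helper, not part of `util`.

```python
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01", tz="UTC")


def to_unix_ms(timestamps: pd.Series) -> pd.Series:
    """Convert timestamps to integer milliseconds since the Unix epoch (UTC)."""
    utc = pd.to_datetime(timestamps, utc=True)
    # Dividing a timedelta by a 1 ms unit does not depend on the stored resolution
    return (utc - EPOCH) // pd.Timedelta(milliseconds=1)


sample = pd.Series(
    pd.to_datetime(["2022-10-31 23:00:00+00:00", "2022-11-01 00:00:00+00:00"], utc=True)
)
unix_ms = to_unix_ms(sample)
print(unix_ms.tolist())  # [1667257200000, 1667260800000]
```

The two values match the `unix_time` column shown for the first price rows in Step 1.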


In [37]:
# Add unix_time (ms) for online FG primary key (normalized approach)
# ensure UTC and then convert to milliseconds
if 'unix_time' in df_weather.columns:
    df_weather = df_weather.drop(columns=['unix_time'])
df_weather['date'] = pd.to_datetime(df_weather['date'], utc=True)
df_weather['unix_time'] = df_weather['date'].astype('int64') // 10**6

print("Added unix_time to weather data (normalized):")
print(f"  Columns: {list(df_weather.columns)}")
print(df_weather[['date', 'unix_time']].head())


Added unix_time to weather data (normalized):
  Columns: ['temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure', 'price_area', 'date', 'hour', 'weekday', 'is_weekend', 'month', 'season', 'is_holiday', 'unix_time']
                       date      unix_time
0 2022-10-30 23:00:00+00:00  1667170800000
1 2022-10-31 00:00:00+00:00  1667174400000
2 2022-10-31 01:00:00+00:00  1667178000000
3 2022-10-31 02:00:00+00:00  1667181600000
4 2022-10-31 03:00:00+00:00  1667185200000


In [38]:
# Create electricity prices feature group (online-enabled, SEK only)
electricity_fg = fs.get_or_create_feature_group(
    name='electricity_prices',
    description='Hourly electricity prices for Swedish price areas (SEK only)',
    version=1,                                  
    primary_key=['price_area', 'unix_time'],    
    event_time='date',                          
    expectation_suite=price_expectation_suite,
    online_enabled=True,
)

electricity_fg.insert(df_prices, wait=True)

print(f"Feature group created: {electricity_fg.name} v{electricity_fg.version}")

Feature Group created successfully, explore it at 
https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2107
2025-12-12 20:59:27,698 INFO: 	2 expectation(s) included in expectation_suite.
Validation succeeded.
Validation Report saved successfully, explore a summary at https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2107


Uploading Dataframe: 100.00% |██████████| Rows 27227/27227 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: electricity_prices_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://eu-west.cloud.hopsworks.ai:443/p/127/jobs/named/electricity_prices_1_offline_fg_materialization/executions
2025-12-12 20:59:40,204 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2025-12-12 20:59:43,362 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2025-12-12 21:01:30,539 INFO: Waiting for execution to finish. Current state: AGGREGATING_LOGS. Final status: SUCCEEDED
2025-12-12 21:01:30,666 INFO: Waiting for log aggregation to finish.
2025-12-12 21:01:39,344 INFO: Execution finished successfully.


Online data ingestion progress: 0.00% |          | Rows 0/27227

Feature group created: electricity_prices v1


In [39]:
# Add feature descriptions for electricity prices (SEK only)
electricity_fg.update_feature_description("unix_time", "Timestamp in unix epoch milliseconds (Primary Key)")
electricity_fg.update_feature_description("date", "Timestamp of the price period start (hourly)")
electricity_fg.update_feature_description("hour", "Hour of the day (0-23)")
electricity_fg.update_feature_description("price_area", "Swedish electricity price area (SE1-SE4)")
electricity_fg.update_feature_description("price_sek", "Electricity price in SEK per kWh (excl. VAT)")

# Calendar / holiday features
electricity_fg.update_feature_description("weekday", "0=Mon ... 6=Sun")
electricity_fg.update_feature_description("is_weekend", "1 if Saturday/Sunday else 0")
electricity_fg.update_feature_description("month", "Month number 1-12")
electricity_fg.update_feature_description("season", "0 winter, 1 spring, 2 summer, 3 autumn")
electricity_fg.update_feature_description("is_holiday", "1 if Swedish public holiday else 0")

# Lag features
electricity_fg.update_feature_description("price_lag_24", "Price 24h ago")
electricity_fg.update_feature_description("price_lag_48", "Price 48h ago")
electricity_fg.update_feature_description("price_lag_72", "Price 72h ago")
electricity_fg.update_feature_description("price_roll3d", "Rolling mean over the last 72 hourly rows (includes the current hour)")

print("Feature descriptions added for electricity prices")


Feature descriptions added for electricity prices


In [40]:
# Create weather feature group (online-enabled; the insert also runs an offline materialization job)
weather_fg = fs.get_or_create_feature_group(
    name='weather_hourly',
    description='Hourly weather data for electricity price prediction',
    version=1,
    primary_key=['price_area', 'unix_time'],
    event_time='date',
    expectation_suite=weather_expectation_suite,
    online_enabled=True,
)

print(f"Feature group created: {weather_fg.name} v{weather_fg.version}")


Feature group created: weather_hourly v1


In [41]:
weather_fg.insert(df_weather, wait=True)


Feature Group created successfully, explore it at 
https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2108
2025-12-12 21:01:56,644 INFO: 	3 expectation(s) included in expectation_suite.
Validation succeeded.
Validation Report saved successfully, explore a summary at https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2108


Uploading Dataframe: 100.00% |██████████| Rows 27336/27336 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: weather_hourly_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://eu-west.cloud.hopsworks.ai:443/p/127/jobs/named/weather_hourly_1_offline_fg_materialization/executions
2025-12-12 21:02:09,126 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2025-12-12 21:02:12,285 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2025-12-12 21:04:05,294 INFO: Waiting for execution to finish. Current state: SUCCEEDING. Final status: UNDEFINED
2025-12-12 21:04:11,508 INFO: Waiting for execution to finish. Current state: FINISHED. Final status: SUCCEEDED
2025-12-12 21:04:11,783 INFO: Waiting for log aggregation to finish.
2025-12-12 21:04:11,784 INFO: Execution finished successfully.


Online data ingestion progress: 0.00% |          | Rows 0/27336

(Job('weather_hourly_1_offline_fg_materialization', 'PYSPARK'),
 {
   "success": true,
   "results": [
     {
       "success": true,
       "expectation_config": {
         "expectation_type": "expect_column_min_to_be_between",
         "kwargs": {
           "column": "wind_speed_10m",
           "min_value": -0.1,
           "max_value": 200.0,
           "strict_min": false
         },
         "meta": {
           "expectationId": 2073
         }
       },
       "result": {
         "observed_value": 0.0,
         "element_count": 27336,
         "missing_count": null,
         "missing_percent": null
       },
       "meta": {
         "ingestionResult": "INGESTED",
         "validationTime": "2025-12-12T08:01:56.000643Z"
       },
       "exception_info": {
         "raised_exception": false,
         "exception_message": null,
         "exception_traceback": null
       }
     },
     {
       "success": true,
       "expectation_config": {
         "expectation_type": "expect

In [42]:
# Add feature descriptions for weather data
# These match the variables defined in HOURLY_WEATHER_VARIABLES in util.py
weather_fg.update_feature_description("unix_time", "Timestamp in unix epoch milliseconds (Primary Key)")
weather_fg.update_feature_description("date", "Timestamp of the weather measurement (hourly)")
weather_fg.update_feature_description("hour", "Hour of the day (0-23)")
weather_fg.update_feature_description("price_area", "Swedish electricity price area (SE1-SE4)")

# Temperature features
weather_fg.update_feature_description("temperature_2m", "Air temperature at 2m height in °C")
weather_fg.update_feature_description("apparent_temperature", "Feels-like temperature in °C (affects heating/cooling demand)")

# Precipitation features
weather_fg.update_feature_description("precipitation", "Total precipitation (rain + snow) in mm")
weather_fg.update_feature_description("rain", "Rainfall in mm")
weather_fg.update_feature_description("snowfall", "Snowfall in cm")

# Cloud cover (affects solar power)
weather_fg.update_feature_description("cloud_cover", "Total cloud cover in % (affects solar power generation)")

# Wind features (affects wind power)
weather_fg.update_feature_description("wind_speed_10m", "Wind speed at 10m in km/h")
weather_fg.update_feature_description("wind_speed_100m", "Wind speed at 100m (turbine height) in km/h - key for wind power")
weather_fg.update_feature_description("wind_direction_10m", "Wind direction at 10m in degrees")
weather_fg.update_feature_description("wind_direction_100m", "Wind direction at 100m in degrees")
weather_fg.update_feature_description("wind_gusts_10m", "Wind gusts at 10m in km/h (can cause turbine shutdowns)")

# Pressure
weather_fg.update_feature_description("surface_pressure", "Surface pressure in hPa (weather patterns)")

# Calendar / holiday features
weather_fg.update_feature_description("weekday", "0=Mon ... 6=Sun")
weather_fg.update_feature_description("is_weekend", "1 if Saturday/Sunday else 0")
weather_fg.update_feature_description("month", "Month number 1-12")
weather_fg.update_feature_description("season", "0 winter, 1 spring, 2 summer, 3 autumn")
weather_fg.update_feature_description("is_holiday", "1 if Swedish public holiday else 0")

print("Feature descriptions added for weather data")


Feature descriptions added for weather data


## 🔐 Step 6: Save Configuration as Secrets

Store the location configuration in Hopsworks secrets for use in daily pipelines.
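
A daily pipeline that reads this secret would need to parse and validate the JSON before using it. A minimal sketch, assuming the secret value has already been fetched as a string: `LocationConfig` is a hypothetical class, and the Hopsworks call that retrieves the value is deliberately left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationConfig:
    """Typed view of the location JSON (hypothetical helper class)."""
    price_area: str
    city: str
    latitude: float
    longitude: float

    @classmethod
    def from_json(cls, raw: str) -> "LocationConfig":
        data = json.loads(raw)
        missing = {"price_area", "city", "latitude", "longitude"} - set(data)
        if missing:
            raise ValueError(f"Location config is missing keys: {sorted(missing)}")
        return cls(
            price_area=str(data["price_area"]),
            city=str(data["city"]),
            latitude=float(data["latitude"]),
            longitude=float(data["longitude"]),
        )


# Stand-in for the secret value; a real pipeline would read it from Hopsworks instead.
raw_secret = json.dumps(
    {"price_area": "SE3", "city": "Stockholm", "latitude": 59.3251, "longitude": 18.0711}
)
location = LocationConfig.from_json(raw_secret)
print(location)
```

Failing fast on a missing key keeps a malformed secret from surfacing later as a confusing fetch error.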


In [43]:
import json

# Save location configuration as a Hopsworks secret
location_config = {
    "price_area": PRICE_AREA,
    "city": CITY,
    "latitude": LATITUDE,
    "longitude": LONGITUDE
}

location_str = json.dumps(location_config)

# Get secrets API
secrets = hopsworks.get_secrets_api()

# Save or update the location secret
secret_name = "ELECTRICITY_LOCATION_JSON"
try:
    existing_secret = secrets.get_secret(secret_name)
    if existing_secret is not None:
        existing_secret.delete()
        print(f"Replacing existing {secret_name}")
except Exception:
    # No existing secret could be fetched or deleted; fall through and create it below
    pass

secrets.create_secret(secret_name, location_str)
print(f"Saved location configuration to secret: {secret_name}")
print(f"Config: {location_config}")


Replacing existing ELECTRICITY_LOCATION_JSON
Secret created successfully, explore it at https://eu-west.cloud.hopsworks.ai:443/account/secrets
Saved location configuration to secret: ELECTRICITY_LOCATION_JSON
Config: {'price_area': 'SE3', 'city': 'Stockholm', 'latitude': 59.3251, 'longitude': 18.0711}


In [44]:
# ✅ Sanity checks for engineered features
check_dates = ["2024-06-21", "2024-06-22", "2024-12-24", "2024-12-25"]

print("Prices: Midsummer & Christmas (weekday/season/holiday):")
print(
    df_prices[df_prices['date'].dt.date.astype(str).isin(check_dates)][
        ['date', 'weekday', 'is_weekend', 'month', 'season', 'is_holiday']
    ].sort_values('date').head(12)
)

print("\nHoliday counts per month (prices):")
print(df_prices.groupby(df_prices['date'].dt.month)['is_holiday'].sum().rename('holidays'))

print("\nWeather: Midsummer & Christmas (weekday/season/holiday):")
print(
    df_weather[df_weather['date'].dt.date.astype(str).isin(check_dates)][
        ['date', 'weekday', 'is_weekend', 'month', 'season', 'is_holiday']
    ].sort_values('date').head(12)
)

print("\nLag feature preview (prices):")
print(df_prices[['date','price_sek','price_lag_24','price_lag_48','price_lag_72','price_roll3d']].head())


Prices: Midsummer & Christmas (weekday/season/holiday):
                           date  weekday  is_weekend  month  season  \
14281 2024-06-21 00:00:00+00:00        4           0      6       2   
14282 2024-06-21 01:00:00+00:00        4           0      6       2   
14283 2024-06-21 02:00:00+00:00        4           0      6       2   
14284 2024-06-21 03:00:00+00:00        4           0      6       2   
14285 2024-06-21 04:00:00+00:00        4           0      6       2   
14286 2024-06-21 05:00:00+00:00        4           0      6       2   
14287 2024-06-21 06:00:00+00:00        4           0      6       2   
14288 2024-06-21 07:00:00+00:00        4           0      6       2   
14289 2024-06-21 08:00:00+00:00        4           0      6       2   
14290 2024-06-21 09:00:00+00:00        4           0      6       2   
14291 2024-06-21 10:00:00+00:00        4           0      6       2   
14292 2024-06-21 11:00:00+00:00        4           0      6       2   

       is_holiday  
14281 

## ✅ Summary

Backfill complete! We have created:

1. **electricity_prices** feature group with hourly prices from elprisetjustnu.se
2. **weather_hourly** feature group with hourly weather data from Open-Meteo

### Next Steps
- Create a **daily feature pipeline** to update with new data
- Create a **training pipeline** to build a prediction model
- Create a **batch inference pipeline** to generate daily predictions
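
Because both feature groups share the key (`price_area`, `unix_time`), a training pipeline could combine them with a keyed join. The pandas sketch below illustrates the idea on toy frames. `join_prices_with_weather` is a hypothetical helper, the weather values are made up for the example, and a Hopsworks pipeline might express the same join through a feature view instead.

```python
import pandas as pd

KEYS = ["price_area", "unix_time"]


def join_prices_with_weather(prices: pd.DataFrame, weather: pd.DataFrame,
                             weather_features: list[str]) -> pd.DataFrame:
    """Left-join selected weather columns onto the price rows via the shared key."""
    # Selecting only the needed weather columns avoids clashes with the calendar
    # columns that exist in both frames; validate guards against duplicate keys.
    return prices.merge(weather[KEYS + weather_features], on=KEYS, how="left",
                        validate="one_to_one")


# Toy frames: keys mirror the notebook's format, weather values are illustrative only
prices = pd.DataFrame({
    "price_area": ["se3", "se3"],
    "unix_time": [1667257200000, 1667260800000],
    "price_sek": [0.37995, 0.37995],
})
weather = pd.DataFrame({
    "price_area": ["se3", "se3", "se3"],
    "unix_time": [1667253600000, 1667257200000, 1667260800000],
    "temperature_2m": [4.0, 3.8, 3.7],
    "wind_speed_100m": [16.2, 17.3, 18.8],
})

training_frame = join_prices_with_weather(prices, weather, ["temperature_2m", "wind_speed_100m"])
print(training_frame)
```

A left join keeps every price row, so hours without a matching weather record show up as missing values rather than disappearing silently.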


In [45]:
# Final summary
print("=" * 60)
print("BACKFILL COMPLETE")
print("=" * 60)
print(f"\n📊 Electricity Prices:")
print(f"   - Records: {len(df_prices):,}")
print(f"   - Date range: {df_prices['date'].min()} to {df_prices['date'].max()}")
print(f"   - Price area: {PRICE_AREA}")

print(f"\n🌦 Weather Data:")
print(f"   - Records: {len(df_weather):,}")
print(f"   - Date range: {df_weather['date'].min()} to {df_weather['date'].max()}")
print(f"   - Price area: {PRICE_AREA}")

print(f"\n🔗 Hopsworks Feature Groups:")
print(f"   - {electricity_fg.name} (v{electricity_fg.version})")
print(f"   - {weather_fg.name} (v{weather_fg.version})")


BACKFILL COMPLETE

📊 Electricity Prices:
   - Records: 27,227
   - Date range: 2022-11-03 23:00:00+00:00 to 2025-12-12 22:00:00+00:00
   - Price area: SE3

🌦 Weather Data:
   - Records: 27,336
   - Date range: 2022-10-30 23:00:00+00:00 to 2025-12-12 22:00:00+00:00
   - Price area: SE3

🔗 Hopsworks Feature Groups:
   - electricity_prices (v1)
   - weather_hourly (v1)
