# Electricity Price Prediction - Backfill Pipeline

## 🗒️ Overview
This notebook collects historical data for electricity price prediction in Stockholm (SE3):
1. **Electricity prices** from elprisetjustnu.se API (hourly data from Nov 2022)
2. **Weather data** from Open-Meteo API (hourly historical data)

The goal is to train a model that predicts electricity prices for each hour of the next day.

### Weather Features Selected
We use weather features that affect electricity supply and demand:
- **Temperature** → heating/cooling demand
- **Wind speed** (10m & 100m) → wind power generation
- **Cloud cover** → solar power generation
- **Precipitation** → hydro power availability

In [1]:
from pathlib import Path
import sys
import pandas as pd
from datetime import date, timedelta
import warnings
warnings.filterwarnings("ignore")

from dotenv import load_dotenv
import hopsworks

# 1. Find project root (one level up from notebooks/)
root_dir = Path("..").resolve()

# 2. Add project root to PYTHONPATH so we can import the src package
if str(root_dir) not in sys.path:
    sys.path.append(str(root_dir))

# 3. Load .env from project root
env_path = root_dir / ".env"
load_dotenv(env_path)

# 4. Load settings and utility functions (after adjusting PYTHONPATH)
from src.config import ElectricitySettings
from src import util

settings = ElectricitySettings()

# 5. Log in to Hopsworks and get feature store
project = hopsworks.login()
fs = project.get_feature_store(name="scalableproject_featurestore")

print("Successfully logged in to Hopsworks project:", settings.HOPSWORKS_PROJECT)
print(f"Feature Store: {fs}")

# Show the weather variables we'll be using
print(f"\nWeather variables: {util.HOURLY_WEATHER_VARIABLES}")

ElectricitySettings initialized
2025-12-11 10:51:31,173 INFO: Initializing external client
2025-12-11 10:51:31,174 INFO: Base URL: https://eu-west.cloud.hopsworks.ai:443






2025-12-11 10:51:31,943 INFO: Python Engine initialized.

Logged in to project, explore it here https://eu-west.cloud.hopsworks.ai:443/p/127
Successfully logged in to Hopsworks project: ScalableProject
Feature Store: <hsfs.feature_store.FeatureStore object at 0x3067a5d50>

Weather variables: ['temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure']


## ⚙️ Configuration

Define the price area and date range for historical data collection.


In [2]:
# Configuration
PRICE_AREA = "SE3"  # Stockholm / Södra Mellansverige
CITY = "Stockholm"
LATITUDE = 59.3251   # Stockholm coordinates
LONGITUDE = 18.0711

#LATITUDE, LONGITUDE = util.get_city_coordinates(CITY)

# Historical data range
# Electricity prices available from Nov 1, 2022
START_DATE = date(2022, 11, 1)
END_DATE = date.today()  

print(f"Price Area: {PRICE_AREA}")
print(f"City: {CITY} ({LATITUDE}, {LONGITUDE})")
print(f"Date range: {START_DATE} to {END_DATE}")
print(f"Total days to fetch: {(END_DATE - START_DATE).days + 1}")


Price Area: SE3
City: Stockholm (59.3251, 18.0711)
Date range: 2022-11-01 to 2025-12-11
Total days to fetch: 1137


## ⚡ Step 1: Fetch Historical Electricity Prices

Using the elprisetjustnu.se API to get hourly electricity prices for Stockholm (SE3).
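
For orientation, here is a minimal sketch of how a single day's request could be built and parsed. The URL pattern and the `SEK_per_kWh` / `time_start` field names are assumptions about the public elprisetjustnu.se API and should be checked against its documentation. `build_price_url` and `parse_price_records` are hypothetical helpers, not the functions in `util.py`, and the sample record is hand-written rather than real data.

```python
from __future__ import annotations

from datetime import date

import pandas as pd

# Assumed URL pattern for the elprisetjustnu.se API (verify against its docs).
BASE_URL = "https://www.elprisetjustnu.se/api/v1/prices"


def build_price_url(day: date, price_area: str) -> str:
    """Return the (assumed) JSON endpoint for one day and one price area."""
    return f"{BASE_URL}/{day.year}/{day:%m-%d}_{price_area}.json"


def parse_price_records(records: list[dict], price_area: str) -> pd.DataFrame:
    """Turn raw JSON records into an hourly frame with UTC timestamps."""
    raw = pd.DataFrame(records)
    # Local Swedish offsets (+01:00 / +02:00) are normalized to UTC here.
    timestamps = pd.to_datetime(raw["time_start"], utc=True)
    return pd.DataFrame(
        {
            "timestamp": timestamps,
            "price_area": price_area,
            "price_sek": raw["SEK_per_kWh"].astype("float32"),
        }
    )


# Hand-written sample in the assumed response shape (not real data).
sample = [{"SEK_per_kWh": 0.5, "time_start": "2022-11-01T00:00:00+01:00"}]
url = build_price_url(date(2022, 11, 1), "SE3")
prices = parse_price_records(sample, "SE3")
```

Because timestamps are normalized to UTC, the first Swedish hour of 2022-11-01 lands on 2022-10-31 23:00 UTC, which would explain why the fetched data starts one calendar day before `START_DATE`.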


In [3]:
# Using fetch_electricity_prices() from util.py
df_prices = util.fetch_electricity_prices(START_DATE, END_DATE, PRICE_AREA)
df_prices = util.align_electricity_price_schema(df_prices)

# Rename price_area value to city label for readability
df_prices['price_area'] = CITY

# --- Create numeric ID for the Online Feature Store ---
# Ensure UTC and convert to milliseconds (int)
df_prices['timestamp'] = pd.to_datetime(df_prices['timestamp'], utc=True)
df_prices['unix_time'] = df_prices['timestamp'].astype('int64') // 10**6

# Include the new column in the selection
df_prices = df_prices[['unix_time', 'timestamp', 'date', 'hour', 'price_area', 'price_sek']]
df_prices.head()

Fetching electricity prices from 2022-11-01 to 2025-12-11 for SE3...


Fetching prices: 100%|██████████| 1137/1137 [10:07<00:00,  1.87it/s]

Fetched 27275 hourly price records across 1137 day(s)







| | unix_time | timestamp | date | hour | price_area | price_sek |
|---|---|---|---|---|---|---|
| 0 | 1667257200000 | 2022-10-31 23:00:00 | 2022-10-31 | 23 | Stockholm | 0.37995 |
| 1 | 1667260800000 | 2022-11-01 00:00:00 | 2022-11-01 | 0 | Stockholm | 0.37995 |
| 2 | 1667264400000 | 2022-11-01 01:00:00 | 2022-11-01 | 1 | Stockholm | 0.3843 |
| 3 | 1667268000000 | 2022-11-01 02:00:00 | 2022-11-01 | 2 | Stockholm | 0.39301 |
| 4 | 1667271600000 | 2022-11-01 03:00:00 | 2022-11-01 | 3 | Stockholm | 0.41173 |
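
The `astype('int64') // 10**6` conversion yields milliseconds only when the timestamp column has nanosecond resolution. As a hedged alternative, the sketch below computes epoch milliseconds by dividing the offset from the Unix epoch by a one-millisecond `Timedelta`, which does not depend on the stored resolution. `to_unix_ms` is a hypothetical helper, not part of `util.py`.

```python
import pandas as pd


def to_unix_ms(timestamps: pd.Series) -> pd.Series:
    """Convert timestamps to integer milliseconds since the Unix epoch (UTC)."""
    ts = pd.to_datetime(timestamps, utc=True)
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    # Timedelta floor-division gives whole milliseconds regardless of whether
    # the column is stored with ns, us, or ms resolution.
    return (ts - epoch) // pd.Timedelta(milliseconds=1)


example = to_unix_ms(pd.Series(["2022-10-31 23:00:00", "2022-11-01 00:00:00"]))
```

For the first timestamp this gives `1667257200000`, matching the `unix_time` value in the first row of the output above.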


In [4]:
# Check the electricity prices data
print(f"Shape: {df_prices.shape}")
print(f"\nDate range: {df_prices['date'].min()} to {df_prices['date'].max()}")
print(f"\nColumn types:")
df_prices.info()


Shape: (27275, 6)

Date range: 2022-10-31 00:00:00 to 2025-12-11 00:00:00

Column types:
<class 'pandas.core.frame.DataFrame'>
Index: 27275 entries, 0 to 27287
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   unix_time   27275 non-null  int64         
 1   timestamp   27275 non-null  datetime64[us]
 2   date        27275 non-null  datetime64[us]
 3   hour        27275 non-null  int32         
 4   price_area  27275 non-null  object        
 5   price_sek   27275 non-null  float32       
dtypes: datetime64[us](2), float32(1), int32(1), int64(1), object(1)
memory usage: 1.2+ MB


In [5]:
# Show price statistics
print(f"\nPrice statistics (SEK/kWh):")
print(df_prices['price_sek'].describe())



Price statistics (SEK/kWh):
count    27275.000000
mean         0.586212
std          0.697083
min         -0.691120
25%          0.168000
50%          0.387460
75%          0.775355
max          8.157090
Name: price_sek, dtype: float64


## 🌦 Step 2: Fetch Historical Weather Data

Using Open-Meteo API to get hourly weather data that may correlate with electricity prices:
- Temperature affects heating/cooling demand
- Wind speed affects wind power generation
- Cloud cover affects solar power generation
- Precipitation can affect hydro power
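
As a rough illustration of what such a request involves, the sketch below assembles query parameters for the Open-Meteo historical archive endpoint. The parameter names follow the public Open-Meteo documentation as we understand it. `build_weather_params` is a hypothetical helper, and `util.get_hourly_historical_weather` may build its request differently.

```python
from __future__ import annotations

from datetime import date

# Open-Meteo historical archive endpoint (verify against the official docs).
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_weather_params(
    latitude: float,
    longitude: float,
    start: date,
    end: date,
    variables: list[str],
) -> dict:
    """Assemble query parameters for an hourly historical weather request."""
    return {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        # Hourly variables are passed as one comma-separated string.
        "hourly": ",".join(variables),
        # GMT keeps timestamps aligned with the UTC price timestamps.
        "timezone": "GMT",
    }


params = build_weather_params(
    59.3251,
    18.0711,
    date(2022, 10, 31),
    date(2022, 11, 1),
    ["temperature_2m", "wind_speed_100m"],
)
```

The resulting dictionary could be passed as the `params` argument of an HTTP client call against `ARCHIVE_URL`.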


In [6]:
# Using get_hourly_historical_weather() from util.py
# Fetch hourly weather data for the date range

df_weather = util.get_hourly_historical_weather(
    latitude=LATITUDE,
    longitude=LONGITUDE, 
    start_date=str(pd.to_datetime(df_prices['date'].min()).date()),
    end_date=str(END_DATE),
    city=CITY
)

df_weather.head()


Fetching historical weather for Stockholm (59.3251, 18.0711)...
Date range: 2022-10-31 to 2025-12-11
Coordinates: 59.29701232910156°N 18.163265228271484°E
Elevation: 23.0 m asl
Fetched 27312 hourly weather records


| | timestamp | temperature_2m | apparent_temperature | precipitation | rain | snowfall | cloud_cover | wind_speed_10m | wind_speed_100m | wind_direction_10m | wind_direction_100m | wind_gusts_10m | surface_pressure | city | date | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-30 23:00:00+00:00 | 4.0215 | 1.344135 | 0.0 | 0.0 | 0.0 | 10.0 | 8.161764 | 16.179985 | 228.576431 | 249.145462 | 16.559999 | 1014.421021 | Stockholm | 2022-10-30 | 23 |
| 1 | 2022-10-31 00:00:00+00:00 | 3.8215 | 1.034508 | 0.0 | 0.0 | 0.0 | 12.0 | 8.913181 | 17.317459 | 226.636536 | 249.304459 | 12.599999 | 1014.518616 | Stockholm | 2022-10-31 | 0 |
| 2 | 2022-10-31 01:00:00+00:00 | 3.7215 | 0.920544 | 0.0 | 0.0 | 0.0 | 50.0 | 8.942214 | 18.775303 | 220.100845 | 237.528824 | 16.919998 | 1014.617371 | Stockholm | 2022-10-31 | 1 |
| 3 | 2022-10-31 02:00:00+00:00 | 3.6715 | 0.987128 | 0.0 | 0.0 | 0.0 | 100.0 | 8.209263 | 18.374111 | 195.255173 | 214.624222 | 15.48 | 1014.317627 | Stockholm | 2022-10-31 | 2 |
| 4 | 2022-10-31 03:00:00+00:00 | 4.4715 | 1.743104 | 0.0 | 0.0 | 0.0 | 100.0 | 9.605998 | 21.578989 | 192.994614 | 207.847488 | 17.639999 | 1014.425537 | Stockholm | 2022-10-31 | 3 |


In [7]:
# Check the weather data
print(f"Shape: {df_weather.shape}")
print(f"\nDate range: {df_weather['date'].min()} to {df_weather['date'].max()}")
print(f"\nWeather features: {[c for c in df_weather.columns if c not in ['timestamp', 'city', 'date', 'hour']]}")


Shape: (27312, 16)

Date range: 2022-10-30 to 2025-12-11

Weather features: ['temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure']


In [8]:
# Show weather statistics
print("\nTemperature statistics (°C):")
print(df_weather['temperature_2m'].describe())
print("\nWind speed at 100m (km/h):")
print(df_weather['wind_speed_100m'].describe())



Temperature statistics (°C):
count    27312.000000
mean         7.605280
std          8.074789
min        -17.528500
25%          1.321500
50%          6.921500
75%         14.471499
max         27.321501
Name: temperature_2m, dtype: float64

Wind speed at 100m (km/h):
count    27312.000000
mean        21.967611
std          9.143543
min          0.360000
25%         15.696165
50%         21.437386
75%         27.607563
max         69.316826
Name: wind_speed_100m, dtype: float64


## 🔧 Step 3: Data Processing

Clean and prepare the data for the feature store.
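
Beyond null checks, it can help to confirm that the hourly series has no duplicated or missing timestamps, since the price and weather frames differ in row count. The sketch below is one way to do that in plain pandas. `hourly_coverage` is a hypothetical helper, and the sample timestamps are made up for illustration.

```python
import pandas as pd


def hourly_coverage(timestamps: pd.Series) -> dict:
    """Report duplicated and missing hours between the first and last timestamp."""
    ts = pd.to_datetime(timestamps, utc=True)
    # Every hour we would expect to see in a gap-free series.
    expected = pd.date_range(ts.min(), ts.max(), freq=pd.Timedelta(hours=1))
    missing = expected.difference(pd.DatetimeIndex(ts))
    return {
        "duplicates": int(ts.duplicated().sum()),
        "missing": list(missing),
    }


# Made-up sample: 01:00 appears twice and 02:00 is absent.
sample_ts = pd.Series(
    [
        "2024-01-01 00:00",
        "2024-01-01 01:00",
        "2024-01-01 01:00",
        "2024-01-01 03:00",
    ]
)
report = hourly_coverage(sample_ts)
```

Applied to `df_prices['timestamp']` and `df_weather['timestamp']`, the same report would show whether gaps or repeats need handling before insertion.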


In [9]:
# The utility functions already handle type conversions and cleaning
# Just verify the data is ready for Hopsworks
print("Electricity prices ready for feature store:")
print(f"  Shape: {df_prices.shape}")
print(f"  Columns: {list(df_prices.columns)}")
df_prices.info()


Electricity prices ready for feature store:
  Shape: (27275, 6)
  Columns: ['unix_time', 'timestamp', 'date', 'hour', 'price_area', 'price_sek']
<class 'pandas.core.frame.DataFrame'>
Index: 27275 entries, 0 to 27287
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   unix_time   27275 non-null  int64         
 1   timestamp   27275 non-null  datetime64[us]
 2   date        27275 non-null  datetime64[us]
 3   hour        27275 non-null  int32         
 4   price_area  27275 non-null  object        
 5   price_sek   27275 non-null  float32       
dtypes: datetime64[us](2), float32(1), int32(1), int64(1), object(1)
memory usage: 1.2+ MB


In [10]:
# The utility functions already handle type conversions and cleaning
# Convert date to datetime for Hopsworks compatibility
df_weather['date'] = pd.to_datetime(df_weather['date'])

print("Weather data ready for feature store:")
print(f"  Shape: {df_weather.shape}")
print(f"  Columns: {list(df_weather.columns)}")
df_weather.info()


Weather data ready for feature store:
  Shape: (27312, 16)
  Columns: ['timestamp', 'temperature_2m', 'apparent_temperature', 'precipitation', 'rain', 'snowfall', 'cloud_cover', 'wind_speed_10m', 'wind_speed_100m', 'wind_direction_10m', 'wind_direction_100m', 'wind_gusts_10m', 'surface_pressure', 'city', 'date', 'hour']
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27312 entries, 0 to 27311
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype              
---  ------                --------------  -----              
 0   timestamp             27312 non-null  datetime64[ns, UTC]
 1   temperature_2m        27312 non-null  float32            
 2   apparent_temperature  27312 non-null  float32            
 3   precipitation         27312 non-null  float32            
 4   rain                  27312 non-null  float32            
 5   snowfall              27312 non-null  float32            
 6   cloud_cover           27312 non-null  float32            
 7

In [11]:
# Check for missing values
print("Missing values in electricity prices:")
print(df_prices.isnull().sum())
print(f"\n{'='*50}\n")
print("Missing values in weather data:")
print(df_weather.isnull().sum())


Missing values in electricity prices:
unix_time     0
timestamp     0
date          0
hour          0
price_area    0
price_sek     0
dtype: int64


Missing values in weather data:
timestamp               0
temperature_2m          0
apparent_temperature    0
precipitation           0
rain                    0
snowfall                0
cloud_cover             0
wind_speed_10m          0
wind_speed_100m         0
wind_direction_10m      0
wind_direction_100m     0
wind_gusts_10m          0
surface_pressure        0
city                    0
date                    0
hour                    0
dtype: int64


## ✅ Step 4: Data Validation

Define validation rules using Great Expectations to ensure data quality.
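
Two expectation types appear in this step, and they check different things: `expect_column_min_to_be_between` constrains only the column's minimum, while `expect_column_values_to_be_between` constrains every row. The plain-pandas sketch below mimics both semantics to make the difference concrete. The helper names and the sample values are hypothetical, and this is not how Great Expectations is implemented internally.

```python
import pandas as pd


def column_min_between(values: pd.Series, low: float, high: float) -> bool:
    """Mimic a column-minimum check: only the smallest value is tested."""
    return bool(low <= values.min() <= high)


def column_values_between(values: pd.Series, low: float, high: float) -> bool:
    """Mimic a per-row check: every value must lie in [low, high]."""
    return bool(values.between(low, high).all())


# Made-up prices; 75.0 is an injected outlier above the 50.0 bound.
sample_prices = pd.Series([-0.69, 0.39, 8.16, 75.0])

min_ok = column_min_between(sample_prices, -5.0, 50.0)
values_ok = column_values_between(sample_prices, -5.0, 50.0)
```

Here the minimum check passes while the per-row check fails, so a column-minimum expectation alone would not flag an implausibly high price.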


In [None]:
# Add unix_time (ms) for online FG primary key
if 'unix_time' not in df_weather.columns:
    df_weather['unix_time'] = df_weather['timestamp'].astype('int64') // 10**6

print("Added unix_time to weather data:")
print(f"  Columns: {list(df_weather.columns)}")
print(df_weather[['timestamp', 'unix_time']].head())


In [12]:
import great_expectations as ge

# Expectation suite for electricity prices
price_expectation_suite = ge.core.ExpectationSuite(
    expectation_suite_name="electricity_price_expectations"
)

# Sanity-check the lowest price: the column minimum must lie in [-5, 50] SEK/kWh.
# Note that this expectation tests only the minimum, not every value.
price_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_min_to_be_between",
        kwargs={
            "column": "price_sek",
            "min_value": -5.0,  # Prices can occasionally be negative
            "max_value": 50.0,  # Upper bound for the column minimum
            "strict_min": False
        }
    )
)

# Hour should be between 0 and 23
price_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "hour",
            "min_value": 0,
            "max_value": 23
        }
    )
)

print("Price expectation suite created")


Price expectation suite created


In [13]:
# Expectation suite for weather data
weather_expectation_suite = ge.core.ExpectationSuite(
    expectation_suite_name="weather_expectations"
)

weather_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "temperature_2m",
            "min_value": -20.0,
            "max_value": 40.0
        }
    )
)

# Lowest wind speed should be non-negative (small tolerance); only the column minimum is tested
weather_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_min_to_be_between",
        kwargs={
            "column": "wind_speed_10m",
            "min_value": -0.1,
            "max_value": 200.0,  # Max reasonable wind speed
            "strict_min": False
        }
    )
)

# Precipitation should be non-negative
weather_expectation_suite.add_expectation(
    ge.core.ExpectationConfiguration(
        expectation_type="expect_column_min_to_be_between",
        kwargs={
            "column": "precipitation",
            "min_value": -0.1,
            "max_value": 500.0,
            "strict_min": False
        }
    )
)

print("Weather expectation suite created")


Weather expectation suite created


## 💾 Step 5: Create Feature Groups in Hopsworks

Create feature groups for electricity prices and weather data, then insert the historical data.
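
Before inserting, it is worth confirming that the chosen primary key identifies each row uniquely. As we understand it, rows that share a primary key overwrite one another in the online store. The sketch below checks candidate keys on a small synthetic frame. `is_unique_key` is a hypothetical helper, and the frame stands in for `df_weather`.

```python
import pandas as pd


def is_unique_key(df: pd.DataFrame, key: list) -> bool:
    """Return True when no two rows share the same values in the key columns."""
    return not bool(df.duplicated(subset=key).any())


# Synthetic two-day hourly frame standing in for the weather data.
hours = pd.date_range(
    "2024-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC"
)
epoch = pd.Timestamp("1970-01-01", tz="UTC")
frame = pd.DataFrame(
    {
        "city": "Stockholm",
        "timestamp": hours,
        "hour": hours.hour,
        "unix_time": (hours - epoch) // pd.Timedelta(milliseconds=1),
    }
)

hour_key_unique = is_unique_key(frame, ["city", "hour"])
unix_key_unique = is_unique_key(frame, ["city", "unix_time"])
```

The `(city, hour)` pair repeats every day and so fails the check, whereas `(city, unix_time)` is unique per row.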


In [None]:
# Add unix_time (ms) for online FG primary key (normalized approach)
# ensure UTC and then convert to milliseconds
if 'unix_time' in df_weather.columns:
    df_weather = df_weather.drop(columns=['unix_time'])
df_weather['timestamp'] = pd.to_datetime(df_weather['timestamp'], utc=True)
df_weather['unix_time'] = df_weather['timestamp'].astype('int64') // 10**6

print("Added unix_time to weather data (normalized):")
print(f"  Columns: {list(df_weather.columns)}")
print(df_weather[['timestamp', 'unix_time']].head())


In [14]:
# Create electricity prices feature group (online-only, SEK only)
electricity_fg = fs.get_or_create_feature_group(
    name='electricity_prices',
    description='Hourly electricity prices for Swedish price areas (SEK only)',
    version=1,
    primary_key=['price_area', 'unix_time'],    # Use the prepared numeric ID column
    event_time='timestamp',                     # Still use 'timestamp' as the event time
    expectation_suite=price_expectation_suite,
    online_enabled=True,
)

# Insert data
electricity_fg.insert(df_prices, storage="online", wait=True)

print(f"Feature group created: {electricity_fg.name} v{electricity_fg.version}")

Feature Group created successfully, explore it at 
https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2102
2025-12-11 11:04:00,121 INFO: 	2 expectation(s) included in expectation_suite.
Validation succeeded.
Validation Report saved successfully, explore a summary at https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2102


Uploading Dataframe: 100.00% |██████████| Rows 27275/27275 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: electricity_prices_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://eu-west.cloud.hopsworks.ai:443/p/127/jobs/named/electricity_prices_1_offline_fg_materialization/executions
2025-12-11 11:04:13,177 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2025-12-11 11:04:19,422 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2025-12-11 11:06:06,060 INFO: Waiting for execution to finish. Current state: AGGREGATING_LOGS. Final status: SUCCEEDED
2025-12-11 11:06:06,143 INFO: Waiting for log aggregation to finish.
2025-12-11 11:06:14,501 INFO: Execution finished successfully.


Online data ingestion progress: 0.00% |          | Rows 0/27275

Feature group created: electricity_prices v1


In [20]:
# Add feature descriptions for electricity prices (SEK only)
electricity_fg.update_feature_description("timestamp", "Timestamp of the price period start (hourly)")
electricity_fg.update_feature_description("unix_time", "Timestamp in unix epoch milliseconds (Primary Key)")
electricity_fg.update_feature_description("date", "Date of the price measurement")
electricity_fg.update_feature_description("hour", "Hour of the day (0-23)")
electricity_fg.update_feature_description("price_area", "Price area label; the backfill stores the city name (Stockholm) in place of the SE3 code")
electricity_fg.update_feature_description("price_sek", "Electricity price in SEK per kWh (excl. VAT)")

print("Feature descriptions added for electricity prices")


Feature descriptions added for electricity prices


In [None]:
# Create weather feature group (online-only to avoid HopsFS)
# Key on unix_time rather than hour: (city, hour) repeats every day and does not identify a row
weather_fg = fs.get_or_create_feature_group(
    name='weather_hourly',
    description='Hourly weather data for electricity price prediction',
    version=1,
    primary_key=['city', 'unix_time'],
    event_time='timestamp',
    expectation_suite=weather_expectation_suite,
    online_enabled=True,
)

print(f"Feature group created: {weather_fg.name} v{weather_fg.version}")


In [None]:
# Insert weather data (online-only FG)
weather_fg.insert(df_weather, storage="online", wait=True)


In [None]:
# Add feature descriptions for weather data
# These match the variables defined in HOURLY_WEATHER_VARIABLES in util.py
weather_fg.update_feature_description("timestamp", "Timestamp of the weather measurement (hourly)")
weather_fg.update_feature_description("date", "Date of the weather measurement")
weather_fg.update_feature_description("hour", "Hour of the day (0-23)")
weather_fg.update_feature_description("city", "City where weather is measured")

# Temperature features
weather_fg.update_feature_description("temperature_2m", "Air temperature at 2m height in °C")
weather_fg.update_feature_description("apparent_temperature", "Feels-like temperature in °C (affects heating/cooling demand)")

# Precipitation features
weather_fg.update_feature_description("precipitation", "Total precipitation (rain + snow) in mm")
weather_fg.update_feature_description("rain", "Rainfall in mm")
weather_fg.update_feature_description("snowfall", "Snowfall in cm")

# Cloud cover (affects solar power)
weather_fg.update_feature_description("cloud_cover", "Total cloud cover in % (affects solar power generation)")

# Wind features (affects wind power)
weather_fg.update_feature_description("wind_speed_10m", "Wind speed at 10m in km/h")
weather_fg.update_feature_description("wind_speed_100m", "Wind speed at 100m (turbine height) in km/h - key for wind power")
weather_fg.update_feature_description("wind_direction_10m", "Wind direction at 10m in degrees")
weather_fg.update_feature_description("wind_direction_100m", "Wind direction at 100m in degrees")
weather_fg.update_feature_description("wind_gusts_10m", "Wind gusts at 10m in km/h (can cause turbine shutdowns)")

# Pressure
weather_fg.update_feature_description("surface_pressure", "Surface pressure in hPa (weather patterns)")

print("Feature descriptions added for weather data")


## 🔐 Step 6: Save Configuration as Secrets

Store the location configuration in Hopsworks secrets for use in daily pipelines.
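
On the consuming side, a daily pipeline would need to turn the stored JSON string back into typed values. The sketch below shows one way to parse and validate it. `LocationConfig` and `parse_location_config` are hypothetical names, and the raw string is built locally here rather than read from Hopsworks, since the exact accessor for a secret's value should be checked in the Hopsworks documentation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationConfig:
    """Typed view of the location settings stored in the secret."""

    price_area: str
    city: str
    latitude: float
    longitude: float


def parse_location_config(raw: str) -> LocationConfig:
    """Parse the JSON string; a missing key raises KeyError early."""
    data = json.loads(raw)
    return LocationConfig(
        price_area=str(data["price_area"]),
        city=str(data["city"]),
        latitude=float(data["latitude"]),
        longitude=float(data["longitude"]),
    )


# Stand-in for the secret's stored value.
raw_secret = json.dumps(
    {
        "price_area": "SE3",
        "city": "Stockholm",
        "latitude": 59.3251,
        "longitude": 18.0711,
    }
)
config = parse_location_config(raw_secret)
```

Failing fast on a malformed secret keeps a misconfigured daily run from silently fetching data for the wrong location.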


In [None]:
import json

# Save location configuration as a Hopsworks secret
location_config = {
    "price_area": PRICE_AREA,
    "city": CITY,
    "latitude": LATITUDE,
    "longitude": LONGITUDE
}

location_str = json.dumps(location_config)

# Get secrets API
secrets = hopsworks.get_secrets_api()

# Save or update the location secret
secret_name = "ELECTRICITY_LOCATION_JSON"
try:
    existing_secret = secrets.get_secret(secret_name)
    if existing_secret is not None:
        existing_secret.delete()
        print(f"Replacing existing {secret_name}")
except Exception:
    # No existing secret (or the lookup failed): fall through and create it
    pass

secrets.create_secret(secret_name, location_str)
print(f"Saved location configuration to secret: {secret_name}")
print(f"Config: {location_config}")


## ✅ Summary

Backfill complete! We have created:

1. **electricity_prices** feature group with hourly prices from elprisetjustnu.se
2. **weather_hourly** feature group with hourly weather data from Open-Meteo

### Next Steps
- Create a **daily feature pipeline** to update with new data
- Create a **training pipeline** to build a prediction model
- Create a **batch inference pipeline** to generate daily predictions


In [None]:
# Final summary
print("=" * 60)
print("BACKFILL COMPLETE")
print("=" * 60)
print(f"\n📊 Electricity Prices:")
print(f"   - Records: {len(df_prices):,}")
print(f"   - Date range: {df_prices['date'].min()} to {df_prices['date'].max()}")
print(f"   - Price area: {PRICE_AREA}")

print(f"\n🌦 Weather Data:")
print(f"   - Records: {len(df_weather):,}")
print(f"   - Date range: {df_weather['date'].min()} to {df_weather['date'].max()}")
print(f"   - City: {CITY}")

print(f"\n🔗 Hopsworks Feature Groups:")
print(f"   - {electricity_fg.name} (v{electricity_fg.version})")
print(f"   - {weather_fg.name} (v{weather_fg.version})")
