# Notebook 2 – Daily Feature Pipeline

Purpose: fetch yesterday's electricity prices and weather, align them with the feature group (FG) schema (`price_area` + `unix_time` as primary key), and upsert them to Hopsworks.

Execution order:
1) Setup & login
2) Get the feature groups (electricity prices, weather)
3) Configuration (PRICE_AREA, dates)
4) Step 1: prices (incl. calendar, weekend/holiday, and lag features) → insert
5) Step 2: weather (incl. calendar and weekend/holiday features) → insert

Note: FG version = 1 (newly created schema without the EUR/exchange-rate columns); `price_area` is a lowercase string; `unix_time` is in milliseconds.
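
As an illustration of the key format described in the note, the sketch below builds `unix_time` (milliseconds since the Unix epoch) and a lowercase `price_area` for a small hand-made frame. The sample timestamps and the helper name `add_primary_key_columns` are assumptions made for this example; they are not part of `src.util`. Subtracting the epoch and floor-dividing by one millisecond is one way to get the value without depending on the datetime resolution pandas stores internally.

```python
import pandas as pd

def add_primary_key_columns(df: pd.DataFrame, price_area: str) -> pd.DataFrame:
    """Hypothetical helper: add `date`, `unix_time` (ms) and `price_area`."""
    out = df.copy()
    # Parse to timezone-aware UTC timestamps
    out["date"] = pd.to_datetime(out["timestamp"], utc=True)
    # Milliseconds since 1970-01-01 UTC, computed by floor-dividing a timedelta
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    out["unix_time"] = ((out["date"] - epoch) // pd.Timedelta(milliseconds=1)).astype("int64")
    # Lowercase string label, e.g. "SE3" -> "se3"
    out["price_area"] = pd.Series(price_area.lower(), index=out.index, dtype="string")
    return out.drop(columns=["timestamp"])

sample = pd.DataFrame({"timestamp": ["2025-12-15T00:00:00Z", "2025-12-15T01:00:00Z"]})
keyed = add_primary_key_columns(sample, "SE3")
print(keyed[["unix_time", "price_area"]])
```

Two consecutive hourly rows should differ by exactly 3,600,000 in `unix_time`, which is a quick way to confirm the unit is milliseconds rather than seconds or nanoseconds.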

In [1]:
from pathlib import Path
import sys
import datetime
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
import holidays

from dotenv import load_dotenv
import hopsworks

# 1. Find project root (one level up from notebooks/)
root_dir = Path("..").resolve()

# 2. Add project root to PYTHONPATH so we can import the src package
if str(root_dir) not in sys.path:
    sys.path.append(str(root_dir))

# 3. Load .env from project root
env_path = root_dir / ".env"
load_dotenv(env_path)

# 4. Load settings and utility functions (after adjusting PYTHONPATH)
from src.config import ElectricitySettings
from src import util

settings = ElectricitySettings()

# 5. Log in to Hopsworks and get feature store
project = hopsworks.login(engine="python")
fs = project.get_feature_store()


print("Successfully logged in to Hopsworks project:", settings.HOPSWORKS_PROJECT)


ElectricitySettings initialized
2025-12-16 19:19:55,590 INFO: Initializing external client
2025-12-16 19:19:55,591 INFO: Base URL: https://eu-west.cloud.hopsworks.ai:443
2025-12-16 19:19:56,390 INFO: Python Engine initialized.

Logged in to project, explore it here https://eu-west.cloud.hopsworks.ai:443/p/127
Successfully logged in to Hopsworks project: ScalableProject


In [2]:
# Get the feature groups (new schema with engineered features)
electricity_prices_fg = fs.get_feature_group('electricity_prices', version=1)
weather_hourly_fg = fs.get_feature_group('weather_hourly', version=1)


In [3]:
# Configuration
PRICE_AREA = "SE3"  # Stockholm / South-Central Sweden
CITY = "Stockholm"
LATITUDE = 59.3251   # Stockholm coordinates
LONGITUDE = 18.0711

#LATITUDE, LONGITUDE = util.get_city_coordinates(CITY)

today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
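
`datetime.date.today()` uses the clock and time zone of the machine running the notebook. If the pipeline is scheduled on a server set to UTC, one option is to derive "yesterday" explicitly in the market's time zone. The sketch below assumes `Europe/Stockholm` is the relevant zone for SE3; the helper name `yesterday_in_zone` is hypothetical and not part of this project.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def yesterday_in_zone(now_utc: datetime, tz_name: str = "Europe/Stockholm") -> date:
    """Hypothetical helper: calendar date of 'yesterday' as seen in tz_name."""
    local_now = now_utc.astimezone(ZoneInfo(tz_name))
    return local_now.date() - timedelta(days=1)

# 23:30 UTC on 15 Dec is already 00:30 on 16 Dec in Stockholm (UTC+1 in winter)
now_utc = datetime(2025, 12, 15, 23, 30, tzinfo=timezone.utc)
print(yesterday_in_zone(now_utc))          # 2025-12-15
print(now_utc.date() - timedelta(days=1))  # 2025-12-14 (UTC-based answer)
```

Passing the current time in as an argument keeps the helper easy to test around midnight and daylight-saving transitions.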

## Step 1 — Fetch and upsert yesterday's electricity prices
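Pull yesterday's hourly prices together with the three preceding days (needed for the lag features), align them to the backfill schema (`price_area` + `unix_time` as primary key), and write yesterday's rows to the `electricity_prices` feature group.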
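
To see why a multi-day window is fetched before keeping only yesterday's rows, the following sketch computes a 72-row lag and a 72-row rolling mean on synthetic hourly prices. The data is made up (a simple ramp), the column names mirror the ones used in this notebook, and the sketch assumes exactly one row per hour with no gaps.

```python
import pandas as pd

# Synthetic 4-day hourly series: price equals the row index (0..95)
hours = pd.date_range("2025-12-12", periods=96, freq="h", tz="UTC")
df = pd.DataFrame({"date": hours, "price_area": "se3", "price_sek": range(96)})
df["price_sek"] = df["price_sek"].astype("float32")

# Row-based lag: the value 72 rows (72 hours) earlier within each price area
df["price_lag_72"] = df.groupby("price_area")["price_sek"].shift(72)

# Rolling mean over up to 72 rows, aligned back to the original index
df["price_roll3d"] = (
    df.groupby("price_area")["price_sek"]
      .rolling(72, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)
)

# Only the last day has a non-null 72h lag; the first 72 rows are NaN
last_day = df[df["date"].dt.date == hours[-1].date()]
print(last_day["price_lag_72"].isna().sum())  # 0
print(df["price_lag_72"].isna().sum())        # 72
```

With a shorter window, the `dropna()` that follows the lag computation would remove some or all of yesterday's rows, because their 72-hour lag would be undefined.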

In [4]:
# Fetch a 4-day window to compute lags (yesterday + previous 3 days)
window_start = yesterday - datetime.timedelta(days=3)
df_prices = util.fetch_electricity_prices(
    start_date=window_start,
    end_date=yesterday,
    price_area=PRICE_AREA,
    show_progress=False,
    request_pause=0,
)
df_prices = util.align_electricity_price_schema(df_prices)

# Align to backfill schema: UTC date, unix_time ms, price_area lowercase string
df_prices["date"] = pd.to_datetime(df_prices["timestamp"], utc=True)
df_prices["unix_time"] = df_prices["date"].astype("int64") // 10**6
df_prices = df_prices.drop(columns=["timestamp"])
# Remove unused currency columns if present
df_prices = df_prices.drop(columns=["price_eur", "exchange_rate"], errors="ignore")
df_prices["price_area"] = PRICE_AREA.lower()
df_prices["price_area"] = df_prices["price_area"].astype("string")

# Sort for lag/rolling calculations
df_prices = df_prices.sort_values(["price_area", "unix_time"])

# Calendar features
df_prices["weekday"] = df_prices["date"].dt.weekday.astype("int8")
df_prices["is_weekend"] = df_prices["weekday"].isin([5, 6]).astype("int8")
df_prices["month"] = df_prices["date"].dt.month.astype("int8")
season_map = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}
df_prices["season"] = df_prices["month"].map(season_map).astype("int8")
try:
    import holidays
    years = range(df_prices["date"].dt.year.min(), df_prices["date"].dt.year.max() + 1)
    se_holidays = holidays.Sweden(years=years)
    df_prices["is_holiday"] = df_prices["date"].dt.date.isin(se_holidays).astype("int8")
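except Exception:
    # Fallback when the holiday lookup fails: mark no holidays, keeping the int8 dtype
    df_prices["is_holiday"] = pd.Series(0, index=df_prices.index, dtype="int8")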

# Lagged prices and 72h rolling mean (row-based; assumes one row per hour with no gaps)
for lag in [24, 48, 72]:
    df_prices[f"price_lag_{lag}"] = (
        df_prices.groupby("price_area")["price_sek"].shift(lag).astype("float32")
    )

df_prices["price_roll3d"] = (
    df_prices.groupby("price_area")["price_sek"]
             .rolling(72, min_periods=1)
             .mean()
             .reset_index(level=0, drop=True)
             .astype("float32")
)

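# Keep only yesterday's rows after lag computation.
# `date` is in UTC while `yesterday` comes from the local clock, so rows are matched on their UTC calendar date.
df_prices = df_prices[df_prices["date"].dt.date == yesterday].copy()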
df_prices = df_prices.dropna().reset_index(drop=True)

price_columns = [
    "unix_time",
    "date",
    "hour",
    "price_area",
    "price_sek",
    "weekday",
    "is_weekend",
    "month",
    "season",
    "is_holiday",
    "price_lag_24",
    "price_lag_48",
    "price_lag_72",
    "price_roll3d",
]
df_prices = df_prices[price_columns]
print(f"Fetched {len(df_prices)} rows for {yesterday}")

# Insert new data
electricity_prices_fg.insert(df_prices, wait=True)

Fetching electricity prices from 2025-12-12 to 2025-12-15 for SE3...
Fetched 96 hourly price records across 4 day(s)


Fetched 23 rows for 2025-12-15
2025-12-16 19:19:59,055 INFO: 	2 expectation(s) included in expectation_suite.
Validation succeeded.
Validation Report saved successfully, explore a summary at https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2107


Uploading Dataframe: 100.00% |██████████| Rows 23/23 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: electricity_prices_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://eu-west.cloud.hopsworks.ai:443/p/127/jobs/named/electricity_prices_1_offline_fg_materialization/executions
2025-12-16 19:20:10,941 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2025-12-16 19:20:14,126 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2025-12-16 19:22:23,347 INFO: Waiting for execution to finish. Current state: SUCCEEDING. Final status: UNDEFINED
2025-12-16 19:22:26,538 INFO: Waiting for execution to finish. Current state: FINISHED. Final status: SUCCEEDED
2025-12-16 19:22:27,043 INFO: Waiting for log aggregation to finish.
2025-12-16 19:22:27,044 INFO: Execution finished successfully.


Online data ingestion progress: 0.00% |          | Rows 0/27227

(Job('electricity_prices_1_offline_fg_materialization', 'PYSPARK'),
 {
   "success": true,
   "results": [
     {
       "success": true,
       "expectation_config": {
         "expectation_type": "expect_column_min_to_be_between",
         "kwargs": {
           "column": "price_sek",
           "min_value": -5.0,
           "max_value": 50.0,
           "strict_min": false
         },
         "meta": {
           "expectationId": 2071
         }
       },
       "result": {
         "observed_value": 0.20020000636577606,
         "element_count": 23,
         "missing_count": null,
         "missing_percent": null
       },
       "meta": {
         "ingestionResult": "INGESTED",
         "validationTime": "2025-12-16T06:19:59.000055Z"
       },
       "exception_info": {
         "raised_exception": false,
         "exception_message": null,
         "exception_traceback": null
       }
     },
     {
       "success": true,
       "expectation_config": {
         "expectation_typ

## Step 2 — Fetch and upsert yesterday's weather
Fetch yesterday's hourly actual weather, align to the backfill schema, and write to the `weather_hourly` feature group.
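
Both steps derive the same calendar columns from `date`. One way to keep them consistent is a shared helper, sketched below. The function name `add_calendar_features` and the `holiday_dates` argument are assumptions for illustration (the notebook itself uses `holidays.Sweden`); here a plain set of dates stands in for the holiday lookup so the example has no extra dependencies.

```python
from datetime import date
import pandas as pd

# Month -> season code, same mapping as in the notebook cells
SEASON_MAP = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}

def add_calendar_features(df: pd.DataFrame, holiday_dates: set) -> pd.DataFrame:
    """Hypothetical helper: add weekday, is_weekend, month, season, is_holiday."""
    out = df.copy()
    out["weekday"] = out["date"].dt.weekday.astype("int8")  # Monday=0 ... Sunday=6
    out["is_weekend"] = out["weekday"].isin([5, 6]).astype("int8")
    out["month"] = out["date"].dt.month.astype("int8")
    out["season"] = out["month"].map(SEASON_MAP).astype("int8")
    out["is_holiday"] = out["date"].dt.date.isin(holiday_dates).astype("int8")
    return out

# Stand-in holiday set; 25 Dec 2025 is a Thursday, 27 Dec 2025 a Saturday
demo = pd.DataFrame({"date": pd.to_datetime(["2025-12-25 12:00", "2025-12-27 12:00"], utc=True)})
features = add_calendar_features(demo, {date(2025, 12, 25)})
print(features[["weekday", "is_weekend", "season", "is_holiday"]])
```

Calling the same function for prices and weather would avoid the two copies of `season_map` drifting apart.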

In [5]:
# Fetch and upload yesterday's actual hourly weather
actual_weather_yesterday = util.get_yesterday_hourly_weather(
    latitude=LATITUDE,
    longitude=LONGITUDE,
    city=CITY,
)
actual_weather_yesterday['date'] = pd.to_datetime(actual_weather_yesterday['timestamp'], utc=True)
actual_weather_yesterday['unix_time'] = actual_weather_yesterday['date'].astype('int64') // 10**6

# Align schema to backfill FG: drop timestamp, use price_area label
actual_weather_yesterday['price_area'] = PRICE_AREA.lower()
actual_weather_yesterday['price_area'] = actual_weather_yesterday['price_area'].astype('string')
if 'city' in actual_weather_yesterday.columns:
    actual_weather_yesterday = actual_weather_yesterday.drop(columns=['city'])
actual_weather_yesterday = actual_weather_yesterday.drop(columns=['timestamp'])

# Calendar features
actual_weather_yesterday['weekday'] = actual_weather_yesterday['date'].dt.weekday.astype('int8')
actual_weather_yesterday['is_weekend'] = actual_weather_yesterday['weekday'].isin([5, 6]).astype('int8')
actual_weather_yesterday['month'] = actual_weather_yesterday['date'].dt.month.astype('int8')
season_map = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}
actual_weather_yesterday['season'] = actual_weather_yesterday['month'].map(season_map).astype('int8')
try:
    import holidays
    years = range(actual_weather_yesterday['date'].dt.year.min(), actual_weather_yesterday['date'].dt.year.max() + 1)
    se_holidays = holidays.Sweden(years=years)
    actual_weather_yesterday['is_holiday'] = actual_weather_yesterday['date'].dt.date.isin(se_holidays).astype('int8')
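except Exception:
    # Fallback when the holiday lookup fails: mark no holidays, keeping the int8 dtype
    actual_weather_yesterday['is_holiday'] = pd.Series(0, index=actual_weather_yesterday.index, dtype='int8')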

weather_cols = [
    "unix_time",
    "date",
    "hour",
    "price_area",
    "temperature_2m",
    "apparent_temperature",
    "precipitation",
    "rain",
    "snowfall",
    "cloud_cover",
    "wind_speed_10m",
    "wind_speed_100m",
    "wind_direction_10m",
    "wind_direction_100m",
    "wind_gusts_10m",
    "surface_pressure",
    "weekday",
    "is_weekend",
    "month",
    "season",
    "is_holiday",
]
actual_weather_yesterday = actual_weather_yesterday[weather_cols]

if len(actual_weather_yesterday):
    weather_hourly_fg.insert(actual_weather_yesterday, storage="online", wait=True)
    print(f"Inserted actual weather for yesterday: {len(actual_weather_yesterday)} rows for {yesterday}")
else:
    print("No actual weather rows for yesterday.")


Fetching historical weather for Stockholm (59.3251, 18.0711)...
Date range: 2025-12-15 to 2025-12-15
Coordinates: 59.29701232910156°N 18.163265228271484°E
Elevation: 23.0 m asl
Fetched 24 hourly weather records
2025-12-16 19:22:28,232 INFO: 	3 expectation(s) included in expectation_suite.
Validation succeeded.
Validation Report saved successfully, explore a summary at https://eu-west.cloud.hopsworks.ai:443/p/127/fs/74/fg/2108


Uploading Dataframe: 100.00% |██████████| Rows 23/23 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: weather_hourly_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://eu-west.cloud.hopsworks.ai:443/p/127/jobs/named/weather_hourly_1_offline_fg_materialization/executions
2025-12-16 19:22:39,753 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2025-12-16 19:22:42,879 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2025-12-16 19:24:49,586 INFO: Waiting for execution to finish. Current state: AGGREGATING_LOGS. Final status: SUCCEEDED
2025-12-16 19:24:49,719 INFO: Waiting for log aggregation to finish.
2025-12-16 19:24:58,504 INFO: Execution finished successfully.


Online data ingestion progress: 0.00% |          | Rows 0/23

Inserted actual weather for yesterday: 23 rows for 2025-12-15
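
The log above reports 24 fetched weather records but 23 uploaded rows. A completeness check before inserting can make such gaps visible. The sketch below lists which hourly `unix_time` keys are missing for a given UTC day; the helper name `missing_hours_utc` is hypothetical, and the check is plain Python, independent of Hopsworks.

```python
from datetime import date, datetime, timedelta, timezone

def missing_hours_utc(unix_times_ms, day: date) -> list:
    """Hypothetical helper: UTC hours of `day` with no matching unix_time (ms) key."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    expected = [start + timedelta(hours=h) for h in range(24)]
    present = set(unix_times_ms)
    # Keep the expected hours whose millisecond timestamp is absent
    return [ts for ts in expected if int(ts.timestamp() * 1000) not in present]

# Example: 23 keys covering 00:00-22:00 UTC, so 23:00 UTC is reported missing
day = date(2025, 12, 15)
start_ms = int(datetime(2025, 12, 15, tzinfo=timezone.utc).timestamp() * 1000)
keys = [start_ms + h * 3_600_000 for h in range(23)]
print(missing_hours_utc(keys, day))
```

In this notebook, something like `actual_weather_yesterday["unix_time"].tolist()` could be passed as the first argument; whether the day boundary should be UTC or local time depends on how the backfill was keyed.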
