# Aurora Forecasting - Part 02: Daily Feature Pipeline

🗒️ This notebook is divided into the following sections:

1. Initialize the Hopsworks connection.
2. Fetch the latest real-time solar wind data from NOAA.
3. Fetch the latest cloud cover forecast for Stockholm, Luleå, and Kiruna.
4. Update the Feature Groups in the Hopsworks Feature Store.

# Imports and Login

In [1]:
import pandas as pd
import datetime
import hopsworks
from config import HopsworksSettings
import util
import warnings
warnings.filterwarnings("ignore")
import numpy

# Setup settings
settings = HopsworksSettings()

print(settings.HOPSWORKS_PROJECT)

# Login to Hopsworks
project = hopsworks.login(
    project=settings.HOPSWORKS_PROJECT,
    api_key_value=settings.HOPSWORKS_API_KEY.get_secret_value()
)
fs = project.get_feature_store()


HopsworksSettings initialized!
mac64
2026-01-05 15:37:12,165 INFO: Initializing external client
2026-01-05 15:37:12,166 INFO: Base URL: https://c.app.hopsworks.ai:443






2026-01-05 15:37:14,388 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1299605


# Step 1: Get Real-time Solar Wind Data

We use the NOAA SWPC API to get the most recent measurements from the DSCOVR/ACE satellites. These will serve as the features for our real-time inference.

In [3]:
print("Fetching real-time solar wind data from NOAA...")

# Uses the helper function from util.py to fetch and merge mag/plasma data
new_solar_df = util.get_noaa_realtime_data(
    settings.NOAA_MAG_URL,
    settings.NOAA_PLASMA_URL,
    settings.KP_INDEX_URL
)

# Rename columns to match the feature group schema
new_solar_df.rename(columns={'Kp': 'kp_index', 'time_tag': 'date_and_time'}, inplace=True)

# Format the time_tag for Hopsworks compatibility
#new_solar_df['time'] = new_solar_df['time'].dt.strftime('%Y-%m-%d %H:%M:%S')

# Drop unnecessary columns, if present (spoiler: there are some)
new_solar_df.drop(columns=['bx_gsm', 'lon_gsm', 'lat_gsm', 'bt', 'temperature', 'a_running', 'station_count'], inplace=True, errors='ignore')

print(f"Successfully retrieved {len(new_solar_df)} new solar wind records.")

Fetching real-time solar wind data from NOAA...
Magnetometer data:
                time_tag bx_gsm by_gsm bz_gsm lon_gsm lat_gsm    bt
0   2026-01-04 14:39:00   1.10  -5.93  -0.43  280.51   -4.05  6.05
1   2026-01-04 14:40:00   1.12  -5.88  -0.66  280.77   -6.28  6.03
2   2026-01-04 14:41:00   0.59  -6.19   0.02  275.48    0.16  6.22
3   2026-01-04 14:43:00   0.42  -6.11   0.31  273.91    2.88  6.13
4   2026-01-04 14:44:00  -0.39  -6.29  -0.07  266.44   -0.64  6.31
..                  ...    ...    ...    ...     ...     ...   ...
871 2026-01-05 14:32:00  -0.07  -3.47  -1.98  268.79  -29.67  4.00
872 2026-01-05 14:33:00  -0.01  -3.64  -0.84  269.85  -13.02  3.74
873 2026-01-05 14:34:00  -0.11  -3.66   0.31  268.33    4.78  3.68
874 2026-01-05 14:35:00  -0.28  -2.93   0.23  264.49    4.37  2.95
875 2026-01-05 14:36:00  -0.56  -2.44  -0.25  256.96   -5.75  2.52

[876 rows x 7 columns]
Plasma data:
                 time_tag density  speed temperature
0    2026-01-04 14:40:00    0.15  457.
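The body of `util.get_noaa_realtime_data` is not shown in this notebook. Judging by the minute-level input above and the hourly rows that come out, it probably joins the magnetometer and plasma feeds on `time_tag` and averages per hour. The sketch below illustrates that idea under those assumptions. The function name `merge_and_aggregate_hourly` and the toy values are hypothetical, and the Kp join is left out.

```python
import pandas as pd


def merge_and_aggregate_hourly(mag_df: pd.DataFrame, plasma_df: pd.DataFrame) -> pd.DataFrame:
    """Join minute-level magnetometer and plasma rows on time_tag, then average per hour."""
    merged = pd.merge(mag_df, plasma_df, on="time_tag", how="inner")
    merged["time_tag"] = pd.to_datetime(merged["time_tag"])

    # Coerce measurement columns to numbers in case the feed delivers strings.
    value_cols = [c for c in merged.columns if c != "time_tag"]
    merged[value_cols] = merged[value_cols].apply(pd.to_numeric, errors="coerce")

    # Bucket rows into 60-minute bins and take the mean of each bin.
    return merged.set_index("time_tag").resample("60min").mean().reset_index()


# Toy values for illustration only, not real NOAA measurements.
toy_times = ["2026-01-04 14:40:00", "2026-01-04 14:41:00", "2026-01-04 15:01:00"]
toy_mag = pd.DataFrame({"time_tag": toy_times, "bz_gsm": ["1.0", "3.0", "5.0"]})
toy_plasma = pd.DataFrame({"time_tag": toy_times, "speed": ["400", "500", "450"]})

hourly = merge_and_aggregate_hourly(toy_mag, toy_plasma)
print(hourly)
```

An inner join keeps only the minutes present in both feeds, so a gap in either feed removes that minute from the hourly mean.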

In [None]:
# Filter out rows with missing values and sort by date_and_time
new_solar_df = new_solar_df.dropna()
new_solar_df = new_solar_df.sort_values(["date_and_time"])
new_solar_df = new_solar_df.reset_index(drop=True)

new_solar_df

| | date_and_time | by_gsm | bz_gsm | density | speed | kp_index |
|---|---|---|---|---|---|---|
| 0 | 2026-01-04 14:00:00 | -5.493846 | 1.534615 | 0.125 | 449.73125 | 0.67 |
| 1 | 2026-01-04 15:00:00 | -5.469189 | 2.812432 | 3.317255 | 447.294118 | 1.33 |
| 2 | 2026-01-04 16:00:00 | -2.213256 | 2.457442 | 0.177959 | 450.263265 | 1.33 |
| 3 | 2026-01-04 17:00:00 | -1.449792 | -0.41625 | 0.301489 | 451.046809 | 1.33 |
| 4 | 2026-01-04 18:00:00 | 0.054912 | -3.113509 | 0.2844 | 450.326 | 2.33 |
| 5 | 2026-01-04 19:00:00 | -1.798431 | -2.726863 | 0.33 | 444.833333 | 2.33 |
| 6 | 2026-01-04 20:00:00 | -5.528409 | -2.747727 | 0.805818 | 449.685455 | 2.33 |
| 7 | 2026-01-04 21:00:00 | -6.511053 | -6.092105 | 1.607917 | 466.327083 | 3.33 |
| 8 | 2026-01-04 22:00:00 | -6.376818 | -5.423636 | 0.92 | 470.644828 | 3.33 |
| 9 | 2026-01-04 23:00:00 | -1.501875 | -2.455 | 1.972286 | 464.174286 | 3.33 |


# Step 2: Get weather forecast

To decide if the aurora is "Visible," we need the cloud cover forecast for our three target cities.

In [13]:
weather_data = []
today = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')

for city, coords in settings.CITIES.items():
    print(f"Fetching today cloud cover for {city}...")

    # Get current cloud cover percentage from Open-Meteo
    cloud_cover = util.get_city_weather_forecast(coords['lat'], coords['lon'])

    weather_data.append({
        'city': city,
        'date_and_time': today,
        'cloud_cover': cloud_cover
    })

new_weather_df = pd.DataFrame(weather_data)
# Convert date column from string to datetime format
new_weather_df['date_and_time'] = pd.to_datetime(new_weather_df['date_and_time'])

print(new_weather_df.dtypes)
new_weather_df.tail(100)

Fetching today cloud cover for Kiruna...
Fetching today cloud cover for Luleå...
Fetching today cloud cover for Stockholm...
city                     object
date_and_time    datetime64[ns]
cloud_cover               int64
dtype: object


| | city | date_and_time | cloud_cover |
|---|---|---|---|
| 0 | Kiruna | 2026-01-05 15:51:40 | 16 |
| 1 | Luleå | 2026-01-05 15:51:40 | 72 |
| 2 | Stockholm | 2026-01-05 15:51:40 | 100 |
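The implementation of `util.get_city_weather_forecast` is not part of this notebook. As a rough sketch of what such a helper could do, the snippet below builds an Open-Meteo forecast request for the current `cloud_cover` value and reads it from the JSON response. The function names, the approximate Kiruna coordinates, and the sample payload are assumptions made for illustration. The network call is left as a comment so the example runs offline.

```python
from urllib.parse import urlencode

OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def build_cloud_cover_url(lat: float, lon: float) -> str:
    """Build a forecast URL that asks only for the current cloud cover."""
    params = {"latitude": lat, "longitude": lon, "current": "cloud_cover"}
    return f"{OPEN_METEO_URL}?{urlencode(params)}"


def parse_cloud_cover(payload: dict) -> int:
    """Extract the cloud cover percentage from a decoded JSON response."""
    return int(payload["current"]["cloud_cover"])


# Approximate coordinates for Kiruna (assumed, not taken from settings.CITIES).
url = build_cloud_cover_url(67.86, 20.23)
print(url)

# With network access, the payload would come from something like:
#   payload = requests.get(url, timeout=10).json()
# Hand-written sample payload in the expected shape:
sample_payload = {"current": {"time": "2026-01-05T15:45", "interval": 900, "cloud_cover": 16}}
print(parse_cloud_cover(sample_payload))
```

Keeping URL construction and parsing separate from the HTTP call makes both pieces easy to check without hitting the API.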


# Step 3: Insert into Feature Groups

Now we push the new observations into the Feature Store. Hopsworks will handle the deduplication based on the primary keys defined in the backfill notebook.
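To make the idea of key-based deduplication concrete, here is a small client-side illustration in pandas. It is not how Hopsworks implements it internally, and the key columns used here (`city` plus `date_and_time`) are an assumption, since the actual primary keys are defined in the backfill notebook.

```python
import pandas as pd


def keep_latest_per_key(df: pd.DataFrame, primary_key: list) -> pd.DataFrame:
    """Keep only the last row seen for each primary-key combination."""
    return df.drop_duplicates(subset=primary_key, keep="last").reset_index(drop=True)


# Toy rows: Kiruna appears twice with the same key, so only the later row survives.
timestamp = pd.Timestamp("2026-01-05 15:51:40")
rows = pd.DataFrame({
    "city": ["Kiruna", "Kiruna", "Stockholm"],
    "date_and_time": [timestamp, timestamp, timestamp],
    "cloud_cover": [16, 20, 100],
})

deduped = keep_latest_per_key(rows, ["city", "date_and_time"])
print(deduped)
```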

In [14]:
print("Before casting:\n", new_solar_df)
# Clean and cast to correct types for Feature Store compatibility
# Convert numeric columns to float32 (Feature Store expects 'float' not 'double')
df = new_solar_df.copy()
float_cols = ['by_gsm', 'bz_gsm', 'density', 'speed', 'kp_index']
for col in float_cols:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors='coerce').astype('float32')

new_solar_df = df
# check data types of each column
print("After casting:\n", new_solar_df.dtypes)
new_solar_df

Before casting:
          date_and_time    by_gsm    bz_gsm   density       speed  kp_index
0  2026-01-04 14:00:00 -5.493846  1.534615  0.125000  449.731262      0.67
1  2026-01-04 15:00:00 -5.469189  2.812433  3.317255  447.294128      1.33
2  2026-01-04 16:00:00 -2.213256  2.457442  0.177959  450.263275      1.33
3  2026-01-04 17:00:00 -1.449792 -0.416250  0.301489  451.046814      1.33
4  2026-01-04 18:00:00  0.054912 -3.113509  0.284400  450.325989      2.33
5  2026-01-04 19:00:00 -1.798431 -2.726863  0.330000  444.833344      2.33
6  2026-01-04 20:00:00 -5.528409 -2.747727  0.805818  449.685455      2.33
7  2026-01-04 21:00:00 -6.511053 -6.092105  1.607917  466.327087      3.33
8  2026-01-04 22:00:00 -6.376818 -5.423636  0.920000  470.644836      3.33
9  2026-01-04 23:00:00 -1.501875 -2.455000  1.972286  464.174286      3.33
10 2026-01-05 00:00:00  2.485333  4.206000  2.901000  453.239990      3.00
11 2026-01-05 01:00:00 -2.849333 -5.560000  1.682353  467.723541      3.00
12 2026-

| | date_and_time | by_gsm | bz_gsm | density | speed | kp_index |
|---|---|---|---|---|---|---|
| 0 | 2026-01-04 14:00:00 | -5.493846 | 1.534615 | 0.125 | 449.731262 | 0.67 |
| 1 | 2026-01-04 15:00:00 | -5.469189 | 2.812433 | 3.317255 | 447.294128 | 1.33 |
| 2 | 2026-01-04 16:00:00 | -2.213256 | 2.457442 | 0.177959 | 450.263275 | 1.33 |
| 3 | 2026-01-04 17:00:00 | -1.449792 | -0.41625 | 0.301489 | 451.046814 | 1.33 |
| 4 | 2026-01-04 18:00:00 | 0.054912 | -3.113509 | 0.2844 | 450.325989 | 2.33 |
| 5 | 2026-01-04 19:00:00 | -1.798431 | -2.726863 | 0.33 | 444.833344 | 2.33 |
| 6 | 2026-01-04 20:00:00 | -5.528409 | -2.747727 | 0.805818 | 449.685455 | 2.33 |
| 7 | 2026-01-04 21:00:00 | -6.511053 | -6.092105 | 1.607917 | 466.327087 | 3.33 |
| 8 | 2026-01-04 22:00:00 | -6.376818 | -5.423636 | 0.92 | 470.644836 | 3.33 |
| 9 | 2026-01-04 23:00:00 | -1.501875 | -2.455 | 1.972286 | 464.174286 | 3.33 |
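One detail of the cast above is worth seeing in isolation: `errors='coerce'` turns any value that cannot be parsed into `NaN` instead of raising an error. The toy series below (made-up values) shows the effect.

```python
import pandas as pd

# Toy input with one unparseable entry.
raw = pd.Series(["449.73", "n/a", "451.05"])

# Unparseable values become NaN; the result is then narrowed to float32.
clean = pd.to_numeric(raw, errors="coerce").astype("float32")

print(clean.dtype)
print(int(clean.isna().sum()), "value(s) could not be parsed")
```

Because the cast runs after `dropna()`, any `NaN` introduced at this stage would still be in the frame at insert time, so it may be worth checking `isna().sum()` before uploading.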


In [15]:
# Retrieve references to the Feature Groups
solar_wind_fg = fs.get_feature_group(name="solar_wind_fg", version=2)
city_weather_fg = fs.get_feature_group(name="city_weather_fg", version=2)

# Insert new data
# Note: For real-time pipelines, we often use online_enabled=True
# so the data is available for immediate inference.
solar_wind_fg.insert(new_solar_df)
city_weather_fg.insert(new_weather_df)

print("Daily Feature Pipeline execution complete!")

Uploading Dataframe: 100.00% |██████████| Rows 20/20 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: solar_wind_fg_2_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1299605/jobs/named/solar_wind_fg_2_offline_fg_materialization/executions


Uploading Dataframe: 100.00% |██████████| Rows 3/3 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: city_weather_fg_2_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1299605/jobs/named/city_weather_fg_2_offline_fg_materialization/executions
Daily Feature Pipeline execution complete!
