# Aurora Forecasting - Part 02: Daily Feature Pipeline

🗒️ This notebook is divided into the following sections:

1. Initialize the Hopsworks connection.
2. Fetch the latest real-time solar wind data from NOAA.
3. Fetch the latest cloud cover forecast for Stockholm, Luleå, and Kiruna.
4. Update the Feature Groups in the Hopsworks Feature Store.

# Imports and Login

In [25]:
import pandas as pd
import datetime
import hopsworks
from config import HopsworksSettings
import util
import warnings
warnings.filterwarnings("ignore")
import numpy

# Setup settings
settings = HopsworksSettings()

# Login to Hopsworks
project = hopsworks.login(
    project=settings.HOPSWORKS_PROJECT,
    api_key_value=settings.HOPSWORKS_API_KEY.get_secret_value()
)
fs = project.get_feature_store()

Aurora Project Settings initialized!
2025-12-31 15:55:32,154 INFO: Closing external client and cleaning up certificates.
Connection closed.
2025-12-31 15:55:32,159 INFO: Initializing external client
2025-12-31 15:55:32,160 INFO: Base URL: https://c.app.hopsworks.ai:443






2025-12-31 15:55:34,137 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1299605


# Step 1: Get Real-time Solar Wind Data

We use the NOAA SWPC API to get the most recent measurements from the DSCOVR/ACE satellites. These will serve as the features for our real-time inference.

In [26]:
print("Fetching real-time solar wind data from NOAA...")

# Use the helper in util.py to fetch and merge magnetometer, plasma, and Kp index data
new_solar_df = util.get_noaa_realtime_data(
    settings.NOAA_MAG_URL,
    settings.NOAA_PLASMA_URL,
    settings.KP_INDEX_URL
)
new_solar_df = new_solar_df.dropna()

# Rename columns to match the feature group schema
new_solar_df.rename(columns={'Kp': 'kp_index', 'time_tag': 'time'}, inplace=True)

# (Disabled) Format the time column as a string for Hopsworks compatibility
#new_solar_df['time'] = new_solar_df['time'].dt.strftime('%Y-%m-%d %H:%M:%S')

# Drop columns that are not part of the feature group schema (ignored if absent)
new_solar_df.drop(columns=['bx_gsm', 'lon_gsm', 'lat_gsm', 'bt', 'temperature', 'a_running', 'station_count'], inplace=True, errors='ignore')

print(f"Successfully retrieved {len(new_solar_df)} new solar wind records.")
new_solar_df

Fetching real-time solar wind data from NOAA...
Magnetometer data:
                 time_tag bx_gsm  by_gsm bz_gsm lon_gsm lat_gsm     bt
0    2025-12-30 14:58:00  -4.76    1.67  -4.81  160.67  -43.63   6.97
1    2025-12-30 14:59:00  -4.60    1.48  -4.51  162.19  -43.00   6.61
2    2025-12-30 15:03:00   7.01    1.23   5.60    9.98   38.18   9.06
3    2025-12-30 15:08:00  -3.36    0.73  -4.66  167.72  -53.57   5.79
4    2025-12-30 15:10:00  -3.21    1.97  -4.58  148.47  -50.57   5.93
...                  ...    ...     ...    ...     ...     ...    ...
1331 2025-12-31 14:44:00   0.03  -11.02   2.45  270.13   12.54  11.29
1332 2025-12-31 14:45:00   0.59  -11.00   2.93  273.05   14.87  11.40
1333 2025-12-31 14:47:00   1.52  -10.59   3.12  278.18   16.25  11.15
1334 2025-12-31 14:48:00   2.23   -9.40   4.48  283.34   24.89  10.65
1335 2025-12-31 14:51:00   5.79   -6.43   4.56  312.01   27.81   9.79

[1336 rows x 7 columns]
Plasma data:
                 time_tag density  speed temperature
0

Unnamed: 0,time,by_gsm,bz_gsm,density,speed,kp_index
0,2025-12-30 14:00:00,1.575,-4.66,8.02,409.22,3.0
1,2025-12-30 15:00:00,0.726,-3.026333,14.660444,418.661702,2.67
2,2025-12-30 16:00:00,2.264667,-2.530167,5.216491,414.421053,2.67
3,2025-12-30 17:00:00,1.533167,0.101333,5.076441,409.232203,2.67
4,2025-12-30 18:00:00,1.500833,0.8105,3.989474,406.161404,1.67
5,2025-12-30 19:00:00,-1.179167,4.048667,4.8635,399.61,1.67
6,2025-12-30 20:00:00,-1.069333,4.7715,5.128136,395.355932,1.67
7,2025-12-30 21:00:00,-0.064576,3.827966,4.513,389.243333,1.0
8,2025-12-30 22:00:00,4.623333,1.672333,2.243818,391.814545,1.0
9,2025-12-30 23:00:00,5.779,2.082333,6.238333,402.276667,1.0
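
The body of `util.get_noaa_realtime_data` is not shown in this notebook. Comparing the minute-level magnetometer printout with the hourly table suggests that the helper averages readings per hour: the two 14:xx rows average to `by_gsm = 1.575` and `bz_gsm = -4.66`, which matches the first hourly row. The following is a minimal sketch of that aggregation, assuming pandas resampling is used; the real helper may work differently, and the variable names here are illustrative.

```python
import pandas as pd

# Three minute-level readings copied from the magnetometer printout above.
# The values are kept as strings on the assumption that the raw feed is text.
mag = pd.DataFrame({
    "time_tag": ["2025-12-30 14:58:00", "2025-12-30 14:59:00", "2025-12-30 15:03:00"],
    "by_gsm": ["1.67", "1.48", "1.23"],
    "bz_gsm": ["-4.81", "-4.51", "5.60"],
})

# Parse timestamps and convert the measurement columns to numbers.
mag["time_tag"] = pd.to_datetime(mag["time_tag"])
for col in ["by_gsm", "bz_gsm"]:
    mag[col] = pd.to_numeric(mag[col], errors="coerce")

# Average all readings that fall within the same clock hour.
hourly = (
    mag.set_index("time_tag")
       .resample("1h")
       .mean()
       .reset_index()
       .rename(columns={"time_tag": "time"})
)

print(hourly)
```

Averaging per hour also smooths over the gaps visible in the raw feed (for example, no readings between 14:59 and 15:03).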


# Step 2: Get Daily Weather Forecast

To decide if the aurora is "Visible," we need the cloud cover forecast for our three target cities.

In [27]:
weather_data = []
today = datetime.date.today().strftime('%Y-%m-%d')

for city, coords in settings.CITIES.items():
    print(f"Fetching cloud cover forecast for {city}...")

    # Get current cloud cover percentage from Open-Meteo
    cloud_cover = util.get_city_weather_forecast(coords['lat'], coords['lon'])

    weather_data.append({
        'city': city,
        'date': today,
        'cloud_cover': cloud_cover
    })

new_weather_df = pd.DataFrame(weather_data)
# Convert date column from string to datetime format
new_weather_df['date'] = pd.to_datetime(new_weather_df['date'])

print(new_weather_df.dtypes)
new_weather_df.tail(100)

Fetching cloud cover forecast for Kiruna...
Fetching cloud cover forecast for Luleå...
Fetching cloud cover forecast for Stockholm...
city                   object
date           datetime64[ns]
cloud_cover             int64
dtype: object


Unnamed: 0,city,date,cloud_cover
0,Kiruna,2025-12-31,2
1,Luleå,2025-12-31,20
2,Stockholm,2025-12-31,35
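
`util.get_city_weather_forecast` is not shown in this notebook. The sketch below illustrates one way such a helper could request the current cloud cover from Open-Meteo. The endpoint, the `current=cloud_cover` query parameter, and the response shape are assumptions about the Open-Meteo forecast API, and the function names, sample payload, and coordinates are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed Open-Meteo forecast endpoint.
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def build_cloud_cover_url(lat, lon):
    """Build a request URL asking for the current cloud cover at (lat, lon)."""
    query = urllib.parse.urlencode(
        {"latitude": lat, "longitude": lon, "current": "cloud_cover"}
    )
    return f"{OPEN_METEO_URL}?{query}"


def parse_cloud_cover(payload):
    """Extract the cloud cover percentage from a decoded JSON response."""
    return int(payload["current"]["cloud_cover"])


def get_cloud_cover(lat, lon, timeout=10):
    """Fetch and parse the current cloud cover (performs a network call)."""
    with urllib.request.urlopen(build_cloud_cover_url(lat, lon), timeout=timeout) as resp:
        return parse_cloud_cover(json.load(resp))


# Hypothetical payload in the assumed response shape, so that the parsing
# step can be checked without network access.
sample = {"current": {"time": "2025-12-31T15:45", "interval": 900, "cloud_cover": 35}}
print(parse_cloud_cover(sample))

# Approximate coordinates for Stockholm, used only for illustration.
print(build_cloud_cover_url(59.33, 18.07))
```

Separating URL construction and response parsing from the network call keeps the parsing logic testable offline.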


# Step 3: Insert into Feature Groups

Now we push the new observations into the Feature Store. Hopsworks deduplicates on the primary keys defined in the backfill notebook, so re-running this pipeline does not create duplicate rows.
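
To make key-based deduplication concrete, here is a small local illustration of upsert semantics using a plain dictionary. It is not how Hopsworks implements it internally. The key columns `("city", "date")` are an assumption, since the real primary keys are defined in the backfill notebook, and the second cloud cover value is made up for the example.

```python
def upsert(store, rows, key_cols):
    """Insert rows into `store`, replacing any existing row with the same key."""
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        store[key] = row  # a later row with the same key overwrites the earlier one
    return store


# Hypothetical primary key; the real keys are defined in the backfill notebook.
KEY_COLS = ("city", "date")

store = {}
upsert(store, [{"city": "Kiruna", "date": "2025-12-31", "cloud_cover": 2}], KEY_COLS)

# Running the pipeline again on the same day sends the same key a second time.
upsert(store, [{"city": "Kiruna", "date": "2025-12-31", "cloud_cover": 5}], KEY_COLS)

print(len(store))
print(store[("Kiruna", "2025-12-31")]["cloud_cover"])
```

The store still holds one row for Kiruna on that date, and it carries the most recent value.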

In [28]:
print("Before casting:\n", new_solar_df)
# Clean and cast to correct types for Feature Store compatibility
# Convert numeric columns to float32 (Feature Store expects 'float' not 'double')
df = new_solar_df.copy()
float_cols = ['by_gsm', 'bz_gsm', 'density', 'speed', 'kp_index']
for col in float_cols:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors='coerce').astype('float32')

new_solar_df = df
# check data types of each column
print("After casting:\n", new_solar_df.dtypes)
new_solar_df

Before casting:
                   time    by_gsm    bz_gsm    density       speed kp_index
0  2025-12-30 14:00:00  1.575000 -4.660000   8.020000  409.220000     3.00
1  2025-12-30 15:00:00  0.726000 -3.026333  14.660444  418.661702     2.67
2  2025-12-30 16:00:00  2.264667 -2.530167   5.216491  414.421053     2.67
3  2025-12-30 17:00:00  1.533167  0.101333   5.076441  409.232203     2.67
4  2025-12-30 18:00:00  1.500833  0.810500   3.989474  406.161404     1.67
5  2025-12-30 19:00:00 -1.179167  4.048667   4.863500  399.610000     1.67
6  2025-12-30 20:00:00 -1.069333  4.771500   5.128136  395.355932     1.67
7  2025-12-30 21:00:00 -0.064576  3.827966   4.513000  389.243333     1.00
8  2025-12-30 22:00:00  4.623333  1.672333   2.243818  391.814545     1.00
9  2025-12-30 23:00:00  5.779000  2.082333   6.238333  402.276667     1.00
10 2025-12-31 00:00:00  0.352333  5.024167   6.136167  409.711667     1.67
11 2025-12-31 01:00:00  1.725667  5.099167   5.030678  416.672881     1.67
12 2025-

Unnamed: 0,time,by_gsm,bz_gsm,density,speed,kp_index
0,2025-12-30 14:00:00,1.575,-4.66,8.02,409.220001,3.0
1,2025-12-30 15:00:00,0.726,-3.026333,14.660444,418.661713,2.67
2,2025-12-30 16:00:00,2.264667,-2.530167,5.216491,414.421051,2.67
3,2025-12-30 17:00:00,1.533167,0.101333,5.076441,409.232208,2.67
4,2025-12-30 18:00:00,1.500833,0.8105,3.989474,406.161407,1.67
5,2025-12-30 19:00:00,-1.179167,4.048666,4.8635,399.609985,1.67
6,2025-12-30 20:00:00,-1.069333,4.7715,5.128136,395.355927,1.67
7,2025-12-30 21:00:00,-0.064576,3.827966,4.513,389.243347,1.0
8,2025-12-30 22:00:00,4.623333,1.672333,2.243818,391.814545,1.0
9,2025-12-30 23:00:00,5.779,2.082333,6.238333,402.276672,1.0
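
The small changes in the `speed` column after casting (for example `409.22` becoming `409.220001`) are expected: a 32-bit float cannot represent most decimal values exactly. The sketch below reproduces the effect with the standard library by round-tripping a value through a 32-bit representation. The helper name `to_float32` is illustrative and not part of the pipeline.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


speed = 409.22                # value before casting, from the first row
speed32 = to_float32(speed)   # same value stored as float32

print(f"{speed:.6f} -> {speed32:.6f}")

# Values that are exact in binary, such as a kp_index of 1.0, are unchanged.
print(to_float32(1.0) == 1.0)
```

The difference is well below the precision of the measurements, so the cast does not affect the features in any meaningful way.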


In [29]:
# Retrieve references to the Feature Groups
solar_wind_fg = fs.get_feature_group(name="solar_wind_fg", version=1)
city_weather_fg = fs.get_feature_group(name="city_weather_fg", version=1)

# Insert new data
# Note: for real-time pipelines, the feature group is often created with
# online_enabled=True so the data is available for immediate inference.
solar_wind_fg.insert(new_solar_df)
city_weather_fg.insert(new_weather_df)

print("Daily Feature Pipeline execution complete!")

Uploading Dataframe: 100.00% |██████████| Rows 20/20 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: solar_wind_fg_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1299605/jobs/named/solar_wind_fg_1_offline_fg_materialization/executions


Uploading Dataframe: 100.00% |██████████| Rows 3/3 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: city_weather_fg_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1299605/jobs/named/city_weather_fg_1_offline_fg_materialization/executions
Daily Feature Pipeline execution complete!
