# Aurora Forecasting - Part 02: Daily Feature Pipeline

🗒️ This notebook is divided into the following sections:

- Initialize Hopsworks connection.
- Fetch the latest real-time Solar Wind data from NOAA.
- Fetch the latest Cloud Cover forecast for Stockholm, Luleå, and Kiruna.
- Update the Feature Groups in the Hopsworks Feature Store.

# Imports and Login

In [9]:
import pandas as pd
import datetime
import hopsworks
from config import HopsworksSettings
import util
import warnings
warnings.filterwarnings("ignore")
import numpy

# Setup settings
settings = HopsworksSettings()

# Login to Hopsworks
project = hopsworks.login(
    project=settings.HOPSWORKS_PROJECT,
    api_key_value=settings.HOPSWORKS_API_KEY.get_secret_value()
)
fs = project.get_feature_store()

Aurora Project Settings initialized!
2025-12-30 23:53:13,972 INFO: Closing external client and cleaning up certificates.
Connection closed.
2025-12-30 23:53:13,977 INFO: Initializing external client
2025-12-30 23:53:13,977 INFO: Base URL: https://c.app.hopsworks.ai:443
2025-12-30 23:53:15,485 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1299605


# Step 1: Get Real-time Solar Wind Data

We use the NOAA SWPC API to get the most recent measurements from the DSCOVR/ACE satellites. These will serve as the features for our real-time inference.
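
The cell output further down shows minute-level magnetometer rows going in and hourly records coming out, so the helper appears to merge the feeds and aggregate them per hour. The following is a minimal sketch of that idea, not the actual `util.get_noaa_realtime_data` implementation. The function name `merge_and_resample_hourly`, the inner join on `time_tag`, and the use of a plain hourly mean are all assumptions made for illustration.

```python
import pandas as pd


def merge_and_resample_hourly(mag_df: pd.DataFrame, plasma_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: join minute-level rows on time_tag, then average per hour."""
    # Keep only timestamps present in both feeds (assumption: inner join)
    merged = pd.merge(mag_df, plasma_df, on="time_tag", how="inner")
    merged["time_tag"] = pd.to_datetime(merged["time_tag"])

    # Measurements may arrive as strings, so coerce them to numbers first
    value_cols = [c for c in merged.columns if c != "time_tag"]
    merged[value_cols] = merged[value_cols].apply(pd.to_numeric, errors="coerce")

    # Average all measurements that fall within the same hour
    hourly = (
        merged.set_index("time_tag")
        .resample("1h")
        .mean()
        .dropna(how="all")
        .reset_index()
    )
    return hourly
```

Averaging is only one possible aggregation; a median or a last-value rule would slot into the same place if the real helper uses something else.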

In [2]:
print("Fetching real-time solar wind data from NOAA...")

# Use the helper function from util.py to fetch and merge the magnetometer, plasma, and Kp index data
new_solar_df = util.get_noaa_realtime_data(
    settings.NOAA_MAG_URL,
    settings.NOAA_PLASMA_URL,
    settings.KP_INDEX_URL
)
new_solar_df = new_solar_df.dropna()

# Rename columns to match the feature group schema
new_solar_df.rename(columns={'Kp': 'kp_index', 'time_tag': 'time'}, inplace=True)

# Format the time column as a string for Hopsworks compatibility
new_solar_df['time'] = new_solar_df['time'].dt.strftime('%Y-%m-%d %H:%M:%S')

# Drop unnecessary columns; errors='ignore' skips any that are absent
new_solar_df.drop(columns=['bx_gsm', 'lon_gsm', 'lat_gsm', 'bt', 'temperature', 'a_running', 'station_count'], inplace=True, errors='ignore')

print(f"Successfully retrieved {len(new_solar_df)} new solar wind records.")
new_solar_df

Fetching real-time solar wind data from NOAA...
Magnetometer data:
                 time_tag bx_gsm by_gsm bz_gsm lon_gsm lat_gsm    bt
0    2025-12-29 22:48:00  -5.32   0.00   3.16  180.02   30.72  6.19
1    2025-12-29 22:49:00  -4.45   0.69   4.12  171.17   42.49  6.11
2    2025-12-29 22:51:00  -3.73  -0.86   2.76  192.94   35.80  4.72
3    2025-12-29 22:52:00  -5.47  -1.60   2.40  196.31   22.87  6.18
4    2025-12-29 22:53:00  -5.51  -1.05   2.80  190.75   26.57  6.27
...                  ...    ...    ...    ...     ...     ...   ...
1126 2025-12-30 22:41:00  -1.49   8.20   1.71  100.32   11.60  8.51
1127 2025-12-30 22:42:00  -1.00   8.12   1.99   97.03   13.67  8.42
1128 2025-12-30 22:43:00  -0.80   7.64   2.07   95.98   15.06  7.95
1129 2025-12-30 22:44:00  -0.26   7.42   2.49   92.04   18.57  7.83
1130 2025-12-30 22:45:00  -1.20   7.46   1.31   99.15    9.80  7.67

[1131 rows x 7 columns]
Plasma data:
                time_tag density  speed temperature
0   2025-12-29 22:48:00   

| | time | by_gsm | bz_gsm | density | speed | kp_index |
|---|---|---|---|---|---|---|
| 0 | 2025-12-29 22:00:00 | -0.224444 | 2.136667 | 4.682 | 390.555556 | 1.67 |
| 1 | 2025-12-29 23:00:00 | -0.939778 | 3.821111 | 4.39875 | 395.233333 | 1.67 |
| 2 | 2025-12-30 00:00:00 | 1.200408 | 0.348776 | 2.298958 | 391.643182 | 1.67 |
| 3 | 2025-12-30 01:00:00 | 1.225306 | -0.715306 | 1.907708 | 390.526667 | 1.67 |
| 4 | 2025-12-30 02:00:00 | 1.128519 | -0.257037 | 2.804082 | 391.743182 | 1.67 |
| 5 | 2025-12-30 03:00:00 | -1.291852 | -2.908704 | 3.056087 | 389.627907 | 2.0 |
| 6 | 2025-12-30 04:00:00 | 0.077556 | -1.905556 | 2.780513 | 394.665714 | 2.0 |
| 7 | 2025-12-30 05:00:00 | 0.111111 | -1.842667 | 2.772973 | 395.090909 | 2.0 |
| 8 | 2025-12-30 06:00:00 | 0.866905 | -1.950714 | 2.335135 | 394.947059 | 2.0 |
| 9 | 2025-12-30 07:00:00 | 3.382857 | -0.011786 | 2.271111 | 396.868 | 2.0 |


# Step 2: Get Daily Weather Forecast

To decide if the aurora is "Visible," we need the cloud cover forecast for our three target cities.
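
To make the role of cloud cover concrete, here is a minimal sketch of how a visibility rule could combine geomagnetic activity with sky conditions. The function `is_aurora_visible` and both thresholds are hypothetical; the notebook does not define the rule here, and the values passed below are untuned placeholders.

```python
def is_aurora_visible(
    kp_index: float,
    cloud_cover: float,
    *,
    kp_threshold: float,
    max_cloud_cover: float,
) -> bool:
    """Hypothetical rule: activity must be high enough AND the sky clear enough."""
    return kp_index >= kp_threshold and cloud_cover <= max_cloud_cover


# Placeholder thresholds chosen only for illustration, not tuned values
clear_sky = is_aurora_visible(4.0, 0.0, kp_threshold=3.0, max_cloud_cover=50.0)
overcast = is_aurora_visible(4.0, 68.0, kp_threshold=3.0, max_cloud_cover=50.0)
print(clear_sky, overcast)
```

The point of the sketch is that strong solar activity alone is not enough: the same Kp value gives different answers for a clear city and an overcast one.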

In [3]:
weather_data = []
today = datetime.date.today().strftime('%Y-%m-%d')

for city, coords in settings.CITIES.items():
    print(f"Fetching cloud cover forecast for {city}...")

    # Get current cloud cover percentage from Open-Meteo
    cloud_cover = util.get_city_weather_forecast(coords['lat'], coords['lon'])

    weather_data.append({
        'city': city,
        'date': today,
        'cloud_cover': cloud_cover
    })

new_weather_df = pd.DataFrame(weather_data)
new_weather_df

Fetching cloud cover forecast for Kiruna...
Fetching cloud cover forecast for Luleå...
Fetching cloud cover forecast for Stockholm...


| | city | date | cloud_cover |
|---|---|---|---|
| 0 | Kiruna | 2025-12-30 | 46 |
| 1 | Luleå | 2025-12-30 | 68 |
| 2 | Stockholm | 2025-12-30 | 0 |


# Step 3: Insert into Feature Groups

Now we push the new observations into the Feature Store. Hopsworks will handle the deduplication based on the primary keys defined in the backfill notebook.
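
One way to picture deduplication on primary keys is a local pandas sketch of a keyed upsert. This is not Hopsworks code and does not call its API; it only mimics the outcome under the assumption that `time` is the primary key of `solar_wind_fg` (the actual keys are defined in the backfill notebook). The function name `upsert_by_key` is hypothetical.

```python
import pandas as pd


def upsert_by_key(existing: pd.DataFrame, incoming: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Hypothetical local stand-in: incoming rows replace existing rows with the same key."""
    combined = pd.concat([existing, incoming], ignore_index=True)
    return (
        combined.drop_duplicates(subset=keys, keep="last")  # the later (incoming) row wins
        .sort_values(keys)
        .reset_index(drop=True)
    )
```

Because the daily pipeline fetches roughly the last day of data on each run, consecutive runs overlap; keyed writes mean the overlapping hours do not pile up as duplicate rows.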

In [None]:
print("Before casting:\n", new_solar_df)
# Clean and cast to correct types for Feature Store compatibility
# Convert numeric columns to float32 (Feature Store expects 'float' not 'double')
df = new_solar_df.copy()
float_cols = ['by_gsm', 'bz_gsm', 'density', 'speed', 'kp_index']
for col in float_cols:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors='coerce').astype('float32')

new_solar_df = df
# check data types of each column
print("After casting:\n", new_solar_df.dtypes)
new_solar_df

Before casting:
                    time    by_gsm    bz_gsm    density       speed  kp_index
0   2025-12-29 22:00:00 -0.224444  2.136667   4.682000  390.555542         2
1   2025-12-29 23:00:00 -0.939778  3.821111   4.398750  395.233337         2
2   2025-12-30 00:00:00  1.200408  0.348776   2.298958  391.643188         2
3   2025-12-30 01:00:00  1.225306 -0.715306   1.907708  390.526672         2
4   2025-12-30 02:00:00  1.128518 -0.257037   2.804082  391.743195         2
5   2025-12-30 03:00:00 -1.291852 -2.908704   3.056087  389.627899         2
6   2025-12-30 04:00:00  0.077556 -1.905556   2.780513  394.665710         2
7   2025-12-30 05:00:00  0.111111 -1.842667   2.772973  395.090912         2
8   2025-12-30 06:00:00  0.866905 -1.950714   2.335135  394.947052         2
9   2025-12-30 07:00:00  3.382857 -0.011786   2.271111  396.868011         2
10  2025-12-30 08:00:00 -2.065714 -0.662857   3.583600  384.115997         2
11  2025-12-30 09:00:00  1.012222  0.081556   3.044706  398

| | time | by_gsm | bz_gsm | density | speed | kp_index |
|---|---|---|---|---|---|---|
| 0 | 2025-12-29 22:00:00 | -0.224444 | 2.136667 | 4.682 | 390.555542 | 2 |
| 1 | 2025-12-29 23:00:00 | -0.939778 | 3.821111 | 4.39875 | 395.233337 | 2 |
| 2 | 2025-12-30 00:00:00 | 1.200408 | 0.348776 | 2.298958 | 391.643188 | 2 |
| 3 | 2025-12-30 01:00:00 | 1.225306 | -0.715306 | 1.907708 | 390.526672 | 2 |
| 4 | 2025-12-30 02:00:00 | 1.128518 | -0.257037 | 2.804082 | 391.743195 | 2 |
| 5 | 2025-12-30 03:00:00 | -1.291852 | -2.908704 | 3.056087 | 389.627899 | 2 |
| 6 | 2025-12-30 04:00:00 | 0.077556 | -1.905556 | 2.780513 | 394.66571 | 2 |
| 7 | 2025-12-30 05:00:00 | 0.111111 | -1.842667 | 2.772973 | 395.090912 | 2 |
| 8 | 2025-12-30 06:00:00 | 0.866905 | -1.950714 | 2.335135 | 394.947052 | 2 |
| 9 | 2025-12-30 07:00:00 | 3.382857 | -0.011786 | 2.271111 | 396.868011 | 2 |


In [11]:
# Retrieve references to the Feature Groups
solar_wind_fg = fs.get_feature_group(name="solar_wind_fg", version=1)
city_weather_fg = fs.get_feature_group(name="city_weather_fg", version=1)

# Insert new data
# Note: For real-time pipelines, feature groups are often created with
# online_enabled=True so the data is available for immediate inference.
solar_wind_fg.insert(new_solar_df)
city_weather_fg.insert(new_weather_df)

print("Daily Feature Pipeline execution complete!")

Uploading Dataframe: 100.00% |██████████| Rows 18/18 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: solar_wind_fg_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1299605/jobs/named/solar_wind_fg_1_offline_fg_materialization/executions


Uploading Dataframe: 100.00% |██████████| Rows 3/3 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: city_weather_fg_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1299605/jobs/named/city_weather_fg_1_offline_fg_materialization/executions
Daily Feature Pipeline execution complete!
