# Aurora Forecasting - Part 03: Training Pipeline

🗒️ This notebook is divided into the following sections:

1. Initialize the Hopsworks connection and retrieve Feature Groups.
2. Create a Feature View and Training Dataset.
3. Train a Random Forest model to predict the Kp index from solar wind features.
4. Evaluate model performance.
5. Register the model in the Hopsworks Model Registry.

# Import and setup


In [8]:
import pandas as pd
import hopsworks
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import joblib
import os
from config import HopsworksSettings

# Setup settings
settings = HopsworksSettings()

# Login to Hopsworks
project = hopsworks.login(
    project=settings.HOPSWORKS_PROJECT,
    api_key_value=settings.HOPSWORKS_API_KEY.get_secret_value()
)
fs = project.get_feature_store()

Aurora Project Settings initialized!
2025-12-31 15:56:58,268 INFO: Closing external client and cleaning up certificates.
Connection closed.
2025-12-31 15:56:58,279 INFO: Initializing external client
2025-12-31 15:56:58,281 INFO: Base URL: https://c.app.hopsworks.ai:443
To ensure compatibility please install the latest bug fix release matching the minor version of your backend (4.2) by running 'pip install hopsworks==4.2.*'

2025-12-31 15:56:59,895 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1299605


# Step 1: Create Feature View

The Feature View acts as a metadata layer over our Feature Group, letting us declare which columns are features and which are labels for training. We use the solar wind parameters (by_gsm, bz_gsm, density, speed) as features and kp_index as the target label. The query also carries the time column, which is dropped before the model is trained.

In [9]:
# Get the solar wind feature group
solar_wind_fg = fs.get_feature_group(name="solar_wind_fg", version=1)

# Select all columns: the features, the label, and the timestamp
query = solar_wind_fg.select_all()

# Create or retrieve the Feature View
# Note: Weather data is used for visibility logic in inference,
# while Kp is predicted solely from solar wind data.
feature_view = fs.get_or_create_feature_view(
    name="aurora_kp_view",
    version=1,
    description="Predicting the Kp index from solar wind parameters",
    labels=["kp_index"],
    query=query
)

print("Feature View created/retrieved successfully.")

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1299605/fs/1287235/fv/aurora_kp_view/version/1
Feature View created/retrieved successfully.
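
To make the role of `labels=["kp_index"]` concrete, here is a minimal pure-Python sketch of the separation a Feature View performs conceptually: columns named as labels become the target, and everything else stays a feature. The helper `split_features_and_labels` is hypothetical and is not part of the Hopsworks API; the two rows reuse values printed by the training-split cell in this notebook.

```python
def split_features_and_labels(rows, label_names):
    """Separate each row into a feature dict and a label dict (illustrative helper)."""
    features = [
        {key: value for key, value in row.items() if key not in label_names}
        for row in rows
    ]
    labels = [{key: row[key] for key in label_names} for row in rows]
    return features, labels


# Two rows copied from the X_train / y_train printout in this notebook.
rows = [
    {"time": "2025-01-23 02:00:00+00:00", "by_gsm": -1.4, "bz_gsm": -2.7,
     "density": 4.0, "speed": 432.0, "kp_index": 3.3},
    {"time": "2024-11-25 19:00:00+00:00", "by_gsm": -3.6, "bz_gsm": -0.4,
     "density": 5.5, "speed": 395.0, "kp_index": 1.7},
]

features, labels = split_features_and_labels(rows, label_names=["kp_index"])
print(sorted(features[0]))  # feature column names, without kp_index
print(labels[0])            # the label for the first row
```

Note that `time` ends up on the feature side, which is why it has to be dropped explicitly before fitting the model.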


# Step 2: Create Training Dataset

We split our historical data into training and test sets (80% / 20%) so that we can measure how well the model generalizes to solar wind conditions it has not seen during training.

In [11]:
# Create training and test split
# This also registers the split in Hopsworks for reproducibility
X_train, X_test, y_train, y_test = feature_view.train_test_split(
    test_size=0.2,
    description="Aurora Kp prediction training dataset"
)

print(f"Training set size: {len(X_train)}")
print(f"Test set size: {len(X_test)}")
print(X_train.head())
print(y_train.head())

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.91s) 

2025-12-31 15:58:01,125 INFO: Provenance cached data - overwriting last accessed/created training dataset from 1 to 2.
Training set size: 13388
Test set size: 3348
                        time  by_gsm  bz_gsm  density  speed
0  2025-01-23 02:00:00+00:00    -1.4    -2.7      4.0  432.0
1  2024-11-25 19:00:00+00:00    -3.6    -0.4      5.5  395.0
2  2025-06-21 14:00:00+00:00     3.7    -4.3      4.0  523.0
3  2025-02-01 05:00:00+00:00    15.4     0.6      8.8  410.0
5  2024-07-29 01:00:00+00:00     3.4    -0.6      6.5  377.0
   kp_index
0       3.3
1       1.7
2       2.7
3       3.7
5       2.0
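
The split sizes above can be reproduced with a small standard-library sketch of a random 80/20 split. This is an illustration only, not how Hopsworks implements `train_test_split`: the function name, the seed, and the choice to round the test size up are assumptions. Rounding up happens to match the printed sizes for the 16,736 rows in this dataset (13,388 + 3,348).

```python
import math
import random


def random_train_test_split(rows, test_size=0.2, seed=42):
    """Randomly assign a fraction of rows to the test set (illustrative sketch)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    # Assumption: the test-set size is rounded up to the next whole row.
    n_test = math.ceil(len(rows) * test_size)
    test_indices = set(indices[:n_test])
    # Keep the original row order inside each subset.
    train = [row for i, row in enumerate(rows) if i not in test_indices]
    test = [row for i, row in enumerate(rows) if i in test_indices]
    return train, test


rows = list(range(16736))  # stand-in row ids, one per record in the dataset
train, test = random_train_test_split(rows, test_size=0.2)
print(len(train), len(test))
```

Because rows are assigned at random, the training set keeps gaps in its original index. The jump from index 3 to index 5 in the printout above is consistent with row 4 having been assigned to the test set.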


In [15]:
# Drop the timestamp column: it identifies the record but is not a model input
X_features = X_train.drop(columns=['time'])
X_test_features = X_test.drop(columns=['time'])

# Step 3: Train the Model

We train a Random Forest Regressor. This model is well suited to mapping the complex, non-linear relationships between solar wind plasma parameters and geomagnetic activity.

In [16]:
print("Training Random Forest Regressor...")

# Initialize and train the model
# You can tune hyperparameters like n_estimators and max_depth
rf_model = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    random_state=42
)

rf_model.fit(X_features, y_train.values.ravel())

print("Model training complete.")

Training Random Forest Regressor...

Model training complete.
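
For intuition about what the forest does, the following is a toy sketch of bagging (bootstrap aggregation): each tree is fit on a resampled copy of the data, and the ensemble prediction is the average over trees. It is a deliberate simplification and not scikit-learn's implementation: the trees are one-split "stumps" on a single feature, all function names are hypothetical, and the data is made up so that Kp is high when bz_gsm is negative.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature by minimizing squared error."""
    overall = mean(ys)
    best = (float("inf"), None, overall, overall)  # (sse, threshold, left, right)
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None:  # the sample had a single distinct x value
        return left_mean
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=42):
    """Fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """The ensemble prediction is the mean of the individual tree predictions."""
    return mean(predict_stump(stump, x) for stump in forest)


# Toy data (not real measurements): Kp is 6 for negative bz_gsm, 2 otherwise.
xs = [float(x) for x in range(-10, 11)]
ys = [6.0 if x < 0 else 2.0 for x in xs]

forest = fit_forest(xs, ys)
print(predict_forest(forest, -8.0), predict_forest(forest, 8.0))
```

A real random forest grows much deeper trees (here capped by `max_depth=10`) and also samples a subset of features at each split, but the averaging step is the same idea.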


# Step 4: Model Evaluation

We evaluate the model on the held-out test set using Mean Squared Error (MSE) and R² (the coefficient of determination) to determine how accurately it predicts the geomagnetic Kp index.

In [17]:
# Make predictions on the test set
y_pred = rf_model.predict(X_test_features)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

metrics = {
    "mse": mse,
    "r2": r2
}

print(f"Model MSE: {mse:.4f}")
print(f"Model R2 Score: {r2:.4f}")

Model MSE: 0.5927
Model R2 Score: 0.6963
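
To show what these two numbers measure, here is a standard-library sketch of the formulas behind `mean_squared_error` and `r2_score`. The function names are hypothetical, the true values are the five labels from `y_train.head()` above, and the predictions are invented purely for illustration.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """MSE: the average squared difference between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """R2: 1 minus the ratio of residual error to the variance around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.3, 1.7, 2.7, 3.7, 2.0]  # labels from y_train.head()
y_pred = [3.0, 2.0, 2.5, 3.5, 2.5]  # hypothetical predictions, for illustration only

example_mse = mean_squared_error_manual(y_true, y_pred)
example_r2 = r2_score_manual(y_true, y_pred)
print(f"Example MSE: {example_mse:.4f}")
print(f"Example R2:  {example_r2:.4f}")

# RMSE puts the reported test-set error back into Kp units.
reported_rmse = math.sqrt(0.5927)
print(f"Reported RMSE: {reported_rmse:.2f}")
```

Read this way, the reported MSE of 0.5927 corresponds to a typical error of roughly 0.77 Kp units, and an R² of 0.6963 means the model explains about 70% of the variance in the test-set Kp values compared with always predicting the mean.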


# Step 5: Register Model to Hopsworks

Once satisfied with the performance, we save the model artifacts and register them in the Hopsworks Model Registry so they can be retrieved by the Batch Inference pipeline.

In [19]:
# Create a local directory for model artifacts
model_dir = "aurora_model"
os.makedirs(model_dir, exist_ok=True)

# Save the model artifact
model_path = os.path.join(model_dir, "model.pkl")
joblib.dump(rf_model, model_path)

# Get the Model Registry
mr = project.get_model_registry()

# Create the model entry
aurora_model = mr.python.create_model(
    name=settings.MODEL_NAME, # "aurora_kp_rf_model" from config.py
    metrics=metrics,
    description="Random Forest Regressor for predicting Kp index based on solar wind features.",
    #input_example=X_train.sample(1),
    feature_view=feature_view
)

# Upload the model to the registry
aurora_model.save(model_dir)

print(f"Model '{settings.MODEL_NAME}' version {aurora_model.version} registered successfully.")

Model created, explore it at https://c.app.hopsworks.ai:443/p/1299605/models/aurora_kp_rf_model/2
Model 'aurora_kp_rf_model' version 2 registered successfully.
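
Before relying on a registered artifact, it can be worth checking that the saved file loads back into an equivalent object. The sketch below shows that dump-then-load round trip using only the standard library. It is a stand-in, not the notebook's code path: it uses `pickle` and a plain dictionary of the hyperparameters instead of `joblib` and the fitted `rf_model`, and it writes to a temporary directory. In the notebook, `joblib.load(model_path)` is the counterpart of the `joblib.dump` call above.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted model: the hyperparameters used in this notebook.
stand_in_model = {"n_estimators": 100, "max_depth": 10, "random_state": 42}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Mirror the notebook's layout: <model_dir>/model.pkl
    model_dir = os.path.join(tmp_dir, "aurora_model")
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, "model.pkl")

    # Save the artifact.
    with open(model_path, "wb") as f:
        pickle.dump(stand_in_model, f)

    # Load it back and record the file size before the directory is removed.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)
    artifact_size = os.path.getsize(model_path)

print(f"Round trip OK: {restored == stand_in_model}")
print(f"Artifact size: {artifact_size} bytes")
```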
