# Example 2 - Training on one dataset - evaluating on another

In this example, we use experimental AST data to train the model. Then, we import another dataset, and use the trained PINN to predict electrolyzer performance degradation based on the previously learned weights and biases.

First, we define the project root and append the source folder for importing modules.

In [None]:
import sys
import os

# Get the root project folder
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

# Add the src folder to sys.path
src_path = os.path.join(project_root, 'src')
if src_path not in sys.path:
    sys.path.append(src_path)

In [None]:
import yaml
from elec_pinn.data.preprocessing   import Preprocessor     
from elec_pinn.data.loader          import ScalerLoader     
from elec_pinn.cli                  import load_config, get_model
from elec_pinn.utils.visualization  import plot_pinn_performance 


Load the configurations from the config.yaml file.

In [None]:
cfg = load_config("example2_config.yaml")
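The cells in this example read a number of entries from cfg. As a rough checklist, the sketch below collects those entries in a plain dictionary. All values are hypothetical placeholders, the interpretation of t0 and t1 as a fit window is an assumption, and the real example2_config.yaml likely holds further entries (for instance the model settings consumed by get_model) that are not shown here. The helper missing_keys is likewise hypothetical and not part of elec_pinn.

```python
# Entries that this notebook reads from cfg.
# Every value below is a hypothetical placeholder, not the real configuration.
example_cfg = {
    "data": {
        "dataset_name": "my_dataset.csv",  # hypothetical file name
        "t0": 0.0,                         # assumed: start of the performance-fit window
        "t1": 100.0,                       # assumed: end of the performance-fit window
        "feature_names": ["t", "j"],       # hypothetical column names
        "target_names": ["U"],             # hypothetical column name
        "scale_range": [0, 1],
        "train_frac": 0.6,
        "val_frac": 0.2,
    },
    "training": {
        "batch_sizes": [64, 64, 64],       # hypothetical values and length
        "epochs": 1000,
        "save_freq": 100,
        "patience": 50,
    },
}

REQUIRED_KEYS = {
    "data": ["dataset_name", "t0", "t1", "feature_names", "target_names",
             "scale_range", "train_frac", "val_frac"],
    "training": ["batch_sizes", "epochs", "save_freq", "patience"],
}


def missing_keys(cfg):
    """Return 'section.key' strings for every required entry absent from cfg."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            if key not in cfg.get(section, {}):
                missing.append(f"{section}.{key}")
    return missing
```

Calling missing_keys(cfg) right after loading the config gives an early hint about a misspelled or forgotten entry, before any of the later cells fail with a KeyError.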

We load the data into the preprocessor, which fits the electrolyzer performance on the initial part of the dataset, as defined in the config.yaml file.

In [None]:
dp = Preprocessor(cfg["data"]["dataset_name"])
df = dp.preprocess(
    t0=cfg["data"]["t0"],
    t1=cfg["data"]["t1"],
    plot_fit=True,   # show the performance fit curve
    plot_raw=False,  # show raw data over time
)

Instantiate a 'scaler' based on the feature and target names as well as the scaling range. Then, use the .get_loaders() method to convert the input data into PyTorch DataLoaders, which are convenient when working with neural networks. The dataset is split into train, validation, and test DataLoaders, plus a combined DataLoader that contains the whole dataset.

The training DataLoader is shuffled in time, while the validation, test, and combined DataLoaders are not. Shuffling the training data was found by trial and error to enhance prediction accuracy.

The resulting DataLoaders are also normalized to the range specified in the config file.

In [None]:
scaler = ScalerLoader(
    feature_cols=cfg["data"]["feature_names"],
    target_cols=cfg["data"]["target_names"],
    scale_range=tuple(cfg["data"]["scale_range"]),
).fit(df)

train_loader, val_loader, test_loader, all_loader = scaler.get_loaders(
    df,
    f_train=cfg["data"]["train_frac"],
    f_val=  cfg["data"]["val_frac"],
    f_test=1 - cfg["data"]["train_frac"] - cfg["data"]["val_frac"],
    batch_sizes=tuple(cfg["training"]["batch_sizes"])
)
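The split described above can be sketched in a few lines. This is only an illustration under two assumptions: the rows are ordered in time, and the train, validation, and test parts are contiguous blocks of which only the training block is shuffled. The function split_indices is hypothetical and not part of elec_pinn, so ScalerLoader.get_loaders may differ in detail.

```python
import random


def split_indices(n, f_train, f_val, seed=0):
    """Split range(n) into contiguous train/val/test blocks; shuffle only train."""
    n_train = int(round(n * f_train))
    n_val = int(round(n * f_val))
    all_idx = list(range(n))
    train = all_idx[:n_train]
    val = all_idx[n_train:n_train + n_val]
    test = all_idx[n_train + n_val:]      # the remainder, i.e. 1 - f_train - f_val
    random.Random(seed).shuffle(train)    # validation, test, and combined keep time order
    return train, val, test, all_idx


train, val, test, all_idx = split_indices(10, 0.6, 0.2)
```

Keeping the validation, test, and combined indices in time order means the predictions can be plotted directly against time without re-sorting.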

We then pull the requested PINN version. In this example we use the FullPINN containing all the prediction functionalities. Next, the model is trained using the training and validation loaders.

In [None]:
model = get_model(cfg)
training_results = model.train_model(
    train_loader,
    val_loader,
    cfg['training']['epochs'],
    cfg['training']['save_freq'],
    cfg['training']['patience'],
)
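How train_model uses the patience argument is not shown in this example. A common convention, assumed in the sketch below, is early stopping: training ends once the validation loss has not improved for patience consecutive epochs. The helper early_stop_epoch and the loss values are hypothetical.

```python
def early_stop_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss. If that never happens, the last epoch index
    is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses: the best value is reached at epoch 2
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stop = early_stop_epoch(losses, patience=3)
```

With these numbers the loop stops at epoch 5, three epochs after the best loss at epoch 2. A larger patience tolerates longer plateaus at the cost of more training time.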

Next, we can use the plot_losses method to inspect the training process and store the results in the example_2 directory.

In [None]:
# directory in which the loss curves and the later figures are saved
save_path = os.path.join(project_root, "examples", "example_2")
model.plot_losses( save_path )


Next, we evaluate model performance to gauge whether the trained model is able to accurately mimic the training and validation data. In this example, a large part of the dataset is used for training and validation, and thus we expect the PINN to accurately model the training & validation dataset. 

In [None]:
result_df = model.evaluate(scaler, df, all_loader, cfg['data']['feature_names'], cfg['data']['target_names'] )

In [None]:
plot_pinn_performance(result_df, cfg['data']['feature_names'], cfg['data']['target_names'], train_frac = cfg["data"]["train_frac"], val_frac = cfg["data"]["val_frac"], save_path = save_path)

At this point we have trained the PINN on a dataset. We now want to evaluate how this specific cell would perform under a different test protocol. For this purpose, we test the case where the cell is operated according to a solar PV profile.

The "SolarPV_synthethic_electrolyzer_data.csv" file contains ~6 months of solar PV data normalized to current density values similar to those in the original training dataset.
First, we instantiate a new Preprocessor for the new forecasting dataset.

In [None]:
fdp = Preprocessor("SolarPV_synthethic_electrolyzer_data.csv")

In [None]:
fdp.load() # loading the data

Using the previous training dataset ("df"), we can still fit the performance data for the new test protocol, even though it does not contain any cell voltages. This is done by specifying fitting_df = df.

In [None]:
forecast_df = fdp.fit_performance(
    t0=cfg["data"]["t0"],
    t1=cfg["data"]["t1"],
    fitting_df=df,
)
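The internals of fit_performance are not shown in this example. As a conceptual sketch only, assume that the performance voltage is a linear function of current density, fitted on the window t0 to t1 of the training data. Such a fit can then be evaluated on a new current profile that has no measured voltages. The linear model, the variable names, and all numbers below are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical training data: time, current density, measured cell voltage
t_train = [0, 1, 2, 3, 4, 5]
j_train = [0.5, 1.0, 1.5, 2.0, 1.0, 2.0]
U_train = [1.6, 1.7, 1.8, 1.9, 1.75, 1.95]  # points after the window are excluded

fit_t0, fit_t1 = 0, 3  # hypothetical fit window (initial part of the data)
window = [i for i, t in enumerate(t_train) if fit_t0 <= t <= fit_t1]
a, b = fit_line([j_train[i] for i in window], [U_train[i] for i in window])

# New protocol: current densities only, no measured voltages
j_new = [0.8, 1.2, 1.8]
U_perf_new = [a + b * j for j in j_new]
```

Because the fit depends only on the training window, it yields a performance voltage for every row of the new profile.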


We check that the new forecast_df does indeed contain the performance cell voltage for all time values.

In [None]:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(forecast_df['t'], forecast_df['U_perf'], '.')

In [None]:
# we want to use the same scaler from before, so the data is scaled in an identical way

forecast_loader = scaler.get_inference_loader(
    forecast_df,
    batch_size=cfg["training"]["batch_sizes"][0] )
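The reason for reusing the fitted scaler can be illustrated with a minimal min-max scaler. This sketch assumes that ScalerLoader applies min-max scaling to the configured range. The class MinMaxScaler1D and the numbers are hypothetical and only demonstrate the principle.

```python
class MinMaxScaler1D:
    """Minimal min-max scaler for a single column (illustration only)."""

    def __init__(self, scale_range=(0.0, 1.0)):
        self.lo, self.hi = scale_range

    def fit(self, values):
        # Store the minimum and maximum of the data the scaler is fitted on
        self.vmin = min(values)
        self.vmax = max(values)
        return self

    def transform(self, values):
        span = self.vmax - self.vmin
        return [self.lo + (v - self.vmin) / span * (self.hi - self.lo)
                for v in values]


train_j = [0.0, 1.0, 2.0]      # hypothetical training current densities
forecast_j = [0.5, 1.5, 2.5]   # hypothetical forecast current densities

scaler_1d = MinMaxScaler1D((0.0, 1.0)).fit(train_j)

# Reusing the training scaler keeps the mapping the model was trained with
scaled_forecast = scaler_1d.transform(forecast_j)

# Refitting on the forecast data would silently change that mapping
refit_forecast = MinMaxScaler1D((0.0, 1.0)).fit(forecast_j).transform(forecast_j)
```

Here the same physical value 0.5 maps to 0.25 with the training scaler but to 0.0 after refitting, so a refitted scaler would feed the network inputs it was never trained on. The value 2.5 also scales to 1.25, outside the training range, which signals that the model is extrapolating for that point.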


Now we can run the model.evaluate() method using the new forecast_df and forecast_loader to evaluate the model prediction on the new dataset. 

In [None]:
forecast_result_df = model.evaluate(scaler, forecast_df, forecast_loader, cfg['data']['feature_names'], cfg['data']['target_names'], save_folder = "forecast_plots")

Lastly, we can plot the results.

In [None]:
plot_pinn_performance(forecast_result_df, cfg['data']['feature_names'], cfg['data']['target_names'], train_frac = 0.0, val_frac = 0.0, save_path = save_path)