In [2]:
import ML4PS as ml
import numpy as np
from matplotlib import pyplot as plt

# Choose dataset

In [3]:
data_dir = '../../data/case14'

# Defining a series of normalizing functions

The features of power system objects can span very different orders of magnitude and follow complex, possibly multimodal distributions. It is therefore important to build a series of functions that map these atypical distributions to something close to a uniform distribution on $[-1, 1]$.

Normalization relies on a piecewise linear approximation of the cumulative distribution function of each required feature, estimated on a subset of the training set. The estimate does not need to be perfect, so a reasonably small number of samples is enough (see the option *amount_of_samples*).
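
To make the idea concrete, here is a minimal NumPy sketch of a piecewise linear CDF normalizer. It is not the ML4PS implementation: the function name `build_normalizer` and the parameter `n_breakpoints` are hypothetical, and the quantile-based breakpoints are only one plausible way to build the approximation.

```python
import numpy as np

def build_normalizer(samples, n_breakpoints=10):
    """Return a function mapping values to [-1, 1] through a piecewise linear CDF.

    Hypothetical helper for illustration; not part of ML4PS.
    """
    samples = np.asarray(samples, dtype=float)
    # Breakpoints of the approximation: empirical quantiles at evenly spaced probabilities.
    probs = np.linspace(0.0, 1.0, n_breakpoints)
    quantiles = np.quantile(samples, probs)
    # np.interp expects increasing x-coordinates, so drop repeated quantiles.
    quantiles, keep = np.unique(quantiles, return_index=True)
    probs = probs[keep]

    def normalize(values):
        # Piecewise linear estimate of the CDF, in [0, 1] ...
        cdf = np.interp(values, quantiles, probs)
        # ... rescaled to [-1, 1].
        return 2.0 * cdf - 1.0

    return normalize

# Synthetic bimodal feature with two very different orders of magnitude.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0.0, 1.0, 5000), rng.normal(100.0, 5.0, 5000)])

normalize = build_normalizer(samples)
z = normalize(samples)
print(z.min(), z.max())
```

Values outside the range seen during estimation are clipped to $-1$ or $1$ by `np.interp`, which is one reason a modest subset of samples can be sufficient in a sketch like this.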

In [4]:
normalizer = ml.Normalizer(data_dir=data_dir, backend_name='pandapower')
normalizer.save('normalizer.pkl')

Loading all the dataset: 100%|██████████████████| 50/50 [00:06<00:00,  7.44it/s]


# Defining an interface

To train the neural network, we need to feed it batches of power grid instances. The interface exposes three iterators (train, val and test) that iterate over the whole dataset defined in *data_dir* and return the inputs $a$ and $x$ read by the neural network, together with the corresponding backend network instances (pandapower networks here, since *backend_name* is 'pandapower').

Because some users may want to work with time series, the interface supports rolling windows. The attribute *series_length* defines the temporal coherence of the time series, while *time_window* defines the size of the time windows that the iterator retrieves. The snapshots of a time window are aggregated by concatenating their respective features. By default, time series are not considered.
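
The rolling-window idea can be sketched with plain NumPy. This is an illustration under assumptions, not the ML4PS code: the function `rolling_windows` is hypothetical, and it assumes that *series_length* means the number of consecutive snapshots that belong to the same series, so that no window crosses a series boundary.

```python
import numpy as np

def rolling_windows(snapshots, series_length, time_window):
    """Build windows of shape [n_obj, time_window] from snapshots of shape [n_snapshots, n_obj].

    Hypothetical helper for illustration; not part of ML4PS.
    """
    if time_window > series_length:
        raise ValueError("time_window must not exceed series_length")
    snapshots = np.asarray(snapshots, dtype=float)
    windows = []
    # Split the snapshots into consecutive series (assumed meaning of series_length).
    for start in range(0, len(snapshots) - series_length + 1, series_length):
        series = snapshots[start:start + series_length]
        # Slide a window inside each series, never across two series.
        for t in range(series_length - time_window + 1):
            window = series[t:t + time_window]   # [time_window, n_obj]
            windows.append(window.T)             # [n_obj, time_window]
    return np.stack(windows)                     # [n_windows, n_obj, time_window]

# Six snapshots of one feature for two objects, grouped into two series of three.
snapshots = np.arange(12).reshape(6, 2)
windows = rolling_windows(snapshots, series_length=3, time_window=2)
print(windows.shape)
```

Putting the snapshots on the last axis matches the [n_batch, n_obj, time_window] layout used for the values of $a$ and $x$.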

In [None]:
interface = ml.Interface(data_dir=data_dir,
    backend_name='pandapower', batch_size=1)

Let us now look at the data. Note that $a$ and $x$ are nested dictionaries whose values have shape [n_batch, n_obj, time_window], where n_obj is the number of objects of the considered class. $a$ contains addresses (integers), while $x$ contains features (floats).

In [None]:
a, x, nets = next(iter(interface.train))
print("keys in a :{}".format(list(a.keys())))
print("keys in a['gen'] :{}".format(list(a['gen'].keys())))
print("values in a['gen']['name'] :{}".format(a['gen']['name']))
print("")
print("keys in x :{}".format(list(x.keys())))
print("keys in x['gen'] :{}".format(list(x['gen'].keys())))
print("values in x['gen']['p_mw'] :{}".format(x['gen']['p_mw']))
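
To check the shapes of every entry without printing each key by hand, a small helper can walk the two dictionary levels. The sketch below uses stand-in arrays with arbitrary sizes rather than real case14 data, and `describe` is a hypothetical helper, not an ML4PS function; with the real interface, one would pass `a` or `x` instead of `x_mock`.

```python
import numpy as np

def describe(nested):
    """Return {(object_class, feature_name): shape} for a two-level dict of arrays.

    Hypothetical helper for illustration; not part of ML4PS.
    """
    return {(cls, feat): np.shape(values)
            for cls, feats in nested.items()
            for feat, values in feats.items()}

# Stand-in for x: shapes follow [n_batch, n_obj, time_window]; sizes are arbitrary.
x_mock = {
    'gen': {'p_mw': np.zeros((1, 3, 1))},
    'load': {'p_mw': np.zeros((1, 4, 1)), 'q_mvar': np.zeros((1, 4, 1))},
}

shapes = describe(x_mock)
for key, shape in shapes.items():
    print(key, shape)
```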