Data loading

"I disagree strongly with whatever work this quote is attached to." -- Randall Munroe

One can argue that loading data is the most important part of a postprocessing tool. In Postgkyl, it is handled by the Data class (there is a postgkyl.Data shortcut). It loads data on initialization and serves as an input for all the other parts of Postgkyl.


Examples are provided for both scripting and the command line, using output files of an electrostatic two-stream instability simulation [:doc:`two-stream.lua<input/two-stream>`].

Gkeyll files are loaded in Postgkyl by creating a new instance of the Data class with the file name as the parameter.

import postgkyl as pg
data = pg.Data('filename')

Next, getGrid() and getValues() can be used to return the grid and values as NumPy arrays. For structured meshes, getGrid() returns a Python list of 1D NumPy arrays which represent the nodal points of the grid in each dimension. Note that since these are nodal points, these arrays always have one more element in each dimension than the value array has cells. Another important note is that the value array always has one extra dimension for components. Components can represent many things, from vector elements to discontinuous Galerkin expansion coefficients. As a rule, this extra dimension is always retained even if there is just one component.
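The relation between nodal points, cells, and the component dimension can be sketched without Postgkyl at all. In the following minimal illustration, plain Python lists stand in for the NumPy arrays, and the names nodes, values, and cell_centers are hypothetical; they are not part of the Postgkyl API.

```python
# Hypothetical stand-in for one entry of the list returned by getGrid():
# nodal points of a 1D grid with 4 cells on [-1, 1].
nodes = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Hypothetical stand-in for the values: one row per cell. The trailing
# component axis is kept even though there is only a single component.
values = [[0.1], [0.4], [0.4], [0.1]]


def cell_centers(nodal_points):
    """Return the midpoint of each pair of neighboring nodal points."""
    return [0.5 * (lo + hi)
            for lo, hi in zip(nodal_points[:-1], nodal_points[1:])]


centers = cell_centers(nodes)

# There is one more nodal point than there are cells.
assert len(nodes) == len(values) + 1
assert len(centers) == len(values)

# Drop the component axis by picking component 0 for every cell.
first_component = [row[0] for row in values]
```

Cell centers computed this way have the same length as the value array along that dimension, which is convenient when plotting cell-averaged data by hand.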

Script example
import postgkyl as pg
data = pg.Data('two-stream_elc_0.bp')
It is also possible to create an empty instance and fill it using the push function.

In the command line mode, a data file is loaded by simply adding it to the pgkyl script chain at any position.

pgkyl filename


Under the hood, Postgkyl calls a hidden load command to load the file. When a provided string does not match any command but does match a file, the load command is invoked and the file name is passed to it. The load command should not be called manually, but it can be used to access the help.
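The dispatch rule described above can be pictured with a small sketch. This is not Postgkyl's actual implementation; the function resolve_token, its arguments, and the set of command names are assumptions made purely for illustration, and wildcard handling is left out.

```python
def resolve_token(token, commands, is_file):
    """Decide how one command-line token is handled (illustrative only).

    `commands` is a set of known command names and `is_file` is a
    predicate such as os.path.isfile; both are assumptions of this sketch.
    """
    if token in commands:
        # A known command is used as-is and receives no file name.
        return (token, None)
    if is_file(token):
        # Not a command but an existing file: route it to the load command.
        return ("load", token)
    raise ValueError(f"'{token}' is neither a command nor a file")


# A fake "file system" keeps the example independent of real files.
known_commands = {"plot", "info", "interpolate", "select"}
existing_files = {"two-stream_elc_0.bp"}

result = resolve_token("two-stream_elc_0.bp", known_commands,
                       existing_files.__contains__)
```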

pgkyl load --help

Currently, Postgkyl supports the h5 files that were used in Gkeyll 1, Gkeyll 2 ADIOS bp files, and Gkeyll 0 gkyl binary files. Many of the advanced functions, like loading only partial data, and some quality-of-life features, like storing the polynomial order of the DG representation, are currently available only for the ADIOS bp files.

Loading multiple files in a script is straightforward; one creates more instances of the Data class. On the command line, Postgkyl naturally supports loading any number of files.

pgkyl two-stream_elc_0.bp two-stream_elc_1.bp

All commands are then generally applied in batch to all the data sets, and the :ref:`pg_cmd_plot` command creates a separate figure for each data set (this can be modified with :ref:`pg_cmd_plot` options like -f0).

When batch application of commands is not the desired behavior, some data files can be loaded later in the chain, a loaded dataset can be changed from active to inactive (:ref:`pg_cmd_activate`/:ref:`pg_cmd_deactivate`), or the command scope can be limited by specifying :ref:`tags <pg_keyconcepts_tags>`. The :ref:`pg_keyconcepts` section provides examples where one desired behavior is achieved in multiple ways. It is left up to the user to choose the preferred one.

Postgkyl also allows loading with wildcard characters:

pgkyl 'two-stream*.bp'


While the quotes are entirely optional when loading a single file, they change the behavior when used with wildcard characters. With quotes, a single load command is performed and the wildcard matching is done internally by Postgkyl. Without quotes, the shell expands the wildcard before calling Postgkyl, which results in several load command calls. This leads to several key differences:

  1. With quotes, Postgkyl orders files correctly, i.e., file_2 will be before file_10.
  2. With quotes, tags, labels, etc., are applied to all the matching files, not just the last one.
  3. Some wildcard characters like [0-9] are not supported by every shell.
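The ordering difference in the first point can be reproduced in a few lines. The sketch below compares a plain lexicographic sort, which is what shell expansion typically yields, with a "natural" sort that compares embedded numbers by value. The helper natural_key is hypothetical and only illustrates the idea; Postgkyl's internal ordering may be implemented differently.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so numbers compare by value."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]


files = ["file_10", "file_2", "file_1"]

# Character-by-character comparison puts "file_10" before "file_2".
lexicographic = sorted(files)

# Comparing the numeric chunks as integers restores the expected order.
natural = sorted(files, key=natural_key)
```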

Using wildcard characters might lead to unexpected situations. For example, in the two-stream case, the query two-stream_elc_* is going to return two-stream_elc_0.bp but also the moment files like two-stream_elc_M0_0.bp. If we want to load just the distribution functions, we can limit the query. For example:

pgkyl 'two-stream_elc_[0-9]*.bp'

This requires the first character to be a number between 0 and 9, which effectively eliminates all the outputs except for the distribution functions themselves.
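The effect of narrowing the query can be checked with Python's standard fnmatch module, used here as a stand-in under the assumption that Postgkyl's internal matching follows the same glob-style rules. The list of output names is illustrative.

```python
from fnmatch import fnmatchcase

# Illustrative output names from the two-stream case.
outputs = [
    "two-stream_elc_0.bp",
    "two-stream_elc_1.bp",
    "two-stream_elc_M0_0.bp",
    "two-stream_field_0.bp",
]

# The broad query also picks up the moment file.
broad = [name for name in outputs
         if fnmatchcase(name, "two-stream_elc_*")]

# Requiring a digit right after "elc_" keeps only the distribution functions.
narrow = [name for name in outputs
          if fnmatchcase(name, "two-stream_elc_[0-9]*.bp")]
```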

The following are details on the load parameters which alter the behavior. Note that these can be specified individually for each file or as global options of the pgkyl script itself. For example, the partial loading flag --z0 (see below) can be applied to one file (file_0):

pgkyl file_0 --z0 0 file_1

Or it can be applied globally to all the files:

pgkyl --z0 0 file_0 file_1

This is analogous to:

pgkyl file_0 --z0 0 file_1 --z0 0

Gkeyll output files, especially the higher dimensional ones, can be large. Therefore, Postgkyl allows loading just a smaller subsection of each file. This is done with the optional z0 to z5 parameters for coordinates and comp for components. Each can be either an integer or a string in the form start:end. Note that this follows the Python convention, so the end index is excluded, i.e., 1:5 will load only the indices/components 1, 2, 3, and 4. This functionality is supported in both the script mode and the command line mode.

import postgkyl as pg
data = pg.Data('two-stream_elc_0.bp', z1='1:3', comp=0)
pgkyl two-stream_elc_0.bp --z1 1:3 -c 0
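The start:end convention maps directly onto a Python slice. The helper to_slice below is a hypothetical sketch of that mapping, not Postgkyl code; it assumes both ends are given explicitly, and representing a single integer as a length-one slice is a choice of this sketch only.

```python
def to_slice(spec):
    """Turn an integer or a 'start:end' string into a Python slice."""
    text = str(spec)
    if ":" in text:
        start, end = text.split(":")
        # The end index is excluded, exactly as in Python slicing.
        return slice(int(start), int(end))
    index = int(text)
    # A single index is represented as a slice of length one.
    return slice(index, index + 1)


indices = list(range(8))

selected = indices[to_slice("1:5")]
single = indices[to_slice(0)]
```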

Note that the :ref:`pg_cmd_select` command has a similar use. In addition, it allows specifying a coordinate value instead of an index. However, it requires the whole file to be loaded into memory.

Datasets can be decorated with tags and labels. The former serve mostly to specify the scope of commands (see :ref:`tags <pg_keyconcepts_tags>`) in the command line mode, while the latter allow adding custom labels for plots and print-outs.

When no labels are specified, Postgkyl attempts to find the shortest unique identifier and uses it as a label. For example:

pgkyl two-stream_elc_0.bp two-stream_elc_1.bp info -c
0 (default#0)
1 (default#1)
pgkyl two-stream_elc_0.bp two-stream_field_0.bp info -c
elc (default#0)
field (default#1)
pgkyl two-stream_elc_0.bp two-stream_field_1.bp info -c
elc_0 (default#0)
field_1 (default#1)
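One simple rule that reproduces the three outputs above is to split each file name on underscores and keep only the fields that differ between files. The function short_labels is a hypothetical sketch of that rule and is not taken from Postgkyl; it assumes all names have the same number of fields and that at least two files are given.

```python
import os


def short_labels(file_names):
    """Label each file by the underscore-separated fields not shared by all."""
    # Strip the extension and split the remaining stem into fields.
    stems = [os.path.splitext(name)[0].split("_") for name in file_names]
    width = min(len(fields) for fields in stems)
    # Keep the positions where at least two files disagree.
    differing = [i for i in range(width)
                 if len({fields[i] for fields in stems}) > 1]
    return ["_".join(fields[i] for i in differing) for fields in stems]


same_species = short_labels(["two-stream_elc_0.bp", "two-stream_elc_1.bp"])
same_frame = short_labels(["two-stream_elc_0.bp", "two-stream_field_0.bp"])
both_differ = short_labels(["two-stream_elc_0.bp", "two-stream_field_1.bp"])
```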

These labels can be customized and can include LaTeX syntax, which will be properly rendered in a plot legend.

pgkyl two-stream_elc_0.bp -l '$t\omega_{pe}=0$' two-stream_elc_1.bp -l '$t\omega_{pe}=0.5$' info -c
$t\omega_{pe}=0$ (default#0)
$t\omega_{pe}=0.5$ (default#1)

Note that in all these examples, both datasets have the default tag and are indexed 0 and 1. The tags can be specified manually.

pgkyl two-stream_elc_0.bp -t 'el' two-stream_field_0.bp -t 'em' info -c
elc (el#0)
field (em#0)


Postgkyl supports the c2p mapping used in Gkeyll. This feature was introduced in 1.6.7 and currently only works with gkyl binary files. The file with the map can be specified using the --c2p keyword. The following are two plots where a Maxwellian particle distribution is evaluated in cylindrical coordinates, without and with a c2p map provided to Postgkyl.

pgkyl rt_eval_on_nodes_f-ser.gkyl interpolate -b ms -p2 plot -a

Plot of Maxwellian distribution in cylindrical coordinates without a c2p map.

pgkyl rt_eval_on_nodes_f-ser.gkyl --c2p rt_eval_on_nodes_rtheta-ten.gkyl interpolate -b ms -p2 plot -a

Plot of Maxwellian distribution in cylindrical coordinates with a c2p map provided with --c2p.

Gkeyll stores the c2p coordinate information as expansion coefficients of a finite element representation, independent of the representation of the data itself. It is converted to plotting nodal points during the :ref:`pg_cmd_interpolate` command, when the information about the data is provided. However, the :ref:`pg_cmd_interpolate` command is never used when working with finite-volume data. For this case, the --fv flag is available; it converts the expansion coefficients to nodal values immediately after loading.

pgkyl euler_axis_sodshock-euler_0.gkyl --c2p euler_axis_sodshock-mapc2p.gkyl --fv select -c0 plot -a

Plot of finite-volume data with --c2p provided and the --fv flag on.