# Tutorial 5: Using the $\mathbf{S^3}$ `Dataloader` and `Datawriter` classes to load and write arbitrary data to HDF5

### Outline
1. Using the $\mathbf{S^3}$ `Dataloader` to load $\mathbf{S^3}$ data
2. Using the $\mathbf{S^3}$ `Datawriter` to write arbitrary data to HDF5

In this tutorial, we will briefly explain the usage of the $S^3$ `Dataloader` and `Datawriter` to load data generated by $S^3$ and write arbitrary data for an $S^3$ grid.

As in the tutorials before, we first import the required modules and set all the paths:

In [1]:
import sys

from os.path import join, exists
from os import environ, makedirs

environ["sparseSpatialSampling"] = join("..", "..", "..")
sys.path.insert(0, environ["sparseSpatialSampling"])

from sparseSpatialSampling.utils import Datawriter, Dataloader, compute_svd

Refer to the installation instructions at https://github.com/FlowModelingControl/flowtorch


In [2]:
# path to the CFD data and settings, assuming they are in the top-level of the repository
load_path = join("..", "..", "..", "run", "tutorials", "tutorial_1")

# define the path to where we want to save the results and the name of the file
save_path = join("..","..", "..", "run", "tutorials", "tutorial_5")

# name of the HDF5 file
file_name = "cylinder2D_metric_0.75"

# create directory
if not exists(save_path):
    makedirs(save_path)

## 1. Using the $\mathbf{S^3}$ `Dataloader` to load $\mathbf{S^3}$ data

We now use the $S^3$ `Dataloader` to access the data within the HDF5 file of [tutorial 1](tutorial1_cylinder2D_Re100.ipynb). To do so, we instantiate the `Dataloader` as follows:

In [3]:
# instantiate a dataloader to load S^3 data
dataloader = Dataloader(load_path, f"{file_name}.h5")

# print some information about the contents of the HDF5 file
print("Available fields:", dataloader.field_names[dataloader.write_times[0]])
print("Grid size:", dataloader.vertices.size())

# we can also load the metric field used to generate the grid and the cell area (or volume)
print(dataloader.metric.size(), dataloader.weights.size())

Available fields: ['p']
Grid size: torch.Size([3734, 2])
torch.Size([3734]) torch.Size([3734])


We can access, e.g., the available write times, the field names, the metric field and the cell areas or volumes (`weights`) directly via the properties of the `dataloader` instance. We can also load snapshots using the `load_snapshot()` method for further post-processing. In the following step, we compute an SVD of the velocity field as an illustrative example:

In [4]:
# we want to load all available snapshots in the interval 4 s <= t < 5 s
t_start, t_end = 4, 5
write_times = sorted([t for t in dataloader.write_times if (t_start <= float(t) < t_end)], key=float)

# load the velocity field for the specified write times
field = dataloader.load_snapshot("U", write_times)

# perform an SVD weighted with cell areas (dataloader.weights) as an example computation
s, U, V = compute_svd(field, dataloader.weights)
print(field.size(), s.shape, U.shape, V.shape)

torch.Size([3734, 2, 301]) torch.Size([124]) torch.Size([3734, 2, 124]) torch.Size([301, 124])
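
The selection of write times in the cell above can be tried out in isolation. The following is a minimal sketch that assumes the write times are stored as strings (as the `float()` conversion in the cell suggests). The time values are made up for illustration and are not taken from the tutorial data.

```python
# made-up write times, assumed to be strings
write_times = ["10.5", "4.001", "3.999", "4", "5", "4.25"]

# keep all times in the half-open interval [t_start, t_end) and sort them numerically
t_start, t_end = 4, 5
selected = sorted((t for t in write_times if t_start <= float(t) < t_end), key=float)
print(selected)  # ['4', '4.001', '4.25']

# plain string sorting compares character by character and can misorder the times
lexicographic = sorted(["10.5", "9.5"])
numeric = sorted(["10.5", "9.5"], key=float)
print(lexicographic)  # ['10.5', '9.5']
print(numeric)        # ['9.5', '10.5']
```

Sorting with `key=float` keeps the original strings, so they can still be passed on unchanged as write times.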


You can find the full documentation of the `Dataloader` in the [Python API](https://sparsespatialsampling.readthedocs.io/en/latest/sparseSpatialSampling.data.html).
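
To illustrate what weighting an SVD with cell areas can mean, the sketch below scales each row of the data matrix with the square root of its cell area before the decomposition and removes the scaling from the left singular vectors afterwards. This is one common approach and only an assumption here, not necessarily what `compute_svd` does internally. The helper `weighted_svd` is hypothetical, it uses NumPy instead of PyTorch, and the data is random.

```python
import numpy as np


def weighted_svd(data, weights):
    """Hypothetical helper: SVD of data (n_cells x n_snapshots) weighted with cell areas."""
    sqrt_w = np.sqrt(weights)[:, None]
    # decompose the row-scaled data matrix
    U_scaled, s, Vt = np.linalg.svd(sqrt_w * data, full_matrices=False)
    # remove the scaling, so that the modes are orthonormal w.r.t. the weights
    return s, U_scaled / sqrt_w, Vt.T


# random stand-in data: 50 cells, 8 snapshots, strictly positive cell areas
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 8))
weights = rng.uniform(0.5, 2.0, 50)

s, U, V = weighted_svd(data, weights)

# the modes fulfill U^T W U = I and still reconstruct the original data
gram = U.T @ (weights[:, None] * U)
reconstruction = U @ np.diag(s) @ V.T
print(np.allclose(gram, np.eye(8)), np.allclose(reconstruction, data))
```

With such a weighting, large cells contribute more to the modes than small cells, which matters on a non-uniform grid like the one generated by $S^3$.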

## 2. Using the $\mathbf{S^3}$ `Datawriter` to write arbitrary data to HDF5

In the previous tutorials, we used the `export()` method of the `ExportData` class to interpolate and export time-dependent field data to HDF5. However, the results of our SVD (or whatever post-processing you do) are not time-dependent. We could just dump it into a pickle file or similar, but then we wouldn't be able to visualize the POD modes, for example in `ParaView`. To solve this issue, we can use the $S^3$ `Datawriter` class to write any data into an HDF5 file.

The `write_data()` method of the `Datawriter` class can write data to three groups:
- `group="grid"` writes the $S^3$ grid stored in the `Dataloader`
- `group="constant"` writes constant data (static in time)
- `group="data"` writes time-dependent data

To visualize data later, we have to write a grid. We can do this either manually or with the `write_grid()` method for convenience, as shown below. Next, we write our results from the SVD. Since this data is static in time, we use the group `constant`. If we also write time-dependent data, the constant data is written into the first time step. Lastly, we write five snapshots of the velocity field to the group `data`, since the velocity field is time-dependent.

**Note 1:** To visualize data, e.g., in ParaView, the fields written to the HDF5 file have to match the dimensions of the grid generated by $S^3$. Data that does not match the grid dimensions cannot be visualized in ParaView.

**Note 2:** The `Datawriter` does not interpolate data onto the $S^3$ grid (in contrast to the `ExportData` class).
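
The requirement from Note 1 can be checked before writing. The sketch below uses the tensor shapes printed earlier in this tutorial and a hypothetical helper `matches_grid`, which tests whether the first dimension of a field equals the number of cells. This is an assumption made for illustration and not necessarily the rule the `Datawriter` applies internally.

```python
def matches_grid(shape, n_cells):
    """Hypothetical check: True if the first dimension equals the number of grid cells."""
    return len(shape) > 0 and shape[0] == n_cells


# number of cells and shapes as printed in this tutorial
n_cells = 3734
shapes = {
    "mode_1": (3734, 2),    # a single mode of the velocity field, U[:, :, 0]
    "V": (301, 124),        # right singular vectors
    "s": (124,),            # singular values
    "cell_area": (3734,),   # dataloader.weights
}

fits_grid = {name: matches_grid(shape, n_cells) for name, shape in shapes.items()}
print(fits_grid)
# {'mode_1': True, 'V': False, 's': False, 'cell_area': True}
```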

In [5]:
# write the data to HDF5 & XDMF
datawriter = Datawriter(save_path, f"{file_name}_svd.h5")

# write the grid, for convenience we use the write_grid() method:
datawriter.write_grid(dataloader)

# we could also write the grid as:
# datawriter.n_cells = dataloader.vertices.shape[0]
# datawriter.write_data("centers", group="grid", data=dataloader.vertices)
# datawriter.write_data("vertices", group="grid", data=dataloader.nodes)
# datawriter.write_data("faces", group="grid", data=dataloader.faces)

# set the max. number of modes to write, here we want to write all available modes
n_modes = U.size(-1)

# write the modes, where each mode is treated as an independent field.
# Since they have the dimensions of the grid, we can visualize them in ParaView.
# However, they are not time-dependent, so we write them to the group constant
for i in range(n_modes):
    # the ellipsis selects mode i for scalar fields (2D tensor) and vector fields (3D tensor) alike
    datawriter.write_data(f"mode_{i + 1}", group="constant", data=U[..., i].squeeze())

# write the remaining data. V and s are not referenced in the XDMF file, since their sizes don't match the dimensions of the grid
datawriter.write_data("V", group="constant", data=V)
datawriter.write_data("s", group="constant", data=s)

# the cell areas have one value per cell
datawriter.write_data("cell_area", group="constant", data=dataloader.weights)

# we can also write time-dependent data, for example the first 5 snapshots of the flow field.
# To do so, we have to write it to the group data. We can only write a single time step per call,
# so the export() method is recommended for exporting larger amounts of temporal data
for i, t in enumerate(write_times[:5]):
    print(f"Writing time step t = {t} s.")
    datawriter.write_data("U", group="data", data=field[:, :, i], time_step=t)

# it is important to close the datawriter. Otherwise, we can't execute this cell multiple times, since Jupyter keeps the file handle open
datawriter.close()

# write XDMF file, so we can open it up in ParaView
datawriter.write_xdmf_file()

[2026-02-23 08:54:54] INFO     Writing XDMF file for file cylinder2D_metric_0.75_svd.h5


Writing time step t = 4.000 s.
Writing time step t = 4.001 s.
Writing time step t = 4.002 s.
Writing time step t = 4.003 s.
Writing time step t = 4.004 s.


We can now open up the `XDMF` file in `ParaView` and visualize the field data. This concludes tutorial 5.