# Tutorial 4: Loading existing `s_cube` objects and export options
## flowTorch workshop 29.09.2025 - 02.10.2025

### Outline
1. Loading existing `s_cube` objects
2. Creating a new HDF5 file for each exported field
3. Exporting data in batches or snapshot-by-snapshot

In this tutorial, we briefly look at the different options for exporting data from $S^3$. These options are especially useful when dealing with large datasets, which $S^3$ was originally designed for. The first steps are the same as presented in tutorial 1.

**Prerequisites:** Execution of the cylinder2D simulation from tutorial 1.

## 1. Loading existing `s_cube` objects

In [1]:
import sys
import torch as pt

from typing import Union
from stl import mesh
from os.path import join
from os import environ, system

environ["sparseSpatialSampling"] = "../../.."
sys.path.insert(0, environ["sparseSpatialSampling"])

from sparseSpatialSampling.export import ExportData
from sparseSpatialSampling.utils import load_foam_data, load_original_Foam_fields

Refer to the installation instructions at https://github.com/FlowModelingControl/flowtorch


In [2]:
# path to the CFD data and settings
load_path = join("..","..", "..", "run", "tutorials", "tutorial_1")
load_path_cfd = join("..", "..","..", "..", "flow_data", "run", "cylinder_2D_Re100")

# define the path to where we want to save the results and the name of the file
save_path = join("..","..", "..", "run", "tutorials", "tutorial_4")

In [3]:
# load the s_cube object
s_cube = pt.load(join(load_path, "s_cube_cylinder2D_metric_0.75.pt"), weights_only=False)

# load the velocity and pressure fields of the simulation
bounds = [[0, 0], [2.2, 0.41]]
field_U, coord, _, write_times = load_foam_data(load_path_cfd, bounds, field_name="U", t_start=8, scalar=False)
field_p, _, _, _ = load_foam_data(load_path_cfd, bounds, t_start=8)

[2025-08-15 11:42:17] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2025-08-15 11:42:17] INFO     Loading precomputed cell centers and volumes from processor1/constant
[2025-08-15 11:42:21] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2025-08-15 11:42:21] INFO     Loading precomputed cell centers and volumes from processor1/constant


## 2. Creating a new file for each field
In tutorial 1, we created a single HDF5 file containing all the data from our simulation. However, especially when dealing with large amounts of data, a single large HDF5 file may be impractical. Instead, $S^3$ allows us to create a separate HDF5 file for each field, so we end up with several smaller files that are easier to handle. The corresponding field name is appended to each HDF5 file name.
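
As a rough illustration of the resulting naming scheme, the sketch below builds the file names we would expect for two fields. The helper `per_field_file_name` is hypothetical and not part of $S^3$; the pattern `<save_name>_<field_name>.h5` is an assumption inferred from the log output of the export in this tutorial.

```python
def per_field_file_name(save_name: str, field_name: str) -> str:
    """Hypothetical helper: mimic the assumed pattern '<save_name>_<field_name>.h5'."""
    return f"{save_name}_{field_name}.h5"


# one file per exported field
file_names = [per_field_file_name("cylinder2D_Re100_new_file", f) for f in ("U", "p")]
print(file_names)
```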

In [4]:
# instantiate an export object, here we want to create a new HDF5 file for each field
export = ExportData(s_cube, write_new_file_for_each_field=True)

# we have to overwrite the save_path and save_name, since we want to save this in another directory
export.save_dir = save_path
export.save_name = "cylinder2D_Re100_new_file"

# for demonstration purposes we only export a single snapshot
export.write_times = write_times[-1]

# now export the last snapshot of the velocity field
export.export(coord, field_U[:, :, -1].unsqueeze(-1), "U")

# now export the last snapshot of the pressure field into a new file
export.export(coord, field_p[:, -1].unsqueeze(-1).unsqueeze(-1), "p")

[2025-08-15 11:42:23] INFO     Starting interpolation and export of field U.
[2025-08-15 11:42:23] INFO     Writing HDF5 file for field U.
[2025-08-15 11:42:23] INFO     Writing XDMF file for file cylinder2D_Re100_new_file_U.h5
[2025-08-15 11:42:23] INFO     Finished export of field U in 0.101s.
[2025-08-15 11:42:23] INFO     Starting interpolation and export of field p.
[2025-08-15 11:42:23] INFO     Writing HDF5 file for field p.
[2025-08-15 11:42:23] INFO     Writing XDMF file for file cylinder2D_Re100_new_file_p.h5
[2025-08-15 11:42:23] INFO     Finished export of field p in 0.037s.


## 3. Exporting data in batches or snapshot-by-snapshot
So far, we always loaded and exported the complete data matrix at once. For larger datasets, however, it is unlikely that all the data fits into memory at once. To avoid this issue, we can load and export the data in batches or, in case of very large snapshots, even snapshot-by-snapshot.
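
To get a feeling for when batching becomes necessary, the following sketch estimates the size of the raw data matrix and the largest batch that fits into a given memory budget. The helper functions, the mesh size, and the memory budget are hypothetical; the estimate assumes 4 bytes per value (single precision) and ignores any additional memory needed during interpolation.

```python
def snapshot_size_bytes(n_cells: int, n_components: int, bytes_per_value: int = 4) -> int:
    """Size of a single snapshot in bytes (4 bytes per value assumes float32)."""
    return n_cells * n_components * bytes_per_value


def max_batch_size(memory_budget_bytes: int, n_cells: int, n_components: int,
                   bytes_per_value: int = 4) -> int:
    """Largest number of snapshots fitting into the memory budget (at least 1)."""
    per_snapshot = snapshot_size_bytes(n_cells, n_components, bytes_per_value)
    return max(1, memory_budget_bytes // per_snapshot)


# hypothetical example: 10 million cells, 3 velocity components, 1001 snapshots
n_cells, n_components, n_snapshots = 10_000_000, 3, 1001
total_gb = snapshot_size_bytes(n_cells, n_components) * n_snapshots / 1e9
print(f"complete data matrix: {total_gb:.2f} GB")  # 120.12 GB

# assumed budget of 8 GB for the raw field data
batch_size = max_batch_size(8_000_000_000, n_cells, n_components)
print(f"maximum batch size: {batch_size}")  # 66
```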

To make use of this functionality, we only have to set the parameter `n_snapshots_total` of the `export()` method to `n_snapshots_total=len(write_times)`. This is required so that the `export()` method knows how many snapshots to expect in total.

The overall approach can be summarized as follows:
1. Load a certain number of snapshots $N$, where $1 \le N \le N_\mathrm{snapshots}$; $N$ has to be chosen based on the memory requirements
2. Pass them to the `export()` method as before, but with the additional argument `n_snapshots_total=len(write_times)` (total number of snapshots to export)
3. Continue with *1.* until all snapshots are exported
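
The index arithmetic behind these steps can be sketched without any $S^3$ functionality. The helper `batch_slices` is hypothetical and only illustrates how the snapshots are split into batches; it does not load or export any data.

```python
def batch_slices(n_snapshots_total: int, batch_size: int) -> list:
    """Return (start, stop) index pairs covering all snapshots in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [(start, min(start + batch_size, n_snapshots_total))
            for start in range(0, n_snapshots_total, batch_size)]


slices = batch_slices(500, 100)
print(len(slices))  # 5
print(slices[-1])   # (400, 500)

# if the batch size does not divide the number of snapshots, the last batch is smaller
print(batch_slices(1001, 100)[-1])  # (1000, 1001)
```

Each pair can be used to slice the list of write times, e.g. `write_times[start:stop]`, before loading the corresponding snapshots.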

This procedure will be shown in the following. The function `export_fields_snapshot_wise` below creates an abstraction for easier usage. 

**Note:** When executed in a Jupyter notebook, the following code creates an HDF5 and XDMF file that, for some reason, can't be opened in ParaView. In case you want to use this code productively, you have to copy it into a Python script and execute it separately. Then everything works.

In [5]:
def export_fields_snapshot_wise(load_dir: str, datawriter: ExportData, field_names: Union[str, list], boundaries: list,
                                write_times: Union[str, list], batch_size: int = 25) -> None:
    """
    For each field specified, interpolate all snapshots onto the generated grid and export it to HDF5 & XDMF. The
    interpolation and export of the data is performed snapshot-by-snapshot (batch_size = 1) or in batches to avoid out
    of memory issues for large datasets.

    :param load_dir: path to the simulation data
    :param datawriter: ExportData object, created after executing the S^3 algorithm
    :param field_names: names of the fields to export
    :param boundaries: boundaries of the masked area of the domain (needs to be the same as used for loading the
                       vertices and computing the metric)
    :param write_times: the write times of the simulation
    :param batch_size: batch size, number of snapshots which should be interpolated and exported at once
    :return: None
    """
    # make sure the type is correct
    write_times = write_times if isinstance(write_times, list) else [write_times]
    field_names = field_names if isinstance(field_names, list) else [field_names]

    # set the write times in case we haven't done that already
    if datawriter.write_times is None:
        datawriter.write_times = write_times

    # now loop over all fields
    for f in field_names:
        counter = 1

        # compute the required number of batches
        if not len(datawriter.write_times) % batch_size:
            n_batches = int(len(datawriter.write_times) / batch_size)
        else:
            n_batches = int(len(datawriter.write_times) / batch_size) + 1

        # now loop over all batches
        for i in pt.arange(0, len(datawriter.write_times), step=batch_size).tolist():
            print(f"Exporting batch {counter} / {n_batches}")

            # load the required number of snapshots
            coordinates, data = load_original_Foam_fields(load_dir, datawriter.n_dimensions, boundaries, field_names=f,
                                                          write_times=datawriter.write_times[i:i + batch_size])

            # in case the field is not available, the loaded data will be None, so we skip the export
            if data is not None:
                # export the current batch
                datawriter.export(coordinates, data, f, n_snapshots_total=len(datawriter.write_times))
            counter += 1

In [6]:
# check how many snapshots we have
print(f"Number of snapshots: {len(write_times)}")

Number of snapshots: 1001


In [7]:
# now we want to export the data for the last 500 snapshots of the velocity field in batches
export = ExportData(s_cube)
export.save_name = "cylinder2D_Re100"
export.save_dir = save_path

# batch_size = 1 would mean we export the data snapshot-by-snapshot. Since our data is very small we choose a larger batch size
export_fields_snapshot_wise(load_path_cfd, export, "U", bounds, write_times[-500:], batch_size=100)

[2025-08-15 11:42:23] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2025-08-15 11:42:23] INFO     Loading precomputed cell centers and volumes from processor1/constant


Exporting batch 1 / 5


[2025-08-15 11:42:23] INFO     Starting interpolation and export of field U.
[2025-08-15 11:42:24] INFO     Writing HDF5 file for field U.
[2025-08-15 11:42:24] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2025-08-15 11:42:24] INFO     Loading precomputed cell centers and volumes from processor1/constant


Exporting batch 2 / 5


[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor1/constant


Exporting batch 3 / 5


[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor1/constant


Exporting batch 4 / 5


[2025-08-15 11:42:26] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2025-08-15 11:42:26] INFO     Loading precomputed cell centers and volumes from processor1/constant


Exporting batch 5 / 5


[2025-08-15 11:42:27] INFO     Writing XDMF file for file cylinder2D_Re100.h5
[2025-08-15 11:42:27] INFO     Finished export of field U in 4.153s.
