Single precision ParaView output #658

Open
Thomas-Ulrich opened this issue Aug 26, 2022 · 3 comments
Comments

@Thomas-Ulrich
Contributor

In #649, @sebwolf-de improved checkpointing for single precision by allowing the checkpointed data to be written in either double or single precision (introducing HDF_C_REAL in the HDF5 writer).
We could apply the same recipe and write the ParaView output in single or double precision.
This would nevertheless require some changes in the xdmfwriter.
(Overall, I would always write the output in single precision to save storage.)

@Thomas-Ulrich
Contributor Author

I double-checked, and the ParaView output is written in the datatype used in the simulation.
That is, single-precision simulations are written as float and double-precision simulations as double.
(This means that writing the dataset as float when computing in double would only require a cast and a change to the template type of the xdmfwriter.)
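
As a rough illustration of the cast mentioned above (this is not the xdmfwriter code itself), the sketch below converts a buffer of double-precision values to single precision before writing and compares the raw byte counts. The function name `to_single_precision` and the sample values are hypothetical.

```python
from array import array


def to_single_precision(values_double: array) -> array:
    """Cast a buffer of 8-byte doubles to 4-byte floats (lossy)."""
    if values_double.typecode != "d":
        raise TypeError("expected an array of doubles (typecode 'd')")
    return array("f", values_double.tolist())


# Hypothetical output values computed in double precision.
simulated = array("d", [0.0, 1.5, -2.25, 1.0e-3])
to_write = to_single_precision(simulated)

# Raw payload size before any block alignment is applied.
bytes_double = len(simulated) * simulated.itemsize
bytes_single = len(to_write) * to_write.itemsize
print(bytes_double, bytes_single)
```

The raw payload halves, but this says nothing yet about the size on disk once alignment and block padding are applied.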

But for datasets with a limited number of cells, e.g. fault or surface output, single precision will not reduce the output size by a factor of 2, because of the large alignment and block size used.
On NG, we have a disc block size of:

di73yeq4@login03:/hppfs/work/pr45fi/di73yeq4/Examples/tpv12_13> stat -fc %s .
16777216

which might have motivated:

export XDMFWRITER_ALIGNMENT=8388608
export XDMFWRITER_BLOCK_SIZE=8388608
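
For scripting, the value printed by `stat -fc %s` can probably also be read from Python. This is a minimal sketch, assuming a Linux system where `os.statvfs(path).f_bsize` corresponds to that value; the helper name `fs_block_size` is hypothetical.

```python
import os


def fs_block_size(path: str = ".") -> int:
    """Return the file system block size (in bytes) for the given path."""
    return os.statvfs(path).f_bsize


block = fs_block_size(".")
print(block)
```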

Looking at the surface output of the latest Turkey simulation (double precision), we can see that the block size used leads to a data file about 1.3 times larger than expected:

      <DataItem NumberType="UInt" Precision="4" Format="XML" Dimensions="3 2">0 0 1 1 1 2466245</DataItem>
      <DataItem NumberType="Float" Precision="8" Format="Binary" Dimensions="1 3145728">Turkey_ext4_o6_el_ev1-surface_cell/mesh0/v1.bin</DataItem>

Here 3145728 * 8 = 3 * XDMFWRITER_BLOCK_SIZE, where 3 = ceil(2466245 * 8 / XDMFWRITER_BLOCK_SIZE).

If we were writing in single precision, the dataset would be written in n = ceil(2466245 * 4 / 8388608) = 2 blocks, that is, the output would still take about 67% of the size of the double output (and not 50%). This shows the limits of writing the output in float with large blocks, and explains the potential gain of rewriting the output sequentially.
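
The block arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the size estimate, assuming each dataset is rounded up to a whole number of blocks as the XDMF snippet suggests; `padded_bytes` is a hypothetical helper, not part of the xdmfwriter.

```python
BLOCK_SIZE = 8388608  # XDMFWRITER_BLOCK_SIZE used on NG
N_CELLS = 2466245     # number of cells in the surface output above


def padded_bytes(n_values: int, bytes_per_value: int, block_size: int) -> int:
    """Size of one dataset once rounded up to whole blocks."""
    raw = n_values * bytes_per_value
    n_blocks = -(-raw // block_size)  # integer ceiling division
    return n_blocks * block_size


double_padded = padded_bytes(N_CELLS, 8, BLOCK_SIZE)
single_padded = padded_bytes(N_CELLS, 4, BLOCK_SIZE)

# Padded size relative to the unpadded double-precision payload.
overhead_double = double_padded / (N_CELLS * 8)
# Single-precision file size as a fraction of the double-precision one.
single_over_double = single_padded / double_padded

print(double_padded // BLOCK_SIZE, single_padded // BLOCK_SIZE)
print(round(overhead_double, 2), round(single_over_double, 2))
```

With these inputs the double output occupies 3 blocks and the single output 2, which gives the factors of roughly 1.3 and two thirds quoted above.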

Note that on Frontera the disc block size is much smaller:

ulrich@login2:/scratch1/09160/ulrich$ stat -fc %s .
4096

I wonder if anyone has ever tried decreasing XDMFWRITER_BLOCK_SIZE on this machine.
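
To get a feeling for what a smaller block size would mean for storage alone, the following sketch repeats the padding estimate for a few candidate values. It is pure arithmetic under the assumption that each dataset is rounded up to whole blocks; it says nothing about I/O performance, and the intermediate value 1048576 is an arbitrary choice for illustration.

```python
N_CELLS = 2466245  # number of cells in the Turkey surface output


def padded_bytes(n_values: int, bytes_per_value: int, block_size: int) -> int:
    """Size of one dataset once rounded up to whole blocks."""
    raw = n_values * bytes_per_value
    return -(-raw // block_size) * block_size  # ceiling division


# 8388608: value used on NG; 4096: block size reported on Frontera.
candidate_block_sizes = [8388608, 1048576, 4096]

ratios = {}
for block_size in candidate_block_sizes:
    double_size = padded_bytes(N_CELLS, 8, block_size)
    single_size = padded_bytes(N_CELLS, 4, block_size)
    ratios[block_size] = single_size / double_size
    print(block_size, double_size, single_size, round(ratios[block_size], 3))
```

As the block size shrinks, the single-to-double size ratio approaches the ideal 50%.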

@sebwolf-de
Contributor

I think nobody has tried to tweak XDMFWRITER_BLOCK_SIZE. The value 8388608 is just a magic number (although a motivated one) that everybody copies around.
Maybe we can write some documentation on how to choose the optimal block size.

@krenzland
Contributor

> Note that on Frontera the disc block size is much smaller:
>
> ulrich@login2:/scratch1/09160/ulrich$ stat -fc %s .
> 4096

I think this is because Frontera uses a different file system:
https://frontera-portal.tacc.utexas.edu/user-guide/files/#striping-large-files

Not sure how to tune the writers for this one.
