NetCDF4 file growing 50% in size with MPI enabled. #1430

@leuchthelp

Please consider me to be a novice when it comes to using NetCDF4 and all things related.

Versions (all installed via spack v0.23.1):

compiler: gcc@11.2.0

python@3.11.9
netcdf-c@4.9.2
py-netcdf4@1.7.1
py-h5py@3.12.1
py-mpi4py@4.0.1
hdf5@1.14.5~cxx~fortran+hl~ipo~java~map+mpi+shared+subfiling~szip+threadsafe+tools
openmpi@5.0.5

Reproduced on both:

  • Ubuntu 22.04 - 6.6.87.2-microsoft-standard-WSL2
  • Levante - 4.18.0-553.42.1.el8_10.x86_64

Additionally verified by a member of the DKRZ who was not running this exact environment (i.e. different software versions). I can get details if needed.

Any file created via the NetCDF4 Python API with MPI enabled grows by exactly 50% in size (e.g. 10 -> 15 GB, 20 -> 30 GB, 30 -> 45 GB, ...).
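
As a rough sanity check on these numbers, the raw payload of the reproducer's variable can be computed from its shape and dtype. This is only a sketch: the helper name `expected_payload_bytes` is hypothetical, and it ignores the file header and any HDF5 metadata, so the real file will be slightly larger.

```python
import numpy as np

def expected_payload_bytes(shape, dtype="f8"):
    """Raw size of a dense array in bytes, ignoring headers and metadata."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize

# Shape used by the reproducer: 10 * 134217728 float64 values.
shape = (10 * 134217728,)
payload = expected_payload_bytes(shape)

print(payload / 2**30)        # 10.0 (GiB of raw data)
print(1.5 * payload / 2**30)  # 15.0 (GiB if the file is 50% larger)
```

So a file of about 10 GiB is expected, and the observed file is about 15 GiB.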

The code provided here (test.py) can be used to reproduce the issue. Simply enabling MPI via the netCDF4.Dataset(path, "w", format="NETCDF4", parallel=True) option results in a file that is 50% larger than intended. Setting the flag to False creates a file of the expected size. Neither mpiexec, mpirun, nor -n N needs to be supplied for this effect to show: running the script with python test.py and the flag set to True is enough to reproduce the issue. One way to see the difference is to inspect the raw binary data with a tool such as binocle.

from mpi4py import MPI
import netCDF4
import numpy as np

def create(path, form, dtype="f8", parallel=False):    
    
    root = netCDF4.Dataset(path, "w", format="NETCDF4", parallel=parallel)  # type: ignore

    root.createGroup("/")
    used = 0
    
    for variable, element in form.items():
        shape = element[0]
        chunks = element[1]
        dimensions = []
        
        for size in shape:
            root.createDimension(f"{used}", size)
            dimensions.append(f"{used}")
            used += 1
        
        if len(chunks) != 0: 
            x = root.createVariable(variable, dtype, dimensions, chunksizes=chunks)
        else: 
            x = root.createVariable(variable, dtype, dimensions)
        
        if not parallel:
            print(len(np.random.random_sample(shape)))
            x[:] = np.random.random_sample(shape)
        else:
            rank = MPI.COMM_WORLD.rank  # type: ignore
            rsize = MPI.COMM_WORLD.size  # type: ignore
            total_size = shape[0]
            size = int(total_size / rsize)
            
            rstart = rank * size
            rend = rstart + size
            
            print(f"shape: {shape}, chunks: {chunks}, dimensions: {dimensions}, total chunksize: {total_size}, size per rank:{size} rank: {rank}, rsize: {rsize}, rstart: {rstart}, rend: {rend}")
            
            print(len(np.random.random_sample(size)))
            x[rstart:rend] = np.random.random_sample(size)
            MPI.COMM_WORLD.Barrier()  # type: ignore
            print(f"var: {x}, ncattrs after fill: {x.ncattrs()}, as dict: {x.__dict__}")
            

def main():
    
    create(form={"X": [[10 * 134217728], []]}, path="test.nc", parallel=True)

if __name__=="__main__":
    main()

This is an image obtained from the broken, 50% larger file. It is zoomed out very far, but the header is visible at the very beginning.
Image

This is what the file should look like, with far less empty space before the data.
Image

Additional output obtained by the aforementioned member of the DKRZ:

~/Git/Testprogramme/NetCDF/IO on master ● λ ncdump -h test_false.nc
netcdf test_false {
dimensions:
    \0 = 1342177280 ;
variables:
    double X(\0) ;
}
~/Git/Testprogramme/NetCDF/IO on master ● λ ncdump -h test_true.nc
netcdf test_true {
dimensions:
    \0 = 1342177280 ;
variables:
    double X(\0) ;
}
~/Git/Testprogramme/NetCDF/IO on master ● λ ls -lh test_*
-rw-r--r-- 1 user user 11G Sep 22 14:59 test_false.nc
-rw-r--r-- 1 user user 16G Sep 22 14:59 test_true.nc
~/Git/Testprogramme/NetCDF/IO on master ● λ du -shc test_*
11G    test_false.nc
11G    test_true.nc
21G    total
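
In the output above, ls reports 16G for test_true.nc while du reports 11G. That would be consistent with the extra region being an unallocated hole (a sparse file), though this is an assumption and not confirmed here. A minimal sketch for checking this from Python on a POSIX system follows; the helper name `size_report` is hypothetical, and `st_blocks` is not available on Windows.

```python
import os

def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file on a POSIX system."""
    st = os.stat(path)
    apparent = st.st_size           # logical length, as shown by `ls -l`
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte units
    return apparent, allocated

# File names taken from the listing above; files that are missing are skipped.
for name in ("test_false.nc", "test_true.nc"):
    if os.path.exists(name):
        apparent, allocated = size_report(name)
        print(f"{name}: apparent={apparent / 2**30:.2f} GiB, "
              f"allocated={allocated / 2**30:.2f} GiB")
```

If the allocated size is clearly smaller than the apparent size, part of the file is not backed by disk blocks.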
