NetCDF4 file growing 50% in size with MPI enabled.

Please consider me to be a novice when it comes to using NetCDF4 and all things related.

Versions (installed via Spack v0.23.1):

```
compiler: gcc@11.2.0

python@3.11.9
netcdf-c@4.9.2
py-netcdf4@1.7.1
py-h5py@3.12.1
py-mpi4py@4.0.1
hdf5@1.14.5~cxx~fortran+hl~ipo~java~map+mpi+shared+subfiling~szip+threadsafe+tools
openmpi@5.0.5
```

Reproduced on both:
- Ubuntu 22.04 (kernel 6.6.87.2-microsoft-standard-WSL2)
- [Levante](https://www.dkrz.de/en/systems/hpc) (kernel 4.18.0-553.42.1.el8_10.x86_64)

The issue was additionally verified by a member of the DKRZ who was not running the exact environment used here (i.e. different software versions). I can get details if needed.

Any file created via the netCDF4 Python API with MPI enabled grows by exactly 50% in size (i.e. 10 -> 15 GB, 20 -> 30 GB, 30 -> 45 GB, ...).

The code provided here ([test.py](https://github.com/user-attachments/files/22598553/test.py)) can be used to reproduce the issue. Simply enabling MPI via the `parallel=True` option, as in `netCDF4.Dataset(path, "w", format="NETCDF4", parallel=True)`, results in a file that is 50% larger than intended. Setting the flag to `False` produces the expected file size. `mpiexec`, `mpirun` or `-n N` do not need to be supplied for this effect to show: running `python test.py` with the flag set to `True` is enough to reproduce the issue. One way to see the difference is to view the raw binary data with a tool such as [binocle](https://github.com/sharkdp/binocle).

```
from mpi4py import MPI
import netCDF4
import numpy as np

def create(path, form, dtype="f8", parallel=False):    
    
    root = netCDF4.Dataset(path, "w", format="NETCDF4", parallel=parallel)  # type: ignore

    root.createGroup("/")
    used = 0
    
    for variable, element in form.items():
        shape = element[0]
        chunks = element[1]
        dimensions = []
        
        for size in shape:
            root.createDimension(f"{used}", size)
            dimensions.append(f"{used}")
            used += 1
        
        if len(chunks) != 0: 
            x = root.createVariable(variable, dtype, dimensions, chunksizes=chunks)
        else: 
            x = root.createVariable(variable, dtype, dimensions)
        
        if parallel == False:
            print(len(np.random.random_sample(shape)))
            x[:] = np.random.random_sample(shape)
        else:
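            # Split the first dimension evenly across the MPI ranks;
            # each rank writes only its own contiguous slice.
            rank = MPI.COMM_WORLD.rank  # type: ignore
            rsize = MPI.COMM_WORLD.size  # type: ignore
            total_size = shape[0]
            size = total_size // rsize  # integer division, any remainder is not written

            rstart = rank * size
            rend = rstart + size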
            
            print(f"shape: {shape}, chunks: {chunks}, dimensions: {dimensions}, total chunksize: {total_size}, size per rank:{size} rank: {rank}, rsize: {rsize}, rstart: {rstart}, rend: {rend}")
            
            print(len(np.random.random_sample(size)))
            x[rstart:rend] = np.random.random_sample(size)
            MPI.COMM_WORLD.Barrier()  # type: ignore
            print(f"var: {x}, ncattrs after fill: {x.ncattrs()}, as dict: {x.__dict__}")

    # Close the dataset explicitly so all data and metadata are flushed to disk.
    root.close()

def main():
    
    create(form={"X": [[10 * 134217728], []]}, path="test.nc", parallel=True)

if __name__=="__main__":
    main()
```
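As a sanity check on the numbers, the expected payload of the reproducer can be worked out from its shape: `10 * 134217728` elements of type `f8` (8 bytes each). The sketch below is plain arithmetic; the helper name `expected_payload_bytes` is hypothetical, and the small HDF5 header and metadata overhead is ignored.

```python
import math

F8_ITEMSIZE = 8  # bytes per "f8" (64-bit float) element


def expected_payload_bytes(shape, itemsize=F8_ITEMSIZE):
    """Raw data size in bytes for an array of the given shape (no file overhead)."""
    return math.prod(shape) * itemsize


shape = [10 * 134217728]       # same shape as in the reproducer
payload = expected_payload_bytes(shape)

print(payload / 2**30)         # expected payload in GiB -> 10.0
print(1.5 * payload / 2**30)   # size if the file were 50% larger -> 15.0
```

So a correct file should be roughly 10 GiB, and a file that is 50% too large roughly 15 GiB.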

This image was obtained from the broken, 50% larger file. It is zoomed out very far, but at the very beginning one can just make out the header.
<img width="1385" height="1065" alt="Image" src="https://github.com/user-attachments/assets/c10f6168-1d1d-4493-b0bd-990837c8f9ee" />

This is what the file should look like, with far less empty space before the data.
<img width="1080" height="1068" alt="Image" src="https://github.com/user-attachments/assets/374d6dfa-76eb-462e-bba1-2736c8a29008" />
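
For files too large to inspect comfortably in a viewer, the amount of empty space can also be estimated programmatically. The following is a minimal sketch, assuming that "empty" means blocks consisting entirely of zero bytes. The function name `count_zero_blocks` and the 1 MiB default block size are illustrative choices and are not part of netCDF4 or binocle.

```python
def count_zero_blocks(path, block_size=1 << 20):
    """Return (zero_blocks, total_blocks) for the file at `path`.

    The file is read sequentially in blocks of `block_size` bytes, so memory
    use stays small even for multi-gigabyte files.
    """
    zero_blocks = 0
    total_blocks = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break  # end of file
            total_blocks += 1
            # A block counts as empty if every byte in it is zero.
            if block.count(0) == len(block):
                zero_blocks += 1
    return zero_blocks, total_blocks
```

Calling `count_zero_blocks("test.nc")` on both the `parallel=True` and the `parallel=False` output should make the difference in empty space visible as a difference in the first number.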


Additional output obtained by the aforementioned member of the DKRZ:
```
~/Git/Testprogramme/NetCDF/IO on master ● λ ncdump -h test_false.nc
netcdf test_false {
dimensions:
    \0 = 1342177280 ;
variables:
    double X(\0) ;
}
~/Git/Testprogramme/NetCDF/IO on master ● λ ncdump -h test_true.nc
netcdf test_true {
dimensions:
    \0 = 1342177280 ;
variables:
    double X(\0) ;
}
```

```
~/Git/Testprogramme/NetCDF/IO on master ● λ ls -lh test_*
-rw-r--r-- 1 user user 11G Sep 22 14:59 test_false.nc
-rw-r--r-- 1 user user 16G Sep 22 14:59 test_true.nc
~/Git/Testprogramme/NetCDF/IO on master ● λ du -shc test_*
11G    test_false.nc
11G    test_true.nc
21G    total
```
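
In the output above, `ls -lh` reports 16G for `test_true.nc`, while `du` reports 11G for both files. `ls` shows the apparent size and `du` the disk space actually allocated, so one possible reading is that the extra 50% is a hole in a sparse file rather than stored data. This is an interpretation, not something confirmed here. Below is a minimal sketch to compare both numbers from Python, assuming a POSIX system where `os.stat` exposes `st_blocks` in 512-byte units; the helper name `size_report` is hypothetical.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for the file at `path`."""
    st = os.stat(path)
    apparent = st.st_size           # logical file length, as shown by `ls -l`
    allocated = st.st_blocks * 512  # blocks actually allocated, as counted by `du`
    return apparent, allocated
```

If `size_report("test_true.nc")` returns an apparent size much larger than the allocated size, the file contains unallocated regions, which would match the `ls`/`du` discrepancy.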

