
CMOR failure when attempting to write large volume of data #706

Closed
matthew-mizielinski opened this issue Sep 12, 2023 · 4 comments · Fixed by #708

Comments

@matthew-mizielinski

I've been trying to write some large chunks of data for the variable cl in CMIP6 table CFday for the model HadGEM3-GC31-HH (N512 atmosphere). The shape of the array I'm trying to write in one go is (T, 85, 768, 1024), where T is one of (30, 45, 60, 90).

When T=30, everything works fine. When T=60, I get errors from CMOR of the following form (and similar for T=45):

C Traceback:
! In function: cmor_write_var_to_file
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: cannot allocate memory for -284164096 float tmp elts var 'cl' (table: CFday)
!
!!!!!!!!!!!!!!!!!!!!!!!!!


C Traceback:
! In function: cmor_write_var_to_file
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: NetCDF Error (-101: NetCDF: HDF error), writing variable 'cl' (table CFday) to file
!
!!!!!!!!!!!!!!!!!!!!!!!!!

When I try with T=90 I'm seeing segfaults (signal 11) and no messages in the cmor log file.

I'm guessing that there is some form of array size limit here -- is it possible either to document that limit and fail early when a write would exceed it, or to raise the limit?

I think this can be worked around by making multiple cmor.write calls with smaller chunks of data -- if that is correct, then understanding the limits on array size within CMOR would be valuable.

This is using CMOR version 3.7.2 installed via conda.
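A minimal sketch of the chunked-write workaround described above, assuming the usual cmor.write keyword arguments (ntimes_passed, time_vals, time_bnds); `var_id`, `data`, `time_vals` and `time_bnds` are placeholders for objects set up elsewhere via cmor.variable()/cmor.axis():

```python
import cmor

INT32_MAX = 2**31 - 1  # signed 32-bit limit implicated in this issue

def write_in_chunks(var_id, data, time_vals, time_bnds, chunk=30):
    """Write `data` (time, lev, lat, lon) to CMOR in slabs of at most `chunk` time steps."""
    for start in range(0, data.shape[0], chunk):
        stop = min(start + chunk, data.shape[0])
        slab = data[start:stop]
        # Keep each slab's element count below the 32-bit signed limit.
        assert slab.size <= INT32_MAX, "slab still too large for a 32-bit count"
        cmor.write(var_id, slab,
                   ntimes_passed=stop - start,
                   time_vals=time_vals[start:stop],
                   time_bnds=time_bnds[start:stop])
```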

@taylor13
Collaborator

taylor13 commented Sep 12, 2023

It might have something to do with the use of 32-bit integers (integer*4 in FORTRAN), which allow a maximum value of 2147483647.
Note that for T=30 the array size is 2005401600 elements (smaller than that maximum), but for T=45 the element count exceeds the maximum value representable by a 32-bit signed integer.
I agree that we should trap the problem if we can and display a clear error message.
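For reference, the element counts for the four chunk sizes compared against the 32-bit signed maximum (plain Python arithmetic, not CMOR code):

```python
INT32_MAX = 2**31 - 1  # 2147483647

for t in (30, 45, 60, 90):
    n = t * 85 * 768 * 1024
    print(t, n, "fits" if n <= INT32_MAX else "exceeds a signed 32-bit int")
# 30 -> 2005401600 fits; 45 -> 3008102400, 60 -> 4010803200,
# 90 -> 6016204800 all exceed 2147483647
```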

@taylor13
Collaborator

If the problem could be easily corrected by judicious use of integer*8 for the allocation step, that might be an option, but only if it is not a lot of work, since the huge array can be broken down and written in smaller segments (if Matt's supposition is correct).

@matthew-mizielinski
Author

Hi @taylor13, yes, upping the type of the appropriate variables to integer*8 is one option. Another would be to deal with this in the Python layer: if the product of the array shape exceeds 2^31, raise an exception (ideally a dedicated one). Users can then react to this (e.g. by halving the chunk size being written).
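A rough sketch of such a check in the Python layer; the function and exception names here are hypothetical, not part of the existing CMOR API:

```python
import numpy as np

INT32_MAX = 2**31 - 1

class CMORWriteTooLargeError(ValueError):
    """Hypothetical dedicated exception for oversized write requests."""

def check_write_size(data):
    # Raise before handing CMOR an array whose element count cannot be
    # represented as a signed 32-bit C int.
    n_elements = int(np.prod(data.shape, dtype=np.int64))
    if n_elements > INT32_MAX:
        raise CMORWriteTooLargeError(
            f"{n_elements} elements exceeds 2**31 - 1; "
            "write the data in smaller time chunks"
        )
```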

@mauzey1
Collaborator

mauzey1 commented Sep 19, 2023

I have identified that this issue was caused by the use of the type int in the function cmor_write_var_to_file when calculating the number of elements in the data. I have resolved it by replacing int with size_t for the variables used to calculate the size of the data and the indexes.
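As a quick cross-check (plain Python emulating C int arithmetic, not the actual CMOR code), the negative count in the original error message matches a signed 32-bit wrap of the T=60 element count:

```python
def as_int32(n):
    # Emulate the wrap-around of a signed 32-bit C int.
    return ((n + 2**31) % 2**32) - 2**31

n = 60 * 85 * 768 * 1024   # 4010803200 elements
print(as_int32(n))         # -284164096, the value reported in the CMOR log
```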

I have a branch that should be ready to merge, but I am now encountering issues with the udunits2 library in the Linux builds. That's a separate issue that seems to be tied to the latest release of udunits2 on conda.
