
Make the analysis storage chunk equal to checkpoint interval #861

Closed
wants to merge 1 commit

Conversation

Lnaden
Contributor

@Lnaden Lnaden commented Dec 21, 2017

Self-explanatory. Some variables are not chunked to the checkpoint_interval because they are not usually accessed in bulk (online free energy, timestamp).

This is an optimization for read/write.

Fixes #860
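The chunking choice described above can be sketched as a small helper. This is a minimal pure-Python illustration, not the actual YANK code; the function name, the `"iteration"` dimension name, and the argument layout are all hypothetical:

```python
def analysis_chunksizes(dim_names, dim_sizes, checkpoint_interval):
    """Pick per-dimension chunk sizes for a variable written every iteration.

    Hypothetical sketch of the PR's idea: chunk the unlimited iteration
    dimension to the checkpoint interval (so the file is enlarged once per
    checkpoint rather than once per iteration), and keep each fixed-size
    dimension as a single chunk.
    """
    chunks = []
    for name, size in zip(dim_names, dim_sizes):
        if name == "iteration":  # unlimited dimension, grows each iteration
            chunks.append(checkpoint_interval)
        else:                    # fixed-size dimension: one whole chunk
            chunks.append(size)
    return chunks
```

In netCDF4-python terms, the resulting list would be passed as the `chunksizes` argument of `Dataset.createVariable`.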

@jchodera
Member

Some variables are not chunked to the checkpoint_interval because they are not usually accessed in bulk (online free energy, timestamp).

I'd chunk anything that is written each iteration to the same interval. Otherwise, we won't get the speed benefit from not having to enlarge the file each iteration.

@Lnaden
Contributor Author

Lnaden commented Dec 21, 2017

I'd chunk anything that is written each iteration to the same interval

Fair. The only one I would need to change, then, is the timestamp chunk, since it's written each interval. Online analysis is on its own interval, so I'll need to think about that a bit more.

@Lnaden
Contributor Author

Lnaden commented Dec 21, 2017

This is currently throwing errors due to the following problem:

Unidata/netcdf4-python#566

Given that this has been open now for a year and a half, I'm not sure if it will ever be fixed.

@jchodera
Member

The code is here.

I guess we're stuck with chunksizes of 1 for now, but it would be worth contributing what must be the trivial change to fix this back to Unidata.

@jchodera
Member

jchodera commented Dec 21, 2017

It looks like you just need to modify line 1081
from

            if(var->dim[d]->len > 0 && chunksizes[d] > var->dim[d]->len)

to

            if(!var->dim[d]->unlimited && var->dim[d]->len > 0 && chunksizes[d] > var->dim[d]->len)

Maybe we should try to fix it for them?
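The effect of that one-line change can be mirrored in a small pure-Python predicate. This is a sketch of the guard's logic only (function and parameter names are hypothetical), not the netcdf-c implementation:

```python
def chunksize_allowed(unlimited, dim_len, chunksize):
    """Mirror the logic of the netcdf-c chunksize check discussed above.

    The original check rejected any chunksize larger than the dimension's
    current length. The proposed fix skips that clamp for unlimited
    dimensions, whose length grows over time, so a chunksize larger than
    the *current* length (e.g. checkpoint_interval on a fresh file) is fine.
    """
    if unlimited:
        # Proposed fix: never clamp chunks on an unlimited dimension.
        return True
    # Fixed-size dimension: a chunk larger than the dimension is invalid.
    return not (dim_len > 0 and chunksize > dim_len)
```

Under the unpatched logic, the `unlimited` branch is absent, which is why a chunksize of `checkpoint_interval` on a nearly empty unlimited dimension was being rejected.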

@Lnaden
Contributor Author

Lnaden commented Dec 21, 2017

@jchodera I had just written that up on the parent issue Unidata/netcdf-c#299, then came back here and saw the post. I'll see what they say about any pitfalls in chunking two variables on the same dimension differently when existing data is present.

@jchodera
Member

jchodera commented Jan 2, 2018

@Lnaden has proposed a PR fix in Unidata/netcdf-c#760

@jchodera
Member

@Lnaden: Can you revive this PR for the sams branch when able? Your updated NetCDF version is now live on conda.

Lnaden added a commit that referenced this pull request Mar 12, 2018
This is an optimization for read/write by using the `checkpoint_interval`
for the analysis file, and 1 for the checkpoint file.

Supersedes and closes #861 by merging into SAMS
Fixes #860
@Lnaden
Contributor Author

Lnaden commented Mar 12, 2018

Superseded by #912, closing

@Lnaden Lnaden closed this Mar 12, 2018
@Lnaden Lnaden deleted the checkpoint_optimize branch March 12, 2018 14:46