loadCMIP5 profiling #118
Here is my quick-and-dirty profile evaluation (run on a 2011 MacBook Air that was multitasking). Clearly I did NOT quite reach the level of optimization in my data frame functions that is currently in RCMIP5, but it gives a pretty good overview of the three candidate flavors we are looking at. Array seems to be both fast and small. The raster package is good from a memory perspective if we manage everything from file instead of in memory, but it is relatively slow. We could consider rolling our own 'raster' package where we read in and process 'chunks' of the netCDF file. But, from this run, I'm inclined to fall back on array. Thoughts?
Thanks Kathe! I have a bunch of thoughts, as I've also spent the last few days doing quite a few profiling and performance tests.
Clearly I'm still behind the curve with this whole plyr thing. I have to admit, I still think that the array operations are simpler. I don't think that the array will have difficulties with the non-uniform grids as long as the lat doesn't change within a lon band and vice versa. However, raster will error out on them, which sadly invalidates that option. Another option, just to give even more choices, would be to use chunked reads and writes similar to the raster package. Doable, but more work than just providing the data.frame vs. array option.
On further reflection, I think you're right that arrays can handle irregular grids too…though not our current implementation, as lat does change in a lon band, etc., for the MPI-class models. I guess I'd say that arrays are memory-efficient and conceptually simple but a bit complicated in code; data frames are simple in code and much faster, but take much more memory. Are we willing to support both?
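To make the memory side of that trade-off concrete, here is a minimal base-R sketch that holds the same values once as a 3-D array and once as a long data frame. The grid dimensions and the column names (`lon`, `lat`, `time`, `value`) are made up for illustration and are not taken from RCMIP5. The data frame has to repeat the coordinates on every row, so it should come out several times larger than the array.

```r
# Hypothetical toy grid; real CMIP5 files are far larger.
lon  <- seq(0, 357.5, by = 2.5)       # 144 longitudes
lat  <- seq(-88.75, 88.75, by = 2.5)  # 72 latitudes
time <- 1:12                          # 12 time steps

# Array form: one numeric value per cell, coordinates implied by position.
arr <- array(rnorm(length(lon) * length(lat) * length(time)),
             dim = c(length(lon), length(lat), length(time)))

# Long data-frame form: coordinates are repeated on every row.
# expand.grid varies its first factor fastest, which matches R's
# column-major array layout, so as.vector(arr) lines up with the rows.
df <- expand.grid(lon = lon, lat = lat, time = time)
df$value <- as.vector(arr)

array_bytes <- as.numeric(object.size(arr))
df_bytes    <- as.numeric(object.size(df))
cat("array:     ", array_bytes, "bytes\n")
cat("data frame:", df_bytes, "bytes\n")
```

The same data frame is much easier to subset or summarize by column name, which is the "simple in code" half of the comparison.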
I'm going to say yes.
I have started some performance tests and immediately see that loadCMIP5 has a big problem when it converts the loaded array to a data frame. It uses the slowest possible method to do so:
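The snippet referred to above is not reproduced in this thread, so the following is only an illustrative sketch and not the actual loadCMIP5 code. It contrasts a deliberately slow pattern (growing a data frame one row at a time) with a vectorized conversion built on base R's `expand.grid`. The function names `slow_convert` and `fast_convert` and the column names are hypothetical.

```r
# Small hypothetical array with dimensions lon x lat x time.
arr <- array(seq_len(10 * 8 * 6), dim = c(10, 8, 6))

# Slow pattern (illustrative only): append one row per cell with rbind,
# which copies the growing data frame on every iteration.
slow_convert <- function(a) {
  out <- data.frame()
  for (k in seq_len(dim(a)[3])) {
    for (j in seq_len(dim(a)[2])) {
      for (i in seq_len(dim(a)[1])) {
        out <- rbind(out, data.frame(lon = i, lat = j, time = k,
                                     value = a[i, j, k]))
      }
    }
  }
  out
}

# Vectorized pattern: build all index columns at once, then attach the
# values in column-major order (first index varies fastest).
fast_convert <- function(a) {
  d <- dim(a)
  out <- expand.grid(lon = seq_len(d[1]), lat = seq_len(d[2]),
                     time = seq_len(d[3]))
  out$value <- as.vector(a)
  out
}

slow <- slow_convert(arr)
fast <- fast_convert(arr)

# Timings depend on the machine, so no expected numbers are given here.
print(system.time(slow_convert(arr)))
print(system.time(fast_convert(arr)))
```

Both functions return rows in the same order, so the results can be compared column by column before trusting the faster version.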