Store tricubic results for loading later #5
The interpolator needs all input data at runtime, so even if you pickle the interpolator, you would have to include all input data in the pickle file, meaning you would still have to load all the data. I can add support for pickle, but I assume it will not solve your specific performance problem. Maybe you can rewrite your algorithm so that the interpolator is reused instead of creating a new one each time.
I think we are saying the same thing. I would like to have a lightweight method to load the interpolator and use it. I assume this means the input data does not need to be reloaded. Please let me know whether I am on the right track, and could you explain your response in a little more detail?
I believe we are still not on the same page. Since you were referring to regression earlier, I think you may be confusing regression and interpolation. To be clear, pytricubic performs interpolation only. Regression usually means fitting a simple model with a few parameters to a large data set. The representation through these few parameters can then be used as an efficient approximation to the entire data set, and for some applications the initial data set may not be needed anymore. Interpolation, however, fits a model to a data set so that it passes exactly through all input data points. Here, one is usually not interested in a more efficient representation of the data, but rather in modelling features at a higher resolution than what is represented in the initial data set. Since interpolation is exact at the input data points, no more compact representation exists, i.e. all initial data are needed at runtime. Therefore, I think there is no way to extract the few-parameter representation that I believe you are looking for. Rather, I would suggest that you investigate whether you really need to load the input data, construct the interpolator, do the interpolation and terminate the program for each job you are running. You may be able to construct the interpolator once and then run all your jobs in a loop or similar. Please note that pytricubic contains some optimizations which make repeated calls to the interpolator quite efficient.
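To make the distinction concrete, here is a small pure-Python illustration. It uses a two-parameter least-squares line as the regression and piecewise-linear interpolation as a one-dimensional stand-in for tricubic interpolation; the helper names are illustrative, not pytricubic functions. The regression compresses the data into two numbers but misses the data points, whereas the interpolant reproduces every point exactly and needs all of `xs` and `ys` every time it is evaluated.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x (two parameters) by ordinary least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def linear_interpolate(xs, ys, x):
    """Piecewise-linear interpolation through every (xs[i], ys[i]); xs must be sorted."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside the data range")


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

# Regression: two numbers summarize the data, but the fit is not exact.
a, b = least_squares_line(xs, ys)
regression_residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Interpolation: exact at every input point, but needs all of xs and ys at runtime.
interpolation_residuals = [y - linear_interpolate(xs, ys, x) for x, y in zip(xs, ys)]

print(a, b)                     # -2.0 4.0
print(regression_residuals)     # [2.0, -1.0, -2.0, -1.0, 2.0]
print(interpolation_residuals)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```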
My words did not match what I meant to say. From reading the paper, I thought that the tricubic function uses the input data to create a matrix, which is then used for the interpolation. I now believe I was mistaken to think that this matrix is all that is needed to call the interpolation, and that both this matrix and the input data are required. The LinearNDInterpolator in scipy creates a picklable object which can be reloaded and called. I also thought this was storing coefficients which describe the fit, which are then called.
Unfortunately, you are mistaken. All the input data is needed at runtime, at least in the implementation that pytricubic uses. In principle, one could store all coefficients, but that would become quite costly in terms of memory for large grids. As a compromise, pytricubic memorizes the coefficients of the last accessed grid cell. Therefore, I could implement pickling, but most likely you would not gain any performance if the rest of your code is already optimized.
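The last-cell caching described here can be sketched in one dimension. This is an illustrative analogue under stated assumptions, not pytricubic's code: each cell between two unit-spaced samples gets a cubic whose four coefficients come from the neighbouring samples (a Catmull-Rom style cubic with central-difference derivatives), and only the coefficients of the most recently used cell are kept. The full sample list has to stay in memory, because moving to a new cell needs that cell's neighbours. A tricubic scheme such as the one by Lekien and Marsden uses 64 coefficients per cell instead of 4, which is why precomputing the coefficients for every cell would take on the order of 64 times the memory of the samples themselves. The class name `CachedCubic1D` is hypothetical.

```python
import math


class CachedCubic1D:
    """Hypothetical 1D analogue: per-cell cubic with a one-cell coefficient cache.

    Samples are assumed to lie at x = 0, 1, ..., len(values) - 1.
    """

    def __init__(self, values):
        if len(values) < 2:
            raise ValueError("need at least two samples")
        self.values = list(values)   # the full input data is kept: it is needed at runtime
        self._cell = None            # index of the last accessed cell
        self._coeffs = None          # its four cubic coefficients
        self.coefficient_builds = 0  # how often coefficients were (re)computed

    def _sample(self, i):
        # Clamp indices so that edge cells still get four neighbouring samples.
        return self.values[min(max(i, 0), len(self.values) - 1)]

    def _coefficients(self, i):
        p0, p1, p2, p3 = (self._sample(i + k) for k in (-1, 0, 1, 2))
        m1 = (p2 - p0) / 2.0  # central-difference derivative at x = i
        m2 = (p3 - p1) / 2.0  # central-difference derivative at x = i + 1
        return (p1, m1, 3 * (p2 - p1) - 2 * m1 - m2, 2 * (p1 - p2) + m1 + m2)

    def __call__(self, x):
        last_cell = len(self.values) - 2
        i = min(max(math.floor(x), 0), last_cell)
        if i != self._cell:  # recompute only when a different cell is accessed
            self._coeffs = self._coefficients(i)
            self._cell = i
            self.coefficient_builds += 1
        t = x - i
        a0, a1, a2, a3 = self._coeffs
        return a0 + t * (a1 + t * (a2 + t * a3))  # Horner evaluation of the cubic


ip = CachedCubic1D([0, 1, 4, 9, 16, 25])
print(ip(2.5), ip(2.75), ip.coefficient_builds)  # 6.25 7.5625 1
```

Repeated queries inside the same cell reuse the cached coefficients, which is where the efficiency of repeated calls comes from; pickling such an object would still have to store `values`.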
Thanks for the clarification. I have two further questions.
Your use of the interpolator sounds appropriate. With respect to your second question, I suggest you implement this function yourself. All you need to write is a function that takes the list of x values, the fixed y and z values, and the interpolator object. Then you loop over the x values, call ip(x, y, z), save each result to a list and return it. This can also be extended to arrays for x, y, and z: simply loop over all of them.
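A minimal sketch of such helper functions. It assumes, as in the comment, that the interpolator can be called as `ip(x, y, z)`; if the actual object exposes a different call signature, wrap it in a small function first. The function names are illustrative, and `toy_ip` is a toy stand-in for a real interpolator.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Interpolator = Callable[[float, float, float], float]


def evaluate_along_x(ip: Interpolator, xs: Iterable[float], y: float, z: float) -> List[float]:
    """Evaluate ip at (x, y, z) for every x, keeping y and z fixed."""
    results = []
    for x in xs:
        results.append(ip(x, y, z))
    return results


def evaluate_points(ip: Interpolator, points: Iterable[Tuple[float, float, float]]) -> List[float]:
    """Evaluate ip at an arbitrary sequence of (x, y, z) points."""
    return [ip(x, y, z) for x, y, z in points]


def evaluate_grid(ip: Interpolator, xs: Iterable[float], ys: Iterable[float],
                  zs: Iterable[float]) -> List[float]:
    """Evaluate ip on every combination of xs, ys and zs (x varies slowest, z fastest)."""
    return [ip(x, y, z) for x, y, z in product(xs, ys, zs)]


def toy_ip(x: float, y: float, z: float) -> float:
    # Toy stand-in for an interpolator that was built once and is reused for all calls.
    return x + 2 * y + 3 * z


print(evaluate_along_x(toy_ip, [0.0, 1.0, 2.0], y=1.0, z=0.5))  # [3.5, 4.5, 5.5]
```

Building the interpolator once and passing the same object to these helpers for every job is what lets the reuse pay off; the helpers themselves add nothing beyond a plain Python loop.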
It has been a long time since there were any updates on this issue, so I am closing it.
I have successfully implemented the tricubic code you created and have been trying to find a way to store the resulting function from the regression and load it in a later run. The dataset used in the tricubic regression is hundreds of megabytes, so currently, to use tricubic, I load this dataset, run the regression and then use the result. If I could store the ip function directly, I would not need to load all those megabytes of data in order to regress. This is an issue because, depending on the use case, the runtime of my code can be quite fast, while loading the data can take several orders of magnitude longer, which makes the whole process inefficient.
The ip function cannot currently be pickled because it is a PyCapsule object. Is there a way to extract this data from the PyCapsule, store it as a pickle and then reload it in another run?