Store tricubic results for loading later #5

Closed
mrh335 opened this issue Jul 6, 2020 · 8 comments

Comments

@mrh335

mrh335 commented Jul 6, 2020

I have successfully used the tricubic code you created, and I have been trying to find a way to store the function that results from the regression so that I can load and use it in a later session. The dataset used in the tricubic regression is hundreds of megabytes. To use tricubic at present, I load this dataset, run the regression, and then use the result. If I could store the ip function directly, I would not need to load all those megabytes of data in order to regress. This matters because my code can run quite fast depending on the use case, while loading the data can take several orders of magnitude longer, which makes the whole process inefficient.

The ip function cannot currently be pickled because it is a PyCapsule object. Is there a way to extract this data from the PyCapsule, store it as a pickle, and reload it in a later run?

@danielguterding
Owner

The interpolator needs all input data at runtime, so even if you pickle the interpolator, the pickle file would have to include all input data, meaning you would still have to load all of it. I can add support for pickle, but I assume it will not solve your specific performance problem. Maybe you can rewrite your algorithm so that the interpolator is reused instead of creating a new one each time.

@mrh335
Author

mrh335 commented Jul 8, 2020

I think we are saying the same thing. I would like a lightweight method to load the interpolator and use it. I assume this means the input data does not need to be reloaded. Please let me know whether I am on the right track, and explain your response in a little more detail.

@danielguterding
Owner

I believe we are still not on the same page. Since you were referring to regression earlier, I think you may be confusing regression and interpolation. To be clear, pytricubic performs interpolation only.

Regression usually means fitting a simple few-parameter model to a large data set. The representation through these few parameters can then be used as an efficient approximation to the entire data set, and the initial data set may no longer be needed for some applications.

Interpolation, however, fits a model to a data set, so that it goes exactly through all input data points. Here, one is usually not interested in a more efficient representation of the data, but rather in modelling features at a higher resolution than what is represented in the initial data set. Since interpolation is exact at the input data points, no more compact representation exists, i.e. all initial data are needed at runtime.

Therefore, I think there is no way to extract a few-parameter representation, which I believe is what you are looking for.
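The distinction above can be illustrated with a small one-dimensional sketch. It is not pytricubic code: `fit_line` and `interpolate_linear` are hypothetical helper names, and piecewise-linear interpolation stands in for tricubic interpolation purely to keep the example short. The fitted line is fully described by two numbers, while the interpolator needs every input point to reproduce the data exactly.

```python
import bisect


def fit_line(xs, ys):
    """Least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_linear(xs, ys, x):
    """Piecewise-linear interpolation; xs must be sorted ascending."""
    # Find the cell containing x, clamped to the valid cell range.
    i = bisect.bisect_right(xs, x) - 1
    i = min(max(i, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]

# Regression: two parameters summarize the data, but only approximately.
slope, intercept = fit_line(xs, ys)
regression_at_1 = slope * 1.0 + intercept

# Interpolation: exact at the input points, but needs xs and ys at runtime.
interpolation_at_1 = interpolate_linear(xs, ys, 1.0)
```

After fitting, `xs` and `ys` could be discarded and only `slope` and `intercept` kept. The interpolator has no such shortcut: dropping any input point changes its result.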

Rather, I would suggest that you investigate whether you really need to load the input data, construct the interpolator, do the interpolation and terminate the program for each job you are running. You may be able to construct the interpolator once and then run all your jobs in a loop or similar. Please note that pytricubic contains some optimizations which make repeated calls to the interpolator quite efficient.
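The suggestion to construct the interpolator once and reuse it can be sketched as follows. This is a minimal illustration under stated assumptions: `run_all_jobs`, `demo_load_data` and `demo_build_interpolator` are hypothetical names, and the stand-in interpolator is a plain Python function rather than a pytricubic object, so the call signature would need adapting to the real library.

```python
# Counters used only to show how often the expensive steps run.
calls = {"load": 0, "build": 0}


def demo_load_data():
    """Stand-in for the slow step of reading a large data set."""
    calls["load"] += 1
    return [1.0, 2.0, 3.0]


def demo_build_interpolator(data):
    """Stand-in for constructing an interpolator from the loaded data."""
    calls["build"] += 1
    scale = sum(data)

    def ip(x, y, z):
        return scale * (x + y + z)

    return ip


def run_all_jobs(load_data, build_interpolator, jobs):
    """Load and build once, then evaluate every job with the same object."""
    data = load_data()
    ip = build_interpolator(data)
    return [ip(x, y, z) for (x, y, z) in jobs]


jobs = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.0)]
results = run_all_jobs(demo_load_data, demo_build_interpolator, jobs)
```

With this structure the loading cost is paid once per process instead of once per job, which is where the savings would come from if many jobs can share a process.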

@mrh335
Author

mrh335 commented Jul 9, 2020

My words did not match what I meant to say. From reading the paper, I thought that the tricubic function uses the input data to create a matrix which is then used for the interpolation. I believe I was mistaken to think that this matrix is all that is needed to call the interpolation, and that in fact both this matrix and the input data are required for interpolation.

LinearNDInterpolator in scipy creates a picklable object which can be reloaded and called. I also thought it stored coefficients describing the fit, which are then used when it is called.

@danielguterding
Owner

danielguterding commented Jul 19, 2020

> I did think from reading the paper that the tricubic function uses the input data to create a matrix which is then used for the interpolation. I believe I am mistaken to think that this matrix is all that is needed to call the interpolation, but really this matrix and the input data are required for interpolation.

Unfortunately, you are mistaken. All the input data is needed at runtime, at least in the implementation that pytricubic is using.

In principle, one could store all coefficients, but that would become quite costly in terms of memory for large grids. As a compromise, pytricubic memorizes the coefficients of the last accessed grid cell.

Therefore, I could implement pickling, but most likely you will not gain any performance in case the rest of your code is already optimized.
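The compromise mentioned above, remembering the coefficients of the last accessed grid cell, can be sketched in one dimension. This is not pytricubic's implementation: `CachedCellInterpolator` is a hypothetical class, and piecewise-linear interpolation on integer grid points is used instead of tricubic interpolation to keep the example short. The `coefficient_builds` counter exists only to make the caching visible.

```python
import math


class CachedCellInterpolator:
    """1D piecewise-linear interpolator that caches the last cell's coefficients."""

    def __init__(self, values):
        if len(values) < 2:
            raise ValueError("need at least two grid values")
        # All input data is kept; coefficients are derived from it on demand.
        self.values = list(values)
        self._cached_cell = None
        self._cached_coeffs = None
        self.coefficient_builds = 0

    def _coefficients(self, cell):
        # Recompute only when the query moves to a different grid cell.
        if cell != self._cached_cell:
            left = self.values[cell]
            right = self.values[cell + 1]
            self._cached_coeffs = (left, right - left)
            self._cached_cell = cell
            self.coefficient_builds += 1
        return self._cached_coeffs

    def __call__(self, x):
        # Locate the cell containing x, clamped to the valid range.
        cell = min(max(math.floor(x), 0), len(self.values) - 2)
        intercept, slope = self._coefficients(cell)
        return intercept + slope * (x - cell)


ip = CachedCellInterpolator([0.0, 10.0, 30.0])
first = ip(0.25)   # builds coefficients for cell 0
second = ip(0.75)  # same cell, cached coefficients are reused
third = ip(1.5)    # new cell, coefficients are rebuilt
```

Only one cell's coefficients are held at a time, so memory stays small, and consecutive queries in the same cell skip the rebuild. The full input data still has to be available, which matches the point made in this comment.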

@mrh335
Author

mrh335 commented Jul 21, 2020

Thanks for the clarification.

I have two further questions.
I currently create a grid of, for instance, 20 x 10 x 60 points. Each axis represents a property such as temperature, pressure, or humidity. I scale each input from engineering units to the index range of its axis, using linear interpolation to determine the grid location (which usually falls between the integer grid points). For example, if my temperature range is 250 to 350 Kelvin on the axis with 20 grid points and I want to look up a temperature of 301.6 K, I linearly interpolate between 250 and 350 to find the grid location of 301.6 K, and then pass that grid location to the ip function, which returns the value at the intersection of the three inputs.

  1. Is this the appropriate use or is there a more direct way to perform the lookup using engineering values and not have to convert back and forth from grid locations?

  2. Is it possible to enable a vector input to the lookup function? Therefore, given a range of temperatures with two other terms fixed, can it return the lookup values for those input temperatures? Could this be expanded to handle all inputs as arrays?
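The unit conversion described in this question can be sketched as a small helper. This is a minimal sketch, assuming a uniformly spaced axis whose first grid point has index 0 and whose last grid point has index `n_points - 1`; the name `to_grid_coordinate` is hypothetical and not part of pytricubic.

```python
def to_grid_coordinate(value, axis_min, axis_max, n_points):
    """Map an engineering value onto a fractional grid index.

    Assumes a uniformly spaced axis where index 0 corresponds to axis_min
    and index n_points - 1 corresponds to axis_max.
    """
    if n_points < 2:
        raise ValueError("axis needs at least two grid points")
    if axis_max == axis_min:
        raise ValueError("axis_min and axis_max must differ")
    fraction = (value - axis_min) / (axis_max - axis_min)
    return fraction * (n_points - 1)


# Example from the question: 301.6 K on a 250 K to 350 K axis with 20 points.
t_index = to_grid_coordinate(301.6, 250.0, 350.0, 20)
```

For the example values this gives a grid location of about 9.804, since 51.6 / 100 of the way along an axis with 19 intervals is 0.516 * 19.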

@danielguterding
Owner

Your use of the interpolator sounds appropriate. With respect to your second question, I suggest you implement this function yourself. All you need to write is a function that takes the list of x values, the fixed y and z values and the interpolator object. Then you loop over the x values and call ip(x, y, z), save it to some list and return. This can also be expanded to using arrays for x, y, and z. Simply loop over all those.
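The loop described above might look like the following. This is a sketch under stated assumptions: `interpolate_along_x` and `interpolate_points` are hypothetical names, and `ip` is taken to be any callable of the form `ip(x, y, z)` as written in this comment. The actual interpolator object may expose a different calling convention, in which case only the inner call needs adapting.

```python
def interpolate_along_x(ip, xs, y, z):
    """Evaluate ip for each x in xs with y and z held fixed."""
    return [ip(x, y, z) for x in xs]


def interpolate_points(ip, xs, ys, zs):
    """Evaluate ip at each (x, y, z) triple taken element-wise."""
    if not (len(xs) == len(ys) == len(zs)):
        raise ValueError("xs, ys and zs must have the same length")
    return [ip(x, y, z) for x, y, z in zip(xs, ys, zs)]


def demo_ip(x, y, z):
    """Stand-in interpolator used only to make the example runnable."""
    return x + 10.0 * y + 100.0 * z


line_values = interpolate_along_x(demo_ip, [0.0, 1.0, 2.0], 1.0, 2.0)
point_values = interpolate_points(demo_ip, [1.0, 2.0], [0.0, 1.0], [0.0, 1.0])
```

Passing the x values in sorted order keeps consecutive queries in neighbouring grid cells, which should suit an interpolator that remembers the last accessed cell.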

@danielguterding
Owner

It has been a long time since there were any updates on this issue. Therefore, I am closing it.
