Store tricubic results for loading later #5

mrh335 · 2020-07-06T22:43:26Z

I have successfully used the tricubic code you created and have been trying to find a way to store the function produced by the regression so that I can load and use it in a later session. The dataset used in the tricubic regression is hundreds of megabytes, and to use tricubic I currently load this dataset, run the regression and then use the result. If I could store the ip function directly, I would not need to load all those megabytes of data in order to regress. This matters because my code can run quite fast depending on the use case, while loading the data can take several orders of magnitude longer, which makes the whole process inefficient.

The ip function cannot currently be pickled because it is a PyCapsule object. Is there a way to get this data out of the PyCapsule, store it as a pickle and then reload it in another run?

danielguterding · 2020-07-07T12:10:44Z

The interpolator needs all input data at runtime, so even if you pickle the interpolator, the pickle file would have to include all input data, meaning you would still have to load all of it. I can add support for pickle, but I assume it will not solve your specific performance problem. Maybe you can rewrite your algorithm so that the interpolator is reused instead of creating a new one each time.

mrh335 · 2020-07-08T06:21:48Z

I think we are saying the same thing. I would like to have a lightweight method to load the interpolator and use it. I assume this means the input data does not need to be reloaded. Please let me know whether I am on the right track, and explain your response in a little more detail.

danielguterding · 2020-07-09T09:59:53Z

I believe we are still not on the same page. Since you were referring to regression earlier, I think you may be confusing regression and interpolation. To be clear, pytricubic performs interpolation only.

Regression usually means fitting a simple few-parameter model to a large data set. The representation through these few parameters can then be used as an efficient approximation to the entire data set, while the initial data set may not be needed anymore for some applications.

Interpolation, however, fits a model to a data set, so that it goes exactly through all input data points. Here, one is usually not interested in a more efficient representation of the data, but rather in modelling features at a higher resolution than what is represented in the initial data set. Since interpolation is exact at the input data points, no more compact representation exists, i.e. all initial data are needed at runtime.
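
The distinction can be illustrated in one dimension. The following is a minimal sketch in plain Python, not pytricubic code; the names `fit_line` and `interp_linear` are made up for this illustration. A straight-line fit keeps two parameters and can discard the data afterwards, whereas a piecewise-linear interpolant needs every sample in order to be evaluated.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares regression: compress the data into (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interp_linear(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation: needs all samples (xs must be sorted)."""
    # Locate the interval [xs[i], xs[i + 1]] containing x, clamped to the data range.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]  # samples of x**2

# Regression: two numbers summarise the data, but the fit misses the samples.
slope, intercept = fit_line(xs, ys)
fitted = [slope * x + intercept for x in xs]

# Interpolation: passes exactly through the samples, but keeps all of them.
exact = [interp_linear(xs, ys, x) for x in xs]
```

Here `exact` equals `ys`, while `fitted` does not. That is the trade-off described above: the regression is compact but approximate, and the interpolant is exact but carries the full data set.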

Therefore, I think there is no way to extract a few-parameter representation, which I believe is what you are looking for.

Rather, I would suggest that you investigate whether you really need to load the input data, construct the interpolator, do the interpolation and terminate the program for each job you are running. You may be able to construct the interpolator once and then run all your jobs in a loop or similar. Please note that pytricubic contains some optimizations which make repeated calls to the interpolator quite efficient.
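
The restructuring suggested here could look roughly like the sketch below. It is only an outline under assumptions: `run_all_jobs`, `CountingBuilder` and the stand-in interpolator are hypothetical names, and the stand-in replaces the real data loading and pytricubic construction so that the example runs on its own.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Interpolator = Callable[[float, float, float], float]


def run_all_jobs(build: Callable[[], Interpolator],
                 jobs: Iterable[Sequence[Point]]) -> List[List[float]]:
    """Build the interpolator once, then evaluate every job with it."""
    ip = build()  # expensive step (load data, construct interpolator), done one time
    return [[ip(x, y, z) for x, y, z in job] for job in jobs]


class CountingBuilder:
    """Stand-in for the expensive setup; counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> Interpolator:
        self.calls += 1
        # Hypothetical interpolator: a real one would wrap the loaded grid data.
        return lambda x, y, z: x + y + z


builder = CountingBuilder()
jobs = [[(0.0, 1.0, 2.0)], [(1.0, 1.0, 1.0), (2.0, 0.5, 0.5)]]
results = run_all_jobs(builder, jobs)
```

After the call, `builder.calls` is 1 even though two jobs were processed, so the loading cost is paid once per process rather than once per job.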

mrh335 · 2020-07-09T17:11:05Z

My words were not matching what I meant to say. I did think from reading the paper that the tricubic function uses the input data to create a matrix which is then used for the interpolation. I believe I am mistaken to think that this matrix is all that is needed to call the interpolation, but really this matrix and the input data is required for interpolation.

With LinearNDInterpolator in scipy, the result is a picklable object which can be reloaded and called. I also thought this was storing coefficients that describe the fit, which are then used when it is called.

danielguterding · 2020-07-19T20:12:22Z

> I did think from reading the paper that the tricubic function uses the input data to create a matrix which is then used for the interpolation. I believe I am mistaken to think that this matrix is all that is needed to call the interpolation, but really this matrix and the input data is required for interpolation.

Unfortunately, you are mistaken. All the input data is needed at runtime, at least in the implementation that pytricubic is using.

In principle, one could store all coefficients, but that would become quite costly in terms of memory for large grids. As a compromise, pytricubic memorizes the coefficients of the last accessed grid cell.
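
A rough estimate shows why storing all coefficients is costly. A tricubic polynomial has 4 x 4 x 4 = 64 coefficients per grid cell. The sketch below compares that with the size of the input data; the function name `storage_bytes`, the 8-byte value size and the 300 x 300 x 300 example grid are assumptions for illustration, not figures from pytricubic.

```python
from typing import Tuple


def storage_bytes(shape: Tuple[int, int, int],
                  bytes_per_value: int = 8,
                  coeffs_per_cell: int = 64) -> Tuple[int, int]:
    """Return (input data bytes, bytes needed to store all cell coefficients)."""
    nx, ny, nz = shape
    input_bytes = nx * ny * nz * bytes_per_value
    # A grid with n points along an axis has n - 1 cells along that axis.
    cells = (nx - 1) * (ny - 1) * (nz - 1)
    coeff_bytes = cells * coeffs_per_cell * bytes_per_value
    return input_bytes, coeff_bytes


# Assumed example: a grid large enough to hold a few hundred megabytes of doubles.
input_bytes, coeff_bytes = storage_bytes((300, 300, 300))
ratio = coeff_bytes / input_bytes
```

For large grids the ratio approaches 64, so a data set of a few hundred megabytes would turn into many gigabytes of precomputed coefficients.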

Therefore, I could implement pickling, but most likely you will not gain any performance in case the rest of your code is already optimized.

mrh335 · 2020-07-21T06:35:10Z

Thanks for the clarification.

I have two further questions.

1. I currently create a grid which is, for instance, 20 x 10 x 60. Each axis represents a property such as temperature, pressure or humidity. I scale the inputs from engineering units, such as temperature, to the size of the given axis, using a linear interpolation to determine the grid location (which usually falls between the integer grid points). If my temperature range is 250 to 350 Kelvin on the axis with 20 grid points and I want to look up the temperature 301.6 K, I linearly interpolate between 250 and 350 to find the grid location of 301.6 K and then use that grid location as an input to the ip function to return the value at the intersection of the 3 inputs. Is this the appropriate use, or is there a more direct way to perform the lookup using engineering values without converting back and forth from grid locations?
2. Is it possible to enable a vector input to the lookup function? That is, given a range of temperatures with the two other terms fixed, can it return the lookup values for those input temperatures? Could this be expanded to handle all inputs as arrays?
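
The scaling described in the first question can be sketched as follows. This assumes a uniformly spaced axis whose grid locations run from 0 to n - 1; the helper name `to_grid_coordinate` is made up for this example.

```python
def to_grid_coordinate(value: float, lower: float, upper: float, n_points: int) -> float:
    """Map a physical value in [lower, upper] onto the index range [0, n_points - 1]."""
    if n_points < 2 or upper <= lower:
        raise ValueError("need at least two grid points and upper > lower")
    return (value - lower) / (upper - lower) * (n_points - 1)


# 301.6 K on a 250 K to 350 K axis with 20 grid points lies between nodes 9 and 10.
t_index = to_grid_coordinate(301.6, 250.0, 350.0, 20)
```

The same helper would be applied to each of the three axes before passing the resulting grid locations to the interpolator.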

danielguterding · 2020-07-27T14:51:21Z

Your use of the interpolator sounds appropriate. With respect to your second question, I suggest you implement this function yourself. All you need to write is a function that takes the list of x values, the fixed y and z values and the interpolator object. Then you loop over the x values and call ip(x, y, z), save it to some list and return. This can also be expanded to using arrays for x, y, and z. Simply loop over all those.
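
A minimal sketch of such wrapper functions is given below. The call form `ip(x, y, z)` follows the description above and is an assumption; if the interpolator object expects its position in another form, only the innermost call needs to change. The names `interpolate_along_x`, `interpolate_grid` and the stand-in interpolator are hypothetical.

```python
from typing import Callable, List, Sequence

Interpolator = Callable[[float, float, float], float]


def interpolate_along_x(ip: Interpolator, xs: Sequence[float],
                        y: float, z: float) -> List[float]:
    """Evaluate ip for every x in xs while y and z stay fixed."""
    return [ip(x, y, z) for x in xs]


def interpolate_grid(ip: Interpolator, xs: Sequence[float], ys: Sequence[float],
                     zs: Sequence[float]) -> List[List[List[float]]]:
    """Evaluate ip on every combination of xs, ys and zs (indexed [i][j][k])."""
    return [[[ip(x, y, z) for z in zs] for y in ys] for x in xs]


def standin_ip(x: float, y: float, z: float) -> float:
    """Stand-in so the sketch runs without pytricubic; replace with the real call."""
    return x + 10.0 * y + 100.0 * z


line = interpolate_along_x(standin_ip, [0.0, 1.0, 2.0], 1.0, 1.0)
cube = interpolate_grid(standin_ip, [0.0, 1.0], [0.0], [0.0, 1.0])
```

All inputs are expected in grid locations, so values in engineering units have to be converted before they are passed in.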

danielguterding · 2021-02-11T19:25:30Z

It has been a long time since there were any updates on this issue. Therefore, I am closing it.

danielguterding closed this as completed Feb 11, 2021
