Poem 048 - MetaModelSemiStructuredComponent #106
Conversation
This looks pretty reasonable to me. One comment I have is that I think it's fairly common to have slightly more structure that could be exploited. For example, suppose I have 3 independent variables, x1 / x2 / x3, and for each x1, there is a complete x2-x3 grid (omitting the dependent variable(s)):
My inclination would be to fit a 2D interpolator for each x1, and I imagine the …
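A rough sketch of that inclination, assuming a "stack of grids" layout with made-up coordinates and a stand-in training function (none of this is from the POEM itself):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator, interp1d

# Hypothetical semi-structured data: for each x1 there is a complete
# x2-x3 grid. All values here are made up for illustration.
x1_vals = np.array([1.0, 2.0, 3.0])
x2_vals = np.array([5.0, 8.0, 11.0])
x3_vals = np.array([10.0, 15.0, 20.0])

def f(x1, x2, x3):  # stand-in for the training data
    return x1 + 0.1 * x2 + 0.01 * x3

# Fit one 2-D interpolant per x1 value.
X2, X3 = np.meshgrid(x2_vals, x3_vals, indexing="ij")
subtables = [RegularGridInterpolator((x2_vals, x3_vals), f(x1, X2, X3))
             for x1 in x1_vals]

def evaluate(x1, x2, x3):
    # Evaluate every 2-D subtable at (x2, x3), then interpolate the
    # resulting values along x1 with a 1-D interpolant.
    vals = np.array([float(tab([[x2, x3]])[0]) for tab in subtables])
    return float(interp1d(x1_vals, vals)(x1))
```

Since `f` here is linear, the stacked interpolants reproduce it exactly inside the table; the point is only the two-level structure (2-D subtables blended by a 1-D pass).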
For the sake of extensibility and flexibility, the MetaModelStructured class uses a recursive interpolation routine that allows for an arbitrary number of input dimensions. You're right that it would, in theory, be possible to offer faster performance for 2-D cases by hard-coding a structured interpolant that doesn't need recursion. 1-D and 2-D interpolations are very common use cases, so this is an interesting suggestion for getting better performance in a common case. In theory the suggestion would apply to both MetaModelStructured and the proposed MetaModelSemiStructured. Some implementation details would need to be worked out in the semi-structured case, though:
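To make the recursion point concrete, here is a minimal sketch of a dimension-agnostic multilinear interpolation. This illustrates the general idea only; it is not OpenMDAO's actual routine, and it omits extrapolation and derivative handling:

```python
import numpy as np

def interp_recursive(grids, values, point):
    """Multilinear interpolation over an arbitrary number of structured
    dimensions, one recursion level per dimension (linear basis only)."""
    xs = grids[0]
    x = point[0]
    # Locate the bracketing cell along this dimension.
    i = int(np.searchsorted(xs, x)) - 1
    i = min(max(i, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    if len(grids) == 1:
        lo, hi = values[i], values[i + 1]
    else:
        # Recurse into the remaining dimensions on both bracketing slices.
        lo = interp_recursive(grids[1:], values[i], point[1:])
        hi = interp_recursive(grids[1:], values[i + 1], point[1:])
    return (1.0 - t) * lo + t * hi
```

The generality comes from the recursion, which is exactly where the per-call overhead (and the difficulty of hand-optimizing a fixed 2-D path) lives.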
Alternatively, if we decide that a generalized implementation isn't possible but the performance is still worthwhile, we could implement your suggestion as a project-specific interpolation routine. The major challenge with doing so will be getting the necessary derivatives with respect to training data values, though.
My initial thought was just to let the user specify this, though a detection method could also be useful for input checking. One way would assume this "stack of grids" format, where you indicate how many of the last k inputs always form a complete grid: `interp.set_input_metadata(..., n_complete_dims=2)`. Or maybe you could specify directly which inputs they are: `interp.set_input_metadata(..., complete_dims=["x2", "x3"])`. I'm not actually sure if it makes sense for the component to be this flexible, though.
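One hypothetical way such a detection could work, as an input-checking sketch (note that `set_input_metadata` and `n_complete_dims` above are only proposed names, and this heuristic is a simplification):

```python
import numpy as np

def n_complete_trailing_dims(points):
    """Simplified heuristic: how many trailing columns of the
    (n_points, n_dims) training array form a complete grid for every
    combination of the leading columns?  Assumes no duplicate rows,
    and only compares row counts against the product of unique values,
    so it is an input-checking sketch rather than a rigorous proof."""
    n_dims = points.shape[1]
    for k in range(n_dims - 1, 0, -1):
        leading = points[:, :n_dims - k]
        trailing = points[:, n_dims - k:]
        complete = True
        for lead in np.unique(leading, axis=0):
            # Rows sharing this leading combination must form a full
            # Cartesian product of their unique trailing values.
            sub = trailing[np.all(leading == lead, axis=1)]
            sizes = [len(np.unique(sub[:, j])) for j in range(k)]
            if len(sub) != np.prod(sizes):
                complete = False
                break
        if complete:
            return k
    return 0
```

For a "stack of grids" dataset where each x1 value carries its own complete x2-x3 grid, this would report 2 trailing complete dimensions.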
Yeah, I don't think any of this should block the more general proposal moving forward; it seems like it could be added later while preserving backwards compatibility. Implementing it as a project-specific optimization could be a good way to develop it and evaluate whether it's worth generalizing and pushing upstream.
Whether the user defines the number of "complete" dimensions or we detect it is not the critical issue. OpenMDAO already has an interpolation routine for fully structured data, but it uses a recursive interpolation strategy that handles an arbitrary number of input dimensions. The cost of that generality is that the interpolation is a lot slower. If you want a faster interpolation routine, you have to code it for a fixed number of inputs (and maybe a fixed interpolation basis function). It is this customized interpolation routine that will take the majority of the development effort.
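For contrast with the recursive approach, a hard-coded 2-D routine with a fixed linear basis needs no recursion at all; here is a sketch of what such a fixed-dimension interpolant might look like (an illustration, not OpenMDAO code):

```python
import numpy as np

def bilinear(xg, yg, values, x, y):
    """Hard-coded bilinear interpolation on a structured 2-D grid:
    one flat pass of index lookup and weighting, no recursion."""
    # Locate the bracketing cell in each dimension.
    i = int(np.clip(np.searchsorted(xg, x) - 1, 0, len(xg) - 2))
    j = int(np.clip(np.searchsorted(yg, y) - 1, 0, len(yg) - 2))
    tx = (x - xg[i]) / (xg[i + 1] - xg[i])
    ty = (y - yg[j]) / (yg[j + 1] - yg[j])
    # Weighted sum of the four cell corners.
    return ((1 - tx) * (1 - ty) * values[i, j]
            + tx * (1 - ty) * values[i + 1, j]
            + (1 - tx) * ty * values[i, j + 1]
            + tx * ty * values[i + 1, j + 1])
```

Fixing the dimension count (and the basis) is what lets the whole evaluation collapse to a handful of arithmetic operations, which is the performance argument being made here.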
I'd like to get a feel for expectations about extrapolation. Extrapolation happens whenever your evaluation point, along a given row or dimension, only has points on one side of it. With structured grids, this only happens outside of the table's "box" domain. With the proposed semi-structured data, it is going to happen quite often. If we take your example:
To evaluate the point (1.1, 8.9, 15.5), we do this in four steps, one for each table dimension.
I don't think there is anything inherently wrong with this, but there is a way it might be unintuitive. You might notice that (1.1, 8.9, 15.5) is actually closer (measured by Euclidean distance) to a point in the second subtable (at x1=2) than to any point in the first subtable (at x1=1). However, the interpolation at this point is more strongly influenced by the farther table than by the closer one, simply because the order in which the dimensions are handled determines which axis the extrapolation occurs on. If you reordered this dataset to (x2, x1, x3), so that you interpolate on x2 first and then on x1, the opposite would be true.

So, that's where your expectations come in. I think the story is good, but the drawback is that you may not get what you expect if you query points outside of the zones where they are defined. In such a case, the structured-grid approach has its limitations, because interior extrapolation is not expected. For the top level, you might rather have an unstructured approach where you interpolate a really sparse dimension by averaging the nearest n subtables, but then you lose some of the speed of the structured approaches because you have to do distance calculations.
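The weighting behavior described above can be shown with a small 2-D sketch. All coordinates and values below are made up for illustration: the x1 = 1 row covers small x2 values, the x1 = 2 row covers large ones, and a query point that sits nearest a training point in the x1 = 2 row is still weighted 90/10 toward the x1 = 1 row:

```python
import numpy as np

# Hypothetical semi-structured data: each x1 row has its own x2 points.
x1_vals = np.array([1.0, 2.0])
x2_grids = [np.array([3.0, 6.0]),    # x2 points at x1 = 1
            np.array([9.0, 12.0])]   # x2 points at x1 = 2
y_grids = [np.array([0.0, 1.0]),     # made-up training values at x1 = 1
           np.array([5.0, 7.0])]     # made-up training values at x1 = 2

def lin1d(xs, ys, x):
    """1-D linear interpolation that extrapolates past the end points."""
    t = (x - xs[0]) / (xs[1] - xs[0])
    return ys[0] + t * (ys[1] - ys[0])

def evaluate_2d(x1, x2):
    # Interpolate/extrapolate x2 inside each row, then blend along x1.
    row_vals = [lin1d(xs, ys, x2) for xs, ys in zip(x2_grids, y_grids)]
    return float(lin1d(x1_vals, np.array(row_vals), x1))

x1q, x2q = 1.1, 8.9
# Blending weights along x1 at the query point:
w2 = (x1q - x1_vals[0]) / (x1_vals[1] - x1_vals[0])  # weight on x1 = 2 row
w1 = 1.0 - w2                                        # weight on x1 = 1 row
# Distances to each row's nearest training point:
d_far = np.hypot(x1q - 1.0, x2q - 6.0)   # nearest point in the x1 = 1 row
d_near = np.hypot(x1q - 2.0, x2q - 9.0)  # nearest point in the x1 = 2 row
```

Here `d_near < d_far`, yet `w1` dominates: the x1 = 1 row contributes 90% of the blend while extrapolating far beyond its own x2 range, which is exactly the unintuitive case being discussed.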
A few other random thoughts:
I don't think extrapolation is going to be that common. It will happen, but it's not going to happen all the time. The only two real options are:
For now, I'm OK with option #1.
This POEM PR has been merged as #115.
A SemiStructuredMetaModel component will be needed for upcoming aircraft design use cases in OpenMDAO.