Poem 048 - MetaModelSemiStructuredComponent #106

Closed

Conversation

JustinSGray
Member

A SemiStructuredMetaModel component will be needed for upcoming aircraft design use cases in OpenMDAO.

@ixjlyons

This looks pretty reasonable to me. One comment I have is that I think it's fairly common to have slightly more structure that could be exploited. For example, suppose I have 3 independent variables, x1 / x2 / x3, and for each x1, there is a complete x2-x3 grid (omitting the dependent variable(s)):

x1 = 1
    x2 = 5
        x3 = 8, 9, 20
    x2 = 6
        x3 = 8, 9, 20
x1 = 2
    x2 = 7
        x3 = 14, 15, 16
    x2 = 8
        x3 = 14, 15, 16
    x2 = 9
        x3 = 14, 15, 16

My inclination would be to fit a 2D interpolator for each x1, and I imagine the MetaModelSemiStructuredComp.set_input_metadata method could handle an argument specifying which or how many of the last k dimensions represent a fully structured grid. It may not be worth the complexity though, particularly if the underlying methods already handle the minimally-structured data case and this proposal is really just adding the ability to accept it. It may also not save all that much except in certain cases - in my current use case we're talking 3 dimensions of size ~10 each.
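
A minimal sketch of that idea, assuming SciPy is available; the slice layout and the evaluate helper here are hypothetical, just to illustrate the "2-D interpolator per x1 slice" structure:

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical stack-of-grids data: for each x1, a complete x2-x3 grid of values.
slices = {
    1.0: ((np.array([5.0, 6.0]), np.array([8.0, 9.0, 20.0])),
          np.arange(6.0).reshape(2, 3)),
    2.0: ((np.array([7.0, 8.0, 9.0]), np.array([14.0, 15.0, 16.0])),
          np.arange(9.0).reshape(3, 3)),
}

# Fit one 2-D interpolator per x1 slice; fill_value=None lets each interpolator
# extrapolate outside its own grid, which the semi-structured case needs often.
x1_vals = np.array(sorted(slices))
interps = [RegularGridInterpolator(slices[x1][0], slices[x1][1],
                                   bounds_error=False, fill_value=None)
           for x1 in x1_vals]

def evaluate(x1, x2, x3):
    # Interpolate linearly in x1 between the two bracketing 2-D interpolants.
    i = np.clip(np.searchsorted(x1_vals, x1) - 1, 0, len(x1_vals) - 2)
    t = (x1 - x1_vals[i]) / (x1_vals[i + 1] - x1_vals[i])
    lo = interps[i]([[x2, x3]])[0]
    hi = interps[i + 1]([[x2, x3]])[0]
    return (1.0 - t) * lo + t * hi

print(evaluate(1.1, 8.9, 15.5))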

@JustinSGray
Member Author

For the sake of extensibility and flexibility, the MetaModelStructured class uses a recursive interpolation routine that allows for an arbitrary number of input dimensions.

You're right that it would, in theory, be possible to offer faster performance for 2-D cases by hard-coding a structured interpolant that didn't need to do recursion. 1-D and 2-D interpolations are very common use cases, so this is an interesting suggestion for getting better performance in a common case. In theory the suggestion would apply to both MetaModelStructured and the proposed MetaModelSemiStructured.
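
For reference, the recursive strategy works roughly like the following multilinear sketch (illustrative only, not OpenMDAO's actual routine, and assuming a linear basis):

import numpy as np

def interp_recursive(grids, values, point):
    # grids: list of sorted 1-D coordinate arrays, one per dimension.
    # values: N-D array of table values whose shape matches the grids.
    # point: query coordinates, one per dimension.
    x = grids[0]
    i = np.clip(np.searchsorted(x, point[0]) - 1, 0, len(x) - 2)
    t = (point[0] - x[i]) / (x[i + 1] - x[i])  # unclamped -> linear extrapolation
    if len(grids) == 1:
        lo, hi = values[i], values[i + 1]
    else:
        # Recurse: interpolate each bracketing slice in the remaining dimensions.
        lo = interp_recursive(grids[1:], values[i], point[1:])
        hi = interp_recursive(grids[1:], values[i + 1], point[1:])
    return (1.0 - t) * lo + t * hi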

Some implementation details would need to be worked out in the semi-structured case though:

  1. Automatic detection of the structured data for the last 2 dimensions; alternatively, the API would need to be modified to let the user declare it somehow. Do you have a suggestion for either of those?
  2. It's been long enough since I've written an interpolant routine that I'm not sure whether we could keep the generality of multiple basis functions with a hard-coded 2-D interpolant. This would need to be worked out.

Alternatively, if we decide that a generalized implementation isn't possible but the performance is still worthwhile, we could implement your suggestion as a project-specific interpolation routine. The major challenge with doing so, though, will be getting the necessary derivatives with respect to the training data values.

@ixjlyons

  1. Automatic detection of the structured data for the last 2 dimensions; alternatively, the API would need to be modified to let the user declare it somehow. Do you have a suggestion for either of those?

My initial thought was just to let the user specify this, though a detection method could also be useful for input checking. One way would be to assume this "stack of grids" format, where you indicate how many of the last k inputs always form a complete grid:

interp.set_input_metadata(..., n_complete_dims=2)

Or maybe you could specify directly which inputs they are. I'm not actually sure if it makes sense for the component to be this flexible though:

interp.set_input_metadata(..., complete_dims=["x2", "x3"])

Alternatively, if we decide that a generalized implementation isn't possible but the performance is still worthwhile, we could implement your suggestion as a project-specific interpolation routine. The major challenge with doing so, though, will be getting the necessary derivatives with respect to the training data values.

Yeah, I don't think any of this should block the more general proposal from moving forward; it seems like it could be added later while preserving backwards compatibility. Implementing it as a project-specific optimization could be a good way to develop it and evaluate whether it's worth generalizing and pushing upstream.

@JustinSGray
Member Author

Whether the user defines the number of "complete" dimensions or we detect it is not the critical issue. OpenMDAO already has an interpolation routine for fully structured data, but it uses a recursive interpolation strategy that supports an arbitrary number of input dimensions. The cost of that generality is that the interpolation is a lot slower.

If you want a faster interpolation routine, you have to code it for a fixed number of inputs (and maybe a fixed interpolation basis function). It is this customized interpolation routine that will take the majority of the development effort.
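
For contrast, a hard-coded 2-D linear interpolant avoids the recursion entirely; a sketch with a fixed (linear) basis function:

import numpy as np

def bilinear(x, y, z, xq, yq):
    # x, y: sorted 1-D coordinate arrays; z: values with shape (len(x), len(y)).
    i = np.clip(np.searchsorted(x, xq) - 1, 0, len(x) - 2)
    j = np.clip(np.searchsorted(y, yq) - 1, 0, len(y) - 2)
    tx = (xq - x[i]) / (x[i + 1] - x[i])
    ty = (yq - y[j]) / (y[j + 1] - y[j])
    # Weighted sum of the four bracketing grid values.
    return ((1 - tx) * (1 - ty) * z[i, j] + tx * (1 - ty) * z[i + 1, j]
            + (1 - tx) * ty * z[i, j + 1] + tx * ty * z[i + 1, j + 1])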

@Kenneth-T-Moore
Member

@ixjlyons

I'd like to get a feel for expectations about extrapolation.

Extrapolation happens whenever your evaluation point on a given row or dimension only has points on one side of it. With structured grids, this only happens outside of the table's "box" domain. With the proposed semi-structured data, it is going to happen quite often. If we take your example:

x1 = 1
    x2 = 5
        x3 = 8, 9, 20
    x2 = 6
        x3 = 8, 9, 20
x1 = 2
    x2 = 7
        x3 = 14, 15, 16
    x2 = 8
        x3 = 14, 15, 16
    x2 = 9
        x3 = 14, 15, 16

If I want to evaluate the point (1.1, 8.9, 15.5), we do this in four steps: one for each table dimension, plus a final combination.

  x1: interpolate between x1=1 and x1=2

  x2: extrapolate on x1=1 past x2=6
      interpolate on x1=2 between x2=8 and x2=9

  x3: interpolate on x1=1, x2=6 between x3=9 and x3=20
      extrapolate on x1=1, past x2=6, between x3=9 and x3=20
      interpolate on x1=2, x2=8 between x3=15 and x3=16
      interpolate on x1=2, x2=9 between x3=15 and x3=16

  compute the interpolated value from values at:
      (1, 6, 9), (1, 6, 20)
      extrapolated from (1, 6, 9), extrapolated from (1, 6, 20)
      (2, 8, 15), (2, 8, 16)
      (2, 9, 15), (2, 9, 16)

I don't think there is anything inherently wrong with this, but there is one way it might be unintuitive. You might notice that (1.1, 8.9, 15.5) is actually closer (measured by Euclidean distance) to a point in the second subtable (at x1=2) than to any point in the first subtable (at x1=1). However, the interpolation at this point is more strongly influenced by the farther table than by the closer one, simply because the order in which the dimensions are handled determines which axis the extrapolation occurs on. If you reordered this dataset to (x2, x1, x3), so that you interpolate on x2 and then on x1, the opposite would be true.

So, that's where your expectations come in. I think the story is good, but the drawback is that you may not get what you expect if you query points outside of the zones where they are defined. In such a case, the structured-grid approach has its limitations, because interior extrapolation is not something it anticipates. For the top level, you might rather have an unstructured approach where you interpolate a really sparse dimension by averaging the nearest n subtables, but then you lose some of the speed of the structured approaches because you have to do distance calculations.
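
The distance claim is easy to check numerically:

import numpy as np

q = np.array([1.1, 8.9, 15.5])
p1 = np.array([1.0, 6.0, 20.0])  # nearest point in the x1=1 subtable
p2 = np.array([2.0, 9.0, 15.0])  # nearby point in the x1=2 subtable

print(np.linalg.norm(q - p1))  # ~5.35
print(np.linalg.norm(q - p2))  # ~1.03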

@Kenneth-T-Moore
Member

Kenneth-T-Moore commented Jun 28, 2021

A few other random thoughts:

  1. In an environment where some rows or columns might be really sparse, the interpolant needs to be able to "step down" from the requested order. For example, if you requested something like "lagrange2" but only have two points for some row in the table, you need to automatically step down to "linear".
  2. I originally thought that some of the existing interpolation methods wouldn't work with this approach, but I have fewer doubts now. So far, though, I've only done "slinear".
  3. I'm currently using the following format for specifying the semi-structured grid when instantiating the InterpNDSemi class, and for internal storage:
import numpy as np

grid = np.array([
   [1.0, 1.0],
   [1.0, 2.0],
   [2.0, 1.0],
   [2.0, 2.0],
   [3.0, 3.0],
   [3.0, 4.0],
   [4.0, 3.0],
   [4.0, 4.0],
])

values = np.array([
   3.7,
   5.4,
   2.0,
   1.1,
   6.8,
   11.3,
   5.7,
   2.2,
])
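
Usage would then look something like the sketch below; the constructor and interpolate call here are assumptions about the eventual API, not its final form:

# Hypothetical usage of the format above; exact signature still to be settled.
interp = InterpNDSemi(grid, values, method="slinear")
val = interp.interpolate(np.array([2.5, 1.5]))  # query one point in the table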

@JustinSGray
Member Author

I don't think extrapolation is going to be that common. It will happen, but it's not going to happen all the time. The only two real options are:

  1. hold the last value
  2. use the basis function to extrapolate.

For now, I'm OK with option #1.
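
In 1-D, the two options differ only in whether the interpolation coefficient gets clamped; a small sketch assuming a linear basis:

import numpy as np

def interp_1d(x, y, xq, extrapolate=False):
    # x: sorted 1-D coordinates; y: values at those coordinates.
    i = np.clip(np.searchsorted(x, xq) - 1, 0, len(x) - 2)
    t = (xq - x[i]) / (x[i + 1] - x[i])
    if not extrapolate:
        t = np.clip(t, 0.0, 1.0)  # option 1: hold the last value
    return (1.0 - t) * y[i] + t * y[i + 1]  # option 2 when t is left unclamped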

@Kenneth-T-Moore
Member

This POEM PR has been merged as: #115
