Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak how we use spans #42

Merged
merged 3 commits into from
Mar 11, 2019
Merged

Conversation

airbreather
Copy link
Member

  • Base implementations now have virtual methods that can be overridden as appropriate to provide optimized versions for any of the following layouts:
    • [x0, y0, z0, x1, y1, z1, ..., xn, yn, zn]
    • [x0, y0, x1, y1, ..., xn, yn, z0, z1, ..., zn]
    • [x0, x1, ..., xn, y0, y1, ..., yn, z0, z1, ..., zn]

All built-in algorithms now only override the one-by-one variants. The point of allowing subclasses to override for spans (and allowing different layouts) is to allow implementations that use SIMD and/or auxiliary DLLs for their implementations. There ought not to be a particularly significant advantage to using spans the way we were doing before.

- Base implementations now have virtual methods that can be overridden as appropriate to provide optimized versions for any of the following layouts:
    - [x0, y0, z0, x1, y1, z1, ..., xn, yn, zn]
    - [x0, y0, x1, y1, ..., xn, yn, z0, z1, ..., zn]
    - [x0, x1, ..., xn, y0, y1, ..., yn, z0, z1, ..., zn]

All built-in algorithms now only override the one-by-one variants.  The point of allowing subclasses to override for spans (and allowing different layouts) is to allow implementations that use SIMD and/or auxiliary DLLs for their implementations.  There ought not to be a **particularly** significant advantage to using spans the way we were doing before.
@airbreather airbreather changed the base branch from develop to perf/Transform_with_Span March 8, 2019 14:23
@airbreather
Copy link
Member Author

When looking at this, make sure to hide whitespace-only changes:
image

…y bit cheaper on the crusty ol' .NET Framework with basically no extra development effort
Copy link
Member

@FObermaier FObermaier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should come up with a performance test to help find a conclusion on this

@airbreather
Copy link
Member Author

We should come up with a performance test to help find a conclusion on this

Perhaps, but it's not just performance (especially not performance of ProjNet4GeoAPI itself) that prompted my reply, but a combination of how easy it is for dependents to use it, how easy it is to implement new projections, and what I'm going to call "ideal performance".

"Ideal performance"...

I don't know if there's a term for this already, but when I say "ideal performance", I mean the best performance that the library could possibly achieve (without breaking changes) when the caller provides the system with all the information that they have, without having to circumvent too many abstractions.

In other words, if someone has a PackedDoubleCoordinateSequence with Dimension = 2 and Measures = 0 and they want to transform it in-place, then they can do:

MapProjection theProjection = GetTheProjection();
PackedDoubleCoordinateSequence seq = GetTheSequence();
Span<XY> xys = MemoryMarshal.Cast<double, XY>(seq.GetRawCoordinates());
theProjection.Transform(xys, xys);

and then we have the option (if we choose to optimize this path) to call out to a native DLL that implements the routine as efficiently as the platform allows... but we probably won't, because the built-in is probably good enough.

If someone has a coordinate sequence that stores the points laid out SoA-style (because, perhaps, they do other SIMD operations), then they should be able to do:

MapProjection theProjection = GetTheProjection();
SOACoordinateSequence seq = GetTheSequence();
Span<double> xs = seq.GetXs();
Span<double> ys = seq.GetYs();
theProjection.Transform(xs, ys, xs, ys);

and then we have the option (again, if we choose to optimize it) to call out to a different native DLL (or use vector intrinsics coming in .NET Core 3.0 😁) to transform multiple points at a time... but again, we probably won't, because the built-in is still probably good enough.

BUT... suppose (purely hypothetically, of course) that someone has been passing around MapProjection instances throughout their application, and they just so happen to find that there could be some throughput benefits by dispatching to a different implementation upon seeing a particular set of parameter values.

  • This different implementation might have pre-computed some of the calculations that become invariant after doing so, rewritten the logic flow to maximize ILP (including, perhaps, unrolling loops), and even perhaps used advanced SIMD intrinsics that somehow only Intel's C++ compiler even bothers to offer (though who would be crazy enough to do that?).
  • Such a hypothetical individual might very much appreciate the opportunity to override a method on MapProjection that accepts an input in a compatible shape and uses the optimized code path when it sees that it can, falling back to the not-as-optimized version in all other cases.

This kind of thing is what I'm referring to when I talk about performance (which is why the comment was originally logged in GeoAPI since IMathTransform is an even better place for this)... basically giving people:

  1. something that's usable out-of-the-box and probably fast enough most of the time and
  2. the tools that they would need in order to selectively eliminate bottlenecks

@airbreather
Copy link
Member Author

we have the option (if we choose to optimize this path) to call out to a native DLL that implements the routine as efficiently as the platform allows

heh... it wouldn't even be just to optimize performance: I think all of the "bulk" methods I added can translate to efficient calls to PROJ.4's proj_trans_generic function if we wanted to make a P/Invoke wrapper around that to expand the list of projections we support 🤓.

@FObermaier
Copy link
Member

@airbreather, I've confirmed that your approach does not imply decrease of performance so I'm happy with persuing your approach.

Let's clean it up:

  • What do you think needs to be added to GeoAPI / GeoAPI.CoordinateSystems.

@airbreather
Copy link
Member Author

What do you think needs to be added to GeoAPI / GeoAPI.CoordinateSystems.

That depends on which of NetTopologySuite/GeoAPI#67 or NetTopologySuite/GeoAPI#68 (or neither) we choose to do for 2.0... my ideal would be along the lines of NetTopologySuite/GeoAPI#68: replace IMathTransform with an abstract base class that does the same thing I did for MathTransform in here, namely, one abstract "one-by-one" method and lots of virtual methods calling it.

@FObermaier
Copy link
Member

SequenceCoordinateConverterBase should be changed to SequenceTransformerBase so that it has access to the MathTransform object and can decide on its own which of the Transform overloads suits its data structure best.

@airbreather
Copy link
Member Author

airbreather commented Mar 11, 2019

SequenceCoordinateConverterBase should be changed to SequenceTransformerBase so that it has access to the MathTransform object and can decide on its own which of the Transform overloads suits its data structure best.

Ahh, yeah, I was thinking about that type too... my thought on it was that it should have something like (edit: "it" = the SequenceCoordinateConverterBase):

public virtual SomeNewEnum GetIdealLayout(ICoordinateSequence coordinateSequence)
{
    return SomeNewEnum.DoesntMatter;

    // DoesntMatter: we pick based on what's most efficient for us (or arbitrarily)
    // other defined values tell us which of the variants is most efficient for the sequence
}

And then expose methods to copy to/from the different layouts.

The idea being that it will work for all different possible sequences, but almost all sequences will naturally copy to/from a particular layout more efficiently than any other (because the coordinates are already in that order to begin with, so it will very often be possible to use the optimized Span<T> copy methods), and so we will want to know which of the different variants would be best for each sequence.

Edit edit: but yeah, you're right that it should also take into account the math transform object so that it can also consider which method(s) actually are optimized. Heh.

@FObermaier FObermaier merged commit 22d5cc8 into perf/Transform_with_Span Mar 11, 2019
@FObermaier FObermaier added this to Done in ProjNet V2 Mar 11, 2019
FObermaier pushed a commit that referenced this pull request Mar 11, 2019
* Tweak how we use spans.
- Base implementations now have virtual methods that can be overridden as appropriate to provide optimized versions for any of the following layouts:
    - [x0, y0, z0, x1, y1, z1, ..., xn, yn, zn]
    - [x0, y0, x1, y1, ..., xn, yn, z0, z1, ..., zn]
    - [x0, x1, ..., xn, y0, y1, ..., yn, z0, z1, ..., zn]

All built-in algorithms now only override the one-by-one variants.  The point of allowing subclasses to override for spans (and allowing different layouts) is to allow implementations that use SIMD and/or auxiliary DLLs for their implementations.  There ought not to be a **particularly** significant advantage to using spans the way we were doing before.

* Bleh, this was supposed to use the arrays, not the spans, to be a tiny bit cheaper on the crusty ol' .NET Framework with basically no extra development effort
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
ProjNet V2
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants