Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for 2D Geometry validation. #98

Open
voney opened this issue Jan 31, 2023 · 9 comments
Open

Support for 2D Geometry validation. #98

voney opened this issue Jan 31, 2023 · 9 comments

Comments

@voney
Copy link

voney commented Jan 31, 2023

Currently the library allows positions to be z,y or x,y,z with no way to force a Geometry to only be 2d.

I'd like to propse a change where the geometries come in 2 flavours, 2D and 3D to allow for better Schemas/API specs and validation of geometries that are only supposed to be 2D.

I had shimmed the library in my own projects to add in Point2D and Polygon2D, but I think this would be better suited in the base library itself.

Thaughts?

@eseglem
Copy link
Collaborator

eseglem commented Jan 31, 2023

Interesting idea. It could also help with handling mixed dimensionality which I was thinking about yesterday. But shimming may be the better way.

As far as GeoJSON goes, it states: "Altitude or elevation MAY be included as an optional third element." So it would not be invalid GeoJSON to have a Z included. Trying to enforce having or not having a third element would be outside of the spec. (And the same would go for mixed dimensionality.) So at a library level I would be a bit hesitant to do it. Input with a Z is valid GeoJSON but you are free to ignore it on your end.

My first thoughts on trying to implement it would be something like this.

Position2D = Tuple[float, float]
Position3D = Tuple[float, float, float]
Position = Union[Position2D, Position3D]

MultiPointCoords2D = conlist(Position2D , min_items=1)
MultiPointCoords3D = conlist(Position3D, min_items=1)
# This has mixed dimensionality like we currently do
MultiPointCoords = conlist(Position, min_items=1)
# Or this if there was a desire to not be mixed
MultiPointCoords = Union[MultiPointCoords2D, MultiPointCoords3D]

It really blows things up and that doesn't even get to geometry or features yet. It would probably be possible to write things as generics and make it easier to overwrite the built in Position types. That would end up multiple nested generics and I am not sure if it would be worth the added complexity to cover something outside the spec. Definitely curious what others think about it.

Alternative Idea: Has Z + Validator

If we add a has_z property which checks if any Position in the coordinates has a Z. (The function is also useful for WKT purposes.) That function could then be used in a validator. Something like this:

# Untested idea, not guaranteed to work as is
class Feature2D(Feature):
    # This would be a post validator, so it has already been turned into a `Geometry`
    @validator("geometry")
    def enforce_2d_geometry(cls, geometry: Geometry) -> Geometry:
        if geometry.has_z:
            raise ValueError # Whatever the best error / code is here. 
        return geometry

Unfortunately that would still end up saying Position can be 2 or 3 dimension in something like Swagger docs. Which may cause confusion with end users.

Alternative idea: Drop Z

Perhaps it would be useful to borrow from Shapely here. It has two functions force_2d and force_3d that allow for forcing it on an existing geometry. It doesn't necessarily achieve the same goal of validating that the input does not have a Z. But it would facilitate the "MAY" better by allowing the Z to be dropped.

@voney
Copy link
Author

voney commented Jan 31, 2023

From my reading of the spec the "MAY" clause means that it's optional, not in the sense that the field itself MUST be included as an optional value, but that it MAY be included.

Per RFC2119 that outlines the use of the word MAY in other RFCs:

  1. MAY This word, or the adjective "OPTIONAL", mean that an item is
    truly optional. One vendor may choose to include the item because a
    particular marketplace requires it or because the vendor feels that
    it enhances the product while another vendor may omit the same item.
    An implementation which does not include a particular option MUST be
    prepared to interoperate with another implementation which does
    include the option, though perhaps with reduced functionality. In the
    same vein an implementation which does include a particular option
    MUST be prepared to interoperate with another implementation which
    does not include the option (except, of course, for the feature the
    option provides.)

With the above in mind, and since the library is intended to be implemented by vendors in their own codebases I would suggest that having the flexibility to implement 2D or 3D versions of the schemas is in line with the spec.

Having said that, I ALSO think that your suggestions around having the ability to "drop z" and validate on "haz z" are good ideas as it would allow vendors to choose whether they "accept and drop" ("reduced functionality" above).

I guess this comes down to the two sides of API design, when producing an external API we get to decide whether it includes Z or not, but when consuming a third party API we have to gracefully handle the presence of the full spec including MAY clauses.

@eseglem
Copy link
Collaborator

eseglem commented Jan 31, 2023

The MAY clause does mean that it is optional. Just as you are free to ignore it, the other side is free to include it. The key part of that quote is:

An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality.

You MUST be prepared to interoperate with a system which does include Z. Preventing another party from including Z would not comply with the GeoJSON spec. The way to comply with it would be to accept 3D coordinates and then drop them.

@voney
Copy link
Author

voney commented Feb 1, 2023

That makes perfect sense. My point is that in situations where API specifications are being generated automatically (my use case) based on what the model allows there's no way to communicate that the system only supports 2D.

So when creating API's that are only 2D it makes sense to have the models actually represent that, which goes to the sentance after the one you specified:

In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)

So if the decision is to not support 2D models, how can we communicate that a 3D model will be reduced to 2D after processing?

@eseglem
Copy link
Collaborator

eseglem commented Feb 1, 2023

Yeah, it is not ideal. And I definitely understand wanting to restrict it more. I would probably want to do the same in that situation.

I don't know specifically what you are using, but you could override the Model and do Field(..., description="Any third dimension will be ignored."). Or assuming it is FastAPI you can use the docstring to explain at the endpoint level.

Since the other party cannot assume you are doing anything with the Z value, it seems reasonable to just state it. At least the attempt was made to communicate it.

The third dimension on GeoJSON feels pretty useless. Since it is optional, and people can do whatever they want, it is pretty much impossible to know what the other side means by it without additional discussions. Even big libraries like shapely will not input or output it, because geos under the hood ignores the third dimension.

@eseglem
Copy link
Collaborator

eseglem commented Feb 1, 2023

I ended up thinking about this a bunch over night. I still don't like the idea of restricting input to only 2D, as it can cause a bunch of interoperability issues. But I also don't think it is necessarily harmful to be able to specify that a systems output only contains 2D. Stating 'the GeoJSON I output will never have a 3rd dimension' is not against the spec, its just stating you won't ever use an optional parameter.

I am not really sure how to balance those two ideas. But I think it is probably is best (and easiest) to just keep everything as actual spec GeoJSON and just state it in the docs. Otherwise its adding extra types that may end up adding to confusion or causing other headaches.

I do have has_z written now. Just need to write some additional tests for it and can put in another PR.

@eseglem
Copy link
Collaborator

eseglem commented Jul 5, 2023

I don't know if its useful for this or not, but I believe I saw something about Pydantic 2.0 having separated Input and Output schemas. I think it was more for optional inputs which have defaults, so the output will always have a value there. But may be possible to utilize something like that to allow input of 3d, but only output 2d. Not sure the code would be very clean either, but may be worth exploring.

@vincentsarago
Copy link
Member

@eseglem do you think we should work on this for 1.0?

@eseglem
Copy link
Collaborator

eseglem commented Jul 11, 2023

@vincentsarago probably worth tinkering with but maybe not blocking on.

Would have to tinker with it to see if that actually works ok, but potentially: Generic on Position, and have a few class Position(RootModel):. One of which could have a validator that drops the 3rd dimension, and Input Schema allows 3d but Output Schema is 2d.

Would have to make all the coordinate types RootModel to get the Generic on them as well, adds a bit, but don't think it gets too crazy or hard to understand. Although I think that probably messes with mypy / pylance a bit since they are now classes and not just arrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants