Locality: spatial vs. point-radius #4259

Closed
dustymc opened this issue Jan 13, 2022 · 12 comments
Labels
Enhancement I think this would make Arctos even awesomer! Priority-Critical (Arctos is broken) Critical because it is breaking functionality.

Comments

@dustymc
Contributor

dustymc commented Jan 13, 2022

Is your feature request related to a problem? Please describe.

The locality model contains both point-radius and "footprint" data. It's not clear which takes precedence, and there's nothing to stop them from conflicting.

Describe what you're trying to accomplish

  1. Clarify what locality data are primary
  2. Do magic spatial stuff

Describe the solution you'd like

This is very, very open for discussion; I'm not sure where these data come from and may be completely lost.

  • Add locality.spatial_source IN (point-radius, spatial)
  • When spatial_source is point-radius, calculate spatial (overwriting anything that might already be in there) from coordinates+error
  • When spatial_source is spatial, calculate point-radius (overwriting anything that might already be in there) from locality footprint

That is, locality "coordinate" data could be given as either coordinates+error or as a polygon, but not both (as is possible - maybe required - now). There would be no possibility for ambiguity. Additional localities can be created and attached to relevant objects if both kinds of data are available.
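The proposed rule can be sketched roughly as follows. This is only an illustration in Python, not Arctos code: the `Locality` class, the `save_locality` function, and the two converter callables are hypothetical names, and rejecting spatial data that arrives without a `spatial_source` is an assumption of the sketch rather than something decided in this issue.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Ring = List[Tuple[float, float]]      # polygon vertices as (lon, lat)
Circle = Tuple[float, float, float]   # (lat, lon, error radius in metres)


@dataclass
class Locality:
    # Hypothetical stand-in for a locality row.
    spatial_source: Optional[str] = None  # None, "point-radius" or "spatial"
    circle: Optional[Circle] = None       # coordinates + error
    footprint: Optional[Ring] = None      # polygon


def save_locality(
    loc: Locality,
    circle_to_ring: Callable[[Circle], Ring],
    ring_to_circle: Callable[[Ring], Circle],
) -> Locality:
    """Regenerate the secondary representation from the primary one."""
    if loc.spatial_source is None:
        # Assumption: with no declared source, no spatial data is allowed.
        if loc.circle is not None or loc.footprint is not None:
            raise ValueError("spatial data supplied without spatial_source")
    elif loc.spatial_source == "point-radius":
        if loc.circle is None:
            raise ValueError("point-radius locality needs coordinates and error")
        # Overwrite whatever footprint might already be there.
        loc.footprint = circle_to_ring(loc.circle)
    elif loc.spatial_source == "spatial":
        if loc.footprint is None:
            raise ValueError("spatial locality needs a footprint")
        # Overwrite whatever point-radius might already be there.
        loc.circle = ring_to_circle(loc.footprint)
    else:
        raise ValueError(f"unknown spatial_source: {loc.spatial_source!r}")
    return loc
```

Because the secondary value is always overwritten on save, the two representations cannot drift apart; whichever one `spatial_source` names is the only one a user can actually edit.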

This will result in all localities with "coordinates" having both coordinates and spatial data, which will allow queries involving

  • coordinates
  • error
  • various forms of interaction with other spatial data (eg, intersects geography, is contained by geography)

Describe alternatives you've considered

I don't think there are viable alternatives to sorting this out; we finally have tools, this can't be ignored.

Functionality is completely open - this could result in very slightly better documentation, deep spatial abilities in Arctos, or anything in between.

It might be possible to calculate on demand rather than storing, but I suspect we don't have the CPU to fully support that.

Some sort of cache might be possible, but I think both kinds of spatial data are "core" so that seems like unnecessary complexity.

Priority

High - I want to implement all of the cool spatial stuff that's been on the back burner forever!

@dustymc dustymc added the Enhancement I think this would make Arctos even awesomer! label Jan 13, 2022
@dustymc dustymc added this to the Needs Discussion milestone Jan 13, 2022
@dustymc
Contributor Author

dustymc commented Jan 19, 2022

How can we prioritize/resolve this?

  • geolocate's polygon data does horrible things to coordinates in the UI
  • there's a bunch of complexity maintaining fake_coordinate_error, which at best only sorta works
  • the relatively few localities with spatial data (from fishbase IIRC) seem to have done about what I've proposed and provided a point-radius that encloses the polygon; it would be very good to get in front of this problem while it's still mostly simple.

@dustymc
Contributor Author

dustymc commented Feb 24, 2022

AWG: seems reasonable, go.

Name: primary_spatial_data IN (point-radius, polygon)

@dustymc dustymc modified the milestones: Next AWG Meeting, Next Task Feb 24, 2022
@dustymc dustymc modified the milestones: Next Task, Active Development Mar 9, 2022
@dustymc
Contributor Author

dustymc commented Mar 10, 2022

I am setting this up so that a point-radius value is generated when a polygon is saved, and a polygon is generated when a point-radius is saved. Circle-polygon "resolution" is defined by specifying a number of segments per quarter-circle. Large values are more circular but take more disk space. Small values are nicer to store, back up, export to DWC, download as text, etc., etc., but also less circular/accurate. I'm not sure how to balance that.
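To make the segments-per-quarter-circle tradeoff concrete, here is a rough Python sketch of turning a point-radius into a closed ring. It is not the database code: `circle_to_ring` and `quad_segs` are hypothetical names, and the sketch assumes a spherical Earth and the standard great-circle destination formula, which will differ slightly from what PostGIS produces on a spheroid.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def circle_to_ring(
    lat: float, lon: float, radius_m: float, quad_segs: int = 8
) -> List[Tuple[float, float]]:
    """Approximate a point-radius as a closed ring of (lon, lat) vertices.

    quad_segs is the number of straight segments per quarter circle, so the
    ring has 4 * quad_segs distinct vertices plus one repeated closing vertex.
    Longitudes are not normalised, so circles crossing the antimeridian
    would need extra handling.
    """
    if quad_segs < 1:
        raise ValueError("quad_segs must be at least 1")
    n = 4 * quad_segs
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    d = radius_m / EARTH_RADIUS_M  # angular distance in radians
    ring = []
    for i in range(n):
        bearing = 2 * math.pi * i / n  # 0 = north, increasing clockwise
        lat2 = math.asin(
            math.sin(lat1) * math.cos(d)
            + math.cos(lat1) * math.sin(d) * math.cos(bearing)
        )
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2),
        )
        ring.append((math.degrees(lon2), math.degrees(lat2)))
    ring.append(ring[0])  # close the ring
    return ring
```

With this layout a 4-segment circle has 17 vertices and an 8-segment circle has 33, which is consistent with the roughly doubled storage reported in this comment.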

A 4-segment circle looks like this:

Screen Shot 2022-03-10 at 10 51 21 AM

An 8-segment circle (PostGIS's default) takes about twice the disk space:

Screen Shot 2022-03-10 at 10 52 30 AM

VERY roughly, 4-segment circles would use about 25MB of disk and 8-segment would require around 50MB with current data. The current table (without calculated polygons) is about 400MB.

The shapes are the same with any size error radius, but more noticeable with large errors where a user is likely to zoom past the error boundary.

If nobody has opinions before this goes to production (sometime next week, maybe) I'll go with the default of 8 segments.

Help!

@mkoo

@sharpphyl

@dusty Want to make sure I understand all this as creating polygons is something we've wanted to do to deal with marine offshore localities. We've only created a few and don't fully understand how it works.

I created a polygon in DMNS:Inv:26912 to describe locality 10897677, which was in your list of temp_lotsa_err as having an exorbitant error radius. It was an attempt to describe the Peru-Chile trench https://en.wikipedia.org/wiki/Peru–Chile_Trench which is 5,900 km long.

The first attempt used a point/error radius approach resulting in an error radius the size of South America. So I drew a polygon (with more than 8 segments) which still shows the point lat/long in the record implying we know where in the trench the specimen was found. Is this what you're referring to in this thread?

Screen Shot 2022-03-12 at 6 44 48 AM

I am setting this up so that a point-radius value is generated when a polygon is saved, and a polygon is generated when a point-radius is saved.

Does that mean that you'll create a circle around this polygon to replace the many segments?

Is it possible to get rid of the point coordinates in the record and just have the spatial description? And is this moving toward a solution for offshore marine localities (which generate hundreds of annotations)? And does the polygon I created violate your limit on segments?

Or do we stop geolocating records like these? Or should it be a feature and can it have a (more precise) polygon as a feature? Should I convert these areas to WKTs (which I don't know how to do but Wikipedia says "For example, PostGIS contains functions that can convert geometries to and from a WKT representation, making them human readable.") Is it something we can do?

Showing the original attempt as "As entered coordinates" implies that we had coordinates to enter into the record which is not true. How can I get rid of the history in the public record? (New issue?)

I know I'm mucking up your focused issue here, but how to handle marine locations has been a problem for years. Does this issue/feature offer a solution we should be using?

@dustymc
Contributor Author

dustymc commented Mar 12, 2022

don't fully understand how it works

This should fix that! Basically just open geolocate, draw a polygon, save, save - you can try it out at test.

exorbitant error radius.

Thanks, I'll add that to the other. I don't have any problem with any kind of actual data; I'm just trying to prevent nonsense.

with more than 8 segments

That's data, feel free to use all the segments you need. The ones I'm drawing are metadata approximations of point-radius data which allow me to access tools; they can be "rougher" because they can be re-created from the point-radius (actual data) at any time.

you'll create a circle around this polygon

Yes, but I'm not replacing anything, just creating a point-radius approximation for tools that can't use the spatial data. The core of this issue is keeping track of which one is primary, which lets me freely mess with the secondary.

replace the many segments?

Again, those are your data and you can use all you need.

rid of the point coordinates in the record and just have the spatial description?

As data - sure, I'll add them back, but you don't have to pay any attention to them, they're just for the primitives without cool tools.

moving toward a solution for offshore marine localities

I certainly hope so.

generate hundreds of annotations

See #3530. I have tools, and this is making the data so I can use them, but that's not in the next release.

does the polygon I created violate your limit on segments?

I just want to be very clear on this: The limit is for me, not you. If you need a terabyte to describe some place I'll try to accommodate. (Part of how I do so is by not using any data I don't need to for myself, hence the segments question.)

stop geolocating records like these?

Not at all, you just have more shapes than the one (circle) available; you don't need to suggest alpine clams to include some complex coast anymore. (I will, but my huge circle will be accompanied by primary_spatial_data signifying to anyone with spatial tools to ignore my version; they don't need it.)

Should I convert these areas to WKTs

I can convert, just open an issue if some tool is missing.

to and from a WKT representation, making them human readable

I suppose that's technically true, but I'm skipping a step and converting to shapes on a map.

Showing the original attempt as "As entered coordinates" implies that we had coordinates to enter into the record which is not true. How can I get rid of the history in the public record? (New issue?)

Yea, new issue - that's event, not locality, and beyond this.

Does this issue/feature offer a solution we should be using?

I don't think it changes anything for you. You can still assert some geography (which we generally don't seem to have), or my offer to do that for you could be resurrected. How I choose appropriate geography just got more complicated - I can't justify ignoring everything but the point now - but that's my problem.

@sharpphyl

This should fix that! Basically just open geolocate, draw a polygon, save, save - you can try it out at test.

Woohoo! Much better. Do you have a date for the next release?

It looks like GBIF copies the polygon but gives the error Footprint WKT invalid. Should we ignore that?

Screen Shot 2022-03-12 at 9 52 38 AM

@dustymc
Contributor Author

dustymc commented Mar 12, 2022

It keeps asking for coordinates.

Are you sure that's test? You can select primary_spatial_data, or geolocate will do it for you. There are three possible states:

No coordinate data:
Screen Shot 2022-03-12 at 8 56 31 AM

Point-radius:
Screen Shot 2022-03-12 at 8 56 41 AM

or polygon:
Screen Shot 2022-03-12 at 8 56 50 AM

Do I need both the polygon and coordinates?

You cannot - you do one, I'll fill in the other when you save.

date for the next release?

Mid/early next week, probably/hopefully.

GBIF..Should we ignore that?

For now - yes. Turns out WKT is super flaky: lots of legacy data were problematic to convert, or converted but can't be fully used because they have some wonky "loops around itself" feature or similar. This should (slowly, probably) fix that - as always, exposing data to new tools makes it better.

@sharpphyl

sharpphyl commented Mar 12, 2022

Are you sure that's test? You can select primary_spatial_data, or geolocate will do it for you

Yes, I wasn't in test and tried to delete my comment but you beat me to it. It works great in test

Additional explanation very helpful.

@sharpphyl

So here's what I get in test when I changed a point-radius to a polygon for http://test.arctos.database.museum/guid/DMNS:Inv:16436. (You do have to start from scratch rather than modifying the coordinates.)

Screen Shot 2022-03-12 at 1 41 15 PM

It looks like geolocate assigns the midpoint as the stated coordinates and the end points as the error radius. It no longer says that they are verbatim coordinates which is helpful for us. Just want to be certain I'm using this correctly as we will probably make a lot of them polygons.

@dustymc
Contributor Author

dustymc commented Mar 12, 2022

You do have to start from scratch rather than modifying the coordinates.

Yes, I hope I'll figure out modifying polygons at some point, but I don't have that yet.

geolocate assigns

Nope, geolocate is just building the polygon. When it saves a trigger builds whatever's missing (and converts the weird thing from geolocate into geography datatype), and there are two important considerations there:

  1. The stuff I add is not "data," think of it like a finding aid or approximation for tools that can't use the original data - that's the distinction the new concept makes, and
  2. Since it's generated, it can be regenerated at any time. I'm currently grabbing the centroid of the minimumboundingcircle around the polygon, then grabbing the minimum radius of that to build my circle. There's a bunch of flipflopping back and forth across geometry and geography to get the units I need and etc., and maybe a real GIS-ologist would have my head for it, but it results in something that fits in the point-radius model and looks about right on a map. If someone comes up with a better idea I'll just steal it and regenerate everything, no big deal.
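For anyone curious what "minimum bounding circle around the polygon" means in practice, here is a small planar sketch in Python. It is not the trigger described above: it assumes the vertices are already in a projected coordinate system (for example metres), skips the geometry/geography conversions entirely, and uses a brute-force search that is only sensible for polygons with a modest number of vertices. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (centre x, centre y, radius)


def _circle_from_two(a: Point, b: Point) -> Circle:
    # Smallest circle through two points: the segment a-b is its diameter.
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)


def _circle_from_three(a: Point, b: Point, c: Point) -> Optional[Circle]:
    # Circumcircle of a triangle; None when the points are collinear.
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2 = a[0] ** 2 + a[1] ** 2
    b2 = b[0] ** 2 + b[1] ** 2
    c2 = c[0] ** 2 + c[1] ** 2
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def _contains_all(circle: Circle, pts: Iterable[Point]) -> bool:
    cx, cy, r = circle
    limit = r * (1 + 1e-9) + 1e-9  # small tolerance for rounding error
    return all(math.dist((cx, cy), p) <= limit for p in pts)


def minimum_bounding_circle(points: Iterable[Point]) -> Circle:
    """Smallest circle enclosing all points (planar, brute force)."""
    pts = list(dict.fromkeys(points))  # drop duplicates such as the closing vertex
    if not pts:
        raise ValueError("at least one point is required")
    if len(pts) == 1:
        return (pts[0][0], pts[0][1], 0.0)
    candidates = [_circle_from_two(a, b) for a, b in combinations(pts, 2)]
    for a, b, c in combinations(pts, 3):
        circle = _circle_from_three(a, b, c)
        if circle is not None:
            candidates.append(circle)
    # The true minimum circle is always defined by two or three of the points.
    enclosing = [c for c in candidates if _contains_all(c, pts)]
    return min(enclosing, key=lambda c: c[2])
```

The centre and radius returned here play the role of the generated coordinates and error radius: a disposable approximation that can be recomputed from the polygon whenever a better method comes along.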

no longer says that they are verbatim coordinates

Right, there's no need because I'm not doing anything fancy with data - I'm just converting to the right datatype (and making up some other disposable-ish stuff because it's useful to do so).

certain I'm using this correctly

Looks reasonable to me. Even if what you drew isn't a perfect representation of what you were given, it seems a heck of a lot more precise than the giant circle we've been mostly limited to - you're no longer asserting that the thing might be from a half-mile inland nor a half-mile offshore.

Note also that there's a new map border color - this one is orange because there's no geography spatial data. You might want to file an Issue (or not, I can find those from the geography data) - I don't know what to do about it right now, but hopefully at some point we'll find a way to get those data.

Screen Shot 2022-03-12 at 1 48 15 PM

@sharpphyl

Big, big step forward for us. Yes, much better than the big circles

this one is orange because there's no geography spatial data.

By "geography spatial data" do you mean a WKT or similar "official" area? Just trying to learn the lingo.

@dustymc
Contributor Author

dustymc commented Mar 14, 2022

By "geography spatial data" do you mean

Just a spatial/polygon/mappable representation of the geography record. (WKT is a format of spatial data. It's no longer what we store, but we can convert to and from it.)

"official"

Just useful.
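As a rough illustration of what "convert to and from" WKT involves, here is a minimal Python sketch for the simplest case, a single-ring polygon. The function names are hypothetical, it handles no holes, multipolygons, or SRID prefixes, and it does no validity checking beyond ring closure, so it is not a substitute for the database's own conversion.

```python
import re
from typing import List, Sequence, Tuple

Ring = List[Tuple[float, float]]  # (x, y) pairs, i.e. (lon, lat) in WKT order


def ring_to_wkt(ring: Sequence[Tuple[float, float]]) -> str:
    """Write a single ring as POLYGON WKT, closing it if necessary."""
    pts = list(ring)
    if len(pts) < 3:
        raise ValueError("a polygon ring needs at least three vertices")
    if pts[0] != pts[-1]:
        pts.append(pts[0])
    coords = ", ".join(f"{float(x)!r} {float(y)!r}" for x, y in pts)
    return f"POLYGON(({coords}))"


def wkt_to_ring(wkt: str) -> Ring:
    """Parse single-ring POLYGON WKT back into a list of (x, y) tuples."""
    match = re.fullmatch(
        r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("only single-ring POLYGON WKT is handled in this sketch")
    ring: Ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # raises ValueError on malformed coordinates
        ring.append((float(x), float(y)))
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring must be closed and have at least four vertices")
    return ring
```

A round trip such as `wkt_to_ring(ring_to_wkt(ring))` returns the closed ring, which is the property that matters when WKT is used only as an exchange format rather than as the stored datatype.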

@dustymc dustymc removed this from the Active Development milestone Mar 14, 2022