Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WKT-->spatial data conversion problems #4183

Closed
dustymc opened this issue Dec 10, 2021 · 13 comments
Closed

WKT-->spatial data conversion problems #4183

dustymc opened this issue Dec 10, 2021 · 13 comments
Assignees
Labels
Function-Locality/Event/Georeferencing Priority-Normal (Not urgent) Normal because this needs to get done but not immediately.

Comments

@dustymc
Copy link
Contributor

dustymc commented Dec 10, 2021

From @mkoo

Actually if we could move away from WKT's I'd be super happy. It's great for storing, say, a SQUARE but it gets challenging with coastlines and anything more complicated than an octagon (probably). I just know it's extra steps to convert from GIS data PLUS it stores no metadata! (that's the killer when comparing to a KML or geoJSON)

WKT is also a pain for me and now we (will soon, I hope...) have spatial tools, so

  • add polygon(geography) to geog_auth_rec and locality, move the WKT data in, lose the media_ids
  • convert to WKT or other formats as necessary/useful.

I think the biggest drawback is backups - this will make them larger. Seems a price well worth paying.

This will probably mean it's not possible to use media IDs in the catalog record bulkloader, but I don't think anyone ever has and it's possible (and introduces much less chance for confusion) to pre-create localities and use them.

Originally posted by @dustymc in #2660 (comment)

Splitting this off to a dedicated issue.

I played with this a tiny bit, a fair number of WKTs aren't valid for one reason or another, Google seems to ignore that, PGIS rightly complains. Another compelling reason to bring them into a GIS system rather than something text-based.

One possibility for backups and passing text data around would be to store the spatial data in a dedicated table and reference it; needs more investigation, input most welcome.

@dustymc dustymc added this to the Active Development milestone Dec 10, 2021
@dustymc
Copy link
Contributor Author

dustymc commented Dec 10, 2021

@mkoo it's actually that MOST WKT (I'm only looking at geography right now) isn't valid. https://arctos.database.museum/media/10598522 ferexample starts/ends with

-101.09635 32.96742
-101.09691 32.9674

and it turns out "close enough" isn't close enough for PGIS. (I can't find the "hand grenade mode" setting....)

What would be required to regenerate those data (as postgis geography datatype)?

If that isn't practical, I could probably - at least for straight POLYGON data - fake it, and just make a straight line from wherever the shape ends back to where it begins (which might be a step left out of some conversion, or it might just be complete garbage).

Starting over may or may not be less-evil than making up the last segment, and at this point I have no idea if the last segment is the entirety of the problem (for 3,167 records in test) or just the beginning.

@dustymc
Copy link
Contributor Author

dustymc commented Jan 12, 2022

@mkoo ??

#3530 is stuck behind this, I guess I'll start trying to fake it??

@dustymc
Copy link
Contributor Author

dustymc commented Jan 12, 2022

The spatial (datatype geography) data for 3365 geography records takes about 26MB of disk. That seems workable given the benefits (and necessary given the apparent slop in WKT data).

I was able to fake valid WKT in ways that aren't visually obvious to me for most geography WKTs; unless someone stops me now, I'm going to proceed with this and we can sort out whatever falls through the cracks later. In test, that's these:


                  status                  | geog_auth_rec_id |                               higher_geog                               
------------------------------------------+------------------+-------------------------------------------------------------------------
 parse error - invalid geometry           |          1001680 | North America, United States, Colorado, Denver County
 somethingweird                           |          1005405 | North America, United States, Virginia, Prince William County
 parse error - invalid geometry           |          1001969 | Pacific Ocean, United States, Hawaii, Kauai County, Hawaiian Islands
 somethingweird                           |          1004255 | North America, United States, Virginia, Fairfax County
 somethingweird                           |          1001667 | North America, United States, Colorado, Arapahoe County
 somethingweird                           |          1001671 | North America, United States, Colorado, Boulder County
 can not mix dimensionality in a geometry |             1096 | North America, United States, South Carolina, Aiken County
 somethingweird                           |         10004747 | North America, United States, Colorado, Broomfield County
 parse error - invalid geometry           |         10003572 | Pacific Ocean, United States, Hawaii, Hawaiian Islands
 parse error - invalid geometry           |         10003582 | Pacific Ocean, United States, Hawaii, Honolulu County, Hawaiian Islands
 parse error - invalid geometry           |         10003586 | Pacific Ocean, United States, Hawaii, Maui County, Hawaiian Islands

I'm not yet sure how loading geography data is going to work, if someone has any of those in whatever format and wants to pass it on I can experiment with that.

Also: who knew Kentucky has a weird thing?

@mkoo
Copy link
Member

mkoo commented Jan 13, 2022

i was offline/out of town so ignoring GH issues.... all the WKTs can be easily regenerated into other formats (no guarantees about speediness however we batch these since I havent thought out the process) so not a probably.

So what's the preferred format?

Anything useful in our Berkeleymapper repo? https://github.com/BNHM/spatial-layers
Mostly JSON stuff but also KMLs. maybe play around with these?

eg.
Federal National Forest lands in CA- as geojson

CONUS 7.5 minute topoquad index as KML

CA Ecoregions as KML

All the UC Natural Reserves as KML

@mkoo mkoo added Function-Locality/Event/Georeferencing Priority-Normal (Not urgent) Normal because this needs to get done but not immediately. Type-Data/Database labels Jan 13, 2022
@mkoo
Copy link
Member

mkoo commented Jan 13, 2022

not sure if that's the only weird thing about Kentucky...

@mkoo mkoo assigned mkoo and dustymc Jan 13, 2022
@dustymc
Copy link
Contributor Author

dustymc commented Jan 13, 2022

what's the preferred format?

You tell me, I'm new at this!

I'm storing as geography but it's trivial and fast to convert to other stuff - if the source is valid, and just about everything that's ever been through many WKT-based tools isn't....

MOST everything in test converted with just a little script-hammer-beating-magic, so I'm rebuilding UIs to use postgis. I'll make a list of the failures when this gets to production and we can deal with them after the fact. Maybe we'll want to update other stuff to use data with less jagged edges, IDK, but I think I have enough data to get the system in place without losing anything so I'm gonna rock on.

The locality failures (very few in test) will be bigger problems, they are "data" where geog is more metadata.

I'll hopefully have a UI this week, then we can figure out how to feed it better data.

@dustymc
Copy link
Contributor Author

dustymc commented Jan 19, 2022

These geography records will lose their (invalid) spatial data in production.


select higher_geog,temp_geo.geog_auth_rec_id,temp_geo.wkt_media_id,status 
from temp_geo 
left outer join geog_auth_rec on geog_auth_rec.geog_auth_rec_id=temp_geo.geog_auth_rec_id
where temp_geo.spatial_footprint is null;
                               higher_geog                               | geog_auth_rec_id | wkt_media_id |                  status                  
-------------------------------------------------------------------------+------------------+--------------+------------------------------------------
 North America, United States, Colorado, Denver County                   |          1001680 |     10596531 | parse error - invalid geometry
 North America, United States, Virginia, Prince William County           |          1005405 |     10599581 | somethingweird
 North America, United States, Tennessee, Roane County                   |          1003920 |     10596326 | parse error - invalid geometry
 Pacific Ocean, United States, Hawaii, Kauai County, Hawaiian Islands    |          1001969 |     10599492 | parse error - invalid geometry
 North America, United States, Virginia, Fairfax County                  |          1004255 |     10596710 | somethingweird
 North America, United States, Colorado, Arapahoe County                 |          1001667 |     10597141 | somethingweird
 North America, United States, Alabama, Marshall County                  |          1005939 |     10598997 | parse error - invalid geometry
 North America, United States, Virginia, Roanoke County                  |         10003235 |     10597413 | parse error - invalid geometry
 North America, United States, Virginia, Rockingham County               |         10003237 |     10597414 | parse error - invalid geometry
 North America, United States, Virginia, Washington County               |         10003246 |     10597418 | parse error - invalid geometry
 North America, United States, Virginia, Wise County                     |         10003247 |     10597419 | parse error - invalid geometry
 North America, United States, Virginia, Albemarle County                |         10003197 |     10599178 | parse error - invalid geometry
 North America, United States, Virginia, Augusta County                  |         10003200 |     10597488 | parse error - invalid geometry
 North America, United States, Virginia, Bedford County                  |         10003201 |     10597489 | parse error - invalid geometry
 Pacific Ocean, United States, Hawaii, Hawaiian Islands                  |         10003572 |     10597319 | parse error - invalid geometry
 Pacific Ocean, United States, Hawaii, Honolulu County, Hawaiian Islands |         10003582 |     10597320 | parse error - invalid geometry
 Pacific Ocean, United States, Hawaii, Maui County, Hawaiian Islands     |         10003586 |     10597321 | parse error - invalid geometry
 North America, United States, Virginia, Rockbridge County               |          1004304 |     10597335 | parse error - invalid geometry
 North America, United States, Virginia, Frederick County                |         10006019 |     10597996 | parse error - invalid geometry
 North America, United States, Alaska, Haines Borough                    |         10005340 |     10598806 | parse error - invalid geometry
 Asia, Singapore, Central Singapore Community Development Council        |         10016019 |     10612764 | 
 North America, United States, Tennessee, Cheatham County                |         10005924 |     10598793 | parse error - invalid geometry
 North America, United States, South Carolina, Aiken County              |             1096 |     10597180 | can not mix dimensionality in a geometry
 North America, United States, Tennessee, Monroe County                  |          1006061 |     10596436 | parse error - invalid geometry
 North America, United States, New Jersey, Hudson County                 |          1003339 |     10596394 | parse error - invalid geometry
 North America, United States, Colorado, Boulder County                  |          1001671 |     10599065 | somethingweird
 North America, United States, Colorado, Broomfield County               |         10004747 |     10596245 | somethingweird
 North America, United States, Georgia, Pike County                      |         10005296 |     10597955 | parse error - invalid geometry
 North America, United States, Virginia, Alleghany County                |         10005996 |     10599561 | parse error - invalid geometry
 North America, United States, Virginia, Greensville County              |         10006029 |     10598000 | parse error - invalid geometry
 North America, United States, Georgia, Schley County                    |         10005300 |     10599457 | parse error - invalid geometry
 North America, United States, Georgia, Terrell County                   |         10005304 |     10599117 | parse error - invalid geometry
 North America, United States, Virginia, Henry County                    |         10006174 |     10598413 | parse error - invalid geometry
 North America, United States, Maine, Sagadahoc County                   |         10005150 |     10598450 | parse error - invalid geometry
(34 rows)

@dustymc
Copy link
Contributor Author

dustymc commented Jan 19, 2022

These localities will lose their not-WKT spatial data in production.

temp_geo_loc_rejects.csv.zip

@dustymc
Copy link
Contributor Author

dustymc commented Jan 19, 2022

This is done, except the conversion failures above. Leaving it open in case someone wants to investigate further.

@mkoo
Copy link
Member

mkoo commented Jan 19, 2022 via email

@mkoo
Copy link
Member

mkoo commented Jan 19, 2022

Maybe we should try as geojson? or other format?
If I want to add a spatial footprint, how would I do that now (add media object then associate to HG? basically the same as wkt) But is it a media object still?

@dustymc
Copy link
Contributor Author

dustymc commented Jan 19, 2022

geojson

The docs say that should work. I definitely don't know enough about this to say much of anything, other than I'm willing to experiment with anything I can convert to geography.

add a spatial footprint, how would I do that now

You (mostly) can't, see #4263

is it a media object

No, it's columns of datatype geography.



arcroot@arctos>> \d geog_auth_rec
                                      Table "core.geog_auth_rec"
        Column         |          Type           | Collation | Nullable |           Default            
-----------------------+-------------------------+-----------+----------+------------------------------
 geog_auth_rec_id      | integer                 |           | not null | 
 continent_ocean       | character varying(50)   |           |          | 
 country               | character varying(50)   |           |          | 
 state_prov            | character varying(75)   |           |          | 
 county                | character varying(50)   |           |          | 
 quad                  | character varying(60)   |           |          | 
 ...
 spatial_footprint     | geography               |           |          | 



arcroot@arctos>> \d locality
                                Table "core.locality"
        Column         |            Type             | Collation | Nullable | Default 
-----------------------+-----------------------------+-----------+----------+---------
 locality_id           | integer                     |           | not null | 
 geog_auth_rec_id      | integer                     |           | not null | 
 spec_locality         | character varying(255)      |           |          | 
 dec_lat               | numeric(12,10)              |           |          | 
 dec_long              | numeric(13,10)              |           |          | 
 minimum_elevation     | double precision            |           |          | 
 maximum_elevation     | double precision            |           |          | 
 orig_elev_units       | character varying(30)       |           |          | 
 min_depth             | double precision            |           |          | 
 max_depth             | double precision            |           |          | 
 depth_units           | character varying(30)       |           |          | 
...
 locality_footprint    | geography                   |           |          | 

@dustymc dustymc changed the title WKT-->geography WKT-->spatial data conversion problems Jan 24, 2022
@dustymc
Copy link
Contributor Author

dustymc commented Feb 15, 2022

Anyone who was going to act probably has by now, tentatively closing.

@dustymc dustymc closed this as completed Feb 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Function-Locality/Event/Georeferencing Priority-Normal (Not urgent) Normal because this needs to get done but not immediately.
Projects
None yet
Development

No branches or pull requests

2 participants