Add support for uncertain date formats (ISO 8601-2:2019) #15

Open
jeffreyameyer opened this issue May 27, 2020 · 7 comments

Comments

@jeffreyameyer
Member

What's your idea for a cool feature that would help you use OHM better?
It would be great to have support for uncertain dates and time ranges.

This could accommodate vague or unknown start times, approximate dates, etc.

More details can be found at: https://www.iso.org/standard/70908.html

This feature will have significant impacts across OHM, including data storage, rendering rules, and time slider behavior.

Current workarounds
None - the site currently requires specific, unqualified dates in the old 8601 format.

@danrademacher
Member

Over in #146 (comment), @hroest asked whether https://github.com/OpenHistoricalMap/DateFunctions-plpgsql is the place to handle uncertain dates.

I think the general answer is "yes" but there's a lot more to say.

The functions over there convert an ISO date to a "decimal date" - a year plus a fractional decimal that allows us to reliably sort and filter items with the timeslider.
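
As a rough illustration of the "decimal date" idea, here is a minimal Python sketch. It is not the code in DateFunctions-plpgsql; the function name `decimal_date` and the convention used (the fraction is the share of the year elapsed at the start of the given day) are assumptions made for this example, and the real functions may round or offset differently.

```python
import calendar
from datetime import date


def decimal_date(iso: str) -> float:
    """Convert an ISO 'YYYY-MM-DD' date to year + fraction of the year elapsed.

    Hypothetical helper: the convention (start of day, proleptic Gregorian
    calendar, years 1..9999 only) is an assumption of this sketch.
    """
    d = date.fromisoformat(iso)
    days_in_year = 366 if calendar.isleap(d.year) else 365
    # Days elapsed since January 1 of the same year (0 for January 1 itself).
    elapsed = (d - date(d.year, 1, 1)).days
    return d.year + elapsed / days_in_year


print(decimal_date("1950-01-01"))
print(decimal_date("1959-12-31"))
```

Because every date collapses to one number, sorting and range filtering become simple numeric comparisons, which is what the timeslider relies on.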

We could in theory have methods to convert other date formats to "decimal date", but right now in all cases the translation would have to result in a single decimal number.

So 1950s would have to be reduced to a single value such as 1950. That could cause problems: as a start_date, 1950s should be 1950, but as an end_date it should be 1959.99999.

1955..1960 would need to be either 1955 or 1960 or, who knows, maybe 1957.5.

I think each of these cases is solvable in code, but needs some clarity on what we want to do.

@danrademacher
Member

danrademacher commented Sep 18, 2020

handling date ranges

Based on discussion in Discord with @padiwik, we have a feasible way forward here to at least manage these in the near term:

as a first approximation, i think it's ok if the server returns the latest possible start date, i.e. when we are sure the item in question exists

So given an input like DATE..DATE then we would:

  1. Split on ..
  2. If start_date, take the second item, so the feature is shown only once we have the highest confidence that it exists.
  3. If end_date, take the first item, to be conservative about when it's gone.
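
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `resolve_range` is hypothetical, the input is assumed to be either a single date or a closed `DATE..DATE` range, and open-ended ranges such as `..1960` are rejected rather than interpreted.

```python
def resolve_range(value: str, key: str) -> str:
    """Pick a single date string out of a 'DATE..DATE' range.

    Hypothetical helper following the rule proposed in this thread:
    start_date takes the later bound, end_date takes the earlier bound.
    """
    if key not in ("start_date", "end_date"):
        raise ValueError(f"unsupported key: {key!r}")
    if ".." not in value:
        # Not a range: pass the date through unchanged.
        return value
    first, second = value.split("..", 1)
    if not first or not second:
        # Open-ended ranges are outside the scope of this sketch.
        raise ValueError(f"open-ended range not handled: {value!r}")
    return second if key == "start_date" else first


print(resolve_range("1955..1960", "start_date"))  # 1960
print(resolve_range("1955..1960", "end_date"))    # 1955
```

The selected string would then go through the existing ISO-to-decimal conversion, so the tiles still carry a single number per key.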

handling decades

For decades written as YYYYs (for example, 1950s) we would:

  1. Drop the s
  2. If start_date, assume the first day of the decade (1950-01-01) and convert to decimal date
  3. If end_date, assume the last day of the decade (1959-12-31) and convert to decimal date
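
A minimal Python sketch of this decade rule, for illustration only. The name `resolve_decade` and the strict pattern (four digits ending in 0, followed by `s`) are assumptions of this example, not an agreed format.

```python
import re
from datetime import date

# Assumed notation from this thread: a year ending in 0 followed by "s".
DECADE = re.compile(r"^(\d{3}0)s$")


def resolve_decade(value: str, key: str) -> date:
    """Map 'YYYYs' to the first day (start_date) or last day (end_date) of the decade."""
    match = DECADE.match(value)
    if match is None:
        raise ValueError(f"not a decade: {value!r}")
    year = int(match.group(1))  # dropping the trailing "s"
    if key == "start_date":
        return date(year, 1, 1)
    if key == "end_date":
        return date(year + 9, 12, 31)
    raise ValueError(f"unsupported key: {key!r}")


print(resolve_decade("1950s", "start_date"))  # 1950-01-01
print(resolve_decade("1950s", "end_date"))    # 1959-12-31
```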

@padiwik

padiwik commented Sep 18, 2020

why is your suggested behavior for decades distinct from other ranges?

@danrademacher
Member

Ah, maybe you'd prefer the more conservative approach of

  • If start_date assume 1959-12-31 and convert to decimal date
  • If end_date assume 1950-01-01 and convert to decimal date
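
To make the difference between the two policies concrete, here is a small Python sketch that selects a decade bound under either one. The function name `decade_bound` and the `conservative` flag are hypothetical and exist only for this comparison.

```python
from datetime import date


def decade_bound(decade_start_year: int, key: str, conservative: bool) -> date:
    """Return the date assumed for a decade under one of the two policies.

    conservative=True: show the feature only while it certainly exists
    (latest start, earliest end). conservative=False: show it for the
    whole decade (earliest start, latest end).
    """
    if key not in ("start_date", "end_date"):
        raise ValueError(f"unsupported key: {key!r}")
    first = date(decade_start_year, 1, 1)
    last = date(decade_start_year + 9, 12, 31)
    if conservative:
        return last if key == "start_date" else first
    return first if key == "start_date" else last


for policy in (False, True):
    print(policy, decade_bound(1950, "start_date", policy), decade_bound(1950, "end_date", policy))
```

Note that under the conservative policy a feature whose start_date and end_date fall in the same decade would end before it starts, so that case would need separate handling.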

I am naturally glib and prefer to see more data with less certainty than vice versa. But very open to the inverse!

@padiwik

padiwik commented Sep 18, 2020

I suggested the conservative approach because I don't believe a feature dated "before 1850" should appear from the beginning of time. I then thought the approach should be consistent, but it could also make sense to treat ranges differently when both the beginning and the end are known.

@jeffreyameyer
Member Author

jeffreyameyer commented Jan 22, 2021

Where are we on this, given recent discussions about parsers, etc.?
@danrademacher @rwelty1889 @geohacker @batpad

I'm getting the sense that this may not be as tricky to implement across the stack (e.g. core db, tile filters, stylesheets, etc.) as I had thought. Am I wrong?

We do have a lot of user requests for supporting this.

A workaround I think might work (even if a workaround is lame) is [foo]_date.edtf =~1976

@1ec5
Member

1ec5 commented May 29, 2023

> A workaround I think might work (even if a workaround is lame) is [foo]_date.edtf =~1976

Unfortunately, I think it might be even more complex if we push the responsibility for parsing EDTF onto the client. The vector tiles currently encode start_date and end_date as decimal numbers to get around the fact that the Mapbox Style Specification’s expression language lacks some important string operations, such as regular expression matching (mapbox/mapbox-gl-js#4089) and string splitting (maplibre/maplibre-gl-js#2064). If the tiles contain EDTF verbatim, the frontend would need to use a fork of GL JS that provides a hook so that the website can extend it with an EDTF-parsing operator. That operator could be implemented using EDTF.js, but a fork of GL JS might come with some unwanted maintenance overhead, and it would limit compatibility with potential third-party projects.

The alternative of parsing within PostgreSQL might be feasible. If we aren’t comfortable rolling our own parser in PL/pgSQL, perhaps ohm-deploy could define a Python function that uses python-edtf to do the parsing. Or a Rust function that uses edtf-rs if a Rust driver is installed, etc.
