This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Upgrade to 2018i #29

Closed
ghost opened this issue Nov 17, 2018 · 8 comments

Comments


ghost commented Nov 17, 2018

timezone-boundary-builder 2018g is out. tz-lookup should upgrade to it.

Unfortunately, this release made a radical change: timezone boundaries are now allowed to overlap (and do, in many disputed areas). This violates a necessary assumption of this library in two ways:

  1. The data storage format treats the earth as a bitmap where each pixel has only a single color. The format would need to be extended so that a pixel can represent overlapping timezones.
  2. The interface to this library assumes that there is exactly one timezone for a given latitude and longitude. We'd need to change the interface to remove this assumption, and ideally support some kind of mechanism for disambiguating which timezone is the desired one. (I haven't the slightest idea how to do this, though, as I'm not familiar with the nature of the overlapping/disputed regions. Some research is necessary.)

ghost commented Nov 19, 2018

Two examples of disputed territories:

  1. The Xinjiang conflict in central Asia between the People's Republic of China (who use Asia/Shanghai) and the Uyghur people (who use Asia/Urumqi). More background on the conflict from Human Rights Watch and Meduza.
  2. The ongoing Chinese civil war in the Pacific between the People's Republic of China (who use Asia/Shanghai) and the Republic of China (who use Asia/Taipei).

In both cases, which timezone to favor is a political distinction, rather than a regional distinction. For Taiwan, favoring Asia/Taipei makes sense as a practical matter (everyone who lives in these regions, I think, uses Asia/Taipei). But in the Uyghur case, which timezone is used depends heavily on who is asking, as both Han Chinese and Uyghur people live in the region.

I don't really like taking a political stance in a technical library, but one of the issues we'll run into is that most software doesn't (or can't!) know anything meaningful about the requester of the timezone. As a consequence, while it may make sense to have an optional request hint parameter (that gives some additional information about how to disambiguate a request for a timezone), in practice we'll need to set sensible defaults (since most software won't be able to supply such a hint).

Further, what kind of hint might be used is something of an open question. What could we use? Language? Some kind of locale setting?

Anyway, I think this means identifying each conflicting region and deciding which side to favor; and making it easy enough to change in case complaints are brought up.
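To make the "optional hint plus sensible defaults" idea concrete, here is a minimal sketch. It is not the library's interface: the `resolve` function, the `DEFAULT_PREFERENCE` table, and the choice to make the hint a plain timezone name are all assumptions for illustration (what a real hint should be is still the open question above). The Xinjiang default shown is an arbitrary placeholder, not a recommendation.

```python
from typing import Optional, Sequence

# Hypothetical defaults: for each set of overlapping zones, the zone to
# return when the caller gives no usable hint. Kept as plain data so an
# entry can be changed if complaints come up.
DEFAULT_PREFERENCE = {
    frozenset({"Asia/Shanghai", "Asia/Taipei"}): "Asia/Taipei",
    # Placeholder only; which side to favor here is undecided.
    frozenset({"Asia/Shanghai", "Asia/Urumqi"}): "Asia/Shanghai",
}


def resolve(candidates: Sequence[str], hint: Optional[str] = None) -> str:
    """Pick one zone from the candidate zones found at a location."""
    if not candidates:
        raise ValueError("no candidate timezones")
    if len(candidates) == 1:
        return candidates[0]
    # A hint only wins if it names one of the zones actually in dispute.
    if hint is not None and hint in candidates:
        return hint
    # Otherwise use the configured default, or the first candidate if
    # the overlap has no entry in the table.
    return DEFAULT_PREFERENCE.get(frozenset(candidates), candidates[0])
```

Callers that know nothing about the requester just omit `hint` and get the default; callers that do know something can pass a preferred zone name.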


ghost commented Nov 19, 2018

I think the simplest technical solution to this is, for each disputed region, add a new pixel color indicating that the region covers timezones X and Y (and ...). We can then handle resolving those disputes entirely on the read side without needing to regenerate the database.
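A rough sketch of what "a new pixel color per disputed region" could look like, assuming the bitmap stores small integer colors that index into a palette. The `Palette` class and its method names are hypothetical and say nothing about the actual storage format; the point is only that a composite color decodes to several zones, so the dispute can be settled at read time.

```python
from typing import Dict, Iterable, List, Tuple

ZoneSet = Tuple[str, ...]


class Palette:
    """Maps pixel colors (small ints) to one or more timezone names."""

    def __init__(self) -> None:
        self.colors: List[ZoneSet] = []
        self._index: Dict[ZoneSet, int] = {}

    def color_for(self, zones: Iterable[str]) -> int:
        # Sort and de-duplicate so {X, Y} and {Y, X} share one color.
        key = tuple(sorted(set(zones)))
        if key not in self._index:
            self._index[key] = len(self.colors)
            self.colors.append(key)
        return self._index[key]

    def zones_for(self, color: int) -> ZoneSet:
        return self.colors[color]


palette = Palette()
plain = palette.color_for(["Asia/Shanghai"])
disputed = palette.color_for(["Asia/Urumqi", "Asia/Shanghai"])

# A one-row "bitmap": an undisputed pixel next to a disputed one.
grid = [[plain, disputed]]
candidates = palette.zones_for(grid[0][1])
```

Here `candidates` holds both zone names for the disputed pixel, and whatever disambiguation policy is chosen can run on that tuple without regenerating the database.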

This was referenced Jan 3, 2019
ghost changed the title from "Upgrade to 2018g" to "Upgrade to 2018i" on Jan 6, 2019

ghost commented Jan 6, 2019

I'm a slowpoke and we're up to 2018i now.


ghost commented Jan 11, 2019

Getting close on this: the new processing pipeline generally works and returns reasonable results. 6% (29 of 504) of the tests still fail, which I am presently digging into one by one. (There have been a number of changes to the data between 2018d and 2018i, so it's possible some of the failing tests are expected.)


ghost commented Jan 15, 2019

I've resolved all outstanding issues that I'm aware of with maritime zones, and am down to 21 of 503 tests failing (4%). I believe the rest are either cases of overly lossy compression or else tests whose expected results have legitimately changed.


ghost commented Jan 18, 2019

Very close now: 6 tests of 503 (1%) failing, all of them automated (e.g. from random locations around the globe). It's likely that the failing locations don't matter and the offending tests can simply be removed, though I'm going to do a little more verification to be more confident.


ghost commented Jan 18, 2019

Can confirm that each of the failing locations is far from any labelled location (towns, villages, etc.), at least on Google Maps. (And, naturally, they're rather near borders; e.g. between Xinjiang and the rest of China, etc.)

So I'm going to remove those tests and consider them unimportant for the time being.

Here is a map of what the current version encodes:
[Image "out": map of what the current version encodes]

Library size is 71KB, which is somewhat better than the 118KB of the previous version. (This comes mostly from a couple of bug fixes, including one for packing unnecessary quadtree nodes.)
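The comment doesn't say what the quadtree bug actually was. As one illustration of how unnecessary nodes can inflate a packed quadtree, the sketch below collapses any branch whose four children are the same leaf color. The representation (an `int` for a leaf, a four-element list for a branch) and the `prune` function are assumptions for the example, not the library's format.

```python
from typing import List, Union

# A leaf is a color index; a branch is a list of four child nodes.
Node = Union[int, List["Node"]]


def prune(node: Node) -> Node:
    """Collapse branches whose four children are identical leaves."""
    if isinstance(node, int):
        return node
    # Prune bottom-up so collapsed children can trigger a collapse here.
    children = [prune(child) for child in node]
    first = children[0]
    if isinstance(first, int) and all(child == first for child in children):
        return first
    return children


tree = [1, 2, [3, 3, 3, 3], [[4, 4, 4, 4], 4, 4, 4]]
pruned = prune(tree)
```

After pruning, the two uniform subtrees become single leaves, so fewer nodes need to be packed.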


ghost commented Jan 23, 2019

All set: v6.1.12 is out.

@ghost ghost closed this as completed Jan 23, 2019