This repository has been archived by the owner on Feb 2, 2023. It is now read-only.

Data stale? #1001

Closed
dpprdan opened this issue Feb 17, 2020 · 3 comments


dpprdan commented Feb 17, 2020

The status page reports, under the data freshness heading:

OSM Data from 2018-09-17
Geonames Data from 2018-09-05
BAN Data from 2018-09-17

So the data is almost 1.5 years old. Is there a particular reason for the (OSM) data not being more recent?
Would it be possible to update it (more) regularly?


lstables commented Mar 6, 2020

Yes, I'm seeing this too: "dn14 7sl", for example, doesn't return the correct place. So this very much needs updating.

+1


mirshko commented Apr 16, 2020

Related to #1033, it seems.

JonathanMontane (Contributor) commented Apr 27, 2020

Hello everyone.

First of all, I wanted to apologise for the long silence on this ticket.

We have updated our OSM data. This took much longer than expected due to a series of complications with our import pipeline: changes in the way certain fields are handled by the OSM community caused data loss in our pipeline. This led us to roll back a release because it was missing data fields, which our test suite had not detected. To avoid hitting the same issue again, we decided not to release until we were confident that our test suite was functioning properly, and we focused a large amount of our energy on improving it so that such changes would be detected reliably.
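
(For illustration only, this is not Algolia's actual pipeline: a minimal Python sketch of the kind of field-coverage regression check that can catch this failure mode. The record format and the 5% tolerance are assumptions.)

```python
from collections import Counter

def field_coverage(records):
    """Fraction of records in which each field is present and non-empty."""
    counts = Counter()
    for record in records:
        for field, value in record.items():
            if value not in (None, "", [], {}):
                counts[field] += 1
    total = max(len(records), 1)
    return {field: n / total for field, n in counts.items()}

def coverage_regressions(old_records, new_records, tolerance=0.05):
    """Fields whose coverage dropped by more than `tolerance` between releases."""
    old_cov = field_coverage(old_records)
    new_cov = field_coverage(new_records)
    return {
        field: (old_cov[field], new_cov.get(field, 0.0))
        for field in old_cov
        if old_cov[field] - new_cov.get(field, 0.0) > tolerance
    }

# Example: the `city` field silently vanished from the new release.
old = [{"name": "a", "city": "Paris"}, {"name": "b", "city": "Lyon"}]
new = [{"name": "a"}, {"name": "b", "city": ""}]
print(coverage_regressions(old, new))  # {'city': (1.0, 0.0)}
```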

We began a slow rollout of the new data two weeks ago, and we haven't had a single ticket about OSM-related data issues since. We are now confident that our test suite and our pipeline are sound.

Now, let me be transparent: testing changes to geographical data is surprisingly hard. Localisation data, hierarchies, geometry, etc. all change all the time. There is no source of truth you can rely on, so you have to build ways of investigating the data without trusting it. In the end, what we are testing is not only our transformation pipeline but also the data itself. This is good, because we do not want to release an update that is worse than what we currently have, but of course it is also prone to failures, and we may miss something again in the future. For now, at least, we believe we are armed with a great battery of tools that will help us troubleshoot and release OSM updates faster.
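
(Again purely illustrative, not Algolia's tooling: with no external source of truth, the previous release can serve as the baseline, and a candidate release is gated on anomalous shifts against it. The per-country grouping and the tolerances below are hypothetical.)

```python
def release_gate(prod_stats, candidate_stats, max_shrink=0.02, max_growth=0.50):
    """Compare per-country record counts between the production release and a
    candidate. With no external ground truth, the previous release is the
    baseline: counts may grow, but sharp drops are treated as likely data loss.
    Returns a list of human-readable warnings; an empty list means the gate passes."""
    warnings = []
    for country, prod_count in prod_stats.items():
        cand_count = candidate_stats.get(country, 0)
        if cand_count < prod_count * (1 - max_shrink):
            warnings.append(f"{country}: records dropped {prod_count} -> {cand_count}")
        elif cand_count > prod_count * (1 + max_growth):
            warnings.append(f"{country}: suspicious growth {prod_count} -> {cand_count}")
    return warnings

# Example: France lost 40% of its records in the candidate release.
prod = {"FR": 1_000_000, "DE": 800_000}
candidate = {"FR": 600_000, "DE": 810_000}
for warning in release_gate(prod, candidate):
    print(warning)  # FR: records dropped 1000000 -> 600000
```

The point of the sketch is that the comparison is release-to-release rather than against ground truth, which is exactly why a bad release can still slip through if both releases share the same defect.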

As this took a lot of time, the data we released is already a bit old (~6 months), but we are confident that these tools will let us shorten our update cycle in the future.

There are still bugs in Places, and we will work our way through them, one at a time, but the data is now better than it ever was.

Thank you all for your patience.
