Add relevant ONSPD records to the postcode collection worker #556
Merged
…he check.
- Some ONSPD postcodes (small, active) should be eligible for checking against the OS Places API, in case higher-quality data exists for them.
- Previously we would only get os_places records in the update_postcode method. Now we might also get onspd records, and if OS Places data is returned for them we should update the results and source to promote them to the higher-quality data.
- We don't want to delete onspd records if no OS Places data exists for them, so add a test into the rescue which deletes os_places records and touches onspd records so that they don't come up again until the next round (essentially making sure they're checked once a week).
- This table records ONSPD imports. We record the URL for reference, but the main detail is the created_at value, which we can use to determine whether a new dataset is available.
- Previously we've used the maximum updated_at value for any ONSPD postcode to check this, but now we're handling that the same as the updated_at for OS Places records, so it becomes unreliable as an indicator of imports (i.e. we may touch it when we try and fail to upgrade the record to a higher-quality OS Places record, so that we don't constantly retry records that OS Places doesn't know about).
- This is also just a clearer, simpler way of doing this, and gives us a better audit trail of ONSPD updates.
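To illustrate the imports-table idea above, here is a minimal plain-Ruby sketch. The names `OnspdImport`, `new_onspd_dataset_available?` and `published_at` are hypothetical and are not taken from this PR, and the comparison against a publication time is an assumption about how "a new dataset is available" might be decided. The point is only that the check reads the import rows' `created_at`, so touching postcode records does not affect it.

```ruby
# Hypothetical stand-in for a row in the ONSPD imports table (assumption:
# the real table has at least a URL and a created_at timestamp).
OnspdImport = Struct.new(:url, :created_at, keyword_init: true)

# Returns true when the published dataset is newer than the most recent
# recorded import, or when nothing has been imported yet.
# `published_at` is an assumed input; the PR does not say how it is obtained.
def new_onspd_dataset_available?(imports, published_at)
  latest_import = imports.map(&:created_at).max
  latest_import.nil? || published_at > latest_import
end

imports = [
  OnspdImport.new(url: "https://example.test/onspd-a.zip", created_at: Time.utc(2024, 2, 1)),
  OnspdImport.new(url: "https://example.test/onspd-b.zip", created_at: Time.utc(2024, 5, 1)),
]

puts new_onspd_dataset_available?(imports, Time.utc(2024, 8, 1))
puts new_onspd_dataset_available?(imports, Time.utc(2024, 4, 1))
```

Because the decision depends only on the import rows, a postcode's `updated_at` can be touched freely by the worker without making it look as though a new import has happened.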
The low-quality ONSPD postcode data may include small, active postcodes (i.e. not retired and not Large User Postcodes). These might exist in the OS Places database, but if we missed them in our initial import they will have been filled in with ONSPD data. They could then never be updated, since ONSPD and OS Places data haven't been mixed until now.
This PR adds the potentially upgradeable records (about 15k of them) to the postcode collection worker's candidate pool, ensuring they are checked once a week. We handle them slightly differently from OS Places records: if OS Places finds no data for them we don't delete them, but we do touch them so that they won't be checked again until the next cycle.
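The handling described above could look roughly like the following plain-Ruby sketch. It is not the PR's actual code: the `Postcode` struct, the `NoResultsForPostcode` error, the in-memory `store` hash and the `lookup` callable are all hypothetical stand-ins for the real model, the OS Places client and the database. Only the method name `update_postcode` and the source values `os_places` and `onspd` come from the description.

```ruby
# Hypothetical in-memory stand-in for a postcode record.
Postcode = Struct.new(:postcode, :source, :results, :updated_at, keyword_init: true)

# Hypothetical error meaning "OS Places has no data for this postcode".
class NoResultsForPostcode < StandardError; end

# store:  Hash of postcode string => Postcode (stands in for the database)
# lookup: callable that returns OS Places results or raises NoResultsForPostcode
def update_postcode(store, postcode, lookup, now: Time.now.utc)
  record = store.fetch(postcode)
  results = lookup.call(postcode)

  # OS Places returned data: store it and promote the record's source.
  record.results = results
  record.source = "os_places"
  record.updated_at = now
  :updated
rescue NoResultsForPostcode
  if record.source == "onspd"
    # Keep the ONSPD data, but touch the record so that it is not
    # picked up again until the next cycle.
    record.updated_at = now
    :touched
  else
    # An os_places record that OS Places no longer knows about is removed.
    store.delete(postcode)
    :deleted
  end
end

store = {
  "AA1 1AA" => Postcode.new(postcode: "AA1 1AA", source: "onspd", results: [], updated_at: Time.utc(2024, 1, 1)),
  "BB2 2BB" => Postcode.new(postcode: "BB2 2BB", source: "os_places", results: [], updated_at: Time.utc(2024, 1, 1)),
}
no_data = ->(_postcode) { raise NoResultsForPostcode }

update_postcode(store, "AA1 1AA", no_data) # ONSPD record is kept and touched
update_postcode(store, "BB2 2BB", no_data) # os_places record is deleted
```

Returning a symbol for each outcome is a choice made for this sketch only; it makes it easy to count how many records were promoted, touched or deleted during a trial run.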
Before merging we should run this code in integration for a day or two to confirm it's working and see how many records it updates.
https://trello.com/c/zEZxyZol/425-check-whether-locations-api-low-quality-records-could-be-replaced-with-higher-quality-ones, Jira issue PNP-9228