match_area / area_lookup get out of date #71

Open
Moohan opened this issue Nov 30, 2021 · 6 comments

Comments

@Moohan
Member

Moohan commented Nov 30, 2021

Probably one for @jackhannah95 as he wrote all the fancy SPARQL code!

The intention with area_lookup was that it would be updated with every package release, but there haven't been any package releases. There's also the issue of people not updating to the latest version of phsmethods.

Could the solution be to have match_area do the API query live? Ideally it would cache the data, so only one call would usually be needed. Another option (not sure how feasible these are) could be to keep including the data with the package, then check how old it is (or, even better, check whether a newer version is available from the API); if the data in the package is out of date, use the live call instead.
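
For the caching side, a minimal sketch of what I mean, assuming a hypothetical fetch_area_lookup() standing in for the live SPARQL query in data-raw/area_lookup.R:

# Sketch only: fetch_area_lookup() is a placeholder for the live SPARQL/API
# call and would return the geography code/name lookup as a data frame.
fetch_area_lookup <- function() {
  stop("replace with the live SPARQL query from data-raw/area_lookup.R")
}

# memoise() caches the first result, so the API is only queried once per
# session; later calls in the same session reuse the cached data frame.
cached_area_lookup <- memoise::memoise(fetch_area_lookup)

memoise can also be given a cachem cache with a max_age, which would let a long-running session refresh the lookup after a while.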

@jackhannah95
Contributor

Not me I'm afraid - @jvillacampa wrote the SPARQL.

At the time we decided not to build the API query into the function because we were concerned about performance. The query can be quite slow to run on the server at times, and there are the expected timeout issues with the VPN on desktop.

It doesn't solve the longer-term issue of how to keep the area lookup up to date, but maybe a one-off update would provide a temporary fix.

@Moohan
Member Author

Moohan commented Dec 1, 2021

Yeah, the function is a really nice idea but keeping the data behind it current is a really thorny issue!

We could probably set up a GitHub Action to run the scripts in data-raw on a schedule (daily/weekly/monthly) and then commit the changes or open a PR with them (rough sketch below)... I've used Actions a bit before on some other repos / packages, but never anything that 'custom'.
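
Not a working workflow; the action names/versions and the dependency-install step are assumptions and I haven't tested any of it:

name: Update area_lookup

on:
  schedule:
    - cron: "0 3 1 * *"  # 03:00 on the 1st of each month
  workflow_dispatch:     # allow manual runs too

jobs:
  update-lookup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@v1
      # install the packages data-raw/area_lookup.R needs here
      - name: Rebuild the lookup
        run: Rscript data-raw/area_lookup.R
      - name: Open a PR with any changes
        uses: peter-evans/create-pull-request@v3
        with:
          branch: update-area-lookup
          title: Update area_lookup data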

@jvillacampa
Contributor

Good point raised! I remember this was one of the things Jack and I discussed the most when creating the function. I think the GitHub Action idea sounds very good (but I've never done it, I'm afraid). We could add a warning message advising users to update their version of the package when using this function, to ensure they have the latest lookup?
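
Maybe something shaped like this, where lookup_built_date is a made-up name for a date we'd save alongside area_lookup whenever data-raw/area_lookup.R is rerun:

# Hypothetical sketch: lookup_built_date would be stored with the bundled
# lookup at build time.
lookup_built_date <- as.Date("2021-11-30")

warn_if_lookup_stale <- function(max_age_days = 365) {
  age_days <- as.numeric(Sys.Date() - lookup_built_date)
  if (age_days > max_age_days) {
    warning(
      "The area lookup bundled with this version of phsmethods is more than ",
      "a year old; consider updating the package to get the latest codes.",
      call. = FALSE
    )
  }
  invisible(age_days)
}

match_area() could then call warn_if_lookup_stale() before doing the lookup.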

@Moohan
Member Author

Moohan commented Dec 8, 2021

I have no time right now but will hopefully have some over Christmas... Would you want to look at the warning if I looked at a GH Action? (I'd probably start by opening a monthly PR.)
I'm also aware that I have at least 3 other things I've said I'd contribute to in this package, none of which are finished 😛

@Moohan
Member Author

Moohan commented Nov 15, 2022

@jvillacampa I was feeling inspired, so I had a look at creating a GH Action for this. However, I first tried running the script (as it would have to be totally automated) and noticed quite a few of the areas have new names, including some affected by the Unicode issue you highlighted in the script.

I found a site explaining the error - https://www.i18nqa.com/debug/utf8-debug.html - which also has a handy lookup table: unicode_fix_lookup.csv (I made the csv). I tried to find a nicer way to do the replacements, but this was the best I could do:

# Needs magrittr loaded for %<>% and %>%; each pattern below is the garbled
# form found in the downloaded data, replaced with the intended character.
area_lookup %<>%
  dplyr::mutate(area_name = 
                  stringr::str_replace_all(area_name, "ì", "ì") %>% 
                  stringr::str_replace_all("ò", "ò") %>% 
                  stringr::str_replace_all("è", "è") %>% 
                  stringr::str_replace_all("Ù", "Ù") %>% 
                  stringr::str_replace_all("È", "È") %>% 
                  stringr::str_replace_all("Ã\\s", "a") %>% 
                  stringr::str_replace_all("Â", "") %>% 
                  stringr::str_replace_all("’", "'") %>% 
                  stringr::str_squish())

So only slightly more 'hands-free' than your solution...

It seems the encoding issue should probably be solved further upstream: either the Scottish Government open data platform is returning the wrong encoding, or the SPARQL query isn't using the right encoding.
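
For what it's worth, a more general repair than character-by-character replacements might be to round-trip the encoding, since this looks like UTF-8 that has been read as Windows-1252. Untested against this data, so just a sketch:

# Re-encode the mangled strings back to their original bytes (Windows-1252 is
# an assumption about which wrong encoding was applied), then declare those
# bytes as the UTF-8 they really are.
fix_mojibake <- function(x) {
  repaired <- iconv(x, from = "UTF-8", to = "windows-1252")
  Encoding(repaired) <- "UTF-8"
  # iconv() returns NA where a character can't be mapped, so fall back to the
  # original value in those cases
  ifelse(is.na(repaired), x, repaired)
}

area_lookup$area_name <- fix_mojibake(area_lookup$area_name)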

@Moohan
Member Author

Moohan commented Jul 3, 2024

@Tina815 I tried running the script to get / update the data (data-raw/area_lookup.R) and the SPARQL query was erroring. I couldn't figure out how to fix it, but I'm not at all familiar with SPARQL...
