Created lsip_lad data and fetch_lsip() function #133

Open
mzayeddfe wants to merge 15 commits into main from 121-add-lad-lsip-lookup

Conversation

Contributor

@mzayeddfe mzayeddfe commented Dec 31, 2025

Brief overview of changes

This PR introduces support for LSIP-LAD lookups and adds a new fetch_lsip_lad function, along with documentation, tests and refactoring for shared fetch filtering logic.

NOTE: I'm not sure how to test for the things outlined by the code coverage report. If you have suggestions, please let me know!

Why are these changes being made?

  • To enable users to access and filter LSIP-LAD relationships in a consistent way, similar to other geography lookups in the package.
  • To improve code maintainability by reducing duplication in fetch filtering logic.

Detailed description of changes

  • Added a new script and workflow for downloading and processing LSIP-LAD data from the ONS Open Geography Portal API.
  • Created the lsip_lad dataset and integrated it into the package through documentation and tests.
  • Implemented the fetch_lsip function for easy access and filtering of LSIP codes and names.
  • Refactored shared year filtering and summarisation logic into the summarise_locations_by_year helper function, used by both fetch_lsip and other fetch functions.
  • Added tests for fetch_lsip and the helper function, including checks for structure, filtering, duplicates, and type safety.
  • Included data sources and update procedures in documentation.
  • The latest test added for the Air formatting check returns a "no jobs run" message, and sometimes it surfaces as an error rather than just an email (screenshot of the email below). I don't think this is caused by the code in this PR, but wanted to check.
[screenshot: workflow email]
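For illustration, here's a hedged sketch of how the new function would be called, assuming fetch_lsip() takes a `year` argument in the same style as the other fetch_* functions in the package (this branch of dfeR would need to be installed for it to run):

```r
# Illustrative only: fetch_lsip() and its year argument are taken from the
# PR description; dfeR here means the version on this branch
library(dfeR)

# All years of LSIP codes and names
fetch_lsip()

# Filtered to a single year, as with the other fetch_* functions
fetch_lsip(year = 2025)
```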

Still for me to do after initial review:

  • Bump package version - dependent on amends to pretty_num as well.
  • Update NEWS - will capture along with pretty_num update

Essential Checklist

  • I have read the contributing guidelines
  • The code follows the package style and naming conventions
  • All new and existing tests pass (devtools::test())
  • I have updated the documentation using devtools::document()
  • I have checked that my changes do not break existing functionality

Consider (as applicable)

  • I have added or updated documentation (function documentation, vignettes, readme, etc.)
  • I have updated the NEWS.md file with a summary of my changes
  • I have considered if a version bump is required in DESCRIPTION
  • I have added examples or usage where relevant
  • I have resolved styling (formatted using Air, or styler::style_pkg()) and lintr issues (lintr::lint_package())

Issue ticket number/s and link

Closes #121

@mzayeddfe mzayeddfe linked an issue Dec 31, 2025 that may be closed by this pull request
@codecov

codecov bot commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.40%. Comparing base (95c586e) to head (50c638e).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #133       +/-   ##
===========================================
- Coverage   68.80%   56.40%   -12.41%     
===========================================
  Files          15       18        +3     
  Lines        1106     1569      +463     
===========================================
+ Hits          761      885      +124     
- Misses        345      684      +339     

Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95c586e...50c638e.


@mzayeddfe mzayeddfe changed the title built function to get and bind the data and added some documentation Created lsip_lad data and fetch_lsip_lad() function Dec 31, 2025
@mzayeddfe mzayeddfe changed the title Created lsip_lad data and fetch_lsip_lad() function Created lsip_lad data and fetch_lsip() function Dec 31, 2025
@mzayeddfe mzayeddfe marked this pull request as ready for review December 31, 2025 14:37
@mzayeddfe mzayeddfe requested a review from cjrace December 31, 2025 14:38
@mzayeddfe mzayeddfe requested a review from rmbielby January 8, 2026 16:03
Contributor

@cjrace cjrace left a comment


Thanks @mzayeddfe, this is great, some nice improvements in here!

Some of the approach you've used looks like you've figured it out yourself rather than reusing the approach we already have, so I've detailed an alternative approach for get_lsip_lad() that I'd have taken. This is mostly for interest / future awareness, as I know you wanted to use this as a chance to get stuck into the code, and looking back at it the logic I was using isn't the most obvious (partly as geo_hierarchy has become a bit of a b-EES-t!). Happy to have a chat through if helpful 😄

For code cov I wouldn't worry, that's just showing how many lines of code are run in tests versus not. For a PR like this, adding a lot of internal code bespoke to querying the ONS API, I'd expect coverage to go down slightly as not all of those lines of code are worth testing.

#' @return data frame of LSIP-LAD relationships
#' @export
#' @inherit fetch examples
fetch_lsip <- function(year = "All") {
Contributor


I've generally used the fetch functions for getting a list of locations of one specific type. So from the way the other fetch_* functions work I'd have expected fetch_lsip() to only return lsip_name and lsip_code, not the LADs too (Regions / Countries are exceptions as they have data frames that only have one kind of location in)

Users can get the full lookup using dfeR::lsip_lad and easily filter that, so for consistency with other functions I'd drop LADs from this one?
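As a minimal illustration of that filtering (toy data; the column names are assumed from the PR description rather than taken from the real dfeR::lsip_lad object):

```r
library(dplyr)

# Toy stand-in for dfeR::lsip_lad; codes are placeholders, not real ONS codes
lsip_lad <- data.frame(
  lad_code = c("LAD1", "LAD2", "LAD3"),
  lad_name = c("Area A", "Area B", "Area C"),
  lsip_code = c("LSIP1", "LSIP1", "LSIP2"),
  lsip_name = c("Partnership 1", "Partnership 1", "Partnership 2")
)

# What an LSIP-only fetch would return: just the distinct codes and names
lsip_only <- lsip_lad |>
  distinct(lsip_code, lsip_name)

nrow(lsip_only) # 2 rows: one per LSIP
```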

Contributor Author


I think this was only producing the LSIP but the documentation was wrong - fixed and will push

Comment on lines +607 to +663
# Base URL components
url_prefix_1 <- "https://services1.arcgis.com/"
url_prefix_2 <- "ESMARspQHYMw9BZ9/arcgis/rest/services/"
url_suffix <- "/FeatureServer/0/query?outFields=*&where=1%3D1&f=json"

# Year-specific URL segments
yr_specific_url <- list(
  "2023" = "LAD23_LSIP23_EN_LU",
  "2025" = "LAD25_LSIP25_EN_LU"
)

# Create an empty list to store data frames
data_frames <- list()

# Loop through each year and fetch data
for (year in names(yr_specific_url)) {
  # Construct the full URL
  full_url <- paste0(
    url_prefix_1,
    url_prefix_2,
    yr_specific_url[[year]],
    url_suffix
  )

  # Make the GET request and parse the JSON response
  response <- httr::GET(full_url)
  # Get the content and convert from JSON
  data <- jsonlite::fromJSON(httr::content(response, "text"))

  # Extract the attributes and convert to a data frame
  df <- as.data.frame(data$features$attributes) |>
    # Create a year column
    dplyr::mutate(year = as.integer(year)) |>
    # Rename columns based on position so binding works
    dplyr::select(
      year,
      lad_code = 1,
      lad_name = 2,
      lsip_code = 3,
      lsip_name = 4
    )

  # Put the data frame into the list
  data_frames[[year]] <- df
}

# Combine all data frames into one
combined_df <- do.call(rbind, data_frames)

# Get first_available and most_recent year columns
combined_df <- combined_df |>
  collapse_timeseries() |>
  # Strip extra whitespace from all columns
  dplyr::mutate(
    dplyr::across(
      dplyr::everything(),
      ~ trimws(.x)
    )
  ) |>
  # Make sure we remove duplicates
  dplyr::distinct()
Contributor


This is functioning perfectly well, so no need to change this unless you want to. While I'm here I wanted to highlight a couple of things in case they're helpful though!

  1. httr has been overtaken by httr2, so moving forwards it's generally better to use that for making http requests / API queries

  2. You could also use the get_ons_api_data() helper that's already in dfeR to do this, which would follow the approach I took for the other functions. If you do this, you'd need a couple of extra steps:

a) Add additional column shorthands for LSIP into the ons_geog_shorthands table (and then rerun that script / update that data object), e.g.

## code to prepare `ons_geog_shorthands` data set goes here

ons_level_shorthands <- c(
  "WD",
  "PCON",
  "LAD",
  "UTLA",
  "CTYUA",
  "LSIP",
  "CAUTH",
  "GOR",
  "RGN",
  "CTRY"
)
name_column <- paste0(
  c(
    "ward",
    "pcon",
    "lad",
    "la",
    "la",
    "lsip",
    "cauth",
    "region",
    "region",
    "country"
  ),
  "_name"
)
code_column <- paste0(
  c(
    "ward",
    "pcon",
    "lad",
    "new_la",
    "new_la",
    "lsip",
    "cauth",
    "region",
    "region",
    "country"
  ),
  "_code"
)

ons_geog_shorthands <- data.frame(
  ons_level_shorthands,
  name_column,
  code_column
)

usethis::use_data(ons_geog_shorthands, overwrite = TRUE)

b) Update the get_lsip_lad() function to use get_ons_api_data(), e.g. something like

#' Fetch and combine LSIP-LAD lookup data for multiple years
#'
#' Helper function to extract data from the LSIP-LAD lookups
#'
#' @param year four digit year of the lookup
#'
#' @return data.frame for the individual year of the lookup
#'
#' @keywords internal
#' @noRd
get_lsip_lad <- function(year) {
  year_end <- year %% 100

  data_id <- paste0("LAD", year_end, "_LSIP", year_end, "_EN_LU")

  fields <- paste0(
    "LSIP",
    year_end,
    "CD,LSIP",
    year_end,
    "NM,LAD",
    year_end,
    "CD,LAD",
    year_end,
    "NM"
  )

  output <- get_ons_api_data(
    data_id = data_id,
    params = list(
      where = "1=1",
      outFields = fields,
      outSR = 4326,
      f = "json"
    )
  )

  tidy_raw_lookup(output)
}

c) update the data-raw/lsip_lad.R script to lapply the get_lsip_lad() function over every year you want the lookup for (then this is the place you come back to edit and update when new lookups are published), e.g.

# First boundaries published in 2023, ONS didn't publish a 2024 set
lsip_lad <- lapply(c(2023, 2025), get_lsip_lad) |>
  create_time_series_lookup()

# Save the data to the package's data directory
usethis::use_data(lsip_lad, overwrite = TRUE)

I think for lsip_lad as it is now, you could leave your code as-is if you didn't want to make these changes, as there are very few rows and the logic you've written seems to return everything as expected (the years come back as character instead of numeric, but that's the only difference I could spot). This is mostly a suggestion for how I'd have written it and how I intended the helper functions to be used, as I know you wanted to use this to learn more about the code in here so far!

One of the reasons for the way I wrote the other code is that there's a limit on the number of rows you can get in a single query, so the approach you've used wouldn't work for larger tables (you'd therefore need some kind of batching logic, like I've put in get_ons_api_data(), to send multiple queries to a dataset on the Open Geography Portal to get all the rows).
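To sketch what that batching might look like (a hypothetical illustration, not the actual get_ons_api_data() implementation; fetch_all_rows() and batch_size are made up for this sketch, while resultOffset and resultRecordCount are standard ArcGIS REST paging parameters, and httr2 is used as suggested above):

```r
# Hypothetical paging loop against an ArcGIS feature service query endpoint
fetch_all_rows <- function(query_url, batch_size = 1000) {
  pages <- list()
  offset <- 0
  repeat {
    resp <- httr2::request(query_url) |>
      httr2::req_url_query(
        where = "1=1",
        outFields = "*",
        f = "json",
        resultOffset = offset,
        resultRecordCount = batch_size
      ) |>
      httr2::req_perform()

    parsed <- httr2::resp_body_json(resp, simplifyVector = TRUE)
    rows <- as.data.frame(parsed$features$attributes)

    # Stop when the service runs out of rows
    if (nrow(rows) == 0) break
    pages[[length(pages) + 1]] <- rows
    if (nrow(rows) < batch_size) break
    offset <- offset + batch_size
  }
  do.call(rbind, pages)
}
```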

Contributor Author


Thanks for this @cjrace! Definitely interested in learning more about how these helper functions work, so I'll try to use them before getting you to re-review. Also completely understand the point about the queries getting cut off!

Contributor


The way Git shows the changes in this script takes a bit of getting your head around, but I had a dig through and it's a nice breaking out of the logic into smaller parts - I like it (good typo spot in one of the comments, too)!



Development

Successfully merging this pull request may close these issues.

Add LAD-LSIP lookup

2 participants