# R Scripts to obtain enhanced Historic England Heritage at Risk Data

This Jupyter notebook explains step by step how to get enhanced data from Historic England and the UK Gov Planning department to create the basis of the 
[Peripleo demo visualisation](https://heritage-at-risk.museologi.st) which builds on software created for the AHRC funded Locating a National Collection. 

This notebook takes data from CSV files, scrapes data from the web, combines it and then creates data ready to go into OpenRefine, before being transformed 
using NodeJS into Linked Places GeoJson. 

# Obtain raw Heritage at Risk dataset

First, we need to obtain the raw dataset that we want to use to replicate the processes taken to get to the [Peripleo visualisation](https://heritage-at-risk.museologi.st) and then use to start playing with data sources. This will not download the entire dataset of over 380,000 records from the feature server to enable this to run in a timely manner. The code explains how to do this for the entire data set. 

In [42]:
# Download data from the Digital Planning page in csv format
library(readr)
rawdata <- read.csv('https://files.planning.data.gov.uk/dataset/heritage-at-risk.csv')
# Check the data you downloaded for structure
head(rawdata)
message(This gives you a dataset of 20 columns)

Unnamed: 0_level_0,dataset,end.date,entity,entry.date,geojson,geometry,name,organisation.entity,point,prefix,reference,start.date,typology,categories,documentation.url,geography,heritage.at.risk,legislation,notes,organisation
Unnamed: 0_level_1,<chr>,<lgl>,<int>,<chr>,<lgl>,<chr>,<chr>,<int>,<chr>,<chr>,<int>,<lgl>,<chr>,<lgl>,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,heritage-at-risk,,7500000,2024-09-26,,"MULTIPOLYGON (((-2.140406 52.691736,-2.140134 52.691751,-2.140105 52.691766,-2.140096 52.691800,-2.140079 52.692048,-2.140027 52.693374,-2.140035 52.693526,-2.140074 52.693677,-2.140109 52.693717,-2.140151 52.693742,-2.140207 52.693759,-2.140290 52.693770,-2.140513 52.693778,-2.141410 52.693796,-2.142850 52.693839,-2.143188 52.693857,-2.143404 52.693834,-2.143436 52.693819,-2.143453 52.693787,-2.143460 52.693738,-2.143433 52.691810,-2.143429 52.691754,-2.143417 52.691719,-2.143397 52.691694,-2.143376 52.691681,-2.143223 52.691662,-2.142031 52.691719,-2.141295 52.691711,-2.140406 52.691736)))",Roman fort west of Eaton House,16,POINT(-2.141775 52.692762),heritage-at-risk,1006098,,geography,,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35747,,,,,
2,heritage-at-risk,,7500001,2021-05-27,,"MULTIPOLYGON (((-3.529071 51.050513,-3.528912 51.050517,-3.528974 51.050646,-3.529731 51.050616,-3.529786 51.050423,-3.529854 51.050251,-3.530052 51.049873,-3.529864 51.049878,-3.529870 51.049916,-3.529745 51.049922,-3.529747 51.049945,-3.529552 51.049950,-3.529556 51.050001,-3.529418 51.050007,-3.529419 51.050030,-3.529363 51.050032,-3.529360 51.050007,-3.529231 51.050012,-3.529216 51.049849,-3.528877 51.049839,-3.528887 51.050269,-3.528870 51.050385,-3.529127 51.050382,-3.529126 51.050454,-3.529069 51.050454,-3.529071 51.050513)))",Barlinch Priory,16,POINT(-3.529384 51.050242),heritage-at-risk,1006213,,geography,,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/42043,,,,,
3,heritage-at-risk,,7500002,2024-09-26,,"MULTIPOLYGON (((-2.861665 52.670295,-2.861330 52.670484,-2.861162 52.670614,-2.861089 52.670697,-2.861085 52.670777,-2.861138 52.670863,-2.861216 52.670948,-2.861323 52.671037,-2.861628 52.671269,-2.861980 52.671505,-2.862035 52.671528,-2.862105 52.671522,-2.862193 52.671488,-2.862409 52.671378,-2.862512 52.671317,-2.863067 52.671038,-2.863099 52.671008,-2.863061 52.670973,-2.862911 52.670876,-2.862707 52.670778,-2.862003 52.670305,-2.861851 52.670193,-2.861665 52.670295)))",Roman villa 150 yards (140 metres) south east of Lea Hall,16,POINT(-2.862003 52.670884),heritage-at-risk,1006246,,geography,,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/41990,,,,,
4,heritage-at-risk,,7500003,2024-09-26,,"MULTIPOLYGON (((-2.702104 52.619080,-2.703156 52.619377,-2.703886 52.618757,-2.703751 52.618281,-2.703742 52.618212,-2.703771 52.618081,-2.703439 52.617900,-2.702104 52.619080)))",Abutment of Roman Bridge at Radnor Bridge,16,POINT(-2.703145 52.618715),heritage-at-risk,1006280,,geography,,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/47933,,,,,
5,heritage-at-risk,,7500004,2024-09-26,,"MULTIPOLYGON (((-1.125435 51.602194,-1.125322 51.602193,-1.125441 51.602482,-1.125436 51.602755,-1.125472 51.602849,-1.125620 51.602967,-1.125668 51.603085,-1.125701 51.603295,-1.125677 51.603808,-1.126543 51.603805,-1.126994 51.603783,-1.127204 51.603766,-1.127649 51.603683,-1.128086 51.603493,-1.128235 51.603340,-1.128406 51.603031,-1.128530 51.602127,-1.128583 51.601409,-1.128317 51.601440,-1.127538 51.601463,-1.127228 51.601365,-1.127150 51.601182,-1.126602 51.601148,-1.126499 51.601128,-1.126446 51.601098,-1.126423 51.601058,-1.126392 51.600955,-1.126275 51.600956,-1.126287 51.601070,-1.126210 51.601077,-1.126240 51.601209,-1.125755 51.601297,-1.125462 51.601338,-1.125457 51.601407,-1.124810 51.601430,-1.124791 51.601828,-1.124930 51.601829,-1.125139 51.601807,-1.125225 51.601773,-1.125293 51.601720,-1.125359 51.601652,-1.125447 51.601586,-1.125435 51.602194)),((-1.127917 51.598020,-1.127969 51.598130,-1.127984 51.598853,-1.127987 51.599736,-1.127780 51.599771,-1.127711 51.600820,-1.128375 51.600852,-1.128644 51.600857,-1.128829 51.599192,-1.128922 51.598586,-1.128943 51.598346,-1.128936 51.598153,-1.128888 51.598091,-1.127867 51.597914,-1.127917 51.598020)))",Defences to the Saxon town [within Bull and Kine Croft],16,POINT(-1.127250 51.601647),heritage-at-risk,1006329,,geography,,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35923,,,,,
6,heritage-at-risk,,7500005,2024-09-26,,"MULTIPOLYGON (((-1.542041 51.746577,-1.541330 51.747420,-1.541002 51.747907,-1.540604 51.748604,-1.540221 51.749196,-1.539946 51.749482,-1.540739 51.749842,-1.544506 51.748465,-1.543543 51.746731,-1.543425 51.746539,-1.543276 51.746327,-1.543081 51.746086,-1.542896 51.745886,-1.542718 51.745712,-1.542041 51.746577)))",Rectangular enclosures 1100yds (1010m) north west of Mount Owen Farm,16,POINT(-1.542321 51.748031),heritage-at-risk,1006348,,geography,,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/30561,,,,,


You will see above that the raw data set has 20 columns of data. We probably don't need all of these right now. So let's remove the ones we do not need. You will see there is a geometry file for the GeoJSON outline, we could get the centre of this, but I wanted to show how to convert British National Grid to LatLon coordinates so we drop this too. 

In [45]:
# Define columns to drop
drops <- c("organisation.entity","prefix", "categories", "legislation", "notes", "geometry", "geojson")
# Drop the columns
rawDataCols <-rawdata[ , !(names(rawdata) %in% drops)]
# Rename reference column
names(rawDataCols)[names(rawDataCols) == "reference"] <- "ListEntry"

# Test your data again
head(rawDataCols)
write_csv(rawDataCols, 'HAR.csv')
message('This produces a 13 column data set of HAR data')

Unnamed: 0_level_0,dataset,end.date,entity,entry.date,name,point,ListEntry,start.date,typology,documentation.url,geography,heritage.at.risk,organisation
Unnamed: 0_level_1,<chr>,<lgl>,<int>,<chr>,<chr>,<chr>,<int>,<lgl>,<chr>,<chr>,<lgl>,<lgl>,<lgl>
1,heritage-at-risk,,7500000,2024-09-26,Roman fort west of Eaton House,POINT(-2.141775 52.692762),1006098,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35747,,,
2,heritage-at-risk,,7500001,2021-05-27,Barlinch Priory,POINT(-3.529384 51.050242),1006213,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/42043,,,
3,heritage-at-risk,,7500002,2024-09-26,Roman villa 150 yards (140 metres) south east of Lea Hall,POINT(-2.862003 52.670884),1006246,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/41990,,,
4,heritage-at-risk,,7500003,2024-09-26,Abutment of Roman Bridge at Radnor Bridge,POINT(-2.703145 52.618715),1006280,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/47933,,,
5,heritage-at-risk,,7500004,2024-09-26,Defences to the Saxon town [within Bull and Kine Croft],POINT(-1.127250 51.601647),1006329,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35923,,,
6,heritage-at-risk,,7500005,2024-09-26,Rectangular enclosures 1100yds (1010m) north west of Mount Owen Farm,POINT(-1.542321 51.748031),1006348,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/30561,,,


This produces a 13 column data set of HAR data



We now have a slightly modified set of data. 20 columns has been reduced to 13. 
We are now going to get some extra data from the NHLE resources. Have a look at the [ArcView Feature Server](https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/ArcGIS/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer) page to work out what to download. There is an array of data layers with corresponding integers - eg 0, 6 etc and the server is **ZOdPfBS3aqqDYPUQ**. You will want all these data and all the fields, so wild card away.

In [118]:
# Load the jsonlite library
library(jsonlite)
# Get the json response from the ARCVIEW server
# To get the parameters required read the API manual https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service/
# The id for the server is ZOdPfBS3aqqDYPUQ
# The FeatureServer is 0
# The dataset for NHLE points is National_Heritage_List_for_England_NHLE_v02_VIEW
countNHLE <- fromJSON('https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/ArcGIS/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/0/query?where=0%3D0&objectIds=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&resultType=none&distance=0.0&units=esriSRUnit_Meter&relationParam=&returnGeodetic=false&outFields=*&returnGeometry=true&returnEnvelope=false&featureEncoding=esriDefault&multipatchOption=xyFootprint&maxAllowableOffset=&geometryPrecision=&outSR=&defaultSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=true&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&collation=&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=200&returnZ=false&returnM=false&returnTrueCurves=true&returnExceededLimitFeatures=false&quantizationParameters=&sqlFormat=none&f=json&token=')
# Get the count
totalNHLE <- countNHLE$count[1]
# Print this value
message(paste0('The total number of NHLE listed buildings equals: ', totalNHLE))
# We are going to get 1000 records at a time
recordsToReturn <- 50
# We are going to paginate this response
pagination <- ceiling(totalNHLE/recordsToReturn)
# Set the pagination limit low for this demo script 
pagination <- 20

The total number of NHLE listed buildings equals: 379557



## Download the NHLE listed buildings data

We are now going to download data from the NHLE listed buildings data feature layer. This is layer 0. 

In [48]:
# Obtain data from the ArcGIS feature server
# The id for the server is ZOdPfBS3aqqDYPUQ
# The FeatureServer is 0
# The dataset for NHLE points is National_Heritage_List_for_England_NHLE_v02_VIEW
url <- paste0("https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/0/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=", recordsToReturn)
json <- fromJSON(url)
data <- json$features$attributes
# Decide which columns to keep
keeps <- c("ListEntry","Name","Grade","hyperlink","NGR","Easting","Northing")
data <- data[,(names(data) %in% keeps)]

# Now paginate through the data set
for (i in seq(from=(1 * recordsToReturn), to=(pagination*recordsToReturn), by=recordsToReturn)){
  urlDownload <- paste(url, '&resultOffset=', i, sep='')
  print(urlDownload)
  pagedJson <- fromJSON(urlDownload)
  records <- pagedJson$features$attributes
  records <- records[,(names(records) %in% keeps)]
  data <-rbind(data,records)
  # Add a snooze so we don't get blocked easily
  Sys.sleep(1.0)
}
head(data)

message('This produces a data frame that is 7 column data set') 

# Check if packages exist and if not install them for use
packages <- c("sf", "utils")
any_not_installed <- !all(packages %in% installed.packages()[, "Package"])
if (any_not_installed) {
  # Code to execute if at least one package is not installed
  message("At least one of the packages is not installed.")
  # Install missing packages
  missing_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
} 

library(utils)
library(sf)

# Subset the point data
pointData <- subset(data, select = c("Easting","Northing"))

## Create coordinates variable
coordsCast <- cbind(Easting = as.numeric(as.character(pointData$Easting)),
                Northing = as.numeric(as.character(pointData$Northing)))

# Create an sf object with BNG coordinates
bng_coords <- st_as_sf(data.frame(coordsCast), 
                     coords = c("Easting", "Northing"), 
                     crs = 27700) 

# Transform to WGS84 (EPSG:4326)
wgs84_coords <- st_transform(bng_coords, 4326)

# Extract latitude and longitude and append to data
data$lat <- st_coordinates(wgs84_coords)[, "Y"]
data$lon <- st_coordinates(wgs84_coords)[, "X"]

# Print the head rows of the new data frame
head(data)
message('This produces a data frame that is 9 column data set') 

# Write to a csv file 
write_csv(data, 'NHLE.csv')

[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/0/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=50&resultOffset=50"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/0/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=50&resultOffset=100"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/0/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=50&resultOffset=150"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/0/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=50&resultOffset=200"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/Featu

Unnamed: 0_level_0,ListEntry,Name,Grade,hyperlink,NGR,Easting,Northing
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<int>,<int>
1,1021466,20 and 20A Whitbourne Springs,II,https://historicengland.org.uk/listing/the-list/list-entry/1021466,ST8338544428,383389,144430
2,1021467,TENNIS CORNER FARMHOUSE WITH GRANARY AND STABLE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021467,ST 82839 50851,382839,150851
3,1021468,CHALCOT HOUSE,II*,https://historicengland.org.uk/listing/the-list/list-entry/1021468,ST 84294 48824,384294,148824
4,1021469,FIVE LORDS FARMHOUSE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021469,ST 82772 50187,382772,150187
5,1021470,PENLEIGH MILL,II,https://historicengland.org.uk/listing/the-list/list-entry/1021470,ST 85722 50467,385722,150467
6,1021471,PENLEIGH HOUSE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021471,ST 85622 50828,385622,150828


This produces a data frame that is 7 column data set



Unnamed: 0_level_0,ListEntry,Name,Grade,hyperlink,NGR,Easting,Northing,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
1,1021466,20 and 20A Whitbourne Springs,II,https://historicengland.org.uk/listing/the-list/list-entry/1021466,ST8338544428,383389,144430,51.19884,-2.239118
2,1021467,TENNIS CORNER FARMHOUSE WITH GRANARY AND STABLE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021467,ST 82839 50851,382839,150851,51.25656,-2.2473
3,1021468,CHALCOT HOUSE,II*,https://historicengland.org.uk/listing/the-list/list-entry/1021468,ST 84294 48824,384294,148824,51.23838,-2.22636
4,1021469,FIVE LORDS FARMHOUSE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021469,ST 82772 50187,382772,150187,51.25059,-2.248227
5,1021470,PENLEIGH MILL,II,https://historicengland.org.uk/listing/the-list/list-entry/1021470,ST 85722 50467,385722,150467,51.25319,-2.205971
6,1021471,PENLEIGH HOUSE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021471,ST 85622 50828,385622,150828,51.25643,-2.207419


This produces a data frame that is 9 column data set



We are now going to do this for the scheduled monuments, conservation and battlefields listed data sets as well. Same drill, similar outcome. 

## Download the NHLE Scheduled Monument Data

Now to repeat the steps above, but for a different set of data, this time for scheduled monument data. This is feature server layer 6. 

In [119]:
# Load the jsonlite library
library(jsonlite)
# Get the json response from the ARCVIEW server
# To get the parameters required read the API manual https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service/
# The id for the server is ZOdPfBS3aqqDYPUQ
# The FeatureServer is 6
# The dataset for NHLE points is National_Heritage_List_for_England_NHLE_v02_VIEW
countNHLE_Monuments <- fromJSON('https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/ArcGIS/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/6/query?where=0%3D0&objectIds=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&resultType=none&distance=0.0&units=esriSRUnit_Meter&relationParam=&returnGeodetic=false&outFields=*&returnGeometry=true&returnEnvelope=false&featureEncoding=esriDefault&multipatchOption=xyFootprint&maxAllowableOffset=&geometryPrecision=&outSR=&defaultSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=true&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&collation=&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=200&returnZ=false&returnM=false&returnTrueCurves=true&returnExceededLimitFeatures=false&quantizationParameters=&sqlFormat=none&f=json&token=')
# Get the count
totalScheduled <- countNHLE_Monuments$count[1]
# Print this value
message(totalScheduled)
# We are going to get 1000 records at a time
recordsToReturn <- 10
# We are going to paginate this response
pagination <- ceiling(totalScheduled/recordsToReturn)

# Set the pagination to 10 for this demo
pagination <- 10

# Obtain data from the ArcGIS feature server - Scheduled monuments are layer 6
url <- paste0("https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/6/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=", recordsToReturn)
json <- fromJSON(url)
data <- json$features$attributes
# Decide which columns to keep
keeps <- c("ListEntry","Name","Grade","hyperlink","NGR","Easting","Northing")
data <- data[,(names(data) %in% keeps)]

# Now paginate through the data set
for (i in seq(from=(1 * recordsToReturn), to=(pagination*recordsToReturn), by=recordsToReturn)){
  urlDownload <- paste(url, '&resultOffset=', i, sep='')
  print(urlDownload)
  pagedJson <- fromJSON(urlDownload)
  records <- pagedJson$features$attributes
  records <- records[,(names(records) %in% keeps)]
  data <-rbind(data,records)
  # Add a snooze so we don't get blocked easily
  Sys.sleep(1.0)
}
data$grade <- NA
head(data)

message('This produces a data frame that is 7 column data set') 

# Check if packages exist and if not install them for use
packages <- c("sf", "utils")
any_not_installed <- !all(packages %in% installed.packages()[, "Package"])
if (any_not_installed) {
  # Code to execute if at least one package is not installed
  message("At least one of the packages is not installed.")
  # Install missing packages
  missing_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
} 
library(utils)
library(sf)

# Subset the point data
pointData <- subset(data, select = c("Easting","Northing"))

## Create coordinates variable
coordsCast <- cbind(Easting = as.numeric(as.character(pointData$Easting)),
                Northing = as.numeric(as.character(pointData$Northing)))

# Create an sf object with BNG coordinates
bng_coords <- st_as_sf(data.frame(coordsCast), 
                     coords = c("Easting", "Northing"), 
                     crs = 27700) 

# Transform to WGS84 (EPSG:4326)
wgs84_coords <- st_transform(bng_coords, 4326)

# Extract latitude and longitude and append to data
data$lat <- st_coordinates(wgs84_coords)[, "Y"]
data$lon <- st_coordinates(wgs84_coords)[, "X"]

# Print the head rows of the new data frame
head(data)
message('This produces a data frame that is 9 column data set') 

# Write to a csv file 
write_csv(data, 'ScheduledMonuments.csv')

19990



[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/6/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=10"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/6/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=20"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/6/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=30"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/6/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=40"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureS

Unnamed: 0_level_0,ListEntry,Name,hyperlink,NGR,Easting,Northing,grade
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>
1,1001718,Mound S of Woodbrook,https://historicengland.org.uk/listing/the-list/list-entry/1001718,SO 30447 54456,330447,254456,
2,1001719,Castle Twts,https://historicengland.org.uk/listing/the-list/list-entry/1001719,SO 27703 55474,327703,255474,
3,1001720,Lyonshall Castle,https://historicengland.org.uk/listing/the-list/list-entry/1001720,SO 33165 56319,333165,256319,
4,1001721,Mound 1200yds (1100m) NNE of the Church,https://historicengland.org.uk/listing/the-list/list-entry/1001721,SO 37237 68101,337237,268101,
5,1001722,Limebrook Priory,https://historicengland.org.uk/listing/the-list/list-entry/1001722,SO 37411 66071,337411,266071,
6,1001723,"Two bowl barrows, one 220m east of Lower Longbeak and the other 320m east of Higher Longbeak",https://historicengland.org.uk/listing/the-list/list-entry/1001723,"SS 19954 03896, SS 19871 03235",219871,103235,


This produces a data frame that is 7 column data set



Unnamed: 0_level_0,ListEntry,Name,hyperlink,NGR,Easting,Northing,grade,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>
1,1001718,Mound S of Woodbrook,https://historicengland.org.uk/listing/the-list/list-entry/1001718,SO 30447 54456,330447,254456,,52.18395,-3.018721
2,1001719,Castle Twts,https://historicengland.org.uk/listing/the-list/list-entry/1001719,SO 27703 55474,327703,255474,,52.19274,-3.059066
3,1001720,Lyonshall Castle,https://historicengland.org.uk/listing/the-list/list-entry/1001720,SO 33165 56319,333165,256319,,52.20103,-2.979341
4,1001721,Mound 1200yds (1100m) NNE of the Church,https://historicengland.org.uk/listing/the-list/list-entry/1001721,SO 37237 68101,337237,268101,,52.30742,-2.921963
5,1001722,Limebrook Priory,https://historicengland.org.uk/listing/the-list/list-entry/1001722,SO 37411 66071,337411,266071,,52.28919,-2.919033
6,1001723,"Two bowl barrows, one 220m east of Lower Longbeak and the other 320m east of Higher Longbeak",https://historicengland.org.uk/listing/the-list/list-entry/1001723,"SS 19954 03896, SS 19871 03235",219871,103235,,50.80061,-4.557574


This produces a data frame that is 9 column data set



# Download Registered Parks and Gardens

We now download registered parks and gardens data, the feature layer is 7. 

In [120]:
# Load the jsonlite library
library(jsonlite)
# Get the json response from the ARCVIEW server
# To get the parameters required read the API manual https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service/
# The id for the server is ZOdPfBS3aqqDYPUQ
# The FeatureServer is 7
# The dataset for NHLE points is National_Heritage_List_for_England_NHLE_v02_VIEW
countNHLEParks <- fromJSON('https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/ArcGIS/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/7/query?where=0%3D0&objectIds=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&resultType=none&distance=0.0&units=esriSRUnit_Meter&relationParam=&returnGeodetic=false&outFields=*&returnGeometry=true&returnEnvelope=false&featureEncoding=esriDefault&multipatchOption=xyFootprint&maxAllowableOffset=&geometryPrecision=&outSR=&defaultSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=true&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&collation=&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=200&returnZ=false&returnM=false&returnTrueCurves=true&returnExceededLimitFeatures=false&quantizationParameters=&sqlFormat=none&f=json&token=')
# Get the count
totalParks <- countNHLEParks$count[1]
# Print this value
head(totalParks)
# We are going to get 10 records at a time
recordsToReturn <- 10
# We are going to paginate this response
pagination <- ceiling(totalParks/recordsToReturn)
# Set the pagination level to 10 for this demo.
pagination <- 10
# Obtain data from the ArcGIS feature server 
url <- paste0("https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/7/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=", recordsToReturn)
json <- fromJSON(url)
data <- json$features$attributes
# Decide which columns to keep
keeps <- c("ListEntry","Name","Grade","hyperlink","NGR","Easting","Northing")
data <- data[,(names(data) %in% keeps)]

# Now paginate through the data set
for (i in seq(from=(1 * recordsToReturn), to=(pagination*recordsToReturn), by=recordsToReturn)){
  urlDownload <- paste(url, '&resultOffset=', i, sep='')
  print(urlDownload)
  pagedJson <- fromJSON(urlDownload)
  records <- pagedJson$features$attributes
  records <- records[,(names(records) %in% keeps)]
  data <-rbind(data,records)
  # Add a snooze so we don't get blocked easily
  Sys.sleep(1.0)
}
head(data)

message('This produces a data frame that is 7 column data set') 

# Check if packages exist and if not install them for use
packages <- c("sf", "utils")
any_not_installed <- !all(packages %in% installed.packages()[, "Package"])
if (any_not_installed) {
  # Code to execute if at least one package is not installed
  message("At least one of the packages is not installed.")
  # Install missing packages
  missing_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
} 
library(utils)
library(sf)

# Subset the point data
pointData <- subset(data, select = c("Easting","Northing"))

## Create coordinates variable
coordsCast <- cbind(Easting = as.numeric(as.character(pointData$Easting)),
                Northing = as.numeric(as.character(pointData$Northing)))

# Create an sf object with BNG coordinates
bng_coords <- st_as_sf(data.frame(coordsCast), 
                     coords = c("Easting", "Northing"), 
                     crs = 27700) 

# Transform to WGS84 (EPSG:4326)
wgs84_coords <- st_transform(bng_coords, 4326)

# Extract latitude and longitude and append to data
data$lat <- st_coordinates(wgs84_coords)[, "Y"]
data$lon <- st_coordinates(wgs84_coords)[, "X"]

# Print the head rows of the new data frame
head(data)
message('This produces a data frame that is 9 column data set') 

# Write to a csv file 
write_csv(data, 'Parks.csv')
message('These data have been written to Parks.csv') 


[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/7/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=10"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/7/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=20"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/7/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=30"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/7/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=40"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureS

Unnamed: 0_level_0,ListEntry,Name,Grade,hyperlink,NGR,Easting,Northing
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<int>,<int>
1,1000107,ROUSHAM,I,https://historicengland.org.uk/listing/the-list/list-entry/1000107,"SP 48283 26028, SP 47678 24234, SP4802625124",447679,224233
2,1000108,HAMPTON COURT,I,https://historicengland.org.uk/listing/the-list/list-entry/1000108,TQ 16570 68051,516570,168051
3,1000109,HIGHCLERE PARK,I,https://historicengland.org.uk/listing/the-list/list-entry/1000109,SU4498459143,444984,159257
4,1000110,FARNBOROUGH HALL,I,https://historicengland.org.uk/listing/the-list/list-entry/1000110,SP4274249465,443000,249084
5,1000111,CHISWICK HOUSE,I,https://historicengland.org.uk/listing/the-list/list-entry/1000111,TQ2083077489,520882,177637
6,1000112,LEIGH PARK (STAUNTON COUNTRY PARK),II*,https://historicengland.org.uk/listing/the-list/list-entry/1000112,SU 71991 08944,471991,108944


This produces a data frame that is 7 column data set



Unnamed: 0_level_0,ListEntry,Name,Grade,hyperlink,NGR,Easting,Northing,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
1,1000107,ROUSHAM,I,https://historicengland.org.uk/listing/the-list/list-entry/1000107,"SP 48283 26028, SP 47678 24234, SP4802625124",447679,224233,51.91459,-1.3082246
2,1000108,HAMPTON COURT,I,https://historicengland.org.uk/listing/the-list/list-entry/1000108,TQ 16570 68051,516570,168051,51.39952,-0.3256924
3,1000109,HIGHCLERE PARK,I,https://historicengland.org.uk/listing/the-list/list-entry/1000109,SU4498459143,444984,159257,51.33064,-1.3557267
4,1000110,FARNBOROUGH HALL,I,https://historicengland.org.uk/listing/the-list/list-entry/1000110,SP4274249465,443000,249084,52.13839,-1.3731311
5,1000111,CHISWICK HOUSE,I,https://historicengland.org.uk/listing/the-list/list-entry/1000111,TQ2083077489,520882,177637,51.48478,-0.260467
6,1000112,LEIGH PARK (STAUNTON COUNTRY PARK),II*,https://historicengland.org.uk/listing/the-list/list-entry/1000112,SU 71991 08944,471991,108944,50.87552,-0.9781671


This produces a data frame that is 9 column data set

These data have been written to Parks.csv



# Download Registered Battlefields

In [121]:
# Load the jsonlite library
library(jsonlite)
# Get the json response from the ARCVIEW server
# To get the parameters required read the API manual https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service/
# The id for the server is ZOdPfBS3aqqDYPUQ
# The FeatureServer layer is 8
# The dataset for NHLE points is National_Heritage_List_for_England_NHLE_v02_VIEW
countNHLEBattlefields <- fromJSON('https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/ArcGIS/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/8/query?where=0%3D0&objectIds=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&resultType=none&distance=0.0&units=esriSRUnit_Meter&relationParam=&returnGeodetic=false&outFields=*&returnGeometry=true&returnEnvelope=false&featureEncoding=esriDefault&multipatchOption=xyFootprint&maxAllowableOffset=&geometryPrecision=&outSR=&defaultSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=true&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&collation=&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=200&returnZ=false&returnM=false&returnTrueCurves=true&returnExceededLimitFeatures=false&quantizationParameters=&sqlFormat=none&f=json&token=')
# Get the count
totalBattles <- countNHLEBattlefields$count[1]
# Print this value
head(totalBattles)
# We are going to get 10 records at a time
recordsToReturn <- 10
# We are going to paginate this response
pagination <- ceiling(totalParks/recordsToReturn)
# For this demo set the pagination to 10 
paginatin <- 10 
message(paste0('Pages to download: ', pagination))

# Obtain data from the ArcGIS feature server - Battlefields are layer 8
url <- paste0("https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/8/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=", recordsToReturn)
json <- fromJSON(url)
data <- json$features$attributes
# Decide which columns to keep
keeps <- c("ListEntry","Name","Grade","hyperlink","NGR","Easting","Northing","geometry")
data <- data[,(names(data) %in% keeps)]

# Now paginate through the data set
for (i in seq(from=(1 * recordsToReturn), to=(pagination*recordsToReturn), by=recordsToReturn)){
  urlDownload <- paste(url, '&resultOffset=', i, sep='')
  print(urlDownload)
  pagedJson <- fromJSON(urlDownload)
  records <- pagedJson$features$attributes
  records <- records[,(names(records) %in% keeps)]
  data <-rbind(data,records)
  # Add a snooze so we don't get blocked easily
  Sys.sleep(1.0)
}
data$Grade <- NA
head(data)

message('This produces a data frame that is 6 column data set') 

# Check if packages exist and if not install them for use
packages <- c("sf", "utils")
any_not_installed <- !all(packages %in% installed.packages()[, "Package"])
if (any_not_installed) {
  # Code to execute if at least one package is not installed
  message("At least one of the packages is not installed.")
  # Install missing packages
  missing_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
} 
library(utils)
library(sf)

# Subset the point data
pointData <- subset(data, select = c("Easting","Northing"))

## Create coordinates variable
coordsCast <- cbind(Easting = as.numeric(as.character(pointData$Easting)),
                Northing = as.numeric(as.character(pointData$Northing)))

# Create an sf object with BNG coordinates
bng_coords <- st_as_sf(data.frame(coordsCast), 
                     coords = c("Easting", "Northing"), 
                     crs = 27700) 

# Transform to WGS84 (EPSG:4326)
wgs84_coords <- st_transform(bng_coords, 4326)

# Extract latitude and longitude and append to data
data$lat <- st_coordinates(wgs84_coords)[, "Y"]
data$lon <- st_coordinates(wgs84_coords)[, "X"]

# Print the head rows of the new data frame
head(data)
message('This produces a data frame that is a 9 column data set') 

# Write to a csv file 
write_csv(data, 'Battlefields.csv')
message('These data have been written to Battlefields.csv') 


Pages to download172



[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/8/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=10"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/8/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=20"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/8/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=30"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureServer/8/query?where=0%3D0&outFields=%2A&f=json&resultRecordCount=10&resultOffset=40"
[1] "https://services-eu1.arcgis.com/ZOdPfBS3aqqDYPUQ/arcgis/rest/services/National_Heritage_List_for_England_NHLE_v02_VIEW/FeatureS

Unnamed: 0_level_0,ListEntry,Name,hyperlink,NGR,Easting,Northing,Grade
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>
1,1000000,Battle of Adwalton Moor 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000000,SE2164428855,421360,429026,
2,1000001,Battle of Barnet 1471,https://historicengland.org.uk/listing/the-list/list-entry/1000001,TQ 24514 97590,524514,197590,
3,1000002,Battle of Blore Heath 1459,https://historicengland.org.uk/listing/the-list/list-entry/1000002,SJ 71345 35300,371345,335300,
4,1000003,Battle of Boroughbridge 1322,https://historicengland.org.uk/listing/the-list/list-entry/1000003,SE 39851 67186,439851,467186,
5,1000004,Battle of Bosworth (Field) 1485,https://historicengland.org.uk/listing/the-list/list-entry/1000004,SP3947698675,439476,298925,
6,1000005,Battle of Braddock Down 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000005,SX 17575 63008,217575,63008,


This produces a data frame that is 6 column data set



Unnamed: 0_level_0,ListEntry,Name,hyperlink,NGR,Easting,Northing,Grade,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>
1,1000000,Battle of Adwalton Moor 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000000,SE2164428855,421360,429026,,53.75716,-1.6775108
2,1000001,Battle of Barnet 1471,https://historicengland.org.uk/listing/the-list/list-entry/1000001,TQ 24514 97590,524514,197590,,51.6633,-0.2011212
3,1000002,Battle of Blore Heath 1459,https://historicengland.org.uk/listing/the-list/list-entry/1000002,SJ 71345 35300,371345,335300,,52.91435,-2.4275743
4,1000003,Battle of Boroughbridge 1322,https://historicengland.org.uk/listing/the-list/list-entry/1000003,SE 39851 67186,439851,467186,,54.09903,-1.3921086
5,1000004,Battle of Bosworth (Field) 1485,https://historicengland.org.uk/listing/the-list/list-entry/1000004,SP3947698675,439476,298925,,52.5867,-1.418771
6,1000005,Battle of Braddock Down 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000005,SX 17575 63008,217575,63008,,50.4385,-4.5703334


This produces a data frame that is a 9 column data set

These data have been written to Battlefields.csv



# Now start merging files

We now want to have the merged files for all the common data to enable the next stage. This has been done for simplicity, there are other ways to do this. I wanted the person reading this to see the process. First load your data and check what it looks like with head. 

In [54]:
# Load necessary library
library(dplyr)
# Read the CSV files
file0 <- read.csv("HAR.csv")
head(file0)
file1 <- read.csv("NHLE.csv")
head(file1)
file2 <- read.csv("Battlefields.csv")
head(file2)
file3 <- read.csv("Parks.csv")
head(file3)
file4 <- read.csv("ScheduledMonuments.csv")
head(file4)

# Set the common column, which is always ListEntry
common_column <- "ListEntry" 



Unnamed: 0_level_0,dataset,end.date,entity,entry.date,name,point,ListEntry,start.date,typology,documentation.url,geography,heritage.at.risk,organisation
Unnamed: 0_level_1,<chr>,<lgl>,<int>,<chr>,<chr>,<chr>,<int>,<lgl>,<chr>,<chr>,<lgl>,<lgl>,<lgl>
1,heritage-at-risk,,7500000,2024-09-26,Roman fort west of Eaton House,POINT(-2.141775 52.692762),1006098,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35747,,,
2,heritage-at-risk,,7500001,2021-05-27,Barlinch Priory,POINT(-3.529384 51.050242),1006213,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/42043,,,
3,heritage-at-risk,,7500002,2024-09-26,Roman villa 150 yards (140 metres) south east of Lea Hall,POINT(-2.862003 52.670884),1006246,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/41990,,,
4,heritage-at-risk,,7500003,2024-09-26,Abutment of Roman Bridge at Radnor Bridge,POINT(-2.703145 52.618715),1006280,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/47933,,,
5,heritage-at-risk,,7500004,2024-09-26,Defences to the Saxon town [within Bull and Kine Croft],POINT(-1.127250 51.601647),1006329,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35923,,,
6,heritage-at-risk,,7500005,2024-09-26,Rectangular enclosures 1100yds (1010m) north west of Mount Owen Farm,POINT(-1.542321 51.748031),1006348,,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/30561,,,


Unnamed: 0_level_0,ListEntry,Name,Grade,hyperlink,NGR,Easting,Northing,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
1,1021466,20 and 20A Whitbourne Springs,II,https://historicengland.org.uk/listing/the-list/list-entry/1021466,ST8338544428,383389,144430,51.19884,-2.239118
2,1021467,TENNIS CORNER FARMHOUSE WITH GRANARY AND STABLE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021467,ST 82839 50851,382839,150851,51.25656,-2.2473
3,1021468,CHALCOT HOUSE,II*,https://historicengland.org.uk/listing/the-list/list-entry/1021468,ST 84294 48824,384294,148824,51.23838,-2.22636
4,1021469,FIVE LORDS FARMHOUSE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021469,ST 82772 50187,382772,150187,51.25059,-2.248227
5,1021470,PENLEIGH MILL,II,https://historicengland.org.uk/listing/the-list/list-entry/1021470,ST 85722 50467,385722,150467,51.25319,-2.205971
6,1021471,PENLEIGH HOUSE,II,https://historicengland.org.uk/listing/the-list/list-entry/1021471,ST 85622 50828,385622,150828,51.25643,-2.207419


Unnamed: 0_level_0,ListEntry,Name,hyperlink,NGR,Easting,Northing,grade,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>
1,1000000,Battle of Adwalton Moor 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000000,SE2164428855,421360,429026,,53.75716,-1.6775108
2,1000001,Battle of Barnet 1471,https://historicengland.org.uk/listing/the-list/list-entry/1000001,TQ 24514 97590,524514,197590,,51.6633,-0.2011212
3,1000002,Battle of Blore Heath 1459,https://historicengland.org.uk/listing/the-list/list-entry/1000002,SJ 71345 35300,371345,335300,,52.91435,-2.4275743
4,1000003,Battle of Boroughbridge 1322,https://historicengland.org.uk/listing/the-list/list-entry/1000003,SE 39851 67186,439851,467186,,54.09903,-1.3921086
5,1000004,Battle of Bosworth (Field) 1485,https://historicengland.org.uk/listing/the-list/list-entry/1000004,SP3947698675,439476,298925,,52.5867,-1.418771
6,1000005,Battle of Braddock Down 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000005,SX 17575 63008,217575,63008,,50.4385,-4.5703334


Unnamed: 0_level_0,ListEntry,Name,Grade,hyperlink,NGR,Easting,Northing,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
1,1000107,ROUSHAM,I,https://historicengland.org.uk/listing/the-list/list-entry/1000107,"SP 48283 26028, SP 47678 24234, SP4802625124",447679,224233,51.91459,-1.3082246
2,1000108,HAMPTON COURT,I,https://historicengland.org.uk/listing/the-list/list-entry/1000108,TQ 16570 68051,516570,168051,51.39952,-0.3256924
3,1000109,HIGHCLERE PARK,I,https://historicengland.org.uk/listing/the-list/list-entry/1000109,SU4498459143,444984,159257,51.33064,-1.3557267
4,1000110,FARNBOROUGH HALL,I,https://historicengland.org.uk/listing/the-list/list-entry/1000110,SP4274249465,443000,249084,52.13839,-1.3731311
5,1000111,CHISWICK HOUSE,I,https://historicengland.org.uk/listing/the-list/list-entry/1000111,TQ2083077489,520882,177637,51.48478,-0.260467
6,1000112,LEIGH PARK (STAUNTON COUNTRY PARK),II*,https://historicengland.org.uk/listing/the-list/list-entry/1000112,SU 71991 08944,471991,108944,50.87552,-0.9781671


Unnamed: 0_level_0,ListEntry,Name,hyperlink,NGR,Easting,Northing,grade,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>
1,1001718,Mound S of Woodbrook,https://historicengland.org.uk/listing/the-list/list-entry/1001718,SO 30447 54456,330447,254456,,52.18395,-3.018721
2,1001719,Castle Twts,https://historicengland.org.uk/listing/the-list/list-entry/1001719,SO 27703 55474,327703,255474,,52.19274,-3.059066
3,1001720,Lyonshall Castle,https://historicengland.org.uk/listing/the-list/list-entry/1001720,SO 33165 56319,333165,256319,,52.20103,-2.979341
4,1001721,Mound 1200yds (1100m) NNE of the Church,https://historicengland.org.uk/listing/the-list/list-entry/1001721,SO 37237 68101,337237,268101,,52.30742,-2.921963
5,1001722,Limebrook Priory,https://historicengland.org.uk/listing/the-list/list-entry/1001722,SO 37411 66071,337411,266071,,52.28919,-2.919033
6,1001723,"Two bowl barrows, one 220m east of Lower Longbeak and the other 320m east of Higher Longbeak",https://historicengland.org.uk/listing/the-list/list-entry/1001723,"SS 19954 03896, SS 19871 03235",219871,103235,,50.80061,-4.557574


Merge the National Heritage Listed Buildings with Heritage at Risk. As this is a demo, you will not get the entire dataset. 

In [70]:
# Now start enriching by scraping
# Filter file0 and join with NHLE
common_rows_nhle <- merge(file0, file1, by = common_column) 
head(common_rows_nhle)

Unnamed: 0_level_0,ListEntry,dataset,end.date,entity,entry.date,name,point,start.date,typology,documentation.url,⋯,heritage.at.risk,organisation,Name,Grade,hyperlink,NGR,Easting,Northing,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<lgl>,<int>,<chr>,<chr>,<chr>,<lgl>,<chr>,<chr>,⋯,<lgl>,<lgl>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
1,1021635,heritage-at-risk,,7501659,2024-09-26,"Courtfield House, Polebarn Road",POINT(-2.202634 51.319415),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/327822,⋯,,,COURTFIELD HOUSE,II*,https://historicengland.org.uk/listing/the-list/list-entry/1021635,ST 85975 57832,385975,157832,51.31942,-2.202638


Merge the Registered Battlefields data with Heritage at risk. As this is a demo, you won't have the entire dataset. 

In [71]:
common_rows_battlefields <- merge(file0, file2, by = common_column) 
head(common_rows_battlefields)

Unnamed: 0_level_0,ListEntry,dataset,end.date,entity,entry.date,name,point,start.date,typology,documentation.url,⋯,heritage.at.risk,organisation,Name,hyperlink,NGR,Easting,Northing,grade,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<lgl>,<int>,<chr>,<chr>,<chr>,<lgl>,<chr>,<chr>,⋯,<lgl>,<lgl>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>
1,1000000,heritage-at-risk,,7500659,2024-09-26,Battle of Adwalton Moor,POINT(-1.675013 53.756361),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24508,⋯,,,Battle of Adwalton Moor 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000000,SE2164428855,421360,429026,,53.75716,-1.677511
2,1000003,heritage-at-risk,,7500377,2024-09-26,Battle of Boroughbridge,POINT(-1.390626 54.098949),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24509,⋯,,,Battle of Boroughbridge 1322,https://historicengland.org.uk/listing/the-list/list-entry/1000003,SE 39851 67186,439851,467186,,54.09903,-1.392109
3,1000025,heritage-at-risk,,7500660,2024-09-26,Battle of Newburn Ford,POINT(-1.745530 54.978350),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24542,⋯,,,Battle of Newburn Ford 1640,https://historicengland.org.uk/listing/the-list/list-entry/1000025,NZ 16383 63953,416383,563953,,54.96993,-1.74562


Merge parks data with Heritage at Risk. As this is a demo, you won't have the entire dataset. 

In [72]:
common_rows_parks <- merge(file0, file3, by = common_column) 
head(common_rows_parks)

Unnamed: 0_level_0,ListEntry,dataset,end.date,entity,entry.date,name,point,start.date,typology,documentation.url,⋯,heritage.at.risk,organisation,Name,Grade,hyperlink,NGR,Easting,Northing,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<lgl>,<int>,<chr>,<chr>,<chr>,<lgl>,<chr>,<chr>,⋯,<lgl>,<lgl>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
1,1000124,heritage-at-risk,,7500379,2024-09-26,Crewe Hall,POINT(-2.399744 53.084148),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/26104,⋯,,,CREWE HALL,II,https://historicengland.org.uk/listing/the-list/list-entry/1000124,SJ7337054317,373275,353572,53.0787,-2.40039146
2,1000129,heritage-at-risk,,7500337,2024-09-26,Stoke Park,POINT(-2.552928 51.490522),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/26148,⋯,,,STOKE PARK,II,https://historicengland.org.uk/listing/the-list/list-entry/1000129,ST 61723 77175,361723,177175,51.49222,-2.55272411
3,1000137,heritage-at-risk,,7500380,2024-09-26,Thwaite Hall,POINT(-0.401842 53.780615),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/26152,⋯,,,THWAITE HALL,II,https://historicengland.org.uk/listing/the-list/list-entry/1000137,TA 05392 32798,505392,432798,53.78085,-0.40203228
4,1000155,heritage-at-risk,,7500395,2024-09-26,Shrubland Hall,POINT(1.103647 52.133384),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24684,⋯,,,SHRUBLAND HALL,I,https://historicengland.org.uk/listing/the-list/list-entry/1000155,TM1181653099,612274,252889,52.13351,1.10019451
5,1000165,heritage-at-risk,,7500351,2024-09-26,Bramshill Park,POINT(-0.910021 51.332322),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24706,⋯,,,Bramshill Park,I,https://historicengland.org.uk/listing/the-list/list-entry/1000165,SU7626960141,476801,159656,51.33082,-0.89904043
6,1000194,heritage-at-risk,,7501977,2024-09-26,Wanstead Park,POINT(0.032099 51.567394),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24710,⋯,,,WANSTEAD PARK,II*,https://historicengland.org.uk/listing/the-list/list-entry/1000194,TQ4104087270,540989,187461,51.5684,0.03285775


Merge scheduled monuments data. 

In [73]:
common_rows_scheduled <- merge(file0, file4, by = common_column) 
head(common_rows_scheduled)

Unnamed: 0_level_0,ListEntry,dataset,end.date,entity,entry.date,name,point,start.date,typology,documentation.url,⋯,heritage.at.risk,organisation,Name,hyperlink,NGR,Easting,Northing,grade,lat,lon
Unnamed: 0_level_1,<int>,<chr>,<lgl>,<int>,<chr>,<chr>,<chr>,<lgl>,<chr>,<chr>,⋯,<lgl>,<lgl>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>
1,1001731,heritage-at-risk,,7500726,2024-09-26,"Offa's Dyke: Rushock Hill section, extending 1630yds (1490m) east to Kennel Wood",POINT(-3.032993 52.229575),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37764,⋯,,,"Offa's Dyke: Rushock Hill section, extending 1630yds (1490m) E to Kennel Wood",https://historicengland.org.uk/listing/the-list/list-entry/1001731,SO 29562 59529,329562,259529,,52.22944,-3.03272
2,1001733,heritage-at-risk,,7500390,2021-12-01,Offa's Dyke: the section 630yds (580m) long west of Lyonshall,POINT(-2.984735 52.198062),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/34594,⋯,,,Offa's Dyke: the section 630yds (580m) long W of Lyonshall,https://historicengland.org.uk/listing/the-list/list-entry/1001733,SO 32790 55995,332790,255995,,52.19807,-2.984763
3,1001735,heritage-at-risk,,7500391,2024-09-26,Offa's Dyke: section north west of Holme Marsh extending 615 yards (560 metres) to the railway,POINT(-2.974065 52.189224),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37756,⋯,,,Offa's Dyke: section NW of Holme Marsh extending 615yds (560m) to the railway,https://historicengland.org.uk/listing/the-list/list-entry/1001735,SO 33493 55016,333493,255016,,52.18936,-2.974287
4,1001738,heritage-at-risk,,7500727,2024-09-26,Offa's Dyke: the section extending 950yds (870m) N and S of Big Oaks,POINT(-2.867624 52.083725),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/34587,⋯,,,Offa's Dyke: the section extending 950yds (870m) N and S of Big Oaks,https://historicengland.org.uk/listing/the-list/list-entry/1001738,"SO 40636 43192, SO 40811 42789",340636,243192,,52.08388,-2.867744
5,1001747,heritage-at-risk,,7500728,2024-09-26,Sutton Walls (camp),POINT(-2.694551 52.113900),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35712,⋯,,,Sutton Walls (camp),https://historicengland.org.uk/listing/the-list/list-entry/1001747,SO 52520 46404,352520,246404,,52.11391,-2.694778
6,1001758,heritage-at-risk,,7500504,2024-09-26,Dinedor Camp,POINT(-2.695883 52.023538),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35701,⋯,,,Dinedor Camp,https://historicengland.org.uk/listing/the-list/list-entry/1001758,SO 52357 36358,352357,236358,,52.02358,-2.695753


## Now merge the 4 files

We will now merge all 4 files and write to a CSV file for reuse and scraping. 

In [116]:
library(dplyr)
merged <- bind_rows(common_rows_scheduled, common_rows_parks, common_rows_battlefields, common_rows_nhle)
names(merged)[names(merged) == "documentation.url"] <- "url"
head(merged)
write.csv(merged, "merged_har.csv", row.names = FALSE) 

Unnamed: 0_level_0,ListEntry,dataset,end.date,entity,entry.date,name,point,start.date,typology,url,⋯,organisation,Name,hyperlink,NGR,Easting,Northing,grade,lat,lon,Grade
Unnamed: 0_level_1,<int>,<chr>,<lgl>,<int>,<chr>,<chr>,<chr>,<lgl>,<chr>,<chr>,⋯,<lgl>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>,<chr>
1,1001731,heritage-at-risk,,7500726,2024-09-26,"Offa's Dyke: Rushock Hill section, extending 1630yds (1490m) east to Kennel Wood",POINT(-3.032993 52.229575),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37764,⋯,,"Offa's Dyke: Rushock Hill section, extending 1630yds (1490m) E to Kennel Wood",https://historicengland.org.uk/listing/the-list/list-entry/1001731,SO 29562 59529,329562,259529,,52.22944,-3.03272,
2,1001733,heritage-at-risk,,7500390,2021-12-01,Offa's Dyke: the section 630yds (580m) long west of Lyonshall,POINT(-2.984735 52.198062),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/34594,⋯,,Offa's Dyke: the section 630yds (580m) long W of Lyonshall,https://historicengland.org.uk/listing/the-list/list-entry/1001733,SO 32790 55995,332790,255995,,52.19807,-2.984763,
3,1001735,heritage-at-risk,,7500391,2024-09-26,Offa's Dyke: section north west of Holme Marsh extending 615 yards (560 metres) to the railway,POINT(-2.974065 52.189224),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37756,⋯,,Offa's Dyke: section NW of Holme Marsh extending 615yds (560m) to the railway,https://historicengland.org.uk/listing/the-list/list-entry/1001735,SO 33493 55016,333493,255016,,52.18936,-2.974287,
4,1001738,heritage-at-risk,,7500727,2024-09-26,Offa's Dyke: the section extending 950yds (870m) N and S of Big Oaks,POINT(-2.867624 52.083725),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/34587,⋯,,Offa's Dyke: the section extending 950yds (870m) N and S of Big Oaks,https://historicengland.org.uk/listing/the-list/list-entry/1001738,"SO 40636 43192, SO 40811 42789",340636,243192,,52.08388,-2.867744,
5,1001747,heritage-at-risk,,7500728,2024-09-26,Sutton Walls (camp),POINT(-2.694551 52.113900),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35712,⋯,,Sutton Walls (camp),https://historicengland.org.uk/listing/the-list/list-entry/1001747,SO 52520 46404,352520,246404,,52.11391,-2.694778,
6,1001758,heritage-at-risk,,7500504,2024-09-26,Dinedor Camp,POINT(-2.695883 52.023538),,geography,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35701,⋯,,Dinedor Camp,https://historicengland.org.uk/listing/the-list/list-entry/1001758,SO 52357 36358,352357,236358,,52.02358,-2.695753,


## Scraping Historic England's website for enhanced data

The Heritage at Risk data set that you obtain at the start has lots of missing data. These can be obtained by some simple scraping using Rvest. All the extra data is held within the HTML node with class .HARListEntry__bullets-container which will no doubt change in future when the website revamps are done. This is therefore a point in time piece of code. Some of these urls will fail as they don't seem to have a page. 

In [132]:
# Check if packages exist and if not install them for use
packages <- c("rvest", "readr", "tidyverse", "stringr")
any_not_installed <- !all(packages %in% installed.packages()[, "Package"])
if (any_not_installed) {
  # Code to execute if at least one package is not installed
  message("At least one of the packages is not installed.")
  # Install missing packages
  missing_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
} 

library(rvest)
library(readr)
library(tidyverse)
library(stringr)

# Get the urls list from the previous CSV file created 
urls <- read.csv("merged_har.csv")
# Create an empty list to store the scraped data
scraped_data <- list()
# Loop through the list of urls.
for (i in 1:nrow(urls)) {
  url <- urls[i, 10] # Assuming the URLs are in the tenth column
  reference <-urls[i,1] # Assuming the reference number is column 1
  print(url)
  print(reference)
  # Use tryCatch to handle connection errors
  tryCatch({
    page <- read_html(url)
    css_selector <- ".HARListEntry__bullets-container" # Replace with the actual CSS selector
    
    # Extract the text content from the specified CSS selector
    #text_content <- page %>% html_nodes(css_selector) %>% html_text()
    text_content <- page %>% 
      html_nodes(css_selector) %>% 
      html_text() %>% 
      paste(collapse = " ")

    # Store the extracted data in the list
    scraped_data[[i]] <- list(url = url, text_content = text_content)
    # Remove the page object to free up memory
    rm(page)
    closeAllConnections()
    # Run garbage collection to free up memory
    gc()
  }, error = function(e) {
    # Handle the error (e.g., print a message and continue)
    message(sprintf("Failed to scrape %s: %s", url, e$message))
  })
  # Print the extracted text content
  print(text_content)
  Sys.sleep(0.5)
}
errors_df <- do.call(rbind, lapply(errors, as.data.frame))
head(scraped_data_df)
head(errors_df)

Failed to scrape https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/34594: cannot open the connection



ERROR: Error in scraped_data[[i]]: subscript out of bounds


There's some code here that needs removing as it breaks the text steps in the next part. I did the simplest thing possible and removed what wasn't needed - Designated Site Name. 

In [128]:
remove_text_before_location <- function(text) {
   heritage_index <- str_locate(text, "Heritage Category:")
  if (!is.na(heritage_index[1])) { 
    return(str_sub(text, heritage_index[1])) 
  } else {
    return(text) 
  }
}

scraped_data_df <- scraped_data_df %>% 
  mutate(text_content = sapply(text_content, remove_text_before_location))
head(scraped_data_df)

# Write the scraped data to a new CSV file
write_csv(scraped_data_df, "scraped_data.csv") 
# Write the errors to a new csv file
write_csv(errors_df, "errors.csv")

Unnamed: 0_level_0,url,text_content
Unnamed: 0_level_1,<chr>,<chr>
1,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37764,"Heritage Category:  Scheduled Monument  List Entry Number:  1001731  Local Planning Authority:  Herefordshire, County of (UA)  Site Type:  Defence  Unitary Authority:  Herefordshire, County of (UA)  Parish:  Kington Rural / Knill  Parliamentary Constituency:  North Herefordshire  Region:  Midlands  Assessment Type:  Archaeology  Condition:  Generally satisfactory but with significant localised problems  Principal Vunerability:  Stock erosion - localised/limited  Trend:  Improving  Ownership:  Private"
2,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37756,"Heritage Category:  Scheduled Monument  List Entry Number:  1001735  Local Planning Authority:  Herefordshire, County of (UA)  Site Type:  Barrier  Unitary Authority:  Herefordshire, County of (UA)  Parish:  Lyonshall  Parliamentary Constituency:  North Herefordshire  Region:  Midlands  Assessment Type:  Archaeology  Condition:  Generally satisfactory but with significant localised problems  Principal Vunerability:  Development requiring planning permission Trend:  Declining  Ownership:  Private, multiple owners"
3,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/34587,"Heritage Category:  Scheduled Monument  List Entry Number:  1001738  Local Planning Authority:  Herefordshire, County of (UA)  Site Type:  Defence  Unitary Authority:  Herefordshire, County of (UA)  Parish:  Bishopstone / Bridge Sollers / Byford  Parliamentary Constituency:  Hereford and South Herefordshire  Region:  Midlands  Assessment Type:  Archaeology  Condition:  Generally satisfactory but with minor localised problems  Principal Vunerability:  Arable clipping Trend:  Declining  Ownership:  Private"
4,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35712,"Heritage Category:  Scheduled Monument  List Entry Number:  1001747  Local Planning Authority:  Herefordshire, County of (UA)  Site Type:  Defence > Hillfort  Unitary Authority:  Herefordshire, County of (UA)  Parish:  Marden / Sutton  Parliamentary Constituency:  North Herefordshire  Region:  Midlands  Assessment Type:  Archaeology  Condition:  Extensive significant problems  Principal Vunerability:  Dumping Trend:  Stable  Ownership:  Charity (heritage)"
5,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35701,"Heritage Category:  Scheduled Monument  List Entry Number:  1001758  Local Planning Authority:  Herefordshire, County of (UA)  Site Type:  Defence > Hillfort  Unitary Authority:  Herefordshire, County of (UA)  Parish:  Dinedor  Parliamentary Constituency:  Hereford and South Herefordshire  Region:  Midlands  Assessment Type:  Archaeology  Condition:  Generally satisfactory but with significant localised problems  Principal Vunerability:  Scrub/tree growth  Trend:  Improving  Ownership:  Other not for profit group"
6,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/43323,"Heritage Category:  Scheduled Monument  List Entry Number:  1001778  Local Planning Authority:  Herefordshire, County of (UA)  Site Type:  Defence > Castle  Unitary Authority:  Herefordshire, County of (UA)  Parish:  Walterstone  Parliamentary Constituency:  Hereford and South Herefordshire  Region:  Midlands  Assessment Type:  Archaeology  Condition:  Generally unsatisfactory with major localised problems  Principal Vunerability:  Scrub/tree growth  Trend:  Declining  Ownership:  Private"


ERROR: Error in write_delim(x, file, delim = ",", na = na, append = append, col_names = col_names, : is.data.frame(x) is not TRUE


In [115]:
# Tidyverse was installed previously. 
library(tidyverse)

# Read the CSV file
df <- read_csv('scraped_data.csv')

# Function to split text_content into key-value pairs
split_text_content <- function(text) {
  lines <- str_split(text, "\n")[[1]]
  data <- list()
  key <- NULL
  for (line in lines) {
    if (str_detect(line, ":")) {
      parts <- str_split(line, ":", n = 2)[[1]]
      key <- str_trim(parts[1])
      value <- str_trim(parts[2])
      data[[key]] <- value
    } else if (!is.null(key)) {
      data[[key]] <- paste(data[[key]], str_trim(line))
    }
  }
  return(data)
}

# Apply the function to the text_content column
split_data <- map(df$text_content, split_text_content)

# Create a new data frame from the split data
split_df <- bind_rows(split_data)

# Combine the original URL column with the new data frame
result_df <- bind_cols(df %>% select(url), split_df)
head(result_df)
# Write the result to a new CSV file
write_csv(result_df, 'processed_scraped_data.csv')

[1mRows: [22m[34m22[39m [1mColumns: [22m[34m2[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m (2): url, text_content

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


url,Heritage Category,List Entry Number,Local Planning Authority,Site Type,Unitary Authority,Parish,Parliamentary Constituency,Region,Assessment Type,⋯,Street Name,Locality,County,District / Borough,Occupancy / Use,Priority,Previous Priority,Designation,List Entry Number(s),Vulnerability
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37764,Scheduled Monument,1001731,"Herefordshire, County of (UA)",Defence,"Herefordshire, County of (UA)",Kington Rural / Knill,North Herefordshire,Midlands,Archaeology,⋯,,,,,,,,,,
https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/37756,Scheduled Monument,1001735,"Herefordshire, County of (UA)",Barrier,"Herefordshire, County of (UA)",Lyonshall,North Herefordshire,Midlands,Archaeology,⋯,,,,,,,,,,
https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/34587,Scheduled Monument,1001738,"Herefordshire, County of (UA)",Defence,"Herefordshire, County of (UA)",Bishopstone / Bridge Sollers / Byford,Hereford and South Herefordshire,Midlands,Archaeology,⋯,,,,,,,,,,
https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35712,Scheduled Monument,1001747,"Herefordshire, County of (UA)",Defence > Hillfort,"Herefordshire, County of (UA)",Marden / Sutton,North Herefordshire,Midlands,Archaeology,⋯,,,,,,,,,,
https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/35701,Scheduled Monument,1001758,"Herefordshire, County of (UA)",Defence > Hillfort,"Herefordshire, County of (UA)",Dinedor,Hereford and South Herefordshire,Midlands,Archaeology,⋯,,,,,,,,,,
https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/43323,Scheduled Monument,1001778,"Herefordshire, County of (UA)",Defence > Castle,"Herefordshire, County of (UA)",Walterstone,Hereford and South Herefordshire,Midlands,Archaeology,⋯,,,,,,,,,,


In [117]:
# Load necessary library
library(dplyr)

# Read the CSV files
file1 <- read.csv("processed_scraped_data.csv")
file2 <- read.csv("merged_har.csv")
common_column <- "url" # Replace with the actual column name

# Find common rows based on the common column
common_rows <- merge(file1, file2, by = common_column)


# Print the common rows
head(common_rows)
write_csv(common_rows, 'openrefineHAR.csv')


Unnamed: 0_level_0,url,Heritage.Category,List.Entry.Number,Local.Planning.Authority,Site.Type,Unitary.Authority,Parish,Parliamentary.Constituency,Region,Assessment.Type,⋯,organisation,Name,hyperlink,NGR,Easting,Northing,grade,lat,lon,Grade
Unnamed: 0_level_1,<chr>,<chr>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<lgl>,<chr>,<chr>,<chr>,<int>,<int>,<lgl>,<dbl>,<dbl>,<chr>
1,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24508,Registered Battlefield,1000000,Bradford,Battlefield,,Drighlington,Leeds South West and Morley / Bradford South,North East and Yorkshire,Battlefield,⋯,,Battle of Adwalton Moor 1643,https://historicengland.org.uk/listing/the-list/list-entry/1000000,SE2164428855,421360,429026,,53.75716,-1.67751079,
2,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24509,Registered Battlefield,1000003,North Yorkshire Council (UA),Battlefield,North Yorkshire Council (UA),Boroughbridge / Langthorpe / Milby,Wetherby and Easingwold,North East and Yorkshire,Battlefield,⋯,,Battle of Boroughbridge 1322,https://historicengland.org.uk/listing/the-list/list-entry/1000003,SE 39851 67186,439851,467186,,54.09903,-1.39210863,
3,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24542,Registered Battlefield,1000025,Newcastle upon Tyne,Battlefield,,,Blaydon and Consett / Hexham,North East and Yorkshire,Battlefield,⋯,,Battle of Newburn Ford 1640,https://historicengland.org.uk/listing/the-list/list-entry/1000025,NZ 16383 63953,416383,563953,,54.96993,-1.74561992,
4,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24684,Registered Park and Garden grade I,1000155,Mid Suffolk,Gardens parks and urban spaces > Landscape park,,Hemingstone / Barham / Coddenham,Central Suffolk and North Ipswich,East of England,Park and garden,⋯,,SHRUBLAND HALL,https://historicengland.org.uk/listing/the-list/list-entry/1000155,TM1181653099,612274,252889,,52.13351,1.10019451,I
5,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24706,Registered Park and Garden grade I,1000165,Hart,Gardens parks and urban spaces > Landscape park,,Mattingley / Bramshill / Eversley,North East Hampshire,London and South East,Park and garden,⋯,,Bramshill Park,https://historicengland.org.uk/listing/the-list/list-entry/1000165,SU7626960141,476801,159656,,51.33082,-0.89904043,I
6,https://historicengland.org.uk/advice/heritage-at-risk/search-register/list-entry/24710,Registered Park and Garden grade II*,1000194,Redbridge,Gardens parks and urban spaces > Park,,,Leyton and Wanstead,London and South East,Park and garden,⋯,,WANSTEAD PARK,https://historicengland.org.uk/listing/the-list/list-entry/1000194,TQ4104087270,540989,187461,,51.5684,0.03285775,II*
