# Dataset staleness and related issues

This notebook contains a set of queries for investigating data quality issues, such as 'staleness', in a single collection.

## Usage

Before running the notebook:

1. Download the `digital-land` database from https://datasette.planning.data.gov.uk/digital-land
1. Point the `source_file` variable (below) at your download.

The first run takes a few minutes because it builds indexes that speed up subsequent queries.
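
As a rough illustration of why only the first run is slow, the sketch below builds an index on a small in-memory table and shows that requesting it a second time changes nothing. The table is a hypothetical stand-in for the real `log` table, and `index_names` is a helper introduced only for this example.

```python
import sqlite3

# Hypothetical in-memory stand-in for the downloaded database.
cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE log (endpoint TEXT, resource TEXT, status TEXT)")


def add_index(table, column):
    # IF NOT EXISTS turns the call into a no-op once the index has been built.
    cnx.execute(f"CREATE INDEX IF NOT EXISTS idx_{table}_{column} ON {table}({column})")


def index_names(table):
    # PRAGMA index_list returns one row per index; the name is the second field.
    return [row[1] for row in cnx.execute(f"PRAGMA index_list({table})")]


add_index("log", "endpoint")
add_index("log", "endpoint")  # already present, so nothing is rebuilt
print(index_names("log"))
```

Running `index_names` against the real database is one way to confirm that the indexes from an earlier run are still in place.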

### Variables

These determine what data is processed. 

* `source_file` - the file you downloaded above
* `collection` - the collection to process
* `current_year` - endpoints entered in this year count as added this year
* `staleness_days` - if the same data has been collected unchanged for more than this number of days, consider it stale
* `recent_entry_cutoff` - an endpoint counts as recently added if its entry date is on or after this date
* `organisation` - used to show the endpoints for just one organisation
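
The cells below interpolate these variables into SQL with f-strings. An alternative, sketched here against a hypothetical in-memory `source` table with made-up organisation codes, is to pass values such as `collection` as bound parameters so that `sqlite3` handles the quoting.

```python
import sqlite3

# Hypothetical miniature of the `source` table; the rows are invented.
cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE source (organisation TEXT, collection TEXT, end_date TEXT)")
cnx.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [
        ("local-authority-eng:AAA", "brownfield-land", ""),
        ("local-authority-eng:BBB", "brownfield-land", "2020-01-01"),
        ("local-authority-eng:CCC", "conservation-area", ""),
    ],
)

collection = "brownfield-land"

# The ? placeholder is filled from the tuple, so no string formatting is needed.
live = cnx.execute(
    "SELECT organisation FROM source WHERE collection = ? AND end_date = '' ORDER BY 1",
    (collection,),
).fetchall()
print(live)
```

Table and view names cannot be bound this way, so a name such as `viewName` would still need to be built as a string.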

In [65]:
import pandas as pd
import urllib.parse
import sqlite3

In [66]:
datasette_url = "https://datasette.planning.data.gov.uk/"

source_file = "/mnt/c/Users/MarkSmith/Downloads/digital-land_2023)10_26.sqlite3" # or whatever you called your download.
dest_file=  "/mnt/c/Users/MarkSmith/Downloads/entity_2023_10_26_crosstab.csv" # or wherever you want your output

collection = "brownfield-land"
staleness_days=365*3
current_year = "2023"
viewName = F"{collection}_{current_year}".replace("-", "_")

cnx = sqlite3.connect(source_file)

cursor = cnx.cursor()

def add_index (table, column) :
    cnx.execute(F"CREATE INDEX if not exists idx_{table}_{column} ON {table}({column})") 

add_index ("log", "endpoint")
add_index ("log", "resource")
add_index ("log", "status")

add_index ("source", "collection")
add_index ("source", "endpoint")
add_index ("source", "start_date")
add_index ("source", "end_date")

add_index ("endpoint", "endpoint")
add_index ("endpoint", "entry_date")
add_index ("endpoint", "start_date")
add_index ("endpoint", "end_date")

def query_datasette (query_text):
    return pd.read_sql_query(query_text, cnx)


In [67]:

sql = F"""CREATE VIEW IF NOT EXISTS {viewName} AS
    select organisation, strftime('%Y',ep.entry_date) as year, strftime('%m',ep.entry_date) as month 
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}" 
    and year = "{current_year}"
"""

cnx.execute(sql)

query_datasette(F"SELECT * FROM {viewName}")

Unnamed: 0,organisation,year,month
0,local-authority-eng:WOK,2023,08
1,local-authority-eng:LEE,2023,07
2,local-authority-eng:BAB,2023,08
3,local-authority-eng:NGM,2023,09
4,local-authority-eng:BAN,2023,06
...,...,...,...
291,local-authority-eng:HRT,2023,09
292,local-authority-eng:BNS,2023,09
293,local-authority-eng:COT,2023,09
294,local-authority-eng:LIF,2023,10


# Live endpoints still using HTTP instead of HTTPS

These endpoints are likely to be out of date: most publishers moved from HTTP to HTTPS years ago, so a live `http:` URL suggests the entry has not been reviewed for some time.

In [68]:
http_sql = F"""
  select src.organisation, src.endpoint, ep.entry_date, ep.endpoint_url, ep.start_date, ep.end_date  from source src
  inner join endpoint ep on src.endpoint = ep.endpoint
  where src.collection  = "{collection}"
  and ep.end_date = ""
  and ep.endpoint_url like "http:%"
  order by src.organisation, ep.start_date ASC NULLS LAST
"""

query_datasette(http_sql)


Unnamed: 0,organisation,endpoint,entry_date,endpoint_url,start_date,end_date
0,development-corporation:Q6670544,4c238528f325bcca9a03697583d9f39a91ecf1ec1ecb66...,2018-07-05T00:00:00Z,http://www.queenelizabetholympicpark.co.uk/-/m...,,
1,local-authority-eng:AMB,1945e57b32134a160bfb4972bcd6f86935b4c6fc0bd000...,2019-11-23T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,,
2,local-authority-eng:AMB,239f7a38efe51fb6271d40bd76dd7413fecb458a5c4d1e...,2019-11-23T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,,
3,local-authority-eng:AMB,c0fcdc64525d3be69736427cd397bf5a3c2d68104e8a90...,2018-05-22T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,2017-12-22,
4,local-authority-eng:AMB,533d03f6fd2b5717597c4f3057106b907f4b23c8fb2e7b...,2020-05-21T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,2020-05-21,
...,...,...,...,...,...,...
71,local-authority-eng:WOI,30b94b097f288a5f2df6ad2cae6fa403a5599e39bece53...,2018-05-22T00:00:00Z,http://woking2027.info/res/uploads/WokingBorou...,,
72,local-authority-eng:WYE,36c1c91ee6ec46da952246ce66916adf3200f7b14d42f5...,2018-05-22T00:00:00Z,http://www.wyreforestdc.gov.uk/media/3585747/W...,,
73,local-authority-eng:WYO,466796e9f6e18591dead03d9e7836e02d6494ec1a5ac9e...,2018-05-22T00:00:00Z,http://data.wycombe.gov.uk/download/planning/b...,2019-06-21,
74,local-authority-eng:WYO,7b6a413b1c1018215fe400dae82e6366b80fb8f89c0240...,2019-11-24T00:00:00Z,http://data.wycombe.gov.uk/download/planning/b...,2019-08-21,
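
To post-process URLs like these in Python, one possible approach uses `urllib.parse`, which the notebook already imports. The helper names `uses_plain_http` and `https_candidate` and the `example.org` URL are assumptions for illustration. Whether the HTTPS variant actually resolves still has to be checked separately.

```python
from urllib.parse import urlsplit


def uses_plain_http(url):
    # urlsplit lower-cases the scheme, so "HTTP://..." is detected as well.
    return urlsplit(url).scheme == "http"


def https_candidate(url):
    # Swap only the scheme; host, path and query are left unchanged.
    return urlsplit(url)._replace(scheme="https").geturl()


url = "http://example.org/register.csv"  # made-up URL
print(uses_plain_http(url), https_candidate(url))
```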


# Live endpoints with stale data

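These are live endpoints where the same resource has been collected successfully more than `staleness_days` times. Assuming one collection per day, that means the endpoint has given us no new data for at least that many days.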

In [69]:
sql = F"""
	select src.organisation, src.start_date, src.end_date,  log.endpoint, log.resource, count (log.resource) as c from log 
    inner join source src on src.endpoint = log.endpoint
	where log.status = 200
	and src.end_date = ""
	and src.collection = "{collection}"
    group by 1,2, 3, 4, 5
	having c > {staleness_days}
	order by c desc
	limit 100
"""

query_datasette(sql)

Unnamed: 0,organisation,start_date,end_date,endpoint,resource,c
0,local-authority-eng:ALL,2017-12-20,,9c2e8adfd12b4f474e7d511580029d1e69d1e08a17e0cb...,3b694528538878d378fd6892a649fa634f0013ef299331...,1410
1,local-authority-eng:BDF,,,f8979dadf073ed0fa1373b6c018fa376833c5aa9e740a2...,12e709c847614f924fc6bd9dcb9c162816e59623887765...,1410
2,local-authority-eng:CAN,2017-12-01,,660ed3b8f8bffe8a326dfb597ddce7a3b4c9b28a04fe78...,4c524bd78ea05989aaaabdcad0cae87d48f4ad9eaf18a4...,1410
3,local-authority-eng:CAS,2017-12-01,,b7d3310848346fdcc872fb40dcb80697f05dfa1f593565...,cd0c958ec2843364957c2fd9757151fdfb6659bb8e0d51...,1410
4,local-authority-eng:CHR,,,26ed7ad003da61c66f9656ba61750b14a53b3ac74a1865...,1a349651d94dfdbdc44773a256313c391c7418f0bcad53...,1410
...,...,...,...,...,...,...
95,local-authority-eng:KIR,2018-02-28,,36440d2479b935805c1d13fcb46b78968ba968c959b9e7...,93f7961f5e90ed8e26b632353221a0cc955a70fdbe3b01...,1389
96,local-authority-eng:BAB,2017-12-21,,cf91b1e94fe298ecf6383060c79b3dd7d030cdfd00bda9...,49a918dc51d8d4063061e5b0d257943aa194370ae1f07a...,1387
97,local-authority-eng:CHR,2018-12-03,,9e1b22e993f2219cd48baa60b5f08a77782cd5f347aa22...,c94e35967405344dd2fd4116f2705be22d222407eb82d5...,1387
98,local-authority-eng:MSU,2017-12-21,,cf91b1e94fe298ecf6383060c79b3dd7d030cdfd00bda9...,49a918dc51d8d4063061e5b0d257943aa194370ae1f07a...,1387
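
The counting logic can be tried on a toy example. The sketch below uses a hypothetical in-memory `log` table with invented endpoint and resource identifiers, and assumes each row represents one daily collection. It is a simplified version of the query above, without the join to `source`.

```python
import sqlite3

cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE log (endpoint TEXT, resource TEXT, status TEXT)")

# e1 returned the same resource five times; e2 changed resource part-way through.
rows = [("e1", "r1", "200")] * 5 + [("e2", "r2", "200")] * 2 + [("e2", "r3", "200")] * 3
cnx.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)

staleness_days = 3  # deliberately tiny threshold for the toy data

stale = cnx.execute(
    """
    SELECT endpoint, resource, COUNT(*) AS c
    FROM log
    WHERE status = '200'
    GROUP BY endpoint, resource
    HAVING COUNT(*) > ?
    ORDER BY c DESC
    """,
    (staleness_days,),
).fetchall()
print(stale)
```

Only the `e1`/`r1` pair exceeds the threshold, because the count is per resource rather than per endpoint.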


# Endpoints with no documentation URL

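This query is restricted to local planning authorities (LPAs), that is, organisations whose identifier starts with `local-authority-eng`, and to sources that have no end date.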

In [70]:
sql = F"""
select organisation, entry_date,  source, start_date from source 
where collection = "{collection}" and documentation_url = ""  and end_date = "" and organisation like "local-authority-eng%"  
order by 2 
"""

query_datasette(sql)


Unnamed: 0,organisation,entry_date,source,start_date
0,local-authority-eng:HAL,2018-05-22T00:00:00Z,59748eb2bbe6634a99620f87acaf937e,2017-12-31
1,local-authority-eng:BAB,2018-05-22T00:00:00Z,f92ceb2080500b1b68f7d4b33ceadedc,2017-12-21
2,local-authority-eng:THE,2018-05-22T00:00:00Z,f7721c61cb8732bf0fcad7083400d972,
3,local-authority-eng:HIG,2018-05-22T00:00:00Z,efb9c28a1591def1700ee912bb07bfe1,
4,local-authority-eng:EXE,2018-05-22T00:00:00Z,0787bb89a3ec6994493219b2c8551daa,
5,local-authority-eng:BRT,2018-05-22T00:00:00Z,07c446af20f78efab3ba5ab35e1a77ef,2017-12-13
6,local-authority-eng:DUR,2018-05-22T00:00:00Z,09de2780b40099bf37cd0828834dbb93,2017-12-01
7,local-authority-eng:HAE,2018-05-22T00:00:00Z,09e319746177cd551d1ac5542bb0d622,
8,local-authority-eng:EPS,2018-05-22T00:00:00Z,e6d3bda67c7c5f30dd70846bb1f38267,
9,local-authority-eng:FOR,2018-05-22T00:00:00Z,e284fb7807bef3d01ff56ec43512d3cc,2017-12-13


# Recently added endpoints with no start date

A recently added endpoint should have a start date, derived either from the LPA documentation page or from the data itself.

In [71]:
recent_entry_cutoff = "2023-06-01"

sql = F"""
    select src.organisation, src.documentation_url, src.start_date as start_date, src.end_date, ep.endpoint, ep.endpoint_url, ep.entry_date as entry_date
    from source src
    inner join endpoint ep on ep.endpoint = src.endpoint
    where src.collection = "{collection}"
    and ep.entry_date >= "{recent_entry_cutoff}"
    and ep.start_date = ""
    and src.end_date = ""
    order by entry_date DESC, start_date DESC
    """

query_datasette(sql)


Unnamed: 0,organisation,documentation_url,start_date,end_date,endpoint,endpoint_url,entry_date
0,local-authority-eng:EHA,https://www.easthants.gov.uk/planning-services...,2023-07-07,,57ab427342eab58168dce8bf1c03a2351b8ad46d17bd3f...,https://www.easthants.gov.uk/media/6377/downlo...,2023-09-13T00:00:00Z
1,local-authority-eng:NGM,https://www.opendatanottingham.org.uk/dataset....,2021-11-05,,c883436d69d3f852a8225da903fda36aa29d86f9ca5c4f...,https://geoserver.nottinghamcity.gov.uk/openda...,2023-09-13T00:00:00Z
2,local-authority-eng:GLA,https://data.london.gov.uk/dataset/brownfield_...,2020-07-12,,890c3ac73da82610fe1b7d444c8c89c92a7f368316e3c0...,https://data.london.gov.uk/download/brownfield...,2023-09-13T00:00:00Z
3,local-authority-eng:EHE,https://www.eastherts.gov.uk/planning-building...,2017-12-31,,97d36593114822252b12c5cef9cc4d8d324d07d317efae...,https://www.eastherts.gov.uk/sites/default/fil...,2023-09-13T00:00:00Z
4,local-authority-eng:HAM,https://www.easthants.gov.uk/planning-services...,,,57ab427342eab58168dce8bf1c03a2351b8ad46d17bd3f...,https://www.easthants.gov.uk/media/6377/downlo...,2023-09-13T00:00:00Z
...,...,...,...,...,...,...,...
173,local-authority-eng:BEN,https://www.brent.gov.uk/planning-and-building...,,,2d4bee3ace07e537a3dc44610fd48fed3c629f69b70ff4...,https://legacy.brent.gov.uk/media/16420302/bro...,2023-07-04T07:07:30Z
174,local-authority-eng:BEX,https://www.bexley.gov.uk/services/planning-an...,,,06aee45c3ecbb755ddf3a2184a3858308163456238d88e...,https://www.bexley.gov.uk/sites/default/files/...,2023-07-04T07:07:30Z
175,local-authority-eng:BAR,https://www.barrowbc.gov.uk/residents/planning...,,,c78c84a3bde8a5ea9bad3faa9e9a1146679975a454eb84...,https://www.barrowbc.gov.uk/_resources/assets/...,2023-07-04T07:07:29Z
176,local-authority-eng:BBD,https://www.blackburn.gov.uk/planning/planning...,,,c618cc6e158b44a78bcab28f07508182b8af75e818e075...,https://blackburn-darwen.org.uk/wp-content/upl...,2023-07-04T07:07:29Z
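
The cutoff filter compares dates as strings. That works because ISO 8601 timestamps sort in the same order as the moments they represent, as the sketch below checks against a parsed comparison. The sample timestamps are copied from the outputs in this notebook, and `parse` is a helper introduced only for this example.

```python
from datetime import datetime, timezone

recent_entry_cutoff = "2023-06-01"
entry_dates = [
    "2023-09-13T00:00:00Z",
    "2023-07-04T07:07:30Z",
    "2018-05-22T00:00:00Z",
]

# Plain string comparison, as in the SQL filter.
recent = [d for d in entry_dates if d >= recent_entry_cutoff]


def parse(ts):
    # Timestamps in this format end in "Z", so they are treated as UTC.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# The same filter on parsed datetimes, used as a cross-check.
cutoff = datetime(2023, 6, 1, tzinfo=timezone.utc)
recent_parsed = [d for d in entry_dates if parse(d) >= cutoff]

print(recent == recent_parsed, recent)
```

The string shortcut only holds while every value uses the same zero-padded format.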


# Inspect a single LPA

You can use the query below to get a quick overview of the entries we hold for a single LPA in the collection.

In [112]:
organisation= "local-authority-eng:WIN"


sql = F"""
    select src.organisation, src.documentation_url,  src.start_date, src.end_date, ep.endpoint_url, ep.endpoint, ep.entry_date as ep_entry_date,  ep.start_date as ep_start_date, ep.end_date as ep_end_date, ep.plugin
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.organisation = "{organisation}"
    and src.collection = "{collection}"
    order by ep.entry_date
    """

endpoints = query_datasette(sql)

# Print a readable summary of each endpoint before showing the full table.
for _, row in endpoints.iterrows():
    print("Organisation: " + row["organisation"])
    print("documentation_url: " + row["documentation_url"])
    print("endpoint_url: " + row["endpoint_url"])
    print("plugin:  ", row["plugin"])
    print("ep_entry_date:", row["ep_entry_date"])
    print("ep_start_date:", row["ep_start_date"])
    print("ep_end_date:  ", row["ep_end_date"])

    print("\n")

endpoints



Organisation: local-authority-eng:WIN
documentation_url: 
endpoint_url: http://www.winchester.gov.uk/assets/attach/13774/Brownfield%20Land%20Register%202017.csv
plugin:   
ep_entry_date: 2018-05-22T00:00:00Z
ep_start_date: 
ep_end_date:   2019-12-15


Organisation: local-authority-eng:WIN
documentation_url: https://www.winchester.gov.uk/planning-policy/brownfield-land-register
endpoint_url: https://www.winchester.gov.uk/assets/attach/13774/Brownfield%20Land%20Register%202017.csv
plugin:   
ep_entry_date: 2019-12-15T00:00:00Z
ep_start_date: 
ep_end_date:   2023-09-05


Organisation: local-authority-eng:WIN
documentation_url: https://www.winchester.gov.uk/planning-policy/brownfield-land-register
endpoint_url: https://www.winchester.gov.uk/assets/attach/17460/Brownfield%20Register%202018.csv
plugin:   
ep_entry_date: 2019-12-22T00:00:00Z
ep_start_date: 
ep_end_date:   


Organisation: local-authority-eng:WIN
documentation_url: https://www.winchester.gov.uk/planning-policy/brownfield-land-

Unnamed: 0,organisation,documentation_url,start_date,end_date,endpoint_url,endpoint,ep_entry_date,ep_start_date,ep_end_date,plugin
0,local-authority-eng:WIN,,,2019-12-15,http://www.winchester.gov.uk/assets/attach/137...,48b44cd7af3b861361c22a90c915c6c87a0c768531fd1b...,2018-05-22T00:00:00Z,,2019-12-15,
1,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,,2023-09-05,https://www.winchester.gov.uk/assets/attach/13...,6aa8f0e3cb4a35bc13f4dd8fb2803404a337bce31d6a04...,2019-12-15T00:00:00Z,,2023-09-05,
2,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,,,https://www.winchester.gov.uk/assets/attach/17...,12ca9bd2b912e5ac4336907c0c35a49f17898111daa195...,2019-12-22T00:00:00Z,,,
3,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,,2023-09-05,https://www.winchester.gov.uk/assets/attach/20...,8c72913acc099caaf84aa702308b5bb3e3ba45e29e20ff...,2019-12-22T00:00:00Z,,2023-09-05,
4,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,2021-09-09,,https://www.winchester.gov.uk/assets/attach/27...,a453584baba30a957648ae88a28d611a6a470f926e49f5...,2021-09-08T00:00:00Z,2021-09-09,,
5,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,2021-09-09,,https://www.winchester.gov.uk/assets/attach/20...,2258441ef147e834873d3dd74d148df7900f7182023cba...,2021-09-08T00:00:00Z,2021-09-09,,
6,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,2021-09-09,,https://www.winchester.gov.uk/assets/attach/13...,4166ed48929df5b851829e5e69fe144c1e4b9dd7f44764...,2021-09-08T00:00:00Z,2021-09-09,,
7,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,2021-09-09,,https://www.winchester.gov.uk/assets/attach/17...,b0e362766c44c934b9d7deae12b305d46fd686aaa75a25...,2021-09-08T00:00:00Z,2021-09-09,,
8,local-authority-eng:WIN,https://www.winchester.gov.uk/planning-policy/...,,,https://www.winchester.gov.uk/assets/attach/31...,7eaea234caf233fc53495f8558b73fee074666266a0ae6...,2022-03-09T00:00:00Z,,,


In [73]:
# Every LPA source in the collection, joined to its endpoint
sql = F"""
    select src.organisation, src.documentation_url,  src.start_date, src.end_date, ep.endpoint_url, ep.endpoint, ep.entry_date,  ep.start_date, ep.end_date
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}"
    and organisation like "local-authority-eng:%"
    order by 1, 2
    """

query_datasette(sql)



Unnamed: 0,organisation,documentation_url,start_date,end_date,endpoint_url,endpoint,entry_date,start_date.1,end_date.1
0,local-authority-eng:ADU,https://www.adur-worthing.gov.uk/planning-poli...,,2019-12-17,"https://www.adur-worthing.gov.uk/media/media,1...",3106ba8d16954b9e21a902c13a49046a0b2d37e4a8135c...,2018-05-22T00:00:00Z,,2019-12-17
1,local-authority-eng:ADU,https://www.adur-worthing.gov.uk/planning-poli...,2019-12-17,2023-08-09,"https://www.adur-worthing.gov.uk/media/media,1...",87f2583f9562c85268d236c150dcbb9da3b373fdb28dab...,2019-12-18T00:00:00Z,2019-12-17,2023-08-09
2,local-authority-eng:ADU,https://www.adur-worthing.gov.uk/planning-poli...,2023-07-06,,"https://www.adur-worthing.gov.uk/media/Media,1...",ea98ea4d156ee47ff09af98d96d09951395b58e66d8b5f...,2023-08-11T09:09:57Z,,
3,local-authority-eng:ALL,https://www.allerdale.gov.uk/en/planning-build...,2017-12-20,,https://df4iy9syor5px.cloudfront.net/media/fil...,9c2e8adfd12b4f474e7d511580029d1e69d1e08a17e0cb...,2018-05-22T00:00:00Z,2017-12-20,
4,local-authority-eng:ALL,https://www.allerdale.gov.uk/en/planning-build...,2017-12-20,2019-11-25,https://www-cloudfront.allerdale.gov.uk/media/...,5a0fcb2fdbe9d6f407b554642ab661a897a02c7a9e068a...,2018-07-30T00:00:00Z,2017-12-20,2019-11-25
...,...,...,...,...,...,...,...,...,...
1279,local-authority-eng:YOR,,,,https://opendata.arcgis.com/datasets/24e275a6e...,425a3e0cf53ef4980e9c133f850a6e9e1a34cfd2e444e4...,2018-05-22T00:00:00Z,,
1280,local-authority-eng:YOR,https://data.yorkopendata.org/dataset/brownfie...,,,https://data.yorkopendata.org/dataset/7b937604...,0f78fc9d126414d17f07e90d637ae76d65cdb09319fc20...,2023-08-10T13:13:26Z,,
1281,local-authority-eng:YOR,https://www.york.gov.uk/BrownfieldRegister,2017-12-22,,https://data.yorkopendata.org/dataset/7b937604...,ed8725acf5769b2dd2467dcab1783eebc46afcdd7718d6...,2019-12-01T00:00:00Z,2017-12-22,
1282,local-authority-eng:YOR,https://www.york.gov.uk/BrownfieldRegister,2017-12-22,,https://data.yorkopendata.org/dataset/7b937604...,2fc9a0b88861aa02584f2b90292bff7e1ccba9d420ef08...,2019-12-01T00:00:00Z,2017-12-22,


In [74]:
# Endpoints added in the current year; organisations missing from this list have not been updated this year

sql = F"""
    select organisation, strftime('%Y',ep.entry_date) as year, strftime('%m',ep.entry_date) as month 
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}" 
    and year = "{current_year}"
    order by 3 DESC
    """

query_datasette(sql)

Unnamed: 0,organisation,year,month
0,local-authority-eng:LIF,2023,10
1,local-authority-eng:GRT,2023,10
2,local-authority-eng:NGM,2023,09
3,local-authority-eng:DEB,2023,09
4,local-authority-eng:DST,2023,09
...,...,...,...
291,local-authority-eng:SRI,2023,07
292,local-authority-eng:SST,2023,07
293,local-authority-eng:STT,2023,07
294,local-authority-eng:TOB,2023,07


In [75]:
# Organisations with more than one ended endpoint in the collection
sql = F"""
    select organisation, count(ep.endpoint) as count
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}" 
    and ep.end_date != ""
    group by 1
    having count > 1
    order by 2 desc
    """

query_datasette(sql)

Unnamed: 0,organisation,count
0,local-authority-eng:BAR,11
1,local-authority-eng:TEI,7
2,national-park-authority:Q72617669,6
3,local-authority-eng:SRI,6
4,local-authority-eng:LEE,6
...,...,...
170,local-authority-eng:BAN,2
171,local-authority-eng:ASH,2
172,local-authority-eng:ASF,2
173,local-authority-eng:ARU,2


In [76]:
# Every ended endpoint in the collection, with the year it was entered
sql = F"""
    select organisation, ep.*, strftime('%Y',ep.entry_date) as year
    from source src 
	left outer join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}" 
    and ep.end_date != ""
    order by 1
    """

query_datasette(sql)

Unnamed: 0,organisation,end_date,endpoint,endpoint_url,entry_date,parameters,plugin,start_date,year
0,development-corporation:Q20648596,2023-08-09,5d30aee8c82e775dd4be67dd417bf782b33de8522edc1e...,https://www.london.gov.uk/sites/default/files/...,2018-07-05T00:00:00Z,,,2017-12-31,2018
1,local-authority-eng:ADU,2019-12-17,3106ba8d16954b9e21a902c13a49046a0b2d37e4a8135c...,"https://www.adur-worthing.gov.uk/media/media,1...",2018-05-22T00:00:00Z,,,,2018
2,local-authority-eng:ADU,2023-08-09,87f2583f9562c85268d236c150dcbb9da3b373fdb28dab...,"https://www.adur-worthing.gov.uk/media/media,1...",2019-12-18T00:00:00Z,,,2019-12-17,2019
3,local-authority-eng:ALL,2019-11-25,5a0fcb2fdbe9d6f407b554642ab661a897a02c7a9e068a...,https://www-cloudfront.allerdale.gov.uk/media/...,2018-07-30T00:00:00Z,,,2017-12-20,2018
4,local-authority-eng:ARU,2023-08-09,421cc4d3b8a060560139bbcede1787693ad9fbb503f0a9...,https://www.arun.gov.uk/download.cfm?doc=docm9...,2018-05-22T00:00:00Z,,,,2018
...,...,...,...,...,...,...,...,...,...
576,national-park-authority:Q72617669,2023-09-05,81bf05fd3a01607072e23896ec4e61979e2f572828147f...,https://www.northyorkmoors.org.uk/__data/asset...,2022-01-04T00:00:00Z,,,,2022
577,national-park-authority:Q72617784,2020-05-08,687aab26f2ffbf2738a0b3abf3f164d0795ce933decb32...,http://www.exmoor-nationalpark.gov.uk/__data/a...,2018-05-22T00:00:00Z,,,2017-12-19,2018
578,national-park-authority:Q72617784,2020-05-08,20d3494f78924f4f202612668ad78949b8fad0726890e2...,https://www.exmoor-nationalpark.gov.uk/__data/...,2019-12-14T00:00:00Z,,,,2019
579,national-park-authority:Q72617988,2019-12-14,f88e661f7b1e5b3dde4af9bf69d323261dec7cfd81eebd...,http://www.peakdistrict.gov.uk/__data/assets/f...,2018-05-22T00:00:00Z,,,2017-12-18,2018


In [77]:
# Ended endpoints for LPAs that have had no endpoint added in the current year
hitlist_sql = F"""
  select src.organisation, src.documentation_url, ep.* from source src
  inner join endpoint ep on src.endpoint = ep.endpoint
  where src.collection  = "{collection}"
  and src.organisation not in (SELECT organisation FROM {viewName})
  and src.organisation like "local-authority-eng%"
  and ep.end_date != ""
  order by src.organisation, ep.start_date DESC NULLS LAST
"""

df = query_datasette(hitlist_sql)
df.to_csv('hitlist.csv', index=False)

df

Unnamed: 0,organisation,documentation_url,end_date,endpoint,endpoint_url,entry_date,parameters,plugin,start_date
0,local-authority-eng:CAB,https://www.cambridge.gov.uk/brownfield-land-r...,2019-11-24,2ef4e42d7e4d6124010ed891ecf89cc131f7f9098abbf6...,https://www.cambridge.gov.uk/sites/default/fil...,2018-05-22T00:00:00Z,,,
1,local-authority-eng:CHN,,2019-11-29,1f4daca5b252bd8ed2437f533c30c57937d63b45b92051...,http://www.chiltern.gov.uk/media/11738/Chilter...,2018-05-22T00:00:00Z,,,
2,local-authority-eng:CHW,https://consult.cheshirewestandchester.gov.uk/...,2020-03-14,197d1b990b5cc6ebfb194ff9d039de5a2b85af6b732e26...,https://consult.cheshirewestandchester.gov.uk/...,2019-12-08T00:00:00Z,,,
3,local-authority-eng:CRA,https://www.cravendc.gov.uk/planning/brownfiel...,2019-10-20,cd11a0daec5ec86adae6f34ee831af06d1a32235466fa3...,https://www.cravendc.gov.uk/media/5414/brownfi...,2018-05-22T00:00:00Z,,,
4,local-authority-eng:EDN,https://www.eden.gov.uk/planning-and-building/...,2019-11-25,d1eb92649597a50f2cf43c528c17c14414b595590c4468...,https://www.eden.gov.uk/media/4921/brownfield_...,2018-05-22T00:00:00Z,,,
...,...,...,...,...,...,...,...,...,...
65,local-authority-eng:WLL,https://go.walsall.gov.uk/planning/planning_po...,2023-09-05,7c9a93e427aef9a4e0ea799c9cdea1bb148fef8d6d750a...,https://go.walsall.gov.uk/Portals/0/Uploads/Pl...,2020-03-13T00:00:00Z,,,2020-03-13
66,local-authority-eng:WLL,https://go.walsall.gov.uk/planning/planning_po...,2023-09-05,86fe71308d7cc6075a6740cb5f79393241bebcfa17d36e...,https://go.walsall.gov.uk/Portals/0/Uploads/Pl...,2019-12-15T00:00:00Z,,,2018-12-17
67,local-authority-eng:WLL,https://go.walsall.gov.uk/planning/planning_po...,2023-09-05,bfdedb9b3291b4dfe9bbbef99548204071bd6eb5166832...,https://go.walsall.gov.uk/Portals/0/Uploads/Pl...,2020-03-13T00:00:00Z,,,
68,local-authority-eng:WSO,https://www.westsomersetonline.gov.uk/Planning...,2019-03-31,944cbf2c4ea26ced18bd14478f65611302d7e22883a238...,https://www.westsomersetonline.gov.uk/Docs/Bro...,2018-05-22T00:00:00Z,,,2017-12-31


In [78]:
# LPAs in the collection that have had no endpoint added in the current year
unchecked_sql = F"""
  select distinct src.organisation from source src
  where src.collection  = "{collection}"
  and src.organisation not in (SELECT organisation FROM {viewName})
  and src.organisation like "local-authority-eng%"
  order by src.organisation
"""
df = query_datasette(unchecked_sql)

print("These organisations need checking")
df

These organisations need checking


Unnamed: 0,organisation
0,local-authority-eng:AYL
1,local-authority-eng:CAB
2,local-authority-eng:CAN
3,local-authority-eng:CHN
4,local-authority-eng:CHW
5,local-authority-eng:COR
6,local-authority-eng:CRA
7,local-authority-eng:DAV
8,local-authority-eng:EDN
9,local-authority-eng:EDO
