# Dataset staleness and related issues

This notebook contains a set of queries that shed light on data-quality issues such as staleness.

## Usage

You need to:

1. Download the digital_land dataset from https://datasette.planning.data.gov.uk/digital-land
1. Point the `source_file` variable (below) at your download. 

The first run takes a few minutes because it builds indexes that speed up subsequent queries.
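Re-running the index step is cheap because `CREATE INDEX IF NOT EXISTS` does nothing once the index is in place. The following is a minimal sketch of that behaviour, assuming a toy in-memory `log` table rather than the real downloaded database:

```python
import sqlite3

# Toy in-memory database (an assumption); the notebook connects to the downloaded file.
cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE log (endpoint TEXT, resource TEXT, status TEXT)")

def add_index(cnx, table, column):
    # IF NOT EXISTS makes the call a no-op once the index has been built.
    cnx.execute(f"CREATE INDEX IF NOT EXISTS idx_{table}_{column} ON {table}({column})")

add_index(cnx, "log", "endpoint")
add_index(cnx, "log", "endpoint")  # second call does nothing and raises no error

# List the indexes that now exist on the toy table.
index_names = [row[0] for row in cnx.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'log'"
)]
print(index_names)
```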

### Variables

These determine what data is processed. 

* `source_file` - the file you downloaded above
* `collection` - the collection to process
* `staleness_days` - if an endpoint's data has been collected without changing for more than this number of days, consider it stale
* `recent_entry_cutoff` - an endpoint counts as recently added if its entry date is on or after this date
* `organisation` - used to show the endpoints for just one organisation
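The notebook interpolates these variables into SQL text with f-strings. As an alternative, here is a hedged sketch of passing a value as a bound parameter through the standard `sqlite3` module. The table contents and organisation codes are made up for illustration and are not taken from the real dataset:

```python
import sqlite3

# Toy in-memory stand-in for the source table (hypothetical rows).
cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE source (organisation TEXT, collection TEXT, end_date TEXT)")
cnx.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [
        ("local-authority-eng:AAA", "brownfield-land", ""),
        ("local-authority-eng:BBB", "brownfield-land", "2020-01-01"),
        ("local-authority-eng:CCC", "some-other-collection", ""),
    ],
)

collection = "brownfield-land"

# The ? placeholder is bound by sqlite3, so the value needs no quoting in the SQL text.
live_rows = cnx.execute(
    "SELECT organisation FROM source "
    "WHERE collection = ? AND end_date = '' ORDER BY organisation",
    (collection,),
).fetchall()
print(live_rows)
```

Binding values this way sidesteps quoting problems if a variable ever contains a quote character.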

In [1]:
import pandas as pd
import urllib.parse
import sqlite3

In [2]:
datasette_url = "https://datasette.planning.data.gov.uk/"

source_file = "/mnt/c/Users/MarkSmith/Downloads/digital-land_2023_11_16.sqlite3"  # or whatever you called your download
dest_file = "/mnt/c/Users/MarkSmith/Downloads/entity_2023_11_16_crosstab.csv"  # or wherever you want your output

collection = "brownfield-land"
staleness_days = 365 * 3
current_year = "2023"
viewName = F"{collection}_{current_year}".replace("-", "_")

cnx = sqlite3.connect(source_file)

cursor = cnx.cursor()

def add_index(table, column):
    # IF NOT EXISTS means the index is only built on the first run
    cnx.execute(F"CREATE INDEX IF NOT EXISTS idx_{table}_{column} ON {table}({column})")

add_index("log", "endpoint")
add_index("log", "resource")
add_index("log", "status")

add_index("source", "collection")
add_index("source", "endpoint")
add_index("source", "start_date")
add_index("source", "end_date")

add_index("endpoint", "endpoint")
add_index("endpoint", "entry_date")
add_index("endpoint", "start_date")
add_index("endpoint", "end_date")

def query_datasette(query_text):
    # Runs against the local SQLite download (cnx), not the live Datasette service
    return pd.read_sql_query(query_text, cnx)


In [3]:

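# View of organisations that have an endpoint added in current_year.
# String literals use single quotes; SQLite only accepts double-quoted
# strings as a compatibility quirk.
sql = F"""CREATE VIEW IF NOT EXISTS {viewName} AS
    select organisation, strftime('%Y',ep.entry_date) as year, strftime('%m',ep.entry_date) as month
    from source src
    inner join endpoint ep on ep.endpoint = src.endpoint
    where src.collection = '{collection}'
    and year = '{current_year}'
"""

cnx.execute(sql)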

query_datasette(F"SELECT * FROM {viewName}")

Unnamed: 0,organisation,year,month
0,local-authority-eng:LEE,2023,07
1,local-authority-eng:WOT,2023,07
2,local-authority-eng:BAN,2023,06
3,local-authority-eng:BAR,2023,07
4,local-authority-eng:BBD,2023,07
...,...,...,...
77,local-authority-eng:ERY,2023,07
78,local-authority-eng:ESK,2023,07
79,local-authority-eng:EST,2023,07
80,local-authority-eng:EXE,2023,07


# Live endpoints still using HTTP instead of HTTPS

These are candidates for being out of date, since most sites moved from HTTP to HTTPS years ago.
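The same check can be made in Python by parsing the URL instead of matching a string prefix. This is a small sketch using the standard library's `urllib.parse`; the helper name `is_plain_http` and the example URLs are illustrative only:

```python
from urllib.parse import urlsplit

def is_plain_http(url):
    # Compare the parsed scheme, which urlsplit returns in lower case.
    return urlsplit(url).scheme == "http"

# Example URLs (hypothetical, not taken from the dataset).
urls = [
    "http://example.org/brownfield.csv",
    "https://example.org/brownfield.csv",
    "HTTP://example.org/register.csv",
]
flags = [is_plain_http(u) for u in urls]
print(flags)
```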

In [16]:
http_sql = F"""
  select src.organisation, src.endpoint, ep.entry_date, ep.endpoint_url, ep.start_date, ep.end_date  from source src
  inner join endpoint ep on src.endpoint = ep.endpoint
  where src.collection  = "{collection}"
  and ep.end_date = ""
  and ep.endpoint_url like "http:%"
  order by src.organisation, ep.entry_date ASC NULLS LAST
"""

query_datasette(http_sql)


Unnamed: 0,organisation,endpoint,entry_date,endpoint_url,start_date,end_date
0,development-corporation:Q6670544,4c238528f325bcca9a03697583d9f39a91ecf1ec1ecb66...,2018-07-05T00:00:00Z,http://www.queenelizabetholympicpark.co.uk/-/m...,,
1,local-authority-eng:AMB,c0fcdc64525d3be69736427cd397bf5a3c2d68104e8a90...,2018-05-22T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,2017-12-22,
2,local-authority-eng:AMB,1945e57b32134a160bfb4972bcd6f86935b4c6fc0bd000...,2019-11-23T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,,
3,local-authority-eng:AMB,239f7a38efe51fb6271d40bd76dd7413fecb458a5c4d1e...,2019-11-23T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,,
4,local-authority-eng:AMB,533d03f6fd2b5717597c4f3057106b907f4b23c8fb2e7b...,2020-05-21T00:00:00Z,http://info.ambervalley.gov.uk/shareddatasets/...,2020-05-21,
...,...,...,...,...,...,...
101,local-authority-eng:WYE,36c1c91ee6ec46da952246ce66916adf3200f7b14d42f5...,2018-05-22T00:00:00Z,http://www.wyreforestdc.gov.uk/media/3585747/W...,,
102,local-authority-eng:WYO,466796e9f6e18591dead03d9e7836e02d6494ec1a5ac9e...,2018-05-22T00:00:00Z,http://data.wycombe.gov.uk/download/planning/b...,2019-06-21,
103,local-authority-eng:WYO,7b6a413b1c1018215fe400dae82e6366b80fb8f89c0240...,2019-11-24T00:00:00Z,http://data.wycombe.gov.uk/download/planning/b...,2019-08-21,
104,local-authority-eng:WYO,daf78166cf6af62a0b40625a2fd6c8401f27b9fffab4a9...,2019-11-24T00:00:00Z,http://data.wycombe.gov.uk/download/planning/b...,2019-08-21,


# Live endpoints with stale data

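These are live endpoints that have returned the same resource on more than `staleness_days` successful collection runs. The count of log entries stands in for days, on the assumption that each endpoint is collected once a day.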
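To make the counting logic concrete, here is a minimal sketch on a toy in-memory `log` table. It assumes one log row per endpoint per collection run, which is what lets a row count approximate days; the endpoint and resource names are placeholders:

```python
import sqlite3

cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE log (endpoint TEXT, resource TEXT, status TEXT)")

# "ep-a" returned the same resource on 5 runs;
# "ep-b" returned one resource twice and then a new one.
rows = (
    [("ep-a", "res-1", "200")] * 5
    + [("ep-b", "res-2", "200")] * 2
    + [("ep-b", "res-3", "200")]
)
cnx.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)

staleness_days = 3  # toy threshold, far smaller than the notebook's value

# Count successful fetches per (endpoint, resource) and keep the long runs.
stale = cnx.execute(
    """
    SELECT endpoint, resource, COUNT(resource) AS days_unchanged
    FROM log
    WHERE status = '200'
    GROUP BY endpoint, resource
    HAVING days_unchanged > ?
    ORDER BY days_unchanged DESC
    """,
    (staleness_days,),
).fetchall()
print(stale)
```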

In [5]:
sql = F"""
	select src.organisation, src.start_date, src.end_date,  log.endpoint, log.resource, count (log.resource) as days_unchanged from log 
    inner join source src on src.endpoint = log.endpoint
	where log.status = 200
	and src.end_date = ""
	and src.collection = "{collection}"
    group by 1,2, 3, 4, 5
	having days_unchanged > {staleness_days}
	order by days_unchanged desc
	limit 100
"""

query_datasette(sql)

Unnamed: 0,organisation,start_date,end_date,endpoint,resource,days_unchanged
0,local-authority-eng:ALL,2017-12-20,,9c2e8adfd12b4f474e7d511580029d1e69d1e08a17e0cb...,3b694528538878d378fd6892a649fa634f0013ef299331...,1307
1,local-authority-eng:BDF,,,f8979dadf073ed0fa1373b6c018fa376833c5aa9e740a2...,12e709c847614f924fc6bd9dcb9c162816e59623887765...,1307
2,local-authority-eng:CAN,2017-12-01,,660ed3b8f8bffe8a326dfb597ddce7a3b4c9b28a04fe78...,4c524bd78ea05989aaaabdcad0cae87d48f4ad9eaf18a4...,1307
3,local-authority-eng:CAS,2017-12-01,,b7d3310848346fdcc872fb40dcb80697f05dfa1f593565...,cd0c958ec2843364957c2fd9757151fdfb6659bb8e0d51...,1307
4,local-authority-eng:CHR,,,26ed7ad003da61c66f9656ba61750b14a53b3ac74a1865...,1a349651d94dfdbdc44773a256313c391c7418f0bcad53...,1307
...,...,...,...,...,...,...
95,local-authority-eng:SBU,,,024ad59af8227d3c99024a506a1a87cd9106174b96e732...,024295e6c9a6d22e57e341b08dd494173b8e81c999f2c4...,1288
96,local-authority-eng:WKF,2018-11-27,,7a0cd914e9242ffb452d9baf90762aefddc652b495a767...,386d71e880a38c1c9cbb4bafd6a65d7dfb49aad9067d74...,1288
97,local-authority-eng:WRL,,,560d9611860046e86e142c95bf632c39b28f67e03bbd9e...,6a9a2fadde5d38023cc88d97ce04c3e52781d6f5332c79...,1288
98,local-authority-eng:YOR,2017-12-22,,2fc9a0b88861aa02584f2b90292bff7e1ccba9d420ef08...,7bbf483dd896de667a8476ab102c8aa657f01c9cb8dea5...,1288


# Endpoints with no documentation URL

This is restricted to live sources from local planning authorities (LPAs).

In [6]:
sql = F"""
select organisation, entry_date,  source, start_date from source 
where collection = "{collection}" and documentation_url = ""  and end_date = "" and organisation like "local-authority-eng%"  
order by 2 
"""

query_datasette(sql)


Unnamed: 0,organisation,entry_date,source,start_date
0,local-authority-eng:HAL,2018-05-22T00:00:00Z,59748eb2bbe6634a99620f87acaf937e,2017-12-31
1,local-authority-eng:BPL,2018-05-22T00:00:00Z,fd37d9b9b86dea6c6bc26b960fcd8cea,2018-01-04
2,local-authority-eng:BAB,2018-05-22T00:00:00Z,f92ceb2080500b1b68f7d4b33ceadedc,2017-12-21
3,local-authority-eng:THE,2018-05-22T00:00:00Z,f7721c61cb8732bf0fcad7083400d972,
4,local-authority-eng:HIG,2018-05-22T00:00:00Z,efb9c28a1591def1700ee912bb07bfe1,
...,...,...,...,...
70,local-authority-eng:LCR,2020-03-10T00:00:00Z,4d9b46ce85aa741040eb3d5da299b9ed,2020-03-10
71,local-authority-eng:LCR,2020-03-10T00:00:00Z,67b0b64057e8e29ca86422d4df5efffd,2020-03-10
72,local-authority-eng:BUC,2021-08-26T00:00:00Z,69757e95c1b4df45ad4758df10374530,
73,local-authority-eng:WNUA,2021-08-26T00:00:00Z,c58211a8f5e8cf293ed2aa89c6a98ef0,


# Recently added endpoints with no start date

A recently added endpoint should have a start date, either derived from the LPA documentation page or from the data.
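The cutoff test relies on ISO 8601 date strings sorting in chronological order, so a date-only cutoff can be compared directly with full timestamps. A short sketch of that comparison in plain Python, using timestamps in the same format as the `entry_date` column:

```python
recent_entry_cutoff = "2023-06-01"

entry_dates = [
    "2023-07-04T07:07:32Z",
    "2023-06-28T13:13:00Z",
    "2018-05-22T00:00:00Z",
]

# ISO 8601 strings compare chronologically, so plain string comparison works
# even though the cutoff has no time component.
recent = [d for d in entry_dates if d >= recent_entry_cutoff]
print(recent)
```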

In [7]:
recent_entry_cutoff = "2023-06-01"

sql = F"""
    select src.organisation, src.documentation_url, src.start_date as start_date, src.end_date, ep.endpoint, ep.endpoint_url, ep.entry_date as entry_date
    from source src
    inner join endpoint ep on ep.endpoint = src.endpoint
    where src.collection = "{collection}"
    and ep.entry_date >= "{recent_entry_cutoff}"
    and ep.start_date = ""
    and src.end_date = ""
    order by entry_date DESC, src.organisation
    """

query_datasette(sql)


Unnamed: 0,organisation,documentation_url,start_date,end_date,endpoint,endpoint_url,entry_date
0,local-authority-eng:BLA,https://www.blaby.gov.uk/planning-and-building...,,,6b00487d6258bffd58dc54266fbc8d2426b4b5551e5ceb...,https://www.blaby.gov.uk/media/1757/brownfield...,2023-07-04T07:07:32Z
1,local-authority-eng:BIR,https://www.birmingham.gov.uk/info/20054/local...,,,41c6f5b8827fbd4a23949e9a354c5d527b14b978734e7a...,https://www.birmingham.gov.uk/download/downloa...,2023-07-04T07:07:31Z
2,local-authority-eng:BEN,https://www.brent.gov.uk/planning-and-building...,,,2d4bee3ace07e537a3dc44610fd48fed3c629f69b70ff4...,https://legacy.brent.gov.uk/media/16420302/bro...,2023-07-04T07:07:30Z
3,local-authority-eng:BEX,https://www.bexley.gov.uk/services/planning-an...,,,06aee45c3ecbb755ddf3a2184a3858308163456238d88e...,https://www.bexley.gov.uk/sites/default/files/...,2023-07-04T07:07:30Z
4,local-authority-eng:BAR,https://www.barrowbc.gov.uk/residents/planning...,,,c78c84a3bde8a5ea9bad3faa9e9a1146679975a454eb84...,https://www.barrowbc.gov.uk/_resources/assets/...,2023-07-04T07:07:29Z
5,local-authority-eng:BBD,https://www.blackburn.gov.uk/planning/planning...,,,c618cc6e158b44a78bcab28f07508182b8af75e818e075...,https://blackburn-darwen.org.uk/wp-content/upl...,2023-07-04T07:07:29Z
6,local-authority-eng:BAN,,,,8440f35a3e42919f5e9c851e98d5445a3c418c8ac314f6...,https://www.basingstoke.gov.uk/content/page/72...,2023-06-28T13:13:00Z


# Inspect a single LPA

You can use the query below to get a quick overview of the endpoints we hold for a single LPA in the collection.

In [22]:
organisation= "local-authority-eng:BAR"


sql = F"""
    select src.organisation, src.documentation_url,  src.start_date, src.end_date, ep.endpoint_url, ep.endpoint, ep.entry_date as ep_entry_date,  ep.start_date as ep_start_date, ep.end_date as ep_end_date, ep.plugin
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.organisation = "{organisation}"
    and src.collection = "{collection}"
    order by ep.entry_date
    """

endpoints = query_datasette(sql)

for _, row in endpoints.iterrows():
    # f-strings avoid a TypeError if a column is NULL (None) in the database
    print(f"Organisation: {row['organisation']}")
    print(f"documentation_url: {row['documentation_url']}")
    print(f"endpoint_url: {row['endpoint_url']}")
    print("plugin:  ", row["plugin"])
    print("ep_entry_date:", row["ep_entry_date"])
    print("ep_start_date:", row["ep_start_date"])
    print("ep_end_date:  ", row["ep_end_date"])

    print("\n")


Organisation: local-authority-eng:BAR
documentation_url: https://www.barrowbc.gov.uk/residents/planning/planning-policy/brownfield-land-register/
endpoint_url: https://data.barrowbc.gov.uk/dataset/3f587e30-a91e-4d0e-9525-a660706df8c0/resource/af1eb832-575e-46a0-a615-666daa5ab196/download/barrow-in-furnessbrownfieldregister2017-12-20rev1.csv
plugin:   
ep_entry_date: 2018-05-22T00:00:00Z
ep_start_date: 2017-12-20
ep_end_date:   


Organisation: local-authority-eng:BAR
documentation_url: https://www.barrowbc.gov.uk/residents/planning/planning-policy/brownfield-land-register/
endpoint_url: https://data.barrowbc.gov.uk/dataset/3f587e30-a91e-4d0e-9525-a660706df8c0/resource/e6c654ca-95c8-4896-a33c-43cd88bbad63/download/barrow-in-furnessbrownfieldregister2018-06-28rev1.csv
plugin:   
ep_entry_date: 2019-12-01T00:00:00Z
ep_start_date: 2018-06-28
ep_end_date:   


Organisation: local-authority-eng:BAR
documentation_url: https://www.barrowbc.gov.uk/residents/planning/planning-policy/brownfield-l

In [9]:
# All endpoints for LPAs in the collection, live or ended
sql = F"""
    select src.organisation, src.documentation_url, src.start_date, src.end_date, ep.endpoint_url, ep.endpoint, ep.entry_date,  ep.start_date, ep.end_date
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}"
    and organisation like "local-authority-eng:%"
    order by 1, 2
    """

query_datasette(sql)



Unnamed: 0,organisation,documentation_url,start_date,end_date,endpoint_url,endpoint,entry_date,start_date.1,end_date.1
0,local-authority-eng:ADU,https://www.adur-worthing.gov.uk/planning-poli...,,2019-12-17,"https://www.adur-worthing.gov.uk/media/media,1...",3106ba8d16954b9e21a902c13a49046a0b2d37e4a8135c...,2018-05-22T00:00:00Z,,2019-12-17
1,local-authority-eng:ADU,https://www.adur-worthing.gov.uk/planning-poli...,2019-12-17,,"https://www.adur-worthing.gov.uk/media/media,1...",87f2583f9562c85268d236c150dcbb9da3b373fdb28dab...,2019-12-18T00:00:00Z,2019-12-17,
2,local-authority-eng:ADU,https://www.adur-worthing.gov.uk/planning-poli...,2023-07-06,,"https://www.adur-worthing.gov.uk/media/Media,1...",ea98ea4d156ee47ff09af98d96d09951395b58e66d8b5f...,2023-07-06T11:11:52Z,2023-07-06,
3,local-authority-eng:ALL,https://www.allerdale.gov.uk/en/planning-build...,2017-12-20,,https://df4iy9syor5px.cloudfront.net/media/fil...,9c2e8adfd12b4f474e7d511580029d1e69d1e08a17e0cb...,2018-05-22T00:00:00Z,2017-12-20,
4,local-authority-eng:ALL,https://www.allerdale.gov.uk/en/planning-build...,2017-12-20,2019-11-25,https://www-cloudfront.allerdale.gov.uk/media/...,5a0fcb2fdbe9d6f407b554642ab661a897a02c7a9e068a...,2018-07-30T00:00:00Z,2017-12-20,2019-11-25
...,...,...,...,...,...,...,...,...,...
1100,local-authority-eng:WYR,https://www.wyre.gov.uk/info/200317/planning_p...,2020-10-08,,http://www.wyre.gov.uk/download/downloads/id/6...,4f2c2b32a7fb44c2a778354ff3a42688f888f42bd7a344...,2020-10-08T00:00:00Z,2020-10-08,
1101,local-authority-eng:YOR,,,,https://opendata.arcgis.com/datasets/24e275a6e...,425a3e0cf53ef4980e9c133f850a6e9e1a34cfd2e444e4...,2018-05-22T00:00:00Z,,
1102,local-authority-eng:YOR,https://www.york.gov.uk/BrownfieldRegister,2017-12-22,,https://data.yorkopendata.org/dataset/7b937604...,ed8725acf5769b2dd2467dcab1783eebc46afcdd7718d6...,2019-12-01T00:00:00Z,2017-12-22,
1103,local-authority-eng:YOR,https://www.york.gov.uk/BrownfieldRegister,2017-12-22,,https://data.yorkopendata.org/dataset/7b937604...,2fc9a0b88861aa02584f2b90292bff7e1ccba9d420ef08...,2019-12-01T00:00:00Z,2017-12-22,


In [10]:
# Organisations with an endpoint added in current_year.
# LPAs missing from this list are the ones we have not updated.

sql = F"""
    select organisation, strftime('%Y',ep.entry_date) as year, strftime('%m',ep.entry_date) as month
    from source src
    inner join endpoint ep on ep.endpoint = src.endpoint
    where src.collection = "{collection}"
    and year = "{current_year}"
    order by year, month
    """

query_datasette(sql)

Unnamed: 0,organisation,year,month
0,local-authority-eng:BAN,2023,06
1,local-authority-eng:LEE,2023,07
2,local-authority-eng:WOT,2023,07
3,local-authority-eng:BAR,2023,07
4,local-authority-eng:BBD,2023,07
...,...,...,...
77,local-authority-eng:ERY,2023,07
78,local-authority-eng:ESK,2023,07
79,local-authority-eng:EST,2023,07
80,local-authority-eng:EXE,2023,07


In [19]:
# Organisations with more than one live endpoint (no end date)
sql = F"""
    select organisation, count(ep.endpoint) as count
	from source src 
	inner join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}" 
    and ep.end_date == ""
    group by 1
    having count > 1
    order by 2 desc
    """

query_datasette(sql)

Unnamed: 0,organisation,count
0,local-authority-eng:BAR,12
1,local-authority-eng:BNH,9
2,local-authority-eng:WIN,8
3,local-authority-eng:SND,6
4,local-authority-eng:PLY,6
...,...,...
240,local-authority-eng:BEX,2
241,local-authority-eng:BDG,2
242,local-authority-eng:ALL,2
243,local-authority-eng:ADU,2


In [12]:
# Ended endpoints (those with an end date), with the year each was added
sql = F"""
    select organisation, ep.*, strftime('%Y',ep.entry_date) as year
    from source src 
	left outer join endpoint ep on ep.endpoint = src.endpoint
	where src.collection = "{collection}" 
    and ep.end_date != ""
    order by 1
    """

query_datasette(sql)

Unnamed: 0,organisation,end_date,endpoint,endpoint_url,entry_date,parameters,plugin,start_date,year
0,local-authority-eng:ADU,2019-12-17,3106ba8d16954b9e21a902c13a49046a0b2d37e4a8135c...,"https://www.adur-worthing.gov.uk/media/media,1...",2018-05-22T00:00:00Z,,,,2018
1,local-authority-eng:ALL,2019-11-25,5a0fcb2fdbe9d6f407b554642ab661a897a02c7a9e068a...,https://www-cloudfront.allerdale.gov.uk/media/...,2018-07-30T00:00:00Z,,,2017-12-20,2018
2,local-authority-eng:ASF,2020-03-06,d39c74cd0801e8161d4e16dfedbd5a396d609ba6ad628b...,https://www.ashford.gov.uk/media/5782/ashford_...,2018-05-22T00:00:00Z,,,2018-02-26,2018
3,local-authority-eng:ASF,2020-03-06,32c99b76fe0a4d0074672adc8596bdd05155f9e60949b7...,https://www.ashford.gov.uk/media/7259/ashford_...,2019-11-24T00:00:00Z,,,2018-12-17,2019
4,local-authority-eng:ASH,2019-11-25,6aaff7289b4e8add599540462712becdb1f33f6cb9a048...,https://www.ashfield.gov.uk/media/3811/ashfiel...,2018-05-22T00:00:00Z,,,2017-12-18,2018
...,...,...,...,...,...,...,...,...,...
320,national-park-authority:Q72617669,2020-04-02,62602294b4f4f77f8b25db64986ec4dfa3c19e4c0e2f1e...,https://www.northyorkmoors.org.uk/__data/asset...,2019-12-19T00:00:00Z,,,2019-12-17,2019
321,national-park-authority:Q72617784,2020-05-08,687aab26f2ffbf2738a0b3abf3f164d0795ce933decb32...,http://www.exmoor-nationalpark.gov.uk/__data/a...,2018-05-22T00:00:00Z,,,2017-12-19,2018
322,national-park-authority:Q72617784,2020-05-08,20d3494f78924f4f202612668ad78949b8fad0726890e2...,https://www.exmoor-nationalpark.gov.uk/__data/...,2019-12-14T00:00:00Z,,,,2019
323,national-park-authority:Q72617988,2019-12-14,f88e661f7b1e5b3dde4af9bf69d323261dec7cfd81eebd...,http://www.peakdistrict.gov.uk/__data/assets/f...,2018-05-22T00:00:00Z,,,2017-12-18,2018


In [13]:
# Ended endpoints for LPAs that have no endpoint added in current_year,
# saved as a hit list for follow-up
sql = F"""
  select src.organisation, src.documentation_url, ep.* from source src
  inner join endpoint ep on src.endpoint = ep.endpoint
  where src.collection = "{collection}"
  and src.organisation not in (SELECT organisation FROM {viewName})
  and src.organisation like "local-authority-eng%"
  and ep.end_date != ""
  order by src.organisation, ep.start_date DESC NULLS LAST
"""

df = query_datasette(sql)
df.to_csv('hitlist.csv', index=False)

df

Unnamed: 0,organisation,documentation_url,end_date,endpoint,endpoint_url,entry_date,parameters,plugin,start_date
0,local-authority-eng:BDF,https://www.bedford.gov.uk/planning-and-buildi...,2019-11-01,24c96d2f4f5dd16c24a94b871f6660752c8be5d26b88a6...,https://www.bedford.gov.uk/environment_and_pla...,2018-05-22T00:00:00Z,,,
1,local-authority-eng:CAB,https://www.cambridge.gov.uk/brownfield-land-r...,2019-11-24,2ef4e42d7e4d6124010ed891ecf89cc131f7f9098abbf6...,https://www.cambridge.gov.uk/sites/default/fil...,2018-05-22T00:00:00Z,,,
2,local-authority-eng:CHN,,2019-11-29,1f4daca5b252bd8ed2437f533c30c57937d63b45b92051...,http://www.chiltern.gov.uk/media/11738/Chilter...,2018-05-22T00:00:00Z,,,
3,local-authority-eng:CHO,,2020-01-30,f3b25a2c0d403adb6d72d4cee56c1a977e18aaff86d7d8...,http://chorley.gov.uk/Documents/Planning/Plann...,2018-05-22T00:00:00Z,,,2017-10-12
4,local-authority-eng:CHW,https://consult.cheshirewestandchester.gov.uk/...,2020-03-14,197d1b990b5cc6ebfb194ff9d039de5a2b85af6b732e26...,https://consult.cheshirewestandchester.gov.uk/...,2019-12-08T00:00:00Z,,,
...,...,...,...,...,...,...,...,...,...
225,local-authority-eng:WYC,https://www.wychavon.gov.uk/brownfield-land-re...,2019-11-30,101ca55e27b01d7632a888af70f3dc382c05aa83e2e496...,https://www.wychavon.gov.uk/documents/10586/88...,2018-05-22T00:00:00Z,,,
226,local-authority-eng:WYE,http://www.wyreforestdc.gov.uk/planning-and-bu...,2020-07-08,000a05de97fae4ea6e8c7ebbb6adf7ed07f595ef5cf322...,http://www.wyre.gov.uk/download/downloads/id/5...,2019-12-22T00:00:00Z,,,
227,local-authority-eng:WYO,https://www.wycombe.gov.uk/pages/Planning-and-...,2019-11-01,1234a961deb842fca9c1a660ca8d02ae26a8273341df8a...,https://www.wycombe.gov.uk/uploads/public/docu...,2019-10-21T00:00:00Z,,,
228,local-authority-eng:WYR,,2020-10-08,f52de3d0d6050f4313917a3cd4e0728a3400a62df501f8...,http://www.wyre.gov.uk/download/downloads/id/4...,2018-05-22T00:00:00Z,,,2019-11-24


In [14]:
# LPAs in the collection that have no endpoint added in current_year
sql = F"""
  select distinct src.organisation from source src
  where src.collection = "{collection}"
  and src.organisation not in (SELECT organisation FROM {viewName})
  and src.organisation like "local-authority-eng%"
  order by src.organisation
"""
df = query_datasette(sql)

print ("These organisations need checking")
df

These organisations need checking


Unnamed: 0,organisation
0,local-authority-eng:AYL
1,local-authority-eng:BAS
2,local-authority-eng:BDF
3,local-authority-eng:BDG
4,local-authority-eng:BUR
...,...
249,local-authority-eng:WYC
250,local-authority-eng:WYE
251,local-authority-eng:WYO
252,local-authority-eng:WYR
