I've found some issues with the `get_endpoints` and `get_latest_endpoints` queries, which suggest that they may not return the latest resource for an endpoint.

The issue was first noticed while looking at the compliance report and comparing, for a single resource, the number of records returned by the `get_fields_for_resource()` and `get_column_mappings_for_resource()` functions. Some resources had column mapping results but no results for fields.

In [31]:
import numpy as np
import pandas as pd
import os
import urllib.parse
from master_report_endpoint_utils import *

pd.set_option("display.max_rows", 100)
# %pip install wget
# import wget

In [None]:
datasette_url = "https://datasette.planning.data.gov.uk/"

def get_datasette_results(sql):
  
    params = urllib.parse.urlencode({
        "sql": "{}".format(sql),
        "_size": "max"
    })
    
    url = f"{datasette_url}digital-land.csv?{params}"
    resource_df = pd.read_csv(url)
    return resource_df


In [5]:
# get endpoints for a single LPA and look at those for TPO
dov_endpoints_df = get_endpoints("local-authority-eng:DOV")

print(len(dov_endpoints_df))
dov_endpoints_df[dov_endpoints_df["collection"] == "tree-preservation-order"]

24


Unnamed: 0,endpoint_url,status,exception,collection,pipelines,organisation,name,resource,maxentrydate,entrydate,end_date
1,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,200.0,,tree-preservation-order,tree,local-authority-eng:DOV,Dover District Council,12d72e771b966bc0d9234fc76bf8adcd454240600376ce...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z,
5,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,200.0,,tree-preservation-order,tree-preservation-order,local-authority-eng:DOV,Dover District Council,f0082ba711ef431ccd5cc0c23c8c643fcbd6aec1c37161...,2023-12-22T00:14:17Z,2023-12-21T09:09:57Z,2023-12-22
18,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,200.0,,tree-preservation-order,tree-preservation-zone,local-authority-eng:DOV,Dover District Council,0519df49c2ecc3c53948b4283704bfd5b905ac4db6e4b5...,2024-02-29T00:14:46Z,2023-10-11T11:11:16Z,
23,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,200.0,,tree-preservation-order,tree-preservation-order,local-authority-eng:DOV,Dover District Council,2405352ba05c212e9734c05d03ca1bf9500a346b600878...,2024-02-29T00:14:46Z,2023-12-22T10:10:13Z,


In this example, `0519df49c2ecc3c53948b4283704bfd5b905ac4db6e4b5a0ae709c1fc495bc81` is the resource returned for the tree-preservation-zone dataset, with a maxentrydate of 2024-02-29T00:14:46Z.

In [61]:
# simplified example of the get_endpoints function query

q1 = """
select
    l.endpoint,
    l.status,
    l.exception,
    s.collection,
    re.resource,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    log l
    inner join source s on l.endpoint = s.endpoint
    inner join resource_endpoint re on l.endpoint = re.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
where
    s.organisation = "local-authority-eng:DOV" and collection="tree-preservation-order" 
    
order by log_entry_date desc """

q1_df = get_datasette_results(q1)

q1_df.sort_values("log_entry_date", ascending = False).head(10)

Unnamed: 0,endpoint,status,exception,collection,resource,log_entry_date,endpoint_entry_date
0,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,0519df49c2ecc3c53948b4283704bfd5b905ac4db6e4b5...,2024-02-29T00:14:46Z,2023-10-11T11:11:16Z
44,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,aecba0ef0580987bf09ec391af53583fa30bcfedb0545e...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
31,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,411b42fc174616017e4e284e8c197f310d155c709edb42...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
32,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,48f85b32b46fab4cd285f06901ea7699b704bfbdcac0a8...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
33,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,4df3166fc737a10c4be8a181277bba2d099e1739613e09...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
34,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,5e8e9ad7940d36ca4acece400ab675b592be6808671a6c...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
35,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,6d4f1c13de56cebeb1788d52d77326437545ba68853afd...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
36,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,6f5c36a9594d79727cb37b2446ea932757ce248598fb74...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
37,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
38,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,7e9a0e71f3ddfe5a0fce6594f2e66ba58ce27f375b2b67...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z


In [29]:
# count of records with the same date
print(len(q1_df[q1_df["log_entry_date"] == "2024-02-29T00:14:46Z"]))
# q1_df[q1_df["log_entry_date"] == "2024-02-29T00:14:46Z"]

57


This returns many resources for the same endpoint and collection, all with the same log_entry_date, which doesn't make sense.

This happens because of the join from log to resource_endpoint (`l.endpoint = re.endpoint`), which for each row in log brings through every resource that the endpoint has ever had.
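To illustrate the fan-out, here is a minimal pandas sketch of the same join. The two small tables are made-up stand-ins for `log` and `resource_endpoint` (the endpoint and resource names `e1`, `r1`, `r2` are hypothetical), so this only shows the shape of the problem, not the real data.

```python
import pandas as pd

# Hypothetical miniature tables: one endpoint logged on three days,
# which has served two different resources over its lifetime.
log = pd.DataFrame({
    "endpoint": ["e1", "e1", "e1"],
    "entry_date": ["2024-02-27", "2024-02-28", "2024-02-29"],
    "resource": ["r1", "r2", "r2"],
})
resource_endpoint = pd.DataFrame({
    "endpoint": ["e1", "e1"],
    "resource": ["r1", "r2"],
})

# Joining on endpoint alone pairs every log row with every resource
# the endpoint has ever had, so the row count multiplies.
fanned_out = log.merge(resource_endpoint, on="endpoint", suffixes=("_log", "_re"))
print(len(log), len(fanned_out))

# On the latest log date the join reports both resources, although the
# log itself recorded only one of them on that date.
latest = fanned_out[fanned_out["entry_date"] == fanned_out["entry_date"].max()]
print(latest[["entry_date", "resource_log", "resource_re"]])
```

Here `resource_log` is what the log recorded on each date, while `resource_re` is just every resource ever linked to the endpoint, which is why picking a resource from the joined column can give a stale one.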

Taking out that join and using the resource field in the log table gives results which look more sensible, with three endpoints which each have a log entry for each day:

In [37]:
q2 = """
select
    l.endpoint,
    l.status,
    l.exception,
    s.collection,
    l.resource,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    log l
    inner join source s on l.endpoint = s.endpoint

    inner join endpoint e on l.endpoint = e.endpoint
where
    s.organisation = "local-authority-eng:DOV" and s.collection="tree-preservation-order" 
    
order by log_entry_date desc
"""

q2_df = get_datasette_results(q2)

# q1_df.sort_values("log_entry_date", ascending = False)
print(len(q2_df))
q2_df.head(20)

340


Unnamed: 0,endpoint,status,exception,collection,resource,log_entry_date,endpoint_entry_date
0,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,2024-02-29T00:14:46Z,2023-10-11T11:11:16Z
1,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
2,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,2024-02-29T00:14:46Z,2023-12-22T10:10:13Z
3,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,2024-02-28T00:14:56Z,2023-10-11T11:11:16Z
4,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,2024-02-28T00:14:56Z,2023-10-26T12:12:45Z
5,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,2024-02-28T00:14:56Z,2023-12-22T10:10:13Z
6,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,2024-02-27T00:14:39Z,2023-10-11T11:11:16Z
7,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,2024-02-27T00:14:39Z,2023-10-26T12:12:45Z
8,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,2024-02-27T00:14:39Z,2023-12-22T10:10:13Z
9,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,2024-02-26T00:15:15Z,2023-10-11T11:11:16Z


In [36]:
# we can add in pipelines too
q3 = """
select
    l.endpoint,
    l.status,
    l.exception,
    s.collection,
    l.resource,
    sp.pipeline,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    log l
    inner join source s on l.endpoint = s.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
    
    inner join source_pipeline sp on s.source = sp.source
where
    s.organisation = "local-authority-eng:DOV" and s.collection="tree-preservation-order" 
    
order by log_entry_date desc
"""

q3_df = get_datasette_results(q3)

# q1_df.sort_values("log_entry_date", ascending = False)
print(len(q3_df))
q3_df.head(20)

340


Unnamed: 0,endpoint,status,exception,collection,resource,pipeline,log_entry_date,endpoint_entry_date
0,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-29T00:14:46Z,2023-10-11T11:11:16Z
1,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z
2,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,tree-preservation-order,2024-02-29T00:14:46Z,2023-12-22T10:10:13Z
3,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-28T00:14:56Z,2023-10-11T11:11:16Z
4,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-28T00:14:56Z,2023-10-26T12:12:45Z
5,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,tree-preservation-order,2024-02-28T00:14:56Z,2023-12-22T10:10:13Z
6,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-27T00:14:39Z,2023-10-11T11:11:16Z
7,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-27T00:14:39Z,2023-10-26T12:12:45Z
8,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,tree-preservation-order,2024-02-27T00:14:39Z,2023-12-22T10:10:13Z
9,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-26T00:15:15Z,2023-10-11T11:11:16Z


In [67]:
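# add a rank within each endpoint / status / pipeline group
# (note: as written this orders by resource hash, not by log entry date)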
q4 = """
select
--    e.endpoint_url,
    l.endpoint,
    l.status,
    l.exception,
    s.collection,
    l.resource,
    sp.pipeline,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date,
    
    RANK() OVER(
      PARTITION BY l.endpoint, l.status, sp.pipeline
      ORDER BY l.resource DESC
      ) as resource_rank
    
from
    log l
    inner join source s on l.endpoint = s.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
    
    inner join source_pipeline sp on s.source = sp.source
where
    s.organisation = "local-authority-eng:DOV" and s.collection="tree-preservation-order" 

order by log_entry_date desc
"""

q4_df = get_datasette_results(q4)

# q1_df.sort_values("log_entry_date", ascending = False)
print(len(q4_df))
q4_df.head().values

343


array([['0cfd31d51f6c7f8970b87a59b295083e383661d48ec12ea3ff5f652498b9b867',
        200, nan, 'tree-preservation-order',
        '73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b869dc397998ac795ef',
        'tree', '2024-03-01T00:16:36Z', '2023-10-26T12:12:45Z', 52],
       ['c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756c63cabdd8a01e065e7',
        200, nan, 'tree-preservation-order',
        '10e2dc4aa70c8612fdf2737847db1c94364e745658623748c69921ac40be0345',
        'tree-preservation-zone', '2024-03-01T00:16:36Z',
        '2023-10-11T11:11:16Z', 104],
       ['ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8e9e5158774d1d5824a',
        200, nan, 'tree-preservation-order',
        'd73e0e40dc03a8677414002059eef97446863a94a636b3e4d90a830917b28365',
        'tree-preservation-order', '2024-03-01T00:16:36Z',
        '2023-12-22T10:10:13Z', 15],
       ['0cfd31d51f6c7f8970b87a59b295083e383661d48ec12ea3ff5f652498b9b867',
        200, nan, 'tree-preservation-order',
        '73d773620d7c7bc689

In [68]:
q4_df[q4_df["endpoint"] == "0cfd31d51f6c7f8970b87a59b295083e383661d48ec12ea3ff5f652498b9b867"]

Unnamed: 0,endpoint,status,exception,collection,resource,pipeline,log_entry_date,endpoint_entry_date,resource_rank
0,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-03-01T00:16:36Z,2023-10-26T12:12:45Z,52
3,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z,52
6,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-28T00:14:56Z,2023-10-26T12:12:45Z,52
9,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-27T00:14:39Z,2023-10-26T12:12:45Z,52
12,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-26T00:15:15Z,2023-10-26T12:12:45Z,52
...,...,...,...,...,...,...,...,...,...
319,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,aecba0ef0580987bf09ec391af53583fa30bcfedb0545e...,tree,2023-10-30T00:15:48Z,2023-10-26T12:12:45Z,16
321,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,aecba0ef0580987bf09ec391af53583fa30bcfedb0545e...,tree,2023-10-29T00:16:04Z,2023-10-26T12:12:45Z,16
323,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,aecba0ef0580987bf09ec391af53583fa30bcfedb0545e...,tree,2023-10-28T00:13:20Z,2023-10-26T12:12:45Z,16
325,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,aecba0ef0580987bf09ec391af53583fa30bcfedb0545e...,tree,2023-10-27T00:14:05Z,2023-10-26T12:12:45Z,16


In [62]:
q4_df.head(10)

Unnamed: 0,endpoint,status,exception,collection,resource,pipeline,log_entry_date,endpoint_entry_date,rank
0,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z,1
1,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-29T00:14:46Z,2023-10-11T11:11:16Z,1
2,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,tree-preservation-order,2024-02-29T00:14:46Z,2023-12-22T10:10:13Z,1
3,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-28T00:14:56Z,2023-10-26T12:12:45Z,2
4,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-28T00:14:56Z,2023-10-11T11:11:16Z,2
5,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,tree-preservation-order,2024-02-28T00:14:56Z,2023-12-22T10:10:13Z,2
6,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-27T00:14:39Z,2023-10-26T12:12:45Z,3
7,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-27T00:14:39Z,2023-10-11T11:11:16Z,3
8,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,tree-preservation-order,2024-02-27T00:14:39Z,2023-12-22T10:10:13Z,3
9,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-26T00:15:15Z,2023-10-26T12:12:45Z,4


In [41]:
q4_df[q4_df["resource_rank"] == 1]

Unnamed: 0,endpoint,status,exception,collection,resource,pipeline,log_entry_date,endpoint_entry_date,rank
0,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,2024-02-29T00:14:46Z,2023-10-26T12:12:45Z,1
1,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,2024-02-29T00:14:46Z,2023-10-11T11:11:16Z,1
2,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,200,,tree-preservation-order,d73e0e40dc03a8677414002059eef97446863a94a636b3...,tree-preservation-order,2024-02-29T00:14:46Z,2023-12-22T10:10:13Z,1
209,3c46c9628d91bb798b66479672035f07958c1aa5055924...,200,,tree-preservation-order,f0082ba711ef431ccd5cc0c23c8c643fcbd6aec1c37161...,tree-preservation-order,2023-12-22T00:14:17Z,2023-12-21T09:09:57Z,1
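
The same "latest row per endpoint and pipeline" filter can also be done client-side in pandas once the log rows are downloaded. This is a rough sketch on made-up rows (the `e1`/`r1` style values and the `date_rank` column name are hypothetical). It numbers rows by `log_entry_date` descending, so it behaves like `ROW_NUMBER()` rather than `RANK()`.

```python
import pandas as pd

# Hypothetical log rows for two endpoints over two days
log_df = pd.DataFrame({
    "endpoint": ["e1", "e1", "e2", "e2"],
    "pipeline": ["tree", "tree", "tree-preservation-zone", "tree-preservation-zone"],
    "resource": ["r1", "r2", "r3", "r3"],
    "log_entry_date": [
        "2024-02-28T00:14:56Z", "2024-02-29T00:14:46Z",
        "2024-02-28T00:14:56Z", "2024-02-29T00:14:46Z",
    ],
})

# ISO 8601 timestamps in a single format sort correctly as strings,
# so sort newest first and number the rows within each group.
log_df = log_df.sort_values("log_entry_date", ascending=False)
log_df["date_rank"] = log_df.groupby(["endpoint", "pipeline"]).cumcount() + 1

# date_rank == 1 is the most recent log entry for each endpoint/pipeline
latest = log_df[log_df["date_rank"] == 1]
print(latest[["endpoint", "pipeline", "resource", "log_entry_date"]])
```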


In [39]:
q3_df[["endpoint", "pipeline"]].drop_duplicates()

Unnamed: 0,endpoint,pipeline
0,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,tree-preservation-zone
1,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,tree
2,ff199b1dc647436cb33b44f767c27b282ac993f2c0b8b8...,tree-preservation-order
210,3c46c9628d91bb798b66479672035f07958c1aa5055924...,tree-preservation-order


In [70]:
# recreating the get_endpoints function, but using the most_recent_log table instead

def get_endpoints_new(organisation):
    if organisation:
        query = f" s.organisation = '{organisation}'"
    else:
        query = f" s.organisation LIKE '%'"
    params = urllib.parse.urlencode({
        "sql": f"""
select
    e.endpoint_url,
    l.endpoint,
    l.status,
    l.exception,
    s.collection,
    l.resource,
    sp.pipeline,
    s.organisation,
    o.name,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    most_recent_log l
    inner join source s on l.endpoint = s.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
    inner join organisation o on o.organisation = replace(s.organisation, '-eng', '')
    inner join source_pipeline sp on s.source = sp.source
where
    {query} and not s.collection="brownfield-land" 

order by log_entry_date desc
        """,
        "_size": "max"
    })
    
    url = f"{datasette_url}digital-land.csv?{params}"

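    try:
        endpoints_df = pd.read_csv(url)
    except Exception:
        # if the request or parse fails, fall back to a one-row frame holding just the organisation
        endpoints_df = pd.DataFrame({"organisation":[organisation]})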
    
    return endpoints_df

Limitation of this approach:   
I switched to using the most_recent_log table because the old code wasn't correctly getting the latest resource. Correcting that code to rank and sort properly to get the latest resource causes problems with data limits and timeouts, so using the most_recent_log table is much easier. BUT this means that for endpoints whose latest status isn't 200, we can't get the resource from the last time the status was 200 from this table, which is what the old code did. (This could be added as another step, which wouldn't be too fiddly.)
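
A rough sketch of what that extra step might look like in pandas. It assumes we have separately pulled the full log rows for the endpoints of interest into a DataFrame with `endpoint`, `status`, `resource` and `log_entry_date` columns. The function name `add_last_good_resource` and the `latest_200_resource` column are made up for illustration, and the rows below are dummy data.

```python
import pandas as pd

def add_last_good_resource(latest_df, log_df):
    """Add the resource from each endpoint's most recent status-200 log row."""
    # keep only successful log rows that actually recorded a resource
    good = log_df[(log_df["status"] == 200) & log_df["resource"].notna()]
    # one row per endpoint: the latest successful one
    last_good = (
        good.sort_values("log_entry_date")
        .drop_duplicates("endpoint", keep="last")
        .set_index("endpoint")["resource"]
    )
    out = latest_df.copy()
    out["latest_200_resource"] = out["endpoint"].map(last_good)
    return out

# Dummy data: e1 is currently failing but returned r2 the last time it was 200
log_df = pd.DataFrame({
    "endpoint": ["e1", "e1", "e1", "e2"],
    "status": [200, 200, 404, 200],
    "resource": ["r1", "r2", None, "r9"],
    "log_entry_date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03"],
})
latest_df = pd.DataFrame({
    "endpoint": ["e1", "e2"],
    "status": [404, 200],
    "resource": [None, "r9"],
})

result = add_last_good_resource(latest_df, log_df)
print(result)
```

Keeping the fallback in its own column leaves the `resource` field from most_recent_log untouched, so it stays clear which endpoints are currently failing.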

In [48]:
dov_test = get_endpoints_new("local-authority-eng:DOV")

print(len(dov_test))
dov_test

15


Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
0,https://www.dover.gov.uk/Planning/Planning-Pol...,bb47d70c6fd3dab53c9ec5890e8eae0209c121e6abb8e9...,404,,developer-contributions,,developer-agreement,local-authority-eng:DOV,Dover District Council,2024-03-01T00:22:55Z,2021-10-07T00:00:00Z
1,https://www.dover.gov.uk/Planning/Planning-Pol...,65c04fb830320a32ceb91061f02d85be6fdb0f55fcae67...,404,,developer-contributions,,developer-agreement-contribution,local-authority-eng:DOV,Dover District Council,2024-03-01T00:22:55Z,2021-10-07T00:00:00Z
2,https://www.dover.gov.uk/Planning/Planning-Pol...,94ffaa7ca746ca4922da6ce1ff5ae1cdd5a783614e1e3d...,404,,developer-contributions,,developer-agreement-transaction,local-authority-eng:DOV,Dover District Council,2024-03-01T00:22:55Z,2021-10-07T00:00:00Z
3,https://www.dover.gov.uk/Planning/Planning-Pol...,739d4ab8af624694d0a2960571e2fadad37c9d2d353138...,404,,developer-contributions,,developer-agreement,local-authority-eng:DOV,Dover District Council,2024-03-01T00:22:55Z,2021-12-31T11:11:11Z
4,https://www.dover.gov.uk/Planning/Planning-Pol...,dc6f131eadf4073ba19daea59baf20180bad8f07fbcf50...,404,,developer-contributions,,developer-agreement-contribution,local-authority-eng:DOV,Dover District Council,2024-03-01T00:22:55Z,2021-12-31T11:11:13Z
5,https://www.dover.gov.uk/Planning/Planning-Pol...,11d0f9dae62103b91e42742f01f2e92e7ceb18690cb7b9...,404,,developer-contributions,,developer-agreement-transaction,local-authority-eng:DOV,Dover District Council,2024-03-01T00:22:55Z,2021-12-31T11:11:15Z
6,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,701a969f68db2fb8bae189534a1f32bb1cc5e357a3f67a...,200,,article-4-direction,ea2249d2057917107a8a5ed41d37e3e02684d65e350b5a...,article-4-direction-area,local-authority-eng:DOV,Dover District Council,2024-03-01T00:18:15Z,2023-12-21T11:11:26Z
7,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,03b17776d0e7707bdb98768bdcac82cf7d01595a353603...,200,,article-4-direction,9787c6411008bbf132064cc5c76f1014923715e71f292d...,article-4-direction,local-authority-eng:DOV,Dover District Council,2024-03-01T00:18:15Z,2023-12-21T11:11:47Z
8,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,c02b990ae3d48107c12c7482ab3a9b8662a2ca8b081756...,200,,tree-preservation-order,10e2dc4aa70c8612fdf2737847db1c94364e7456586237...,tree-preservation-zone,local-authority-eng:DOV,Dover District Council,2024-03-01T00:16:36Z,2023-10-11T11:11:16Z
9,https://services-eu1.arcgis.com/xk4RA36G57mVH7...,0cfd31d51f6c7f8970b87a59b295083e383661d48ec12e...,200,,tree-preservation-order,73d773620d7c7bc689446e572a7dfcbd9ebc8a04d2e70b...,tree,local-authority-eng:DOV,Dover District Council,2024-03-01T00:16:36Z,2023-10-26T12:12:45Z


In [51]:
def get_funded_organisations():
    params = urllib.parse.urlencode({
        "sql": f"""
        select organisation, name, statistical_geography
        from organisation   
        where organisation in (
            select distinct organisation 
            from provision 
            where cohort IN (
                "ODP-Track1",
                "RIPA-BOPS",
                "ODP-Track3",
                "ODP-Track2"
            )
            and provision_reason = "expected")
        order by organisation
        """,
        "_size": "max"
        })
    url = f"{datasette_url}digital-land.csv?{params}"
    df = pd.read_csv(url)
    return df


funded_orgs_df = get_funded_organisations()
# add in old-style "-eng" names
funded_orgs_df["organisation_old"] = funded_orgs_df["organisation"].apply(lambda x: "-eng:".join(x.split(":")))

funded_orgs_df.head()

Unnamed: 0,organisation,name,statistical_geography,organisation_old
0,local-authority:BIR,Birmingham City Council,E08000025,local-authority-eng:BIR
1,local-authority:BNE,London Borough of Barnet,E09000003,local-authority-eng:BNE
2,local-authority:BOS,Bolsover District Council,E07000033,local-authority-eng:BOS
3,local-authority:CAT,Canterbury City Council,E07000106,local-authority-eng:CAT
4,local-authority:CMD,London Borough of Camden,E09000007,local-authority-eng:CMD


In [110]:
dataset_list = ['article-4-direction', 'article-4-direction-area', 'conservation-area', 'conservation-area-document', 'listed-building-outline', 'tree-preservation-order', 'tree-preservation-zone', 'tree']

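# dictionary of results, with org name as key
# NOTE: results_dict must already be defined for the rest of this cell to run; uncomment the line below to build it
# results_dict = {org_name : get_endpoints_new(org_name) for org_name in funded_orgs_df["organisation_old"].values}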

# record orgs which didn't return any results
no_result_orgs = [v for v in funded_orgs_df["organisation_old"].values if len(results_dict[v]) == 0]
# concat results into df
endpoint_resource_df = pd.concat([results_dict[v] for v in funded_orgs_df["organisation_old"].values if len(results_dict[v]) > 0])

# filter to only records in pipelines we want
endpoint_resource_df = endpoint_resource_df[endpoint_resource_df["pipeline"].isin(dataset_list)]

print(len(endpoint_resource_df))
endpoint_resource_df.head()

96


Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
0,https://maps.birmingham.gov.uk/server/rest/ser...,2d9575d771afff89f6d731be59a1ff8cedfd99efcd8bb2...,200.0,,article-4-direction,7a937605655b895bf9ebfbe29f8e35af8d3f606fd811b4...,article-4-direction-area,local-authority-eng:BIR,Birmingham City Council,2024-03-01T00:18:15Z,2023-11-14T00:00:00Z
1,https://maps.birmingham.gov.uk/server/rest/ser...,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200.0,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,conservation-area,local-authority-eng:BIR,Birmingham City Council,2024-03-01T00:04:51Z,2023-11-14T00:00:00Z
0,https://open.barnet.gov.uk/download/2ylny/z7y/...,4d69e04b32ecfa83f9c17f1fed6f13a94dc8c839607dd8...,200.0,,article-4-direction,9656b52800fc3bec4c9baaacddbb9e27b801867439cab1...,article-4-direction,local-authority-eng:BNE,London Borough of Barnet,2024-03-01T00:18:15Z,2023-12-18T00:00:00Z
1,https://open.barnet.gov.uk/download/e5l77/dhv/...,d4c09389082ca55e33cf532eb045ea7eb9dc447a24547f...,200.0,,article-4-direction,8370346f35a81b8b3509f4e3645bb98e43951d09c5cf61...,article-4-direction-area,local-authority-eng:BNE,London Borough of Barnet,2024-03-01T00:18:15Z,2023-12-18T00:00:00Z
2,https://open.barnet.gov.uk/download/e5nge/ktw/...,57af3de9713f3fa5d70cedb44a444ae955b80d658774fd...,200.0,,tree-preservation-order,77fe8c4978ab17814a30f0d6fac7444026f17a14f84c12...,tree-preservation-order,local-authority-eng:BNE,London Borough of Barnet,2024-03-01T00:16:36Z,2023-11-07T11:11:48Z


In [112]:
org_dataset_count = endpoint_resource_df.groupby(["organisation", "pipeline"]).size().reset_index(name = "count")

org_dataset_count[org_dataset_count["count"] > 1]

Unnamed: 0,organisation,pipeline,count
10,local-authority-eng:CAT,article-4-direction-area,2
17,local-authority-eng:CMD,conservation-area,2
19,local-authority-eng:DNC,article-4-direction-area,4
20,local-authority-eng:DNC,conservation-area,2
21,local-authority-eng:DNC,listed-building-outline,2
25,local-authority-eng:DOV,article-4-direction-area,2
29,local-authority-eng:DOV,tree-preservation-order,2
33,local-authority-eng:EPS,conservation-area,2
47,local-authority-eng:GRY,tree-preservation-zone,2
56,local-authority-eng:NET,conservation-area,3


In [117]:
endpoint_resource_df[endpoint_resource_df["organisation"] == "local-authority-eng:GLO"].sort_values("pipeline").values

Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
3,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,5fc20cc8e733d12f262bab292b81cb38ff35fb8d17b850...,200.0,,article-4-direction,2b544d4d84578f6e104ce8802d12c1f85549ff52158264...,article-4-direction-area,local-authority-eng:GLO,Gloucester City Council,2024-03-01T00:18:15Z,2022-06-30T09:09:45Z
4,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,3d871564775d1789a0e3fca4a2efed10a5c3219fdbdcf1...,200.0,,conservation-area,dc20276f1ee8f1b61ee85207664d7af37b217a04af7c32...,conservation-area,local-authority-eng:GLO,Gloucester City Council,2024-03-01T00:04:51Z,2022-11-15T17:09:14Z
5,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,2b0ef9c4c1482e0f32139e231a146d52a3cde2e4148db8...,400.0,,listed-building,,listed-building-outline,local-authority-eng:GLO,Gloucester City Council,2024-02-26T00:15:59Z,2022-07-28T13:21:40Z


In [120]:
endpoint_resource_df[endpoint_resource_df["status"] != 200].sort_values("pipeline")

Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
0,https://mapping.canterbury.gov.uk/arcgis/rest/...,351fdbd179616dcf25ce0c4498cbd7fd5a917c5bbedcbc...,502.0,,article-4-direction,,article-4-direction-area,local-authority-eng:CAT,Canterbury City Council,2024-03-01T00:18:15Z,2021-11-11T14:14:25Z
1,https://mapping.canterbury.gov.uk/arcgis/rest/...,d29690420f36280214af4fb7121d131c73d775fb9c6434...,502.0,,article-4-direction,,article-4-direction-area,local-authority-eng:CAT,Canterbury City Council,2024-03-01T00:18:15Z,2022-03-28T14:27:45Z
6,https://maps.doncaster.gov.uk/server/rest/serv...,fc6886730a30350775d915a11a8bf5644e42c32395315a...,,EsriDownloadError,article-4-direction,,article-4-direction-area,local-authority-eng:DNC,Doncaster Metropolitan Borough Council,2024-02-06T00:15:38Z,2023-08-30T10:10:04Z
4,https://mapping.canterbury.gov.uk/arcgis/rest/...,fc00c8a1bd0212fe6ae7e8fe9be5bc15bcccd68faf39aa...,502.0,,conservation-area,,conservation-area,local-authority-eng:CAT,Canterbury City Council,2024-03-01T00:04:51Z,2021-11-16T13:13:02Z
4,https://opendata.camden.gov.uk/api/geospatial/...,3e7bcbc7b099c29cf896423408043fdbd7e74e3a590fd4...,404.0,,conservation-area,,conservation-area,local-authority-eng:CMD,London Borough of Camden,2022-09-27T00:10:05Z,2020-09-12T18:13:42Z
7,http://myeebc.epsom-ewell.gov.uk/getOWS.ashx?M...,e1f6e219b7adc81361f0384476275dd486f12aa73368af...,,ConnectionError,conservation-area,,conservation-area,local-authority-eng:EPS,Epsom and Ewell Borough Council,2023-12-04T00:04:22Z,2020-09-06T10:16:45Z
9,https://datamillnorth.org/download/conservatio...,ff3a568c9cc2f76a4358e0a60a71a7c92c10f07df7e257...,404.0,,conservation-area,,conservation-area,local-authority-eng:NET,Newcastle City Council,2023-10-10T00:05:26Z,2020-09-06T15:39:06Z
9,https://gis.doncaster.gov.uk/arcgis/rest/servi...,3ee8504f4511028a881ec4c4bd3ba0d2e45571839a658e...,,,listed-building,945fde4abdc8b96d9981cbcfe4cc2ee54402d9949324ea...,listed-building-outline,local-authority-eng:DNC,Doncaster Metropolitan Borough Council,2022-10-31T00:24:12Z,2022-07-04T09:09:18Z
5,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,2b0ef9c4c1482e0f32139e231a146d52a3cde2e4148db8...,400.0,,listed-building,,listed-building-outline,local-authority-eng:GLO,Gloucester City Council,2024-02-26T00:15:59Z,2022-07-28T13:21:40Z
7,https://tewkesburybc.s3.eu-west-2.amazonaws.co...,18b8591914390e3b482e630ca800ab90e75f1022406ed2...,403.0,,tree-preservation-order,,tree,local-authority-eng:TEW,Tewkesbury Borough Council,2024-02-29T00:14:46Z,2024-01-16T00:00:00Z


In [118]:
endpoint_resource_df[endpoint_resource_df["resource"] == "0fe950b55f7ff4425fc051fe9dc5eaa6d7dd18cea1e16bbb0b996f059b94524b"]

Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date


In [20]:
# get endpoints for a single LPA (St Albans) and look at those for article-4-direction
# sal_endpoints_df = get_endpoints("local-authority-eng:SAL")

print(len(sal_endpoints_df))
sal_endpoints_df[sal_endpoints_df["collection"] == "article-4-direction"].values

9


array([['https://www.stalbans.gov.uk/sites/default/files/documents/publications/planning-building-control/Agile/Article_4_Direction_Dataset.json',
        200, nan, 'article-4-direction', 'article-4-direction-area',
        'local-authority-eng:SAL', 'St Albans City and District Council',
        '004e273e15af7f9c5ffe43cda70764da076e53c090c128f937031e63c7ce7a8d',
        '2024-02-29T00:16:02Z', '2024-02-06T00:00:00Z', nan],
       ['https://www.stalbans.gov.uk/sites/default/files/documents/publications/planning-building-control/Agile/Article_4_Area_Direction_Dataset.csv',
        200, nan, 'article-4-direction', 'article-4-direction',
        'local-authority-eng:SAL', 'St Albans City and District Council',
        '66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186fa95940c3a24034fd2',
        '2024-02-29T00:16:02Z', '2024-02-06T00:00:00Z', nan]],
      dtype=object)

In [23]:
q2 = """
select
    e.endpoint_url,
    l.status,
    l.exception,
    s.collection,
    re.resource,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    log l
    inner join source s on l.endpoint = s.endpoint
    inner join resource_endpoint re on l.endpoint = re.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
where
    s.organisation = "local-authority-eng:SAL" and collection="article-4-direction" """

q2_df = get_datasette_results(q2)

q2_df.sort_values("log_entry_date").tail(10)

Unnamed: 0,endpoint_url,status,exception,collection,resource,log_entry_date,endpoint_entry_date
41,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-25T00:17:38Z,2024-02-06T00:00:00Z
18,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-25T00:17:38Z,2024-02-06T00:00:00Z
42,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-26T00:15:44Z,2024-02-06T00:00:00Z
19,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-26T00:15:44Z,2024-02-06T00:00:00Z
43,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-27T00:16:37Z,2024-02-06T00:00:00Z
20,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-27T00:16:37Z,2024-02-06T00:00:00Z
44,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-28T00:15:48Z,2024-02-06T00:00:00Z
21,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-28T00:15:48Z,2024-02-06T00:00:00Z
22,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-29T00:16:02Z,2024-02-06T00:00:00Z
45,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-29T00:16:02Z,2024-02-06T00:00:00Z


In [24]:
q2_df[["resource"]].drop_duplicates().values

array([['66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186fa95940c3a24034fd2'],
       ['004e273e15af7f9c5ffe43cda70764da076e53c090c128f937031e63c7ce7a8d']],
      dtype=object)