I've found some issues with the `get_endpoints()` and `get_latest_endpoints()` queries, which suggest that they may not return the latest resource.

The issue was first spotted in the compliance report when comparing, for a single resource, the number of records returned by the `get_fields_for_resource()` and `get_column_mappings_for_resource()` functions. Some resources had column mapping results but no results for fields.

In [1]:
import numpy as np
import pandas as pd
import os
import urllib.parse
from master_report_endpoint_utils import *

pd.set_option("display.max_rows", 100)
# %pip install wget
# import wget

In [2]:
datasette_url = "https://datasette.planning.data.gov.uk/"

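def get_datasette_results(sql):
    """Run a SQL query against the digital-land Datasette and return the results as a DataFrame."""
    params = urllib.parse.urlencode({
        "sql": sql,
        "_size": "max"
    })

    # the .csv endpoint returns the query results as CSV, which pandas can read directly from the URL
    url = f"{datasette_url}digital-land.csv?{params}"
    resource_df = pd.read_csv(url)
    return resource_df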


In [4]:
# get endpoints for a single LPA and look at those for conservation-area
bir_endpoints_df = get_endpoints("local-authority-eng:BIR")

print(len(bir_endpoints_df))
bir_endpoints_df[bir_endpoints_df["collection"] == "conservation-area"]

2


Unnamed: 0,endpoint_url,status,exception,collection,pipelines,organisation,name,resource,maxentrydate,entrydate,end_date
1,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,conservation-area,local-authority-eng:BIR,Birmingham City Council,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2024-03-04T00:04:20Z,2023-11-14T00:00:00Z,


In this example, `81ed286e34b43d1f9f3053e463a6151224b182538ce98f9064f43ebd30dc2973` is the resource returned for the conservation-area dataset.

In [11]:
# simplified example of the get_endpoints function query - showing duplication of resource field

q1 = """
select
    l.endpoint,
    e.endpoint_url,
    l.status,
    l.exception,
    s.collection,
    re.resource,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    log l
    inner join source s on l.endpoint = s.endpoint
    inner join resource_endpoint re on l.endpoint = re.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
where
    s.organisation = "local-authority-eng:BIR" and collection="conservation-area" 
    
order by log_entry_date desc """

q1_df = get_datasette_results(q1)

q1_df.sort_values("log_entry_date", ascending = False).head(10)

Unnamed: 0,endpoint,endpoint_url,status,exception,collection,resource,log_entry_date,endpoint_entry_date
0,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2024-03-04T00:04:20Z,2023-11-14T00:00:00Z
1,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-04T00:04:20Z,2023-11-14T00:00:00Z
2,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2024-03-03T00:04:33Z,2023-11-14T00:00:00Z
3,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-03T00:04:33Z,2023-11-14T00:00:00Z
4,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2024-03-02T00:04:15Z,2023-11-14T00:00:00Z
5,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-02T00:04:15Z,2023-11-14T00:00:00Z
6,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2024-03-01T00:04:51Z,2023-11-14T00:00:00Z
7,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-01T00:04:51Z,2023-11-14T00:00:00Z
8,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2024-02-29T00:05:21Z,2023-11-14T00:00:00Z
9,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,https://maps.birmingham.gov.uk/server/rest/ser...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-02-29T00:05:21Z,2023-11-14T00:00:00Z


Above we can see two resource IDs for each log_entry_date. This is because the join to resource_endpoint picks up every resource that has ever existed for that endpoint on each row, rather than just the resource recorded for that log entry.
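
A minimal sketch of that fan-out, using made-up toy frames in place of the real `log` and `resource_endpoint` tables (the `ep1`, `res_old` and `res_new` values are placeholders, not real hashes). Joining on `endpoint` alone multiplies the rows, while joining on both `endpoint` and `resource` keeps one row per log entry:

```python
import pandas as pd

# Toy stand-ins for the log and resource_endpoint tables (all values are made up).
log = pd.DataFrame({
    "endpoint": ["ep1", "ep1"],
    "resource": ["res_new", "res_new"],
    "entry_date": ["2024-03-04", "2024-03-03"],
})
resource_endpoint = pd.DataFrame({
    "endpoint": ["ep1", "ep1"],
    "resource": ["res_old", "res_new"],
})

# Joining on endpoint alone pairs every log row with every resource the endpoint has ever had.
fanned_out = log.merge(resource_endpoint, on="endpoint", suffixes=("_log", "_re"))

# Joining on endpoint and resource keeps only the resource recorded on each log row.
matched = log.merge(resource_endpoint, on=["endpoint", "resource"])

print(len(log), len(fanned_out), len(matched))
```

With two log rows and two resources for the endpoint, the first join returns four rows and the second returns two.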


Taking out that join and using the resource field in the log table gives results which look more sensible, with a single log entry for each day:

In [8]:
q2 = """
select
    l.endpoint,
    l.status,
    l.exception,
    s.collection,
    l.resource,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    log l
    inner join source s on l.endpoint = s.endpoint

    inner join endpoint e on l.endpoint = e.endpoint
where
    s.organisation = "local-authority-eng:BIR" and collection="conservation-area" 
    
order by log_entry_date desc
"""

q2_df = get_datasette_results(q2)

# q1_df.sort_values("log_entry_date", ascending = False)
print(len(q2_df))
q2_df.head(10)

111


Unnamed: 0,endpoint,status,exception,collection,resource,log_entry_date,endpoint_entry_date
0,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-04T00:04:20Z,2023-11-14T00:00:00Z
1,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-03T00:04:33Z,2023-11-14T00:00:00Z
2,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-02T00:04:15Z,2023-11-14T00:00:00Z
3,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-03-01T00:04:51Z,2023-11-14T00:00:00Z
4,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-02-29T00:05:21Z,2023-11-14T00:00:00Z
5,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-02-28T00:05:16Z,2023-11-14T00:00:00Z
6,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-02-27T00:05:38Z,2023-11-14T00:00:00Z
7,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-02-26T00:05:43Z,2023-11-14T00:00:00Z
8,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-02-25T00:05:30Z,2023-11-14T00:00:00Z
9,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2024-02-24T00:03:51Z,2023-11-14T00:00:00Z


Now we can see that the resource with the most recent logs is actually `acb88aac41434c4cfccb9ee77f6471f5c682616617604cd0db502893e9c08579`.

The most recent log for `81ed286e34b43d1f9f3053e463a6151224b182538ce98f9064f43ebd30dc2973` was back in Nov. 2023.

In [None]:
q2_df[q2_df["resource"] == "81ed286e34b43d1f9f3053e463a6151224b182538ce98f9064f43ebd30dc2973"]

Unnamed: 0,endpoint,status,exception,collection,resource,log_entry_date,endpoint_entry_date
108,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2023-11-17T00:04:11Z,2023-11-14T00:00:00Z
109,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2023-11-16T00:04:15Z,2023-11-14T00:00:00Z
110,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2023-11-15T00:05:29Z,2023-11-14T00:00:00Z


This may not be an issue here, because the `get_latest_endpoints()` function uses the `get_latest_resource_for_endpoint()` function to get the right resource for each endpoint.
However, that function doesn't appear to be returning the right resource either:

In [12]:
bir_endpoints_df[bir_endpoints_df["collection"] == "conservation-area"]["endpoint_url"].values

array(['https://maps.birmingham.gov.uk/server/rest/services/planx/PlanX/FeatureServer/4/query?where=1=1&outfields=*&f=geojson'],
      dtype=object)

In [13]:
get_latest_resource_for_endpoint("https://maps.birmingham.gov.uk/server/rest/services/planx/PlanX/FeatureServer/4/query?where=1=1&outfields=*&f=geojson")

Unnamed: 0,resource
0,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...


The `get_latest_resource_for_endpoint()` function takes the resource with the latest entry_date from the [resource table](https://datasette.planning.data.gov.uk/digital-land/resource), but this field isn't populated.

Checking the resource table shows that the `81ed286e34b43d1f9f3053e463a6151224b182538ce98f9064f43ebd30dc2973` resource actually has an end date:

In [14]:
get_datasette_results("""
    select
        r.*
    from
        endpoint e
    inner join resource_endpoint re on e.endpoint = re.endpoint
    inner join resource r on re.resource = r.resource
    where
        e.endpoint_url='https://maps.birmingham.gov.uk/server/rest/services/planx/PlanX/FeatureServer/4/query?where=1=1&outfields=*&f=geojson'
    order by
        r.start_date desc
""")

Unnamed: 0,bytes,end_date,entry_date,mime_type,resource,start_date
0,485051,,,,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,2023-11-18
1,485282,2023-11-17,,,81ed286e34b43d1f9f3053e463a6151224b182538ce98f...,2023-11-15


Just changing to sort by start date won't work either, as retired resources can have a more recent start date than active ones, as in this example:

In [18]:
get_datasette_results("""
    select
        r.*
    from
        endpoint e
        inner join resource_endpoint re on e.endpoint = re.endpoint
        inner join resource r on re.resource = r.resource
    where
        e.endpoint_url='https://services7.arcgis.com/gTvs0fY5s2kJG3gO/arcgis/rest/services/SPE_Listed_Buildings/FeatureServer/3/query?outFields=*&where=1%3D1&f=geojson'
    order by
        r.start_date desc
""")

Unnamed: 0,bytes,end_date,entry_date,mime_type,resource,start_date
0,66,2024-02-10,,,5b581e6a7c56986ad78ec94f6f8ee3aefc2f9cee32b347...,2024-02-10
1,130966,,,,dd6c3678a9df12bd7836e7cf50bbbadc79e26420352a29...,2023-11-30
2,130960,2023-11-28,,,529353c25e457678ddc673328eafc21d646c711a5e787b...,2023-10-12


Checking for an endpoint's resources with the latest start date and no end date seems to return 0 records for some endpoints... need to test more.
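
A rough sketch of that check, assuming the resource rows for one endpoint are already in a DataFrame with `resource`, `start_date` and `end_date` columns as in the outputs above. The function name `pick_active_resource` and the shortened hashes are illustrative only. It returns `None` when every resource has an end date, which is the "0 records" case:

```python
import pandas as pd

def pick_active_resource(resources):
    """Return the resource with no end_date and the latest start_date, or None if there is none."""
    active = resources[resources["end_date"].isna()]
    if active.empty:
        return None
    # ISO-formatted date strings sort correctly as plain strings
    return active.sort_values("start_date", ascending=False).iloc[0]["resource"]

# Toy rows shaped like the resource table output above (hashes shortened for readability).
resources = pd.DataFrame({
    "resource": ["5b581e", "dd6c36", "529353"],
    "start_date": ["2024-02-10", "2023-11-30", "2023-10-12"],
    "end_date": ["2024-02-10", None, "2023-11-28"],
})

print(pick_active_resource(resources))
```

Here the retired resource with the most recent start date is skipped and the one without an end date is picked.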

But is an alternative approach simply to take the resource with the latest log record?
Below is an alternative `get_endpoints()` function which uses the `most_recent_log` table, and it looks like it returns the correct resource in the BIR conservation-area case. Need to check more thoroughly that it's not returning any resources with end dates etc.

In [16]:
# recreating the get_endpoints function, but using latest_log table instead

def get_endpoints_new(organisation):
    if organisation:
        query = f" s.organisation = '{organisation}'"
    else:
        query = f" s.organisation LIKE '%'"
    params = urllib.parse.urlencode({
        "sql": f"""
select
    e.endpoint_url,
    l.endpoint,
    l.status,
    l.exception,
    s.collection,
    l.resource,
    sp.pipeline,
    s.organisation,
    o.name,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    most_recent_log l
    inner join source s on l.endpoint = s.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
    inner join organisation o on o.organisation = replace(s.organisation, '-eng', '')
    inner join source_pipeline sp on s.source = sp.source
where
    {query} and not s.collection="brownfield-land" 

order by log_entry_date desc
        """,
        "_size": "max"
    })
    
    url = f"{datasette_url}digital-land.csv?{params}"

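    try:
        endpoints_df = pd.read_csv(url)
    except Exception:
        # if the request or CSV parsing fails, fall back to a one-row frame holding just the organisation
        endpoints_df = pd.DataFrame({"organisation":[organisation]})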
    
    return endpoints_df

In [None]:
bir_test = get_endpoints_new("local-authority-eng:BIR")

print(len(bir_test))
bir_test

2


Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
0,https://maps.birmingham.gov.uk/server/rest/ser...,2d9575d771afff89f6d731be59a1ff8cedfd99efcd8bb2...,200,,article-4-direction,7a937605655b895bf9ebfbe29f8e35af8d3f606fd811b4...,article-4-direction-area,local-authority-eng:BIR,Birmingham City Council,2024-03-04T00:15:58Z,2023-11-14T00:00:00Z
1,https://maps.birmingham.gov.uk/server/rest/ser...,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,conservation-area,local-authority-eng:BIR,Birmingham City Council,2024-03-04T00:04:20Z,2023-11-14T00:00:00Z


Limitation of this approach:   
I switched to the most_recent_log table because the old code wasn't correctly getting the latest resource. Correcting that code to rank and sort properly causes problems with data limits and timeouts, so using most_recent_log is much easier.
BUT it means that for endpoints whose latest status isn't 200, this table can't give us the resource from the last time the status was 200, which is what the old code did. (This could be added as another step, which wouldn't be too fiddly.)
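
One way that extra step might look, as a sketch only. It assumes the full log rows for the affected endpoints have already been fetched into a DataFrame with `endpoint`, `status`, `resource` and `log_entry_date` columns (the aliases used in the queries above). The function name `last_successful_resource` and the toy values are made up:

```python
import pandas as pd

def last_successful_resource(log_df):
    """For each endpoint, return the most recent log row with status 200 and a resource."""
    ok = log_df[(log_df["status"] == 200) & log_df["resource"].notna()]
    latest_first = ok.sort_values("log_entry_date", ascending=False)
    # after sorting newest-first, the first row kept per endpoint is its latest successful log
    return latest_first.drop_duplicates(subset="endpoint", keep="first").reset_index(drop=True)

# Toy log rows: ep1's latest log is a 502 with no resource, ep2 is healthy.
log_df = pd.DataFrame({
    "endpoint": ["ep1", "ep1", "ep2", "ep2"],
    "status": [502, 200, 200, 200],
    "resource": [None, "res_a", "res_c", "res_b"],
    "log_entry_date": ["2024-03-01", "2024-02-20", "2024-03-01", "2024-02-29"],
})

fallback = last_successful_resource(log_df)
print(fallback[["endpoint", "resource", "log_entry_date"]])
```

The result could then be merged onto the most_recent_log output to fill in the missing resource for endpoints whose latest status isn't 200.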

## Scrap

In [51]:
def get_funded_organisations():
    params = urllib.parse.urlencode({
        "sql": f"""
        select organisation, name, statistical_geography
        from organisation   
        where organisation in (
            select distinct organisation 
            from provision 
            where cohort IN (
                "ODP-Track1",
                "RIPA-BOPS",
                "ODP-Track3",
                "ODP-Track2"
            )
            and provision_reason = "expected")
        order by organisation
        """,
        "_size": "max"
        })
    url = f"https://datasette.planning.data.gov.uk/digital-land.csv?{params}"
    df = pd.read_csv(url)
    return df


funded_orgs_df = get_funded_organisations()
# add in old-style "-eng" names
funded_orgs_df["organisation_old"] = funded_orgs_df["organisation"].apply(lambda x: "-eng:".join(x.split(":")))

funded_orgs_df.head()

Unnamed: 0,organisation,name,statistical_geography,organisation_old
0,local-authority:BIR,Birmingham City Council,E08000025,local-authority-eng:BIR
1,local-authority:BNE,London Borough of Barnet,E09000003,local-authority-eng:BNE
2,local-authority:BOS,Bolsover District Council,E07000033,local-authority-eng:BOS
3,local-authority:CAT,Canterbury City Council,E07000106,local-authority-eng:CAT
4,local-authority:CMD,London Borough of Camden,E09000007,local-authority-eng:CMD


In [110]:
dataset_list = ['article-4-direction', 'article-4-direction-area', 'conservation-area', 'conservation-area-document', 'listed-building-outline', 'tree-preservation-order', 'tree-preservation-zone', 'tree']

# dictionary of results, with org name as key
# (uncomment to build or rebuild results_dict, which the lines below rely on)
# results_dict = {org_name : get_endpoints_new(org_name) for org_name in funded_orgs_df["organisation_old"].values}

# record orgs which didn't return any results
no_result_orgs = [v for v in funded_orgs_df["organisation_old"].values if len(results_dict[v]) == 0]
# concat results into df
endpoint_resource_df = pd.concat([results_dict[v] for v in funded_orgs_df["organisation_old"].values if len(results_dict[v]) > 0])

# filter to only records in pipelines we want
endpoint_resource_df = endpoint_resource_df[endpoint_resource_df["pipeline"].isin(dataset_list)]

print(len(endpoint_resource_df))
endpoint_resource_df.head()

96


Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
0,https://maps.birmingham.gov.uk/server/rest/ser...,2d9575d771afff89f6d731be59a1ff8cedfd99efcd8bb2...,200.0,,article-4-direction,7a937605655b895bf9ebfbe29f8e35af8d3f606fd811b4...,article-4-direction-area,local-authority-eng:BIR,Birmingham City Council,2024-03-01T00:18:15Z,2023-11-14T00:00:00Z
1,https://maps.birmingham.gov.uk/server/rest/ser...,a09608d26986c205de7ab8dc54b5d76c776ca236a9ecf9...,200.0,,conservation-area,acb88aac41434c4cfccb9ee77f6471f5c682616617604c...,conservation-area,local-authority-eng:BIR,Birmingham City Council,2024-03-01T00:04:51Z,2023-11-14T00:00:00Z
0,https://open.barnet.gov.uk/download/2ylny/z7y/...,4d69e04b32ecfa83f9c17f1fed6f13a94dc8c839607dd8...,200.0,,article-4-direction,9656b52800fc3bec4c9baaacddbb9e27b801867439cab1...,article-4-direction,local-authority-eng:BNE,London Borough of Barnet,2024-03-01T00:18:15Z,2023-12-18T00:00:00Z
1,https://open.barnet.gov.uk/download/e5l77/dhv/...,d4c09389082ca55e33cf532eb045ea7eb9dc447a24547f...,200.0,,article-4-direction,8370346f35a81b8b3509f4e3645bb98e43951d09c5cf61...,article-4-direction-area,local-authority-eng:BNE,London Borough of Barnet,2024-03-01T00:18:15Z,2023-12-18T00:00:00Z
2,https://open.barnet.gov.uk/download/e5nge/ktw/...,57af3de9713f3fa5d70cedb44a444ae955b80d658774fd...,200.0,,tree-preservation-order,77fe8c4978ab17814a30f0d6fac7444026f17a14f84c12...,tree-preservation-order,local-authority-eng:BNE,London Borough of Barnet,2024-03-01T00:16:36Z,2023-11-07T11:11:48Z


In [112]:
org_dataset_count = endpoint_resource_df.groupby(["organisation", "pipeline"]).size().reset_index(name = "count")

org_dataset_count[org_dataset_count["count"] > 1]

Unnamed: 0,organisation,pipeline,count
10,local-authority-eng:CAT,article-4-direction-area,2
17,local-authority-eng:CMD,conservation-area,2
19,local-authority-eng:DNC,article-4-direction-area,4
20,local-authority-eng:DNC,conservation-area,2
21,local-authority-eng:DNC,listed-building-outline,2
25,local-authority-eng:DOV,article-4-direction-area,2
29,local-authority-eng:DOV,tree-preservation-order,2
33,local-authority-eng:EPS,conservation-area,2
47,local-authority-eng:GRY,tree-preservation-zone,2
56,local-authority-eng:NET,conservation-area,3


In [117]:
endpoint_resource_df[endpoint_resource_df["organisation"] == "local-authority-eng:GLO"].sort_values("pipeline").values

Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
3,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,5fc20cc8e733d12f262bab292b81cb38ff35fb8d17b850...,200.0,,article-4-direction,2b544d4d84578f6e104ce8802d12c1f85549ff52158264...,article-4-direction-area,local-authority-eng:GLO,Gloucester City Council,2024-03-01T00:18:15Z,2022-06-30T09:09:45Z
4,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,3d871564775d1789a0e3fca4a2efed10a5c3219fdbdcf1...,200.0,,conservation-area,dc20276f1ee8f1b61ee85207664d7af37b217a04af7c32...,conservation-area,local-authority-eng:GLO,Gloucester City Council,2024-03-01T00:04:51Z,2022-11-15T17:09:14Z
5,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,2b0ef9c4c1482e0f32139e231a146d52a3cde2e4148db8...,400.0,,listed-building,,listed-building-outline,local-authority-eng:GLO,Gloucester City Council,2024-02-26T00:15:59Z,2022-07-28T13:21:40Z


In [120]:
endpoint_resource_df[endpoint_resource_df["status"] != 200].sort_values("pipeline")

Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date
0,https://mapping.canterbury.gov.uk/arcgis/rest/...,351fdbd179616dcf25ce0c4498cbd7fd5a917c5bbedcbc...,502.0,,article-4-direction,,article-4-direction-area,local-authority-eng:CAT,Canterbury City Council,2024-03-01T00:18:15Z,2021-11-11T14:14:25Z
1,https://mapping.canterbury.gov.uk/arcgis/rest/...,d29690420f36280214af4fb7121d131c73d775fb9c6434...,502.0,,article-4-direction,,article-4-direction-area,local-authority-eng:CAT,Canterbury City Council,2024-03-01T00:18:15Z,2022-03-28T14:27:45Z
6,https://maps.doncaster.gov.uk/server/rest/serv...,fc6886730a30350775d915a11a8bf5644e42c32395315a...,,EsriDownloadError,article-4-direction,,article-4-direction-area,local-authority-eng:DNC,Doncaster Metropolitan Borough Council,2024-02-06T00:15:38Z,2023-08-30T10:10:04Z
4,https://mapping.canterbury.gov.uk/arcgis/rest/...,fc00c8a1bd0212fe6ae7e8fe9be5bc15bcccd68faf39aa...,502.0,,conservation-area,,conservation-area,local-authority-eng:CAT,Canterbury City Council,2024-03-01T00:04:51Z,2021-11-16T13:13:02Z
4,https://opendata.camden.gov.uk/api/geospatial/...,3e7bcbc7b099c29cf896423408043fdbd7e74e3a590fd4...,404.0,,conservation-area,,conservation-area,local-authority-eng:CMD,London Borough of Camden,2022-09-27T00:10:05Z,2020-09-12T18:13:42Z
7,http://myeebc.epsom-ewell.gov.uk/getOWS.ashx?M...,e1f6e219b7adc81361f0384476275dd486f12aa73368af...,,ConnectionError,conservation-area,,conservation-area,local-authority-eng:EPS,Epsom and Ewell Borough Council,2023-12-04T00:04:22Z,2020-09-06T10:16:45Z
9,https://datamillnorth.org/download/conservatio...,ff3a568c9cc2f76a4358e0a60a71a7c92c10f07df7e257...,404.0,,conservation-area,,conservation-area,local-authority-eng:NET,Newcastle City Council,2023-10-10T00:05:26Z,2020-09-06T15:39:06Z
9,https://gis.doncaster.gov.uk/arcgis/rest/servi...,3ee8504f4511028a881ec4c4bd3ba0d2e45571839a658e...,,,listed-building,945fde4abdc8b96d9981cbcfe4cc2ee54402d9949324ea...,listed-building-outline,local-authority-eng:DNC,Doncaster Metropolitan Borough Council,2022-10-31T00:24:12Z,2022-07-04T09:09:18Z
5,https://gcty.dynamicmaps.co.uk:8443/geoserver/...,2b0ef9c4c1482e0f32139e231a146d52a3cde2e4148db8...,400.0,,listed-building,,listed-building-outline,local-authority-eng:GLO,Gloucester City Council,2024-02-26T00:15:59Z,2022-07-28T13:21:40Z
7,https://tewkesburybc.s3.eu-west-2.amazonaws.co...,18b8591914390e3b482e630ca800ab90e75f1022406ed2...,403.0,,tree-preservation-order,,tree,local-authority-eng:TEW,Tewkesbury Borough Council,2024-02-29T00:14:46Z,2024-01-16T00:00:00Z


In [118]:
endpoint_resource_df[endpoint_resource_df["resource"] == "0fe950b55f7ff4425fc051fe9dc5eaa6d7dd18cea1e16bbb0b996f059b94524b"]

Unnamed: 0,endpoint_url,endpoint,status,exception,collection,resource,pipeline,organisation,name,log_entry_date,endpoint_entry_date


In [20]:
# get endpoints for a single LPA and look at those for article-4-direction
# (uncomment to fetch sal_endpoints_df, which the lines below rely on)
# sal_endpoints_df = get_endpoints("local-authority-eng:SAL")

print(len(sal_endpoints_df))
sal_endpoints_df[sal_endpoints_df["collection"] == "article-4-direction"].values

9


array([['https://www.stalbans.gov.uk/sites/default/files/documents/publications/planning-building-control/Agile/Article_4_Direction_Dataset.json',
        200, nan, 'article-4-direction', 'article-4-direction-area',
        'local-authority-eng:SAL', 'St Albans City and District Council',
        '004e273e15af7f9c5ffe43cda70764da076e53c090c128f937031e63c7ce7a8d',
        '2024-02-29T00:16:02Z', '2024-02-06T00:00:00Z', nan],
       ['https://www.stalbans.gov.uk/sites/default/files/documents/publications/planning-building-control/Agile/Article_4_Area_Direction_Dataset.csv',
        200, nan, 'article-4-direction', 'article-4-direction',
        'local-authority-eng:SAL', 'St Albans City and District Council',
        '66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186fa95940c3a24034fd2',
        '2024-02-29T00:16:02Z', '2024-02-06T00:00:00Z', nan]],
      dtype=object)

In [23]:
q2 = """
select
    e.endpoint_url,
    l.status,
    l.exception,
    s.collection,
    re.resource,
    l.entry_date as log_entry_date,
    e.entry_date as endpoint_entry_date
from
    log l
    inner join source s on l.endpoint = s.endpoint
    inner join resource_endpoint re on l.endpoint = re.endpoint
    inner join endpoint e on l.endpoint = e.endpoint
where
    s.organisation = "local-authority-eng:SAL" and collection="article-4-direction" """

q2_df = get_datasette_results(q2)

q2_df.sort_values("log_entry_date").tail(10)

Unnamed: 0,endpoint_url,status,exception,collection,resource,log_entry_date,endpoint_entry_date
41,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-25T00:17:38Z,2024-02-06T00:00:00Z
18,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-25T00:17:38Z,2024-02-06T00:00:00Z
42,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-26T00:15:44Z,2024-02-06T00:00:00Z
19,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-26T00:15:44Z,2024-02-06T00:00:00Z
43,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-27T00:16:37Z,2024-02-06T00:00:00Z
20,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-27T00:16:37Z,2024-02-06T00:00:00Z
44,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-28T00:15:48Z,2024-02-06T00:00:00Z
21,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-28T00:15:48Z,2024-02-06T00:00:00Z
22,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186...,2024-02-29T00:16:02Z,2024-02-06T00:00:00Z
45,https://www.stalbans.gov.uk/sites/default/file...,200,,article-4-direction,004e273e15af7f9c5ffe43cda70764da076e53c090c128...,2024-02-29T00:16:02Z,2024-02-06T00:00:00Z


In [24]:
q2_df[["resource"]].drop_duplicates().values

array([['66ac40e5dd675da252b9bd15b5e4f63ded01ac65def186fa95940c3a24034fd2'],
       ['004e273e15af7f9c5ffe43cda70764da076e53c090c128f937031e63c7ce7a8d']],
      dtype=object)