#### Look into external tools
- Objective: review read, write, and delete operations on Storage for external tables ONLY, in order to identify good candidates for migration to managed tables.
  - Identify whether an external table is being used by an external platform. If it is, and that platform writes to it, the table is not a good candidate until those write operations subside.

External Table -> main.default.mytable
- Leveraged by Databricks
- Leveraged by external platforms: a good candidate only if the external access is read-only

External Table -> main.default.xyz
- Leveraged by Databricks only
- A good candidate even with read, write, and delete operations
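The rule of thumb in the two examples above can be sketched as a small Python function. This is only an illustration: the `TableAccess` class and its field names are hypothetical and do not correspond to any column in the log view.

```python
from dataclasses import dataclass


@dataclass
class TableAccess:
    """Observed storage access for one external table (hypothetical shape)."""
    used_by_external_platform: bool   # any non-Databricks client touched the table
    external_writes_or_deletes: int   # write + delete operations from those clients


def is_good_candidate(access: TableAccess) -> bool:
    """Apply the rule of thumb from the examples above."""
    # Databricks-only tables qualify regardless of read/write/delete activity.
    if not access.used_by_external_platform:
        return True
    # Tables shared with an external platform qualify only if that access is read-only.
    return access.external_writes_or_deletes == 0
```

For instance, `is_good_candidate(TableAccess(True, 5))` evaluates to `False`, because an external platform is still writing to the table.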



#### External Tables

In [0]:
select distinct
  IT_TableName,
  Storage_userAgentHeader,
  IT_TableType
from
  slog.default.vw_storagelogs_information_schema
where
  -- lower(Storage_userAgentHeader) like '%databricks%' and
  lower(IT_TableType) like '%external%'

IT_TableName,Storage_userAgentHeader,IT_TableType
main.default.my_table,"Azure Blob FS/3.3 (AzulSystems,Inc. JavaJRE 17.0.13; Linux 5.15.0-1075-azure/amd64; SunJSSE-17.0; UNKNOWN/UNKNOWN) APN/1.0 unknown",EXTERNAL
main.default.my_table,APN/1.0 Databricks/1.0 DBR/null,EXTERNAL


#### Unique Agent Headers

In [0]:
SELECT
  Storage_userAgentHeader,
  Storage_Category,
  count(*)
FROM
  slog.default.vw_storagelogs_information_schema
group by all

Storage_userAgentHeader,Storage_Category,count(*)
"Go/go1.22.4 X:boringcrypto,nocoverageredesign (amd64-linux) go-autorest/v14.2.1 Azure-SDK-For-Go/v32.1.0 storagedatalake/2018-11-09",StorageRead,6
Azure Blob FS/3.3 (OracleCorporation JavaJRE 17.0.12; Linux 5.4.0-1145-azure-fips/amd64; SunJSSE-17.0; UNKNOWN/UNKNOWN) APN/1.0 Databricks/1.0 DBR/UNKNOWN,StorageRead,51
azsdk-js-data-tables/12.1.2 core-rest-pipeline/1.19.0 Node/18.20.4 OS/(arm64-Linux-5.10.234-225.910.amzn2.aarch64),StorageRead,1
azsdk-java-azure-storage-blob/12.29.0 (17.0.12; Linux; 5.4.0-1145-azure-fips),StorageRead,187
"Azure Blob FS/3.3 (AzulSystems,Inc. JavaJRE 17.0.13; Linux 5.15.0-1075-azure/amd64; SunJSSE-17.0; UNKNOWN/UNKNOWN) APN/1.0 unknown",StorageWrite,10994
azsdk-js-data-tables/12.1.2 core-rest-pipeline/1.19.0 Node/18.20.4 OS/(arm64-Linux-5.10.234-225.921.amzn2.aarch64),StorageRead,10
"Azure Blob FS/3.3 (AzulSystems,Inc. JavaJRE 1.8.0_412; Linux 5.15.0-1075-azure/amd64; SunJSSE-1.8; UNKNOWN/UNKNOWN) APN/1.0 unknown",StorageWrite,37232
"Azure Blob FS/3.3 (AzulSystems,Inc. JavaJRE 1.8.0_412; Linux 5.15.0-1075-azure/amd64; SunJSSE-1.8; UNKNOWN/UNKNOWN) APN/1.0 unknown",StorageRead,50885
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36",StorageDelete,6
SRP/1.0,StorageRead,184
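
The queries in this notebook treat a request as coming from Databricks when its user agent header contains the string `databricks` (case-insensitive). A minimal Python sketch of that same check, applied to a few of the headers listed above, might look like this. The function name `is_databricks_agent` is hypothetical.

```python
def is_databricks_agent(user_agent: str) -> bool:
    """Mirror the SQL filter lower(Storage_userAgentHeader) like '%databricks%'."""
    return "databricks" in user_agent.lower()


# A few headers copied from the result set above.
sample_headers = [
    "APN/1.0 Databricks/1.0 DBR/null",
    "azsdk-java-azure-storage-blob/12.29.0 (17.0.12; Linux; 5.4.0-1145-azure-fips)",
    "Azure Blob FS/3.3 (AzulSystems,Inc. JavaJRE 17.0.13; Linux 5.15.0-1075-azure/amd64; "
    "SunJSSE-17.0; UNKNOWN/UNKNOWN) APN/1.0 unknown",
    "SRP/1.0",
]

# Map each header to the result of the substring check.
classified = {header: is_databricks_agent(header) for header in sample_headers}
```

Under this check, the `Azure Blob FS/3.3 ... APN/1.0 unknown` headers are not counted as Databricks traffic, since they do not contain the substring.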


#### Checking to see if there are any non-managed tables with a Databricks Agent Header

In [0]:
SELECT DISTINCT
  storage_time,
  IT_ParsedPath,
  IT_TableName,
  IT_TableType,
  Storage_userAgentHeader,
  Storage_Category
FROM
  slog.default.vw_storagelogs_information_schema
WHERE
  lower(Storage_userAgentHeader) like '%databricks%'
  and lower(IT_TableType) != 'managed'

storage_time,IT_ParsedPath,IT_TableName,IT_TableType,Storage_userAgentHeader,Storage_Category
2025-04-12T18:18:32.431Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-16T20:41:58.932Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-16T20:41:59.052Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-16T20:41:58.962Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-12T18:18:30.860Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-16T20:41:58.720Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-12T18:18:30.906Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-16T20:41:58.937Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-12T18:18:30.947Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead
2025-04-16T20:41:58.601Z,/stsezsandbox07/pos-dev/d1,main.default.my_table,EXTERNAL,APN/1.0 Databricks/1.0 DBR/null,StorageRead


#### Getting the list of candidates to upgrade to managed

In [0]:
-- Get the latest timestamp for the "eventhub_storage_log_setup" job
with last_run_timestamp as (
  select
    max(period_start_time) as last_run_timestamp
  from
    system.lakeflow.job_run_timeline
  where
    job_id = '1072068050595527'
)

-- Identify list of tables that are good candidates to move to managed tables
select
  src.Storage_AccountName,
  last_run_timestamp.last_run_timestamp as Job_Run_TS,
  src.IT_TableName,
  sum(case when src.Storage_Category = 'StorageRead' then src.operation_count else 0 end) as total_read_count,
  sum(case when src.Storage_Category in ('StorageWrite', 'StorageDelete') then src.operation_count else 0 end) as write_delete_count,
  -- good_candidate = 1 when no write or delete operations were logged for the table
  case
    when sum(case when src.Storage_Category in ('StorageWrite', 'StorageDelete') then src.operation_count else 0 end) = 0 then 1
    else 0
  end as good_candidate,
  -- from_external_platform = 1 when none of the logged operations carries a Databricks user agent
  case
    when sum(case when lower(src.Storage_userAgentHeader) like '%databricks%' then 1 else 0 end) > 0 then 0
    else 1
  end as from_external_platform
from
  (
    select
      Storage_AccountName,
      Date(Storage_Time) as Storage_Date,
      lower(Storage_userAgentHeader) as Storage_userAgentHeader,
      IT_TableName,
      IT_TableType,
      Storage_Category,
      count(*) as operation_count
    from
      slog.default.vw_storagelogs_information_schema
    where
      IT_TableType = 'EXTERNAL' and
      Storage_Category in ('StorageRead', 'StorageWrite', 'StorageDelete')
    group by
      Storage_AccountName,
      Date(Storage_Time),
      lower(Storage_userAgentHeader),
      IT_TableName,
      IT_TableType,
      Storage_Category
  ) src
  cross join last_run_timestamp
group by
  src.Storage_AccountName,
  src.IT_TableName,
  last_run_timestamp.last_run_timestamp
order by
  total_read_count desc,
  write_delete_count desc

Storage_AccountName,Job_Run_TS,IT_TableName,total_read_count,write_delete_count,good_candidate,from_external_platform
stsezsandbox07,2025-04-18T11:27:55.123Z,main.default.my_table,66,26,0,0
stsezsandbox07,2025-04-18T11:27:55.123Z,pos_dev.retailer_na.pos_snapshots,13,0,1,1
stsezsandbox07,2025-04-18T11:27:55.123Z,pos_dev.retailer_na.pos_generator,4,0,1,1
stsezsandbox07,2025-04-18T11:27:55.123Z,pos_dev.retailer_na.pos_static,3,0,1,1
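
To make the flag logic of the query above easier to follow, here is a rough Python sketch of the same per-table aggregation. It assumes the log rows are available as `(table_name, storage_category, user_agent)` tuples. The function name `summarize` and the sample rows are hypothetical and are not taken from the real logs.

```python
from collections import defaultdict

TRACKED = ("StorageRead", "StorageWrite", "StorageDelete")


def summarize(rows):
    """rows: iterable of (table_name, storage_category, user_agent) tuples."""
    stats = defaultdict(lambda: {"reads": 0, "writes_deletes": 0, "databricks_ops": 0})
    for table, category, user_agent in rows:
        if category not in TRACKED:
            continue  # the SQL filters out every other category
        entry = stats[table]
        if category == "StorageRead":
            entry["reads"] += 1
        else:
            entry["writes_deletes"] += 1
        if "databricks" in user_agent.lower():
            entry["databricks_ops"] += 1
    return {
        table: {
            "total_read_count": e["reads"],
            "write_delete_count": e["writes_deletes"],
            # 1 when no write or delete was logged for the table
            "good_candidate": 1 if e["writes_deletes"] == 0 else 0,
            # 1 when no logged operation carried a Databricks user agent
            "from_external_platform": 0 if e["databricks_ops"] > 0 else 1,
        }
        for table, e in stats.items()
    }


# Hypothetical sample rows, for illustration only.
sample_rows = [
    ("cat.sch.shared", "StorageRead", "APN/1.0 Databricks/1.0 DBR/null"),
    ("cat.sch.shared", "StorageWrite", "some-external-client/1.0"),
    ("cat.sch.readonly", "StorageRead", "some-external-client/1.0"),
]
result = summarize(sample_rows)
```

Note that, as in the SQL, `from_external_platform` only reports whether Databricks traffic is absent. A table touched by both Databricks and another client is reported with `from_external_platform = 0`.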


# View of Delta Tables not registered in Unity Catalog

Review all Delta table storage paths that are not registered in Unity Catalog and see which operations are being performed on them.

In [0]:
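-- DELTA_LOG INTERROGATION OF PATHS
-- Note: because the pattern below ends in `$`, it only matches a value that is exactly `scheme://host/`;
-- a path with further segments is returned unchanged.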
with external_tables_not_registered as (
  select
    Storage_AccountName,
    REGEXP_REPLACE(Storage_RelativePath, '^[^/]+://[^/]+/$', '') as Storage_RelativePath,
    Storage_Category,
    count(*) as operation_count
  from
    slog.default.vw_storagelogs_information_schema
  where
    IT_TableType is null
    and lower(REGEXP_REPLACE(Storage_RelativePath, '^[^/]+://[^/]+/$', '')) like '%/_delta_log'
    and lower(REGEXP_REPLACE(Storage_RelativePath, '^[^/]+://[^/]+/$', '')) not like '%unity%'
  group by
    Storage_AccountName,
    REGEXP_REPLACE(Storage_RelativePath, '^[^/]+://[^/]+/$', ''),
    Storage_Category
  order by
    operation_count desc
)
select
  *
from
  external_tables_not_registered
pivot (
  max(operation_count) for Storage_Category in ('StorageRead' as read_count, 'StorageWrite' as write_count, 'StorageDelete' as delete_count)
)

Storage_AccountName,Storage_RelativePath,read_count,write_count,delete_count
stsezsandbox07,/stsezsandbox07/pos-dev/d2/_delta_log,,1,
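
The behaviour of the `REGEXP_REPLACE` pattern used in the query above can be checked with a short Python sketch. Python's `re` module is used here as a stand-in for the SQL function, which should behave the same way for this simple pattern. The `abfss://` URLs are hypothetical examples, and the unanchored variant is only one possible alternative, assuming the intent is to strip the scheme and host from a full URL.

```python
import re

# Pattern copied from the query above: anchored at both ends.
ANCHORED = r"^[^/]+://[^/]+/$"
# Hypothetical variant without the trailing "/$", stripping only scheme and host.
UNANCHORED = r"^[^/]+://[^/]+"

# Hypothetical URLs, for illustration only.
bare_url = "abfss://container@account.dfs.core.windows.net/"
full_url = "abfss://container@account.dfs.core.windows.net/pos-dev/d2/_delta_log"
# Path copied from the result row above.
logged_path = "/stsezsandbox07/pos-dev/d2/_delta_log"

# The anchored pattern only matches a value that is exactly scheme://host/.
bare_stripped = re.sub(ANCHORED, "", bare_url)
full_unchanged = re.sub(ANCHORED, "", full_url)
path_unchanged = re.sub(ANCHORED, "", logged_path)

# The unanchored variant removes the scheme and host and keeps the path.
full_stripped = re.sub(UNANCHORED, "", full_url)
```

With the anchored pattern, both `full_url` and `logged_path` come back unchanged, which is consistent with the `Storage_RelativePath` value shown in the result row.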
