Expected Behavior
The Databricks profiling pipeline should be able to successfully profile Databricks tables for sensitive data.
The Databricks masking pipeline should be able to successfully mask Databricks tables that contain sensitive data and copy tables that don't contain sensitive data.
Actual Behavior
Databricks profiling pipeline fails because it is unable to find table storage paths.
Databricks masking pipeline fails because it is unable to find table storage paths.
Steps To Reproduce the Problem
After importing and configuring the Databricks pipelines per their READMEs, trigger the pipelines with relevant parameters.
Version
Latest versions of the pipelines in the dcsazure_Databricks_to_Databricks folder, following import.
Additional Context
Both pipelines have been relying on the INFORMATION_SCHEMA.tables table, specifically pulling and modifying the value of STORAGE_SUB_DIRECTORY from that table in order to determine each table's storage path in Unity Catalog. Recently, per updated Databricks documentation, that value is now always NULL.
This information will instead have to be pulled on a per-table basis, from either describe detail <catalog>.<schema>.<table> or describe table extended <catalog>.<schema>.<table>. However, the current strategy of using pipeline variables to pull and parse this value will no longer work, since variables cannot be modified in parallel; it may therefore be necessary to break these queries and their parsing out into a separate pipeline that is called per table.
Other options may be available; we will need to reach out to Databricks support to determine what, if any, those are.
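As a minimal sketch of the per-table lookup described above: DESCRIBE DETAIL returns a single row whose location column holds the table's storage path. The run_sql callable below is a hypothetical stand-in for however the pipeline executes SQL (e.g. in a notebook, something like lambda q: [r.asDict() for r in spark.sql(q).collect()]).

```python
def get_table_storage_path(run_sql, catalog, schema, table):
    """Return the storage path for one table via DESCRIBE DETAIL.

    run_sql: hypothetical callable that executes a SQL statement and
    returns a list of dict-like rows (one row for DESCRIBE DETAIL).
    """
    rows = run_sql(f"DESCRIBE DETAIL {catalog}.{schema}.{table}")
    # DESCRIBE DETAIL yields one row per table; its 'location' column
    # is the storage path that STORAGE_SUB_DIRECTORY used to provide.
    return rows[0]["location"]
```

Because this issues one query per table, it lends itself to being wrapped in a separate per-table child pipeline, sidestepping the shared-pipeline-variable limitation noted above.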