
[2024-04-22] [BUG] Databricks Pipelines No Longer Work #4

Closed
JessicaLHartog opened this issue Apr 23, 2024 · 0 comments

Expected Behavior

  • Databricks profiling pipeline should be able to successfully profile Databricks tables for sensitive data.
  • Databricks masking pipeline should be able to successfully mask Databricks tables that contain sensitive data and copy tables that don't contain sensitive data.

Actual Behavior

  • Databricks profiling pipeline fails because it is unable to find table storage paths.
  • Databricks masking pipeline fails because it is unable to find table storage paths.

Steps To Reproduce the Problem

After importing and configuring the Databricks pipelines per their READMEs, trigger the pipelines with the relevant parameters.

Version

Latest versions of pipelines in dcsazure_Databricks_to_Databricks folder, following import.

Additional Context

Both of these pipelines had been relying on the INFORMATION_SCHEMA.TABLES table: specifically, they pulled and modified the value of its STORAGE_SUB_DIRECTORY column to determine each table's storage path in Unity Catalog.
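As a rough illustration, the lookup the pipelines depended on would have looked something like the query below. The table and column names come from the dependency described above; the exact query shape, and the use of the catalog-scoped `information_schema`, are assumptions, not the pipelines' actual code.

```sql
-- Hypothetical reconstruction of the old lookup (not the pipelines' exact query).
-- STORAGE_SUB_DIRECTORY now always returns NULL per updated Databricks documentation.
SELECT table_catalog,
       table_schema,
       table_name,
       storage_sub_directory
FROM   <catalog>.information_schema.tables
WHERE  table_schema = '<schema>';
```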

Recently, per updated Databricks documentation, that value is now always NULL.

This information will instead have to be pulled on a per-table basis, from either `DESCRIBE DETAIL <catalog>.<schema>.<table>` or `DESCRIBE TABLE EXTENDED <catalog>.<schema>.<table>`. However, the current strategy of using pipeline variables to pull and parse this value will no longer work, since variables cannot be modified in parallel; it may therefore be necessary to break these queries and their parsing out into a separate pipeline that is invoked once per table.
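A minimal sketch of the per-table parsing step described above, assuming the `DESCRIBE TABLE EXTENDED` output is available as `(col_name, data_type, comment)` rows, which is the standard shape of that command's result set. The helper name and the example rows are illustrative only, not taken from the pipelines.

```python
# Hypothetical helper: pull the storage path out of DESCRIBE TABLE EXTENDED rows.
# Rows are (col_name, data_type, comment) tuples; in the "Detailed Table
# Information" section, the path appears in the second column of the
# "Location" row.

def extract_storage_location(describe_rows):
    """Return the 'Location' value from DESCRIBE TABLE EXTENDED output, or None."""
    for col_name, value, _comment in describe_rows:
        if col_name == "Location":
            return value
    return None

# Example rows shaped like DESCRIBE TABLE EXTENDED output (values are made up):
rows = [
    ("id", "int", None),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Location", "abfss://container@account.dfs.core.windows.net/path/to/table", ""),
]
print(extract_storage_location(rows))
```

Running this query-and-parse step inside a child pipeline, once per table, sidesteps the limitation that pipeline variables cannot be safely modified by parallel activities.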

Other options may be available; we will need to reach out to Databricks support to determine what, if any, those are.
