v0.1.7
- Added `Configure Secrets` support to the `databricks labs remorph configure-secrets` CLI command (#254). The `Configure Secrets` feature has been implemented in the `databricks labs remorph` CLI as the new `configure-secrets` command. This addition allows users to establish a secret scope and secrets within their Databricks Workspace, enhancing security and control over resource access. The implementation includes a new `recon_config_utils.py` file in the `databricks/labs/remorph/helpers` directory, which contains classes and methods for managing Databricks Workspace secrets. Furthermore, the `ReconConfigPrompts` helper class has been updated to handle prompts for selecting sources, entering secret scope names, and handling overwrites. The CLI has also been updated with a new `configure_secrets` function and corresponding tests to ensure correct functionality.
- Added handling for invalid alias usage by manipulating the AST (#219). This commit addresses invalid alias usage in SQL queries by manipulating the Abstract Syntax Tree (AST). It introduces a new method, `unalias_lca_in_select`, which unaliases Lateral Column Aliases (LCA) in the SELECT clause of a query. The `AliasInfo` class is added to manage aliases more effectively, with attributes for the name, the expression, and a flag indicating whether the alias name is the same as a column name. Additionally, the `execute.py` file is modified to check for unsupported LCA using the `lca_utils.check_for_unsupported_lca` method, improving the system's robustness when handling invalid aliases. Test cases are also added in the new file `test_lca_utils.py` to validate the behavior of the `check_for_unsupported_lca`
function, ensuring that SQL queries are correctly formatted for the Snowflake dialect and avoiding errors due to invalid alias usage.
- Added support for the `databricks labs remorph generate-lineage` CLI command (#238). A new CLI command, `databricks labs remorph generate-lineage`, has been added to generate lineage for input SQL files, taking the source dialect and the input and output directories as arguments. The command uses existing logic to generate a directed acyclic graph (DAG) and then creates a DOT file in the output directory from the DAG. The new command is supported by the new functions `_generate_dot_file_contents` and `lineage_generator`, and by methods in the `RootTableIdentifier` and `DAG` classes. The command has been manually tested and includes unit tests, with plans to add integration tests in the future. The commit also includes a new method, `temp_dirs_for_lineage`, and updates to the `configure_secrets_databricks` method to handle a new source type, "databricks". The command handles invalid input and raises appropriate exceptions.
- Custom Oracle tokenizer (#316). In this release, the remorph library has been updated to enhance its handling of Oracle databases. A custom Oracle tokenizer has been developed that maps the `LONG` datatype to text (string), allowing for more precise parsing and manipulation of `LONG` columns in Oracle databases. The Oracle dialect in the configuration file has been updated to use the new custom tokenizer, and the `Oracle` class from the snow module has been imported and integrated into the Oracle dialect. These improvements enable the remorph library to manage Oracle databases more efficiently, with a particular focus on handling the `LONG` datatype. The commit also updates test files in the `functional/oracle/test_long_datatype` directory, which verify the proper conversion of the `LONG` datatype to text, and adds a new test file to the `tests/unit/snow` directory that checks compatibility with Oracle's `LONG` datatype. These changes improve the library's compatibility with Oracle databases, ensuring accurate handling of the `LONG` datatype in both Oracle SQL and Databricks SQL.
- Removed strict source dialect checks (#284). In the latest release, the
`transpile` and `generate_lineage` functions in `cli.py` have been changed to allow greater flexibility in source dialect selection. Previously, only the `snowflake` and `tsql` dialects were supported; now any source dialect supported by SQLGLOT can be used, controlled by the `SQLGLOT_DIALECTS` dictionary. Providing an unsupported source dialect results in a validation error. Additionally, the input and output folder paths for the `generate_lineage` function are now validated against the file system to ensure they exist. In the `install.py` file of the `databricks/labs/remorph` package, the source dialect selection has been updated to use `SQLGLOT_DIALECTS.keys()`, replacing the previous hardcoded list. Furthermore, recent updates to test functions in `test_install.py` suggest that the source selection process has been modified, possibly indicating the addition of new sources or a change in source identification. These modifications provide greater flexibility in testing and potentially in the actual application.
- Set Catalog, Schema from default Config (#312). A new feature has been added to our open-source library that allows users to specify the
`catalog` and `schema` configuration options as part of the `transpile` command-line interface (CLI). If these options are not provided, the `transpile` function in the `cli.py` file now sets them to the values specified in `default_config`, ensuring that a default catalog and schema are used when they are not explicitly set by the user. The `labs.yml` file has been updated to reflect these changes, with the addition of the `catalog-name` and `schema-name` options to the `commands` object. The `default` property of the `validation` object has also been updated to `true`, indicating that the validation step will be skipped by default. These changes provide increased flexibility and ease of use for users of the `transpile` functionality.
- Support for null-safe equality join for the Databricks generator (#280). In this release, we have implemented support for null-safe equality joins in the Databricks generator, addressing issue #280. This feature introduces the use of the `<=>` operator in the generated SQL code instead of the
`IS NOT DISTINCT FROM` syntax, to ensure accurate comparisons when NULL values are present in the joined columns. The `Generator` class has been updated with a new method, `NullSafeEQ`, which takes an expression and returns the binary version of that expression using the `<=>` operator. The `preprocess` method in the `Generator` class has also been modified to include this new functionality. Note that this change may require users to update existing code to align with the new syntax in the Databricks environment. With this enhancement, the Databricks generator can now perform null-safe equality joins, producing consistent results regardless of the presence of NULL values in the join conditions.
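To illustrate the alias handling described in the AST entry above (#219), here is a toy Python sketch of unaliasing a lateral column alias in a SELECT list. The function name echoes `unalias_lca_in_select`, but this version operates on plain strings rather than a sqlglot AST, so it is illustrative only:

```python
# Toy sketch: substitute earlier SELECT aliases into later expressions,
# mimicking what LCA unaliasing does on a real sqlglot AST. The naive
# str.replace() here would mis-handle overlapping names; remorph works
# on AST nodes instead.
def unalias_lca_in_select(select_items):
    """select_items: list of (expression, alias) pairs in SELECT order."""
    defined = {}
    result = []
    for expr, alias in select_items:
        # Replace references to previously defined aliases with their definitions.
        for name, definition in defined.items():
            expr = expr.replace(name, f"({definition})")
        if alias:
            defined[alias] = expr
        result.append((expr, alias))
    return result

items = [("price * qty", "total"), ("total * 0.1", "tax")]
print(unalias_lca_in_select(items))
# [('price * qty', 'total'), ('(price * qty) * 0.1', 'tax')]
```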
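For the `generate-lineage` entry (#238), the DAG-to-DOT step can be sketched as follows (hypothetical helper, not the actual `_generate_dot_file_contents` implementation):

```python
# Hypothetical sketch of emitting Graphviz DOT text from a table-dependency
# DAG; the real _generate_dot_file_contents in remorph may differ.
def dot_file_contents(edges):
    """Render parent -> child table dependencies as a DOT digraph."""
    lines = ["digraph lineage {"]
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)

edges = [("raw.orders", "stage.orders"), ("stage.orders", "mart.daily_sales")]
print(dot_file_contents(edges))
```

The resulting text can be written to a `.dot` file in the output directory and rendered with Graphviz.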
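The custom Oracle tokenizer entry (#316) maps `LONG` to a text type. A toy keyword-table override showing the idea (the real change subclasses sqlglot's Oracle tokenizer, which this sketch does not):

```python
# Toy illustration of overriding a tokenizer keyword table so LONG maps to
# TEXT; the base mappings here are illustrative, not sqlglot's actual tables.
BASE_KEYWORDS = {"VARCHAR2": "VARCHAR", "NUMBER": "DECIMAL"}

# Custom mapping: copy the base table and redirect LONG to TEXT (string).
ORACLE_KEYWORDS = {**BASE_KEYWORDS, "LONG": "TEXT"}

def map_datatype(token: str) -> str:
    """Resolve an Oracle type token to its target type, defaulting to itself."""
    return ORACLE_KEYWORDS.get(token.upper(), token.upper())

print(map_datatype("long"))  # TEXT
```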
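The relaxed dialect check from #284 amounts to a lookup against a dialect registry. A simplified stand-in for the `SQLGLOT_DIALECTS` dictionary (the entries and error message here are illustrative):

```python
# Simplified stand-in for remorph's SQLGLOT_DIALECTS registry; the real
# dictionary maps many more dialect names to sqlglot dialect classes.
SQLGLOT_DIALECTS = {"snowflake": "Snowflake", "tsql": "TSQL", "oracle": "Oracle"}

def validate_source_dialect(source: str) -> str:
    """Raise a validation error for dialects absent from the registry."""
    if source not in SQLGLOT_DIALECTS:
        supported = ", ".join(sorted(SQLGLOT_DIALECTS))
        raise ValueError(f"Unsupported source dialect {source!r}; supported: {supported}")
    return source

print(validate_source_dialect("oracle"))  # oracle
```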
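The catalog/schema defaulting from #312 can be sketched as a simple fallback (the names and default values here are hypothetical, not remorph's actual configuration):

```python
# Hypothetical sketch of falling back to default_config when the CLI options
# are omitted; remorph's actual transpile() wiring may differ.
from typing import Optional, Tuple

default_config = {"catalog_name": "main", "schema_name": "default"}

def resolve_catalog_schema(catalog: Optional[str], schema: Optional[str]) -> Tuple[str, str]:
    """Use CLI-provided values when present, else fall back to default_config."""
    return (
        catalog or default_config["catalog_name"],
        schema or default_config["schema_name"],
    )

print(resolve_catalog_schema(None, "my_schema"))  # ('main', 'my_schema')
```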
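Finally, the null-safe equality entry (#280) hinges on the difference between standard SQL `=` and `<=>`. A minimal Python sketch of the two semantics, with `None` modeling SQL NULL (helper names are hypothetical, not remorph's generator code):

```python
# Null-safe equality, as produced by the `<=>` operator in Databricks SQL:
# always returns True or False, even when operands are NULL.
def null_safe_eq(a, b):
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

# Standard SQL `=`: yields NULL (unknown) when either operand is NULL,
# which is why NULL keys never match in a plain equality join.
def standard_eq(a, b):
    if a is None or b is None:
        return None
    return a == b

for a, b in [(1, 1), (1, 2), (None, 1), (None, None)]:
    print(a, b, standard_eq(a, b), null_safe_eq(a, b))
```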
Contributors: @vil1, @sundarshankar89, @jimidle, @ganeshdogiparthi-db, @vijaypavann-db, @nfx, @ravit-db, @dependabot[bot], @kush6542, @bishwajit-db