Releases: databrickslabs/lsql

v0.4.3

08 May 09:19
@nfx
9032c9d
  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97). The actions/checkout dependency has been updated from version 4.1.2 to 4.1.3 in the update-main-version.yml workflow. The new version adds a check that verifies the git version before attempting to disable sparse-checkout, and introduces an SSH user parameter, improving functionality and compatibility. The upstream release notes and CHANGELOG.md describe the changes in detail, and the pull request includes the commit history with links to the corresponding issues and pull requests on GitHub.
  • Maintain PySpark compatibility for databricks.labs.lsql.core.Row (#99). In this release, we have added a new asDict method to the Row class in the databricks.labs.lsql.core module to maintain compatibility with PySpark. The method returns a dictionary representation of the Row object, with keys corresponding to column names and values corresponding to the values in each column; it simply delegates to the existing as_dict method, so the two behave identically. The optional recursive argument, intended to enable recursive conversion of nested Row objects to nested dictionaries when set to True, is not currently implemented and defaults to False. Additionally, the fetch function in backends.py has been modified to return pyspark.sql Row objects when using self._spark.sql(sql).collect(); this change is temporary and marked with a TODO comment. Error handling has also been added to the fetch function to ensure it operates as expected. A usage sketch follows this list.
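A minimal sketch of the compatibility shim described above, assuming Row accepts keyword arguments like its PySpark counterpart; the column names first and second are placeholders:

```python
from databricks.labs.lsql.core import Row

row = Row(first="a", second=1)

# PySpark-style accessor, delegating to as_dict()
assert row.asDict() == {"first": "a", "second": 1}
assert row.asDict() == row.as_dict()
```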

Dependency updates:

  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97).

Contributors: @dependabot[bot], @bishwajit-db

v0.4.2

19 Apr 17:13
@nfx
a582dba
  • Added more NotFound error type (#94). In the latest update, the error handling in the core.py file of the databricks/labs/lsql package has been enhanced: the _raise_if_needed function now raises a NotFound error when the error message includes the phrase "does not exist". This lets the system categorize such SQL query errors as NotFound, improving the overall error handling and reporting capabilities. A sketch of the mapping follows.
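A minimal sketch of the message-based mapping described above, assuming NotFound is the databricks.sdk.errors class; the standalone function below is illustrative and not the library's exact _raise_if_needed signature:

```python
from databricks.sdk.errors import NotFound


def raise_for_message(message: str):
    # Categorize "does not exist" SQL failures as NotFound,
    # falling back to a generic error otherwise
    if "does not exist" in message:
        raise NotFound(message)
    raise RuntimeError(message)
```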

Contributors: @nkvuong

v0.4.1

12 Apr 12:30
@nfx
5782b23
  • Fixing ovewrite integration tests (#92). The integration tests for the overwrite feature have been enhanced to address a concern with write operations. Two new variables, catalog and schema, are obtained via the env_or_skip function and used in the save_table method, which is now invoked twice on the same table: once with the append mode and once with the overwrite mode. After each call, the data in the table is retrieved and checked for accuracy, using the updated Row class with the revised field names first and second (formerly name and id). This ensures the proper operation of the overwrite feature during integration tests; a sketch of the test appears below.
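A pytest-style sketch of the test described above, assuming a sql_backend fixture and an env_or_skip fixture that skips the test when an environment variable is unset; the environment variable names and the dataclass are placeholders:

```python
from dataclasses import dataclass


@dataclass
class Foo:
    first: str
    second: int


def test_overwrite(sql_backend, env_or_skip):
    catalog = env_or_skip("TEST_CATALOG")  # hypothetical variable name
    schema = env_or_skip("TEST_SCHEMA")    # hypothetical variable name
    table = f"{catalog}.{schema}.foo"

    # First append the initial rows, then overwrite them entirely
    sql_backend.save_table(table, [Foo("a", 1)], Foo, mode="append")
    sql_backend.save_table(table, [Foo("b", 2)], Foo, mode="overwrite")

    rows = list(sql_backend.fetch(f"SELECT * FROM {table}"))
    assert len(rows) == 1  # only the overwritten row remains
```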

Contributors: @william-conti

v0.4.0

11 Apr 17:27
@nfx
8f3d164
  • Added catalog and schema parameters to execute and fetch (#90). In this release, we have added optional catalog and schema parameters to the execute and fetch methods in the SqlBackend abstract base class, allowing for more flexibility when executing SQL statements in specific catalogs and schemas. These updates include new method signatures and their respective implementations in the SparkSqlBackend and DatabricksSqlBackend classes, where the new parameters control the catalog and schema used by the SparkSession instance and the SqlClient instance, respectively. This enhancement enables better functionality in multi-catalog and multi-schema environments and is covered by new unit and integration tests. For example, with a SparkSqlBackend instance spark_backend, you can execute a SQL statement in a specific catalog and schema as shown in the sketch below; the fetch method accepts the same parameters.
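A usage sketch based on the example above; the backend construction and the table, catalog, and schema names are placeholders:

```python
# assumes an existing SparkSession `spark` and the SparkSqlBackend
# class referenced in this release note
spark_backend = SparkSqlBackend(spark)

# Run a statement against a specific catalog and schema
spark_backend.execute("SELECT * FROM my_table", catalog="my_catalog", schema="my_schema")

# fetch() accepts the same optional parameters and yields rows
for row in spark_backend.fetch("SELECT * FROM my_table", catalog="my_catalog", schema="my_schema"):
    print(row)
```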

Contributors: @FastLee

v0.3.1

02 Apr 16:16
@nfx
155bea0
  • Check UCX and LSQL for backwards compatibility (#78). In this release, we introduce a new GitHub Actions workflow, downstreams.yml, which automatically runs the unit tests of downstream projects whenever the upstream project changes. The workflow runs on pull requests, merge groups, and pushes to the main branch, and sets permissions for id-token, contents, and pull-requests. It includes a compatibility job that runs on Ubuntu, checks out the code, sets up Python, installs the toolchain, and tests downstream projects using the databrickslabs/sandbox/downstreams action. The job matrix includes two downstream projects, ucx and remorph, and uses the build cache to speed up the pip install step. This ensures that changes to the upstream project do not break downstream projects, maintaining a stable and reliable library.
  • Fixed Builder object has no attribute sdk_config error (#86). In this release, we've resolved a Builder object has no attribute sdk_config error that occurred when initializing a Spark session via DatabricksSession.builder. The issue was caused by accessing the attribute as sdk_config, which does not exist; the code now uses the correct sdkConfig spelling, so the Spark session is created successfully. The DatabricksSession class and its methods, such as getOrCreate, continue to be used for interacting with Databricks clusters and workspaces, while the WorkspaceClient class manages Databricks resources within a workspace. See the sketch after this list.
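A minimal sketch of the corrected initialization, assuming databricks-connect's DatabricksSession and the SDK's WorkspaceClient as described above; the exact call site in the library may differ:

```python
from databricks.connect import DatabricksSession
from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

# The config setter is spelled `sdkConfig`, not `sdk_config`
spark = DatabricksSession.builder.sdkConfig(ws.config).getOrCreate()
```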

Dependency updates:

  • Bump codecov/codecov-action from 1 to 4 (#84).
  • Bump actions/setup-python from 4 to 5 (#83).
  • Bump actions/checkout from 2.5.0 to 4.1.2 (#81).
  • Bump softprops/action-gh-release from 1 to 2 (#80).

Contributors: @dependabot[bot], @nfx, @bishwajit-db, @william-conti

v0.3.0

27 Mar 13:28
@nfx
073c922
  • Added support for save_table(..., mode="overwrite") to StatementExecutionBackend (#74). In this release, the save_table method in the StatementExecutionBackend supports overwriting a table. Previously, attempting to use the overwrite mode raised a NotImplementedError; now the method first truncates the table, using the execute method to run a TRUNCATE TABLE SQL command, before inserting the new rows. The method signature has been updated to give the mode parameter a default value of append, which preserves the existing behavior while providing a convenient default. The change is covered by a new integration test, test_overwrite, in test_deployment.py, and by two new unit tests, test_statement_execution_backend_save_table_overwrite_empty_table and test_mock_backend_overwrite. A usage sketch follows.
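A short usage sketch of the new mode, assuming a dataclass-based row type and an already-constructed StatementExecutionBackend; the names are placeholders:

```python
from dataclasses import dataclass


@dataclass
class Foo:
    first: str
    second: int


# `backend` is an already-constructed StatementExecutionBackend (assumption).
# Overwrite truncates the table, then inserts the new rows
# (previously this mode raised NotImplementedError).
backend.save_table("main.default.foo", [Foo("a", 1), Foo("b", 2)], Foo, mode="overwrite")

# mode defaults to "append", preserving the old behavior
backend.save_table("main.default.foo", [Foo("c", 3)], Foo)
```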

Contributors: @william-conti

v0.2.5

26 Mar 08:59
@nfx
8921e0f
  • Fixed PyPI badge (#72). In this release, we have fixed the PyPI badge in the README file of the library. The badge displays the published version of the package and serves as a quick reference for users. The fix is limited to the README and does not change any functionality or methods within the project.
  • Fixed no-cheat check (#71). In this release, we have made improvements to the no-cheat verification process for new code. Previously, the check for disabling the linter was prone to false positives when the string '# pylint: disable' appeared for reasons other than disabling the linter. The updated code now includes an additional filter to exclude the string CHEAT from the search, and the number of characters in the output is counted using the wc -c command. If the count is not zero, the script will terminate with an error message. This change enhances the accuracy of the no-cheat check, ensuring that the linter is being used correctly and that all new code meets our quality standards.
  • Removed upper bound on sqlglot dependency (#70). In this update, we have removed the upper bound on the sqlglot dependency version in the project's pyproject.toml file. Previously, the version constraint required sqlglot to be at least 22.3.1 but less than 22.5.0. With this modification, there will be no upper limit, enabling the project to utilize any version greater than or equal to 22.3.1. This change provides the project with the flexibility to take advantage of future bug fixes, performance improvements, and new features available in newer sqlglot package versions. Developers should thoroughly test the updated package version to ensure compatibility with the existing codebase.

Contributors: @nfx

v0.2.4

25 Mar 21:12
@nfx
b840229
  • Fixed Builder object is not callable error (#67). In this release, we have fixed an error in the Backends class in the databricks/labs/lsql/backends.py file: the DatabricksSession.builder() call in the __init__ method has been changed to DatabricksSession.builder, since builder is an attribute rather than a callable. The sdk_config method is then used to configure the instance with the required settings, and getOrCreate produces the SparkSession object that is passed to the parent class constructor. This eliminates the error caused by calling the builder attribute as a function; see the sketch after this list.
  • Prevent silencing of pylint (#65). In this release, we have introduced a new job, "no-lint-disabled", to the GitHub Actions workflow for the repository. This job runs on the latest Ubuntu version and checks out the codebase with a full history. It verifies that no new instances of code suppressing pylint checks have been added, by filtering the differences between the current branch and the main branch for new lines of code, and then checking if any of those new lines contain a pylint disable comment. If any such lines are found, the job will fail and print a message indicating the offending lines of code, thereby ensuring that the codebase maintains a consistent level of quality by not allowing linting checks to be bypassed.
  • Updated _SparkBackend.fetch() to return iterator instead of list (#62). In this release, the fetch() method of the _SparkBackend class has been updated to return an iterator instead of a list, which can reduce memory usage and improve performance, since the results of the SQL query can now be processed one element at a time. A new exception has been introduced to wrap any exceptions that occur during query execution, providing better debugging and error handling. The test_runtime_backend_fetch() unit test has been updated to reflect this change. Users of fetch() should be aware that it now returns an iterator that must be consumed to obtain the data, as in the sketch after this list.
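A combined sketch of the two code-facing changes in this release: the corrected builder access from #67 and consuming the iterator now returned by fetch() from #62. The getOrCreate call without explicit config and the fetch_rows helper are illustrative assumptions, not code from the release:

```python
from databricks.connect import DatabricksSession

# Access `builder` as an attribute; calling it as `DatabricksSession.builder()`
# raised "'Builder' object is not callable" (#67)
spark = DatabricksSession.builder.getOrCreate()


def fetch_rows(backend, sql: str):
    # fetch() now returns an iterator rather than a list (#62),
    # so materialize it before calling len() or indexing into it
    return list(backend.fetch(sql))
```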

Contributors: @nfx, @qziyuan, @bishwajit-db

v0.2.3

18 Mar 19:18
@nfx
9b5450e
  • Added support for common parameters in StatementExecutionBackend (#59). The StatementExecutionBackend class in the databricks.labs.lsql package's backends.py file now accepts common parameters as keyword arguments (kwargs), which are passed through to the StatementExecutionExt constructor. This allows greater customization and flexibility in the backend's operation, making it adaptable to more use cases. The key modification is the addition of the **kwargs parameter in the constructor signature and its forwarding to StatementExecutionExt; no methods within the class were changed. A usage sketch follows.
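A sketch of forwarding common parameters through the backend constructor; the specific keyword arguments shown (catalog and schema) are assumptions about what StatementExecutionExt accepts, not a documented list, and the warehouse ID is a placeholder:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend

ws = WorkspaceClient()

# Extra keyword arguments are forwarded to the StatementExecutionExt constructor
backend = StatementExecutionBackend(ws, "warehouse-id", catalog="main", schema="default")
```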

Contributors: @bishwajit-db

v0.2.2

15 Mar 13:38
@nfx
31684a2
  • Updating packages. In this update, the dependencies specified in the pyproject.toml file have been updated to more recent versions: "databricks-labs-blueprint~=0.4.0" and "databricks-sdk~=0.21.0" have been replaced with "databricks-labs-blueprint>=0.4.2" and "databricks-sdk>=0.22.0", respectively, bringing in newer features and bug fixes from those packages. The sqlglot dependency remains unchanged, with the same version requirement range of "sqlglot>=22.3.1,<22.5.0".

Contributors: @william-conti