Merged
Conversation
tobymao
reviewed
Apr 14, 2023
Contributor
this looks great, let's add some unit tests
Contributor
you can run the linter with
and the rest of the tests with
tobymao
reviewed
Apr 14, 2023
sqlmesh/utils/file.py
Outdated
Contributor
is it trivial to add gzip?
d602e8c to
8a059eb
Compare
Contributor
nice work!
Description
This pull request introduces support for fsspec and transactional operations over files. It aims to introduce a new state adapter, FileAdapterStateSync. fsspec is a Python library that provides a unified interface for accessing various filesystems. With this addition, sqlmesh can interact with different filesystems, both local and cloud-based (GCS, S3, Azure), using a consistent API.
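As a rough illustration of the "consistent API" point, the sketch below writes and reads a state file through fsspec's built-in in-memory filesystem. The helper names `write_state` and `read_state` and the file path are hypothetical and not taken from this PR. Swapping the protocol string for a cloud one should leave the helpers unchanged, assuming the matching backend package (for example s3fs or gcsfs) is installed.

```python
import fsspec


def write_state(fs, path, payload):
    """Write bytes through any fsspec filesystem object (hypothetical helper)."""
    with fs.open(path, "wb") as f:
        f.write(payload)


def read_state(fs, path):
    """Read bytes back through the same interface (hypothetical helper)."""
    with fs.open(path, "rb") as f:
        return f.read()


# "memory" ships with fsspec; "file" targets the local disk, and cloud
# protocols such as "s3" or "gcs" need their own backend packages.
fs = fsspec.filesystem("memory")
write_state(fs, "/state/snapshots.json", b'{"snapshots": []}')
roundtrip = read_state(fs, "/state/snapshots.json")
```

Because the helpers only touch the filesystem object passed in, the choice of backend stays a configuration concern rather than a code change.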
Moreover, the pull request implements transactional support for files within the filesystem via a wrapper class. Transactions over files are theoretically at a serializable level of isolation, with built-in rollbacks.
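To make the rollback idea concrete, here is a minimal standard-library sketch of one common approach: stage each write in a temporary file and publish it with an atomic rename only if the whole block succeeds. `FileTransaction` is a hypothetical name, and this is not the wrapper class from this PR. It works on the local disk only, takes no locks (so it does not by itself give serializable isolation), and a commit spanning several files is not atomic as a group.

```python
import os
import tempfile
from pathlib import Path


class FileTransaction:
    """Hypothetical sketch: stage writes in temp files, publish on success."""

    def __init__(self, root):
        self.root = Path(root)
        self._staged = {}  # final path -> temp path holding the new content

    def write(self, name, data):
        final = self.root / name
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # A second write to the same name supersedes the earlier staged copy.
        previous = self._staged.pop(final, None)
        if previous is not None:
            os.unlink(previous)
        self._staged[final] = tmp

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: os.replace swaps each file into place atomically.
            for final, tmp in self._staged.items():
                os.replace(tmp, final)
        else:
            # Rollback: discard staged files, leaving the originals untouched.
            for tmp in self._staged.values():
                os.unlink(tmp)
        self._staged.clear()
        return False  # never swallow the caller's exception


with tempfile.TemporaryDirectory() as root:
    with FileTransaction(root) as txn:
        txn.write("state.json", b"v1")
    committed = (Path(root) / "state.json").read_bytes()

    try:
        with FileTransaction(root) as txn:
            txn.write("state.json", b"v2")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    after_rollback = (Path(root) / "state.json").read_bytes()
    leftovers = sorted(p.name for p in Path(root).iterdir())
```

In the usage above, the first block commits `v1`; the second block fails partway through, so `state.json` keeps its earlier content and no temporary files remain.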
Motivation
The current implementation of sqlmesh lacks reliable state storage outside of persistent OLTP services. Currently it either uses your data warehouse, which is not ideal for the transactional integrity required, or it piggybacks off of Airflow's OLTP database. This makes it challenging for users who work with other systems (i.e., not Airflow) to ensure data consistency. Furthermore, it blocks the path to a "stateless" execution model in which no "stateful" services (like an OLTP database) are required, meaning we could run sqlmesh reliably on anything, anywhere, as we would run Terraform with a cloud backend. This includes Lambda, Cloud Functions, Drone, GitHub Actions, and so on, all without an OLTP database. It also lowers the barrier to entry, since cloud storage is often easier to provision than another database.
With this pull request, we aim to provide a solution that allows sqlmesh to support different filesystems with a consistent API, while also providing transactional support for data consistency and reliability.
Changes Made
Related PRs
N/A
Todos
Testing
Will lean on core team for help with this.