Feat: File state adapter #711

Merged
tobymao merged 9 commits into SQLMesh:main from z3z1ma:feat/blob-state-adapter on Apr 15, 2023
@z3z1ma z3z1ma commented Apr 14, 2023

Description

This pull request introduces support for fsspec and transactional operations over files. It aims to introduce a new state adapter, FileAdapterStateSync.

Fsspec is a Python library that provides a unified interface to access various filesystems. With this addition, sqlmesh can now interact with different file systems, such as local and cloud-based (GCS, S3, Azure), using a consistent API.
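
To illustrate the "consistent API" point, here is a minimal sketch of reading and writing a JSON state file through fsspec. The helper names `write_state` and `read_state` and the `memory://` URL are assumptions made for this example and are not taken from the PR; swapping the URL scheme (for example to `s3://` or `gs://`) would select a different backend, provided the matching fsspec driver is installed.

```python
import json

import fsspec


def write_state(url: str, state: dict) -> None:
    """Hypothetical helper: serialize state to whatever backend the URL names."""
    # fsspec.open chooses the filesystem implementation from the URL scheme.
    with fsspec.open(url, "wt") as f:
        json.dump(state, f)


def read_state(url: str) -> dict:
    """Hypothetical helper: load state back from the same URL."""
    with fsspec.open(url, "rt") as f:
        return json.load(f)


# The in-memory filesystem ships with fsspec, so no cloud credentials are needed here.
url = "memory://sqlmesh-demo/state.json"
write_state(url, {"snapshots": ["a", "b"]})
print(read_state(url))
```

The calling code stays the same whichever filesystem sits behind the URL, which is what makes one file-based state adapter usable across local and cloud storage.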

Moreover, the pull request implements transactional support over files within the filesystem via a wrapper class. In theory, transactions over files provide serializable isolation with built-in rollbacks.
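
As a rough illustration of the idea (not the PR's actual `FileTransactionHandler`), the sketch below uses only the standard library and a local path. The name `file_transaction` and the `.lock` suffix are hypothetical. It assumes an exclusive lock file is enough to serialize writers and that restoring a snapshot of the original bytes is an acceptable rollback. Object stores generally lack an equivalent of `O_EXCL`, so a real fsspec-based implementation would need a different locking strategy.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_transaction(path: str):
    """Hypothetical helper: exclusive access to `path` with rollback on error."""
    lock_path = path + ".lock"
    # O_EXCL makes creation fail if the lock file already exists,
    # so only one writer at a time can enter the transaction.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    try:
        # Snapshot the current contents so a failure can restore them.
        original = None
        if os.path.exists(path):
            with open(path, "rb") as f:
                original = f.read()
        try:
            yield path
        except BaseException:
            # Roll back: restore the snapshot, or remove a file that did not exist before.
            if original is None:
                if os.path.exists(path):
                    os.remove(path)
            else:
                with open(path, "wb") as f:
                    f.write(original)
            raise
    finally:
        # Always release the lock, whether the transaction committed or rolled back.
        os.remove(lock_path)


state_file = os.path.join(tempfile.mkdtemp(), "state.json")

# A transaction that completes normally keeps its write.
with file_transaction(state_file) as p:
    with open(p, "w") as f:
        f.write('{"version": 1}')

# A transaction that raises is rolled back to the previous contents.
try:
    with file_transaction(state_file) as p:
        with open(p, "w") as f:
            f.write('{"version": 2}')
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

with open(state_file) as f:
    print(f.read())  # {"version": 1}
```

Because writers are forced to run one at a time, the result is equivalent to some serial order of transactions, which is the property the description refers to as serializable isolation.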

Motivation

The current implementation of sqlmesh lacks reliable state storage outside of persistent OLTP services. Currently it either uses your data warehouse, which is not ideal for the transactional integrity required, or it piggybacks off of Airflow's OLTP database. This can make it challenging for users working with other systems (i.e., not Airflow) to ensure data consistency. Furthermore, it blocks the path to a "stateless" execution model in which no "stateful" services (like an OLTP database) are required, meaning we could run sqlmesh reliably on anything, anywhere, as we would with Terraform and a cloud backend. This includes Lambda, Cloud Functions, Drone, GitHub Actions, and so on, all without an OLTP database. It also lowers the barrier to entry, since cloud storage is often easier to provision than another database.

With this pull request, we aim to provide a solution that allows sqlmesh to support different filesystems with a consistent API, while also providing transactional support for data consistency and reliability.

Changes Made

  • Added fsspec as a dependency in the project
  • Implemented a FileTransactionHandler class to provide some level of serializable isolation
  • ...

Related PRs

N/A

Todos

  • Ensure that the documentation is complete and accurate
  • Address any feedback provided by the code reviewers
  • Ensure that the pull request follows the project's contribution guidelines

Testing

Will lean on core team for help with this.

tobymao commented Apr 14, 2023

this looks great, let's add some unit tests

tobymao commented Apr 14, 2023

you can run the linter with

make style

and the rest of the tests with

make test

Contributor

is it trivial to add gzip?

Contributor Author

Yep should be!

@z3z1ma z3z1ma force-pushed the feat/blob-state-adapter branch from d602e8c to 8a059eb Compare April 14, 2023 23:41
tobymao commented Apr 15, 2023

nice work!

@tobymao tobymao enabled auto-merge (squash) April 15, 2023 02:00
@tobymao tobymao disabled auto-merge April 15, 2023 02:00
@tobymao tobymao merged commit 6249374 into SQLMesh:main Apr 15, 2023