API to allow programmatic conversion #238

Open
osopardo1 opened this issue Nov 28, 2023 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

@osopardo1 (Member)
Right now, if we have a table in Parquet or Delta and want to transform it to Qbeast, we have two options:

  • A full rewrite of the table (done through the Spark API)
  • The ConvertToQbeast command

The first one might take a lot of time and resources, while the second is a simpler approach. It works as follows:

  1. The user calls ConvertToQbeast with the columnsToIndex and cubeSize parameters.
  2. If the target table is in Parquet format, it is first converted to Delta.
  3. If the target table is already in Delta format, this conversion is skipped.
  4. The command adds the required information to the commit log, using the metadata specified in step 1.
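To make the flow above concrete, here is a minimal, self-contained sketch in plain Python. It is only a toy model and not the actual Qbeast or Delta code: the `Table` class, the `convert_to_qbeast` function, and the shape of the commit log entries are all hypothetical and exist just to illustrate the four steps.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for a table on storage (hypothetical, not a real API)."""
    fmt: str                                   # "parquet" or "delta"
    files: list                                # data file paths
    commit_log: list = field(default_factory=list)


def convert_to_qbeast(table, columns_to_index, cube_size):
    """Model of the conversion steps; no data file is read or rewritten."""
    if table.fmt == "parquet":
        # Step 2: a Parquet table is converted to Delta first.
        table.fmt = "delta"
        table.commit_log.append(
            {"op": "convert_to_delta", "files": list(table.files)}
        )
    elif table.fmt != "delta":
        raise ValueError(f"unsupported format: {table.fmt}")
    # Step 3: a Delta table needs no format conversion.
    # Step 4: record the indexing metadata in the commit log.
    table.commit_log.append(
        {
            "op": "qbeast_metadata",
            "columnsToIndex": list(columns_to_index),
            "cubeSize": cube_size,
        }
    )
    return table


converted = convert_to_qbeast(
    Table(fmt="parquet", files=["part-0.parquet", "part-1.parquet"]),
    columns_to_index=["user_id", "ts"],
    cube_size=5000,
)
```

In this model the list of data files is the same before and after the call; only the log grows.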

The command does not rewrite any existing data by itself; it only indexes newly appended data.

All the files that previously existed in the log are treated as the staging area, and all of them (logically) belong to the root.

The idea is to find a way to rewrite a batch of those files without having to externalize part of the code into a separate application.

It could be a ConvertToQbeast API, a new command called RewriteQbeastFiles, or even a parameter of the Optimize command. What do you think? @cugni @alexeiakimov @Jiaweihu08
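As a starting point for the discussion, here is a rough Python sketch of the batch-selection part, whichever of the three options it ends up behind. Everything in it is an assumption for illustration: `LoggedFile`, `select_staging_batch`, and the byte budget parameter do not exist in the codebase, and the real implementation would read this information from the commit log.

```python
from typing import NamedTuple


class LoggedFile(NamedTuple):
    """Hypothetical view of a file entry in the log."""
    path: str
    size_bytes: int
    staged: bool  # True if the file predates the conversion (not yet indexed)


def select_staging_batch(files, max_batch_bytes):
    """Pick staged files, in log order, until the byte budget is reached.

    At least one staged file is returned (if any exists), even when it is
    larger than the budget, so repeated calls always make progress.
    """
    batch, total = [], 0
    for f in files:
        if not f.staged:
            continue  # already indexed, nothing to rewrite
        if batch and total + f.size_bytes > max_batch_bytes:
            break
        batch.append(f)
        total += f.size_bytes
    return batch


log_files = [
    LoggedFile("a.parquet", 100, True),
    LoggedFile("b.parquet", 200, True),
    LoggedFile("c.parquet", 50, False),
    LoggedFile("d.parquet", 300, True),
]
batch = select_staging_batch(log_files, max_batch_bytes=300)
```

The selected batch would then be rewritten and indexed, and the remaining staged files left for a later call.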

@osopardo1 added the enhancement and help wanted labels on Nov 28, 2023