[core] allow to seal table and schemas #40

Closed
rudolfix opened this issue Jun 27, 2022 · 1 comment

rudolfix commented Jun 27, 2022

Sealing a table or the whole schema means that:

  1. the table / schema definition is immutable
  2. data that cannot be coerced into existing tables and columns is dropped (filtered out)
  3. there is an option to load the "bad data" into separate tables (e.g. as a JSON blob)
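
To make the three points above concrete, here is a minimal sketch in plain Python. It does not use any dlt API: `SEALED_COLUMNS`, `apply_sealed_schema`, and the row layout are hypothetical names chosen for illustration. It also assumes that a row is rejected as a whole when any value cannot be coerced or when it carries an unknown column; dropping only the offending fields would be an equally valid reading of point 2.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical sealed table definition: column name -> coercion function.
# Because the table is sealed, this mapping is never modified at load time.
SEALED_COLUMNS: Dict[str, Callable[[Any], Any]] = {
    "id": int,
    "name": str,
    "amount": float,
}


def apply_sealed_schema(
    rows: List[Dict[str, Any]],
    columns: Dict[str, Callable[[Any], Any]] = SEALED_COLUMNS,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Split rows into those that fit the sealed table and "bad data" records."""
    good: List[Dict[str, Any]] = []
    bad: List[Dict[str, str]] = []
    for row in rows:
        try:
            unknown = sorted(set(row) - set(columns))
            if unknown:
                # A sealed table cannot grow new columns.
                raise ValueError(f"unknown columns: {unknown}")
            coerced = {name: columns[name](value) for name, value in row.items()}
        except (TypeError, ValueError) as exc:
            # Keep the rejected row as a JSON blob so it can be loaded
            # into a separate table instead of being lost.
            bad.append({"error": str(exc), "data": json.dumps(row, default=str)})
        else:
            good.append(coerced)
    return good, bad
```

Calling `apply_sealed_schema(rows)` returns the coerced rows together with the rejected ones, so the caller can decide whether to discard the second list or load it into a separate "bad data" table.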

The difficulties

  • to seal the schema, it must be known, but we typically let dlt infer the schema. Are there any requirements that would make sealing easier, or do we just require modifying the schema at runtime?

Minimum requirements from Adrian:

At the very least, we should be able to toggle "contract_mode=On" (let's use something relatable rather than seal?)
In this mode,

  • If nothing has been loaded yet, the schema can be created and can evolve
  • If something has already been loaded, the schema may not evolve
  • This mode can be toggled on/off to allow temporary evolution
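
The toggle described above boils down to a small decision rule. The sketch below is one possible reading, not an existing dlt interface: `ContractState`, `has_loaded_data`, and `schema_may_evolve` are hypothetical names, and "something already loaded" is assumed to be tracked per schema rather than per table.

```python
from dataclasses import dataclass


@dataclass
class ContractState:
    """Hypothetical state needed to decide whether a schema may evolve."""

    contract_mode: bool = False    # the "contract_mode=On" toggle
    has_loaded_data: bool = False  # assumed to be tracked for the whole schema

    def schema_may_evolve(self) -> bool:
        # Evolution is blocked only when the contract is on AND data exists.
        # With nothing loaded yet, the schema can still be created and grow.
        return not (self.contract_mode and self.has_loaded_data)


state = ContractState(contract_mode=True)
first_load_allowed = state.schema_may_evolve()    # nothing loaded yet

state.has_loaded_data = True
later_load_allowed = state.schema_may_evolve()    # contract now applies

state.contract_mode = False                        # temporary evolution window
toggled_off_allowed = state.schema_may_evolve()
```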

What should happen when schema is not allowed to evolve?

  • Any operation that would cause additions to the original schema should fail
  • The data should just not be loaded
  • Any operation that would change hints should fail. This includes key, performance, and nullable hints, i.e. basically all changes
  • This does not relate to dlt's normaliser: the normaliser is still expected to type and normalise the data. The restrictions apply to the schema only.
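
A check along these lines could compare the stored table definition with the one proposed by a new load and fail on any difference. This is a sketch under assumptions: the dictionary layout, the hint names (`data_type`, `nullable`, `primary_key`), and `SchemaContractViolation` are hypothetical and are not taken from dlt's schema format.

```python
from typing import Any, Dict, List

ColumnHints = Dict[str, Any]
TableSchema = Dict[str, ColumnHints]  # column name -> hints (hypothetical layout)


class SchemaContractViolation(Exception):
    """Raised when a load would change a schema that may not evolve."""


def find_schema_changes(stored: TableSchema, proposed: TableSchema) -> List[str]:
    """List every addition or hint change the proposed schema would cause."""
    changes: List[str] = []
    for name, hints in proposed.items():
        if name not in stored:
            changes.append(f"new column: {name}")
            continue
        for hint, value in hints.items():
            if stored[name].get(hint) != value:
                changes.append(f"changed hint {hint!r} on column {name}")
    return changes


def enforce_contract(stored: TableSchema, proposed: TableSchema) -> None:
    """Fail the operation, so that no data is loaded, if anything would change."""
    changes = find_schema_changes(stored, proposed)
    if changes:
        raise SchemaContractViolation("; ".join(changes))
```

Returning the list of changes separately from raising keeps the same comparison usable for reporting, for example if schema change information were added to the trace.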

Open questions by @sh-rp:

  • How does this integrate with providing a schema.yaml?
  • Should we enable sealing / freezing individual table chains with this PR? If so, how should we do it? Via the resource decorator? And if so, does this get saved into the stored schema?
  • If we load "bad data" into an additional destination, should we store the complete data there?
  • Should the trace indicate whether the schema is sealed, and should we maybe add schema change output info to the trace? This would be very nice for users playing with dlt to see what is going on under the hood, imho.
@rudolfix rudolfix self-assigned this Jun 27, 2022
@rudolfix rudolfix mentioned this issue Jun 27, 2022
11 tasks
@rudolfix rudolfix changed the title allow to seal table and schemas [core] allow to seal table and schemas Jul 8, 2022
@rudolfix
Collaborator Author

closed with #135
