Measuring Loss of Data when cutting an edge to remove a cycle #23

Open
raresboza opened this issue Aug 19, 2021 · 1 comment

Comments

@raresboza

Greetings,
I was reading the following article on subsetting:
https://www.tonic.ai/blog/condenser-a-database-subsetting-tool

I don't fully understand the drawbacks of breaking a cycle in a database. Of course, one loses data when doing so, but is the same amount of data lost regardless of where you cut the cycle? How could one measure that? What criteria affect it?

@theaeolianmachine
Contributor

Hi @raresboza, I'm not sure I understand your question. Condenser is set up to break dependencies wherever it makes the most sense, but realistically all it does is put NULLs in the column in question. Otherwise, in some cases a cycle would ultimately cause all of the data in the tables in the cycle to be grabbed, and we certainly wouldn't be able to perform a true topological sort.

Ultimately, deciding where to make the break comes down to which column you find less valuable.
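
To make the trade-off concrete, here is a minimal sketch of one way to compare candidate breaks. It is not how condenser itself decides: the schema, the rows, and the helper names (`references_lost`, `load_order`, `has_cycle`, `cheapest_break`) are all hypothetical, and it assumes a single cycle. It uses the count of non-NULL values in a foreign key column as a rough measure of what is lost when that column is NULLed, then confirms that a topological sort becomes possible once the cheapest edge is cut.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# Hypothetical schema: each foreign key is (child_table, column, parent_table).
FOREIGN_KEYS = [
    ("employees", "department_id", "departments"),
    ("departments", "manager_id", "employees"),  # closes the cycle
    ("orders", "employee_id", "employees"),
]

# Hypothetical rows, just enough to count populated references.
ROWS = {
    "employees": [
        {"id": 1, "department_id": 10},
        {"id": 2, "department_id": 10},
        {"id": 3, "department_id": 20},
    ],
    "departments": [
        {"id": 10, "manager_id": 1},
        {"id": 20, "manager_id": None},
    ],
    "orders": [{"id": 100, "employee_id": 2}],
}


def references_lost(fk):
    """Count non-NULL values in the FK column: the references NULLing would erase."""
    child, column, _parent = fk
    return sum(1 for row in ROWS[child] if row[column] is not None)


def load_order(foreign_keys):
    """Return tables ordered so that every parent comes before its children."""
    sorter = TopologicalSorter()
    for child, _column, parent in foreign_keys:
        sorter.add(child, parent)  # child depends on parent
    return list(sorter.static_order())  # raises CycleError on a cycle


def has_cycle(foreign_keys):
    try:
        load_order(foreign_keys)
        return False
    except CycleError:
        return True


def cheapest_break(foreign_keys):
    """Among FKs whose removal leaves no cycle, pick the one losing the fewest references."""
    candidates = [
        fk for fk in foreign_keys
        if not has_cycle([other for other in foreign_keys if other != fk])
    ]
    return min(candidates, key=references_lost)


broken = cheapest_break(FOREIGN_KEYS)
remaining = [fk for fk in FOREIGN_KEYS if fk != broken]
print("break:", broken, "references lost:", references_lost(broken))
print("load order:", load_order(remaining))
```

In this made-up data, cutting `departments.manager_id` erases one reference, while cutting `employees.department_id` would erase three, so the amount lost does depend on where the cycle is cut. A raw count is only one possible criterion; how much a given column matters to whoever uses the subset is a judgment the count cannot capture.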
