This repository has been archived by the owner on May 17, 2024. It is now read-only.

Make data-diff faster when there are lots of differences #96

Closed
sirupsen opened this issue Jun 22, 2022 · 2 comments
Labels
performance, stale

Comments

@sirupsen
Contributor

Today, one of the caveats of data-diff is that it is significantly slower when there are a lot of differences, because we end up checksumming many segments repeatedly as we try to locate the differences. I'm not exactly sure what the best solution is, but it likely entails a threshold: once earlier segments show enough differences, we increase the --bisection-threshold.
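To make the idea concrete, here is a minimal sketch of what such a heuristic might look like. It is not data-diff's implementation: the class `AdaptiveThreshold`, its fields, and all default values are hypothetical and chosen only for illustration. It counts how many recently checksummed segments mismatched and raises the bisection threshold when the mismatch ratio crosses a trigger.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThreshold:
    """Hypothetical helper, not part of data-diff's API."""

    threshold: int = 16384          # current bisection threshold in rows (assumed start value)
    max_threshold: int = 1_048_576  # assumed upper bound so the threshold cannot grow forever
    trigger_ratio: float = 0.5      # assumed share of mismatching segments that triggers growth
    growth_factor: int = 4          # assumed multiplier applied on each increase
    window: int = 8                 # number of segments observed before deciding
    checked: int = 0
    mismatched: int = 0

    def record(self, segment_differs: bool) -> None:
        """Record one checksum comparison and adjust the threshold once per window."""
        self.checked += 1
        self.mismatched += int(segment_differs)
        if self.checked < self.window:
            return
        # Enough samples: raise the threshold if most segments differed.
        if self.mismatched / self.checked >= self.trigger_ratio:
            self.threshold = min(self.threshold * self.growth_factor,
                                 self.max_threshold)
        # Start a fresh window either way.
        self.checked = 0
        self.mismatched = 0


if __name__ == "__main__":
    adaptive = AdaptiveThreshold()
    for _ in range(8):
        adaptive.record(segment_differs=True)
    print(adaptive.threshold)
```

With a larger threshold, a segment would be downloaded and compared locally sooner instead of being split and checksummed again, which is the trade-off the issue is pointing at. The right trigger ratio and growth factor would need to be measured, not guessed.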

@github-actions
Contributor

github-actions bot commented Jun 2, 2023

This issue has been marked as stale because it has been open for 60 days with no activity. If you would like the issue to remain open, please comment on the issue and it will be added to the triage queue. Otherwise, it will be closed in 7 days.

github-actions bot added the stale label Jun 2, 2023
@github-actions
Contributor

github-actions bot commented Jun 9, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment and it will be reopened for triage.

github-actions bot closed this as not planned Jun 9, 2023