Fuse the functionality used in both _merge_histogram and the newly created _assimilate_histogram
#838
Labels
contribution_day
Medium Priority: Significant improvement or bug / feature reducing overall performance
Refactor: Code that is being modified to improve the library
Is your feature request related to a problem? Please describe.
The goal is a clear paradigm with one easy-to-understand path for each of the following profiling tasks: updating, getting, and merging.
This issue focuses on merging: defining a single function path for merging a profile (or parts of a profile).
The problem this issue addresses is the use of both `_merge_histogram` and the newly created `_assimilate_histogram`, as well as other merging processes within the dataprofiler that repeat functionality or have overlapping goals for input and output.
An example of a fix for achieving this paradigm is as follows:
With the creation of `_assimilate_histogram` we have implemented a better, more memory-efficient way to combine the information from two histograms, and we should be able to use that function throughout the code while still providing the functionality previously expected of `_merge_histogram`. The old way of doing this can be seen in `numerical_column_stats.py` on line 1286. It recreates the histogram data, which is more memory intensive than the approach taken in `_assimilate_histogram`.
Describe the outcome you'd like:
I would like a single path for merging profiles and their information that covers every use case of the currently existing functions.
Additional context:
For details on `_assimilate_histogram`, see PR #815, which implements the more memory-optimized solution.
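To make the memory argument concrete, here is a minimal sketch of the two general strategies. It is not the dataprofiler code: the function names `merge_by_regenerating` and `merge_by_redistributing` are hypothetical, and it assumes integer bin counts, strictly increasing bin edges, and values spread uniformly within each bin. The first function rebuilds approximate samples before re-binning, so its memory use grows with the number of observations. The second moves each bin's count into the new bins in proportion to their overlap, so its memory use grows only with the number of bins.

```python
import numpy as np


def merge_by_regenerating(counts_a, edges_a, counts_b, edges_b, new_edges):
    """Hypothetical 'recreate the data' merge: rebuild samples, then re-bin."""
    samples = []
    for counts, edges in ((counts_a, edges_a), (counts_b, edges_b)):
        edges = np.asarray(edges, dtype=float)
        centers = (edges[:-1] + edges[1:]) / 2
        # One array element per original observation, so memory scales
        # with the sample count rather than with the number of bins.
        samples.append(np.repeat(centers, np.asarray(counts, dtype=int)))
    merged, _ = np.histogram(np.concatenate(samples), bins=new_edges)
    return merged


def merge_by_redistributing(counts_a, edges_a, counts_b, edges_b, new_edges):
    """Hypothetical 'assimilate' style merge: move counts between bins directly."""
    new_edges = np.asarray(new_edges, dtype=float)
    merged = np.zeros(len(new_edges) - 1, dtype=float)
    for counts, edges in ((counts_a, edges_a), (counts_b, edges_b)):
        edges = np.asarray(edges, dtype=float)
        for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
            # Length of the overlap between source bin [lo, hi] and each new bin.
            overlap = np.minimum(hi, new_edges[1:]) - np.maximum(lo, new_edges[:-1])
            overlap = np.clip(overlap, 0.0, None)
            # Assumes values are uniform within the source bin (hi > lo).
            merged += count * overlap / (hi - lo)
    return merged


# Two small example histograms and a target binning that splits one source bin.
counts_a, edges_a = np.array([2, 4]), np.array([0.0, 1.0, 2.0])
counts_b, edges_b = np.array([6]), np.array([2.0, 4.0])
aligned_edges = np.array([0.0, 2.0, 4.0])
split_edges = np.array([0.0, 1.5, 4.0])

regenerated = merge_by_regenerating(counts_a, edges_a, counts_b, edges_b, aligned_edges)
redistributed = merge_by_redistributing(counts_a, edges_a, counts_b, edges_b, aligned_edges)
redistributed_split = merge_by_redistributing(counts_a, edges_a, counts_b, edges_b, split_edges)
```

When the new edges line up with the old ones, both strategies agree. When a source bin straddles a new edge, the redistributing version splits its count fractionally, which is a design choice of this sketch and may differ from how `_assimilate_histogram` handles that case.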