-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Categorical Stop Condition Options #808
Categorical Stop Condition Options #808
Conversation
…stop_condition to CategoricalOptions
max_sample_size_to_check_stop_condition: int = None, | ||
stop_condition_unique_value_ratio: float = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typing should have the | None
max_sample_size_to_check_stop_condition: int | None = None
same for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed
@@ -11,15 +11,66 @@ class TestCategoricalOptions(TestBaseInspectorOptions): | |||
def test_init(self): | |||
option = self.get_options() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another PR, but this is naming I should have caught before:
get_options
-> set_and_retrieve_option_values
or something. Needs to be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_options
is confusing when we set a value in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -32,6 +83,8 @@ def test_set(self): | |||
params_to_check = [ | |||
dict(prop="is_enabled", value_list=[False, True]), | |||
dict(prop="top_k_categories", value_list=[None, 3]), | |||
dict(prop="max_sample_size_to_check_stop_condition", value_list=[None, 20]), | |||
dict(prop="stop_condition_unique_value_ratio", value_list=[None, 2]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i suggest adding a float here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
[expected_max_sample_size_type_error], options._validate_helper() | ||
) | ||
|
||
# These ones should throw a max_sample_size_type_error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment should be for unique value I think
options = self.get_options( | ||
max_sample_size_to_check_stop_condition=0, | ||
stop_condition_unique_value_ratio=0.0, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the condition we discussed before, it might be good to make one of these a value. and one 0 to test that condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could do this as
stop_condition_unique_value_ratio=0.5,
and in check above
set stop_condition_unique_value_ratio=0.0
so we don't need more tests to do the same checks
if self.stop_condition_unique_value_ratio is not None and ( | ||
not isinstance(self.stop_condition_unique_value_ratio, float) | ||
or self.stop_condition_unique_value_ratio < 0 | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is a ratio we need to: 0 <= stop_condition_unique_value_ratio <= 1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I knew it didn't makes sense for a hard core < 0
dataprofiler/tests/profilers/profiler_options/test_categorical_options.py
Show resolved
Hide resolved
@@ -32,6 +83,8 @@ def test_set(self): | |||
params_to_check = [ | |||
dict(prop="is_enabled", value_list=[False, True]), | |||
dict(prop="top_k_categories", value_list=[None, 3]), | |||
dict(prop="max_sample_size_to_check_stop_condition", value_list=[None, 20]), | |||
dict(prop="stop_condition_unique_value_ratio", value_list=[None, 2]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
8c0cf87
into
capitalone:feature/memory-optimization
…stop_condition to CategoricalOptions (#808)
* [WIP] Part 1 fix for categorical mem opt issue (#795) * part_1 of fix for mem optimization for categoical dict creation issue * precommit fix * Separated the update from the check in stop conditions for categoical columns * added tests and accounted for different varaibles affected by the change made to categories attribute * Modifications to code based on test findings * Fixes for logic and tests to match requirements from PR * Fix for rebase carry over issue * fixes for tests because of changes to variable names in categorical column object * precommit fixes and improvement of code based on testing * added stop_condition_unique_value_ratio and max_sample_size_to_check_stop_condition to CategoricalOptions (#808) * implementation of setting stop conds via options for cat column profiler (#810) * Space time analysis improvement (#809) * Made space time analysis code improvements (detect if dataset is already generated, specify cats to generate) * Modified md file to account for new variable in space time analysis code * fix: cat bug (#816) * hotfix for more conservatitive stop condition in categorical columns (#817) * [WIP] Fix for histogram merging (#815) * rough draft of merge fix for histograms * final fixes for passing of existing tests * Added option to remove calculations for updating row statistics (#827) * Fix to doc strings (#829) * Preset Option Fix: presets docsstring added (#830) * presets docsstring added * Update dataprofiler/profilers/profiler_options.py * Update dataprofiler/profilers/profiler_options.py Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * Update dataprofiler/profilers/profiler_options.py * Update dataprofiler/profilers/profiler_options.py --------- Co-authored-by: Taylor Turner <taylorfturner@gmail.com> --------- Co-authored-by: ksneab7 <91956551+ksneab7@users.noreply.github.com> Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> Co-authored-by: JGSweets <JGSweets@users.noreply.github.com>
No description provided.