Added option to remove calculations for updating row statistics #827

@ksneab7 ksneab7 commented May 19, 2023

No description provided.

@ksneab7 ksneab7 force-pushed the create_new_options_for_unique_calcs branch from 5a7febb to ed25e65 May 19, 2023 13:28
@taylorfturner taylorfturner enabled auto-merge (squash) May 19, 2023 13:57
@taylorfturner taylorfturner added the High Priority (dramatic improvement, inaccurate calculation(s), or bug / feature making the library unusable) and New Feature (a feature addition not currently in the library) labels May 19, 2023
@taylorfturner taylorfturner merged commit 372365a into capitalone:feature/memory-optimization May 19, 2023
5 checks passed
JGSweets added a commit that referenced this pull request May 23, 2023
* [WIP] Part 1 fix for categorical mem opt issue (#795)

* part_1 of fix for mem optimization for categorical dict creation issue

* precommit fix

* Separated the update from the check in stop conditions for categorical columns

* added tests and accounted for different variables affected by the change made to categories attribute

* Modifications to code based on test findings

* Fixes for logic and tests to match requirements from PR

* Fix for rebase carry over issue

* fixes for tests because of changes to variable names in categorical column object

* precommit fixes and improvement of code based on testing

* added stop_condition_unique_value_ratio and max_sample_size_to_check_stop_condition to CategoricalOptions (#808)

* implementation of setting stop conds via options for cat column profiler (#810)

* Space time analysis improvement (#809)

* Made space time analysis code improvements (detect if dataset is already generated, specify cats to generate)

* Modified md file to account for new variable in space time analysis code

* fix: cat bug (#816)

* hotfix for more conservative stop condition in categorical columns (#817)

* [WIP] Fix for histogram merging (#815)

* rough draft of merge fix for histograms

* final fixes for passing of existing tests

* Added option to remove calculations for updating row statistics (#827)

* Fix to doc strings (#829)

* Preset Option Fix: presets docstring added (#830)

* presets docstring added

* Update dataprofiler/profilers/profiler_options.py

* Update dataprofiler/profilers/profiler_options.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* Update dataprofiler/profilers/profiler_options.py

* Update dataprofiler/profilers/profiler_options.py

---------

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

---------

Co-authored-by: ksneab7 <91956551+ksneab7@users.noreply.github.com>
Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>
Co-authored-by: JGSweets <JGSweets@users.noreply.github.com>
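
The commit message above names two options, `stop_condition_unique_value_ratio` and `max_sample_size_to_check_stop_condition`, and mentions separating the stop-condition check from the update. The page does not show how they are implemented, so the following is only a minimal sketch of one plausible reading: stop tracking categories once the share of unique values reaches the ratio, and only evaluate that while the sample is still below the size limit. The class `CategoricalStopSketch`, its attributes, and the exact comparison rules are assumptions for illustration, not DataProfiler's code.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class CategoricalStopSketch:
    """Hypothetical illustration only; not DataProfiler's implementation."""

    # Option names come from the commit message; the semantics are assumed.
    stop_condition_unique_value_ratio: Optional[float] = None
    max_sample_size_to_check_stop_condition: Optional[int] = None

    sample_size: int = 0
    stopped: bool = False
    categories: Set[str] = field(default_factory=set)

    def stop_condition_met(self) -> bool:
        """Pure check: reads state and never modifies it."""
        ratio_limit = self.stop_condition_unique_value_ratio
        size_limit = self.max_sample_size_to_check_stop_condition
        # Assumption: the condition only applies when both options are set.
        if ratio_limit is None or size_limit is None or self.sample_size == 0:
            return False
        # Assumption: past the size limit the check is no longer evaluated.
        if self.sample_size > size_limit:
            return False
        return len(self.categories) / self.sample_size >= ratio_limit

    def update(self, batch: Iterable[str]) -> None:
        """Update step: counts rows, tracks categories, then applies the check."""
        values = list(batch)
        self.sample_size += len(values)
        if self.stopped:
            return  # categories are no longer tracked once stopped
        self.categories.update(values)
        if self.stop_condition_met():
            self.stopped = True
            self.categories.clear()  # release the memory held by the set


high_cardinality = CategoricalStopSketch(0.5, 100)
high_cardinality.update(["a", "b", "c", "d"])  # every value is unique

low_cardinality = CategoricalStopSketch(0.5, 100)
low_cardinality.update(["a"] * 9 + ["b"])  # two unique values in ten rows
```

Keeping `stop_condition_met` free of side effects means it can be tested or called on its own, while `update` remains the only place that mutates state.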
3 participants