Fractional split duplication bug #24
Merged
xandaau merged 3 commits into MobileTeleSystems:dev from xandaau:fractional_split_duplication_bug on Feb 2, 2023
Codecov Report
@@            Coverage Diff             @@
##              dev      #24      +/-   ##
==========================================
- Coverage   84.23%   84.20%   -0.03%
==========================================
  Files          42       42
  Lines        2988     2995       +7
==========================================
+ Hits         2517     2522       +5
- Misses        471      473       +2
xandaau added a commit that referenced this pull request on Feb 15, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in `scipy` way, notebooks updated (#31)
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* 0.3.0 version descr added to changelog
---------
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
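The commit list above includes "Added check for id uniqueness in splitter" for #24. As a rough illustration only, and not the project's actual code, a fractional split could guard against duplicated ids along the following lines. The names `find_duplicated_ids` and `fractional_split` are hypothetical, and plain Python sequences stand in for the dataframe id column that the commit messages refer to. The motivation assumed here is that a duplicated id could otherwise end up in both groups of the split.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def find_duplicated_ids(ids: Sequence[Hashable]) -> List[Hashable]:
    """Return every id that occurs more than once, in first-seen order."""
    counts = Counter(ids)
    return [obj_id for obj_id, n in counts.items() if n > 1]


def fractional_split(
    ids: Sequence[Hashable], fraction: float, seed: int = 0
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split ids into two disjoint groups; the first gets about `fraction` of them.

    Hypothetical helper: it refuses to split when ids are not unique,
    because a repeated id could otherwise be assigned to both groups.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")

    duplicated = find_duplicated_ids(ids)
    if duplicated:
        raise ValueError(f"ids must be unique, duplicated values: {duplicated}")

    # Shuffle a copy with a seeded generator so the split is reproducible.
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)

    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]
```

Calling `fractional_split(list(range(10)), 0.5)` returns two disjoint lists of five ids each, while `fractional_split([1, 1, 2], 0.5)` raises `ValueError` instead of silently splitting.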
xandaau added a commit that referenced this pull request on Apr 13, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* Not executable Preprocessor data agg bug fixed
---------
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
xandaau added a commit that referenced this pull request on Apr 13, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* Regression tests for duplicated ids for split
---------
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
xandaau added a commit that referenced this pull request on Apr 13, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* Methods for convenient dumping added
* Tests added for classes load and dump to yaml abilities
* Docstring unified
* linted
---------
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
xandaau added a commit that referenced this pull request on Apr 13, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* Theoretical tools params order is fixed and parameters names are corrected
* Decorators for parallelism added
* Glabal namings for design result added
* Binary result tables view modified
* Parallelism parameters handling added
* Changed bootstarap parallel strategy and other parameters
* Changed parallellism, added group_ratio parameter, unified views of resulted tables
* Func name corrected
* Helpers modified to handle ``groups_size`` parameter
* Designer class and func modified to handle all changes made Tests corrected
* Fixed broken import
* Bootstrap params modified, utility func moved to back tools
* as_numeric parameter invokation changed
* Binary approach output tables unified
* Multiprocessing handling wrap func changed
---------
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
xandaau added a commit that referenced this pull request on Apr 16, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in `scipy` way, notebooks updated (#31)
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* 0.3.0 version descr added to changelog
* optional spark pkg
* fix Make for extras
* remove lock
* Linted
---------
Co-authored-by: Хакимов Артем Валерьевич <artem.khakimov@gmail.com>
Co-authored-by: Artem Khakimov <83234014+xandaau@users.noreply.github.com>
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
xandaau added a commit that referenced this pull request on Apr 17, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in `scipy` way, notebooks updated (#31)
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* 0.3.0 version descr added to changelog
* optional spark pkg
* fix Make for extras
* remove lock
* spark tester
* Relative effect for ttest_ind spark
* Swap Spark Ttest groups for correct pvalue calc Remove errors in merge
* Decrease effect in test
* Back tools import fixed
---------
Co-authored-by: Хакимов Артем Валерьевич <artem.khakimov@gmail.com>
Co-authored-by: Artem Khakimov <83234014+xandaau@users.noreply.github.com>
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
xandaau added a commit that referenced this pull request on Apr 21, 2023
* tests actions on pull request added, test badge updated
* Python version dependency limited to be <3.11
* Preprocessing enhancement (#22)
* Preprocessing enhancement first iteration
* check_cols call corrected
* Robust classes improved, tests added
* cuped structure improved, tests namings improved
* Preprocessing usage example updated
* MLVR class save load dict methods implemented
* chain smoke test for preprocessor added
* fix test
* chain smoke test for preprocessor added
* files linted
* Docstrings improved
* code decomposition
* docstring and typing added
* Old preprocessor is removed
* docstrings improved
* Temp class for logging added
* docs parsing for preprocessing classes added and updated
* .rst file modified
* version updated
* Robust Preprocessors and Metric Transformers separation (#23)
* Fractional split duplication bug (#24)
* Added check for id uniqueness in splitter
* Dataframe id_column access fix
* duplicated function removed
* Fittable aggregate preprocessor (#25)
* _check_columns method renamed
* Fittable aggregatepreprocessor
* aggregation tests and class improvement
* fix binary theory with groups_ratio/alternative/stabilizing
* Update README.rst Telegram channel link added
* add docstrings and project dependencies
* fix poetry version
* fix imports [isort] && fix tests
* fix linters[black]
* Docstrings improved, method check added
* fix bug forgot first error for binary design
* Multiple alpha power design for binary intervals methods
* Incorrect variables types changed
* Storable Preprocessor and VarReduction classes changes (#29)
* Storable Preprocessor and VarReduction classes changes
* Unused dict with preproc classes removed
* Alternatives namings changed in scipy way, notebooks updated
* Imports fixed
* Docs and tutorials changed, part I
* another doc and usage examples update
* First errors naming unified to Designer
* Deterministic bootstrap behaviour added
* Tester CI MHC change denied
* Configs added
* Agg watched data dumped
* docs .rst files updated
* .nblinks to usage examples added
* power func output corrected
* Test for groups ratio and alternatives added
* README is updated
* Usage examples updated
* CHANGELOG updated
* update
---------
Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru>
Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com>
Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
Ttest groups for correct pvalue calc Remove errors in merge * Decrease effect in test * Back tools import fixed --------- Co-authored-by: Хакимов Артем Валерьевич <artem.khakimov@gmail.com> Co-authored-by: Artem Khakimov <83234014+xandaau@users.noreply.github.com> Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru> Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com> Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
xandaau
added a commit
that referenced
this pull request
Apr 21, 2023
* tests actions on pull request added, test badge updated * Python version dependency limited to be <3.11 * Preprocessing enhancement (#22) * Preprocessing enhancement first iteration * check_cols call corrected * Robust classes improved, tests added * cuped structure improved, tests namings improved * Preprocessing usage example updated * MLVR class save load dict methods implemented * chain smoke test for preprocessor added * fix test * chain smoke test for preprocessor added * files linted * Docstrings improved * code decomposition * docstring and typing added * Old preprocessor is removed * docstrings improved * Temp class for logging added * docs parsing for preprocessing classes added and updated * .rst file modified * version updated * Robust Preprocessors and Metric Transformers separation (#23) * Fractional split duplication bug (#24) * Added check for id uniqueness in splitter * Dataframe id_column access fix * duplicated function removed * Fittable aggregate preprocessor (#25) * _check_columns method renamed * Fittable aggregatepreprocessor * aggregation tests and class improvement * fix binary theory with groups_ratio/alternative/stabilizing * Update README.rst Telegram channel link added * add docstrings and project dependencies * fix poetry version * fix imports [isort] && fix tests * fix linters[black] * Docstrings improved, method check added * fix bug forgot first error for binary design * Multiple alpha power design for binary intervals methods * Incorrect variables types changed * Storable Preprocessor and VarReduction classes changes (#29) * Storable Preprocessor and VarReduction classes changes * Unused dict with preproc classes removed * Alternatives namings changed in scipy way, notebooks updated * Imports fixed * Docs and tutorials changed, part I * another doc and usage examples update * First errors naming unified to Designer * Deterministic bootstrap behaviour added * Tester CI MHC change denied * Configs added * Agg watched data dumped * 
docs .rst files updated * .nblinks to usage examples added * power func output corrected * Test for groups ratio and alternatives added * README is updated * Usage examples updated * CHANGELOG updated * update --------- Co-authored-by: Байрамкулов Аслан Магомедович <ambajramku@www.mts.ru> Co-authored-by: Aslan Bayramkulov <aslan.bayramkulov96@gmail.com> Co-authored-by: Artem Vasin <a.vasin@ira-labs.com>
This bug was described in #10.
`Splitter` now checks for duplicates in `id_column` on every run and raises an error if any are found.
This seems logically correct: in almost all group-splitting problems, we want a unique id for each object (row).
In special cases, one can create an additional column with a unique id, which is much simpler than handling issues with hashing, index search, etc. inside the `Splitter` methods.
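For illustration, the kind of check described above can be sketched with pandas as follows. This is a minimal sketch, not the code merged in this PR: `check_unique_ids` and `add_row_id` are hypothetical helper names, and the assumption is that the splitter receives a pandas DataFrame and the name of its id column.

```python
import pandas as pd


def check_unique_ids(dataframe: pd.DataFrame, id_column: str) -> None:
    """Raise ValueError if id_column holds repeated values (hypothetical helper)."""
    # keep=False marks every occurrence of a repeated id, not only the later ones
    duplicated = dataframe[id_column].duplicated(keep=False)
    if duplicated.any():
        examples = dataframe.loc[duplicated, id_column].unique()[:5].tolist()
        raise ValueError(
            f"Column '{id_column}' contains duplicated ids, e.g. {examples}"
        )


def add_row_id(dataframe: pd.DataFrame, new_column: str = "row_id") -> pd.DataFrame:
    """Return a copy with a fresh unique id per row (hypothetical helper)."""
    # reset_index returns a new DataFrame, so the caller's frame is left untouched
    result = dataframe.reset_index(drop=True)
    result[new_column] = range(len(result))
    return result
```

For the special cases mentioned above, a caller whose data has several rows per object could run `add_row_id(df)` first and pass the new column as the id column, instead of relying on the original non-unique one.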