1059 timestamp bug #1065

IIaKyJIuH · 2023-03-13T09:37:59Z

Comparison of different approaches:

IIaKyJIuH · 2023-03-13T09:39:36Z

The NumPy strategy assumes that the time 'unit' is nanoseconds. My solution assumes the same, so the existing behavior is preserved.

The question is: is it reasonable to use this unit at all, or should it be at least milliseconds, or simply seconds?
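
To make the unit question concrete, here is a minimal NumPy sketch (not code from this PR; the sample timestamps are made up). It expresses the same two instants as integer counts since the Unix epoch in nanoseconds, milliseconds and seconds.

```python
import numpy as np

# Two hypothetical timestamps exactly one millisecond apart.
stamps = np.array(["2023-03-13T09:37:59.000", "2023-03-13T09:37:59.001"],
                  dtype="datetime64[ns]")

# The same instants as integer counts since the Unix epoch in three units.
as_ns = stamps.astype(np.int64)
as_ms = stamps.astype("datetime64[ms]").astype(np.int64)
as_s = stamps.astype("datetime64[s]").astype(np.int64)

print(as_ns[1] - as_ns[0])  # 1000000
print(as_ms[1] - as_ms[0])  # 1
print(as_s[1] - as_s[0])    # 0, the sub-second difference is truncated away

# Present-day nanosecond counts are about 1.7e18, which is above 2**53,
# so a float64 cannot hold every nanosecond value exactly.
print(float(as_ns[0] + 1) == float(as_ns[0]))  # True, the extra nanosecond is lost
```

Coarser units drop sub-unit detail, while nanosecond counts are too large to survive a later cast to float64 unchanged. Which trade-off is acceptable depends on the data.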

nicl-nno · 2023-03-13T10:07:38Z

Could you add a test that reproduces the problem right away?

MorrisNein · 2023-03-13T10:15:23Z

@IIaKyJIuH, I think the time resolution should be chosen based on the minimum step between values in the data.

Possibly with a margin of one decimal place for new data. The preprocessing will be able to remember the chosen conversion, won't it?

If that is hard to implement right now, within this PR we can simply take the smallest step (ns) and not worry about it. It is not economical in terms of memory, though, so we can open a separate issue for adaptive conversion.
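
One possible shape of such an adaptive choice, as a rough sketch only: instead of measuring the minimum step directly, it picks the coarsest unit in which every value is a whole number, which also covers values that are evenly spaced but offset by a fraction of the unit. The function name `pick_unit` and the candidate list are hypothetical, and the input is assumed to contain no NaT values.

```python
import numpy as np

# Candidate units from coarsest to finest, with their length in nanoseconds.
UNITS_NS = [("s", 1_000_000_000), ("ms", 1_000_000), ("us", 1_000), ("ns", 1)]


def pick_unit(values: np.ndarray) -> str:
    """Return the coarsest unit that represents every value without loss."""
    ticks = values.astype("datetime64[ns]").astype(np.int64)
    for unit, size in UNITS_NS:
        if np.all(ticks % size == 0):
            return unit
    return "ns"


whole_seconds = np.array(["2023-03-13T09:37:59", "2023-03-13T09:39:36"],
                         dtype="datetime64[ns]")
with_millis = np.array(["2023-03-13T09:37:59.250"], dtype="datetime64[ns]")

print(pick_unit(whole_seconds))  # s
print(pick_unit(with_millis))    # ms
```

A preprocessor could store the returned unit at fit time and reuse it at predict time via `values.astype(f"datetime64[{unit}]").astype(np.int64)`. Adding the one-decimal-place margin mentioned above would mean stepping one entry further down the candidate list.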

fedot/api/api_utils/data_definition.py

IIaKyJIuH · 2023-03-13T12:20:20Z

Could you add a test that reproduces the problem right away?

Tests roughly like these suggest themselves. I'll add more cases; I need to think about it.

Maybe the tests should go into integration?
Should NumpyStrategy also check for the case where pd.Timestamp objects end up in an np.ndarray, or is that a completely unrealistic case? NumpyStrategy does not handle this case, by the way.
Perhaps the preprocessor already handles it; I'll need to check.
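
The case in question can be pictured with a small sketch. To stay dependency-free it uses `datetime.datetime` objects as a stand-in for `pd.Timestamp` (which is a subclass of `datetime.datetime`); that substitution and the helper `holds_datetimes` are assumptions for illustration, not code from this PR.

```python
import datetime as dt

import numpy as np

# An object-dtype array holding datetime objects (stand-ins for pd.Timestamp).
column = np.array([dt.datetime(2023, 3, 13, 9, 37, 59),
                   dt.datetime(2023, 3, 13, 9, 39, 36)], dtype=object)

print(column.dtype)                                # object
print(np.issubdtype(column.dtype, np.datetime64))  # False, a dtype check alone misses it


def holds_datetimes(arr: np.ndarray) -> bool:
    """Hypothetical helper: True for datetime64 arrays and object arrays of datetimes."""
    if np.issubdtype(arr.dtype, np.datetime64):
        return True
    return arr.dtype == object and all(isinstance(v, dt.datetime) for v in arr.ravel())


# Once detected, the object array can be cast to datetime64 and then to integers.
as_ms = column.astype("datetime64[ms]").astype(np.int64)
print(as_ms[1] - as_ms[0])  # 97000
```

The point of the sketch is that such an array has dtype `object`, so a check based only on `np.datetime64` dtypes would let it through unconverted.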

nicl-nno · 2023-03-13T19:55:44Z

Maybe the tests should go into integration?

I think they are better kept as unit tests. They can be run without evolution, using a predefined model.

test/unit/data_operations/test_data_definition.py

IIaKyJIuH · 2023-03-16T08:24:27Z

@MorrisNein made me realize that the whole approach to handling datetime features needs to change.
Leaving them as dates is bad, but converting them to float is also uninformative and seems more complicated, especially for ts tasks.

I see value in splitting a date into components: hours, days, months, years...
The question is whether it makes sense to add some API key for handling datetime features. The key would specify the encoding type: plain float, component-wise, or something else. Or is it always better to split them into components and then encode those categorically?

aPovidlo · 2023-03-16T12:37:50Z

@MorrisNein made me realize that the whole approach to handling datetime features needs to change. Leaving them as dates is bad, but converting them to float is also uninformative and seems more complicated, especially for ts tasks.

Maybe the data should be preprocessed differently for different tasks?

I see value in splitting a date into components: hours, days, months, years... The question is whether it makes sense to add some API key for handling datetime features. The key would specify the encoding type: plain float, component-wise, or something else. Or is it always better to split them into components and then encode those categorically?

Component-wise splitting is a perfectly reasonable step, but why categorically? That could hurt model performance. Models find it easier to work with numbers than with categories (unless it's CatBoost).
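
As an illustration of the component-wise idea, here is a minimal NumPy sketch (not code from this PR; the sample timestamps are made up) that splits datetime64 values into year, month, day and hour, kept as plain integer columns rather than categories.

```python
import numpy as np

stamps = np.array(["2023-03-13T09:37:59", "2023-12-31T23:05:00"],
                  dtype="datetime64[s]")

# Truncate to the start of each day and month to measure offsets from them.
day_start = stamps.astype("datetime64[D]")
month_start = stamps.astype("datetime64[M]").astype("datetime64[D]")

years = stamps.astype("datetime64[Y]").astype(np.int64) + 1970   # years since 1970
months = stamps.astype("datetime64[M]").astype(np.int64) % 12 + 1  # months since 1970-01
days = (day_start - month_start).astype(np.int64) + 1
hours = (stamps.astype("datetime64[h]") - day_start.astype("datetime64[h]")).astype(np.int64)

# One integer column per component.
components = np.column_stack([years, months, days, hours])
print(components.tolist())  # [[2023, 3, 13, 9], [2023, 12, 31, 23]]
```

Whether these columns are then fed to a model as numbers or encoded categorically is exactly the open question in this thread; the sketch only shows the split itself.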

aim-pep8-bot · 2023-03-22T06:15:11Z

Hello @IIaKyJIuH! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2023-04-24 09:00:35 UTC

codecov · 2023-04-05T13:51:22Z

Codecov Report

Attention: Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.79%. Comparing base (83a395a) to head (a5cb31c).
Report is 115 commits behind head on master.

Files with missing lines	Patch %	Lines
fedot/api/api_utils/data_definition.py	75.00%	1 Missing ⚠️
fedot/core/data/data.py	95.23%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1065      +/-   ##
==========================================
- Coverage   87.81%   87.79%   -0.03%     
==========================================
  Files         132      132              
  Lines        9355     9372      +17     
==========================================
+ Hits         8215     8228      +13     
- Misses       1140     1144       +4


IIaKyJIuH · 2023-04-11T13:30:41Z

I ran quality tests on the problematic dataset from the issue, using catboost as the model. In 2 out of 3 cases nanoseconds match milliseconds in quality (screenshot):

Why that is, I don't know. In the remaining cases the quality is identical across all models. So milliseconds can be used; it won't be any worse.

aPovidlo · 2023-04-16T09:11:39Z

Why that is, I don't know. In the remaining cases the quality is identical across all models. So milliseconds can be used; it won't be any worse.

My guess is that this may be related to the seed being fixed in RandomStateHandler for CatBoost, or to the same model being pulled from the cache.

IIaKyJIuH · 2023-04-17T10:30:44Z

Why that is, I don't know. In the remaining cases the quality is identical across all models. So milliseconds can be used; it won't be any worse.

My guess is that this may be related to the seed being fixed in RandomStateHandler for CatBoost, or to the same model being pulled from the cache.

I pass a specific seed, i.e. it is set reproducibly and is always the same.
And I disabled the cache entirely, so it's trickier than that :)

IIaKyJIuH · 2023-04-20T14:53:27Z

I was about to merge the branch when I stopped to think and realized that my current changes do not honor the contract for use_input_preprocessing.
Passing it deep into the DataDefiners looks inelegant (I already tried). This conversion needs to go into the "obligatory preprocessing", which is tied to exactly that parameter.
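
A rough sketch of that design, under stated assumptions: `TableData`, `SketchPreprocessor`, `obligatory_prepare` and `datetime_to_ms` are hypothetical names invented for this illustration and do not reproduce FEDOT's actual classes. The idea shown is only that the flag is checked in one place, inside the preprocessing step, rather than being threaded through the data definers.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class TableData:
    """Hypothetical stand-in for the framework's data container."""
    features: np.ndarray
    target: Optional[np.ndarray] = None


def datetime_to_ms(arr: np.ndarray) -> np.ndarray:
    """Turn a datetime64 array into int64 milliseconds; leave other arrays untouched."""
    if np.issubdtype(arr.dtype, np.datetime64):
        return arr.astype("datetime64[ms]").astype(np.int64)
    return arr


class SketchPreprocessor:
    """Hypothetical preprocessor that owns the use_input_preprocessing switch."""

    def __init__(self, use_input_preprocessing: bool = True):
        self.use_input_preprocessing = use_input_preprocessing

    def obligatory_prepare(self, data: TableData) -> TableData:
        # The flag is checked here only, so nothing upstream needs to know about it.
        if not self.use_input_preprocessing:
            return data
        data.features = datetime_to_ms(data.features)
        if data.target is not None:
            data.target = datetime_to_ms(data.target)
        return data


stamps = np.array(["2023-04-20T14:53:27"], dtype="datetime64[ns]")
converted = SketchPreprocessor(True).obligatory_prepare(TableData(features=stamps.copy()))
untouched = SketchPreprocessor(False).obligatory_prepare(TableData(features=stamps.copy()))
print(converted.features.dtype, untouched.features.dtype)
```

With the switch off, the data passes through unchanged, which is what a user who disabled input preprocessing would expect.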

aPovidlo · 2023-04-21T14:47:36Z

fedot/preprocessing/preprocessing.py

+        # Convert datetime data to numerical
+        data.features = np_datetime_to_numeric(data.features)
+        if data.target is not None:
+            data.target = np_datetime_to_numeric(data.target)  # TODO: Is it necessary to convert the target from a date to a number?


Doesn't this question belong to this PR?

The question does belong to this PR.
A case with dates in the target seems rare, but it is possible, isn't it?
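
If a date does appear in the target, converting it to a number only helps when predictions can be mapped back. A minimal round-trip sketch, assuming millisecond resolution and using a placeholder in place of real model output (none of this is code from the PR):

```python
import numpy as np

# Hypothetical date-valued target.
target = np.array(["2023-04-21T14:47:36", "2023-04-24T10:19:00"],
                  dtype="datetime64[ms]")

# Forward: datetime64 -> integer milliseconds since the Unix epoch.
numeric_target = target.astype(np.int64)

# Placeholder for model output: a real model would return floats close to these values.
predictions = numeric_target.astype(np.float64)

# Backward: round to whole milliseconds and reinterpret as datetime64.
restored = np.rint(predictions).astype(np.int64).astype("datetime64[ms]")
print(np.array_equal(restored, target))  # True
```

Present-day millisecond counts are around 1.7e12, well below 2**53, so they pass through float64 without loss; the same round trip in nanoseconds would not be exact.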

test/unit/data_operations/test_text_preprocessing.py

fedot/core/data/data.py


IIaKyJIuH requested a review from aPovidlo March 13, 2023 09:37

IIaKyJIuH linked an issue Mar 13, 2023 that may be closed by this pull request

Bug: Timestamp cause crush in Preprocessing #1059

Closed

MorrisNein self-requested a review March 13, 2023 10:28

aPovidlo reviewed Mar 13, 2023

fedot/api/api_utils/data_definition.py (outdated)

MorrisNein requested changes Mar 14, 2023

test/unit/data_operations/test_data_definition.py (outdated)

MorrisNein reviewed Mar 15, 2023

test/unit/data_operations/test_data_definition.py (outdated)

IIaKyJIuH added 13 commits April 5, 2023 13:09

offer fixes

8dacb6f

add remarkable solution

53103af

come to the most appropriate solution

d6e646c

format variants

132f3f6

+date conversion tests

4c92a10

monkeypatched untestable function of strategies

f43beef

+conversions to int ms

4f9d060

exclude copying idx array if it is already numpy

938bfe2

lint fixes (1)

65b5771

+numpy strategy dates handler

2aeb2da

extend strategies test

8fa77e7

upd numpy's strategy

27149de

upd strategies test

98aba06

IIaKyJIuH force-pushed the 1059-timestamp-bug branch from fcaacef to 98aba06 Compare April 5, 2023 13:34

IIaKyJIuH added 6 commits April 12, 2023 11:35

made separate union testable function

5ce0438

use separate function&extend strategies

579ea3d

simplified datetime conversion

302f876

preserve original shape of data

22b94b5

fix 'field' variable shadowing

87d50ac

place module level imports at the top

15a39b8

IIaKyJIuH requested review from aPovidlo and MorrisNein April 12, 2023 10:56

MorrisNein approved these changes Apr 20, 2023

aPovidlo approved these changes Apr 20, 2023

IIaKyJIuH added 6 commits April 20, 2023 18:10

rid of datetime conversion in definer

4632edd

place datetime conversion in preprocessor

6a304ef

preserve original dtype in datetime converter

9ec154d

update tests

2aa0282

reorganize imports

533e585

clarify output dtype

9d6a8bf

IIaKyJIuH requested review from MorrisNein and aPovidlo April 21, 2023 14:40

aPovidlo approved these changes Apr 21, 2023

MorrisNein approved these changes Apr 21, 2023

test/unit/data_operations/test_text_preprocessing.py

fedot/core/data/data.py (outdated)

IIaKyJIuH added 2 commits April 21, 2023 21:02

simplified datetime division

378b2a0

remove TODO

a5cb31c

IIaKyJIuH merged commit aa39918 into master Apr 24, 2023

IIaKyJIuH deleted the 1059-timestamp-bug branch April 24, 2023 10:19
