Add catboost to default initial assumptions #1081

Merged
MorrisNein merged 4 commits into master from add_catboost_assumption on Apr 9, 2023

Conversation

MorrisNein
Collaborator

@MorrisNein MorrisNein commented Apr 7, 2023

Added CatBoost to FEDOT initial assumptions for classification.

For the sake of stability, the model's loss function is excluded from tuning.

@codecov

codecov bot commented Apr 7, 2023

Codecov Report

Merging #1081 (7322433) into master (8629d56) will decrease coverage by 0.39%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1081      +/-   ##
==========================================
- Coverage   88.93%   88.55%   -0.39%     
==========================================
  Files         132      132              
  Lines        9356     9356              
==========================================
- Hits         8321     8285      -36     
- Misses       1035     1071      +36     
Impacted Files                                          Coverage Δ
...edot/api/api_utils/assumptions/task_assumptions.py   78.94% <ø> (ø)
fedot/core/pipelines/tuning/search_space.py             100.00% <ø> (ø)

... and 6 files with indirect coverage changes


@MorrisNein MorrisNein self-assigned this Apr 7, 2023
@MorrisNein MorrisNein changed the title Add catboost to default initial assumptions for classification Add catboost to default initial assumptions Apr 7, 2023
@MorrisNein
Collaborator Author

MorrisNein commented Apr 7, 2023

After running integration tests, got CatBoostError at test_multi_modal_example.

@nicl-nno, wdyt? Should we give it a further chance?

The full log & traceback:
FAILED      [ 81%]
2023-04-07 22:52:15,256 - AssumptionsHandler - Memory consumption for fitting of the initial pipeline in main session: current 2.5 MiB, max: 16.2 MiB
2023-04-07 22:52:15,258 - ApiComposer - Initial pipeline was fitted in 3.0 sec.
2023-04-07 22:52:15,264 - ApiComposer - AutoML configured. Parameters tuning: False. Time limit: 10 min. Set of candidate models: ['qda', 'pca', 'scaling', 'bernb', 'knn', 'logit', 'dt', 'rf', 'normalization'].
2023-04-07 22:52:15,266 - ApiComposer - Pipeline composition started.
2023-04-07 22:52:50,299 - MultiprocessingDispatcher - 3 individuals out of 3 in previous population were evaluated successfully.
Generations:   0%|          | 1/10000 [01:15<?, ?gen/s]

test\integration\real_applications\test_examples.py:100 (test_multi_modal_example)
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\golem\core\optimisers\genetic\evaluation.py", line 258, in evaluate_single
    eval_res = super().evaluate_single(graph, uid_of_individual, with_time_limit, cache_key, logs_initializer)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\golem\core\optimisers\genetic\evaluation.py", line 168, in evaluate_single
    fitness, graph = adapted_evaluate(graph)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\golem\core\adapter\adapter.py", line 174, in adapted_fun
    result = fun(*adapted_args, **adapted_kwargs)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\golem\core\optimisers\genetic\evaluation.py", line 181, in _evaluate_graph
    fitness = self._objective_eval(domain_graph)
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\optimisers\objective\data_objective_eval.py", line 74, in evaluate
    raise ex
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\optimisers\objective\data_objective_eval.py", line 66, in evaluate
    prepared_pipeline = self.prepare_graph(graph, train_data, fold_id, self._eval_n_jobs)
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\optimisers\objective\data_objective_eval.py", line 112, in prepare_graph
    graph.fit(
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\pipelines\pipeline.py", line 191, in fit
    train_predicted = self._fit(input_data=copied_input_data)
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\pipelines\pipeline.py", line 111, in _fit
    train_predicted = self.root_node.fit(input_data=input_data)
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\pipelines\node.py", line 198, in fit
    input_data = self._get_input_data(input_data=input_data, parent_operation='fit')
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\pipelines\node.py", line 275, in _get_input_data
    input_data = self._input_from_parents(input_data=input_data, parent_operation=parent_operation)
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\pipelines\node.py", line 301, in _input_from_parents
    parent_results, _ = _combine_parents(parent_nodes, input_data,
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\pipelines\node.py", line 393, in _combine_parents
    prediction = parent.fit(input_data=input_data)
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\pipelines\node.py", line 202, in fit
    self.fitted_operation, operation_predict = self.operation.fit(params=self._parameters,
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\operations\operation.py", line 85, in fit
    self.fitted_operation = self._eval_strategy.fit(train_data=data)
  File "C:\Users\petro\PycharmProjects\FEDOT\fedot\core\operations\evaluation\evaluation_interfaces.py", line 234, in fit
    operation_implementation.fit(train_data.features, train_data.target)
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\catboost\core.py", line 5128, in fit
    self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\catboost\core.py", line 2355, in _fit
    self._train(
  File "C:\Users\petro\PycharmProjects\FEDOT\venv\lib\site-packages\catboost\core.py", line 1759, in _train
    self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
  File "_catboost.pyx", line 4623, in _catboost._CatBoost._train
  File "_catboost.pyx", line 4672, in _catboost._CatBoost._train
_catboost.CatBoostError: C:/Program Files (x86)/Go Agent/pipelines/BuildMaster/catboost.git/catboost/libs/metrics/metric.cpp:6381: Max target greater than 1: 4
"""
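
The last line of the traceback suggests that a binary-only loss or metric received a target whose labels go up to 4. As a rough illustration only (this is not CatBoost's code, and `check_binary_target` is a hypothetical helper), a check of that kind might look like this:

```python
def check_binary_target(target):
    """Raise ValueError if any label exceeds 1, as a binary loss would expect 0/1 labels."""
    max_target = max(target)
    if max_target > 1:
        # Message format copied from the traceback above for illustration.
        raise ValueError(f"Max target greater than 1: {max_target}")


# Binary labels pass the check silently.
check_binary_target([0, 1, 1, 0])

# Labels 0..4 (five classes) trigger the same kind of message as in the log.
try:
    check_binary_target([0, 1, 2, 3, 4])
except ValueError as err:
    message = str(err)
```

Under this assumption, any pipeline whose CatBoost node is switched to a binary loss would fail on the multiclass data used in test_multi_modal_example.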

@MorrisNein
Collaborator Author

MorrisNein commented Apr 7, 2023

Googled the likely cause: most probably it is CrossEntropyLoss, which is not suitable for multiclass data.

Our tuning allows this loss function to be assigned to the CatBoost node:

'catboost': {
    'max_depth': (hp.uniformint, [1, 11]),
    'learning_rate': (hp.loguniform, [np.log(0.01), np.log(0.2)]),
    'min_data_in_leaf': (hp.qloguniform, [0, 6, 1]),
    'border_count': (hp.uniformint, [2, 255]),
    'l2_leaf_reg': (hp.loguniform, [np.log(1e-8), np.log(10)]),
    'loss_function': (hp.choice, [['Logloss', 'CrossEntropy']])
},

Not sure whether this can happen through mutation.

Upd. Indeed: the mutation was changing the loss function to a binary one, which caused an error when fitting the node. Fixed by removing the loss_function parameter from the search space.
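
A minimal sketch of the idea behind the fix, assuming a search space shaped like the dictionary quoted in this comment. The helper `drop_parameter` is hypothetical, and plain strings stand in for the `hp.*` functions so that the snippet runs without hyperopt; the actual change in the PR edits fedot/core/pipelines/tuning/search_space.py directly.

```python
def drop_parameter(search_space, operation, parameter):
    """Return a copy of the search space without `parameter` for `operation`."""
    # Copy each per-operation dict so the original search space is left untouched.
    pruned = {name: dict(params) for name, params in search_space.items()}
    pruned.get(operation, {}).pop(parameter, None)
    return pruned


# Shortened stand-in for the real search space; strings replace hp.* callables.
search_space = {
    'catboost': {
        'max_depth': ('uniformint', [1, 11]),
        'border_count': ('uniformint', [2, 255]),
        'loss_function': ('choice', [['Logloss', 'CrossEntropy']]),
    }
}

fixed_space = drop_parameter(search_space, 'catboost', 'loss_function')
```

With `loss_function` absent from the search space, neither the tuner nor a parameter mutation can pick a binary-only loss for the node, so CatBoost keeps whatever loss it selects for the task by default.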

@nicl-nno
Collaborator

nicl-nno commented Apr 7, 2023

"Not sure whether this can happen through mutation"

Quite possible.

@aim-pep8-bot

aim-pep8-bot commented Apr 7, 2023

Hello @MorrisNein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2023-04-07 21:22:11 UTC

@MorrisNein
Collaborator Author

All the other integration tests fail in exactly the same way as on master 😏
Apparently, there are no more CatBoost-related errors.
I suggest merging.

@MorrisNein MorrisNein requested a review from nicl-nno April 8, 2023 20:42
@MorrisNein MorrisNein merged commit dca4364 into master Apr 9, 2023
@MorrisNein MorrisNein deleted the add_catboost_assumption branch April 9, 2023 09:20
GrigoriJasnovidov added a commit to GrigoriJasnovidov/FEDOT that referenced this pull request Aug 17, 2023
deleted redundant files

corrected typos

simplified code

remove redundant

correct pep8 issues

add example

add mutations

correct visualization fitting process

improve visualization

add partial in solver.py

add example in .py format

examples/confidence_intervals

move prediction intervals in core/pipelines

delete old examples

add unit tests

Refactoring of ApiParams and ApiMetrics (aimclub#1041)

* WIP refactor ApiParams

* Remove explicit ApiParams initialization

* Move all params initialization to ApiParams

* Minor changes

* Remove _divide_params

* WIP create ApiParamsBuilder

* Add ApiParamsBuilder

* Minor

* Rename history_folder to history_dir

* Remove train_data from ApiParams

* Fixes after rebase

* Move obtain_metric to ApiMetrics

* Fix plot_pareto

* Fix Fedot.tune

* Refactor Fedot.get_metrics

* Fix Fedot.tune

* Fix metric names

* Fixes after rebase

* Remove ApiParamsBuilder

* Structure parameters in Fedot docstring

* Refactor init_composer_requirements

* Refactor init_optimizer_params

* Refactor init_optimizer_params

* Fix docstrings

* Add tests for ApiParamsRepository

* Minors

* Fix test_api_params

* Review fixes

* Review fixes

* Review fixes

Hotfix of pipeline import export example (aimclub#1064)

meta rules  (aimclub#1057)

* initial assumption

* final architecture

* add meta rules

* minor

* simplify

* add meta rule

* fix with cv folds

* minors

* fix types

* minor

* golem fixes

* add log messages

* pep8

* remove log file

Fix initial assumptions as list of pipelines (aimclub#1070)

`Fedot(..., initial_assumption=...)` is expected to get a sequence of pipelines and pass them as initial graphs to an optimizer via composer.

Moreover, FEDOT itself generates more than one initial assumption by default.

As the result of this bug, composer passed only one of the initial assumptions to an optimizer.

This PR fixes the bug and adds the corresponding test.

tests fix (aimclub#1073)

* remove some assumptions

* update requirements

* fix pep8

* update to golem

Docs updated, badges added (aimclub#1072)

* Docs updated, badges added

* Mirror workflow fix

Improve API documentation (aimclub#1067)

- Moved type hints from method headers to corresponding parameters.

- Allowed referencing GOLEM objects in FEDOT documentation.

- Replaced all url links to documentation pages with sphinx references - it fixed some broken links.

- FEDOT now uses its own directory for cache, instead of using GOLEM folder.

- Documented `**composer_tuner_params` of `Fedot` with type hints and default values.

All parameters with no usage examples are placed to the separate issue aimclub#1076

minor meta fix (aimclub#1078)

* minor fix

* indent fix

* minor

Add catboost to default initial assumptions (aimclub#1081)

* add catboost to default initial assumptions for classification

* restrict mutating loss function in CatBoost

* evaluate f1 as expected in the example

* pep8

F1 averaging fix (aimclub#1083)

Minor logging fix (aimclub#1082)

* fix print instead of logging for memory consumption

1059 timestamp bug (aimclub#1065)

879 FEDOT features (aimclub#1075)

describe framework's features

add example for surrogate optimizer (aimclub#1085)

Example with surrogate optimizer was added. External parameters field was removed from api (now we should use partial)

related pull request in GOLEM aimclub/GOLEM#82

has_one_root fix (aimclub#1091)

* has_one_root fix

* test fix

Remove outdated test handled in thegolem (aimclub#1101)

358 Reduce execution time for unit tests (aimclub#1098)

Update RTD benchmarks tabular data page (aimclub#1099)

* +csv support

Golem update requirements (aimclub#1088)

* Upd RemoveType in Advisor (golem sync)

* Upd requirements.txt for stable GOLEM

* Fix few imports

Add ts bench (aimclub#1104)

Add results from ts benchmark

Release 0.7.1 and test workflow updates (aimclub#1105)

* Upd release version

* Add pre-release tests actions on 'release' branch; Disable auto-publish

* Upd GOLEM version

* Remove manual-build.yml (dup of integration-build.yml)

* Add integration tests badge to README

* Revert "Upd GOLEM version" (for PR in master)

This reverts commit 257ff16.

Hotfix some integration tests for release 071 (aimclub#1107)

* Fix integration test of ApiParams

* Fix integration test of composition_time

* Fix integration test of metocean_forecasting

* Fix integration test of nemo_multiple.py

* Workaround for sqlite exception raised in tests

* pep8 fixes

* fix different seed in quality improvement tests

* simplify condition

* remove test that barely tested anything

* fix condition

* remove seed from example

* fix typo

parallel cache files test fix (aimclub#1109)

Add IOptTuner (aimclub#1102)

* Refactor search space

* Fix ParametersChanges

* Redact tuner builder

* PEP 8

* Fix examples and correct tuning docs

* Add IOpt example to docs

* Add simple IOpt example

* Fix integration tests (6 steal not work)

* Fix integration test test_tuner_builder_with_custom_params (6 steal not work)

* Fix pep8

* Add tuners comparison in docs

* Update golem version in requirements

* Fix warn_requirements

* Fix warn_requirements

* Fix table in tuning docs

* Fix credit_scoring_problem_multiobj.py

* Fix extra requirements

* Edit docs

* Fix table in docs

* Change requirements

* Test requirements

* Set stable branch

+ h2o benchmark's tabular data values  (aimclub#1106)

* +h2o vals

* change csv to html

* specified table uuid

* change max values style

* add df to html converter

---------

Co-authored-by: nicl-nno <nicl.nno@gmail.com>

Added results for multimodal benchmark into FEDOT.docs (aimclub#1115)

* - added results for multimodal benchmark

...

add docstrings and type hints

add prediction intervals unit test

simplified solvers code

correct pep issues

add class defining PredictionIntervals params

fix MutationStrength import issue

correct test_data path

update imports

updated unit tests

correct type-hint

small corrections

take short ts for unit test

update pred_int test

change ql_models to 'max' in pred_ints test

update params for ql method

...

update example

update tests

add base_quantiles visualization

fix import get_base_quantiles

update examples and deleted ql from tests

update

pep

pep issues

pep

pep

update requirements.txt
3 participants