Support of multimodal data in DataSourceSplitter + bugfix #1119 #1138

Merged
merged 15 commits into from
Aug 14, 2023

@kasyanovse kasyanovse commented Aug 9, 2023

  1. Add support for multimodal data in cross-validation.
  2. Enable stratification by default for classification (fix data split #1119).
  3. Unify the parameters of all data splitter functions (shuffle_flag in train_test_data_setup is retained for backward compatibility).
  4. Add many checks to DataSourceSplitter and train_test_data_setup.
  5. Move the core.validation.split.py module to core.data.cv_folds.py.
  6. Add new tests.
  7. Delete DataObjectiveAdvisor, because stratification is now enabled by default.
  8. Delete OneFoldInputDataSplit, because cv_generator now gets already-checked parameters from DataSourceSplitter, so cross-validation can always be performed.
  9. Some old tests rely on unstratified samples; in those cases DataSourceSplitter and train_test_data_setup are called
    with stratify=False.
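
To illustrate what "stratification by default" with an opt-out means in items 2 and 9, here is a minimal standalone sketch in plain Python. It is not FEDOT's implementation: the function name `stratified_split` and the parameters `split_ratio`, `shuffle`, and `random_seed` are assumptions made for this example, and only the `stratify` flag name is taken from the description above.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable],
                     split_ratio: float = 0.8,
                     shuffle: bool = True,
                     stratify: bool = True,
                     random_seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) for the given class labels.

    Hypothetical helper for illustration only, not part of FEDOT.
    """
    if not 0 < split_ratio < 1:
        raise ValueError('split_ratio must be strictly between 0 and 1')
    rng = random.Random(random_seed)

    if not stratify:
        # Plain split: class proportions in the two parts are not controlled.
        indices = list(range(len(labels)))
        if shuffle:
            rng.shuffle(indices)
        cut = int(len(indices) * split_ratio)
        return indices[:cut], indices[cut:]

    # Group sample indices by class so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label, members in by_class.items():
        if len(members) < 2:
            raise ValueError(f'class {label!r} has fewer than 2 samples')
        if shuffle:
            rng.shuffle(members)
        # Keep at least one sample of every class on each side of the split.
        cut = min(max(1, round(len(members) * split_ratio)), len(members) - 1)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


# Imbalanced toy target: 90 samples of class 0, 10 samples of class 1.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, split_ratio=0.8)
```

With the stratified path, the rare class keeps the same 80/20 proportion in both parts. Passing `stratify=False` falls back to a plain split, which is the behaviour the older tests mentioned in item 9 would depend on.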

@kasyanovse kasyanovse linked an issue Aug 9, 2023 that may be closed by this pull request
@aim-pep8-bot

aim-pep8-bot commented Aug 9, 2023

Hello @kasyanovse! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 207:27: E126 continuation line over-indented for hanging indent
Line 223:27: E123 closing bracket does not match indentation of opening bracket's line

Comment last updated at 2023-08-14 13:15:41 UTC

@codecov

codecov bot commented Aug 10, 2023

Codecov Report

Merging #1138 (d77ecdb) into master (04bceac) will decrease coverage by 0.12%.
The diff coverage is 92.96%.

@@            Coverage Diff             @@
##           master    #1138      +/-   ##
==========================================
- Coverage   78.62%   78.51%   -0.12%     
==========================================
  Files         131      130       -1     
  Lines        9401     9336      -65     
==========================================
- Hits         7392     7330      -62     
+ Misses       2009     2006       -3     
| Files Changed | Coverage Δ |
|---|---|
| fedot/api/api_utils/params.py | 89.21% <ø> (-0.11%) ⬇️ |
| .../core/optimisers/objective/data_source_splitter.py | 92.20% <85.29%> (-3.57%) ⬇️ |
| fedot/core/data/data_split.py | 94.73% <93.84%> (+3.52%) ⬆️ |
| fedot/core/data/cv_folds.py | 100.00% <100.00%> (ø) |
| ...tations/data_operations/sklearn_transformations.py | 91.20% <100.00%> (ø) |

@kasyanovse kasyanovse changed the title Refactor of data splitters Ыupport of multimodal data + bugfix #1119 Aug 10, 2023
@kasyanovse kasyanovse changed the title Ыupport of multimodal data + bugfix #1119 Support of multimodal data + bugfix #1119 Aug 10, 2023
@kasyanovse kasyanovse changed the title Support of multimodal data + bugfix #1119 Support of multimodal data in DataSourceSplitter + bugfix #1119 Aug 10, 2023
Fix #1119

Add support of MultiModalData in cv_folds

Delete some code that solved problems now handled by the new DataSourceSplitter or the new cv_folds
Also some fixes in new code
Fix calls of changed functions, add correct parameters to some tests, fix data splitting functions

in the InputData format for cross validation. The function
returns a generator of tuples, each consisting of a train/test pair.

:param data: InputData for train and test splitting
Collaborator

If data can be either InputData or MultiModalData, it would be better to describe it more generally: data for train and test splitting

Collaborator Author

Fixed.
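
The docstring discussed in this thread describes a function that yields (train, test) pairs and may receive either single-source or multimodal data. One way to picture the multimodal case is to compute fold indices once and apply them to every modality, so the samples stay aligned across sources. The sketch below assumes multimodal data can be modelled as a plain dict of equally long sequences. `kfold_indices` and `multimodal_cv_folds` are hypothetical names for this example and are not FEDOT functions.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical stand-in for multimodal data: modality name -> samples.
Modalities = Dict[str, Sequence]


def kfold_indices(n_samples: int, n_folds: int, shuffle: bool = True,
                  random_seed: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of n_folds folds."""
    if n_folds < 2 or n_folds > n_samples:
        raise ValueError('n_folds must be between 2 and the number of samples')
    indices = list(range(n_samples))
    if shuffle:
        random.Random(random_seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def multimodal_cv_folds(data: Modalities, n_folds: int, shuffle: bool = True,
                        random_seed: int = 42) -> Iterator[Tuple[Modalities, Modalities]]:
    """Yield (train, test) pairs where every modality shares the same indices."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) != 1:
        raise ValueError('all modalities must have the same number of samples')
    n_samples = lengths.pop()
    for train_idx, test_idx in kfold_indices(n_samples, n_folds, shuffle, random_seed):
        train = {name: [values[i] for i in train_idx] for name, values in data.items()}
        test = {name: [values[i] for i in test_idx] for name, values in data.items()}
        yield train, test


# Two toy modalities describing the same ten samples.
data = {'table': list(range(10)), 'text': [f't{i}' for i in range(10)]}
folds = list(multimodal_cv_folds(data, n_folds=3))
```

Because the indices are generated once per fold, row `i` of `table` and row `i` of `text` always end up on the same side of the split, which is the property a multimodal splitter needs to preserve.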

Applied only for time series data.
If not provided, then value will be calculated.
:param random_seed: Random seed for shuffle.
:param log: Log for logging.
Collaborator

The order of the parameters in the docstring does not match the order of the parameters in the class. Is that intentional?

Collaborator Author

Fixed.

@nicl-nno
Collaborator

Just don't forget to fix the PEP 8 remarks.

@kasyanovse
Collaborator Author

> Just don't forget to fix the PEP 8 remarks.

Fixed. Two remarks remain, but the code at the indicated locations is fine.

@kasyanovse kasyanovse merged commit d53dfa1 into master Aug 14, 2023
6 checks passed
@kasyanovse kasyanovse deleted the 1119-bugfix branch August 14, 2023 13:35
Development

Successfully merging this pull request may close these issues.

fix data split
5 participants