Resampling with imbalanced-learn samplers #15

qtux · 2018-12-08T06:25:11Z

Hi David,

I added the patch_sampler(imblearn_sampler_class) function, which derives a dynamically created (and picklable) sampler class that is compatible with Pype.
The derived class implements a transform method that returns the data unchanged. Its fit_transform method calls the fit_resample method of the imbalanced-learn sampler, which resamples the data.
This split ensures that resampling applies only to training data and not to test data (the example shows that Pype.fit calls the fit_transform method, whereas score calls the transform method).
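
A minimal sketch of this fit/transform split, using only the standard library. ToyUnderSampler and ResampleOnFit are hypothetical names made up for illustration; the toy sampler merely stands in for an imbalanced-learn sampler by offering a fit_resample method, and the signatures are simplified compared to what a Pype step actually receives.

```python
import random
from collections import defaultdict


class ToyUnderSampler:
    """Hypothetical stand-in for an imbalanced-learn sampler (fit_resample only)."""

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = random.Random(self.random_state)
        by_class = defaultdict(list)
        for i, label in enumerate(y):
            by_class[label].append(i)
        # Keep as many samples per class as the smallest class has.
        n_min = min(len(indices) for indices in by_class.values())
        keep = []
        for label in sorted(by_class):
            keep.extend(sorted(rng.sample(by_class[label], n_min)))
        return [X[i] for i in keep], [y[i] for i in keep]


class ResampleOnFit:
    """Hypothetical wrapper: resample in fit_transform, pass through in transform."""

    def __init__(self, sampler):
        self.sampler = sampler

    def fit_transform(self, X, y):
        # Training path: the data is resampled.
        return self.sampler.fit_resample(X, y)

    def transform(self, X, y=None):
        # Scoring/prediction path: the data is returned unchanged.
        return X, y


X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
step = ResampleOnFit(ToyUnderSampler())

X_train, y_train = step.fit_transform(X, y)  # classes are balanced
X_test, y_test = step.transform(X, y)        # nothing is dropped
print(y_train, len(X_test))
```

Because only fit_transform touches the sampler, a pipeline that calls transform while scoring never drops or duplicates test samples.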

Cheers,
Matthias

coveralls · 2018-12-08T09:52:32Z

Pull Request Test Coverage Report for Build 225

282 of 288 (97.92%) changed or added relevant lines in 11 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.5%) to 95.104%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
seglearn/pipe.py	1	2	50.0%
seglearn/transform.py	66	68	97.06%
seglearn/util.py	15	18	83.33%

Totals
Change from base Build 197:	0.5%
Covered Lines:	1923
Relevant Lines:	2022

💛 - Coveralls

dmbee · 2018-12-08T19:57:00Z

Thanks Matthias. I'll have to look this over later this week. Thanks again for the contribution.
David

qtux · 2018-12-12T11:49:37Z

I rebased the commits on the current development branch.

qtux · 2018-12-13T13:39:51Z

Hi David,

I added the possibility to shuffle the resampled results. The reason for this feature is that e.g. the RandomUnderSampler seems to sort the X/y arrays by the class of y. This is problematic when a Keras classifier is fitted on resampled and segmented data with a fixed validation_split.
Providing validation_data explicitly, without relying on validation_split, seems to be more complex.
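
To illustrate the problem: Keras takes the validation_split fraction from the end of the arrays before any shuffling, so class-sorted data yields a validation set with a single class. The sketch below uses plain Python lists; shuffle_together and tail_validation_labels are hypothetical helpers for illustration, not part of the patch.

```python
import random


def shuffle_together(X, y, random_state=None):
    """Apply one shared random permutation to X and y so pairs stay aligned."""
    rng = random.Random(random_state)
    order = list(range(len(y)))
    rng.shuffle(order)
    return [X[i] for i in order], [y[i] for i in order]


def tail_validation_labels(y, validation_split):
    """Labels that a tail-based split would hold out for validation."""
    n_val = int(len(y) * validation_split)
    return y[len(y) - n_val:]


# Class-sorted output, as observed for RandomUnderSampler in this thread.
X_sorted = [[i] for i in range(100)]
y_sorted = [0] * 50 + [1] * 50

# Without shuffling, the held-out tail contains only class 1.
print(set(tail_validation_labels(y_sorted, 0.2)))  # {1}

# After a joint shuffle, the tail is a random subset of both classes.
X_shuf, y_shuf = shuffle_together(X_sorted, y_sorted, random_state=0)
print(set(tail_validation_labels(y_shuf, 0.2)))
```

Shuffling X and y with the same permutation keeps every sample paired with its label, which is the property the shuffle option has to preserve.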

Cheers,
Matthias

dmbee · 2018-12-14T23:03:43Z

Thanks Matthias - I have to spend some more time looking at this. I am also working with sklearn on how best to integrate resampling into their pipeline.

Here is the thread for the discussion: scikit-learn/scikit-learn#3855

dmbee · 2018-12-14T23:04:41Z

> The reason for this feature is that e.g. the RandomUnderSampler seems to sort the X/y arrays by the class of y.

This is really good to know.

qtux · 2018-12-18T23:56:56Z

I rebased the commits on the current development branch.

qtux · 2019-01-05T01:41:48Z

I rebased the commits on the current development branch.

qtux · 2019-01-05T01:47:47Z

seglearn/transform.py

+            '''
+            Circumvent the check whether dim(Xt) == 2.
+            '''
+            Xt_2d = Xt.reshape(Xt.shape[0], -1)


This creates a copy of Xt if the data was segmented with Fortran-like (default) ordering. #24 solves this issue by choosing C-like ordering for the segmentation. Shall I emit a warning if Fortran-like ordering is used, or shall I remove the "faked" check altogether?
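
A small NumPy check of the copy behaviour mentioned above, on toy arrays rather than real segmented data. reshape_is_view is a hypothetical helper; it applies the same reshape as the patch and reports whether the result still shares memory with the input.

```python
import numpy as np


def reshape_is_view(Xt):
    """Return True if flattening the trailing axes yields a view, not a copy."""
    Xt_2d = Xt.reshape(Xt.shape[0], -1)
    return np.shares_memory(Xt, Xt_2d)


c_ordered = np.zeros((4, 3, 2), order="C")
f_ordered = np.zeros((4, 3, 2), order="F")

print(reshape_is_view(c_ordered))  # True: C-contiguous data is reshaped in place
print(reshape_is_view(f_ordered))  # False: Fortran-ordered data must be copied
```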

qtux · 2019-08-27T16:47:31Z

Hi David,

I rebased the resampling patches onto the master branch and squashed the commits so that they are easier to revert. What do you think about merging this patch set? It seems that scikit-learn needs some more time until they might provide this feature (cf. scikit-learn/scikit-learn#13269).

Should I change this pull request from the dev to the master branch?

Cheers,
Matthias

dmbee · 2019-08-30T21:54:57Z

Hi Matthias,

I really appreciate your work on this. I am pretty busy over the next two weeks but promise to look over this again soon. Last time I wasn't too keen on adding the dependency of imblearn.

Let me look it over again and let us then discuss.

David

qtux · 2019-11-07T18:20:25Z

Hi David,

any news?

Cheers,
Matthias

dmbee · 2019-11-07T18:40:47Z

Matthias - I truly apologize for the delay; I am currently writing my thesis. This looks great. Can you please rebase onto the current master? I will merge and deploy as soon as that's done.

I appreciate all your work on this really useful patch.

David

This function dynamically patches an imbalanced-learn sampler so that it can be used inside a seglearn Pype. It ensures that objects created from the dynamically generated class are picklable. Additionally, shuffling is implemented for imbalanced-learn samplers, because imbalanced-learn sorts the output by class. This is problematic when splitting the resampled data (e.g. using the validation split of the Keras fit function). Finally, calling repr() on a dynamically patched sampler returns the parameters of the imbalanced-learn base class along with the additional parameters introduced in the PickableSampler class (shuffle and random_state).

Additionally, add tests for a dynamically created PickableSampler object and imbalanced-learn sampler shuffling.
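
One way a dynamically created class can stay picklable is sketched below. This is an assumption about the general technique, not the code from this patch: BaseSampler, patch and _rebuild are hypothetical names, and the sketch relies on pickle's __reduce__ hook so that only the importable base class and the instance state are stored.

```python
import pickle


class BaseSampler:
    """Hypothetical stand-in for an imbalanced-learn sampler class."""

    def __init__(self, ratio=1.0):
        self.ratio = ratio


def _rebuild(base_cls, state):
    """Recreate the dynamic class on unpickling, then restore instance state."""
    obj = object.__new__(patch(base_cls))
    obj.__dict__.update(state)
    return obj


def patch(base_cls):
    """Return a dynamically created subclass of base_cls whose instances pickle."""

    def __reduce__(self):
        # Store the importable base class plus the instance state; the dynamic
        # class itself cannot be looked up by name when unpickling.
        return _rebuild, (base_cls, self.__dict__.copy())

    name = "Patched" + base_cls.__name__
    return type(name, (base_cls,), {"__reduce__": __reduce__})


sampler = patch(BaseSampler)(ratio=0.5)
clone = pickle.loads(pickle.dumps(sampler))
print(type(clone).__name__, clone.ratio)
```

In this sketch every call to patch builds a new class object, so the clone's type is an equivalent class rather than the identical one; caching the generated classes would be one way to avoid that.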

qtux · 2019-11-07T22:14:25Z

Hi David,

no worries, all the best for your thesis :).

Cheers,
Matthias

dmbee · 2019-11-08T20:01:59Z

Thanks Matthias

qtux force-pushed the resampling branch 2 times, most recently from f86a465 to ebdd07e (December 8, 2018 09:50)

qtux force-pushed the resampling branch from ebdd07e to a852fc4 (December 12, 2018 11:48)

qtux force-pushed the resampling branch from f6930d7 to 5f03eae (December 18, 2018 23:52)

qtux force-pushed the resampling branch from 5f03eae to b009300 (January 5, 2019 01:32)

qtux mentioned this pull request Jan 5, 2019

transform: Introduce order param for segmentation #24

Merged

qtux commented Jan 5, 2019

qtux force-pushed the resampling branch from b009300 to b0ef96a (January 29, 2019 19:00)

qtux force-pushed the resampling branch from 807842a to 6b1669f (May 14, 2019 11:48)

dmbee and others added 3 commits May 24, 2019 16:14

pandas support

99f7c53

Drop Python 2 support by using scikit-learn 0.21.3

45045b1

Replace deprecated six functionality with Python 3 code and adapt to new version requirements.

Merge pull request dmbee#37 from qtux/drop_python2

05292d8

Drop Python 2 support by using scikit-learn 0.21.3

qtux force-pushed the resampling branch 2 times, most recently from 9039c6e to c24a839 (August 27, 2019 16:46)

dmbee and others added 6 commits September 25, 2019 13:50

multilabel and duplicate timestamp support

ea2d56b

Merge branch 'master' of https://github.com/dmbee/seglearn

1a9c201

added interplongtowide

82b2c58

init updates

01fa65b

add median absolute deviation

d0fbd62

Merge pull request dmbee#38 from tylerwmarrs/feature_median_absolute_…

da93257

…deviation add median absolute deviation

dmbee added 3 commits October 9, 2019 13:22

improved interpolater sorting

286ccc5

typo fix

3582f2d

Merge branch 'master' of https://github.com/dmbee/seglearn

5de0d32

qtux added 2 commits November 7, 2019 23:07

examples: Add imbalanced-learn example

3fcb4fc

qtux force-pushed the resampling branch from c24a839 to a61f90f (November 7, 2019 22:10)

tests: Add tests for patch_sampler

a61f90f

dmbee merged commit a61f90f into dmbee:dev Nov 8, 2019
