Column transformer for segmented data #9

qtux · 2018-11-28T04:08:49Z

Hi David,

I wrote a simple wrapper to use the sklearn ColumnTransformer on segmented data, which is useful when dealing with heterogeneous (multivariate) time series data.
I looked into supporting contextual data but did not find an easy way to make the current code work with the TS_Data class. Copying and adapting the whole ColumnTransformer code, instead of patching parts of it, might lead to a proper solution that supports both.
Nevertheless, I hope you find the SegmentedColumnTransformer to be useful.

Cheers,
Matthias

The main use case for this transformer is to apply specified groups of feature functions to specified columns of data, e.g. when dealing with heterogeneous data. The SegmentedColumnTransformer is derived from the sklearn ColumnTransformer and adapted to be used inside a Pype object after a segment transformation. The adaptation mainly consists of:

- adapting the notion of a column (ColumnTransformer iterates over the second dimension, whereas segmented data must be iterated over the third dimension)
- disabling the "drop" and "passthrough" transform options for simplicity and dropping non-specified columns by default

Note: SegmentedColumnTransformer does not support contextual data.
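To illustrate the column notion described above, here is a minimal NumPy sketch. It is not the code from this pull request: the array `X_seg`, the helper `select_variables`, and the variable names are assumptions made up for the example, and it only assumes that segmented data has shape (n_segments, segment_width, n_variables).

```python
import numpy as np

# Hypothetical segmented data: 2 segments, 4 samples each, 3 variables.
X_seg = np.arange(24, dtype=float).reshape(2, 4, 3)


def select_variables(X, column):
    """Select time series variables along the third axis of segmented data."""
    return np.atleast_3d(X)[:, :, column]


accel = select_variables(X_seg, [0, 1])  # shape (2, 4, 2)
temp = select_variables(X_seg, [2])      # shape (2, 4, 1)

# One feature per selected variable, computed along the time axis (axis 1).
mean_accel = accel.mean(axis=1)          # shape (2, 2)
max_temp = temp.max(axis=1)              # shape (2, 1)

# Variables that are never selected do not appear in the output,
# which mirrors the "drop non-specified columns by default" behaviour.
features = np.column_stack([mean_accel, max_temp])  # shape (2, 3)
```

In a 2D feature matrix the same selection would index the second axis (`X[:, column]`), which is the difference the adaptation has to account for.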

coveralls · 2018-11-28T04:11:22Z

Pull Request Test Coverage Report for Build 124

33 of 33 (100.0%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.2%) to 93.715%

Totals
Change from base Build 122:	0.2%
Covered Lines:	1178
Relevant Lines:	1257

💛 - Coveralls

dmbee · 2018-11-28T19:04:49Z

Thank you Matthias for your work on this. If I understand your aim correctly - you want to have the API support specifying which time series variables each feature is computed for?

It is true that currently the API supports only feature representations where each feature is computed for all variables. I agree that this is a limitation of the current code.

I would be happy to include this capability, but it needs to work with TS_Data. Did you look at FeatureRep.transform? It should be pretty easy to merge the context data back with the feature data using np.column_stack. Also, maybe pick a better name, e.g. FeatureRepMix, since "columns" makes sense primarily in the context of 2D data. We should also implement f_labels so that the user can retrieve the mapping of features after the transform. It would be nice if the unit tests checked the calculation, not just the shape of the returned data.
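A rough sketch of what feature labels and a value-checking test could look like, using plain NumPy. The `SPEC` layout, the `transform` and `feature_labels` helpers, and the `name_column` label scheme are hypothetical and chosen only for illustration; they are not seglearn's actual API.

```python
import numpy as np

# Hypothetical spec: (name, feature function, variable indices).
SPEC = [
    ("mean", lambda x: x.mean(axis=1), [0, 1]),
    ("max", lambda x: x.max(axis=1), [2]),
]


def transform(Xt, spec):
    """Apply each feature function to its variables and stack the results."""
    return np.column_stack([func(Xt[:, :, cols]) for _, func, cols in spec])


def feature_labels(spec):
    """Return one label per output column, in the same order as transform()."""
    return [f"{name}_{col}" for name, _, cols in spec for col in cols]


Xt = np.arange(24, dtype=float).reshape(2, 4, 3)
features = transform(Xt, SPEC)
labels = feature_labels(SPEC)

# Check the computed values against an independent calculation,
# not just the shape of the returned array.
expected = np.column_stack([
    Xt[:, :, 0].mean(axis=1),
    Xt[:, :, 1].mean(axis=1),
    Xt[:, :, 2].max(axis=1),
])
assert features.shape == (2, len(labels))
np.testing.assert_allclose(features, expected)
```

Keeping the label order tied to the spec order means a user can look up which function and variable produced any output column.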

Let me know if you need any help with this. Once you are done, please submit the pull request on the dev branch so I can thoroughly test it before releasing it to master.

Thanks again
David

qtux · 2018-11-29T00:43:06Z

Hi David,

> If I understand your aim correctly - you want to have the API support specifying which time series variables each feature is computed for?

Yes, that is correct. Furthermore, I wanted the same functionality that ColumnTransformer offers: parallel processing of transformers in general, not restricted to FeatureRep (hence the naming).

I will look into whether I can integrate TS_Data into the SegmentedColumnTransformer, or go with your proposal of implementing a FeatureRepMix that only applies different FeatureRep transforms to different time series variables. In that case we could use the sklearn ColumnTransformer on the output of the FeatureRep transformers.
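As a sketch of that second option: once the features are a 2D matrix, the stock sklearn ColumnTransformer can be applied directly. The feature values and the choice of scalers below are made up for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical 2D feature matrix: 4 segments x 3 features.
features = np.array([
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 5.0],
    [3.0, 30.0, 10.0],
    [4.0, 40.0, 20.0],
])

# Columns here are ordinary 2D columns, so no adaptation is needed.
ct = ColumnTransformer([
    ("standardize", StandardScaler(), [0, 1]),
    ("minmax", MinMaxScaler(), [2]),
])
scaled = ct.fit_transform(features)  # shape (4, 3)
```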

Cheers,
Matthias

dmbee · 2018-11-29T15:22:05Z

Thanks Matthias,

I think that would be a great addition to seglearn. I think you can use (inherit from) the sklearn ColumnTransformer to do the processing on the time series data, as you did in your previous pull request. I was just suggesting you name the seglearn class something like FeatureRepMix to avoid confusion.

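The context data doesn't need to go to ColumnTransformer, so the implementation would look like the current feature rep:

Xt, Xc = get_ts_data_parts(X)

# apply each transformer to its selected time series variables (third axis)
fts = Parallel(n_jobs=self.n_jobs)(
    delayed(func)(
        clone(trans) if not fitted else trans,
        np.atleast_3d(Xt)[:, :, column], y, weight
    ) for _, trans, column, weight in self._iter(fitted=fitted, replace_strings=False)
)

# stack the per-transformer results (assuming func returns a 2D feature array),
# then append the context data, which bypasses the transformers
fts = np.column_stack(fts)
if Xc is not None:
    fts = np.column_stack([fts, Xc])

return fts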

good luck
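The context-merging step on its own can be sketched with NumPy as follows. The helper `merge_context` and the sample values are hypothetical; the sketch assumes the context data has one row per segment, like the feature matrix.

```python
import numpy as np


def merge_context(fts, Xc=None):
    """Append context columns to a 2D feature matrix, if there are any."""
    if Xc is None:
        return fts
    return np.column_stack([fts, Xc])


fts = np.array([[4.5, 11.0], [16.5, 23.0]])  # features, one row per segment
Xc = np.array([30.0, 45.0])                  # hypothetical context value per segment

merged = merge_context(fts, Xc)              # shape (2, 3)
unchanged = merge_context(fts)               # no context: features pass through
```

np.column_stack treats a 1D context array as a single column, so the same call covers one or several context variables.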

qtux added 3 commits November 28, 2018 04:53

tests: Add test for SegmentedColumnTransformer

fc53756

examples: Add SegmentedColumnTransformer example

e1a9df1

dmbee closed this Nov 28, 2018

qtux mentioned this pull request Dec 4, 2018

FeatureRepMix #11

Merged
