Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Adding FunctionTransformer #1498

Merged
merged 6 commits into from
Oct 28, 2021
Merged

Conversation

boukepostma
Copy link
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds an sktime-compatible adjustment of sklearn's FunctionTransformer. This class allows the user to easily create custom transformers that are different from the ones already included in sktime (e.g. SqrtTransformer, ExponentTransformer).

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

  • Docstring: this is a slightly adjusted version of the docstring in sklearn's FunctionTransformer. Should I give credits to or cite the original authors and if so, how?
  • def _check_inverse_transform(): this works but there may be more reliable ways to check whether func and inverse_func are the inverse. Note: comparing with == does not work due to numeric errors.

Any other comments?

Looking forward to receiving your feedback!

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 8, 2021

Huh, we didn't already have this?
@aiwalter, @mloning, did we not already have this somewhere (under a different name)?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this looks great!
If we didn't already have this (and I indeed can't find it), would be a nice thing to have!

Some requests for change:

  • the one failure seems to be related to the example in the docstring, number of spaces
  • custom is not very descriptive as a module name, why not functiontrafo or similar?

@boukepostma
Copy link
Contributor Author

boukepostma commented Oct 8, 2021

Thanks for the feedback! Couldn't find something like this neither.

How do you like the filename func_transform.py?

Not sure which error you are referring to wrt docstrings. Should I remove some spaces after the 0. in the array?

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 8, 2021

Not sure which error you are referring to wrt docstrings. Should I remove some spaces after the 0. in the array?

Go here to see CI test failures for the various builds:

image

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 8, 2021

How do you like the filename func_transform.py?

Yes, that's clear and concise imo.

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 8, 2021

looks like you changed the module name but forgot to change an import of it somewhere

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's failing again with the docstring...

I think the number of spaces in the docstring example and in the actual printout need to be exactly the same, or the test won't be happy.

fkiraly
fkiraly previously approved these changes Oct 11, 2021
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good now, great contribution! (did we really not have this yet, @aiwalter, @RNKuhns, @mloning)

Apologies for the above back/forth, but as you see empty spaces are really important in documentation...


Parameters
----------
Z : pd.Series / pd.DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you accept pd.Series / pd.DataFrame, so we should do the same for fit()

@aiwalter
Copy link
Contributor

@boukepostma would you mind to add FunctionTransformer also to docs? You can search for other transformers in the .rst files how to do it.

Copy link
Contributor

@RNKuhns RNKuhns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a useful way to create an arbitrary series-to-series transformer. I've got 2 quick comments.

The first is a small change I'd like to see to make it more readable when people go back and look at the code later.

  • self.inverse_transform is calling self._transform.

I get that the call in the inverse_transform is applying the inverse function. But it seems unintuitive at first glance to see self._transform being applied in self.inverse_transform. I'd suggest renaming the existing self._transform to a helper function like "apply_function". Then you could use the helper function as appropriate in self._transform and separately in a self._inverse_transform. Then you could call self._inverse_transform in self.inverse_transform.

Second comment is more of a general question. Elsewhere in sktime we have functions to create different estimators. Do we want this as a class or is this a case where the class is not part of the public API, but a function like make_series_transformer is part of the API and returns the constructed class? @mloning, @aiwalter and @fkiraly I'm interested to your thoughts hear when it makes sense to do that.

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 16, 2021

@RNKuhns, I think this very much makes sense as part of the public API, a general-purpose transformer creator.

@boukepostma
Copy link
Contributor Author

Hi all, thanks a lot for the feedback! I hope to have time later this week to go implement the suggested changes :)

@boukepostma
Copy link
Contributor Author

@boukepostma would you mind to add FunctionTransformer also to docs? You can search for other transformers in the .rst files how to do it.

FunctionTransformer should work for any Serie or Dataframe (i.e. Panel and Series). To which category should I add it in the documentation and folder structure?

@boukepostma
Copy link
Contributor Author

I pushed some changes:

  • _check_inverse_transform now uses actual data instead of a numpy array to validate whether transform and inverse_transform are each other's inverses.
    This provides more flexibility for different input datatypes (array/Serie/Dataframe), e.g. when the user would like to define func using a Series-specific syntax like func = lambda x: x ** 2. It supports NA value, by considering NA == NA to be True.
  • implemented @aiwalter his suggestions regarding docstrings: fit, and transform both accept Series and Dataframe.
  • Changed _transform to _apply_function as proposed by @RNKuhns. Fully agree that this is more intuitive!

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can go in for the current release, any changes tbd separately.

@fkiraly fkiraly merged commit 5fecb32 into sktime:main Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants