Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sklearn api #12

Closed
gioxc88 opened this issue Jun 7, 2021 · 3 comments
Closed

sklearn api #12

gioxc88 opened this issue Jun 7, 2021 · 3 comments

Comments

@gioxc88
Copy link

gioxc88 commented Jun 7, 2021

Would you consider the possibility of making it compatible with sklearn using fit and transform instead of smooth?
Is there a specific reason why you save the transformed data as an instance attribute? (this would be against the sklearn API)

I am thinking of doing it myself for a project I am working on but I wanted to ask you first if I missed anything obvious that would make this difficult or not possible.

Many thanks

@cerlymarco
Copy link
Owner

Hi,

all the Smoother classes compute 'smooth'... this is the equivalent of 'fit_trasform' method. There is no way to operate 'fit' or 'transform' separately. For this reason, I prefer to not introduce this behavior, but you are free to create your own class in sklearn style like a wrapper of tsmoothie.

If you support the project don't forget to live a star ;-)

All the best

@gioxc88
Copy link
Author

gioxc88 commented Jun 9, 2021

thanks for the answer I am aware of this behavior, but I don't think you are entirely right.
On many of them you can in fact operate fit and transform separately

There are many classes where you calculate X_base and then you fit a LinearRegression using X_base. Finally you use the fitted LinearRegression for predicting the smoothed data.
In particular this applies to Polynomial, Gaussian , Spline and Binner.

That is by definition the separation between fit (where you calculate the X_base and fit the LinearRegression) and transform (where you use the LinearRegression for predicting the smoothed data). In this case for example you would only save the fitted LinearRegression as instance attribute

A similar pattern happens for the Lowess where the fitting process consists in calculating the weights (what you call betas) and the transform step is basically only this (betas[..., 0] + betas[..., 1] * X).T

And finally for Exponential and Convolutional the fit step would consist in calculating the weights and the transform step is to apply the convolution using the "fitted" weights.

So I believe saying "There is no way to operate 'fit' or 'transform' separately" is inaccurate.

That being said I love the package and I am not saying you should change the api to sklearn (I'll do it my self).
I am only suggesting that in my opinion there are some design patterns you used like self._store_results which are not great and makes it more difficult to create wrappers.

At the moment creating a Wrapper compliant with the sklearn api would require literally to take the code and copy paste it using a different structure. So this is not exactly ideal because every time update tsmoothie, the hypothetical external developer who wrote the Wrapper would need to check line by line the consistency between tsmoothie and his Wrapper.

Again just my 2 cents and congrats for the awesome work.

@cerlymarco
Copy link
Owner

cerlymarco commented Jun 11, 2021

Thanks for the suggestions, but I saw them not entirely applicable to all the smoother available. For the same reason, I used _store_results, as a common method for all the smoother, to make the code build in the same way (this is only a choice of mine).

I also think that a dummy sklearn wrapper can be built in this simple way (also it depends on what u are looking for):

class SklearnLowess(LowessSmoother):

  def __init__(self, smooth_fraction, iterations, batch_size=None, copy=True):
    self.smooth_fraction = smooth_fraction
    self.iterations = iterations
    self.batch_size = batch_size
    self.copy = copy

  def fit(self, X):
    return self

  def transform(self, X):
    return self.smooth(X).smooth_data

  def fit_transform(self, X):
    return self.smooth(X).smooth_data

here the full code

Bye

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants