Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transform(t::Tuple, x) #781

Open
gdkrmr opened this issue Apr 4, 2022 · 11 comments
Open

transform(t::Tuple, x) #781

gdkrmr opened this issue Apr 4, 2022 · 11 comments

Comments

@gdkrmr
Copy link

gdkrmr commented Apr 4, 2022

I think such an abstraction could be quite handy, just apply all the data transforms one after another to x. The same should exist for reconstruct. Would you be interested in such an extension? I could try to implement it.

@nalimilan
Copy link
Member

Could you describe the use case? Currently, supported transformations would cancel each other if applied sequentially.

@gdkrmr
Copy link
Author

gdkrmr commented Apr 7, 2022

I want to use it for centering and normalizing before applying a PCA. The PCA from MultivariateStatistics.jl only supports centering but not standardizing and they don't want to support it (JuliaStats/MultivariateStats.jl#86). I really need that feature and don't want to pirate the methods (https://github.com/gdkrmr/WeightedOnlineStats.jl/blob/master/src/pca.jl#L58). MultivariateStats.jl is going to use transform soon (JuliaStats/MultivariateStats.jl#171) and if you support chaining transforms, I get this feature for free and without pirating.

@nalimilan
Copy link
Member

Ah so PCA would be the second transform step? StatsBase.transform doesn't support PCA currently, so anyway MultivariateStats should first overload StatsBase.transform instead of defining its separate transform function (JuliaStats/MultivariateStats.jl#171 is more ambitious than that; see also #453). Without this it doesn't make sense for StatsBase.transform to support chaining transformations.

There's also an API question about what to return. If you want to be able to reconstruct the original data, the function has to return a tuple holding the outputs of each step, rather than only the output of the last step.

Previous issue at #452.

@gdkrmr
Copy link
Author

gdkrmr commented Apr 7, 2022

Ah so PCA would be the second transform step?

Yes

StatsBase.transform doesn't support PCA currently, so anyway MultivariateStats should first overload StatsBase.transform instead of defining its separate transform function (JuliaStats/MultivariateStats.jl#171 is more ambitious than that; see also #453).

I am bit lost here, so the transform that they are want to export is not StatsBase.transform? Then you are right.

@nalimilan
Copy link
Member

Yes currently StatsBase.transform and transform are different functions (in your code you use the two different forms in the same function, with a different prefix).

@gdkrmr
Copy link
Author

gdkrmr commented Apr 7, 2022

Aren't they planning to change that?

@nalimilan
Copy link
Member

I'm not sure. Why don't you ask them?

@wildart
Copy link
Contributor

wildart commented Apr 7, 2022

transform & reconstruct are internal function of StatsBase. They are not exported so do not try to override them. MultivariateStats reexports fit & predict from the StatsAPI. predict call does the same job as transform, i.e. the function forms the response of a given model.

@gdkrmr
Copy link
Author

gdkrmr commented Apr 8, 2022

There's also an API question about what to return. If you want to be able to reconstruct the original data, the function has to return a tuple holding the outputs of each step, rather than only the output of the last step.

Agreed, there are some open API questions. You don't need the intermediate steps to reconstruct the input unless you want to support transform! and reconstruct!. The other question is what the inverse for transform((a, b, c), x) should be, reconstruct((a, b, c), x) or reconstruct((c, b, a), x).

@nalimilan
Copy link
Member

Well transform and reconstruct are not exported because the names are very general, but they are definitely part of the public API as they have a whole section of the manual dedicated to them. Given that MultivariateStats has deprecated transform in favor of predict I'm not sure what to do about this issue. We could do the same for transformations in StatsBase I guess (or at least define both transform and predict for transformations).

@wildart We should probably define reconstruct in StatsAPI so that both MultivariateStats and StatsBase override it, like they already do for predict?

@wildart
Copy link
Contributor

wildart commented Apr 8, 2022

Yes, we should definitely put reconstruct into StatsAPI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants