Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MLJ Tangent space transformer #5

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

mateuszbaran
Copy link
Member

At the moment it's just an initial sketch. I'm going to use it to try out the new MLJ <-> Manifolds integration: https://github.com/alan-turing-institute/MLJScientificTypes.jl/issues/46 .

@mateuszbaran mateuszbaran added the WIP Work in Progress (for a pull request) label Sep 23, 2020
@codecov
Copy link

codecov bot commented Sep 23, 2020

Codecov Report

Merging #5 (18d1953) into master (029bba9) will decrease coverage by 76.92%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##            master       #5       +/-   ##
============================================
- Coverage   100.00%   23.07%   -76.93%     
============================================
  Files            3        4        +1     
  Lines           21       91       +70     
============================================
  Hits            21       21               
- Misses           0       70       +70     
Impacted Files Coverage Δ
src/ManifoldML.jl 100.00% <ø> (ø)
src/tangent_transformer.jl 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 029bba9...ec8eb6d. Read the comment docs.

@mateuszbaran
Copy link
Member Author

@ablaom could you take a look at this PR? This should be a reasonable first step for manifold support in MLJ. In particular TangentSpaceTransformer can be used to transform to tangent space and back and works, at least on my computer. I'm not sure how important UnivariateTangentSpaceTransformer would be? I started making it similar to Standardizer/UnivariateStandardizer transformers.

Does it go in a more or less right direction?

@ablaom
Copy link

ablaom commented Nov 12, 2020

Cool. Exciting to see convergence of differential geometry with MLJ!

I've had a quick look at this PR and I think understand the basic objective here.

  • You may want to consider restricting to the univariate case, ie, the case where your input X is an abstract vector of scientific type AbstractVector{<:ManifoldPoint}. The general case should be sorted by Universal table transformer combining univariate transformations dispatched on schema  JuliaAI/MLJModels.jl#288, which is about wrapping univariate transformers so that they are applied for each specified feature in a table (actually, given all the work you're now perfectly poised to sort out that issue!). I'm sorry we didn't anticipate this better in initial design.

  • since the ouput type is a tangent space, we are going to need a scientific type for tangent spaces, if this transformer is to be made available to all MLJ users. That's your plan, yes? Every registered transformer needs to have a output scientific type.

  • I think "TangentSpaceTransformer" is a perfectly fine name, but maybe "ManifoldToTangentSpaceTransformer" or "InverseTangentSpaceRetraction" are slightly more informative??

@mateuszbaran
Copy link
Member Author

  • You may want to consider restricting to the univariate case, ie, the case where your input X is an abstract vector of scientific type AbstractVector{<:ManifoldPoint}. The general case should be sorted by alan-turing-institute/MLJModels.jl#288, which is about wrapping univariate transformers so that they are applied for each specified feature in a table (actually, given all the work you're now perfectly poised to sort out that issue!). I'm sorry we didn't anticipate this better in initial design.

Yes, right, it would certainly make sense to centralize that functionality. That looks like a lot of work though and the current approach also works so I'd prefer to leave the multivariate functionality here for now and refactor it later.

  • since the ouput type is a tangent space, we are going to need a scientific type for tangent spaces, if this transformer is to be made available to all MLJ users. That's your plan, yes? Every registered transformer needs to have a output scientific type.

I think we have two issues here. First, I actually thought that it might make more sense to split the current functionality of TangentSpaceTransformer into transformation into a tangent space and splitting coordinates into columns. It's certainly my goal to provide both functionalities to all MLJ users. For the first part, the output scientific type already exists (Manifolds.jl has a manifold that represents a tangent space so ManifoldPoint scientific type can also be used for this). How can I do the splitting into multiple columns though? Can a registered transformer have multiple output types?

  • I think "TangentSpaceTransformer" is a perfectly fine name, but maybe "ManifoldToTangentSpaceTransformer" or "InverseTangentSpaceRetraction" are slightly more informative??

Right, ManifoldToTangentSpaceTransformer sounds better. I'll change the name.

@ablaom
Copy link

ablaom commented Nov 12, 2020

... so I'd prefer to leave the multivariate functionality here for now and refactor it later. 👍

... into transformation into a tangent space and splitting coordinates into columns.

Okay, I misunderstood. I thought your "tangent space" was a custom object. Rather, you identify elements of a tangent space with points in (a subspace of some) Euclidean space and you don't care to remember the base point or what subspace this is?

And you are then splitting the components of this representation into columns of a table?

And, just to be clear, the input of your Transformer can be either:

  • a single vector of manifold points, or

  • any table with one or more columns being such vectors

Correct?

Assuming the answer to these questions is yes, it would make sense to have specific input and output side types for a separate "univariate" transformer, well the generic transformer would have rather loose types like Table.

@mateuszbaran
Copy link
Member Author

Okay, I misunderstood. I thought your "tangent space" was a custom object. Rather, you identify elements of a tangent space with points in (a subspace of some) Euclidean space and you don't care to remember the base point or what subspace this is?

Well, not quite. In Manifolds.jl there are no strict requirements for the representation of tangent vectors. Usually, an isometric embedding into the Euclidean space is used to represent points and a similar representation is used for tangent vectors (with a notable exception of low-rank matrices). We also have another representation for tangent vectors, that is their coefficients in a certain basis (in this transformer it's stored here: https://github.com/JuliaManifolds/ManifoldML.jl/pull/5/files#diff-dd238691731421f4d569f6b8d8496c5a1eecfd8203b6a3e98fcb077218659909R30). Also, I wasn't quite sure here what to do with storing the point at which the vectors are tangent so currently it's stored in fitresult. When I extract the "transforming into tangent space" part that point will be stored in the manifold object.

From a more high-level point of view, this transformer is supposed to be useful as a part of for example principal geodesic analysis. In general, when a standard statistical method is generalized to manifolds, one can either use the minimization-based formulation and rewrite it using geometric concepts (which is the path that, with few exceptions, requires Manopt.jl) or take a more approximate route of selecting a decent linearization of data and applying standard methods to coefficients in a basis (which is what this tangent space transformer is useful for).

And you are then splitting the components of this representation into columns of a table?

Essentially, yes. That's how I can for example use the standard PCA code to do principal geodesic analysis.

And, just to be clear, the input of your Transformer can be either:

  • a single vector of manifold points, or
  • any table with one or more columns being such vectors

Correct?

Assuming the answer to these questions is yes, it would make sense to have specific input and output side types for a separate "univariate" transformer, well the generic transformer would have rather loose types like Table.

Yes, OK, I think I know now how to improve this.

@mateuszbaran
Copy link
Member Author

A small update: I'm planning to generalize it to chart transformers (with tangent space transformer as a special case) after JuliaManifolds/Manifolds.jl#325 is finished. After that I'll see if there are any updates to MLJ that would let me implement it a bit nicer.

@kellertuer
Copy link
Member

Cool, sorry that I did not have time to come back to ManifoldML.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Work in Progress (for a pull request)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants