New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SciPy sparse matrices to support reticulate? #141

Closed
eddelbuettel opened this Issue Jun 25, 2017 · 15 comments

Comments

Projects
None yet
3 participants
@eddelbuettel
Member

eddelbuettel commented Jun 25, 2017

The reticulate package is pretty exciting as it gives us R access to underlying Python code, and is already used by the R packages for keras and tensorflow.

Over there, a discussion started about sparse matrices.

Would it be feasible to cover one or more use cases of bringing SciPy sparse matrices to R?

@binxiangni

This comment has been minimized.

Show comment
Hide comment
@binxiangni

binxiangni Aug 5, 2017

Contributor

I write a small program to convert SciPy sparse matrix to R. But where should I add it to? To RcppArmadillo? I have no idea how to put it in. Or should I create a new package like sparsio you mentioned before?

Contributor

binxiangni commented Aug 5, 2017

I write a small program to convert SciPy sparse matrix to R. But where should I add it to? To RcppArmadillo? I have no idea how to put it in. Or should I create a new package like sparsio you mentioned before?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 5, 2017

Member

Nice!

There are a few things we could do:

  • keep it in RcppArmadillo if you feel it has the best match and use there
    • have RcppArmadillo add a "Siuggests: reticulate"
    • have the script use requireNamespace("reticulate", quietlty=TRUE) to operate conditionally
    • that way the function for scipy import could be a regular function in RcppArmadillo
  • make it an external function or package
    • this made sense for sparsio because it also needed extra code

Which reminds me: I meant to get back to the sparsio author and ask if we could have an additonal function or option to just import X but not y -- as I see it, it always wants to import features (X) and prediction (y). So poking @dselivanov: Is that something we could get sparsio to do?

Member

eddelbuettel commented Aug 5, 2017

Nice!

There are a few things we could do:

  • keep it in RcppArmadillo if you feel it has the best match and use there
    • have RcppArmadillo add a "Siuggests: reticulate"
    • have the script use requireNamespace("reticulate", quietlty=TRUE) to operate conditionally
    • that way the function for scipy import could be a regular function in RcppArmadillo
  • make it an external function or package
    • this made sense for sparsio because it also needed extra code

Which reminds me: I meant to get back to the sparsio author and ask if we could have an additonal function or option to just import X but not y -- as I see it, it always wants to import features (X) and prediction (y). So poking @dselivanov: Is that something we could get sparsio to do?

@dselivanov

This comment has been minimized.

Show comment
Hide comment
@dselivanov

dselivanov Aug 5, 2017

I personally think that most sense it will be to include such conversions to reticulate itself (Since it covers most common r - python data structures conversions).

dselivanov commented Aug 5, 2017

I personally think that most sense it will be to include such conversions to reticulate itself (Since it covers most common r - python data structures conversions).

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 5, 2017

Member

Could, on the other hand reticulate is a poor "connection" package free of application use.

What about my second question to use: sparsio to read just a feature matrix X, not left-hand side variable?

Member

eddelbuettel commented Aug 5, 2017

Could, on the other hand reticulate is a poor "connection" package free of application use.

What about my second question to use: sparsio to read just a feature matrix X, not left-hand side variable?

@dselivanov

This comment has been minimized.

Show comment
Hide comment
@dselivanov

dselivanov Aug 5, 2017

Why not. We read/write target only because of definition of the svmlight format.
But still I wouldn't consider basic linear algebra stuff as application... Any sparse matrix is just 3 arrays. @jjallaire any thoughts?

dselivanov commented Aug 5, 2017

Why not. We read/write target only because of definition of the svmlight format.
But still I wouldn't consider basic linear algebra stuff as application... Any sparse matrix is just 3 arrays. @jjallaire any thoughts?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 5, 2017

Member

Yes, a sparse matrix's internal representation is just three vectors, and @binxiangni has done most excellent work over this Google Summer of Code adding more and more converters to RcppArmadillo.

What would help further, I thought, was to use your sparsio to read more matrices to have more conversion tests. When I looked at sparsio it could only be for sparse ML examples with both a left and a right-hand side. So what I am asking you has nothing to do with reticulate: I am simply wondering if we could use sparsio for a little more 'i' of sparse matrices...

Member

eddelbuettel commented Aug 5, 2017

Yes, a sparse matrix's internal representation is just three vectors, and @binxiangni has done most excellent work over this Google Summer of Code adding more and more converters to RcppArmadillo.

What would help further, I thought, was to use your sparsio to read more matrices to have more conversion tests. When I looked at sparsio it could only be for sparse ML examples with both a left and a right-hand side. So what I am asking you has nothing to do with reticulate: I am simply wondering if we could use sparsio for a little more 'i' of sparse matrices...

@dselivanov

This comment has been minimized.

Show comment
Hide comment
@dselivanov

dselivanov Aug 5, 2017

Ok, I got the point. By default sparsio writes dummy y = 0, so it can be considered as optional. But by the definition of svmlight target variable always should be there.

Another widespread format for storing just sparse matrices (without target) is MatrixMarket - but it is not insparsio (yet).

dselivanov commented Aug 5, 2017

Ok, I got the point. By default sparsio writes dummy y = 0, so it can be considered as optional. But by the definition of svmlight target variable always should be there.

Another widespread format for storing just sparse matrices (without target) is MatrixMarket - but it is not insparsio (yet).

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 8, 2017

Member

This is now merged.

Member

eddelbuettel commented Aug 8, 2017

This is now merged.

@dselivanov

This comment has been minimized.

Show comment
Hide comment
@dselivanov

dselivanov Aug 9, 2017

Still don't understand why you have decided to put it into RcppArmadillo...

dselivanov commented Aug 9, 2017

Still don't understand why you have decided to put it into RcppArmadillo...

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 9, 2017

Member

What is "it" ?

If you wonder why SciPy sparse matrice support, well that is what the thread started with. And as optional component it makes perfect sense.

Member

eddelbuettel commented Aug 9, 2017

What is "it" ?

If you wonder why SciPy sparse matrice support, well that is what the thread started with. And as optional component it makes perfect sense.

@dselivanov

This comment has been minimized.

Show comment
Hide comment
@dselivanov

dselivanov Aug 9, 2017

dselivanov commented Aug 9, 2017

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 9, 2017

Member

Well, it is our package and it made sense to us.

We are on the other hand still waiting on you and sparsio :)

Member

eddelbuettel commented Aug 9, 2017

Well, it is our package and it made sense to us.

We are on the other hand still waiting on you and sparsio :)

@dselivanov

This comment has been minimized.

Show comment
Hide comment
@dselivanov

dselivanov Aug 9, 2017

dselivanov commented Aug 9, 2017

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Aug 9, 2017

Member

That's of course all possible. But to recap:

  • Binxiang has done a metric ton of work to improve Armadillo sparse matrix support in RcppArmadillo
  • We had some issues / potential issues with Armadillo being non-standard (was a bug, now fixed)
  • Having another source of sparse matrices was beneficial for testing; Python is widely used; reticulate is ready.
  • So we added this. It's still tiny: one small (currently unexported) helper function, a set of unit tests.

I see absolutely nothing wrong with that. We reached out to for help with sparse matrices via sparsio; I am still hopeful we can do that too. In the meantime we moved on with our package.

Should be on CRAN "soon".

Member

eddelbuettel commented Aug 9, 2017

That's of course all possible. But to recap:

  • Binxiang has done a metric ton of work to improve Armadillo sparse matrix support in RcppArmadillo
  • We had some issues / potential issues with Armadillo being non-standard (was a bug, now fixed)
  • Having another source of sparse matrices was beneficial for testing; Python is widely used; reticulate is ready.
  • So we added this. It's still tiny: one small (currently unexported) helper function, a set of unit tests.

I see absolutely nothing wrong with that. We reached out to for help with sparse matrices via sparsio; I am still hopeful we can do that too. In the meantime we moved on with our package.

Should be on CRAN "soon".

@dselivanov

This comment has been minimized.

Show comment
Hide comment
@dselivanov

dselivanov Aug 9, 2017

dselivanov commented Aug 9, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment