Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SciPy sparse matrices to support reticulate? #141

Closed
eddelbuettel opened this issue Jun 25, 2017 · 15 comments
Closed

Support SciPy sparse matrices to support reticulate? #141

eddelbuettel opened this issue Jun 25, 2017 · 15 comments

Comments

@eddelbuettel
Copy link
Member

The reticulate package is pretty exciting as it gives us R access to underlying Python code, and is already used by the R packages for keras and tensorflow.

Over there, a discussion started about sparse matrices.

Would it be feasible to cover one or more use cases of bringing SciPy sparse matrices to R?

@binxiangni
Copy link
Contributor

I write a small program to convert SciPy sparse matrix to R. But where should I add it to? To RcppArmadillo? I have no idea how to put it in. Or should I create a new package like sparsio you mentioned before?

@eddelbuettel
Copy link
Member Author

Nice!

There are a few things we could do:

  • keep it in RcppArmadillo if you feel it has the best match and use there
    • have RcppArmadillo add a "Siuggests: reticulate"
    • have the script use requireNamespace("reticulate", quietlty=TRUE) to operate conditionally
    • that way the function for scipy import could be a regular function in RcppArmadillo
  • make it an external function or package
    • this made sense for sparsio because it also needed extra code

Which reminds me: I meant to get back to the sparsio author and ask if we could have an additonal function or option to just import X but not y -- as I see it, it always wants to import features (X) and prediction (y). So poking @dselivanov: Is that something we could get sparsio to do?

@dselivanov
Copy link

I personally think that most sense it will be to include such conversions to reticulate itself (Since it covers most common r - python data structures conversions).

@eddelbuettel
Copy link
Member Author

Could, on the other hand reticulate is a poor "connection" package free of application use.

What about my second question to use: sparsio to read just a feature matrix X, not left-hand side variable?

@dselivanov
Copy link

Why not. We read/write target only because of definition of the svmlight format.
But still I wouldn't consider basic linear algebra stuff as application... Any sparse matrix is just 3 arrays. @jjallaire any thoughts?

@eddelbuettel
Copy link
Member Author

Yes, a sparse matrix's internal representation is just three vectors, and @binxiangni has done most excellent work over this Google Summer of Code adding more and more converters to RcppArmadillo.

What would help further, I thought, was to use your sparsio to read more matrices to have more conversion tests. When I looked at sparsio it could only be for sparse ML examples with both a left and a right-hand side. So what I am asking you has nothing to do with reticulate: I am simply wondering if we could use sparsio for a little more 'i' of sparse matrices...

@dselivanov
Copy link

Ok, I got the point. By default sparsio writes dummy y = 0, so it can be considered as optional. But by the definition of svmlight target variable always should be there.

Another widespread format for storing just sparse matrices (without target) is MatrixMarket - but it is not insparsio (yet).

@eddelbuettel
Copy link
Member Author

This is now merged.

@dselivanov
Copy link

Still don't understand why you have decided to put it into RcppArmadillo...

@eddelbuettel
Copy link
Member Author

eddelbuettel commented Aug 9, 2017

What is "it" ?

If you wonder why SciPy sparse matrice support, well that is what the thread started with. And as optional component it makes perfect sense.

@dselivanov
Copy link

dselivanov commented Aug 9, 2017 via email

@eddelbuettel
Copy link
Member Author

Well, it is our package and it made sense to us.

We are on the other hand still waiting on you and sparsio :)

@dselivanov
Copy link

dselivanov commented Aug 9, 2017 via email

@eddelbuettel
Copy link
Member Author

eddelbuettel commented Aug 9, 2017

That's of course all possible. But to recap:

  • Binxiang has done a metric ton of work to improve Armadillo sparse matrix support in RcppArmadillo
  • We had some issues / potential issues with Armadillo being non-standard (was a bug, now fixed)
  • Having another source of sparse matrices was beneficial for testing; Python is widely used; reticulate is ready.
  • So we added this. It's still tiny: one small (currently unexported) helper function, a set of unit tests.

I see absolutely nothing wrong with that. We reached out to for help with sparse matrices via sparsio; I am still hopeful we can do that too. In the meantime we moved on with our package.

Should be on CRAN "soon".

@dselivanov
Copy link

dselivanov commented Aug 9, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants