Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quanteda regression #161

Closed
eddelbuettel opened this issue Aug 17, 2017 · 16 comments
Closed

Quanteda regression #161

eddelbuettel opened this issue Aug 17, 2017 · 16 comments

Comments

@eddelbuettel
Copy link
Member

Those watching this repo and hence seeing this probably also follow the rcpp-devel list.

The quanteda by @kbenoit et al package came up as it at times seems to fail tests for me / us / CRAN. Now that RcppArmadillo 0.760.1.0 is on CRAN, it is seen to let quanteda fail 14 times:

* checking tests ... [59s/36s] ERROR
  Running ‘testthat.R’ [58s/36s]
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  testthat results ================================================================
  OK: 1390 SKIPPED: 19 FAILED: 17
  1. Error: test textplot_scale1d wordfish in the most basic way (@test-plots.R#96) 
  2. Error: textmodel-ca works as expected as ca::ca : use mt (@test-textmodel_ca.R#27) 
  3. Error: textmodel-ca(sparse) works as expected on another dataset (@test-textmodel_ca.R#68) 
  4. Error: textmodel-wordfish works as expected: dense vs sparse vs sparse+mt (@test-textmodel_wordfish.R#22) 
  5. Error: print/show/summary method works as expected (@test-textmodel_wordfish.R#32) 
  6. Error: coef works for wordfish fitted (@test-textmodel_wordfish.R#64) 
  7. Error: textmodel-wordfish works for quasipoisson - feature as expected: dense vs sparse vs sparse+mt (@test-textmodel_wordfish.R#81) 
  8. Error: textmodel-wordfish works for quasipoisson - overall as expected: dense vs sparse vs sparse+mt (@test-textmodel_wordfish.R#103) 
  9. Error: textmodel-wordfish (sparse) works as expected on another dataset (@test-textmodel_wordfish.R#119) 
  1. ...
  
  Error: testthat unit tests failed
  Execution halted
* checking for unstated dependencies in vignettes ... OK

Most of these are of the dfmSparse is not supported variety leading to our as<>() caster.

For testing, I downgraded to the previously release version 0.7.900.2.0 of RcppArmadillo -- and with it quateda 0.99 (which itself only got onto CRAN after we submitted) passes.

@binxiangni Could you possibly take a quick look if either we (or quanteda) can accomodate with a simple cast? I can take a look too but I am simply not that deep in the sparse code ...

@kbenoit
Copy link

kbenoit commented Aug 17, 2017

If we can help please let me know, I am on holidays for a week from Sunday but @koheiw and/or @HaiyanLW could assist with this. dfmSparse inherits from Matrix::dgCMatrix and we could easily change our code drop back to that before sending to RcppArmadillo functions if needed.

@eddelbuettel
Copy link
Member Author

eddelbuettel commented Aug 17, 2017

Appreciate the super-quick reply, Ken! I suspected something exactly along those lines. It could be our code is too simplistic ie in R I learned to test via inherits() rather than == for objects that subclass as eg in

R> x <- xts(1:3, order.by=Sys.time()+0:2)
R> class(x)
[1] "xts" "zoo"
R> 

Can you point us too where dfmSparse is set up? Maybe we need to change how we check, maybe it is your side, ... but I am sure we get this squared.

This was essentially a "regression test race condition" as we both made changes at the same time. But there is a lot more sparse matrix code on our side which will hopefully help you all too.

@eddelbuettel
Copy link
Member Author

It think it may be as simple as your

setClass("dfmSparse",
         contains = c("dfm", "dgCMatrix"))

not meshing with our side of

            // Get the type of sparse matrix
            std::string type = Rcpp::as<std::string>(mat.slot("class"));
            if (type == "dgCMatrix") {

@eddelbuettel
Copy link
Member Author

It all works if we just make this change to RcppArmadillo:

modified   inst/include/RcppArmadilloAs.h
@@ -97,7 +97,7 @@ namespace traits {
             
             // Get the type of sparse matrix
             std::string type = Rcpp::as<std::string>(mat.slot("class"));
-            if (type == "dgCMatrix") {
+            if (type == "dgCMatrix" || type == "dfmSparse") {
                 IntegerVector i = mat.slot("i");
                 IntegerVector p = mat.slot("p");
                 Vector<RTYPE> x = mat.slot("x");

@dmbates Is there a more S4-ish trick which could be done from client-package side? We can't possibly enumerate all possible S4 classes on our end, but we also shouldn't have to force people to stick with dgcMatrix and the like. I am clearly missing a (possibly simple) piece here...

@eddelbuettel
Copy link
Member Author

I think I have idea:

  • In quanteda, we provide a proxy class quantedaMat (or qmat for short) which inherist from arma::spmat.
  • It checks that the incoming class is dfmSparse, and then sets it to dgCMatrix.

Thoughts?

@eddelbuettel
Copy link
Member Author

Sounds like we exactly that (from today's R-devel NEWS feed)

CHANGES IN R-devel BUG FIXES

Internal C-level ‘R_check_class_and_super()’ and hence ‘R_check_class_etc()’ now also consider non-direct super classes and hence return a match in more cases. This e.g., fixes behaviour of derived classes in package ‘Matrix’.

@eddelbuettel
Copy link
Member Author

Realized (on train) that the shim class may create problems as it will likely affect behaviour once back in R too.

So the better bet may be to go with is() as suggsted by Kurt:

R> class(dfm1)
[1] "dfmSparse"
attr(,"package")
[1] "quanteda"
R> "dgCMatrix" %in% methods::is(dfm1)          ## this does the Right Thing (TM)
[1] TRUE
R> 

@binxiangni
Copy link
Contributor

Sorry, I don't understand your latest solution. Do you mean a minor modification on the quanteda side? I found it hard to change the conditional statement in as<>(), since the slot type is just a character vector and we cannot detect its inheritance in Rcpp.

@eddelbuettel
Copy link
Member Author

We can access the line "dgCMatrix" %in% methods::is(dfm1) because we can access R functions. It is kinda ugly, but the thing we need here.

So I guess I will implement that, and we then change all your class == "thisClass" string comparisons with an additional || isIt(class, "thisClass") where the to-be-witten isIt() will basically do this one line using is().

With the news from R-devel we probably have a chance to conditionally on R-devel (and then R 3.5.* next) do it in C code. And as the || is less expensive at the C++ level, I think we're fine.

In short: we just generalize your comparisons. At least, that is my current plan ... but then I am also currently at work and not looking at this.

@eddelbuettel
Copy link
Member Author

The following works (which is a start ;-) but feels a little ... rushed.

Should the helper function be somewhere else? In Rcpp?

modified   inst/include/RcppArmadilloAs.h
@@ -25,6 +25,19 @@
 
 namespace Rcpp{
 
+    inline bool isIt(const std::string cls, Rcpp::S4 s) {
+        static Rcpp::Environment mthds("package:methods");
+        static Rcpp::Function is = mthds["is"];
+  
+        Rcpp::CharacterVector res = is(s);
+        bool itis = false;
+        for (int i=0; !itis && i<res.size(); i++) {
+            itis = strcmp(cls.c_str(), res[i]) == 0;
+        }
+        return itis;
+    }
+
+    
 namespace traits {
 
     template <typename T> 
@@ -97,7 +110,7 @@ namespace traits {
             
             // Get the type of sparse matrix
             std::string type = Rcpp::as<std::string>(mat.slot("class"));
-            if (type == "dgCMatrix") {
+            if (type == "dgCMatrix" || isIt("dgCMatrix", mat)) {
                 IntegerVector i = mat.slot("i");
                 IntegerVector p = mat.slot("p");
                 Vector<RTYPE> x = mat.slot("x");
@@ -110,7 +123,7 @@ namespace traits {
                 std::copy(p.begin(), p.end(), arma::access::rwp(res.col_ptrs));
                 std::copy(x.begin(), x.end(), arma::access::rwp(res.values));
             }
-            else if (type == "dtCMatrix") {
+            else if (type == "dtCMatrix" || isIt("dtCMatrix", mat)) {
                 IntegerVector i = mat.slot("i");
                 IntegerVector p = mat.slot("p");
                 Vector<RTYPE> x = mat.slot("x");
@@ -128,7 +141,7 @@ namespace traits {
                     res.diag().ones();
                 }
             }
-            else if (type == "dsCMatrix") {
+            else if (type == "dsCMatrix" || isIt("dsCMatrix", mat)) {
                 IntegerVector i = mat.slot("i");
                 IntegerVector p = mat.slot("p");
                 Vector<RTYPE> x = mat.slot("x");
@@ -148,7 +161,7 @@ namespace traits {
                     res = symmatl(res);
                 }
             }
-            else if (type == "dgTMatrix") {
+            else if (type == "dgTMatrix" || isIt("dgTMatrix", mat)) {
                 IntegerVector ti = mat.slot("i");
                 IntegerVector tj = mat.slot("j");
                 Vector<RTYPE> tx = mat.slot("x");
@@ -200,7 +213,7 @@ namespace traits {
                 std::copy(p.begin(), p.end(), arma::access::rwp(res.col_ptrs));
                 std::copy(x.begin(), x.end(), arma::access::rwp(res.values));
             }
-            else if (type == "dtTMatrix") {
+            else if (type == "dtTMatrix" || isIt("dtTMatrix", mat)) {
                 IntegerVector ti = mat.slot("i");
                 IntegerVector tj = mat.slot("j");
                 Vector<RTYPE> tx = mat.slot("x");
@@ -257,7 +270,7 @@ namespace traits {
                     res.diag().ones();
                 }
             }
-            else if (type == "dsTMatrix") {
+            else if (type == "dsTMatrix" || isIt("dsTMatrix", mat)) {
                 IntegerVector ti = mat.slot("i");
                 IntegerVector tj = mat.slot("j");
                 Vector<RTYPE> tx = mat.slot("x");
@@ -316,7 +329,7 @@ namespace traits {
                     res = symmatl(res);
                 }
             }
-            else if (type == "dgRMatrix") {
+            else if (type == "dgRMatrix" || isIt("dgRMatrix", mat)) {
                 IntegerVector rj = mat.slot("j");
                 IntegerVector rp = mat.slot("p");
                 Vector<RTYPE> rx = mat.slot("x");
@@ -366,7 +379,7 @@ namespace traits {
                 std::copy(p.begin(), p.end(), arma::access::rwp(res.col_ptrs));
                 std::copy(x.begin(), x.end(), arma::access::rwp(res.values));
             }
-            else if (type == "dtRMatrix") {
+            else if (type == "dtRMatrix" || isIt("dtRMatrix", mat)) {
                 IntegerVector rj = mat.slot("j");
                 IntegerVector rp = mat.slot("p");
                 Vector<RTYPE> rx = mat.slot("x");
@@ -421,7 +434,7 @@ namespace traits {
                     res.diag().ones();
                 }
             }
-            else if (type == "dsRMatrix") {
+            else if (type == "dsRMatrix" || isIt("dsRMatrix", mat)) {
                 IntegerVector rj = mat.slot("j");
                 IntegerVector rp = mat.slot("p");
                 Vector<RTYPE> rx = mat.slot("x");
@@ -478,7 +491,7 @@ namespace traits {
                     res = symmatl(res);
                 }
             }
-            else if (type == "indMatrix") {
+            else if (type == "indMatrix" || isIt("indMatrix", mat)) {
                 std::vector<int> i;
                 IntegerVector p(ncol + 1);
                 IntegerVector x(nrow, 1);
@@ -519,7 +532,7 @@ namespace traits {
                 std::copy(p.begin(), p.end(), arma::access::rwp(res.col_ptrs));
                 std::copy(x.begin(), x.end(), arma::access::rwp(res.values));
             }
-            else if (type == "pMatrix") {
+            else if (type == "pMatrix" || isIt("pMatrix", mat)) {
                 std::vector<int> i;
                 IntegerVector p(ncol + 1);
                 IntegerVector x(ncol, 1);
@@ -550,7 +563,7 @@ namespace traits {
                 std::copy(p.begin(), p.end(), arma::access::rwp(res.col_ptrs));
                 std::copy(x.begin(), x.end(), arma::access::rwp(res.values));
             }
-            else if (type == "ddiMatrix") {
+            else if (type == "ddiMatrix" || isIt("ddiMatrix", mat)) {
                 std::vector<int> i;
                 std::vector<int> p;
                 std::vector<double> x;

@kevinushey
Copy link
Contributor

kevinushey commented Aug 18, 2017

It looks like S4 objects already do have an is() method -- perhaps we can just call that? E.g.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
bool cpp_inherits(S4 object, std::string clazz)
{
  return object.is(clazz);
}

/*** R
library(Matrix)
m <- Matrix(1)
cpp_inherits(m, "dsyMatrix")
cpp_inherits(m, "Matrix")
*/

@eddelbuettel
Copy link
Member Author

Crikey. Did I just reimplement that?

@kevinushey
Copy link
Contributor

More or less :-)

The Rcpp code does something similar, but extracts the class definition directly and looks to see if the compared class is a known parent class.

@eddelbuettel
Copy link
Member Author

And exactly what we needed. Thanks for cluebat, Kevin :)

R> cpp_inherits(dfm1, "dgCMatrix")     # after library(quanteda); example(dfm)
[1] TRUE
R> 

@eddelbuettel
Copy link
Member Author

Well I didn't commit it so maybe it doesn't count, right? ;-)

That'll make the right generalization for the test and is what we needed. All good.

@eddelbuettel
Copy link
Member Author

PR #162 which fixes this has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants