Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dim problem with arma::field inputs #352

Merged
merged 4 commits into from
Nov 10, 2021
Merged

Fix dim problem with arma::field inputs #352

merged 4 commits into from
Nov 10, 2021

Conversation

BerriJ
Copy link
Contributor

@BerriJ BerriJ commented Nov 2, 2021

Attempts to fix #351

This correctly passes the dims of nested lists when arma::field is specified as input (so field from R to Cpp).

This implementation works with 1d and 2d fields. The generalization to 3d seems very straightforward. However, just adding a third loop and extending the dims results in an error at runtime: "product [x] do not match the length of object [x]".

This problem with 3d arrays most probably also affects "FieldImporter" (so field from C++ to R) since the following function fails with the same error:

#include <RcppArmadillo.h>
using namespace arma;

// [[Rcpp::export]]
field<mat> fieldtest1()
{
    field<mat> out(5, 5, 5);

    return out;
}

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 3, 2021

So heres an Overview of the progress so far. Consider this Cpp Function:

// [[Rcpp::export]]
arma::field<arma::mat> fieldtest2(arma::field<arma::mat> x)
{
    return x;
}

Creating a simple 1d list of matrices works as expected:

[R]  A <- matrix(1, ncol = 2, nrow = 3)
[R]  B <- matrix(4, ncol = 5, nrow = 6)
[R]  L <- list(A,A,A,A,A, B,B,B)
[R]  dim(L) <- c(8,1)

[R] fieldtest2(L)                          
     [,1]      
[1,] numeric,6 
[2,] numeric,6 
[3,] numeric,6 
[4,] numeric,6 
[5,] numeric,6 
[6,] numeric,30
[7,] numeric,30
[8,] numeric,30

You may wonder why I set [R] dim(L) <- c(8,1) and not just [R] dim(L) <- c(8) but c(8,1) is how fields of dimension 1 get passed to R from Cpp.

It also works for 2d lists:

[R] dim(L) <- c(4,2)                       

[R] fieldtest2(L)                          
     [,1]      [,2]      
[1,] numeric,6 numeric,6 
[2,] numeric,6 numeric,30
[3,] numeric,6 numeric,30
[4,] numeric,6 numeric,30

The question remains about how to deal with the 3D case. If we do not find a suitable solution we could fall back to the current implementation of flattening out the field (and maybe return a warning about that?).

@eddelbuettel
Copy link
Member

NB: I have NOT YET looked at your most recent change. Consider this semi-related

Played a bit more with creating field objects based on your first example:

// [[Rcpp::export]]
Rcpp::List fieldtest1() {
    field<cube> zap(1,2,1);
    zap.print("zap");

    field<mat> out(2, 2);
    out.print("out");

    field<vec> boo(2);
    boo.print("boo");

    return Rcpp::List::create(Rcpp::Named("fc") = zap,
                              Rcpp::Named("fm") = out,
                              Rcpp::Named("fv") = boo);
}

The above works, but using, say, field<cube> zap(2,2,2); fails and I now suspect this is due because we force everything to be a matrix (which is actually a bigger RcppArmadillo decision I now regret, we never do the drop=TRUE feature of one-row or one-col returns becoming a vector). So maybe by relaxing this for fields and just returning arrays we would be in business?

There are three tentative #define statements at the end of RcppArmadilloConfig.h

// To return arma::vec or arma::rowvec as R vector (i.e. dimensionless),
// one of the following macro can be defined before including
// RcppArmadillo.h. "ANYVEC" applys for both col- and row-vec.
//#define RCPP_ARMADILLO_RETURN_COLVEC_AS_VECTOR
//#define RCPP_ARMADILLO_RETURN_ROWVEC_AS_VECTOR
//#define RCPP_ARMADILLO_RETURN_ANYVEC_AS_VECTOR

We could see if altering their value helps.

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 3, 2021

Thanks again for your comments. I just pushed some additional changes and now it works.

Previously I missed out on RcppArmadilloWrap.h but adding n_slices there was crucial for importing to R.

I wrote some lines of demo code here.

@eddelbuettel
Copy link
Member

and now it works.

That would be sweet! I was tied up somewhere else (there was for once a bug in the RC for (Rcpp)Armadillo, plus other "stuff") but this is excellent new. I think we can add ample unit tests and illustrations as this matures.

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 3, 2021

I see that CI reports failing tests.

This affects test_rcpparmadillo.R.

At first glance, those are field/dim related and may need to be adjusted.

expect_equal( res[[4]][[1]], matrix(0:3, ncol = 2, nrow=2))#, msg = "field<int>" )
expect_equal( res[[4]][[2]], matrix(letters[1:4], ncol = 2, nrow=2))#, msg = "field<std::string>" ) 

@eddelbuettel
Copy link
Member

Yes, maybe comment the failing tests out 'for now' and add a big fat # TODO: Revisit or # FIXME comment? It is quite possibly that these were set up 'to suit the buggy behavior'.

@coatless
Copy link
Contributor

coatless commented Nov 4, 2021

Regarding the user test, I think that's because the list matrix is now being considered as an array instead of a matrix under the class() check.

https://github.com/RcppCore/RcppArmadillo/runs/4097501712?check_suite_focus=true#step:5:110

   call| expect_equal(res[[4]][[2]], matrix(letters[1:4], ncol = 2, nrow = 2))
   diff| Attributes: < Component "dim": Numeric: lengths (2, 3) differ >
   diff| target is matrix, current is array
  Error: 2 out of 259 tests failed

In R 4.0 higher, this object is considered to inherit both: matrix and array classes.

Quickly checking through an example gives:

A = matrix(1, ncol = 2, nrow = 3)
B = matrix(4, ncol = 5, nrow = 6)

listmatrix = matrix(list( ), 2, 2)

listmatrix[1, 1] = list(A)
listmatrix[1, 2] = list(B)
listmatrix[2, 1] = list(A)
listmatrix[2, 2] = list(B)

listmatrix
#          [,1]      [,2]      
# [1,] numeric,6 numeric,30
# [2,] numeric,6 numeric,30

typeof(listmatrix)
[1] "list"
class(listmatrix)
[1] "matrix" "array" 
# Pass into C++
recovered_listmatrix = arma_field_exporter(listmatrix)

# Check list recovery
class(recovered_listmatrix)
[1] "array"

typeof(recovered_listmatrix)
[1] "list"

I think when I penned #263, I had expected the class() declaration to be in a list form. After a few years of reflection, I'm not sure pursuing an underlying class change to list() would be ideal. Perhaps a better approach that should be taken is a wrap of the arma::field<> with Rcpp::List().

@eddelbuettel
Copy link
Member

In R 4.0 higher, this object is considered to inherit both: matrix and array classes.

Correct but shouldn't it have bitten us then for all that time too? (Maybe I am asleep, and tickling the bug needs our changes hence we only see it now...)

In any event I think there is reasonably little use of fields in the wild (within R land) that we can probably clean up a little. But we should get it right.

(Quick check)

Ok, proving myself wrong with one query. GH mirror of CRAN has 100+ hits for arma::field. So we can't be too radical.

@eddelbuettel
Copy link
Member

@BerriJ : Ok, we are all caught up at CRAN and with Armadillo upstream. Can you please rebase to master ?

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 6, 2021

@BerriJ : Ok, we are all caught up at CRAN and with Armadillo upstream. Can you please rebase to master ?

I did. I also squashed my commits since the first 2 were just fiddling around with it. Now it's one commit with all of my changes.

Ok, proving myself wrong with one query. GH mirror of CRAN has 100+ hits for arma::field. So we can't be too radical.

Maybe we can do a reverse dependency check and look how bad it is? I suspect that most people won't be affected by this change even if they use arma::fields. I think about the following:

  • Most people use arma::fields purely inside of C++ so they won't be affected at all.

  • People who use fields as input arguments should be really rare. Some may use 1d fields as input because it works as expected, but they should not be affected by this change at all. I don't think that anyone passes 2d or 3d lists to an arma::field. First, because the 2nd and 3rd dimensions will get lost, and second because the creation in R is not easy and I suspect that many people do not know about 2d and 3d lists in R.

  • People who return arma::field to R will get the biggest hit. Currently, they do this using 1d and 2d fields only since 3d fields do not work. With the proposed changes in RcppArmadilloWrap.h they will get another dimension. So depending on if (and how) they use those output objects they may have to add an extra comma. But that should be all they have to do.

@eddelbuettel
Copy link
Member

Maybe we can do a reverse dependency check and look how bad it is?

Yup. I have the machinery for it, and fresh baselines from the three tests I ran for RcppArmadillo 0.10.7.3.0 its predecessor release candidates. Stay tuned!

@eddelbuettel
Copy link
Member

So the test run and the result has been comitted as usual with the summary in the table below.

That is a lot of shrapnel. We cannot merge the PR as is. I have been thinking about this while it was running and the results were accumulating towards this outcome: could you possibly rejig the fix in the PR such that it is opt-in? In other words set up a #define to switch in the two headers (or even switch within the two -- for wrap it seems easy to do in the file; for as we could do either way--in the file maybe simpler but less clear to read?).

That way you get to opt into new and corrected fields behaviour, and so can other if they choose but by default nothing changes. We can then contact maintainer and suggest updates. (I am currently doing this (since April !!) for Rcpp here and need to do one for RcppEigen to get to 3.4.0 so this won't be imminent but it is both doable and should be done.)

Test of RcppArmadillo 0.10.7.3.1 had 893 successes, 26 failures, and 8 skipped packages. 
Ran from 2021-11-06 08:48:31.83 to 2021-11-06 21:30:47.01 for 12.704 hours 
Average of 49.337 secs relative to 294.11 secs using 6 runners

Failed packages:  ADMM, AlphaSimR, Bayesrel, BGVAR, carat, cIRT, EMbC, FLightR, FMCCSD, grpsel, HDSpatialScan, higlasso, kernelPSI, meshed, netClust, NetMix, odpc, pdSpecEst, perccal, psgp, quanteda.textstats, rstpm2, simts, spamtree, T4transport, wv 

Skipped packages:  BMTME, Crossover, DataVisualizations, dynamichazard, growfunctions, joineRML, nlmixr, Rfast 

None still working

None still scheduled

Error summary:
               package missingPkg badInstall error fail warn note ok hasOtherIssue
 1:               ADMM                 FALSE     1    0    0    3 10         FALSE  likely new issue
 2:          AlphaSimR                 FALSE     0    0    0    6  8         FALSE  likely new issue
 3:           Bayesrel                 FALSE     0    0    0    0 14         FALSE  known earlier issue
 4:              BGVAR                 FALSE     4    0    0   10  0         FALSE  upstream, passes with new version
 5:              carat                 FALSE     0    0    0    4 10         FALSE  likely new issue
 6:               cIRT                 FALSE     0    0    0    1 13         FALSE  likely new issue
 7:               EMbC                 FALSE     0    0    0    0 14         FALSE  known earlier issue 
 8:            FLightR                 FALSE     0    0    0    7  7         FALSE  known earlier issue
 9:             FMCCSD                 FALSE     0    0    0    0 14         FALSE  likely new issue
10:             grpsel                 FALSE     0    0    0    1 13         FALSE  likely new issue
11:      HDSpatialScan                  TRUE     0    0    0    0 14         FALSE  known earlier issue
12:           higlasso                 FALSE     0    0    0    0 14         FALSE  likely new issue
13:          kernelPSI                 FALSE     0    0    0    0 14         FALSE  likely new issue
14:             meshed                 FALSE     1    0    0    6  7         FALSE  likely new issue
15:           netClust                 FALSE     0    0    0    3 11         FALSE  likely new issue
16:             NetMix                 FALSE     0    0    0    0 14         FALSE  likely new issue
17:               odpc                 FALSE     0    0    0    0 14         FALSE  likely new issue
18:          pdSpecEst                 FALSE     0    0    0   13  1         FALSE  likely new issue
19:            perccal                 FALSE     0    0    0    2 12         FALSE  likely new issue
20:               psgp                  TRUE     0    0    0    4 10         FALSE  known earlier issue
21: quanteda.textstats                 FALSE    10    0    0    2  2         FALSE  known earlier issue
22:             rstpm2                 FALSE     0    0    0    6  8         FALSE  known earlier issue
23:              simts                 FALSE     0    0    0    7  7         FALSE  likely new issue
24:           spamtree                 FALSE     1    0    0    4  9         FALSE  likely new issue
25:        T4transport                 FALSE     0    0    0    0 14         FALSE  likely new issue
26:                 wv                 FALSE     0    0    0    7  7         FALSE  likely new issue
               package missingPkg badInstall error fail warn note ok hasOtherIssue

Copy link
Member

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work also to re-enable the tests/

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 7, 2021

Nice work also to re-enable the tests/

Thank you. So I simply defined 2 variables:
// #define RCPP_ARMADILLO_FIX_FieldExporter
// #define RCPP_ARMADILLO_FIX_FieldImporter
to control the behavior in the 2 files respectively.

However, I could not test it. I tried simply running those functions:

test.cpp:

#define RCPP_ARMADILLO_FIX_FieldExporter
#define RCPP_ARMADILLO_FIX_FieldImporter

#include <RcppArmadillo.h>
using namespace arma;

// [[Rcpp::export]]
arma::field<arma::mat> fieldtest1()
{
    arma::field<arma::mat> out(5, 5);

    return out;
}

// [[Rcpp::export]]
arma::field<arma::mat> fieldtest2(arma::field<arma::mat> x)
{

    arma::cout << x.n_rows << arma::endl;
    arma::cout << x.n_cols << arma::endl;
    arma::cout << x.n_slices << arma::endl;

    return x;
}

after executing devtools::load_all(). Probably my setup does not catch that #include <RcppArmadillo.h> late enough. Putting those #define statements in the 2 files directly worked fine. Maybe you have an idea how you would test that?

I hope I got it. Ci looks good at least 👍 .

Also, it would be very interesting to make 2 additional reverse dep checks to see which change is more problematic.

Thanks so far, Dirk.

@eddelbuettel
Copy link
Member

Maybe you have an idea how you would test that?

We should be able to add something. One approach is simply ... another cpp source file with the #defines set as you have above. I should be able to probe (and we can add that post merge).

New reverse depends run started and now past the first former failure so fingers crossed ...

@coatless
Copy link
Contributor

coatless commented Nov 7, 2021

So the test run and the result has been comitted as usual with the summary in the table below.

Odd to see cIRT appear under that header. Could this be picking up the new macos m1 check notes on CRAN?

@eddelbuettel
Copy link
Member

Absolutely not. The rev.dep box is a plain standard Linux machine with an Intel CPU. Always has been. What happens in other run on other machines under other OSs is, well, something else.

It is a regular error likely caused by the field changes:

Running examples in ‘cIRT-Ex.R’ failed
The error most likely occurred in:

> ### Name: direct_sum
> ### Title: Direct Sum of Matrices
> ### Aliases: direct_sum
> 
> ### ** Examples
> 
> 
> x = list(matrix(0, nrow = 5, ncol = 3),
+          matrix(1, nrow = 5, ncol = 3))
> direct_sum(x)
Error in direct_sum(x) : 
  Not compatible with requested type: [type=NULL; target=integer].
Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ... SKIPPED
* checking re-building of vignette outputs ... SKIPPED
* DONE
Status: 1 ERROR, 1 NOTE

@eddelbuettel
Copy link
Member

Last rev.dep was of course again very clean, see its commit

So I think we're ready, modulo maybe reflecting on the previous comment. Do you want to fold that in?

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 8, 2021

I would propose to merge as is. The main functionality of exporting and importing fields works fine and always returning fields with 3 dimensions is something most people can easily live with I think.

One could think about adding an additional flag for dropping dimensions (something like drop=TRUE in R). But I would propose to postpone this for now since I'm quite busy.

@eddelbuettel
Copy link
Member

Hi @BerriJ sorry for this taking a moment but we may as well get it right. I just committed (to master) a file local/armafield.cpp which creates a few field objects, prints them to stdout and tries to return them to R. The last one a 3-d field, and this still fails (under master and under your PR), and I think it shouldn't.

@coatless
Copy link
Contributor

coatless commented Nov 9, 2021

@eddelbuettel agreed. The proposed change is still a bit problematic. There shouldn't be a reason for why cIRT::direct_sum() is failing as that routine interacts with a field object using only index notation.

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 9, 2021

Hi @BerriJ sorry for this taking a moment but we may as well get it right. I just committed (to master) a file local/armafield.cpp which creates a few field objects, prints them to stdout and tries to return them to R. The last one a 3-d field, and this still fails (under master and under your PR), and I think it shouldn't.

Did you maybe miss executing a git push? I can't find the commit you refer to. Looked here.

@eddelbuettel
Copy link
Member

I did indeed, sorry for that and thanks for the heads-up. Sometimes I think the 'sequence' is muscle memory in magit but every now and it isn't.

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 9, 2021

I'm pretty shure I got cIRT::direct_sum() working now. The problem was that it crashed at arma::ivec dims = data.attr("dim"); for List inputs that had dim() = NULL. I just added a check > if(!Rf_isNull(data.attr("dim")))

If dims exist it will set the field size according to dims, else it will just make a 1D field with length of the list (basically the old behavior).

I still have to look into your file @eddelbuettel .

@eddelbuettel
Copy link
Member

I just added the two define statements and we now get (for the last example that previously failed)

> str(a3)
List of 12
 $ : num[0 , 0 ] 
 $ : num[0 , 0 ] 
 $ : num [1:2, 1:3] -1.024 1.534 -0.894 -0.49 0.597 ...
 $ : num[0 , 0 ] 
 $ : num[0 , 0 ] 
 $ : num [1:2, 1:4] 0.0648 -0.2241 1.5294 -0.4769 0.3206 ...
 $ : num[0 , 0 ] 
 $ : num [1:3, 1:2] 1.4157 1.6768 -0.0524 -0.1988 -1.3985 ...
 $ : num[0 , 0 ] 
 $ : num[0 , 0 ] 
 $ : num[0 , 0 ] 
 $ : num[0 , 0 ] 
 - attr(*, "dim")= int [1:3] 2 3 2
> 
> class(a3)
[1] "array"
> dim(a3)
[1] 2 3 2
> 

The indexing is still ... awkward but I am also not that familiar with arrays where we have more than two dimensions.
which is so much better.

@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 9, 2021

Sorry for adding another commit. I realized that I had left #define RCPP_ARMADILLO_FIX_FieldExporter in RcppArmadilloAs.h.
The code in local/armafield.cpp now runs exactly as I expect it to.

So as far as I see we are done now? Do you want to run the rev dep checks another time?

@eddelbuettel
Copy link
Member

Yes I think we're there, or very close.

The #ifdef protection should mean to harm over at the CRAN side and those who want it (you ;-) ) can opt into it now. We can then slowly and patiently email those with failing packages and suggest they turn the #define and adjust their code so that "eventually" we can turn it on by default.

@eddelbuettel
Copy link
Member

Oh, the one thing I forgot: as I made one change to master adding the fragment of a to-be-refined test script, could you be please rebase once more?

This the dims of arma::field when specified as input or output.

This implementation works with 1d, 2d and 3d fields.
This also fixes a problem occuring with cIRT::direct_sum()
@BerriJ
Copy link
Contributor Author

BerriJ commented Nov 9, 2021

Oh, the one thing I forgot: as I made one change to master adding the fragment of a to-be-refined test script, could you be please rebase once more?

I did, no problem. :-)

@eddelbuettel
Copy link
Member

Thanks for your patience and diligence on this!

@eddelbuettel eddelbuettel merged commit 018a8be into RcppCore:master Nov 10, 2021
eddelbuettel added a commit that referenced this pull request Nov 10, 2021
eddelbuettel added a commit that referenced this pull request Nov 10, 2021
@BerriJ BerriJ deleted the fix-field_dims branch November 10, 2021 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loss of dim 2 when using field as input
3 participants