Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds size attribute aliases for DataFrame (closes #630) #638

Merged
merged 3 commits into from
Jan 23, 2017
Merged

Adds size attribute aliases for DataFrame (closes #630) #638

merged 3 commits into from
Jan 23, 2017

Conversation

coatless
Copy link
Contributor

No description provided.

@codecov-io
Copy link

codecov-io commented Jan 21, 2017

Current coverage is 92.50% (diff: 100%)

Merging #638 into master will not change coverage

@@             master       #638   diff @@
==========================================
  Files            65         65          
  Lines          3309       3309          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           3061       3061          
  Misses          248        248          
  Partials          0          0          

Powered by Codecov. Last update 271148a...e9db599

@@ -83,6 +83,26 @@ namespace Rcpp{
return LENGTH(rn);
}

// Leave in place for a year
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have consensus that we what or need that.

I presume I could get my arm twisted to allow nrow and nrows; I see nothing but needed pain in removing the old one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal behind the PR was to align the DataFrame class to match with:

  1. the R functions ncol() and nrow();
  2. the Matrix class member functions .nrow(), .ncol(), .rows(), .cols().

In an ideal world, the old interface probably should be cleaned up as a result. But, it most certainly can be left in place and the nrows() aspect can be omitted from documentation.

By my count, the following ~20 packages on CRAN would be affected:

  • BacArena
  • Biocomb
  • Cyclops
  • deepboost
  • dplyr
  • FeatureHashing
  • FIT
  • feather
  • gaston
  • medfate
  • missDeaths
  • Rfast
  • Rborist
  • raptr
  • readr
  • readstat13
  • reshape2
  • tweenr
  • valr
  • wsrf

Copy link
Member

@eddelbuettel eddelbuettel Jan 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no interest / time / energy to chase twenty package maintainers over essentially nothing.

I have tried to get five to change an error of theirs for three weeks now and only got half. No, we are not going to break that interface.

Additional accessors are perfectly fine.

@eddelbuettel
Copy link
Member

I took the language about deprecation out and simplified a little. Now it is a straight-up extension of the API, plus extra tests, and should not create any trouble.

Second set of eyes anyone? @thirdwing @nathan-russell @kevinushey @jjallaire ?

@kevinushey
Copy link
Contributor

LGTM. As an aside, these APIs should probably be returning R_xlen_t, but that's probably for a separate PR.

@eddelbuettel
Copy link
Member

Good catch. The actual worker function here still uses LENGTH(), and that should be XLENGTH() with the commensurate change in returning R_xlen_t.

@coatless
Copy link
Contributor Author

coatless commented Jan 23, 2017

The following seems to work:

        inline R_xlen_t nrow() const {
            SEXP rn = R_NilValue ;
            SEXP att = ATTRIB( Parent::get__() )  ;
            while( att != R_NilValue ){
                if( TAG(att) == R_RowNamesSymbol ) {
                    rn = CAR(att) ;
                    break ;
                }
                att = CDR(att) ;
            }
            if (Rf_isNull(rn))
                return 0;
            if (TYPEOF(rn) == INTSXP && XLENGTH(rn) == 2 && INTEGER(rn)[0] == NA_INTEGER)
                return abs(INTEGER(rn)[1]);
            return XLENGTH(rn);
        }

        // Offer multiple variants to accomodate both old interface here and signatures in other classes
        inline R_xlen_t nrows() const { return DataFrame_Impl::nrow(); }
        inline R_xlen_t rows()  const { return DataFrame_Impl::nrow(); }

        inline R_xlen_t ncol()  const { return DataFrame_Impl::length(); }
        inline R_xlen_t cols()  const { return DataFrame_Impl::length(); }

Given the above discussion, I'll toss the above in a new issue and then submit a PR with the above change. Or, should I just submit the code change within here and add a new note to NEWS?

@kevinushey
Copy link
Contributor

kevinushey commented Jan 23, 2017

Sorry, looks like I'm not quite right -- while the number of columns (length) can indeed be a 'long' vector, the row.names attribute is still expected to be a 'short' vector. From

https://github.com/wch/r-source/blob/45576f65c403f46e03b88e4e0301bde330aea3be/src/main/attrib.c#L181-L203

The code for extracting the true length from a potentially 'compact' representation is:

int n = (isInteger(s) && LENGTH(s) == 2 && INTEGER(s)[0] == NA_INTEGER)
    ? INTEGER(s)[1] : (isNull(s) ? 0 : LENGTH(s));
ans = ScalarInteger((type == 1) ? n : abs(n));

So R indeed is still expecting an integer number of rows for data frames, if I understand correctly. I was under the impression that 'long' vectors in data frames had been supported now, but perhaps not (or at least, not for this specific API)

@eddelbuettel
Copy link
Member

eddelbuettel commented Jan 23, 2017

Thank you both for following up. So I guess I'll merge as is...

@coatless
Copy link
Contributor Author

Based on @kevinushey's digging, the aliases for columns should have their return type updated to match the definition out of Vector.

        inline R_xlen_t ncol()  const { return DataFrame_Impl::length(); }
        inline R_xlen_t cols()  const { return DataFrame_Impl::length(); }

@eddelbuettel
Copy link
Member

Thanks for making the change. This should now be good.

@kevinushey
Copy link
Contributor

LGTM as well; thanks!

@eddelbuettel
Copy link
Member

So in it goes -- thanks to @coatless for doing the work and to @kevinushey for being eagle-eyed as usual.

@eddelbuettel eddelbuettel merged commit 5a99a86 into RcppCore:master Jan 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants