Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds size attribute aliases for DataFrame (closes #630) #638

Merged
merged 3 commits into from Jan 23, 2017

Conversation

@coatless
Copy link
Contributor

@coatless coatless commented Jan 21, 2017

No description provided.

@codecov-io
Copy link

@codecov-io codecov-io commented Jan 21, 2017

Current coverage is 92.50% (diff: 100%)

Merging #638 into master will not change coverage

@@             master       #638   diff @@
==========================================
  Files            65         65          
  Lines          3309       3309          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           3061       3061          
  Misses          248        248          
  Partials          0          0          

Powered by Codecov. Last update 271148a...e9db599

@@ -83,6 +83,26 @@ namespace Rcpp{
return LENGTH(rn);
}

// Leave in place for a year

This comment has been minimized.

@eddelbuettel

eddelbuettel Jan 21, 2017
Member

I don't think we have consensus that we what or need that.

I presume I could get my arm twisted to allow nrow and nrows; I see nothing but needed pain in removing the old one.

This comment has been minimized.

@coatless

coatless Jan 21, 2017
Author Contributor

The goal behind the PR was to align the DataFrame class to match with:

  1. the R functions ncol() and nrow();
  2. the Matrix class member functions .nrow(), .ncol(), .rows(), .cols().

In an ideal world, the old interface probably should be cleaned up as a result. But, it most certainly can be left in place and the nrows() aspect can be omitted from documentation.

By my count, the following ~20 packages on CRAN would be affected:

  • BacArena
  • Biocomb
  • Cyclops
  • deepboost
  • dplyr
  • FeatureHashing
  • FIT
  • feather
  • gaston
  • medfate
  • missDeaths
  • Rfast
  • Rborist
  • raptr
  • readr
  • readstat13
  • reshape2
  • tweenr
  • valr
  • wsrf

This comment has been minimized.

@eddelbuettel

eddelbuettel Jan 21, 2017
Member

I have no interest / time / energy to chase twenty package maintainers over essentially nothing.

I have tried to get five to change an error of theirs for three weeks now and only got half. No, we are not going to break that interface.

Additional accessors are perfectly fine.

Simplified
@eddelbuettel
Copy link
Member

@eddelbuettel eddelbuettel commented Jan 23, 2017

I took the language about deprecation out and simplified a little. Now it is a straight-up extension of the API, plus extra tests, and should not create any trouble.

Second set of eyes anyone? @thirdwing @nathan-russell @kevinushey @jjallaire ?

@kevinushey
Copy link
Contributor

@kevinushey kevinushey commented Jan 23, 2017

LGTM. As an aside, these APIs should probably be returning R_xlen_t, but that's probably for a separate PR.

@eddelbuettel
Copy link
Member

@eddelbuettel eddelbuettel commented Jan 23, 2017

Good catch. The actual worker function here still uses LENGTH(), and that should be XLENGTH() with the commensurate change in returning R_xlen_t.

@coatless
Copy link
Contributor Author

@coatless coatless commented Jan 23, 2017

The following seems to work:

        inline R_xlen_t nrow() const {
            SEXP rn = R_NilValue ;
            SEXP att = ATTRIB( Parent::get__() )  ;
            while( att != R_NilValue ){
                if( TAG(att) == R_RowNamesSymbol ) {
                    rn = CAR(att) ;
                    break ;
                }
                att = CDR(att) ;
            }
            if (Rf_isNull(rn))
                return 0;
            if (TYPEOF(rn) == INTSXP && XLENGTH(rn) == 2 && INTEGER(rn)[0] == NA_INTEGER)
                return abs(INTEGER(rn)[1]);
            return XLENGTH(rn);
        }

        // Offer multiple variants to accomodate both old interface here and signatures in other classes
        inline R_xlen_t nrows() const { return DataFrame_Impl::nrow(); }
        inline R_xlen_t rows()  const { return DataFrame_Impl::nrow(); }

        inline R_xlen_t ncol()  const { return DataFrame_Impl::length(); }
        inline R_xlen_t cols()  const { return DataFrame_Impl::length(); }

Given the above discussion, I'll toss the above in a new issue and then submit a PR with the above change. Or, should I just submit the code change within here and add a new note to NEWS?

@kevinushey
Copy link
Contributor

@kevinushey kevinushey commented Jan 23, 2017

Sorry, looks like I'm not quite right -- while the number of columns (length) can indeed be a 'long' vector, the row.names attribute is still expected to be a 'short' vector. From

https://github.com/wch/r-source/blob/45576f65c403f46e03b88e4e0301bde330aea3be/src/main/attrib.c#L181-L203

The code for extracting the true length from a potentially 'compact' representation is:

int n = (isInteger(s) && LENGTH(s) == 2 && INTEGER(s)[0] == NA_INTEGER)
    ? INTEGER(s)[1] : (isNull(s) ? 0 : LENGTH(s));
ans = ScalarInteger((type == 1) ? n : abs(n));

So R indeed is still expecting an integer number of rows for data frames, if I understand correctly. I was under the impression that 'long' vectors in data frames had been supported now, but perhaps not (or at least, not for this specific API)

@eddelbuettel
Copy link
Member

@eddelbuettel eddelbuettel commented Jan 23, 2017

Thank you both for following up. So I guess I'll merge as is...

@coatless
Copy link
Contributor Author

@coatless coatless commented Jan 23, 2017

Based on @kevinushey's digging, the aliases for columns should have their return type updated to match the definition out of Vector.

        inline R_xlen_t ncol()  const { return DataFrame_Impl::length(); }
        inline R_xlen_t cols()  const { return DataFrame_Impl::length(); }
@eddelbuettel
Copy link
Member

@eddelbuettel eddelbuettel commented Jan 23, 2017

Thanks for making the change. This should now be good.

@kevinushey
Copy link
Contributor

@kevinushey kevinushey commented Jan 23, 2017

LGTM as well; thanks!

@eddelbuettel
Copy link
Member

@eddelbuettel eddelbuettel commented Jan 23, 2017

So in it goes -- thanks to @coatless for doing the work and to @kevinushey for being eagle-eyed as usual.

@eddelbuettel eddelbuettel merged commit 5a99a86 into RcppCore:master Jan 23, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.