Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixe… #1224

quinnj · 2017-09-08T03:26:09Z

nalimilan · 2017-09-08T07:10:40Z

+1 for deprecating nrow and ncol, but I'm not so sure about length. As noted in #1200, it's consistent with df[i], so as long as we support doing that I'd rather keep length.

garborg · 2017-09-08T12:17:00Z

I'd have thought deprecating size would have been more likely at this point.

I thought the move to treat DataFrames as matrix-like had fizzled out. length and ncols, for example, would be more of a fit for DataFrames that are conceptually a collection of namedtuples.

quinnj · 2017-09-08T12:29:18Z

I think there's been more consensus recently to go with size as a more "Julian" idiom rather than as a reason to treat DataFrames as matrix-like, though there's also JuliaLang/julia#22665, where it was debated to change size to shape. We'd probably want to switch to shape if that ends up happening in Base (though it's unclear if anything will get changed there).

I don't see what the big fuss is about deprecating length(df); at this point, it's just the more confusing thing between length(df) and df[1], mainly, IMO, because df[:col] has been such a staple in DataFrames for a long time. It can also be the "step 1" in transitioning DataFrame to be more of a "bag of namedtuples" representation rather than the bag-of-columns type of definition it's always had.

nalimilan · 2017-09-08T12:41:47Z

I don't see what the big fuss is about deprecating length(df); at this point, it's just the more confusing thing between length(df) and df[1], mainly, IMO, because df[:col] has been such a staple in DataFrames for a long time. It can also be the "step 1" in transitioning DataFrame to be more of a "bag of namedtuples" representation rather than the bag-of-columns type of definition it's always had.

I'm not sure what you mean here. Can you elaborate?

nalimilan · 2017-09-08T12:46:21Z

src/dataframe/dataframe.jl

-ncol(df::DataFrame) = length(index(df))
+Base.size(df::DataFrame) = ((length(df.columns) > 0 ? length(df.columns[1]) : 0),
+                            length(index(df)))
+Base.size(df::DataFrame, i::Int) = i == 1 ? 


Would be cleaner with an if... else block (with a long-form function definition), the indentation a bit weird here. BTW, could print "$i" in the error message, that's always useful to understand what went wrong.

garborg · 2017-09-08T15:31:07Z

@quinnj I agree with the issues you brought up about length being problematic as a replacement for nrows. Take back my suggestion of using ncols and length, and replace it with using ncols and nrows (or whatever else).

size would be more julian if it wasn't a pun, but I don't see size(::Collection{NamedTuple}, 2) returning the number of fields the elements have for other collections of records.

Leaning solely on a couple (maybe gross) non-base names (like ncols), would make the next step easier if it turns out that this change is is not the end state.

Six in one hand, half dozen in the other -- just wanted to point out that size seems really punny to me, as I was MIA during #1200 and missed pointing out then that the punniness is why #406 never happened in the couple years after the issue was opened.

rofinn · 2017-09-08T15:41:27Z

FWIW, I like just having size cause it's simple and intuitive. I've found nrow and eachrow hard to remember in the julia ecosystem (e.g., length(rows(df)) and length(columns(df)) always seemed more intuitive to me). As was discussed in #1200, it is unclear what length should do.

Columns: This is consistent with df[i], but I'd argue that we should drop df[i] since it isn't obvious whether that returns a column or row.
Rows: This is consistent if you view as dataframe as just a collection of rows (or named tuples), but that seems better handled by length(rows(df)).
Elements (e.g., length(df) = *(size(df)...)): Consistent with the array interface in base, but might not make sense for the table concept.

As a result, I'm pretty okay with not supporting any of the options for now.

garborg · 2017-09-08T16:20:38Z

size is consistent with how indexing into a cell with row & col happens, too, to argue against my previous comments.

Nectarineimp · 2017-09-08T16:35:58Z

I haven't done much but lurk and submit one PR, but on this issue I will say this - nrow/ncol are nice, but separate operations. A data frame is not a matrix or an array. It has area, so to speak. size() gets you everything you need in an easy to parse result. From my experience, simpler and complete usually wins in design. Compatibility with other packages and aspects of Julia are, of course, major considerations. I am not sure there is enough concern here about compatibility to disqualify size, no?

nalimilan · 2017-09-11T12:53:01Z

@quinnj How about merging the deprecation of nrow and ncol now? Then we can discuss whether to keep length or not.

quinnj · 2017-09-11T12:55:58Z

Meh, if JuliaLang/julia#22665 settles on size vs. shape then I think it's worth deprecating, but until that's settled, I'm not sure it's worth another potential deprecation later.

nalimilan · 2017-09-11T13:14:46Z

Doesn't seem size is going to be deprecated. The only option that appears to be seriously considered is to replace indices with shape.

coveralls · 2017-09-11T13:26:34Z

Changes Unknown when pulling ba396e6 on jq/len into ** on master**.

randyzwitch · 2017-09-21T01:21:05Z

Might be late to the party on this, but I'm a fan of nrow, one because R does it, but more conceptually, almost every day I sample a DataFrame (or limit in select query), so I quite frequently think of "n(umber of) rows". Whereas I've never cared the number of columns, because they are named. Which is to say, I don't sample across columns or features (unless random forest is doing it for me), but limiting the number of rows is an obvious use case.

size to me feels much more of memory usage than the (# rows, #cols) tuple.

bkamins · 2019-02-06T08:33:21Z

Now we have length deprecated.

So the decision is do we:

retain nrow and ncol
deprecate them in favor of size?

In both cases this PR should probably be closed, but if we want to deprecate them I would open a new PR only with this deprecation.

nalimilan · 2019-02-06T08:51:16Z

I'd rather keep them. Since we've decided data frames are collections of rows, their dimensions are not really symmetric like matrices, so it would be weird to have to use size.

bkamins · 2019-02-06T08:52:09Z

Fine with me.

bkamins · 2019-02-06T08:52:18Z

So close this?

quinnj added 2 commits September 7, 2017 21:25

Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixes …

15c3b37

…#1200

Merge branch 'master' into jq/len

0f11789

quinnj mentioned this pull request Sep 8, 2017

length(::DataFrame) returns number of columns #1200

Closed

Fix failing 0.6 tests

ba396e6

nalimilan reviewed Sep 8, 2017

View reviewed changes

quinnj closed this Sep 11, 2017

nalimilan reopened this Sep 11, 2017

spurll mentioned this pull request Sep 11, 2017

isempty checks number of columns, rather than number of rows #1230

Closed

nalimilan mentioned this pull request Jul 11, 2018

Feature Request: subset rows when boolean vector passed alone #1445

Closed

nalimilan mentioned this pull request Nov 13, 2018

Review row vs. column orientation of API #1514

Closed

6 tasks

bkamins mentioned this pull request Jan 15, 2019

DataFrames.jl roadmap #1678

Closed

31 tasks

nalimilan closed this Feb 6, 2019

nalimilan deleted the jq/len branch February 6, 2019 09:17

jonas-schulze mentioned this pull request May 12, 2020

Rename nrow to nrows, ncol to ncols #2247

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixe… #1224

Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixe… #1224

quinnj commented Sep 8, 2017

nalimilan commented Sep 8, 2017

garborg commented Sep 8, 2017

quinnj commented Sep 8, 2017

nalimilan commented Sep 8, 2017

nalimilan Sep 8, 2017

garborg commented Sep 8, 2017 •

edited

Loading

rofinn commented Sep 8, 2017 •

edited

Loading

garborg commented Sep 8, 2017

Nectarineimp commented Sep 8, 2017

nalimilan commented Sep 11, 2017

quinnj commented Sep 11, 2017

nalimilan commented Sep 11, 2017

coveralls commented Sep 11, 2017 •

edited

Loading

randyzwitch commented Sep 21, 2017 •

edited

Loading

bkamins commented Feb 6, 2019

nalimilan commented Feb 6, 2019

bkamins commented Feb 6, 2019

bkamins commented Feb 6, 2019

Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixe… #1224

Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixe… #1224

Conversation

quinnj commented Sep 8, 2017

nalimilan commented Sep 8, 2017

garborg commented Sep 8, 2017

quinnj commented Sep 8, 2017

nalimilan commented Sep 8, 2017

nalimilan Sep 8, 2017

Choose a reason for hiding this comment

garborg commented Sep 8, 2017 • edited Loading

rofinn commented Sep 8, 2017 • edited Loading

garborg commented Sep 8, 2017

Nectarineimp commented Sep 8, 2017

nalimilan commented Sep 11, 2017

quinnj commented Sep 11, 2017

nalimilan commented Sep 11, 2017

coveralls commented Sep 11, 2017 • edited Loading

randyzwitch commented Sep 21, 2017 • edited Loading

bkamins commented Feb 6, 2019

nalimilan commented Feb 6, 2019

bkamins commented Feb 6, 2019

bkamins commented Feb 6, 2019

garborg commented Sep 8, 2017 •

edited

Loading

rofinn commented Sep 8, 2017 •

edited

Loading

coveralls commented Sep 11, 2017 •

edited

Loading

randyzwitch commented Sep 21, 2017 •

edited

Loading