Remove nrow/ncol #406

johnmyleswhite · 2013-11-11T00:16:35Z

We should remove nrow and ncol. Everything should be done with size(df).


milktrader · 2013-12-09T00:13:58Z

So what will loop code look like?

for i in 1:nrow(df) => for i in 1:size(df)[1] ?

milktrader · 2013-12-09T02:02:58Z

btw, the vi code to replace nrow occurrences would be:

:%s/nrow(\(\w\+\))/size(\1)[1]

So I suppose I'll start migrating over sooner rather than later.

johnmyleswhite · 2013-12-09T04:02:14Z

You should use for i in 1:size(df, 1).

milktrader · 2013-12-09T14:19:51Z

Okay, that's better.

:%s/nrow(\(\w\+\))/size(\1, 1)
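
For readers following the migration discussed above, here is a minimal sketch of the three spellings side by side. It assumes the DataFrames.jl package is installed; the data frame `df` and its column names are made up for illustration.

```julia
using DataFrames  # assumption: DataFrames.jl is available in the environment

# Illustrative data; the column names are hypothetical.
df = DataFrame(id = 1:3, name = ["a", "b", "c"])

# All three spellings return the row count.
@assert nrow(df) == size(df, 1) == size(df)[1] == 3
@assert ncol(df) == size(df, 2) == 2

# The row loop from the comments above, written with size instead of nrow.
for i in 1:size(df, 1)
    println(df[i, :name])
end
```

The `size(df, 1)` form asks for one dimension directly, which is why it was preferred over indexing into the tuple returned by `size(df)`.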

ararslan · 2016-08-17T20:16:00Z

I'm surprised this hasn't been done in the past 3 years since this issue was brought up. I definitely agree that we should be using size everywhere rather than nrow/ncol.

andreasnoack · 2016-08-17T23:40:33Z

I'm not sure about this. I don't think it is a good idea to make DataFrames too similar to arrays. It's better to think of them as a table and therefore we might not want to follow the AbstractArray interface.

I should explain what I mean by better here. If a DataFrame is too similar to an array then users might start to use it in the same way and that would give a bad experience with slow execution and uninferred return values.

ararslan · 2016-08-18T02:31:57Z

Hm, that's a very good point.

nalimilan · 2016-08-18T15:11:38Z

OTOH, nrow and ncol clearly use the matrix vocabulary. I think we should either move to size, or use a database vocabulary like nobs and nvar.

phaverty · 2016-08-18T16:14:38Z

nrow and ncol will be very familiar to people coming from R, who are likely to make up a large proportion of the DataFrames users. I would recommend leaving nrow and ncol in.

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty@gene.com


nalimilan · 2016-08-18T19:39:05Z

@phaverty While we certainly take it into account, the fact that something exists in a language is never a sufficient argument by itself for including something in Julia. The most important question is whether it makes for a good and consistent design in Julia.

ararslan · 2016-08-18T19:42:20Z

nobs/nvar seems like a good compromise to me between avoiding array-specific terminology and allowing clear, familiar names for accessing these dimensions.

quinnj · 2016-08-18T20:42:55Z

I'm not a fan of nobs personally. I like size because of its julian heritage; at this point, I think there are enough differentiators from AbstractArray that DataFrames won't be confusing. I like that size also encourages getting the # of rows/columns in a single call as well. I vote we deprecate/remove nrow/ncol, and just keep size.

johnmyleswhite · 2016-08-18T20:44:05Z

+1 for Jacob's proposal. Adding new names seems superfluous.

ararslan · 2016-08-18T20:44:15Z

I'd be fine with that as well.

milktrader · 2016-08-18T21:43:48Z

Also not in favor of new names at this point. Just because ncol/nrow are used in R doesn't mean we have to be different. I prefer it slightly to size because it is more natural to a table vs array interface.

andreasnoack · 2016-08-18T21:50:21Z

I think @nalimilan is right that nrow and ncol are as matrix-like as size, but I don't like the symmetry in size(DataFrame, 1) and size(DataFrame, 2) (not to speak of size(DataFrame, 7)==1?). Pretending that the two dimensions are similar is not doing anybody a favor. If size is julian heritage so is mean(DataFrame, 2), which would be a really bad idea. My point is that we should try to communicate as much as possible, including the choice of function names, that DataFrames are not arrays. Finally, deprecating any or all of nrow, ncol and size shouldn't be disruptive. It would be a simple search/replace change and easy to add deprecation warnings for.
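
To make the symmetry concern concrete, here is a small sketch using only Base Julia. The `ToyTable` type and its `nrow`/`ncol` functions are hypothetical stand-ins written for this example, not the DataFrames implementation.

```julia
# Arrays treat every dimension alike, including ones past the last.
A = [1 2 3; 4 5 6]
@assert size(A, 1) == 2
@assert size(A, 2) == 3
@assert size(A, 7) == 1   # trailing dimensions of an Array have length 1

# Hypothetical toy table: named columns whose element types may differ,
# so rows and columns are not interchangeable the way matrix axes are.
struct ToyTable
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end

# Separate, named accessors make the asymmetry explicit.
nrow(t::ToyTable) = isempty(t.columns) ? 0 : length(first(t.columns))
ncol(t::ToyTable) = length(t.columns)

t = ToyTable([:id, :name], [[1, 2, 3], ["a", "b", "c"]])
@assert nrow(t) == 3
@assert ncol(t) == 2
```

With named accessors there is no third or seventh dimension to ask about, which is one way to signal that the table is not an array.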

johnmyleswhite · 2016-08-18T21:58:18Z

If this change is part of focusing on tables, should this really be part of the API at all? Computing nrow involves SELECT COUNT(1) FROM tbl, which isn't generally a near zero-cost operation.

quinnj · 2016-08-18T22:00:59Z

^ that's a good point John. Currently in DataStreams for a Data.Schema, a Source or Sink is allowed to return -1 rows, which indicates an unknown number of rows (and Data.stream! methods need to be prepared to handle these cases).
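
As a rough illustration of the "unknown number of rows" convention, the sketch below uses only Base Julia. `ToySchema`, `rows_known`, and `count_rows` are hypothetical names invented here; they are not the DataStreams API.

```julia
# Hypothetical schema: rows == -1 marks an unknown row count,
# mirroring the convention described in the comment above.
struct ToySchema
    rows::Int
end

rows_known(s::ToySchema) = s.rows >= 0

# Return the row count, consuming the source only when the schema cannot answer.
function count_rows(s::ToySchema, source)
    rows_known(s) && return s.rows
    n = 0
    for _ in source   # potentially expensive: walks the whole source
        n += 1
    end
    return n
end

@assert count_rows(ToySchema(5), 1:5) == 5
@assert count_rows(ToySchema(-1), (x for x in 1:4)) == 4
```

The fallback branch is the analogue of running SELECT COUNT(1): correct, but its cost grows with the size of the source.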

andreasnoack · 2016-08-18T22:04:13Z

> which isn't generally a near zero-cost operation

Is that a requirement?

johnmyleswhite · 2016-08-18T22:07:06Z

I don't know. It might be surprising to people if, say, the REPL blocked for a long time when you run nrow(tbl). If we end up supporting something like Hive, that operation might block for many minutes.

andreasnoack · 2016-08-18T22:16:22Z

I'm also not sure. My gut feeling is that it's okay that the costs are higher for this operation but I think it's useful to include such large examples in the API discussion.

quinnj · 2017-09-07T04:13:26Z

We should still deprecate nrow, ncol in favor of size.

nalimilan · 2017-09-07T12:59:31Z

Adding to milestone before we forget since that's easy to do.

nalimilan · 2017-10-08T13:40:59Z

I just bumped into a previous discussion from... 2013:
https://groups.google.com/d/msg/julia-stats/2EzVzAtrP9Y/NY6dkX4gbHUJ

Nosferican · 2017-11-07T21:26:58Z

+1 in favor of size.

HarlanH · 2017-11-16T21:28:23Z

from that 2013 discussion, reminder for this issue that size(df, :nobs) or size(df, :nrow) were also suggested, either instead of or in addition to size(df, 1).
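
A sketch of what a symbol-based dimension argument could look like, using only Base Julia. `NamedDimTable` is a hypothetical type made up for this example; the `:nrow` spelling comes from the suggestion above, while `:ncol` and the decision to throw on any other dimension are assumptions added here.

```julia
# Hypothetical column-oriented table used only for this illustration.
struct NamedDimTable
    columns::Vector{AbstractVector}
end

# Tuple form: (rows, columns).
Base.size(t::NamedDimTable) =
    (isempty(t.columns) ? 0 : length(first(t.columns)), length(t.columns))

# Integer form: only dimensions 1 and 2 are meaningful for a table.
function Base.size(t::NamedDimTable, d::Integer)
    1 <= d <= 2 || throw(ArgumentError("a table has only 2 dimensions, got $d"))
    return size(t)[d]
end

# Symbol form: name the dimension instead of numbering it.
function Base.size(t::NamedDimTable, d::Symbol)
    d === :nrow && return size(t, 1)
    d === :ncol && return size(t, 2)   # :ncol is an assumed counterpart to :nrow
    throw(ArgumentError("unknown dimension name: $d"))
end

t = NamedDimTable([[1, 2, 3], ["a", "b", "c"]])
@assert size(t, :nrow) == size(t, 1) == 3
@assert size(t, :ncol) == size(t, 2) == 2
```

Rejecting dimensions beyond 2 is one possible answer to the size(DataFrame, 7) question raised earlier in the thread; an array-like design would return 1 instead.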

bkamins · 2019-07-25T01:17:53Z

I am closing this as my feeling is that after the discussions nrow and ncol will stay (at least for 1.0).
The major reasons are:

row and column are natural names (even Base now uses eachrow and eachcol as do we + we have DataFrameRow type),
it is easy to learn to use nrow and ncol (many people know it from R - although it is a minor reason, it is shorter to type, anyone, even beginner, immediately will know what it means)
and it really does not do much harm that we have it IMO (it will not confuse anyone I think nor conflict with anything; even if we removed it we would keep a long-term deprecation).

Feel free to reopen it if you disagree (I just want to clean up pending issues where I feel a conclusion was reached).

quinnj · 2019-07-25T01:35:15Z

I support @bkamins here.

ararslan · 2019-07-25T15:38:22Z

#i'mwithbogamil

nalimilan mentioned this issue Jul 19, 2017

length(::DataFrame) returns number of columns #1200

Closed

nalimilan added this to the 0.11 milestone Sep 7, 2017

garborg mentioned this issue Sep 8, 2017

Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixe… #1224

Closed

nalimilan removed this from the 0.11 milestone Nov 23, 2017

nalimilan added this to the 0.12 milestone Nov 23, 2017

nalimilan mentioned this issue Sep 18, 2018

Review row vs. column orientation of API #1514

Closed

6 tasks

bkamins mentioned this issue Jan 15, 2019

DataFrames.jl roadmap #1678

Closed

31 tasks

nalimilan modified the milestones: 0.12, 1.0 Jan 25, 2019

bkamins closed this as completed Jul 25, 2019

jonas-schulze mentioned this issue May 12, 2020

Rename nrow to nrows, ncol to ncols #2247

Closed
