Add row slicing #110

stefanv · 2015-10-07T01:38:50Z

This lets Tables behave similarly to other Python containers, such that:

In [1]: import datascience as ds

In [2]: t = ds.Table((['a','b','c','d'], [1, 2, 3, 4]), ('names', 'counts'))

In [3]: print(t[:2])
names | counts
a     | 1
b     | 2

In [4]: print(t[-1])
names | counts
d     | 4

In [5]: print(t[1:-1])
names | counts
b     | 2
c     | 3

SamLau95 · 2015-10-07T01:53:21Z

This syntax is a tad confusing because we've only had stuff like t['label'] returning an array of the values of that column up to this point. @papajohn any thoughts?

stefanv · 2015-10-07T01:55:48Z

Is there currently another way of easily grabbing a few rows? This is a common enough operation that I presumed I must have missed it.

papajohn · 2015-10-07T01:56:51Z

Nice suggestion, but tables are indexed by column name rather than number. You could change t.columns() and t.rows() to special containers that give you a selected table upon slicing, rather than a list of arrays.

Currently take gives you a few rows, and select gives you a few columns.

SamLau95 · 2015-10-07T01:57:56Z

@stefanv There's Table.take which students have seen before.

stefanv · 2015-10-07T01:59:16Z

Can take receive slices as input?

stefanv · 2015-10-07T02:00:41Z

@papajohn This is an interesting design decision. Isn't it much more common to slice out rows than to select columns?

papajohn · 2015-10-07T02:04:55Z

In our course, the first table manipulation is to create new columns from existing columns. Named columns make the resulting expressions fairly easy to interpret.

E.g. http://data8.org/text/1_data.html#tables

Most of our examples don't pick out rows based on index, but instead based on their contents (e.g., using where) or by sampling.

papajohn · 2015-10-07T02:14:00Z

@stefanv take can take a range rather than a slice. (We actually teach np.arange instead b/c Python 3 ranges take extra explanation.)
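
To make the range-versus-slice relationship concrete, here is a minimal sketch of how a slice can be expanded into the explicit row indices that an index-based method such as take expects. The helper name `slice_to_indices` is hypothetical and is not part of the datascience package; it relies only on Python's built-in `slice.indices`.

```python
def slice_to_indices(s, n_rows):
    """Expand a slice into the explicit row indices it selects.

    Hypothetical helper for illustration; not part of any library.
    """
    # slice.indices resolves negative and missing bounds against the
    # container length, the same way list slicing does.
    return list(range(*s.indices(n_rows)))


rows = ['a', 'b', 'c', 'd']
idx = slice_to_indices(slice(1, -1), len(rows))
print(idx)                      # [1, 2]
print([rows[i] for i in idx])   # ['b', 'c']
```

The resulting list of indices could then be passed wherever a range or np.arange result is accepted today.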

deculler · 2015-10-07T02:14:36Z

take by index
where by value

Both produce a new table.

stefanv · 2015-10-07T16:53:09Z

@papajohn Thanks for the link to the notes, I see now that the column syntax is often used for operations such as table['diff'] = table['2015'] - table['2012'].

One way to do this would be to repurpose .rows. We could, e.g., do t.rows[:15] and get a table back. t.rows[0] (the current usage) will still be supported.
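
A rough sketch of what such a rows accessor could look like, using a toy table rather than the real datascience Table. `MiniTable` and `_Rows` are hypothetical names invented for this illustration, and the columns are plain Python lists; the actual implementation may differ.

```python
class MiniTable:
    """Toy column-oriented table; a stand-in, not the datascience Table."""

    def __init__(self, columns):
        # columns: dict mapping label -> list of values (equal lengths)
        self.columns = dict(columns)

    def __getitem__(self, label):
        # Column access by label, mirroring t['label'].
        return self.columns[label]

    @property
    def rows(self):
        return _Rows(self)


class _Rows:
    """Hypothetical row accessor: an int gives one row, a slice gives a table."""

    def __init__(self, table):
        self._table = table

    def __len__(self):
        cols = list(self._table.columns.values())
        return len(cols[0]) if cols else 0

    def __getitem__(self, key):
        cols = self._table.columns
        if isinstance(key, slice):
            # Applying the same slice to every column keeps rows aligned.
            return MiniTable({label: values[key] for label, values in cols.items()})
        # Integer index: return the row as a tuple of values.
        return tuple(values[key] for values in cols.values())


t = MiniTable({'names': ['a', 'b', 'c', 'd'], 'counts': [1, 2, 3, 4]})
print(t.rows[0])              # ('a', 1)
print(t.rows[1:-1].columns)   # {'names': ['b', 'c'], 'counts': [2, 3]}
print(t['counts'])            # [1, 2, 3, 4]
```

Column indexing by label is untouched in this sketch, so expressions like t['label'] keep their existing meaning while row slicing lives on .rows.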

Indexing by range is probably fine for smaller queries, but becomes expensive for larger ones:

In [1]: import numpy as np

In [2]: x = np.random.random(100000)

In [3]: %timeit x[:50000]
The slowest run took 122.46 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 259 ns per loop

In [4]: z = np.arange(50000)

In [5]: %timeit x[z]
The slowest run took 7.63 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 97.4 µs per loop

Has a design decision been made on whether a take operation should always copy data, or whether tables can re-use underlying column storage? Again, a trade-off between simplicity and memory usage (which can be limiting when working on large datasets).
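
For reference, the timing gap above follows from how NumPy treats the two kinds of indexing: a basic slice returns a view onto the same buffer, while indexing with an integer array allocates and fills a new array. The small check below shows this for plain arrays only; whether a table's take should copy is exactly the open question.

```python
import numpy as np

x = np.arange(10.0)

sliced = x[:5]              # basic slicing: a view onto x's buffer
taken = x[np.arange(5)]     # integer-array indexing: a newly allocated copy

print(np.shares_memory(x, sliced))  # True
print(np.shares_memory(x, taken))   # False

# Writing through the view changes the original; the copy is independent.
sliced[0] = -1.0
print(x[0], taken[0])       # -1.0 0.0
```

A view-based design saves memory on large tables, but it also means that modifying a sliced result can modify the table it came from.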

stefanv · 2015-10-10T16:30:52Z

See #120.

Add row slicing

3b60385

stefanv closed this Oct 10, 2015
