Skip to content

Conversation

@mDibyo
Copy link
Contributor

@mDibyo mDibyo commented Oct 25, 2015

This fixes #47.

@mDibyo
Copy link
Contributor Author

mDibyo commented Oct 25, 2015

In addition to adding a Table.exclude method, this branch makes the following changes:

  • table.rows.__getitem__ now returns an iterable instead of a list.
  • _Taker (and _Excluder) inherit(s) from _RowSelector.
  • Row definition has been simplified. It is now possible to check if an object is an instance of it by checking against Table.Row.
  • _Taker.__getitem__ has been simplified. @stefanv, could you take a look at this, and confirm that this still works? (it passes tests and doctests)

@SamLau95
Copy link
Contributor

Is there an advantage of defining a separate RowExcluder class versus translating exclude into a take? Eg. if I have a table t that looks like:

    letter | count | points
    a      | 9     | 1
    b      | 3     | 2
    c      | 3     | 2
    z      | 1     | 10

then
t.exclude(2) calls t.take([1, 3, 4]).

@stefanv
Copy link
Contributor

stefanv commented Oct 26, 2015

If you have a binary mask, you can also do take[~mask] for exclude.

@mDibyo
Copy link
Contributor Author

mDibyo commented Oct 26, 2015

@SamLau95 Not much of an advantage, but translating t.exclude calls to t.take calls does require a substantial bit of logic. Moreover, if we want to support row index, row indices list and slice, we would have to provide both __getitem__ and __call__ methods anyway. Recognizing that take and exclude are similar methods, I have extracted the common behavior into the _RowSelector class.
The only thing _Excluder.__getitem__ is doing in addition to what you suggested (t.exclude -> t.take) is creating the new table on its own. This eliminates the space that would have been allocated to the indices passed to take and the overhead for that operation, since every operation is currently being done using iterables. For a small table, its not a big deal, but it would be a big deal when we are running exclude(2) on a large table.
Let me know what you think. I might be worrying about the wrong things. :D

@mDibyo
Copy link
Contributor Author

mDibyo commented Oct 26, 2015

@stefanv That's a cool idea. But seems too space-intensive? (see my previous comment?)

@mDibyo
Copy link
Contributor Author

mDibyo commented Oct 26, 2015

I would like to point out that the space required for constructing indices for the take operation that I mentioned in the previous comments, will be dwarfed by the table returned from the take operation. So, it might not be worth worrying about.

@SamLau95
Copy link
Contributor

Cool! Thanks for working on this @mDibyo .

SamLau95 added a commit that referenced this pull request Oct 27, 2015
Add Table.exclude method for getting table excluding rows
@SamLau95 SamLau95 merged commit d13f82c into master Oct 27, 2015
@SamLau95 SamLau95 deleted the add-table-except branch October 27, 2015 00:14
@stefanv
Copy link
Contributor

stefanv commented Oct 27, 2015

This PR has now been merged (sorry, I was traveling and couldn't review), but some additional consideration may be worth the time. Is there really no way in which we can combine the logic for take and exclude, without increasing memory usage?

Also, we may want to consider using the name drop (from Pandas)?

Further, I'm curious how API decisions gets made for the datascience project. Are there meetings or discussions besides what we see on these PRs?

@mDibyo
Copy link
Contributor Author

mDibyo commented Oct 29, 2015

@stefanv The extra memory usage I mentioned is quite small. I could definitely look into ways of combining the two, and welcome your thinking on this too.

To me, drop seems to suggest that we are removing rows from the current table (instead of creating a new table without some rows. )

I think the only conversation we have about the API is on the corresponding issues. For instance, this PR addressed #47.

@mDibyo
Copy link
Contributor Author

mDibyo commented Oct 29, 2015

Also, feel free to review the code now and suggest changes. We can incorporate them in a future PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drop row

4 participants