ENH: Add DataFrame tabular repr #1637

Merged
merged 6 commits into from Jan 22, 2017

Conversation

sinhrks
Member

@sinhrks sinhrks commented Oct 8, 2016

Closes #1604. Changed the repr so that it is distinguishable from pandas output.

2016-10-09 7 40 44

@mrocklin
Member

@jcrist any comments on this?

@sinhrks sinhrks force-pushed the df_repr branch 2 times, most recently from 3beac48 to bfc1856 Compare October 15, 2016 06:36
@mrocklin
Member

Taking another look at this (sorry for the long silence). Some feedback based on the following example:

image

Some subjective thoughts:

  • I think we should show fewer empty lines for divisions, maybe four or five.
  • I think that we should remove the text-based representation above the tabular repr.
  • I think that we should include the number of partitions somewhere, perhaps where it currently says "divisions"
  • It would be nice to also state how many tasks are in the computation. I'm not sure exactly where this would go, but we could consider using some of the empty space below the dtypes.
  • If the computation of df.head is cheap, we could consider adding the first few lines as a sample:
head = df.head(compute=False)
head.dask = head._optimize(head.dask, head._keys())
if len(head.dask) < 10:
    ...

I'm not sure about this though. This sort of guesswork can get us in trouble. Just throwing it out there as a thought.
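
As an illustration of the first suggestion above (fewer rows for divisions), here is a minimal sketch in plain Python. The function name `elide_divisions` and the `max_rows` parameter are hypothetical and are not taken from the PR; the actual implementation may truncate the rows differently.

```python
def elide_divisions(divisions, max_rows=5):
    """Return at most ``max_rows`` row labels for a list of divisions.

    Hypothetical helper for illustration only; not part of dask.
    """
    if max_rows < 3:
        raise ValueError("max_rows must be at least 3")
    labels = [str(d) for d in divisions]
    if len(labels) <= max_rows:
        return labels
    # Keep the leading divisions, one '...' marker, and the last division.
    head = max_rows - 2
    return labels[:head] + ["..."] + labels[-1:]


print(elide_divisions(range(0, 100, 10)))
```

With ten divisions and `max_rows=5`, this keeps the first three labels, an ellipsis row, and the final division.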

@mrocklin mrocklin mentioned this pull request Oct 18, 2016
@sinhrks sinhrks force-pushed the df_repr branch 3 times, most recently from e4d417e to 67f9e88 Compare November 3, 2016 12:00
@sinhrks
Member Author

sinhrks commented Jan 21, 2017

Sorry for not following up on this. Updated based on your suggestions:

  • show fewer empty lines for divisions. Maybe four or five
  • remove the text-based representation above the tabular repr.
  • include the number of partitions somewhere

2017-01-21 11 29 09

I think it would be nice to display the dask key name and the number of tasks. Do we already have a function to count current tasks?

@mrocklin
Member

> Do we already have a function to count current tasks?

I would just use len(self.dask)
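
A minimal sketch of why `len(self.dask)` is enough, assuming the graph is a mapping from keys to tasks. `FakeFrame` and `load` are hypothetical stand-ins written for this example; they are not dask classes or functions, and the repr format shown is not the one from this PR.

```python
def load(path):
    # Placeholder task function; it is never called in this sketch.
    return path


class FakeFrame:
    """Hypothetical stand-in for a dask collection (not dask's real class)."""

    def __init__(self, name, graph, divisions):
        self._name = name
        self.dask = graph  # mapping of key -> task
        self.divisions = divisions

    @property
    def npartitions(self):
        # n partitions are delimited by n + 1 division boundaries.
        return len(self.divisions) - 1

    def __repr__(self):
        # The task count is simply the number of entries in the graph.
        return "FakeFrame<{}, npartitions={}, tasks={}>".format(
            self._name, self.npartitions, len(self.dask)
        )


graph = {
    ("load-abc", 0): (load, "a.csv"),
    ("load-abc", 1): (load, "b.csv"),
}
df = FakeFrame("load-abc", graph, (None, None, None))
print(repr(df))
```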

@mrocklin
Member

Trying this out locally now. It feels very nice to me.

@mrocklin
Member

Oh, and I see that it works nicely as a text repr as well.

@sinhrks
Member Author

sinhrks commented Jan 21, 2017

> I would just use len(self.dask)

Ah, you're right. I had been trying to count something other than dask tasks.

Fixed to include it in the HTML and string reprs. Note that to_string contains only the data repr, but I have no strong preference.


@property
def _repr_name(self):
    return self._name if len(self._name) < 10 else self._name[:7]
Member

If we wanted to use more than 7 or 10 characters then we might want to use the dask.utils.key_split function introduced in #1919.
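
To make the trade-off concrete, here is a rough sketch comparing the truncation rule from the diff with a prefix-based approach. `split_prefix` is a hypothetical, simplified stand-in written for this example; it is not the implementation of `dask.utils.key_split`, and the key name used is made up.

```python
import re

# Assumption: a trailing part is a "token" if it is a long hex string or a counter.
_TOKEN = re.compile(r"^[0-9a-f]{8,}$|^\d+$")


def truncate_name(name):
    # The rule from the diff: keep short names, cut long ones to 7 characters.
    return name if len(name) < 10 else name[:7]


def split_prefix(name):
    # Hypothetical simplification: drop trailing '-'-separated parts
    # that look like a hash or a counter, and keep the readable prefix.
    parts = name.split("-")
    while len(parts) > 1 and _TOKEN.match(parts[-1]):
        parts.pop()
    return "-".join(parts)


name = "read-csv-0123456789abcdef0123456789abcdef"
print(truncate_name(name))  # read-cs
print(split_prefix(name))   # read-csv
```

Fixed-width truncation can cut a name mid-word, while splitting on the token boundary keeps the whole readable prefix regardless of its length.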

@mrocklin
Member

I've added the use of key_split to determine the name. Any objections, @sinhrks?

I would like to merge this soon.

@sinhrks
Member Author

sinhrks commented Jan 22, 2017

thx, no objections of course:)

@mrocklin mrocklin merged commit a8db711 into dask:master Jan 22, 2017
@mrocklin
Member

Merged. Thanks @sinhrks! I think this change will make several people happy.

@sinhrks sinhrks added this to the 0.14.0 milestone Mar 4, 2017