First impression feedback #2
Comments
I now pass divisions= based on the row counts in each row_group. The information on whether each piece has sorting columns and statistics is also available, so I can implement what you suggest.
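For illustration, here is a minimal sketch of how divisions could be derived from per-row-group row counts. It assumes each row group becomes one partition and that the index is a default 0-based integer range. `divisions_from_row_counts` is a hypothetical helper name, not something taken from this project.

```python
from itertools import accumulate


def divisions_from_row_counts(row_counts):
    """Hypothetical helper: build a divisions tuple from row-group sizes.

    Assumes a default 0-based integer index and one partition per row group.
    The result has len(row_counts) + 1 entries: the first index value of each
    partition, followed by the last index value of the final partition.
    """
    if not row_counts or any(n <= 0 for n in row_counts):
        raise ValueError("row_counts must be a non-empty list of positive ints")
    # Running totals give the start of every partition after the first.
    starts = [0] + list(accumulate(row_counts))[:-1]
    last_index = sum(row_counts) - 1
    return tuple(starts) + (last_index,)


divisions = divisions_from_row_counts([3, 4, 2])
print(divisions)  # (0, 3, 7, 8)
```

A tuple like this is the shape that the divisions= keyword of dask.dataframe.from_delayed expects: one more entry than there are partitions, with the final entry being the inclusive upper bound.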
Suggestions for what should be in
Location, size in bytes, columns or number of columns, dtypes? I wouldn't
On Wed, Oct 26, 2016 at 9:46 AM, Martin Durant notifications@github.com
Just took this for a spin, here is some feedback. Some of this is administrative and probably obvious to you. I'm listing it here just for completeness, not to prioritize:
Administrative
- __repr__ implementation
- __init__.py parquet/tests?
- mdurant shows up in the tests

Performance
- The from_delayed call needs to be passed metadata to avoid triggering a local computation
- The from_delayed call would like to be passed divisions= information if we can get it from the parquet file. We might want to sniff statistics within the parquet file for sorted columns and, if we find exactly one, make it the index automatically?
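The "sniff statistics" idea can be sketched roughly as below. This assumes the per-row-group min/max statistics have already been read into a plain dict, and both function names are hypothetical. Min/max values only show that row groups do not overlap; they do not prove the rows inside each group are sorted.

```python
def find_sorted_column(stats):
    """Hypothetical helper: return the single column whose row groups are ordered.

    `stats` maps column name -> list of (min, max) tuples, one per row group,
    with None where a row group carries no statistics (an assumed layout).
    Returns the column name if exactly one column qualifies, else None.
    """
    candidates = []
    for name, ranges in stats.items():
        # Skip columns with missing statistics in any row group.
        if not ranges or any(r is None for r in ranges):
            continue
        well_formed = all(lo <= hi for lo, hi in ranges)
        # Strict comparison: no value may straddle two row groups.
        increasing = all(prev[1] < nxt[0] for prev, nxt in zip(ranges, ranges[1:]))
        if well_formed and increasing:
            candidates.append(name)
    return candidates[0] if len(candidates) == 1 else None


def divisions_from_stats(ranges):
    """Hypothetical helper: each group's min, then the last group's max."""
    return tuple(lo for lo, _ in ranges) + (ranges[-1][1],)


stats = {
    "ts": [(0, 9), (10, 19), (20, 29)],  # non-overlapping and increasing
    "x": [(0, 50), (5, 40), (1, 60)],    # overlapping ranges
}
column = find_sorted_column(stats)
if column is not None:
    print(column, divisions_from_stats(stats[column]))
```

If two or more columns qualify, the sketch returns None rather than guessing, which matches the "exactly one" condition in the suggestion.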