Give informative meta= warning #4637

Merged
jrbourbeau merged 2 commits into dask:master on Mar 28, 2019

@mrocklin
Member

commented Mar 27, 2019

Users often get hung up on how exactly to create meta= objects when
using user-defined functions with methods like map/apply/...

They don't need to, because we can infer it for them, but sometimes they
want to silence these warnings. The warning message now includes the
correct meta= value itself.

Originally raised in https://stackoverflow.com/questions/55363496/dask-dataframe-defining-meta-for-date-diff-in-groubpy

Example

In [1]: import dask

In [2]: df = dask.datasets.timeseries()

In [3]: df.apply(lambda x: x, axis=1)
/Users/mrocklin/workspace/dask/dask/dataframe/core.py:3144: UserWarning:
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using..
  Before: .apply(func)
  After:  .apply(func, meta={'id': 'int64', 'name': 'object', 'x': 'float64', 'y': 'float64'})

  warnings.warn(meta_warning(meta))
Out[3]:
Dask DataFrame Structure:
                   id    name        x        y
npartitions=30
2000-01-01      int64  object  float64  float64
2000-01-02        ...     ...      ...      ...
...               ...     ...      ...      ...
2000-01-30        ...     ...      ...      ...
2000-01-31        ...     ...      ...      ...
Dask Name: apply, 60 tasks
  • Tests added / passed
  • Passes flake8 dask
    if meta_str:
        msg += ("\n"
                "  Before: .apply(func)\n"
                "  After:  .apply(func, meta=%s)\n" % str(meta_str))
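As a rough illustration of what the snippet under review does, the sketch below builds the message in plain Python. The function name `build_meta_warning` and the plain-dict `dtypes` argument are assumptions made for this example; the real helper in `dask/dataframe/core.py` may derive its `meta_str` differently.

```python
def build_meta_warning(dtypes=None):
    """Return a warning message, with a Before/After hint when dtypes are known.

    `dtypes` is a hypothetical dict mapping column name -> dtype string.
    """
    msg = (
        "\nYou did not provide metadata, so Dask is running your function "
        "on a small dataset to guess output types."
    )
    if dtypes:
        # Adjacent string literals are joined first, then %-formatted,
        # so the dict repr lands in the "After" line.
        msg += (
            "\n"
            "  Before: .apply(func)\n"
            "  After:  .apply(func, meta=%s)\n" % str(dtypes)
        )
    return msg


hint = build_meta_warning({"id": "int64", "name": "object"})
print(hint)
```

The hint is appended only when a dtype mapping is available, so callers with nothing to suggest still get the base message.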

@martindurant

martindurant Mar 27, 2019

Member

Do we also know the method they are using, or is that trying to go too far here?

@mrocklin

mrocklin Mar 27, 2019

Author Member

I looked into fixing this but it looks like we only use this message on apply currently.

@martindurant

martindurant Mar 27, 2019

Member

OK then - definitely an improvement.

Update dask/dataframe/core.py
Co-Authored-By: mrocklin <mrocklin@gmail.com>
@jrbourbeau
Member

left a comment

LGTM, thanks @mrocklin!

@jrbourbeau jrbourbeau merged commit 45a1b5a into dask:master Mar 28, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

asmith26 added a commit to asmith26/dask that referenced this pull request Apr 22, 2019

Give informative meta= warning (dask#4637)
* Give informative meta= warning

* Update dask/dataframe/core.py

Co-Authored-By: mrocklin <mrocklin@gmail.com>

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019
