Give informative meta= warning #4637

Merged
merged 2 commits into from Mar 28, 2019

@mrocklin mrocklin commented Mar 27, 2019

Users often get hung up on how exactly to create `meta=` objects when
passing user-defined functions to methods like map/apply/...

They don't need to, since we can infer it for them, but sometimes they
would like to silence these warnings. Now we provide the correct
`meta=` value in the warning message itself.

Originally raised in https://stackoverflow.com/questions/55363496/dask-dataframe-defining-meta-for-date-diff-in-groubpy

Example

In [1]: import dask

In [2]: df = dask.datasets.timeseries()

In [3]: df.apply(lambda x: x, axis=1)
/Users/mrocklin/workspace/dask/dask/dataframe/core.py:3144: UserWarning:
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using..
  Before: .apply(func)
  After:  .apply(func, meta={'id': 'int64', 'name': 'object', 'x': 'float64', 'y': 'float64'})

  warnings.warn(meta_warning(meta))
Out[3]:
Dask DataFrame Structure:
                   id    name        x        y
npartitions=30
2000-01-01      int64  object  float64  float64
2000-01-02        ...     ...      ...      ...
...               ...     ...      ...      ...
2000-01-30        ...     ...      ...      ...
2000-01-31        ...     ...      ...      ...
Dask Name: apply, 60 tasks
  • Tests added / passed
  • Passes flake8 dask

if meta_str:
    msg += ("\n"
            "  Before: .apply(func)\n"
            "  After:  .apply(func, meta=%s)\n" % str(meta_str))

@martindurant martindurant Mar 27, 2019

Do we also know the method they are using, or is that trying to go too far here?


@mrocklin mrocklin Mar 27, 2019

I looked into fixing this but it looks like we only use this message on apply currently.


@martindurant martindurant Mar 27, 2019

OK then - definitely an improvement.
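
For anyone following the thread without the full diff, the message construction discussed above can be sketched roughly as follows. This is an assumption-laden sketch, not the code in `dask/dataframe/core.py`: `build_meta_hint` is a hypothetical name, it takes a plain dict of column names to dtype strings rather than a real meta object, and the base message is abbreviated.

```python
def build_meta_hint(dtypes):
    """Build a warning string with a copy-pasteable meta= suggestion.

    `dtypes` is a plain mapping of column name -> dtype string
    (a hypothetical stand-in for the inferred metadata).
    """
    meta_str = {name: str(dtype) for name, dtype in dtypes.items()}
    # Abbreviated base message; the real warning text is longer.
    msg = ("You did not provide metadata, so Dask is running your function "
           "on a small dataset to guess output types.")
    if meta_str:
        # Only append the Before/After hint when there is something to show.
        msg += ("\n"
                "  Before: .apply(func)\n"
                "  After:  .apply(func, meta=%s)\n" % str(meta_str))
    return msg


hint = build_meta_hint({"id": "int64", "x": "float64"})
print(hint)
```

The hint hard-codes `.apply` because, as noted in the reply above, the message is currently only used on apply.
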

Co-Authored-By: mrocklin <mrocklin@gmail.com>

@jrbourbeau jrbourbeau left a comment

LGTM, thanks @mrocklin!

@jrbourbeau jrbourbeau merged commit 45a1b5a into dask:master Mar 28, 2019
2 checks passed
jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this issue May 14, 2019
* Give informative meta= warning

* Update dask/dataframe/core.py

Co-Authored-By: mrocklin <mrocklin@gmail.com>