
Add single quotes around column names if strings #6471

Merged
mrocklin merged 2 commits into dask:master from gforsyth:col_name_str on Jul 30, 2020


@gforsyth
Contributor

  • Tests added / passed
  • Passes black dask / flake8 dask

Fixes #6470

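Adds explicit single quotes around string column names in bad_dtypes (the mismatch rows reported by check_meta), so that string column names that look like integers, such as '7', are easy to tell apart from integer column names such as 7. With this change, the error message looks like this: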

Traceback (most recent call last):
  File "metadata_mismatch.py", line 11, in <module>
    df.compute()
  File "/Users/gil/github.com/dask/dask/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/gil/github.com/dask/dask/dask/base.py", line 447, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/gil/github.com/dask/dask/dask/threaded.py", line 76, in get
    results = get_async(
  File "/Users/gil/github.com/dask/dask/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/Users/gil/github.com/dask/dask/dask/local.py", line 316, in reraise
    raise exc
  File "/Users/gil/github.com/dask/dask/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/Users/gil/github.com/dask/dask/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/gil/github.com/dask/dask/dask/dataframe/utils.py", line 672, in check_meta
    raise ValueError(
ValueError: Metadata mismatch found in `from_delayed`.

Partition type: `pandas.core.frame.DataFrame`
+--------+-------+----------+
| Column | Found | Expected |
+--------+-------+----------+
| '7'    | int64 | -        |
| '8'    | int64 | -        |
| 7      | -     | int64    |
| 8      | -     | int64    |
+--------+-------+----------+
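The table has four rows rather than two because the string label '7' and the integer label 7 are different keys, so aligning the found and expected dtypes leaves each one unmatched. Below is a minimal stdlib sketch of that alignment. It assumes plain dicts stand in for the pandas dtype objects that dask actually compares, and `compare_dtypes` is a hypothetical name used only for illustration:

```python
def compare_dtypes(found, expected, missing="-"):
    """Outer-align two {column: dtype} mappings and return mismatched rows.

    Plain dicts stand in for the pandas dtype objects dask compares;
    this illustrates the idea and is not dask's implementation.
    """
    # Union of labels in first-seen order; '7' and 7 remain distinct keys
    columns = dict.fromkeys([*found, *expected])
    rows = []
    for col in columns:
        a = found.get(col, missing)
        b = expected.get(col, missing)
        if a != b:
            rows.append((col, a, b))
    return rows


found = {"7": "int64", "8": "int64"}  # partition has string labels
expected = {7: "int64", 8: "int64"}  # meta has integer labels
for row in compare_dtypes(found, expected):
    print(row)
```

Printing each tuple shows the quotes only because tuple printing uses `repr` for its elements; a table that formats labels with `str` loses that distinction.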

- (col, a, b)
+ # add single quotes around string variables
+ # to more clearly demarcate column type
+ (f"'{col}'", a, b) if isinstance(col, str) else (col, a, b)
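The conditional expression in the diff can be exercised on its own. This is a minimal sketch, assuming rows shaped like the `(col, a, b)` tuples above; `quote_str_columns` is a hypothetical helper name, not part of dask:

```python
def quote_str_columns(rows):
    """Wrap string column labels in single quotes, as the diff does."""
    return [
        # add single quotes around string labels to demarcate column type
        (f"'{col}'", a, b) if isinstance(col, str) else (col, a, b)
        for col, a, b in rows
    ]


rows = [("7", "int64", "-"), (7, "-", "int64")]
for col, a, b in quote_str_columns(rows):
    # str() the label so both quoted strings and ints pad the same way
    print(f"{col!s:<6} | {a:<5} | {b}")
```

One limitation of a fixed pair of single quotes: a label that itself contains a single quote, such as it's, renders as 'it's', which is ambiguous.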
Member

Maybe repr(col) instead?

Contributor Author

yep, that's a better idea.
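The reviewer's repr(col) suggestion can be sketched the same way. `repr` quotes every str label, leaves int labels bare, switches to double quotes for a label such as it's, and also renders other label types (for example tuples) unambiguously. `format_label` is a hypothetical name used only for illustration:

```python
def format_label(col):
    """Render a column label so its type is visible in an error table."""
    # repr() quotes str labels and leaves ints, floats, tuples, etc. readable
    return repr(col)


for label in ["7", 7, "it's", ("a", 1)]:
    print(format_label(label))
```

Compared with the isinstance check, this needs no type test and handles labels containing quotes.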

@mrocklin mrocklin merged commit 5e29371 into dask:master Jul 30, 2020
@gforsyth gforsyth deleted the col_name_str branch July 30, 2020 16:28
kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020

Development

Successfully merging this pull request may close these issues.

Metadata mismatch is not clear about why columns are different

2 participants