
Empty HttpError is raised on _fetch_range randomly #323

Open
DPGrev opened this issue Jan 6, 2021 · 2 comments

DPGrev commented Jan 6, 2021

What happened:

When using to_parquet to write to GCS from a Compute Engine VM, we get the following error:

2021-01-06 15:48:39,785 [_call] ERROR - _call non-retriable exception: 
Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call
    self.validate_response(status, contents, json, path, headers)
  File "/opt/miniconda3/lib/python3.8/site-packages/gcsfs/core.py", line 1230, in validate_response
    raise HttpError({"code": status})
gcsfs.utils.HttpError
Traceback (most recent call last):
  File "/app/roxy_cryptochassis/main.py", line 9, in <module>
    fire.Fire(Collector)
  File "/opt/miniconda3/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/miniconda3/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/miniconda3/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/miniconda3/lib/python3.8/site-packages/roxy_cryptochassis/collect.py", line 125, in run
    self._flush_data(data)
  File "/opt/miniconda3/lib/python3.8/site-packages/roxy_cryptochassis/collect.py", line 178, in _flush_data
    data.repartition(npartitions=1).to_parquet(
  File "/opt/miniconda3/lib/python3.8/site-packages/dask/dataframe/core.py", line 4075, in to_parquet
    return to_parquet(self, path, *args, **kwargs)
  File "/opt/miniconda3/lib/python3.8/site-packages/dask/dataframe/io/parquet/core.py", line 593, in to_parquet
    meta, schema, i_offset = engine.initialize_write(
  File "/opt/miniconda3/lib/python3.8/site-packages/dask/dataframe/io/parquet/arrow.py", line 728, in initialize_write
    dataset = pq.ParquetDataset(path, filesystem=fs)
  File "/opt/miniconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 1212, in __init__
    self.validate_schemas()
  File "/opt/miniconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 1255, in validate_schemas
    file_metadata = piece.get_metadata()
  File "/opt/miniconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 676, in get_metadata
    f = self.open()
  File "/opt/miniconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 683, in open
    reader = self.open_file_func(self.path)
  File "/opt/miniconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 1049, in _open_dataset_file
    return ParquetFile(
  File "/opt/miniconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 199, in __init__
    self.reader.open(source, use_memory_map=memory_map,
  File "pyarrow/_parquet.pyx", line 1021, in pyarrow._parquet.ParquetReader.open
  File "/opt/miniconda3/lib/python3.8/site-packages/fsspec/spec.py", line 1432, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/opt/miniconda3/lib/python3.8/site-packages/fsspec/caching.py", line 151, in _fetch
    self.cache = self.fetcher(start, end)  # new block replaces old
  File "/opt/miniconda3/lib/python3.8/site-packages/gcsfs/core.py", line 1457, in _fetch_range
    _, data = self.gcsfs.call("GET", self.details["mediaLink"], headers=head)
  File "/opt/miniconda3/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/opt/miniconda3/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/opt/miniconda3/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/opt/miniconda3/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/opt/miniconda3/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call
    raise e
  File "/opt/miniconda3/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call
    self.validate_response(status, contents, json, path, headers)
  File "/opt/miniconda3/lib/python3.8/site-packages/gcsfs/core.py", line 1230, in validate_response
    raise HttpError({"code": status})
gcsfs.utils.HttpError

What you expected to happen:

An informative error message, and gcsfs retrying the request.

Minimal Complete Verifiable Example:

This error is very hard to replicate, but it has happened randomly within roughly 300 to_parquet calls. We experienced it with dataframes of varying sizes, using the following to_parquet call.

# Empty dataframe; ddf is a dask DataFrame, path points to a gs:// destination
ddf.to_parquet(
    path,
    partition_on=["key1", "key2", "key3"],
    append=True,
    write_index=False,
    engine="pyarrow-dataset",
)

Anything else we need to know?:

This error always happens at https://github.com/dask/gcsfs/blob/7eef6cf183acd93a71a8f8a4e1580540058824cb/gcsfs/core.py#L1538, inside https://github.com/dask/gcsfs/blob/7eef6cf183acd93a71a8f8a4e1580540058824cb/gcsfs/core.py#L1525.

A possible fix would be to retry the self.gcsfs.call("GET", self.details["mediaLink"], headers=head) call by default, and skip retrying for certain status codes. Currently I see no harm in retrying this call in _fetch_range, since it does not change any internal state.
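As a rough illustration of what I have in mind (a sketch only, not gcsfs's actual API: HttpError is simplified, fetch stands in for the self.gcsfs.call("GET", ...) coroutine, and the set of retriable status codes is an assumption):

```python
import asyncio
import random

# Status codes assumed to be transient; anything else (403, 404, ...) is
# treated as permanent and re-raised immediately.
RETRIABLE_STATUSES = {429, 500, 502, 503, 504}

class HttpError(Exception):
    """Simplified stand-in for gcsfs.utils.HttpError."""
    def __init__(self, code):
        super().__init__(f"HTTP error {code}")
        self.code = code

async def call_with_retries(fetch, retries=5, base_delay=0.5):
    """Retry `fetch` on transient HTTP errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return await fetch()
        except HttpError as e:
            if e.code not in RETRIABLE_STATUSES or attempt == retries - 1:
                raise  # permanent error, or out of attempts
            # exponential backoff with jitter to avoid hammering the API
            await asyncio.sleep(base_delay * 2 ** attempt * (1 + random.random()))
```

A 503 would then be retried with backoff, while a 403 still fails on the first attempt, so permission errors are not masked or delayed.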

If approved, I would be willing to create a pr for this.

Environment:

  • Dask version: 2020.12.0
  • Python version: 3.8.6
  • Operating System: Debian
  • Install method (conda, pip, source): pip
martindurant (Member) commented:

Can you please try the suggestions in #316 to get more information out of the error? Retrying for any error seems to me like a bad idea, since some indicate a permanent problem, such as lack of permissions - you would just end up delaying the message to the user or maybe even eating up API call quotas.


DPGrev commented Jan 8, 2021

See: #316 (comment)
