HTTPFile._fetch_range raises when passed headers kwarg #4479

Closed
rpetchler opened this issue Feb 12, 2019 · 4 comments

@rpetchler
Contributor

commented Feb 12, 2019

Reproducible example:

from dask.bytes.http import HTTPFile

url = "http://www.example.com/"
headers = {"DNT": "1"}

fobj = HTTPFile(url=url, headers=headers)
print(fobj._fetch_range(0, 64))

Raised exception:

Traceback (most recent call last):
  File "main.py", line 7, in <module>
    print(fobj._fetch_range(0, 64))
  File "dask/bytes/http.py", line 249, in _fetch_range
    r = self.session.get(self.url, headers=headers, stream=True, **kwargs)
TypeError: get() got multiple values for keyword argument 'headers'

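The cause appears to be a typo in _fetch_range: the headers are popped from self.kwargs, so they apparently remain in the local kwargs that is unpacked into session.get, and headers ends up being passed twice. The correct implementation calls kwargs.pop, not self.kwargs.pop.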
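
To make the suspected mechanism concrete, here is a minimal sketch that reproduces the same TypeError without any network access. It is not the dask source: FakeSession, SketchFile, fetch_range_buggy and fetch_range_fixed are hypothetical names made up for illustration, and the Range header format is an assumption.

```python
class FakeSession:
    """Hypothetical stand-in for a requests session; it only echoes its arguments."""

    def get(self, url, headers=None, stream=False, **kwargs):
        return {"url": url, "headers": headers, "stream": stream, "extra": kwargs}


class SketchFile:
    """Hypothetical, simplified model of HTTPFile (assumed shape, not the real class)."""

    def __init__(self, url, session=None, **kwargs):
        self.url = url
        self.session = session or FakeSession()
        self.kwargs = kwargs  # e.g. {"headers": {...}} coming from storage_options

    def fetch_range_buggy(self, start, end):
        kwargs = self.kwargs.copy()
        # Suspected typo: pops from self.kwargs, so the local copy keeps "headers".
        headers = self.kwargs.pop("headers", {})
        headers["Range"] = "bytes=%i-%i" % (start, end - 1)  # assumed header format
        # "headers" is now passed both explicitly and inside **kwargs -> TypeError.
        return self.session.get(self.url, headers=headers, stream=True, **kwargs)

    def fetch_range_fixed(self, start, end):
        kwargs = self.kwargs.copy()
        # Pop from the local copy; dict(...) avoids mutating the caller's headers.
        headers = dict(kwargs.pop("headers", {}))
        headers["Range"] = "bytes=%i-%i" % (start, end - 1)  # assumed header format
        return self.session.get(self.url, headers=headers, stream=True, **kwargs)


if __name__ == "__main__":
    f = SketchFile("http://www.example.com/", headers={"DNT": "1"})
    try:
        f.fetch_range_buggy(0, 64)
    except TypeError as exc:
        print("buggy:", exc)

    g = SketchFile("http://www.example.com/", headers={"DNT": "1"})
    print("fixed:", g.fetch_range_fixed(0, 64)["headers"])
```

Copying the headers dict in the fixed variant is an extra precaution beyond the one-word fix proposed here; it keeps the Range header from leaking into the dict the caller passed in.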

@martindurant

Member

commented Feb 12, 2019

HTTPFile isn't really meant to be used except for internal Dask access to remote data. You may well be right nevertheless, but I'd still like to know how you came across this class and what your intended use is.

(Would fsspec's version be useful to you?)

@rpetchler

Contributor Author

commented Feb 12, 2019

Thanks for your prompt reply, @martindurant.

Here's a larger example that illustrates how passing headers through storage_options raises this exception:

import dask
import dask.bag as db

dask.config.set(scheduler='synchronous')

urls = [
    "http://www.example.com/",
    "http://www.example.com/",
]

headers = {"DNT": "1"}
storage_options = {"headers": headers}

bag = db.read_text(urls, storage_options=storage_options)
bag.map(len).sum().compute()

And here's the corresponding trace:

Traceback (most recent call last):
  File "httpfile.py", line 15, in <module>
    bag.map(len).sum().compute()
  File "dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "dask/local.py", line 501, in get_sync
    return get_async(apply_sync, 1, dsk, keys, **kwargs)
  File "dask/local.py", line 447, in get_async
    fire_task()
  File "dask/local.py", line 443, in fire_task
    callback=queue.put)
  File "dask/local.py", line 490, in apply_sync
    res = func(*args, **kwds)
  File "dask/local.py", line 235, in execute_task
    result = pack_exception(e, dumps)
  File "dask/local.py", line 230, in execute_task
    result = _execute_task(task, data)
  File "dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "dask/bag/core.py", line 2118, in empty_safe_apply
    _, part = peek(part)
  File "venv/lib/python3.7/site-packages/toolz/itertoolz.py", line 944, in peek
    item = next(iterator)
  File "dask/bag/core.py", line 1745, in map_chunk
    for a in zip(*args):
  File "dask/bag/text.py", line 105, in file_to_blocks
    for line in f:
  File "dask/bytes/http.py", line 185, in read
    self._fetch(self.loc, end)
  File "dask/bytes/http.py", line 196, in _fetch
    self.cache = self._fetch_range(start, self.end)
  File "dask/bytes/http.py", line 249, in _fetch_range
    r = self.session.get(self.url, headers=headers, stream=True, **kwargs)
TypeError: get() got multiple values for keyword argument 'headers'

I encountered this error while working with a proprietary S3-like object store that uses headers to authenticate requests.

@martindurant

Member

commented Feb 13, 2019

OK, got you. You are quite right. Would you like to create a PR?

rpetchler added a commit to rpetchler/dask that referenced this issue Feb 13, 2019

martindurant added a commit that referenced this issue Feb 13, 2019

@rpetchler

Contributor Author

commented Feb 13, 2019

Closed in #4480.
