'ValueError: write to closed file' when using --output-document with redirects #453

Open
JustAnotherArchivist opened this issue May 16, 2020 · 1 comment

@JustAnotherArchivist

When the --output-document option is used with a URL that redirects, wpull crashes when it tries to write the second response:

$ wpull --output-document foo.html https://medium.economist.com/refugee-camp-diary-31f2fe2942ef
INFO Fetching ‘https://medium.economist.com/refugee-camp-diary-31f2fe2942ef’.
  100.0% [=========================] 154.0 B 0:00:00 359.5 B/s
INFO Fetched ‘https://medium.economist.com/refugee-camp-diary-31f2fe2942ef’: 302 Moved Temporarily. Length: 154 [text/html].
INFO Fetching ‘https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Fmedium.economist.com%2Frefugee-camp-diary-31f2fe2942ef’.
  [O                        ] 3.0 B 0:00:01 2.2 B/s
.../lib/python3.6/site-packages/wpull/protocol/http/client.py:185: UserWarning: HTTP session did not complete.
  warnings.warn(_('HTTP session did not complete.'))
ERROR Fatal exception.
Traceback (most recent call last):
  File ".../lib/python3.6/site-packages/wpull/application/app.py", line 157, in run
    yield from pipeline.process()
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File ".../lib/python3.6/site-packages/wpull/application/tasks/download.py", line 492, in process
    yield from session.app_session.factory['Processor'].process(session)
  File ".../lib/python3.6/site-packages/wpull/processor/delegate.py", line 29, in process
    return (yield from processor.process(item_session))
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 92, in process
    return (yield from session.process())
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 186, in process
    yield from self._process_loop()
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 245, in _process_loop
    exit_early, wait_time = yield from self._fetch_one(cast(Request, self._item_session.request))
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 287, in _fetch_one
    duration_timeout=self._fetch_rule.duration_timeout
  File ".../lib/python3.6/site-packages/wpull/protocol/http/web.py", line 131, in download
    self._current_session.download(file, duration_timeout=duration_timeout)
  File ".../lib/python3.6/site-packages/wpull/protocol/http/client.py", line 154, in download
    yield from asyncio.wait_for(read_future, timeout=duration_timeout)
  File ".../lib/python3.6/asyncio/tasks.py", line 339, in wait_for
    return (yield from fut)
  File ".../lib/python3.6/site-packages/wpull/protocol/abstract/stream.py", line 17, in wrapper
    return (yield from func(self, *args, **kwargs))
  File ".../lib/python3.6/site-packages/wpull/protocol/http/stream.py", line 200, in read_body
    yield from self._read_body_by_chunk(response, file, raw=raw)
  File ".../lib/python3.6/site-packages/wpull/protocol/http/stream.py", line 350, in _read_body_by_chunk
    file.write(content)
  File ".../lib/python3.6/site-packages/wpull/writer.py", line 481, in write
    self._stream.write(data)
ValueError: write to closed file
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
INFO Exiting with status 1.
@JustAnotherArchivist

This also happens when downloading more than one URL, e.g. wpull --output-document foo URL0 URL1.

A very similar error happens on retries:

$ wpull --output-document foo --tries 3 https://httpbingo.org/status/500
INFO Fetching ‘https://httpbingo.org/status/500’.

INFO Fetched ‘https://httpbingo.org/status/500’: 500 Internal Server Error. Length: 0 [unspecified].
.../lib/python3.6/site-packages/wpull/protocol/http/client.py:185: UserWarning: HTTP session did not complete.
  warnings.warn(_('HTTP session did not complete.'))
INFO Fetching ‘https://httpbingo.org/status/500’.

INFO Fetched ‘https://httpbingo.org/status/500’: 500 Internal Server Error. Length: 0 [unspecified].
ERROR Fatal exception.
Traceback (most recent call last):
  File ".../lib/python3.6/site-packages/wpull/application/app.py", line 157, in run
    yield from pipeline.process()
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File ".../lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File ".../lib/python3.6/site-packages/wpull/application/tasks/download.py", line 492, in process
    yield from session.app_session.factory['Processor'].process(session)
  File ".../lib/python3.6/site-packages/wpull/processor/delegate.py", line 29, in process
    return (yield from processor.process(item_session))
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 92, in process
    return (yield from session.process())
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 186, in process
    yield from self._process_loop()
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 245, in _process_loop
    exit_early, wait_time = yield from self._fetch_one(cast(Request, self._item_session.request))
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 309, in _fetch_one
    action = self._handle_response(request, response)
  File ".../lib/python3.6/site-packages/wpull/processor/web.py", line 436, in _handle_response
    self._file_writer_session.discard_document(response)
  File ".../lib/python3.6/site-packages/wpull/writer.py", line 519, in discard_document
    response.body.flush()
  File ".../lib/python3.6/site-packages/wpull/writer.py", line 490, in flush
    self._stream.flush()
ValueError: flush of closed file
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
INFO Exiting with status 1.
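
For anyone trying to picture the failure mode: both tracebacks end in an operation on an already closed stream (write in the redirect case, flush in the retry case). The sketch below is not wpull's code. SingleDocumentWriter, finish_response and reproduce are hypothetical names, and the idea that the shared output stream gets closed once the first response ends is only an assumption that would be consistent with the tracebacks. It just shows that a binary file object closed after one response raises ValueError on the next write or flush.

```python
import os
import tempfile


class SingleDocumentWriter:
    """Hypothetical stand-in for a writer that sends every response to one file."""

    def __init__(self, path):
        # Binary mode, like a downloader writing raw response bodies.
        self._stream = open(path, "wb")

    def write(self, data):
        self._stream.write(data)

    def flush(self):
        self._stream.flush()

    def finish_response(self):
        # Assumption for illustration: the stream is closed when a response ends,
        # even though further responses will target the same document.
        self._stream.close()


def reproduce():
    """Return the error messages raised by writing and flushing after close."""
    errors = []
    with tempfile.TemporaryDirectory() as tmp:
        writer = SingleDocumentWriter(os.path.join(tmp, "foo.html"))
        writer.write(b"first response (e.g. the 302 body)")
        writer.finish_response()
        # Second response (redirect target or retry) reuses the closed stream.
        for operation in (lambda: writer.write(b"second response"), writer.flush):
            try:
                operation()
            except ValueError as exc:
                errors.append(str(exc))
    return errors


if __name__ == "__main__":
    for message in reproduce():
        print(message)
```

If that assumption holds, a fix would presumably need to keep the output document open until the whole run is finished (or reopen it in append mode per response), but that is a guess from the tracebacks, not something verified against the wpull source.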
