Resource without a format field results in a crash of pipeline #2

Closed
rufuspollock opened this issue Jul 20, 2017 · 3 comments
@rufuspollock
Member

stream_remote_resources: WARNING :Error while opening resource from url https://s3.amazonaws.com/rawstore-testing.datahub.io/sQqpgDlCdaDFdRjzxbZN9Q==: FormatError('Format "None" is not supported',)
stream_remote_resources: Traceback (most recent call last):
stream_remote_resources:   File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/stream_remote_resources.py", line 144, in opener
stream_remote_resources:     _stream.open()
stream_remote_resources:   File "/usr/local/lib/python3.6/site-packages/tabulator/stream.py", line 132, in open
stream_remote_resources:     raise exceptions.FormatError(message)
stream_remote_resources: tabulator.exceptions.FormatError: Format "None" is not supported
stream_remote_resources: During handling of the above exception, another exception occurred:
stream_remote_resources: Traceback (most recent call last):
stream_remote_resources:   File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/stream_remote_resources.py", line 201, in 
stream_remote_resources:     rows = stream_reader(resource, url, ignore_missing or url == "")
stream_remote_resources:   File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/stream_remote_resources.py", line 159, in stream_reader
stream_remote_resources:     schema, headers, stream, close = get_opener(url, _resource)()
stream_remote_resources:   File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/stream_remote_resources.py", line 153, in opener
stream_remote_resources:     _stream.close()
stream_remote_resources:   File "/usr/local/lib/python3.6/site-packages/tabulator/stream.py", line 156, in close
stream_remote_resources:     self.__parser.close()
stream_remote_resources: AttributeError: 'NoneType' object has no attribute 'close'
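
The second traceback looks like a cleanup problem: `open()` raised before a parser existed, and the `close()` call in the error handler then failed on `None`. A minimal sketch of a guarded close follows. `SketchStream`, `FormatError` and `try_open` are hypothetical stand-ins written for this illustration; they are not the tabulator or datapackage_pipelines code.

```python
class FormatError(Exception):
    """Hypothetical stand-in for tabulator.exceptions.FormatError."""


class SketchStream:
    """Hypothetical stream whose parser only exists after a successful open()."""

    def __init__(self, source, format=None):
        self.source = source
        self.format = format
        self._parser = None

    def open(self):
        # Mirrors the first error in the log: no format, so nothing can be parsed.
        if self.format is None:
            raise FormatError('Format "None" is not supported')
        self._parser = object()  # placeholder for a real parser

    def close(self):
        # Guard: only release a parser that was actually created.
        if self._parser is not None:
            self._parser = None


def try_open(source, format=None):
    """Return an opened stream, or None when the format is unsupported."""
    stream = SketchStream(source, format=format)
    try:
        stream.open()
        return stream
    except FormatError:
        stream.close()  # safe even though open() failed
        return None
```

With the guard in place, the cleanup path no longer masks the original `FormatError` with an `AttributeError`.
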
@akariv
Collaborator

akariv commented Jul 20, 2017 via email

@rufuspollock
Member Author

@akariv but those files had a file extension in the path: https://github.com/datasets/country-codes/blob/master/datapackage.json#L44 (note I have now added the format as well). Yet the format was not deduced. I guess you are using the file name directly. There are several options here, e.g.:

  • Use the datapackage.json info when parsing files
  • Set content-type on files in S3 (is that used by the requests lib?)
  • Add a file extension to files in rawstore (means we lose the md5 collision a little bit, but that is relatively minor)
  • Enforce that format is set
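
The first option could look roughly like the sketch below: prefer an explicit `format` on the resource and otherwise fall back to the extension of its `path`. `infer_format` is a hypothetical helper, and the sketch assumes `path` (or `url`) is a single string; it is not how the pipeline actually resolves formats.

```python
import posixpath
from urllib.parse import urlparse


def infer_format(resource):
    """Guess a format for a resource descriptor (hypothetical helper).

    Order of preference: the explicit "format" field, then the file
    extension of "path" (or "url"). Returns None if neither helps.
    """
    fmt = resource.get("format")
    if fmt:
        return fmt.lower()
    location = resource.get("path") or resource.get("url") or ""
    # Parse as a URL so query strings and hosts do not confuse the split.
    ext = posixpath.splitext(urlparse(location).path)[1]
    return ext.lstrip(".").lower() or None
```

A rawstore URL such as the one in the log has no extension after the last slash, so this returns `None` for it unless the descriptor's own `path` or `format` is consulted.
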

@rufuspollock
Member Author

FIXED. We resolved this issue: the assembler will now proceed OK if no format is set.
