
Support fallback to Structured Syntax Name Suffixes in MIME types #2805



Closed
tfmorris opened this issue Jun 24, 2020 · 1 comment · Fixed by #2806

@tfmorris
Member

tfmorris commented Jun 24, 2020

We should support the use of Structured Syntax Name Suffixes (such as `+xml` and `+json`, defined in RFC 6838) as a fallback when choosing an importer.

Currently we support `application/rdf+xml` and know that it is more specific than the generic `text/xml`, so we should try the RDF importer before the XML importer. However, if we encounter an unknown type with a syntax suffix, we don't prioritize the appropriate importer.

The idea would be that if we were to encounter application/3gpdash-qoe-report+xml or application/activity+json (to pick the first ones on the list), we would suggest the XML and JSON importers, respectively, as the first fallback options.
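A minimal sketch of the proposed lookup order, assuming a simple map from MIME type to importer format name. The class `SuffixFallbackLookup`, the methods `register` and `formatFor`, and the format names `rdf`, `xml`, and `json` are hypothetical illustrations, not OpenRefine's actual importer API. Following the commit later in this thread, suffixes are assumed to be registered with their leading `+`, so an unknown type falls back to its `+xml` or `+json` entry only after an exact match fails.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not OpenRefine's real importer registry.
public class SuffixFallbackLookup {
    private final Map<String, String> formatsByMimeType = new HashMap<>();

    /** Registers a full MIME type or a bare suffix such as "+json". */
    public void register(String mimeTypeOrSuffix, String format) {
        formatsByMimeType.put(mimeTypeOrSuffix.toLowerCase(Locale.ROOT), format);
    }

    /**
     * Returns the importer format for a MIME type, or null if nothing matches.
     * Assumes parameters such as "; charset=..." have already been stripped.
     */
    public String formatFor(String mimeType) {
        String type = mimeType.toLowerCase(Locale.ROOT);
        // 1. Exact match wins, e.g. "application/rdf+xml" -> RDF importer.
        String format = formatsByMimeType.get(type);
        if (format != null) {
            return format;
        }
        // 2. Fall back to the Structured Syntax Name Suffix, keeping the '+'.
        int plus = type.lastIndexOf('+');
        if (plus >= 0) {
            return formatsByMimeType.get(type.substring(plus));
        }
        return null;
    }

    public static void main(String[] args) {
        SuffixFallbackLookup lookup = new SuffixFallbackLookup();
        lookup.register("application/rdf+xml", "rdf");
        lookup.register("+xml", "xml");
        lookup.register("+json", "json");

        System.out.println(lookup.formatFor("application/rdf+xml"));                // rdf
        System.out.println(lookup.formatFor("application/3gpdash-qoe-report+xml")); // xml
        System.out.println(lookup.formatFor("application/activity+json"));          // json
        System.out.println(lookup.formatFor("text/plain"));                         // null
    }
}
```

Keeping the exact-match step first preserves the existing preference for the RDF importer over the generic XML one; the suffix only decides when the full type is unknown.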

This is something we could implement directly in the importer framework, so perhaps it's orthogonal to the idea that @wetneb had for #2598.

Originally posted by @tfmorris in #2598 (comment)

@wetneb
Member

wetneb commented Jun 24, 2020

It makes sense to exploit the structure of MIME types as much as we can.

tfmorris added a commit to tfmorris/OpenRefine that referenced this issue Jun 24, 2020
If we can't find a fully specified content type in our lookup,
fall back to just the suffix (which is registered with a leading +)
Fixes OpenRefine#2800 Fixes OpenRefine#2805
@wetneb added this to the 3.5 milestone Jun 25, 2020
wetneb pushed a commit that referenced this issue Jun 25, 2020
* Fix charset encoding & MIME type handling

Character set (i.e. what we call "encoding") is part of the Content-Type,
*not* the Content-Encoding, which specifies compression (e.g. gzip).

This correctly sets the character set encoding as well as cleaning
the MIME type so that additional parsing doesn't need to be done
downstream (and removes that code).

* Use "text" instead of "text/line-based" as default fallback format

The TextLineBasedGuesser only tries a limited number of
formats (CSV, TSV, fixed-width), so once it is chosen we can never
reach JSON, XML, etc.

Start with a more general format instead to improve our
guessing odds.

* Support content type Structured Syntax Name Suffixes (+json +xml)

If we can't find a fully specified content type in our lookup,
fall back to just the suffix (which is registered with a leading +)
Fixes #2800 Fixes #2805
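The first bullet of this commit rests on how HTTP splits the information: the character set travels as a `charset` parameter of `Content-Type`, while `Content-Encoding` names a compression such as `gzip`. A minimal sketch of separating the cleaned MIME type from its charset using only the JDK; `ContentTypeParser`, `ParsedContentType`, and `parse` are hypothetical names, not the code from this commit, and the parser is deliberately simplified (it does not handle semicolons inside quoted parameter values).

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, not OpenRefine's actual implementation.
public class ContentTypeParser {

    /** A cleaned MIME type plus its optional charset (null if absent or unknown). */
    public static final class ParsedContentType {
        public final String mimeType;
        public final Charset charset;

        ParsedContentType(String mimeType, Charset charset) {
            this.mimeType = mimeType;
            this.charset = charset;
        }
    }

    public static ParsedContentType parse(String contentTypeHeader) {
        String[] parts = contentTypeHeader.split(";");
        // The media type itself, without parameters, normalized to lower case.
        String mimeType = parts[0].trim().toLowerCase(Locale.ROOT);
        Charset charset = null;
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // malformed parameter without a value
            }
            String name = param.substring(0, eq).trim();
            String value = param.substring(eq + 1).trim();
            if (name.equalsIgnoreCase("charset")) {
                // Strip optional surrounding quotes: charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                try {
                    charset = Charset.forName(value);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: leave charset unset.
                }
            }
        }
        return new ParsedContentType(mimeType, charset);
    }

    public static void main(String[] args) {
        ParsedContentType ct = parse("application/activity+json; charset=\"UTF-8\"");
        System.out.println(ct.mimeType); // application/activity+json
        System.out.println(ct.charset);  // UTF-8
        // Content-Encoding (e.g. "gzip") is a separate header and is not parsed here.
    }
}
```

Returning the MIME type already stripped of parameters is what lets a downstream lookup, such as the suffix fallback discussed in this issue, compare types directly without re-parsing.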
tfmorris added a commit to tfmorris/OpenRefine that referenced this issue Jun 30, 2020
Fix database extensions exporter which is corrupting the dictionary
name with the value of the language.
wetneb pushed a commit that referenced this issue Jun 30, 2020
Fix database extensions exporter which is corrupting the dictionary
name with the value of the language.
@tfmorris self-assigned this Jul 13, 2020
@wetneb mentioned this issue Apr 24, 2021