-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resource format not guessed automatically #1350
Conversation
At the dev meeting, we decided to do this guessing in a custom validator. |
This has turned into a major rewrite of how to handle resource formats.
ISSUES |
To be decided for 2.2 or 2.3 pending review @davidread you are probably interested in this |
url = data.get(key[:-1] + ('url',), '') | ||
type, encoding = mimetypes.guess_type(url) | ||
if type: | ||
data[key] = type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"value = data[key]" and "data[key] = type" Can you be more specific with your variable names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you can't get away from data and key for a validator, but value and type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value is used everywhere in the validators as the value of that key. So it may be bad but its a least consistent.
Also type, encoding is what is the documented return types.
http://docs.python.org/3.4/library/mimetypes.html#module-mimetypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on value.
In the context of the mimetype docs "type" might be a useful abbreviation of "mimetype". But here I think 'mimetype' would be clearer. 'type' is way too overloaded in coding everywhere... I'm just nitpicking sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change it to mimetype! I agree its better.
My understanding from the call on Tuesday was that the format field is going from what was usually a file extension value (or almost any value a user might like to fill in) to a field that will contain a mime type some of the time instead. Reading the code it looks like the the format being stored is the file extension, not the mime type. Did I misunderstand the intent during call or am I misreading the code? |
This pull request is not the one that changes the format field to be a mimetype even though I wanted to prepare for the time if we decide to move that way... I would not try and squeeze that in version 2.2, as I do not think we have settled on this decision. I have tried to make the new behaviour as backwards compatible as possible (except for the fact it fills in a format if left empty and removes the nasty "unified_format" stuff). |
@kindly can we have a small separate pull request with just the template notice string change, so at least we have this fixed before the string freeze? Cheers |
what's missing is a way to update all the resources based on the new validator. This depends on the schema so it doesn't feel right to make it a migration. What about a paster command that does package_update(package_show(id)) on all the packages? |
[#1350] Remove misleading resource format notice
David to come back to looking at this in a couple of weeks |
["MXD", "ESRI ArcGIS project file", "application/x-mxd", []], | ||
["TAR", "TAR Compressed File", "application/x-tar", []], | ||
["PNG", "PNG Image File", "image/png", []], | ||
["RSS", "RSS feed", "application/rss+xml", []] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this line?
["GeoJSON", "Geographic JavaScript Object Notation", "application/json", []]
["SVG", "SVG vector image", "image/svg+xml", ["svg"]], | ||
["PPT", "Powerpoint Presentation", "application/vnd.ms-powerpoint", []], | ||
["JPEG", "JPG Image File", "image/jpeg", ["jpeg", "jpg"]], | ||
["Sparql Results", "Sparql Results", "application/sparql-results+xml", []], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it is useful to split SPARQL endpoints into two. This is a list of resource formats, and people recognize a SPARQL URL as an API. I don't think it is going to happen that someone publishes a text file with a SPARQL query or a SPARQL response in it, are they? Can we just get rid of the "Sparql query" line and rename "Sparql Results" to ["SPARQL", "SPARQL endpoint", "application/sparql-results+xml", []],?
["NetCDF", "NetCDF File", "application/netcdf", []], | ||
["ArcGIS Map Service", "ArcGIS Map Service", "ArcGIS Map Service", ["arcgis map service"]], | ||
["TSV", "Tab Separated Values File", "text/tsv", []], | ||
["WFS", "Web Feature Service", "text/xml", []], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mimetype for WFS should be "application/xml" i.e. the same as for "XML".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that setting it to the same as for XML would spoil the if_empty_guess_format as it might see "file.xml" and may say it is a WFS file. Better to have no mimetype for WFS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually for WFS (at least 1.1.0) the mime type defined by the OGC standard is text/xml
. I think is probably fine to leave it like this for the time being because of what you mention regarding if_empty_guess
[#1350] Resource format not guessed automatically
[WIP] The resource format isn't guessed automatically, but CKAN claims to guess it automatically. I recommend removing the text for for 2.2 and fix guessing for 2.3.