Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that field names are stripped #1971

Merged
merged 1 commit into from
Nov 3, 2014
Merged

Ensure that field names are stripped #1971

merged 1 commit into from
Nov 3, 2014

Conversation

aliceh75
Copy link
Contributor

Having field names that start/end with spaces causes a lot of complications, because some code strips the field names, while others don't. This is particularly true when using third party libraries, and parsing text-representations of strings such as the sort string "my field ASC, my other field DESC", etc.

See for instance: #1970

Given that there is no practical use for having field names that start or end with spaces (these will be just as confusing to humans as they are to machines), I suggest to disallow them.

At the same time I suggest that the Datapusher should strip field names before uploading data. I have already submitted a PR for this at : ckan/datapusher#47

@wardi
Copy link
Contributor

wardi commented Oct 15, 2014

This is really good, and thank you for updating the tests.

What about other white space? We could have tabs, newlines etc. in the data. We could do a if name.strip() != name check.

If datastore was brand new it would be nice to limit column names to word characters, but that would probably break too many existing installs.

@aliceh75
Copy link
Contributor Author

Yes, you right about other white spaces! I will update this.

While disallowing non-words characters would make the code less bug prone, it would also potentially upset our users - who knows what kind of things people use as headers in their CSVs (I can easily imagine scientific names, or mathematical formulas going there). We would have to store the labels separately, which would bring it's own layer of complexity.

The alternative is to have an easy-to-find library (both server and client side) that should be consistently used for operations on field names. I don't think this is the case; For example I have found myself writing code to split the new URL filter representation (field one:a value|field two:another value). It's easy to get wrong, to strip something that shouldn't be stripped, un-escape something that shouldn't be un-escaped, etc. But this is a larger topic than the one at hand in this PR!

@amercader amercader added this to the CKAN 2.3 milestone Oct 22, 2014
@aliceh75
Copy link
Contributor Author

@wardi : I've added support for other types of white-spaces, and more tests. While there I re-factored the test in a loop, as it is more readable that way.

wardi added a commit that referenced this pull request Nov 3, 2014
…ripped

Ensure that field names are stripped
@wardi wardi merged commit 0bbfcc1 into ckan:master Nov 3, 2014
@aliceh75 aliceh75 deleted the 1970-ensure-field-names-are-stripped branch January 28, 2015 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants