built-in key lists for keys command should be more systematic #10

Open
dmolesUC opened this issue Feb 5, 2019 · 2 comments

dmolesUC commented Feb 5, 2019

The current set of key lists is fairly random and ad-hoc:

  1. Default: the default source (naughty-strings with "", ".", and keys with a leading ".." filtered out, all of which are known to fail in Amazon S3) (500 keys)
  2. disallow-backslash: as the default source, but disallowing backslashes (320 keys)
  3. disallow-double-backslash: as the default source, but disallowing double backslashes (498 keys)
  4. misc: miscellaneous potential problems, including path elements and Unicode blocks (612 keys)
  5. naughty-strings: the Big List of Naughty Strings (504 keys)
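
For reference, the filtering described for the default list could be sketched roughly as follows. This is only an illustration, not the tool's actual code: `isKnownBadKey` and `filterKeys` are hypothetical names, and "leading .." is assumed to mean any key that starts with two dots.

```go
package main

import (
	"fmt"
	"strings"
)

// isKnownBadKey reports whether a key matches one of the cases described
// above as known to fail in Amazon S3: "", ".", or a leading "..".
// Hypothetical helper; the prefix interpretation is an assumption.
func isKnownBadKey(key string) bool {
	return key == "" || key == "." || strings.HasPrefix(key, "..")
}

// filterKeys returns the keys from source that are not known-bad,
// preserving their original order. Hypothetical helper.
func filterKeys(source []string) []string {
	var kept []string
	for _, k := range source {
		if !isKnownBadKey(k) {
			kept = append(kept, k)
		}
	}
	return kept
}

func main() {
	// A small stand-in for the naughty-strings source list.
	source := []string{"", ".", "..", "../etc", "a.b", ".hidden", "normal"}
	fmt.Printf("%q\n", filterKeys(source))
}
```

A systematic list could be built the same way, with one predicate per case, so that each list is defined by the rules it applies and not by a hand-picked set of strings.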

The naughty-strings list is fairly reasonable, although not all that directly applicable to filenames; it tends to produce a lot of redundant failures, since many of its JavaScript escapes and similar entries are nearly identical. A better, shorter list could probably be made by boiling it down to more general cases.

The misc list is pretty ad-hoc and doesn't cover every Unicode block, or even most of them.

We should come up with some systematic cases and some systematic lists based on those cases.

dmolesUC commented Feb 6, 2019

Also, the key lists should include some strings that are bytewise valid in other charsets (esp. historical charsets) but not in UTF-8.
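
A minimal sketch of what such keys might look like, using only the standard library. The three examples and the `legacyKey` / `invalidUTF8Keys` names are illustrative assumptions, not an existing list in the tool; each raw byte string is a valid encoding in the named charset but is rejected by `utf8.ValidString`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// legacyKey pairs raw bytes with the charset in which they are valid text.
// Hypothetical type, for illustration only.
type legacyKey struct {
	charset string // charset in which raw is well-formed
	text    string // what raw spells in that charset
	raw     string // the bytes as they would be sent as a key
}

var legacyKeys = []legacyKey{
	{"ISO-8859-1", "café", "caf\xe9"},     // 0xE9 is é; in UTF-8 it is a truncated lead byte
	{"Windows-1252", "“x”", "\x93x\x94"}, // curly quotes; in UTF-8 these are stray continuation bytes
	{"Shift_JIS", "テ", "\x83\x65"},        // katakana TE; 0x83 is a stray continuation byte in UTF-8
}

// invalidUTF8Keys returns the raw keys that are not valid UTF-8.
func invalidUTF8Keys() []string {
	var out []string
	for _, k := range legacyKeys {
		if !utf8.ValidString(k.raw) {
			out = append(out, k.raw)
		}
	}
	return out
}

func main() {
	for _, k := range legacyKeys {
		fmt.Printf("%-12s %-6s % x  valid UTF-8: %t\n",
			k.charset, k.text, k.raw, utf8.ValidString(k.raw))
	}
}
```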

dmolesUC commented Mar 4, 2019

The cos suite --unicode test now systematically covers all Unicode characters. It does attempt to test invalid UTF-8 sequences, but it's not clear how effective that test is (e.g. the result may depend on how the AWS client library converts Go strings to JSON).
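
To illustrate the concern, here is a small sketch that uses the standard library's `encoding/json` as a stand-in serializer. Whether the AWS client library goes through this path for object keys is an assumption that has not been verified here; `roundTripJSON` is a hypothetical helper. `json.Marshal` coerces strings to valid UTF-8, replacing invalid bytes with U+FFFD, so an invalid sequence may never reach the server in its original form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTripJSON marshals s with encoding/json and unmarshals the result,
// returning the string a receiver would see. Hypothetical helper.
func roundTripJSON(s string) (string, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	var out string
	if err := json.Unmarshal(data, &out); err != nil {
		return "", err
	}
	return out, nil
}

func main() {
	key := "a\xffb" // 0xFF can never appear in well-formed UTF-8
	got, err := roundTripJSON(key)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// If the bytes were silently replaced, the test exercised U+FFFD,
	// not the invalid sequence it was meant to send.
	fmt.Printf("sent %q, received %q, unchanged: %t\n", key, got, got == key)
}
```

If a serializer on the request path behaves this way, a passing test for an invalid sequence would only show that the replacement character works as a key.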
