API: Disallow Namespace with null byte character or null level in it #3938
cc @rdblue re checking for null bytes when instantiating
nastra
left a comment
LGTM once the test is fixed
api/src/test/java/org/apache/iceberg/catalog/TestNamespace.java
cc @rdblue if you could possibly merge this now that the tests are passing (I just rebased off latest master but they should still be passing).
  private static final Namespace EMPTY_NAMESPACE = new Namespace(new String[] {});
  private static final Joiner DOT = Joiner.on('.');
  private static final Predicate<String> CONTAINS_NULL_BYTE =
      Pattern.compile("\0|\u0000", Pattern.UNICODE_CHARACTER_CLASS).asPredicate();
What's the difference between \0 and \u0000?
One is unicode and one isn't. I do notice that some of our systems complain when using \0 and not the full unicode \u0000 which is the preferred one.
To be safe, I just included both. Let me test and remove the ASCII one if it's not needed.
Using just \u0000 is sufficient, so I removed the ASCII one.
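On the \0 versus \u0000 question: in a Java string literal both escapes denote the same single character, U+0000, so a pattern built from either one should behave the same. A minimal sketch to check this locally, assuming the pattern shape from the diff above; the class name NullEscapeDemo and the helper escapesAreEqual are made up for illustration:

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class NullEscapeDemo {
  // Same pattern shape as in the diff, with only the Unicode escape kept.
  static final Predicate<String> CONTAINS_NULL_CHARACTER =
      Pattern.compile("\u0000", Pattern.UNICODE_CHARACTER_CLASS).asPredicate();

  private NullEscapeDemo() {}

  // Hypothetical helper: true when the octal escape and the Unicode escape
  // compile to the same one-character string.
  static boolean escapesAreEqual() {
    return "\0".equals("\u0000") && "\0".length() == 1;
  }

  public static void main(String[] args) {
    // Do the two escapes produce the same string?
    System.out.println(escapesAreEqual());
    // A level written with the octal escape is still caught by the predicate.
    System.out.println(CONTAINS_NULL_CHARACTER.test("a\0b"));
    // A level without the null character is not flagged.
    System.out.println(CONTAINS_NULL_CHARACTER.test("ab"));
  }
}
```

Because asPredicate() uses find-style matching, the predicate flags the null character anywhere in the string, not only when the whole string is the null character.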
public class Namespace {
  private static final Namespace EMPTY_NAMESPACE = new Namespace(new String[] {});
  private static final Joiner DOT = Joiner.on('.');
  private static final Predicate<String> CONTAINS_NULL_BYTE =
BYTE probably isn't correct since this is checking for the null unicode codepoint. Maybe just CONTAINS_NULL?
That's fair. Maybe just CONTAINS_NULL_CHARACTER? CONTAINS_NULL makes it sound like we're checking for null itself, which this regex does not do.
Thanks, @kbendick!
In the REST catalog, we have decided to use a null byte character to delimit certain portions of a Namespace.
To prepare for that, we should disallow any level in a namespace which contains a null byte character (technically either the deprecated \0 or the preferred Unicode character \u0000).
It also doesn't make sense for a level to be null, so I've added a check for that as well.
I added three new tests and I also tested the regular expression against a large number of patterns in a Scala REPL.
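For illustration, the two checks described above could look roughly like the sketch below. This is not the code from this PR: the class NamespaceLevels, the method validate, and the error messages are hypothetical, and only the regex mirrors the predicate shown in the diff.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class NamespaceLevels {
  // Mirrors the predicate in the diff: matches any string containing U+0000.
  private static final Predicate<String> CONTAINS_NULL_CHARACTER =
      Pattern.compile("\u0000", Pattern.UNICODE_CHARACTER_CLASS).asPredicate();

  private NamespaceLevels() {}

  // Hypothetical helper: returns the levels unchanged when every level is
  // non-null and free of the null character, otherwise throws.
  static String[] validate(String... levels) {
    if (levels == null) {
      throw new IllegalArgumentException("Cannot create a namespace from a null array of levels");
    }
    for (String level : levels) {
      if (level == null) {
        throw new IllegalArgumentException("Cannot create a namespace with a null level");
      }
      if (CONTAINS_NULL_CHARACTER.test(level)) {
        throw new IllegalArgumentException("Cannot create a namespace with the null-byte character");
      }
    }
    return levels;
  }
}
```

With this sketch, validate("a", "b") returns both levels, while validate("a", null) and validate("a", "b\u0000c") each throw an IllegalArgumentException.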