Skip to content

Unicode support is added for metric names and tag names/values.#205

Closed
vasiliyk wants to merge 1 commit intoOpenTSDB:nextfrom
vasiliyk:next
Closed

Unicode support is added for metric names and tag names/values.#205
vasiliyk wants to merge 1 commit intoOpenTSDB:nextfrom
vasiliyk:next

Conversation

@vasiliyk
Copy link
Member

Current whitelisted symbols: letter, digit, underscore symbol, dot symbol, forward slash.
Dash symbol was removed from white-list.

Right after merge I will prepare change in documentation: dash "-" symbol is removed from list of allowed symbols.

Current whitelisted symbols: letter, digit, underscore symbol, dot symbol, forward slash.
Dash symbol was removed from white-list.
@manolama
Copy link
Member

We can't remove the "-" since it's already used by a bunch of folks. It would cause a ton of trouble. Also, could you look at using the "next" branch and add a Config option that would enable the unicode validator for folks that need it?

@vasiliyk
Copy link
Member Author

@tsuna mentioned "-" on the mail list in the list of symbols that we may need to reserve, and I thought it was good idea to make it in next release. https://groups.google.com/forum/#!topic/opentsdb/4v43M3aOl-c

If you still worry about performance I can propose another variant of algorithm. For non unicode letters we have 0.005ms(current filter) vs 0.012ms for 25 symbols, and minimal overhead for unicode letters (additional 4 numbers comparisons) : 0.002ms. What do you prefer config option or this variant ?

  public static void validateString(final String what, final String s) {
    if (s == null) {
      throw new IllegalArgumentException("Invalid " + what + ": null");
    }
    final int n = s.length();
    for (int i = 0; i < n; i++) {
      final char c = s.charAt(i);
      if (!('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') ||
          ('0' <= c && c <= '9') || c == '_' || c == '.' || c == '/' ||
          Character.isLetter(c)) {
        throw new IllegalArgumentException("Invalid " + what
            + " (\"" + s + "\"): illegal character: " + c);
      }
    }
  }

@tsuna
Copy link
Member

tsuna commented Jul 13, 2013

Since we already allowed - then I agree with Chris it would be better to keep allowing it. Sorry for suggesting otherwise in my email.

I like @vasiliyk's last suggestion in the comment above. I didn't even think such a minimal change would be all that would be required.

@vasiliyk vasiliyk closed this Jul 14, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants