Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keyname overhaul #3102

Closed
kodebach opened this issue Oct 19, 2019 · 7 comments
Closed

Keyname overhaul #3102

kodebach opened this issue Oct 19, 2019 · 7 comments
Assignees

Comments

@kodebach
Copy link
Member

kodebach commented Oct 19, 2019

There are a few issues already concerning key names:

In addition elektraKeySetName is known to be a bottleneck and should therefore be optimized.

The goal of this issue is to define the structure of key names, so that we can reimplement elektraKeySetName.

@kodebach kodebach self-assigned this Oct 19, 2019
@kodebach
Copy link
Member Author

Escaped Key Name Structure

The structure of escaped key names is defined by the following ANTLRv4 grammar.

grammar EscapedKeyName;

keyName
    :   namespacedName
    |   cascadingName
    ;

namespacedName
    :   Namespace ':' keyPath
    ;

cascadingName
    :   keyPath
    ;

keyPath
    :   ('/' keyPart)+
    |   '/'
    ;

keyPart
    :   inactivePart
    |   activePart
    ;

inactivePart
    : '.' canonicalPart
    | '.' UnprefixedArrayIndex
    ;
 
activePart
    :   canonicalPart
    |   nonCanonicalPart
    ;

canonicalPart
    :   namePart
    |   ArrayPart
    |   emptyPart
    ;

nonCanonicalPart
    :   '.'
    |   '..'
    |   
    |   UnprefixedArrayIndex
    ;

Namespace
    :   'spec'
    |   'dir'
    |   'user'
    |   'system'
    |   'meta'
    |   'default'
    |   'proc'
    ;

namePart
    : partStartChar partChar*
    | DotSequence
    ;

partStartChar
    :   EscapedStartChar
    |   EscapedSpecialChar
    |   NormalChar
    |   ':'
    ;
partChar
    :   EscapedSpecialChar
    |   StartChar
    |   NormalChar
    |   ':'
    |   '%'
    ;

emptyPart : '%' ;

ArrayPart : '#' PrefixedNumber ;
UnprefixedArrayIndex : '#' Number ;
DotSequence : '..' '.'+ ;

fragment Number : NonZeroDigit Digit* ;
fragment NonZeroDigit : [1-9] ;
fragment Digit : [0-9] ;

fragment PrefixedNumber
    :   Digit
    |   PrefixedNonZeroNumber
    ;

fragment PrefixedNonZeroNumber
    :   NonZeroDigit
    |   '_' PrefixedNonZeroNumber Digit
    ;

StartChar : [.%#@] ;
EscapedStartChar : '\\'StartChar ;
EscapedSpecialChar : '\\'[/\\] ;
NormalChar : ~[./\\%#@\n] ;

ErrChar: . ;

The grammar could be simplified, but doing it this way, allows us to define some terms:

  • cascading: any key name matched by the rule cascadingName is cascading, i.e. it is just a key path
  • namespaced: any key name matched by the rule namespacedName is namespaced, i.e. a key path prefixed by a namespace; path and namespace are separated by a colon
  • non-canonical: any key name that contains at least one nonCanonicalPart and at least one other keyPart is non-canonical. This means keys containing a single nonCanonicalPart (e.g. / or user:/) are not non-canonical.
  • canonical: any key that is not non-canonical is canonical
  • inactive: any key name that contains at least one inactivePart is inactive
  • active: any key name that is not inactive is active

Informally this means:

  • The base structure is: <namespace>:/<keypath>, where <keypath is a /-separated list of <keyPart>.
  • Cascading keys don't have a namespace. (The colon is also omitted.)
  • Inactive keys contain at least one part starting with . that is not . or ...
  • Canonical keys don't contain the parts . and .., empty parts or unprefixed array index parts.
  • No key part can start with %, unless % is the whole part.
  • The following escape sequence rules apply:
    • To prevent a slash / from terminating a key part, it must be preceded by a backslash \.
    • At the start of a key part, the characters ., %, # and @ have special meanings, to circumvent these meanings, the characters have to be preceded by a backslash \. If the characters appear anywhere else in a key part, the do not have to be escaped.
    • To insert a backlash \ into a key part, it must be preceded by a backslash \.
    • A backslash \ may not appear in any situation other than those described above. For example \a is always an invalid sequence and \@ is only valid if it appears at the start of a key part.

@kodebach
Copy link
Member Author

Unescaped Key Name Structure

The structure of unescaped key names is much simpler. Here is again a ANTLRv4 grammar:

grammar UnescapedKeyName;

unescapedKeyName
    :   NamespaceByte ('\u0000' unescapedPart?)* '\u0000'
    ;

unescapedPart : UChar+;

// 1 = cascading, 2 = meta, 3 = spec, ..., 7 = system
// equal to KEY_NS_* (modified)
NamespaceByte : [\u0001-\u0007] ;

UChar :   ~[\u0000] ;

ErrChar: . ;

Informally this just means:

  • All unescaped key names start with a byte indicating the namespace followed by a zero byte.
  • All unescaped key names end with a zero byte.
  • Between the first zero byte (after namespace byte) and the last zero byte (terminator), there is a sequence of arbitrary unescaped parts separated by zero bytes.

@kodebach
Copy link
Member Author

kodebach commented Oct 19, 2019

IMPORTANT: This an informal description, NOT an actual implementation guideline. Steps maybe combined, omitted if checked elsewhere etc.

Canonicalization of Key Names

The first step in elektraKeySetName is to canonicalize the name. Non-canonical key names are never stored.

Canonicalization doesn't touch a possibly present namespace. The rules below apply only to the keyPath part of the key name:

  • All repeated occurrences of / are replaced by a single / (// -> /, /// -> /, etc.).
  • All occurrences of /./ are replaced by a single / (/./ -> /, a/./b -> a/b, etc.).
  • All occurrences of /../ are replaced by a single / and the preceding key part is removed (a/../ -> /, a/b/../ -> a/, etc.). This introduces a rule not covered by the grammar above: The first part of a key must not be ...
  • Trailing slashes / are removed unless the whole keyPath is just / (a/ -> /, a/b/ -> a/b, / -> /, etc.).
  • Unprefixed array indices are properly prefixed. For an index with N digits we insert N-1 underscores _ between the hash # and the digits (#0 -> #0, #10 -> #_10, etc.). Another rule not covered by the grammar comes into play here: The index must be strictly less than 2^63, i.e. it must fit into a int64_t. If a key part starts with a \, no underscores are inserted.

@kodebach
Copy link
Member Author

kodebach commented Oct 19, 2019

IMPORTANT: This an informal description, NOT an actual implementation guideline. Steps maybe combined, omitted if checked elsewhere etc.

Unescaping of Key Names

The second step in elektraKeySetName is to convert the canonical key name into an unescaped key name.

This is done via the following rules:

  • The namespace is converted into the corresponding KEY_NS_* byte. For cascading keys we use KEY_NS_CASCADING. (Note: The values of KEY_NS_* will be modified to ensure the sort order proposed in order of namespaces in KeySet #3084)
  • After the namespace byte a zero byte is inserted.
  • Each key part is converted as follows:
    • We look at the first character in the key part and decided what to do:
      • If it is a backslash \, we just skip it and continue with the rest of the key part. If the backslash is not followed by \, /, ., %, @ or # it is an error.
      • If it is a percent sign %, we insert a zero byte and go to the next part. If there is anything else in this key part that would be an error.
      • If it is an at sign @, we have an error.
      • If it is a dot ., the key is marked as inactive.
    • Any occurrence of the sequence \/ is replaced with /.
    • Any occurrence of the sequence \\ is replaced with \.
    • All other characters are copied unchanged.
    • A zero byte is inserted at the end to separate the key part from the next one, or to terminate the whole unescaped key name.

@markus2330
Copy link
Contributor

Thank you so much for formalizing this. Please add this as docu in the PR.

@stale
Copy link

stale bot commented Oct 23, 2020

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@stale stale bot added the stale label Oct 23, 2020
@markus2330
Copy link
Contributor

As it is already getting stale, slowly it is really time to get #3447 merged 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants