Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: enum name consisting of symbols lead to conflicting enum values #1261

Merged
merged 8 commits into from
Dec 8, 2023

Conversation

vinayak-kukreja
Copy link
Contributor

@vinayak-kukreja vinayak-kukreja commented Nov 21, 2023

Characters like, !=, ==, =~, !~ etc would result in VALUE_ enum values since we are not able to code generate these characters to something valid. This PR is adding an allowlist with some known characters to provide valid enum values if such a case occurs.

Fixes cdk8s-team/cdk8s-cli#578

…nums values

Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
const lowerCaseValue = value?.toString().toLowerCase();
if (lowerCaseValue && !processedValues.has(lowerCaseValue)) {
processedValues.add(lowerCaseValue);
if (value && isAllowlistedCharacter(value.toString())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really a character you're checking for, but a complete num value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your lookup table. But is this going to deal with characters anywhere they occur? Would we want to, or is that not necessary?

How about we do something in a loop to replace all occurrences of groups of symbols with words? (This might be overkill, I'm not sure about the domain)

const memberName = snakeCase(rewriteNamedSymbols(`${value}`).replace(/[^a-z0-9]/gi, '_')).split('_').filter(x => x).join('_').toUpperCase();

function rewriteNamedSymbols(value: string) {
  const ret = new Array<string>();
  while (value.length > 0) {
    const [prefixName, prefixLen] = longestPrefixMatch(value, NAMED_SYMBOLS);
    if (prefixName) {
      ret.push(`_${prefixName}_`);
      value = value.slice(prefixLen);
    } else {
      ret.push(value.charAt(0));
      value = value.slice(1);
    }
  }
  if (ret[0] === '_') { ret.unshift('VALUE'); }
  if (ret[ret.length - 1] === '_') { ret.pop(); }
  return ret.join('');
}

function longestPrefixMatch(x: string, lookupTable: Record<string, string>) {
  let ret: string | undefined;
  let longest: number = 0;
  for (const [name, value] of Object.entries(lookupTable)) {
    if (x.startsWith(value) && value.length > longest) {
      ret = name;
      longest = value.length;
    }
  }
  return [ret, longest];
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, taking a look.

For context, we import Custom Resource Definitions(these are json schemas) in cdk8s and there is an issue with importing an enum in Prometheus CRD: https://github.com/prometheus-operator/kube-prometheus/blob/373e2b415bd0ddbcb049dea54db80c2a78b76945/manifests/setup/0alertmanagerConfigCustomResourceDefinition.yaml#L67-L71

It does not have these symbols occurring between other words but it makes sense to add this. My thought process here was to just add an allowlist so that we can add to that list if we agree with providing an exception for the user.

@@ -0,0 +1,58 @@
const allowListCharacters = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in SHOUTY_CASE since it's a constant.

And "allow list" is very generic. How about NAMED_SYMBOLS ?

Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
Comment on lines 355 to 359
VALUE_0_PERIOD = 0.9,
/** 0.95 */
VALUE_0_95 = 0.95,
VALUE_0_PERIOD_95 = 0.95,
/** 0.99 */
VALUE_0_99 = 0.99,
VALUE_0_PERIOD_99 = 0.99,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callout: Looks like this would be a breaking change and would change enum names in certain scenarios.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good catch, that's probably not great.

Can we make an exception for a period inside numbers?

@vinayak-kukreja vinayak-kukreja changed the title fix: name consisting of only special characters lead to conflicting enum values fix!: name consisting of only special characters lead to conflicting enum values Dec 4, 2023
@vinayak-kukreja vinayak-kukreja changed the title fix!: name consisting of only special characters lead to conflicting enum values fix!: enum name consisting of symbols lead to conflicting enum values Dec 4, 2023
Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
/** Foo-Bar */
FOO_BAR = \\"Foo-Bar\\",
/** Bar_Foo */
BAR_FO = \\"Bar_Foo\\",
Copy link
Contributor Author

@vinayak-kukreja vinayak-kukreja Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look correct. Taking a look.

  /** Bar_Foo */
  BAR_FO = \\"Bar_Foo\\",

Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
@vinayak-kukreja vinayak-kukreja changed the title fix!: enum name consisting of symbols lead to conflicting enum values fix: enum name consisting of symbols lead to conflicting enum values Dec 5, 2023

function isExemptedSymbolPattern(input: string): boolean {
// Decimal numbers
if (/^\d*\.\d+$/.test(input)) {
Copy link
Contributor

@rix0rrr rix0rrr Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does something else than it looks like it's doing, and that's dangerous (I'm not sure if it's a bug, but at least it's confusing).

isExemptedSymbolPattern will only ever be called on the start of a string, because we keep constantly slicing off the start of the string that we have already replaced. So the "cursor" is focused on the . right now, and we've already gotten rid of everything in front of the cursor.

That means the \d* will never match anything, because it has already been removed.

That's fine, it apparently works because the tests are passing, but we're not testing for what this regex looks to be indicating we are: we're not testing for a . in between numbers: we're only testing for a . followed by numbers.

I suppose the difference should show up if we were going to translate an enum value like 1., which in the old system would be replaced by VALUE_1_ and in the new one by VALUE_1_PERIOD ?

So -- is that a problem? I'm not sure, I'll leave that to you. But if you're going to leave the current code, please update the regex so it doesn't look like it's matching digits at the start anymore.


If you want to be able to test full context, we're going to have to change the code a little:

Instead of slicing off the start of the string, every time we match something, instead we'll keep a counter ofs and we'll iterate over the string without mutating it:

let ofs = 0;
while (ofs < value.length) {
   const [prefixName, prefixLen] = longestPrefixMatch(value, ofs, NAMED_SYMBOLS);
   // ...
   ofs += prefixLen;
}

value.startsWith(x) now needs to be replaced with something like hasSubStringAt(value, x, ofs).

Finally, for the regex, we need to start searching at a specific index, instead of at the start. For that we use the flag y (sticky), which makes a regex match at a particular index, and not anywhere else. Plus, we acn use lookbehind assertions (?<=...) to say that something must be true for the string before the match, without actually being part of the match itself (or its index).

// RegExp API is a bit wonky in JavaScript so let's make a helper
function testRegexAt(regex: string, haystack: string, index: number) {
  const re = new RegExp(regex, 'y'); 
  re.lastIndex = index;
  return re.test(haystack);
}

function isExemptPattern(value: string, ofs: number) {
  const exemptPatterns = [
    // Note: use lookbehind to test for things before the current position
    '(?<=\d)\.\d',
    // ...more patterns here...?
   ];
   
  return exemptPatterns.some((p) => testRegexAt(p, value, ofs));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, right, I am dumb 😮‍💨

Thanks Rico, this was really helpful. Taking a look. 🙏

Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
Comment on lines +296 to +301
/** .9 */
VALUE_PERIOD_9 = \\".9\\",
/** 9 */
VALUE_9 = \\"9\\",
/** 9. */
VALUE_9_PERIOD = \\"9.\\",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callout:

This is a breaking change but I believe this is needed since it was not working and would give the same value.
Similar to these are,

Foo.
.Foo

For instance, if I run this with mainline, I see:

      | index.ts:39:3 - error TS2300: Duplicate identifier 'FOO'.
      | 39   FOO = ".Foo",
      |      ~~~
      | index.ts:43:3 - error TS2300: Duplicate identifier 'FOO'.
      | 43   FOO = "Foo.",
      |      ~~~
      | index.ts:47:3 - error TS2300: Duplicate identifier 'VALUE_9'.
      | 47   VALUE_9 = ".9",
      |      ~~~~~~~
      | index.ts:49:3 - error TS2300: Duplicate identifier 'VALUE_9'.
      | 49   VALUE_9 = "9",
      |      ~~~~~~~
      | index.ts:51:3 - error TS2300: Duplicate identifier 'VALUE_9'.
      | 51   VALUE_9 = "9.",

Signed-off-by: Vinayak Kukreja <vinakuk@amazon.com>
Comment on lines +283 to +285
FOO_HYPHEN_BAR = \\"Foo-Bar\\",
/** Foo_Bar */
FOO_UNDERSCORE_BAR = \\"Foo_Bar\\",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to prior, these are breaking and would not work with our current code. Both would give FOO_BAR.

@cdklabs-automation cdklabs-automation added this pull request to the merge queue Dec 8, 2023
Merged via the queue into main with commit 1285288 Dec 8, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Importing prometheus-operator CRD causes compile to fail
3 participants