Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix line folding #87

Merged
merged 11 commits into from
Jun 27, 2024
Merged

Fix line folding #87

merged 11 commits into from
Jun 27, 2024

Conversation

kekavc24
Copy link
Contributor

@kekavc24 kekavc24 commented Jun 4, 2024

This PR:


  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:

Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

@kekavc24
Copy link
Contributor Author

kekavc24 commented Jun 4, 2024

@jonasfj This is the draft. Forgive the commits from #84, I will rebase this branch once it is merged.

Also, I added a funky method to reduce the code duplication after several iterations for both _tryYamlEncodeLiteral and _tryYamlEncodeFolded. Should I just duplicate the code? As both methods attempt to chomp strings the same way.

Will add tests for this then.

Comment on lines 190 to 191
/// Assumes the user did not try to account for windows documents by using
/// `\r\n` already
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to state that. The user should never account for encoding details.

Sematically, a YAML document consists of YamlNodes, see package:yaml
If a YamlNode is a YamlScalar then it can have a YamlScalar.value that is a string.
If value is a string, then that is the string that must be returned when the document is parsed.

If the user actually gives us a string that contains \r\n we can't encode it in folded or literal mode. The only way to represent a \r character is to do quoted string "\r\n" -- am I wrong? Is it possible to string escaping in folded or literal strings? I imagine it's not possible, but I could be wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest you take a look at:
https://github.com/dart-lang/yaml_edit/blob/main/lib/src/editor.dart#L564-L583

Essentially, if I use YamlEditor.update(path, foo) then deepEquals(YamlEditor.parseAt(path), foo) == true. And this MUST always be true (assuming that deepEquals ignores the styling and only looks at .value).
No matter what string I insert or what style I attempt to use, YamlEditor must fallback to use a less pretty style, if it can't represent a string in the style I chose.

For example, if I try to use special characters or \0 null bytes or whatever, then this should still hold. But YamlEditor might have to fallback to quoted strings, even if I requested a Folded string style. The style is just a hint, not a requirement, because not all styles can be used for all values.

/// [_tryYamlEncodeFolded] when in [CollectionStyle.BLOCK].
(
String literalPrefix,
String indent,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning this seems a bit extreme. The caller already has indentSize, why can't the caller just do ' ' * indentSize, I think should be fine.

In general, I'd suggest we only pass around indentation as an integer.

String trimmedString,
String strippedString,
) _prepareBlockString(
String blockIndicator, String string, int indentSize, String lineEnding) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, feel free to insert assertions, they are practically free when a program is compiled in release mode. And a few sanity checks like this can only help catch bugs :D

Suggested change
String blockIndicator, String string, int indentSize, String lineEnding) {
String blockIndicator, String string, int indentSize, String lineEnding) {
assert(indentSize >= 0);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised there is no need to assert. We can pass in the indent and just apply to all strings that appear after \n

@kekavc24 kekavc24 requested a review from jonasfj June 6, 2024 08:59

final trimmedString = string.trimRight();
final removedPortion = string.substring(trimmedString.length);
String _tryYamlEncodeFolded(String string, int indentSize, String lineEnding) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be bigger than what you set out to do. But reading the code, I thinking that maybe we should restructure it a bit.


Maybe we should change how _tryYamlEncodeFolded and _tryYamlEncodeLiteral works.

What if we change the signature to:

/// Try to encode [value] as a _YAML folded string_ indent with [indent] spaces.
///
/// This method will apply an appropriate _chomping indicator_, if this is necessary
/// in order to correctly encode [value].
///
/// **Example** of a folded string.
/// ```yaml
/// key: >-
///   my folded
///   string
/// ```
///
/// See: https://yaml.org/spec/1.2.2/#813-folded-style
///
/// Returns `null`, if [value] cannot be encoded as a YAML folded string.
String? _tryYamlEncodeFolded(String value, int indent, String lineEnding) 

So we return null, when the value can't be encoded correctly as a folded string.

Then we get to change yamlEncodeBlockScalar such that it tries to encode in the style hinted by YamlScalar.style, and if this fails, we can fallback to a different block-mode, and if all block-modes fail we can fallback to quoted string mode.

This way, we don't have to write yamlEncodeBlockScalar such that it figures out of the value can be encoded in folded mode, and then _tryYamlEncodeFolded does the encoding.
Which is also weird, because _tryYamlEncodeFolded starts with try, but it actually doesn't have a mechanism for communicating back to say "oh, I can't correctly encode this value".

This way: _tryYamlEncodeFolded is responsible for:

  • (A) Determining if value can be encoded as a folded string.
  • (B) Actually encoding value as a folded string.

And yamlEncodeBlockScalar is responsible for:

  • (1) Trying _tryYamlEncodeFolded or _tryYamlEncodeLiteral (depending on YamlScalar.style).
  • (2) Falling back to another style if the preferred style couldn't be used.
  • (3) Falling back to using quoted string encoding (which always works), if none of the block styles could be used.

And yamlEncodeBlockScalar doesn't have to know what values can be encoded in what styles, it just determines the preferred option and tries it.

Copy link
Contributor Author

@kekavc24 kekavc24 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see this. Lemme get to it. Scratch the request to review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No rush :)

We can also do it in a follow up.

Btw, there values we can't encode right?
Like special characters and some weird combination of whitespace and line breaks I would also assume.

Copy link
Contributor Author

@kekavc24 kekavc24 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, this PR is good to go.

No rush :)

We can also do it in a follow up.

Btw, there values we can't encode right? Like special characters and some weird combination of white space and line breaks I would also assume.

Your suggestions in the previous comment are in the ball park of what I had in mind.

I realised the code below makes the tests pass and we never get to encode the folded string as >+ my string as it trims the space as the end including \n. I removed #44 for that reason.

https://github.com/kekavc24/yaml_edit/blob/08a146ef8f2c7aba908d2017c9ec840b882c9e51/lib/src/strings.dart#L186-L196

No rush :)

We can also do it in a follow up.

Btw, there values we can't encode right? Like special characters and some weird combination of white space and line breaks I would also assume.

Yeah. The trailing line breaks and white-space stand out. Your suggestions actually make it possible to correctly debug what the issue is. I'll look into it.

I can still tackle it in this PR. No issue. Your call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestions actually make it possible to correctly debug what the issue is. I'll look into it.

I can still tackle it in this PR. No issue. Your call.

It's probably a good idea to spend some time making sure we get this right :D

It'd be really nice if we didn't have so many weird string encoding bugs.

@jonasfj
Copy link
Member

jonasfj commented Jun 6, 2024

@kekavc24 I landed #84 so please rebase this PR, or merge the main branch into it.

@kekavc24 kekavc24 marked this pull request as ready for review June 6, 2024 12:54
@kekavc24 kekavc24 requested a review from jonasfj June 6, 2024 12:55
@kekavc24
Copy link
Contributor Author

kekavc24 commented Jun 6, 2024

@jonasfj I've updated the change log with changes from #84 too.

String literalPrefix,
String trimmedString,
String strippedString
) _prepareBlockString(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this method is worth the dedupliation?

It seems weird that we do trimmed.replaceAll('\n', lineEnding + indent) and then inside _tryYamlEncodeFolded we split it on lineEnding.

It's unclear what we return from this method too.

I think may we should just let this be duplicated into the two method.

Then _tryYamlEncodeFolded can split on \n instead of splitting on lineEnding, which is probably going to result in the same thing, but hopefully going to be less confusing.

Copy link
Contributor Author

@kekavc24 kekavc24 Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was part of the existing code. I just extracted that function because encoding with ScalarStyle.FOLDED or ScalarStyle.LITERAL follow the same initial steps.

It seems weird that we do trimmed.replaceAll('\n', lineEnding + indent) and then inside _tryYamlEncodeFolded we split it on lineEnding.

It's unclear what we return from this method too.

I think may we should just let this be duplicated into the two method.

Then _tryYamlEncodeFolded can split on \n instead of splitting on lineEnding, which is probably going to result in the same thing, but hopefully going to be less confusing.

I am getting confused. Help me clarify something. If I do,

final multiLine = '''
My multiline string
is great
''';

Will Dart return the line breaks as \r\n when on Windows or just plain \n. This influenced my decision when refactoring that piece of code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't know, I would suspect it always uses \n, because anything just seems crazy 🤣

But as far as yaml_edit is concerned I don't see how that's relevant.

If package:yaml parses a YAML document that uses \n\r as line breaks, it won't included that kind of line breaks in folded or literal strings.

The only way to get that into a string in YAML is to use a double quoted string with escape characters: "\n\r".

I'm like 99% sure that parsing YAML on Windows won't cause multi-line strings to be broken by \n\r, they'll always be broken by \n once parsed. The actual YAML file on disk should contain \n\r, that's correct. But when parsed the YamlScalar.value should never be a string that contains \n\r, unless it was a double quoted string with an escape sequence.

Does that makes sense? Yes it's easy to get lost, and I'll admit there is a fair 12.3% risk I missed something here 🤣

Comment on lines 118 to 120
trimmedString =
trimmedSplit.foldIndexed(first, (index, previous, current) {
if (index == 0) return previous;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at reduceIndexed, in which case you might not need the index and you can just use reduce.

Calls combine for each element except the first.

Then you probably won't need the if (trimmedSplit.length > 1) {... else .... either.

I suspect that [foo].reduce(...) == foo (a list of length one always reduces to the first element).


final trimmedString = string.trimRight();
final removedPortion = string.substring(trimmedString.length);
String _tryYamlEncodeFolded(String string, int indentSize, String lineEnding) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestions actually make it possible to correctly debug what the issue is. I'll look into it.

I can still tackle it in this PR. No issue. Your call.

It's probably a good idea to spend some time making sure we get this right :D

It'd be really nice if we didn't have so many weird string encoding bugs.

lib/src/strings.dart Outdated Show resolved Hide resolved
Comment on lines +8 to +9
- Fix error thrown when inserting in nested list using `spliceList` method
([#83](https://github.com/dart-lang/yaml_edit/issues/83))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to file this as a separate PR, if you want. Then we get to land it quickly :D

Tagging it along here is harmless though.

@kekavc24 kekavc24 requested a review from jonasfj June 10, 2024 08:16
///
/// Apply the `keep (+)` chomping indicator for trailing whitespace to be
/// treated as content.
if (string.endsWith('\n ') || string.endsWith('\n')) return '+';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  if (string.endsWith('\n ') || string.endsWith('\n')) return '+';

What if it ends with '\n '?

I'm asking because I'm not 100% sure 🤣

It just seems like we could use a regular expression, like \n *$

Comment on lines 105 to 63
/// It is important that we ensure that [string] is free of unprintable
/// characters by calling [_hasUnprintableCharacters] before invoking this
/// function.
String _tryYamlEncodeSingleQuoted(String string) {
String? _tryYamlEncodeSingleQuoted(String string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we update the documentation comment, to say that it return null, if [string] can't be encoded as a single quoted string?

Also I guess we can remove the part from the documentation string about _hasUnprintableCharacters

And instead, just check if string contains anything from _hasUnprintableCharacters, and if so, just return null. Then the documentation comment becomes much simpler.

Like:

/// Encodes [string] as YAML single quoted string.
///
/// Returns `null`, if [string] can't be encoded as single-quoted string.
/// This might happen if it contains line-breaks or [_hasUnprintableCharacters].
///
/// See: https://yaml.org/spec/1.2.2/#732-single-quoted-style

Comment on lines 237 to 250
if (!_hasUnprintableCharacters(val)) {
if (value.style == ScalarStyle.SINGLE_QUOTED) {
encoded = _tryYamlEncodeSingleQuoted(val);
} else if (value.style == ScalarStyle.FOLDED) {
encoded = _tryYamlEncodeFolded(val, indentation, lineEnding);
} else if (value.style == ScalarStyle.LITERAL) {
encoded = _tryYamlEncodeLiteral(val, indentation, lineEnding);
}
}

if (encoded != null) return encoded;
}

return _tryYamlEncodePlain(value.value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like this?

We make all the _try... methods return null if they can't encode correctly.
And they we try the best option first, and just try a few relevant fallbacks.

It's probably rare that literal works when folded doesn't, and vice versa, but it can happen, right? And trying is probably rather cheap.

I realize that we do end up checking _hasUnprintableCharacters inside each method. But I'm guess it's fairly cheap, and we'll only run it more than once in the cases where there actually is something unprintable. So it won't really affect performance much (after all, it's rare that there are unprintable characters).

Suggested change
if (!_hasUnprintableCharacters(val)) {
if (value.style == ScalarStyle.SINGLE_QUOTED) {
encoded = _tryYamlEncodeSingleQuoted(val);
} else if (value.style == ScalarStyle.FOLDED) {
encoded = _tryYamlEncodeFolded(val, indentation, lineEnding);
} else if (value.style == ScalarStyle.LITERAL) {
encoded = _tryYamlEncodeLiteral(val, indentation, lineEnding);
}
}
if (encoded != null) return encoded;
}
return _tryYamlEncodePlain(value.value);
switch (value.style) {
// Prefer 'plain', fallback to "double quoted"
case ScalarStyle.PLAIN:
return _tryYamlEncodePlain(value.value) ?? _yamlEncodeDoubleQuoted(val);
// Prefer 'single quoted', fallback to "double quoted"
case ScalarStyle.SINGLE_QUOTED:
return _tryYamlEncodeSingleQuoted(val) ?? _yamlEncodeDoubleQuoted(val);
// Prefer folded string, try literal as fallback
// otherwise fallback to "double quoted"
case ScalarStyle.FOLDED:
return _tryYamlEncodeFolded(val, indentation, lineEnding)
?? _tryYamlEncodeLiteral(val, indentation, lineEnding)
?? _yamlEncodeDoubleQuoted(val);
// Prefer literal string, try folded as fallback
// otherwise fallback to "double quoted"
case ScalarStyle.LITERAL:
return _tryYamlEncodeLiteral(val, indentation, lineEnding)
?? _tryYamlEncodeFolded(val, indentation, lineEnding)
?? _yamlEncodeDoubleQuoted(val);
// Prefer plain, try 'single quote', fallback to "double quoted"
default:
return _tryYamlEncodePlain(value.value)
?? _tryYamlEncodeSingleQuoted(val)
?? _yamlEncodeDoubleQuoted(val);
}
}

Copy link
Contributor Author

@kekavc24 kekavc24 Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_tryYamlEncodePlain internally falls back to _yamlEncodeDoubleQuoted. Should I make it return null if it fails instead? I thought of changing the name but left it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's, then it'll be consistent.

If we do a switch statement like above, we get to deal with one thing in each method. And the heuristics of what we fall back to is all located in the switch statement.

@kekavc24 kekavc24 requested a review from jonasfj June 11, 2024 17:01
Comment on lines 256 to 224
final isString = value is String;

return _tryYamlEncodePlain(value.value);
if (isString && _hasUnprintableCharacters(value)) {
return _yamlEncodeDoubleQuoted(value);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If value is not a string, then we can always do a plain encoding, right?

if (value is! string) {
 return value.toString();
}

or since we know that a plain encoding will work we can do:

if (value is! string) {
 return _tryYamlEncodePlain(value)!;
}

Or we could make a method called yamlEncodePlain(value) which will do a plain encoding, and throw an AssertionError if that's not possible.

Comment on lines 275 to 244
/// Prefer folded string, try literal as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.FOLDED when isString:
return _tryYamlEncodeFolded(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);

/// Prefer literal string, try folded as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.LITERAL when isString:
return _tryYamlEncodeLiteral(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the comments.

Suggested change
/// Prefer folded string, try literal as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.FOLDED when isString:
return _tryYamlEncodeFolded(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);
/// Prefer literal string, try folded as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.LITERAL when isString:
return _tryYamlEncodeLiteral(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);
/// Prefer folded string, fallback to "double quoted"
case ScalarStyle.FOLDED when isString:
return _tryYamlEncodeFolded(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);
/// Prefer literal string, fallback to "double quoted"
case ScalarStyle.LITERAL when isString:
return _tryYamlEncodeLiteral(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);

or try with two fallbacks, though I can see that this might be overkill 🤣

Suggested change
/// Prefer folded string, try literal as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.FOLDED when isString:
return _tryYamlEncodeFolded(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);
/// Prefer literal string, try folded as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.LITERAL when isString:
return _tryYamlEncodeLiteral(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);
/// Prefer folded string, try literal as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.FOLDED when isString:
return _tryYamlEncodeFolded(value, indentation, lineEnding) ??
_tryYamlEncodeLiteral(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);
/// Prefer literal string, try folded as fallback
/// otherwise fallback to "double quoted"
case ScalarStyle.LITERAL when isString:
return _tryYamlEncodeLiteral(value, indentation, lineEnding) ??
_tryYamlEncodeFolded(value, indentation, lineEnding) ??
_yamlEncodeDoubleQuoted(value);

Comment on lines 51 to 66
/// Checks if a [string] has any unprintable characters or characters that
/// should be explicitly wrapped in double quotes according to
/// [unprintableCharCodes] and [doubleQuoteEscapeChars] respectively.
///
/// It should be noted that this check excludes the `\n` (line break)
/// character as it is encoded correctly when using [ScalarStyle.LITERAL] or
/// [ScalarStyle.FOLDED].
bool _shouldDoubleQuote(String string) {
if (string.isEmpty || string.trimLeft().length != string.length) return true;

final codeUnits = string.codeUnits;

return doubleQuoteEscapeChars.keys
.whereNot((charUnit) => charUnit == 10) // Anything but line breaks
.any(codeUnits.contains);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we remove this method?

It's too broad.

Example of things it does wrong:

  • empty string can be encoded with single quoted strings.
  • ", / and \ can all be encoded in single quoted mode, folded and literal mode.
  • Can \t be encoded in literal mode? maybe? perhaps that's not wise to do?

How about inside each:

  • _tryYamlEncodeSingleQuoted
  • _tryYamlEncodeFolded
  • _tryYamlEncodeLiteral

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed a bit of refactoring in #88

@kekavc24
Copy link
Contributor Author

@jonasfj I cherry-picked your changes and fixed some of the issues. random_test.dart is kinda flaky and confusing.

I've managed to implement a naive string wrapping for folded strings at 80 characters (as mentioned here). Still a WIP.

I resolved to wrap the individual lines as I am folding the entire string. Any suggestions for improvement are welcome,

@kekavc24 kekavc24 requested a review from jonasfj June 17, 2024 16:09
);

expect(doc.toString(), equals('|+\n test\n test\n '));
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I tried something like this in an online editor that converted yaml to JSON and I kept getting the resulting string as: "test\ntest\n \n"

Probably because the editor always ended the YAML document with a line-break: |+\n test\n test\n would become: |+\n test\n test\n \n (because files normally end with a line break).

I tried to recreate it, and it wasn't a problem. But I'm guessing it's unwise for us to produce a YAML document where adding empty lines to the YAML document causes it to change its semantics.

If I parse the YAML document |+\n test\n test\n \n, then I won't get "test\ntest\n \n".

Example:

  • YAML: A: |+\n test\n test\n
    • YAML to JSON: {"A":"test\ntest\n "} (this is correct!)
  • If I add \n at the end of the YAML I get: A: |+\n test\n test\n \n
    • YAML to JSON: {"A":"test\ntest\n \n"} (this is wrong)
  • If I add \nB: true at the end of the YAML I get: A: |+\n test\n test\n \nB: true
  • YAML to JSON: {"A":"test\ntest\n \n","B":true} (this is wrong)

So if we allow this as a literal string, we are VERY likely to cause all sorts of breakages next time we try to append a key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did another variant of this test case:

      test('generates literal string with trailing line break and space (2)',
          () {
        final doc = YamlEditor('key1: true');

        doc.update(
          ['key2'],
          wrapAsYamlNode('test\ntest\n ', scalarStyle: ScalarStyle.LITERAL),
        );
      });

it causes a crash:

  Assertion failed: (package:yaml_edit) Modification did not result in expected result.
  
  # YAML before edit:
  > key1: true
  
  # YAML after edit:
  > key1: true
  > key2: |+
  >     test
  >     test
  >      
  > 

I'm beginning to see how a literal string ending in \n might have to be encoded as a double quoted string. 🤣

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider falling back to double quoted mode for literal strings ending in newline + whitespace?

The whole adding a new key afterwards seems like a reasonable thing, and it's a little sad that this can't work.

Of course, we can't know what document the user gives us. So we could also view this as an orthogonal bug to do with adding keys to block maps and entries to block lists, because those are the scenarios where this could go wrong.

I don't know if there is a reasonably way to fix adding keys to block maps that have a preceeding key-value pair where the value is a literal string ending in \n 🤣 -- definitely we should not try to fix that in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also isn't it likely that the chomping indicator + is always going to cause us this kind of problems? Just asking, I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonasfj The final \n just before the next key is the issue.

It seems chomping indicators only control the final line break before going to the next YamlScalar (value in list or key in map) and not content within the scalar itself (which is only controlled by either > or |)

After some investigation, we can exclusively use the strip chomping indicator - for both ScalarStyle.FOLDED and ScalarStyle.LITERAL. It always indicates we don't want the last line break that's just before the next key in map or value in the list.

😅 The fixes I've tried coming up with for the keep chomping indicator + bug were all shots in the dark and just another bug creeping in.

I'm sorry and thanks! The bug was staring right at us from the beginning. This is the line I should have removed before fixing _tryYamlEncodeFolded

if (removedPortion.contains('\n')) {
result = '>+\n' + ' ' * indentation;
} else {
result = '>-\n' + ' ' * indentation;
}

lib/src/strings.dart Outdated Show resolved Hide resolved
lib/src/strings.dart Outdated Show resolved Hide resolved
@jonasfj
Copy link
Member

jonasfj commented Jun 18, 2024

Wow, this starting to be a lot, I tried diving in a little bit.

My quick takeaway after seeing a few more bugs is that maybe we should be more conservative about what kind of strings we attempt to encode as literal strings.

Perhaps also as folded strings. Perhaps it's better to fallback to double quote string in the edge cases, rather than too complicated code. Certainly, we can always try to improve and not make it fallback in a future PR.

> Fix invalid encoding for string with trailing line breaks or white-space
> Update `_tryYamlEncodedPlain` to return null if string cannot be encoded
> Make `_yamlEncodeFlowScalar` and `yamlEncodeBlockScalar` code concise
> `_yamlEncodeFlowScalar` and `_yamlEncodeBlockScalar` always encode YamlScalars
@kekavc24
Copy link
Contributor Author

kekavc24 commented Jun 19, 2024

It seems chomping indicators only control the final line break before going to the next YamlScalar (value in list or key in map) and not content within the scalar itself (which is only controlled by either > or |).

After some investigation, we can exclusively use the strip chomping indicator - for both ScalarStyle.FOLDED and ScalarStyle.LITERAL. It always indicates we don't want the last line break that's just before the next key in map or value in the list.

@jonasfj I'm sorry for the chomping indicator oversight on my part.

I've reduced the scope of this PR and fixed the chomping indicator issue (or rather removed the chomping indicator + references). If you still want me to work on this or any other issue, I am happy to.

@kekavc24 kekavc24 requested a review from jonasfj June 19, 2024 03:58
lib/src/strings.dart Outdated Show resolved Hide resolved
}
/// Simplest block style.
/// * https://yaml.org/spec/1.2.2/#812-literal-style
return '|-\n$indent${string.replaceAll('\n', lineEnding + indent)}';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better than what we had, so let's accept it.

But I suspect this could cause issues for strings that end in empty lines or whitespace, or combination thereof.

@jonasfj
Copy link
Member

jonasfj commented Jun 27, 2024

I think we should land this as is, it's an overall improvement.

I think we should try to add more tests that challenge _tryYamlEncodeFolded and _tryYamlEncodeLiteral.
In particular with whitespace and linebreaks, in the middle, start and end.

I think we should try to just make smaller PRs that only touch either _tryYamlEncodeFolded or _tryYamlEncodeLiteral, because small improvements are easier to land.

@jonasfj jonasfj merged commit ad3292c into dart-lang:main Jun 27, 2024
2 checks passed
@kekavc24 kekavc24 deleted the fix-line-folding branch June 27, 2024 15:16
copybara-service bot pushed a commit to dart-lang/sdk that referenced this pull request Jun 27, 2024
… source_span, stream_channel, term_glyph, tools, vector_math, watcher, webdev, yaml_edit

Revisions updated by `dart tools/rev_sdk_deps.dart`.

dartdoc (https://github.com/dart-lang/dartdoc/compare/88df88c..7e5da60):
  7e5da609  2024-06-26  Sam Rawlins  Simplify some end2end tests into smaller tests (dart-lang/dartdoc#3795)
  36f1fc74  2024-06-26  Sam Rawlins  Store comment reference data on ModelNode (dart-lang/dartdoc#3797)
  7ed6ef80  2024-06-24  Sam Rawlins  Hide constructors that cannot be called or referenced by user code (dart-lang/dartdoc#3796)

ecosystem (https://github.com/dart-lang/ecosystem/compare/31148cd..54ca01a):
  54ca01a  2024-06-25  Devon Carew  improvements for transferred issues (dart-lang/ecosystem#274)

html (https://github.com/dart-lang/html/compare/3bc803d..f6c2c71):
  f6c2c71  2024-06-24  Kevin Moore  update lints (dart-lang/html#249)

http (https://github.com/dart-lang/http/compare/4d8e7ef..bf96551):
  bf96551  2024-06-26  Nate Bosch  Call out more limitations of runWithClient (dart-lang/http#1248)
  eb189e1  2024-06-26  Brian Quinlan  [cupertino_http] Fix a bug where content-length was not set for multipart messages (dart-lang/http#1247)
  0bbd166  2024-06-25  Hossein Yousefi  [cronet_http] Upgrade jni and jnigen to 0.9.3 (dart-lang/http#1246)
  dce17c6  2024-06-24  Kevin Moore  misc: add missing packages to root readme (dart-lang/http#1245)
  0b40397  2024-06-24  Kevin Moore  update lints (dart-lang/http#1244)
  2971f32  2024-06-24  Kevin Moore  Update lints (dart-lang/http#1243)
  497bc15  2024-06-24  Parker Lougheed  Update various flutter.dev and dart.dev links (dart-lang/http#1238)
  5f6fcae  2024-06-24  Parker Lougheed  Fix some minor spelling and grammar mistakes (dart-lang/http#1239)
  b7ec613  2024-06-24  Anikate De  pkgs/ok_http: DevTools Networking Support. (dart-lang/http#1242)

json_rpc_2 (https://github.com/dart-lang/json_rpc_2/compare/5b1cbd6..616937f):
  616937f  2024-06-26  Kevin Moore  Update lints, test JS & Wasm (dart-lang/json_rpc_2#116)

mime (https://github.com/dart-lang/mime/compare/4ca2f5e..fd7010b):
  fd7010b  2024-06-24  Kevin Moore  bump lints (dart-lang/mime#127)
  99e9855  2024-06-24  Jonas Finnemann Jensen  Add `topics` to `pubspec.yaml` (dart-lang/mime#121)

mockito (https://github.com/dart-lang/mockito/compare/2302814..9deddcf):
  9deddcf  2024-06-25  Sam Rawlins  Fix lint in examples.
  e99ba54  2024-06-24  Sam Rawlins  Properly generate code for parameter default value Strings.

source_span (https://github.com/dart-lang/source_span/compare/59a3903..89520f3):
  89520f3  2024-06-24  Kevin Moore  bump lints (dart-lang/source_span#114)

stream_channel (https://github.com/dart-lang/stream_channel/compare/b41ff7a..dc620d2):
  dc620d2  2024-06-24  Kevin Moore  bump lints (dart-lang/stream_channel#108)

term_glyph (https://github.com/dart-lang/term_glyph/compare/c86e817..6c2a977):
  6c2a977  2024-06-24  Kevin Moore  bump lints (dart-lang/term_glyph#54)

tools (https://github.com/dart-lang/tools/compare/e660a68..4c60686):
  4c60686  2024-06-24  Christopher Fujino  delete old log file if it exceeds kMaxLogFileSize of 25MiB (dart-lang/tools#277)
  7a231e5  2024-06-24  Parker Lougheed  [unified_analytics] Add lints to consistently use final (dart-lang/tools#278)

vector_math (https://github.com/google/vector_math.dart/compare/e7b11a2..a4304d1):
  a4304d1  2024-06-24  Kevin Moore  Update and fix lints, require Dart 3.1 (google/vector_math.dart#328)
  253f69b  2024-06-24  Kevin Moore  blast_repo fixes (google/vector_math.dart#327)

watcher (https://github.com/dart-lang/watcher/compare/c00fc2a..f312f1f):
  f312f1f  2024-06-24  Kevin Moore  update lints (dart-lang/watcher#169)

webdev (https://github.com/dart-lang/webdev/compare/c566112..75417c0):
  75417c09  2024-06-26  Ben Konyi  Remove remaining MV2 logic (dart-lang/webdev#2453)

yaml_edit (https://github.com/dart-lang/yaml_edit/compare/08a146e..ad3292c):
  ad3292c  2024-06-27  Kavisi  Fix line folding (dart-lang/yaml_edit#87)

Change-Id: Ie2f914dcf97e10cc5a8501c926ddd7e254a0d151
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/373443
Auto-Submit: Devon Carew <devoncarew@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
Reviewed-by: Konstantin Shcheglov <scheglov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants