Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: tab support for indentation stripping #9971

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

thecaralice
Copy link

Motivation

Many users prefer using hard tabs for indentation, but the existing indentation stripping logic only considers spaces as indentation characters and therefore doesn't strip any tab characters.

Context

Closes #7834.

Previously, stripIndentation would look for the longest common space-consisting line prefix; now it finds either the longest common space-consisting or tab-consisting prefix. A tab after a space is not considered as indentation, neither is a space after a tab.

@github-actions github-actions bot added the with-tests Issues related to testing. PRs with tests have some priority label Feb 8, 2024
@tomberek
Copy link
Contributor

tomberek commented Feb 9, 2024

First consideration is if this changes historical evaluation values.

@edolstra
Copy link
Member

edolstra commented Feb 9, 2024

See #2911, where the conclusion was that we cannot make this incompatible change without some kind of language versioning mechanism.

@thecaralice
Copy link
Author

See #2911, where the conclusion was that we cannot make this incompatible change without some kind of language versioning mechanism.

This change is unlikely to break anything, as it only changes how tab-indented files get parsed. More precisely, for an indented string to be affected, every line of text in it must start with a tab and there must be no additional indentation, for example:

''
<tab>something
<tab>something else
<tab>something again
''

The following, however, is not affected by the change:

''
  <tab>something
  <tab>something else
  <tab>something again
''

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-02-28-nix-team-meeting-129/40499/1

@pennae pennae mentioned this pull request Feb 29, 2024
Copy link
Contributor

@fricklerhandwerk fricklerhandwerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gather from #7834 that the actual issue is subtle behavior happening silently, therefore the change mainly needs an (opt-out) warning or error on mixed indentation to reduce opportunities for problems rather than introducing more of them.

Also this needs a release note and documentation in the manual, as this would be a substantial change.

@@ -1 +1 @@
"This is an indented multi-line string\nliteral. An amount of whitespace at\nthe start of each line matching the minimum\nindentation of all lines in the string\nliteral together will be removed. Thus,\nin this case four spaces will be\nstripped from each line, even though\n THIS LINE is indented six spaces.\n\nAlso, empty lines don't count in the\ndetermination of the indentation level (the\nprevious empty line has indentation 0, but\nit doesn't matter).\nIf the string starts with whitespace\n followed by a newline, it's stripped, but\n that's not the case here. Two spaces are\n stripped because of the \" \" at the start. \nThis line is indented\na bit further.\nAnti-quotations, like so, are\nalso allowed.\n The \\ is not special here.\n' can be followed by any character except another ', e.g. 'x'.\nLikewise for $, e.g. $$ or $varName.\nBut ' followed by ' is special, as is $ followed by {.\nIf you want them, use anti-quotations: '', \${.\n Tabs are not interpreted as whitespace (since we can't guess\n what tab settings are intended), so don't use them.\n\tThis line starts with a space and a tab, so only one\n space will be stripped from each line.\nAlso note that if the last line (just before the closing ' ')\nconsists only of whitespace, it's ignored. But here there is\nsome non-whitespace stuff, so the line isn't removed. \nThis shows a hacky way to preserve an empty line after the start.\nBut there's no reason to do so: you could just repeat the empty\nline.\n Similarly you can force an indentation level,\n in this case to 2 spaces. This works because the anti-quote\n is significant (not whitespace).\nstart on network-interfaces\n\nstart script\n\n rm -f /var/run/opengl-driver\n ln -sf 123 /var/run/opengl-driver\n\n rm -f /var/log/slim.log\n \nend script\n\nenv SLIM_CFGFILE=abc\nenv SLIM_THEMESDIR=def\nenv FONTCONFIG_FILE=/etc/fonts/fonts.conf \t\t\t\t# !!! cleanup\nenv XKB_BINDIR=foo/bin \t\t\t\t# Needed for the Xkb extension.\nenv LD_LIBRARY_PATH=libX11/lib:libXext/lib:/usr/lib/ # related to xorg-sys-opengl - needed to load libglx for (AI)GLX support (for compiz)\n\nenv XORG_DRI_DRIVER_PATH=nvidiaDrivers/X11R6/lib/modules/drivers/ \n\nexec slim/bin/slim\nEscaping of ' followed by ': ''\nEscaping of $ followed by {: \${\nAnd finally to interpret \\n etc. as in a string: \n, \r, \t.\nfoo\n'bla'\nbar\ncut -d $'\\t' -f 1\nending dollar $$\n"
"This is an indented multi-line string\nliteral. An amount of whitespace at\nthe start of each line matching the minimum\nindentation of all lines in the string\nliteral together will be removed. Thus,\nin this case four spaces will be\nstripped from each line, even though\n THIS LINE is indented six spaces.\n\nAlso, empty lines don't count in the\ndetermination of the indentation level (the\nprevious empty line has indentation 0, but\nit doesn't matter).\nIf the string starts with whitespace\n followed by a newline, it's stripped, but\n that's not the case here. Two spaces are\n stripped because of the \" \" at the start. \nThis line is indented\na bit further.\nAnti-quotations, like so, are\nalso allowed.\n The \\ is not special here.\n' can be followed by any character except another ', e.g. 'x'.\nLikewise for $, e.g. $$ or $varName.\nBut ' followed by ' is special, as is $ followed by {.\nIf you want them, use anti-quotations: '', \${.\n Tabs are not interpreted as whitespace (since we can't guess\n what tab settings are intended), so don't use them.\n\tThis line starts with a space and a tab, so only one\n space will be stripped from each line.\nAlso note that if the last line (just before the closing ' ')\nconsists only of whitespace, it's ignored. But here there is\nsome non-whitespace stuff, so the line isn't removed. \nThis shows a hacky way to preserve an empty line after the start.\nBut there's no reason to do so: you could just repeat the empty\nline.\n Similarly you can force an indentation level,\n in this case to 2 spaces. This works because the anti-quote\n is significant (not whitespace).\nstart on network-interfaces\n\nstart script\n\n rm -f /var/run/opengl-driver\n ln -sf 123 /var/run/opengl-driver\n\n rm -f /var/log/slim.log\n \nend script\n\nenv SLIM_CFGFILE=abc\nenv SLIM_THEMESDIR=def\nenv FONTCONFIG_FILE=/etc/fonts/fonts.conf \t\t\t\t# !!! cleanup\nenv XKB_BINDIR=foo/bin \t\t\t\t# Needed for the Xkb extension.\nenv LD_LIBRARY_PATH=libX11/lib:libXext/lib:/usr/lib/ # related to xorg-sys-opengl - needed to load libglx for (AI)GLX support (for compiz)\n\nenv XORG_DRI_DRIVER_PATH=nvidiaDrivers/X11R6/lib/modules/drivers/ \n\nexec slim/bin/slim\nEscaping of ' followed by ': ''\nEscaping of $ followed by {: \${\nAnd finally to interpret \\n etc. as in a string: \n, \r, \t.\nfoo\n'bla'\nbar\ncut -d $'\\t' -f 1\nending dollar $$\nThis text uses\n\ttabs\nfor indentation\nAnd this text uses\n spaces\nbut is indented with tabs\n \tThis text uses\n\t both spaces and tabs\n\t\tso nothing is stripped\n\t"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ew, not your fault but this is not practically reviewable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing I can help with here

@@ -189,18 +195,25 @@ inline Expr * ParserState::stripIndentation(const PosIdx pos,
continue;
}
for (size_t j = 0; j < str->l; ++j) {
auto cur = str->p[j];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this very hard to read already, and adding more stuff doesn't make it better. Not tested, but pretty sure it can be simplified to:

for (size_t j = 0; j < str->l; ++j) {
    auto cur = str->p[j];

    if (cur == '\n') {
        atStartOfLine = true;
        curIndent = 0;
        continue;
    }

    if (atStartOfLine) {
        // probably we first want to check if `cur` is the other `indentChar`
        // and issue a warning to prevent accidental mixed indentation (opt-out with a setting)
        if (indentChar == cur || (!indentChar && (cur == ' ' || cur == '\t'))) {
            if (!indentChar) indentChar = IndentChar(cur);
            curIndent++;
        } else {
            atStartOfLine = false;
            if (curIndent < minIndent) minIndent = curIndent;
        }
    }
}

@@ -175,6 +180,7 @@ inline Expr * ParserState::stripIndentation(const PosIdx pos,
/* Figure out the minimum indentation. Note that by design
whitespace-only final lines are not taken into account. (So
the " " in "\n ''" is ignored, but the " " in "\n foo''" is.) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the " " in "\n ''" is ignored, but the " " in "\n foo''" is.) */
the " " in "\n ''" is ignored, but the " " in "\n foo''" is not.) */

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope as well but valid

@@ -175,6 +180,7 @@ inline Expr * ParserState::stripIndentation(const PosIdx pos,
/* Figure out the minimum indentation. Note that by design
whitespace-only final lines are not taken into account. (So
the " " in "\n ''" is ignored, but the " " in "\n foo''" is.) */
std::optional<IndentChar> indentChar = std::nullopt;
bool atStartOfLine = true; /* = seen only whitespace in the current line */
size_t minIndent = 1000000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe to be less arbitrary:

Suggested change
size_t minIndent = 1000000;
size_t minIndent = ~static_cast<size_t>(0); // maximum unsigned integer

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is out of scope I think but alright, will do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do not. this too changes parser semantics!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pennae do you suggest that someone with more than a million indent spaces will be surprised by the parser stripping more spaces than it used to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes? this will have an impact on derivation hashes, which are not reasonable to expect to change between nix versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @pennae. This might look silly, but changes like that can have a huge impact, so we shouldn't change behaviour in any way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular one seems very silly to me. How likely is it that code exists in the wild that relies on this behavior? We don't have a quantifiable measure of stability, but evaluation results of Nixpkgs stable releases could be a good approximation.

If we take such an absolute stance on compatibility as @pennae and @thufschmitt are suggesting, we can essentially give up on evolving the code. This can be a reasonable strategy, but is it desirable? And in any case it should be communicated clearly, otherwise a lot of people will waste their time trying to do things that will end up being rejected.

@L-as
Copy link
Member

L-as commented Mar 9, 2024

Is there any good argument for why language versioning should be more than a nix-language-version setting along with an attribute in flakes to set it? Flakes are already experimental and don't need an RFC to be changed, and the nix-language-version setting could be guarded behind an experimental feature too.

}
if (!state.indent(cur)) state.stop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the latest change. All I suggested was simplifying the logic in place to make it more readable without impact on performance characteristics. Adding abstractions might increase the overhead measurably, and although I have no numbers to substantiate what I'm saying, I'd argue such refactorings should always come with benchmarks. Nixpkgs and NixOS are huge code bases that already suffer performance issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I wouldn't call C++ my area of expertise, as I'm more proficient with Rust, but I don't see why would performance suffer in this case.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Mar 9, 2024

@L-as that discussion happened on NixOS/rfcs#137. I suggest opening inline comments if you have questions, or, if what you want to add is more on the meta level, continue on the Discourse thread. Please read the RFC text first though, where we collected a wealth of arguments for and against certain approaches. I'd appreciate additions if you find that something is missing.

To summarise: Evolving the language is Hard(tm) given the notions of backwards compatibility guarantees being discussed, as can be observed in this very thread. But we have no agreed-upon stance, not even a clearly delineated group of people with authority to make such a decision. And even if we had the decisionmakers and decisions, we also have a resource and prioritisation problem to solve in order to implement those decisions.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-03-11-nix-team-meeting-132/42960/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
with-tests Issues related to testing. PRs with tests have some priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tabs silently break indentation stripping
8 participants