Fix #11659, problems with tab characters not counted correctly #11719

ScottPJones · 2015-06-16T02:07:58Z

This fixes issues with tabs not being counted correctly (they were counted as a fixed with of 8 spaces,
instead of moving over to the next tab stop), in the Base.indentation and Base.unindent functions,
which are used to copy/paste text in REPL.jl and LineEdit.jl.
This also allows code to set the tab width (Base.tab_indentation_width).
This also adds unit tests so that the functions should be completely covered.

pao · 2015-06-16T14:17:16Z

Did you mean to include 225eb89 in here (also included in #11718 as ca1d5c3)?

ScottPJones · 2015-06-16T14:18:44Z

I was just about to push a change to remove that dependency, as I found the code in test/repl.jl that was broken...

pao · 2015-06-16T14:20:34Z

rgr, thanks

ScottPJones · 2015-06-16T14:21:37Z

I know everybody here loves terseness, but sometimes, it hinders understanding... what does rgr mean, please? 😀

pao · 2015-06-16T14:22:45Z

"roger"...sorry, aviation background...

ScottPJones · 2015-06-16T14:30:38Z

OK, I just pushed the version without the #11718 change, now that the test is fixed here.
I was afraid that not understanding the rgr was a generational thing, with all of the younger genius hotshots in this community! 😀

ScottPJones · 2015-06-16T17:56:12Z

Grrrr! What now is broken on Appveyor and Travis? Again, it doesn't seem to be related to my change... and passed once on each...

pao · 2015-06-16T19:52:57Z

Only failure appears to be Travis 32-bit; these tests were last modified March 7 https://github.com/JuliaLang/julia/blame/master/test/ccall.jl#L121

cc @tkelman is this a new failure mode (sorry to bother you, but you've become the domain expert)?

tkelman · 2015-06-16T19:57:44Z

Some broken stuff got committed to master, that was the commit being merged into. I restarted the build.

tkelman · 2015-06-16T23:59:01Z

base/string.jl

+                col = div(col + tabwid, tabwid) * tabwid
+            elseif ch == '\n'
+                # Now we need to output enough indentation
+                for i = 1:max(0, col-indent) ; write(buf, ' ') ; end


A for-loop over an empty range will get skipped, so the max shouldn't be necessary here.
Stylistically I don't think 1-line for loops are all that readable. There was one here already which you're replacing, but you're also adding 3 more.

I was just following the style of the code I was replacing, but I can make those 3 lines.

ScottPJones · 2015-06-17T00:48:13Z

@tkelman I updated the for loops

tkelman · 2015-06-17T01:14:02Z

And the other part of my comment? You should be able to replace for i = 1:max(0, col-indent) with for i = 1:(col-indent)

ScottPJones · 2015-06-17T03:53:14Z

@tkelman Sorry! Just missed that in the rush to fix it, all done now.

StefanKarpinski · 2015-06-17T04:40:42Z

test/repl.jl

-    \tls
-    # time: 2014-06-30 20:44:29 EDT
-    # mode: julia
-    \t2 + 2"""


I really don't think the unindent code should be changing the indentation of these lines at all. The behavior of this was intended to be that if and only if the last line in the double quotes are pure white space, is that whitespace stripped from each preceding line of the quoted data. Perhaps that's not what got implemented though.

This all comes from @nolta's change, f8b3f9e
If the last line is not empty after stripping, it walks through every non-blank line, and figures out the minimum indentation and uses that for unindent.
It looks like a lot of code depends on that behavior also.

Yeah, that's unfortunate if I'm understanding this situation correctly. @nolta, how did you intend for this to work?

Given that that is the state of affairs, and code already depends on the behavior that @nolta added, do you see a problem with my change here?

Yes, that's true – this code is about creating fake REPL history data with the right content, which this does.

ScottPJones · 2015-06-19T12:08:24Z

Bump! Any reason why this can't be merged? Thanks!

StefanKarpinski · 2015-06-19T14:23:43Z

So I've said it before and I'll say it again – I really don't think we shouldn't be assuming anything about the width of tabs. The unindent function should strip a common whitespace string from the beginning of indented text, not a indent width. If the leading whitespace isn't consistent, it should be an error.

ScottPJones · 2015-06-19T14:44:55Z

? This change doesn't change anything about the width of tabs, it does calculate the tab stop correctly, unlike before, and allows the width of the tabs to be changed from the default (which other people asked for).
With this change, you get good results from the unindent function, unlike before.
The unindent function doesn't take a whitespace string to match, it takes a position.
What you are suggesting can't even be done with the current function.

StefanKarpinski · 2015-06-19T15:04:46Z

Right, that's what I was trying to convey when you set out to do this – if you're even making assumptions about tab sizes, you're doing it wrong (not you specifically, but the generic "you"). This is definitely a better implementation of unindent, but it should take the tab size as an argument (maybe a keyword arg), which I guess can default to the global value. Since the standard indent size for Julia is 4 rather than 8, I would argue that should be the default for that. But the bigger picture is that we should change the triple quote unindent so that it doesn't make any assumptions about tab size at all.

ScottPJones · 2015-06-19T16:27:42Z

Making assumptions about tab sizes is wrong? That's part of many standards, as I pointed out earlier!
You are supposed to use those values (or defaults) if you are going to deal with any of them.
Having the default tab size overrideable in unindent is a good idea though. I can add that easily.
Should it just be positional? With the default picking up the global value?

Tab size is not equivalent to indent size... that is where people have been getting messed up for years... (bad editors that didn't have smart tab and delete, and then allowed people to set the tab width... yechhh!). If you want something to display nicely everywhere, you really should stick to tabs meaning move over to the next mod 8 position... and use only spaces, or tabs followed by spaces, for whatever indentation you feel like.

Stripping indentation from triple-quote is really separate from what a general unindent or dedent function should do, but I still don't see that ignoring standards that define what the default tab positions should be is good.

StefanKarpinski · 2015-06-19T16:35:04Z

The standard tab size for Julia code is 4 spaces.

mbauman · 2015-06-19T16:38:33Z

Am I missing something here? indentation and unindent are unexported internal functions that seem to only serve one purpose: removing a common whitespace prefix from every line. For that purpose, it can be tab-size agnostic. So why not be tab-size agnostic?

StefanKarpinski · 2015-06-19T16:41:49Z

Yes, exactly my point. Having an unident utility function somewhere (a text formatting package maybe), and that should certainly handle tabs correctly, but that's irrelevant to Base Julia which simply shouldn't be assuming things about tabs at all.

ScottPJones · 2015-06-19T16:42:11Z

The problem is, when the indentation to be removed is less than the tab size, when not all lines have identical whitespace at the beginning of each line, which would be a common occurrence for people who prefer to use tabs and not just spaces in their programs.

StefanKarpinski · 2015-06-19T16:49:47Z

If any of the lines in an indented triple-quoted block doesn't start with the exact whitespace sequence that the last line does, it should be an error. This is how Python does it and it's the only way to be sure that the programmer is getting what they think they're getting.

ScottPJones · 2015-06-19T16:50:50Z

@mbauman Maybe if there had been some internal documentation about the intended purpose of these functions, from the beginning, this code wouldn't have been changed in a way that the original author didn't intend (and I'm not talking about this PR either...)
If this code really was supposed to be tab-agnostic, and remove a common whitespace prefix from every line, then that is what it should have done, instead of incorrectly calculating the indentation, using a fixed width for each tab character, and an unindent function that simply check for a common whitespace prefix, because it's only passed a number (incorrectly calculated, to boot).

Also, and this is a general big problem with Julia, saying a function is not exported doesn't really mean anything, because anybody can simply call it by doing Base.unindent. But that's a topic for another day.

StefanKarpinski · 2015-06-19T16:52:34Z

Thank you, as usual, for your diagnosis of our failures at large.

ScottPJones · 2015-06-19T17:02:25Z

@StefanKarpinski OK, then I'd say the following... let this PR go in to fix the indentation and unindent functions, which very well may be used by people, exported or not, and I can make another PR that has
stripprefix, that can take a line and an arbitrary prefix to strip, which can then be used to make triple quote tab agnostic.

Thank you, as usual, for your diagnosis of our failures at large.

Come on. You'd better grow a thicker skin, if you are going to do language design.
Am I wrong in my diagnosis's? That's the only thing that matters.

Also, I've said time and again, Julia is the most promising language I've seen in ages, and that all the contributors should be proud of their accomplishments.
I wouldn't spend a second trying to help resolve the problems I do see in it, if I didn't feel that the language had so much potential.

However, does that mean that it is perfect? No. Should everybody stay silent over any flaws they see, major or minor, because they might hurt the feelings of the contributors? No. My hope is that any flaws can be corrected in the very short term, before the user community gets even larger... I don't think it would be good to see a Python 2 vs Python 3 sort of split in the future. I hope that "Arraymeggedon" (and a "Stringnarok" 😀) can happen quickly, this year, while there is time, before julia gets too big.

mbauman · 2015-06-19T17:12:39Z

I've never touched this code myself, but it wasn't hard to grep through base/ and see where it is being used. There are only two uses: pasting at the REPL and triple-quoted strings. I think a rename could make sense here, but there's definitely no reason to support these unexported functions for users that are inadvisably using them. There's enough other work to be done.

Am I wrong in my diagnosis's? That's the only thing that matters.

I strongly disagree. We don't want to be alienating other collaborators by calling their work wrong or characterizing it as having big problems (edit: or even giving the appearance that their work may be judged as such). Some may have a thick enough skin to handle it, but I don't want anyone to get bullied away. Suggest fixes, and point out areas for improvement, please. But don't harp on them.

ScottPJones · 2015-07-08T13:29:35Z

Again, 2 of the 3 Travis builds failed due to totally unrelated changes.
This has been rebased again (which is why it is picking up the new Travis failures, unfortunately).

ScottPJones · 2015-07-14T23:46:19Z

This is mergeable again, and has added some more unit testing, so that all of the modified code should be completely covered (the old code which this PR fixes had 0 tests).

…n and unindent Fix REPL test

ScottPJones · 2015-07-28T18:45:00Z

Bump: this should fix the bug and bring string/io.jl to 100% coverage (the untested part represents about 25% of the total)

tkelman · 2015-07-29T02:20:59Z

Since the change of how triple quotes get parsed, where are indentation and unindent still used?

ScottPJones · 2015-07-29T02:26:08Z

In REPL.jl and LineEdit.jl

tkelman · 2015-07-29T05:51:54Z

Hm, looks like that originates from Keno/REPL.jl#35. How important is it that we do whatever those lines of code are doing in bracketed paste mode? If the indentation is a little odd for multi-line code pasted at the repl would anyone notice? Seems like we could try taking that out and just deprecating or removing these, doesn't look like any registered packages are using them.

ScottPJones · 2015-07-29T08:48:52Z

Why would you want to leave broken, totally untested, uncovered code in Julia, when there is a working fix, with full coverage tests, already available?

tkelman · 2015-07-29T08:52:32Z

Come on man, that's not what I want at all. The technical issues here from #11719 (comment) about assuming a hard-coded tab size remain. So even if we merge this PR, and it's certainly an improvement so we may as well, I'm saying that this particular code serves very little purpose and we could probably get rid of it at little cost.

ScottPJones · 2015-07-29T09:11:25Z

OK, I just think that, if you want a different approach for handling the unindentation in REPL.jl and LineEdit.jl, that can be done at some later point, instead of deprecating or removing stuff, which would be more effort than just merging this. (besides, I thought you didn't like simply removing stuff, even stuff that is not used in any registered package?)
Also, I paste in multi-line stuff frequently, so I'd notice.

tkelman · 2015-07-29T09:26:08Z

I don't like simply removing exported things without any deprecation period, especially not late in a release cycle or when that deletion is mixed in with other larger changes. Deprecation of exported things is the right way to move towards deleting them, which is a good thing when they're not widely used or doing so helps achieve other important goals. Deprecation of unexported internal functions that aren't widely used is more controversial, as you've seen, and may or may not be necessary.

So if this helps with pasting multi-line code, can you come up with an example where this PR changes the behavior in a way that's an obvious improvement?

ScottPJones · 2015-07-29T09:33:21Z

Demonstrably correct code (with fully coverage from unit tests) is not an obvious improvement?
I've always believed in improving code quality, whether or not external behavior is changed.

tkelman · 2015-07-29T09:37:24Z

I'm talking about actual end-user visible behavior in multi-line paste, which is the only place this gets used. 100% line coverage is great, but it doesn't mean there's zero chance this could be introducing any new bugs. I'm not saying we shouldn't merge this, I'm saying this particular piece of code is not worth this much conversation or effort.

Fix #11659, problems with tab characters not counted correctly

ScottPJones · 2015-07-29T09:45:28Z

External behavior is definitely fixed by this PR, I just didn't know how to write a unit tests for multi-line paste, just did it manually and saw the bad results before, and the correct results now.

         @inbounds while out < len
            ch::UInt32 = dat[pos += 1]
             # Handle ASCII characters

vs.

         @inbounds while out < len
             ch::UInt32 = dat[pos += 1]
             # Handle ASCII characters

Having everything turn really ragged when pasted just didn't seem nice to me.

tkelman · 2015-07-29T09:50:16Z

Great, that's all that I asked for. Interacting with you has been extremely frustrating, and it's not been improving. Fewer people over time have continued to attempt doing it.

I'm going to make the tab width a keyword argument to indent and unindent instead of being a global. globals are bad.

ref #11719

ScottPJones · 2015-07-29T10:10:49Z

Interacting with you has been extremely frustrating, and it's not been improving.

If you have any suggestions as to what you feel I'm doing wrong, please let me know
(preferable by e-mail).
I am trying to learn how to interact better with all the ins-and-outs of an OSS community, and
appreciate any and all constructive criticism.

I'm going to make the tab width a keyword argument to indent and unindent instead of being a global. globals are bad.

If you'd brought that up at any time in the review, I would have been happy to change it.
You are the only person who said anything about that, after 88 comments.

tkelman · 2015-07-29T10:12:53Z

#11719 (comment)

ScottPJones · 2015-07-29T10:15:21Z

OK, I missed the part about the keyword argument a month ago, however he didn't say that the global should be removed:

which I guess can default to the global value

ScottPJones force-pushed the spj/indent branch from e5f13e3 to b199027 Compare June 16, 2015 14:28

tkelman reviewed Jun 16, 2015
View reviewed changes

StefanKarpinski reviewed Jun 17, 2015
View reviewed changes

ScottPJones mentioned this pull request Jun 17, 2015

Give better error messages if invalid or munged history file #11718

Merged

ScottPJones force-pushed the spj/indent branch from ea248cb to f5744ae Compare July 8, 2015 12:12

ScottPJones force-pushed the spj/indent branch 2 times, most recently from 559d2aa to f934f94 Compare July 14, 2015 21:51

ScottPJones mentioned this pull request Jul 24, 2015

Add coverage tests for print_joined, print_unescaped, unescape_chars #12305

Merged

ScottPJones added 2 commits July 27, 2015 20:01

Fix JuliaLang#11659, tab characters counted incorrectly in indentatio…

b5bd9ca

…n and unindent Fix REPL test

Add coverage testing for indentation/unindent functions

271d4a9

ScottPJones force-pushed the spj/indent branch from f934f94 to 271d4a9 Compare July 28, 2015 00:03

tkelman added a commit that referenced this pull request Jul 29, 2015

Merge pull request #11719 from ScottPJones/spj/indent

9a71a92

Fix #11659, problems with tab characters not counted correctly

tkelman merged commit 9a71a92 into JuliaLang:master Jul 29, 2015

ScottPJones deleted the spj/indent branch July 29, 2015 09:45

tkelman added a commit that referenced this pull request Jul 29, 2015

make tab width a kwarg instead of a global

0a76466

ref #11719

tkelman mentioned this pull request Jul 29, 2015

make tab width a kwarg instead of a global #12369

Merged

ScottPJones referenced this pull request Jul 30, 2015

indent, unindent: default tab width = 4.

6c71b13

Fix #11659, problems with tab characters not counted correctly #11719

Fix #11659, problems with tab characters not counted correctly #11719

Conversation

ScottPJones commented Jun 16, 2015

pao commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

pao commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

pao commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

pao commented Jun 16, 2015

tkelman commented Jun 16, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ScottPJones commented Jun 17, 2015

tkelman commented Jun 17, 2015

ScottPJones commented Jun 17, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ScottPJones commented Jun 19, 2015

StefanKarpinski commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

StefanKarpinski commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

StefanKarpinski commented Jun 19, 2015

mbauman commented Jun 19, 2015

StefanKarpinski commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

StefanKarpinski commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

StefanKarpinski commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

mbauman commented Jun 19, 2015

ScottPJones commented Jul 8, 2015

ScottPJones commented Jul 14, 2015

ScottPJones commented Jul 28, 2015

tkelman commented Jul 29, 2015

ScottPJones commented Jul 29, 2015

tkelman commented Jul 29, 2015

ScottPJones commented Jul 29, 2015

tkelman commented Jul 29, 2015

ScottPJones commented Jul 29, 2015

tkelman commented Jul 29, 2015

ScottPJones commented Jul 29, 2015

tkelman commented Jul 29, 2015

ScottPJones commented Jul 29, 2015

tkelman commented Jul 29, 2015

ScottPJones commented Jul 29, 2015

tkelman commented Jul 29, 2015

ScottPJones commented Jul 29, 2015