Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #11659, problems with tab characters not counted correctly #11719

Merged
merged 2 commits into from
Jul 29, 2015

Conversation

ScottPJones
Copy link
Contributor

This fixes issues with tabs not being counted correctly (they were counted as a fixed with of 8 spaces,
instead of moving over to the next tab stop), in the Base.indentation and Base.unindent functions,
which are used to copy/paste text in REPL.jl and LineEdit.jl.
This also allows code to set the tab width (Base.tab_indentation_width).
This also adds unit tests so that the functions should be completely covered.

@pao
Copy link
Member

pao commented Jun 16, 2015

Did you mean to include 225eb89 in here (also included in #11718 as ca1d5c3)?

@ScottPJones
Copy link
Contributor Author

I was just about to push a change to remove that dependency, as I found the code in test/repl.jl that was broken...

@pao
Copy link
Member

pao commented Jun 16, 2015

rgr, thanks

@ScottPJones
Copy link
Contributor Author

I know everybody here loves terseness, but sometimes, it hinders understanding... what does rgr mean, please? 😀

@pao
Copy link
Member

pao commented Jun 16, 2015

"roger"...sorry, aviation background...

@ScottPJones
Copy link
Contributor Author

OK, I just pushed the version without the #11718 change, now that the test is fixed here.
I was afraid that not understanding the rgr was a generational thing, with all of the younger genius hotshots in this community! 😀

@ScottPJones
Copy link
Contributor Author

Grrrr! What now is broken on Appveyor and Travis? Again, it doesn't seem to be related to my change... and passed once on each...

@pao
Copy link
Member

pao commented Jun 16, 2015

Only failure appears to be Travis 32-bit; these tests were last modified March 7 https://github.com/JuliaLang/julia/blame/master/test/ccall.jl#L121

cc @tkelman is this a new failure mode (sorry to bother you, but you've become the domain expert)?

@tkelman
Copy link
Contributor

tkelman commented Jun 16, 2015

Some broken stuff got committed to master, that was the commit being merged into. I restarted the build.

col = div(col + tabwid, tabwid) * tabwid
elseif ch == '\n'
# Now we need to output enough indentation
for i = 1:max(0, col-indent) ; write(buf, ' ') ; end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A for-loop over an empty range will get skipped, so the max shouldn't be necessary here.
Stylistically I don't think 1-line for loops are all that readable. There was one here already which you're replacing, but you're also adding 3 more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just following the style of the code I was replacing, but I can make those 3 lines.

@ScottPJones
Copy link
Contributor Author

@tkelman I updated the for loops

@tkelman
Copy link
Contributor

tkelman commented Jun 17, 2015

And the other part of my comment? You should be able to replace for i = 1:max(0, col-indent) with for i = 1:(col-indent)

@ScottPJones
Copy link
Contributor Author

@tkelman Sorry! Just missed that in the rush to fix it, all done now.

\tls
# time: 2014-06-30 20:44:29 EDT
# mode: julia
\t2 + 2"""
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think the unindent code should be changing the indentation of these lines at all. The behavior of this was intended to be that if and only if the last line in the double quotes are pure white space, is that whitespace stripped from each preceding line of the quoted data. Perhaps that's not what got implemented though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all comes from @nolta's change, f8b3f9e
If the last line is not empty after stripping, it walks through every non-blank line, and figures out the minimum indentation and uses that for unindent.
It looks like a lot of code depends on that behavior also.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's unfortunate if I'm understanding this situation correctly. @nolta, how did you intend for this to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that that is the state of affairs, and code already depends on the behavior that @nolta added, do you see a problem with my change here?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true – this code is about creating fake REPL history data with the right content, which this does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ScottPJones
Copy link
Contributor Author

Bump! Any reason why this can't be merged? Thanks!

@StefanKarpinski
Copy link
Sponsor Member

So I've said it before and I'll say it again – I really don't think we shouldn't be assuming anything about the width of tabs. The unindent function should strip a common whitespace string from the beginning of indented text, not a indent width. If the leading whitespace isn't consistent, it should be an error.

@ScottPJones
Copy link
Contributor Author

? This change doesn't change anything about the width of tabs, it does calculate the tab stop correctly, unlike before, and allows the width of the tabs to be changed from the default (which other people asked for).
With this change, you get good results from the unindent function, unlike before.
The unindent function doesn't take a whitespace string to match, it takes a position.
What you are suggesting can't even be done with the current function.

@StefanKarpinski
Copy link
Sponsor Member

Right, that's what I was trying to convey when you set out to do this – if you're even making assumptions about tab sizes, you're doing it wrong (not you specifically, but the generic "you"). This is definitely a better implementation of unindent, but it should take the tab size as an argument (maybe a keyword arg), which I guess can default to the global value. Since the standard indent size for Julia is 4 rather than 8, I would argue that should be the default for that. But the bigger picture is that we should change the triple quote unindent so that it doesn't make any assumptions about tab size at all.

@ScottPJones
Copy link
Contributor Author

Making assumptions about tab sizes is wrong? That's part of many standards, as I pointed out earlier!
You are supposed to use those values (or defaults) if you are going to deal with any of them.
Having the default tab size overrideable in unindent is a good idea though. I can add that easily.
Should it just be positional? With the default picking up the global value?

Tab size is not equivalent to indent size... that is where people have been getting messed up for years... (bad editors that didn't have smart tab and delete, and then allowed people to set the tab width... yechhh!). If you want something to display nicely everywhere, you really should stick to tabs meaning move over to the next mod 8 position... and use only spaces, or tabs followed by spaces, for whatever indentation you feel like.

Stripping indentation from triple-quote is really separate from what a general unindent or dedent function should do, but I still don't see that ignoring standards that define what the default tab positions should be is good.

@StefanKarpinski
Copy link
Sponsor Member

The standard tab size for Julia code is 4 spaces.

@mbauman
Copy link
Sponsor Member

mbauman commented Jun 19, 2015

Am I missing something here? indentation and unindent are unexported internal functions that seem to only serve one purpose: removing a common whitespace prefix from every line. For that purpose, it can be tab-size agnostic. So why not be tab-size agnostic?

@StefanKarpinski
Copy link
Sponsor Member

Yes, exactly my point. Having an unident utility function somewhere (a text formatting package maybe), and that should certainly handle tabs correctly, but that's irrelevant to Base Julia which simply shouldn't be assuming things about tabs at all.

@ScottPJones
Copy link
Contributor Author

The problem is, when the indentation to be removed is less than the tab size, when not all lines have identical whitespace at the beginning of each line, which would be a common occurrence for people who prefer to use tabs and not just spaces in their programs.

@StefanKarpinski
Copy link
Sponsor Member

If any of the lines in an indented triple-quoted block doesn't start with the exact whitespace sequence that the last line does, it should be an error. This is how Python does it and it's the only way to be sure that the programmer is getting what they think they're getting.

@ScottPJones
Copy link
Contributor Author

@mbauman Maybe if there had been some internal documentation about the intended purpose of these functions, from the beginning, this code wouldn't have been changed in a way that the original author didn't intend (and I'm not talking about this PR either...)
If this code really was supposed to be tab-agnostic, and remove a common whitespace prefix from every line, then that is what it should have done, instead of incorrectly calculating the indentation, using a fixed width for each tab character, and an unindent function that simply check for a common whitespace prefix, because it's only passed a number (incorrectly calculated, to boot).

Also, and this is a general big problem with Julia, saying a function is not exported doesn't really mean anything, because anybody can simply call it by doing Base.unindent. But that's a topic for another day.

@StefanKarpinski
Copy link
Sponsor Member

Thank you, as usual, for your diagnosis of our failures at large.

@ScottPJones
Copy link
Contributor Author

@StefanKarpinski OK, then I'd say the following... let this PR go in to fix the indentation and unindent functions, which very well may be used by people, exported or not, and I can make another PR that has
stripprefix, that can take a line and an arbitrary prefix to strip, which can then be used to make triple quote tab agnostic.

Thank you, as usual, for your diagnosis of our failures at large.

Come on. You'd better grow a thicker skin, if you are going to do language design.
Am I wrong in my diagnosis's? That's the only thing that matters.

Also, I've said time and again, Julia is the most promising language I've seen in ages, and that all the contributors should be proud of their accomplishments.
I wouldn't spend a second trying to help resolve the problems I do see in it, if I didn't feel that the language had so much potential.

However, does that mean that it is perfect? No. Should everybody stay silent over any flaws they see, major or minor, because they might hurt the feelings of the contributors? No. My hope is that any flaws can be corrected in the very short term, before the user community gets even larger... I don't think it would be good to see a Python 2 vs Python 3 sort of split in the future. I hope that "Arraymeggedon" (and a "Stringnarok" 😀) can happen quickly, this year, while there is time, before julia gets too big.

@mbauman
Copy link
Sponsor Member

mbauman commented Jun 19, 2015

I've never touched this code myself, but it wasn't hard to grep through base/ and see where it is being used. There are only two uses: pasting at the REPL and triple-quoted strings. I think a rename could make sense here, but there's definitely no reason to support these unexported functions for users that are inadvisably using them. There's enough other work to be done.

Am I wrong in my diagnosis's? That's the only thing that matters.

I strongly disagree. We don't want to be alienating other collaborators by calling their work wrong or characterizing it as having big problems (edit: or even giving the appearance that their work may be judged as such). Some may have a thick enough skin to handle it, but I don't want anyone to get bullied away. Suggest fixes, and point out areas for improvement, please. But don't harp on them.

@ScottPJones
Copy link
Contributor Author

Again, 2 of the 3 Travis builds failed due to totally unrelated changes.
This has been rebased again (which is why it is picking up the new Travis failures, unfortunately).

@ScottPJones ScottPJones force-pushed the spj/indent branch 2 times, most recently from 559d2aa to f934f94 Compare July 14, 2015 21:51
@ScottPJones
Copy link
Contributor Author

This is mergeable again, and has added some more unit testing, so that all of the modified code should be completely covered (the old code which this PR fixes had 0 tests).

@ScottPJones
Copy link
Contributor Author

Bump: this should fix the bug and bring string/io.jl to 100% coverage (the untested part represents about 25% of the total)

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

Since the change of how triple quotes get parsed, where are indentation and unindent still used?

@ScottPJones
Copy link
Contributor Author

In REPL.jl and LineEdit.jl

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

Hm, looks like that originates from Keno/REPL.jl#35. How important is it that we do whatever those lines of code are doing in bracketed paste mode? If the indentation is a little odd for multi-line code pasted at the repl would anyone notice? Seems like we could try taking that out and just deprecating or removing these, doesn't look like any registered packages are using them.

@ScottPJones
Copy link
Contributor Author

Why would you want to leave broken, totally untested, uncovered code in Julia, when there is a working fix, with full coverage tests, already available?

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

Come on man, that's not what I want at all. The technical issues here from #11719 (comment) about assuming a hard-coded tab size remain. So even if we merge this PR, and it's certainly an improvement so we may as well, I'm saying that this particular code serves very little purpose and we could probably get rid of it at little cost.

@ScottPJones
Copy link
Contributor Author

OK, I just think that, if you want a different approach for handling the unindentation in REPL.jl and LineEdit.jl, that can be done at some later point, instead of deprecating or removing stuff, which would be more effort than just merging this. (besides, I thought you didn't like simply removing stuff, even stuff that is not used in any registered package?)
Also, I paste in multi-line stuff frequently, so I'd notice.

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

I don't like simply removing exported things without any deprecation period, especially not late in a release cycle or when that deletion is mixed in with other larger changes. Deprecation of exported things is the right way to move towards deleting them, which is a good thing when they're not widely used or doing so helps achieve other important goals. Deprecation of unexported internal functions that aren't widely used is more controversial, as you've seen, and may or may not be necessary.

So if this helps with pasting multi-line code, can you come up with an example where this PR changes the behavior in a way that's an obvious improvement?

@ScottPJones
Copy link
Contributor Author

Demonstrably correct code (with fully coverage from unit tests) is not an obvious improvement?
I've always believed in improving code quality, whether or not external behavior is changed.

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

I'm talking about actual end-user visible behavior in multi-line paste, which is the only place this gets used. 100% line coverage is great, but it doesn't mean there's zero chance this could be introducing any new bugs. I'm not saying we shouldn't merge this, I'm saying this particular piece of code is not worth this much conversation or effort.

tkelman added a commit that referenced this pull request Jul 29, 2015
Fix #11659, problems with tab characters not counted correctly
@tkelman tkelman merged commit 9a71a92 into JuliaLang:master Jul 29, 2015
@ScottPJones
Copy link
Contributor Author

External behavior is definitely fixed by this PR, I just didn't know how to write a unit tests for multi-line paste, just did it manually and saw the bad results before, and the correct results now.

         @inbounds while out < len
            ch::UInt32 = dat[pos += 1]
             # Handle ASCII characters

vs.

         @inbounds while out < len
             ch::UInt32 = dat[pos += 1]
             # Handle ASCII characters

Having everything turn really ragged when pasted just didn't seem nice to me.

@ScottPJones ScottPJones deleted the spj/indent branch July 29, 2015 09:45
@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

Great, that's all that I asked for. Interacting with you has been extremely frustrating, and it's not been improving. Fewer people over time have continued to attempt doing it.

I'm going to make the tab width a keyword argument to indent and unindent instead of being a global. globals are bad.

@ScottPJones
Copy link
Contributor Author

Interacting with you has been extremely frustrating, and it's not been improving.

If you have any suggestions as to what you feel I'm doing wrong, please let me know
(preferable by e-mail).
I am trying to learn how to interact better with all the ins-and-outs of an OSS community, and
appreciate any and all constructive criticism.

I'm going to make the tab width a keyword argument to indent and unindent instead of being a global. globals are bad.

If you'd brought that up at any time in the review, I would have been happy to change it.
You are the only person who said anything about that, after 88 comments.

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

#11719 (comment)

@ScottPJones
Copy link
Contributor Author

OK, I missed the part about the keyword argument a month ago, however he didn't say that the global should be removed:

which I guess can default to the global value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants