@fonsp
Member

@fonsp fonsp commented Dec 9, 2021

REPL.ends_with_semicolon is used to determine whether or not to display output in the REPL, but IJulia and Pluto also use it. This PR uses some regex replacements instead of a stateful search for a more accurate heuristic.

I did read and agree with this comment, but it turned out to be fairly easy to implement a "parser" good enough for this purpose. The new implementation is actually shorter, albeit with more "regex magic".

The new strategy requires a bit more calculation, but I think that this problem is not performance sensitive, and accuracy is more important? Let me know if you agree.
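To make the accuracy concern concrete, here is a minimal sketch of why a purely textual check is not enough. The helper name `naive_ends_with_semicolon` is hypothetical; it is neither the old nor the new REPL implementation, only an illustration of the failure modes a better heuristic has to cover.

```julia
# Hypothetical naive heuristic, for illustration only (not the REPL code):
# "the input ends with a semicolon, optionally followed by whitespace".
naive_ends_with_semicolon(code) = occursin(r";\s*$", code)

naive_ends_with_semicolon("x = 1;")            # true, as expected
naive_ends_with_semicolon("x = 1; # comment")  # false, although the statement ends with `;`
naive_ends_with_semicolon("print(\"a\") # ;")  # true, although the `;` is inside a comment
```

Comments and string literals therefore have to be neutralized before looking for the trailing semicolon.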

Fix #28743 and some others

@fonsp fonsp changed the title from "More accurate ends_with_semicolon" to "[REPL] More accurate ends_with_semicolon" Dec 10, 2021
@stevengj
Member

stevengj commented Dec 10, 2021

(For fully robust semicolon detection, it seems better to extend Meta.parse to optionally add SemicolonNode meta-nodes to the expression, analogous to LineNumberNode.)

@fonsp
Member Author

fonsp commented Dec 10, 2021

Thanks for the suggestions! I implemented both.

It's true that this still has corner cases, and that is unfortunate, but to turn it around: this is exactly why I made this small PR! The new approach from this PR has fewer corner cases than the current approach, so we should both be happy :)

Is this PR not an objective improvement over the current state? The old test cases still pass, and the new test cases now pass as well.

@stevengj
Member

This doesn't seem right:

julia> ends_with_semicolon("foo; \"\" ")
true

Probably instead of stripping strings entirely you should replace them with empty strings.
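The difference between the two strategies can be sketched as follows. The regex `string_literal` and the helper `trailing_semicolon` are assumptions made for this illustration (the regex is deliberately simplistic and ignores escapes and triple quotes); they are not the code from the PR.

```julia
# Illustrative sketch only; names and regexes are assumptions, not the PR's code.
trailing_semicolon(s) = occursin(r";\s*$", s)
string_literal = r"\"[^\"]*\""   # simplistic: no escapes, no triple quotes

code = "foo; \"\" "

# Removing the literal entirely leaves only whitespace after the `;`.
stripped = replace(code, string_literal => "")
# Replacing it with an empty literal keeps a token after the `;`.
emptied = replace(code, string_literal => "\"\"")

trailing_semicolon(stripped)  # true  (the wrong answer reported above)
trailing_semicolon(emptied)   # false (the string is still the last expression)
```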

fonsp and others added 2 commits December 11, 2021 00:27
@fonsp
Member Author

fonsp commented Dec 10, 2021

> Probably instead of stripping strings entirely you should replace them with empty strings.

Thanks! Also implemented now

@fonsp
Member Author

fonsp commented Dec 14, 2021

@stevengj Thanks for the feedback so far! What do you think of the PR at this point? I think it's a simple improvement in UX, and it does not close the door on implementing this directly in the parser later, like you suggested.

@stevengj
Member

stevengj commented Dec 14, 2021

This returns false incorrectly in the current PR: ends_with_semicolon("foo # \"\"\" \n bar; # \"\"\" \n"), whereas in master it returns true.
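A sketch of why this kind of input is hard for independent regex passes: whichever of "strings" or "comments" is removed first, some input breaks it. The helpers `strip_strings`, `strip_comments`, and `trailing_semicolon` are hypothetical and only handle triple-quoted strings and line comments; they are not the regexes used in the PR.

```julia
# Illustrative sketch only; these helpers are assumptions, not the PR's code.
trailing_semicolon(s) = occursin(r";\s*$", s)
# Replace each triple-quoted string with an empty triple-quoted string.
strip_strings(s) = replace(s, r"\"\"\".*?\"\"\""s => "\"\"\"\"\"\"")
# Remove line comments up to the end of each line.
strip_comments(s) = replace(s, r"#.*$"m => "")

# Triple quotes inside comments: removing strings first swallows `bar;`.
code = "foo # \"\"\" \n bar; # \"\"\" \n"
trailing_semicolon(strip_comments(strip_strings(code)))  # false (wrong)
trailing_semicolon(strip_strings(strip_comments(code)))  # true

# A `#` inside a string: removing comments first swallows the closing quotes and `;`.
code2 = "s = \"\"\"#\"\"\";"
trailing_semicolon(strip_strings(strip_comments(code2)))  # false (wrong)
trailing_semicolon(strip_comments(strip_strings(code2)))  # true
```

Neither ordering is right for both inputs, which is the motivation for scanning left to right and tracking whether the current position is inside a string or a comment.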

@fonsp
Member Author

fonsp commented Dec 15, 2021

@stevengj True! I don't think it's worth our time to fix this edge case, but please close the PR if you disagree.

@stevengj
Member

stevengj commented Dec 15, 2021

How about an improved finite state machine that handles strings etc:

let matchend = Dict("\"" => r"\"", "\"\"\"" => r"\"\"\"", "'" => r"'",
                    "`" => r"`", "```" => r"```", "#" => r"$"m, "#=" => r"=#|#=")
    global _rm_strings_and_comments
    function _rm_strings_and_comments(code::Union{String,SubString{String}})
        buf = IOBuffer(sizehint=sizeof(code))
        pos = 1
        while true
            i = findnext(r"\"(?!\"\")|\"\"\"|'|`(?!``)|```|#(?!=)|#=", code, pos)
            isnothing(i) && break
            match = SubString(code, i)
            j = findnext(matchend[match]::Regex, code, last(i)+1)
            if match == "#=" # possibly nested
                nested = 1
                while j !== nothing
                    nested += SubString(code, j) == "#=" ? +1 : -1
                    iszero(nested) && break
                    j = findnext(r"=#|#=", code, last(j)+1)
                end
            elseif match[1] != '#' # quote match: check non-escaped
                while j !== nothing
                    # prevind/nextind keep the indices valid when a non-ASCII character precedes the quote
                    notbackslash = findprev(!=('\\'), code, prevind(code, first(j)))::Int
                    # an even number of backslashes before the quote means it is not escaped
                    iseven(first(j) - nextind(code, notbackslash)) && break
                    j = findnext(matchend[match]::Regex, code, nextind(code, first(j)))
                end
            end
            isnothing(j) && break
            if match[1] == '#'
                print(buf, SubString(code, pos, prevind(code, first(i))))
            else
                print(buf, SubString(code, pos, last(i)), ' ', SubString(code, j))
            end
            pos = last(j)+1
        end
        print(buf, SubString(code, pos, lastindex(code)))
        return String(take!(buf))
    end
end
ends_with_semicolon(code::Union{String,SubString{String}}) =
    contains(_rm_strings_and_comments(code), r";\s*$")
ends_with_semicolon(code::AbstractString) = ends_with_semicolon(String(code))

This passes all of the tests, except for the invalid #=# test mentioned above. And because it keeps track of whether it is in a string or a comment, it shouldn't be confused by """ mixed with comments, e.g. it passes this test. With this approach, it should be possible to return a result that is completely correct and consistent with Julia's parser.
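The nested `#= ... =#` branch of the code above can be read in isolation as a depth counter. The following is a minimal sketch of that idea; the function name `block_comment_end` and its signature are hypothetical and do not appear in the PR.

```julia
# Hypothetical helper, for illustration only. `start` is the index just after
# an opening `#=`. Returns the index of the last character of the matching
# `=#`, or `nothing` if the comment is never closed.
function block_comment_end(code::AbstractString, start::Integer)
    depth = 1                      # we are inside one `#=` already
    pos = start
    while true
        r = findnext(r"=#|#=", code, pos)
        r === nothing && return nothing
        depth += code[r] == "#=" ? 1 : -1
        depth == 0 && return last(r)
        pos = last(r) + 1          # both delimiters are ASCII, so this index is valid
    end
end

block_comment_end("#= a #= b =# c =# x;", 3)  # 17, the outer `=#`
block_comment_end("#= a #= b =#", 3)          # nothing, the outer comment is unterminated
```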

Update: fixed to correctly identify escaped quotes, which are preceded by an odd number of backslashes.
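The parity rule for escaped quotes can be sketched on its own. The helper `is_escaped` below is a hypothetical name introduced for this illustration; it counts the backslashes directly before a given index instead of using `findprev` as the code above does.

```julia
# Hypothetical helper, for illustration only: is the character at index `i`
# escaped, i.e. preceded by an odd number of consecutive backslashes?
function is_escaped(code::AbstractString, i::Integer)
    n = 0
    k = prevind(code, i)           # prevind stays correct for non-ASCII text
    while k >= 1 && code[k] == '\\'
        n += 1
        k = prevind(code, k)
    end
    return isodd(n)
end

is_escaped("a\\\"", 3)    # true:  a\"  (one backslash before the quote)
is_escaped("a\\\\\"", 4)  # false: a\\" (two backslashes form an escaped backslash)
is_escaped("é\"", 3)      # false: no backslash, multi-byte character before the quote
```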

@fonsp
Member Author

fonsp commented Dec 15, 2021

Even better, thanks a million! I will implement this and update the tests tomorrow 👍

@fonsp fonsp marked this pull request as draft December 15, 2021 18:52
Co-Authored-By: Steven G. Johnson <2913679+stevengj@users.noreply.github.com>
@fonsp fonsp marked this pull request as ready for review December 22, 2021 13:25
@fonsp
Member Author

fonsp commented Jan 14, 2022

What should I do to get this PR through?

@stevengj stevengj added the "merge me" label (PR is reviewed. Merge when all tests are passing) Jan 14, 2022
@oscardssmith oscardssmith merged commit a32a066 into JuliaLang:master Jan 15, 2022
@oscardssmith oscardssmith removed the "merge me" label (PR is reviewed. Merge when all tests are passing) Jan 15, 2022
@fonsp
Member Author

fonsp commented Jan 17, 2022

Thanks for your help @stevengj , I really admire that you resolved our discussion about a temporary 'hack' by writing a proper implementation :)

@fonsp fonsp deleted the patch-5 branch January 17, 2022 13:00
N5N3 pushed a commit to N5N3/julia that referenced this pull request Jan 24, 2022
* More accurate `ends_with_semicolon`

Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
* More accurate `ends_with_semicolon`

Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
* More accurate `ends_with_semicolon`

Co-authored-by: Steven G. Johnson <stevenj@mit.edu>

Development

Successfully merging this pull request may close these issues.

REPL bug with semicolon output suppression

3 participants