Fix #707 #1058

Merged
merged 1 commit into Fuco1:master on Mar 30, 2021
Conversation

@andreyorst (Contributor) commented Dec 7, 2020

This is a second attempt to fix #707. It lifts the limitation that the matching keyword must be on the same line as do; e.g. it works for this particular example:

defmodule ElixirExampleTwo do
  def can_destroy_message(nil, _), do: {:error, %{reason: "not_found"}}

  def can_destroy_message(%{destroyed_at: destroyed_at}, _)
      when not is_nil(destroyed_at),
      do: {:error, %{reason: "not_found"}}

  def can_destroy_message(%{room_id: message_room_id}, %{room_id: member_room_id})
      when message_room_id != member_room_id,
      do: {:error, %{reason: "unauthorized"}}

  def can_destroy_message(%{content: content} = message, room_member)
      when not is_nil(content) and not is_nil(room_member) do
    # implementation
  end
end

This PR adds two search functions that walk the file line by line and check whether a line has one of a given set of keywords at its start. I'm not sure this is the optimal way to handle it, but it already works better than the current implementation, as it allows multiline definitions. I've benchmarked the search functions on a medium-sized file and the performance seems OK.
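
In Elisp terms, the forward search is roughly of the following shape (a simplified sketch: the name, docstring, and keyword handling here are illustrative and do not match the actual functions added in this PR):

(defun my/search-forward-keyword-line (keywords &optional limit)
  "Search forward, line by line, for a line starting with one of KEYWORDS.
Return the buffer position of the keyword, or nil if none is found
before LIMIT (or the end of the buffer).  Illustrative sketch only."
  (let ((regexp (concat "^[[:space:]]*" (regexp-opt keywords 'words)))
        (found nil))
    (save-excursion
      (beginning-of-line)
      (while (and (not found)
                  (not (eobp))
                  (or (null limit) (< (point) limit)))
        (if (looking-at regexp)
            (setq found (match-beginning 1))
          (forward-line 1)))
      found)))

;; e.g. (my/search-forward-keyword-line '("def" "defp" "defmodule"))
;; A backward variant is symmetric, stepping with (forward-line -1)
;; and stopping at (bobp) instead of (eobp).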

Checklist of what needs to be done:

  • added missing keywords
  • handle multiline definitions
  • handle bodyless definitions
  • add tests for new keywords

Caveats:

  • Since the search is line-based and there are no special heuristics to check whether both do: and do are part of the definition, this will not work:

    # [] marks the expression found by Smartparens
    [defmodule Example do
      def foo(do: body) do
        body
      end]
    end
    
    Example.foo do
      IO.puts("1")
      IO.puts("2")
    end

    This could be fixed by implementing a word-based search, which is possible, but it has proven to be extremely slow.

  • Also, some keywords might not be handled right now. As I've found recently, defimpl uses the for: keyword, which this algorithm supports, but if any other keywords support something similar, they will need to be added manually in future PRs.

  • Overall, the parsing algorithm implemented here is not great, but it works well enough.

@andreyorst changed the title from "Fix #707 more reliably" to "WIP: Fix #707 more reliably" on Dec 9, 2020
@andreyorst changed the title from "WIP: Fix #707 more reliably" to "Fix #707 more reliably" on Dec 17, 2020
@andreyorst changed the title from "Fix #707 more reliably" to "Fix #707" on Dec 18, 2020
@Fuco1 (Owner) commented Mar 30, 2021

Thank you so much for this work. I was a bit out of the loop for the last couple of months as the world has gone crazy again...

I'm going to merge this since you included tests and have presumably been using it since December. In case there is any trouble we can always revert, and this already seems useful, with a lot of demand for it.

Sorry for the delay! If you plan to do some further improvements on Elixir, let me know and I'll add you as a contributor so you can merge further changes on your own.

@Fuco1 merged commit 25f4d6d into Fuco1:master on Mar 30, 2021
@andreyorst deleted the elixir-better-search branch on March 30, 2021, 20:33
@andreyorst (Contributor, Author)

If you plan to do some further improvements on Elixir, let me know and I'll add you as a contributor so you can merge further changes on your own.

Thanks! I don't write Elixir; I was just trying the language back then and was a bit upset with the support, so I spent some time trying to solve the issue 🙂. I don't think I will come back to Elixir in the near future, as I prefer Erlang over it, but I may work on the Lua module, as it has a very similar problem. Although I don't write Lua directly (I use a transpiler), I still read and navigate Lua code, so good support for it is much more important to me.

However, I would like to come up with a more general solution, maybe a more robust searcher, or a rather simple code walker with a stack-like approach. So if you have any thoughts on how Smartparens could handle languages like these, with all their context-dependent do/end combinations, where the do keyword might not be the actual start of the statement, or where end can be paired with another keyword, I would be glad to discuss it. Maybe I should open a discussion-type issue on this topic and gather all the relevant issues that have been reported for other languages? I'm sure there was something from Ruby, Lua, maybe Julia; Elixir still has some quirks too, I believe.

@Fuco1 (Owner) commented Mar 31, 2021

We basically already use a stack-based search, and it is not powerful enough. Ours is probably a little bit more powerful than a canonical CFG, but not by much. However, with the callbacks (unless, when, skip) the system is complete in the sense that it can parse anything parsable. The API is quite bad to work with, though: you end up writing a lot of extra code which is executed "somehow" and lacks structure, so it ends up being ad hoc for each language.
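
For a concrete picture, a callback-style pair definition looks roughly like the following (the predicate is a placeholder for illustration, not the actual smartparens-elixir configuration):

;; Rough illustration of the callback-based pair API; the :unless
;; predicate below is a placeholder, not real smartparens-elixir code.
(defun my/in-string-or-comment-p (_id _action _context)
  "Placeholder :unless predicate: non-nil inside a string or comment."
  (let ((state (syntax-ppss)))
    (or (nth 3 state) (nth 4 state))))

(sp-local-pair 'elixir-mode "do" "end"
               :when '(("SPC" "RET"))   ; only pair after these trigger keys
               :unless '(my/in-string-or-comment-p)
               :actions '(insert navigate))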

Ultimately an LSP-based search would be best, but I'm not sure LSP supports such an API (i.e. give me the delimiter corresponding to the one at point). And I'm not sure what the performance would be... maybe it would be faster to get all pairs in a region/document at once. I'm sure it can serve an AST already.

@andreyorst (Contributor, Author) commented Mar 31, 2021

Ultimately an LSP-based search would be best

I still think the TreeSitter approach is better, because it is a tool built for working with ASTs and it is fast enough to parse huge files on every keypress, which makes it close to ideal for languages with a lot of ambiguity, as you maintain a correct tree and can recover from errors in a meaningful way. LSP has communication lag, which may not be great for structural editing, where you shuffle branches around. Second, writing a grammar for TreeSitter is somewhat easier than implementing this in an LSP server, which (in theory) opens up support for more languages; e.g. TreeSitter covers languages that have no LSP implementation at all. The downside is that we would be tying ourselves to a specific technology, which is not great, but the same can be said about LSP, though the backing behind LSP is much greater.

My thought was that we could build a successive lookahead/behind, first-error, stack-based search engine, which is what I was trying to do here.

So, for example, take the Lua case, where the do keyword can be part of a for or be used by itself, and we're searching for the beginning of the expression from the middle:

for i = 0, 10 do
    print i
    do --[[ cursor ]] local j = i + 10
        print j
    end
end

So we want to do sp-splice-sexp-killing-backward from the cursor position marked in the snippet. Here the inner do is a simple part of a do end block, but do can also be part of a for, so we should be able to account for that as well. We do a backward search, which finds the do keyword and continues until it either fails or finds a matching keyword that can be combined with do. So currently our stack looks like '(do); then we push i onto the stack. Still no error, because i can be part of the for, so our stack is '(i do). Next we find print and push it onto the stack, but now it is an error, because the '(print i do) stack can't be reduced, as do can't be part of the print keyword. So we fail here and mark do as a standalone do which is not part of any other keyword. Generally speaking, our parser for Lua should have a list of all combinations of do and other keywords, like '(while do end), '(for do end), '(function end). Then we build a search that knows all possible reductions and hence can tell that do cannot be combined with function. But this implies that the parser has to know all the language keywords that can't be paired with do or end in order to detect the error.
The stack can later be used to do a fast forward search to find the matching end delimiter in case of possible ambiguity there, as we already know how many successful reductions were made to find the start of the expression.
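
A minimal sketch of that reduction check, with invented names and the combination table collapsed to just the keywords that can own a do (illustration only, not code from this PR):

;; Toy model of the backward "reduction" search described above.
;; Statement starts are approximated as "first token on its line".
(defvar my/lua-do-owners '("for" "while")
  "Keywords whose header is terminated by a \"do\".")

(defun my/lua-do-standalone-p (tokens)
  "Decide whether a \"do\" is standalone, scanning TOKENS backward from it.
TOKENS is a list of (WORD . LINE-START-P) cons cells in backward order,
closest token first.  Return nil when the \"do\" belongs to a for/while
header, non-nil when it is a plain do ... end block."
  (let ((stack '("do")))
    (catch 'decided
      (dolist (tok tokens)
        (push (car tok) stack)          ; grow the search stack
        (cond
         ;; A for/while reduces the stack: the do belongs to its header.
         ((member (car tok) my/lua-do-owners) (throw 'decided nil))
         ;; Any other statement-starting token means the stack, e.g.
         ;; ("print" "i" "do"), can no longer be reduced.
         ((cdr tok) (throw 'decided t))))
      ;; Ran out of tokens (beginning of buffer): treat as standalone.
      t)))

;; Tokens preceding the marked do, in backward order
;; ("i" is mid-line, "print" starts its line):
(my/lua-do-standalone-p '(("i") ("print" . t)))                         ;; => t
;; Same check for the do that ends the for header:
(my/lua-do-standalone-p '(("10") (",") ("0") ("=") ("i") ("for" . t)))  ;; => nil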

I hope this makes sense :)

@Fuco1 (Owner) commented Mar 31, 2021

It makes complete sense and it's very clever! Is this how TreeSitter works in general or is this only the approach you took here?

I had heard about TreeSitter before, but honestly that's about it; I knew it existed. I am not particularly attached to LSP, as the input lag is exactly what I was afraid of. If something else is much faster and "practically synchronous", that would be my first choice.

I guess once generic support for it is written, extra languages can be added by anyone who implements the grammar, so that seems like a worthy pursuit.

@andreyorst (Contributor, Author) commented Mar 31, 2021 via email

Successfully merging this pull request may close this issue: Do-block syntax in Elixir confuses show-smartparens-mode (#707)