[RFC] Regex revamp #323
Comments
I think I agree about removing |
Yes, yes, and for the last one: yes! I don't actually think it makes sense for String to have regexp method defined on it. I'd love to kill magic variables. You could think |
@vendethiel Yes, I was thinking more about something like this:

case some_string
when match = /some_(group1)/
  # match[1]
when match = /some_(group2)/
  # match[1]
end

But

case exp = some.complex.expression
when Foo
  # compiler knows exp is Foo
when Bar
  # compiler knows exp is Bar
end

But the change would be to move the assignment before the (But maybe the |
Another possibility would be to keep the |
We brainstormed a bit about this. We imagined what some code would look like without the magic variables:

private def process_handler(handler)
  flag = handler.flag
  block = handler.block
  case
  when match = flag.match /--(\S+)\s+\[\S+\]/
    process_double_flag("--#{match[1]}", block)
  when match = flag.match /--(\S+)(\s+|\=)(\S+)?/
    process_double_flag("--#{match[1]}", block, true)
  when flag.match /--\S+/
    process_flag_presence(flag, block)
  when flag.match /-(.)\s*\[\S+\]/
    process_single_flag(flag[0 .. 1], block)
  when flag =~ /-(.)\s+\S+/, flag =~ /-(.)\s+/, flag =~ /-(.)\S+/
    process_single_flag(flag[0 .. 1], block, true)
  else
    process_flag_presence(flag, block)
  end
end

To us, it looks much less readable (although it obviously gets the job done). So, we were thinking of ways to keep these magic variables while at the same time making them concurrent-safe. The ideas that we had are:
The other alternatives are to keep them as global variables (maybe thread-local, but fiber-local would be a bit hard to implement, I guess), or to remove them completely from the language. All of the above decisions apply to

We are just putting these ideas here so we can know what you think about them. Maybe you also have a better idea of how to "fix" this. We didn't decide anything yet :-) What do you think? |
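For contrast with the process_handler example above, here is a minimal Ruby sketch of the magic-variable style being weighed. The method name classify_flag and the returned symbols are made up for illustration; only the first three patterns are taken from the snippet above. It relies on the fact that, in Ruby, a regex in a when clause sets $~ and $1 on a successful match.

```ruby
# Hypothetical helper: classifies a command-line flag description and
# returns [kind, captured_name]. Names and symbols are illustrative only.
def classify_flag(flag)
  case flag
  when /--(\S+)\s+\[\S+\]/
    # $1 holds the first capture group of the pattern that just matched.
    [:double_optional_arg, $1]
  when /--(\S+)(\s+|\=)(\S+)?/
    [:double_required_arg, $1]
  when /--\S+/
    [:double_presence, nil]
  else
    [:other, nil]
  end
end

p classify_flag("--name [VALUE]")
p classify_flag("--name=VALUE")
p classify_flag("--verbose")
```

No local match variable is needed in any branch, which is the readability argument made in the comment above.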
That seems a bit scary. (I'm not sure who the "you" is, though, but I'll take it as Crystal users in general :P) I'm mostly against global variables being used here. Magical variables are a lesser evil, imho. It's really a mix of things that make it easier for them. Their "switch" statement will use said smart-matching feature, so each case can just be a regexp, and you can reap the fruits of the ... well, of the smart-matching:

given "string" {
  # "when" operates on the topic, so, "given"'s operand
  when /hey: .* here/ {
    say "I found {$/.Str}";
  }
}

But really, I'm not sure it even matters that much. Do you want Crystal to be a scripting-full language (as in, crystal -e to do awk-like things)? |
I'm not fond of magic global variables, but Ruby made me love $1, $2, ... when dealing with conditional regexp matches. That being said, having them be real globals sounds crazy. Yes, $ means it's global, but I expect them to be just local to the current scope, as in solution 1 (if I understand correctly). I don't see the need for |
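For reference, the expectation described above matches how Ruby itself behaves: despite the $ prefix, $~ (and $1, which is derived from it) is local to the method that ran the match. A minimal Ruby sketch, with method names made up for the example:

```ruby
# Runs its own match; this sets $~ only inside this method's frame.
def inner_match
  "xyz" =~ /y/
  $~[0]
end

# Runs a match, calls inner_match, then reads $~ again.
# The caller's $~ is not overwritten by the match done in inner_match.
def outer_match
  "abc" =~ /b/
  inner = inner_match
  [inner, $~[0]]
end

p outer_match # prints ["y", "b"]
```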
Well, we're happy to let you know that we have found a way to keep them. The idea is that you can assign to $~ inside a method and the caller sees the new value:

def foo
  $~ = "hey"
end

foo
puts $~ #=> "hey"

The way this is implemented is by passing a hidden pointer:

def foo(hidden_ptr)
  hidden_ptr.value = "hey"
end

$~ :: MatchData?
foo(pointerof($~))
puts $~.not_nil! #=> "hey"
Finally, We would assign to these variables in

We believe these magic variables, once learned, make code much more readable and easier to write (although they might look cryptic at first). Text processing and command execution are very common tasks, so it's nice to have good support for them. And all of this is guaranteed to be concurrent (thread and fiber) safe. And, at the syntax level, nothing was really added. We hope you like it! :-) |
I'm closing this because of the above decision. |
@asterite It was not entirely clear from your final comments whether you decided to keep Ruby's
I hope it stays in, and I'd like to explain why I think keeping it is the best choice, and the best thing for Crystal. I think you're going to face decisions like this over and over, that amount to the question
Personally, I think the answer should either be "the Ruby way" or "both", except in cases where you need to diverge from Ruby in order to meet primary (or critical) goals of Crystal, such as being a compiled language, keeping compilation times reasonably fast, mapping language types to native machine types, and so forth. In the case of using

The value I see in maintaining Ruby syntax is that the closer you are to Ruby, the more Ruby developers will adopt Crystal. They will see Crystal as basically Ruby that compiles and is super-fast (even though that's not an entirely accurate description once you dig into the details: there's more to Crystal than just Ruby that compiles, but that's an easy way to describe it to new developers). The more you diverge, the more people will say "well, it's a little bit like Ruby, but it's really an entirely different language that just borrows ideas from Ruby", and it will be perceived more like Elixir: friendly to Rubyists, but still very different.

As things stand, I think Crystal is very easy for a Rubyist to learn, although there are things you must learn. I see the changes as good, and I'm very excited about Crystal, but the more it diverges the more I'll be just a little bit disappointed, and a little bit more with each divergence. And I think there are thousands of people who will have similar feelings.

So, in short: I think there is great value in maintaining as much compatibility with Ruby as is reasonable. Any changes should bring incredible gains, instead of just minor or debatable ones. And it's also possible to have it both ways, if the new way is really that much better than the old way. |
The Global AST node is still around, and so is the code associated with it, because it's used internally by the compiler to store pre-compiled regexes. The special $~ and $! are also represented as globals in the syntax. We can improve this in the future.
Ruby has the =~ operator for doing regular-expression matching. The operator is defined in Object, returning nil, and only redefined in String and Regexp.
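As an illustration of the operator described above, here is a minimal Ruby sketch. The method names and sample strings are made up for this example; the behavior shown (an index or nil as the return value, and $~ set as a side effect) is standard Ruby.

```ruby
# String#=~ returns the index where the match starts, or nil if there is none.
def match_index(text, pattern)
  text =~ pattern
end

# Regexp#=~ is the same operation with the operands swapped.
def match_index_reversed(pattern, text)
  pattern =~ text
end

# After =~ runs, the magic variable $~ holds the MatchData (or nil).
def last_match_text(text, pattern)
  text =~ pattern
  $~ && $~[0]
end

p match_index("say hello", /hello/)          # prints 4
p match_index("say hello", /bye/)            # prints nil
p match_index_reversed(/hello/, "say hello") # prints 4
p last_match_text("say hello", /h\w+/)       # prints "hello"
```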
There are some things I don't like about this:

- =~ has no intuitive name: Ruby calls it pattern match, but that's not obvious from how it looks (I usually think of case ... when ... end when somebody says "pattern match").
- The $~, $1, etc. magic variables.

Granted, once you learn it you get used to it and it feels kind of comfortable, but is Regexp really that important for it to have an operator just for itself, and the $~, $1 magic variables?

I'd like to do this:
- Remove the =~ operator. Instead, add a String#match method (similar to Regex#match) that returns a MatchData if successful and nil if not.
- Remove the $~, $1, etc. magic variables. They might be thread-local, but to make them fiber-local or method-local we would have to add some tricks or magic to the compiler (or a way to make a method define variables in an outer scope, which sounds very magical and unintuitive), when removing them and using a match result would be much simpler and more understandable.

The only downside I see is that this wouldn't work:
Instead, one would have to write:

which is a bit more verbose, but lacks all magic. But I don't think that kind of code is common enough to justify those magic features.
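The trade-off described here can be sketched in Ruby, where both styles already exist. This is only an illustration with made-up method names and sample input, not the snippet from the original post: the first method relies on =~ and the magic variable $1, the second keeps the MatchData returned by String#match in a local variable.

```ruby
# Style 1: =~ inside a conditional, capture read from the magic variable $1.
def greeting_target_magic(text)
  if text =~ /hello (\w+)/
    $1
  end
end

# Style 2: no magic; the MatchData (or nil) is held in a local variable.
def greeting_target_explicit(text)
  if match = text.match(/hello (\w+)/)
    match[1]
  end
end

p greeting_target_magic("hello world")    # prints "world"
p greeting_target_explicit("hello world") # prints "world"
p greeting_target_explicit("goodbye")     # prints nil
```

The explicit form costs one extra local variable per match, which is the verbosity being weighed against the removal of the magic variables.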
What do you think?
As a side note, I'd also like to remove the $? magic variable, and then we would be free of them.