Skip RBS rewriter when file does not contain RBS syntax#917
Conversation
82e90b7 to
07a9394
Compare
| RB | ||
| end | ||
|
|
||
| def test_should_rewrite_returns_true_for_supported_typed_sigils |
There was a problem hiding this comment.
Two more cases to test:
-
Don't trigger if
# typed: trueexists later in the file. This is unlikely, but it's a performance improvement to ensure we're not scanning through the whole file -
Trigger if there's multiple magic comments, but
typedisn't first:# frozen_string_literal: true # typed: true
There was a problem hiding this comment.
> Don't trigger if # typed: true exists later in the file
Good call! I updated to only check for the typed marker in the magic comments block
Refactored to instead use strictness_in_content and valid_strictness checks from /sorbet/sigils.rb
07a9394 to
47118ab
Compare
cf00c3f to
d3125ac
Compare
| next if new_contents == contents | ||
|
|
||
| File.write(file, new_contents) | ||
| transformed_count += 1 |
There was a problem hiding this comment.
This will only count changed files in the Translated signatures in <x> files. message, but I'm happy to drop the commit if we want to leave as is
d3125ac to
f25aa67
Compare
paracycle
left a comment
There was a problem hiding this comment.
Shouldn't all this logic be encapsulated inside the RBSCommentsToSorbetSigs class, which already stores the ruby_contents as an ivar, and wouldn't need to pass it to a should_rewrite? method. The downside is just an extra object allocation only to return the same contents, but I think that's fine.
amomchilov
left a comment
There was a problem hiding this comment.
Few small points, but it's shaping up!
| end | ||
| end | ||
|
|
||
| Spoom::Sorbet::Translate::RBSCommentsToSorbetSigs::RBS_ANNOTATION_MARKERS = T.let(T.unsafe(nil), Array) |
There was a problem hiding this comment.
Let's mark these with private_constant where they're defined, so we don't expose them to the public.
f25aa67 to
8244d5a
Compare
amomchilov
left a comment
There was a problem hiding this comment.
LGTM pending the unresolved comments
| end | ||
| end | ||
|
|
||
| def test_contains_rbs_syntax_returns_true_when_typed_sigils_follow_magic_comments |
There was a problem hiding this comment.
| def test_contains_rbs_syntax_returns_true_when_typed_sigils_follow_magic_comments | |
| def test_contains_rbs_syntax_returns_true_when_typed_sigil_is_after_other_magic_comments |
| RB | ||
| end | ||
|
|
||
| def test_contains_rbs_syntax_returns_false_for_unrelated_yard_tags |
| end | ||
|
|
||
| #: (String ruby_contents, file: String, ?max_line_length: Integer?) -> String | ||
| def rewrite(ruby_contents, file:, max_line_length: nil) |
There was a problem hiding this comment.
| def rewrite(ruby_contents, file:, max_line_length: nil) | |
| def rewrite_if_needed(ruby_contents, file:, max_line_length: nil) |
or something like this
|
|
||
| def test_translate_to_rbi_method_sigs | ||
| contents = <<~RB | ||
| # typed: true |
There was a problem hiding this comment.
Can we instead call the underlying new.rewrite instead so we don't have to rewrite all the tests?
There was a problem hiding this comment.
Can do:) Updated (I just left the few in the cli sigs_test.rb file)
There was a problem hiding this comment.
Sorry scratch that, the newest commit moves the RBS check logic into new.rewrite so the tests would again need a sigil (unless we wanted to drop the sigil check and just look for RBS?) I could probably make a test helper to insert sigils to contents but I wonder if that would be more confusing. What do you think?
When a file is not typed or contains no RBS comment syntax, we can skip running the RBS rewriter on it Co-authored-by: Matt Kubej <matt.kubej@shopify.com>
8244d5a to
5d5eb6a
Compare
| class << self | ||
| #: (String source) -> bool | ||
| def contains_rbs_syntax?(source) | ||
| Sigils.contains_valid_sigil?(source) && source.match?(RBS_REWRITE_PATTERN) | ||
| end | ||
|
|
||
| #: (String ruby_contents, file: String, ?max_line_length: Integer?) -> String | ||
| def rewrite_if_needed(ruby_contents, file:, max_line_length: nil) | ||
| return ruby_contents unless contains_rbs_syntax?(ruby_contents) | ||
|
|
||
| new(ruby_contents, file:, max_line_length:).rewrite | ||
| end | ||
| end | ||
|
|
There was a problem hiding this comment.
I am not opinionated about this but I would have prefered:
| class << self | |
| #: (String source) -> bool | |
| def contains_rbs_syntax?(source) | |
| Sigils.contains_valid_sigil?(source) && source.match?(RBS_REWRITE_PATTERN) | |
| end | |
| #: (String ruby_contents, file: String, ?max_line_length: Integer?) -> String | |
| def rewrite_if_needed(ruby_contents, file:, max_line_length: nil) | |
| return ruby_contents unless contains_rbs_syntax?(ruby_contents) | |
| new(ruby_contents, file:, max_line_length:).rewrite | |
| end | |
| end | |
| #: (String source) -> bool | |
| def contains_rbs_syntax? | |
| Sigils.contains_valid_sigil?(@ruby_contents) && @ruby_contents.match?(RBS_REWRITE_PATTERN) | |
| end | |
| # @override | |
| #: () -> String | |
| def rewrite | |
| return @ruby_contents unless contains_rbs_syntax? | |
| super | |
| end |
There was a problem hiding this comment.
In summary, I think we are unnecessarily pulling logic into class methods and passing values around when the value we are interested in is being passed into the constructor of our class, and no-one cares what exactly rewrite does internally (i.e. in this case it decides to return the original buffer intact).
There was a problem hiding this comment.
I notice that we eagerly do parsing in the initialize of Translator, but we could also short-circuit the call to super in this class's initialize method by doing the sigil and rewrite checks in the initializer.
There was a problem hiding this comment.
I think I was thinking that I needed to exit early before we initialized to prevent hitting parse_ruby and that rewrite needed that path to have run already, but I think this is clicking for me now, thanks very much for the examples!
There was a problem hiding this comment.
@dejmedus No problem. I am glad to hear it is helpful.
As for initialize, there is nothing magic about that method, and returning early doesn't change behaviour in any way different than in other methods. Even an empty initialize method still initializes the object, so you can add any logic in initialize that you want.
| "# @override", | ||
| "# @overridable", |
There was a problem hiding this comment.
Random thought I had: does factoring out the common prefixes produce a more faster regular expression? E.g. overrid(?:e|able) instead of checking for the two whole words separately.
In short, yes, but barely. Onigmo doesn't optimize out common prefixes in the NFA it creates, but it doesn't make a real world difference here.
Got Claude to benchmark it for me:
require "benchmark"
# loading corpus... 101996 files, 616MB, loaded in 15.09s
# sanity-checking match equivalence... 19104 total matches across corpus
#
# # user system total real
# flat 0.536375 0.006953 0.543328 ( 0.552176)
# factored 0.530740 0.005332 0.536072 ( 0.544158)
#
# factored is 1.01x faster than flat (real time)
MARKERS = [
"# @abstract",
"# @interface",
"# @sealed",
"# @final",
"# @requires_ancestor:",
"# @override",
"# @overridable",
"# @without_runtime",
].freeze
FLAT = Regexp.union(*MARKERS)
FACTORED = /\# @(?:abstract|interface|sealed|final|requires_ancestor:|overrid(?:e|able)|without_runtime)/
raise "regexes diverge on hits" unless MARKERS.all? { |m| FLAT =~ m && FACTORED =~ m }
raise "regexes diverge on misses" if FLAT =~ "# @other" || FACTORED =~ "# @other"
CORPUS_GLOB = "/your/test/corpus/**/*.rb"
print "loading corpus... "
load_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
CORPUS = Dir.glob(CORPUS_GLOB).map { |path| File.read(path) rescue nil }.compact.freeze
load_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - load_start
total_bytes = CORPUS.sum(&:bytesize)
puts "#{CORPUS.size} files, #{total_bytes / 1_000_000}MB, loaded in #{load_elapsed.round(2)}s"
print "sanity-checking match equivalence... "
flat_hits = CORPUS.sum { |s| s.scan(FLAT).size }
factored_hits = CORPUS.sum { |s| s.scan(FACTORED).size }
raise "regexes diverge: flat=#{flat_hits} factored=#{factored_hits}" unless flat_hits == factored_hits
puts "#{flat_hits} total matches across corpus"
flat_time = Benchmark.measure("flat") { CORPUS.each { |s| s.scan(FLAT) } }
factored_time = Benchmark.measure("factored") { CORPUS.each { |s| s.scan(FACTORED) } }
slower, faster = [flat_time, factored_time].sort_by(&:real).reverse
ratio = slower.real / faster.real
faster_name = faster.equal?(flat_time) ? "flat" : "factored"
slower_name = slower.equal?(flat_time) ? "flat" : "factored"
puts <<~RESULTS
#{Benchmark::CAPTION}
flat #{flat_time.to_s.strip}
factored #{factored_time.to_s.strip}
#{faster_name} is #{ratio.round(2)}x faster than #{slower_name} (real time)
RESULTSThere was a problem hiding this comment.
factored is 1.01x faster than flat
Interesting!
9a52a1f to
0b8d325
Compare
| super(ruby_contents, file: file) | ||
| @ruby_contents = ruby_contents | ||
| if contains_rbs_syntax? | ||
| super(ruby_contents, file: file) |
There was a problem hiding this comment.
So if contains_rbs_syntax? returns false we still initialize the object but do not call super. This means instance methods after that are responsible of knowing which state the instance is in. It's bad design.
Why do we even instantiate the Translator if we don't need to rewrite? This check should be made earlier.
Let's extract this to a maybe_rewrite singleton method.
There was a problem hiding this comment.
If I'm understanding correctly, I believe this is what we originally had here but it was discussed that we could prevent needing to pass around values if the logic was moved inside
There was a problem hiding this comment.
The conditional call to super is a code smell, the fact that we can't test translation without the sigils is another one.
The main problems:
- Fragile: forgetting which condition triggers super leads to half-initialized objects. Future maintainers have to reason about two different initialization paths.
- Violates Liskov: subclass instances behave differently depending on whether the parent was initialized, which breaks substitutability.
- Hard to test: you need to cover both branches, and bugs in the "no super" path tend to surface late as
nilerrors.
Instead of short-circuiting the super call we should just short-circuit the class instantiation altogether since we don't need it.
There was a problem hiding this comment.
Ah, I understand, thank you. I dropped the commit and now are exiting before initialization and calling new.rewrite in tests
0b8d325 to
5d5eb6a
Compare
Morriar
left a comment
There was a problem hiding this comment.
Thanks! Sorry for the back and forth about the design.
If a file does not contain typed file markers (ex
# typed: true) or RBS syntax we can skip attempting to translate it. If a file has not been rewritten we can exclude it from the count of translated files