Skip RBS rewriter when file does not contain RBS syntax #917

Merged
dejmedus merged 3 commits into main from jb-rbs-marker-gaurd
May 14, 2026

Contributor

@dejmedus dejmedus commented May 12, 2026

If a file does not contain a typed file marker (e.g. # typed: true) or RBS syntax, we can skip attempting to translate it. If a file has not been rewritten, we can exclude it from the count of translated files.

@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch from 82e90b7 to 07a9394 Compare May 12, 2026 19:51
RB
end

def test_should_rewrite_returns_true_for_supported_typed_sigils
Contributor

@amomchilov amomchilov May 12, 2026

Two more cases to test:

  1. Don't trigger if # typed: true exists later in the file. This is unlikely, but it's a performance improvement to ensure we're not scanning through the whole file.

  2. Trigger if there are multiple magic comments, but typed isn't first:

    # frozen_string_literal: true
    # typed: true
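
A minimal sketch of the two cases above, assuming a hypothetical helper named typed_sigil_in_header? (it is not part of Spoom, and the strictness list is an assumption). It only inspects the leading block of comment lines, so a # typed: comment further down is ignored and the scan stops at the first line of code:

```ruby
# Hypothetical helper, for illustration only; not part of Spoom.
SIGIL_LINE = /\A#\s*typed:\s*(ignore|false|true|strict|strong)\s*\z/

# Returns true only if a `# typed:` sigil appears in the leading comment block.
def typed_sigil_in_header?(source)
  source.each_line do |line|
    stripped = line.strip
    next if stripped.empty? # allow blank lines between magic comments
    return false unless stripped.start_with?("#") # first line of code ends the header
    return true if stripped.match?(SIGIL_LINE)
  end
  false
end

late_sigil = "class Foo; end\n# typed: true\n"
second_sigil = "# frozen_string_literal: true\n# typed: true\nclass Foo; end\n"

puts typed_sigil_in_header?(late_sigil)
puts typed_sigil_in_header?(second_sigil)
```

Returning at the first non-comment line is what keeps the check cheap on large files that have no sigil.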

Contributor Author

@dejmedus dejmedus May 12, 2026

> Don't trigger if # typed: true exists later in the file

Good call! I updated to only check for the typed marker in the magic comments block

Refactored to instead use strictness_in_content and valid_strictness checks from /sorbet/sigils.rb

@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch from 07a9394 to 47118ab Compare May 12, 2026 20:57
Comment thread lib/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs.rb Fixed
Comment thread lib/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs.rb Fixed
@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch 2 times, most recently from cf00c3f to d3125ac Compare May 12, 2026 21:23
Comment thread lib/spoom/cli/srb/sigs.rb
next if new_contents == contents

File.write(file, new_contents)
transformed_count += 1
Contributor Author

@dejmedus dejmedus May 12, 2026

This will only count changed files in the "Translated signatures in <x> files." message, but I'm happy to drop the commit if we want to leave it as is.
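
As a rough sketch of the counting behaviour described above, assuming a hypothetical translate_files helper and a caller-supplied rewriter block (the real CLI code lives in lib/spoom/cli/srb/sigs.rb and is only partially shown here): a file is written and counted only when the rewriter returns different contents.

```ruby
require "tmpdir"

# Hypothetical helper; yields each file's contents to the given rewriter block
# and returns how many files were actually changed on disk.
def translate_files(files)
  transformed_count = 0
  files.each do |file|
    contents = File.read(file)
    new_contents = yield(contents)
    next if new_contents == contents # untouched files are neither written nor counted

    File.write(file, new_contents)
    transformed_count += 1
  end
  transformed_count
end

Dir.mktmpdir do |dir|
  changed = File.join(dir, "changed.rb")
  unchanged = File.join(dir, "unchanged.rb")
  File.write(changed, "#: -> void\ndef foo; end\n")
  File.write(unchanged, "def bar; end\n")

  # Stand-in rewriter: only touches files that contain an RBS-style comment.
  count = translate_files([changed, unchanged]) { |src| src.sub("#: -> void", "sig { void }") }
  puts "Translated signatures in #{count} files."
end
```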

Contributor

I like it.

@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch from d3125ac to f25aa67 Compare May 12, 2026 22:26
@dejmedus dejmedus marked this pull request as ready for review May 12, 2026 22:46
@dejmedus dejmedus requested a review from a team as a code owner May 12, 2026 22:46
@dejmedus dejmedus requested a review from amomchilov May 12, 2026 22:47
Member

@paracycle paracycle left a comment

Shouldn't all this logic be encapsulated inside the RBSCommentsToSorbetSigs class? It already stores ruby_contents as an ivar, so we wouldn't need to pass it to a should_rewrite? method. The downside is just an extra object allocation only to return the same contents, but I think that's fine.

Comment thread lib/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs.rb
Contributor

@amomchilov amomchilov left a comment

Few small points, but it's shaping up!

Comment thread lib/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs.rb Outdated
Comment thread lib/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs.rb Outdated
Comment thread test/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs_test.rb Outdated
Comment thread test/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs_test.rb Outdated
Comment thread test/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs_test.rb Outdated
Comment thread test/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs_test.rb Outdated
Comment thread rbi/spoom.rbi
end
end

Spoom::Sorbet::Translate::RBSCommentsToSorbetSigs::RBS_ANNOTATION_MARKERS = T.let(T.unsafe(nil), Array)
Contributor

Let's mark these with private_constant where they're defined, so we don't expose them to the public.
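
A small sketch of what that looks like, using a hypothetical Rewriter class and a shortened marker list as stand-ins for the real ones: once a constant is passed to private_constant, it is still usable inside the class, but referencing it from outside with an explicit scope raises NameError.

```ruby
# Stand-in class for illustration; the constant values are abbreviated.
class Rewriter
  RBS_ANNOTATION_MARKERS = ["# @abstract", "# @final"].freeze
  private_constant :RBS_ANNOTATION_MARKERS

  # Internal code can still reference the constant lexically.
  def self.annotated?(source)
    RBS_ANNOTATION_MARKERS.any? { |marker| source.include?(marker) }
  end
end

puts Rewriter.annotated?("# @final\nclass Foo; end")

begin
  Rewriter::RBS_ANNOTATION_MARKERS
rescue NameError => e
  puts "NameError: #{e.message}"
end
```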

@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch from f25aa67 to 8244d5a Compare May 13, 2026 02:19
@dejmedus dejmedus requested review from amomchilov and paracycle May 13, 2026 02:58
Contributor

@amomchilov amomchilov left a comment

LGTM pending the unresolved comments

end
end

def test_contains_rbs_syntax_returns_true_when_typed_sigils_follow_magic_comments
Contributor

Suggested change
- def test_contains_rbs_syntax_returns_true_when_typed_sigils_follow_magic_comments
+ def test_contains_rbs_syntax_returns_true_when_typed_sigil_is_after_other_magic_comments

RB
end

def test_contains_rbs_syntax_returns_false_for_unrelated_yard_tags
Contributor

Good call

end

#: (String ruby_contents, file: String, ?max_line_length: Integer?) -> String
def rewrite(ruby_contents, file:, max_line_length: nil)
Contributor

Suggested change
- def rewrite(ruby_contents, file:, max_line_length: nil)
+ def rewrite_if_needed(ruby_contents, file:, max_line_length: nil)

or something like this


def test_translate_to_rbi_method_sigs
contents = <<~RB
# typed: true
Contributor

Can we instead call the underlying new.rewrite instead so we don't have to rewrite all the tests?

Contributor Author

@dejmedus dejmedus May 13, 2026

Can do:) Updated (I just left the few in the cli sigs_test.rb file)

Contributor Author

@dejmedus dejmedus May 13, 2026

Sorry scratch that, the newest commit moves the RBS check logic into new.rewrite so the tests would again need a sigil (unless we wanted to drop the sigil check and just look for RBS?) I could probably make a test helper to insert sigils to contents but I wonder if that would be more confusing. What do you think?

dejmedus and others added 3 commits May 13, 2026 12:56
When a file is not typed or contains no RBS comment
syntax, we can skip running the RBS rewriter on it

Co-authored-by: Matt Kubej <matt.kubej@shopify.com>
@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch from 8244d5a to 5d5eb6a Compare May 13, 2026 19:07
@dejmedus dejmedus requested a review from Morriar May 13, 2026 19:17
Comment on lines +23 to +36
class << self
  #: (String source) -> bool
  def contains_rbs_syntax?(source)
    Sigils.contains_valid_sigil?(source) && source.match?(RBS_REWRITE_PATTERN)
  end

  #: (String ruby_contents, file: String, ?max_line_length: Integer?) -> String
  def rewrite_if_needed(ruby_contents, file:, max_line_length: nil)
    return ruby_contents unless contains_rbs_syntax?(ruby_contents)

    new(ruby_contents, file:, max_line_length:).rewrite
  end
end

Member

I am not opinionated about this but I would have preferred:

Suggested change
- class << self
-   #: (String source) -> bool
-   def contains_rbs_syntax?(source)
-     Sigils.contains_valid_sigil?(source) && source.match?(RBS_REWRITE_PATTERN)
-   end
-   #: (String ruby_contents, file: String, ?max_line_length: Integer?) -> String
-   def rewrite_if_needed(ruby_contents, file:, max_line_length: nil)
-     return ruby_contents unless contains_rbs_syntax?(ruby_contents)
-     new(ruby_contents, file:, max_line_length:).rewrite
-   end
- end
+ #: () -> bool
+ def contains_rbs_syntax?
+   Sigils.contains_valid_sigil?(@ruby_contents) && @ruby_contents.match?(RBS_REWRITE_PATTERN)
+ end
+ # @override
+ #: () -> String
+ def rewrite
+   return @ruby_contents unless contains_rbs_syntax?
+   super
+ end

Member

In summary, I think we are unnecessarily pulling logic into class methods and passing values around when the value we are interested in is being passed into the constructor of our class, and no-one cares what exactly rewrite does internally (i.e. in this case it decides to return the original buffer intact).

Member

I notice that we eagerly do parsing in the initialize of Translator, but we could also short-circuit the call to super in this class's initialize method by doing the sigil and rewrite checks in the initializer.

Contributor Author

I think I was thinking that I needed to exit early before we initialized to prevent hitting parse_ruby and that rewrite needed that path to have run already, but I think this is clicking for me now, thanks very much for the examples!

Member

@dejmedus No problem. I am glad to hear it is helpful.

As for initialize, there is nothing magic about that method, and returning early doesn't change behaviour in any way different than in other methods. Even an empty initialize method still initializes the object, so you can add any logic in initialize that you want.
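
To illustrate the point about initialize with a toy class (hypothetical, unrelated to Spoom's Translator): new allocates the object and then calls initialize, so an early return only skips the rest of the setup. The object is still returned, and its unset ivars read as nil.

```ruby
# Toy class for illustration only.
class Greeter
  attr_reader :greeting

  def initialize(name)
    return if name.empty? # early return: the object is still created

    @greeting = "Hello, #{name}!"
  end
end

full = Greeter.new("Ruby")
empty = Greeter.new("")

puts full.greeting
p empty.class
p empty.greeting
```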

Comment on lines +16 to +17
"# @override",
"# @overridable",
Contributor

@amomchilov amomchilov May 13, 2026

Random thought I had: does factoring out the common prefixes produce a faster regular expression? E.g. overrid(?:e|able) instead of checking for the two whole words separately.

In short, yes, but barely. Onigmo doesn't optimize out common prefixes in the NFA it creates, but it doesn't make a real-world difference here.

Got Claude to benchmark it for me:

require "benchmark"

# loading corpus... 101996 files, 616MB, loaded in 15.09s
# sanity-checking match equivalence... 19104 total matches across corpus
#
#                     # user     system      total        real
# flat              0.536375   0.006953   0.543328 (  0.552176)
# factored          0.530740   0.005332   0.536072 (  0.544158)
#
# factored is 1.01x faster than flat (real time)

MARKERS = [
  "# @abstract",
  "# @interface",
  "# @sealed",
  "# @final",
  "# @requires_ancestor:",
  "# @override",
  "# @overridable",
  "# @without_runtime",
].freeze


FLAT     = Regexp.union(*MARKERS)
FACTORED = /\# @(?:abstract|interface|sealed|final|requires_ancestor:|overrid(?:e|able)|without_runtime)/

raise "regexes diverge on hits"   unless MARKERS.all? { |m| FLAT =~ m && FACTORED =~ m }
raise "regexes diverge on misses" if FLAT =~ "# @other" || FACTORED =~ "# @other"

CORPUS_GLOB = "/your/test/corpus/**/*.rb"

print "loading corpus... "
load_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
CORPUS = Dir.glob(CORPUS_GLOB).map { |path| File.read(path) rescue nil }.compact.freeze
load_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - load_start
total_bytes = CORPUS.sum(&:bytesize)
puts "#{CORPUS.size} files, #{total_bytes / 1_000_000}MB, loaded in #{load_elapsed.round(2)}s"

print "sanity-checking match equivalence... "
flat_hits     = CORPUS.sum { |s| s.scan(FLAT).size }
factored_hits = CORPUS.sum { |s| s.scan(FACTORED).size }
raise "regexes diverge: flat=#{flat_hits} factored=#{factored_hits}" unless flat_hits == factored_hits
puts "#{flat_hits} total matches across corpus"

flat_time     = Benchmark.measure("flat")     { CORPUS.each { |s| s.scan(FLAT) } }
factored_time = Benchmark.measure("factored") { CORPUS.each { |s| s.scan(FACTORED) } }

slower, faster = [flat_time, factored_time].sort_by(&:real).reverse
ratio = slower.real / faster.real
faster_name = faster.equal?(flat_time) ? "flat" : "factored"
slower_name = slower.equal?(flat_time) ? "flat" : "factored"

puts <<~RESULTS

                  #{Benchmark::CAPTION}
  flat            #{flat_time.to_s.strip}
  factored        #{factored_time.to_s.strip}

  #{faster_name} is #{ratio.round(2)}x faster than #{slower_name} (real time)
RESULTS

Contributor Author

factored is 1.01x faster than flat

Interesting!

Comment thread lib/spoom/sorbet/translate/rbs_comments_to_sorbet_sigs.rb Outdated
@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch from 9a52a1f to 0b8d325 Compare May 13, 2026 21:25
Contributor

@Morriar Morriar left a comment

design is wrong, see comment

- super(ruby_contents, file: file)
+ @ruby_contents = ruby_contents
+ if contains_rbs_syntax?
+   super(ruby_contents, file: file)
Contributor

So if contains_rbs_syntax? returns false we still initialize the object but do not call super. This means any instance method called after that is responsible for knowing which state the instance is in. It's bad design.

Why do we even instantiate the Translator if we don't need to rewrite? This check should be made earlier.

Let's extract this to a maybe_rewrite singleton method.

Contributor Author

If I'm understanding correctly, I believe this is what we originally had here but it was discussed that we could prevent needing to pass around values if the logic was moved inside

Contributor

The conditional call to super is a code smell; the fact that we can't test translation without the sigils is another one.

The main problems:

  • Fragile: forgetting which condition triggers super leads to half-initialized objects. Future maintainers have to reason about two different initialization paths.
  • Violates Liskov: subclass instances behave differently depending on whether the parent was initialized, which breaks substitutability.
  • Hard to test: you need to cover both branches, and bugs in the "no super" path tend to surface late as nil errors.

Instead of short-circuiting the super call we should just short-circuit the class instantiation altogether since we don't need it.
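
The difference between the two designs can be sketched with stand-in classes. All names here are hypothetical: maybe_rewrite follows the name proposed earlier in this thread, and FakeTranslator only mimics a parent class that parses eagerly in initialize. The conditional-super version yields an object that fails late with a nil error, while the singleton-method version never builds the object at all.

```ruby
# Stand-in for a parent class that eagerly parses in initialize.
class FakeTranslator
  def initialize(source)
    @tokens = source.split
  end

  def rewrite
    @tokens.map(&:upcase).join(" ")
  end
end

# Design 1: conditional super leaves @tokens unset when the check fails.
class HalfInitialized < FakeTranslator
  def initialize(source)
    @source = source
    super(source) if source.include?("#:")
  end
end

# Design 2: decide before instantiating, so every instance is fully set up.
class Guarded < FakeTranslator
  def self.maybe_rewrite(source)
    return source unless source.include?("#:")

    new(source).rewrite
  end
end

begin
  HalfInitialized.new("def foo; end").rewrite
rescue NoMethodError => e
  puts "half-initialized object failed: #{e.class}"
end

puts Guarded.maybe_rewrite("def foo; end")
puts Guarded.maybe_rewrite("#: -> void")
```

With the second design, tests can still call new(...).rewrite directly to exercise translation without going through the guard.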

Contributor Author

@dejmedus dejmedus May 14, 2026

Ah, I understand, thank you. I dropped the commit, so we now exit before initialization and call new.rewrite in tests.

@dejmedus dejmedus force-pushed the jb-rbs-marker-gaurd branch from 0b8d325 to 5d5eb6a Compare May 14, 2026 18:18
@dejmedus dejmedus requested a review from Morriar May 14, 2026 18:27
Contributor

@Morriar Morriar left a comment

Thanks! Sorry for the back and forth about the design.

Contributor

@amomchilov amomchilov left a comment

shipit

@dejmedus dejmedus merged commit 21e8e9b into main May 14, 2026
23 checks passed
@dejmedus dejmedus deleted the jb-rbs-marker-gaurd branch May 14, 2026 19:50

Development

Successfully merging this pull request may close these issues.

Skip RBS rewriter when a file doesn't contain any RBS syntax

5 participants