RFC: Add Sorbet typechecker to dd-trace-rb #1607

ivoanjo · 2021-07-22T11:21:07Z

TL;DR

Minimal bootstrapping for now.
Type checking is optional. Default is no type checking for files without a magic comment. Should have minor impact on people that don't want to touch the type checking.
bundle exec srb tc runs the type checking.
I've enabled type checking during CI runs

Notes for reviewing

This PR has two commits -- one that has all of the setup and relevant things, and another which is 100% the result of running bundle exec srb init. I strongly suggest reviewing them separately, for added sanity. (If you want to quickly glance through the second commit -- your terminal won't choke on the diff like github does 😉 )
This PR depends on Sorbet: Include Kernel module to disambiguate from usage in BasicObject #1605, and CI is failing for unrelated reasons. Thus, I've opened it as a draft until both of them are cleared and the runway is clear. It otherwise is ready for review/discussion. Now updated.

What?

While not a replacement for all our tests and other tooling, a type checker is a nice extra line of defense for catching issues that we otherwise may have missed, such as #1602.

The Ruby ecosystem around type checking is still rather immature. You may have heard that Ruby 3 shipped "with type checking" but that's an oversimplification: it actually only shipped with a "language for describing types" and a "type analysis tool" (https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/).
You may have noticed that nowhere in the previous sentence the words "type checker" were included.

So, Ruby 3 itself provides no type checker. In practice, there are two typecheckers in "wide" use for Ruby:

Steep, created by a Ruby core team member that also worked on manny of the other type-related bits that made it to Ruby 3 (but steep itself was not made part of Ruby)
Sorbet, a type checker in wide use at Stripe and Shopify

This RFC PR does a first pass at onboarding Sorbet to our codebase. There's a number of limitations on this "first pass", which I document below.

Why not Steep?

I've tried onboarding our codebase to steep, and gave up due to two big issues:

Steep relies on the Ruby 3 rbs format, but unlike Sorbet, there doesn't seem to be any way of starting small on a mostly-untyped codebase without any .rbs files, and iterate from there. There are tools to generate skeleton .rbs files (the rbs gem has two different modes for it + the typeprof gem), but they had a hard time with ddtrace.
There is currently little documentation for the type checking errors, once Steep does run. Google gave me ZERO hits on one of them (I guess it doesn't index the tool repository at least). And they're not all that easy to understand/fix, so what use is having a tool tell you "this is wrong", if then you're stuck with little help on something that may not even be a real issue but more likely a tool limitation?

For these reasons, I gave up on Steep (for this first iteration).

Working with Sorbet

Sorbet can be run with bundle exec srb tc (or bundle exec rake typecheck). There's also Language Server Protocol support, if your editor supports it.

The key thing to know about Sorbet is that allows file-level granularity for enabling type checks, see
https://sorbet.org/docs/static#file-level-granularity-strictness-levels.

In practice, this means that at the top of each .rb file, we can provide a comment that controls the reporting of type checking errors for that file. The usual ones (described in the above link as well) are:
# typed: ignore, # typed: false, # typed: true, # typed: strict, and # typed: strong.

In many cases, Sorbet can typecheck a file correctly with no extra type annotations, so we can start enjoying its checks (such as finding #1602) by just enabling it for our files. This can be done with the # typed: true, # typed: strict, # typed: strong modes (see the docs for details).

The # typed: false is the default for files with no comment. Thus, even newcomers into the codebase will not be forced to immediately deal with Sorbet for contributions, which given the current limitations of the tool, and level of maturity, I think is reasonable. (We can reconfigure this later, if needed). In this mode, Sorbet will only complain about syntax and missing constant errors.

Our codebase requires quite a few # typed: ignore because or dependencies that are pulled via appraisals and thus not seen/available to Sorbet (among other things that we do and that Sorbet) doesn't quite like. These files are completely ignored by Sorbet. (Note that if you then reference something on one of these files from another file, that file will not typecheck either, so "ignore" becomes somewhat contagious, and is best avoided if possible).

I've also added a CI step to run sorbet, which again follows the rules above so should be able to provide us with valuable insight without being an annoying source of build failures.

Limitations or "I thought the whole point of this was for us to add type declarations to the code"?

Now we get to the more unfortunate part of the news: I think that for now we don't have great options for doing this.

Sorbet type annotations are supposed to be inline with the code, and work through the sorbet-runtime gem.

Unfortunately, as described on https://sorbet.org/docs/faq#what-platforms-does-sorbet-support:

The sorbet-runtime gem is currently only tested on Ruby 2.6 and Ruby 2.7. It is known to not support Ruby 2.4. Feel free to report runtime issues for any current or future Ruby version.

Thus, since we still want to support Ruby 2.1, that doesn't quite work for us. I thought we could perhaps have an empty replacement of the sorbet runtime gem, and it turns out that Shopify already tried and abandoned it. So, inline annotations seem to be out for us.

For Sorbet, that leaves us with only one other option: .rbi files.
These are files that allow Sorbet to get type information for otherwise untyped code, and are the solution to having type information for many common libraries.

Unfortunately, using .rbi files for our own code is very error-prone, because Sorbet trusts .rbi files, and doesn't check them against the code. Thus, if we had a class A with method m declared both on our code, as well as on an .rbi file, and then we renamed m to n and forgot to update the .rbi, then Sorbet would not complain about it.
Worse, if somewhere there was a call to A#m somewhere in the code, Sorbet would still think it was correct, while in reality it would break at run-time.

The future?

Because our usage of Sorbet is rather simple, we can at any point switch to Steep, or any other tool, or just disable it, if it's not providing value. So we are not locked at all into its usage.

Hopefully in the near future we'll find a way to add type declarations to our library without impact to our support matrix of older Rubies.

Even better, we could then ship those types to our customers, for them to check that they are using ddtrace APIs correctly!

marcotc · 2021-07-28T19:28:08Z

sorbet/config

+--dir
+.


Is this the same as --dir .?

Yeah! This file was originally generated by sorbet (I tweaked it), but this was left untouched. I tried moving it to the same line to clarify but their arg parser seems confused:

ivo.anjo@macieira:~/datadog/dd-trace-rb$ be srb tc No errors! Great job. # <-- changed file ivo.anjo@macieira:~/datadog/dd-trace-rb$ git diff diff --git a/sorbet/config b/sorbet/config index e9909983..8106c29c 100644 --- a/sorbet/config +++ b/sorbet/config @@ -1,3 +1,2 @@ ---dir -. +--dir . --ignore=/integration ivo.anjo@macieira:~/datadog/dd-trace-rb$ be srb tc Argument ‘--dir .’ starts with a - but has incorrect syntax. To see all available options pass `--help`. ivo.anjo@macieira:~/datadog/dd-trace-rb$ rm sorbet/config ivo.anjo@macieira:~/datadog/dd-trace-rb$ be srb tc --dir . --ignore=/integration No errors! Great job.

There's no real parser here, this kind of file is taking one arg per line, so on one line it'd be like (bash) sorbet "--dir ." and not sorbet "--dir" "."

marcotc

No errors! Great job.

marcotc · 2021-07-28T19:30:12Z

My browser has suffered literal physical pain while loading all the files in this PR, my laptop is hot hot hot

lloeki · 2021-08-03T11:16:11Z

I've had some experience using Sorbet, not an expert by any stretch (anecdata of sample size == 1). I really wanted to like it, but I found it to be quite awkward, even more so for libraries to be published as gems. Although some issues here seem to come from the design itself, to be fair my last "life-size" try with Sorbet dates back around Feb 2019.

To use Sorbet annotations one has to add calls to a sig method (coming from extend T::Sig) in the source. Therefore Sorbet tries really hard to fit a signature description language within the Ruby syntax, because it has to, to allow Ruby to parse the sigs even if it never evaluates them.

Adding the annotations to a library means that Sorbet becomes a mandatory runtime dependency of either the library gem (in our case ddtrace) or the user's application, otherwise T::Sig and sig are undefined. They also preclude a downstream app to define a top-level T constant. This is whether or not the downstream user wants to use Sorbet because it's right in the source. To me this means that annotations would be a hard "no" for a gem author.

RBI files inherit from the annotation sig syntax, so they have to hinge upon a Ruby code structure to operate. So I found the end result a bit artificial and boilerplate-ish. It left a tingling feeling of duplicating code and annotating the duplication.

This section of the documentation about publishing RBIs in a gem for other to use, I could never make it work, both as a gem consumer and a gem provider. I must have done something wrong because it's advertised, but there is to my knowledge no other documentation that the linked paragraph, and I was at a loss (maybe there was a bug that has been fixed since).

This turned into Sorbet generating RBIs for depended-upon gems, and since it vendors RBIs, it quickly turns diffs into a huge ball of mud. And it's not even that helpful because return values are not handled by srb rbi gem.

Which ties into the whole autogenerating things. The hidden-definitions and todo.rbi can be quite annoying to handle, them and typed: false can be very obscure in their consequences. With Sorbet I found one could quickly get to a point where a lot of RBI files are generated, things pass, and it feels like progress, but unless one dives in and hand-types it's of misleading perceived value (by that I mean it looks like things are checked but they're not, which gives a false sense of confidence), with a tangible maintenance cost.

Another nitpick is that sorbet-static, a direct dependency of sorbet meant to perform checks with reasonable performance, is a native gem for x86_64-linux and x86_64-darwin, with no ruby platform to build from source as a fallback, and for which the x86_64-linux gem will install on Alpine, but the binary, being glibc, crashes on musl libc. This had me have some conditionals in the Gemfile for incompatible CI runs to pass.

Apologies if this feels negative! I do believe adding type checking is ultimately very valuable, even with the added signature hand-typing cost. That's why as an experiment I've started adding type information to AppSec using RBS and checking types with Steep.

ivoanjo · 2021-08-03T13:58:51Z

Thanks for the insight @lloeki . I agree with you in all points.

My thoughts is that right now, even it's very limited no-signatures mode, it provides a bit of value (after all it did catch a real bug) and it doesn't seem to add a lot of hassle. It's also quite easy to remove -- we just revert this PR, with no loss.

Is everyone ok with me adding it, with the knowledge that we're not locked into this solution and it's quite possible that we'll drop it and switch to steep at some point in the future?

(If so, I'll do a rebase and go ahead with the merge)

marcotc · 2021-08-03T22:20:32Z

@ivoanjo I think this PR is unintrusive enough that this can be merged. There is already basic type checking being enforced with it, which is great.

I think that our "final" solution for type enrichment will involve RBS files + yet-to-be-matured-tooling, but most of the type checking done here is translatable, even if not a direct 1:1 to our future path.

## TL;DR * Minimal bootstrapping for now. * Type checking is optional. Default is no type checking for files without a magic comment. Should have minor impact on people that don't want to touch the type checking. * `bundle exec srb tc` runs the type checking. * I've enabled type checking during CI runs ## What? While not a replacement for all our tests and other tooling, a type checker is a nice extra line of defense for catching issues that we otherwise may have missed, such as #1602. The Ruby ecosystem around type checking is still rather immature. You may have heard that Ruby 3 shipped "with type checking" but that's an oversimplification: it actually only shipped with a "language for describing types" and a "type analysis tool" (<https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/>). You may have noticed that nowhere in the previous sentence the words "type checker" were included. So, Ruby 3 itself provides no type checker. In practice, there are two typecheckers in "wide" use for Ruby: * [Steep](https://github.com/soutaro/steep), created by a Ruby core team member that also worked on manny of the other type-related bits that made it to Ruby 3 (but steep itself was not made part of Ruby) * [Sorbet](https://sorbet.org/), a type checker in wide use at Stripe and Shopify This RFC PR does a first pass at onboarding Sorbet to our codebase. There's a number of limitations on this "first pass", which I document below. ## Why not Steep? I've tried onboarding our codebase to steep, and gave up due to two big issues: 1. Steep relies on the Ruby 3 rbs format, but unlike Sorbet, there doesn't seem to be any way of starting small on a mostly-untyped codebase without any `.rbs` files, and iterate from there. There are tools to generate skeleton `.rbs` files (the `rbs` gem has two different modes for it + the `typeprof` gem), but they had a hard time with ddtrace. 2. There is currently little documentation for the type checking errors, once Steep does run. Google gave me ZERO hits on one of them (I guess it doesn't index the tool repository at least). And they're not all that easy to understand/fix, so what use is having a tool tell you "this is wrong", if then you're stuck with little help on something that may not even be a real issue but more likely a tool limitation? For these reasons, I gave up on Steep (for this first iteration). ## Working with Sorbet Sorbet can be run with `bundle exec srb tc` (or `bundle exec rake typecheck`). There's also Language Server Protocol support, if your editor supports it. The key thing to know about Sorbet is that allows file-level granularity for enabling type checks, see <https://sorbet.org/docs/static#file-level-granularity-strictness-levels>. In practice, this means that at the top of each `.rb` file, we can provide a comment that controls the reporting of type checking errors for that file. The usual ones (described in the above link as well) are: `# typed: ignore`, `# typed: false`, `# typed: true`, `# typed: strict`, and `# typed: strong`. In many cases, **Sorbet can typecheck a file correctly with no extra type annotations**, so we can start enjoying its checks (such as finding #1602) by just enabling it for our files. This can be done with the `# typed: true`, `# typed: strict`, `# typed: strong` modes (see the docs for details). The `# typed: false` is the default for files with no comment. Thus, even newcomers into the codebase will not be forced to immediately deal with Sorbet for contributions, which given the current limitations of the tool, and level of maturity, I think is reasonable. (We can reconfigure this later, if needed). In this mode, Sorbet will only complain about syntax and missing constant errors. Our codebase requires quite a few `# typed: ignore` because or dependencies that are pulled via appraisals and thus not seen/available to Sorbet (among other things that we do and that Sorbet) doesn't quite like. These files are completely ignored by Sorbet. (Note that if you then reference something on one of these files from another file, that file will not typecheck either, so "ignore" becomes somewhat contagious, and is best avoided if possible). I've also added a CI step to run sorbet, which again follows the rules above so should be able to provide us with valuable insight without being an annoying source of build failures. ## Limitations or "I thought the whole point of this was for us to add type declarations to the code"? Now we get to the more unfortunate part of the news: I think that for now we don't have great options for doing this. Sorbet type annotations are supposed to be inline with the code, and work through the [`sorbet-runtime`] (https://rubygems.org/gems/sorbet-runtime) gem. Unfortunately, as described on <https://sorbet.org/docs/faq#what-platforms-does-sorbet-support>: > The sorbet-runtime gem is currently only tested on Ruby 2.6 and Ruby 2.7. It is known to not support Ruby 2.4. Feel free to report runtime issues for any current or future Ruby version. Thus, since we still want to support Ruby 2.1, that doesn't quite work for us. I thought we could perhaps have an empty replacement of the sorbet runtime gem, and it turns out that [Shopify already tried and abandoned it](https://github.com/Shopify/sorbet-runtime-stub). So, inline annotations seem to be out for us. For Sorbet, that leaves us with only one other option: `.rbi` files. These are files that allow Sorbet to get type information for otherwise untyped code, and are the solution to having type information for many common libraries. Unfortunately, using `.rbi` files for our own code is very error-prone, because Sorbet trusts `.rbi` files, and doesn't check them against the code. Thus, if we had a class `A` with method `m` declared both on our code, as well as on an `.rbi` file, and then we renamed `m` to `n` and forgot to update the `.rbi`, then Sorbet would not complain about it. Worse, if somewhere there was a call to `A#m` somewhere in the code, Sorbet would still think it was correct, while in reality it would break at run-time. ## The future? Because our usage of Sorbet is rather simple, we can at any point switch to Steep, or any other tool, or just disable it, if it's not providing value. So we are not locked at all into its usage. Hopefully in the near future we'll find a way to add type declarations to our library without impact to our support matrix of older Rubies. Even better, we could then ship those types to our customers, for them to check that they are using ddtrace APIs correctly!

Everything on this commit was 100% generated by Sorbet. Sorbet checked all our files and added the best `# typed: ...` comments it could for each (e.g. if it could type it, it added `true`, if it couldn't, it added `false`, etc). Included in here is also the `sorbet/` folder which is used to keep type information for our dependencies.

ivoanjo · 2021-08-04T08:57:28Z

Thanks Marco. Imma gonna press the big green button. Happy to revisit this at any time :)

**What does this PR do?**: This PR is spiritually a revert of #1607, when we added the Sorbet typechecker to dd-trace-rb. It includes two commits: One where we remove all configuration and scaffolding surrounding Sorbet, and one where we remove all of the `# typed: ...` magic comments and `include Kernel` definitions added to make Sorbet happy. **Motivation**: As documented in #2641, the team has decided that the value vs pain equation for Sorbet has shifted in the past months, and thus that it was time to remove Sorbet. **Additional Notes**: Sorbet type checking in CI was actually removed earlier this week in #2617. **How to test the change?**: CI should still be green.

**What does this PR do?**: This PR takes a page from my previous integration of Sorbet (#1607) by changing the type checking from opt-in to opt-out. AKA by default all files will be type checked, unless they are ignored via the `Steepfile`. My intention here is to make this a bit like the `.rubocop_todo.yml` file -- whatever is in, is in, but over time we can add more and more type signatures. I've also tweaked the type checker CI task to go through rake, and to print a nice error message to guide people that haven't used Steep before. I'm opening this as a an RFC as I'd like to know the team's thoughts on this approach. **Motivation**: Gently prod us to adopt more typechecking :) **Additional Notes**: Steep actually spotted a bug (#2671) while I was testing this change out. **How to test the change?**: Check that CI is still green!

ivoanjo requested a review from a team July 22, 2021 11:21

ivoanjo force-pushed the sorbet/everyone-gets-a-kernel branch from 6157908 to d02e043 Compare July 23, 2021 08:23

Base automatically changed from sorbet/everyone-gets-a-kernel to master July 26, 2021 08:01

ivoanjo force-pushed the sorbet/rfc-add-sorbet branch from c7029f2 to e8991b4 Compare July 26, 2021 08:32

ivoanjo marked this pull request as ready for review July 26, 2021 08:42

ivoanjo requested a review from lloeki July 26, 2021 08:43

marcotc reviewed Jul 28, 2021

View reviewed changes

marcotc approved these changes Jul 28, 2021

View reviewed changes

ivoanjo added 2 commits August 4, 2021 09:00

ivoanjo force-pushed the sorbet/rfc-add-sorbet branch from e8991b4 to 226aa70 Compare August 4, 2021 08:15

ivoanjo merged commit e391d2e into master Aug 4, 2021

ivoanjo deleted the sorbet/rfc-add-sorbet branch August 4, 2021 08:57

github-actions bot added this to the 0.52.0 milestone Aug 4, 2021

ivoanjo mentioned this pull request Feb 23, 2023

Remove sorbet #2641

Closed

ivoanjo mentioned this pull request Feb 24, 2023

Remove Sorbet typechecker #2647

Merged

ivoanjo mentioned this pull request Mar 7, 2023

RFC: Run steep typechecker on new files by default #2672

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Add Sorbet typechecker to dd-trace-rb #1607

RFC: Add Sorbet typechecker to dd-trace-rb #1607

ivoanjo commented Jul 22, 2021 •

edited

marcotc Jul 28, 2021

ivoanjo Jul 29, 2021

lloeki Jul 29, 2021

marcotc left a comment

marcotc commented Jul 28, 2021

lloeki commented Aug 3, 2021

ivoanjo commented Aug 3, 2021 •

edited

marcotc commented Aug 3, 2021

ivoanjo commented Aug 4, 2021

RFC: Add Sorbet typechecker to dd-trace-rb #1607

RFC: Add Sorbet typechecker to dd-trace-rb #1607

Conversation

ivoanjo commented Jul 22, 2021 • edited

TL;DR

Notes for reviewing

What?

Why not Steep?

Working with Sorbet

Limitations or "I thought the whole point of this was for us to add type declarations to the code"?

The future?

marcotc Jul 28, 2021

Choose a reason for hiding this comment

ivoanjo Jul 29, 2021

Choose a reason for hiding this comment

lloeki Jul 29, 2021

Choose a reason for hiding this comment

marcotc left a comment

Choose a reason for hiding this comment

marcotc commented Jul 28, 2021

lloeki commented Aug 3, 2021

ivoanjo commented Aug 3, 2021 • edited

marcotc commented Aug 3, 2021

ivoanjo commented Aug 4, 2021

ivoanjo commented Jul 22, 2021 •

edited

ivoanjo commented Aug 3, 2021 •

edited