Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add Sorbet typechecker to dd-trace-rb #1607

Merged
merged 2 commits into from
Aug 4, 2021
Merged

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Jul 22, 2021

TL;DR

  • Minimal bootstrapping for now.
  • Type checking is optional. Default is no type checking for files without a magic comment. Should have minor impact on people that don't want to touch the type checking.
  • bundle exec srb tc runs the type checking.
  • I've enabled type checking during CI runs

Notes for reviewing

  • This PR has two commits -- one that has all of the setup and relevant things, and another which is 100% the result of running bundle exec srb init. I strongly suggest reviewing them separately, for added sanity. (If you want to quickly glance through the second commit -- your terminal won't choke on the diff like github does 😉 )
  • This PR depends on Sorbet: Include Kernel module to disambiguate from usage in BasicObject  #1605, and CI is failing for unrelated reasons. Thus, I've opened it as a draft until both of them are cleared and the runway is clear. It otherwise is ready for review/discussion. Now updated.

What?

While not a replacement for all our tests and other tooling, a type checker is a nice extra line of defense for catching issues that we otherwise may have missed, such as #1602.

The Ruby ecosystem around type checking is still rather immature. You may have heard that Ruby 3 shipped "with type checking" but that's an oversimplification: it actually only shipped with a "language for describing types" and a "type analysis tool" (https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/).
You may have noticed that nowhere in the previous sentence the words "type checker" were included.

So, Ruby 3 itself provides no type checker. In practice, there are two typecheckers in "wide" use for Ruby:

  • Steep, created by a Ruby core team member that also worked on manny of the other type-related bits that made it to Ruby 3 (but steep itself was not made part of Ruby)
  • Sorbet, a type checker in wide use at Stripe and Shopify

This RFC PR does a first pass at onboarding Sorbet to our codebase. There's a number of limitations on this "first pass", which I document below.

Why not Steep?

I've tried onboarding our codebase to steep, and gave up due to two big issues:

  1. Steep relies on the Ruby 3 rbs format, but unlike Sorbet, there doesn't seem to be any way of starting small on a mostly-untyped codebase without any .rbs files, and iterate from there. There are tools to generate skeleton .rbs files (the rbs gem has two different modes for it + the typeprof gem), but they had a hard time with ddtrace.

  2. There is currently little documentation for the type checking errors, once Steep does run. Google gave me ZERO hits on one of them (I guess it doesn't index the tool repository at least). And they're not all that easy to understand/fix, so what use is having a tool tell you "this is wrong", if then you're stuck with little help on something that may not even be a real issue but more likely a tool limitation?

For these reasons, I gave up on Steep (for this first iteration).

Working with Sorbet

Sorbet can be run with bundle exec srb tc (or bundle exec rake typecheck). There's also Language Server Protocol support, if your editor supports it.

The key thing to know about Sorbet is that allows file-level granularity for enabling type checks, see
https://sorbet.org/docs/static#file-level-granularity-strictness-levels.

In practice, this means that at the top of each .rb file, we can provide a comment that controls the reporting of type checking errors for that file. The usual ones (described in the above link as well) are:
# typed: ignore, # typed: false, # typed: true, # typed: strict, and # typed: strong.

In many cases, Sorbet can typecheck a file correctly with no extra type annotations, so we can start enjoying its checks (such as finding #1602) by just enabling it for our files. This can be done with the # typed: true, # typed: strict, # typed: strong modes (see the docs for details).

The # typed: false is the default for files with no comment. Thus, even newcomers into the codebase will not be forced to immediately deal with Sorbet for contributions, which given the current limitations of the tool, and level of maturity, I think is reasonable. (We can reconfigure this later, if needed). In this mode, Sorbet will only complain about syntax and missing constant errors.

Our codebase requires quite a few # typed: ignore because or dependencies that are pulled via appraisals and thus not seen/available to Sorbet (among other things that we do and that Sorbet) doesn't quite like. These files are completely ignored by Sorbet. (Note that if you then reference something on one of these files from another file, that file will not typecheck either, so "ignore" becomes somewhat contagious, and is best avoided if possible).

I've also added a CI step to run sorbet, which again follows the rules above so should be able to provide us with valuable insight without being an annoying source of build failures.

Limitations or "I thought the whole point of this was for us to add type declarations to the code"?

Now we get to the more unfortunate part of the news: I think that for now we don't have great options for doing this.

Sorbet type annotations are supposed to be inline with the code, and work through the sorbet-runtime gem.

Unfortunately, as described on https://sorbet.org/docs/faq#what-platforms-does-sorbet-support:

The sorbet-runtime gem is currently only tested on Ruby 2.6 and Ruby 2.7. It is known to not support Ruby 2.4. Feel free to report runtime issues for any current or future Ruby version.

Thus, since we still want to support Ruby 2.1, that doesn't quite work for us. I thought we could perhaps have an empty replacement of the sorbet runtime gem, and it turns out that Shopify already tried and abandoned it. So, inline annotations seem to be out for us.

For Sorbet, that leaves us with only one other option: .rbi files.
These are files that allow Sorbet to get type information for otherwise untyped code, and are the solution to having type information for many common libraries.

Unfortunately, using .rbi files for our own code is very error-prone, because Sorbet trusts .rbi files, and doesn't check them against the code. Thus, if we had a class A with method m declared both on our code, as well as on an .rbi file, and then we renamed m to n and forgot to update the .rbi, then Sorbet would not complain about it.
Worse, if somewhere there was a call to A#m somewhere in the code, Sorbet would still think it was correct, while in reality it would break at run-time.

The future?

Because our usage of Sorbet is rather simple, we can at any point switch to Steep, or any other tool, or just disable it, if it's not providing value. So we are not locked at all into its usage.

Hopefully in the near future we'll find a way to add type declarations to our library without impact to our support matrix of older Rubies.

Even better, we could then ship those types to our customers, for them to check that they are using ddtrace APIs correctly!

@ivoanjo ivoanjo requested a review from a team July 22, 2021 11:21
@ivoanjo ivoanjo force-pushed the sorbet/everyone-gets-a-kernel branch from 6157908 to d02e043 Compare July 23, 2021 08:23
Base automatically changed from sorbet/everyone-gets-a-kernel to master July 26, 2021 08:01
@ivoanjo ivoanjo marked this pull request as ready for review July 26, 2021 08:42
@ivoanjo ivoanjo requested a review from lloeki July 26, 2021 08:43
Comment on lines +1 to +2
--dir
.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as --dir .?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! This file was originally generated by sorbet (I tweaked it), but this was left untouched. I tried moving it to the same line to clarify but their arg parser seems confused:

ivo.anjo@macieira:~/datadog/dd-trace-rb$ be srb tc
No errors! Great job.

# <-- changed file

ivo.anjo@macieira:~/datadog/dd-trace-rb$ git diff
diff --git a/sorbet/config b/sorbet/config
index e9909983..8106c29c 100644
--- a/sorbet/config
+++ b/sorbet/config
@@ -1,3 +1,2 @@
---dir
-.
+--dir .
 --ignore=/integration
ivo.anjo@macieira:~/datadog/dd-trace-rb$ be srb tc
Argument ‘--dir .’ starts with a - but has incorrect syntax. To see all available options pass `--help`.
ivo.anjo@macieira:~/datadog/dd-trace-rb$ rm sorbet/config
ivo.anjo@macieira:~/datadog/dd-trace-rb$ be srb tc --dir . --ignore=/integration
No errors! Great job.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no real parser here, this kind of file is taking one arg per line, so on one line it'd be like (bash) sorbet "--dir ." and not sorbet "--dir" "."

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No errors! Great job.

@marcotc
Copy link
Member

marcotc commented Jul 28, 2021

My browser has suffered literal physical pain while loading all the files in this PR, my laptop is hot hot hot

@lloeki
Copy link
Contributor

lloeki commented Aug 3, 2021

I've had some experience using Sorbet, not an expert by any stretch (anecdata of sample size == 1). I really wanted to like it, but I found it to be quite awkward, even more so for libraries to be published as gems. Although some issues here seem to come from the design itself, to be fair my last "life-size" try with Sorbet dates back around Feb 2019.

To use Sorbet annotations one has to add calls to a sig method (coming from extend T::Sig) in the source. Therefore Sorbet tries really hard to fit a signature description language within the Ruby syntax, because it has to, to allow Ruby to parse the sigs even if it never evaluates them.

Adding the annotations to a library means that Sorbet becomes a mandatory runtime dependency of either the library gem (in our case ddtrace) or the user's application, otherwise T::Sig and sig are undefined. They also preclude a downstream app to define a top-level T constant. This is whether or not the downstream user wants to use Sorbet because it's right in the source. To me this means that annotations would be a hard "no" for a gem author.

RBI files inherit from the annotation sig syntax, so they have to hinge upon a Ruby code structure to operate. So I found the end result a bit artificial and boilerplate-ish. It left a tingling feeling of duplicating code and annotating the duplication.

This section of the documentation about publishing RBIs in a gem for other to use, I could never make it work, both as a gem consumer and a gem provider. I must have done something wrong because it's advertised, but there is to my knowledge no other documentation that the linked paragraph, and I was at a loss (maybe there was a bug that has been fixed since).

This turned into Sorbet generating RBIs for depended-upon gems, and since it vendors RBIs, it quickly turns diffs into a huge ball of mud. And it's not even that helpful because return values are not handled by srb rbi gem.

Which ties into the whole autogenerating things. The hidden-definitions and todo.rbi can be quite annoying to handle, them and typed: false can be very obscure in their consequences. With Sorbet I found one could quickly get to a point where a lot of RBI files are generated, things pass, and it feels like progress, but unless one dives in and hand-types it's of misleading perceived value (by that I mean it looks like things are checked but they're not, which gives a false sense of confidence), with a tangible maintenance cost.

Another nitpick is that sorbet-static, a direct dependency of sorbet meant to perform checks with reasonable performance, is a native gem for x86_64-linux and x86_64-darwin, with no ruby platform to build from source as a fallback, and for which the x86_64-linux gem will install on Alpine, but the binary, being glibc, crashes on musl libc. This had me have some conditionals in the Gemfile for incompatible CI runs to pass.

Apologies if this feels negative! I do believe adding type checking is ultimately very valuable, even with the added signature hand-typing cost. That's why as an experiment I've started adding type information to AppSec using RBS and checking types with Steep.

@ivoanjo
Copy link
Member Author

ivoanjo commented Aug 3, 2021

Thanks for the insight @lloeki . I agree with you in all points.

My thoughts is that right now, even it's very limited no-signatures mode, it provides a bit of value (after all it did catch a real bug) and it doesn't seem to add a lot of hassle. It's also quite easy to remove -- we just revert this PR, with no loss.

Is everyone ok with me adding it, with the knowledge that we're not locked into this solution and it's quite possible that we'll drop it and switch to steep at some point in the future?

(If so, I'll do a rebase and go ahead with the merge)

@marcotc
Copy link
Member

marcotc commented Aug 3, 2021

@ivoanjo I think this PR is unintrusive enough that this can be merged. There is already basic type checking being enforced with it, which is great.

I think that our "final" solution for type enrichment will involve RBS files + yet-to-be-matured-tooling, but most of the type checking done here is translatable, even if not a direct 1:1 to our future path.

 ## TL;DR

* Minimal bootstrapping for now.
* Type checking is optional. Default is no type checking for files
  without a magic comment. Should have minor impact on people that
  don't want to touch the type checking.
* `bundle exec srb tc` runs the type checking.
* I've enabled type checking during CI runs

 ## What?

While not a replacement for all our tests and other tooling, a type
checker is a nice extra line of defense for catching issues that we
otherwise may have missed, such as #1602.

The Ruby ecosystem around type checking is still rather immature. You
may have heard that Ruby 3 shipped "with type checking" but that's an
oversimplification: it actually only shipped with a "language for
describing types" and a "type analysis tool"
(<https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/>).
You may have noticed that nowhere in the previous sentence the words
"type checker" were included.

So, Ruby 3 itself provides no type checker. In practice, there are two
typecheckers in "wide" use for Ruby:

* [Steep](https://github.com/soutaro/steep), created by a Ruby core team
  member that also worked on manny of the other type-related bits that
  made it to Ruby 3 (but steep itself was not made part of Ruby)
* [Sorbet](https://sorbet.org/), a type checker in wide use at Stripe
  and Shopify

This RFC PR does a first pass at onboarding Sorbet to our codebase.
There's a number of limitations on this "first pass", which I document
below.

 ## Why not Steep?

I've tried onboarding our codebase to steep, and gave up due to two big
issues:

1. Steep relies on the Ruby 3 rbs format, but unlike Sorbet, there
doesn't seem to be any way of starting small on a mostly-untyped
codebase without any `.rbs` files, and iterate from there. There are
tools to generate skeleton `.rbs` files (the `rbs` gem has two
different modes for it + the `typeprof` gem), but they had a hard time
with ddtrace.

2. There is currently little documentation for the type checking errors,
once Steep does run. Google gave me ZERO hits on one of them (I guess
it doesn't index the tool repository at least). And they're not all
that easy to understand/fix, so what use is having a tool tell
you "this is wrong", if then you're stuck with little help on something
that may not even be a real issue but more likely a tool limitation?

For these reasons, I gave up on Steep (for this first iteration).

 ## Working with Sorbet

Sorbet can be run with `bundle exec srb tc` (or `bundle exec rake
typecheck`). There's also Language Server Protocol support, if your
editor supports it.

The key thing to know about Sorbet is that allows file-level granularity
for enabling type checks, see
<https://sorbet.org/docs/static#file-level-granularity-strictness-levels>.

In practice, this means that at the top of each `.rb` file, we can
provide a comment that controls the reporting of type checking errors
for that file. The usual ones (described in the above link as well) are:
`# typed: ignore`, `# typed: false`, `# typed: true`, `# typed: strict`,
and `# typed: strong`.

In many cases, **Sorbet can typecheck a file correctly with no extra
type annotations**, so we can start enjoying its checks (such as finding
 #1602) by just enabling it for our files. This can be done with the `#
 typed: true`, `# typed: strict`, `# typed: strong` modes (see the docs
 for details).

The `# typed: false` is the default for files with no comment. Thus,
even newcomers into the codebase will not be forced to immediately deal
with Sorbet for contributions, which given the current limitations of
the tool, and level of maturity, I think is reasonable. (We can
reconfigure this later, if needed). In this mode, Sorbet will only
complain about syntax and missing constant errors.

Our codebase requires quite a few `# typed: ignore` because or
dependencies that are pulled via appraisals and thus not seen/available
to Sorbet (among other things that we do and that Sorbet) doesn't quite
like. These files are completely ignored by Sorbet. (Note that if you
then reference something on one of these files from another file, that
file will not typecheck either, so "ignore" becomes somewhat contagious,
and is best avoided if possible).

I've also added a CI step to run sorbet, which again follows the rules
above so should be able to provide us with valuable insight without
being an annoying source of build failures.

 ## Limitations or "I thought the whole point of this was for us to
   add type declarations to the code"?

Now we get to the more unfortunate part of the news: I think that for
now we don't have great options for doing this.

Sorbet type annotations are supposed to be inline with the code, and
work through the [`sorbet-runtime`]
(https://rubygems.org/gems/sorbet-runtime) gem.

Unfortunately, as described on
<https://sorbet.org/docs/faq#what-platforms-does-sorbet-support>:
> The sorbet-runtime gem is currently only tested on Ruby 2.6 and Ruby
  2.7. It is known to not support Ruby 2.4. Feel free to report runtime
  issues for any current or future Ruby version.

Thus, since we still want to support Ruby 2.1, that doesn't quite work
for us. I thought we could perhaps have an empty replacement of the
sorbet runtime gem, and it turns out that [Shopify already tried and
abandoned it](https://github.com/Shopify/sorbet-runtime-stub). So,
inline annotations seem to be out for us.

For Sorbet, that leaves us with only one other option: `.rbi` files.
These are files that allow Sorbet to get type information for otherwise
untyped code, and are the solution to having type information for many
common libraries.

Unfortunately, using `.rbi` files for our own code is very error-prone,
because Sorbet trusts `.rbi` files, and doesn't check them against the
code. Thus, if we had a class `A` with method `m` declared both on our
code, as well as on an `.rbi` file, and then we renamed `m` to `n` and
forgot to update the `.rbi`, then Sorbet would not complain about it.
Worse, if somewhere there was a call to `A#m` somewhere in the code,
Sorbet would still think it was correct, while in reality it would
break at run-time.

 ## The future?

Because our usage of Sorbet is rather simple, we can at any point
switch to Steep, or any other tool, or just disable it, if it's not
providing value. So we are not locked at all into its usage.

Hopefully in the near future we'll find a way to add type declarations
to our library without impact to our support matrix of older Rubies.

Even better, we could then ship those types to our customers, for them
to check that they are using ddtrace APIs correctly!
Everything on this commit was 100% generated by Sorbet.

Sorbet checked all our files and added the best `# typed: ...`
comments it could for each (e.g. if it could type it, it added
`true`, if it couldn't, it added `false`, etc).

Included in here is also the `sorbet/` folder which is used
to keep type information for our dependencies.
@ivoanjo
Copy link
Member Author

ivoanjo commented Aug 4, 2021

Thanks Marco. Imma gonna press the big green button. Happy to revisit this at any time :)

@ivoanjo ivoanjo merged commit e391d2e into master Aug 4, 2021
@ivoanjo ivoanjo deleted the sorbet/rfc-add-sorbet branch August 4, 2021 08:57
@github-actions github-actions bot added this to the 0.52.0 milestone Aug 4, 2021
@ivoanjo ivoanjo mentioned this pull request Feb 23, 2023
ivoanjo added a commit that referenced this pull request Feb 24, 2023
**What does this PR do?**:

This PR is spiritually a revert of #1607, when we added the Sorbet
typechecker to dd-trace-rb.

It includes two commits: One where we remove all configuration
and scaffolding surrounding Sorbet, and one where we remove all of the
`# typed: ...` magic comments and `include Kernel` definitions added
to make Sorbet happy.

**Motivation**:

As documented in #2641, the team has decided that the value vs pain
equation for Sorbet has shifted in the past months, and thus that
it was time to remove Sorbet.

**Additional Notes**:

Sorbet type checking in CI was actually removed earlier this week in
 #2617.

**How to test the change?**:

CI should still be green.
ivoanjo added a commit that referenced this pull request Mar 7, 2023
**What does this PR do?**:

This PR takes a page from my previous integration of Sorbet (#1607) by
changing the type checking from opt-in to opt-out.

AKA by default all files will be type checked, unless they are ignored
via the `Steepfile`. My intention here is to make this a bit like
the `.rubocop_todo.yml` file -- whatever is in, is in, but over time
we can add more and more type signatures.

I've also tweaked the type checker CI task to go through rake, and to
print a nice error message to guide people that haven't used Steep
before.

I'm opening this as a an RFC as I'd like to know the team's thoughts
on this approach.

**Motivation**:

Gently prod us to adopt more typechecking :)

**Additional Notes**:

Steep actually spotted a bug (#2671) while I was testing this change
out.

**How to test the change?**:

Check that CI is still green!
ivoanjo added a commit that referenced this pull request Mar 7, 2023
**What does this PR do?**:

This PR takes a page from my previous integration of Sorbet (#1607) by
changing the type checking from opt-in to opt-out.

AKA by default all files will be type checked, unless they are ignored
via the `Steepfile`. My intention here is to make this a bit like
the `.rubocop_todo.yml` file -- whatever is in, is in, but over time
we can add more and more type signatures.

I've also tweaked the type checker CI task to go through rake, and to
print a nice error message to guide people that haven't used Steep
before.

I'm opening this as a an RFC as I'd like to know the team's thoughts
on this approach.

**Motivation**:

Gently prod us to adopt more typechecking :)

**Additional Notes**:

Steep actually spotted a bug (#2671) while I was testing this change
out.

**How to test the change?**:

Check that CI is still green!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants