Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a spec-runner configuration that tests all of Ruby Core #1088

Merged
merged 1 commit into from
Feb 14, 2021

Conversation

lopopolo
Copy link
Member

@lopopolo lopopolo commented Feb 14, 2021

It runs to completion!

Passed 4500, skipped 1097, not implemented 774, failed 5188 specs.

Gist with the full error log: https://gist.github.com/lopopolo/d7f758a4482a64ede6bb8a77918baba6.

There are still a few segfaults in there, but as configured, the specs reliably execute to completion.

This change adds a self-referential Hash check to Hash#inspect to prevent a stack overflow.

This change modifes mrb_convert_type to prevent a recursive attempt to raise a TypeError when attempting to convert a type to String that doesn't implement #to_s. This was fixed upstream in mruby/mruby@01a8d84.

The all-core-specs.toml configuration enables all top-level specs in spec-runner/vendor/spec/core except:

  • IO (attempts to instantiate a binary Regexp that aborts the spec-runner)
  • Kernel (infinite loop somewhere in there)
  • Mutex
  • ObjectSpace
  • Process
  • Thread
  • ThreadGroup
  • TracePoint

This config also copies the permitted spec list from enforced-specs.toml for Array. Array#initialize specs have an exception that crashes the spec-runner.

It runs to completion!

```
Passed 4500, skipped 1097, not implemented 774, failed 5188 specs.
```

This change adds a self-referential Hash check to `Hash#inspect` to
prevent a stack overflow.

This change modifes `mrb_convert_type` to prevent a recursive attempt to
raise a `TypeError` when attempting to convert a type to `String` that
doesn't implement `#to_s`.

The `all-core-specs.toml` configuration enables all top-level specs in
`spec-runner/vendor/spec/core` except:

- `IO` (attempt to instantiate a binary `Regexp` that aborts the
  spec-runner.
- `Kernel` (infinite loop somewhere in there)
- `Mutex`
- `ObjectSpace`
- `Process`
- `Thread`
- `ThreadGroup`
- `TracePoint`

This config also copies the permitted spec list from
`enforced-specs.toml` for `Array`. `Array#initialize` specs have an
exception that crashes the spec-runner.
@lopopolo lopopolo added A-project Area: Infrastructure for running an open source project. A-vm Area: Interpreter VM implementations. A-ruby-core Area: Ruby Core types. A-spec Area: ruby/spec infrastructure and completeness. B-mruby Backend: Implementation of artichoke-core using mruby. labels Feb 14, 2021
@lopopolo
Copy link
Member Author

Hi @eregon 👋

This PR adds a configuration profile for Artichoke's spec runner that tests a wider swath or Ruby Core APIs.

I was curious if you'd be willing to add a note to https://eregon.me/blog/2020/06/27/ruby-spec-compatibility-report.html that points folks at this PR for a significant update.

@lopopolo lopopolo merged commit fe297f7 into trunk Feb 14, 2021
@lopopolo lopopolo deleted the spec-core-all branch February 14, 2021 20:28
@eregon
Copy link

eregon commented Feb 15, 2021

In the output above, does it mean 4500 spec examples (it) ran successfully, and then skipped/not implemented/failed are all added on top, so the sum of all of those is how many specs were attempted (all of core/ minus exclusions)?
There were 20841 core specs when I did the report, so I guess the exclusions above account for about half of the core specs.
A nice gain indeed if Artichoke now passes around 22% of core specs.

It'd be great to have this run in CI if possible, and ideally separate for language/core/library specs.

@lopopolo
Copy link
Member Author

In the output above, does it mean 4500 spec examples (it) ran successfully, and then skipped/not implemented/failed are all added on top, so the sum of all of those is how many specs were attempted (all of core/ minus exclusions)?

@eregon yes that's correct.

There were 20841 core specs when I did the report

Artichoke vendors the spec repo as of artichoke/spec@eebd7ab which might change the number of specs as well.

It'd be great to have this run in CI if possible, and ideally separate for language/core/library specs.

Yes this is something I'd like to do, but haven't gotten around to it. Linking #101 and #142.

@lopopolo
Copy link
Member Author

Tweaking the config a bit, I get these numbers

Passed 4669, skipped 1125, not implemented 873, failed 5335 specs.

but also a reliable segfault on interpreter teardown.

all-core-specs.toml
# This config file lists the ruby/specs that should pass for Artichoke Ruby.
#
# Valid values for `include` are:
#
# - all - run all specs. When `include` is set to `all`, the optional `skip`
#   field may list specs to skip.
# - none - run no specs, equivalent to the section not being present in this
#   config file.
# - `set` - run an enumerated set of specs. When `include` is set to `set`, the
#   set of specs must be listed in the required `specs` field, which is a list
#   of strings.

## Ruby Core

[specs.core.argf]
include = "all"

[specs.core.array]
include = "all"
skip = [
  "initialize",
  "new",
  "pack",
]

[specs.core.basicobject]
include = "all"

[specs.core.binding]
include = "all"

[specs.core.builtin_constants]
include = "all"

[specs.core.class]
include = "all"

[specs.core.comparable]
include = "all"

[specs.core.complex]
include = "all"

[specs.core.dir]
include = "all"

[specs.core.encoding]
include = "all"

[specs.core.enumerable]
include = "all"

[specs.core.enumerator]
include = "all"

[specs.core.env]
include = "all"

[specs.core.exception]
include = "all"

[specs.core.false]
include = "all"

[specs.core.fiber]
include = "all"

[specs.core.file]
include = "all"

[specs.core.filetest]
include = "all"

[specs.core.float]
include = "all"

[specs.core.gc]
include = "all"

[specs.core.hash]
include = "all"

[specs.core.integer]
include = "all"

[specs.core.io]
include = "none"

[specs.core.kernel]
include = "none"

[specs.core.main]
include = "all"

[specs.core.marshal]
include = "all"

[specs.core.matchdata]
include = "all"

[specs.core.math]
include = "all"

[specs.core.method]
include = "all"

[specs.core.module]
include = "all"

[specs.core.mutex]
include = "none"

[specs.core.nil]
include = "all"

[specs.core.numeric]
include = "all"

[specs.core.objectspace]
include = "none"

[specs.core.proc]
include = "all"

[specs.core.process]
include = "none"

[specs.core.queue]
include = "all"

[specs.core.random]
include = "all"

[specs.core.range]
include = "all"

[specs.core.rational]
include = "all"

[specs.core.regexp]
include = "all"

[specs.core.signal]
include = "all"

[specs.core.sizedqueue]
include = "all"

[specs.core.string]
include = "all"

[specs.core.struct]
include = "all"

[specs.core.symbol]
include = "all"
skip = [
  "element_reference",
]

[specs.core.systemexit]
include = "all"

[specs.core.thread]
include = "none"

[specs.core.threadgroup]
include = "none"

[specs.core.time]
include = "all"

[specs.core.tracepoint]
include = "none"

[specs.core.true]
include = "all"

[specs.core.unboundmethod]
include = "all"

[specs.core.warning]
include = "all"

FWIW, this is why I don't like prioritizing getting the spec-runner in CI. I don't like chasing segfaults and I'd rather spend my time working toward removing Artichoke's dependency on mruby altogether instead of making the FFI bits less segfaulty.

@eregon
Copy link

eregon commented Feb 21, 2021

I added a sentence in the blog post:

Update (February 2021): Artichoke now passes an estimated 22% of core library specs when trying to run all core specs with this PR, however those are not run in CI and might regress.

I think all other Rubies actually run specs in CI (important to avoid regressions), and in that blog post I relied on Ruby implementations untagging passing specs rather than trying to run all specs blindly and ignore errors, so I wanted to clarify how these numbers differ from the rest.

@lopopolo
Copy link
Member Author

Thanks @eregon!

I know Artichoke doesn't integrate with ruby/spec and MSpec the way it's intended.

It's still early days for Artichoke; when the time is right, these bits of polish will get prioritized.

@lopopolo
Copy link
Member Author

I think all other Rubies actually run specs in CI (important to avoid regressions), and in that blog post I relied on Ruby implementations untagging passing specs rather than trying to run all specs blindly and ignore errors, so I wanted to clarify how these numbers differ from the rest.

Hi @eregon 👋

I'm excited to share that Artichoke runs ruby/spec on every commit as of #1328, #1329, #1330, #1331, #1332, #1333.

I've filed ruby/spec#844 to reflect this upstream in the ruby/spec README. 🚀

Artichoke doesn't have File so I've kinda had to roll my own tagging. You can see the current set of tags here: https://github.com/artichoke/spec-state/blob/trunk/reports/tagged/latest.json.

@eregon
Copy link

eregon commented Aug 18, 2021

That's great!
Is there a place I can look at to easily find out how many language, core, and library are passing?
That would be quite useful for the next time I do an update like https://eregon.me/blog/2020/06/27/ruby-spec-compatibility-report.html

@lopopolo
Copy link
Member Author

Is there a place I can look at to easily find out how many language, core, and library are passing?

Not yet. I'm planning on turning the raw data into a webpage at some point.

The suite currently only runs a subset of core specs. I'm working on being able to expand this to include some library specs #1325, #1327.

I've never run any language specs before, but I imagine getting them running wouldn't be too tricky.

Note: this isn't my biggest priority, but I am excited that this bit of infrastructure is in place to do things automatically. The hardest part is done I think 😄

lopopolo added a commit that referenced this pull request Nov 27, 2021
Implement `String` in Rust

**Implement the `String` class from Ruby core in Rust with `spinoso-string` in `artichoke-backend`.** 💎 🎉 💎 🎉 💎

[All 144 APIs available on `String`][ruby-core-string] in MRI 3.0.2 are either implemented or stubbed to raise `NotImplementedError`.

[ruby-core-string]: https://ruby-doc.org/core-3.0.2/String.html

## Design

Artichoke's `String` is based on [`spinoso_string::String`[spinoso-string-doc]. `String` is a byte buffer and associated encoding. `String` supports UTF-8, ASCII, and binary encodings. Encodings are conventional (a UTF-8 `String` may contain invalid UTF-8 byte sequences, an ASCII `String` may contain bytes greater than `0x7F`).

Character oriented APIs like `String::center`, `String::get_char` and `String::char_len` are implemented according to the `String`'s conventional encoding.

Unlike `String` in MRI, Artichoke's `String` does not cache whether a `String`'s encoding is valid, nor does it cache whether the `String` is `ascii_only`. These enhancements are expected to come in a future set of changes. I've [discussed how I would like to restructure `String`'s internals][twitter-string-nirvana] to make this happen on Twitter

`spinoso-string` uses state of the art Rust dependencies for encoding-oriented operations with first-class SIMD implementations. `spinoso-string` is built on top of `bstr` (ASCII and UTF-8 decoding, `find`-oriented APIs), `simdutf8` (UTF-8 validity checks), and `bytecount` (UTF-8 character length).

[spinoso-string-doc]: https://artichoke.github.io/artichoke/spinoso_string/struct.String.html
[twitter-string-nirvana]: https://twitter.com/artichokeruby/status/1464117285002440707

### mruby FFI Compatibility

Much like earlier work on `Array`, the `MRB_API` functions for strings are used heavily in the VM in addition to being exposed to Ruby code. The mruby VM treats strings as memory managed byte buffers and uses them extensively in its internals.

It is not sufficient to `#ifdef` out the C API since it is used outside of the `string.c` compilation unit.

Artichoke includes several reimplementations of mruby C APIs, mostly in Rust and some in the `mruby-sys` C extension.

During the development of these FFI compat functions, I encountered 1 segfault and 1 buffer overrun, both documented in the commits that address them:

- 7a13a30
- 279c698

See https://twitter.com/artichokeruby/status/1463024367948812293 for some more color around these bugs.

## API Coverage

Artichoke implements the following String APIs in mostly native code with some pure Ruby implementations and the `mruby-pack` mrbgem:

### Native Rust Implementations

- `String#*`
- `String#+`
- `String#<<`
- `String#<=>`
- `String#==`
- `String#[]`
- `String#[]=` (stubbed, raises `NotImplementedError`)
- `String#ascii_only?`
- `String#b`
- `String#bytes`
- `String#bytesize`
- `String#byteslice`
- `String#capitalize`
- `String#capitalize!`
- `String#casecmp`
- `String#casecmp?`
- `String#center`
- `String#chars`
- `String#chomp`
- `String#chomp!`
- `String#chop`
- `String#chop!`
- `String#chr`
- `String#clear`
- `String#codepoints`
- `String#concat` (stubbed, raises `NotImplementedError`)
- `String#downcase`
- `String#downcase!`
- `String#empty?`
- `String#eql?`
- `String#getbyte`
- `String#hash`
- `String#include?`
- `String#index` (only supports `String` patterns)
- `String#initialize`
- `String#initialize_copy`
- `String#inspect`
- `String#intern`
- `String#length`
- `String#ord`
- `String#replace`
- `String#reverse`
- `String#reverse!`
- `String#rindex` (only supports `String` patterns)
- `String#scan`
- `String#setbyte`
- `String#size`
- `String#slice`
- `String#slice!` (stubbed, raises `NotImplementedError`)
- `String#to_f` (stubbed, raises `NotImplementedError`)
- `String#to_i`
- `String#to_s`
- `String#to_sym`
- `String#upcase`
- `String#upcase!`
- `String#valid_encoding?`

### Pure Ruby Implementations

- `String::try_convert`
- `String#%`
- `String#+@`
- `String#-@`
- `String#/`
- `String#===`
- `String#=~`
- `String#count` (stubbed, raises `NotImplementedError`)
- `String#crypt` (stubbed, raises `NotImplementedError`, not intended to be implemented for security considerations)
- `String#delete`
- `String#delete!`
- `String#delete_prefix`
- `String#delete_suffix`
- `String#delete_suffix!`
- `String#dump` (stubbed, raises `NotImplementedError`)
- `String#each_byte`
- `String#each_char`
- `String#each_codepoint`
- `String#each_grapheme_cluster` (stubbed, raises `NotImplementedError`)
- `String#each_line`
- `String#encode` (stubbed, pending completion of #183)
- `String#encode!` (stubbed, pending completion of #183)
- `String#encoding` (stubbed, pending completion of #183)
- `String#end_with?`
- `String#force_encoding` (stubbed, pending completion of #183)
- `String#grapheme_clusters` (stubbed, raises `NotImplementedError`)
- `String#gsub`
- `String#gsub!`
- `String#hex` (stubbed, raises `NotImplementedError`)
- `String#insert`
- `String#lines`
- `String#ljust`
- `String#lstrip`
- `String#lstrip!`
- `String#match`
- `String#match?`
- `String#next` (stubbed, raises `NotImplementedError`)
- `String#next!` (stubbed, raises `NotImplementedError`)
- `String#oct` (stubbed, raises `NotImplementedError`)
- `String#partition`
- `String#prepend`
- `String#rjust`
- `String#rpartition` (stubbed, raises `NotImplementedError`)
- `String#rstrip`
- `String#rstrip!`
- `String#scrub` (stubbed, raises `NotImplementedError`, this should be implemented in native code)
- `String#scrub!` (stubbed, raises `NotImplementedError`, this should be implemented in native code)
- `String#split` (partial implementation, buggy, does not support block arguments)
- `String#squeeze` (partial implementation, does not support character set arguments)
- `String#squeeze!` (partial implementation, does not support character set arguments)
- `String#start_with?`
- `String#strip`
- `String#strip!`
- `String#sub`
- `String#sub!`
- `String#succ` (stubbed, raises `NotImplementedError`)
- `String#succ!` (stubbed, raises `NotImplementedError`)
- `String#sum` (stubbed, raises `NotImplementedError`)
- `String#swapcase` (stubbed, raises `NotImplementedError`)
- `String#swapcase!` (stubbed, raises `NotImplementedError`)
- `String#to_a`
- `String#to_c` (stubbed, raises `NotImplementedError`, Artichoke does not support `Complex` numbers)
- `String#to_r` (stubbed, raises `NotImplementedError`, Artichoke does not support `Rational` numbers)
- `String#to_str`
- `String#tr`
- `String#tr!`
- `String#tr_s`
- `String#tr_s!`
- `String#undump` (stubbed, raises `NotImplementedError`)
- `String#unicode_normalize` (stubbed, raises `NotImplementedError`)
- `String#unicode_normalize!` (stubbed, raises `NotImplementedError`)
- `String#unicode_normalized?` (stubbed, raises `NotImplementedError`)
- `String#upto`

### mruby mrbgem C extension

- `String#unpack`
- `String#unpack1`

## ruby/spec compliance

This PR improves ruby/spec compliance for Artichoke 📈 🚀 

On current trunk, when running `rake spec`, Artichoke has the following stats:

```
Passed 1837, skipped 238, not implemented 14, failed 0 specs.
```

On the tip of #1222, when running `rake spec`, Artichoke has the following stats:

```
Passed 1845, skipped 238, not implemented 14, failed 0 specs.
```

This PR additionally allows for `SecureRandom#random_bytes` specs to pass since the returned `String` can properly declare their binary encoding. Previously, the returned `String`s had UTF-8 encoding which meant that if the random bytes happened to contain a multi-byte UTF-8 character byte sequence, the reported length would be less than `bytesize`. See:

- fac36b9

## Future Work

- Implement `String#[]=`.
- Add fast paths when `String`s of any encoding are valid per their encoding.
- Add fast paths when `String`s of any encoding are ASCII only.
- Remove dependency on `mruby-pack`.
- Expose `Encoding` in Ruby, see #183.
- Use encoding-aware `String` constructors in native APIs where possible/required per ruby/spec.

## Linked Issues

Closes #184.
Partially addresses #183.

These PRs were foundational to getting this PR to a mergeable state:

- #1012, thanks @ekroon 
- #1045
- #1047
- #1048
- #1049
- #1051
- #1088
- #1141
- #1271
- #1348
- #1349
- #1353
- #1357
- #1402
- #1403
- #1449
- #1450 (comment) and #1510, thanks @rurban
- #1451
- #1488, thanks @masonforest 
- #1489
- #1490
- #1491
- #1505, thanks @briankung 
- #1512
- #1513
- #1514
- #1515
- #1516
- #1517
- #1519
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-project Area: Infrastructure for running an open source project. A-ruby-core Area: Ruby Core types. A-spec Area: ruby/spec infrastructure and completeness. A-vm Area: Interpreter VM implementations. B-mruby Backend: Implementation of artichoke-core using mruby.
Development

Successfully merging this pull request may close these issues.

2 participants