Commit 93a11b4

Update README.md
1 parent e600686 commit 93a11b4

1 file changed

+44
-30
lines changed
README.md

Lines changed: 44 additions & 30 deletions
@@ -9,8 +9,8 @@ A Ruby gem for tokenizing, parsing, and transforming regular expressions.
 
 * Multilayered
   * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
-  * A lexer that produces a "stream" of token objects.
-  * A parser that produces a "tree" of Expression objects (OO API)
+  * A lexer that produces a "stream" of [Token objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
+  * A parser that produces a "tree" of [Expression objects (OO API)](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
 * Runs on Ruby 2.x, 3.x and JRuby runtimes
 * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
 
@@ -36,14 +36,15 @@ Or, add it to your project's `Gemfile`:
 
 ```gem 'regexp_parser', '~> X.Y.Z'```
 
-See rubygems for the the [latest version number](https://rubygems.org/gems/regexp_parser)
+See the badge at the top of this README or [rubygems](https://rubygems.org/gems/regexp_parser)
+for the the latest version number.
 
 
 ---
 ## Usage
 
 The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
-provides a single method that takes a regular expression (as a RegExp object or
+provides a single method that takes a regular expression (as a Regexp object or
 a string) and returns its results. The **Lexer** and the **Parser** accept an
 optional second argument that specifies the syntax version, like 'ruby/2.0',
 which defaults to the host Ruby version (using RUBY_VERSION).
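
The `Gemfile` line touched by this hunk uses the pessimistic `~>` constraint. As a minimal sketch of what such a constraint admits, the snippet below asks RubyGems' own `Gem::Requirement`. The version `2.1.0` is a hypothetical stand-in for `X.Y.Z`, not a recommended release of regexp_parser.

```ruby
require 'rubygems' # loaded by default in modern Ruby; kept for clarity

# Hypothetical stand-in for the 'X.Y.Z' placeholder in the Gemfile line.
requirement = Gem::Requirement.new('~> 2.1.0')

# '~> 2.1.0' is shorthand for '>= 2.1.0' and '< 2.2.0',
# so only patch-level updates are accepted.
%w[2.0.9 2.1.0 2.1.5 2.2.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'rejected'}"
end
```

With a three-part version, `~>` locks the major and minor numbers, which is presumably why the README points readers to the badge or rubygems for the current value.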
@@ -101,7 +102,7 @@ start/end offsets for each token found.
 ```ruby
 require 'regexp_parser'
 
-Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/ do |type, token, text, ts, te|
+Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
   puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
 end
 
@@ -124,7 +125,7 @@ A one-liner that uses map on the result of the scan to return the textual
 parts of the pattern:
 
 ```ruby
-Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
+Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
 #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
 ```
 
@@ -220,7 +221,7 @@ syntax, and prints the token objects' text indented to their level.
 ```ruby
 require 'regexp_parser'
 
-Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
+Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
   puts "#{' ' * token.level}#{token.text}"
 end
 
@@ -246,7 +247,7 @@ how the sequence 'cat' is treated. The 't' is separated because it's followed
 by a quantifier that only applies to it.
 
 ```ruby
-Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
+Regexp::Lexer.scan(/(cat?([b]at)){3,5}/).map { |token| token.text }
 #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
 ```
 
@@ -274,7 +275,7 @@ require 'regexp_parser'
 
 regex = /a?(b+(c)d)*(?<name>[0-9]+)/
 
-tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
+tree = Regexp::Parser.parse(regex, 'ruby/2.1')
 
 tree.traverse do |event, exp|
   puts "#{event}: #{exp.type} `#{exp.to_s}`"
@@ -355,7 +356,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&emsp;_Nest Level_ | `\k<n-1>` | &#x2713; |
 | &emsp;&emsp;_Numbered_ | `\k<1>` | &#x2713; |
 | &emsp;&emsp;_Relative_ | `\k<-2>` | &#x2713; |
-| &emsp;&emsp;_Traditional_ | `\1` thru `\9` | &#x2713; |
+| &emsp;&emsp;_Traditional_ | `\1` through `\9` | &#x2713; |
 | &emsp;&nbsp;_**Capturing**_ | `(abc)` | &#x2713; |
 | &emsp;&nbsp;_**Comments**_ | `(?# comment text)` | &#x2713; |
 | &emsp;&nbsp;_**Named**_ | `(?<name>abc)`, `(?'name'abc)` | &#x2713; |
@@ -375,7 +376,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Meta** \[2\]_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | &#x2713; |
 | &emsp;&nbsp;_**Octal**_ | `\0`, `\01`, `\012` | &#x2713; |
 | &emsp;&nbsp;_**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | &#x2713; |
-| **Unicode Properties** | _<sub>([Unicode 13.0.0](https://www.unicode.org/versions/Unicode13.0.0/))</sub>_ | &#x22f1; |
+| **Unicode Properties** | _<sub>([Unicode 13.0.0])</sub>_ | &#x22f1; |
 | &emsp;&nbsp;_**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | &#x2713; |
 | &emsp;&nbsp;_**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | &#x2713; |
 | &emsp;&nbsp;_**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | &#x2713; |
@@ -384,13 +385,17 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | &#x2713; |
 | &emsp;&nbsp;_**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | &#x2713; |
 
-**\[1\]**: Ruby does not support lazy or possessive interval quantifiers. Any `+` or `?` that follows an interval
-quantifier will be treated as another, chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
+[Unicode 13.0.0]: https://www.unicode.org/versions/Unicode13.0.0/
+
+**\[1\]**: Ruby does not support lazy or possessive interval quantifiers.
+Any `+` or `?` that follows an interval quantifier will be treated as another,
+chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
 [#69](https://github.com/ammar/regexp_parser/pull/69).
 
-**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex escapes when used in Regexp literals](
-https://github.com/ruby/ruby/commit/11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 ), so they will only reach the
-scanner and will only be emitted if a String or a Regexp that has been built with the `::new` constructor is scanned.
+**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex
+escapes when used in Regexp literals](https://github.com/ruby/ruby/commit/11ae581),
+so they will only reach the scanner and will only be emitted if a String or a Regexp
+that has been built with the `::new` constructor is scanned.
 
 ##### Inapplicable Features
 
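
The `+` case described in footnote \[1\] of the hunk above can be observed with Ruby's built-in regex engine alone, without the gem. This is a minimal sketch, assuming a Ruby runtime with possessive quantifier support and `String#match?` (2.4 or later); the sample string and patterns are illustrative only.

```ruby
# `a++` is possessive: it consumes every 'a' and never gives one back,
# so the trailing literal `a` has nothing left to match.
possessive = 'aaa'.match?(Regexp.new('a++a'))

# `a{1,3}+` is not possessive in Ruby. The `+` acts as a second, chained
# quantifier, so the pattern behaves like `(?:a{1,3})+` and may backtrack.
chained  = 'aaa'.match?(Regexp.new('a{1,3}+a'))
explicit = 'aaa'.match?(Regexp.new('(?:a{1,3})+a'))

puts "a++a matches 'aaa':          #{possessive}"
puts "a{1,3}+a matches 'aaa':      #{chained}"
puts "(?:a{1,3})+a matches 'aaa':  #{explicit}"
```

The patterns are built with `Regexp.new` from single-quoted strings so that they are compiled only at runtime.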
@@ -407,25 +412,27 @@ expressions library (Onigmo). They are not supported by the scanner.
 
 See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)
 
-_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
-or incorrectly return tokens/objects as literals._
+_**Note**: Attempting to process expressions with unsupported syntax features can raise
+an error, or incorrectly return tokens/objects as literals._
 
 
 ## Testing
 To run the tests simply run rake from the root directory.
 
-The default task generates the scanner's code from the Ragel source files and runs all the specs, thus it requires Ragel to be installed.
+The default task generates the scanner's code from the Ragel source files and runs
+all the specs, thus it requires Ragel to be installed.
 
-Note that changes to Ragel files will not be reflected when running `rspec` on its own, so to run individual tests you might want to run:
+Note that changes to Ragel files will not be reflected when running `rspec` on its own,
+so to run individual tests you might want to run:
 
 ```
 rake ragel:rb && rspec spec/scanner/properties_spec.rb
 ```
 
 ## Building
-Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
-installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
-Ruby scanner code.
+Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/)
+to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate
+the Ruby scanner code.
 
 
 The project uses the standard rubygems package tasks, so:
@@ -445,19 +452,26 @@ rake install
 ## Example Projects
 Projects using regexp_parser.
 
-- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
+- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool
+that uses regexp_parser to convert Regexps to css/xpath selectors.
 
-- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
+- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions
+to JavaScript-compatible regular expressions.
 
-- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
+- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor
+with alias support.
 
-- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
+- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions
+(amongst others) to see if your tests cover their behavior.
 
-- [repper](https://github.com/jaynetics/repper) is a regular expression pretty-printer for Ruby.
+- [repper](https://github.com/jaynetics/repper) is a regular expression
+pretty-printer and formatter for Ruby.
 
-- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
+- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that
+uses regexp_parser to lint Regexps.
 
-- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
+- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper
+that uses regexp_parser to generate examples of postal codes.
 
 
 ## References
