@@ -9,8 +9,8 @@ A Ruby gem for tokenizing, parsing, and transforming regular expressions.

 * Multilayered
   * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
-  * A lexer that produces a "stream" of token objects.
-  * A parser that produces a "tree" of Expression objects (OO API)
+  * A lexer that produces a "stream" of [Token objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
+  * A parser that produces a "tree" of [Expression objects (OO API)](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
 * Runs on Ruby 2.x, 3.x and JRuby runtimes
 * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)

@@ -36,14 +36,15 @@ Or, add it to your project's `Gemfile`:

 ```gem 'regexp_parser', '~> X.Y.Z'```

-See rubygems for the the [latest version number](https://rubygems.org/gems/regexp_parser)
+See the badge at the top of this README or [rubygems](https://rubygems.org/gems/regexp_parser)
+for the latest version number.


 ---
 ## Usage

 The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
-provides a single method that takes a regular expression (as a RegExp object or
+provides a single method that takes a regular expression (as a Regexp object or
 a string) and returns its results. The **Lexer** and the **Parser** accept an
 optional second argument that specifies the syntax version, like 'ruby/2.0',
 which defaults to the host Ruby version (using RUBY_VERSION).
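
As a rough illustration of the defaulting described above, the sketch below derives a syntax name of the form 'ruby/X.Y' from `RUBY_VERSION`. The helper name `default_syntax_name` is hypothetical and is not part of regexp_parser; the gem's own lookup may work differently.

```ruby
# Hypothetical helper, NOT part of regexp_parser: builds a syntax name such
# as 'ruby/2.0' from a version string, defaulting to the host Ruby version.
def default_syntax_name(version = RUBY_VERSION)
  major, minor = version.split('.').first(2)
  "ruby/#{major}.#{minor}"
end

puts default_syntax_name          # name for the running interpreter
puts default_syntax_name('2.0.0') # prints ruby/2.0
```

Passing an explicit version string mirrors the optional second argument; omitting it falls back to the running interpreter.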
@@ -101,7 +102,7 @@ start/end offsets for each token found.
 ```ruby
 require 'regexp_parser'

-Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/ do |type, token, text, ts, te|
+Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
   puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
 end

@@ -124,7 +125,7 @@ A one-liner that uses map on the result of the scan to return the textual
 parts of the pattern:

 ```ruby
-Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
+Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
 # => ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
 ```

@@ -220,7 +221,7 @@ syntax, and prints the token objects' text indented to their level.
 ```ruby
 require 'regexp_parser'

-Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
+Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
   puts "#{' ' * token.level}#{token.text}"
 end

@@ -246,7 +247,7 @@ how the sequence 'cat' is treated. The 't' is separated because it's followed
 by a quantifier that only applies to it.

 ```ruby
-Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
+Regexp::Lexer.scan(/(cat?([b]at)){3,5}/).map { |token| token.text }
 # => ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
 ```

@@ -274,7 +275,7 @@ require 'regexp_parser'

 regex = /a?(b+(c)d)*(?<name>[0-9]+)/

-tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
+tree = Regexp::Parser.parse(regex, 'ruby/2.1')

 tree.traverse do |event, exp|
   puts "#{event}: #{exp.type} `#{exp.to_s}`"
@@ -355,7 +356,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&emsp;_Nest Level_ | `\k<n-1>` | ✓ |
 | &emsp;&emsp;_Numbered_ | `\k<1>` | ✓ |
 | &emsp;&emsp;_Relative_ | `\k<-2>` | ✓ |
-| &emsp;&emsp;_Traditional_ | `\1` thru `\9` | ✓ |
+| &emsp;&emsp;_Traditional_ | `\1` through `\9` | ✓ |
 | &emsp;&nbsp;_**Capturing**_ | `(abc)` | ✓ |
 | &emsp;&nbsp;_**Comments**_ | `(?# comment text)` | ✓ |
 | &emsp;&nbsp;_**Named**_ | `(?<name>abc)`, `(?'name'abc)` | ✓ |
@@ -375,7 +376,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Meta** \[2\]_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | ✓ |
 | &emsp;&nbsp;_**Octal**_ | `\0`, `\01`, `\012` | ✓ |
 | &emsp;&nbsp;_**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | ✓ |
-| **Unicode Properties** | _<sub>([Unicode 13.0.0](https://www.unicode.org/versions/Unicode13.0.0/))</sub>_ | ⋱ |
+| **Unicode Properties** | _<sub>([Unicode 13.0.0])</sub>_ | ⋱ |
 | &emsp;&nbsp;_**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | ✓ |
 | &emsp;&nbsp;_**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | ✓ |
 | &emsp;&nbsp;_**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | ✓ |
@@ -384,13 +385,17 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | ✓ |
 | &emsp;&nbsp;_**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | ✓ |

-**\[1\]**: Ruby does not support lazy or possessive interval quantifiers. Any `+` or `?` that follows an interval
-quantifier will be treated as another, chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
+[Unicode 13.0.0]: https://www.unicode.org/versions/Unicode13.0.0/
+
+**\[1\]**: Ruby does not support lazy or possessive interval quantifiers.
+Any `+` or `?` that follows an interval quantifier will be treated as another,
+chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
 [#69](https://github.com/ammar/regexp_parser/pull/69).

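The `+` case from note \[1\] can be observed with Ruby's built-in Regexp alone. The sketch below is only an illustration and does not use regexp_parser; the pattern and input are made up for the demonstration. A `+` after an interval still allows backtracking into the interval, whereas an atomic group, standing in here for a truly possessive interval, does not.

```ruby
# Plain Ruby, no gems required. The `+` after `{2,3}` acts as a second,
# chained quantifier, so the engine can still give back one 'a'.
chained = Regexp.new('\Aa{2,3}+a\z')

# An atomic group never gives characters back, which is how a possessive
# interval would behave if Ruby supported one.
atomic = Regexp.new('\A(?>a{2,3})a\z')

puts chained.match?('aaa') # prints true
puts atomic.match?('aaa')  # prints false
```
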
-**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex escapes when used in Regexp literals](
-https://github.com/ruby/ruby/commit/11ae581a4a7f5d5f5ec6378872eab8f25381b1b9), so they will only reach the
-scanner and will only be emitted if a String or a Regexp that has been built with the `::new` constructor is scanned.
+**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex
+escapes when used in Regexp literals](https://github.com/ruby/ruby/commit/11ae581),
+so they will only reach the scanner and will only be emitted if a String or a Regexp
+that has been built with the `::new` constructor is scanned.

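The difference described in note \[2\] can be inspected with plain Ruby, without regexp_parser. This is a small sketch: the source printed for the literal depends on the Ruby version in use, so no fixed output is claimed for it.

```ruby
# Plain Ruby, no gems required.
# A Regexp built from a String keeps the control sequence in its source,
# so anything scanning that source still sees `\C-a`.
from_string = Regexp.new('\C-a')
puts from_string.source         # prints \C-a
puts from_string.match?("\x01") # prints true (control-A is byte 0x01)

# The source of a literal depends on the Ruby version: per the note above,
# Ruby 3.1 and later are expected to show a hex escape here instead.
literal = /\C-a/
puts literal.source
puts literal.match?("\x01")     # prints true
```
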
 ##### Inapplicable Features

@@ -407,25 +412,27 @@ expressions library (Onigmo). They are not supported by the scanner.

 See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)

-_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
-or incorrectly return tokens/objects as literals._
+_**Note**: Attempting to process expressions with unsupported syntax features can raise
+an error, or incorrectly return tokens/objects as literals._


 ## Testing
 To run the tests simply run rake from the root directory.

-The default task generates the scanner's code from the Ragel source files and runs all the specs, thus it requires Ragel to be installed.
+The default task generates the scanner's code from the Ragel source files and runs
+all the specs, thus it requires Ragel to be installed.

-Note that changes to Ragel files will not be reflected when running `rspec` on its own, so to run individual tests you might want to run:
+Note that changes to Ragel files will not be reflected when running `rspec` on its own,
+so to run individual tests you might want to run:

 ```
 rake ragel:rb && rspec spec/scanner/properties_spec.rb
 ```

 ## Building
-Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
-installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
-Ruby scanner code.
+Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/)
+to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate
+the Ruby scanner code.


 The project uses the standard rubygems package tasks, so:
@@ -445,19 +452,26 @@ rake install
 ## Example Projects
 Projects using regexp_parser.

-- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
+- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool
+  that uses regexp_parser to convert Regexps to css/xpath selectors.

-- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
+- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions
+  to JavaScript-compatible regular expressions.

-- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
+- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor
+  with alias support.

-- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
+- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions
+  (amongst others) to see if your tests cover their behavior.

-- [repper](https://github.com/jaynetics/repper) is a regular expression pretty-printer for Ruby.
+- [repper](https://github.com/jaynetics/repper) is a regular expression
+  pretty-printer and formatter for Ruby.

-- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
+- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that
+  uses regexp_parser to lint Regexps.

-- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
+- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper
+  that uses regexp_parser to generate examples of postal codes.


 ## References