Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support special characters for mapping rule wildcards in url path #714

Merged
merged 1 commit into from
Jun 15, 2018

Conversation

mayorova
Copy link
Contributor

Some valid paths are not supported by the mapping rules matcher:
Mapping rule: /api/{email}/whatever is supposed to match /api/me@email.com/whatever, but instead APIcast returns No Mapping Rules Matched.

I'm not sure if all special characters listed in https://tools.ietf.org/html/rfc3986#section-3.3 should be used though, as it seems that some are not accepted by nginx itself...

return pattern:gsub('?.*', ''):gsub("{.-}", '([\\w_.-]+)'):gsub("%.", "\\.")
-- as per the RFC: https://tools.ietf.org/html/rfc3986#section-3.3
local wildcard_regex = [[([\w-.~%%!$&'()*+,;=@:]+)]] -- using long Lua brackets [[...]], escaping `%`
return pattern:gsub('?.*', ''):gsub("{.-}", wildcard_regex):gsub("%.", "\\.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to try #707 on this to see if we can improve performance of it. The :gsub should not be JITable, so replacing it could help significantly in micro benchmarks.

Copy link
Contributor

@mikz mikz May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark code:

require('benchmark.ips')(function(b)
  b.time = 5
  b.warmup = 2

  local gsub = string.gsub
  local re_gsub = ngx.re.gsub

  local transforms = {
    { '?.*', '' },
    { "{.-}", [[([\w-.~%%!$&'()*+,;=@:]+)]] },
    { "%.", [[\.]] },
  }

  local function loop(pattern)
    for i=1, #transforms do
      pattern = gsub(pattern, transforms[i][1], transforms[i][2])
    end
    return pattern
  end

  local function closure(pattern)
    pattern = gsub(pattern, '?.*', '')
    pattern = gsub(pattern, "{.-}", [[([\w-.~%%!$&'()*+,;=@:]+)]])
    pattern = gsub(pattern, "%.", "\\.")

    return pattern
  end

  local function global(pattern)
    -- as per the RFC: https://tools.ietf.org/html/rfc3986#section-3.3
    local wildcard_regex = [[([\w-.~%%!$&'()*+,;=@:]+)]] -- using long Lua brackets [[...]], escaping `%`
    return pattern:gsub('?.*', ''):gsub("{.-}", wildcard_regex):gsub("%.", "\\.")
  end

  local function pcre(pattern)
    pattern = re_gsub(pattern, [[\?.*]], '')
    pattern = re_gsub(pattern, [[\{.+?\}]], [[([\w-.~%!$$&'()*+,;=@:]+)]])
    pattern = re_gsub(pattern, [[\.]], [[\.]])

    return pattern
  end

  local function pcre_jit(pattern)
    pattern = re_gsub(pattern, [[\?.*]], '', 'oj')
    pattern = re_gsub(pattern, [[\{.+?\}]], [[([\w-.~%!$$&'()*+,;=@:]+)]], 'oj')
    pattern = re_gsub(pattern, [[\.]], [[\.]], 'oj')

    return pattern
  end

  local str = '/tfgoodosod/{foo}/sdsd?asdasdd'

  print('loop:   ' , loop(str))
  print('closure:' , closure(str))
  print('global: ' , global(str))
  print('pcre:   ' , pcre(str))

  b:report('loop', function() return loop(str) end)
  b:report('gsub', function() return closure(str) end)
  b:report('string.gsub', function() return global(str) end)
  b:report('re.gsub', function() return pcre(str) end)
  b:report('re.gsub jit', function() return pcre_jit(str) end)

  b:compare()
end)

Warming up --------------------------------------
                loop
     68352 i/100ms

                gsub
     69712 i/100ms

         string.gsub
     38264 i/100ms

             re.gsub
       242 i/100ms

         re.gsub jit
       807 i/100ms

Calculating -------------------------------------
   730569.9 (±0.9%) i/s -    3691008 in   5.052678s
   739498.1 (±1.3%) i/s -    3764448 in   5.091396s
   731344.9 (±2.0%) i/s -    3673344 in   5.024815s
      851.0 (±18.4%) i/s -       4114 in   5.002932s
     4226.9 (±3.4%) i/s -      21789 in   5.161376s

So, could you use gsub function from the closure to get 2x performance ? :)

Copy link
Contributor

@mikz mikz May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated results using resty.core and JITable PCRE:

Benchmark code

local re_gsub = ngx.re.gsub

require('resty.core')
require('benchmark.ips')(function(b)
  b.time = 5
  b.warmup = 2

  local gsub = string.gsub

  local ffi_re_gsub = ngx.re.gsub

  local transforms = {
    { '?.*', '' },
    { "{.-}", [[([\w-.~%%!$&'()*+,;=@:]+)]] },
    { "%.", [[\.]] },
  }

  local function loop(pattern)
    for i=1, #transforms do
      pattern = gsub(pattern, transforms[i][1], transforms[i][2])
    end
    return pattern
  end

  local function closure(pattern)
    pattern = gsub(pattern, '?.*', '')
    pattern = gsub(pattern, "{.-}", [[([\w-.~%%!$&'()*+,;=@:]+)]])
    pattern = gsub(pattern, "%.", "\\.")

    return pattern
  end

  local function global(pattern)
    -- as per the RFC: https://tools.ietf.org/html/rfc3986#section-3.3
    local wildcard_regex = [[([\w-.~%%!$&'()*+,;=@:]+)]] -- using long Lua brackets [[...]], escaping `%`
    return pattern:gsub('?.*', ''):gsub("{.-}", wildcard_regex):gsub("%.", "\\.")
  end

  local function pcre(pattern)
    pattern = re_gsub(pattern, [[\?.*]], '')
    pattern = re_gsub(pattern, [[\{.+?\}]], [[([\w-.~%!$$&'()*+,;=@:]+)]])
    pattern = re_gsub(pattern, [[\.]], [[\.]])

    return pattern
  end

  local function pcre_jit(pattern)
    pattern = re_gsub(pattern, [[\?.*]], '', 'oj')
    pattern = re_gsub(pattern, [[\{.+?\}]], [[([\w-.~%!$$&'()*+,;=@:]+)]], 'oj')
    pattern = re_gsub(pattern, [[\.]], [[\.]], 'oj')

    return pattern
  end

  local function ffi_pcre(pattern)
    pattern = ffi_re_gsub(pattern, [[\?.*]], '')
    pattern = ffi_re_gsub(pattern, [[\{.+?\}]], [[([\w-.~%!$$&'()*+,;=@:]+)]])
    pattern = ffi_re_gsub(pattern, [[\.]], [[\.]])

    return pattern
  end

  local function ffi_pcre_jit(pattern)
    pattern = ffi_re_gsub(pattern, [[\?.*]], '', 'oj')
    pattern = ffi_re_gsub(pattern, [[\{.+?\}]], [[([\w-.~%!$$&'()*+,;=@:]+)]], 'oj')
    pattern = ffi_re_gsub(pattern, [[\.]], [[\.]], 'oj')

    return pattern
  end

  local str = '/tfgoodosod/{foo}/sdsd?asdasdd'

  print('loop:   ' , loop(str))
  print('closure:' , closure(str))
  print('global: ' , global(str))
  print('pcre:   ' , pcre(str))

  b:report('loop', function() return loop(str) end)
  b:report('gsub', function() return closure(str) end)
  b:report('string.gsub', function() return global(str) end)
  b:report('re.gsub', function() return pcre(str) end)
  b:report('re.gsub jit', function() return pcre_jit(str) end)

  b:report('ffi re.gsub', function() return ffi_pcre(str) end)
  b:report('ffi re.gsub jit', function() return ffi_pcre_jit(str) end)

  b:compare()
end)

Warming up --------------------------------------
                loop
     70503 i/100ms

                gsub
     71918 i/100ms

         string.gsub
     71841 i/100ms

             re.gsub
       245 i/100ms

         re.gsub jit
       839 i/100ms

         ffi re.gsub
     31313 i/100ms

     ffi re.gsub jit
    269551 i/100ms

Calculating -------------------------------------
   731503.8 (±3.4%) i/s -    3666156 in   5.018140s
   758034.1 (±1.8%) i/s -    3811654 in   5.029960s
   750995.1 (±1.9%) i/s -    3807573 in   5.071966s
      748.2 (±17.6%) i/s -       3675 in   5.092833s
     4626.4 (±4.1%) i/s -      23492 in   5.087335s
   321969.3 (±2.1%) i/s -    1628276 in   5.059453s
  3683596.4 (±3.0%) i/s -   18599019 in   5.054253s

So probably best to use the PCRE jited version :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Results of the same benchmark with JIT off:

Warming up --------------------------------------
                loop
     33967 i/100ms

                gsub
     33434 i/100ms

         string.gsub
     32714 i/100ms

             re.gsub
       205 i/100ms

         re.gsub jit
       596 i/100ms

         ffi re.gsub
      4601 i/100ms

     ffi re.gsub jit
      7278 i/100ms

Calculating -------------------------------------
   554070.5 (±2.5%) i/s -    2785294 in   5.030156s
   570121.9 (±2.5%) i/s -    2875324 in   5.046784s
   547568.0 (±1.7%) i/s -    2747976 in   5.020110s
      543.7 (±25.6%) i/s -       2665 in   5.228329s
     2492.8 (±4.6%) i/s -      12516 in   5.032186s
    48904.9 (±2.6%) i/s -     248454 in   5.083841s
    81249.3 (±4.0%) i/s -     407568 in   5.023757s

@davidor
Copy link
Contributor

davidor commented Jun 13, 2018

@mayorova according to @mikz benchmarks, the ffi re.gsub jit version seems way faster, so we should probably use it.

@mayorova
Copy link
Contributor Author

@davidor yes, thanks! I am reviewing @mikz 's comments just now. Just took me a bit to find the hidden benchmark code 😅

Copy link
Contributor

@mikz mikz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good 👍

@davidor davidor merged commit 3cec6c8 into master Jun 15, 2018
@davidor davidor deleted the wildcard-regex branch June 15, 2018 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants