plugin-transform-unicode-regex: replace RegExp constructor/calls and replace at runtime #10523

Open
Istador opened this issue Oct 3, 2019 · 3 comments

@Istador commented Oct 3, 2019

Bug Report

Current Behavior
When transpiling for IE11, regular expressions that use the unicode flag (u) need to have the flag removed and the pattern rewritten into an equivalent one that does not rely on it.

@babel/plugin-transform-unicode-regex does this, but only for RegExp literals, not for the RegExp constructor (new RegExp(...)) or plain RegExp(...) calls.

babel-plugin-transform-regexp-constructors can transform RegExp constructor calls into RegExp literals, which @babel/plugin-transform-unicode-regex can then rewrite. But this only works if the arguments to RegExp are plain strings known at compile time. If the arguments are dynamic in any way, babel-plugin-transform-regexp-constructors leaves them untouched.

What is needed is a RegExp transformation at runtime, as the RegExp constructor could even be filled with user content (e.g. by an input text field).


Input Code

const a = /Käse/iu
const b = new RegExp('Käse', 'iu')
const c = new RegExp((x => x.trim())('Käse '), 'iu')

c could be reduced to b at compile time, but this is just an example. If the string were user input, that would not be possible.


Output Code

var a = /[K\u212A]\xE4[s\u017F]e/i;
var b = /[K\u212A]\xE4[s\u017F]e/i;
var c = new RegExp(function (x) {
  return x.trim();
}('Käse '), 'iu');

This fails in IE11, because c still carries the 'u' flag.
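For illustration, the failure can be reproduced with a small feature check. This is only a sketch: the function name supportsUnicodeFlag is made up here, and it relies on the fact that the RegExp constructor throws a SyntaxError for flags the engine does not know.

```javascript
// Hypothetical helper (not part of Babel or of the plugin discussed here).
// Returns true when the engine accepts the `u` flag. Engines without support
// throw a SyntaxError from the RegExp constructor, which is caught below.
function supportsUnicodeFlag() {
  try {
    new RegExp("", "u");
    return true;
  } catch (err) {
    return false;
  }
}

console.log(supportsUnicodeFlag());
```

In a browser without support for the flag, the same constructor call is what makes the line defining c throw.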


Expected behavior/code

var a = /[K\u212A]\xE4[s\u017F]e/i;
var b = /[K\u212A]\xE4[s\u017F]e/i;
var c = new RegExp(FUNCTION_CALL_TO_REWRITE_AT_RUNTIME(function (x) {
  return x.trim();
}('Käse ')), 'i');

Babel Configuration (.babelrc, package.json, cli command)

{
  sourceType: 'unambiguous',
  presets: [[
    '@babel/preset-env',
    {
      useBuiltIns: 'usage',
      corejs: 3,
      modules: 'commonjs',
      targets: {
        ie      : 11,
        chrome  : 60,
        firefox : 56,
      },
    },
  ]],
  plugins: [
    'babel-plugin-transform-regexp-constructors',
    '@babel/plugin-transform-unicode-regex',
  ],
}

Environment

  • Babel version(s): 7.6.2
  • Node/npm version: Node 12.7.0, npm 6.10.0
  • OS: Windows 7
  • Monorepo: yes?
  • How you are using Babel: webpack 4.41.0 with babel-loader 8.0.6

Possible Solution

As a workaround I wrote a Babel plugin that combines babel-plugin-transform-regexp-constructors and @babel/plugin-transform-unicode-regex and, where needed, adds an additional function call to the output.

But I don't know exactly how to polyfill the function properly. Currently I just inject it inside the window object, which feels very dirty. It'd be way better, if the plugin would automatically inject the function, without polluting window, but only if it detects such problematic regular expressions in the code (because the code for it is quite big).

Application:

window.rewritePattern = require('regexpu-core')
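One way to avoid the global might be a small factory that receives the rewrite function as an argument. This is a sketch under assumptions: createRegExpFactory and createRegExp are hypothetical names, and rewritePattern is assumed to behave like regexpu-core's export called as rewritePattern(pattern, flags). Depending on the regexpu-core version, an additional options argument may be required to enable the rewriting.

```javascript
// Hypothetical factory: `rewritePattern` is expected to be regexpu-core's
// export, handed in by the caller instead of being read from `window`.
function createRegExpFactory(rewritePattern) {
  return function createRegExp(pattern, flags) {
    flags = flags || "";
    // Without the `u` flag there is nothing to rewrite.
    if (flags.indexOf("u") === -1) {
      return new RegExp(pattern, flags);
    }
    // Rewrite the pattern at runtime, then drop the `u` flag.
    return new RegExp(rewritePattern(pattern, flags), flags.replace("u", ""));
  };
}

// Usage (assumes regexpu-core is installed):
//   const createRegExp = createRegExpFactory(require("regexpu-core"));
//   const c = createRegExp(userInput, "iu");
```

Passing the dependency in keeps the helper free of globals, but the generated code would still need a way to reach createRegExp, for example through an import.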

Plugin:

const rewritePattern = require('regexpu-core')

function convert(path, t) {
  const args = path.get('arguments')
  const evaluatedArgs = args.map((a) => a.evaluate())
  if (! evaluatedArgs[1] || ! evaluatedArgs[1].value || ! evaluatedArgs[1].value.includes('u')) { return }
  let pattern = evaluatedArgs[0]
  let flags   = evaluatedArgs[1].value

  if (pattern.confident) {
    return t.regExpLiteral(
      rewritePattern(pattern.value, flags),
      flags.replace('u', ''),
    )
  }
  else {
    return t.newExpression(
      path.node.callee,
      [
        t.callExpression(
          t.memberExpression(
            t.identifier('window'),
            t.identifier('rewritePattern'),
          ),
          [
            pattern.deopt.node,
          ],
        ),
        t.stringLiteral(flags.replace('u', '')),
      ],
    )
  }
}

function maybeReplaceRegExp(path, t) {
  if (! t.isIdentifier(path.node.callee, { name: 'RegExp' })) { return }
  const regexp = convert(path, t)
  if (regexp) {
    path.replaceWith(regexp)
  }
}

module.exports = function({ types: t }) {
  return {
    name: 'transform-unicode-regex',
    visitor: {
      RegExpLiteral({ node }) {
        if (! node.flags || ! node.flags.includes('u')) { return }
        node.pattern = rewritePattern(node.pattern, node.flags)
        node.flags = node.flags.replace('u', '')
      },
      NewExpression(path) {
        maybeReplaceRegExp(path, t)
      },
      CallExpression(path) {
        maybeReplaceRegExp(path, t)
      },
    },
  }
}

This produces the following output, which works in IE11:

var a = /[K\u212A]\xE4[s\u017F]e/i;
var b = /[K\u212A]\xE4[s\u017F]e/i;
var c = new RegExp(window.rewritePattern(function (x) {
  return x.trim();
}('Käse ')), "i");

@babel-bot (Collaborator) commented Oct 3, 2019

Hey @Istador! We really appreciate you taking the time to report an issue. The collaborators on this project attempt to help as many people as possible, but we're a limited number of volunteers, so it's possible this won't be addressed swiftly.

If you need any help, or just have general Babel or JavaScript questions, we have a vibrant Slack community that typically has someone willing to help. You can sign up here for an invite.

@JLHwung (Contributor) commented Oct 14, 2019

Currently we have no plans to support transforming RegExp constructor calls, because they are highly dynamic. It is better to do this in userland, as in your proposed workaround.

> It'd be way better, if the plugin would automatically inject the function, without polluting window

Instead of referencing regexpu-core from the global object, you can inject a regexpu-core import when the regular expressions are problematic.

import { addDefault } from "@babel/helper-module-imports";

// create unique identifier for imported regexpu-core
const rewritePatternIdentifier = path.scope.generateUidIdentifier("rewritePattern");
addDefault(rewritePatternIdentifier, "regexpu-core");

// replace RegExp constructor calls
t.newExpression(path.node.callee, [
  t.callExpression(rewritePatternIdentifier, [pattern.deopt.node]),
  t.stringLiteral(flags.replace("u", ""))
]);

@Istador (Author) commented Oct 14, 2019

That didn't work for me, because addDefault wants a path and not an identifier as the first parameter.

TypeError: [...].js: path.find is not a function
    at new ImportInjector ([...]\node_modules\@babel\helper-module-imports\lib\import-injector.js:46:30)
    at addDefault ([...]\node_modules\@babel\helper-module-imports\lib\index.js:30:10)
    [...]

This did the trick for me:

const rewritePatternIdentifier = addDefault(path, "regexpu-core");

instead of:

const rewritePatternIdentifier = path.scope.generateUidIdentifier("rewritePattern");
addDefault(rewritePatternIdentifier, "regexpu-core");
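Put together, the dynamic branch of the plugin might look roughly like the sketch below. buildRuntimeRewrite is a hypothetical name, and addDefault is passed in as a parameter only so that the snippet has no hard dependency; in a real plugin it would be the function imported from @babel/helper-module-imports. Unlike the snippets above, this version also forwards the original flags to the rewrite call, on the assumption that regexpu-core only applies its unicode rewriting when it sees the u flag.

```javascript
// Hypothetical helper for the non-confident (dynamic) case.
//   path        - the NewExpression/CallExpression path being replaced
//   t           - Babel's types helper
//   addDefault  - addDefault from @babel/helper-module-imports (injected here)
//   patternNode - AST node of the dynamic pattern expression
//   flags       - statically known flags string, e.g. "iu"
function buildRuntimeRewrite(path, t, addDefault, patternNode, flags) {
  // Injects a default import of regexpu-core and returns its local identifier.
  const rewriteId = addDefault(path, "regexpu-core", { nameHint: "rewritePattern" });
  // Builds: new RegExp(_rewritePattern(<pattern>, "<flags>"), "<flags without u>")
  return t.newExpression(t.identifier("RegExp"), [
    t.callExpression(rewriteId, [patternNode, t.stringLiteral(flags)]),
    t.stringLiteral(flags.replace("u", "")),
  ]);
}
```

Since this calls addDefault once per call site, a real plugin may want to reuse one identifier per file rather than requesting a new import each time.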

Thank you for the pointer in the right direction.


I don't expect Babel itself to do this natively, but I would expect @babel/plugin-transform-unicode-regex, whose source is part of this repository, to do it.

If this isn't added, I'll probably split this plugin out into a separate project that can be installed via npm, when I find time for it.
