Parse scientific notation numbers in input.ts. #567

eternauta1337 · 2019-01-09T14:49:51Z

This PR adds a Remix-like regex to input.ts in order to parse values like "1e2" as "100" or "1.5e20" as "150000000000000000000", along with tests for it in input.test.js. It also modifies a couple of the other tests in that file to make them a bit more accurate.

Note that the introduced MATCH_SCIENTIFIC regex is slightly more complicated than the others, because it involves performing a Big Number operation in the replacement string. It should be looked at carefully to make sure that we don't introduce new bugs with this change.

Also note that this PR is compatible with the spirit of PR #556, which intends to do no parsing whatsoever at the programmatic level in encodeCall and to do it only when parsing CLI input, in order to maintain full compatibility with encodeCall and web3 contract interaction.
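To make the intended conversion concrete, here is a minimal sketch of expanding a scientific-notation literal into a plain decimal string. It is not the PR's code: the PR performs the conversion with a Big Number operation, whereas this hypothetical `expandScientific` helper uses only string manipulation so that it stays dependency-free, and it only handles non-negative exponents.

```typescript
// Hypothetical helper (an assumption, not the PR's implementation): expands a
// literal such as "1.5e20" into a plain decimal string without using floats.
function expandScientific(literal: string): string {
  const match = /^(-?)(\d+)(?:\.(\d+))?e\+?(\d+)$/i.exec(literal.trim());
  if (!match) {
    throw new Error(`Not a supported scientific-notation literal: ${literal}`);
  }
  const sign = match[1];
  const intPart = match[2];
  const fracPart = match[3] || "";
  const exponent = Number(match[4]);

  // Shift the decimal point `exponent` places to the right.
  const digits = intPart + fracPart;
  const pointIndex = intPart.length + exponent;
  let whole: string;
  let fraction: string;
  if (pointIndex >= digits.length) {
    whole = digits + "0".repeat(pointIndex - digits.length);
    fraction = "";
  } else {
    whole = digits.slice(0, pointIndex);
    fraction = digits.slice(pointIndex);
  }

  // Drop redundant leading zeros (keeping one digit) and trailing fraction zeros.
  whole = whole.replace(/^0+(?=\d)/, "");
  fraction = fraction.replace(/0+$/, "");
  return sign + (fraction ? `${whole}.${fraction}` : whole);
}
```

With this sketch, `expandScientific("1e2")` yields `"100"` and `expandScientific("1.25e1")` yields `"12.5"`; anything that is not a complete literal, such as `"1e2lala"`, is rejected instead of being partially converted.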

spalladino · 2019-01-09T16:01:35Z

packages/cli/src/utils/input.ts

+    args = args.replace(MATCH_HEX, '$2"$3"');
+
+    // replace scientific notation numbers by regular numbers
+    const MATCH_SCIENTIFIC = /(^|([,[]\s*))(\s*[-]?\d+(\.\d+)?e(\+)?\d+\s*)/g;


Aren't we missing an end-of-string (or ($|([,\]])) anchor in this regex?

I'm not sure I'd call (^|([,[]\s*)) start of string anchors, they're more like capturing groups. Remember that in this case we are parsing the whole input, not just a single value.

Despite that, if we are missing an "end of string capturing group", then we would need to add them to the other regex-es, since none of them use them.

> Despite that, if we are missing an "end of string capturing group", then we would need to add them to the other regex-es, since none of them use them.

Yep, exactly what I was thinking about.

Interesting... if we do add the end of string capturing groups in all the regex-es, it would capture ending commas and ]'s. Take a look: https://regex101.com/r/oES6aY/1/
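The effect described above can be reproduced with a small sketch. The two patterns below are simplified, hypothetical integer-only regexes (not the ones in input.ts): one consumes the trailing delimiter as a capturing group, the other only checks for it with a lookahead.

```typescript
// Simplified, hypothetical patterns for integers only (not the PR's regexes).
// CONSUMING swallows the trailing "," or "]" as part of the match.
const CONSUMING = /(^|[,\[])(\d+)($|[,\]])/g;
// LOOKAHEAD only asserts that a delimiter follows, without consuming it.
const LOOKAHEAD = /(^|[,\[])(\d+)(?=$|[,\]])/g;

const sample = "[1,2,3]";

// The "," after 1 is consumed, so it is no longer available as the leading
// delimiter of 2, and 2 is skipped.
const withConsuming = sample.replace(CONSUMING, '$1"$2"$3'); // ["1",2,"3"]

// The delimiter stays in the input, so every element is matched.
const withLookahead = sample.replace(LOOKAHEAD, '$1"$2"'); // ["1","2","3"]

console.log(withConsuming, withLookahead);
```

Because a global regex resumes scanning right after the previous match, any delimiter that one match consumes cannot also serve as the opening delimiter of the next value.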

packages/cli/test/utils/input.test.js

eternauta1337 · 2019-02-01T10:29:59Z

After careful consideration of @spalladino's suggestion to use end-of-string anchors in the scientific notation regex (which was awesome btw 🤜 🤛), I discovered that the original regex-es actually contain a few bugs as well, because they lack such anchors.

For example, the MATCH_HEX regex would convert an address-like string such as 0x39af68cF04Abb0eF8f9d8191E1blalala to "0x39af68cF04Abb0eF8f9d8191E1b"lalala (notice the incorrect placement of the quote " characters). This is because this regex also needs an end of string anchor.

Another bug was that negative numbers would not be wrapped by quotes. So, something like 20 would be transformed to "20" but something like -20 would not be transformed to "-20".

Not using an end of string anchor on scientific notation-ish entries like 1e2lala would produce something like 100lala.

To address these problems, the new code contains the following changes:

- All regex-es now contain start and end of string anchors.
- The start and end of string anchors are now lookaheads/lookbehinds, which simplify the post-processing of the matches because comma , characters are no longer part of the match.
- The regex-es are now dynamically built with reusable components using new RegExp(...), which makes the code easier to understand, and safer.
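As an illustration of the three changes listed above, here is a minimal sketch of regexes assembled from reusable lookbehind/lookahead components with `new RegExp(...)`. All names (`START`, `END`, `anchored`, `MATCH_NUMBER`, `MATCH_ADDRESS`, `quoteArgs`) and the exact patterns are assumptions made for this example; the PR's actual components may differ. It assumes a runtime with regex lookbehind support.

```typescript
// Hypothetical reusable components (assumptions, not the PR's identifiers).
// START: the value is preceded by the start of input, "," or "[".
const START = "(?<=^|[,\\[]\\s*)";
// END: the value is followed by the end of input, "," or "]".
const END = "(?=\\s*(?:$|[,\\]]))";

// Wraps a value pattern with both anchors. Since the anchors are lookarounds,
// the delimiters never become part of the match.
function anchored(body: string): RegExp {
  return new RegExp(`${START}${body}${END}`, "g");
}

const MATCH_NUMBER = anchored("-?\\d+");
const MATCH_ADDRESS = anchored("0x[0-9a-fA-F]{40}");

// Wraps every complete number or address in double quotes. "$&" stands for
// the whole matched text, so no capture-group bookkeeping is needed.
function quoteArgs(args: string): string {
  return args.replace(MATCH_ADDRESS, '"$&"').replace(MATCH_NUMBER, '"$&"');
}
```

In this sketch `quoteArgs("-20")` returns `"-20"` with the quotes included, `quoteArgs("[1, -2]")` returns `["1", "-2"]`, and an input with trailing garbage such as `20lala` is left untouched rather than being partially quoted.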

Despite these fixes, I believe that using regular expressions to parse input is rather fragile, and I recommend that we switch to manually parsing the input with a recursive function. If you agree, let me know and I can create a new issue, which could be addressed later on.
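To give an idea of what manual recursive parsing could look like, here is a minimal sketch. It is not code from this PR or from the closed encodeCall PRs; the `Arg` type and `parseArgs` function are hypothetical, and the sketch only splits values and nested arrays, returning every leaf as a string without validating it against any ABI type.

```typescript
// Hypothetical types and names, for illustration only.
type Arg = string | Arg[];

// Parses a comma-separated argument list with optional nested [ ] arrays.
function parseArgs(input: string): Arg[] {
  let pos = 0;

  function skipSpaces(): void {
    while (pos < input.length && input[pos] === " ") pos++;
  }

  // Reads one value up to the next delimiter and strips surrounding quotes.
  function parseLeaf(): string {
    const start = pos;
    while (pos < input.length && !",[]".includes(input[pos])) pos++;
    const raw = input.slice(start, pos).trim();
    if (raw === "") throw new Error(`Empty value at position ${start}`);
    return raw.replace(/^"(.*)"$/, "$1");
  }

  // Reads items until `closer` (or the end of input when closer is null),
  // recursing whenever a nested "[" is found.
  function parseList(closer: string | null): Arg[] {
    const items: Arg[] = [];
    skipSpaces();
    if (closer !== null && input[pos] === closer) {
      pos++;
      return items;
    }
    for (;;) {
      skipSpaces();
      if (input[pos] === "[") {
        pos++;
        items.push(parseList("]"));
      } else {
        items.push(parseLeaf());
      }
      skipSpaces();
      if (pos >= input.length) {
        if (closer !== null) throw new Error("Unclosed '['");
        return items;
      }
      const ch = input[pos++];
      if (ch === ",") continue;
      if (ch === closer) return items;
      throw new Error(`Unexpected '${ch}' at position ${pos - 1}`);
    }
  }

  return input.trim() === "" ? [] : parseList(null);
}
```

For example, `parseArgs("1, [2, 3], -4")` returns the nested structure `["1", ["2", "3"], "-4"]`, while unbalanced input such as `[1, 2` raises an error instead of being silently rewritten.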

spalladino

Looks good Ale, though I agree that this kind of regex matching could be fragile. I can think of two alternatives:

1. Be much more strict, and require everything to be properly enclosed in quotes (ie be valid JSON). This is more painful when accepting user input, but is a lot safer.
2. Generate a parser from a full grammar to handle the input, which should be safer and accommodate more cases, but is also a lot more work.
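A minimal sketch of the strict, JSON-only alternative could look like the following. The `parseStrictArgs` name is hypothetical, and the sketch simply wraps the raw argument string in brackets and delegates to `JSON.parse`.

```typescript
// Hypothetical helper: accepts only input that is valid JSON once wrapped
// in [ ], so unquoted values such as 0x... addresses are rejected outright.
function parseStrictArgs(args: string): unknown[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(`[${args}]`);
  } catch (error) {
    throw new Error(
      `Arguments must be valid JSON values: ${(error as Error).message}`
    );
  }
  // Wrapping in [ ] guarantees that a successful parse yields an array.
  return parsed as unknown[];
}
```

Quoted values such as `"20", ["0xabc", "-5"]` are preserved exactly as strings, while `20lala` is rejected. Bare numeric literals are still valid JSON, but large integers can lose precision as JavaScript numbers, so a stricter variant might also reject any non-string leaf.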

spalladino · 2019-02-01T14:23:10Z

packages/lib/package.json

@@ -69,7 +69,7 @@
    "openzeppelin-solidity": "~1.10.0",
    "semver": "^5.5.1",
    "truffle-flattener": "^1.2.8",
-    "web3": "^1.0.0-beta.37"
+    "web3": "1.0.0-beta.37"


This change is critical at the moment. Mind moving it to a separate PR, and merging it immediately?

Yes, apparently web3 1.x beta > 37 and < 41 breaks CI tests with the "websocket" sub-dependency of "web3-providers-ws".

It's already pinned in master from one of my other PRs: https://github.com/zeppelinos/zos/blob/master/packages/lib/package.json#L72

spalladino · 2019-02-01T14:23:42Z

packages/lib/package.json

@@ -8,7 +8,7 @@
  "scripts": {
    "test": "scripts/test.sh",
    "prepublishOnly": "echo 'Removing mock contracts...' && grep -hoP '^\\s*contract \\K(\\w+)' contracts/mocks/*.sol | sort | uniq | xargs -t -I% rm build/contracts/%.json",
-    "compile-contracts": "rm -rf build/contracts && truffle compile",
+    "compile-contracts": "rm -rf build/contracts && npx truffle compile",


AFAIK you don't need npx for npm scripts, since ./node_modules/.bin is prepended to PATH.

eternauta1337 · 2019-02-02T13:54:11Z

@spalladino

> Be much more strict, and require everything to be properly enclosed in quotes (ie be valid JSON). This is more painful when accepting user input, but is a lot safer.
>
> Generating a parser from a full grammar to handle the input, which should be safer and accommodate for more cases, but is also a lot more of work.

The third option would be to merge this now (since it does restore the scientific notation feature) and open a new issue to revisit the topic with a safer approach (option 1 or 2). I would go with the second option; almost 90% of the code from one of my closed encodeCall PRs could be reused.

spalladino · 2019-02-04T16:25:22Z

Agree Ale, merging now! And please add a new issue to review the parsing of arguments, including both options. I think Remix is currently going with (1), but it'd be interesting to have (2).

Parse scientific notation numbers in input.ts.

648da2a

eternauta1337 added the status:to-review Awaiting review label Jan 9, 2019

eternauta1337 added 2 commits January 9, 2019 12:12

Use npx for truffle in lib's scripts (for CI tests to pass).

12d8e6a

Account for negative scientific numbers.

10cb862

spalladino reviewed Jan 9, 2019


facuspagnuolo self-assigned this Jan 9, 2019

facuspagnuolo reviewed Jan 9, 2019


facuspagnuolo removed status:to-review Awaiting review labels Jan 9, 2019

eternauta1337 added 2 commits January 31, 2019 15:06

Added bignumber.js dependency to cli.

539efa0

Improved regex matching in input.ts.

e38ab69

eternauta1337 added 2 commits February 1, 2019 07:30

Minor change in how dynamic regex-es are built in input.ts.

f06b426

Merge with master and update lerna bootstrapping.

4e10922

eternauta1337 force-pushed the feature/scientific-notation-at-input branch from d377f5a to 475b229 Compare February 1, 2019 13:04

Fix web3 to beta 37.

475b229

eternauta1337 added the status:to-review Awaiting review label Feb 1, 2019

spalladino approved these changes Feb 1, 2019


facuspagnuolo assigned spalladino and unassigned facuspagnuolo Feb 1, 2019

Remove npx usage in package.json files.

2e00e0d

spalladino merged commit 505399b into master Feb 4, 2019

spalladino deleted the feature/scientific-notation-at-input branch February 4, 2019 16:24

eternauta1337 mentioned this pull request Feb 4, 2019

Safer parsing of input #626

Closed
