Skip to content
This repository has been archived by the owner on Jan 24, 2022. It is now read-only.

Parse scientific notation numbers in input.ts. #567

Merged
merged 9 commits into from
Feb 4, 2019

Conversation

eternauta1337
Copy link
Contributor

Fix #559

This PR adds a remix-like regex at input.ts in order to parse values like "1e2" to "100" or "1.5e20" to "150000000000000000000", and tests for it at input.test.js. It also modifies a couple of the other tests in this file to make them a bit more accurate.

Note that the introduced MATCH_SCIENTIFIC regex is slightly more complicated than the others, because it involves performing a Big Number operation in the replacement string. It should be looked at carefully to make sure that we don't introduce new bugs with this change.

Also note that this PR is compatible with the spirit of PR #556, that intends to do no parsing whatsoever at the programatic level in encodeCall and only do it when parsing CLI input, in order to maintain full compatibility with encodeCall and web3 contract interaction.

@eternauta1337 eternauta1337 added the status:to-review Awaiting review label Jan 9, 2019
args = args.replace(MATCH_HEX, '$2"$3"');

// replace scientific notation numbers by regular numbers
const MATCH_SCIENTIFIC = /(^|([,[]\s*))(\s*[-]?\d+(\.\d+)?e(\+)?\d+\s*)/g;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we missing an end-of-string (or ($|([,\]])) anchor in this regex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'd call (^|([,[]\s*)) start of string anchors, they're more like capturing groups. Remember that in this case we are parsing the whole input, not just a single value.

Despite that, if we are missing an "end of string capturing group", then we would need to add them to the other regex-es, since none of them use them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite that, if we are missing an "end of string capturing group", then we would need to add them to the other regex-es, since none of them use them.
Yep, exactly what I was thinking about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... if we do add the end of string capturing groups in all the regex-es, it would capture ending commas and ]'s. Take a look: https://regex101.com/r/oES6aY/1/

@facuspagnuolo facuspagnuolo self-assigned this Jan 9, 2019
@facuspagnuolo facuspagnuolo removed status:to-review Awaiting review labels Jan 9, 2019
@eternauta1337
Copy link
Contributor Author

eternauta1337 commented Feb 1, 2019

After careful consideration of @spalladino 's suggestion on using end-of-string anchors in the scientific notation regex, (which was awesome btw 🤜 🤛 ), I discovered that the original regex-es actually contain a few bugs because of this as well.

For example, the MATCH_HEX regex would convert and address-like string such as 0x39af68cF04Abb0eF8f9d8191E1blalala to "0x39af68cF04Abb0eF8f9d8191E1b"lalala (notice the incorrect placement of the quote " characters). This is because this regex also needs an end of string anchor.

Another bug was that negative numbers would not be wrapped by quotes. So, something like 20 would be transformed to "20" but something like -20 would not be transformed to "-20".

Not using an end of string anchor on scientific notation-ish entries like 1e2lala would produce something like 100lala.

To address these problems, the new code contains the following changes:

  • All regex-es now contain start and end of string anchors.
  • The start and end of string anchors are now lookaheads/lookbehinds, which simplify the post-processing of the matches because comma , characters are no longer part of the match.
  • The regex-es are now dynamically built with reusable components using new RegExp(...), which makes the code easier to understand, and safer.

Despite these fixes, I believe that using regular expressions to parse input is rather fragile, and I recommend that we switch to manually parsing the input with a recursive function. If you agree, let me know and I can create a new issue, which could be addressed later on.

@eternauta1337 eternauta1337 force-pushed the feature/scientific-notation-at-input branch from d377f5a to 475b229 Compare February 1, 2019 13:04
@eternauta1337 eternauta1337 added the status:to-review Awaiting review label Feb 1, 2019
Copy link
Contributor

@spalladino spalladino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good Ale, though I agree that this kind of regex matching could be fragile. I can think of two alternatives:

  • Be much more strict, and require everything to be properly enclosed in quotes (ie be valid JSON). This is more painful when accepting user input, but is a lot safer.
  • Generating a parser from a full grammar to handle the input, which should be safer and accommodate for more cases, but is also a lot more of work.

@@ -69,7 +69,7 @@
"openzeppelin-solidity": "~1.10.0",
"semver": "^5.5.1",
"truffle-flattener": "^1.2.8",
"web3": "^1.0.0-beta.37"
"web3": "1.0.0-beta.37"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is critical at the moment. Mind moving it to a separate PR, and merging it immediately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, apparently web3 1.x beta > 37 and < 41 breaks CI tests with the "websocket" sub-dependency of "web3-providers-ws".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already pinned in master from one of my other PRs: https://github.com/zeppelinos/zos/blob/master/packages/lib/package.json#L72

@@ -8,7 +8,7 @@
"scripts": {
"test": "scripts/test.sh",
"prepublishOnly": "echo 'Removing mock contracts...' && grep -hoP '^\\s*contract \\K(\\w+)' contracts/mocks/*.sol | sort | uniq | xargs -t -I% rm build/contracts/%.json",
"compile-contracts": "rm -rf build/contracts && truffle compile",
"compile-contracts": "rm -rf build/contracts && npx truffle compile",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK you don't need npx for npm scripts, since ./node_modules/bin is prepended to PATH.

@eternauta1337
Copy link
Contributor Author

eternauta1337 commented Feb 2, 2019

@spalladino

  1. Be much more strict, and require everything to be properly enclosed in quotes (ie be valid JSON). This is more painful when accepting user input, but is a lot safer.
  2. Generating a parser from a full grammar to handle the input, which should be safer and accommodate for more cases, but is also a lot more of work.

The third option would be to merge this now (since it does restore the scientific notation feature) and open a new issue to revisit the topic with a safer approach (option 1 or 2). I would go with the second option; Almost 90% of the code from one of my closed encodeCall PRs could be reused.

@spalladino spalladino merged commit 505399b into master Feb 4, 2019
@spalladino spalladino deleted the feature/scientific-notation-at-input branch February 4, 2019 16:24
@spalladino
Copy link
Contributor

Agree Ale, merging now! And please add a new issue to review the parsing of arguments, including both options. I think Remix is currently going with (1), but it'd be interesting to have (2).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status:to-review Awaiting review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants