Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further enhancements to the Bech32 library. #277

Merged

Conversation

jonathanknowles
Copy link
Member

@jonathanknowles jonathanknowles commented May 17, 2019

Issue Number

Issue #238

Overview

This PR:

  1. Implements error location detection for the Bech32 encoding, based on the JavaScript reference implementation.

  2. Provides a set of property tests to verify that corrupted Bech32 strings are correctly rejected by the decode function. Corruption is simulated in various ways:

    • deleting characters
    • inserting characters
    • swapping characters
    • mutating characters

Caveats

  1. The error location detection is best effort:

    • In cases where it is possible to unambiguously identify the locations of erroneous characters, the decoder will return the StringToDecodeContainsInvalidChars error with a suitable list of locations.
    • In cases where it's not possible to unambiguously identify the locations of erroneous characters, the decoder will return the StringToDecodeContainsInvalidChars with the empty list.
  2. Currently, there is no automated test to confirm that the locateErrors and syndrome functions behave identically to the equivalent functions in the reference JavaScript implementation. ⚠️ It is possible that these implementations differ (and this is something that we should pay attention to).

Nevertheless, in the test suite, we verify that if the position of an invalid character is detected, then it is detected at the correct location.

@jonathanknowles jonathanknowles self-assigned this May 17, 2019
Copy link
Contributor

@rvl rvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is nice and readable (except for locateErrors).

Error Location Detection
-------------------------------------------------------------------------------}

gf_1024_exp :: Array Int Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these tables? Are they something like p^i where i is the i'th entry in the table and p is some special number?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - it would be nice to have a comment/reference/description of these tables somewhere if there is a description available. I see the source is from https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js but adding few words would help maintain it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tables are a direct translation of the tables found in the reference implementation here:

https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js#L4
https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js#L77

As for what they do: they seem to be a pair of lookup tables that store the values of a pair of pre-computed functions (an exponent and a logarithm). As to the exact nature of these functions, I currently do not know the answer. We would almost certainly have to ask the creator of the reference implementation. (https://github.com/sipa)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would almost certainly have to ask the creator of the reference implementation.

I would go for this - if we can't find how tables are generated ourself. It is possible maintainer won't be responsive in which case we would try to figure it long term - but its worth a try to ping him

575, 992, 463, 983, 243, 360, 970, 350, 267, 615, 766, 494, 31, 1009,
452, 710, 552, 128, 612, 600, 275, 322, 193 ]

syndrome :: (Bits a, Num a) => a -> a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this function?

It could also be shortened a little by using foldr xor 0 and factoring the if testBit residue n then s else 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see its a direct translation of https://github.com/sipa/bech32/blob/bdc264f84014c234e908d72026b7b780122be11f/ecc/javascript/bech32_ecc.js#L154 .

As this is translation from js -> haskell I would expect first iteration to have more js-ish style and I am ok with that. With more direct translation as here it is much easier to convince the reviewers that implementation is correct (by correct I mostly mean that it is a dirrect translation of reference implementation). On second or third implementation we can modify this to look more functional or more haskell-ish.

Alternativelly we as reviewers might all want to fully understand underlying mechanisms of beach32 at this point and check it - but I think it is much more feasible to go with first approach at this stage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvl wrote:

What is this function?

This is a direct translation of the corresponding function in the reference JavaScript implementation:
https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js#L154

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be shortened a little by using foldr xor 0 and factoring the if testBit residue n then s else 0.

That's a good idea. I originally wrote this in a style that was as close as possible to the original implementation (for easier verification). If I have time, I'll try to rewrite this in a shorter way.

@akegalj
Copy link
Contributor

akegalj commented May 20, 2019

What is a difference between https://github.com/sipa/bech32/tree/master/ref/haskell and https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js ? Is js implementation more complete ?

Copy link
Contributor

@akegalj akegalj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just skimmed over it and it looks good.

To approve it I would have to go for another run to compare in more details with referance implementation (ping me when it is ready)

| otherwise =
Right $ HumanReadablePart hrp
where
(validPortion, invalidPortion) = BS.break (not . valid) hrp
invalidCharPositions = CharPosition . fst <$>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good candidate for NonEmpty instead of current [] . Additionally this would simplify L104 (and maybe others) as you would be guaranteed to have failure there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good candidate for NonEmpty instead of current [] . Additionally this would simplify L104 (and maybe others) as you would be guaranteed to have failure there

Actually, it's possible for the locateErrors function to return the empty list, if it is not able to reliably determine the locations of errors (even if the string is provably erroneous).

I'm going to add a comment to the locateErrors function to make this clear.

Error Location Detection
-------------------------------------------------------------------------------}

gf_1024_exp :: Array Int Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - it would be nice to have a comment/reference/description of these tables somewhere if there is a description available. I see the source is from https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js but adding few words would help maintain it

575, 992, 463, 983, 243, 360, 970, 350, 267, 615, 766, 494, 31, 1009,
452, 710, 552, 128, 612, 600, 275, 322, 193 ]

syndrome :: (Bits a, Num a) => a -> a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see its a direct translation of https://github.com/sipa/bech32/blob/bdc264f84014c234e908d72026b7b780122be11f/ecc/javascript/bech32_ecc.js#L154 .

As this is translation from js -> haskell I would expect first iteration to have more js-ish style and I am ok with that. With more direct translation as here it is much easier to convince the reviewers that implementation is correct (by correct I mostly mean that it is a dirrect translation of reference implementation). On second or third implementation we can modify this to look more functional or more haskell-ish.

Alternativelly we as reviewers might all want to fully understand underlying mechanisms of beach32 at this point and check it - but I think it is much more feasible to go with first approach at this stage.

@jonathanknowles
Copy link
Member Author

@akegalj wrote:

What is a difference between https://github.com/sipa/bech32/tree/master/ref/haskell and https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js ? Is js implementation more complete ?

The JavaScript implementation is apparently more complete. While the Haskell reference implementation is capable of stating whether (or not) a given string is valid Bech32 (returning a simple Boolean value to reflect validity), the JavaScript reference implementation is also capable of detecting the locations of errors, in certain cases. This information can be used to give the user feedback.

See the following demonstration: http://bitcoin.sipa.be/bech32/demo/demo.html

@piotr-iohk piotr-iohk mentioned this pull request May 20, 2019
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/bech32-error-location-detection branch from 3b268b4 to 02547ec Compare May 20, 2019 08:30
@jonathanknowles jonathanknowles changed the title [WIP] Bech32 Error Location Detection Further enhancements to Bech32 library. May 20, 2019
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/bech32-error-location-detection branch from bfc71b8 to adcbcc8 Compare May 20, 2019 08:53
@jonathanknowles jonathanknowles marked this pull request as ready for review May 20, 2019 08:55
@jonathanknowles jonathanknowles changed the title Further enhancements to Bech32 library. Further enhancements to the Bech32 library. May 20, 2019
@@ -14,18 +14,19 @@
--
-- From an original implementation by Marko Bencun:
--
-- [sipa/bech32](https://github.com/sipa/bech32/tree/bdc264f84014c234e908d72026b7b780122be11f/ref/haskell)

-- [sipa/bech32](https://git.io/fj8FV)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These shortened URLs are an official part of GitHub (and are easier to paste into source code).

Bech32.decode (getValidBech32String v) `shouldBe`
Right (humanReadablePart v, unencodedDataPart v)

describe "Decoding a corrupted string should fail" $ do
Copy link
Contributor

@akegalj akegalj May 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you have a look at these check bits https://github.com/sipa/bech32/blob/master/ecc/javascript/segwit_addr_ecc.js#L27 ? (maybe they are not relevant)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akegalj wrote:

did you have a look at these check bits https://github.com/sipa/bech32/blob/master/ecc/javascript/segwit_addr_ecc.js#L27 ? (maybe they are not relevant)

The Haskell translation of that particular function is at the following location:
lib/bech32/src/Codec/Binary/Bech32/Internal.hs:392

The test for that function is here:
lib/bech32/test/Codec/Binary/Bech32Spec.hs:235

@@ -87,6 +96,116 @@ spec = do
Bech32.encode hrp mempty
`shouldBe` Right (B8.pack "hrp1g9xj8m")

describe "Arbitrary ValidBech32String" $
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to ffi C implementation from https://github.com/sipa/bech32/blob/master/ref/c/segwit_addr.h and run property test comparing address encode/decode and encode/decode against our implementation ?

This way we would have complete test against implementation from another language. We could even use these implementation if there are more performant - but also we could only use these test in order to have complete run when we will be refactoring the implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the most complete reference implementation seems to be the JavaScript one, which also provides error location detection. (The other implementations only provide a simple Boolean check to determine whether a given encoded string is valid.) I agree that a potentially useful side project would be to compare these two implementations. Perhaps it would be possible to use https://github.com/tweag/inline-js to make this easier.

Copy link
Contributor

@akegalj akegalj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

I have partially looked and compared js and haskell implementation. As these bits are only relevant for error report and not actual encoding and decoding mechanism - it doesn't seem super important even if we did make an error in the translation js -> hs (which doesn't seems so from what I have seen)

Tests seem valuable

(there is a posibility to compare our bech32 encoder with other reference implementation out there, like C, if there would be suspicion that something is off. On another hand most of the implementation from https://github.com/sipa/bech32/ use same test vectors and I believe that should be enough)

@KtorZ KtorZ force-pushed the jonathanknowles/bech32-error-location-detection branch 3 times, most recently from 1e22147 to a668f72 Compare May 20, 2019 21:39
@KtorZ
Copy link
Member

KtorZ commented May 20, 2019

bors r+

iohk-bors bot added a commit that referenced this pull request May 20, 2019
277: Further enhancements to the Bech32 library. r=KtorZ a=jonathanknowles

## Issue Number

Issue #238 

## Overview

**This PR:**

1. Implements **error location detection** for the Bech32 encoding, based on the JavaScript [reference implementation](https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js).

2. Provides a set of **property tests** to verify that corrupted Bech32 strings are correctly **rejected** by the `decode` function. Corruption is simulated in various ways:
    * deleting characters
    * inserting characters
    * swapping characters
    * mutating characters

## Caveats

1. The error location detection is **best effort**:
    * In cases where it **is** possible to unambiguously identify the locations of erroneous characters, the decoder will return the `StringToDecodeContainsInvalidChars` error with a suitable list of locations.
    * In cases where it's **not** possible to unambiguously identify the locations of erroneous characters, the decoder will return the `StringToDecodeContainsInvalidChars` with the empty list.

2. Currently, there is no automated test to confirm that the `locateErrors` and `syndrome` functions behave identically to the equivalent functions in the reference JavaScript implementation. ⚠️  **It is possible that these implementations differ** (and this is something that we should pay attention to).

Nevertheless, in the test suite, we verify that **if** the position of an invalid character **is** detected, then it is **detected at the correct location**.

283: Sqlite: add checkpoints and transactions to DBLayer r=KtorZ a=rvl

Relates to issue #154. This PR branch is based on #282.

# Overview

- Implemented saving and loading of transaction history to SQLite
- Implemented saving and loading of wallet checkpoints to SQLite, including the state for sequential scheme address discovery.

# Comments

- I haven't made any decision about what to do with internal functions such as the `Wallet` constructor.
- The SqliteSpec testing is a bit light. However, there should be enough implemented for all the DBSpec tests to work. Also I plan to finish the QSM tests tomorrow which should provide even more coverage.
- Cascading deletes are not working with persistent-sqlite, which is annoying.
- DB indexes are missing on some fields. (Needs some custom SQL, can be fixed later)


Co-authored-by: Jonathan Knowles <jonathan.knowles@iohk.io>
Co-authored-by: KtorZ <matthias.benkort@gmail.com>
Co-authored-by: Rodney Lorrimar <rodney.lorrimar@iohk.io>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented May 20, 2019

Timed out (retrying...)

iohk-bors bot added a commit that referenced this pull request May 21, 2019
277: Further enhancements to the Bech32 library. r=KtorZ a=jonathanknowles

## Issue Number

Issue #238 

## Overview

**This PR:**

1. Implements **error location detection** for the Bech32 encoding, based on the JavaScript [reference implementation](https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js).

2. Provides a set of **property tests** to verify that corrupted Bech32 strings are correctly **rejected** by the `decode` function. Corruption is simulated in various ways:
    * deleting characters
    * inserting characters
    * swapping characters
    * mutating characters

## Caveats

1. The error location detection is **best effort**:
    * In cases where it **is** possible to unambiguously identify the locations of erroneous characters, the decoder will return the `StringToDecodeContainsInvalidChars` error with a suitable list of locations.
    * In cases where it's **not** possible to unambiguously identify the locations of erroneous characters, the decoder will return the `StringToDecodeContainsInvalidChars` with the empty list.

2. Currently, there is no automated test to confirm that the `locateErrors` and `syndrome` functions behave identically to the equivalent functions in the reference JavaScript implementation. ⚠️  **It is possible that these implementations differ** (and this is something that we should pay attention to).

Nevertheless, in the test suite, we verify that **if** the position of an invalid character **is** detected, then it is **detected at the correct location**.

Co-authored-by: Jonathan Knowles <jonathan.knowles@iohk.io>
Co-authored-by: KtorZ <matthias.benkort@gmail.com>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented May 21, 2019

Timed out

The new name is more accurate. The previous name could be taken to imply
that the checksum is separate, which it is not.
This way, we can fail fast before performing any further parsing.
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/bech32-error-location-detection branch from a668f72 to 501f515 Compare May 21, 2019 05:11
@jonathanknowles jonathanknowles merged commit b116caa into master May 21, 2019
@iohk-bors iohk-bors bot deleted the jonathanknowles/bech32-error-location-detection branch May 21, 2019 05:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants