Add new address `Pool addr ix` data type. Simplify `SharedState n k`. #3068

HeinrichApfelmus · 2021-12-20T16:39:09Z

Issue number

Overview

Previous work in epic ADP-1043 introduced delta encodings, DBVars, and an embedding of the wallet state and its delta encodings into a database table. It's time to integrate these tools with the wallet code. To facilitate code review, the integration proceeds in a sequence of refactorings that do not change functionality and pass all unit tests.

In this step, we continue refactoring the address discovery state.

First, we implement a new abstract data type, Pool addr ix, which aids with BIP-44 style address discovery. Its main benefit is that it is polymorphic in the address type and thus focuses on the essence of address discovery with an address gap while avoiding the plethora of type classes related to specific address formats.
Then, we use this data type to simplify the implementation of the address discovery state for mult-signature wallets, SharedState n k.

Details

Pool is an abstract data type, its constructor is not to be exported. Its internal invariants are collected in prop_consistent. To bypass the invariants, use loadUnsafe; the database layer is allowed to do that.
Some of the unit tests pertaining to the address discovery aspects SharedState are partially superseded by the more general unit tests in Cardano.Wallet.Address.PoolSpec.

Comments

The SeqState n k states will be refactored in terms of the new Pool addr ix data type next.
Merge PR Introduce Prologue and Discoveries of an address book #3056 before this one, because this pull request is based on the branch of the former.

sevanspowell · 2021-12-22T02:07:07Z

lib/core/src/Cardano/Wallet/Address/Pool.hs

+
+    -- * Internal
+    , loadUnsafe
+    , prop_sequence


I'm a sucker for modules exporting abstract data types, with functions guaranteed to satisfy certain invariants. I've never considered exporting tests for those invariants from the module itself though. I'm a bit skeptical because these invariants only return True/False and don't give any information as to why they failed (which is useful in test suites), but I suppose you can just decorate these functions in the test suite with extra reproduction/debugging information.

I suppose you can just decorate these functions in the test suite with extra reproduction/debugging information.

Printing the seed and/or a counterexample are probably enough — and only necessary when the properties fail anyway, which they won't. 😉

In any case, the issue here is that as the data type is abstract, I have to export something that can be used to test the internal invariants. I was wondering whether I should export property tests directly, i.e. export prop_sequence etc. with type Property. But that doesn't quite work, because I actually use these properties in the load function, so Bool seemed like the only option left.

Are you familiar with Conal Elliot's "Denotational Design" work?

He talks about these sorts of modules having constructors, operations, and observations.

constructors Something -> Pool (into the algebra)

operations Pool -> Pool (within the algebra)

observations Pool -> Something (out of the algebra)

(paraphrasing)

There is also the concept of a "denotation", an observation from which every other observation can be derived (or you could view it as being the "sum" of every other observation). An abstract type like Pool in essence is it's denotation. So if you can identify and export a denotation to Pool, you shouldn't need to export any of these properties from the module.

That is, with the denotation, we can assert that this Pool is observably no different from another Pool, and hence they are the same.

I'll see if I can provide some better review here.

I think I've managed to write some code to explain my point, give me a bit to get it up if you will.

I started, but didn't get as far as I thought I would today: https://github.com/input-output-hk/cardano-wallet/tree/sevanspowell/adp-1043/suggestions

Essentially:

Identified that Pool doesn't require generator, or any thought of addresses. Addresses can be layered on at a later level. Pool acts as some sort of allocator I guess.

Expose everything needed to assert that Pools are "observationally" equal.

Layer addresses on at Shared AddressDiscovery layer. That way we keep can keep generation there, no need to store the generator anywhere, no need to assert equality of generators.

But I didn't finish working out the finer details.

None of this is important or necessarily "better" just thoughts. Feel free to merge 👍

@sevanspowell Yes, I'm familiar with "denotational design" — I like it, but I'm less fond of its more orthodox interpretations.

The main theme is this: Haskell's data construct allows us to define free algebras, but not subalgebras (= subsets) or quotient algebras (= modulo an equivalence relation / redundancy). In contrast, a "denotation" can be any mathematical object, where we can take subsets and quotients at will. Thus, attempting to model all denotations as algebraic data types ("orthodox") will fail. Thankfully, using abstract (as opposed to algebraic) data types allows us to get sub- and quotient algebras as well, but we need to do some proofs in order to make the jump.

The problem of asserting that values are "observationally" equal is mostly about constructing a quotient algebra: Two values can be different as elements of the free algebra, but have the same denotation in the quotient algebra.

In this case, however, the Pool type is essentially a subalgebra and not a quotient algebra. This means that the main problem is not that we want the info function to be injective, but the point is that it is not surjective. The purpose of properties like prop_fresh is to tell us more about the codomain of the info function. (Also, we need to make sure that operations such as update really stay within the subalgebra.)

Identified that Pool doesn't require generator, or any thought of addresses. Addresses can be layered on at a later level. Pool acts as some sort of allocator I guess.

Alas, the mapping ix -> addr is one of my main reasons for using a Pool. 😅 The purpose of the Pool data type is to aid in address discovery, i.e. to give an inverse mapping addr -> ix. I agree that removing the addresses will get rid of the prop_generator export, but the price is that the original purpose, building the inverse mapping, has been lost and has to be performed elsewhere. More specifically, I'd like to use the Pool data type for the Sequential address discovery as well.

no need to assert equality of generators.

I do see that the Eq implementation of SharedState is a bit awkward. But that doesn't affect our ability to reason about equality of Pool — we can still identify when two functions are equal, we just can't do it in terms of the Eq type class. That's another instance where a concept from the metatheory ("equality") cannot be expressed within the language ("decidable Eq instance"), and where it is a good idea to not attempt to express denotation directly in the language.

I propose that we devote a dev meeting to these topics. 😄

@HeinrichApfelmus I need more resources on this please! I would love to read more about sub and quotient algebras, and get a better understanding of your points here.

I'll attempt a counterpoint from my understanding, which is limited.

The problem of asserting that values are "observationally" equal is mostly about constructing a quotient algebra: Two values can be different as elements of the free algebra, but have the same denotation in the quotient algebra.

I think I understand this point, as it's one of the powerful ideas of denotations. Indeed, in the free algebra, given some geometric function cw that rotates something clockwise, cw (cw (cw (cw x))) is different from x, but in the denotation, they are the same (rotating 90 degrees four times is the same as not rotating at all). This is great if you only wish to observe the final rotation of the object, but not great if you want to ensure that rotations always occur in multiples of four (our pool allocator for example wishes to assert that there is always a gap of 20). And (effectively) our only observation is `lookup :: Ord addr => addr -> Pool addr ix -> Maybe ix', how could we assert that the correct gap is used?? I believe this is your point?

But, I've changed the denotation to be Pool -> (PoolGap, [(ix, AddressState)]), from which we can assert the prop_fresh and prop_gap are true. That is, we have changed the algebra's denotation to the denotation we are interested in. So even though other algebras might not be able to be modelled as an algebraic data type, this one can be I think?

Alas, the mapping ix -> addr is one of my main reasons for using a Pool.

I have not removed the mapping from ix -> addr completely, I've simply split the idea of "gap allocation" from "ix -> addr association". The goal is not to be abstract for the sake of being abstract, but to simplify usage and testing. Though not obvious in my draft, perhaps it would help to think of it differently:

GapAllocator -- Responsible for allocating unused indexes with a given gap, and keeping track of usage states. The idea is the user will associate these indexes with some data of their own. -- Required properties (amongst others) prop_gap prop_fresh -- Constructors new :: (Ord ix, Enum ix) => Gap -> GapAllocator ix -- Operations clear :: (Ord ix, Enum ix) => GapAllocator ix -> GapAllocator ix allocate :: (Ord ix, Enum ix) => ix -> GapAllocator ix -> GapAllocator ix -- Observations size :: GapAllocator ix -> Int lookup :: Ord ix => ix -> GapAllocator ix -> Maybe GapAllocatorState next :: Enum ix => GapAllocator ix -> ix -> Maybe ix -- Denotation info :: GapAllocator ix -> (Gap, [(ix, AllocationState)]) Pool -- Responsible for maintaining mapping between address and index -- Required properties (amongst others) prop_generator -- any functions we need -- Uses GapAllocator internally

i.e. we haven't lost anything from the pool, simply hyper-focused it on address -> index mapping, and allowed a submodule to take care of allocating gaps.

I do see that the Eq implementation of SharedState is a bit awkward. But that doesn't affect our ability to reason about equality of Pool — we can still identify when two functions are equal, we just can't do it in terms of the Eq type class.

I agree. I did experiment with this idea though (which I think you already understood, but worth re-iterating). Instead of passing in the generator to an abstract "Pool" type, create named pools:

SharedAddressPool -- Uses GapAllocator internally lookup :: SharedAddressPool ix addr -> addr -> Maybe ix -- etc. SequentialAddressPool -- Uses GapAllocator internally lookup :: SequentialAddressPool ix addr -> addr -> Maybe ix -- etc.

These named pools would be abstract data types (no constructor exported). By naming the pools, we essentially encode the "generator" in the type, without having to explictly compare the generator function. Two pools are the same if they are both the same data type (i.e. SharedAddressPool and SharedAddressPool), have (observably) the same GapAllocator state, and have the same Map addr ix. However complicated the generator is for any type, it is the same generator for any other value of the same type.

Then if you wanted to recover an abstract interface to them both, curry their lookup functions to get two functions of the same type: addr -> Maybe ix.

Much logic could be shared between the two, as internally they would just be Map addr ix, and they could be tested in a uniform manner if they have uniform interfaces.

Just a different perspective I guess. I'd love to hear more of your thoughts on this.

I'd love to hear more of your thoughts on this.

I'll try to put some thoughts here, but this comment is probably to small to address all aspects. We should chat?

in the free algebra, given some geometric function cw that rotates something clockwise [90 degrees], cw (cw (cw (cw x))) is different from x, but in the denotation, they are the same.

Yes, this is a good example where the denotation is a quotient algebra (namely a quotient where cw . cw . cw . cw = id holds). The question I'm getting at is: How to do we represent a quotient algebra in Haskell, i.e. what would the type of info be? We cannot use a data type with a constructor Cw, because that would give us only a free algebra. (Cave: I have made the example to be about the algebra that denotes cw. But this is different from the algebra that denotes x.)

And (effectively) our only observation is `lookup :: Ord addr => addr -> Pool addr ix -> Maybe ix',

Actually, the field selectors gap and addresses are exported as functions, so these are part of the observation as well.

But, I've changed the denotation to be Pool -> (PoolGap, [(ix, AddressState)]), from which we can assert the prop_fresh and prop_gap are true. That is, we have changed the algebra's denotation to the denotation we are interested in. So even though other algebras might not be able to be modelled as an algebraic data type, this one can be I think?

Not in a strict sense. The thing is that e.g. [(0,Unused),(37,Used)] :: [(ix,AddressState)] is a legitimate element of the result type of info, but there is no valid address pool x for which info x yields this element — info is not surjective. In that sense, info is missing some crucial information, and that information is the "sub" in subalgebra — the address pools are a subset of possible AddressState patterns. The "orthodox" denotational approach works well for quotient algebras — typically by mapping the algebra onto a subalgebra of functions. 😅

I have not removed the mapping from ix -> addr completely, I've simply split the idea of "gap allocation" from "ix -> addr association". The goal is not to be abstract for the sake of being abstract, but to simplify usage and testing.

I appreciate this sentiment, but in this case, I would pragmatically argue that splitting the module is more trouble than it's worth. The reasoning is this: As address discovery algorithm is inherently about the mapping from indices to addresses, we need an abstraction that combines both the association and the gap; then, whether we split this abstraction further or not becomes an implementation detail. However, the current implementation is very short — e.g. 10 lines for the most complex function ensureFresh — and I do not see how splitting it further would improve usability or correctness.

By naming the pools, we essentially encode the "generator" in the type, without having to explictly compare the generator function. […] Much logic could be shared between the two, as internally they would just be Map addr ix, and they could be tested in a uniform manner if they have uniform interfaces.

The previous code used a type named ParentContext which had a similar effect. However, I decided to replace it with the function type ix -> addr; this makes the interface not just uniform, but identical. 😄 The point is that the two pools do exactly the same thing, except with different address generation functions ix -> addr. By separating this out, we can have two correct pools for the price of one.

If we really wanted an Eq instance for address pools, I would advocate for an approach where we explicitly state that we want Eq for functions. The tool for that is defunctionalization: We create a data type GeneratorFun whose constructors correspond to the functions that we use. Then, a function denotation (:wink:) converts the constructor back to the function that it represents:

data GeneratorFun ix addr where Shared :: ScriptTemplate -> … -> GeneratorFun (Index 'Soft …) (KeyFingerprint …) Sequential :: XPub -> GeneratorFun (Index 'Soft …) (KeyFingerprint …) denotation :: GeneratorFun ix addr -> (ix -> addr) denotation … = unsafePaymentKeyFingerprint …

Thanks for forming such a detailed answer, I don't disagree with the current implementation, but I definitely (for my own interest), wanted to see how some of these ideas held up, so thanks for defending yours. I'm not sure I 100% understand yet, but I'll get there 👍

Sorry, I should have approved this PR before the holidays, slipped my mind.

great explanation @HeinrichApfelmus !

lib/core/src/Cardano/Wallet/Address/Pool.hs

paweljakubas · 2021-12-23T12:05:47Z

lib/core/src/Cardano/Wallet/Address/Pool.hs

+size = Map.size . addresses
+
+-- | Given an index, retrieve the next index that is still in the pool.
+next :: Enum ix => Pool addr ix -> ix -> Maybe ix


also, what about adding next ix of Used address, Unused address, or does not matter what address?

in case of next ix of Used...I would also think about having possibility to get ix of first (the smallest possible ix overall), or the first next AFTER the last was used. those two situations are possible - example -> #3068 (comment)

Actually, I was not quite sure whether to include this function at all. 🤔 My original intention was to use it for "light-wallet-style" address discovery, where each address is queried in turn, and discovery stops when the next function returns Nothing.

Extending this function to distinguish Used or Unused status may be useful for generating change addresses. Currently, change addresses use size and gap directly, but I do like the idea of moving all "knowledge about the pool gap" into this module. I would like to keep next as it is now, and revisit this issue in a future PR.

What I am saying is that:

we can use meticulously next here, and we will have state like that:

Used Used Used Unused (gap times)

Here, next will give Just 3
But, we can have people using hardware wallet and then they could restore in cardano-wallet basedon mnemonic. And they can MANUALLY have the following state:

Used Unused Used Used Unused (gap times)

So I just want to know and understand what we are going to get when next is called: (a) Just 1 OR (b) Just 4 ?
So the retrieve the Unused ix with the lowest ix, OR retrieve the first Unused after Used with the highest ix.
You see my point?

Ah. The intention is that next ix does not care about the 'Used' or 'Unused' status at all, and will simply return succ ix most of the time.

This is useful for address discovery in a light client setting, where the discovery process goes like this: Start with ix = 0, query the explorer for the corresponding address, continue with next ix until it returns Nothing.

The next function is not intended to be used to generate change addresses where the Used and Unused status matters.

I will update the documentation accordingly and maybe use a different name. 🤔

lib/core/src/Cardano/Wallet/Address/Pool.hs

sevanspowell · 2021-12-24T01:52:04Z

lib/core/src/Cardano/Wallet/DB/Sqlite/AddressBook.hs

+          = case Shared.ready st of
+            Shared.Pending -> st
+            Shared.Active pool0 ->
+                let pool = AddressPool.loadUnsafe pool0 addrs


Why does the database need this backdoor? Is there a situation where we shouldn't fail if we read a not consistent address pool from the database?

I would also prefer to use AddressPool.load here, but the idea is that the Discoveries s will be paired with Prologue s frequently, and outside the database layer. Then, it's important that the isomorphism s ~ (Prologue s, Discoveries s) is fast. I agree that load should be used near the database layer, but that is for another PR, as well as the issue that I would prefer to not export the constructors for Prologue and Discoveries.

sevanspowell · 2021-12-24T01:54:00Z

lib/core/src/Cardano/Wallet/Address/Pool.hs

+--
+-- Does nothing if the address was not in the pool.
+update :: (Ord addr, Enum ix) => addr -> Pool addr ix -> Pool addr ix
+update addr pool@Pool{addresses} =


In this domain, is there a foreseeable need for an address to be "free'd"? As in, to update from Used -> Unused. Simply asking out of curiosity. I imagine it would make maintaining an address gap impossible?

is there a foreseeable need for an address to be "free'd"?

I'm afraid not, this would be antithetical to the very existence of the Pool data type whose purpose is precisely to maintain prop_gap. 😅

sevanspowell · 2021-12-24T01:57:21Z

thanks @sevanspowell for this remark! Conal Elliot's "Denotational Design" looks very smart, indeed! I will have to think about in the next days. On my side, if we convert to this way of design, then it would be good to spread that throughout the rest of codebase. Also, is it not a perfect topic for the first Fri dev meeting in Jan?

@paweljakubas

Sandy Maguire's Algebra-Driven Design (https://leanpub.com/algebra-driven-design) gives a great introduction to Conal's ideas.

I'd love to explore the consequences of using those ideas in the code base more (and have been trying to here and there), but the risks and limitations are unknown to me. It sounds like @HeinrichApfelmus would know more about the limitations of such a design methodology.

So yeah, great topic for a dev meeting.

This data structure aids with BIP-44 style address discovery that relies on an address gap.

HeinrichApfelmus · 2022-01-05T12:31:34Z

bors try

paweljakubas

LGTM! Thanks @sevanspowell and @HeinrichApfelmus for all comments and discussion!

iohk-bors · 2022-01-05T13:05:56Z

try

Build failed:

buildkite/cardano-wallet

* Use the new address pool data structure from `Cardano.Wallet.Address.Pool` * Remove `ParentContextShared` constructor from sequential state * Merge `SharedStateFields` and `SharedStatePending` types into `SharedSate`.

HeinrichApfelmus · 2022-01-05T17:21:43Z

bors try

iohk-bors · 2022-01-05T19:48:04Z

try

Build succeeded:

HeinrichApfelmus · 2022-01-05T20:23:37Z

bors merge

iohk-bors · 2022-01-05T21:10:42Z

Build succeeded:

3073: Refactor and simplify `SeqState n k` r=HeinrichApfelmus a=HeinrichApfelmus ### Issue number ADP-1308 ### Overview Previous work in epic ADP-1043 introduced delta encodings, DBVars, and an embedding of the wallet state and its delta encodings into a database table. It's time to integrate these tools with the wallet code. To facilitate code review, the integration proceeds in a sequence of refactorings that do not change functionality and pass all unit tests. In this step, we continue refactoring the address discovery state. Here, we refactor and simplify the `SeqState n k` type to use the new abstract data type `Pool addr ix`, which aids with BIP-44 style address discovery. ### Details * The testing module `PoolSpec` now also provides a shrinker `shrinkPool` for use in the old testing module `Cardano.Wallet.Primitive.AddressDiscovery.SequentialSpec`. * The property tests pertaining to the address discovery aspects of `SeqState` are superseded by the more general unit tests in `Cardano.Wallet.Address.PoolSpec`. ### Comments * Merge PR ##3068 before this one, because this pull request is based on the branch of the former. Co-authored-by: Heinrich Apfelmus <heinrich.apfelmus@iohk.io>

3099: Add `WalletState`, a pure model for the entire wallet r=HeinrichApfelmus a=HeinrichApfelmus ### Issue number ADP-1375 ### Overview Previous work in epic ADP-1043 introduced delta encodings, DBVars, and an embedding of the wallet state and its delta encodings into a database table. It's time to integrate these tools with the wallet code. To facilitate code review, the integration proceeds in a sequence of refactorings that do not change functionality and pass all unit tests. In this step, we introduce a data type `WalletState` which represents the entire wallet state — not just the most recent checkpoint, but _all_ checkpoints. ``` data WalletState s = WalletState { prologue :: Prologue s , checkpoints :: Checkpoints (BlockHeader, UTxO, Discoveries s) } ``` The states for the different wallets currently stored in the `walletsVar` DBVar. Eventually, the data type will become the purely functional in-memory representation of all the data associated with a wallet (though perhaps with a different name). The `DBLayer` type will eventually be replaced by a `Store` for values of this type. The introduction of this type has become possible thanks to the previous separation of the address discovery state `s` into `Prologue s` and `Discoveries s` (PRs #3056, #3068, #3073). ### Details * As the queries in the `DBLayer` will more and more become queries on the in-memory cache `walletsVar` instead of queries on the database table, I have begun to add unit tests for the database tables. Here, I have added a property test for `loadPrologue` and `insertPrologue`. Eventually, these separate tests will become a single generic test for a `Store` of `Table`. ### Comments * One of the next steps will be to replace `UTxO` in the above type by its delta encoding `DeltaUTxO`. This will reduce the memory footprint of the in-memory representation. * The `Cardano.Wallet.DB.Model` module actually implements a pure model for the entire wallet state, including TxHistory etc. When extending the type, we can scavenge from there. Co-authored-by: Heinrich Apfelmus <heinrich.apfelmus@iohk.io>

3099: Add `WalletState`, a pure model for the entire wallet r=HeinrichApfelmus a=HeinrichApfelmus ### Issue number ADP-1375 ### Overview Previous work in epic ADP-1043 introduced delta encodings, DBVars, and an embedding of the wallet state and its delta encodings into a database table. It's time to integrate these tools with the wallet code. To facilitate code review, the integration proceeds in a sequence of refactorings that do not change functionality and pass all unit tests. In this step, we introduce a data type `WalletState` which represents the entire wallet state — not just the most recent checkpoint, but _all_ checkpoints. ``` data WalletState s = WalletState { prologue :: Prologue s , checkpoints :: Checkpoints (BlockHeader, UTxO, Discoveries s) } ``` The states for the different wallets currently stored in the `walletsVar` DBVar. Eventually, the data type will become the purely functional in-memory representation of all the data associated with a wallet (though perhaps with a different name). The `DBLayer` type will eventually be replaced by a `Store` for values of this type. The introduction of this type has become possible thanks to the previous separation of the address discovery state `s` into `Prologue s` and `Discoveries s` (PRs #3056, #3068, #3073). ### Details * As the queries in the `DBLayer` will more and more become queries on the in-memory cache `walletsVar` instead of queries on the database table, I have begun to add unit tests for the database tables. Here, I have added a property test for `loadPrologue` and `insertPrologue`. Eventually, these separate tests will become a single generic test for a `Store` of `Table`. ### Comments * One of the next steps will be to replace `UTxO` in the above type by its delta encoding `DeltaUTxO`. This will reduce the memory footprint of the in-memory representation. * The `Cardano.Wallet.DB.Model` module actually implements a pure model for the entire wallet state, including TxHistory etc. When extending the type, we can scavenge from there. Co-authored-by: Heinrich Apfelmus <heinrich.apfelmus@iohk.io> Co-authored-by: IOHK <devops+stack-project@iohk.io>

HeinrichApfelmus force-pushed the HeinrichApfelmus/ADP-1043/address-pool-shared branch from b5e4f91 to 7840d2a Compare December 20, 2021 16:59

HeinrichApfelmus mentioned this pull request Dec 20, 2021

Introduce Prologue and Discoveries of an address book #3056

Merged

sevanspowell reviewed Dec 22, 2021

View reviewed changes

HeinrichApfelmus force-pushed the HeinrichApfelmus/ADP-1043/address-book-iso branch from 2f65a6f to cb103b4 Compare December 22, 2021 11:38

HeinrichApfelmus force-pushed the HeinrichApfelmus/ADP-1043/address-pool-shared branch from 7840d2a to aed3d23 Compare December 22, 2021 11:38

Base automatically changed from HeinrichApfelmus/ADP-1043/address-book-iso to master December 22, 2021 13:13

HeinrichApfelmus requested a review from paweljakubas December 22, 2021 14:48