-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new address Pool addr ix
data type. Simplify SharedState n k
.
#3068
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,240 @@ | ||
{-# LANGUAGE DeriveGeneric #-} | ||
{-# LANGUAGE GADTs #-} | ||
{-# LANGUAGE NamedFieldPuns #-} | ||
{-# LANGUAGE RankNTypes #-} | ||
{-# LANGUAGE ScopedTypeVariables #-} | ||
{-# LANGUAGE TypeFamilies #-} | ||
|
||
-- | | ||
-- Copyright: © 2021 IOHK | ||
-- License: Apache-2.0 | ||
-- | ||
-- An address pool caches a collection of addresses. | ||
-- The purpose of this data structure is to aid in BIP-44 style | ||
-- address discovery with an address gap. | ||
module Cardano.Wallet.Address.Pool | ||
( Pool | ||
, generator | ||
, addresses | ||
, gap | ||
, lookup | ||
, size | ||
, successor | ||
, new | ||
, load | ||
, update | ||
, clear | ||
|
||
-- * Internal | ||
, loadUnsafe | ||
, prop_sequence | ||
, prop_gap | ||
, prop_fresh | ||
, prop_generator | ||
, prop_consistent | ||
) | ||
where | ||
|
||
import Prelude hiding | ||
( last, lookup ) | ||
|
||
import Cardano.Wallet.Primitive.Types.Address | ||
( AddressState (..) ) | ||
import Control.DeepSeq | ||
( NFData ) | ||
import Data.Map.Strict | ||
( Map ) | ||
import Data.Ord | ||
( Down (..) ) | ||
import Fmt | ||
( Buildable (..) ) | ||
import GHC.Generics | ||
( Generic ) | ||
|
||
{- HLINT ignore "Avoid restricted qualification" -} | ||
import qualified Data.List as List | ||
import qualified Data.Map.Strict as Map | ||
|
||
{------------------------------------------------------------------------------- | ||
Address Pool, abstract data type | ||
-------------------------------------------------------------------------------} | ||
-- | An address pool caches a collection of addresses (type @addr@) | ||
-- which are derived from a numeric index (type @ix@). | ||
data Pool addr ix = Pool | ||
{ generator :: ix -> addr | ||
-- ^ Mapping from a numeric index to its corresponding address. | ||
-- | ||
-- This mapping is supposed to be (practically) a one-way function: | ||
-- Given an 'addr', it is impossible to compute the preimage | ||
-- 'ix' in practice. | ||
-- The purpose of the 'Pool' data structure is to help inverting | ||
-- this function regardless. The idea is that addresses | ||
-- with small indices @0,1,…@ are 'Used' before addresses with larger | ||
-- indices; specifically, only less than 'gap' many addresses in sequence | ||
-- may be 'Unused' before the next 'Used' address. | ||
-- This usage scheme restricts the search space considerably | ||
-- and allows us to practically invert the 'generator' function. | ||
, gap :: Int | ||
-- ^ The pool gap determines how 'Used' and 'Unused' | ||
HeinrichApfelmus marked this conversation as resolved.
Show resolved
Hide resolved
|
||
-- have to be distributed. | ||
-- See 'prop_gap' and 'prop_fresh'. | ||
, addresses :: Map addr (ix, AddressState) | ||
-- ^ Partial, cached inverse of the 'generator'. | ||
-- This map contains all cached addresses @addr@, | ||
-- their corresponding indices @ix@, | ||
-- and whether they are 'Used' or 'Unused'. | ||
-- See 'prop_sequence'. | ||
} deriving (Generic) | ||
|
||
instance (NFData addr, NFData ix) => NFData (Pool addr ix) | ||
|
||
-- | Internal invariant: | ||
-- The indices of the addresses in a pool form a finite | ||
-- sequence beginning with 'fromEnum'@ 0@. | ||
prop_sequence :: (Ord ix, Enum ix) => Pool addr ix -> Bool | ||
prop_sequence Pool{addresses} = | ||
indices `List.isPrefixOf` [toEnum 0..] | ||
where | ||
indices = List.sort $ map fst $ Map.elems addresses | ||
|
||
-- | Internal invariant: | ||
-- If we order the 'addresses' by their indices, | ||
-- then there are always /less than/ 'gap' many 'Unused' | ||
HeinrichApfelmus marked this conversation as resolved.
Show resolved
Hide resolved
|
||
-- addresses between two consecutive 'Used' addresses, | ||
-- or before the first 'Used' address. | ||
prop_gap :: Ord ix => Pool addr ix -> Bool | ||
prop_gap Pool{gap,addresses} | ||
= all (< gap) . consecutiveUnused . List.group $ statuses | ||
where | ||
consecutiveUnused ((Used:_):xs) = consecutiveUnused xs | ||
consecutiveUnused (x@(Unused:_):(Used:_):xs) = | ||
length x : consecutiveUnused xs | ||
consecutiveUnused _ = [] | ||
|
||
statuses = map snd $ List.sortOn fst $ Map.elems addresses | ||
|
||
-- | Internal invariant: | ||
-- If we order the 'addresses' by their indices, | ||
-- there are exactly 'gap' many 'Unused' addresses after the last | ||
-- 'Used' address. | ||
prop_fresh :: Ord ix => Pool addr ix -> Bool | ||
HeinrichApfelmus marked this conversation as resolved.
Show resolved
Hide resolved
|
||
prop_fresh Pool{gap,addresses} = | ||
takeWhile (== Unused) end == replicate gap Unused | ||
where | ||
end = map snd $ List.sortOn (Down . fst) $ Map.elems addresses | ||
|
||
-- | Internal invariant: | ||
-- All 'addresses' in the pool have been generated from their index | ||
-- via the pool 'generator'. | ||
prop_generator :: Eq addr => Pool addr ix -> Bool | ||
HeinrichApfelmus marked this conversation as resolved.
Show resolved
Hide resolved
|
||
prop_generator Pool{generator,addresses} = | ||
and $ Map.mapWithKey isGenerated addresses | ||
where | ||
isGenerated addr (ix,_) = generator ix == addr | ||
|
||
-- | Internal invariant: The pool satisfies all invariants above. | ||
prop_consistent :: (Ord ix, Enum ix, Eq addr) => Pool addr ix -> Bool | ||
prop_consistent p = | ||
all ($ p) [prop_sequence, prop_gap, prop_fresh, prop_generator] | ||
|
||
{------------------------------------------------------------------------------- | ||
Pretty printing | ||
-------------------------------------------------------------------------------} | ||
instance Buildable (Pool addr ix) where | ||
build pool = "AddressPool " | ||
<> "{ " <> build (size pool) <> " addresses" | ||
<> ", gap = " <> build (gap pool) | ||
<> "}" | ||
|
||
instance (Show addr, Show ix) => Show (Pool addr ix) where | ||
show pool = "AddressPool" | ||
<> "{ generator = <<function>>" | ||
<> ", gap = " <> show (gap pool) | ||
<> ", addresses = " <> show (addresses pool) | ||
<> "}" | ||
|
||
{------------------------------------------------------------------------------- | ||
Address Pool, operations | ||
-------------------------------------------------------------------------------} | ||
-- | Create a new address pool. | ||
new :: (Ord addr, Enum ix) => (ix -> addr) -> Int -> Pool addr ix | ||
new generator gap | ||
= ensureFresh (toEnum 0) $ Pool{ generator, gap, addresses = Map.empty } | ||
HeinrichApfelmus marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
-- | Replace the collection of addresses in a pool, | ||
-- but only if this collection satisfies the necessary invariants | ||
-- such as 'prop_sequence' etc. | ||
load | ||
:: (Ord addr, Ord ix, Enum ix) | ||
=> Pool addr ix -> Map addr (ix,AddressState) -> Maybe (Pool addr ix) | ||
load pool0 addrs = if prop_consistent pool then Just pool else Nothing | ||
where pool = loadUnsafe pool0 addrs | ||
|
||
-- | Replace the collection of addresses in a pool, | ||
-- but skips checking the invariants. | ||
loadUnsafe :: Pool addr ix -> Map addr (ix,AddressState) -> Pool addr ix | ||
loadUnsafe pool addrs = pool{ addresses = addrs } | ||
|
||
-- | Remove all previously discovered addresses, | ||
-- i.e. create a new pool with the same 'generator' and 'gap' as the old pool. | ||
clear :: (Ord addr, Enum ix) => Pool addr ix -> Pool addr ix | ||
clear Pool{generator,gap} = new generator gap | ||
|
||
-- | Look up an address in the pool. | ||
lookup :: Ord addr => addr -> Pool addr ix -> Maybe ix | ||
lookup addr Pool{addresses} = fst <$> Map.lookup addr addresses | ||
|
||
-- | Number of addresses cached in the pool. | ||
size :: Pool addr ix -> Int | ||
HeinrichApfelmus marked this conversation as resolved.
Show resolved
Hide resolved
|
||
size = Map.size . addresses | ||
|
||
-- | Given an index @ix@, return the enumerated successor @Just (succ ix)@ | ||
-- as long as the address corresponding to this successor is still | ||
-- in the pool. | ||
-- | ||
-- This function is useful for address discovery in a light client setting, | ||
-- where the discovery procedure is: | ||
-- Start with index @ix = 0@, query the corresponding address in an explorer, | ||
-- @update@ address pool and repeat with @successor ix@ until the latter | ||
-- returns 'Nothing'. According to the BIP-44 standard, | ||
-- the account may not contain any other addresses than the ones discovered. | ||
-- | ||
-- This function is not useful for generating change addresses, | ||
-- as it does not take 'Used' or 'Unused' status into account. | ||
successor :: Enum ix => Pool addr ix -> ix -> Maybe ix | ||
successor Pool{addresses} ix = let jx = succ ix in | ||
if fromEnum jx >= Map.size addresses then Nothing else Just jx | ||
|
||
-- | Update an address to the 'Used' status | ||
-- and create new 'Unused' addresses in order to satisfy 'prop_fresh'. | ||
-- | ||
-- Does nothing if the address was not in the pool. | ||
update :: (Ord addr, Enum ix) => addr -> Pool addr ix -> Pool addr ix | ||
HeinrichApfelmus marked this conversation as resolved.
Show resolved
Hide resolved
|
||
update addr pool@Pool{addresses} = | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this domain, is there a foreseeable need for an address to be "free'd"? As in, to update from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm afraid not, this would be antithetical to the very existence of the |
||
case Map.lookup addr addresses of | ||
Nothing -> pool | ||
Just (ix,_) -> ensureFresh (succ ix) $ pool | ||
{ addresses = Map.adjust (\(i,_) -> (i, Used)) addr addresses } | ||
|
||
-- | Create additional 'Unused' addresses from larger indices | ||
-- in order to satisfy 'prop_fresh' again. | ||
-- | ||
-- Preconditions: | ||
-- | ||
-- * The index @ix@ satisfies: | ||
-- | ||
-- either @ix = fromEnum 0@ | ||
-- or @ix = succ jx@ and the index @jx@ is a 'Used' address. | ||
-- | ||
-- * All addresses with index @ix@ or larger are 'Unused'. | ||
ensureFresh :: (Ord addr, Enum ix) => ix -> Pool addr ix -> Pool addr ix | ||
ensureFresh ix pool@Pool{generator,gap,addresses} | ||
= pool { addresses = Map.union addresses nexts } | ||
where | ||
fresh = toEnum $ Map.size addresses -- first index that is not in the pool | ||
nexts = Map.fromList | ||
[ (generator i, (i, Unused)) | i <- [fresh .. to] ] | ||
where | ||
to = toEnum $ fromEnum ix + fromIntegral gap - 1 | ||
-- example: | ||
-- ix = 0 && fresh = 0 && gap = 20 `implies` [fresh .. to] = [0..19] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a sucker for modules exporting abstract data types, with functions guaranteed to satisfy certain invariants. I've never considered exporting tests for those invariants from the module itself though. I'm a bit skeptical because these invariants only return True/False and don't give any information as to why they failed (which is useful in test suites), but I suppose you can just decorate these functions in the test suite with extra reproduction/debugging information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Printing the seed and/or a counterexample are probably enough — and only necessary when the properties fail anyway, which they won't. 😉
In any case, the issue here is that as the data type is abstract, I have to export something that can be used to test the internal invariants. I was wondering whether I should export property tests directly, i.e. export
prop_sequence
etc. with typeProperty
. But that doesn't quite work, because I actually use these properties in theload
function, soBool
seemed like the only option left.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you familiar with Conal Elliot's "Denotational Design" work?
He talks about these sorts of modules having constructors, operations, and observations.
Something -> Pool
(into the algebra)Pool -> Pool
(within the algebra)Pool -> Something
(out of the algebra)(paraphrasing)
There is also the concept of a "denotation", an observation from which every other observation can be derived (or you could view it as being the "sum" of every other observation). An abstract type like
Pool
in essence is it's denotation. So if you can identify and export a denotation toPool
, you shouldn't need to export any of these properties from the module.That is, with the denotation, we can assert that this Pool is observably no different from another Pool, and hence they are the same.
I'll see if I can provide some better review here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've managed to write some code to explain my point, give me a bit to get it up if you will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started, but didn't get as far as I thought I would today: https://github.com/input-output-hk/cardano-wallet/tree/sevanspowell/adp-1043/suggestions
Essentially:
But I didn't finish working out the finer details.
None of this is important or necessarily "better" just thoughts. Feel free to merge 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sevanspowell Yes, I'm familiar with "denotational design" — I like it, but I'm less fond of its more orthodox interpretations.
The main theme is this: Haskell's
data
construct allows us to define free algebras, but not subalgebras (= subsets) or quotient algebras (= modulo an equivalence relation / redundancy). In contrast, a "denotation" can be any mathematical object, where we can take subsets and quotients at will. Thus, attempting to model all denotations as algebraic data types ("orthodox") will fail. Thankfully, using abstract (as opposed to algebraic) data types allows us to get sub- and quotient algebras as well, but we need to do some proofs in order to make the jump.The problem of asserting that values are "observationally" equal is mostly about constructing a quotient algebra: Two values can be different as elements of the free algebra, but have the same denotation in the quotient algebra.
In this case, however, the
Pool
type is essentially a subalgebra and not a quotient algebra. This means that the main problem is not that we want theinfo
function to be injective, but the point is that it is not surjective. The purpose of properties likeprop_fresh
is to tell us more about the codomain of theinfo
function. (Also, we need to make sure that operations such asupdate
really stay within the subalgebra.)Alas, the mapping
ix -> addr
is one of my main reasons for using aPool
. 😅 The purpose of thePool
data type is to aid in address discovery, i.e. to give an inverse mappingaddr -> ix
. I agree that removing the addresses will get rid of theprop_generator
export, but the price is that the original purpose, building the inverse mapping, has been lost and has to be performed elsewhere. More specifically, I'd like to use the Pool data type for the Sequential address discovery as well.I do see that the
Eq
implementation ofSharedState
is a bit awkward. But that doesn't affect our ability to reason about equality ofPool
— we can still identify when two functions are equal, we just can't do it in terms of theEq
type class. That's another instance where a concept from the metatheory ("equality") cannot be expressed within the language ("decidableEq
instance"), and where it is a good idea to not attempt to express denotation directly in the language.I propose that we devote a dev meeting to these topics. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HeinrichApfelmus I need more resources on this please! I would love to read more about sub and quotient algebras, and get a better understanding of your points here.
I'll attempt a counterpoint from my understanding, which is limited.
I think I understand this point, as it's one of the powerful ideas of denotations. Indeed, in the free algebra, given some geometric function
cw
that rotates something clockwise,cw (cw (cw (cw x)))
is different fromx
, but in the denotation, they are the same (rotating 90 degrees four times is the same as not rotating at all). This is great if you only wish to observe the final rotation of the object, but not great if you want to ensure that rotations always occur in multiples of four (our pool allocator for example wishes to assert that there is always a gap of 20). And (effectively) our only observation is `lookup :: Ord addr => addr -> Pool addr ix -> Maybe ix', how could we assert that the correct gap is used?? I believe this is your point?But, I've changed the denotation to be
Pool -> (PoolGap, [(ix, AddressState)])
, from which we can assert the prop_fresh and prop_gap are true. That is, we have changed the algebra's denotation to the denotation we are interested in. So even though other algebras might not be able to be modelled as an algebraic data type, this one can be I think?I have not removed the mapping from ix -> addr completely, I've simply split the idea of "gap allocation" from "ix -> addr association". The goal is not to be abstract for the sake of being abstract, but to simplify usage and testing. Though not obvious in my draft, perhaps it would help to think of it differently:
i.e. we haven't lost anything from the pool, simply hyper-focused it on address -> index mapping, and allowed a submodule to take care of allocating gaps.
I agree. I did experiment with this idea though (which I think you already understood, but worth re-iterating). Instead of passing in the generator to an abstract "Pool" type, create named pools:
These named pools would be abstract data types (no constructor exported). By naming the pools, we essentially encode the "generator" in the type, without having to explictly compare the generator function. Two pools are the same if they are both the same data type (i.e. SharedAddressPool and SharedAddressPool), have (observably) the same GapAllocator state, and have the same
Map addr ix
. However complicated the generator is for any type, it is the same generator for any other value of the same type.Then if you wanted to recover an abstract interface to them both, curry their lookup functions to get two functions of the same type:
addr -> Maybe ix
.Much logic could be shared between the two, as internally they would just be
Map addr ix
, and they could be tested in a uniform manner if they have uniform interfaces.Just a different perspective I guess. I'd love to hear more of your thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to put some thoughts here, but this comment is probably to small to address all aspects. We should chat?
Yes, this is a good example where the denotation is a quotient algebra (namely a quotient where
cw . cw . cw . cw = id
holds). The question I'm getting at is: How to do we represent a quotient algebra in Haskell, i.e. what would the type ofinfo
be? We cannot use adata
type with a constructorCw
, because that would give us only a free algebra. (Cave: I have made the example to be about the algebra that denotescw
. But this is different from the algebra that denotesx
.)Actually, the field selectors
gap
andaddresses
are exported as functions, so these are part of the observation as well.Not in a strict sense. The thing is that e.g.
[(0,Unused),(37,Used)] :: [(ix,AddressState)]
is a legitimate element of the result type ofinfo
, but there is no valid address poolx
for whichinfo x
yields this element —info
is not surjective. In that sense,info
is missing some crucial information, and that information is the "sub" in subalgebra — the address pools are a subset of possibleAddressState
patterns. The "orthodox" denotational approach works well for quotient algebras — typically by mapping the algebra onto a subalgebra of functions. 😅I appreciate this sentiment, but in this case, I would pragmatically argue that splitting the module is more trouble than it's worth. The reasoning is this: As address discovery algorithm is inherently about the mapping from indices to addresses, we need an abstraction that combines both the association and the gap; then, whether we split this abstraction further or not becomes an implementation detail. However, the current implementation is very short — e.g. 10 lines for the most complex function
ensureFresh
— and I do not see how splitting it further would improve usability or correctness.The previous code used a type named
ParentContext
which had a similar effect. However, I decided to replace it with the function typeix -> addr
; this makes the interface not just uniform, but identical. 😄 The point is that the two pools do exactly the same thing, except with different address generation functionsix -> addr
. By separating this out, we can have two correct pools for the price of one.If we really wanted an
Eq
instance for address pools, I would advocate for an approach where we explicitly state that we wantEq
for functions. The tool for that is defunctionalization: We create a data typeGeneratorFun
whose constructors correspond to the functions that we use. Then, a functiondenotation
(:wink:) converts the constructor back to the function that it represents:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for forming such a detailed answer, I don't disagree with the current implementation, but I definitely (for my own interest), wanted to see how some of these ideas held up, so thanks for defending yours. I'm not sure I 100% understand yet, but I'll get there 👍
Sorry, I should have approved this PR before the holidays, slipped my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great explanation @HeinrichApfelmus !