Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement and test all channels of WebSocket API #10

Merged
merged 6 commits into from May 18, 2020

Conversation

ericpashman
Copy link
Collaborator

@ericpashman ericpashman commented Nov 14, 2019

Sorry for the delay, but here's my implementation of all data channels provided by the WebSocket API, issue #9. I consider this mergeable as-is, but there are some outstanding issues:

  1. I haven't yet implemented any of the private fields on messages pertaining to your own orders that show up only when authenticated. These were also unsupported in the limited existing implementation of the "full" channel.
  2. I haven't implemented "activate" messages, which relate to triggered stop orders. The documentation on these looks odd and I'll need to play around with some stop orders on the sandbox server and see what happens. Meanwhile, this won't result in any runtime errors so long as you don't place stop orders.
  3. My Aeson instances and QuickCheck instances need some work. They're all correct, so far as my testing can tell, but the Aeson instances (including the ones that already existed) should be re-written where possible to use the more efficient toEncoding method, and my QuickCheck Arbitrary instances are pure boilerplate that probably should be replaced with generic instances using the functionality in the generic-random package or similar.

I didn't use generic-random mainly because I felt bad about already having introduced a couple of new dependencies in the test target; the new QuickCheck and text dependencies are not really new, because they were already there as sub-dependencies of other packages, but the quickcheck-instances and aeson-qq dependencies are new. The latter one can go away in the future, though, as newer versions of QuickCheck provide all the functionality I'm using from that package. (We just need to update to a recent Stack package set.) And the only instances we need from quickcheck-instances are for the Text and UUID types, I believe, so if you prefer we can just steal those and eliminate that dependency too.

This commit includes a bunch of new QuickCheck tests of the Aeson instances; they're fairly limited in that they only test that we correctly decode the JSON we encode our types into, which doesn't tell us whether that JSON matches Coinbase's, but I've paired them with some unit tests that decode the example messages given in the API documentation. The idea is that we should collect some more messages ourselves from the live server, store them, and test them in the same way; then we're not dependent on getting lots of messages over the network every time for basic regression testing. I plan to implement similar tests for the REST API code in the future.

That said, pulling and testing the example messages from the docs has shown me that the docs leave a lot to be desired. Several message types are undocumented, and many of the examples are not actually what the API produces, and not even valid JSON. I've made a bunch of comments in the code about these issues and others, annotated with NOTE, TODO, or FIXME. If you search for those keywords, you should be able quickly to review most of the gotchas and questionable bits in this PR that we may need to discuss before merging.

And of course I've uncovered some other issues that I'll open in the days ahead.

@dimitri-xyz
Copy link
Owner

dimitri-xyz commented Nov 24, 2019

I cannot build this pull request. I get a few errors like:

haskell-coinbase-pro/src/Coinbase/Exchange/Types/Socket.hs:279:33: error:
Not in scope: type constructor or class ‘BestPrice’

I cannot find the BestPrice type. Did you forget something to include something? Or did I miss something?

Sorry about the delay on this. This is good work.

@ericpashman
Copy link
Collaborator Author

Yikes, looks like I broke this while tidying up my Git history just prior to submitting the PR. Will fix and re-submit.

@ericpashman
Copy link
Collaborator Author

OK, this should work now. Sorry for the history-mangling. Some further notes:

  1. Four tests currently fail. These are all tests of REST API functionality that I haven't touched.
  2. Aside from adding a bunch of property tests of the WebSocket API's Aeson instances, I also added several more channels to the pre-existing tests of the WebSocket API that hit the server. This ends up pulling down a lot more data and typically speeds up the tests considerably.
  3. I included a stack.yaml.lock file in this commit. I hadn't meant to, but it's supposed to be under version control, so there it is.

Sorry again for the mess, and for the size of this PR.

@ericpashman
Copy link
Collaborator Author

Oh, I forgot to mention that this also reverts the parsing of the client_oid field of “received” message types to the code you first proposed in your fix #4. My very helpful suggestion of a “more idiomatic” way to write it does not in fact work, as the left-side parse results in an empty value rather than a Nothing, and the (.:!) operator only handles the latter. As far as I can tell, using the applicative operator as you suggested is the only way to get the result we want.

@ericpashman ericpashman mentioned this pull request May 12, 2020
@dimitri-xyz
Copy link
Owner

Could we rebase this PR on the latest head (70352fd)? It makes it much easier to review!

@ericpashman
Copy link
Collaborator Author

Sure, I don't think this PR touches any files with newer merged changes, so there shouldn't be any conflicts.

If you haven't done it before then, I'll rebase tomorrow when I'm at a computer with a local copy of the repo.

@ericpashman
Copy link
Collaborator Author

Oh, you might also want to search for TODO, FIXME and NOTE comments and review them for gotchas that we might want to deal with before committing. That said, though it's been too long for me to recall the details, I considered this PR ready to be merged when I submitted it.

Comment on lines +42 to +49

-- TODO: Make this a list of all products
products = [ ProductId "ETH-BTC"
, ProductId "ETH-EUR"
, ProductId "ETH-USD"
, ProductId "BTC-EUR"
, ProductId "BTC-USD"
]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update this to the latest list, it feels sloppy not to do so before we merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this for testing purposes, to diversify the test data. Running the tests against every market might makes sense in some circumstances, but not as part of the automated test suite, I don't think. That said, it would be nice to have a list of all products and then maybe select a subset of markets to run the tests against as part of the automated suite, or let users choose what they want to test against.

Attempting to update the list just now, I ran getProducts and it returns only six products, despite there actually being dozens. I don't know what's going on there, but I don't know of another way to get a list of all products. If this is a bug in Coinbase's code, we should make them aware of it, see #11.

Also note that this is the sort of thing that will have to be kept up to date if it's meant to be complete, so we'll have to figure out how we want to do that.

Comment on lines +605 to +626
toJSON L2SnapshotMsg{..} = object
[ "type" .= ("snapshot" :: Text)
, "product_id" .= msgProductId
, "asks" .= msgAsks
, "bids" .= msgBids
]
toJSON L2UpdateMsg{..} = object
[ "type" .= ("l2update" :: Text)
, "time" .= msgTime
, "product_id" .= msgProductId
, "changes" .= msgChanges
]
toJSON OpenMsg{..} = object
[ "type" .= ("open" :: Text)
, "sequence" .= msgSequence
, "time" .= msgTime
, "product_id" .= msgProductId
, "order_id" .= msgOrderId
, "side" .= msgSide
, "remaining_size" .= msgRemainingSize
, "price" .= msgPrice
]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to be able to use deriving via to derive these automatically. This is technical debt. And it seems so close now! (Just change from camelCase to snake_case and remove the prefix) Some future work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never used deriving via, though I know the basic idea. How does that work here?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure exactly. We would have to write a rule (i.e. a toJSON instance for a newtype wrapper) to derive these and then just apply this one rule to all these types. But I am not sure how to deal with the "type" field in this context. It seems worth exploring now.

The best description I know for deriving via is here:
https://skillsmatter.com/skillscasts/12654-deriving-via

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should be straightforward using Aeson's generic-deriving facilities and applying an appropriate fieldLabelModifier and constructorTagModifier. However, I'm pretty sure that I would have done that if possible. Perhaps there is some problem in deriving generic representations?

I'll look at it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've looked at this and I think it can indeed be done with genericToJSON and a minimum of fiddling. I will re-write and resubmit tomorrow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it back. This can almost be done by deriving generically, needing only to change a bunch of field and constructor tag names, but there are a small handful of cases where the fields on the ExchangeMessage constructors are insufficiently parallel to the JSON messages to make this work.

[I've included a few footnotes below on some digressions from the main point.]

For example, the fact that "received" messages are parsed into two constructors (ReceivedLimitMsg and ReceivedMarketMsg) makes it impossible to derive generically because there's no way to add JSON fields that don't correspond to a Haskell record field (the "order_type" field is absent in generically produces JSON because that information is captured by the distinction between the two constructors themselves).[1]

Another example of insufficiently parallel types and messages: "received" messages for market orders sometimes have a "size" field and sometimes a "funds" fields, but the ReceivedMarketMsg constructor has a combined msgMarketBounds field with type Either Size (Maybe Size, Cost). This is explained by a comment in the code which reads:

market orders have no price and are bounded by either size, funds or both

Evidently this is meant to capture the idea that a market order is specified either with a quantity (Size) to sell or an amount of funds (Cost) to use to buy (evidently with the possibility in the latter case of there also sometimes being a quantity present).[2]

So we can either (a) leave things as they are, with a big hand-coded ToJSON instance; or (b) re-write the data structure to be parallel to the server's JSON and then derive generically. In the above example, that would mean combining the ReceivedLimitMsg and ReceivedMarketMsg constructors and replacing the msgMarketBounds :: Either Size (Maybe Size, Cost) field with two fields, msgMaybePrice :: Maybe Price and msgMaybeFunds :: Maybe Cost.[3]

In my opinion, it makes sense to re-write the data structure to be parallel to the JSON (at least in the long run, if not now, and at least in the case of ExchangeMessage, though maybe not of other types). That would vastly simplify the code and eliminate pretty much all of the pain of debugging the Aeson instances. And I think it makes sense: the point of this library is to implement the Coinbase Pro API, which means giving users a good Haskell representation of what Coinbase's servers spit out. To me, the fact that we have to write fiddly ToJSON and FromJSON instances by hand suggests that our types are trying to do too much.

Now, I can see the value in providing types that express stronger invariants than what the API makes explicit in its message schema, but it doesn't seem to me that the existing types, where they diverge from the schema in ways like I mentioned above, actually help much. Is that Either type for a market order's "bounds" (a concept not present in the API or its documentation) really better than just using a Maybe for the two message fields that don't always show up? Even if the Either correctly encodes the permutations of fields that you might see, who cares? It mostly just seems more complicated.

There's something of a middle ground, which I imagine is similar to what you had in mind when you mentioned deriving via: we can use exactly parallel types for parsing, then translate into non-parallel types like those we have, e.g., splitting the representation of the "received" message type into two constructors. To the extent that we feel non-parallel representations really add value that might make sense.

Thoughts?


Footnotes:

  1. I mean it's impossible to do this generically with the provided facilities for modifications along the way, which only allow for re-naming things. Obviously we could produce some wrong or incomplete JSON generically, then go back and fix it by adding fields or whatever. I mention this possibility towards the end of this comment.
  2. Is this another Coinbase bug, or is this just wrong? A market buy should always specify an amount of funds (aka "cost"), without specifying a quantity (aka "size"), and a market sell should always specify a quantity ("size") without specifying an amount of funds ("cost"). Do you know the origin of this code? If not, this may merit further investigation.
  3. Why Cost and notFunds? This reminds me that Size, Cost, and other types that refer to quantities of some asset would benefit from having the asset (aka "product") name or ID embedded in the type.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa! There's a lot to unpack on your comments. Here are my thoughts.

First and foremost, I am not as concerned about deriving the ToJSON instances as I am about deriving the Arbitrary instances. This is why I said this is maybe future work. I think it is worth improving, but I don't think it (the technical debt) has to be solved now, for these ToJSON instances.

You did a good job at pointing out and documenting the difficulties with deriving them automatically because of the multiple constructors (i.e. ReceivedLimitMsg and ReceivedMarketMsg).

I strongly disagree that there is no value in capturing the API invariants expressed in the type of the field msgMarketBounds :: Either Size (Maybe Size, Cost). This invariant does exist (it just can't be expressed in JSON), This is evidenced by the fact that I don't ever recall a parsing failure because of these (let me know if you see one). Furthermore, once this invariant is established, we don't have to keep checking this value again and again as would be the case with multiple Maybes. I think this is one of those "Parse, don't validate" and "make invalid states unrepresentable" situations. Does this type make it more complicated to parse? For sure! But precisely because it enforces more strict invariants. (I have written other code that depends on this library and it was quite useful to be able to count on these invariants. The alternative is having to do this "type checking" again at a later stage. The invariant does show up either way.)

Although it may seem initially counter intuitive, allowing market orders to be bound by size, funds or both, is not a bug. This is exactly how market orders should be implemented in a "continuous limit orderbook" market. It is implicit that when limited by both, we should use the minimum (most restrictive) bound. This is one situation where Coinbase got it right and almost everybody else got it wrong.

The type Cost was created to distinguish from Price. The Cost of 2 (BTC) at a price of 10K (USD/BTC) is 20K (USD). Note the units. Funds seems to also match this idea, but seems a little bit less precise. I agree that costs could have the units encoded at the type level, for example, as a phantom type. As a requirement, I think that any implementation of this should also take care of the above calculation and changing units from Price to Cost.

Considering that after you looked at it carefully, it looks like a significant amount of work for an uncertain amount of gain. I suggest we leave it as is for now.

Comment on lines 621 to 776
instance Arbitrary CurrencyDetails where
arbitrary = CurrencyDetails
<$> arbitrary
<*> arbitrary
<*> arbitrary
<*> arbitrary
<*> arbitrary
<*> arbitrary

instance Arbitrary L2BookEntry where
arbitrary = L2BookEntry <$> arbitrary <*> arbitrary

instance Arbitrary BookChange where
arbitrary = BookChange <$> arbitrary <*> arbitrary <*> arbitrary

instance Arbitrary SendExchangeMessage where
arbitrary = do
sub <- Subscribe <$> arbitrary <*> arbitrary
unsub <- Unsubscribe <$> arbitrary <*> arbitrary
elements [sub, unsub]

instance Arbitrary ExchangeMessage where
arbitrary = oneof
[ ErrorMsg <$> arbitrary
, SubscriptionsMsg <$> arbitrary
, HeartbeatMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
, StatusMsg <$> arbitrary <*> arbitrary
, StartTickerMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary
, TickerMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
, L2SnapshotMsg <$> arbitrary <*> arbitrary <*> arbitrary
, L2UpdateMsg <$> arbitrary <*> arbitrary <*> arbitrary
, ReceivedLimitMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
, ReceivedMarketMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary
, OpenMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary
, MatchMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
, LastMatchMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
, DoneMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
, ChangeLimitMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
, ChangeMarketMsg <$> arbitrary <*> arbitrary <*> arbitrary <*> arbitrary
<*> arbitrary <*> arbitrary <*> arbitrary
]

--------------------------------------------------------------------------------
-- `Arbitrary` instances for types defined in `... .MarketData.Types`
-- FIXME: These should be moved to the module in which these types are tested,
-- but the types themselves should probably move to `... .Core.Types` or
-- `... Core`.

instance Arbitrary Size where
arbitrary = Size <$> arbitrary

instance Arbitrary Price where
arbitrary = Price <$> arbitrary

instance Arbitrary Cost where
arbitrary = Cost <$> arbitrary

instance Arbitrary CurrencyId where
arbitrary = CurrencyId <$> arbitrary

instance Arbitrary ProductId where
arbitrary = ProductId <$> arbitrary

instance Arbitrary Side where
arbitrary = elements [Buy, Sell]

instance Arbitrary CoinScientific where
arbitrary = CoinScientific <$> arbitrary

instance Arbitrary Sequence where
arbitrary = Sequence <$> arbitrary

instance Arbitrary TradeId where
arbitrary = TradeId <$> arbitrary

instance Arbitrary OrderId where
arbitrary = OrderId <$> arbitrary

instance Arbitrary ClientOrderId where
arbitrary = ClientOrderId <$> arbitrary

instance Arbitrary Reason where
arbitrary = elements [Filled, Canceled]

--------------------------------------------------------------------------------

-- FIXME: This is a hack. Newer versions of the `quickcheck-instances` package
-- provide a proper `Arbitrary` instance for `UUID`. Delete this after updating
-- package dependencies.
instance Arbitrary UUID where
arbitrary = elements [nil]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too much extra code! Let's add any necessary dependencies and just derive these.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's quite absurd. I believe we can derive all or most of these using generic-random; I'll re-write using that.

Copy link
Owner

@dimitri-xyz dimitri-xyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for bringing the socket interface back from the dead. I do feel it would be wise for us to cut down on the boiler plate code for generating arbitrary instances before merging it, though. (And file/document corresponding bugs) We have a lot of testing code already. Feel free to merge it yourself, once you consider it done.

@ericpashman
Copy link
Collaborator Author

ericpashman commented May 14, 2020 via email

@ericpashman
Copy link
Collaborator Author

ericpashman commented May 16, 2020 via email

@dimitri-xyz
Copy link
Owner

Just a couple thoughts.

The invariant in msgMarketBounds is documented in the API Spec for order placement. (Look for "market order parameters" there).

Here's an attempt to make the meaning of those parameters clearer.

All buy-side orders (limit or market) have a "funds" (i.e. cost) bound. Consider a market bid (buy order) in the BTC-USD market to buy 3 BTC. Although it is not explicitly specified, the matching engine must ensure the customer has enough funds to pay for the 3 bitcoins. How can this be done? Coinbase puts a hold on all the customers available USD funds and then "estimates" whether those funds will be sufficient to buy the 3 BTC. If the "risk management engine" (that makes the estimate) thinks those funds are insufficient, the order is immediately rejected ("insufficient funds"). If it thinks there's enough money to pay for the 3 BTC, the order goes to the matching engine. Exactly the same process happens for limit orders, but in that case it is easy to calculate how much USD to place on hold. Buying 3 BTC at $10K each will require at most $30K. This is the amount placed on hold. So, market bids will always have two bounds: one specified in "how many BTC to buy" (size) and another in "how much money to place on hold (funds)". Coinbase's API just allows customers to express those explicitly.

@ericpashman
Copy link
Collaborator Author

All changes made as discussed. Merging.

@ericpashman ericpashman merged commit e30f134 into dimitri-xyz:master May 18, 2020
@dimitri-xyz
Copy link
Owner

Nice! :-)

@ericpashman ericpashman mentioned this pull request May 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants