Skip to content
This repository has been archived by the owner on Feb 16, 2020. It is now read-only.

[KRAKEN] Normalizing the approach for retry handling #1411

Merged
merged 10 commits into from
Dec 8, 2017
Merged

[KRAKEN] Normalizing the approach for retry handling #1411

merged 10 commits into from
Dec 8, 2017

Conversation

cmroche
Copy link
Contributor

@cmroche cmroche commented Dec 4, 2017

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

Feature, will probably fix some bugs though

  • What is the current behavior? (You can also link to an open issue here)

Currently retry handling is implemented by the individual exchanges, and so isn't very consistent. Depending on the implementation, some exchanges have irrecoverable errors to solve issues like API errors causing orders cancels to never end, others impement them in different functions. In Kraken, for instance, there was no retry handling on getTicker, but some other exchanges do implement this.

  • What is the new behavior (if this is a feature change)?

Also recent work (6 months ago) @askmike started to add retry handling directly in the portfolioManager. This is implemented fairly easily using the "retry" package which comes with feature not otherwise handled by our per exchange implementation, such as limiting the number of retries and exponential backoff of the retry delay.

In this PR I extended that functionality to all exchange functions, and was able to drop retry handling from kraken completely.

This approach is also fully compatible with exchange that currently have their own retry handling, so there is no need to update all the exchange implementations until someone is ready to test them.

This should help simplify the implementation and debugging of exchanges going forward.

  • Other information:

Kraken's service is in such a state is disrepair right now that it was a convenient place to test the retry handlers worked correctly. Through various actions, importing, paper trading the recoverable errors are caught and dispatched through the retry handler as expected.

@askmike
Copy link
Owner

askmike commented Dec 4, 2017

Great stuff, generally:

  • I want exchange libraries to be responsible for retrying everything

@askmike
Copy link
Owner

askmike commented Dec 4, 2017

Great stuff!

Apologies for the text below which is about the context of the PR, not the contnt. The following is not a direct response to this PR, but about the direction I want Gekko to go in:

  • I want exchange libraries to be responsible for retrying everything that can be retried, because whether there was a recoverable or a non recoverable should be something decided at this level instead: if the error is recoverable: the lib should retry*, if it is not it should upstream the error (things like: invalid API key, not enough balance). Throughout the exchange API this is sometimes done, but not consistently.
  • I love the idea of exposing helper functions to actually implement this on the exchange level, the exchange wrapper can have the unique checks in place per exchange to determine whether something is recoverable or not. I love how you are upstreaming either a recoverable or non recoverable error as well.

*After X retries (100 or a 1000 or so) it should stop retrying and upstream the error, not sure how to handle this when exchanges are temporary down. But not sure if trying to buy based on an advice from a day ago is still a smart thing to do.

Most things in this PR are already in the same line as above.

@@ -72,7 +72,7 @@ Fetcher.prototype._fetch = function(since) {
if(++this.tries >= this.limit)
return;

this.watcher.getTrades(since, this.processTrades, false);
util.retryForever((callback) => this.watcher.getTrades(since, callback, false), this.processTrades);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep this on the exchange level, if you are retrying inside kraken.js already, what is the need for this?

`Gekko tried to retrieve data since ${since.format('YYYY-MM-DD HH:mm:ss')}, however
${provider} did not return any trades.`
);
util.retryForever((callback) => {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@@ -214,7 +214,8 @@ Manager.prototype.buy = function(amount, price) {
'price:',
price
);
this.exchange.buy(amount, price, this.noteOrder);

util.retryForever((callback) => this.exchange.buy(amount, price, callback), this.noteOrder);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.

@@ -245,7 +246,8 @@ Manager.prototype.sell = function(amount, price) {
'price:',
price
);
this.exchange.sell(amount, price, this.noteOrder);

util.retryForever((callback) => this.exchange.sell(amount, price, callback), this.noteOrder);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.

this.orders = [];
done();
});
util.retryForever((callback) => {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.

@@ -363,7 +367,7 @@ Manager.prototype.relayOrder = function(done) {

var getOrders = _.map(
this.orders,
order => next => this.exchange.getOrder(order, next)
order => next => util.retryForever((callback) => this.exchange.getOrder(order, callback), next)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.

Copy link
Owner

@askmike askmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See nits in combination with the story above.

@askmike
Copy link
Owner

askmike commented Dec 4, 2017

Maybe there is a difference in retrying at exchange level and at the portfolio level which I am not seeing, in that case please let me know why you did :)

@cmroche
Copy link
Contributor Author

cmroche commented Dec 4, 2017

@askmike As you may have noticed over the weekend, there were some issue with the implementation of error and retry handling within kraken.js, when I think about the nature of the issue (ie: When should we retry, who decides we should retry) these are the criteria that come to mind:

  1. The exchange implementation should be easy to keep consistent through all implementations, and reduce the amount of copy and paste code. This will reduce bugs in other feature implemented in the engine as it is generally difficult to test multiple exchanges particularly in a live trade scenario.

  2. The exchange knows when an error is not recoverable, and we should abort the retry (ie: data errors like an order doesn't exist, or bad parameters for addOrder) retrying would be problematic in this case.

  3. The upstream calls determine if it want's a request to be retried. For example there is a cancel order that is possibly called every minute to adjust the entry, we don't need to keep retrying if this fails, wait a minute and let it try again. Implementing a retry without blocking the time could result in a run-away retry problem spamming the API.

The goal of this system was to meet those 3 criteria. The exchange tells us when to retry, the engine provides the implementation, reduces copy-paste code in the exchange function and reducing error. It opens the door the the engine deciding that a retry isn't appropriate (it: get ticker updating every frame could be implemented without the retry since it is already on a timer).

That was my thinking :)

@askmike
Copy link
Owner

askmike commented Dec 5, 2017

I agree with the first 2 points, but I don't understand the third one:

The upstream calls determine if it want's a request to be retried.

That makes sense, but right now the upstream code just retries infinitely:

In case getTrades fails (which on Kraken has a big change of happening but definitely not on exchanges like bitfinex/poloniex), there is most likely something else wrong. See #869 and #1106 as examples to this.

@cmroche
Copy link
Contributor Author

cmroche commented Dec 5, 2017

@askmike One of the cancel orders in portfolioManager doesn't have a retry at all, I'm pretty sure it would survive. But you are right, getTrades has other issues. It's a longer term goal that it could be addressed, if this was in place.

The main argument IMHO is really about reducing bugs. I personally feel that exchange implementations will be more reliable long term with this approach since it simplifies the implementation, reduces copy and paste errors, etc. If you look at the current exchanges they don't all do retry in the same way, or even on the same functions. No one person is likely to refactor all exchange files if it can be avoided because there is risk in the amount of testing necessary to be reliable. This approach makes all improvements, present and future, available reliably to all exchange implementations.

If you feel that this is not a priority, then it would probably make sense to move the util.retry calls into the exchange.

@askmike
Copy link
Owner

askmike commented Dec 5, 2017

I am 100% with you on decreasing bugs as the main priority! Though I think changing the behaviour of code that implements getTrades (by adding infinite retries) might possibly add new bugs on exchanges where getTrades not working means something being really wrong (IP being banned, as soon as they change their API, etc).

No one person is likely to refactor all exchange files if it can be avoided because there is risk in the amount of testing necessary to be reliable. This approach makes all improvements, present and future, available reliably to all exchange implementations.

I agree that no one is going to go over all exchanges all at once and fix everything, but just infinitely retrying all exchanges (even the ones that are considered very stable like Poloniex, Bitstamp and Bitfinex, a few of the most used exchanges) is not a great solution to this imo.

One of the cancel orders in portfolioManager doesn't have a retry at all

This is the part where I think we should think about how we want cancelOrder to work in the future, and slowly build towards that (without having to fix all exchange implementations). Specifically looking at cancelOrder:

cancelOrder

this.exchange.cancelOrder(order, callback);

The order will be something that the manager previously received via the sell or buy methods. The callback should have the parameter err.

Right now the docs (=spec) does not define what the err should be, but we can change this right now into: the exchange implementation should retry unless the error is not recoverable (order was filled, order does not exist, etc). In which case the err hitting portfolioManager indicates a problem where retrying does not even make sense.

As soon as there is any issue with the cancelOrder implementation in another exchange we can patch it up. I think that would already be an improvement from the current situation. What do you think?

@cmroche
Copy link
Contributor Author

cmroche commented Dec 5, 2017

@askmike Ok, I think I misunderstood your previous points. The issue is about the infinite retry, yes you are likely right. I did it this way to preserve what I saw in the exchange implementations, but fully agree that they should perhaps be something less intense. 3 retries seems a bit short, but perhaps 10 retries with a max of 30 seconds. This falloff I think is very safe from being banned.

Additionally can use the helper still with the error types to give better control, and move these functions into the exchange itself, but that sort of implies doing the same thing for changes you made in the portfolioManager a few months back, we want all the retries on exchange functions in the exchange in that case right?

This is definitely still an improvement I'll make the changes once I have your feedback.

@askmike
Copy link
Owner

askmike commented Dec 5, 2017

but that sort of implies doing the same thing for changes you made in the portfolioManager a few months back

Ah! This is probably where the confusion came from: I shouldn't have add retries here, since it goes against the design (we both agree on). I'll try to correct this soon.

we want all the retries on exchange functions in the exchange in that case right?

Yep!

This is definitely still an improvement I'll make the changes once I have your feedback.

A complete PR would be great, but I am already very happy about all of these changes (just not infi retry in getTrades consumers). But I'll gladly wait for more :) Your efforts are extremely appreciated!

@cmroche
Copy link
Contributor Author

cmroche commented Dec 6, 2017

FYI, I am working on this, but it may take a day or two to complete and test what I had in mind.

@cmroche cmroche closed this Dec 6, 2017
@cmroche cmroche reopened this Dec 6, 2017
@cmroche
Copy link
Contributor Author

cmroche commented Dec 6, 2017

@askmike

So this is what it looks like if we move the retry handling directly into the exchange. In this case we still use the helper function for handle retry, which gives us a nicer implementation of retry logic, but allows us to specify the retry configuration at exchange level since ideals are bound to vary from one implementation to another, and also preserves the upstreaming of irrecoverable and recoverable errors.

If you are happy with these changes the next step will be for me to update the documentation to clarify the expectation of retry handling on the exchange implementation, and reference this implementation as a template for how to handle it.j

I'd likely also take this myself into the Bitfnex and Binance exchanges since I use those regularly. But only after we've tested a bit in our stress test case (Kraken) :)

As a bonus I also cleaned up a lot of the copy and paste code into a handler function, and standardized the formatting of the error messages. It look much nicer on the output when errors do occur.

@askmike
Copy link
Owner

askmike commented Dec 7, 2017

So this is what it looks like if we move the retry handling directly into the exchange. In this case we still use the helper function for handle retry, which gives us a nicer implementation of retry logic, but allows us to specify the retry configuration at exchange level since ideals are bound to vary from one implementation to another, and also preserves the upstreaming of irrecoverable and recoverable errors.

👍👍👍👍

If you are happy with these changes the next step will be for me to update the documentation to clarify the expectation of retry handling on the exchange implementation, and reference this implementation as a template for how to handle it.

I am very happy with the overall changes, however there were a few small issues I found while testing. I have fixed them here: https://github.com/cmroche/gekko/pull/1

If you agree & merge them they should show up here.

@cmroche
Copy link
Contributor Author

cmroche commented Dec 7, 2017

Changes look great thanks @askmike, thanks.

I will work on updating the documentation tonight and send a second PR for those changes probably over the weekend.

small fixes in error handling for kraken (upstream #1411)
@askmike askmike merged commit 1b95a3d into askmike:develop Dec 8, 2017
@askmike askmike changed the title Normalizing the approach for retry handling [KRAKEN] Normalizing the approach for retry handling Dec 8, 2017
@cmroche cmroche deleted the retry_handling branch December 9, 2017 00:08
@askmike askmike mentioned this pull request Dec 11, 2017
@wupeng1211
Copy link
Contributor

@askmike gekko/core/util.js:187
retryHelper(fn, options, callback);
^

ReferenceError: options is not defined
at Object.retry (/home/ka/gekko/core/util.js:187:21)
at Mailer.mail (/home/ka/gekko/plugins/mailer.js:84:8)
at Mailer.bound [as mail] (/home/ka/gekko/node_modules/lodash/dist/lodash.js:729:21)
at Mailer.setupMail (/home/ka/gekko/plugins/mailer.js:42:12)
at Mailer.setup (/home/ka/gekko/plugins/mailer.js:79:15)
at Mailer.bound (/home/ka/gekko/node_modules/lodash/dist/lodash.js:729:21)
at new Mailer (/home/ka/gekko/plugins/mailer.js:15:8)
at load (/home/ka/gekko/core/pluginUtil.js:95:22)
at /home/ka/gekko/node_modules/async/dist/async.js:1156:9
at replenish (/home/ka/gekko/node_modules/async/dist/async.js:1030:17)
at iterateeCallback (/home/ka/gekko/node_modules/async/dist/async.js:1015:17)
at /home/ka/gekko/node_modules/async/dist/async.js:988:16
at /home/ka/gekko/node_modules/async/dist/async.js:1158:13
at load (/home/ka/gekko/core/pluginUtil.js:68:14)
at /home/ka/gekko/node_modules/async/dist/async.js:1156:9
at replenish (/home/ka/gekko/node_modules/async/dist/async.js:1030:17)
at iterateeCallback (/home/ka/gekko/node_modules/async/dist/async.js:1015:17)
at /home/ka/gekko/node_modules/async/dist/async.js:988:16
at /home/ka/gekko/node_modules/async/dist/async.js:1158:13
at load (/home/ka/gekko/core/pluginUtil.js:68:14)
at /home/ka/gekko/node_modules/async/dist/async.js:1156:9
at replenish (/home/ka/gekko/node_modules/async/dist/async.js:1030:17)

@tdsticks
Copy link

See my solution in #1465, hope that works for you.

greenbigfrog added a commit to greenbigfrog/gekko that referenced this pull request Dec 13, 2017
askmike pushed a commit that referenced this pull request Dec 14, 2017
@askmike askmike mentioned this pull request Jan 13, 2018
askmike pushed a commit that referenced this pull request Jan 13, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants