This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

*** Do you want consistent data from binance? *** #42

Closed
oliver-zehentleitner opened this issue Feb 23, 2020 · 27 comments

@oliver-zehentleitner (Member) commented Feb 23, 2020

It's important to understand that the data between a disconnect and a reconnect is LOST!

And reconnects do take place! According to https://github.com/binance-exchange/binance-official-api-docs/blob/master/web-socket-streams.md#general-wss-information, disconnects have to be expected every 24 hours:

A single connection to stream.binance.com is only valid for 24 hours; expect to be disconnected at the 24 hour mark

So you have to find a strategy to deal with that!

Ideas for handling this are:

  • replace_stream(): start a new stream with the same settings and stop the old one as soon as the new one has received its first data. This could be scheduled and automated, but that requires a new method which has not been written yet!
  • stream the same data through two identical websocket connections on one system or, better, on different systems (see the sketch below this list)
  • download missing data via REST after a reconnect.
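
A minimal sketch of the two-connections idea on one machine. Assumptions: the import path matches recent versions of this lib, and deduplication is done via the unique trade id 't' that every payload of the 'trade' stream carries:

```python
import json
import time

from unicorn_binance_websocket_api import BinanceWebSocketApiManager

ubwa = BinanceWebSocketApiManager(exchange="binance.com")
# Two identical streams: while one of them reconnects, the other one
# usually keeps receiving.
ubwa.create_stream(['trade'], ['btcusdt'])
ubwa.create_stream(['trade'], ['btcusdt'])

seen_trade_ids = set()  # in production, bound this set or let the DB deduplicate

while True:
    raw = ubwa.pop_stream_data_from_stream_buffer()
    if raw is False:
        time.sleep(0.01)          # buffer is empty
        continue
    msg = json.loads(raw)
    data = msg.get('data', msg)   # payloads may or may not be wrapped
    trade_id = data.get('t')      # unique trade id of the 'trade' stream
    if trade_id is None or trade_id in seen_trade_ids:
        continue                  # control message or duplicate
    seen_trade_ids.add(trade_id)
    print(data)                   # process/store the unique trade here
```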

It would be cool to discuss this as a community, brainstorm, and define a best-practice solution. Then we can extend the library with features that help implement that best practice.

I look forward to good suggestions!

Best regards,
Oliver

@oliver-zehentleitner oliver-zehentleitner added the question Further information is requested label Feb 23, 2020
@oliver-zehentleitner oliver-zehentleitner self-assigned this Feb 23, 2020
@oliver-zehentleitner oliver-zehentleitner removed the question Further information is requested label Feb 23, 2020
@DaWe35 (Contributor) commented Apr 27, 2020

(typo: "reconnect ist LOST!")
How often does disconnection occur?

@oliver-zehentleitner (Member Author)

Thanks, fixed typo and added info about disconnect interval to OP.

@daverio commented May 29, 2020

(Just to say: great work!)
I think there is no single "best practice", and that both of the following are needed:

  1. a way to get missing data back from the REST API. But I don't know whether this should be part of this library or rather live at the level of a data management module (which puts the data into buffers or a database);
  2. a way to ensure that the "blind" period is as short as possible (basically your solution 1 or 2).

Both are important. The first matters for maintaining a "history", which should have as little missing data as possible. The second matters for high frequency: if you go blind while your algo is executing trades, that might result in bad behavior (evidently a good algo should have logic to manage these kinds of periods), even if the blind period is quite short!

My preference would be two identical websockets; I think it is safer, because if one of the processes has an issue the data can always be retrieved from the second. But that does not cover the case where both are disconnected...

But in the end, I think it is pretty much impossible to have a system with 0% missing data!

@oliver-zehentleitner (Member Author)

Thank you for your opinion! Maybe you are right and the solution to this problem should not be integrated directly into this module, but the need for a good solution exists for everyone who uses this lib. We could also solve this and share it as an extra module.

@ghost commented Aug 9, 2020

@oliver-zehentleitner
I'm not sure whether or not I should ask my question here.
I was thinking of using replace_stream() to get consistent data. I took a look at the source code but couldn't figure it out completely. My question is: if I start a new stream using the replace_stream() mechanism, will the first data that I receive in the new stream and the old stream be duplicates? I mean, when I pop the data using pop_stream_data_from_stream_buffer(), will there be duplicate data?
Thanks!

@oliver-zehentleitner (Member Author)

Yes, it will be a duplicate!

@Nikolaj464

Hi, I am currently using "python-binance" to manage my bot, but I realised I'm going to need user data streams for precise data.

That's why I'm thinking of using this library.
I think this library shouldn't handle the data loss itself; what it should do is throw an exception or return an error message saying the stream got disconnected, so that users know they got disconnected and can recover the lost data themselves.

Now, since I don't use this library yet, I don't know if it already returns an error letting the callback function know the ws got disconnected and is trying to reconnect.

If this system isn't in place, in my opinion it should become a new feature.

@oliver-zehentleitner (Member Author)

Hi Nikolaj!

Here is an example file for userData streams: https://github.com/oliver-zehentleitner/unicorn-binance-websocket-api/blob/master/example_userdata_stream_new_style.py (just remove the part for "isolated_margin")

Both are possible: raising an exception as well as sending a signal.

You wrote: "Now since I don't use this library yet, I don't know if it already returns an error, letting the callback function know the ws got disconnected and is trying to reconnect."

This lib works a little differently: you don't have to care about reconnects. You just create a stream with create_stream() and then take the received data from the stream_buffer by using pop_stream_data_from_stream_buffer() within a loop. If the socket disconnects, the unicorn manager restarts it. If it is not restartable for some reason, like a wrong api_key, it stops trying to reconnect. If you want, you can receive an exception then; just enable throw_exception_if_unrepairable.

Just download this repo and run the example files; I think that way you'll get a fast start.
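
A minimal sketch of the pattern described above; the import path and the throw_exception_if_unrepairable flag follow the lib's docs, while the channel and market are just example values:

```python
import time

from unicorn_binance_websocket_api import BinanceWebSocketApiManager

# raise instead of silently giving up if a stream cannot be repaired
ubwa = BinanceWebSocketApiManager(exchange="binance.com",
                                  throw_exception_if_unrepairable=True)
ubwa.create_stream(['trade'], ['btcusdt'])

while True:
    data = ubwa.pop_stream_data_from_stream_buffer()
    if data is False:
        time.sleep(0.01)  # buffer is empty
        continue
    print(data)  # reconnects are handled by the manager in the background
```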

@Nikolaj464

Hi Oliver, I'm happy you were able to respond this quickly!

I've been reading your source code for the last couple of hours, and I don't understand why I shouldn't care about reconnects.
Let's take an example:
If I have a user data stream (trade execution reports and balance updates) that gets disconnected while a trade is occurring, I understand that your manager will reconnect the stream, but if there was any data we were supposed to receive between the disconnection and the reconnection, that data is LOST, like you said in the OP.

That's why I don't just want an exception to be thrown when the manager is unable to reconnect, but also a signal when it gets disconnected, even if it is able to reconnect afterwards.
Why? Because if we have this signal, we can then get the trade history for a certain symbol, with the "python-binance" package for example, and try to recover the lost data ourselves.

@oliver-zehentleitner (Member Author)

OK, we have a misunderstanding :)

This topic is exactly about what you describe: recognizing the reconnect and taking care of lost data, running multiple nodes, or whatever.

I just meant that you don't need to initiate a restart to get the stream running again.

But you need to know when you stopped receiving data and when you started receiving again. I will give it some thought and post info ASAP.

@Nikolaj464

Great, keep me updated! :) (I joined your Telegram group, if you prefer communicating through Telegram.)

@oliver-zehentleitner (Member Author)

I think the process should be:

  1. you get informed that the stream is disconnected
  2. you check in your DB the IDs of the last saved entries for each channel and market that you use
  3. you start downloading via REST
  4. you get informed that the stream is receiving again
  5. you stop importing via REST

Take care not to import duplicates.

That's it, isn't it?

What's the best way to send such signals? Something like sending an event between processes, or calling callback functions? Both, so the devs can choose? I think sending events is my favorite; I would not use the callback function (my flavour).
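
A hypothetical sketch of steps 2, 3 and 5 for the 'trade' channel; db and rest_client stand in for your own storage layer and a REST client such as python-binance (whose get_historical_trades() supports a fromId parameter), so every name here is illustrative:

```python
def fill_gap(symbol, db, rest_client):
    """Download trades missed between disconnect and reconnect via REST."""
    last_id = db.last_saved_trade_id(symbol)    # step 2: last saved entry
    trades = rest_client.get_historical_trades( # step 3: download via REST
        symbol=symbol, fromId=last_id + 1)
    for trade in trades:
        db.insert_trade_if_not_exists(trade)    # take care: no duplicates
    # steps 1 and 4: call this between the "disconnected" and
    # "receiving again" signals; step 5: stop once the gap is closed
```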

@Nikolaj464

Yes, I do think that should be the process for recovering lost data.

As for the way to receive the signal: I don't really understand what you mean by an event between processes (to me, that's what a callback is?), so I'm for the callback function, because the dev can then do whatever he wants with that information: set his own stream statuses, do his data recovery, etc. But I'm sure it would be nice to have the possibility to choose.

@oliver-zehentleitner (Member Author)

A callback works this way:
the lib has a thread running in a while True loop; if the error happens, it calls your callback function with your code, and as soon as that function finishes it stops blocking and the runtime returns to the while True loop.

An event (I don't know the common name for this) works this way:
you start a thread in your own code and check in a while True loop whether an event/signal has been transmitted. If an error happens, the lib sends the signal and that's it: your thread with the loop receives the signal and runs your code within your own thread, without affecting the lib in any way. Similar to how the stream_buffer works.

Writing to the stream_buffer costs the loop that receives the data from Binance only a minimum of time, so it can return to receiving the next data. With a callback function, this loop would also have to write to the database, and if the database is slow, this limits the receiving rate as well.

It is better to receive the data and drop it into the stream_buffer; in another thread or sub-process you can pick the data up from the stream_buffer and do whatever you want. Do you see the value?
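
A generic sketch of the event pattern described above, using a plain queue between two threads; all names are illustrative:

```python
import queue
import threading

signal_queue = queue.Queue()  # the sending side put()s a signal here on error

def signal_watcher():
    """Runs in your own thread; a blocking get() avoids a busy loop."""
    while True:
        signal = signal_queue.get()
        print(f"received signal: {signal}")  # e.g. start the REST gap-fill

threading.Thread(target=signal_watcher, daemon=True).start()

# the sender side (the lib) only needs one cheap, non-blocking call:
signal_queue.put({"type": "DISCONNECT", "stream_id": "..."})
```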

@Nikolaj464

I do understand now, and you are right, the event system is better because it won't affect the runtime of the receiving loop.
That means it will be able to receive more messages if it is a really busy stream.

@oliver-zehentleitner oliver-zehentleitner added this to To do in Todo Oct 19, 2020
@nocciolate

Hi Oliver, in the initial post at the top you suggested, as an option, streaming the same data through two identical websocket connections. Do you happen to have an example of how you would do this?

@oliver-zehentleitner (Member Author)

If you want to save all the data into a database you could do:

  1. create websockets
  2. receive data
  3. save the data to the DB with an [if not exists] statement

Do the same on a second independent server.
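
A sketch of step 3 with MySQL, assuming a trades table whose primary key is the trade id; the schema, column names, and the pymysql driver are illustrative choices:

```python
import json

import pymysql  # illustrative driver choice

conn = pymysql.connect(host="localhost", user="user",
                       password="secret", database="binance")

def save_trade(raw):
    """Insert one received trade; duplicates are silently ignored."""
    msg = json.loads(raw)
    data = msg.get('data', msg)
    if data.get('e') != 'trade':
        return  # skip control messages
    with conn.cursor() as cur:
        # trade_id is the PRIMARY KEY, so the same trade arriving from a
        # second server/stream does not create a duplicate row
        cur.execute("INSERT IGNORE INTO trades (trade_id, symbol, price, qty, ts) "
                    "VALUES (%s, %s, %s, %s, %s)",
                    (data['t'], data['s'], data['p'], data['q'], data['T']))
    conn.commit()
```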

@nocciolate

I was hoping I would find a way to run two concurrent Python files on one machine, get both outputs into a third file, and do the rest there. I'm just not sure yet how to read the incoming messages from the first two files in the third one, but I guess this is a pretty basic question, so I'll try to figure it out somehow. As you can imagine, my knowledge is indeed pretty limited; I can do a lot when I see some examples, but coming up with that logic on my own is not always that easy. Cheers for the quick reply :)

@oliver-zehentleitner (Member Author)

Running the same script twice concurrently should not be a problem.

Writing that to a single file ... hm.

I don't know what you want to do, but if you want to store the received data, just use a database like MySQL or PostgreSQL. It's made for that and much easier to handle in this situation; maybe you have to learn a bit more upfront, but I guess it's the better way. In a database you say "insert blah into blub if not exists", so you can insert the same thing twice and it's just saved once.

Maybe you want to take a look at this "example" :D https://github.com/Dreem-lense/Binance-websocket-api-to-mysql

@nocciolate commented Nov 29, 2020

Hi Oliver, I actually managed to implement what you suggested yesterday, after taking a crash course in MySQL.

So basically, I have two concurrent streams (IDs 64d11864-fbf3-4e1f-bd07-bf1bf8b38268 and 26990714-d07d-4b95-85ae-e45b56283427) pulling "trade" for the same 3 symbols. I renew each stream after 12 hours, with a 6-hour time lag between the two scripts to ensure asynchronous restarts. Both write the results into the same 3 MySQL databases; duplicates are being taken care of. Thanks for your suggestion!

I let it run (via a parent script) through the night and got the following issues:

[1] are the very first error messages that came up, right after each other for both scripts.
[2] are the error messages that started being printed constantly afterwards.

Do you think there's something I could do to fix this?

[1]
CRITICAL:root:BinanceWebSocketApiSocket.start_socket(64d11864-fbf3-4e1f-bd07-bf1bf8b38268, ['trade'], ['ltcusdt', 'ethusdt', 'btcusdt']) - Exception ConnectionClosed - error_msg: code = 1006 (connection closed abnormally [internal]), no reason
CRITICAL:root:BinanceWebSocketApiManager.stream_is_crashing(64d11864-fbf3-4e1f-bd07-bf1bf8b38268)

CRITICAL:root:BinanceWebSocketApiSocket.start_socket(26990714-d07d-4b95-85ae-e45b56283427, ['trade'], ['ltcusdt', 'ethusdt', 'btcusdt']) - Exception ConnectionClosed - error_msg: code = 1006 (connection closed abnormally [internal]), no reason
CRITICAL:root:BinanceWebSocketApiManager.stream_is_crashing(26990714-d07d-4b95-85ae-e45b56283427)

[2]
CRITICAL:root:BinanceWebSocketApiManager._create_stream_thread() stream_id=26990714-d07d-4b95-85ae-e45b56283427 error: 7 - cannot schedule new futures after interpreter shutdown - if this stream did not restart after this error, please create an issue: https://github.com/oliver-zehentleitner/unicorn-binance-websocket-api/issues/new/choose

CRITICAL:root:BinanceWebSocketApiManager._create_stream_thread() stream_id=64d11864-fbf3-4e1f-bd07-bf1bf8b38268 error: 7 - cannot schedule new futures after interpreter shutdown - if this stream did not restart after this error, please create an issue: https://github.com/oliver-zehentleitner/unicorn-binance-websocket-api/issues/new/choose

@oliver-zehentleitner (Member Author)

Please open a new issue for questions not related to the original topic!

I would need the log file (level INFO), but I guess you have a lot of "High CPU usage since 5 seconds" messages in your log and the Python interpreter died because of too few resources.

Just read this issue for further information: #131

@oliver-zehentleitner oliver-zehentleitner moved this from To do to In progress in Todo Nov 29, 2020
@oliver-zehentleitner (Member Author) commented Dec 2, 2020

I added the stream_signal_buffer; it can be tested with https://github.com/oliver-zehentleitner/unicorn-binance-websocket-api#from-the-latest-source-dev-stage-with-pip-from-github

Wiki: stream_signal_buffer

Tests and feedback are appreciated!
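
A sketch based on the wiki page linked above; the parameter and method names follow the lib's docs, though the exact keys of the signal dict may vary between versions, so treat this as an assumption to verify:

```python
import time

from unicorn_binance_websocket_api import BinanceWebSocketApiManager

ubwa = BinanceWebSocketApiManager(exchange="binance.com",
                                  enable_stream_signal_buffer=True)
ubwa.create_stream(['trade'], ['btcusdt'])

while True:
    signal = ubwa.pop_stream_signal_from_stream_signal_buffer()
    if signal is False:
        time.sleep(0.1)  # no new signal
        continue
    # e.g. {'type': 'DISCONNECT', 'stream_id': ..., 'timestamp': ...}
    print(signal)  # start the REST gap-fill on DISCONNECT, stop it on CONNECT
```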

@oliver-zehentleitner oliver-zehentleitner moved this from In progress to Suspended in Todo Dec 2, 2020
@oliver-zehentleitner oliver-zehentleitner moved this from Suspended to Done in Todo Dec 2, 2020
@oliver-zehentleitner (Member Author) commented Dec 3, 2020

The signal system was released with version 1.27.0: https://github.com/oliver-zehentleitner/unicorn-binance-websocket-api/releases/tag/1.27.0

@jon4hz commented Jan 7, 2021

Since I spent the last few weeks building a trading system that heavily relies on a local (consistent) order book, I thought I'd share my thoughts and experiences here as well.
To start the streams I use Docker, so the system is scalable and I can add/remove coins flexibly. It's also an easy way to manage the Python scripts without any systemd hacking, etc. To store the data I use Timescale, which is basically PostgreSQL with optimizations for time-series data.

To get the order book I use the depth stream, and since this stream contains update IDs (more here) it's pretty easy to see whether you missed any data or not. My experience was that you will miss some stream data from time to time, but mostly UBWA takes good care of a fast reconnect and a stable connection. I also placed my server near the Binance servers, which possibly gives me a small advantage.

To avoid such disconnections, I'm planning to always start two containers (time-shifted) with the same stream and add a short check in the script to decide whether it should write the data down or not. I'll update this post as soon as I know more, but in my opinion this should work unless the whole host crashes or Binance has problems.

In a perfect scenario I would probably use multiple hosts for the data gathering and one dedicated database host, but my budget is too tight for that currently.
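
A sketch of the continuity check jon4hz mentions, based on Binance's documented spot diff-depth fields 'U' (first update id in the event) and 'u' (final update id): while listening, each new event's 'U' should equal the previous event's 'u' + 1:

```python
last_final_id = None  # final update id ('u') of the previous event

def is_continuous(event):
    """Return True if this depth event continues the stream without a gap."""
    global last_final_id
    if last_final_id is not None and event['U'] != last_final_id + 1:
        # gap detected: re-sync the local order book from a REST snapshot
        return False
    last_final_id = event['u']
    return True
```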

@jon4hz commented Feb 6, 2021

Hey, update to my last post. Running multiple containers on a single host doesn't help.

@Deidara6026

This comes a year late, but how long do reconnects typically take?

@oliver-zehentleitner (Member Author)

First we need to detect that a connection is not working anymore. Take a look at these parameters of create_stream():
ping_interval=20, ping_timeout=20, close_timeout=10

Then we wait 6 seconds before restarting it: https://unicorn-binance-websocket-api.docs.lucit.tech/unicorn_binance_websocket_api.html?highlight=restart_timeout#module-unicorn_binance_websocket_api.manager
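
A sketch of how these parameters are passed, assuming an existing manager instance ubwa; the values below are the defaults quoted above:

```python
# a dead connection is detected after roughly ping_interval + ping_timeout
# seconds; the manager then waits restart_timeout (6 s by default) to reconnect
ubwa.create_stream(['trade'], ['btcusdt'],
                   ping_interval=20, ping_timeout=20, close_timeout=10)
```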

@LUCIT-Systems-and-Development LUCIT-Systems-and-Development locked and limited conversation to collaborators Mar 12, 2022
@oliver-zehentleitner oliver-zehentleitner converted this issue into discussion #254 Mar 12, 2022
@oliver-zehentleitner oliver-zehentleitner unpinned this issue Mar 12, 2022

