[LocalDB] OutOfMemory when creating a new DB #101
What is your interval (this determines the amount of history Gekko will try to keep in memory)? I am currently running multiple instances with memory logging on each fetch cycle. The memory does not appear to be leaking right now (though the memory footprint is indeed quite large). I will keep this running for a couple of hours to see what happens. Here are some things I already know:
Why neDB? I need to persist historical data, and these are the limitations:
That leaves me with 4 options I think:
I am currently leaning towards the last option; does anybody have any other ideas? If anybody wants to discuss: get on IRC, #gekkobot (freenode)!
I kept the standard settings:
I will let this run until the end of the day to see if the problem replicates when the new file needs to be created.
Pure time-series storage, though? I have no experience, but a few guys recommend https://github.com/creationix/nstore. Regards
So, same error today when the new file is created:

A couple of hours before this, the mem usage was:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
What commit are you on?
I'm on this one: 2d9f142. I'll get the latest version.
@gabbello we are working on replacing the biggest memory hog in the localDB version (neDB) with our own solution.

@yin thanks for the tests! I went a little further with your tests and found an even more compact solution than storing in BSON: store in CSV and gzip it (using node's zlib) before saving it to disk. The tests are a mess now because of my playing, but this is a really simple implementation hacked together: https://gist.github.com/askmike/8191017 What do you think? (I could add some basic state to keep days 'open' by holding them in memory; if we need to append a new candle we can push to the array in memory and use the write method.) I haven't tested a lot of things yet.

EDIT: @gabbello note that the localDB branch is still using neDB, so you probably won't notice a difference yet.
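For readers following along, here is a minimal sketch of the CSV + gzip idea described above. It is not the actual gist code; the candle fields (start, open, high, low, close, volume) are assumed for illustration:

```js
var zlib = require('zlib');
var fs = require('fs');

// Serialize an array of candle objects to CSV (field order is assumed).
var toCSV = function(candles) {
  return candles.map(function(c) {
    return [c.start, c.open, c.high, c.low, c.close, c.volume].join(',');
  }).join('\n');
};

// Gzip the CSV and write it to disk.
var store = function(filename, candles, next) {
  zlib.gzip(toCSV(candles), function(err, compressed) {
    if (err) return next(err);
    fs.writeFile(filename, compressed, next);
  });
};

// Read a stored day back: gunzip, then parse the CSV into candle objects.
var load = function(filename, next) {
  fs.readFile(filename, function(err, compressed) {
    if (err) return next(err);
    zlib.gunzip(compressed, function(err, csv) {
      if (err) return next(err);
      next(null, csv.toString().split('\n').map(function(line) {
        var p = line.split(',');
        return {
          start: +p[0], open: +p[1], high: +p[2],
          low: +p[3], close: +p[4], volume: +p[5]
        };
      }));
    });
  });
};
```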
The latest was broken when I ran it; I just want to find the latest usable version.
OK, thanks guys, I will wait for a new version without neDB.
Looks great Mike, actually. For the moment gzip's fine; native zlib should be fast enough. In my company they generate GBs of CSVs a day and they keep trading, so this scale is nothing to worry about. Can you let me know if you need me tomorrow on this? There's also https://github.com/nmrugg/LZMA-JS and others, if someone asks.
Ah, I am not sure if the one in node core is JS based or native (the function expects a callback, which might hint that it is at least being compressed in another thread), but for now that probably would not matter that much. If you have some time available tomorrow, that would be awesome! I have the entire daily / minutely DB implementation in the codebase in my head, so I think it would be best to let me worry about the state of days in memory, etc. But if you want to help out, we would need some tests for the functions I wrote in the gist (the same as you did before, just not with BSON anymore; maybe you can use my Store straight away).
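For illustration, a round-trip test along those lines could look like the following sketch. It uses plain assert; the store and load functions are the hypothetical ones from the CSV + gzip sketch earlier in the thread, not the actual gist API:

```js
// Assumes store() and load() from the earlier sketch are in scope.
var assert = require('assert');

var candles = [
  { start: 0, open: 100, high: 110, low: 95, close: 105, volume: 12 },
  { start: 1, open: 105, high: 120, low: 104, close: 118, volume: 30 }
];

store('day.csv.gz', candles, function(err) {
  assert.ifError(err);
  load('day.csv.gz', function(err, loaded) {
    assert.ifError(err);
    // The round trip should preserve every candle exactly.
    assert.deepEqual(loaded, candles);
    console.log('round trip ok');
  });
});
```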
Can I suggest that you DON'T compress the on-disk files within Gekko? One of the advantages of the current DB is that it is human readable and relatively easy to analyse, in Excel for example. So I would suggest sticking to a simple text-based format (CSV sounds good!); management of disk space can be done externally, in a similar way to log file rotation on *nix systems.
So you want a switch to turn off automatic compression?
@yin: why compress anyway? The data isn't that big and requirements will vary, so you will just end up with too many options. On my desktop I am not bothered and can keep any number of files; on a Pi I may want compression and to keep a minimum number of files. So handle it like log files: in fact you could probably use the log rotation engine on a *nix system, and do it outside of Gekko. On my machine the current DB files are 113KB and compress down to 47KB.

Thinking about the process, I am not sure that any data needs to be held in memory at all (assuming a CSV): to store a new candle you are just appending a new line to a file (and the candle itself is passed around internally as a data structure), so there is no database / memory requirement there. The only time the history is accessed is on startup, so this is an infrequent operation and again can be handled as file operations with no need to cache the data in memory; just provide methods to retrieve historical data from the file.
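A minimal sketch of this append-only approach, under the same assumed candle shape as before (one plain CSV file per day is a hypothetical layout, not Gekko's actual one):

```js
var fs = require('fs');

// One line per candle; the field order is assumed, not Gekko's actual format.
var candleToLine = function(c) {
  return [c.start, c.open, c.high, c.low, c.close, c.volume].join(',') + '\n';
};

// Storing a new candle is a single append; nothing is cached in memory.
var append = function(filename, candle, next) {
  fs.appendFile(filename, candleToLine(candle), next);
};

// History is only read back on startup, as plain file operations.
var readHistory = function(filename, next) {
  fs.readFile(filename, 'utf8', function(err, csv) {
    if (err) return next(err);
    next(null, csv.trim().split('\n').map(function(line) {
      var p = line.split(',');
      return {
        start: +p[0], open: +p[1], high: +p[2],
        low: +p[3], close: +p[4], volume: +p[5]
      };
    }));
  });
};
```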
@djmuk when storing the data we have to make a tradeoff between stuff to keep in memory, disk space usage, and CPU power. I don't know what systems people are using to run Gekko, so I put this question on the forum. If we did not compress the history we would gain readability of the files and save CPU power, while losing disk space. I think backtesting and looking at historical data is of extreme importance. I have some far-future ideas of Gekko connecting online to grab a trading method (maybe even eToro style), but until then it is important to keep as much history around as possible IMO. Log rotation would kill this.
True, and one of the things about compression is that we can't just append, so we need to keep at most the rest of the day in memory (which for a full day is 500KB as a JS array; as a CSV string, probably smaller). I'm not sure if this outweighs the benefit of getting more than 2 times smaller data on disk.

Right now it's infrequent (but not per se on startup: in the scenario where the full history is not available on startup, it will come in as soon as it's ready). So the only data that needs to be cached is the current day, if we are watching a market (e.g. inserting new stuff every minute), so we can easily append; a sketch of that bookkeeping follows after this list. But I do have plans for building other stuff on top of Gekko, like a web GUI; in that scenario we need to access the data quite often to draw charts, etc.

TLDR, arguments for compressing:

- files on disk get more than 2 times smaller, so we can keep more history around

Arguments against compressing:

- extra CPU usage
- we can't just append to a compressed file, so the current day has to be kept in memory
- the files are no longer human readable

Why I think small files are important:

- backtesting and looking at historical data are of extreme importance, so we want to keep as much history as possible
- watching a whole exchange multiplies the data by the number of markets
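Here is a hedged sketch of that "current day in memory" bookkeeping. The names (Day, append, flush) are illustrative, not Gekko's actual API; the file is rewritten in full on flush because we are treating the gzipped file as non-appendable, as discussed above:

```js
var zlib = require('zlib');
var fs = require('fs');

var Day = function(filename) {
  this.filename = filename;
  this.candles = []; // the open day: at most one day of candles in memory
};

// Appending is cheap: just push onto the in-memory array.
Day.prototype.append = function(candle) {
  this.candles.push(candle);
};

// Flushing re-gzips the whole day and rewrites the file, since we
// are not appending to the compressed file directly.
Day.prototype.flush = function(next) {
  var csv = this.candles.map(function(c) {
    return [c.start, c.open, c.high, c.low, c.close, c.volume].join(',');
  }).join('\n');
  zlib.gzip(csv, function(err, compressed) {
    if (err) return next(err);
    fs.writeFile(this.filename, compressed, next);
  }.bind(this));
};
```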
@yin: are you saying it is cheaper to store raw data than to spend the CPU power compressing it? My example of log rotation was to illustrate that it isn't syslogd's problem to compress or manage the data files; it is handled externally, and I think Gekko should do the same and leave it to an external process (even if it's just a cron job/script included in the distribution). After all, on Windows I could just dump the files into a compressed folder if I was worried about space. 100KB/day = 36MB/year; given the way storage costs scale, I could store 100-1000x that and not worry. I think it is MORE important that the data is stored in a transparent format such as CSV; as you say, historical data is important, so it needs to be accessible independently of Gekko.
@djmuk I'd rather write a 20-LOC script that can convert Gekko's storage into a more general format than take transparency of format into consideration: if there ever comes a new JS-based datastore that's superfast and does exactly what we need, I don't want to be stuck with a 'transparent format'. Because of CPU issues we can make compressing optional, that's fine. Also, looking at the data by running something that draws a chart beats opening database files.

But the 100KB per day is a single market. If you want to watch BTC-e (18 markets) that's almost 2MB per day, roughly 700MB per year. And I don't even dare to count how many markets Cryptsy has (I think close to or over 100; that would be 10MB per day, roughly 3.6GB per year).

Why do we want to store more than 1 market? Right now the only method (EMA) is extremely simple; more advanced methods correlate between different markets (especially ones where the FIAT/BTC markets create trends that bubble through the rest). Also arbitrage. I don't think we should optimize Gekko for storing this amount of data, but it shouldn't eat up your hard drive if you want to watch 1 exchange IMO.
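Such a conversion script could be as simple as the sketch below: it gunzips every stored day file back into plain CSV so the data can be opened in Excel and the like. The .gz extension and flat directory layout are assumptions, not Gekko's actual storage scheme:

```js
var zlib = require('zlib');
var fs = require('fs');
var path = require('path');

// Directory holding the stored days, e.g. `node convert.js ./history`.
var dir = process.argv[2] || '.';

fs.readdirSync(dir).filter(function(f) {
  return /\.gz$/.test(f);
}).forEach(function(f) {
  fs.readFile(path.join(dir, f), function(err, compressed) {
    if (err) throw err;
    zlib.gunzip(compressed, function(err, csv) {
      if (err) throw err;
      // Write the plain CSV next to the compressed original.
      var out = path.join(dir, f.replace(/\.gz$/, '.csv'));
      fs.writeFile(out, csv, function(err) {
        if (err) throw err;
        console.log('converted', f, '->', path.basename(out));
      });
    });
  });
});
```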
@mike Let me make compression configurable when time allows.
Fixed.
Note that I'm running Gekko on a low-memory device (Raspberry Pi, 256 MB).
So the localDB instance that I started ran for about 16 hours and then crashed when creating the file for the new day, with the following message:
When I restarted the instance this morning, the first lines were as follows (I presume this means that the system is using the file for the 29th, but makes a new one starting from the first candle for the 30th):
After restarting the instance, see below for the MEM usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16216 pi 20 0 82756 20m 6588 S 0,0 9,8 0:09.55 node
Note that yesterday (after about 10 hours of running) the MEM usage was about 4 times higher than it is now.
In the same interval I was, and still am, running a master-branch Gekko without any problems.