Windows 10 out of memory error #23

Closed
mberry opened this issue Sep 5, 2016 · 27 comments

Comments

@mberry

mberry commented Sep 5, 2016

Referring to production version, not m2

At low block heights factomd starts and syncs perfectly; at higher block heights (~30,000?) it crashes on startup with an out of memory error.

Physical memory is abundant. Set virtual memory to an oversized 20 GB pagefile, deleted the blockchain and re-synced, still getting the same runtime error.

Windows 10 / 8 GB RAM / .factom stored on SSD C: drive with >30 GB free

runtime: out of memory: cannot allocate 65536-byte block (1073020928 in use)

runtime stack: http://pastebin.com/raw/bVnt4CJC

@michaelbeam
Contributor

Try setting the LogLevel in factomd.conf to 'notice' instead of 'debug' or 'info'. The higher log levels will cause factomd to try and create thousands of files and may cause weird behavior with memory or the file system.

@mberry
Author

mberry commented Sep 6, 2016

Cycled through all logging levels, still same error.
Looking online, I see numerous reports of Windows memory allocation in Go causing problems; running a Linux VM on the same computer works perfectly. Personally I'm happy using Linux, just wanted to raise the issue.
Possibly fixed in Go 1.7, there's a mention in the comments:

https://golang.org/src/runtime/malloc.go

// Number of bits in page to span calculations (4k pages).
// On Windows 64-bit we limit the arena to 32GB or 35 bits.
// Windows counts memory used by page table into committed memory
// of the process, so we can't reserve too much memory.
// See https://golang.org/issue/5402 and https://golang.org/issue/5236.
// On other 64-bit platforms, we limit the arena to 512GB, or 39 bits.
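
For reference, the sizes in that comment and in the crash message can be checked with a few lines of Go. This is only arithmetic on the numbers quoted above; the helper names `arenaBytes` and `toMiB` are made up for illustration and are not part of the Go runtime.

```go
package main

import "fmt"

// arenaBytes returns the size in bytes of an address range that is `bits` bits wide.
// Hypothetical helper for illustration only.
func arenaBytes(bits uint) uint64 {
	return uint64(1) << bits
}

// toMiB converts a byte count to mebibytes.
func toMiB(b uint64) float64 {
	return float64(b) / (1 << 20)
}

func main() {
	const gib = 1 << 30

	// Arena limits quoted from the malloc.go comment.
	fmt.Printf("35 bits: %d GiB\n", arenaBytes(35)/gib) // prints "35 bits: 32 GiB"
	fmt.Printf("39 bits: %d GiB\n", arenaBytes(39)/gib) // prints "39 bits: 512 GiB"

	// "1073020928 in use" from the runtime error in the first comment.
	const inUse = 1073020928
	fmt.Printf("in use at crash: %.4f MiB\n", toMiB(inUse)) // prints "in use at crash: 1023.3125 MiB"
}
```

The in-use figure is just under 1 GiB, well below either arena limit, so the arithmetic by itself does not say which limit (if any) was hit.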

Thanks,

@michaelbeam
Contributor

Thanks, we will probably not switch to Go 1.7 this late in the development process for m2, but we will probably use it for the next minor release afterwards. Hopefully that will clear up the problems with Windows.

@mberry
Author

mberry commented Sep 7, 2016

Thanks Michael,
From reading the forums it seems to only affect some Windows users, possibly just Win10. From my experience Microsoft seems to have broken a lot of things that worked perfectly fine in 8.
Cheers!

@carryforward
Contributor

carryforward commented Sep 7, 2016

I have cross compiled the current version using golang 1.7 for 32 bit.

http://dropjar.com/#14yl3uhh

7bc9310e1d82a304ff99b421552efc8eadc44b8e8eb0f4d0bd4ac22f30259cee 17experiment.zip
c7154f9cb9c96a6fbc58285f772cc70884d934571371ee932759064e0bbdbd56 factom-cli.exe
edab82ba2547af39a278e733ea5a2cf4f12e94b6ed74795ee4dcb6ec031222e8 factomd.exe
369924e70ca1f4151cbcf97ee5fa393e5e585494901aa90e4c68815e7dd21da7 fctwallet.exe
37d13cc8fa748584802954763e8485247ccfca7ad6a4bacbbc9503776442b40c walletapp.exe

sha256

I would be very curious if go v1.7 has fixed this issue.

@carryforward
Contributor

carryforward commented Sep 7, 2016

and here is a batch compiled as 64 bit binaries.

8c734d9896752f4cb29cdbdd56b2ea99dfe474f30da21be9a8bd0195847d921e 17x64experiment.zip
cf2c00269a6bd3cf050798754af0ebcf7a77cced34bfcdc7269ef987c1ff699d factom-cli.exe
73ede059f0218697fe90757e352c2a8a747d01abe9d1c9b5038038a5c2a85c5a factomd.exe
3f10b18a6444e6506b7c4bd7eaaa034390391d229737a3454c5e527fc0624487 fctwallet.exe
85b0a938935397a9c1b3cf57ee02138652d5cf02d248f0f8a92898a8d5f7b268 walletapp.exe

http://dropjar.com/#10yb87mw

I would be very happy if this issue was as easy to fix as recompiling with an updated go version. I find it unlikely, but if it worked, it would be great. Let me know if these crash too.

@mberry
Author

mberry commented Sep 8, 2016

Hi Brian,
deleted /.factom and put the 64-bit binary on overnight, synced up fine, restarts perfectly, as far as I can tell there's a working factomd on my Windows PC now.

Don't believe I changed anything in the Windows environment, put the pagefile back to its original size, but that is it. Running the Go 1.6 binary still crashes with the memory issue.

Had a poke around seeing if I could break something, fctwallet behaving normally, generated an address, imported a test papermill address from my linux box, transferred some spare change around, basic stuff all seems to be functional.

Copied over the staticfiles and walletapp starts, didn't test though.

Haven't tested any of the 32 bit versions.

Output for reference: http://pastebin.com/raw/qtrENCvJ

@carryforward
Contributor

hmm, this seems too good to be true.

One problem you may see is if you restart factomd, it will seem to start fine, but it will not download new blocks that are created.

you can check this by restarting factomd (without deleting .factom), and in another terminal run the command "factom-cli get height"

It should show the same block height as http://explorer.factom.org/

If your factomd can keep up with the network for a few hours, then we might have something.

I am hopeful we might have fixed this.
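
The height check described above could be automated along these lines. This is a rough sketch, not a tested tool: it assumes `factom-cli` is on the PATH and prints a bare number for `get height` (as in the output later in this thread), and the 5 minute interval, the 1 minute timeout and the 6 sample window are arbitrary choices.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// parseHeight converts the text printed by `factom-cli get height` (e.g. "53491\n") to an int.
func parseHeight(out string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(out))
}

// stalled reports whether the last n recorded heights are all identical.
func stalled(heights []int, n int) bool {
	if n < 2 || len(heights) < n {
		return false
	}
	tail := heights[len(heights)-n:]
	for _, h := range tail[1:] {
		if h != tail[0] {
			return false
		}
	}
	return true
}

func main() {
	const interval = 5 * time.Minute // assumption: polling period
	const window = 6                 // assumption: unchanged samples before warning
	var heights []int

	for {
		// Bound each call so a hung factom-cli cannot block the loop forever.
		ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
		out, err := exec.CommandContext(ctx, "factom-cli", "get", "height").Output()
		cancel()

		if err != nil {
			fmt.Println("factom-cli failed:", err)
		} else if h, perr := parseHeight(string(out)); perr != nil {
			fmt.Println("unexpected output:", strings.TrimSpace(string(out)))
		} else {
			heights = append(heights, h)
			fmt.Println(time.Now().Format(time.RFC3339), "height", h)
			if stalled(heights, window) {
				fmt.Println("height unchanged for", window, "samples; compare with the explorer")
			}
		}
		time.Sleep(interval)
	}
}
```

The local height still has to be compared by hand with http://explorer.factom.org/; the sketch only notices when the local number stops moving.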

@mberry
Author

mberry commented Sep 8, 2016

Yeah, for sure, I'll leave it on. Definitely share your suspicions, was much too simple.

Is up to date, at block 53480 currently.

@mberry
Author

mberry commented Sep 8, 2016

factomd:

12:32:29 2016-09-08 [INF] BMGR: At 53485: syncing to block height 53485 from peer 52.19.117.149:8108
2016/09/08 12:32:33 [::1] - GET /v1/directory-block-height/ - 0s
2016/09/08 13:30:53 [::1] - GET /v1/directory-block-height/ - 511.3µs
2016/09/08 13:42:05 [::1] - GET /v1/directory-block-height/ - 500.1µs
2016/09/08 13:55:05 [::1] - GET /v1/directory-block-height/ - 500.1µs
2016/09/08 14:47:29 [::1] - GET /v1/directory-block-height/ - 0s
2016/09/08 16:07:29 [::1] - GET /v1/directory-block-height/ - 0s
2016/09/08 16:11:29 [::1] - GET /v1/directory-block-height/ - 0s

factom-cli:

C:\Users\berry\Desktop\17x64>factom-cli get height
53491
C:\Users\berry\Desktop\17x64>factom-cli get height
53492
C:\Users\berry\Desktop\17x64>factom-cli get height
53493
C:\Users\berry\Desktop\17x64>factom-cli get height
53498
C:\Users\berry\Desktop\17x64>factom-cli get height
53506
C:\Users\berry\Desktop\17x64>factom-cli get height
53507

Working fine so far.

@mberry
Author

mberry commented Sep 8, 2016

Can I suggest you put something in response to the people raising Windows concerns on reddit and bitcointalk? You need more people testing it; I have read the forums and it was very hard to determine whether it was the same issue I had.

Also, GET requests for balances are still taking a very long time. Another day I'll play around and raise it in the appropriate place.

Cheers mate.

Edit: I do think it's the same issue, but such a minefield, hopefully is all memory related.

@carryforward
Contributor

And it keeps working even after restarting factomd? Let's do that before calling for more testing.

The balance taking a long time is because fctwallet loads the entire factoid sub chain each time it is restarted via slow API calls to factomd. Subsequent balance checks should be much faster. The m2 version, factom-walletd, has a local cache which will make the initial call much faster.

@carryforward
Contributor

Looking closer, I see you did restart it. Let's do it one more time for good measure.

@mberry
Author

mberry commented Sep 8, 2016

I'll give it a few restarts. Do you need an ip address to see my node? Happy to send it to you as a personal message.

Had one get height request stall, possibly I'm just not a patient person though. The request did appear in the stack when I restarted.

@mberry
Author

mberry commented Sep 8, 2016

Sorry, just realised you can see the node.

here's the output

http://pastebin.com/raw/Djjuzf9e

@carryforward
Contributor

I wonder if the cause of the errors lies in the program not being shut down gracefully. You were closing it down with Ctrl+C in this session, but I think most people just click the red X in the top right. It might not close the database correctly, or something along those lines.

Would you mind trying the golang 1.6.2 code and closing it with ctrl+c to see if that makes a difference?

@mberry
Author

mberry commented Sep 8, 2016

And yeah thanks about the get requests, there's another call that eludes me right now but does take a very long time to process initially and then once cached it's instant, my solution was to just include them when opening the daemon. Always assumed you guys knew about it, felt a bit stupid to say something.

@mberry
Author

mberry commented Sep 8, 2016

Ctrl+C always, I'll give it a crack now

@carryforward
Contributor

yah we know about the slowness. getting the balance and listing the transactions both take a long time the first time. it's ugly, but works.

ok, if you were ctrl+c'ing before then that probably doesn't have much to do with it.

as per the db, I would copy the folder first, so you can restore it. first try it with the database you have. give it a few blocks, then X it, or maybe it won't even load the whole thing in 1.6.2.

@mberry
Author

mberry commented Sep 8, 2016

In most cases shut down with Ctrl+C, may have force closed at some point, definitely a possibility.

Give me some time, I'll go through it all. Are there any tests/circumstances you think would be good to run through?

@carryforward
Contributor

I guess the worst case is Xing out the window around 35k blocks as it is downloading. It downloads in chunks of 500, so it sits for a while, then the measured height ticks up fairly rapidly. My guess is the worst time to close it is when it is in the block increasing phase. It might be worse in the apparent stalled phase though.

you might see the block increase rate easier with this: http://localhost:8090/controlpanel

@mberry
Author

mberry commented Sep 12, 2016

Ran a few more tests, started from scratch with the Go 1.6 build, force closed in both phases numerous times, didn't affect restart. Had a script making get height calls through factom-cli every 5 mins, which eventually stopped responding right at the then-current block. Despite not responding, factomd was still processing blocks for 40 mins after.

Manually tried a few cli commands, it just hung every time with no response. In hindsight I should have bypassed factom-cli and made a GET request directly, most likely with the same outcome. Closed and restarted the Go 1.6 build and got the out of memory error.
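
A direct GET with a hard timeout would at least distinguish "factomd is not answering" from "factom-cli is stuck". A minimal sketch: the endpoint path is the one visible in the factomd log earlier in this thread, but the base address `http://localhost:8088` is an assumption and should be replaced with whatever address the local factomd API actually listens on. The body is printed raw because the response format is not shown in this thread.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// heightURL joins a base address with the endpoint path seen in the factomd log.
func heightURL(base string) string {
	return strings.TrimRight(base, "/") + "/v1/directory-block-height/"
}

// fetch performs a GET bounded by timeout and returns the status code and raw body.
func fetch(url string, timeout time.Duration) (int, string, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	base := "http://localhost:8088" // assumption: local factomd API address
	if len(os.Args) > 1 {
		base = os.Args[1]
	}
	status, body, err := fetch(heightURL(base), 10*time.Second)
	if err != nil {
		// A timeout here means the daemon itself did not answer in time.
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	fmt.Println(status, strings.TrimSpace(body))
}
```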

Checksums matched for the interrupted v1.6 ldb files and those built by v1.7 earlier. So hasn't corrupted anything. Let me know if you think there's merit in comparing any other files in the factom folder.

v1.7go was able to start from there, and left it running for most of a day, haven't had any issues so far, but only tried basic stuff, I'll give it a run through all the other commands later to see if the newer go version causes any problems.

Nothing new in there, but here's the output and notes just in case:
http://pastebin.com/raw/zFaxfLdU

@carryforward
Contributor

carryforward commented Oct 3, 2016

Here are some experimental linux 64 bit + golang v1.7 binaries too:

36d31827b552412e925f035ee04b6aa8e506e5a41bd00bada8789b8757736919 factom-cli
8a55d59187a84b0fada61c772f6282cbd39e19e2d3b794bee622b53f57a0535f factomd
66b7e86ae53ca95a2694f3850849d6fdb8029cf3854edc4d0b28c8aa7df5bbf0 fctwallet
d611adc179e23a5d7d38c3fe9666d6dc1ed99c6b2d22fb82afab0da9bf5e9f30 walletapp

fa100ff6c569f77dffa5be82978b090a6fb1d4b33ff355abef5c71fd443b8a6b linux64_1.7.zip
http://dropjar.com/#16h15epm

@mberry changed the title from "Windows out of memory error" to "Windows 10 out of memory error" on Oct 5, 2016
@carryforward reopened this on Oct 17, 2016
@carryforward
Contributor

linux64_1.7_experiment.zip
win32_17experiment.zip
win64_17experiment.zip

I just realized I can attach files here. no more dropjar expirations.

@mberry
Author

mberry commented Oct 23, 2016

Have now encountered the memory allocation problem on my Linux virtual machine, guess it's related to chain size? Linux worked fine with 1.6 a few weeks ago. Ubuntu 16.04, 4 GB RAM, two cores allocated.

Been through most api calls on windows with 1.7 daemon and no problems. Untested so far on linux.

Edit: sorry, forgot to mention linux64_1.7 fixed the problem also.

@carryforward
Contributor

yah, it may seem like we aren't putting much dev effort behind optimizing the M1 code (this code). We are mostly just keeping it working well enough to get to the 2nd phase. M2 will be much better resource wise.

@carryforward
Contributor

please try the new 64 bit windows binaries.
https://github.com/FactomProject/distribution
