Memory leak during disk reading on large dataset #64

Closed
Refefer opened this issue Jan 17, 2018 · 4 comments

Comments

Refefer commented Jan 17, 2018

Hi there,

Using master, there appears to be a memory leak when reading in the initial dataset. With the --disk flag specified, it seems to happen while checking for the max feature size. The process runs until the OOM killer ends it, after consuming around 105 GB of memory on this particular machine.

Our dataset is 68 GB in size: about 160 million samples, with 1 million features in total (sparse). Happy to provide more information as needed.

Any thoughts?

Author

Refefer commented Jan 17, 2018

It appears to be a leak in general, most likely in the OndiskReader, since it continues to leak when using a smaller dataset during training.

Owner

aksnzhy commented Jan 17, 2018

Thanks for your report. We will check and fix this problem as soon as possible.

Owner

aksnzhy commented Jan 18, 2018

Hi @Refefer, thanks for reporting this important bug. I have already fixed it, and you can try it now.
Please let me know if there are any other problems!

Author

Refefer commented Jan 19, 2018

@aksnzhy I ran some tests and it appears your patch fixed the problem. Closing, and thanks!

Refefer closed this as completed Jan 19, 2018