[Roadmap] external memory support for tree_method='hist' #4093

Open
CodingCat opened this issue Jan 31, 2019 · 10 comments

@CodingCat CodingCat commented Jan 31, 2019

The docs say that the distributed and external memory versions only support tree_method=approx.

Is that still true? What's the problem with external memory + hist? I tested it and didn't see any issue. Did I miss anything?
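
For reference, the combination being asked about can be sketched roughly as follows. This is only a sketch, assuming the `path#cache_prefix` URI convention that XGBoost's external memory docs described around the time of this thread; the helper names, file names, and parameter values are hypothetical, and the training function is defined but not run here.

```python
def external_memory_uri(data_path, cache_prefix):
    """Build a DMatrix URI of the form 'data_path#cache_prefix'.

    Assumption: the '#cache_prefix' suffix asks XGBoost to page the data
    through an on-disk cache instead of loading it all into memory.
    """
    if "#" in data_path:
        raise ValueError("data_path must not already contain '#'")
    return f"{data_path}#{cache_prefix}"


def train_hist_external(data_path, cache_prefix, num_rounds=10):
    """Train with tree_method='hist' on an external memory DMatrix.

    Returns the per-round training AUC so it can be inspected for stalls.
    """
    import xgboost as xgb  # imported lazily so the helper above works without it

    dtrain = xgb.DMatrix(external_memory_uri(data_path, cache_prefix))
    params = {
        "tree_method": "hist",
        "objective": "binary:logistic",
        "eval_metric": "auc",
    }
    evals_result = {}
    xgb.train(
        params,
        dtrain,
        num_boost_round=num_rounds,
        evals=[(dtrain, "train")],
        evals_result=evals_result,
        verbose_eval=False,
    )
    return evals_result["train"]["auc"]
```

Comparing the returned history against a run on the same file without the `#cache_prefix` suffix is one way to check whether the two modes agree.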

@CodingCat CodingCat commented Jan 31, 2019

@CodingCat CodingCat commented Jan 31, 2019

Hmm... it looks like the training converged too early, but I don't see why that would be related to external memory, or whether row.page has some problem.

@hcho3 hcho3 commented Jan 31, 2019

@CodingCat When I wrote the fast hist, I assumed that each DMatrix would have a single block, i.e. that the entire dataset fits in memory. External memory support has not been tested. Is external memory a high priority for you?
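
To illustrate why a single-block assumption matters, here is a toy sketch in plain Python. It is not XGBoost's implementation and makes no claim about where the actual problem was; the page layout, the `(bin_index, gradient)` pairs, and the function name are all hypothetical. It only shows that a gradient histogram built from the first page differs from one accumulated over every page.

```python
def build_histogram(pages, num_bins):
    """Accumulate per-bin gradient sums over every page (block) of the data.

    Each page is a list of (bin_index, gradient) pairs.
    """
    hist = [0.0] * num_bins
    for page in pages:
        for bin_index, gradient in page:
            hist[bin_index] += gradient
    return hist


# Two hypothetical pages, as external memory would produce.
pages = [
    [(0, 0.5), (1, -0.25)],
    [(1, 0.75), (2, 1.0)],
]

full = build_histogram(pages, num_bins=3)            # visits both pages
first_only = build_histogram(pages[:1], num_bins=3)  # behaves as if there were one block

print(full)
print(first_only)
```

Code that reads only one block would see the second histogram, so any split decision based on it would ignore part of the data.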

@trivialfis trivialfis commented Jan 31, 2019

There's an open issue related to external memory: #4037. I'm still struggling to understand it...

@CodingCat CodingCat commented Jan 31, 2019

@hcho3 Yes, our data size makes external memory a must-have. But I am a bit confused about which part makes hist not work with external memory.

@trivialfis What I observed is that when I set up external memory with the hist tree_method, the metrics (train/test) don't change after 5-6 iterations, even though training keeps moving forward.
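
A symptom like this can be checked mechanically from a per-iteration metric history. The helper below is a minimal sketch in plain Python; the function name, the `patience` and `tol` parameters, and the sample numbers are hypothetical and not taken from the thread.

```python
def first_plateau(metric_history, patience=3, tol=1e-9):
    """Return the index where the metric first stays unchanged for
    `patience` consecutive rounds, or None if it never does."""
    run = 0
    for i in range(1, len(metric_history)):
        if abs(metric_history[i] - metric_history[i - 1]) <= tol:
            run += 1
            if run >= patience:
                # i - patience is the first round of the flat stretch
                return i - patience
        else:
            run = 0
    return None


# Made-up history that stops moving after a handful of rounds.
stalled = [0.70, 0.75, 0.78, 0.80, 0.81, 0.81, 0.81, 0.81, 0.81]
print(first_plateau(stalled))
```

Running the same check on the in-memory run would show whether the plateau is specific to the external memory configuration.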

@CodingCat CodingCat commented Feb 8, 2019

I have started looking into why accuracy is off with the external memory version of hist.

@CodingCat CodingCat self-assigned this Feb 8, 2019

@hcho3 hcho3 commented Feb 8, 2019

@CodingCat Is this issue blocking?

@CodingCat CodingCat commented Feb 8, 2019

No, it just means you need more memory to use distributed hist.

@hcho3 hcho3 changed the title from "Distributed and external memory version only support tree_method=approx???" to "[Roadmap] external memory support for tree_method='hist'" Mar 8, 2019
@hcho3 hcho3 added the type: roadmap label Mar 8, 2019

@rongou rongou commented Feb 21, 2020

@CodingCat Is this still an issue? As part of my work on GPU external memory support, I tested hist on the Higgs dataset and got identical AUC metrics. The external memory version is actually slightly faster.

| Mode | Time (seconds) | AUC |
|---|---|---|
| hist | 1309.64 | 0.8393 |
| hist external memory | 1228.53 | 0.8393 |
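
To put "slightly faster" in numbers, the arithmetic on the two reported timings can be written out as below. The timings and AUC values are the ones reported above; the variable names are only illustrative.

```python
# Reported wall-clock times (seconds) and AUC from the comparison above.
hist_seconds = 1309.64
external_seconds = 1228.53
hist_auc = 0.8393
external_auc = 0.8393

saved_seconds = hist_seconds - external_seconds
reduction_percent = 100.0 * saved_seconds / hist_seconds
speedup = hist_seconds / external_seconds

print(f"saved {saved_seconds:.2f} s, {reduction_percent:.1f}% less time, {speedup:.3f}x")
print("AUC identical:", hist_auc == external_auc)
```

That works out to roughly 81 seconds saved, or about 6% less wall-clock time, with no difference in the reported AUC.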

@CodingCat CodingCat commented Feb 21, 2020
