External memory, gpu_hist and subsampling combination bug #7476
Comments
…7476)
- The error occurs because, when reading from external memory, the batch is reassembled for every new iteration. The variable `original_page_` is initialized from the first batch when the constructor of `GradientBasedSampler` is called. After iterating through the batches, the original memory is no longer accessible, so accessing the memory that `original_page_` points to causes an error.
- The solution is to read the data from the first page of the available batch instead of from `original_page_`.
fix dmlc#7476
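The failure mode described in this commit message can be modelled in a few lines. What follows is only a minimal sketch, not xgboost code: `Page`, `PagedSource`, and `Sampler` are hypothetical stand-ins, and a `std::weak_ptr` takes the place of the raw page handle so that the expiry can be observed without undefined behaviour.

```cpp
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a page of rows; not xgboost's EllpackPage.
struct Page {
  std::vector<int> rows;
};

// Hypothetical external-memory source: every call to Batches() rebuilds the
// pages and drops the previous ones, mimicking a batch that is reassembled
// on each iteration.
class PagedSource {
 public:
  explicit PagedSource(std::vector<std::vector<int>> data)
      : data_(std::move(data)) {}

  const std::vector<std::shared_ptr<Page>>& Batches() {
    pages_.clear();  // pages from the previous iteration are freed here
    for (const auto& rows : data_) {
      pages_.push_back(std::make_shared<Page>(Page{rows}));
    }
    return pages_;
  }

 private:
  std::vector<std::vector<int>> data_;
  std::vector<std::shared_ptr<Page>> pages_;
};

// Sampler that keeps only a non-owning handle to the page seen in its
// constructor, and re-fetches the first page whenever it needs data.
class Sampler {
 public:
  explicit Sampler(PagedSource* source)
      : source_(source), original_page_(source->Batches().front()) {}

  // True once the page captured in the constructor has been freed.
  bool OriginalPageExpired() const { return original_page_.expired(); }

  // Safe access: read from the first page of the currently available batch.
  int FirstPageSum() {
    const auto& first = source_->Batches().front();
    return std::accumulate(first->rows.begin(), first->rows.end(), 0);
  }

 private:
  PagedSource* source_;
  std::weak_ptr<Page> original_page_;
};
```

In this model the handle captured at construction expires as soon as the batches are iterated again, while `FirstPageSum()` keeps working because it reads from whatever page is currently available.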
Thanks for sharing! Ping me when you have the fix. ;-)
Might be tricky. One needs to make a copy of the original page.
I think we can obtain the shared ptr via: `std::shared_ptr<EllpackPage const> page = dmat->GetBatches<EllpackPage>(batch_param).begin().Page();` But I'm curious about your solution. ;-)
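For readers unfamiliar with why holding a shared pointer would help here, the general ownership idea can be sketched as follows. This is an illustration under stated assumptions only: `DemoPage` and `PageCache` are hypothetical names and do not reflect how xgboost manages its pages internally.

```cpp
#include <memory>
#include <vector>

// Hypothetical page type used only for this illustration.
struct DemoPage {
  std::vector<int> rows;
};

// Hypothetical cache that replaces its page on every Load(), dropping its
// own reference to the previous one.
class PageCache {
 public:
  std::shared_ptr<const DemoPage> Load(int seed) {
    current_ = std::make_shared<DemoPage>(DemoPage{{seed, seed + 1}});
    return current_;
  }

 private:
  std::shared_ptr<DemoPage> current_;
};

// Returns true if a page held through a shared_ptr is still alive and
// unchanged after the cache has replaced its own copy.
bool HeldPageSurvivesReload() {
  PageCache cache;
  std::shared_ptr<const DemoPage> held = cache.Load(1);
  std::weak_ptr<const DemoPage> observer = held;
  cache.Load(10);  // the cache releases its reference to the first page
  return !observer.expired() && held->rows[0] == 1;
}
```

The trade-off is that the held page stays in memory for as long as the holder lives, which matters when pages are large.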
Hi, Here is my solution. Regarding the original page, the memory that `original_page_` points to is no longer accessible after iterating, so I don't think a copy of that pointer is needed. Thank you for the suggestion. Best Regards
- This commit addresses the suggestion in dmlc#7481 (review).
- Adds a test to accompany the fix for dmlc#7476; the test segfaults before the commit in dmlc#7481.
Hi,
xgboost segfaults when I use subsampling together with external memory and `gpu_hist` as the tree method.
The segfault happens when the number of batches is greater than 1 and the subsampling ratio is less than 1.0.
The attached script reproduces the segfault.
I have a fix for the bug; a pull request will follow shortly.
external_memory_3.zip
Best Regards