XGBoost repeatedly copying data across machines - slowing down computation #33
Comments
So, after some digging, we found the reason it was slow: the data is repeatedly copied across machines. Does anybody have any ideas on how to fix this data-copying issue? Thanks,
The data is indeed loaded from a distributed data store, but only at startup time, so the difference shrinks as the number of rounds grows. The major goal of distributed xgboost is to scale up to sizes that cannot be handled by the single-machine version. So it is entirely possible for the distributed version to run slower than the single-node version when the data fits on a single node.
Hi Tianqi, Thank you for your response! So, if I understand you correctly, speed is a secondary concern as long as distributed xgboost can scale up across machines. It is good to understand the design goal, since that makes clear the trade-offs made during development. Having said that, do you have any ideas on how the distributed implementation of xgboost might be sped up? In your opinion, would moving to the Hadoop framework be beneficial for speed compared to the MPI framework? In other words, does the xgboost implementation on top of Hadoop also load data from a distributed data store over the network? Thanks,
Hi @ankurd28 Speed is definitely important for us. As the data scales up, the data-loading cost over the network is minor compared to the training cost, in our experience. (This is different from data-processing workloads like mapreduce, where little computation is done on each example and data locality is crucial.) Because more computation kicks in as we get more data, loading is likely not a problem for larger datasets. For a small dataset, however, the training cost is already low, so the data-loading bottleneck surfaces.
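To illustrate the amortization argument above, here is a minimal cost-model sketch (the function and all numbers are hypothetical, chosen only to show the shape of the trade-off, not measured from xgboost):

```python
# Hypothetical cost model: total time = one-time data-loading cost
# + per-round training cost. The fixed loading cost is amortized as
# rounds (and per-round compute) grow with data size.

def total_time(load_seconds, seconds_per_round, num_rounds):
    """Total wall-clock time for one training job under the model."""
    return load_seconds + seconds_per_round * num_rounds

# Small job: loading dominates (60 s load, 0.1 s/round, 10 rounds).
small = total_time(60.0, 0.1, 10)     # 61.0 s, ~98% spent loading
# Large job: training dominates (60 s load, 30 s/round, 500 rounds).
large = total_time(60.0, 30.0, 500)   # 15060.0 s, <1% spent loading

print(small, large)
```

This is why a distributed run can look slow on a toy dataset yet be perfectly efficient at scale.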
Hi Tianqi, Thanks a lot for your response! Best,
Fellow XGBoost Users,
I am facing a strange problem that I am hoping to get some help from you!
It seems that multi-machine, multi-threaded XGBoost is taking more time to finish the same task than the multi-threaded version on a single machine!
Initially, XGBoost kept complaining that it was compiled in local mode. However, I followed the advice in an issue reported by another user, xgboost is compiled in local mode #31, which solved that problem.
However, my job now completes in 17 seconds when run on a single machine with two threads, whereas the same job on two machines with three threads (2 threads on one machine and 1 thread on the other) takes ~90 seconds. I am running these jobs on AWS t2.medium and t2.micro instances.
Does anyone know why this might be happening? At this point it seems to me that either there is something wrong with my MPI setup (though I am not sure what), or the way distributed XGBoost was compiled in issue #31 is not correct.
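A back-of-envelope check of the numbers reported above (17 s and 90 s are from this post; the "ideal" figure assumes perfect linear thread scaling, which real hardware never achieves, so it is only a rough upper bound on the expected speedup):

```python
# Figures reported in this thread.
single_machine_s = 17.0   # 1 machine, 2 threads
distributed_s = 90.0      # 2 machines, 3 threads total (2 + 1)

# Under perfect scaling, 3 threads would take ~2/3 of the 2-thread time.
ideal_distributed_s = single_machine_s * 2 / 3

slowdown = distributed_s / single_machine_s
overhead_s = distributed_s - ideal_distributed_s

print(f"observed: {slowdown:.1f}x slower than single machine")
print(f"~{overhead_s:.0f} s unaccounted for (network / data copying?)")
```

The gap of roughly 79 seconds is far too large to be explained by thread scheduling alone, which is what points the suspicion at cross-machine data movement or the MPI setup.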
Thanks,
Ankur