XGBoost repeatedly copying data across machines - slowing down computation #33
Comments
So, after some digging, we found the reason it was slow: the data is repeatedly copied across machines. Does anybody have any ideas on how to fix this data-copying issue? Thanks,
The data is indeed loaded from a distributed data store, but only at startup time, so the difference shrinks as the number of rounds grows. The major goal of distributed xgboost is to scale up to sizes that cannot be handled by the single-machine version. So it is entirely possible for the distributed version to run slower than the single-node version when the data fits on a single node.
Hi Tianqi, Thank you for your response! So, if I understand you correctly, speed is a secondary concern as long as distributed xgboost can scale up across machines. It is good to understand the design goal, since that makes clear the trade-offs made during development. Having said that, do you have any ideas on how the distributed implementation of xgboost might be sped up? In your opinion, would moving to the Hadoop framework be beneficial for speed compared to the MPI framework? In other words, does the xgboost implementation on top of Hadoop also load data from a distributed data store over the network? Thanks,
Hi @ankurd28 Speed is definitely important for us. As the data scales up, the data-loading cost over the network is minor compared to the training cost, in our experience. (This is different from data-processing workloads like mapreduce, where little computation is done on each example and data locality is crucial.) Because more computation kicks in as we get more data, loading is likely not a problem for larger datasets. For a small dataset, however, the training cost is already low, so the data-loading bottleneck surfaces.
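To illustrate the amortization argument above, here is a minimal cost-model sketch (the function and all numbers are hypothetical, chosen only to show the shape of the trade-off, not measured from xgboost):

```python
# Hypothetical cost model: total time = one-time data-loading cost
# + per-round training cost. The fixed loading cost is amortized as
# rounds (and per-round compute) grow with data size.

def total_time(load_seconds, seconds_per_round, num_rounds):
    """Total wall-clock time for one training job under the model."""
    return load_seconds + seconds_per_round * num_rounds

# Small job: loading dominates (60 s load, 0.1 s/round, 10 rounds).
small = total_time(60.0, 0.1, 10)     # 61.0 s, ~98% spent loading
# Large job: training dominates (60 s load, 30 s/round, 500 rounds).
large = total_time(60.0, 30.0, 500)   # 15060.0 s, <1% spent loading

print(small, large)
```

This is why a distributed run can look slow on a toy dataset yet be perfectly efficient at scale.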
Hi Tianqi, Thanks a lot for your response! Best,
Fellow XGBoost Users,
I am facing a strange problem that I am hoping to get some help from you!
It seems that multi-machine, multi-threaded XGBoost is taking more time to finish the same task than the multi-threaded version on a single machine!
Initially, XGBoost kept complaining that it was compiled in local mode. However, I followed the advice in an issue reported by another user, xgboost is compiled in local mode #31, which solved that problem.
However, my job now completes in 17 seconds when run on a single machine with two threads, whereas the same job on two machines with three threads (2 threads on one machine and 1 thread on the other) takes ~90 seconds. I am running these jobs on AWS t2.medium and t2.micro instances.
Does anyone know why this might be happening? At this point it seems to me that either there is something wrong with my MPI setup (though I am not sure what), or the way distributed XGBoost was compiled in issue #31 is not correct.
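A back-of-envelope check of the numbers reported above (17 s and 90 s are from this post; the "ideal" figure assumes perfect linear thread scaling, which real hardware never achieves, so it is only a rough upper bound on the expected speedup):

```python
# Figures reported in this thread.
single_machine_s = 17.0   # 1 machine, 2 threads
distributed_s = 90.0      # 2 machines, 3 threads total (2 + 1)

# Under perfect scaling, 3 threads would take ~2/3 of the 2-thread time.
ideal_distributed_s = single_machine_s * 2 / 3

slowdown = distributed_s / single_machine_s
overhead_s = distributed_s - ideal_distributed_s

print(f"observed: {slowdown:.1f}x slower than single machine")
print(f"~{overhead_s:.0f} s unaccounted for (network / data copying?)")
```

The gap of roughly 79 seconds is far too large to be explained by thread scheduling alone, which is what points the suspicion at cross-machine data movement or the MPI setup.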
Thanks,
Ankur