Speedup Issues and Large Problems #2

Open
GoogleCodeExporter opened this issue Mar 21, 2015 · 3 comments
Comments

Hello,
   I am noticing that the speedup becomes very poor as the number of processors increases beyond 4. It is easy to see why: the serial Cholesky factorization (not the PICF) used in the IPM always takes the same amount of time for a fixed problem size, independent of the number of processors. If it were done in parallel, the speedup might improve. Here are the results I got on a cluster of 8-core Xeon nodes with an InfiniBand network:
CPUS  TIME    TRAIN   IO     ICF    IPM    B EST.  OUT    EFF    SpdUp
  2   16.507  13.684  1.547  3.529  3.344  6.810   1.277  1.000  2.000
  4   13.955  12.398  0.755  3.197  2.749  6.451   0.803  0.591  2.366
  8   12.015  10.929  0.515  2.466  2.490  5.974   0.570  0.343  2.748
 16    6.425   5.437  0.529  1.137  2.356  1.944   0.459  0.321  5.138
 24    5.082   4.181  0.163  0.512  2.396  1.272   0.739  0.271  6.496
 32    5.684   4.269  1.050  0.968  2.307  0.994   0.365  0.182  5.808
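[Editor's note] The SpdUp and EFF columns are consistent with taking the 2-CPU run as the baseline, i.e. speedup(p) = 2·T(2)/T(p) and efficiency = speedup/p. A minimal sketch recomputing them from the TIME column (the function name is mine, not part of PSVM):

```python
# Sketch: recompute the SpdUp and EFF columns of the table above.
# Assumption (inferred from the numbers): the 2-CPU run is the baseline,
# so speedup(p) = 2 * T(2) / T(p) and efficiency = speedup / p.

def speedup_and_efficiency(times, base_cpus=2):
    """times: {cpu_count: total_seconds} -> {cpu_count: (speedup, efficiency)}"""
    t_base = times[base_cpus]
    return {p: (base_cpus * t_base / t, base_cpus * t_base / (t * p))
            for p, t in times.items()}

# Total times (TIME column) from the runs reported above:
times = {2: 16.507, 4: 13.955, 8: 12.015, 16: 6.425, 24: 5.082, 32: 5.684}
```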

Also, there seem to be some issues if the input file gets too large: the program never gets through the PICF.

Thanks,
  Patrick Nichols

Original issue reported on code.google.com by PatJNichols@gmail.com on 5 Aug 2008 at 5:44


Hi Patrick,

What is your dataset size and what is your parameter for -rank_ratio? For now, the Cholesky factorization is serial because it usually works on a smaller matrix. In our experiments on the RCV 800k dataset, we set rank_ratio to 0.001, so the matrix the CF works on is 800*800. I suspect you set rank_ratio to a large value, which may cause bad speedup. You could decrease rank_ratio and try.

In fact, we did consider parallel Cholesky factorization, but it would be even slower on distributed computers because it requires a lot of communication. For most problems, the matrix the CF works on is kept small through rank_ratio.
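[Editor's note] Assuming rank_ratio works as described (it scales the dataset size n down to the rank p of the ICF factor, and the serial Cholesky step then runs on a p × p matrix), the relationship can be sketched as follows. The helper name is hypothetical, not part of PSVM:

```python
# Hypothetical helper illustrating the relationship described above:
# the ICF factor has rank p ~= n_samples * rank_ratio, and the serial
# Cholesky factorization then operates on a p x p matrix.

def icf_rank(n_samples: int, rank_ratio: float) -> int:
    return max(1, int(n_samples * rank_ratio))

# The RCV 800k setting mentioned above: 800,000 samples, rank_ratio = 0.001
p = icf_rank(800_000, 0.001)  # p = 800, so CF factors an 800 x 800 matrix
```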

Original comment by baihong...@gmail.com on 19 Aug 2008 at 8:22


Thanks,
   That seems to help the speedup drastically. One quick question: I noticed that the resulting threshold/bias for my training data set seems to change with different rank_ratio parameters. My naive impulse is to assume that this is bad. Is this true?

Pat


Original comment by PatJNichols@gmail.com on 23 Aug 2008 at 10:34


For the Interior Point Method, we have to approximate the problem to make it solvable, and -rank_ratio controls this approximation. Generally, the larger the rank_ratio, the better the result, but we have to trade off between time and accuracy. Making #number_of_data * #rank_ratio = 1000 is generally enough.
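[Editor's note] The rule of thumb above (n_samples × rank_ratio ≈ 1000) can be turned into a one-liner. The helper name is mine, not part of PSVM; it caps the ratio at 1.0 since rank_ratio is a fraction of the dataset size:

```python
# Hypothetical helper applying the rule of thumb from the comment above:
# choose rank_ratio so that n_samples * rank_ratio is about 1000.

def suggested_rank_ratio(n_samples: int, target_rank: int = 1000) -> float:
    return min(1.0, target_rank / n_samples)

ratio = suggested_rank_ratio(800_000)  # 0.00125, giving an ICF rank of 1000
```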

Original comment by baihong...@gmail.com on 24 Aug 2008 at 7:31
