[R-package] Provide recommendation for mnative? #348

Closed · Laurae2 opened this issue Mar 15, 2017 · 9 comments

Laurae2 (Contributor) commented Mar 15, 2017

@guolinke I am just wondering whether recommending -march=native could yield better performance for those installing directly via install_github (the default in R is -mtune=core2).

For instance, here is the installation log when installing via install_github on Windows: we can see it is tuned for the Core 2 architecture:

c:/Rtools/mingw_64/bin/g++ -m64 -std=c++0x -I"C:/PROGRA~1/MIE74D~1/RCLIEN~1/R_SERVER/include" -DNDEBUG -I../..//include -DUSE_SOCKET      -I"C:/swarm/workspace/External-R-3.3.2/vendor/extsoft/include"  -fopenmp -pthread -std=c++11   -O2 -Wall  -mtune=core2 -c lightgbm-all.cpp -o lightgbm-all.o
c:/Rtools/mingw_64/bin/g++ -m64 -std=c++0x -I"C:/PROGRA~1/MIE74D~1/RCLIEN~1/R_SERVER/include" -DNDEBUG -I../..//include -DUSE_SOCKET      -I"C:/swarm/workspace/External-R-3.3.2/vendor/extsoft/include"  -fopenmp -pthread -std=c++11   -O2 -Wall  -mtune=core2 -c lightgbm_R.cpp -o lightgbm_R.o
c:/Rtools/mingw_64/bin/g++ -m64 -shared -s -static-libgcc -o lightgbm.dll tmp.def ./lightgbm-all.o ./lightgbm_R.o -fopenmp -pthread -lws2_32 -liphlpapi -LC:/swarm/workspace/External-R-3.3.2/vendor/extsoft/lib/x64 -LC:/swarm/workspace/External-R-3.3.2/vendor/extsoft/lib -LC:/PROGRA~1/MIE74D~1/RCLIEN~1/R_SERVER/bin/x64 -lR

This would require adding a note to the R-package README.md saying that, to maximize performance, -march=native should be added, but that it might break some packages.

Regarding -O3 (if we wanted to push even further), I know CRAN refuses it for compatibility reasons (some packages break with -O3).

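For reference, a minimal sketch of what such a README note could suggest. This assumes the user is willing to override the compiler flags R applies to packages built from source (the ~/.R/Makevars mechanism is standard, but the exact variables picked up can differ across R versions and toolchains):

    # Append an opt-in override to the user-level R Makevars
    # (with Rtools on Windows, the file is ~/.R/Makevars.win).
    # This affects every package compiled from source afterwards and
    # may break some of them, so consider removing it after installing.
    mkdir -p ~/.R
    echo 'CXXFLAGS = -O2 -Wall -march=native' >> ~/.R/Makevars
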
chivee (Collaborator) commented Mar 16, 2017

@Laurae2, does that mean we should alter the C++ build rather than just the R library? I think we can make this a suggestion rather than a compulsory step.

guolinke (Collaborator) commented:

@Laurae2
I remember the difference between O2 and O3 in LightGBM being very small.
You could run some benchmarks on this.

Laurae2 (Contributor, Author) commented Mar 17, 2017

@chivee No, this would just be a suggestion for users who want better local training speed. I'm not sure it has a major impact, though; I'll test all of that thoroughly before I make a PR. As @guolinke said, the differences from the O2 vs O3 flag alone are very small.

@guolinke When I get time on my server, I'll try O3 and -march=native to see what happens to the speed. Since last month I have been collecting a lot of (long) benchmarks on xgboost and LightGBM to understand how their performance (ranking quality (AUC) and speed) behaves depending on the parameters.

I'll get back here once my new benchmarks are done.

Laurae2 (Contributor, Author) commented Apr 1, 2017

@guolinke Some results here. I am not posting the exact benchmark details because there will be more at a mini-conference I am giving next month.

Settings:

  • v1 is LightGBM v1
  • v2 is LightGBM v2 @1bf7bbd
  • default means compiled with -O2 -mtune=core2
  • O3 means compiled with -O3 -march=native
  • O3-fmath means compiled with -O3 -ffast-math -march=native
  • O2 means compiled with -O2 -march=native
  • Os means compiled with -Os

Best means the compilation flags giving maximum speed, with default overriding all the others whenever the difference is not significant (<~1%) and not consistent (similar flags giving diverging results).

  • CPU: i7-3930K
  • R + gcc 4.9

Summary (tl;dr)

We notice that LightGBM v2 benefits from -O3 -march=native (specifically from -O3). LightGBM v1 currently shows no visible benefit from any flags other than the defaults. Depending on the model parameters, different flags give different performance (for instance, the -march=native boost for LightGBM v2 kicks in when building deeper trees, or depending on whether the threading overhead is low or large, as in 1-thread runs).

Therefore, the following recommendations could be made:

  • -O2 -mtune=core2 for LightGBM v1 for maximum performance.
  • -O3 -march=native for LightGBM v2 for maximum performance.
  • When cross-validating models, it is always faster to run several single-threaded processes in parallel (e.g., 4 processes with 1 thread each) than one multithreaded process sequentially (e.g., 1 process with 4 threads), even though your RAM usage might explode (see the sketch below this list).

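As an illustration of the last point, a minimal shell sketch of the "several single-threaded processes" pattern using the LightGBM CLI; the per-fold config file names are hypothetical placeholders:

    # Train 4 cross-validation folds as independent 1-thread processes.
    # train_fold1.conf ... train_fold4.conf are hypothetical per-fold configs.
    for fold in 1 2 3 4; do
        ./lightgbm config=train_fold${fold}.conf num_threads=1 &
    done
    wait  # block until every fold has finished
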
I will follow up with more in the next month.


Bosch, 12 threads, LightGBM v1:

| Parameters | v1 + default | v1 + Os | v1 + O2 | v1 + O3 | v1 + O3-fmath | Best |
| --- | --- | --- | --- | --- | --- | --- |
| depth=3 | 724.18s | 903.17s | 725.38s | 729.89s | 723.23s | default |
| depth=6 | 579.29s | 685.88s | 584.64s | 584.59s | 583.89s | default |
| depth=9 | 395.23s | 454.56s | 398.25s | 400.50s | 398.93s | default |
| depth=12 | 596.55s | 654.80s | 596.90s | 608.39s | 604.25s | default |

Bosch, 12 threads, LightGBM v2:

| Parameters | v2 + default | v2 + Os | v2 + O2 | v2 + O3 | v2 + O3-fmath | Best |
| --- | --- | --- | --- | --- | --- | --- |
| depth=3 | 873.08s | 1104.39s | 861.57s | 861.99s | 872.17s | O2 |
| depth=6 | 730.06s | 872.77s | 724.59s | 722.88s | 724.98s | O3 |
| depth=9 | 567.59s | 634.52s | 570.66s | 556.12s | 614.80s | O3 |
| depth=12 | 854.97s | 923.84s | 845.12s | 834.60s | 847.38s | O3 |

Bosch, 6 threads, LightGBM v1:

| Parameters | v1 + default | v1 + Os | v1 + O2 | v1 + O3 | v1 + O3-fmath | Best |
| --- | --- | --- | --- | --- | --- | --- |
| depth=3 | 913.44s | 1208.02s | 903.01s | 921.13s | 915.41s | O2 |
| depth=6 | 718.29s | 885.44s | 722.16s | 723.94s | 726.72s | default |
| depth=9 | 449.03s | 533.58s | 451.60s | 455.08s | 452.59s | default |
| depth=12 | 622.24s | 704.10s | 623.36s | 618.28s | 619.96s | O3 |

Bosch, 6 threads, LightGBM v2:

| Parameters | v2 + default | v2 + Os | v2 + O2 | v2 + O3 | v2 + O3-fmath | Best |
| --- | --- | --- | --- | --- | --- | --- |
| depth=3 | 956.25s | 1248.24s | 965.32s | 969.56s | 975.95s | default |
| depth=6 | 787.95s | 952.82s | 795.35s | 782.70s | 788.41s | ??? |
| depth=9 | 548.84s | 639.46s | 546.65s | 547.61s | 547.05s | ??? |
| depth=12 | 770.47s | 862.75s | 766.49s | 773.30s | 762.61s | ??? |

Bosch, 1 thread, LightGBM v1:

| Parameters | v1 + default | v1 + Os | v1 + O2 | v1 + O3 | v1 + O3-fmath | Best |
| --- | --- | --- | --- | --- | --- | --- |
| depth=3 | 2360.10s | 3314.84s | 2389.20s | 2406.67s | 2337.28s | O3-fmath |
| depth=6 | 1757.84s | 2335.01s | 1810.60s | 1816.25s | 1769.16s | default |
| depth=9 | 968.05s | 1250.17s | 994.99s | 1007.10s | 975.83s | default |
| depth=12 | 1202.59s | 1468.61s | 1238.31s | 1246.01s | 1216.62s | default |

Bosch, 1 thread, LightGBM v2:

| Parameters | v2 + default | v2 + Os | v2 + O2 | v2 + O3 | v2 + O3-fmath | Best |
| --- | --- | --- | --- | --- | --- | --- |
| depth=3 | 2477.49s | 3316.81s | 2437.84s | 2342.69s | 2412.35s | O3 |
| depth=6 | 1850.66s | 2334.77s | 1830.01s | 1745.34s | 1799.20s | O3 |
| depth=9 | 1003.35s | 1243.15s | 990.65s | 954.06s | 970.39s | O3 |
| depth=12 | 1236.83s | 1469.03s | 1216.49s | 1159.22s | 1191.33s | O3 |

guolinke (Collaborator) commented Apr 1, 2017

@Laurae2 Thanks for your benchmarks 👍.
If changing to O3 is needed, you can create a PR for it.

Laurae2 (Contributor, Author) commented Apr 3, 2017

@guolinke I'll open a PR adding a recommendation once I have some good charts ready and the mini-conference material is done (early next month); I'll link to it in the PR.

I also have xgboost benchmarks for comparison; do you want to see them? (I also have results for nthread={1, 2, 3, 4, 5, 6, 12} and depth={3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, but that gets very large for GitHub, so I plan to make a blog post out of it instead.)

guolinke (Collaborator) commented Apr 3, 2017

Sure, comparison benchmarks are always welcome. They can help us find out which parts we can further improve.

Laurae2 (Contributor, Author) commented Apr 4, 2017

@guolinke Here are the results for xgboost:

  • xgboost is at commit b4d97d3
  • default means compiled with -O2 -mtune=core2
  • O3 means compiled with -O3 -march=native -funroll-loops
  • O3-fmath means compiled with -O3 -ffast-math -march=native -funroll-loops
  • -funroll-loops is added because it is xgboost's default (in practice, I do not even see a difference with or without it)

Since xgboost was "slow", I skipped -O2 -march=native and -Os (each full benchmark took 2 days per thread count; the single-threaded run took very long).

To compare xgboost and LightGBM, it is best to copy & paste the results into Excel (or anything similar) and make charts. See the end of this comment for an Excel table example.

Default run: [chart]

Default flag: [2 charts]

-O3 flag: [2 charts]

-O3 -ffast-math flag: [2 charts]


Summary (tl;dr)

Configuration to choose (the difference might be large depending on the case):

  • Deep trees and multithreading: -O2 -mtune=core2
  • Small trees and multithreading: -O3 -ffast-math -march=native -funroll-loops
  • No multithreading: -O3 -march=native -funroll-loops

See dmlc/xgboost#1950 for more details on xgboost's implementation.

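As a side note, to check what -march=native actually resolves to on a given machine, gcc can print its target options (the output is machine-dependent):

    # Show the -march/-mtune values that -march=native selects on this CPU
    gcc -march=native -Q --help=target | grep -E 'march=|mtune='
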
More to come next month (on 10 May).


Bosch, 12 threads, xgboost depth-wise at b4d97d3:

| Parameters | dw + default | dw + O3 | dw + O3-fmath | Best |
| --- | --- | --- | --- | --- |
| depth=3 | 1049.86s | 1037.48s | 1026.85s | O3-fmath |
| depth=6 | 832.13s | 843.74s | 789.30s | O3-fmath |
| depth=9 | 790.78s | 799.14s | 788.94s | default |
| depth=12 | 1288.12s | 1303.58s | 1323.37s | default |

Bosch, 12 threads, xgboost loss guide at b4d97d3:

| Parameters | lg + default | lg + O3 | lg + O3-fmath | Best |
| --- | --- | --- | --- | --- |
| depth=3 | 1047.75s | 1042.41s | 1030.32s | O3-fmath |
| depth=6 | 844.80s | 841.92s | 838.87s | O3-fmath |
| depth=9 | 799.60s | 802.58s | 797.94s | default |
| depth=12 | 1263.58s | 1292.64s | 1330.31s | default |

Bosch, 6 threads, xgboost depth-wise at b4d97d3:

| Parameters | dw + default | dw + O3 | dw + O3-fmath | Best |
| --- | --- | --- | --- | --- |
| depth=3 | 1222.31s | 1194.40s | 1171.52s | O3-fmath |
| depth=6 | 865.96s | 866.79s | 833.08s | O3-fmath |
| depth=9 | 696.18s | 710.25s | 703.25s | default |
| depth=12 | 1036.29s | 1062.12s | 1070.23s | default |

Bosch, 6 threads, xgboost loss guide at b4d97d3:

| Parameters | lg + default | lg + O3 | lg + O3-fmath | Best |
| --- | --- | --- | --- | --- |
| depth=3 | 1215.27s | 1194.47s | 1176.07s | O3-fmath |
| depth=6 | 871.79s | 860.68s | 855.88s | O3-fmath |
| depth=9 | 717.43s | 714.81s | 705.16s | O3-fmath |
| depth=12 | 1061.09s | 1077.32s | 1089.91s | default |

Bosch, 1 thread, xgboost depth-wise at b4d97d3:

| Parameters | dw + default | dw + O3 | dw + O3-fmath | Best |
| --- | --- | --- | --- | --- |
| depth=3 | 3122.58s | 2719.62s | 2885.43s | O3 |
| depth=6 | 2076.36s | 1909.22s | 1967.32s | O3 |
| depth=9 | 1296.96s | 1215.27s | 1260.41s | O3 |
| depth=12 | 1684.07s | 1520.32s | 1577.45s | O3 |

Bosch, 1 thread, xgboost loss guide at b4d97d3:

| Parameters | lg + default | lg + O3 | lg + O3-fmath | Best |
| --- | --- | --- | --- | --- |
| depth=3 | 3032.19s | 2771.35s | 2944.40s | O3 |
| depth=6 | 2049.57s | 1941.74s | 1934.76s | O3-fmath |
| depth=9 | 1304.50s | 1208.21s | 1265.47s | O3 |
| depth=12 | 1571.86s | 1503.40s | 1615.36s | O3 |

Excel table example:

Copy & paste:

  • LightGBM v1 table with header: at A1
  • LightGBM v2 table with header: at I1
  • xgboost-depthwise table with header: at Q1
  • xgboost-lossguide table with header: at W1
  • Paste the whole table below at A7
  • Paste the formula =INDEX($A$1:$AA$5,F8,G8) into E8 (it pulls the timing string located at row F8, column G8 of the pasted tables), then double-click the small box at the bottom right of the cell to fill it down
  • Paste the formula =NUMBERVALUE(LEFT(E8, LEN(E8)-1)) into D8 (it strips the trailing "s" and converts the timing to a number), then double-click the small box at the bottom right of the cell to fill it down
  • Make the charts you want (even a pivot chart if you like)
Model Flag Depth Speed CellVal Row Column
LightGBM v1 default 3 2360.1 2 2
LightGBM v1 default 6 1757.84 3 2
LightGBM v1 default 9 968.05 4 2
LightGBM v1 default 12 1202.59 5 2
LightGBM v1 Os 3 3314.84 2 3
LightGBM v1 Os 6 2335.01 3 3
LightGBM v1 Os 9 1250.17 4 3
LightGBM v1 Os 12 1468.61 5 3
LightGBM v1 O2 3 2389.2 2 4
LightGBM v1 O2 6 1810.6 3 4
LightGBM v1 O2 9 994.99 4 4
LightGBM v1 O2 12 1238.31 5 4
LightGBM v1 O3 3 2406.67 2 5
LightGBM v1 O3 6 1816.25 3 5
LightGBM v1 O3 9 1007.1 4 5
LightGBM v1 O3 12 1246.01 5 5
LightGBM v1 O3-fmath 3 2337.28 2 6
LightGBM v1 O3-fmath 6 1769.16 3 6
LightGBM v1 O3-fmath 9 975.83 4 6
LightGBM v1 O3-fmath 12 1216.62 5 6
LightGBM v2 default 3 2477.49 2 10
LightGBM v2 default 6 1850.66 3 10
LightGBM v2 default 9 1003.35 4 10
LightGBM v2 default 12 1236.83 5 10
LightGBM v2 Os 3 3316.81 2 11
LightGBM v2 Os 6 2334.77 3 11
LightGBM v2 Os 9 1243.15 4 11
LightGBM v2 Os 12 1469.03 5 11
LightGBM v2 O2 3 2437.84 2 12
LightGBM v2 O2 6 1830.01 3 12
LightGBM v2 O2 9 990.65 4 12
LightGBM v2 O2 12 1216.49 5 12
LightGBM v2 O3 3 2342.69 2 13
LightGBM v2 O3 6 1745.34 3 13
LightGBM v2 O3 9 954.06 4 13
LightGBM v2 O3 12 1159.22 5 13
LightGBM v2 O3-fmath 3 2412.35 2 14
LightGBM v2 O3-fmath 6 1799.2 3 14
LightGBM v2 O3-fmath 9 970.39 4 14
LightGBM v2 O3-fmath 12 1191.33 5 14
xgboost-depthwise default 3 3122.58 2 18
xgboost-depthwise default 6 2076.36 3 18
xgboost-depthwise default 9 1296.96 4 18
xgboost-depthwise default 12 1684.07 5 18
xgboost-depthwise O3 3 2719.62 2 19
xgboost-depthwise O3 6 1909.22 3 19
xgboost-depthwise O3 9 1215.27 4 19
xgboost-depthwise O3 12 1520.32 5 19
xgboost-depthwise O3-fmath 3 2885.43 2 20
xgboost-depthwise O3-fmath 6 1967.32 3 20
xgboost-depthwise O3-fmath 9 1260.41 4 20
xgboost-depthwise O3-fmath 12 1577.45 5 20
xgboost-lossguide default 3 3032.19 2 24
xgboost-lossguide default 6 2049.57 3 24
xgboost-lossguide default 9 1304.5 4 24
xgboost-lossguide default 12 1571.86 5 24
xgboost-lossguide O3 3 2771.35 2 25
xgboost-lossguide O3 6 1941.74 3 25
xgboost-lossguide O3 9 1208.21 4 25
xgboost-lossguide O3 12 1503.4 5 25
xgboost-lossguide O3-fmath 3 2944.4 2 26
xgboost-lossguide O3-fmath 6 1934.76 3 26
xgboost-lossguide O3-fmath 9 1265.47 4 26
xgboost-lossguide O3-fmath 12 1615.36 5 26

Laurae2 closed this as completed May 21, 2017
lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020