/*!
\page mpi \splatt with MPI Support
\tableofcontents
<!-- ----------------------------------------------------------------------------- -->
\section mpi_exec Running \splatt on Distributed Systems
\splatt can be configured to run on distributed systems with MPI. Support for
MPI must be enabled during configuration:
\verbatim
$ ./configure --mpi
\endverbatim
The build process continues normally after configuration. After building the
\splatt executable, launch the `splatt cpd` command with `mpirun`. The following
example runs `splatt cpd` on four compute nodes, with one MPI process per node.
Each MPI process uses OpenMP to utilize all available compute cores.
\verbatim
$ mpirun --map-by ppr:1:node -np 4 splatt cpd mytensor.tns -r 10
****************************************************************
splatt v1.0.0
Tensor information ---------------------------------------------
FILE=mytensor.tns
DIMS=45981x11537x2504 NNZ=229906 DENSITY=1.730791e-07
COORD-STORAGE=7.02MB
MPI information ------------------------------------------------
DISTRIBUTION=3D DIMS=4x1x1
AVG NNZ=57476
MAX NNZ=57523 (0.08% diff)
AVG COMMUNICATION VOL=32647
MAX COMMUNICATION VOL=33310 (1.99% diff)
Factoring ------------------------------------------------------
NFACTORS=10 MAXITS=50 TOL=1.0e-05 RANKS=4 THREADS=4
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=13.21MB FACTOR-STORAGE=7.04MB
its = 1 (0.631s) fit = 0.00000 delta = +3.0732e-06
its = 2 (0.686s) fit = 0.00002 delta = +1.8533e-05
its = 3 (0.636s) fit = 0.00021 delta = +1.8839e-04
its = 4 (0.613s) fit = 0.00043 delta = +2.2130e-04
its = 5 (0.618s) fit = 0.00059 delta = +1.6365e-04
its = 6 (0.619s) fit = 0.00065 delta = +5.9063e-05
its = 7 (0.595s) fit = 0.00067 delta = +1.7868e-05
its = 8 (0.613s) fit = 0.00068 delta = +9.2210e-06
Final fit: 0.00068
Timing information ---------------------------------------------
TOTAL 5.799s
CPD 5.011s
****************************************************************
\endverbatim
Account for the number of threads that each MPI process will use. Each process
uses all available cores by default, so if more than one MPI process is
assigned to a node, limit the number of OpenMP threads per process with the
`-t <nthreads>` flag to avoid oversubscribing the node.
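As a rough sketch of that rule of thumb, the snippet below divides the cores of
a node evenly among the MPI processes placed on it and prints the resulting
command. The core count, node count, and two-processes-per-node layout are
assumed values for illustration only, and the variable names are hypothetical.
`--map-by ppr:N:node` is the same Open MPI placement syntax used in the example
above.

```shell
# Assumed hardware and layout; adjust for your cluster.
cores_per_node=16   # hypothetical: compute cores on each node
procs_per_node=2    # MPI processes placed on each node
nodes=4             # number of compute nodes

# Split the cores evenly so the OpenMP threads do not oversubscribe a node.
nthreads=$(( cores_per_node / procs_per_node ))
nprocs=$(( nodes * procs_per_node ))

# Build and print the command rather than running it.
cmd="mpirun --map-by ppr:${procs_per_node}:node -np ${nprocs} splatt cpd mytensor.tns -r 10 -t ${nthreads}"
echo "$cmd"
```

To launch the job, run the printed command directly instead of echoing it.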
\subsection mpidist Selecting the Decomposition Dimensions
By default, \splatt uses a medium-grained decomposition of the tensor, which
is formed by intersecting 1D partitions of each tensor mode \cite smith2016dms.
\splatt will attempt to find an assignment of ranks that leads to a small
communication volume. To choose the decomposition dimensions yourself, use the
`-d` flag.
For example, to request a 2x2x1 medium-grained decomposition on four processes:
\verbatim
$ mpirun --map-by ppr:1:node -np 4 splatt cpd mytensor.tns -r 10 -d 2x2x1
****************************************************************
splatt v1.0.0
Tensor information ---------------------------------------------
FILE=mytensor.tns
DIMS=45981x11537x2504 NNZ=229906 DENSITY=1.730791e-07
COORD-STORAGE=7.02MB
MPI information ------------------------------------------------
DISTRIBUTION=3D DIMS=2x2x1
AVG NNZ=57476
MAX NNZ=57677 (0.35% diff)
AVG COMMUNICATION VOL=35387
MAX COMMUNICATION VOL=35714 (0.92% diff)
Factoring ------------------------------------------------------
NFACTORS=10 MAXITS=50 TOL=1.0e-05 RANKS=4 THREADS=4
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=13.69MB FACTOR-STORAGE=7.28MB
its = 1 (0.579s) fit = 0.00000 delta = +3.2466e-06
its = 2 (0.577s) fit = 0.00015 delta = +1.5149e-04
its = 3 (0.579s) fit = 0.00038 delta = +2.2426e-04
its = 4 (0.581s) fit = 0.00047 delta = +8.9148e-05
its = 5 (0.576s) fit = 0.00055 delta = +8.5531e-05
its = 6 (0.596s) fit = 0.00063 delta = +7.7434e-05
its = 7 (0.613s) fit = 0.00069 delta = +5.8126e-05
its = 8 (0.644s) fit = 0.00070 delta = +1.3246e-05
its = 9 (0.582s) fit = 0.00071 delta = +6.6029e-06
Final fit: 0.00071
Timing information ---------------------------------------------
TOTAL 6.099s
CPD 5.330s
****************************************************************
\endverbatim
Alternatively, pass `-d 1` to use a coarse-grained (1D) decomposition:
\verbatim
$ mpirun --map-by ppr:1:node -np 4 splatt cpd mytensor.tns -r 10 -d 1
****************************************************************
splatt v1.0.0
Tensor information ---------------------------------------------
FILE=mytensor.tns
DIMS=45981x11537x2504 NNZ=229906 DENSITY=1.730791e-07
COORD-STORAGE=7.02MB
MPI information ------------------------------------------------
DISTRIBUTION=1D DIMS=4x4x4
AVG NNZ=132950
MAX NNZ=133227 (0.21% diff)
AVG COMMUNICATION VOL=126413
MAX COMMUNICATION VOL=127348 (0.73% diff)
Factoring ------------------------------------------------------
NFACTORS=10 MAXITS=50 TOL=1.0e-05 RANKS=4 THREADS=4
CSF-ALLOC=ALLMODE TILE=NO
CSF-STORAGE=19.95MB FACTOR-STORAGE=7.60MB
its = 1 (0.487s) fit = 0.00000 delta = +3.1196e-06
its = 2 (0.476s) fit = 0.00002 delta = +1.4600e-05
its = 3 (0.479s) fit = 0.00021 delta = +1.9381e-04
its = 4 (0.482s) fit = 0.00061 delta = +3.9390e-04
its = 5 (0.479s) fit = 0.00071 delta = +1.0275e-04
its = 6 (0.478s) fit = 0.00072 delta = +1.2465e-05
its = 7 (0.479s) fit = 0.00072 delta = +1.9414e-06
Final fit: 0.00072
Timing information ---------------------------------------------
TOTAL 4.110s
CPD 3.361s
****************************************************************
\endverbatim
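The process grids reported in the medium-grained examples above (`4x1x1` chosen
by default, `2x2x1` requested with `-d`) both multiply out to the four MPI
ranks. Assuming that a custom medium-grained grid should contain exactly one
cell per rank (an inference from these examples, not a documented rule), a
launch script could run a small pre-flight check like the sketch below. The
variable names are hypothetical, and the check does not apply to `-d 1`.

```shell
np=4          # number of MPI ranks passed to -np
grid="2x2x1"  # decomposition that will be passed to -d

# Multiply the per-mode grid dimensions together.
cells=1
for dim in $(echo "$grid" | tr 'x' ' '); do
  cells=$(( cells * dim ))
done

# Assumption: a medium-grained grid needs one cell per MPI rank.
if [ "$cells" -eq "$np" ]; then
  status="ok"
else
  status="mismatch"
fi
echo "grid ${grid} has ${cells} cells for ${np} ranks: ${status}"
```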
\subsection mpiapi C/C++ MPI API
The C/C++ API for distributed \splatt will be available in the next release.
*/