Updated docs

OP-DSL · May 31, 2016 · 952b169
1 parent 565a0dc
Showing 1 changed file with 6 additions and 24 deletions.
diff --git a/doc/C++_Users_Guide.tex b/doc/C++_Users_Guide.tex
@@ -506,14 +506,7 @@ \subsection{Expert user capabilities}
 
 \subsubsection{SoA data layout}
 
-The objective of OP2 is hide all of the complexities involved in
-achieving high performance on a wide variety of hardware platforms.
-Unfortunately, there are limits to the extent to which this is possible,
-and so we have added the capability for expert users to achieve higher
-performance by providing extra directions to OP2.  This is very similar
-to the use of pragmas in C/C++.
-
-At present we have just one qualifier option, which is to force OP2 to
+At present we have an option to force OP2 to
 use SoA (struct of arrays) storage internally on GPUs.  As illustrated in
 Figure \ref{fig:SoA_AoS} the user always supplies data in AoS (array of
 structs) layout, with all of the items associated with one set element
@@ -523,22 +516,11 @@ \subsubsection{SoA data layout}
 or in the AVX vector units of CPUs) with no indirect addressing, then the
 SoA format is more efficient.
 
-OP2 can be directed to use the SoA format by adding the qualifier
-{\tt ':soa'} to the datatype, as in {\tt 'float:soa'}.  Note that the
+OP2 can be directed to use the SoA format by setting the environment variable
+{\tt OP\_AUTO\_SOA=1} before running the Python code generator.  Note that the
 data should still be supplied by the user in the standard AoS layout;
-the transposition to SoA format is handled internally by OP2.  Also,
-this qualifier must be used every time the data is accessed, as well as
-when it is first defined.  If it is not, or if it is accessed indirectly,
-it will generate a run-time error.
-
-If the data is held in an SoA layout, then there is a non-unit ``stride''
-in accessing the data associated with one set element; this is illustrated
-in the figure.  When executing a parallel loop, this stride is held
-in a global variable {\tt op2\_stride} which must be used by the user's
-kernel, so instead of a data reference {\tt data[m]} it should be
-{\tt data[m*op2\_stride]}.   The sequential implementation defines
-{\tt op2\_stride = 1} so that the user's code works with both SoA and AoS
-layouts.
+the transposition to SoA format is handled internally by OP2. No changes need
+to be made to any other user code.
 
 \begin{figure}[h]
 {\begin{center}\setlength{\unitlength}{1.2cm}
@@ -894,7 +876,7 @@ \subsection{Python code generator}
 
 \item
 For the CUDA executable, {\tt main\_kernels.cu} is a new CUDA file which includes one or more files of the form {\tt
-xxx\_kernel.cu} containing the CUDA implementations of the user's kernel functions;
+xxx\_kernel.cu} containing the CUDA implementations of the user's kernel functions. If the {\tt OP\_AUTO\_SOA} environment variable is set, the code generator produces code that transposes multi-dimensional datasets for faster execution on the GPU.
 
 \end{itemize}
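The AoS and SoA layouts described in the SoA data layout subsection can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code OP2 generates. The helper names `aos_to_soa`, `get_aos` and `get_soa` are hypothetical, the data is assumed to be `double`, and the SoA stride is assumed to equal the set size; the stride used by OP2's generated code may differ. Component `m` of element `e` lives at offset `e*dim + m` in AoS and at `m*set_size + e` in SoA. Once the pointer is offset to element `e`, the SoA case is the `data[m*op2_stride]` access pattern from the removed text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the OP2 API): transposes a dataset supplied
// in AoS layout, where the dim values of each set element are contiguous, into
// SoA layout, where component 0 of every element comes first, then component 1,
// and so on. Assumes the SoA stride equals set_size (no padding).
std::vector<double> aos_to_soa(const std::vector<double>& aos,
                               std::size_t set_size, std::size_t dim) {
  assert(aos.size() == set_size * dim);  // one block of dim values per element
  std::vector<double> soa(aos.size());
  for (std::size_t e = 0; e < set_size; ++e)
    for (std::size_t m = 0; m < dim; ++m)
      soa[m * set_size + e] = aos[e * dim + m];
  return soa;
}

// Hypothetical accessor: component m of element e in AoS layout.
// Consecutive components of one element are adjacent (unit stride).
double get_aos(const std::vector<double>& d, std::size_t e, std::size_t m,
               std::size_t dim) {
  return d[e * dim + m];
}

// Hypothetical accessor: component m of element e in SoA layout.
// The stride between components is set_size, so for a fixed m, consecutive
// elements (e.g. one per GPU thread) read adjacent addresses.
double get_soa(const std::vector<double>& d, std::size_t e, std::size_t m,
               std::size_t set_size) {
  return d[m * set_size + e];
}
```

For two elements with three components each, the AoS data `{1, 2, 3, 4, 5, 6}` becomes `{1, 4, 2, 5, 3, 6}` in SoA, and `get_soa(soa, 1, 2, 2)` returns the same value (6) as `get_aos(aos, 1, 2, 3)`.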