Update README.md

NLAFET · Apr 30, 2019 · eb03799 · eb03799
1 parent 2ac43b7
commit eb03799
Showing 1 changed file with 263 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1 +1,263 @@
-g# Highly-Unsymmetric
+# The ParSHUM library
+This document is the UserGuide for the ParSHUM library that implements parallel sparse
+LU factorization. The configuration and installation process for the library is explained
+next, followed by a description of how to use the driver that is provided by the library and
+the interface of the library. The last section explains how to obtain optimal performance
+when using this solver.
+
+# Getting sources
+ParSHUM is developed at STFC in the context of the NLAFET project, and its source
+codes are available on GitHub. The source codes can be obtained with the following git
+command:
+```console 
+$ git clone https://github.com/NLAFET/ParSHUM.git
+```
+# Configuration
+This library relies on cmake for the building of the library. The ParSHUM library depends
+only on the MKL library for the BLAS and LAPACK routines. The CFLAGS and LDFLAGS 
+for MKL must be provided by passing as arguments −DMKL_LDFLAGS and
+−DMKL_CFLAGS to the cmake script. If these two argument are not given, then the
+cmake commands fails. These can be obtained from the [MKL Link Advisor](https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor). Additionally, the SPRAL library could be provided in order to be able to read matrices stored in
+Rutherford Boeing format. This can be done by providing the install root directory of
+SPRAL to the −DSPRAL_DIR cmake’s argument.
+
+# Installing
+Once the library has been downloaded into a directory, issue the following commands in
+that directory:
+```console 
+$ mkdir build &&  cd build
+$ cmake -DMKL_LDFLAGS=<MKL LDFLAGS> -DMKL_CFLAGS=<MKL CFLAGS> -DSPRAL_DIR=<spral installation directory>
+```
+To build the library use:
+```console 
+$ make 
+```
+
+# Using the test driver
+Once the library has been built a test driver called ParSHUM_simple is created in the bin
+directory. This driver takes a large set of parameters:
+* --matrix (string) =>     specifies the path to the matrix file. Our driver is able to read classic IJV files
+(suffixed by "*.ijv" or "*.mtl"). Harwell-Boeing matrices (suffixed by "*.hb" or "*.rb")
+can be read if the library is compiled with the −DSP RAL_DIR option.
+* --marko_threshold (real) =>
+Markowitz threshold of acceptability for a pivot. This threshold should be larger
+than one. The default value is 4.
+* --threshold_value (real) =>
+threshold value of acceptability for a pivot. This threshold should be between zero
+and one. The default value is 10 −4 .
+* --schur_density_tolerance (real) =>
+maximum density allowed of the Schur complement. Once the Schur complement
+reaches this density, we switch to dense factorization. This parameter should be
+between zero and one. The default value is 0.2.
+* --nb_previous_steps (integer) =>
+number of previous steps for which we keep track of how many pivots have been
+found. This parameter should be at least one. The default value is 5.
+* --min_pivot_per_steps (integer) =>
+minimum number of pivots in the last nb_previous_steps steps. If the number
+of pivots is less than this value, the solver switches to a dense factorization. This
+parameter should be at least nb_previous_steps. The default value is ten times
+nb_previous_steps.
+* --max_dense_schur (real) =>
+the maximum allowed size for the dense factorization. If the Schur complement is
+larger than this value, then the dense factorization is not performed and an error is
+returned by the solver. The default value is 20,000.
+* --nb_threads (integer) =>
+the number of threads used. This parameter should be at least 1. The default value
+is 1.
+* --extra_space (real) =>
+the factor of the memory initially allocated for the solver in comparison with the
+size of the input matrix. The default value is 3.
+* --trace =>
+activates a creation of a trace of the execution of the algorithm in Pajé format. By
+default this option is deactivated. 
+* --verbosity (integer) =>
+Controls the verbosity level.
+If the value is smaller and equal then 0, no otput is produced. If it is  
+1 a general output is printed. If it is larger then  one, 
+ extra information about each is step is given. Additionaly, plot, data and
+fig directories are created and a data file is writen, prefixed by the parameters of
+the algorithm, for each run. A script is created in the plot directory that if compiled
+with gnuplot will create a graph of the execution time in the f ig directory. 
+
+# Using the ParSHUM interface
+In this subsection we will explain how to use the interface of the ParSHUM library. The
+source code for the test driver presented previously is presented next: 
+
+```C
+int
+main(int argc, char **argv)
+{
+  ParSHUM_solver solver;
+  ParSHUM_vector X, B;
+  /* Create the solver */
+  solver = ParSHUM_solver_create();   
+
+  /*  Parse the arguments */
+  ParSHUM_solver_parse_args(solver, argc, argv);
+  /*  Read the matrix */
+  ParSHUM_solver_read_matrix(solver);
+
+  /*  Initialize the vectors */
+  X = ParSHUM_vector_create(solver->A->n);
+  B = ParSHUM_vector_create(solver->A->n);
+  ParSHUM_vector_read_file(solver, X);
+
+  /* copy the vector B in X */
+  ParSHUM_vector_copy(B, X);
+  /*  Initialize the solver */
+  ParSHUM_solver_init(solver);
+
+  /*  Perform the factorization */
+  ParSHUM_solver_factorize(solver);
+
+  /*  Perform the solve operation */
+  ParSHUM_solver_solve(solver, B);
+
+  /*  Compute the norms */
+  ParSHUM_solver_compute_norms(solver, X, B);
+
+  /*  Finalize the solver */
+  ParSHUM_solver_finalize(solver);
+
+  /*  Free all the data */
+  ParSHUM_vector_destroy(X);
+  ParSHUM_vector_destroy(B);
+  ParSHUM_solver_destroy(solver);
+
+  return 0;
+}
+```
+First the solver structure is created by calling ParSHUM_solver_create in line 7. This will
+only allocate the basic data for the solver. Then on line 10, the arguments are processed
+and the solver’s parameters are set to the values provided. The matrix is then read from
+the file followed by the initialisation of the two vectors B and X. Once everything is in
+place, the function ParSHUM_solver_init is called which will allocate all the internal
+data that is needed for the factorization. The factorization is performed by the function
+ParSHUM_solver_factorize on line 25, followed by the solve operation on line 28 and
+3the computation of the norms on line 31. The ParSHUM_solver_finalize needs to be
+called before the data is freed. This function finalises the PLASMA library, creating the
+trace file if demanded and prints the output of the solver. Finally, all the data that has
+been used is freed (lines 37-39). 
+
+Alternatively, is possible for the user to control the input
+matrix and parameters for the solver directly in the code and we now explain how it can
+be done. In order to change the input matrix and parameters, we use the ParSHUM_solver
+structure. The only fields that
+should be modified by the user are the input matrix A and the exe_parms
+field. The ParSHUM_solver is presented next: 
+```C
+typedef struct _ParSHUM_solver {
+  ParSHUM_matrix   A;  /* The input matrix */
+
+  /* The factors */
+  ParSHUM_L_matrix L;   
+  ParSHUM_matrix   D;
+  ParSHUM_U_matrix U;
+
+  /* The Schur complement */
+  ParSHUM_schur_matrix S;
+
+  ParSHUM_exe_parms exe_parms;
+  ParSHUM_verbose verbose;
+  .....
+} * ParSHUM_solver; 
+```
+In the first example the input matrix was loaded by calling the
+function ParSHUM_solver_read_matrix. The other option is to set it as the input matrix
+in the ParSHUM_solver structure. We present the ParSHUM_matrix structure and a code  example 
+on how to assign a CSC matrix as the input matrix:
+```C
+typedef  enum _ParSHUM_matrix_type
+{
+  ParSHUM_CSC_matrix,
+  ParSHUM_CSR_matrix,
+  ParSHUM_Diag_matrix,
+  ParSHUM_Rutherford_matrix
+} ParSHUM_matrix_type;
+
+typedef struct _ParSHUM_matrix {
+  ParSHUM_matrix_type type;
+  int n;
+
+  long allocated;
+  long nz;
+
+  int *row;
+  long *col_ptr;
+  double *val;
+
+  int *col;
+  long *row_ptr;
+
+  /* This is used by the SPRAL matrix driver since it is a Fortran code (see http://www.test-numerical.rl.ac.uk/spral/doc/sphinx/C/rutherford_boeing.html) */
+  void *handle;
+} * ParSHUM_matrix;
+
+   .....
+   /* An example for assigning a CSC matrix as an input matrix. 
+   The following code should replace the line 12 in the previous example. */
+
+   ParSHUM_matrix matrix = calloc(1, sizeof(struct _ParSHUM_matrix);
+   matrix->type = ParSHUM_CSC_matrix;
+   matrix->n = n;
+   matrix->allocated = nz;
+   matrix->nz = nz;
+   /* If the input matrix is a CSR matrix, the field type should be assigned with ParSHUM_CSR_matrix and instead of assigning arrays to the row and col_ptr fields, the col and row_ptr fields should be assigned. */
+   matrix->row = row;
+   matrix->col_ptr = col_ptr;
+   solver->A = matrix;
+   .....
+```
+In the first example, the solver is parametrized
+by calling the function ParSHUM_solver_parse_args. The other option is to
+modify directly the exe_parms field of the solver structure. The exe_parms field is of
+type ParSHUM_exe_parms and an example of how to use this structure to change the
+threshold value and Markowitz threshold. All the other parameters
+could be changed in the same way.
+```C
+typedef struct ParSHUM_exe_parms {
+  double value_tol;
+  double marko_tol;
+  double extra_space;
+  double density_tolerance;
+
+  char *matrix_file;
+  char *RHS_file;
+
+  int min_pivot_per_steps;
+  int nb_threads;
+  int nb_previous_pivots;
+  int max_dense_schur;
+  int trace;
+} *ParSHUM_exe_parms;
+
+   .....
+   /* A code example for changing the parameters of the solver */ 
+   /* The following code should replace the call the ParSHUM_solver_parse_args (line 10). */
+   solver->exe_parms->value_threshold = 0.001;
+   solver->exe_parms->marko_threshold = 16;
+   .....
+```
+Once the matrix and the parameters are set, the solver should be initialised by calling
+the ParSHUM_solver_init. From this point the ParSHUM_solver structure should not
+be accessed or modified directly by the user.
+
+# Getting optimal performance with ParSHUM
+We strongly recommend that on NUMA machines, ParSHUM should be binded to a single
+NUMA node. This algorithm is highly dependent on the bandwidth of the machine and
+not on the computational power To do so, one way is to use the numactl command. For
+example:
+```console
+$ numactl --cpunodebind=0 --membind=0 ./bin/ParSHUM_simple <arguments>
+```
+will bind the process to the cores and the memory of NUMA node 0.
+Although internally in ParSHUM we bind the threads through the proc_bind option of
+the omp pragma, we have observed that this argument is ignored for some OpenMP imple-
+mentations. This could be also achieved through the OMP_PROC_BIN D environment
+variable. Additionally, 10% to 20% could be gained by setting the OMP_WAIT_POLICY
+environment variable to ACTIVE. Here is an example for an optimal run:
+```console
+$ export OMP_WAIT_POLICY=ACTIVE
+$ export OMP_PROC_BIND=close
+$ numactl --cpunodebind=0 --membind=0 ./bin/ParSHUM_simple <arguments>
+```