A Vectorized Implementation of the Tersoff Potential for the LAMMPS Molecular Dynamics Software ==================================================== Author: Markus Höhnerbach <firstname.lastname@example.org> Date: 4 Aug 2016 This project provides the source code of a vectorized implementation of the Tersoff potential. We target a variety of processors with conventional vector instruction sets such as NEON, SSE, AVX, and AVX2, the first and second generation of the Xeon Phi accelerator, as well as NVIDIA GPUs. There is experimental support for platform-agnostic vectorization through the Cilk array notation. Supported compilers: ICC 14.0, 15.0 or 16.0, GCC (ARM) Supported MPI: Intel MPI The code builts upon the existing Xeon Phi support and vectorization capabilities of the USER-INTEL LAMMPS package as well as the GPU support from the KOKKOS package. Overview -------- benchmarks/ vect/ very simple benchmark to measure vect. efficiency. lammps/ input files, parameter files and scripts to conduct benchmarking and accuracy tests. Subfolders contain results from real-world systems. machines/ lammps-10Mar16/ complete lammps source code that is certain to work with the provided source code. <a>-<b>_<c>/ folder to build lammps on a specific system. Names: a = organization, b = CPU arch, c = accelerator. These folders contain a build.sh script that shows how to build binaries to experiment with on a given system. src/ The core source code that contains the vectorized Tersoff potential. Can be dropped into an existing LAMMPS install with USER-INTEL package installed, and should just work. test/ Contains a script to test the code against bothh the benchmark and randomly generated systems of multiple species. Invoke the python script with the binary that you would like to test. For now only works with the USER-INTEL package. Installation (simple) --------------------- To try this code out, download LAMMPS from lammps.sandia.gov, and extract the files to some directory $LAMMPS_DIR. In the following, $THIS denotes the directory where this README is located. You need to enable the packages MANYBODY, USER-OMP and USER-INTEL: $ cd $LAMMPS_DIR/src $ make yes-MANYBODY yes-USER-OMP yes-USER-INTEL Copy the files pair_tersoff_intel.h, pair_tersoff_intel.cpp and intel_intrinsics.h from $THIS/src/ to $LAMMPS_DIR/src. Build LAMMPS (make sure to have ICC with offloading support and Intel MPI loaded): $ make intel_phi This creates a binary $LAMMPS_DIR/src/lmp_intel_phi. Testing (simple) ---------------- To test this binary, use the provided test-script: $ cd $THIS/test $ python test.py $LAMMPS_DIR/src/lmp_intel_phi All the tests should turn green. Usage ----- For further usage instructions, please have a look at the documentation of the USER-INTEL package. The code neatly plugs into that framework, all you need to do is 1. specify the correct "package intel" command according to the USER-INTEL docs, to initialize the correct usage mode. 2. use the Tersoff potential and set the suffix to "intel" Getting Started --------------- If you just want to try out the code and make some obvservations on its performance, the easiest way to do so is to download the LAMMPS-provided benchmark for the Tersoff potential, and pass the correct options via the command line. $ http://lammps.sandia.gov/bench/bench_tersoff.tar.gz $ tar xfz bench_tersoff.tar.gz $ cd tersoff $ $LAMMPS_DIR/src/lmp_intel_phi -in in.tersoff -pk omp 0 \ -pk intel 1 balance $BALANCE mode $MODE -sf intel 1. Choose $MODE as either single, double or mixed depending on the precision you want the run to use. 2. Choose $BALANCE according to where you want to run: 0 runs everything on the host, 1 everything on the Phi, values in between split the computation. -1 will perform automatic load balancing. In-Depth Benchmarking --------------------- For in-depth benchmarking, build all the binaries that you would like to investigate (machines/*/build.sh show how to build a variety of targets). For single-node benchmarking, benchmarks/lammps contains shell scripts to conduct a number of experiments. For multi-node benchmarking, machines/lrz-ib_phi contains a python script to showcase how to create job-scripts to be submitted to a batch system. If you can't run the code on suitable machines, check out the result folders, i.e. benchmarks/lammps/results* and machines/lrz-ib_phi/run*, as they contain real-world data from a selection of machines. Limitations ----------- It inherits all the limitations inherent to the USER-INTEL package or the KOKKOS package, please look at that documentation for details. Reference --------- There is a preprint describing this work on arXiv.org: https://arxiv.org/abs/1607.02904 License ------- The code is licensed in accordance with the LAMMPS copyright under the GNU General Public License Version 2 onwards. The vector math functions in vector_math_neon.h are copyrighted by Julien Pommier under the zlib license.