```python
import numpy as np
import pandas
import matplotlib.pyplot as plt
from IPython.display import display, HTML

%matplotlib inline

from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'png')
plt.rcParams['savefig.dpi'] = 75

plt.rcParams['figure.autolayout'] = False
plt.rcParams['figure.figsize'] = 10, 6
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['axes.titlesize'] = 20
plt.rcParams['font.size'] = 16
plt.rcParams['lines.linewidth'] = 2.0
plt.rcParams['lines.markersize'] = 8
plt.rcParams['legend.fontsize'] = 14

# usetex requires a working LaTeX installation; set it to False otherwise.
plt.rcParams['text.usetex'] = True
plt.rcParams['font.family'] = "serif"
plt.rcParams['font.serif'] = "Computer Modern Roman"
# Raw string, so that "\u" is not read as a unicode escape in Python 3.
# No comma between the packages: recent matplotlib passes this string to LaTeX verbatim.
plt.rcParams['text.latex.preamble'] = r"\usepackage{subdepth}\usepackage{type1cm}"
```

# VLASIATOR 

[Vlasiator](http://vlasiator.fmi.fi) is a plasma simulation code
targeting, in particular, space weather simulations. It simulates the
dynamics of plasma using a hybrid-Vlasov model, where protons are
described by their distribution function $f(r,v,t)$ in
ordinary ($r$) and velocity ($v$) space, and electrons are a
charge-neutralising fluid. This approach neglects electron kinetic
effects but retains ion kinetics. The time evolution of $f(r,v,t)$ is
given by Vlasov's equation, which is coupled self-consistently to
Maxwell's equations giving the evolution of the electric and magnetic
fields $E$ and $B$. Vlasiator propagates the distribution function
forward in time with a conservative, fifth-order accurate
semi-Lagrangian algorithm. This algorithm allows long time
steps even in the presence of strong magnetic fields, as the
propagation in velocity space is not limited by the
Courant-Friedrichs-Lewy (CFL) condition. The field solver is a
second-order accurate, divergence-free, upwind constrained transport
method.
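
Vlasov's equation, $\partial_t f + v \cdot \nabla_r f + \frac{q}{m}(E + v \times B) \cdot \nabla_v f = 0$ (with $q$ and $m$ the proton charge and mass), is an advection equation in phase space. The sketch below illustrates why a semi-Lagrangian step is not bound by the CFL condition: each grid point traces its characteristic back by $v\,\Delta t$ and interpolates there, so the Courant number $v\,\Delta t/\Delta x$ may exceed 1. It is a minimal 1D sketch assuming constant velocity, a periodic grid and linear interpolation; it is not Vlasiator's conservative fifth-order scheme, and `semi_lagrangian_step` is a hypothetical name.

```python
import numpy as np


def semi_lagrangian_step(f, velocity, dt, dx):
    """One semi-Lagrangian step for df/dt + velocity * df/dx = 0.

    Assumes a periodic 1D grid, constant velocity and linear interpolation
    (an illustrative choice, not Vlasiator's conservative fifth-order scheme).
    """
    shift = velocity * dt / dx       # Courant number; allowed to exceed 1
    whole = int(np.floor(shift))     # whole cells travelled
    frac = shift - whole             # remaining fraction of a cell, in [0, 1)
    # The characteristic ending at cell i started at i - whole - frac, i.e.
    # between cells i - whole - 1 and i - whole; np.roll(f, k)[i] == f[i - k].
    upstream_far = np.roll(f, whole + 1)
    upstream_near = np.roll(f, whole)
    return frac * upstream_far + (1.0 - frac) * upstream_near


# Usage: advect a Gaussian with Courant number 2.5, where an explicit
# first-order upwind scheme would be unstable.
x = np.arange(100)
f0 = np.exp(-0.5 * ((x - 30) / 4.0) ** 2)
f = f0.copy()
for _ in range(20):
    f = semi_lagrangian_step(f, velocity=2.5, dt=1.0, dx=1.0)

print(f.sum() - f0.sum())   # total "mass" is unchanged up to round-off
print(int(np.argmax(f)))    # peak has moved by 20 * 2.5 = 50 cells, to x = 80
```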

In ordinary space, Vlasiator uses a Cartesian mesh provided by the
[DCCRG](http://github.com/fmihpc/dccrg) library, which also handles its
parallelization. Each cell contains the field variables ($B$, $E$) as well as a
3D sparse velocity mesh. Empty velocity-space cells are neither stored
nor propagated, which in a typical case reduces the total number of
phase-space cells by a factor of at least 100. Large-scale
simulations typically have on the order of a few million
spatial cells in ordinary space, with $10^{12}$ cells in total
in the full distribution function.
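
For scale, $10^{12}$ phase-space cells spread over a few million spatial cells corresponds to a few hundred thousand velocity cells per spatial cell. The idea of the sparse velocity mesh can be sketched as follows: the velocity space of one spatial cell is split into small blocks, and only blocks containing a value above a threshold are kept. This is a minimal sketch under stated assumptions: the $4\times4\times4$ block size, the threshold and the function name `sparse_velocity_mesh` are illustrative choices, not Vlasiator's actual data structure or parameters, and the toy Maxwellian is not meant to reproduce the reduction factor quoted above for real simulations.

```python
import numpy as np


def sparse_velocity_mesh(f, threshold, block=4):
    """Keep only the velocity blocks of a cubic grid that contain f >= threshold.

    Returns a dict mapping block indices (i, j, k) to copies of the
    block x block x block sub-arrays. Assumes the grid side is a multiple
    of `block` (an illustrative block size, not necessarily Vlasiator's).
    """
    n = f.shape[0] // block
    # tiles[i, p, j, q, k, r] == f[i*block + p, j*block + q, k*block + r]
    tiles = f.reshape(n, block, n, block, n, block)
    block_max = tiles.max(axis=(1, 3, 5))
    kept = np.argwhere(block_max >= threshold)
    return {
        (int(i), int(j), int(k)): tiles[i, :, j, :, k, :].copy()
        for i, j, k in kept
    }


# Usage: a Maxwellian with unit thermal speed on a 64^3 velocity grid.
v = np.linspace(-8.0, 8.0, 64)
vx, vy, vz = np.meshgrid(v, v, v, indexing="ij")
f = np.exp(-0.5 * (vx**2 + vy**2 + vz**2))
blocks = sparse_velocity_mesh(f, threshold=1e-3)
stored_cells = len(blocks) * 4**3
print(f"stored {stored_cells} of {f.size} cells, "
      f"reduction factor {f.size / stored_cells:.1f}")
```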

The Cartesian mesh is parallelized with MPI and uses the Zoltan
library for dynamic load balancing. The code relies heavily on
user-defined MPI datatypes, and it is furthermore threaded. Typically
the loops over spatial cells are threaded, but where data dependencies
demand it, other approaches have also been used. Finally, the Vlasov
solver, which accounts for up to 90% of the total run time, is
vectorized explicitly using Agner Fog's [vectorclass](http://www.agner.org/optimize/#vectorclass) library.
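
The hybrid parallelization described above, where each MPI process owns a set of spatial cells and threads share the loop over them, can be sketched in Python. This is an illustrative sketch only: the static contiguous split in the hypothetical `owned_cells` stands in for Zoltan's dynamic load balancing, `update_cell` is a hypothetical placeholder for the per-cell solver work, the "ranks" are simulated sequentially in one process, and Python threads only run in parallel when the work releases the GIL (as many NumPy operations do).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def owned_cells(n_cells, rank, n_ranks):
    """Indices of the spatial cells owned by one MPI rank (static split)."""
    start = rank * n_cells // n_ranks
    stop = (rank + 1) * n_cells // n_ranks
    return range(start, stop)


def update_cell(f):
    """Hypothetical per-cell work on one spatial cell's distribution."""
    return float(np.sum(f))


def rank_step(distributions, rank, n_ranks, n_threads):
    """Thread the loop over this rank's own spatial cells."""
    mine = owned_cells(len(distributions), rank, n_ranks)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(update_cell, (distributions[i] for i in mine)))


# Usage: 10 spatial cells over 4 simulated ranks, each rank using 4 threads.
cells = [np.full((8, 8, 8), float(i)) for i in range(10)]
results = [rank_step(cells, r, n_ranks=4, n_threads=4) for r in range(4)]
print([len(r) for r in results])  # → [2, 3, 2, 3]
```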
 

## Porting 

### Xeon Phi Knights Landing

The necessary changes were made in a GitHub branch, available
here: LINK. The main changes were:

  * Added an interface for also using the 512-bit Vec16f and Vec8d datatypes of vectorclass, which match the width of the KNL's AVX-512 vector registers.
  * Added the compiler flags needed for good performance on the KNL.
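
Explicit vectorization with vectorclass processes data in fixed-width vectors; Vec16f and Vec8d set that width to 16 floats or 8 doubles (512 bits). The loop structure this implies, full-width chunks plus a remainder, is sketched below in NumPy. It only illustrates the general strip-mining pattern: the `LANES` constant and the `axpy_strip_mined` function are hypothetical, and real vectorclass code is C++ operating on hardware registers.

```python
import numpy as np

LANES = 8  # elements per vector: 8 doubles in a 512-bit register (as in Vec8d)


def axpy_strip_mined(a, x, y, lanes=LANES):
    """Compute a * x + y in chunks of `lanes` elements plus a scalar tail."""
    out = np.empty_like(y)
    n_full = len(x) - len(x) % lanes
    # Full-width chunks: each slice stands for one vector operation.
    for start in range(0, n_full, lanes):
        sl = slice(start, start + lanes)
        out[sl] = a * x[sl] + y[sl]
    # Remainder that does not fill a whole vector, handled element by element.
    for i in range(n_full, len(x)):
        out[i] = a * x[i] + y[i]
    return out


x = np.arange(21, dtype=np.float64)
y = np.ones(21)
print(axpy_strip_mined(2.0, x, y)[:4])  # first four values: 1, 3, 5, 7
```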
 
The main challenge was that the code was not compatible with the
Intel MPI stack. Using Intel MPI, version 16 or 17, led to crashes
very early in the run, whereas with other MPI libraries the code is
very stable. The root cause was not identified. By compiling Open MPI
and using it instead, good performance was achieved on a KNL
development platform.


## Performance
### Test cases 

To test the performance of the code, three test cases of different
size have been created. Each of them is a very low-resolution version
of a real space weather simulation that fits on one node.


```python
# Test cases and their memory usage (units are not given in the original notebook).
# Column names avoid underscores, which LaTeX (text.usetex) would reject in labels.
name = ("small", "medium", "large")
mem_usage = (1, 9, 40)
df = pandas.DataFrame({"test case": name, "memory usage": mem_usage})
display(df)
df.plot(x="test case", y="memory usage", kind="bar", legend=False)
```





### Vectorization 
  

### Optimal run parameters

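To investigate the optimal balance between threads and MPI processes,
we ran the code with 4 threads per core on 64 cores (256 threads in
total), varying the number of MPI processes. The number of threads per
process therefore ranges from 256 (1 process) to 4 (64 processes).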

| Test case |     1 |     4 |     8 |    16 |    32 |    64 |
|-----------|------:|------:|------:|------:|------:|------:|
| Small     |  48.7 |  64.0 |  71.6 |  83.5 |  89.6 |  87.1 |
| Medium    | 154.5 | 112.5 | 110.4 | 110.2 | 108.8 | 107.1 |
| Large     |  74.9 | 110.7 | 108.8 | 112.9 | 111.3 | 113.9 |

Table: Performance in GigaCells/s as a function of the number of MPI processes (column headers), with 256 threads in total.
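
The table can be loaded into pandas to derive the number of threads per process and the best-performing process count for each test case. This is a minimal sketch; the variable names are illustrative, and the values are copied from the table above.

```python
import pandas as pd

processes = [1, 4, 8, 16, 32, 64]
cases = ["Small", "Medium", "Large"]
# Performance in GigaCells/s, copied from the table above.
perf = pd.DataFrame(
    {
        "Small": [48.7, 64.0, 71.6, 83.5, 89.6, 87.1],
        "Medium": [154.5, 112.5, 110.4, 110.2, 108.8, 107.1],
        "Large": [74.9, 110.7, 108.8, 112.9, 111.3, 113.9],
    },
    index=pd.Index(processes, name="processes"),
)

TOTAL_THREADS = 256  # 4 threads per core on 64 cores, as in the runs above
perf["threads_per_process"] = TOTAL_THREADS // perf.index.to_numpy()

# Process count with the highest performance for each test case.
best = perf[cases].idxmax()
print(perf)
print(best)  # Small: 32, Medium: 1, Large: 64 processes
```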