# Adaptive PDE discretizations on Cartesian grids
## Volume : Reproducible research
## Part : Eikonal CPU/GPU solvers comparison
## Chapter : Asymmetric-Quadratic metrics

In this notebook, we solve Asymmetric-Quadratic eikonal equations on the CPU and the GPU, and check that they produce consistent results.

*Note on the numerical schemes*: The CPU and GPU solvers use entirely different numerical schemes in the Asymmetric-Quadratic case. The CPU version uses a causal semi-Lagrangian scheme in two dimensions, and a causal Eulerian scheme in three dimensions, the latter being only approximately consistent. The GPU version uses a non-causal (yet monotone) Eulerian scheme, in any dimension.

[**Summary**](Summary.ipynb) of volume Reproducible research, this series of notebooks.

[**Main summary**](../Summary.ipynb) of the Adaptive Grid Discretizations book of notebooks, including the other volumes.

# Table of contents
  * [1. Two dimensions](#1.-Two-dimensions)
  * [2. Three dimensions](#2.-Three-dimensions)



**Acknowledgement.** Some of the experiments presented in these notebooks are part of 
ongoing research with Ludovic Métivier and Da Chen.

Copyright Jean-Marie Mirebeau, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay

## 0. Importing the required libraries

In [1]:
import sys; sys.path.insert(0,"..")
#from Miscellaneous import TocTools; print(TocTools.displayTOC('AsymQuad_GPU','Repro'))

In [2]:
from agd import AutomaticDifferentiation as ad
if ad.cupy_generic.cp is None: raise ad.DeliberateNotebookError('Cupy module required')
from agd import Eikonal
from agd import Metrics
from agd import FiniteDifferences as fd
from agd import LinearParallel as lp
import agd.AutomaticDifferentiation.cupy_generic as cugen

DeliberateNotebookError: Cupy module required

In [3]:
import cupy as cp
import numpy as np
import itertools
from matplotlib import pyplot as plt
np.set_printoptions(edgeitems=30, linewidth=100000, formatter=dict(float=lambda x: "%5.3g" % x))

In [4]:
from agd.ExportedCode.Notebooks_Repro.Isotropic_GPU import RunCompare

In [5]:
cp = ad.functional.decorate_module_functions(cp,cugen.set_output_dtype32)
plt = ad.functional.decorate_module_functions(plt,cugen.cupy_get_args)
Eikonal.dictIn.default_mode = 'gpu'

In [6]:
def ReloadPackages():
    from Miscellaneous.rreload import rreload
    global Eikonal,ad,cugen,Metrics
    Eikonal,ad,cugen,Metrics = rreload([Eikonal,ad,cugen,Metrics],"../..")    
    Eikonal.dictIn.default_mode = 'gpu'

### 0.1 Additional configuration

In [7]:
large_instances = False # True favors the GPU code (CPU times may become a bit long.)
strong_anisotropy = True # True favors the CPU code 

## 1. Two dimensions

In [8]:
n=2000 if large_instances else 200
asym = 4. if strong_anisotropy else 1.
hfmIn = Eikonal.dictIn({
    'model':'AsymmetricQuadratic2',
    'seed':[0.,0.],
    'exportValues':1,
    'factoringRadius':20,
    'count_updates':True,
#    'fim_front_width':6,'traits':{'niter_i':10}
})
hfmIn.SetRect([[-1,1],[-1,1]],dimx=n+1,sampleBoundary=True)
hfmIn.SetUniformTips((6,6))
hfmIn['metric'] = Metrics.AsymQuad(cp.eye(2),cp.array([asym,0.]) ).rotate_by(cp.array(0.5))
X = hfmIn.Grid()

The CPU and GPU codes produce similar results, despite their very distinct implementations.

In [9]:
gpuOut,cpuOut = RunCompare(hfmIn,check=0.015)

Setting the kernel traits.
Preparing the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 0.26004719734191895 seconds, and 99 iterations.
Post-Processing
--- gpu done, turning to cpu ---
Field verbosity defaults to 1
Field cosAngleMin defaults to 0.5
Field refineStencilAtWallBoundary defaults to 0
Field order defaults to 1
Field seedRadius defaults to 2
Fast marching solver completed in 0.046 s.
Field geodesicSolver defaults to Discrete
Field geodesicStep defaults to 0.25
Field geodesicWeightThreshold defaults to 0.001
Field geodesicVolumeBound defaults to 8.45
Ended Geodesic Discrete Solver
Unused fields from user: count_updates 
********************
Solver time (s). GPU : 0.26004719734191895, CPU : 0.092. Device acceleration : 0.3537819324352696
Max |gpuValues-cpuValues| :  0.011850670197332214


In [10]:
plt.title('Level lines of a constant asymmetric quadratic metric'); plt.axis('equal')
plt.contour(*X,gpuOut['values']);

The geodesics are straight lines, as expected for a constant metric.

In [11]:
plt.axis('equal')
for geo in gpuOut['geodesics']: plt.plot(*geo)

The GPU acceleration depends on the strength of the anisotropy, on the instance size, and on the dimension.
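
As a small worked example, the "Device acceleration" figure printed in the solver logs appears to be the ratio of CPU solver time to GPU solver time. The sketch below recomputes it from the timings reported in this notebook; the helper name `device_acceleration` is hypothetical and is not part of the agd library.

```python
def device_acceleration(cpu_time, gpu_time):
    """Ratio of CPU to GPU solver time; a value above 1 means the GPU was faster."""
    return cpu_time / gpu_time

# Solver times (in seconds) copied from the logs of this notebook.
accel_2d = device_acceleration(cpu_time=0.092, gpu_time=0.26004719734191895)
accel_3d = device_acceleration(cpu_time=0.6910000000000001, gpu_time=0.06084585189819336)

print(f"2D acceleration: {accel_2d:.3f}")
print(f"3D acceleration: {accel_3d:.3f}")
```

With the small default instances, the two-dimensional ratio is below one (the CPU is faster), whereas the three-dimensional ratio is well above one.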

**Number of updates per block.**
Using the basic AGSI scheme yields a large number of updates per block, often in excess of $300$. This is presumably due to the non-causality of the scheme. A custom variant of the fast iterative method significantly reduces this computation time.

In [12]:
nupdate = gpuOut['stats']['eikonal']['nupdate_o'].get()
np.mean(nupdate),np.max(nupdate)

(16.89644970414201, 36)

The number of updates is increasing almost linearly in all directions, from the seed point. This is a strong hint that a mechanism for bounding the front width is needed.

In [13]:
plt.contourf(gpuOut['stats']['eikonal']['nupdate_o'])
plt.axis('equal'); plt.colorbar();

## 2. Three dimensions

In [14]:
n=200 if large_instances else 50
asym = 4. if strong_anisotropy else 1.
hfmIn = Eikonal.dictIn({
    'model':'AsymmetricQuadratic3',
    'seed':[0.,0.,0.],
    'exportValues':1,
#    'factoringRadius':20, # 3D cpu version does not support factoring
    'count_updates':True,
    'fim_front_width':5,'traits':{'niter_i':4}, # Improves GPU times a bit
#    'bound_active_blocks':True,
})
hfmIn.SetRect([[-1,1],[-1,1],[-1,1]],dimx=n+1,sampleBoundary=True)
hfmIn.SetUniformTips((6,6,6))
hfmIn['metric'] = Metrics.AsymQuad(cp.eye(3),cp.array([asym,0.,0.]) ).rotate_by(cp.array(0.5),cp.array([1.,2.,3.]) )
X = hfmIn.Grid()

In [15]:
gpuOut,cpuOut = RunCompare(hfmIn,check=0.15)

Setting the kernel traits.
Preparing the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 0.06084585189819336 seconds, and 82 iterations.
Post-Processing
--- gpu done, turning to cpu ---
Field verbosity defaults to 1
Field eps defaults to 0.3
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 0.359 s.
Field geodesicSolver defaults to Discrete
Field geodesicStep defaults to 0.25
Field geodesicWeightThreshold defaults to 0.001
Field geodesicVolumeBound defaults to 10.985
Ended Geodesic Discrete Solver
Unused fields from user: count_updates fim_front_width 
********************
Solver time (s). GPU : 0.06084585189819336, CPU : 0.6910000000000001. Device acceleration : 11.356567102654328
Max |gpuValues-cpuValues| :  0.1010283741464284


Again, the number of GPU updates is higher than we would like it to be. The situation is not as bad as in two dimensions, however, because the domain spans fewer grid points along each axis.

In [16]:
nupdate = gpuOut['stats']['eikonal']['nupdate_o'].get()
np.mean(nupdate),np.max(nupdate)

(17.548020027309967, 38)

The GPU solver can handle source factorization, which substantially reduces the error: in the run below, the mean absolute error drops from about $0.098$ to about $0.0065$.

In [17]:
hfmIn['factoringRadius']=20
gpuOut2 = hfmIn.Run()

Setting the kernel traits.
Preparing the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 0.06180834770202637 seconds, and 82 iterations.
Post-Processing


In [18]:
exact = hfmIn['metric'].norm(X)
print('Error without factorization : ',np.mean(np.abs(gpuOut['values']-exact)) )
print('Error with factorization : ',np.mean(np.abs(gpuOut2['values']-exact)) )

Error without factorization :  0.098416954
Error with factorization :  0.006475558
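
The reference solution above is `hfmIn['metric'].norm(X)`: since the metric is constant and the seed is at the origin, the distance from the seed to a point is the norm of that point. A minimal sketch of such a norm follows, assuming the usual asymmetric quadratic form $F(v) = \sqrt{\langle v, M v\rangle + \langle w, v\rangle_+^2}$. The function `asym_quad_norm` is a hypothetical plain-Python helper and not the agd implementation, and the rotation applied in the notebook is omitted.

```python
import math

def asym_quad_norm(v, M, w):
    """Assumed form F(v) = sqrt(<v, M v> + max(0, <w, v>)^2), for a single vector v."""
    n = len(v)
    quad = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
    pos = max(0.0, sum(wi * vi for wi, vi in zip(w, v)))  # positive part of <w, v>
    return math.sqrt(quad + pos * pos)

M = [[1.0, 0.0], [0.0, 1.0]]  # identity, as in cp.eye(2)
w = [4.0, 0.0]                # asym = 4, unrotated

along = asym_quad_norm([1.0, 0.0], M, w)     # moving along w is penalized
against = asym_quad_norm([-1.0, 0.0], M, w)  # the positive part vanishes
print(along, against)
```

The two values differ, which is the asymmetry of the metric: travel in the direction of $w$ costs more than travel in the opposite direction.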
