
# Sensitivities: Bump vs AAD 

This dashboard demonstrates the accuracy and performance of sensitivities computed with adjoint algorithmic differentiation (AAD) for a range of complex products:
- European Option
- Barrier Option
- Accumulator
- Asian Basket Option
- TaRF

all represented as scripted trades.

We run ORE twice, with different settings in the pricingengine.xml file:
- setting UseAD to false for the bump run and
- setting UseAD to true for AAD.
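
To make the difference between the two runs concrete, here is a minimal sketch that is independent of ORE. It prices a European call by Monte Carlo and computes delta and vega twice: once by central bump and revalue, and once with a hand-written adjoint (reverse) sweep per path, which stands in for an automatic AAD tape. All function names, market parameters and bump sizes are assumptions chosen for illustration; this is not how ORE implements UseAD.

```python
import math
import random

def simulate_normals(n_paths, seed=42):
    """Draw one standard normal per path with a fixed seed (reproducible)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_paths)]

def mc_call_price(s0, sigma, normals, k=100.0, r=0.02, t=1.0):
    """Monte Carlo price of a European call under Black-Scholes dynamics."""
    df = math.exp(-r * t)
    total = 0.0
    for z in normals:
        s_t = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        total += df * max(s_t - k, 0.0)
    return total / len(normals)

def bump_sensitivities(s0, sigma, normals, h_spot=1e-4, h_vol=1e-6):
    """Central differences: two full revaluations per risk factor."""
    delta = (mc_call_price(s0 + h_spot, sigma, normals)
             - mc_call_price(s0 - h_spot, sigma, normals)) / (2.0 * h_spot)
    vega = (mc_call_price(s0, sigma + h_vol, normals)
            - mc_call_price(s0, sigma - h_vol, normals)) / (2.0 * h_vol)
    return delta, vega

def adjoint_sensitivities(s0, sigma, normals, k=100.0, r=0.02, t=1.0):
    """Hand-written adjoint sweep: all sensitivities from a single pass."""
    df = math.exp(-r * t)
    delta = 0.0
    vega = 0.0
    for z in normals:
        # Forward pass: exponent and terminal spot.
        x = (r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z
        s_t = s0 * math.exp(x)
        # Reverse pass: adjoint of the discounted payoff w.r.t. s_t ...
        s_t_bar = df if s_t > k else 0.0
        # ... propagated back to the inputs s0 and sigma.
        delta += s_t_bar * s_t / s0
        vega += s_t_bar * s_t * (-sigma * t + math.sqrt(t) * z)
    n = len(normals)
    return delta / n, vega / n

normals = simulate_normals(20_000)
delta_bump, vega_bump = bump_sensitivities(100.0, 0.2, normals)
delta_adj, vega_adj = adjoint_sensitivities(100.0, 0.2, normals)
print("delta bump / adjoint:", delta_bump, delta_adj)
print("vega  bump / adjoint:", vega_bump, vega_adj)
```

Both methods reuse the same random numbers, so the results should agree closely. The bump approach needs two revaluations per risk factor, whereas the adjoint sweep yields all sensitivities in one pass, which is where the speedup typically comes from when there are many risk factors.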

## Run ORE, Bump & Reval Sensitivities, Single-Threaded

In [None]:
from ORE import *
import sys, time, math
sys.path.append('..')
import utilities

In [None]:
params = Parameters()
params.fromFile("Input/ore.xml")
ore = OREApp(params)
ore.run()
utilities.checkErrorsAndRunTime(ore)
#utilities.writeList(ore.getReportNames())

## Run ORE using AAD

Check the following is set in **pricingengine_ad.xml**
- **UseAD** is set to true

Check the following is set in **sensitivity.xml**
- **ComputeGamma** is set to false; otherwise AD is deactivated, because it does not yet support gamma and cross-gamma calculations
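
The two checks above can also be scripted. The following is a rough sketch that assumes the engine parameters appear as `<Parameter name="...">` elements under a `<Product type="ScriptedTrade">` node, and that **ComputeGamma** is a plain element in the sensitivity configuration. The sample XML strings and the helper names `engine_parameter`, `element_text` and `ad_is_effective` are hypothetical; compare them against your actual input files before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified layouts of the two configuration files (illustration only).
SAMPLE_PRICING_ENGINE = """
<PricingEngines>
  <Product type="ScriptedTrade">
    <Model>ScriptedTrade</Model>
    <Engine>ScriptedTrade</Engine>
    <EngineParameters>
      <Parameter name="UseAD">true</Parameter>
    </EngineParameters>
  </Product>
</PricingEngines>
"""

SAMPLE_SENSITIVITY = """
<SensitivityAnalysis>
  <ComputeGamma>false</ComputeGamma>
</SensitivityAnalysis>
"""

def engine_parameter(xml_text, product_type, name):
    """Return the text of a named parameter for one product type, or None."""
    root = ET.fromstring(xml_text)
    for product in root.iter("Product"):
        if product.get("type") != product_type:
            continue
        for param in product.iter("Parameter"):
            if param.get("name") == name:
                return (param.text or "").strip()
    return None

def element_text(xml_text, tag):
    """Return the stripped text of the first descendant with this tag, or None."""
    node = ET.fromstring(xml_text).find(f".//{tag}")
    if node is None or node.text is None:
        return None
    return node.text.strip()

def ad_is_effective(pricing_xml, sensi_xml, product_type="ScriptedTrade"):
    """True only if UseAD is true and ComputeGamma is false."""
    use_ad = engine_parameter(pricing_xml, product_type, "UseAD")
    gamma = element_text(sensi_xml, "ComputeGamma")
    return (use_ad or "").lower() == "true" and (gamma or "").lower() == "false"

print(ad_is_effective(SAMPLE_PRICING_ENGINE, SAMPLE_SENSITIVITY))
```

To apply this to real files, read their contents into strings first, for example with `pathlib.Path(...).read_text()`.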


In [None]:
params_ad = Parameters()
params_ad.fromFile("Input/ore_ad.xml")
ore_ad = OREApp(params_ad)
ore_ad.run()
utilities.checkErrorsAndRunTime(ore_ad)
#utilities.writeList(ore_ad.getReportNames())

In [None]:
# Compare Pricing Stats
utilities.match_pricingstats_12(ore, ore_ad, "Speedup", True)

In [None]:
# Compare NPV reports, expecting identical results
npv = ore.getReport("npv")
npv_ad = ore_ad.getReport("npv")
print("Bump & Reval NPV Report:")
display(utilities.format_report(npv))
print("AD NPV Report:")
display(utilities.format_report(npv_ad))

In [None]:
# Compare Sensitivities
utilities.match_sensi_reports(ore, ore_ad, "DeltaBump", "DeltaAD", False)
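
The internals of `utilities.match_sensi_reports` are not shown in this notebook. As an independent sketch of what such a comparison might involve, the snippet below matches two sets of deltas by (trade, factor) key and reports absolute and relative differences. The function name `compare_sensitivities`, the dictionary layout and the numbers are made up for illustration.

```python
def compare_sensitivities(reference, candidate, abs_floor=1e-8):
    """Match two {(trade_id, factor): delta} dicts and compute differences.

    Keys missing on one side are treated as a zero sensitivity. The relative
    difference is None when the reference value is too small to divide by.
    """
    rows = []
    for key in sorted(set(reference) | set(candidate)):
        ref = reference.get(key, 0.0)
        cand = candidate.get(key, 0.0)
        diff = cand - ref
        rel = diff / ref if abs(ref) > abs_floor else None
        rows.append((key, ref, cand, diff, rel))
    return rows

# Made-up example values, not ORE output.
example_bump = {("Trade1", "EquitySpot/SP5"): 2.0, ("Trade1", "vol"): 0.0}
example_ad = {("Trade1", "EquitySpot/SP5"): 2.02}
for row in compare_sensitivities(example_bump, example_ad):
    print(row)
```

Guarding the relative difference with a floor avoids misleading ratios for sensitivities that are numerically zero in the reference run.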

### Discussion
- Shift Scheme (EQ Spot)
- Shift Size (Vega)
- Lowering the BarrierLevel in EquityBarrierOption
- Lowering the KnockOutLevel in EquityAccumulator
- Lowering the KnockOutProfitAmount in FxTARF, introducing varying range strikes or leverages
- Todo: Add CAM

# Sensitivities: CPU vs GPU

## Run with ComputationGraph Enabled

Check the following is set in **pricingengine_cg.xml**.
- **UseCG** is set to true
- **ExternalDeviceCompatibilityMode** is set to true

The latter ensures that we use the same ordering of random variables here and in the subsequent runs using external devices.

In [None]:
params_cg = Parameters()
params_cg.fromFile("Input/ore_cg.xml")
ore_cg = OREApp(params_cg)
ore_cg.run()
utilities.checkErrorsAndRunTime(ore_cg)
#utilities.writeList(ore_cg.getReportNames())

In [None]:
# Compare Pricing Stats
utilities.match_pricingstats_12(ore, ore_cg, "Speedup", False)

### Compare Sensitivities

If we set the pricing engine's ExternalDeviceCompatibilityMode to false here, we should see zero deviations from the reference run above.

If we set ExternalDeviceCompatibilityMode to true, we see differences below because the random variates are ordered differently in the two runs. However, the CG run then uses the same ordering as the GPU run below.

To check the former, set ExternalDeviceCompatibilityMode=false first, rerun ore_cg, and confirm a perfect match of sensitivities.
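
The effect of variate ordering can be reproduced outside ORE. In the sketch below the same stream of normals is assigned to (path, time step) pairs in two different layouts, and an arithmetic Asian call is priced with each. The layouts `path_major` and `step_major`, the function names and all parameters are hypothetical and do not claim to reflect the orderings ORE actually uses; the point is only that a different assignment of identical random numbers gives a different, equally valid Monte Carlo estimate.

```python
import math
import random

def draw_stream(n_paths, n_steps, seed=7):
    """One flat stream of standard normals, reproducible via the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_paths * n_steps)]

def variate(stream, path, step, n_paths, n_steps, ordering):
    """Pick the normal for (path, step) under a given storage layout."""
    if ordering == "path_major":
        return stream[path * n_steps + step]
    if ordering == "step_major":
        return stream[step * n_paths + path]
    raise ValueError(f"unknown ordering: {ordering}")

def asian_call_mc(stream, n_paths, n_steps, ordering,
                  s0=100.0, k=100.0, r=0.02, sigma=0.2, t=1.0):
    """Arithmetic-average Asian call priced by Monte Carlo."""
    dt = t / n_steps
    drift = (r - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for p in range(n_paths):
        s = s0
        running = 0.0
        for i in range(n_steps):
            z = variate(stream, p, i, n_paths, n_steps, ordering)
            s *= math.exp(drift + vol * z)
            running += s
        total += max(running / n_steps - k, 0.0)
    return math.exp(-r * t) * total / n_paths

stream = draw_stream(5000, 12)
price_path_major = asian_call_mc(stream, 5000, 12, "path_major")
price_step_major = asian_call_mc(stream, 5000, 12, "step_major")
print("path-major:", price_path_major)
print("step-major:", price_step_major)
```

The two prices differ only by Monte Carlo noise, while rerunning with the same ordering reproduces the result exactly. This is why two runs should only be expected to match exactly when they share the same ordering.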

In [None]:
# Compare Sensitivities: This run vs reference 
utilities.match_sensi_reports(ore, ore_cg, "DeltaBump", "DeltaCG", False)

## Run again using the external GPU device

Note that this functionality is work in progress.

Using an external device requires a few more changes in the EngineParameters section of **pricingengine_gpu.xml**:
- set **UseCG** to true
- set **UseExternalComputeDevice** to true
- set **ExternalComputeDevice** to the device name, e.g. **BasicCpu/Default/Default** or (on my machine) **OpenCL/Apple/Apple M2 Max**

Determine your available compute devices by running the QuantExt test suite with **quantext-test-suite --log_level=message --run_test="*/ComputeEnvironmentTest/testEnvironmentInit"**

**BasicCpu/Default/Default** refers to the CPU itself but treats it as an external device. This is useful for sanity-checking the implementation: it should yield a perfect match in the sensitivity comparison below, but only a moderate change in performance, since we do not parallelize.

**OpenCL/Apple/Apple M2 Max** is a **38-core GPU** device that should yield a noticeable performance improvement by distributing the calculations across the GPU cores.

In [None]:
params_gpu = Parameters()
params_gpu.fromFile("Input/ore_gpu.xml")
ore_gpu = OREApp(params_gpu)
ore_gpu.run()
utilities.checkErrorsAndRunTime(ore_gpu)
#utilities.writeList(ore_gpu.getReportNames())

In [None]:
# Compare Pricing Stats
utilities.match_pricingstats_123(ore_cg, ore_ad, ore_gpu, "SpeedupAD", "SpeedupGPU", False)

In [None]:
# Compare Sensitivities
# to the reference run with both UseCG and ExternalDeviceCompatibilityMode set to true
utilities.match_sensi_reports(ore_cg, ore_gpu, "DeltaBump", "DeltaGPU", False)

### Todos
- check the OpenCL implementation with devices that support double precision, e.g. Nvidia
- add random number generators to the OpenCL implementation (MT antithetic, Sobol, Sobol BB)
- CUDA implementation