In [1]:
import numpy as np

# STATS 607 
## Week 4: Parallel processing

## Why parallelize
- Single-core CPU performance stopped improving quickly well over a decade ago; clock speeds have been roughly flat since the mid-2000s.
- See [The Free Lunch Is Over](http://www.gotw.ca/publications/concurrency-ddj.htm).
- $\therefore$ If you want your code to go faster, you need to learn how to parallelize it.

![breakdown of Moore's law](http://www.gotw.ca/images/CPU.png)

## Different ways to parallelize
- Across CPU cores  (multiprocessing/multithreading)
- Across GPU cores (CUDA)
- Across different machines (cluster computing)

## CPUs vs GPUs
- GPUs are best at applying the same computation over arrays of data.
- CPUs are better for algorithms that:
  - include conditional branches
  - have high memory overhead/require lots of memory
  - require any sort of communication.
- Much of the current innovation in HPC is happening in the GPU space, driven by deep learning.
- $\therefore$ Design your algorithm to take advantage of the hardware, even if that means writing it in a way that is (a) weird or (b) slightly suboptimal.
- **GPUs are not "automatically faster" than CPUs**
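
As a rough illustration of designing for the hardware, the sketch below rewrites a branchy per-element loop as a single array expression using `np.where`. It runs on the CPU with NumPy, but the branch-free form is the kind of "same computation over an array" that GPU array libraries handle well. The function names and the payoff rule are made up for illustration.

```python
import numpy as np

def payoff_loop(x, k):
    # Branchy version: one Python-level conditional per element.
    out = np.empty_like(x)
    for i, xi in enumerate(x):
        if xi > k:
            out[i] = xi - k
        else:
            out[i] = 0.0
    return out

def payoff_vectorized(x, k):
    # Same rule applied uniformly to the whole array; no per-element branch in Python.
    return np.where(x > k, x - k, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(100.0, 20.0, size=10_000)
print(np.allclose(payoff_loop(x, 105.0), payoff_vectorized(x, 105.0)))
```

The vectorized version evaluates both branches for every element, which is the "slightly suboptimal" trade-off: some wasted arithmetic in exchange for uniform work.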

## The global interpreter lock (GIL)
- The Python interpreter is not fully thread-safe. 
- In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects. 
- Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
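
The lost-update problem described above is not specific to reference counts. The sketch below shows the same read-modify-write pattern in user code, with a `threading.Lock` playing the role that the GIL plays for the interpreter's internals. The names (`state`, `increment_with_lock`) are illustrative. Note that the GIL does not guarantee that `+=` on shared Python data is atomic, so user code still needs its own lock.

```python
import threading

N_THREADS = 4
N_INCREMENTS = 100_000

state = {"count": 0}     # shared mutable state
lock = threading.Lock()  # protects every update to state["count"]

def increment_with_lock():
    for _ in range(N_INCREMENTS):
        # Holding the lock makes the read-modify-write a single critical section.
        with lock:
            state["count"] += 1

threads = [threading.Thread(target=increment_with_lock) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every increment was protected, so none can be lost.
print(state["count"] == N_THREADS * N_INCREMENTS)
```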

In [30]:
import threading
import time

def cpu_bound_task(n):
    # A CPU-intensive computation (e.g., calculating factorial)
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

In [31]:
def single_thread():
    start_time = time.time()
    cpu_bound_task(50000)
    cpu_bound_task(50000)
    end_time = time.time()
    print(f"Single-thread execution time: {end_time - start_time:.2f} seconds")

In [32]:
def multi_thread():
    start_time = time.time()
    threads = []
    for _ in range(2):
        t = threading.Thread(target=cpu_bound_task, args=(50000,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    end_time = time.time()
    print(f"Multi-thread execution time: {end_time - start_time:.2f} seconds")

single_thread()
multi_thread()

Single-thread execution time: 1.01 seconds
Multi-thread execution time: 2.46 seconds


In [36]:
import multiprocessing

def multi_process():
    start_time = time.time()
    processes = []
    for _ in range(2):
        p = multiprocessing.Process(target=cpu_bound_task, args=(50000,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    end_time = time.time()
    print(f"Multi-process execution time: {end_time - start_time:.2f} seconds")
    
multi_process()

Multi-process execution time: 0.49 seconds


## Example: Monte Carlo
- A European call option gives the holder the right, but not the obligation, to buy an asset at a specified strike price $K$ on a specified expiration date $T$.
- The option price is the expected discounted payoff under the risk-neutral measure:
$$e^{-rT} \mathbb{E}[\max\left(S_T - K, 0\right)]$$
  - $S_T$ is the stock price at maturity.
  - $r$ is the risk-free interest rate.

- Assume stock prices follow geometric Brownian motion:

$$S_{t + \Delta t} = S_t \exp \left\{ \left(\mu - \frac{1}{2}\sigma^2\right) \Delta t + \sigma \sqrt{\Delta t} Z   \right\}$$

  - $\mu$ is the expected return (for pricing under the risk-neutral measure, $\mu = r$).
  - $\sigma$ is the volatility.
  - $Z$ is a standard normal random variable.
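
For a European option only the terminal price matters, so one way to use the recursion above is to take a single step with $S_t = S_0$ and $\Delta t = T$. Assuming the risk-neutral drift $\mu = r$, which is what pricing under the risk-neutral measure requires, this gives:

$$S_T = S_0 \exp \left\{ \left(r - \frac{1}{2}\sigma^2\right) T + \sigma \sqrt{T} Z \right\}$$

Each standard normal draw $Z$ then yields one sample of the payoff $\max\left(S_T - K, 0\right)$, and the price estimate is $e^{-rT}$ times the sample mean of those payoffs.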
  


In [40]:
def monte_carlo_option_price(S0, K, T, r, sigma, Z):
    # implement
    pass

In [37]:
S_0 = 100     # initial stock price
K = 105       # strike price
T = 1         # time to maturity (years)
r = 0.05      # risk-free interest rate
sigma = 0.20  # volatility
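
One possible way to fill in the Monte Carlo estimator is sketched below. It is not necessarily the intended solution: it assumes a single time step of length $T$, a risk-neutral drift equal to `r`, and that `Z` is an array of independent standard normal draws. The function name `mc_call_price`, the seed, and the sample size are arbitrary choices for illustration.

```python
import numpy as np

def mc_call_price(S0, K, T, r, sigma, Z):
    # Terminal prices under risk-neutral GBM, one per standard normal draw in Z.
    S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    # Discounted sample mean of the call payoff max(S_T - K, 0).
    return np.exp(-r * T) * np.mean(np.maximum(S_T - K, 0.0))

rng = np.random.default_rng(607)    # arbitrary seed, for reproducibility only
Z = rng.standard_normal(1_000_000)  # one draw per simulated terminal price
price = mc_call_price(S0=100, K=105, T=1, r=0.05, sigma=0.20, Z=Z)
print(f"Monte Carlo estimate: {price:.3f}")
```

With these parameters the Black-Scholes closed-form price is about 8.02, which gives a sanity check on the estimate. Because `Z` is passed in as an argument, equal-sized chunks of draws can be priced independently and their results averaged, which is what makes this estimator easy to parallelize.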