## Q1: The stock market

(This is about numba)

A Markov chain is a sequence of random variables in which each value depends *only* on the one immediately before it. Markov chains are a crucial tool in statistics, widely used in science and beyond (in economics, for instance).
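
In symbols, writing $X_n$ for the state at step $n$ (notation introduced here for illustration only), the Markov property says that conditioning on the full history is the same as conditioning on the most recent state:

$$
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0) = P(X_{n+1} = x \mid X_n = x_n)
$$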

For instance, the stock market has phases of growing prices (bull), decreasing prices (bear), and recession. This would be a Markov chain model:

![](https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/Finance_Markov_chain_example_state_space.svg/400px-Finance_Markov_chain_example_state_space.svg.png)

where the numbers on the arrows indicate the probability that the next day will be in a given state.

Your task is to simulate the stock market according to this rule. Start from a random state and simulate many iterations. If your code is right, the fraction of days in each state should converge.

Implement a pure-Python version and a numba version, and compare their speeds.
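
One way to check the converged fractions is to compare them with the chain's stationary distribution, the vector $\pi$ that satisfies $\pi P = \pi$ and sums to 1. The sketch below is not part of the exercise. It assumes the transition matrix of this model (called `P` here, a name chosen for illustration) and a hypothetical helper `stationary_distribution` built on NumPy's eigendecomposition.

```python
import numpy as np

# Transition matrix of the model: rows are the current state,
# columns the next state (0 = Bull, 1 = Bear, 2 = Recession).
P = np.array([
    [0.90, 0.075, 0.025],
    [0.15, 0.80, 0.05],
    [0.25, 0.25, 0.50],
])

def stationary_distribution(P):
    """Return pi such that pi @ P == pi and pi.sum() == 1 (hypothetical helper)."""
    # Left eigenvectors of P are right eigenvectors of its transpose.
    eigenvalues, eigenvectors = np.linalg.eig(P.T)
    # Select the eigenvector whose eigenvalue is closest to 1.
    k = np.argmin(np.abs(eigenvalues - 1.0))
    pi = np.real(eigenvectors[:, k])
    # Normalize so the entries sum to 1 (this also fixes the overall sign).
    return pi / pi.sum()

pi = stationary_distribution(P)
print(pi)  # approximately [0.625, 0.3125, 0.0625]
```

If the simulation is correct, the fractions for Bull, Bear, and Recession should approach these three values as the number of simulated days grows.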


In [None]:
# Pure-Python loop

import random
import numpy as np  # needed for np.logspace and np.zeros below
import matplotlib.pyplot as plt

MCm = [  # Markov chain model: 0 = Bull, 1 = Bear, 2 = Recession
    [0.9, 0.075, 0.025],  
    [0.15, 0.8, 0.05],    
    [0.25, 0.25, 0.5]
]

def simulate_stock_market(num_days):
    state = random.randint(0, 2)
    stateCounts = [0, 0, 0] 
    for _ in range(num_days):
        stateCounts[state] += 1
        state = random.choices([0, 1, 2], weights=MCm[state])[0]
    total_days = sum(stateCounts)
    fractions = [count / total_days for count in stateCounts]
    return fractions

days = np.logspace(1,6,40)
trackFrac = np.zeros((len(days), 3))
for i, num_days in enumerate(days):
    fractions = simulate_stock_market(int(num_days))
    trackFrac[i] = fractions

# Plot the fractions over the days simulated
plt.figure(figsize=(10, 6))
plt.plot(days, trackFrac[:, 0], label='Bull')
plt.plot(days, trackFrac[:, 1], label='Bear')
plt.plot(days, trackFrac[:, 2], label='Recession')
plt.xscale('log')
plt.xlabel('Days')
plt.ylabel('Fraction of days')
plt.title('Fractions of days in each state over time')
plt.legend()
plt.grid(True)
plt.show()



In [None]:
# Numba version. The sampling step is written out by hand because numba doesn't support weighted sampling with np.random.choice (its p argument).
import numpy as np
import matplotlib.pyplot as plt
import numba as nb

MCm = np.array([  # Markov Chain model: 0 = Bull, 1 = Bear, 2 = Recession
    [0.9, 0.075, 0.025],
    [0.15, 0.8, 0.05],
    [0.25, 0.25, 0.5]
])

@nb.njit
def simulate_stock_market_numba(num_days, MCm):
    state = np.random.randint(0, 3)  # random initial state: 0, 1, or 2
    stateCounts = [0, 0, 0]  # days spent in each state
    for i in range(num_days):
        stateCounts[state] += 1
        rand_val = np.random.uniform()  # random between 0 and 1
        if rand_val < MCm[state][0]:
            state = 0
        elif rand_val < MCm[state][0] + MCm[state][1]:
            state = 1
        else:
            state = 2   
    fractions = [count / num_days for count in stateCounts]
    return fractions

days = np.logspace(1, 6, 40)
trackFrac = np.zeros((len(days), 3))
for i, num_days in enumerate(days):
    fractions = simulate_stock_market_numba(int(num_days),MCm)
    trackFrac[i] = fractions

# Plot the fractions over the days simulated
plt.figure(figsize=(10, 6))
plt.plot(days, trackFrac[:, 0], label='Bull')
plt.plot(days, trackFrac[:, 1], label='Bear')
plt.plot(days, trackFrac[:, 2], label='Recession')
plt.xscale('log')
plt.xlabel('Days')
plt.ylabel('Fraction of days')
plt.title('Fractions of days in each state over time')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
%%time 

fractions = simulate_stock_market(int(1e6))


In [None]:
%%time 

fractions = simulate_stock_market_numba(int(1e6), MCm)


## Q3: Scaling

(This is about multiprocessing)

The ["scaling"](https://hpc-wiki.info/hpc/Scaling) of a code refers to its performance as a function of the number of cores used.

- Define a computationally intensive task (something like an operation on two giant arrays with >1e7 numbers or, even better, pick something from your research).
- Make sure it's embarrassingly parallel.
- Implement a parallelization strategy using multiprocessing. 
- Plot the time the code takes as a function of the number of cores.
- Figure out the number of cores in your CPU and make sure the plot extends both below and above this number.
- Interpret the resulting features. 
- Perfect scaling means the speedup grows linearly with the number of cores (a straight line), so the run time falls as 1/N. How perfect is your scaling?
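
To put a number on "how perfect", one common choice is the speedup $S(N) = T(1)/T(N)$ and the parallel efficiency $E(N) = S(N)/N$, where $T(N)$ is the run time on $N$ cores. Perfect scaling gives $S(N) = N$ and $E(N) = 1$. The sketch below is only an illustration: `scaling_metrics` is a hypothetical helper, and the timings are made-up numbers, not measurements.

```python
def scaling_metrics(workers, times):
    """Return (speedup, efficiency) lists.

    Assumes the first entry corresponds to a single worker, so that
    times[0] plays the role of T(1).
    """
    t1 = times[0]
    speedup = [t1 / t for t in times]
    efficiency = [s / n for s, n in zip(speedup, workers)]
    return speedup, efficiency

workers = [1, 2, 4, 8]
times = [8.0, 4.0, 2.5, 2.0]  # made-up timings in seconds, for illustration only

speedup, efficiency = scaling_metrics(workers, times)
for n, s, e in zip(workers, speedup, efficiency):
    print(f"{n} workers: speedup {s:.2f}, efficiency {e:.2f}")
```

In this invented example the efficiency is 1.0 up to 2 workers and falls to 0.5 at 8 workers, which is the kind of departure from the ideal line that the interpretation step asks about.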

### Important
NumPy has some built-in, semi-automatic parallelization. Some, but not all, NumPy functions detect the number of CPUs in your machine and make good use of them. That's great for most applications, but in a scaling study you want to control the parallelization yourself and disable NumPy's own. The following forces NumPy to use a single core.
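
A minimal sketch of one common way to do this, assuming NumPy is backed by an OpenMP, MKL, or OpenBLAS threading layer: set the thread-count environment variables to 1 before NumPy is first imported. Which variable actually takes effect depends on how NumPy was built, so the list below is an assumption rather than a guarantee.

```python
import os

# Assumption: one of these variables controls the threading backend in use.
# They must be set before NumPy is imported for the first time.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np  # imported only after the variables are set
```

In a notebook where NumPy has already been imported, the kernel likely needs a restart before these settings apply.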

In [None]:
import numpy as np
import multiprocessing
import time
import matplotlib.pyplot as plt

# Simple task: sum the elements of an array
def compute_sum(data_chunk):
    return np.sum(data_chunk)

# Split the data into one chunk per worker process and sum the chunks in parallel
def parallel_compute_sum(num_cores, data):
    # np.array_split gives exactly num_cores chunks, even when
    # len(data) is not divisible by num_cores
    chunks = np.array_split(data, num_cores)

    with multiprocessing.Pool(processes=num_cores) as pool:
        results = pool.map(compute_sum, chunks)

    return sum(results)

data_size = 10**8
data = np.random.rand(data_size)

num_cores = multiprocessing.cpu_count()
print(f"I have {num_cores} cores")
# Test the parallelization with different numbers of cores
num_cores_list = list(range(1, num_cores + 3))  # Test from 1 to num_cores + 2
execution_times = []
for n in num_cores_list:  # separate loop variable so num_cores is not overwritten
    start_time = time.perf_counter()
    result = parallel_compute_sum(n, data)
    execution_time = time.perf_counter() - start_time
    execution_times.append(execution_time)
    print(f"Execution time with {n} cores: {execution_time} seconds, result: {result}")

# Plot the results
plt.plot(num_cores_list, execution_times, marker='o')
plt.xlabel('Number of Cores')
plt.ylabel('Execution Time (s)')
plt.title('Scaling Behavior')
plt.show()
