# Benchmark Considerations for Fair Comparison

In our project, the goal is to compare the performance of our Eigen-based wrappers (MyPyEigen) with NumPy's native operations. To ensure a fair, apples-to-apples comparison, we must minimize any extraneous overhead, particularly the Python binding overhead.

## Key Points:

1. **Minimizing Python Binding Overhead:**
   - **Pre-Conversion:** When measuring the core computation, it is ideal to pre-convert NumPy arrays to Eigen objects (using our "raw" interfaces) so that the conversion cost is not included in the timing.
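   - **Batch Execution:** Repeating the function call many times and averaging reduces the influence of timer resolution and run-to-run noise. It does not dilute the fixed cost of each Python-to-C++ call, because every call pays that cost. The cost only becomes negligible when each call does enough work.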
   - **Optimized Binding Library:** We use pybind11, which is designed to be lightweight and to add minimal overhead compared to the actual C++ computation.

2. **Excluding Non-Computational Overheads:**
   - **Import Time:** Module import time is excluded from the benchmarks.
   - **Data Preparation:** The time taken to generate test arrays is not included in the benchmark; only the computation is timed.
   - **Cache Warm-Up:** We preheat the function (by calling it several times before measurement) to ensure that any cache initialization or one-time overhead does not skew the results.

3. **Reproducibility and Consistency:**
   - **Fixed Random Seed:** We set a fixed random seed (`np.random.seed(12345)`) so that the generated test arrays are consistent across runs.
   - **Pre-Generated Data:** For each test case, arrays are generated once and reused during the benchmark loop to ensure consistency.
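   - **Data Type Consistency:** All arrays are double precision (`np.float64`) to match the Eigen scalar type (`double`). `np.random.rand` already returns `float64`; the explicit `.astype(np.float64)` in the code makes this guarantee visible.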

4. **Range of Test Sizes:**
   - We test a wide range of sizes, from small (10×10) to large (5000×5000) matrices, to capture performance behavior at different scales. For the 1D operations (`inner`, `outer`), the same values are used as vector lengths. This shows how fixed overheads and per-element costs trade off as the problem grows.
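The pre-conversion principle can be illustrated with NumPy alone. The sketch below uses `np.asfortranarray` as a stand-in for a NumPy-to-Eigen conversion: Eigen's default storage order is column-major, while NumPy arrays default to row-major. This is an assumption for illustration only; it is not MyPyEigen's "raw" interface, whose API is not shown here. Putting the conversion inside the timed call charges it to every call, while converting once beforehand isolates the computation.

```python
import timeit

import numpy as np


def time_per_call(func, number=20, repeat=5):
    """Return the best average time per call (seconds) over several repeats."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


rng = np.random.default_rng(12345)
A = rng.random((500, 500))  # float64, C-contiguous (row-major)
B = rng.random((500, 500))

# Stand-in conversion inside the timed region: every call pays for two layout copies.
with_conversion = time_per_call(lambda: np.asfortranarray(A) @ np.asfortranarray(B))

# Stand-in conversion done once, outside the timed region (the "pre-converted" case).
A_f = np.asfortranarray(A)
B_f = np.asfortranarray(B)
without_conversion = time_per_call(lambda: A_f @ B_f)

print(f"with conversion:    {with_conversion:.6f} s")
print(f"without conversion: {without_conversion:.6f} s")
```

For a compute-bound operation like this one, the gap is modest. For memory-bound operations, the conversion can be comparable to the operation itself.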

## Additional Considerations for ND Operations:
- Our current implementations focus on 1D and 2D operations (e.g., for `inner`, `outer`, `matmult`, `append`, `concat`, `rot90`).
- For functions that inherently support n-dimensional arrays (such as `numpy.stack`), if we implement these in the future, similar principles would apply: pre-convert data if needed, ensure vectorized operations, and measure pure computation time.
- The benchmark must isolate the computational part from any overhead due to Python binding or data conversion to produce a meaningful comparison.
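Isolating the computation also means confirming that both sides compute the same result on the same pre-generated inputs before timing them. Below is a minimal sketch of such a side-by-side harness for an n-D operation. `compare_implementations` is a hypothetical helper. The two NumPy candidates are placeholders; a future MyPyEigen n-D function would be added as another entry, but its name and signature are not known here.

```python
import timeit

import numpy as np


def compare_implementations(candidates, args, number=100, repeat=5):
    """Check that all candidates agree on `args`, then time each one.

    `candidates` maps a label to a callable; the first entry's output is the
    reference. Returns {label: best average seconds per call}.
    """
    labels = list(candidates)
    reference = np.asarray(candidates[labels[0]](*args))
    for label in labels[1:]:
        result = np.asarray(candidates[label](*args))
        if result.shape != reference.shape or not np.allclose(result, reference):
            raise ValueError(f"{label} disagrees with {labels[0]}")

    timings = {}
    for label, func in candidates.items():
        runs = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
        timings[label] = min(runs) / number
    return timings


rng = np.random.default_rng(12345)
X = rng.random((50, 50, 50))  # pre-generated 3-D inputs, reused for every run
Y = rng.random((50, 50, 50))

# A MyPyEigen n-D function (hypothetical, not yet implemented) would go here too.
timings = compare_implementations(
    {
        "np.stack": lambda a, b: np.stack((a, b), axis=0),
        "np.concatenate": lambda a, b: np.concatenate(
            (a[np.newaxis], b[np.newaxis]), axis=0
        ),
    },
    (X, Y),
)
for label, seconds in timings.items():
    print(f"{label:15s} {seconds:.9f} s")
```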

By following these guidelines, we ensure that our benchmark results reflect the true computational performance of the underlying algorithms rather than artifacts of the Python-C++ interface.


```python
import functools
import timeit

import numpy as np

import MyPyEigen as pe
```

```python
# --- Benchmark Utility Function ---
def benchmark_function(func, args, number=10, preheat=5):
    """
    Benchmark 'func' with arguments 'args'. First, warm up by calling the function
    'preheat' times without timing. Then call 'func' 'number' times and return
    the average execution time per call in seconds.
    """
    # Warm-up calls absorb one-time costs (cache/memory initialization, lazy setup).
    for _ in range(preheat):
        func(*args)
    return timeit.timeit(lambda: func(*args), number=number) / number

# --- Ensure Reproducibility ---
np.random.seed(12345)

# --- Define Test Sizes ---
# Sizes range from small to large so that scaling behavior is visible.
sizes = [10, 50, 100, 500, 1000, 2000, 5000]

# --- Initialize a Dictionary to Hold Benchmark Results ---
results = {}

# 1. Benchmark Matrix Multiplication (matmult)
results['matmult'] = []
for size in sizes:
    A = np.random.rand(size, size).astype(np.float64)
    B = np.random.rand(size, size).astype(np.float64)
    # Benchmark NumPy matrix multiplication using '@'
    t_np = benchmark_function(lambda A, B: A @ B, (A, B), number=10)
    results['matmult'].append((size, t_np))

# 2. Benchmark Append (row-wise using vstack)
results['append'] = []
for size in sizes:
    X = np.random.rand(size, size).astype(np.float64)
    Y = np.random.rand(size, size).astype(np.float64)
    t_np = benchmark_function(lambda X, Y: np.vstack((X, Y)), (X, Y), number=10)
    results['append'].append((size, t_np))

# 3. Benchmark Concat (row-wise using concatenate)
results['concat'] = []
for size in sizes:
    A2 = np.random.rand(size, size).astype(np.float64)
    B2 = np.random.rand(size, size).astype(np.float64)
    C2 = np.random.rand(size, size).astype(np.float64)
    t_np = benchmark_function(lambda lst: np.concatenate(lst, axis=0), ([A2, B2, C2],), number=10)
    results['concat'].append((size, t_np))

# 4. Benchmark Inner (1D vector dot product using np.inner)
results['inner'] = []
for size in sizes:
    v1 = np.random.rand(size).astype(np.float64)
    v2 = np.random.rand(size).astype(np.float64)
    # These calls take well under a microsecond, so use many more repetitions
    # to reduce timer noise in the average.
    t_np = benchmark_function(np.inner, (v1, v2), number=10000)
    results['inner'].append((size, t_np))

# 5. Benchmark Outer (1D vector outer product using np.outer)
results['outer'] = []
for size in sizes:
    v1 = np.random.rand(size).astype(np.float64)
    v2 = np.random.rand(size).astype(np.float64)
    t_np = benchmark_function(np.outer, (v1, v2), number=10000)
    results['outer'].append((size, t_np))

# 6. Benchmark Rot90 (rotate 2D array using np.rot90)
results['rot90'] = []
for size in sizes:
    M = np.random.rand(size, size).astype(np.float64)
    # np.rot90 returns a view (no data is copied), so this measures view creation only.
    t_np = benchmark_function(lambda M: np.rot90(M, 1), (M,), number=100)
    results['rot90'].append((size, t_np))

# --- Print Benchmark Results ---
print("Benchmark Results for NumPy functions (Average time in seconds):")
for func_name, data in results.items():
    print(f"\nFunction: {func_name}")
    print("Size\tAverage Time (sec)")
    for size, t in data:
        print(f"{size}\t{t:.9f}")
```
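`benchmark_function` reports a single mean over a fixed number of calls. The mean is sensitive to background noise, and for the smallest cases the whole timed run lasts only microseconds. A common alternative is to let `timeit.Timer.autorange` choose the loop count and report the minimum over several repeats. The Python `timeit` documentation notes that higher values are usually caused by interference from other processes rather than by variability in the code itself. The sketch below applies this idea to the same kind of pre-generated inputs; `benchmark_function_robust` is a hypothetical name and is not part of the original notebook.

```python
import timeit

import numpy as np


def benchmark_function_robust(func, args, repeat=5, preheat=5):
    """Return the best average time per call (seconds) over `repeat` runs.

    The loop count is chosen with Timer.autorange(), which increases it until
    a single run takes at least 0.2 seconds.
    """
    for _ in range(preheat):  # warm-up calls, not timed
        func(*args)
    timer = timeit.Timer(lambda: func(*args))
    number, _ = timer.autorange()
    runs = timer.repeat(repeat=repeat, number=number)
    return min(runs) / number


np.random.seed(12345)
v1 = np.random.rand(1000)
v2 = np.random.rand(1000)
t_inner = benchmark_function_robust(np.inner, (v1, v2))
print(f"np.inner, n=1000: {t_inner:.9f} s per call")
```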


```text
Benchmark Results for NumPy functions (Average time in seconds):

Function: matmult
Size	Average Time (sec)
10	0.000001504
50	0.000004983
100	0.000013821
500	0.000949358
1000	0.007864650
2000	0.063897550
5000	1.013861154

Function: append
Size	Average Time (sec)
10	0.000001867
50	0.000002612
100	0.000004733
500	0.000214204
1000	0.000541433
2000	0.004133825
5000	0.035949937

Function: concat
Size	Average Time (sec)
10	0.000000983
50	0.000001975
100	0.000004913
500	0.000366783
1000	0.001426362
2000	0.005924200
5000	0.053742600

Function: inner
Size	Average Time (sec)
10	0.000000445
50	0.000000445
100	0.000000454
500	0.000000508
1000	0.000000591
2000	0.000000737
5000	0.000001217

Function: outer
Size	Average Time (sec)
10	0.000001307
50	0.000002482
100	0.000021144
500	0.000202705
1000	0.000821448
2000	0.003679805
5000	0.027354049

Function: rot90
Size	Average Time (sec)
10	0.000003645
50	0.000003582
100	0.000003571
500	0.000003550
1000	0.000003642
2000	0.000003627
5000	0.000003612
```
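One way to sanity-check the printed timings against the expected complexity is to fit a straight line to log(time) versus log(size). The sketch below does this for the NumPy `matmult` and `rot90` numbers in the output above. It uses only sizes 500 and up, where fixed per-call costs matter less. Dense matrix multiplication should give an exponent near 3. A near-zero exponent for `rot90` is consistent with `np.rot90` returning a view instead of copying data.

```python
import numpy as np

# NumPy timings (seconds) copied from the benchmark output, sizes 500 to 5000.
sizes = np.array([500, 1000, 2000, 5000])
numpy_times = {
    "matmult": np.array([0.000949358, 0.007864650, 0.063897550, 1.013861154]),
    "rot90": np.array([0.000003550, 0.000003642, 0.000003627, 0.000003612]),
}


def empirical_exponent(sizes, times):
    """Slope of log(time) vs. log(size), so that time ~ size**slope."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return slope


exponents = {name: empirical_exponent(sizes, t) for name, t in numpy_times.items()}
for name, k in exponents.items():
    print(f"{name}: time grows roughly as size**{k:.2f}")
```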


# Benchmark Considerations for MyPyEigen vs. NumPy

## Overview
This benchmark measures the performance of our custom Eigen-based functions (exposed via MyPyEigen) in Python, including:
- **matmult**: Matrix multiplication (similar to `numpy.matmul` or `@`)
- **append**: Appending two matrices along a given axis (similar to `numpy.append`/`vstack` or `hstack`)
- **concat**: Concatenating multiple matrices (similar to `numpy.concatenate`)
- **inner**: 1D vector inner product (similar to `numpy.inner`, or `numpy.dot` for 1D vectors)
- **outer**: 1D vector outer product (similar to `numpy.outer`)
- **rot90**: Rotating a matrix by 90 degrees (similar to `numpy.rot90`)
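The NumPy counterparts listed above are not interchangeable in every case, so it helps to pin down exactly which call each MyPyEigen function is compared against. The sketch below checks the equivalences the benchmark relies on for 2D and 1D `float64` inputs. It uses NumPy only and assumes nothing about MyPyEigen's API.

```python
import numpy as np

rng = np.random.default_rng(12345)
X = rng.random((4, 3))
Y = rng.random((4, 3))
Z = rng.random((4, 3))
u = rng.random(5)
v = rng.random(5)

# matmult: the @ operator and np.matmul are the same operation.
M = rng.random((3, 4))
assert np.allclose(X @ M, np.matmul(X, M))

# append: with axis=0, np.append stacks rows, like np.vstack.
# (Without an axis argument, np.append flattens both inputs first.)
assert np.array_equal(np.append(X, Y, axis=0), np.vstack((X, Y)))
assert np.append(X, Y).ndim == 1

# concat: concatenating along axis 0 matches vstack of all inputs.
assert np.array_equal(np.concatenate([X, Y, Z], axis=0), np.vstack((X, Y, Z)))

# inner: for 1D vectors, np.inner and np.dot give the same scalar.
assert np.isclose(np.inner(u, v), np.dot(u, v))

# outer: element [i, j] is u[i] * v[j].
assert np.allclose(np.outer(u, v), u[:, None] * v[None, :])

# rot90: returns a view of the input, so no data is copied.
R = np.rot90(X, 1)
assert R.shape == (3, 4)
assert np.shares_memory(R, X)
print("all equivalence checks passed")
```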

## Key Considerations for a Fair Benchmark
1. **Preheating/Caching:**  
   To minimize one-time initialization or cache-warmup overhead, we preheat the function calls (i.e., run them several times before timing).

2. **Reproducibility:**  
   We fix the random seed (e.g., `np.random.seed(12345)`) to ensure that test cases are consistent across runs.

3. **Data Preprocessing Exclusion:**  
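   We generate the test arrays before timing, so data creation is not included in the benchmark. Conversions that pybind11 performs inside each MyPyEigen call are still part of the measured time (see the note on binding overhead below).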

4. **Range of Test Sizes:**  
   We test across a range of sizes (from 10×10 to 5000×5000 matrices, and vectors of length 10 to 5000 for `inner` and `outer`) to observe how performance scales with problem size.

5. **Data Type Consistency:**  
   All arrays are generated with `np.float64` (i.e., 64-bit double) to ensure fair comparison with Eigen's double-based computations.
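`np.random.rand` already returns `float64`, so `.astype(np.float64)` does not change the dtype. It does, however, make a copy by default. The sketch below checks that inputs match Eigen's `double` without copying unnecessarily. It uses only NumPy, and the helper name `as_float64` is hypothetical.

```python
import numpy as np


def as_float64(a):
    """Return `a` as float64, copying only when the dtype differs."""
    return np.asarray(a, dtype=np.float64)


np.random.seed(12345)
A = np.random.rand(3, 3)
print(A.dtype)  # float64

# astype copies by default, even when the dtype already matches.
assert A.astype(np.float64) is not A
# asarray with a matching dtype returns the same object, with no copy.
assert as_float64(A) is A

# A float32 input is converted, so it matches Eigen's double.
B32 = A.astype(np.float32)
B64 = as_float64(B32)
assert B64.dtype == np.float64 and B64 is not B32
```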

## Note on Python Binding Overhead
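pybind11 automatically converts NumPy arrays to Eigen types and vice versa, but these conversions can involve copying data. For example, a row-major NumPy array passed to a column-major `Eigen::MatrixXd` parameter must be copied. For compute-bound operations such as large `matmult`, this cost is small relative to the computation. For memory-bound operations (`append`, `concat`, `outer`, `rot90`), copying the inputs and the result is the same order of work as the operation itself. For very small arrays, the fixed per-call cost dominates. In our benchmarks, we call each function many times and average the results. This reduces timing noise but does not remove the per-call cost, because every call pays it.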

The goal is to compare the pure computational performance as fairly as possible between our Eigen-based implementation and NumPy's native operations.
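One way to separate the fixed per-call cost from per-element work is to fit time = a + b·n to the `inner` timings in the results tables below. The intercept a estimates the fixed cost of a call, and the slope b estimates the cost per element. The sketch uses the NumPy and MyPyEigen `inner` numbers for sizes 10 to 2000. The 5000 MyPyEigen point (about 23 µs) departs sharply from the trend, so it is left out of the fit. With these numbers, MyPyEigen's intercept is about 0.5 µs higher than NumPy's, and its slope is roughly twice as large. Call overhead alone therefore does not explain the steady ratio of about 0.47x.

```python
import numpy as np

# inner timings (microseconds) from the results tables, sizes 10 to 2000.
n = np.array([10, 50, 100, 500, 1000, 2000])
numpy_us = np.array([0.445, 0.445, 0.454, 0.508, 0.591, 0.737])
mypyeigen_us = np.array([0.979, 0.951, 0.965, 1.054, 1.260, 1.602])


def fit_fixed_and_per_element(n, t):
    """Least-squares fit t = a + b * n; returns (a, b)."""
    b, a = np.polyfit(n, t, 1)  # polyfit returns the highest degree first
    return a, b


a_np, b_np = fit_fixed_and_per_element(n, numpy_us)
a_pe, b_pe = fit_fixed_and_per_element(n, mypyeigen_us)
print(f"NumPy:     fixed ~{a_np:.3f} us, per element ~{b_np * 1e3:.3f} ns")
print(f"MyPyEigen: fixed ~{a_pe:.3f} us, per element ~{b_pe * 1e3:.3f} ns")
print(f"extra fixed cost per MyPyEigen call: ~{a_pe - a_np:.3f} us")
```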

---

By following these guidelines, we aim to ensure that our benchmarks reflect the true performance of the underlying linear algebra computations.


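## Results
Benchmark results for NumPy and MyPyEigen (average time per call in seconds). The speedup factor is the NumPy time divided by the MyPyEigen time, so values below 1x mean MyPyEigen is slower and values above 1x mean it is faster.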

- **Function: matmult**  

| Size | NumPy Time (sec) | MyPyEigen Time (sec) | Speedup Factor |
|------|------------------|----------------------|----------------|
| 10   | 0.000001504      | 0.000002179          | 0.69x (Slower) |
| 50   | 0.000004983      | 0.000013704          | 0.36x (Slower) |
| 100  | 0.000013821      | 0.000079625          | 0.17x (Slower) |
| 500  | 0.000949358      | 0.006776463          | 0.14x (Slower) |
| 1000 | 0.007864650      | 0.050868921          | 0.15x (Slower) |
| 2000 | 0.063897550      | 0.385726825          | 0.17x (Slower) |
| 5000 | 1.013861154      | 6.352387050          | 0.16x (Slower) |



- **Function: append**  

| Size | NumPy Time (sec) | MyPyEigen Time (sec) | Speedup Factor     |
|------|------------------|----------------------|--------------------|
| 10   | 0.000001867      | 0.000001904          | 0.98x (Similar)    |
| 50   | 0.000002612      | 0.000015279          | 0.17x (Slower)     |
| 100  | 0.000004733      | 0.000036296          | 0.13x (Slower)     |
| 500  | 0.000214204      | 0.001396075          | 0.15x (Slower)     |
| 1000 | 0.000541433      | 0.006081629          | 0.09x (Slower)     |
| 2000 | 0.004133825      | 0.032367979          | 0.13x (Slower)     |
| 5000 | 0.035949937      | 0.426710392          | 0.08x (Slower)     |


- **Function: concat** 

| Size | NumPy Time (sec) | MyPyEigen Time (sec) | Speedup Factor |
|------|------------------|----------------------|----------------|
| 10   | 0.000000983      | 0.000002267          | 0.43x (Slower) |
| 50   | 0.000001975      | 0.000026642          | 0.07x (Slower) |
| 100  | 0.000004913      | 0.000062163          | 0.08x (Slower) |
| 500  | 0.000366783      | 0.002711667          | 0.14x (Slower) |
| 1000 | 0.001426362      | 0.010954300          | 0.13x (Slower) |
| 2000 | 0.005924200      | 0.050432475          | 0.12x (Slower) |
| 5000 | 0.053742600      | 0.642031071          | 0.08x (Slower) |

- **Function: inner**  

| Size | NumPy Time (sec) | MyPyEigen Time (sec) | Speedup Factor     |
|------|------------------|----------------------|--------------------|
| 10   | 0.000000445      | 0.000000979          | 0.45x (Slower)     |
| 50   | 0.000000445      | 0.000000951          | 0.47x (Slower)     |
| 100  | 0.000000454      | 0.000000965          | 0.47x (Slower)     |
| 500  | 0.000000508      | 0.000001054          | 0.48x (Slower)     |
| 1000 | 0.000000591      | 0.000001260          | 0.47x (Slower)     |
| 2000 | 0.000000737      | 0.000001602          | 0.46x (Slower)     |
| 5000 | 0.000001217      | 0.000023336          | 0.05x (Slower)     |


- **Function: outer**  

| Size | NumPy Time (sec) | MyPyEigen Time (sec) | Speedup Factor |
|------|------------------|----------------------|----------------|
| 10   | 0.000001307      | 0.000001398          | 0.93x (Similar)|
| 50   | 0.000002482      | 0.000002877          | 0.86x (Slower) |
| 100  | 0.000021144      | 0.000018673          | 1.13x (Faster) |
| 500  | 0.000202705      | 0.000358850          | 0.56x (Slower) |
| 1000 | 0.000821448      | 0.001814471          | 0.45x (Slower) |
| 2000 | 0.003679805      | 0.009134711          | 0.40x (Slower) |
| 5000 | 0.027354049      | 0.109140976          | 0.25x (Slower) |


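- **Function: rot90**  
  Note: `np.rot90` returns a view of the input without copying data. The NumPy column therefore measures only view creation, which stays near 3.6 µs at every size. MyPyEigen's time grows with size, consistent with building a new rotated matrix, so for large sizes the two columns are not a like-for-like comparison.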

| Size | NumPy Time (sec) | MyPyEigen Time (sec) | Speedup Factor |
|------|------------------|----------------------|----------------|
| 10   | 0.000003645      | 0.000001235          | 2.95x (Faster) |
| 50   | 0.000003582      | 0.000004167          | 0.86x (Slower) |
| 100  | 0.000003571      | 0.000033881          | 0.11x (Slower) |
| 500  | 0.000003550      | 0.000884480          | 0.004x (Slower)|
| 1000 | 0.000003642      | 0.005710024          | 0.0006x (Slower)|
| 2000 | 0.000003627      | 0.024798107          | 0.0001x (Slower)|
| 5000 | 0.000003612      | 0.328406906          | 0.00001x (Slower)|
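For a like-for-like `rot90` comparison against an implementation that materializes a new matrix, the NumPy side would also need to time a copy. A function returning an Eigen matrix presumably does this, though the MyPyEigen implementation is not shown here. The minimal sketch below uses `np.ascontiguousarray` to force the rotated data into a new C-contiguous array. The helper names `rot90_view` and `rot90_copy` are illustrative only.

```python
import timeit

import numpy as np

np.random.seed(12345)
M = np.random.rand(1000, 1000)


def rot90_view(m):
    return np.rot90(m, 1)  # only shape/strides change; no data is copied


def rot90_copy(m):
    return np.ascontiguousarray(np.rot90(m, 1))  # materializes the rotated data


number = 20
t_view = timeit.timeit(lambda: rot90_view(M), number=number) / number
t_copy = timeit.timeit(lambda: rot90_copy(M), number=number) / number
print(f"view: {t_view:.9f} s, copy: {t_copy:.9f} s")

# Both versions hold the same values; only the copy owns new memory.
assert np.array_equal(rot90_view(M), rot90_copy(M))
assert np.shares_memory(rot90_view(M), M)
assert not np.shares_memory(rot90_copy(M), M)
```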
