### Performance modelling

> In our context, performance modelling means predicting the **execution times** of a computer **program** that computes a given linear algebra expression.

### Program

> Consider **programs** that are sequences of library calls. These programs compute a given mathematical expression.


```c++
void Program(){
    // (AB)(CD) + E
    dgemm(A, B, 500x200, 200x500, T1);  // T1 = AB    (500x500)
    trmm(C, D, 500x500, 500x10, T2);    // T2 = CD    (500x10, C triangular)
    dgemm(T1, T2, 500x500, 500x10, T3); // T3 = T1 T2 (500x10)
    axpy(T3, E, 500x10);                // E = T3 + E
} 
```
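Since each program is a sequence of library calls with known operand sizes, one possible way to represent it for modelling is a plain list of calls. The sketch below is hypothetical: `Kind`, `Call`, `flops`, `total_flops`, and `example_program` are illustrative names, not part of any library. It attaches the standard flop counts ($2mnk$ for GEMM, $m^2 n$ for a left-sided TRMM with a non-unit $m \times m$ triangular matrix, $2N$ for an AXPY of length $N$) as one possible measure of the work done by each call, and encodes the four calls of `Program()`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical representation of one library call and its operand sizes.
enum class Kind { Gemm, Trmm, Axpy };

struct Call {
    Kind kind;
    // Gemm: (m x k) times (k x n)
    // Trmm: (m x m triangular) times (m x n); k unused
    // Axpy: two m x n operands; k unused
    std::int64_t m, n, k;
};

// Standard floating-point operation count of one call.
std::int64_t flops(const Call& c) {
    assert(c.m >= 0 && c.n >= 0 && c.k >= 0);
    switch (c.kind) {
        case Kind::Gemm: return 2 * c.m * c.n * c.k;  // one multiply + one add per term
        case Kind::Trmm: return c.m * c.m * c.n;      // m^2 per column, non-unit diagonal
        case Kind::Axpy: return 2 * c.m * c.n;        // y = alpha * x + y
    }
    return 0;
}

// Total work of a program written as a sequence of calls.
std::int64_t total_flops(const std::vector<Call>& program) {
    std::int64_t total = 0;
    for (const Call& c : program) total += flops(c);
    return total;
}

// The calls of Program(), i.e. (AB)(CD) + E.
std::vector<Call> example_program() {
    return {
        {Kind::Gemm, 500, 500, 200},  // AB: (500x200)(200x500)
        {Kind::Trmm, 500, 10, 0},     // CD: (500x500 triangular)(500x10)
        {Kind::Gemm, 500, 10, 500},   // (AB)(CD): (500x500)(500x10)
        {Kind::Axpy, 500, 10, 0},     // + E: 500x10
    };
}
```

For `Program()` the GEMM with the 500x200 and 200x500 operands dominates the total of 107,510,000 flops, which is one reason a per-call model is needed rather than a single constant cost per call.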

### Assumptions

> A set of library calls, each computing a standard operation on a specific architecture, is known a priori.

> Consider performance prediction only for algorithms that can be written as a sequence of the given library calls (for iterative algorithms, unroll the loop and estimate the execution time per N iterations).

### Non-deterministic nature of execution times

Execution time is non-deterministic because of factors such as CPU temperature and data locality. As a result, repeating the same computation many times yields a distribution of execution times rather than a single value.

The statistical dispersion of the execution time depends on the amount of work in the computation. The following graph shows measurements of matrix multiplication (DGEMM) for different matrix sizes; the dispersion is larger for matrix sizes between 200 and 500. The coloured dots are the 15th (red) and 85th (green) percentiles: I repeat each matrix multiplication 100 times, sort the execution times, and the dots are the 15th and 85th elements of the sorted list.

<img src="pics/quantile_reg.png" />
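The measurement procedure described above can be sketched in C++ as follows. This is a minimal sketch, not the code behind the figure: `time_repetitions` and `empirical_quantile` are hypothetical helpers, timing uses `std::chrono::steady_clock`, and the quantile follows the nearest-rank rule, which for 100 repetitions returns exactly the 15th and 85th elements of the sorted times.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `kernel` `reps` times and record each
// wall-clock duration in seconds.
template <typename Kernel>
std::vector<double> time_repetitions(Kernel kernel, int reps) {
    assert(reps > 0);
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto start = std::chrono::steady_clock::now();
        kernel();
        const auto stop = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return times;
}

// Hypothetical helper: nearest-rank empirical quantile. Sorts a copy of the
// samples and returns the element at 1-based rank ceil(percent / 100 * n).
// Integer arithmetic avoids floating-point rounding in the rank.
double empirical_quantile(std::vector<double> samples, int percent) {
    assert(!samples.empty() && percent >= 0 && percent <= 100);
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    std::size_t rank = (static_cast<std::size_t>(percent) * n + 99) / 100;
    if (rank == 0) rank = 1;  // percent == 0 maps to the minimum
    return samples[rank - 1];
}
```

In use, one would call, for instance, `empirical_quantile(time_repetitions(kernel, 100), 15)` once per matrix size, where `kernel` is a hypothetical lambda wrapping the DGEMM call. An optimizing compiler may remove a kernel whose result is never used, so the result should be kept alive.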

### Approach

> Fit a performance model for each library call

> Fit a model that adds up performance metrics from different library calls in sequence
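As one very simple instance of a per-call model, the sketch below fits a straight line to the empirical quantile of one call as a function of a size measure $x$ (for DGEMM, for example, the flop count). This is an assumption for illustration: `LinearFit`, `fit_quantile_line`, and `predict` are hypothetical names, and the two-step approach (compute the quantile at each size, then ordinary least squares) is a simplification, not quantile regression proper, which would minimize the pinball (check) loss instead. A linear form may also be too rigid for the size range where the dispersion grows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-call model: quantile(x) ~= a + b * x.
struct LinearFit {
    double a;  // intercept
    double b;  // slope
};

// Ordinary least squares on pairs (x_i, q_i), where q_i is the empirical
// p-th quantile of the execution time measured at size x_i.
LinearFit fit_quantile_line(const std::vector<double>& x, const std::vector<double>& q) {
    assert(x.size() == q.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { sx += x[i]; sq += q[i]; }
    const double mx = sx / n, mq = sq / n;
    double sxx = 0.0, sxq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxq += (x[i] - mx) * (q[i] - mq);
    }
    assert(sxx > 0.0);  // needs at least two distinct sizes
    const double b = sxq / sxx;
    return {mq - b * mx, b};
}

// Predicted quantile of the execution time at size x.
double predict(const LinearFit& f, double x) { return f.a + f.b * x; }
```

One such fit per quantile (for example the 15th and 85th) and per library call would give the per-call predictions that the second model then combines.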

```c++
void Algorithm(){
    // (AB)C
    dgemm(A, B, 200x200, 200x500, T1); // T1 = AB   ---> Qa_1, Qa_2, Qa_3
    dgemm(T1, C, 200x500, 500x100, D); // D  = T1 C ---> Qb_1, Qb_2, Qb_3
} // whole algorithm ---> Qfoo_1, Qfoo_2, Qfoo_3
```
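How `Qa` and `Qb` combine into `Qfoo` is what the second model has to capture. As a baseline, here is a minimal C++ sketch of two simple composition rules under stated assumptions: `sum_of_quantiles` adds the per-call quantiles, which is exact only if the call times are comonotonic (perfectly positively dependent), while `quantile_of_sum_independent` assumes the calls are independent, resamples each call's measured times, sums them, and takes the empirical quantile of the sums. The function names, the independence assumption, and the fixed seed are illustrative choices, not the method used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Nearest-rank empirical quantile: element at 1-based rank ceil(percent / 100 * n).
double nearest_rank_quantile(std::vector<double> samples, int percent) {
    assert(!samples.empty() && percent >= 0 && percent <= 100);
    std::sort(samples.begin(), samples.end());
    const std::size_t rank = (static_cast<std::size_t>(percent) * samples.size() + 99) / 100;
    return samples[rank == 0 ? 0 : rank - 1];
}

// Baseline 1: add the per-call quantiles (exact for comonotonic call times).
double sum_of_quantiles(const std::vector<std::vector<double>>& per_call_times, int percent) {
    double total = 0.0;
    for (const auto& times : per_call_times) total += nearest_rank_quantile(times, percent);
    return total;
}

// Baseline 2: assume independent calls. Draw one measured time per call
// (with replacement), sum them, repeat `draws` times, and return the
// empirical quantile of the simulated totals.
double quantile_of_sum_independent(const std::vector<std::vector<double>>& per_call_times,
                                   int percent, int draws, unsigned seed = 42) {
    assert(!per_call_times.empty() && draws > 0);
    std::mt19937 rng(seed);
    std::vector<double> sums;
    sums.reserve(static_cast<std::size_t>(draws));
    for (int d = 0; d < draws; ++d) {
        double total = 0.0;
        for (const auto& times : per_call_times) {
            assert(!times.empty());
            std::uniform_int_distribution<std::size_t> pick(0, times.size() - 1);
            total += times[pick(rng)];
        }
        sums.push_back(total);
    }
    return nearest_rank_quantile(sums, percent);
}
```

For the two `dgemm` calls in `Algorithm()`, `per_call_times` would hold the measured (or model-predicted) times of each call; the gap between the two baselines gives a rough sense of how much the dependence between calls matters.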

### Evaluation

> Benchmark on algorithms from computational science that require execution-time prediction

### Experiment design

The scatter plot in the first figure varies from machine to machine, and the model also has to be trained separately for each library call.

Therefore, we want to use experiment design to minimize the number of measurements without compromising the quality of the solution. This matters most for the measurements at the large-size end of the graph, which take the most time.

The required sample size can be estimated from an estimate of the variance.

<img src="pics/variance_plot.png" />

For example, for a given confidence level (with corresponding Z value), a desired precision $\epsilon$ of the sample mean, and an estimate $\sigma^2$ of the variance, the sample size $n$ is

> $n = (\frac{Z\sigma}{\epsilon})^2$
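The formula can be sketched in C++ as follows, assuming $\sigma$ is estimated from a small pilot run of measurements. This is a minimal sketch: `sample_stddev`, `required_sample_size`, and `measure_until_precise` are hypothetical names, $n$ is rounded up because it counts measurements, and $Z \approx 1.96$ corresponds to 95% two-sided confidence under a normal approximation of the sample mean. The adaptive loop re-estimates $\sigma$ as samples arrive, which makes it a heuristic rather than an exact fixed-confidence procedure.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unbiased sample standard deviation of the measurements (needs n >= 2).
double sample_stddev(const std::vector<double>& x) {
    assert(x.size() >= 2);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return std::sqrt(ss / static_cast<double>(x.size() - 1));
}

// n = (Z * sigma / epsilon)^2, rounded up to a whole number of measurements.
std::size_t required_sample_size(double z, double sigma, double epsilon) {
    assert(z > 0.0 && sigma >= 0.0 && epsilon > 0.0);
    const double r = z * sigma / epsilon;
    return static_cast<std::size_t>(std::ceil(r * r));
}

// Hypothetical adaptive loop: take `pilot` measurements, then keep measuring
// until the sample size reaches the n implied by the current sigma estimate,
// or until `max_n` measurements have been taken.
template <typename Measure>
std::vector<double> measure_until_precise(Measure measure, double z, double epsilon,
                                          std::size_t pilot, std::size_t max_n) {
    assert(pilot >= 2 && pilot <= max_n);
    std::vector<double> samples;
    while (samples.size() < pilot) samples.push_back(measure());
    while (samples.size() < max_n &&
           samples.size() < required_sample_size(z, sample_stddev(samples), epsilon)) {
        samples.push_back(measure());
    }
    return samples;
}
```

With hypothetical numbers $Z = 1.96$, $\sigma = 0.5$ ms and $\epsilon = 0.1$ ms, `required_sample_size(1.96, 0.5, 0.1)` returns 97, since $(1.96 \cdot 0.5 / 0.1)^2 = 96.04$.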

Experiment design can also be used to minimize the overall number of measurements needed for the required fit of the quantiles.

### Feedback

> What about functions other than linear algebra, such as `sin`?