# T Function

\begin{equation}
T(p,t)_{l_0l_1l_2l_3}=\sum_{x}e^{-i p\cdot x}\,\epsilon_{c_0c_1c_2c_3}V(t)_{c_0x,l_0}V(t)_{c_1x,l_1}V(t)_{c_2x,l_2}V(t)_{c_3x,l_3}
\end{equation}

\begin{align}
T_{ijkl}&=\epsilon_{abcd}V_{ai}V_{bj}V_{ck}V_{dl}\\
        &=\epsilon_{0123}V_{0i}V_{1j}V_{2k}V_{3l}
         +\epsilon_{0132}V_{0i}V_{1j}V_{3k}V_{2l}+\cdots\\
        &=V_{0i}V_{1j}V_{2k}V_{3l}-V_{0i}V_{1j}V_{3k}V_{2l}+\cdots\\
T_{ijkk}&=V_{0i}V_{1j}(V_{2k}V_{3k}-V_{3k}V_{2k})+\cdots\\
        &=0
\end{align}
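The cancellation above can be checked numerically. The following is a minimal NumPy sketch, not the production code: the helper `levi_civita`, the random matrix `V`, and the sizes are illustrative assumptions, and the spatial sum and momentum phase are left out to match the simplified derivation.

```python
import itertools
import numpy as np

def levi_civita(n):
    """Rank-n Levi-Civita tensor as a dense array (hypothetical helper)."""
    eps = np.zeros((n,) * n)
    for perm in itertools.permutations(range(n)):
        # Sign of the permutation from its inversion count.
        inversions = sum(
            1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b]
        )
        eps[perm] = -1.0 if inversions % 2 else 1.0
    return eps

rng = np.random.default_rng(0)
n_color, n_vec = 4, 6  # illustrative sizes, not taken from a real run
V = rng.standard_normal((n_color, n_vec)) + 1j * rng.standard_normal((n_color, n_vec))

eps = levi_civita(n_color)
# T_{ijkl} = eps_{abcd} V_{ai} V_{bj} V_{ck} V_{dl}
T = np.einsum("abcd,ai,bj,ck,dl->ijkl", eps, V, V, V, V)

# Exchanging the last two eigenvector indices flips the sign ...
antisymmetric = np.allclose(T, -T.swapaxes(2, 3))
# ... so every element with k == l vanishes up to rounding.
diag_max = max(np.abs(T[:, :, k, k]).max() for k in range(n_vec))
print(antisymmetric, diag_max < 1e-12)
```

The same check works for any other pair of eigenvector indices by changing the axes passed to `swapaxes`.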


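Only color-index combinations with $\epsilon_{c_0c_1c_2c_3}\neq0$ contribute to $T$, so terms with $\epsilon=0$ are skipped.  The color sum shrinks from $N_c^4$ terms $\rightarrow$ $N_c!$ terms.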

$T$ is also antisymmetric in the eigenvector indices, so only elements with four distinct indices are nonzero.  $N_v^4$ elements $\rightarrow N_v!/(N_v-4)!$
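To make the two reductions concrete, here is a small counting sketch. It assumes $N_c=4$, as implied by the four-index $\epsilon$; the function names are illustrative. It also prints the number of independent elements, $\binom{N_v}{4}$, since antisymmetry fixes every nonzero element up to a sign from one element per unordered index set.

```python
import itertools
import math

def nonzero_color_terms(n_color):
    # Index tuples with all-distinct entries are exactly the permutations.
    return sum(1 for _ in itertools.permutations(range(n_color)))

def nonzero_evec_elements(n_vec, rank=4):
    # Ordered choices of `rank` distinct eigenvector indices.
    return math.factorial(n_vec) // math.factorial(n_vec - rank)

def independent_evec_elements(n_vec, rank=4):
    # One representative per unordered set of distinct indices.
    return math.comb(n_vec, rank)

n_color = 4
print(n_color**4, nonzero_color_terms(n_color))  # 256 24

for n_vec in (4, 6, 8):
    print(
        n_vec,
        n_vec**4,
        nonzero_evec_elements(n_vec),
        independent_evec_elements(n_vec),
    )
# 4 256 24 1
# 6 1296 360 15
# 8 4096 1680 70
```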


## GPU Code

The GPU code has the following steps:

1. Loop over color indices, computing only the terms with non-zero epsilon
2. Loop over eigenvectors (evecs)
    1. Copy the evec data onto the GPU as a spatial vector
    2. Run a GPU kernel that multiplies the evecs and reduces the product to a scalar
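The loop structure above can be sketched as a CPU reference in NumPy. This is not the GPU code itself: the array layout `(n_color, n_sites, n_vec)`, the function names, and the one-dimensional phase are assumptions made for illustration, and the copy and kernel steps are replaced by plain array slicing and `np.sum`.

```python
import itertools
import numpy as np

def permutation_sign(perm):
    """Sign of a permutation from its inversion count."""
    n = len(perm)
    inversions = sum(
        1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1

def compute_T(V, phase):
    """V: (4, n_sites, n_vec) complex array; phase: (n_sites,) array."""
    n_color, n_sites, n_vec = V.shape
    T = np.zeros((n_vec,) * 4, dtype=complex)
    # Step 1: loop over color, visiting only the non-zero epsilon entries.
    for colors in itertools.permutations(range(n_color)):
        sign = permutation_sign(colors)
        # Step 2: loop over eigenvector index tuples.
        for evecs in itertools.product(range(n_vec), repeat=4):
            # Stand-in for step 2.1: pick out the four spatial vectors.
            v0, v1, v2, v3 = (V[c, :, l] for c, l in zip(colors, evecs))
            # Stand-in for step 2.2: pointwise multiply, reduce to a scalar.
            T[evecs] += sign * np.sum(phase * v0 * v1 * v2 * v3)
    return T

# Small illustrative input (sizes and momentum are arbitrary).
rng = np.random.default_rng(1)
n_sites, n_vec, p = 5, 4, 0.5
V = rng.standard_normal((4, n_sites, n_vec)) + 1j * rng.standard_normal((4, n_sites, n_vec))
phase = np.exp(-1j * p * np.arange(n_sites))

T = compute_T(V, phase)
print(np.isclose(T[0, 1, 2, 3], -T[0, 1, 3, 2]))
```

Restricting the inner loop to tuples of distinct eigenvector indices would skip the elements that vanish by antisymmetry; the sketch keeps the full product so the zeros can be inspected.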

| Kernel Part | Time (µs) |
|-------------|--------------------|
| Setup/Data transfer | 120 |
| Multiply & Reduce | 30 |



## L=32 Baryon Correlators

Speedup to-do list:

1.  Transfer all eigenvectors onto GPU at start (reduce setup time)
2.  Better reduction (reduce kernel time)
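A rough bound on what the first item could gain follows from the kernel timing table in the GPU Code section. The estimate assumes the two rows are sequential costs paid on every call and that the whole setup cost could be moved out of the loop, which the measurements do not confirm.

```python
setup_us = 120.0   # "Setup/Data transfer" row
kernel_us = 30.0   # "Multiply & Reduce" row

total_us = setup_us + kernel_us
setup_fraction = setup_us / total_us

# Upper bound on the per-call speedup if setup vanished entirely (assumption).
max_speedup_no_setup = total_us / kernel_us

print(f"{setup_fraction:.0%} {max_speedup_no_setup:.1f}x")  # 80% 5.0x
```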

### Timing Info

* All times are in milliseconds
* The Nvec=8 run hit the time limit on the debug node, so its last two columns are missing
* "Scaling T" is the number of nonzero elements of T relative to $N_v=4$

| Nvec | Compute T | Scaling T | Compute B | Compute Bprop | Evaluate Diagrams |
|------|-----------|-----------|-----------|---------------|-------------------|
| 4 | $1.8\times10^5$ | 1 | 228 | $7.3\times10^4$ | $3.6\times10^5$ |
| 6 | $1.5\times10^6$ | 15 | 1104 | $4.3\times10^5$ | $6.9\times10^5$ |
| 8 | $6.8\times10^6$ | 70 | 3696 | ... | ... |
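As a quick cross-check, the measured "Compute T" times can be compared with the element-count scaling. The sketch below only restates the table's numbers; the dictionary and function names are illustrative, and it does not attempt to explain why the two ratios differ.

```python
import math

# "Compute T" column of the table above, in milliseconds.
compute_T_ms = {4: 1.8e5, 6: 1.5e6, 8: 6.8e6}

def nonzero_elements(n_vec):
    # Nonzero elements of T in eigenvector space: n_vec! / (n_vec - 4)!
    return math.factorial(n_vec) // math.factorial(n_vec - 4)

for n_vec, t_ms in compute_T_ms.items():
    measured = t_ms / compute_T_ms[4]
    counted = nonzero_elements(n_vec) / nonzero_elements(4)
    print(n_vec, round(measured, 1), counted)
# 4 1.0 1.0
# 6 8.3 15.0
# 8 37.8 70.0
```

In both cases the measured time grows more slowly than the element count by a factor of roughly 1.8.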



### Numerical Values

| Nvec | t | C(t) |
|------|---|------|
| 4 | 0 | 6.88748194e-10-4.43434764e-14j | 
| 6 | 0 | 2.28479330e-09-1.70586292e-12j | 
|-|-|-|
| 4 | 1 | -1.35823441e-22-4.98618281e-22j |
| 6 | 1 | -1.49544234e-21-1.45096305e-22j |
|-|-|-|
| 4 | 2 | 5.01071221e-23-3.52351481e-23j |
| 6 | 2 |  -1.38497310e-23+5.44879980e-23j |

In [5]:
import math

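# Number of nonzero elements of T in eigenvector space: nvec!/(nvec-4)!
def Tevec_elements(nvec):
    # Integer division keeps the count exact; the ratio printed below is still a float.
    return math.factorial(nvec)//math.factorial(nvec-4)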

for nvec in [4,6,8]:
    print(Tevec_elements(nvec)/Tevec_elements(4))

1.0
15.0
70.0


In [1]:
5000000/200000

25.0

In [4]:
2**4

16

In [5]:
5000/60

83.33333333333333

In [9]:
(32**4)*64*(2*8)/1000/1000

1073.741824