# Discriminability for EEG Data

Discriminability is an overall measure of the intra-subject consistency and the inter-subject differentiation in a test-retest type experiment with respect to a given metric $\delta$. Once $\delta$ is fixed, the discriminability is $$D(\Psi, \Phi) = \mathbb{P}\left(\delta(x_{it}, x_{it'}) < \delta(x_{it}, x_{i't''})\right)$$
where $\Psi$ are the parameters of preprocessing, $\Phi$ are the parameters of the measurement technique, and $x_{it}$ is the measurement from trial $t$ of subject $i$ (with $i' \neq i$ and $t' \neq t$).

In our specific case, we do not get to change $\Phi$, so we just need to find the parameters $\Psi$ that maximize the discriminability of the EEG data.

Pseudocode of the discriminability function:
[Pseudocoooooode!](https://github.com/NeuroDataDesign/orange-panda/blob/master/notes/discriminability/discriminability.pdf)

In [19]:
#%matplotlib inline
#import matplotlib.image as mpimage
#img = mpimage.imread("Selection_006.png")
#import matplotlib.pyplot as plt
#plt.axis("off")
#plt.imshow(img)

Now let's implement a shell of this in Python, keeping the distance function modular because we do not yet know which one we will use.

The variable `eeg_data` can be considered a (C, N, T, S) `ndarray` object where:
* C = the number of channels (111)
* N = the number of timesteps (around 170 - 220k)
* T = the number of trials (depends on the task, anywhere from 1 to 10)
* S = the number of subjects (we have around 120, but we are only doing analysis on ~60 right now).
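
To make the axis layout concrete, here is a minimal sketch that builds a small synthetic array with the same (C, N, T, S) ordering. The helper `make_synthetic_eeg`, its default sizes, and the "subject signature plus trial noise" model are assumptions made for illustration only; they are not part of the real dataset or pipeline.

```python
import numpy as np

def make_synthetic_eeg(C=4, N=50, T=3, S=5, noise=0.1, seed=0):
    """Hypothetical helper: build a small (C, N, T, S) array for testing."""
    rng = np.random.RandomState(seed)
    # One fixed "signature" per subject, shared by all of that subject's trials.
    signatures = rng.randn(C, N, 1, S)
    # Independent noise for every trial of every subject.
    trial_noise = noise * rng.randn(C, N, T, S)
    # The size-1 trial axis of `signatures` broadcasts across all T trials.
    return signatures + trial_noise

eeg_small = make_synthetic_eeg()
print(eeg_small.shape)               # (4, 50, 3, 5)
print(eeg_small[:, :, 0, 2].shape)   # (4, 50): trial 0 of subject 2
```

Indexing with `[:, :, t, s]` returns the full channels-by-timesteps matrix for one trial of one subject, which is the unit that a distance function compares.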

In [38]:
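def rdf(eeg_data, delta, s):
    """Fraction of comparisons for subject s with intra-subject distance < inter-subject distance."""
    T = eeg_data.shape[2] # Number of trials
    S = eeg_data.shape[3] # Number of subjects
    if S < 2 or T < 2:
        raise ValueError("rdf needs at least 2 subjects and 2 trials")
    total_true = 0
    for s_ in range(S):
        if s_ == s:
            continue # Only compare against other subjects
        for t in range(T):
            for t_ in range(T):
                if t_ == t:
                    continue # Only compare distinct trials of subject s
                intra = delta(eeg_data[:, :, t, s], eeg_data[:, :, t_, s])
                # Note: the other subject is sampled at the same trial index t
                inter = delta(eeg_data[:, :, t, s], eeg_data[:, :, t, s_])
                total_true += int(intra < inter)
    # Normalize by the number of (s_, t, t_) comparisons made above
    return float(total_true) / ((S-1) * T * (T-1))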

def disc(eeg_data, delta):
    """Discriminability: the mean of rdf over all subjects."""
    S = eeg_data.shape[3] # Number of subjects
    tot = 0
    for s in range(S):
        tot += rdf(eeg_data, delta, s)
    return float(tot) / S
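
In `rdf`, the inter-subject term compares trial `t` of subject `s` only with trial `t` of the other subject, whereas the definition of $D$ lets $t''$ range over every trial of subject $i'$. The following is a sketch of a variant that enumerates all such comparisons. The name `disc_all_pairs` is hypothetical, and this is one possible reading of the definition rather than the method settled on here.

```python
import numpy as np

def disc_all_pairs(eeg_data, delta):
    """Sketch: fraction of (i, t, t', i', t'') tuples with intra < inter distance."""
    T, S = eeg_data.shape[2], eeg_data.shape[3]
    hits, total = 0, 0
    for s in range(S):
        for t in range(T):
            x = eeg_data[:, :, t, s]
            for t_ in range(T):
                if t_ == t:
                    continue  # intra-subject: distinct trials only
                intra = delta(x, eeg_data[:, :, t_, s])
                for s_ in range(S):
                    if s_ == s:
                        continue  # inter-subject: other subjects only
                    for t2 in range(T):  # t'' ranges over every trial
                        inter = delta(x, eeg_data[:, :, t2, s_])
                        hits += int(intra < inter)
                        total += 1
    if total == 0:
        raise ValueError("need at least 2 subjects and 2 trials per subject")
    return float(hits) / total

# Two subjects (all zeros, all ones), 2 channels, 3 timesteps, 2 trials.
zeros_and_ones = np.stack([np.zeros((2, 3, 2)), np.ones((2, 3, 2))], axis=3)
indicator = lambda a, b: 0 if np.array_equal(a, b) else 1
print(disc_all_pairs(zeros_and_ones, indicator))  # 1.0
```

Counting the comparisons as they are made, instead of using a closed-form denominator, keeps the normalization correct if the loop structure changes later.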

Basic testing: let's use 4 channels, 4 timesteps, 4 trials, and 2 subjects.

In [39]:
import numpy as np
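# Subject 0 is all zeros and subject 1 is all ones, each with shape (C, N, T) = (4, 4, 4)
one = np.zeros([4, 4, 4])
two = np.ones([4, 4, 4])
# Stack the two subjects along a new last axis to get shape (4, 4, 4, 2)
eeg_data = np.concatenate([one[..., np.newaxis], two[..., np.newaxis]], axis=3)
print(eeg_data[:, :, :, 0])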

[[[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]]


Now let's see the discriminability:

In [40]:
def distance(arr1, arr2):
    # Toy 0/1 distance: 0 if the two arrays are identical, 1 otherwise
    if np.array_equal(arr1, arr2):
        return 0
    return 1

print(disc(eeg_data, distance))

1.0
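
The 0/1 distance above is only a sanity check. As a sketch of what a continuous candidate for $\delta$ might look like, the snippet below uses the Frobenius norm of the difference between two channels-by-timesteps matrices. The name `frobenius_distance` is hypothetical, and choosing this norm is an assumption for illustration; the metric to use is still undecided.

```python
import numpy as np

def frobenius_distance(arr1, arr2):
    """Hypothetical candidate for delta: Frobenius norm of the difference."""
    # For 2-D input, np.linalg.norm defaults to the Frobenius norm.
    return float(np.linalg.norm(arr1 - arr2))

a = np.zeros((2, 2))
b = np.ones((2, 2))
print(frobenius_distance(a, a))  # 0.0
print(frobenius_distance(a, b))  # 2.0
```

Because it takes two arrays and returns a single number, a function like this has the same signature as `distance` and could be passed to `disc` in its place.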
