### Step 1: Pseudocode

[LaTeX as well as explanation.](https://github.com/NeuroDataDesign/orange-panda/blob/master/notes/bad_chan_detect/baddetec/kernel-probability-density.pdf) I will be using a sci-kit implementation as well as potentially my own implementation (and comparing results).
***************************

### Step 2: Simulations & Details of Their Parameters

With a given bandwith and kernel function, have function to predict the kernel density

***Success:***
- Define 1D data set with 10 of same point, all probabilities should be 1
- Define 1D data set with 10 sequential points, all probabilities from distribution should be equal
- Define 1D data set of random data for sanity check

**********************

### Step 3: Choose Visualization

Scatter plots of each distribution and each set of points, as well as scatter plots for my own function's version of the distribution.
**********

### Step 4: Specify Metrics for Evaluating Performance
We will use the truth function, which returns 1 if values match and 0 if they do not. If they don't, then we know something's wrong in my understanding/the scikit implementation.

***

### Step 5: Write Code Generating Simulated Data

In [7]:
import numpy as np
from scipy.stats import norm

# Fix random seed
np.random.seed(123456789)

In [8]:
def simulate_1():
    # Initialize observations
    return np.zeros(10)

x_1 = simulate_1()
#x_1 = np.vstack((range(0, len(x_1)), x_1))
#print x_1

In [9]:
def simulate_2():
    # Initialize observations
    return np.array(range(0,10))

x_2 = simulate_2()
#print x_2

In [10]:
def simulate_3():
    # Initialize observations
    return np.random.normal(size=500)

x_3 = simulate_3()
#print x_3

In [11]:
# Sim 4 should be the accurate representation of 3
def simulate_4():
    # initialize obvs
    return norm(0, 1)

x_4 = simulate_4()
#print x_4

***

### Step 6: Plot Simulated Data

In [12]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot
from plotly.graph_objs import *
from plotly import tools
init_notebook_mode()

In [45]:
def get_lin(indata, numvals):
    return np.linspace(min(indata), max(indata), numvals)

In [14]:
# Setup plotly data

sim1data = Scatter(
    x = range(0, len(x_1)),
    y = x_1,
    name = 'Zero Sample'
)

sim2data = Scatter(
    x = range(0, len(x_2)),
    y = x_2,
    name = 'Linear Sample'
)

data = [sim1data, sim2data]

# Setup layout

layout = dict(title = 'Simulated Data',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Unit'),
              )

# Make figure object

fig = dict(data=data, layout=layout)

iplot(fig)

In [15]:
sim3data = Scatter(
    x = range(0, len(x_3)),
    y = x_3,
    name = 'Random Sample'
)

data = [sim3data]

# Setup layout

layout = dict(title = 'Samples from Normal Data',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Unit'),
              )

# Make figure object

fig = dict(data=data, layout=layout)

iplot(fig)

In [16]:
samplin = get_lin(x_3, 5000)

sim4data = Scatter(
    x = samplin,
    y = x_4.pdf(samplin),
    name = 'Random Sample'
)

data = [sim4data]

# Setup layout

layout = dict(title = 'Standard Normal Dsitribution',
              xaxis = dict(title = 'Unit'),
              yaxis = dict(title = 'Density'),
              )

# Make figure object

fig = dict(data=data, layout=layout)

iplot(fig)

***

### Step 7: Write Algorithm Code

Via scikit

In [17]:
from sklearn.neighbors.kde import KernelDensity
# Run the KDE!
def gaus_kde(indata, bw):
    return KernelDensity(kernel='gaussian', bandwidth=bw).fit(indata[:, np.newaxis])

def hat_kde(indata, bw):
    return KernelDensity(kernel='tophat', bandwidth=bw).fit(indata[:, np.newaxis])

def epan_kde(indata, bw):
    return KernelDensity(kernel='epanechnikov', bandwidth=bw).fit(indata[:, np.newaxis])


***

### Step 8: Write Qualitative Evaluation Code

In [18]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot
from plotly.graph_objs import *
from plotly import tools
init_notebook_mode()

In [55]:
def eval_qual(indata, bw):
    indata = indata
    lindata = np.linspace(min(indata) - 2 , max(indata) + 2, 5000)
    
    # Setup plotly data
    gausdata = Scatter(
        x = lindata,
        y = np.exp(gaus_kde(indata, bw).score_samples(lindata[:, np.newaxis])),
        name = 'Gaussian Normalized Density'
    )
    
    hatdata = Scatter(
        x = lindata,
        y = np.exp(hat_kde(indata, bw).score_samples(lindata[:, np.newaxis])),
        name = 'Tophat Normalized Density'
    )
    
    epandata = Scatter(
        x = lindata,
        y = np.exp(epan_kde(indata, bw).score_samples(lindata[:, np.newaxis])),
        name = 'Epanechikov Normalized Density'
    )
    
    
    data = [gausdata, hatdata, epandata]

    # Setup layout
    layout = dict(title = 'Kernelled Simulated Data ' + str(bw) + ' bandwidth',
                  xaxis = dict(title = 'Unit'),
                  yaxis = dict(title = 'Normalized Density'),
                  )

    # Make figure object
    fig = dict(data=data, layout=layout)

    iplot(fig)

***

### Step 9: Write Quantitative Eval
Check if integral of probs = 1

In [56]:
def eval_quant(indata, bw, kdefunc):
    lindata = np.linspace(min(indata) - 3, max(indata) + 3, 5000)
    integral = np.sum(np.exp(kdefunc(indata, bw).score_samples(lindata[:, np.newaxis])))/(5000 * bw)
    print integral
    if (integral == 1):
        print "Success!"
    else:
        print "Fail"

***
### Step 10: Run Qualitative Eval

In [57]:
eval_qual(x_1, 0.25)
eval_qual(x_1, 0.5)
eval_qual(x_1, 0.75)

In [58]:
eval_qual(x_2, 0.25)
eval_qual(x_2, 0.5)
eval_qual(x_2, 0.75)

In [59]:
eval_qual(x_3, 0.25)
eval_qual(x_3, 0.5)
eval_qual(x_3, 0.75)

***
### Step 11: Quant Eval

In [63]:
eval_quant(x_1, .5, gaus_kde)
eval_quant(x_1, .5, hat_kde)
eval_quant(x_1, .5, epan_kde)

0.333266666014
Fail
0.3336
Fail
0.333266406688
Fail


In [64]:
eval_quant(x_2, .5, gaus_kde)
eval_quant(x_2, .5, hat_kde)
eval_quant(x_2, .5, epan_kde)

0.133306666641
Fail
0.13328
Fail
0.133306706608
Fail


In [65]:
eval_quant(x_3, .5, gaus_kde)
eval_quant(x_3, .5, hat_kde)
eval_quant(x_3, .5, epan_kde)

0.158115043242
Fail
0.1581104
Fail
0.158115043369
Fail


### Step 12: Analyze Evals

Skipped quantitative run because no quantitative measure. However, as seen in above, especially for dataset 2, we get a kernelled distribution for each value. For the vector of 0's, we get a single point at 0. For the sequential vectors we get probabilities centered around each value chosen. For the normal vector, we get approximately a normal distribution.