Choosing _eps_ and *minPts* parameters for DBSCAN
=================================

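DBSCAN requires two parameters: _eps_, the radius of the neighbourhood around each point, and _minPts_, the minimum number of points in that neighbourhood for a point to count as a core point.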

Some ideas to explore
--------------------------
1. *Knee* selection algorithm
2. OPTICS algorithm
3. BDE-DBSCAN

The idea behind the *'knee'* selection process:
----------------------------------
1. We look at the distance from each point to its $k^{th}$ nearest neighbour. We will call this distance $\mid k\mid$.
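2. By calculating $\mid k\mid$ for a selected value of $k$ and sorting the results in increasing order, we obtain a curve $f$ that stays nearly flat for points inside clusters and rises sharply for noise points. An appropriate value for _eps_ is the value of $\mid k\mid$ at the 'knee' between these two regimes, where the curvature of $f$ is greatest.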

Source: *"Selection of DBSCAN Parameters"*, http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf
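
The sorted $k$-distance curve can be sketched with NumPy alone. This is a minimal illustration, not the method from the source above: the function names `k_distances` and `knee_value` are hypothetical, the toy data is made up, and the knee is located with a simple "farthest below the chord" heuristic, which is only one of several ways to find the sharp bend.

```python
import numpy as np

def k_distances(dist_matrix, k):
    """Return every point's distance to its k-th nearest neighbour, sorted ascending."""
    # After sorting a row, column 0 is the point itself (distance 0),
    # so column k holds the k-th nearest neighbour.
    sorted_rows = np.sort(np.asarray(dist_matrix, dtype=float), axis=1)
    return np.sort(sorted_rows[:, k])

def knee_value(sorted_kdist):
    """Return the k-distance that lies farthest below the chord joining the curve's end points."""
    y = np.asarray(sorted_kdist, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Straight line from the first to the last point of the sorted curve.
    chord = y[0] + (y[-1] - y[0]) * x / (len(y) - 1)
    return y[np.argmax(chord - y)]

# Toy data (made up): two tight 1-D clusters plus two outliers.
points = np.concatenate([np.arange(10) * 0.1, 5 + np.arange(10) * 0.1, [20.0, 30.0]])
dist_matrix = np.abs(points[:, None] - points[None, :])

kdist = k_distances(dist_matrix, k=3)
eps = knee_value(kdist)
print(eps)
```

On this toy data the knee falls at roughly 0.3, the largest $k$-distance found inside the two clusters, just before the jump to the outliers. The function takes a precomputed distance matrix so that it matches the `metric='precomputed'` convention used later in this notebook.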

The idea behind the OPTICS algorithm:
------------------------------------
**OPTICS** stands for *Ordering Points To Identify the Clustering Structure*.

*TODO*

The idea behind BDE-DBSCAN
--------------------------
BDE-DBSCAN uses a differential evolution algorithm to search for *eps* and *minPts* values, with a cluster purity function serving as the *fitness* of each candidate in the population.

Purity function: http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html#eqn:purity

BDE-DBSCAN: http://research.ijcaonline.org/volume91/number7/pxc3895059.pdf
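
As a rough illustration of the fitness measure, purity can be computed by assigning each cluster to its most frequent ground-truth class and dividing the number of correctly assigned points by the total. The sketch below is not taken from the BDE-DBSCAN paper; the function name `purity` and the example labels are assumptions, and DBSCAN's noise label (-1) is simply treated as one more cluster here.

```python
import numpy as np

def purity(cluster_labels, class_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    majority_total = 0
    for cluster in np.unique(cluster_labels):
        # Ground-truth classes of the points assigned to this cluster.
        members = class_labels[cluster_labels == cluster]
        _, counts = np.unique(members, return_counts=True)
        majority_total += counts.max()
    return majority_total / len(cluster_labels)

# Made-up example: cluster 0 holds classes a, a, b; cluster 1 holds b, b, b.
example_clusters = [0, 0, 0, 1, 1, 1]
example_classes = ['a', 'a', 'b', 'b', 'b', 'b']
score = purity(example_clusters, example_classes)
print(score)
```

Purity needs ground-truth class labels, so it can only guide the parameter search on data where such labels are available.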

In [11]:
########## Knee selection #############

# Pseudocode:

# Calculate the k-th nearest-neighbour distance for each point
# Plot the distances on a histogram

# Choose eps value, rounded up

# Given the eps value, for each point, calculate number of neighbours.
# Plot out the number of neighbours.
# Choose min_pts
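
The second half of the pseudocode above (counting neighbours once an eps value is fixed) might look like the sketch below. The function name `neighbour_counts` and the toy points are assumptions for illustration; how to turn the counts into a final min_pts value is left open, as in the pseudocode.

```python
import numpy as np

def neighbour_counts(dist_matrix, eps):
    """Count, for each point, how many other points lie within eps of it."""
    within_eps = np.asarray(dist_matrix) <= eps
    # Subtract 1 so that a point does not count itself (its own distance is 0).
    return within_eps.sum(axis=1) - 1

# Toy data (made up): three nearby points and one isolated point.
toy_points = np.array([0.0, 0.1, 0.2, 5.0])
toy_dist = np.abs(toy_points[:, None] - toy_points[None, :])

counts = neighbour_counts(toy_dist, eps=0.15)
print(counts)
# Frequency of each neighbour count, ready to be plotted as a bar chart.
print(np.bincount(counts))
```

Note that scikit-learn's `min_samples` parameter counts the point itself, so a value chosen from these counts would need 1 added before being passed to `DBSCAN`.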


In [12]:
import pandas as pd
import numpy as np
import helper_adl as h
import matplotlib.pyplot as plt
import time
from sklearn.cluster import DBSCAN

In [13]:
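# Runs sklearn's DBSCAN and returns its labels, core-sample vectors and core-sample indices.
# With metric='precomputed', X must be a square matrix of pairwise distances.
def dbscan(eps, min_pts, X, metric='precomputed'):
    # Pass arguments by keyword: recent sklearn versions reject positional min_samples/metric
    db = DBSCAN(eps=eps, min_samples=min_pts, metric=metric)
    db.fit(X)
    return db.labels_, db.components_, db.core_sample_indices_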

# TODO:
# 1. Research on methods to identify EPS.
# 2. Implement that method
def calculate_eps():
    pass

def configure_polar_plot(axes):
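    # One tick per hour of the day, spread evenly around the full circle
    xticklabels = [str(x) for x in range(24)]
    # Divide by a float so the result is not truncated by integer division
    axes.set_xticks(np.arange(24) / 24.0 * (2 * np.pi))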
    axes.set_xticklabels(xticklabels)
    axes.set_theta_zero_location('N')
    axes.set_theta_direction(-1)
    axes.set_ylim([0,1])
    axes.set_yticks([0.8,0.6])
    axes.set_yticklabels(['Sleep-Start', 'Sleep-End'])
    
    return axes

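# Plots each cluster on the polar axes at radius `ring`;
# noise points (label -1) are drawn as black crosses.
def plot_clusters(labels, rad_tseries, axes, ring):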
    unique_labels = set(labels)
    colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
    for k,col in zip(unique_labels, colors):
        marker = '.'
        ms = 30.0
        if k == -1:
            col='k'
            marker = 'x'
            ms = 8.0

        indices_of_k, = np.where(labels == k)
        data = rad_tseries.take(indices_of_k)
        axes.plot(data, [ring for x in data], color=col, marker=marker, linestyle='none', ms=ms, mec='k')

In [14]:
# Code
# Start timer:
time_now = time.time()

### Globals ###
file_dir = 'sleep'
# Perform clustering on files 1 to 9
# indexes = [str(x) for x in range(1,10)]
index = '1'

file_name = ''.join([file_dir,'/','sleep-aggregate_2015-07_S00'+index+'.csv'])
# print file name
# print('File name: ', file_name)

# Load the sensor readings into a dataframe, keeping only CSV columns 1 to 6
df = pd.read_csv(file_name, delimiter=',', usecols=list(range(1, 7)), parse_dates=[1])
column_names = list(df.columns.values)
print(column_names)

# X is a distance matrix.
# Set 'X1' as sleep_start timings
X1,X1_rad_series = h.get_x_from_df(df['sleep_start'])

# Set 'X2' as sleep_end timings
X2,X2_rad_series = h.get_x_from_df(df['sleep_end'])

['sleep_start', 'sleep_end', 'sleep_duration', 'num_interrupts', 'sleep_start_date', 'sleep_end_date']
[[ 0.          2.29648969  1.22216681 ...,  2.42840749  2.99614855
   0.47363873]
 [ 2.29648969  0.          2.76452881 ...,  0.1319178   0.99054707
   2.77012841]
 [ 1.22216681  2.76452881  0.         ...,  2.63261101  1.77398174
   0.74852808]
 ..., 
 [ 2.42840749  0.1319178   2.63261101 ...,  0.          0.85862927
   2.90204621]
 [ 2.99614855  0.99054707  1.77398174 ...,  0.85862927  0.          2.52250982]
 [ 0.47363873  2.77012841  0.74852808 ...,  2.90204621  2.52250982  0.        ]]
