# QBEAST, an enhanced algorithm for determining the distance of samples from a training set.

QBEST is an algorithm for determining the distance between a sample set of data and a training set of data. With other algorithms, a small training set can lead to every test spectrum being flagged as an outlier.

The QBEAST function takes a training set and a test set and returns SDS and SDSKEW, which are measurements of the distance between the two sets of data.

In [60]:
from numpy import genfromtxt
from Bootstrap import Bootstrap
import numpy as np
from math import sqrt
from numpy import matlib
from copy import deepcopy
from scipy.stats import norm
from scipy.spatial.distance import mahalanobis
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from QBEST import QBEAST
from random import randint

In [54]:
tnspec = genfromtxt("tnspec.csv",delimiter=",").T
newspec = genfromtxt("newspec.csv",delimiter=",")

print(tnspec)
print(newspec)

[[ 0.5377  1.8339 -2.2588  0.8622  0.3188 -1.3077 -0.4336  0.3426  3.5784
   2.7694  0.8404 -0.888   0.1001 -0.5445  0.3035 -0.6003  0.49    0.7394
   1.7119 -0.1941]
 [-1.3499  3.0349  0.7254 -0.0631  0.7147 -0.205  -0.1241  1.4897  1.409
   1.4172 -2.1384 -0.8396  1.3546 -1.0722  0.961   0.124   1.4367 -1.9609
  -0.1977 -1.2078]
 [ 0.6715 -1.2075  0.7172  1.6302  0.4889  1.0347  0.7269 -0.3034  0.2939
  -0.7873  2.908   0.8252  1.379  -1.0582 -0.4686 -0.2725  1.0984 -0.2779
   0.7015 -2.0518]
 [ 0.8884 -1.1471 -1.0689 -0.8095 -2.9443  1.4384  0.3252 -0.7549  1.3703
  -1.7115 -0.3538 -0.8236 -1.5771  0.508   0.282   0.0335 -1.3337  1.1275
   0.3502 -0.2991]
 [-0.1022 -0.2414  0.3192  0.3129 -0.8649 -0.0301 -0.1649  0.6277  1.0933
   1.1093  0.0229 -0.262  -1.7502 -0.2857 -0.8314 -0.9792 -1.1564 -0.5336
  -2.0026  0.9642]
 [-0.8637  0.0774 -1.2141 -1.1135 -0.0068  1.5326 -0.7697  0.3714 -0.2256
   1.1174  0.5201 -0.02   -0.0348 -0.7982  1.0187 -0.1332 -0.7145  1.3514
  -0.2248 -0.589 ]

# Looking at Mahalanobis distance

Let's take a look at a sample set of data using a 10x20 matrix.

In [75]:
vi = np.linalg.inv(np.cov(tnspec.T))
colmean = np.mean(tnspec,0)
print(mahalanobis(colmean,newspec,vi))

667220592.532741
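
A distance this large is what one would expect when there are fewer training samples than variables: with 10 rows and 20 columns, the sample covariance matrix has rank at most 9, so it is singular and its computed inverse is not numerically meaningful. The sketch below illustrates this on synthetic data; the array `train` and the seed are assumptions standing in for `tnspec`, not the notebook's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary fixed seed
train = rng.standard_normal((10, 20))     # synthetic stand-in: 10 samples, 20 variables

cov = np.cov(train.T)                     # np.cov treats rows as variables -> 20x20
rank = np.linalg.matrix_rank(cov)         # bounded by n_samples - 1 = 9
cond = np.linalg.cond(cov)                # very large (or inf) for a singular matrix

print(cov.shape, rank, cond)
```

A rank well below 20 together with a very large condition number indicates that `np.linalg.inv` cannot be trusted on this matrix, which is one reason a plain Mahalanobis distance behaves poorly on small training sets.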


# Looking at QBEST

Now, let's see if we can determine a more appropriate distance between the two sets of data.

In [57]:
sds, sdskew = QBEAST(tnspec,newspec)

In [58]:
print(sds, sdskew)

25.524260582146162 10.177075386901555


Let's make a newspec from data taken from the original training set to see if the SDS decreases.


In [73]:
sample = []
for i in range(len(tnspec[0])):
    sample.append(tnspec[randint(0,9)][randint(0,19)])
print(sample)

[-0.7145, 0.8404, 1.5442, 0.8622, -2.0026, -0.3349, -0.1241, -1.1658, -0.7549, 0.6601, -1.1471, 0.5377, -0.7648, -0.0301, 1.1006, 0.7015, 1.7119, 1.409, -0.2097, 0.1049]
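
Because `randint` is unseeded, each run of the cell above produces a different sample. If a repeatable draw is wanted, the same idea can be sketched with NumPy's seeded generator. The array `train`, the seed value, and the variable names are assumptions used for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)           # fixed seed (arbitrary choice)
train = rng.standard_normal((10, 20))     # synthetic stand-in for tnspec

# Draw one value per column position, each taken from a random cell of the matrix.
n_rows, n_cols = train.shape
rows = rng.integers(0, n_rows, size=n_cols)   # upper bound is exclusive
cols = rng.integers(0, n_cols, size=n_cols)
sample = train[rows, cols]

print(sample.shape)
```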


In [74]:
sds, sdskew = QBEAST(tnspec,sample)
print(sds, sdskew)

4.152186641488899 1.678534713457381


# Potential Errors
## 1. Passing a newspec that does not meet the dimension requirements

In [78]:
test = [1,1,1,1,1,1,1,1]
sds, sdskew = QBEAST(tnspec,test)

ValueError: operands could not be broadcast together with shapes (8,) (20,) 
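
The broadcasting error above comes from combining an 8-element array with 20-element rows. A minimal sketch of an up-front check follows, assuming that `newspec` must supply one value per column of `tnspec`, as the shapes in the error message suggest. `check_newspec` is a hypothetical helper and is not part of the QBEST module.

```python
import numpy as np

def check_newspec(tnspec, newspec):
    """Hypothetical helper: confirm newspec has one value per training column."""
    tnspec = np.asarray(tnspec, dtype=float)
    newspec = np.asarray(newspec, dtype=float)
    n_vars = tnspec.shape[1]              # number of columns in the training set
    if newspec.ndim != 1 or newspec.shape[0] != n_vars:
        raise ValueError(
            f"newspec has shape {newspec.shape}, expected ({n_vars},)"
        )
    return newspec
```

Calling `check_newspec(tnspec, test)` before `QBEAST(tnspec, test)` would raise a `ValueError` that names the expected length, rather than failing inside the distance computation.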