In [3]:
import numpy as np

<p><b>Softmax</b></p>
<p>The softmax function takes a vector $z$ of $K$ real numbers and normalizes it into a probability distribution of $K$ probabilities proportional to the exponentials of the inputs. Before softmax is applied, the components may be negative or greater than one and need not sum to 1. Afterwards, every component lies in the interval $(0,1)$ and the components sum to 1, so they can be interpreted as probabilities. Larger inputs map to larger probabilities.</p>
<p>In the formulas that follow, $p$ is the softmax of a vector $o$, $y$ is the softmax of a vector $m$, and $L$ is the cross-entropy between $y$ and $p$:</p>
$$ p_i = \frac{e^{o_i}}{\sum\limits_{j} e^{o_j}}$$
$$ y_i = \frac{e^{m_i}}{\sum\limits_{j} e^{m_j}}$$
$$L = -\sum\limits_{j} y_j \ln{p_j}$$

In [15]:
def softmax(x):
    # Subtracting the maximum leaves the result unchanged but avoids overflow in np.exp.
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

In [16]:
#Example
x = np.array([1,2,3])
softmax(x)

array([0.09003057, 0.24472847, 0.66524096])
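
<p>The subtraction of the maximum inside <code>softmax</code> can be illustrated with a small sketch. The names <code>softmax_naive</code> and <code>softmax_stable</code> are hypothetical and used only for this comparison: adding a constant to every input leaves the output unchanged, while a direct transcription of the formula overflows for large inputs.</p>

```python
import numpy as np

def softmax_naive(x):
    # Hypothetical helper: direct transcription of the formula, no shift.
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # Same function with the maximum subtracted before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])

# Adding a constant to every input does not change the output.
print(np.allclose(softmax_stable(x), softmax_stable(x + 100.0)))

# Both versions agree on moderate inputs.
print(np.allclose(softmax_naive(x), softmax_stable(x)))

big = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over='ignore', invalid='ignore'):
    # exp(1000) overflows to inf, so the naive ratio becomes inf/inf = nan.
    naive_big = softmax_naive(big)
stable_big = softmax_stable(big)
print(np.isnan(naive_big).any(), stable_big)
```

<p>Since <code>big</code> is <code>x</code> shifted by a constant, the stable version returns the same probabilities for both.</p>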

$$ \frac{\partial p_i }{\partial o_k} = 
\frac{\delta_{ik} e^{o_i} \sum\limits_{j} e^{o_j} - e^{o_i}e^{o_k} }{\left(\sum\limits_{j} e^{o_j}\right)^2}=
\delta_{ik} p_k - p_i p_k $$
$$\frac{\partial L}{\partial o_k} = -\sum\limits_{j} 
 \frac{y_j}{p_j}\left(\delta_{kj} p_k - p_j p_k \right) = 
-y_k + \sum\limits_{j} y_j p_k =
-y_k + p_k\sum\limits_{j} y_j =
-y_k+p_k$$
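
<p>One way to gain confidence in the derivation is a finite-difference check. The following is a minimal sketch, not part of the original notebook: the helpers <code>loss</code> and <code>numerical_gradient</code> and the step size <code>eps</code> are assumptions made for illustration. It compares $p - y$ and $\delta_{ik} p_k - p_i p_k$ against central differences.</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def loss(o, y):
    # Cross-entropy L = -sum_j y_j ln p_j with p = softmax(o).
    return -y.dot(np.log(softmax(o)))

def numerical_gradient(o, y, eps=1e-6):
    # Hypothetical helper: central differences, one coordinate of o at a time.
    grad = np.zeros_like(o)
    for k in range(o.size):
        step = np.zeros_like(o)
        step[k] = eps
        grad[k] = (loss(o + step, y) - loss(o - step, y)) / (2 * eps)
    return grad

o = np.array([2.0, 4.0, 8.0])
y = softmax(np.array([1.5, 2.5, 3.0]))
p = softmax(o)

# Analytic gradient from the derivation: dL/do_k = p_k - y_k.
analytic = p - y
numeric = numerical_gradient(o, y)
print(np.max(np.abs(analytic - numeric)))

# Analytic Jacobian: dp_i/do_k = delta_ik p_k - p_i p_k.
jacobian = np.diag(p) - np.outer(p, p)
eps = 1e-6
numeric_jacobian = np.zeros((o.size, o.size))
for k in range(o.size):
    step = np.zeros_like(o)
    step[k] = eps
    # Column k holds the derivative of every p_i with respect to o_k.
    numeric_jacobian[:, k] = (softmax(o + step) - softmax(o - step)) / (2 * eps)
print(np.max(np.abs(jacobian - numeric_jacobian)))
```

<p>Both printed differences should be close to zero, limited only by the accuracy of the finite-difference approximation.</p>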

In [41]:
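def softmax_loss_gradient(o, m):
    # p: distribution obtained from o; y: distribution obtained from m.
    p = softmax(o)
    y = softmax(m)
    loss = -y.dot(np.log(p))
    # The gradient of the loss with respect to o is p - y.
    return {'gradient': -y + p, 'loss': loss}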

In [56]:
#Example 
o = np.array([2,4,8])
m = np.array([1.5,2.5,3])
steps = 2*10**3
learning_rate = 0.1
for n in range(steps):
    grads = softmax_loss_gradient(o,m)
    o = o - learning_rate*grads['gradient']
    if n%100 == 0:
        print("o value: ",o," loss: ",grads['loss'])

o value:  [2.01195234 4.03135564 7.95669202]  loss:  2.0782868945020683
o value:  [3.04176524 5.2185721  5.73966266]  loss:  1.0103558587037593
o value:  [3.6033989  4.95988652 5.43671458]  loss:  0.9587703901696916
o value:  [3.78186451 4.86256822 5.35556727]  loss:  0.9531285143816612
o value:  [3.82269681 4.83944202 5.33786117]  loss:  0.9528215051015863
o value:  [3.83117366 4.83457725 5.33424909]  loss:  0.9528081408284297
o value:  [3.8328964  4.83358516 5.33351844]  loss:  0.952807587640006
o value:  [3.833245   4.83338425 5.33337075]  loss:  0.9528075649791481
o value:  [3.83331548 4.83334363 5.3333409 ]  loss:  0.9528075640528291
o value:  [3.83332972 4.83333541 5.33333486]  loss:  0.9528075640149797
o value:  [3.8333326  4.83333375 5.33333364]  loss:  0.9528075640134333
o value:  [3.83333319 4.83333342 5.3333334 ]  loss:  0.95280756401337
o value:  [3.8333333  4.83333335 5.33333335]  loss:  0.9528075640133674
o value:  [3.83333333 4.83333334 5.33333334]  loss:  0.952807564013
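
<p>The printed values suggest that <code>o</code> approaches <code>m</code> shifted by a constant. The following sketch checks this reading under one stated assumption: because the components of the gradient $p - y$ sum to zero, each update preserves the sum of <code>o</code>, which fixes the constant. The names <code>o0</code>, <code>c</code> and <code>o_limit</code> are introduced here for illustration only.</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

o0 = np.array([2.0, 4.0, 8.0])   # starting point used in the example
m = np.array([1.5, 2.5, 3.0])
y = softmax(m)

# The gradient p - y sums to zero, so gradient descent preserves sum(o).
# The candidate limit is m shifted by the constant c that matches the initial sum.
c = (o0.sum() - m.sum()) / o0.size
o_limit = m + c
print(o_limit)

# At the candidate limit the gradient vanishes, since softmax(m + c) equals softmax(m).
print(np.allclose(softmax(o_limit) - y, 0.0))

# The loss at that point equals the entropy of y.
entropy = -y.dot(np.log(y))
print(entropy)
```

<p>The candidate limit and the entropy can be compared against the last printed <code>o value</code> and <code>loss</code> of the run.</p>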

<p><b>K Nearest Neighbors</b></p>
<p>In $k$-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors: it is assigned to the class most common among its $k$ nearest neighbors, where $k$ is a positive integer, typically small. If $k = 1$, the object is simply assigned to the class of its single nearest neighbor. In $k$-NN regression, the output is a property value for the object, computed as the average of the values of its $k$ nearest neighbors.</p>


In [51]:
def d1(x1,x2):
    x1 = x1.flatten()
    x2 = x2.flatten()
    diff = x2-x1
    diff = abs(diff)
    return diff.sum()

def d2(x1,x2):
    x1 = x1.flatten()
    x2 = x2.flatten()
    diff = x2-x1
    diff_sqr = diff**2
    return np.sqrt(diff_sqr.sum())

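def knn_classification(element, group, labels, k, l=2):
    # l selects the metric: 2 for Euclidean (d2), anything else for Manhattan (d1).
    if l == 2:
        d = d2
    else:
        d = d1
    # Pair each distance with its index so that sorting keeps track of the neighbor.
    distances = [(d(element,group[n]), n) for n in range(len(labels))]
    distances.sort()
    distances = distances[:k]
    classifications = [labels[distance[1]] for distance in distances]
    # Plurality vote: keep the label with the highest count.
    # On a tie, the label seen first (the nearer neighbor) wins.
    return_classification = None
    best_count = 0
    for classification in classifications:
        count = classifications.count(classification)
        if count > best_count:
            best_count = count
            return_classification = classification
    return return_classification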

In [52]:
element = np.random.random((5))
elements = np.random.random((50,5))
labels = [int(3*(row**2).sum()) for row in elements]
knn_classification(element, elements, labels, 5)

[2, 1, 4, 2, 3]


2
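
<p>The regression variant mentioned in the description is not implemented in the cells above. A minimal sketch could look as follows, assuming Euclidean distance and a plain unweighted average. The function name <code>knn_regression</code>, the parameter <code>values</code> and the toy data are hypothetical.</p>

```python
import numpy as np

def knn_regression(element, group, values, k):
    # Hypothetical helper: Euclidean distance from the query to every row of group.
    distances = np.sqrt(((group - element) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # The prediction is the unweighted average of the neighbors' values.
    return values[nearest].mean()

# Toy data: one feature per point, with the value equal to the feature.
group = np.array([[0.0], [1.0], [2.0], [10.0]])
values = np.array([0.0, 1.0, 2.0, 10.0])

# The three nearest neighbors of 1.1 are the points at 1.0, 2.0 and 0.0.
prediction = knn_regression(np.array([1.1]), group, values, 3)
print(prediction)
```

<p>With $k = 1$ the prediction reduces to the value of the single nearest neighbor, mirroring the classification case.</p>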