# SVM

The SVM loss for the $i$-th example, with margin $\Delta$, is

$$L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

which can also be written in terms of the weights ($y_i$ is the correct class and $j$ runs over the incorrect classes)

$$L_i = \sum_{j\neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)$$

where

$$s_j = w_j^T x_i \hspace{0.5in} s_{y_i} = w_{y_i}^T x_i$$
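
As a rough illustration of the loss above, here is a minimal sketch in plain Python. The function name `svm_loss`, the list-of-lists layout for `W`, and the toy numbers are assumptions made for this example only; they are not taken from the original notes.

```python
def svm_loss(W, x, y, delta=1.0):
    """Multiclass hinge loss L_i for a single example.

    W: list of per-class weight vectors w_j (assumed layout)
    x: feature vector x_i
    y: index of the correct class y_i
    delta: the margin
    """
    # Class scores s_j = w_j^T x
    scores = [sum(w_k * x_k for w_k, x_k in zip(w_j, x)) for w_j in W]
    # Sum the margin violations over every class except the correct one
    return sum(
        max(0.0, s_j - scores[y] + delta)
        for j, s_j in enumerate(scores)
        if j != y
    )


# Hypothetical toy values: 3 classes, 2 features
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 1.0]
print(svm_loss(W, x, y=0))  # scores are [2, 1, 3]; only class 2 violates the margin
```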

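The gradient of the loss with respect to each weight vector is given below, where $1(\cdot)$ is the indicator function (1 if the condition holds, 0 otherwise). For the weights of an incorrect class $j$:

$$\frac{\partial L_i}{\partial w_j} = 1(s_j - s_{y_i} + \Delta > 0)x_i \hspace{0.5in} \text {when } j \neq y_i$$

For the weights of the correct class, every class that violates the margin contributes $-x_i$:

$$\frac{\partial L_i}{\partial w_{y_i}} = -\left(\sum_{j \neq y_i} 1(s_j - s_{y_i} + \Delta > 0)\right)x_i$$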
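
The gradient of the hinge loss for one example can be sketched as follows: each incorrect class that violates the margin receives $+x_i$ in its row, and the correct-class row receives $-x_i$ once per violating class. The name `svm_grad` and the data layout are assumptions for this illustration.

```python
def svm_grad(W, x, y, delta=1.0):
    """Gradient of the one-example hinge loss with respect to each w_j.

    Returns a list with the same shape as W (one gradient row per class).
    """
    scores = [sum(w_k * x_k for w_k, x_k in zip(w_j, x)) for w_j in W]
    grad = [[0.0] * len(x) for _ in W]
    for j, s_j in enumerate(scores):
        if j == y:
            continue  # the correct class is handled through the other classes
        if s_j - scores[y] + delta > 0:  # class j violates the margin
            for k, x_k in enumerate(x):
                grad[j][k] += x_k  # +x_i for the violating class
                grad[y][k] -= x_k  # -x_i for the correct class
    return grad


# Hypothetical toy values: 3 classes, 2 features, correct class 0
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 1.0]
print(svm_grad(W, x, y=0))
```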

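The weights are then updated with a gradient descent step, where $\eta$ is the learning rate and $grad$ is the gradient of the loss with respect to $W$:

$$ W = W - \eta \cdot grad$$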
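
A single update step might look like the sketch below. The helper name `sgd_step`, the parameter name `lr` (standing in for $\eta$), and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sgd_step(W, grad, lr=0.1):
    """Return W - lr * grad, applied element by element."""
    return [
        [w_k - lr * g_k for w_k, g_k in zip(w_row, g_row)]
        for w_row, g_row in zip(W, grad)
    ]


# Hypothetical weights and gradient for 2 classes and 2 features
W = [[1.0, 0.0], [1.0, 1.0]]
grad = [[-2.0, -1.0], [2.0, 1.0]]
W = sgd_step(W, grad, lr=0.5)
print(W)
```

In practice the gradient is usually averaged over a batch of examples before the step is taken; the sketch applies it as given.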

<img src="svm_flowchart.png">

To predict the class of a new image, compute its scores with the updated weights and pick the class with the highest score.

$$scores = W \cdot X$$
$$Predicted\_class = \arg\max_j \; scores_j$$
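
The prediction step can be sketched in a few lines. The function name `predict` and the toy weights are assumptions for illustration; ties are resolved in favour of the lowest class index, which is simply how Python's `max` behaves.

```python
def predict(W, x):
    """Return the index of the class with the highest score w_j^T x."""
    scores = [sum(w_k * x_k for w_k, x_k in zip(w_j, x)) for w_j in W]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy values: 3 classes, 2 features
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(predict(W, [2.0, 1.0]))  # scores are [2, 1, 3]
```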

# Softmax

The loss function for softmax is

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) \hspace{0.5in} \text{or equivalently} \hspace{0.5in} L_i = -f_{y_i} + \log\sum_j e^{f_j}$$

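Here $f = W \cdot X_i + b$ is the vector of class scores for the $i$-th example, $f_j$ is its $j$-th element, and $f_{y_i}$ is the score of the correct class. The function

$$\text{softmax}_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$$

is known as the softmax function.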
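
The softmax function and the resulting loss can be sketched as below. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not part of the formulas above; it leaves the result unchanged because the shift cancels in the ratio. The names `softmax` and `softmax_loss` are assumptions for this example.

```python
import math


def softmax(f):
    """Softmax of a list of class scores."""
    m = max(f)  # shift by the max score to avoid overflow in exp
    exps = [math.exp(f_j - m) for f_j in f]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(f, y):
    """Cross-entropy loss L_i = -log(softmax(f)[y]) for correct class y."""
    return -math.log(softmax(f)[y])


# Hypothetical scores for 3 classes, with class 2 as the correct one
f = [1.0, 2.0, 3.0]
print(softmax(f))
print(softmax_loss(f, 2))
```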

The gradient of the loss with respect to the weights $w_y$ of a class $y$ follows from the chain rule. Here $x$ is the correct class, $X$ is the input, and $p(k) = \frac{e^{f_k}}{\sum_j e^{f_j}}$ is the softmax output for class $k$, so $L = -\log p(x)$.

$$\begin{split}
\frac{\partial L}{\partial w_y} = \frac{\partial L}{\partial p(x)} \cdot \frac{\partial p(x)}{\partial f_y} \cdot \frac{\partial f_y}{\partial w_y} \\
\\
\frac{\partial L}{\partial w_y} = \frac{\partial (-\log p(x))}{\partial p(x)} \cdot \frac{\partial p(x)}{\partial f_y} \cdot \frac{\partial f_y}{\partial w_y} \\
\\
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot \frac{\partial p(x)}{\partial f_y} \cdot \frac{\partial f_y}{\partial w_y} \\
\\
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot \frac{\partial \left(\frac{e^{f_x}}{\sum_k e^{f_k}}\right)}{\partial f_y} \cdot \frac{\partial f_y}{\partial w_y} \\
\end{split}$$

If $x = y$, the quotient rule gives

$$\begin{split}
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot \frac{(e^{f_x} \cdot \sum_k e^{f_k}) - (e^{f_x} \cdot e^{f_y})}{(\sum_k e^{f_k})^2} \cdot \frac{\partial f_y}{\partial w_y}\\
\\
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot \frac{e^{f_x}}{\sum_k e^{f_k}} \cdot \frac{\sum_k e^{f_k} - e^{f_y}}{\sum_k e^{f_k}} \cdot \frac{\partial f_y}{\partial w_y}\\
\\
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot (p(x) \cdot (1 - p(y))) \cdot \frac{\partial (w_y \cdot X + b_y)}{\partial w_y}\\
\\
\frac{\partial L}{\partial w_y} = -(1 - p(y)) \cdot X\\
\\
\frac{\partial L}{\partial w_y} = (p(y) - 1) \cdot X\\
\end{split}$$

If $x \ne y$, the numerator $e^{f_x}$ does not depend on $f_y$, so its derivative is 0:

$$\begin{split}
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot \frac{(0 \cdot \sum_k e^{f_k}) - (e^{f_x} \cdot e^{f_y})}{(\sum_k e^{f_k})^2} \cdot \frac{\partial f_y}{\partial w_y}\\
\\
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot \left(-\frac{e^{f_x}}{\sum_k e^{f_k}} \cdot \frac{e^{f_y}}{\sum_k e^{f_k}}\right) \cdot \frac{\partial f_y}{\partial w_y}\\
\\
\frac{\partial L}{\partial w_y} = \frac{-1}{p(x)} \cdot (-p(x)p(y)) \cdot \frac{\partial (w_y \cdot X + b_y)}{\partial w_y}\\
\\
\frac{\partial L}{\partial w_y} = p(y) \cdot X\\
\end{split}$$
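
Both cases of the result can be written as one expression: the gradient for class $y$ is $(p(y) - 1) \cdot X$ when $y$ is the correct class and $p(y) \cdot X$ otherwise. The sketch below implements this in plain Python. The names `softmax_grad`, `loss`, and `correct_class` (the class called $x$ in the derivation) are assumptions for this example, as is the max-shift inside `softmax`, which is added only for numerical stability.

```python
import math


def softmax(f):
    """Softmax of a list of scores, shifted by the max for stability."""
    m = max(f)
    exps = [math.exp(f_j - m) for f_j in f]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, b, X):
    """Class scores f_y = w_y . X + b_y for every class y."""
    return [sum(w_k * X_k for w_k, X_k in zip(w_y, X)) + b_y
            for w_y, b_y in zip(W, b)]


def loss(W, b, X, correct_class):
    """Softmax loss L = -log p(correct_class)."""
    return -math.log(softmax(scores(W, b, X))[correct_class])


def softmax_grad(W, b, X, correct_class):
    """Gradient of the loss with respect to each weight vector w_y."""
    p = softmax(scores(W, b, X))
    return [
        # (p(y) - 1) * X for the correct class, p(y) * X for the others
        [(p_y - (1.0 if y == correct_class else 0.0)) * X_k for X_k in X]
        for y, p_y in enumerate(p)
    ]


# Hypothetical toy values: 3 classes, 2 features, correct class 1
W = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]]
b = [0.0, 0.1, -0.1]
X = [1.5, -0.5]
print(softmax_grad(W, b, X, correct_class=1))
```

A quick way to gain confidence in a gradient like this is to compare it against a finite-difference estimate of `loss`, perturbing one weight at a time.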