## Introduction to optimization

### Dot product

```text
If a and b are vectors of the same length n, then (a dot b), or a.b, is
    a.b = a1.b1 + a2.b2 + ... + an.bn
```

In [1]:
def dot_product(weights, samples):
    # The dot product is only defined for vectors of equal length.
    if len(weights) != len(samples):
        raise ValueError('weights and samples must have the same length.')
    # Sum of the element-wise products.
    return sum(w * s for w, s in zip(weights, samples))

In [2]:
weights = [1,2,3,2,4]
samples = [2,3,1,4,1]

res = dot_product(weights, samples)
print(res)

23


In [3]:
import numpy as np

print(np.dot(weights, samples))

23


In [4]:
a = np.array([[1,2],[3,4]]) 
b = np.array([[11,12],[13,14]]) 
np.dot(a,b)
# [[37  40], [85  92]] 

array([[37, 40],
       [85, 92]])
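
For 2-D arrays, `np.dot` performs matrix multiplication: entry `(i, j)` of the result is the dot product of row `i` of `a` with column `j` of `b`. The sketch below rebuilds the result above one entry at a time; the helper name `matmul_by_dot` is made up for illustration and is not part of NumPy.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

def matmul_by_dot(a, b):
    """Hypothetical helper: build the matrix product from 1-D dot products."""
    rows, cols = a.shape[0], b.shape[1]
    out = np.zeros((rows, cols), dtype=np.result_type(a, b))
    for i in range(rows):
        for j in range(cols):
            # entry (i, j) = row i of a dotted with column j of b
            out[i, j] = np.dot(a[i, :], b[:, j])
    return out

result = matmul_by_dot(a, b)
print(result)
print(np.array_equal(result, np.dot(a, b)))
```

For example, the top-left entry is `1*11 + 2*13 = 37`.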

### MSE

![images](https://www.gstatic.com/education/formulas2/355397047/en/mean_squared_error.svg)


- MSE = mean squared error
- n = number of data points
- Yi = observed values
- Yi_hat = predicted values


In [5]:
from sklearn.metrics import mean_squared_error

# Given values
Y_true = [1,1,2,2,4] # Y_true = Y (original values)

# calculated values
Y_pred = [0.6,1.29,1.99,2.69,3.4] # Y_pred = Y'

# Calculation of Mean Squared Error (MSE)
mean_squared_error(Y_true,Y_pred)

0.21606

In [6]:
import numpy as np

# Given values
Y_true = [1,1,2,2,4] # Y_true = Y (original values)

# Calculated values
Y_pred = [0.6,1.29,1.99,2.69,3.4] # Y_pred = Y'

# Mean Squared Error
MSE = np.square(np.subtract(Y_true,Y_pred)).mean()
MSE

0.21606
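
To make the formula concrete, here is a minimal plain-Python sketch of the same calculation: square each difference, then average. The helper name `mse_manual` is hypothetical; it only mirrors what the two library-based cells above compute.

```python
Y_true = [1, 1, 2, 2, 4]
Y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]

def mse_manual(y_true, y_pred):
    """Hypothetical helper: mean of the squared differences, no libraries."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # (Yi - Yi_hat)^2 for every data point
    squared_errors = [(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]
    # average over the n data points
    return sum(squared_errors) / len(squared_errors)

mse = mse_manual(Y_true, Y_pred)
print(round(mse, 5))
```

The squared errors are 0.16, 0.0841, 0.0001, 0.4761 and 0.36, which sum to 1.0803; dividing by n = 5 gives 0.21606.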

### Softmax Function

[Scipy Documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.softmax.html)

The softmax function transforms each element of a collection by computing the exponential of each element divided by the sum of the exponentials of all the elements. That is, if x is a one-dimensional numpy array:

```py
softmax(x) = np.exp(x)/sum(np.exp(x))
```

The formula for the softmax function $\sigma(x)$ for a vector $x = \{x_0, x_1, \dots, x_{n-1}\}$ is

$$\sigma(x)_j = \frac {e^{x_j}} {\sum_k e^{x_k}}$$

In [7]:
x = np.array([[1, 0.5, 0.2, 3],
              [1,  -1,   7, 3],
              [2,  12,  13, 3]])

In [8]:
from scipy.special import softmax

In [9]:
# calculate softmax value for whole matrix

m = softmax(x)
m

array([[4.48308990e-06, 2.71913148e-06, 2.01438214e-06, 3.31258028e-05],
       [4.48308990e-06, 6.06720242e-07, 1.80860755e-03, 3.31258028e-05],
       [1.21863018e-05, 2.68421160e-01, 7.29644362e-01, 3.31258028e-05]])

In [10]:
np.sum(m)

1.0000000000000002

In [11]:
#### using axis parameter

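# axis=0 normalizes down each column, so every column sums to 1
m = softmax(x, axis=0)
print(sum(m))  # built-in sum adds the rows together, giving the 4 column sums
print(sum(sum(m))) # 4, one for each of the 4 columns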

[1. 1. 1. 1.]
4.0


In [12]:
np.sum(m)

3.999999999999999

In [13]:
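# axis=1 normalizes along each row, so every row sums to 1
m = softmax(x, axis=1)
print(sum(m))  # built-in sum still adds the rows together, so these column sums are not all 1
print(sum(sum(m))) # 3, one for each of the 3 rows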

[0.10831674 0.33347542 1.7579064  0.80030144]
3.0


In [14]:
np.sum(m)

2.999999999999999
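
Python's built-in `sum` always adds the rows of a 2-D array together, so it only shows the normalization when `axis=0` was used. A clearer check, sketched below, is to sum along the same axis that was passed to `softmax`. The variable names are illustrative.

```python
import numpy as np
from scipy.special import softmax

x = np.array([[1, 0.5, 0.2, 3],
              [1,  -1,   7, 3],
              [2,  12,  13, 3]])

col_wise = softmax(x, axis=0)  # each column is normalized
row_wise = softmax(x, axis=1)  # each row is normalized

col_sums = col_wise.sum(axis=0)  # one sum per column, shape (4,)
row_sums = row_wise.sum(axis=1)  # one sum per row, shape (3,)
print(col_sums)
print(row_sums)
```

Both printed arrays should contain values equal to 1 up to floating-point rounding.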

### Custom softmax function

In [15]:
def csoftmax(x):
    return np.exp(x) / np.sum(np.exp(x))

m = csoftmax(x)
m

array([[4.48308990e-06, 2.71913148e-06, 2.01438214e-06, 3.31258028e-05],
       [4.48308990e-06, 6.06720242e-07, 1.80860755e-03, 3.31258028e-05],
       [1.21863018e-05, 2.68421160e-01, 7.29644362e-01, 3.31258028e-05]])

In [16]:
x = np.array([7, -7.5, 10])
csoftmax(x)

array([4.74258720e-02, 2.39191277e-08, 9.52574104e-01])

In [17]:
softmax(x)

array([4.74258720e-02, 2.39191277e-08, 9.52574104e-01])

In [18]:
for m1, m2  in zip(softmax(x), csoftmax(x)):
    print(m1, m2)
    print(round(m1,2) == round(m2,2))

0.047425872043181265 0.04742587204318125
True
2.391912771022218e-08 2.391912771022222e-08
True
0.952574104037691 0.9525741040376909
True
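
One caveat with `csoftmax`: `np.exp` overflows to `inf` for large inputs (for example `np.exp(1000)`), and `inf / inf` yields `nan`. A common remedy is to subtract the maximum before exponentiating; the common factor cancels in the ratio, so the result is unchanged. The sketch below is one way to write that, with `stable_softmax` as a hypothetical name.

```python
import numpy as np

def stable_softmax(x, axis=None):
    """Hypothetical helper: softmax with the maximum subtracted first."""
    x = np.asarray(x, dtype=float)
    # Shifting by the max keeps every exponent <= 0, so np.exp cannot overflow.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

small = stable_softmax(np.array([7, -7.5, 10]))
large = stable_softmax(np.array([1000.0, 1001.0, 1002.0]))
print(small)
print(large)
```

The first call reproduces the values from `csoftmax` above, while the second stays finite where the naive version would not.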


### Cross Entropy


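$$\text{CrossEntropy} = -\log \frac {e^{z_c}} {\sum_{j=1}^{k} e^{z_j}}$$

where $z_c$ is the score (logit) of the true class $c$ and $k$ is the number of classes.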

In [29]:
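def cross_entropy(x):
    # Note: this only prints the softmax of x, normalized over all entries
    # because no axis is given; the -log step of the formula is not applied.
    s = softmax(x)
    print(s)
#     print(np.argmax(s), s[np.argmax(s)])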

In [30]:
x = [
    [1,0,0],
    [0,1,0]
]
x

[[1, 0, 0], [0, 1, 0]]

In [31]:
cross_entropy(x)

[[0.28805844 0.10597078 0.10597078]
 [0.10597078 0.28805844 0.10597078]]
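
The printed values are softmax probabilities normalized over the whole matrix, not a loss. As a minimal sketch of the full formula, the code below treats each row as a vector of logits, applies softmax per row, and takes the negative log of the true class's probability. The function name `cross_entropy_from_logits` and the `labels` list are assumptions made for illustration.

```python
import numpy as np

def cross_entropy_from_logits(logits, true_class):
    """Hypothetical helper: -log softmax(logits)[true_class] for each row."""
    logits = np.asarray(logits, dtype=float)
    true_class = np.asarray(true_class)
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log softmax = shifted - log(sum(exp(shifted)))
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in every row.
    return -log_probs[np.arange(len(true_class)), true_class]

logits = [[1, 0, 0],
          [0, 1, 0]]
labels = [0, 1]  # assumed index of the true class for each row
losses = cross_entropy_from_logits(logits, labels)
print(losses)
```

Each row has the same loss here, $-\log\frac{e}{e + 2}$, which is roughly 0.55.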


### Linear models - Quiz

- Consider a vector `(1, -2, 0.5)`. Apply a `softmax transform` to it and enter the first component (accurate to 2 decimal places).

In [35]:
softmax([1,-2,0.5])

array([0.6037489 , 0.03005889, 0.36619222])

In [36]:
softmax([1,-2,0.5])[0]

0.6037488961486258

- Suppose you are solving a `5-class` classification problem with `10 features`. How many `parameters` would a linear model have? Don't forget bias terms!

In [37]:
def num_of_parameters(n_cls, features):
    # Each class has one weight per feature plus one bias term.
    return n_cls * (features + 1)

num_of_parameters(n_cls=5, features=10)

55
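
Counting the weight matrix and the bias vector separately gives the same picture in terms of array shapes. In this sketch the names `W` and `b` are illustrative: a linear model for this problem has a `5 x 10` weight matrix (50 entries) and a length-5 bias vector, for 55 parameters in total.

```python
import numpy as np

n_classes, n_features = 5, 10

W = np.zeros((n_classes, n_features))  # one weight per (class, feature) pair
b = np.zeros(n_classes)                # one bias per class

total_parameters = W.size + b.size
print(total_parameters)
```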