# Multiple Features

With multiple features we need some new notation: $x_j$ denotes the $j$th feature, and $n$ is the total number of features.

As before, $x^{(i)}$ is the $i$th training example, so the $j$th feature of the $i$th training example is written $x_j^{(i)}$.

Since each training example now holds $n$ features, it can be written as a vector, $\vec{x}^{(i)}$.
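As a small illustration of this notation, the training set can be stored as a 2-D NumPy array with one row per training example. The array name `X_train` and its values are made up for this sketch. Note that NumPy indexes from 0, while the math notation counts from 1.

```python
import numpy as np

# Hypothetical training set: m = 3 examples (rows), n = 4 features (columns)
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [ 852, 2, 1, 35]])

m, n = X_train.shape     # number of examples, number of features

x_2 = X_train[1]         # second training example as a vector (math index i = 2)
x_2_3 = X_train[1, 2]    # third feature of the second example (i = 2, j = 3)

print(f"m = {m}, n = {n}")
print(f"second example: {x_2}")
print(f"third feature of second example: {x_2_3}")
```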

With $n$ features, the model $f_{w, b}$ becomes
$$f_{w, b}(x) = w_1x_1 + w_2x_2 + ... + w_nx_n + b$$

In vector notation this may be written as
$$f_{\vec{w}, b}(\vec{x}) = \vec{w}\cdot\vec{x} + b$$

where
$$\vec{w} = \begin{bmatrix} w_1 & w_2 & \cdots & w_n \end{bmatrix} \quad \text{and} \quad \vec{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$$
are row vectors.

# Vectorization

In [1]:
import numpy as np

In [2]:
w = np.array([1.0, 2.5, -3.3])
b = 4
x = np.array([10, 20, 30])

Without vectorization

$$f_{\vec{w}, b}(\vec{x}) = \bigg(\sum_{j=1}^{n}w_jx_j\bigg)+b$$

In [3]:
f = 0
for j in range(0, len(x)):
    f = f + w[j]*x[j]
f = f + b

With vectorization

In [5]:
f = np.dot(w, x) + b
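As a quick sanity check, the two versions can be compared side by side. This is only a sketch that reuses the values from the cells above; the names `f_loop` and `f_vec` are introduced here for illustration.

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
b = 4
x = np.array([10, 20, 30])

# Loop version: accumulate w_j * x_j one feature at a time
f_loop = 0
for j in range(len(x)):
    f_loop = f_loop + w[j] * x[j]
f_loop = f_loop + b

# Vectorized version: a single dot product
f_vec = np.dot(w, x) + b

# Compare with a tolerance, since floating-point sums can differ in the last bits
print(np.isclose(f_loop, f_vec))
```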

# Gradient Descent

Recall the gradient descent update rule:
$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J$$
where $J$ is the cost function, a function of $w$ and $b$:
$$J(w, b) = \frac{1}{2m}\sum_{i=1}^m (\hat{y}^{(i)}-y^{(i)})^2$$
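The squared-error cost, averaged over the $m$ training examples, can be computed for a whole training set in a few vectorized lines. The function name `compute_cost` and the toy data are assumptions made for this sketch, not part of the notes.

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Squared-error cost for X of shape (m, n), y of shape (m,), w of shape (n,)."""
    m = X.shape[0]
    y_hat = X @ w + b                  # predictions for all m examples at once
    return np.sum((y_hat - y) ** 2) / (2 * m)

# Toy data generated exactly from y = 1*x_1 + 2*x_2
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([5.0, 11.0, 17.0])

cost_fit = compute_cost(X, y, np.array([1.0, 2.0]), 0.0)   # parameters that fit exactly
cost_zero = compute_cost(X, y, np.zeros(2), 0.0)           # all-zero parameters
print(cost_fit, cost_zero)
```

The cost is zero when the parameters reproduce every target exactly and grows as the predictions move away from the targets.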

*In Multiple Linear Regression*

In [6]:
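w = np.array([0.5, 1.3, 3.4, -6.6])  # parameters w_1 ... w_4
d = np.array([0.3, 0.2, 0.1, 0.4])   # partial derivatives dJ/dw_j (example values)
a = 0.01                             # learning rate alpha
w = w - a*d                          # update every w_j in one vectorized step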

# NumPy

In [7]:
import time

In [9]:
a = np.zeros(4);                print(f"np.zeros(4) :   a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.zeros((4,));             print(f"np.zeros(4,) :  a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

np.zeros(4) :   a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.zeros(4,) :  a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.random.random_sample(4): a = [0.87071798 0.12786487 0.02801432 0.92533758], a shape = (4,), a data type = float64


In [10]:
a = np.arange(4.);              print(f"np.arange(4.):     a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.rand(4);          print(f"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

np.arange(4.):     a = [0. 1. 2. 3.], a shape = (4,), a data type = float64
np.random.rand(4): a = [0.68640738 0.76972181 0.52036788 0.54925913], a shape = (4,), a data type = float64


In [11]:
#vector indexing operations on 1-D vectors
a = np.arange(10)
print(a)

#access an element
print(f"a[2].shape: {a[2].shape} a[2]  = {a[2]}, Accessing an element returns a scalar")

# access the last element, negative indexes count from the end
print(f"a[-1] = {a[-1]}")

#indexes must be within the range of the vector or they will produce an error
try:
    c = a[10]
except Exception as e:
    print("The error message you'll see is:")
    print(e)

[0 1 2 3 4 5 6 7 8 9]
a[2].shape: () a[2]  = 2, Accessing an element returns a scalar
a[-1] = 9
The error message you'll see is:
index 10 is out of bounds for axis 0 with size 10


*Single vector operations*

In [12]:
a = np.array([1,2,3,4])
print(f"a             : {a}")
# negate elements of a
b = -a 
print(f"b = -a        : {b}")

# sum all elements of a, returns a scalar
b = np.sum(a) 
print(f"b = np.sum(a) : {b}")

b = np.mean(a)
print(f"b = np.mean(a): {b}")

b = a**2
print(f"b = a**2      : {b}")

a             : [1 2 3 4]
b = -a        : [-1 -2 -3 -4]
b = np.sum(a) : 10
b = np.mean(a): 2.5
b = a**2      : [ 1  4  9 16]


*Dot Products*

In [14]:
def my_dot(a, b): 
    """
    Compute the dot product of two vectors
 
    Args:
      a (ndarray (n,)):  input vector 
      b (ndarray (n,)):  input vector with same dimension as a
    
    Returns:
      x (scalar): dot product of a and b
    """
    x=0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

In [15]:
# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
print(f"my_dot(a, b) = {my_dot(a, b)}")

my_dot(a, b) = 24


In [17]:
# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ") 

NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = () 


In [18]:
np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del(a);del(b)  #remove these big arrays from memory

np.dot(a, b) =  2501072.5817
Vectorized version duration: 7.9827 ms 
my_dot(a, b) =  2501072.5817
loop version duration: 3687.2375 ms 


*Matrices in NumPy*

In [20]:
a = np.zeros((1, 5))                                       
print(f"a shape = {a.shape}, a = {a}")                     

a = np.zeros((2, 1))                                                                   
print(f"a shape = {a.shape}, a = {a}") 

a = np.random.random_sample((2, 2))  
print(f"a shape = {a.shape}, a = {a}") 

a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]
a shape = (2, 1), a = [[0.]
 [0.]]
a shape = (2, 2), a = [[0.04997798 0.77390955]
 [0.93782363 0.5792328 ]]


In [21]:
#vector indexing operations on matrices
a = np.arange(6).reshape(-1, 2)   #reshape is a convenient way to create matrices
print(f"a.shape: {a.shape}, \na= {a}")

#access an element
print(f"\na[2,0].shape:   {a[2, 0].shape}, a[2,0] = {a[2, 0]},     type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")

#access a row
print(f"a[2].shape:   {a[2].shape}, a[2]   = {a[2]}, type(a[2])   = {type(a[2])}")

a.shape: (3, 2), 
a= [[0 1]
 [2 3]
 [4 5]]

a[2,0].shape:   (), a[2,0] = 4,     type(a[2,0]) = <class 'numpy.int32'> Accessing an element returns a scalar

a[2].shape:   (2,), a[2]   = [4 5], type(a[2])   = <class 'numpy.ndarray'>


In [22]:
#vector 2-D slicing operations
a = np.arange(20).reshape(-1, 10)
print(f"a = \n{a}")

#access 5 consecutive elements (start:stop:step)
print("a[0, 2:7:1] = ", a[0, 2:7:1], ",  a[0, 2:7:1].shape =", a[0, 2:7:1].shape, "a 1-D array")

#access 5 consecutive elements (start:stop:step) in two rows
print("a[:, 2:7:1] = \n", a[:, 2:7:1], ",  a[:, 2:7:1].shape =", a[:, 2:7:1].shape, "a 2-D array")

# access all elements
print("a[:,:] = \n", a[:,:], ",  a[:,:].shape =", a[:,:].shape)

# access all elements in one row (very common usage)
print("a[1,:] = ", a[1,:], ",  a[1,:].shape =", a[1,:].shape, "a 1-D array")
# same as
print("a[1]   = ", a[1],   ",  a[1].shape   =", a[1].shape, "a 1-D array")

a = 
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
a[0, 2:7:1] =  [2 3 4 5 6] ,  a[0, 2:7:1].shape = (5,) a 1-D array
a[:, 2:7:1] = 
 [[ 2  3  4  5  6]
 [12 13 14 15 16]] ,  a[:, 2:7:1].shape = (2, 5) a 2-D array
a[:,:] = 
 [[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]] ,  a[:,:].shape = (2, 10)
a[1,:] =  [10 11 12 13 14 15 16 17 18 19] ,  a[1,:].shape = (10,) a 1-D array
a[1]   =  [10 11 12 13 14 15 16 17 18 19] ,  a[1].shape   = (10,) a 1-D array


# Gradient Descent for Multiple Linear Regression using Vectorization

*Model*
$$f_{w, b}(x) = w_1x_1 + w_2x_2 + ... + w_nx_n + b$$
Or,
$$f_{\vec{w}, b}(\vec{x}) = \vec{w} \cdot \vec{x} + b $$
*Cost Function*
$$J(w_1, w_2, ... ,w_n, b)$$
Or,
$$J(\vec{w}, b)$$
*Gradient Descent*
$$\vec{w} = \vec{w} - \alpha \frac{\partial}{\partial \vec{w}}J(\vec{w}, b)$$
$$b = b - \alpha \frac{\partial}{\partial b}J(\vec{w}, b)$$

Working out the partial derivatives gives the update rules
$$w_j = w_j - \alpha \frac{1}{m}\sum_{i=1}^m \bigg(f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)} \bigg)x_j^{(i)}$$
$$b = b - \alpha \frac{1}{m}\sum_{i=1}^m \bigg(f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)} \bigg)$$
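These update rules can be sketched in NumPy as follows. The function names `compute_gradient` and `gradient_descent`, the toy data, the learning rate, and the iteration count are all assumptions chosen for illustration. The sum over the $m$ examples is replaced by a matrix-vector product.

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Gradient of the squared-error cost for X of shape (m, n), y of shape (m,)."""
    m = X.shape[0]
    err = X @ w + b - y        # f(x^(i)) - y^(i) for every example, shape (m,)
    dj_dw = X.T @ err / m      # one partial derivative per feature, shape (n,)
    dj_db = np.sum(err) / m    # scalar
    return dj_dw, dj_db

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Apply the simultaneous update of w and b num_iters times."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Toy data generated exactly from y = 2*x_1 - 1*x_2 + 3
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = X @ np.array([2.0, -1.0]) + 3.0

w, b = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.05, num_iters=20000)
print(f"w = {w}, b = {b:.4f}")
```

On this noise-free toy data the parameters should approach the values used to generate `y`. On real data the learning rate usually needs tuning.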

*An alternative to gradient descent*

Normal equation
- Works only for linear regression
- Solves for $\vec{w}$ and $b$ directly, without iterations

Disadvantages
- Doesn't generalize to other learning algorithms.
- Slow when the number of features is large (> 10,000)
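For reference, here is a minimal sketch of the normal equation, which the notes mention but do not spell out. It assumes the standard closed form $\theta = (X^TX)^{-1}X^Ty$, with a column of ones appended to $X$ so that $b$ is solved for together with $\vec{w}$. The names `X_aug` and `theta` and the toy data are introduced only for this example.

```python
import numpy as np

# Toy data generated exactly from y = 2*x_1 - 1*x_2 + 3
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = X @ np.array([2.0, -1.0]) + 3.0

# Append a column of ones so the last entry of theta plays the role of b
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve (X^T X) theta = X^T y instead of forming the inverse explicitly
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w, b = theta[:-1], theta[-1]
print(f"w = {w}, b = {b:.4f}")
```

Solving the linear system with `np.linalg.solve` avoids computing an explicit matrix inverse. It requires $X^TX$ to be invertible, which holds for this toy data.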