## Linear Regression 

Regression analysis allows us to identify and quantify relationships between variables in data. We will learn how to apply numerical optimization techniques to understand how different variables influence each other. Linear regression is one of the most basic forms of machine learning, and today's lesson lays the foundation for the next chapters in the class.

Before learning about linear regression, we need to examine one of the most important libraries in Python:

### NumPy (www.numpy.org)

NumPy is a core library for scientific computing. It is written in Python and C (for speed). A few important features of NumPy are:

- a powerful N-dimensional array object

- sophisticated broadcasting functions

- tools for integrating C/C++ and Fortran code

- useful linear algebra, Fourier transform, and random number capabilities

Next, we will introduce NumPy arrays, the library's core data structure.

To use the NumPy module, we first need to import it. By convention, it is imported under the shortened name "np":

```python
import numpy as np
```

In [1]:
import numpy as np
# to create an array, we use the numpy function array
x = np.array([[1,2,3]])
x

array([[1, 2, 3]])

Arrays are entered by rows, and each row is defined as a list. To create a 2D array, simply use nested lists:

In [16]:
y = np.array([[1,2,3],[4,5,6]])
y

array([[1, 2, 3],
       [4, 5, 6]])

NumPy arrays are objects with many associated methods and attributes:

In [7]:
y.shape, y.size, y.max(), y.argmax(), y.min(), y.argmin(), y.mean(), y.std()

((2, 3), 6, 6, 5, 1, 0, 3.5, 1.707825127659933)

In [14]:
y.mean(axis=1)

array([2., 5.])
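
The `axis` argument selects the dimension that the reduction collapses. A minimal sketch, reusing the 2x3 array `y` from above (the names `row_means` and `col_means` are only illustrative):

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6]])

# axis=1 collapses the columns: one mean per row
row_means = y.mean(axis=1)
# axis=0 collapses the rows: one mean per column
col_means = y.mean(axis=0)

print(row_means)  # [2. 5.]
print(col_means)  # [2.5 3.5 4.5]
```

Without `axis`, the mean is taken over all elements and a single number is returned.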

You can modify the shape of an array using the ```reshape``` method

In [21]:
print(y.shape)
z = y.reshape(3,2)
print(z.shape)

(2, 3)
(3, 2)


You don't need to provide all the dimensions. If you pass -1 for one dimension, NumPy infers it from the size of the array and the remaining dimensions:

In [22]:
y = np.array([[1,2,3],[4,5,6]])
z = y.reshape(3,-1)
z

array([[1, 2],
       [3, 4],
       [5, 6]])

In [16]:
z = y.reshape(-1,2)
z

array([[1, 2],
       [3, 4],
       [5, 6]])
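
The inference only works when the total number of elements fits the requested shape. The sketch below, which again assumes the 2x3 array `y`, shows a valid inference and a rejected one (the flag `reshape_failed` is a hypothetical helper used only for illustration):

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6]])  # 6 elements in total

# -1 asks NumPy to infer that dimension from the total size
print(y.reshape(3, -1).shape)  # (3, 2)
print(y.reshape(-1).shape)     # (6,)

# A target shape that cannot hold exactly 6 elements is rejected
try:
    y.reshape(4, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```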

You can access the elements of an array by index. There are multiple ways to do so:

In [20]:
x = np.array([[1,2,3]])
x, x[0][1]

(array([[1, 2, 3]]), 2)

In [22]:
x = np.array([[1],[2],[3]])
x, x[1][0]

(array([[1],
        [2],
        [3]]),
 2)

In [24]:
y = np.array([[1,2,3],[4,5,6]])
y[1], y[1][2]
print(y)

[[1 2 3]
 [4 5 6]]


You can also use slices to obtain a section of the array:

In [27]:
# What result will you obtain after this operation? 
y[:,:2]

array([[1, 2],
       [4, 5]])

In [28]:
# you can also access multiple rows or columns by index
# What result will you obtain after this operation? 
y[:,[0,2]]

array([[1, 3],
       [4, 6]])

In [32]:
# What result will you obtain after this operation? 
y[:,::2]

array([[1, 3],
       [4, 6]])
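
Two details about indexing are easy to miss: a single `[row, column]` index gives the same element as chained indexing, and basic slices return views while indexing with a list returns a copy. A small sketch of both, using the same `y` (the names `first_two` and `picked` are illustrative):

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6]])

# A single [row, column] index is equivalent to chained indexing
print(y[1, 2] == y[1][2])  # True

# A basic slice is a view: writing through it changes y
first_two = y[:, :2]
first_two[0, 0] = 100
print(y[0, 0])  # 100

# Indexing with a list makes a copy: y is left untouched
picked = y[:, [0, 2]]
picked[0, 0] = -1
print(y[0, 0])  # 100
```

If an independent array is needed from a slice, calling `.copy()` on it avoids modifying the original by accident.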

NumPy includes functions to generate arrays that have a regular structure:

- ```arange``` -> generates evenly spaced values with a given step, starting at the start value and stopping before the stop value
- ```linspace``` -> generates an array of n evenly spaced elements between defined start and end points (the end point is included by default)

In [33]:
large_array = np.arange(0,2000,1)
large_array[:100]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [34]:
large_array[-1]

1999

In [35]:
large_array = np.arange(0,2000,2)
large_array[:100]

array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
        26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
        52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
        78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100, 102,
       104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128,
       130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154,
       156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180,
       182, 184, 186, 188, 190, 192, 194, 196, 198])

In [36]:
large_array[-1]

1998

In [38]:
large_array = np.linspace(0, 1999, 2000)
large_array

array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.997e+03, 1.998e+03,
       1.999e+03])

In [33]:
len(large_array)

2000
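
The difference between the two functions is easier to see on a small range. The following sketch uses arbitrary values chosen for illustration; `endpoint` is a keyword argument of `np.linspace`:

```python
import numpy as np

# arange: fixed step, the stop value is excluded
a = np.arange(0, 10, 2)
print(a)  # [0 2 4 6 8]

# linspace: fixed number of points, the end point is included by default
l = np.linspace(0, 10, 5)
print(l)  # [ 0.   2.5  5.   7.5 10. ]

# endpoint=False reproduces the half-open behaviour of arange
h = np.linspace(0, 10, 5, endpoint=False)
print(h)  # [0. 2. 4. 6. 8.]
```

Note that `arange` with integer arguments returns integers, while `linspace` returns floats.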

NumPy offers many built-in functions to easily create arrays with specific dimensions:

- ```zeros```
- ```ones```
- ```empty```
- ```ones_like```
- ```zeros_like```

In [40]:
y = np.ones((5,3))
y

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [41]:
z = np.zeros_like(y)
z

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [42]:
z[:,2] = y[:,1]
z

array([[0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.]])
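
A short sketch of how these creation functions handle shape and data type; the variable names (`template`, `scratch`, and so on) are only illustrative:

```python
import numpy as np

# The shape is passed as a tuple; the default dtype is float64
zeros = np.zeros((2, 3))
print(zeros.dtype)  # float64

# The dtype can be set explicitly
int_ones = np.ones((2, 3), dtype=int)

# The *_like functions copy the shape and dtype of an existing array
template = np.array([[1, 2], [3, 4]])
like = np.zeros_like(template)
print(like.shape, like.dtype == template.dtype)  # (2, 2) True

# empty allocates memory without initialising it, so the values are arbitrary
scratch = np.empty((2, 2))
print(scratch.shape)  # (2, 2)
```

Because `empty` does not set the values, it should only be used when every element is overwritten afterwards.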

NumPy has powerful broadcasting abilities. You can perform mathematical operations on arrays of different sizes, and NumPy will take care of the operation when the shapes are compatible.

#### Operations with scalars

In [109]:
b =  np.array([[0,1],[2,3]])
c = 2

In [110]:
b+c

array([[2, 3],
       [4, 5]])

In [111]:
b-c

array([[-2, -1],
       [ 0,  1]])

In [112]:
b*c

array([[0, 2],
       [4, 6]])

In [113]:
b/c

array([[0. , 0.5],
       [1. , 1.5]])

In [114]:
b**c

array([[0, 1],
       [4, 9]])
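
The operator also determines the data type of the result. As a sketch with the same `b` and `c`, true division returns floats even for integer inputs, while `//` (floor division, not used above) keeps integers:

```python
import numpy as np

b = np.array([[0, 1], [2, 3]])
c = 2

print((b * c).dtype == b.dtype)  # True: the integer dtype is kept
print((b / c).dtype)             # float64: true division returns floats
print(b // c)                    # floor division keeps integers
# [[0 0]
#  [1 1]]
```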

#### Operations between arrays

In [43]:
b =  np.array([[0,1],[2,3]])
d =  np.array([[4,5],[6,7]])

In [44]:
b+d

array([[ 4,  6],
       [ 8, 10]])

In [45]:
b-d

array([[-4, -4],
       [-4, -4]])

In [46]:
b*d

array([[ 0,  5],
       [12, 21]])

In [47]:
b/d

array([[0.        , 0.2       ],
       [0.33333333, 0.42857143]])

In [48]:
b**d

array([[   0,    1],
       [  64, 2187]])

The `*`, `/`, and `**` operators work element by element.

#### Operations between arrays of different sizes

In [49]:
b =  np.array([[0,2],[3,4]])
d =  np.array([[4],[5]])

b, d

(array([[0, 2],
        [3, 4]]),
 array([[4],
        [5]]))

In [50]:
b+d

array([[4, 6],
       [8, 9]])

Can you explain what is going on? 

In [None]:
b-d

In [118]:
b*d

array([[ 0,  8],
       [15, 20]])

In [119]:
b/d

array([[0. , 0.5],
       [0.6, 0.8]])

In [120]:
b**d

array([[   0,   16],
       [ 243, 1024]])
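
In these examples the size-1 axis of `d` is stretched to match `b` before the element-wise operation is applied. The sketch below makes that step explicit with `np.broadcast_to` and shows a shape combination that cannot be broadcast (the flag `broadcast_failed` is a hypothetical helper for illustration):

```python
import numpy as np

b = np.array([[0, 2], [3, 4]])  # shape (2, 2)
d = np.array([[4], [5]])        # shape (2, 1)

# Broadcasting stretches the size-1 axis of d to match b
d_stretched = np.broadcast_to(d, b.shape)
print(d_stretched)
# [[4 4]
#  [5 5]]

# The broadcast result equals the element-wise sum with the stretched array
print(np.array_equal(b + d, b + d_stretched))  # True

# Shapes are incompatible when an axis differs and neither size is 1
try:
    b + np.array([1, 2, 3])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```

NumPy does not actually copy the data when it broadcasts; the stretched array here is only a way to visualise the rule.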

#### Matrix Multiplication

In [51]:
b =  np.array([[0,1],[2,3]])
d =  np.array([[4,5],[6,7]])
e =  np.array([[4],[5]])
f =  np.array([[4,5]])

In [123]:
b@d, np.matmul(b,d)

(array([[ 6,  7],
        [26, 31]]),
 array([[ 6,  7],
        [26, 31]]))

In [52]:
e@f

array([[16, 20],
       [20, 25]])

In [53]:
# NumPy will tell you when you make a mistake
b@f

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)

In [54]:
# the .T attribute returns the transpose of a matrix
# attribute access binds more tightly than @, so the parentheses are optional
b@(f.T)

array([[ 5],
       [23]])
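
The shape rule behind these results is that `(n, k) @ (k, m)` gives `(n, m)`, so the inner dimensions must agree. A minimal sketch with the same `b` and `f`, which also contrasts `@` with the element-wise `*`:

```python
import numpy as np

b = np.array([[0, 1], [2, 3]])  # shape (2, 2)
f = np.array([[4, 5]])          # shape (1, 2)

# (n, k) @ (k, m) -> (n, m): the inner dimensions must agree
print(f.T.shape)        # (2, 1)
print((b @ f.T).shape)  # (2, 1)
print((f @ b).shape)    # (1, 2)

# Attribute access binds tighter than @, so the parentheses are optional
print(np.array_equal(b @ f.T, b @ (f.T)))  # True

# * is element-wise (with broadcasting), not a matrix product
print(b * f)
# [[ 0  5]
#  [ 8 15]]
```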