# Intro-to-Numerical-Python (NumPy)

 - NumPy’s main object is the homogeneous multidimensional array.
 - It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
 - In NumPy, dimensions are called axes.
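A minimal sketch of the properties above: every element shares one dtype, and each axis has its own length. The name `grid` is only illustrative. When the input list mixes integers and floats, NumPy upcasts everything to one common type instead of storing mixed types.

```python
import numpy as np

# Mixing ints and a float: NumPy picks one common dtype for every element
grid = np.array([[1, 2, 3], [4, 5.5, 6]])

print(grid.dtype)   # float64: the ints were upcast
print(grid.ndim)    # 2 axes
print(grid.shape)   # (2, 3): axis 0 has length 2, axis 1 has length 3
print(grid[1, 2])   # element at index tuple (1, 2)
```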

In [3]:
import numpy as np

In [101]:
mylist = [1, 2, 3, 4, 5, 6]
mylist

[1, 2, 3, 4, 5, 6]

In [102]:
myarray = np.array(mylist)
myarray

array([1, 2, 3, 4, 5, 6])

In [103]:
myarray.dtype

dtype('int64')

In [104]:
myarray.ndim

1

In [105]:
c = myarray.reshape((1,6))

In [106]:
c

array([[1, 2, 3, 4, 5, 6]])

In [107]:
c.ndim

2

In [108]:
d = c.reshape((6,))

In [109]:
d

array([1, 2, 3, 4, 5, 6])

In [110]:
d.ndim

1

Here myarray has one axis. That axis has 6 elements in it, so we say it has a length of 6. 

In [6]:
myarray.ndim

1

In [7]:
type(myarray)

numpy.ndarray

In [8]:
len(myarray)

6

### Differences between lists and NumPy Arrays
 - An array's size is fixed: you cannot append, insert or remove elements in place, as you can with a list.
 - All of an array's elements must be of the same data type.
 - A NumPy array still behaves in a Pythonic fashion: for example, `len(my_array)` works just as you would expect.
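A short sketch of the first two points. `np.append` does exist, but it allocates and returns a new array rather than growing the original; and assigning a value of the wrong type does not make the array "mixed", NumPy tries to convert the value and fails instead. The variable names are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3])

# np.append builds a brand-new array; `original` keeps its size
extended = np.append(original, 4)

print(original)   # [1 2 3]
print(extended)   # [1 2 3 4]

# A string cannot be stored in an integer array: NumPy attempts a conversion
try:
    original[0] = "notnum"
except ValueError as err:
    print("ValueError:", err)
```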

In [None]:
# A list can have elements appended to it
mylist.append(4.0)
# A list can hold multiple data types
mylist.insert(1, "notnum")
# A list can have items removed
mylist.pop(1)

## Multidimensional Array

 - The data structure is called `ndarray`, and it can have any number of dimensions.
 - Arrays can have multiple dimensions; the number of dimensions is set when the array is created.
 - Dimensions help define what each element in the array represents. A two-dimensional array is an array of arrays.
 - The rank (`ndim`) is the number of dimensions an array has.
 - The shape gives the length of each of the array's dimensions.
 - Each dimension is also called an axis; axes are numbered from zero.
 - A 2-D array is also known as a matrix.
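To tie rank, shape and size together, here is a small sketch with a 3-D array built from `np.arange` (the name `cube` is just illustrative). The size is always the product of the shape entries, and indexing the first axis gives back an array of one fewer dimension.

```python
import numpy as np

# A 3-D array: 2 blocks, each with 3 rows of 4 columns
cube = np.arange(24).reshape((2, 3, 4))

print(cube.ndim)    # 3: number of axes (the rank)
print(cube.shape)   # (2, 3, 4): length along each axis
print(cube.size)    # 24: product of the shape entries

# Axes are zero-indexed: axis 0 has length 2, axis 2 has length 4
print(cube.shape[0], cube.shape[2])   # 2 4

# Indexing the first axis returns a 2-D array ("array of arrays")
print(cube[0].shape)  # (3, 4)
```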

In [19]:
# Create a 2D array
b = np.array([[1,2.,3,4],[4,5.2,6,7]])   
b

array([[1. , 2. , 3. , 4. ],
       [4. , 5.2, 6. , 7. ]])

You can get the dimensions of an array from its `shape` attribute.

In [13]:
b.shape  # (rows, columns)

(2, 4)

ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

In [14]:
b.ndim, len(b.shape)

(2, 2)

In [15]:
b.size

8

In [17]:
len(b[0])

4

In [20]:
b.dtype

dtype('float64')

In [21]:
type(b)

numpy.ndarray

Individual elements of these arrays can be accessed by passing indices in square brackets, in the form `array[r, c]` or `array[r][c]`. Let's look at a few examples.

In [22]:
print(b)
b[0, 0], b[0, 1], b[1, 0]   

[[1.  2.  3.  4. ]
 [4.  5.2 6.  7. ]]


(1.0, 2.0, 4.0)

`b[0, 1]` and `b[0][1]` return the same element, but the second form is less efficient: `b[0]` first creates a temporary array (row 0), which is then indexed with `1`.
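A quick check of the statement above, using `np.shares_memory` to confirm that the intermediate `b[0]` is a view of row 0 rather than a copy. The name `row` is only illustrative.

```python
import numpy as np

b = np.array([[1, 2., 3, 4], [4, 5.2, 6, 7]])

# Both forms read the same element
print(b[0, 1] == b[0][1])        # True

# b[0] on its own is an intermediate 1-D array (a view of row 0),
# which the second [1] then indexes
row = b[0]
print(row.shape)                 # (4,)
print(np.shares_memory(row, b))  # True: a view, not a copy
```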

### Ways of creating arrays with default values

1. To create sequences of numbers, NumPy provides `arange`, a function analogous to Python's `range` that returns arrays instead of lists.

In [24]:
temp = np.arange(10, 30, 5)
temp

array([10, 15, 20, 25])

In [25]:
temp.shape, temp.ndim

((4,), 1)

In [28]:
np.arange(10).reshape(2,5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Often the elements of an array are not known in advance, but its size is. NumPy therefore offers several functions that create arrays with initial placeholder content:

2. **zeros** creates an array full of zeros.
3. **ones** creates an array full of ones.
4. **empty** creates an array without initializing its entries, so its content is arbitrary and depends on the state of the memory.
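A small sketch of the placeholder constructors with an explicit `dtype`. It also uses `np.ones_like`, which goes slightly beyond the functions listed above: it copies the shape and dtype of an existing array. Because `np.empty` leaves memory uninitialized, the sketch fills the buffer before reading it.

```python
import numpy as np

counts = np.zeros(3, dtype=int)      # integer zeros instead of the float default
print(counts)                        # [0 0 0]

like = np.ones_like(counts)          # same shape and dtype as `counts`
print(like)                          # [1 1 1]

# np.empty skips initialization, so write every entry before reading it
buf = np.empty((2, 3))
buf[...] = 7.0
print(buf)
```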

In [32]:
# numpy.zeros(shape, dtype=float)
# shape : int or tuple of ints
np.zeros((2,2))

array([[0., 0.],
       [0., 0.]])

In [33]:
# numpy.ones(shape, dtype=float)
np.ones((3,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [36]:
np.ones((2,3,4), dtype=np.int16)

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [38]:
np.empty((2,3)) 

array([[0., 0., 0.],
       [0., 0., 0.]])

In [40]:
# 5. linspace
np.linspace(0, 4, 9).reshape((3,3)) # 9 evenly spaced values from 0 to 4, both endpoints included

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])

In [42]:
import math

In [43]:
# Array filled with a constant
np.full((2,2), math.pi)

array([[3.14159265, 3.14159265],
       [3.14159265, 3.14159265]])

In [44]:
# One more constant-filled array, this time with an integer dtype
np.full((3,2),4,dtype=int)

array([[4, 4],
       [4, 4],
       [4, 4]])

In [45]:
# Identity Matrix
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [46]:
# Random numbers from [0, 1)
np.random.random((4,3))

array([[0.37978658, 0.85710436, 0.76799795],
       [0.91766926, 0.26697413, 0.55946729],
       [0.23626121, 0.76171965, 0.47153062],
       [0.59301709, 0.62319957, 0.8559751 ]])

As a side note, a single random number from [0, 1) can be obtained like this:

In [49]:
np.random.random()

0.03085790689630652

To obtain a number in the interval [a, b), multiply the above by (b - a) and then add a.
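A sketch of the scale-and-shift rule applied to a whole batch of samples. The second half swaps in the newer Generator API (`np.random.default_rng` and its `uniform` method, available in NumPy 1.17+), which draws from the same interval directly; the seed `42` is an arbitrary choice for reproducibility.

```python
import numpy as np

a, b = 5, 95

# Scale-and-shift: r in [0, 1) maps to a + (b - a) * r in [a, b)
samples = a + (b - a) * np.random.random(1000)
print(samples.min(), samples.max())

# Same interval with the Generator API (seeded for reproducibility)
rng = np.random.default_rng(42)
more = rng.uniform(a, b, size=1000)
print(more.min(), more.max())
```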

In [50]:
# random number from [5, 95)
90*np.random.random()+5

46.01186958923985

## Dependence of a subset on its parent array

One important thing to keep in mind is that a subset of a NumPy array obtained by slicing does not become an independent array: it is a *view* on the same data, so any change you make to it is reflected in the parent array. Use the `copy` method to make an independent array. Let's see this with an example.
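Before walking through the cells, here is a compact sketch of the same idea using `np.shares_memory` and the `.base` attribute, which reveal whether an array shares its data with another. The names `view` and `copy` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]          # basic slicing: a view on a's memory
copy = a[:2, 1:3].copy()   # explicit copy: independent data

print(np.shares_memory(view, a))   # True
print(np.shares_memory(copy, a))   # False
print(view.base is a)              # True: the view remembers its parent

view[0, 0] = 77   # writes through to a[0, 1]
copy[0, 1] = 99   # leaves a untouched
print(a[0, 1], a[0, 2])            # 77 3
```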

In [85]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [86]:
b = a[:2, 1:3]
print("this is b",":\n",b)
# b is still part of a, not an independent array

this is b :
 [[2 3]
 [6 7]]


In [88]:
a.dtype, b.dtype

(dtype('int64'), dtype('int64'))

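Now multiply `b` by 3 with `b = b*3`. This does **not** change `a`: `b*3` creates a brand-new array, and the assignment rebinds the name `b` to it, so `b` is no longer a view into `a`. An in-place operation such as `b *= 3` would have written through to `a`.

In [82]:
b = b*3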

In [83]:
b

array([[ 6,  9],
       [18, 21]])

In [84]:
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
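A side-by-side sketch of rebinding versus in-place multiplication, starting each time from a fresh slice of `a`:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

b = a[:2, 1:3]
b = b * 3          # new array; the name b no longer refers to the view
print(a[0, 1])     # 2: a is unchanged

b = a[:2, 1:3]
b *= 3             # in-place: multiplies the viewed elements of a
print(a[0, 1])     # 6: a was modified through the view
```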

In [74]:
# b = b*3 above rebound b to a new array, so slice a again to get a view
b = a[:2, 1:3]
c = a[:2, 1:3].copy()
print("this is c",":\n",c)
# c is independent of a; it's a fresh array

this is c :
 [[2 3]
 [6 7]]


Let's look at element a[0, 1], which is the same element as b[0, 0].

In [75]:
print(a)
print(a[0, 1])
print(b[0,0])

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
2
2


In [76]:
b[0, 0] = 77
# note that we assign to b here, not to a
print(a)

[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


Although we only changed b[0, 0], the value of a[0, 1] changed too. Now let's look at a[0, 2], which is the same element as c[0, 1], and check whether changing c[0, 1] has any effect on a[0, 2].


In [77]:
a

array([[ 1, 77,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [78]:
c

array([[2, 3],
       [6, 7]])

In [69]:
print(a[0,2])
print(c[0,1])

3
3


In [71]:
a

array([[ 1, 77,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [79]:
c[0, 1] = 99
print(a)
print(c)

[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[[ 2 99]
 [ 6  7]]


### More on accessing arrays with indices
Index values don't have to be contiguous when you access array elements. You can pass a list of row indices and a list of column indices, in any order, as long as each index is within the bounds of the array.
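A sketch of integer-array indexing with repeated and out-of-order indices. Unlike slicing, *reading* with index arrays returns a copy (so `np.shares_memory` reports `False`); *assigning* through index arrays, as in `a[c, b] += 10` further down, still writes into the original array. The name `picked` is illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Pairs (0, 0), (1, 1), (2, 0), (2, 0): indices may repeat
picked = a[[0, 1, 2, 2], [0, 1, 0, 0]]
print(picked)                          # [1 4 5 5]

# The result is a copy, so changing `picked` leaves `a` alone
picked[0] = 100
print(a[0, 0])                         # 1
print(np.shares_memory(picked, a))     # False

# Out-of-range indices raise an IndexError
try:
    a[[0, 3], [0, 0]]
except IndexError as err:
    print("IndexError:", err)
```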

In [89]:
a = np.array([[1,2], [3, 4], [5, 6]])
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [90]:
a[[0, 1, 2], [0, 1, 0]] # a[[r1, r2, r3], [c1, c2, c3]]

array([1, 4, 5])

In [None]:
# another example
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
a

In [None]:
b = np.array([0, 2, 0, 1])
c=np.arange(4)
c

In [None]:
a[c, b]

Index arrays can be used to modify elements as well as to read them:

In [None]:
a

In [None]:
a[c, b] += 10
a

### Conditional Access of arrays
If `a` were a single number, writing `a > 2` would give `True` or `False` depending on whether the condition holds.

When `a` is a NumPy array, the comparison is done for each element, and the result is an array with the same shape as `a`, containing `True`/`False` for each element.
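A sketch of what such a comparison returns: a boolean array (a "mask") of the same shape. Since `True` counts as 1, summing the mask counts the matching elements; `any` and `all` summarize it.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
mask = a > 2

print(mask.dtype)               # bool
print(mask.shape)               # (3, 2): same shape as a
print(mask.sum())               # 4 elements satisfy the condition
print(mask.any(), mask.all())   # True False
```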

In [111]:
a = np.array([[1,2], [3, 4], [5, 6]])
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [112]:
c=a > 2
print(c)

[[False False]
 [ True  True]
 [ True  True]]


These comparison expressions can be used directly for indexing. The result contains only the elements for which the expression evaluates to True.

In [113]:
print(a[c])
print(a[c].shape)

[3 4 5 6]
(4,)


In [115]:
a = np.linspace(0,10,10).reshape((2,5))
a

array([[ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444],
       [ 5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ]])

In [116]:
b = np.linspace(0,5,10).reshape((2,5))
b

array([[0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222],
       [2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ]])

In [117]:
b[a>5] = a[a>5]

In [118]:
b

array([[ 0.        ,  0.55555556,  1.11111111,  1.66666667,  2.22222222],
       [ 5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ]])

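Notice that reading with a boolean mask, as in `a[a>5]`, returns a 1D array, whereas assigning through a mask, as in `b[a>5] = a[a>5]`, keeps `b`'s 2D shape and overwrites only the selected positions.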

Let's see whether this also works with multiple conditions. Along the way, note that we don't have to store the mask in a variable first: the conditional expression can be written directly inside the brackets. Conditions are combined with `|` (or) and `&` (and), with each comparison wrapped in parentheses.
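A sketch of why the parentheses and the `&`/`|` operators matter. `&` and `|` bind more tightly than `>` and `<`, hence the parentheses; `np.logical_and` is the function form of `&`. Python's `and` tries to reduce a whole array to a single bool and raises a `ValueError`.

```python
import numpy as np

a = np.linspace(0, 10, 10).reshape((2, 5))

# Parenthesized comparisons combined with element-wise AND
both = a[(a > 2) & (a < 5)]

# np.logical_and gives the same mask
same = a[np.logical_and(a > 2, a < 5)]
print(np.array_equal(both, same))   # True

# Python's `and` cannot combine arrays element by element
try:
    a[(a > 2) and (a < 5)]
except ValueError as err:
    print("ValueError:", err)
```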

In [119]:
a

array([[ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444],
       [ 5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ]])

In [128]:
r, z = a[(a > 2) | (a < 5)], a[(a > 2) & (a < 5)]  # | is element-wise "or", & is element-wise "and"

In [125]:
r

array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

In [126]:
z

array([2.22222222, 3.33333333, 4.44444444])

In [127]:
type(r)

numpy.ndarray

### Array Operations

You can use both ordinary operator symbols and NumPy functions for array operations. Let's look at these operations with examples.

Use `+`, `-`, `*`, `/` and `**` to perform element-wise addition, subtraction, multiplication, division and power.
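A compact check that each operator matches its NumPy function counterpart. The cells below only show `**` implicitly, so this sketch also pairs it with `np.power`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Each operator paired with its NumPy function
pairs = [
    (x + y, np.add(x, y)),
    (x - y, np.subtract(x, y)),
    (x * y, np.multiply(x, y)),
    (x / y, np.divide(x, y)),
    (x ** 2, np.power(x, 2)),
]
print(all(np.array_equal(op, fn) for op, fn in pairs))   # True
print(x ** 2)
```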

In [129]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

In [130]:
x

array([[1, 2],
       [3, 4]])

In [131]:
y

array([[5, 6],
       [7, 8]])

In [132]:
x+y

array([[ 6,  8],
       [10, 12]])

In [133]:
np.add(x,y)

array([[ 6,  8],
       [10, 12]])

In [134]:
print(x-y)

[[-4 -4]
 [-4 -4]]


In [135]:
np.subtract(x,y)

array([[-4, -4],
       [-4, -4]])

In [136]:
# element-wise multiplication, not matrix multiplication
print(x)
print("~~~~~")

print(y)
print("~~~~~")
print(x * y)

[[1 2]
 [3 4]]
~~~~~
[[5 6]
 [7 8]]
~~~~~
[[ 5 12]
 [21 32]]


In [137]:
np.multiply(x, y)

array([[ 5, 12],
       [21, 32]])

In [138]:
print(x/y)

[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [139]:
np.divide(x,y)

array([[0.2       , 0.33333333],
       [0.42857143, 0.5       ]])

In general, NumPy's mathematical functions (imported as `np` here), when applied to an array, return an array in which the function has been applied to each element individually.
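A sketch contrasting an element-wise NumPy function with its counterpart from Python's `math` module: `np.exp` keeps the array's shape, while `math.exp` only accepts a single number and raises a `TypeError` for a 2x2 array.

```python
import math
import numpy as np

x = np.array([[1, 2], [3, 4]])

# np.exp is applied to each entry and the shape is preserved
e = np.exp(x)
print(e.shape)                            # (2, 2)
print(np.isclose(e[1, 0], math.exp(3)))   # True

# math.exp cannot handle a multi-element array
try:
    math.exp(x)
except TypeError as err:
    print("TypeError:", err)
```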

In [140]:
np.sqrt(x)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])

In [141]:
a = np.array([1,2,3,4])
b = np.array([1,2,3,4, 5])
c = a + b  # shapes (4,) and (5,) are incompatible, so this raises a ValueError

ValueError: operands could not be broadcast together with shapes (4,) (5,) 

### np.dot
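numpy.dot(a, b)
 - If both a and b are 1-D arrays, it is the inner product of vectors.
 - If both a and b are 2-D arrays, it is matrix multiplication (`a @ b` or `np.matmul` is preferred for this case).
 - If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.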

**Dot product:**

$$
\begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}
\cdot
\begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix}
= x_1 y_1 + x_2 y_2 + x_3 y_3
$$
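The two main cases of `np.dot` can be sketched as follows: for 1-D inputs the result is the sum of products from the formula above, and for 2-D inputs it is ordinary matrix multiplication, which the `@` operator also performs. The matrices `A` and `B` are illustrative.

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])

# 1-D with 1-D: inner product, 9*11 + 10*12
print(np.dot(v, w))                           # 219
print(sum(vi * wi for vi, wi in zip(v, w)))   # 219, the same sum written out

# 2-D with 2-D: matrix multiplication; @ gives the same result
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))
print(np.array_equal(np.dot(A, B), A @ B))    # True
```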

In [142]:
v = np.array([9,10])
v

array([ 9, 10])

In [143]:
w = np.array([11, 12])
w

array([11, 12])

In [144]:
v.shape, w.shape

((2,), (2,))

In [145]:
# Both arrays are 1-D, so this is the inner (dot) product, not matrix multiplication
v.dot(w)

219

The result is a single number, the inner product 9*11 + 10*12 = 219, rather than a matrix. This happens because a one-dimensional array is not a matrix: it has no row or column orientation, so `np.dot` treats two 1-D arrays as vectors.

In [146]:
print(v.shape)
print(w.shape)

(2,)
(2,)


We can turn them into 1x2 matrices (row vectors) by reshaping them explicitly:

In [147]:
v=v.reshape((1,2))
w=w.reshape((1,2))

In [148]:
print(v.shape)
print(w.shape)

(1, 2)
(1, 2)


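Now if you simply try `v.dot(w)` or `np.dot(v, w)` (the two are equivalent), you will get an error: a 1x2 matrix cannot be multiplied by another 1x2 matrix, because the number of columns of the first (2) must equal the number of rows of the second (1). Transposing `w` to shape (2, 1) fixes this.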
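A sketch of the shape rule for matrix multiplication: the inner sizes must agree, and the result takes the outer sizes. Here `np.dot` raises a `ValueError` for the misaligned case.

```python
import numpy as np

v = np.array([9, 10]).reshape((1, 2))
w = np.array([11, 12]).reshape((1, 2))

# (1, 2) x (1, 2): inner sizes 2 and 1 differ, so this fails
try:
    np.dot(v, w)
except ValueError as err:
    print("ValueError:", err)

# (1, 2) x (2, 1) -> (1, 1); (2, 1) x (1, 2) -> (2, 2)
print(np.dot(v, w.T).shape)   # (1, 1)
print(np.dot(v.T, w).shape)   # (2, 2)
```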

In [150]:
np.dot(v,w.T)

array([[219]])

In [151]:
print('matrix v : ',v)
print('matrix v Transpose:',v.T)
print('matrix w:',w)
print('matrix w Transpose:',w.T)
print('~~~~~~~~~')
print(np.dot(v,w.T))
print('~~~~~~~~~')
print(np.dot(v.T,w))

matrix v :  [[ 9 10]]
matrix v Transpose: [[ 9]
 [10]]
matrix w: [[11 12]]
matrix w Transpose: [[11]
 [12]]
~~~~~~~~~
[[219]]
~~~~~~~~~
[[ 99 108]
 [110 120]]


If you leave `v` as a one-dimensional array, `x.dot(v)` computes a matrix-vector product: each row of `x` is dotted with `v`, giving a 1-D result (here 1*9 + 2*10 = 29 and 3*9 + 4*10 = 67). Here is an example:
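A sketch confirming that the matrix-vector product is a row-by-row dot product, and that it differs from element-wise multiplication `x * v`, which instead broadcasts `v` across the rows. The name `by_rows` is illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
v = np.array([9, 10])

# Matrix-vector product: each row of x dotted with v
mv = x.dot(v)
by_rows = np.array([row.dot(v) for row in x])
print(mv, by_rows)        # [29 67] [29 67]

# Element-wise product is a different operation
print(x * v)
```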

In [152]:
print(x)
v=np.array([9,10])
print("~~~~~")
print(v)
x.dot(v)

[[1 2]
 [3 4]]
~~~~~
[ 9 10]


array([29, 67])

In [None]:
print(x)
print("~~~")
print(y)
x.dot(y)

### Other functions

In [153]:
x = np.array([[1,2],[3,4]])
x

array([[1, 2],
       [3, 4]])

In [154]:
np.sum(x)

10

Using the `axis` argument of `sum`, you can also sum along each dimension separately: `axis=0` sums down the columns (one total per column), and `axis=1` sums across the rows (one total per row).
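A sketch of a related option, `keepdims=True`, which keeps the summed axis with length 1. The result then broadcasts back against the original array, for example to turn each row into shares of its row total. The name `shares` is illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

col_sums = x.sum(axis=0)   # collapse axis 0: one total per column
row_sums = x.sum(axis=1)   # collapse axis 1: one total per row
print(col_sums, row_sums)  # [4 6] [3 7]

# keepdims=True gives shape (2, 1), which divides row by row
totals = x.sum(axis=1, keepdims=True)
shares = x / totals
print(totals.shape)                               # (2, 1)
print(np.allclose(shares.sum(axis=1), 1.0))       # True: each row sums to 1
```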

In [155]:
np.sum(x, axis=0)

array([4, 6])

In [156]:
np.sum(x, axis=1)

array([3, 7])

In [None]:
# Transpose : we have used this one already
x

In [None]:
x.T

So far, operations between two arrays have paired up corresponding elements. Often, though, the shapes of the arrays don't match, so the correspondence between elements is incomplete. In such cases NumPy can *broadcast* the smaller array: it is virtually repeated along the missing (or length-1) dimensions so that the shapes line up.

In [157]:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])

In [158]:
x

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [159]:
v

array([1, 0, 1])

Here `v` is smaller than `x`. Before looking at what happens when we combine `x` and `v` directly, let's replicate `v` ourselves so that every element of `x` has a partner, and look at the result.

In [160]:
vv = np.tile(v, (4, 1))  # Stack 4 copies of v on top of each other
vv

array([[1, 0, 1],
       [1, 0, 1],
       [1, 0, 1],
       [1, 0, 1]])

In [161]:
x.shape, v.shape, vv.shape

((4, 3), (3,), (4, 3))

In [162]:
print(x)
print("~~~~~")
print(vv)
x + vv

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
~~~~~
[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]


array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

Now let's check the result of adding just x and v.


In [163]:
x + v # produce the same result as x + vv

array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

Let's see some more examples of operations between arrays with mismatched shapes.

In [164]:
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)

In [165]:
v.shape

(3,)

In [166]:
w.shape

(2,)

In [167]:
x = np.array([[1,2,3], [4,5,6]]) # x has shape (2, 3)
x.shape

(2, 3)

In [168]:
x

array([[1, 2, 3],
       [4, 5, 6]])

In [169]:
print(x)
print("~~~~~")
print(v)
x + v

[[1 2 3]
 [4 5 6]]
~~~~~
[1 2 3]


array([[2, 4, 6],
       [5, 7, 9]])

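What we see here is known as broadcasting. However, the shapes must be compatible: comparing the shapes from the last (rightmost) dimension backwards, each pair of sizes must either be equal or one of them must be 1, and a missing dimension counts as 1.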
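The compatibility rule can be sketched as a small function. `broadcastable` is a hypothetical helper written for illustration, not part of NumPy; the last line cross-checks one case with `np.broadcast_shapes`, which exists in NumPy 1.20 and later. The shapes tested are the ones used in the cells around here.

```python
import numpy as np

def broadcastable(shape_a, shape_b):
    """Hypothetical helper: check the broadcasting rule by hand."""
    # Walk both shapes from the last axis backwards; zip stops at the
    # shorter shape, which matches treating missing axes as length 1
    for da, db in zip(reversed(shape_a), reversed(shape_b)):
        if da != db and da != 1 and db != 1:
            return False
    return True

print(broadcastable((2, 3), (3,)))    # True:  x + v
print(broadcastable((2, 3), (2,)))    # False: 3 vs 2, the error above
print(broadcastable((3, 2), (2,)))    # True:  x.T + w
print(broadcastable((2, 3), (2, 1)))  # True:  x + w reshaped to (2, 1)

# NumPy's own version of the check returns the combined shape
print(np.broadcast_shapes((2, 3), (2, 1)))   # (2, 3)
```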

In [170]:
v=np.array([1,2])

In [171]:
v.shape

(2,)

In [172]:
x+v

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

In [173]:
x.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [174]:
print(x)
print("~~~~~")
print(w)
x.T + w

[[1 2 3]
 [4 5 6]]
~~~~~
[4 5]


array([[ 5,  9],
       [ 6, 10],
       [ 7, 11]])

In [175]:
(x.T + w).T

array([[ 5,  6,  7],
       [ 9, 10, 11]])

In [None]:
x + np.reshape(w, (2, 1))  # w as a (2, 1) column broadcasts across the 3 columns of x

## Math Functions

In [176]:
a = np.array([-4, -2, 1, 3, 5])

In [177]:
a.sum()

3

In [178]:
a.max()

5

In [179]:
a.min()

-4

In [180]:
a.mean()

0.6

In [181]:
a.std()

3.2619012860600183

In [182]:
a.argmax()

4

In [184]:
a[a.argmax()]

5

In [183]:
a.argmin()

0

## Saving data

In [188]:
import os

In [194]:
os.getcwd()

'/Users/pradip.gupta/personal-projects/my-learning-notebooks/intro-to-python-ml/week1'

In [196]:
os.getcwd()+"/data"  # plain string concatenation; os.path.join (next cell) is the portable way

'/Users/pradip.gupta/personal-projects/my-learning-notebooks/intro-to-python-ml/week1/data'

In [195]:
os.path.join(os.getcwd(),"data")

'/Users/pradip.gupta/personal-projects/my-learning-notebooks/intro-to-python-ml/week1/data'

In [193]:
np.save("../data/temp.npy",a)

In [186]:
z = np.load("../data/temp.npy")  # load from the same path used with np.save

In [187]:
z

array([-4, -2,  1,  3,  5])
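A self-contained sketch of the same save/load round trip. It writes into a temporary directory (via `tempfile`) so that it does not depend on a `../data` folder existing, and builds paths with `os.path.join`. It also shows `np.savez`, which goes beyond the cells above: it stores several named arrays in one `.npz` archive. The file names are illustrative.

```python
import os
import tempfile
import numpy as np

a = np.array([-4, -2, 1, 3, 5])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "temp.npy")   # build the path portably
    np.save(path, a)                          # binary .npy file
    z = np.load(path)                         # load from the same path

    # Several arrays in one file: np.savez writes a .npz archive
    archive = os.path.join(folder, "arrays.npz")
    np.savez(archive, first=a, second=a * 2)
    with np.load(archive) as data:
        doubled = data["second"]

print(np.array_equal(a, z))   # True
print(doubled)                # [-8 -4  2  6 10]
```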