
# NumPy

!pip install numpy

NumPy is one of the most important Python libraries for numerical computation, and it underpins most of the machine learning stack in Python.

It adds multi-dimensional arrays (and matrices) to core Python, along with fast vectorized operations on those arrays.

Numpy ndarray
-------------
Most of NumPy's numeric functionality is built on two constituents of the package: the ndarray and ufuncs (universal functions).

The ndarray is a multi-dimensional array object and the core data container for all NumPy operations.
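As a rough illustration of these two constituents, the sketch below builds a small ndarray, inspects its basic attributes, and applies one ufunc to it. The array name `grid` and its values are made up for this example.

```python
import numpy as np

# The container: a small 2-D ndarray (values chosen for illustration)
grid = np.array([[1.0, 4.0, 9.0],
                 [16.0, 25.0, 36.0]])

print(grid.ndim)    # number of dimensions
print(grid.shape)   # size along each dimension
print(grid.dtype)   # element type shared by every entry

# The operation: np.sqrt is a ufunc, applied to every element
# without an explicit Python loop
roots = np.sqrt(grid)
print(roots)
print(isinstance(np.sqrt, np.ufunc))
```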

In [0]:
# Let's create a NumPy array.
import numpy as np
arr = np.array([2,1,3,4.48,5])
print(arr)
print(type(arr))
print(arr.shape)
print(arr.dtype)

[2.   1.   3.   4.48 5.  ]
<class 'numpy.ndarray'>
(5,)
float64


In [0]:
# shape attribute of the array object will tell us 
# about the dimensions of the array.
# [2]
arr.shape

(5,)

In [0]:
arr.dtype

dtype('float64')

All elements must share one data type
-------------------------------------
All the elements in an array must have the same data type. If you initialize an array from mixed elements, for example strings alongside numbers, every element is converted to a string type, and most NumPy operations will no longer work on that array. NumPy is designed for numerical computation, so storing strings in an array is rarely a good idea.

In [0]:
arr = np.array([1,'st','er',3])
print(arr)
print(arr.dtype)

['1' 'st' 'er' '3']
<U11


U stands for Unicode string.
<U11 stands for a little-endian Unicode string of up to 11 characters (the width can vary with platform and NumPy version).
For more on data types, see
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
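The coercion rule can be checked directly through the `dtype` attribute. The following is a minimal sketch with made-up arrays. It compares `dtype.kind` rather than the full dtype string, since the exact string width is platform dependent.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numeric = np.array([1, 2.5, 3])    # integers are upcast to float
mixed_text = np.array([1, "st", 3])      # everything becomes a string

print(ints.dtype.kind)          # 'i' means signed integer
print(mixed_numeric.dtype)      # float64
print(mixed_text.dtype.kind)    # 'U' means Unicode string

# If every string happens to hold a number, astype can convert it back
recovered = np.array(["1", "2", "3"]).astype(int)
print(recovered.sum())
```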

In [0]:
# [5]
np.sum(arr)

TypeError: cannot perform reduce with flexible type

In [0]:
arr = np.array([1,2,3,4])
print(arr)
print(arr.dtype)
print(np.sum(arr))

[1 2 3 4]
int32
10


Creating Arrays
---------------

Arrays can be created in several ways in NumPy. 
One way, demonstrated earlier, builds a one-dimensional array from a list.

Similarly, we can stack multiple lists to create a multi-dimensional array.

In [0]:
# [6]
arr = np.array([  [1,2,3],  [4,5,6],  [7,8,9]  ])
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [0]:
arr.shape

(3, 3)

In [0]:
arr1 = np.array([[1,2,3],[2,4,6],[8,8]], dtype=object)  # ragged rows: NumPy 1.24+ requires dtype=object
print("arr1= ",arr1)
print("Shape= ",arr1.shape)

arr1=  [list([1, 2, 3]) list([2, 4, 6]) list([8, 8])]
Shape=  (3,)


In [0]:
arr2 = np.array([[1,2,3],[2,4,6],[8,None,None]])
print(arr2)
print("Shape= ",arr2.shape)

[[1 2 3]
 [2 4 6]
 [8 None None]]
Shape=  (3, 3)
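Padding with None keeps the shape rectangular, but the array then holds generic Python objects. The sketch below checks this and, assuming the gaps stand for missing numeric values, shows `np.nan` as one possible alternative that keeps a float dtype.

```python
import numpy as np

with_none = np.array([[1, 2, 3], [8, None, None]])
print(with_none.dtype)          # object: elements are generic Python objects

# Assumption for illustration: the gaps are missing numeric values,
# so np.nan can stand in for them and keep a float dtype
with_nan = np.array([[1, 2, 3], [8, np.nan, np.nan]])
print(with_nan.dtype)           # float64
print(np.nansum(with_nan))      # sum that skips the NaN entries
```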


In [0]:
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In addition to this we can create arrays using a bunch of special functions provided by numpy.

np.zeros 
--------
Creates a matrix of specified dimensions containing only zeroes.

In [0]:
# [10]
arr = np.zeros( (2,4) )
arr

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [0]:
# [11]
arr3 = np.ones([3,5])
arr3

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [0]:
# Deliberate error: NumPy has no np.threes function
arr3 = np.threes( (3,5) )
arr3

AttributeError: module 'numpy' has no attribute 'threes'
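For constants other than zero and one, `np.full` is the usual route. A short sketch, with the shape and fill value chosen only for illustration:

```python
import numpy as np

# np.full(shape, fill_value) builds an array filled with any constant
threes = np.full((3, 5), 3.0)
print(threes)

# The same result obtained by scaling an array of ones
also_threes = 3 * np.ones((3, 5))
print(np.array_equal(threes, also_threes))
```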

np.identity 
-----------
Creates an identity matrix of specified dimensions:

In [0]:
# [13]
arr  = np.identity(5)
arr

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

Creating and Assigning an Array with random values :
---------------------------------------------------
We often need to initialize an array of a given shape with random values.

This can be done easily with the randn function from the numpy.random package, which draws samples from the standard normal distribution:

In [0]:
arr = np.random.randn(5,7)
arr

array([[ 0.75649387, -0.72012717, -1.89766957,  0.31468033,  1.02235771,
         0.19548638,  1.5433681 ],
       [ 1.54907033,  0.38889501,  0.32446754,  0.58071144, -1.19860533,
        -2.23585907,  1.07478287],
       [-0.8261365 , -0.44507826, -1.80489152,  1.11510269,  0.90658101,
        -1.38436254, -0.17473888],
       [-0.01217872,  0.05652854, -0.32267872, -2.78808271,  1.46390245,
        -0.27571375, -0.79922546],
       [ 0.38184212, -0.83155019,  0.83598305, -0.01430201, -0.21569448,
        -0.96465669, -0.49189765]])
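The values above change on every run. If repeatable output is wanted, one option is a seeded `Generator`, sketched below. The seed value 42 is arbitrary, and `standard_normal` is used here as the Generator counterpart of `randn`.

```python
import numpy as np

# A seeded Generator makes the "random" values repeatable between runs
rng = np.random.default_rng(seed=42)     # the seed value 42 is arbitrary
sample = rng.standard_normal((5, 7))     # standard normal samples, shape (5, 7)

# Re-creating the generator with the same seed reproduces the same array
repeat = np.random.default_rng(seed=42).standard_normal((5, 7))
print(sample.shape)
print(np.array_equal(sample, repeat))
```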

In [0]:
# One-Dimensional List to Array
# one dimensional example
from numpy import array
# list of data
data = [11, 22, 33, 44, 55]
# array of data
data = array(data)
print(data)
print(type(data))
print(data.shape)

[11 22 33 44 55]
<class 'numpy.ndarray'>
(5,)


In [0]:
# Two-Dimensional List of Lists to Array
# two dimensional example
from numpy import array
# list of data
data = [[11, 22],    [33, 44],      [55, 66]]
# array of data
data = array(data)
print(data)
print(type(data))
print(data.shape)

[[11 22]
 [33 44]
 [55 66]]
<class 'numpy.ndarray'>
(3, 2)


# Basic Indexing and Slicing

In [0]:
#### Accessing array elements
#### Simple indexing
arr = np.array([[1,2,3],[2,4,6],[8,8,8]])
print("arr\n",arr)
print("\nshape:\n",arr.shape)
print("\narr1:\n",arr[2])

arr
 [[1 2 3]
 [2 4 6]
 [8 8 8]]

shape:
 (3, 3)

arr1:
 [8 8 8]


In [0]:
# [19]
np.arange(12)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [0]:
# [20]
arr = np.arange(12).reshape(2,2,3)

arr

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [0]:
arr[0]  # prints the 0th array

array([[0, 1, 2],
       [3, 4, 5]])

In [0]:
arr[1]  # prints the 1st array

array([[ 6,  7,  8],
       [ 9, 10, 11]])

In [0]:
# concept of slicing arrays 
arr = np.arange(10)
print(arr)
arr[5:]  

[0 1 2 3 4 5 6 7 8 9]


array([5, 6, 7, 8, 9])

In [0]:
arr1 = np.arange(10)
arr1[10:]

array([], dtype=int32)

In [0]:
arr2 = np.arange(10)
arr2[10:15]

array([], dtype=int32)
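The two empty results above show that slices are clipped to the array bounds instead of failing. A small sketch contrasting this with a single out-of-range index, which does raise an error (the name `values` is just for this example):

```python
import numpy as np

values = np.arange(10)

# A slice that starts past the end is clipped and yields an empty array
print(values[10:15])
# A slice that is partly out of range keeps only the valid part
print(values[8:15])

# A single out-of-range index is an error
try:
    values[10]
except IndexError as exc:
    print("IndexError:", exc)
```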

In [0]:
print(arr)
arr[5:8]  # slice elements from 5 to (8-1) i.e 5 to (endindex-1)

[0 1 2 3 4 5 6 7 8 9]


array([5, 6, 7])

In [0]:
print(arr)
arr[:-5] 

[0 1 2 3 4 5 6 7 8 9]


array([0, 1, 2, 3, 4])

In [0]:
arr[-4:]

array([6, 7, 8, 9])

In [0]:
arr[1:-4]

array([1, 2, 3, 4, 5])

In [0]:
arr = np.arange(12).reshape(2,2,3)
arr

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [0]:
arr[0:1]

array([[[0, 1, 2],
        [3, 4, 5]]])

In [0]:
arr[1:2]

array([[[ 6,  7,  8],
        [ 9, 10, 11]]])

In [0]:
arr[0:2]

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [0]:
arr

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [0]:
arr[:,:,2]   # the column at index 2 (the third column) from every 2-D block

array([[ 2,  5],
       [ 8, 11]])

In [0]:
arr[...,2] # ... is Python's Ellipsis: it stands for as many ':' slices as needed
# here it is equivalent to arr[:,:,2]

array([[ 2,  5],
       [ 8, 11]])

In [0]:
arr[...,1] # here 1 selects the column at index 1 (the second column)

array([[ 1,  4],
       [ 7, 10]])
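The equivalence between the Ellipsis form and explicit slices can be confirmed with a quick check. The array name `cube` is made up for this sketch.

```python
import numpy as np

cube = np.arange(12).reshape(2, 2, 3)

# ... expands to as many ':' as needed to cover the remaining axes
print(np.array_equal(cube[..., 2], cube[:, :, 2]))
print(np.array_equal(cube[1, ...], cube[1, :, :]))

# Ellipsis is an ordinary Python object, so this spelling is identical
print(np.array_equal(cube[Ellipsis, 0], cube[:, :, 0]))
```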

In [0]:
arr

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [0]:
arr[1]

array([[ 6,  7,  8],
       [ 9, 10, 11]])

In [0]:
# [39]
arr[1][1]

array([ 9, 10, 11])

In [0]:
# [40]
arr[1][1][1]

10
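Chained brackets work, but NumPy also accepts all the indices in a single pair of brackets, separated by commas. A brief sketch showing that both forms reach the same element, and that the comma form can mix indices with slices:

```python
import numpy as np

cube = np.arange(12).reshape(2, 2, 3)

# Chained indexing and comma-separated indexing reach the same element
print(cube[1][1][1])
print(cube[1, 1, 1])

# The comma form can mix an index with a slice:
# block 1, every row, column at index 1
print(cube[1, :, 1])
```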

In [0]:
arr = np.arange(9).reshape(3,3)
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Two-Dimensional Slicing
--
It is common to split loaded data into input variables (X) and an output variable (y).

We can do this by slicing all rows and every column up to, but not including, the last column, and then separately indexing the last column.

For the input features, we select all rows and all columns except the last by specifying ':' in the rows index and ':-1' in the columns index.

X = data[:, :-1]

For the output column, we again select all rows using ':' and index just the last column by specifying -1.

y = data[:, -1]

In [0]:
from numpy import array
# define array
data = array([[11, 22, 33],[44, 55, 66],[77, 88, 99]])
print("data=",data)
# separate data
X, y = data[:, :-1], data[:, -1]
# X = data[:, :-1]
# y =  data[:, -1]
print("X=",X)
print("---------------")
print("y=",y)

data= [[11 22 33]
 [44 55 66]
 [77 88 99]]
X= [[11 22]
 [44 55]
 [77 88]]
---------------
y= [33 66 99]


# Advanced Indexing

Advanced indexing occurs when the index object is itself an array (or a list). 

The simplest case is to provide one index array per dimension of the array being accessed, with all the index arrays having the same length.

For example:

In [0]:
print("\n",arr)
print("\n",arr[[0,1,2],[1,0,0]])


 [[0 1 2]
 [3 4 5]
 [6 7 8]]

 [1 3 6]


In this example we provide two index lists: the first identifies the rows we want to access and the second identifies the columns. That is, the (row 0, column 1) element is 1, the (row 1, column 0) element is 3, and the (row 2, column 0) element is 6.

In effect, we supply a collection of element-wise addresses.
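To make the pairing explicit, the sketch below repeats the selection with named `rows` and `cols` lists (names chosen for this example) and compares it with looking up each (row, column) pair by hand.

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)
rows = [0, 1, 2]
cols = [1, 0, 0]

picked = grid[rows, cols]
# The same selection written as explicit (row, column) lookups
by_hand = [int(grid[r, c]) for r, c in zip(rows, cols)]

print(picked)
print(by_hand)

# Contrast: slicing selects a rectangular block, not individual pairs
print(grid[0:3, 0:2].shape)
```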

# Boolean Indexing

Boolean indexing occurs when the index object is an array of Boolean values. 

It is used when we want to access data based on some condition.

We will illustrate it with an example. Suppose in one array, we have the names of some cities and in another array, we have some data related to those cities.

In [0]:
cities = np.array(["delhi","banglaore","mumbai","chennai","bhopal"])
city_data = np.random.randn(5,3)
print("\ncity_data:\n",city_data)


city_data:
 [[-0.5878923   1.72427524 -0.47744438]
 [-0.16022939  1.09435198  1.4904848 ]
 [-0.44973243 -0.6255381  -0.49365635]
 [-0.96386459 -0.75903213 -0.29638631]
 [ 0.40346108 -1.78832138  0.05078257]]


In [0]:
cities

array(['delhi', 'banglaore', 'mumbai', 'chennai', 'bhopal'], dtype='<U9')

In [0]:
print(cities =="mumbra")

[False False False False False]


In [0]:
city_data[cities =="mumbai"] 
# depending on which index is True, it prints that row
# In this case, mumbai is at index 2, therefore row no. 2 will be printed

array([[-0.44973243, -0.6255381 , -0.49365635]])

In [0]:
# If nothing matches, every value in the mask is False.
city_data[cities =="mumbra"] 
# The output is then an empty array of shape (0 rows, 3 columns).

array([], shape=(0, 3), dtype=float64)
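Masks can also be combined. The sketch below uses a smaller, hypothetical `readings` array so the results are easy to follow. It applies the element-wise operators `|` and `~`, and `np.isin` for membership in a list of names.

```python
import numpy as np

cities = np.array(["delhi", "mumbai", "chennai", "bhopal"])
# Hypothetical readings, one row per city (values chosen for illustration)
readings = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0],
                     [7.0, 8.0]])

# Element-wise OR; each comparison needs its own parentheses
either = (cities == "delhi") | (cities == "chennai")
print(readings[either])

# Element-wise NOT: every row except mumbai
not_mumbai = ~(cities == "mumbai")
print(readings[not_mumbai].shape)

# np.isin tests membership in a list of names in one call
print(readings[np.isin(cities, ["mumbai", "bhopal"])])
```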

We can also use Boolean indexing to select the elements of an array that satisfy a particular condition. For example, suppose we want only the positive elements of the previous array, and then want to set those elements to zero. 

We can do that easily using the following code.

In [0]:
city_data

array([[-0.5878923 ,  1.72427524, -0.47744438],
       [-0.16022939,  1.09435198,  1.4904848 ],
       [-0.44973243, -0.6255381 , -0.49365635],
       [-0.96386459, -0.75903213, -0.29638631],
       [ 0.40346108, -1.78832138,  0.05078257]])

In [0]:
# [48]
city_data[ city_data>0  ]

array([1.72427524, 1.09435198, 1.4904848 , 0.40346108, 0.05078257])

In [0]:
# assign 0 to every positive element, modifying city_data in place
city_data[ city_data>0  ] =0
city_data

array([[-0.5878923 ,  0.        , -0.47744438],
       [-0.16022939,  0.        ,  0.        ],
       [-0.44973243, -0.6255381 , -0.49365635],
       [-0.96386459, -0.75903213, -0.29638631],
       [ 0.        , -1.78832138,  0.        ]])
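The assignment above changes the array in place. When the original values should be kept, `np.where` is one alternative, sketched here on a small made-up array.

```python
import numpy as np

data = np.array([[-0.5, 1.7, -0.4],
                 [0.4, -1.8, 0.05]])   # illustrative values

# np.where(condition, x, y) picks x where the condition holds, else y,
# and returns a new array, leaving `data` untouched
clipped = np.where(data > 0, 0.0, data)
print(clipped)
print(data[0, 1])        # the original value is still there

# np.minimum gives the same result for this particular rule
print(np.array_equal(clipped, np.minimum(data, 0.0)))
```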

# Operations on Arrays

Most operations on NumPy arrays are carried out by universal functions (ufuncs). 

NumPy provides a rich set of functions that we can use for various operations on arrays. We cover only some of them here.

Universal functions operate on arrays element by element. Ufuncs are vectorized and implemented in compiled C code, which makes them much faster than an equivalent Python loop.
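The following sketch computes the same quantity twice, once with a Python loop and once with ufuncs, to show that the ufunc version produces the same values without an explicit loop. It does not measure speed. The expression x**2 + 1 is an arbitrary choice.

```python
import numpy as np

values = np.arange(1, 6)            # [1 2 3 4 5]

# Element by element in pure Python
looped = [int(v) ** 2 + 1 for v in values]

# The same computation with the ufuncs np.square and np.add
vectorized = np.add(np.square(values), 1)

print(looped)
print(vectorized)

# Arithmetic operators on arrays are shorthand for these ufuncs
print(np.array_equal(vectorized, values ** 2 + 1))
```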

In [0]:
arr = np.arange(15).reshape(3,5)
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [0]:
# Add a constant to every element of the array.
arr + 5  # most ufuncs return a new array
# This is broadcasting: the scalar 5 is treated as if it were an array
# with the same shape as arr.
# Broadcasting happens automatically.

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [0]:
arr * 2

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

In [0]:
arr1 = np.arange(15).reshape(5,3)
arr2 = np.arange(5).reshape(5,1)
print("arr1\n",arr1)
print("arr2\n",arr2)
# [53]
arr2+arr1

arr1
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]
arr2
 [[0]
 [1]
 [2]
 [3]
 [4]]


array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10],
       [12, 13, 14],
       [16, 17, 18]])

Here we were able to add two arrays even though they have different shapes. This is made possible by broadcasting.

Broadcasting is covered in more detail after the linear algebra section.

# Linear Algebra Using numpy

In [0]:
# #### Linear algebra using numpy

A = np.array([[1,2,3],[4,5,6],[7,8,9]])
B = np.array([[9,8,7],[6,5,4],[1,2,3]])
print("A=",A)
print("B=",B)
# [54]
A.dot(B)

A= [[1 2 3]
 [4 5 6]
 [7 8 9]]
B= [[9 8 7]
 [6 5 4]
 [1 2 3]]


array([[ 24,  24,  24],
       [ 72,  69,  66],
       [120, 114, 108]])
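The `@` operator is another way to write the same matrix product, while `*` multiplies element by element. A short sketch with the same A and B to keep the two apart:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[9, 8, 7], [6, 5, 4], [1, 2, 3]])

# @ is the matrix-multiplication operator and matches A.dot(B)
print(np.array_equal(A @ B, A.dot(B)))

# * is element-wise, so the result is different
print((A * B)[0])      # first row: 1*9, 2*8, 3*7
```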

In [0]:
# taking Transpose
A = np.arange(15).reshape(3,5)  # 3 rows and 5 columns each
print(A)
print("\n")
# [55]
A.T

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]




array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

Often we need to decompose a matrix into its constituent factors. 

This is called matrix factorization. 

A popular factorization method is the singular value decomposition (SVD), which decomposes a matrix into three matrices. It can be computed with the linalg.svd function.

In [0]:
np.linalg.svd(A)  # linalg is the Linear Algebra package

(array([[-0.15425367,  0.89974393,  0.40824829],
        [-0.50248417,  0.28432901, -0.81649658],
        [-0.85071468, -0.3310859 ,  0.40824829]]),
 array([3.17420265e+01, 2.72832424e+00, 8.10792259e-16]),
 array([[-0.34716018, -0.39465093, -0.44214167, -0.48963242, -0.53712316],
        [-0.69244481, -0.37980343, -0.06716206,  0.24547932,  0.55812069],
        [ 0.49916309, -0.8355069 ,  0.19887686,  0.1121146 ,  0.02535234],
        [-0.30036899, -0.03396104,  0.33716014,  0.62903881, -0.63186892],
        [-0.24620048, -0.02783651,  0.80422076, -0.54013007,  0.0099463 ]]))
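One way to check the three factors is to multiply them back together. The sketch below passes `full_matrices=False`, which returns compact factors whose shapes line up for a direct product. This differs from the default call shown above, where the last factor is 5x5.

```python
import numpy as np

A = np.arange(15).reshape(3, 5)

# full_matrices=False returns compact factors: U is 3x3, Vt is 3x5
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)

# Rebuild A as U @ diag(s) @ Vt
rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(rebuilt, A))

# Singular values come back sorted from largest to smallest
print(np.all(s[:-1] >= s[1:]))
```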

Why is SVD used?
--
Matrix decomposition, also known as matrix factorization, describes a given matrix in terms of its constituent elements.

Perhaps the best-known and most widely used matrix decomposition is the singular value decomposition, or SVD. Every matrix has an SVD, which makes it more stable than other methods such as the eigendecomposition. As a result, it is used in a wide range of applications, including compression, denoising, and data reduction.

https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/


Consider the system of equations:
7x + 5y -3z = 16
3x - 5y + 2z = -8
5x + 3y - 7z = 0

In [0]:
a = np.array([[7,5,-3], [3,-5,2],[5,3,-7]])
b = np.array([16,-8,0])
# [57]
x = np.linalg.solve(a,b)
x                             # will print the values of x,y, and z

array([1., 3., 2.])
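A quick sanity check is to substitute the solution back into the system. The sketch below does that, and also compares against multiplying by the inverse, which gives the same answer here (`solve` is generally the preferred route).

```python
import numpy as np

a = np.array([[7, 5, -3], [3, -5, 2], [5, 3, -7]])
b = np.array([16, -8, 0])

x = np.linalg.solve(a, b)

# Substituting the solution back should reproduce the right-hand side
print(np.allclose(a @ x, b))

# Multiplying by the inverse gives the same solution for this system
x_via_inverse = np.linalg.inv(a) @ b
print(np.allclose(x, x_via_inverse))
```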

In [0]:
# Finding the Inverse of a Matrix
# The NumPy library contains the inv function in the linalg module.

# let's find the inverse of a 2x2 matrix.
Y = np.array(([1,2], [3,4]))  
# [58]
Z = np.linalg.inv(Y)
print(Y)
print("\n")
print(Z)  

# How to find the inverse or determinant of a matrix ?
# https://www.mathsisfun.com/algebra/matrix-inverse.html

[[1 2]
 [3 4]]


[[-2.   1. ]
 [ 1.5 -0.5]]
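By definition, a matrix times its inverse gives the identity matrix, which is easy to verify. The sketch also shows what happens for a matrix with no inverse. The `singular` matrix is made up for this example.

```python
import numpy as np

Y = np.array([[1, 2], [3, 4]])
Z = np.linalg.inv(Y)

# Y times its inverse should be the 2x2 identity (up to round-off)
print(np.allclose(Y @ Z, np.eye(2)))

# A singular matrix (second row is twice the first) has no inverse
singular = np.array([[1, 2], [2, 4]])
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as exc:
    print("LinAlgError:", exc)
```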


In [0]:
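# Finding the Determinant of a Matrix
# The determinant can be calculated using the det function in linalg.
# This matrix is singular (its exact determinant is 0), so the tiny
# value printed below is floating-point round-off.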

X = np.array(([1,2,3], [4,5,6], [7,8,9]))
# [59]
Z = np.linalg.det(X)
print(X)
print("\n")
print(Z)  

[[1 2 3]
 [4 5 6]
 [7 8 9]]


6.66133814775094e-16
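Because the computed determinant is subject to round-off, comparing it to zero with `==` is unreliable. A hedged sketch of two more robust checks for the same matrix, a tolerance-based comparison and the matrix rank:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

det = np.linalg.det(X)
# Compare with a tolerance instead of testing det == 0
print(np.isclose(det, 0.0))

# The rank is below 3, which confirms the rows are linearly dependent
print(np.linalg.matrix_rank(X))
```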


In [0]:
# Finding the Trace of a Matrix
# The trace of a matrix is the sum of all the elements in the diagonal 
# of a matrix. The NumPy library contains trace function that can be 
# used to find the trace of a matrix.

X = np.array(([1,2,3], [4,5,6], [7,8,9]))
# [60]
Z = np.trace(X)
print(X)
print("\n")
print(Z)  

[[1 2 3]
 [4 5 6]
 [7 8 9]]


15


# How NumPy's Broadcasting Works

Recall that for arrays of the same size, binary operations are performed on an
element-by-element basis:

In [0]:
import numpy as np

a = np.array([0, 1, 2])
b = np.array([5, 5, 5])

a + b

array([5, 6, 7])

Broadcasting allows these types of binary operations to be performed on arrays of different sizes. For example, we can just as easily add a scalar to an array:

In [0]:
a

array([0, 1, 2])

In [0]:
a + 5

array([5, 6, 7])

In [0]:
M = np.ones((3, 3))
M

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [0]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

Here the one-dimensional array a is stretched, or broadcast, across the second
dimension in order to match the shape of M.


Consider the following example:

In [0]:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]

print(a)
print("\n")
print(b)

[0 1 2]


[[0]
 [1]
 [2]]


In [0]:
a + b

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

Just as before we stretched, or broadcast, one value to match the shape of the other, here we have stretched both a and b to match a common shape, and the result is a two-dimensional array. 

The geometry of these examples is visualized in the figure below.

![image for visualising Broadcasting](datasets_n_images/images/broadcasting_visual.png "Broadcasting Images" )
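The shape that broadcasting produces can be computed without building any data. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available. It covers the combinations used in this notebook plus one incompatible pair.

```python
import numpy as np

# Shapes are compared from the trailing dimension backwards;
# two sizes are compatible when they are equal or one of them is 1
print(np.broadcast_shapes((3,), (3, 1)))      # a + b from the example above
print(np.broadcast_shapes((3, 3), (3,)))      # M + a from the example above
print(np.broadcast_shapes((5, 3), (5, 1)))

# np.broadcast_to shows the stretched array explicitly
print(np.broadcast_to(np.arange(3), (3, 3)))

# Incompatible: trailing sizes 3 and 2 differ and neither is 1
try:
    np.ones((3, 3)) + np.ones((3, 2))
except ValueError as exc:
    print("ValueError:", exc)
```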