<h1>NUMPY TUTORIAL</h1>


## Introduction 

NumPy stands for Numerical Python.
It is the core library for scientific computing in Python.
It provides a high-performance multidimensional array object, and tools for working with these arrays.

In Python, lists serve the purpose of arrays, but they are slow to process.<br>
NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.<br>

Unlike lists, NumPy arrays are stored in one contiguous block of memory, so programs can access and manipulate them very efficiently.
This is the main reason why NumPy is faster than lists. NumPy is also optimized to work with the latest CPU architectures.<br>
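
The speed difference can be checked with a rough timing sketch like the one below. The array size, the repeat count, and the variable names are arbitrary choices for illustration, and the measured ratio will vary by machine, so no particular speed-up is claimed here.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)   # int64 so the squares cannot overflow

# Square every element: a Python-level loop versus one vectorized operation
list_squares = [v * v for v in py_list]
arr_squares = np_arr * np_arr

# Time both approaches (20 repetitions each; an arbitrary choice)
list_time = timeit.timeit(lambda: [v * v for v in py_list], number=20)
arr_time = timeit.timeit(lambda: np_arr * np_arr, number=20)
print(f"list: {list_time:.4f} s, ndarray: {arr_time:.4f} s")

# The array data sits in one contiguous block of memory
print(np_arr.flags['C_CONTIGUOUS'])     # Prints "True"
```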

Arrays are very frequently used in data science, where speed and resources are very important.

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers.<br>
The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

## Installation of Numpy

Install NumPy using the command:<br>
><i>pip install numpy</i> <br>
    
Make sure that Python and pip are already installed.<br>
<b>If you are using a Jupyter or Colab notebook, NumPy is typically already available, so no installation is needed!</b>

## Import Numpy


Once NumPy is installed, import it in your applications with the import keyword:<br>
><i>import numpy as np</i><br>
 
Here np is the conventional alias for numpy.


In [None]:
import numpy as np

print(np.__version__)    # Check the installed NumPy version

## Creating Array Objects

NumPy is used to work with arrays. <br>
The array object in NumPy is called ndarray.<br>

We can create a NumPy ndarray object by using the array() function.

In [6]:
## 1-D ARRAY
import numpy as np

a = np.array([1, 2, 3])   # Create a rank 1 array
print(a)                  # Prints "[1 2 3]"

print(type(a))            #Prints "<class 'numpy.ndarray'>"

[1 2 3]
<class 'numpy.ndarray'>


In [7]:
## 2-D ARRAY
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])    # Create a rank 2 array  
print(b)                                # Prints "[[1 2 3]
                                        #          [4 5 6]]"

[[1 2 3]
 [4 5 6]]


In [19]:
## 3-D ARRAY
import numpy as np

c = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])  # Create a rank 3 array
print(c)                                                        # Prints " [[[1 2 3] [4 5 6]] [[1 2 3] [4 5 6]]] "

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


NumPy arrays provide the ndim attribute, an integer that tells us how many dimensions the array has.

In [11]:
print(a.ndim)   #Prints "1"
print(b.ndim)   #Prints "2"
print(c.ndim)   #Prints "3"

1
2
3

When creating an array, the ndmin argument of array() sets the minimum number of dimensions the result should have; extra dimensions of length 1 are added at the front as needed.


In [12]:
import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('number of dimensions :', arr.ndim)


[[[[[1 2 3 4]]]]]
number of dimensions : 5


### Shape of Array

The shape of an array is the number of elements in each dimension. <br>
NumPy arrays have an attribute called shape that returns a tuple in which each entry gives the number of elements along the corresponding dimension.

In [14]:
print(a.shape)   
print(b.shape)   
print(c.shape)

(3,)
(2, 3)
(2, 2, 3)


The integer at each index of the tuple gives the number of elements in the corresponding dimension.<br>
In the example above, the first entry of c.shape is 2, so the first dimension of c has 2 elements.
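
As a small sketch of how shape relates to the other attributes, the array c from the earlier cell is rebuilt here so the snippet runs on its own:

```python
import numpy as np

# Same rank 3 array as c in the earlier example
c = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(c.shape)        # Prints "(2, 2, 3)"
print(len(c.shape))   # Prints "3", the same value as c.ndim
print(c.size)         # Prints "12", the product 2 * 2 * 3
print(c.shape[-1])    # Prints "3": each innermost row holds 3 values
```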

## Numpy Array Indexing

Array indexing is used to access a specific element of an array.<br>
One can access an array element by referring to its index number.<br>
Array indexing in NumPy starts from 0.

In [21]:
print(a[0], a[1], a[2])

1 2 3


In [22]:
print(b[0, 0], b[0, 1], b[1, 0])

1 2 4


In [25]:
print(c[0,0,0], c[0,0,1], c[0,0,2])

1 2 3


Use negative indices to access elements counting from the end of an array.

In [27]:
print('Last element from 2nd dim: ', b[1, -1])

Last element from 2nd dim:  6


## Numpy Array Slicing

Slicing in Python means taking elements from one given index to another given index.<br>
We pass a slice instead of an index, like this: <i>[start:end].</i> <br>
We can also define the step, like this: <i>[start:end:step].</i><br>
If we don't pass start, it is taken as 0. <br>
If we don't pass end, it is taken as the length of the array in that dimension. <br>
If we don't pass step, it is taken as 1. <br>

In [41]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[0:5])    # Prints the first 5 elements (indices 0 to 4); index 5 is excluded

print(arr[4:])     # Prints all elements from index 4 to the end

print(arr[:4])     # Prints the first 4 elements (indices 0 to 3); index 4 is excluded

print(arr[-3:-1])  # Prints the last 3 elements except the final one

print(arr[::-1])   # Prints the array in reverse order

print(arr[0:5:2])  # Prints every other element up to index 4, i.e. the even-index elements

# Negative numbers count from the end of the array, just as in indexing

[1 2 3 4 5]
[5 6 7]
[1 2 3 4]
[5 6]
[7 6 5 4 3 2 1]
[1 3 5]


In [38]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]

b = a[:2, 1:3]

print(a)
print()
print(b)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

[[2 3]
 [6 7]]
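
One consequence of slicing worth seeing in code: a basic slice is a view onto the same data, so writing through the slice changes the original array. The sketch below reuses the array from the cell above; the name c and the values 77 and -1 are arbitrary illustrations.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
b = a[:2, 1:3]           # basic slice: a view onto a's data

b[0, 0] = 77             # b[0, 0] is the same element as a[0, 1]
print(a[0, 1])           # Prints "77"

c = a[:2, 1:3].copy()    # an explicit copy owns its own data
c[0, 0] = -1
print(a[0, 1])           # Still prints "77"

print(np.shares_memory(a, b), np.shares_memory(a, c))  # Prints "True False"
```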


You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. 

In [39]:
import numpy as np

# Create rank 2 array with shape (3, 4)
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)


Integer array indexing means indexing with arrays of integers instead of single integers or slices. One useful trick with it is selecting or mutating one element from each row of a matrix:

In [46]:
import numpy as np

# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print(a)  # Prints "[[ 1  2  3]
          #          [ 4  5  6]
          #          [ 7  8  9]
          #          [10 11 12]]"

# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)  # Prints "[[11  2  3]
          #          [ 4  5 16]
          #          [17  8  9]
          #          [10 21 12]]"

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[ 1  6  7 11]
[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]


### Boolean Indexing

In [48]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.

print(bool_idx)      # Prints "[[False False]
                     #          [ True  True]
                     #          [ True  True]]"

print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])     # Prints "[3 4 5 6]"

[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]
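
Boolean masks can also be combined and used for assignment. The following is a minimal sketch; the conditions are arbitrary, and the parentheses around each comparison are required because & binds more tightly than the comparison operators.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine two conditions element-wise with & (and); | means or, ~ means not
in_range = (a > 2) & (a < 6)
print(a[in_range])       # Prints "[3 4 5]"

# A mask on the left-hand side assigns to the selected elements only
a[a % 2 == 0] = 0        # set every even value to 0
print(a)                 # Prints "[[1 0]
                         #          [3 0]
                         #          [5 0]]"
```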


## Some Useful Functions

In [56]:
import numpy as np

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[ 0.  0.]
                      #          [ 0.  0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[ 1.  1.]]"

c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[ 0.91940167  0.08143941]
                             #               [ 0.68744134  0.87236687]]"
    
f = np.arange(0,30,2)    # Create evenly spaced values from 0 up to, but not including, 30 in steps of 2
print(f)                 # Prints the even numbers from 0 to 28

g = np.linspace(0, 4, 9)  # Returns 9 evenly spaced values from 0 to 4 
print(g)                  # Prints " [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4. ] "

h = np.repeat([1, 2, 3], 3)  # Repeat each element 3 times
print(h)                     # Prints "[1 1 1 2 2 2 3 3 3]"

[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.14272851 0.188051  ]
 [0.61052347 0.994231  ]]
[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28]
[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4. ]
[1 1 1 2 2 2 3 3 3]


## Datatypes

A NumPy array is a collection of elements of the same type. <br>
NumPy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. 

Below is a list of the data type kinds in NumPy and the characters used to represent them.<br>
<li>i - integer</li>
<li>b - boolean</li>
<li>u - unsigned integer</li>
<li>f - float</li>
<li>c - complex float</li>
<li>m - timedelta</li>
<li>M - datetime</li>
<li>O - object</li>
<li>S - byte string</li>
<li>U - unicode string</li>
<li>V - fixed chunk of memory for other type ( void )</li>

In [50]:
import numpy as np

x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints the platform's default integer type: often "int64", but "int32" in the run below

x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"

x = np.array(['apple', 'banana', 'cherry'])
print(x.dtype)                         # Prints "<U6": unicode strings of up to 6 characters

int32
float64
int64
<U6


One can change the data type of an existing array by making a copy of it with the astype() method.

In [51]:
import numpy as np

arr = np.array([1.1, 2.1, 3.1])

newarr = arr.astype('i')

print(newarr)
print(newarr.dtype)

[1 2 3]
int32
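
Note that converting floats to integers with astype() discards the fractional part instead of rounding. The sketch below illustrates this with arbitrary sample values; np.rint() is shown as one possible way to round first, not the only one.

```python
import numpy as np

arr = np.array([1.9, 2.5, -1.9])

truncated = arr.astype(np.int64)          # fractional part dropped: 1, 2, -1
rounded = np.rint(arr).astype(np.int64)   # rounded to nearest first: 2, 2, -2
                                          # (np.rint rounds halves to the even neighbour)

print(truncated.tolist())   # Prints "[1, 2, -1]"
print(rounded.tolist())     # Prints "[2, 2, -2]"
print(arr.dtype)            # Prints "float64": the original array is unchanged
```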


## Array Iteration

In [61]:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    print(x)

print()

for x in arr:
    for y in x:
        print(y)

[1 2 3]
[4 5 6]

1
2
3
4
5
6


Enumeration means listing items one by one together with their sequence numbers.<br>
Sometimes we need the index of each element while iterating; the ndenumerate() function can be used in those cases.

In [64]:
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8


## Joining Numpy arrays

Joining means putting the contents of two or more arrays into a single array.<br>
We pass a sequence of the arrays we want to join to the concatenate() function, along with the axis. If axis is not explicitly passed, it is taken as 0.

In [65]:
import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

[1 2 3 4 5 6]


In [66]:
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr)

[[1 2 5 6]
 [3 4 7 8]]


Stacking is the same as concatenation, except that stacking is done along a new axis.<br>
For example, two 1-D arrays can be joined along a new second axis, which places their elements side by side as the columns of a 2-D array, or along a new first axis, which places one array on top of the other.<br>
We pass a sequence of the arrays we want to join to the stack() function along with the axis. If axis is not explicitly passed, it is taken as 0.

In [72]:
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.stack((arr1, arr2), axis=1)

print(arr)
print()

arr = np.hstack((arr1, arr2))     # hstack() stacks horizontally: 1-D arrays are joined end to end.
print(arr)   
print()

arr = np.vstack((arr1, arr2))     # vstack() stacks vertically: each array becomes a row.
print(arr)
print()

arr = np.dstack((arr1, arr2))     # dstack() stacks along the third (depth) axis.
print(arr)

[[1 4]
 [2 5]
 [3 6]]

[1 2 3 4 5 6]

[[1 2 3]
 [4 5 6]]

[[[1 4]
  [2 5]
  [3 6]]]


## Splitting Numpy Arrays

Splitting is the reverse operation of joining.<br>
Joining merges multiple arrays into one, and splitting breaks one array into multiple.<br>
We use array_split() for splitting arrays; we pass it the array we want to split and the number of splits.

In [73]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)

print(newarr)

[array([1, 2]), array([3, 4]), array([5, 6])]
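
When the number of splits does not divide the array length evenly, array_split() adjusts the sizes of the pieces, whereas the stricter split() function raises an error. A minimal sketch of the difference (the flag name split_failed is just for illustration):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements into 4 parts: the earlier parts get the extra elements
parts = np.array_split(arr, 4)
print(parts)   # Prints "[array([1, 2]), array([3, 4]), array([5]), array([6])]"

# split() insists on equal-sized parts and fails here
split_failed = False
try:
    np.split(arr, 4)
except ValueError as err:
    split_failed = True
    print("split() failed:", err)
```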


In [74]:
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)

print(newarr)

[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]


In [76]:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3, axis=1)

print(newarr[0])
print(newarr[1])
print(newarr[2])

[[ 1]
 [ 4]
 [ 7]
 [10]
 [13]
 [16]]
[[ 2]
 [ 5]
 [ 8]
 [11]
 [14]
 [17]]
[[ 3]
 [ 6]
 [ 9]
 [12]
 [15]
 [18]]


An alternative is hsplit(), the opposite of hstack(). <br>
Similarly, vsplit() and dsplit() are the counterparts of vstack() and dstack().
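
A short sketch of hsplit() and vsplit() on a small 2-D array, assuming the splits divide the array evenly; the variable names are illustrative only:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# hsplit() cuts along the columns: three arrays of shape (2, 1)
left, middle, right = np.hsplit(arr, 3)
print(left)     # Prints "[[1]
                #          [4]]"

# vsplit() cuts along the rows: two arrays of shape (1, 3)
top, bottom = np.vsplit(arr, 2)
print(top)      # Prints "[[1 2 3]]"

# hstack() undoes hsplit()
restored = np.hstack((left, middle, right))
print(np.array_equal(restored, arr))   # Prints "True"
```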

## Searching in Arrays

To search an array for the positions where a condition holds, use the where() function. It returns a tuple of index arrays, one per dimension, giving the positions where the condition is true.

In [77]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)

print(x)

(array([1, 3, 5, 7], dtype=int64),)


The searchsorted() function performs a binary search in a sorted array and returns the index where the specified value would be inserted to maintain the sort order.

In [79]:
import numpy as np

arr = np.array([6, 7, 8, 9])
x = np.searchsorted(arr, 7)
y = np.searchsorted(arr, 7, side='right')  # side='right' returns the index just after any existing entries equal to 7

print(x)
print(y)

1
2
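
searchsorted() also accepts several values at once. The sketch below uses arbitrary sample data and np.insert() to confirm that inserting at the returned index keeps the array sorted:

```python
import numpy as np

arr = np.array([1, 3, 5, 7])          # must already be sorted

# One insertion index per searched value
idx = np.searchsorted(arr, [2, 4, 6])
print(idx)                            # Prints "[1 2 3]"

# Values outside the range map to the two ends of the array
print(np.searchsorted(arr, 0))        # Prints "0"
print(np.searchsorted(arr, 8))        # Prints "4"

# Inserting at the returned position preserves the order
pos = np.searchsorted(arr, 4)
merged = np.insert(arr, pos, 4)
print(merged)                         # Prints "[1 3 4 5 7]"
```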


## Sorting Arrays

Sorting means putting elements in an ordered sequence.<br>
An ordered sequence is any sequence whose elements follow an order, such as numeric or alphabetical, ascending or descending.<br>
NumPy provides a function called sort() that returns a sorted copy of a specified array.

In [81]:
import numpy as np

arr = np.array([3, 2, 0, 1])
print(np.sort(arr))

[0 1 2 3]


In [82]:
#Sorting array alphabetically
import numpy as np

arr = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr))

['apple' 'banana' 'cherry']


Note: np.sort() returns a sorted copy of the array and leaves the original array unchanged!
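
The difference between the copying function and the in-place method can be sketched as follows. The reversed slice at the end is just one common way to get descending order.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)    # returns a new, sorted array
print(arr)                    # Prints "[3 2 0 1]": the original is untouched
print(sorted_copy)            # Prints "[0 1 2 3]"

arr.sort()                    # the ndarray method sorts in place and returns None
print(arr)                    # Prints "[0 1 2 3]"

descending = np.sort(arr)[::-1]
print(descending)             # Prints "[3 2 1 0]"
```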

In [84]:
#Sorting Boolean array
import numpy as np

arr = np.array([True, False, True])
print(np.sort(arr))

[False  True  True]


In [85]:
# Sorting 2-D array
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(arr))

[[2 3 4]
 [0 1 5]]


## Filtering Arrays

Getting some elements out of an existing array and creating a new array out of them is called filtering.<br>
In NumPy, you filter an array by indexing it with a list or array of boolean values.<br>

If the value at an index is True, that element is included in the filtered array; if the value at that index is False, that element is excluded.

In [86]:
import numpy as np

arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]

print(newarr)

[41 43]


In [87]:
import numpy as np

arr = np.array([41, 42, 43, 44])
filter_arr = arr > 42       # Boolean mask: True where the value is higher than 42
newarr = arr[filter_arr]

print(filter_arr)           # Prints the boolean mask
print(newarr)               # Prints only the values higher than 42

[False False  True  True]
[43 44]


## Numpy Random

NumPy offers the random module to work with random numbers.

In [88]:
from numpy import random

x = random.randint(100)     # Generate a random integer from 0 up to, but not including, 100
print(x)

90


In [89]:
from numpy import random

x = random.rand()        # Generate a random float in the range [0, 1)
print(x)

0.5129420711548857


In [90]:
from numpy import random

x = random.randint(100, size=5)   # 1-D array of 5 random integers from 0 to 99
print(x)

[86 63 77 54 71]


In [91]:
from numpy import random

x = random.rand(3, 5)
print(x)

[[0.58800422 0.25619647 0.76379938 0.31513524 0.25562615]
 [0.21117809 0.30891436 0.92197749 0.40746275 0.54493784]
 [0.20955588 0.50124695 0.36384013 0.20222456 0.49685332]]


In [92]:
from numpy import random

x = random.choice([3, 5, 7, 9])
print(x)

9


In [93]:
from numpy import random
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
random.shuffle(arr)               # shuffle() modifies the original array in place

print(arr)

[2 1 5 4 3]


In [95]:
from numpy import random
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(random.permutation(arr))    # permutation() returns a shuffled copy and leaves the original array unchanged

[2 4 1 3 5]


The random module of NumPy also contains various other functions related to data analysis. <br>
For example, it can draw samples from the normal, binomial, and Poisson distributions. <br>
These topics will be covered in Statistics and Probability. 
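
As a brief preview, the sketch below draws samples from the three distributions named above. The parameter values, the sample size, and the seed are arbitrary choices for illustration; the seed is only there to make the run repeatable.

```python
from numpy import random

random.seed(0)   # fix the seed so repeated runs give the same samples

normal_samples = random.normal(loc=0.0, scale=1.0, size=1000)   # mean 0, std 1
binomial_samples = random.binomial(n=10, p=0.5, size=1000)      # 10 trials, success chance 0.5
poisson_samples = random.poisson(lam=3.0, size=1000)            # average rate of 3 events

print(normal_samples.mean(), normal_samples.std())
print(binomial_samples.min(), binomial_samples.max())   # always within 0..10
print(poisson_samples.mean())
```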

References: 
https://www.w3schools.com/python/numpy_intro.asp
https://cs231n.github.io/python-numpy-tutorial/#datatypes