## NumPy

#### Introduction to NumPy

NumPy is the fundamental package for numerical work with data in Python.

If you are going to work on data analysis or machine learning projects, a solid understanding of NumPy is nearly mandatory.

This is because other data analysis packages (like pandas) are built on top of NumPy, and scikit-learn, which is used to build machine learning applications, works heavily with NumPy as well.

So what does numpy provide?

At its core, NumPy provides the ndarray object, short for n-dimensional array.

In an ‘ndarray’ object, aka ‘array’, you can store multiple items of the same data type. It is the facilities around the array object that make NumPy so convenient for performing math and data manipulations.

You might wonder, ‘I can store numbers and other objects in a Python list itself and do all sorts of computations and manipulations through list comprehensions, for-loops etc. What do I need a NumPy array for?’

Well, NumPy arrays have significant advantages over lists.

To understand this, let’s first see how to create a numpy array.

### How to create a NumPy array

In [70]:
import numpy as np

In [71]:
list1 = [0,1,2,3,4,5,6,7]
arry = np.array(list1)

In [72]:
arry

array([0, 1, 2, 3, 4, 5, 6, 7])

#### The key difference between an array and a list: an array is designed for vectorized operations, a Python list is not

In [73]:
## suppose we want to add 2 to every item of the list
#list1+2   ### raises TypeError: a list cannot be added to an int
## but it works on an ndarray
arry+2

array([2, 3, 4, 5, 6, 7, 8, 9])
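To make the contrast concrete, here is a small sketch of what happens when the commented-out `list1+2` is actually run, next to the list-comprehension workaround. The names `list_error`, `list_plus_two` and `array_plus_two` are only illustrative.

```python
import numpy as np

list1 = [0, 1, 2, 3, 4, 5, 6, 7]
arry = np.array(list1)

# A plain list does not support arithmetic with a scalar.
try:
    list1 + 2
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__

# With a list you need an explicit loop or comprehension ...
list_plus_two = [item + 2 for item in list1]

# ... while the array applies the operation to every element at once.
array_plus_two = arry + 2

print(list_error)
print(list_plus_two)
print(array_plus_two)
```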

In [74]:
## once a numpy array is created you cannot extend its size in place, but that is possible with a list

In [75]:
## create a 2d array from list of list
list1 = [[1,2,3] ,[4,5,6] ,[7,8,9]]
arr2d = np.array(list1)

In [76]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

##### You may also specify the datatype by setting the dtype argument. Some of the most commonly used numpy dtypes are: 'float', 'int', 'bool', 'str' and 'object'.

###### To control memory allocation, you may choose one of ‘float32’, ‘float64’, ‘int8’, ‘int16’ or ‘int32’.

In [77]:
## convert the 2d array to float, int, str and bool with astype
print(arr2d.astype("float"))
print(arr2d.astype("int"))
print(arr2d.astype("str"))
print(arr2d.astype("bool"))


[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[['1' '2' '3']
 ['4' '5' '6']
 ['7' '8' '9']]
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]
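The point about controlling memory with the sized dtypes can be checked with the `itemsize` and `nbytes` attributes. This is only a sketch; the names `values`, `small` and `large` are made up for the illustration.

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

small = np.array(values, dtype="int8")
large = np.array(values, dtype="int64")

# itemsize is the number of bytes per item, nbytes the size of the whole data buffer.
print(small.itemsize, small.nbytes)   # 1 byte per item, 9 items
print(large.itemsize, large.nbytes)   # 8 bytes per item, 9 items

# Caveat: a narrow dtype can only hold a limited range of values.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)
```

So the trade-off is between memory and the range (or precision) of the values you can store.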


In [78]:
## create an object array which can hold int and string together
np.array([1,"durgeshwar"] , dtype = "object")

array([1, 'durgeshwar'], dtype=object)

In [79]:
## finally, you can convert an array back to a python list with tolist()
## (without dtype = "object", the int 1 is stored as the string '1')
print(np.array([1,"durgeshwar"]).tolist())
arr2d.tolist()


['1', 'durgeshwar']


[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

#### So, the differences between a list and an array
##### Arrays support vectorised operations, while lists don’t.
###### Once an array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
###### Every array has one and only one dtype. All items in it should be of that dtype.
###### An equivalent numpy array occupies much less space than a python list of lists.
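Two of these points are easy to verify. The sketch below uses `np.append` to show that "growing" an array really produces a new one, and a mixed-type input to show the single-dtype rule. The variable names are only illustrative.

```python
import numpy as np

arr_small = np.array([1, 2, 3])

# np.append does not grow the array in place; it returns a new array.
arr_bigger = np.append(arr_small, [4, 5])
print(arr_small.shape, arr_bigger.shape)   # the original keeps its 3 items

# Mixed inputs are coerced to one common dtype (here a unicode string dtype).
mixed = np.array([1, "a"])
print(mixed.dtype.kind)                    # 'U' stands for unicode string
print(mixed.tolist())
```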

## How to inspect the size and shape of a numpy array?

In [80]:
## create a 2d array with 3 columns and 4 rows
arr = np.array([[1,2,3] ,[4,5,6] ,[6,7,8] , [9 ,10,11]] , dtype = 'float')

In [81]:
list2 = [[1,2,3] ,[4,5,6] ,[6,7,8] , [9 ,10,11]]
arr

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [82]:
print(arr.shape)
print(arr.size)
print(arr.dtype)
print(arr.ndim)

(4, 3)
12
float64
2


## how to extract specific items in an array

In [83]:
arr

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [84]:
## extract the first two rows and the first two columns
arr[:2 ,:2]

array([[1., 2.],
       [4., 5.]])

In [85]:
#list2[:2 , :2]  ### raises TypeError: a list of lists does not support this kind of indexing

In [86]:
## numpy arrays support boolean indexing as well
b = arr >4

In [87]:
b

array([[False, False, False],
       [False,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [88]:
arr[b]

array([ 5.,  6.,  6.,  7.,  8.,  9., 10., 11.])
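Boolean masks can also be combined. A short sketch, assuming the same `arr` values as above; `between`, `rows` and `cols` are illustrative names.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8], [9, 10, 11]], dtype="float")

# Combine conditions with & (and) and | (or); each condition needs its own parentheses.
between = arr[(arr > 4) & (arr < 9)]
print(between)

# np.where returns the row and column indices where the mask is True.
rows, cols = np.where(arr > 9)
print(rows, cols)
```

Note that the Python keywords `and` / `or` do not work element wise on arrays, which is why `&` and `|` are used here.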

## How to reverse the rows and the whole array
Reversing an array works the same way as with lists, but you need to reverse along every axis (dimension) if you want a complete reversal.



In [89]:
arr[::-1 ,::-1]

array([[11., 10.,  9.],
       [ 8.,  7.,  6.],
       [ 6.,  5.,  4.],
       [ 3.,  2.,  1.]])
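For comparison, here is a sketch that reverses only one axis at a time, using the original values of `arr`. `np.flip` without an axis argument is shown as an alternative for the complete reversal.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8], [9, 10, 11]], dtype="float")

# Reverse only the order of the rows (axis 0).
rows_reversed = arr[::-1]
print(rows_reversed)

# Reverse only the order of the columns (axis 1).
cols_reversed = arr[:, ::-1]
print(cols_reversed)

# np.flip with no axis flips every axis, same as arr[::-1, ::-1].
fully_reversed = np.flip(arr)
print(fully_reversed)
```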

## How to represent missing values and infinity
Missing values can be represented using the np.nan object, while np.inf represents infinity. Let’s place some in arr.



In [90]:
## insert a nan and inf
arr[1,1] = np.nan
arr[1,2] = np.inf
arr

array([[ 1.,  2.,  3.],
       [ 4., nan, inf],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [91]:
## replace nan and inf with -1
missing_values = np.isnan(arr) | np.isinf(arr)
arr[missing_values] = -1

In [92]:
arr

array([[ 1.,  2.,  3.],
       [ 4., -1., -1.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [93]:
missing_values

array([[False, False, False],
       [False,  True,  True],
       [False, False, False],
       [False, False, False]])
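An alternative way to build the same mask is `np.isfinite`, which is False for nan, inf and -inf alike. The sketch below works on a copy so the original values are kept; `bad` and `cleaned` are illustrative names.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, np.nan, np.inf]], dtype="float")

# ~np.isfinite marks nan, inf and -inf in one step.
bad = ~np.isfinite(arr)
print(bad)

# Work on a copy so the original array stays untouched.
cleaned = arr.copy()
cleaned[bad] = -1
print(cleaned)

# nan-aware reductions such as np.nanmax skip nan values (but not inf).
largest = np.nanmax(np.array([1.0, np.nan, 3.0]))
print(largest)
```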

## How to compute the mean, max and min of an array

In [94]:
print("mean" , arr.mean())
print("max" , arr.max())
print("min" , arr.min())

mean 4.916666666666667
max 11.0
min -1.0


### To compute the minimum row wise or column wise, pass the axis argument, for example with np.amin (arr.min also accepts axis).

In [95]:
## Row wise and Column wise minimum
print("column wise minimum :" ,np.amin(arr , axis = 0))
print("Rows wise minimum :" , np.amin(arr , axis = 1))

column wise minimum : [ 1. -1. -1.]
Rows wise minimum : [ 1. -1.  6.  9.]


In [96]:
## Row wise and Column wise max
print("column wise max :" ,np.amax(arr , axis = 0))
print("Rows wise max :" , np.amax(arr , axis = 1))

column wise max : [ 9. 10. 11.]
Rows wise max : [ 3.  4.  8. 11.]


## How to create a new array from an existing array
If you assign a slice of an array to a new variable, the new array is a view that refers to the parent array's memory.

That means any change you make to the new array is reflected in the parent array as well.

To avoid disturbing the parent array, make a copy of it using copy(). All numpy arrays come with the copy() method.

In [64]:
arr2 = arr[:-2 , :-2]
arr2[:1 , :2]  =100
arr

array([[100.,   2.,   3.],
       [  4.,  -1.,  -1.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.]])

In [99]:
## use copy() so that later changes to arr2 do not touch arr
arr2 = arr.copy()
#arr2[:1,:1] = 100
arr

array([[ 1.,  2.,  3.],
       [ 4., -1., -1.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [100]:
arr2

array([[ 1.,  2.,  3.],
       [ 4., -1., -1.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])
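If you are unsure whether two arrays refer to the same memory, `np.shares_memory` can tell you. A minimal sketch with a fresh array; the names `parent`, `view` and `independent` are only illustrative.

```python
import numpy as np

parent = np.arange(12, dtype="float").reshape(4, 3)

view = parent[:2, :2]                # slicing gives a view
independent = parent[:2, :2].copy()  # copy() gives separate memory

print(np.shares_memory(parent, view))         # True
print(np.shares_memory(parent, independent))  # False

# Writing to the copy leaves the parent alone ...
independent[0, 0] = 100
print(parent[0, 0])

# ... while writing to the view changes the parent.
view[0, 0] = -5
print(parent[0, 0])
```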

## Reshaping and Flattening Multidimensional arrays
There are two popular ways to flatten an array: the flatten() method and the ravel() method.

##### The difference between ravel and flatten is that the array returned by ravel is a view of the parent array whenever possible. So any change to it will affect the parent as well. In return, ravel is memory efficient since it does not create a copy.

In [101]:
arr.flatten()

array([ 1.,  2.,  3.,  4., -1., -1.,  6.,  7.,  8.,  9., 10., 11.])

In [102]:
b1 = arr.flatten()
b1[0] = 100 ## changing b1 does not affect the parent array

In [105]:
arr1 = arr
arr

array([[ 1.,  2.,  3.],
       [ 4., -1., -1.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [106]:
## using ravel(): changing b2 also changes the parent array
b2  = arr1.ravel()
b2[0] = 100
arr1

array([[100.,   2.,   3.],
       [  4.,  -1.,  -1.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.]])
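The cells above cover flattening only. For the reshaping half of this section, here is a small sketch using `reshape`, with a view-versus-copy check for `ravel` and `flatten`. All variable names are illustrative.

```python
import numpy as np

flat = np.arange(12)

# reshape rearranges the same 12 items into 4 rows and 3 columns.
grid = flat.reshape(4, 3)
print(grid.shape)

# -1 lets numpy work out that dimension from the total size.
auto = flat.reshape(-1, 6)
print(auto.shape)

# ravel returns a view here, flatten always returns a copy.
raveled = grid.ravel()
flattened = grid.flatten()
print(np.shares_memory(grid, raveled))
print(np.shares_memory(grid, flattened))
```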

## How to create sequences, repetitions and random numbers using numpy?
###### The np.arange function comes in handy for creating customised number sequences as an ndarray.

In [107]:
print(np.arange(10))

[0 1 2 3 4 5 6 7 8 9]


In [108]:
## 0 to 10 (exclusive) with step 2
print(np.arange(0,10,2))

[0 2 4 6 8]


In [110]:
## 10 to 1 in decreasing order
print(np.arange(10 ,0 , -1))

[10  9  8  7  6  5  4  3  2  1]


You can set the start and end positions using np.arange. But if you care about the number of items in the array, you have to calculate the appropriate step value manually.

Say you want to create an array of exactly 10 numbers between 1 and 50. Can you compute what the step value would be?

Well, I am going to use np.linspace instead.

In [116]:
## 20 evenly spaced values from 5 to 70, cast to int
np.linspace(start = 5 , stop = 70 , num = 20 , dtype = 'int')

array([ 5,  8, 11, 15, 18, 22, 25, 28, 32, 35, 39, 42, 46, 49, 52, 56, 59,
       63, 66, 70])
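For the question posed earlier (exactly 10 numbers between 1 and 50), a sketch could look like this. The `retstep=True` option makes `np.linspace` also return the step it used; `seq` and `step` are illustrative names.

```python
import numpy as np

# 10 evenly spaced values from 1 to 50, both ends included.
seq, step = np.linspace(start=1, stop=50, num=10, retstep=True)
print(seq)
print(step)

# The step is (stop - start) / (num - 1), because 10 points have 9 gaps.
manual_step = (50 - 1) / (10 - 1)
print(manual_step)
```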

Similar to np.linspace, there is also np.logspace, which rises on a logarithmic scale. In np.logspace, the sequence actually starts at base^start and ends at base^stop, with a default base of 10.

In [118]:
np.logspace(start =1 ,stop = 40 , num = 10 ,base  = 10)

array([1.00000000e+01, 2.15443469e+05, 4.64158883e+09, 1.00000000e+14,
       2.15443469e+18, 4.64158883e+22, 1.00000000e+27, 2.15443469e+31,
       4.64158883e+35, 1.00000000e+40])
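One way to see the base^start to base^stop behaviour is to compare `np.logspace` with raising the base to an `np.linspace` sequence. A small sketch with smaller exponents than above; the names are illustrative.

```python
import numpy as np

# Exponents 1, 2, 3, 4 with the default base 10.
powers_of_ten = np.logspace(start=1, stop=4, num=4)
print(powers_of_ten)

# The same values built by hand from linspace.
by_hand = 10 ** np.linspace(1, 4, num=4)
print(by_hand)

# A different base: 2^0 ... 2^3.
powers_of_two = np.logspace(start=0, stop=3, num=4, base=2)
print(powers_of_two)
```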

In [121]:
## np.zeros and np.ones let you create an array of the desired shape filled with 0s or 1s
print(np.zeros([3,3]))
print(np.ones([3,4]))

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [122]:
import random

In [132]:
random.randint(10,20)

13

In [136]:
random.randrange(11 ,30,2)

25
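The two cells above use Python's built-in `random` module, which returns one number at a time. To get whole arrays of random numbers from numpy itself, a sketch could use the generator API, assuming a numpy version that has `np.random.default_rng` (1.17 or later). The names `rng`, `ints`, `floats` and `normals` are illustrative.

```python
import numpy as np

# A seeded generator gives reproducible results.
rng = np.random.default_rng(seed=5)

# 5 random integers from 10 (inclusive) to 20 (exclusive).
ints = rng.integers(low=10, high=20, size=5)
print(ints)

# A 2x3 array of floats in the interval [0, 1).
floats = rng.random(size=(2, 3))
print(floats)

# 4 samples from a normal distribution with mean 0 and standard deviation 1.
normals = rng.normal(loc=0.0, scale=1.0, size=4)
print(normals)
```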

## How to create repeating sequences
##### np.tile repeats a whole list or array n times, whereas np.repeat repeats each item n times.

In [137]:
a = [1,2,3]
print("Tile" , np.tile(a,3))

Tile [1 2 3 1 2 3 1 2 3]


In [138]:
print("repeat" , np.repeat(a,3))

repeat [1 1 1 2 2 2 3 3 3]


## How to get the unique items and their counts

In [151]:
np.random.seed(5)
arr = np.random.randint(0,10 ,size = 7)
arr

array([3, 6, 6, 0, 9, 8, 4])

In [152]:
## get the unique items and their counts

In [157]:
uniq , counts = np.unique(arr,return_counts = True)

In [158]:
uniq

array([0, 3, 4, 6, 8, 9])

In [159]:
counts

array([1, 1, 1, 2, 1, 1], dtype=int64)

## Array Math

In [162]:
x = np.array([[1,2] , [3,4]] , dtype = 'float64')
y = np.array([[5,6] ,  [7,8]]  , dtype = 'float64')

In [170]:
## element wise addition
x+y

array([[ 6.,  8.],
       [10., 12.]])

In [171]:
np.add(x,y)

array([[ 6.,  8.],
       [10., 12.]])

In [172]:
## element wise difference
x-y

array([[-4., -4.],
       [-4., -4.]])

In [174]:
np.subtract(x,y)

array([[-4., -4.],
       [-4., -4.]])

In [175]:
## element wise multiplication
x*y

array([[ 5., 12.],
       [21., 32.]])

In [176]:
np.multiply(x , y)

array([[ 5., 12.],
       [21., 32.]])

In [177]:
## element wise division
x/y

array([[0.2       , 0.33333333],
       [0.42857143, 0.5       ]])

In [178]:
np.divide(x, y)

array([[0.2       , 0.33333333],
       [0.42857143, 0.5       ]])

In [179]:
## element wise square root
np.sqrt(x)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])

In [180]:
np.sqrt(y)

array([[2.23606798, 2.44948974],
       [2.64575131, 2.82842712]])

In [182]:
## matrix multiplication (dot product), not element wise

In [183]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6] , [7,8]])

In [184]:
v = np.array([9,10])
w = np.array([11, 12])

In [185]:
np.dot(v,w)

219

In [186]:
v.dot(w)

219

In [187]:
np.dot(x,v)

array([29, 67])

In [188]:
x.dot(v)

array([29, 67])

In [189]:
x.dot(y)

array([[19, 22],
       [43, 50]])
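The same products can also be written with the `@` operator, which many people find easier to read than `dot`. A short sketch reusing the values of `x`, `y`, `v` and `w` from above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

# @ is the matrix multiplication operator.
inner = v @ w          # vector . vector gives a scalar
mat_vec = x @ v        # matrix . vector gives a vector
mat_mat = x @ y        # matrix . matrix gives a matrix

print(inner)
print(mat_vec)
print(mat_mat)

# In contrast, * multiplies element by element.
print(x * y)
```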

In [190]:
## Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:

In [191]:
np.sum(x)

10

In [192]:
np.sum(x ,axis = 0)

array([4, 6])

In [193]:
np.sum(x, axis = 1)

array([3, 7])

In [194]:
## transpose of a matrix

In [195]:
x

array([[1, 2],
       [3, 4]])

In [196]:
x.T

array([[1, 3],
       [2, 4]])

In [199]:
t = np.array([1,2,3])

In [200]:
t.T

array([1, 2, 3])
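As the last cell shows, transposing a 1-D array does nothing, because it has only one axis. If you need an explicit column or row vector, one option is to add an axis first. A sketch, with illustrative names:

```python
import numpy as np

t = np.array([1, 2, 3])
print(t.T.shape)          # still (3,)

# Turn the 1-D array into a column vector of shape (3, 1).
column = t.reshape(-1, 1)
print(column.shape)

# np.newaxis does the same job through indexing.
column_too = t[:, np.newaxis]
row = t[np.newaxis, :]
print(column_too.shape, row.shape)

# Now the transpose has a visible effect.
print(column.T.shape)
```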

## Broadcasting
Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

In [201]:
x = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])

In [202]:
v = np.array([1,0,1])

In [203]:
x+v

array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

In [204]:
y = np.empty_like(x)

In [209]:
for i in range(4):
    y[i, :] = x[i ,:] +v

In [210]:
y

array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

In [211]:
## the same result by stacking v 4 times with np.tile and then adding element wise

In [218]:
v = np.tile(v , (4,1))

In [219]:
x+v

array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])
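Broadcasting compares the shapes from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows one shape that works in each direction and one that fails. The names `col`, `bad` and `error_name` are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])   # shape (4, 3)

# Shape (3,) matches the last dimension of x, so it is added to every row.
v = np.array([1, 0, 1])
print((x + v).shape)

# Shape (4, 1) is stretched across the 3 columns, so it is added to every column.
col = np.array([[10], [20], [30], [40]])
print(x + col)

# Shape (2,) matches neither 3 nor 1, so broadcasting fails.
bad = np.array([1, 0])
try:
    x + bad
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__
print(error_name)
```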

## Reshaping an array

In [230]:
v = np.array([1,2,3])
w = np.array([4,5])

In [231]:
np.reshape(v , (3,1))*w

array([[ 4,  5],
       [ 8, 10],
       [12, 15]])

In [235]:
w.shape

(2,)

In [233]:
x = np.array([[1,2,3], [4,5,6]])
print(x + np.reshape(w, (2, 1)))


[[ 5  6  7]
 [ 9 10 11]]


## argmax and argmin: return the index of the max and min values

In [238]:
a = np.array([[1,2,4,7], [9,88,6,45], [9,76,3,4]])
a
#array([[ 1,  2,  4,  7],
       #[ 9, 88,  6, 45],
       #[ 9, 76,  3,  4]])
a.shape
#(3, 4)
a.size
#12
np.argmax(a)
#5
np.argmax(a,axis=0)
#array([1, 1, 1, 1])
np.argmax(a,axis=1)
#array([3, 1, 1])
np.argmin(a)
#0
np.argmin(a,axis=0)
#array([0, 0, 2, 2])
np.argmin(a,axis=1)
#array([0, 2, 2])

array([0, 2, 2], dtype=int64)
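Without an axis, `argmax` and `argmin` return a position in the flattened array (5 and 0 above). To turn that into a (row, column) pair, one option is `np.unravel_index`. A sketch with the same array `a`; the other names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 4, 7], [9, 88, 6, 45], [9, 76, 3, 4]])

# Position of the maximum in the flattened array.
flat_index = np.argmax(a)
print(flat_index)

# Convert the flat position into row and column indices.
row, col = np.unravel_index(flat_index, a.shape)
print(row, col)

# Indexing with the pair gives back the maximum value.
print(a[row, col])
```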