## NumPy - The fundamental package for scientific computing with Python

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

At the core of the NumPy package is the `ndarray` object. It encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance.


In [1]:
#importing the packages
import numpy as np
import math

Suppose we want to predict the marks a student obtains in a yearly exam from student data containing the number of hours he/she studied and the number of classes attended. A simple approach is to model the marks as a linear function of these two quantities.

`marks_gained = W1 * no_of_hours + W2 * no_of_classes_attended`

Here we express `marks_gained` as a linear relation, that is, a weighted sum of `no_of_hours` and `no_of_classes_attended`. The relationship need not always be linear; it can be non-linear due to other factors.

Let the weights W1 and W2 be based on statistical analysis of historical data.

In [27]:
W1, W2 = 0.3, 0.2

The given student data:
![table.PNG](attachment:table.PNG)

In [29]:
priya_no_of_hours_studied = 91
priya_no_of_classes_attended = 88

Substituting these variables into the linear equation gives the predicted marks.

In [30]:
priya_marks_gained = W1 * priya_no_of_hours_studied + W2 * priya_no_of_classes_attended
priya_marks_gained

44.900000000000006

A simpler method is to represent each student's data as a vector holding `no_of_hours` and `no_of_classes_attended`, with a separate vector for the weights, and then define a function to calculate the marks.

In [31]:
priya = [91,88]
anil = [87,134]
ravali =[69,96]
vinay = [73,68]

In [32]:
weights = [0.3,0.2]

In [35]:
a = list(zip(priya,weights))
a

[(91, 0.3), (88, 0.2)]

In [38]:
def marks_gained(student, weights):
    # Weighted sum: multiply each value by its weight and add up the products
    marks = 0
    for x, w in zip(student, weights):
        marks += x * w
    return marks

In [39]:
marks_gained(priya,weights)

44.900000000000006

In [40]:
marks_gained(anil,weights)

52.9

## NumPy makes your work easier

The calculation performed by `marks_gained` (element-wise multiplication of two vectors, followed by a sum of the results) is also called the dot product.

The NumPy library provides a built-in function to compute the dot product of two vectors. However, we must first convert the lists into NumPy arrays.
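Written out, the dot product of two vectors of length $n$ is the sum of their pairwise products. As a worked check, using Anil's data and the weights defined earlier (the symbols $\mathbf{x}$ and $\mathbf{w}$ are notation introduced here for illustration):

$$
\mathbf{x} \cdot \mathbf{w} = \sum_{i=1}^{n} x_i w_i, \qquad 87 \times 0.3 + 134 \times 0.2 = 26.1 + 26.8 = 52.9
$$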

In [41]:
anil = np.array([87,134])
weights = np.array([0.3,0.2])

In [42]:
type(anil)

numpy.ndarray

In [43]:
type(weights)

numpy.ndarray

## NumPy operations

Arithmetic operations between two arrays of the same shape are applied element-wise.

In [44]:
np.dot(anil,weights)

52.9

In [45]:
(anil*weights).sum()

52.9

In [46]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

In [48]:
print(x)
print(y)

[[1 2]
 [3 4]]
[[5 6]
 [7 8]]


In [47]:
x+y

array([[ 6,  8],
       [10, 12]])

In [52]:
np.add(x,y)

array([[ 6,  8],
       [10, 12]])

In [49]:
x-y

array([[-4, -4],
       [-4, -4]])

In [50]:
np.subtract(x,y)

array([[-4, -4],
       [-4, -4]])

In [53]:
x*y

array([[ 5, 12],
       [21, 32]])

In [55]:
np.multiply(x,y)

array([[ 5, 12],
       [21, 32]])

In [56]:
x/y

array([[0.2       , 0.33333333],
       [0.42857143, 0.5       ]])

In [57]:
np.divide(x,y)

array([[0.2       , 0.33333333],
       [0.42857143, 0.5       ]])

In [58]:
np.sqrt(x)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])

In [59]:
math.sqrt(x)

TypeError: only size-1 arrays can be converted to Python scalars

Functions from the `math` module work only on scalars.
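The difference can be sketched as follows: with `math.sqrt` every element has to be visited explicitly, while `np.sqrt` handles the whole array in one call. The variable names `looped` and `vectorized` are made up for this illustration.

```python
import math
import numpy as np

x = np.array([[1, 2], [3, 4]])

# math.sqrt accepts one number at a time, so a pure-Python version needs explicit loops.
looped = [[math.sqrt(value) for value in row] for row in x.tolist()]

# np.sqrt applies the same operation to every element in one call.
vectorized = np.sqrt(x)

# Both approaches agree up to floating-point rounding.
print(np.allclose(looped, vectorized))
```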

In [60]:
print(x)

[[1 2]
 [3 4]]


In [61]:
x = x.reshape((1,4))

In [62]:
x

array([[1, 2, 3, 4]])

In [63]:
w = np.array([2,3,4,1])

In [64]:
np.dot(w,x.T)

array([24])

In [65]:
v = np.array([[1,2],[3,4]])
v

array([[1, 2],
       [3, 4]])

In [69]:
np.sum(v)

10

In [68]:
np.sum(v, axis=1)  # sum along each row

array([3, 7])

In [70]:
np.sum(v, axis=0)  # sum down each column

array([4, 6])
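The `axis` argument names the dimension that gets collapsed by the sum. A minimal sketch of this, where the variable names and the use of `keepdims` are additions for illustration only:

```python
import numpy as np

v = np.array([[1, 2], [3, 4]])

# axis=0 collapses the rows, leaving one total per column.
column_totals = np.sum(v, axis=0)

# axis=1 collapses the columns, leaving one total per row.
row_totals = np.sum(v, axis=1)

# keepdims=True keeps the collapsed axis with length 1, so the result stays 2D.
row_totals_2d = np.sum(v, axis=1, keepdims=True)

print(column_totals, row_totals, row_totals_2d.shape)
```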

In [72]:
v.T  #To get the transpose

array([[1, 3],
       [2, 4]])

## NumPy broadcasting

In [73]:
a2 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])

In [74]:
a3 = np.array([[11, 12, 13, 14], 
                 [15, 16, 17, 18], 
                 [19, 11, 12, 13]])

In [76]:
a2+2 #adding a scalar

array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10],
       [11,  3,  4,  5]])

In [78]:
a2-3

array([[-2, -1,  0,  1],
       [ 2,  3,  4,  5],
       [ 6, -2, -1,  0]])

In [80]:
a2-a3 #element-wise subtraction

array([[-10, -10, -10, -10],
       [-10, -10, -10, -10],
       [-10, -10, -10, -10]])

In [79]:
a2/2

array([[0.5, 1. , 1.5, 2. ],
       [2.5, 3. , 3.5, 4. ],
       [4.5, 0.5, 1. , 1.5]])

In [82]:
ar = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])
ar

array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 1, 2, 3]])

In [83]:
ar.shape

(3, 4)

In [84]:
ar2 = np.array([4,5,6,7])

In [85]:
ar2

array([4, 5, 6, 7])

In [86]:
ar2.shape

(4,)

In [87]:
ar + ar2

array([[ 5,  7,  9, 11],
       [ 9, 11, 13, 15],
       [13,  6,  8, 10]])

In [88]:
ar3 = np.array([7,8])

In [89]:
ar2+ar3

ValueError: operands could not be broadcast together with shapes (4,) (2,) 
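The error follows from the broadcasting rule: shapes are compared from the last dimension backwards, and each pair of dimensions must either be equal or contain a 1. A minimal sketch of why `(4,)` and `(2,)` fail, and one way the two arrays can be combined, assuming the intent is to pair every element of `ar3` with every element of `ar2` (the names `column` and `result` are hypothetical):

```python
import numpy as np

ar2 = np.array([4, 5, 6, 7])   # shape (4,)
ar3 = np.array([7, 8])         # shape (2,)

# (4,) and (2,) clash: the trailing dimensions are 4 and 2, and neither is 1.
try:
    ar2 + ar3
    failed = False
except ValueError:
    failed = True

# Reshaping ar3 to a column of shape (2, 1) makes the shapes compatible:
# (2, 1) and (4,) broadcast to (2, 4).
column = ar3.reshape(2, 1)
result = column + ar2

print(failed)
print(result.shape)
print(result)
```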

In [18]:
#Creating a 1D array
a = np.array([1,3,4,6,7,8])
a

array([1, 3, 4, 6, 7, 8])

In [19]:
a.shape

(6,)

Let's create a simple 2D array, which, as you can see, is built from a list of lists. You may wonder why we don't simply keep it as a list of lists. The reason is that a 2D array comes with supporting functions and attributes that are much needed for data handling and that plain lists do not provide.

In [24]:
#Creating a 2D array
b=np.array([[1,2,3],[4,5,6]])
b

array([[1, 2, 3],
       [4, 5, 6]])

We can find the dimensions of an array with the `shape` attribute.

In [25]:
b.shape

(2, 3)

In [26]:
type(b)

numpy.ndarray
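To illustrate what a plain list of lists is missing, here is a small comparison. It is only a sketch; the names `nested`, `list_sum`, `array_sum`, and `list_shape` are made up for this example.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
b = np.array(nested)

# "+" on lists concatenates, so the result has four rows instead of summed values.
list_sum = nested + nested

# "+" on arrays adds element by element and keeps the (2, 3) shape.
array_sum = b + b

# Arrays know their own shape; for a nested list it has to be worked out by hand.
list_shape = (len(nested), len(nested[0]))

print(list_sum)
print(array_sum)
print(list_shape, b.shape)
```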

In [21]:
#3D array
c = np.array([
    [[12,44,66],
    [34,56,77]],
    [[86,89,88],
     [90,23,11]]])

In [22]:
c.shape

(2, 2, 3)

In [23]:
type(c)

numpy.ndarray

In [4]:
#accessing the elements from array
print(b)
b[0,0], b[0,1], b[1,0]

[[1 2 3]
 [4 5 6]]


(1, 2, 4)

Next we look at quick ways of creating arrays with default values.

In [5]:
#All zeros
np.zeros((2,2))

array([[0., 0.],
       [0., 0.]])

In [6]:
#All ones
np.ones((3,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [7]:
#All values set to a constant
np.full((3,2),math.pi)

array([[3.14159265, 3.14159265],
       [3.14159265, 3.14159265],
       [3.14159265, 3.14159265]])

In [9]:
#Identity matrix
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [14]:
#Random vector
help(np.random.rand)

Help on built-in function rand:

rand(...) method of numpy.random.mtrand.RandomState instance
    rand(d0, d1, ..., dn)
    
    Random values in a given shape.
    
    .. note::
        This is a convenience function for users porting code from Matlab,
        and wraps `random_sample`. That function takes a
        tuple to specify the size of the output, which is consistent with
        other NumPy functions like `numpy.zeros` and `numpy.ones`.
    
    Create an array of the given shape and populate it with
    random samples from a uniform distribution
    over ``[0, 1)``.
    
    Parameters
    ----------
    d0, d1, ..., dn : int, optional
        The dimensions of the returned array, must be non-negative.
        If no argument is given a single Python float is returned.
    
    Returns
    -------
    out : ndarray, shape ``(d0, d1, ..., dn)``
        Random values.
    
    See Also
    --------
    random
    
    Examples
    --------
    >>> np.random.rand(3,2)
    arra

In [16]:
#Random vector
#Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).
np.random.rand(5)

array([0.80765843, 0.67425004, 0.1843258 , 0.73121632, 0.67007076])

In [18]:
#Generates an array of shape, filled with random floats sampled from a univariate “normal” (Gaussian) distribution of mean 0 and variance 1
np.random.randn(3,2)

array([[-0.77461718, -0.70715012],
       [ 1.14973951, -1.55533349],
       [ 2.15318458,  0.84657131]])

In [19]:
#Returns array of numbers of specified size with given range of values
np.random.randint(high=10,low=1,size=(2,3))

array([[2, 9, 8],
       [7, 2, 1]])

In [21]:
#To create an array in sequence
x=np.arange(0,15)
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [22]:
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [23]:
#To create a sequence with equally spaced elements
np.linspace(start=2,stop=10,num=15)

array([ 2.        ,  2.57142857,  3.14285714,  3.71428571,  4.28571429,
        4.85714286,  5.42857143,  6.        ,  6.57142857,  7.14285714,
        7.71428571,  8.28571429,  8.85714286,  9.42857143, 10.        ])

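To draw a random sample of a required size from an existing array, use `np.random.choice()`.
- The output will be different on each run. By default elements are sampled with replacement, so values can repeat; pass `replace=False` to draw each element at most once. To get the same values on every run, fix the random seed.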
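Repeatable draws come from fixing the seed of the random number generator. The sketch below uses `np.random.default_rng`, NumPy's newer generator interface, rather than the `np.random.choice` function used in this notebook; the seed value `42` is an arbitrary choice for illustration.

```python
import numpy as np

x = np.arange(0, 15)

# Two generators created with the same seed produce the same draws.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
first = rng_a.choice(x, 6)
second = rng_b.choice(x, 6)
print(np.array_equal(first, second))

# replace=False only guarantees that no element is picked twice.
unique_draw = rng_a.choice(x, 6, replace=False)
print(len(set(unique_draw.tolist())))
```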

In [28]:
np.random.choice(x,6)

array([ 7,  9, 13,  9,  4, 12])

In [29]:
np.random.choice(x,6,replace=False)

array([ 2,  0, 14, 11,  4,  8])

In [30]:
np.random.choice(['a','b','c'],9)

array(['c', 'c', 'b', 'a', 'c', 'b', 'b', 'b', 'a'], dtype='<U1')

In [31]:
np.random.choice(['a','b','c'],1000,p=[.3,.3,.4])

array(['b', 'c', 'c', 'c', 'a', 'a', 'b', 'c', 'c', 'c', 'a', 'b', 'a',
       'a', 'a', 'b', 'c', 'b', 'c', 'b', 'b', 'c', 'b', 'c', 'b', 'a',
       'c', 'c', 'b', 'a', 'b', 'c', 'c', 'b', 'b', 'c', 'c', 'a', 'b',
       'c', 'b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'a', 'b', 'a', 'b',
       'a', 'b', 'c', 'b', 'b', 'c', 'c', 'c', 'c', 'a', 'c', 'a', 'c',
       'c', 'c', 'a', 'a', 'c', 'a', 'a', 'c', 'c', 'b', 'a', 'a', 'b',
       'c', 'a', 'c', 'a', 'c', 'c', 'c', 'b', 'a', 'b', 'b', 'c', 'c',
       'c', 'b', 'a', 'a', 'c', 'a', 'b', 'b', 'b', 'c', 'b', 'c', 'a',
       'a', 'b', 'b', 'c', 'c', 'b', 'c', 'b', 'c', 'a', 'a', 'c', 'c',
       'c', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'b', 'c', 'b', 'b',
       'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'b', 'c', 'b', 'a',
       'a', 'a', 'b', 'a', 'c', 'c', 'b', 'b', 'c', 'a', 'a', 'b', 'a',
       'c', 'c', 'c', 'b', 'c', 'b', 'b', 'c', 'c', 'a', 'a', 'b', 'a',
       'c', 'b', 'a', 'c', 'c', 'a', 'b', 'a', 'b', 'c', 'b', 'b

## Dependence of a subset of an array

One important thing to keep in mind is that a subset (slice) of a NumPy array does not become an independent array. Any change you make to it is reflected in the parent array. You need to use the `copy` method to make an independent array. Let's understand this with an example.

In [2]:
#Defining 'b', which is a part of 'a', not an independent array
a=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(a)
b=a[:2,1:3]
print('this is b','\n',b)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
this is b 
 [[2 3]
 [6 7]]


In [3]:
#Creating an independent array using copy
c = a[:2,1:3].copy()
print(c)

[[2 3]
 [6 7]]


Let's look at element `a[0,1]`, which is the same as `b[0,0]`.

In [4]:
print(a)
print(a[0, 1])
print(b[0,0])

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
2
2


In [5]:
#Notice that we assign to 'b' here, not directly to 'a'
b[0,0] = 77
print(a)

[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


You can see that although we changed the value of `b[0,0]`, that ended up changing the value of `a[0,1]` too. Now let's look at `a[0,2]`, which is the same as `c[0,1]`, and see whether changing `c[0,1]` has any effect on `a[0,2]`.

In [6]:
print(a[0,2])
print(c[0,1])

3
3


In [7]:
c[0,1] = 99
print(a)
print(c)

[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[[ 2 99]
 [ 6  7]]
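One way to check whether two arrays share data, without modifying either of them, is `np.shares_memory` together with the `.base` attribute. This is a minimal sketch; the names `view` and `independent` are chosen here for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                 # basic slicing returns a view onto a's data
independent = a[:2, 1:3].copy()   # .copy() allocates separate storage

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(a, view))
print(np.shares_memory(a, independent))

# A view records the array it was sliced from in its .base attribute.
print(view.base is a)
```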


## Accessing arrays with indices
When accessing elements of an array, the index values need not be contiguous. They can be any numbers as long as they stay within the range of existing element positions.

In [8]:
a = np.array([[1,2],[3,4],[5,6]])
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [9]:
a[[0,1,2],[0,1,0]]

array([1, 4, 5])

In [10]:
a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
a

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [12]:
b = np.array([0,2,0,1])
c = np.arange(4)
print(b)
print(c)

[0 2 0 1]
[0 1 2 3]


In [13]:
a[c,b]

array([ 1,  6,  7, 11])
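When two index arrays are passed, they are paired up position by position, so the result has one element per (row, column) pair. The sketch below spells that pairing out with an explicit loop; `rows`, `cols`, `picked`, and `looped` are names introduced for this example.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
rows = np.arange(4)            # [0, 1, 2, 3]
cols = np.array([0, 2, 0, 1])

# Index arrays are paired position by position: (0, 0), (1, 2), (2, 0), (3, 1).
picked = a[rows, cols]

# The same selection written as an explicit loop over the pairs.
looped = np.array([a[r, c] for r, c in zip(rows, cols)])

print(picked)
print(np.array_equal(picked, looped))
```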

Using indices, you can modify elements as well as access them.

In [14]:
a[c,b] +=10
a

array([[11,  2,  3],
       [ 4,  5, 16],
       [17,  8,  9],
       [10, 21, 12]])

## Conditional accessing

In [2]:
a = np.array([[1,2],[3,4],[5,6]])
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [3]:
a>2

array([[False, False],
       [ True,  True],
       [ True,  True]])

We can use comparison expressions directly for indexing. The result contains only those elements for which the expression evaluates to True.

In [5]:
print(a[a>2])
print(a[a>2].shape)  # The result is 1D array

[3 4 5 6]
(4,)


In [6]:
#For multiple conditions
a[(a>2)|(a<5)]

array([1, 2, 3, 4, 5, 6])

In [8]:
a[(a>2) & (a<5)]

array([3, 4])

In [12]:
a[[True,True,False],:]

array([[1, 2],
       [3, 4]])

In [14]:
#subsetting with boolean values
x= np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(x)
x[:,[True,True,False,False]]

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

In [15]:
x[[True,True,False],:]

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [16]:
x[:,0]>=5

array([False,  True,  True])

In [17]:
x[x[:,0]>=5,:]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
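Boolean masks behave differently from slices in one respect: selecting with a mask produces a new array, while assigning through a mask updates the original in place. A small sketch of both cases, with variable names (`mask`, `selected`, `clipped`) and the cap value 4 chosen only for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

mask = a > 2                 # boolean array with the same shape as a

# Selecting with a mask returns the matching values as a new 1D array.
selected = a[mask]

# Changing the selection does not touch a, because mask indexing makes a copy.
selected[0] = 100

# Assigning through a mask, by contrast, updates the indexed array in place.
clipped = a.copy()
clipped[clipped > 4] = 4

print(a)
print(selected)
print(clipped)
```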

## Benefits of using NumPy arrays

NumPy arrays offer the following benefits over Python lists for operating on numerical data:

- Ease of use: You can write small, concise, and intuitive mathematical expressions like `(priya * weights).sum()` rather than using loops and custom functions like `marks_gained`.
- Performance: NumPy operations and functions are implemented internally in compiled code (largely C), which makes them much faster than Python statements and loops that are interpreted at runtime.
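
The performance point can be checked with a rough timing comparison. This is only a sketch: the vector length `n`, the repeat count, and the helper `dot_loop` are hypothetical choices, and the measured times depend on the machine, so no specific numbers are claimed.

```python
import timeit
import numpy as np

n = 100_000  # hypothetical vector length, chosen only for illustration
values = [float(i) for i in range(n)]
weights = [0.5] * n
values_arr = np.array(values)
weights_arr = np.array(weights)

def dot_loop(xs, ws):
    """Dot product with an explicit Python loop."""
    total = 0.0
    for x, w in zip(xs, ws):
        total += x * w
    return total

loop_result = dot_loop(values, weights)
numpy_result = np.dot(values_arr, weights_arr)

# Both give the same answer up to floating-point rounding.
print(np.isclose(loop_result, numpy_result))

# Time 10 repetitions of each approach and print the totals.
loop_time = timeit.timeit(lambda: dot_loop(values, weights), number=10)
numpy_time = timeit.timeit(lambda: np.dot(values_arr, weights_arr), number=10)
print(f"loop: {loop_time:.4f}s, numpy: {numpy_time:.4f}s")
```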

Learn more from the documentation: https://numpy.org/devdocs/user/quickstart.html