
*NumPy is the fundamental library for scientific computing with Python. NumPy is centered around a powerful N-dimensional array object, and it also contains useful linear algebra, Fourier transform, and random number functions.*

In [None]:
#IMPORTING THE LIBRARY
import numpy as np

In [None]:
#The zeros function creates an array containing any number of zeros:
np.zeros(5)   #5 is size of array to be created: 5 times zeros

array([0., 0., 0., 0., 0.])

In [None]:
#It's just as easy to create a 2D array (i.e. a matrix) by providing a tuple with the
#desired number of rows and columns. For example, here's a 3x4 matrix:
np.zeros((3,4))   #3 rows and 4 columns

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

## Some vocabulary

* In NumPy, each dimension is called an **axis**.
* The number of axes is called the **rank**.
    * For example, the above 3x4 matrix is an array of rank 2 (it is 2-dimensional).
    * The first axis has length 3, the second has length 4.
* An array's list of axis lengths is called the **shape** of the array.
    * For example, the above matrix's shape is `(3, 4)`.
    * The rank is equal to the shape's length.
* The **size** of an array is the total number of elements, which is the product of all axis lengths (e.g. 3*4=12).

In [None]:
#more functions
a = np.zeros((3,4)) #storing a matrix of 3x4 zeros into variable named "a"
a   #displays a, much like print(a); in a notebook cell only the value of the last expression is displayed

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
a.shape  #gives the shape of the matrix: (rows, columns)

(3, 4)

In [None]:
a.ndim   #same as len(a.shape)

2

In [None]:
a.size  #total number of elements in the array (3*4 = 12)

12

## N-dimensional arrays
You can also create an N-dimensional array of arbitrary rank. For example, here's a 3D array (rank=3), with shape `(2,3,4)`:

In [None]:
np.zeros((2,3,4))

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

## Array type
NumPy arrays have the type `ndarray`:

In [None]:
type(np.zeros((3,4)))

numpy.ndarray

## `np.ones`
Many other NumPy functions create `ndarrays`.

Here's a 3x4 matrix full of ones:

In [None]:
np.ones((3,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

## `np.full`
Creates an array of the given shape initialized with the given value. Here's a 3x4 matrix full of `π`.

In [None]:
np.full((3,4),1)        #the second argument (1) is the fill value; this result is not displayed, since only the last expression is
np.full((3,4),np.pi)    #np.pi fills the matrix with the value of pi

array([[3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265]])

## `np.empty`
An uninitialized 2x3 array (its content is not predictable, as it is whatever is in memory at that point):

In [None]:
np.empty((2,3))

array([[4.87205764e-310, 0.00000000e+000, 4.87089593e-310],
       [6.89666518e-310, 1.34394500e+219, 3.28094696e-085]])

## `np.array`
Of course, you can initialize an `ndarray` from a regular Python list (or nested lists). Just call the `array` function:

In [None]:
np.array([[1,2,3,4],[5,6,7,8]])

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

## `np.arange`
You can create an `ndarray` using NumPy's `arange` function, which is similar to Python's built-in `range` function:

In [None]:
np.arange(1,5)  #creates an array from 1 to 4: the stop value (5) is excluded, so pass the last wanted value + 1

array([1, 2, 3, 4])

In [None]:
#works with floats as well
np.arange(0.5,5.5)

array([0.5, 1.5, 2.5, 3.5, 4.5])

In [None]:
#you can provide a step parameter
np.arange(1,5,0.5)

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [None]:
# However, when dealing with floats, the exact number of elements in the array is not always predictable:
print(np.arange(0,5/3,1/3))
print(np.arange(0,5/3,0.333333333))
print(np.arange(0,5/3,0.333333334))

[0.         0.33333333 0.66666667 1.         1.33333333 1.66666667]
[0.         0.33333333 0.66666667 1.         1.33333333 1.66666667]
[0.         0.33333333 0.66666667 1.         1.33333334]


## `np.linspace`
For this reason, it is generally preferable to use the `linspace` function instead of `arange` when working with floats. The `linspace` function returns an array containing a specific number of points evenly distributed between two values (note that the maximum value is *included*, contrary to `arange`):

In [None]:
print(np.linspace(0,5/3,6))

[0.         0.33333333 0.66666667 1.         1.33333333 1.66666667]
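Because `linspace` takes the number of points rather than a step, the length of the result is known in advance. The sketch below illustrates this, along with two optional parameters of `np.linspace`: `endpoint` and `retstep`. The variable names (`points`, `half_open`, `grid`, `step`) are only illustrative.

```python
import numpy as np

# The third argument is the number of points, so the length is fixed up front.
points = np.linspace(0, 5/3, 6)
print(len(points))

# endpoint=False leaves the stop value out, like arange does.
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)

# retstep=True also returns the spacing between consecutive points.
grid, step = np.linspace(0, 1, 5, retstep=True)
print(grid, step)
```

With `endpoint=False` the spacing is `(stop - start) / num`, whereas the default uses `(stop - start) / (num - 1)`.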


# Array data
## `dtype`
NumPy's `ndarray`s are efficient in part because all their elements must have the same type (usually numbers).
You can check what the data type is by looking at the `dtype` attribute:

In [None]:
c =np.arange(1,5)
c.dtype

dtype('int64')

In [None]:
c =np.arange(1.0,5.0)
c.dtype

dtype('float64')

In [None]:
# Instead of letting NumPy guess what data type to use,
#  you can set it explicitly when creating an array by setting the dtype parameter:
d = np.arange(1,5, dtype =np.complex64)
print(d.dtype , d)

complex64 [1.+0.j 2.+0.j 3.+0.j 4.+0.j]


Available data types include `int8`, `int16`, `int32`, `int64`, `uint8`|`16`|`32`|`64`, `float16`|`32`|`64` and `complex64`|`128`. Check out [the documentation](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html) for the full list.
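Each integer type can only hold a fixed range of values. As a small sketch, `np.iinfo` reports that range for a given integer `dtype` (the loop and the name `info` are just for illustration):

```python
import numpy as np

# Print the smallest and largest value each integer type can represent.
for dt in (np.int8, np.uint8, np.int16):
    info = np.iinfo(dt)
    print(np.dtype(dt).name, info.min, info.max)
```

For floating-point types, `np.finfo` plays the same role.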

## `itemsize`
The `itemsize` attribute returns the size (in bytes) of each item:

In [None]:
e = np.arange(1,5)
e.itemsize

8
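As a quick check, the total memory used by an array's elements (`nbytes`) is its `size` multiplied by its `itemsize`. The sketch below sets the `dtype` explicitly, since the default integer type can differ between platforms; the name `small` is illustrative.

```python
import numpy as np

e = np.arange(1, 5, dtype=np.int64)
print(e.itemsize, e.size, e.nbytes)   # bytes per item, item count, total bytes

# Converting to a narrower type shrinks every item.
small = e.astype(np.int16)
print(small.itemsize, small.nbytes)
```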

## `data` buffer
An array's data is actually stored in memory as a flat (one dimensional) byte buffer. It is available *via* the `data` attribute (you will rarely need it, though).

In [None]:
f = np.array([[1,2],[1000, 2000]], dtype=np.int32)
f.data

<memory at 0x7ef4880d1220>
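To look at the flat buffer itself, one option is to copy it out as a Python `bytes` object with `tobytes()` and rebuild the array from it with `np.frombuffer`. This is only a sketch: the names `raw` and `restored` are illustrative, and the exact byte values depend on the machine's byte order, so they are not shown here.

```python
import numpy as np

f = np.array([[1, 2], [1000, 2000]], dtype=np.int32)

raw = f.tobytes()            # a copy of the data as a flat sequence of bytes
print(len(raw), f.nbytes)    # 4 elements * 4 bytes each

# Reinterpret the bytes as int32 values and restore the 2D shape.
restored = np.frombuffer(raw, dtype=np.int32).reshape(f.shape)
print(restored)
```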

# Reshaping an array
## In place
Changing the shape of an `ndarray` is as simple as setting its `shape` attribute. However, the array's size must remain the same.

In [None]:
g = np.arange(24)
print(g)
print("Rank: ", g.ndim)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Rank:  1


In [None]:
g.shape = (6, 4)
print(g)
print("Rank: ", g.ndim)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
Rank:  2


In [None]:
g.shape = (2,3,4)
print(g)
print("Rank: ", g.ndim)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Rank:  3
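The size constraint can be seen by asking for a shape whose element count differs from the array's. A minimal sketch, assuming a fresh 24-element array:

```python
import numpy as np

g = np.arange(24)
try:
    g.shape = (5, 5)      # 5*5 = 25 elements, but g has 24
except ValueError as err:
    print("cannot reshape:", err)

print(g.shape)            # the failed assignment leaves g unchanged
```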


## `reshape`
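The `reshape` function returns a new `ndarray` object pointing at the *same* data whenever possible (it only falls back to a copy when the new shape cannot be expressed as a view of the original). This means that, when a view is returned, modifying one array will also modify the other.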

In [None]:
g2 = g.reshape(4,6)
print(g2)
print("Rank: ", g2.ndim)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
Rank:  2


In [None]:
#Set item at row 1, col 2 to 999 (more about indexing below).

g2[1,2] = 999
g2

array([[  0,   1,   2,   3,   4,   5],
       [  6,   7, 999,   9,  10,  11],
       [ 12,  13,  14,  15,  16,  17],
       [ 18,  19,  20,  21,  22,  23]])

In [None]:
#The corresponding element in g has been modified.
g

array([[[  0,   1,   2,   3],
        [  4,   5,   6,   7],
        [999,   9,  10,  11]],

       [[ 12,  13,  14,  15],
        [ 16,  17,  18,  19],
        [ 20,  21,  22,  23]]])
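If it is unclear whether two arrays share their data, `np.shares_memory` can tell, and calling `copy()` gives an independent array. A small sketch (the names `view` and `independent` are illustrative):

```python
import numpy as np

g = np.arange(24).reshape(2, 3, 4)
view = g.reshape(4, 6)                 # shares data with g
independent = g.reshape(4, 6).copy()   # has its own data

print(np.shares_memory(g, view))
print(np.shares_memory(g, independent))

view[0, 0] = -1          # visible through g
independent[0, 1] = -1   # not visible through g
print(g[0, 0, 0], g[0, 0, 1])
```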

## `ravel`
Finally, the `ravel` function returns a new one-dimensional `ndarray` that also points to the same data whenever possible:

In [None]:
g.ravel()

array([  0,   1,   2,   3,   4,   5,   6,   7, 999,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23])
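For comparison, the `flatten` method always returns a copy, and `ravel` itself has to copy when the elements are not laid out contiguously in the requested order, for example after a transpose. A short sketch with illustrative names:

```python
import numpy as np

g = np.arange(6).reshape(2, 3)

r = g.ravel()      # view on g's data
fl = g.flatten()   # always a copy

r[0] = 100         # changes g
fl[1] = 200        # does not change g
print(g)

# Flattening the transpose in row-major order needs a new buffer.
t = g.T.ravel()
print(np.shares_memory(g, r), np.shares_memory(g, fl), np.shares_memory(g, t))
```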

# Arithmetic operations
All the usual arithmetic operators (`+`, `-`, `*`, `/`, `//`, `**`, etc.) can be used with `ndarray`s. They apply *elementwise*:

In [None]:
a = np.array([12,45,75,23,87])
b = np.array([5,4,2,3,1])

print("a+b = ",a+b)
print("a-b = ",a-b)
print("a*b = ",a*b)
print("a/b = ",a/b)
print("a//b = ",a//b)
print("a%b = ",a%b)
print("a**b = ",a**b)

a+b =  [17 49 77 26 88]
a-b =  [ 7 41 73 20 86]
a*b =  [ 60 180 150  69  87]
a/b =  [ 2.4        11.25       37.5         7.66666667 87.        ]
a//b =  [ 2 11 37  7 87]
a%b =  [2 1 1 2 0]
a**b =  [ 248832 4100625    5625   12167      87]


Note that the multiplication is *not* a matrix multiplication. We will discuss matrix operations below.

The arrays must have the same shape. If they do not, NumPy will apply the *broadcasting rules*.

# Broadcasting

In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called *broadcasting* rules:

## First rule
*If the arrays do not have the same rank, then a 1 will be prepended to the shapes of the lower-rank arrays until their ranks match.*

In [None]:
h = np.arange(5).reshape(1,1,5)
h

array([[[0, 1, 2, 3, 4]]])

In [None]:
#Now let's try to add a 1D array of shape (5,) to this 3D array of shape (1,1,5).
#  Applying the first rule of broadcasting!
h + [10,20,30,40,50] # same as: h + [[[10, 20, 30, 40, 50]]]

array([[[10, 21, 32, 43, 54]]])

## Second rule
*Arrays with a 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is repeated along that dimension.*

In [None]:
k = np.arange(6).reshape(2, 3)
k

array([[0, 1, 2],
       [3, 4, 5]])

In [None]:
# Let's try to add a 2D array of shape (2,1) to this 2D ndarray of shape (2, 3).
#  NumPy will apply the second rule of broadcasting:

k+[[100],[200]]  # same as: k + [[100, 100, 100], [200, 200, 200]]

array([[100, 101, 102],
       [203, 204, 205]])

In [None]:
# Combining rules 1 & 2, we can do this:
k + [100,200,300] # after rule 1: [[100, 200, 300]], and after rule 2: [[100, 200, 300], [100, 200, 300]]

array([[100, 201, 302],
       [103, 204, 305]])

In [None]:
k + 1000  # same as: k + [[1000, 1000, 1000], [1000, 1000, 1000]]

array([[1000, 1001, 1002],
       [1003, 1004, 1005]])

## Third rule
*After rules 1 & 2, the sizes of all arrays must match.*

In [None]:
try:
    k + [33,44]
except ValueError as err:
    print(err)

operands could not be broadcast together with shapes (2,3) (2,) 


Broadcasting rules are used in many NumPy operations, not just arithmetic operations, as we will see below.
For more details about broadcasting, check out [the documentation](https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html).
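The three rules can be checked on shapes alone. The sketch below uses `np.broadcast_shapes`, which is assumed to be available (it was added in NumPy 1.20), and then shows one way to make the failing example work by reshaping the 1D array into a column. The names `column` and `fixed` are illustrative.

```python
import numpy as np

# Rule 1, rule 2, and both combined, applied to the shapes used above.
print(np.broadcast_shapes((1, 1, 5), (5,)))
print(np.broadcast_shapes((2, 3), (2, 1)))
print(np.broadcast_shapes((2, 3), (3,)))

# Rule 3: (2,) becomes (1, 2), and 2 does not match 3.
try:
    np.broadcast_shapes((2, 3), (2,))
except ValueError as err:
    print("incompatible:", err)

# Reshaping [33, 44] to shape (2, 1) lets rule 2 repeat it along the columns.
k = np.arange(6).reshape(2, 3)
column = np.array([33, 44]).reshape(2, 1)
fixed = k + column
print(fixed)
```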

## Upcasting
When trying to combine arrays with different `dtype`s, NumPy will *upcast* to a type capable of handling all possible values (regardless of what the *actual* values are).

In [None]:
k1 = np.arange(0,5,dtype = np.uint8)
print(k1.dtype,k1)

uint8 [0 1 2 3 4]


In [None]:
k2 = k1+ np.array([5,6,7,8,9], dtype = np.int8)
print(k2.dtype, k2)

int16 [ 5  7  9 11 13]


In [None]:
# Note that in the previous cell int16 was required to represent all possible int8 and uint8
# values (from -128 to 255), even though in this case a uint8 would have sufficed.
# Adding a Python float upcasts the integer array to float64:
k3 = k1+1.5
print(k3.dtype,k3)

float64 [1.5 2.5 3.5 4.5 5.5]
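To find out which type a combination will produce without building any arrays, `np.result_type` can be queried directly. The sketch also shows the other side of the coin: when both operands already have the same type, no upcasting happens, so integer results that do not fit wrap around. The names `x`, `y` and `total` are illustrative.

```python
import numpy as np

print(np.result_type(np.uint8, np.int8))      # smallest type covering -128..255
print(np.result_type(np.uint8, np.float64))

# Same dtype on both sides: the result stays uint8 and wraps modulo 256.
x = np.array([200], dtype=np.uint8)
y = np.array([100], dtype=np.uint8)
total = x + y
print(total.dtype, total)
```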


# Conditional operators

In [None]:
#The conditional operators also apply elementwise:

m = np.array([20,-5,34,57])
m<[12,34,2,567]

array([False,  True, False,  True])

In [None]:
#And using broadcasting
m<25 #same as m<[25,25,25,25]

array([ True,  True, False, False])

In [None]:
#This is most useful in conjunction with boolean indexing (discussed below).

m[m<25]

array([20, -5])
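Several conditions can be combined elementwise with `&` (and) and `|` (or). Each comparison needs its own parentheses, because these operators bind more tightly than `<` and `>`. A brief sketch, with `mask` and `replaced` as illustrative names:

```python
import numpy as np

m = np.array([20, -5, 34, 57])

mask = (m > 0) & (m < 50)   # True where both conditions hold
print(mask)
print(m[mask])
print(mask.sum())           # True counts as 1, so this counts the matches

# np.where picks from the second argument where the condition is True,
# and from the third argument elsewhere.
replaced = np.where(m < 25, 0, m)
print(replaced)
```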

# Mathematical and statistical functions

Many mathematical and statistical functions are available for `ndarray`s.

## `ndarray` methods
Some functions are simply `ndarray` methods, for example:

In [None]:
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
print(a)
print("mean: ", a.mean())

[[-2.5  3.1  7. ]
 [10.  11.  12. ]]
mean:  6.766666666666667


In [None]:
#Note that this computes the mean of all elements in the ndarray, regardless of its shape.
# Here are a few more useful ndarray methods:

for func in (a.min, a.max, a.sum, a.prod, a.std, a.var):
    print(func.__name__, "=", func())

In [None]:
#These functions accept an optional argument axis which lets you ask for the operation to be
# performed on elements along the given axis. For example:

c = np.arange(24).reshape(2,3,4)
c

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [None]:
c.sum(axis=0)  #sum across matrices

array([[12, 14, 16, 18],
       [20, 22, 24, 26],
       [28, 30, 32, 34]])

In [None]:
c.sum(axis=1)  #sum across rows

array([[12, 15, 18, 21],
       [48, 51, 54, 57]])

In [None]:
#You can also sum over multiple axes:
c.sum(axis =(0,2))

array([ 60,  92, 124])

In [None]:
0+1+2+3 + 12+13+14+15, 4+5+6+7 + 16+17+18+19, 8+9+10+11 + 20+21+22+23

(60, 92, 124)
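One way to keep track of what `axis` does is to look at the resulting shapes: each axis that is summed over disappears from the shape. The optional `keepdims=True` argument keeps those axes with length 1 instead, which makes the result broadcastable against the original array. A sketch:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

print(c.sum(axis=0).shape)        # axis 0 (length 2) is removed
print(c.sum(axis=1).shape)        # axis 1 (length 3) is removed
print(c.sum(axis=(0, 2)).shape)   # only axis 1 remains

kept = c.sum(axis=(0, 2), keepdims=True)
print(kept.shape)                 # summed axes are kept with length 1
print((c / kept).shape)           # broadcasts back to c's shape
```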

## Universal functions
NumPy also provides fast elementwise functions called *universal functions*, or **ufunc**. They are vectorized wrappers of simple functions. For example `square` returns a new `ndarray` which is a copy of the original `ndarray` except that each element is squared:

In [None]:
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
np.square(a)

array([[  6.25,   9.61,  49.  ],
       [100.  , 121.  , 144.  ]])

In [None]:
#Here are a few more useful unary ufuncs:

print("original ndarray")
print(a)

for func in (np.abs,np.sqrt,np.exp,np.log,np.sign,np.ceil,np.modf,np.isnan,np.cos):
  print("\n", func.__name__)
  print(func(a))

original ndarray
[[-2.5  3.1  7. ]
 [10.  11.  12. ]]

 absolute
[[ 2.5  3.1  7. ]
 [10.  11.  12. ]]

 sqrt
[[       nan 1.76068169 2.64575131]
 [3.16227766 3.31662479 3.46410162]]

 exp
[[8.20849986e-02 2.21979513e+01 1.09663316e+03]
 [2.20264658e+04 5.98741417e+04 1.62754791e+05]]

 log
[[       nan 1.13140211 1.94591015]
 [2.30258509 2.39789527 2.48490665]]

 sign
[[-1.  1.  1.]
 [ 1.  1.  1.]]

 ceil
[[-2.  4.  7.]
 [10. 11. 12.]]

 modf
(array([[-0.5,  0.1,  0. ],
       [ 0. ,  0. ,  0. ]]), array([[-2.,  3.,  7.],
       [10., 11., 12.]]))

 isnan
[[False False False]
 [False False False]]

 cos
[[-0.80114362 -0.99913515  0.75390225]
 [-0.83907153  0.0044257   0.84385396]]




## Binary ufuncs
There are also many binary ufuncs, that apply elementwise on two `ndarray`s.  Broadcasting rules are applied if the arrays do not have the same shape:

In [None]:
a = np.array([1, -2, 3, 4])
b = np.array([2, 8, -1, 7])
np.add(a,b)  #same as : a + b

array([ 3,  6,  2, 11])

In [None]:
np.greater(a,b)

array([False, False,  True, False])

In [None]:
np.maximum(a,b)

array([2, 8, 3, 7])

In [None]:
np.copysign(a,b)

array([ 1.,  2., -3.,  4.])
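Since broadcasting applies to binary ufuncs as well, the second operand can be a scalar or an array of a different but compatible shape. A small sketch (the names `clipped` and `table` are illustrative):

```python
import numpy as np

a = np.array([1, -2, 3, 4])
b = np.array([2, 8, -1, 7])

# The scalar 0 is broadcast against every element of a.
clipped = np.maximum(a, 0)
print(clipped)

# A (4, 1) column against a (4,) row broadcasts to a (4, 4) table
# comparing every element of a with every element of b.
table = np.greater(a.reshape(4, 1), b)
print(table.shape)
print(table[2])    # 3 > [2, 8, -1, 7]
```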

# Array indexing
## One-dimensional arrays
One-dimensional NumPy arrays can be accessed more or less like regular Python lists: