# Today's Agenda
> ### What is an Array?
> ### What is NumPy?
> ### What is the use of NumPy?
> ### What is data analysis?
> ### What is the role of NumPy in Data analysis?

NumPy (**Num**erical **Py**thon) is an open-source Python library that’s used in almost every field of science and engineering.   
NumPy can be used to perform a wide variety of **mathematical operations** on arrays. It adds powerful data structures to Python that guarantee efficient calculations with arrays and matrices and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices.   

It comes with a great number of built-in functions.     

An **array** is a collection of data items of the same type stored at contiguous memory locations.

NumPy is a Python library used for working with arrays. NumPy arrays are called **ndarray or N-dimensional arrays** and they store elements of the **same type** and size. NumPy is known for its high performance and provides efficient storage and data operations as arrays grow in size.

NumPy arrays contain only **homogeneous elements**, i.e. elements having the same data type. This makes storing and manipulating the array more efficient, and the difference becomes apparent when the array has a large number of elements, say thousands or millions. Also, NumPy arrays support **element-wise operations** directly, whereas Python lists need an explicit loop or comprehension to get the same result.  
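
To make the contrast with lists concrete, here is a minimal sketch. The names `prices_list` and `prices_arr` and their values are made up for illustration only.

```python
import numpy as np

prices_list = [10, 20, 30]           # plain Python list (hypothetical data)
prices_arr = np.array(prices_list)   # same values as a NumPy array

# With a list, * means repetition, so doubling needs a loop or comprehension
repeated_list = prices_list * 2
doubled_list = [p * 2 for p in prices_list]

# With an ndarray, * is applied to every element
doubled_arr = prices_arr * 2

print(repeated_list)   # the list repeated twice, not doubled
print(doubled_list)    # each element doubled
print(doubled_arr)     # each element doubled, no loop needed
```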

A one-dimensional array is called a vector, while a two-dimensional array is called a matrix.    
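
A small sketch of how the number of dimensions can be inspected, using the `ndim` and `shape` attributes of an array. The variable names are illustrative.

```python
import numpy as np

vector = np.array([1, 2, 3])          # one dimension
matrix = np.array([[1, 2], [3, 4]])   # two dimensions

print(vector.ndim, vector.shape)   # 1 dimension, 3 elements
print(matrix.ndim, matrix.shape)   # 2 dimensions, 2 rows and 2 columns
```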

NumPy is used to work with arrays. The array object in NumPy is called **ndarray**.  

We have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

NumPy’s array class is called `ndarray`. It is also known by the alias `array`. Note that `numpy.array` is not the same as the Standard Python Library class `array.array`, which only handles one-dimensional arrays and offers less functionality.

**Data Manipulation with NumPy** 

You can perform standard mathematical operations on either individual elements or complete arrays.   
The functions cover linear algebra, statistical operations, and other specialized mathematical operations.  
For our purpose, we need to know about ndarray and the mathematical functions that are relevant to our research.   
If you already know languages such as C or Fortran, you can integrate NumPy code with code written in these languages and pass NumPy arrays between them seamlessly.   

### Possible applications of the NumPy package in research work:

+ Algorithmic operations such as sorting, grouping and set operations
+ Performing repetitive operations on whole arrays of data without using loops
+ Data merging and alignment operations
+ Data indexing, filtering, and transformation on individual elements or whole arrays
+ Data summarization and descriptive statistics

![image.png](attachment:image.png)

## Installing NumPy
To check whether NumPy is installed, go to the Package Manager and type NumPy. You will get a list of packages with names closely matching NumPy. For our purpose, we need the package named numpy 1.xx. If the package is not installed, click Install.

In [1]:
!pip install numpy



In [3]:
import numpy as np

In [4]:
np.array([1,2])

array([1, 2])

## Arange
Return evenly spaced values within a given interval.

In [5]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [6]:
np.arange(0,200,5)

array([  0,   5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60,
        65,  70,  75,  80,  85,  90,  95, 100, 105, 110, 115, 120, 125,
       130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190,
       195])
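
As a short illustration of the arguments, `np.arange(start, stop, step)` excludes the stop value, and a negative step counts downwards. The variable names below are illustrative.

```python
import numpy as np

evens = np.arange(0, 10, 2)       # stop value 10 is not included
countdown = np.arange(5, 0, -1)   # negative step counts down, 0 is excluded

print(evens)
print(countdown)
```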

## Random
NumPy also has lots of ways to create random number arrays:   

### rand
Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).

In [7]:
arr=np.random.rand(2,3)

In [8]:
arr

array([[0.52256371, 0.54361387, 0.19842781],
       [0.39514855, 0.04777982, 0.11257907]])
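
Because the values are random, the numbers shown above will differ on every run. If repeatable results are needed, one option is to set a seed first with `np.random.seed`. The sketch below assumes an arbitrary seed of 42.

```python
import numpy as np

np.random.seed(42)             # 42 is an arbitrary seed chosen for illustration
first = np.random.rand(2, 3)

np.random.seed(42)             # resetting the seed replays the same sequence
second = np.random.rand(2, 3)

print(np.array_equal(first, second))        # both draws are identical
print(first.min() >= 0, first.max() < 1)    # values lie in [0, 1)
```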

### randint
Return random integers from low (inclusive) to high (exclusive).


In [9]:
np.random.randint(1,10,5) # 5 is the size (number of values to return)

array([3, 2, 6, 3, 1])

In [10]:
np.random.randint(1,10,(2,4)) # (2,4) is the shape: 2 rows, 4 columns

array([[1, 3, 7, 3],
       [1, 8, 9, 7]])

In [11]:
np.random.randint(1,10,(2,4,2))

array([[[1, 6],
        [7, 1],
        [7, 3],
        [4, 8]],

       [[5, 3],
        [5, 9],
        [9, 6],
        [4, 7]]])

Multiplying the output of `rand` by 100 creates a 2 × 3 array of random numbers between 0 and 100 (100 itself is excluded).

In [12]:
np.random.rand(2,3)* 100

array([[85.92035941, 13.41595484, 52.90151702],
       [82.69192777, 67.72628318, 43.0275261 ]])

### randn
Return a sample (or samples) from the "standard normal" distribution (mean 0, standard deviation 1), unlike rand, which samples from a uniform distribution.   

#### Values are unbounded, but about 99.7% of them fall between -3 and +3

In [13]:
np.random.randn(5)

array([-0.15613449,  1.33736151, -1.45975312, -0.32004177,  0.82532058])

In [14]:
np.random.randn(50)

array([ 0.31571268, -1.22542268, -0.39381745,  1.28931271, -0.13619895,
        1.38431821,  1.69912809,  0.10678749,  1.53332964, -0.11134371,
        1.90495335, -0.74478344, -0.6876157 , -0.5730063 ,  0.58917469,
        1.77134867, -0.96854399, -1.06611483,  0.58127879,  0.98764244,
        0.3231903 ,  0.43081722, -1.12458036, -0.80456065,  0.72650518,
       -0.63447748,  0.56633248, -0.56879191,  1.01814463,  1.29851661,
        0.03366451,  0.14633258, -0.88850958, -0.36954798, -0.34587291,
        1.96327487, -0.12859703,  1.04279475, -0.44328203,  0.53892611,
        0.92562394, -0.43418593,  1.6787509 , -0.92759193,  0.53327487,
        0.23404255,  0.55063737,  0.10435727,  0.82617649, -1.3458434 ])

In [15]:
np.random.randn(5,3)

array([[ 0.11452468, -1.7285617 ,  1.47591338],
       [ 0.17792149,  0.46209334, -1.47561447],
       [-0.76410845,  0.4088038 , -1.2021352 ],
       [-1.06050008, -1.33599991, -0.50595993],
       [ 0.30764856,  1.95197072, -0.33868697]])
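
As a rough check of how standard normal samples are spread, the sketch below draws a large sample and measures the fraction that falls inside [-3, 3]. The seed and the sample size are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)                      # arbitrary seed, for repeatability
samples = np.random.randn(100_000)     # large sample from the standard normal

within_3 = np.mean(np.abs(samples) <= 3)   # fraction of values inside [-3, 3]

print(samples.mean(), samples.std())   # close to 0 and 1
print(within_3)                        # close to 0.997
print(samples.min(), samples.max())    # extremes can fall outside [-3, 3]
```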

## Checking the data type

In [17]:
arr = np.array([1.0,2.0,3.0,4.0,5.0])
print(arr)
print(type(arr))
print(arr.dtype)

[1. 2. 3. 4. 5.]
<class 'numpy.ndarray'>
float64


In [18]:
arr = np.array([1,2,3,4,5])
print(arr)
print(type(arr))
print(arr.dtype)

[1 2 3 4 5]
<class 'numpy.ndarray'>
int32
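
The default integer type depends on the platform and NumPy version, so the same cell may print `int64` on another machine. When a particular type matters, it can be requested explicitly. A minimal sketch with illustrative variable names:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)             # convert to floating point
small = np.array([1, 2, 3], dtype=np.int8)   # request a specific integer width

print(floats.dtype)     # float64
print(small.dtype)      # int8
print(small.itemsize)   # bytes used per element
```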


### linspace
Return evenly spaced numbers over a specified interval.

In [22]:
np.linspace(1,10,10)  # 1 is start, 10 is stop (inclusive),
                      # 10 is number of values

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [23]:
np.linspace(1,10,20)

array([ 1.        ,  1.47368421,  1.94736842,  2.42105263,  2.89473684,
        3.36842105,  3.84210526,  4.31578947,  4.78947368,  5.26315789,
        5.73684211,  6.21052632,  6.68421053,  7.15789474,  7.63157895,
        8.10526316,  8.57894737,  9.05263158,  9.52631579, 10.        ])
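
Two optional parameters of `np.linspace` are often useful: `retstep=True` also returns the spacing between values, and `endpoint=False` leaves out the stop value. A short sketch with made-up inputs:

```python
import numpy as np

values, step = np.linspace(0, 1, 5, retstep=True)   # also return the spacing
half_open = np.linspace(0, 1, 5, endpoint=False)    # exclude the stop value

print(values)      # 5 values from 0 to 1, both ends included
print(step)        # spacing between consecutive values
print(half_open)   # 5 values starting at 0, stopping before 1
```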

## Identity Matrix

- Diagonal elements are ones; the rest are zeros

In [19]:
np.eye(3) # 3 rows and 3 columns 

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [20]:
I = np.eye(4)

In [21]:
I.dtype

dtype('float64')
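
The defining property of an identity matrix is that multiplying by it leaves a matrix unchanged. The sketch below checks this with a hypothetical 2 × 2 matrix `M`, and also shows that `np.eye` accepts a `dtype` argument when integers are preferred.

```python
import numpy as np

M = np.array([[2, 5], [1, 3]])   # hypothetical 2 x 2 matrix
I2 = np.eye(2)                   # 2 x 2 identity matrix (float64 by default)

print(np.array_equal(M @ I2, M))   # True: M times identity is M
print(np.array_equal(I2 @ M, M))   # True: identity times M is M

I2_int = np.eye(2, dtype=int)      # integer identity matrix
print(I2_int)
```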

## Basic Mathematical operations in NumPy

In [25]:
arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])

In [26]:
arr1 + arr2 

array([5, 7, 9])

In [27]:
arr3 = np.array([[1,2],[3,4]])

In [28]:
arr4 = np.array([[4,2],[3,8]])

In [29]:
arr3+arr4

array([[ 5,  4],
       [ 6, 12]])

In [30]:
arr1 - arr2 

array([-3, -3, -3])

In [31]:
arr1 * 2 # Multiplication 

array([2, 4, 6])

In [33]:
arr1 / 2 # division 

array([0.5, 1. , 1.5])

In [35]:
arr1 // 2 # floor division

array([0, 1, 1], dtype=int32)

In [36]:
arr1 ** 2  # exponentiation (each element squared)

array([1, 4, 9], dtype=int32)

In [37]:
A1 = np.array([[1,2],[3,4]])
B1 = np.array([[5,6],[7,8]])

## Matrix Multiplication

In [38]:
np.matmul(A1,B1)

array([[19, 22],
       [43, 50]])

In [39]:
A1.dot(B1)

array([[19, 22],
       [43, 50]])

In [40]:
A = np.array([1,2,3,4])
B = np.array([5,6,7,8])

In [41]:
A.dot(B)

70
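
Matrix multiplication is easy to confuse with the element-wise `*` operator. The sketch below contrasts the two, using the `@` operator as a shorthand for `np.matmul`. The names `P` and `Q` are illustrative.

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])

elementwise = P * Q   # multiplies values at matching positions
matrix_prod = P @ Q   # matrix product, same result as np.matmul(P, Q)

print(elementwise)
print(matrix_prod)
```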

In [42]:
arr = np.array([[1, 5, 6],
                [4, 7, 2],
                [3, 1, 9]])

In [43]:
print("max values in array",arr.max())
print("min values in array",arr.min())
print("sum values in array",arr.sum())

max values in array 9
min values in array 1
sum values in array 38


## axis = 1 is used for row-wise operations, axis = 0 is used for column-wise operations

In [44]:
print("Row-wise max values", arr.max(axis=1))
print("Row-wise min values", arr.min(axis=1))
print("Row-wise sum values", arr.sum(axis=1))

Row-wise max values [6 7 9]
Row-wise min values [1 2 1]
Row-wise sum values [12 13 13]


In [46]:
print("Column-wise max values", arr.max(axis=0))
print("Column-wise min values", arr.min(axis=0))
print("Column-wise sum values", arr.sum(axis=0))

Column-wise max values [4 7 9]
Column-wise min values [1 1 2]
Column-wise sum values [ 8 13 17]
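
One way to remember the `axis` argument is that it names the dimension that gets collapsed. The sketch below uses a small made-up 2 × 3 array and checks the shape of each result.

```python
import numpy as np

data = np.array([[0, 1, 2],
                 [3, 4, 5]])   # 2 rows, 3 columns (hypothetical data)

col_sums = data.sum(axis=0)   # collapses the rows: one value per column
row_sums = data.sum(axis=1)   # collapses the columns: one value per row

print(col_sums, col_sums.shape)   # 3 values, one per column
print(row_sums, row_sums.shape)   # 2 values, one per row
```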


In [48]:
arr3 = np.array([1,8,6,4,3])

In [49]:
arr3.mean()

4.4

In [50]:
arr3.var()

5.84

In [51]:
arr3.std()

2.4166091947189146
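
By default, `var()` and `std()` compute the population variance and standard deviation, dividing by n. Passing `ddof=1` divides by n - 1 instead, which gives the sample variance. A minimal sketch that recomputes the values by hand, with `scores` as an illustrative name:

```python
import numpy as np

scores = np.array([1, 8, 6, 4, 3])   # same values as arr3 above

manual_var = np.mean((scores - scores.mean()) ** 2)   # population variance by hand
sample_var = scores.var(ddof=1)                        # divides by n - 1 instead of n

print(manual_var, scores.var())             # both are the population variance
print(sample_var)                           # larger, because of the n - 1 divisor
print(np.sqrt(scores.var()), scores.std())  # std is the square root of var
```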

In [52]:
arr = np.array([[1, 5, 6],
                [4, 7, 2],
                [3, 1, 9]])

In [53]:
print("mean",arr.mean(axis = 0))
print("std",arr.std(axis = 0))
print("variance",arr.var(axis = 0))

mean [2.66666667 4.33333333 5.66666667]
std [1.24721913 2.49443826 2.86744176]
variance [1.55555556 6.22222222 8.22222222]


## Row-wise operations

In [54]:
print("mean",arr.mean(axis = 1))
print("std",arr.std(axis = 1))
print("variance",arr.var(axis = 1))

mean [4.         4.33333333 4.33333333]
std [2.1602469  2.05480467 3.39934634]
variance [ 4.66666667  4.22222222 11.55555556]


## Reshape

In [55]:
m1 = np.array([1,2,3,4,5,6,7,8])

In [56]:
m1.reshape(2,4)

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [57]:
m1.reshape(4,2)

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [58]:
m3 = np.array([[1,2],[3,4],[5,6],[7,8]])
print(m3.shape)
print(m3)

(4, 2)
[[1 2]
 [3 4]
 [5 6]
 [7 8]]


In [59]:
m3.reshape(2,4)

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
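
A reshape only works when the new shape holds exactly the same number of elements. One dimension can be given as -1 so that NumPy infers it. The sketch below shows both cases. The names `m`, `auto` and `error_raised` are illustrative.

```python
import numpy as np

m = np.arange(1, 9)         # 8 elements: 1 to 8

auto = m.reshape(2, -1)     # -1 lets NumPy infer the missing dimension
print(auto.shape)           # 2 rows, 4 columns

error_raised = False
try:
    m.reshape(3, 3)         # 9 slots cannot hold 8 elements
except ValueError as err:
    error_raised = True
    print("reshape failed:", err)
```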

## Stacking Arrays

axis = 0 means stacking along rows (vertically): new rows are added   
axis = 1 means stacking along columns (horizontally): new columns are added 

`hstack()` is used for horizontal stacking, which for 2-D arrays is equivalent to `concatenate(..., axis=1)`.  
`vstack()` is used for vertical stacking, which for 2-D arrays is equivalent to `concatenate(..., axis=0)`.

In [63]:
arr1 = np.array([[1,2],[3,4]])   
arr2 = np.array([[10,20],[30,40]]) 

In [64]:
np.hstack((arr1,arr2))

array([[ 1,  2, 10, 20],
       [ 3,  4, 30, 40]])

In [65]:
arr5 = np.array([[4,3],[5,2]])
arr6 = np.array([[2,6],[1,3]])

In [66]:
np.vstack((arr5,arr6))

array([[4, 3],
       [5, 2],
       [2, 6],
       [1, 3]])

In [67]:
np.concatenate((arr1,arr2), axis=0)

array([[ 1,  2],
       [ 3,  4],
       [10, 20],
       [30, 40]])
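
The equivalence between the stacking helpers and `np.concatenate` can be checked directly for 2-D arrays. A short sketch with illustrative names `a` and `b`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

# For 2-D inputs, hstack matches axis=1 and vstack matches axis=0
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))

print(same_h, same_v)            # both comparisons are True
print(np.hstack((a, b)).shape)   # 2 rows, 4 columns
print(np.vstack((a, b)).shape)   # 4 rows, 2 columns
```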