![MicrosoftTeams-image.png](attachment:MicrosoftTeams-image.png)

<span style="color:#0000A8;font-size: 30px; font-family: Arial; font-weight: bold;">Data Analytics - Crash Course for Beginners</span>

# Data Analysis Using NumPy Library

# Learning Objectives

- Introduction to NumPy
- Installation of NumPy
- NumPy Datatypes
- NumPy Array
- I/O with NumPy
- Indexing and Slicing
- Broadcasting
- Statistical Functions of NumPy


# Introduction To NumPy

- NumPy stands for Numerical Python.
- It is a free, open-source Python library for working with arrays.
- It also includes functions for linear algebra, Fourier transforms, and matrices.

![image.png](attachment:image.png)

# Installation Instructions

We recommend installing Python through the Anaconda distribution, so that NumPy's underlying dependencies (such as linear algebra libraries) are installed and kept consistent by conda. If you have Anaconda, install NumPy by opening your terminal or command prompt and typing:

conda install numpy

If you do not have Anaconda and cannot install it, please refer to NumPy's official documentation for other installation options.

![image.png](attachment:image.png)

# Using NumPy
Once you've installed NumPy you can import it as a library:

In [1]:
import numpy as np
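
A quick way to confirm that the installation worked is to print the installed version. This is only a minimal sketch; the version string you see depends on your environment, and the variable name `version` is just for illustration.

```python
import numpy as np

# __version__ holds the installed NumPy release as a string
version = np.__version__
print("NumPy version:", version)
```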

# NumPy Datatypes

A data type (dtype) describes the kind of value stored in an array, such as Boolean, integer, or floating point, and how much memory each element occupies. Every element of a NumPy array has the same data type.

Some of the common numeric data types are:

- bool_ - Represents a Boolean value (True or False). It is stored as one byte.
- int_ - The default integer type. It is either a 32-bit or a 64-bit integer, depending on the platform.
- float_ - Identical to float64 (double precision). This alias was removed in NumPy 2.0, where float64 should be used directly.
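
The data types listed above can also be requested explicitly instead of letting NumPy infer them. The following is a small sketch of that idea; the variable names (`floats`, `flags`, `ints`) are illustrative only.

```python
import numpy as np

# Ask for a specific dtype instead of letting NumPy infer one
floats = np.array([1, 2, 3], dtype=np.float64)
flags = np.array([1, 0, 1], dtype=np.bool_)

print(floats.dtype)    # float64
print(flags)           # [ True False  True]
print(flags.itemsize)  # 1 (each Boolean occupies one byte)

# astype() returns a converted copy; the fractional part is truncated
ints = np.array([1.7, 2.2, 3.9]).astype(np.int64)
print(ints)            # [1 2 3]
```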








In [2]:
import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr.dtype)

int32


# NumPy Arrays
NumPy arrays are the main way we will use NumPy throughout the course. They essentially come in two flavors: vectors and matrices. Vectors are strictly 1-D arrays and matrices are 2-D (note that a matrix can still have only one row or one column).
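
To illustrate the difference between a vector and a matrix with a single row or column, the sketch below prints the `shape` of each. The `shape` attribute is not introduced in the text above; it is used here only as a convenient way to see the dimensions.

```python
import numpy as np

vector = np.array([1, 2, 3])        # 1-D: a vector
matrix = np.array([[1, 2, 3]])      # 2-D: a matrix with a single row
column = np.array([[1], [2], [3]])  # 2-D: a matrix with a single column

print(vector.shape)  # (3,)
print(matrix.shape)  # (1, 3)
print(column.shape)  # (3, 1)
```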

Let's begin our introduction by exploring how to create NumPy arrays.

![image.png](attachment:image.png)

# Creating NumPy Arrays

In [3]:
import numpy as np

arr = np.array((1, 2, 3, 4, 5))

print(arr)

[1 2 3 4 5]


## 0-D Arrays
A 0-D array, or scalar, holds a single value. Each individual value in an array can be viewed as a 0-D array.

In [4]:
import numpy as np

arr = np.array(42)

print(arr)

42


## 1-D Arrays
An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.

These are the most common and basic arrays.

In [5]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

[1 2 3 4 5]


## 2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.

These are often used to represent matrices or 2nd-order tensors.

In [6]:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

[[1 2 3]
 [4 5 6]]


## 3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

These are often used to represent a 3rd order tensor.

In [7]:
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


## Checking the Number of Dimensions

The ndim attribute returns the number of dimensions of an array as an integer.

In [8]:
import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3


# I/O with NumPy

ndarray objects can be saved to and loaded from files on disk. The available I/O functions include:
- load() and save(), which work with NumPy binary files (with the .npy extension).
- loadtxt() and savetxt(), which work with standard text files.


![image.png](attachment:image.png)

##  numpy.save()
The numpy.save() function stores the input array in a disk file with the .npy extension.

In [9]:
import numpy as np 
a = np.array([1,2,3,4,5]) 
np.save('outfile',a)

To reconstruct the array from outfile.npy, use the load() function.

In [11]:
import numpy as np 
b = np.load('outfile.npy') 
print(b) 

[1 2 3 4 5]


The save() and load() functions accept an additional Boolean parameter, allow_pickle. Pickling is Python's mechanism for serializing and de-serializing objects, and it is needed when an array contains Python objects rather than plain numeric data.
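
As a rough illustration of when pickling matters, the sketch below saves an object array and loads it back. The file name `mixed.npy` and the variable names are hypothetical, and a temporary directory is used so nothing is left on disk. In recent NumPy versions load() does not allow pickled data by default, so the first load attempt is expected to fail; older versions may behave differently.

```python
import os
import tempfile

import numpy as np

# An object array holds arbitrary Python objects, so saving it requires pickling
mixed = np.array([{"id": 1}, "text", 3.5], dtype=object)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mixed.npy")  # hypothetical file name
    np.save(path, mixed, allow_pickle=True)

    try:
        np.load(path)  # relies on the default value of allow_pickle
        loaded_without_pickle = True
    except ValueError:
        loaded_without_pickle = False

    # Explicitly allowing pickles makes the object array loadable
    restored = np.load(path, allow_pickle=True)

print(loaded_without_pickle)
print(restored)
```

Pickled files can execute code when loaded, so allow_pickle=True is best reserved for files from a trusted source.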

## savetxt()
The savetxt() and loadtxt() functions store and retrieve array data in a simple text file format.

In [14]:
import numpy as np 

a = np.array([1,2,3,4,5]) 
np.savetxt('out.txt',a) 
b = np.loadtxt('out.txt') 
print(b) 


[1. 2. 3. 4. 5.]


The savetxt() function accepts additional optional parameters such as header, footer, and delimiter; loadtxt() also accepts delimiter.
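
The optional parameters can be combined to write a simple comma-separated file. This is a minimal sketch under a few assumptions: the file name `table.csv`, the column names `x,y`, and the two-decimal format are made up for illustration.

```python
import os
import tempfile

import numpy as np

table = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "table.csv")  # hypothetical file name

    # The header is written as a comment line that starts with "# "
    np.savetxt(path, table, delimiter=",", header="x,y", fmt="%.2f")

    with open(path) as handle:
        text = handle.read()

    # loadtxt skips comment lines by default, so the header is ignored
    restored = np.loadtxt(path, delimiter=",")

print(text)
print(restored)
```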

# Indexing 

- Individual elements are accessed through indexing. 
- With numpy indexing, you can also extract entire rows, columns, or planes from multidimensional arrays. 
- The indexing begins at zero.
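
The points above can be sketched as follows. The arrays `grid` and `cube` are illustrative, and the `:` notation used to select a whole column is slicing syntax.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_row = grid[0]          # entire row at index 0
second_column = grid[:, 1]   # entire column at index 1
print(first_row)             # [0 1 2 3]
print(second_column)         # [1 5 9]

cube = np.arange(24).reshape(2, 3, 4)
second_plane = cube[1]       # one 3x4 plane from a 3-D array
print(second_plane.shape)    # (3, 4)
```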


![image.png](attachment:image.png)

## Indexing Using Index Arrays
Accessing groups of elements in a NumPy array by passing another array as the index is known as indexing using index arrays.
Any other sequence, such as a list, can also be used as the index.


In [17]:
import numpy as np
arr=np.arange(1,10,2) 
print("Elements of array: ",arr)
arr1=arr[np.array([4,0,2,-1,-2])]
print("Indexed Elements of array arr: ",arr1)

Elements of array:  [1 3 5 7 9]
Indexed Elements of array arr:  [9 1 5 9 7]


## Indexing in 1 dimension


In [20]:
import numpy as np 
arr1=np.arange(4)
print("Array arr1:",arr1)
print("Element at index 0 of arr1 is: ", arr1[0])
print("Element at index 1 of arr1 is: ", arr1[1])

Array arr1: [0 1 2 3]
Element at index 0 of arr1 is:  0
Element at index 1 of arr1 is:  1


## Indexing in 2 Dimensions

In [24]:
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1:\n",arr1)
print("Element at row index 0 and column index 0 of arr1 is:",arr1[0,0]) 
print("Element at row index 1 and column index 2 of arr1 is:", arr1[1,2])

Array arr1:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Element at row index 0 and column index 0 of arr1 is: 0
Element at row index 1 and column index 2 of arr1 is: 6


## Indexing in 3 Dimensions


In [25]:
import numpy as np 
arr=np.arange(12) 
arr1=arr.reshape(2,2,3) 
print("Array arr1:\n", arr1) 
print("Element:", arr1[1,0,2])

Array arr1:
 [[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Element: 8


# Slicing an Array

- To access a range of elements from a NumPy array, use slicing.
- Just as slicing a tuple, string, or list produces a subtuple, substring, or sublist, slicing an array produces a subarray.
- The same principles that apply to slicing Python lists also apply to NumPy arrays.


Syntax: arr_name[start:stop:step] 
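
A few variations of the start:stop:step syntax are sketched below. The stop index is exclusive, and any of the three parts can be omitted. The last lines show one difference from Python lists that is worth knowing: a NumPy slice is a view of the original data, not a copy.

```python
import numpy as np

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(arr[2:8:2])  # [2 4 6]  every second element from index 2 up to 7
print(arr[:3])     # [0 1 2]  start omitted
print(arr[7:])     # [7 8 9]  stop omitted
print(arr[::-1])   # [9 8 7 6 5 4 3 2 1 0]  negative step reverses

# Modifying a slice also modifies the original array
view = arr[:3]
view[0] = 99
print(arr[0])      # 99
```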



## Slicing 1D NumPy Arrays

In [26]:
import numpy as np 
arr = np.arange(6) 
print("array arr:",arr)
print("sliced element of array: ", arr[1:5])

array arr: [0 1 2 3 4 5]
sliced element of array:  [1 2 3 4]


## Slicing a 2D Array


In [28]:
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1: \n",arr1)
print("\n")
print("rows from index 1 onward, columns at index 1 through 3 \n", arr1[1:,1:4])

Array arr1: 
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


rows from index 1 onward, columns at index 1 through 3 
 [[ 5  6  7]
 [ 9 10 11]]


# Broadcasting
Broadcasting describes how NumPy handles arrays of different shapes during arithmetic operations.
Element-wise operations normally require both arrays to have the same shape.
Thanks to broadcasting, NumPy still allows operations on arrays of different shapes, provided the shapes are compatible.
The smaller array is broadcast (virtually stretched) across the larger one so that both have the same shape.


![image.png](attachment:image.png)

In [30]:
import numpy as np 
a = np.array([[0.0,0.0,0.0],[10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]]) 
b = np.array([1.0,2.0,3.0])  
   
print('First array:') 
print(a) 
print('\n')  
   
print('Second array:') 
print(b) 
print('\n')  
   
print('First Array + Second Array' )
print( a + b)

First array:
[[ 0.  0.  0.]
 [10. 10. 10.]
 [20. 20. 20.]
 [30. 30. 30.]]


Second array:
[1. 2. 3.]


First Array + Second Array
[[ 1.  2.  3.]
 [11. 12. 13.]
 [21. 22. 23.]
 [31. 32. 33.]]
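
Broadcasting also works when both arrays need to be stretched, and it fails when the shapes cannot be matched. The sketch below shows one case of each; the array names are illustrative.

```python
import numpy as np

column = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])             # shape (3,)

# (3, 1) and (3,) are both stretched to (3, 3)
total = column + row
print(total)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Shapes (4, 3) and (4,) do not line up on the last axis
try:
    np.zeros((4, 3)) + np.arange(4)
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```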


# Structured arrays

A structured array organises data of different types and sizes in a single array.
Each element is a record made up of named fields.
Each field has its own data type and size.
A field is accessed by indexing the array with the field name, for example a['name']; dot notation such as a.name is available only for record arrays (np.recarray).

Structured array properties:

- All records in the array have the same number of fields.
- The field names are the same for every record.


In [33]:
# Demonstrate a structured array
import numpy as np
a = np.array([('Sana', 2, 21.0), ('Mansi', 7, 29.0)],
             dtype=[('name', (np.str_, 10)), ('age', np.int32), ('weight', np.float64)])
print(a)

[('Sana', 2, 21.) ('Mansi', 7, 29.)]
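
Once a structured array exists, its fields can be read by name. The sketch below rebuilds the same data under the illustrative name `people` and reads a few fields.

```python
import numpy as np

people = np.array(
    [('Sana', 2, 21.0), ('Mansi', 7, 29.0)],
    dtype=[('name', (np.str_, 10)), ('age', np.int32), ('weight', np.float64)],
)

print(people['name'])      # ['Sana' 'Mansi']
print(people['age'])       # [2 7]
print(people[0])           # the first record, with all three fields
print(people.dtype.names)  # ('name', 'age', 'weight')

# A field behaves like an ordinary array in calculations
print(people['weight'].mean())  # 25.0
```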


# Statistical Functions

Statistics is concerned with gathering and analysing data.
NumPy includes a number of statistical functions for this kind of analysis.


- np.amin() - Minimum value along a specified axis.
- np.amax() - Maximum value along a specified axis.
- np.mean() - Arithmetic mean of the data set.
- np.median() - Median value of the data set.
- np.ptp() - Range of values (peak to peak, maximum minus minimum) along an axis.
- np.std() - Standard deviation.
- np.var() - Variance.
- np.average() - Weighted average.
- np.percentile() - nth percentile of the data along the specified axis.
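
To show what "along a specified axis" means in practice, here is a small sketch on a 2-D array. The array `scores` and the weights are made-up values for illustration; axis=0 works down the columns and axis=1 works across the rows.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_min = np.amin(scores, axis=0)   # minimum of each column
row_max = np.amax(scores, axis=1)   # maximum of each row
row_range = np.ptp(scores, axis=1)  # max - min of each row
middle = np.percentile(scores, 50)  # 50th percentile of all values

print(col_min)    # [1 2 3]
print(row_max)    # [3 6]
print(row_range)  # [2 2]
print(middle)     # 3.5

# np.average() accepts weights: (1*3 + 2*2 + 3*1) / (3 + 2 + 1)
weighted = np.average([1, 2, 3], weights=[3, 2, 1])
print(weighted)
```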


![image.png](attachment:image.png)

- Mean - Compute the arithmetic mean along the specified axis.
                 np.mean([1,2,3,4,5])
- Median - Compute the median along the specified axis.
                 np.median([1,5,2,3,4])
- Range - Compute the range (maximum minus minimum) along the specified axis.
                 np.ptp([1,5,2,3,4])


- Standard deviation is the square root of the average of squared deviations from the mean. The function used for this is np.std().
      np.std([1,2,3,4])

- Variance is the average of squared deviations from the mean, i.e., mean(abs(x - x.mean())**2).
      Equivalently, the standard deviation is the square root of the variance.
      np.var([1,2,3,4])
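
The calls above can be run together to check the definitions. This sketch reuses the same input lists and recomputes the variance by hand; the names `manual_variance`, `variance`, and `std_dev` are illustrative.

```python
import numpy as np

print(np.mean([1, 2, 3, 4, 5]))    # 3.0
print(np.median([1, 5, 2, 3, 4]))  # 3.0
print(np.ptp([1, 5, 2, 3, 4]))     # 4

data = np.array([1, 2, 3, 4])
variance = np.var(data)            # 1.25
std_dev = np.std(data)             # about 1.118

# The variance is the mean of the squared deviations from the mean
manual_variance = np.mean((data - data.mean()) ** 2)
print(np.isclose(variance, manual_variance))   # True

# The standard deviation is the square root of the variance
print(np.isclose(std_dev, np.sqrt(variance)))  # True
```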


# Arithmetic with Arrays

Arithmetic operators act element by element on NumPy arrays.

In [34]:
import numpy as np
arr = np.arange(0,10)

In [35]:
arr + arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [36]:
arr * arr

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [37]:
arr - arr

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [38]:
# Dividing zero by zero raises a warning, not an error.
# The result at that position is nan.
arr/arr


array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [39]:
# Dividing a non-zero number by zero also raises a warning, not an error.
# The result at that position is infinity.
1/arr


array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])
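
If the warnings from the two divisions above are unwanted, there are a couple of ways to handle them. The sketch below shows two options, assuming that either nan/inf or a fallback value of 0.0 is acceptable for your analysis.

```python
import numpy as np

arr = np.arange(0, 10)

# Option 1: silence the warnings locally and accept nan/inf in the result
with np.errstate(divide='ignore', invalid='ignore'):
    ratio = arr / arr
    inverse = 1 / arr

# Option 2: divide only where the denominator is non-zero;
# skipped positions keep the value already stored in `out` (0.0 here)
safe_inverse = np.divide(1.0, arr, out=np.zeros(arr.shape), where=arr != 0)

print(np.isnan(ratio[0]))    # True
print(np.isinf(inverse[0]))  # True
print(safe_inverse[:3])      # [0.  1.  0.5]
```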

In [40]:
arr**3

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729], dtype=int32)