# NumPy

### Broadcasting
* Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes

### NumPy
NumPy is the fundamental package for numerical computing with Python. It contains, among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

In [0]:
import numpy as np   # Importing libraries

a = np.array([0, 1, 2])
b = np.array([5, 5, 5])

print("Matrix A\n", a)
print("Matrix B\n", b)

print("Regular matrix addition A+B\n", a + b)

print("Addition using Broadcasting A+5\n", a + 5)

Matrix A
 [0 1 2]
Matrix B
 [5 5 5]
Regular matrix addition A+B
 [5 6 7]
Addition using Broadcasting A+5
 [5 6 7]


### Broadcasting Rules
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

1. they are equal, or
2.  one of them is 1
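
The compatibility check above can be tried out directly. The following is only a small sketch: the names `col`, `row`, and `bad` are made up for this illustration, and `np.broadcast` is used simply to report the shape the result would have.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4)

# Trailing dimensions: 1 vs 4 -> compatible (one of them is 1).
# Next dimensions:     3 vs 1 -> compatible (one of them is 1).
result_shape = np.broadcast(col, row).shape
print(result_shape)                # (3, 4)
print(col + row)

# Trailing dimensions 3 vs 4 are neither equal nor 1, so this fails.
bad = np.ones((2, 3))
try:
    bad + row
    compatible = True
except ValueError as err:
    compatible = False
    print("Cannot broadcast:", err)
```

A missing leading dimension is treated as if it had length 1, which is why a shape `(4,)` array can be combined with a shape `(3, 1)` array.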


In [0]:
# Let's go for a 2D matrix
c = np.array([[0, 1, 2],[3, 4, 5],[6, 7, 8]])
d = np.array([[1, 2, 3],[1, 2, 3],[1, 2, 3]])

e = np.array([1, 2, 3])

print("Matrix C\n", c)
print("Matrix D\n", d)
print("Matrix E\n", e)

print("Regular matrix addition C+D\n", c + d)

print("Addition using Broadcasting C+E\n", c + e)

Matrix C
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
Matrix D
 [[1 2 3]
 [1 2 3]
 [1 2 3]]
Matrix E
 [1 2 3]
Regular matrix addition C+D
 [[ 1  3  5]
 [ 4  6  8]
 [ 7  9 11]]
Addition using Broadcasting C+E
 [[ 1  3  5]
 [ 4  6  8]
 [ 7  9 11]]


In [0]:
M = np.ones((3, 3))
print("Matrix M:\n",M)

Matrix M:
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [0]:
print("Dimension of M: ",M.shape)
print("Dimension of a: ",a.shape)
print("Addition using Broadcasting")
print(M + a)
# Broadcasting array with matrix

Dimension of M:  (3, 3)
Dimension of a:  (3,)
Addition using Broadcasting
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]


## All in one program

In [0]:
# Importing libraries
import timeit

# Record the start time
start = timeit.default_timer()   

# Defining a list
array_list = [10,11,15,19,21,32]      
array_np_list = []

# Print the list
print("Original List",array_list,"\n")   

# Defining a function
def prime(num):      
    if num > 1:     
        
        # check for factors
        # Iterating a range of numbers
        for i in range(2,num):    
            if (num % i) == 0:
                
                # Appending data to list
                array_np_list.append(num)           
                print(num,"is not a prime number (",i,"times",num//i,"is",num,")")
                
                # Terminating a loop run
                break         
        else:
            print(num,"is a prime number")
            
# Iterating a list
for item in array_list:
    
    # Calling a function
    prime(item)         

print("\nNon-prime List",array_np_list,"\n")

end = timeit.default_timer()

# Computing running time
print("Time Taken to run the program:",end - start, "seconds")       

Original List [10, 11, 15, 19, 21, 32] 

10 is not a prime number ( 2 times 5 is 10 )
11 is a prime number
15 is not a prime number ( 3 times 5 is 15 )
19 is a prime number
21 is not a prime number ( 3 times 7 is 21 )
32 is not a prime number ( 2 times 16 is 32 )

Non-prime List [10, 15, 21, 32] 

Time Taken to run the program: 0.0048212890000058906 seconds


### Note:
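* Python is a multi-paradigm language: it supports procedural, object-oriented, and functional styles
* There are two major versions, Python 2 and Python 3; Python 2 reached end of life in 2020, so use Python 3
* Blocks are delimited by indentation, not braces
* Variables need no explicit type declaration; types are determined at run time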

## Unvectorized vs Vectorized Implementations

In [0]:
# Importing libraries
import numpy as np

# Defining matrices
mat_a = [[6, 7, 8],[5, 4, 5],[1, 1, 1]]
mat_b = [[1, 2, 3],[1, 2, 3],[1, 2, 3]]

# Getting a row from matrix
def get_row(matrix, row):
    return matrix[row]

# Getting a column from matrix
def get_column(matrix, column_number):
    column = []
 
    for i in range(len(matrix)):
        column.append(matrix[i][column_number])
 
    return column

# Multiply a row with a column (dot product of two equal-length vectors)
def unv_dot_product(vector_one, vector_two):
    total = 0
 
    if len(vector_one) != len(vector_two):
        raise ValueError("vectors must have the same length")
 
    for i in range(len(vector_one)):
        product = vector_one[i] * vector_two[i]
        total += product
 
    return total

# Multiply two matrices
def matrix_multiplication(matrix_one, matrix_two):
    m_rows = len(matrix_one)
    p_columns = len(matrix_two[0])
    result = []
    
    for i in range(m_rows):
        row_result = []
 
        for j in range(p_columns):
            row = get_row(matrix_one, i)
            column = get_column(matrix_two, j)
            product = unv_dot_product(row, column)
            
            row_result.append(product) 
        result.append(row_result)
        
    return result

print("Matrix A: ", mat_a,"\n")
print("Matrix B: ", mat_b,"\n")

print("Unvectorized Matrix Multiplication\n",matrix_multiplication(mat_a,mat_b),"\n")


Matrix A:  [[6, 7, 8], [5, 4, 5], [1, 1, 1]] 

Matrix B:  [[1, 2, 3], [1, 2, 3], [1, 2, 3]] 

Unvectorized Matrix Multiplication
 [[21, 42, 63], [14, 28, 42], [3, 6, 9]] 



In [0]:
# Vectorized Implementation
npm_a = np.array(mat_a)
npm_b = np.array(mat_b)

print("Vectorized Matrix Multiplication\n",npm_a.dot(npm_b),"\n") 
# For 2-D arrays, A.dot(B) computes the matrix product (same as A @ B)

Vectorized Matrix Multiplication
 [[21 42 63]
 [14 28 42]
 [ 3  6  9]] 



### Tip:
* Vectorization shortens the code and usually runs much faster, because the loops execute inside NumPy's compiled routines
* Prefer well-tested library routines over hand-written equivalents
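
To get a feel for the difference, the sketch below times a plain-Python dot product against `np.dot` on the same data. The helper `loop_dot`, the vector length `n`, and the number of repetitions are arbitrary choices made for this illustration; the measured times depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

def loop_dot(u, v):
    """Plain-Python dot product, one multiplication per loop iteration."""
    total = 0.0
    for x, y in zip(u, v):
        total += x * y
    return total

n = 100_000
u_list = [float(i) for i in range(n)]
v_list = [1.0] * n
u_arr = np.array(u_list)
v_arr = np.array(v_list)

# Both approaches should produce the same value.
loop_result = loop_dot(u_list, v_list)
vec_result = float(np.dot(u_arr, v_arr))

# Time 10 repetitions of each approach.
loop_time = timeit.timeit(lambda: loop_dot(u_list, v_list), number=10)
vec_time = timeit.timeit(lambda: np.dot(u_arr, v_arr), number=10)

print("Results agree:", np.isclose(loop_result, vec_result))
print("Loop time (10 runs):      ", loop_time, "seconds")
print("Vectorized time (10 runs):", vec_time, "seconds")
```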

## Essential Python Packages: NumPy, Pandas, Matplotlib

In [0]:
# Load library
import numpy as np

In [0]:
# Create row vector
vector = np.array([1, 2, 3, 4, 5, 6])
print("Vector:",vector)

# Select second element
print("Element 2 in Vector is",vector[1])

Vector: [1 2 3 4 5 6]
Element 2 in Vector is 2


In [0]:
# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print("Matrix\n",matrix)

# Select second row and third column
print("Second row of Matrix\n",matrix[1,:])
print("Third column of Matrix\n",matrix[:,2])

Matrix
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Second row of Matrix
 [4 5 6]
Third column of Matrix
 [3 6 9]


In [0]:
# Create Tensor
tensor = np.array([ [[[1, 1], [1, 1]], [[2, 2], [2, 2]]],
                    [[[3, 3], [3, 3]], [[4, 4], [4, 4]]] ])

print("Tensor shape\n",tensor.shape)

Tensor shape
 (2, 2, 2, 2)


### Matrix properties

In [0]:
# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print("Matrix Shape:",matrix.shape)
print("Number of elements:",matrix.size)
print("Number of dimensions:",matrix.ndim)
print("Average of matrix:",np.mean(matrix))
print("Maximum number:",np.max(matrix))
print("Minimum of each row:",np.min(matrix, axis=1))
print("Diagonal of matrix:",matrix.diagonal())
print("Determinant of matrix:",np.linalg.det(matrix))

Matrix Shape: (3, 3)
Number of elements: 9
Number of dimensions: 2
Average of matrix: 5.0
Maximum number: 9
Minimum of each row: [1 4 7]
Diagonal of matrix: [1 5 9]
Determinant of matrix: 0.0


### Matrix Operations

In [0]:
print("Flattened Matrix\n",matrix.flatten())
print("Reshaping Matrix\n",matrix.reshape(9,1))
print("Transposed Matrix\n",matrix.T)

Flattened Matrix
 [1 2 3 4 5 6 7 8 9]
Reshaping Matrix
 [[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
Transposed Matrix
 [[1 4 7]
 [2 5 8]
 [3 6 9]]


In [0]:
# Create matrix
matrix_a = np.array([[1, 1, 1],
                     [1, 1, 1],
                     [1, 1, 2]])

# Create matrix
matrix_b = np.array([[1, 3, 1],
                     [1, 3, 1],
                     [1, 3, 8]])

print("Matrix Addition\n",np.add(matrix_a, matrix_b))
print("Element-wise Multiplication\n",np.multiply(matrix_a, matrix_b))
print("Matrix Multiplication\n",np.dot(matrix_a, matrix_b))

Matrix Addition
 [[ 2  4  2]
 [ 2  4  2]
 [ 2  4 10]]
Element-wise Multiplication
 [[ 1  3  1]
 [ 1  3  1]
 [ 1  3 16]]
Matrix Multiplication
 [[ 3  9 10]
 [ 3  9 10]
 [ 4 12 18]]


In [0]:
x = np.arange(5) 
print(x)

[0 1 2 3 4]


In [0]:
x = np.arange(5, dtype = float)
print(x)

[0. 1. 2. 3. 4.]


In [0]:
# numbers with difference of 2
x = np.arange(10,20,2) 
print(x)

[10 12 14 16 18]


In [0]:
x = np.linspace(10,20,5) 
print(x)

[10.  12.5 15.  17.5 20. ]


## Basic statistics with NumPy

In [0]:
# X is a Python List
X = [32.32, 56.98, 21.52, 44.32, 55.63, 13.75, 43.47, 43.34]

# Sorting the data and printing it.
X.sort()
print(X)
# [13.75, 21.52, 32.32, 43.34, 43.47, 44.32, 55.63, 56.98]

# Using NumPy's built-in functions to Find Mean, Median, SD and Variance
mean = np.mean(X)
median = np.median(X)
sd = np.std(X)
variance = np.var(X)

# Printing the values
print("Mean", mean) # 38.91625
print("Median", median) # 43.405
print("Standard Deviation", sd) # 14.3815654029
print("Variance", variance) # 206.829423437

[13.75, 21.52, 32.32, 43.34, 43.47, 44.32, 55.63, 56.98]
Mean 38.91625
Median 43.405
Standard Deviation 14.381565402886432
Variance 206.8294234375
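
By default, `np.std` and `np.var` divide by N, which gives the population statistics shown above. When the data is a sample from a larger population, dividing by N - 1 is the usual choice, and both functions accept a `ddof` argument for that. A short sketch on the same data (the variable names are chosen only for this illustration):

```python
import numpy as np

X = [32.32, 56.98, 21.52, 44.32, 55.63, 13.75, 43.47, 43.34]

# Default ddof=0: divide by N (population statistics, as used above).
population_var = np.var(X)
population_sd = np.std(X)

# ddof=1: divide by N - 1 (sample statistics, Bessel's correction).
sample_var = np.var(X, ddof=1)
sample_sd = np.std(X, ddof=1)

print("Population variance:", population_var)
print("Sample variance:    ", sample_var)
print("Population SD:", population_sd)
print("Sample SD:    ", sample_sd)
```

With only 8 values, the two versions differ noticeably; the gap shrinks as the number of observations grows.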


### The tool min returns the minimum value along a given axis.

In [0]:
import numpy as np
my_array = np.array([[2, 5],
                     [3, 7],
                     [1, 3],
                     [4, 0]])

x=np.min(my_array, axis = 1)
print(x)
print(max(x))


[2 3 1 0]
3


### The tool max returns the maximum value along a given axis.

In [0]:
import numpy as np
my_array = np.array([[2, 5],
                     [3, 7],
                     [1, 3],
                     [4, 0]])

print(np.max(my_array, axis = 0))       # Output: [4 7]
print(np.max(my_array, axis = 1))       # Output: [5 7 3 4]
print(np.max(my_array, axis = None))    # Output: 7
print(np.max(my_array))                 # Output: 7

[4 7]
[5 7 3 4]
7
7
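
One way to remember the `axis` argument is that the axis you name is the one that disappears from the shape of the result. The sketch below illustrates this with a made-up array `grid`, and also shows `keepdims=True`, which keeps the collapsed axis with length 1.

```python
import numpy as np

grid = np.array([[2, 5],
                 [3, 7],
                 [1, 3],
                 [4, 0]])            # shape (4, 2)

col_max = np.max(grid, axis=0)      # collapses the 4 rows    -> shape (2,)
row_max = np.max(grid, axis=1)      # collapses the 2 columns -> shape (4,)

print(col_max.shape, col_max)       # (2,) [4 7]
print(row_max.shape, row_max)       # (4,) [5 7 3 4]

# keepdims=True keeps the collapsed axis with length 1, which is handy
# when the result should broadcast back against the original array.
row_max_2d = np.max(grid, axis=1, keepdims=True)   # shape (4, 1)
scaled = grid / row_max_2d          # each row divided by its own maximum
print(scaled)
```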


In [0]:
a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.

print(bool_idx)      # Prints "[[False False]
                     #          [ True  True]
                     #          [ True  True]]"

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])     # Prints "[3 4 5 6]"

[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]
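
Boolean masks can also be combined and used for assignment. The sketch below is one possible illustration; the names `middle`, `clipped`, and `labels` are made up here.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) / | (or); each comparison needs parentheses.
middle = a[(a > 2) & (a < 6)]
print(middle)                 # [3 4 5]

# A mask can also select elements for assignment. Work on a copy so the
# original array is left untouched.
clipped = a.copy()
clipped[clipped > 4] = 4
print(clipped)

# np.where builds a new array by choosing between two values per element.
labels = np.where(a % 2 == 0, "even", "odd")
print(labels)
```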


# NumPy Challenge
### You are given a 2-D array with dimensions N X M.
### Your task is to perform the min function over axis 1 and then find the max of that.
Sample Input (the first line gives N and M)

```
4 2
2 5
3 7
1 3
4 0
```

Sample Output

3

In [0]:
# Challenge solution
import numpy as np

# The first line of the sample input ("4 2") gives the dimensions N and M,
# so only the remaining four rows belong to the array.
a = np.array([[2, 5],
              [3, 7],
              [1, 3],
              [4, 0]])

m = np.min(a, axis = 1)
print("Minimum along axis 1 is : ", m)
print("Maximum element from the minimum array is : ", np.max(m))

Minimum along axis 1 is :  [2 3 1 0]
Maximum element from the minimum array is :  3


### Pandas

In [0]:
import pandas as pd

In [4]:
df=pd.read_csv("Income.csv" ,encoding = "ISO-8859-1")
print("Data\n")
df[:5]


Data



Unnamed: 0,id,State_Code,State_Name,State_ab,County,City,Place,Type,Primary,Zip_Code,Area_Code,ALand,AWater,Lat,Lon,Mean,Median,Stdev,sum_w
0,1011000,1,Alabama,AL,Mobile County,Chickasaw,Chickasaw city,City,place,36611,251,10894952,909156,30.77145,-88.079697,38773,30506,33101,1638.260513
1,1011010,1,Alabama,AL,Barbour County,Louisville,Clio city,City,place,36048,334,26070325,23254,31.708516,-85.611039,37725,19528,43789,258.017685
2,1011020,1,Alabama,AL,Shelby County,Columbiana,Columbiana city,City,place,35051,205,44835274,261034,33.191452,-86.615618,54606,31930,57348,926.031
3,1011030,1,Alabama,AL,Mobile County,Satsuma,Creola city,City,place,36572,251,36878729,2374530,30.874343,-88.009442,63919,52814,47707,378.114619
4,1011040,1,Alabama,AL,Mobile County,Dauphin Island,Dauphin Island,Town,place,36528,251,16204185,413605152,30.250913,-88.171268,77948,67225,54270,282.320328


In [6]:
df=pd.read_csv("Income.csv" ,encoding = "ISO-8859-1")
print("Data\n")
df[:]

Data



Unnamed: 0,id,State_Code,State_Name,State_ab,County,City,Place,Type,Primary,Zip_Code,Area_Code,ALand,AWater,Lat,Lon,Mean,Median,Stdev,sum_w
0,1011000,1,Alabama,AL,Mobile County,Chickasaw,Chickasaw city,City,place,36611,251,10894952,909156,30.771450,-88.079697,38773,30506,33101,1638.260513
1,1011010,1,Alabama,AL,Barbour County,Louisville,Clio city,City,place,36048,334,26070325,23254,31.708516,-85.611039,37725,19528,43789,258.017685
2,1011020,1,Alabama,AL,Shelby County,Columbiana,Columbiana city,City,place,35051,205,44835274,261034,33.191452,-86.615618,54606,31930,57348,926.031000
3,1011030,1,Alabama,AL,Mobile County,Satsuma,Creola city,City,place,36572,251,36878729,2374530,30.874343,-88.009442,63919,52814,47707,378.114619
4,1011040,1,Alabama,AL,Mobile County,Dauphin Island,Dauphin Island,Town,place,36528,251,16204185,413605152,30.250913,-88.171268,77948,67225,54270,282.320328
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32521,720296,72,Puerto Rico,PR,Adjuntas Municipio,Guaynabo,Adjuntas,Track,Track,970,787,589417,1691,18.397925,-66.130633,30649,13729,37977,1321.278082
32522,7202966,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,1801613,795887,18.385424,-67.203310,15520,9923,15541,238.813450
32523,7202976,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,11031227,0,18.356565,-67.180686,41933,34054,31539,313.551070
32524,7202986,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,0,33597561,18.412041,-67.213413,0,0,0,0.000000


In [8]:
df[(df['id'] >1011020) & (df['id'] < 1011050)]

Unnamed: 0,id,State_Code,State_Name,State_ab,County,City,Place,Type,Primary,Zip_Code,Area_Code,ALand,AWater,Lat,Lon,Mean,Median,Stdev,sum_w
3,1011030,1,Alabama,AL,Mobile County,Satsuma,Creola city,City,place,36572,251,36878729,2374530,30.874343,-88.009442,63919,52814,47707,378.114619
4,1011040,1,Alabama,AL,Mobile County,Dauphin Island,Dauphin Island,Town,place,36528,251,16204185,413605152,30.250913,-88.171268,77948,67225,54270,282.320328


In [9]:
print("Top Elements\n")
df.head(3)

Top Elements



Unnamed: 0,id,State_Code,State_Name,State_ab,County,City,Place,Type,Primary,Zip_Code,Area_Code,ALand,AWater,Lat,Lon,Mean,Median,Stdev,sum_w
0,1011000,1,Alabama,AL,Mobile County,Chickasaw,Chickasaw city,City,place,36611,251,10894952,909156,30.77145,-88.079697,38773,30506,33101,1638.260513
1,1011010,1,Alabama,AL,Barbour County,Louisville,Clio city,City,place,36048,334,26070325,23254,31.708516,-85.611039,37725,19528,43789,258.017685
2,1011020,1,Alabama,AL,Shelby County,Columbiana,Columbiana city,City,place,35051,205,44835274,261034,33.191452,-86.615618,54606,31930,57348,926.031


In [10]:
print("Bottom Elements\n")
df.tail(3)

Bottom Elements



Unnamed: 0,id,State_Code,State_Name,State_ab,County,City,Place,Type,Primary,Zip_Code,Area_Code,ALand,AWater,Lat,Lon,Mean,Median,Stdev,sum_w
32523,7202976,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,11031227,0,18.356565,-67.180686,41933,34054,31539,313.55107
32524,7202986,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,0,33597561,18.412041,-67.213413,0,0,0,0.0
32525,7202996,72,Puerto Rico,PR,Adjuntas Municipio,Aguadilla,Adjuntas,Track,Track,603,787,6476604,2717115,18.478094,-67.160453,28049,20229,33333,512.884803


In [12]:
print("Specific Column\n")
df['State_Name'].head(3)

Specific Column



0    Alabama
1    Alabama
2    Alabama
Name: State_Name, dtype: object

In [17]:
import numpy as np
print("Replace the placeholder value -999 with NaN\n")
df.replace(-999,np.nan)

Replace the placeholder value -999 with NaN



Unnamed: 0,id,State_Code,State_Name,State_ab,County,City,Place,Type,Primary,Zip_Code,Area_Code,ALand,AWater,Lat,Lon,Mean,Median,Stdev,sum_w
0,1011000,1,Alabama,AL,Mobile County,Chickasaw,Chickasaw city,City,place,36611,251,10894952,909156,30.771450,-88.079697,38773,30506,33101,1638.260513
1,1011010,1,Alabama,AL,Barbour County,Louisville,Clio city,City,place,36048,334,26070325,23254,31.708516,-85.611039,37725,19528,43789,258.017685
2,1011020,1,Alabama,AL,Shelby County,Columbiana,Columbiana city,City,place,35051,205,44835274,261034,33.191452,-86.615618,54606,31930,57348,926.031000
3,1011030,1,Alabama,AL,Mobile County,Satsuma,Creola city,City,place,36572,251,36878729,2374530,30.874343,-88.009442,63919,52814,47707,378.114619
4,1011040,1,Alabama,AL,Mobile County,Dauphin Island,Dauphin Island,Town,place,36528,251,16204185,413605152,30.250913,-88.171268,77948,67225,54270,282.320328
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32521,720296,72,Puerto Rico,PR,Adjuntas Municipio,Guaynabo,Adjuntas,Track,Track,970,787,589417,1691,18.397925,-66.130633,30649,13729,37977,1321.278082
32522,7202966,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,1801613,795887,18.385424,-67.203310,15520,9923,15541,238.813450
32523,7202976,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,11031227,0,18.356565,-67.180686,41933,34054,31539,313.551070
32524,7202986,72,Puerto Rico,PR,Adjuntas Municipio,Aguada,Adjuntas,Track,Track,602,787,0,33597561,18.412041,-67.213413,0,0,0,0.000000
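
Note that `DataFrame.replace` returns a new DataFrame by default, so the result has to be assigned for the change to persist. The sketch below uses a tiny made-up frame (`demo`, with hypothetical values) rather than `Income.csv`, and assumes that `-999` marks missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data; -999 is assumed to be a missing-value marker.
demo = pd.DataFrame({"Mean": [38773, -999, 54606],
                     "Median": [30506, 19528, -999]})

cleaned = demo.replace(-999, np.nan)   # returns a new DataFrame

print(demo["Mean"].tolist())           # original is unchanged
print(cleaned.isna().sum())            # one NaN per column
print(cleaned["Mean"].mean())          # NaN is skipped by default
```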
