# NumPy Basics

## Today's Outline:
- Introduction to NumPy
- NumPy Functions
- [Practical Exercises](https://www.w3resource.com/python-exercises/numpy/index.php)
- Case Study

==========

## Introduction to NumPy
- Get Help!
    - Jupyter Help
    - API Reference
    - Kaggle
    - Stack Overflow
- Why NumPy?
- NumPy in Data Science
    - Practical Examples

### Why NumPy?
- Numerical Python (NumPy Array)
- Very Useful for Mathematical Operations
- Multi-Dimensional Arrays (Linear Algebra)
- Easy & Fast
- Mutable / Homogeneous / Indexable
- Free & Open-source
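The "Mutable / Homogeneous / Indexable" properties above can be seen in a few lines. This is a minimal sketch with illustrative variable names: mixed inputs are upcast to a single dtype (homogeneous), an element is changed in place (mutable), and elements are read by position (indexable).

```python
import numpy as np

# Homogeneous: every element shares one dtype, so mixed numbers are upcast to float
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)

# Mixing numbers and strings turns every element into a string
mixed_types = np.array([1, "a"])
print(mixed_types)

# Mutable: elements can be changed in place (illustrative array)
arr = np.array([10, 20, 30])
arr[0] = 99

# Indexable: positive and negative indices work like Python lists
print(arr[0], arr[-1])
```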

#### NumPy Documentation

https://numpy.org/doc/stable/reference/index.html

Download a cheat sheet from here:
- https://s3.amazonaws.com/dq-blog-files/numpy-cheat-sheet.pdf
- https://datacamp-community-prod.s3.amazonaws.com/e9f83f72-a81b-42c7-af44-4e35b48b20b7

==========

#### NumPy Arrays vs. Python Lists

In [None]:
# Let's see the importance of NumPy Arrays, and their advantages over Python Lists

# Defining two lists
a = [1,2,3]
b = [4,5,6]
print(a)
print(b)

In [None]:
# The '+' operator concatenates lists; it does not add them element-wise
a + b

In [None]:
# To add two lists element-wise, you have to do one of the following

# Element-wise addition of lists using indexing
c = [(a[0]+b[0]), (a[1]+b[1]), (a[2]+b[2])]
print(c)

# Element-wise addition of lists using iteration
d = []
for i in range(len(a)):
    d.append(a[i]+b[i])
print(d)

In [None]:
# Also, the '/' operator is not supported for lists (this cell raises a TypeError)
a/b

In [None]:
# Now, let's use NumPy Arrays to solve these problems
import numpy as np
np_a = np.array([1,2,3])
np_b = np.array([4,5,6])
print(np_a)
print(np_b)

In [None]:
# You can now do the element-wise addition easily
np_c = np_a + np_b
print(np_c)

In [None]:
# And the element-wise division too
np_d = np_a / np_b
print(np_d)
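The same contrast shows up with `*` and with comparison operators. A small sketch (illustrative names) of how a Python list and a NumPy array react differently to the same operator:

```python
import numpy as np

list_a = [1, 2, 3]
np_a = np.array([1, 2, 3])

# '*' on a list repeats the list
repeated = list_a * 2        # [1, 2, 3, 1, 2, 3]

# '*' on an array multiplies every element
doubled = np_a * 2           # array([2, 4, 6])

# Comparisons are element-wise too, producing a boolean array
mask = np_a > 1              # array([False,  True,  True])
```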

### NumPy in Data Science
- Data Preprocessing
- Linear Regression (LU Decomposition / SVD)
- Encoding Variables (Categorical)
- Recommender Systems
- Principal Component Analysis (PCA) / Dimensionality Reduction
- Deep Learning & Neural Networks
- And much more...

==========

## NumPy Functions
- Importing NumPy
- Creating & Initializing NumPy Arrays
- Inspecting Properties
- NumPy Data Types
- Indexing, Slicing, & Subsetting
- Adding & Removing Elements
- Combining & Splitting
- Copy, Sorting, Reshaping
- Scalar & Vector Math
- Statistics
- Broadcasting & Typecasting (ufuncs)

### Importing NumPy

In [None]:
import numpy

In [None]:
# Always try to use this convention
import numpy as np

In [None]:
# You can import a specific function from NumPy
from numpy import array 

In [None]:
# Also, you can import a specific function in any sub-module
from numpy.linalg import det
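As a quick check that the sub-module import works, here is a sketch computing the determinant of a small illustrative matrix with `det`; `inv`, also from `numpy.linalg`, is imported the same way. Because of floating-point rounding, the results are compared with `np.isclose` / `np.allclose` rather than `==`.

```python
import numpy as np
from numpy.linalg import det, inv

m = np.array([[1.0, 2.0], [3.0, 4.0]])   # illustrative matrix

# Determinant: 1*4 - 2*3 = -2 (up to floating-point rounding)
d = det(m)

# Inverse: m @ inv(m) should be (approximately) the identity matrix
m_inv = inv(m)
print(d)
print(m @ m_inv)
```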

==========

### Creating & Initializing NumPy Arrays

| Operator                    | Description                                                          |
|-----------------------------|----------------------------------------------------------------------|
| np.array([1,2,3])           | 1-D array                                                            |
| np.array([(1,2,3),(4,5,6)]) | 2-D array                                                            |
| np.arange(start,stop,step)  | Array of evenly spaced values from start up to (not including) stop  |
| np.linspace(0,2,9)          | Array of 9 evenly spaced values over the interval [0, 2]             |
| np.zeros((1,2))             | Creates an array filled with zeros                                   |
| np.ones((1,2))              | Creates an array filled with ones                                    |
| np.random.random((5,5))     | Creates an array of random floats in [0, 1)                          |
| np.empty((2,2))             | Creates an uninitialized array (its contents are arbitrary)          |

In [None]:
# Importing NumPy module
import numpy as np

In [None]:
from IPython.display import Image
Image("data/ndarrays.png")

In [None]:
# Creating a 0-D NumPy Array
arr0 = np.array(40)
arr0

In [None]:
# Creating a 1-D NumPy Array
arr1 = np.array([1,2,3])
arr1

In [None]:
# Creating a 2-D NumPy Array
arr2 = np.array([[1.0,2.3,4.5],[8.6,1.2,7.3]])
arr2

In [None]:
# Creating a 3-D NumPy Array using the ndmin argument
arr3 = np.array([4,5,6,7], ndmin=3) # Result has shape (1, 1, 4): [[[4, 5, 6, 7]]]
arr3

In [None]:
# Another way for creating a 3-D NumPy Array
arr3a = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
arr3a

In [None]:
# Creating a NumPy Array of zeros with 3 elements
np.zeros(3)

In [None]:
# Creating a NumPy Array full of ones with 3x4 dimensions
np.ones((3,4))

In [None]:
# Creating a NumPy Array for a 3x3 identity matrix 
np.identity(3)

In [None]:
# Another way for creating a NumPy Array for a 3x3 identity matrix 
np.eye(3)

In [None]:
# Creating a NumPy Array full of a specific number
np.full((3,4), 8)

In [None]:
# Creating an uninitialized NumPy Array: its contents are whatever values happen to be in memory (garbage data)
np.empty((2,3))

In [None]:
# Creating a NumPy Array from a range
np.arange(5)

In [None]:
# Another way: create a range by specifying the start, the stop (exclusive), & the step, then reshape the result
np.arange(4,20,2).reshape(4,2)

In [None]:
# Creating a NumPy Array of 5 evenly spaced values between two points (both endpoints included)
np.linspace(0,10,5)
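`np.arange` and `np.linspace` are easy to mix up: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default. The NumPy documentation recommends `linspace` when the step is not an integer, since floating-point rounding can make the length of an `arange` result hard to predict. A sketch with illustrative names:

```python
import numpy as np

# linspace: you choose the NUMBER of points; the stop value is included by default
lin = np.linspace(0, 10, 5)                        # [0., 2.5, 5., 7.5, 10.]

# arange: you choose the STEP; the stop value is excluded
ar = np.arange(0, 10, 2.5)                         # [0., 2.5, 5., 7.5]

# endpoint=False makes linspace exclude the stop value as well
lin_open = np.linspace(0, 10, 5, endpoint=False)   # [0., 2., 4., 6., 8.]
```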

In [None]:
# Creating a random NumPy Array with values in the interval [0, 1)
np.random.rand(4,5)

In [None]:
# Creating a random integer NumPy Array with values from 0 up to (but not including) 10
np.random.randint(10, size=(4,5))

In [None]:
# Creating a random NumPy Array from a normal distribution with mean 1 and standard deviation 2 (5 samples)
np.random.normal(1,2,5)

In [None]:
# Creating a random NumPy Array of floats from the standard normal distribution (mean 0, standard deviation 1)
np.random.randn(6)
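The cells above use the legacy `np.random` functions. NumPy also provides a newer `Generator` interface through `np.random.default_rng`; the sketch below draws the same kinds of samples and shows that fixing the seed makes results reproducible. The seed 42 and the variable names are arbitrary choices, and this is an alternative to the functions above, not something the rest of the notebook relies on.

```python
import numpy as np

# A Generator seeded with a fixed (arbitrary) number gives reproducible results
rng = np.random.default_rng(seed=42)

uniform = rng.random((4, 5))                 # floats in [0, 1)
integers = rng.integers(0, 10, size=(4, 5))  # integers in [0, 10)
normal = rng.normal(loc=1, scale=2, size=5)  # mean 1, standard deviation 2
std_normal = rng.standard_normal(6)          # mean 0, standard deviation 1

# Re-seeding with the same value reproduces the same numbers
again = np.random.default_rng(seed=42).random((4, 5))
```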

In [None]:
# The *_like functions create a new array with the same shape and dtype as a given array
# np.zeros_like(), np.ones_like(), np.empty_like(), np.full_like()
arr2_like = np.zeros_like(arr2)
arr2_like
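Note that the `*_like` functions copy the *dtype* as well as the shape, which can silently change a fill value. A sketch with an illustrative integer template array:

```python
import numpy as np

int_arr = np.array([[1, 2, 3], [4, 5, 6]])   # illustrative integer template

# *_like functions copy the shape AND the dtype of the template array
ones = np.ones_like(int_arr)          # integer array of ones, shape (2, 3)
filled = np.full_like(int_arr, 7.9)   # dtype stays integer, so 7.9 becomes 7

# Pass dtype explicitly to override the inherited dtype
filled_float = np.full_like(int_arr, 7.9, dtype=np.float64)
```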

==========

### Inspecting Properties

| Syntax             	| Description                	|
|--------------------	|----------------------------	|
| array.shape        	| Dimensions (Rows,Columns)  	|
| len(array)         	| Length of the first dimension 	|
| array.ndim         	| Number of Array Dimensions 	|
| array.dtype        	| Data Type                  	|
| array.astype(type) 	| Converts to Data Type      	|
| type(array)        	| Type of Array              	|

In [None]:
arr1

In [None]:
arr2

In [None]:
# Finding the number of dimensions of the 1-D NumPy Array
arr1.ndim

In [None]:
# Finding the number of dimensions of the 2-D NumPy Array
arr2.ndim

In [None]:
# Finding the number of elements in each dimension
arr1.shape

In [None]:
# Finding the number of elements in a specific dimension
arr2.shape[1]

In [None]:
# Testing whether any / all elements are truthy (non-zero)
x = np.array([1,0,1,0,0,1])
x.any(), x.all()
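`len()` and `.shape` are easy to confuse with the total element count. A short sketch using an illustrative 2x3 float64 array, which also shows the memory-related attributes `itemsize` and `nbytes`:

```python
import numpy as np

arr = np.zeros((2, 3), dtype=np.float64)   # illustrative array

print(arr.shape)     # (2, 3)
print(len(arr))      # 2  -> length of the FIRST dimension only
print(arr.size)      # 6  -> total number of elements
print(arr.itemsize)  # 8  -> bytes per element (float64 = 8 bytes)
print(arr.nbytes)    # 48 -> size * itemsize
```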

==========

### NumPy Data Types

| Data Types 	|                                    Description                                   	|
|:----------:	|:--------------------------------------------------------------------------------:	|
| bool_      	| Boolean (True or False) stored as a byte                                         	|
| int_       	| Default integer type (same as C long; normally either int64 or int32)            	|
| intc       	| Identical to C int (normally int32 or int64)                                     	|
| intp       	| Integer used for indexing (same as C ssize_t; normally either int32 or int64)    	|
| int8       	| Byte (-128 to 127)                                                               	|
| int16      	| Integer (-32768 to 32767)                                                        	|
| int32      	| Integer (-2147483648 to 2147483647)                                              	|
| int64      	| Integer (-9223372036854775808 to 9223372036854775807)                            	|
| uint8      	| Unsigned integer (0 to 255)                                                      	|
| uint16     	| Unsigned integer (0 to 65535)                                                    	|
| uint32     	| Unsigned integer (0 to 4294967295)                                               	|
| uint64     	| Unsigned integer (0 to 18446744073709551615)                                     	|
| float_     	| Shorthand for float64 (alias removed in NumPy 2.0; use float64)                  	|
| float16    	| Half precision float: sign bit, 5 bits exponent, 10 bits mantissa                	|
| float32    	| Single precision float: sign bit, 8 bits exponent, 23 bits mantissa              	|
| float64    	| Double precision float: sign bit, 11 bits exponent, 52 bits mantissa             	|
| complex_   	| Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)            	|
| complex64  	| Complex number, represented by two 32-bit floats (real and imaginary components) 	|
| complex128 	| Complex number, represented by two 64-bit floats (real and imaginary components) 	|
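The ranges and bit layouts in the table can be queried at runtime with `np.iinfo` (integers) and `np.finfo` (floats). The sketch also shows one consequence of fixed-size integers: array arithmetic that leaves the range wraps around silently. The int8 example is illustrative.

```python
import numpy as np

# Integer limits, matching the int8 row of the table
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127

# Exponent and mantissa bits, matching the float16/float32/float64 rows
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.nexp, info.nmant)

# int8 array arithmetic wraps around when a result leaves [-128, 127]
small = np.array([127], dtype=np.int8)
wrapped = small + 1
print(wrapped)   # [-128]
```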

In [None]:
# Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
np.float64

In [None]:
# Finding the data type of our NumPy Array
arr2.dtype

In [None]:
# Using the dtype argument of np.array() to set the data type
arr_data = np.array([[1,2,3],[4,5,6]], dtype = np.float64) # You can also pass the string 'float64'
arr_data

In [None]:
# Creating a NumPy Array full of NaN values (Not a Number)
an_array = np.empty((3,3))
an_array[:] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
print(an_array)

In [None]:
# Converting the data type of an array using the astype() method
arr3.astype(np.float64).dtype
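`astype()` returns a new array and leaves the original untouched. When converting floats to integers it truncates toward zero rather than rounding. A sketch with illustrative values; note that `np.round` rounds halves to the nearest even number, so 2.5 becomes 2.

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])   # illustrative values

# astype returns a NEW array; float -> int conversion truncates toward zero
as_int = floats.astype(np.int64)              # [ 1, -1,  2]

# Round first if you want the nearest integer (halves go to the even neighbour)
rounded = np.round(floats).astype(np.int64)   # [ 2, -2,  2]
```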

==========

### Indexing, Slicing, & Subsetting

| Operator        	| Description                                  	|
|-----------------	|----------------------------------------------	|
| array[i]        	| 1d array at index i                          	|
| array[i,j]      	| 2d array at index[i][j]                      	|
| array[array<4]  	| Boolean indexing: selects elements less than 4 	|
| array[0:3]      	| Select items of index 0, 1 and 2             	|
| array[0:2,1]    	| Select items of rows 0 and 1 at column 1     	|
| array[:1]       	| Select items of row 0 (equals array[0:1, :]) 	|
| array[1:2, :]   	| Select items of row 1                        	|
| array[1,...]    	| Select items of row 1 (... stands for the remaining axes) 	|
| array[ : :-1]   	| Reverses array                               	|

In [None]:
from IPython.display import Image
Image("data/index.png")

In [None]:
arr1

In [None]:
# Modifying a specific element in the array by indexing it
arr1[1] = 5
arr1

In [None]:
# You can get the last element by index -1
arr1[-1]

In [None]:
arr2

In [None]:
# Indexing an element in the 2-D NumPy Array
arr2[1][2]

In [None]:
# Another way for indexing the 2-D NumPy Array
arr2[1,2]

In [None]:
# Slicing the 1-D NumPy Array to get multiple elements at once
arr1[0:2]

In [None]:
# Getting all of the elements
arr1[:]

In [None]:
# Slicing the 2-D NumPy Array: all rows of column 2
arr2[:,2]

In [None]:
# We can also skip every other column by using a step of 2
arr2[0:3,::2]

In [None]:
# Boolean indexing would be helpful in getting specific elements
arr2[arr2 < 5]
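Python's `and` / `or` do not work on arrays (they raise a `ValueError` about an ambiguous truth value), so conditions are combined with the element-wise operators `&`, `|` and `~`. A sketch on an illustrative array; the parentheses are needed because `&` and `|` bind more tightly than `<` and `>`.

```python
import numpy as np

arr = np.arange(10)   # illustrative array: [0 1 2 ... 9]

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
between = arr[(arr > 2) & (arr < 7)]    # [3 4 5 6]
outside = arr[(arr < 2) | (arr > 8)]    # [0 1 9]
not_even = arr[~(arr % 2 == 0)]         # [1 3 5 7 9]

# A boolean mask on the left-hand side modifies the selected elements in place
arr[arr > 7] = 0
```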

In [None]:
# Give it a try!
from IPython.display import Image
Image("data/slicing.png")

In [None]:
arr = np.arange(25).reshape(5, 5)
arr

In [None]:
# Get the last row
arr[-1]

In [None]:
# Get the last column (all rows)
arr[:, -1]

In [None]:
# Fancy indexing: all rows, columns 1 and 3
arr[:, [1, 3]]

In [None]:
# All rows, every other column starting from column 1
arr[:, 1::2]

In [None]:
# Rows 1 and 3, columns 0 and 2
arr[1::2, :4:2]
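The last slice picks a 2x2 block. Passing two integer lists instead does something different: they are paired element by element. `np.ix_` turns the lists into an "open mesh" so they select the full block, matching the slice. A sketch with illustrative names:

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)

# Slicing: rows 1 and 3, columns 0 and 2 -> a 2x2 block
block_slice = arr[1::2, :4:2]

# Two integer lists pair up element by element: positions (1, 0) and (3, 2)
pairs = arr[[1, 3], [0, 2]]

# np.ix_ builds an open mesh, so the lists select the full 2x2 block instead
block_ix = arr[np.ix_([1, 3], [0, 2])]
```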

==========

### Adding & Removing Elements

| Operator                     	| Description                            	|
|------------------------------	|----------------------------------------	|
| np.append(a,b)               	| Append items to array                  	|
| np.insert(array, 1, 2, axis) 	| Inserts the value 2 before index 1 along the given axis 	|
| np.resize(array, (2,4))      	| Resize array to shape (2,4)            	|
| np.delete(array, 1, axis)    	| Deletes the item at index 1 along the given axis 	|

In [None]:
# Appending an element (np.append returns a new array, so we reassign arr1)
arr1 = np.append(arr1,4.2)
arr1

In [None]:
# Removing the last element from the array
arr1 = np.delete(arr1,-1)
arr1

In [None]:
# Inserting an element in a specific position in the array
arr1 = np.insert(arr1,1,6.7)
arr1
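On a 2-D array, the `axis` argument decides whether whole rows or columns are inserted, deleted or appended; without it, NumPy flattens the array first. A sketch on an illustrative 2x3 array:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # illustrative: [[0 1 2], [3 4 5]]

# With axis given, whole rows/columns are inserted, deleted or appended
with_row = np.insert(m, 1, 0, axis=0)         # zero row inserted before row 1
without_col = np.delete(m, 0, axis=1)         # first column removed
appended = np.append(m, [[6, 7, 8]], axis=0)  # new row added at the bottom

# Without axis, the array is flattened first
flat_appended = np.append(m, 9)               # 1-D: [0 1 2 3 4 5 9]
```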

==========

### Combining & Splitting

| Operator                     	| Description                                             	|
|------------------------------	|---------------------------------------------------------	|
| np.concatenate((a,b),axis=0) 	| Concatenates 2 arrays, adds to end                      	|
| np.vstack((a,b))             	| Stacks arrays vertically (row-wise)                     	|
| np.hstack((a,b))             	| Stacks arrays horizontally (column-wise)                	|
| np.split(array, 3)           	| Splits an array into 3 equal sub-arrays                 	|
| np.array_split(array, 3)     	| Splits an array into 3 sub-arrays of (nearly) equal size 	|
| np.hsplit(array, 3)          	| Splits an array horizontally (by columns) into 3 equal sub-arrays 	|

In [None]:
# Defining new arrays
a = np.array([1,2,3])
b = np.array([4,5,6])

In [None]:
# Concatenating two arrays
c = np.concatenate((a,b))
c

In [None]:
# Combining two arrays vertically
c = np.vstack((a,b))
c

In [None]:
# Combining two arrays horizontally
c = np.hstack((a,b))
c

In [None]:
# Defining a new array for splitting
d = np.arange(12).reshape(4,3)
d

In [None]:
# Splitting an array into two arrays (along axis 0, i.e. by rows)
e,f = np.split(d,2)

In [None]:
e

In [None]:
f
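`np.split` only works when the array divides evenly; `np.array_split` tolerates uneven pieces, and `np.hsplit` splits by columns, either into N equal pieces or before given column indices. A sketch with illustrative names:

```python
import numpy as np

d = np.arange(12).reshape(4, 3)

# np.array_split allows uneven pieces
uneven = np.array_split(np.arange(7), 3)   # sizes 3, 2, 2

# hsplit with an integer N: N equal pieces along the columns (axis 1)
cols = np.hsplit(d, 3)                      # three (4, 1) arrays

# hsplit with a list: split BEFORE the given column indices
left, right = np.hsplit(d, [1])             # shapes (4, 1) and (4, 2)

# np.split raises ValueError when the division is not equal
try:
    np.split(np.arange(7), 3)
except ValueError as err:
    print("ValueError:", err)
```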

==========

### Copy, Sorting, Reshaping

In [None]:
a = np.array([1,2,3,4])
a

In [None]:
# Assignment does NOT copy: b is just another name for the same array
b = a

In [None]:
b

In [None]:
b[1] = 6

In [None]:
b

In [None]:
a

In [None]:
# a and b refer to the same object in memory, so their ids are identical
id(a), id(b)

In [None]:
# np.copy() creates a new array with its own copy of the data
b = np.copy(a) 
b

In [None]:
b[1] = 10
b

In [None]:
a
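Assignment and `np.copy()` are the two extremes; slicing sits in between. A sketch (illustrative names) showing that a basic slice is a *view* that shares the original array's memory, checked with `np.shares_memory`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Slicing returns a VIEW: a new array object that shares a's data
view = a[1:3]
view[0] = 100          # also changes a[1]; a is now [1, 100, 3, 4]

# .copy() allocates new memory, so changes stay local
independent = a.copy()
independent[0] = -1    # a is unchanged

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
```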

In [None]:
c = np.random.rand(4,3)
c

In [None]:
# Reshaping the array
c.reshape(6,2)

In [None]:
d = np.random.rand(4,3)
d

In [None]:
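# Resizing the array: unlike reshape, np.resize can change the total number of elements (it repeats the data to fill the new shape)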
np.resize(d,(6,3))
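A related shortcut: passing `-1` to `reshape` lets NumPy infer one dimension, and `ravel` flattens to 1-D. The sketch also contrasts `reshape`, which refuses to change the element count, with `np.resize`, which repeats data to fill a larger shape. Names are illustrative.

```python
import numpy as np

c = np.arange(12).reshape(4, 3)

# -1 lets NumPy infer one dimension from the total size (12 / 2 = 6)
c2 = c.reshape(-1, 2)          # shape (6, 2)

# ravel flattens to 1-D
flat = c.ravel()               # shape (12,)

# reshape cannot change the total number of elements...
try:
    c.reshape(6, 3)
except ValueError as err:
    print("ValueError:", err)

# ...but np.resize can: it repeats the data to fill the new shape
grown = np.resize(np.array([1, 2, 3]), (2, 4))   # [[1 2 3 1], [2 3 1 2]]
```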

In [None]:
# Sorting a NumPy Array (by default along the last axis, i.e. within each row)
z = np.array([[1,4],[3,1]])
np.sort(z)
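The `axis` argument controls what gets sorted, and `np.argsort` returns the indices that would sort an array, which is handy for ordering one array by another. A sketch with illustrative names; `np.sort` returns a sorted copy and leaves `z` unchanged.

```python
import numpy as np

z = np.array([[1, 4], [3, 1]])

by_row = np.sort(z)              # default axis=-1: [[1 4], [1 3]]
by_col = np.sort(z, axis=0)      # sort down each column: [[1 1], [3 4]]
flat = np.sort(z, axis=None)     # flatten, then sort: [1 1 3 4]

# argsort returns the indices that would sort the array (illustrative scores)
scores = np.array([30, 10, 20])
order = np.argsort(scores)           # [1 2 0]
descending = scores[order[::-1]]     # [30 20 10]
```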

==========

### Scalar & Vector Math

| Operator                	| Description                              	|
|-------------------------	|------------------------------------------	|
| np.add(x,y) or x + y      	| Addition                                 	|
| np.subtract(x,y) or x - y 	| Subtraction                              	|
| np.divide(x,y) or x / y   	| Division                                 	|
| np.multiply(x,y) or x * y 	| Element-wise multiplication              	|
| np.matmul(x,y) or x @ y   	| Matrix multiplication                    	|
| np.sqrt(x)              	| Square Root                              	|
| np.sin(x)               	| Element-wise sine                        	|
| np.cos(x)               	| Element-wise cosine                      	|
| np.log(x)               	| Element-wise natural log                 	|
| np.dot(x,y)             	| Dot product                              	|
| np.roots([1,0,-4])      	| Roots of a given polynomial coefficients 	|

In [None]:
a

In [None]:
b

In [None]:
# Scalar addition
np.add(a, 3)

In [None]:
# Vector addition
np.add(a, b)

In [None]:
# Calculating the square root of all elements in the array
np.sqrt(a)

In [None]:
# Dot product (sum of the element-wise products)
np.dot(a,b)

In [None]:
# Element-wise multiplication
a * b
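For 2-D arrays the difference between `*` and `@` matters: `*` multiplies matching elements, while `@` (and `np.dot`) performs matrix multiplication. The sketch uses two illustrative 2x2 matrices and also checks `np.roots` from the table; since the order of the returned roots is not something to rely on, it is sorted before comparing.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B         # same as np.multiply(A, B): [[ 5 12], [21 32]]
matrix = A @ B              # same as np.matmul(A, B):   [[19 22], [43 50]]
also_matrix = np.dot(A, B)  # for 2-D arrays, dot is matrix multiplication too

# Roots of x^2 - 4 (coefficients listed from the highest power down)
roots = np.roots([1, 0, -4])
```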

==========

### Statistics

| Operator             	| Description                      	|
|----------------------	|----------------------------------	|
| np.mean(array)       	| Mean                             	|
| np.median(array)     	| Median                           	|
| np.corrcoef(array)   	| Correlation Coefficient          	|
| np.std(array)        	| Standard Deviation               	|
| array.sum()          	| Array-wise sum                   	|
| array.min()          	| Array-wise minimum value         	|
| array.max(axis=0)    	| Maximum value of specified axis  	|
| array.cumsum(axis=0) 	| Cumulative sum of specified axis 	|

In [None]:
a = np.array([[2, 3], [0, 1]])

In [None]:
# Finding the minimum value of the array
np.min(a)

In [None]:
# Finding the position of the minimum value (an index into the flattened array)
np.argmin(a)

In [None]:
# Finding the maximum value of each column of the array
a.max(axis=0)
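`np.argmin` returns an index into the *flattened* array. `np.unravel_index` converts it back to a (row, column) pair, and the `axis` argument works for `argmin` and `cumsum` just as it does for `max`. A sketch with illustrative names, using the same 2x2 array:

```python
import numpy as np

a = np.array([[2, 3], [0, 1]])

flat_index = np.argmin(a)                          # 2 (index into a.ravel())
row, col = np.unravel_index(flat_index, a.shape)   # (1, 0)

col_min_rows = a.argmin(axis=0)   # row of the minimum in each column: [1 1]
running = a.cumsum(axis=0)        # cumulative sum down the columns: [[2 3], [2 4]]
```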

In [None]:
a = np.array([[1,2,3],[4,5,6]])

In [None]:
# Calculating the mean value of each row of the array
a.mean(axis=1)

In [None]:
# Calculating the standard deviation of each column with ddof=1 (sample standard deviation)
a.std(axis=0, ddof=1)
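`ddof` ("delta degrees of freedom") changes the divisor of the variance from N to N - ddof. Writing the formula out by hand for the same array makes the difference visible; this is a sketch for checking the result, not how NumPy computes it internally. Names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Squared deviations from each column's mean
n = a.shape[0]
squared_dev = (a - a.mean(axis=0)) ** 2

population_std = np.sqrt(squared_dev.sum(axis=0) / n)     # ddof=0 (the default)
sample_std = np.sqrt(squared_dev.sum(axis=0) / (n - 1))   # ddof=1
```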

In [None]:
# Give it a try!
from IPython.display import Image
Image("data/math.png")

- Compute the maximum of each row
- Compute the mean of each column
- The position of the overall minimum

In [None]:
arr = np.arange(-15, 15).reshape(5, 6) ** 2
arr

### Broadcasting & Typecasting

In [None]:
from IPython.display import Image
Image("data/broadcast.png")

In [None]:
arr1 = np.array([0,1,2])
arr1

In [None]:
arr2 = np.array([[0,0,0],[10,10,10],[20,20,20],[30,30,30]])
arr2

In [None]:
# Broadcasting & Typecasting
np.add(arr1, arr2, dtype = np.float64)

In [None]:
# Finding the mean of each column; dtype=np.int64 makes the result an integer array (any fractional part would be lost)
np.mean(arr2, axis = 0, dtype=np.int64)
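Broadcasting compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or one of them is 1. A sketch using the same (4, 3) array; `np.broadcast_shapes` (available since NumPy 1.20) reports the resulting shape without creating any arrays. Names are illustrative.

```python
import numpy as np

grid = np.array([[0, 0, 0], [10, 10, 10], [20, 20, 20], [30, 30, 30]])  # shape (4, 3)

# (4, 3) with (3,): trailing dimensions match, so the row is added to every row
row_added = grid + np.array([0, 1, 2])

# (4, 3) with (4,): trailing dimensions 3 and 4 differ -> ValueError
try:
    grid + np.array([1, 2, 3, 4])
except ValueError as err:
    print("ValueError:", err)

# Adding an axis turns it into a (4, 1) column, which is stretched across the columns
col = np.array([1, 2, 3, 4])[:, np.newaxis]
col_added = grid + col

print(np.broadcast_shapes((4, 3), (4, 1)))   # (4, 3)
```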

==========

## Case Study: NumPy Data Cleaning (Missing Data)

### Checking for Missing Values

In [None]:
import numpy as np

In [None]:
# Loading the file with no missing data 
lending_co_data_numeric = np.loadtxt("data/lending-company-numeric.csv", delimiter = ',')
lending_co_data_numeric

In [None]:
# Checking for the missing data in the file
np.isnan(lending_co_data_numeric)

In [None]:
# Counting the missing values (each True counts as 1)
np.isnan(lending_co_data_numeric).sum()

In [None]:
# Let's load the file that contains missing data
lending_co_data_numeric_NAN = np.genfromtxt("data/lending-company-numeric-nan.csv", delimiter = ';')
# Note: loadtxt() cannot handle missing values, so we use genfromtxt() instead (it stores them as NaN)
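To see what `genfromtxt()` does with an empty field without needing the case-study file, here is a sketch on a tiny made-up CSV string read through `io.StringIO` (the data is illustrative, not from the lending dataset). It also counts missing values per column with `sum(axis=0)`, locates them with `np.argwhere`, and previews the `filling_values` option used in the next section.

```python
import io
import numpy as np

# A tiny made-up CSV (semicolon-separated) with one empty field
csv_text = "1;2;3\n4;;6\n7;8;9\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=";")
missing_per_column = np.isnan(data).sum(axis=0)   # [0 1 0]
missing_positions = np.argwhere(np.isnan(data))   # [[1 1]]

# filling_values replaces the missing entries while parsing
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=";", filling_values=0)
```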

In [None]:
# Checking for the missing data in the file
np.isnan(lending_co_data_numeric_NAN)

In [None]:
np.isnan(lending_co_data_numeric_NAN).sum()

### Dealing with the Missing Data, by Zero Filling

In [None]:
# How about dealing with the missing data by filling all the NaN values with 0
lending_co_data_numeric_NAN = np.genfromtxt("data/lending-company-numeric-nan.csv", 
                                            delimiter = ';',
                                            filling_values = 0)

In [None]:
# Let's check for missing data again; there should be none now
np.isnan(lending_co_data_numeric_NAN)

In [None]:
np.isnan(lending_co_data_numeric_NAN).sum()

In [None]:
# And here is the final version of the data after filling the missing value
lending_co_data_numeric_NAN

### Substituting Missing Values with Mean/Max values

In [None]:
# We need to reload the dataset, since all the missing values were filled with 0
lending_co_data_numeric_NAN = np.genfromtxt("data/lending-company-numeric-nan.csv", 
                                            delimiter = ';')

In [None]:
# We pick a value greater than the maximum, so we can be certain it is unique in the dataset
temporary_fill = np.nanmax(lending_co_data_numeric_NAN).round(2) + 1

In [None]:
temporary_fill

In [None]:
# Filling up all the missing values with the temporary filler
lending_co_data_numeric_NAN = np.genfromtxt("data/lending-company-numeric-nan.csv", 
                                            delimiter = ';',
                                            filling_values = temporary_fill) 

In [None]:
np.isnan(lending_co_data_numeric_NAN)

In [None]:
np.isnan(lending_co_data_numeric_NAN).sum()

In [None]:
# Reimporting the dataset again
lending_co_data_numeric_NAN = np.genfromtxt("data/lending-company-numeric-nan.csv", delimiter = ';')
lending_co_data_numeric_NAN

In [None]:
# Storing the mean of every column (ignoring NaN values)
temporary_mean = np.nanmean(lending_co_data_numeric_NAN, axis = 0).round(2)

In [None]:
# The mean of the first column
temporary_mean[0]

In [None]:
# Creating a unique filler and using it to take care of all the missing values
temporary_fill = np.nanmax(lending_co_data_numeric_NAN).round(2) + 1

lending_co_data_numeric_NAN = np.genfromtxt("data/lending-company-numeric-nan.csv",
                                            delimiter = ';',
                                            filling_values = temporary_fill)

In [None]:
temporary_fill

In [None]:
# Mean including the temporary fillers (distorted by them)
np.mean(lending_co_data_numeric_NAN[:,0]).round(2) 

In [None]:
# Actual mean (w/o fillers)
temporary_mean[0]

In [None]:
# Going through the first column and substituting any temporary fillers (previously missing) with the mean for that column.
lending_co_data_numeric_NAN[:,0] = np.where(lending_co_data_numeric_NAN[:,0] == temporary_fill,
                                            temporary_mean[0], 
                                            lending_co_data_numeric_NAN[:,0])

In [None]:
# The new mean matches the original mean (up to rounding), since the fillers were replaced by that mean
np.mean(lending_co_data_numeric_NAN[:,0]).round(2)

In [None]:
# We're generalizing the filling from earlier and going through all the columns
for i in range(lending_co_data_numeric_NAN.shape[1]):        
    lending_co_data_numeric_NAN[:,i] = np.where(lending_co_data_numeric_NAN[:,i] == temporary_fill, 
                                                temporary_mean[i], 
                                                lending_co_data_numeric_NAN[:,i])
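The temporary-filler loop above works, but the same mean imputation can also be done directly on the NaN values with broadcasting, with no filler and no column loop. This is a swapped-in alternative, not the method used in this case study, sketched on a small hypothetical array rather than the lending dataset.

```python
import numpy as np

# Hypothetical small dataset with missing values
data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 8.0, 9.0]])

# Column means that ignore NaN: [4.0, 6.5, 6.0]
col_means = np.nanmean(data, axis=0)

# col_means (shape (3,)) broadcasts against every row of data (shape (3, 3))
imputed = np.where(np.isnan(data), col_means, data)
```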

In [None]:
# We can use the same approach for other tasks too (e.g. replacing all negative values with 0)
for i in range(lending_co_data_numeric_NAN.shape[1]):        
    lending_co_data_numeric_NAN[:,i] = np.where(lending_co_data_numeric_NAN[:, i] < 0,
                                                0, 
                                                lending_co_data_numeric_NAN[:,i])
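Because `np.where` works element-wise on whole arrays, the per-column loop is not strictly needed for replacing negatives. A sketch of three equivalent options on a small hypothetical array (not the lending dataset):

```python
import numpy as np

# Hypothetical data containing negative values
data = np.array([[1.5, -2.0], [-0.5, 4.0]])

# Option 1: np.where over the whole array at once (no column loop)
no_neg_where = np.where(data < 0, 0, data)

# Option 2: np.maximum compares element-wise against 0
no_neg_max = np.maximum(data, 0)

# Option 3: boolean-mask assignment modifies the array in place
data[data < 0] = 0
```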

In [None]:
lending_co_data_numeric_NAN

==========

# THANK YOU!