# Data Processing

#### Data is the most important ingredient in machine learning. Given a large and diverse set of training data, a good deep learning model can often significantly outperform non-deep learning algorithms.


# Getting Started with NumPy

#### The majority of neural networks use input data that is either numeric or has been converted to a numeric form. When we deal with numeric data, the go-to Python library is NumPy. It lets us perform many operations on numeric data and convert the data to more usable forms.
#### NumPy aims to provide an array object that is up to **50x faster** than traditional Python lists.
### It is common practice to import NumPy under the **alias** **np**, like this:
> import **numpy** as **np**

# NumPy Arrays

#### NumPy arrays are similar to Python lists, but they store elements of a single data type and come with many added features. You can easily convert a Python list to a NumPy array using the **np.array** function, which takes a Python list as its required argument.
#### The function also has quite a few keyword arguments, but the main one to know is **dtype**.
#### The function **np.array** performs **upcasting**: **if the list contains elements of different data types, all the elements are cast to the most general type among them (for example, mixing integers and floats produces a float array).**
#### The **dtype** keyword argument takes a NumPy type and manually casts the array to the specified type.
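#### As a minimal sketch of upcasting versus an explicit **dtype**, the snippet below builds two small arrays. The variable names `mixed` and `as_float32` are illustrative only.

```python
import numpy as np

# Integers mixed with a float are upcast to a floating-point array.
mixed = np.array([0, 0.1, 2])
print(mixed.dtype)  # float64

# The dtype keyword overrides the type NumPy would otherwise infer.
as_float32 = np.array([0, 1, 2], dtype=np.float32)
print(as_float32.dtype)  # float32
```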

In [1]:
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5]],
               dtype=np.float32)
print(repr(arr))

array([[0., 1., 2.],
       [3., 4., 5.]], dtype=float32)


## Copying

In [2]:
a = np.array([0, 1])
b = np.array([9, 8])
c = a
print('Array a: {}'.format(repr(a)))
c[0] = 5
print('Array a: {}'.format(repr(a)))

d = b.copy()
d[0] = 6
print('Array b: {}'.format(repr(b)))

Array a: array([0, 1])
Array a: array([5, 1])
Array b: array([9, 8])
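#### The cell above shows that plain assignment does not copy an array, while **copy()** does. One way to check this directly, sketched here with illustrative variable names, is to compare object identity and use **np.shares_memory**.

```python
import numpy as np

a = np.array([0, 1])
c = a            # c is just another name for the same array object
d = a.copy()     # d is an independent copy with its own data

print(c is a)                  # True
print(np.shares_memory(a, d))  # False

c[0] = 5   # visible through a, since c and a are the same object
d[1] = 7   # does not affect a
print(repr(a))  # array([5, 1])
```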


## Casting

In [3]:
arr = np.array([0, 1, 2])
print(arr.dtype)
arr = arr.astype(np.float32)
print(arr.dtype)

int64
float32


## NaN & Infinity

* Note that np.nan cannot take on an integer type.
* Note that np.inf cannot take on an integer type.
### Try it by uncommenting the code chunks below, one at a time

In [4]:
# NaN
# arr = np.array([np.nan, 1, 2])
# print(repr(arr))

# arr = np.array([np.nan, 'abc'])
# print(repr(arr))

# Will result in a ValueError
# np.array([np.nan, 1, 2], dtype=np.int32)


# Infinity
# print(np.inf > 1000000)

# arr = np.array([np.inf, 5])
# print(repr(arr))

# arr = np.array([-np.inf, 1])
# print(repr(arr))

# Will result in an OverflowError
# np.array([np.inf, 3], dtype=np.int32)
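#### The following is a small runnable sketch of the points above. It assumes nothing beyond NumPy itself; the list `errors` is only a helper for this illustration and records which exception each failed conversion raises.

```python
import numpy as np

# NaN and infinity are floating-point values, so the array becomes float64.
arr = np.array([np.nan, 1, np.inf])
print(arr.dtype)                # float64
print(np.isnan(arr).tolist())   # [True, False, False]
print(np.isinf(arr).tolist())   # [False, False, True]

# Forcing an integer dtype fails for both special values.
errors = []
for value in (np.nan, np.inf):
    try:
        np.array([value, 1], dtype=np.int32)
    except (ValueError, OverflowError) as exc:
        errors.append(type(exc).__name__)
print(errors)
```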

# Dimensions in Arrays

#### A dimension in arrays is one level of array depth (nested arrays).
### 0-D Arrays
#### 0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
### 1-D Arrays
#### An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.
**These are the most common and basic arrays.**
### 2-D Arrays
#### An array that has 1-D arrays as its elements is called a 2-D array.
**These are often used to represent matrices or 2nd-order tensors.**
>NumPy provides the numpy.linalg submodule for linear algebra operations on matrices.
### 3-D arrays
#### An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
**These are often used to represent a 3rd-order tensor.**
### Checking the Number of Dimensions
#### NumPy arrays provide the **ndim** attribute, an integer that tells us how many dimensions the array has.

In [5]:
# Import NumPy under the alias np
import numpy as np

# 0-D
d = np.array(42)
print("0-D numpy array {}\n".format(d))

# 1-D
D = np.array([1, 2, 3])
print("1-D numpy array {}\n".format(D))

# 2-D
DD = np.array([[1, 2, 4], [3, 4, 5]])
print("2-D numpy array {}\n".format(DD))

# 3-D
DDD = np.array([[[1, 2, 3], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]])
print("3-D numpy array {}\n".format(DDD))

# Check Number of Dimensions?
print("The dimensions of DDD is {}\n".format(DDD.ndim))

0-D numpy array 42

1-D numpy array [1 2 3]

2-D numpy array [[1 2 4]
 [3 4 5]]

3-D numpy array [[[ 1  2  3]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]

The dimensions of DDD is 3



# NumPy Ranged Data

#### To specify the number of elements in the returned array, rather than the step size, we can use the **np.linspace** function.
#### In other words, **np.linspace** returns evenly spaced values between a start and a stop value. The spacing is determined by how many values we ask for. For example, if we want 100 numbers between 1000 and 2500, the spacing is set automatically.
#### By default the stop value is **inclusive**; the argument **endpoint=False** makes it **exclusive**.

In [6]:
arr = np.linspace(5, 11, num=4)
print(repr(arr))

arr = np.linspace(5, 11, num=4, endpoint=False)
print(repr(arr))

arr = np.linspace(5, 11, num=4, dtype=np.int32)
print(repr(arr))

array([ 5.,  7.,  9., 11.])
array([5. , 6.5, 8. , 9.5])
array([ 5,  7,  9, 11], dtype=int32)
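#### The "100 numbers between 1000 and 2500" case mentioned above can be sketched as follows. The **retstep=True** keyword is used here only to display the spacing NumPy chose; the names `values` and `step` are illustrative.

```python
import numpy as np

# 100 evenly spaced values from 1000 to 2500 (both ends included).
# retstep=True additionally returns the spacing between values.
values, step = np.linspace(1000, 2500, num=100, retstep=True)

print(values.shape)           # (100,)
print(values[0], values[-1])  # 1000.0 2500.0
print(step)                   # roughly 15.15, i.e. 1500 / 99
```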


#### **np.arange** works much like Python's built-in **range()** function, except that it returns a NumPy array of **evenly spaced** values within a given interval and also accepts floating-point arguments.
#### Like np.array, np.arange performs upcasting. It also has the **dtype** keyword argument to manually cast the array.
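#### A few illustrative calls to **np.arange** are sketched below; the variable names are chosen only for this example. Note how a floating-point argument upcasts the whole result.

```python
import numpy as np

up_to_five = np.arange(5)        # stop only, like range(5)
float_stop = np.arange(5.1)      # float argument, so the result is float
start_stop = np.arange(-1, 4)    # start and stop
with_step = np.arange(-1.5, 4, 2)                 # start, stop, step
as_float32 = np.arange(0, 6, 2, dtype=np.float32)  # manual cast

print(repr(up_to_five))   # array([0, 1, 2, 3, 4])
print(repr(float_stop))   # array([0., 1., 2., 3., 4., 5.])
print(repr(start_stop))   # array([-1,  0,  1,  2,  3])
print(repr(with_step))    # array([-1.5,  0.5,  2.5])
print(repr(as_float32))   # array([0., 2., 4.], dtype=float32)
```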

# Reshaping data

#### You have probably worked with the **shape** attribute. The shape of an array is the number of elements in each dimension.
#### **np.reshape** lets you reshape an array to a new shape, such as (2, 4).
#### The product of the new dimensions must equal the number of elements in the array.
#### In the example below, the new shapes are (2, 4) and (2, 2, 2). In each case the product of the dimensions must equal **arr.size**, which is 8.
#### 2x4=8 and 2x2x2=8 are valid, **but** 2x2x2x2=16 is not. A dimension of **-1** tells NumPy to infer that dimension from the remaining ones.

In [7]:
arr = np.arange(8)
print(arr.shape)

reshaped_arr = np.reshape(arr, (2, 4))
reshaped_arr_one = np.reshape(arr, (2, 2, 2))
# reshaped_arr_one = np.reshape(arr, (2, 2, 2, 2))

print(repr(reshaped_arr_one))
print(repr(reshaped_arr))
print('New shape: {}'.format(reshaped_arr.shape))

reshaped_arr = np.reshape(arr, (-1, 2, 2))
print(repr(reshaped_arr))
print('New shape: {}'.format(reshaped_arr.shape))

(8,)
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
New shape: (2, 4)
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
New shape: (2, 2, 2)
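#### To see the size rule in action, the sketch below attempts the invalid (2, 2, 2, 2) reshape and catches the resulting error. The flag `reshape_failed` and the name `inferred` exist only for this illustration.

```python
import numpy as np

arr = np.arange(8)

try:
    np.reshape(arr, (2, 2, 2, 2))  # 16 slots requested for 8 elements
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('ValueError:', err)

# -1 asks NumPy to infer one dimension: 8 / (2 * 2) = 2
inferred = np.reshape(arr, (-1, 2, 2))
print(inferred.shape)  # (2, 2, 2)
```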


#### Since we often need to flatten data, NumPy provides a built-in array method for it. Flattening an array reshapes it into a 1-D array.
#### In other words, **arr.flatten()** returns a new 1-D array containing all the elements of **arr** in row-major order. The original array keeps its shape.

In [8]:
arr = np.arange(8)
arr = np.reshape(arr, (2, 4))
flattened = arr.flatten()
print(repr(arr))
print('arr shape: {}'.format(arr.shape))
print(repr(flattened))
print('flattened shape: {}'.format(flattened.shape))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
arr shape: (2, 4)
array([0, 1, 2, 3, 4, 5, 6, 7])
flattened shape: (8,)
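#### Because **flatten()** returns a copy, changing the flattened array leaves the original alone. A short sketch of that behaviour:

```python
import numpy as np

arr = np.reshape(np.arange(8), (2, 4))
flattened = arr.flatten()

flattened[0] = 100   # modify the flattened copy only
print(arr[0, 0])     # 0, the original is untouched
print(arr.shape)     # (2, 4)
```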


# Transposing

#### Just as it is common to reshape data, it is also common to **transpose** data. Perhaps we have data that is supposed to be in a particular format, but some new data arrives with its dimensions rearranged. We can transpose the data with the **np.transpose** function to convert it to the proper format.
#### **np.transpose** takes an optional argument called **axes**, for example np.transpose(**the array**, axes=(1, 0, 2)). If **axes** is omitted, the order of the dimensions is reversed.
#### **NOTE:** If the shape of a 3-D array is (2, 3, 2), then axes=(1, 0, 2) produces shape (3, 2, 2): the new first dimension is the old dimension 1 (size 3), the new second dimension is the old dimension 0 (size 2), and the last dimension stays in place (size 2).
#### In other words, each entry of **axes** names the position that dimension had in the original shape.


In [9]:
arr = np.arange(8)
print(repr(arr))
arr = np.reshape(arr, (4, 2))
transposed = np.transpose(arr)
print(repr(arr))
print('arr shape: {}'.format(arr.shape))
print(repr(transposed))
print('transposed shape: {}'.format(transposed.shape))

array([0, 1, 2, 3, 4, 5, 6, 7])
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
arr shape: (4, 2)
array([[0, 2, 4, 6],
       [1, 3, 5, 7]])
transposed shape: (2, 4)
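#### The cell above transposes a 2-D array without **axes**. For the 3-D case, here is a minimal sketch using a (2, 3, 2) array and axes=(1, 0, 2); the names `arr` and `swapped` are illustrative.

```python
import numpy as np

arr = np.reshape(np.arange(12), (2, 3, 2))
swapped = np.transpose(arr, axes=(1, 0, 2))

print(arr.shape)      # (2, 3, 2)
print(swapped.shape)  # (3, 2, 2)

# Element [i, j, k] of the result is element [j, i, k] of the original.
print(swapped[2, 1, 0] == arr[1, 2, 0])  # True
```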


# Math

## Arithmetic

#### Using NumPy arithmetic, we can easily modify large amounts of numeric data with only a few operations. For example, we could convert a dataset of Fahrenheit temperatures to their equivalent Celsius form.

In [10]:
def f2c(temps):
  return (5/9)*(temps-32)

fahrenheits = np.array([32, -4, 14, -40])
celsius = f2c(fahrenheits)
print('Celsius: {}'.format(repr(celsius)))

Celsius: array([  0., -20., -10., -40.])


#### It is important to note that performing arithmetic on NumPy arrays does not change the original array, and instead produces a new array that is the result of the arithmetic operation.
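#### A quick sketch of this behaviour, with illustrative variable names: each operation returns a new array, and mixing in a float upcasts the result.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
doubled = arr * 2     # new integer array
shifted = arr + 1.5   # new float array (upcast)

print(repr(arr))      # the original is unchanged
print(repr(doubled))
print(shifted.dtype)  # float64
```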

## Non-linear functions

> For a function to be linear, the rate of change of y with respect to x has to be constant. In other words, its graph is a straight line.
> A non-linear function, on the other hand, gives a curved line instead of a straight line. In a non-linear function the rate of change of y with respect to x is not constant.

> Difference among **log**, **ln** (natural log), and **lb** (binary log) in common mathematical notation:

| Notation | Base |
|----------|----------|
|log|10|
|ln|e|
|lb|2|

#### The function **np.exp** performs a base **e** exponential on an array, while the function **np.exp2** performs a base 2 exponential. Likewise, **np.log**, **np.log2**, and **np.log10** all perform logarithms on an input array, using base **e**, base **2**, and base **10**, respectively. Note that **np.log** is the natural logarithm, not the base 10 logarithm.

In [17]:
arr = np.array([[1, 2], [3, 4]])
# e raised to the power of each element
print(repr(np.exp(arr)))
# 2 raised to the power of each element
print(repr(np.exp2(arr)))

arr2 = np.array([[1, 10], [np.e, np.pi]])
# Natural logarithm
print(repr(np.log(arr2)))
# Base 10 logarithm
print(repr(np.log10(arr2)))

array([[ 2.71828183,  7.3890561 ],
       [20.08553692, 54.59815003]])
array([[ 2.,  4.],
       [ 8., 16.]])
array([[0.        , 2.30258509],
       [1.        , 1.14472989]])
array([[0.        , 1.        ],
       [0.43429448, 0.49714987]])
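#### The cell above does not exercise **np.log2**, so here is a short sketch that does, along with a check that **np.exp** and **np.log** undo each other. The input values are arbitrary examples.

```python
import numpy as np

powers_of_two = np.array([1, 2, 8, 1024])
log2_values = np.log2(powers_of_two)
print(repr(log2_values))  # approximately 0, 1, 3 and 10

# np.exp and np.log are inverses (up to floating-point rounding).
x = np.array([0.5, 1.0, 3.0])
round_trip = np.log(np.exp(x))
print(np.allclose(round_trip, x))  # True
```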


#### NumPy has various other mathematical functions, which are listed [here](https://numpy.org/doc/stable/reference/routines.math.html).

## Matrix multiplication

#### NumPy arrays can represent vectors and matrices. The main function to use is **np.matmul**, which takes two vector/matrix arrays as input and produces a dot product or matrix multiplication.
#### The code below shows various examples of matrix multiplication. When both inputs are 1-D, the output is the dot product.
#### The usual matrix multiplication rules apply here too: the second dimension of the first matrix must equal the first dimension of the second matrix, otherwise np.matmul will result in a **ValueError**.
#### **As a reminder,** two matrices A = [2 rows, 3 columns] and B = [2 rows, 3 columns] cannot be multiplied. They would have to be A = [2x3] and B = [3x2], so that the number of columns of the first matrix equals the number of rows of the second.

In [16]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([-3, 0, 10])
print(np.matmul(arr1, arr2))

arr3 = np.array([[1, 2], [3, 4], [5, 6]])
arr4 = np.array([[-1, 0, 1], [3, 2, -4]])
print(repr(np.matmul(arr3, arr4)))
print(repr(np.matmul(arr4, arr3)))
# Uncommenting the next line results in a ValueError, since arr3 has shape (3, 2) and 2 != 3.
#print(repr(np.matmul(arr3, arr3)))

27
array([[  5,   4,  -7],
       [  9,   8, -13],
       [ 13,  12, -19]])
array([[  4,   4],
       [-11, -10]])
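#### The shape rule can be checked directly by looking at the result shapes and catching the error from a mismatched pair. This is only a sketch; the names `a`, `b`, and `mismatch_raised` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
b = np.array([[-1, 0, 1], [3, 2, -4]])   # shape (2, 3)

print(np.matmul(a, b).shape)  # (3, 3)
print(np.matmul(b, a).shape)  # (2, 2)

try:
    np.matmul(a, a)  # (3, 2) x (3, 2): inner dimensions 2 and 3 differ
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```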
