# Introduction to Machine Learning with Python

# Chapter 1: Introduction to Machine Learning

- NumPy ranking? 

# Chapter 2: Extending Python using NumPy

![1_Ikn1J6siiiCSk4ivYUhdgw.png](attachment:1_Ikn1J6siiiCSk4ivYUhdgw.png)

## Introduction to NumPy 

- In Python we use the `list` data type to store a collection of items. Unlike arrays in many other languages, the elements of a Python list do not need to be of the same data type. 
```python
list2 = [1, "Hello", 3.14, True, 5]
```

- This provides flexibility when handling multiple data types in a list, but it is inefficient when we have large amounts of data (as is typical in ML and data science projects). 

- Why use NumPy? NumPy arrays allow you to perform array math very easily/efficiently. 

**Why are Python lists inefficient?**

- Because of the way a Python list is implemented, processing the items of a large list is computationally expensive. 

- To allow a list to hold items of different types, each item is stored as a separate object at its own memory location. The list itself then contains an "array of pointers" to these locations, so reaching each value takes an extra lookup. 

- To work around this limitation of Python's list, we turn to `NumPy`, a library that extends Python with support for large, multidimensional arrays and matrices. NumPy also has a large library of high-level mathematical functions that operate on these arrays. 

- In `NumPy` an array is of type `ndarray` (n-dimensional array) and all elements must be of the **same type** (i.e., homogeneous).  
- `ndarray` objects are more efficient than Python lists and, as a bonus, they allow us to apply operations to the entire array at once. 
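
To make the contrast concrete, here is a minimal sketch (the variable names are illustrative and not part of the original notes) that doubles every value first with a list comprehension and then with a single array expression. It also shows that NumPy converts mixed numeric input to one common type.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# With a plain list, we loop over the items one at a time.
doubled_list = [v * 2 for v in values]

# With an ndarray, one expression operates on the entire array at once.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)   # [2, 4, 6, 8, 10]
print(doubled_arr)    # [ 2  4  6  8 10]

# All elements share one type: mixing ints and floats yields a float array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64
```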

In [1]:
import numpy as np

## Basic ways to create a NumPy array

- `np.zeros()`

- `np.arange()`

- `np.full()`

- `np.random.random()`

- `np.eye()`

In [2]:
# creating a one-dimensional array of a specific size filled with 0s
a5 = np.zeros(15)
print(a5.shape) # (15,)
#print(a5)

# creating a three-dimensional array of a specific size filled with 0s
a1 = np.zeros((1,101,5)) # notice the extra ( ): the shape is passed as a single tuple
print(a1.shape) # (1, 101, 5)
#print(a1)

# creating a one-dimensional evenly spaced array with a given interval
a2 = np.arange(10)
print(a2.shape) # (10,)
#print(a2)

# creating a two-dimensional or one-dimensional array filled with a specific number
a3 = np.full((2,3),7)
print(a3.shape) # (2,3)
#print(a3)

# creating a two-dimensional or one-dimensional array filled with random numbers
a4 = np.random.random((2,4))
print(a4.shape) #(2,4)
#print(a4)


# creating an array that mimics the identity matrix

a6 = np.eye(4,4) #a6 shape is (4,4)
#print(a6)

(15,)
(1, 101, 5)
(10,)
(2, 3)
(2, 4)


## Creating a NumPy array using Python lists

- `np.array()` converts a Python `list` into a `numpy.ndarray`.

In [3]:
list1 = [1,2,3,4,5]
print(type(list1))

r1 = np.array(list1)
print(type(r1))
print(r1)

<class 'list'>
<class 'numpy.ndarray'>
[1 2 3 4 5]


In [4]:
print(r1)

[1 2 3 4 5]


## Indexing  NumPy arrays

- Array Indexing
    - Accessing elements in the array is similar to accessing elements in a Python list.
- Boolean Indexing
    - NumPy checks each element against the condition and records whether it is True or False. The result is an array of Boolean values, which can then be used as an index to select only the elements where the condition is True.

In [5]:
print(r1[0])
print(r1[4])

1
5


In [6]:
# Array Indexing

list2 = [6,7,8,9,0]
r2 = np.array([list1,list2]) # rank 2 array
print(r2)
print(type(r2))

print(r2.shape) # (2,5) - 2 rows and 5 columns
print(r2[0,0]) # 1
print(r2[0,1]) # 2
print(r2[1,0]) # 6

[[1 2 3 4 5]
 [6 7 8 9 0]]
<class 'numpy.ndarray'>
(2, 5)
1
2
6


In [7]:
# Boolean Indexing
# recall r1 = [1 2 3 4 5]

print(r1>2)

print(r1[r1>2]) # [3 4 5] # Indices where this condition is True

[False False  True  True  True]
[3 4 5]


Why is Boolean Indexing useful? Consider an example where you want to retrieve all of the odd numbers from an array. You can use Boolean indexing: 

In [8]:
# More Boolean Indexing

nums = np.arange(20)
print(nums) # [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]

print(nums % 2 == 1)

odd_num = nums[nums % 2 == 1]
print(odd_num) # [ 1 3 5 7 9 11 13 15 17 19]



[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[False  True False  True False  True False  True False  True False  True
 False  True False  True False  True False  True]
[ 1  3  5  7  9 11 13 15 17 19]


##  Slicing Arrays

- Slicing has the following syntax: `[start:stop]`. 
- For two-dimensional arrays, the slicing syntax becomes `[start:stop, start:stop]`.

- The start:stop before the comma `(,)` refers to the rows, and the start:stop after the comma `(,)` refers to the columns.
    - `[ROWS,COLS]`
- A common source of confusion with slicing is the end index. 
    - Remember that the end index is not included in the result. A helpful way to visualize slicing is to write out the index of each row and column.
    
**NumPy Slice is a Reference**
- The result of a NumPy slice is a reference to (a view of) the original array, not a copy. Therefore, if you change one of the elements in the slice, you are actually modifying the original array. 

In [9]:
a = np.array([[1,2,3,4,5],  # a is the original array
              [4,5,6,7,8],
              [9,8,7,6,5]]) # rank 2 array
print(a)

[[1 2 3 4 5]
 [4 5 6 7 8]
 [9 8 7 6 5]]


In [12]:
b3 = a[1:, 2:] # row 1 onwards and column 2 onwards

print(b3) # b3 is now pointing to a subset of a, i.e., b3 is a reference to the original array. 

[[6 7 8]
 [7 6 5]]


In [15]:
# modifying the original array a by changing an element in b3
b3[0,2] = 88
print(b3)

[[ 6  7 88]
 [ 7  6  5]]
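
The cell above prints only `b3`. A minimal sketch, rebuilding the same array so that it runs on its own, confirms that the original array changes too, and that `copy()` gives an independent array when sharing is not wanted. The names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [4, 5, 6, 7, 8],
              [9, 8, 7, 6, 5]])

view = a[1:, 2:]                   # a slice: shares memory with a
view[0, 2] = 88                    # writes through to a[1, 4]
print(a[1, 4])                     # 88
print(np.shares_memory(a, view))   # True

independent = a[1:, 2:].copy()     # an explicit copy owns its data
independent[0, 0] = -1
print(a[1, 2])                     # 6 (unchanged)
```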


To extract the last two rows and first three columns of `a` we can use slicing. 

In [11]:
b1 = a[1:3, :3] # row 1 to 3 (not inclusive) and cols 0 to 3 (not inclusive) [row,col]
print(b1)

[[4 5 6]
 [9 8 7]]


## Reshaping Arrays

- You can reshape an array to another dimension using the `reshape()` function.
    - returns a view of (reference to) the original array whenever possible; otherwise a copy
- `flatten()`
    - always returns a copy of the array
- `ravel()`
    - returns a view of (reference to) the original array whenever possible; otherwise a copy

In [20]:
b3 = b3.reshape(1,-1)
# The 1 asks for one row; the -1 lets reshape() work out how many columns
# are needed. The result is a rank 2 array with a single row.
print(b3)

[[ 6  7 88  7  6  5]]


- The first `1` indicates that you want to convert it into rank 2 array with 1 row, and the `-1` indicates that you will leave it to the `reshape()` function to create the correct number of columns.

- Of course, in this example, it is clear that after reshaping there will be six columns, so you can call the `reshape()` function as `reshape(1,6)`. In more complex cases, however, it is convenient to be able to use -1 to let the function decide on the number of rows or columns to create.

- To convert a rank 2 array to a rank 1 array, you can also use the `flatten()` or `ravel()` functions. The `flatten()` function always returns a copy of the array, while the `ravel()` and `reshape()` functions return a view (reference) of the original array whenever possible.
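
A short sketch of the copy-versus-view distinction, using `np.shares_memory()` to check whether two arrays share data. The array `m` is illustrative. Note that a view is not guaranteed: for a non-contiguous array, such as a slice that skips columns, `ravel()` has to fall back to a copy.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

flat_copy = m.flatten()     # always a copy
flat_view = m.ravel()       # a view here, because m is contiguous
reshaped = m.reshape(3, 2)  # also a view here

print(np.shares_memory(m, flat_copy))  # False
print(np.shares_memory(m, flat_view))  # True
print(np.shares_memory(m, reshaped))   # True

flat_view[0] = 100          # writes through to m
print(m[0, 0])              # 100
print(flat_copy[0])         # 1

# A slice that skips columns is not contiguous, so ravel() must copy.
part = m[:, :2]
print(np.shares_memory(m, part.ravel()))  # False
```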


## Array Math

In [23]:
x1 = np.array([[1,2,3],[4,5,6]])
y1 = np.array([[7,8,9],[2,3,4]])

print(x1)
print(y1)

[[1 2 3]
 [4 5 6]]
[[7 8 9]
 [2 3 4]]


### Matrix Addition

- To add these two arrays together, you use the `+` operator.
- You can also use the `np.add()` function to add two arrays. 
- Matrix addition is useful because we can use this to add two vectors. Recall what it means to add two vectors: 

![parallelogram.gif](attachment:parallelogram.gif)

In [25]:
# Matrix addition
new_mat = x1 + y1
print(new_mat)

[[ 8 10 12]
 [ 6  8 10]]


Apart from addition, you can also perform subtraction, multiplication, and division. 

In [27]:
print(x1 - y1) # same as np.subtract(x1,y1)

print(x1 * y1) # same as np.multiply(x1,y1)

print(x1 / y1) # same as np.divide(x1,y1)


[[-6 -6 -6]
 [ 2  2  2]]
[[ 7 16 27]
 [ 8 15 24]]
[[0.14285714 0.25       0.33333333]
 [2.         1.66666667 1.5       ]]


What's a practical use of the ability to multiply or divide two arrays? 

Below is an example using BMI (body mass index). Suppose we have three NumPy arrays holding the names, heights (in meters), and weights (in kilograms) of three people. We want to calculate their BMIs and save the resulting array to the variable `bmi`. 

To calculate the BMI, divide the weight by the height, then divide the result by the height again (that is, weight / height²). 

In [31]:
names = np.array(['Ann','Joe','Mark'])
heights = np.array([1.5, 1.78, 1.6])
weights = np.array([65, 46, 59])

bmi = (weights/heights)/heights
print(bmi)
type(bmi)

[28.88888889 14.51836889 23.046875  ]


numpy.ndarray

In [38]:
print("Overweight: ", names[bmi>25])
print("Underweight: ", names[bmi<18.5])
print("Healthy: ", names[(bmi>=18.5) & (bmi<=25)]) # note the & operator, not `and`; each condition needs its own parentheses

Overweight:  ['Ann']
Underweight:  ['Joe']
Healthy:  ['Mark']


### Dot Product

![image9.png](attachment:image9.png)

- Note that when you multiply two arrays with `*`, you are multiplying each pair of corresponding elements (element-wise multiplication). This is not the dot product. 
    - The dot product is an algebraic operation that takes two coordinate vectors of equal size and returns a single number. 
- The dot product of two vectors is calculated by multiplying corresponding entries in each vector and adding up all of those products.
- In NumPy, the dot product is computed using the `dot()` function. 

In [39]:
x = np.array([2,3])
y = np.array([4,2])
np.dot(x,y) # 8 + 6 = 14

14
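
To see how the dot product relates to element-wise multiplication, this sketch computes it both ways for the same two vectors: multiplying corresponding entries and summing the products gives the same number as `np.dot()`. The name `elementwise` is illustrative.

```python
import numpy as np

x = np.array([2, 3])
y = np.array([4, 2])

elementwise = x * y            # one product per position
print(elementwise)             # [8 6]
print(elementwise.sum())       # 14
print(np.dot(x, y))            # 14
```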

- The dot product also works on rank 2 arrays. Performing a dot product of two rank 2 arrays is equivalent to matrix multiplication. 

![Screen%20Shot%202019-09-16%20at%202.55.41%20PM.png](attachment:Screen%20Shot%202019-09-16%20at%202.55.41%20PM.png)

In [42]:
x2 = np.array([[1,2,3],[4,5,6]])
y2 = np.array([[7,8], [9,10], [11,12]])

print(np.dot(x2,y2))

[[ 58  64]
 [139 154]]
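
Each entry of the result is itself a dot product: entry `[i, j]` is the dot product of row `i` of the first array and column `j` of the second. The sketch below recomputes two entries by hand, which also shows why the number of columns in `x2` must equal the number of rows in `y2`.

```python
import numpy as np

x2 = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
y2 = np.array([[7, 8], [9, 10], [11, 12]])   # shape (3, 2)

product = np.dot(x2, y2)                     # shape (2, 2)

# Row 0 of x2 with column 0 of y2: 1*7 + 2*9 + 3*11 = 58
print(np.dot(x2[0, :], y2[:, 0]))   # 58
# Row 1 of x2 with column 1 of y2: 4*8 + 5*10 + 6*12 = 154
print(np.dot(x2[1, :], y2[:, 1]))   # 154
print(product.shape)                # (2, 2)
```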


## Matrix

- NumPy provides another class in addition to `ndarray`: `matrix`.

- The `matrix` class is a subclass of `ndarray` and is basically identical to it, with one notable exception: a matrix is strictly two-dimensional, while an ndarray can be multidimensional.

- Another important difference between an ndarray and a matrix occurs when you perform multiplications on them. When multiplying two ndarray objects, the result is the element-by-element multiplication that we have seen earlier. On the other hand, when multiplying two matrix objects, the result is the matrix product (equivalent to the `np.dot()` function).

- You can also convert a NumPy array to a matrix using the `asmatrix()` function. 

In [46]:
x2 = np.matrix([[1,2],[4,5]])
y2 = np.matrix([[7,8],[2,3]])

x1 = np.asmatrix(x1)
y1 = np.asmatrix(y1)

In [52]:
x1 = np.array([[1,2],[4,5]])
y1 = np.array([[7,8],[2,3]])
print(x1 * y1) # element-by-element multiplication

print('-'*10)

x2 = np.matrix([[1,2],[4,5]])
y2 = np.matrix([[7,8],[2,3]])
print(x2 * y2) # dot product; same as np.dot()

[[ 7 16]
 [ 8 15]]
----------
[[11 14]
 [38 47]]
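
As a quick check of this equivalence, the sketch below multiplies two `matrix` objects with `*` and compares the result with `np.dot()` applied to the same values stored as plain arrays. `np.array_equal()` does the comparison; the variable names are illustrative.

```python
import numpy as np

x_arr = np.array([[1, 2], [4, 5]])
y_arr = np.array([[7, 8], [2, 3]])

x_mat = np.asmatrix(x_arr)     # same values, as a matrix object
y_mat = np.asmatrix(y_arr)

matrix_product = x_mat * y_mat         # matrix multiplication
array_product = np.dot(x_arr, y_arr)   # the same operation on ndarrays

print(np.array_equal(matrix_product, array_product))  # True
print(isinstance(matrix_product, np.matrix))          # True
```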


### Cumulative Sum 

![Screen%20Shot%202019-09-16%20at%204.52.22%20PM.png](attachment:Screen%20Shot%202019-09-16%20at%204.52.22%20PM.png)

In [70]:
a = np.array([(1,2,3),(4,5,6),(7,8,9)])
print(a)

print('-'*10)

print(a.cumsum()) # rank 1 array, prints cum sum of all the elements in the array

print('-'*10)

print(a.cumsum(axis = 0)) # axis = 0 gives the cumulative sum down each column

print('-'*10)

print(a.cumsum(axis = 1)) # axis = 1 gives the cumulative sum along each row

[[1 2 3]
 [4 5 6]
 [7 8 9]]
----------
[ 1  3  6 10 15 21 28 36 45]
----------
[[ 1  2  3]
 [ 5  7  9]
 [12 15 18]]
----------
[[ 1  3  6]
 [ 4  9 15]
 [ 7 15 24]]


### NumPy Sorting  

- `sort()`: takes in an array and returns a sorted array. 
- `argsort()`: takes in an array and returns the indices that will sort an array. 


In [93]:
ages = np.array([34,12, 37, 5, 13])
sorted_ages = np.sort(ages) #does not modify the original array

print(sorted_ages, '\n\t')
print(ages)

[ 5 12 13 34 37] 
	
[34 12 37  5 13]


- The `np.sort()` function does not modify the original array; instead it returns a sorted copy. To sort the original array in place, call the `sort()` method on the array itself:

In [91]:
ages.sort() # sorts in place and returns None, so do not assign the result to a variable
print(ages,'\n\t')

ages = np.array([34,12, 37, 5, 13])
print(ages.argsort())

[ 5 12 13 34 37] 
	
[3 1 4 0 2]


- In the preceding example, the first element (3) in the result of the `argsort()` function means that the smallest element is at index 3 of the original array, which is the number 5. The next number is at index 1, which is the number 12, and so on.

- To print the sorted ages array, use the result of `argsort()` as the index to the ages array. 



In [76]:
print(ages[ages.argsort()])

[ 5 12 13 34 37]
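
Two points from this section can be checked with a short sketch: indexing with the result of `argsort()` gives the same values as `np.sort()`, and the `sort()` method of an array sorts in place and returns `None`. The names `order` and `result` are illustrative.

```python
import numpy as np

ages = np.array([34, 12, 37, 5, 13])

order = ages.argsort()                 # indices that would sort ages
print(order)                           # [3 1 4 0 2]
print(np.array_equal(ages[order], np.sort(ages)))  # True

result = ages.sort()                   # in-place sort
print(result)                          # None
print(ages)                            # [ 5 12 13 34 37]
```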


In [104]:
persons = np.array(['Johnny','Mary','Peter','Will','Joe'])
ages = np.array([34,12,37,5,13])
heights = np.array([1.76,1.2,1.68,0.5,1.25])


sorted_indices = np.argsort(ages)

sorted_persons = persons[sorted_indices]
sorted_ages = ages[sorted_indices]
sorted_heights = heights[sorted_indices]

print(sorted_persons,'\n\n',sorted_ages,'\n\n',sorted_heights)

['Will' 'Mary' 'Joe' 'Johnny' 'Peter'] 

 [ 5 12 13 34 37] 

 [0.5  1.2  1.25 1.76 1.68]


In [103]:
# reverse the sorted order (largest age first)

r_sorted_persons = persons[sorted_indices][::-1]
r_sorted_ages = ages[sorted_indices][::-1]
r_sorted_heights = heights[sorted_indices][::-1]

print(r_sorted_persons,'\n\n', r_sorted_ages,'\n\n', r_sorted_heights)

['Peter' 'Johnny' 'Joe' 'Mary' 'Will'] 

 [37 34 13 12  5] 

 [1.68 1.76 1.25 1.2  0.5 ]


### Array Assignment 

In [100]:
list1 = [[1,2,3,4], [5,6,7,8]]
a1 = np.array(list1)
print(a1, '\n')

a2 = a1
print(a2, '\n')

a2[0][0] = 11
print(a1, '\n')
print(a2, '\n')

[[1 2 3 4]
 [5 6 7 8]] 

[[1 2 3 4]
 [5 6 7 8]] 

[[11  2  3  4]
 [ 5  6  7  8]] 

[[11  2  3  4]
 [ 5  6  7  8]] 



- In the example above, assigning `a1` to another variable called `a2` does not create a copy of the array. Instead, `a2` points to the same array object as `a1`, so any changes made through `a2` also affect `a1`. 
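
A minimal sketch of this aliasing, using the `is` operator to confirm that both names refer to the same object:

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
a2 = a1                   # no data is copied; a2 is another name for a1

print(a2 is a1)           # True

a2[0, 0] = 11             # changing the array through a2 ...
print(a1[0, 0])           # 11 (... is visible through a1 as well)
```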

### Copying by View (Shallow Copy) 

- NumPy has a `view()` function that creates a new array object referring to the same data as the original. Changes to the values are visible through both, but changing the shape of the original array does not affect the shape of the view. This is known as a shallow copy.

In [108]:
a2 = a1.view() # a2 is a view: it shares a1's data, but changes
                # to the dimensions of a1 will not affect a2
    
print(a1,'\n')
print(a2, '\n')


a1[0][0] = 111
print(a1, '\n\n', a2)

[[111   2   3   4]
 [  5   6   7   8]] 

[[111   2   3   4]
 [  5   6   7   8]] 

[[111   2   3   4]
 [  5   6   7   8]] 

 [[111   2   3   4]
 [  5   6   7   8]]
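
The cell above shows the shared data but not the shape behavior. A minimal sketch of both, with illustrative variable names: a view is a separate array object, so it can have a different shape from the original, yet both still read and write the same underlying data.

```python
import numpy as np

original = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
v = original.view()            # new array object, same underlying data

print(v is original)                    # False
print(np.shares_memory(original, v))    # True

v = v.reshape(4, 2)            # give the view a different shape
print(original.shape)          # (2, 4)
print(v.shape)                 # (4, 2)

original[0, 0] = 111           # data is shared, so the view sees the change
print(v[0, 0])                 # 111
```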


### Copying by Value (Deep Copy) 
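
Copying by value can be sketched with the `copy()` method of an array, which returns a new array with its own data, so a change to one array does not show up in the other. The variable names below are illustrative.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
a2 = a1.copy()            # independent copy with its own data

a1[0, 0] = 999            # modify the original only
print(a1[0, 0])           # 999
print(a2[0, 0])           # 1 (the copy is unaffected)
print(np.shares_memory(a1, a2))   # False
```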

# Chapter 3: Manipulating Tabular Data Using Pandas

- Although NumPy's `ndarray` objects are more efficient than Python's list objects, they are still insufficient to meet the needs of data science. 
    - In the real world, data are often presented in data tables/spreadsheets.
    - To deal with data stored as tables, you need a data type that is better suited to it: Pandas!
- While Python supports lists and dictionaries for manipulating structured data, they are not well suited for manipulating numerical tables. 
- **Pandas** is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data easier. 
    - Fun fact: the name Pandas is derived from "panel data".

**Pandas Series** 
- A pandas Series is a one-dimensional NumPy-like array, with each element having an index (0, 1, 2, ...).
    - By default the index of a Series starts from 0, but you can also specify your own index for a Series using the `index` parameter. 
    - It is worth noting that the index of a Series does not need to be unique. 
    
**Accessing Elements in a Series**
- You can use the position of the element. 
    - The `iloc` indexer allows you to specify an element via its position. 
- You can also specify the value of the index of the element you wish to access. 
    - The `loc` indexer allows you to specify the label of an index. 
    
**You can also perform slicing on a Series**

In [112]:
import pandas as pd

In [113]:
series = pd.Series([1,2,3,4,5])

print(series)

0    1
1    2
2    3
3    4
4    5
dtype: int64


In [116]:
series = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'c']) #last index is not unique
print(series)

a    1
b    2
c    3
d    4
c    5
dtype: int64


In [None]:
# Accessing Elements in a Series

print(series[2]) # 3 (integer keys on a non-integer index are deprecated in recent pandas; prefer iloc)
# same as
print(series.iloc[2]) # 3 - based on the position of the index

print(series['d']) # 4
# same as
print(series.loc['d']) # 4 - based on the label in the index

In [117]:
print(series[2:]) # returns a Series
print(series.iloc[2:]) # returns a Series

c    3
d    4
c    5
dtype: int64
c    3
d    4
c    5
dtype: int64
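
Because index labels need not be unique, a label lookup can return more than one element. A small sketch using the same Series as above: `loc` with the repeated label `'c'` returns a Series holding both matches, while a unique label returns a single value. The name `both_c` is illustrative.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'c'])

both_c = series.loc['c']       # label 'c' appears twice
print(both_c.tolist())         # [3, 5]

print(series.loc['d'])         # 4
print(series.iloc[2])          # 3 (position 2, regardless of label)
```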


In [120]:
dates1 = pd.date_range('20190525', periods=12)
print(dates1)

series = pd.Series([1,2,3,4,5,6,7,8,9,10,11,12])
series.index = dates1 #changing indices so that they are the dates
print(series)

DatetimeIndex(['2019-05-25', '2019-05-26', '2019-05-27', '2019-05-28',
               '2019-05-29', '2019-05-30', '2019-05-31', '2019-06-01',
               '2019-06-02', '2019-06-03', '2019-06-04', '2019-06-05'],
              dtype='datetime64[ns]', freq='D')
2019-05-25     1
2019-05-26     2
2019-05-27     3
2019-05-28     4
2019-05-29     5
2019-05-30     6
2019-05-31     7
2019-06-01     8
2019-06-02     9
2019-06-03    10
2019-06-04    11
2019-06-05    12
Freq: D, dtype: int64
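
Once the index holds dates, elements can be looked up by date with `loc`. A minimal sketch, rebuilding the same Series so that it runs on its own; note that, unlike positional slicing, a label slice includes both end points. The names `dates` and `window` are illustrative.

```python
import pandas as pd

dates = pd.date_range('20190525', periods=12)
series = pd.Series(range(1, 13), index=dates)

print(series.loc[pd.Timestamp('2019-05-27')])   # 3

# Label-based slices include both the start and the stop label.
window = series.loc['2019-05-27':'2019-05-29']
print(window.tolist())                          # [3, 4, 5]
```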


`numpy.random.randn(d0, d1, …, dn)`: creates an array of the specified shape and fills it with random values drawn from the standard normal distribution.
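
A short sketch of this function. The values are random, so only the shape is shown; for a large sample, the mean and standard deviation should come out close to 0 and 1. The names `sample` and `large` are illustrative.

```python
import numpy as np

sample = np.random.randn(2, 3)     # 2 rows, 3 columns of standard normal values
print(sample.shape)                # (2, 3)

large = np.random.randn(100_000)
print(large.mean(), large.std())   # values near 0 and 1
```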

# Chapter 4: Data Visualization Using matplotlib

# Chapter 5: Getting Started with Scikit-learn for Machine Learning

# Chapter 6: Supervised Learning - Linear Regression

# Chapter 7: Supervised Learning - Classification using L