# Lecture 7: Machine Learning Basics

## Agenda: 
1. What is machine learning?
2. How does machine learning work?
3. Scikit-learn
4. NumPy
5. Example: Iris classification

## 1. What is Machine Learning?
Machine learning builds a model from a set of **n observations (also known as samples, examples, instances, or records)** of data and then uses that model to predict **properties** of new data.

                                  
|![Figure 1: Machine Learning](ML_training.png)|
|-----------------------------|
|Figure 1. Machine Learning|

## Main categories of ML
1. Supervised learning, in which the data comes with additional ***labels/attributes that we want to predict***. The problem can be either:
    1. Classification: the desired output is one of a finite number of **discrete categories**
        1. Examples: handwritten digit recognition, Iris classification, and spam-or-ham email classification
    2. Regression: the desired output consists of one or more **continuous variables**
        1. Example: predict students' final scores (0-100) from their homework grades
![Figure: Handwritten digit recognition](handwritten.png)
2. Unsupervised learning, in which the training data consists of a set of input vectors x **without any corresponding target labels**.
    1. Clustering: discover groups of similar examples within the data, e.g., group shoppers with similar behavior
![Figure: Clustering](clusters.png)
    2. Density estimation: determine the distribution of the data
    3. Dimensionality reduction: project the data from a high-dimensional space down to a lower-dimensional one
3. Reinforcement learning
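
The difference between the first two categories shows up directly in how a model is fitted. The following is a minimal sketch, assuming the Iris data and two estimators chosen only for illustration (`KNeighborsClassifier` and `KMeans`; neither is prescribed by this lecture): the supervised model receives both inputs and labels, while the unsupervised model receives the inputs alone.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # X: features, y: species labels

# Supervised (classification): the model sees the inputs X AND the labels y.
clf = KNeighborsClassifier(n_neighbors=5)  # estimator choice is an assumption
clf.fit(X, y)
predicted_species = clf.predict(X[:3])

# Unsupervised (clustering): the model sees only X and must find groups itself.
km = KMeans(n_clusters=3, n_init=10, random_state=0)  # 3 clusters is an assumption
cluster_ids = km.fit_predict(X)

print("predicted species for the first 3 samples:", predicted_species)
print("cluster ids for the first 3 samples:", cluster_ids[:3])
```

Note that the cluster ids are arbitrary group numbers; they are not guaranteed to line up with the species labels.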

## 2. How does machine learning work?
Take supervised learning as an example:
1. First, train a machine learning model using labeled data
    1. Labeled data pairs each input with its label (the desired output)
    2. The model learns the relationship between the input data and the output (labels)
2. Then make predictions on new data that was not used to train the model
    1. The primary goal of machine learning is to build a model that generalizes to new data
    
![Figure 2: Machine Learning](ML_tt.png)
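
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses scikit-learn's built-in handwritten digits dataset, a 75/25 split, and a decision tree, none of which are fixed by the lecture. Part of the labeled data is held out so that it can play the role of "new data".

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)  # 8x8 digit images flattened to 64 features

# Hold out 25% of the labeled data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: train the model on labeled data.
model = DecisionTreeClassifier(random_state=0)  # model choice is an assumption
model.fit(X_train, y_train)

# Step 2: predict on data not used in training and compare with the true labels.
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("accuracy on training data:", train_accuracy)
print("accuracy on held-out data:", test_accuracy)
```

The accuracy on the held-out data is the more honest estimate of how well the model generalizes, since the training accuracy is measured on examples the model has already seen.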

## 3. Scikit-learn

1. Learn the basics of machine learning from "An introduction to machine learning with scikit-learn" in the tutorials at https://scikit-learn.org/stable/tutorial/index.html

```python
# import the scikit-learn package
import sklearn as sk  # importing the package runs its __init__.py first
print('sklearn version:', sk.__version__)

# check the scikit-learn folder: C:\Program Files\Anaconda3\Lib\site-packages\sklearn
```

Output:

```
sklearn version: 0.24.2
```


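```python
# explore the iris dataset
import sklearn.datasets as ds
iris = ds.load_iris()

inpt = iris.data      # feature matrix (inputs), one row per observation
labels = iris.target  # integer class label (target) for each observation
```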
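
To see what `load_iris()` actually returns, it can help to print a few attributes of the loaded object. A small sketch, reusing the variable names from the cells above:

```python
import sklearn.datasets as ds

iris = ds.load_iris()
inpt = iris.data      # feature matrix, one row per flower
labels = iris.target  # integer class label per flower

print("data shape:", inpt.shape)      # (n_samples, n_features)
print("labels shape:", labels.shape)  # (n_samples,)
print("feature names:", iris.feature_names)
print("class names:", iris.target_names)
print("first sample:", inpt[0], "-> label", labels[0])
```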


## 4. Practice NumPy arrays after class

1. Learn the NumPy array from the quickstart tutorial:
    1. https://numpy.org/devdocs/user/quickstart.html

2. Functions and methods: concatenate, diagonal, dsplit, dstack, hsplit, hstack, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack

3. Ordering: argmax, argmin, argsort, max, min, searchsorted, sort

4. Math and statistics: cov, mean, median, std, var, all, any, inner, invert, max, maximum, min, minimum, nonzero, outer, prod, re, round, sort, sum, trace, transpose
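
As a starting point for practice, here is a short sketch that exercises a handful of the functions listed above. The arrays and the `scores` values are made up purely for illustration.

```python
import numpy as np

a = np.arange(6)       # [0 1 2 3 4 5]
m = a.reshape(2, 3)    # 2 rows, 3 columns: [[0 1 2], [3 4 5]]
t = m.transpose()      # shape (3, 2)
flat = m.ravel()       # flattened back to 1-D

stacked_v = np.vstack([m, m])  # stack rows: shape (4, 3)
stacked_h = np.hstack([m, m])  # stack columns: shape (2, 6)
col = a[:, np.newaxis]         # add an axis: shape (6, 1)

scores = np.array([70, 95, 80])  # hypothetical values
order = np.argsort(scores)       # indices that sort ascending: [0 2 1]
best = np.argmax(scores)         # index of the largest value: 1

print("mean of m:", m.mean())            # mean over all elements
print("column sums:", m.sum(axis=0))     # one sum per column
print("row maxima:", m.max(axis=1))      # one maximum per row
```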



## 5. Example: Iris Classification
Iris classification is the 'Hello World!' task of machine learning.

The Iris dataset:

```python
import sklearn.datasets as ds
iris = ds.load_iris()
```

1. 150 observations: 50 observations for each of 3 species
2. 4 features:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
3. Class labels (species): Iris-Setosa, Iris-Versicolour, and Iris-Virginica
![Figure: The Iris dataset](Iris1.png)
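
One possible way to turn this dataset into a working classifier is sketched below. The choices here are assumptions made for illustration, not part of the lecture: a 70/30 train/test split, a 3-nearest-neighbors classifier, and a made-up `new_flower` measurement.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Keep 30% of the flowers aside to check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Train on the labeled training flowers (classifier choice is an assumption).
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Evaluate on flowers the model has not seen.
accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)

# Classify a hypothetical new flower:
# sepal length, sepal width, petal length, petal width (all in cm).
new_flower = [[5.0, 3.4, 1.5, 0.2]]
predicted = iris.target_names[clf.predict(new_flower)[0]]
print("predicted species:", predicted)
```

The four numbers in `new_flower` must be given in the same order as the feature list above, since the model only sees column positions, not names.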