# Notes from Section 3: Machine Learning Concepts

Three types of machine learning:

* Supervised
* Unsupervised
* Reinforcement Learning

## Supervised Learning

Train a model with labeled data, predict the output for a new input.

Different models can yield different predictions - how do you evaluate different models?

Training data often arranged in tables.

* Rows: Example, sample, instance, or observation
* Columns: Feature or attribute
* Labeled output column: label or target

Labeled data typically split into two parts: training set and test set.

* Build model with the training set
* Test the model using the test set

Test results can be used to evaluate model quality.

You need to be careful in how you split the labeled data into training and test sets. Often techniques like shuffling are applied.
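A minimal sketch of why shuffling matters, assuming made-up labeled rows that arrive sorted by label. The data, the seed, and the 70/30 ratio are illustrative assumptions, not values from the course.

```python
import random

# Hypothetical labeled data: (feature, label) pairs sorted by label.
rows = [(i, "no") for i in range(7)] + [(i, "yes") for i in range(7, 10)]

def split(data, train_fraction=0.7):
    """Return (training set, test set), taking the first rows for training."""
    cut = round(len(data) * train_fraction)
    return data[:cut], data[cut:]

# Without shuffling, the training set never sees a "yes" example.
train, test = split(rows)
unshuffled_train_labels = {label for _, label in train}

# Shuffle a copy with a fixed seed so the split is reproducible.
shuffled = rows[:]
random.Random(5).shuffle(shuffled)
shuffled_train, shuffled_test = split(shuffled)
```

Here `unshuffled_train_labels` contains only `"no"`, so a model trained on that split could not learn the other class.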

### Regression Models

Used to predict a numeric output, e.g. home prices.
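As a rough illustration, the sketch below fits a straight line to hypothetical home sizes and prices using ordinary least squares. The data and the `fit_line` helper are assumptions for this example; a real project would normally use a library model.

```python
# Hypothetical training data: home size in square feet -> price in dollars.
sizes = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, prices)

# Predict the price of a home size that is not in the training data.
predicted_price = slope * 1800 + intercept
```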

### Binary Classification

Predict a binary output, e.g. yes or no. Examples: is this email spam, does this social media post need a response.

### Multi-Class Classification

Used for predicting one out of several outcomes, e.g. what will the weather be tomorrow (sunny, rainy, windy, snowy, etc.)

## Unsupervised Learning

There is no defined target, only data. Used to group similar observations.

Example uses: anomaly detection, finding words used in similar contexts, grouping similar documents, customer segmentation.

Unsupervised learning algorithms:

* Clustering
* Dimensionality reduction (reduce the number of features for a model)
    * Principal Component Analysis (PCA)
* Grouping words that are used in similar contexts or have similar meanings
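To make clustering concrete, here is a minimal k-means sketch on hypothetical customer data. The fixed starting centroids are an assumption chosen to keep the result reproducible; library implementations typically pick starting points randomly.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket in dollars).
customers = [(1, 20), (2, 25), (1, 22), (10, 90), (12, 95), (11, 85)]
centroids, clusters = kmeans(customers, centroids=[(0, 0), (15, 100)])
```

No labels are supplied; the two groups (occasional low spenders and frequent high spenders) emerge from the distances alone.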

## Reinforcement Learning

Used in circumstances where decisions must be made under uncertainty. Examples: autonomous driving, games.

Reinforcement learning uses reward functions to reward correct decisions and punish incorrect decisions.

## Data In Real Life

Often dealing with a mix of data

* Numeric
* Text
* Categorical

### Handling Categorical Data

Used to describe qualitative properties of observations

Day of Week - Sunday, Monday, etc. - text based categoricals like day of week can be *converted to numeric*, e.g. Sunday is 1, Monday is 2, etc.

Several machine learning libraries, as well as algorithms like linear regression, do not handle categorical observations well even when they are encoded as numbers, because the numbers imply an ordering and magnitude that the categories do not have. In this case the data is encoded using *one hot encoding*: one column per categorical value, with 1 in the column for the category that applies and 0 in the others.
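A small sketch of one hot encoding in plain Python. The `one_hot` helper is a hypothetical name for this example; pandas provides `pd.get_dummies` for the same purpose.

```python
def one_hot(values):
    """Return (column names, encoded rows) for a list of categorical values."""
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows

days = ["Sunday", "Monday", "Sunday", "Tuesday"]
columns, encoded = one_hot(days)
# columns -> ['Monday', 'Sunday', 'Tuesday']
# encoded -> [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```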

What if you want to model or capture the interplay between two categorical values, for example day of week (Sunday, Monday, ...) and weather (Sunny, Cloudy, Snow, ...)? Interaction between features can be handled by combining features (aka Cartesian transformation) - for example Sunday_Sunny, Sunday_Cloudy, ...
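The combined feature can be sketched as follows, assuming two parallel lists of made-up observations. The underscore separator mirrors the Sunday_Sunny naming above.

```python
from itertools import product

days = ["Sunday", "Monday", "Sunday"]
weather = ["Sunny", "Cloudy", "Cloudy"]

# Combine the two categorical features row by row into one feature.
day_weather = [f"{d}_{w}" for d, w in zip(days, weather)]
# day_weather -> ['Sunday_Sunny', 'Monday_Cloudy', 'Sunday_Cloudy']

# Every category the combined feature could take (the Cartesian product).
all_combinations = [
    f"{d}_{w}" for d, w in product(sorted(set(days)), sorted(set(weather)))
]
```

The combined column can then be one hot encoded like any other categorical feature.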

### Text Data

Requires special handling. Example transformations:

* NGRAM transformation
* Orthogonal Sparse Bigram (OSB) transformation
* Lowercase transformation - improve the signal by normalizing case
* Remove punctuation transformation
* Cartesian transformation

Text to numeric transformation - bag of words: word is feature, value is count

Splitting on whitespace can lose meaning: "this is not working. disappointed" vs "this is working. not disappointed". Here parsing on whitespace yields the same encoding and loses the underlying meaning.

NGRAM transformation - provides a window for capturing word combinations, which lets us distinguish the two sentences above.
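A minimal sketch comparing bag of words with bigrams (n-grams with a window of 2) on the two example sentences. The `tokenize`, `bag_of_words`, and `ngrams` helpers are assumptions written for this illustration, not the API of any particular service.

```python
from collections import Counter
import string

def tokenize(text):
    """Lowercase, strip punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def bag_of_words(text):
    """Count how often each word occurs, ignoring word order."""
    return Counter(tokenize(text))

def ngrams(text, n=2):
    """Count each run of n consecutive words."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

a = "this is not working. disappointed"
b = "this is working. not disappointed"

same_bag = bag_of_words(a) == bag_of_words(b)   # True: identical word counts
same_bigrams = ngrams(a) == ngrams(b)           # False: word order differs
```

The bigram `('not', 'working')` appears only in the first sentence and `('not', 'disappointed')` only in the second, which is the signal the bag of words discards.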

[OSB](https://docs.aws.amazon.com/machine-learning/latest/dg/data-transformations-reference.html#orthogonal-sparse-bigram-osb-transformation) is a variation on this idea.

Stemming - words are reduced to root word, root word used for training. For example working, worked, works all reduce to work.
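A toy illustration of the idea, assuming simple suffix stripping. This is not a real stemming algorithm such as Porter's; `naive_stem` and its suffix list are hypothetical and only handle the example words.

```python
def naive_stem(word, suffixes=("ing", "ed", "s")):
    """Strip the first matching suffix; a toy stand-in for a real stemmer."""
    for suffix in suffixes:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["working", "worked", "works", "work"]]
# stems -> ['work', 'work', 'work', 'work']
```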

### Numeric Data

* Numeric value as-is or using normalization transformation for linear relationships.
* Binning transformation - convert to categorical for non-linear relationships

Example - features can be of different scales, such as GDP in trillions and population in millions. To avoid large values dominating, you can normalize the features, for example scale GDP to units of trillions, population to units of millions.

Binning - convert ranges of values to categories, useful when you suspect non-linearities. Allows the model to assign different weights to bins.
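Both transformations can be sketched in a few lines. The figures, bin edges, and labels below are made-up values for illustration, and `to_bin` is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical raw features on very different scales.
gdp_dollars = [21_000_000_000_000, 3_000_000_000_000]
population = [330_000_000, 67_000_000]

# Rescale so both features land in a comparable numeric range.
gdp_trillions = [value / 1e12 for value in gdp_dollars]
population_millions = [value / 1e6 for value in population]

def to_bin(value, edges, labels):
    """Map a numeric value to the label of the range it falls in."""
    return labels[bisect_right(edges, value)]

# Hypothetical age bins: below 18, 18 up to 64, 65 and above.
age_edges = [18, 65]
age_labels = ["minor", "adult", "senior"]
ages = [5, 18, 42, 70]
age_bins = [to_bin(age, age_edges, age_labels) for age in ages]
# age_bins -> ['minor', 'adult', 'adult', 'senior']
```

Once binned, the feature is categorical and can be one hot encoded, so the model learns a separate weight per bin.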

## Working With Data Exercise

* Start your SageMaker instance
* Open Jupyter, go to introduction (assumes course repo has been cloned to the notebook)
* Open notebook_5minute_intro
    * shift-tab for help on a method

Pandas

* Methods for reading in data as data frames
* Flexible ways to index into the frame
* Can replace missing values with the mean
    * `df = df.fillna(df.mean(numeric_only=True))`
* Can shuffle the index
    * `np.random.seed(5)`
    * `np.random.shuffle(index_list)`
* Shuffle the data frame
    * `df = df.iloc[index_list]`
* Split into training and test sets
    * `size = df.shape[0]`
    * `train = round(size * 0.7)`
    * `test = size - train`
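Put together, the shuffle and split steps above might look like the following sketch. The data frame contents and the `train_df` / `test_df` names are assumptions for illustration; see the course notebook for the actual example.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with ten rows.
df = pd.DataFrame({"x": range(10), "y": [v * 2 for v in range(10)]})

# Shuffle the row positions with a fixed seed for reproducibility.
np.random.seed(5)
index_list = list(range(df.shape[0]))
np.random.shuffle(index_list)
df = df.iloc[index_list]

# Use the first 70% of the shuffled rows for training, the rest for testing.
size = df.shape[0]
train = round(size * 0.7)
test = size - train
train_df = df.iloc[:train]
test_df = df.iloc[train:]
```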

Refer to the [notebook](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/Introduction/notebook_5minute_intro.ipynb) for more examples, how to explore the data, etc.

## Handling Missing Values

Options:

* Remove instances with missing values - `dropna`
* Impute values, for example replace missing values with the mean of the values you do have
* Use interpolation, e.g. draw a line between the observations on either side of the missing data points and pick values along the line
* Forward fill (last known value) or backfill (next valid value)

See [this notebook](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/Introduction/ml_handling_missing_values.ipynb)

Use plots to visualize how well your strategy for replacing values works.

If examples are independent (as opposed to time ordered data), you may need to drop the rows with missing values.

If you can separate rows into classes (group by), you can apply the above techniques within each class. For example, in the iris data set, replace a missing value with the mean for that row's class.
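The strategies in this section can be sketched with pandas as follows. The data frame is made up for illustration, and the variable names are assumptions; the linked notebook has the course's own examples.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with gaps, tagged with a class column.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 10.0, np.nan, 30.0],
})

dropped = df.dropna()                                  # remove rows with gaps
mean_filled = df["value"].fillna(df["value"].mean())   # impute the overall mean
interpolated = df["value"].interpolate()               # linear interpolation
forward = df["value"].ffill()                          # last known value
backward = df["value"].bfill()                         # next valid value

# Impute with the mean of each row's group rather than the overall mean.
group_means = df.groupby("group")["value"].transform("mean")
group_filled = df["value"].fillna(group_means)
```

In this example the overall mean (11.0) is a poor fit for either group, while the per-group means (2.0 and 20.0) stay close to the neighbouring values.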


## Linearity and Non-Linearity

See [this](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/Introduction/ml_linear_non_linear.ipynb) notebook.

The above notebook has several plots that illustrate some non-linear functions.