<h1 style="font-size:42px; text-align:center; margin-bottom:30px;"><span style="color:SteelBlue">Lesson 3:</span> Classification Algorithms</h1>
<hr>

Welcome to <span style="color:royalblue">Lesson 3: Classification Algorithms</span>!

In this lesson, we'll dive into a few more key concepts for machine learning. In particular, we want to introduce you to 4 algorithms:
1. $L_1$-regularized logistic regression
2. $L_2$-regularized logistic regression
3. Random forests
4. Boosted trees

Just as in the previous project, we'll provide a gentle introduction to the **intuition and practical benefits** of each algorithm.

<br><hr id="toc">

### In this lesson...

In this lesson we'll walk through more key machine learning concepts, plus 4 effective algorithms for classification tasks.

1. [Binary classification](#binary)
2. [Toy example: noisy conditional](#conditional)
3. [Logistic Regression](#logistic)
4. [Regularized logistic algorithms](#regularized-logistic) - $L_1$-regularized and $L_2$-regularized
5. [Tree ensemble algorithms](#tree-ensembles) - Random Forests and Boosted Trees

**Tip:** Each section builds on the previous ones.

<br><hr>

### First, let's import libraries that we'll need

In [None]:
# NumPy and Pandas
import numpy as np
import pandas as pd 

# Matplotlib, and remember to display plots in the notebook
from matplotlib import pyplot as plt 
%matplotlib inline

# Seaborn for easier visualization
import seaborn as sns 

<span id="binary"></span>
# 1. Binary classification

Classification with 2 classes is so common that it gets its own name: **binary classification.** 


Just to be clear, let's take another look at the **target variable** for this problem.  First, let's look at it in the raw dataset (before we created the analytical base table).

In [None]:
# Print unique classes for 'status' and the first 5 observations for 'status' in the raw dataset
raw_df = pd.read_csv('project_files/clean_employee_data.csv')

print(raw_df.status.unique())
raw_df.status.head()

However, when we constructed our analytical base table, we converted the target variable from <code style="color:crimson">'Left' / 'Employed'</code> into <code style="color:crimson">1 / 0</code>.

In [None]:
# Print unique classes for 'status' and the first 5 observations for 'status' in the analytical base table
abt_df = pd.read_csv('project_files/employee_analytical_base_table.csv')

print(abt_df.status.unique())
abt_df.status.head()

Which is the **positive** class? How about the **negative** class?
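
By convention, the class encoded as 1 is treated as the positive class. As a minimal sketch of how such an encoding could be produced, assuming a small hypothetical `status` column rather than the actual project files:

```python
import pandas as pd

# Hypothetical stand-in for the raw 'status' column (not the project data)
status = pd.Series(['Left', 'Employed', 'Employed', 'Left'])

# Encode the class of interest ('Left') as 1 and the other class as 0
status_encoded = (status == 'Left').astype(int)

print(status_encoded.tolist())
```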

<p style="text-align:center; margin: 40px 0 40px 0; font-weight:bold;">
[Back to Contents](#toc)
</p>

<span id="conditional"></span>
# 2. Toy example: noisy conditional

We're going to use another toy example, just as we did in Project 1. 

This time, we're going to build models for a **noisy conditional**.


Let's create that dataset:

In [None]:
# Input feature
x = np.linspace(0, 1, 100)
# Noise
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)

# Target variable
y = ((x + noise) > 0.5).astype(int)
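
To see where the noise actually matters, here is a small sketch that rebuilds the same dataset and compares it with a noise-free version of the rule. The names `y_clean` and `flipped` are introduced here only for illustration.

```python
import numpy as np

# Rebuild the toy dataset from the cell above
x = np.linspace(0, 1, 100)
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)
y = ((x + noise) > 0.5).astype(int)

# Noise-free version of the same conditional (illustrative name)
y_clean = (x > 0.5).astype(int)

# The noise is at most 0.2 in size, so labels can only differ
# where x lies within 0.2 of the 0.5 threshold
flipped = y != y_clean
inside_band = np.all((x[flipped] >= 0.3) & (x[flipped] <= 0.7))

print('Flipped labels:', flipped.sum())
print('All flips inside [0.3, 0.7]:', inside_band)
```

Outside that band the labels are fully determined by `x`, so the models only have to cope with uncertainty in the middle of the range.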


We need to **reshape** <code style="color:steelblue">x</code> before moving on.
* That's because Scikit-Learn algorithms expect input features with 2 axes. However, right now, <code style="color:steelblue">x</code> only has one.

To make sure it has 2 axes, reshape it to be (100, 1) and name the reshaped object capital <code style="color:steelblue">X</code>.

In [None]:
# Reshape x into X
X = x.reshape(100, 1)
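
As a sketch of what the reshape changes, the snippet below compares the number of axes before and after. It uses `reshape(-1, 1)`, which asks NumPy to infer the number of rows instead of hard-coding 100.

```python
import numpy as np

x = np.linspace(0, 1, 100)

# Before: a flat array with a single axis
print(x.ndim, x.shape)

# After: one row per observation, one column per feature
X = x.reshape(-1, 1)
print(X.ndim, X.shape)
```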

Next, plot a **scatterplot** of the synthetic dataset.

In [None]:
# Plot scatterplot of synthetic dataset
plt.scatter(X, y)

<p style="text-align:center; margin: 40px 0 40px 0; font-weight:bold;">
[Back to Contents](#toc)
</p>

<span id="logistic"></span>
# 3. Logistic regression

First, we'll discuss **logistic regression**, which is the classification analog of linear regression.

Let's actually fit a linear regression model first.

In [None]:
# Import LinearRegression and LogisticRegression
from sklearn.linear_model import LinearRegression, LogisticRegression

Fit a linear model, make predictions, and plot them.

In [None]:
# Linear model
model = LinearRegression()
model.fit(X, y)

# Plot dataset and predictions
plt.scatter(X, y)
plt.plot(X, model.predict(X), 'k--')
plt.show()
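
One reason linear regression is a poor fit for this task is that its predictions are not confined to the range of a probability. A minimal sketch on the same toy dataset (rebuilt here so the snippet is self-contained):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Rebuild the toy dataset
x = np.linspace(0, 1, 100)
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)
y = ((x + noise) > 0.5).astype(int)
X = x.reshape(-1, 1)

# Fit the linear model and inspect the range of its predictions
linear_pred = LinearRegression().fit(X, y).predict(X)

print('Smallest prediction:', round(linear_pred.min(), 3))
print('Largest prediction: ', round(linear_pred.max(), 3))
```

Values below 0 or above 1 cannot be read as probabilities, which is the gap logistic regression fills.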

Next, let's see how **logistic regression** differs.

Let's fit a logistic regression model.

In [None]:
# Logistic regression
model = LogisticRegression()
model.fit(X, y)
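
Under the hood, a fitted binary logistic regression passes a linear score through the sigmoid function to get a probability. The sketch below checks that relationship against `predict_proba()`. The `sigmoid` helper is written here for illustration and is not part of Scikit-Learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rebuild the toy dataset
x = np.linspace(0, 1, 100)
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)
y = ((x + noise) > 0.5).astype(int)
X = x.reshape(-1, 1)

model = LogisticRegression()
model.fit(X, y)

def sigmoid(z):
    """Map any real-valued score into the interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Linear score: coefficient * x + intercept
score = X @ model.coef_.ravel() + model.intercept_[0]
manual_proba = sigmoid(score)

# Compare with the positive-class column from Scikit-Learn
matches = np.allclose(manual_proba, model.predict_proba(X)[:, 1])
print('Manual sigmoid matches predict_proba:', matches)
```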

Next, let's call the <code style="color:steelblue">.predict()</code> method, which returns a predicted class (0 or 1) for each observation.

In [None]:
# predict()
model.predict(X)

Call <code style="color:steelblue">.predict_proba()</code> on the first 10 observations and display the results.

In [None]:
# predict_proba()
pred = model.predict_proba(X[:10])

pred
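
Two properties of these outputs are worth checking for yourself. As a sketch on the rebuilt toy dataset: each row of `predict_proba()` sums to 1, and `predict()` returns the positive class whenever its probability is above 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rebuild the toy dataset
x = np.linspace(0, 1, 100)
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)
y = ((x + noise) > 0.5).astype(int)
X = x.reshape(-1, 1)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)

# Column 0 is the negative class, column 1 is the positive class
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1)

# Thresholding the positive-class probability reproduces predict()
thresholded = (proba[:, 1] > 0.5).astype(int)
same_as_predict = np.array_equal(thresholded, model.predict(X))

print('Rows sum to 1:', rows_sum_to_one)
print('Threshold at 0.5 matches predict():', same_as_predict)
```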

Get the predictions for the first observation.

In [None]:
# Class probabilities for first observation
pred[0]

Get the probability of **just the positive class** for the first observation.

In [None]:
# Positive class probability for first observation
pred[0][1]

Use a simple list comprehension to extract a **list of only the predictions for the positive class**.

In [None]:
# Just get the second value (positive class) for each prediction
pred = [p[1] for p in pred]

pred
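
Since `predict_proba()` returns a NumPy array, column slicing is an alternative to a list comprehension. A small sketch, with a hand-written array standing in for the model output:

```python
import numpy as np

# Hypothetical probabilities: column 0 = negative class, column 1 = positive class
proba = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.2, 0.8]])

# List comprehension: take the second value of each row
positive_list = [p[1] for p in proba]

# Equivalent NumPy slice: every row, second column
positive_array = proba[:, 1]

print(positive_array)
```

The slice keeps the result as an array, which is convenient if you want to do further arithmetic on it.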

Ok, let's fit and plot the logistic regression model.

In [None]:
# Logistic regression
model = LogisticRegression()
model.fit(X, y)

# Predict probabilities
pred = model.predict_proba(X)

# Just get the second value (positive class) for each prediction
pred = [p[1] for p in pred]

# Plot dataset and predictions
plt.scatter(X, y)
plt.plot(X, pred, 'k--')
plt.show()

<p style="text-align:center; margin: 40px 0 40px 0; font-weight:bold;">
[Back to Contents](#toc)
</p>

<span id="regularized-logistic"></span>
# 4. Regularized logistic regression

Logistic regression has regularized versions that are analogous to those for linear regression.

Just to save ourselves from repeating the same code, let's write a quick helper function that:
1. Fits any classification model
2. Makes predictions
3. Extracts the positive probabilities
4. Plots them

In [None]:
def fit_and_plot_classifier(clf):
    # Fit model
    clf.fit(X, y)
    
    # Predict and take second value of each prediction
    pred = clf.predict_proba(X)
    pred = [p[1] for p in pred]
    
    # Plot
    plt.scatter(X, y)
    plt.plot(X, pred, 'k--')
    plt.show()
    
    # Return fitted model and predictions
    return clf, pred

Fit and plot the same logistic regression from earlier, this time using <code style="color:steelblue">fit_and_plot_classifier()</code>.

In [None]:
# Logistic regression
clf, pred = fit_and_plot_classifier(LogisticRegression())

Make the penalty **4 times stronger**.

In [None]:
# More regularization
# C : float, default: 1.0
# Inverse of regularization strength; must be a positive float. 
# Like in support vector machines, smaller values specify stronger regularization.
clf, pred = fit_and_plot_classifier(LogisticRegression(C=0.25))

Next, make the penalty **4 times weaker**.

In [None]:
# Less regularization
clf, pred = fit_and_plot_classifier(LogisticRegression(C=4))

To basically remove regularization, bump <code style="color:steelblue">C</code> way up.

In [None]:
# Basically no regularization
clf, pred = fit_and_plot_classifier(LogisticRegression(C=10000))
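
To connect `C` to what the model learns, this sketch fits the same four settings used above and records the learned coefficient. A stronger penalty (smaller `C`) should shrink the coefficient toward zero, which is why the fitted curve looks flatter. The `coefs` dictionary is just an illustrative container.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rebuild the toy dataset
x = np.linspace(0, 1, 100)
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)
y = ((x + noise) > 0.5).astype(int)
X = x.reshape(-1, 1)

# Fit one model per penalty strength and keep its single coefficient
C_values = [0.25, 1, 4, 10000]
coefs = {}
for C in C_values:
    clf = LogisticRegression(C=C)
    clf.fit(X, y)
    coefs[C] = clf.coef_[0][0]
    print('C =', C, '-> coefficient =', round(coefs[C], 3))
```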

Set the **penalty type** to use $L_1$ regularization.

In [None]:
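# L1 regularization
# The 'liblinear' solver supports L1 penalties (the 'lbfgs' solver does not)
clf, pred = fit_and_plot_classifier(LogisticRegression(penalty='l1', solver='liblinear'))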

Initialize $L_1$-regularized and $L_2$-regularized logistic regression **separately** and **explicitly**.

**penalty** : str, ‘l1’ or ‘l2’, default: ‘l2’
Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties.

**random_state** : int, RandomState instance or None, optional, default: None
The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when solver == ‘sag’ or ‘liblinear’.



In [None]:
# L1-regularized logistic regression (needs a solver that supports L1, such as 'liblinear')
l1 = LogisticRegression(penalty='l1', solver='liblinear', random_state=123)

# L2-regularized logistic regression
l2 = LogisticRegression(penalty='l2', random_state=123)

Finally, use $L_1$-regularization with a 4 times weaker penalty.

In [None]:
# L1 regularization with weaker penalty ('liblinear' supports L1)
clf, pred = fit_and_plot_classifier(LogisticRegression(penalty='l1', solver='liblinear', C=4))
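
With a single input feature, $L_1$ and $L_2$ penalties look very similar. The practical difference tends to show up with many features, where $L_1$ can push some coefficients to exactly zero. The sketch below illustrates this on a hypothetical dataset (`X_demo`, `y_demo`) with 1 informative feature and 5 pure-noise features. The dataset and the choice of `C=0.05` are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset: column 0 is informative, columns 1-5 are noise
rng = np.random.RandomState(123)
X_demo = rng.normal(size=(200, 6))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Same penalty strength, different penalty type
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.05, random_state=123)
l2 = LogisticRegression(penalty='l2', solver='liblinear', C=0.05, random_state=123)
l1.fit(X_demo, y_demo)
l2.fit(X_demo, y_demo)

# Count coefficients that are exactly zero under each penalty
l1_zeros = int(np.sum(l1.coef_ == 0))
l2_zeros = int(np.sum(l2.coef_ == 0))

print('L1 coefficients:', np.round(l1.coef_[0], 3))
print('L2 coefficients:', np.round(l2.coef_[0], 3))
print('Exact zeros (L1 vs. L2):', l1_zeros, l2_zeros)
```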

<p style="text-align:center; margin: 40px 0 40px 0; font-weight:bold;">
[Back to Contents](#toc)
</p>

<span id="tree-ensembles"></span>
# 5. Tree ensemble algorithms

The same tree ensembles we used for regression can be applied to classification. 

First, import the random forest classifier.

In [None]:
# Import RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier

Apply it to this toy problem.

**n_estimators** : integer, optional (default=10 in older Scikit-Learn releases; 100 since version 0.22)
The number of trees in the forest.

In [None]:
# Random forest classifier
clf, pred = fit_and_plot_classifier(RandomForestClassifier(n_estimators=100))
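
In Scikit-Learn's implementation, a random forest's predicted probability is the average of the probabilities from its individual trees. The sketch below checks this on the toy dataset. `random_state=123` is added here only to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Rebuild the toy dataset
x = np.linspace(0, 1, 100)
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)
y = ((x + noise) > 0.5).astype(int)
X = x.reshape(-1, 1)

forest = RandomForestClassifier(n_estimators=100, random_state=123)
forest.fit(X, y)

# Collect each tree's class probabilities, then average across trees
tree_probas = np.array([tree.predict_proba(X) for tree in forest.estimators_])
manual_average = tree_probas.mean(axis=0)

forest_matches = np.allclose(manual_average, forest.predict_proba(X))
print('Average of trees matches the forest:', forest_matches)
```

Averaging many trees is what turns the hard 0/1 steps of a single tree into the smoother, staircase-like curve in the plot.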

Next, import the boosted tree classifier.

In [None]:
# Import GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingClassifier

And finally, apply it to this toy problem.

In [None]:
# Boosted tree classifier
clf, pred = fit_and_plot_classifier(GradientBoostingClassifier(n_estimators=100))
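
Boosted trees are built one after another, with each new tree trying to correct the errors of the ones before it. As a rough sketch of that idea, the snippet below uses `staged_predict_proba()` to measure the training log loss after each tree is added. `random_state=123` is an assumption added for repeatability.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

# Rebuild the toy dataset
x = np.linspace(0, 1, 100)
np.random.seed(555)
noise = np.random.uniform(-0.2, 0.2, 100)
y = ((x + noise) > 0.5).astype(int)
X = x.reshape(-1, 1)

booster = GradientBoostingClassifier(n_estimators=100, random_state=123)
booster.fit(X, y)

# Training log loss after 1 tree, 2 trees, ..., 100 trees
losses = [log_loss(y, proba) for proba in booster.staged_predict_proba(X)]

print('After 1 tree:   ', round(losses[0], 3))
print('After 100 trees:', round(losses[-1], 3))
```

Note that this is the loss on the training data. A falling training loss does not by itself mean the model generalizes better, which is why the project evaluates models on held-out data.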

<p style="text-align:center; margin: 40px 0 40px 0; font-weight:bold;">
[Back to Contents](#toc)
</p>

### Next Steps

Alright, that was a nice tour through some key theory and concepts, but let's get ready to dive back into the project!

As a reminder, here are a few things you did in this lesson:
* You learned some key terminology for binary classification, such as "positive" vs. "negative" classes.
* You saw how logistic regression can also be regularized.
* You played around with different settings for penalty strength.
* And you recruited 4 algorithms: $L_1$-Regularized Logistic, $L_2$-Regularized Logistic, Random Forests, and Boosted Trees.


<p style="text-align:center; margin: 40px 0 40px 0; font-weight:bold;">
[Back to Contents](#toc)
</p>