# Linear Classifiers in Python

#### Logistic Regression
* Logistic Regression is a linear classifier
* sklearn's Logistic Regression can also output confidence scores rather than "hard" or definite predictions with `.predict_proba()`

```
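from sklearn.linear_model import LogisticRegression

# Assumes X_train, X_test, y_train, y_test already exist (e.g. from a train/test split)
lr = LogisticRegression()
lr.fit(X_train, y_train)   # learn the coefficients from the training data
lr.predict(X_test)         # "hard" class predictions
lr.score(X_test, y_test)   # mean accuracy on the test set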
```
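
To make the difference between hard predictions and confidence scores concrete, here is a minimal sketch. The synthetic data from `make_classification` and the variable names `hard` and `proba` are assumptions made for illustration; they are not part of the original notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption; any labeled dataset would do)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LogisticRegression()
lr.fit(X_train, y_train)

hard = lr.predict(X_test[:5])          # one class label per example
proba = lr.predict_proba(X_test[:5])   # one row per example, one column per class

print(hard)
print(proba)
```

Each row of `proba` sums to 1, and the column with the larger value corresponds to the label returned by `.predict()`.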

#### Using Linear SVC
* In sklearn the basic SVM classifier is called `LinearSVC()` or Linear Support Vector Classifier
* Note that sklearn's Logistic Regression and SVM implementations handle multiple classes (if a dataset has more than 2 classes) automatically.
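
As a rough illustration of the multi-class behavior, the sketch below fits both classifiers on a three-class dataset without any extra configuration. The choice of `load_wine` and the feature scaling step are assumptions made for this example, not something the notes prescribe.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Wine dataset: 13 features, 3 classes (chosen only for illustration)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features so both solvers converge more easily
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {"LogisticRegression": LogisticRegression(), "LinearSVC": LinearSVC()}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    # coef_ has one row of coefficients per class
    print(name, model.classes_, model.coef_.shape, model.score(X_test_s, y_test))
```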

#### Using SVC
* The SVC class fits a nonlinear SVM by default

* **Underfitting:** model is too simple, so even training accuracy is low
* **Overfitting:** model is too complex, so training accuracy is high but testing accuracy is low
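
One way to see both effects is to compare training and testing accuracy as the model gets more flexible. The sketch below varies the `gamma` parameter of `SVC`. The synthetic `make_moons` data and the three `gamma` values are arbitrary assumptions chosen for illustration: a very small `gamma` tends to underfit, and a very large one tends to overfit.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, noisy two-class data (an assumption made for illustration)
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# gamma controls how flexible the RBF boundary is; these values are arbitrary
results = {}
for gamma in [1e-4, 1.0, 1e4]:
    svm = SVC(gamma=gamma).fit(X_train, y_train)
    results[gamma] = (svm.score(X_train, y_train), svm.score(X_test, y_test))

for gamma, (train_acc, test_acc) in results.items():
    print(f"gamma={gamma:g}: train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

A large gap between training and testing accuracy suggests overfitting, while low accuracy on both suggests underfitting.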

#### Linear Decision Boundaries
* A decision boundary tells us what class our classifier will predict for any value of x
* A decision boundary is considered **linear** when it is a line (in any orientation)
    * This definition extends to (classifying) more than 2 features
    * For five features, the space of possible x-values would be five-dimensional. In this case, the boundary would be a higher-dimensional **hyperplane** cutting the space into two halves.
* A **nonlinear** boundary is any other type of boundary.
    * Sometimes this leads to non-contiguous regions of a certain prediction ("islands", etc.).
* In their basic forms, logistic regression and SVMs are linear classifiers, which means they learn linear decision boundaries.
    * However in some more complex forms, both may learn nonlinear decision boundaries

#### Vocabulary:
* **Classification:** learning to predict categories
* **Regression:** learning to predict a continuous value
* **Decision boundary:** the surface separating different predicted classes
* **Linear classifier:** a classifier that learns linear decision boundaries 
    * e.g. logistic regression, linear SVM
* **Linearly separable:** A data set is called linearly separable if a linear classifier can classify every example correctly, i.e. a linear boundary **(a straight line, in two dimensions)** separates the classes perfectly

```
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.neighbors import KNeighborsClassifier

# Define the classifiers
classifiers = [LogisticRegression(), LinearSVC(), SVC(), KNeighborsClassifier()]

# Fit the classifiers (X and y are assumed to be already loaded)
for c in classifiers:
    c.fit(X, y)

# Plot the classifiers; plot_4_classifiers is a helper assumed to be defined elsewhere
plot_4_classifiers(X, y, classifiers)
plt.show()
```

#### Linear Classifiers: Prediction Equations

#### Dot products
* Create two arrays, x and y:

```
import numpy as np

x = np.arange(3)
y = np.arange(3, 6)
```
* `x = array([0, 1, 2])`
* `y = array([3, 4, 5])`

* To take the **dot product** of these two arrays, multiply them element-wise and then sum the results:
    * First, `x * y` gives `array([0, 4, 10])`
    * Then, the sum of the numbers in this array (0 + 4 + 10), or `np.sum(x*y)`, gives `14`
* A convenient notation for this is `@`
    * `x@y` = 14
    * In math notation, this is written x dot y
* You can think of a **dot product** as multiplication in higher dimensions, since x and y are arrays of values
* Using dot products, we can express how linear classifiers make predictions 
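
The arithmetic above can be checked directly in NumPy. This is a small sketch; the variable names `elementwise`, `dot_via_sum`, `dot_via_at`, and `dot_via_np` are made up for illustration.

```python
import numpy as np

x = np.arange(3)       # array([0, 1, 2])
y = np.arange(3, 6)    # array([3, 4, 5])

elementwise = x * y             # element-wise products: array([0, 4, 10])
dot_via_sum = np.sum(x * y)     # sum of the products: 14
dot_via_at = x @ y              # same value using the @ operator
dot_via_np = np.dot(x, y)       # same value using np.dot

print(elementwise, dot_via_sum, dot_via_at, dot_via_np)
```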

#### Linear classifier predictions:
* `raw model output = coefficients · features + intercept`
    * That is, the dot product of the coefficients and the features, plus an intercept.
* Linear classifier prediction: compute raw model output, check the **sign**:
    * If **positive**, predict one class
    * If **negative**, predict the other class
    
* Crucially, this pattern is the same for logistic regression and linear SVMs
* In sklearn terms, we can say logistic regression and linear SVM have different `fit` functions but the same `predict` function.
    * The differences in `fit` relate to loss functions
    
* We can get the learned coefficients and intercept with:
    * `lr.coef_`
    * `lr.intercept_`
* To compute raw model output for example 10:
    * `lr.coef_ @ X[10] + lr.intercept_`
        * If the raw model output is negative, then we predict the negative class ("0", for example)
* In general, this is what the predict function does for *any* X: it computes the raw model output, checks whether it is positive or negative, and then returns a result based on the names of the classes in your data set (for example, "0" and "1").
* The sign (positive or negative) tells you which side of the decision boundary you're on, and thus your prediction
* Along the decision boundary itself, the raw model output is zero
* Furthermore, the values of the coefficients and intercept determine the boundary 
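
The prediction rule described above can be reproduced by hand for a binary problem. The sketch below assumes synthetic data from `make_classification`; the names `raw`, `manual`, and `matches` are made up for illustration. It applies the same rule to both logistic regression and a linear SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic binary data (an assumption made for illustration)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

matches = {}
for model in (LogisticRegression(), LinearSVC()):
    model.fit(X, y)
    # Raw model output for every row: dot product with the coefficients, plus the intercept
    raw = X @ model.coef_.ravel() + model.intercept_[0]
    # Positive raw output -> second class in classes_, otherwise the first
    manual = model.classes_[(raw > 0).astype(int)]
    matches[type(model).__name__] = bool(np.array_equal(manual, model.predict(X)))
    # decision_function exposes the same raw model output
    assert np.allclose(raw, model.decision_function(X))

print(matches)
```

Only the fitted values of `coef_` and `intercept_` differ between the two models; the sign check that turns raw output into a class label is identical.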