# Classification and Logistic Regression

The classification problem is much like the [regression problem](/notebooks/machine-learning/supervised-learning/linear-regression.ipynb), except that the values $y$ we want to predict take only a small number of discrete values.

## Logistic regression

Using the [linear regression](/notebooks/machine-learning/supervised-learning/linear-regression.ipynb) approach to predict $y$ given $x$ may perform very poorly. The example below shows why.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

# First fit: the six original training points
reg = linear_model.LinearRegression()

x = np.array([2, 3, 4, 5, 6, 7])
y = np.array([0, 0, 0, 1, 1, 1])

# scikit-learn expects a 2-D feature matrix, one row per sample
reg.fit(x.reshape(-1, 1), y)
lx = np.linspace(1, 8, 10)
ly = reg.intercept_ + reg.coef_[0] * lx

plt.plot(lx, ly, 'blue')
plt.plot(x, y, 'bo')

# Second fit: the same data plus the extra point (15, 1)
reg = linear_model.LinearRegression()

x = np.array([2, 3, 4, 5, 6, 7, 15])
y = np.array([0, 0, 0, 1, 1, 1, 1])

reg.fit(x.reshape(-1, 1), y)
lx = np.linspace(1, 15, 10)
ly = reg.intercept_ + reg.coef_[0] * lx

plt.plot(lx, ly, 'g')
plt.plot([15], [1], 'go')
plt.show()

Suppose that we are performing linear regression over a training set containing 6 entries such that:

| y | 0 | 0 | 0 | 1 | 1 | 1 |
|:-:|---|---|---|---|---|---|
| x | 2 | 3 | 4 | 5 | 6 | 7 |

The fit produces the blue line, which is a reasonable predictor if, for example, we predict 1 whenever the line's value is at least 0.5. Now add a new point to the table, say $(15, 1)$. It agrees with the existing pattern, so it should not change our classifier, yet it flattens the fitted line and changes its predictions (see the green line).
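To make the shift concrete, the sketch below computes where each fitted line crosses 0.5. The 0.5 threshold and the helper name `crossing_point` are assumptions made for illustration, and `np.polyfit` is used here as a stand-in for the least-squares fit performed by `LinearRegression` above.

```python
import numpy as np

def crossing_point(x, y, threshold=0.5):
    """Fit y ~ slope * x + intercept by least squares and
    return the x at which the fitted line equals `threshold`."""
    slope, intercept = np.polyfit(x, y, 1)
    return (threshold - intercept) / slope

# Original six training points
x1 = np.array([2, 3, 4, 5, 6, 7], dtype=float)
y1 = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# Same data with the extra point (15, 1)
x2 = np.append(x1, 15.0)
y2 = np.append(y1, 1.0)

boundary_before = crossing_point(x1, y1)
boundary_after = crossing_point(x2, y2)

print(boundary_before)  # approximately 4.5
print(boundary_after)   # approximately 5.11
```

Under this thresholding rule, the crossing point moves from about 4.5 to about 5.11, so the training point $x = 5$ (labeled 1) falls on the wrong side of the boundary after the new point is added.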

To fix this, let's change the form of our hypothesis $h_{\theta}(x)$. We will choose

$$h_{\theta}(x) = g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}}$$

where

$$g(z)=\frac{1}{1+e^{-z}}$$

is called the logistic function or the sigmoid function.
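
As a minimal sketch of these two formulas, the snippet below implements $g(z)$ and $h_{\theta}(x)$ with NumPy. The function names `sigmoid` and `hypothesis` and the parameter values in `theta` are hypothetical, chosen only for illustration; the feature vector is assumed to carry a leading 1 for the intercept term.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied elementwise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for a parameter vector and a feature vector."""
    return sigmoid(np.dot(theta, x))

# Hypothetical parameters (not fitted): x = [1, x_1], where the 1 is the intercept term
theta = np.array([-4.5, 1.0])

print(sigmoid(0.0))                               # 0.5
print(hypothesis(theta, np.array([1.0, 2.0])))    # below 0.5
print(hypothesis(theta, np.array([1.0, 15.0])))   # close to 1, but never above it
```

Because $g(z)$ tends to 0 as $z \to -\infty$ and to 1 as $z \to \infty$, the output always lies strictly between 0 and 1. A far-away input such as $x_1 = 15$ therefore saturates near 1 instead of growing without bound the way the linear prediction does.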