# Logistic Regression

### Classification

- Examples of classification problems in ML:
    - Email: Spam/Not Spam?
    - Online Transactions: Fraudulent (Yes/No)?
    - Tumor: Malignant/Benign?
- $y \in \{0, 1\}$
    - 0: Negative Class
    - 1: Positive Class
- If we use linear regression ($h_\mathsf{\theta}(x) = \mathsf{\theta^T}x$) and threshold the classifier output $h_\mathsf{\theta}(x)$ at 0.5, we get:
    - if $h_\mathsf{\theta}(x)\geq$ 0.5, predict "y=1".
    - if $h_\mathsf{\theta}(x) < 0.5$, predict "y=0".
    - However, outliers in the training set can shift the fitted line, and with it the 0.5 threshold, so that some data points end up misclassified. Linear regression can also output values greater than 1 or less than 0, even though $y \in \{0, 1\}$. For these reasons, linear regression is usually not a good choice for classification problems.
    
- Logistic regression, despite its name, is a classification algorithm whose output always satisfies $0 \leq h_\mathsf{\theta}(x) \leq 1$.
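The outlier problem described above can be reproduced with a tiny one-feature example. This is a minimal sketch, assuming made-up tumor sizes and an ordinary least-squares fit; the helper names `fit_line` and `predict` are hypothetical and chosen only for illustration.

```python
# Minimal sketch: using a least-squares line, thresholded at 0.5, as a classifier.
# The tumor sizes and labels below are made up purely for illustration.

def fit_line(xs, ys):
    """Ordinary least-squares fit of h(x) = b0 + b1 * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict(b0, b1, x):
    """Threshold classifier: predict 1 if h(x) >= 0.5, else 0."""
    return 1 if b0 + b1 * x >= 0.5 else 0

sizes = [1, 2, 3, 4, 5, 6]    # hypothetical tumor sizes
labels = [0, 0, 0, 1, 1, 1]   # 1 = malignant

b0, b1 = fit_line(sizes, labels)
clean_preds = [predict(b0, b1, x) for x in sizes]
print(clean_preds)  # → [0, 0, 0, 1, 1, 1], every point classified correctly

# Add one very large, correctly labeled malignant tumor (an outlier in x).
sizes_out = sizes + [30]
labels_out = labels + [1]
b0_out, b1_out = fit_line(sizes_out, labels_out)
outlier_preds = [predict(b0_out, b1_out, x) for x in sizes_out]
print(outlier_preds)  # → [0, 0, 0, 0, 1, 1, 1], the size-4 tumor is now called benign

# Tumor size at which each fitted line crosses 0.5 (the implied threshold).
print(round((0.5 - b0) / b1, 1), round((0.5 - b0_out) / b1_out, 1))  # → 3.5 4.5
```

The outlier is labeled correctly, yet it flattens the fitted line enough to move the threshold from about 3.5 to about 4.5, which flips the prediction for the size-4 tumor.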

    

### Hypothesis Representation for Logistic Regression

- $h_\mathsf{\theta}(x) = g(\mathsf{\theta^T}x)$
    - $g(z)=\frac{1}{1+e^{-z}}$
    - $g(z)$ is the sigmoid/logistic function
    - an alternate form of the hypothesis is: $h_\mathsf{\theta}(x) = \frac{1}{1+e^{-{\mathsf{\theta^T}x}}}$
- Interpretation of Hypothesis Output:
    - $h_\mathsf{\theta}(x)$ = estimated probability that y = 1 on input x
    - Example:
        - if $x = \begin{bmatrix}x_0 \\ x_1\end{bmatrix} = \begin{bmatrix}1 \\ \text{tumorSize}\end{bmatrix}$ and $h_\mathsf{\theta}(x) = 0.7$,
        - you would interpret the result as a 70% chance that the tumor is malignant (y = 1).
    - Formally, this probability is written as $h_\mathsf{\theta}(x) = P(y=1 \mid x;\mathsf{\theta})$, which is read as "the probability that y = 1, given x, parameterized by $\mathsf{\theta}$".
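The hypothesis and its probabilistic reading can be sketched in plain Python. This is an illustrative sketch rather than a reference implementation: the helper names `sigmoid` and `hypothesis` are our own, and the values in `theta` and the tumor size are hypothetical, picked so that the output lands near the 0.7 of the example. Because y can only be 0 or 1, the probability of the negative class is $1 - h_\mathsf{\theta}(x)$.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equal form e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x must already include the bias term x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical parameters and input, with x = [x_0, tumorSize].
theta = [-2.0, 0.5]
x = [1.0, 5.7]

p_malignant = hypothesis(theta, x)  # P(y = 1 | x; theta), about 0.70 here
p_benign = 1.0 - p_malignant        # P(y = 0 | x; theta)
print(f"P(y=1) = {p_malignant:.2f}, P(y=0) = {p_benign:.2f}")
```

Branching on the sign of z only guards against `math.exp` overflowing for large negative inputs; both branches compute the same function.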

### Decision Boundary

- Suppose we predict "y=1" if $h_\mathsf{\theta}(x) \geq 0.5$ and predict "y=0" if $h_\mathsf{\theta}(x) < 0.5$.
- The sigmoid function satisfies $g(z) \geq 0.5$ exactly when $z \geq 0$, because $g(z) \geq 0.5 \iff 1 + e^{-z} \leq 2 \iff e^{-z} \leq 1 \iff z \geq 0$. Therefore, $h_\mathsf{\theta}(x) = g(\mathsf{\theta^T}x) \geq 0.5$ whenever $\mathsf{\theta^T}x \geq 0$.
    - Conversely, $h_\mathsf{\theta}(x) = g(\mathsf{\theta^T}x) < 0.5$, and we predict "y=0", whenever $\mathsf{\theta^T}x < 0$.
- A training set is used to fit the parameters, but once $\mathsf{\theta}$ is chosen, the decision boundary (the set of points where $\mathsf{\theta^T}x = 0$) is a property of the parameter vector $\mathsf{\theta}$, **not** of the training set.
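The equivalence between thresholding $g(\mathsf{\theta^T}x)$ at 0.5 and checking the sign of $\mathsf{\theta^T}x$ can be checked numerically. A minimal sketch, assuming hypothetical parameters $\mathsf{\theta} = [-3, 1, 1]^T$ for two features, which place the decision boundary on the line $x_1 + x_2 = 3$; the function names are illustrative. No training data appears anywhere, since the boundary depends only on $\mathsf{\theta}$.

```python
import math

def sigmoid(z):
    # Plain form is fine here: the z values below are small.
    return 1.0 / (1.0 + math.exp(-z))

def linear_term(theta, x):
    """Compute theta^T x; x includes the bias term x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))

def predict_via_sigmoid(theta, x):
    """Predict 1 when h_theta(x) = g(theta^T x) >= 0.5."""
    return 1 if sigmoid(linear_term(theta, x)) >= 0.5 else 0

def predict_via_linear_term(theta, x):
    """Equivalent rule: predict 1 when theta^T x >= 0, no sigmoid needed."""
    return 1 if linear_term(theta, x) >= 0 else 0

# Hypothetical parameters: theta^T x = -3 + x1 + x2, boundary x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]

points = [(1.0, 1.0), (2.0, 2.0), (3.0, 0.0), (0.5, 2.0), (4.0, 1.0)]
results = []
for x1, x2 in points:
    x = [1.0, x1, x2]
    results.append((predict_via_sigmoid(theta, x), predict_via_linear_term(theta, x)))

print(results)  # → [(0, 0), (1, 1), (1, 1), (0, 0), (1, 1)]
```

The point (3, 0) lies exactly on the boundary, where $\mathsf{\theta^T}x = 0$ and $g(0) = 0.5$, so both rules predict "y=1" there.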

### Cost Function for Logistic Regression