# Overview - Logistic Regression from Linear Regression

## How is it used?

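Logistic regression is used for classification: predicting which discrete class (group) an observation belongs to. Linear regression, by contrast, predicts a continuous value.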

## Recall Linear Regression

### Formula

$$ \hat y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_N x_N = \sum_{i=0}^{N} \beta_i x_i $$

where $x_0 = 1$, so the $i = 0$ term of the sum is the intercept $\beta_0$.
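As a quick numeric check of the formula, the sketch below evaluates $\hat y$ both ways with NumPy. The coefficients and the observation are made-up values for illustration only. Prepending $x_0 = 1$ is what lets the sum form pick up the intercept $\beta_0$.

```python
import numpy as np

# Hypothetical coefficients beta_0..beta_2 and one observation with two features
beta = np.array([0.5, 2.0, -1.0])   # [beta_0, beta_1, beta_2]
x = np.array([3.0, 4.0])            # [x_1, x_2]

# Prepend x_0 = 1 so that beta_0 is included in the sum
x_with_bias = np.concatenate(([1.0], x))

# Sum form: sum_{i=0}^{N} beta_i * x_i
y_hat_sum = np.dot(beta, x_with_bias)

# Expanded form: beta_0 + beta_1 * x_1 + beta_2 * x_2
y_hat_expanded = beta[0] + beta[1] * x[0] + beta[2] * x[1]

print(y_hat_sum, y_hat_expanded)
```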

## Classification: Use Logistic Regression

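Logistic regression models the probability that an observation belongs to a particular class (group). For two classes, this is the probability of the positive class.

It is built by transforming linear regression. The linear prediction $\hat y$ is passed through the logistic (sigmoid) function, which maps any real number into the interval $(0, 1)$ so the result can be read as a probability: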

$$ \hat y = \sum_{i=0}^{N} \beta_i x_i $$

$$ P = \displaystyle \frac{1}{1+e^{-\hat y}} = \frac{1}{1+e^{-\sum_{i=0}^{N} \beta_i x_i}} $$

$$ = \frac{1}{1+e^{-\beta_0}e^{-\beta_1 x_1}\ldots e^{-\beta_N x_N}} $$
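Here is a minimal sketch of the transformation, reusing made-up coefficients. It applies the logistic function to $\hat y$ and checks that the product form of the denominator gives the same probability. The helper name `sigmoid` is our own, not a library function.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and one observation (x_0 = 1 for the intercept)
beta = np.array([0.5, 2.0, -1.0])   # [beta_0, beta_1, beta_2]
x = np.array([1.0, 3.0, 4.0])       # [x_0, x_1, x_2]

y_hat = np.dot(beta, x)             # linear predictor
p = sigmoid(y_hat)                  # probability of the positive class

# Product form: 1 / (1 + e^{-beta_0 x_0} e^{-beta_1 x_1} ... e^{-beta_N x_N})
p_product = 1.0 / (1.0 + np.prod(np.exp(-beta * x)))

print(round(p, 4), np.isclose(p, p_product))
```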

# Implementing Logistic Regression

In [None]:
import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

## Play with some data

In [None]:
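# Load the iris dataset (150 flowers, 4 measurements, 3 species)
from sklearn import datasets
iris = datasets.load_iris()

# np.c_ stacks the feature matrix and the target column side by side;
# because the features are floats, the target labels become floats too
df = pd.DataFrame(
    data=np.c_[iris['data'], iris['target']],
    columns=iris['feature_names'] + ['target']
)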
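An alternative way to load the same table assumes scikit-learn 0.23 or later, where `load_iris` accepts `as_frame=True`. The returned `frame` already holds the four feature columns plus a `target` column. Unlike the `np.c_` approach, which upcasts the target to float, this keeps the labels as integers. `df_alt` and `iris_bunch` are illustrative names.

```python
from sklearn import datasets

# as_frame=True returns pandas objects instead of NumPy arrays
iris_bunch = datasets.load_iris(as_frame=True)
df_alt = iris_bunch.frame   # four feature columns plus a 'target' column

print(df_alt.shape)
print(df_alt.columns.tolist())
print(df_alt['target'].dtype)
```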

In [None]:
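from IPython.display import display  # built in inside Jupyter; explicit import for other environments

display(df.head())
display(df.describe())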

## Prepare the data for classification

In [None]:
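# Get the features and then the target
X = df.iloc[:, :-1]             # features: every column except the last
y = df.iloc[:, -1].astype(int)  # target: last column, cast back to integer class labels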

In [None]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=27)
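If the test set should keep the same class balance as the full dataset, `train_test_split` also accepts a `stratify` argument. The sketch below is a variation on the split above, not what this notebook uses, and its variable names are hypothetical. Iris has 50 flowers per class, so a stratified 20% test set should contain 10 of each.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris_frame = datasets.load_iris(as_frame=True)
X_all, y_all = iris_frame.data, iris_frame.target

# stratify=y_all keeps the class proportions of y_all in both splits
X_tr_strat, X_te_strat, y_tr_strat, y_te_strat = train_test_split(
    X_all, y_all, test_size=0.2, random_state=27, stratify=y_all
)

# Count of each class label in the stratified test set
print(y_te_strat.value_counts().sort_index())
```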

## Create the logistic regression model

In [None]:
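# fit_intercept=False fixes beta_0 at 0, so the model has no intercept term
# C is the inverse of the regularization strength; C=1e12 makes the penalty negligible
# max_iter is raised above the default of 100 to give lbfgs room to converge
logreg = LogisticRegression(fit_intercept=False, C=1e12, solver='lbfgs', max_iter=1000)
# fit() returns the fitted estimator itself, so model_log is the same object as logreg
model_log = logreg.fit(X_train, y_train)
model_log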
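The formula $P = 1/(1+e^{-\hat y})$ describes the two-class case. With three iris species and `solver='lbfgs'`, scikit-learn fits a multinomial model that generalizes the sigmoid to a softmax over classes, so the formula does not apply directly to `logreg` above. The sketch below instead fits a hypothetical binary problem (virginica vs. the other two species) with default settings, intercept included. It then checks that applying the sigmoid to $\hat y = \beta_0 + \sum_i \beta_i x_i$ reproduces `predict_proba`.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X_bin, y_multi = datasets.load_iris(return_X_y=True)

# Hypothetical binary target for illustration: 1 if virginica (class 2), else 0
y_bin = (y_multi == 2).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X_bin, y_bin)

# Linear predictor y_hat = beta_0 + sum_i beta_i x_i for every row
y_hat = X_bin @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-y_hat))

# Column 1 of predict_proba is the probability of class 1
p_sklearn = clf.predict_proba(X_bin)[:, 1]

print(np.allclose(p_manual, p_sklearn))
```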

In [None]:
# Predicted class labels for the test and training features
y_hat_test = logreg.predict(X_test)
y_hat_train = logreg.predict(X_train)
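`predict` returns only the most likely class. The sketch below shows how that label relates to the class probabilities. It refits a model with default regularization and an intercept, so its numbers will differ from `logreg`'s. Each row of `predict_proba` sums to 1, and taking the column with the highest probability recovers the predicted label.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_all, y_all = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=27)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# One row per test flower, one column per class (in clf.classes_ order)
proba = clf.predict_proba(X_te)

# Each row is a probability distribution over the three classes
print(np.allclose(proba.sum(axis=1), 1.0))

# predict() picks the class with the highest probability
print(np.array_equal(clf.classes_[proba.argmax(axis=1)], clf.predict(X_te)))
```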

## Evaluate the model

### Training Set

In [None]:
# Was our model correct? True where the prediction matches the label
correct = y_train == y_hat_train

print('Number of values correctly predicted:')
print(pd.Series(correct).value_counts())

In [None]:
print('Proportion of values correctly predicted:')
print(pd.Series(correct).value_counts(normalize=True))
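The normalized `True` count above is the model's accuracy. The sketch below uses a refit model with default settings, so the exact value may differ from `logreg`'s. It checks that the mean of the boolean comparison, `sklearn.metrics.accuracy_score`, and the estimator's `score` method all agree. The same check applies to the test set.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_all, y_all = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=27)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred_tr = clf.predict(X_tr)

# Fraction of True values in the element-wise comparison
manual = np.mean(y_tr == y_pred_tr)
via_metric = accuracy_score(y_tr, y_pred_tr)
via_score = clf.score(X_tr, y_tr)   # score() reports accuracy for classifiers

print(manual, via_metric, via_score)
```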

### Testing Set

In [None]:
# True where the prediction matches the label
correct = y_test == y_hat_test

In [None]:
print('Number of values correctly predicted:')
print(pd.Series(correct).value_counts())

In [None]:
print('Proportion of values correctly predicted:')
print(pd.Series(correct).value_counts(normalize=True))