# Classification

Two main types of classification problems:
- Binary: exactly two classes to choose between (usually 0 and 1, true and false, or positive and negative)
- Multiclass (or multinomial): three or more classes to choose from

## Logistic Regression

Logistic regression is a fundamental classification technique. It belongs to the group of linear classifiers and is somewhat similar to polynomial and linear regression. Logistic regression is fast and relatively uncomplicated, and it’s convenient for you to interpret the results. Although it’s essentially a method for binary classification, it can also be applied to multiclass problems.

<img src="https://files.realpython.com/media/log-reg-1.e32deaa7cbac.png">

## Methodology

Logistic regression is a linear classifier, so you’ll use a linear function 𝑓(<b>𝐱</b>) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ, also called the <u>logit</u>. The variables 𝑏₀, 𝑏₁, …, 𝑏ᵣ are the estimators of the regression coefficients, which are also called the predicted weights or just coefficients.

The logistic regression function 𝑝(𝐱) is the sigmoid function of 𝑓(𝐱): 𝑝(𝐱) = 1 / (1 + exp(−𝑓(𝐱))). As such, it’s often close to either 0 or 1. The function 𝑝(𝐱) is often interpreted as the predicted probability that the output for a given 𝐱 is equal to 1. Therefore, 1 − 𝑝(𝐱) is the probability that the output is 0.
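To make the sigmoid concrete, here is a minimal sketch in plain Python for the single-feature case. The weights `b0` and `b1` are hypothetical values picked only for illustration; they don't come from a fitted model.

```python
import math

def sigmoid(f):
    """Logistic (sigmoid) function: maps any real logit f to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-f))

def predict_probability(x, b0, b1):
    """p(x) for a single feature x, using the logit f(x) = b0 + b1 * x."""
    return sigmoid(b0 + b1 * x)

# Hypothetical weights chosen only for illustration
b0, b1 = -3.0, 1.0

probabilities = [predict_probability(x, b0, b1) for x in range(7)]
for x, p in zip(range(7), probabilities):
    print(f"x={x}  p(x)={p:.3f}  1-p(x)={1 - p:.3f}")
```

With these weights the logit is zero at x = 3, so 𝑝(𝑥) is exactly 0.5 there and moves toward 0 or 1 on either side.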

Logistic regression determines the best predicted weights 𝑏₀, 𝑏₁, …, 𝑏ᵣ such that the function 𝑝(𝐱) is as close as possible to all actual responses 𝑦ᵢ, 𝑖 = 1, …, 𝑛, where 𝑛 is the number of observations. The process of calculating the best weights using available observations is called model training or fitting.

To get the best weights, you usually maximize the log-likelihood function (LLF) for all observations 𝑖 = 1, …, 𝑛. This method is called the maximum likelihood estimation and is represented by the equation LLF = Σᵢ(𝑦ᵢ log(𝑝(𝐱ᵢ)) + (1 − 𝑦ᵢ) log(1 − 𝑝(𝐱ᵢ))).

When 𝑦ᵢ = 0, the LLF for the corresponding observation is equal to log(1 − 𝑝(𝐱ᵢ)). If 𝑝(𝐱ᵢ) is close to 𝑦ᵢ = 0, then log(1 − 𝑝(𝐱ᵢ)) is close to 0. This is the result you want. If 𝑝(𝐱ᵢ) is far from 0, then log(1 − 𝑝(𝐱ᵢ)) drops significantly. You don’t want that result because your goal is to obtain the maximum LLF. Similarly, when 𝑦ᵢ = 1, the LLF for that observation is 𝑦ᵢ log(𝑝(𝐱ᵢ)). If 𝑝(𝐱ᵢ) is close to 𝑦ᵢ = 1, then log(𝑝(𝐱ᵢ)) is close to 0. If 𝑝(𝐱ᵢ) is far from 1, then log(𝑝(𝐱ᵢ)) is a large negative number.
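The behavior described above can be checked numerically. The sketch below computes the LLF for two hypothetical sets of predicted probabilities (`good_fit` and `poor_fit` are made-up values, not model output) and shows that probabilities close to the actual responses give the larger LLF.

```python
import math

def log_likelihood(y_true, p_pred):
    """Sum of y*log(p) + (1 - y)*log(1 - p) over all observations."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_pred)
    )

y_true = [0, 0, 1, 1]
good_fit = [0.1, 0.2, 0.8, 0.9]  # hypothetical probabilities close to the labels
poor_fit = [0.6, 0.7, 0.3, 0.4]  # hypothetical probabilities far from the labels

llf_good = log_likelihood(y_true, good_fit)
llf_poor = log_likelihood(y_true, poor_fit)
print("LLF (good fit):", llf_good)
print("LLF (poor fit):", llf_poor)
```

Both values are negative because every term is the logarithm of a number below 1. The better fit is the one closer to zero.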

There are several mathematical approaches that will calculate the best weights that correspond to the maximum LLF, but that’s beyond the scope of this tutorial. For now, you can leave these details to the logistic regression Python libraries you’ll learn to use here!

Once you determine the best weights that define the function 𝑝(𝐱), you can get the predicted outputs 𝑝(𝐱ᵢ) for any given input 𝐱ᵢ. For each observation 𝑖 = 1, …, 𝑛, the predicted output is 1 if 𝑝(𝐱ᵢ) > 0.5 and 0 otherwise. The threshold doesn’t have to be 0.5, but it usually is. You might define a lower or higher value if that’s more convenient for your situation.

There’s one more important relationship between 𝑝(𝐱) and 𝑓(𝐱), which is that log(𝑝(𝐱) / (1 − 𝑝(𝐱))) = 𝑓(𝐱). This equality explains why 𝑓(𝐱) is the logit. It implies that 𝑝(𝐱) = 0.5 when 𝑓(𝐱) = 0 and that the predicted output is 1 if 𝑓(𝐱) > 0 and 0 otherwise.
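As a quick check of this relationship, the following sketch confirms that the log-odds of 𝑝 recover 𝑓, and that thresholding 𝑝 at 0.5 is the same as checking the sign of 𝑓. The helper names (`logit`, `predict_label`) are illustrative, not part of any library.

```python
import math

def sigmoid(f):
    """p = 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + math.exp(-f))

def logit(p):
    """Log-odds log(p / (1 - p)), the inverse of the sigmoid."""
    return math.log(p / (1 - p))

def predict_label(f, threshold=0.5):
    """Return 1 if p(f) exceeds the threshold, otherwise 0."""
    return 1 if sigmoid(f) > threshold else 0

for f in (-2.0, 0.0, 1.5):
    p = sigmoid(f)
    print(f"f={f:+.1f}  p={p:.3f}  logit(p)={logit(p):+.3f}  label={predict_label(f)}")

# A stricter, hypothetical threshold turns some predicted ones into zeros
print(predict_label(1.0), predict_label(1.0, threshold=0.8))
```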

## Classification Performance
Binary classification has four possible types of results:
- True negatives: correctly predicted negatives (zeros)
- True positives: correctly predicted positives (ones)
- False negatives: incorrectly predicted negatives (zeros)
- False positives: incorrectly predicted positives (ones)

The most straightforward indicator of <b>classification accuracy</b> is the ratio of the number of correct predictions to the total number of predictions (or observations). Other indicators of binary classifiers include the following:
- The <b>positive predictive value</b>: the ratio of the number of true positives to the sum of the numbers of true and false positives
- The <b>negative predictive value</b>: the ratio of the number of true negatives to the sum of the numbers of true and false negatives
- The <b>sensitivity</b> (also known as <b>recall</b> or <b>true positive rate</b>): the ratio of the number of true positives to the number of actual positives
- The <b>specificity</b> (or true negative rate): the ratio of the number of true negatives to the number of actual negatives
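The indicators above follow directly from the four result counts. Here is a small sketch that computes them. The counts passed in are hypothetical, and the function assumes none of the denominators is zero.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute common binary-classification indicators from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical counts used only for illustration
metrics = binary_metrics(tp=6, tn=3, fp=1, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```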

## Single-Variate Logistic Regression

Single-variate logistic regression is the most straightforward case of logistic regression. There is only one independent variable (or feature), which is 𝐱 = 𝑥. This figure illustrates single-variate logistic regression:

<img src="https://files.realpython.com/media/log-reg-2.e88a21607ba3.png">

## Multi-Variate Logistic Regression

Multi-variate logistic regression has more than one input variable. This figure shows the classification with two independent variables, 𝑥₁ and 𝑥₂:

<img src="https://files.realpython.com/media/log-reg-3.b1634d335c4f.png">

Logistic regression determines the weights 𝑏₀, 𝑏₁, and 𝑏₂ that maximize the LLF. Once you have 𝑏₀, 𝑏₁, and 𝑏₂, you can get:

- The logit 𝑓(𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁𝑥₁ + 𝑏₂𝑥₂
- The probabilities 𝑝(𝑥₁, 𝑥₂) = 1 / (1 + exp(−𝑓(𝑥₁, 𝑥₂)))

The dash-dotted black line linearly separates the two classes. This line corresponds to 𝑝(𝑥₁, 𝑥₂) = 0.5 and 𝑓(𝑥₁, 𝑥₂) = 0.
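To see why the boundary is a straight line, set 𝑓(𝑥₁, 𝑥₂) = 0 and solve for 𝑥₂, which gives 𝑥₂ = −(𝑏₀ + 𝑏₁𝑥₁) / 𝑏₂ whenever 𝑏₂ ≠ 0. The sketch below uses hypothetical weights (not the ones behind the figure) to verify that points on this line have 𝑝 = 0.5.

```python
import math

def logit(x1, x2, b0, b1, b2):
    """f(x1, x2) = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

def probability(x1, x2, b0, b1, b2):
    """p(x1, x2) = 1 / (1 + exp(-f(x1, x2)))."""
    return 1.0 / (1.0 + math.exp(-logit(x1, x2, b0, b1, b2)))

def boundary_x2(x1, b0, b1, b2):
    """x2 on the decision boundary f = 0 for a given x1 (requires b2 != 0)."""
    return -(b0 + b1 * x1) / b2

# Hypothetical weights chosen only for illustration
b0, b1, b2 = -4.0, 1.0, 2.0

x1 = 2.0
x2_on_line = boundary_x2(x1, b0, b1, b2)
print("boundary point:", (x1, x2_on_line))
print("p on the line:", probability(x1, x2_on_line, b0, b1, b2))
print("p above the line:", probability(x1, x2_on_line + 1.0, b0, b1, b2))
print("p below the line:", probability(x1, x2_on_line - 1.0, b0, b1, b2))
```

Because 𝑏₂ is positive here, points above the line get 𝑝 > 0.5 (predicted 1) and points below it get 𝑝 < 0.5 (predicted 0).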

## Regularization

Overfitting is one of the most serious kinds of problems related to machine learning. It occurs when a model learns the training data too well. The model then learns not only the relationships among data but also the noise in the dataset. Overfitted models tend to have good performance with the data used to fit them (the training data), but they behave poorly with unseen data (or test data, which is data not used to fit the model).

<b><u>Regularization</u></b> normally tries to reduce or penalize the complexity of the model. Regularization techniques applied with logistic regression mostly tend to penalize large coefficients 𝑏₀, 𝑏₁, …, 𝑏ᵣ:

- L1 regularization penalizes the LLF with the scaled sum of the absolute values of the weights: |𝑏₀|+|𝑏₁|+⋯+|𝑏ᵣ|.
- L2 regularization penalizes the LLF with the scaled sum of the squares of the weights: 𝑏₀²+𝑏₁²+⋯+𝑏ᵣ².
- Elastic-net regularization is a linear combination of L1 and L2 regularization.
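The three penalties can be sketched in a few lines. This is an illustration of the formulas above, not a reproduction of any library's objective. The `strength` scaling factor and the way `l1_ratio` mixes the two terms are assumptions, and real implementations differ in their exact scaling and in whether the intercept is penalized.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(b) for b in weights)

def l2_penalty(weights):
    """Sum of squares of the weights."""
    return sum(b * b for b in weights)

def elastic_net_penalty(weights, l1_ratio):
    """Linear combination of the L1 and L2 penalties (l1_ratio between 0 and 1)."""
    return l1_ratio * l1_penalty(weights) + (1 - l1_ratio) * l2_penalty(weights)

def penalized_llf(llf, penalty, strength):
    """Objective to maximize: the LLF minus the scaled penalty."""
    return llf - strength * penalty

# Hypothetical weights and LLF used only for illustration
weights = [0.5, -2.0, 3.0]
llf = -0.66

print("L1:", l1_penalty(weights))
print("L2:", l2_penalty(weights))
print("Elastic net (l1_ratio=0.5):", elastic_net_penalty(weights, 0.5))
print("Penalized LLF (L2, strength=0.1):", penalized_llf(llf, l2_penalty(weights), 0.1))
```

Larger weights lower the penalized objective, so the optimizer is pushed toward smaller coefficients.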


Step 1: Import Packages, Functions, and Classes

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

Step 2: Get data

In [None]:
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

In [None]:
print(y)

Step 3: Create a Model and Train It

In [None]:
model = LogisticRegression(solver='liblinear', random_state = 0)

Optional parameters of LogisticRegression class
- <b>penalty</b>: a string ('l2' by default) that decides whether there is regularization and which approach to use. Other options are 'l1', 'elasticnet', and 'none'.
- <b>dual</b>: a Boolean (False by default) that decides whether to use the primal (when False) or dual formulation (when True).
- <b>tol</b>: a floating-point number (0.0001 by default) that defines the tolerance for stopping the procedure.
- <b>C</b>: a positive floating-point number (1.0 by default) that defines the relative strength of regularization. Smaller values indicate stronger regularization.
- <b>fit_intercept</b>: a Boolean (True by default) that decides whether to calculate the intercept 𝑏₀ (when True) or consider it equal to zero (when False).
- <b>intercept_scaling</b>: a floating-point number (1.0 by default) that defines the scaling of the intercept 𝑏₀.
- <b>class_weight</b>: a dictionary, 'balanced', or None (default) that defines the weights related to each class. When None, all classes have the weight one.
- <b>random_state</b>: an integer, an instance of numpy.random.RandomState, or None (default) that defines what pseudo-random number generator to use.
- <b>solver</b>: a string that decides what solver to use for fitting the model. The default is 'lbfgs' since scikit-learn 0.22 ('liblinear' in earlier versions). Other options are 'newton-cg', 'sag', and 'saga'.
- <b>max_iter</b>: an integer (100 by default) that defines the maximum number of iterations by the solver during model fitting.
- <b>multi_class</b>: a string that decides the approach to use for handling multiple classes: 'ovr', 'multinomial', or 'auto'. The default is 'auto' since scikit-learn 0.22 ('ovr' in earlier versions).
- <b>verbose</b>: a non-negative integer (0 by default) that defines the verbosity for the 'liblinear' and 'lbfgs' solvers.
- <b>warm_start</b>: a Boolean (False by default) that decides whether to reuse the previously obtained solution.
- <b>n_jobs</b>: an integer or None (default) that defines the number of parallel processes to use. None usually means to use one core, while -1 means to use all available cores.
- <b>l1_ratio</b>: a floating-point number between zero and one or None (default). It defines the relative importance of the L1 part in the elastic-net regularization.

In [None]:
model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)

In [None]:
#.classes_ holds the array of distinct values that y takes
print(model.classes_)

In [None]:
model.intercept_, model.coef_

Step 4: Evaluate the model

In [None]:
model.predict_proba(x)

In the matrix above, each row corresponds to a single observation. The first column is the probability of the predicted output being zero, that is 1 - 𝑝(𝑥). The second column is the probability that the output is one, or 𝑝(𝑥).

You can get the actual predictions, based on the probability matrix and the values of 𝑝(𝑥), with .predict():

In [None]:
model.predict(x)

<img src="https://files.realpython.com/media/log-reg-5.1e0f3f7e733a.png">
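The link between the probability matrix and the predictions can be reproduced by hand. The sketch below uses hypothetical values for the intercept and coefficient (stand-ins for `model.intercept_` and `model.coef_`, not the fitted values) and applies the 0.5 threshold to the second column.

```python
import math

def sigmoid(f):
    """p = 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + math.exp(-f))

# Hypothetical values standing in for model.intercept_ and model.coef_
b0, b1 = -1.0, 0.5

x = list(range(10))

# Each row mirrors a row of the probability matrix: [P(output = 0), P(output = 1)]
proba = [[1 - sigmoid(b0 + b1 * xi), sigmoid(b0 + b1 * xi)] for xi in x]

# Predict 1 when the second column exceeds 0.5, otherwise 0
y_pred = [1 if row[1] > 0.5 else 0 for row in proba]

for xi, row, label in zip(x, proba, y_pred):
    print(f"x={xi}  1-p(x)={row[0]:.3f}  p(x)={row[1]:.3f}  predicted={label}")
```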

In [None]:
#Obtain the accuracy of your model with .score(); this is the ratio of the number of correct predictions to the number of observations
model.score(x, y)

You can get more information on the accuracy of the model with a confusion matrix. In the case of binary classification, the confusion matrix shows the numbers of the following:
- True negatives in the upper-left position
- False negatives in the lower-left position
- False positives in the upper-right position
- True positives in the lower-right position

In [None]:
#Create confusion matrix
confusion_matrix(y, model.predict(x))
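To make the layout explicit, here is a minimal pure-Python sketch that builds the same 2×2 table by hand, with actual classes as rows and predicted classes as columns. The `y_pred` values are hypothetical, not the output of the model above.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns are predicted."""
    cm = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # hypothetical predictions

cm = confusion_counts(y_true, y_pred)
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)  # correct predictions sit on the diagonal

print(cm)
print("accuracy:", accuracy)
```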

In [None]:
#Visualize the confusion matrix:
cm = confusion_matrix(y, model.predict(x))

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(cm)
ax.grid(False)
ax.xaxis.set(ticks=(0, 1), ticklabels=('Predicted 0s', 'Predicted 1s'))
ax.yaxis.set(ticks=(0, 1), ticklabels=('Actual 0s', 'Actual 1s'))
ax.set_ylim(1.5, -0.5)
for i in range(2):
    for j in range(2):
        ax.text(j, i, cm[i, j], ha='center', va='center', color='black')
plt.show()

In [None]:
#Get a more comprehensive report on the classification with classification_report()
print(classification_report(y, model.predict(x)))

## Improve the model

In [None]:
#One way is to weaken the regularization by setting C to 10.0 (larger C means weaker regularization)
model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)
model.fit(x, y)



In [None]:
#This model will have different parameters: a different intercept and coefficients, and therefore a different probability matrix and different predictions
model.intercept_

In [None]:
model.coef_

In [None]:
model.predict_proba(x)

In [None]:
model.predict(x)

In [None]:
model.score(x, y)

In [None]:
confusion_matrix(y, model.predict(x))

In [None]:
print(classification_report(y, model.predict(x)))

<img src="https://files.realpython.com/media/log-reg-7.9141027bd736.png">

## Logistic Regression with scikit-learn

In [None]:
# Step 1: Import packages, functions, and classes
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Step 2: Get data
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])

# Step 3: Create a model and train it
model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)
model.fit(x, y)

# Step 4: Evaluate the model
p_pred = model.predict_proba(x)
y_pred = model.predict(x)
score_ = model.score(x, y)
conf_m = confusion_matrix(y, y_pred)
report = classification_report(y, y_pred)

In [None]:
print('x:', x , sep='\n')

In [None]:
print('y:', y, sep='\n', end='\n\n')

In [None]:
print('intercept:', model.intercept_)

In [None]:
print('coef:', model.coef_, end='\n\n')

In [None]:
print('p_pred', p_pred, sep='\n', end='\n\n')

In [None]:
print('y_pred:', y_pred, end='\n\n')

In [None]:
print('score_:', score_, end='\n\n')

In [None]:
print('conf_m:', conf_m, sep='\n', end='\n\n')

In [None]:
print('report:', report, sep='\n')

## Logistic Regression in Python with Statsmodels Example

Step 1: Import Packages

In [None]:
import numpy as np
import statsmodels.api as sm

Step 2: Get Data

In [None]:
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])
x = sm.add_constant(x)

add_constant() takes the array x as the argument and returns a new array with an additional column of ones. Statsmodels needs this column to estimate the intercept 𝑏₀, because it doesn't add one on its own.
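To picture the result, the sketch below mimics the effect on plain Python lists. `add_constant_column` is a hypothetical helper written for illustration; it is not how statsmodels implements `add_constant()`.

```python
def add_constant_column(rows):
    """Prepend 1.0 to every row, mimicking the extra column of ones."""
    return [[1.0, *row] for row in rows]

# Same single-feature inputs as above, written as a list of one-element rows
x_rows = [[float(v)] for v in range(10)]
x_with_const = add_constant_column(x_rows)

for row in x_with_const[:3]:
    print(row)
```

The first column multiplies 𝑏₀ and the second multiplies 𝑏₁, so the fitted parameters come back in that order.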

Step 3: Create a Model and Train It

In [None]:
model = sm.Logit(y, x)

In [None]:
result = model.fit(method='newton')

In [None]:
result.params

Step 4: Evaluate the Model 

In [None]:
result.predict(x)
(result.predict(x) >= 0.5).astype(int)

result.pred_table()
result.summary()
result.summary2()

## Logistic Regression Example with Pre-Processing

In [None]:
#Logistic regression
import pandas as pd
import pylab as pl
import numpy as np
import scipy.optimize as opt
from sklearn import preprocessing
%matplotlib inline
import matplotlib.pyplot as plt


In [None]:
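#Load dataset
#The original cell fetched the file with a download() helper that is not defined in this notebook.
#This cell assumes ChurnData.csv is already in the working directory.
path = "ChurnData.csv"
churn_df = pd.read_csv(path)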

In [None]:
#Pre-processing
churn_df = churn_df[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip',   'callcard', 'wireless','churn']]
churn_df['churn'] = churn_df['churn'].astype('int') #int required
churn_df.head()

In [None]:
#Define x and y
X = np.asarray(churn_df[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip']])
y = np.asarray(churn_df['churn'])

In [None]:
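#Normalizing: standardize each feature to zero mean and unit variance
#Assign back to X so that the train/test split below uses the scaled data
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)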
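Standardization subtracts each column's mean and divides by its standard deviation. Here is a minimal sketch of that arithmetic for one column, using the population standard deviation (dividing by n), which is what `StandardScaler` uses. The `tenure` values are hypothetical, not taken from ChurnData.csv.

```python
import statistics

def standardize(column):
    """Return (value - mean) / std for each value, using the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation (divides by n)
    return [(value - mean) / std for value in column]

# Hypothetical feature values used only for illustration
tenure = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = standardize(tenure)

print(scaled)
print("mean:", statistics.fmean(scaled), "std:", statistics.pstdev(scaled))
```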

In [None]:
#Train test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 4)

In [None]:
#Model building
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
LR = LogisticRegression(C = 0.01, solver="liblinear").fit(X_train, y_train)
LR

In [None]:
#Predicting: class labels and class probabilities for the test set
yhat = LR.predict(X_test)
yhat_prob = LR.predict_proba(X_test)

In [None]:
#Jaccard index for accuracy evaluation
from sklearn.metrics import jaccard_score
jaccard_score(y_test, yhat, pos_label=0)
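For a chosen positive label, the Jaccard index is the size of the intersection of the actual and predicted label sets divided by the size of their union, which works out to TP / (TP + FP + FN). The sketch below computes it by hand on hypothetical labels. `jaccard_index` is an illustrative helper, not the scikit-learn function.

```python
def jaccard_index(y_true, y_pred, pos_label=1):
    """TP / (TP + FP + FN), counting pos_label as the positive class."""
    tp = sum(t == pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fp = sum(t != pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fn = sum(t == pos_label and p != pos_label for t, p in zip(y_true, y_pred))
    return tp / (tp + fp + fn)

# Hypothetical labels used only for illustration
y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0]

print("Jaccard (pos_label=0):", jaccard_index(y_true, y_pred, pos_label=0))
print("Jaccard (pos_label=1):", jaccard_index(y_true, y_pred, pos_label=1))
```

The value depends on which class is treated as positive, which is why the call above passes `pos_label` explicitly.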

In [None]:
#Confusion matrix
from sklearn.metrics import classification_report, confusion_matrix
import itertools
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    #Optionally convert counts to row-wise fractions (each row sums to 1)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    #Draw the matrix as an image with class names on both axes
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    #Write each cell value, white on dark cells and black on light cells
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
print(confusion_matrix(y_test, yhat, labels=[1,0]))

In [None]:
#Compute confusion matrix
cnf_matrix = confusion_matrix(y_test, yhat, labels=[1, 0])
np.set_printoptions(precision=2)

#Plot non-normalized confusion matrix
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=['churn=1', 'churn=0'], normalize=False, title='Confusion matrix')

#Precision: TP / (TP + FP)
#Recall: TP / (TP + FN)

#Log loss
from sklearn.metrics import log_loss
log_loss(y_test, yhat_prob)
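Log loss is the negative of the average log-likelihood, so lower values mean predicted probabilities closer to the actual labels. Here is a minimal sketch of that calculation for the binary case, using hypothetical probabilities of class 1. Unlike scikit-learn's `log_loss`, it does no clipping, so every probability must be strictly between 0 and 1.

```python
import math

def log_loss_binary(y_true, p_positive):
    """Negative mean of y*log(p) + (1 - y)*log(1 - p)."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_positive)
    )
    return -total / len(y_true)

# Hypothetical labels and predicted probabilities of class 1
y_true = [0, 1, 1, 0]
confident = [0.1, 0.9, 0.8, 0.3]
uncertain = [0.5, 0.5, 0.5, 0.5]

loss_confident = log_loss_binary(y_true, confident)
loss_uncertain = log_loss_binary(y_true, uncertain)
print("confident:", loss_confident)
print("uncertain:", loss_uncertain)
```

Predicting 0.5 for every observation gives a loss of log(2), roughly 0.693, which is a useful baseline when reading the value reported for the churn model.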