# <center><u>Logistic Regression</u></center>
Logistic regression is a statistical method for analysing a data set in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable (one with only two possible values). The goal of logistic regression is to find the best-fitting model to describe the relationship between the dichotomous characteristic of interest (the dependent, response or outcome variable) and a set of independent (predictor or explanatory) variables. Compared with other binary classifiers such as nearest neighbour, it has the advantage that its fitted coefficients also explain quantitatively which factors lead to the classification.

- Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The target (dependent) variable is dichotomous, which means there are only two possible classes.


- In simple words, the dependent variable is binary, with data coded as either 1 (success/yes) or 0 (failure/no).


- Mathematically, a logistic regression model predicts P(Y=1) as a function of X. It is one of the simplest ML algorithms and can be used for various classification problems such as spam detection, diabetes prediction and cancer detection.

### Types of Logistic Regression
Generally, logistic regression means binary logistic regression with a binary target variable, but two other kinds of target variable can also be predicted with it. Based on the number and ordering of the categories, logistic regression can be divided into the following types −

**Binary or Binomial :-** In this kind of classification, the dependent variable has only two possible types, 1 or 0. For example, these values may represent success or failure, yes or no, win or loss, etc.


**Multinomial :-** In this kind of classification, the dependent variable can have 3 or more possible unordered types, that is, types with no quantitative significance. For example, these values may represent “Type A”, “Type B” or “Type C”.

**Ordinal :-** In this kind of classification, the dependent variable can have 3 or more possible ordered types, that is, types with a quantitative significance. For example, these values may represent “poor”, “good”, “very good” or “excellent”, and the categories can be given scores such as 0, 1, 2, 3.

### Regression Models
**Binary Logistic Regression Model** − The simplest form of logistic regression is binary or binomial logistic regression, in which the target or dependent variable can have only 2 possible types, 1 or 0.

**Multinomial Logistic Regression Model** − Another useful form of logistic regression is multinomial logistic regression, in which the target or dependent variable can have 3 or more possible unordered types, i.e. types having no quantitative significance.

The binary model allows us to model the relationship between multiple predictor variables and a binary/binomial target variable. In logistic regression, the linear function of the predictors is used as the input to another function, 𝑔, as in the following relation −
![](img/img-LogisticRegression/Sigmoid00.png)
The sigmoid curve is shown in the following graph. The values on the y-axis lie between 0 and 1, and the curve crosses the y-axis at 0.5.
![](img/img-LogisticRegression/Sigmoid01.png)
The classes can be labelled positive and negative. The output always lies between 0 and 1 and is interpreted as the probability of the positive class. For our implementation, we interpret the output of the hypothesis function as positive if it is ≥ 0.5, and negative otherwise.
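The relation and the decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes 𝑔 is the standard sigmoid, 1/(1+e^(−z)); the names `hypothesis` and `classify`, and the weights and sample values, are hypothetical and chosen only for the example.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h(x) = g(theta . x), read as the probability of the positive class."""
    return sigmoid(np.dot(theta, x))

def classify(probability, threshold=0.5):
    """Return 1 (positive) when probability >= threshold, else 0 (negative)."""
    return int(probability >= threshold)

# Hypothetical weights and one sample; the first entry of x is a constant 1 for the bias
theta = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.8, -0.4])

p = hypothesis(theta, x)
print(sigmoid(0.0))      # 0.5, the point where the curve crosses the y-axis
print(p, classify(p))
```

Here the linear score is 0.4, so the probability is a little under 0.6 and the sample is labelled positive.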

We also need to define a loss function to measure how well the algorithm performs with a given set of weights, represented by theta, as follows −
![](img/img-LogisticRegression/Sigmoid02.png)
Now, after defining the loss function, our prime goal is to minimize it. This is done by fitting the weights, that is, by increasing or decreasing them. The derivative of the loss function with respect to each weight tells us which weights should be made larger and which should be made smaller.
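As a rough sketch of such a loss, the block below computes the mean log loss (binary cross-entropy), which is the usual choice for logistic regression; we assume here that it is the function meant above. The toy data and the `eps` clipping value are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(theta, X, y, eps=1e-12):
    """Mean binary cross-entropy of the predictions sigmoid(X @ theta)."""
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)  # clip to avoid log(0)
    return float(-np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)))

# Hypothetical toy data: a column of ones (bias) plus one feature
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

loss_zero = log_loss(np.zeros(2), X, y)            # every prediction is 0.5
loss_good = log_loss(np.array([0.0, 2.0]), X, y)   # weights that separate the classes
print(loss_zero, loss_good)
```

With all weights at zero the loss equals ln 2 (about 0.693); weights that separate the two classes give a much smaller value, which is what minimization aims for.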

The following gradient equation tells us how the loss changes when we modify the parameters; gradient descent uses it to update the weights −
![](img/img-LogisticRegression/Sigmoid03.png)
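A minimal sketch of batch gradient descent is given below. It assumes the mean log loss, whose gradient is Xᵀ(h − y)/m; the learning rate, iteration count and toy data are hypothetical, and this is not the solver scikit-learn uses internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """Gradient of the mean log loss: X^T (h - y) / m."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

def fit(X, y, learning_rate=0.1, n_iterations=2000):
    """Plain batch gradient descent starting from zero weights."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        theta -= learning_rate * gradient(theta, X, y)
    return theta

# Hypothetical toy data: bias column plus one feature, with overlapping classes
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 0.5],
              [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = fit(X, y)
predictions = (sigmoid(X @ theta) >= 0.5).astype(int)
print(theta, predictions)
```

Each step moves the weights against the gradient, so the loss falls until the gradient is close to zero. Because the two classes overlap in this toy data, the weights settle at finite values instead of growing without bound.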

![](img/img-LogisticRegression/knn_steps.PNG)

### Program (Python)

In [1]:
import pandas as pd
df=pd.read_csv('B:/dataset/online_ads.csv')
df.head()

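|   | Gender | Age | EstimatedSalary | Purchased |
|---|--------|-----|-----------------|-----------|
| 0 | Male | 19 | 19000 | 0 |
| 1 | Male | 35 | 20000 | 0 |
| 2 | Female | 26 | 43000 | 0 |
| 3 | Female | 27 | 57000 | 0 |
| 4 | Male | 19 | 76000 | 0 |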


In [3]:
#df=pd.get_dummies(df)   (alternative: one-hot encode Gender instead of mapping it)
df['Gender']=df.Gender.map({'Male':0,'Female':1})
df.columns

Index(['Gender', 'Age', 'EstimatedSalary', 'Purchased'], dtype='object')

In [5]:
df.head()

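|   | Gender | Age | EstimatedSalary | Purchased |
|---|--------|-----|-----------------|-----------|
| 0 | 0 | 19 | 19000 | 0 |
| 1 | 0 | 35 | 20000 | 0 |
| 2 | 1 | 26 | 43000 | 0 |
| 3 | 1 | 27 | 57000 | 0 |
| 4 | 0 | 19 | 76000 | 0 |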


In [7]:
#X=df.loc[:,['Age','EstimatedSalary','Gender_Female','Gender_Male']].values   (one-hot variant, four features)
X=df.loc[:,['Age','EstimatedSalary','Gender']].values
Y=df.loc[:,'Purchased'].values

In [8]:
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_new=sc.fit_transform(X.astype(float))

In [9]:
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X_new,Y)
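The cells above scale the whole data set before splitting it, and the split is unseeded, so each run gives different numbers. One possible alternative, sketched below, splits first with a fixed `random_state` and lets a pipeline fit the scaler on the training rows only. The data here is synthetic and purely hypothetical (it is not the `online_ads.csv` file), and the rule that generates the labels is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ads data (hypothetical values, not the real CSV):
# the columns mimic Age, EstimatedSalary and an encoded Gender
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(18, 60, n),
    rng.integers(15000, 150000, n),
    rng.integers(0, 2, n),
]).astype(float)
# Hypothetical rule: older and better-paid users buy more often
score = 0.1 * (X[:, 0] - 38) + 0.00003 * (X[:, 1] - 70000)
y = (score + rng.normal(0, 1, n) > 0).astype(int)

# Split first, with a fixed seed so the results are repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The pipeline fits the scaler on the training rows only
model = make_pipeline(StandardScaler(), LogisticRegression(solver='lbfgs'))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Fitting the scaler inside the pipeline keeps test-set statistics out of training, which usually gives a more honest accuracy estimate.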

In [17]:
from sklearn.linear_model import LogisticRegression
log=LogisticRegression(solver='lbfgs')
log.fit(X_train,Y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [11]:
prob=log.predict_proba(X_train)
print(prob[12])
print(Y_train[12])

[0.9416761 0.0583239]
0


In [8]:
pred=log.predict(X_train)
pred[12]

0

In [9]:
from sklearn.metrics import accuracy_score
accuracy_score(Y_train,pred)

0.8533333333333334

In [10]:
mypred=[]
for pb in prob:
    # pb[0] is P(class 0): predict 0 only when it is at least 0.65, otherwise predict 1
    if pb[0]>=0.65:
        mypred.append(0)
    else:
        mypred.append(1)

In [11]:
accuracy_score(Y_train,mypred)

0.8433333333333334
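The custom threshold can also be applied without an explicit loop. The sketch below uses a small, hypothetical `prob` array in place of the real `predict_proba` output and compares the 0.65 rule with the default rule of picking the more probable class.

```python
import numpy as np

# Hypothetical output of predict_proba: columns are [P(class 0), P(class 1)]
prob = np.array([
    [0.94, 0.06],
    [0.65, 0.35],
    [0.60, 0.40],
    [0.20, 0.80],
])

# Same rule as the loop: class 0 only when P(class 0) >= 0.65
mypred = np.where(prob[:, 0] >= 0.65, 0, 1)

# The default rule used by predict: pick the more probable class
default_pred = np.argmax(prob, axis=1)

print(mypred)        # [0 0 1 1]
print(default_pred)  # [0 0 0 1]
```

Raising the bar for class 0 moves borderline samples (the third row here) into class 1, which trades some accuracy on class 0 for fewer missed positives.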

In [12]:
test_prob=log.predict_proba(X_test)
test_prob[0]

array([0.45227703, 0.54772297])

In [13]:
test_pred=log.predict(X_test)

In [14]:
test_pred[0]

1

In [15]:
accuracy_score(Y_test,test_pred)

0.87

In [16]:
X_test[0]

array([-0.63563988,  0.03692631,  0.98019606, -0.98019606])

In [17]:
log.coef_

array([[ 2.17459909,  0.96463387, -0.08189553,  0.08189553]])

In [18]:
log.intercept_

array([-1.07167576])

In [19]:
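#Linear score z = w.x + b for the first test sample; written to work for any number of features
#(the arrays printed above have four entries, which appears to match the one-hot variant of Gender)
cont_y=(log.coef_[0]*X_test[0]).sum()+log.intercept_[0]
cont_y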

-2.578864638956782

In [20]:
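import numpy as np
#Sigmoid of the linear score gives P(Y=1); within a single run it equals log.predict_proba(X_test)[0][1]
#(the cell numbers are out of order, so the outputs shown here may come from different runs)
class_y=1/(1+np.exp(-cont_y))
class_y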

0.07051110537395128
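To check that the hand computation agrees with the library, the sketch below repeats it on a fitted model and compares the result with `decision_function` and `predict_proba`. The data comes from `make_classification` and is a hypothetical stand-in for the scaled ads features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (hypothetical, stands in for the scaled ads features)
X, y = make_classification(n_samples=200, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
model = LogisticRegression(solver='lbfgs').fit(X, y)

# Manual computation for the first sample: linear score, then sigmoid
z = np.dot(model.coef_[0], X[0]) + model.intercept_[0]
manual_p1 = 1.0 / (1.0 + np.exp(-z))

# The same quantities as reported by the fitted model
library_z = model.decision_function(X[:1])[0]
library_p1 = model.predict_proba(X[:1])[0, 1]

print(manual_p1, library_p1)
```

For a binary model the two probabilities should agree to floating-point precision, as long as they are computed from the same fitted model and the same sample.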

In [16]:
# example of LogisticRegression that generates a FutureWarning
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
# prepare dataset
X, y = make_blobs(n_samples=100, centers=3, n_features=2)
# create and configure model
model = LogisticRegression(solver='lbfgs')
# fit model
model.fit(X, y)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

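The FutureWarning comes from leaving the `multi_class` option at its default (shown as `multi_class='warn'` in the output above). To silence it, set the option explicitly when creating and configuring the model, either to one-vs-rest or to automatic selection:

**model = LogisticRegression(solver='lbfgs', multi_class='ovr')**

**model = LogisticRegression(solver='lbfgs', multi_class='auto')**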
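As far as we are aware, recent scikit-learn releases no longer warn about the default and have deprecated the `multi_class` argument itself, so the exact behaviour depends on the installed version. A version-tolerant sketch leaves `multi_class` unset and, where one-vs-rest is wanted explicitly, wraps the model in `OneVsRestClassifier`. The `random_state` and `max_iter` values are assumptions added for repeatability and convergence.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Three-class toy data, with a fixed seed so the run is repeatable
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)

# Leaving multi_class unset lets the library choose the multi-class scheme
auto_model = LogisticRegression(solver='lbfgs', max_iter=1000).fit(X, y)

# Explicit one-vs-rest: one binary logistic regression per class
ovr_model = OneVsRestClassifier(
    LogisticRegression(solver='lbfgs', max_iter=1000)).fit(X, y)

print(auto_model.predict_proba(X[:1]))  # one probability per class
print(auto_model.score(X, y), ovr_model.score(X, y))
```

On the older releases that produced the output above, leaving `multi_class` unset is exactly what triggers the FutureWarning, so there the explicit settings shown earlier remain the way to silence it.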