<a href="https://colab.research.google.com/github/aravind309/blogs/blob/main/Simple_Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Importing Libraries

In [24]:
import numpy as np
from sklearn.linear_model import LogisticRegression

NumPy adds array and matrix data structures to Python, together with fast, vectorized mathematical operations on them.

Scikit-learn (sklearn) is a widely used machine learning library for Python. It provides efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent interface.

# Dataset


**Input and Output train dataset**

We would usually load the data from an external source, but for the purpose of this example, let's create arrays for the input (X) and output (y) by hand. Both are NumPy arrays. X must be two-dimensional: it should have one column for each input and one row for each observation. To make X two-dimensional, apply .reshape() with the arguments -1 and 1, as in the cell below: -1 lets NumPy infer the number of rows, and 1 requests a single column.

X has two dimensions:
1) One column for a single input
2) Ten rows, each corresponding to one observation

y is one-dimensional with ten items.

In [25]:
X = np.arange(10).reshape(-1, 1)
X

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

In [26]:
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y

array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
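
To make the shape requirement concrete, here is a small sketch that compares the array before and after the reshape. The name `flat` is introduced only for this illustration and is not part of the original notebook.

```python
import numpy as np

flat = np.arange(10)       # one-dimensional: shape (10,)
X = flat.reshape(-1, 1)    # two-dimensional: -1 infers 10 rows, 1 gives one column
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

print(flat.shape, X.shape, y.shape)

# scikit-learn expects one row of X for every entry of y
assert X.ndim == 2 and y.ndim == 1
assert X.shape[0] == y.shape[0]
```

If the two assertions pass, X and y line up row for row and are ready to be passed to a model.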

# Training Model


**Creating a Model and train it**

We create a logistic regression model by creating an instance of LogisticRegression(). Among its parameters, solver is a string that decides which solver to use for fitting the model. The default is 'lbfgs' since scikit-learn 0.22 (it was 'liblinear' before that). Other options include 'liblinear', 'newton-cg', 'sag', and 'saga'.

We should carefully match the solver and regularization.

*   'liblinear' doesn’t work without regularization.
*   'newton-cg', 'sag', and 'lbfgs' don’t support L1 regularization; 'liblinear' and 'saga' do.
*   'saga' is the only solver that supports elastic-net regularization.
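
As a rough sketch of how a solver is paired with a regularization type, the snippet below fits two extra models on the same toy data. It assumes the `penalty` and `l1_ratio` parameters as found in scikit-learn 1.x (newer releases may warn about them); the names `l1_model` and `enet_model`, and the values `l1_ratio=0.5` and `max_iter=5000`, are illustrative choices rather than anything from the original notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# 'liblinear' accepts L1 (as well as L2) regularization
l1_model = LogisticRegression(solver='liblinear', penalty='l1', random_state=0)
l1_model.fit(X, y)

# 'saga' is the solver to use for elastic-net; l1_ratio mixes L1 and L2.
# max_iter is raised because saga can converge slowly on unscaled inputs.
enet_model = LogisticRegression(solver='saga', penalty='elasticnet',
                                l1_ratio=0.5, max_iter=5000, random_state=0)
enet_model.fit(X, y)

print(l1_model.coef_, enet_model.coef_)
```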

After creating the model, we call *.fit()* to train it on the input data X and output data y.



In [27]:
model = LogisticRegression(solver='liblinear', random_state=0)
model.fit(X, y)

LogisticRegression(random_state=0, solver='liblinear')
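
Once the model is fitted, its learned parameters can be inspected. The sketch below rebuilds the same model so that it runs on its own; `accuracy` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(X, y)

print(model.classes_)     # class labels seen during training
print(model.intercept_)   # intercept b0, shape (1,)
print(model.coef_)        # slope b1, shape (1, 1): one column per input

accuracy = model.score(X, y)   # fraction of training rows predicted correctly
print(accuracy)
```

Note that *.score()* on the training data is only a quick sanity check; it says little about how the model behaves on unseen inputs.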

# Prediction on test data

The model trained above is now used to predict the output on test data. Here our test data is [[-1], [0], [3], [8], [10], [11]]. *.predict()* returns the predicted class labels as a 1D array.

In [28]:
model.predict( [[-1], [0], [3], [8], [10], [11]])

array([0, 0, 1, 1, 1, 1])

# Printing Probabilities

*.predict_proba()* returns one row per observation and one column per class. The first column is the probability that the output is zero, that is 1 - 𝑝(𝑥). The second column is the probability that the output is one, or 𝑝(𝑥). Each row sums to 1.

In [29]:
# Train Data 
model.predict_proba(X)

array([[0.76881371, 0.23118629],
       [0.68800809, 0.31199191],
       [0.59387837, 0.40612163],
       [0.49230569, 0.50769431],
       [0.39136427, 0.60863573],
       [0.29893328, 0.70106672],
       [0.22042624, 0.77957376],
       [0.15789351, 0.84210649],
       [0.11058424, 0.88941576],
       [0.07616801, 0.92383199]])
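
For a binary problem like this one, 𝑝(𝑥) is the logistic (sigmoid) function applied to a linear score, 𝑝(𝑥) = 1 / (1 + exp(-(b0 + b1·𝑥))). As a minimal sketch, the probabilities can be rebuilt by hand from the fitted intercept and coefficient; the names `b0`, `b1`, `manual`, and `matches` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(X, y)

b0 = model.intercept_[0]    # fitted intercept
b1 = model.coef_[0, 0]      # fitted slope for the single input

# Logistic function applied to the linear score b0 + b1 * x
p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X[:, 0])))
manual = np.column_stack([1 - p, p])   # same column order as predict_proba

matches = np.allclose(manual, model.predict_proba(X))
print(matches)
```

The hand-computed columns should agree with *.predict_proba()* up to floating-point rounding.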

In [30]:
# Test Data
model.predict_proba([[-1], [0], [3], [8], [10], [11]])

array([[0.83374798, 0.16625202],
       [0.76881371, 0.23118629],
       [0.49230569, 0.50769431],
       [0.11058424, 0.88941576],
       [0.05183858, 0.94816142],
       [0.0349861 , 0.9650139 ]])
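
The probabilities also explain the labels that *.predict()* returns: each observation is assigned the class with the larger probability, which for two classes amounts to a 0.5 threshold on 𝑝(𝑥). A short sketch of that link, where `X_test`, `proba`, and `manual_labels` are illustrative names:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(X, y)

X_test = np.array([[-1], [0], [3], [8], [10], [11]])
proba = model.predict_proba(X_test)

# Pick the class whose column holds the larger probability in each row
manual_labels = model.classes_[np.argmax(proba, axis=1)]

print(manual_labels)
print(np.array_equal(manual_labels, model.predict(X_test)))
```

In the output above, the row for 𝑥 = 3 has 𝑝(𝑥) just over 0.5, which is why that observation is predicted as 1.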