# Logistic Regression 

Logistic regression models the relationship between a categorical dependent variable and one or more independent variables. In machine learning, this relationship is used to predict the outcome of the categorical variable. It is widely used in fields such as medicine, trading and business, and technology.

It is a classification algorithm that estimates the probability of an event succeeding or failing. It is used when the dependent variable is binary (0/1, True/False, Yes/No). It assigns data to discrete classes by learning from a set of labelled examples. The model first computes a linear combination of the input features and then applies a non-linearity, the sigmoid function, which maps that value to a probability between 0 and 1.
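
As a minimal sketch of that two-step idea, the snippet below computes a linear score and then passes it through the sigmoid. The feature values, weights, and bias are made up for illustration and do not come from any fitted model.

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values chosen only for illustration.
features = np.array([2.0, 1.0])
weights = np.array([0.5, -1.0])
bias = 0.5

score = np.dot(features, weights) + bias   # linear part: 2*0.5 + 1*(-1) + 0.5 = 0.5
probability = sigmoid(score)               # probability of class 1
label = int(probability >= 0.5)            # threshold at 0.5 to get a class

print(score, probability, label)
```

The threshold of 0.5 is a common default; a different threshold trades off the two kinds of classification error.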

### Advantages
- It is easy to implement and can be very efficient to train. Training a model with this algorithm doesn't require high computational power.
- In a low-dimensional dataset with a sufficient number of training examples, logistic regression is less prone to overfitting.
- Logistic regression performs very well when the classes are linearly separable in the feature space.

### Disadvantages
- Non-linear problems can't be solved with logistic regression on the raw features, since its decision surface is linear.
- Only important and relevant features should be used to build the model; otherwise, the probabilistic predictions may be incorrect and the model's predictive value may degrade.
- Outliers, that is, data values that deviate from the expected range, may lead to incorrect results because the algorithm is sensitive to them.
- Each training example must be independent of all the other examples in the dataset. If examples are related in some way, the model will give more weight to those examples. The training data should therefore not come from matched data or repeated measurements. For example, some scientific research techniques rely on multiple observations of the same individuals; logistic regression can't be used in such cases.
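
The first limitation can be illustrated with the XOR problem. The sketch below is a self-contained example, not the class defined later in this document: the helper names `fit_logreg` and `accuracy`, the learning rate, and the iteration count are assumptions chosen for illustration. It fits the same gradient-descent model twice, once on the raw inputs and once with a hand-made product feature added.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(features, labels, lr=1.0, iterations=5000):
    # Plain batch gradient descent on the mean log loss, starting from zero weights.
    design = np.column_stack([np.ones(len(features)), features])  # add intercept column
    weights = np.zeros(design.shape[1])
    for _ in range(iterations):
        probs = sigmoid(design @ weights)
        weights -= lr * design.T @ (probs - labels) / len(labels)
    return weights

def accuracy(features, labels, weights):
    design = np.column_stack([np.ones(len(features)), features])
    preds = (sigmoid(design @ weights) >= 0.5).astype(int)
    return float(np.mean(preds == labels))

# XOR: the label is 1 exactly when the two inputs differ.
x_raw = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# Raw features: no straight line separates the two classes.
acc_raw = accuracy(x_raw, y_xor, fit_logreg(x_raw, y_xor))

# Same model with one extra, hand-made feature x1 * x2.
x_expanded = np.column_stack([x_raw, x_raw[:, 0] * x_raw[:, 1]])
acc_expanded = accuracy(x_expanded, y_xor, fit_logreg(x_expanded, y_xor))

print("raw features:     ", acc_raw)
print("with x1*x2 added: ", acc_expanded)
```

On the raw inputs, no linear boundary can classify more than three of the four points correctly. Once the product feature is added, the classes become linearly separable in the expanded feature space, so the same algorithm can fit them.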

### Applications

- Credit scoring
- Medicine
- Hotel booking
- Emails: Spam or not

# From scratch
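
The implementation in this section rests on four formulas, written out here under the same conventions as the code: a column of ones is prepended to the inputs, the labels are 0 or 1, and the loss is averaged over the n training examples. The symbols X, w, h, y, and η are notation chosen here to mirror the code's `x`, `weight`, `sigma`, `y`, and `lr`.

```latex
\begin{aligned}
h &= \sigma(Xw) = \frac{1}{1 + e^{-Xw}} \\
L(w) &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log h_i + (1 - y_i) \log (1 - h_i) \right] \\
\nabla_w L(w) &= \frac{1}{n} X^{\top} (h - y) \\
w &\leftarrow w - \eta \, \nabla_w L(w)
\end{aligned}
```

The first line is the sigmoid applied element-wise to the linear scores, the second is the binary cross-entropy loss, the third is its gradient, and the last is one gradient descent step.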

In [1]:
import numpy as np

class LogisticRegression:
    def __init__(self, x, y):
        self.intercept = np.ones((x.shape[0], 1))
        self.x = np.concatenate((self.intercept, x), axis = 1)
        self.weight = np.zeros(self.x.shape[1])
        self.y = y


    def sigmoid(self, x, weight):
        '''Sigmoid method: maps the linear score of each input point to a probability between 0 and 1,
        i.e. the probability that the point belongs to class 1. Given the set of input variables,
        our goal is to assign each data point to a category (either 1 or 0), which is done later
        by thresholding this probability.'''
        z = np.dot(x, weight)
        return 1 / (1 + np.exp(-z))



    def loss(self, h, y):
        '''The loss function: binary cross-entropy (log loss) between the predicted probabilities h
        and the labels y. It depends on the weights through h, so optimizing the loss means
        finding the best values of the weights.'''
        l = -y * np.log(h) - (1 - y) * np.log(1 - h)
        return l.mean()



    def gd(self, X, h, y):
        '''Gradient: the derivative of the loss function with respect to the weights.
        Gradient descent repeatedly moves the weights a small step against this gradient.'''
        return np.dot(X.T, (h - y)) / y.shape[0]



    def fit(self, lr, iterations):
        '''Fit method: runs gradient descent on the training data. It requires the learning rate and the
        number of iterations as the input arguments.'''
        for i in range(iterations):
            sigma = self.sigmoid(self.x, self.weight)

            # tracked for monitoring only; the update below does not use it
            loss = self.loss(sigma, self.y)

            dW = self.gd(self.x, sigma, self.y)

            # updating weights
            self.weight -= lr * dW

        print('fitted successfully to data')



    # Method to predict class labels
    def predict(self, x_new, threshold):
        # Build the intercept column from x_new itself, so that inputs with a
        # different number of rows than the training data also work.
        intercept = np.ones((x_new.shape[0], 1))
        x_new = np.concatenate((intercept, x_new), axis=1)
        result = self.sigmoid(x_new, self.weight)

        # Probabilities at or above the threshold map to class 1, the rest to 0.
        y_pred = (result >= threshold).astype(float)

        return y_pred
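
The gradient expression used in `gd` can be checked numerically. The sketch below is independent of the class above: the helper names (`log_loss`, `analytic_gradient`, `numerical_gradient`) and the random data are assumptions made for illustration. It compares the analytic gradient, X^T (h - y) / n, with a central finite-difference estimate of the mean log loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(weights, design, labels):
    probs = sigmoid(design @ weights)
    return np.mean(-labels * np.log(probs) - (1 - labels) * np.log(1 - probs))

def analytic_gradient(weights, design, labels):
    # Same expression as the gd method: X^T (h - y) / n.
    probs = sigmoid(design @ weights)
    return design.T @ (probs - labels) / labels.shape[0]

def numerical_gradient(weights, design, labels, eps=1e-6):
    # Central finite differences, one weight at a time.
    grad = np.zeros_like(weights)
    for j in range(weights.size):
        step = np.zeros_like(weights)
        step[j] = eps
        grad[j] = (log_loss(weights + step, design, labels)
                   - log_loss(weights - step, design, labels)) / (2 * eps)
    return grad

# Small synthetic problem; the data are random and purely illustrative.
rng = np.random.default_rng(0)
design = np.column_stack([np.ones(20), rng.normal(size=(20, 3))])
labels = (rng.random(20) < 0.5).astype(float)
weights = rng.normal(scale=0.1, size=4)

max_diff = np.max(np.abs(analytic_gradient(weights, design, labels)
                         - numerical_gradient(weights, design, labels)))
print("largest difference:", max_diff)
```

A very small difference suggests that the analytic gradient matches the loss being minimized.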




# Testing the implementation

In [2]:
from sklearn.datasets import load_breast_cancer

ds = load_breast_cancer()

x = ds.data
y = ds.target


regressor = LogisticRegression(x, y)

regressor.fit(0.01, 5000)


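# Predictions are made on the same data the model was fitted on,
# so the accuracy below is training accuracy.
y_pred = regressor.predict(x, 0.5)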


accuracy = sum(y_pred == y)/y.shape[0]

print('Accuracy is : ', accuracy)



fitted successfully to data
Accuracy is :  0.9261862917398945
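
The accuracy above is measured on the same data used for fitting. A hedged sketch of a held-out evaluation follows. It does not use the from-scratch class: it swaps in scikit-learn's own `LogisticRegression` (imported under the alias `SkLogisticRegression` to avoid a name clash) and standardizes the features first. The 25% test split and the random seed are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x, y = load_breast_cancer(return_X_y=True)

# Hold out a quarter of the examples; stratify keeps the class balance similar.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

# The scaler is fitted on the training split only, then applied to both splits.
model = make_pipeline(StandardScaler(), SkLogisticRegression(max_iter=1000))
model.fit(x_train, y_train)

train_accuracy = model.score(x_train, y_train)
test_accuracy = model.score(x_test, y_test)

print('Training accuracy:', train_accuracy)
print('Held-out accuracy:', test_accuracy)
```

Comparing the two numbers gives a rough indication of how well the model generalizes beyond the examples it was fitted on.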
