# LOGISTIC REGRESSION

Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
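As a minimal sketch of the idea above, binary logistic regression passes a linear score through the logistic function to obtain a probability. The weights `w`, bias `b`, and the input vector are hypothetical values chosen only for illustration; they are not fitted to any data.

```python
import math

def predict_proba(x, w, b):
    """P(y = 1 | x) for a binary logistic regression model."""
    # Linear score: dot product of weights and features, plus bias
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Logistic function squashes the score into a probability
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, chosen only for illustration (not fitted)
w = [2.0, -1.0]
b = 0.5

p = predict_proba([1.0, 2.5], w, b)  # score z = 2.0 - 2.5 + 0.5 = 0.0
label = int(p >= 0.5)                # threshold the probability at 0.5
print(p, label)
```

The decision boundary is the set of inputs whose linear score is zero, which is why the model is linear even though the probability curve is not.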

Logistic regression is implemented in `LogisticRegression`. This implementation can fit binary, One-vs-Rest, or multinomial logistic regression with optional L1, L2 or Elastic-Net regularization.

# Types of Activation Functions


### Identity:
The identity activation function returns its input as it is.

$$\text{identity}(a) = a$$

It is the simplest of all activation functions but does not impart any particular characteristic to the input. It is mostly reserved for output layers, especially in the case of real-valued regression problems.


### Sigmoid: σ

The sigmoid activation, typically denoted as σ(a), is a nonlinear activation function with the range (0,1).

$$\sigma(a) = \frac{1}{1 + e^{-a}}$$

It is commonly used for gates in LSTMs and GRUs. It can also be used for probabilistic outputs because it is always positive and less than 1.

It is also known as the logistic or soft-step activation function.


The sigmoid function is an apt choice for predicting a probabilistic output because its output always lies strictly between 0 and 1.

The sigmoid function output is 0.5 only when its input is 0.

For positive inputs, the sigmoid returns values in the range (0.5,1).

For negative inputs, the sigmoid returns values in the range (0,0.5).
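The properties listed above can be checked with a short sketch. The branch on the sign of the input is an implementation choice made here to avoid overflow in `math.exp` for large negative inputs; it is not part of the definition.

```python
import math

def sigmoid(a):
    """Logistic sigmoid, written to avoid overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)  # a < 0, so e lies in (0, 1) and cannot overflow
    return e / (1.0 + e)

for a in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"sigmoid({a:+.1f}) = {sigmoid(a):.4f}")
```

Mathematically the output never reaches 0 or 1, but in floating point it saturates to exactly 0.0 or 1.0 for inputs of very large magnitude.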

### Hyperbolic tangent: tanh
The hyperbolic tangent activation, typically denoted as tanh(a), is a nonlinear activation function with the range (−1,1).

It is quite similar to the sigmoid activation function, but allows for negative values.

$$\tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}} = \frac{e^{2a} - 1}{e^{2a} + 1}$$
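The similarity to the sigmoid can be made precise: tanh is a shifted and rescaled sigmoid, tanh(a) = 2σ(2a) − 1. The sketch below checks this identity and the exponential form of tanh against Python's built-in `math.tanh`. The helper names are illustrative, and the naive exponential form is only suitable for moderate input magnitudes.

```python
import math

def tanh_from_exp(a):
    """tanh via (e^{2a} - 1) / (e^{2a} + 1); overflows for large positive a."""
    e2a = math.exp(2.0 * a)
    return (e2a - 1.0) / (e2a + 1.0)

def sigmoid(a):
    """Plain logistic sigmoid, adequate for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-a))

def tanh_from_sigmoid(a):
    """tanh expressed as a rescaled sigmoid: 2 * sigmoid(2a) - 1."""
    return 2.0 * sigmoid(2.0 * a) - 1.0

for a in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(a, math.tanh(a), tanh_from_exp(a), tanh_from_sigmoid(a))
```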

### ReLU:

The rectified linear unit (ReLU) is a piecewise linear function that assigns zero to negative input and keeps positive input unchanged. It is typically denoted by its acronym, ReLU.

$$\text{ReLU}(a) = \max\{0, a\}$$

ReLU is the usual default recommendation for hidden layers in modern deep neural networks. Because ReLU is piecewise linear, a network of stacked ReLU layers computes a piecewise linear function, and with enough units such functions can approximate a wide range of nonlinearities.
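To illustrate how ReLU units combine into piecewise linear shapes, here is a small sketch. The helper functions `abs_via_relu` and `hat` are hypothetical examples with hand-picked weights, not learned ones.

```python
def relu(a):
    """Rectified linear unit: zero for negative input, identity otherwise."""
    return max(0.0, a)

def abs_via_relu(x):
    """|x| written as the sum of two ReLU units."""
    return relu(x) + relu(-x)

def hat(x):
    """A bump on [0, 2] peaking at x = 1, built from three ReLU units."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)

for x in (-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    print(x, relu(x), abs_via_relu(x), hat(x))
```

Adding more units adds more kinks, so finer piecewise linear approximations of a curve become possible.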

### Leaky ReLU:
ReLU is harsh on negative inputs: it returns zero for all of them. This rigidity can result in dead units, that is, units whose activation is always zero.

A milder alternative is the leaky ReLU, defined as follows:

$$\text{LeakyReLU}(a) = \begin{cases} 0.01a & \text{for } a < 0 \\ a & \text{for } a \geq 0 \end{cases}$$
 
Thus, negative values are reduced in magnitude, but still manage to pass through, thereby preventing dead units.
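A minimal sketch of the leaky ReLU follows. The `slope` argument is a name introduced here for illustration, with the default of 0.01 taken from the text.

```python
def leaky_relu(a, slope=0.01):
    """Leaky ReLU: scales negative inputs by a small fixed slope."""
    return a if a >= 0 else slope * a

for a in (-100.0, -2.0, 0.0, 3.0):
    print(a, leaky_relu(a))
```

Because the negative branch has a nonzero slope, its gradient is also nonzero, so a unit that receives only negative inputs can still be updated during training.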

### Parametric ReLU: PReLU

The leaky ReLU discussed above makes an arbitrary choice of returning 0.01a when a<0.

The multiplier 0.01 can instead be replaced with a learnable parameter α that is adapted during the learning phase, just like any other parameter of the model.

$$\text{PReLU}(a) = \begin{cases} \alpha a & \text{for } a < 0 \\ a & \text{for } a \geq 0 \end{cases}$$
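To show what "learnable" means here, the sketch below computes the gradient of PReLU with respect to α and applies one gradient descent step. The starting value of `alpha`, the learning rate `lr`, and the upstream gradient are all hypothetical numbers chosen for illustration.

```python
def prelu(a, alpha):
    """Parametric ReLU: negative inputs are scaled by the learnable alpha."""
    return a if a >= 0 else alpha * a

def prelu_grad_alpha(a):
    """Derivative of PReLU with respect to alpha: a for a < 0, else 0."""
    return a if a < 0 else 0.0

# Hypothetical values, for illustration only
alpha = 0.25          # current value of the learnable slope
lr = 0.1              # learning rate
a = -2.0              # pre-activation input to the unit
upstream_grad = 1.0   # dLoss/dOutput arriving from later layers

# Chain rule: dLoss/dalpha = dLoss/dOutput * dOutput/dalpha
grad_alpha = upstream_grad * prelu_grad_alpha(a)
alpha_new = alpha - lr * grad_alpha
print(prelu(a, alpha), grad_alpha, alpha_new)
```

Only negative inputs contribute to the gradient of α, since the positive branch does not depend on it.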

### SoftPlus
ReLU, Leaky ReLU, and PReLU are not differentiable at zero. A smoother alternative that is differentiable everywhere, but behaves roughly like ReLU, is the SoftPlus activation function.

$$\text{SoftPlus}(a) = \ln(1 + e^{a})$$

Although SoftPlus is differentiable everywhere, ReLU remains the preferred and default choice in neural networks. It often works well enough in practice and is very cheap to compute.
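The following sketch compares SoftPlus, taken as ln(1 + e^a), with ReLU. The rewritten form max(a, 0) + ln(1 + e^(−|a|)) is algebraically the same function; it is used here only as one way to avoid overflow for large inputs.

```python
import math

def softplus(a):
    """SoftPlus ln(1 + e^a), computed as max(a, 0) + ln(1 + e^{-|a|})."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def relu(a):
    """Rectified linear unit, for comparison."""
    return max(0.0, a)

for a in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"a={a:+5.1f}  softplus={softplus(a):.5f}  relu={relu(a):.5f}")
```

SoftPlus approaches ReLU away from zero and differs most at a = 0, where it equals ln 2. Its derivative is the sigmoid function. The extra `exp` and `log` calls also show why ReLU, which needs only a comparison, is cheaper to evaluate.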