# Cost Function

The cost function in logistic regression measures the discrepancy between the actual labels and the predicted probabilities. Training a logistic regression model means finding the parameter values that minimize this cost function.

## Sigmoid Function

A central component of logistic regression is the logistic, or sigmoid, function. It maps any real number to the interval $(0, 1)$, so its output can be read as a probability. It is defined as:

$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
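A minimal NumPy sketch of the sigmoid, assuming scalar or array inputs of floats. It uses the algebraically equivalent form $\sigma(z) = \tfrac{1}{2}\left(1 + \tanh(z/2)\right)$, which avoids the overflow warnings that `np.exp(-z)` produces for large negative $z$. The function name `sigmoid` is illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z)).

    Computed through the identity sigma(z) = 0.5 * (1 + tanh(z / 2)),
    which is mathematically identical but never overflows, unlike
    exp(-z) for very negative z.
    """
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
p = sigmoid(z)
print(p)  # increases monotonically with z and stays between 0 and 1

# Two properties worth knowing:
print(sigmoid(0.0))                              # 0.5: z = 0 is the decision boundary
print(np.allclose(sigmoid(-z), 1 - sigmoid(z)))  # True: sigma(-z) = 1 - sigma(z)
```

The symmetry $\sigma(-z) = 1 - \sigma(z)$ means the probability of the negative class is simply $\sigma(-z)$.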

where $ z $ represents the linear combination of the model's parameters and input features:

$$ z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n $$

Here $\beta_0, \beta_1, \ldots, \beta_n$ are the coefficients to be learned, with $\beta_0$ acting as the intercept (bias) term.
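To make the notation concrete, here is a small sketch that computes $z$ and $\sigma(z)$ for a batch of instances at once. The coefficient values and the two-feature inputs are made up for illustration; `beta_0` and `beta` stand for $\beta_0$ and $(\beta_1, \ldots, \beta_n)$.

```python
import numpy as np

# Hypothetical coefficients for a model with n = 2 features
beta_0 = -1.0
beta = np.array([2.0, 0.5])

# Each row is one instance (X_1, X_2); values are illustrative only
X = np.array([[1.0, 2.0],
              [0.0, 0.0],
              [0.5, -2.0]])

# z = beta_0 + beta_1 * X_1 + beta_2 * X_2, computed for every row at once
z = beta_0 + X @ beta
p = 1 / (1 + np.exp(-z))  # predicted probability of the positive class

print(z)  # z is [2.0, -1.0, -1.0]
print(p)
```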

## Probability and Odds

The logistic regression model predicts the probability $P = \sigma(z)$ that an instance belongs to the positive class. The odds of a positive outcome are $\frac{P}{1-P}$, the probability of the positive outcome divided by the probability of the negative one.
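A quick worked example of converting probabilities to odds (the probabilities are arbitrary illustrative values):

$$
\begin{aligned}
P = 0.8 &\;\Rightarrow\; \frac{P}{1-P} = \frac{0.8}{0.2} = 4 \quad (\text{4 to 1 in favor}) \\
P = 0.5 &\;\Rightarrow\; \frac{P}{1-P} = \frac{0.5}{0.5} = 1 \quad (\text{even odds}) \\
P = 0.2 &\;\Rightarrow\; \frac{P}{1-P} = \frac{0.2}{0.8} = 0.25 \quad (\text{1 to 4})
\end{aligned}
$$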

## Log-Odds (Logit)

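Taking the logarithm of the odds gives the log-odds, or logit, which maps the odds from $(0, \infty)$ onto the entire real line. Logistic regression models the log-odds as a linear function of the features: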

$$ \ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n $$
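The logit is the inverse of the sigmoid, so applying one after the other recovers $z$. A useful consequence of the linear form is that increasing $X_j$ by one unit adds $\beta_j$ to the log-odds, which multiplies the odds by $e^{\beta_j}$ (this multiplicative factor is what is usually called the odds ratio). The sketch below checks both facts numerically; the coefficient values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logit(p):
    """Log-odds ln(p / (1 - p)) for probabilities strictly between 0 and 1."""
    return np.log(p / (1 - p))

z = np.array([-3.0, 0.0, 1.5])
print(np.allclose(logit(sigmoid(z)), z))  # True: logit undoes the sigmoid

# Hypothetical one-feature model: ln(odds) = beta_0 + beta_1 * X_1
beta_0, beta_1 = -0.4, 0.7

def odds(x1):
    return np.exp(beta_0 + beta_1 * x1)

# Raising X_1 by one unit multiplies the odds by exp(beta_1), whatever X_1 is
ratio = odds(3.0) / odds(2.0)
print(np.isclose(ratio, np.exp(beta_1)))  # True
```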

## Binary Cross-Entropy Loss

The cost function in logistic regression is commonly expressed as the binary cross-entropy loss. For a single training example it is given by:

$$ \mathcal{L}(\hat{y}, y) = -\left[y \log(\hat{y}) + (1-y) \log(1-\hat{y})\right] $$

where $\hat{y} = \sigma(z)$ is the predicted probability and $y$ is the actual label (0 or 1).
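Evaluating the per-example loss for a few predictions shows why it suits classification: confident correct predictions cost little, while confident wrong ones are penalized heavily. The helper name `bce_single` and the probabilities are illustrative.

```python
import math

def bce_single(y, y_hat):
    """Binary cross-entropy for one example; y is 0 or 1 and 0 < y_hat < 1."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# When y = 1 only the first term survives: loss = -ln(y_hat)
print(round(bce_single(1, 0.9), 4))  # 0.1054 (confident and correct)
print(round(bce_single(1, 0.1), 4))  # 2.3026 (confident and wrong)

# When y = 0 only the second term survives: loss = -ln(1 - y_hat)
print(round(bce_single(0, 0.2), 4))  # 0.2231
```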

## Cost Function for the Entire Dataset

The overall cost function for the full dataset with $ m $ training examples is the mean of the individual costs:

$$ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log(\hat{y}^{(i)}) + (1-y^{(i)})\log(1-\hat{y}^{(i)}) \right] $$
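In NumPy the sum over examples becomes a vectorized mean. The sketch below, using made-up labels and predictions, confirms that it matches averaging the per-example losses one at a time.

```python
import numpy as np

y = np.array([1, 0, 1, 0])              # true labels (illustrative)
y_hat = np.array([0.9, 0.2, 0.6, 0.4])  # predicted probabilities (illustrative)

# Vectorized form of J: mean over all m examples
J = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Same quantity computed with an explicit loop over examples
losses = [-(yi * np.log(pi) + (1 - yi) * np.log(1 - pi)) for yi, pi in zip(y, y_hat)]
J_loop = sum(losses) / len(losses)

print(np.isclose(J, J_loop))  # True
```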

## Minimizing the Cost Function

Training aims to find the parameter values $\beta$ that minimize this cost. Unlike linear least squares, logistic regression has no closed-form solution, so the minimum is found with numerical optimization methods such as gradient descent, stochastic gradient descent, or Newton's method. Because the cost is convex in $\beta$, every local minimum is also a global minimum.
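A minimal batch gradient-descent sketch, assuming the feature matrix already contains a leading column of ones for $\beta_0$. It uses the standard gradient of this cost, $\nabla J = \frac{1}{m} X^\top(\hat{y} - y)$. The learning rate, iteration count, and toy data are arbitrary choices for illustration, not tuned values.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(beta, X, y):
    h = sigmoid(X @ beta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the mean binary cross-entropy."""
    m, n = X.shape
    beta = np.zeros(n)
    history = [cost(beta, X, y)]
    for _ in range(n_iters):
        h = sigmoid(X @ beta)
        grad = X.T @ (h - y) / m  # gradient of J with respect to beta
        beta -= lr * grad         # step against the gradient
        history.append(cost(beta, X, y))
    return beta, history

# Toy, non-separable data: one feature plus a bias column of ones
x = np.array([0.5, 1.5, 2.5, 3.5, 1.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0])
X = np.column_stack((np.ones_like(x), x))

beta, history = gradient_descent(X, y)
print("initial cost:", history[0])  # ln 2, since beta starts at zero
print("final cost:  ", history[-1])
print("beta:", beta)
```

Stochastic gradient descent follows the same update but estimates the gradient from one example or a small mini-batch per step.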

## Definition

The same cost is often written in hypothesis notation, with $h_\theta(x)$ in place of $\hat{y}$:

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)}))] $$

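Here, $J(\theta)$ is the cost function, $m$ is the number of training examples, $y^{(i)}$ is the true label of the $i^{th}$ example, $h_\theta(x^{(i)}) = \sigma(\theta^\top x^{(i)})$ is the predicted probability for that example, and $\theta$ is the parameter vector (the coefficients $\beta_0, \ldots, \beta_n$ from above, with a constant feature $x_0 = 1$ absorbing the intercept).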
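The gradient that optimizers such as gradient descent rely on follows from the chain rule and the sigmoid identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$. For a single example, write $h = h_\theta(x) = \sigma(\theta^\top x)$ (with $x_0 = 1$ for the intercept) and $\ell = -\left[y \log h + (1-y)\log(1-h)\right]$. Then

$$
\begin{aligned}
\frac{\partial \ell}{\partial \theta_j}
&= \left(-\frac{y}{h} + \frac{1-y}{1-h}\right) \cdot h(1-h) \cdot x_j \\
&= \big(-y(1-h) + (1-y)h\big)\, x_j \\
&= (h - y)\, x_j
\end{aligned}
$$

Averaging over the $m$ examples gives

$$ \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} $$

or, in matrix form, $\nabla_\theta J = \frac{1}{m} X^\top (h - y)$, where $X$ stacks the examples $x^{(i)}$ as rows and $h$ is the vector of predictions.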

## Implementation

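```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to the interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

def compute_cost(theta, X, y):
    """Mean binary cross-entropy.

    X: (m, n + 1) feature matrix whose first column is all ones (bias term).
    y: (m,) vector of 0/1 labels.
    """
    m = len(y)
    h = sigmoid(np.dot(X, theta))
    # Keep predictions strictly inside (0, 1) so np.log never receives 0
    h = np.clip(h, 1e-15, 1 - 1e-15)
    cost = -1/m * (np.dot(y, np.log(h)) + np.dot((1 - y), np.log(1 - h)))
    return cost

# Example usage with a small illustrative dataset; replace X and y with your own
X = np.array([[0.5], [1.5], [2.5], [3.5]])  # feature matrix, one feature per row
y = np.array([0, 0, 1, 1])                  # target labels
m = len(y)

# Add a column of ones to X for the bias term
X = np.column_stack((np.ones(m), X))
theta = np.zeros(X.shape[1])
cost = compute_cost(theta, X, y)

print("Initial cost:", cost)  # ln(2) ≈ 0.6931, since theta = 0 gives h = 0.5
```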

In this example, `sigmoid` implements the logistic function and `compute_cost` computes the mean binary cross-entropy from the formula above. An optimization algorithm such as gradient descent or Newton's method minimizes this cost during training by adjusting the parameters $\theta$.
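Before handing a cost function like `compute_cost` to an optimizer, it can be worth checking the analytic gradient $\frac{1}{m} X^\top(h - y)$ against central finite differences of the cost. The sketch below repeats the cost so it runs on its own; the random data, the hypothetical helpers `analytic_gradient` and `numerical_gradient`, and the step size `eps` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

def analytic_gradient(theta, X, y):
    """Closed-form gradient of the mean binary cross-entropy (hypothetical helper)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central-difference estimate of each partial derivative (hypothetical helper)."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (compute_cost(theta + step, X, y)
                   - compute_cost(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.column_stack((np.ones(20), rng.normal(size=(20, 2))))  # bias column + 2 features
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=3)

diff = np.max(np.abs(analytic_gradient(theta, X, y) - numerical_gradient(theta, X, y)))
print("max abs difference:", diff)  # should be close to zero if the gradient is right
```

A large difference usually points to a sign or scaling mistake in the gradient rather than in the cost.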
