## Understanding Logistic Regression (Detailed Explanation)

Logistic Regression is a **supervised machine learning algorithm** primarily used for
**binary classification problems**, where the output variable has two possible classes,
such as *yes/no*, *true/false*, or *disease/no disease*.

Despite its name, Logistic Regression is used as a **classification algorithm**, not to
predict continuous values. The term “regression” reflects that the model fits a linear
function of the input features to the log-odds of the positive class, which a nonlinear
transformation then converts into a probability.

### 1. Linear Model Component

Logistic Regression starts by computing a **linear combination** of input features:

![](https://miro.medium.com/1*_TqRJ9SmwFzRigJhMiN2uw.png)

This linear equation is similar to linear regression, but **its output is not used directly**
for prediction.
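
As a minimal sketch of this step, the snippet below computes the linear output for a small batch of samples with NumPy. The names `linear_output`, `X`, `w`, and `b`, as well as the numbers, are made up for illustration and are not taken from the project code.

```python
import numpy as np

def linear_output(X, w, b):
    """Return z = X @ w + b, one value per row (sample) of X."""
    return X @ w + b

# Hypothetical toy data: 2 samples with 3 features each
X = np.array([[1.0, 2.0, 3.0],
              [0.5, -1.0, 2.0]])
w = np.array([0.2, -0.4, 0.1])  # one weight per feature (illustrative values)
b = 0.5                         # bias term (illustrative value)

z = linear_output(X, w, b)
print(z)  # approximately [0.2, 1.2]
```

Each entry of `z` is an unbounded real number, which is why it is passed through a further transformation before being read as a probability.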

### 2. Sigmoid Function (Core of Logistic Regression)

The linear output \(z\) can take any real value (positive or negative).  
To convert this value into a **probability**, Logistic Regression uses the **sigmoid function**:

![Sigmoid Function](https://d1.awsstatic.com/sigmoid.bfc853980146c5868a496eafea4fb79907675f44.png)

The sigmoid function maps any real number to a value strictly between **0 and 1**, which
makes it suitable for probability estimation.

#### Key properties of the sigmoid function:
- Large negative values → output close to **0**
- Large positive values → output close to **1**
- z = 0 → output = **0.5**

**Sigmoid Function Graph (Reference):**  
![Sigmoid Function Graph](https://media.licdn.com/dms/image/v2/D4D12AQGIXdSG7IJCNw/article-cover_image-shrink_600_2000/article-cover_image-shrink_600_2000/0/1694183259537?e=2147483647&v=beta&t=lJ_qEzot0iGYhNpez9XGRNHjS-CDKHn3Wj-6iCQxRO0)

This S-shaped curve explains how logistic regression smoothly transitions between
class predictions instead of making abrupt decisions.
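
The key properties listed above can be checked numerically. This is a minimal sketch using the standard definition of the sigmoid, 1 / (1 + e^(-z)); the function name and the sample inputs are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic sigmoid: 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs: a large negative value, zero, and a large positive value
values = sigmoid([-10.0, 0.0, 10.0])
print(values)  # close to 0, exactly 0.5, close to 1
```

For inputs of very large magnitude, `np.exp` can overflow and emit a warning; a production implementation would typically guard against that, which this sketch does not.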

### 3. Probability Interpretation

The output of the sigmoid function is interpreted as the probability that a sample
belongs to the positive class:

\[
P(y = 1 \mid x)
\]

With the default threshold of 0.5:

- If probability ≥ 0.5 → class **1**
- If probability < 0.5 → class **0**

This threshold can be adjusted depending on the problem. In medical applications where
recall is critical, lowering it flags more cases as positive, trading some precision for
fewer missed diagnoses.
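
The effect of moving the threshold can be sketched as follows. The helper `predict_labels`, the probabilities, and the alternative threshold of 0.3 are all hypothetical choices for illustration.

```python
import numpy as np

def predict_labels(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels using a threshold."""
    return (np.asarray(probabilities) >= threshold).astype(int)

# Made-up predicted probabilities for four patients
probs = np.array([0.10, 0.35, 0.50, 0.80])

default_labels = predict_labels(probs)               # [0 0 1 1]
sensitive_labels = predict_labels(probs, threshold=0.3)  # [0 1 1 1]

print(default_labels, sensitive_labels)
```

With the lower threshold, the second patient is also flagged as positive, which is the kind of trade-off that favours recall.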

### 4. Loss Function: Binary Cross-Entropy

To train the model, Logistic Regression minimizes the **binary cross-entropy loss**:

![](https://miro.medium.com/v2/resize:fit:1400/1*dEZxrHeNGlhfNt-JyRLpig.png)

This loss function penalizes:
- Confident wrong predictions heavily
- Correct predictions lightly

**Loss Function Visualization:**  
![](https://www.loekvandenouweland.com/assets/logloss/logloss.jpg)
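
To make the penalty asymmetry concrete, here is a minimal sketch of the standard binary cross-entropy formula, averaged over samples. The clipping constant `eps` is an assumption added only to avoid `log(0)`; the example probabilities are made up.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithm stays finite
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob)
                    + (1.0 - y_true) * np.log(1.0 - y_prob))

# True label is 1 in both cases; only the predicted probability differs
loss_confident_right = binary_cross_entropy([1], [0.99])  # small loss
loss_confident_wrong = binary_cross_entropy([1], [0.01])  # large loss

print(loss_confident_right, loss_confident_wrong)
```

The second loss is several hundred times larger than the first, which is what "penalizes confident wrong predictions heavily" means in practice.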

### 5. Gradient Descent Optimization

Logistic Regression uses **gradient descent** to minimize the loss function.

Steps involved:
1. Initialize weights and bias
2. Compute predictions using sigmoid
3. Calculate loss
4. Compute gradients
5. Update parameters
6. Repeat until convergence

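Over time, the loss decreases, indicating that the model fits the training data better.

![](https://miro.medium.com/v2/resize:fit:1400/0*QGWG1SbMjNjQ2wWk.png)

A steadily decreasing training-loss curve indicates that the optimization is converging.
It does not by itself guarantee good performance on unseen data, which is why the model
is also evaluated on held-out samples.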
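
The six steps above can be sketched as a short batch gradient descent loop. This is an illustrative version, not the project's actual code: the function name, the learning rate, the fixed iteration count (used here instead of a convergence check), and the synthetic data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, learning_rate=0.1, n_iters=1000):
    """Batch gradient descent on the binary cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # step 1: initialize weights
    b = 0.0                   # step 1: initialize bias
    losses = []
    eps = 1e-12               # keeps log() finite

    for _ in range(n_iters):  # step 6: fixed number of repeats
        p = sigmoid(X @ w + b)                       # step 2: predictions
        p_safe = np.clip(p, eps, 1.0 - eps)
        loss = -np.mean(y * np.log(p_safe)
                        + (1.0 - y) * np.log(1.0 - p_safe))  # step 3: loss
        losses.append(loss)

        error = p - y
        grad_w = X.T @ error / n_samples             # step 4: gradients
        grad_b = error.mean()

        w -= learning_rate * grad_w                  # step 5: update
        b -= learning_rate * grad_b

    return w, b, losses

# Synthetic, linearly separable toy data (not the Framingham dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, losses = train_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(losses[0], losses[-1], accuracy)
```

Plotting `losses` against the iteration number would give a loss curve of the kind shown above.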

### 6. Decision Boundary

Logistic Regression creates a **linear decision boundary** that separates the two classes.
In higher dimensions, this boundary becomes a hyperplane.

**Decision Boundary Visualization:**  
![](https://2.bp.blogspot.com/-eCQJ4j3wqw0/Tr8-bwNir1I/AAAAAAAAApg/mBLTGATfgI0/s1600/plot3.png)
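
The boundary is the set of points where the linear output is zero, which is exactly where the predicted probability equals 0.5. A minimal two-feature sketch, with hypothetical parameter values rather than learned ones:

```python
import numpy as np

# Hypothetical parameters for a model with two features
w = np.array([2.0, -1.0])
b = 0.5

def boundary_x2(x1):
    """Solve w[0]*x1 + w[1]*x2 + b = 0 for x2 (requires w[1] != 0)."""
    return -(w[0] * x1 + b) / w[1]

x1 = 1.0
x2 = boundary_x2(x1)                 # point on the boundary line
z = w @ np.array([x1, x2]) + b       # linear output at that point
p = 1.0 / (1.0 + np.exp(-z))         # predicted probability at that point

print(x2, z, p)  # 2.5, 0.0, 0.5
```

Points on one side of this line get a probability above 0.5 and points on the other side get a probability below it.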

### 7. Importance of Feature Scaling

Since each weight's gradient is proportional to the magnitude of its feature, features
with large values can dominate the updates and slow down convergence.

Standardization ensures:
- Faster convergence
- Stable gradients
- Fair weight updates

This is why **StandardScaler** is used before training.
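
A minimal sketch of how scikit-learn's `StandardScaler` is typically applied: it is fitted on the training split only and then reused on the test split. The feature values below are invented to show two very different scales and are not rows from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values on different scales (e.g. age in years, cholesterol in mg/dL)
X_train = np.array([[39.0, 195.0],
                    [46.0, 250.0],
                    [61.0, 225.0],
                    [48.0, 285.0]])
X_test = np.array([[50.0, 240.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/std

print(X_train_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_train_scaled.std(axis=0))   # approximately 1 for each feature
```

Fitting the scaler on the training data alone avoids leaking information from the test set into the model.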

## Dataset Used

The **Framingham Heart Study dataset** was used for this task to predict the likelihood
of developing coronary heart disease within 10 years.

- **Target Variable:** `TenYearCHD`
  - 0 → No heart disease
  - 1 → Heart disease
- **Features:** Demographic, lifestyle, and medical attributes
- **Nature of problem:** Binary classification

This dataset is well-suited to Logistic Regression: the target is binary, and the model's
probabilistic output can be read directly as an estimated 10-year risk, which is useful in
a medical setting.

## Approach Followed to Solve the Task

### Logistic Regression from Scratch

The scratch implementation involved manually coding:
- Sigmoid function
- Weight and bias initialization
- Gradient descent
- Binary prediction logic

This approach helped in understanding:
- How probabilities are computed
- How loss is minimized
- How model parameters evolve over time

### Logistic Regression using scikit-learn

Scikit-learn’s implementation abstracts the mathematical complexity and provides:
- Optimized solvers
- Built-in regularization
- Faster and more stable convergence

This model served as a **benchmark** for performance comparison.
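
For reference, a typical scikit-learn workflow looks roughly like the sketch below. It uses synthetic data from `make_classification` as a stand-in, because the project's data loading and preprocessing are not shown here; the split ratio, `random_state`, and `max_iter` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the Framingham dataset)
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling and the classifier are chained so the scaler is fitted on training data only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

probabilities = model.predict_proba(X_test)[:, 1]  # P(y = 1 | x)
predictions = model.predict(X_test)                # labels at the 0.5 threshold
test_accuracy = model.score(X_test, y_test)

print(test_accuracy)
```

By default, `LogisticRegression` applies L2 regularization, which is the built-in regularization mentioned above.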

## Evaluation Metrics Used

Since the problem is a **binary classification task**, the following metrics were used:

- **Accuracy** – overall correctness
- **Precision** – reliability of positive predictions
- **Recall** – ability to detect actual heart disease cases
- **F1-score** – balance between precision and recall

Using multiple metrics gives a more complete picture than accuracy alone, which can be
misleading when one class is much more common than the other.
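
The four metrics can be computed directly from the confusion-matrix counts. This is a minimal sketch with made-up labels; the helper name `classification_metrics` is hypothetical, and in practice `sklearn.metrics` provides equivalent functions.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 3 actual positives out of 10 samples
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.7, precision 0.5, recall ~0.333, f1 0.4
```

In this toy case the accuracy is 0.7 even though only one of the three positive cases is detected, which illustrates why recall is reported alongside accuracy.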

## Conclusion

Logistic Regression is a powerful and interpretable classification algorithm,
especially suitable for medical decision-making.

Implementing it from scratch builds strong conceptual clarity, while using
scikit-learn demonstrates how machine learning is applied efficiently in practice.

This task reinforced the importance of understanding both **algorithm internals**
and **high-level ML tools** for building reliable predictive models.