# Logistic Regression with scikit-learn

## Introduction
Logistic Regression is a **supervised learning** algorithm used primarily for **binary classification** (though it can be extended to multi-class problems). Despite the name "regression," it is a classification method: it models the probability that a sample belongs to a class rather than predicting a continuous value.

### How it Works
- **Linear Model**: Logistic Regression starts with a linear combination of input features \(x_1, x_2, \dots, x_n\). It computes a linear predictor \(z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b\), where \(w_1, \dots, w_n\) are the learned weights and \(b\) is the bias (intercept).
- **Sigmoid Function**: Instead of using this \(z\) directly, logistic regression feeds \(z\) through the sigmoid (logistic) function \(\sigma(z) = \frac{1}{1 + e^{-z}}\). The output is a number between 0 and 1, which can be interpreted as a **probability** of belonging to the positive class.
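
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not how scikit-learn implements it internally; the weights `w`, bias `b`, and sample `x` are made-up values chosen only to show the arithmetic.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and one sample with two features (illustrative values only)
w = np.array([0.8, -0.5])
b = 0.1
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b     # linear predictor: 0.8*2.0 - 0.5*1.0 + 0.1 = 1.2
p = sigmoid(z)           # interpreted as the probability of the positive class
label = int(p >= 0.5)    # a threshold of 0.5 turns the probability into a class label

print(f"z = {z:.2f}, p = {p:.3f}, predicted label = {label}")
```

Because \(\sigma(0) = 0.5\), thresholding the probability at 0.5 is the same as checking whether \(z\) is positive, which is why the decision boundary of the model is linear in the features.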

### Why Use Logistic Regression?
- It is easy to implement and interpret.
- It provides probabilities directly (i.e., it can give you the chance that a sample is in one class vs. another).
- It works well when the classes are approximately linearly separable. Regularization helps control overfitting, and polynomial features can capture non-linear decision boundaries.

In this notebook, we will:
1. Load a dataset (the Iris dataset from scikit-learn) and inspect its feature names, target names, and first few rows.
2. Split the data into training and test sets.
3. Train a Logistic Regression model.
4. Evaluate the model's performance.

## scikit-learn (sklearn) Library
- **scikit-learn** is one of the most popular machine learning libraries in Python.
- It provides tools for model training, evaluation, and data preprocessing.
- The `LogisticRegression` class is part of `sklearn.linear_model`.

Let's get started!

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# 1. Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target labels

print("Feature names:", iris.feature_names)
print("Target names:", iris.target_names)
print("First 5 rows of X:\n", X[:5])
print("First 5 labels of y:\n", y[:5])

### Train/Test Split
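We'll split the Iris dataset into **training** and **testing** sets. A common split is 80% for training and 20% for testing, but any ratio can be used (like 70%-30%, 75%-25%, etc.). Passing `stratify=y` keeps the class proportions the same in both sets, and a fixed `random_state` makes the split reproducible.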

In [None]:
# 2. Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42,
                                                    stratify=y)

print("Train set size:", X_train.shape)
print("Test set size: ", X_test.shape)
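
As a quick sanity check on the split, the class counts in each subset can be compared. The snippet below is a self-contained sketch that repeats the same `train_test_split` call; the names `train_counts` and `test_counts` are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each class ended up in each subset
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)

print("Train class counts:", train_counts)
print("Test class counts: ", test_counts)
```

Iris has 50 samples per class, so with stratification each class should contribute equally to both subsets. Without `stratify=y`, the counts would generally vary from one random split to the next.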

### Training the Logistic Regression Model
We create an instance of `LogisticRegression` and fit it on our training data (`X_train` and `y_train`).

In [None]:
# 3. Train Logistic Regression model
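# max_iter=200 gives the solver more iterations than the default (100) to converge.
# random_state only affects the 'sag', 'saga', and 'liblinear' solvers.
log_reg = LogisticRegression(max_iter=200, random_state=42)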
log_reg.fit(X_train, y_train)

print("Logistic Regression trained!")
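
Once fitted, the model exposes its learned weights and can return class probabilities rather than only labels. The sketch below is self-contained and refits the same model on the same split; it assumes a recent scikit-learn version, where the default solver handles the three Iris classes with a single multinomial (softmax) model, so the probabilities generalize the binary sigmoid described earlier.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)

# One row of weights per class, one column per feature
print("coef_ shape:     ", log_reg.coef_.shape)
print("intercept_ shape:", log_reg.intercept_.shape)

# Class probabilities for the first three test samples; each row sums to 1
proba = log_reg.predict_proba(X_test[:3])
print(proba.round(3))

# predict() returns the class with the highest probability
predicted = log_reg.predict(X_test[:3])
print("Most probable class:", log_reg.classes_[proba.argmax(axis=1)])
print("predict() output:   ", predicted)
```

Looking at `predict_proba` is useful when the cost of a mistake differs between classes, since it shows how confident the model is instead of only which label it chose.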

### Model Evaluation
After training, we'll use the model to predict on the **test** set and measure how well it performs.

Common metrics include:
- **Accuracy**: Fraction of predictions that are correct (a value between 0 and 1).
- **Confusion Matrix**: A grid showing counts of actual vs. predicted labels.
- **Precision, Recall, F1-Score**: Detailed classification performance per class.

In [None]:
# 4. Predictions on the test set
y_pred = log_reg.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification report
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print("Classification Report:\n", report)
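
To see where the numbers in the classification report come from, the metrics can be derived by hand from a confusion matrix. The labels below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the Iris model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 3-class problem (illustrative, not Iris results)
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_hat = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)

true_pos = np.diag(cm)                  # correct predictions per class
accuracy = true_pos.sum() / cm.sum()    # correct predictions over all predictions
precision = true_pos / cm.sum(axis=0)   # column sums: how often each class was predicted
recall = true_pos / cm.sum(axis=1)      # row sums: how often each class actually occurred
f1 = 2 * precision * recall / (precision + recall)

print("Confusion matrix:\n", cm)
print("Accuracy: ", accuracy)
print("Precision:", precision.round(3))
print("Recall:   ", recall.round(3))
print("F1:       ", f1.round(3))
```

This simple version divides by zero if a class is never predicted or never occurs; `classification_report` handles that case through its `zero_division` parameter.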

## Observations
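- The **accuracy** is typically high on the Iris dataset because it is a relatively easy classification task: setosa is linearly separable from the other two species, and versicolor and virginica overlap only slightly.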
- The **confusion matrix** helps to see which classes are confused with each other.
- The **classification report** shows precision, recall, and F1-score for each class.

## Next Steps
- Experiment with **different test sizes** or **different random states**.
- Try tuning **regularization** and other hyperparameters of the `LogisticRegression` model, such as `C` (the inverse of regularization strength, so smaller values regularize more strongly) and `penalty`.
- Consider using **pipelines** for preprocessing steps (like scaling or feature engineering) if you have more complex data.
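
A few of these ideas can be combined in one short experiment. The sketch below is one possible setup, not a recommended configuration: it scales the features inside a pipeline, tries three arbitrary values of `C`, and uses 5-fold cross-validation instead of a single train/test split. The `results` dictionary is a name introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

results = {}
# Candidate values for C are arbitrary illustrative choices
for C in [0.01, 1.0, 100.0]:
    # The scaler is fitted on each training fold only, which avoids leaking test data
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    scores = cross_val_score(model, X, y, cv=5)
    results[C] = scores.mean()
    print(f"C={C}: mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```

Putting the scaler inside the pipeline matters: scaling the full dataset before cross-validation would let statistics from the held-out fold influence training.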

## Conclusion
In this notebook, we demonstrated how to use **Logistic Regression** from **scikit-learn** to classify the Iris dataset. We covered:
1. Loading the dataset.
2. Splitting into train and test sets.
3. Training a logistic regression model.
4. Evaluating the model with accuracy, confusion matrix, and a classification report.

Logistic regression remains a strong baseline and a fundamental model in machine learning, especially for classification problems.