
## Logistic Regression Tutorial

Logistic Regression is a statistical model used for binary classification. It predicts the probability that a given data point belongs to a particular class (e.g., spam or not spam, tumor benign or malignant). Unlike linear regression, which outputs a continuous value, logistic regression uses a sigmoid function to map any real-valued number to a value between 0 and 1, which can be interpreted as a probability.

**Key Concepts:**

*   **Binary Classification:** Classifying data into one of two categories.
*   **Sigmoid Function:** A mathematical function that squashes any real input into the open interval (0, 1); the output approaches 0 and 1 but never reaches them. The formula is:
    $$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
    where $z = w^\top x + b$ is a linear combination of the features $x$ with weights $w$ and bias $b$.
*   **Decision Boundary:** The set of inputs where the predicted probability equals the classification threshold (usually 0.5). Points whose probability is above the threshold are assigned to class 1; all others are assigned to class 0.
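*   **Cost Function:** Measures how far the predicted probabilities are from the true labels. A common choice is the cross-entropy loss (also called log loss), which penalizes confident wrong predictions heavily.
*   **Gradient Descent:** An optimization algorithm that iteratively adjusts the weights and bias in the direction that reduces the cost function. Note that scikit-learn's `LogisticRegression` uses the `lbfgs` solver by default rather than plain gradient descent.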
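
The concepts above can be tied together in a small from-scratch sketch. This is only an illustration, not how scikit-learn fits its model: the toy data, the `learning_rate` value, the number of steps, and the helper names `sigmoid` and `cross_entropy` are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    """Squash real-valued scores into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y_true, p, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Hypothetical 1D toy data: class 1 tends to have larger feature values.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 1.0, 50), rng.normal(1.0, 1.0, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = 0.0, 0.0        # weight and bias, initialised at zero
learning_rate = 0.1    # illustrative step size
losses = []

for step in range(500):
    p = sigmoid(w * x + b)             # predicted probabilities
    losses.append(cross_entropy(y, p))
    grad_w = np.mean((p - y) * x)      # gradient of the loss w.r.t. w
    grad_b = np.mean(p - y)            # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w        # gradient descent update
    b -= learning_rate * grad_b

# Apply the 0.5 threshold to turn probabilities into class labels.
y_hat = (sigmoid(w * x + b) >= 0.5).astype(int)

print(f"w = {w:.3f}, b = {b:.3f}")
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"training accuracy: {np.mean(y_hat == y):.2f}")
```

With both parameters starting at zero, every initial prediction is 0.5, so the first recorded loss equals $\ln 2$; the loss should then fall as the weight grows in the direction that separates the two classes.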

Let's walk through a simple example using Python and scikit-learn.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [None]:
# Generate some synthetic data for demonstration
np.random.seed(0)
X = np.random.rand(100, 1) * 10
# Labels: 1 when x > 5, plus random 0/1 noise (values of 2 are clipped to 1 in the next cell)
y = (X[:, 0] > 5).astype(int) + np.random.randint(0, 2, 100)

In [None]:
y = np.clip(y, 0, 1) # Ensure labels are 0 or 1
#print(X,y)

In [None]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
#print(X_train)
# Create and train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

In [None]:
# Make predictions on the test set
y_pred = model.predict(X_test)
#print(y_pred)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

In [None]:
# Visualize the decision boundary (for a 1D example)
x_values = np.linspace(0, 10, 100).reshape(-1, 1)
y_prob = model.predict_proba(x_values)[:, 1]

plt.scatter(X_test, y_test, color='black', zorder=3, edgecolors='white', label='Test labels')
plt.plot(x_values, y_prob, color='red', linewidth=2, label='Predicted P(class 1)')
plt.axhline(0.5, color='blue', linestyle='--', label='Decision threshold (0.5)')
plt.xlabel('Feature')
plt.ylabel('Probability / Class')
plt.title('Logistic Regression Decision Boundary')
plt.legend()
plt.show()

**Explanation of the Code:**

1.  **Import Libraries:** We import `numpy` for numerical operations, `matplotlib.pyplot` for plotting, and modules from `sklearn` for splitting data, creating the logistic regression model, and evaluating it.
2.  **Generate Data:** We create simple synthetic data where the target variable `y` is always 1 when the feature `X` is greater than 5. For `X` at or below 5, the added random 0/1 term (followed by clipping to the range 0 to 1) makes the label 1 about half the time, so the two classes overlap.
3.  **Split Data:** We split the data into training and testing sets to evaluate how well our model generalizes to unseen data.
4.  **Create and Train Model:** We initialize a `LogisticRegression` model and train it using the `fit()` method on the training data.
5.  **Make Predictions:** We use the trained model to predict the class labels for the test set using the `predict()` method.
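6.  **Evaluate Model:** We calculate the accuracy of the model, the fraction of test labels predicted correctly, by comparing the predicted labels (`y_pred`) to the actual labels (`y_test`). Because roughly three quarters of the synthetic labels are expected to be 1, accuracy alone can flatter the model here.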
7.  **Visualize Decision Boundary:** We generate a range of feature values (`x_values`) and use `predict_proba()` to get the predicted probability of class 1 for each value. We then plot this probability curve together with a horizontal line at the 0.5 threshold. The feature value where the curve crosses that line is the decision boundary: inputs to its right are classified as 1, inputs to its left as 0.
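
The crossing point described in step 7 can also be computed directly from the fitted parameters, since $\sigma(z) = 0.5$ exactly when $z = w x + b = 0$. The sketch below recreates the tutorial's data so it runs on its own; the variable names `w`, `b`, `x_boundary`, and `p_at_boundary` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Recreate the tutorial's synthetic data so this block is self-contained.
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = np.clip((X[:, 0] > 5).astype(int) + np.random.randint(0, 2, 100), 0, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)

w = model.coef_[0, 0]      # learned weight for the single feature
b = model.intercept_[0]    # learned bias
x_boundary = -b / w        # solves w * x + b = 0, i.e. sigma(z) = 0.5

# Sanity check: the model's own probability at that point.
p_at_boundary = model.predict_proba([[x_boundary]])[0, 1]

print(f"w = {w:.3f}, b = {b:.3f}")
print(f"feature value where P(class 1) = 0.5: {x_boundary:.3f}")
print(f"predicted probability there: {p_at_boundary:.3f}")
```

If `x_boundary` turns out to lie outside the sampled range of 0 to 10, the model assigns the same class to every input in that range, which is worth checking on imbalanced data like this.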

This tutorial provides a basic understanding of logistic regression and how to implement it using scikit-learn. You can extend this to more complex datasets and explore other evaluation metrics and techniques like regularization.
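
As one possible starting point for those extensions, the sketch below reports a confusion matrix, precision, and recall, and then refits the model with different values of `C`, scikit-learn's inverse regularization strength (smaller `C` means stronger regularization). The data is regenerated so the block runs on its own, and the particular `C` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Recreate the tutorial's synthetic data so this block is self-contained.
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = np.clip((X[:, 0] > 5).astype(int) + np.random.randint(0, 2, 100), 0, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)

print("Confusion matrix:")
print(cm)
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")

# Effect of regularization: smaller C shrinks the weight toward zero.
coefs = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C).fit(X_train, y_train)
    coefs[C] = clf.coef_[0, 0]
    print(f"C = {C:>6}: weight = {coefs[C]:.3f}")
```

Precision and recall separate the two kinds of error that accuracy lumps together, which matters when one class is much more common than the other.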