# Logistic Regression

### Overview
This notebook walks through building a logistic regression model from scratch. We'll cover the basics of logistic regression, implement the model in Python with NumPy, and evaluate its performance on a small synthetic dataset.

### Motivation
Logistic regression is a fundamental algorithm in machine learning used for binary classification problems. It's a great starting point for understanding more complex models like neural networks. By implementing logistic regression from scratch, you'll gain a deeper understanding of the underlying math and be able to apply it to real-world problems.

### Key Components
*   **Logistic Function**: The logistic function, also known as the sigmoid function, maps any real-valued number to a value between 0 and 1. This is essential for logistic regression, as it allows us to model probabilities.
*   **Cost Function**: The cost function measures the difference between the predicted probabilities and the actual labels. We'll use the binary cross-entropy loss function, which is commonly used for binary classification problems.
*   **Gradient Descent**: Gradient descent is an optimization algorithm used to minimize the cost function. We'll use it to update the model's parameters and improve its performance.
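*   **Model Evaluation**: We'll evaluate the model's performance using precision and recall, computed from true-positive, false-positive, and false-negative counts. Accuracy and F1-score are common companions to these metrics.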
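
For reference, the components above can be written out as follows. This is a sketch in standard notation, assuming $m$ training samples, feature vectors $x^{(i)}$, labels $y^{(i)} \in \{0, 1\}$, weights $w$, bias $b$, and learning rate $\alpha$; these symbols are introduced here only for illustration.

```latex
\begin{aligned}
\sigma(z) &= \frac{1}{1 + e^{-z}}, \qquad \hat{y}^{(i)} = \sigma\!\left(w^\top x^{(i)} + b\right) \\
J(w, b) &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right] \\
\frac{\partial J}{\partial w} &= \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right) x^{(i)}, \qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right) \\
w &\leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}
\end{aligned}
```

The two gradient expressions correspond to `dw` and `db` in the `fit` method of the model class defined later in the notebook.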

### Conclusion
By the end of this notebook, you'll have a solid understanding of logistic regression and be able to implement it from scratch. You'll also learn how to evaluate the model's performance and identify areas for improvement.

In [1]:
# import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### Generate Sample Data

In [6]:
np.random.seed(42)  # For reproducibility

# Generate synthetic data
temperature = np.random.uniform(20, 40, 20)
humidity = np.random.uniform(30, 100, 20)
wind_speed = np.random.uniform(0, 20, 20)

# Simulate labels: 0 if humidity > 70 or wind_speed > 15, otherwise 1
labels = np.where((humidity > 70) | (wind_speed > 15), 0, 1)

# Create DataFrame
df = pd.DataFrame({
    'temperature': temperature,
    'humidity': humidity,
    'wind_speed': wind_speed,
    'weather': labels
})

# Display the first few rows of the DataFrame
df.head(5)

|   | temperature | humidity | wind_speed | weather |
|---|---|---|---|---|
| 0 | 27.490802 | 72.829703 | 2.440765 | 0 |
| 1 | 39.014286 | 39.76457 | 9.903538 | 1 |
| 2 | 34.639879 | 50.450125 | 0.68777 | 1 |
| 3 | 31.97317 | 55.645329 | 18.186408 | 0 |
| 4 | 23.120373 | 61.924899 | 5.1756 | 1 |


### Build Logistic Regression Model

In [4]:
# Define the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

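class LogisticRegression:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, learning_rate=0.01, num_iterations=1000):
        self.learning_rate = learning_rate    # step size for each update
        self.num_iterations = num_iterations  # number of full-batch updates

    def fit(self, X, y):
        n_samples, n_features = X.shape
        # Start from zero weights and bias
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.num_iterations):
            # Forward pass: predicted probability of class 1
            linear_model = np.dot(X, self.weights) + self.bias
            y_pred = sigmoid(linear_model)

            # Gradients of the mean binary cross-entropy loss
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)

            # Gradient descent step
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict_proba(self, X):
        # Probability that each row belongs to class 1
        return sigmoid(np.dot(X, self.weights) + self.bias)

    def predict(self, X):
        # Threshold probabilities at 0.5 to get hard 0/1 labels
        return np.where(self.predict_proba(X) >= 0.5, 1, 0)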
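
The `fit` loop applies the gradient of the binary cross-entropy loss but never computes the loss value itself. A minimal sketch of that cost function is shown below. The helper name `binary_cross_entropy` and the clipping constant `eps` are assumptions made for illustration and are not part of the class above.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Toy check: an uninformative prediction of 0.5 gives a loss of ln(2)
print(binary_cross_entropy([0, 1], [0.5, 0.5]))
```

Evaluating this on the output of `predict_proba` for the training data gives the training loss, and recording it inside the `fit` loop would be one way to check whether gradient descent is converging.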


In [7]:
from sklearn.model_selection import train_test_split

# Split data
X = df[['temperature', 'humidity', 'wind_speed']].values
y = df['weather'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train
model = LogisticRegression(learning_rate=0.01, num_iterations=1000)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
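
The three features are drawn from different ranges (temperature 20 to 40, humidity 30 to 100, wind speed 0 to 20), and gradient descent with a single fixed learning rate tends to behave better when features share a similar scale. The sketch below shows one common option, standardizing with training-set statistics only. The helper `standardize` and the toy arrays are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return (X_train - mean) / std, (X_test - mean) / std

# Toy data with the same three-column layout as the notebook's features
rng = np.random.default_rng(0)
X_tr = rng.uniform([20, 30, 0], [40, 100, 20], size=(14, 3))
X_te = rng.uniform([20, 30, 0], [40, 100, 20], size=(6, 3))

X_tr_scaled, X_te_scaled = standardize(X_tr, X_te)
print(X_tr_scaled.mean(axis=0).round(6))  # close to 0 for every column
print(X_tr_scaled.std(axis=0).round(6))   # close to 1 for every column
```

Computing the mean and standard deviation from the training split alone keeps information about the test split out of the preprocessing step.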


In [8]:
def precision_recall(y_true, y_pred):
    TP = np.sum((y_true == 1) & (y_pred == 1))
    FP = np.sum((y_true == 0) & (y_pred == 1))
    FN = np.sum((y_true == 1) & (y_pred == 0))

    precision = TP / (TP + FP) if (TP + FP) != 0 else 0
    recall    = TP / (TP + FN) if (TP + FN) != 0 else 0

    return precision, recall

precision, recall = precision_recall(y_test, y_pred)
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")


Precision: 0.50
Recall: 1.00
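
Accuracy and F1-score can be computed from the same confusion counts used by `precision_recall`. The sketch below assumes 0/1 label arrays with 1 as the positive class. The helper name `accuracy_f1` and the toy labels are illustrative and are not taken from the notebook's test set.

```python
import numpy as np

def accuracy_f1(y_true, y_pred):
    """Return (accuracy, F1) for 0/1 label arrays, treating 1 as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    TP = np.sum((y_true == 1) & (y_pred == 1))
    FP = np.sum((y_true == 0) & (y_pred == 1))
    FN = np.sum((y_true == 1) & (y_pred == 0))

    accuracy = np.mean(y_true == y_pred)
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    denom = 2 * TP + FP + FN
    f1 = 2 * TP / denom if denom != 0 else 0.0
    return float(accuracy), float(f1)

# Toy labels (not the notebook's test set): TP=2, FP=1, FN=1, TN=2
acc, f1 = accuracy_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(f"Accuracy: {acc:.2f}")
print(f"F1-score: {f1:.2f}")
```

For the precision of 0.50 and recall of 1.00 reported above, the harmonic mean works out to 2(0.50)(1.00) / (0.50 + 1.00) ≈ 0.67. Passing `y_test` and `y_pred` to the helper would give the accuracy for the same split.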
