# Logistic Regression Classification Task

This notebook implements **binary classification using Logistic Regression** as part of the internship task.

## Steps:
1. Load and explore dataset  
2. Train/Test split and standardize features  
3. Fit Logistic Regression model  
4. Evaluate with confusion matrix, precision, recall, ROC-AUC  
5. Tune threshold and explain sigmoid function  

---


## 1. Load Dataset

We use the provided dataset (`data.csv`). Let's first inspect its structure.


In [None]:
import pandas as pd

# Load dataset
data = pd.read_csv("data.csv")
data.head()
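
Beyond `head()`, a few quick checks help confirm the layout that the later cells rely on. The sketch below uses a small hypothetical DataFrame (`demo`, with made-up columns `f1`, `f2`, and `target`) in place of `data.csv`, whose real columns are not shown here. It assumes, as the rest of the notebook does, that the target is the last column.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; the real column names are unknown.
demo = pd.DataFrame({
    "f1": [0.5, 1.2, 3.1, 0.7, 2.2, 1.9],
    "f2": [10, 12, 9, 15, 11, 14],
    "target": [0, 1, 1, 0, 1, 0],
})

n_rows, n_cols = demo.shape                      # dataset dimensions
missing_per_column = demo.isna().sum()           # missing values per column
class_counts = demo.iloc[:, -1].value_counts()   # class balance of the last column

print("rows, columns:", n_rows, n_cols)
print(missing_per_column)
print(class_counts)
```

Running the same three checks on `data` would show whether missing values need handling and whether the classes are imbalanced, which matters when reading precision and recall later.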

## 2. Train/Test Split & Standardization

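We split the dataset into training (80%) and test (20%) sets, stratified on the target so that both sets keep the same class ratio. Features are then standardized to zero mean and unit variance. The scaler is fit on the training set only, so no test-set statistics leak into training.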


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumes the last column is the target and all other columns are numeric features
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
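
A quick way to confirm the scaler behaves as intended is to look at the column statistics after scaling. The sketch below runs the same split-then-scale steps on synthetic data from `make_classification` (an assumption, since `data.csv` is not available here). The names `X_demo`, `y_demo`, and `scaler_demo` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's X and y.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # learns mean/std from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # reuses the training statistics

train_means = X_tr_scaled.mean(axis=0)
train_stds = X_tr_scaled.std(axis=0)
test_means = X_te_scaled.mean(axis=0)

print("train means:", np.round(train_means, 3))
print("train stds: ", np.round(train_stds, 3))
print("test means: ", np.round(test_means, 3))
```

The training columns come out with mean 0 and standard deviation 1. The test columns are generally close to, but not exactly, 0 and 1, because they are scaled with the training statistics.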

## 3. Logistic Regression Model

We fit a Logistic Regression model using **scikit-learn** with its default settings (L2 regularization, `C=1.0`).


In [None]:
from sklearn.linear_model import LogisticRegression

# Train model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Predictions
y_pred = log_reg.predict(X_test)
y_prob = log_reg.predict_proba(X_test)[:,1]
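
Taking column 1 of `predict_proba` as the positive-class probability relies on the column order matching `classes_`. The sketch below checks that relationship on synthetic data (an assumption standing in for the notebook's dataset); `model`, `X_demo`, and `y_demo` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the notebook's features and labels.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X_demo, y_demo)

proba = model.predict_proba(X_demo)   # one column per class, ordered as model.classes_
labels = model.predict(X_demo)

row_sums = proba.sum(axis=1)                               # each row should sum to 1
argmax_labels = model.classes_[np.argmax(proba, axis=1)]   # most probable class per row

print("classes_:", model.classes_)
print("rows sum to 1:", np.allclose(row_sums, 1.0))
print("predict matches argmax:", np.array_equal(argmax_labels, labels))
```

If the target in `data.csv` is not encoded as 0/1, checking `log_reg.classes_` shows which label column 1 actually refers to.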

## 4. Model Evaluation

We evaluate using:
- Confusion Matrix  
- Precision, Recall, F1-score  
- ROC Curve & AUC


In [None]:
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, roc_curve
import matplotlib.pyplot as plt

# Confusion Matrix & Classification Report
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# ROC-AUC
auc = roc_auc_score(y_test, y_prob)
print("ROC-AUC Score:", auc)

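# ROC Curve: each point is the (FPR, TPR) pair at one probability threshold
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
# Diagonal reference line: performance of a random classifier
plt.plot([0, 1], [0, 1], "--", label="Chance (AUC = 0.5)")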
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
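
The precision, recall, and F1 values in the classification report follow directly from the four cells of the confusion matrix. The sketch below derives them by hand on a small made-up set of labels (`y_true_demo` and `y_pred_demo` are illustrative, not taken from the dataset), assuming the classes are encoded as 0 and 1.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration only.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]

# For binary 0/1 labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print("tn, fp, fn, tp:", tn, fp, fn, tp)
print("precision:", precision, "recall:", recall, "f1:", f1)
```

The hand-computed values agree with `precision_score` and `recall_score` on the same inputs.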

## 5. Threshold Tuning & Sigmoid Function

By default, `predict` assigns the positive class when the predicted probability exceeds **0.5**.
Lowering this threshold typically raises **recall** at the cost of **precision**; raising it does the opposite.

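The **sigmoid function** maps any real value into the range (0, 1):

$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$

Here $z = w^\top x + b$ is the model's linear score for a sample $x$, and $\sigma(z)$ is the predicted probability that the sample belongs to the positive class.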
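
To make the link between the sigmoid and the model concrete, the sketch below implements the formula in NumPy and checks it against scikit-learn's probabilities. It uses synthetic data as a stand-in for the notebook's dataset. The names `sigmoid`, `model`, `X_demo`, and `y_demo` are illustrative, and the linear score is rebuilt from the fitted `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map real-valued scores into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for the notebook's features and labels.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X_demo, y_demo)

# Linear score: weighted sum of the features plus the intercept.
z = X_demo @ model.coef_.ravel() + model.intercept_[0]
manual_prob = sigmoid(z)
sklearn_prob = model.predict_proba(X_demo)[:, 1]

print("sigmoid at -2, 0, 2:", sigmoid(np.array([-2.0, 0.0, 2.0])))
print("matches predict_proba:", np.allclose(manual_prob, sklearn_prob))
```

A score of 0 maps to a probability of exactly 0.5, which is why the default 0.5 threshold corresponds to the decision boundary $z = 0$.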


In [None]:
import numpy as np

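# Example of threshold tuning: a threshold below 0.5 labels more samples as positive
# (assumes the target is encoded as 0/1, so the integer predictions match y_test)
threshold = 0.3
y_pred_thresh = (y_prob >= threshold).astype(int)

print(f"Confusion Matrix with threshold={threshold}:\n", confusion_matrix(y_test, y_pred_thresh))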
print("\nClassification Report:\n", classification_report(y_test, y_pred_thresh))
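
A single threshold shows only one point on the trade-off. The sketch below sweeps a few thresholds over made-up labels and probabilities (`y_true_demo` and `y_prob_demo` are illustrative values, not model output) to show how precision and recall move as the threshold changes.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels and positive-class probabilities for illustration only.
y_true_demo = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])
y_prob_demo = np.array([0.1, 0.35, 0.4, 0.2, 0.8, 0.65, 0.55, 0.9, 0.45, 0.05])

thresholds_demo = [0.3, 0.5, 0.7]
precisions, recalls = [], []
for t in thresholds_demo:
    y_pred_t = (y_prob_demo >= t).astype(int)   # positive when probability >= t
    precisions.append(precision_score(y_true_demo, y_pred_t))
    recalls.append(recall_score(y_true_demo, y_pred_t))
    print(f"threshold={t:.1f}  precision={precisions[-1]:.3f}  recall={recalls[-1]:.3f}")
```

Recall can only stay flat or fall as the threshold rises, since fewer samples are labelled positive. Precision often rises, as it does here, but that is not guaranteed on every dataset. Applying the same loop to `y_test` and `y_prob` would show the trade-off for the fitted model.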

## Conclusion

We built a binary classifier using Logistic Regression, evaluated it with a confusion matrix, precision, recall, F1-score, and ROC-AUC, and showed how lowering the decision threshold from 0.5 to 0.3 changes the predictions.
