# 🧠 Assignment: Classification Using Logistic Regression with Confusion Matrix Evaluation

## 🎯 Objective
Build a robust classification model using Logistic Regression. Apply advanced data preprocessing and evaluate the model with a confusion matrix and other key metrics. Focus on handling imbalanced datasets and extracting meaningful insights from the predictions.

## 📂 Dataset
- Choose a classification dataset with binary or multiclass labels.
- Examples: Breast Cancer, Titanic, Credit Card Fraud, or any relevant CSV dataset.
- Make sure the dataset includes both categorical and numerical features.
- Suggested dataset: [HR Analytics Case Study](https://www.kaggle.com/datasets/vjchoudhary7/hr-analytics-case-study) (Kaggle)

## 🧩 Tasks

### Task 1: Data Loading & Exploration
- Load the dataset using `pandas`
- Display the class distribution, null-value counts, and data types
- Visualize the class imbalance using a bar chart or pie chart
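
A minimal exploration sketch follows. It assumes a small synthetic DataFrame with hypothetical columns (`age`, `department`, `monthly_income`, `attrition`) standing in for `pd.read_csv(...)` on your real file; swap in your own path and target column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df = pd.read_csv("<your file>.csv"); columns are illustrative only
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 60, size=n).astype(float),
    "department": rng.choice(["Sales", "R&D", "HR"], size=n),
    "monthly_income": rng.normal(5000, 1500, size=n),
    "attrition": rng.choice(["No", "Yes"], size=n, p=[0.84, 0.16]),
})
df.loc[::25, "age"] = np.nan  # inject a few missing values for the demo

# Class distribution as counts and as proportions
class_counts = df["attrition"].value_counts()
class_share = df["attrition"].value_counts(normalize=True)

# Missing values per column and the inferred data types
null_counts = df.isna().sum()
dtypes = df.dtypes

print(class_counts, class_share.round(3), null_counts, dtypes, sep="\n\n")
```

With matplotlib installed, `class_counts.plot(kind="bar")` or `class_share.plot(kind="pie", autopct="%1.1f%%")` turns these numbers into the requested charts, and `df.info()` shows dtypes and non-null counts in one call.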


### Task 2: Data Preprocessing
- Handle missing data using suitable imputation techniques
- Encode categorical variables using One-Hot Encoding (nominal features) or Ordinal/Label Encoding (ordered categories or the target)
- Perform feature scaling on numerical features
- Use a correlation heatmap to detect multicollinearity among numerical features
- Apply feature selection to keep only the most important features


### Task 3: Train-Test Split
- Split the dataset into training and testing sets (80-20)
- Use `stratify` parameter to maintain class distribution
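
A short sketch of a stratified 80/20 split, assuming random features and roughly 15% positive labels as a stand-in for the real data. The printed positive rates show that `stratify=y` keeps the class balance nearly identical in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: about 15% positives
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (rng.random(500) < 0.15).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The positive rate should be (almost) identical across the splits
print(f"overall={y.mean():.3f} train={y_train.mean():.3f} test={y_test.mean():.3f}")
```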


### Task 4: Model Building
- Train a `LogisticRegression` model from `sklearn.linear_model`
- Use `class_weight='balanced'` to handle class imbalance
- Display the model coefficients and interpret them (sign and magnitude)
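
The sketch below fits a balanced Logistic Regression and tabulates coefficients alongside odds ratios (`exp(coef)`). It assumes hypothetical features (`overtime_hours`, `job_satisfaction`, `noise`) and a made-up ground-truth relationship, chosen only so the expected signs are known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "overtime_hours": rng.normal(10, 3, n),
    "job_satisfaction": rng.integers(1, 5, n).astype(float),
    "noise": rng.normal(0, 1, n),
})
# Hypothetical ground truth: overtime raises, satisfaction lowers the odds of the positive class
logit = -3.0 + 0.3 * (X["overtime_hours"] - 10) - 0.8 * (X["job_satisfaction"] - 2.5)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so coefficient magnitudes are comparable across features
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_scaled, y)

coefs = pd.DataFrame({
    "feature": X.columns,
    "coef": model.coef_[0],
    # exp(coef): multiplicative change in the odds per +1 standard deviation of the feature
    "odds_ratio": np.exp(model.coef_[0]),
}).sort_values("coef", key=np.abs, ascending=False)
print(coefs)
print("intercept:", model.intercept_[0])
```

A positive coefficient (odds ratio above 1) raises the predicted probability of the positive class; a negative one lowers it. Because the features are standardized, the magnitudes are comparable, and a near-zero coefficient such as the one for `noise` suggests the feature contributes little.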


### Task 5: Model Evaluation
- Generate predictions on the test set
- Evaluate using:
  - Confusion matrix (plot using seaborn)
  - Accuracy, Precision, Recall, F1-score
  - ROC curve and AUC score
- Explain the impact of false positives and false negatives in your use case
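
A minimal evaluation sketch, assuming a synthetic imbalanced problem from `make_classification` (about 10% positives) in place of the real data. It unpacks the confusion matrix into TN/FP/FN/TP and computes the listed metrics, using predicted probabilities for the ROC curve and AUC.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary problem (about 10% positives), standing in for the real data
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class

cm = confusion_matrix(y_test, y_pred)  # rows = true class, columns = predicted class
tn, fp, fn, tp = cm.ravel()

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_proba),
}
fpr, tpr, thresholds = roc_curve(y_test, y_proba)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```

`sns.heatmap(cm, annot=True, fmt="d")` plots the confusion matrix, and `plt.plot(fpr, tpr)` draws the ROC curve. For an attrition use case, for example, FN counts employees who leave without being flagged, while FP counts employees flagged unnecessarily; which error costs more should drive whether you favor recall or precision.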


### Task 6: Advanced Analysis
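- Apply k-fold cross-validation (e.g., `StratifiedKFold`) and compare the mean cross-validated accuracy with the single train-test split accuracy
- Tune hyperparameters (e.g., `C`, `penalty`) using `GridSearchCV`
- Report and compare results before and after tuning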
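
A sketch of cross-validation plus grid search, assuming synthetic data and choosing F1 as the tuning metric (an assumption; accuracy alone can look good on imbalanced data while missing most positives). Scaling sits inside the pipeline so each fold fits it on its own training part, and `solver="liblinear"` is picked because it supports both `l1` and `l2` penalties.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1500, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# The scaler is refit inside every CV fold, avoiding leakage from validation folds
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", solver="liblinear",
                               max_iter=1000, random_state=1)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Baseline with default hyperparameters (C=1.0, l2 penalty)
baseline_acc = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="accuracy")
baseline_f1 = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="f1")

param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1")
search.fit(X_train, y_train)

print(f"baseline CV accuracy: {baseline_acc.mean():.3f} +/- {baseline_acc.std():.3f}")
print(f"baseline CV F1:       {baseline_f1.mean():.3f}")
print("best params:", search.best_params_)
print(f"best CV F1:           {search.best_score_:.3f}")
print(f"test F1 (refit best): {search.score(X_test, y_test):.3f}")
```

Reusing the same `cv` object for the baseline and the grid search keeps the folds identical, so the before/after comparison is like-for-like; the held-out test score is then a final check on the refit model.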


## ✅ Deliverables
- Notebook containing all preprocessing steps, code, and plots
- Markdown cells that clearly explain the insights
- Link to dataset or download script (if applicable)
- Optional: Summary table of all metric scores and cross-validation results
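
For the optional summary table, `cross_validate` can score several metrics in one pass. The sketch assumes synthetic data and compares two hypothetical model variants (default vs. `class_weight="balanced"`), reporting the mean score across folds for each metric.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=7)

models = {
    "default": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "balanced": make_pipeline(StandardScaler(),
                              LogisticRegression(class_weight="balanced", max_iter=1000)),
}
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

records = []
for name, estimator in models.items():
    scores = cross_validate(estimator, X, y, cv=cv, scoring=scoring)
    # cross_validate stores per-fold scores under the key "test_<metric>"
    records.append({"model": name, **{m: scores[f"test_{m}"].mean() for m in scoring}})

# One row per model, one column per metric (mean over the 5 folds)
summary = pd.DataFrame(records, columns=["model", *scoring]).set_index("model").round(3)
print(summary.to_string())
```

`summary.to_markdown()` (requires the optional `tabulate` package) produces a table that can be pasted straight into a markdown cell.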
