# 📧 Spam Classification using Logistic Regression

This project uses **logistic regression** to predict whether an email is spam or not, based on the [Spambase dataset](https://archive.ics.uci.edu/ml/datasets/spambase). It is a complete machine learning pipeline built in Python.

## 🔍 Objective
Build a **binary classification model** to distinguish between spam and non-spam emails using supervised learning techniques.

## 📁 Dataset
- Source: UCI Machine Learning Repository
- Number of samples: 4,601 emails
- Number of features: 57
- Target: 
  - `1` → Spam  
  - `0` → Not Spam
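
A minimal loading sketch, assuming the raw `spambase.data` file from the UCI archive has been downloaded locally. That file has no header row: the 57 feature columns are followed by the label column. The function name `load_spambase` and the generic column names are illustrative and not taken from the notebook.

```python
import pandas as pd

N_FEATURES = 57  # feature count listed above


def load_spambase(path_or_buffer):
    """Read the header-less Spambase CSV into features X and labels y."""
    # Generic column names (assumption); the official names are in spambase.names
    columns = [f"f{i}" for i in range(N_FEATURES)] + ["is_spam"]
    df = pd.read_csv(path_or_buffer, header=None, names=columns)
    X = df.drop(columns="is_spam")
    y = df["is_spam"].astype(int)  # 1 = spam, 0 = not spam
    return X, y
```

Typical use would be `X, y = load_spambase("spambase.data")`, where the path is wherever the file was saved.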

## 🛠️ Technologies & Libraries
- Python (Pandas, NumPy, Matplotlib, Seaborn)
- Scikit-learn
- Jupyter Notebook

## 📊 Process Overview
1. **Data Cleaning & Preprocessing**
   - Normalization
   - Check for missing values
   - Train-test split (70/30)
   
2. **Exploratory Data Analysis (EDA)**
   - Boxplots for outlier detection
   - Correlation matrix and heatmap
   - Class distribution

3. **Model Building**
   - Logistic Regression (baseline)
   - Evaluation using:
     - Accuracy
     - Confusion Matrix
     - Classification Report
     - ROC Curve & AUC
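
The steps above can be sketched roughly as follows. This is not the notebook's code: it uses synthetic data from `make_classification` as a stand-in so it runs without downloading anything, and it assumes `StandardScaler` for the normalization step and a stratified split, neither of which is stated in this README.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as Spambase (4,601 rows, 57 features)
X, y = make_classification(n_samples=4601, n_features=57,
                           n_informative=20, random_state=0)

# 70/30 split; stratify keeps the class ratio equal in both parts (assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# The scaler is fit on the training split only, then reused on the test split
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)

print(f"accuracy={accuracy:.3f}  auc={auc:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=["not spam", "spam"]))
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the scaling step. The numbers printed here come from synthetic data and are not comparable to the results reported below.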

## 🧠 Results
- **Accuracy**: 92.8%
- **Precision**: 93% (for spam)
- **Recall**: 89% (for spam)
- **F1-score**: 91% (for spam)
- **AUC Score**: ≈ 0.97
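
For reference, the spam-class metrics are all derived from the confusion-matrix counts. The sketch below spells out the definitions; the helper name `spam_metrics` is illustrative. The last lines check that the reported F1-score is consistent with the reported precision and recall.

```python
def spam_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (spam) class."""
    precision = tp / (tp + fp)  # of emails flagged as spam, the share that was spam
    recall = tp / (tp + fn)     # of actual spam emails, the share that was caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Consistency check: F1 implied by precision 0.93 and recall 0.89
f1_reported = 2 * 0.93 * 0.89 / (0.93 + 0.89)
print(round(f1_reported, 2))  # 0.91
```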

## 📈 Visualization
- Heatmap of correlations
- ROC Curve (Receiver Operating Characteristic)
- Boxplots to analyze feature distributions
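
As an example of one of these plots, here is a minimal ROC curve sketch with Matplotlib and scikit-learn. The labels and scores are toy values for illustration only; in the notebook they would presumably be the test labels and the predicted spam probabilities.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy labels and scores (illustrative, not project data)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# False/true positive rates at each score threshold, plus the area under the curve
fpr, tpr, _ = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(fpr, tpr, label=f"Logistic regression (AUC = {auc:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")  # diagonal baseline
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
ax.legend(loc="lower right")
```

In a notebook the figure renders inline at the end of the cell; in a plain script, `fig.savefig(...)` would be needed to write it to disk.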

## ✅ Conclusion
The logistic regression baseline performs well on this problem, reaching 92.8% accuracy and an AUC of about 0.97. With 89% recall on the spam class, roughly one in ten spam emails is still missed. This project serves as a good introduction to binary classification in machine learning.

---

Feel free to explore the notebook to understand each step!

