# **Portuguese Bank Marketing** 🏦

## 📊 Dataset Overview

This dataset records **direct marketing campaigns** (phone calls) run by a Portuguese banking institution. The campaigns promoted **term deposits**, a financial product that clients subscribe to and that boosts bank revenue.

The target variable is `y`, which indicates whether the client **subscribed to a term deposit**:
- `yes`: client subscribed
- `no`: client did not subscribe

---

## 📁 Dataset Shape
- **Rows**: 32,950  
- **Columns**: 16
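
A quick way to confirm you loaded the right file is to compare its shape with the numbers above. The sketch below is only an illustration: `load_and_check` is a hypothetical helper, the comma separator is an assumption (some copies of this data use `;`), and the in-memory CSV is made up so the example runs without the real file.

```python
import io

import pandas as pd

EXPECTED_SHAPE = (32950, 16)  # (rows, columns) as listed above


def load_and_check(source, expected_shape=EXPECTED_SHAPE, sep=","):
    """Read a CSV and report whether its shape matches the expected one."""
    df = pd.read_csv(source, sep=sep)
    return df, df.shape == expected_shape


# Made-up stand-in for the real file: two rows, three columns.
sample_csv = io.StringIO("age,job,y\n34,admin.,no\n51,retired,yes\n")
df, shape_ok = load_and_check(sample_csv)
print(df.shape, shape_ok)
```

With the real data, pass the path to your CSV instead of `sample_csv`; `shape_ok` should then be `True` if the file matches the description.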

---

## 🔍 Feature Description

| Column        | Description |
|---------------|-------------|
| `age`         | Age of the client (numeric) |
| `job`         | Type of job (admin, technician, retired, etc.) |
| `marital`     | Marital status (married, single, divorced) |
| `education`   | Education level (basic, high school, university) |
| `default`     | Has credit in default? (yes, no, unknown) |
| `housing`     | Has a housing loan? (yes or no) |
| `loan`        | Has a personal loan? (yes or no) |
| `contact`     | Communication type (cellular or telephone) |
| `month`       | Last contact month (e.g., may, jun) |
| `day_of_week` | Last contact day of the week (e.g., mon, tue) |
| `duration`    | Duration of the last contact in seconds. ⚠️ *Should not be used in training*: it is only known once the call has ended, so it leaks the outcome |
| `campaign`    | Number of contacts performed during this campaign |
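| `pdays`       | Days since the client was last contacted in a previous campaign; a sentinel value marks clients never contacted before (listed here as -1, but some versions of this data use 999, so check `pdays` before relying on it) |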
| `previous`    | Number of contacts before this campaign |
| `poutcome`    | Outcome of the previous marketing campaign |
| `y`           | **Target**: Has the client subscribed to a term deposit? (yes/no) |
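
Before any modelling, it can help to check that your DataFrame really has the columns in the table and to see how many `unknown` entries each text column holds. This is a minimal sketch, not a required step: `check_schema` and `count_unknowns` are hypothetical helper names, and the tiny frame is invented for illustration.

```python
import pandas as pd

# Column names copied from the feature table.
EXPECTED_COLUMNS = [
    "age", "job", "marital", "education", "default", "housing", "loan",
    "contact", "month", "day_of_week", "duration", "campaign", "pdays",
    "previous", "poutcome", "y",
]


def check_schema(df: pd.DataFrame) -> dict:
    """Report columns that are missing from, or unexpected in, df."""
    actual = set(df.columns)
    expected = set(EXPECTED_COLUMNS)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


def count_unknowns(df: pd.DataFrame) -> pd.Series:
    """Count the literal string 'unknown' in each non-numeric column."""
    text_cols = df.select_dtypes(exclude="number").columns
    return (df[text_cols] == "unknown").sum()


# Made-up frame standing in for the real data (only 4 of the 16 columns).
toy = pd.DataFrame({
    "age": [34, 51, 29],
    "job": ["admin.", "unknown", "technician"],
    "default": ["no", "unknown", "unknown"],
    "y": ["no", "yes", "no"],
})
report = check_schema(toy)
unknowns = count_unknowns(toy)
print(report["missing"])
print(unknowns)
```

On the full dataset, `report["missing"]` should come back empty, and `unknowns` shows which columns need a decision about how to treat `unknown`.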

---

## 💡 Notes for Students

- The dataset has **categorical features** that need **encoding**
- Some features may contain **unknown or missing values**
- The target variable is **imbalanced** (~11% said "yes")
- `duration` is highly predictive but should be excluded during training to avoid data leakage
- Great dataset to practice:
  - Exploratory Data Analysis (EDA)
  - Data cleaning & encoding
  - Imbalanced classification techniques
  - Logistic Regression, Decision Trees, SVM, KNN
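
The notes above (encode categoricals, drop `duration`, turn `y` into numbers) can be sketched with scikit-learn as follows. This is one possible approach under stated assumptions, not the required solution: the helper names are hypothetical, one-hot encoding and standard scaling are just common defaults, and the toy rows are invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def split_features_target(df: pd.DataFrame):
    """Drop the target and the leaky `duration` column; map y to 1/0."""
    X = df.drop(columns=["y", "duration"])
    y = (df["y"] == "yes").astype(int)
    return X, y


def make_preprocessor(X: pd.DataFrame) -> ColumnTransformer:
    """One-hot encode non-numeric columns and scale numeric ones."""
    categorical = X.select_dtypes(exclude="number").columns.tolist()
    numeric = X.select_dtypes(include="number").columns.tolist()
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ])


# Made-up rows using a few of the real column names.
toy = pd.DataFrame({
    "age": [34, 51, 29, 45],
    "job": ["admin.", "retired", "technician", "admin."],
    "contact": ["cellular", "telephone", "cellular", "cellular"],
    "duration": [120, 600, 45, 300],
    "campaign": [1, 3, 2, 1],
    "y": ["no", "yes", "no", "no"],
})
X, y = split_features_target(toy)
encoded = make_preprocessor(X).fit_transform(X)
print(encoded.shape)
```

Here `unknown` is simply kept as its own category by the one-hot encoder; treating it as a missing value and imputing it is an equally valid choice.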


# **🏆 💥 ML Battle: Predict Term Deposit Subscriptions! 💥**

## 🧠 Machine Learning Challenge: Bank Term Deposit Prediction

You're hired as a data scientist at a Portuguese bank that's losing revenue because clients aren't depositing money as often as before.

Your mission? 🕵️‍♀️  
Build the **most accurate model** to **predict which clients will subscribe** to a term deposit — and help the bank target the right people in its next campaign.

---

### 🏆 Competition Rules

- Your goal is to maximize **accuracy on the test set**.
- Do **not** use the `duration` feature during training (you’ll be disqualified for leakage ⚠️).
- Split your data into **train/test sets**; **cross-validation** is optional.
- Use any classification model you want (Logistic Regression, Decision Tree, Random Forest, etc.).
- Preprocess the data (encoding, scaling, handling imbalance) as you see fit.
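
A workflow that follows these rules might look like the sketch below: a stratified train/test split, a pipeline that never sees `duration`, and optional cross-validation on the training part. Everything here is illustrative: the data is randomly generated, and logistic regression with `class_weight="balanced"` is only one of several ways to handle the imbalance.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Randomly generated stand-in for the real data (about 11% "yes").
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "contact": rng.choice(["cellular", "telephone"], n),
    "campaign": rng.integers(1, 10, n),
    "duration": rng.integers(0, 1000, n),
    "y": rng.choice(["yes", "no"], n, p=[0.11, 0.89]),
})

X = toy.drop(columns=["y", "duration"])  # duration is excluded by the rules
y = (toy["y"] == "yes").astype(int)

# stratify=y keeps the yes/no ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["contact"]),
        ("num", StandardScaler(), ["age", "campaign"]),
    ])),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Optional: 5-fold cross-validation on the training set only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(cv_scores.mean(), test_accuracy)
```

Putting the preprocessing inside the `Pipeline` means the encoder and scaler are fitted on training folds only, which avoids leaking test information into training.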

---

### ⚙️ Requirements

- ✅ Clean the data
- ✅ Encode categorical variables
- ✅ Handle imbalance in target variable `y`
- ✅ Evaluate your model with accuracy, confusion matrix, and classification report
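
The three evaluation tools named above are all available in `sklearn.metrics`. The sketch below uses hand-written labels (not real results) and adds a "predict `no` for everyone" baseline, which is an extra assumption on my part; it is useful because on imbalanced data such a baseline can already score high on accuracy.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 1 = "yes", 0 = "no". Made-up labels: 2 of 10 clients subscribed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
baseline_pred = [0] * len(y_true)  # always predict "no"

acc = accuracy_score(y_true, y_pred)
baseline_acc = accuracy_score(y_true, baseline_pred)

# Rows are true classes, columns are predicted classes, in the order [no, yes].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

report = classification_report(
    y_true, y_pred, labels=[0, 1], target_names=["no", "yes"], zero_division=0
)

print("accuracy:", acc, "baseline:", baseline_acc)
print(cm)
print(report)
```

In this toy case both the model and the baseline reach the same accuracy, but only the model finds any subscriber. That difference shows up in the confusion matrix and in the recall for `yes`, which is why the requirements ask for more than a single accuracy number.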

---

### 🎯 Scoring

- Submit your accuracy score.
- We’ll rank the results **live**.
- Top 3 students will get bragging rights... and maybe chocolate 🍫😉

---

### 📛 Disqualification Alert
- Using `duration` for training = **disqualified**
- Copying code = ❌
