# Loan Approval Prediction - Binary Classification

**Notebook:** w01_d01_EDA_baseline_models.ipynb  
**Author:** Alberto Diaz Durana  
**Date:** November 2025  
**Purpose:** Build baseline classification models that predict loan approval status, as preparation for a technical interview

---

## Objectives

- Perform exploratory data analysis on a loan application dataset (~600 records)
- Preprocess data: handle missing values, encode categorical features, engineer income ratios
- Train and evaluate three baseline models: Logistic Regression, Decision Tree, Random Forest
- Compare model performance using accuracy, precision, recall, and F1-score
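
The modeling and comparison objectives above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's actual pipeline: the data is a synthetic stand-in generated with `make_classification` (sized to roughly match the ~600 records), and the 80/20 stratified split, `max_iter=1000`, and `random_state=42` are illustrative choices rather than settings taken from the source.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-in for the loan data (NOT the real dataset): 600 rows, binary target
X, y = make_classification(n_samples=600, n_features=10, random_state=42)

# Hold out 20% for evaluation; stratify to preserve the class balance in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The three baseline models named in the objectives (hyperparameters are assumptions)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and collect the four comparison metrics on the held-out split
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Keeping the scores in a single dictionary makes it straightforward to turn them into a `pandas.DataFrame` for a side-by-side comparison table later.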

## Business Context

This analysis prepares a working baseline for a 50-minute technical interview, demonstrating an end-to-end data science workflow from data quality assessment through model evaluation and selection.

---

In [1]:
## 1. Setup & Environment Configuration

# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Import ML libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_rows', 100)

# Plotting settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Suppress all warnings to keep notebook output clean
# (note: this also hides convergence and deprecation warnings)
import warnings
warnings.filterwarnings('ignore')

# Define paths (assumes the working directory is one level below the project root)
PROJECT_ROOT = Path.cwd().parent
DATA_RAW = PROJECT_ROOT / 'data' / 'raw'
DATA_PROCESSED = PROJECT_ROOT / 'data' / 'processed'
OUTPUTS = PROJECT_ROOT / 'outputs' / 'figures' / 'eda'

# Create output directory if needed
OUTPUTS.mkdir(parents=True, exist_ok=True)

print("Environment setup complete")
print(f"Project root: {PROJECT_ROOT}")
print(f"Data directory: {DATA_RAW}")
print(f"Output directory: {OUTPUTS}")

Environment setup complete
Project root: d:\data-science
Data directory: d:\data-science\data\raw
Output directory: d:\data-science\outputs\figures\eda
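
Because `Path.cwd().parent` depends on where the notebook is launched from, the paths above only resolve correctly when the working directory sits one level below the project root. One possible alternative, sketched here under assumptions, is to walk upward until a marker directory is found. The helper `find_project_root` and the choice of `data` as the marker are hypothetical and not part of the original notebook.

```python
from pathlib import Path
import tempfile


def find_project_root(start: Path, marker: str = "data") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper: the marker name is an assumption for illustration.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


# Demonstration in a throwaway directory tree that mimics the project layout
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "data" / "raw").mkdir(parents=True)
    notebooks = root / "notebooks"
    notebooks.mkdir()

    # Starting from the notebooks folder, the search should land on the root
    found = find_project_root(notebooks)
    print(found == root)
```

In the notebook itself this would be called as `find_project_root(Path.cwd())`, which gives the same result whether the kernel starts in the project root or in a subfolder.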
