# Data Exploration - Credit Risk Prediction System

**Notebook Purpose**: Initial exploration of the Home Credit dataset  
**Author**: Capstone Project Team  
**Date**: February 2, 2026

## Objectives
1. Load and inspect the base training data
2. Understand data structure and types
3. Analyze target variable distribution
4. Identify missing values
5. Explore key features
6. Generate initial insights
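
Objectives 3 and 4 come down to two small pandas summaries. The sketch below runs on a toy frame, not the real data. It assumes a binary label column named `target`, which is a hypothetical name here, and the feature names `income` and `age` are purely illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the base table; column names and values are illustrative only.
toy_df = pd.DataFrame({
    "target": [0, 0, 0, 1, 0, 1, 0, 0],
    "income": [50.0, np.nan, 72.5, 31.0, np.nan, 44.0, 90.0, 65.0],
    "age": [34, 45, np.nan, 29, 51, 38, 42, 60],
})


def target_distribution(df: pd.DataFrame, target_col: str = "target") -> pd.DataFrame:
    """Return the count and share of each class in the target column."""
    counts = df[target_col].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and ratios, most-missing first."""
    n_missing = df.isna().sum()
    summary = pd.DataFrame({"n_missing": n_missing, "ratio": n_missing / len(df)})
    return summary.sort_values("ratio", ascending=False)


dist = target_distribution(toy_df)
miss = missing_summary(toy_df)
print(dist)
print(miss)
```

The same two helpers should apply unchanged to the loaded base table, provided the label column name is adjusted to match the actual schema.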

## 1. Import Required Libraries

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Visualization settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("Set2")
%matplotlib inline

print("Libraries imported successfully!")
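
The style name `seaborn-v0_8-darkgrid` only exists in newer matplotlib releases; older ones ship the same style as `seaborn-darkgrid`. If the notebook may run on different matplotlib versions, a guarded lookup avoids a hard failure. The helper `pick_style` is a hypothetical name introduced for this sketch, not part of matplotlib.

```python
import matplotlib.pyplot as plt


def pick_style(candidates, fallback="default"):
    """Return the first style name this matplotlib install knows about."""
    for name in candidates:
        if name in plt.style.available:
            return name
    # "default" is always accepted by plt.style.use, even though it is
    # not listed in plt.style.available.
    return fallback


# Try the newer name first, then the older one, then matplotlib's default.
chosen_style = pick_style(["seaborn-v0_8-darkgrid", "seaborn-darkgrid"])
plt.style.use(chosen_style)
print(f"Using matplotlib style: {chosen_style}")
```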

## 2. Load Configuration and Data

In [None]:
# Add parent directory to path so project modules (config, src) can be imported
import sys
sys.path.append('..')

# Import project configuration
from config import *

print(f"Root Directory: {ROOT_DIR}")
print(f"Data Directory: {DATA_DIR}")
print(f"Parquet Directory: {PARQUET_DIR}")

In [None]:
# Load base training data
from src.data import DataLoader

loader = DataLoader(data_type='train', file_format='parquet')
base_df = loader.load_base_table()

print("\n✓ Base training data loaded successfully!")
print(f"Shape: {base_df.shape}")
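
The configuration cell relies on `sys.path.append('..')`, which resolves relative to whatever the working directory is when a module is imported. A slightly more defensive variant resolves the parent directory to an absolute path once and avoids adding it twice. This is a sketch under the assumption that the notebook runs from a folder directly beneath the project root; `add_to_sys_path` is a hypothetical helper, not part of the project's `config` module.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: Path) -> Path:
    """Resolve `directory` to an absolute path and put it on sys.path once."""
    resolved = directory.resolve()
    if str(resolved) not in sys.path:
        sys.path.insert(0, str(resolved))
    return resolved


# Assumption: the current working directory sits directly under the project root.
project_root = add_to_sys_path(Path.cwd().parent)
print(f"Project root on sys.path: {project_root}")
```

Note that `insert(0, ...)` gives the project root priority over installed packages, whereas `append` puts it last; which one is preferable depends on whether project module names could collide with installed ones.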