# Feature Engineering

- **Purpose:** Missing value handling and feature engineering for fraud detection  
- **Author:** Devbrew LLC  
- **Last Updated:** October 18, 2025  
- **Status:** In Progress  
- **License:** Apache 2.0 (Code) | Non-commercial (Data)

---

## Dataset License Notice

This notebook uses the **IEEE-CIS Fraud Detection dataset** from Kaggle.

**Dataset License:** Non-commercial research use only
- You must download the dataset yourself from [Kaggle IEEE-CIS Competition](https://www.kaggle.com/c/ieee-fraud-detection)
- You must accept the competition rules before downloading
- You may not use the dataset for commercial purposes
- You may not redistribute the raw dataset

**Setup Instructions:** See [`../data_catalog/README.md`](../data_catalog/README.md) for download instructions.

**Code License:** This notebook's code is licensed under Apache 2.0 (open source).

---

## Notebook Configuration

### Environment Setup

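We import the required libraries, set pandas display options (up to 100 rows and 100 columns, floats shown to two decimals), apply a seaborn plot style, and fix the random seed at `RANDOM_STATE = 42` so that results are consistent across runs. Note that `np.random.seed` seeds only NumPy's legacy global random generator. For any call that accepts its own `random_state` argument, passing `RANDOM_STATE` explicitly is the more robust way to stay reproducible.

All later feature engineering cells rely on these settings.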

In [2]:
import warnings
from pathlib import Path
import json
from typing import Optional, Tuple, List

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Configuration
# Note: this silences every warning, including deprecation notices from pandas and numpy.
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 100)
pd.set_option("display.float_format", '{:.2f}'.format)

# Plotting configuration
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 10

# Reproducibility
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

print("\nEnvironment configured successfully")
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")


Environment configured successfully
pandas: 2.3.3
numpy: 2.3.3
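
To illustrate how the fixed seed supports reproducibility, here is a minimal sketch. It is not part of the original notebook. The `toy` frame and its `amount` column are hypothetical stand-ins for the real transaction data, and the use of `np.random.default_rng` (a local generator instead of the global `np.random.seed`) is an assumption about how one might carry the seed through later cells.

```python
import numpy as np
import pandas as pd

RANDOM_STATE = 42  # same seed value as in the setup cell

# Hypothetical toy frame standing in for the real transaction data.
toy = pd.DataFrame({"amount": np.arange(10, dtype=float)})

# Two local generators built from the same seed produce identical draws,
# without depending on global state set by np.random.seed.
rng_a = np.random.default_rng(RANDOM_STATE)
rng_b = np.random.default_rng(RANDOM_STATE)
draw_a = rng_a.normal(size=5)
draw_b = rng_b.normal(size=5)

# pandas sampling accepts the seed directly through random_state.
sample_a = toy.sample(n=3, random_state=RANDOM_STATE)
sample_b = toy.sample(n=3, random_state=RANDOM_STATE)

print(np.array_equal(draw_a, draw_b))  # True
print(sample_a.equals(sample_b))       # True
```

Passing the seed explicitly keeps each step repeatable even when cells are re-run out of order, which a single global seed does not guarantee.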
