# Data Inspection Documentation
## Credit Card Default Analysis

## What to Look For When Reviewing Data

### 1. Data Structure and Quality Checks
- **Number of Records**: Total number of credit card clients in the dataset
- **Number of Features**: Count and relevance of available attributes
- **Missing Values**: Identify any gaps in the data that need addressing
- **Data Types**: Ensure proper types (numeric for amounts, categorical for status)
- **Duplicates**: Check for duplicate entries that could skew analysis
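
The checks above can be gathered into one helper. This is a minimal sketch: `summarize_structure` and the toy DataFrame are hypothetical names and data made up for illustration, not part of the real dataset.

```python
import pandas as pd


def summarize_structure(df: pd.DataFrame) -> dict:
    """Collect the basic structure and quality checks in one dictionary."""
    return {
        "n_records": len(df),                                   # number of rows
        "n_features": df.shape[1],                              # number of columns
        "missing_per_column": df.isnull().sum().to_dict(),      # nulls per column
        "dtypes": df.dtypes.astype(str).to_dict(),              # dtype name per column
        "n_duplicate_rows": int(df.duplicated().sum()),         # fully identical rows
    }


# Toy data for illustration only; column names are assumptions.
toy = pd.DataFrame({
    "limit": [20000.0, 50000.0, 50000.0, None],
    "status": ["a", "b", "b", "a"],
})
summary = summarize_structure(toy)
print(summary)
```

Note that `DataFrame.duplicated()` only flags rows that match on every column; if the data has a client identifier, checking duplicates on that column alone may be more informative.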

### 2. Feature-Specific Review

#### Demographic Variables
- **Age**: Range and distribution (values should be plausible for credit card holders, e.g., no negative or implausibly high ages)
- **Education**: Categories represented and their encoding
- **Marriage**: Categories and their representation
- **Gender**: Binary encoding and distribution
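
One possible way to review the demographic variables is sketched below. The column names (`age`, `education`, `marriage`, `gender`) and the age bounds of 18 and 100 are assumptions chosen for illustration; adjust them to match the actual dataset and its data dictionary.

```python
import pandas as pd


def review_demographics(df, age_col="age",
                        cat_cols=("education", "marriage", "gender"),
                        min_age=18, max_age=100):
    """Summarize the age range and the category counts of demographic columns."""
    age = df[age_col]
    report = {
        "age_min": age.min(),
        "age_max": age.max(),
        # between() is False for NaN, so missing ages are counted here too
        "n_age_out_of_range": int((~age.between(min_age, max_age)).sum()),
    }
    for col in cat_cols:
        # dropna=False keeps missing values visible as their own category
        report[f"{col}_counts"] = df[col].value_counts(dropna=False).to_dict()
    return report


# Toy data for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "age": [25, 40, -3, 61],
    "education": [1, 2, 2, 3],
    "marriage": [1, 2, 1, 2],
    "gender": [1, 2, 2, 1],
})
report = review_demographics(toy)
print(report)
```

Comparing the category counts against the documented encoding is a quick way to spot codes that the data dictionary does not describe.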

#### Financial Variables
- **Credit Limit**: Range and reasonableness of values
- **Bill Amounts**: Scale and currency consistency
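- **Payment Amounts**:
  - Usually expected to be less than or equal to the corresponding bill amount; larger payments (e.g., overpayments) should be reviewed rather than assumed valid
  - Check for negative values
  - Check that the currency is consistent with the bill amounts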
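
The payment and bill checks can be expressed as row-level boolean flags. This is a sketch under assumptions: `bill_amt` and `pay_amt` are hypothetical column names for a single billing period, and the toy values are invented for illustration.

```python
import pandas as pd


def check_payments(df, bill_col="bill_amt", pay_col="pay_amt"):
    """Return one boolean flag column per financial check, one row per client."""
    return pd.DataFrame({
        "negative_bill": df[bill_col] < 0,
        "negative_payment": df[pay_col] < 0,
        "payment_exceeds_bill": df[pay_col] > df[bill_col],
    })


# Toy data for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "bill_amt": [1000, 500, -200, 0],
    "pay_amt": [1000, 700, 0, -50],
})
flags = check_payments(toy)
print(flags.sum())  # number of flagged rows per check
```

A flagged row is a prompt for investigation, not proof of an error: a negative bill, for instance, may simply reflect a credit balance after an overpayment.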

#### Target Variable (Default)
- **Encoding**: Typically binary (0/1)
- **Class Balance**: Distribution of default vs non-default
- **Data Quality**: Confirm there are no missing values in this critical field
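
A short sketch of the target checks follows. The column name `default` and the expected 0/1 encoding are assumptions; the toy data is invented for illustration.

```python
import pandas as pd


def review_target(df, target_col="default", expected_values=(0, 1)):
    """Check missing values, unexpected codes, and class balance of the target."""
    target = df[target_col]
    observed = set(target.dropna().unique())
    return {
        "n_missing": int(target.isnull().sum()),
        "unexpected_values": sorted(observed - set(expected_values)),
        # normalize=True returns shares instead of raw counts
        "class_share": target.value_counts(normalize=True).to_dict(),
    }


# Toy data for illustration only; not taken from the real dataset.
toy = pd.DataFrame({"default": [0, 0, 0, 1]})
target_report = review_target(toy)
print(target_report)
```

A strongly skewed `class_share` is worth noting early, since it affects how later models should be evaluated.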

### 3. Initial Red Flags to Watch For
- Unrealistic values (e.g., negative ages)
- Inconsistent currency amounts
- Suspicious combinations (e.g., payments exceeding bills)
- Outliers that might need investigation
- Encoding inconsistencies
- Unexpected null values
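
For the outlier item, one common screening heuristic is the interquartile range (IQR) rule. The sketch below is an illustrative choice, not a method prescribed by this document; the multiplier `k=1.5` is the conventional default and the toy values are invented.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values more than k * IQR below the first or above the third quartile."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy data for illustration only; not taken from the real dataset.
toy = pd.Series([10, 12, 11, 13, 12, 11, 500])
mask = iqr_outliers(toy)
print(toy[mask])  # values flagged for investigation
```

Financial amounts are often heavily right-skewed, so this rule may flag many legitimate high values; treat the result as a list to inspect rather than rows to drop.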

## Initial Data Findings

[This section will be populated as we analyze the data]

In [None]:
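# Code to display basic data info
# Assumes `df` is a pandas DataFrame that was loaded in an earlier cell
import pandas as pd
import numpy as np

# Display basic information about the dataset
print("Dataset Overview:")
print("-----------------")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("\nMissing Values Summary:")
print(df.isnull().sum())
print("\nBasic Statistics:")
print(df.describe())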