# DD and PD Calculation Using Accounting Data (Book Values)

This notebook walks through the `dd_pd_accounting.py` script step by step.  
We will:

1.  Install dependencies  
2.  Set up imports and file paths  
3.  Load and inspect the accounting data  
4.  Standardize column names, extract years  
5.  Load and merge market cap and equity volatility  
6.  Define the Accounting-Based Merton Solver
7.  Run the Accounting-Based Solver and Compute DDa/PDa
8.  Compute Distance to Default (DDa) and Probability of Default (PDa)  
9.  Export results and append diagnostics to log  
10. Summarize next steps

## 1.  Install dependencies 

In [6]:
# 1. Install needed packages (run once per environment)
%pip install pandas numpy scipy


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## 2. Imports and File Paths

Import libraries and define all paths relative to the project root.

In [7]:
import pandas as pd
import numpy as np
from scipy.stats import norm
from scipy.optimize import root
from pathlib import Path
import logging
import re

# Helper function for standardizing tickers
def standardize_ticker(t):
    return str(t).split('.', 1)[0] if pd.notnull(t) else t

# 1. Dynamically locate your workspace root
base_dir = Path().resolve()
print("Workspace root:", base_dir)

# 2. Time horizon for the Merton model (1 year)
T = 1.0

# 3. Define all file paths relative to base_dir
model_fp      = base_dir / 'data' / 'clean' / 'Book2_clean.csv'
marketcap_fp  = base_dir / 'data' / 'clean' / 'all_banks_marketcap_annual_2016_2023.csv'
vol_fp        = base_dir / 'data' / 'clean' / 'equity_volatility_by_year.csv'
rf_fp         = base_dir / 'data' / 'clean' / 'fama_french_factors_annual_clean.csv'
log_fp        = base_dir / 'data' / 'logs' / 'dd_pd_accounting_log.txt'
output_fp     = base_dir / 'data' / 'merged_inputs' / 'dd_pd_accounting.csv'

# 4. Ensure directories exist
for directory in (model_fp.parent, marketcap_fp.parent, vol_fp.parent, rf_fp.parent, log_fp.parent, output_fp.parent):
    directory.mkdir(parents=True, exist_ok=True)

# 5. Quick existence check for inputs and outputs
for name, fp in [
    ('Accounting input',    model_fp),
    ('Market cap input',    marketcap_fp),
    ('Equity vol input',    vol_fp),
    ('Risk-free input',     rf_fp),
    ('Account log file',    log_fp)
]:
    status = "FOUND" if fp.exists() else f"MISSING ({fp.name})"
    print(f"{name:20s} → {status}")

Workspace root: /Users/guillaumebld/Library/CloudStorage/OneDrive-LoyolaUniversityChicago/Group Quantitative Research Workspace – Summer 2025 - Results
Accounting input     → FOUND
Market cap input     → FOUND
Equity vol input     → FOUND
Risk-free input      → FOUND
Account log file     → FOUND


## 3. Load and Inspect Accounting Data

- Read `Book2_clean.csv` which must contain `total_assets` and `debt_total`.  
- Display row and unique `(instrument, year)` counts.

In [8]:
print('[INFO] Loading accounting data...')
df = pd.read_csv(model_fp)
print(f"→ {df.shape[0]} rows, {df[['instrument','year']].drop_duplicates().shape[0]} unique (instrument, year)")

[INFO] Loading accounting data...
→ 1425 rows, 1424 unique (instrument, year)


## 4. Standardize Column Names & Extract Year

- Lowercase and replace spaces/dashes with underscores  
- If `date` exists, parse it and extract year; otherwise ensure `year` is integer.

In [9]:
# Standardize column names
col_map = {c: re.sub(r'_+', '_', c.strip().lower()
                        .replace(' ', '_')
                        .replace('-', '_'))
           for c in df.columns}
df = df.rename(columns=col_map)

# Extract or clean 'year'
if 'date' in df.columns:
    df['date'] = pd.to_datetime(df['date'], errors='coerce')
    df['year'] = df['date'].dt.year
else:
    df = df[df['year'].notnull()].copy()
    df['year'] = df['year'].astype(float).astype(int)

df['Year'] = df['year']  # for merges down the line
df.head()

Unnamed: 0,instrument,year,rit_rf,rit,new_wacc,unnamed:_5,"weighted_average_cost_of_capital,_(%)",beta_levered,beta_unlevered,environmental_pillar_score,...,"wacc_tax_rate,_(%)","wacc_cost_of_debt,_(%)","wacc_debt_weight,_(%)","wacc_equity_weight,_(%)",total_assets,debt_total,d/e,dummylarge,dummymid,Year
0,JPM,2016,0.245917,0.247917,1.584416,,4.864093,1.536081,0.606612,81.766775,...,29.05903,2.126534,66.065751,30.587868,2490972.0,495354.0,2.159868,1.0,0.0,2016
1,JPM,2017,0.223383,0.231383,1.657157,,4.749802,1.210898,0.579144,83.07977,...,28.38487,2.189746,58.567477,38.450277,2533600.0,494798.0,1.5232,1.0,0.0,2017
2,JPM,2018,-0.22784,-0.20974,1.86956,,5.749097,1.141127,0.577758,79.42626,...,28.38487,2.730852,55.921812,41.071387,2622532.0,533627.0,1.361576,1.0,0.0,2018
3,JPM,2019,0.161746,0.183146,1.548871,,4.696227,1.223974,0.561719,89.750677,...,26.62674,2.017936,59.874307,37.262556,2687379.0,516093.0,1.606822,1.0,0.0,2019
4,JPM,2020,-0.128976,-0.124576,0.989988,,3.732273,1.217916,0.487232,90.723262,...,21.07742,1.306948,63.283852,33.304242,3384757.0,542102.0,1.900174,1.0,0.0,2020


## 5. Load and Merge Market Cap & Equity Volatility

- Load annual market cap, standardize tickers, merge on `(symbol, Year)`  
- Load equity volatility, standardize, merge on `(ticker_prefix, Year)`

In [10]:
# --- Step 5: Prepare merge keys in df ---
# 5.0 Ensure df has ticker_prefix & Year
if 'ticker_prefix' not in df.columns:
    df['ticker_prefix'] = df['instrument'].apply(standardize_ticker)
if 'Year' not in df.columns:
    # prefer existing year column
    if 'year' in df.columns:
        df['Year'] = df['year']
    else:
        df['Year'] = pd.to_datetime(df['date'], errors='coerce').dt.year

# Quick debug
print("After prep, df columns:", df.columns.tolist())
print("Unique df keys:", df[['ticker_prefix','Year']].drop_duplicates().head())

# --- 5.1 Load and merge market capitalization ---
mc = pd.read_csv(marketcap_fp)

# Compute market_cap if needed
if 'market_cap' not in mc.columns:
    mc['market_cap'] = mc['dec_price'] * mc['shares_outstanding'] / 1_000_000

# Ensure mc has the merge keys
if 'ticker_prefix' not in mc.columns:
    mc['ticker_prefix'] = mc['symbol'].apply(standardize_ticker)
if 'Year' not in mc.columns:
    mc['Year'] = mc['year']

# Debug
print("mc columns:", mc.columns.tolist())
print("Sample mc keys:", mc[['ticker_prefix','Year','market_cap']].head())

# Merge
df = df.merge(
    mc[['ticker_prefix','Year','market_cap']],
    on=['ticker_prefix','Year'],
    how='left'
)

# --- 5.2 Load and merge equity volatility ---
ev = pd.read_csv(vol_fp)

# Determine ticker column in ev
ticker_col = 'symbol' if 'symbol' in ev.columns else 'Bank'
if 'ticker_prefix' not in ev.columns:
    ev['ticker_prefix'] = ev[ticker_col].apply(standardize_ticker)
if 'Year' not in ev.columns:
    ev['Year'] = ev['year'] if 'year' in ev.columns else ev['Year']

# Debug
print("ev columns:", ev.columns.tolist())
print("Sample ev keys:", ev[['ticker_prefix','Year','equity_volatility']].head())

# Merge
df = df.merge(
    ev[['ticker_prefix','Year','equity_volatility']],
    on=['ticker_prefix','Year'],
    how='left'
)

# Final back-compatibility
df['equity_vol'] = df['equity_volatility'].fillna(0.25)

# Confirm merge
print("After merges, sample df:")
print(df[['ticker_prefix','Year','market_cap','equity_vol']].head())

After prep, df columns: ['instrument', 'year', 'rit_rf', 'rit', 'new_wacc', 'unnamed:_5', 'weighted_average_cost_of_capital,_(%)', 'beta_levered', 'beta_unlevered', 'environmental_pillar_score', 'social_pillar_score', 'governance_pillar_score', 'esg_score', 'esg_combined_score', 'environmental_pillar_score_1', 'social_pillar_score_1', 'governance_pillar_score_1', 'esg_score_1', 'esg_combined_score_1', 'lnta', 'td/ta', 'price_to_book_value_per_share', 'capital_adequacy_total_(%)', 'wacc_tax_rate,_(%)', 'wacc_cost_of_debt,_(%)', 'wacc_debt_weight,_(%)', 'wacc_equity_weight,_(%)', 'total_assets', 'debt_total', 'd/e', 'dummylarge', 'dummymid', 'Year', 'ticker_prefix']
Unique df keys:   ticker_prefix  Year
0           JPM  2016
1           JPM  2017
2           JPM  2018
3           JPM  2019
4           JPM  2020
mc columns: ['symbol', 'year', 'market_cap', 'ticker_prefix', 'Year']
Sample mc keys:   ticker_prefix  Year    market_cap
0          ABCB  2016  3.004515e+09
1          ABCB  2017

## 6. Define the Accounting-Based Merton Solver

We treat **net equity** $E = A - F$ (assets minus debt) as a call option on the firm's assets. Given:

- $A$: total assets (book value)  
- $F$: total debt (book value)  
- $\sigma_E$: observed equity volatility  
- $r_f$: risk-free rate  
- $T$: time horizon (1 year)  

we solve for:

- $V$: total asset value  
- $\sigma_V$: asset volatility  

by enforcing the two Merton equations:

1. **Option-pricing relation**  
   $$
   V\Phi(d_1) - Fe^{-r_fT}\Phi(d_2) - (A - F) = 0
   $$

2. **Volatility link**  
   $$
   \sigma_E - \frac{V}{(A - F)}\Phi(d_1)\sigma_V = 0
   $$

with  
$$
d_1 = \frac{\ln(V/F) + (r_f + \frac{1}{2}\sigma_V^2)T}{\sigma_V\sqrt{T}}, 
\quad
d_2 = d_1 - \sigma_V\sqrt{T}
$$

We will use `scipy.optimize.root` to find $(V,\sigma_V)$ that makes both expressions zero, similarly as the calculation using market data

In [None]:
from scipy.stats import norm
from scipy.optimize import root

def merton_solver_accounting(row, T=T):
    """
    Solve for asset value V and volatility sigma_V using accounting data.
    Returns (V, sigma_V, status_flag, tag).
    
     Convert accounting values from millions to actual USD for consistency.
    """
    # 1. Extract inputs (convert millions to actual USD)
    rf  = row['rit_rf'] / 100               # convert % to decimal, using rit_rf column
    A   = row['total_assets'] * 1_000_000   # book assets in USD (was millions)
    F   = row['debt_total'] * 1_000_000     # book debt in USD (was millions)
    σ_E = row['equity_vol']                 # equity volatility
    E   = A - F                             # net equity in USD

    # 2. Validate inputs
    if pd.isna(A) or pd.isna(F) or pd.isna(σ_E) or A <= 0 or σ_E <= 0 or F < 0:
        tag = 'missing_or_invalid'
        return np.nan, np.nan, 'invalid', tag
    if F == 0:
        return np.nan, np.nan, 'no_debt', 'no_debt'
    if E <= 0:  # Net equity must be positive
        return np.nan, np.nan, 'negative_equity', 'negative_equity'

    # 3. Define the two Merton equations
    def equations(x):
        V, σ_V = x
        if V <= 0 or σ_V <= 0:
            return [1e6, 1e6]  # Large error for invalid values
        d1 = (np.log(V/F) + (rf + 0.5*σ_V**2)*T) / (σ_V * np.sqrt(T))
        d2 = d1 - σ_V * np.sqrt(T)
        eq1 = V * norm.cdf(d1) - F * np.exp(-rf*T) * norm.cdf(d2) - E
        eq2 = σ_E - (V/E) * norm.cdf(d1) * σ_V
        return [eq1, eq2]

    # 4. Initial guess and solve
    try:
        sol = root(equations, [A, σ_E], method='hybr')
        if sol.success and sol.x[0] > 0 and sol.x[1] > 0:
            V_opt, σ_V_opt = sol.x
            return V_opt, σ_V_opt, 'converged', ''
        else:
            return np.nan, np.nan, 'no_converge', ''
    except:
        return np.nan, np.nan, 'error', 'solver_error'

# Demo on the first row
demo = df.iloc[0]
V0, σ_V0, status0, tag0 = merton_solver_accounting(demo)
print(f" {demo['instrument']} {demo['year']}: V = {V0:.2f}, σ_V = {σ_V0:.4f}, status = {status0}")


# ---- Apply to all rows ----
# Convert apply results to DataFrame with proper columns
results = pd.DataFrame(
    df.apply(merton_solver_accounting, axis=1).tolist(), #  Changed merton_solver to merton_solver_accounting
    columns=['V', 'σ_V', 'status', 'tag'], #  Added 'tag' column to match function output
    index=df.index
)
df[['V', 'σ_V', 'status', 'tag']] = results #  Added 'tag' column assignment

# ---- Check results ----
print(df[['instrument','year','V','σ_V','status','tag']].head()) # Added 'tag' to output

 JPM 2016: V = 2489755337209.08, σ_V = 0.1657, status = converged
  instrument  year             V       σ_V     status tag
0        JPM  2016  2.489755e+12  0.165704  converged    
1        JPM  2017  2.532496e+12  0.136142  converged    
2        JPM  2018  2.623749e+12  0.147320  converged    
3        JPM  2019  2.686545e+12  0.183468  converged    
4        JPM  2020  3.385457e+12  0.321152  converged    


In [None]:
# 7. Define solver, run demo, and compute all results from raw input

import numpy as np
import pandas as pd
from scipy.stats import norm
from scipy.optimize import root
from pathlib import Path

# --- 1. Load the raw INPUT data (not output!) ---
base_dir = Path().resolve()
input_fp = base_dir / 'data' / 'clean' / 'Book2_clean.csv'   # Change if needed!
df = pd.read_csv(input_fp)
print("Loaded input data with shape:", df.shape)

# --- 2. Define the Merton solver ---
T = 1.0

def merton_solver(row, T=T):
    """
    Solve for asset value V and asset volatility sigma_V.
    Returns (V, sigma_V, status_flag).
    Converts debt_total from millions to USD for calculations.
    """
    E   = row['market_cap']
    σ_E = row['equity_vol']
    F   = row['debt_total'] * 1_000_000  # Convert from millions to USD
    r_f = row['rf']

    # Input validation
    if pd.isna(E) or pd.isna(σ_E) or pd.isna(F):
        return np.nan, np.nan, 'missing_input'
    if E <= 0 or σ_E <= 0 or F < 0:
        return np.nan, np.nan, 'invalid_value'
    if F == 0:
        return np.nan, np.nan, 'no_debt'

    # Merton equations
    def equations(x):
        V, σ_V = x
        d1 = (np.log(V/F) + (r_f + 0.5*σ_V**2)*T) / (σ_V * np.sqrt(T))
        d2 = d1 - σ_V * np.sqrt(T)
        eq1 = V * norm.cdf(d1) - F * np.exp(-r_f*T) * norm.cdf(d2) - E
        eq2 = σ_E - (V/E) * norm.cdf(d1) * σ_V
        return [eq1, eq2]

    initial = [E + F, σ_E]
    sol     = root

Loaded input data with shape: (1425, 32)


In [None]:
# 7. Define solver, run demo, and compute all results from raw input

import numpy as np
import pandas as pd
from scipy.stats import norm
from scipy.optimize import root
from pathlib import Path

# --- 1. Load the raw INPUT data (not output!) ---
base_dir = Path().resolve()
input_fp = base_dir / 'data' / 'clean' / 'Book2_clean.csv'   # Change if needed!
df = pd.read_csv(input_fp)
print("Loaded input data with shape:", df.shape)

# --- 2. Define the Merton solver ---
T = 1.0

def merton_solver(row, T=T):
    """
    Solve for asset value V and asset volatility sigma_V.
    Returns (V, sigma_V, status_flag).
    Converts debt_total from millions to USD for calculations.
    """
    E   = row['market_cap']
    σ_E = row['equity_vol']
    F   = row['debt_total'] * 1_000_000  # Convert from millions to USD
    r_f = row['rf']

    # Input validation
    if pd.isna(E) or pd.isna(σ_E) or pd.isna(F):
        return np.nan, np.nan, 'missing_input'
    if E <= 0 or σ_E <= 0 or F < 0:
        return np.nan, np.nan, 'invalid_value'
    if F == 0:
        return np.nan, np.nan, 'no_debt'

    # Merton equations
    def equations(x):
        V, σ_V = x
        d1 = (np.log(V/F) + (r_f + 0.5*σ_V**2)*T) / (σ_V * np.sqrt(T))
        d2 = d1 - σ_V * np.sqrt(T)
        eq1 = V * norm.cdf(d1) - F * np.exp(-r_f*T) * norm.cdf(d2) - E
        eq2 = σ_E - (V/E) * norm.cdf(d1) * σ_V
        return [eq1, eq2]

    initial = [E + F, σ_E]
    sol     = root

Loaded input data with shape: (1425, 32)


## 7. Run the Accounting-Based Solver and Compute DDa/PDa

In this step we will:

1. Apply `merton_solver_accounting` to every row of `df` to get  
   - `asset_value` (V)  
   - `asset_vol` (σ_V)  
   - `merton_status` (convergence flag)  
   - `dd_pd_tag` (e.g. "no_debt")  
2. Compute **Distance to Default** (DDa):  
   $$
     \mathrm{DDa}
     = \frac{\ln(V / F)\;+\;(0 - \tfrac12\,\sigma_V^2)\,T}
            {\sigma_V\,\sqrt{T}}
   $$
3. Compute **Probability of Default** (PDa):  
   $$
     \mathrm{PDa} = \Phi\bigl(-\mathrm{DDa}\bigr)
   $$
4. Set both to `NaN` when `dd_pd_tag == 'no_debt'`.

In [None]:
# 7.1 Apply the solver
print("[INFO] Applying accounting-based Merton solver to each row…")
results = df.apply(merton_solver_accounting, axis=1, result_type='expand')
df[['asset_value','asset_vol','merton_status','dd_pd_tag']] = results

# 7.2 Compute DDa and PDa
V   = df['asset_value']
F   = df['debt_total'] * 1_000_000  #  Convert debt to actual USD for consistent calculation
σ_V = df['asset_vol']

# Numerator: log distance + drift (drift term uses r_f=0 here)
num = np.log(V / F) + (0.0 - 0.5 * σ_V**2) * T
# Denominator: volatility × sqrt(T)
den = σ_V * np.sqrt(T)

df['DDa'] = np.where(
    (df['dd_pd_tag']=='no_debt') | (df['dd_pd_tag']=='negative_equity'), 
    np.nan, 
    num / den
)
df['PDa'] = np.where(
    (df['dd_pd_tag']=='no_debt') | (df['dd_pd_tag']=='negative_equity'), 
    np.nan, 
    norm.cdf(-df['DDa'])
)

# 7.3 Quick check of results
print(df[['instrument','year','asset_value','asset_vol','DDa','PDa']].head())
print("\nSolver status counts:\n", df['merton_status'].value_counts())

[INFO] Applying accounting-based Merton solver to each row…
  instrument  year   asset_value  asset_vol        DDa           PDa
0        JPM  2016  2.489755e+12   0.165704   9.661405  2.198945e-22
1        JPM  2017  2.532496e+12   0.136142  11.925326  4.367087e-33
2        JPM  2018  2.623749e+12   0.147320  10.737247  3.402157e-27
3        JPM  2019  2.686545e+12   0.183468   8.900168  2.788118e-19
4        JPM  2020  3.385457e+12   0.321152   5.543240  1.484631e-08

Solver status counts:
 merton_status
converged      1405
no_debt          16
no_converge       4
Name: count, dtype: int64


## 8. Export Results & Append Diagnostics

In this final step we will:

1. **Save** the full DataFrame (including `DDa` and `PDa`) to CSV at `output_fp`.  
2. **Append** a diagnostics summary to the accounting log file (`log_fp`), including:  
   - Total rows processed  
   - Solver status counts  
   - Summary statistics for `DDa` and `PDa`  
   - Counts of missing or failed estimates  

In [None]:
# 8.1 Determine the actual DD and PD column names
cols = df.columns.str.lower().tolist()
# find candidates
dd_candidates = [c for c in df.columns if 'distance' in c.lower()]
pd_candidates = [c for c in df.columns if 'probability' in c.lower()]

dd_col = dd_candidates[0] if dd_candidates else None
pd_col = pd_candidates[0] if pd_candidates else None

# 8.2 Append diagnostics using the discovered column names
with open(log_fp, 'a') as log:
    log.write("\n=== Accounting-Based DD/PD Diagnostics ===\n")
    log.write(f"Total rows processed: {len(df)}\n")

    # Solver status
    status_col = 'merton_status' if 'merton_status' in df.columns \
                 else 'solver_status' if 'solver_status' in df.columns \
                 else None
    if status_col:
        log.write(f"\n{status_col} counts:\n")
        log.write(df[status_col].value_counts().to_string() + "\n")
    else:
        log.write("\nNo status column found. Available columns:\n")
        log.write(", ".join(df.columns) + "\n")

    # DD summary
    if dd_col:
        log.write(f"\nDistance to Default ({dd_col}) summary:\n")
        log.write(df[dd_col].describe().to_string() + "\n")
    else:
        log.write("\nNo DD column found. Available columns:\n")
        log.write(", ".join(df.columns) + "\n")

    # PD summary
    if pd_col:
        log.write(f"\nProbability of Default ({pd_col}) summary:\n")
        log.write(df[pd_col].describe().to_string() + "\n")
    else:
        log.write("\nNo PD column found. Available columns:\n")
        log.write(", ".join(df.columns) + "\n")

    # Missing counts, if found
    if dd_col:
        log.write(f"\nRows with missing {dd_col}: {df[dd_col].isna().sum()}\n")
    if pd_col:
        log.write(f"Rows with missing {pd_col}: {df[pd_col].isna().sum()}\n")

print(f"[INFO] Diagnostics appended to:\n  {log_fp}")

[INFO] Diagnostics appended to:
  /Users/guillaumebld/Library/CloudStorage/OneDrive-LoyolaUniversityChicago/Group Quantitative Research Workspace – Summer 2025 - Results/data/logs/dd_pd_accounting_log.txt


## 9. Summary and Next Steps

**What we’ve accomplished**  
- Loaded and cleaned the accounting data  
- Merged in market caps, equity volatilities, and risk-free rates  
- Defined and ran an accounting-based Merton solver  
- Computed annual **DDa** and **PDa**  
- Exported results and logged diagnostics  

**Next steps**  
1. **Review the log file** (`dd_pd_accounting_log.txt`) for any convergence warnings or missing inputs.  
2. **Compare** accounting-based metrics (`DDa/PDa`) with market-based (`DDm/PDm`) to identify discrepancies.  
3. **Visualize** the distributions of `DDa` and `PDa` across firms and years.  
4. **Incorporate** these default-risk measures into your credit models or presentations for Professor Abel.  

In [14]:
# 8.1 Export the DataFrame to CSV (accounting-based)
output_fp = base_dir / 'data' / 'merged_inputs' / 'dd_pd_accounting.csv'
log_fp    = base_dir / 'data' / 'logs' / 'dd_pd_accounting_log.txt'

# Write the main CSV
df.to_csv(output_fp, index=False)
print(f"[INFO] Accounting-based DD/PD results saved to:\n  {output_fp.resolve()}")

# 8.2 Append diagnostics to the log
# Detect the actual DD/PD and status column names
dd_col = 'DDa' if 'DDa' in df.columns else next((c for c in df.columns if 'distance' in c.lower()), None)
pd_col = 'PDa' if 'PDa' in df.columns else next((c for c in df.columns if 'probability' in c.lower()), None)
status_col = ('merton_status' if 'merton_status' in df.columns
              else 'solver_status' if 'solver_status' in df.columns
              else None)

with open(log_fp, 'a') as log:
    log.write("\n=== Accounting-Based DD/PD Diagnostics ===\n")
    log.write(f"Total rows processed: {len(df)}\n")

    # Solver status breakdown
    if status_col:
        log.write(f"\n{status_col} counts:\n")
        log.write(df[status_col].value_counts().to_string() + "\n")
    else:
        log.write("\nNo solver-status column found. Available columns:\n")
        log.write(", ".join(df.columns) + "\n")

    # Distance to Default summary
    if dd_col:
        log.write(f"\nDistance to Default ({dd_col}) summary:\n")
        log.write(df[dd_col].describe().to_string() + "\n")
    else:
        log.write("\nNo DD column found. Available columns:\n")
        log.write(", ".join(df.columns) + "\n")

    # Probability of Default summary
    if pd_col:
        log.write(f"\nProbability of Default ({pd_col}) summary:\n")
        log.write(df[pd_col].describe().to_string() + "\n")
    else:
        log.write("\nNo PD column found. Available columns:\n")
        log.write(", ".join(df.columns) + "\n")

    # Missing-value counts
    if dd_col:
        log.write(f"\nRows with missing {dd_col}: {df[dd_col].isna().sum()}\n")
    if pd_col:
        log.write(f"Rows with missing {pd_col}: {df[pd_col].isna().sum()}\n")

print(f"[INFO] Diagnostics appended to:\n  {log_fp.resolve()}")

[INFO] Accounting-based DD/PD results saved to:
  /Users/guillaumebld/Library/CloudStorage/OneDrive-LoyolaUniversityChicago/Group Quantitative Research Workspace – Summer 2025 - Results/data/merged_inputs/dd_pd_accounting.csv
[INFO] Diagnostics appended to:
  /Users/guillaumebld/Library/CloudStorage/OneDrive-LoyolaUniversityChicago/Group Quantitative Research Workspace – Summer 2025 - Results/data/logs/dd_pd_accounting_log.txt
