# DD and PD Calculation Using the Market Approach (Merton Model)

This notebook walks through the `dd_pd_market.py` script step by step.  
We will:

1.  Set up our environment and imports  
2.  Load and inspect inputs  
3.  Prepare and merge data  
4.  Compute market capitalizations  
5.  Merge equity volatility  
6.  Define and run the Merton model solver  
7.  Calculate Distance to Default (DD) and Probability of Default (PD)  
8.  Export results and write diagnostics to a log  


**Timing**: Uses $\sigma_{E,t-1}$, $E_t$, $F_t$, and $r_{f,t}$, so there is no lookahead bias.


In [1]:

# Parameter defaults for the market-based Merton KMV implementation
print("Market approach, Merton KMV solve. Uses μ = r_f. Barrier convention: A (total debt)")
T = 1.0
tol_E = 1e-6
tol_sigma = 1e-4
max_iter = 200
barrier_option = "A"  # For banks: debt_total represents total liabilities (deposits dominate funding)


Market approach, Merton KMV solve. Uses μ = r_f. Barrier convention: A (total debt)


## 1. Setup and Imports

Here we import all libraries and define file paths.

In [2]:
# 1. Install needed packages (run once per environment)
%pip install pandas numpy matplotlib seaborn scipy


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [3]:
# 1. Setup and Imports (with correct file names)
import pandas as pd
import numpy as np
from scipy.stats import norm
from scipy.optimize import least_squares
from pathlib import Path
import logging
import re
from tqdm.auto import tqdm

# 1.1 Locate workspace root
def find_repo_root(start: Path) -> Path:
    for candidate in [start, *start.parents]:
        if (candidate / '.git').exists():
            return candidate
    return start

base_dir = find_repo_root(Path.cwd())
print(f"Repository root resolved to: {base_dir}")

# 1.2 Time horizon (configured in parameter cell)

# 1.3 File paths (corrected market-cap filename)
model_fp      = base_dir / 'data' / 'clean' / 'esg_0718_clean.csv'
marketcap_fp  = base_dir / 'data' / 'clean' / 'all_banks_marketcap_annual_2016_2023.csv'
vol_fp        = base_dir / 'data' / 'clean' / 'equity_volatility_by_year_DAILY.csv'
rf_fp         = base_dir / 'data' / 'clean' / 'fama_french_factors_annual_clean.csv'
log_fp        = base_dir / 'data' / 'logs' / 'dd_pd_market_log.txt'
output_dir    = base_dir / 'data' / 'outputs' / 'datasheet'
archive_dir   = base_dir / 'archive' / 'datasets'

# 1.4 Ensure directories exist
log_fp.parent.mkdir(parents=True, exist_ok=True)
output_dir.mkdir(parents=True, exist_ok=True)
archive_dir.mkdir(parents=True, exist_ok=True)

# 1.5 Existence check
for name, fp in [
    ('Accounting input', model_fp),
    ('Marketcap input', marketcap_fp),
    ('Equity vol input', vol_fp),
    ('Risk-free input', rf_fp),
]:
    print(f"{name:18s} →", "FOUND" if fp.exists() else f"MISSING ({fp.name})")

# 1.6 Deduplication audit log placeholder
duplicate_log_entries = []
invalid_input_summary = None
surviving_rows_by_year = None


Repository root resolved to: /Users/guillaumebld/Documents/Graduate_Research/Professor Abol Jalilvand/fall2025/risk_bank/risk_bank
Accounting input   → FOUND
Marketcap input    → FOUND
Equity vol input   → FOUND
Risk-free input    → FOUND


## 2. Load and Inspect Core Data

- Read the main Book2 input file  
- Clean and convert the `year` column  
- Merge in the annual risk-free rate from Fama-French  
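
A left merge silently multiplies rows when the right-hand table repeats a key. One way to guard against that is the `validate` argument of `DataFrame.merge`. The sketch below uses toy frames whose names and values are illustrative assumptions, not the notebook's data:

```python
import pandas as pd
from pandas.errors import MergeError

# Toy data: names and values are illustrative only.
banks = pd.DataFrame({"instrument": ["AAA", "BBB"], "year": [2016, 2016]})
rf_ok = pd.DataFrame({"year": [2016], "rf": [0.20]})
rf_dup = pd.DataFrame({"year": [2016, 2016], "rf": [0.20, 0.21]})

# many_to_one: each year may appear at most once in the right-hand table.
merged = banks.merge(rf_ok, on="year", how="left", validate="many_to_one")
print(len(merged))  # same row count as `banks`

# A repeated key on the right raises instead of duplicating bank-years.
try:
    banks.merge(rf_dup, on="year", how="left", validate="many_to_one")
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

Comparing the row count before and after each merge is a cheaper alternative when raising is too strict.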

In [4]:
# 2.1 Load Book2 data
print('[INFO] Loading Book2 data...')
df = pd.read_csv(model_fp)
print(f"→ {df.shape[0]} rows, {df[['instrument','year']].drop_duplicates().shape[0]} unique (instrument, year)")

# 2.1a Deduplicate instrument-year rows by keeping the largest debt_total
sort_cols = ['instrument', 'year', 'debt_total']
df_sorted = df.sort_values(sort_cols, ascending=[True, True, False])
deduped = df_sorted.drop_duplicates(subset=['instrument', 'year'], keep='first')
removed = df_sorted.loc[~df_sorted.index.isin(deduped.index), ['instrument', 'year', 'debt_total']]
if not removed.empty:
    print(f"→ Dropping {removed.shape[0]} duplicate instrument-year rows (kept max debt_total)")
    summary = (
        df_sorted.groupby(['instrument', 'year'])['debt_total']
        .apply(list)
        .reset_index()
    )
    summary = summary[summary['debt_total'].apply(len) > 1]
    for _, row in summary.iterrows():
        debt_values = list(row['debt_total'])
        kept = debt_values[0]
        dropped = debt_values[1:]
        dropped_str = ', '.join(f"{val:,.1f}" for val in dropped)
        message = (
            f"Duplicate {row['instrument']} {row['year']}: "
            f"kept debt_total={kept:,.1f}; dropped={dropped_str}"
        )
        print('   ' + message)
        duplicate_log_entries.append(message)
    df = deduped.reset_index(drop=True)
else:
    print("→ No duplicate instrument-year rows found (all unique).")
    df = df_sorted.reset_index(drop=True)

# 2.2  Clean year column
df = df[df['year'].notnull()].copy()
df['year'] = df['year'].astype(float).astype(int)

# 2.3 Merge risk-free rate
rf_df = pd.read_csv(rf_fp)
df = df.merge(rf_df[['year','rf']], on='year', how='left')
df['rf'] = df['rf'] / 100    # convert percent to decimal

print(f"→ After merging rf, {df.shape[0]} rows remain")

if barrier_option == "A":
    df['F'] = df['debt_total'] * 1_000_000  # F = total liabilities for banks (debt_total column)
elif barrier_option == "B":
    required_cols = {'debt_short_term', 'debt_long_term'}
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        missing_str = ", ".join(missing)
        raise ValueError(
            "barrier_option='B' requires short/long-term debt columns. "
            f"Missing: {missing_str}. Add these columns before enabling option B."
        )
    df['F'] = (df['debt_short_term'] + 0.5 * df['debt_long_term']) * 1_000_000
else:
    raise ValueError(f"Unsupported barrier_option: {barrier_option}")
print(f"[INFO] Constructed F barrier for {df.shape[0]} rows using barrier_option={barrier_option}")



[INFO] Loading Book2 data...
→ 1427 rows, 1424 unique (instrument, year)
→ Dropping 3 duplicate instrument-year rows (kept max debt_total)
   Duplicate PNC 2018: kept debt_total=60,263.0; dropped=60,263.0, 57,419.0, 57,419.0
→ After merging rf, 1424 rows remain
[INFO] Constructed F barrier for 1424 rows using barrier_option=A


## 3. Prepare Identifiers and Dates

- Standardize tickers by dropping exchange suffixes  
- Parse the `date` column and extract `Month`  
- Create a simple `symbol` field for merging
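
As a quick illustration of the suffix stripping, here is a dependency-free sketch. The function name `strip_exchange_suffix` and the sample tickers are hypothetical. The notebook's own helper is `standardize_ticker`.

```python
def strip_exchange_suffix(ticker):
    """Return the part of a ticker before the first '.', leaving None untouched."""
    if ticker is None:
        return None
    return str(ticker).split(".", 1)[0]


# Sample identifiers with and without an exchange suffix (illustrative only).
examples = ["JPM.N", "FITB.OQ", "ABCB", None]
cleaned = [strip_exchange_suffix(t) for t in examples]
print(cleaned)  # ['JPM', 'FITB', 'ABCB', None]
```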

In [5]:
# 3.1 Helper to strip suffixes like .N, .OQ, etc.
def standardize_ticker(t):
    return str(t).split('.', 1)[0] if pd.notnull(t) else t

# 3.2 Apply to our main DataFrame
df['ticker_prefix'] = df['instrument'].apply(standardize_ticker)

# 3.3 Ensure date is datetime, then extract month
if 'date' in df.columns:
    df['date'] = pd.to_datetime(df['date'])
else:
    df = df.assign(date=pd.NaT)

df['Month'] = df['date'].dt.month

# 3.4 Create 'symbol' for merge keys (same as ticker_prefix)
df['symbol'] = df['instrument'].apply(lambda x: str(x).split('.', 1)[0])

## 4. Compute Market Capitalization

- Load the annual market-cap file  
- If `market_cap` is missing, compute it as `dec_price` × `shares_outstanding` (values are in actual USD, not millions)  
- Keep one record per (symbol, year); the annual values are treated as December observations  
- Merge annual market cap into our main `df`
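
The file loaded here is already annual. If the input were monthly instead, a December-or-latest selection per symbol-year could be sketched as follows. The frame, its column names, and its values are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical monthly observations; column names and values are assumptions.
monthly = pd.DataFrame({
    "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(
        ["2016-11-30", "2016-12-31", "2017-12-31", "2016-09-30", "2016-10-31"]
    ),
    "market_cap": [1.0e9, 1.1e9, 1.3e9, 5.0e8, 5.2e8],
})
monthly["year"] = monthly["date"].dt.year

# Sort chronologically, then keep the last row per (symbol, year):
# December when it exists, otherwise the latest available month.
annual = (
    monthly.sort_values(["symbol", "date"])
    .drop_duplicates(subset=["symbol", "year"], keep="last")
    .reset_index(drop=True)
)
print(annual[["symbol", "year", "market_cap"]])
```

Sorting before `drop_duplicates` makes the choice of the surviving row explicit rather than dependent on file order.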

In [6]:
# 4.1 Load annual market‐cap data
mc = pd.read_csv(marketcap_fp)
print("Columns in mc:", mc.columns.tolist())

# 4.2 Compute market_cap only if needed
if 'market_cap' not in mc.columns:
    # fallback: compute from dec_price & shares_outstanding
    mc['market_cap'] = mc['dec_price'] * mc['shares_outstanding']

# 4.3 Standardize the ticker (drop suffixes)
mc['symbol'] = mc['symbol'].apply(standardize_ticker)

# 4.4 Parse the fiscal date and extract year/month
#    If this annual file has no 'fiscal_date' but has 'year', skip parsing
if 'fiscal_date' in mc.columns:
    mc['fiscal_date'] = pd.to_datetime(mc['fiscal_date'])
    mc['year']       = mc['fiscal_date'].dt.year
else:
    # assume the CSV’s 'year' column is correct
    mc['year'] = mc['year'].astype(int)

# 4.5 We don’t need Month or December flag for annual data, but for consistency:
mc['Month']       = mc.get('Month', 12)  # treat all as December
mc['is_december'] = True

# 4.6 Drop duplicates: keep one record per (symbol, year)
mc_annual = (
    mc
    .dropna(subset=['market_cap'])
    .drop_duplicates(subset=['symbol','year'], keep='first')
)

# 4.7 Merge into main DataFrame
df = df.merge(
    mc_annual[['symbol','year','market_cap']],
    on=['symbol','year'],
    how='left'
)

# 4.8 Quick check
print(df[['instrument','year','market_cap']].drop_duplicates().head())

Columns in mc: ['symbol', 'year', 'market_cap']
  instrument  year    market_cap
0       ABCB  2016  3.004515e+09
1       ABCB  2017  3.321505e+09
2       ABCB  2018  2.182408e+09
3       ABCB  2019  2.931470e+09
4       ABCB  2020  2.623438e+09


## 5. Merge Equity Volatility (Daily Returns, 252-Day Window)

- Load equity volatility calculated from daily returns (Bharath & Shumway 2008)  
- 252-day window from year t-1, annualized using √252  
- Merge into main `df` by ticker and year  
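
The volatility file is computed upstream. For orientation, the sketch below shows one way an annualized figure could be derived from a year of daily prices and then keyed to the following year. It assumes log returns and a simulated price path. The name `annualized_vol` and the years used are hypothetical.

```python
import math
import random
import statistics

TRADING_DAYS = 252


def annualized_vol(prices):
    """Annualized volatility from consecutive daily closing prices (log returns)."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(log_returns) * math.sqrt(TRADING_DAYS)


# Simulated price path standing in for year t-1 (not real data).
random.seed(0)
daily_sigma = 0.02
prices = [100.0]
for _ in range(TRADING_DAYS):
    prices.append(prices[-1] * math.exp(random.gauss(0.0, daily_sigma)))

sigma_prev_year = annualized_vol(prices)

# Lag by one year: volatility measured over 2019 is keyed to solver year 2020,
# so the solver never sees returns from its own year.
vol_by_solver_year = {2020: sigma_prev_year}
print(f"sigma_E for solver year 2020: {sigma_prev_year:.3f}")
```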

In [7]:
# 5. Load and Merge Equity Volatility (Daily, 252-day window)
print('[INFO] Loading equity volatility (daily-based, 252-day window)...')

# 5.1 Load DAILY equity volatility file
equity_vol = pd.read_csv(vol_fp)

# 5.2 DAILY FORMAT: Use ticker and sigma_E from daily calculations
equity_vol['ticker_prefix'] = equity_vol['ticker']
equity_vol['equity_volatility'] = equity_vol['sigma_E']
vol_annual = equity_vol[['ticker_prefix','year','equity_volatility']]

# 5.3 Merge into main DataFrame
df = df.merge(
    vol_annual,
    on=['ticker_prefix','year'],
    how='left'
)

# DATA QUALITY FILTER: Drop bank-years without complete volatility
print('\n[INFO] Applying strict data quality filter: Dropping bank-years without complete volatility...')
initial_count = len(df)
df = df[df['equity_volatility'].notna()].copy()
filtered_count = len(df)
dropped_count = initial_count - filtered_count

print(f'  Initial bank-years: {initial_count:,}')
print(f'  With complete equity volatility: {filtered_count:,} ({filtered_count/initial_count*100:.1f}%)')
print(f'  Dropped (incomplete data): {dropped_count:,} ({dropped_count/initial_count*100:.1f}%)')
print(f'  → Only complete observations will proceed to Merton solver')

# Show retained bank-years by year
if dropped_count > 0:
    retained_by_year = df.groupby('year').size()
    print(f'\n  Bank-years retained by year:')
    for year, count in retained_by_year.items():
        print(f'    {year}: {count} banks')

# 5.4 Validate required inputs
required_columns = ['instrument', 'year', 'market_cap', 'equity_volatility', 'rf', 'debt_total']
missing_required = [col for col in required_columns if col not in df.columns]
if missing_required:
    missing_str = ', '.join(missing_required)
    raise AssertionError(f"Missing required columns after merge: {missing_str}")

invalid_mask = df[required_columns].isna().any(axis=1)
if invalid_mask.any():
    invalid_count = invalid_mask.sum()
    print(f"[WARN] Dropping {invalid_count} rows with missing required inputs.")
    invalid_summary = df.loc[invalid_mask, ['instrument','year']].copy()
    print(invalid_summary.to_string(index=False))
    df = df[~invalid_mask].copy()
else:
    print("[INFO] All required inputs present for merged dataset.")

# Mark surviving rows as awaiting the solver, whichever branch ran
df['solver_status'] = 'pending'

# 5.5 Finalize equity volatility column
df['equity_vol'] = df['equity_volatility']

# 5.6 Surviving row counts by year
surviving_rows_by_year = df.groupby('year').size().sort_index()
print('[INFO] Surviving rows by year after input validation:')
print(surviving_rows_by_year)

# 5.7 Add sigma_E provenance columns from merged equity volatility
print('[INFO] Adding sigma_E provenance columns...')

# Merge provenance columns from equity_vol DataFrame
provenance_cols = ['ticker_prefix', 'year', 'sigma_E_method', 'sigma_E_window_months']
if all(col in equity_vol.columns for col in provenance_cols):
    vol_provenance = equity_vol[provenance_cols].copy()
    df = df.merge(vol_provenance, on=['ticker_prefix', 'year'], how='left')
    
    # Calculate window years from window_months
    df['sigmaE_window_end_year'] = df['year'] - 1
    df['sigma_E_window_months'] = df['sigma_E_window_months'].fillna(0)
    df['sigmaE_window_start_year'] = df['sigmaE_window_end_year'] - (df['sigma_E_window_months'] / 12 - 1).clip(lower=0).astype(int)
    
    print(f'  Provenance added: {df["sigma_E_method"].notna().sum()} rows')
    print(f'  Methods: {df["sigma_E_method"].value_counts().to_dict()}')
else:
    print('  Warning: Provenance columns not found')
    df['sigmaE_window_end_year'] = df['year'] - 1
    df['sigmaE_window_start_year'] = df['year'] - 1
    df['sigma_E_method'] = 'unknown'
    df['sigma_E_window_months'] = 0

# 5.8 Create time-tagged columns for solver
print('[INFO] Creating time-tagged columns...')

df['E_t'] = df['market_cap']  # Equity value at time t
df['F_t'] = df['F']  # Face value of debt
df['rf_t'] = df['rf']  # Risk-free rate
df['T'] = 1.0  # Time horizon
df['sigma_E_tminus1'] = df['equity_volatility']  # From file, already at t-1

print(f'  E_t: {df["E_t"].notna().sum()} values')
print(f'  F_t: {df["F_t"].notna().sum()} values')
print(f'  rf_t: {df["rf_t"].notna().sum()} values')
print(f'  sigma_E_tminus1: {df["sigma_E_tminus1"].notna().sum()} values')
print(f'  equity_vol: {df["equity_vol"].notna().sum()} values')


[INFO] Loading equity volatility (daily-based, 252-day window)...

[INFO] Applying strict data quality filter: Dropping bank-years without complete volatility...
  Initial bank-years: 1,427
  With complete equity volatility: 1,362 (95.4%)
  Dropped (incomplete data): 65 (4.6%)
  → Only complete observations will proceed to Merton solver

  Bank-years retained by year:
    2016: 68 banks
    2017: 131 banks
    2018: 190 banks
    2019: 203 banks
    2020: 207 banks
    2021: 207 banks
    2022: 208 banks
    2023: 148 banks
[WARN] Dropping 2 rows with missing required inputs.
instrument  year
      COFS  2018
      COFS  2019
[INFO] Surviving rows by year after input validation:
year
2016     68
2017    131
2018    189
2019    202
2020    207
2021    207
2022    208
2023    148
dtype: int64
[INFO] Adding sigma_E provenance columns...
[INFO] Creating time-tagged columns...
  E_t: 1360 values
  F_t: 1360 values
  rf_t: 1360 values
  sigma_E_tminus1: 1360 values
  equity_vol: 1360 values


In [8]:
# TIME INTEGRITY ASSERTIONS
print('[INFO] Validating time integrity...')

# Import time checks
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd()))
from utils.time_checks import assert_time_integrity

# Assertion 1: sigma_E window must end at t-1
assert (df['sigmaE_window_end_year'] == df['year'] - 1).all(), \
    'sigma_E window end must be t-1 (no lookahead)'

# Assertion 2: rf_t must be present
assert df['rf_t'].notna().all(), 'rf_t must be present for all rows'

# Assertion 3: E_t and F_t must be positive for solver rows
solver_rows = df['equity_vol'].notna()
assert (df.loc[solver_rows, 'E_t'] > 0).all(), 'E_t must be positive'
print(f"  F_t range: {df.loc[solver_rows, 'F_t'].min():.2e} to {df.loc[solver_rows, 'F_t'].max():.2e}")
print(f"  F_t null count: {df.loc[solver_rows, 'F_t'].isna().sum()}")
# Debug F_t issues
bad_F = df.loc[solver_rows & (df['F_t'] <= 0)]
if len(bad_F) > 0:
    print(f'[ERROR] {len(bad_F)} rows have non-positive F_t:')
    print(bad_F[['instrument', 'year', 'F', 'F_t', 'debt_total']].head(10))
    # Filter out bad rows
    df = df[df['F_t'] > 0].copy()
    solver_rows = df['equity_vol'].notna()
    print(f'[INFO] Filtered to {len(df)} rows with positive F_t')

# F_t assertion now satisfied after filtering

# Run comprehensive time integrity check
assert_time_integrity(df)

print('[PASS] All time integrity assertions passed')
print(f'  - sigma_E uses only data up to t-1: {(df["sigmaE_window_end_year"] == df["year"] - 1).all()}')
print(f'  - No future data in windows: {(df["sigmaE_window_end_year"] < df["year"]).all()}')


[INFO] Validating time integrity...
  F_t range: 0.00e+00 to 6.53e+11
  F_t null count: 0
[ERROR] 16 rows have non-positive F_t:
    instrument  year    F  F_t  debt_total
413       EXSR  2021  0.0  0.0         0.0
499        FHB  2021  0.0  0.0         0.0
901       NKSH  2019  0.0  0.0         0.0
902       NKSH  2020  0.0  0.0         0.0
903       NKSH  2021  0.0  0.0         0.0
904       NKSH  2022  0.0  0.0         0.0
941       OPBK  2021  0.0  0.0         0.0
968       OVLY  2018  0.0  0.0         0.0
969       OVLY  2019  0.0  0.0         0.0
970       OVLY  2020  0.0  0.0         0.0
[INFO] Filtered to 1344 rows with positive F_t
[PASS] All time integrity assertions passed
  - sigma_E uses only data up to t-1: True
  - No future data in windows: True


## 6. Define the Merton Model Solver (Revised Equations)

In the Merton framework, the firm’s equity is treated as a European call option on its assets. We observe:

- **E**: equity market value (scaled market capitalization)  
- **sigma_E**: annualized equity volatility  
- **F**: total debt (face value)  
- **r_f**: risk-free rate  
- **T**: time horizon (1 year)  

We solve for the unobserved:

- **V**: total asset value  
- **sigma_V**: asset volatility  

by enforcing two conditions:

1.  **Option-pricing relation**  
    $$
      E \;=\; V\,\Phi(d_{1})\;-\;F\,e^{-r_{f}T}\,\Phi(d_{2})
    $$
2.  **Volatility link**  
    $$
      \sigma_{E} \;=\;\frac{V}{E}\,\Phi(d_{1})\,\sigma_{V}
    $$

where  
$$
  d_{1} \;=\;\frac{\ln\!\bigl(V/F\bigr)\;+\;\bigl(r_{f} + \tfrac12\,\sigma_{V}^{2}\bigr)\,T}
                      {\sigma_{V}\,\sqrt{T}},
  \quad
  d_{2} \;=\; d_{1} \;-\;\sigma_{V}\,\sqrt{T},
$$  
and $\Phi$ is the standard normal CDF.  

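We use a numerical least-squares solver (`scipy.optimize.least_squares`) to find the $V$ and $\sigma_{V}$ that drive both residuals to zero, where the residuals are the two conditions above rearranged:

$$
  f_1(V,\sigma_V) \;=\; V\,\Phi(d_{1})\;-\;F\,e^{-r_{f}T}\,\Phi(d_{2})\;-\;E,
  \qquad
  f_2(V,\sigma_V) \;=\; \frac{V}{E}\,\Phi(d_{1})\,\sigma_{V}\;-\;\sigma_{E}
$$

$$
\text{Find }V,\sigma_{V}\text{ such that } f_1 = f_2 = 0
$$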
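
To make the two conditions concrete, the sketch below evaluates them in the forward direction: given assumed asset values, it returns the implied equity value and equity volatility. It uses `statistics.NormalDist` from the standard library as a stand-in for `scipy.stats.norm`. The function name and the input numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF


def merton_equity(V, sigma_V, F, r_f, T=1.0):
    """Model-implied equity value and equity volatility for given assets."""
    srt = sigma_V * math.sqrt(T)
    d1 = (math.log(V / F) + (r_f + 0.5 * sigma_V**2) * T) / srt
    d2 = d1 - srt
    E = V * Phi(d1) - F * math.exp(-r_f * T) * Phi(d2)  # option-pricing relation
    sigma_E = (V / E) * Phi(d1) * sigma_V                # volatility link
    return E, sigma_E


# Illustrative inputs in arbitrary currency units.
E, sigma_E = merton_equity(V=120.0, sigma_V=0.10, F=100.0, r_f=0.03)
print(f"E = {E:.2f}, sigma_E = {sigma_E:.3f}")
```

Equity volatility comes out above asset volatility because leverage amplifies asset moves. The solver inverts this map.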

### What the root-finder actually does, in simple terms

1. **Start with a guess**  
    We begin by guessing values for $(V,\sigma_{V})$. A natural choice is  
    $$
      V_0 = E + F,\quad \sigma_{V,0} = \sigma_E
    $$  
    This says "assets are roughly equity plus debt" and "asset volatility is like equity volatility."
 
2. **Measure "how wrong" we are**  
    We compute the two expressions  
    $$
      f_1(V,\sigma_V),\quad f_2(V,\sigma_V)
    $$  
    which tell us how far from zero each equation is. If both are exactly zero, our guess solves the problem.

3. **Adjust the guess**  
    If either $f_1$ or $f_2$ is not zero, the solver estimates a small change to $(V,\sigma_{V})$ that should reduce the errors. It uses derivative estimates of the residuals under the hood.

4. **Repeat until "close enough"**  
    The process repeats (compute residuals, update the guess, compute again) until the solver's tolerances are met (converged) or it reaches its evaluation limit (no convergence).

5. **Result**  
    - If converged: we obtain $(V^*, \sigma_{V}^*)$, the asset value and volatility consistent with observed equity data.  
    - If not: we flag the failure and typically record NaN values.

By packaging our two Merton equations into one Python residual function, we let `scipy.optimize.least_squares` handle the iteration, step-size choices, and convergence checks automatically. This lets us solve a nonlinear system that has no closed-form solution with minimal custom code.
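
The iteration described above can be sketched on synthetic data with `scipy.optimize.least_squares`, the routine imported in this notebook's solver cell. The sketch generates equity observations from known asset values and checks that the solver recovers them. The function names and numbers are illustrative assumptions, and the setup is simpler than the production solver (no bounds, default loss).

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import norm


def model_equity(V, sigma_V, F, r_f, T=1.0):
    """Model-implied equity value and equity volatility."""
    srt = sigma_V * np.sqrt(T)
    d1 = (np.log(V / F) + (r_f + 0.5 * sigma_V**2) * T) / srt
    d2 = d1 - srt
    E = V * norm.cdf(d1) - F * np.exp(-r_f * T) * norm.cdf(d2)
    # Model equity in the denominator; identical to observed E at the solution.
    sigma_E = (V / max(E, 1e-12)) * norm.cdf(d1) * sigma_V
    return E, sigma_E


def residuals(theta, E_obs, sigma_E_obs, F, r_f, T):
    """Price residual (relative) and volatility residual, in log-parameter space."""
    V, sigma_V = np.exp(theta)  # log space keeps both parameters positive
    E, sigma_E = model_equity(V, sigma_V, F, r_f, T)
    return np.array([(E - E_obs) / E_obs, sigma_E - sigma_E_obs])


# Synthetic "truth" and the equity data it implies.
F, r_f, T = 100.0, 0.03, 1.0
V_true, sigma_V_true = 120.0, 0.10
E_obs, sigma_E_obs = model_equity(V_true, sigma_V_true, F, r_f, T)

# Step 1: initial guess V0 = E + F, sigma_V0 = sigma_E.
theta0 = np.log([E_obs + F, sigma_E_obs])

# Steps 2-4: the solver evaluates residuals and updates the guess until done.
fit = least_squares(residuals, theta0, args=(E_obs, sigma_E_obs, F, r_f, T))
V_hat, sigma_V_hat = np.exp(fit.x)

# Step 5: inspect the result.
print(f"success={fit.success}, V={V_hat:.4f}, sigma_V={sigma_V_hat:.4f}")
```

Round-trip tests like this one help separate solver problems from data problems before running on real bank-years.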

In [9]:
# STABLE MERTON SOLVER with numerical safeguards
from scipy.optimize import least_squares
from scipy.stats import norm
import numpy as np

Phi = norm.cdf

def _d12(V, F, rf, sV, T):
    """Compute d1, d2 with numerical safety."""
    srt = sV * np.sqrt(T)
    d1 = (np.log(V/F) + (rf + 0.5*sV*sV)*T) / srt
    d2 = d1 - srt
    # Numerical safety: clip to prevent overflow in Phi
    d1 = np.clip(d1, -35, 35)
    d2 = np.clip(d2, -35, 35)
    return d1, d2

def residuals(theta, E_obs, sE_obs, F, rf, T=1.0):
    """Residual function in log space for stability."""
    # theta = [logV, logSigmaV]
    V = np.exp(theta[0])
    sV = np.exp(theta[1])
    
    d1, d2 = _d12(V, F, rf, sV, T)
    
    # Price equation
    E_model = V*Phi(d1) - F*np.exp(-rf*T)*Phi(d2)
    
    # Volatility equation: use E_model in denominator for stability
    sE_model = (V / max(E_model, 1e-12)) * Phi(d1) * sV
    
    # Relative scaling for better convergence
    r_price = (E_model - E_obs) / max(E_obs, 1.0)
    r_vol = sE_model - sE_obs
    
    return np.array([r_price, r_vol])

def solve_one(E_obs, sE_obs, F, rf, T=1.0):
    """Solve for V and sigma_V with robust bounds and method."""
    # Input validation
    if not (E_obs > 0 and sE_obs > 0 and F > 0):
        return None
    
    # Initial guess
    V0 = max(E_obs + F, 1.001*F)
    sV0 = min(max(sE_obs, 1e-3), 1.5)
    th0 = np.log([V0, sV0])
    
    # Bounds in log space
    lo = np.log([1.001*F, 1e-4])
    hi = np.log([1e3*(E_obs + F), 3.0])
    
    # Solve with robust settings
    res = least_squares(
        residuals, th0, args=(E_obs, sE_obs, F, rf, T),
        method='trf',
        loss='soft_l1',  # Robust to outliers
        ftol=1e-10,
        xtol=1e-10,
        gtol=1e-10,
        max_nfev=1000,
        bounds=(lo, hi)
    )
    
    if not res.success:
        return None
    
    # Extract solution
    V = float(np.exp(res.x[0]))
    sV = float(np.exp(res.x[1]))
    
    return V, sV, float(res.cost), res.nfev, res.status

print('[INFO] Stable Merton solver loaded')
print('  - Uses log space for V and sigma_V')
print('  - Clips d1, d2 to [-35, 35]')
print('  - Uses E_model in volatility equation denominator')
print('  - Robust loss function (soft_l1)')


[INFO] Stable Merton solver loaded
  - Uses log space for V and sigma_V
  - Clips d1, d2 to [-35, 35]
  - Uses E_model in volatility equation denominator
  - Robust loss function (soft_l1)


In [10]:
# PRE-SOLVE VALIDATION
print('[INFO] Validating inputs before solver...')

# Gate 1: E_t must be positive
bad_E = df['E_t'] <= 0
if bad_E.any():
    print(f'[WARN] Filtering {bad_E.sum()} rows with non-positive E_t')
    df = df[~bad_E].copy()
print(f'  E_t: {df["E_t"].min():.2e} to {df["E_t"].max():.2e}')

# Gate 2: F_t must be positive
bad_F = df['F_t'] <= 0
if bad_F.any():
    print(f'[WARN] Filtering {bad_F.sum()} rows with non-positive F_t')
    df = df[~bad_F].copy()
print(f'  F_t: {df["F_t"].min():.2e} to {df["F_t"].max():.2e}')

# Gate 3: rf_t in reasonable range (decimals, not percents)
bad_rf = ~df['rf_t'].between(-0.1, 0.3)
if bad_rf.any():
    print(f'[WARN] Filtering {bad_rf.sum()} rows with rf_t outside [-0.1, 0.3]')
    df = df[~bad_rf].copy()
print(f'  rf_t: {df["rf_t"].min():.4f} to {df["rf_t"].max():.4f}')

# Gate 4: sigma_E_tminus1 in reasonable range
valid_sigma = df['sigma_E_tminus1'].notna()
bad_sigma = valid_sigma & ~df['sigma_E_tminus1'].between(1e-4, 3.0)
if bad_sigma.any():
    print(f'[WARN] Filtering {bad_sigma.sum()} rows with sigma_E_tminus1 outside [0.0001, 3.0]')
    print(f'  Range before filter: {df.loc[valid_sigma, "sigma_E_tminus1"].min():.4f} to {df.loc[valid_sigma, "sigma_E_tminus1"].max():.4f}')
    df = df[~bad_sigma].copy()
    valid_sigma = df['sigma_E_tminus1'].notna()
if valid_sigma.any():
    print(f'  sigma_E_tminus1: {df.loc[valid_sigma, "sigma_E_tminus1"].min():.4f} to {df.loc[valid_sigma, "sigma_E_tminus1"].max():.4f}')

# Show leverage distribution
print('\n  E_t/F_t leverage ratio:')
print(df.eval('E_t/F_t').describe())

print(f'\n[INFO] After filtering: {len(df)} rows ready for solver')

# Reset index after filtering to avoid duplicates
df = df.reset_index(drop=True)


[INFO] Validating inputs before solver...
  E_t: 8.12e+07 to 4.89e+11
  F_t: 1.40e+04 to 6.53e+11
  rf_t: 0.0004 to 0.0495
  sigma_E_tminus1: 0.1445 to 0.9880

  E_t/F_t leverage ratio:
count      1344.000000
mean        162.528717
std        3415.977975
min           0.098664
25%           1.386046
50%           2.667520
75%           5.142831
max      107896.482941
dtype: float64

[INFO] After filtering: 1344 rows ready for solver


In [11]:
# 7.1 Run the stable Merton solver
print('[INFO] Running stable Merton solver on each row...')

results = []
solver_rows = df['sigma_E_tminus1'].notna()
total_rows = solver_rows.sum()

print(f'  Processing {total_rows} rows...')

for idx, row in df[solver_rows].iterrows():
    result = solve_one(
        E_obs=row['E_t'],
        sE_obs=row['sigma_E_tminus1'],
        F=row['F_t'],
        rf=row['rf_t'],
        T=1.0
    )
    
    if result is not None:
        V, sV, cost, nfev, status = result
        results.append({
            'index': idx,
            'asset_value': V,
            'asset_vol': sV,
            'solver_cost': cost,
            'nfev': nfev,
            'status_flag': 'converged'
        })
    else:
        results.append({
            'index': idx,
            'asset_value': np.nan,
            'asset_vol': np.nan,
            'solver_cost': np.nan,
            'nfev': 0,
            'status_flag': 'no_converge'
        })

# Merge results
results_df = pd.DataFrame(results).set_index('index')
df = df.join(results_df)

# Fill status_flag for rows without sigma_E
df['status_flag'] = df['status_flag'].fillna('no_sigma_E')

# Report
converged = (df['status_flag'] == 'converged').sum()
print(f'\n[INFO] Solver complete:')
print(f'  Converged: {converged}/{total_rows} ({100*converged/total_rows:.1f}%)' if total_rows > 0 else '  No rows to solve')
print('\nStatus counts:')
print(df['status_flag'].value_counts())


[INFO] Running stable Merton solver on each row...
  Processing 1344 rows...

[INFO] Solver complete:
  Converged: 1344/1344 (100.0%)

Status counts:
status_flag
converged    1344
Name: count, dtype: int64


In [12]:
# 7. Define solver and run a quick sanity check

# Guarded imports allow this cell to run standalone in interactive sessions
try:
    pd  # type: ignore[name-defined]
except NameError:  # pragma: no cover - interactive safeguard
    import pandas as pd  # noqa: F401
try:
    np  # type: ignore[name-defined]
except NameError:  # pragma: no cover - interactive safeguard
    import numpy as np
try:
    norm  # type: ignore[name-defined]
except NameError:  # pragma: no cover - interactive safeguard
    from scipy.stats import norm  # noqa: F401
try:
    least_squares  # type: ignore[name-defined]
except NameError:  # pragma: no cover - interactive safeguard
    from scipy.optimize import least_squares
try:
    tqdm  # type: ignore[name-defined]
except NameError:  # pragma: no cover - interactive safeguard
    from tqdm.auto import tqdm

if 'T' not in globals():  # pragma: no cover - interactive safeguard
    T = 1.0


def merton_solver(row, *, T=T, tol_E=tol_E, tol_sigma=tol_sigma, max_iter=max_iter):
    '''Solve for asset value and volatility using the Merton model.

    Returns a dictionary with solver diagnostics so downstream analysis can
    inspect failures without re-running the optimizer.
    '''
    result_template = {
        'asset_value': np.nan,
        'asset_vol': np.nan,
        'E_model': np.nan,
        'sigmaE_model': np.nan,
        'r1': np.nan,
        'r2': np.nan,
        'solver_message': '',
        'nfev': np.nan,
        'iterations': np.nan,
        'status_flag': 'invalid_inputs',
    }

    E = row['market_cap']
    sigma_E = row['equity_vol']
    F = row['F']
    r_f = row['rf']

    # 1. Input validation
    if pd.isna(E) or pd.isna(sigma_E) or pd.isna(F):
        result = result_template.copy()
        result['solver_message'] = 'missing_input'
        return result
    if E <= 0 or sigma_E <= 0 or F < 0:
        result = result_template.copy()
        result['solver_message'] = 'invalid_value'
        return result
    if F == 0:
        result = result_template.copy()
        result['status_flag'] = 'no_debt'
        result['solver_message'] = 'zero_debt'
        return result

    # 2. System of Merton equations
    def residuals(x):
        asset_value, sigma_V = x
        if sigma_V <= 0:
            return np.array([np.inf, np.inf])
        d1 = (np.log(asset_value / F) + (r_f + 0.5 * sigma_V**2) * T) / (sigma_V * np.sqrt(T))
        d2 = d1 - sigma_V * np.sqrt(T)
        E_model = asset_value * norm.cdf(d1) - F * np.exp(-r_f * T) * norm.cdf(d2)
        sigmaE_model_obs = (asset_value / max(E, np.finfo(float).eps)) * norm.cdf(d1) * sigma_V
        r1 = E_model - E
        r2 = sigmaE_model_obs - sigma_E
        return np.array([r1, r2])

    # 3. Initial guess and solve (with bounds)
    lower_V = np.nextafter(F, np.inf)
    lower_bounds = np.array([lower_V, 1e-6])
    upper_bounds = np.array([np.inf, np.inf])
    initial_V = max(E + F, lower_V * 1.0001)
    initial_sigma = max(min(sigma_E, 1.0), 1e-6)
    initial = np.array([initial_V, initial_sigma])

    try:
        sol = least_squares(
            residuals,
            initial,
            bounds=(lower_bounds, upper_bounds),
            loss='soft_l1',
            max_nfev=max_iter,
        )
    except ValueError as exc:
        result = result_template.copy()
        result['solver_message'] = f'least_squares_error: {exc}'
        return result

    asset_value, sigma_V = sol.x
    d1 = (np.log(asset_value / F) + (r_f + 0.5 * sigma_V**2) * T) / (sigma_V * np.sqrt(T))
    d2 = d1 - sigma_V * np.sqrt(T)
    E_model = asset_value * norm.cdf(d1) - F * np.exp(-r_f * T) * norm.cdf(d2)
    if E_model <= 0:
        sigmaE_model = np.nan
    else:
        sigmaE_model = (asset_value / E_model) * norm.cdf(d1) * sigma_V
    sigmaE_model_obs = (asset_value / max(E, np.finfo(float).eps)) * norm.cdf(d1) * sigma_V
    r1 = E_model - E
    r2 = sigmaE_model_obs - sigma_E

    converged = sol.success and abs(r1) <= tol_E and abs(r2) <= tol_sigma

    result = result_template.copy()
    result.update({
        'asset_value': asset_value,
        'asset_vol': sigma_V,
        'E_model': E_model,
        'sigmaE_model': sigmaE_model,
        'r1': r1,
        'r2': r2,
        'solver_message': sol.message,
        'nfev': sol.nfev,
        'iterations': getattr(sol, 'njev', np.nan),
        'status_flag': 'converged' if converged else 'no_converge',
    })
    return result


def run_merton_solver(df):
    '''Apply the Merton solver row-wise with a progress bar.'''
    records = []
    for _, row in tqdm(df.iterrows(), total=len(df), desc='Solving Merton model'):
        records.append(merton_solver(row))
    return pd.DataFrame(records, index=df.index)


# ---- Quick check on the first row ----
first_row = df.iloc[0]
first_result = merton_solver(first_row)
print(
    f"Results for {first_row['instrument']} {first_row['year']}: "
    f"asset_value = {first_result['asset_value']:.2f}, "
    f"asset_vol = {first_result['asset_vol']:.4f}, "
    f"status = {first_result['status_flag']}"
)

Results for ABCB 2016: asset_value = 3633285480.83, asset_vol = 0.2187, status = no_converge
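
One possible reading of the `no_converge` status above: `merton_solver` compares the raw price residual, measured in USD, against `tol_E = 1e-6`, which is a very tight absolute threshold for equity values of order $10^9$. Whether that is the cause here has not been verified. A scale-aware check could look like the sketch below. The function name `converged_relative` and the parameter `rel_tol_E` are hypothetical.

```python
def converged_relative(E_model, E_obs, sigmaE_model, sigmaE_obs,
                       rel_tol_E=1e-6, tol_sigma=1e-4):
    """Convergence test with the price tolerance scaled by the equity value."""
    price_ok = abs(E_model - E_obs) <= rel_tol_E * max(abs(E_obs), 1.0)
    vol_ok = abs(sigmaE_model - sigmaE_obs) <= tol_sigma
    return price_ok and vol_ok


# Illustrative numbers: a 50 USD miss on a 3 billion USD equity value.
E_obs = 3.0e9
E_model = E_obs + 50.0

absolute_ok = abs(E_model - E_obs) <= 1e-6                      # absolute test
relative_ok = converged_relative(E_model, E_obs, 0.30, 0.30)    # scaled test
print(absolute_ok, relative_ok)  # False True
```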


## 7. Compute Distance to Default (DD) and Probability of Default (PD) in Detail

Once we have solved for:

- $V$ = total asset value  
- $\sigma_V$ = asset volatility  

we compute:

1. **Distance to Default**  

   $$
     DD \;=\; \frac{\ln\!\bigl(V/F\bigr)\;+\;\bigl(r_{f} - \tfrac12\,\sigma_{V}^{2}\bigr)\,T}
                   {\sigma_{V}\,\sqrt{T}}
   $$

   This is the $d_{2}$ term from Section 6, since $d_{2} = d_{1} - \sigma_{V}\sqrt{T}$.

   - **Numerator**  
     - $\ln(V/F)$: how far assets exceed debt on a log scale  
     - $\bigl(r_f - \tfrac12\,\sigma_V^2\bigr)T$: drift adjustment for risk-free growth minus half the variance  
   - **Denominator**  
     - $\sigma_V\sqrt{T}$: scales by volatility over the horizon  

2. **Probability of Default**  

   $$
     PD \;=\; \Phi(-DD)
   $$

   where $\Phi$ is the standard normal CDF. Intuitively, a low DD means a higher chance that assets fall below debt.

We also handle the special case **no debt** (F=0), for which DD and PD are undefined (we set them to NaN).
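
A small worked example of the two formulas, including the no-debt case, may help. This is a sketch with illustrative inputs. It uses `statistics.NormalDist` in place of `scipy.stats.norm`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def distance_to_default(V, sigma_V, F, r_f, T=1.0):
    """DD = [ln(V/F) + (r_f - 0.5*sigma_V^2)*T] / (sigma_V*sqrt(T)); NaN if F <= 0."""
    if F <= 0:
        return float("nan")
    drift = (r_f - 0.5 * sigma_V**2) * T
    return (math.log(V / F) + drift) / (sigma_V * math.sqrt(T))


def probability_of_default(dd):
    """PD = Phi(-DD); NaN propagates from an undefined DD."""
    if math.isnan(dd):
        return float("nan")
    return NormalDist().cdf(-dd)


# Illustrative inputs: assets 20% above debt, 20% asset volatility.
dd = distance_to_default(V=120.0, sigma_V=0.20, F=100.0, r_f=0.03)
pd_value = probability_of_default(dd)
print(f"DD = {dd:.4f}, PD = {pd_value:.4f}")

# No-debt case: both quantities are undefined.
dd_no_debt = distance_to_default(V=120.0, sigma_V=0.20, F=0.0, r_f=0.03)
print(math.isnan(dd_no_debt), math.isnan(probability_of_default(dd_no_debt)))
```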

**IMPORTANT FIX**: The original code had a unit mismatch:
- Market cap (E) was in actual USD
- Debt total (F) was in USD millions, as reported in the source data

This produced unrealistic V/F ratios on the order of millions, which pushed `DD_m` above 100 and drove `PD_m` to 0 through numerical underflow. The fix multiplies `debt_total` by 1,000,000 to convert it to actual USD.
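
A minimal sketch of the conversion and a quick check for the mismatch, assuming debt arrives in USD millions and market cap in actual USD. The figures and the helper name `median_equity_to_debt` are invented for illustration and do not come from the dataset.

```python
from statistics import median

USD_PER_MILLION = 1_000_000

# Illustrative figures (not from the dataset)
market_cap_usd = [3.2e9, 8.5e8, 1.4e10]   # E, actual USD
debt_total_musd = [2.9e4, 7.1e3, 1.2e5]   # F, USD millions as reported

def median_equity_to_debt(equity, debt):
    """Median E/F ratio; values far from order 1 hint at a unit mismatch."""
    return median(e / f for e, f in zip(equity, debt) if f > 0)

ratio_before = median_equity_to_debt(market_cap_usd, debt_total_musd)

# The fix: express debt in actual USD before solving
debt_total_usd = [f * USD_PER_MILLION for f in debt_total_musd]
ratio_after = median_equity_to_debt(market_cap_usd, debt_total_usd)

print(f"median E/F before fix: {ratio_before:.3g}")
print(f"median E/F after fix:  {ratio_after:.3g}")
```

A median ratio many orders of magnitude away from 1 is a cheap signal that the two legs are in different units. The diagnostics in Section 8 log the same kind of check.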

Below is code that computes these step by step, with comments.

In [13]:
# 7.2 Compute DD_m and PD_m safely
# d1 = [ln(V/F) + (r_f + 0.5*sigma_V^2)T] / (sigma_V*sqrt(T))
# d2 = d1 - sigma_V*sqrt(T)  -> DD_m = d2,  PD_m = Phi(-d2)

print('[INFO] Computing DD_m and PD_m...')

Phi = getattr(norm, 'cdf', None)
if Phi is None:
    from math import erf
    def Phi(x):
        x = np.asarray(x, dtype=float)
        return 0.5*(1.0 + np.vectorize(erf)(x/np.sqrt(2.0)))

F_t  = df['F']
rf_t = df['rf']
V_t  = df['asset_value']
sV_t = df['asset_vol']
T = 1.0

converged = (df['status_flag'] == 'converged')
valid = (
    converged
    & np.isfinite(V_t) & (V_t > 0)
    & np.isfinite(sV_t) & (sV_t > 0)
    & np.isfinite(F_t) & (F_t > 0)
    & np.isfinite(rf_t)
)

# Initialize outputs
df['d1'] = np.nan
df['d2'] = np.nan
df['DD_m'] = np.nan
df['PD_m'] = np.nan
df['solver_status'] = df['status_flag']

# Compute only on valid rows
idx = np.where(valid)[0]
if idx.size:
    srt = sV_t.values[idx] * np.sqrt(T)
    d1 = (np.log(V_t.values[idx] / F_t.values[idx]) + (rf_t.values[idx] + 0.5 * sV_t.values[idx]**2) * T) / srt
    d2 = d1 - srt
    # Numerical safety: cap d1 and d2 at +/-35 (rows at the cap report DD_m = 35.00)
    d1 = np.clip(d1, -35, 35)
    d2 = np.clip(d2, -35, 35)
    
    df.loc[valid, 'd1']   = d1
    df.loc[valid, 'd2']   = d2
    df.loc[valid, 'DD_m'] = d2
    df.loc[valid, 'PD_m'] = Phi(-d2)

print(f'  Valid rows used for DD_m: {int(valid.sum())}')
if valid.sum() > 0:
    print(f'  DD_m range: {df.loc[valid, "DD_m"].min():.2f} to {df.loc[valid, "DD_m"].max():.2f}')
    print(f'  PD_m range: {df.loc[valid, "PD_m"].min():.2e} to {df.loc[valid, "PD_m"].max():.2e}')


[INFO] Computing DD_m and PD_m...
  Valid rows used for DD_m: 1344
  DD_m range: 0.86 to 35.00
  PD_m range: 1.12e-268 to 1.94e-01
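
The upper end of the `DD_m` range (35.00) and the lower end of the `PD_m` range coincide with the clip applied in the cell above, so they likely reflect the cap rather than model output. The sketch below, using only the standard library, evaluates $\Phi(-35)$ and shows one way to count capped values. The names `DD_CAP`, `norm_cdf_neg` and `count_capped` and the sample list are hypothetical.

```python
import math

DD_CAP = 35.0  # same bound as the np.clip call above

def norm_cdf_neg(x):
    """Phi(-x) computed with erfc, which stays accurate in the far tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

pd_floor = norm_cdf_neg(DD_CAP)
print(f"Phi(-{DD_CAP:g}) = {pd_floor:.2e}")

def count_capped(dd_values, cap=DD_CAP):
    """Number of DD values at or beyond the clip bound (either sign)."""
    return sum(1 for d in dd_values if abs(d) >= cap)

sample_dd = [0.86, 7.84, 35.0, 35.0]  # illustrative values only
n_capped = count_capped(sample_dd)
print(f"capped rows in sample: {n_capped}")
```

Counting capped rows before interpreting the tails helps separate genuinely safe firms from rows where the inputs (for example, remaining unit issues in `F`) push DD beyond a plausible range.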


In [14]:
# 7.3 Quick sanity checks and preview
preview_cols = [
    'instrument', 'year', 'asset_value', 'asset_vol', 'E_model', 'sigmaE_model',
    'r1', 'r2', 'd1', 'd2', 'DD_m', 'PD_m', 'status_flag', 'solver_status'
]
# Only show columns that exist
preview_cols = [c for c in preview_cols if c in df.columns]
print('Preview of results:')
print(df[preview_cols].head())
print()

print('Solver status counts:')
print(df['status_flag'].value_counts())
print()

# Extra diagnostics
valid = (df['status_flag'] == 'converged') & df['DD_m'].notna()
if valid.sum() > 0:
    print(f'Valid rows used for DD_m: {int(valid.sum())}')
    print('\nV/F (leverage) summary on valid rows:')
    print((df.loc[valid, 'asset_value'] / df.loc[valid, 'F']).describe())
else:
    print('[WARN] No valid DD_m values computed')


Preview of results:
  instrument  year   asset_value  asset_vol         d1        d2      DD_m  \
0       ABCB  2016  3.633310e+09   0.197770   8.968234  8.770464  8.770464   
1       ABCB  2017  3.685325e+09   0.229487  10.204427  9.974939  9.974939   
2       ABCB  2018  2.439065e+09   0.232307   9.808610  9.576303  9.576303   
3       ABCB  2019  4.445621e+09   0.188803   5.799107  5.610304  5.610304   
4       ABCB  2020  3.182115e+09   0.209984   8.390052  8.180068  8.180068   

           PD_m status_flag solver_status  
0  8.896580e-19   converged     converged  
1  9.811087e-24   converged     converged  
2  5.029064e-22   converged     converged  
3  1.009855e-08   converged     converged  
4  1.418415e-16   converged     converged  

Solver status counts:
status_flag
converged    1344
Name: count, dtype: int64

Valid rows used for DD_m: 1344

V/F (leverage) summary on valid rows:
count      1344.000000
mean        163.513786
std        3415.978008
min           1.094271
25%  

## 8. Export Results and Log Diagnostics

In this final step, we:

1. **Save** the full DataFrame (including `DD_m` and `PD_m`) to CSV for downstream modelling.  
2. **Append** a diagnostic summary to our log file, including:  
   - Total rows processed  
   - Solver status breakdown  
   - Basic statistics on `DD_m` and `PD_m`  
   - Count of missing or failed estimates  

In [15]:
# 8.1 Archiving and timestamped output
from datetime import datetime
import pytz
import shutil
import glob
import os

def get_timestamp_cdt():
    """Generate timestamp in YYYYMMDD_HHMMSS format (US Central time, America/Chicago)"""
    cdt = pytz.timezone('America/Chicago')
    return datetime.now(cdt).strftime('%Y%m%d_%H%M%S')

def archive_old_files(output_dir, archive_dir, dataset_type, max_keep=5):
    """Move all existing dataset_type CSVs to the archive, then prune the archive to the max_keep most recent"""
    pattern = str(output_dir / f"{dataset_type}_*.csv")
    old_files = sorted(glob.glob(pattern), key=lambda x: os.path.getmtime(x), reverse=True)
    
    # Move all existing files to archive
    for old_file in old_files:
        archive_path = archive_dir / os.path.basename(old_file)
        shutil.move(old_file, str(archive_path))
        print(f"[ARCHIVE] Moved to archive: {os.path.basename(old_file)}")
    
    # Clean up archive to keep only max_keep files
    archive_pattern = str(archive_dir / f"{dataset_type}_*.csv")
    archive_files = sorted(glob.glob(archive_pattern), key=lambda x: os.path.getmtime(x), reverse=True)
    
    for old_archive in archive_files[max_keep:]:
        os.remove(old_archive)
        print(f"[CLEANUP] Removed old archive: {os.path.basename(old_archive)}")

# Rename any legacy DDm/PDm columns to the standard DD_m/PD_m names (no-op if already renamed)
df = df.rename(columns={'DDm': 'DD_m', 'PDm': 'PD_m'})

# Archive old market files and save new one with timestamp
archive_old_files(output_dir, archive_dir, 'market', max_keep=5)

timestamp = get_timestamp_cdt()
output_fp = output_dir / f'market_{timestamp}.csv'
# Provenance columns expected for the time-integrity audit (reference list only; the full DataFrame is exported below)
provenance_cols = ["E_t", "F_t", "rf_t", "sigma_E_tminus1", 
                   "sigmaE_window_start_year", "sigmaE_window_end_year", 
                   "V_t", "sigma_V_t", "d1", "d2", "DD_m", "PD_m", 
                   "solver_status", "resid_price", "resid_vol"]

df.to_csv(output_fp, index=False)
print(f"[INFO] Results exported to: {output_fp}")

# 8.2 Append diagnostics to the log file
with open(log_fp, 'a') as log:
    log.write("\n=== DD/PD Market-Based Model Diagnostics ===\n")
    # Total rows
    total = len(df)
    log.write(f"Total rows processed: {total}\n")
    # Deduplication audit
    if duplicate_log_entries:
        log.write("\nDeduplicated instrument-year rows (kept max debt_total):\n")
        for entry in duplicate_log_entries:
            log.write(entry + "\n")
    else:
        log.write("\nDeduplicated instrument-year rows: none detected.\n")
    # Rows removed due to invalid inputs
    if invalid_input_summary is not None and not invalid_input_summary.empty:
        log.write("\nRows dropped due to invalid inputs (solver_status=invalid_inputs):\n")
        log.write(invalid_input_summary.to_string(index=False) + "\n")
    else:
        log.write("\nRows dropped due to invalid inputs: none.\n")
    # Surviving rows by year
    if surviving_rows_by_year is not None and not surviving_rows_by_year.empty:
        log.write("\nSurviving rows by year after input validation:\n")
        log.write(surviving_rows_by_year.to_string() + "\n")
    else:
        log.write("\nSurviving rows by year after input validation: not available.\n")
    # Solver status counts
    status_counts = df['solver_status'].value_counts()
    log.write("\nSolver status counts:\n")
    log.write(status_counts.to_string() + "\n")
    # DD_m and PD_m summary
    log.write("\nDistance to Default (DD_m) summary:\n")
    log.write(df['DD_m'].describe().to_string() + "\n")
    log.write("\nProbability of Default (PD_m) summary:\n")
    log.write(df['PD_m'].describe().to_string() + "\n")
    # Unit check: market cap over debt (E/F); infinities from F = 0 are dropped
    F_values = df['F']
    leverage_ratio = (df['market_cap'] / F_values).replace([np.inf, -np.inf], np.nan).dropna()
    if not leverage_ratio.empty:
        median_ratio = leverage_ratio.median()
        log.write(f"\nUnit check (market_cap / F_values) median: {median_ratio:.3f}\n")
        log.write("Expected order of magnitude ~= 1 when both legs are in USD.\n")
    else:
        log.write("\nUnit check skipped: insufficient data for market_cap/F comparison.\n")
    # Missing/failure counts
    missing_dd = df['DD_m'].isna().sum()
    missing_pd = df['PD_m'].isna().sum()
    log.write(f"\nRows with missing DD_m: {missing_dd}\n")
    log.write(f"Rows with missing PD_m: {missing_pd}\n")

print(f"[INFO] Diagnostics appended to log: {log_fp}")

[ARCHIVE] Moved to archive: market_20251011_042629.csv
[CLEANUP] Removed old archive: market_20251005_032750.csv
[INFO] Results exported to: /Users/guillaumebld/Documents/Graduate_Research/Professor Abol Jalilvand/fall2025/risk_bank/risk_bank/data/outputs/datasheet/market_20251014_022125.csv
[INFO] Diagnostics appended to log: /Users/guillaumebld/Documents/Graduate_Research/Professor Abol Jalilvand/fall2025/risk_bank/risk_bank/data/logs/dd_pd_market_log.txt


In [16]:
# Generate summary statistics by year
print('[INFO] Generating summary statistics by year...')

# Filter to converged rows only for summary
converged_df = df[df['status_flag'] == 'converged'].copy()

if len(converged_df) > 0:
    # Percentiles of DD_m and PD_m, by year and overall
    quantile_levels = [0.1, 0.25, 0.5, 0.75, 0.9]

    def percentile_row(series, year, metric):
        """Return one summary row (p10..p90) for the given series."""
        q = series.quantile(quantile_levels)
        return {
            'year': year,
            'metric': metric,
            'p10': q[0.1],
            'p25': q[0.25],
            'p50': q[0.5],
            'p75': q[0.75],
            'p90': q[0.9],
        }

    summary_data = []

    # Per-year rows: all DD_m rows first, then all PD_m rows
    for metric in ('DD_m', 'PD_m'):
        for year in sorted(converged_df['year'].unique()):
            year_data = converged_df[converged_df['year'] == year]
            summary_data.append(percentile_row(year_data[metric], year, metric))

    # Overall rows across all years
    for metric in ('DD_m', 'PD_m'):
        summary_data.append(percentile_row(converged_df[metric], 'overall', metric))
    
    # Create summary DataFrame
    summary_df = pd.DataFrame(summary_data)
    
    # Save summary to analysis directory
    summary_fp = base_dir / 'data' / 'outputs' / 'analysis' / f'market_{timestamp}_summary.csv'
    summary_df.to_csv(summary_fp, index=False)
    print(f'[INFO] Summary statistics saved to {summary_fp.name}')
    print('\nSummary preview:')
    print(summary_df.head(10))
else:
    print('[WARN] No converged rows found. Summary not generated.')


[INFO] Generating summary statistics by year...
[INFO] Summary statistics saved to market_20251014_022125_summary.csv

Summary preview:
   year metric           p10           p25           p50           p75  \
0  2016   DD_m  5.476170e+00  6.515713e+00  7.843591e+00  9.107503e+00   
1  2017   DD_m  4.708746e+00  5.567261e+00  7.115652e+00  8.551055e+00   
2  2018   DD_m  5.098445e+00  5.931710e+00  7.065331e+00  8.304750e+00   
3  2019   DD_m  5.244325e+00  6.091519e+00  7.658579e+00  8.980456e+00   
4  2020   DD_m  5.235378e+00  6.194743e+00  7.559966e+00  9.755017e+00   
5  2021   DD_m  1.928283e+00  2.321138e+00  2.879043e+00  3.619382e+00   
6  2022   DD_m  4.050001e+00  4.566353e+00  5.665929e+00  6.809619e+00   
7  2023   DD_m  4.378479e+00  4.947104e+00  5.950258e+00  7.694768e+00   
8  2016   PD_m  1.432983e-29  4.216354e-20  2.189579e-15  5.020230e-11   
9  2017   PD_m  8.457630e-25  9.406359e-18  5.569258e-13  1.308464e-08   

            p90  
0  1.140263e+01  
1  1.021542e+