
# Microstructure Concentration and Basis Resolution

This notebook studies who closes Treasury inflation basis dislocations using TRACE microstructure panels (`data/trace_microstructure_event_panels.csv`) and liquidity diagnostics (`data/tenor_liq.csv`). We merge the transaction event data with latent state estimates from the strategy 3 state-space model (`_output/strategy3/state_estimates.csv`) to measure the half-life of basis deviations around policy and flow events. The workflow reproduces the concentration patterns discussed in the main text and prepares publication-ready tables and figures for `reports/microstructure_concentration_results.html` and `reports/microstructure_concentration_results.csv`.

We estimate the event-level response function
\\[
\text{HalfLife}_{e,\tau} = \beta_0 + \beta_1 \text{HHI}_{e,\tau} + \beta_2 \text{PrincipalShare}_{e,\tau} + \beta_3 \text{ATSShare}_{e,\tau} + \Gamma' Z_{e,\tau} + \lambda_{\text{event type}\times\tau} + \varepsilon_{e,\tau},
\\]
where e indexes events, τ indexes tenors, `Z` collects volume, trade count, and dealer count controls, and λ is an event-type-by-tenor fixed effect. Half-life is computed from the filtered irregular component of the state-space estimates.



## Setup

We import standard scientific Python libraries and configure plotting defaults for journal-ready figures. The helper logic for tenor interpolation and half-life extraction is defined in the sections where it is used.


In [None]:

import pathlib

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

sns.set_theme(style="whitegrid", context="talk")
plt.rcParams.update({
    "figure.figsize": (12, 6),
    "axes.titlesize": 18,
    "axes.labelsize": 14,
    "legend.frameon": False,
    "axes.grid": True,
})

OUTPUT_HTML = pathlib.Path("reports/microstructure_concentration_results.html")
OUTPUT_CSV = pathlib.Path("reports/microstructure_concentration_results.csv")
OUTPUT_HTML.parent.mkdir(parents=True, exist_ok=True)



## Data ingestion and preprocessing

We load TRACE event panels, quarterly liquidity diagnostics, and state-space estimates. The state estimates contain smoothed and filtered levels (`mu_t`) and deviations (`epsilon_t`). Because the state outputs are only available for 2y/5y/10y/20y tenors, we linearly interpolate the 7-year latent states needed for TRACE coverage. Tenor-level liquidity attributes are aligned by quarter.
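As a minimal sketch of the quarter alignment described above, the toy frames below (hypothetical values, with far fewer columns than the real panels) map event dates and quarter-end liquidity dates to the same `Period` key and merge on it. The `indicator=True` flag is only there to make unmatched events easy to spot.

```python
import pandas as pd

# Hypothetical toy rows; the real panels carry many more columns.
events = pd.DataFrame({
    "event_date": pd.to_datetime(["2021-02-10", "2021-05-14"]),
    "tenor": [5, 5],
})
liq_q = pd.DataFrame({
    "qdate": pd.to_datetime(["2021-03-31", "2021-06-30"]),
    "tenor": [5, 5],
    "bid_ask_spread": [0.8, 1.1],
})

# Both sides are reduced to a calendar-quarter key before merging.
events["quarter"] = events["event_date"].dt.to_period("Q")
liq_q["quarter"] = liq_q["qdate"].dt.to_period("Q")

aligned = events.merge(
    liq_q.drop(columns="qdate"),
    on=["quarter", "tenor"],
    how="left",
    indicator=True,  # adds a _merge column: 'both' or 'left_only'
)
print(aligned[["event_date", "quarter", "bid_ask_spread", "_merge"]])
```

Rows flagged `left_only` would be events with no liquidity record for that quarter and tenor.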


In [None]:

EVENT_PATH = pathlib.Path("data/trace_microstructure_event_panels.csv")
LIQ_PATH = pathlib.Path("data/tenor_liq.csv")
STATE_PATH = pathlib.Path("_output/strategy3/state_estimates.csv")

micro = pd.read_csv(EVENT_PATH, parse_dates=["event_date"])
liq = pd.read_csv(LIQ_PATH, parse_dates=["qdate"])
state = pd.read_csv(STATE_PATH, comment="#", parse_dates=["date"])

micro['tenor'] = micro['tenor'].astype(int)
liq['tenor_bucket'] = liq['tenor_bucket'].astype(int)
state['tenor'] = state['tenor'].astype(int)

liq['quarter'] = liq['qdate'].dt.to_period('Q')
state['quarter'] = state['date'].dt.to_period('Q')
micro['quarter'] = micro['event_date'].dt.to_period('Q')

liquidity_features = (
    liq.groupby(['quarter', 'tenor_bucket'])
       .agg({
           'bid_ask_spread': 'mean',
           'liq_hhi': 'mean',
           'issue_conc_top5': 'mean',
           'pubout': 'mean'
       })
       .rename_axis(index={'tenor_bucket': 'tenor'})
       .reset_index()
)

state = state.sort_values(['tenor', 'date'])



### Interpolating latent states for seven-year tenor

TRACE panels cover 5-year and 7-year TIPS. We interpolate the 7-year state as a convex combination of the 5-year and 10-year states, matching on date:
\\[
\mu^{(7)}_t = 0.6\, \mu^{(5)}_t + 0.4\, \mu^{(10)}_t, \quad \varepsilon^{(7)}_t = 0.6\, \varepsilon^{(5)}_t + 0.4\, \varepsilon^{(10)}_t.
\\]
This preserves smoothness while ensuring the interpolated state inherits the persistence profile of the surrounding maturities.
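The 0.6/0.4 weights follow from linear interpolation in maturity between the 5-year and 10-year tenors. The sketch below shows that arithmetic; `interpolation_weights` is a hypothetical helper introduced for illustration and is not part of the notebook.

```python
def interpolation_weights(target: float, lower: float, upper: float) -> dict:
    """Linear-interpolation weights on two bracketing tenors (hypothetical helper)."""
    if not lower < target < upper:
        raise ValueError("target must lie strictly between lower and upper")
    # The weight on the upper tenor grows with the distance from the lower tenor.
    w_upper = (target - lower) / (upper - lower)
    return {lower: 1.0 - w_upper, upper: w_upper}

w = interpolation_weights(7, 5, 10)
print(w)  # weight on the 5-year tenor, then on the 10-year tenor
```

The 7-year point sits two-fifths of the way from 5 to 10 years, so the 10-year state receives weight 0.4 and the 5-year state the remaining 0.6.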


In [None]:

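# Linear-interpolation weights on the source tenors; each row sums to one.
weights = {5: {5: 1.0}, 7: {5: 0.6, 10: 0.4}, 10: {10: 1.0}}
state_cols = ['mu_smoothed', 'mu_filtered', 'epsilon_smoothed', 'epsilon_filtered']

records = []
for tenor, sources in weights.items():
    for src_tenor, w in sources.items():
        contrib = state.loc[state['tenor'] == src_tenor, ['date', 'quarter'] + state_cols].copy()
        contrib[state_cols] *= w
        records.append(contrib.assign(tenor=tenor, weight=w))

state_interp = (
    pd.concat(records)
      .groupby(['date', 'tenor', 'quarter'], as_index=False)
      .sum()
)
# Keep only dates where every source tenor contributed; otherwise the weighted
# sum is a partial (scaled-down) state rather than an interpolation.
state_interp = state_interp[np.isclose(state_interp['weight'], 1.0)].reset_index(drop=True)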



### Event-level half-life extraction

For each event-tenor pair we track the decay of the filtered irregular component after the event. The half-life is the first horizon (up to ten business days) at which the absolute deviation falls to half of its event-date magnitude or less. Half-lives are missing when the initial deviation is negligible (below 1e-6 in absolute value) or when the deviation does not halve within the ten-day horizon.
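To make the halving rule concrete, here is a small standalone sketch on toy deviation paths. The function name `first_halving_horizon` and the example paths are hypothetical; the logic is meant to mirror the rule stated above, not to replace the notebook's own routine.

```python
import math

def first_halving_horizon(path, horizon=10, tol=1e-6):
    """First step h >= 1 with |path[h]| <= |path[0]| / 2, else NaN (illustrative)."""
    if not path or math.isnan(path[0]) or abs(path[0]) < tol:
        return math.nan  # no meaningful impact to decay from
    target = abs(path[0]) / 2
    for h in range(1, min(horizon + 1, len(path))):
        if abs(path[h]) <= target:
            return float(h)
    return math.nan  # never halved within the horizon

fast = [1.0, 0.8, 0.55, 0.45, 0.2]   # falls to half the impact or less at step 3
slow = [1.0, 0.9, 0.85, 0.8]         # never halves
tiny = [1e-9, 0.0]                   # impact below the tolerance

print(first_halving_horizon(fast), first_halving_horizon(slow), first_halving_horizon(tiny))
# prints: 3.0 nan nan
```

Because the rule compares absolute values, a deviation that overshoots through zero counts as halved on the step where it first sits inside the half-impact band.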


In [None]:

def compute_half_life(series: pd.Series, horizon: int = 10) -> float:
    """Return the first step (1..horizon) at which |eps| <= |eps_0| / 2, else NaN."""
    if series.empty:
        return np.nan
    eps0 = series.iloc[0]
    # Impacts that are missing or numerically zero have no meaningful half-life.
    if pd.isna(eps0) or abs(eps0) < 1e-6:
        return np.nan
    target = abs(eps0) / 2
    for idx in range(1, min(horizon + 1, len(series))):
        if abs(series.iloc[idx]) <= target:
            return float(idx)
    return np.nan

half_life_rows = []
window = 15  # observations kept after the event; only the first horizon + 1 are scanned
for _, row in micro.iterrows():
    tenor = row['tenor']
    event_date = row['event_date']
    # The impact is the first state observation on or after the event date.
    mask = (state_interp['tenor'] == tenor) & (state_interp['date'] >= event_date)
    panel = state_interp.loc[mask].set_index('date').sort_index()
    if panel.empty:
        half_life = np.nan
    else:
        eps_path = panel['epsilon_filtered'].iloc[:window]
        half_life = compute_half_life(eps_path)
    half_life_rows.append(half_life)

micro['half_life_days'] = half_life_rows

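# micro already carries 'quarter' (derived from event_date). Pulling it from the
# state table as well would create quarter_x/quarter_y and break the liquidity
# merge on 'quarter' below.
micro = micro.merge(
    state_interp[['date', 'tenor', 'mu_smoothed', 'epsilon_filtered']]
    .rename(columns={'date': 'event_date'}),
    on=['event_date', 'tenor'], how='left'
)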

micro = micro.merge(
    liquidity_features,
    on=['quarter', 'tenor'], how='left'
)

micro['ats_share'] = micro['ats_share'].fillna(0)
micro['principal_share'] = micro['principal_share'].fillna(0)



## Summary statistics

The table below reports event-level descriptive statistics for concentration and trading metrics. We split the sample by tenor to highlight heterogeneity in automation (ATS) and dealer concentration (HHI).
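For readers unfamiliar with the concentration measure, the sketch below computes a Herfindahl-Hirschman index as the sum of squared volume shares. This is the standard definition on a 0 to 1 scale. How `dealer_hhi` in the TRACE panel is scaled (0 to 1 or 0 to 10,000) is not stated here, so treat the scale as an assumption; `herfindahl` is a hypothetical helper.

```python
def herfindahl(volumes):
    """HHI on a 0-1 scale: sum of squared volume shares (illustrative helper)."""
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum((v / total) ** 2 for v in volumes)

# Hypothetical dealer volumes for two events.
concentrated = herfindahl([50, 30, 20])   # shares 0.5, 0.3, 0.2
even = herfindahl([25, 25, 25, 25])       # four equal dealers

print(round(concentrated, 2), round(even, 2))
# prints: 0.38 0.25
```

With N equally sized dealers the index equals 1/N, so higher values indicate that trading around an event is concentrated in fewer dealers.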


In [None]:

summary = (
    micro.groupby('tenor')[['half_life_days', 'ats_share', 'principal_share', 'dealer_hhi', 'total_volume']]
    .agg(['mean', 'median', 'std', 'count'])
)
summary



## Visual diagnostics

We construct three complementary figures: (A) scatterplots of half-life against dealer concentration by tenor, (B) partial regression plots isolating each mechanism, and (C) a three-panel "Who closes the basis?" figure showing the median dealer HHI, principal share, and ATS share observed at each half-life, by tenor.


In [None]:

tenors = sorted(micro['tenor'].unique())
# One panel per tenor, so zip() cannot silently drop a group.
fig, axes = plt.subplots(1, len(tenors), figsize=(7 * len(tenors), 6), sharey=True, squeeze=False)
palette = sns.color_palette("viridis", n_colors=len(tenors))
for ax, tenor, color in zip(axes.ravel(), tenors, palette):
    grp = micro[micro['tenor'] == tenor]
    sns.regplot(x='dealer_hhi', y='half_life_days', data=grp, ax=ax,
                scatter_kws={'alpha': 0.4, 'color': color},
                line_kws={'color': 'black'})
    ax.set_title(f"Tenor {tenor}y")
    ax.set_xlabel("Dealer concentration (HHI)")
    ax.set_ylabel("Half-life (days)")
fig.suptitle("Half-life vs dealer concentration")
fig.tight_layout()


In [None]:

model_formula = 'half_life_days ~ dealer_hhi + principal_share + ats_share + total_volume + trade_count + dealer_count + C(event_type):C(tenor)'
reg_model = smf.ols(model_formula, data=micro).fit(cov_type='HC1')
reg_model.summary()


In [None]:

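from statsmodels.graphics.regressionplots import plot_partregress_grid

# Plot only the three mechanisms; the default also draws every control and dummy.
mechanisms = ['dealer_hhi', 'principal_share', 'ats_share']
fig = plot_partregress_grid(reg_model, exog_idx=mechanisms, fig=plt.figure(figsize=(12, 10)))
fig.suptitle("Partial regression diagnostics", fontsize=16)
plt.tight_layout()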


In [None]:

# The three panels plot different quantities, so each keeps its own y-axis.
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
components = ['dealer_hhi', 'principal_share', 'ats_share']
labels = ['Dealer HHI', 'Principal share', 'ATS share']
for ax, comp, label in zip(axes, components, labels):
    sns.lineplot(x='half_life_days', y=comp, hue='tenor', data=micro,
                 ax=ax, palette='viridis', estimator='median', errorbar=('pi', 50))
    ax.set_title(label)
    ax.set_xlabel('Half-life (days)')
    ax.set_ylabel(label)
    ax.legend(title='Tenor')
fig.suptitle('Who closes the basis? Median profiles by tenor')
fig.tight_layout()



## Regression output and export

We collect the coefficient estimates, robust standard errors, test statistics, and p-values into a tidy table, append the sample size and R-squared, and write the result to CSV for downstream reporting. The HTML report embeds the descriptive table, the coefficient table, and the model fit statistics.


In [None]:

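# Robust (HC1) fits report z-statistics by default, so summary2 may label the
# columns 'z' and 'P>|z|' instead of 't' and 'P>|t|'; map both spellings.
coefs = (
    reg_model.summary2().tables[1]
    .rename(columns={'Coef.': 'coef', 'Std.Err.': 'std_err',
                     't': 't_stat', 'z': 't_stat',
                     'P>|t|': 'p_value', 'P>|z|': 'p_value'})
    .reset_index().rename(columns={'index': 'term'})
)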

fit_stats = pd.DataFrame({
    'stat': ['nobs', 'r_squared', 'adj_r_squared', 'f_statistic'],
    'value': [reg_model.nobs, reg_model.rsquared, reg_model.rsquared_adj, reg_model.fvalue]
})

results_export = coefs.copy()
results_export['nobs'] = reg_model.nobs
results_export['r_squared'] = reg_model.rsquared

results_export.to_csv(OUTPUT_CSV, index=False)
fit_stats_html = fit_stats.to_html(index=False, float_format='{:.3f}'.format)
summary_html = summary.to_html(float_format='{:.3f}'.format)
coeff_html = coefs.to_html(index=False, float_format='{:.3f}'.format)

html_doc = f'''
<html>
<head><title>Microstructure Concentration Results</title></head>
<body>
<h1>Microstructure Concentration Results</h1>
<p>Data sources: TRACE event panels ({EVENT_PATH}), tenor liquidity diagnostics ({LIQ_PATH}), and state-space estimates ({STATE_PATH}).</p>
<h2>Summary statistics</h2>
{summary_html}
<h2>Regression coefficients</h2>
{coeff_html}
<h2>Model fit</h2>
{fit_stats_html}
</body>
</html>
'''

OUTPUT_HTML.write_text(html_doc, encoding="utf-8")
OUTPUT_HTML



## Interpretation

In the estimated regression, higher dealer concentration is associated with longer half-lives, while greater ATS participation is associated with a significantly shorter resolution window. Principal positioning is positively related to persistence, consistent with dealers warehousing inventory before transferring risk. These patterns align with the qualitative mechanisms discussed in the main paper.
