# Initial Calibration — Sample Walkthrough

This notebook illustrates the diode-array detector (DAD) calibration pipeline using the lightweight samples in `data-sample/initial_calibration_dad/`. It mirrors the processing performed by the Stage A/B scripts and summarizes the resulting concentrations by UV dose.

In [None]:
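from pathlib import Path
import sys

# Assumes the notebook is launched from the repository root; adjust if it runs from a subdirectory.
project_root = Path.cwd()
# Make the project's `src/` packages importable from this notebook.
sys.path.append(str(project_root / 'src'))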


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

# Lightweight samples shipped with the repository, and the full raw-data directory.
sample_dir = project_root / 'data-sample' / 'initial_calibration_dad'
raw_dir = project_root / 'data' / 'raw' / 'initial_calibration' / 'DAD_RAW_FILES'

# Load the corrected concentrations and the raw AUC samples.
corrected = pd.read_csv(sample_dir / 'DAD_derived_concentrations_corrected_sample.csv')
raw_auc = pd.read_csv(sample_dir / 'diode_array_auc_sample.csv')
print(f"Corrected rows: {len(corrected)} | Raw AUC rows: {len(raw_auc)}")
corrected.head()


Inspect the raw chromatogram areas to show the inputs feeding the calibration model.

In [None]:
# All three chromatogram-area tables live in the same subdirectory.
areas_dir = raw_dir / 'scytonemin_chromatogram_areas'
raw_total = pd.read_csv(areas_dir / 'raw_total_scytonemin.csv')
raw_oxidized = pd.read_csv(areas_dir / 'raw_oxidized_scytonemin.csv')
raw_reduced = pd.read_csv(areas_dir / 'raw_reduced_scytonemin.csv')
raw_total.head()

Using the calibration fit (`calibration_total.json`), the trimmed mean concentration is computed as `slope * AUC + intercept`. The cell below recomputes that conversion for the sample data and lists it beside the stored `predicted_total_mg_ml` column so the two can be compared.

In [None]:
import json
calibration = json.loads((project_root / 'data/reference/initial_calibration/calibration_total.json').read_text())
calc_total_mg_ml = calibration['slope'] * corrected['auc_total_320_480'] + calibration['intercept']
comparison = pd.DataFrame({
    'sample_id': corrected['sample_id'],
    'predicted_total_mg_ml': corrected['predicted_total_mg_ml'],
    'recalc_total_mg_ml': calc_total_mg_ml
})
comparison.head()
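
Eyeballing `head()` only covers the first few rows. A numeric comparison over every row is a stricter check. The following is a minimal sketch on made-up numbers: the slope, intercept, AUC values, and tolerances are hypothetical and are not taken from `calibration_total.json` or the sample CSVs.

```python
import numpy as np
import pandas as pd


def linear_calibration(auc: pd.Series, slope: float, intercept: float) -> pd.Series:
    """Convert AUC values to concentrations with a straight-line fit."""
    return slope * auc + intercept


# Hypothetical coefficients and data, for illustration only.
calibration = {"slope": 2.0e-4, "intercept": 0.01}
frame = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3"],
    "auc_total_320_480": [100.0, 250.0, 400.0],
    "predicted_total_mg_ml": [0.03, 0.06, 0.09],  # stands in for the stored column
})

recalc = linear_calibration(
    frame["auc_total_320_480"], calibration["slope"], calibration["intercept"]
)

# Compare every row, allowing for floating-point rounding.
abs_diff = (recalc - frame["predicted_total_mg_ml"]).abs()
matches = np.isclose(recalc, frame["predicted_total_mg_ml"], rtol=1e-6, atol=1e-9)
print(f"max abs diff: {abs_diff.max():.3e} | all rows match: {matches.all()}")
```

In the notebook itself the same comparison would use `calc_total_mg_ml` and `corrected['predicted_total_mg_ml']`. A suitable tolerance depends on how many digits the pipeline writes to the CSV.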

Dry-weight concentrations (`mg/g DW`) derive from the volumetric prediction divided by the measured biomass. The following cell recomputes that step and lists the result beside the stored column so any disagreement is visible.

In [None]:
derived_mg_per_g = calc_total_mg_ml / corrected['dry_biomass_g']
check = pd.DataFrame({
    'sample_id': corrected['sample_id'],
    'stored_mg_per_g': corrected['predicted_total_mg_per_gDW'],
    'recalc_mg_per_g': derived_mg_per_g
})
check.head()
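
To flag disagreements automatically rather than by inspection, a per-row relative difference can be compared against a tolerance. The sketch below assumes the simple division described above. The column values, the `relative_difference` helper, and the `1e-3` tolerance are all hypothetical, and the third row is deliberately inconsistent to show what a flagged sample looks like.

```python
import pandas as pd


def relative_difference(stored: pd.Series, recalc: pd.Series) -> pd.Series:
    """Return |recalc - stored| / |stored| for each row."""
    return (recalc - stored).abs() / stored.abs()


# Hypothetical values, for illustration only.
frame = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3"],
    "total_mg_ml": [0.03, 0.06, 0.09],
    "dry_biomass_g": [0.010, 0.020, 0.015],
    "stored_mg_per_g": [3.0, 3.0, 6.5],  # last row is intentionally wrong
})

# Normalize the volumetric concentration by dry biomass.
frame["recalc_mg_per_g"] = frame["total_mg_ml"] / frame["dry_biomass_g"]
frame["rel_diff"] = relative_difference(frame["stored_mg_per_g"], frame["recalc_mg_per_g"])

tolerance = 1e-3  # hypothetical threshold
flagged = frame.loc[frame["rel_diff"] > tolerance, "sample_id"].tolist()
print(f"Samples exceeding tolerance: {flagged}")
```

A relative rather than absolute difference keeps the check meaningful across samples whose concentrations differ by orders of magnitude.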

Group the corrected concentrations by UV irradiance to reproduce the summary statistics that underpin the thesis figures.

In [None]:
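# Select columns with a list; pandas 2.x rejects tuple-style selection on a groupby.
value_cols = ['predicted_total_mg_per_gDW', 'predicted_oxidized_mg_per_gDW', 'predicted_reduced_mg_per_gDW']
summary = (
    corrected.groupby(['p_uva_mw_cm2', 'p_uvb_mw_cm2'])[value_cols]
    .agg(['mean', 'std'])
)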
summary
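
Aggregating several columns with several statistics yields two-level column labels, which can be awkward to plot or export. One option, sketched here on hypothetical data, is to flatten the labels into single strings such as `predicted_total_mg_per_gDW_mean`. The irradiance and concentration values below are invented for illustration.

```python
import pandas as pd

# Hypothetical measurements: two irradiance groups, two replicates each.
frame = pd.DataFrame({
    "p_uva_mw_cm2": [0.0, 0.0, 1.0, 1.0],
    "p_uvb_mw_cm2": [0.0, 0.0, 0.5, 0.5],
    "predicted_total_mg_per_gDW": [1.0, 3.0, 4.0, 8.0],
    "predicted_oxidized_mg_per_gDW": [0.5, 1.5, 2.0, 4.0],
})

value_cols = ["predicted_total_mg_per_gDW", "predicted_oxidized_mg_per_gDW"]
summary = (
    frame.groupby(["p_uva_mw_cm2", "p_uvb_mw_cm2"])[value_cols]
    .agg(["mean", "std"])
)

# Join the (column, statistic) pairs into flat names, then restore the group keys as columns.
flat = summary.copy()
flat.columns = [f"{col}_{stat}" for col, stat in flat.columns]
flat = flat.reset_index()
print(flat.columns.tolist())
```

The flattened frame can be passed straight to plotting calls or written to CSV without special handling for the column hierarchy.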


Visualize the trimmed mean concentration versus UVB irradiance. The sample subset is small but reproduces the trend seen in the full dataset.

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
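# Pull the mean of the total concentration out of the two-level columns and name it explicitly.
summary_reset = summary[('predicted_total_mg_per_gDW', 'mean')].reset_index(name='total_mean_mg_per_gDW')
ax.scatter(summary_reset['p_uvb_mw_cm2'], summary_reset['total_mean_mg_per_gDW'], color='teal')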
ax.set_xlabel('UVB irradiance (mW/cm²)')
ax.set_ylabel('Total concentration (mg/g DW)')
ax.set_title('Sample dose response (corrected totals)')
plt.show()


### Where to Go Next

- Execute `make reproduce` to rebuild the full calibration outputs into `data-processed/initial_calibration/`.
- Inspect raw chromatogram areas in `data/raw/initial_calibration/DAD_RAW_FILES/scytonemin_chromatogram_areas/` if deeper auditing is required.
- Use the Stage A/B scripts in `src/chromatography/` when running on the full dataset.