In [1]:
import pandas as pd
import numpy as np

## Propagating ΔG uncertainty from Rinnenthal et al. (2011) and extrapolating to 25 °C

We pulled thermodynamic parameters from **Rinnenthal et al. (2011)**. For each base pair/site, the paper reports:

- ΔH, σΔH  
- ΔS, σΔS  
- ΔG, σΔG  

In the publication, ΔG and σΔG are reported at **20 °C**. For our manuscript figure, we would like to **extrapolate ΔG to 25 °C** using the reported ΔH and ΔS.

### Why we need a covariance term
Free energy is computed as:

$$
\Delta G(T) = \Delta H - T\Delta S
$$

To propagate uncertainty in ΔG, we cannot assume ΔH and ΔS are independent, because they are obtained from the same fit and are typically strongly correlated. The full error propagation (including covariance) is:

$$
\sigma_{\Delta G}
=
\sqrt{
\sigma_{\Delta H}^2
+
T^2 \sigma_{\Delta S}^2
-
2T\,\mathrm{Cov}(\Delta H,\Delta S)
}
$$
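
To make the size of the covariance term concrete, here is a minimal sketch comparing the propagated σΔG with and without correlation. The numbers and the helper name `sigma_dG` are hypothetical and chosen for illustration only; they are not values from Rinnenthal et al. (2011).

```python
import math

def sigma_dG(sigma_dH, sigma_dS, T, corr=0.0):
    """Propagated sigma of dG = dH - T*dS for a given dH/dS correlation."""
    cov = corr * sigma_dH * sigma_dS  # Cov = r * sigma_dH * sigma_dS
    var = sigma_dH**2 + (T * sigma_dS)**2 - 2.0 * T * cov
    return math.sqrt(var)

# Hypothetical uncertainties (NOT from the paper), in kcal-based units
SIGMA_DH = 2.0      # kcal/mol
SIGMA_DS = 0.0065   # kcal/mol/K
T_20C = 293.15      # K

sigma_independent = sigma_dG(SIGMA_DH, SIGMA_DS, T_20C, corr=0.0)
sigma_correlated = sigma_dG(SIGMA_DH, SIGMA_DS, T_20C, corr=0.99)

print(f"independent: {sigma_independent:.3f} kcal/mol")
print(f"corr = 0.99: {sigma_correlated:.3f} kcal/mol")
```

With a strong positive correlation, the covariance term cancels most of the variance, so ignoring it would overestimate σΔG by a large factor.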

### Back-calculating Cov(ΔH, ΔS) from the reported σΔG at 20 °C
Because the authors report σΔG at 20 °C, we back-calculate the covariance that makes their reported uncertainty consistent with the full propagation equation.

Starting from:

$$
\mathrm{Var}(\Delta G)
=
\mathrm{Var}(\Delta H)
+
T^2\,\mathrm{Var}(\Delta S)
-
2T\,\mathrm{Cov}(\Delta H,\Delta S)
$$

Re-arranging to isolate the covariance term:

$$
2T\,\mathrm{Cov}(\Delta H,\Delta S)
=
\mathrm{Var}(\Delta H)
+
T^2\,\mathrm{Var}(\Delta S)
-
\mathrm{Var}(\Delta G)
$$

Dividing both sides by \(2T\):

$$
\mathrm{Cov}(\Delta H,\Delta S)
=
\frac{
\mathrm{Var}(\Delta H)
+
T^2\,\mathrm{Var}(\Delta S)
-
\mathrm{Var}(\Delta G)
}{2T}
$$

Written explicitly in terms of standard errors:

$$
\mathrm{Cov}(\Delta H,\Delta S)
=
\frac{
\sigma_{\Delta H}^2
+
T^2\,\sigma_{\Delta S}^2
-
\sigma_{\Delta G}^2
}{2T}
$$

### Unit consistency
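⚠️ **Units must be consistent.** ΔH and ΔG share one energy unit, while ΔS is tabulated in the 1000-fold smaller unit per kelvin (kcal/mol vs. cal/mol/K in our tables; kJ/mol vs. J/mol/K in the source CSV). Entropy must therefore be divided by 1000 before applying the equations: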

$$
\sigma_{\Delta S}^{(\mathrm{kcal})}=\frac{\sigma_{\Delta S}^{(\mathrm{cal})}}{1000}
$$

The implemented form is therefore:

$$
\mathrm{Cov}\!\left(\Delta H,\frac{\Delta S}{1000}\right)
=
\frac{
\sigma_{\Delta H}^2
+
T^2\left(\frac{\sigma_{\Delta S}}{1000}\right)^2
-
\sigma_{\Delta G}^2
}{2T}
$$
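
As a sanity check on the algebra, the sketch below back-calculates the covariance from a set of hypothetical standard errors (not taken from the paper) and then plugs it back into the propagation equation. If the rearrangement and the unit conversion are right, the round trip should return the σΔG we started with.

```python
import math

# Hypothetical standard errors (NOT from the paper)
SIGMA_DH = 2.0        # kcal/mol
SIGMA_DS_CAL = 6.5    # cal/mol/K
SIGMA_DG = 0.3        # kcal/mol, as if reported at 20 °C
T_20C = 293.15        # K

# Convert the entropy uncertainty to kcal/mol/K before using it
sigma_dS_kcal = SIGMA_DS_CAL / 1000.0

# Back-calculated covariance in kcal^2/mol^2/K
cov = (SIGMA_DH**2 + (T_20C * sigma_dS_kcal)**2 - SIGMA_DG**2) / (2.0 * T_20C)

# Forward propagation with that covariance should recover SIGMA_DG
sigma_dG_recovered = math.sqrt(
    SIGMA_DH**2 + (T_20C * sigma_dS_kcal)**2 - 2.0 * T_20C * cov
)

print(f"cov = {cov:.6f} kcal^2/mol^2/K")
print(f"recovered sigma_dG = {sigma_dG_recovered:.6f} kcal/mol")
```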

### Extrapolating to 25 °C
Once Cov(ΔH, ΔS) is back-calculated at 20 °C (T = 293.15 K), we treat it as the covariance implied by the authors’ fit and use it to compute ΔG(T) and σΔG(T) at **25 °C** (T = 298.15 K).
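
The extrapolation step can be sketched as follows, assuming ΔH, ΔS, and their covariance do not change between 20 °C and 25 °C (no heat-capacity term). All inputs and the helper name `extrapolate_dG` are hypothetical. The sketch also computes the implied correlation coefficient, which should lie in [-1, 1]; a value outside that range would mean the reported uncertainties are not mutually consistent, and the variance at the new temperature could then come out negative.

```python
import math

def extrapolate_dG(dH, dS, sigma_dH, sigma_dS, cov, T):
    """Return (dG, sigma_dG) at temperature T; inputs in kcal-based units."""
    dG = dH - T * dS
    var = sigma_dH**2 + (T * sigma_dS)**2 - 2.0 * T * cov
    if var < 0:
        raise ValueError("negative variance: covariance inconsistent with sigmas")
    return dG, math.sqrt(var)

# Hypothetical inputs (NOT from the paper)
DH, DS = 20.0, 0.060                 # kcal/mol, kcal/mol/K
SIGMA_DH, SIGMA_DS = 2.0, 0.0065     # kcal/mol, kcal/mol/K
SIGMA_DG_20C = 0.3                   # kcal/mol
T_20C, T_25C = 293.15, 298.15        # K

# Covariance implied by the 20 °C uncertainty
cov = (SIGMA_DH**2 + (T_20C * SIGMA_DS)**2 - SIGMA_DG_20C**2) / (2.0 * T_20C)

# Implied correlation coefficient; must satisfy |corr| <= 1
corr = cov / (SIGMA_DH * SIGMA_DS)

dG_25, sigma_25 = extrapolate_dG(DH, DS, SIGMA_DH, SIGMA_DS, cov, T_25C)
print(f"corr = {corr:.4f}")
print(f"dG(25 °C) = {dG_25:.3f} ± {sigma_25:.3f} kcal/mol")
```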

### Note on numerical mismatch at 20 °C
Recomputing ΔG at 20 °C from the reported ΔH and ΔS does not exactly reproduce the published ΔG value. However, the discrepancy is within the reported uncertainty, and σΔG is reproduced closely. We therefore treat this difference as negligible and attribute it to rounding and precision differences in the published table.
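
One way to make the "within the reported uncertainty" check explicit is to flag each row where the recomputed ΔG differs from the published value by more than σΔG. The sketch below uses the same column names as our CSV (`dH`, `dS`, `dG`, `dG_err`), but the two rows are hypothetical values in kJ/mol and J/mol/K, not data from the paper.

```python
import pandas as pd

T_20C = 293.15  # K

# Hypothetical rows (NOT from the paper): dH, dG in kJ/mol; dS in J/mol/K
df = pd.DataFrame({
    'dH': [80.0, 60.0],
    'dS': [240.0, 180.0],
    'dG': [9.7, 7.9],
    'dG_err': [0.3, 0.4],
})

# Recompute dG at 20 °C, converting dS from J to kJ
df['dG_calc_20'] = df['dH'] - T_20C * df['dS'] / 1000.0

# True where the recomputed value agrees with the published one within 1 sigma
df['within_err'] = (df['dG_calc_20'] - df['dG']).abs() <= df['dG_err']

print(df[['dG', 'dG_calc_20', 'dG_err', 'within_err']])
```

Rows flagged `False` would deserve a second look before being attributed to rounding.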

In [3]:
import math

def cov_dH_dS(
    sigma_dH,
    sigma_dS,
    sigma_dG,
    T=293.15,
    entropy_units="cal"
):
    """
    Compute Cov(ΔH, ΔS) from propagated errors in ΔG.

    Parameters
    ----------
    sigma_dH : float
        Std. error of ΔH (kcal/mol)
    sigma_dS : float
        Std. error of ΔS (cal/mol/K by default)
    sigma_dG : float
        Std. error of ΔG (kcal/mol)
    T : float
        Temperature in Kelvin (default = 293.15 K, 20 °C)
    entropy_units : {"cal", "kcal"}
        Units of sigma_dS

    Returns
    -------
    cov : float
        Covariance Cov(ΔH, ΔS)
        Units:
        - kcal^2/mol^2/K if entropy_units="kcal"
        - kcal·cal/mol^2/K if entropy_units="cal"
    """

    # Convert entropy uncertainty to kcal/mol/K if needed
    if entropy_units == "cal":
        sigma_dS_kcal = sigma_dS / 1000.0
    elif entropy_units == "kcal":
        sigma_dS_kcal = sigma_dS
    else:
        raise ValueError("entropy_units must be 'cal' or 'kcal'")

    cov_kcal = (
        sigma_dH**2
        + (T * sigma_dS_kcal)**2
        - sigma_dG**2
    ) / (2 * T)

    # Return in requested entropy units
    if entropy_units == "cal":
        return cov_kcal * 1000.0
    else:
        return cov_kcal

def calc_dG(
    dH,
    dS,
    T=293.15,
    entropy_units="cal"
):
    """
    Calculate ΔG from ΔH and ΔS.

    Parameters
    ----------
    dH : float
        Enthalpy change ΔH (kcal/mol)
    dS : float
        Entropy change ΔS (cal/mol/K by default)
    T : float
        Temperature in Kelvin (default = 293.15 K, 20 °C)
    entropy_units : {"cal", "kcal"}
        Units of dS
    Returns
    -------
    dG : float
        Gibbs free energy change ΔG (kcal/mol)
    """
    
    # Convert entropy to kcal/mol/K if needed
    if entropy_units == "cal":
        dS_kcal = dS / 1000.0
    elif entropy_units == "kcal":
        dS_kcal = dS
    else:
        raise ValueError("entropy_units must be 'cal' or 'kcal'")

    dG = dH - T * dS_kcal
    return dG

def calc_dG_error(
    sigma_dH,
    sigma_dS,
    cov_dH_dS,
    T=293.15,
    entropy_units="cal"
):
    """
    Calculate the propagated error in ΔG from errors in ΔH and ΔS.

    Parameters
    ----------
    sigma_dH : float
        Std. error of ΔH (kcal/mol)
    sigma_dS : float
        Std. error of ΔS (cal/mol/K by default)
    cov_dH_dS : float
        Covariance Cov(ΔH, ΔS)
    T : float
        Temperature in Kelvin (default = 293.15 K, 20 °C)
    entropy_units : {"cal", "kcal"}
        Units of sigma_dS and cov_dH_dS
    Returns
    -------
    sigma_dG : float
        Std. error of ΔG (kcal/mol)
    """
    
    # Convert entropy uncertainty and covariance to kcal/mol/K if needed
    if entropy_units == "cal":
        sigma_dS_kcal = sigma_dS / 1000.0
        cov_dH_dS_kcal = cov_dH_dS / 1000.0
    elif entropy_units == "kcal":
        sigma_dS_kcal = sigma_dS
        cov_dH_dS_kcal = cov_dH_dS
    else:
        raise ValueError("entropy_units must be 'cal' or 'kcal'")

    sigma_dG = math.sqrt(
        sigma_dH**2
        + (T * sigma_dS_kcal)**2
        - 2 * T * cov_dH_dS_kcal
    )

    return sigma_dG

In [8]:
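# Values are taken directly from the publication:
# dH in kJ/mol, dS in J/mol/K, dG in kJ/mol.
# The helpers only rely on the factor of 1000 between the energy and entropy
# units, so entropy_units="cal" also handles J/mol/K paired with kJ/mol.

imino_ref = pd.read_csv('fourU_imino_ref_Rinnenthal2011.csv')

# Back-calculate Cov(dH, dS) at 20 °C. Keep it unrounded and in the native
# kJ·J/mol^2/K units: calc_dG_error needs it in the same unit system as
# dH_err and dS_err, so converting or rounding here would corrupt the errors.
imino_ref['cov_dH_dS'] = imino_ref.apply(
    lambda row: cov_dH_dS(
        sigma_dH=row['dH_err'],  # kJ/mol
        sigma_dS=row['dS_err'],  # J/mol/K
        sigma_dG=row['dG_err'],  # kJ/mol
        T=293.15,  # 20 °C
        entropy_units="cal"  # actually J; same 1000x scaling
    ),
    axis=1
)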

imino_ref['dG_calc_25'] = (imino_ref.apply(
    lambda row: calc_dG(
        dH=row['dH'],  # kJ/mol
        dS=row['dS'],  # J/mol/K
        T=298.15,  # 25 °C
        entropy_units="cal"  # actually J; same 1000x scaling
    ),
    axis=1
) / 4.184).round(2)  # convert kJ to kcal, then round

imino_ref['dG_calc_25_err'] = (imino_ref.apply(
    lambda row: calc_dG_error(
        sigma_dH=row['dH_err'],  # kJ/mol
        sigma_dS=row['dS_err'],  # J/mol/K
        cov_dH_dS=row['cov_dH_dS'],  # back-calculated at 20 °C
        T=298.15,  # 25 °C (was 297.15, which did not match dG_calc_25)
        entropy_units="cal"  # actually J; same 1000x scaling
    ),
    axis=1
) / 4.184).round(2)  # convert kJ to kcal, then round


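imino_ref['dG_calc_20'] = (imino_ref.apply(
    lambda row: calc_dG(
        dH=row['dH'],  # kJ/mol
        dS=row['dS'],  # J/mol/K
        T=293.15,  # 20 °C
        entropy_units="cal"  # actually J; same 1000x scaling
    ),
    axis=1
) / 4.184).round(2)  # convert kJ to kcal (previously left in kJ), then round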

imino_ref['dG_calc_20_err'] = (imino_ref.apply(
    lambda row: calc_dG_error(
        sigma_dH=row['dH_err'],  # kJ/mol
        sigma_dS=row['dS_err'],  # J/mol/K
        cov_dH_dS=row['cov_dH_dS'],  # back-calculated at 20 °C
        T=293.15,  # 20 °C
        entropy_units="cal"  # actually J; same 1000x scaling
    ),
    axis=1
) / 4.184).round(2)  # convert kJ to kcal, then round

In [11]:
# filter WT
imino_ref_wt = imino_ref[imino_ref['var'] == 'WT']
imino_ref_wt = imino_ref_wt[['site_comp', 'nt_comp', 'dG', 'dG_err', 'dG_calc_20', 'dG_calc_20_err', 'dG_calc_25', 'dG_calc_25_err']].sort_values('site_comp')

# convert dG and dG_err from kJ to kcal
imino_ref_wt['dG'] = (imino_ref_wt['dG'])/ 4.184
imino_ref_wt['dG_err'] = (imino_ref_wt['dG_err']) / 4.184

# rename site_comp to site and nt_comp to base
imino_ref_wt = imino_ref_wt.rename(columns={'site_comp': 'site', 'nt_comp': 'base'})
imino_ref_wt.to_csv('fourU_imino_ref_WT_Rinnenthal2011_processed.csv', index=False)

In [12]:
# filter A8C (the variable name imino_ref_wt is reused from the WT cell)
imino_ref_wt = imino_ref[imino_ref['var'] == 'A8C']
imino_ref_wt = imino_ref_wt[['site_comp', 'nt_comp', 'dG', 'dG_err', 'dG_calc_20', 'dG_calc_20_err', 'dG_calc_25', 'dG_calc_25_err']].sort_values('site_comp')

imino_ref_wt['dG'] = (imino_ref_wt['dG'])/ 4.184
imino_ref_wt['dG_err'] = (imino_ref_wt['dG_err']) / 4.184

# rename site_comp to site and nt_comp to base
imino_ref_wt = imino_ref_wt.rename(columns={'site_comp': 'site', 'nt_comp': 'base'})
imino_ref_wt.to_csv('fourU_imino_ref_A8C_Rinnenthal2011_processed.csv', index=False)
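
The WT and A8C cells differ only in the variant label, so the shared steps could be factored into one helper. The sketch below is one possible shape for that, not the code used for the manuscript: the helper name `select_variant` and the toy table are hypothetical, only the published `dG`/`dG_err` columns are included for brevity, and the CSV export is left out so the example runs without any files.

```python
import pandas as pd

KJ_PER_KCAL = 4.184

def select_variant(df, variant):
    """Return one variant's rows, sorted by site, with dG and dG_err in kcal/mol."""
    cols = ['site_comp', 'nt_comp', 'dG', 'dG_err']
    out = df.loc[df['var'] == variant, cols].sort_values('site_comp').copy()
    # Published dG and dG_err are in kJ/mol; convert to kcal/mol
    out[['dG', 'dG_err']] = out[['dG', 'dG_err']] / KJ_PER_KCAL
    return out.rename(columns={'site_comp': 'site', 'nt_comp': 'base'})

# Toy table with hypothetical values (NOT from the paper), dG in kJ/mol
toy = pd.DataFrame({
    'var': ['WT', 'A8C', 'WT'],
    'site_comp': [5, 5, 3],
    'nt_comp': ['G', 'G', 'U'],
    'dG': [41.84, 20.92, 8.368],
    'dG_err': [4.184, 2.092, 0.8368],
})

for variant in ['WT', 'A8C']:
    processed = select_variant(toy, variant)
    print(variant, processed.to_dict('records'))
```

In the notebook, each returned frame could then be written out with `to_csv(..., index=False)` as in the cells above.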