# **03. Reweighting MD simulations to fit the experimental data**

The **reweighting.py** Python script applies a maximum-entropy-based bias to the same structural ensembles that were used for **calc_hdx.py**. Running **reweighting.py** applies the minimum possible bias to the initial ensemble needed to match the experimental HDX data within a given level of uncertainty. More details can be found in the publication below:

[Bradshaw, R. T., Marinelli, F. et al. (2020) 'Interpretation of HDX Data by Maximum-Entropy Reweighting of Simulated Structural Ensembles', Biophysical Journal, 118(7), 1649-1664](https://www.cell.com/biophysj/fulltext/S0006-3495(20)30124-7)
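
To give a feel for what "minimum possible bias" means, here is a toy sketch of maximum-entropy reweighting for a single synthetic observable. It is not the HDXer algorithm: it ignores experimental uncertainty and HDX-specific terms, and the function names (`tilted_weights`, `weighted_mean`, `maxent_match_mean`) and the data are assumptions made up for illustration. It relies only on the standard result that, among all reweightings of a uniform ensemble that reproduce a target average, the one closest to uniform (in relative entropy) has weights proportional to `exp(-lam * x_i)`.

```python
import math

def tilted_weights(x, lam):
    """Normalised weights proportional to exp(-lam * x_i)."""
    exps = [-lam * xi for xi in x]
    shift = max(exps)  # subtract the largest exponent for numerical stability
    raw = [math.exp(e - shift) for e in exps]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_mean(w, x):
    """Ensemble average of x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def maxent_match_mean(x, target, lo=-50.0, hi=50.0, iters=200):
    """Find lam by bisection so that the reweighted mean of x equals target.

    The reweighted mean decreases monotonically as lam increases,
    so a simple bisection on lam is sufficient for this toy case.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_mean(tilted_weights(x, mid), x) > target:
            lo = mid  # mean still too high: need a larger lam
        else:
            hi = mid
    return tilted_weights(x, 0.5 * (lo + hi))

# Synthetic per-frame observable (hypothetical values, one per frame)
x = [0.2, 0.4, 0.6, 0.8]
w = maxent_match_mean(x, target=0.4)
print("weights:", [round(wi, 4) for wi in w])
print("reweighted mean:", round(weighted_mean(w, x), 6))
```

In this toy case every frame keeps a positive weight; frames are only up- or down-weighted, never discarded.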

### **Necessary inputs**

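- data_folders: one or more directories containing the calc_hdx output .tmp files of contacts and H-bonds per residue
- exp_file: target experimental HDX data file
- kint_file: intrinsic rate file generated by calc_hdx
- times: HDX timepoints in minutes, matching the experimental data
- gamma: gamma value for the reweighting run (see below)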
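
Before launching a long reweighting run, it can help to confirm that the inputs listed above are actually in place. The following is a minimal sketch, not part of HDXer: `check_inputs` is a hypothetical helper, and the `Contacts_*` and `Hbonds_*` file-name patterns are an assumption based on the file prefixes mentioned in the example script's comments.

```python
import glob
import os

def check_inputs(data_folders, exp_file, kint_file):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Each data folder should hold both contact and H-bond files (assumed prefixes)
    for folder in data_folders:
        for prefix in ("Contacts_", "Hbonds_"):
            if not glob.glob(os.path.join(folder, prefix + "*")):
                problems.append(f"no '{prefix}*' files found in {folder}")
    # The experimental data and intrinsic rate files should both exist
    for label, path in (("exp_file", exp_file), ("kint_file", kint_file)):
        if not os.path.isfile(path):
            problems.append(f"{label} not found: {path}")
    return problems
```

A typical use would be to call `check_inputs(folders, expt, rates)` with the same variables passed to the reweighting run and stop early if the returned list is not empty.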

### **Gamma values**

Gamma values are proportional to the level of uncertainty included in the reweighting fit to the target data. See equation 7 of [Bradshaw, Marinelli et al.](https://www.cell.com/biophysj/fulltext/S0006-3495(20)30124-7) for further details.

As the gamma value increases, a greater bias will be applied to the underlying ensemble, the reweighting will fit more tightly to the target data, and the error between the final predicted HDX and the target experimental HDX will be reduced.

However, correct data fitting requires robustness checks. Overfitting, in which the predicted and target data agree more precisely than the true (unknown) level of uncertainty in the target data, can readily occur. We therefore perform multiple reweighting analyses, each with a unique gamma value, to select the optimal gamma value to generate a robust final structural ensemble for further analyses.

Gamma values used for this protocol range from $1 \times 10^{-3}$ to $9 \times 10^{0}$.

### **Example script**

Here is an example script, *gamma_10^-3.py*, that was used to run HDX reweighting using the predicted HDX deuterated fractions from **calc_hdx.py** and experimental HDX data.

```python
#!/usr/bin/env python

import os

# Import the Maximum Entropy reweighting class
from HDXer.reweighting import MaxEnt

### Inputs ###

# A list of folders that contain the 'Contacts_' and 'Hbonds_' files from calc_hdx
folders = [ os.path.expandvars("$HDXER_PATH/protocol/BPTI/BPTI_calc_hdx") ]
# The path to the target experimental data file
expt = os.path.expandvars("$HDXER_PATH/protocol/BPTI/BPTI_expt_data/BPTI_expt_dfracs.dat")
# The path to the file containing intrinsic rates for each residue in your protein, generated by calc_hdx
rates = os.path.expandvars("$HDXER_PATH/protocol/BPTI/BPTI_calc_hdx/BPTI_Intrinsic_rates.dat")
# A list of timepoints in the experimental data (in minutes)
times = [ 0.167, 1.0, 10.0, 120.0 ]


### Running reweighting ###

# This loop will run reweighting for gamma values from 1 x 10^-3 to 9 x 10^-3
# Adapt it as necessary
exponent = -3
basegamma = 10**exponent

for multiplier in range(1, 10):
    reweight_object = MaxEnt(do_reweight=True, do_params=False, stepfactor=0.00001)
    reweight_object.run(
        gamma=(multiplier * basegamma),
        data_folders=folders,
        kint_file=rates,
        exp_file=expt,
        times=times,
        restart_interval=100,
        out_prefix=f'reweighting_gamma_{multiplier}x10^{exponent}_',
    )
    print(f'Reweighting for gamma = {multiplier}x10^{exponent} completed')

# Help text describing options and how to call the reweighting functions
# is available in the docstrings of the MaxEnt class, e.g.:
#help(MaxEnt)
#help(MaxEnt.run)
```

```
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 1x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 2x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 3x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 4x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 5x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 6x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 7x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 8x10^-3 completed
Contacts read
Hbonds read
Segments and experimental dfracs read
Reweighting for gamma = 9x10^-3 completed
```


The gamma values in this script range from $1 \times 10^{-3}$ to $9 \times 10^{-3}$. The reweighting folder contains several wrapper scripts (e.g. *gamma_10^-2.py*, *gamma_10^-1.py*) that together cover a larger range of gamma values.
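
As an alternative to keeping one wrapper script per order of magnitude, the full set of gamma values for the protocol (from 1 x 10^-3 to 9 x 10^0) could be enumerated in one place. The sketch below only builds the values and matching output prefixes; it does not call HDXer. `gamma_grid` is a hypothetical helper, and the prefix format is assumed to follow the naming pattern of the example script.

```python
def gamma_grid(exponents=range(-3, 1), multipliers=range(1, 10)):
    """Yield (gamma, out_prefix) pairs for every multiplier x 10^exponent."""
    for exponent in exponents:
        for multiplier in multipliers:
            gamma = multiplier * 10.0**exponent
            out_prefix = f"reweighting_gamma_{multiplier}x10^{exponent}_"
            yield gamma, out_prefix

grid = list(gamma_grid())
print(f"{len(grid)} gamma values, first prefix: {grid[0][1]}, last prefix: {grid[-1][1]}")
```

Each pair could then be passed as the `gamma` and `out_prefix` arguments of a reweighting run in a loop like the one in the example script.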

Output files from running **reweighting.py** report the final predicted deuterated fractions after reweighting, the mean square deviation between the predicted and target data, the 'apparent work' applied as a bias to the ensemble as a whole, and the individual weights (i.e. probabilities) of each frame in the structural ensemble. Note that because this is a _reweighting_ protocol, these weights are never reduced to zero: every frame in the initial structural ensemble remains in the final ensemble, with a positive, finite relative weight.
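
Once a set of per-frame weights is in hand, a couple of generic diagnostics can indicate how far the ensemble has moved from uniform weighting. The sketch below uses made-up weights and hypothetical helper names; it does not read HDXer output files, and the quantities it computes (the Kish effective sample size and the relative entropy to a uniform ensemble) are standard statistics rather than values reported by HDXer. In particular, the relative entropy here is not claimed to equal the 'apparent work'.

```python
import math

def normalise(weights):
    """Scale positive weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def effective_sample_size(p):
    """Kish effective sample size: equals len(p) for uniform weights."""
    return 1.0 / sum(pi * pi for pi in p)

def relative_entropy_to_uniform(p):
    """Kullback-Leibler divergence from the uniform ensemble (0 if unbiased)."""
    n = len(p)
    return sum(pi * math.log(pi * n) for pi in p)

# Made-up relative weights for a five-frame ensemble (all positive and finite)
p = normalise([4.0, 2.0, 2.0, 1.0, 1.0])
print("effective sample size:", round(effective_sample_size(p), 3))
print("relative entropy to uniform:", round(relative_entropy_to_uniform(p), 4))
```

A small effective sample size relative to the number of frames suggests that only a few frames dominate the reweighted ensemble.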

Once HDX reweighting is completed for a large range of gamma values, we can plot a decision curve to see the results. This will be shown in the next notebook *04_decision_plot.ipynb*.