Determine why number of equilibrium iterations is almost always 1 #1000
@jchodera: I started investigating this today, and I have 1 finding and 2 questions. Here is the function in

Four things happen in this function:

Finding 1: Since n_effective_samples is largest at iteration 0, and

Question 1: Why does

Now that I'm familiar with these functions, I'm going to play around with swapping out the effective energy timeseries with either the state index replica timeseries or the d_u/d_lambda replica timeseries.
As a follow-up to question 2 above, I tried swapping out the effective energy timeseries for the state index replica timeseries. Here's the code I used for feeding in the state index replica timeseries:

```python
# Retrieve state index replica timeseries
analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=n_iterations)
timeseries_to_analyze = analyzer.read_energies()[-1].data

# Set parameters to be used in `get_equilibration_data_per_sample()`
fast = True
max_subset = 100

# Run code in `multistate.utils.get_equilibration_data_per_sample()`
# Cast to array if not already
series = np.array(timeseries_to_analyze)

time_size = n_iterations
set_size = time_size - 1  # Cannot analyze the last entry

# Set maximum
if max_subset is None or set_size < max_subset:
    max_subset = set_size
# Special trap for series of size 1
if max_subset == 0:
    max_subset = 1
# Special trap for constant or size-1 series
if series.std() == 0.0 or max_subset == 1:
    print("std dev is 0")

g_i = np.ones([max_subset], np.float32)
n_effective_i = np.ones([max_subset], np.float32)
counter = np.arange(max_subset)
i_t = np.floor(counter * time_size / max_subset).astype(int)
for i, t in enumerate(i_t):
    print(i)
    try:
        g_i[i] = timeseries.statisticalInefficiencyMultiple(series[:, t:], fast=fast)
    except Exception:
        # g_i[i] = (time_size - t + 1)
        raise Exception("Unable to run statisticalInefficiencyMultiple")
    n_effective_i[i] = (time_size - t + 1) / g_i[i]

# Run code in `MultiStateSamplerAnalyzer.get_equilibration_data()`
t0 = 1
# Discard equilibration samples.
# TODO: if we include u_n[0] (the energy right after minimization) in the equilibration detection,
# TODO: then number_equilibrated is 0. Find a better way than just discarding the first frame.
n_effective_max = n_effective_i.max()
i_max = n_effective_i.argmax()
n_equilibration = i_t[i_max] + t0  # account for initially discarded frames
g_t = g_i[i_max]
print(n_equilibration, g_t, n_effective_max)
```

Output:
I then generated a d_u/d_lambda replica timeseries and fed it through the above code snippet. Here is the output:
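For anyone following along: `statisticalInefficiencyMultiple` pools the autocorrelation over several replica timeseries to estimate a single statistical inefficiency g. Below is a minimal numpy sketch of the idea only; it is not pymbar's actual implementation (pymbar's lag truncation and fast-mode lag spacing differ), and the function name here is just illustrative:

```python
import numpy as np

def statistical_inefficiency_multiple(series):
    """Sketch of a multi-replica statistical inefficiency estimate.

    series: array of shape (n_replicas, n_samples). Fluctuations are taken
    about the pooled mean, the normalized autocorrelation is averaged over
    replicas and time origins, and the sum is truncated at the first
    non-positive value (where noise dominates).
    """
    series = np.asarray(series, dtype=float)
    n_replicas, T = series.shape
    dA = series - series.mean()   # fluctuations about the pooled mean
    sigma2 = np.mean(dA * dA)     # pooled variance
    if sigma2 == 0.0:
        return 1.0                # constant series: g = 1 by convention
    g = 1.0
    for t in range(1, T - 1):
        # Normalized autocorrelation at lag t, averaged over replicas.
        C = np.mean(dA[:, :T - t] * dA[:, t:]) / sigma2
        if C <= 0.0:
            break                 # truncate the noisy tail of the sum
        g += 2.0 * C * (1.0 - t / T)
    return max(1.0, g)
```

A larger g means fewer effective samples, which is why a nearly uncorrelated timeseries (g close to 1) leaves N_eff largest near the start of the series.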
Closing via choderalab/openmmtools#610
@zhang-ivy has noticed that the number of equilibrium iterations assessed by automatic equilibration detection is almost always 1. We should investigate why this is happening and whether there is a better timeseries to feed the automatic equilibration detection than the super-replica energy to better partition between transient initial equilibration and production.
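For context on what automatic equilibration detection does: it picks the production start t0 that maximizes the number of effective uncorrelated samples N_eff(t) = (T - t) / g(t) over the remaining series (Chodera, JCTC 2016). Here is a self-contained single-series sketch, simplified relative to the openmmtools/pymbar code (which handles the multi-replica case and different candidate spacings); the function names are illustrative, not the library's API:

```python
import numpy as np

def statistical_inefficiency(x):
    """g = 1 + 2 * (sum of positive autocorrelations), truncated at the
    first non-positive value."""
    x = np.asarray(x, dtype=float)
    T = x.size
    dx = x - x.mean()
    sigma2 = np.mean(dx * dx)
    if sigma2 == 0.0:
        return 1.0
    g = 1.0
    for t in range(1, T - 1):
        C = np.mean(dx[:T - t] * dx[t:]) / sigma2
        if C <= 0.0:
            break
        g += 2.0 * C * (1.0 - t / T)
    return max(1.0, g)

def detect_equilibration(u_n, n_candidates=100):
    """Pick the production start t0 maximizing N_eff(t) = (T - t) / g(t)."""
    u_n = np.asarray(u_n, dtype=float)
    T = u_n.size
    # Subsample candidate start points, as the library code does.
    candidates = np.arange(0, T - 1, max(1, T // n_candidates))
    n_eff = np.array(
        [(T - t) / statistical_inefficiency(u_n[t:]) for t in candidates]
    )
    i_max = int(n_eff.argmax())
    return int(candidates[i_max]), float(n_eff[i_max])
```

One hedged observation this sketch makes concrete: if the timeseries fed in is nearly featureless (for example, if the super-replica energy averages out the slow degrees of freedom), then g(t) is close to 1 everywhere, N_eff(t) is maximized almost immediately, and t0 comes out near zero, which could explain why n_equilibration is almost always 1 after the +1 offset for the discarded first frame.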