# String Simulation Check

This notebook helps you analyse the convergence of the string method and, if you are lucky, extract a nice free energy surface.

In [None]:
import numpy as np
import glob
import matplotlib.pyplot as plt
import matplotlib as mpl
import pickle
import os
import src.analysis as spc
#import logging
#logging.getLogger("blib2to3.pgen2.driver").setLevel(logging.WARNING)

In [None]:
%load_ext autoreload
%autoreload 2
%load_ext lab_black

# String Convergence Analysis

## Extract CVs

In the cell below you can select the simulation directory (in case this notebook is located elsewhere). If the notebook is in the simulation directory, just leave it as ".".

In [None]:
%ls ../data/raw

In [None]:
# Only the last assignment takes effect; reorder or comment out lines to pick a simulation.
simulation_directory = "/data/sperez/Projects/string_sims/data/raw/C2I_v1_amber/"
simulation_directory = "/data/sperez/Projects/string_sims/data/raw/C2I_lb_v1/"
simulation_directory = "/data/sperez/Projects/string_sims/data/raw/C2I_v1/"
simulation_directory = "/data/sperez/Projects/string_sims/data/raw/C2I_lb_v1_amber/"
os.chdir(simulation_directory)
os.getcwd()

In [None]:
%ls md

Load the strings in the `strings` variable.

In [None]:
files = spc.natural_sort(glob.glob("./strings/string[0-9]*txt"))

In [None]:
strings = np.array([np.loadtxt(file).T for file in files])
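
The two cells above depend on the files being read in iteration order and on each file being transposed after loading. The sketch below illustrates both points with in-memory data. `natural_sort_sketch` is a hypothetical stand-in written for this example; it is not necessarily how `spc.natural_sort` is implemented, and the file names and values are made up.

```python
import io
import re

import numpy as np


def natural_sort_sketch(names):
    """Sort names so that embedded integers compare numerically (illustrative stand-in)."""

    def key(name):
        # Split into digit and non-digit runs; digit runs are compared as integers.
        return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]

    return sorted(names, key=key)


# Made-up file names: plain sorting would put string10 before string2.
example_files = ["./strings/string10.txt", "./strings/string2.txt", "./strings/string1.txt"]
sorted_files = natural_sort_sketch(example_files)

# A made-up string file with 3 beads (rows) and 2 CVs (columns).
fake_file = io.StringIO("0.1 0.2\n0.3 0.4\n0.5 0.6\n")
one_string = np.loadtxt(fake_file).T  # transposed to (n_cvs, n_beads)

# Stacking several iterations gives (n_iterations, n_cvs, n_beads).
stacked = np.array([one_string, one_string])
print(sorted_files)
print(one_string.shape, stacked.shape)
```

Without natural sorting, iteration 10 would be loaded before iteration 2 and every time series in this notebook would be scrambled.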

In [None]:
with open("cv.pkl", "rb") as file:
    cvs, ndx_groups = pickle.load(file)

In [None]:
print("String details")
print("")
print(f"Number of strings: {strings.shape[0]}")
print(f"Number of cvs: {strings.shape[1]}")
print(f"Number of beads per string: {strings.shape[2]}")

# Analyze string convergence
In the next plots you can study the convergence of the string. At convergence, the strings should oscillate around an equilibrium position and not drift from one iteration to the next.
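
One possible way to put a number on "drift versus oscillation" is the root-mean-square deviation of every string from a reference string, for example the average of the last iterations. The sketch below is an illustration on synthetic data. It is not necessarily the quantity that `spc.rmsd_strings_time_series` computes, and the function name, the toy relaxation and the choice of reference are all assumptions. If the CVs have different units or ranges, they would need to be scaled first.

```python
import numpy as np


def rmsd_to_reference(strings, reference):
    """RMSD of each iteration to a reference string.

    strings:   array of shape (n_iterations, n_cvs, n_beads)
    reference: array of shape (n_cvs, n_beads)
    """
    diff = strings - reference  # broadcasts the reference over iterations
    return np.sqrt(np.mean(diff**2, axis=(1, 2)))


# Synthetic strings: exponential relaxation towards a target plus small noise.
rng = np.random.default_rng(0)
n_iterations, n_cvs, n_beads = 200, 4, 16
target = rng.normal(size=(n_cvs, n_beads))
decay = np.exp(-np.arange(n_iterations) / 20.0)[:, None, None]
noise = 0.01 * rng.normal(size=(n_iterations, n_cvs, n_beads))
toy_strings = target + decay + noise

# Reference: mean of the last 50 iterations (an arbitrary choice for this example).
rmsd = rmsd_to_reference(toy_strings, toy_strings[-50:].mean(axis=0))
print(rmsd[:3], rmsd[-3:])
```

A curve that decays and then fluctuates around a small plateau is compatible with convergence; a curve that keeps changing steadily suggests the string is still drifting.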

## Strings as a function of time
This plot shows the evolution of each string CV separately as a function of the iteration number.

You can change two parameters in these plots: `start_iteration`, before which no data is plotted, and `n_average`, the number of string iterations averaged into one block. Averaging cancels some of the noise, reduces the number of strings in the plot, and makes any average drift easier to see.
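
The block averaging described above can be sketched with plain NumPy. This is a minimal illustration, assuming that iterations before `start_iteration` are discarded and that an incomplete final block is dropped; how `spc.strings_time_series` treats these details is not confirmed here, and `block_average` is a hypothetical helper name.

```python
import numpy as np


def block_average(strings, start_iteration=0, n_average=25):
    """Average consecutive iterations in blocks of n_average.

    strings: array of shape (n_iterations, n_cvs, n_beads)
    returns: array of shape (n_blocks, n_cvs, n_beads)
    """
    kept = strings[start_iteration:]
    n_blocks = kept.shape[0] // n_average
    if n_blocks == 0:
        raise ValueError("Not enough iterations for a single block.")
    # Assumption: iterations that do not fill a complete block are dropped.
    trimmed = kept[: n_blocks * n_average]
    return trimmed.reshape(n_blocks, n_average, *kept.shape[1:]).mean(axis=1)


# Toy data: 10 iterations, 2 CVs, 3 beads.
toy = np.arange(10 * 2 * 3, dtype=float).reshape(10, 2, 3)
blocks = block_average(toy, start_iteration=1, n_average=4)
print(blocks.shape)
```

Larger `n_average` values give smoother but fewer curves, so it is worth trying a couple of values before deciding whether the string still drifts.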

In [None]:
fig, ax = spc.strings_time_series(
    strings, ndx_groups, start_iteration=1, n_average=25, av_last_n_it=25
)

In [None]:
fig, ax = spc.rmsd_strings_time_series(strings, ndx_groups)

## Evolution of CVs that are functions of the string CVs

If you are interested in studying the convergence of cvs that are a function of the string CVs (for example, averages over symmetrical distances), you can construct a `reduced_string` array whose cvs are functions of the cvs used for the string method. In the example below, we produce two cvs, each the mean of cvs used in the string method simulation. The same kind of plots as before can then be made.
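
As a rough illustration of such a construction, the sketch below averages groups of string CVs into new cvs with NumPy. `mean_over_cv_groups` is a hypothetical helper written for this example; whether `spc.strings_to_SF_IG` does exactly this is an assumption, and the toy array and CV indices are made up.

```python
import numpy as np


def mean_over_cv_groups(strings, groups):
    """Build reduced cvs as means over groups of string CVs.

    strings: array of shape (n_iterations, n_cvs, n_beads)
    groups:  list of index lists, one list per reduced cv
    returns: array of shape (n_iterations, len(groups), n_beads)
    """
    reduced = [strings[:, list(group), :].mean(axis=1) for group in groups]
    return np.stack(reduced, axis=1)


# Toy data: 5 iterations, 12 CVs, 8 beads.
rng = np.random.default_rng(1)
toy_strings = rng.normal(size=(5, 12, 8))

# Two reduced cvs: the mean of CVs 0 and 1, and the mean of CVs 10 and 11.
toy_reduced = mean_over_cv_groups(toy_strings, [[0, 1], [10, 11]])
print(toy_reduced.shape)
```

The result keeps the (n_iterations, n_cvs, n_beads) layout, so it can be passed to the same kind of plotting functions as the full `strings` array.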

In addition, if you are interested in the convergence of some other cv that is not a function of the cvs used in the string method, you can study it too. Extract the average value of that CV from `md/*/*/restrained/traj_comp.xtc` for every restrained simulation and arrange the values into a `reduced_string` numpy array with shape (n_iterations, n_cvs, n_beads).
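
The sketch below covers only the last step, arranging per-simulation averages into the expected shape. It assumes the averages have already been computed with a trajectory library of your choice and stored in a dictionary keyed by zero-based `(iteration, bead)` pairs; the dictionary layout and the name `assemble_reduced_string` are hypothetical, and the numbers are made up.

```python
import numpy as np


def assemble_reduced_string(averages, n_iterations, n_cvs, n_beads):
    """Arrange per-simulation CV averages into (n_iterations, n_cvs, n_beads).

    averages: dict mapping (iteration, bead) -> sequence of n_cvs average values
    Missing simulations are left as NaN so that gaps stay visible.
    """
    out = np.full((n_iterations, n_cvs, n_beads), np.nan)
    for (iteration, bead), values in averages.items():
        out[iteration, :, bead] = values
    return out


# Made-up averages for 3 iterations, 4 beads and 2 cvs.
toy_averages = {
    (it, bead): [it + bead, it * bead] for it in range(3) for bead in range(4)
}
del toy_averages[(1, 2)]  # pretend one restrained simulation is missing

toy_reduced_string = assemble_reduced_string(toy_averages, n_iterations=3, n_cvs=2, n_beads=4)
print(toy_reduced_string.shape)
```

Note that the cv axis comes before the bead axis, matching the layout of `strings`.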

If this sort of analysis is meaningless for your system, for example because the chosen cvs are already very diagnostic on their own, please ignore this section.

In [None]:
reduced_string = spc.strings_to_SF_IG(strings, [0, 1], [10, 11])
reduced_string_labels = ["SF (nm)", "IG (nm)"]

In [None]:
fig, ax = spc.two_cv_strings_time_series(
    reduced_string,
    reduced_string_labels,
    start_iteration=0,
    n_average=50,
    av_last_n_it=50,
)

In [None]:
fig, ax = spc.all_rmsd_strings_time_series(reduced_string, "RMSD[Reduced String] (nm)")

In [None]:
fig, ax = spc.all_rmsd_strings_time_series(strings, "RMSD[String] (nm)")