Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getting PCA equilibration times for the other components #170

Open
batukav opened this issue Feb 24, 2024 · 10 comments
Open

getting PCA equilibration times for the other components #170

batukav opened this issue Feb 24, 2024 · 10 comments

Comments

@batukav
Copy link
Collaborator

batukav commented Feb 24, 2024

Hi all,

I'm interested in calculating the PCA equilibration times not only from (or for) the first principal component but from the first 10 components. What would I need to change in the code to do this?

I would like to create a eq_times.json file as an output that contains the equilibration times calculated for each of the top 10 principal components. @pbuslaev, any suggestions?

@pbuslaev
Copy link
Collaborator

You will have to rework the PCA computation a bit. Currently, only one PC is taken into account (check this function). Maybe the easiest solution, though not the nicest, is to add an argument coord to the get_proj method of PCA class:

def get_proj(self, data, pc = 0):
        # projection on PC
        proj = np.tensordot(cdata, self.eig_vecs[pc:pc + 1], axes=(1, 1)).T
        proj = np.concatenate(np.array(proj), axis=None)

        self.proj = proj

Then, you will need to iterate over PC components you're interested in in main. So instead of current loop, you'd likely want to have smth like

equilibration_times = {}
# Iterate over trajectories for different lipids
for traj in parser.concatenated_trajs:
    print(f"Main: parsing lipid {traj[4]}")
    # Creat PCA for trajectory
    pca_runner = PCA(traj[0], traj[1], traj[2], traj[3], parser.trjLen, pc)
    # Run PCA
    data = pca_runner.PCA()
    print("Main: PCA: done")
    _tmp_eq = []
    for pc in range(10):
        # Project trajectory on PC1
        pca_runner.get_proj(data, pc)
        print("Main: Projections: done")
        # Calculate autocorrelation
        pca_runner.get_autocorrelations()
        print("Main: Autocorrelations: done")
        # Estimate equilibration time
        te2 = TimeEstimator(pca_runner.autocorrelation).calculate_time()
        # NOTE: if you want pure time, do not divide by trjLen
        _tmp_eq.append(te2/parser.trjLen)
        print("Main: EQ time: done")
    equilibration_times[traj[4]] = _tmp_eq

Hope it helps

@batukav
Copy link
Collaborator Author

batukav commented Feb 25, 2024

Thank you, this helps a lot. I will make the necessary changes and will update this post. Do you see any value of adding this functionality to the databank?

@pbuslaev
Copy link
Collaborator

I am not sure how useful it would be for the whole databank. Out of interest, I would be happy to contribute to the large-scale application of PCA analysis to membranes (e.g. to the whole databank). I think there could be some interesting findings which can be published somewhere, but I am not sure that it would be aligned with the databank goals that much.

@batukav
Copy link
Collaborator Author

batukav commented Feb 27, 2024

Hello,

I have made the implementation as the following:

for the final loop

        equilibration_times = {}
        # Iterate over trajectories for different lipids
        for traj in parser.concatenated_trajs:
            print(f"Main: parsing lipid {traj[4]}")
            # Creat PCA for trajectory
            pca_runner = PCA(traj[0], traj[1], traj[2], traj[3], parser.trjLen)
            # Run PCA
            data = pca_runner.PCA()
            print("Main: PCA: done")
            _tmp_eq = []
            for pc in range(10):
                # Project trajectory on PC1
                pca_runner.get_proj(data, pc)
                print("Main: Projections: done")
                # Calculate autocorrelation
                pca_runner.get_autocorrelations()
                print("Main: Autocorrelations: done")
                # Estimate equilibration time
                te2 = TimeEstimator(pca_runner.autocorrelation).calculate_time()
                # NOTE: if you want pure time, do not divide by trjLen
                _tmp_eq.append(te2/parser.trjLen)
                print("Main: EQ time: done")
            equilibration_times[traj[4]] = _tmp_eq

and modified the get_proj() as

    def get_proj(self, cdata, pc = 0):
            # projection on PC
            proj = np.tensordot(cdata, self.eig_vecs[pc:pc + 1], axes=(1, 1)).T
            proj = np.concatenate(np.array(proj), axis=None)

            self.proj = proj

with these modifications, I managed to get 10 projections. When I run the code, I got exactly the same equilibration times for the PC0 as with the previous version of the code.

A curious thing is that when I applied the code to some other data (databank entry 2c6/ae5/2c6ae54d8ce9652f529cd674727a0602d06c8990/2c6ae54d8ce9652f529cd674727a0602d06c8990) it turned out that the first PC was not the slowest one:

{"POPC": [3.3065293017437223, 9.520027080952298, 6.154706020727084, 1.2030953056192617, 1.2769982402263582, 4.590255687551348, 1.3070063956914977, 3.1167834233800424, 1.6524583381178275, 5.383538786484999]}

For the others I tried (36e/333/36e333d9de810fdc8d0009df0e755225f7645161/36e333d9de810fdc8d0009df0e755225f7645161 and 478/b6c/478b6c725d0a406058482c7d80ee728165e4a884/d6eaf162a46bcbce0935cb07cf3927ba2f7bacbc also some other) the first component is indeed the slowest one.

I'm curious if these results make any sense, as I thought that the first component is always the slowest. Could it be that my implementation is wrong, or interpretation wrong, or both with varying degrees?

@pbuslaev
Copy link
Collaborator

Was the trajectory with strange results with Polarizable ff?

@batukav
Copy link
Collaborator Author

batukav commented Feb 28, 2024 via email

@pbuslaev
Copy link
Collaborator

I rememember, that when I tried PCA with polarizable force fields, I also observed this strange behaviour for some of the trajectories. Usually, it was due to insufficient sampling in polarizable simulations. I won't be able to tell more without additional details, unfortunately. My experience, that for such strange cases, running full PCA analysis can be really helpful to detect the origin of the problem.

@batukav
Copy link
Collaborator Author

batukav commented Feb 28, 2024

I see, that's also my hunch. What do you mean by the full PCA analysis? What would I need to do there?

@pbuslaev
Copy link
Collaborator

The analysis implemented in Databank only focuses on fast computation of equilibration times. But as described in the original paper, many more things, including structural ensembles, can be checked. So if you are interested in the behavior of a particular trajectory, full analysis can be helpful. But it will require deep dive into results, computed for a trajectory and thus can not be easily scaled.

@ohsOllila
Copy link
Member

Should we close this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants