New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of data ran with other than Gromacs codes #5
Comments
Hello, I can help with implementing OpenMM and possibly for Amber data. I think in the long run it would be very helpful for us to create a separate Python library that contains all the functions we are using, both for building and analyzing the data bank, and rewriting the AddData.py to use these functions. For instance, the analysis available in AddData.py around the lines 800 and afterwards won't be possible with OpenMM and Amber files as there is no tpr file from them. |
I was looking at the AddData.py code with this issue in mind and below are my thoughts:
In conclusion, making the AddData.py to work with other programs should not be very complicated. @batukav do you have OpenMM trajectories in Zenodo which we could start to work with? |
@ohsOllila I have one pure POPE trajectory uploaded and ready for further analysis at 10.5281/zenodo.4665773. 2- There's already an implementation in MDAnalysis https://github.com/MDAnalysis/MDAnalysisCookbook/blob/master/examples/membrane-leaflets.py Maybe this could be an easier for our purposes. 3- I can make the necessary implementation for the OpenMM. 5- To make molecules whole, we can try using MDAnalysis again. Then we can get rid of gmx tools for this point. |
2- I do not think that we need modifications in the part which calculates the number of lipids in leaflets. Only thing that needs to be done is to have the structure file in line 845. Where this comes from (creation or input) will be different depending on the program, but this should not be difficult. 5- If you have a good idea how to do it, it would be great. I think that the first step would be to try to add the data you mentioned. Only parts which do not seem completely straightforward are to decide which files are compulsory for openMM (i.e. creating the dictionary), and how molecules will be made whole (we can also request people to have molecules in whole in other than Gromacs trajectories if needed). Do you want to try to modify the code and add the data? |
5- I'll work on it. Yes, in the following days I'll be analyzing the POPE data for the NMRLipids VI project. In the meantime I can try to figure out the compulsory files from the OpenMM data |
I created "check_bond.py" that takes the trajectory, checks the bond lengths, and if a bond length is larger than 0.2 nm performs a simple wrapping using the MDAnalysis. Centering the bilayer without actually visualizing the trajectory is a bit tricky (at least for me). However, for the OP calculation, if I remember correctly, all we need is a wrapped trajectory. We can keep improving the centering options, but as far as I can see it now, the wrapping issue is solved. Please let me know if there are problems with the script. |
Quality evalution merged to main branch now
Basic analysis codes (order parameters, area per lipid, form factor and thickness) are now implemented for Gromacs and openMM. For gromacs, the analysis codes tries to use tpr, but if this does not work due to version dependecies, a gro file is created with Gromacs commands and the created gro is used with MDAnalysis. For openMM, a pdb file is given instead of tpr and pdb is used with MDAnalysis. However, this is not yet updated in the instructions. For example, see are per lipid calculation in calcALP.py. I think that similar idea should work also for other programs. However, I am not aware of a code that could be used to check that molecules are whole and that there no inconvenient jumps over pbc in z-direction for other than Gromacs simulations. Therefore, the requirements for other than Gromacs trajectories is more strict currently (this is also not yet in the instructions). Because this discussion is now overlapping with another issue "Preferred topology field in the input files?" issue, I will close the issue #10 and we can continue discussion in here. |
Have you considered using MDAnalysis transformations with topology provided? At least for me it works fine: For |
Thanks, this looks useful. However, I am not sure which file we should use as topology with openMM trajectories? Currently we used pdb, but bonds are probably not sufficiently defined there. |
From |
I remember trying the By simply adding a |
Actually, I now think that we can even overcome providing the topology. If we know what is the lipid name and what the force field is we actually know which atoms are connected (this information can also be stored in the mapping file for instance). Then we can upload use |
Wouldn't this approach have problems if there are custom made/non-standard residues and/or different types of lipids (like glycolipids etc.)? Asking for a topology from the user wouldn't be too much to request, I suppose. |
For a future public reference; what should be the command for centering the bilayere center of mass at the origin and wrapping the remaining atoms? |
Yes, I think that we should definitely rather ask the required topology files from the data contributor rather than try to reproduce the information. Question is that which files do we exactly need, for example, for OpenMM? For example, in the recent data uploaded by @batukav (https://doi.org/10.5281/zenodo.7415922, https://doi.org/10.5281/zenodo.7418619), there are some xml files but not psf files. Are these xml files enough? Are both of those needed? If making molecules whole in a reasonable and robust way in these simulations is not feasible, we can also accept only OpenMM simulations with molecules already whole (which is the case in all current OpenMM simulations in the databank). For other analyses I have done from the databank (standard + the ones in the manuscript), pdb for OpenMM has been sufficient assuming that the molecules are whole. This would probably apply also to other programs, such as Amber from which there is also data in Zenodo. For Amber we have the same questions: Do we ask some topology file, and if yes, which one? Or do we just ask pdb and require that molecules are whole? |
With OpenMM one can generate the topology information within the Python scripts that are used to run the simulation. Therefore sometimes one does not need to create a separate topology file and read it into the simulation. This was the approach taken by the AMOEBA script I have used. The xml files are the force field files. It should be indeed possible for us to parse the force field file to get the bond information and use this to check from the pdb file + trajectory if the molecules are whole. I don't know how fast we can come up with an implementation. Also, this is a rather edge case that should not be a high priority, in my opinion. That being said, I will check if I can somehow save the topology information from OpenMM so that I can update the Zenodo files. Current order parameter script calculates the bond lengths and raises a warning if this is beyond a threshold. We can urge the users to upload a topology file and this should not be a problem in a large majority of the cases. For the uploads without a topology information, we can raise a warning in the scripts saying that we cannot make the molecules whole therefore this and that functionality of the databank won't work.
Simulations from the AMBER MD engine requires |
OpenMM simulations with molecules already whole (which is the case in all current OpenMM simulations in the databank). |
I have now enabled addition of simulations ran with NAMD program in exactly the same way as openMM simulations. This seems to work. This might be the way to go also for data with other programs. We can try this when the first contributions with other programs will come. |
I will close this. If there will be data from new programs, we can open new issues specifically for each program. |
AddData.py already contains the dictionaries also for other than Gromacs simulations. However, the code probably does not fully work with those. This should be fixed at least for OpenMM and Amber data that we already have in Zenodo.
The text was updated successfully, but these errors were encountered: