Add scripts used to generate these files #3

Closed
jchodera opened this issue Sep 26, 2014 · 6 comments

Comments

@jchodera
Contributor

@davidlmobley : We should capture the scripts you used to generate various parts of this repository.

@davidlmobley
Member

OK. The most useful one at present is probably the one which makes v0.31
out of v0.3, which is attached. Basically the whole construction so far has
been a series of different one-off scripts which pull different files and
info from different places and make updates to the dataset (for example see
the "add_504_to_dataset.py" script attached, which added the compounds from
the 504 molecule set to the dataset when I was first building it). I can
dump all of these on you if you like but they probably wouldn't be much use.

On the other hand, perhaps there are specific tasks you want to be able to
do and need code snippets for. So if that's the case I'm happy to either
just send you those code snippets, or dump the whole set of scripts on you
and let you look through for what you need. Let me know what you prefer.

I think going forward it will make sense to make further updates more along
the lines of the strategy you've suggested, which is to make the pickle
file the definitive source, and re-generate all files from the pickle file.
(Though, doing the charge calculations every time may be a non-trivial
computational expense).
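
A minimal sketch of the "pickle file as definitive source" idea, for illustration only. The `database[cid]['groups']` layout comes from the snippet further down this thread; the function names, the file paths, and the semicolon-separated output format are all hypothetical and are not the actual format of groups.txt.

```python
import pickle
from pathlib import Path


def load_database(pickle_path):
    """Load the database dictionary from the pickle file (the single source of truth)."""
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)


def render_groups(database):
    """Render one line per compound ID: the ID followed by its functional groups.

    The '; ' separator is an assumption made for this sketch.
    """
    lines = []
    for cid in sorted(database):
        groups = database[cid].get("groups", [])
        lines.append("; ".join([cid] + list(groups)))
    return "\n".join(lines) + "\n"


def regenerate(pickle_path, output_path):
    """Rewrite a derived text file entirely from the pickle contents."""
    database = load_database(pickle_path)
    Path(output_path).write_text(render_groups(database))
```

Cheap derived files like this one can be rebuilt on every run; expensive steps such as charge calculations would presumably need caching or a separate, less frequent step.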

However, one thing to think about is what will happen AFTER we repeat all
of the hydration free energy calculations using freshly generated files.
After that point, we will need to do one of the following:
a) stop re-generating charges/parameter files every time, to avoid changing
these files so they no longer match those used for the calculations
b) always regression test the new charges/parameter files against the old
ones so that they don't change
c) specify what version of the parameter files the calculated values were
done with so as to be able to tolerate updates to the parameter files
without having to update the calculations (though I don't like this since
then you'd have a database containing calculated values connected with
parameter files from a different version of the database)
d) set it up to automatically repeat any calculations anytime any of the
parameter files change (which would need to be coupled with b)
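
Option (b) could be sketched roughly as follows: record a digest of every generated file, then compare a fresh run against the stored record. This is only an illustration; the function names are hypothetical, and a byte-for-byte comparison is an assumption that may be too strict if regenerated charges differ by floating-point noise.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(directory, pattern="*"):
    """Map each matching file name in `directory` to its digest."""
    return {
        p.name: file_digest(p)
        for p in sorted(Path(directory).glob(pattern))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Return the names that were added, removed, or whose contents differ."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

A non-empty result from `changed_files` would flag exactly the compounds whose calculations might need repeating, which is the coupling between (b) and (d).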

Thanks,
David


@jchodera
Contributor Author

To me, the most critical script is the one that generates groups.txt, since I have no idea how to generate this with tools I know about.

@davidlmobley
Member

Ah, this is very simple. Checkmol, from Haider. Here's the Python:

import commands  # Python 2 only; subprocess.getoutput is the Python 3 equivalent

#MAJOR STEP: Run checkmol on the compound to store functional groups
groups = commands.getoutput('checkmol mol2files_sybyl/%s.mol2' % cid )
#Break at newlines to separate groups
groups = groups.split('\n')
#Clean by removing 'compound' from group names where it's present (unnecessary)
groups = [ group.replace(' compound','') for group in groups ]
#Store to dictionary
database[cid]['groups'] = groups

David
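
For anyone on Python 3, the same step might look roughly like the sketch below. It assumes, as the snippet above does, that `checkmol` is on the PATH, takes the mol2 file as its only argument, and prints one functional group per line. The function names are hypothetical, and unlike the original this version also skips blank lines.

```python
import subprocess


def parse_checkmol_output(output):
    """Split checkmol output into group names, removing ' compound' where present."""
    groups = []
    for line in output.splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines (a change from the original snippet)
        groups.append(name.replace(" compound", ""))
    return groups


def functional_groups(cid, mol2_dir="mol2files_sybyl", executable="checkmol"):
    """Run checkmol on one compound's mol2 file and return its functional groups."""
    result = subprocess.run(
        [executable, f"{mol2_dir}/{cid}.mol2"],
        capture_output=True,
        text=True,
        check=True,  # raise if checkmol exits with an error
    )
    return parse_checkmol_output(result.stdout)
```

Usage would then be `database[cid]['groups'] = functional_groups(cid)`. Keeping the parsing separate from the subprocess call means it can be checked on sample text without checkmol installed.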


@jchodera
Contributor Author

Here's the checkmol page:
http://merian.pch.univie.ac.at/~nhaider/cheminf/cmmm.html

@jchodera
Contributor Author

WHOA. Checkmol is written in PASCAL.

@davidlmobley
Member

Hahaha. Wow. Yeah, I've never thought it was the perfect tool for doing
this. But it does give me something useful.

David

