Add a manual template? #72

Open
rafwiewiora opened this issue Apr 6, 2016 · 3 comments

Comments

@rafwiewiora
Member

I want to add a manual template to the modeling (unpublished structure). @sonyahanson John said you've done this before? Do you have any quick pointers / script?

Thanks so much!

@rafwiewiora
Member Author

Ok, so this is pretty simple, though unfortunately hacky. All you need to do, after ensembler gather_templates and before ensembler align, is:

  • put a cleaned-up PDB into templates/structures-resolved
  • append the FASTA sequence of that PDB's topology to 'templates/templates-resolved-seq.fa'

So for now, I put the manual PDB in a manual_pdbs directory and run this between gather_templates and align:

import os

import mdtraj as md

# Put the manual PDBs to be added in manual_pdbs/
manual_pdbs = ['TDIX.pdb']

for pdb in manual_pdbs:
    # Template ID used for both the PDB file name and the FASTA header
    template_id = 'SETD8_HUMAN_%s_A' % os.path.splitext(pdb)[0]

    # Load the structure and keep protein atoms only
    traj = md.load(os.path.join('manual_pdbs', pdb))
    traj = traj.atom_slice(traj.top.select('protein'))
    traj.save('templates/structures-resolved/%s.pdb' % template_id)

    # Sequence of the first chain in the protein-only topology
    resolved_seq = traj.top.to_fasta()[0]

    # Append the new record to the resolved-sequence FASTA file
    with open('templates/templates-resolved-seq.fa', 'a') as f:
        f.write('\n>%s\n%s\n' % (template_id, resolved_seq))
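
Since this relies on the FASTA header and the PDB file name staying in sync, a quick check before running ensembler align may be useful. The sketch below is not part of Ensembler; read_fasta_ids and find_missing_structures are hypothetical helper names, and it assumes every record in templates-resolved-seq.fa should have a file named <ID>.pdb in templates/structures-resolved, which is the convention the script above follows.

```python
import os


def read_fasta_ids(fasta_path):
    """Return the record IDs (first word after '>') found in a FASTA file."""
    ids = []
    with open(fasta_path) as fasta_file:
        for line in fasta_file:
            line = line.strip()
            # Header lines start with '>'; skip bare '>' lines with no ID
            if line.startswith('>') and len(line) > 1:
                ids.append(line[1:].split()[0])
    return ids


def find_missing_structures(fasta_path, structures_dir):
    """Return FASTA record IDs that have no <ID>.pdb file in structures_dir.

    Assumption: structure files are named exactly <ID>.pdb.
    """
    return [
        template_id
        for template_id in read_fasta_ids(fasta_path)
        if not os.path.isfile(os.path.join(structures_dir, template_id + '.pdb'))
    ]
```

Calling find_missing_structures('templates/templates-resolved-seq.fa', 'templates/structures-resolved') from the project directory should return an empty list if every template sequence has a matching structure file.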

The only obstacle to just passing the PDB to the Ensembler API was its use of SIFTS files, so I will need to write something that extracts the appropriate features from the PDB itself. Let me think about this a bit more and propose something (ultimately I think being able to call gather_templates twice would be good, e.g. first gather from UniProt, then ensembler gather_templates --gather_from manual_pdb --structure_path X).

@danielparton do you have any suggestions? I was wondering a couple of things about the code:

  • what's the significance of the templates-full-seq.fa file? Here I'm skipping writing the full sequence of my manual PDB there, with no problems
  • you remove some residues, particularly mutations, before making the FASTA for templates-resolved-seq.fa. Why is this important? Here I've just pasted the FASTA for my manual structure 'as is', with no problems. In fact I preferred it this way, because when I removed the mutated residues, ensembler align generated an alignment with one residue aligned wrongly (it was an R surrounded by mutations, and there was another R just before the removed mutations, so it was matched to that first R rather than the correct, second one).

@sonyahanson

Sorry for the delay here. I like your script for adding the new templates to the fasta file programmatically. I have just done this manually so far. I have a pretty complete description of how I've been using ensembler in the astrazeneca and dansu-dansu repos.

@rafwiewiora
Member Author

Cool, thanks!
