In [1]:
%load_ext autoreload
%autoreload 2

In [3]:
import molsysmt as msm

# How to convert a form into another form

The meaning of a molecular system 'form', in the context of MolSysMT, was described previously in the section XXX. MolSysMT provides a method to convert one form into another: `molsysmt.convert()`. This method is the keystone of the library, the hinge on which all other methods and tools in MolSysMT turn. It is also the joining piece that connects the pipes of your workflow when you use different Python libraries.

The method `molsysmt.convert()` requires at least two input arguments: the original item, in any form accepted by MolSysMT (see XXX), and the name of the output form:

In [4]:
molecular_system = msm.convert('mmtf:1tcd', 'molsysmt.MolSys')

The entry with id code `1tcd`, fetched in MMTF format from the Protein Data Bank, is converted into a native `molsysmt.MolSys` Python object. At this point you may think that this operation could also be done with the method `molsysmt.load()`, and you are right: `molsysmt.load()` is nothing but an alias of `molsysmt.convert()`. Although redundant, a loading method was included in MolSysMT for the sake of intuitive usability. It could be removed from the library, since `molsysmt.convert()` provides the same functionality.
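
To make the idea of a single conversion entry point more concrete, here is a minimal sketch of one possible design: a registry of converter functions keyed by the pair (input form, output form), with a loader defined as a plain alias. This is not MolSysMT's implementation. Every name in it (`register`, `get_form`, `convert`, `load`, and the toy forms `'string'` and `'list'`) is hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: none of these names belong to MolSysMT.
from typing import Any, Callable, Dict, Tuple

# Registry mapping (input form, output form) to a converter function.
_CONVERTERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register(from_form: str, to_form: str):
    """Decorator that stores a converter under its (from, to) pair."""
    def decorator(func):
        _CONVERTERS[(from_form, to_form)] = func
        return func
    return decorator


def get_form(item: Any) -> str:
    """Guess the form of an item (toy rules, for this sketch only)."""
    if isinstance(item, str):
        return "string"
    if isinstance(item, list):
        return "list"
    raise TypeError(f"unknown form for {type(item).__name__}")


def convert(item: Any, to_form: str) -> Any:
    """Look up the converter for (form of item, to_form) and apply it."""
    from_form = get_form(item)
    if from_form == to_form:
        return item
    try:
        converter = _CONVERTERS[(from_form, to_form)]
    except KeyError:
        raise NotImplementedError(
            f"no conversion from {from_form!r} to {to_form!r}"
        ) from None
    return converter(item)


@register("string", "list")
def _string_to_list(item: str) -> list:
    return list(item)


@register("list", "string")
def _list_to_string(item: list) -> str:
    return "".join(item)


# A loader can then be a plain alias of the converter.
load = convert
```

With a layout like this, supporting a new form only means registering new converter functions; the entry point itself does not change.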

The following cells illustrate some conversions you can do with `molsysmt.convert()`:

In [5]:
msm.convert('pdb:1sux', '1sux.pdb') # fetching a pdb file to save it locally

In [6]:
msm.convert('mmtf:1sux', '1sux.mmtf') # fetching an mmtf to save it locally

In [7]:
molecular_system = msm.convert('1tcd.pdb', 'mdtraj.Trajectory') # loading a pdb file as an mdtraj.Trajectory object

In [8]:
seq_aa3 = msm.convert(molecular_system, 'aminoacids3:seq')

In [9]:
print(seq_aa3)

aminoacids3:LysProGlnProIleAlaAlaAlaAsnTrpLysCysAsnGlySerGluSerLeuLeuValProLeuIleGluThrLeuAsnAlaAlaThrPheAspHisAspValGlnCysValValAlaProThrPheLeuHisIleProMetThrLysAlaArgLeuThrAsnProLysPheGlnIleAlaAlaGlnAsnAlaIleThrArgSerGlyAlaPheThrGlyGluValSerLeuGlnIleLeuLysAspTyrGlyIleSerTrpValValLeuGlyHisSerGluArgArgLeuTyrTyrGlyGluThrAsnGluIleValAlaGluLysValAlaGlnAlaCysAlaAlaGlyPheHisValIleValCysValGlyGluThrAsnGluGluArgGluAlaGlyArgThrAlaAlaValValLeuThrGlnLeuAlaAlaValAlaGlnLysLeuSerLysGluAlaTrpSerArgValValIleAlaTyrGluProValTrpAlaIleGlyThrGlyLysValAlaThrProGlnGlnAlaGlnGluValHisGluLeuLeuArgArgTrpValArgSerLysLeuGlyThrAspIleAlaAlaGlnLeuArgIleLeuTyrGlyGlySerValThrAlaLysAsnAlaArgThrLeuTyrGlnMetArgAspIleAsnGlyPheLeuValGlyGlyAlaSerLeuLysProGluPheValGluIleIleGluAlaThrLysSerLysProGlnProIleAlaAlaAlaAsnTrpLysCysAsnGlySerGluSerLeuLeuValProLeuIleGluThrLeuAsnAlaAlaThrPheAspHisAspValGlnCysValValAlaProThrPheLeuHisIleProMetThrLysAlaArgLeuThrAsnProLysPheGlnIleAlaAlaGlnAsnAlaIleThrArgSerGlyAlaPheThrGlyGluValSerLeuGlnIleL

In [10]:
seq_aa1 = msm.convert(seq_aa3, 'aminoacids1:seq')

In [11]:
print(seq_aa1)

aminoacids1:KPQPIAAANWKCNGSESLLVPLIETLNAATFDHDVQCVVAPTFLHIPMTKARLTNPKFQIAAQNAITRSGAFTGEVSLQILKDYGISWVVLGHSERRLYYGETNEIVAEKVAQACAAGFHVIVCVGETNEEREAGRTAAVVLTQLAAVAQKLSKEAWSRVVIAYEPVWAIGTGKVATPQQAQEVHELLRRWVRSKLGTDIAAQLRILYGGSVTAKNARTLYQMRDINGFLVGGASLKPEFVEIIEATKSKPQPIAAANWKCNGSESLLVPLIETLNAATFDHDVQCVVAPTFLHIPMTKARLTNPKFQIAAQNAITRSGAFTGEVSLQILKDYGISWVVLGHSERRLYYGETNEIVAEKVAQACAAGFHVIVCVGETNEEREAGRTAAVVLTQLAAVAQKLSKEAWSRVVIAYEPVWAIGTGKVATPQQAQEVHELLRRWVRSKLGTDIAAQLRILYGGSVTAKNARTLYQMRDINGFLVGGASLKPEFVEIIEATKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
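
The step from three-letter to one-letter codes can be illustrated with a small standalone sketch. It is not the code MolSysMT runs internally: the function name `three_to_one` and the choice to map unrecognized codes to `X` are assumptions made for this example. It also expects the bare sequence, without the `aminoacids3:` prefix.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(sequence: str, unknown: str = "X") -> str:
    """Translate a concatenated three-letter sequence into one-letter codes.

    Codes that are not in THREE_TO_ONE are replaced by `unknown`
    (an assumption of this sketch).
    """
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Split the string into consecutive chunks of three characters.
    codes = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    # capitalize() normalizes 'LYS' or 'lys' to 'Lys' before the lookup.
    return "".join(THREE_TO_ONE.get(code.capitalize(), unknown) for code in codes)


print(three_to_one("LysProGlnProIleAla"))  # prints KPQPIA
```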


The conversion can be applied to the entire system or to a part of it. The input argument `selection` works with most MolSysMT methods, including `molsysmt.convert()`. To learn more about how to perform selections, see the section of this documentation entitled "XXX". For now, let's look at some simple selections to see how it operates:

In [12]:
system_molsysmt = msm.convert('1tcd.mmtf', to_form='molsysmt.DataFrame', selection='molecule.type=="protein"')
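
Conceptually, a selection such as `molecule.type=="protein"` keeps only the atoms whose attribute matches the requested value. The following standalone sketch mimics that idea on a toy list of atom records. It does not use MolSysMT's selection engine: the function `select`, the toy data, and the restriction to a single `field=="value"` expression are all assumptions of this example.

```python
def select(atoms, selection):
    """Return the atom records that satisfy one `field=="value"` expression."""
    field, separator, value = selection.partition("==")
    if not separator:
        raise ValueError("expected an expression of the form field==\"value\"")
    field = field.strip()
    value = value.strip().strip("\"'")  # drop surrounding quotes
    return [atom for atom in atoms if atom.get(field) == value]


# Toy atom table (hypothetical data, loosely following the column names used here).
atoms = [
    {"atom.name": "N", "group.name": "LYS", "molecule.type": "protein"},
    {"atom.name": "CA", "group.name": "LYS", "molecule.type": "protein"},
    {"atom.name": "O", "group.name": "HOH", "molecule.type": "water"},
]

protein_atoms = select(atoms, 'molecule.type=="protein"')
print([atom["atom.name"] for atom in protein_atoms])  # prints ['N', 'CA']
```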

In [17]:
msm.get(system_molsysmt, target='system', form=True)

'molsysmt.DataFrame'

In [18]:
system_molsysmt

Unnamed: 0,atom.index,atom.name,atom.id,atom.type,atom.formal_charge,atom.bonded_atom_indices,group.index,group.name,group.id,group.type,...,molecule.id,molecule.type,entity.index,entity.name,entity.id,entity.type,bioassembly.index,bioassembly.name,bioassembly.id,bioassembly.type
0,0,N,1,N,0.0,[1],0,LYS,4,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
1,1,CA,2,C,0.0,"[0, 2, 4]",0,LYS,4,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
2,2,C,3,C,0.0,"[1, 3, 9]",0,LYS,4,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
3,3,O,4,O,0.0,[2],0,LYS,4,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
4,4,CB,5,C,0.0,"[1, 5]",0,LYS,4,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3813,3813,CG,3814,C,0.0,"[3812, 3814]",496,LYS,251,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
3814,3814,CD,3815,C,0.0,"[3813, 3815]",496,LYS,251,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
3815,3815,CE,3816,C,0.0,"[3814, 3816]",496,LYS,251,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
3816,3816,NZ,3817,N,1.0,[3815],496,LYS,251,aminoacid,...,,protein,0,TRIOSEPHOSPHATE ISOMERASE,,protein,0,1,,
