<a href="https://colab.research.google.com/github/MVolobueva/MTase-classification/blob/main/Classification_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MTase classification
In this notebook we will focus on the pipeline for MTase classification.

*Our classification algorithm uses cat-profiles: HMM profiles from the Pfam and SUPFAM databases that locate catalytic motifs. Cat-profiles allow the regions of the cat- and sam-subdomains and motifs to be detected.*

The pipeline involves three steps:

1. Running the HMMER package
2. Cutting subdomain and motif regions from the profile alignment and the MTase sequence
3. Detecting the class from the motif and subdomain regions

## 0. Clone the GitHub repository

In [15]:
!git clone https://github.com/MVolobueva/MTase-classification.git

Cloning into 'MTase-classification'...
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 48 (delta 9), reused 40 (delta 5), pack-reused 0
Receiving objects: 100% (48/48), 481.92 KiB | 6.88 MiB/s, done.
Resolving deltas: 100% (9/9), done.


## 1. Running the HMMER package

### 1.1 HMMER package installation

In [16]:
!sudo apt-get -y install hmmer

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
hmmer is already the newest version (3.3.2+dfsg-1).
0 upgraded, 0 newly installed, 0 to remove and 9 not upgraded.


### 1.2 HMM profiles search

Create a folder for the results:

In [17]:
!mkdir -p ./results/


**file.stk** - Stockholm-format file with the alignment of the MTase sequences to the HMM profile (written by the `-A` option)

**file.domtbl** - per-domain table of HMM profile hits (written by the `--domtblout` option)
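
The `.domtbl` files are whitespace-separated text with `#` comment lines, so they can be read without extra libraries. The sketch below is a minimal reader, assuming the standard HMMER3 `--domtblout` layout of 22 fixed columns followed by a free-text description. The field labels and the name `parse_domtbl` are this example's own, and the sample row is made up for illustration rather than taken from the notebook's results.

```python
from typing import Dict, Iterable, Iterator, Union

# Labels for the 22 fixed columns of a --domtblout row.
# The names are hypothetical; only the column order follows HMMER3.
DOMTBL_FIELDS = [
    "target_name", "target_acc", "tlen",
    "query_name", "query_acc", "qlen",
    "full_evalue", "full_score", "full_bias",
    "dom_index", "dom_total", "c_evalue", "i_evalue",
    "dom_score", "dom_bias",
    "hmm_from", "hmm_to", "ali_from", "ali_to",
    "env_from", "env_to", "acc",
]
INT_FIELDS = {
    "tlen", "qlen", "dom_index", "dom_total",
    "hmm_from", "hmm_to", "ali_from", "ali_to", "env_from", "env_to",
}
FLOAT_FIELDS = {
    "full_evalue", "full_score", "full_bias",
    "c_evalue", "i_evalue", "dom_score", "dom_bias", "acc",
}

Row = Dict[str, Union[str, int, float]]


def parse_domtbl(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one dict per domain hit, skipping comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on whitespace at most 22 times so the description stays whole.
        parts = line.split(None, len(DOMTBL_FIELDS))
        row: Row = dict(zip(DOMTBL_FIELDS, parts))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        row["description"] = parts[22].strip() if len(parts) > 22 else ""
        yield row


# Made-up sample row (not real output) used only to exercise the parser.
SAMPLE_DOMTBL = """# target name  accession  tlen  query name ...
M.Example - 300 0045988 - 292 1e-50 170.1 0.0 1 1 2e-54 1e-50 169.8 0.0 9 292 20 280 15 285 0.95 made-up description
"""

for hit in parse_domtbl(SAMPLE_DOMTBL.splitlines()):
    print(hit["target_name"], hit["query_name"], hit["ali_from"], hit["ali_to"])
```

With `hmmsearch`, the target is the MTase sequence and the query is the profile, so `ali_from`/`ali_to` are sequence coordinates and `hmm_from`/`hmm_to` are profile coordinates. To read a real file, pass an open file handle, for example `parse_domtbl(open('./results/SUPFAM.domtbl'))`.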


#### 1.2.1 SUPFAM profiles search

In [18]:
!hmmsearch --cpu 3 -E 0.01 --domE 0.01 --incE 0.01 --incdomE 0.01 \
        -o /dev/null --noali -A ./results/SUPFAM.stk --domtblout ./results/SUPFAM.domtbl \
        ./MTase-classification/HMM_profiles/cat_profiles_SUPFAM.hmm ./MTase-classification/Sample_MTases/MTase_sequences.fasta

#### 1.2.2 Pfam profiles search

In [19]:
!hmmsearch --cpu 3 -E 0.01 --domE 0.01 --incE 0.01 --incdomE 0.01 \
        -o /dev/null --noali -A ./results/Pfam.stk --domtblout ./results/Pfam.domtbl \
        ./MTase-classification/HMM_profiles/cat_profiles_Pfam.hmm ./MTase-classification/Sample_MTases/MTase_sequences.fasta
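
The two searches above differ only in the database name, so the command line can be generated from one template. The following sketch builds the same argument list in Python; the function name `build_hmmsearch_cmd` and its parameters are hypothetical, while the flags, thresholds and default paths are copied from the cells above.

```python
import shlex
from typing import List


def build_hmmsearch_cmd(
    db: str,
    profile_dir: str = "./MTase-classification/HMM_profiles",
    fasta: str = "./MTase-classification/Sample_MTases/MTase_sequences.fasta",
    out_dir: str = "./results",
    evalue: float = 0.01,
    cpu: int = 3,
) -> List[str]:
    """Return the hmmsearch argument list for one profile database."""
    threshold = str(evalue)
    return [
        "hmmsearch", "--cpu", str(cpu),
        # Same cut-off for reporting and inclusion, per sequence and per domain.
        "-E", threshold, "--domE", threshold,
        "--incE", threshold, "--incdomE", threshold,
        # Discard the human-readable report; keep alignment and hit table.
        "-o", "/dev/null", "--noali",
        "-A", f"{out_dir}/{db}.stk",
        "--domtblout", f"{out_dir}/{db}.domtbl",
        f"{profile_dir}/cat_profiles_{db}.hmm",
        fasta,
    ]


for db in ("SUPFAM", "Pfam"):
    cmd = build_hmmsearch_cmd(db)
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems if a path contains spaces.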

## 2. Cutting subdomain and motif regions from the profile alignment and MTase sequence

Install the required libraries


In [20]:
!git clone https://github.com/isrusin/etsv
!python3 -m pip install -e etsv

fatal: destination path 'etsv' already exists and is not an empty directory.
Obtaining file:///content/etsv
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: etsv
  Building editable for etsv (pyproject.toml) ... done
  Created wheel for etsv: filename=etsv-0.0.2-0.editable-py3-none-any.whl size=1282 sha256=0b36b219d284303f8c28f6c933b00044c68c151bb8a0c8c2914f7a0da3ff1340
  Stored in directory: /tmp/pip-ephem-wheel-cache-k06awe1q/wheels/54/37/04/f3f5689d9aa16adc2b15b6be1fe70879c9be31cb99bdf04281
Successfully built etsv
Installing collected packages: etsv
  Attempting uninstall: etsv
    Found existing installation: etsv 0.0.2
    Uninstalling etsv-0.0.2:
      Successfully uninstalled etsv-0.0.2
Successfully installed etsv-0.0.2


Cut the sam- and cat-subdomain and motif regions from the alignments.

### 2.1 SUPFAM

In [21]:
!./MTase-classification/Scripts/get_aln_regions.py \
  ./MTase-classification/profile-markup/SMD_regions.tsv \
  ./results/SUPFAM.stk > ./results/region_alignments_SUPFAM.tsv

### 2.2 Pfam

In [22]:
!./MTase-classification/Scripts/get_aln_regions.py \
  ./MTase-classification/profile-markup/PMD_regions.tsv \
  ./results/Pfam.stk > ./results/region_alignments_Pfam.tsv

**region_alignments_SUPFAM.tsv** and **region_alignments_Pfam.tsv** - files with the regions cut from the alignment of the MTase sequences to the cat-profiles

## 3. Detect class

### 3.1 SUPFAM

In [23]:
!mkdir -p ./results/SUPFAM


In [24]:
!python ./MTase-classification/Scripts/classification_SUPFAM.py

Path to dataframe:./results/region_alignments_SUPFAM.tsv
Path to cat-motif from class A:/content/MTase-classification/Scripts/A_class.csv
Path to cat-motif from class B:/content/MTase-classification/Scripts/B_class.csv
Path to cat-motif from class C:/content/MTase-classification/Scripts/C_class.csv
Path to cat-motif from class D:/content/MTase-classification/Scripts/D_class.csv
where to save:./results/SUPFAM


The result is a table that gives the cat-motif coordinates in the column of each class detected by the profiles.


In [25]:
import pandas as pd
pd.read_csv('/content/results/SUPFAM/classes.csv')

Unnamed: 0.1,Unnamed: 0,REBASE_name,A_topology,B_topology,C_topology,D_topology,K_topology,M_topology,L_topology
0,0,M.PvuII,,53-56,,,,,
1,1,M.CcrMI,,31-34,,,,,
2,2,M.HpyAXI,,104-107,,,,,
3,3,M.RsrI,,65-68,,,,,
4,4,M.TthHB8ORF409P,,47-50,,,,,
5,5,M.EcoP15I,,123-126,,,,,
6,6,M1.HpyAVI,,29-32,,,,,
7,7,M1.BcnI,,,,,,,156-159
8,8,M.Mbo45V,487-490,123-126,,,,,
9,9,M.CagIX,,,,,,,247-250


The coordinates of the subdomains and motifs can be found in the table *./results/region_alignments_SUPFAM.tsv*.

Two cat-motifs are reported for MTase M.Mbo45V: 487-490 (class A) and 123-126 (class B).
However, the cat-subdomain sequences aligned to the cat-profiles from group A are reduced, so M.Mbo45V has only one functional cat-domain, from class B. This example shows that the final class should be confirmed by expert judgment.
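
Cases like M.Mbo45V can be picked out automatically before manual inspection. The sketch below flags every enzyme with more than one filled `*_topology` column. It uses only the standard `csv` module and three rows copied from the table above; the helper names `detected_classes` and `needs_review` are hypothetical and not part of the repository scripts.

```python
import csv
import io
from typing import Dict, TextIO

# Three rows copied from the classes.csv output shown above.
SAMPLE_CLASSES = """Unnamed: 0.1,Unnamed: 0,REBASE_name,A_topology,B_topology,C_topology,D_topology,K_topology,M_topology,L_topology
0,0,M.PvuII,,53-56,,,,,
7,7,M1.BcnI,,,,,,,156-159
8,8,M.Mbo45V,487-490,123-126,,,,,
"""

SUFFIX = "_topology"


def detected_classes(row: Dict[str, str]) -> Dict[str, str]:
    """Map class letter -> cat-motif coordinates for the non-empty columns."""
    return {
        column[: -len(SUFFIX)]: value
        for column, value in row.items()
        if column.endswith(SUFFIX) and value
    }


def needs_review(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Return the enzymes for which more than one class was detected."""
    flagged = {}
    for row in csv.DictReader(handle):
        classes = detected_classes(row)
        if len(classes) > 1:
            flagged[row["REBASE_name"]] = classes
    return flagged


print(needs_review(io.StringIO(SAMPLE_CLASSES)))
# {'M.Mbo45V': {'A': '487-490', 'B': '123-126'}}
```

The flag only marks candidates for review; deciding which cat-domain is functional still requires looking at the subdomain alignments.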


In [31]:
dt = pd.read_csv('./results/region_alignments_SUPFAM.tsv', sep='\t')
dt[dt['REBASE_name'] == 'M.Mbo45V']

Unnamed: 0,#:Hit_ID,REBASE_name,Model_ID,Region_name,Alignment_coords,Region_coords,Region_coords_HMM,Alignment_frags
36,M.Mbo45V:0045988:87-467,M.Mbo45V,45988,sam_subdom,87-467,"87-113,400-466","9-30,234-292","NKPTNTLIIGENY..DA.LKNLIViesqsE,AKPVELIKLLIKLH...."
37,M.Mbo45V:0045988:87-467,M.Mbo45V,45988,cat_subdom,87-467,114-237,31-150,TvnYDVIYIDPPYNTESslsdgnnl.........sekddvgssK.F...
38,M.Mbo45V:0045988:87-467,M.Mbo45V,45988,sam_motif,87-467,422-429,256-263,DFYAGSGT
39,M.Mbo45V:0045988:87-467,M.Mbo45V,45988,cat_motif,87-467,123-126,38-41,DPPY
228,M.Mbo45V:0036976:88-486,M.Mbo45V,36976,sam_subdom,88-486,"88-113,400-466","1-22,191-249","-KPTNTLIIGENY.....DALK.NLiviesQS...E,AKPVELIKL..."
229,M.Mbo45V:0036976:88-486,M.Mbo45V,36976,cat_subdom,88-486,116-237,23-129,NYDVIYIDPPYNTE......SSLsdgnnls...................
230,M.Mbo45V:0036976:88-486,M.Mbo45V,36976,sam_motif,88-486,422-429,213-220,DFYAGSGT
231,M.Mbo45V:0036976:88-486,M.Mbo45V,36976,cat_motif,88-486,123-126,30-33,DPPY
428,M.Mbo45V:0037952:89-439,M.Mbo45V,37952,sam_subdom,89-439,"89-115,400-439","1-22,196-254","--PTNTLIIGENY.....DAL.KNLiviesqsET...V,AKPVELI..."
429,M.Mbo45V:0037952:89-439,M.Mbo45V,37952,cat_subdom,89-439,116-237,23-124,NYDVIYIDPPYNTESSLsdgnnlsek.......................
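
The `Region_coords` and `Region_coords_HMM` columns hold one or more `start-end` segments separated by commas, for example `87-113,400-466` for a sam-subdomain that is split in two. A small helper, sketched here with the hypothetical names `parse_region_coords` and `region_length`, turns such a string into integer pairs. It assumes the coordinates are inclusive, which matches the four-residue motif `DPPY` at `123-126`.

```python
from typing import List, Tuple


def parse_region_coords(value: str) -> List[Tuple[int, int]]:
    """Parse 'a-b,c-d' into [(a, b), (c, d)]; an empty string gives []."""
    segments = []
    for chunk in value.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        start, end = chunk.split("-")
        segments.append((int(start), int(end)))
    return segments


def region_length(value: str) -> int:
    """Total number of positions covered, treating both ends as inclusive."""
    return sum(end - start + 1 for start, end in parse_region_coords(value))


print(parse_region_coords("87-113,400-466"))  # [(87, 113), (400, 466)]
print(region_length("123-126"))               # 4
```

When the table is loaded with pandas, empty cells arrive as `NaN` rather than as strings, so they would need to be replaced (for instance with `fillna('')`) before calling the helper.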


No class is assigned to MTase M.BceJII because its sam-subdomains are reduced. The only way to determine its class is through expert evaluation of the predicted structure.

In [33]:
dt = pd.read_csv('./results/region_alignments_SUPFAM.tsv', sep='\t')
dt[dt['REBASE_name'] == 'M.BceJII']

Unnamed: 0,#:Hit_ID,REBASE_name,Model_ID,Region_name,Alignment_coords,Region_coords,Region_coords_HMM,Alignment_frags
88,M.BceJII:0045988:60-135,M.BceJII,45988,sam_subdom,60-135,60-66,"9-30,234-292","-------------..--.--ALPHa...dG,--------------...."
89,M.BceJII:0045988:60-135,M.BceJII,45988,cat_subdom,60-135,67-135,31-150,S..FKLVVFDPPHLERAgpr....................swlR.A...
90,M.BceJII:0045988:60-135,M.BceJII,45988,sam_motif,60-135,,256-263,--------
91,M.BceJII:0045988:60-135,M.BceJII,45988,cat_motif,60-135,74-77,38-41,DPPH
280,M.BceJII:0036976:64-132,M.BceJII,36976,sam_subdom,64-132,64-66,"1-22,191-249","-------------.....----.--.....AD...G,---------..."
281,M.BceJII:0036976:64-132,M.BceJII,36976,cat_subdom,64-132,67-132,23-129,SFKLVVFDPPHLER......AGPr.........................
282,M.BceJII:0036976:64-132,M.BceJII,36976,sam_motif,64-132,,213-220,--------
283,M.BceJII:0036976:64-132,M.BceJII,36976,cat_motif,64-132,74-77,30-33,DPPH
488,M.BceJII:0037952:64-129,M.BceJII,37952,sam_subdom,64-129,64-66,"1-22,196-254","-------------.....---.---.......AD...G,-------..."
489,M.BceJII:0037952:64-129,M.BceJII,37952,cat_subdom,64-129,67-129,23-124,SFKLVVFDPPHLERAGPrsw.............................
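
One way to quantify how reduced a region is would be to measure which share of the profile positions is actually occupied by a residue. The sketch below is such a heuristic and not the criterion used by the classification scripts. It assumes the usual HMMER alignment convention in `Alignment_frags`: uppercase letters are residues aligned to profile match positions, `-` marks a deleted profile position, and lowercase letters and `.` belong to insertions. The function name `match_coverage` is hypothetical.

```python
def match_coverage(fragment: str) -> float:
    """Fraction of profile match positions occupied by a residue.

    Uppercase letters count as occupied positions and '-' as deleted ones;
    lowercase insert residues, '.' padding and ',' separators are ignored.
    """
    occupied = sum(1 for ch in fragment if ch.isupper())
    deleted = fragment.count("-")
    total = occupied + deleted
    return occupied / total if total else 0.0


# sam_motif fragments copied from the two tables above.
print(match_coverage("DFYAGSGT"))  # M.Mbo45V: 1.0
print(match_coverage("--------"))  # M.BceJII: 0.0
```

The fragments printed by pandas above are truncated with `...`, so the full strings from the TSV file should be used when applying this to subdomains.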


### 3.2 Pfam


In [26]:
!mkdir ./results/Pfam

In [28]:
!python ./MTase-classification/Scripts/Classification_Pfam.py

Path to dataframe:./results/region_alignments_Pfam.tsv
where to save:./results/Pfam


Classes detected by the cat-profiles from Pfam. The resulting table gives the cat-motif coordinates in the column of each class detected by the profiles.

In [29]:
import pandas as pd
pd.read_csv('/content/results/Pfam/classes_Pfam.csv')

Unnamed: 0.1,Unnamed: 0,REBASE_name,E_topology,F_topology,G_topology
0,0,M.Rsp105IV,86-89,,
1,1,M.SsoI,,140-143,
2,2,M.AvaV,,,16-19
3,3,M.MunI,,,42-45
