### Introduction

##### Scientific Question: Since the research shows that the MyD88-dependent TLR signaling pathway is most likely to be functional in terms of Mycobacterium Vaccae vaccine effect, is it possible for Mycobacterium Vaccae to have the same effect of stimulating human TLR signaling pathway?

##### Scientific Hypothesis: If the TLR receptor of humans is structurally similar to the TLR receptor of mice, Mycobacterium Vaccae vaccine might have the same effect of stimulating TLR signaling pathway in humans.

### Loading in Packages 1

In [27]:
pip install logomaker

Collecting logomaker
  Downloading logomaker-0.8-py2.py3-none-any.whl (11.8 MB)
[K     |████████████████████████████████| 11.8 MB 5.2 MB/s eta 0:00:01
Installing collected packages: logomaker
Successfully installed logomaker-0.8
Note: you may need to restart the kernel to use updated packages.


In [51]:
# Import libraries for Pairwise Sequence
from Bio import pairwise2
from Bio.Seq import Seq
from Bio.pairwise2 import format_alignment
from Bio import SeqIO

# Import libraries for logomaker
import logomaker as lm

# Import libraries for PDB
from Bio.PDB import *
import nglview as nv

from Bio.SeqUtils.ProtParam import ProteinAnalysis

In [127]:
# Creating sample sequences
seq1 = SeqIO.read('gene.fna', 'fasta')
seq2 = SeqIO.read('gene.fna2', 'fasta')
print(seq1)
print(seq2)
print(len(seq1.seq))
print(len(seq2.seq))

ID: NC_000004.12:153684280-153710637
Name: NC_000004.12:153684280-153710637
Description: NC_000004.12:153684280-153710637 TLR2 [organism=Homo sapiens] [GeneID=7097] [chromosome=4]
Number of features: 0
Seq('AGGCCGGCTCGGAGGCAGCGAGAAAGCGCAGCCAGGCGGCTGCTCGGCGTTCTC...TGA')
ID: NC_000069.7:c83749045-83743579
Name: NC_000069.7:c83749045-83743579
Description: NC_000069.7:c83749045-83743579 Tlr2 [organism=Mus musculus] [GeneID=24088] [chromosome=3]
Number of features: 0
Seq('GCCTGCCCTGTGGCTCCTGCCAGCTCTGATGCCAGGCTCCGTTCCCCTTGCAGA...GGA')
26358
5467


### Performing Bioinformatics Analysis 1: Pairwise Sequence

In [20]:
# Finding similarities
alignments = pairwise2.align.globalxx(seq1, seq2)

# Showing results
print(format_alignment(*alignment))

ID: NC_0000-04.12:15368--428-0-1-5--37106-3-7-
Name: NC_000004--.12-:1536-8--428-0-1-5--37106-3-7-
Description: NC_000004--.12-:1536-8--428-0-1-5--37106-3-7- TLR--2 [organism=Homo--- --sapien----s] [GeneID=7--097--] [chromosome=4-]
Number of features: 0
Seq('AGGCCG-GCTCG--GA-GGCAG-CGAGAAAGC-GC-AGC-CA-G--GCGGCT-G-CTCGGCGTTC---T--C---...TG-A')
||||||||||| . |. |   .|  |   ||  || ||    | | |||||||||| ||||   |   |     |  |   ||  || ||    | | ||||||||||||||||| ||||   |   |     |  |   ||  || ||    | | ||    ||||||||||||       |  |         |||||||||||   |    ||||||||||||||  |||||||||||||||||||||||||||||  |||  || |   |  |||   |       | || ||| |  |  ||  |  | |||  |||||   |  |   ||| | |||
ID: NC_000069-.7-:---c8374--90--45-837---43579
Name: NC_-0000-69.--7:----c8374--90--45-837---43579
Description: NC_-0000-69.--7:----c8374--90--45-837---43579 T--lr2 [organism=----Mus mus-----culus] [GeneID=-240--88] [chromosome=-3]
Number of features: 0
Seq('--GCC-TGC-C-CTG-TGGC--TC-------CTGCCAGCTC-TGATGC--C-A

### Plotting the Results 1: Sequence Logos

In [116]:
# create counts matrix
    for align in alignments 
        seq = align.seq
        seq_str = str(seq)
        Multi_seq.append(seq_str)

#Multi_counts_df = lm.alignment_to_matrix(sequences=list1, to_type='counts', characters_to_ignore='.-X')

# preview counts dataframe
#Multi_counts_df.head()

# show full Multi counts
#lm.Logo(Multi_counts_df)

IndentationError: unexpected indent (2114116777.py, line 2)

### Performing Bioinformatics Analysis 2: Homology Sequence 

In [101]:
# Homology Sequence of Human TLR. 

# First, create a PDBParser object:
parser = PDBParser()
# Then, create a structure object from a PDB file in the following way (the PDB file in this case is called 1FAT.pdb, PHA-L is a user defined name for the structure):
Human_structure = parser.get_structure("Human_TLR2", "Human.pdb")
# View Biopython structural entities
view = nv.show_biopython(Human_structure)
view

NGLWidget()

https://swissmodel.expasy.org/templates/6nig.1

In [118]:
# Homology Sequence of Mouse TLR. 

# First, create a PDBParser object:
parser = PDBParser()
# Then, create a structure object from a PDB file in the following way (the PDB file in this case is called 1FAT.pdb, PHA-L is a user defined name for the structure):
Mouse_structure = parser.get_structure("Mouse_TLR2", "Mouse.pdb")
# View Biopython structural entities
view = nv.show_biopython(Mouse_structure)
view

NGLWidget()

https://swissmodel.expasy.org/assess/6KZaUV/02

Biounit Oligo State	Homo-dimer

QSQE	0.50

Method	X-ray, 2.35 Å

Seq Similarity	0.48

Coverage	0.69

Range	27-544

### Plotting the Results: 3D Protein Measurement

In [119]:
# Polypeptide of Human sequence
polypeptide_builder = CaPPBuilder()
counter = 1
for polypeptide in polypeptide_builder.build_peptides(Human_structure):
    Human_seq = polypeptide.get_sequence()
    print(f"Sequence: {counter}, Length: {len(Human_seq)}")
    print(Human_seq)
    counter += 1

Sequence: 1, Length: 218
SLSCDRNGICKGSSGSLNSIPSGLTEAVKSLDLSNNRITYISNSDLQRCVNLQALVLTSNGINTIEEDSFSSLGSLEHLDLSYNYLSNLSSSWFKPLSSLTFLNLLGNPYKTLGETSLFSHLTKLQILRVGNMDTFTKIQRKDFAGLTFLEELEIDASDLQSYEPKSLKSIQNVSHLILHMKQHILLLEIFVDVTSSVECLELRDTDLDTFHFSELST
Sequence: 2, Length: 328
NSLIKKFTFRNVKITDESLFQVMKLLNQISGLLELEFDDCTLNGVGNFRASDNDRVIDPGKVETLTIRRLHIPRFYLFYDLSTLYSLTERVKRITVENSKVFLVPCLLSQHLKSLEYLDLSENLMVEEYLKNSACEDAWPSLQTLILRQNHLASLEKTGETLLTLKNLTNIDISKNSFHSMPETCQWPEKMKYLNLSSTRIHSVTGCIPKTLEILDVSNNNLNLFSLNLPQLKELYISRNKLMTLPDASLLPMLLVLKISRNQLKSVPDGIFDRLTSLQKIWLHTNPWDCSCPRIDYLSRWLNKNSQKEQGSAKCSGSGKPVRSIICP
Sequence: 3, Length: 273
SLSCDRNGICKGSSGSLNSIPSGLTEAVKSLDLSNNRITYISNSDLQRCVNLQALVLTSNGINTIEEDSFSSLGSLEHLDLSYNYLSNLSSSWFKPLSSLTFLNLLGNPYKTLGETSLFSHLTKLQILRVGNMDTFTKIQRKDFAGLTFLEELEIDASDLQSYEPKSLKSIQNVSHLILHMKQHILLLEIFVDVTSSVECLELRDTDLDTFHFSELSTGETNSLIKKFTFRNVKITDESLFQVMKLLNQISGLLELEFDDCTLNGVGNFRASD
Sequence: 4, Length: 272
IDPGKVETLTIRRLHIPRFYLFYDLSTLYSLTERVKRITVENSKVFLVPCLLSQHLKSLEYLDLSENLMVEEYLKNSA

In [122]:
analyzed_seq = ProteinAnalysis(str(Human_seq))

In [123]:
# Analyzing Human amino acid sequence
analyzed_seq.get_amino_acids_percent()

{'A': 0.01838235294117647,
 'C': 0.029411764705882353,
 'D': 0.04044117647058824,
 'E': 0.05514705882352941,
 'F': 0.022058823529411766,
 'G': 0.025735294117647058,
 'H': 0.022058823529411766,
 'I': 0.0625,
 'K': 0.07352941176470588,
 'L': 0.17279411764705882,
 'M': 0.01838235294117647,
 'N': 0.0661764705882353,
 'P': 0.05514705882352941,
 'Q': 0.03308823529411765,
 'R': 0.04779411764705882,
 'S': 0.10661764705882353,
 'T': 0.0625,
 'V': 0.04044117647058824,
 'W': 0.01838235294117647,
 'Y': 0.029411764705882353}

In [105]:
# hexlix, turn, sheet
analyzed_seq.secondary_structure_fraction()

(0.34558823529411764, 0.2536764705882353, 0.2647058823529412)

In [106]:
# hydrophobicity
analyzed_seq.gravy()

-0.22573529411764706

In [120]:
# Polypeptide of Mouse sequence
polypeptide_builder = CaPPBuilder()
counter = 1
for polypeptide in polypeptide_builder.build_peptides(Mouse_structure):
    Mouse_seq = polypeptide.get_sequence()
    print(f"Sequence: {counter}, Length: {len(Mouse_seq)}")
    print(Mouse_seq)
    counter += 1

Sequence: 1, Length: 518
SLSCDASGVCDGRSRSFTSIPSGLTAAMKSLDLSFNKITYIGHGDLRACANLQVLMLKSSRINTIEGDAFYSLGSLEHLDLSDNHLSSLSSSWFGPLSSLKYLNLMGNPYQTLGVTSLFPNLTNLQTLRIGNVETFSEIRRIDFAGLTSLNELEIKALSLRNYQSQSLKSIRDIHHLTLHLSESAFLLEIFADILSSVRYLELRDTNLARFQFSPLPVDEVSSPMKKLAFRGSVLTDESFNELLKLLRYILELSEVEFDDCTLNGLGDFNPSESDVVSELGKVETVTIRRLHIPQFYLFYDLSTVYSLLEKVKRITVENSKVFLVPCSFSQHLKSLEFLDLSENLMVEEYLKNSACKGAWPSLQTLVLSQNHLRSMQKTGEILLTLKNLTSLDISRNTFHPMPDSCQWPEKMRFLNLSSTGIRVVKTCIPQTLEVLDVSNNNLDSFSLFLPRLQELYISRNKLKTLPDASLFPVLLVMKIRENAVSTFSKDQLGSFPKLETLEAGDNHFVCSCELLSF


In [125]:
analyzed_seq = ProteinAnalysis(str(Mouse_seq))

In [126]:
# Analyzing Mouse amino acids sequence
analyzed_seq.get_amino_acids_percent()

{'A': 0.032818532818532815,
 'C': 0.019305019305019305,
 'D': 0.05019305019305019,
 'E': 0.06177606177606178,
 'F': 0.055984555984555984,
 'G': 0.04054054054054054,
 'H': 0.021235521235521235,
 'I': 0.0444015444015444,
 'K': 0.04826254826254826,
 'L': 0.17374517374517376,
 'M': 0.017374517374517374,
 'N': 0.05405405405405406,
 'P': 0.03667953667953668,
 'Q': 0.02895752895752896,
 'R': 0.04633204633204633,
 'S': 0.12934362934362933,
 'T': 0.055984555984555984,
 'V': 0.05405405405405406,
 'W': 0.005791505791505791,
 'Y': 0.023166023166023165}

In [111]:
# hexlix, turn, sheet
analyzed_seq.secondary_structure_fraction()

(0.3571428571428571, 0.26061776061776065, 0.2857142857142857)

In [112]:
# hydrophobicity
analyzed_seq.gravy()

-0.01583011583011585

### Analyzing the Results

In the pairwise alignment score system, “+1” for a match and “0” for a mismatch between the pair of sequences at any given site of comparison. An optimal alignment is an alignment giving the highest score. The pairwise alignment score for human and mouse TLR gene is 187, which is comparatively low compared to the length of mouse TLR gene sequence, which is 5467. Thus the results show a low similarity between human and mouse TLR gene sequence.
Homology modeling refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. By using SWISS Model, we got that the sequence similarity is 0.48 and the coverage is 0.69. Then we use 3D protein measurements to know the specific properties of human and mouse TLR protein. The significant gaps in amino acid sequence percent, structure fraction, and hydrophocbicity between human and mouse TLR protein all demonstrate the different properties between the two proteins. Thus it is likely that Mycobacterium Vaccae vaccine would not have the same effect of stimulating TLR signaling pathway in humans. 