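Welcome to the tutorial on tryptic peptides and their role in mass spectrometry! In this session, we’ll explore how proteases such as trypsin cut proteins into peptides, a key sample-preparation step in mass spectrometry-based proteomics. Because trypsin cleaves at predictable sites (after lysine, K, or arginine, R, unless the next residue is proline, P), we can predict which peptides a protein should produce and compare them with what the instrument measures. Matching predicted and measured peptides is how we identify the proteins in a sample, which in turn helps us study their functions in biological systems.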
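A mass spectrometer identifies peptides from their measured masses (strictly, their mass-to-charge ratios), so it helps to see how a peptide's mass follows directly from its sequence. The sketch below sums standard monoisotopic residue masses and adds one water molecule for the free termini. It assumes an unmodified, uncharged peptide (no oxidation, no cysteine alkylation, no added protons), raises a KeyError for non-standard letters such as X or U, and the name peptide_mass is a hypothetical helper, not part of any library.

```python
# Monoisotopic residue masses in daltons (I and L are isobaric)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O that completes the free N- and C-termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide (hypothetical helper)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


print(round(peptide_mass("PEPTIDEK"), 4))
```

Once the digest below has run, the same helper could be mapped over the peptide list to get the list of masses an instrument would be expected to observe for this protein.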


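Let’s try it together! Download the FASTA sequence of P00533-1 (isoform 1 of the human epidermal growth factor receptor, EGFR) from UniProt.org and save it as /content/P00533-1.fasta, the path the code below expects. Then use our script to find out which peptides a trypsin digest should produce. How many peptides does the tool generate? What does this tell us about the protein?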
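Before digesting the whole protein, it can help to check the cleavage rule the script implements (cut after K or R, except when the next residue is P) on a short, made-up sequence. This sketch expresses the same rule as a regular expression with Python's re module; the test sequence and the name digest_regex are illustrative only, and splitting on a zero-width match requires Python 3.7 or later.

```python
import re

# Zero-width split point: preceded by K or R, and not followed by P
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def digest_regex(sequence):
    """Split at trypsin sites; drop the empty piece left after a C-terminal K/R."""
    return [piece for piece in TRYPSIN_SITE.split(sequence) if piece]


print(digest_regex("PEPTIDEKAARPGGKLLR"))
# ['PEPTIDEK', 'AARPGGK', 'LLR']
```

Note that "AARPGGK" stays in one piece: the R is followed by P, so the proline rule blocks that cut.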

In [None]:
# Count the tryptic peptides in protein P00533-1

In [None]:
# Open the FASTA file that holds the protein sequence
fasta = open('/content/P00533-1.fasta', 'r')

In [None]:
# Skip the first line (the '>' header) and keep the remaining sequence lines as a list
prot_seq = fasta.readlines()[1:]
fasta.close()  # release the file handle once its contents have been read

In [None]:
# Join the sequence lines into a single string
proteinSequence = ''.join(prot_seq)

In [None]:
# Remove the newline characters left over from the file's line breaks
proteinSequence = proteinSequence.replace("\n", "")
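The cells above keep everything after the first line, which works for a file holding a single record like this one. If a file contains several records (for example, several isoforms downloaded together), the later header lines would end up inside the sequence. A slightly more defensive sketch, assuming standard FASTA where each record starts with a '>' line, parses every record into a dictionary keyed by header; parse_fasta and read_fasta are hypothetical helper names.

```python
def parse_fasta(lines):
    """Return {header: sequence} for every record in an iterable of FASTA lines."""
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()            # drop the newline and stray whitespace
        if not line:
            continue                   # skip blank lines
        if line.startswith(">"):
            if header is not None:     # close the previous record
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:             # close the final record
        records[header] = "".join(chunks)
    return records


def read_fasta(path):
    """Open a FASTA file with a context manager so it is always closed."""
    with open(path, "r") as handle:
        return parse_fasta(handle)
```

In this notebook, records = read_fasta('/content/P00533-1.fasta') followed by proteinSequence = next(iter(records.values())) would give the same single sequence as the cells above.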

In [None]:
# Function to obtain tryptic peptides
def getTrypticPeptides(proteinSequence):
    """Cleave after every K or R that is not followed by P (no missed cleavages)."""
    peptides = []        # collected tryptic peptides
    last_broken_at = 0   # start index of the peptide currently being built
    # Stop one residue short of the end so that next_amino_acid always exists
    for aa_position in range(0, len(proteinSequence) - 1):
        amino_acid = proteinSequence[aa_position]
        next_amino_acid = proteinSequence[aa_position + 1]
        if (amino_acid == "K" or amino_acid == "R") and next_amino_acid != "P":
            peptide = proteinSequence[last_broken_at:aa_position + 1]
            peptides.append(peptide)
            last_broken_at = aa_position + 1
    # The loop never appends the C-terminal peptide after the last cut, so add it here
    if last_broken_at < len(proteinSequence):
        peptides.append(proteinSequence[last_broken_at:])
    return peptides
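
The function above assumes a complete digest. In practice trypsin occasionally skips a site, and database search tools usually let you allow a small number of missed cleavages (often one or two). The sketch below, using the hypothetical names tryptic_cleave and with_missed_cleavages, first performs a complete digest (including the C-terminal peptide) and then joins up to max_missed consecutive peptides.

```python
def tryptic_cleave(sequence):
    """Complete digest: cut after K or R unless the next residue is P."""
    peptides = []
    start = 0
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def with_missed_cleavages(sequence, max_missed=1):
    """Peptides spanning 0..max_missed uncut sites, in N- to C-terminal order."""
    fully_cleaved = tryptic_cleave(sequence)
    peptides = []
    for i in range(len(fully_cleaved)):
        for missed in range(max_missed + 1):
            j = i + missed
            if j >= len(fully_cleaved):
                break
            peptides.append("".join(fully_cleaved[i:j + 1]))
    return peptides


print(with_missed_cleavages("PEPTIDEKAARPGGKLLR", max_missed=1))
# ['PEPTIDEK', 'PEPTIDEKAARPGGK', 'AARPGGK', 'AARPGGKLLR', 'LLR']
```

Peptides that occur at more than one position are kept once per position; wrap the result in set() if only distinct sequences are needed.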


In [None]:
# Digest the protein sequence in silico
trypticpeptides = getTrypticPeptides(proteinSequence)

In [None]:
# Print the peptides and the number of peptides
print(trypticpeptides)
print(len(trypticpeptides))


['MFNNCEVVLGNLEITYVQR', 'NYDLSFLK', 'TIQEVAGYVLIALNTVER', 'IPLENLQIIR', 'GNMYYENSYALAVLSNYDANK', 'TGLK', 'ELPMR', 'NLQEILHGAVR', 'FSNNPALCNVESIQWR', 'DIVSSDFLSNMSMDFQNHLGSCQK', 'CDPSCPNGSCWGAGEENCQK', 'LTK', 'IICAQQCSGR', 'CR', 'GK', 'SPSDCCHNQCAAGCTGPR', 'ESDCLVCR', 'K', 'FR', 'DEATCK', 'DTCPPLMLYNPTTYQMDVNPEGK', 'YSFGATCVK', 'K', 'CPR', 'NYVVTDHGSCVR', 'ACGADSYEMEEDGVR', 'K', 'CK', 'K', 'CEGPCR', 'K', 'VCNGIGIGEFK', 'DSLSINATNIK', 'HFK', 'NCTSISGDLHILPVAFR', 'GDSFTHTPPLDPQELDILK', 'TVK', 'EITGFLLIQAWPENR', 'TDLHAFENLEIIR', 'GR', 'TK', 'QHGQFSLAVVSLNITSLGLR', 'SLK', 'EISDGDVIISGNK', 'NLCYANTINWK', 'K', 'LFGTSGQK', 'TK', 'IISNR', 'GENSCK', 'ATGQVCHALCSPEGCWGPEPR', 'DCVSCR', 'NVSR', 'GR', 'ECVDK', 'CNLLEGEPR', 'EFVENSECIQCHPECLPQAMNITCTGR', 'GPDNCIQCAHYIDGPHCVK', 'TCPAGVMGENNTLVWK', 'YADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPK', 'IPSIATGMVGALLLLLVVALGIGLFMR', 'R', 'R', 'HIVR', 'K', 'R', 'TLR', 'R', 'LLQER', 'ELVEPLTPSGEAPNQALLR', 'ILK', 'ETEFK', 'K', 'IK', 'VLGSGAFGTVYK', 'GLWIPEGEK', 'VK', 'IPVA
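To start answering what the peptide list tells us, it helps to summarize it rather than read it. The output above contains many one- and two-residue peptides such as 'K', 'R', 'CR' and 'GR', which come from adjacent K/R sites and are usually too short to identify a protein on their own. The sketch below counts peptides, distinct sequences, single residues, and the share of digested residues that fall in peptides inside a length window. The default window of 7 to 25 residues is an assumption chosen for illustration, not a fixed rule, and summarize_peptides is a hypothetical helper.

```python
def summarize_peptides(peptides, min_len=7, max_len=25):
    """Summarize a list of tryptic peptides (the length window is an assumption)."""
    lengths = [len(p) for p in peptides]
    in_window = [p for p in peptides if min_len <= len(p) <= max_len]
    total_residues = sum(lengths)
    return {
        "total": len(peptides),
        "unique": len(set(peptides)),
        "single_residue": lengths.count(1),
        "in_window": len(in_window),
        # share of digested residues that lie in peptides inside the window
        "window_fraction": (sum(len(p) for p in in_window) / total_residues
                            if total_residues else 0.0),
    }


example = ["PEPTIDEK", "AARPGGK", "LLR", "K", "K"]
print(summarize_peptides(example))
# {'total': 5, 'unique': 4, 'single_residue': 2, 'in_window': 2, 'window_fraction': 0.75}
```

In the notebook, summarize_peptides(trypticpeptides) gives the same summary for P00533-1; changing min_len and max_len shows how sensitive the answer is to the chosen window.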