The collection of all the fragment masses generated by the mass spectrometer
is called an experimental spectrum.

For now, we will assume for simplicity that the mass spectrometer breaks the copies of a
cyclic peptide at every possible pair of bonds, so that the resulting experimental spectrum
contains the masses of all possible linear fragments of the peptide, which are called
subpeptides --> ideal experimental spectrum

The theoretical spectrum of a cyclic peptide Peptide is the collection of all of the masses
of its subpeptides, in addition to the mass 0 and the mass of the entire peptide, with
masses ordered from smallest to largest.

The ideal experimental spectrum is equal to the theoretical spectrum.
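The definition above can be sketched in a few lines. This is a minimal illustration, not part of the notebook's own cells: the name `cyclospectrum` and the trick of doubling the list to handle wrap-around subpeptides are assumptions made for the example.

```python
def cyclospectrum(peptide_masses):
    """Sorted theoretical spectrum of a cyclic peptide given as a list of integer masses."""
    n = len(peptide_masses)
    # Doubling the list lets a plain slice read subpeptides that wrap around the cycle.
    doubled = peptide_masses + peptide_masses
    masses = [0, sum(peptide_masses)]  # mass 0 and the mass of the entire peptide
    for length in range(1, n):         # proper subpeptides have 1 .. n-1 amino acids
        for start in range(n):         # every position in the cycle is a possible start
            masses.append(sum(doubled[start:start + length]))
    return sorted(masses)


example_spectrum = cyclospectrum([113, 128, 186])
print(example_spectrum)  # [0, 113, 128, 186, 241, 299, 314, 427]
```

A cyclic peptide with n amino acids has n * (n - 1) proper subpeptides, so the spectrum has n * (n - 1) + 2 masses; for n = 3 that is 8.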

In [1]:
aminoacids_integer_masses_dict = {'A':71.03711,
'R':156.10111,
'N':114.04293,
'D':115.02694,
'C':103.00919,
'E':129.04259,
'Q':128.05858,
'G':57.02146,
'H':137.05891,
'I':113.08406,
'L':113.08406,
'K':128.09496,
'M':131.04049,
'F':147.06841,
'P':97.05276,
'S':87.03203,
'T':101.04768,
'W':186.07931,
'Y':163.06333,
'V':99.06841}

In [2]:
for key in aminoacids_integer_masses_dict:
  aminoacids_integer_masses_dict[key] = int(aminoacids_integer_masses_dict[key])

In [5]:
import numpy as np

CYCLOPEPTIDESEQUENCING(Spectrum)

        Peptides ← a set containing only the empty peptide
        
        while Peptides is nonempty

            Peptides ← Expand(Peptides)

            for each peptide Peptide in Peptides

                if Mass(Peptide) = ParentMass(Spectrum)

                    if Cyclospectrum(Peptide) = Spectrum

                        output Peptide

                    remove Peptide from Peptides

                else if Peptide is not consistent with Spectrum

                    remove Peptide from Peptides
                    

In [6]:
def PeptideIntegerMass(peptide):
  return sum(peptide) #return sum of all integer masses of peptide

In [7]:
def ParentMass(spectrum):
  return max(spectrum)

In [8]:
def Expand(candidate_linear_peptides,aminoacids_integer_masses):
  return [(candidate_linear_peptides[i] + [aminoacid_integer_mass]) for i in range(len(candidate_linear_peptides)) for aminoacid_integer_mass in aminoacids_integer_masses]

The theoretical spectrum of a linear peptide is consistent with the ideal experimental spectrum of a cyclic peptide --> every mass from the theoretical spectrum of the linear peptide occurs in the ideal experimental spectrum at least as many times as it occurs in the theoretical spectrum --> if the linear peptide is consistent and its mass is smaller than the parent mass, it may be a subpeptide of the cyclic peptide and stays a candidate; if it is consistent and its mass equals the parent mass, it is a candidate for the cyclic peptide itself, which the pseudocode above confirms by comparing its cyclospectrum with the spectrum.

Example

0 97 97 99 101 103 196 198 198 200 202 295 297 299 299 301 394 396 398 400 400 497 --> ideal experimental spectrum of the cyclic peptide

97 99 101 103

P  V  T   C

Branch --> each of the 4 peptides is extended by each of the 18 distinct integer masses, so we have 4 * 18 = 72 peptides

E.g. 101 - 103

101 103

T   C

TC is a linear peptide; the mass spectrometer can break it into the fragments T, C and TC

101 103 204

T   C   TC

The mass 204 is not in the ideal experimental spectrum --> the subpeptide TC does not occur in the cyclic peptide, because otherwise its mass would be in the ideal experimental spectrum --> we stop extending this linear peptide, because anything built from it contains the subpeptide TC and so cannot be the cyclic peptide --> we discard it in the bounding step
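The branch and bound steps of this example can be reproduced with a short sketch. The helper names `linear_spectrum` and `is_consistent` are assumptions made for illustration and are separate from the notebook's own functions; the list of 18 masses is the set of distinct integer masses of the 20 amino acids (I and L share 113, K and Q share 128).

```python
from collections import Counter

ideal_spectrum = [0, 97, 97, 99, 101, 103, 196, 198, 198, 200, 202, 295,
                  297, 299, 299, 301, 394, 396, 398, 400, 400, 497]
distinct_masses = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                   128, 129, 131, 137, 147, 156, 163, 186]


def linear_spectrum(peptide):
    """Masses of all contiguous fragments of a linear peptide (without the mass 0)."""
    return [sum(peptide[i:j])
            for i in range(len(peptide))
            for j in range(i + 1, len(peptide) + 1)]


def is_consistent(peptide, spectrum):
    """True if no fragment mass occurs more often in the peptide than in the spectrum."""
    # Counter subtraction keeps only positive counts, i.e. the masses the spectrum lacks.
    missing = Counter(linear_spectrum(peptide)) - Counter(spectrum)
    return not missing


one_mers = [[97], [99], [101], [103]]  # P, V, T, C
# Branch: extend every candidate by every distinct mass.
branched = [peptide + [mass] for peptide in one_mers for mass in distinct_masses]
# Bound: keep only the candidates consistent with the spectrum.
bounded = [peptide for peptide in branched if is_consistent(peptide, ideal_spectrum)]

print(len(branched))                              # 72
print(linear_spectrum([101, 103]))                # [101, 204, 103]
print(is_consistent([101, 103], ideal_spectrum))  # False, because 204 is missing
print(len(bounded))                               # 10
```

Of the 72 branched peptides only 10 survive the bounding step, which is what keeps the search small.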

In [15]:
def FindAllKmers(peptide_integer_masses,k):
  kmers_list = []
  i = 0
  while i + k - 1 <= len(peptide_integer_masses)-1:
    kmers_list.append(peptide_integer_masses[i:i+k])
    i = i + 1
  return kmers_list

In [26]:
def GenerateAllSubpeptidesIntegerMasses(peptide_integer_masses):
  subpeptides_integer_masses = []
  for subpeptide_length in range(len(peptide_integer_masses)): #from 0 to len(peptide_integer_masses) - 1
    for subpeptide_integer_mass in FindAllKmers(peptide_integer_masses,subpeptide_length+1): #from 1 to len(peptide_integer_masses)
      subpeptides_integer_masses.append(sum(subpeptide_integer_mass))
  return subpeptides_integer_masses

In [27]:
from collections import Counter

In [28]:
def SpectraConsistency(peptide_integer_masses,spectrum):
  #returns 1 if every subpeptide mass of the linear peptide occurs in spectrum at least as often as in the peptide's own theoretical spectrum, otherwise 0
  spectrum_counter_dict = Counter(spectrum)
  all_subpeptides_integer_masses = GenerateAllSubpeptidesIntegerMasses(peptide_integer_masses)
  all_subpeptides_integer_masses_counter_dict = Counter(all_subpeptides_integer_masses)
  for subpeptide_integer_mass in all_subpeptides_integer_masses_counter_dict: #each distinct mass is checked once
    if all_subpeptides_integer_masses_counter_dict[subpeptide_integer_mass] > spectrum_counter_dict[subpeptide_integer_mass]: #e.g. 1 > 0 or 2 > 1
      return 0
  return 1

In [29]:
def PrintPeptide(peptide):
  string_to_print = ''
  for aminoacid_integer_mass in peptide:
    string_to_print = string_to_print + str(aminoacid_integer_mass) + '-'
  print(string_to_print[0:len(string_to_print)-1])

In [30]:
def CycloSpectrum(peptide_integer_masses):
  #theoretical spectrum of the cyclic peptide: 0, the masses of all subpeptides (including wrap-around ones) and the mass of the entire peptide
  n = len(peptide_integer_masses)
  doubled_peptide = peptide_integer_masses + peptide_integer_masses
  cyclospectrum = [0, sum(peptide_integer_masses)]
  for subpeptide_length in range(1, n):
    for start in range(n):
      cyclospectrum.append(sum(doubled_peptide[start:start+subpeptide_length]))
  return sorted(cyclospectrum)

def CycloPeptideSequencing(spectrum,aminoacids_integer_masses):
  candidate_linear_peptides = [[]] #every sublist represents a candidate linear peptide
  cyclic_peptide_mass = ParentMass(spectrum)
  sorted_spectrum = sorted(spectrum)
  while len(candidate_linear_peptides) > 0:
    candidate_linear_peptides = Expand(candidate_linear_peptides,aminoacids_integer_masses)
    surviving_peptides = []
    for peptide in candidate_linear_peptides:
      if PeptideIntegerMass(peptide) == cyclic_peptide_mass:
        if CycloSpectrum(peptide) == sorted_spectrum: #same test as in the pseudocode
          PrintPeptide(peptide)
        #a peptide that has reached the parent mass is never extended further, so it is not kept
      elif SpectraConsistency(peptide,spectrum) == 1:
        surviving_peptides.append(peptide) #consistent and lighter than the parent mass --> potential candidate peptide
    candidate_linear_peptides = surviving_peptides

In [21]:
spectrum = [0, 113, 128, 186, 241, 299, 314, 427]

In [22]:
aminoacids_integer_masses = list(np.unique(list(aminoacids_integer_masses_dict.values())))

In [23]:
CycloPeptideSequencing(spectrum,aminoacids_integer_masses)

113-128-186
113-186-128
128-113-186
128-186-113
186-113-128
186-128-113
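The six lines printed above are one and the same cyclic peptide, read from each of its three starting positions in both directions. A small sketch can confirm this by reducing every reading to a canonical form; the function name `canonical_cyclic_form` is a hypothetical helper introduced only for this check.

```python
def canonical_cyclic_form(peptide):
    """Smallest tuple among all rotations of the peptide and of its reversal."""
    readings = []
    for direction in (peptide, peptide[::-1]):   # clockwise and counter-clockwise
        for start in range(len(direction)):      # every possible starting position
            readings.append(tuple(direction[start:] + direction[:start]))
    return min(readings)


printed_peptides = [[113, 128, 186], [113, 186, 128], [128, 113, 186],
                    [128, 186, 113], [186, 113, 128], [186, 128, 113]]
distinct_cyclic_peptides = {canonical_cyclic_form(p) for p in printed_peptides}
print(distinct_cyclic_peptides)  # {(113, 128, 186)}
```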


In [None]:
with open('/content/rosalind_ba4e.txt') as task_file:
  spectrum = [line.rstrip() for line in task_file]

In [None]:
spectrum = spectrum[0]

In [None]:
spectrum = spectrum.split(' ')

In [None]:
for i in range(len(spectrum)):
  spectrum[i] = int(spectrum[i])

In [None]:
CycloPeptideSequencing(spectrum,aminoacids_integer_masses)