The collection of all the fragment masses generated by the mass spectrometer
is called an experimental spectrum.

For now, we will assume for simplicity that the mass spectrometer breaks the copies of a
cyclic peptide at every possible two bonds, so that the resulting experimental spectrum
contains the masses of all possible linear fragments of the peptide, which are called
subpeptides --> ideal experimental spectrum

The theoretical spectrum of a cyclic peptide Peptide is the collection of all of the masses
of its subpeptides, in addition to the mass 0 and the mass of the entire peptide, with
masses ordered from smallest to largest.

The ideal experimental spectrum is equal to the theoretical spectrum.
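The definition of the theoretical spectrum can be illustrated with a minimal sketch. The function name `cyclic_spectrum` and the example peptide NQEL (integer masses 114, 128, 129, 113) are chosen here for illustration only and are not used in the cells below. A cyclic peptide of length n has n(n-1) proper subpeptides, so the spectrum should contain n(n-1) + 2 masses once 0 and the mass of the whole peptide are included.

```python
def cyclic_spectrum(masses):
    """Theoretical spectrum of a cyclic peptide given as a list of integer masses."""
    n = len(masses)
    spectrum = [0]
    if n == 0:
        return spectrum
    doubled = masses + masses        # lets a fragment wrap around the end of the ring
    for length in range(1, n):       # proper subpeptides: lengths 1 .. n-1
        for start in range(n):       # one fragment per starting position
            spectrum.append(sum(doubled[start:start + length]))
    spectrum.append(sum(masses))     # the whole peptide, counted once
    return sorted(spectrum)

nqel = [114, 128, 129, 113]  # integer masses of N, Q, E, L
print(cyclic_spectrum(nqel))
# [0, 113, 114, 128, 129, 227, 242, 242, 257, 355, 356, 370, 371, 484]
```

The mass 242 appears twice because the subpeptides NQ and EL happen to have the same integer mass.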

In [1]:
aminoacids_integer_masses_dict = {'A':71.03711,
'R':156.10111,
'N':114.04293,
'D':115.02694,
'C':103.00919,
'E':129.04259,
'Q':128.05858,
'G':57.02146,
'H':137.05891,
'I':113.08406,
'L':113.08406,
'K':128.09496,
'M':131.04049,
'F':147.06841,
'P':97.05276,
'S':87.03203,
'T':101.04768,
'W':186.07931,
'Y':163.06333,
'V':99.06841}

In [2]:
for key in aminoacids_integer_masses_dict:
  aminoacids_integer_masses_dict[key] = int(aminoacids_integer_masses_dict[key])

In [3]:
aminoacids_integer_masses = list(set(aminoacids_integer_masses_dict.values()))

In [5]:
import numpy as np

In [6]:
def PeptideIntegerMass(peptide):
  return sum(peptide) #return sum of all integer masses of peptide

In [7]:
def ParentMass(spectrum):
  return max(spectrum) #we assume that the biggest mass in spectrum is the mass of peptide

In [8]:
def Expand(candidate_linear_peptides,aminoacids_integer_masses):
  return [(candidate_linear_peptides[i] + [aminoacid_integer_mass]) for i in range(len(candidate_linear_peptides)) for aminoacid_integer_mass in aminoacids_integer_masses]

In [9]:
def FindAllKmers(peptide_integer_masses,k):
  kmers_list = []
  i = 0
  while i + k - 1 <= len(peptide_integer_masses)-1:
    kmers_list.append(peptide_integer_masses[i:i+k])
    i = i + 1
  return kmers_list

In [11]:
def FindAllKmersCyclicPpetide(peptide_integer_masses,k):
  kmers_list = []
  i = 0
  while i + k - 1 <= len(peptide_integer_masses)-1:
    kmers_list.append(peptide_integer_masses[i:i+k])
    i = i + 1
  while i <= len(peptide_integer_masses) - 1:
    kmers_list.append(peptide_integer_masses[i:len(peptide_integer_masses)] + peptide_integer_masses[0:k-len(peptide_integer_masses[i:len(peptide_integer_masses)])])
    i = i + 1
  return kmers_list

In [15]:
def GenerateAllSubpeptidesIntegerMassesLinearPeptide(peptide_integer_masses):
  subpeptides_integer_masses = []
  for subpeptide_length in range(len(peptide_integer_masses)):
    for subpeptide_integer_mass in FindAllKmers(peptide_integer_masses,subpeptide_length+1):
      subpeptides_integer_masses.append(sum(subpeptide_integer_mass))
  return subpeptides_integer_masses

In [16]:
def GenerateAllSubpeptidesIntegerMassesCyclicPeptide(peptide_integer_masses):
  subpeptides_integer_masses = []
  for subpeptide_length in range(len(peptide_integer_masses)):
    for subpeptide_integer_mass in FindAllKmersCyclicPpetide(peptide_integer_masses,subpeptide_length+1):
      subpeptides_integer_masses.append(sum(subpeptide_integer_mass))
  return subpeptides_integer_masses

In [14]:
from collections import Counter

spectrum --> the experimental spectrum of a cyclic peptide; it is not ideal, because the mass spectrometer can generate a spectrum that differs from the ideal one, with false masses and missing masses

leaderboard --> the top N scoring linear peptides, including ties
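What "top N including ties" means can be sketched on a toy list of (peptide, score) pairs. The name `trim_with_ties` and the labels "A" to "E" are hypothetical and serve only this illustration; the notebook's own `Trim` works on lists of integer masses instead.

```python
def trim_with_ties(scored, n):
    """Keep the n best (peptide, score) pairs plus any pairs tied with the n-th one."""
    if n <= 0:
        return []
    # sorted() is stable, so equally scored pairs keep their original order
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # score of the n-th best entry
    return [pair for pair in ranked if pair[1] >= cutoff]

scored = [("A", 5), ("B", 7), ("C", 5), ("D", 3), ("E", 7)]
print(trim_with_ties(scored, 2))  # [('B', 7), ('E', 7)]
print(trim_with_ties(scored, 3))  # [('B', 7), ('E', 7), ('A', 5), ('C', 5)]
```

With N = 3 the result holds four entries, because "A" and "C" are tied for third place and both are kept.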

We score linear peptides against the cyclic spectrum.

The theoretical spectrum of a linear peptide is consistent with the ideal experimental spectrum of a cyclic peptide --> every mass in the theoretical spectrum of the linear peptide occurs in the ideal experimental spectrum at least as many times as it occurs in the theoretical spectrum --> if the ideal experimental spectrum has more masses than the theoretical spectrum and the two are consistent, the linear peptide is a subpeptide of the cyclic peptide; if the ideal experimental spectrum and the theoretical spectrum have the same number of masses and are consistent, the linear peptide is the cyclic peptide.
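The consistency condition can be checked by comparing multiplicities. The following is a minimal sketch, assuming peptides are given as lists of integer masses; the helper names `linear_spectrum` and `is_consistent` are introduced here for illustration and do not appear elsewhere in the notebook.

```python
from collections import Counter

def linear_spectrum(masses):
    """Theoretical spectrum of a linear peptide: 0 plus the mass of every contiguous fragment."""
    spectrum = [0]
    for start in range(len(masses)):
        for end in range(start + 1, len(masses) + 1):
            spectrum.append(sum(masses[start:end]))
    return sorted(spectrum)

def is_consistent(masses, spectrum):
    """True if each theoretical mass occurs in spectrum at least as often as in the theoretical spectrum."""
    needed = Counter(linear_spectrum(masses))
    available = Counter(spectrum)
    return all(available[mass] >= count for mass, count in needed.items())

spectrum = [0, 71, 113, 129, 147, 200, 218, 260, 313, 331, 347, 389, 460]
print(is_consistent([129, 71], spectrum))  # True: 0, 71, 129 and 200 are all present
print(is_consistent([71, 113], spectrum))  # False: 71 + 113 = 184 is missing
```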

So the spectra of linear peptides should be scored against the cyclic spectrum, because in that case the highest score goes to the linear peptide whose subpeptides are all subpeptides of the cyclic peptide.

If we score the theoretical spectrum of a linear peptide against the spectrum of the cyclic peptide, the highest score goes to the linear peptide that is most similar to the cyclic peptide; similarity is determined by the Score function.
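One common way to express such a score is the number of masses shared between the two spectra, counted with multiplicity. The sketch below assumes that definition; the function name `score` and the example spectra are illustrative only. Unlike the scoring cells in this notebook, the theoretical spectrum here includes the mass 0, which shifts every score by a constant and so should not change the ranking.

```python
from collections import Counter

def score(theoretical, experimental):
    """Number of masses shared by the two spectra, counted with multiplicity."""
    shared = Counter(theoretical) & Counter(experimental)  # per-mass minimum of the counts
    return sum(shared.values())

# Theoretical spectrum of the cyclic peptide NQEL (integer masses 114-128-129-113)
theoretical = [0, 113, 114, 128, 129, 227, 242, 242, 257, 355, 356, 370, 371, 484]
# A non-ideal experimental spectrum: 129 and both 242s are missing, 99 and 299 are false masses
experimental = [0, 99, 113, 114, 128, 227, 257, 299, 355, 356, 370, 371, 484]

print(score(theoretical, experimental))  # 11
```

The `&` operator on `Counter` objects keeps, for each mass, the smaller of the two counts, which is the same quantity the explicit branches in the scoring functions add up.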

In [17]:
def LinearPeptideScoring(spectrum,linear_peptide_spectrum): #the score is computed for linear_peptide_spectrum against spectrum
  linear_peptide_spectrum_kmers = GenerateAllSubpeptidesIntegerMassesLinearPeptide(linear_peptide_spectrum)
  linear_peptide_spectrum_counter_dict = Counter(linear_peptide_spectrum_kmers)
  spectrum_counter_dict = Counter(spectrum)
  score = 0
  scored_aminoacid_integer_masses = []
  for aminoacid_integer_mass in linear_peptide_spectrum_kmers:
    if aminoacid_integer_mass not in scored_aminoacid_integer_masses:
      if linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] == spectrum_counter_dict[aminoacid_integer_mass]:
        score = score + linear_peptide_spectrum_counter_dict[aminoacid_integer_mass]
        scored_aminoacid_integer_masses.append(aminoacid_integer_mass)
      elif linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] > spectrum_counter_dict[aminoacid_integer_mass]:
        if spectrum_counter_dict[aminoacid_integer_mass] > 0: #the theoretical spectrum has surplus occurrences of this mass, so only the occurrences present in the experimental spectrum are counted
          score = score + spectrum_counter_dict[aminoacid_integer_mass]
          scored_aminoacid_integer_masses.append(aminoacid_integer_mass)
      else: #linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] < spectrum_counter_dict[aminoacid_integer_mass] --> the experimental spectrum has surplus occurrences of this mass
        if linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] > 0:
          score = score + linear_peptide_spectrum_counter_dict[aminoacid_integer_mass]
          scored_aminoacid_integer_masses.append(aminoacid_integer_mass)
  return score

In [31]:
def CyclicPeptideScoring(spectrum,linear_peptide_spectrum): #the score is computed for linear_peptide_spectrum against spectrum
  linear_peptide_spectrum_kmers = GenerateAllSubpeptidesIntegerMassesCyclicPeptide(linear_peptide_spectrum)
  linear_peptide_spectrum_counter_dict = Counter(linear_peptide_spectrum_kmers)
  spectrum_counter_dict = Counter(spectrum)
  score = 0
  scored_aminoacid_integer_masses = []
  for aminoacid_integer_mass in linear_peptide_spectrum_kmers:
    if aminoacid_integer_mass not in scored_aminoacid_integer_masses:
      if linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] == spectrum_counter_dict[aminoacid_integer_mass]:
        score = score + linear_peptide_spectrum_counter_dict[aminoacid_integer_mass]
        scored_aminoacid_integer_masses.append(aminoacid_integer_mass)
      elif linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] > spectrum_counter_dict[aminoacid_integer_mass]:
        if spectrum_counter_dict[aminoacid_integer_mass] > 0: #the theoretical spectrum has surplus occurrences of this mass, so only the occurrences present in the experimental spectrum are counted
          score = score + spectrum_counter_dict[aminoacid_integer_mass]
          scored_aminoacid_integer_masses.append(aminoacid_integer_mass)
      else: #linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] < spectrum_counter_dict[aminoacid_integer_mass] --> the experimental spectrum has surplus occurrences of this mass
        if linear_peptide_spectrum_counter_dict[aminoacid_integer_mass] > 0:
          score = score + linear_peptide_spectrum_counter_dict[aminoacid_integer_mass]
          scored_aminoacid_integer_masses.append(aminoacid_integer_mass)
  return score

In [32]:
def ListToString(linear_peptide):
  string = ''
  for aminoacid_integer_mass in linear_peptide:
    string = string + str(aminoacid_integer_mass) + '-'
  return string[0:len(string)-1]

In [33]:
def StringToList(linear_peptide_string):
  linear_peptide_list = linear_peptide_string.split('-')
  for i in range(len(linear_peptide_list)):
    linear_peptide_list[i] = int(linear_peptide_list[i])
  return linear_peptide_list

In [36]:
def Trim(leaderboard,spectrum,N): #peptides in leaderboard are scored as linear peptides
  leaderboard_scores_dict = {}
  for linear_peptide in leaderboard:
    leaderboard_scores_dict.update({ListToString(linear_peptide):LinearPeptideScoring(spectrum,linear_peptide)})
  sorted_linear_peptides = sorted(leaderboard_scores_dict.keys(), key=leaderboard_scores_dict.get, reverse=True)
  top_n_peptides = sorted_linear_peptides[0:N] #last index is N-1, next index is N
  i = N
  for i in range(N,len(sorted_linear_peptides)):
    if leaderboard_scores_dict[sorted_linear_peptides[i]] == leaderboard_scores_dict[top_n_peptides[len(top_n_peptides)-1]]:
      top_n_peptides.append(sorted_linear_peptides[i])
    else:
      break
  for i in range(len(top_n_peptides)):
    top_n_peptides[i] = StringToList(top_n_peptides[i])
  return top_n_peptides

In [37]:
def LeaderboardCycloPeptideSequencing(spectrum,N,aminoacids_integer_masses):
  leaderboard = [[]]
  leader_peptide = []
  cyclic_peptide_mass = ParentMass(spectrum)
  while(len(leaderboard) > 0):
    leaderboard = Expand(leaderboard,aminoacids_integer_masses) #branching
    indices_to_pop = []
    for i in range(len(leaderboard)):
      if PeptideIntegerMass(leaderboard[i]) == cyclic_peptide_mass:
        if CyclicPeptideScoring(spectrum,leaderboard[i]) > CyclicPeptideScoring(spectrum,leader_peptide): #keep the peptide with the highest score; the score is computed between the peptide's cyclic theoretical spectrum and the experimental spectrum
          leader_peptide = leaderboard[i]
      elif PeptideIntegerMass(leaderboard[i]) > cyclic_peptide_mass: #mark the peptide for removal, since we are only interested in peptides that have the same mass as the cyclic peptide
        indices_to_pop.append(i)
    leaderboard = np.delete(leaderboard,indices_to_pop,axis=0).tolist() #delete peptides with mass bigger than the cyclic peptide's mass
    leaderboard = Trim(leaderboard,spectrum,N) #bounding
  return leader_peptide

In [38]:
def PrintPeptide(peptide):
  string_to_print = ''
  for aminoacid_integer_mass in peptide:
    string_to_print = string_to_print + str(aminoacid_integer_mass) + '-'
  print(string_to_print[0:len(string_to_print)-1])

In [39]:
N = 10

In [40]:
spectrum = [0, 71, 113, 129, 147, 200, 218, 260, 313, 331, 347, 389, 460]

In [41]:
PrintPeptide(LeaderboardCycloPeptideSequencing(spectrum,N,aminoacids_integer_masses))

129-71-147-113


In [58]:
N = 381

In [59]:
with open('/content/rosalind_ba4g.txt') as task_file:
  spectrum = [line.rstrip() for line in task_file]

In [60]:
spectrum = spectrum[0]

In [61]:
spectrum = spectrum.split(' ')

In [62]:
for i in range(len(spectrum)):
  spectrum[i] = int(spectrum[i])

In [63]:
PrintPeptide(LeaderboardCycloPeptideSequencing(spectrum,N,aminoacids_integer_masses))

128-131-71-131-97-147-128-101-87-156-128-147
