# Chaining the Amino Acids

 
In “Translating RNA into Protein”, we examined the translation of RNA into an amino acid chain for the construction of a protein. When two amino acids link together, they form a peptide bond, which releases a molecule of water; see Figure 1. Thus, after a series of amino acids has been linked together into a polypeptide, every pair of adjacent amino acids has lost one molecule of water, meaning that a polypeptide containing n amino acids has had n−1 water molecules removed.

More generally, a residue is a molecule from which a water molecule has been removed; every amino acid in a protein is a residue except the leftmost and the rightmost ones. These outermost amino acids are special in that one has an "unstarted" peptide bond, and the other has an "unfinished" peptide bond. Between them, the two molecules have a single "extra" molecule of water (see the atoms marked in blue in Figure 2). Thus, the mass of a protein is the sum of the masses of all its residues plus the mass of a single water molecule.
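
The two ways of counting water molecules agree, which a short derivation makes explicit. The symbols are notation introduced here for illustration, not part of the original text: $a_i$ is the mass of the $i$-th free amino acid, $w$ is the mass of one water molecule, and $r_i = a_i - w$ is the mass of the corresponding residue. Removing $n-1$ water molecules from $n$ free amino acids gives

$$
M = \sum_{i=1}^{n} a_i - (n-1)\,w = \sum_{i=1}^{n} (r_i + w) - (n-1)\,w = \sum_{i=1}^{n} r_i + w
$$

so the protein mass $M$ is the sum of the residue masses plus exactly one water molecule.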

There are two standard ways of computing the mass of a residue by summing the masses of its individual atoms. Its monoisotopic mass is computed by using the mass of the principal (most abundant) isotope of each atom in the amino acid, whereas its average mass is computed by using the average mass of each atom in the molecule (taken over all naturally occurring isotopes).
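
As a rough illustration of the difference, the sketch below sums atom masses for a glycine residue in both ways. The atomic masses and the residue formula C2H3NO (glycine, C2H5NO2, minus one water) are not given in the original text; they are standard reference values added here as assumptions, and the name `formula_mass` is hypothetical.

```python
# Atomic masses in Da. These constants are assumptions added for illustration.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Glycine residue: glycine (C2H5NO2) minus one water (H2O) gives C2H3NO.
GLYCINE_RESIDUE = {"C": 2, "H": 3, "N": 1, "O": 1}


def formula_mass(formula, atom_masses):
    """Sum the atom masses, weighted by how often each atom occurs."""
    return sum(atom_masses[atom] * count for atom, count in formula.items())


mono = formula_mass(GLYCINE_RESIDUE, MONOISOTOPIC)
avg = formula_mass(GLYCINE_RESIDUE, AVERAGE)
print(round(mono, 5))
print(round(avg, 3))
```

The monoisotopic value should land very close to the 57.02146 Da listed for G in the monoisotopic mass table, while the average mass comes out slightly higher because heavier isotopes contribute to each atom's average.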

Many applications in proteomics rely on mass spectrometry, an analytical chemical technique used to determine the mass, elemental composition, and structure of molecules. In mass spectrometry, monoisotopic mass is used more often than average mass, and so all amino acid masses are assumed to be monoisotopic unless otherwise stated.

The standard unit used in mass spectrometry for measuring mass is the atomic mass unit, which is also called the dalton (Da) and is defined as one twelfth of the mass of a neutral atom of carbon-12. The mass of a protein is the sum of the monoisotopic masses of its amino acid residues plus the mass of a single water molecule (whose monoisotopic mass is 18.01056 Da).
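
The rule in the last sentence can be sketched in a few lines. This is a minimal example under stated assumptions: the function name `protein_mass` and the dipeptide "GA" are hypothetical choices for illustration, and only the two residue masses needed are copied from the monoisotopic mass table.

```python
WATER_MASS = 18.01056  # monoisotopic mass of water in Da, as stated in the text

# Residue masses for two amino acids, copied from the monoisotopic mass table.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711}


def protein_mass(sequence, residue_mass=RESIDUE_MASS, water=WATER_MASS):
    """Sum of the residue masses plus one water molecule for the two free ends."""
    return sum(residue_mass[aa] for aa in sequence) + water


print(round(protein_mass("GA"), 5))
```

For "GA" this adds 57.02146, 71.03711 and 18.01056, giving about 146.06913 Da.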

In the following several problems on applications of mass spectrometry, we avoid the complication of having to distinguish between residues and non-residues by only considering peptides excised from the middle of the protein. This is a relatively safe assumption because in practice, peptide analysis is often performed in tandem mass spectrometry. In this special class of mass spectrometry, a protein is first divided into peptides, which are then broken into ions for mass analysis.
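
Under this assumption every amino acid in the peptide counts as a residue, so no water molecule is added and the mass is just a sum of table values. The sketch below shows that computation on its own; the function name `peptide_mass` and the sample peptide "SKADYEK" are illustrative choices, and the masses are the monoisotopic values from the table loaded in this notebook.

```python
# Monoisotopic residue masses in Da, as in the mass table used in this notebook.
MONOISOTOPIC_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}


def peptide_mass(peptide):
    """Mass of a peptide excised from the middle of a protein (no water added)."""
    return sum(MONOISOTOPIC_MASS[aa] for aa in peptide.upper())


print(round(peptide_mass("SKADYEK"), 3))
```

Storing the masses as floats up front avoids converting them on every lookup; an unknown letter raises a `KeyError`, which is a reasonable way to surface a malformed input string.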

In [76]:
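# Read the protein string from the prompt and normalise it to upper case.
ammino_sequence = input("ayy\n").upper()
# Length of the sequence (not used in the mass computation below).
ammino_seq_lenght = len(ammino_sequence)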

ayy
VTGLKFPLHKMCMFGVQIALLDEFVVTFTEEVHQAICKRCSMNKQVTYQEDDKNAQGDSIFVRDDDMAIHVHWHGMMTAQPKILYRCHISCLLQFRDPSISRSLFGQNTYHHDLGKNGPHCWCSPSQKDICKWHTKHQKSDDSNLWTDRHLILFSIYQAMNEWPYNWRMDYVNCYWICKKACLQHEYSWHHITDWDHESIKIKNLCKLVPKKVWGWFHMKESQFKEVYFNCFHCTSFQMEQPFPNTHNIAGQIVDIMTHYNMRMIDYEVQEVYSDGMAMCNTTLCIAFWEFRTHKQDQFLYYDWTGMRESTFKGINFPVFKGGMKYSNRFWYMRIAFCVCQCERQYFPTKQHATYVPNKPRTTEVKGYFDSNVWACIACAGTKMKLTVIEMDHCETMWSRFRHWLDKPKFDWSRPWRGEMLEYEERVHYKSWRHHTETFFMLACHIIIVGLETKDARIRMNQNQCDWRMSCHRNCCPYALQSHTGSQAKYAGGGSSQWQCWARSAHLPWRPPMLWGYEEMCGKYLDFVNIKDNYVHGYTPDKVVSIFSFSPGMVRPGFEWRQEDWKTIYTLSCPNEGRDGQWETQLSCTYMWEYTDMEMGKDKNNHTQPDTMSHGMTCGIMKFRCVCHNQSRPSYDHGEICNDHPPVSIYTMELKYGVYQNFLEKLCDTIIYCPIDRCSEAILAMNSRCSCTFEGPIQFVKPLAYMAHWHGQNQNTKTCKMRLRVPFYHVWLYMQGYTFRWSVHGFDVDGTDPDMCQCFCMHHMCTLMMMMFVLVFQRTRYLQVPINNKWRLPSEITNYQGSPADNSTIDADAIHTPRKHMEQWEVIFWKYHDMGYCNISPFFRDTFWRINYHLRTLFATTICPFWQVWKHKQDWKQPEEHITQAYPCYARFY


In [57]:
from collections import Counter

# Map each amino acid letter to its monoisotopic mass (kept as a string, as read from the file).
isotopic_diz = {}
with open('files/monoisotopic mass table.txt') as file:
    for line in file:
        fields = line.split()
        isotopic_diz[fields[0]] = fields[1]
print(isotopic_diz)

{'A': '71.03711', 'C': '103.00919', 'D': '115.02694', 'E': '129.04259', 'F': '147.06841', 'G': '57.02146', 'H': '137.05891', 'I': '113.08406', 'K': '128.09496', 'L': '113.08406', 'M': '131.04049', 'N': '114.04293', 'P': '97.05276', 'Q': '128.05858', 'R': '156.10111', 'S': '87.03203', 'T': '101.04768', 'V': '99.06841', 'W': '186.07931', 'Y': '163.06333'}


In [77]:
# Count how many times each amino acid occurs in the sequence.
tally = Counter(ammino_sequence)

In [78]:

# Sum of residue masses: each letter's monoisotopic mass times its count.
print(round(sum(float(isotopic_diz[el]) * value for el, value in tally.items()), 3))

106894.969


In [79]:
 
# Same sum as an explicit loop, printing the running total before each update.
res = 0
for el, value in tally.items():
    print(res)
    res += float(isotopic_diz[el]) * value

print(round(res, 3))

0
4061.80481
9518.37953
11856.25939
16492.70585
23153.64377
30359.99586
34242.10626
40957.99285
47247.93637
52192.377490000006
58211.130750000004
63526.08157
65799.26909
71435.58915
76855.37793
83723.82677
87640.26812
91973.89945999999
99637.87597
106894.969


### Alternative approaches read on Rosalind:

In [88]:
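# Read the whole table and split on whitespace: letters and masses alternate.
with open('files/monoisotopic mass table.txt', 'r') as f:
    mass = f.read().split()
print(mass)
print(mass[0::2])
# Pair each letter (even indices) with its mass (odd indices) to build a dict.
mass = dict(zip(mass[0::2], mass[1::2]))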


['A', '71.03711', 'C', '103.00919', 'D', '115.02694', 'E', '129.04259', 'F', '147.06841', 'G', '57.02146', 'H', '137.05891', 'I', '113.08406', 'K', '128.09496', 'L', '113.08406', 'M', '131.04049', 'N', '114.04293', 'P', '97.05276', 'Q', '128.05858', 'R', '156.10111', 'S', '87.03203', 'T', '101.04768', 'V', '99.06841', 'W', '186.07931', 'Y', '163.06333']
['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']
