# Ash Ediga

## Inspection Introduction

Problem 16
* This script finds every substring of a DNA sequence that encodes a given amino acid sequence, on either the DNA itself or its reverse complement, by translating DNA codons to amino acids. It uses a dictionary, DNA_Codons, that maps each DNA codon to its amino acid. It also builds an inverse dictionary for reverse lookups, which keeps one codon per amino acid. The function complement_dna computes the reverse complement of the DNA so that both strands can be examined. The function translate_to_amino_acids translates a DNA sequence into an amino acid sequence using the codon mapping. The core function, search_amino_acid_sequences, slides a window of three times the peptide length along each strand. It keeps every segment whose translation matches the given amino acid sequence. Matches found on the reverse strand are reported as the corresponding substring of the original DNA. Run from the command line, the script reads the DNA and amino acid sequences from standard input and prints each matching DNA substring on its own line.
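
The window-by-window search re-translates overlapping codons many times. The sketch below shows an alternative that is not the script's own method. It translates each strand once per reading frame, so a match at protein position j in frame f starts at DNA position f + 3j. The names CODON_TABLE, reverse_complement, translate_frame and find_encodings are hypothetical. The sketch assumes the DNA contains only A, C, G and T, and any other codon is translated as "X".

```python
from itertools import product

# Standard genetic code built from the TCAG codon ordering; "_" marks stop codons
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def reverse_complement(dna):
    """Reverse complement of an A/C/G/T string."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


def translate_frame(dna, frame):
    """Translate every complete codon starting at offset `frame` (0, 1 or 2).

    Unknown codons become "X" so protein positions stay aligned with codons.
    """
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def find_encodings(dna, peptide):
    """Substrings of `dna` encoding `peptide` on either strand, by start position."""
    dna, peptide = dna.upper(), peptide.upper()
    span, n = 3 * len(peptide), len(dna)
    rc = reverse_complement(dna)
    starts = []
    for frame in range(3):
        forward, reverse = translate_frame(dna, frame), translate_frame(rc, frame)
        for j in range(len(forward) - len(peptide) + 1):
            if forward.startswith(peptide, j):
                starts.append(frame + 3 * j)
        for j in range(len(reverse) - len(peptide) + 1):
            if reverse.startswith(peptide, j):
                # Window [s, s + span) on rc is window [n - s - span, n - s) on dna
                starts.append(n - (frame + 3 * j) - span)
    return [dna[s:s + span] for s in sorted(starts)]


print(find_encodings("ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA", "MA"))
# ['ATGGCC', 'GGCCAT', 'ATGGCC']
```

The results are ordered by start position on the forward strand. The script itself lists forward-strand matches first, so the two outputs can differ in order but not in content.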

Problem 17
* This Python script calculates the cyclospectrum of a peptide, a task commonly performed in bioinformatics and mass spectrometry analysis. The script defines a dictionary, amino_to_mass, that maps each amino acid to its integer mass. The peptide_mass_calc function computes the total mass of a peptide by summing the masses of its amino acids. The linear_fragment function generates every fragment of length 1 to n - 1 of the cyclic peptide, where n is the peptide length. This includes fragments that wrap around from the last amino acid back to the first. The wrap-around fragments are what distinguish a cyclospectrum from a linear spectrum. The core function, cyclospectrum, computes the mass of each fragment produced by linear_fragment. It then adds 0 (the empty fragment) and the mass of the whole peptide, and returns the masses in sorted order. When run, the main function reads a peptide sequence from standard input and prints the cyclospectrum as space-separated integers.
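
The same cyclospectrum can also be computed from prefix masses instead of slicing strings. The mass of peptide[i:j] is prefix[j] - prefix[i]. Each fragment that wraps around the end is the complement of an inner fragment, so its mass is total - (prefix[j] - prefix[i]). This is an illustrative sketch. The names cyclospectrum_prefix and MASS are hypothetical, and the function raises KeyError for an unknown residue code instead of treating it as mass 0.

```python
# Integer amino acid masses (same values as amino_to_mass in Problem 17)
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "T": 101, "C": 103, "I": 113, "L": 113, "N": 114,
    "D": 115, "K": 128, "Q": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def cyclospectrum_prefix(peptide):
    """Cyclospectrum of `peptide` computed from prefix masses."""
    prefix = [0]  # prefix[i] is the mass of peptide[:i]
    for amino in peptide:
        prefix.append(prefix[-1] + MASS[amino])
    total, n = prefix[-1], len(peptide)

    spectrum = [0]
    for i in range(n):
        for j in range(i + 1, n + 1):
            inner = prefix[j] - prefix[i]  # mass of peptide[i:j]
            spectrum.append(inner)
            # If peptide[i:j] touches neither end, the rest of the cycle is a
            # wrap-around fragment whose mass is total - inner
            if i > 0 and j < n:
                spectrum.append(total - inner)
    return sorted(spectrum)


print(cyclospectrum_prefix("LEQN"))
# [0, 113, 114, 128, 129, 227, 242, 242, 257, 355, 356, 370, 371, 484]
```

For a peptide of length n this yields n(n - 1) + 2 masses, for example 14 masses for LEQN.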

Problem 18
* This script solves the Cyclopeptide Sequencing Problem: given a mass spectrum, find every cyclic peptide whose theoretical spectrum matches it. The script is organized into three classes: PeptideKmers, PeptideSpectrum, and CyclopeptideSolver. The PeptideKmers class generates all contiguous subsequences (k-mers) of a given length from a peptide sequence. It takes the peptide and a value of k and returns a list of these k-mers. The PeptideSpectrum class calculates the mass spectrum of a peptide. It uses a predefined dictionary that maps each amino acid to its mass and computes the mass of every contiguous subsequence of the peptide, including the empty sequence and the peptide itself. The resulting spectrum is a sorted list of these masses. CyclopeptideSolver grows candidate peptides one amino acid mass at a time and discards candidates whose spectrum contains a mass absent from the input spectrum. It reports the candidates whose total mass equals the largest mass in the spectrum and whose spectrum matches the input.
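
CyclopeptideSolver keeps a candidate whenever every mass in its spectrum appears somewhere in the input spectrum. The sketch below shows a stricter test, assuming the input is an ideal (error-free) spectrum. It also compares how many times each mass occurs, so a candidate that needs mass 57 twice is rejected when the input lists 57 only once. is_consistent is a hypothetical helper and is not part of the original code.

```python
from collections import Counter


def is_consistent(candidate_spectrum, experimental_spectrum):
    """True if each mass occurs in the experimental spectrum at least as
    many times as it occurs in the candidate's spectrum."""
    available = Counter(experimental_spectrum)
    needed = Counter(candidate_spectrum)
    return all(available[mass] >= count for mass, count in needed.items())


experimental = [0, 57, 114, 171]                    # cyclospectrum of G-N (57, 114)
print(is_consistent([0, 57], experimental))          # True: linear spectrum of [57]
print(is_consistent([0, 57, 57, 114], experimental)) # False: [57, 57] needs 57 twice
```

A plain membership test would accept the second candidate, because 0, 57 and 114 all appear in the input. Counting occurrences prunes it one round earlier.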

## Problem 16

```python
import sys

# DNA codon table (standard genetic code); "_" marks a stop codon
DNA_Codons = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "TGT": "C", "TGC": "C",
    "GAT": "D", "GAC": "D",
    "GAA": "E", "GAG": "E",
    "TTT": "F", "TTC": "F",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "CAT": "H", "CAC": "H",
    "ATA": "I", "ATT": "I", "ATC": "I",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "ATG": "M",
    "AAT": "N", "AAC": "N",
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
    "CAA": "Q", "CAG": "Q",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
    "TGG": "W",
    "TAT": "Y", "TAC": "Y",
    "TAA": "_", "TAG": "_", "TGA": "_"
}
# Reverse lookup; several codons encode the same amino acid, so only the
# last codon listed for each amino acid is kept
inverse_Codons = {value: key for key, value in DNA_Codons.items()}

def complement_dna(sequence):
    ''' Return the reverse complement of a DNA sequence (N stays N) '''
    translation_table = str.maketrans('ACGTN', 'TGCAN')
    return sequence[::-1].translate(translation_table)

def split_into_codons(dna_sequence, offset=0):
    return [dna_sequence[i:i + 3] for i in range(offset, len(dna_sequence), 3)]

def translate_to_amino_acids(dna_seq):
    # Codons missing from the table (e.g. containing N) contribute nothing
    return ''.join(DNA_Codons.get(codon, '') for codon in split_into_codons(dna_seq))

def search_amino_acid_sequences(dna, amino_acid_seq):
    found_sequences = []
    dna = dna.upper()
    reversed_dna = complement_dna(dna)
    amino_acid_seq = amino_acid_seq.upper()
    length = len(amino_acid_seq) * 3

    # Forward strand: every window of the right length
    for i in range(len(dna) - length + 1):
        segment = dna[i:i + length]
        if translate_to_amino_acids(segment) == amino_acid_seq:
            found_sequences.append(segment)

    # Reverse strand: report the matching window as it appears on the forward strand
    for i in range(len(reversed_dna) - length + 1):
        segment = reversed_dna[i:i + length]
        if translate_to_amino_acids(segment) == amino_acid_seq:
            found_sequences.append(complement_dna(segment))

    return found_sequences

def run_amino_acid_search():
    # The DNA and amino acid sequences are the first two whitespace-separated tokens
    tokens = sys.stdin.read().split()
    if len(tokens) < 2:
        sys.exit("Expected a DNA sequence and an amino acid sequence on standard input")
    dna_sequence, amino_acid_sequence = tokens[0].upper(), tokens[1].upper()
    matching_sequences = search_amino_acid_sequences(dna_sequence, amino_acid_sequence)

    for sequence in matching_sequences:
        print(sequence)

if __name__ == "__main__":
    run_amino_acid_search()
```


## Problem 17

```python
import sys

# Mapping of amino acids to their respective masses
amino_to_mass = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "T": 101, "C": 103, "I": 113, "L": 113, "N": 114,
    "D": 115, "K": 128, "Q": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186
}

def peptide_mass_calc(peptide):
    ''' Calculate the total mass of a peptide '''
    return sum(amino_to_mass.get(amino, 0) for amino in peptide)

def linear_fragment(peptide):
    ''' Generate every fragment of length 1..n-1 of the cyclic peptide,
    including fragments that wrap around from the end back to the start '''
    n = len(peptide)
    extended = peptide + peptide  # lets a slice run past the end of the cycle
    fragments = []
    for i in range(1, n):
        for j in range(n):
            fragments.append(extended[j:j + i])
    return fragments

def cyclospectrum(peptide):
    ''' Calculate the cyclospectrum of the peptide '''
    spectrum = [0]

    # Add masses of all proper fragments, including wrap-around ones
    for fragment in linear_fragment(peptide):
        spectrum.append(peptide_mass_calc(fragment))

    # Add mass of the whole peptide
    spectrum.append(peptide_mass_calc(peptide))

    return sorted(spectrum)

def main():
    # Read the sequence from standard input
    sequence = sys.stdin.readline().strip()

    # Calculate and print the cyclospectrum
    spectrum = cyclospectrum(sequence)
    print(' '.join(map(str, spectrum)))

if __name__ == "__main__":
    main()
```


## Problem 18

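```python
import sys

class PeptideKmers:
    ''' Generate all contiguous k-mers of a given length from a peptide sequence '''
    def __init__(self, peptide, k):
        self.peptide = peptide
        self.k = k

    def generate_kmers(self):
        return [self.peptide[i:i + self.k] for i in range(len(self.peptide) - self.k + 1)]

class PeptideSpectrum:
    ''' Calculate the mass spectrum of a peptide given either as a string of
    one-letter amino acid codes or as a list of integer masses '''
    mass_of_amino_acids = {
        'A': 71, 'G': 57, 'M': 131, 'S': 87, 'C': 103,
        'H': 137, 'N': 114, 'T': 101, 'D': 115, 'I': 113,
        'P': 97, 'V': 99, 'E': 129, 'K': 128, 'Q': 128,
        'W': 186, 'F': 147, 'L': 113, 'R': 156, 'Y': 163
    }

    def __init__(self, peptide):
        self.peptide = peptide

    def fragment_mass(self, fragment):
        # Integers are already masses; letters are looked up in the mass table
        return sum(unit if isinstance(unit, int) else self.mass_of_amino_acids[unit]
                   for unit in fragment)

    def calculate_spectrum(self):
        ''' Linear spectrum: 0 plus the mass of every contiguous subpeptide '''
        subsets = [sub for k in range(1, len(self.peptide) + 1)
                   for sub in PeptideKmers(self.peptide, k).generate_kmers()]
        return sorted([0] + [self.fragment_mass(subset) for subset in subsets])

    def calculate_cyclic_spectrum(self):
        ''' Cyclic spectrum: also includes fragments that wrap around the end '''
        n = len(self.peptide)
        extended = self.peptide + self.peptide
        masses = [0, self.fragment_mass(self.peptide)]
        for k in range(1, n):
            masses.extend(self.fragment_mass(extended[i:i + k]) for i in range(n))
        return sorted(masses)

class CyclopeptideSolver:
    ''' Solve the Cyclopeptide Sequencing Problem given an ideal cyclic spectrum '''
    def __init__(self, spectrum):
        self.spectrum = sorted(spectrum)
        self.parent_mass = max(self.spectrum)
        self.amino_masses = sorted(set(PeptideSpectrum.mass_of_amino_acids.values()))

    def find_sequences(self):
        active_peptides = [[]]
        final_sequences = []

        while active_peptides:
            # Branch: extend each candidate by every distinct amino acid mass
            extended_peptides = [pep + [mass] for pep in active_peptides for mass in self.amino_masses]
            active_peptides = []

            for peptide in extended_peptides:
                if sum(peptide) == self.parent_mass:
                    # A full-length candidate must reproduce the cyclic spectrum exactly
                    if PeptideSpectrum(peptide).calculate_cyclic_spectrum() == self.spectrum:
                        final_sequences.append(peptide)
                elif all(mass in self.spectrum for mass in PeptideSpectrum(peptide).calculate_spectrum()):
                    # Bound: keep only candidates whose linear spectrum fits the input
                    active_peptides.append(peptide)

        return final_sequences

def main():
    input_spectrum = [int(mass) for mass in sys.stdin.read().split()]
    if not input_spectrum:
        sys.exit("Expected a space-separated mass spectrum on standard input")
    solver = CyclopeptideSolver(input_spectrum)
    print(' '.join('-'.join(map(str, sequence)) for sequence in solver.find_sequences()))

if __name__ == "__main__":
    main()
```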


## Inspection Results

Konstantinos: There is no error handling in your code. You're generating all possible k-mers for each value of k. This is a good approach, but it could be more efficient for large peptides, since it creates a lot of intermediate lists.
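
One way to address both points is sketched here; it is not taken from the original code. The sketch yields k-mers lazily from a generator, so no list of k-mers is built for each k. It also validates the peptide up front, so an unknown residue raises a ValueError instead of silently contributing mass 0. The names iter_kmers, linear_spectrum and MASS are hypothetical.

```python
# Integer amino acid masses (same values as the tables in Problems 17 and 18)
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "T": 101, "C": 103, "I": 113, "L": 113, "N": 114,
    "D": 115, "K": 128, "Q": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def iter_kmers(peptide, k):
    """Yield the contiguous k-mers of `peptide` one at a time."""
    for i in range(len(peptide) - k + 1):
        yield peptide[i:i + k]


def linear_spectrum(peptide):
    """Sorted linear spectrum of `peptide`; raises ValueError on unknown codes."""
    unknown = sorted(set(peptide).difference(MASS))
    if unknown:
        raise ValueError("unknown amino acid code(s): " + ", ".join(unknown))
    masses = [0]
    for k in range(1, len(peptide) + 1):
        masses.extend(sum(MASS[aa] for aa in kmer) for kmer in iter_kmers(peptide, k))
    return sorted(masses)


print(linear_spectrum("NQEL"))
# [0, 113, 114, 128, 129, 242, 242, 257, 370, 371, 484]
```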

Trevor: In CyclopeptideSolver.find_sequences, the nested loops with list comprehensions might become inefficient for larger spectra due to the creation of numerous temporary lists. You might want to profile this part if you're dealing with large datasets.
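
A minimal way to follow this suggestion uses the standard library's cProfile and pstats modules. profile_call is a hypothetical helper name, not part of the original code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns the function's result and a text report of the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()
```

For example, `result, report = profile_call(CyclopeptideSolver(spectrum).find_sequences)` followed by `print(report)` lists the ten entries with the largest cumulative time. This shows whether list building in the branch step or spectrum recomputation in the bound step dominates.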

