**Task: ProteinAnalyzer and EnzymeAnalyzer**

Create a class `ProteinAnalyzer` that provides the following functionality:  
1. Accepts a protein sequence (composed of the 20 canonical amino acids: "ACDEFGHIKLMNPQRSTVWY") as input.  
   - Validates that only valid amino acids are present.  
2. Computes the length of the sequence.  
3. Builds the amino acid composition as a dictionary containing the count of each amino acid in the sequence.  
4. Implements a method `most_frequent_residue` that returns the most frequent amino acid.  
5. Implements a method `hydrophobic_residues` that returns a list of the hydrophobic amino acids in the sequence. Use the hydrophobic amino acids "A", "V", "I", "L", "M", "F", "W", "Y".  
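
Requirements 3 and 4 map naturally onto `collections.Counter`, which counts all residues in a single pass. The sketch below is a standalone illustration, not the notebook's class: the function names `composition` and `most_frequent_residues` are hypothetical, and returning *all* tied residues is an assumption, since the task does not say what should happen on a tie.

```python
from collections import Counter

VALID_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> dict[str, int]:
    """Count every canonical amino acid, including those that do not occur (count 0)."""
    counts = Counter(sequence)  # one pass over the sequence
    return {amino: counts[amino] for amino in VALID_AMINO_ACIDS}


def most_frequent_residues(sequence: str) -> list[str]:
    """Return every residue sharing the highest count (assumption: ties are all kept)."""
    counts = Counter(sequence)
    if not counts:  # empty sequence: nothing to report
        return []
    top = max(counts.values())
    return sorted(amino for amino, n in counts.items() if n == top)


print(composition("AAYYC")["A"])        # 2
print(most_frequent_residues("AAYYC"))  # ['A', 'Y']
```

A single-residue answer, as the task asks for, is then simply the first element of the returned list.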

Then create a derived class `EnzymeAnalyzer` that additionally provides the following functionality:  
1. Takes a protein sequence and a list of enzyme cut sites as input. Each cut site is defined by a sequence (e.g. "KR" or "RG").  
2. Validates that the cut sites consist of valid amino acids.  
3. Implements a method `find_cut_sites` that returns all positions in the protein sequence at which the cut sites occur.  
4. Implements a method `digest` that splits the protein sequence into fragments at the cut sites and returns them as a list.  

**Extras:**
- Make sure that all attributes and methods are reused sensibly through inheritance.  
- Add suitable methods for representation (`__repr__`) and for combining objects (`__add__`).  
- Write example calls for both classes that demonstrate the most important methods.  
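
Requirements 3 and 4 can also be sketched with a lookahead regular expression, which finds the start position of every cut site in one pass, including overlapping matches. This is an alternative technique, not the notebook's implementation; the helper names are hypothetical, and the sketch assumes, like the notebook's `digest`, that the sequence is cut *before* each site, so every fragment after the first begins with its cut site.

```python
import re


def find_cut_sites(sequence: str, cut_sites: list[str]) -> list[int]:
    """Return the sorted start index of every (possibly overlapping) cut-site match."""
    sites = [re.escape(s) for s in cut_sites if s]  # skip empty sites, which would match everywhere
    if not sites:
        return []
    # (?=...) is a zero-width lookahead, so consecutive matches may overlap
    pattern = re.compile("(?=(" + "|".join(sites) + "))")
    return sorted({m.start() for m in pattern.finditer(sequence)})


def digest(sequence: str, cut_sites: list[str]) -> list[str]:
    """Split the sequence before each cut site (assumption carried over from the notebook)."""
    # The set removes a duplicate boundary at 0 when a site starts the sequence
    bounds = sorted({0, len(sequence), *find_cut_sites(sequence, cut_sites)})
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


seq = "AAEEFKRLLMRGSAAYYYKRDSDDDD"
print(find_cut_sites(seq, ["KR", "RG"]))  # [5, 10, 18]
print(digest(seq, ["KR", "RG"]))          # ['AAEEF', 'KRLLM', 'RGSAAYYY', 'KRDSDDDD']
```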
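
For `__add__`, both notebook versions return the concatenated *string*. One possible design (an assumption; the task leaves it open) is to return a *new* analyzer object, so the result keeps all analysis methods. The minimal class `Protein` below is a hypothetical stand-in for `ProteinAnalyzer`.

```python
class Protein:
    """Hypothetical minimal stand-in for ProteinAnalyzer (validation omitted for brevity)."""

    def __init__(self, sequence: str):
        self.sequence = sequence

    def __repr__(self) -> str:
        # !r adds quotes, so the repr reads like the constructor call that rebuilds the object
        return f"{type(self).__name__}({self.sequence!r})"

    def __add__(self, other):
        if not isinstance(other, Protein):
            # Lets Python try other.__radd__ and otherwise raise TypeError
            return NotImplemented
        # type(self) keeps the subclass, so the result still has all analysis methods
        return type(self)(self.sequence + other.sequence)


combined = Protein("ACD") + Protein("WW")
print(repr(combined))  # Protein('ACDWW')
```

Because `EnzymeAnalyzer.__init__` also requires `cut_sites`, calling `type(self)(...)` with a single argument would fail there; a subclass would need its own `__add__`, for example one that passes `self.cut_sites` along.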

---

Good luck! 😊

In [94]:
class ProteinAnalyzer:
    def __init__(self, sequence):

        self.valid_amino = "ACDEFGHIKLMNPQRSTVWY"
        
        for amino in sequence:
            if amino not in self.valid_amino:
                raise ValueError("Invalid Amino Acid in Sequence")
        self.sequence = sequence
           
    def show_len(self):
        return len(self.sequence)

    def show_dict(self):
        amino_dict = {amino: 0 for amino in self.valid_amino}
        for amino in self.sequence:
            if amino in amino_dict:
                amino_dict[amino] += 1
        return amino_dict
    
    def mostfrequentresidue(self):
        amino_dict = self.show_dict()
        max_count = max(amino_dict.values())
        # most_frequent = max(amino_dict, key=amino_dict.get)  <------- short version
        for amino, count in amino_dict.items():
            if count == max_count:
                return f"The most frequent residue is '{amino}' with count {max_count}."
    
    def hydrophobicresidues(self):
        hydrophobic_amino = "A", "V", "I", "L", "M", "F", "W", "Y"
        h_aminos_in_seq = []
        for h_amino in self.sequence:
            if h_amino in hydrophobic_amino:
                h_aminos_in_seq.append(h_amino)
        return h_aminos_in_seq
    
    def __repr__(self):
        return f"{self.__class__.__name__}(sequence='{self.sequence}')"

    def __str__(self):
        return f"Angegebene Protein Sequenz: '{self.sequence}'"

    def __add__(self, other):
        return self.sequence + other.sequence

    def display(self):
        print(self.sequence)

class EnzymeAnalyzer(ProteinAnalyzer):
    def __init__(self, sequence, cut_sites):
        super().__init__(sequence)
        
        #Short Version GPT
        # Flatten all amino acids in cut_sites into a single list
        #all_cut_amino = ''.join(cut_sites)
        
        # Validate all amino acids in the cut sites
        #if not all(amino in self.valid_amino for amino in all_cut_amino):
        #    raise ValueError("Invalid Amino Acid in Cut Site")

        for sites in cut_sites:
            for amino in sites:
                if amino not in self.valid_amino:                  
                    raise ValueError("Invalid Amino Acid in Cut Site")
        self.cut_sites = cut_sites
    
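    def find_cut_sites(self):
        indices = []
        for site in self.cut_sites:
            # Slide a window of len(site) over the sequence so sites of any length
            # (including single residues at the very end) are found
            for i in range(len(self.sequence) - len(site) + 1):
                if self.sequence[i:i+len(site)] == site:
                    indices.append(i)
        # Remove duplicate positions (several sites matching at the same index) and sort
        return sorted(set(indices))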
    
    def digest(self):
        cut_indices = self.find_cut_sites()
        # Add start and end of the sequence as boundaries; the set drops a duplicate 0
        # when a cut site starts the sequence, which would otherwise yield an empty fragment
        full_indices = sorted(set([0] + cut_indices + [len(self.sequence)]))
        
        # Create fragments
        fragments = []
        for i in range(len(full_indices) - 1):
            # Extract fragment from one index to the next
            fragment = self.sequence[full_indices[i]:full_indices[i+1]]
            fragments.append(fragment)
        
        return fragments

    def display(self):
        print(self.sequence)
        print(self.cut_sites)



'''
seq1 = ProteinAnalyzer("ACDEFAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYY")
seq1.display()
print(seq1.show_len())
print(seq1.show_dict())
print(seq1.mostfrequentresidue())
print(seq1.hydrophobicresidues())
print(seq1.__repr__())
print(seq1.__str__())
'''
seq1 = ProteinAnalyzer("ACDEFAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYY")
seq2 = ProteinAnalyzer("WWWWWWW")

print(seq1 + seq2)



seq_cut = EnzymeAnalyzer("AAEEFKRLLMRGSAAYYYKRDSDDDD", ["KR", "RG"])
#seq_cut.display()
seq_cut.find_cut_sites()
seq_cut.digest()

ACDEFAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYYWWWWWWW


['AAEEF', 'KRLLM', 'RGSAAYYY', 'KRDSDDDD']
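
The per-character validation loop in `__init__` can also be written as a set difference, which makes it easy to report *which* characters are invalid and where. A minimal sketch; the function name `validate_sequence` and the choice to list positions in the error message are assumptions, not part of the task. Like the notebook, the check is case-sensitive, so lowercase input is rejected.

```python
VALID_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence: str) -> str:
    """Return the sequence unchanged if valid, otherwise raise ValueError listing the offenders."""
    invalid = set(sequence) - VALID_AMINO_ACIDS
    if invalid:
        # Report each offending character with the 0-based positions where it occurs
        details = ", ".join(
            f"{char!r} at {[i for i, c in enumerate(sequence) if c == char]}"
            for char in sorted(invalid)
        )
        raise ValueError(f"Invalid amino acid(s) in sequence: {details}")
    return sequence


validate_sequence("ACDW")      # passes and returns 'ACDW'
# validate_sequence("ACXW")    # ValueError: Invalid amino acid(s) in sequence: 'X' at [2]
```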

In [95]:
# CLAUDE VERSION

class ProteinAnalyzer:
    VALID_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    HYDROPHOBIC_AMINO_ACIDS = set("AVILMFWY")

    def __init__(self, sequence):
        # Validate input sequence in one go
        if not all(amino in self.VALID_AMINO_ACIDS for amino in sequence):
            raise ValueError("Invalid Amino Acid in Sequence")
        self.sequence = sequence
           
    def show_len(self):
        return len(self.sequence)

    def show_dict(self):
        # Count all residues in a single pass, then report all 20 amino acids (0 if absent)
        from collections import Counter
        counts = Counter(self.sequence)
        return {amino: counts[amino] for amino in self.VALID_AMINO_ACIDS}
   
    def mostfrequentresidue(self):
        # Use max with a key function for a more concise solution
        amino_dict = self.show_dict()
        max_amino = max(amino_dict, key=amino_dict.get)
        return f"The most frequent residue is '{max_amino}' with count {amino_dict[max_amino]}."
   
    def hydrophobicresidues(self):
        # Use list comprehension for efficiency
        return [amino for amino in self.sequence if amino in self.HYDROPHOBIC_AMINO_ACIDS]
   
    def __repr__(self):
        return f"{self.__class__.__name__}(sequence='{self.sequence}')"
    
    def __str__(self):
        return f"Angegebene Protein Sequenz: '{self.sequence}'"
    
    def __add__(self, other):
        return self.sequence + other.sequence

    def display(self):
        print(self.sequence)


class EnzymeAnalyzer(ProteinAnalyzer):
    def __init__(self, sequence, cut_sites):
        # The parent class validates the protein sequence
        super().__init__(sequence)
        
        # Every residue of every cut site must be a valid amino acid
        if not all(all(amino in self.VALID_AMINO_ACIDS for amino in site) for site in cut_sites):
            raise ValueError("Invalid Amino Acid in Cut Site")
        
        self.cut_sites = cut_sites
   
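    def find_cut_sites(self):
        # Slide a window of len(site) over the sequence so sites of any length
        # (including single residues at the very end) are found; the set removes
        # duplicate positions when several sites match at the same index
        return sorted({i for site in self.cut_sites
                       for i in range(len(self.sequence) - len(site) + 1)
                       if self.sequence[i:i + len(site)] == site})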
   
    def digest(self):
        # Boundaries are the sequence start, every cut index and the sequence end;
        # the set drops a duplicate 0 when a cut site starts the sequence
        bounds = sorted({0, len(self.sequence), *self.find_cut_sites()})
        
        # Each fragment runs from one boundary to the next
        return [self.sequence[a:b] for a, b in zip(bounds, bounds[1:])]

    def display(self):
        print(f"Sequence: {self.sequence}")
        print(f"Cut Sites: {self.cut_sites}")


# Example usage
def main():
    # Demonstrate ProteinAnalyzer methods
    seq1 = ProteinAnalyzer("ACDEFAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYY")
    seq2 = ProteinAnalyzer("WWWWWWW")
    
    print("Sequence concatenation:", seq1 + seq2)
    print("Sequence length:", seq1.show_len())
    print("Amino acid dictionary:", seq1.show_dict())
    print("Most frequent residue:", seq1.mostfrequentresidue())
    print("Hydrophobic residues:", seq1.hydrophobicresidues())
    print("Representation:", repr(seq1))
    print("String representation:", str(seq1))

    # Demonstrate EnzymeAnalyzer methods
    seq_cut = EnzymeAnalyzer("AAEEFKRLLMRGSAAYYYKRDSDDDD", ["KR", "RG"])
    print("\nCut site indices:", seq_cut.find_cut_sites())
    print("Digested fragments:", seq_cut.digest())

if __name__ == "__main__":
    main()

Sequence concatenation: ACDEFAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYYWWWWWWW
Sequence length: 38
Amino acid dictionary: {'A': 11, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 0, 'H': 0, 'I': 0, 'K': 0, 'L': 0, 'M': 0, 'N': 0, 'P': 0, 'Q': 0, 'R': 0, 'S': 0, 'T': 0, 'V': 0, 'W': 0, 'Y': 23}
Most frequent residue: The most frequent residue is 'Y' with count 23.
Hydrophobic residues: ['A', 'F', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y']
Representation: ProteinAnalyzer(sequence='ACDEFAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYY')
String representation: Angegebene Protein Sequenz: 'ACDEFAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYY'

Cut site indices: [5, 10, 18]
Digested fragments: ['AAEEF', 'KRLLM', 'RGSAAYYY', 'KRDSDDDD']
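
Both versions cut *before* each site, so every fragment after the first starts with its site (`'KRLLM'`). Many proteases instead cleave *after* their recognition residues; trypsin, for example, cleaves on the C-terminal side of K and R. If that is the intended meaning, only the boundary index changes. A sketch with a hypothetical `cut_after` flag, written as a standalone function rather than a method:

```python
def digest(sequence: str, cut_sites: list[str], cut_after: bool = False) -> list[str]:
    """Split the sequence at every cut-site occurrence.

    cut_after=False cuts before the site (the notebook's behaviour);
    cut_after=True cuts right after it (hypothetical option).
    """
    bounds = {0, len(sequence)}
    for site in cut_sites:
        if not site:
            continue  # an empty site would match at every position
        # Slide a window of len(site) so sites of any length are found, including at the end
        for i in range(len(sequence) - len(site) + 1):
            if sequence[i:i + len(site)] == site:
                bounds.add(i + len(site) if cut_after else i)
    ordered = sorted(bounds)  # the set already removed duplicate boundaries
    return [sequence[a:b] for a, b in zip(ordered, ordered[1:])]


seq = "AAEEFKRLLMRGSAAYYYKRDSDDDD"
print(digest(seq, ["KR", "RG"]))                  # ['AAEEF', 'KRLLM', 'RGSAAYYY', 'KRDSDDDD']
print(digest(seq, ["KR", "RG"], cut_after=True))  # ['AAEEFKR', 'LLMRG', 'SAAYYYKR', 'DSDDDD']
```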
