Now that we know a little more about inheritance and how it works, let's update and expand our previous work with the Sequence class and Sequence records.

 

When we wrote Sequence before, we did so at least partly with DNA sequences in mind. However, that's not always the case: a Sequence could just as well be a protein sequence. The methods we wrote in our Sequence class, such as getting the GC count or finding the reverse complement, aren't appropriate for protein sequences. This is a great opportunity to use inheritance to create two subtypes of Sequence.

 

In a notebook, start with a markdown cell and plan out what you think these three classes should look like. What are the common elements of Sequences (things we could define in the parent class Sequence), and what would need to be unique to the DNASequence and ProteinSequence classes? What rules do you want to enforce about what these sequences should look like, and how do you want to enforce those rules? Do you need to override constructors, or could the parent's work? Remember, eventually you want these to work with the SequenceRecord class we built earlier, so don't make any fundamental changes that would break that.

 

Your classes should, at minimum:

1) have a __repr__ and __str__ that provide a meaningful representation as a string

2) check that the bases or amino acids in the string are valid

3) work as the argument for a SequenceRecord

DNA

1) a translate method that will convert the DNA sequence and return a ProteinSequence object

2) one other method of your choice (what you did previously is fine)

Protein

1) a method of your choice. If the method you would implement is too complex to reasonably implement, or would use resources you don't have access to, it is okay to leave it as a stub method (a method whose body is the single line "pass") and explain in comments what the method would do and its purpose
 

Here is a dictionary you can copy into your code to help facilitate DNA translation:

aa_dict = {'M':['ATG'], 'F':['TTT', 'TTC'], 'L':['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'], 'C':['TGT', 'TGC'], 'Y':['TAC', 'TAT'], 'W':['TGG'], 'P':['CCT', 'CCC', 'CCA', 'CCG'], 'H':['CAT', 'CAC'],
'Q':['CAA', 'CAG'], 'R':['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'], 'I':['ATT', 'ATC', 'ATA'], 'T':['ACT', 'ACC', 'ACA', 'ACG'],
'N':['AAT', 'AAC'], 'K':['AAA', 'AAG'], 'S':['AGT', 'AGC', 'TCT', 'TCC', 'TCA', 'TCG'], 'V':['GTT', 'GTC', 'GTA', 'GTG'],
'A':['GCT', 'GCC', 'GCA', 'GCG'], 'D':['GAT', 'GAC'], 'E':['GAA', 'GAG'], 'G':['GGT', 'GGC', 'GGA', 'GGG'], '*':['TAA','TAG','TGA']}

In [2]:
# Sequence class goes here
from functools import total_ordering


# Parent class for all sequence types; total_ordering derives
# <=, > and >= from the __eq__ and __lt__ defined below.
@total_ordering
class Seq:
    def __init__(self, seqs):
        self.seqs = seqs

    # repr gives the formal string representation
    def __repr__(self):
        return "Sequence = {}".format(self.seqs)

    # str gives the informal, human-readable representation
    def __str__(self):
        return "Sequence = '{}'".format(self.seqs)

    # two sequences are equal when their letters match
    def __eq__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        return self.seqs == other.seqs

    # sequences are ordered by length
    def __lt__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        return len(self.seqs) < len(other.seqs)
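
One thing to watch with total_ordering: it fills in <=, > and >= on the assumption that __eq__ and __lt__ describe the same ordering. When equality looks at the letters but less-than looks only at the length, two different sequences of the same length each come out as "greater than" the other. A minimal sketch of one way to keep the two consistent, assuming we are happy to break length ties alphabetically (the `_key` helper is a hypothetical name, not part of the original class):

```python
from functools import total_ordering


@total_ordering
class Seq:
    """Hypothetical variant: equality and ordering both use (length, letters)."""

    def __init__(self, seqs):
        self.seqs = seqs

    def _key(self):
        # shorter sequences sort first; ties are broken alphabetically
        return (len(self.seqs), self.seqs)

    def __eq__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        return self._key() < other._key()


a, b = Seq("AAA"), Seq("CCC")
print(a < b, a > b, a >= b)   # True False False
print(a <= Seq("AAA"))        # True
```

Whether ties should be broken alphabetically is a design choice; the point is only that both methods read from the same key.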
    

In [3]:
class DNASeq(Seq):
    def Translate(self, dnaseq=None):
        # translate the given string, or this object's own sequence by default
        if dnaseq is None:
            dnaseq = self.seqs
        aa_dict = {'M':['ATG'], 'F':['TTT', 'TTC'], 'L':['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'], 'C':['TGT', 'TGC'], 'Y':['TAC', 'TAT'], 'W':['TGG'], 'P':['CCT', 'CCC', 'CCA', 'CCG'], 'H':['CAT', 'CAC'],
                    'Q':['CAA', 'CAG'], 'R':['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'], 'I':['ATT', 'ATC', 'ATA'], 'T':['ACT', 'ACC', 'ACA', 'ACG'],
                    'N':['AAT', 'AAC'], 'K':['AAA', 'AAG'], 'S':['AGT', 'AGC', 'TCT', 'TCC', 'TCA', 'TCG'], 'V':['GTT', 'GTC', 'GTA', 'GTG'],
                    'A':['GCT', 'GCC', 'GCA', 'GCG'], 'D':['GAT', 'GAC'], 'E':['GAA', 'GAG'], 'G':['GGT', 'GGC', 'GGA', 'GGG'], '*':['TAA','TAG','TGA']}
        # invert the table so each codon is looked up directly
        codon_table = {codon: aa for aa, codons in aa_dict.items() for codon in codons}
        proteinseq = ''
        # step through complete codons; one or two leftover bases are ignored
        for i in range(0, len(dnaseq) - 2, 3):
            # codons that are not in the table are skipped
            proteinseq += codon_table.get(dnaseq[i:i+3], '')
        # this version prints the translation; it does not return a ProteinSeq
        print(f'Protein sequence is {proteinseq}')

    def GC(self, n=None):
        # count the G and C bases in the given string (default: own sequence)
        if n is None:
            n = self.seqs
        count = 0
        for i in n:
            if i == "C" or i == "G":
                count = count + 1
        return count

    def Validate(self, seqs=None):
        # return the sequence if every base is A, T, G or C; otherwise False
        if seqs is None:
            seqs = self.seqs
        Nucleotide = ["A", "T", "G", "C"]
        for nuc in seqs:
            if nuc not in Nucleotide:
                return False
        return seqs

In [4]:
# sample DNA sequences
dna1 = Seq("ATGCTAGCATGCATATCGATC")
dna2 = Seq("ATCGAGCATCGATCG")

In [5]:
# Validating a sequence
a = DNASeq(dna2.seqs)
a.Validate("ATCGAGCATCGATCG")

'ATCGAGCATCGATCG'
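
The assignment also asks whether the rules could be enforced in the constructor. A minimal sketch of that idea, assuming the parent checks a class-level `alphabet` attribute that each subclass overrides (the attribute name, the ValueError, and allowing '*' in proteins are all assumptions made for illustration, not part of the classes above):

```python
class Seq:
    # Allowed letters; None means "accept anything". Subclasses override this.
    alphabet = None

    def __init__(self, seqs):
        if self.alphabet is not None:
            bad = sorted(set(seqs) - set(self.alphabet))
            if bad:
                raise ValueError(f"invalid characters in sequence: {bad}")
        self.seqs = seqs


class DNASeq(Seq):
    alphabet = "ATGC"


class ProteinSeq(Seq):
    # the 20 standard amino acids plus '*' for a stop codon (an assumption)
    alphabet = "ARNDCQEGHILKMFPSTWYV*"


dna = DNASeq("ATCGAGCATCGATCG")      # accepted
try:
    DNASeq("ATXG")
except ValueError as err:
    print(err)                        # invalid characters in sequence: ['X']
```

With this layout neither subclass needs its own constructor: the parent's `__init__` does the check, and an invalid sequence can never exist as an object.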

In [6]:
# translation of DNA seq to protein
d = DNASeq(dna1.seqs)
d.Translate("ATGCTAGCATGCATATCGATC")

Protein sequence is MLACISI
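
The assignment asks for a translate method that returns a ProteinSequence object rather than printing. A minimal sketch of that shape, assuming the protein class is defined before the DNA class and that the method works on the object's own sequence (the lowercase `translate` name and the module-level `CODON_TABLE` are hypothetical choices for this example):

```python
# amino acid -> codons, copied from the assignment
aa_dict = {'M':['ATG'], 'F':['TTT', 'TTC'], 'L':['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'], 'C':['TGT', 'TGC'], 'Y':['TAC', 'TAT'], 'W':['TGG'], 'P':['CCT', 'CCC', 'CCA', 'CCG'], 'H':['CAT', 'CAC'],
'Q':['CAA', 'CAG'], 'R':['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'], 'I':['ATT', 'ATC', 'ATA'], 'T':['ACT', 'ACC', 'ACA', 'ACG'],
'N':['AAT', 'AAC'], 'K':['AAA', 'AAG'], 'S':['AGT', 'AGC', 'TCT', 'TCC', 'TCA', 'TCG'], 'V':['GTT', 'GTC', 'GTA', 'GTG'],
'A':['GCT', 'GCC', 'GCA', 'GCG'], 'D':['GAT', 'GAC'], 'E':['GAA', 'GAG'], 'G':['GGT', 'GGC', 'GGA', 'GGG'], '*':['TAA','TAG','TGA']}

# codon -> amino acid, built once so each codon is a single lookup
CODON_TABLE = {codon: aa for aa, codons in aa_dict.items() for codon in codons}


class Seq:
    def __init__(self, seqs):
        self.seqs = seqs

    def __repr__(self):
        return f"{type(self).__name__}({self.seqs!r})"


class ProteinSeq(Seq):
    pass


class DNASeq(Seq):
    def translate(self):
        """Translate complete codons and return a new ProteinSeq."""
        # drop one or two trailing bases that do not form a full codon
        usable = len(self.seqs) - len(self.seqs) % 3
        protein = "".join(CODON_TABLE[self.seqs[i:i + 3]]
                          for i in range(0, usable, 3))
        return ProteinSeq(protein)


protein = DNASeq("ATGCTAGCATGCATATCGATC").translate()
print(protein)    # ProteinSeq('MLACISI')
```

Because the result is a ProteinSeq, it can be handed straight to anything that expects a sequence object, including a SequenceRecord. This sketch raises a KeyError on a codon that is not in the table, unlike the version above, which skips it.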


In [7]:
# counting the G and C bases in the DNA seq
c = DNASeq("ATGCTAGCATGCATATCGATC")
c.GC("ATGCTAGCATGCATATCGATC")

9

In [8]:

class ProteinSeq(Seq):
    # one-letter codes for the 20 standard amino acids
    AA = ["A","R","N","D","C","Q","E","G","H","I",
          "L","K","M","F","P","S","T","W","Y","V"]

    def Validate(self, seqs=None):
        # return the sequence if every residue is a standard amino acid; otherwise False
        if seqs is None:
            seqs = self.seqs
        for aa in seqs:
            if aa not in self.AA:
                return False
        return seqs

    def AAfreq(self, seqs=None):
        # count each standard amino acid; ones that never occur stay at 0
        if seqs is None:
            seqs = self.seqs
        temp_aa = dict.fromkeys(self.AA, 0)
        for aa in seqs:
            if aa not in temp_aa:
                raise ValueError("invalid amino acid: {}".format(aa))
            temp_aa[aa] += 1
        return temp_aa

In [9]:
# sample protein sequences
prot1 = Seq("HAPPYNEWYEAR")
prot2 = Seq("GREATDAY")

In [10]:
p = ProteinSeq(prot2.seqs)
p.Validate("GREATDAY")

'GREATDAY'

In [11]:
cnt = ProteinSeq(prot1.seqs)
cnt.AAfreq("HAPPYNEWYEAR")

{'A': 2,
 'R': 1,
 'N': 1,
 'D': 0,
 'C': 0,
 'Q': 0,
 'E': 2,
 'G': 0,
 'H': 1,
 'I': 0,
 'L': 0,
 'K': 0,
 'M': 0,
 'F': 0,
 'P': 2,
 'S': 0,
 'T': 0,
 'W': 1,
 'Y': 2,
 'V': 0}
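
The same table can be produced with collections.Counter from the standard library. A minimal sketch, assuming we want every standard amino acid listed (zeros included) and an error for anything else; `aa_freq` and `AMINO_ACIDS` are hypothetical names used only for this example:

```python
from collections import Counter

# one-letter codes for the 20 standard amino acids
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def aa_freq(protein):
    """Return a dict of counts for every standard amino acid."""
    counts = Counter(protein)
    unknown = sorted(set(counts) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"invalid amino acids: {unknown}")
    # a Counter returns 0 for missing keys, so absent residues show up as 0
    return {aa: counts[aa] for aa in AMINO_ACIDS}


freq = aa_freq("HAPPYNEWYEAR")
print(freq["P"], freq["D"])    # 2 0
```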

In [12]:
# SequenceRecord class goes here

# a SequenceRecord pairs a label with a Seq (or any subclass of Seq)
class SequenceRecord:
    def __init__(self, label, seq):
        # isinstance also accepts DNASeq and ProteinSeq objects
        if not isinstance(seq, Seq):
            raise TypeError("seq must be a Seq object")
        self.label = label
        self.Seq = seq
    # repr and str both report the label and the wrapped sequence
    def __repr__(self):
        return "As per repr -> Label: {}, Object type is Sequence: {}, Sequence: {}".format(self.label, isinstance(self.Seq, Seq), self.Seq.seqs)
    def __str__(self):
        return "As per str -> Label: {}, Object type is Sequence: {}, Sequence: {}".format(self.label, isinstance(self.Seq, Seq), self.Seq.seqs)


In [13]:
ptnseq = SequenceRecord('Mus musculus', prot1)
print(ptnseq)

As per str -> Label: Mus musculus, Object type is Sequence: True, Sequence: HAPPYNEWYEAR
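
For a record to accept the subclasses as well as the parent, the type check matters: `type(x) is Seq` is true only for the parent class itself, while `isinstance(x, Seq)` is also true for anything that inherits from it. A minimal sketch with stripped-down stand-ins for the classes above (the lowercase `seq` attribute and the TypeError are assumptions made for this example):

```python
class Seq:
    def __init__(self, seqs):
        self.seqs = seqs


class ProteinSeq(Seq):
    pass


class SequenceRecord:
    def __init__(self, label, seq):
        # accept Seq and every subclass of Seq; reject anything else early
        if not isinstance(seq, Seq):
            raise TypeError("seq must be a Seq or a subclass of Seq")
        self.label = label
        self.seq = seq


prot = ProteinSeq("HAPPYNEWYEAR")
print(type(prot) is Seq)        # False
print(isinstance(prot, Seq))    # True

record = SequenceRecord("Mus musculus", prot)
print(record.label, record.seq.seqs)    # Mus musculus HAPPYNEWYEAR
```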
