Coding 1: (20 points)

Design your own object class. It should be an object that is useful to solve a problem relevant to
your daily life, such as a protein sequence or perhaps a bank account object. Good commenting
is a must! Docstrings are not required, however. The object must have the following
characteristics:

1. A brief written description of the class, its purpose, and how to use it. (5 points)
2. A well-designed init, getters, and setters. (5 points)
3. At least two extra methods that serve useful purposes. (5 points)
4. Design two subclasses of this object that extend it in a useful, meaningful way. (5 points)

In [124]:
import random
import re

class Sequence:
    """This is a representation of any biological sequence. This specific class is
    intended to be extended as a base class rather than instantiated on its own
    (although it can be). The base Sequence class has no type, although child
    classes can (e.g. DNA/RNA/protein sequences). There is deliberately no setter
    for the sequence or its type: the sequence changes only through the mutation,
    insertion, and deletion methods below.
    It also has a "quality string" which denotes the confidence that each element
    of the sequence is actually correct. For example, if this sequence represents
    a DNA sequence, the first value of the quality string represents the
    confidence that the first nucleotide base call is correct. By default, the
    quality string is empty; once set, it must be the same length as the sequence
    itself.

    A Sequence object is created by passing the sequence as a string to the
    constructor. A sequence can have its quality string set, and single elements
    can be mutated (either at random or at a specific index with a specific
    value), inserted, or deleted.

    Example usage:
    # create an amino acid sequence
    > aa_sequence = Sequence("RLLKGEENWYQIIAAA")

    # return the sequence using the getter method
    > aa_sequence.get_sequence()

    # replace the residue at a random position with a residue already present in the sequence
    > aa_sequence.mutate()

    # replace the residue at a random position with tryptophan
    > aa_sequence.mutate(mutation_value="W")

    # delete the last amino acid residue (index 15 of this 16-residue sequence)
    > aa_sequence.delete_element(15)

    :param sequence: the sequence to be represented
    :type sequence: str
    """
    def __init__(self, sequence: str):
        self.__sequence_string = sequence
        self.__sequence_type = None
        self.__quality_string = ""

    def _valid_index(self, index: int) -> bool:
        """helper function: check that 0 <= index <= length (the upper bound allows insertion at the end)"""
        return 0 <= index <= len(self.__sequence_string)

    def _valid_sequence(self) -> bool:
        """any sequence is valid in the base case; child classes override this check"""
        return True

    def get_sequence(self) -> str:
        """getter function to return the sequence"""
        return self.__sequence_string

    def get_sequence_length(self) -> int:
        """getter function to return the length of the sequence"""
        return len(self.__sequence_string)

    def get_sequence_type(self) -> str:
        """getter function to return the sequence type"""
        return self.__sequence_type

    def get_quality_values(self) -> str:
        """getter function to return the quality values"""
        return self.__quality_string

    def update_quality_values(self, quality_values: str) -> None:
        """setter function to replace the quality string; it must be the same length as the sequence"""
        if len(quality_values) != self.get_sequence_length():
            raise ValueError("The quality value sequence should be the same length as the sequence")
        self.__quality_string = quality_values

    def mutate(self, index: int = None, mutation_value: str = None) -> None:
        """
        extra method #1: mutate a value at a given index in the sequence.
        if no index is given, a random position is chosen. the new value is drawn
        at random from the values already present in the sequence unless
        mutation_value is explicitly given. the mutation may be silent (i.e. the
        new value may be the same as the old one)

        :param index: the index to mutate at. sequence is 0-indexed
        :param mutation_value: value to place at the specified index in the sequence
        :return:
        """
        length = self.get_sequence_length()
        if length == 0:
            raise IndexError("Cannot mutate an empty sequence")

        # compare against None explicitly so that index 0 is not mistaken for "no index"
        if index is None:
            index = random.randrange(length)
        elif not 0 <= index < length:
            raise IndexError("Tried to mutate outside of the sequence bounds")

        if mutation_value is None:
            mutation_value = random.choice(sorted(set(self.__sequence_string)))

        # strings are immutable, so build the new string and store the result
        self.__sequence_string = self.__sequence_string[:index] + mutation_value + self.__sequence_string[index + 1:]

    def insert_element(self, index: int, insertion_value: str) -> None:
        """
        extra method #2: insert a value at a given index in the sequence

        :param index: the index to insert an element into the sequence
        :param insertion_value: the value to insert at the provided index
        :return:
        """
        if not self._valid_index(index):
            raise IndexError("Tried to insert outside of the sequence bounds")

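        self.__sequence_string = self.__sequence_string[:index] + insertion_value + self.__sequence_string[index:]
        # the quality of the inserted value is unknown, so an existing quality string
        # would no longer line up with the sequence; clear it rather than leave it misaligned
        self.__quality_string = ""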

    def delete_element(self, index: int) -> None:
        """
        extra method #3: delete a value at a given index in the sequence

        :param index: the index to delete an element in the sequence from
        :return:
        """
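        # an index equal to the length is past the last element, so it is rejected here
        if not 0 <= index < self.get_sequence_length():
            raise IndexError("Tried to delete outside of the sequence bounds")

        self.__sequence_string = self.__sequence_string[:index] + self.__sequence_string[index + 1:]
        # keep the quality string aligned with the sequence if one has been set
        if self.__quality_string:
            self.__quality_string = self.__quality_string[:index] + self.__quality_string[index + 1:]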
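
The class leaves the encoding of the quality string open. As a sketch, assuming it follows the FASTQ Phred+33 convention (each character's code point minus 33 is a Phred score Q, and the estimated probability that the call is wrong is 10^(-Q/10)), the values could be decoded as below. `phred_scores` and `error_probabilities` are hypothetical helpers, not part of the class above.

```python
from typing import List


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """decode a quality string into integer Phred scores (assumes Phred+33 encoding)"""
    # each character's code point minus the offset is the quality score Q
    return [ord(char) - offset for char in quality_string]


def error_probabilities(quality_string: str, offset: int = 33) -> List[float]:
    """convert each Phred score Q into the estimated probability that the call is wrong"""
    return [10 ** (-q / 10) for q in phred_scores(quality_string, offset)]


# "I" is code point 73, so Q = 73 - 33 = 40: roughly a 1-in-10,000 chance of error
print(phred_scores("II5+"))  # [40, 40, 20, 10]
```

Under that assumption, a string such as "II5+" is what `update_quality_values()` would receive for a four-element sequence.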

In [125]:
class DNASequence(Sequence):
    """A representation of a DNA sequence.

    In addition to the functionality of the base class, a DNA sequence
    can also transcribe an RNA sequence from itself, and can also recall
    its complementary sequence.

    Example usage:
    # create a DNA sequence
    > my_dna_sequence = DNASequence("ATCCCGGCGATATA")

    # get the complementary strand
    > my_dna_sequence.complement()

    # get the RNA transcription of this strand
    > my_dna_sequence.transcribe()

    :param sequence: the DNA sequence
    :type sequence: str
    """
    def __init__(self, sequence: str):
        super().__init__(sequence)
        if not self._valid_sequence():
            raise ValueError("Not a valid DNA sequence, double check the input bases")
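        # attributes with two leading underscores are name-mangled per class, so
        # `self.__sequence_type` here would create a separate `_DNASequence__sequence_type`
        # that get_sequence_type() never reads; set the base-class attribute instead
        self._Sequence__sequence_type = "DNA"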

    def _valid_sequence(self) -> bool:
        """
        check if the sequence is a valid DNA sequence

        :return: whether the sequence is a valid DNA sequence
        """
        return all(base.upper() in {"A", "T", "G", "C"} for base in self.get_sequence())

    def complement(self) -> str:
        """
        return the base-by-base complement of this DNA sequence (A<->T, G<->C),
        preserving the case of each base. the result is not reversed; reverse it
        to get the reverse complement read 5' to 3'

        :return: the complementary DNA strand
        """
        # translation table that swaps complementary bases in either case
        complement_table = str.maketrans("ATGCatgc", "TACGtacg")
        return self.get_sequence().translate(complement_table)

    def transcribe(self) -> str:
        """
        return the RNA sequence that would be transcribed from this sequence,
        treating this sequence as the coding (sense) strand: each T becomes U

        :return: the RNA version of the sequence
        """
        # replace thymine with uracil in either case, so mixed-case input is handled too
        return self.get_sequence().replace("T", "U").replace("t", "u")
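
`complement()` returns the base-by-base complement without reversing it. Because the two strands of a DNA duplex run antiparallel, the partner strand read 5' to 3' is the reverse complement. A standalone sketch (`reverse_complement` and `COMPLEMENT_TABLE` are hypothetical names, not part of the class above):

```python
# swaps complementary bases, preserving upper/lower case
COMPLEMENT_TABLE = str.maketrans("ATGCatgc", "TACGtacg")


def reverse_complement(dna: str) -> str:
    """return the reverse complement of a DNA string"""
    # complement each base, then reverse so the result reads 5' to 3'
    return dna.translate(COMPLEMENT_TABLE)[::-1]


print(reverse_complement("ATCCCGGCGATATA"))  # TATATCGCCGGGAT
```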

In [134]:
class RNASequence(Sequence):
    """A representation of an RNA sequence

    In addition to the functionality of the base class, an RNA sequence
    can also translate an amino acid sequence from itself

    Example usage:
    # create an RNA sequence (its length must be a multiple of 3 to translate)
    > my_rna_sequence = RNASequence("AUGAAGGCUCGGAUAUAG")

    # translate the RNA sequence into an amino acid sequence
    > my_rna_sequence.translate()

    :param sequence: the RNA sequence
    :type sequence: str
    """
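    def __init__(self, sequence: str):
        super().__init__(sequence)
        if not self._valid_sequence():
            raise ValueError("Not a valid RNA sequence, double check the input bases")
        # set the base-class attribute directly: `self.__sequence_type` would be
        # name-mangled to `_RNASequence__sequence_type` and never read by get_sequence_type()
        self._Sequence__sequence_type = "RNA"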

    def _valid_sequence(self) -> bool:
        """
        check if the sequence is a valid RNA sequence

        :return: whether the sequence is a valid RNA sequence
        """
        return all(base.upper() in {"A", "U", "G", "C"} for base in self.get_sequence())

    def _generate_codon(self):
        """
        helper generator to split the RNA sequence into codons

        :return: yields successive 3-base codons, in order
        """
        sequence = self.get_sequence()
        for start in range(0, len(sequence), 3):
            yield sequence[start:start + 3]

    def translate(self) -> str:
        """
        translate the RNA sequence into a sequence of amino acids. the length of
        the sequence must be divisible by 3; stop codons appear as "[STOP]" in the output

        :return: the translated amino acid sequence from the RNA sequence
        """
        if self.get_sequence_length() % 3 != 0:
            raise ValueError("The length of the RNA sequence is not divisible by 3, so the reading frame is ambiguous")

        codon_dict = {"UUU": "F", "UUC": "F",
                      "UUA": "L", "UUG": "L", "CUU": "L", "CUC": "L", "CUA": "L", "CUG": "L",
                      "AUU": "I", "AUA": "I", "AUC": "I",
                      "AUG": "M",
                      "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",
                      "UCU": "S", "UCC": "S", "UCA": "S", "UCG": "S", "AGU": "S", "AGC": "S",
                      "CCU": "P", "CCC": "P", "CCA": "P", "CCG": "P",
                      "ACU": "T", "ACC": "T", "ACA": "T", "ACG": "T",
                      "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
                      "UAU": "Y", "UAC": "Y",
                      "CAU": "H", "CAC": "H",
                      "CAA": "Q", "CAG": "Q",
                      "AAU": "N", "AAC": "N",
                      "AAA": "K", "AAG": "K",
                      "GAU": "D", "GAC": "D",
                      "GAA": "E", "GAG": "E",
                      "UGU": "C", "UGC": "C",
                      "UGG": "W",
                      "CGU": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
                      "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",
                      "UAA": "[STOP]", "UAG": "[STOP]", "UGA": "[STOP]",
        }

        # look codons up in upper case so that lower-case input (accepted by
        # _valid_sequence) does not raise a KeyError
        return "".join(codon_dict[codon.upper()] for codon in self._generate_codon())
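
`translate()` writes each stop codon into the output as `[STOP]` and keeps going. If you instead want translation to end at the first stop codon, as it does biologically, a sketch follows. It also builds the standard codon table compactly from the conventional U, C, A, G ordering instead of listing all 64 entries. `CODON_TABLE` and `translate_until_stop` are hypothetical names, and the sketch assumes the standard genetic code.

```python
from itertools import product

# standard genetic code: codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG
# (first base varies slowest, bases in the order U, C, A, G); "*" marks a stop codon
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_until_stop(rna: str) -> str:
    """translate an RNA string codon by codon, stopping at the first stop codon"""
    protein = []
    # ignore any incomplete trailing codon
    for start in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[start:start + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_until_stop("AUGAAGGCUCGGAUAUAG"))  # MKARI
```

Because it skips a trailing partial codon rather than raising, the 20-base string "AUGAAGGCUCGGAUAUAGGG" also gives "MKARI"; whether that leniency is desirable is a design choice.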


Coding 2: (20 points)

Incomprehensible!

Create two comprehensions of any kind you like. For each one, do the following (10 points, 1
per comprehension, 2 for each of the following):

1. Attempt to create the comprehension as a for loop. If it is impossible, say why.
2. Attempt to create the comprehension as a lambda (with any friends like
map/filter/reduce). If it is impossible, say why.

In [144]:
# a list comprehension -> a for loop
daisy_comprehension = ["she loves me" if num % 2 else "she loves me not" for num in range(30)]

daisy_loop = []
for num in range(30):
    if num % 2:
        daisy_loop.append("she loves me")
    else:
        daisy_loop.append("she loves me not")
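
The same list can also be built without a comprehension or an explicit loop, by mapping a lambda over the range. A sketch (`daisy_map` is a new name introduced here):

```python
daisy_comprehension = ["she loves me" if num % 2 else "she loves me not" for num in range(30)]

# the lambda holds the conditional expression; map applies it to each number
daisy_map = list(map(lambda num: "she loves me" if num % 2 else "she loves me not", range(30)))

print(daisy_map == daisy_comprehension)  # True
```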

In [155]:
# a dict comprehension -> a lambda + map
names = ["Pat", "Ron", "Mo", "Nehemiah", "Omerit", "Marvin", "Isaiah"]
who_let_the_dogs_out_comprehension = {name: name == "Isaiah" for name in names}

who_let_the_dogs_out_lambda = lambda x: x == "Isaiah"

# map applies the lambda to every name; zip pairs each name with its result (no comprehension needed)
who_let_the_dogs_out_map = dict(zip(names, map(who_let_the_dogs_out_lambda, names)))
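
For completeness, the dict comprehension also converts directly into a for loop. A sketch (`who_let_the_dogs_out_loop` is a new name introduced here):

```python
names = ["Pat", "Ron", "Mo", "Nehemiah", "Omerit", "Marvin", "Isaiah"]
who_let_the_dogs_out_comprehension = {name: name == "Isaiah" for name in names}

# build the same dict one key at a time with an explicit loop
who_let_the_dogs_out_loop = {}
for name in names:
    who_let_the_dogs_out_loop[name] = name == "Isaiah"

print(who_let_the_dogs_out_loop == who_let_the_dogs_out_comprehension)  # True
```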

Coding 3: (10 points)

Efficiency exploration: we’re advanced now!

Devise two pieces of code that are sufficiently unique from any other code you wrote in the
prior questions. They should be loops of some kind that perform a finite task (no while True
stuff!), even if that task is silly.  Include a brief description of the purpose of each piece of code.
(1 point each)

Time their execution using the timeit module as here:
https://docs.python.org/3.8/library/timeit.html (1 point each)

Try making some change in your code that you think might make the code more efficient but not
alter its function - that is, it must produce the same result as its unaltered form. If you do, show
the change and record your difference in time. If you are unable to find an adjustment that does
so, mention what you tried. (2 points each)

Please include a viable explanation of why your change made the code run more quickly, or
why the change failed to do so. (1 point each)

In [1]:
import timeit

In [6]:
# code snippet 1: compare ways of finding the sum of 0 to 100000000

def for_loop_sum() -> None:
    """
    slowest way to calculate; use a naive for-loop to
    loop over integers from 0 to 100000000

    :return:
    """
    iterations = 100000000
    counter = 0
    for i in range(iterations + 1):
        counter += i
    print(counter)

def sum_sum() -> None:
    """
    faster way to calculate; pass the range directly to the built-in `sum`
    function, which adds the integers without building a list first.

    the increase in speed is likely because `sum` is implemented in C, so its
    per-element loop runs as compiled code, whereas every iteration of the
    for-loop is executed as interpreted Python bytecode.

    :return:
    """
    print(sum(range(100000000 + 1)))

print(f'For-loop time: {timeit.timeit("for_loop_sum()", number=1, setup="from __main__ import for_loop_sum")}')
print("-----")
print(f'sum func time: {timeit.timeit("sum_sum()", number=1, setup="from __main__ import sum_sum")}')

5000000050000000
For-loop time: 3.5041889999993145
-----
5000000050000000
sum func time: 1.0773054159944877
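
A further change, not timed above, removes the loop entirely: the sum of the integers 0 through n equals n(n + 1)/2, so for n = 100000000 it is 100000000 × 100000001 / 2 = 5000000050000000, matching both printed results. A sketch (`closed_form_sum` is a hypothetical name); it does a constant amount of arithmetic regardless of n, so one would expect it to beat both versions, though that has not been measured here.

```python
def closed_form_sum(n: int) -> int:
    """sum of the integers 0..n using Gauss's formula n * (n + 1) / 2"""
    # n * (n + 1) is always even, so integer division is exact
    return n * (n + 1) // 2


# check against the built-in sum on a few small values of n
for n in (0, 1, 10, 12345):
    assert closed_form_sum(n) == sum(range(n + 1))

print(closed_form_sum(100000000))  # 5000000050000000
```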


In [7]:
# code snippet 2: appending to a list

def for_loop_append():
    """
    slow way to create a list of all integers from 0 to 100000000

    :return:
    """
    list_to_append = []
    iterations = 100000000

    for i in range(iterations + 1):
        list_to_append.append(i)

def comprehension_append():
    """
    faster way to create a list of all integers from 0 to 100000000

    this approach is likely faster mainly because the comprehension adds each element
    with a dedicated bytecode instruction, whereas the for-loop has to look up the
    `.append` attribute and make a Python-level method call on every iteration. the
    comprehension does not preallocate the whole list: both versions grow the list the
    same way (CPython over-allocates, so most appends do not trigger a resize), so
    memory allocation is unlikely to explain the difference.

    :return:
    """
    iterations = 100000000
    list_comprehension_append = [i for i in range(iterations + 1)]

print(f'For-loop time: {timeit.timeit("for_loop_append()", number=1, setup="from __main__ import for_loop_append")}')
print("-----")
print(f'Comprehension time: {timeit.timeit("comprehension_append()", number=1, setup="from __main__ import comprehension_append")}')

For-loop time: 2.7473626249702647
-----
Comprehension time: 2.3807997920084745
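
One way to test the method-lookup explanation is to take the per-iteration `.append` lookup out of the loop while leaving everything else unchanged. The sketch below (function names are hypothetical, and it uses a smaller n so it runs quickly) caches the bound `append` method in a local variable and, for comparison, also builds the list with `list(range(...))`, which can size the list from the range's length before filling it. It uses `timeit.repeat` and reports the minimum of several runs, since a single `number=1` measurement is sensitive to noise; no timings are claimed here.

```python
import timeit

N = 1_000_000  # smaller than the 100000000 used above, to keep the run short


def loop_append(n: int = N) -> list:
    result = []
    for i in range(n + 1):
        result.append(i)  # attribute lookup + method call on every iteration
    return result


def loop_cached_append(n: int = N) -> list:
    result = []
    append = result.append  # look the bound method up once, outside the loop
    for i in range(n + 1):
        append(i)
    return result


def comprehension(n: int = N) -> list:
    return [i for i in range(n + 1)]


def list_of_range(n: int = N) -> list:
    return list(range(n + 1))


variants = [loop_append, loop_cached_append, comprehension, list_of_range]

# every variant must produce the same list, or the comparison is meaningless
assert all(f(1000) == list(range(1001)) for f in variants)

if __name__ == "__main__":
    for f in variants:
        # best of 5 single runs; the minimum is the least noisy estimate
        best = min(timeit.repeat(f, number=1, repeat=5))
        print(f"{f.__name__}: {best:.4f} s")
```

If the cached-append loop closes most of the gap to the comprehension, that supports the method-lookup explanation.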
