# Bioinformatics Python programming placement assessment

---

## Purpose

The purpose of this assessment is to determine whether you already possess the Python programming prerequisites for **BIOINF 529: Bioinformatic Concepts & Algorithms** and, if so, to waive your requirement for **BIOINF 575: Programming Laboratory in Bioinformatics**.

---

## Instructions

* *Carefully* read **all** instructions for *each* section before attempting to write any code
* <u>Do not alter</u> **any** of the provided source code other than what is requested

---

## Information

### General:

---
### <font color="red">Warning</font>
We know that there is a strong urge in graduate school to perform at the edge of your ability. Even so, we **strongly** urge you to <u>refrain from getting assistance with this assessment</u> and to <u>submit only work that you know you are currently capable of</u>.

This assessment is not used to evaluate anything other than your Python programming abilities.

That is, if you turn in work that is *not your own* or that you *do not fully understand* and we, consequently, waive the *BIOINF 575: Programming Laboratory in Bioinformatics* requirement, <font color="red">you will likely suffer</font> from a steeper learning curve during *BIOINF 529: Bioinformatic Concepts & Algorithms*.

---

This is only an *assessment* and will not be used for any form of evaluation other than that stated in the [Purpose](#Purpose) section of this file.

After your assessment is evaluated, the instructors of *BIOINF 529: Bioinformatic Concepts & Algorithms* will determine if you meet the prerequisites for the course.

**Note**: We fully acknowledge that you may (or may not) have prior programming experience. This experience may be in other programming languages (e.g. MATLAB, R, etc.) or in the language of instruction: **Python**. However, *BIOINF 529: Bioinformatic Concepts & Algorithms* relies on fundamental *Python* programming concepts that are not readily used in other languages or regularly covered in self-taught curricula.

Therefore, please know that we are not assessing your ability to program, but rather your ability to program using intermediate/advanced Python concepts.

# Important Readme:

1. You do **not** have to take this assessment
    * If you do take this assessment, you have the *opportunity* to waive the *BIOINF 575: Programming Laboratory in Bioinformatics* requirement
    * If you do not take this assessment, you will simply take *BIOINF 575: Programming Laboratory in Bioinformatics*
* You do **not** have to pass this assessment should you choose to take it
    * If you pass, the *BIOINF 575: Programming Laboratory in Bioinformatics* requirement will be waived
    * If you do not pass, you will simply take *BIOINF 575: Programming Laboratory in Bioinformatics*
* This assessment will **not** be stored after the assessment is complete
* If you find yourself not readily understanding what is asked of you in this assessment, do **not** worry
    * You **will** be taught these concepts in *BIOINF 575: Programming Laboratory in Bioinformatics*
    * You **will** then find yourself well-prepared for *BIOINF 529: Bioinformatic Concepts & Algorithms*
* This assessment will **not** be used in any other way to evaluate you as a student or your place in this program

### Points of Assessment

You will be assessed on your ability to understand and implement the following *Points of Assessment*:

|#|Concept|
|--|:--|
|1| [Jupyter notebooks](#Part-0)|
|2| [git](#Part-0)|
|3| [I/O operations](#Part-1%3A-I%2FO-and-string-operations)|
|4| [String operations](#Part-1%3A-I%2FO-and-string-operations)|
|5| [Numpy arrays and simple Python data structures](#Part-2%3A-Numpy%2C-data-structures%2C-%3Ccode%3Erandom%3C%2Fcode%3E%2C-slicing%2C-indexing%2C-and-subsetting)|
|6| [Random selection techniques](#Part-2%3A-Numpy%2C-data-structures%2C-%3Ccode%3Erandom%3C%2Fcode%3E%2C-slicing%2C-indexing%2C-and-subsetting)| 
|7| [Indexing, slicing, and subsetting data](#Part-2%3A-Numpy%2C-data-structures%2C-%3Ccode%3Erandom%3C%2Fcode%3E%2C-slicing%2C-indexing%2C-and-subsetting)|
|8| [Object-Oriented Programming](#Part-3%3A-Object-Oriented-Progamming)|
|9| Coding by Contract|
|10| Iteration methods|
|11| Python control structures|

---

## Administration

### Timeline

You will have one (1) week to complete the assessment

### Submission

Please attach the file in an email to **both** Ryan E Mills (remills@umich.edu) and Alan Boyle (apboyle@umich.edu)

---

# Part 0

You may not know it, but you are already on your way to fulfilling one of the [Points of Assessment: *Jupyter Notebooks*](#Points-of-Assessment). Jupyter Notebooks (\*.ipynb files) are the primary medium of instruction in *BIOINF 529: Bioinformatic Concepts & Algorithms*. Since you were able to view this file, it means you have a basic understanding of using Jupyter Notebooks.

## Instructions

* Use `git` to acquire the data from this [repository](https://github.com/dcmb-courses/bioinf529-assessment-data)
* ***move this notebook*** file into the newly acquired repository

You will be using this data for [Part 1](#Part-1%3A-I%2FO-and-string-operations) of this assessment

---

# Part 1: I/O and string operations

## Instructions

### A. Write the following function
0. <u>No `import` statements are allowed in this section</u>
1. Write a function called <u>`placement_io`</u> with the following positional arguments:
    1. <u>`file`</u>: a string path and filename to data that will be ingested (e.g. '/c/user/me/some_file.txt')
    * <u>`delim`</u>: what is the delimiter of the file (e.g. tab, comma, etc)
2. The function must open the declared `file` and process it *row-by-row*:
    1. Each row must be delimited by `delim` and converted into a `list`
    * Check each row if there is a variant:
        1. If there is no variant; add a period ('.') to the end of the row
        * If there is a variant; create a new string from the reference sequence such that the variant is present in the correct position and add it to the end of the row
    * Add the newly modified row into another `list` that contains all the modified rows
* The function must <u>`return`</u> the modified rows `list`

In [29]:
# Use this cell to write the function described above

def placement_io(file, delim):
    '''
    Function written in Python 3.9 on Windows 10
    
    INPUT:
    file: string path and filename to data that will be ingested (e.g. '/c/user/me/some_file.txt')
          The file has to be in the following format: [start|end|DNA sequence|indicator of variant + idx + single letter base] 
          with the same type of delimiter separating them. The indicator of variant has to be a string of following format: "No_variant" or "Variant_#_base".
    delim: what is the delimiter of the file (e.g. tab, comma, etc).
    
    WHAT IS IT DOING:
    This function reads the input file and processes the file row by row.
    It checks the DNA sequence on possible variants: If there is a variant, append the correct sequence. If not, append a period.
    
    OUTPUT: 
    Returns list with modified rows (lists).
    
    '''

    output_list = []

    # Open the file in a context manager so that it is closed even if an error occurs.
    with open(file, "r", encoding="utf8") as f:
        # Go through the input file row by row.
        for current_row in f:
            current_row_list = current_row.split(delim)

            if "No_variant" in current_row:
                current_row_list.append(".")
                output_list.append(current_row_list)

            elif "Variant" in current_row:
                # Find the start coordinate of the given DNA sequence.
                start = int(current_row_list[0])
                # Collect the digits in the variant field to get the variant position,
                # then convert it to an offset relative to the start coordinate.
                variant_idx = ""
                for char in current_row_list[3]:
                    if char.isdigit():
                        variant_idx = variant_idx + char
                variant_idx = int(variant_idx) - start

                # The variant base is the last character of the field, ignoring any trailing newline.
                variant_base = current_row_list[3].strip()[-1]
                mod_DNA = current_row_list[2][:variant_idx-1] + variant_base + current_row_list[2][variant_idx:]
                current_row_list.append(mod_DNA)
                output_list.append(current_row_list)

            else:
                print("Ambiguous information about variants. Please check input data and make sure that data is marked with either 'No_variant' or 'Variant'. Capitalization and spelling are important.")

    return output_list

### B. Use the function on the data file provided in the git repo

In [30]:
# Use this cell to write the code that uses the above function on the provided file

mod_list = placement_io(r'C:\Users\Jenni\OneDrive\Desktop\directory\bioinf529-assessment-data\sample_data.tsv', "\t")
for row in mod_list:
    print(row)

['4863', '4883', 'GATATAGCACACAAGTAGAC', 'No_variant\n', '.']
['2310', '2330', 'AAAATTAGTAGATTTCAGAG', 'Variant_2327_T\n', 'AAAATTAGTAGATTTCTGAG']
['5313', '5333', 'CCATTTTCAGAATTGGGTGT', 'No_variant\n', '.']
['6302', '6322', 'AACTTGATATAATACCAATA', 'Variant_6303_C\n', 'CACTTGATATAATACCAATA']
['7673', '7693', 'TTAACAATTACACAAGCTTA', 'No_variant\n', '.']
['2378', '2398', 'CACATCCCGCAGGGTTAAAA', 'Variant_2390_C\n', 'CACATCCCGCACGGTTAAAA']
['7355', '7375', 'CTATGGGCGCAGCCTCAATG', 'No_variant\n', '.']
['2022', '2042', 'ATTAGTAGGACCTACACCTG', 'Variant_2025_G\n', 'ATGAGTAGGACCTACACCTG']
['8537', '8557', 'GTGGGTTTTCCAGTCACACC', 'Variant_8545_A\n', 'GTGGGTTATCCAGTCACACC']
['436', '456', 'AGTATGGGCAAGCAGGGAGC', 'Variant_454_T\n', 'AGTATGGGCAAGCAGGGTGC']
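
The coordinate arithmetic behind the output above can be sketched in isolation. This is only a minimal illustration, not a reference solution: `apply_variant` is a hypothetical helper, and it assumes (as the sample output suggests) that the first base of the sequence sits at position `start + 1` and that the variant field looks like `Variant_<position>_<base>`.

```python
def apply_variant(start, sequence, tag):
    """Return `sequence` with the substitution described by `tag` applied.

    Assumptions (hypothetical, inferred from the sample data):
    - `tag` looks like 'Variant_<position>_<base>'
    - the first base of `sequence` sits at position start + 1
    """
    _, position, base = tag.strip().split("_")
    index = int(position) - start - 1  # 0-based index into `sequence`
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    return sequence[:index] + base + sequence[index + 1:]


# Row taken from the sample output above.
print(apply_variant(2310, "AAAATTAGTAGATTTCAGAG", "Variant_2327_T\n"))
# AAAATTAGTAGATTTCTGAG
```

The explicit bounds check is a design choice: without it, a position equal to `start` would produce a negative index and silently modify the wrong end of the sequence.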


---

# Part 2: Numpy, data structures, `random`, slicing, indexing, and subsetting

## Instructions

### A. Setup your workspace
0. The *only* `import` statements allowed in this section are: `numpy`, `random`, and `itertools`
* Load in the following libraries:
    1. Numpy (aliased as `np`)
    * `random` from the standard library
    * any *specific* tools needed from the `itertools` library -> Do **not** import `itertools` directly, just the sub-functions needed

In [None]:
# Use this cell to complete the directions detailed under 'A. Setup your workspace'

import numpy as np
import random 
from itertools import product

### B. Write the following functions

#### a. Numpy array

1. Write a function called <u>`placement_numpy`</u> with the following **keyword** arguments:
    1. <u>`min`</u>: the minimum number to randomly select from (default: 0)
    * <u>`max`</u>: the maximum number to randomly select from (default: 42)
    * <u>`n_rows`</u>: the number of rows in the array (default: 3)
    * <u>`n_cols`</u>: the number of columns in the array (default: 4)
    * <u>`n`</u>: the number of random samples (default: 1000)
    * <u>`fn`</u>: the name of a function that will be used to aggregate results (default: `sum`)
    * <u>`seed`</u>: allow the user to define a random seed for all operations (default: None)
* The function must set the seed declared by <u>`seed`</u> using the Numpy library
* The function must use the Numpy library to create a random array of integers ranging from <u>`min`</u> to <u>`max`</u> such that it fills a Numpy array of dimensions (<u>`n_rows`</u>, <u>`n_cols`</u>)
* Collect <u>`n`</u> random samples with replacement
* Iterate through all the random samples
    * Use <u>`fn`</u> to aggregate the results
* When all iterations are complete, the function must <u>`return`</u> the aggregate answer

In [None]:
# Use this cell to complete the directions detailed under 'a. Numpy array'

def placement_numpy(min=0, max=42, n_rows=3, n_cols=4, n=1000, fn="sum", seed=None):
    
    '''
    TASK:
    The function must set the seed declared by seed using the NumPy library.
    The function must use the NumPy library to create a random array of integers ranging from min to max such that it fills a NumPy array of dimensions (n_rows, n_cols)
    Collect n random samples with replacement
    Iterate through all the random samples
        Use fn to aggregate the results
    When all iterations are complete, the function must return the aggregate answer.
    
    WHAT IS THIS FUNCTION DOING?
    It creates an array of size (n_rows x n_cols) with random numbers ranging from min to max. From this array it chooses n samples.
    The aggregation of these numbers will be returned by this function.
    
    '''
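    # Seed NumPy's global random generator so that every draw below is reproducible.
    np.random.seed(seed)

    # Create a random array and flatten it for easier handling.
    # Note: np.random.randint excludes the upper bound, so values fall in [min, max).
    rand_array = np.random.randint(min, max, size=(n_rows, n_cols)).flatten()

    # Collect n random samples with replacement, using NumPy so that the seed applies.
    rand_samples = []
    for sample in np.random.choice(rand_array, size=n, replace=True):
        rand_samples.append(sample)

    # The arguments min and max shadow the built-in functions of the same name.
    # Deleting the local names lets eval() resolve "min" and "max" to the built-ins again.
    del max
    del min

    # fn may be a function object or the name of a function given as a string.
    aggregate = fn if callable(fn) else eval(fn)
    return aggregate(rand_samples)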



In [None]:
# Use this cell to write the code that uses the placement_numpy function

ans_def = placement_numpy()
print(ans_def)

ans_max = placement_numpy(12,22,3,4,100,fn="max", seed = 2)
print(ans_max)

ans_mean = placement_numpy(12,22,3,4,100,fn="np.mean")
print(ans_mean)
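
The two ideas this part leans on, sampling with replacement under a seed and passing an aggregation function as an argument, can be sketched without NumPy. The sketch below deliberately swaps in the standard-library `random` module to stay dependency-free; `sample_and_aggregate` is a hypothetical name, and passing the function object itself (rather than its name as a string) is one possible design that avoids `eval`.

```python
import random
import statistics


def sample_and_aggregate(values, n=1000, fn=sum, seed=None):
    """Draw n samples with replacement from `values` and aggregate them with `fn`."""
    rng = random.Random(seed)           # private generator; global state is untouched
    samples = rng.choices(values, k=n)  # `choices` samples with replacement
    return fn(samples)


values = [12, 15, 18, 21]
total_a = sample_and_aggregate(values, n=100, fn=sum, seed=2)
total_b = sample_and_aggregate(values, n=100, fn=sum, seed=2)
print(total_a == total_b)  # True: the same seed yields the same samples
print(sample_and_aggregate(values, n=100, fn=statistics.mean, seed=2))
```

Because `fn` is an ordinary callable, any function that accepts a list (`sum`, `max`, `statistics.mean`, or a user-defined one) can be plugged in without changing the sampling code.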

#### b. Python data structure: `list`

1. Write a function called <u>`placement_list`</u> with the following **keyword** arguments:
    1. <u>`left`</u>: a Python `list` object (default: None)
    * <u>`right`</u>: a Python `list` object (default: None)
* The function must ensure that both `left` and `right` are the same size or `left` is one (1) item larger than `right`
* The function must ensure that both `left` and `right` are larger than, or equal to, two (2) items
* Subset `left` by just the <u>odd</u> numbered items within it
* Subset `right` by just the <u>even</u> numbered items within it
* The function must <u>`return`</u> the Cartesian product of both subsets as a `list`

In [None]:
# Use this cell to complete the directions detailed under 'b. Python data structure: list'

def placement_list(left:list = None, right:list = None):
    
    '''
    INPUT:
    left: list of at least 2 elements.
    right: list of at least 2 elements.
    Both lists must be the same size, or left may be exactly one item larger than right.
    
    RETURN:
    The Cartesian product of "subset left" (odd numbers of the left list) and "subset right" (even numbers of the right list) as a list.
    
    '''
    if type(left) is not list or type(right) is not list:
        raise TypeError("The input arguments must be lists.")
        
    if len(left) != len(right) and len(left) != len(right) + 1:
        raise AssertionError("The lists left and right must be the same size, or left must be one item larger than right.")
        
    if len(left) < 2 or len(right) < 2:
        raise AssertionError("Both left and right must contain at least two items.")
        
    # Keep the odd-valued items of left and the even-valued items of right.
    sub_left = [item for item in left if item % 2 != 0]
    sub_right = [item for item in right if item % 2 == 0]
    
    cart_prod = list(product(sub_left, sub_right))
    return cart_prod

In [31]:
# Use this cell to write the code that uses the placement_list function

a = [1,2,3]
b = [2,6,12]

ans = placement_list(a,b)
print(ans)

NameError: name 'placement_list' is not defined
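
The phrase "odd numbered items" in the instructions can be read in two ways: items whose *value* is odd, or items at odd *positions* when counting from one. The sketch below shows both readings side by side using the same example lists as the cell above; which reading the assessment intends is an assumption either way.

```python
from itertools import product

left = [1, 2, 3]
right = [2, 6, 12]

# Reading 1: subset by value (odd values on the left, even values on the right).
by_value = list(product(
    [x for x in left if x % 2 != 0],
    [y for y in right if y % 2 == 0],
))

# Reading 2: subset by position, counting items from one
# (1st, 3rd, ... on the left; 2nd, 4th, ... on the right).
by_position = list(product(left[0::2], right[1::2]))

print(by_value)     # [(1, 2), (1, 6), (1, 12), (3, 2), (3, 6), (3, 12)]
print(by_position)  # [(1, 6), (3, 6)]
```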

#### c. Python data structure: `dict`

1. Write a function called <u>`placement_dict`</u> with the following **keyword** arguments:
    1. <u>`seq`</u>: a `list` of numbers ranging from one (1) to sixty-four (64) (inclusive) (default: None)
    * <u>`codon_key`</u>: a `dict` where each codon is associated with a number (default: None)
    * <u>`codon_trans`</u>: a `dict` where each codon is associated with its associated single letter protein designation (default: None)
* The function must use the provided code `codon_key` and `codon_trans`
* The function must reverse `codon_key` such that the number is the key and the codon is the value
* The function must iterate through both dictionaries, creating a dictionary of dictionaries:
    1. The outer key is the number of the codon
    * The outer value is a dictionary:
        1. The inner key is the codon
        * The inner value is the codon's protein translation
    * If a codon is in `codon_key` and not `codon_trans`, skip it
    * If a codon is in `codon_trans` and not `codon_key`, skip it
* The function must then process the `seq`:
    1. It must generate the nucleotide sequence given a list of numbers
    * It must generate the protein translation of the sequence given a list of numbers
* The function must `return` a `tuple` with the following items:
    1. The first item must be a string representation of the nucleotide sequence translation of `seq`
    * The second item must be a string representation of the protein sequence translation of `seq`

In [None]:
# This code is for you to use to complete the directions detailed under 'c. Python data structure: dict' 
nts = 'ACGT'
codon_key = {codon: i for i, codon in enumerate((x+y+z for x in nts for y in nts for z in nts), 1)}

trans_table = {
    'TTT': 'F', 'TTC': 'F', 'TTY': 'F', 'TTA': 'L', 'TTG': 'L', 'TTR': 'L', 'TCT': 'S', 'TCC': 'S',
    'TCA': 'S', 'TCG': 'S', 'TCN': 'S', 'TCY': 'S', 'TCR': 'S', 'TAT': 'Y', 'TAC': 'Y', 'TAY': 'Y',
    'TAA': 'X', 'TAG': 'X', 'TAR': 'X', 'TGT': 'C', 'TGC': 'C', 'TGY': 'C', 'TGA': 'X', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L', 'CTY': 'L', 'CTR': 'L', 'CTN': 'L', 'YTG': 'L',
    'YTA': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P', 'CCY': 'P', 'CCR': 'P', 'CCN': 'P',
    'CAT': 'H', 'CAC': 'H', 'CAY': 'H', 'CAA': 'Q', 'CAG': 'Q', 'CAR': 'Q', 'CGT': 'R', 'CGC': 'R',
    'CGA': 'R', 'CGG': 'R', 'CGY': 'R', 'CGR': 'R', 'CGN': 'R', 'ATT': 'I', 'ATC': 'I', 'ATA': 'I',
    'ATY': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'ACY': 'T', 'ACR': 'T',
    'ACN': 'T', 'AAT': 'N', 'AAC': 'N', 'AAY': 'N', 'AAA': 'K', 'AAG': 'K', 'AAR': 'K', 'AGT': 'S',
    'AGC': 'S', 'AGY': 'S', 'AGA': 'R', 'AGG': 'R', 'AGR': 'R', 'GTT': 'V', 'GTC': 'V', 'GTA': 'V',
    'GTG': 'V', 'GTY': 'V', 'GTR': 'V', 'GTN': 'V', 'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GCY': 'A', 'GCR': 'A', 'GCN': 'A', 'GAT': 'D', 'GAC': 'D', 'GAY': 'D', 'GAA': 'E', 'GAG': 'E',
    'GAR': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G', 'GGY': 'G', 'GGR': 'G', 'GGN': 'G'
}

In [None]:
# Use this cell to complete the directions detailed under 'c. Python data structure: dict'

def placement_dict(seq:list=None, codon_key:dict=None, codon_trans:dict=None):
    
    '''
    INPUT:
    seq: a list of numbers ranging from 1 to 64 (inclusive) (default: None)
    codon_key: a dict where each codon is associated with a number (default: None)
    codon_trans: a dict where each codon is associated with its single letter protein designation (default: None)
    
    WHAT IS IT DOING:
    1: reverse codon_key dictionary
    2: create a dictionary of dictionaries:
        outer key: # of codon.
        outer value: dictionary
            inner key: codon
            inner value: codon's protein translation
    3: process the seq given a list of numbers [1,64]
        generate nucleotide sequence
        generate protein translation
    
    RETURN: 
    The function must return a tuple with the following items:
    The first item must be a string representation of the nucleotide sequence translation of seq
    The second item must be a string representation of the protein sequence translation of seq
    
    '''

    # (1) Reverse the input dictionary codon_key so that the numbers are the keys and the codons are the values.
    codon_value = {num: codon for codon, num in codon_key.items()}
    
    # (2) Create a nested dictionary of format: {1: {'AAA': 'K'}, 2: {'AAC': 'N'}, ...}
    # Codons that are missing from either dictionary are skipped.
    nested_dict = {}
    for num, codon in codon_value.items():
        if codon in codon_trans:
            nested_dict[num] = {codon: codon_trans[codon]}

    # (3) Generate the nucleotide and protein strings from the input list.
    nucleotide = ""
    protein = ""
    for num in seq:
        # Check that the number is defined in the nested dictionary.
        if num not in nested_dict:
            raise ValueError(f"Input sequence contains a number without a translatable codon: {num}")
        for codon, amino_acid in nested_dict[num].items():
            nucleotide = nucleotide + codon
            protein = protein + amino_acid

    return (nucleotide, protein)

In [None]:
# Use this cell to write the code that uses the placement_dict function

sequence = [1,2,4,6,8,10,12,14,33,64]

ans = placement_dict(sequence, codon_key, trans_table)

print(ans)
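
The dictionary steps described in this part (reverse a mapping, build a dictionary of dictionaries while skipping unmatched codons, then decode a list of numbers) can be seen on a much smaller scale. The tables below are hypothetical toy data, not the real codon tables, and the lowercase letters are placeholders rather than real amino acid codes.

```python
# Toy tables (hypothetical, far smaller than the real codon tables).
toy_key = {"AA": 1, "AC": 2, "CA": 3}
toy_trans = {"AA": "k", "AC": "t", "GG": "g"}  # "CA" is missing, "GG" is extra

# Reverse the mapping: number -> codon.
number_to_codon = {number: codon for codon, number in toy_key.items()}

# Keep only the codons that are present in both tables.
nested = {
    number: {codon: toy_trans[codon]}
    for number, codon in number_to_codon.items()
    if codon in toy_trans
}
print(nested)  # {1: {'AA': 'k'}, 2: {'AC': 't'}}

# Decode a list of numbers into both strings.
seq = [2, 1, 2]
pairs = [next(iter(nested[number].items())) for number in seq]
nucleotides = "".join(codon for codon, _ in pairs)
protein = "".join(letter for _, letter in pairs)
print((nucleotides, protein))  # ('ACAAAC', 'tkt')
```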

---

# Part 3: Object-Oriented Progamming

## Instructions

A rudimentary `class` called `FASTA` has been created for you below. It represents a canonical-only <u>nucleotide</u> FASTA sequence. Because objects are meant to group the attributes and methods that relate to a specific type of object, it is important that you know how to read and use them.

Use the `FASTA class` provided below to complete the following instructions:
1. You may not modify any of the pre-existing code. That is, you are allowed to add to, but not take away from it
* All class methods below must be added to the functionality of the `FASTA` object.
    1. Write a class method called `complement`:
        1. This class method takes no arguments
        * This class method must `return` the complement sequence of the `seq` attribute contained within the object
    * Write a class method called `reverse`:
        1. This class method takes no arguments
        * This class method must `return` the reverse sequence of the `seq` attribute contained within the object
    * Write a class method called `rev_comp`:
        1. This class method takes no arguments
        * This class method must `return` the reverse complement sequence of the `seq` attribute contained within the object
    * Write a class method called `is_complete`:
        1. This class method takes the argument `kmer` (default: 3)
        * This class method must `return` whether or not the `seq` attribute can be divided into n `kmer`-length kmers such that there is no remainder
    * Write a class method called `translate`:
        1. This class method takes a `dict` as an argument:
            * The keys of the dictionary are codons
            * The values of the dictionary are the protein translations
        * This class method must translate the `seq` attribute contained within the object into its single-letter protein translation using the `dict` argument for translation:
            * Ignore detection of 'Stop' codons
            * If a codon is observed in `seq` that is not in `dict`, the class method should translate this as 'X' instead
            * Only translate <u>complete</u> codons
        * This class method must `return` the translated sequence
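
The notions of a "complete" sequence and of "complete codons" both come down to splitting a string into consecutive, non-overlapping pieces of fixed length. A minimal standalone sketch follows; `split_kmers` is a hypothetical helper and is not part of the `FASTA` class.

```python
def split_kmers(sequence, kmer=3):
    """Split `sequence` into consecutive, non-overlapping kmer-length pieces.

    Returns the complete pieces and whatever is left over at the end.
    """
    usable = len(sequence) - len(sequence) % kmer
    pieces = [sequence[i:i + kmer] for i in range(0, usable, kmer)]
    return pieces, sequence[usable:]


pieces, leftover = split_kmers("ACTAGGTC")
print(pieces)          # ['ACT', 'AGG']
print(leftover)        # TC
print(leftover == "")  # False: the sequence is not complete for kmer=3
```

A sequence is complete for a given `kmer` exactly when the leftover is empty, which is the same condition as `len(sequence) % kmer == 0`.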

In [32]:
# Use the FASTA class below to complete the instructions detailed above
class FASTA:
    
    __slots__ = ['_header', '_seq']
    
    def __init__(self, sequence = None, header = None):
        self._header = header
        self._seq = sequence
    
    @property
    def header(self):
        if self._header is None:
            return '>'
        else:
            return self._header
    
    @header.setter
    def header(self, header):
        if not header.startswith('>'):
            header = '>' + header
        self._header = header
    
    @property
    def seq(self):
        return self._seq
    
    @seq.setter
    def seq(self, sequence):
        sequence = sequence.upper()
        if all(True if letter in 'ACGT' else False for letter in sequence):
            self._seq = sequence
        else:
            raise ValueError('Sequence contains non-canonical nucleotides')
    
    def __len__(self):
        return len(self.seq)
    
    def __str__(self):
        formatted_seq = '\n'.join([self.seq[i:i+80] for i in range(0,len(self),80)])
        return f"{self.header if self.header else ' '}Sequence Length: {len(self)}\n{formatted_seq}"

    def __repr__(self):
        return f"FASTA(header='{self.header}', sequence='{self.seq}')"
    
###################################################################################################################
   
    # This class method must return the complement sequence of the seq attribute contained within the object
    def complement(self):
        complement_seq = ""
        for base in self._seq:
            if base == "A":
                complement_seq += "T"
            elif base == "T":
                complement_seq += "A"
            elif base == "C":
                complement_seq += "G"
            elif base == "G":
                complement_seq += "C"
            else:
                raise ValueError('Sequence contains non-canonical nucleotides')
                
        return complement_seq
    
    # This class method must return the reverse sequence of the seq attribute contained within the object
    def reverse(self):
        reverse_seq = self._seq[::-1]
        return reverse_seq
    
    #This class method must return the reverse complement sequence of the seq attribute contained within the object
    def rev_comp(self):
        complement_seq = self.complement()
        reverse_comp_seq = complement_seq[::-1]
        return reverse_comp_seq
    
    # This class method must return whether or not the seq attribute can be divided 
    # into n kmer-length kmers such that there is no remainder
    def is_complete(self, kmer = 3):
        # The sequence is complete when its length is an exact multiple of kmer.
        return len(self._seq) % kmer == 0
    
    # This class method translates the DNA sequence to a protein sequence according to an input dictionary.
    def translate(self, codon_protein_dict:dict):
        
        stop_codons = {"TAA", "TAG", "TGA"}
        protein_trans = ""
        
        # Only translate complete codons: any trailing partial codon is dropped.
        usable_length = len(self._seq) - len(self._seq) % 3
        for i in range(0, usable_length, 3):
            triplet = self._seq[i:i+3]
            
            # Stop codons are skipped rather than translated.
            if triplet in stop_codons:
                continue
            # A codon in seq that is not in the dictionary is translated as "X".
            protein_trans += codon_protein_dict.get(triplet, "X")
            
        return protein_trans
        
        

In [24]:
# Use this cell to implement the FASTA class and show how your newly-written methods work

import numpy as np 

# This is a slightly modified translation dictionary from a previous task. Stop codon values added and "TTT" changed to TTX.
trans_table = {
    'TTX': 'F', 'TTC': 'F', 'TTY': 'F', 'TTA': 'L', 'TTG': 'L', 'TTR': 'L', 'TCT': 'S', 'TCC': 'S',
    'TCA': 'S', 'TCG': 'S', 'TCN': 'S', 'TCY': 'S', 'TCR': 'S', 'TAT': 'Y', 'TAC': 'Y', 'TAY': 'Y',
    'TAA': 'stop', 'TAG': 'stop', 'TAR': 'X', 'TGT': 'C', 'TGC': 'C', 'TGY': 'C', 'TGA': 'stop', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L', 'CTY': 'L', 'CTR': 'L', 'CTN': 'L', 'YTG': 'L',
    'YTA': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P', 'CCY': 'P', 'CCR': 'P', 'CCN': 'P',
    'CAT': 'H', 'CAC': 'H', 'CAY': 'H', 'CAA': 'Q', 'CAG': 'Q', 'CAR': 'Q', 'CGT': 'R', 'CGC': 'R',
    'CGA': 'R', 'CGG': 'R', 'CGY': 'R', 'CGR': 'R', 'CGN': 'R', 'ATT': 'I', 'ATC': 'I', 'ATA': 'I',
    'ATY': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'ACY': 'T', 'ACR': 'T',
    'ACN': 'T', 'AAT': 'N', 'AAC': 'N', 'AAY': 'N', 'AAA': 'K', 'AAG': 'K', 'AAR': 'K', 'AGT': 'S',
    'AGC': 'S', 'AGY': 'S', 'AGA': 'R', 'AGG': 'R', 'AGR': 'R', 'GTT': 'V', 'GTC': 'V', 'GTA': 'V',
    'GTG': 'V', 'GTY': 'V', 'GTR': 'V', 'GTN': 'V', 'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GCY': 'A', 'GCR': 'A', 'GCN': 'A', 'GAT': 'D', 'GAC': 'D', 'GAY': 'D', 'GAA': 'E', 'GAG': 'E',
    'GAR': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G', 'GGY': 'G', 'GGR': 'G', 'GGN': 'G', 
}

# Testing with two DNA sequences. One is erroneous and the other is valid.
# Expected result for the valid one: 10 residues (TRSISKPGLX), with the stop codon TAG skipped.
# error_sq = "ACTAGGTCCATTTCCAAACCCGGTTTGTAGXYZ" 

# exp_one = FASTA(error_sq)
# input_seq = exp_one.seq
# a = exp_one.complement()
# b = exp_one.reverse()
# c = exp_one.rev_comp()
# d = exp_one.is_complete()
# e = exp_one.translate(trans_table)

# print("This is the input sequence:", input_seq)
# print("This is the complement of the input sequence:", a)
# print("This is the reverse of the input sequence:", b)
# print("This is the reverse + complement of the input sequence:",c)
# print("The input sequence is complete. T/F?",d)
# print("This is the Protein Translation of the input sequence:",e)


good_sq = "ACTAGGTCCATTTCCAAACCCGGTTTGTAGTTT" 

exp_two = FASTA(good_sq)
input_seq = exp_two.seq
a = exp_two.complement()
b = exp_two.reverse()
c = exp_two.rev_comp()
d = exp_two.is_complete()
e = exp_two.translate(trans_table)

print("This is the input sequence:", input_seq)
print("This is the complement of the input sequence:", a)
print("This is the reverse of the input sequence:", b)
print("This is the reverse + complement of the input sequence:",c)
print("The input sequence is complete. T/F?",d)
print("This is the Protein Translation of the input sequence:",e)


This is the input sequence: ACTAGGTCCATTTCCAAACCCGGTTTGTAGTTT
This is the complement of the input sequence: TGATCCAGGTAAAGGTTTGGGCCAAACATCAAA
This is the reverse of the input sequence: TTTGATGTTTGGCCCAAACCTTTACCTGGATCA
This is the reverse + complement of the input sequence: AAACTACAAACCGGGTTTGGAAATGGACCTAGT
The input sequence is complete. T/F? True
This is the Protein Translation of the input sequence: TRSISKPGLX
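
As a cross-check of the complement and reverse complement shown above, the same result can be obtained with Python's built-in `str.maketrans` and `str.translate`. This is an alternative sketch written as standalone functions, not the approach used in the class. Unlike the class method, it passes any non-canonical character through unchanged instead of raising an error.

```python
# Translation table mapping each canonical base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(sequence):
    """Return the complement of a nucleotide string."""
    return sequence.translate(COMPLEMENT)


def reverse_complement(sequence):
    """Return the reverse complement of a nucleotide string."""
    return complement(sequence)[::-1]


seq = "ACTAGGTCCATTTCCAAACCCGGTTTGTAGTTT"
print(complement(seq))          # TGATCCAGGTAAAGGTTTGGGCCAAACATCAAA
print(reverse_complement(seq))  # AAACTACAAACCGGGTTTGGAAATGGACCTAGT
```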


---

# Assessment Rubric

1. [ ] Jupyter notebooks
* [ ] git
* [ ] I/O operations
* [ ] String operations
* [ ] Numpy arrays and simple Python data structures
* [ ] Random selection techniques
* [ ] Indexing, slicing, and subsetting data
* [ ] Object-Oriented Programming
* [ ] Coding by Contract
* [ ] Iteration methods
* [ ] Python control structures