# `lab03`—DNA & RNA Sequencing

**Objectives**

-   Write a simple function to implement a mathematical formula.
-   Use functions to modularize code.
-   Explain how variable scope impacts what the program "sees".
-   Understand the difference between _returning_ a value and _printing_ a value.
-   Use default values in functions.

This lab will introduce some basic algorithms for parsing and processing DNA strands once they have been sequenced into a computer format.

With the material from this lab in your toolkit, you will be prepared to start working on the course project with your team.  Watch for the first milestone assignment.

_Some of the exercises in this lab were inspired by [Rosalind](https://rosalind.info/)._

##  DNA Elements

A DNA sequence is composed of adenine (`'A'`), guanine (`'G'`), cytosine (`'C'`), and thymine (`'T'`) nucleobases.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/DNA_chemical_structure.svg/411px-DNA_chemical_structure.svg.png)

During transcription, the first step of gene expression, each DNA nucleobase is read off and its complement is stored in an RNA strand.  Thus an RNA sequence is a string containing uracil (`'U'`), cytosine (`'C'`), guanine (`'G'`), and adenine (`'A'`) [bases](https://en.wikipedia.org/wiki/RNA#Types_of_RNA).  (Note that `U` pairs with `A`, since RNA does not contain thymine, `T`.)

| Symbol | Name     | Complementary Base |
|--------|----------|--------------------|
| A      | adenine  | T (DNA); U (RNA)   |
| C      | cytosine | G                  |
| G      | guanine  | C                  |
| T      | thymine  | A                  |
| U      | uracil   | A                  |

Today's multi-part problem will lead you from the basic elements of DNA sequence data, through transcription into RNA, to translating RNA sequences codon by codon into amino acids.

![](https://oerpub.github.io/epubjs-demo-book/resources/0324_DNA_Translation_and_Codons.jpg)

### Parsing DNA

### <span style="color:#345995">Exercise 1: Counting DNA Nucleotides</span>

Compose a function `dna_count` which accepts a DNA string `dna` and returns a `list` or `tuple` of four integers representing the number of times that the symbols `A`, `C`, `G`, and `T` occur in `dna`.

In [1]:
#grade

def dna_count(dna):
    dna = dna.upper()
    count_A = dna.count('A')
    count_C = dna.count('C')
    count_G = dna.count('G')
    count_T = dna.count('T')
    return count_A,count_C,count_G,count_T

In [2]:
# Here are some DNA strings you can test your function with.
dna0 = 'TGCA'
dna0_answer = (1,1,1,1)
dna1 = 'TTTGTCTAGTGGGCGACTCGCCCAATAGACAACGGTTT'
dna1_answer = (8,9,10,11)
dna2 = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
dna2_answer = (20,12,17,21)

assert dna_count(dna0) == dna0_answer,"Test case dna0 failed"
print ("Success for dna0!")
assert dna_count(dna1) == dna1_answer,"Test case dna1 failed"
print ("Success for dna1!")
assert dna_count(dna2) == dna2_answer,"Test case dna2 failed"
print ("Success for dna2!")

Success for dna0!
Success for dna1!
Success for dna2!
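
As a point of comparison, the same tally can be made in a single pass over the string with `collections.Counter` from the standard library. This is only a sketch of an alternative approach, not the expected solution; the name `count_bases` is made up for illustration.

```python
from collections import Counter

def count_bases(dna):
    """Return the (A, C, G, T) counts for a DNA string, ignoring case."""
    tally = Counter(dna.upper())  # one pass over the whole string
    # A Counter reports 0 for any symbol that never appears.
    return tuple(tally[base] for base in "ACGT")

print(count_bases("TGCA"))  # (1, 1, 1, 1)
```

Calling `str.count` four times, as in the cell above, scans the string four times; for strings of this size either approach is fine.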


<div class="alert alert-danger">
Check in with your team and TA to make sure everyone understands concepts up through this point.
</div>

### Complementing DNA

Once we can parse basic DNA strings, our next step is to transcribe DNA into RNA.  We are going to write two functions, each of which uses an accumulator pattern with a loop and some comparison logic: one converts from DNA to RNA and the other converts back again.

### <span style="color:#345995">Exercise 2: Transcribing DNA to RNA</span>

Compose a function `dna2rna` which accepts a DNA string `dna` and returns a string `rna` containing the RNA strand corresponding to its DNA input.  That is, the input `'ACGT'` should return `'UGCA'`.  The function should convert any input into upper-case.

In [7]:
#grade

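def dna2rna(dna):
    dna = dna.upper()  # accept lower-case input, as the exercise requires
    rna = ''
    for symbol in dna:
        # Append the RNA complement of each DNA base.
        if symbol == 'A':
            rna += 'U'
        elif symbol == 'T':
            rna += 'A'
        elif symbol == 'G':
            rna += 'C'
        elif symbol == 'C':
            rna += 'G'
    return rna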

In [8]:
# Here are some DNA strings you can test your function with.
dna0 = 'TGCA'
rna0 = 'ACGU'
dna1 = 'TTTGTCTAGTGGGCGACTCGCCCAATAGACAACGGTTT'
rna1 = 'AAACAGAUCACCCGCUGAGCGGGUUAUCUGUUGCCAAA'
dna2 = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
rna2 = 'UCGAAAAGUAAGACUGACGUUGCCCGUUAUACAGAGACACACCUAAUUUUUUUCUCACAGACUAUCGUCG'

assert dna2rna(dna0) == rna0,"Test case dna0 failed"
print ("Success for dna0!")
assert dna2rna(dna1) == rna1,"Test case dna1 failed"
print ("Success for dna1!")
assert dna2rna(dna2) == rna2,"Test case dna2 failed"
print ("Success for dna2!")

Success for dna0!
Success for dna1!
Success for dna2!


### <span style="color:#345995">Exercise 3: Transcribing RNA to DNA</span>

Now turn things around.

Compose a function `rna2dna` which accepts an RNA string `rna` and returns a string `dna` containing the DNA strand corresponding to its RNA input.  That is, the input `'ACGU'` should return `'TGCA'`.  The function should convert any input into upper-case.

In [9]:
#grade

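def rna2dna(rna):
    rna = rna.upper()  # accept lower-case input, as the exercise requires
    dna = ''
    for symbol in rna:
        # Append the DNA complement of each RNA base.
        if symbol == 'A':
            dna += 'T'
        elif symbol == 'C':
            dna += 'G'
        elif symbol == 'G':
            dna += 'C'
        elif symbol == 'U':
            dna += 'A'
    return dna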

In [10]:
# Here are some RNA strings you can test your function with.
rna0 = 'ACGU'
dna0 = 'TGCA'
rna1 = 'AAACAGAUCACCCGCUGAGCGGGUUAUCUGUUGCCAAA'
dna1 = 'TTTGTCTAGTGGGCGACTCGCCCAATAGACAACGGTTT'
rna2 = 'UCGAAAAGUAAGACUGACGUUGCCCGUUAUACAGAGACACACCUAAUUUUUUUCUCACAGACUAUCGUCG'
dna2 = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'

assert rna2dna(rna0) == dna0,"Test case rna0 failed"
print ("Success for rna0!")
assert rna2dna(rna1) == dna1,"Test case rna1 failed"
print ("Success for rna1!")
assert rna2dna(rna2) == dna2,"Test case rna2 failed"
print ("Success for rna2!")

Success for rna0!
Success for rna1!
Success for rna2!
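
The two `if`/`elif` chains above differ only in which base maps to which. One way to see this is to move the pairings from the base table at the top of this lab into dictionaries and share a single accumulator loop. The sketch below assumes that approach; the names `DNA_TO_RNA`, `RNA_TO_DNA`, and `complement` are invented for illustration and are not part of the lab's required interface.

```python
# Pairings copied from the complementary-base table in this lab.
DNA_TO_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}
RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}

def complement(strand, table):
    """Accumulate the complement of `strand` using a lookup table."""
    result = ""
    for symbol in strand.upper():
        # Like the if/elif chains, symbols missing from the table are skipped.
        result += table.get(symbol, "")
    return result

rna = complement("tgca", DNA_TO_RNA)
print(rna)                          # ACGU
print(complement(rna, RNA_TO_DNA))  # TGCA
```

Converting to RNA and back should return the original strand in upper case, which makes a convenient extra test for your own functions.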


<div class="alert alert-danger">
Check in with your team and TA to make sure everyone understands concepts up through this point.
</div>

### Mapping RNA to Amino Acids (Codons)

At this point, you have two functions which can convert DNA and RNA from one representation to the other.  Next, we require the ability to translate each three-letter group of an RNA string into the amino acid it encodes.

One of the major functions of RNA in the body is as “messenger RNA”, which contains groups of three-letter *codons* mapping to amino acids expressed in the cell.  Thus if we find `CUU CAG` in mRNA, we anticipate that the cell will assemble leucine followed by glutamine, written `LQ`:

    'CUUCAG' → 'LQ'

or, in terms of our program, we could write

    rna2codon( 'CUU' )

which yields

    'L'
    
and so forth.

The full table of codons follows.

<table class="wikitable">
    <h4>Standard genetic code<sup><a href="https://en.wikipedia.org/wiki/Genetic_code#RNA_codon_table">RNA codon table</a></sup></h4>
<tr>
<th rowspan="2">1st<br />
base</th>
<th colspan="8">2nd base</th>
<th rowspan="2">3rd<br />
base</th>
</tr>
<tr>
<th colspan="2">U</th>
<th colspan="2">C</th>
<th colspan="2">A</th>
<th colspan="2">G</th>
</tr>
<tr>
<th rowspan="4">U</th>
<td>UUU</td>
<td rowspan="2" style="background-color:#ffe75f">(Phe/F) Phenylalanine</td>
<td>UCU</td>
<td rowspan="4" style="background-color:#b3dec0">(Ser/S) Serine</td>
<td>UAU</td>
<td rowspan="2" style="background-color:#b3dec0">(Tyr/Y) Tyrosine</td>
<td>UGU</td>
<td rowspan="2" style="background-color:#b3dec0">(Cys/C) Cysteine</td>
<th>U</th>
</tr>
<tr>
<td>UUC</td>
<td>UCC</td>
<td>UAC</td>
<td>UGC</td>
<th>C</th>
</tr>
<tr>
<td>UUA</td>
<td rowspan="6" style="background-color:#ffe75f">(Leu/L) Leucine</td>
<td>UCA</td>
<td>UAA</td>
<td style="background-color:#B0B0B0;">Stop (<i>Ochre</i>)</td>
<td>UGA</td>
<td style="background-color:#B0B0B0;">Stop (<i>Opal</i>)</td>
<th>A</th>
</tr>
<tr>
<td>UUG</td>
<td>UCG</td>
<td>UAG</td>
<td style="background-color:#B0B0B0;">Stop (<i>Amber</i>)</td>
<td>UGG</td>
<td style="background-color:#ffe75f;">(Trp/W) Tryptophan&#160;&#160;&#160;&#160;</td>
<th>G</th>
</tr>
<tr>
<th rowspan="4">C</th>
<td>CUU</td>
<td>CCU</td>
<td rowspan="4" style="background-color:#ffe75f">(Pro/P) Proline</td>
<td>CAU</td>
<td rowspan="2" style="background-color:#bbbfe0">(His/H) Histidine</td>
<td>CGU</td>
<td rowspan="4" style="background-color:#bbbfe0">(Arg/R) Arginine</td>
<th>U</th>
</tr>
<tr>
<td>CUC</td>
<td>CCC</td>
<td>CAC</td>
<td>CGC</td>
<th>C</th>
</tr>
<tr>
<td>CUA</td>
<td>CCA</td>
<td>CAA</td>
<td rowspan="2" style="background-color:#b3dec0">(Gln/Q) Glutamine</td>
<td>CGA</td>
<th>A</th>
</tr>
<tr>
<td>CUG</td>
<td>CCG</td>
<td>CAG</td>
<td>CGG</td>
<th>G</th>
</tr>
<tr>
<th rowspan="4">A</th>
<td>AUU</td>
<td rowspan="3" style="background-color:#ffe75f">(Ile/I) Isoleucine</td>
<td>ACU</td>
<td rowspan="4" style="background-color:#b3dec0">(Thr/T) Threonine&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;</td>
<td>AAU</td>
<td rowspan="2" style="background-color:#b3dec0">(Asn/N) Asparagine</td>
<td>AGU</td>
<td rowspan="2" style="background-color:#b3dec0">(Ser/S) Serine</td>
<th>U</th>
</tr>
<tr>
<td>AUC</td>
<td>ACC</td>
<td>AAC</td>
<td>AGC</td>
<th>C</th>
</tr>
<tr>
<td>AUA</td>
<td>ACA</td>
<td>AAA</td>
<td rowspan="2" style="background-color:#bbbfe0">(Lys/K) Lysine</td>
<td>AGA</td>
<td rowspan="2" style="background-color:#bbbfe0">(Arg/R) Arginine</td>
<th>A</th>
</tr>
<tr>
<td>AUG<sup class="reference" id="ref_methionineA">[A]</sup></td>
<td style="background-color:#ffe75f;">(Met/M) Methionine</td>
<td>ACG</td>
<td>AAG</td>
<td>AGG</td>
<th>G</th>
</tr>
<tr>
<th rowspan="4">G</th>
<td>GUU</td>
<td rowspan="4" style="background-color:#ffe75f">(Val/V) Valine</td>
<td>GCU</td>
<td rowspan="4" style="background-color:#ffe75f">(Ala/A) Alanine</td>
<td>GAU</td>
<td rowspan="2" style="background-color:#f8b7d3">(Asp/D) Aspartic acid</td>
<td>GGU</td>
<td rowspan="4" style="background-color:#ffe75f">(Gly/G) Glycine</td>
<th>U</th>
</tr>
<tr>
<td>GUC</td>
<td>GCC</td>
<td>GAC</td>
<td>GGC</td>
<th>C</th>
</tr>
<tr>
<td>GUA</td>
<td>GCA</td>
<td>GAA</td>
<td rowspan="2" style="background-color:#f8b7d3">(Glu/E) Glutamic acid</td>
<td>GGA</td>
<th>A</th>
</tr>
<tr>
<td>GUG</td>
<td>GCG</td>
<td>GAG</td>
<td>GGG</td>
<th>G</th>
</tr>
</table>
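
The table above is regular enough to be generated rather than typed out. The sketch below assumes the bases are ordered `U`, `C`, `A`, `G` (the order of the table's rows and columns) and lists the 64 one-letter amino acid codes in that order, with `*` marking a stop codon. The names `BASES`, `AMINO_ACIDS`, and `codon_table` are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon, grouped by first base (U, C, A, G); within each
# group the second base varies next and the third base varies fastest.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)

# product() yields ('U','U','U'), ('U','U','C'), ... in the same order.
codon_table = {
    "".join(triplet): amino
    for triplet, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

print(len(codon_table))                        # 64
print(codon_table["CUU"], codon_table["CAG"])  # L Q
```

Spot-checking a few entries against the table (for example `GAU` for aspartic acid, `D`) is a quick way to catch a transposed letter.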

### <span style="color:#345995">Exercise 4: Mapping a Triplet to a Codon</span>

Compose a function `rna2codon` which accepts a three-letter codon `triplet` and returns a string `amino` representing the corresponding amino acid per the table above.  That is, the input `'GAU'` should return `'D'`.  The function should convert any input into upper-case and should check that the codon is valid.  If the codon is invalid, the function should return the string `"Invalid"`.

We provide a dictionary `genetic_code` which you may use in composing your function.

In [38]:
#grade

def rna2codon(triplet):
    genetic_code = {
        'UUU': 'F', 'UUC': 'F', 'UUA': 'L', 'UUG': 'L',        'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L',
        'AUU': 'I', 'AUC': 'I', 'AUA': 'I', 'AUG': 'M',        'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V',

        'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S',        'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
        'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',        'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',

        'UAU': 'Y', 'UAC': 'Y', 'UAA': '*', 'UAG': '*',        'CAU': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
        'AAU': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',        'GAU': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',

        'UGU': 'C', 'UGC': 'C', 'UGA': '*', 'UGG': 'W',        'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
        'AGU': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',        'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
    }
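    # Normalize case so that lower-case input such as 'gau' is accepted.
    triplet = triplet.upper()

    # Anything that is not one of the 64 codons above is reported as invalid.
    if triplet in genetic_code:
        return genetic_code[triplet]
    else:
        return "Invalid"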

In [39]:
# Here are some RNA strings you can test your function with.
rna0 = 'UUU'
rna0_codon = 'F'
rna1 = 'AAC'
rna1_codon = 'N'
rna2 = 'UAA'
rna2_codon = '*'
rna3 = 'KIL'
rna3_codon = "Invalid"

assert rna2codon(rna0) == rna0_codon,"Test case rna0 failed"
print ("Success for rna0!")
assert rna2codon(rna1) == rna1_codon,"Test case rna1 failed"
print ("Success for rna1!")
assert rna2codon(rna2) == rna2_codon,"Test case rna2 failed"
print ("Success for rna2!")
assert rna2codon(rna3) == rna3_codon,"Test case rna3 failed"
print ("Success for rna3!")

Success for rna0!
Success for rna1!
Success for rna2!
Success for rna3!


<div class="alert alert-danger">
Check in with your team and TA to make sure everyone understands concepts up through this point.
</div>

### <span style="color:#345995">Exercise 5: Mapping a String of Triplets to Codons</span>

Compose a function `rna2codons` which accepts a string of three-letter codons `triplets` and returns a string `amino` representing the sequence of corresponding amino acids per the table above.  That is, the input `'GAUUAUUCC'` should return `'DYS'`.  The function should convert any input into upper-case and should check that each codon is valid.

The tricky part is figuring out how to chop a string into three-letter chunks.  (This is harder than it seems at first.)  There are many ways to do this.  One possibility, which silently drops any trailing characters that do not fill a complete chunk:

In [13]:
example_string = 'abcdefghijklmnopqrstuvwxyz'
for i in range( len( example_string ) // 3 ):
    print( example_string[ 3*i:3*i+3 ] )

abc
def
ghi
jkl
mno
pqr
stu
vwx


In [49]:
#grade

def rna2codons(triplets):
    result = ""
    # Integer division drops any trailing bases that do not form a complete codon.
    for i in range( len( triplets ) // 3 ):
        codon = triplets[3*i:3*i+3].upper()
        result += rna2codon(codon)
    return result

In [50]:
# Here are some RNA strings you can test your function with.
rna0 = 'UUUAGC'
rna0_codon = 'FS'
rna1 = 'AACUGGAGG'
rna1_codon = 'NWR'
rna2 = 'GAGCAAAGUUAA'
rna2_codon = 'EQS*'

assert rna2codons(rna0) == rna0_codon,"Test case rna0 failed"
print ("Success for rna0!")
assert rna2codons(rna1) == rna1_codon,"Test case rna1 failed"
print ("Success for rna1!")
assert rna2codons(rna2) == rna2_codon,"Test case rna2 failed"
print ("Success for rna2!")

Success for rna0!
Success for rna1!
Success for rna2!


<div class="alert alert-danger">
Check in with your team and TA to make sure everyone understands concepts up through this point.
</div>

### <span style="color:#345995">Exercise 6: Put the Pipeline Together</span>

Finally, we are interested in taking a string of DNA sequence data, transcribing its RNA complement, and translating the resulting RNA to amino acids.  This requires that you:

1.  Convert the string from DNA to RNA.  (Which function does this?)
2.  Convert the RNA string to its corresponding protein expression string.  (Which function does this?)
3.  Return the resulting string.

Compose a function `dna2codons` which accepts a string `dna` and returns a string `codons` representing the sequence of corresponding amino acids per the table above.  The function should convert any input into upper-case and should check that each codon is valid.

In [34]:
#grade
def dna2codons(dnaString):
    # Convert the string from DNA to RNA. (Which function does this?)
    def dna2rna(dna):
        rna = ''
        for symbol in dna:
            if symbol == 'A':
                rna = rna + 'U'
            elif symbol == 'T':
                rna += 'A'
            elif symbol == 'G':
                rna += 'C'
            elif symbol == 'C':
                rna += 'G'
        return rna
    
    # Convert the RNA string to its corresponding protein expression string. (Which function does this?)
    def rna2codons(triplets):
        genetic_code = {
        'UUU': 'F', 'UUC': 'F', 'UUA': 'L', 'UUG': 'L',        'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L',
        'AUU': 'I', 'AUC': 'I', 'AUA': 'I', 'AUG': 'M',        'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V',

        'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S',        'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
        'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',        'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',

        'UAU': 'Y', 'UAC': 'Y', 'UAA': '*', 'UAG': '*',        'CAU': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
        'AAU': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',        'GAU': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',

        'UGU': 'C', 'UGC': 'C', 'UGA': '*', 'UGG': 'W',        'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
        'AGU': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',        'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
    }
        tripletsList = []
        newCodon =''
    
        for i in range(0,int(len(triplets) / 3)):
            tripletsList.append(triplets[3*i:3*i+3].upper())
    
        for i in range (len(tripletsList)):
            if tripletsList[i] in genetic_code:
                newCodon += genetic_code[tripletsList[i]]
            else:
                return "Invalid"
        return newCodon
    
    # Return the resulting string.
    return rna2codons(dna2rna(dnaString))

In [35]:
# Here are some DNA strings (stored in the rna* variables below) you can test your function with.
rna0 = 'TTTGTCTAGTGGGCGACTCGCCCAATAGACAACGGTTT'
rna0_codon = 'KQITR*AGYLLP'
rna1 = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
rna1_codon = 'SKSKTDVARYTETHLIFFSQTIV'
rna2 = 'TGCA'
rna2_codon = 'T'

assert dna2codons(rna0) == rna0_codon,"Test case rna0 failed"
print ("Success for rna0!")
assert dna2codons(rna1) == rna1_codon,"Test case rna1 failed"
print ("Success for rna1!")
assert dna2codons(rna2) == rna2_codon,"Test case rna2 failed"
print ("Success for rna2!")

Success for rna0!
Success for rna1!
Success for rna2!


<div class="alert alert-danger">
Check in with your team and TA to make sure everyone understands concepts up through this point.
</div>

##  The Project Narrative (A Teaser)

The woolly mammoth _Mammuthus primigenius_ reigned over the [steppes](https://en.wikipedia.org/wiki/Mammoth_steppe) of the Northern Hemisphere for hundreds of thousands of years until its final extinction [around 2000 B.C.](https://en.wikipedia.org/wiki/Wrangel_Island#First_human_settlements_and_the_extinction_of_the_woolly_mammoth)  The shock of the end of the megafauna has been speculated to be the inflection point leading to the development of agriculture and civilization.

![Rouffignac Cave possesses striking cave art images of the woolly mammoth.](https://www.bradshawfoundation.com/sn/rouffignac3.jpg)

- [A Kabil, "Could Reviving the Woolly Mammoth Help Solve Climate Change?" (The Long Now Foundation)](https://blog.longnow.org/02017/03/28/reviving-woolly-mammoth-solve-climate-change/)
- ["Restoring the Ice Age Mammoth Steppe to Beat Climate Change" (Palladium Podcast)](https://palladiummag.com/2020/03/02/palladium-podcast-27-restoring-the-ice-age-mammoth-steppe-to-beat-climate-change/)
- [C Ciaccia, "Woolly mammoth cells brought back to life in shocking scientific achievement"](https://www.foxnews.com/science/woolly-mammoth-cells-brought-back-to-life-in-shocking-scientific-achievement)
- [N R Longrich, "How the extinction of ice age mammals may have forced us to invent civilisation"](https://theconversation.com/how-the-extinction-of-ice-age-mammals-may-have-forced-us-to-invent-civilisation-128799)

Since some estimates hold there to be "10 million mammoths … still frozen in Siberia," and [DNA has a half-life of around 500 years](https://www.nature.com/news/dna-has-a-521-year-half-life-1.11555), it is plausible that sufficient genetic material survives to allow the mammoth to be revived.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/35/%D0%9E%D0%B7%D0%B5%D1%80%D0%BE_%D0%94%D1%83%D1%81-%D0%A5%D0%BE%D0%BB%D1%8C_%D0%B2%D0%B5%D1%87%D0%B5%D1%80%D0%BE%D0%BC._%D0%A2%D0%B5%D1%81-%D0%A5%D0%B5%D0%BC%D1%81%D0%BA%D0%B8%D0%B9_%D0%BA%D0%BE%D0%B6%D1%83%D1%83%D0%BD.jpg/1024px-%D0%9E%D0%B7%D0%B5%D1%80%D0%BE_%D0%94%D1%83%D1%81-%D0%A5%D0%BE%D0%BB%D1%8C_%D0%B2%D0%B5%D1%87%D0%B5%D1%80%D0%BE%D0%BC._%D0%A2%D0%B5%D1%81-%D0%A5%D0%B5%D0%BC%D1%81%D0%BA%D0%B8%D0%B9_%D0%BA%D0%BE%D0%B6%D1%83%D1%83%D0%BD.jpg)

As a scientist working for PaleoGen, you have been tasked with processing mammoth DNA data after it has been extracted and sequenced into a computer format.  You need to prepare a number of standard tools for parsing and processing DNA data, after which point you will work on reconstructing the mammoth genome.

- ["Revival of the woolly mammoth" (Wikipedia)](https://en.wikipedia.org/wiki/Revival_of_the_woolly_mammoth)
- [E Palkopoulou et al., "Complete Genomes Reveal Signatures of Demographic and Genetic Declines in the Woolly Mammoth"](https://www.cell.com/current-biology/fulltext/S0960-9822(15)00420-0)
- [N Rohland et al., "Genomic DNA Sequences from Mastodon and Woolly Mammoth Reveal Deep Speciation of Forest and Savanna Elephants"](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1000564)
- [The Mammoth Genome Project (Penn State)](http://mammoth.psu.edu/)
- [D H Mann et al., "Ice-age megafauna in Arctic Alaska: extinction, invasion, survival"](https://www.sciencedirect.com/science/article/abs/pii/S0277379113001200)

The project will consist of several milestones and a final report.  These will be made available to you on the main course page.