# Molecular Bio Intro
The nucleus is considered the hub of cellular activity. **Chromatin** is the mass of macromolecules that fills it, and it condenses into chromosomes during mitosis. One class of these macromolecules is the **nucleic acids**.\
Nucleic acids are **polymers**: long chains of smaller, similarly structured molecules called **monomers**. Because these chains are long and thin, they are called strands.\
A nucleic acid monomer is a **nucleotide**, and the nucleotide also serves as the unit of strand length (nt).\
Each nucleotide has three parts: a sugar molecule, a negatively charged ion (**phosphate**), and a compound called a **nucleobase** (base).\
**Polymerization** is when the sugar of one nucleotide bonds to the phosphate of the next nucleotide in the chain. This forms a sugar-phosphate backbone for the nucleic acid strand.\
Within a given type of nucleic acid, every nucleotide contains the same sugar and phosphate; nucleotides differ only in their bases. This means strands can be told apart by the order of their bases, which is their **primary structure**.\
In DNA, the four bases are adenine (A), cytosine (C), guanine (G), and thymine (T). A genome is the total DNA contained in an organism's chromosomes.
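
Since a strand's primary structure is just its sequence of bases, a natural representation (and the one the problems below assume) is a plain Python string over A, C, G, T. In this minimal sketch, length in nt is simply the string length, and two strands with the same base composition are still different if the order differs. The helper name `is_dna` is hypothetical.

```python
DNA_BASES = set("ACGT")

def is_dna(strand: str) -> bool:
    """Hypothetical helper: True if the strand is non-empty and uses only A, C, G, T."""
    return len(strand) > 0 and set(strand) <= DNA_BASES

strand_a = "GATTACA"
strand_b = "GATTCAA"  # same bases, different order

print(is_dna(strand_a), len(strand_a))       # True 7  (length in nt)
print(sorted(strand_a) == sorted(strand_b))  # True    (same base composition)
print(strand_a == strand_b)                  # False   (different primary structure)
```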

## Problem 1
Given a DNA string *s* of length at most 1000 nt over the alphabet A, C, G, T, return four integers counting how many times A, C, G, and T (in that order) occur in *s*.

```python
# O(N) solution: one pass over the string; the elif chain stops comparing once a symbol matches
def nuc_count(sample):
    a = c = g = t = 0
    for nucleotide in sample:
        if nucleotide == 'A':
            a += 1
        elif nucleotide == 'C':
            c += 1
        elif nucleotide == 'G':
            g += 1
        elif nucleotide == 'T':
            t += 1
    print(f'{a} {c} {g} {t}')

# More elegant Python solution; str.count() is implemented in C, so it is fast
# even though it scans the string once per symbol
def fast_count(s):
    return s.count("A"), s.count("C"), s.count("G"), s.count("T")

# Even less code, but arguably less readable: map count() over "ACGT" and unpack into print()
def short_count(s):
    print(*map(s.count, "ACGT"))

sample = 'GACATGTAACTTCTAGGATTCGAAGCGGCCTAAGGACGTGTTGACGCGGGTGTTATTATAATGTAGTCTGGGTGATTCAAGAAACCGCAGTATTACTGGTAACGACGCACCCGTAAAGGTCGGCTATTTTTGACTGCCCTATCGCATCAGCGCCCTGCAGACTGTAGGGATAGCTCAGGCGACGCAGGTTCTTTAGTTGCTTCGGTGGCGTGAGGGAGGTGCTAGAGGTGGTCCGCCTTTCAATGTATTGTACGCCTAGCAGCGTTATCCAACGGGACGATGTAAAGATGAATAAACCCGTGAATCTATTTGTAGATGGCATTCGACCCGAACATGGTTAACTGAGTATATAGACTCATACTCCAGGTTAGCGAAATTTTCAACGGCTGCGGCCATTCCAAGGGCTAAGTCCCGTCCGGAGTTCGTGCACTTAAGGGGCTATGCAGGGCTATATCTGATTGTGATATACATTTGTTTGGGCAGCTGAAGTTAAGCGCTATTATGCGAATCCAGCAAATGGCGTTCATTTAGTCGGCCAGGTGAAGATACTTAAGGCAGCGACCACGTCTCCCCATGGGGGAGATGGGGGAAACACGGTTACACTTTAGTACGTCGGCATCTTCTGGCACCTAAAGCTAAGAAGTTGTGTGCTTTAGACCGATGGCTGCAAACTCACGTGGCTTGGTCCGAGTAAAACCCATTACGTGGAGAACCGTCGTTGGTGTACCTCCAGCCCGGCTGGCCGGGGTAGAACTTCCTCCGGGTACAGCGGTCATACCGTTCCGGCAAAGCTTGGGGGCTAGCGAAAAGCAACTGGCTAGCAGCTGTATCTTTTTACATCTCCCTCCATCGCGCAGGGCATCATAGATCAATCAGTTCGTATAGATTCAGATTTAGGATACGACGAGCTATTGCTCCCCTGGGTACAGCAAC'
nuc_count(sample)
print(fast_count(sample))
short_count(sample)
```

225 214 256 238
(225, 214, 256, 238)
225 214 256 238
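
Another option, sketched here with the standard library's `collections.Counter`, tallies all four symbols in a single pass and then reads the counts back in A, C, G, T order. `Counter` returns 0 for a symbol that never occurs, so a string missing a base still yields four integers. The function name `counter_count` is illustrative only.

```python
from collections import Counter

def counter_count(s: str) -> tuple[int, int, int, int]:
    """Count A, C, G, T (in that order) with one pass over s."""
    tally = Counter(s)  # maps each symbol to how often it appears
    return tuple(tally[base] for base in "ACGT")  # bases that never appear count as 0

print(*counter_count("AGCTTTTCATTCTGACTGCA"))  # 4 5 3 8
print(counter_count("AAA"))                    # (3, 0, 0, 0)
```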


# The Other Nucleic Acid
The other nucleic acid, **RNA** (ribonucleic acid), uses a different sugar (ribose instead of deoxyribose) and has a base called **uracil** in place of thymine.\
It was initially thought that RNA occurred only in plants and DNA only in animals, but both are present in all cellular life.\
DNA and RNA have such similar primary structures because DNA serves as a blueprint for **messenger RNA** (mRNA), which is created through **RNA transcription**: the mRNA copies the DNA coding strand with each thymine replaced by uracil (T -> U).\

## Problem 2
Given a DNA string of length < 1000 nt over A, C, G, T, return the transcribed RNA string over A, C, G, U (every T replaced by U).

```python
# Basic solution: build the RNA string one character at a time
def rna_transcribe(dna):
    rna = ''
    for nuc in dna:
        if nuc == 'T':
            nuc = 'U'
        rna += nuc
    return rna

# Pythonic: str.replace() swaps every T for U in a single call
def quick_trans(dna):
    return dna.replace('T', 'U')

# Test rna_transcribe() against a known answer
def test(sample, expected):
    result = rna_transcribe(sample)
    print(result)
    if result == expected:
        print('Correct! Passed test!')
    else:
        print('Failed test.')

sample = 'GATGGAACTTGACTACGTAAATT'
expected = 'GAUGGAACUUGACUACGUAAAUU'
test(sample, expected)

sample2 = 'TGATCACACAGGCATCACACAGTATACATCCCTAATTAGAGGGCCATTCCACTGCCCAGTTGAAGACCTTTTTGCGCGCCAAGAGGAAAGCTGCGCACAAGCCGCGATACACAACATACATACGTACAGGGGTTGAGAAAGCGACGGACGTGGGCCACCGTGTCGTGGGACCAGTCGAACTGATCAATGATGTTATCCTCCTACATTGTATCATCGAGACGCATAACCGGTGCCAGGGGAAGAAAATAGTATGGTGAATGAACGCCGACGGAGGTGTCACTTGAGGATAACAGGTTGACTGGTTTATCTTGATATTTCCTACGGTATGCTCGGTAACATTCTCATCAGACGCTGTAGGCAGTGGTCGTAATCCAGCCTCGTGAAGCATTCAAGCTAATCCATTCCACCCGCCTCTGGCGGAGGGATCAGTTAATTATCGTGTCGTCTCGCCATGAGGAATGGCTAGGACTCAATGCAGTACGAAGTCCACGAAGTAAATAAACATCGGCACCGGATGAGACGTCGCATGCAATGCCGAAATGGGCATACGCCGAGGTAAGATCTGAGTTGGTATAACGGATGCCGGCCATCCCATACATCCATCAGAAAGACGTCAGGCCGGCAGGTGACACGTTGGGGGAGGGGGGGCCCGGCATGAATTGTCGTAGAGGACTTTGACCACAGGATTGGTCTAACGGCTTTGAACTGAACTCGTGTTGACATCCTCTCCTGACATAATTGTTTAGATCGTATCCTCGTCACTCACGATACTATGAAATCTCACTTAGCAACACGTCCGCCCCTTGGACATACCCACCGGGCTCCTAGAACGGGTCTTGGTTTTGCCCGCCAGTACACCGTCGTAAGCTGGATCGCCAATTCACCGGGTCCTTCTTAAACCTTTTAGCGCACGTCCGTGGGCCTAT'
print("\n" + rna_transcribe(sample2))
print("\n" + quick_trans(sample2))
```

GAUGGAACUUGACUACGUAAAUU
Correct! Passed test!

UGAUCACACAGGCAUCACACAGUAUACAUCCCUAAUUAGAGGGCCAUUCCACUGCCCAGUUGAAGACCUUUUUGCGCGCCAAGAGGAAAGCUGCGCACAAGCCGCGAUACACAACAUACAUACGUACAGGGGUUGAGAAAGCGACGGACGUGGGCCACCGUGUCGUGGGACCAGUCGAACUGAUCAAUGAUGUUAUCCUCCUACAUUGUAUCAUCGAGACGCAUAACCGGUGCCAGGGGAAGAAAAUAGUAUGGUGAAUGAACGCCGACGGAGGUGUCACUUGAGGAUAACAGGUUGACUGGUUUAUCUUGAUAUUUCCUACGGUAUGCUCGGUAACAUUCUCAUCAGACGCUGUAGGCAGUGGUCGUAAUCCAGCCUCGUGAAGCAUUCAAGCUAAUCCAUUCCACCCGCCUCUGGCGGAGGGAUCAGUUAAUUAUCGUGUCGUCUCGCCAUGAGGAAUGGCUAGGACUCAAUGCAGUACGAAGUCCACGAAGUAAAUAAACAUCGGCACCGGAUGAGACGUCGCAUGCAAUGCCGAAAUGGGCAUACGCCGAGGUAAGAUCUGAGUUGGUAUAACGGAUGCCGGCCAUCCCAUACAUCCAUCAGAAAGACGUCAGGCCGGCAGGUGACACGUUGGGGGAGGGGGGGCCCGGCAUGAAUUGUCGUAGAGGACUUUGACCACAGGAUUGGUCUAACGGCUUUGAACUGAACUCGUGUUGACAUCCUCUCCUGACAUAAUUGUUUAGAUCGUAUCCUCGUCACUCACGAUACUAUGAAAUCUCACUUAGCAACACGUCCGCCCCUUGGACAUACCCACCGGGCUCCUAGAACGGGUCUUGGUUUUGCCCGCCAGUACACCGUCGUAAGCUGGAUCGCCAAUUCACCGGGUCCUUCUUAAACCUUUUAGCGCACGUCCGUGGGCCUAU

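UGAUCACACAGGCAUCACACAGUAUACAUCCCUAAUUAGAGGGCCAUUCCACUGCCCAGUUGAAGACCUUUUUGCGCGCCAAGAGGAAAGCUGCGCACAAGCCGCGAUACACAACAUACAUACGUACAGGGGUUGAGAAAGCGACGGACGUGGGCCACCGUGUCGUGGGACCAGUCGAACUGAUCAAUGAUGUUAUCCUCCUACAUUGUAUCAUCGAGACGCAUAACCGGUGCCAGGGGAAGAAAAUAGUAUGGUGAAUGAACGCCGACGGAGGUGUCACUUGAGGAUAACAGGUUGACUGGUUUAUCUUGAUAUUUCCUACGGUAUGCUCGGUAACAUUCUCAUCAGACGCUGUAGGCAGUGGUCGUAAUCCAGCCUCGUGAAGCAUUCAAGCUAAUCCAUUCCACCCGCCUCUGGCGGAGGGAUCAGUUAAUUAUCGUGUCGUCUCGCCAUGAGGAAUGGCUAGGACUCAAUGCAGUACGAAGUCCACGAAGUAAAUAAACAUCGGCACCGGAUGAGACGUCGCAUGCAAUGCCGAAAUGGGCAUACGCCGAGGUAAGAUCUGAGUUGGUAUAACGGAUGCCGGCCAUCCCAUACAUCCAUCAGAAAGACGUCAGGCCGGCAGGUGACACGUUGGGGGAGGGGGGGCCCGGCAUGAAUUGUCGUAGAGGACUUUGACCACAGGAUUGGUCUAACGGCUUUGAACUGAACUCGUGUUGACAUCCUCUCCUGACAUAAUUGUUUAGAUCGUAUCCUCGUCACUCACGAUACUAUGAAAUCUCACUUAGCAACACGUCCGCCCCUUGGACAUACCCACCGGGCUCCUAGAACGGGUCUUGGUUUUGCCCGCCAGUACACCGUCGUAAGCUGGAUCGCCAAUUCACCGGGUCCUUCUUAAACCUUUUAGCGCACGUCCGUGGGCCUAU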
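
To gain some confidence that the loop-based and `replace()`-based transcriptions are interchangeable, one hedged approach is a quick randomized comparison: generate random DNA strings, transcribe each one several ways, and check that the results match. The sketch below defines its own functions so it runs standalone, adds a third variant using `str.translate()`, and uses an arbitrary seed and arbitrary string lengths.

```python
import random

def transcribe_loop(dna: str) -> str:
    # Character-by-character transcription, written with a generator and join()
    return ''.join('U' if nuc == 'T' else nuc for nuc in dna)

def transcribe_replace(dna: str) -> str:
    return dna.replace('T', 'U')

T_TO_U = str.maketrans('T', 'U')  # translation table: only T changes

def transcribe_translate(dna: str) -> str:
    return dna.translate(T_TO_U)

random.seed(0)  # arbitrary seed so the check is repeatable
for _ in range(1000):
    dna = ''.join(random.choices('ACGT', k=random.randint(0, 50)))
    rna = transcribe_replace(dna)
    assert transcribe_loop(dna) == rna == transcribe_translate(dna)
    assert 'T' not in rna and len(rna) == len(dna)
print('All transcription variants agree.')
```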

# The Secondary and Tertiary Structures of DNA
The primary structure alone tells you nothing about the 3D structure of DNA. Watson and Crick's 1953 paper, which relied on X-ray diffraction data from Rosalind Franklin and Raymond Gosling, presented a structure:
1. The DNA molecule is made of 2 strands running in opposite directions.
2. Each base bonds to a base in the opposite strand: adenine to thymine and cytosine to guanine. The complement of a base is the base it always bonds to.
3. The two strands are twisted into a double helix.

Points 1 and 2 make up the **secondary structure** of DNA, and point 3 is its **tertiary structure**.
The bonding of two complementary bases is called a **base pair**. DNA length is usually given in base pairs (bp) rather than nt.\
Because of this complementarity, once you know the bases of one strand you can deduce the other.\
Since the strands run in opposite directions, the bases of the second strand pair up with the first strand in reverse order, so the second strand is the **reverse complement** of the first.
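
Complementarity plus opposite directions can be checked in a few lines. This sketch assumes each strand is written as a string in its own reading direction; the name `can_pair` is hypothetical. Position *i* of the first strand faces the *i*-th base from the end of the second, so the second strand is walked backwards while comparing.

```python
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

def can_pair(strand1: str, strand2: str) -> bool:
    """Hypothetical helper: True if strand2, read backwards, bonds base-for-base with strand1."""
    if len(strand1) != len(strand2):
        return False
    # Antiparallel strands: compare strand1 forwards against strand2 backwards
    return all(COMPLEMENT[a] == b for a, b in zip(strand1, reversed(strand2)))

print(can_pair("AACG", "CGTT"))  # True: CGTT is the reverse complement of AACG
print(can_pair("AACG", "TTGC"))  # False: TTGC is only the complement, not reversed
```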

## Problem 3
The complement of A is T and the complement of C is G (and vice versa).\
Given a DNA string *s* of length <= 1000 bp, return its reverse complement: reverse *s*, then replace each base with its complement.

```python
# Reverse first, then complement each base
def rev_com(strand):
    strand = strand[::-1]  # extended slice with step -1 reverses the string
    result = ''
    for c in strand:
        if c == 'A':
            c = 'T'
        elif c == 'T':
            c = 'A'
        elif c == 'C':
            c = 'G'
        elif c == 'G':
            c = 'C'
        result += c
    return result

# Clever workaround to use replace()
def rev_com_workaround(strand):
    # Replacing with lowercase placeholders keeps later replace() calls from swapping
    # a base back (the 't' produced from 'A' is untouched by the 'T' -> 'a' step);
    # upper() then restores uppercase before the string is reversed
    return strand.replace('A', 't').replace('T', 'a').replace('C', 'g').replace('G', 'c').upper()[::-1]

# Translation method: str.maketrans() builds a per-character mapping and translate() applies it in one pass
def rcom_translate(strand):
    return strand[::-1].translate(str.maketrans('ACGT', 'TGCA'))

sample = 'CAGTAATCAGCGTCTGCATGGCTAGCCCACGACTCGTCCGGTTGACGTCAAG'
print(rev_com(sample))
print('\n')
print(rev_com_workaround(sample))
print('\n')
print(rcom_translate(sample))
```

CTTGACGTCAACCGGACGAGTCGTGGGCTAGCCATGCAGACGCTGATTACTG


CTTGACGTCAACCGGACGAGTCGTGGGCTAGCCATGCAGACGCTGATTACTG


CTTGACGTCAACCGGACGAGTCGTGGGCTAGCCATGCAGACGCTGATTACTG
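
One property worth checking, sketched here under the assumption that inputs contain only A, C, G, T: taking the reverse complement twice returns the original strand, and the reverse complement has as many A's as the original has T's (and as many C's as it has G's). A randomized check like this one, with hypothetical names and an arbitrary seed, can catch a wrong mapping table such as a typo in the `maketrans` strings.

```python
import random

RC_TABLE = str.maketrans('ACGT', 'TGCA')

def reverse_complement(strand: str) -> str:
    # Reverse with a negative-step slice, then complement every base in one pass
    return strand[::-1].translate(RC_TABLE)

random.seed(1)  # arbitrary seed so the check is repeatable
for _ in range(1000):
    s = ''.join(random.choices('ACGT', k=random.randint(0, 60)))
    rc = reverse_complement(s)
    assert reverse_complement(rc) == s  # applying it twice is the identity
    assert rc.count('A') == s.count('T') and rc.count('C') == s.count('G')
print('Reverse complement checks passed.')
```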


# Rabbits and Recurrence Relations
Fibonacci's rabbits:
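1. The population begins in the first month with a pair of newborn rabbits.
2. Rabbits reach reproductive age after one month.
3. Every rabbit of reproductive age reproduces each month.
4. Exactly one month after two rabbits mate, they produce one male and one female rabbit.
5. Rabbits never die or stop reproducing.

How many pairs of rabbits will there be after one year?\
144 pairs: the 12th term of the sequence 1, 1, 2, 3, 5, 8, 13, ... That 144 also equals 12<sup>2</sup> is a coincidence; the count does not grow as the number of months squared (after 10 months there are 55 pairs, not 100).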
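
The rules can also be simulated directly, which is a useful sanity check on the answer. This sketch models the rules (an assumption about one reasonable encoding, not the only one) by tracking newborn and mature pairs separately: each month, every mature pair produces one newborn pair, and last month's newborns become mature.

```python
def simulate_pairs(months: int) -> int:
    """Total rabbit pairs alive in the given month, starting from one newborn pair in month 1."""
    newborn, mature = 1, 0  # month 1: a single newborn pair
    for _ in range(months - 1):
        # Every mature pair produces one newborn pair; last month's newborns mature
        newborn, mature = mature, mature + newborn
    return newborn + mature

print([simulate_pairs(m) for m in range(1, 13)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
```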

A **recurrence relation** defines each term of a sequence in terms of the previous terms.\
For the rabbits, each month's population is the population alive in the previous month plus any new offspring.\
Because rabbits need a month to mature, the number of new offspring pairs in a given month equals the number of pairs alive two months prior.\
If n is the month and F<sub>n</sub> is the number of rabbit pairs alive in month n, then:\
F<sub>n</sub> = F<sub>n-1</sub> + F<sub>n-2</sub>\
To find the nth term, apply the recurrence to compute progressively larger values of n, reusing the terms already computed. This reuse of earlier results is the core idea of dynamic programming.
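
As a minimal bottom-up sketch of that idea (a list-based table is one layout among several), each F<sub>n</sub> is stored as it is computed, so every term is built once from the two terms before it.

```python
def fib_table(n: int) -> int:
    """Return F_n with F_1 = F_2 = 1, filling a table from the bottom up."""
    if n < 1:
        raise ValueError("n must be at least 1")
    table = [0, 1, 1]  # index 0 unused so that table[i] holds F_i
    for i in range(3, n + 1):
        table.append(table[i - 1] + table[i - 2])  # F_i = F_(i-1) + F_(i-2)
    return table[n]

print(fib_table(12))  # 144
```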

## Problem 4
Given positive integers *n* <= 40 and *k* <= 5,\
return the total number of rabbit pairs present after *n* months if we begin with 1 pair and, in each generation, every pair of reproductive-age rabbits produces a litter of *k* rabbit pairs (instead of 1). The recurrence becomes F<sub>n</sub> = F<sub>n-1</sub> + *k* F<sub>n-2</sub>.

```python
# %time is an IPython/Jupyter magic, so this cell runs in a notebook rather than plain Python

# While this recursive function is correct, it recomputes the same terms
# over and over, so its run time grows exponentially with n
def rabbit_pairs(n, k):
    # Base cases: F(1) = F(2) = 1
    if n == 1 or n == 2:
        return 1
    # Pairs alive last month, plus k new pairs for every pair alive two months ago
    return rabbit_pairs(n - 1, k) + rabbit_pairs(n - 2, k) * k
%time print(rabbit_pairs(31, 3))

# This O(n) iterative version is far faster: it keeps only the last two terms
def fib(n, k):
    prev1, prev2 = 1, 1
    for _ in range(2, n):
        prev1, prev2 = prev2, prev1 * k + prev2
    return prev2
%time print(fib(999, 5))

# With memoization (top-down dynamic programming) it is also fast.
# A dict stores the values computed so far, keyed by (n, k).
# Recursion depth grows with n, so large n may approach Python's default recursion limit (1000).
memo = {}
def fib_memo(n, k=1):
    args = (n, k)
    if args in memo:
        return memo[args]  # reuse a value already computed
    # otherwise compute it
    if n == 1 or n == 2:
        result = 1
    else:
        result = fib_memo(n - 1, k) + fib_memo(n - 2, k) * k
    memo[args] = result
    return result
%time print(fib_memo(999, 5))
```


47079164257
Wall time: 227 ms
4985520533788572478138143712304922574570812175459011698153379704472857147513663532589229883352870356149317474413074766280352143733958723915936843525354628514731874477707187606559407670300699587924788164858517177453400785714027160603614426654098138223865644305772844686501550883664134184441045219032877706833791083662353110109100620310415665170716017573936522098591485760232995174328244658446641692634964888170335809397508875359545007921169726486
Wall time: 0 ns
4985520533788572478138143712304922574570812175459011698153379704472857147513663532589229883352870356149317474413074766280352143733958723915936843525354628514731874477707187606559407670300699587924788164858517177453400785714027160603614426654098138223865644305772844686501550883664134184441045219032877706833791083662353110109100620310415665170716017573936522098591485760232995174328244658446641692634964888170335809397508875359545007921169726486
Wall time: 1.03 ms
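
The hand-written `memo` dict can also be replaced by the standard library's `functools.lru_cache` decorator, which caches results keyed by the call arguments. This is a sketch of that alternative with an illustrative function name; like `fib_memo` it is still recursive, so very large *n* may hit Python's recursion limit, and the checks stay within the problem's bound of *n* <= 40.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache: every (n, k) result is kept
def rabbit_pairs_cached(n: int, k: int) -> int:
    """Pairs after n months when each reproductive-age pair has a litter of k pairs."""
    if n <= 2:
        return 1  # F_1 = F_2 = 1
    return rabbit_pairs_cached(n - 1, k) + k * rabbit_pairs_cached(n - 2, k)

print(rabbit_pairs_cached(5, 3))  # 19
```

With *k* = 1 this reduces to the ordinary Fibonacci recurrence, so `rabbit_pairs_cached(12, 1)` gives 144.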
