# DNA Sequence 101

Understanding DNA Sequences with Python

## Objectives

In this lesson, we'll:
- Learn what DNA is and what it's made of
- Represent DNA sequences in Python
- Implement basic operations (length, counting nucleotide bases, complement, reverse complement)
- Build a `DNASequence` Python class

## Deoxyribonucleic Acid (DNA)

DNA (deoxyribonucleic acid) is built from nucleotides, each carrying one of **four bases**:
- `A`: Adenine
- `T`: Thymine
- `G`: Guanine
- `C`: Cytosine

Each base pairs with its **complement**:

|Base|Complement|
|---|---|
|A|T|
|T|A|
|G|C|
|C|G|

So a DNA strand like `A T G C` pairs with `T A C G`.
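
The pairing table above maps naturally onto a dictionary lookup. The following is a minimal sketch rather than the lesson's final implementation; the names `COMPLEMENT`, `complement`, and `reverse_complement` are assumptions introduced here for illustration.

```python
# Pairing rules taken from the table above (the name is illustrative)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the complementary strand, base by base."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq: str) -> str:
    """Return the complement, read in the opposite direction."""
    return complement(seq)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying `complement` twice returns the original strand, which makes for a quick sanity check. A character outside the four bases raises a `KeyError` in this sketch.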

DNA sequences are written as strings of these letters:
`ATGCGTACGTTAGC`

In [1]:
# Since DNA is just a sequence of these nucleotides, it can be represented
# as a Python string (an immutable sequence of characters)
dna_sequence = "ATGCGTACGTTAGC"

print("DNA Sequence:", dna_sequence)
print("Length:", len(dna_sequence))
print("Bases:", set(dna_sequence))

DNA Sequence: ATGCGTACGTTAGC
Length: 14
Bases: {'A', 'C', 'G', 'T'}
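
Because a string is a sequence of characters, the usual sequence operations apply to DNA as well. The snippet below is a small illustrative sketch of indexing, slicing, and reversing; the variable `mutated` is a name introduced here only for the example.

```python
dna_sequence = "ATGCGTACGTTAGC"

print(dna_sequence[0])     # first base: A
print(dna_sequence[-1])    # last base: C
print(dna_sequence[:3])    # first three bases: ATG
print(dna_sequence[::-1])  # reversed: CGATTGCATGCGTA

# Strings are immutable, so item assignment raises a TypeError
try:
    dna_sequence[0] = "G"
except TypeError as err:
    print("Cannot modify in place:", err)

# To "change" a base, build a new string instead
mutated = "G" + dna_sequence[1:]
print(mutated)             # GTGCGTACGTTAGC
```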


In [2]:
BASES = ['A', 'T', 'G', 'C']

## Counting Nucleotides

In [3]:
def count_nucleotides(seq: str) -> dict[str, int]:
    """Count how many times each base occurs in the given DNA sequence"""
    counts = { base: seq.count(base) for base in BASES }
    return counts

In [4]:
count_nucleotides('ATGCGTACGTTAGC')

{'A': 3, 'T': 4, 'G': 4, 'C': 3}
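
Calling `seq.count` once per base scans the sequence four times. For long sequences, one possible alternative is a single pass with `collections.Counter` from the standard library. This is a sketch of that variant, not a replacement for the function above; the name `count_nucleotides_single_pass` is hypothetical.

```python
from collections import Counter

BASES = ["A", "T", "G", "C"]


def count_nucleotides_single_pass(seq: str) -> dict[str, int]:
    """Count each base with one pass over the sequence."""
    tally = Counter(seq)  # counts every character that appears
    # A Counter returns 0 for missing keys, so absent bases are reported as 0
    return {base: tally[base] for base in BASES}


print(count_nucleotides_single_pass("ATGCGTACGTTAGC"))
# {'A': 3, 'T': 4, 'G': 4, 'C': 3}
```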

## Generating Random DNA Sequences

For our purposes, a string is a valid DNA sequence as long as it contains only these four nucleotides; they can appear in any order. Since we will need a lot of DNA sequences in the following lessons, let's create a simple helper function to generate random ones.

In [5]:
# We'll need to import the `random` module from the Python standard library to be able to select bases randomly
import random

In [6]:
def generate_random_sequence(length: int = 12) -> str:
    """Generates a random DNA sequence of given length"""
    seq = [random.choice(BASES) for _ in range(length)]
    return "".join(seq)

In [7]:
# Generate a random DNA sequence of length 12 (default)
seq = generate_random_sequence()
seq

'TCCCCATCGTGC'
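
Random output changes on every run, which can make examples hard to reproduce. One option, sketched below under the assumption that repeatable output is wanted, is to draw from a seeded `random.Random` instance. The helper names `generate_seeded_sequence` and `is_valid_dna` are hypothetical and not part of the lesson's code; the validity check simply encodes the rule that a sequence may contain only the four bases.

```python
import random
from typing import Optional

BASES = ["A", "T", "G", "C"]


def generate_seeded_sequence(length: int = 12, seed: Optional[int] = None) -> str:
    """Generate a random DNA sequence; the same seed gives the same sequence."""
    rng = random.Random(seed)  # independent generator, leaves global state alone
    return "".join(rng.choices(BASES, k=length))


def is_valid_dna(seq: str) -> bool:
    """Return True if seq contains only A, T, G, and C (an empty string passes)."""
    return set(seq) <= set(BASES)


first = generate_seeded_sequence(12, seed=42)
second = generate_seeded_sequence(12, seed=42)
print(first == second)       # True: same seed, same sequence
print(is_valid_dna(first))   # True
print(is_valid_dna("ATGX"))  # False: X is not a DNA base
```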

In [8]:
# Generate a much larger sequence
seq = generate_random_sequence(1000)
seq

'TAAACTATTGCGGTATTCTGATCGGAAGCGTTCCGCAGGATTCTGAACCATACGGCAGCCGCTTAGGCTAGGGTGCCTGTTGAAAACCACCCAACTGAGTTTGCAGATTCGCAAGCGGGCACAAGCCCTATCGCGAGTGATTCGGAATTACATGAACTGCCAGCGAGGGTTCCCGCCCGGCTTCGACCCCAGACCTTCAAAATCATTGTTTGTTGGCAGTTACCTGGCTGAGTTGCGATCTGTTCTTCTGGTCCTTCCGCCGGATAAAAGGATTGTCCGTCATGCAGAATACGCGTCAGTAGTAGGACGGTAAGATCTGGCATTTTCTAGAGGGAACCCCATTGAGCCGGCTGCTATCCCTTTTTTAACCATAAGCCAGGTAGACTTGGCCTGTATCGGCAATCACGGGGCGGCGCACAGAAAAAGTGTACCATGATGGCGCTCATAGCCCCAGTGCGGAATGAATTTTCGTACTGTTTGAGCCTTTCCTCGTCGGGTGTGCCGGGGACCGTATTCCGCGCCGATCCTTGGGACTACAAACCAACCCATCTATATGTCGGCCCAAGTTAGGCCACACCTACAAGGCATCAACACCTTGAACGCGAAAGGTTATTTCACGGAGATGTCAGCCGCAGGTCTCGGCCTCCGGGTTTAAGTTAAACTCAAGTTCGGTTTTATTGAATATGCAGTTGCGCCTGTGAACGGACCCCAGCGGTGCAACTGTCTCACTACAGTGATTGGCTAGATGGTATTAACATTGTACATGGGGTTAGGTAAAATCTTGACCACATTATGCCGAGGCGTGACGGGGCCTACGTAGGTTAATGCTGTTATACAGTCGCCGTGCGATCCTTTGGGGTTAACGTCCATATACTAAACCTGGCTTAAGGCTATCAAATGCAAAGAACTTCATGTACGGGCCGACCTGATTTTCCTTTGGTGCTACCGTGCCCTGTACTATTCACATAGTTTAGGGAAAGCAATTGTGGTCTCTCTAAT