# DNA Composition: GC, AT Content and Frequency

#### GC Content in DNA
+ GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C).

A -> T <br/>
G -> C
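
The pairing rule above can be written as a lookup table. The following is a minimal sketch in plain Python; the helper name `complement` is introduced here for illustration only, and the code assumes an unambiguous DNA string containing just A, C, G and T.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
# `complement` is a hypothetical helper, not a Biopython function.
PAIRS = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA string."""
    return seq.translate(PAIRS)


print(complement("ATGATCTCGTAA"))  # TACTAGAGCATT
```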

#### Usefulness
+ In polymerase chain reaction (PCR) experiments, the GC-content of short oligonucleotides known as primers is often used to predict their annealing temperature to the template DNA. 
+ A higher GC-content level indicates a relatively higher melting temperature.
+ DNA with low GC-content is less stable than DNA with high GC-content.
+ DNA with high GC-content can be difficult to amplify by PCR because it is hard to design a primer long enough to provide high specificity.


#### AT Content in DNA
+ AT content is the percentage of nitrogenous bases in a DNA molecule that are either adenine (A) or thymine (T). In RNA, uracil (U) takes the place of thymine.
+ AT base pairing yields only 2 hydrogen bonds, whereas GC base pairing yields 3.
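
Because each A-T pair contributes 2 hydrogen bonds and each G-C pair contributes 3, the total for a fully paired duplex can be estimated by counting bases on one strand. This is a rough sketch under stated assumptions: the helper name `hydrogen_bonds` is made up for this example, the strand is assumed to be perfectly paired with its complement, and only A, C, G and T are counted.

```python
def hydrogen_bonds(seq: str) -> int:
    """Estimate hydrogen bonds in a perfectly paired DNA duplex.

    Hypothetical helper: counts 2 bonds per A/T base and 3 per G/C base
    on the given strand; any other character is ignored.
    """
    seq = seq.upper()
    at_pairs = seq.count("A") + seq.count("T")
    gc_pairs = seq.count("G") + seq.count("C")
    return 2 * at_pairs + 3 * gc_pairs


print(hydrogen_bonds("ATGATCTCGTAA"))  # 28 (8 A/T bases, 4 G/C bases)
```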

#### GC Content

In [1]:
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction

In [2]:
dna = Seq("ATGATCTCGTAA")

In [3]:
print(gc_fraction(dna))

0.3333333333333333


In [4]:
# Custom function (case-sensitive: counts only uppercase "G" and "C")
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

In [5]:
gc_content(dna)

0.3333333333333333

In [6]:
# Custom function 2 (case-insensitive)
def gc_content2(seq):
    gc = [base for base in seq.upper() if base in "GC"]
    return len(gc) / len(seq)

In [7]:
gc_content2(dna)

0.3333333333333333

In [8]:
gc_content2(dna.lower())

0.3333333333333333
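
Both custom functions divide by `len(seq)`, so an empty sequence raises `ZeroDivisionError`. One possible guard is sketched below. The name `gc_content_safe` is hypothetical, and returning `0.0` for empty input is an assumption made for this example rather than a rule taken from the notebook.

```python
def gc_content_safe(seq) -> float:
    """GC fraction that ignores case and tolerates an empty sequence."""
    text = str(seq).upper()  # str() so plain strings and Seq-like objects both work
    if not text:
        return 0.0  # assumption: report 0.0 instead of raising on empty input
    return (text.count("G") + text.count("C")) / len(text)


print(gc_content_safe("atgatctcgtaa"))  # 0.3333333333333333
print(gc_content_safe(""))              # 0.0
```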

#### AT Content

In [9]:
def at_content(seq):
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

In [10]:
at_content(dna)

0.6666666666666666

### Melting Point of DNA

- A higher GC content means a higher melting temperature (Tm).
- Tm_Wallace: the "rule of thumb" estimate, intended for short oligonucleotides.
- Tm_GC: empirical formulas based on GC content. Salt and mismatch corrections can be included.
- Tm_NN: calculation based on nearest-neighbour thermodynamics. Several tables for DNA/DNA, DNA/RNA and RNA/RNA hybridizations are included. Corrections for mismatches, dangling ends, salt concentration and other additives are available.
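
The Wallace rule of thumb listed above is commonly stated as 4 °C for every G or C plus 2 °C for every A or T. A plain-Python sketch of that arithmetic follows. The name `wallace_tm` is made up for this example, and the code assumes a short, unambiguous oligonucleotide.

```python
def wallace_tm(seq: str) -> float:
    """Rule-of-thumb melting temperature in degrees Celsius.

    Hypothetical helper: Tm = 4 * (G + C) + 2 * (A + T).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4.0 * gc + 2.0 * at


print(wallace_tm("ATGATCTCGTAA"))  # 32.0
```

For the 12-base example sequence this gives 4 × 4 + 2 × 8 = 32, the same value the notebook obtains from `mt.Tm_Wallace(dna)`.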

In [11]:
from Bio.SeqUtils import MeltingTemp as mt

In [12]:
# Melting temperature estimated with the Wallace rule
mt.Tm_Wallace(dna)

32.0

In [13]:
# Melting temperature estimated from GC content
mt.Tm_GC(dna)

23.569568738644566

In [14]:
# Examples
ex1 = "ATGCATGGTGCGCGA"
ex2 = "ATTTGTGCTCCTGGA"

In [15]:
gc_fraction(ex1)

0.6

In [16]:
gc_fraction(ex2)

0.4666666666666667

In [17]:
gc_content2(ex1)

0.6

In [18]:
gc_content2(ex2)

0.4666666666666667

In [19]:
def get_metrics(seq):
    # Summarise GC fraction, AT fraction and GC-based melting temperature
    gc = gc_fraction(seq)
    at = at_content(seq)
    melting = mt.Tm_GC(seq)
    return f"GC:{gc}, AT:{at}, Temp:{melting}"

In [20]:
get_metrics(ex1)

'GC:0.6, AT:0.4, Temp:44.5029020719779'

In [21]:
get_metrics(ex2)

'GC:0.4666666666666667, AT:0.5333333333333333, Temp:39.03623540531123'

### GC Skew

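+ GC skew measures whether guanine (G) or cytosine (C) is over- or under-abundant in a particular region of DNA or RNA.
+ It helps to distinguish the leading strand from the lagging strand.
+ Positive GC skew typically indicates the leading strand.
+ Negative GC skew typically indicates the lagging strand.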
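
GC skew for a window is commonly defined as (G - C) / (G + C). The sketch below computes it over consecutive, non-overlapping windows in plain Python. The name `gc_skew_windows` is hypothetical, and reporting `0.0` for a window with no G or C is an assumption made for this example.

```python
def gc_skew_windows(seq: str, window: int = 100) -> list:
    """Return (G - C) / (G + C) for each non-overlapping window of `seq`."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Assumption: a window without any G or C gets a skew of 0.0
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


print(gc_skew_windows("ATGATCTCGTAA", 10))  # [0.0, 0.0]
print(gc_skew_windows("GGGCAACCCG", 5))     # [0.5, -0.5]
```

In the second call the first window is G-rich (positive skew) and the second is C-rich (negative skew).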

In [22]:
from Bio.SeqUtils import GC123, GC_skew, xGC_skew

In [23]:
# The sequence analysed below
dna

Seq('ATGATCTCGTAA')

In [24]:
# GC percentage: total, then first, second and third codon positions
GC123(dna)

(33.333333333333336, 0.0, 25.0, 75.0)
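
The tuple above reads as total GC percentage followed by the GC percentage at the first, second and third codon positions. The sketch below reproduces those four numbers with string slicing. The name `gc_by_codon_position` is hypothetical, and the code assumes the reading frame starts at index 0.

```python
def gc_by_codon_position(seq: str) -> tuple:
    """Return GC percentages: (total, 1st, 2nd, 3rd codon position)."""
    seq = seq.upper()

    def pct(bases: str) -> float:
        # Percentage of G and C in `bases`; 0.0 for an empty slice
        if not bases:
            return 0.0
        return 100.0 * (bases.count("G") + bases.count("C")) / len(bases)

    # seq[i::3] picks every third base starting at codon position i
    return (pct(seq), pct(seq[0::3]), pct(seq[1::3]), pct(seq[2::3]))


print(gc_by_codon_position("ATGATCTCGTAA"))  # approximately (33.33, 0.0, 25.0, 75.0)
```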

In [25]:
# GC skew for each window of 10 bases
GC_skew(dna, 10)

[0.0, 0.0]

#### Subsequence
+ Search for a DNA subsequence within a sequence; the result is a list containing the search pattern followed by the position of each match.

In [26]:
from Bio.SeqUtils import nt_search

In [27]:
s1 = Seq("ACTATT")
subseq = Seq("ACT")

In [29]:
nt_search(str(s1), str(subseq))

['ACT', 0]

In [30]:
s1 = Seq("ACTATT")
subseq = Seq("ATT")

In [31]:
nt_search(str(s1), str(subseq))

['ATT', 3]
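
Each search above finds a single match. To illustrate how every start position, including overlapping ones, might be collected, here is a plain-Python sketch built on `str.find`. The name `find_all` is hypothetical, and the code does exact matching only, with no handling of IUPAC ambiguity codes.

```python
def find_all(seq: str, subseq: str) -> list:
    """Return the start index of every (possibly overlapping) exact match."""
    positions = []
    start = seq.find(subseq)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping matches are not skipped
        start = seq.find(subseq, start + 1)
    return positions


print(find_all("ACTATT", "ATT"))  # [3]
print(find_all("ATATAT", "ATA"))  # [0, 2]
```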

# Well Done!