GC Content in DNA
- GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either Guanine (G) or Cytosine (C).

In double-stranded DNA, A pairs with T and G pairs with C (A=T, G=C).
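
Reading the rule above as base pairing in double-stranded DNA, a small sketch can check it by counting bases over both strands. The helper name `duplex_counts` is hypothetical, and the code assumes an unambiguous A/T/G/C sequence.

```python
from collections import Counter

# Map each base to its pairing partner (assumes only A, T, G, C).
COMPLEMENT = str.maketrans("ATGC", "TACG")

def duplex_counts(strand):
    """Count bases over a strand plus its complementary strand."""
    top = strand.upper()
    # Strand orientation does not affect counts, so no reversal is needed.
    bottom = top.translate(COMPLEMENT)
    return Counter(top + bottom)

counts = duplex_counts("ATGATCTCGTAA")
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

Because every A on one strand contributes a T on the other (and likewise for G and C), both comparisons hold for any input made only of these four bases.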

Usefulness
- In polymerase chain reaction (PCR) experiments, the GC-content of short oligonucleotides known as primers is often used to predict their annealing temperature to the template DNA.
- A higher GC-content indicates a relatively higher melting temperature.
- DNA with low GC-content is less stable than DNA with high GC-content.
- High-GC DNA can be difficult to amplify by PCR, because it is hard to design a primer long enough to provide good specificity.

In [1]:
!pip install BioPython

import Bio
from Bio.Seq import Seq
from Bio.SeqUtils import GC

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting BioPython
  Downloading biopython-1.81-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Installing collected packages: BioPython
Successfully installed BioPython-1.81


Method 1

In [2]:
dna_seq = Seq('ATGATCTCGTAA')
dna_seq

Seq('ATGATCTCGTAA')

In [3]:
GC(dna_seq)



33.333333333333336

Method 2: Custom Function to Get GC Content

In [4]:
dna_seq = Seq('ATGATCTCGTAA')
dna_seq

Seq('ATGATCTCGTAA')

In [5]:
dna_seq.count('A')

4

In [6]:
def gc_content(seq):
  # Percentage of G and C bases; count() is case-sensitive, so pass an uppercase sequence
  result = (seq.count('G') + seq.count('C')) / len(seq) * 100
  return result

In [7]:
gc_content(dna_seq)

33.33333333333333

Method 3

In [8]:
def gc_content2(seq):
  # Uppercase first so that lowercase input is counted as well
  gc = [B for B in seq.upper() if B in 'GC']
  result = len(gc) / len(seq) * 100
  return result

In [9]:
gc_content2(dna_seq)

33.33333333333333

In [10]:
dna_seq.lower()

Seq('atgatctcgtaa')

In [11]:
gc_content2('atgatctcgtaa')

33.33333333333333

In [12]:
GC('atgatctcgtaa')

33.333333333333336
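
Newer Biopython releases deprecate `GC` in favor of `gc_fraction` (visible in the `dir(Bio.SeqUtils)` listing in this notebook), which returns a fraction between 0 and 1 instead of a percentage. The sketch below wraps it so the result stays on the 0 to 100 scale used above. The wrapper name `gc_percent` and the plain-Python fallback are assumptions for illustration, not part of Biopython.

```python
try:
    # Available in Biopython 1.80 and later.
    from Bio.SeqUtils import gc_fraction
except ImportError:
    gc_fraction = None  # fall back to a plain-Python count below

def gc_percent(seq):
    """Return GC content as a percentage (0-100)."""
    if gc_fraction is not None:
        return 100.0 * gc_fraction(seq)
    # Fallback (assumption): count only unambiguous G and C, ignoring case.
    text = str(seq).upper()
    if not text:
        return 0.0
    return 100.0 * (text.count("G") + text.count("C")) / len(text)

print(round(gc_percent("atgatctcgtaa"), 2))
```

For sequences containing only A, T, G, and C, both branches give the same value; they may differ for ambiguous IUPAC codes.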

Melting Point of DNA
- Higher GC content means a higher melting temperature (Tm).
- Tm_Wallace: 'Rule of thumb'.
- Tm_GC: Empirical formulas based on GC content. Salt and mismatch corrections can be included.
- Tm_NN: Calculation based on nearest-neighbor thermodynamics. Several tables for DNA/DNA, DNA/RNA, and RNA/RNA hybridizations are included. Corrections for mismatches, dangling ends, salt concentration, and other additives are available.
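
As a rough illustration of the rule of thumb, the Wallace rule is commonly stated as Tm = 4(G+C) + 2(A+T) in °C and is intended for short oligonucleotides. The sketch below is plain Python, not Biopython's implementation; the name `tm_wallace_sketch` is hypothetical, and ambiguous bases are simply ignored.

```python
def tm_wallace_sketch(seq):
    """Wallace rule of thumb: 4 degrees C per G/C plus 2 degrees C per A/T."""
    text = str(seq).upper()
    gc = text.count("G") + text.count("C")
    at = text.count("A") + text.count("T")
    return 4.0 * gc + 2.0 * at

# 'ATGATCTCGTAA' has 4 G/C bases and 8 A/T bases: 4*4 + 2*8
print(tm_wallace_sketch("ATGATCTCGTAA"))  # 32.0
```

The result agrees with the `mt.Tm_Wallace` output for the same sequence in this notebook.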

In [13]:
import Bio.SeqUtils

In [14]:
dir(Bio.SeqUtils)

['CodonAdaptationIndex',
 'GC',
 'GC123',
 'GC_skew',
 'IUPACData',
 'Seq',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_gc_values',
 'complement',
 'complement_rna',
 'cos',
 'exp',
 'gc_fraction',
 'log',
 'molecular_weight',
 'nt_search',
 'pi',
 're',
 'seq1',
 'seq3',
 'sin',
 'six_frame_translations',
 'standard_dna_table',
 'xGC_skew']

In [15]:
from Bio.SeqUtils import MeltingTemp as mt

In [16]:
dna_seq

Seq('ATGATCTCGTAA')

In [17]:
GC(dna_seq)

33.333333333333336

In [18]:
# Estimate the melting temperature (°C) with the Wallace rule of thumb
mt.Tm_Wallace(dna_seq)

32.0

In [19]:
# Estimate the melting temperature (°C) with the empirical GC-content formula
mt.Tm_GC(dna_seq)

23.569568738644566
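
The value above can be reproduced by hand. The sketch assumes the formula Tm = 81.5 + 16.6·log10([Na+]) + 0.41·(%GC) - 600/N with [Na+] = 50 mM. These constants reproduce the `Tm_GC` outputs in this notebook and are assumed to correspond to its default settings, so check the Biopython documentation for your version. `tm_gc_sketch` is a hypothetical name.

```python
import math

def tm_gc_sketch(seq, na_mM=50.0):
    """Empirical GC-based Tm estimate in degrees C (assumed default constants)."""
    text = str(seq).upper()
    n = len(text)
    percent_gc = 100.0 * (text.count("G") + text.count("C")) / n
    # Salt term: [Na+] is converted from mmol/L to mol/L before taking log10.
    salt_term = 16.6 * math.log10(na_mM / 1000.0)
    return 81.5 + salt_term + 0.41 * percent_gc - 600.0 / n

print(round(tm_gc_sketch("ATGATCTCGTAA"), 2))  # 23.57
```

The 600/N term penalizes short sequences, which is why this estimate falls well below the Wallace value for a 12-mer.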

Check for the Nucleotide Molecular Weight
- ProtParam.ProteinAnalysis
- Counter from collections

In [20]:
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

In [21]:
dna = Seq('ATGATCTCGTAA')

In [22]:
# Molecular weight of the DNA sequence (treated as single-stranded DNA by default)
molecular_weight(dna)

3724.3894999999998

Exercise

In [23]:
!pip install BioPython

import Bio
from Bio.Seq import Seq

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [24]:
sequence1 = Seq('ATGCATGGTGCGCGA')
sequence2 = Seq('ATTTGTGCTCCTGGA')

In [25]:
from Bio.SeqUtils import GC

print("GC sequence1 =", GC(sequence1))
print("GC sequence2 =", GC(sequence2))

print("The sequence with the higher GC content is", "sequence1" if GC(sequence1) > GC(sequence2) else "sequence2")

GC sequence1 = 60.0
GC sequence2 = 46.666666666666664
The sequence with the higher GC content is sequence1


In [26]:
import Bio.SeqUtils
from Bio.SeqUtils import MeltingTemp as mt

print(mt.Tm_Wallace(sequence1))
print(mt.Tm_GC(sequence1))
print(mt.Tm_NN(sequence1))
print()
print(mt.Tm_Wallace(sequence2))
print(mt.Tm_GC(sequence2))
print(mt.Tm_NN(sequence2))
print()
print("The sequence that has the higher melting temperature is",
      "sequence1" if mt.Tm_Wallace(sequence1) > mt.Tm_Wallace(sequence2)
      else "sequence2")

48.0
44.5029020719779
51.018959639430136

44.0
39.03623540531123
42.05858302814346

The sequence that has the higher melting temperature is sequence1


In [27]:
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

print(molecular_weight(sequence1))
print(molecular_weight(sequence2))
print()
print("The sequence with the higher molecular weight is",
      "sequence1" if molecular_weight(sequence1) > molecular_weight(sequence2)
      else "sequence2")

4712.995199999999
4653.9565

The sequence with the higher molecular weight is sequence1


BLUEJACK CASE

In [None]:
!pip install BioPython

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import Bio
import matplotlib.pyplot as plt

from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight
from Bio.SeqUtils import MeltingTemp as mt
from Bio.SeqUtils import GC

In [None]:
seqA = Seq('AGCTTGCAGCGTCCGTTAGCTCGAGTCCAGGACGTTAGTCCTGCAGTC')
seqB = Seq('CAGTAAGTTGCCGTTAGCGCGTAGTGCCAGTAAGCGGCTCGTTAGTGG')

print("Sequence A:", seqA)
print("Sequence B:", seqB)

In [None]:
print(molecular_weight(seqA))
print(molecular_weight(seqB))

In [None]:
print(mt.Tm_Wallace(seqA))
print(mt.Tm_Wallace(seqB))

In [None]:
plt.bar(['SeqA', 'SeqB'], [mt.Tm_Wallace(seqA), mt.Tm_Wallace(seqB)])
# Zoom the y-axis so the small difference between the two bars is visible
plt.ylim([120, 160])
plt.xlabel('DNA Sequence')
plt.ylabel('Melting Temp (°C)')
plt.show()

In [None]:
def gc_content(seq):
  gc = [B for B in seq.upper() if B in 'GC']
  result = float(len(gc)) / len(seq) * 100
  return result

print(gc_content(seqA))
print(gc_content(seqB))

In [None]:
def at_content(seq):
  at = [B for B in seq.upper() if B in 'AT']
  result = float(len(at)) / len(seq) * 100
  return result

print(at_content(seqA))
print(at_content(seqB))

In [None]:
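labels = ['SeqA', 'SeqB']
width = 0.4  # bar width; the offsets below place GC and AT bars side by side instead of overlapping
plt.bar([i - width / 2 for i in range(len(labels))], [gc_content(seqA), gc_content(seqB)], width)
plt.bar([i + width / 2 for i in range(len(labels))], [at_content(seqA), at_content(seqB)], width)
plt.xticks(range(len(labels)), labels)
plt.legend(['GC Content', 'AT Content'])
plt.xlabel('DNA Sequence')
plt.ylabel('Percentage')
plt.ylim(35, 60)
plt.show()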