# Day 1: Morning Lab 
## **BioPython Fundamentals**
---

## **📌 Introduction to Colab**

Colab is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources. It is the interactive platform we'll use for practicals.
<br> <br>
Watch the following 3-min tutorial: https://www.youtube.com/watch?v=inN8seMm7UI

# **📌 Part 1: *BioPython* Fundamentals**

<a href="https://biopython.org/">*BioPython*</a> is an open-source Python library designed specifically for biological computation. It provides a standard set of modules that simplify the manipulation of complex biological data. <br>

⚡ **Key Capabilities:**
- **Sequence Manipulation**: Easy handling of DNA, RNA, and protein sequences.
- **Biological Operations**: Built-in methods for transcription, translation, and reverse complements.
- **Data Integration**: Tools for reading and writing common biological file formats (e.g., FASTA, GenBank, PDB).
<br>

📚 **Optional Learning Resources**: David Boo's tutorials

- https://david-boo.github.io/biopython-tutorial-first/
- https://david-boo.github.io/biopython-tutorial-second/

In [16]:
!pip install biopython
from IPython.display import clear_output
clear_output()


### **1. The `Seq` Module**

➡️ The `Seq` object is the central data structure in BioPython. It behaves like a Python string but adds specialized biological methods that support the **Central Dogma** of molecular biology.

| Operation                  | Biopython Method              | Biological Context                                      |
|----------------------------|-------------------------------|---------------------------------------------------------|
| Transcription              | `.transcribe()`               | Converts the DNA coding strand to RNA (T → U).          |
| Translation                | `.translate()`                | Converts codons (RNA or DNA) into amino acids (protein). |
| Reverse Complement         | `.reverse_complement()`       | Generates the 5' → 3' sequence of the opposite strand.  |
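Because `Seq` behaves like a Python string, the familiar operations (`len()`, slicing, `count()`, `find()`, `in`) work on it directly. Below is a minimal sketch on the same example sequence, assuming Biopython is installed as above:

```python
from Bio.Seq import Seq

dna = Seq("ATGCGTACGTAG")

length = len(dna)                 # number of bases
first_codon = dna[0:3]            # slicing returns another Seq
g_count = dna.count("G")          # non-overlapping occurrences of "G"
stop_pos = dna.find("TAG")        # index of the first match, or -1
has_motif = "CGT" in dna          # substring test
complement = dna.complement()     # base-by-base complement (not reversed)

print(length, first_codon, g_count, stop_pos, has_motif)
print("Complement:", complement)

# Unlike a list, a Seq is immutable: item assignment raises TypeError
try:
    dna[0] = "G"
except TypeError as err:
    print("Immutable:", err)
```

If you need an editable sequence, Biopython also provides `MutableSeq` in the same module.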

In [37]:
from Bio.Seq import Seq

# Creating a DNA sequence
dna_seq = Seq("ATGCGTACGTAG")

# 1. Transcription (DNA -> RNA)
rna_seq = dna_seq.transcribe()
print("RNA Sequence:", rna_seq)

# 2. Translation (RNA -> Protein)
protein_seq = rna_seq.translate()
print("Protein Sequence:", protein_seq)

# 3. Reverse Complement (Coding strand <--> Template strand)
rev_complement = dna_seq.reverse_complement()
print("Reverse Complement:", rev_complement)

# 4. Transcription with replace() (avoid it: unlike .transcribe(), it misses lowercase 't' and hides the biological intent)
rna_seq1 = dna_seq.replace("T", "U")
print("DNA to RNA with replace() method:", rna_seq1)

RNA Sequence: AUGCGUACGUAG
Protein Sequence: MRT*
Reverse Complement: CTACGTACGCAT
DNA to RNA with replace() method: AUGCGUACGUAG


⚠️ **Note**: The **`*`** in a protein sequence represents a stop codon (UAG, UAA, UGA), which signals the end of translation. <br>
In the previous example, the protein ends after methionine (M), arginine (R), and threonine (T). Refer to the translation table below to see how `AUGCGUACGUAG` is translated into `MRT*`.

<div style="display: flex; justify-content: center;"> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Aminoacids_table.svg/1200px-Aminoacids_table.svg.png" width="40%" /> </div>
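To follow the table lookup one codon at a time, you can split the RNA into codons and look each codon up in Biopython's standard genetic code (NCBI table 1) from `Bio.Data.CodonTable`. This is a minimal sketch. The helper name `translate_codon_by_codon` is illustrative and is not a Biopython function.

```python
from Bio.Data import CodonTable

# NCBI translation table 1 ("Standard") for RNA codons
standard_table = CodonTable.unambiguous_rna_by_id[1]

def translate_codon_by_codon(rna):
    """Return (codon, amino acid) pairs; '*' marks a stop codon."""
    pairs = []
    for i in range(0, len(rna) - len(rna) % 3, 3):  # skip any trailing partial codon
        codon = rna[i:i + 3]
        if codon in standard_table.stop_codons:
            pairs.append((codon, "*"))
        else:
            pairs.append((codon, standard_table.forward_table[codon]))
    return pairs

for codon, amino_acid in translate_codon_by_codon("AUGCGUACGUAG"):
    print(codon, "->", amino_acid)
```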

### **2. The `SeqUtils` Module**

➡️ Beyond sequence manipulation, BioPython provides the `SeqUtils` module to calculate physical and chemical properties of nucleic acids and proteins.
<br> You will study this utility in more detail in **Task 6**.

In [None]:
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight

dna_seq = Seq("ATGCGTACGTAG")

# GC content (gc_fraction returns a float in [0, 1], so we multiply by 100)
gc_content = gc_fraction(dna_seq) * 100
print("GC Content:", gc_content, "%")

# Molecular Weight Calculation
# ⚠️ Note: seq_type can be 'DNA', 'RNA', or 'protein'
mol_weight = molecular_weight(dna_seq, seq_type='DNA')
print("Molecular Weight:", mol_weight, "Dalton (Da)")


GC Content: 50.0 %
Molecular Weight: 3765.4016 Dalton (Da)
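The same module also works on proteins. In this short sketch, the protein translated from the example DNA is written with three-letter codes using `seq3()` and weighed with `molecular_weight(..., seq_type='protein')`. The choice of sequence is only for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight, seq3

# Translate up to (not including) the first stop codon
protein = Seq("ATGCGTACGTAG").translate(to_stop=True)

three_letter = seq3(protein)  # one-letter -> three-letter amino-acid codes
protein_weight = molecular_weight(protein, seq_type="protein")

print("Protein:", protein, "|", three_letter)
print("Protein Molecular Weight:", round(protein_weight, 2), "Da")
```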


### **3. The `Align` Module**

➡️ Sequence alignment is used to identify regions of similarity that may indicate functional or evolutionary relationships. With the `Align` module, we can efficiently find the optimal alignment of two sequences under a given scoring system.

In [49]:
from Bio import Align

# 1. Initialize the PairwiseAligner with default scores
aligner = Align.PairwiseAligner(match_score=1.0, mismatch_score=0.0, open_gap_score=0.0, extend_gap_score=0.0)

# 2. Define sequences to align
seq1 = "GAUUACA"
seq2 = "GCAUGCU"

# 3. Perform alignment
alignments = aligner.align(seq1, seq2)

# 4. Take the first alignment (align() returns only optimal alignments, so they all share the best score)
best_alignment = alignments[0]
print(f"Best Alignment: \n{best_alignment}")
print(f"Best Alignment Score: {best_alignment.score}")
print("-" * 90)
# 5. Optional: Print all equally optimal alignments
# for alignment in alignments:
#     print(alignment)

Best Alignment: 
target            0 G-AUUA-CA-  7
                  0 |-||---|-- 10
query             0 GCAU--GC-U  7

Best Alignment Score: 4.0
------------------------------------------------------------------------------------------


🔎 **Visualizing the Alignment**

Biopython represents the relationship between sequences using a specific visual notation in the output:

- **Vertical Bars (`|`)**: These represent a **Match**, where the characters in both sequences are identical.
- **Dots (`.`)**: These represent a **Mismatch** (none occur in the alignment above).
- **Dashes (`-`)**: These represent **Gaps (Indels)**.  
  - A dash in the *target* sequence indicates an **insertion** in the query.  
  - A dash in the *query* sequence indicates a **deletion** from the target.
<br> <br>

➕ **The Scoring System**

The aligner determines the "best" alignment by maximizing a total score based on specific parameters. In this lab, we use a simple scoring scheme where matches contribute points and mismatches or gaps do not:

<div align="center">

$$
\text{Total Score} = \sum (\text{Matches} \times S_{\text{match}}) + \sum (\text{Mismatches} \times S_{\text{mismatch}}) + \sum (\text{Gaps} \times S_{\text{gap}})
$$

</div>

Based on our current configuration:

- Match Score ($S_{\text{match}}$): **+1.0**
- Mismatch Score ($S_{\text{mismatch}}$): **0.0**
- Gap Score ($S_{\text{gap}}$, set through both `open_gap_score` and `extend_gap_score`): **0.0**
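With these values, the best alignment above has 4 matched columns, so its score is 4 × 1.0 = 4.0. You can check the formula with a small pure-Python sketch. `score_alignment` is a hypothetical helper, not part of Biopython. It assumes a linear gap penalty: every gap column costs the same $S_{\text{gap}}$, which corresponds to setting `open_gap_score` equal to `extend_gap_score`.

```python
def score_alignment(aligned_a, aligned_b, s_match=1.0, s_mismatch=0.0, s_gap=0.0):
    """Score two equal-length gapped strings column by column (linear gap penalty)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0.0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            total += s_gap        # gap column (insertion or deletion)
        elif a == b:
            total += s_match      # identical characters
        else:
            total += s_mismatch   # substitution
    return total

# Gapped target and query rows copied from the printed best alignment
target_row = "G-AUUA-CA-"
query_row = "GCAU--GC-U"

print(score_alignment(target_row, query_row))                   # lab settings
print(score_alignment(target_row, query_row, 1.0, -1.0, -1.0))  # penalize mismatches and gaps
```

Changing the scores in `PairwiseAligner` can change which alignment is optimal, not just its score. The second number is therefore the score of *this* alignment under the new scheme, and not necessarily what the aligner would report with those settings.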

### **4. The Data Retrieval `Entrez` Module**

➡️ The `Entrez` module provides a gateway to NCBI's **Entrez** search and retrieval system, which integrates data from about 40 health and life science databases, including PubMed, GenBank (nucleotide sequences), and Structure (3D structures from the Protein Data Bank).

🔷 **Core Functions**

| Function  | Description                                                                 |
|-----------|-----------------------------------------------------------------------------|
| `efetch`  | Retrieves data records (like sequences or articles) based on a **specific ID**. |
| `esearch` | Searches a database with text keywords and returns the IDs of matching records. |
| `einfo`   | Provides information about a specific database (e.g., how many records it contains). |


In [None]:
from Bio import Entrez

# Provide your email address
# ⚠️ NCBI uses it to contact you in case of excessive requests or other issues.
Entrez.email = "your_email@example.com"

# Use Entrez.efetch() to retrieve a record from the NCBI database
# Parameters:
#   db       : The database to query ("nucleotide" for DNA/RNA sequences)
#   id       : The unique accession number of the record (here: NM_001301717, a human gene transcript)
#   rettype  : The format of the returned data ("fasta" for FASTA format)
#   retmode  : The output mode ("text" for plain text, alternative is "xml")
handle = Entrez.efetch(
    db="nucleotide", 
    id="NM_001301717", 
    rettype="fasta", 
    retmode="text")

sequence_data = handle.read()  # Read the entire response as text
handle.close()                 # Release the network connection
print(sequence_data)


>NM_001301717.2 Homo sapiens C-C motif chemokine receptor 7 (CCR7), transcript variant 4, mRNA
CTCTAGATGAGTCAGTGGAGGGCGGGTGGAGCGTTGAACCGTGAAGAGTGTGGTTGGGCGTAAACGTGGA
CTTAAACTCAGGAGCTAAGGGGGAAACCAATGAAAAGCGTGCTGGTGGTGGCTCTCCTTGTCATTTTCCA
GGTATGCCTGTGTCAAGATGAGGTCACGGACGATTACATCGGAGACAACACCACAGTGGACTACACTTTG
TTCGAGTCTTTGTGCTCCAAGAAGGACGTGCGGAACTTTAAAGCCTGGTTCCTCCCTATCATGTACTCCA
TCATTTGTTTCGTGGGCCTACTGGGCAATGGGCTGGTCGTGTTGACCTATATCTATTTCAAGAGGCTCAA
GACCATGACCGATACCTACCTGCTCAACCTGGCGGTGGCAGACATCCTCTTCCTCCTGACCCTTCCCTTC
TGGGCCTACAGCGCGGCCAAGTCCTGGGTCTTCGGTGTCCACTTTTGCAAGCTCATCTTTGCCATCTACA
AGATGAGCTTCTTCAGTGGCATGCTCCTACTTCTTTGCATCAGCATTGACCGCTACGTGGCCATCGTCCA
GGCTGTCTCAGCTCACCGCCACCGTGCCCGCGTCCTTCTCATCAGCAAGCTGTCCTGTGTGGGCATCTGG
ATACTAGCCACAGTGCTCTCCATCCCAGAGCTCCTGTACAGTGACCTCCAGAGGAGCAGCAGTGAGCAAG
CGATGCGATGCTCTCTCATCACAGAGCATGTGGAGGCCTTTATCACCATCCAGGTGGCCCAGATGGTGAT
CGGCTTTCTGGTCCCCCTGCTGGCCATGAGCTTCTGTTACCTTGTCATCATCCGCACCCTGCTCCAGGCA
CGCAACTTTGAGCGCAACAAGGCCATCAAGGTGATCATCGCTGTGGTCGTGGT
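The text returned by `efetch` is plain FASTA, so you can parse it into a `SeqRecord` with `Bio.SeqIO` instead of handling it as a raw string. In practice you would pass the `efetch` handle directly to `SeqIO.read(handle, "fasta")`. To run without network access, this sketch instead uses a shortened, in-memory copy of the record above (header plus the first sequence line only).

```python
from io import StringIO
from Bio import SeqIO

# Shortened copy of the efetch output (header + first sequence line only)
fasta_text = (
    ">NM_001301717.2 Homo sapiens C-C motif chemokine receptor 7 (CCR7), "
    "transcript variant 4, mRNA\n"
    "CTCTAGATGAGTCAGTGGAGGGCGGGTGGAGCGTTGAACCGTGAAGAGTGTGGTTGGGCGTAAACGTGGA\n"
)

# SeqIO.read expects exactly one record; use SeqIO.parse for multi-record files
record = SeqIO.read(StringIO(fasta_text), "fasta")

print("ID:", record.id)
print("Description:", record.description)
print("Length:", len(record.seq))
print("First 10 bases:", record.seq[:10])
```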

### **5. The `PDB` Module**

➡️ The Protein Data Bank (PDB) is the central archive of experimentally determined 3D structures of proteins, nucleic acids, and complex assemblies. Its classic PDB file format lists every modeled atom in the molecule together with its exact x, y, z position in 3D space.

🔷 **SMCRA Hierarchy**

Biopython's `PDB` module organizes PDB data using a nested **SMCRA** hierarchy. To access a specific atom, you have to navigate ("drill down") through these levels:
- **Structure (S)**: The top-level object representing the entire PDB entry.
- **Model (M)**: One 3D conformation of the structure (X-ray entries usually have one model; NMR entries often have several).
- **Chain (C)**: Individual polypeptide or nucleic acid chains (e.g., chain A, B).
- **Residue (R)**: The building blocks of the chain (amino acids, nucleotides).
- **Atom (A)**: The individual atoms with 3D coordinates that define the molecular shape.
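Besides looping, you can index each level directly: models by number (starting at 0), chains by chain ID, residues by residue number, and atoms by atom name. So that the sketch runs without downloading a file, it builds a tiny, made-up two-residue structure in PDB's fixed-column format. The helper `atom_line` and all coordinates are hypothetical. With the real file, you would use `parser.get_structure("1A8O", "1A8O.pdb")` instead. Subtracting two `Atom` objects returns the distance between them in ångströms.

```python
from io import StringIO
from Bio.PDB import PDBParser

def atom_line(serial, name, resname, chain_id, resseq, x, y, z, element):
    """Build one fixed-column PDB ATOM record (hypothetical helper for this demo)."""
    return (
        f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain_id}{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}\n"
    )

# Made-up coordinates for a Gly-Ala fragment on chain A
pdb_text = (
    atom_line(1, "N", "GLY", "A", 1, 0.000, 0.000, 0.000, "N")
    + atom_line(2, "CA", "GLY", "A", 1, 1.458, 0.000, 0.000, "C")
    + atom_line(3, "N", "ALA", "A", 2, 2.000, 1.400, 0.000, "N")
    + atom_line(4, "CA", "ALA", "A", 2, 3.400, 1.500, 0.000, "C")
)

parser = PDBParser(QUIET=True)
structure = parser.get_structure("toy", StringIO(pdb_text))

model = structure[0]        # Model: the first model has id 0
chain = model["A"]          # Chain: indexed by chain ID
residue = chain[2]          # Residue: residue number 2 (a standard ATOM residue)
atom = residue["CA"]        # Atom: indexed by atom name

print(residue.get_resname(), atom.get_name(), atom.coord)

# Atom - Atom gives the Euclidean distance in Å
ca_distance = chain[1]["CA"] - chain[2]["CA"]
print(f"CA-CA distance: {ca_distance:.3f} Å")
```

The integer shortcut `chain[2]` only finds standard residues. A hetero residue such as selenomethionine (MSE, usually stored as a HETATM record) needs its full ID tuple, e.g. `chain[("H_MSE", 151, " ")]`.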

In [None]:
from IPython.display import clear_output
# Example PDB file (1A8O.pdb)
# You can download a PDB file from https://www.rcsb.org/
# If running in Colab, you can use wget to download a PDB file
!wget https://files.rcsb.org/download/1A8O.pdb
clear_output()

In [None]:
from Bio.PDB import PDBParser

# Initialize the parser
parser = PDBParser()

# Load structure from the pdb file
structure = parser.get_structure("1A8O", "1A8O.pdb")

# Loop through the hierarchy: Structure -> Model -> Chain -> Residue -> Atom
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                # Print details for each atom
                print(f"Chain {chain.id} | {residue.resname}({residue.id[1]}) | {atom.name} | Coords: {atom.coord}")
            # Stop after the first residue to keep the output short
            break
        # Stop after the first chain
        break
    # Stop after the first model
    break



Chain A | MSE(151) | N | Coords: [19.594 32.367 28.012]
Chain A | MSE(151) | CA | Coords: [20.255 33.101 26.891]
Chain A | MSE(151) | C | Coords: [20.351 34.558 27.296]
Chain A | MSE(151) | O | Coords: [19.362 35.291 27.282]
Chain A | MSE(151) | CB | Coords: [19.457 32.943 25.591]
Chain A | MSE(151) | CG | Coords: [20.022 33.7   24.387]
Chain A | MSE(151) | SE | Coords: [21.718 33.262 23.918]
Chain A | MSE(151) | CE | Coords: [21.424 31.798 22.897]


# 📌 **Part 2: Exercises on Central Dogma Processes**

### 🔶 **Task 1/7: DNA to RNA Transcription**

🎯 **Objective**: Use BioPython to transcribe a DNA sequence into an RNA sequence.

Use `dna_sequence = "ATGCGTACGTAG"` as the DNA sequence.

In [None]:
from Bio.Seq import Seq

def transcribe_dna_to_rna_biopython(dna_sequence):
    # 🔸 Continue your code here


RNA Sequence: AUGCGUACGUAG



<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python
from Bio.Seq import Seq

def transcribe_dna_to_rna_biopython(dna_sequence):
    dna_seq = Seq(dna_sequence)
    rna_seq = dna_seq.transcribe()
    return rna_seq

# Example usage:
dna_sequence = "ATGCGTACGTAG"
rna_sequence = transcribe_dna_to_rna_biopython(dna_sequence)
print("RNA Sequence:", rna_sequence)
```
</details>

### 🔶 **Task 2/7: Finding the Start and Stop Codons**

🎯 **Objective**: Use BioPython to write a function that finds the start codon (AUG) and the stop codons (UAA, UAG, UGA) in an RNA sequence.

Use `rna_sequence = "AUGGCGUAAUGCUGA"` as the RNA sequence.

In [None]:
from Bio.Seq import Seq

# This function finds the start codon and stop codons in an RNA sequence
def find_start_stop_codons(rna_sequence):
    rna_seq = Seq(rna_sequence)

    start_codon_position = rna_seq.find("AUG") # Start codon "AUG" position
    stop_codons = ["UAA", "UAG", "UGA"] # Possible stop codons

    stop_positions = [] # Store the positions of stop codons here
    
    for codon in stop_codons:
        # 🔸 Continue your code here



    # Return the start and earliest stop codon
    if stop_positions:
        return start_codon_position, min(stop_positions) # return the start position and the earliest stop position
    else:
        return start_codon_position, None  # return None if no stop codons were found

# Example usage:
rna_sequence = "AUGGCGUAAUGCUGA"  # Example RNA sequence
start, stop = find_start_stop_codons(rna_sequence)

# Print the positions of the start and stop codons
print(f"Start Codon found at position: {start}, Stop Codon found at position: {stop}")

Start Codon found at position: 0, Stop Codon found at position: 6


<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python
from Bio.Seq import Seq

# This function finds the start codon and stop codons in an RNA sequence
def find_start_stop_codons(rna_sequence):
    # Convert the RNA sequence into a BioPython sequence object
    rna_seq = Seq(rna_sequence)
    
    # Find the position of the start codon "AUG"
    start_codon_position = rna_seq.find("AUG")
    
    # List of possible stop codons
    stop_codons = ["UAA", "UAG", "UGA"]
    
    # Find positions of stop codons after the start codon
    stop_positions = []
    for codon in stop_codons:
        stop_position = rna_seq.find(codon, start_codon_position)
        if stop_position != -1:  # If the stop codon is found
            stop_positions.append(stop_position)
    
    # Return the position of the start codon and the earliest stop codon, or None if no stop codons were found
    if stop_positions:
        return start_codon_position, min(stop_positions)  # Return the first stop codon
    else:
        return start_codon_position, None  # If no stop codon found, return None

# Example usage:
rna_sequence = "AUGGCGUAAUGCUGA"  # Example RNA sequence
start, stop = find_start_stop_codons(rna_sequence)

# Print the positions of the start and stop codons
print(f"Start Codon found at position: {start}, Stop Codon found at position: {stop}")

```
</details>
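The answer above returns the earliest stop codon anywhere after the start codon, even if that stop is not in the same reading frame as `AUG`. A frame-aware variant instead steps through the sequence three bases at a time from the start codon, as a ribosome would. This is a sketch; the function name is hypothetical.

```python
from Bio.Seq import Seq

STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_in_frame_stop(rna_sequence):
    """Return (start, stop) where stop is the first stop codon in the same frame as AUG."""
    rna_seq = Seq(rna_sequence)
    start = rna_seq.find("AUG")
    if start == -1:
        return None, None                      # no start codon at all
    for pos in range(start, len(rna_seq) - 2, 3):
        if str(rna_seq[pos:pos + 3]) in STOP_CODONS:
            return start, pos                  # first in-frame stop codon
    return start, None                         # the reading frame never closes

# Both approaches agree on the lab example (AUG GCG UAA ...)
print(find_in_frame_stop("AUGGCGUAAUGCUGA"))
# Here "UAA" at index 4 is out of frame; the in-frame stop is UAG at index 9
print(find_in_frame_stop("AUGCUAAGCUAG"))
```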

### 🔶 **Task 3/7: Translation of RNA to Protein**

🎯 **Objective**: Use BioPython to translate an RNA sequence into a protein sequence.

Use `rna_sequence = "AUGGCGUAA"`


In [None]:
from Bio.Seq import Seq

def translate_rna_to_protein_biopython(rna_sequence):
    # 🔸 Continue your code here


Protein Sequence: MA


<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

from Bio.Seq import Seq

def translate_rna_to_protein_biopython(rna_sequence):
    rna_seq = Seq(rna_sequence)
    protein_seq = rna_seq.translate(to_stop=True)
    return protein_seq

# Example usage:
rna_sequence = "AUGGCGUAA"
protein_sequence = translate_rna_to_protein_biopython(rna_sequence)
print("Protein Sequence:", protein_sequence)
```
</details>
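Beyond `to_stop=True`, `translate()` has a few other options worth knowing. The sketch below compares them on the Task 3 sequence. It also uses `table=2`, NCBI's vertebrate mitochondrial code, in which UGA encodes tryptophan rather than a stop, to show that the result depends on the genetic code used.

```python
from Bio.Seq import Seq

rna = Seq("AUGGCGUAA")

full = rna.translate()                     # keeps the stop codon as '*'
trimmed = rna.translate(to_stop=True)      # stops before the first stop codon
custom = rna.translate(stop_symbol="@")    # a different symbol for stop codons

print(full, trimmed, custom)

# Same codons, different genetic codes: UGA is a stop in table 1, Trp (W) in table 2
codons = Seq("AUGUGA")
print(codons.translate(table=1), codons.translate(table=2))
```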

### 🔶 **Task 4/7: Simulate a Point Mutation and Its Effect**

🎯 **Objective**: Use BioPython to simulate a point mutation in a DNA sequence and analyze its effect on the resulting protein sequence.

Use `dna_sequence = "ATGCGTACGTAG"`

In [None]:
# Import the necessary module
from Bio.Seq import Seq
import random  # Import random for generating random numbers

# Function to simulate a point mutation
def simulate_point_mutation(dna_sequence):
    # Convert the input DNA sequence into a Biopython Seq object
    dna_seq = Seq(dna_sequence)

    # List of all possible DNA nucleotides
    nucleotides = ['A', 'T', 'C', 'G']

    # Randomly choose a position in the DNA sequence
    position = random.randint(0, len(dna_seq) - 1)

    # Find the original nucleotide at the chosen position
    # 🔸 Continue your code here

    # Choose a new nucleotide that is different from the original
    # 🔸 Continue your code here


    # Create the mutated DNA by replacing the original nucleotide with the new one
    # 🔸 Continue your code here

    # Return the mutated DNA and details of the mutation
    return mutated_dna, position, original_nucleotide, new_nucleotide

# Example usage:
# Original DNA sequence
dna_sequence = "ATGCGTACGTAG"

# Call the function and get the mutated DNA and details
mutated_dna, position, original, new = simulate_point_mutation(dna_sequence)

# Print the results
print(f"Original DNA: {dna_sequence}")
print(f"Mutated DNA: {mutated_dna}")
print(f"Mutation at position {position}: {original} -> {new}")



Original DNA: ATGCGTACGTAG
Mutated DNA: ATGCGTACGTAC
Mutation at position 11: G -> C


<details>

<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

# Import the necessary module
from Bio.Seq import Seq
import random  # Import random for generating random numbers

# Function to simulate a point mutation
def simulate_point_mutation(dna_sequence):
    # Convert the input DNA sequence into a Biopython Seq object
    dna_seq = Seq(dna_sequence)
    
    # List of all possible DNA nucleotides
    nucleotides = ['A', 'T', 'C', 'G']
    
    # Randomly choose a position in the DNA sequence
    position = random.randint(0, len(dna_seq) - 1)
    
    # Find the original nucleotide at the chosen position
    original_nucleotide = dna_seq[position]
    
    # Choose a new nucleotide that is different from the original
    new_nucleotide = random.choice([n for n in nucleotides if n != original_nucleotide])
    
    # Create the mutated DNA by replacing the original nucleotide with the new one
    mutated_dna = dna_seq[:position] + new_nucleotide + dna_seq[position + 1:]
    
    # Return the mutated DNA and details of the mutation
    return mutated_dna, position, original_nucleotide, new_nucleotide

# Example usage:
# Original DNA sequence
dna_sequence = "ATGCGTACGTAG"

# Call the function and get the mutated DNA and details
mutated_dna, position, original, new = simulate_point_mutation(dna_sequence)

# Print the results
print(f"Original DNA: {dna_sequence}")
print(f"Mutated DNA: {mutated_dna}")
print(f"Mutation at position {position}: {original} -> {new}")

```
</details>
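The objective also asks about the effect on the protein, which the code above does not check yet. One way to do it is to translate the affected codon before and after the substitution and compare the amino acids. The sketch below does this with a hypothetical helper, `classify_point_mutation`. It assumes the reading frame starts at position 0 and uses the standard genetic code.

```python
from Bio.Seq import Seq

def classify_point_mutation(dna_sequence, position, new_nucleotide):
    """Return (effect, original protein, mutated protein) for one base substitution."""
    original = Seq(dna_sequence)
    mutated = original[:position] + new_nucleotide + original[position + 1:]

    codon_start = position - position % 3           # first base of the affected codon
    old_aa = original[codon_start:codon_start + 3].translate()
    new_aa = mutated[codon_start:codon_start + 3].translate()

    if old_aa == new_aa:
        effect = "silent"                           # same amino acid
    elif new_aa == "*":
        effect = "nonsense"                         # premature stop codon
    elif old_aa == "*":
        effect = "stop-loss"                        # stop codon replaced by an amino acid
    else:
        effect = "missense"                         # different amino acid
    return effect, str(original.translate()), str(mutated.translate())

print(classify_point_mutation("ATGCGTACGTAG", 5, "C"))   # CGT -> CGC (both Arg)
print(classify_point_mutation("ATGCGTACGTAG", 3, "T"))   # CGT -> TGT (Arg -> Cys)
print(classify_point_mutation("ATGCGTACGTAG", 11, "C"))  # TAG -> TAC (stop -> Tyr)
print(classify_point_mutation("ATGTGGTAA", 5, "A"))      # TGG -> TGA (Trp -> stop)
```

To classify each random mutation from the simulation, call `classify_point_mutation(dna_sequence, position, new)` with the values returned by `simulate_point_mutation`.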

### 🔶 **Task 5/7: Calculate the Melting Temperature (Tm) of DNA**

🎯 **Objective**: Use BioPython to calculate the melting temperature of a DNA sequence (the expected output below uses the Wallace rule).

Use `dna_sequence = "ATGCGTACGTAG"`

In [None]:
from Bio.Seq import Seq
from Bio.SeqUtils import MeltingTemp as mt

def calculate_melting_temperature_biopython(dna_sequence):
  # 🔸 Continue your code here

# Example usage:
dna_sequence = "ATGCGTACGTAG"
tm = calculate_melting_temperature_biopython(dna_sequence)
print("Melting Temperature (Tm):", tm, "°C")

Melting Temperature (Tm): 36.0 °C


<details>
<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

from Bio.Seq import Seq
from Bio.SeqUtils import MeltingTemp as mt

def calculate_melting_temperature_biopython(dna_sequence):
    dna_seq = Seq(dna_sequence)
    tm = mt.Tm_Wallace(dna_seq)  # Wallace rule: 4 °C per G/C + 2 °C per A/T
    return tm

# Example usage:
dna_sequence = "ATGCGTACGTAG"
tm = calculate_melting_temperature_biopython(dna_sequence)
print("Melting Temperature (Tm):", tm, "°C")
```
</details>
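`Tm_Wallace` implements the Wallace rule of thumb, $T_m = 4\,^\circ\text{C} \times (G + C) + 2\,^\circ\text{C} \times (A + T)$, which is intended for short primers of roughly 14 to 20 nt. For `ATGCGTACGTAG` (6 G/C and 6 A/T), that gives 24 + 12 = 36 °C. The sketch below reproduces this by hand and compares it with Biopython. It also prints `MeltingTemp.Tm_NN`, a nearest-neighbour thermodynamic method that is usually preferred for more precise work. The helper `wallace_tm` is illustrative only.

```python
from Bio.SeqUtils import MeltingTemp as mt

def wallace_tm(dna_sequence):
    """Wallace rule: 4 degrees C per G/C plus 2 degrees C per A/T."""
    seq = dna_sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at

dna_sequence = "ATGCGTACGTAG"
manual_tm = wallace_tm(dna_sequence)
biopython_tm = mt.Tm_Wallace(dna_sequence)

print("Manual Wallace Tm:", manual_tm, "°C")
print("Biopython Tm_Wallace:", biopython_tm, "°C")
print("Nearest-neighbour Tm_NN:", round(mt.Tm_NN(dna_sequence), 2), "°C")
```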

### 🔶 **Task 6/7: Calculate GC Content and Molecular Weight**

🎯 **Objective**: Calculate the GC content and the molecular weight of a DNA sequence using SeqUtils.

Use `dna_seq = "ATGCGTACGTAG"` as the DNA sequence

#### 💡 **GC Content**

GC content is the percentage of guanine (G) and cytosine (C) bases in a DNA or RNA molecule. It is calculated as:

$$
\text{GC Content} = \left( \frac{\text{Number of G + C bases}}{\text{Total number of bases}} \right) \times 100
$$

GC content affects DNA stability, as G-C pairs form **three hydrogen bonds** (compared to two in A-T pairs), making GC-rich regions more thermally stable.
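For the example sequence `ATGCGTACGTAG` (4 G, 2 C, 12 bases in total), the formula gives the same 50.0 % that `gc_fraction` reported earlier:

$$
\text{GC Content} = \left( \frac{4 + 2}{12} \right) \times 100 = 50\%
$$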

#### 💡 **Molecular Weight of Nucleotides**

Each nucleotide has a characteristic average molecular weight. The values below are for free nucleoside 5'-monophosphates, in Daltons (Da). The DNA values match those used in Task 7:
<div align='center'>

| Nucleotide (base) | Molecular Weight (Da) |
|:-----------------:|:---------------------:|
| Adenine (A)       | ~331.22               |
| Thymine (T)       | ~322.21               |
| Guanine (G)       | ~347.22               |
| Cytosine (C)      | ~307.20               |
| Uracil (U, RNA)   | ~324.18               |

</div>

⚠️ **Note:** For every nucleotide added to the chain (beyond the first), one water molecule (H₂O, ~18.015 Da) is lost when the phosphodiester bond forms.
<br> <br>
Therefore, the molecular weight is computed as the sum of the weights of all nucleotides in the sequence, minus the weight of one water molecule for each phosphodiester bond:

$$
\text{Molecular Weight} = \sum \text{Nucleotide Weights} - (n - 1) \times \text{H}_2\text{O Weight}
$$

Where $n$ is the number of nucleotides.
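As a worked example, take the DNA nucleotide weights used in Task 7. `ATGCGTACGTAG` contains 3 A, 3 T, 4 G, and 2 C ($n = 12$), so:

$$
\begin{aligned}
3(331.2218) + 3(322.2085) + 4(347.2212) + 2(307.1971) &= 3963.5699 \\
\text{Molecular Weight} = 3963.5699 - (12 - 1) \times 18.015 &= 3765.4049 \approx 3765.40\ \text{Da}
\end{aligned}
$$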

In [None]:
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight

# Step 1: Define a DNA sequence
dna_seq = Seq("ATGCGTACGTAG")

# Step 2: Calculate the GC content
# 🔸 Continue your code here


# Step 3: Calculate the molecular weight of the DNA sequence
# 🔸 Continue your code here

GC Content: 50.0 %
Molecular Weight: 3765.4015999999997 Da


<details>
<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight

# Step 1: Define a DNA sequence
dna_seq = Seq("ATGCGTACGTAG")

# Step 2: Calculate the GC content
gc_content = gc_fraction(dna_seq) * 100  # Convert to percentage
print("GC Content:", gc_content, "%")

# Step 3: Calculate the molecular weight of the DNA sequence
mol_weight = molecular_weight(dna_seq, seq_type='DNA')
print("Molecular Weight:", mol_weight, "Da")

```
</details>

### 🔶 **Task 7/7: Manually Calculate GC Content and Molecular Weight**

🎯 **Objective**: Calculate the GC content and molecular weight of a DNA sequence in Python without using any external libraries.

In [None]:
def calculate_gc_content(dna_seq):
    """Calculate GC content of a DNA sequence."""
    g_count = dna_seq.count('G')
    c_count = dna_seq.count('C')
    total_bases = len(dna_seq)
    # 🔸 Continue your code here
    # gc_content = 
    
    return gc_content

def calculate_molecular_weight(dna_seq):
    """Calculate molecular weight of a DNA sequence."""
    # Molecular weights of DNA bases with 5'-phosphate group
    weights = {
        'A': 331.2218,  # Adenine
        'T': 322.2085,  # Thymine
        'G': 347.2212,  # Guanine
        'C': 307.1971   # Cytosine
    }
    total_weight = 0 #Store your weight here

    # Sum weights of individual bases
    for base in dna_seq:
        # 🔸 Continue your code here

    # Subtract the weight of water (18.015) for each phosphodiester bond. 💡 Hint: Use len(dna_seq)
    # 🔸 Continue your code here

    return total_weight

# Input DNA sequence
dna_sequence = "ATGCGTACGTAG"

# Calculate GC content and molecular weight
gc_content = calculate_gc_content(dna_sequence)
mol_weight = calculate_molecular_weight(dna_sequence)

# Print results
print(f"DNA Sequence: {dna_sequence}")
print(f"GC Content: {gc_content:.2f}%")
print(f"Molecular Weight: {mol_weight:.2f} Da")



DNA Sequence: ATGCGTACGTAG
GC Content: 50.00%
Molecular Weight: 3765.40 Da


<details>
<summary><font color="Orange">Click here to reveal the answer</font></summary>

```python

def calculate_gc_content(dna_seq):
    """Calculate GC content of a DNA sequence."""
    g_count = dna_seq.count('G')
    c_count = dna_seq.count('C')
    total_bases = len(dna_seq)
    gc_content = ((g_count + c_count) / total_bases) * 100
    return gc_content

def calculate_molecular_weight(dna_seq):
    """Calculate molecular weight of a DNA sequence."""
    # Molecular weights of DNA bases with 5'-phosphate group
    weights = {
        'A': 331.2218,  # Adenine
        'T': 322.2085,  # Thymine
        'G': 347.2212,  # Guanine
        'C': 307.1971   # Cytosine
    }
    total_weight = 0

    # Add weights of individual bases
    for base in dna_seq:
        if base in weights:
            total_weight += weights[base]

    # Subtract the weight of water (18.015) for each phosphodiester bond
    total_weight -= 18.015 * (len(dna_seq) - 1)

    return total_weight

# Input DNA sequence
dna_sequence = "ATGCGTACGTAG"

# Calculate GC content and molecular weight
gc_content = calculate_gc_content(dna_sequence)
mol_weight = calculate_molecular_weight(dna_sequence)

# Print results
print(f"DNA Sequence: {dna_sequence}")
print(f"GC Content: {gc_content:.2f}%")
print(f"Molecular Weight: {mol_weight:.2f} Da")

```
</details>
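As a quick check, you can compare the manual result with `Bio.SeqUtils.molecular_weight` (this sketch assumes Biopython is installed as in Part 1). The two agree to within a few thousandths of a dalton. The difference comes from the water mass: the manual version subtracts 18.015 Da per bond, while Biopython's average-mass calculation appears to use 18.0153 Da. With 11 bonds, that value reproduces the 3765.4016 Da shown in the SeqUtils output earlier.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

# Same DNA nucleotide weights as the manual solution above
WEIGHTS = {"A": 331.2218, "T": 322.2085, "G": 347.2212, "C": 307.1971}
WATER = 18.015  # Da removed per phosphodiester bond in the manual formula

def manual_molecular_weight(dna_seq):
    """Sum of nucleotide weights minus one water per phosphodiester bond."""
    return sum(WEIGHTS[base] for base in dna_seq) - WATER * (len(dna_seq) - 1)

dna_sequence = "ATGCGTACGTAG"
manual = manual_molecular_weight(dna_sequence)
library = molecular_weight(Seq(dna_sequence), seq_type="DNA")

print(f"Manual:     {manual:.4f} Da")
print(f"Biopython:  {library:.4f} Da")
print(f"Difference: {abs(manual - library):.4f} Da")
```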

#### Contributed by: Ahmed Bahaj

- [LinkedIn Profile](https://www.linkedin.com/in/ahmed-bahaj-6330031b8/)