
### DNA Sequence Analyzer

This section defines functions for analyzing DNA sequences. We start with basic statistics: sequence length, per-base counts, and GC content (the percentage of the sequence made up of G and C bases).

In [2]:
def analyze_dna_sequence(dna_sequence):
    """
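    Analyzes a DNA sequence to provide its length, base counts, and GC content.

    Characters other than A, T, C, and G are not tallied in the base counts,
    but they still count toward the sequence length used for GC content.

    Args:
        dna_sequence (str): The DNA sequence string (e.g., 'ATGCAGTG').
            Case-insensitive.

    Returns:
        dict: Keys 'sequence_length', 'A_count', 'T_count', 'C_count',
            'G_count', and 'GC_content' (a percentage rounded to 2 decimals).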
    """
    dna_sequence = dna_sequence.upper()  # Normalize case so lowercase bases are counted too

    # Calculate sequence length
    seq_length = len(dna_sequence)

    # Calculate base counts
    a_count = dna_sequence.count('A')
    t_count = dna_sequence.count('T')
    c_count = dna_sequence.count('C')
    g_count = dna_sequence.count('G')

    # Calculate GC content
    gc_content = ((c_count + g_count) / seq_length * 100) if seq_length > 0 else 0.0

    analysis_results = {
        "sequence_length": seq_length,
        "A_count": a_count,
        "T_count": t_count,
        "C_count": c_count,
        "G_count": g_count,
        "GC_content": round(gc_content, 2) # Round to 2 decimal places
    }

    return analysis_results
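
Because the function above divides by the full string length, any character other than A, T, C, or G (for example the ambiguity code N, a gap, or stray whitespace) lowers the reported GC content without appearing in any count. The sketch below is one possible way to surface such characters before analysis; `count_unexpected_bases` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
from collections import Counter

VALID_BASES = {"A", "T", "C", "G"}

def count_unexpected_bases(dna_sequence):
    """Hypothetical helper: tally every character that is not A, T, C, or G."""
    counts = Counter(dna_sequence.upper())
    return {char: n for char, n in counts.items() if char not in VALID_BASES}

# A sequence with two N's and one gap character
print(count_unexpected_bases("ATGNNC-a"))  # {'N': 2, '-': 1}
```

An empty result means every character is one of the four standard bases, so the length used for GC content matches the sum of the base counts.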

Let's test our `analyze_dna_sequence` function with a sample DNA sequence.

In [4]:
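# Example usage:
import pandas as pd

sample_dna = "ATGCGTACGTTAG"
analysis = analyze_dna_sequence(sample_dna)

# Display the results as a one-column table; display() is provided by the
# notebook environment. Transposing mixes the integer counts with the float
# GC content, so pandas shows every value as a float.
display(pd.DataFrame([analysis]).T.rename(columns={0: 'Value'}))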

| | Value |
|---|---|
| sequence_length | 13.0 |
| A_count | 3.0 |
| T_count | 4.0 |
| C_count | 2.0 |
| G_count | 4.0 |
| GC_content | 46.15 |
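
The GC content value can be checked by hand: the sample contains 2 C and 4 G among 13 bases, so (2 + 4) / 13 × 100 ≈ 46.15. As a sanity check, the short sketch below repeats that arithmetic without calling `analyze_dna_sequence`; the variable names `gc_bases` and `gc_percent` are illustrative only.

```python
sample_dna = "ATGCGTACGTTAG"

# Count bases that are G or C, then express them as a percentage of the length
gc_bases = sum(1 for base in sample_dna if base in "GC")
gc_percent = round(gc_bases / len(sample_dna) * 100, 2)

print(gc_bases, len(sample_dna), gc_percent)  # 6 13 46.15
```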
