# Nucleotide Counts and GC Sliding Window

## Description
This notebook demonstrates:
- Counting nucleotides from a DNA file
- Calculating GC content using a sliding window
- Using multiple methods to verify results

It introduces Python file handling, filtering of invalid (non-nucleotide) characters, and basic sequence-analysis techniques used in bioinformatics.

In [31]:
# 30-07-2025 Day 2
with open("dna.txt","r") as file:
    print(file.read())

AGTXCGAGXXTGCACTGAAZCGT
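The printed file contains characters that are not nucleotides (`X` and `Z`), which is why the later cells filter the sequence before counting. Below is a minimal sketch that reports each unexpected character and its position. It assumes the sequence printed above is pasted in as a string instead of being read from `dna.txt`, and the helper name `find_invalid_bases` is hypothetical.

```python
def find_invalid_bases(sequence, valid="ACGT"):
    """Return (index, character) pairs for characters outside `valid`.

    Indices refer to the raw text; whitespace such as a trailing
    newline is ignored. (Hypothetical helper, not from the notebook.)
    """
    return [
        (i, ch)
        for i, ch in enumerate(sequence.upper())
        if ch not in valid and not ch.isspace()
    ]

# Sequence as printed by the cell above (assumed to match dna.txt)
raw = "AGTXCGAGXXTGCACTGAAZCGT\n"
print(find_invalid_bases(raw))  # → [(3, 'X'), (8, 'X'), (9, 'X'), (19, 'Z')]
```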


In [34]:
# Challenge 1: Count nucleotides in a DNA file

with open("dna.txt","r") as file:
    sequence = file.read().upper()
frequency = {}  # newlines and non-ATCG characters (e.g. X, Z) are skipped, so multi-line files also work
for base in sequence:
    if base in 'ATCG':
        frequency[base] = frequency.get(base, 0) + 1
print(f"Frequency of each base:{frequency}")

Frequency of each base:{'A': 5, 'G': 6, 'T': 4, 'C': 4}
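As an independent check on the dictionary loop, `collections.Counter` from the standard library can tally the same bases, and the resulting counts also give the overall GC content of the cleaned sequence. This is a sketch that again uses the sequence string printed earlier rather than reading `dna.txt`.

```python
from collections import Counter

raw = "AGTXCGAGXXTGCACTGAAZCGT\n"  # assumed contents of dna.txt, as printed above

# Count only valid nucleotides; X, Z and the newline are skipped
counts = Counter(base for base in raw.upper() if base in "ACGT")
print(dict(counts))  # → {'A': 5, 'G': 6, 'T': 4, 'C': 4}

# Overall GC content = (G + C) / total number of valid bases
total = sum(counts.values())
gc_percent = round((counts["G"] + counts["C"]) / total * 100, 2)
print(f"Total bases: {total}, GC content: {gc_percent}%")  # → Total bases: 19, GC content: 52.63%
```

The `Counter` result matches the frequencies from the loop, which is a quick way to confirm the filtering logic.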


In [1]:
# Challenge 2: Calculate GC content in sliding windows

with open("dna.txt","r") as file:
    sequence = file.read().upper()

cleaned_sequence = ""
for base in sequence:
    if base in 'AGTC':
        cleaned_sequence += base    # repeated string concatenation can be slow on large files; "".join() of a list is the usual idiom

window_size = 5
for i in range(len(cleaned_sequence) - window_size + 1):
    window = cleaned_sequence[i:i+window_size]
    gc_window = 0
    for base in window:
        if base in 'GC':
            gc_window += 1
    gc_content = round(gc_window / window_size * 100, 2)

    print(f"Window: {window} -> GC Content: {gc_content}%")

Window: AGTCG -> GC Content: 60.0%
Window: GTCGA -> GC Content: 60.0%
Window: TCGAG -> GC Content: 60.0%
Window: CGAGT -> GC Content: 60.0%
Window: GAGTG -> GC Content: 60.0%
Window: AGTGC -> GC Content: 60.0%
Window: GTGCA -> GC Content: 60.0%
Window: TGCAC -> GC Content: 60.0%
Window: GCACT -> GC Content: 60.0%
Window: CACTG -> GC Content: 60.0%
Window: ACTGA -> GC Content: 40.0%
Window: CTGAA -> GC Content: 40.0%
Window: TGAAC -> GC Content: 40.0%
Window: GAACG -> GC Content: 60.0%
Window: AACGT -> GC Content: 40.0%
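The loop above recounts every base in each window, so the counting work grows with `window_size` times the number of windows. Consecutive windows differ by only one base at each end, so a running G+C count can instead be updated in constant time per step. Below is a sketch of that technique; the function name `sliding_gc` is hypothetical, and it should produce the same 15 windows and percentages as the cell above.

```python
def sliding_gc(sequence, window_size):
    """Yield (start, window, gc_percent) using a running G+C count.

    The count update is O(1) per step; the slice is taken only so the
    window can be printed. (Hypothetical helper, not from the notebook.)
    """
    n = len(sequence)
    if window_size <= 0 or window_size > n:
        return  # no complete window fits
    # G+C count of the first window
    gc = sum(1 for base in sequence[:window_size] if base in "GC")
    for i in range(n - window_size + 1):
        if i > 0:
            # Remove the base that left on the left, add the one that entered on the right
            if sequence[i - 1] in "GC":
                gc -= 1
            if sequence[i + window_size - 1] in "GC":
                gc += 1
        yield i, sequence[i:i + window_size], round(gc / window_size * 100, 2)

cleaned_sequence = "AGTCGAGTGCACTGAACGT"  # cleaned sequence from the cell above
for i, window, gc_content in sliding_gc(cleaned_sequence, 5):
    print(f"Window {i}: {window} -> GC Content: {gc_content}%")
```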


In [42]:
# Another method: build the cleaned sequence with a list comprehension and "".join()
with open("dna.txt","r") as file:
    sequence = file.read().upper()

clean_sequence = [base for base in sequence if base in 'ATGC']
cleaned_sequence = "".join(clean_sequence)

window_size = 5
for i in range(len(cleaned_sequence) - window_size + 1):
    window = cleaned_sequence[i:i+window_size]
    gc_window = 0
    for base in window:
        if base in 'GC':
            gc_window += 1
    gc_content = round(gc_window / window_size * 100, 2)
    print(f" Window {i}: {window} --> GC Content: {gc_content}%")

 Window 0: AGTCG --> GC Content: 60.0%
 Window 1: GTCGA --> GC Content: 60.0%
 Window 2: TCGAG --> GC Content: 60.0%
 Window 3: CGAGT --> GC Content: 60.0%
 Window 4: GAGTG --> GC Content: 60.0%
 Window 5: AGTGC --> GC Content: 60.0%
 Window 6: GTGCA --> GC Content: 60.0%
 Window 7: TGCAC --> GC Content: 60.0%
 Window 8: GCACT --> GC Content: 60.0%
 Window 9: CACTG --> GC Content: 60.0%
 Window 10: ACTGA --> GC Content: 40.0%
 Window 11: CTGAA --> GC Content: 40.0%
 Window 12: TGAAC --> GC Content: 40.0%
 Window 13: GAACG --> GC Content: 60.0%
 Window 14: AACGT --> GC Content: 40.0%
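A third variation can serve as a further cross-check: `re.sub` removes every non-ACGT character in one call, and `str.count` tallies G and C in each window. The function name `gc_windows` and its `step` parameter are illustrative additions, not part of the original notebook; with the default `step=1` it should reproduce the table above.

```python
import re

def gc_windows(sequence, window_size=5, step=1):
    """Return a list of (start, window, gc_percent) tuples.

    Hypothetical helper: `step` controls how far each window advances.
    """
    # Upper-case, then keep only A, C, G, T (drops X, Z, newlines, etc.)
    cleaned = re.sub(r"[^ACGT]", "", sequence.upper())
    results = []
    for i in range(0, len(cleaned) - window_size + 1, step):
        window = cleaned[i:i + window_size]
        gc = window.count("G") + window.count("C")
        results.append((i, window, round(gc / window_size * 100, 2)))
    return results

raw = "AGTXCGAGXXTGCACTGAAZCGT\n"  # assumed contents of dna.txt, as printed earlier
for i, window, gc_content in gc_windows(raw):
    print(f"Window {i}: {window} --> GC Content: {gc_content}%")
```

Setting `step=window_size` gives non-overlapping windows: for this 19-base sequence, `gc_windows(raw, window_size=5, step=5)` returns windows starting at 0, 5 and 10, and the last 4 bases, which cannot fill a full window, are left out.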
