# Identifying Specific Motifs (Short Sequences of Interest) Within DNA Sequences

**Steps to Perform Motif Finding** 
1. Define a Function to Find Motifs in a Sequence
2. Parse the FASTA File and Use the Function
3. Print the Results

### Step 1: Define a Function to Find Motifs in a Sequence

We'll create a function that takes a DNA sequence and a motif as input and returns the 0-based start positions where the motif is found.

**1. Imports**

In [1]:
from Bio import SeqIO
import re

**2. Find Motifs Function:**

**re.finditer(motif, str(sequence)):** Returns an iterator over the non-overlapping matches of the motif in the sequence. The motif is interpreted as a regular expression.

**[m.start() for m in re.finditer(motif, str(sequence))]:** Collects the 0-based start position of each match into a list.

In [2]:
def find_motifs(sequence, motif):
    return [m.start() for m in re.finditer(motif, str(sequence))]
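
One caveat: because `re.finditer` reports only non-overlapping matches, a run such as `TTTTTT` yields fewer `TTT` hits than a base-by-base scan would. The sketch below shows one possible variant and is not part of the original notebook; the name `find_motifs_overlapping` is hypothetical. It wraps the motif in a zero-width lookahead so that overlapping occurrences are reported, and uses `re.escape` so the motif is matched literally.

```python
import re

def find_motifs_overlapping(sequence, motif):
    # Hypothetical helper: the lookahead (?=...) consumes no characters,
    # so the scan advances one base at a time and overlapping hits are kept.
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, str(sequence))]

# Non-overlapping scan, equivalent to find_motifs
print([m.start() for m in re.finditer("TTT", "TTTTTT")])  # [0, 3]
# Overlapping scan
print(find_motifs_overlapping("TTTTTT", "TTT"))  # [0, 1, 2, 3]
```

Which behavior is appropriate depends on the question being asked: counting distinct motif sites usually calls for the non-overlapping scan, while locating every possible start position calls for the overlapping one.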

### Step 2: Parse the FASTA File and Use the Function

Parse the FASTA file, find motifs in each sequence, and print the results.

In [3]:
# Function to parse a FASTA file and find motifs
def parse_fasta_find_motifs(file_path, motif):
    for record in SeqIO.parse(file_path, "fasta"):
        positions = find_motifs(record.seq, motif)
        print(f"ID: {record.id}")
        print(f"Motif '{motif}' found at positions: {positions}\n")

In [4]:
fasta_file = "Example1.fasta"

In [6]:
motif = "TTT"  # Replace with your motif of interest
if fasta_file:
    parse_fasta_find_motifs(fasta_file, motif)
else:
    print("No FASTA file specified.")

ID: sequence1
Motif 'TTT' found at positions: []

ID: sequence2
Motif 'TTT' found at positions: [2, 27]

ID: sequence3
Motif 'TTT' found at positions: [14]
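
The positions printed above are 0-based Python string indices, whereas sequence coordinates are conventionally reported 1-based. A minimal sketch of the conversion follows; `to_one_based` is a hypothetical helper name, and the input list is the `sequence2` result shown above.

```python
def to_one_based(positions, motif_length):
    # Convert 0-based start indices into 1-based, inclusive (start, end) pairs.
    return [(p + 1, p + motif_length) for p in positions]

print(to_one_based([2, 27], len("TTT")))  # [(3, 5), (28, 30)]
```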



### Find Motifs in a FASTA Sequence Fetched from the Database

In [7]:
from Bio import Entrez

In [8]:
from io import StringIO

In [9]:
# Set email for NCBI Entrez
Entrez.email = "k26sangeetha@gmail.com"  # Replace with your email

In [10]:
# Function to fetch FASTA data from NCBI
def fetch_fasta_from_ncbi(query, database="nucleotide"):
    handle = Entrez.esearch(db=database, term=query, retmax=1)
    record = Entrez.read(handle)
    handle.close()
    if record["IdList"]:
        seq_id = record["IdList"][0]
        handle = Entrez.efetch(db=database, id=seq_id, rettype="fasta", retmode="text")
        fasta_data = handle.read()
        handle.close()
        return fasta_data
    else:
        return None

In [11]:
# Free-text query: esearch returns the top hit, which may not be a human record
query = "Homo sapiens COX1"
fasta_data = fetch_fasta_from_ncbi(query)
print(fasta_data)

>PP914118.1 Taenia solium isolate B cytochrome c oxidase subunit I (COX1) gene, partial cds; mitochondrial
TAGATTTTTTAATGTTTTCTTTACATTTAGCTGGTGTATCAAGTATTTTTAGTTCTATTAATTTTATATG
TACATTATATAGAGTTTTTATGACTAATATATTTTCTCGTACATCTATAGTGTTATGATCTTATTTATTT
ACATCTATCTTGTTATTGGTTACTTTACCTGTTTTGGCAGCCGCTGTTACTATGCTTCTATTTGATCGTA
AATTTAGTTCTGCGTTTTTTGATCCGTTAGGAGGTGGTGATCCTGTTTTATTTCAACATATGTTTTGATT
TTTTGGTCATCCTGAGGTTTATGTGTTAATTCTTCCGGGGTTTGGTATAATTAGTCATATATGTTTGAGT
ATAAGTATGTGTTCTGATGCTTTTGGCTTTTATGGGTTATTGTTTGCTATGTTTTCAATAGTATGTTTAG
GAAGAAGTGTATGAGGGCATCATATGTTTACGGTTGGGTTAGATGTTAAGACGGCTGTATTTTTTAGTTC
TGTTACTATGATAATTGGAGTGCCTACGGGGATTAAGGTTTTTACTTGGCTTTATATGCTTTTAAAATCT
CGTGTTAATAAGAGTGATCCGGTTTTATGATGAATAATTTCGTTTATAGTATTGTTTACATTTGGTGGTG
TAACTGGTATTATTCTATCTGCTTGTGTATTAGATAAAGTTCTTCATGATACTTGGTTTGTTGTTGCTCA
TTTTCATT
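
The record returned here is from *Taenia solium* rather than *Homo sapiens*, since the search terms are not tied to any particular field. One way to narrow the search is to use Entrez field tags such as `[Organism]` and `[Gene]`. The sketch below only builds the query string; `build_entrez_query` is a hypothetical helper, and whether a given gene symbol matches depends on how the records are annotated.

```python
def build_entrez_query(organism, gene):
    # Hypothetical helper: restrict each term to a specific Entrez field.
    return f'"{organism}"[Organism] AND {gene}[Gene]'

query = build_entrez_query("Homo sapiens", "COX1")
print(query)  # "Homo sapiens"[Organism] AND COX1[Gene]
```

The resulting string can be passed as `query` to `fetch_fasta_from_ncbi` in place of the free-text version.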




In [15]:
# Function to parse a FASTA-formatted string and find motifs
def parse_fasta_find_motifs_db(fasta_string, motif):
    fasta_io = StringIO(fasta_string)
    for record in SeqIO.parse(fasta_io, "fasta"):
        positions = find_motifs(record.seq, motif)
        print(f"ID: {record.id}")
        print(f"Motif '{motif}' found at positions: {positions}\n")

In [16]:
# Example usage
motif = "ATG"
if fasta_data:
    parse_fasta_find_motifs_db(fasta_data, motif)
else:
    print("No data fetched.")

ID: PP914118.1
Motif 'ATG' found at positions: [11, 67, 89, 124, 191, 269, 300, 340, 356, 366, 381, 398, 412, 430, 443, 462, 497, 545, 586, 589, 675]
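
Because `find_motifs` passes the motif straight to `re.finditer`, the motif can also be a regular expression, which allows degenerate motifs. The sketch below is an illustration on a made-up sequence, not data from this notebook; it searches for `TATA` followed by either `A` or `T`.

```python
import re

def find_motifs(sequence, motif):
    # Same function as in Step 1, repeated so the example is self-contained.
    return [m.start() for m in re.finditer(motif, str(sequence))]

sequence = "GGTATAAGCTATATCC"  # made-up example sequence
print(find_motifs(sequence, "TATA[AT]"))  # [2, 9]
```

The flip side is that regex metacharacters in a motif are not matched literally unless the motif is first passed through `re.escape`.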

