**Homology Analysis with BLAST**

In [4]:
from Bio import SeqIO
from Bio.Blast import NCBIWWW, NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import re

def blast_and_filter(gene_names, e_value_threshold=1e-5, percent_identity_threshold=50, coverage_threshold=50):
    """Run a remote blastp search for each gene and save the hits that pass the filters as FASTA."""
    for name_gene in gene_names:
        # Read the query sequence; skip the gene if its FASTA file is missing
        try:
            query_seq = SeqIO.read(f"genes/{name_gene}.fasta", "fasta")
        except FileNotFoundError:
            print(f"File not found for {name_gene}. Skipping...")
            continue

        # Run the BLAST search remotely on the NCBI servers
        print(f"Starting BLAST search for {name_gene}...")
        result_handle = NCBIWWW.qblast("blastp", "nr", query_seq.seq)
        print(f"BLAST search finished for {name_gene}.")

        # Parse and filter the results
        blast_records = NCBIXML.parse(result_handle)
        output_path = f"genes/{name_gene}_blast.fasta"

        with open(output_path, "w") as output_handle:
            for blast_record in blast_records:
                print(f"Number of alignments found for {name_gene}:", len(blast_record.alignments))
                for alignment in blast_record.alignments:
                    print("Alignment title:", alignment.title)
                    for hsp in alignment.hsps:
                        # Coverage is the span of query residues in this HSP; align_length
                        # also counts gap columns, so it could push the value above 100%
                        query_cover = (hsp.query_end - hsp.query_start + 1) / blast_record.query_letters * 100
                        print(f"HSP: E-value: {hsp.expect}, Identities: {hsp.identities}, "
                              f"Align length: {hsp.align_length}, Query Cover: {query_cover:.2f}%")

                        percent_identity = (hsp.identities / hsp.align_length) * 100
                        if (hsp.expect <= e_value_threshold and
                            percent_identity >= percent_identity_threshold and
                            query_cover >= coverage_threshold):

                            # The organism name is the first bracketed text in the title
                            species_match = re.search(r"\[(.*?)\]", alignment.title)
                            species = species_match.group(1) if species_match else "Unknown species"

                            SeqIO.write(
                                SeqRecord(
                                    # Wrap the aligned subject in a Seq and drop gap characters
                                    seq=Seq(hsp.sbjct.replace("-", "")),
                                    id=alignment.accession,
                                    description=f"E-value: {hsp.expect:.2e}, Identities: {hsp.identities}/{hsp.align_length}, "
                                                f"Query Cover: {query_cover:.2f}%, Percent Identity: {percent_identity:.2f}%, "
                                                f"Species: {species}"
                                ),
                                output_handle,
                                "fasta"
                            )
                            break  # Keep only the first HSP that passes the filters for each alignment

        result_handle.close()
        print(f"Filtered BLAST results for {name_gene} were saved to '{output_path}'")
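The three thresholds applied inside `blast_and_filter` can be checked without a network call by isolating the arithmetic in small helpers. The sketch below is only an illustration: `hsp_metrics` and `passes_filters` are hypothetical names, not part of the notebook or of Biopython, and the sketch assumes query coverage is measured over the span of query residues covered by the HSP (1-based, inclusive coordinates), so alignment gap columns are not counted.

```python
def hsp_metrics(identities, align_length, query_start, query_end, query_length):
    """Return (percent_identity, query_cover) for one HSP, both in percent."""
    percent_identity = identities / align_length * 100
    # Span of query residues covered by the HSP (1-based, inclusive coordinates).
    query_cover = (query_end - query_start + 1) / query_length * 100
    return percent_identity, query_cover


def passes_filters(expect, percent_identity, query_cover,
                   e_value_threshold=1e-5,
                   percent_identity_threshold=50,
                   coverage_threshold=50):
    """Apply the same three cut-offs used as defaults by blast_and_filter."""
    return (expect <= e_value_threshold
            and percent_identity >= percent_identity_threshold
            and query_cover >= coverage_threshold)


# Identities and alignment length come from the first ptsP hit in this notebook.
# The query coordinates 1..547 and query length 547 are assumptions chosen to be
# consistent with the reported 100% coverage.
identity, cover = hsp_metrics(547, 547, 1, 547, 547)
print(passes_filters(0.0, identity, cover))
```

Keeping the thresholds in a pure function makes it easy to try stricter cut-offs on saved results before launching another remote search.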



#### **1. ptsP Gene**

In [5]:
gene_names = ["ptsP"]
blast_and_filter(gene_names)

Starting BLAST search for ptsP...
BLAST search finished for ptsP.
Number of alignments found for ptsP: 50
Alignment title: ref|WP_005925321.1| phosphoenolpyruvate--protein phosphotransferase [Faecalibacterium prausnitzii] >gb|EDP19718.1| phosphoenolpyruvate-protein phosphotransferase [Faecalibacterium prausnitzii M21/2] >gb|MCI3184523.1| phosphoenolpyruvate--protein phosphotransferase [Faecalibacterium prausnitzii] >gb|MCI3202328.1| phosphoenolpyruvate--protein phosphotransferase [Faecalibacterium prausnitzii] >gb|MDU8657129.1| phosphoenolpyruvate--protein phosphotransferase [Faecalibacterium prausnitzii] >gb|MDW2997156.1| phosphoenolpyruvate--protein phosphotransferase [Faecalibacterium prausnitzii]
HSP: E-value: 0.0, Identities: 547, Align length: 547, Query Cover: 100.00%
Alignment title: ref|WP_097783314.1| phosphoenolpyruvate--protein phosphotransferase [Faecalibacterium prausnitzii] >gb|MDU8670066.1| phosphoenolpyruvate--protein phosphotransferase [Faecalibacterium prausnitzii] >gb|MDU872451
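Each alignment title in the output above concatenates several database entries separated by ` >`, so a single search for the first bracketed text reports only the first organism. A minimal sketch for listing every accession and organism in a title follows. `split_title` is a hypothetical helper, not a Biopython function, and it assumes each entry has the layout `db|accession| description [organism]`; entries that do not match that layout (for example, a truncated one) are skipped.

```python
import re


def split_title(title):
    """Return a list of (accession, organism) pairs found in a BLAST hit title."""
    entries = []
    for entry in title.split(" >"):
        parts = entry.split("|", 2)
        if len(parts) != 3:
            continue  # skip entries that do not follow the db|accession|text layout
        _db, accession, text = parts
        # The organism is assumed to be the last bracketed text of the entry.
        match = re.search(r"\[([^\[\]]*)\]\s*$", text)
        organism = match.group(1) if match else "Unknown species"
        entries.append((accession, organism))
    return entries


# Shortened title built from the first two entries of the first ptsP hit.
example_title = (
    "ref|WP_005925321.1| phosphoenolpyruvate--protein phosphotransferase "
    "[Faecalibacterium prausnitzii] >gb|EDP19718.1| phosphoenolpyruvate-protein "
    "phosphotransferase [Faecalibacterium prausnitzii M21/2]"
)
for accession, organism in split_title(example_title):
    print(accession, organism, sep="\t")
```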



#### **2. ButyrylCoA Gene**

In [6]:
gene_names = ["butyrylCoA"]
blast_and_filter(gene_names)

Starting BLAST search for butyrylCoA...
BLAST search finished for butyrylCoA.
Number of alignments found for butyrylCoA: 50
Alignment title: ref|WP_044960620.1| MULTISPECIES: butyryl-CoA:acetate CoA-transferase [Faecalibacterium] >gb|MBP9564639.1| butyryl-CoA:acetate CoA-transferase [Faecalibacterium sp.] >gb|AXB28579.1| butyryl-CoA:acetate CoA-transferase [Faecalibacterium prausnitzii] >gb|MBV0896480.1| butyryl-CoA:acetate CoA-transferase [Faecalibacterium prausnitzii] >gb|MBV0926594.1| butyryl-CoA:acetate CoA-transferase [Faecalibacterium prausnitzii] >gb|MCG4793536.1| butyryl-CoA:acetate CoA-transferase [Faecalibacterium prausnitzii]
HSP: E-value: 0.0, Identities: 448, Align length: 448, Query Cover: 100.00%
Alignment title: ref|WP_097783900.1| MULTISPECIES: butyryl-CoA:acetate CoA-transferase [Faecalibacterium] >gb|MDR3769479.1| butyryl-CoA:acetate CoA-transferase [Faecalibacterium sp.] >gb|UYI72064.1| MAG: butyryl-CoA:acetate CoA-transferase [Oscillospiraceae bacterium] >gb|MBD8928006.1| butyryl-Co
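The FASTA files written by `blast_and_filter` store the filter metrics in each record description, so they can be read back into numbers for later comparison between genes. The sketch below parses one such description. `parse_description` is a hypothetical helper, and the example string is constructed from the format used in `blast_and_filter` together with the first butyrylCoA hit shown above; it is not copied from an actual output file.

```python
import re

# Mirrors the description format written by blast_and_filter.
HEADER_PATTERN = re.compile(
    r"E-value: (?P<evalue>\S+), "
    r"Identities: (?P<identities>\d+)/(?P<align_length>\d+), "
    r"Query Cover: (?P<query_cover>[\d.]+)%, "
    r"Percent Identity: (?P<percent_identity>[\d.]+)%, "
    r"Species: (?P<species>.+)$"
)


def parse_description(description):
    """Return the metrics stored in a record description, or None if it does not match."""
    match = HEADER_PATTERN.search(description)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "evalue": float(fields["evalue"]),
        "identities": int(fields["identities"]),
        "align_length": int(fields["align_length"]),
        "query_cover": float(fields["query_cover"]),
        "percent_identity": float(fields["percent_identity"]),
        "species": fields["species"],
    }


# Constructed example (assumption): what the description of the first
# butyrylCoA hit would look like under the format above.
example = ("E-value: 0.00e+00, Identities: 448/448, Query Cover: 100.00%, "
           "Percent Identity: 100.00%, Species: Faecalibacterium")
print(parse_description(example))
```

In practice the description would come from `record.description` while iterating over `SeqIO.parse(path, "fasta")`.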