## Analysis of the sequence and the features available at NCBI

Load the package and the required modules.

In [1]:
from Bio import Seq
from Bio import SeqIO

Open and read the GenBank file for the HHEX gene, then print its sequence.

In [7]:
record2 = SeqIO.read("hhex.gb", "genbank")
print(record2.seq)

AGCTCTGCGAGGGGCCGGAGCGCGGCGGAGCCATGCAGTACCCGCACCCCGGGCCGGCGGCGGGCGCCGTGGGGGTGCCGCTGTACGCGCCCACGCCGCTGCTGCAACCCGCACACCCGACGCCCTTTTACATCGAGGACATCCTGGGCCGCGGGCCCGCCGCGCCCACGCCCGCCCCCACGCTGCCGTCCCCCAACTCCTCCTTCACCAGCCTCGTGTCCCCCTACCGGACCCCGGTGTACGAGCCCACGCCGATCCATCCAGCCTTCTCGCACCACTCCGCCGCCGCGCTGGCCGCTGCCTACGGACCCGGCGGCTTCGGGGGCCCTCTGTACCCCTTCCCGCGGACGGTGAACGACTACACGCACGCCCTGCTCCGCCACGACCCCCTGGGCAAACCTCTACTCTGGAGCCCCTTCTTGCAGAGGCCTCTGCATAAAAGGAAAGGCGGCCAGGTGAGATTCTCCAACGACCAGACCATCGAGCTGGAGAAGAAATTCGAGACGCAGAAATATCTCTCTCCGCCCGAGAGGAAGCGTCTGGCCAAGATGCTGCAGCTCAGCGAGAGACAGGTCAAAACCTGGTTTCAGAATCGACGCGCTAAATGGAGGAGACTAAAACAGGAGAACCCTCAAAGCAATAAAAAAGAAGAACTGGAAAGTTTGGACAGTTCCTGTGATCAGAGGCAAGATTTGCCCAGTGAACAGAATAAAGGTGCTTCTTTGGATAGCTCTCAATGTTCGCCCTCCCCTGCCTCCCAGGAAGACCTTGAATCAGAGATTTCAGAGGATTCTGATCAGGAAGTGGACATTGAGGGCGATAAAAGCTATTTTAATGCTGGATGATGACCACTGGCATTGGCATGTTCAGAAAACTGGATTTAGGAATAATGTTTTGCTACAGAAAATCTTCATAGAAGAACTGGAAGGCTATATAAGAAAGGGAATCAATTCTCTGGTATTCTGGAAACCTAAAAATATTTGGTGCACTGCTCAATTAA

Length of the sequence (an mRNA, stored in GenBank using the DNA alphabet).

In [6]:
print("Sequence length:", len(record2.seq))

Sequence length: 1724


The ID, description and name of this record.

In [8]:
print("ID:", record2.id)

ID: NM_002729.5


In [12]:
print("Description:", record2.description)

Description: Homo sapiens hematopoietically expressed homeobox (HHEX), mRNA


In [13]:
print("Name:", record2.name)

Name: NM_002729
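
The `id` printed above carries a version suffix, while `name` is the bare accession. As a small illustrative sketch (plain Python, no Biopython call involved; the variable names are made up for this example), the two parts can be separated like this:

```python
# Versioned identifier, as printed by record2.id above.
record_id = "NM_002729.5"

# Split on the first dot: accession on the left, version on the right.
accession, _, version = record_id.partition(".")

print("Accession:", accession)    # same text as record2.name
print("Version:", int(version))
```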


The global annotations of this record.

In [14]:
print("Annotations:", record2.annotations)

Annotations: {'molecule_type': 'mRNA', 'topology': 'linear', 'data_file_division': 'PRI', 'date': '24-DEC-2022', 'accessions': ['NM_002729', 'NM_001529'], 'sequence_version': 5, 'keywords': ['RefSeq', 'MANE Select'], 'source': 'Homo sapiens (human)', 'organism': 'Homo sapiens', 'taxonomy': ['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires', 'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo'], 'references': [Reference(title='CK2-induced cooperation of HHEX with the YAP-TEAD4 complex promotes colorectal tumorigenesis', ...), Reference(title='Unraveling the Influence of HHEX Risk Polymorphism rs7923837 on Multiple Sclerosis Pathogenesis', ...), Reference(title='Integrated single-cell transcriptomics and epigenomics reveals strong germinal center-associated etiology of autoimmune risk loci', ...), Reference(title='Hhex inhibits cell migration via regulating RHOA/CDC42-CFL1 axis in human lung cancer cell

The organism and its full taxonomic classification.

In [15]:
print("Organism:", record2.annotations["source"])

Organism: Homo sapiens (human)


In [16]:
print("Taxonomic classification:", record2.annotations["taxonomy"])

Taxonomic classification: ['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires', 'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo']


The features of the record, with their type, location and qualifiers.

In [17]:
for feature in record2.features:
    print("Feature:", feature)

Feature: type: source
location: [0:1724](+)
qualifiers:
    Key: chromosome, Value: ['10']
    Key: db_xref, Value: ['taxon:9606']
    Key: map, Value: ['10q23.33']
    Key: mol_type, Value: ['mRNA']
    Key: organism, Value: ['Homo sapiens']

Feature: type: gene
location: [0:1724](+)
qualifiers:
    Key: db_xref, Value: ['GeneID:3087', 'HGNC:HGNC:4901', 'MIM:604420']
    Key: gene, Value: ['HHEX']
    Key: gene_synonym, Value: ['HEX; HMPH; HOX11L-PEN; PRH; PRHX']
    Key: note, Value: ['hematopoietically expressed homeobox']

Feature: type: exon
location: [0:393](+)
qualifiers:
    Key: gene, Value: ['HHEX']
    Key: gene_synonym, Value: ['HEX; HMPH; HOX11L-PEN; PRH; PRHX']
    Key: inference, Value: ['alignment:Splign:2.1.0']

Feature: type: CDS
location: [32:845](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['CCDS:CCDS7423.1', 'GeneID:3087', 'HGNC:HGNC:4901', 'MIM:604420']
    Key: gene, Value: ['HHEX']
    Key: gene_synonym, Value: ['HEX; HMPH; HOX11L-

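A more compact view: only the type and location of each feature. The output below spans 105,545 bp rather than 1,724 bp, so it appears to come from a run in which `record2` held a different, genomic record.

In [43]:
for feature in record2.features:
    print(feature.type)
    print(feature.location)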


source
[0:105545](+)
gene
[12177:>93366](+)
mRNA
join{[12177:12390](+), [13545:13644](+), [21881:21935](+), [31127:31205](+), [32704:32793](+), [33399:33539](+), [37780:37886](+), [40371:40417](+), [41738:41815](+), [42800:46226](+), [46628:46717](+), [55085:55257](+), [61046:61173](+), [63139:63330](+), [66422:66733](+), [69965:70053](+), [73709:73787](+), [74287:74328](+), [80525:80609](+), [86543:86598](+), [88466:88540](+), [89957:90018](+), [91858:93366](+)}
mRNA
join{[12177:12390](+), [13545:13644](+), [21881:21935](+), [31127:31205](+), [32704:32793](+), [33399:33539](+), [37780:37886](+), [40371:40417](+), [41738:41815](+), [42800:46226](+), [46628:46717](+), [55085:55257](+), [58261:58327](+), [61049:61173](+), [63139:63330](+), [66422:66733](+), [69965:70053](+), [73709:73787](+), [74287:74328](+), [80525:80609](+), [86543:86598](+), [88466:88540](+), [89957:90018](+), [91858:93366](+)}
mRNA
join{[12209:12384](+), [13545:13644](+), [31127:31205](+), [32704:32793](+), [33399:3

Coding sequences (CDS) associated with this record and their indices in the feature list.

In [22]:
indices = []
for i, feature in enumerate(record2.features):
    if feature.type == "CDS":
        indices.append(i)
print("Indices of the coding sequences:", indices)
for i in indices:
    print(record2.features[i])

Indices of the coding sequences: [3]
type: CDS
location: [32:845](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['CCDS:CCDS7423.1', 'GeneID:3087', 'HGNC:HGNC:4901', 'MIM:604420']
    Key: gene, Value: ['HHEX']
    Key: gene_synonym, Value: ['HEX; HMPH; HOX11L-PEN; PRH; PRHX']
    Key: note, Value: ['homeobox, hematopoietically expressed; proline-rich homeodomain-containing transcription factor; homeobox protein HEX; homeobox protein PRH']
    Key: product, Value: ['hematopoietically-expressed homeobox protein HHEX']
    Key: protein_id, Value: ['NP_002720.1']
    Key: translation, Value: ['MQYPHPGPAAGAVGVPLYAPTPLLQPAHPTPFYIEDILGRGPAAPTPAPTLPSPNSSFTSLVSPYRTPVYEPTPIHPAFSHHSAAALAAAYGPGGFGGPLYPFPRTVNDYTHALLRHDPLGKPLLWSPFLQRPLHKRKGGQVRFSNDQTIELEKKFETQKYLSPPERKRLAKMLQLSERQVKTWFQNRRAKWRRLKQENPQSNKKEELESLDSSCDQRQDLPSEQNKGASLDSSQCSPSPASQEDLESEISEDSDQEVDIEGDKSYFNAG']
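
Biopython reports locations as Python-style intervals, zero-based and end-exclusive, so `[32:845]` corresponds to `33..845` in the one-based notation of the GenBank flat file. The sketch below is plain Python with a hypothetical helper name (`to_genbank_span`); it converts between the two notations and checks that the interval holds a whole number of codons.

```python
def to_genbank_span(start: int, end: int) -> str:
    """Convert a 0-based, end-exclusive interval to 1-based inclusive text."""
    return f"{start + 1}..{end}"

# CDS location reported above for HHEX: [32:845](+)
cds_start, cds_end = 32, 845

cds_length = cds_end - cds_start            # nucleotides in the CDS
codons, remainder = divmod(cds_length, 3)   # whole codons and leftover bases

print(to_genbank_span(cds_start, cds_end))
print("CDS length:", cds_length, "nt")
print("Codons (including the stop codon):", codons)
print("Amino acids:", codons - 1)
```

The resulting 270 amino acids agree with the length of the `translation` qualifier shown above.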



The encoded protein and its biological significance, taken from the `protein_id` and `product` qualifiers of each CDS.

In [19]:
qualifiersid = []
qualifiersim = []
for a in indices:
    qualifiersid.append(record2.features[a].qualifiers["protein_id"])
    qualifiersim.append(record2.features[a].qualifiers["product"])
print("Encoded proteins and their biological significance:")
for c in range(len(indices)):
    print(qualifiersid[c], qualifiersim[c])

Encoded proteins and their biological significance:
['NP_002720.1'] ['hematopoietically-expressed homeobox protein HHEX']
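
The protein sequence itself can also be recomputed from the CDS coordinates. The following is a minimal sketch on a made-up toy sequence rather than on `hhex.gb`, assuming Biopython is installed; `toy_seq` and `toy_cds` are hypothetical names. It uses `SeqFeature.extract` to cut out the coding region and `Seq.translate` to translate it.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Toy transcript: 4-nt 5' UTR, a 15-nt CDS (ATG ... TAA), 2-nt 3' UTR.
toy_seq = Seq("AGCCATGCAGTACCCGTAAGG")

# CDS on the forward strand, 0-based and end-exclusive like the locations above.
toy_cds = SeqFeature(FeatureLocation(4, 19, strand=1), type="CDS")

cds_nt = toy_cds.extract(toy_seq)           # nucleotides of the coding region
protein = cds_nt.translate(to_stop=True)    # stop translating at the stop codon

print("CDS:", cds_nt)
print("Protein:", protein)
```

Applied to the CDS feature of the real record and to `record2.seq`, the same two calls would be expected to reproduce the `translation` qualifier, although that has not been run here.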


The number of genes annotated in the record (features of type "gene") and the strand on which each one is annotated. A strand value of 1 denotes the forward (+) strand and -1 the reverse (-) strand.

In [20]:
genes = []
for g in range(len(record2.features)):
    if record2.features[g].type == "gene":
        genes.append(g)
print("Number of genes:", len(genes))

Number of genes: 1


In [21]:
for a in genes:
    print("Gene", record2.features[a].qualifiers["gene"], "annotated on strand", record2.features[a].location.strand)

Gene ['HHEX'] annotated on strand 1
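
For a record with genes on both strands, the same information can be grouped by strand. This is only a sketch on invented toy features (`geneA`, `geneB` and `toy_features` are hypothetical and do not come from `hhex.gb`), assuming Biopython is installed.

```python
from collections import defaultdict

from Bio.SeqFeature import SeqFeature, FeatureLocation

# Toy features standing in for record2.features (all names are invented).
toy_features = [
    SeqFeature(FeatureLocation(0, 100, strand=1), type="gene",
               qualifiers={"gene": ["geneA"]}),
    SeqFeature(FeatureLocation(150, 300, strand=-1), type="gene",
               qualifiers={"gene": ["geneB"]}),
    SeqFeature(FeatureLocation(10, 90, strand=1), type="CDS",
               qualifiers={"gene": ["geneA"]}),
]

# Map each strand (1 or -1) to the names of the genes annotated on it.
genes_by_strand = defaultdict(list)
for feature in toy_features:
    if feature.type == "gene":
        genes_by_strand[feature.location.strand].append(feature.qualifiers["gene"][0])

for strand, names in sorted(genes_by_strand.items()):
    print("Strand", strand, "->", names)
```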
