## Analysis of the sequence and features of the NCBI record

Loading the Biopython package and the required modules.

In [1]:
from Bio import Seq
from Bio import SeqIO

Opening and reading the GenBank file for the HNF1A gene, and printing its sequence.

In [2]:
record2 = SeqIO.read("hnf1a.gb", "genbank")
print(record2.seq)

AGCTCCAATGTAAACAGAACAGGCAGGGGCCCTGATTCACGGGCCGCTGGGGCCAGGGTTGGGGGTTGGGGGTGCCCACAGGGCTTGGCTAGTGGGGTTTTGGGGGGGCAGTGGGTGCAAGGAGTTTGGTTTGTGTCTGCCGGCCGGCAGGCAAACGCAACCCACGCGGTGGGGGAGGCGGCTAGCGTGGTGGACCCGGGCCGCGTGGCCCTGTGGCAGCCGAGCCATGGTTTCTAAACTGAGCCAGCTGCAGACGGAGCTCCTGGCGGCCCTGCTCGAGTCAGGGCTGAGCAAAGAGGCACTGATCCAGGCACTGGGTGAGCCGGGGCCCTACCTCCTGGCTGGAGAAGGCCCCCTGGACAAGGGGGAGTCCTGCGGCGGCGGTCGAGGGGAGCTGGCTGAGCTGCCCAATGGGCTGGGGGAGACTCGGGGCTCCGAGGACGAGACGGACGACGATGGGGAAGACTTCACGCCACCCATCCTCAAAGAGCTGGAGAACCTCAGCCCTGAGGAGGCGGCCCACCAGAAAGCCGTGGTGGAGACCCTTCTGCAGGAGGACCCGTGGCGTGTGGCGAAGATGGTCAAGTCCTACCTGCAGCAGCACAACATCCCACAGCGGGAGGTGGTCGATACCACTGGCCTCAACCAGTCCCACCTGTCCCAACACCTCAACAAGGGCACTCCCATGAAGACGCAGAAGCGGGCCGCCCTGTACACCTGGTACGTCCGCAAGCAGCGAGAGGTGGCGCAGCAGTTCACCCATGCAGGGCAGGGAGGGCTGATTGAAGAGCCCACAGGTGATGAGCTACCAACCAAGAAGGGGCGGAGGAACCGTTTCAAGTGGGGCCCAGCATCCCAGCAGATCCTGTTCCAGGCCTATGAGAGGCAGAAGAACCCTAGCAAGGAGGAGCGAGAGACGCTAGTGGAGGAGTGCAATAGGGCGGAATGCATCCAGAGAGGGGTGTCCCCATCACAGGCACAGGGGCTGGGCTCCAACCTC

Length of the sequence, in nucleotides.

In [3]:
print("Sequence length:", len(record2.seq))

Sequence length: 3442
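`len()` only reports the total. As a minimal sketch of going one step further, the per-base composition can be counted with `collections.Counter`. `base_composition` and `gc_content` are hypothetical helper names (not Biopython functions), and the input is a 30-nt fragment copied from the start of the printed sequence; in the notebook, `str(record2.seq)` could be passed instead.

```python
from collections import Counter

def base_composition(seq: str) -> Counter:
    """Count each nucleotide in a DNA string (case-insensitive)."""
    return Counter(seq.upper())

def gc_content(seq: str) -> float:
    """Fraction of G and C among all characters; 0.0 for an empty string."""
    counts = base_composition(seq)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0

# Fragment copied from the beginning of the printed HNF1A sequence
fragment = "AGCTCCAATGTAAACAGAACAGGCAGGGGC"
counts = base_composition(fragment)
print(dict(counts))
# The per-base counts always add up to the length reported by len()
print("Length:", len(fragment), "| sum of counts:", sum(counts.values()))
print("GC fraction:", round(gc_content(fragment), 3))
```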


The ID, description and name of this record.

In [4]:
print("ID:",record2.id)

ID: NM_000545.8


In [5]:
print("Description:", record2.description)

Description: Homo sapiens HNF1 homeobox A (HNF1A), transcript variant 2, mRNA


In [6]:
print("Name:", record2.name)

Name: NM_000545
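The `id` combines the accession with a version suffix (`NM_000545.8`), whereas `name` holds the accession alone; the record's annotations also report `sequence_version` 8. A small sketch, using a hypothetical helper `split_accession`, makes that relationship explicit with plain string operations.

```python
from typing import Optional, Tuple

def split_accession(record_id: str) -> Tuple[str, Optional[int]]:
    """Split an 'ACCESSION.VERSION' identifier into its two parts.

    Returns (record_id, None) when there is no numeric version suffix.
    """
    accession, sep, version = record_id.rpartition(".")
    if sep and version.isdigit():
        return accession, int(version)
    return record_id, None

# Identifier as printed above for this record
accession, version = split_accession("NM_000545.8")
print("Accession:", accession)  # same value as record2.name in this record
print("Version:", version)      # same value as annotations['sequence_version']
```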


The dictionary of global annotations for this record.

In [7]:
print("Annotations:", record2.annotations)

Annotations: {'molecule_type': 'mRNA', 'topology': 'linear', 'data_file_division': 'PRI', 'date': '08-JAN-2023', 'accessions': ['NM_000545'], 'sequence_version': 8, 'keywords': ['RefSeq', 'MANE Select'], 'source': 'Homo sapiens (human)', 'organism': 'Homo sapiens', 'taxonomy': ['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires', 'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo'], 'references': [Reference(title='The contribution of functional HNF1A variants and polygenic susceptibility to risk of type 2 diabetes in ancestrally diverse populations', ...), Reference(title='Integrated Transcriptomic and Proteomic Analysis Identifies Plasma Biomarkers of Hepatocellular Failure in Alcohol-Associated Hepatitis', ...), Reference(title='The HASTER lncRNA promoter is a cis-acting transcriptional stabilizer of HNF1A', ...), Reference(title='Decreased TCF1 and BCL11B expression predicts poor prognosis for patie

The organism and its full taxonomic classification.

In [8]:
print("Organism:", record2.annotations["source"])

Organism: Homo sapiens (human)


In [9]:
print("Taxonomic classification:", record2.annotations["taxonomy"])

Taxonomic classification: ['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires', 'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo']
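`annotations["taxonomy"]` is an ordinary Python list ordered from the broadest rank to the most specific one (here the genus Homo; the species name is stored separately in `annotations["organism"]`). Standard list operations therefore apply. The sketch below uses the list exactly as printed; the variable names are illustrative.

```python
# Taxonomy list as printed for this record, broadest rank first
taxonomy = ['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata',
            'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires',
            'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo']

# A single lineage string, in the "; "-separated style used by GenBank flat files
lineage = "; ".join(taxonomy)
print(lineage)

# Membership tests and depth work like on any other list
print("Is a mammal:", "Mammalia" in taxonomy)
print("Is a primate:", "Primates" in taxonomy)
print("Number of ranks listed:", len(taxonomy))
print("Most specific rank listed:", taxonomy[-1])
```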


The record's features, in particular their type and location.

In [10]:
for feature in record2.features:
    print("Feature:", feature)

Feature: type: source
location: [0:3442](+)
qualifiers:
    Key: chromosome, Value: ['12']
    Key: db_xref, Value: ['taxon:9606']
    Key: map, Value: ['12q24.31']
    Key: mol_type, Value: ['mRNA']
    Key: organism, Value: ['Homo sapiens']

Feature: type: gene
location: [0:3442](+)
qualifiers:
    Key: db_xref, Value: ['GeneID:6927', 'HGNC:HGNC:11621', 'MIM:142410']
    Key: gene, Value: ['HNF1A']
    Key: gene_synonym, Value: ['HNF-1-alpha; HNF-1A; HNF1; HNF1alpha; HNF4A; IDDM20; LFB1; MODY3; TCF-1; TCF1']
    Key: note, Value: ['HNF1 homeobox A']

Feature: type: exon
location: [0:552](+)
qualifiers:
    Key: gene, Value: ['HNF1A']
    Key: gene_synonym, Value: ['HNF-1-alpha; HNF-1A; HNF1; HNF1alpha; HNF4A; IDDM20; LFB1; MODY3; TCF-1; TCF1']
    Key: inference, Value: ['alignment:Splign:2.1.0']

Feature: type: misc_feature
location: [10:13](+)
qualifiers:
    Key: gene, Value: ['HNF1A']
    Key: gene_synonym, Value: ['HNF-1-alpha; HNF-1A; HNF1; HNF1alpha; HNF4A; IDDM20; LFB1; MODY3

In [11]:
for feature in record2.features:
    print(feature.type)
    print(feature.location)


source
[0:3442](+)
gene
[0:3442](+)
exon
[0:552](+)
misc_feature
[10:13](+)
CDS
[226:2122](+)
misc_feature
[226:319](+)
misc_feature
[343:469](+)
misc_feature
[433:436](+)
misc_feature
[445:448](+)
misc_feature
[502:505](+)
misc_feature
[613:622](+)
misc_feature
[652:673](+)
misc_feature
[688:700](+)
misc_feature
[772:841](+)
misc_feature
[814:841](+)
misc_feature
[832:844](+)
misc_feature
[964:967](+)
misc_feature
[1012:1021](+)
misc_feature
[1033:1045](+)
misc_feature
[1072:1300](+)
misc_feature
[1162:1165](+)
misc_feature
[1858:1945](+)
exon
[552:752](+)
exon
[752:939](+)
exon
[939:1181](+)
exon
[1181:1333](+)
exon
[1333:1535](+)
exon
[1535:1727](+)
exon
[1727:1849](+)
exon
[1849:1994](+)
exon
[1994:3442](+)
regulatory
[3413:3419](+)
polyA_site
[3441:3442](+)
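Biopython prints each location as `[start:end](strand)` with a 0-based start and an exclusive end, the same convention as Python slicing, so a feature's length is `end - start` and `record2.seq[start:end]` (or `feature.extract(record2.seq)`) returns its nucleotides. As a sketch, the exon and CDS coordinates transcribed from the listing above can be checked directly: the ten exons should abut one another and span the whole 3442-nt record, and the CDS length should be a multiple of 3.

```python
# Exon coordinates as printed above (0-based start, exclusive end)
exons = [(0, 552), (552, 752), (752, 939), (939, 1181), (1181, 1333),
         (1333, 1535), (1535, 1727), (1727, 1849), (1849, 1994), (1994, 3442)]
record_length = 3442            # len(record2.seq)
cds_start, cds_end = 226, 2122  # CDS location [226:2122](+)

exon_lengths = [end - start for start, end in exons]
print("Exon lengths:", exon_lengths)

# Each exon should start exactly where the previous one ends,
# and together they should span the whole transcript
contiguous = all(prev_end == start for (_, prev_end), (start, _) in zip(exons, exons[1:]))
covers_record = exons[0][0] == 0 and exons[-1][1] == record_length
print("Contiguous:", contiguous, "| covers record:", covers_record)

cds_length = cds_end - cds_start
print("CDS length (nt):", cds_length, "| codons:", cds_length // 3,
      "| leftover bases:", cds_length % 3)
```

The 1896-nt CDS corresponds to 632 codons. If, as is usual for GenBank CDS features, the last codon is the stop codon, the protein would be 631 residues long; the translation shown further down is truncated, so that figure is not confirmed here.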


Coding sequences (CDS features) associated with this record, and their indices in the feature list.

In [12]:
indices = []
for f, feature in enumerate(record2.features):
    if feature.type == "CDS":
        indices.append(f)
print("Indices of the coding sequences:", indices)
for a in indices:
    print(record2.features[a])

Indices of the coding sequences: [4]
type: CDS
location: [226:2122](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['CCDS:CCDS9209.1', 'GeneID:6927', 'HGNC:HGNC:11621', 'MIM:142410']
    Key: gene, Value: ['HNF1A']
    Key: gene_synonym, Value: ['HNF-1-alpha; HNF-1A; HNF1; HNF1alpha; HNF4A; IDDM20; LFB1; MODY3; TCF-1; TCF1']
    Key: note, Value: ['isoform 2 is encoded by transcript variant 2; hepatic nuclear factor 1; albumin proximal factor; hepatocyte nuclear factor 1-alpha; transcription factor 1, hepatic; interferon production regulator factor; liver-specific transcription factor LF-B1']
    Key: product, Value: ['hepatocyte nuclear factor 1-alpha isoform 2']
    Key: protein_id, Value: ['NP_000536.6']
    Key: translation, Value: ['MVSKLSQLQTELLAALLESGLSKEALIQALGEPGPYLLAGEGPLDKGESCGGGRGELAELPNGLGETRGSEDETDDDGEDFTPPILKELENLSPEEAAHQKAVVETLLQEDPWRVAKMVKSYLQQHNIPQREVVDTTGLNQSHLSQHLNKGTPMKTQKRAALYTWYVRKQREVAQQFTHAGQGGLIEEPTGDELPTKKGRRNRFKWGPASQQILFQAYE
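The `translation` qualifier is NCBI's conceptual translation of the CDS. To illustrate how it follows from the nucleotides, the sketch below implements the standard genetic code (NCBI translation table 1) in plain Python and applies it to the first 22 codons of the CDS, copied from the printed sequence at the ATG that starts at position 226. `translate_dna` is a hypothetical helper; in practice Biopython's `Seq.translate()` (with its `table` and `to_stop` parameters) does this directly.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), one letter per codon in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop codon)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
# product() varies the last base fastest, matching the order of AMINO_ACIDS
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_dna(seq: str, to_stop: bool = False) -> str:
    """Translate a DNA string codon by codon.

    Trailing bases that do not form a full codon are ignored; codons containing
    symbols other than T, C, A, G become 'X'. With to_stop=True, translation
    halts at the first stop codon.
    """
    seq = seq.upper()
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# First 22 codons of the CDS, copied from the printed sequence (start at position 226)
cds_start_fragment = "ATGGTTTCTAAACTGAGCCAGCTGCAGACGGAGCTCCTGGCGGCCCTGCTCGAGTCAGGGCTGAGC"
print(translate_dna(cds_start_fragment))
```

The result, MVSKLSQLQTELLAALLESGLS, matches the first 22 residues of the `translation` qualifier above.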

The encoded protein and its biological meaning, taken from the `protein_id` and `product` qualifiers of the CDS.

In [16]:
qualifiersid = []
qualifiersim = []
for a in indices:
    qualifiersid.append(record2.features[a].qualifiers["protein_id"])
    qualifiersim.append(record2.features[a].qualifiers["product"])
print("Encoded proteins and their biological meaning:")
for protein_id, product in zip(qualifiersid, qualifiersim):
    print(protein_id, product)

Encoded proteins and their biological meaning:
['NP_000536.6'] ['hepatocyte nuclear factor 1-alpha isoform 2']
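Each feature's `qualifiers` is a dictionary whose values are lists of strings, which is why the output shows brackets. Not every feature has every key (the `gene` feature above has no `protein_id`), so indexing with `[...]` raises `KeyError` when a key is missing. A minimal sketch of a safer lookup with `dict.get`, using a hypothetical helper `first_qualifier` on subsets of the qualifiers printed above:

```python
from typing import Dict, List, Optional

def first_qualifier(qualifiers: Dict[str, List[str]], key: str,
                    default: Optional[str] = None) -> Optional[str]:
    """Return the first value stored under `key`, or `default` if the key is absent or empty."""
    values = qualifiers.get(key) or []
    return values[0] if values else default

# Subsets of the qualifiers printed above for the CDS and gene features
cds_qualifiers = {
    "protein_id": ["NP_000536.6"],
    "product": ["hepatocyte nuclear factor 1-alpha isoform 2"],
}
gene_qualifiers = {"gene": ["HNF1A"], "note": ["HNF1 homeobox A"]}

print(first_qualifier(cds_qualifiers, "protein_id"), "-",
      first_qualifier(cds_qualifiers, "product"))
# A gene feature has no protein_id, so the default is returned instead of raising KeyError
print(first_qualifier(gene_qualifiers, "protein_id", default="(no protein_id)"))
```

With the notebook's objects the call would look like `first_qualifier(record2.features[a].qualifiers, "product")`.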


The number of genes annotated in the record (features of type "gene"), and the strand on which each one is annotated.

In [17]:
genes=[]
for g in range(len(record2.features)):
    if record2.features[g].type == "gene":
        genes.append(g)
print("Number of genes:", len(genes))

Number of genes: 1
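The loops above collect feature indices one type at a time. A sketch of an alternative that counts every type at once with `collections.Counter`: the type list is transcribed from the feature listing printed earlier, and with the real record the equivalent would be `Counter(f.type for f in record2.features)`.

```python
from collections import Counter

# Feature types in record order, transcribed from the type/location listing above
feature_types = (["source", "gene", "exon", "misc_feature", "CDS"]
                 + ["misc_feature"] * 17
                 + ["exon"] * 9
                 + ["regulatory", "polyA_site"])

type_counts = Counter(feature_types)
print("Total features:", len(feature_types))
print("Genes:", type_counts["gene"], "| CDS:", type_counts["CDS"],
      "| exons:", type_counts["exon"])

# Indices of one type, equivalent to the index-collecting loops above
cds_indices = [i for i, t in enumerate(feature_types) if t == "CDS"]
print("CDS indices:", cds_indices)
```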


In [18]:
for a in genes:
    print("Gene", record2.features[a].qualifiers["gene"], "annotated on strand", record2.features[a].location.strand)

Gene ['HNF1A'] annotated on strand 1
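In Biopython, `location.strand` is 1 for the forward (plus) strand and -1 for the reverse (minus) strand; 0 marks a stranded feature whose strand is unknown, and `None` is used when strand does not apply. Because this record is an mRNA, its single gene lies on strand 1. The sketch below groups gene names by strand; `genes_by_strand` is a hypothetical helper, and `GENE_X` is an invented reverse-strand entry included only so that both groups appear.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Human-readable labels for Biopython strand values
STRAND_LABELS = {1: "+ (forward)", -1: "- (reverse)", 0: "unknown strand", None: "not applicable"}

def genes_by_strand(genes: Iterable[Tuple[str, Optional[int]]]) -> Dict[str, List[str]]:
    """Group gene names under the label of their strand value."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, strand in genes:
        grouped[STRAND_LABELS.get(strand, "unknown strand")].append(name)
    return dict(grouped)

# ("HNF1A", 1) comes from this record; GENE_X is hypothetical, for illustration only
annotated = [("HNF1A", 1), ("GENE_X", -1)]
for label, names in genes_by_strand(annotated).items():
    print(label, names)
```

With the real record, the input list could be built as `[(f.qualifiers["gene"][0], f.location.strand) for f in record2.features if f.type == "gene"]`.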
