First example of usage:

We have one mutation of interest (rs4784227, located at chr16:52565276) and we want to study its possible implications for the regulation of gene expression. First, we will find the enhancers (cis-regulatory modules, CRMs) located around these coordinates with the function getCRMs_by_coord.

In [1]:
from PyBioGateway import getCRMs_by_coord, getCRM_info, getCRM_add_info, crm2phen, getPhenotype, crm2gene, gene2protein, prot2bp, gene2crm, gene2tfac

In [2]:
import time

start_time = time.time()

mutation_position = 52565276

# We define a range around the mutation position (e.g., +/- 12500 bases)
range_start = mutation_position - 12500
range_end = mutation_position + 12500

# Now we use the function to get the CRMs in the defined range
crms = getCRMs_by_coord("chr-16", range_start, range_end)

# We will count the number of entries in the list
num_entries = len(crms) if isinstance(crms, list) else 0

print(f"Number of CRMs in the specified range: {num_entries}")


Number of CRMs in the specified range: 485
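
The window arithmetic above can be wrapped in a small helper so that the flank size is easy to change. This is only a minimal sketch: `window_around` is a hypothetical name, not part of PyBioGateway, and clamping the start to 1 assumes 1-based coordinates.

```python
def window_around(position, flank=12500):
    """Return (start, end) of a window of +/- flank bases around position."""
    # Assumption: coordinates are 1-based, so the start is clamped to 1.
    start = max(1, position - flank)
    end = position + flank
    return start, end


# Same mutation position and flank size as in the cell above.
range_start, range_end = window_around(52565276)
print(range_start, range_end)
```

The two values can then be passed to getCRMs_by_coord exactly as in the cell above.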


To narrow down the CRMs for the rest of the study, we will check whether any of them is related to a disease using the crm2phen function.

In [3]:
for crm in crms:
    crm_name = crm['crm_name']
    phen_results = crm2phen(crm_name)
    # crm2phen returns a list of phenotypes on success and an explanatory string otherwise
    if isinstance(phen_results, list):
        print(f"CRM: {crm_name}")
        print(f"Phenotypes: {phen_results}\n")


CRM: crm/CRMHS00000005858
Phenotypes: [{'phen_id': 'OMIM/114480', 'database': 'http://biocc.hrbmu.edu.cn/DiseaseEnhancer/; http://health.tsinghua.edu.cn/jianglab/endisease/', 'articles': 'pubmed/23001124'}, {'phen_id': 'MESH/D001943', 'database': 'http://biocc.hrbmu.edu.cn/DiseaseEnhancer/; http://health.tsinghua.edu.cn/jianglab/endisease/', 'articles': 'pubmed/23001124'}, {'phen_id': 'DOID/DOID_1612', 'database': 'http://biocc.hrbmu.edu.cn/DiseaseEnhancer/; http://health.tsinghua.edu.cn/jianglab/endisease/', 'articles': 'pubmed/23001124'}]



Only the CRMs related to a phenotype are printed. In this case only crm/CRMHS00000005858 is related to a disease. Looking up its OMIM identifier (114480) with getPhenotype shows that the disease is breast cancer.

In [4]:
print(getPhenotype("114480"))

[{'phen_label': 'BREAST CANCER'}]
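
The phenotype identifiers returned by crm2phen carry a source prefix (OMIM, MESH, DOID), while getPhenotype was called above with the bare OMIM number. A minimal sketch of separating the prefix from the identifier follows. `split_phen_id` is a hypothetical helper, and whether getPhenotype also accepts the MeSH or DOID identifiers is not shown in this example.

```python
# Sample identifiers copied from the crm2phen output above.
phen_ids = ["OMIM/114480", "MESH/D001943", "DOID/DOID_1612"]


def split_phen_id(phen_id):
    """Split 'SOURCE/IDENTIFIER' into a (source, identifier) pair."""
    source, _, identifier = phen_id.partition("/")
    return source, identifier


for phen_id in phen_ids:
    source, identifier = split_phen_id(phen_id)
    print(f"{source}: {identifier}")
```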


We will now retrieve the information available for our CRM with two functions, getCRM_info and getCRM_add_info:

In [5]:
getCRM_info("crm/CRMHS00000005858")

[{'start': '52556490',
  'end': '52566288',
  'chromosome': 'NC_000016.10',
  'assembly': 'GCF_000001405.26',
  'taxon': 'NCBITaxon_9606',
  'definition': 'Cis-regulatory module located in Homo sapiens chr16 between 52556490 and 52566288'}]

In [6]:
getCRM_add_info("crm/CRMHS00000005858")

{'evidence': None,
 'database': 'http://biocc.hrbmu.edu.cn/DiseaseEnhancer/',
 'biological_samples': '',
 'articles': 'pubmed/23001124'}
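
As a quick sanity check, we can confirm that the mutation position falls inside the interval reported by getCRM_info. The sketch below copies the values from the output above. `contains_position` is a hypothetical helper, and it assumes that the interval is inclusive and that the mutation coordinate refers to the same assembly as the CRM record.

```python
# Values copied from the getCRM_info output above; note that they are strings.
crm_info = {"start": "52556490", "end": "52566288"}
mutation_position = 52565276


def contains_position(info, position):
    """Return True if position lies within [start, end] of the CRM record."""
    # Assumption: both ends of the interval are inclusive.
    return int(info["start"]) <= position <= int(info["end"])


print(contains_position(crm_info, mutation_position))
```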

Next, we will check whether our enhancer has any target genes using the crm2gene function.

In [7]:
genes = crm2gene("crm/CRMHS00000005858")
print(genes)

[{'gene_name': 'TOX3', 'database': 'http://biocc.hrbmu.edu.cn/DiseaseEnhancer/; http://health.tsinghua.edu.cn/jianglab/endisease/', 'articles': 'pubmed/23001124'}]
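
In the outputs shown so far, the 'database' and 'articles' fields pack several entries into one string separated by '; '. A minimal sketch of splitting them into lists follows. `split_field` is a hypothetical helper, and the separator is inferred from these outputs rather than from the library documentation.

```python
# Record copied from the crm2gene output above.
record = {
    "gene_name": "TOX3",
    "database": "http://biocc.hrbmu.edu.cn/DiseaseEnhancer/; "
                "http://health.tsinghua.edu.cn/jianglab/endisease/",
    "articles": "pubmed/23001124",
}


def split_field(value):
    """Split a ';'-separated field into a list of stripped, non-empty items."""
    return [item.strip() for item in value.split(";") if item.strip()]


print(split_field(record["database"]))
print(split_field(record["articles"]))
```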


The gene TOX3 is related to our CRM. Now we can find which proteins are encoded by this gene in Homo sapiens with the gene2protein function.

In [8]:
protein = gene2protein("TOX3", "Homo sapiens")
print(protein)

[{'prot_name': 'H3BTZ9_HUMAN'}, {'prot_name': 'J3QQQ6_HUMAN'}, {'prot_name': 'TOX3_HUMAN'}]


To find out which biological processes these proteins are involved in, we will use the prot2bp function.

In [9]:
for prot in protein:
    prot_name = prot['prot_name']
    bp_results = prot2bp(prot_name)
    print(f"Protein {prot_name}")
    print(f"Biological process: {bp_results}\n")

Protein H3BTZ9_HUMAN
Biological process: No data available for the introduced protein or you may have introduced an instance that is not a protein. Check your data type with type_data function.

Protein J3QQQ6_HUMAN
Biological process: No data available for the introduced protein or you may have introduced an instance that is not a protein. Check your data type with type_data function.

Protein TOX3_HUMAN
Biological process: [{'bp_id': 'GO:0042981', 'bp_label': 'regulation of apoptotic process', 'relation_label': 'O15405--GO:0042981', 'database': 'goa/', 'articles': 'pubmed/21172805'}, {'bp_id': 'GO:0043524', 'bp_label': 'negative regulation of neuron apoptotic process', 'relation_label': 'O15405--GO:0043524', 'database': 'goa/', 'articles': 'pubmed/21172805'}, {'bp_id': 'GO:0045944', 'bp_label': 'positive regulation of transcription by RNA polymerase II', 'relation_label': 'O15405--GO:0045944', 'database': 'goa/', 'articles': 'pubmed/21172805'}]



We find that the TOX3_HUMAN protein is involved in three biological processes: regulation of apoptotic process, negative regulation of neuron apoptotic process and positive regulation of transcription by RNA polymerase II. No biological process data is available for H3BTZ9_HUMAN or J3QQQ6_HUMAN.
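
Because prot2bp returns a list for TOX3_HUMAN but a plain message string for the other two proteins, it can help to normalise the result before using it. The following is a minimal sketch: `bp_labels` is a hypothetical helper, the sample data is abbreviated from the output above, and treating every non-list result as "no data" is an assumption.

```python
def bp_labels(result):
    """Return the bp_label values, or an empty list if no data was returned."""
    # Assumption: a successful lookup is a list of dicts; anything else
    # (such as the "No data available..." message) means no annotations.
    if not isinstance(result, list):
        return []
    return [entry["bp_label"] for entry in result]


# Sample values abbreviated from the prot2bp output above.
sample_results = {
    "H3BTZ9_HUMAN": "No data available for the introduced protein",
    "TOX3_HUMAN": [
        {"bp_id": "GO:0042981", "bp_label": "regulation of apoptotic process"},
        {"bp_id": "GO:0043524", "bp_label": "negative regulation of neuron apoptotic process"},
        {"bp_id": "GO:0045944", "bp_label": "positive regulation of transcription by RNA polymerase II"},
    ],
}

for prot_name, result in sample_results.items():
    labels = bp_labels(result)
    print(f"{prot_name}: {len(labels)} biological processes")
    for label in labels:
        print(f"  - {label}")
```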

As multiple enhancers can regulate the same gene, we will use the function gene2crm to find which other enhancers can regulate our target gene:

In [10]:
enh = gene2crm("TOX3")
print("Number of enhancers related with our target gene", len(enh))

Number of enhancers related with our target gene 393


In total, 393 enhancers are related to our target gene.

Finally, we will check whether any transcription factors are related to our target gene using the function gene2tfac.

In [11]:
tf = gene2tfac("TOX3")

# gene2tfac returns a tuple: a descriptive message followed by the list of transcription factors
transcription_factors = tf[1]

print("Number of transcription factors related with our target gene:", len(transcription_factors))
print("")
print(tf)


Number of transcription factors related with our target gene: 212

('Transcription factors related with the selected gene:', [{'tfac_name': 'ARI4B_HUMAN', 'database': 'https://tflink.net', 'articles': 'pubmed/27924024', 'evidence_level': 'Low', 'definition': 'Q4LE39 involved in regulation of 9606/TOX3'}, {'tfac_name': 'CTCF_HUMAN', 'database': 'https://tflink.net', 'articles': 'pubmed/18971253; pubmed/26578589; pubmed/27924024; pubmed/29126285', 'evidence_level': 'Low', 'definition': 'P49711 involved in regulation of 9606/TOX3'}, {'tfac_name': 'AGO2_HUMAN', 'database': 'https://tflink.net', 'articles': 'pubmed/27924024', 'evidence_level': 'Low', 'definition': 'Q9UKV8 involved in regulation of 9606/TOX3'}, {'tfac_name': 'ANDR_HUMAN', 'database': 'https://tflink.net', 'articles': 'pubmed/27924024', 'evidence_level': 'Low', 'definition': 'P10275 involved in regulation of 9606/TOX3'}, {'tfac_name': 'ARID2_HUMAN', 'database': 'https://tflink.net', 'articles': 'pubmed/27924024; pubmed/291262
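
With 212 entries, the raw printout is hard to read, so a short summary is often more useful. The sketch below tallies evidence levels and counts the supporting articles per transcription factor. It uses only the first four complete entries of the output above, with the 'database' and 'definition' fields omitted, and `tf_sample` is a hypothetical variable name. In the notebook, the same expressions could be applied to transcription_factors.

```python
from collections import Counter

# First four complete entries from the gene2tfac output above (fields abbreviated).
tf_sample = [
    {"tfac_name": "ARI4B_HUMAN", "articles": "pubmed/27924024", "evidence_level": "Low"},
    {"tfac_name": "CTCF_HUMAN",
     "articles": "pubmed/18971253; pubmed/26578589; pubmed/27924024; pubmed/29126285",
     "evidence_level": "Low"},
    {"tfac_name": "AGO2_HUMAN", "articles": "pubmed/27924024", "evidence_level": "Low"},
    {"tfac_name": "ANDR_HUMAN", "articles": "pubmed/27924024", "evidence_level": "Low"},
]

# How many transcription factors fall into each evidence level.
level_counts = Counter(entry["evidence_level"] for entry in tf_sample)

# Number of supporting articles per transcription factor
# (assumes the 'articles' field is ';'-separated, as in the output above).
article_counts = {
    entry["tfac_name"]: len(entry["articles"].split(";")) for entry in tf_sample
}

print(dict(level_counts))
for name, count in sorted(article_counts.items(), key=lambda item: -item[1]):
    print(f"{name}: {count} article(s)")
```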

In [12]:
# In total, 212 transcription factors are related to our target gene.
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")

Execution time: 150.55037379264832 seconds
