In [1]:
from PyBioGateway import getPhenotype, phen2gene, gene2protein, prot2bp, gene2crm, phen2crm, crm2gene, gene2tfac

In this second example, we analyse the regulatory network that may be associated with a specific phenotype, in this case breast cancer. First, we identify the phenotypes whose names contain "breast cancer", using the getPhenotype function.


In [2]:
import time

start_time = time.time()
bc_phenotypes=getPhenotype("breast cancer")
len(bc_phenotypes)


75

A total of 75 phenotypes include "breast cancer" in their names. Next, we find the genes associated with these phenotypes and the proteins encoded by those genes. We then identify the biological processes in which these proteins are involved.

First, we retrieve the genes related to each phenotype using the phen2gene function.

In [3]:
gene_results=[]
for phen in bc_phenotypes:
    omim_id=phen["omim_id"]
    gene_bc=phen2gene(omim_id)
    if gene_bc != 'No data available for the introduced phenotype or you may have introduced an instance that is not a phenotype. Check your data type with type_data function.':
        gene_results.append(gene_bc)
        print("omim_id:", omim_id)
        print(gene_bc)
# Collect the distinct gene names across all phenotype results.
unique_gene=set()
for gene in gene_results:
    for gene_dict in gene:
        gene_id = gene_dict['gene_name']
        unique_gene.add(gene_id)
print("Genes related with breast cancer", unique_gene)


omim_id: 137215
[{'gene_name': 'CDH1'}]
omim_id: 114480
[{'gene_name': 'NBN'}, {'gene_name': 'ABRAXAS1'}, {'gene_name': 'AKT1'}, {'gene_name': 'BRCA1'}, {'gene_name': 'BRCA2'}, {'gene_name': 'BRIP1'}, {'gene_name': 'CHEK2'}, {'gene_name': 'PALB2'}, {'gene_name': 'PIK3CA'}, {'gene_name': 'RAD51'}]
omim_id: 114480
[{'gene_name': 'NBN'}, {'gene_name': 'ABRAXAS1'}, {'gene_name': 'AKT1'}, {'gene_name': 'BRCA1'}, {'gene_name': 'BRCA2'}, {'gene_name': 'BRIP1'}, {'gene_name': 'CHEK2'}, {'gene_name': 'PALB2'}, {'gene_name': 'PIK3CA'}, {'gene_name': 'RAD51'}]
omim_id: 114480
[{'gene_name': 'NBN'}, {'gene_name': 'ABRAXAS1'}, {'gene_name': 'AKT1'}, {'gene_name': 'BRCA1'}, {'gene_name': 'BRCA2'}, {'gene_name': 'BRIP1'}, {'gene_name': 'CHEK2'}, {'gene_name': 'PALB2'}, {'gene_name': 'PIK3CA'}, {'gene_name': 'RAD51'}]
omim_id: 137215
[{'gene_name': 'CDH1'}]
omim_id: 137215
[{'gene_name': 'CDH1'}]
omim_id: 312300
[{'gene_name': 'AR'}]
omim_id: 604370
[{'gene_name': 'BRCA1'}]
omim_id: 613399
[{'gene_nam

The genes related to breast cancer are 'AKT1', 'PALB2', 'RAD51', 'NBN', 'PIK3CA', 'BRIP1', 'CHEK2', 'CDH1', 'BRCA1', 'AR', 'BRCA2', 'RAD51C' and 'ABRAXAS1'. Next, we retrieve the proteins encoded by these genes with the gene2protein function:

In [4]:
prot_results=[]
for gene in unique_gene:
    prot_name=gene2protein(gene,"Homo sapiens")
    if prot_name != "No data available for the introduced gene. Check that the gene id is correct or if you have introduced the taxon correctly.":
        prot_results.append(prot_name)
        print("gene_id: ", gene)
        print(prot_name)

# Collect the distinct protein names across all gene results.
unique_prot=set()
for prot in prot_results:
    for prot_dict in prot:
        prot_id = prot_dict['prot_name']
        unique_prot.add(prot_id)
print("")
print("Protein related with breast cancer", unique_prot)
print(len(unique_prot))

gene_id:  CHEK2
[{'prot_name': 'A0A087X102_HUMAN'}, {'prot_name': 'A0A3B3ITA7_HUMAN'}, {'prot_name': 'A0A7P0MUT5_HUMAN'}, {'prot_name': 'B7ZBF2_HUMAN'}, {'prot_name': 'B7ZBF7_HUMAN'}, {'prot_name': 'B7ZBF8_HUMAN'}, {'prot_name': 'C9JFD7_HUMAN'}, {'prot_name': 'F8WCV2_HUMAN'}, {'prot_name': 'H0Y4V6_HUMAN'}, {'prot_name': 'H0Y820_HUMAN'}, {'prot_name': 'H7BZ30_HUMAN'}, {'prot_name': 'H7C0V7_HUMAN'}, {'prot_name': 'CHK2_HUMAN'}]
gene_id:  AR
[{'prot_name': 'A0A087WUX9_HUMAN'}, {'prot_name': 'A0A087X1B6_HUMAN'}, {'prot_name': 'A0A7I2PS51_HUMAN'}, {'prot_name': 'E9PEG3_HUMAN'}, {'prot_name': 'F5GZG9_HUMAN'}, {'prot_name': 'ANDR_HUMAN'}]
gene_id:  RAD51C
[{'prot_name': 'A0A087WZ35_HUMAN'}, {'prot_name': 'A0A8V8TL64_HUMAN'}, {'prot_name': 'A0A8V8TML3_HUMAN'}, {'prot_name': 'A0A8V8TML8_HUMAN'}, {'prot_name': 'A0A8V8TMU8_HUMAN'}, {'prot_name': 'E9PI66_HUMAN'}, {'prot_name': 'H7C1R0_HUMAN'}, {'prot_name': 'H7C2Q5_HUMAN'}, {'prot_name': 'J3QKK3_HUMAN'}, {'prot_name': 'J3QLB5_HUMAN'}, {'prot_name'

Finally, we look for the biological processes that involve the breast cancer related proteins, using the prot2bp function:

In [5]:
bp_results=[]
for prot in unique_prot:
    bp=prot2bp(prot)
    if bp != 'No data available for the introduced protein or you may have introduced an instance that is not a protein. Check your data type with type_data function.':
        bp_results.append(bp)
        print("prot_name: ", prot)
        print(len(bp))
# Collect the distinct biological process labels across all protein results.
unique_bp=set()
for bp in bp_results:
    for bp_dict in bp:
        bp_label = bp_dict['bp_label']
        unique_bp.add(bp_label)
print("")
print("Biological processes involving proteins related to breast cancer", len(unique_bp))
print("")
print(unique_bp)

prot_name:  AKT1_HUMAN
87
prot_name:  CHK2_HUMAN
17
prot_name:  BRCA1_HUMAN
42
prot_name:  CADH1_HUMAN
16
prot_name:  RAD51_HUMAN
19
prot_name:  ABRX1_HUMAN
8
prot_name:  BRCA2_HUMAN
11
prot_name:  PK3CA_HUMAN
17
prot_name:  ANDR_HUMAN
25
prot_name:  PALB2_HUMAN
1
prot_name:  RA51C_HUMAN
6
prot_name:  FANCJ_HUMAN
8
prot_name:  NBN_HUMAN
20

Biological processes involving proteins related to breast cancer 220

{'positive regulation of TORC1 signaling', 'regulation of gene expression', 'double-strand break repair', 'negative regulation of cell population proliferation', 'non-canonical NF-kappaB signal transduction', 'negative regulation of leukocyte cell-cell adhesion', 'DNA repair', 'synapse assembly', 'double-strand break repair via homologous recombination', 'negative regulation of transcription by RNA polymerase II', 'cellular response to ionizing radiation', 'cellular response to epidermal growth factor stimulus', 'replicative senescence', 'positive regulation of endothelial cell pr

As we can see, the biological processes associated with breast cancer related proteins include DNA repair, regulation of cell population proliferation, apoptotic process and angiogenesis, among others.
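The three lookups above follow the same pattern: call a function for each identifier, skip the responses that carry no data, and gather the distinct values of one key. A minimal sketch of a reusable helper is shown below. The names collect_unique and fake_lookup are hypothetical and not part of PyBioGateway, and the helper assumes that a "no data" response is always a string while a successful response is a list of dicts, as the outputs above suggest.

```python
def collect_unique(ids, lookup, key):
    """Call lookup(id) for each id and gather the distinct values stored under key."""
    unique = set()
    for item in ids:
        result = lookup(item)
        # Assumption: a "no data" answer is a message string, a hit is a list of dicts.
        if isinstance(result, str):
            continue
        for record in result:
            unique.add(record[key])
    return unique


# Hypothetical stand-in for a lookup such as gene2protein; it makes no network call.
def fake_lookup(gene):
    table = {
        "GENE_A": [{"prot_name": "P1"}, {"prot_name": "P2"}],
        "GENE_B": [{"prot_name": "P2"}],
    }
    return table.get(gene, "No data available")


proteins = collect_unique(["GENE_A", "GENE_B", "GENE_C"], fake_lookup, "prot_name")
print(sorted(proteins))
```

With the real functions, a call could look like collect_unique(unique_gene, lambda g: gene2protein(g, "Homo sapiens"), "prot_name"), provided the assumption about return types holds for that function.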

We will now find which enhancers are related to our breast cancer genes using the gene2crm function, and compare the result with the enhancers obtained with the phen2crm function (enhancers related to the breast cancer phenotype).

In [6]:
crm_gene=[]
for gene in unique_gene:
    crm_name=gene2crm(gene)
    if crm_name != "No data available for the introduced gene or you may have introduced an instance that is not a gene. Check your data type with type_data function." :
        crm_gene.append(crm_name)
        print("gene_id: ", gene)
        print("Number of crm related to the gene:", len(crm_name))
# Collect the distinct CRM names across all gene results.
unique_crm_gene=set()
for crm in crm_gene:
    for crm_dict in crm:
        crm_id = crm_dict['crm_name']
        unique_crm_gene.add(crm_id)
print("")
print("Number of crm related with breast cancer genes", len(unique_crm_gene))


gene_id:  CHEK2
Number of crm related to the gene: 897
gene_id:  AR
Number of crm related to the gene: 360
gene_id:  RAD51C
Number of crm related to the gene: 601
gene_id:  RAD51
Number of crm related to the gene: 546
gene_id:  AKT1
Number of crm related to the gene: 966
gene_id:  CDH1
Number of crm related to the gene: 903
gene_id:  PIK3CA
Number of crm related to the gene: 414
gene_id:  BRCA1
Number of crm related to the gene: 711
gene_id:  BRIP1
Number of crm related to the gene: 361
gene_id:  BRCA2
Number of crm related to the gene: 467
gene_id:  PALB2
Number of crm related to the gene: 481
gene_id:  NBN
Number of crm related to the gene: 532
gene_id:  ABRAXAS1
Number of crm related to the gene: 442

Number of crm related with breast cancer genes 7681
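The per-gene counts printed above can be checked against the size of the unique set. The sketch below copies the counts from the output and compares their sum with the reported 7681. If the two match, no CRM is listed for more than one of these genes in this result.

```python
# Per-gene CRM counts copied from the output above.
crm_counts = {
    "CHEK2": 897, "AR": 360, "RAD51C": 601, "RAD51": 546, "AKT1": 966,
    "CDH1": 903, "PIK3CA": 414, "BRCA1": 711, "BRIP1": 361, "BRCA2": 467,
    "PALB2": 481, "NBN": 532, "ABRAXAS1": 442,
}

total = sum(crm_counts.values())
unique_total = 7681  # size of unique_crm_gene reported above

# A positive difference would mean some CRMs were counted for several genes.
overlap = total - unique_total
print("Sum of per-gene counts:", total)
print("CRMs counted more than once:", overlap)
```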


In [7]:
crm_phen=[]
for phen in bc_phenotypes:
    omim_id=phen["omim_id"]
    crm_bc=phen2crm(omim_id)
    if crm_bc != "No data available for the introduced phenotype or you may have introduced an instance that is not a phenotype. Check your data type with type_data function.":
        crm_phen.append(crm_bc)
        print("omim_id:", omim_id)
        print("Number of crm related with the omim id:", len(crm_bc))
# Collect the distinct CRM names across all phenotype results.
unique_crm_phen=set()
for crm in crm_phen:
    for crm_dict in crm:
        crm_id = crm_dict['crm_name']
        unique_crm_phen.add(crm_id)
print("Number of crm related with breast cancer", len(unique_crm_phen))

omim_id: 114480
Number of crm related with the omim id: 302
omim_id: 114480
Number of crm related with the omim id: 302
omim_id: 114480
Number of crm related with the omim id: 302
Number of crm related with breast cancer 302
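The output above repeats OMIM id 114480 three times, which indicates that several phenotype entries share the same id and that the same query is sent more than once. A sketch of removing duplicate ids before querying is shown below. The records are toy data shaped like the getPhenotype results; only the "omim_id" key is taken from the notebook, and storing the ids as strings is an assumption.

```python
# Toy records; in the notebook this list would be bc_phenotypes.
phenotypes = [
    {"omim_id": "114480"},
    {"omim_id": "137215"},
    {"omim_id": "114480"},
    {"omim_id": "312300"},
    {"omim_id": "137215"},
]

# dict.fromkeys keeps the first occurrence of each id and preserves order.
unique_ids = list(dict.fromkeys(p["omim_id"] for p in phenotypes))
print(unique_ids)
```

Looping over unique_ids instead of the full phenotype list would send one query per distinct id, which may shorten the run since each call goes to a remote service.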


We will compare both results:

In [8]:
compare = unique_crm_gene.symmetric_difference(unique_crm_phen)
len(compare)

7893

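symmetric_difference returns the CRMs that appear in exactly one of the two sets, so 7893 CRMs are linked to breast cancer through only one route: either by affecting genes related to the phenotype or by being described as affecting the phenotype directly. CRMs found through both routes are not counted here. From the set sizes (7681 and 302), 45 CRMs appear in both sets, so the union of the two sets contains 7938 CRMs.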
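The difference between the set operations is easy to see on a toy example. The sketch below uses made-up CRM names and then recovers the shared and combined counts from the sizes reported in this notebook.

```python
# Toy sets with hypothetical CRM names.
gene_route = {"crm1", "crm2", "crm3"}
phen_route = {"crm3", "crm4"}

only_one = gene_route.symmetric_difference(phen_route)  # in exactly one set
either = gene_route | phen_route                        # in at least one set
both = gene_route & phen_route                          # in both sets
print(len(only_one), len(either), len(both))

# Sizes reported in the notebook outputs.
n_gene, n_phen, n_symdiff = 7681, 302, 7893

# |A| + |B| counts each shared element twice, the symmetric difference counts it zero times.
n_shared = (n_gene + n_phen - n_symdiff) // 2
n_union = n_symdiff + n_shared
print("Shared CRMs:", n_shared)
print("CRMs in either set:", n_union)
```

In the notebook, unique_crm_gene | unique_crm_phen would give the CRMs linked through at least one route, and unique_crm_gene & unique_crm_phen those linked through both.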

Next, we search for the genes related to the CRMs obtained with the phen2crm function, using the crm2gene function.

In [9]:
gene_crm=[]
for crm in unique_crm_phen:
    gene_name=crm2gene(crm)
    if gene_name != "No data available for the introduced crm or you may have introduced an instance that is not a crm. Check your data type with type_data function.":
        gene_crm.append(gene_name)
unique_gene_crm=set()
unique_database=set()
for gene in gene_crm:
    for gene_dict in gene:
        gene_id = gene_dict['gene_name']
        gene_crm_database=gene_dict["database"]
        unique_gene_crm.add(gene_id)
        unique_database.add(gene_crm_database)
print("Number of gene related with breast cancer related enhancers", len(unique_gene_crm))
print("")
print("Databases which corroborates results:", unique_database)

Number of gene related with breast cancer related enhancers 76

Databases which corroborates results: {'http://biocc.hrbmu.edu.cn/DiseaseEnhancer/', 'http://biocc.hrbmu.edu.cn/DiseaseEnhancer/; http://health.tsinghua.edu.cn/jianglab/endisease/'}
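The second entry in the output above is two URLs joined by "; ", so the set of database strings does not directly give the set of distinct sources. A sketch of splitting the entries, using the two strings copied from the output:

```python
# Database strings copied from the output above.
raw_sources = {
    "http://biocc.hrbmu.edu.cn/DiseaseEnhancer/",
    "http://biocc.hrbmu.edu.cn/DiseaseEnhancer/; http://health.tsinghua.edu.cn/jianglab/endisease/",
}

sources = set()
for entry in raw_sources:
    # One entry may list several databases separated by semicolons.
    for url in entry.split(";"):
        url = url.strip()
        if url:
            sources.add(url)

print(sorted(sources))
```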


In [10]:
tfac_gene = []

for gene in unique_gene_crm:
    tfac_name = gene2tfac(gene)
    if tfac_name != "No data available for the introduced gene or you may have introduced an instance that is not a gene. Check your data type with type_data function.":
        tfac_gene.append(tfac_name)
# Collect the distinct transcription factor names.
# Each result is expected to be a tuple whose second element is a list of dicts.
unique_tfac = set()
for tfac_result in tfac_gene:
    if isinstance(tfac_result, tuple) and len(tfac_result) > 1 and isinstance(tfac_result[1], list):
        for tfac_dict in tfac_result[1]:
            if isinstance(tfac_dict, dict):
                tfac_id = tfac_dict.get("tfac_name")
                if tfac_id:
                    unique_tfac.add(tfac_id)
print("Number of transcription factors that affect genes that are related with breast cancer related crms:",len(unique_tfac))


Number of transcription factors that affect genes that are related with breast cancer related crms: 1017


Finally, we find 76 genes that are affected by breast cancer related enhancers, and 1017 transcription factors related to these genes.

In [11]:
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")

Execution time: 273.73815512657166 seconds
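The notebook times the whole run with time.time(). To see which step dominates, each step could be timed separately. The sketch below uses time.perf_counter, a clock intended for measuring intervals, inside a small context manager. The name timed is hypothetical, and the loop is a toy workload standing in for a query step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f} seconds")


# Toy workload; in the notebook this would wrap one of the query loops.
with timed("toy step"):
    total = sum(range(100_000))
```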
