In [12]:
from PyBioGateway import getPhenotype, phen2gene, gene2protein, prot2bp, gene2crm, phen2crm

In this second example, we analyse the regulatory network that may be associated with a specific phenotype, in this case breast cancer. First, we identify the phenotypes of interest, that is, those whose names contain "breast cancer". For this purpose, we use the getPhenotype function.


In [2]:
bc_phenotypes=getPhenotype("breast cancer")
len(bc_phenotypes)


75

There are 75 phenotypes whose names include "breast cancer". Next, we find the genes associated with these phenotypes and the proteins encoded by those genes. We then identify the biological processes in which these proteins are involved.

First, we retrieve the genes related to each phenotype using the phen2gene function.

In [4]:
gene_results=[]
for phen in bc_phenotypes:
    omim_id=phen["omim_id"]
    gene_bc=phen2gene(omim_id)
    if gene_bc != 'No data available for the introduced phenotype or you may have introduced an instance that is not a phenotype. Check your data type with type_data function.':
        gene_results.append(gene_bc)
        print("omim_id:", omim_id)
        print(gene_bc)
# Collect the unique gene names across all phenotype results
unique_gene=set()
for gene in gene_results:
    for gene_dict in gene:
        gene_id = gene_dict['gene_name']
        unique_gene.add(gene_id)
print("Genes related to breast cancer:", unique_gene)


omim_id: 137215
[{'gene_name': 'CDH1'}]
omim_id: 114480
[{'gene_name': 'NBN'}, {'gene_name': 'ABRAXAS1'}, {'gene_name': 'AKT1'}, {'gene_name': 'BRCA1'}, {'gene_name': 'BRCA2'}, {'gene_name': 'BRIP1'}, {'gene_name': 'CHEK2'}, {'gene_name': 'PALB2'}, {'gene_name': 'PIK3CA'}, {'gene_name': 'RAD51'}]
omim_id: 114480
[{'gene_name': 'NBN'}, {'gene_name': 'ABRAXAS1'}, {'gene_name': 'AKT1'}, {'gene_name': 'BRCA1'}, {'gene_name': 'BRCA2'}, {'gene_name': 'BRIP1'}, {'gene_name': 'CHEK2'}, {'gene_name': 'PALB2'}, {'gene_name': 'PIK3CA'}, {'gene_name': 'RAD51'}]
omim_id: 114480
[{'gene_name': 'NBN'}, {'gene_name': 'ABRAXAS1'}, {'gene_name': 'AKT1'}, {'gene_name': 'BRCA1'}, {'gene_name': 'BRCA2'}, {'gene_name': 'BRIP1'}, {'gene_name': 'CHEK2'}, {'gene_name': 'PALB2'}, {'gene_name': 'PIK3CA'}, {'gene_name': 'RAD51'}]
omim_id: 137215
[{'gene_name': 'CDH1'}]
omim_id: 137215
[{'gene_name': 'CDH1'}]
omim_id: 312300
[{'gene_name': 'AR'}]
omim_id: 604370
[{'gene_name': 'BRCA1'}]
omim_id: 613399
[{'gene_nam
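
The output above shows the same OMIM identifier (for example 114480 and 137215) being queried more than once, which suggests that several phenotype entries share one identifier. The following is a minimal sketch of how the identifiers could be deduplicated before querying. It assumes each phenotype is a dict with an "omim_id" key, as in the loop above; `unique_omim_ids` is a hypothetical helper that is not part of PyBioGateway, and the sample list is illustrative data in which the identifiers are assumed to be strings.

```python
def unique_omim_ids(phenotypes):
    """Return the OMIM IDs of the given phenotypes, without repeats, in first-seen order."""
    seen = set()
    ordered = []
    for phen in phenotypes:
        omim_id = phen["omim_id"]
        if omim_id not in seen:
            seen.add(omim_id)
            ordered.append(omim_id)
    return ordered


# Illustrative input: only the key used here, with IDs taken from the output above.
sample = [
    {"omim_id": "137215"},
    {"omim_id": "114480"},
    {"omim_id": "114480"},
    {"omim_id": "137215"},
]
print(unique_omim_ids(sample))
```

Under these assumptions, looping over `unique_omim_ids(bc_phenotypes)` rather than over `bc_phenotypes` would send one phen2gene query per identifier and avoid the repeated printouts.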

The genes related to breast cancer are 'AKT1', 'PALB2', 'RAD51', 'NBN', 'PIK3CA', 'BRIP1', 'CHEK2', 'CDH1', 'BRCA1', 'AR', 'BRCA2', 'RAD51C' and 'ABRAXAS1'. Next, we retrieve the proteins encoded by these genes with the gene2protein function:

In [37]:
prot_results=[]
for gene in unique_gene:
    prot_name=gene2protein(gene,"Homo sapiens")
    if prot_name != "No data available for the introduced gene. Check that the gene id is correct or if you have introduced the taxon correctly.":
        prot_results.append(prot_name)
        print("gene_id: ", gene)
        print(prot_name)

# Collect the unique protein names across all gene results
unique_prot=set()
for prot in prot_results:
    for prot_dict in prot:
        prot_id = prot_dict['prot_name']
        unique_prot.add(prot_id)
print("")
print("Proteins related to breast cancer:", unique_prot)
print(len(unique_prot))

gene_id:  AKT1
[{'prot_name': 'A0A087WY56_HUMAN'}, {'prot_name': 'A0A804HJM6_HUMAN'}, {'prot_name': 'G3V2I6_HUMAN'}, {'prot_name': 'G3V3X1_HUMAN'}, {'prot_name': 'AKT1_HUMAN'}]
gene_id:  PALB2
[{'prot_name': 'A0A8V8TKZ4_HUMAN'}, {'prot_name': 'A0A8V8TLC8_HUMAN'}, {'prot_name': 'A0A8V8TMC9_HUMAN'}, {'prot_name': 'A0A8V8TMK8_HUMAN'}, {'prot_name': 'H3BN63_HUMAN'}, {'prot_name': 'I3L1Z5_HUMAN'}, {'prot_name': 'I3L2S5_HUMAN'}, {'prot_name': 'PALB2_HUMAN'}]
gene_id:  RAD51
[{'prot_name': 'E9PI54_HUMAN'}, {'prot_name': 'E9PJ30_HUMAN'}, {'prot_name': 'E9PNT5_HUMAN'}, {'prot_name': 'H0YD61_HUMAN'}, {'prot_name': 'RAD51_HUMAN'}, {'prot_name': 'Q9NZG9_HUMAN'}]
gene_id:  NBN
[{'prot_name': 'A0A087X1V5_HUMAN'}, {'prot_name': 'A0A0C4DG07_HUMAN'}, {'prot_name': 'A0A8V8TKV9_HUMAN'}, {'prot_name': 'A0A8V8TKW6_HUMAN'}, {'prot_name': 'A0A8V8TKY0_HUMAN'}, {'prot_name': 'A0A8V8TKY5_HUMAN'}, {'prot_name': 'A0A8V8TL91_HUMAN'}, {'prot_name': 'A0A8V8TL95_HUMAN'}, {'prot_name': 'A0A8V8TL98_HUMAN'}, {'prot_name
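
The gene, protein and biological process cells all follow the same pattern: keep the results that are not an error message, then flatten the lists of dicts into a set. The following is a minimal sketch of that pattern as two reusable helpers. Both `has_data` and `collect_unique` are hypothetical names, not PyBioGateway functions, and `has_data` assumes that a failed query always returns a string while a successful one returns a list of dicts, as the cells above suggest.

```python
def has_data(result):
    """Assume any string is a 'no data' message and anything else is a real result."""
    return not isinstance(result, str)


def collect_unique(results, key):
    """Flatten a list of result lists and return the set of values stored under `key`."""
    return {record[key] for result in results for record in result}


# Illustrative input shaped like gene_results, using gene names from the output above.
sample_results = [
    [{"gene_name": "CDH1"}],
    [{"gene_name": "BRCA1"}, {"gene_name": "BRCA2"}],
    [{"gene_name": "BRCA1"}],
]
print(sorted(collect_unique(sample_results, "gene_name")))
```

Checking the type instead of comparing against the full message text would keep the filter working if the wording of a message changes, provided the string-on-failure assumption holds.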

Next, we look for the biological processes involving the proteins related to breast cancer, using the prot2bp function:

In [42]:
bp_results=[]
for prot in unique_prot:
    bp=prot2bp(prot)
    if bp != 'No data available for the introduced protein or you may have introduced an instance that is not a protein. Check your data type with type_data function.':
        bp_results.append(bp)
        print("prot_name: ", prot)
        print(len(bp))
# Collect the unique biological process labels across all protein results
unique_bp=set()
for bp in bp_results:
    for bp_dict in bp:
        bp_label = bp_dict['bp_label']
        unique_bp.add(bp_label)
print("")
print("Biological processes in which breast cancer proteins are involved", len(unique_bp))

prot_name:  AKT1_HUMAN
87
prot_name:  RAD51_HUMAN
19
prot_name:  BRCA2_HUMAN
11
prot_name:  ANDR_HUMAN
25
prot_name:  BRCA1_HUMAN
42
prot_name:  ABRX1_HUMAN
8
prot_name:  CHK2_HUMAN
17
prot_name:  FANCJ_HUMAN
8
prot_name:  PK3CA_HUMAN
17
prot_name:  CADH1_HUMAN
16
prot_name:  PALB2_HUMAN
1
prot_name:  RA51C_HUMAN
6
prot_name:  NBN_HUMAN
20

Biological processes in which breast cancer proteins are involved 220
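
The per-protein counts printed above add up to 277, while only 220 distinct labels remain after deduplication, which suggests that some processes are annotated to more than one protein. The following is a minimal sketch of how such shared processes could be found. It assumes the results are kept in a dict keyed by protein name (the notebook stores them in a plain list instead); `count_shared_processes` is a hypothetical helper, and the sample proteins and labels are placeholders, not real annotations.

```python
from collections import Counter


def count_shared_processes(bp_by_protein):
    """Count, for each process label, how many proteins are annotated with it."""
    counts = Counter()
    for records in bp_by_protein.values():
        # Use a set so a label repeated within one protein is counted once.
        counts.update({record["bp_label"] for record in records})
    return counts


# Placeholder data shaped like prot2bp results (only the key used here).
sample_bp = {
    "P1": [{"bp_label": "process A"}, {"bp_label": "process B"}],
    "P2": [{"bp_label": "process B"}],
}
counts = count_shared_processes(sample_bp)
shared = [label for label, n in counts.items() if n > 1]
print(shared)
```

With real data, `counts.most_common()` would rank the processes by the number of breast cancer proteins annotated with them.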


We now find which enhancers are related to the breast cancer genes using the gene2crm function, and compare the result with the enhancers obtained with the phen2crm function (enhancers related to the breast cancer phenotype).
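
Once both sets of enhancer identifiers are available, the comparison itself reduces to set operations. The following is a minimal sketch, assuming the identifiers have already been extracted into plain lists of strings; the key names in the gene2crm and phen2crm records are not shown here, so that extraction step is left out. `compare_enhancers` is a hypothetical helper and the identifiers are placeholders.

```python
def compare_enhancers(gene_crms, phenotype_crms):
    """Split two collections of enhancer IDs into shared and source-specific sets."""
    gene_set = set(gene_crms)
    phen_set = set(phenotype_crms)
    return {
        "shared": gene_set & phen_set,
        "gene_only": gene_set - phen_set,
        "phenotype_only": phen_set - gene_set,
    }


# Placeholder identifiers, not real enhancer names.
sample_gene_crms = ["crm_1", "crm_2", "crm_3"]
sample_phen_crms = ["crm_2", "crm_4"]
comparison = compare_enhancers(sample_gene_crms, sample_phen_crms)
print({name: sorted(ids) for name, ids in comparison.items()})
```
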

In [None]:
crm_gene=[]
for gene in unique_gene:
    crm_name=gene2crm(gene)
    if crm_name != "No data available for the introduced gene or you may have introduced an instance that is not a gene. Check your data type with type_data function." :
        crm_gene.append(crm_name)
        print("gene_id: ", gene)
        print("Number of crm related to the gene:", len(crm_name))