# Building the `community` Database

This notebook explains the essential functions for constructing the `community` database. These functions let you update the database automatically or intervene manually during preprocessing. You can also customize the database by providing your own annotations or by specifying lists of ligands and receptors, so that it matches your specific research requirements.

**For users looking to quickly update the database, simply run the following command:**

If you are using the community `conda environment`, the necessary libraries should already be installed. However, if you are using a different virtual environment and do not have the dependencies [mygene](https://mygene.info/) and [OmnipathR](https://omnipathdb.org/), please install them first.

In [4]:
library(community) # load community package

In [None]:
LR_database <- auto_update_db("both") 

If you do not have the `mygene` and `OmnipathR` libraries installed, uncomment the blocks below by removing the hash symbol (`#`) and run them.

In [2]:
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")

# BiocManager::install("mygene")

In [3]:
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")

# BiocManager::install("OmnipathR")
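Before uncommenting the install blocks, it can help to check which of the two dependencies are actually missing. The following is a minimal sketch using only base R; the variable names `needed` and `missing_pkgs` are illustrative and not part of the `community` package.

```r
# Packages this notebook depends on (names as used in the install calls above).
needed <- c("mygene", "OmnipathR")

# requireNamespace() returns TRUE/FALSE without attaching the package.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed (may be empty).
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All dependencies are available.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

If `missing_pkgs` is not empty, the commented `BiocManager::install()` calls above are the way to install them.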

# Building step by step



### Import the database of interest from OmniPath

`import_db` imports ligand-receptor interaction data for the specified database type. You can select `noncurated`, `curated`, or `both`.

In [5]:
db <- import_db("both")
# db <- import_db("curated")
# db <- import_db("noncurated")

[1] "Number of pairs found"


### Break down complex interactions

Next, we process the database to handle rows where either the source or the target is a complex. Each such interaction is split into pairwise binary interactions.

In [7]:
pairwise_pairs <- create_pairwise_pairs(db)

In [9]:
head(pairwise_pairs)

Unnamed: 0_level_0,Pair.Name,Ligand,Receptor,complex_pair,source,target,source_genesymbol,target_genesymbol,is_directed,is_stimulation,is_inhibition,consensus_direction,consensus_stimulation,consensus_inhibition,sources,references,curation_effort,n_references,n_resources,annotation_strategy
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<int>,<chr>
1,IL17A_IL17RA,IL17A,IL17RA,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
2,IL17A_IL17RC,IL17A,IL17RC,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
3,IL17RA_IL17RC,IL17RA,IL17RC,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
4,IL17RA_IL17A,IL17RA,IL17A,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
5,IL17RC_IL17A,IL17RC,IL17A,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
6,IL17RC_IL17RA,IL17RC,IL17RA,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
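To make the splitting step concrete, here is a minimal base R sketch that reproduces the six rows shown above for the IL17A / IL17RA_IL17RC example. It assumes that complex members are joined by `_` in the gene symbol columns and that every ordered pair of distinct members is emitted. The helper `split_complex` is hypothetical and is not the implementation of `create_pairwise_pairs`.

```r
# Hypothetical helper (not part of the community package):
# expand a source/target pair, where either side may be a complex,
# into all ordered pairs of distinct members.
split_complex <- function(source_symbol, target_symbol) {
  members <- unique(c(
    strsplit(source_symbol, "_", fixed = TRUE)[[1]],
    strsplit(target_symbol, "_", fixed = TRUE)[[1]]
  ))
  # All combinations of members, then drop self-pairs.
  grid <- expand.grid(Ligand = members, Receptor = members,
                      stringsAsFactors = FALSE)
  grid <- grid[grid$Ligand != grid$Receptor, ]
  data.frame(
    Pair.Name    = paste(grid$Ligand, grid$Receptor, sep = "_"),
    Ligand       = grid$Ligand,
    Receptor     = grid$Receptor,
    complex_pair = paste(members, collapse = "_"),
    stringsAsFactors = FALSE
  )
}

toy_pairs <- split_complex("IL17A", "IL17RA_IL17RC")
print(toy_pairs)
```

Generating both orientations of each pair leaves the decision about which orientation is biologically supported to later steps.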


### Filter through PPI

Now, we filter those binary pairs based on their presence in the protein-protein interaction (PPI) network.

In [10]:
pt_interactions <- filter_pairs_with_ppi(pairwise_pairs)

In [11]:
head(pt_interactions)

Unnamed: 0_level_0,Pair.Name,Ligand,Receptor,complex_pair,source,target,source_genesymbol,target_genesymbol,is_directed,is_stimulation,is_inhibition,consensus_direction,consensus_stimulation,consensus_inhibition,sources,references,curation_effort,n_references,n_resources,annotation_strategy
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<int>,<chr>
1,IL17A_IL17RA,IL17A,IL17RA,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
2,IL17A_IL17RC,IL17A,IL17RC,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
3,IL17RA_IL17A,IL17RA,IL17A,IL17A_IL17RA_IL17RC,Q16552,COMPLEX:Q8NAC3_Q96F46,IL17A,IL17RA_IL17RC,1,1,0,1,1,0,CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR,Cellinker:19838198;Cellinker:25204502;Cellinker:9367539;ICELLNET:24011563;SIGNOR:32024054,5,5,5,both
4,NPNT_ITGA8,NPNT,ITGA8,NPNT_ITGA8_ITGB1,Q6UXI9,COMPLEX:P05556_P53708,NPNT,ITGA8_ITGB1,1,1,0,1,1,0,Baccin2019;SIGNOR,Baccin2019:16988024;SIGNOR:22613833,2,2,2,LR
5,NPNT_ITGB1,NPNT,ITGB1,NPNT_ITGA8_ITGB1,Q6UXI9,COMPLEX:P05556_P53708,NPNT,ITGA8_ITGB1,1,1,0,1,1,0,Baccin2019;SIGNOR,Baccin2019:16988024;SIGNOR:22613833,2,2,2,LR
6,ITGAL_ICAM1,ITGAL,ICAM1,ITGAL_ITGB2_ICAM1,COMPLEX:P05107_P20701,P05362,ITGAL_ITGB2,ICAM1,1,1,0,0,0,0,Baccin2019;CellPhoneDB;ICELLNET;SIGNOR,Baccin2019:16988024;ICELLNET:10940895;ICELLNET:23418628;SIGNOR:12808052,4,4,4,both
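The idea behind the filter can be sketched with two small data frames. This example assumes that a candidate pair is kept only when the same ordered pair appears in a PPI table; whether `filter_pairs_with_ppi` treats direction this way is an assumption. The contents of `toy_ppi` are made up for illustration and are not real OmniPath records.

```r
# Candidate binary pairs, e.g. produced by splitting a complex.
candidate_pairs <- data.frame(
  Ligand   = c("IL17A", "IL17A", "IL17RA"),
  Receptor = c("IL17RA", "IL17RC", "IL17RC"),
  stringsAsFactors = FALSE
)

# Toy PPI table (illustrative content only).
toy_ppi <- data.frame(
  source_genesymbol = c("IL17A", "IL17A"),
  target_genesymbol = c("IL17RA", "IL17RC"),
  stringsAsFactors = FALSE
)

# Build a comparable key for each ordered pair.
pair_key <- function(a, b) paste(a, b, sep = "_")

keep <- pair_key(candidate_pairs$Ligand, candidate_pairs$Receptor) %in%
  pair_key(toy_ppi$source_genesymbol, toy_ppi$target_genesymbol)

filtered_pairs <- candidate_pairs[keep, ]
print(filtered_pairs)
```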


### Merge these binary pairs

`process_single_components` takes the binary pairs already present in the database and merges them with the binary pairs detected through PPI. It also standardizes and reorders the columns.

In [14]:
complete_data <- process_single_components(db, pt_interactions)

In [15]:
head(complete_data)

Unnamed: 0_level_0,Pair.Name,Ligand,Receptor,source,target,is_directed,is_stimulation,is_inhibition,consensus_direction,consensus_stimulation,consensus_inhibition,sources,references,curation_effort,n_references,n_resources,annotation_strategy,complex_pair
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<int>,<chr>,<chr>
1,CALM1_TRPC3,CALM1,TRPC3,P0DP23,Q13507,1,0,1,1,0,1,CellTalkDB;Fantom5_LRdb;HPRD;HPRD_LRdb;LRdb;TRIP;iTALK,CellTalkDB:11248050;HPRD:15104175;TRIP:11248050;TRIP:11290752;TRIP:12601176;TRIP:18215135,6,5,5,both,
2,S100A10_TRPV6,S100A10,TRPV6,P60903,Q9H1D0,1,1,0,1,1,0,CellTalkDB;HPRD;TRIP,CellTalkDB:18187190;HPRD:12660155;TRIP:12660155;TRIP:16189514;TRIP:18187190,5,3,3,both,
3,JAK2_EPOR,JAK2,EPOR,O60674,P19235,1,1,0,1,1,0,BEL-Large-Corpus_ProtMapper;BioGRID;Cellinker;HPRD;HPRD-phos;HPRD_KEA;HPRD_MIMP;KEA;MIMP;PhosphoNetworks;PhosphoPoint;PhosphoSite_KEA;PhosphoSite_MIMP;ProtMapper;SIGNOR;SIGNOR_ProtMapper;SPIKE;Wang;iPTMnet;phosphoELM;phosphoELM_KEA;phosphoELM_MIMP,BioGRID:8343951;Cellinker:9030561;HPRD-phos:12441334;HPRD:11779507;HPRD:12441334;HPRD:8343951;KEA:10579919;KEA:10660611;KEA:11443118;KEA:12027890;KEA:12441334;KEA:7559499;KEA:9573010;ProtMapper:12441334;ProtMapper:15212693;SIGNOR:12441334;SPIKE:12524467;SPIKE:18672044;iPTMnet:10579919;iPTMnet:12441334;phosphoELM:10579919,21,13,14,LR,
4,NOTCH1_JAG2,NOTCH1,JAG2,P46531,Q9Y219,1,0,1,0,0,0,Baccin2019;CellCall;HPRD;NetPath;Ramilowski2015_Baccin2019;SPIKE,HPRD:11006133;NetPath:11006133;SPIKE:15358736,3,2,5,LR,
5,JAG2_NOTCH1,JAG2,NOTCH1,Q9Y219,P46531,1,1,1,1,1,0,Baccin2019;CellCall;CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;CellTalkDB;Cellinker;DLRP_Cellinker;DLRP_talklr;EMBRACE;Fantom5_LRdb;HPMR_Cellinker;HPMR_LRdb;HPMR_talklr;HPRD;HPRD_LRdb;HPRD_talklr;ICELLNET;KEGG-MEDICUS;Kirouac2010;LRdb;NetPath;Ramilowski2015;Ramilowski2015_Baccin2019;SIGNOR;STRING_talklr;SignaLink3;UniProt_LRdb;Wang;connectomeDB2020;iTALK;talklr,Baccin2019:1100613311006130;CellChatDB:22353464;CellPhoneDB:22353464;CellTalkDB:22353464;Cellinker:11006133;Cellinker:22353464;HPRD:11006133;ICELLNET:16921404;ICELLNET:21352254;ICELLNET:22503540;LRdb:11006133;NetPath:11006133;SIGNOR:9315665;SignaLink3:10958687;SignaLink3:11006133;SignaLink3:18988627;SignaLink3:21071413;SignaLink3:23331499;connectomeDB2020:11006133,19,11,20,both,
6,DLL1_NOTCH1,DLL1,NOTCH1,O00548,P46531,1,1,0,1,1,0,Baccin2019;CellCall;CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;CellTalkDB;Cellinker;DLRP_Cellinker;DLRP_talklr;EMBRACE;Fantom5_LRdb;HPMR_Cellinker;HPMR_LRdb;HPMR_talklr;HPRD;HPRD_LRdb;HPRD_talklr;ICELLNET;KEGG-MEDICUS;Kirouac2010;LRdb;NetPath;Ramilowski2015;Ramilowski2015_Baccin2019;SIGNOR;SPIKE;STRING_talklr;UniProt_LRdb;Wang;connectomeDB2020;iTALK;talklr,Baccin2019:1006133;Baccin2019:98194281;CellChatDB:22353464;CellPhoneDB:22353464;CellTalkDB:22353464;Cellinker:11006133;Cellinker:22353464;Cellinker:9819428;HPRD:11006133;ICELLNET:21685328;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801;connectomeDB2020:11006133;connectomeDB2020:9819428,18,9,20,both,
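A rough sketch of this merge, assuming that complexes can be recognised by the `COMPLEX:` prefix visible in the `source` and `target` columns above. The toy data frames and the chosen column subset are illustrative; the real function handles many more columns.

```r
# Toy excerpt of the imported database: one complex row, one binary row.
toy_db <- data.frame(
  source            = c("Q16552", "Q9Y219"),
  target            = c("COMPLEX:Q8NAC3_Q96F46", "P46531"),
  source_genesymbol = c("IL17A", "JAG2"),
  target_genesymbol = c("IL17RA_IL17RC", "NOTCH1"),
  stringsAsFactors = FALSE
)

# Rows where neither side is a complex are already binary pairs.
is_complex <- grepl("^COMPLEX:", toy_db$source) |
  grepl("^COMPLEX:", toy_db$target)
singles <- toy_db[!is_complex, ]

# Bring the binary rows into the same layout as the PPI-filtered pairs.
singles_std <- data.frame(
  Pair.Name    = paste(singles$source_genesymbol,
                       singles$target_genesymbol, sep = "_"),
  Ligand       = singles$source_genesymbol,
  Receptor     = singles$target_genesymbol,
  complex_pair = NA_character_,
  stringsAsFactors = FALSE
)

# Toy pair derived from the complex row (stands in for the PPI-filtered table).
toy_from_complex <- data.frame(
  Pair.Name    = "IL17A_IL17RA",
  Ligand       = "IL17A",
  Receptor     = "IL17RA",
  complex_pair = "IL17A_IL17RA_IL17RC",
  stringsAsFactors = FALSE
)

# Stack both sources using a shared, fixed column order.
shared_cols <- c("Pair.Name", "Ligand", "Receptor", "complex_pair")
merged <- rbind(singles_std[, shared_cols], toy_from_complex[, shared_cols])
print(merged)
```

Pairs that never belonged to a complex carry an empty `complex_pair`, which matches the empty last column in the output above.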


### Map gene descriptions

Next, we enrich the database with gene descriptions. The function queries the gene symbols and fetches their descriptions from [MyGene, a gene annotation service](https://mygene.info/).

<div class="alert alert-block alert-info">
<b>Note:</b> This function may fail due to internet connectivity issues. If this is the case, please try again.
</div>



In [16]:
complete_data <- map_gene_data(complete_data)

“If this function fails, it may be due to internet connectivity issues. Try running it again.”
Querying chunk 1

Querying chunk 2

Querying chunk 3



Finished
Pass returnall=TRUE to return lists of duplicate or missing query terms.


In [17]:
head(complete_data)

Unnamed: 0_level_0,Pair.Name,Ligand,Ligand.Name,Receptor,Receptor.Name,complex_pair,source,target,is_directed,is_stimulation,⋯,consensus_direction,consensus_stimulation,consensus_inhibition,sources,references,curation_effort,n_references,n_resources,annotation_strategy,dup
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<int>,<chr>,<chr>
1,CALM1_TRPC3,CALM1,calmodulin 1,TRPC3,transient receptor potential cation channel subfamily C member 3,,P0DP23,Q13507,1,0,⋯,1,0,1,CellTalkDB;Fantom5_LRdb;HPRD;HPRD_LRdb;LRdb;TRIP;iTALK,CellTalkDB:11248050;HPRD:15104175;TRIP:11248050;TRIP:11290752;TRIP:12601176;TRIP:18215135,6,5,5,both,TRPC3_CALM1
2,S100A10_TRPV6,S100A10,S100 calcium binding protein A10,TRPV6,transient receptor potential cation channel subfamily V member 6,,P60903,Q9H1D0,1,1,⋯,1,1,0,CellTalkDB;HPRD;TRIP,CellTalkDB:18187190;HPRD:12660155;TRIP:12660155;TRIP:16189514;TRIP:18187190,5,3,3,both,TRPV6_S100A10
3,JAK2_EPOR,JAK2,Janus kinase 2,EPOR,erythropoietin receptor,,O60674,P19235,1,1,⋯,1,1,0,BEL-Large-Corpus_ProtMapper;BioGRID;Cellinker;HPRD;HPRD-phos;HPRD_KEA;HPRD_MIMP;KEA;MIMP;PhosphoNetworks;PhosphoPoint;PhosphoSite_KEA;PhosphoSite_MIMP;ProtMapper;SIGNOR;SIGNOR_ProtMapper;SPIKE;Wang;iPTMnet;phosphoELM;phosphoELM_KEA;phosphoELM_MIMP,BioGRID:8343951;Cellinker:9030561;HPRD-phos:12441334;HPRD:11779507;HPRD:12441334;HPRD:8343951;KEA:10579919;KEA:10660611;KEA:11443118;KEA:12027890;KEA:12441334;KEA:7559499;KEA:9573010;ProtMapper:12441334;ProtMapper:15212693;SIGNOR:12441334;SPIKE:12524467;SPIKE:18672044;iPTMnet:10579919;iPTMnet:12441334;phosphoELM:10579919,21,13,14,LR,EPOR_JAK2
4,NOTCH1_JAG2,NOTCH1,notch receptor 1,JAG2,jagged canonical Notch ligand 2,,P46531,Q9Y219,1,0,⋯,0,0,0,Baccin2019;CellCall;HPRD;NetPath;Ramilowski2015_Baccin2019;SPIKE,HPRD:11006133;NetPath:11006133;SPIKE:15358736,3,2,5,LR,JAG2_NOTCH1
5,JAG2_NOTCH1,JAG2,jagged canonical Notch ligand 2,NOTCH1,notch receptor 1,,Q9Y219,P46531,1,1,⋯,1,1,0,Baccin2019;CellCall;CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;CellTalkDB;Cellinker;DLRP_Cellinker;DLRP_talklr;EMBRACE;Fantom5_LRdb;HPMR_Cellinker;HPMR_LRdb;HPMR_talklr;HPRD;HPRD_LRdb;HPRD_talklr;ICELLNET;KEGG-MEDICUS;Kirouac2010;LRdb;NetPath;Ramilowski2015;Ramilowski2015_Baccin2019;SIGNOR;STRING_talklr;SignaLink3;UniProt_LRdb;Wang;connectomeDB2020;iTALK;talklr,Baccin2019:1100613311006130;CellChatDB:22353464;CellPhoneDB:22353464;CellTalkDB:22353464;Cellinker:11006133;Cellinker:22353464;HPRD:11006133;ICELLNET:16921404;ICELLNET:21352254;ICELLNET:22503540;LRdb:11006133;NetPath:11006133;SIGNOR:9315665;SignaLink3:10958687;SignaLink3:11006133;SignaLink3:18988627;SignaLink3:21071413;SignaLink3:23331499;connectomeDB2020:11006133,19,11,20,both,NOTCH1_JAG2
6,DLL1_NOTCH1,DLL1,delta like canonical Notch ligand 1,NOTCH1,notch receptor 1,,O00548,P46531,1,1,⋯,1,1,0,Baccin2019;CellCall;CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;CellTalkDB;Cellinker;DLRP_Cellinker;DLRP_talklr;EMBRACE;Fantom5_LRdb;HPMR_Cellinker;HPMR_LRdb;HPMR_talklr;HPRD;HPRD_LRdb;HPRD_talklr;ICELLNET;KEGG-MEDICUS;Kirouac2010;LRdb;NetPath;Ramilowski2015;Ramilowski2015_Baccin2019;SIGNOR;SPIKE;STRING_talklr;UniProt_LRdb;Wang;connectomeDB2020;iTALK;talklr,Baccin2019:1006133;Baccin2019:98194281;CellChatDB:22353464;CellPhoneDB:22353464;CellTalkDB:22353464;Cellinker:11006133;Cellinker:22353464;Cellinker:9819428;HPRD:11006133;ICELLNET:21685328;LRdb:11;LRdb:9819428;NetPath:11006133;SIGNOR:16140393;SPIKE:11006133;SPIKE:17537801;connectomeDB2020:11006133;connectomeDB2020:9819428,18,9,20,both,NOTCH1_DLL1
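Because this step depends on a remote service, re-running it by hand can be replaced by a small retry wrapper. The sketch below is generic base R; `with_retry` and `flaky_lookup` are hypothetical names, and the simulated failure stands in for a dropped connection.

```r
# Hypothetical helper: call fun(...) up to max_tries times before giving up.
with_retry <- function(fun, ..., max_tries = 3, wait_seconds = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of aborting.
    result <- tryCatch(fun(...), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(last_error))
}

# Stand-in for a network call: fails on the first call, succeeds afterwards.
calls <- 0
flaky_lookup <- function(symbol) {
  calls <<- calls + 1
  if (calls < 2) stop("simulated connection error")
  toupper(symbol)
}

result <- with_retry(flaky_lookup, "calm1")
print(result)
```

Applied to this notebook, the same wrapper could be used as `with_retry(map_gene_data, complete_data)`, with a non-zero `wait_seconds` to give the service time to recover.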


### Annotate gene space

Here we annotate each gene in the protein-protein interaction (PPI) network with its corresponding parent category, along with a score indicating how many of the 44 resources have annotated that gene as such.

In [18]:
annotation <- annotate_components(complete_data)

In [19]:
head(annotation)

Unnamed: 0_level_0,genesymbol,score,parent
Unnamed: 0_level_1,<chr>,<dbl>,<chr>
1,CALM1,5,intracellular
2,S100A10,4,ligand
3,JAK2,2,receptor
4,NOTCH1,22,receptor
5,JAG2,12,ligand
6,DLL1,12,ligand
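One way to read the `score` column is as a count of supporting resources per gene and category. The sketch below illustrates that reading on made-up data: it counts distinct resources per gene/parent combination and keeps the best-supported category. The resource names are placeholders, and picking the top category is an assumption about `annotate_components`, not a description of its code.

```r
# Toy long-format annotations: one row per (gene, category, resource).
toy_annotations <- data.frame(
  genesymbol = c("JAG2", "JAG2", "JAG2", "NOTCH1", "NOTCH1"),
  parent     = c("ligand", "ligand", "receptor", "receptor", "receptor"),
  resource   = c("res_A", "res_B", "res_C", "res_A", "res_B"),
  stringsAsFactors = FALSE
)

# Count how many distinct resources support each gene/category combination.
counts <- aggregate(
  resource ~ genesymbol + parent,
  data = toy_annotations,
  FUN = function(x) length(unique(x))
)
names(counts)[names(counts) == "resource"] <- "score"

# Keep the best-supported category per gene.
counts <- counts[order(counts$genesymbol, -counts$score), ]
top_annotation <- counts[!duplicated(counts$genesymbol), ]
print(top_annotation)
```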


### True Ligand Receptor Pairs

Additionally, as part of the annotation, we identify pairs that connect a ligand to a receptor and label them `True_LR = TRUE`, while other pairs, such as adhesive pairs or receptor-receptor pairs, are marked `True_LR = FALSE`.

In [36]:
true_LR_DB <- process_lr_db(complete_data, annotation)
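The labelling rule can be sketched as a lookup against the annotation table. This example reuses gene categories from the `head(annotation)` output above and assumes the strictest reading of the rule: the `Ligand` column must be annotated as a ligand and the `Receptor` column as a receptor. How `process_lr_db` treats swapped orientations is not shown here, so that part is an assumption.

```r
# Categories taken from the annotation output shown earlier.
toy_annotation <- data.frame(
  genesymbol = c("NOTCH1", "JAG2", "DLL1"),
  parent     = c("receptor", "ligand", "ligand"),
  stringsAsFactors = FALSE
)

toy_pairs <- data.frame(
  Pair.Name = c("JAG2_NOTCH1", "DLL1_NOTCH1", "NOTCH1_JAG2"),
  Ligand    = c("JAG2", "DLL1", "NOTCH1"),
  Receptor  = c("NOTCH1", "NOTCH1", "JAG2"),
  stringsAsFactors = FALSE
)

# Named vector for quick gene -> category lookup.
parent_of <- setNames(toy_annotation$parent, toy_annotation$genesymbol)

# TRUE only when the pair runs from a ligand to a receptor.
toy_pairs$True_LR <- unname(
  parent_of[toy_pairs$Ligand] == "ligand" &
    parent_of[toy_pairs$Receptor] == "receptor"
)
print(toy_pairs)
```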

### Process and direction correction on adhesive pairs

`process_adhesive_DB` processes adhesive interactions, including swapped duplicated pairs (the same two genes listed in both orders). It supports manual curation: you can pass lists of genes that should be treated as ligands or receptors. If no lists are given, ligands and receptors are detected through the annotation table.

In this step we categorize ADAM, Plexin and Neuroligin families as ligands. 

In [38]:
adhesive_DB <- process_adhesive_DB(complete_data, annotation, ligand_list=list(), receptor_list=list())
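The handling of swapped duplicates can be illustrated with a user-supplied ligand list. In this sketch a pair is reoriented when its `Receptor` gene is in the ligand list and its `Ligand` gene is not, and duplicates that result are dropped. The toy table and the single-entry `ligand_list` (a neuroligin, in line with the family rule above) are illustrative; the actual logic in `process_adhesive_DB` may differ.

```r
# Toy adhesive pairs; the first two rows are the same pair in both orders.
toy_adhesive <- data.frame(
  Ligand   = c("NLGN1", "NRXN1", "ITGAL"),
  Receptor = c("NRXN1", "NLGN1", "ICAM1"),
  stringsAsFactors = FALSE
)

# User-curated genes that should always sit on the ligand side.
ligand_list <- c("NLGN1")

# Rows where the curated ligand currently sits in the Receptor column.
needs_swap <- toy_adhesive$Receptor %in% ligand_list &
  !(toy_adhesive$Ligand %in% ligand_list)

oriented <- toy_adhesive
oriented$Ligand[needs_swap]   <- toy_adhesive$Receptor[needs_swap]
oriented$Receptor[needs_swap] <- toy_adhesive$Ligand[needs_swap]
oriented$Pair.Name <- paste(oriented$Ligand, oriented$Receptor, sep = "_")

# After reorienting, swapped duplicates collapse to a single row.
deduplicated <- oriented[!duplicated(oriented$Pair.Name), ]
print(deduplicated)
```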

### Merge adhesive and True LR

In [40]:
LR_database <- rbind(true_LR_DB, adhesive_DB)

In [41]:
str(LR_database)

'data.frame':	6941 obs. of  21 variables:
 $ True_LR              : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ Pair.Name            : chr  "S100A10_TRPV6" "JAG2_NOTCH1" "DLL1_NOTCH1" "IGF1_IGF1R" ...
 $ Ligand               : chr  "S100A10" "JAG2" "DLL1" "IGF1" ...
 $ Ligand.Name          : chr  "S100 calcium binding protein A10" "jagged canonical Notch ligand 2" "delta like canonical Notch ligand 1" "insulin like growth factor 1" ...
 $ Receptor             : chr  "TRPV6" "NOTCH1" "NOTCH1" "IGF1R" ...
 $ Receptor.Name        : chr  "transient receptor potential cation channel subfamily V member 6" "notch receptor 1" "notch receptor 1" "insulin like growth factor 1 receptor" ...
 $ complex_pair         : chr  NA NA NA NA ...
 $ source               : chr  "P60903" "Q9Y219" "O00548" "P05019" ...
 $ target               : chr  "Q9H1D0" "P46531" "P46531" "P08069" ...
 $ is_directed          : num  1 1 1 1 1 1 1 1 1 1 ...
 $ is_stimulation       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ is_inhibit