# Building the `community` Database

In this notebook, you'll find explanations of the essential functions for building the `community` database. These functions let you update the database automatically or intervene manually during preprocessing. You can also customize the database by providing your own annotations or your own lists of ligands and receptors to match your research needs.

**For users looking to quickly update the database, simply run the following command:**

If you are using the `community` conda environment, the required packages are already installed. If you are using a different environment and do not have [mygene](https://mygene.info/) and [OmnipathR](https://omnipathdb.org/) installed, please install them first (see the installation cells below).

In [2]:
library(community) # load community package

In [3]:
sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] community_1.4.1

loaded via a namespace (and not attached):
  [1] readxl_1.4.1                uuid_1.1-0                 
  [3] backports_1.4.1             Hmisc_4.7-2                
  [5] BiocFileCache_2.2.1         plyr_1.8.8             

In [4]:
LR_database <- auto_update_db("both") 

Retrieved interactions from both DB2109  Number of complex pairs detected13582  Number of non-redundant binary pairs produced1262  Number of binary pairs detected through PPI[1] "Number of PPI network interactions found:"
[1] 1262


ERROR: Error in process_single_components(db, pt_interactions): could not find function "process_single_components"


If you do not have the `mygene` and `OmnipathR` packages installed, uncomment the cells below by removing the leading hash symbols (`#`), then run them.

In [None]:
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")

# BiocManager::install("mygene")

In [None]:
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")

# BiocManager::install("OmnipathR")
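Before uncommenting the installation cells, you can check which of the two packages are already available. This is a small base R sketch using `requireNamespace()`, which returns `TRUE` if a package's namespace can be loaded and `FALSE` otherwise, without attaching the package.

```r
# Packages needed to rebuild the database from OmniPath and MyGene
deps <- c("mygene", "OmnipathR")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report anything that still needs to be installed
missing_deps <- deps[!installed]
if (length(missing_deps) > 0) {
  message("Please install: ", paste(missing_deps, collapse = ", "))
}
```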

# Building step by step



### Import the database of interest from OmniPath

The `import_db()` function imports ligand-receptor interaction data from OmniPath for the specified database type: `"curated"`, `"noncurated"`, or `"both"`.
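The idea of selecting interactions by database type can be illustrated with a toy table. This is a hypothetical sketch, not the implementation of `import_db()`: it assumes a logical column named `curated` and uses `match.arg()` so that only the three documented options are accepted, with `"both"` as the default.

```r
# Hypothetical sketch, not the implementation of import_db():
# select rows of an interaction table by curation status.
select_by_type <- function(interactions,
                           type = c("both", "curated", "noncurated")) {
  type <- match.arg(type)  # errors on anything other than the three options
  switch(type,
         both       = interactions,
         curated    = interactions[interactions$curated, , drop = FALSE],
         noncurated = interactions[!interactions$curated, , drop = FALSE])
}

# Toy table with placeholder genes; `curated` is an assumed logical column
toy <- data.frame(source  = c("L1", "L2", "L3"),
                  target  = c("R1", "R2", "R3"),
                  curated = c(TRUE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

nrow(select_by_type(toy, "curated"))     # 2
nrow(select_by_type(toy, "noncurated"))  # 1
nrow(select_by_type(toy))                # 3, "both" is the default
```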

In [None]:
db <- import_db("both")
# db <- import_db("curated")
# db <- import_db("noncurated")

### Break down complex interactions

Next, we process the database to handle complex interactions, that is, rows where the source, the target, or both are protein complexes. `create_pairwise_pairs()` splits each of these interactions into pairwise binary interactions.
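To make the splitting step concrete, here is a minimal base R sketch. It assumes that complex members are written as gene symbols joined by `_` (for example `ITGA4_ITGB1`); that separator is an assumption for illustration, and `expand_complex_pairs()` is a hypothetical helper, not part of `community`. Every source subunit is paired with every target subunit via `expand.grid()`.

```r
# Hypothetical sketch: expand complex interactions into binary pairs.
# Assumes complex members are joined by "_" (e.g. "ITGA4_ITGB1").
expand_complex_pairs <- function(db) {
  rows <- lapply(seq_len(nrow(db)), function(i) {
    sources <- strsplit(db$source[i], "_", fixed = TRUE)[[1]]
    targets <- strsplit(db$target[i], "_", fixed = TRUE)[[1]]
    # pair every source subunit with every target subunit
    expand.grid(source = sources, target = targets, stringsAsFactors = FALSE)
  })
  out <- unique(do.call(rbind, rows))
  rownames(out) <- NULL
  out
}

db_toy <- data.frame(source = c("VCAM1", "TGFB1"),
                     target = c("ITGA4_ITGB1", "TGFBR1_TGFBR2"),
                     stringsAsFactors = FALSE)

expand_complex_pairs(db_toy)  # 4 binary pairs
```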

In [None]:
pairwise_pairs <- create_pairwise_pairs(db)

In [None]:
head(pairwise_pairs)

### Filter through PPI

Next, we keep only the binary pairs that are present in the protein-protein interaction (PPI) network.
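A PPI filter of this kind amounts to a membership test against an edge list. The sketch below is hypothetical (`filter_by_ppi()` and the `from`/`to` column names are illustrative, not the package's) and treats PPI edges as undirected by building an order-independent key for each pair.

```r
# Hypothetical sketch: keep binary pairs that occur as an edge of a PPI network.
# PPI edges are treated as undirected, so source-target matches in either order.
filter_by_ppi <- function(pairs, ppi) {
  edge_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "|")
  keep <- edge_key(pairs$source, pairs$target) %in% edge_key(ppi$from, ppi$to)
  pairs[keep, , drop = FALSE]
}

# Placeholder genes
pairs_toy <- data.frame(source = c("L1", "L1", "L2"),
                        target = c("R1", "R2", "R3"),
                        stringsAsFactors = FALSE)
ppi_toy <- data.frame(from = c("R1", "L2"), to = c("L1", "R3"),
                      stringsAsFactors = FALSE)

filter_by_ppi(pairs_toy, ppi_toy)  # keeps L1-R1 and L2-R3, drops L1-R2
```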

In [None]:
pt_interactions <- filter_pairs_with_ppi(pairwise_pairs)

In [None]:
head(pt_interactions)

### Merge these binary pairs

This step processes the binary pairs from the database and merges them with the binary pairs detected through the PPI network. It also standardizes and reorders the columns.
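Merging with standardized columns can be sketched as follows. This is a hypothetical illustration, not the package's code: `merge_binary_pairs()` lower-cases column names, selects a fixed column order, stacks both tables and removes duplicate rows. The column names `source` and `target` are assumptions.

```r
# Hypothetical sketch: stack two tables of binary pairs with a common column
# layout and remove duplicate rows. Column names are illustrative only.
merge_binary_pairs <- function(db_pairs, ppi_pairs, cols = c("source", "target")) {
  standardise <- function(df) {
    names(df) <- tolower(names(df))  # e.g. "Source" -> "source"
    df[, cols, drop = FALSE]         # same columns in the same order
  }
  merged <- unique(rbind(standardise(db_pairs), standardise(ppi_pairs)))
  rownames(merged) <- NULL
  merged
}

from_db  <- data.frame(Source = c("L1", "L2"), Target = c("R1", "R2"),
                       stringsAsFactors = FALSE)
from_ppi <- data.frame(target = c("R1", "R3"), source = c("L1", "L3"),
                       stringsAsFactors = FALSE)

merge_binary_pairs(from_db, from_ppi)  # 3 rows: L1-R1, L2-R2, L3-R3
```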

In [None]:
complete_data <- process_binary_pairs(db, pt_interactions)

In [None]:
head(complete_data)

### Map gene descriptions

Next, we enrich the database with gene descriptions. `map_gene_data()` queries the gene symbols against [MyGene, a gene annotation service](https://mygene.info/) and retrieves the description of each gene.

<div class="alert alert-block alert-info">
<b>Note:</b> This function may fail because of internet connectivity issues. If this happens, please try again.
</div>
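Instead of rerunning the cell by hand after a connectivity failure, you could wrap the call in a retry loop. `with_retry()` is a hypothetical helper, not part of `community`: it calls any function again after a short pause whenever it raises an error, and gives up after a fixed number of attempts.

```r
# Hypothetical helper (not part of community): call f(...) up to `attempts` times,
# pausing `wait` seconds between failures, and return the first successful result.
with_retry <- function(f, ..., attempts = 3, wait = 2) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(...), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait)
  }
  stop("All ", attempts, " attempts failed")
}

# Usage (needs internet access):
# complete_data <- with_retry(map_gene_data, complete_data)
```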



In [None]:
complete_data <- map_gene_data(complete_data)

In [None]:
head(complete_data)

### Annotate gene space

Each gene in the protein-protein interaction (PPI) network is annotated with its parent categories, together with a score indicating how many of the 44 annotation resources assign the gene to that category.
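A score of this kind can be read as a count of distinct resources per gene and category. The minimal base R sketch below assumes a long table with one row per (gene, category, resource) record; the gene, category and resource names are placeholders, and `annotate_components()` may compute its score differently.

```r
# Hypothetical sketch: count distinct resources per gene and category.
records <- data.frame(
  gene     = c("G1", "G1", "G1", "G2", "G2"),
  category = c("ligand", "ligand", "receptor", "receptor", "receptor"),
  resource = c("res_a", "res_b", "res_a", "res_a", "res_a"),
  stringsAsFactors = FALSE
)

records <- unique(records)  # count each resource only once per gene/category
scores  <- aggregate(resource ~ gene + category, data = records, FUN = length)
names(scores)[names(scores) == "resource"] <- "n_resources"
scores  # G1/ligand: 2, G1/receptor: 1, G2/receptor: 1
```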

In [None]:
annotation <- annotate_components(complete_data)

In [None]:
head(annotation)

### True Ligand-Receptor Pairs

As part of the annotation, we also identify pairs formed between a ligand and a receptor and label them `True_LR = TRUE`. All other pairs, such as adhesive pairs or receptor-receptor pairs, are labeled `True_LR = FALSE`.
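The labeling rule as described can be expressed in one line, assuming you already have vectors of genes annotated as ligands and as receptors. The gene names below are placeholders, and `process_lr_db()` may apply additional rules.

```r
# Hypothetical sketch: TRUE only when the source is a ligand and the target a receptor.
ligands   <- c("L1", "L2")
receptors <- c("R1", "R2")

pairs <- data.frame(source = c("L1", "R1", "A1"),   # ligand, receptor, adhesive
                    target = c("R1", "R2", "A2"),
                    stringsAsFactors = FALSE)
pairs$True_LR <- pairs$source %in% ligands & pairs$target %in% receptors
pairs  # True_LR: TRUE, FALSE (receptor-receptor), FALSE (adhesive)
```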

In [None]:
true_LR_DB <- process_lr_db(complete_data, annotation)

### Process adhesive pairs and correct their direction

`process_adhesive_DB()` processes adhesive interactions, including duplicated pairs that appear in both directions (A-B and B-A). It supports manual curation: you can specify lists of genes to be treated as ligands or receptors. If no lists are given, ligands and receptors are detected from the annotation table.

In this step, members of the ADAM, plexin and neuroligin families are categorized as ligands.
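Direction correction and removal of swapped duplicates can be sketched as follows. This is a hypothetical illustration, not the code of `process_adhesive_DB()`. It assumes that ADAM, plexin and neuroligin family members can be recognized by their HGNC symbol prefixes (`ADAM` followed by a digit, which excludes `ADAMTS` genes, `PLXN` and `NLGN`); a pair is swapped when only its target looks like a ligand, and A-B/B-A duplicates are then collapsed.

```r
# Hypothetical sketch of orienting adhesive pairs; not process_adhesive_DB().
# Assumption: family members are recognized by HGNC symbol prefixes.
is_ligand_family <- function(genes) grepl("^(ADAM[0-9]|PLXN|NLGN)", genes)

orient_adhesive <- function(pairs, is_ligand = is_ligand_family) {
  # swap rows where only the target looks like a ligand
  swap <- is_ligand(pairs$target) & !is_ligand(pairs$source)
  old_source <- pairs$source
  pairs$source[swap] <- pairs$target[swap]
  pairs$target[swap] <- old_source[swap]
  # A-B and B-A get the same key; keep the first occurrence of each pair
  key <- paste(pmin(pairs$source, pairs$target), pmax(pairs$source, pairs$target))
  out <- pairs[!duplicated(key), , drop = FALSE]
  rownames(out) <- NULL
  out
}

adhesive_toy <- data.frame(source = c("NRXN1", "NLGN1", "CDH1"),
                           target = c("NLGN1", "NRXN1", "CDH1"),
                           stringsAsFactors = FALSE)
orient_adhesive(adhesive_toy)  # NLGN1-NRXN1 and CDH1-CDH1
```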

In [None]:
adhesive_DB <- process_adhesive_DB(complete_data, annotation, ligand_list=list(), receptor_list=list())

### Merge adhesive and true ligand-receptor pairs

In [None]:
LR_database <- rbind(true_LR_DB, adhesive_DB)

In [None]:
str(LR_database)