In [1]:
library(OmnipathR)
# library(nichenetr)
library(tidyverse)
library(dplyr)
library(VennDiagram)
library(ggplot2)
library(utils)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Loading required package: grid

Loading required package: futile.logger



```import_ligrecextra_interactions```

Documentation: <https://r.omnipathdb.org/reference/import_ligrecextra_interactions.html>

This LR dataset contains ligand-receptor interactions without literature references. The ligand-receptor interactions supported by literature references are part of the `omnipath` dataset.


With default parameters (no resource filtering), the dataset has 8350 edges. The table has the following columns:

'source' 'target' 'source_genesymbol' 'target_genesymbol' 'is_directed' 'is_stimulation' 'is_inhibition' 'consensus_direction' 'consensus_stimulation' 'consensus_inhibition' 'sources' 'references' 'curation_effort' 'n_references' 'n_resources'

**The consensus score reflects the number of resources that support classifying an entity into a category, based on the combined information of many resources.**

<span style="color:red">I do not understand how an interaction can have sources but no references.</span>

| sources                                                         | references | curation_effort | n_references | n_resources |
|-----------------------------------------------------------------|------------|-----------------|--------------|-------------|
| Baccin2019;CellCall;PhosphoPoint;Ramilowski2015_Baccin2019      | NA         | 0               | 0            | 3           |
| Baccin2019;CellCall;PhosphoPoint;Ramilowski2015_Baccin2019;Wang | NA         | 0               | 0            | 4           |
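One way to look into this is to list the rows that are backed by resources but carry no literature reference. The sketch below uses a small hand-made data frame that mirrors the two rows in the table above; the `toy` object and its values are illustrative and are not pulled from OmniPath. On the real data, the same filter could presumably be applied to the result of `import_ligrecextra_interactions()`.

```r
# Toy stand-in for the interaction table; the values mirror the table above.
toy <- data.frame(
  sources = c("Baccin2019;CellCall;PhosphoPoint;Ramilowski2015_Baccin2019",
              "Baccin2019;CellCall;PhosphoPoint;Ramilowski2015_Baccin2019;Wang"),
  references = c(NA_character_, NA_character_),
  curation_effort = c(0, 0),
  n_references = c(0, 0),
  n_resources = c(3, 4),
  stringsAsFactors = FALSE
)

# Rows that are backed by at least one resource but carry no reference
no_refs <- toy[toy$n_resources > 0 & toy$n_references == 0, ]
print(no_refs[, c("references", "n_references", "n_resources")])
```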

```curated_ligand_receptor_interactions```

This function provides interactions from expert-curated ligand-receptor resources.

In [2]:
# https://r.omnipathdb.org/reference/curated_ligand_receptor_interactions.html
# curated=curated_ligand_receptor_interactions()
lr <- import_ligrecextra_interactions()
lr <- lr %>% filter(!duplicated(lr[, c("source_genesymbol", "target_genesymbol")]))
curated <- curated_ligand_receptor_interactions()
curated <- curated %>% filter(!duplicated(curated[, c("source_genesymbol", "target_genesymbol")]))

```import_omnipath_intercell``` imports the OmniPath intercellular **communication role annotation** database. It describes the roles proteins play in intercellular signaling, e.g. whether a protein is a ligand, a receptor, an extracellular matrix (ECM) component, etc.

In [3]:
# the genesymbol PIK3CD-AS1 is causing an error in the later steps, we convert the name
# lr[lr == "PIK3CD-AS1"] <- "PIK3CD"

In [4]:
anno_raw <- import_omnipath_intercell()
#subset annotation DB to only ligand and receptors
anno_lig <- anno_raw %>%
    dplyr::filter(category %in% c("receptor","ligand"))
# Drop rows where the values in the "parent", "database", and "uniprot" columns are duplicated
anno_raw <- anno_raw %>% filter(!duplicated(anno_raw[, c("parent", "database", "uniprot")]))

# Breaking down complexes

Below, we produce all the possible pairs. 

Example: let's assume complex G1_G2_G3 is linked to another complex G4_G5_G6

| c1 | c2 | complex_origin    |
|----|----|-------------------|
| G1 | G2 | G1_G2_G3_G4_G5_G6 |
| G1 | G3 | G1_G2_G3_G4_G5_G6 |
| G1 | G4 | G1_G2_G3_G4_G5_G6 |
| G1 | G5 | G1_G2_G3_G4_G5_G6 |
| G1 | G6 | G1_G2_G3_G4_G5_G6 |
| G2 | G1 | G1_G2_G3_G4_G5_G6 |
| G2 | G3 | G1_G2_G3_G4_G5_G6 |
| .. | .. | G1_G2_G3_G4_G5_G6 |
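The enumeration in the table can be sketched with base R's `combn()`. This is a minimal illustration using the hypothetical complexes `G1_G2_G3` and `G4_G5_G6` from the example; the object names are made up for the sketch and it is not the function used later in the notebook.

```r
# Components of the two hypothetical complexes from the example above
source_parts <- unlist(strsplit("G1_G2_G3", "_", fixed = TRUE))
target_parts <- unlist(strsplit("G4_G5_G6", "_", fixed = TRUE))
members <- c(source_parts, target_parts)

# combn() yields each unordered pair once (one pair per column)
unordered <- t(combn(members, 2))

# Append the swapped copy so that both orientations are present
both <- rbind(unordered, unordered[, c(2, 1)])
pairs <- data.frame(c1 = both[, 1], c2 = both[, 2],
                    complex_origin = paste(members, collapse = "_"),
                    stringsAsFactors = FALSE)

nrow(pairs)   # 15 unordered pairs x 2 orientations = 30
head(pairs)
```

Six components give `choose(6, 2) = 15` unordered pairs, so the table with both orientations has 30 rows.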

In [5]:
# This function extracts the interactions that involve a complex and lists the complex components
# Input: OmniPath_DB (interaction data frame)
# Output: list with `components` (unique gene symbols) and `complex` (the filtered interactions)

break_down_complex <- function(OmniPath_DB) {
    # keep only the interactions where the source or the target is a complex
    complex <- filter(OmniPath_DB, grepl("COMPLEX", target) | grepl("COMPLEX", source))
    complex$source <- sub("COMPLEX:", "", complex$source)
    complex$target <- sub("COMPLEX:", "", complex$target)

    # complexes are separated into individual components
    components_source <- unique(unlist(strsplit(complex$source_genesymbol, "_")))
    components_target <- unique(unlist(strsplit(complex$target_genesymbol, "_")))
    components_both <- c(components_source, components_target)
    components_both <- unique(components_both)
    return(list(components = components_both, complex = complex))
}

In [6]:
components_lr <- break_down_complex(lr)
components_curated <- break_down_complex(curated)

In [7]:
head(str(components_curated))

List of 2
 $ components: chr [1:378] "IL17A" "ITGAL" "ITGB2" "IFNW1" ...
 $ complex   : tibble [757 × 15] (S3: tbl_df/tbl/data.frame)
  ..$ source               : chr [1:757] "Q16552" "P05107_P20701" "P05000" "O75326" ...
  ..$ target               : chr [1:757] "Q8NAC3_Q96F46" "P05362" "P17181_P48551" "P05556_P56199" ...
  ..$ source_genesymbol    : chr [1:757] "IL17A" "ITGAL_ITGB2" "IFNW1" "SEMA7A" ...
  ..$ target_genesymbol    : chr [1:757] "IL17RA_IL17RC" "ICAM1" "IFNAR1_IFNAR2" "ITGA1_ITGB1" ...
  ..$ is_directed          : num [1:757] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ is_stimulation       : num [1:757] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ is_inhibition        : num [1:757] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ consensus_direction  : num [1:757] 1 0 1 1 1 1 1 1 1 1 ...
  ..$ consensus_stimulation: num [1:757] 1 0 1 1 1 1 1 1 1 1 ...
  ..$ consensus_inhibition : num [1:757] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ sources              : chr [1:757] "CellChatDB;CellPhoneDB;Cellinker;ICELLNET;SIGNOR" "Baccin2019

NULL

In [8]:
# This function takes the complex data frame and returns a data frame containing all possible pairwise pairs.
# For each row, the source and target gene symbols are split into their components, all pairwise
# combinations are generated with combn(), and the result is appended to a list. The list is bound into
# a single data frame, a copy with source and target swapped is appended so that both orientations
# are present, and duplicated pairs are dropped.


create_pairwise_pairs <- function(complex){
    # Produce all the possible pairwise pairs

    results <- list()

    # Loop through each row of the data frame
    for (i in seq_len(nrow(complex))) {
      # Split the source gene symbols into components
      values1 <- unlist(strsplit(as.character(complex[i, "source_genesymbol"]), "_"))
      # Split the target gene symbols into components
      values2 <- unlist(strsplit(as.character(complex[i, "target_genesymbol"]), "_"))
      # Keep the original pair
      original <- paste(complex[i, "source_genesymbol"],complex[i, "target_genesymbol"],sep="_")
      # Generate all the pairwise combinations using combn
      pairs <- combn(c(values1, values2), 2)
      pairs <- t(pairs)
      pairs <- cbind(pairs,original)
      # Append the results to the list
      results[[i]] <- as.data.frame(pairs)
      colnames(results[[i]]) <- c("source","target","complex_pair")
      row.names(results[[i]]) <- NULL
    }

    # Bind the results into a single data frame
    result_df2 <- as.data.frame(do.call(rbind, results))

    # Build a copy with the "source" and "target" columns swapped
    df1 <- cbind(result_df2[,2], result_df2[,1], result_df2[,3])
    colnames(df1) <- names(result_df2)
    # Bind the rows into a single data frame
    result_df <- rbind(result_df2, df1)

    # Drop the duplicated pairs
    result_df <- result_df %>% filter(!duplicated(result_df[, c("source", "target")]))

    # create pairs column
    result_df$pair <- paste(result_df$source, result_df$target, sep="_")

    return(result_df)
}

In [9]:
pairwise_pairs_lr <- create_pairwise_pairs(components_lr$complex)
pairwise_pairs_curated <- create_pairwise_pairs(components_curated$complex)

In [10]:
head(pairwise_pairs_curated) 

,source,target,complex_pair,pair
,<chr>,<chr>,<chr>,<chr>
1,IL17A,IL17RA,IL17A_IL17RA_IL17RC,IL17A_IL17RA
2,IL17A,IL17RC,IL17A_IL17RA_IL17RC,IL17A_IL17RC
3,IL17RA,IL17RC,IL17A_IL17RA_IL17RC,IL17RA_IL17RC
4,ITGAL,ITGB2,ITGAL_ITGB2_ICAM1,ITGAL_ITGB2
5,ITGAL,ICAM1,ITGAL_ITGB2_ICAM1,ITGAL_ICAM1
6,ITGB2,ICAM1,ITGAL_ITGB2_ICAM1,ITGB2_ICAM1


# Annotation of components

The complexes are decomposed into their individual components. The OmniPath Intercell annotation database is imported and used to annotate each component. A component is annotated as a ligand or a receptor according to whichever of the two categories has the most supporting annotation records. If it has neither, we check other possible categories such as extracellular matrix, secreted, and transmembrane.
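The category lookup can be sketched on a toy table. The example mirrors the `sort(table(...))[1]` idiom used in `annotate_components()`; the `toy_anno` records and the `top_category()` helper are hypothetical and exist only for this illustration.

```r
# Hypothetical annotation records: one row per (gene, database) assignment
toy_anno <- data.frame(
  genesymbol = c("IL17A", "IL17A", "IL17A", "IL17RA", "IL17RA"),
  parent     = c("ligand", "ligand", "receptor", "receptor", "receptor"),
  stringsAsFactors = FALSE
)

# Return the most frequent category and its record count for one gene symbol
top_category <- function(symbol, anno) {
  counts <- table(anno$parent[anno$genesymbol == symbol])
  if (length(counts) == 0) {
    # no annotation records for this symbol
    return(list(parent = NA_character_, score = 0))
  }
  best <- sort(counts, decreasing = TRUE)[1]
  list(parent = names(best), score = as.numeric(best))
}

top_category("IL17A", toy_anno)    # ligand, supported by 2 records
top_category("UNKNOWN", toy_anno)  # no records: NA with score 0
```

The explicit length check makes the "no annotation" case visible instead of relying on how an empty table behaves when indexed.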



In [11]:
# Annotate the components of the complexes with their parent category.
# Input: a character vector of gene symbols
# Output: a df with columns "genesymbol", "score", "parent"
# Relies on the global annotation tables anno_lig and anno_raw defined above

annotate_components <- function(components) {
    # create a df to store the annotation
    df <- data.frame(genesymbol = character(length(components)), score = numeric(length(components)),
                     parent = character(length(components)), stringsAsFactors = FALSE)
    
    # Check if the components are categorized as ligands or receptors
    for (x in seq_along(components)) {
        genename <- components[x]
        parent_score <- sort(table(filter(anno_lig, genesymbol==components[x])$parent), decreasing = TRUE, na.last = TRUE)[1]
        parent_category <- names(parent_score)

        if (is.null(parent_category)) {
          parent_category <- "NA"
          parent_score <- 0
        }

        df[x, "genesymbol"] <- genename
        df[x, "score"] <- parent_score
        df[x, "parent"] <- parent_category
    }
    
    # If a component is not classified as a ligand or receptor, consider other categories
    # such as extracellular matrix, secreted, and transmembrane
    df_na <- filter(df, parent=="NA")$genesymbol

    # seq_along() skips the loop entirely when every component is already annotated
    for (x in seq_along(df_na)) {
        parent_score <- sort(table(filter(anno_raw, genesymbol==df_na[x])$parent), decreasing = TRUE, na.last = TRUE)[1]
        parent_category <- names(parent_score)

        # leave the component as "NA" if it has no annotation at all
        if (is.null(parent_category)) next

        df <- df %>% mutate(parent = ifelse(genesymbol == df_na[x], parent_category, parent))
        df <- df %>% mutate(score = ifelse(genesymbol == df_na[x], parent_score, score))
    }
    
    # replace ecm and secreted with ligand
    df$parent <- replace(df$parent, df$parent == "ecm", "ligand")
    df$parent <- replace(df$parent, df$parent == "secreted", "ligand")
    
    return(df)

}

In [12]:
df_lr = annotate_components(components_lr$components)
df_curated = annotate_components(components_curated$components)

In [13]:
table(df_curated$parent)


       ligand      receptor transmembrane 
          221           154             3 

# Linking 1

We use the OmniPath post-translational interaction network (imported below with `import_post_translational_interactions()`) to detect interactions rather than to predict them. It is the largest available network of its kind, and its creators note that it may contain a large number of false positives. Despite this, we use it in combination with the annotation database to detect interactions. The network has 98,165 edges.
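The linking idea can be sketched on toy data: restrict the network to ligand-to-receptor edges, then keep only the candidate pairs from a decomposed complex that appear in it. The `toy_net` edges and the ligand and receptor lists below are hypothetical stand-ins for the imported network and the annotated components.

```r
# Hypothetical interaction network (stand-in for the imported network)
toy_net <- data.frame(
  source_genesymbol = c("IL17A", "IL17A", "IL17RA", "EPO"),
  target_genesymbol = c("IL17RA", "IL17RC", "IL17RC", "EPOR"),
  stringsAsFactors = FALSE
)
ligands   <- c("IL17A")
receptors <- c("IL17RA", "IL17RC")

# Keep only ligand -> receptor edges among the annotated components
keep <- toy_net$source_genesymbol %in% ligands &
        toy_net$target_genesymbol %in% receptors
net_lr <- toy_net[keep, ]
net_lr$pair <- paste(net_lr$source_genesymbol, net_lr$target_genesymbol, sep = "_")

# Candidate pairs from one decomposed complex, in both orientations
candidates <- c("IL17A_IL17RA", "IL17A_IL17RC", "IL17RA_IL17RC",
                "IL17RA_IL17A", "IL17RC_IL17A", "IL17RC_IL17RA")

# Only the candidates that are present in the filtered network survive
supported <- candidates[candidates %in% net_lr$pair]
supported
```

In this toy setup the receptor-receptor and reversed pairs are discarded, and only the two ligand-to-receptor pairs remain.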

In [14]:
# Import All post-translational interactions
pt <- import_post_translational_interactions()

In [15]:
# Separate the annotated components of the complexes by type.
#
# For both the lr and the curated datasets, filter() splits the annotated components
# (df_lr, df_curated) on the parent column into one data frame of ligands and one of
# receptors. These data frames are used in the next steps.

ligands_lr <- filter(df_lr, parent=="ligand")
receptors_lr <- filter(df_lr, parent=="receptor")

ligands_curated <- filter(df_curated, parent=="ligand")
receptors_curated <- filter(df_curated, parent=="receptor")


In [16]:
# Filter the PT network to include only the components of the complexes.
# Keep only rows where the source gene is an annotated ligand and the target
# gene is an annotated receptor. The results are saved to pt_lr and pt_curated.
# Subsetting the big PT network this way lets us later check all the possible
# pairwise pairs against it.

pt_lr <- pt %>%
    dplyr::filter(source_genesymbol %in% ligands_lr$genesymbol) %>%
    dplyr::filter(target_genesymbol %in% receptors_lr$genesymbol) %>%
    dplyr::distinct()

# remove duplicated
pt_lr <- pt_lr %>% filter(!duplicated(pt_lr[, c("source_genesymbol", "target_genesymbol")]))

# create pairs, so it's easier to check
pt_lr$pair=paste(pt_lr$source_genesymbol, pt_lr$target_genesymbol,sep="_")


pt_curated <- pt %>%
    dplyr::filter(source_genesymbol %in% ligands_curated$genesymbol) %>%
    dplyr::filter(target_genesymbol %in% receptors_curated$genesymbol) %>%
    dplyr::distinct()

pt_curated <- pt_curated %>% filter(!duplicated(pt_curated[, c("source_genesymbol", "target_genesymbol")]))
pt_curated$pair=paste(pt_curated$source_genesymbol, pt_curated$target_genesymbol,sep="_")


In [17]:
# pairwise_pairs_lr holds all the pairwise pair combinations;
# here we keep only the pairs that exist in the PT network
pt_interactions_lr <- pairwise_pairs_lr %>%
    filter(pair %in% pt_lr$pair)

# do the same for the curated

pt_interactions_curated <- pairwise_pairs_curated %>%
    filter(pair %in% pt_curated$pair)

str(pt_interactions_lr)

'data.frame':	983 obs. of  4 variables:
 $ source      : chr  "IL17A" "IL17A" "NPNT" "NPNT" ...
 $ target      : chr  "IL17RA" "IL17RC" "ITGA8" "ITGB1" ...
 $ complex_pair: chr  "IL17A_IL17RA_IL17RC" "IL17A_IL17RA_IL17RC" "NPNT_ITGA8_ITGB1" "NPNT_ITGA8_ITGB1" ...
 $ pair        : chr  "IL17A_IL17RA" "IL17A_IL17RC" "NPNT_ITGA8" "NPNT_ITGB1" ...


**Below we show all the possible pairs produced from the complex pair ```IL17A_IL17RA_IL17RC```, and then the subset that remains after filtering out the pairs that do not exist in the PT database.**

In [18]:
filter(pairwise_pairs_lr, complex_pair=="IL17A_IL17RA_IL17RC")

source,target,complex_pair,pair
<chr>,<chr>,<chr>,<chr>
IL17A,IL17RA,IL17A_IL17RA_IL17RC,IL17A_IL17RA
IL17A,IL17RC,IL17A_IL17RA_IL17RC,IL17A_IL17RC
IL17RA,IL17RC,IL17A_IL17RA_IL17RC,IL17RA_IL17RC
IL17RA,IL17A,IL17A_IL17RA_IL17RC,IL17RA_IL17A
IL17RC,IL17A,IL17A_IL17RA_IL17RC,IL17RC_IL17A
IL17RC,IL17RA,IL17A_IL17RA_IL17RC,IL17RC_IL17RA


In [19]:
filter(pt_interactions_lr, complex_pair=="IL17A_IL17RA_IL17RC")

source,target,complex_pair,pair
<chr>,<chr>,<chr>,<chr>
IL17A,IL17RA,IL17A_IL17RA_IL17RC,IL17A_IL17RA
IL17A,IL17RC,IL17A_IL17RA_IL17RC,IL17A_IL17RC


# Complexes are broken down, now we can combine with the rest of the db

In [20]:
# This function merges the interactions between single proteins from OmniPath with
# the pairs of complex components that were detected through PT_DB.
# Input: the OmniPath data frame and pt_interactions (the complexes broken down into pairs)

merge_single_complex <- function(OmniPath, pt_interactions){
    single_components = filter(OmniPath, !grepl('COMPLEX', target) & !grepl('COMPLEX',source))
    
    single_components <- single_components %>%
      dplyr::select(source_genesymbol, target_genesymbol) %>%
      dplyr::rename(source=source_genesymbol, target=target_genesymbol) %>%
      dplyr::mutate(complex_pair = NA)    
    
    single_components$pair <- paste(single_components$source, single_components$target, sep="_")
    
    # merge the single ones with the complex components that are detected via PT_DB
    complete <- rbind(single_components, pt_interactions)
    
    # remove the duplicated pairs, keeping the first occurrence; the dropped rows come from the complexes
    complete <- complete[ !duplicated(complete[, "pair"], fromLast=FALSE),]
    
    return(complete)
}

In [21]:
complete_lr <- merge_single_complex(lr, pt_interactions_lr)
complete_curated <- merge_single_complex(curated, pt_interactions_curated)

# Protein Descriptions

We use the mygene library to get the protein descriptions.

In [22]:
library(mygene)

Loading required package: GenomicFeatures

Loading required package: BiocGenerics


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min


Loading required package: S4Vectors

Loading required package: stats4


Attaching package: ‘S4Vectors’


The following objects are masked from ‘package:dplyr’:

    first, rename


The following object is masked from ‘package:tidyr’:

    expand



In [23]:
# map_gene_descriptions function:
# 1. maps gene symbols to protein descriptions using the queryMany function
# 2. maps the protein descriptions to the dataset
# 3. reorders columns and renames columns

map_gene_descriptions <- function(complete) {
    # get gene symbols
    gene_symbols <- unique(c(complete$source,complete$target))
    
    prot_descriptions <- queryMany(gene_symbols, scopes = "symbol", 
                              fields = c("name"), 
                              species = "human", as_dataframe = "True")
    
    prot_descriptions <- as.data.frame(prot_descriptions)
    
    #map protein descriptions to complete set

    for (x in 1:nrow(complete)) {
        ligand_symbol=complete[x,]$source
        receptor_symbol=complete[x,]$target
        ligand_description=filter(prot_descriptions, query==ligand_symbol)$name
        receptor_description=filter(prot_descriptions, query==receptor_symbol)$name
        lig_id=filter(anno_raw, genesymbol==ligand_symbol)$uniprot[1]
        rec_id=filter(anno_raw, genesymbol==receptor_symbol)$uniprot[1]

        if (ligand_symbol=="PIK3CD-AS1") {
          lig_id <- "O00329"
        }

    #     if (is.null(receptor_description)) {
    #       receptor_description <- "NA"
    #     }


        complete[x, "ligand.name"] = ligand_description[1]
        complete[x, "receptor.name"] = receptor_description[1]
        complete[x, "partner_a"] = lig_id
        complete[x, "partner_b"] = rec_id
    }
    
    #reorder columns
    complete <- complete[, c("pair", "source", "ligand.name", "target", "receptor.name", "complex_pair",
                             "partner_a","partner_b")]
    #rename column names
    names(complete) <- c("Pair.Name", "Ligand", "Ligand.Name", "Receptor", "Receptor.Name", "complex_pair",
                        "partner_a","partner_b")
    
    return(complete)
}

In [24]:
complete_lr <- map_gene_descriptions(complete_lr)

Querying chunk 1

Querying chunk 2

Querying chunk 3



Finished
Pass returnall=TRUE to return lists of duplicate or missing query terms.


In [25]:
complete_curated <- map_gene_descriptions(complete_curated)

Querying chunk 1

Querying chunk 2



Finished
Pass returnall=TRUE to return lists of duplicate or missing query terms.


In [26]:
head(complete_curated)

Pair.Name,Ligand,Ligand.Name,Receptor,Receptor.Name,complex_pair,partner_a,partner_b
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
EPO_EPOR,EPO,erythropoietin,EPOR,erythropoietin receptor,,P01588,P19235
CXCL16_CXCR6,CXCL16,C-X-C motif chemokine ligand 16,CXCR6,C-X-C motif chemokine receptor 6,,Q9H2A7,O00574
KITLG_KIT,KITLG,KIT ligand,KIT,"KIT proto-oncogene, receptor tyrosine kinase",,P21583,P10721
CXCL9_CXCR3,CXCL9,C-X-C motif chemokine ligand 9,CXCR3,C-X-C motif chemokine receptor 3,,Q07325,P49682
CCL5_CCR5,CCL5,C-C motif chemokine ligand 5,CCR5,C-C motif chemokine receptor 5,,P13501,P51681
CCL8_CCR5,CCL8,C-C motif chemokine ligand 8,CCR5,C-C motif chemokine receptor 5,,P80075,P51681


In [27]:
# filter(previous_db, Receptor=="NOTCH1")

# Append the original structure from OmniPath

In [28]:
lr$pair <- paste(lr$source_genesymbol, lr$target_genesymbol, sep="_")
curated$pair <- paste(curated$source_genesymbol, curated$target_genesymbol, sep="_")

Create a column to merge on. We do this because the complex pairs in our data have been broken down, while they are not broken down in the original data. The new column lets us match and merge the broken-down pairs with the corresponding pairs in the original data.
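A minimal sketch of this key construction on toy data follows. The two small data frames and their values are hypothetical; only the column names `Ligand`, `Receptor`, `complex_pair`, `to_merge` and `pair` follow the notebook.

```r
# Hypothetical decomposed pairs: one from a complex, one simple pair
toy_complete <- data.frame(
  Ligand       = c("IL17A", "EPO"),
  Receptor     = c("IL17RA", "EPOR"),
  complex_pair = c("IL17A_IL17RA_IL17RC", NA),
  stringsAsFactors = FALSE
)

# Hypothetical original table, keyed by the undecomposed pair name
toy_original <- data.frame(
  pair        = c("IL17A_IL17RA_IL17RC", "EPO_EPOR"),
  n_resources = c(5, 9),
  stringsAsFactors = FALSE
)

# Use the complex name as the key when present, else Ligand_Receptor
toy_complete$to_merge <- ifelse(!is.na(toy_complete$complex_pair),
                                toy_complete$complex_pair,
                                paste(toy_complete$Ligand, toy_complete$Receptor, sep = "_"))

merged <- merge(toy_complete, toy_original, by.x = "to_merge", by.y = "pair")
merged[, c("Ligand", "Receptor", "to_merge", "n_resources")]
```

The decomposed pair `IL17A`/`IL17RA` inherits the metadata of its parent complex, while the simple pair is matched by its own name.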

In [29]:
# If the complex pair is not NA, return the complex pair under the to_merge column. 
# else return the ligand and receptor names separated by an underscore
complete_lr <- complete_lr %>% mutate(to_merge = ifelse(!is.na(complex_pair), complex_pair,
                                                  paste(Ligand, Receptor, sep="_")))

complete_curated <- complete_curated %>% mutate(to_merge = ifelse(!is.na(complex_pair), complex_pair,
                                                  paste(Ligand, Receptor, sep="_")))

In [30]:
# Merge the complete_lr data frame with the lr data frame, using the to_merge column 
# in the former and the pair column in the latter
complete_lr <- as.data.frame(merge(complete_lr, lr, by.x = "to_merge", by.y = "pair"))
complete_curated <- as.data.frame(merge(complete_curated, curated, by.x = "to_merge", by.y = "pair"))

In [31]:
# Remove the columns that were used to merge the data
complete_lr <- complete_lr %>% dplyr::select(-to_merge)
complete_lr$annotation_strategy <- "LR"
complete_curated <- complete_curated %>% dplyr::select(-to_merge)
complete_curated$annotation_strategy <- "curated"

# Tagging curated ones

In [32]:
complete <- rbind(complete_lr, complete_curated)

In [33]:
head(complete)

,Pair.Name,Ligand,Ligand.Name,Receptor,Receptor.Name,complex_pair,partner_a,partner_b,source,target,⋯,is_inhibition,consensus_direction,consensus_stimulation,consensus_inhibition,sources,references,curation_effort,n_references,n_resources,annotation_strategy
,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<int>,<chr>
1,A2M_LRP1,A2M,alpha-2-macroglobulin,LRP1,LDL receptor related protein 1,,P01023,Q07954,P01023,Q07954,⋯,0,1,1,0,AlzPathway;Baccin2019;CellTalkDB;EMBRACE;Fantom5_LRdb;HPMR_LRdb;HPMR_talklr;HPRD;HPRD_LRdb;HPRD_talklr;LRdb;Ramilowski2015;Ramilowski2015_Baccin2019;STRING_talklr;Wang;connectomeDB2020;iTALK;talklr,AlzPathway:19026743;Baccin2019:10652313;Baccin2019:12194978;Baccin2019:1702392;CellTalkDB:10652313;HPRD:10652313;HPRD:12194978;LRdb:10652313;connectomeDB2020:10652313;connectomeDB2020:12194978;connectomeDB2020:1702392,11,4,11,LR
2,AANAT_MTNR1A,AANAT,aralkylamine N-acetyltransferase,MTNR1A,melatonin receptor 1A,,Q16613,P48039,Q16613,P48039,⋯,0,1,1,0,Baccin2019;CellTalkDB;Fantom5_LRdb;HPMR;HPMR_LRdb;HPMR_talklr;LRdb;Ramilowski2015;Ramilowski2015_Baccin2019;Wang;connectomeDB2020;iTALK;talklr,Baccin2019:12943195;CellTalkDB:12943195;HPMR:12943195;LRdb:12943195;connectomeDB2020:12943195,5,1,9,LR
3,AANAT_MTNR1B,AANAT,aralkylamine N-acetyltransferase,MTNR1B,melatonin receptor 1B,,Q16613,P49286,Q16613,P49286,⋯,0,1,1,0,Baccin2019;CellTalkDB;Fantom5_LRdb;HPMR_LRdb;HPMR_talklr;LRdb;Ramilowski2015;Ramilowski2015_Baccin2019;Wang;connectomeDB2020;iTALK;talklr,Baccin2019:12943195;CellTalkDB:12943195;LRdb:12943195;connectomeDB2020:12943195,4,1,8,LR
4,ABCA1_SHANK1,ABCA1,ATP binding cassette subfamily A member 1,SHANK1,SH3 and multiple ankyrin repeat domains 1,,O95477,Q9Y566,O95477,Q9Y566,⋯,0,0,0,0,Baccin2019;HPRD;Ramilowski2015_Baccin2019,HPRD:16192279,1,1,2,LR
5,ACE_AGTR2,ACE,angiotensin I converting enzyme,AGTR2,angiotensin II receptor type 2,,P12821,P50052,P12821,P50052,⋯,0,0,0,0,Baccin2019;CellTalkDB;Fantom5_LRdb;HPRD;HPRD_LRdb;HPRD_talklr;LRdb;Ramilowski2015;Ramilowski2015_Baccin2019;iTALK;talklr,Baccin2019:11459796;HPRD:11459796;LRdb:11459796,3,1,7,LR
6,ACE_BDKRB2,ACE,angiotensin I converting enzyme,BDKRB2,bradykinin receptor B2,,P12821,P30411,P12821,P30411,⋯,0,0,0,0,Baccin2019;CellTalkDB;EMBRACE;Fantom5_LRdb;HPRD;HPRD_LRdb;HPRD_talklr;LRdb;Lit-BM-17;Ramilowski2015;Ramilowski2015_Baccin2019;connectomeDB2020;iTALK;talklr,Baccin2019:10748135;CellTalkDB:10748135;HPRD:10748135;LRdb:10748135;Lit-BM-17:17077303;connectomeDB2020:10748135,6,2,10,LR


In [45]:
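# Tag pairs that occur in both the LR and the curated sets as "both";
# all other pairs keep their annotation_strategy value
complete <- complete %>% 
       mutate(db = replace(annotation_strategy, duplicated(Pair.Name) | 
                              duplicated(Pair.Name, fromLast = TRUE), "both"))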

In [46]:
# Keep one row per pair
complete <- complete %>% filter(!duplicated(complete[, "Pair.Name"]))
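The tagging logic of the two cells above can be checked on a toy table. The rows below are hypothetical; the point is that combining `duplicated()` in both scan directions marks every copy of a repeated pair, not just the later ones.

```r
# Hypothetical stacked table: LR rows first, curated rows second
toy <- data.frame(
  Pair.Name           = c("EPO_EPOR", "ACE_AGTR2", "EPO_EPOR", "KITLG_KIT"),
  annotation_strategy = c("LR", "LR", "curated", "curated"),
  stringsAsFactors = FALSE
)

# A pair is in both sets if it is a duplicate scanning forwards or backwards
in_both <- duplicated(toy$Pair.Name) | duplicated(toy$Pair.Name, fromLast = TRUE)
toy$db <- replace(toy$annotation_strategy, in_both, "both")

# Keep one row per pair (the first occurrence)
toy <- toy[!duplicated(toy$Pair.Name), ]
toy
```

After deduplication the surviving row of a shared pair still carries `annotation_strategy == "LR"` (it came first in the stack), while `db` records that the pair is present in both sets.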

In [53]:
filter(complete, db=="LR")

Pair.Name,Ligand,Ligand.Name,Receptor,Receptor.Name,complex_pair,partner_a,partner_b,source,target,⋯,consensus_direction,consensus_stimulation,consensus_inhibition,sources,references,curation_effort,n_references,n_resources,annotation_strategy,db
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<int>,<chr>,<chr>
ABCA1_SHANK1,ABCA1,ATP binding cassette subfamily A member 1,SHANK1,SH3 and multiple ankyrin repeat domains 1,,O95477,Q9Y566,O95477,Q9Y566,⋯,0,0,0,Baccin2019;HPRD;Ramilowski2015_Baccin2019,HPRD:16192279,1,1,2,LR,LR
ACE_AGTR2,ACE,angiotensin I converting enzyme,AGTR2,angiotensin II receptor type 2,,P12821,P50052,P12821,P50052,⋯,0,0,0,Baccin2019;CellTalkDB;Fantom5_LRdb;HPRD;HPRD_LRdb;HPRD_talklr;LRdb;Ramilowski2015;Ramilowski2015_Baccin2019;iTALK;talklr,Baccin2019:11459796;HPRD:11459796;LRdb:11459796,3,1,7,LR,LR
ACE2_SLC6A19,ACE2,angiotensin converting enzyme 2,SLC6A19,solute carrier family 6 member 19,,Q9BYF1,Q695T7,Q9BYF1,Q695T7,⋯,0,0,0,CellTalkDB;IntAct,IntAct:32132184;IntAct:34189428,2,2,2,LR,LR
ACE2_TIGIT,ACE2,angiotensin converting enzyme 2,TIGIT,T cell immunoreceptor with Ig and ITIM domains,,Q9BYF1,Q495A1,Q9BYF1,Q495A1,⋯,0,0,0,Cellinker,Cellinker:32589946,1,1,1,LR,LR
ACKR1_CCR5,ACKR1,atypical chemokine receptor 1 (Duffy blood group),CCR5,C-C motif chemokine receptor 5,,Q16570,P51681,Q16570,P51681,⋯,0,0,0,Cellinker,Cellinker:29637711,1,1,1,LR,LR
ACKR3_CXCR4,ACKR3,atypical chemokine receptor 3,CXCR4,C-X-C motif chemokine receptor 4,,P25106,P61073,P25106,P61073,⋯,0,0,0,Cellinker;IntAct,Cellinker:29637711;IntAct:19380869;IntAct:21730065;IntAct:25775528;IntAct:27331810;IntAct:28862946;IntAct:29386406,7,7,2,LR,LR
ACP4_VSTM2B,ACP4,acid phosphatase 4,VSTM2B,V-set and transmembrane domain containing 2B,,Q9BZG2,A6NLU5,Q9BZG2,A6NLU5,⋯,0,0,0,Cellinker,Cellinker:32589946,1,1,1,LR,LR
ACVR2B_DLK1,ACVR2B,activin A receptor type 2B,DLK1,delta like non-canonical Notch ligand 1,,Q13705,P80370,Q13705,P80370,⋯,0,0,0,Cellinker,Cellinker:32589946,1,1,1,LR,LR
ACVRL1_ACVR2A,ACVRL1,activin A receptor like type 1,ACVR2A,activin A receptor type 2A,,P37023,P27037,P37023,P27037,⋯,0,0,0,Cellinker;DIP;HPRD;Wang,Cellinker:30761306;DIP:10187774;DIP:8612709;HPRD:10187774,4,3,4,LR,LR
ACVRL1_CDH5,ACVRL1,activin A receptor like type 1,CDH5,cadherin 5,,P37023,P33151,P37023,P33151,⋯,0,0,0,Cellinker,Cellinker:18337748,1,1,1,LR,LR


In [None]:
#this column is needed when building CellPhoneDB
# concatanated$annotation_strategy <- ifelse(concatanated$curated == TRUE, "OmniPath_curated", "OmniPath")

In [57]:
write.csv(complete, "L_R_OmniPathFull.csv", row.names=FALSE)

In [56]:
filter(complete,Ligand=="PIK3CD-AS1")

Pair.Name,Ligand,Ligand.Name,Receptor,Receptor.Name,complex_pair,partner_a,partner_b,source,target,⋯,consensus_direction,consensus_stimulation,consensus_inhibition,sources,references,curation_effort,n_references,n_resources,annotation_strategy,db
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<int>,<chr>,<chr>
PIK3CD-AS1_LY6G6C,PIK3CD-AS1,PIK3CD antisense RNA 1,LY6G6C,lymphocyte antigen 6 family member G6C,,O00329,O95867,Q5SR53,O95867,⋯,0,0,0,Fantom5_LRdb;LRdb;iTALK,,0,0,2,LR,LR
PIK3CD-AS1_SLC16A4,PIK3CD-AS1,PIK3CD antisense RNA 1,SLC16A4,solute carrier family 16 member 4,,O00329,O15374,Q5SR53,O15374,⋯,0,0,0,Fantom5_LRdb;LRdb;iTALK,,0,0,2,LR,LR


In [None]:
# concatanated[concatanated == "PIK3CD-AS1"] <- "PIK3CD"

In [None]:
# lr[lr == "PIK3CD-AS1"] <- "PIK3CD"