# LR Enrichment Analysis

This notebook is a Julia port of the ligand-receptor (LR) enrichment analysis from ISCHIA. For now the code is ported as-is; over time I'll try to make it more efficient.

In [1]:
using Muon
using RData
using RCall
using Random
using DataFrames
using Statistics
using Combinatorics

In [None]:
mdata = readh5mu("../data/mudata.h5mu")
lr_network = load("../data/lr_network.rds")

mdata

In [None]:
gene_names = mdata["Spatial"].var.name
mdata["Spatial"].var_names = gene_names

# Create LR_Pairs column
lr_network[!, :LR_Pairs] = string.(lr_network.from, "_", lr_network.to);
lr_network = lr_network[:, [:from, :to, :LR_Pairs]]

# Filter lr_network based on conditions
from_filter = in.(lr_network[:, :from], Ref(gene_names))
to_filter = in.(lr_network[:, :to], Ref(gene_names))
all_LR_network = lr_network[from_filter .& to_filter, :]

# To reduce the computation time for this example, we keep only a subset of the LR interactions.
# Uncomment the next line to shuffle the rows first, so that the slice below is a random sample.

# all_LR_network = all_LR_network[shuffle(1:size(all_LR_network, 1)), :]
all_LR_network = all_LR_network[2000:min(4000, end), :]

# Extract unique genes and common genes
all_LR_genes = unique(vcat(all_LR_network[:, :from], all_LR_network[:, :to]))
all_LR_genes_comm = intersect(all_LR_genes, collect(gene_names));

# Create LR.pairs and LR.pairs.AllCombos
LR_pairs = all_LR_network[:, :LR_Pairs]
all_combos = [join(combo, "_") for combo in combinations(all_LR_genes_comm, 2)];
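
As a small illustration of the filtering step above, the following sketch builds the `LR_Pairs` column and keeps only the pairs whose ligand and receptor both appear in the gene list. The gene list and the three pairs here are made up for the example and are not taken from the dataset.

```julia
using DataFrames

# Toy inputs (hypothetical, for illustration only)
toy_gene_names = ["C3", "CXCR4", "AGRN", "WNT4"]
lr_toy = DataFrame(from=["C3", "AGRN", "TGFB1"], to=["CXCR4", "LRP4", "WNT4"])

# Same construction as in the cell above: "<ligand>_<receptor>"
lr_toy[!, :LR_Pairs] = string.(lr_toy.from, "_", lr_toy.to)

# Keep a pair only if both of its genes are measured
keep = in.(lr_toy.from, Ref(toy_gene_names)) .& in.(lr_toy.to, Ref(toy_gene_names))
lr_kept = lr_toy[keep, :]

println(lr_kept.LR_Pairs)  # only "C3_CXCR4" survives the filter
```

Note that a pair is dropped as soon as one of its two genes is missing, which is why `AGRN_LRP4` and `TGFB1_WNT4` are both removed here.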

In [53]:
using ISCHIA

function new_enriched_LRs(
    adata::AnnData, COI::Vector{String}, Condition::Vector{String}, 
    LR_list::Vector{String}, LR_pairs::Vector{String}, 
    exp_th::Real, corr_th::Real)

    println("Preparing L-R presence/absence matrix")

    # Extract the expression matrix from spatial_object
    spatial_object_exp = adata.layers["counts"]
    spatial_object_exp_norm = adata.X

    # Subset the expression matrix for the interested ligands and receptors
    spatial_obj_exp_LR_subset_raw = adata[:, in.(adata.var.name, Ref(LR_list))]

    # Binarize the expression matrix based on the expression threshold
    spatial_obj_exp_LR_subset_raw_binary = spatial_obj_exp_LR_subset_raw.layers["counts"] .> exp_th
    spatial_obj_exp_LR_subset_raw.layers["binary"] = spatial_obj_exp_LR_subset_raw_binary

    LR_subset_raw_binary_mask_col = vec(sum(spatial_obj_exp_LR_subset_raw_binary, dims=1) .> 0)
    LR_subset_raw_binary_mask_row = vec(sum(spatial_obj_exp_LR_subset_raw_binary, dims=2) .> 0)

    LR_presence_absence = spatial_obj_exp_LR_subset_raw[LR_subset_raw_binary_mask_row, LR_subset_raw_binary_mask_col]
    LR_presence_absence_mat = LR_presence_absence.layers["binary"]

    # Filter spots based on COI and Condition
    mask = (adata.obs[:, "CompositionCluster_CC"] .∈ Ref(COI)) .& (adata.obs[:, "orig.ident"] .∈ Ref(Condition))
    COI_spots = adata.obs_names[mask]
    rest_of_spots = setdiff(adata.obs_names, COI_spots)

    println("Calculating L-R pairs correlation")
    COI_cors_adata = spatial_obj_exp_LR_subset_raw[mask, :]
    COI_cors = cor(Array(COI_cors_adata.layers["counts"]))
    COI_cors[isnan.(COI_cors)] .= 0.0

    println("Preparing for cooccurrence")
    common_spots = intersect(LR_presence_absence.obs_names, COI_spots)
    coocur_COI = LR_presence_absence[common_spots, :]
    coocur_COI_exp = DataFrame(Matrix(transpose(coocur_COI.layers["binary"])), common_spots)
    
    println("Cooccurrence calculation starts...")
    cooccur_COI_res = calculate_cooccurrence_stats(Matrix(coocur_COI_exp), coocur_COI.var.name; spp_names=true)
    println("Cooccurrence calculation ended")

    println("Summary of cooccurrence results:")
    # display(R"summary(cooccur_COI_res)")

    println("Probability table of cooccurrence results:")
    # display(R"library(cooccur); prob.table(cooccur_COI_res)")

    cooccur_res_df = cooccur_COI_res[:results]
    # Add a 'pair' column to the result DataFrame
    cooccur_res_df[!, :pair12] = string.(cooccur_res_df[!, :sp1_name], "_", cooccur_res_df[!, :sp2_name])
    cooccur_res_df[!, :pair21] = string.(cooccur_res_df[!, :sp2_name], "_", cooccur_res_df[!, :sp1_name])

    all_cooccur_pairs = Set([cooccur_res_df.pair12; cooccur_res_df.pair21])
    common_pairs = intersect(LR_pairs, all_cooccur_pairs)

    COI_enriched_LRs = DataFrame(from=String[], to=String[], correlation=Float64[], ligand_FC=Float64[], Receptor_FC=Float64[])
    pair_count = 0
    for pair in common_pairs
        pair_count += 1
        println("$pair_count / $(length(common_pairs))")

        # Split the LR pair into individual ligand and receptor
        LR_pair_words = split(pair, "_")
        LR_pair_ligand = String(LR_pair_words[1])
        LR_pair_Receptor = String(LR_pair_words[2])
        
        # Mean expression of the ligand in the Cluster of Interest (COI) spots and rest of the spots
        ligand_exp_COI_mean = mean(adata[COI_spots, LR_pair_ligand].X)
        ligand_exp_otherspots_mean = mean(adata[rest_of_spots, LR_pair_ligand].X)
        # Calculate the ligand fold change (FC) by dividing COI mean by rest of the spots mean
        ligand_FC = round(ligand_exp_COI_mean / ligand_exp_otherspots_mean, digits=4)
        
        Receptor_exp_COI_mean = mean(adata[COI_spots, LR_pair_Receptor].X)
        Receptor_exp_otherspots_mean = mean(adata[rest_of_spots, LR_pair_Receptor].X)
        Receptor_FC = round(Receptor_exp_COI_mean / Receptor_exp_otherspots_mean, digits=4)
        
        # Retrieve the p-value for the pair from the co-occurrence results DataFrame
        pair_p = cooccur_res_df[(cooccur_res_df.pair12 .== pair) .| (cooccur_res_df.pair21 .== pair), :p_gt][1]

        # Find the indices of the ligand and receptor in the COI correlation matrix
        ligand_index = findfirst(==(LR_pair_ligand), COI_cors_adata.var_names)
        receptor_index = findfirst(==(LR_pair_Receptor), COI_cors_adata.var_names)

        # Check if the pair is significant (p-value < 0.05) and the correlation is above the threshold
        if pair_p < 0.05 && COI_cors[ligand_index, receptor_index] > corr_th
            added_row = DataFrame(from=[LR_pair_ligand], to=[LR_pair_Receptor], correlation=[COI_cors[ligand_index, receptor_index]], ligand_FC=[ligand_FC], Receptor_FC=[Receptor_FC])
            append!(COI_enriched_LRs, added_row)
        end
    end

    # Sort the enriched LRs by correlation in decreasing order
    sort!(COI_enriched_LRs, :correlation, rev=true)

    # Add a 'pair' column to the enriched LRs DataFrame
    COI_enriched_LRs[!, :pair] = string.(COI_enriched_LRs[!, :from], "_", COI_enriched_LRs[!, :to])

    Output_dict = Dict("enriched_LRs" => COI_enriched_LRs, "cooccurrence_table" => cooccur_COI_res)
    return Output_dict
end

new_enriched_LRs (generic function with 1 method)
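
To make the first half of the function easier to follow, here is a minimal sketch of the binarization, the row and column masks, and the NaN handling of the correlation matrix, on a made-up 4 spots × 3 genes count matrix. The variable names are shortened for the example and the numbers are not from the dataset.

```julia
using Statistics

# Toy counts: rows are spots, columns are genes (hypothetical values)
counts = [0 3 0;
          2 5 0;
          0 4 0;
          1 0 0]
exp_th = 1

# Presence/absence: a gene is "present" in a spot if its count exceeds exp_th
binary = counts .> exp_th

# Drop genes that are present nowhere and spots that contain no present gene
col_mask = vec(sum(binary, dims=1) .> 0)
row_mask = vec(sum(binary, dims=2) .> 0)
presence_absence = binary[row_mask, col_mask]

# A gene with constant counts has zero variance, so its correlations are NaN;
# the function replaces those entries with 0.0
cors = cor(Float64.(counts))
cors[isnan.(cors)] .= 0.0

println(size(presence_absence))  # (3, 2)
```

The threshold is strict (`>`), so with `exp_th = 1` a count of exactly 1 is treated as absent; that is why the last spot is dropped in this toy case.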

In [54]:
spatial_object = mdata["Spatial"]
COI = ["CC4"]
Condition = unique(spatial_object.obs[!, "orig.ident"])
LR_list = all_LR_genes_comm
LR_pairs = LR_pairs
exp_th = 1
corr_th = 0.2

out = new_enriched_LRs(spatial_object, COI, Condition, LR_list, LR_pairs, exp_th, corr_th)

Preparing L-R presence/absence matrix
Calculating L-R pairs correlation
Preparing for cooccurrence
Cooccurrence calculation starts...
Cooccurrence calculation ended
Summary of cooccurrence results:
Probability table of cooccurrence results:
1 / 2
2 / 2

Dict{String, Any} with 2 entries:
  "cooccurrence_table" => Dict{Symbol, Any}(:percent_sig=>21.2766, :pairs=>47, …
  "enriched_LRs"       => 1×6 DataFrame…

In [56]:
out["cooccurrence_table"]

Dict{Symbol, Any} with 15 entries:
  :percent_sig          => 21.2766
  :pairs                => 47
  :pot_pairs            => 32640
  :spp_names            => ["AGRN", "TNFRSF14", "TNFRSF1B", "EPHA2", "WNT4", "E…
  :sites                => [13 13 … 13 13; 13 13 … 13 13; … ; 13 13 … 13 13; 13…
  :species              => 256
  :true_rand_classifier => 0.1
  :negative             => 0
  :co_occurrences       => 10
  :random               => 27
  :unclassifiable       => 10
  :results              => 47×13 DataFrame…
  :positive             => 10
  :spp_key              => 256×2 DataFrame…
  :omitted              => 32593
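
To give some intuition for the numbers in this table, here is a rough sketch of how a `p_gt` value can be computed, assuming the co-occurrence test follows the hypergeometric model used by the R `cooccur` package. This is my reading rather than something confirmed by the ISCHIA source; `p_gt_hypergeom` is a hypothetical helper, not part of ISCHIA, and I have only checked it against two rows of the R probability table printed further down in this notebook.

```julia
# Hypothetical helper: probability that two species co-occur at `obs` or more
# sites, given that they occur at `inc1` and `inc2` of `n_sites` sites.
function p_gt_hypergeom(n_sites::Int, inc1::Int, inc2::Int, obs::Int)
    total = binomial(n_sites, inc2)
    lower = max(obs, inc1 + inc2 - n_sites, 0)
    upper = min(inc1, inc2)
    # Number of ways to place species 2 so that it overlaps species 1 at exactly j sites
    ways = sum(binomial(inc1, j) * binomial(n_sites - inc1, inc2 - j)
               for j in lower:upper; init=0)
    return ways / total
end

# Row with sp1_inc = 4, sp2_inc = 4, obs_cooccur = 3 over 13 sites
println(round(p_gt_hypergeom(13, 4, 4, 3), digits=5))  # 0.05175

# Row with sp1_inc = 4, sp2_inc = 5, obs_cooccur = 4 over 13 sites
println(round(p_gt_hypergeom(13, 4, 5, 4), digits=5))  # 0.00699

# :pot_pairs is the number of unordered pairs among the 256 species
println(binomial(256, 2))  # 32640
```

With only 13 sites, most pairs have an expected co-occurrence below 1, which is consistent with 32593 of the 32640 potential pairs being omitted.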

In [53]:
"""
Calculate significant co-occurring Ligand-Receptor pairs.

This function calculates co-occurring Ligand-Receptor (LR) pairs that are statistically significant based on expression levels and correlations in a spatial dataset.

Parameters:
- `adata::AnnData`: The (spatial) anndata dataset containing expression data.
- `COI::Vector{String}`: Cluster of Interest, a subset of spots to focus on.
- `Condition::Vector{String}`: Condition of interest within the dataset.
- `LR_list::Vector{String}`: List of ligands and receptors to consider.
- `LR_pairs::Vector{String}`: List of LR pairs to analyze.
- `exp_th::Real`: Expression threshold for binarizing the expression matrix.
- `corr_th::Real`: Correlation threshold for LR pairs.

Returns:
A dictionary containing:
- `"enriched_LRs"`: DataFrame of enriched LR pairs.
- `"cooccurrence_table"`: Co-occurrence analysis results.

"""
function enriched_LRs(
    adata::AnnData, COI::Vector{String}, Condition::Vector{String}, 
    LR_list::Vector{String}, LR_pairs::Vector{String}, 
    exp_th::Real, corr_th::Real)

    println("Preparing L-R presence/absence matrix")

    # Extract the expression matrix from spatial_object
    spatial_object_exp = adata.layers["counts"]
    spatial_object_exp_norm = adata.X

    # Subset the expression matrix for the interested ligands and receptors
    spatial_obj_exp_LR_subset_raw = adata[:, in.(adata.var.name, Ref(LR_list))]

    # Binarize the expression matrix based on the expression threshold
    spatial_obj_exp_LR_subset_raw_binary = spatial_obj_exp_LR_subset_raw.layers["counts"] .> exp_th
    spatial_obj_exp_LR_subset_raw.layers["binary"] = spatial_obj_exp_LR_subset_raw_binary

    LR_subset_raw_binary_mask_col = vec(sum(spatial_obj_exp_LR_subset_raw_binary, dims=1) .> 0)
    LR_subset_raw_binary_mask_row = vec(sum(spatial_obj_exp_LR_subset_raw_binary, dims=2) .> 0)

    LR_presence_absence = spatial_obj_exp_LR_subset_raw[LR_subset_raw_binary_mask_row, LR_subset_raw_binary_mask_col]
    LR_presence_absence_mat = LR_presence_absence.layers["binary"]

    # Filter spots based on COI and Condition
    mask = (adata.obs[:, "CompositionCluster_CC"] .∈ Ref(COI)) .& (adata.obs[:, "orig.ident"] .∈ Ref(Condition))
    COI_spots = adata.obs_names[mask]
    rest_of_spots = setdiff(adata.obs_names, COI_spots)

    println("Calculating L-R pairs correlation")
    COI_cors_adata = spatial_obj_exp_LR_subset_raw[mask, :]
    COI_cors = cor(Array(COI_cors_adata.layers["counts"]))
    COI_cors[isnan.(COI_cors)] .= 0.0

    println("Preparing for cooccurrence")
    common_spots = intersect(LR_presence_absence.obs_names, COI_spots)
    coocur_COI = LR_presence_absence[common_spots, :]
    coocur_COI_exp = DataFrame(Matrix(transpose(coocur_COI.layers["binary"])), common_spots)
    
    println("Cooccurrence calculation starts...")
    coocur_COI_exp_row_names = coocur_COI.var.name
    @rput coocur_COI_exp coocur_COI_exp_row_names
    R"""
    row.names(coocur_COI_exp) <- coocur_COI_exp_row_names
    cooccur_COI_res = ISCHIA.cooccur(mat=coocur_COI_exp, type="spp_site", thresh=TRUE, spp_names=TRUE)
    """
    @rget cooccur_COI_res
    println("Cooccurrence calculation ended")

    println("Summary of cooccurrence results:")
    display(R"summary(cooccur_COI_res)")

    println("Probability table of cooccurrence results:")
    display(R"library(cooccur); prob.table(cooccur_COI_res)")

    cooccur_res_df = cooccur_COI_res[:results]
    # Add a 'pair' column to the result DataFrame
    cooccur_res_df[!, :pair12] = string.(cooccur_res_df[!, :sp1_name], "_", cooccur_res_df[!, :sp2_name])
    cooccur_res_df[!, :pair21] = string.(cooccur_res_df[!, :sp2_name], "_", cooccur_res_df[!, :sp1_name])

    all_cooccur_pairs = Set([cooccur_res_df.pair12; cooccur_res_df.pair21])
    common_pairs = intersect(LR_pairs, all_cooccur_pairs)

    COI_enriched_LRs = DataFrame(from=String[], to=String[], correlation=Float64[], ligand_FC=Float64[], Receptor_FC=Float64[])
    pair_count = 0
    for pair in common_pairs
        pair_count += 1
        println("$pair_count / $(length(common_pairs))")

        # Split the LR pair into individual ligand and receptor
        LR_pair_words = split(pair, "_")
        LR_pair_ligand = String(LR_pair_words[1])
        LR_pair_Receptor = String(LR_pair_words[2])
        
        # Mean expression of the ligand in the Cluster of Interest (COI) spots and rest of the spots
        ligand_exp_COI_mean = mean(adata[COI_spots, LR_pair_ligand].X)
        ligand_exp_otherspots_mean = mean(adata[rest_of_spots, LR_pair_ligand].X)
        # Calculate the ligand fold change (FC) by dividing COI mean by rest of the spots mean
        ligand_FC = round(ligand_exp_COI_mean / ligand_exp_otherspots_mean, digits=4)
        
        Receptor_exp_COI_mean = mean(adata[COI_spots, LR_pair_Receptor].X)
        Receptor_exp_otherspots_mean = mean(adata[rest_of_spots, LR_pair_Receptor].X)
        Receptor_FC = round(Receptor_exp_COI_mean / Receptor_exp_otherspots_mean, digits=4)
        
        # Retrieve the p-value for the pair from the co-occurrence results DataFrame
        pair_p = cooccur_res_df[(cooccur_res_df.pair12 .== pair) .| (cooccur_res_df.pair21 .== pair), :p_gt][1]

        # Find the indices of the ligand and receptor in the COI correlation matrix
        ligand_index = findfirst(==(LR_pair_ligand), COI_cors_adata.var_names)
        receptor_index = findfirst(==(LR_pair_Receptor), COI_cors_adata.var_names)

        # Check if the pair is significant (p-value < 0.05) and the correlation is above the threshold
        if pair_p < 0.05 && COI_cors[ligand_index, receptor_index] > corr_th
            added_row = DataFrame(from=[LR_pair_ligand], to=[LR_pair_Receptor], correlation=[COI_cors[ligand_index, receptor_index]], ligand_FC=[ligand_FC], Receptor_FC=[Receptor_FC])
            append!(COI_enriched_LRs, added_row)
        end
    end

    # Sort the enriched LRs by correlation in decreasing order
    sort!(COI_enriched_LRs, :correlation, rev=true)

    # Add a 'pair' column to the enriched LRs DataFrame
    COI_enriched_LRs[!, :pair] = string.(COI_enriched_LRs[!, :from], "_", COI_enriched_LRs[!, :to])

    Output_dict = Dict("enriched_LRs" => COI_enriched_LRs, "cooccurrence_table" => cooccur_COI_res)
    return Output_dict
end

enriched_LRs
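
One possible efficiency improvement, sketched here under the assumption that the results table has the `sp1_name`, `sp2_name` and `p_gt` columns used in the function: instead of scanning the whole results table for every pair inside the loop, the p-values can be stored once in a dictionary keyed by both orientations of each pair. The vectors below are toy stand-ins for those columns, and `p_lookup` is a name introduced only for this example.

```julia
# Toy stand-ins for the :sp1_name, :sp2_name and :p_gt columns (hypothetical values)
sp1_name = ["C3", "AGRN"]
sp2_name = ["CXCR4", "WNT4"]
p_gt = [0.01, 0.40]

# Build the lookup once; both "A_B" and "B_A" map to the same p-value
p_lookup = Dict{String,Float64}()
for (a, b, p) in zip(sp1_name, sp2_name, p_gt)
    p_lookup[string(a, "_", b)] = p
    p_lookup[string(b, "_", a)] = p
end

# Candidate LR pairs; the last one was never tested for co-occurrence
toy_LR_pairs = ["CXCR4_C3", "AGRN_WNT4", "TGFB1_WNT4"]
toy_common_pairs = filter(p -> haskey(p_lookup, p), toy_LR_pairs)

for pair in toy_common_pairs
    println(pair, " => ", p_lookup[pair])  # constant-time lookup per pair
end
```

This replaces the `pair12`/`pair21` columns and the per-pair boolean mask with a single pass over the results, which should matter more as the number of analysed pairs grows.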

In [54]:
spatial_object = mdata["Spatial"]
COI = ["CC4"]
Condition = unique(spatial_object.obs[!, "orig.ident"])
LR_list = all_LR_genes_comm
LR_pairs = LR_pairs
exp_th = 1
corr_th = 0.2


0.2

In [55]:
out = enriched_LRs(spatial_object, COI, Condition, LR_list, LR_pairs, exp_th, corr_th)

Preparing L-R presence/absence matrix
Calculating L-R pairs correlation
Preparing for cooccurrence
Cooccurrence calculation starts...



RObject{RealSxp}
       Species          Sites       Positive       Negative         Random 
         256.0           13.0           10.0            0.0           27.0 
Unclassifiable Non-random (%) 
          10.0           21.3 
attr(,"class")
[1] "summary.cooccur"


RObject{VecSxp}
   sp1 sp2 sp1_inc sp2_inc obs_cooccur prob_cooccur exp_cooccur    p_lt    p_gt
1   35  54       4       5           3        0.118         1.5 0.99301 0.11888
2   35 134       4       4           3        0.095         1.2 0.99860 0.05175
3   35 135       4       5           3        0.118         1.5 0.99301 0.11888
4   35 168       4       4           2        0.095         1.2 0.94825 0.35385
5   35 180       4       4           2        0.095         1.2 0.94825 0.35385
6   35 194       4       8           3        0.189         2.5 0.90210 0.48951
7   35 229       4       5           4        0.118         1.5 1.00000 0.00699
8   54  89       5       3           2        0.089         1.2 0.96503 0.31469
9   54 121       5       3           2        0.089         1.2 0.96503 0.31469
10  54 134       5       4           3        0.118         1.5 0.99301 0.11888
11  54 135       5       5           4        0.148         1.9 0.99922 0.03186
12  54 160       5      

Cooccurrence calculation ended
Summary of cooccurrence results:
Call:
ISCHIA.cooccur(mat = coocur_COI_exp, type = "spp_site", 
    thresh = TRUE, spp_names = TRUE)

Of 32640 species pair combinations, 32593 pairs (99.86 %) were removed from the analysis because expected co-occurrence was < 1 and 47 pairs were analyzed

Cooccurrence Summary:
Probability table of cooccurrence results:


│   The co-occurrence model was run using 'thresh = TRUE.' The probability table may not include all species pairs
└ @ RCall C:\Users\mraadam\.julia\packages\RCall\gOwEW\src\io.jl:172


1 / 2
2 / 2


Dict{String, Any} with 2 entries:
  "cooccurrence_table" => OrderedDict{Symbol, Any}(:call=>:(var"ISCHIA.cooccur"…
  "enriched_LRs"       => 1×6 DataFrame…

In [45]:
out["enriched_LRs"]

1×6 DataFrame
 Row │ from    to      correlation  ligand_FC  Receptor_FC  pair
     │ String  String  Float64      Float64    Float64      String
─────┼───────────────────────────────────────────────────────────────
   1 │ C3      CXCR4       0.79157     1.3346       3.3869  C3_CXCR4
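
The `ligand_FC` and `Receptor_FC` columns are ratios of mean expression inside the cluster of interest to mean expression in all remaining spots. A minimal sketch of that calculation on made-up values (the vectors below are not from the dataset):

```julia
using Statistics

# Toy normalized expression of one gene across five spots (hypothetical values)
expr = [4.0, 2.0, 1.0, 1.0, 2.0]
# Which spots belong to the cluster of interest
in_coi = [true, true, false, false, false]

coi_mean = mean(expr[in_coi])      # mean over COI spots
rest_mean = mean(expr[.!in_coi])   # mean over all other spots

# Same rounding as in the function
fold_change = round(coi_mean / rest_mean, digits=4)
println(fold_change)  # 2.25
```

If a gene is not expressed at all outside the cluster, `rest_mean` is zero and the ratio becomes `Inf` (or `NaN` when both means are zero), so such rows may need special handling.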


In [46]:
typeof(mdata["Spatial"])

AnnData