# Nichesphere differential co-localization tutorial

Nichesphere is an scverse-compatible Python library that finds differentially co-localized cellular niches, and the biological processes involved in their interactions, based on cell type pair co-localization probabilities in different conditions. These probabilities can be obtained in different ways: from deconvoluted 10x Visium / PIC-seq data (probabilities of finding each cell type in each spot / multiplet), or by counting cell boundary overlaps for each cell type pair in single-cell spatial data (MERFISH, CODEX, ...). This tutorial focuses on defining groups of cells that converge or split in disease (ischemia) based on differential co-localization.

Nichesphere can also examine localized differential cell-cell communication based on ligand-receptor pair expression data, such as results from CrossTalkeR [ref]. This is addressed in the localized differential communication tutorial.


## 1. Libraries and functions

In [None]:
import pandas as pd
import scipy
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import networkx as nx
import warnings
import scanpy as sc
import mudata as md
import numpy as np
from community_layout.layout_class import CommunityLayout
warnings.filterwarnings("ignore")
import os
import math

import nichesphere

## 2. Data at first glance

In this example we will use data from the Myocardial Infarction atlas from Kuppe, C. et al., 2022.

In [None]:
mudata=md.read('heart_MI_ST_SC_23samples.h5mu')
mudata

This is a subset with 23 samples (samples with fewer than 1500 cells in the snRNA-seq data were filtered out) and 33 different cell subtypes.

In [None]:
mudata['sc'].obsm['umap']=mudata['sc'].obsm['X_umap_harmony']
sc.pl.umap(mudata['sc'], 
           color=['cell_type', 'cell_subtype2'], wspace=0.3)

In this case, we will get cell type co-localization probabilities from **deconvoluted Visium** data (Cell type probabilities per spot): 

In a previous step, we used MOSCOT (Klein et al., 2025) to deconvolute cell subtypes in Visium slices from the same 23 samples, obtaining matrices with the probability of each cell being in each spot. We then obtained cell type probabilities per spot by summing the probabilities of cells of the same type in each spot, which gives one cell type probability matrix per sample.

(you can have a closer look at these steps in the preprocessing tutorial)
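
As a rough illustration of the aggregation step described above, the sketch below sums per-cell mapping probabilities by cell type for each spot. The toy matrix, the names `cell_spot_probs`, `cell_types_toy` and `ct_props_toy`, and the final per-spot normalization are assumptions made for this example; they are not taken from the preprocessing tutorial.

```python
import pandas as pd

# Hypothetical toy input: rows are single cells, columns are Visium spots,
# values are the probability of each cell mapping to each spot.
cell_spot_probs = pd.DataFrame(
    [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]],
    index=["cell_1", "cell_2", "cell_3"],
    columns=["spot_A", "spot_B"],
)
# Hypothetical cell type annotation for the same cells
cell_types_toy = pd.Series(["Fib", "CM", "Fib"], index=cell_spot_probs.index)

# Sum the probabilities of cells of the same type in each spot
ct_by_spot = cell_spot_probs.groupby(cell_types_toy).sum().T  # spots x cell types

# Assumption: rescale each spot so that its cell type probabilities sum to 1
ct_props_toy = ct_by_spot.div(ct_by_spot.sum(axis=1), axis=0)
print(ct_props_toy)
```

The result has the same layout as `CTprops` (spots in rows, cell types in columns).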

In [None]:
CTprops=pd.read_csv('CTprops.csv', index_col=0)
CTprops.head()

From these deconvolution results, we can look at **cell type proportions per sample**. For this we will need the spot ID and sample correspondence:

In [None]:
spotSamples=mudata['visium'].obs.patient_region_id
spotSamples.reset_index().head()

A way to check the deconvolution proportions is using a clustermap

In [None]:
CTprops_sample=CTprops.copy()
CTprops_sample['sample']=mudata['visium'].obs.patient_region_id
sns.clustermap(CTprops_sample.groupby('sample').sum().T/CTprops_sample.groupby('sample').sum().sum(axis=1) , 
               cmap='Blues', method='ward')

Alternatively, we can check the deconvolution proportions using barplots

In [None]:
t1=pd.DataFrame(CTprops[spotSamples=='control_P7'].sum(), columns=['control_P7'])
t2=pd.DataFrame(CTprops[spotSamples=='IZ_P15'].sum(), columns=['IZ_P15'])

In [None]:
sns.set(font_scale=1)
sns.set_style(rc = {'axes.facecolor': 'white'})

fig, axes = plt.subplots(1, 2, figsize=(20, 7))

sns.barplot(ax=axes[0], y=t1.sort_values('control_P7', ascending=False).index, x='control_P7', 
            data=t1.sort_values('control_P7', ascending=False), color='darkblue')
axes[0].set_title('control_P7')

sns.barplot(ax=axes[1], y=t2.sort_values('IZ_P15', ascending=False).index, x='IZ_P15',
            data=t2.sort_values('IZ_P15', ascending=False), color='darkred')
axes[1].set_title('IZ_P15')

We can visualize **cell type deconvolution results in slices** (spots are colored by the cell type with the highest proportion). For this, the spatial coordinates of the spots in the Visium slices need to be in the **uns['spatial']** slot of the Visium AnnData object:

In [None]:
mudata['visium'].uns['spatial']=mudata['visium'].obsm['X_spatial']

In [None]:
idPat = 'GT_IZ_P9'
nichesphere.coloc.spatialCTPlot(adata=mudata['visium'][mudata['visium'].obs.patient_region_id==idPat].copy(), 
                                CTprobs=CTprops.loc[spotSamples.index[spotSamples==idPat]], 
                                cell_types=mudata['sc'].obs.cell_subtype2, spot_size=0.015, 
                                legend_fontsize=7)

## 3. Co-localization

We then computed co-localization probabilities from the cell type probability matrices. This gave us concatenated per-sample co-localization matrices of cell type x cell type.

Then we reshaped the co-localization data into a matrix of cell type pairs x samples.

(you can have a closer look at these steps in the preprocessing tutorial)
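
To make the two steps above more concrete, here is a minimal sketch of one plausible way to turn a spots x cell types probability matrix into pairwise co-localization values and then flatten them into one value per cell type pair. The outer-product formulation, the normalization, and the names `ct_probs`, `coloc` and `pairs` are assumptions for illustration only; the actual computation is the one described in the preprocessing tutorial.

```python
import pandas as pd

# Hypothetical spots x cell types probability matrix for a single sample
ct_probs = pd.DataFrame(
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    index=["spot_A", "spot_B"],
    columns=["CM", "Fib", "Endo"],
)

# Sum over spots of the outer product of each spot's probability vector
coloc = ct_probs.T.dot(ct_probs)          # cell type x cell type
# Assumption: normalise so that all pairs in the sample sum to 1
coloc = coloc / coloc.values.sum()

# Flatten to one value per cell type pair, named "<type a>-<type b>"
pairs = coloc.stack()
pairs.index = [f"{a}-{b}" for a, b in pairs.index]
print(pairs)
```

Repeating this for every sample and concatenating the flattened vectors would give a table with one entry per sample and cell type pair.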

In [None]:
colocPerSample=pd.read_csv('colocPerSample.csv', index_col=0)
colocPerSample.head()

The probabilities of all cell type pairs in a sample must sum to 1:

In [None]:
colocPerSample.sum(axis=1)

Same cell type interactions will be excluded later on, so we'll have a list of same cell type interaction pairs in order to subset the co-localization table we'll generate in the next step.

In [None]:
cell_types=CTprops.columns
oneCTints=cell_types+'-'+cell_types

**Conditions**

We will need the following metadata to subset the samples in **control (myogenic)** and **disease (ischemic)**:

In [None]:
sampleTypesDF=pd.read_csv('MI_sampleTypesDF.csv')
sampleTypesDF.head()

## 4. Differential co-localization analysis

We will test differential co-localization between **myogenic** and **ischemic** samples using Wilcoxon rank-sum tests:

**Null Hypothesis (H0):**
The median of the population of differences between co-localization probabilities of cell types a and b in myogenic and ischemic samples is zero.

**Alternative Hypothesis (H1):**
The median of the population of differences between co-localization probabilities of cell types a and b in myogenic and ischemic samples is not equal to zero.

In [None]:
myo_iscDF=nichesphere.coloc.diffColoc_test(coloc_pair_sample=colocPerSample, 
                                           sampleTypes=sampleTypesDF, 
                                           exp_condition='ischemic', 
                                           ctrl_condition='myogenic')
myo_iscDF.head() # myo_iscDF type: pd.DataFrame
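
For intuition, the sketch below runs a Wilcoxon rank-sum test for a single cell type pair on made-up co-localization probabilities, using `scipy.stats.ranksums`. The numbers and the names `ctrl` and `exp` are invented for this example, and it is not meant to reproduce the internals of `diffColoc_test`.

```python
from scipy.stats import ranksums

# Hypothetical co-localization probabilities of one cell type pair
ctrl = [0.010, 0.012, 0.011, 0.013, 0.009]   # e.g. myogenic samples
exp = [0.020, 0.025, 0.018, 0.022, 0.030]    # e.g. ischemic samples

# A positive statistic means the values tend to be higher in the first group
stat, pval = ranksums(exp, ctrl)
print(f"statistic = {stat:.3f}, p-value = {pval:.4f}")
```

In this toy case every ischemic value is larger than every myogenic one, so the statistic is positive and the p-value is small.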

Then we will reshape the data to visualize the Wilcoxon test scores in a heatmap and filter out non-significant co-localization differences using the parameter **p** (in this case, scores with p-values > 0.1 are filtered out)

In [None]:
myo_isc_HM=nichesphere.tl.pval_filtered_HMdf(testDF=myo_iscDF, 
                                             oneCTinteractions=oneCTints, 
                                             p=0.1,                             #threshold p-value to filter
                                             cell_types=cell_types)
myo_isc_HM.head() # myo_isc_HM type: pd.DataFrame

As the cells classified as proliferating cells (prolif) are many different cell types and thus hard to interpret, we'll remove them for further analysis.

In [None]:
myo_isc_HM=myo_isc_HM.loc[~myo_isc_HM.index.str.contains('prolif'), ~myo_isc_HM.columns.str.contains('prolif')]

We will also remove cell types with no significant co-localization differences

In [None]:
myo_isc_HM=myo_isc_HM.loc[myo_isc_HM.sum()!=0,myo_isc_HM.sum()!=0]

Now we can plot the differential co-localization scores heatmap

In [None]:
sns.set(font_scale=1)
plot=sns.clustermap(myo_isc_HM, cmap='vlag', center=0, method='ward', cbar_kws={'label': 'diffColoc. Score'})

**Differential co-localization network**

To build the differential co-localization network, we will get an **adjacency matrix** (adj) based on the **euclidean distances** among the distributions of significant differential co-localization scores for the different cell types

In [None]:
HMdist=pd.DataFrame(scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(myo_isc_HM)), 
                    columns=myo_isc_HM.columns, index=myo_isc_HM.index)

HMsimm=1-HMdist/HMdist.max().max()
##Cell pairs with not significant differential co-localization get 0
HMsimm[myo_isc_HM==0]=0
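
The same transformation can be followed by hand on a tiny matrix. The sketch below uses a made-up 3 x 3 score matrix (`scores`, with hypothetical cell types A, B and C) and applies the steps from the cell above: pairwise euclidean distances between rows, rescaling to a similarity between 0 and 1, and masking of pairs whose score is 0.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

cts = ["A", "B", "C"]
# Hypothetical filtered differential co-localization scores (0 = not significant)
scores = pd.DataFrame(
    [[0.0, 2.0, -1.0],
     [2.0, 0.0, 0.0],
     [-1.0, 0.0, 0.0]],
    index=cts, columns=cts,
)

# Euclidean distance between the score profiles (rows) of each pair of cell types
dist = pd.DataFrame(squareform(pdist(scores)), index=cts, columns=cts)

# Rescale to a similarity: 1 = identical profiles, 0 = most distant profiles
sim = 1 - dist / dist.max().max()

# Pairs without a significant score (including the diagonal) get no edge
sim[scores == 0] = 0
print(sim.round(3))
```

Note that in this toy case A and B also end up with similarity 0: their score is non-zero, but their profiles are among the most distant ones.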

A **cell group dictionary** should be used here to visualize different cell groups in different colors. As we don't have cell groups yet, we'll have a dictionary of all cells in one group and a list of one color

In [None]:
niches_dict={'1_': list(myo_isc_HM.index)}
clist=['#4daf4a']

Now we can plot the differential co-localization network using the **colocNW** function from Nichesphere. This function has many parameters that can be tuned: 

**nodeSize**, for example, defines how the size of the nodes will be calculated. Options are 'betweeness', 'pagerank' (both network statistics) and None (all nodes have the same size).
**alpha** indicates the transparency of the edges and goes from 0 (completely transparent) to 1 (opaque).
**fsize** is the size of the figure (x,y).

This function returns the network with the edge weights corresponding to the diff. coloc. scores (positive and negative)

In [None]:
plt.rcParams['axes.facecolor'] = "None"
nichesphere.coloc.colocNW(x_diff=myo_isc_HM,            #differential co-localization matrix
                          adj=HMsimm,                   #adjacency matrix
                          cell_group=niches_dict, 
                          clist=clist, 
                          nodeSize='betweeness',        
                          lab_spacing=9,                #space between node and label
                          alpha=0.4,                    #edges transparency
                          fsize=(12,12))                #figure size

Now we'll do community detection using Louvain. First we will get the network from the adjacency matrix as we won't use the signed weights for this

In [None]:
gCol_unsigned=nx.from_pandas_adjacency(HMsimm, create_using=nx.Graph)

We will use the community-layout library function **CommunityLayout** to show the communities in a layout suited for this. This function is compatible with networkx (Hagberg et al., 2008) community detection functions, which will be used internally as indicated by the parameters **community_algorithm** and **community_kwargs**

In [None]:
## Calculate community layout
cl=CommunityLayout(gCol_unsigned,
        community_compression = 0.4,
        layout_algorithm = nx.spring_layout,
        layout_kwargs = {"k":75, "iterations":1000},
        community_algorithm = nx.algorithms.community.louvain_communities,
        community_kwargs = {"resolution":1.1,  'seed':12, 'weight':'weight'})

We can extract the communities (niches) as follows:

In [None]:
d = {index: list(value) for index, value in enumerate(cl.communities())}
print(pd.DataFrame.from_dict(d, orient='index').T.to_string(index=False))
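
To see what the Louvain step does in isolation, the sketch below calls `nx.algorithms.community.louvain_communities` directly on a small made-up graph: two tightly connected triangles joined by one weak edge. The graph and the name `toy_graph` are invented for this example, and the call bypasses `CommunityLayout` on purpose.

```python
import networkx as nx

# Hypothetical toy network: two triangles linked by a single weak edge
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from([
    ("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0),
    ("D", "E", 1.0), ("E", "F", 1.0), ("D", "F", 1.0),
    ("C", "D", 0.1),
])

# Same algorithm and keyword arguments style as in community_kwargs above
communities = nx.algorithms.community.louvain_communities(
    toy_graph, weight="weight", resolution=1.0, seed=12
)
for i, members in enumerate(communities):
    print(i, sorted(members))
```

Higher values of `resolution` favour more, smaller communities, while lower values favour fewer, larger ones.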

And then name them

In [None]:
niche_names=['1_Stromal', '2_Stressed_CM', '3_Healthy_CM', '4_Fibrotic']
niches_dict=dict(zip(niche_names,list(d.values()))) 
print(pd.DataFrame.from_dict(niches_dict, orient='index').T.to_string(index=False))

And assign them colors to color the network nodes according to their niche

In [None]:
clist=['#4daf4a', '#0072B5', '#BC3C29', '#ffff33']
niche_cols=pd.Series(clist, index=list(niches_dict.keys()))
niches_df=nichesphere.tl.cells_niche_colors(CTs=CTprops.columns, 
                                            niche_colors=niche_cols, 
                                            niche_dict=niches_dict)
niches_df.head()

Then we can get the node positions to input them to the nichesphere **colocNW** function through the **pos** parameter

In [None]:
pos=cl.full_positions

And plot the niches on the community layout

In [None]:
plt.rcParams['axes.facecolor'] = "None"

gCol=nichesphere.coloc.colocNW(x_diff=myo_isc_HM, 
                               adj=HMsimm,
                               cell_group=niches_dict, 
                               clist=clist, 
                               nodeSize='betweeness', 
                               layout=None,                         #layout needs to be set to None if we provide node positions
                               lab_spacing=0.05, 
                               thr=1, 
                               alpha=0.4, 
                               fsize=(10,10), 
                               pos=pos,                             #node positions (from the CommunityLayout function)
                               edge_scale=1,                        #edge width
                               legend_ax=[0.7, 0.05, 0.15, 0.2])    #legend position
#Legend
legend_elements1=[plt.Line2D([0], [0], marker="o" ,color='w', markerfacecolor=clist[i], lw=4, 
                             label=list(niches_dict.keys())[i], ms=10) for i in range(len(list(niches_dict.keys())))]
plt.gca().add_artist(plt.legend(handles=legend_elements1,loc='lower left', fontsize=13, title='Niches', 
                                alignment='left'))
#plt.savefig('diffColocNW_CD.pdf')

### Network Statistics

We can also calculate some network statistics with the networkx package functions (this will be done on the signed network):

In [None]:
# compute each statistic once and collect them in one table (rows follow the node order of gCol)
t0=pd.DataFrame({'betweenness':nx.betweenness_centrality(gCol), 
                 'degree':nx.degree_centrality(gCol), 
                 'pagerank':nx.pagerank(gCol, weight=None)})
t0=t0.loc[list(gCol.nodes)]

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(21, 7))
for i in range(len(t0.columns)):
    _ = sns.barplot(ax=axes[i], y=t0.sort_values(t0.columns[i], ascending=False).index[0:5], x=t0.columns[i], 
        data=t0.sort_values(t0.columns[i], ascending=False)[0:5], color='purple')
    axes[i].set_title(t0.columns[i])
fig.tight_layout()

Because the network is signed, we can also look separately at the positive and negative degree:

In [None]:
## Positive edges stats
G_pos=gCol.copy()
to_remove=[(a,b) for a, b, attrs in G_pos.edges(data=True) if attrs["weight"] <= 0]
G_pos.remove_edges_from(to_remove)

t1=pd.DataFrame({'degree':[nx.degree_centrality(G_pos)[x] for x in list(G_pos.nodes)]})
t1.index=list(G_pos.nodes)

In [None]:
## Negative edges stats
G_neg=gCol.copy()
to_remove=[(a,b) for a, b, attrs in G_neg.edges(data=True) if attrs["weight"] >= 0]
G_neg.remove_edges_from(to_remove)

t2=pd.DataFrame({'degree':[nx.degree_centrality(G_neg)[x] for x in list(G_neg.nodes)]})
t2.index=list(G_neg.nodes)

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 7))

_=sns.barplot(ax=axes[0], y=t1.sort_values('degree', ascending=False).index[0:5], x='degree', 
              data=t1.sort_values('degree', ascending=False)[0:5], color='red')
axes[0].set_title('degree_positive')

_=sns.barplot(ax=axes[1], y=t2.sort_values('degree', ascending=False).index[0:5], x='degree', 
              data=t2.sort_values('degree', ascending=False)[0:5], color='blue')
axes[1].set_title('degree_negative')

fig.tight_layout()

Calculate betweenness for the positive and negative subgraphs separately

In [None]:
betw_pos = pd.DataFrame({'betweenness_pos':[nx.betweenness_centrality(G_pos)[x] for x in list(G_pos.nodes)]})
betw_pos.index=list(G_pos.nodes)

betw_neg = pd.DataFrame({'betweenness_neg':[nx.betweenness_centrality(G_neg)[x] for x in list(G_neg.nodes)]})
betw_neg.index=list(G_neg.nodes)

#### Prestige

In [None]:
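def node_prestige(node, pos_degree, neg_degree):
    """
    Calculate the prestige of a node in a signed network.

    :node: node name (not used in the calculation, kept for readability at the call site)
    :pos_degree: degree of the node in the network of positive links
    :neg_degree: degree of the node in the network of negative links

    Returns a value between -1 (only negative links) and 1 (only positive links).
    """
    
    pres = ( abs(pos_degree) - abs(neg_degree) ) / ( abs(pos_degree) + abs(neg_degree) )
    return pres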
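
A few hand-checked values help when reading the prestige plots. The helper `prestige_toy` below is a hypothetical stand-alone copy of the same formula, written for this example with an explicit guard for nodes that have no links at all.

```python
def prestige_toy(pos_degree, neg_degree):
    """Prestige = (pos - neg) / (pos + neg); None if the node has no links."""
    total = abs(pos_degree) + abs(neg_degree)
    if total == 0:
        return None  # undefined for isolated nodes
    return (abs(pos_degree) - abs(neg_degree)) / total

print(prestige_toy(3, 1))  # mostly positive links
print(prestige_toy(2, 0))  # only positive links
print(prestige_toy(0, 2))  # only negative links
print(prestige_toy(0, 0))  # isolated node
```

Values close to 1 indicate nodes whose significant changes are mostly gains in co-localization, and values close to -1 indicate mostly losses.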


In [None]:
prestige = pd.DataFrame({'prestige':[node_prestige(node,t1['degree'][node],t2['degree'][node]) for node in list(gCol.nodes)]})
prestige.index = list(gCol.nodes)

In [None]:
### remove all entries with a prestige of -1. for better overview for plotting
prestige_removed = {key:val for key, val in prestige['prestige'].items() if val != -1.}
df_prestige = pd.DataFrame.from_dict({'prestige':prestige_removed})

In [None]:
### plot positive and negative prestige scores
fig, axes = plt.subplots(1, 2, figsize=(14, 7))

sns.barplot(ax=axes[0], y=df_prestige.sort_values('prestige', ascending=False).index[0:4], x='prestige', 
              data=df_prestige.sort_values('prestige', ascending=False)[0:4], color='red')
axes[0].set_title('prestige_positive')

sns.barplot(ax=axes[1], y=df_prestige.sort_values('prestige', ascending=True).index[0:6], x='prestige', 
              data=df_prestige.sort_values('prestige', ascending=True)[0:6], color='blue')
axes[1].set_title('prestige_negative')

fig.tight_layout()

Plot sign-split degree and prestige

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(21, 7))
#sorted_y_order = prestige.sort_values('prestige', ascending=True).index
sorted_y_order = t2.sort_values('degree', ascending=False).index

sns.barplot(ax=axes[0], y=sorted_y_order, x='degree', 
              data=t2.loc[sorted_y_order], color='blue')
axes[0].set_title('degree_negative')

sns.barplot(ax=axes[1], y=sorted_y_order, x='degree', 
              data=t1.loc[sorted_y_order], color='red')
axes[1].set_title('degree_positive')

sns.barplot(ax=axes[2], y=sorted_y_order, x='prestige', 
              data=prestige.loc[sorted_y_order], color='purple')
axes[2].set_title('prestige')

fig.tight_layout()

#### Signed PageRank

Approach: calculate PageRank for the positive and negative networks separately

Reference: X. Yin, X. Hu, Y. Chen, X. Yuan, and B. Li, “Signed-PageRank: An Efficient Influence Maximization Framework for Signed Social Networks,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 5, pp. 2208–2222, May 2021, doi: 10.1109/TKDE.2019.2947421.

In [None]:
# merge results of previous calculations
results = pd.DataFrame()
results["degree"] = t0["degree"]
results["degree_pos"] = t1["degree"]
results["degree_neg"] = t2["degree"]
results["prestige"] = prestige["prestige"]
results["betweenness_pos"] = betw_pos["betweenness_pos"]
results["betweenness_neg"] = betw_neg["betweenness_neg"]
# unsigned PageRank (computed on gCol without edge weights), used in the comparisons below
results["pagerank_unsigned"] = t0["pagerank"]

In [None]:
# split network into subgraphs with only positive and only negative edges
G_pos = nx.DiGraph()
G_neg = nx.DiGraph()

for u, v, data in gCol.edges(data=True):
    weight = data.get('weight', 1)
    if weight > 0:
        G_pos.add_edge(u, v, weight=weight)
    elif weight < 0:
        G_neg.add_edge(u, v, weight=abs(weight)) 

In [None]:
# calculate PageRank for positive and negative edges
PR_pos=pd.DataFrame({'pagerank_pos':[nx.pagerank(G_pos, weight='weight')[x] for x in list(G_pos.nodes)]})
PR_pos.index=list(G_pos.nodes)

PR_neg=pd.DataFrame({'pagerank_neg':[nx.pagerank(G_neg, weight='weight')[x] for x in list(G_neg.nodes)]})
PR_neg.index=list(G_neg.nodes)

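# align both DataFrames to all nodes of the network; nodes absent from a subgraph get a PageRank of 0
PR_pos = PR_pos.reindex(list(gCol.nodes), fill_value=0)
PR_neg = PR_neg.reindex(list(gCol.nodes), fill_value=0)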


In [None]:
# store results in DataFrame
results["pagerank_pos"] = PR_pos["pagerank_pos"]
results["pagerank_neg"] = PR_neg["pagerank_neg"]

# calculate signed PageRank as the difference of positive and negative PageRank
# (nodes missing from one of the subgraphs count as 0)
results["pagerank_signed"] = results["pagerank_pos"].fillna(0) - results["pagerank_neg"].fillna(0)
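
The sign-split idea can be checked on a very small example. The sketch below builds a made-up signed network (`toy_signed`), splits it into a positive and a negative subgraph, and subtracts the two PageRank vectors. Using undirected subgraphs and assigning a PageRank of 0 to nodes that are absent from a subgraph are assumptions made for this illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical signed network: positive = gained, negative = lost co-localization
toy_signed = nx.Graph()
toy_signed.add_weighted_edges_from([
    ("A", "B", 0.8), ("B", "C", 0.5),    # positive edges
    ("C", "D", -0.6), ("A", "D", -0.3),  # negative edges
])

# Split by sign; negative weights are stored as absolute values
pos_toy, neg_toy = nx.Graph(), nx.Graph()
for u, v, w in toy_signed.edges(data="weight"):
    if w > 0:
        pos_toy.add_edge(u, v, weight=w)
    elif w < 0:
        neg_toy.add_edge(u, v, weight=abs(w))

nodes = list(toy_signed.nodes)
# Assumption: a node without edges of a given sign gets PageRank 0 for that sign
pr_pos = pd.Series(nx.pagerank(pos_toy, weight="weight")).reindex(nodes, fill_value=0.0)
pr_neg = pd.Series(nx.pagerank(neg_toy, weight="weight")).reindex(nodes, fill_value=0.0)

pr_signed = pr_pos - pr_neg
print(pd.DataFrame({"pos": pr_pos, "neg": pr_neg, "signed": pr_signed}).round(3))
```

Here B only has positive edges and D only has negative edges, so they receive a positive and a negative signed PageRank, respectively.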

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(21, 7))
sorted_y_order = results["pagerank_signed"].sort_values(ascending=False).index

sns.barplot(ax=axes[0], y=sorted_y_order, x='pagerank_neg', 
              data=results.loc[sorted_y_order], color='blue')
axes[0].set_title('pagerank_neg')

sns.barplot(ax=axes[1], y=sorted_y_order, x='pagerank_pos', 
              data=results.loc[sorted_y_order], color='red')
axes[1].set_title('pagerank_pos')

sns.barplot(ax=axes[2], y=sorted_y_order, x='pagerank_signed', 
              data=results.loc[sorted_y_order], color='purple')
axes[2].set_title('pagerank_signed')

fig.tight_layout()

Unsigned and signed PageRank vs. unsigned degree

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(21, 7))
sorted_y_order = results["pagerank_signed"].sort_values(ascending=False).index

sns.barplot(ax=axes[0], y=sorted_y_order, x='pagerank_signed', 
              data=results.loc[sorted_y_order], color='blue')
axes[0].set_title('pagerank_signed')

sns.barplot(ax=axes[1], y=sorted_y_order, x='pagerank_unsigned', 
              data=results.loc[sorted_y_order], color='red')
axes[1].set_title('pagerank_unsigned')

sns.barplot(ax=axes[2], y=sorted_y_order, x='degree', 
              data=results.loc[sorted_y_order], color='purple')
axes[2].set_title('degree')

fig.tight_layout()

Unsigned and signed PageRank vs. positive and negative Degree

In [None]:
fig, axes = plt.subplots(1, 4, figsize=(28, 7))
sorted_y_order = results["pagerank_signed"].sort_values(ascending=False).index

sns.barplot(ax=axes[0], y=sorted_y_order, x='pagerank_signed', 
              data=results.loc[sorted_y_order], color='purple')
axes[0].set_title('pagerank_signed')

sns.barplot(ax=axes[1], y=sorted_y_order, x='pagerank_unsigned', 
              data=results.loc[sorted_y_order], color='green')
axes[1].set_title('pagerank_unsigned')

sns.barplot(ax=axes[2], y=sorted_y_order, x='degree_neg', 
              data=results.loc[sorted_y_order], color='blue')
axes[2].set_title('degree_negative')

sns.barplot(ax=axes[3], y=sorted_y_order, x='degree_pos', 
              data=results.loc[sorted_y_order], color='red')
axes[3].set_title('degree_positive')

fig.tight_layout()

In [None]:
results["delta"] = results["pagerank_signed"] - results["pagerank_unsigned"]

plt.figure(figsize=(10, 8))
plt.scatter(
    results["pagerank_unsigned"], 
    results["pagerank_signed"],
    c=results["delta"],
    cmap="coolwarm",  # blue = negative delta, red = positive
    edgecolor='k'
)

for idx, row in results.iterrows():
    plt.text(row["pagerank_unsigned"] + 0.001, row["pagerank_signed"] + 0.001, idx, fontsize=8)

plt.axhline(0, color='gray', linestyle='--')
plt.axvline(0, color='gray', linestyle='--')
plt.xlabel("Unsigned PageRank")
plt.ylabel("Signed PageRank")
plt.title("Signed vs Unsigned PageRank")
plt.colorbar(label="Delta (signed - unsigned)")
plt.tight_layout()
plt.show()


In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Normalize the colormap across both plots
norm = colors.Normalize(vmin=results['pagerank_signed'].min(), vmax=results['pagerank_signed'].max())
cmap = cm.coolwarm

# Plot 1: Unsigned PR vs Positive Degree
sc1 = axes[0, 0].scatter(
    results['degree_pos'],
    results['pagerank_unsigned'],
    c=results['pagerank_signed'],
    cmap=cmap,
    norm=norm,
    edgecolor='black',
    s=100
)
axes[0, 0].set_title("Unsigned PR vs Positive Degree")
axes[0, 0].set_xlabel("Positive Degree")
axes[0, 0].set_ylabel("Unsigned PageRank")

# Plot 2: Signed PR vs Positive Degree
sc2 = axes[0, 1].scatter(
    results['degree_pos'],
    results['pagerank_signed'],
    c=results['pagerank_signed'],
    cmap=cmap,
    norm=norm,
    edgecolor='black',
    s=100
)
axes[0, 1].set_title("Signed PR vs Positive Degree")
axes[0, 1].set_xlabel("Positive Degree")
axes[0, 1].set_ylabel("Signed PageRank")

# Plot 3: Unsigned PR vs Negative Degree
sc3 = axes[1, 0].scatter(
    results['degree_neg'],
    results['pagerank_unsigned'],
    c=results['pagerank_signed'],
    cmap=cmap,
    norm=norm,
    edgecolor='black',
    s=100
)
axes[1, 0].set_title("Unsigned PR vs Negative Degree")
axes[1, 0].set_xlabel("Negative Degree")
axes[1, 0].set_ylabel("Unsigned PageRank")

# Plot 4: Signed PR vs Negative Degree
sc4 = axes[1, 1].scatter(
    results['degree_neg'],
    results['pagerank_signed'],
    c=results['pagerank_signed'],
    cmap=cmap,
    norm=norm,
    edgecolor='black',
    s=100
)
axes[1, 1].set_title("Signed PR vs Negative Degree")
axes[1, 1].set_xlabel("Negative Degree")
axes[1, 1].set_ylabel("Signed PageRank")

# Colorbar outside the plots
cbar_ax = fig.add_axes([0.93, 0.15, 0.02, 0.7])
cbar = fig.colorbar(sc4, cax=cbar_ax)
cbar.set_label("Signed PageRank")

plt.tight_layout(rect=[0, 0, 0.9, 1])  # leave room for colorbar
plt.show()


#### Pearson's and Spearman’s Rank Correlation Coefficient

Calculate Pearson's r and Spearman's rank correlation to estimate the correlation between positive/negative degree and signed/unsigned PageRank.
Because the asymptotic p-value for Spearman's correlation can be unreliable for small sample sizes, we also compute an empirical p-value with a permutation test.

In [None]:
from scipy.stats import pearsonr, spearmanr, permutation_test

In [None]:
row_labels = []
data = []

columns = pd.MultiIndex.from_tuples([
    ("Pearson", "r"), ("Pearson", "p"),
    ("Spearman", "r"), ("Spearman", "p_asymptotic"), ("Spearman", "p_empirical"),
])

for column1 in ["pagerank_unsigned", "pagerank_signed"]:
    for column2 in ["degree", "degree_pos", "degree_neg"]:
        if column1 != column2:
            value_name = f"{column1}+{column2}"
            x = results[column1].values
            y = results[column2].values

            # Pearson correlation
            pearson_r, pearson_p = pearsonr(x, y)

            # Spearman correlation (asymptotic)
            spearman_result = spearmanr(x, y)
            spearman_r = spearman_result.statistic
            spearman_p_asymptotic = spearman_result.pvalue

            # Permutation test for empirical p-value
            def stat_fn(x_perm):
                return spearmanr(x_perm, y).statistic

            perm_result = permutation_test(
                (x,),
                stat_fn,
                permutation_type='pairings',
                alternative='two-sided',
                n_resamples=10_000,  # increase for precision
                random_state=42
            )
            spearman_p_empirical = perm_result.pvalue

            data.append([
                pearson_r, pearson_p,
                spearman_r, spearman_p_asymptotic, spearman_p_empirical
            ])
            row_labels.append(value_name)

df_corr = pd.DataFrame(data, index=row_labels, columns=columns)


In [None]:
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: pagerank_unsigned vs degree_pos
column1 = "pagerank_unsigned"
column2 = "degree_pos"
r_value, p_value = pearsonr(results[column1], results[column2])

sns.regplot(
    x=column1,
    y=column2,
    data=results,
    ci=None,
    line_kws={"color": "salmon"},
    scatter_kws={"alpha": 0.6},
    ax=axes[0, 0]
)
axes[0, 0].set_title(f'Pearson r = {r_value:.4f}, p = {p_value:.4g}', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel(column1)
axes[0, 0].set_ylabel(column2)
axes[0, 0].grid(True)

# Plot 2: pagerank_unsigned vs degree_neg
column1 = "pagerank_unsigned"
column2 = "degree_neg"
r_value, p_value = pearsonr(results[column1], results[column2])

sns.regplot(
    x=column1,
    y=column2,
    data=results,
    ci=None,
    line_kws={"color": "salmon"},
    scatter_kws={"alpha": 0.6},
    ax=axes[0, 1]
)
axes[0, 1].set_title(f'Pearson r = {r_value:.4f}, p = {p_value:.4g}', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel(column1)
axes[0, 1].set_ylabel(column2)
axes[0, 1].grid(True)

# Plot 3: pagerank_signed vs degree_pos
column1 = "pagerank_signed"
column2 = "degree_pos"
r_value, p_value = pearsonr(results[column1], results[column2])

sns.regplot(
    x=column1,
    y=column2,
    data=results,
    ci=None,
    line_kws={"color": "salmon"},
    scatter_kws={"alpha": 0.6},
    ax=axes[1, 0]
)
axes[1, 0].set_title(f'Pearson r = {r_value:.4f}, p = {p_value:.4g}', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel(column1)
axes[1, 0].set_ylabel(column2)
axes[1, 0].grid(True)

# Plot 4: pagerank_signed vs degree_neg
column1 = "pagerank_signed"
column2 = "degree_neg"
r_value, p_value = pearsonr(results[column1], results[column2])

sns.regplot(
    x=column1,
    y=column2,
    data=results,
    ci=None,
    line_kws={"color": "salmon"},
    scatter_kws={"alpha": 0.6},
    ax=axes[1, 1]
)
axes[1, 1].set_title(f'Pearson r = {r_value:.4f}, p = {p_value:.4g}', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel(column1)
axes[1, 1].set_ylabel(column2)
axes[1, 1].grid(True)

plt.tight_layout()
plt.show()


In [None]:
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: pagerank_unsigned vs degree_pos
column1 = "pagerank_unsigned"
column2 = "degree_pos"
r_value, p_value = spearmanr(results[column1], results[column2])
ranked_data = results[[column1, column2]].rank()

sns.regplot(
    x=column1,
    y=column2,
    data=ranked_data,
    ci=None,
    line_kws={"color": "orange"},
    scatter_kws={"alpha": 0.6},
    ax=axes[0, 0]
)
axes[0, 0].set_title(f'Spearman r = {r_value:.4f}, p = {p_value:.4g}')
axes[0, 0].set_xlabel(f'Rank of {column1}')
axes[0, 0].set_ylabel(f'Rank of {column2}')
axes[0, 0].grid(True)

# Plot 2: pagerank_unsigned vs degree_neg
column1 = "pagerank_unsigned"
column2 = "degree_neg"
r_value, p_value = spearmanr(results[column1], results[column2])
ranked_data = results[[column1, column2]].rank()

sns.regplot(
    x=column1,
    y=column2,
    data=ranked_data,
    ci=None,
    line_kws={"color": "orange"},
    scatter_kws={"alpha": 0.6},
    ax=axes[0, 1]
)
axes[0, 1].set_title(f'Spearman r = {r_value:.4f}, p = {p_value:.4g}')
axes[0, 1].set_xlabel(f'Rank of {column1}')
axes[0, 1].set_ylabel(f'Rank of {column2}')
axes[0, 1].grid(True)

# Plot 3: pagerank_signed vs degree_pos
column1 = "pagerank_signed"
column2 = "degree_pos"
r_value, p_value = spearmanr(results[column1], results[column2])
ranked_data = results[[column1, column2]].rank()

sns.regplot(
    x=column1,
    y=column2,
    data=ranked_data,
    ci=None,
    line_kws={"color": "orange"},
    scatter_kws={"alpha": 0.6},
    ax=axes[1, 0]
)
axes[1, 0].set_title(f'Spearman r = {r_value:.4f}, p = {p_value:.4g}')
axes[1, 0].set_xlabel(f'Rank of {column1}')
axes[1, 0].set_ylabel(f'Rank of {column2}')
axes[1, 0].grid(True)

# Plot 4: pagerank_signed vs degree_neg
column1 = "pagerank_signed"
column2 = "degree_neg"
r_value, p_value = spearmanr(results[column1], results[column2])
ranked_data = results[[column1, column2]].rank()

sns.regplot(
    x=column1,
    y=column2,
    data=ranked_data,
    ci=None,
    line_kws={"color": "orange"},
    scatter_kws={"alpha": 0.6},
    ax=axes[1, 1]
)
axes[1, 1].set_title(f'Spearman r = {r_value:.4f}, p = {p_value:.4g}')
axes[1, 1].set_xlabel(f'Rank of {column1}')
axes[1, 1].set_ylabel(f'Rank of {column2}')
axes[1, 1].grid(True)

plt.tight_layout()
plt.show()

In [None]:
# Flatten multi-index columns
df_flat = df_corr.copy()
df_flat.columns = ['_'.join(col).strip() for col in df_flat.columns.values]
df_flat = df_flat.reset_index().rename(columns={"index": "Variable Pair"})

df_long = df_flat.melt(
    id_vars="Variable Pair",
    value_vars=["Pearson_r", "Spearman_r"],
    var_name="Correlation Type",
    value_name="Correlation Coefficient"
)

plt.figure(figsize=(12, 6))
sns.barplot(
    data=df_long,
    x="Variable Pair",
    y="Correlation Coefficient",
    hue="Correlation Type",
    palette=["#1f77b4", "#ff7f0e"]
)

for i, row in df_flat.iterrows():
    spearman_p = row["Spearman_p_empirical"]
    pearson_p = row["Pearson_p"]
    plt.text(i, 1.05, f'Spearman_p={spearman_p:.3f}', ha='center', va='bottom', fontsize=9, color='black')
    plt.text(i, 1.12, f'Pearson_p={pearson_p:.3f}', ha='center', va='bottom', fontsize=9, color='black')


plt.axhline(0, color='gray', linestyle='--')
plt.xticks(rotation=45, ha='right')
plt.title("Comparison of Pearson and Spearman Correlation Coefficients", pad=30)
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.legend(title="Correlation Type")
os.makedirs("figures", exist_ok=True)  # make sure the output folder exists
plt.savefig(os.path.join("figures",'pearson_spearman_correlation.jpg'))
plt.show()


df_pvals_long = df_flat.melt(
    id_vars="Variable Pair",
    value_vars=["Spearman_p_asymptotic", "Spearman_p_empirical"],
    var_name="P-value Type",
    value_name="P-value"
)

plt.figure(figsize=(12, 6))
sns.barplot(
    data=df_pvals_long,
    x="Variable Pair",
    y="P-value",
    hue="P-value Type",
    palette=["#2ca02c", "#d62728"]
)

plt.axhline(0, color='gray', linestyle='--')
plt.xticks(rotation=45, ha='right')
plt.title("Comparison of Asymptotic and Empirical Spearman P-values", pad=20)
plt.legend(title="P-value Type")
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.show()
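
The second bar plot contrasts asymptotic and empirical Spearman p-values. One common way to obtain an empirical p-value is a permutation test, sketched below on made-up data. The helper `permutation_pvalue` is hypothetical and may differ from how `Spearman_p_empirical` was computed in this notebook.

```python
import numpy as np
from scipy.stats import spearmanr

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    observed = spearmanr(x, y)[0]
    # Shuffle y to break any association, then recompute rho each time
    null = np.array([spearmanr(x, rng.permutation(y))[0] for _ in range(n_perm)])
    # Add-one correction keeps the estimate strictly above zero
    p_emp = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_emp

rng = np.random.default_rng(1)
x = rng.normal(size=25)
y = x + rng.normal(scale=0.5, size=25)  # made-up, positively associated data

rho, p_asymptotic = spearmanr(x, y)
_, p_empirical = permutation_pvalue(x, y)
print(f"rho = {rho:.3f}, asymptotic p = {p_asymptotic:.3g}, empirical p = {p_empirical:.3g}")
```

With few nodes, the asymptotic approximation can be unreliable, which is one reason to compare it against a permutation-based estimate. The smallest empirical p-value this sketch can return is `1 / (n_perm + 1)`.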


#### Histogram
Comparison of signed PageRank and prestige against positive and negative degree.

##### PageRank

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(20, 15))

# mask for positive / negative signed PR
pos_PR_mask = results['pagerank_signed'] > 0
neg_PR_mask = results['pagerank_signed'] < 0

# mask for positive / negative degree
pos_degree_mask = results['degree_pos'] > 0
neg_degree_mask = results['degree_neg'] > 0

# mask to filter nodes based on which degree (positive or negative) is higher, only consider the higher degree
pos_degree_mask_filtered = results['degree_pos'] > results['degree_neg']
neg_degree_mask_filtered = results['degree_neg'] > results['degree_pos']



# Plot 1: Degree by Signed PR (Unfiltered)
pos_PR_combined_degrees_unfiltered = pd.concat([results.loc[pos_PR_mask, 'degree_pos'], -results.loc[pos_PR_mask, 'degree_neg']])
neg_PR_combined_degrees_unfiltered = pd.concat([results.loc[neg_PR_mask, 'degree_pos'], -results.loc[neg_PR_mask, 'degree_neg']])

all_degrees_unfiltered = pd.concat([pos_PR_combined_degrees_unfiltered, neg_PR_combined_degrees_unfiltered])
min_val_unfiltered = all_degrees_unfiltered.min()
max_val_unfiltered = all_degrees_unfiltered.max()
bins_unfiltered = np.linspace(min_val_unfiltered, max_val_unfiltered, 30)

axes[0, 0].hist(
    [pos_PR_combined_degrees_unfiltered, neg_PR_combined_degrees_unfiltered],
    bins=bins_unfiltered,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Signed PR', 'Negative Signed PR'],
    edgecolor='black'
)
axes[0, 0].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[0, 0].set_title("Distribution of Degree by Signed PR (Unfiltered)", fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel("Degree (Positive and Negative)")
axes[0, 0].set_ylabel("Frequency (Number of Nodes)")
axes[0, 0].legend(frameon=True, fontsize=10)
axes[0, 0].grid(axis='y', linestyle='--', alpha=0.6)

# Plot 2: Degree by Signed PR (Filtered by Dominant Degree)
pos_pr_combined_filtered = pd.concat([results.loc[pos_PR_mask & pos_degree_mask_filtered, 'degree_pos'], -results.loc[pos_PR_mask & neg_degree_mask_filtered, 'degree_neg']])
neg_pr_combined_filtered = pd.concat([results.loc[neg_PR_mask & pos_degree_mask_filtered, 'degree_pos'], -results.loc[neg_PR_mask & neg_degree_mask_filtered, 'degree_neg']])

all_degrees_pr_filtered = pd.concat([pos_pr_combined_filtered, neg_pr_combined_filtered])
min_val_pr_filtered = all_degrees_pr_filtered.min()
max_val_pr_filtered = all_degrees_pr_filtered.max()
bins_pr_filtered = np.linspace(min_val_pr_filtered, max_val_pr_filtered, 30)

axes[0, 1].hist(
    [pos_pr_combined_filtered, neg_pr_combined_filtered],
    bins=bins_pr_filtered,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Signed PR', 'Negative Signed PR'],
    edgecolor='black'
)
axes[0, 1].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[0, 1].set_title("Distribution of Degree (Filtered by Dominant Degree) by Signed PR", fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel("Degree (Positive and Negative)")
axes[0, 1].set_ylabel("Frequency (Number of Nodes)")
axes[0, 1].legend(frameon=True, fontsize=10)
axes[0, 1].grid(axis='y', linestyle='--', alpha=0.6)

# Plot 3: Signed PR by Degree Type (Unfiltered)
axes[1, 0].hist(
    [results.loc[pos_degree_mask, 'pagerank_signed'], results.loc[neg_degree_mask, 'pagerank_signed']],
    bins=20,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Degree', 'Negative Degree'],
    edgecolor='black'
)
axes[1, 0].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[1, 0].set_title("Distribution of Signed PR by Degree Type (Unfiltered)", fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel("Signed PageRank")
axes[1, 0].set_ylabel("Frequency (Number of Nodes)")
axes[1, 0].legend(frameon=True, fontsize=10)
axes[1, 0].grid(axis='y', linestyle='--', alpha=0.6)

# Plot 4: Signed PR by Degree Type (Filtered by Dominant Degree)
axes[1, 1].hist(
    [results.loc[pos_degree_mask_filtered, 'pagerank_signed'], results.loc[neg_degree_mask_filtered, 'pagerank_signed']],
    bins=20,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Degree', 'Negative Degree'],
    edgecolor='black'
)
axes[1, 1].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[1, 1].set_title("Distribution of Signed PR by Degree Type (Filtered by Dominant Degree)", fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel("Signed PageRank")
axes[1, 1].set_ylabel("Frequency (Number of Nodes)")
axes[1, 1].legend(frameon=True, fontsize=10)
axes[1, 1].grid(axis='y', linestyle='--', alpha=0.6)

plt.tight_layout()
plt.savefig(os.path.join("figures", 'histogram_combined_degree_PR.png'))
plt.show()


##### Prestige

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(20, 15))

# mask for positive / negative prestige
pos_prestige_mask = results['prestige'] > 0
neg_prestige_mask = results['prestige'] < 0

# mask for positive / negative degree
pos_degree_mask = results['degree_pos'] > 0
neg_degree_mask = results['degree_neg'] > 0

# mask to filter nodes based on which degree (positive or negative) is higher, only consider the higher degree
pos_degree_mask_filtered = results['degree_pos'] > results['degree_neg']
neg_degree_mask_filtered = results['degree_neg'] > results['degree_pos']


# Plot 1: Combined Degree by Prestige (Unfiltered)
pos_prestige_combined_degrees_unfiltered = pd.concat([results.loc[pos_prestige_mask, 'degree_pos'], -results.loc[pos_prestige_mask, 'degree_neg']])
neg_prestige_combined_degrees_unfiltered = pd.concat([results.loc[neg_prestige_mask, 'degree_pos'], -results.loc[neg_prestige_mask, 'degree_neg']])

all_degrees_unfiltered = pd.concat([pos_prestige_combined_degrees_unfiltered, neg_prestige_combined_degrees_unfiltered])
min_val_unfiltered = all_degrees_unfiltered.min()
max_val_unfiltered = all_degrees_unfiltered.max()
bins_unfiltered = np.linspace(min_val_unfiltered, max_val_unfiltered, 30)

axes[0, 0].hist(
    [pos_prestige_combined_degrees_unfiltered, neg_prestige_combined_degrees_unfiltered],
    bins=bins_unfiltered,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Prestige', 'Negative Prestige'],
    edgecolor='black'
)
axes[0, 0].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[0, 0].set_title("Distribution of Degree by Prestige (Unfiltered)", fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel("Degree (Positive and Negative)")
axes[0, 0].set_ylabel("Frequency (Number of Nodes)")
axes[0, 0].legend(frameon=True, fontsize=10)
axes[0, 0].grid(axis='y', linestyle='--', alpha=0.6)

# Plot 2: Combined Degree by Prestige (Filtered)
pos_prestige_combined_filtered = pd.concat([results.loc[pos_prestige_mask & pos_degree_mask_filtered, 'degree_pos'], -results.loc[pos_prestige_mask & neg_degree_mask_filtered, 'degree_neg']])
neg_prestige_combined_filtered = pd.concat([results.loc[neg_prestige_mask & pos_degree_mask_filtered, 'degree_pos'], -results.loc[neg_prestige_mask & neg_degree_mask_filtered, 'degree_neg']])

all_degrees_filtered = pd.concat([pos_prestige_combined_filtered, neg_prestige_combined_filtered])
min_val_filtered = all_degrees_filtered.min()
max_val_filtered = all_degrees_filtered.max()
bins_filtered = np.linspace(min_val_filtered, max_val_filtered, 30)

axes[0, 1].hist(
    [pos_prestige_combined_filtered, neg_prestige_combined_filtered],
    bins=bins_filtered,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Prestige', 'Negative Prestige'],
    edgecolor='black'
)
axes[0, 1].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[0, 1].set_title("Distribution of Degree by Prestige (Filtered by Dominant Degree)", fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel("Degree (Positive and Negative)")
axes[0, 1].set_ylabel("Frequency (Number of Nodes)")
axes[0, 1].legend(frameon=True, fontsize=10)
axes[0, 1].grid(axis='y', linestyle='--', alpha=0.6)

# Plot 3: Prestige by Degree Type (Unfiltered)
axes[1, 0].hist(
    [results.loc[pos_degree_mask, 'prestige'], results.loc[neg_degree_mask, 'prestige']],
    bins=20,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Degree', 'Negative Degree'],
    edgecolor='black'
)
axes[1, 0].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[1, 0].set_title("Distribution of Prestige by Degree Type (Unfiltered)", fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel("Prestige")
axes[1, 0].set_ylabel("Frequency (Number of Nodes)")
axes[1, 0].legend(frameon=True, fontsize=10)
axes[1, 0].grid(axis='y', linestyle='--', alpha=0.6)

# Plot 4: Prestige by Degree Type (Filtered)
axes[1, 1].hist(
    [results.loc[pos_degree_mask_filtered, 'prestige'], results.loc[neg_degree_mask_filtered, 'prestige']],
    bins=20,
    color=['skyblue', 'salmon'],
    alpha=0.8,
    label=['Positive Degree', 'Negative Degree'],
    edgecolor='black'
)
axes[1, 1].axvline(0, color='black', linestyle='--', linewidth=1.5)
axes[1, 1].set_title("Distribution of Prestige by Degree Type (Filtered by Dominant Degree)", fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel("Prestige")
axes[1, 1].set_ylabel("Frequency (Number of Nodes)")
axes[1, 1].legend(frameon=True, fontsize=10)
axes[1, 1].grid(axis='y', linestyle='--', alpha=0.6)

plt.tight_layout()
plt.savefig(os.path.join("figures", 'histogram_combined_degree_prestige.png'))
plt.show()


### Degree vs Centrality Metric for each node

In [None]:
def plot_signed_centrality(df, metric_col, title):
    """
    Generates bar plots for nodes, split by whether their centrality metric is positive or negative.

    Args:
        df (pd.DataFrame): The DataFrame containing the node data.
        metric_col (str): The name of the column containing the signed centrality metric.
        title (str): The title for the plot.
    """
    # Filter out rows where the metric value is NaN
    filtered_df = df.dropna(subset=[metric_col])

    # Separate nodes with positive and negative scores for the given metric
    positive_nodes = filtered_df[filtered_df[metric_col] >= 0]
    negative_nodes = filtered_df[filtered_df[metric_col] < 0]
    
    # Define colors for positive and negative degrees
    pos_degree_color = '#4CAF50'  # A nice green for positive degree
    neg_degree_color = '#F44336'  # A vibrant red for negative degree
    bar_width = 0.35

    # Plot for nodes with a positive metric score
    fig_pos, ax_pos = plt.subplots(figsize=(15, 8))
    ind_pos = np.arange(len(positive_nodes.index))
    ax_pos.bar(ind_pos - bar_width/2, positive_nodes['degree_pos'], bar_width, label='Positive Degree',
               color=pos_degree_color)
    ax_pos.bar(ind_pos + bar_width/2, positive_nodes['degree_neg'], bar_width, label='Negative Degree',
               color=neg_degree_color)
    ax_pos.set_ylabel('Degree Value', fontsize=12)
    ax_pos.set_xlabel('Node', fontsize=12)
    ax_pos.set_title(f'{title} (Positive Scores)', fontsize=16)
    ax_pos.set_xticks(ind_pos)
    ax_pos.set_xticklabels(positive_nodes.index, rotation=45, ha='right', fontsize=10)
    ax_pos.legend()
    ax_pos.yaxis.grid(True, linestyle='--', alpha=0.6)
    plt.tight_layout()
    plt.savefig(os.path.join("figures", f'{title}_positive_scores.png'))
    plt.show()

    # Plot for nodes with a negative metric score
    fig_neg, ax_neg = plt.subplots(figsize=(15, 8))
    ind_neg = np.arange(len(negative_nodes.index))
    ax_neg.bar(ind_neg - bar_width/2, negative_nodes['degree_pos'], bar_width, label='Positive Degree',
               color=pos_degree_color)
    ax_neg.bar(ind_neg + bar_width/2, negative_nodes['degree_neg'], bar_width, label='Negative Degree',
               color=neg_degree_color)
    ax_neg.set_ylabel('Degree Value', fontsize=12)
    ax_neg.set_xlabel('Node', fontsize=12)
    ax_neg.set_title(f'{title} (Negative Scores)', fontsize=16)
    ax_neg.set_xticks(ind_neg)
    ax_neg.set_xticklabels(negative_nodes.index, rotation=45, ha='right', fontsize=10)
    ax_neg.legend()
    ax_neg.yaxis.grid(True, linestyle='--', alpha=0.6)
    plt.tight_layout()
    plt.savefig(os.path.join("figures", f'{title}_negative_scores.png'))
    plt.show()

# Generate the plots for signed PageRank and prestige
plot_signed_centrality(results, 'pagerank_signed', 'Positive vs. Negative Degree for Signed PageRank')
plot_signed_centrality(results, 'prestige', 'Positive vs. Negative Degree for Signed Prestige')

### Niche plots

Now we will visualize the **niches** in the **slices**, coloring each Visium spot according to the niche of the cell type with the highest proportion in that spot.
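
The coloring rule can be sketched on toy data: for each spot, take the cell type with the highest proportion and look up the niche it belongs to. The table `toy_props` and the mapping `niche_of` are hypothetical; `spatialNichePlot` handles this step internally and may differ in its details.

```python
import pandas as pd

# Toy cell type proportions per spot (rows: spots, columns: cell types); values are made up
toy_props = pd.DataFrame(
    {"CM": [0.7, 0.1, 0.2], "Fib": [0.2, 0.6, 0.3], "Myeloid": [0.1, 0.3, 0.5]},
    index=["spot1", "spot2", "spot3"],
)

# Hypothetical cell type -> niche lookup
niche_of = pd.Series({"CM": "niche_A", "Fib": "niche_B", "Myeloid": "niche_B"})

# Cell type with the highest proportion in each spot
top_cell_type = toy_props.idxmax(axis=1)

# Niche used to color each spot
spot_niche = top_cell_type.map(niche_of)
print(spot_niche)
```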

These are three **myogenic** slices, which will be in the **top** panels of the next figure:

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(21, 7))
plt.close(fig)  # suppress automatic display; the figure is shown after both rows are filled
for idu,smpl in enumerate(list(sampleTypesDF['sample'][sampleTypesDF['sampleType']=='myogenic'][0:3])):  
    _ = nichesphere.coloc.spatialNichePlot(adata=mudata['visium'][mudata['visium'].obs.patient_region_id==smpl].copy(), 
                                       CTprobs=CTprops.loc[spotSamples.index[spotSamples==smpl]],    #dataframe of cell type probabilities per spot
                                       cell_types=mudata['sc'].obs.cell_subtype2,                    #categorical series of cell types
                                       nicheDF=niches_df, 
                                       spot_size=0.015, 
                                       niche_colors=niche_cols,                                      #series of colors with niche names as indexes
                                       legend_fontsize=7, save_name='_'+smpl+'.pdf',ax=axes[0][idu])

And three **ischemic** slices, which will be in the **bottom** panels of the next figure:

In [None]:
for idu,smpl in enumerate(list(sampleTypesDF['sample'][sampleTypesDF['sampleType']=='ischemic'][0:3])):  
    _ = nichesphere.coloc.spatialNichePlot(adata=mudata['visium'][mudata['visium'].obs.patient_region_id==smpl].copy(), 
                                       CTprobs=CTprops.loc[spotSamples.index[spotSamples==smpl]], 
                                       cell_types=mudata['sc'].obs.cell_subtype2, 
                                       nicheDF=niches_df, 
                                       spot_size=0.015, 
                                       niche_colors=niche_cols, 
                                       legend_fontsize=7, 
                                       save_name='_'+smpl+'.pdf',ax=axes[1][idu])

In [None]:
fig.tight_layout()
fig

For further analysis, such as differential communication (https://nichesphere.readthedocs.io/en/latest/notebooks/Nichesphere_tutorial_MIvisium_comm.html), we will need the correspondence data between cell pairs and niche pairs:

In [None]:
pairCatDFdir=nichesphere.tl.get_pairCatDFdir(niches_df)
pairCatDFdir.to_csv('pairCatDFdir_MIvisium_louvain.csv')
pairCatDFdir.head()

We will also need a filtering object (**colocFilt**) that indicates which cell pairs are differentially co-localized, so that the communication data can be filtered accordingly:

In [None]:
## Get data of cells present in the adjacency matrix
pairCatDF_filter=[(pairCatDFdir.cell_pairs.str.split('->')[i][0] in HMsimm.index)&
                  (pairCatDFdir.cell_pairs.str.split('->')[i][1] in HMsimm.index) for i in pairCatDFdir.index]
pairCatDFdir_filt=pairCatDFdir[pairCatDF_filter]
oneCTints_filt=oneCTints[[i.split('-')[0] in HMsimm.index for i in oneCTints]]
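
The filter above keeps a directed pair only when both of its cell types appear in the adjacency matrix. The same idea can be sketched in vectorized form on toy data. The names `toy_pairs` and `kept_cell_types` are hypothetical, and the sketch assumes no cell type name contains `->`.

```python
import pandas as pd

# Toy directed cell pairs and a toy set of cell types kept in the adjacency matrix
toy_pairs = pd.DataFrame({"cell_pairs": ["A->B", "B->A", "A->C", "C->C"]})
kept_cell_types = pd.Index(["A", "B"])

# Split 'source->target' into two columns
parts = toy_pairs["cell_pairs"].str.split("->", expand=True)

# Keep a pair only if both of its cell types are in the kept set
both_present = parts[0].isin(kept_cell_types) & parts[1].isin(kept_cell_types)
toy_pairs_filt = toy_pairs[both_present]
print(toy_pairs_filt)
```

Splitting once with `expand=True` avoids re-splitting the whole column for every row, which can matter when there are many cell pairs.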

In [None]:
## Get data to flag differentially co-localized cell pairs in the adjacency matrix
colocFilt=nichesphere.tl.getColocFilter(pairCatDF=pairCatDFdir_filt, 
                                        adj=HMsimm, 
                                        oneCTints=oneCTints_filt.str.replace('-', '->'))
colocFilt.to_csv('colocFilt_MIvisium_louvain.csv')
colocFilt.head()

For further analysis we will also need the niche - cell type - color correspondence data, the co-localization network, and the node positions:

In [None]:
niches_df.to_csv('niches_df_MIvisium_louvain.csv')
nx.write_graphml_lxml(gCol, "colocNW_MIvisium_louvain.graphml")
np.save('colocNW_pos.npy', pos)
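
When reloading the node positions later, note that `np.save` stores a Python dictionary as a pickled object array. The round trip below is a sketch on toy data, assuming `pos` is a dictionary mapping node names to coordinates, as NetworkX layout functions return. The file name and `toy_pos` are hypothetical.

```python
import os
import tempfile
import numpy as np

# Toy node positions in the shape returned by NetworkX layout functions
toy_pos = {"A": np.array([0.0, 1.0]), "B": np.array([1.0, 0.0])}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_pos.npy")
    np.save(path, toy_pos)                     # stored as a pickled 0-d object array
    loaded = np.load(path, allow_pickle=True)  # allow_pickle is required for object arrays
    pos_back = loaded.item()                   # unwrap the 0-d array into the dict

print(pos_back)
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.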