# DeepLinc

- **Creator:** Sebastian Birk (<sebastian.birk@helmholtz-munich.de>).
- **Affiliation:** Helmholtz Munich, Institute of Computational Biology (ICB), Talavera-López Lab
- **Date of Creation:** 05.01.2023
- **Date of Last Modification:** 12.01.2023

- The DeepLinc source code is available at https://github.com/xryanglab/DeepLinc.
- The corresponding publication is Li, R. & Yang, X. De novo reconstruction of cell interaction landscapes from single-cell spatial transcriptome data with DeepLinc. Genome Biol. 23, 124 (2022).

- The logic to run DeepLinc is encapsulated into the Python script 'deeplinc.py'.
- We train DeepLinc models on the seqFISH mouse organogenesis embryo 2 dataset with different adjacency matrices corresponding to 4, 8, 12, 16, and 20 average neighbors.
- We compute 2 runs per number of average neighbors, resulting in a total of 10 runs. The 2 runs for each number of neighbors use different random seeds (seed 0 and seed 1).
- To train a DeepLinc model with 4 average neighbors, open a terminal, navigate to the 'deeplinc' folder, and run ```python deeplinc.py -e ../datasets/srt_data/gold/seqfish_mouse_organogenesis_embryo2_counts.csv -a ../datasets/srt_data/gold/seqfish_mouse_organogenesis_embryo2_adj4.csv -c ../datasets/srt_data/gold/seqfish_mouse_organogenesis_embryo2_coords.csv -r ../datasets/srt_data/gold/seqfish_mouse_organogenesis_embryo2_cell_types.csv -n run1 -i 40 --seed 0``` (after the data has been prepared further down in this notebook).
- The results of the trained DeepLinc models (latent space features & reconstructed adjacency matrices) are manually stored in ```../datasets/benchmark_data/deeplinc/embryo2```.
- According to the instructions in the GitHub repo, the authors use raw counts as input to DeepLinc. Therefore, we also use raw counts.
- To define the spatial neighborhood graph, the original DeepLinc paper uses the 3 nearest neighbors of a cell as its neighbors, provided that their distance to the cell falls below a threshold radius. The threshold radius is determined by plotting the distribution of cells' distances to their 3 nearest neighbors and drawing a cutoff to remove outliers. The union of all neighbors is used as the final spatial neighborhood graph (the adjacency matrix is made symmetric). We use the same method but without the threshold radius cutoff, to keep the benchmarking simpler and more comparable between different methods.
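
The neighborhood graph construction described in the last bullet can be illustrated with a minimal, pure-Python sketch. It is not the DeepLinc or squidpy implementation: the function name `knn_adjacency`, its `max_radius` parameter, and the toy coordinates are assumptions made for illustration only. It links each cell to its `k` nearest neighbors, optionally drops links longer than a cutoff radius, and keeps the union of all links so that the adjacency matrix is symmetric.

```python
import math


def knn_adjacency(coords, k, max_radius=None):
    """Return a symmetric 0/1 adjacency matrix as a list of lists.

    Hypothetical helper for illustration: each cell is linked to its k nearest
    neighbors, and max_radius (optional) drops neighbors that are too far away.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i, p in enumerate(coords):
        # Distances from cell i to every other cell, nearest first
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        for d, j in dists[:k]:
            if max_radius is None or d <= max_radius:
                adj[i][j] = 1
                adj[j][i] = 1  # union of neighbors -> symmetric matrix
    return adj


# Toy example (made-up coordinates): three nearby cells and one outlier
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.5)]
adj_no_cutoff = knn_adjacency(coords, k=1)
adj_cutoff = knn_adjacency(coords, k=1, max_radius=2.0)

for row in adj_no_cutoff:
    print(row)
```

Without the cutoff, the outlier is still connected to its nearest cell; with the cutoff it ends up isolated. Because the union of neighbors is taken, a cell can end up with more than `k` neighbors after symmetrization.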

## 1. Setup

### 1.1 Import Libraries

In [2]:
import os
from datetime import datetime

import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import squidpy as sq

### 1.2 Define Parameters

In [5]:
dataset = "seqfish_mouse_organogenesis_embryo2"
cell_type_key = "celltype_mapped_refined"
latent_key = "deeplinc_latent"
leiden_resolution = 0.3 # used for Leiden clustering of latent space
random_seed = 0

### 1.3 Run Notebook Setup

In [6]:
sc.set_figure_params(figsize=(6, 6))

In [7]:
# Get time of notebook execution for timestamping saved artifacts
now = datetime.now()
current_timestamp = now.strftime("%d%m%Y_%H%M%S")

### 1.4 Configure Paths and Directories

In [8]:
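data_folder_path = "../datasets/srt_data/gold/"
figure_folder_path = f"../figures/method_benchmarking/{dataset}/"
benchmark_data_folder_path = "../datasets/benchmark_data/"

# Make sure the figure directory exists before figures are saved to it
os.makedirs(figure_folder_path, exist_ok=True)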

## 2. Data

In [6]:
# Load Data
adata = sc.read_h5ad(data_folder_path + f"{dataset}.h5ad")

In [7]:
# Create csv files to run DeepLinc
counts_df = pd.DataFrame(adata.layers["counts"].toarray(), columns=adata.var_names)
counts_df.to_csv(f"{data_folder_path}/{dataset}_counts.csv", index=False)

coords_df = pd.DataFrame(adata.obsm["spatial"], columns=["X", "Y"])
coords_df.to_csv(f"{data_folder_path}/{dataset}_coords.csv", index=False)

for n_neighbors in [4, 8, 12, 16, 20]:
    # Compute spatial neighborhood graphs
    sq.gr.spatial_neighbors(adata,
                            coord_type="generic",
                            spatial_key="spatial",
                            n_neighs=n_neighbors)
    adj = adata.obsp["spatial_connectivities"].toarray()
    adj = adj + adj.T
    adj = np.where(adj>1, 1, adj)
    adj_df = pd.DataFrame(adj)
    adj_df.to_csv(f"{data_folder_path}/{dataset}_adj{n_neighbors}.csv", index=False)

cell_types_df = pd.DataFrame(adata.obs[cell_type_key])
cell_types_df.rename(columns={cell_type_key: "Cell_class_name"}, inplace=True)
cell_types_df["Cell_ID"] = np.arange(len(adata))
cell_types_df["Cell_class_id"] = cell_types_df["Cell_class_name"].cat.codes
cell_types_df = cell_types_df[["Cell_ID", "Cell_class_id", "Cell_class_name"]]
cell_types_df.to_csv(f"{data_folder_path}/{dataset}_cell_types.csv", index=False)

In [9]:
# Load latent space features into original adata after running 'deeplinc.py' script
adata_original = sc.read_h5ad(data_folder_path + f"{dataset}.h5ad")

## 3. DeepLinc Model

Now, train DeepLinc models with the ```deeplinc.py``` script as described at the top of the notebook and store the latent space features of the trained models under ```../datasets/benchmark_data/deeplinc/embryo2/```.
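
As a convenience, the 10 training commands (5 adjacency matrices x 2 seeds) could be assembled programmatically. The following is a minimal sketch that only reuses the flags shown in the example command at the top of the notebook. The helper `build_command` and the run names `run1` to `run10` are assumptions for illustration; only `run1` for 4 neighbors with seed 0 appears in the original command.

```python
from itertools import product

data_dir = "../datasets/srt_data/gold"
dataset = "seqfish_mouse_organogenesis_embryo2"


def build_command(n_neighbors, seed, run_name):
    """Assemble the deeplinc.py call for one adjacency matrix and one seed."""
    prefix = f"{data_dir}/{dataset}"
    return [
        "python", "deeplinc.py",
        "-e", f"{prefix}_counts.csv",
        "-a", f"{prefix}_adj{n_neighbors}.csv",
        "-c", f"{prefix}_coords.csv",
        "-r", f"{prefix}_cell_types.csv",
        "-n", run_name,  # hypothetical naming scheme: run1 ... run10
        "-i", "40",
        "--seed", str(seed),
    ]


commands = []
for run_idx, (n_neighbors, seed) in enumerate(product([4, 8, 12, 16, 20], [0, 1]), start=1):
    commands.append(build_command(n_neighbors, seed, run_name=f"run{run_idx}"))

for cmd in commands:
    # Print the commands; to execute them instead, run from inside the
    # 'deeplinc' folder with: import subprocess; subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```

Printing the commands first makes it easy to check the file paths before starting any training run.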

In [None]:
run_number = 0
for directory, subdirectory, file_list in os.walk(benchmark_data_folder_path + "deeplinc/embryo2"):
    for file in file_list:
        
        if "hidden2" in file:
            file_path = os.path.join(directory, file)
            
            adata = sc.read_h5ad(data_folder_path + f"{dataset}.h5ad")
            adata.obsm[latent_key] = np.load(file_path)
            
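            # Use DeepLinc latent space for UMAP generation. The number of neighbors is
            # parsed from the trailing digits of the run directory name (assumes the
            # directory name ends with the neighbor count, e.g. '...4' or '...12')
            dir_name = os.path.basename(os.path.normpath(directory))
            n_neighbors = int(dir_name[len(dir_name.rstrip("0123456789")):])
            sc.pp.neighbors(adata, use_rep=latent_key, n_neighbors=n_neighbors)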
            sc.tl.umap(adata, min_dist=0.3)
            fig = sc.pl.umap(adata,
                             color=[cell_type_key],
                             title="Latent Space with Cell Types: DeepLinc",
                             return_fig=True)
            fig.savefig(f"{figure_folder_path}/latent_deeplinc_cell_types_run_{run_number + 1}_{current_timestamp}.png",
                        bbox_inches="tight")
            
            # Compute latent Leiden clustering
            sc.tl.leiden(adata=adata,
                         resolution=leiden_resolution,
                         random_state=random_seed,
                         key_added=f"latent_deeplinc_leiden_{str(leiden_resolution)}")
            
            # Create subplot of latent Leiden cluster annotations in physical and latent space
            fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(6, 12))
            title = fig.suptitle(t="Latent and Physical Space with Leiden Clusters: DeepLinc")
            sc.pl.umap(adata=adata,
                       color=[f"latent_deeplinc_leiden_{str(leiden_resolution)}"],
                       title="Latent Space with Leiden Clusters",
                       ax=axs[0],
                       show=False)
            sc.pl.spatial(adata=adata,
                          color=[f"latent_deeplinc_leiden_{str(leiden_resolution)}"],
                          spot_size=0.03,
                          title="Physical Space with Leiden Clusters",
                          ax=axs[1],
                          show=False)

            # Create and position shared legend
            handles, labels = axs[0].get_legend_handles_labels()
            lgd = fig.legend(handles, labels, bbox_to_anchor=(1.1, 0.90))
            axs[0].get_legend().remove()
            axs[1].get_legend().remove()

            # Adjust, save and display plot
            plt.subplots_adjust(wspace=0, hspace=0.2)
            fig.savefig(f"{figure_folder_path}/latent_physical_comparison_deeplinc_leiden_run_{run_number + 1}_{current_timestamp}.png",
                        bbox_extra_artists=(lgd, title),
                        bbox_inches="tight")
            plt.show()
            
            adata_original.obsm[latent_key + f"_run{run_number}"] = np.load(file_path)
            run_number += 1

# Label all 'gene programs' as active gene programs for subsequent benchmarking
n_latent = adata_original.obsm["deeplinc_latent_run1"].shape[1]
adata_original.uns["deeplinc_active_gp_names"] = np.array([f"latent_{i}" for i in range(n_latent)])

# Store data to disk
adata_original.write(f"{data_folder_path}/{dataset}_deeplinc.h5ad")