# Tutorial Notebook: BAE analysis workflow for single-cell resolved interaction patterns of CCIMs

In this notebook ...

## Setup:

First, we activate the project environment and load all packages required for the analysis.

In [None]:
#---Set the project path and activate the environment:
projectpath = joinpath(@__DIR__, "../");

using Pkg;
Pkg.activate(projectpath);
Pkg.instantiate();
Pkg.status()

#---Load the BoostingAutoEncoder module (requires projectpath, defined above):
include(joinpath(projectpath, "src", "BAE.jl"));
using .BoostingAutoEncoder

#---Load required packages for this notebook:
using RCall;
using DelimitedFiles;
using Plots;
using Random;
using StatsBase;
using VegaLite;
using DataFrames;
using StatsPlots;

Next, we define the path to the directory containing the data we want to analyze and specify where the results should be saved.

In [None]:
#---Set paths to the data directory and the figures directory:
projectpath = joinpath(@__DIR__, "../");

# Set the path to the data directory (replace the path below with the path to your data directory):
datapath = projectpath * "data/tutorial";

# Set the path where results are saved (replace the path below with your own results directory);
# mkpath creates the folder, including any missing parent folders, if it does not exist yet:
figurespath = projectpath * "figures/tutorial/"
mkpath(figurespath);

## Load the single-cell RNA sequencing (scRNA-seq) data:

...

## Run NICHES for creating a CCIM from the scRNA-seq data:

NICHES can be used to compute CCIMs that represent cell-to-cell communication. Alternatively, it can construct CCIMs that represent cell-to-system or system-to-cell communication. Accordingly, each observation in a CCIM corresponds to one of three kinds of pairs: a sender cell and a receiver cell, a sender cell and a receiver system, or a sender system and a receiver cell.
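To make the structure of a cell-to-cell CCIM concrete, the toy sketch below builds one from a small expression matrix. It is not NICHES: it assumes that each entry is the product of the ligand's expression in the sender cell and the receptor's expression in the receiver cell, that all ordered pairs of distinct cells are used, and that `X`, `ligand_idx`, and `receptor_idx` are hypothetical inputs. The scoring and pairing options NICHES actually uses are described in its documentation.

```julia
"""
    toy_cell_to_cell_ccim(X, ligand_idx, receptor_idx)

Hypothetical toy sketch (not NICHES): `X` is a cells × genes expression matrix and
`ligand_idx[k]`, `receptor_idx[k]` are the gene columns of the k-th ligand-receptor pair.
Returns a (sender, receiver) pairs × mechanisms matrix and the list of pairs.
"""
function toy_cell_to_cell_ccim(X::AbstractMatrix, ligand_idx::AbstractVector{<:Integer},
                               receptor_idx::AbstractVector{<:Integer})
    length(ligand_idx) == length(receptor_idx) ||
        throw(ArgumentError("ligand_idx and receptor_idx must have equal length"))
    n = size(X, 1)
    m = length(ligand_idx)
    C = zeros(eltype(X), n * (n - 1), m)
    pairs = Tuple{Int,Int}[]
    for s in 1:n, r in 1:n
        s == r && continue                      # skip self-pairs (assumption of this sketch)
        push!(pairs, (s, r))
        row = length(pairs)
        for k in 1:m
            # assumed score: ligand expression in sender × receptor expression in receiver
            C[row, k] = X[s, ligand_idx[k]] * X[r, receptor_idx[k]]
        end
    end
    return C, pairs
end
```

With n cells this yields n(n-1) ordered sender-receiver pairs, so a cell-to-cell CCIM built from all pairs grows quadratically with the number of cells.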

...

## Set up and run BAE training:

Before we train a BAE on the data, we need to specify the hyperparameters for training.
...

In [None]:
#---Define the hyperparameters for training the BAE:
HP = Hyperparameter(zdim=15, n_restarts=3, epochs=10, batchsize=2^9, η=0.01f0, λ=0.1f0, ϵ=0.001f0, M=1); 
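As its name suggests, the BAE learns sparse encoder weights with componentwise boosting. As a generic illustration only, the sketch below implements textbook componentwise L2-boosting for a single response. It assumes that `ϵ` and `M` in `Hyperparameter` play the roles of the boosting step size and the number of boosting steps per update, and it does not reproduce how `train_BAE!` derives its boosting targets from the decoder. The function name `componentwise_l2_boost` is hypothetical.

```julia
using LinearAlgebra

# Hypothetical, generic componentwise L2-boosting (not the BoostingAutoEncoder implementation).
# X: observations × features, y: response, ϵ: step size, M: number of boosting steps.
function componentwise_l2_boost(X::AbstractMatrix, y::AbstractVector; ϵ::Real=0.1, M::Integer=10)
    p = size(X, 2)
    β = zeros(p)
    for _ in 1:M
        r = y .- X * β                    # residuals of the current fit
        best_j, best_b, best_gain = 0, 0.0, -Inf
        for j in 1:p
            xj = view(X, :, j)
            denom = dot(xj, xj)
            iszero(denom) && continue     # skip all-zero features
            b = dot(xj, r) / denom        # least-squares fit of the residuals on feature j alone
            gain = b^2 * denom            # resulting drop in the residual sum of squares
            if gain > best_gain
                best_j, best_b, best_gain = j, b, gain
            end
        end
        best_j == 0 && break
        β[best_j] += ϵ * best_b           # update only the best-fitting coefficient
    end
    return β
end
```

Each step moves a single coefficient a fraction ϵ of the way toward its least-squares value, so features that are never selected keep a coefficient of exactly zero; this is what makes the resulting weights sparse.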

Next, we create the neural network architecture for the BAE. 
...

In [None]:
#---Define the decoder architecture (p is the number of features, i.e. the number of columns of X_st):
decoder = generate_BAEdecoder(p, HP; soft_clustering=true, modelseed=1);

#---Initialize the BAE model:
BAE = BoostingAutoencoder(; coeffs=zeros(Float32, p, HP.zdim), decoder=decoder, HP=HP);
summary(BAE)
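The encoder coefficients are initialized as a p × zdim zero matrix (`coeffs=zeros(Float32, p, HP.zdim)`), so each latent dimension starts with no selected features. Assuming, for illustration, a linear encoder whose latent representation is the data matrix times these coefficients (BoostingAutoencoder's internal layout, e.g. the orientation of `BAE.Z`, may differ), the features that characterize a latent dimension are simply the rows with nonzero weights. A toy sketch with hypothetical `X` and `B`:

```julia
using Random

Random.seed!(1)
n, p, zdim = 6, 4, 2                 # hypothetical toy sizes
X = randn(Float32, n, p)             # toy data: observations × features

B = zeros(Float32, p, zdim)          # all-zero start, like coeffs=zeros(Float32, p, HP.zdim)
B[1, 1] = 0.5f0                      # pretend boosting selected features 1 and 3 for dimension 1
B[3, 1] = -0.2f0
B[4, 2] = 1.0f0                      # and feature 4 for dimension 2

Z = X * B                            # assumed linear encoding: observations × latent dimensions

# Indices of the features with nonzero weight in each latent dimension
selected = [findall(!iszero, view(B, :, k)) for k in 1:zdim]
```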

We are now ready to train the model.

In [None]:
#---Train the BAE model:
# The number of restarts of the encoder weight matrix during training is set via n_restarts in the hyperparameters (n_restarts > 1 enables restarts):
@time begin
    output_dict = train_BAE!(X_st, BAE; MD=MD, track_coeffs=false, save_data=false, path=nothing, batchseed=1);
end

@info "Minimum training loss $(minimum(output_dict["trainloss"])) reached at index $(argmin(output_dict["trainloss"]))"

## Result visualization:

...

In [None]:
#---Generate distinct colors (evenly spaced hues with slightly varying lightness):
n_cols = 2*BAE.HP.zdim;
custom_colorscheme = [hsl_to_hex(i / n_cols, 0.7, 0.5 + 0.1 * sin(i * 4π / BAE.HP.zdim)) for i in 1:n_cols];
custom_colorscheme_shuffled = shuffle(MersenneTwister(42), custom_colorscheme); # fixed seed for reproducible colors
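`hsl_to_hex` is not defined in this notebook and is presumably provided by one of the loaded modules (likely BoostingAutoEncoder). To show what such a helper computes, here is a sketch of the standard HSL-to-RGB conversion formatted as a hex string. The name `hsl_to_hex_sketch` is hypothetical, the hue is assumed to be a fraction of a full turn (as in `i / n_cols` above), and the module's own implementation may differ, e.g. in rounding or letter case.

```julia
# Hypothetical stand-in for an HSL → "#rrggbb" helper (standard HSL-to-RGB formula).
# h: hue as a fraction of a full turn, s: saturation in [0, 1], l: lightness in [0, 1].
function hsl_to_hex_sketch(h::Real, s::Real, l::Real)
    c = (1 - abs(2l - 1)) * s                 # chroma
    hp = mod(h, 1) * 6                        # hue sector in [0, 6)
    x = c * (1 - abs(mod(hp, 2) - 1))         # second-largest RGB component
    r1, g1, b1 = hp < 1 ? (c, x, 0.0) :
                 hp < 2 ? (x, c, 0.0) :
                 hp < 3 ? (0.0, c, x) :
                 hp < 4 ? (0.0, x, c) :
                 hp < 5 ? (x, 0.0, c) : (c, 0.0, x)
    m = l - c / 2                             # shift to match the requested lightness
    to_byte(v) = string(round(Int, 255 * clamp(v + m, 0, 1)); base=16, pad=2)
    return "#" * to_byte(r1) * to_byte(g1) * to_byte(b1)
end
```

In the color cell, `i / n_cols` spreads the 2·zdim hues evenly around the color wheel, while the sine term varies the lightness between 0.4 and 0.6, presumably so that neighboring hues are easier to tell apart.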

#----Compute 2D UMAP embedding of the learned BAE latent representation and add to the metadata:
plotseed = 7;
BAE.UMAP = generate_umap(BAE.Z', plotseed);
MD.obs_df[!, :UMAP1] = BAE.UMAP[:, 1];
MD.obs_df[!, :UMAP2] = BAE.UMAP[:, 2];

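#---Randomly shuffle the rows of the metadata to randomize the drawing order in scatter plots:
rand_inds = shuffle(MersenneTwister(plotseed), 1:size(X_st, 1));
MD.obs_df = MD.obs_df[rand_inds, :];
# Note: after this step, the rows of MD.obs_df are permuted and no longer follow the observation order of X_st.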
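Because the shuffling step reorders `MD.obs_df`, any later step that needs the original observation order (for example, to match rows of `X_st`) can restore it with Base's `invperm`, e.g. `MD.obs_df[invperm(rand_inds), :]`. A toy illustration with hypothetical labels standing in for the metadata:

```julia
using Random

labels = ["A", "A", "B", "B", "C"]                      # toy per-observation metadata (hypothetical)
perm = shuffle(MersenneTwister(7), 1:length(labels))    # random plotting order
labels_shuffled = labels[perm]                          # analogous to MD.obs_df[rand_inds, :]

# invperm(perm) undoes the permutation and restores the original order
restored = labels_shuffled[invperm(perm)]
```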